QUICK REVIEW

[Paper Review] Machine Learning approach to boosting neutral particles identification in the LHCb calorimeter

A. S. Boldyrev, V. Chekalina|arXiv (Cornell University)|Dec 18, 2019

Particle physics theoretical and experimental studies | 2 references | 1 citation

TL;DR

This paper proposes a machine learning approach to improve identification of boosted neutral particles, specifically the separation of single photons from merged neutral pions (π⁰), in the LHCb electromagnetic calorimeter (ECAL). Using raw energy deposits in a 5×5 window of ECAL and preshower (PS) cells as input features, an XGBoost-based classifier achieves an ROC AUC of 0.97 and reduces the π⁰ fake rate from about 60% to 30% at 98% photon efficiency, with negligible energy dependence.

ABSTRACT

We present a new approach to identification of boosted neutral particles using Electromagnetic Calorimeter (ECAL) of the LHCb detector. The identification of photons and neutral pions is currently based on the geometric parameters which characterise the expected shape of energy deposition in the calorimeter. This allows to distinguish single photons in the electromagnetic calorimeter from overlapping photons produced from high momentum $\pi^0$ decays. The novel approach proposed here is based on applying machine learning techniques to primary calorimeter information, that are energies collected in individual cells around the energy cluster. This method allows to improve separation performance of photons and neutral pions and has no significant energy dependence.

Motivation & Objective

To improve the separation of high-momentum photons from merged neutral pions (π⁰) in the LHCb ECAL, where the two photons from a high-momentum π⁰ decay overlap into a single energy cluster and are misidentified as one photon.
To develop a machine learning model that uses raw ECAL and preshower (PS) cell energy deposits as input, avoiding physics-based feature engineering.
To achieve a classification method with minimal energy dependence to reduce systematic uncertainties in physics analyses.
To validate the method on Monte Carlo simulations and calibrate performance using real data samples from B⁰ → Kπγ and B⁰ → Kππ⁰ decays.
To enable reliable transfer of the trained model from simulation to real data by addressing MC/data discrepancies in input variables.

Proposed method

The method uses a 5×5 window of raw energy deposits from ECAL and PS cells around the cluster seed as input features, totaling 50 features.
An XGBoost classifier is trained on these raw energy values to distinguish between single photons and merged π⁰ decays.
Hyperparameters are tuned with ModelGym; the reported XGBoost settings are 6000 trees, max depth 3, learning rate 0.05, and min child weight 2.
Performance is evaluated using ROC curves and efficiency profiles across different transverse energy (ET) bins.
Calibration is performed using real data samples: B⁰ → Kπγ for photons and B⁰ → Kππ⁰ (via J/ψ → μ⁺μ⁻) for π⁰, ensuring kinematic similarity to signal events.
The method is compared to the baseline shape-based approach using geometric cluster features, with both methods evaluated on MC samples and real data calibration.
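
The input layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names (flatten_window, build_features), the row-by-row cell ordering, and the toy energies are assumptions made for the example. The dictionary XGB_PARAMS restates the settings quoted in this review using the keyword names that xgboost.XGBClassifier accepts.

```python
from typing import List, Sequence

WINDOW = 5  # 5x5 cells centred on the cluster seed


def flatten_window(cells: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a 5x5 grid of cell energies into 25 floats, row by row.

    The row-by-row ordering is an assumption; any fixed ordering works
    as long as training and inference use the same one.
    """
    if len(cells) != WINDOW or any(len(row) != WINDOW for row in cells):
        raise ValueError("expected a 5x5 window of cell energies")
    return [float(energy) for row in cells for energy in row]


def build_features(ecal: Sequence[Sequence[float]],
                   ps: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the ECAL and PS windows into one 50-element vector."""
    return flatten_window(ecal) + flatten_window(ps)


# Settings quoted in the review, written with xgboost.XGBClassifier
# keyword names (6000 trees -> n_estimators).
XGB_PARAMS = {
    "n_estimators": 6000,
    "max_depth": 3,
    "learning_rate": 0.05,
    "min_child_weight": 2,
}

# Toy cluster (made-up numbers): all energy in the seed cell.
ecal = [[0.0] * WINDOW for _ in range(WINDOW)]
ecal[2][2] = 10.0
ps = [[0.0] * WINDOW for _ in range(WINDOW)]
ps[2][2] = 0.5

features = build_features(ecal, ps)
print(len(features))  # 50
```

With xgboost installed, the dictionary could be passed as xgboost.XGBClassifier(**XGB_PARAMS) and fitted on a matrix of such feature vectors; that step is left out here so the sketch runs with the standard library only.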

Experimental results

Research questions

RQ1: Can a machine learning model trained on raw ECAL and PS energy deposits outperform traditional shape-based methods in distinguishing photons from merged π⁰?
RQ2: Does the proposed ML approach exhibit negligible dependence on the transverse energy of the particle, thereby reducing systematic uncertainties in physics analyses?
RQ3: How well does the model trained on Monte Carlo simulations perform when applied to real data, and what calibration is required to ensure unbiased performance?
RQ4: Can a simple, feature-agnostic approach using only raw energy deposits achieve superior discrimination compared to physics-informed geometric features?
RQ5: What is the optimal classifier architecture (e.g., XGBoost vs. neural networks) for this task, and how do different boosting algorithms compare?

Key findings

The new XGBoost-based approach achieves an ROC AUC of 0.97, compared to 0.89 for the baseline shape-based method, indicating a significant improvement in discrimination performance.
At 98% photon efficiency, the fake rate for π⁰ misidentified as photons is reduced from approximately 60% to 30% using the new method.
The new approach shows a flat efficiency profile across transverse energy (ET) bins, indicating negligible energy dependence, which is crucial for minimizing systematic uncertainties.
Among tested classifiers, XGBoost outperformed neural network configurations, particularly those with 3–4 hidden layers, which degraded in performance due to insufficient feature complexity.
The method demonstrates robustness when calibrated on real data samples, including B⁰ → Kπγ and B⁰ → J/ψK* → Kππ⁰, confirming its applicability to real-world conditions.
The use of raw energy deposits without physics-based feature engineering enables a more generalizable and transferable model, suitable for integration into future neutral particle identification pipelines.
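
The three figures of merit quoted above (ROC AUC, fake rate at a fixed photon efficiency, and the efficiency profile in ET bins) can be computed as in the sketch below. It is an illustration under stated assumptions, not the paper's evaluation code: the function names, the "score >= cut" selection convention, and the toy scores and ET values are all invented for the example.

```python
import math
from typing import List, Optional, Sequence


def roc_auc(sig: Sequence[float], bkg: Sequence[float]) -> float:
    """AUC as the probability that a random signal score exceeds a
    random background score (ties count as one half)."""
    wins = 0.0
    for s in sig:
        for b in bkg:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(sig) * len(bkg))


def cut_at_efficiency(sig: Sequence[float], efficiency: float) -> float:
    """Highest cut that keeps at least `efficiency` of the signal,
    selecting candidates with score >= cut."""
    ordered = sorted(sig, reverse=True)
    n_keep = max(1, math.ceil(efficiency * len(ordered)))
    return ordered[n_keep - 1]


def fake_rate(bkg: Sequence[float], cut: float) -> float:
    """Fraction of background candidates that pass the cut."""
    return sum(1 for b in bkg if b >= cut) / len(bkg)


def efficiency_by_bin(scores: Sequence[float], et: Sequence[float],
                      edges: Sequence[float],
                      cut: float) -> List[Optional[float]]:
    """Pass fraction in each [edges[i], edges[i+1]) bin; None if empty."""
    profile: List[Optional[float]] = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [s for s, e in zip(scores, et) if lo <= e < hi]
        if in_bin:
            profile.append(sum(1 for s in in_bin if s >= cut) / len(in_bin))
        else:
            profile.append(None)
    return profile


# Toy classifier outputs (made-up numbers, higher = more photon-like).
photon_scores = [0.9, 0.8, 0.7, 0.6]
pi0_scores = [0.65, 0.3, 0.2, 0.1]
photon_et = [3.0, 4.5, 6.0, 8.0]  # hypothetical ET values

auc = roc_auc(photon_scores, pi0_scores)
cut = cut_at_efficiency(photon_scores, 0.98)
fakes = fake_rate(pi0_scores, cut)
profile = efficiency_by_bin(photon_scores, photon_et, [2.0, 5.0, 10.0], cut)
print(auc, cut, fakes, profile)  # 0.9375 0.6 0.25 [1.0, 1.0]
```

A flat profile, meaning similar pass fractions in every ET bin, is what "negligible energy dependence" refers to in the findings above. The pairwise AUC loop is quadratic in sample size and is meant for illustration; a rank-based implementation would be preferable on real samples.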

This review was created by AI and reviewed by human editors.