Discriminating QCD Compton and Quark-Antiquark Annihilation Processes in $γ$ + Jets Using Interpretable Machine Learning
We investigate how effectively final-state jet substructure can discriminate between QCD Compton and quark-antiquark annihilation processes from photon-jet production in $pp$ collisions at $\sqrt{s}=13$ TeV. Using infrared- and collinear-safe jet observables, multivariate classifiers – boosted decision trees and multilayer perceptrons – are trained on labeled quark- and gluon-initiated jets from dijet events and applied to photon-jet samples. Observables probing soft and wide-angle radiation, in particular jet multiplicity and jet girth, dominate the discrimination. The jet mass provides a complementary but weaker contribution, while the jet charge exhibits negligible discriminating power. A comparison of the two classifiers demonstrates that the achievable separation is limited primarily by QCD radiation effects rather than by classifier complexity. These findings quantify the extent to which information about the underlying hard process survives hadronization and realistic jet reconstruction, providing a physics-driven baseline for precision jet measurements in $pp$, $ep/$A, and heavy-ion collisions.
💡 Research Summary
In this work the authors address the problem of distinguishing the two leading-order QCD mechanisms that produce a photon‑tagged jet in proton‑proton collisions at √s = 13 TeV: the QCD Compton process ($qg \to q\gamma$) and the quark‑antiquark annihilation process ($q\bar{q} \to g\gamma$). Because the recoil jet in the Compton channel is initiated by a quark while in the annihilation channel it is initiated by a gluon, the two classes exhibit different colour factors (C_F = 4/3 versus C_A = 3) and consequently different radiation patterns. The authors ask how much of this partonic information survives parton showering, hadronisation, and realistic jet reconstruction, and whether it can be extracted with experimentally accessible observables.
Simulation and jet definition
Events are generated with PYTHIA 8 using the “Detroit” tune, requiring a hard scattering scale $\hat{p}_T$ > 30 GeV. Two samples are produced: (i) inclusive dijet events, which serve as a labelled training set, and (ii) γ + jet events, which are used for the physics measurement. Jets are clustered from final‑state particles with the anti‑k_T algorithm (R = 0.5) as implemented in FastJet. A parton‑level matching (ΔR < 0.25) assigns a quark or gluon label to each jet in the dijet sample; the same reconstruction is applied to the γ + jet sample, and the flavour of the recoil jet is inferred from the underlying hard process (Compton → quark‑like, annihilation → gluon‑like).
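The ΔR < 0.25 matching step can be illustrated with a short sketch. This is not the authors' code: the function names, the input layout (tuples of PDG ID, η, φ) and the choice of the nearest parton as the match are assumptions made for illustration. Only the 0.25 cut comes from the text.

```python
import numpy as np

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane, with dphi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + np.pi) % (2.0 * np.pi) - np.pi
    return np.hypot(eta1 - eta2, dphi)

def label_jet(jet, partons, max_dr=0.25):
    """Label a jet as 'quark' or 'gluon' from the nearest hard-scatter parton.

    jet     : (eta, phi) of the jet axis
    partons : iterable of (pdg_id, eta, phi); hypothetical input layout
    Returns None when no parton lies within max_dr of the jet axis.
    """
    best_id, best_dr = None, max_dr
    for pdg_id, eta, phi in partons:
        dr = delta_r(jet[0], jet[1], eta, phi)
        if dr < best_dr:
            best_id, best_dr = pdg_id, dr
    if best_id is None:
        return None
    # PDG ID 21 is the gluon; |ID| in 1..6 are quarks.
    return "gluon" if best_id == 21 else "quark"

partons = [(21, 0.0, -3.1), (2, 1.5, 0.0)]
print(label_jet((0.1, 3.1), partons))   # jet close to the gluon across the phi wrap
print(label_jet((2.0, 1.0), partons))   # no parton within 0.25
```

Wrapping Δφ matters here: a jet at φ = 3.1 and a parton at φ = −3.1 are neighbours in azimuth, not 6.2 rad apart.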
Observables
The study uses a compact, infrared‑ and collinear‑safe set of jet substructure variables:
- Jet transverse momentum (p_T), pseudorapidity (η) and azimuth (φ) – included to monitor possible kinematic biases.
- Jet mass (M_jet) – sensitive to the overall amount of radiation inside the jet.
- Charged particle multiplicity (N_const) – directly related to the colour factor ratio: asymptotically ⟨N_g⟩/⟨N_q⟩ → C_A/C_F = 9/4.
- Girth (g) – a linear radial moment that quantifies how broadly the transverse momentum is distributed.
- Jet charge (Q_ch, κ = 0.5) – a weighted sum of constituent charges, expected to have little discriminating power.
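As a rough illustration of how these quantities are built from jet constituents, the sketch below uses the commonly quoted definitions: girth g = Σ_i p_T,i ΔR_i / p_T,jet and jet charge Q = Σ_i q_i p_T,i^κ / p_T,jet^κ. The summary does not spell out the paper's exact conventions, so the massless-constituent treatment, the function name and the returned keys are assumptions.

```python
import numpy as np

def jet_observables(pt, eta, phi, charge, kappa=0.5):
    """Substructure observables from constituent (pt, eta, phi, charge) arrays.

    Constituents are treated as massless (an assumption of this sketch).
    """
    pt, eta, phi, charge = (np.asarray(a, dtype=float) for a in (pt, eta, phi, charge))

    # Constituent four-momenta, summed to give the jet four-momentum.
    px, py = pt * np.cos(phi), pt * np.sin(phi)
    pz, e = pt * np.sinh(eta), pt * np.cosh(eta)
    jpx, jpy, jpz, je = px.sum(), py.sum(), pz.sum(), e.sum()

    jet_pt = np.hypot(jpx, jpy)
    jet_phi = np.arctan2(jpy, jpx)
    jet_eta = np.arcsinh(jpz / jet_pt)          # pseudorapidity of the jet axis
    mass = np.sqrt(max(je**2 - jpx**2 - jpy**2 - jpz**2, 0.0))

    # Distance of each constituent from the jet axis, with dphi wrapped.
    dphi = (phi - jet_phi + np.pi) % (2.0 * np.pi) - np.pi
    dr = np.hypot(eta - jet_eta, dphi)

    return {
        "pt": jet_pt,
        "eta": jet_eta,
        "phi": jet_phi,
        "mass": mass,
        "n_charged": int(np.count_nonzero(charge)),
        "girth": np.sum(pt * dr) / jet_pt,
        "charge": np.sum(charge * pt**kappa) / jet_pt**kappa,
    }

# Two back-to-back-in-phi constituents of opposite charge around phi = 0.
obs = jet_observables(pt=[10.0, 10.0], eta=[0.0, 0.0], phi=[0.1, -0.1], charge=[1, -1])
print(obs)
```

The seven returned values correspond one-to-one to the seven classifier inputs listed above.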
Figure 1 of the paper shows that, even at the level of single‑variable distributions, quark‑ and gluon‑initiated jets differ: gluon jets have larger multiplicities, broader girth, and slightly higher masses, while the jet charge distributions overlap almost completely.
Machine‑learning classifiers
Two well‑established multivariate methods are employed:
- Multilayer Perceptron (MLP) – Implemented with TensorFlow/Keras, the network consists of seven input nodes (the observables above), a single hidden layer with eight ReLU neurons, and a sigmoid output node. The output score (0–1) is interpreted as the probability that a jet is quark‑like. A threshold of 0.5 defines the binary decision.
- Boosted Decision Tree (BDT) – Built with TMVA (ROOT), the ensemble contains 1000 trees of maximum depth five, trained with gradient boosting (learning rate 0.05) and bagging (80 % of the training sample per tree). The Gini index is used for node splitting, and cost‑complexity pruning prevents over‑training. The BDT score is similarly mapped to a quark‑/gluon decision.
Both classifiers are trained on the dijet sample and then applied unchanged to the γ + jet events, ensuring that any performance difference originates from the physics of the jets rather than from reconstruction artifacts.
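To make the size of the MLP concrete, the forward pass of a 7–8–1 network can be written in a few lines of NumPy. This is a sketch rather than the authors' TensorFlow/Keras model: the weights are random placeholders instead of fitted values, and the names `mlp_score` and `classify` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID = 7, 8   # seven observables, eight hidden ReLU neurons

# Placeholder parameters; in the study they would be fitted on the dijet sample.
W1 = rng.normal(scale=0.1, size=(N_IN, N_HID))
b1 = np.zeros(N_HID)
W2 = rng.normal(scale=0.1, size=(N_HID, 1))
b2 = np.zeros(1)

def mlp_score(x):
    """Quark-likeness score in (0, 1) for an array of shape (n_jets, 7)."""
    h = np.maximum(x @ W1 + b1, 0.0)             # hidden layer with ReLU
    z = h @ W2 + b2                              # single output node
    return (1.0 / (1.0 + np.exp(-z))).ravel()    # sigmoid

def classify(x, threshold=0.5):
    """Binary decision: score above threshold is called quark-like."""
    return np.where(mlp_score(x) > threshold, "quark", "gluon")

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)   # 7*8 + 8 + 8*1 + 1 trainable parameters

x = rng.normal(size=(5, N_IN))   # five fake, already-standardised jets
print(mlp_score(x), classify(x))
```

With only 73 trainable parameters the network is very small, which is consistent with the summary's point that model capacity is not what limits the separation.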
Performance evaluation
The authors present ROC curves for each single observable and for the two multivariate classifiers. The key findings are:
- The multiplicity (N_const) and girth (g) provide the strongest single‑variable discrimination, achieving area‑under‑curve (AUC) values around 0.70.
- Jet mass contributes modestly (AUC ≈ 0.60), while jet charge has essentially no discriminating power (AUC ≈ 0.51, barely above the 0.5 expected from random guessing).
- Combining all variables in either the BDT or the MLP raises the AUC by roughly 0.07–0.08 relative to the best single variable, demonstrating that correlations among observables carry additional information.
- The BDT and MLP performances are virtually identical across the full p_T range; the neural network does not outperform the tree‑based method, indicating that the discrimination ceiling is set by the physics rather than by model capacity.
Variable‑importance scores extracted from the BDT confirm the visual impression: N_const and girth dominate, followed by jet mass; p_T, η, φ are of secondary relevance, and jet charge receives negligible weight.
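For readers unfamiliar with the metric, the AUC quoted above equals the probability that a randomly chosen signal jet receives a higher score than a randomly chosen background jet, with ties counted as one half. A minimal sketch of that definition follows; the function name is hypothetical and the pairwise comparison is chosen for clarity, not speed.

```python
import numpy as np

def roc_auc(scores_signal, scores_background):
    """AUC as P(signal score > background score), with ties counted as 1/2.

    Uses an explicit pairwise comparison, which needs O(n_s * n_b) memory;
    that is fine for illustration but not for large samples.
    """
    s = np.asarray(scores_signal, dtype=float)
    b = np.asarray(scores_background, dtype=float)
    diff = s[:, None] - b[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

print(roc_auc([0.9, 0.8], [0.1, 0.2]))   # perfectly separated scores
print(roc_auc([0.8, 0.3], [0.5, 0.1]))   # three of four pairs ordered correctly
```

For a single observable that is larger for the background class (for example multiplicity, when quark jets are taken as signal), the raw value falls below 0.5, and the discriminating power is then 1 − AUC.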
p_T dependence
The authors study the AUC as a function of jet transverse momentum. At low p_T (≈ 50 GeV) the discrimination is modest; it improves rapidly up to ≈ 150 GeV where the AUC peaks (≈ 0.78). Beyond ≈ 200 GeV the curve flattens, suggesting that additional phase space for radiation does not translate into better separation. This saturation is interpreted as a manifestation of QCD dynamics: Sudakov form factors and colour coherence limit the amount of distinguishable soft‑wide‑angle radiation that can be captured by the chosen observables.
Physical interpretation and implications
The analysis confirms the textbook expectation that gluon jets, because of their larger colour factor, contain more soft particles and have a broader energy profile. Importantly, the study demonstrates that these differences survive the full chain of parton shower, hadronisation, and particle‑level jet reconstruction, at least in the Monte‑Carlo environment studied. The fact that a simple BDT can achieve essentially the same performance as a shallow neural network underscores the interpretability of the result: the discrimination is driven by a few physically motivated features rather than by opaque high‑dimensional representations.
From an experimental perspective, the findings provide a physics‑driven baseline for photon‑tagged jet studies. In heavy‑ion collisions, where one wishes to separate quark‑ and gluon‑dominated jet samples to study colour‑charge dependent energy loss, the same methodology could be applied, albeit with additional complications from background subtraction and medium‑induced modifications. The authors suggest that extending the study to other collision systems (p‑A, e‑A, RHIC, future EIC) and to real data will be valuable for establishing robust, process‑level jet tagging tools.
Conclusions
The paper delivers a clear message: jet substructure observables, especially multiplicity and girth, encode enough information to separate the underlying QCD Compton and quark‑antiquark annihilation mechanisms in γ + jet events on a statistical basis (peak AUC ≈ 0.78). The achievable separation is limited by intrinsic QCD radiation patterns rather than by the sophistication of the machine‑learning algorithm. This work therefore supplies a transparent, experimentally feasible framework for process‑level jet discrimination, which can serve as a benchmark for precision QCD measurements and for studies of jet quenching in the quark‑gluon plasma.