CNN on 'Top': In Search of Scalable & Lightweight Image-based Jet Taggers
While Transformer-based and standard Graph Neural Networks (GNNs) have proven to be the best performers in classifying different types of jets, they require substantial computational power. We explore the scope of using a lightweight and scalable version of the EfficientNet architecture, along with global features of the jet. The end product is computationally inexpensive yet capable of competitive performance. We showcase the efficacy of our network in tagging top-quark jets in a sea of light-quark and gluon jets. The work also sheds light on how global features both improve accuracy and expose apparent redundancy in the network's complexity.
💡 Research Summary
The paper addresses the growing need for fast and resource‑efficient jet tagging algorithms in the high‑luminosity phase of the Large Hadron Collider (LHC). While transformer‑based and graph neural network (GNN) models have set the benchmark for top‑quark jet classification, their large parameter counts and intensive message‑passing operations make them impractical for real‑time or large‑scale offline analyses on modest hardware. To overcome this limitation, the authors propose a lightweight convolutional neural network (CNN) built on a scaled‑down EfficientNet architecture, augmented with a set of global jet observables.
Dataset and preprocessing
Events are generated with Pythia 8 at √s = 14 TeV and passed through the Delphes detector simulation. Jets are reconstructed with the anti‑kT algorithm with radius R = 0.8, yielding one million top‑quark (signal) jets and one million light‑quark/gluon (background) jets in the transverse‑momentum range 550–650 GeV and |η| < 2. For each jet, up to 200 constituents are retained. The constituents are projected onto a 2‑D grid in the Δη–Δφ plane, producing three‑channel images (p_T, mass, energy) of size 35 × 35 or 40 × 40 pixels. The hardest constituent defines the image centre; the remaining particles are binned, standardized (mean subtracted and divided by the standard deviation), and centrally cropped to 28 × 28 or 32 × 32. This preprocessing preserves the jet’s core while reducing background noise.
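The binning-and-cropping step above can be sketched with NumPy. This is a minimal single-channel (p_T-weighted) illustration, not the authors' code: the grid span, the Δφ wrapping, and the function name `jet_image` are assumptions for illustration, and the paper additionally fills mass and energy channels.

```python
import numpy as np

def jet_image(eta, phi, pt, grid=35, span=1.0, crop=28):
    """Bin jet constituents into a 2-D image in the Δη–Δφ plane.

    Hypothetical sketch of the preprocessing described in the text:
    centre on the hardest constituent, histogram the rest, standardize,
    then centrally crop (35 × 35 → 28 × 28 here).
    """
    lead = np.argmax(pt)                      # hardest constituent defines the centre
    deta = eta - eta[lead]
    dphi = np.mod(phi - phi[lead] + np.pi, 2 * np.pi) - np.pi  # wrap Δφ into (-π, π]
    img, _, _ = np.histogram2d(
        deta, dphi, bins=grid,
        range=[[-span, span], [-span, span]], weights=pt)
    img = (img - img.mean()) / (img.std() + 1e-8)  # per-image standardization
    off = (grid - crop) // 2                       # central crop keeps the jet core
    return img[off:off + crop, off:off + crop]
```

Because the image is centred before cropping, the leading constituent always lands in the central pixel, and the crop discards only the sparse periphery.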
Global features
In addition to the images, the authors compute a suite of high‑level jet observables: four‑momentum (p_T, η, φ, m), constituent multiplicity, N‑subjettiness ratios (τ_21, τ_32, τ_43, τ_54, τ_65 for β = 0.5, 1.0, 2.0), and several Energy Correlation Function (ECF) series (C, D, U, M, N, L). These 30‑plus variables are calculated with FastJet and its contrib packages. Jets with fewer than four constituents are discarded, leaving essentially the full dataset for analysis.
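For intuition, the ECF series the paper computes with FastJet‑contrib can be written directly as sums over constituents; e.g. ECF(2, β) = Σ_{i<j} p_Ti p_Tj (ΔR_ij)^β, and the C‑series ratio C2 = ECF(3)·ECF(1)/ECF(2)². The brute‑force sketch below is an assumption‑laden illustration (function names `ecf` and `c2` are mine), workable for ≤ 200 constituents at small n, not a replacement for the optimized FastJet implementation.

```python
import numpy as np
from itertools import combinations

def ecf(n, beta, pt, eta, phi):
    """Energy Correlation Function ECF(n, β) as a direct O(M^n) sum."""
    def dR(i, j):
        dphi = np.mod(phi[i] - phi[j] + np.pi, 2 * np.pi) - np.pi
        return np.hypot(eta[i] - eta[j], dphi)
    total = 0.0
    for idx in combinations(range(len(pt)), n):
        term = float(np.prod(pt[list(idx)]))
        for i, j in combinations(idx, 2):      # angular factor over all pairs
            term *= dR(i, j) ** beta
        total += term
    return total

def c2(beta, pt, eta, phi):
    """C-series ratio C2 = ECF(3) * ECF(1) / ECF(2)^2."""
    e1, e2, e3 = (ecf(n, beta, pt, eta, phi) for n in (1, 2, 3))
    return e3 * e1 / e2 ** 2
```

The other ratio families (D, U, M, N, L) are built from the same ECF building blocks with different powers.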
Model architecture
LeNet‑5 serves as a baseline small‑CNN. The primary models are two EfficientNet variants: a “small” (EffNet‑S) and a “baseline” (EffNet‑B0) version. Both employ MBConv blocks with depth‑wise separable convolutions and squeeze‑and‑excitation (SE) modules, following the compound scaling principle (depth coefficient d = α^φ, width coefficient w = β^φ, resolution coefficient r = γ^φ). The resulting networks contain fewer than 0.5 M trainable parameters, far fewer than typical ImageNet‑trained EfficientNets. After the convolutional trunk, a fully‑connected (FC) head concatenates the image‑derived features with the global observables before producing a binary output.
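The compound-scaling rule can be made concrete in a few lines. The coefficients below are the published EfficientNet-B0 grid-search values (α = 1.2, β = 1.1, γ = 1.15), shown here only to illustrate the d = α^φ, w = β^φ, r = γ^φ relation; the paper's scaled-down variants need not use these exact numbers.

```python
# Compound scaling (EfficientNet): one exponent φ jointly scales
# depth, width, and resolution, with α * β**2 * γ**2 ≈ 2 so that
# each unit step in φ roughly doubles the FLOP cost.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # B0 reference values (assumed here)

def compound_scale(phi):
    d = ALPHA ** phi   # depth multiplier (layers per stage)
    w = BETA ** phi    # width multiplier (channels per layer)
    r = GAMMA ** phi   # resolution multiplier (input image side)
    return d, w, r
```

Setting φ = 0 recovers the base network; negative φ would shrink it, which is one way to read the paper's "small" variant.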
Training
Training is performed on a desktop equipped with a 13th‑generation Intel Core i9 CPU, 64 GB RAM, and an NVIDIA RTX A2000 (12 GB). The backend uses the Wolfram Language with Apache MXNet. The Adam optimizer with a cosine‑annealing learning‑rate schedule (initial LR = 1e‑3) is applied for 50 epochs with batch size 256. Data augmentation consists of random horizontal/vertical flips; no random cropping is used because jets are already centred.
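The cosine-annealing schedule mentioned above has a simple closed form. This is a generic sketch (the exact Wolfram/MXNet implementation may differ in details such as warm-up or restarts, which the summary does not mention):

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Cosine annealing: smoothly decay lr_max → lr_min over total_steps."""
    cos = 0.5 * (1 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos
```

With 50 epochs and the paper's initial LR of 1e‑3, the rate starts at 1e‑3, passes through 5e‑4 at the halfway point, and ends near zero.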
Results
Performance is evaluated using ROC‑AUC and background rejection at 1 % signal efficiency.
- LeNet‑5 (35 × 35) achieves AUC ≈ 0.970 and ≈ 85 % background rejection.
- EfficientNet‑S (image only) improves to AUC ≈ 0.981 and ≈ 91 % rejection.
- Adding the global features pushes AUC to ≈ 0.985 and background rejection to ≈ 94 % (35 × 35). The 40 × 40 configuration yields comparable numbers with a slight dip, attributed to the increased resolution without a proportional increase in depth.
When compared to state‑of‑the‑art transformer/GNN models (e.g., ParticleNet, PF‑Transformer), the proposed network’s AUC is within 1–2 % of the best results, while its inference time on the RTX A2000 is 0.8 ms (35 × 35) and 1.2 ms (40 × 40). On an older GTX 1080 Ti, the same models run 3–4× slower, highlighting the computational advantage.
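Both reported metrics can be computed directly from classifier scores. The sketch below is a pure-NumPy illustration under an assumed convention: since the paper quotes rejection as a percentage, rejection is taken here as 1 − ε_B at the working point (some papers instead quote 1/ε_B); the function name `roc_metrics` is mine.

```python
import numpy as np

def roc_metrics(scores_sig, scores_bkg, eff_sig=0.01):
    """AUC and background rejection at a fixed signal efficiency.

    Sketch only: rejection convention (1 - ε_B vs 1/ε_B) is assumed.
    """
    # score threshold giving the requested signal efficiency
    thr = np.quantile(scores_sig, 1 - eff_sig)
    eps_b = np.mean(scores_bkg >= thr)    # background mistag rate at that threshold
    rejection = 1.0 - eps_b
    # AUC as the rank statistic P(score_sig > score_bkg)
    auc = np.mean(scores_sig[:, None] > scores_bkg[None, :])
    return auc, rejection
```

The pairwise-comparison form of the AUC is O(N·M) in the two sample sizes; for millions of jets a sorted-rank implementation (or sklearn's `roc_auc_score`) is the practical choice.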
Discussion
The study demonstrates that (1) a modestly scaled EfficientNet can capture sufficient jet substructure information from low‑resolution images; (2) global high‑level observables provide complementary information that markedly boosts classification performance without increasing model size; and (3) the resulting architecture is fast enough for potential deployment in trigger‑level or large‑scale offline analyses where GPU resources are limited. Limitations include the focus on binary top‑vs‑non‑top classification and reliance on simulated data; future work should explore multi‑class tagging (W/Z, Higgs), domain adaptation to real detector data, and further model compression (quantization, pruning) for deployment on specialized hardware (FPGA, ASIC).
Conclusion
By integrating a lightweight EfficientNet CNN with a carefully selected set of global jet features, the authors achieve a competitive top‑quark jet tagger that is orders of magnitude cheaper computationally than current transformer‑based approaches. This work provides a practical pathway toward scalable, real‑time jet identification in upcoming high‑luminosity LHC runs and sets the stage for broader applications across particle‑physics analyses.