Learnable cut flow for high energy physics

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original paper on arXiv.

Neural networks have emerged as a powerful paradigm for tasks in high energy physics, yet their opaque training process renders them black boxes. In contrast, the traditional cut flow method offers simplicity and interpretability but requires extensive manual tuning to identify optimal cut boundaries. To merge the strengths of both approaches, we propose the Learnable Cut Flow (LCF), a neural network that transforms traditional cut selection into a fully differentiable, data-driven process. LCF implements two cut strategies: parallel, where observable distributions are treated independently, and sequential, where prior cuts shape subsequent ones. Both strategies flexibly determine optimal boundaries. Building on them, we introduce Learnable Importance, a metric that quantifies feature importance and adjusts each feature's contribution to the loss accordingly, offering model-driven insights rather than ad-hoc metrics. To ensure differentiability, a modified loss function replaces hard cuts with mask operations, preserving data shape throughout training. LCF is tested on six varied mock datasets and a realistic diboson vs. QCD dataset. Results demonstrate that LCF (1) accurately learns cut boundaries across typical feature distributions in both parallel and sequential strategies, (2) assigns higher importance to discriminative features with minimal overlap, (3) handles redundant or correlated features robustly, and (4) performs effectively in real-world scenarios. On the diboson dataset, LCF initially underperforms boosted decision trees and multilayer perceptrons when all observables are used, but the gap narrows once the learned importance is used to select features. LCF bridges the gap between the traditional cut flow method and modern black-box neural networks, delivering actionable insights into the training process and feature importance. Source code and experimental data are available at https://github.com/Star9daisy/learnable-cut-flow.


💡 Research Summary

The paper introduces Learnable Cut Flow (LCF), a neural‑network‑based framework that transforms the traditional cut‑based event selection used in high‑energy physics into a fully differentiable, data‑driven process while preserving interpretability. Conventional cut flows consist of manually chosen thresholds on individual observables; they are easy to understand but become unwieldy as the number of observables grows and as correlations between them become important. Modern machine‑learning classifiers (e.g., boosted decision trees, deep neural networks) achieve superior discrimination power but act as black boxes, offering little insight into which features drive the decision. LCF bridges this gap by parameterising each cut with trainable weights and biases and by learning a per‑feature importance weight that scales the input before the cut operation.

Learnable cuts replace the non‑differentiable Heaviside step function with a logistic sigmoid σL(z) = 1/(1 + e⁻ᶻ). For each observable j, a trainable weight wj and bias bj define a cut: ŷ = σL(wj x − bj). The sign of wj determines whether the cut is a lower bound (signal above it) or an upper bound (signal below it), and bj together with the inverse sigmoid of a probability threshold (typically 0.5) sets the actual cut location. The authors also handle double‑sided cuts (signal in the middle of a distribution, or outside two boundaries) by splitting the observable at a manually chosen centre and applying two independent sigmoids (lower and upper). Masks select the appropriate side of the distribution during loss computation, allowing the model to learn these middle and edge cases without changing tensor shapes.
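As a minimal NumPy sketch of this idea (the names `soft_cut` and `cut_boundary` are illustrative, not from the paper), the soft cut and the recovery of its boundary from w, b, and the probability threshold p can be written as:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_cut(x, w, b):
    """Differentiable surrogate for a hard cut: y_hat = sigmoid(w*x - b)."""
    return sigmoid(w * x - b)

def cut_boundary(w, b, p=0.5):
    """Solve sigmoid(w*x - b) = p for x, the learned cut location.
    log(p / (1 - p)) is the inverse sigmoid (logit) of the threshold."""
    return (b + np.log(p / (1.0 - p))) / w

# Example: w > 0 gives a lower-bound cut (signal above the boundary)
w, b = 2.0, 3.0
x_star = cut_boundary(w, b)       # logit(0.5) = 0, so x_star = b / w = 1.5
print(soft_cut(x_star, w, b))     # exactly 0.5 at the boundary
```

Flipping the sign of `w` turns the same expression into an upper-bound cut, which is why no separate cut "direction" parameter is needed.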

Learnable importance introduces a trainable scalar sj for each feature. After applying a softmax σS across all sj, the resulting importance scores s′j are multiplied by the raw observable values, yielding scaled inputs x′ij = s′j xij. These scaled inputs feed the learnable cuts. The gradient of the binary cross-entropy loss with respect to sj (derived in Eq. 2.34) shows that features whose prediction error, weighted by wj xij, exceeds the average contribution across all features receive a negative gradient, which under gradient descent increases their importance, while less useful features receive a positive gradient, decreasing theirs. Consequently, the model automatically emphasizes discriminative observables and suppresses redundant or poorly separating ones.
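A small sketch of the importance scaling under these definitions (the logit values and event matrix are illustrative, not taken from the paper):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical trainable logits s_j, one per feature
s = np.array([0.0, 1.0, -1.0])
s_prime = softmax(s)               # importance scores; sum to 1

# Scale each event's features by the learned importance before the cuts:
# x'_ij = s'_j * x_ij, broadcast over the event axis
X = np.array([[10.0, 0.3, 5.0],
              [12.0, 0.1, 4.0]])   # two events, three observables
X_scaled = X * s_prime
```

During training only the `s_prime` factors change; the raw observables are untouched, so the learned cut boundaries can still be mapped back to physical values by dividing out s′j.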

Two optimisation strategies are provided:

  • Parallel – each cut is trained independently on the original data distribution; the final event passes only if all cuts output a value above a global threshold. This mirrors the classic cut‑flow where cuts are applied simultaneously.
  • Sequential – cuts are applied one after another. After a cut, events that fail are masked out (their loss contribution set to zero) so that subsequent cuts are trained on the reduced dataset. This respects inter‑observable correlations and reproduces the traditional sequential cut‑flow without altering tensor dimensions.
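The loss bookkeeping behind the two strategies can be contrasted in a short sketch (plain NumPy; the soft-cut outputs and labels are illustrative fixed numbers, whereas the real model produces them from trainable w, b):

```python
import numpy as np

def bce(p, t, eps=1e-7):
    """Element-wise binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))

# Soft-cut outputs for 4 events x 2 observables, plus true labels
y = np.array([[0.9, 0.8],
              [0.9, 0.2],
              [0.3, 0.8],
              [0.2, 0.1]])
t = np.array([1.0, 0.0, 0.0, 0.0])
threshold = 0.5

# Parallel: every cut's loss is computed on the full dataset.
loss_parallel = bce(y, t[:, None]).mean()

# Sequential: events failing an earlier cut have their later loss terms
# zeroed by a mask, so the tensor shape never changes.
mask = np.ones(len(y))
total, n_terms = 0.0, 0.0
for j in range(y.shape[1]):
    per_event = bce(y[:, j], t) * mask   # masked-out events contribute 0
    total += per_event.sum()
    n_terms += mask.sum()
    mask = mask * (y[:, j] > threshold)  # shrink the surviving set
loss_sequential = total / n_terms
```

With fixed outputs the surviving events are the same either way; the difference matters during training, where each later cut's gradients come only from events that passed the earlier ones.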

The architecture normalises inputs, splits them along the feature axis, passes each slice through its own learnable‑cut module, and finally concatenates the binary outputs. During inference, a default importance threshold of 0.05 (i.e., one‑twentieth of the uniform baseline) discards cuts whose learned importance falls below this value, automatically pruning negligible observables.
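The pruning step described above amounts to a simple threshold comparison (the importance scores below are illustrative):

```python
import numpy as np

# Learned importance scores after the softmax (sum to 1)
importance = np.array([0.40, 0.30, 0.25, 0.03, 0.02])
threshold = 0.05  # default pruning threshold quoted above

keep = importance >= threshold       # cuts below threshold are discarded
kept_indices = np.flatnonzero(keep)
print(kept_indices)                  # [0 1 2]
```

Only the surviving cuts are applied at inference time, which is what turns the trained model back into a short, human-readable cut flow.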

Experimental validation is performed on six synthetic datasets designed to cover a variety of distribution shapes (single‑peak, double‑peak, overlapping, central, edge, and highly correlated features) and on a realistic diboson‑vs‑QCD dataset. Results show that LCF accurately recovers the optimal cut positions for both parallel and sequential strategies, assigns higher importance scores to features with minimal signal‑background overlap, and remains robust when redundant or highly correlated features are present. In the diboson case, using all observables LCF’s area‑under‑curve is modestly lower (by ~3–5 %) than that of boosted decision trees and multilayer perceptrons. However, when the learned importance is used to select the top 5–7 features, the performance gap narrows dramatically, demonstrating that LCF provides valuable guidance for feature selection while retaining a transparent cut‑based decision rule.

Limitations and future directions include the reliance on 1‑D cuts (which may not capture complex non‑linear decision boundaries), sensitivity to initialisation of w, b, and sj, and potential computational overhead from many mask operations in the sequential mode. The authors suggest extending the framework with multi‑cut or hierarchical structures, incorporating discrete selection mechanisms such as Gumbel‑Softmax, and exploring deeper networks to model more intricate correlations.

In summary, Learnable Cut Flow offers a novel, interpretable alternative to black‑box classifiers: it automates the discovery of optimal cut thresholds, quantifies feature importance in a principled way, and can be integrated into existing analysis pipelines with minimal disruption. While its raw classification power may lag behind state‑of‑the‑art machine‑learning models, its transparency and built‑in diagnostic capabilities make it a promising tool for high‑energy physics analyses where physical insight and reproducibility are paramount.

