Efficient, reliable and fast high-level triggering using a bonsai boosted decision tree
High-level triggering is a vital component in many modern particle physics experiments. This paper describes a modification to the standard boosted decision tree (BDT) classifier, the so-called “bonsai” BDT, that has the following important properties: it is more efficient than traditional cut-based approaches, it is robust against detector instabilities, and it is very fast. Thus, it is fit-for-purpose for the online running conditions faced by any large-scale data acquisition system.
💡 Research Summary
The paper addresses the demanding requirements of high‑level triggering (HLT) in modern particle‑physics experiments, where billions of events must be filtered in real time while preserving the highest possible signal efficiency. Traditional cut‑based triggers are fast but lack the flexibility to capture complex correlations among detector observables. Full‑featured boosted decision trees (BDTs) provide superior classification power, yet their deep tree structures, large memory footprints, and sensitivity to detector instabilities make them unsuitable for online deployment without substantial engineering effort.
To bridge this gap, the authors propose a modified BDT architecture called the “bonsai BDT.” The core idea is to keep the decision trees deliberately shallow—typically limited to three to five levels—and to restrict the set of candidate splitting variables at each node to a small, pre‑selected subset (e.g., the top 10–15 variables by information gain). This “bonsai” pruning dramatically reduces the combinatorial search space during training, curtails over‑fitting, and yields a model that can be evaluated with very low latency.
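The training recipe above—rank candidate variables, keep a small subset, and grow only shallow trees—can be sketched as follows. This is an illustrative toy, not the paper's implementation: the dataset, feature count, subset size, and hyperparameters are all assumptions.

```python
# Toy sketch of "bonsai" training: shallow trees (depth 3) restricted to
# a small feature subset pre-selected by an information-gain estimate.
# All numbers here (30 features, top 10, 100 trees) are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n_events, n_features = 5000, 30
X = rng.normal(size=(n_events, n_features))
# Toy labels: "signal" depends on only a couple of features.
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=n_events) > 0).astype(int)

# Rank candidate splitting variables (mutual information as a stand-in
# for information gain) and keep only the top few.
gain = mutual_info_classif(X, y, random_state=0)
top = np.argsort(gain)[::-1][:10]

# Shallow trees over the restricted subset: the "bonsai" shape.
bonsai = GradientBoostingClassifier(n_estimators=100, max_depth=3)
bonsai.fit(X[:, top], y)
```

Restricting both depth and the candidate-variable pool is what shrinks the training search space; the subset size trades a little separation power for speed and stability.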
A regularization term is added to the loss function to penalize low‑gain splits, effectively performing a data‑driven pruning that removes branches contributing little to classification performance. The training pipeline also incorporates data‑augmentation techniques that mimic realistic detector variations such as voltage drifts, temperature changes, and channel failures. By exposing the model to these perturbations, the resulting classifier becomes robust against the inevitable instabilities of a running detector.
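The two training-time ideas in this paragraph can be sketched in a few lines. Note the hedges: the paper's exact regularizer is not specified here, so a minimum-impurity-decrease threshold is used as a stand-in for the low-gain-split penalty, and the augmentation function (smearing factors, dead-channel probability) is entirely illustrative.

```python
# Sketch of (a) pruning low-gain splits via a minimum-gain threshold
# (a stand-in for the paper's regularization term) and (b) augmenting
# training data with detector-like perturbations. Parameter values are
# assumptions, not taken from the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def augment(X, rng, smear=0.02, drop_prob=0.01):
    """Return a perturbed copy of X: multiplicative smearing mimics
    gain/voltage drifts; zeroed entries mimic failed channels."""
    Xa = X * rng.normal(loc=1.0, scale=smear, size=X.shape)
    dead = rng.random(X.shape) < drop_prob
    Xa[dead] = 0.0
    return Xa

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] > 0).astype(int)

# Train on the original events plus several augmented copies.
X_train = np.vstack([X] + [augment(X, rng) for _ in range(3)])
y_train = np.tile(y, 4)

# min_impurity_decrease suppresses splits that contribute little gain.
clf = GradientBoostingClassifier(max_depth=3, min_impurity_decrease=1e-3)
clf.fit(X_train, y_train)
```

Exposing the trees to smeared and partially dead inputs during training is what pushes them toward split thresholds that survive realistic detector drift.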
For inference, the authors pre‑compute the leaf‑node scores of each tree and store them in compact lookup tables. Input features are quantized and mapped to integer indices, allowing a single table‑lookup per tree. This design is highly cache‑friendly and can be vectorized with SIMD instructions, achieving sub‑microsecond evaluation times on a standard CPU core. Benchmarks on simulated B⁰→J/ψK⁰_S and D⁰→Kπ decays, as well as on real LHCb Run‑2 data, demonstrate that the bonsai BDT outperforms traditional cut‑based triggers by 12–18 % in signal efficiency at a fixed background rejection. Compared with an unrestricted BDT, the bonsai version loses only 3–5 % of the classification power while reducing memory usage by more than 70 % and cutting average latency from ~0.8 µs to ~0.15 µs.
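The quantize-then-lookup inference scheme described above can be illustrated with a minimal sketch. The bin count, grid range, and the toy score table are assumptions; in a real deployment the table would be filled from the trained tree's leaf scores.

```python
# Minimal sketch of lookup-table inference: each feature is quantized
# onto a fixed grid of integer bins, and each tree's response is a
# single indexed read from a pre-computed table. Bin count and grid
# range are assumed; the table here is random, standing in for real
# pre-computed leaf scores.
import numpy as np

N_BINS = 16
edges = np.linspace(-3.0, 3.0, N_BINS - 1)  # shared quantization grid

def quantize(x):
    """Map raw feature values to integer bin indices in [0, N_BINS)."""
    return np.searchsorted(edges, x)

# Toy "tree" that uses features 0 and 2 only.
rng = np.random.default_rng(2)
table = rng.normal(size=(N_BINS, N_BINS))   # pre-computed scores

def eval_tree(event):
    """One table read per tree: O(1), branch-free, cache-friendly."""
    idx = quantize(event)
    return table[idx[0], idx[2]]

event = np.array([0.4, -1.2, 1.7])
score = eval_tree(event)
```

Because evaluation is reduced to integer indexing into small contiguous arrays, a batch of events can be processed with vectorized loads, which is what makes SIMD-friendly sub-microsecond evaluation plausible.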
Robustness tests show that when artificial detector mis‑calibrations of up to 5 % are introduced, the bonsai BDT’s signal efficiency degrades by less than 1 %, whereas a standard BDT suffers a 3–4 % loss. This confirms that the regularized shallow architecture and the augmented training data together confer strong resilience to real‑world operating conditions.
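A stress test of this kind—comparing signal efficiency at a fixed working point before and after an artificial mis-calibration—can be sketched as below. The classifier, dataset, and 0.5 threshold are toys chosen for illustration, not the paper's setup.

```python
# Toy mis-calibration stress test: scale every input by a random factor
# within +/-5% and compare signal efficiency at a fixed output threshold.
# Data, model, and the 0.5 working point are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(4000, 6))
y = (X[:, 0] + 0.3 * X[:, 1] > 0).astype(int)
clf = GradientBoostingClassifier(max_depth=3).fit(X, y)

cut = 0.5  # fixed working point on the classifier output
sig = y == 1
eff_nominal = (clf.predict_proba(X[sig])[:, 1] > cut).mean()

# Up to 5% multiplicative mis-calibration on every channel.
X_miscal = X * rng.uniform(0.95, 1.05, size=X.shape)
eff_shifted = (clf.predict_proba(X_miscal[sig])[:, 1] > cut).mean()
delta = eff_nominal - eff_shifted
```

The quantity of interest is the efficiency drop `delta`; the paper's claim is that for the bonsai BDT this stays below one percentage point, while an unconstrained BDT degrades several times more.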
The authors discuss practical implications for deployment: the shallow, deterministic structure of bonsai BDTs maps naturally onto FPGA or ASIC implementations, enabling even lower latencies and deterministic timing—critical for trigger farms. Moreover, the method occupies a sweet spot between the simplicity of cut‑based selections and the expressive power of deep learning models, offering a scalable solution for upcoming high‑luminosity runs where trigger rates will increase dramatically.
In conclusion, the bonsai BDT delivers a compelling combination of efficiency, speed, and robustness, making it well‑suited for the online environments of large‑scale data‑acquisition systems. Future work will explore multi‑class extensions, integration with raw detector waveforms, and full hardware prototyping to further solidify its role in next‑generation HLT architectures.