Transformer-Based Pulse Shape Discrimination in HPGe Detectors with Masked Autoencoder Pre-training

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Pulse-shape discrimination (PSD) in high-purity germanium (HPGe) detectors is central to rare-event searches such as neutrinoless double-beta decay (0νββ), yet conventional approaches compress each waveform into a small set of summary parameters, potentially discarding information in the full time series that is relevant for classification. We benchmark transformer-based models that operate directly on digitised waveforms using the Majorana Demonstrator AI/ML data release. Models are trained to reproduce the collaboration-provided accept/reject labels for four standard PSD cuts and to regress calibrated energy. We compare supervised training from scratch, masked autoencoder (MAE) self-supervised pre-training followed by fine-tuning, and a feature-based gradient-boosted decision tree (GBDT) baseline. Transformers outperform GBDT across all PSD targets, with the largest gains on the most challenging labels and on the combined PSD-pass definition. MAE pre-training improves sample efficiency, reducing labelled-data requirements by factors of 2–4 in low-label regimes. For energy regression, both transformer variants show a small common underestimation on the test split, while fine-tuning modestly narrows the residual distribution. These results motivate follow-up studies of robustness across detectors and operating conditions and of performance near Qββ.


💡 Research Summary

This paper addresses pulse‑shape discrimination (PSD) in high‑purity germanium (HPGe) detectors, a critical component of neutrinoless double‑beta‑decay (0νββ) searches. Traditional PSD in the Majorana Demonstrator relies on a handful of engineered scalar quantities (A‑vs‑E, LQ, DCR, etc.) that compress each waveform into a few summary parameters. While robust, this compression discards potentially discriminative information contained in the full time‑series. The authors therefore explore end‑to‑end deep‑learning approaches that operate directly on the digitised charge waveforms (3 800 samples per event) and their first‑order derivative (a current proxy).
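To make the "current proxy" concrete: the first-order derivative of a step-like charge waveform peaks where the charge rises fastest, which is the quantity amplitude-based parameters such as A-vs-E exploit. A minimal NumPy illustration with a synthetic sigmoid pulse (the pulse shape and timescales here are stand-ins, not detector data):

```python
import numpy as np

# A toy charge pulse: sigmoid rise loosely mimicking an HPGe charge waveform
# sampled 3800 times, with the rise centred at sample 1900 (synthetic values).
t = np.arange(3800)
charge = 1.0 / (1.0 + np.exp(-(t - 1900) / 50.0))  # normalised charge
current = np.gradient(charge)                       # first-order derivative

# The current proxy peaks where the charge rises fastest.
peak = int(np.argmax(current))
print(peak)
```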

The core model is a transformer encoder adapted for waveform data. The raw waveform and its gradient are first windowed into non‑overlapping segments of ten samples (≈100 ns). Each segment is linearly projected into a shared 64‑dimensional embedding space, followed by layer normalisation. No explicit positional encoding is used; the self‑attention mechanism alone captures both local pulse features (rise time, peak shape) and long‑range dependencies (tail behaviour, delayed charge). The same architecture is trained under two regimes: (1) supervised learning from labelled data, and (2) a two‑stage pipeline where a masked autoencoder (MAE) is first trained on a large set of unlabelled calibration waveforms, then fine‑tuned on the labelled PSD and energy regression tasks. In the MAE stage, a random fraction of the input tokens is masked and the model learns to reconstruct them, forcing the encoder to learn a compact representation of the underlying waveform distribution.
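The tokenisation and MAE masking steps described above can be sketched as follows. The dimensions (3800 samples, windows of 10, 64-dimensional embeddings) follow the text; the random projection matrix, the 50% masking ratio, and the cumulative-noise waveform are illustrative stand-ins, not the paper's trained weights or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters from the text; MASK_FRACTION is an assumed value.
N_SAMPLES, WINDOW, D_MODEL = 3800, 10, 64
MASK_FRACTION = 0.5

def tokenize(waveform, proj):
    """Window the waveform and its derivative, project to d_model, LayerNorm."""
    grad = np.gradient(waveform)                     # current proxy channel
    channels = np.stack([waveform, grad], axis=0)    # (2, 3800)
    n_tokens = N_SAMPLES // WINDOW                   # 380 tokens
    patches = channels.reshape(2, n_tokens, WINDOW)  # (2, 380, 10)
    patches = patches.transpose(1, 0, 2).reshape(n_tokens, 2 * WINDOW)
    tokens = patches @ proj                          # (380, 64)
    # layer normalisation over the embedding dimension
    mu = tokens.mean(-1, keepdims=True)
    sigma = tokens.std(-1, keepdims=True)
    return (tokens - mu) / (sigma + 1e-6)

def mae_mask(tokens, mask_fraction, rng):
    """Randomly mask a fraction of tokens; the encoder sees only the rest."""
    n = tokens.shape[0]
    masked = rng.permutation(n)[: int(mask_fraction * n)]
    mask = np.zeros(n, dtype=bool)
    mask[masked] = True
    return tokens[~mask], mask

proj = rng.normal(size=(2 * WINDOW, D_MODEL)) / np.sqrt(2 * WINDOW)
wave = np.cumsum(rng.normal(size=N_SAMPLES))         # stand-in waveform
tokens = tokenize(wave, proj)
visible, mask = mae_mask(tokens, MASK_FRACTION, rng)
print(tokens.shape, visible.shape)
```

During MAE pre-training the decoder would then reconstruct the masked tokens from the visible ones; during fine-tuning the full token sequence is fed to the transformer encoder.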

Experiments use the Majorana Demonstrator AI/ML data release, comprising 1.04 million training waveforms and 0.39 million test waveforms. Four binary PSD targets are provided: low‑side A‑vs‑E, high‑side A‑vs‑E, LQ (late‑charge), and DCR (delayed‑charge recovery). Energy regression is also evaluated. As a baseline, a feature‑based gradient‑boosted decision tree (GBDT) trained on the conventional scalar PSD parameters is employed.
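A feature-based baseline in this spirit is easy to reproduce in outline: a gradient-boosted classifier trained on a handful of engineered scalar parameters. The sketch below uses scikit-learn with synthetic stand-in features and labels (the feature names only echo the PSD parameters; this is not the collaboration's data or tuning):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(7)
n = 2000

# Synthetic scalar features loosely named after the PSD parameters.
avse = rng.normal(size=n)
lq = rng.normal(size=n)
dcr = rng.normal(size=n)
X = np.column_stack([avse, lq, dcr])

# Synthetic accept/reject label correlated with the features.
y = (avse + 0.5 * lq - 0.3 * dcr + 0.3 * rng.normal(size=n) > 0).astype(int)

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
clf.fit(X[:1500], y[:1500])
acc = clf.score(X[1500:], y[1500:])
print(f"held-out accuracy: {acc:.3f}")
```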

Results show that transformer models consistently outperform the GBDT across all PSD targets. The most pronounced gains appear for the high‑side A‑vs‑E cut and for the combined “PSD‑pass” definition, which aggregates all four cuts. In low‑label regimes (simulated by subsampling the labelled set to as little as 2 % of the full training data), MAE pre‑training reduces the amount of labelled data needed to reach a given accuracy by factors of 2–4, demonstrating substantial sample‑efficiency benefits. Energy regression exhibits a modest systematic under‑estimation for both transformer variants; however, fine‑tuning after MAE pre‑training narrows the residual distribution relative to training from scratch, indicating that the self‑supervised stage learns useful calibration‑related features.
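The low-label experiments amount to drawing ever-smaller labelled subsets and retraining. A class-stratified subsampling helper in this spirit might look as follows (illustrative only; the paper's exact subsampling protocol is not specified here, and the labels are synthetic):

```python
import numpy as np

rng = np.random.default_rng(42)

def subsample_labels(labels, fraction, rng):
    """Draw a class-stratified subset of labelled indices."""
    keep = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        n_keep = max(1, int(round(fraction * idx.size)))
        keep.append(rng.choice(idx, size=n_keep, replace=False))
    return np.sort(np.concatenate(keep))

# Stand-in binary PSD labels for 1000 events (~80% accept / 20% reject).
labels = (rng.random(1000) < 0.2).astype(int)

sizes = {}
for frac in (0.02, 0.10, 1.00):
    subset = subsample_labels(labels, frac, rng)
    sizes[frac] = subset.size
    print(frac, subset.size)
```

Each subset would then be used to fine-tune the MAE-pre-trained encoder and to train the from-scratch transformer and GBDT, so that all models see the same labelled events at each fraction.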

The authors also discuss practical considerations: detector‑conditioned embeddings allow the model to adapt to variations in geometry, impurity profiles, and operating conditions across different HPGe units. They note that while the current study focuses on calibration‑type waveforms, future work should test robustness on physics‑run data, explore domain‑shift mitigation, and assess performance near the Qββ region where background rejection is most critical.

In summary, this work demonstrates that transformer‑based architectures, especially when combined with masked‑autoencoder self‑supervision, provide a powerful and data‑efficient alternative to traditional feature‑engineered PSD in HPGe detectors. The approach captures the full temporal richness of the waveforms, reduces the dependence on large labelled datasets, and yields measurable improvements in background discrimination—key advantages for upcoming LEGEND‑200 analyses and the future LEGEND‑1000 ton‑scale experiment.

