Contrastive Metric Learning for Point Cloud Segmentation in Highly Granular Detectors

Max Marriott-Clarke,1 Lazar Novakovic,2 Elizabeth Ratzer,1 Robert J. Bainbridge,1 Loukas Gouskos,2,3 and Benedikt Maier1

1 Blackett Laboratory, Imperial College London, UK
2 Department of Physics & Astronomy, Brown University, USA
3 Brown Center for Theoretical Physics and Innovation (BCTPI), Brown University, USA

(Dated: March 25, 2026)

We propose a novel clustering approach for point-cloud segmentation based on supervised contrastive metric learning (CML). Rather than predicting cluster assignments or object-centric variables, the method learns a latent representation in which points belonging to the same object are embedded nearby while unrelated points are separated. Clusters are then reconstructed using a density-based readout in the learned metric space, decoupling representation learning from cluster formation and enabling flexible inference. The approach is evaluated on simulated data from a highly granular calorimeter, where the task is to separate highly overlapping particle showers represented as sets of calorimeter hits. A direct comparison with object condensation (OC) is performed using identical graph neural network backbones and equal latent dimensionality, isolating the effect of the learning objective. The CML method produces a more stable and separable embedding geometry for both electromagnetic and hadronic particle showers, leading to improved local neighbourhood consistency, a more reliable separation of overlapping showers, and better generalization when extrapolating to unseen multiplicities and energies. This translates directly into higher reconstruction efficiency and purity, particularly in high-multiplicity regimes, as well as improved energy resolution.
In mixed-particle environments, CML maintains strong performance, suggesting robust learning of the shower topology, while OC exhibits significant degradation. These results demonstrate that similarity-based representation learning combined with density-based aggregation is a promising alternative to object-centric approaches for point cloud segmentation in highly granular detectors.

I. INTRODUCTION

Modern particle detectors increasingly rely on high-granularity sensor technologies that provide detailed spatial and timing measurements of energy deposits. The resulting data naturally form point clouds with irregular geometry and variable size. A central reconstruction task is the segmentation of these point clouds into groups of measurements originating from individual particles. In high-granularity calorimeters such as the CMS High Granularity Calorimeter (HGCAL) [1], this task is particularly challenging due to the frequent spatial and energetic overlap of particle showers, requiring algorithms capable of resolving complex and highly correlated hit patterns.

Graph neural networks (GNNs) have emerged as a powerful framework for point-cloud reconstruction in particle physics [2-4]. A widely used learned clustering approach built on this framework is object condensation (OC) [5], in which the network predicts object-centric latent variables that guide cluster formation. By integrating clustering into the training objective, OC achieves strong performance across a range of reconstruction tasks [5-8]. However, this formulation tightly couples representation learning to a specific clustering procedure. In dense environments, where multiple nearby showers compete for representative points, this coupling can lead to ambiguities in both the learned clustering coordinates and the assignment of hits to objects.

In this work, we propose an alternative clustering paradigm based on contrastive metric learning (CML) [9].
Rather than predicting object-level variables, the method learns a latent embedding in which hits from the same particle shower are placed nearby while hits from different showers are separated. Clustering is performed only after training and acts as a readout of the learned representation. This decoupling allows the embedding geometry to be optimized directly for pairwise compatibility, without imposing constraints associated with a particular clustering mechanism.

A key advantage of this formulation is that the contrastive objective depends only on relative relationships between hits, rather than absolute object-level properties [9]. In object-centric approaches, the network must learn representative points and clustering scales that are implicitly tied to the detailed morphology of particle showers. As a result, mis-modelling in simulation can propagate directly into the learned clustering variables. In contrast, CML requires only that hits from the same shower be more similar than those from different showers, making the learned representation less sensitive to variations in shower shape, energy response, and event composition.

To extract clusters from the learned embedding, we introduce a density-based readout procedure tailored to metric spaces. This method identifies representative points from local neighbourhood structure and forms clusters without requiring explicit object-level predictions, ensuring consistency with the geometry induced by the contrastive objective.

The proposed approach is evaluated on simulated electromagnetic (EM) and hadronic (HAD) showers in a detector model inspired by the CMS HGCAL. A direct comparison with OC is performed using identical GNN backbones and matched embedding dimensionality to isolate the impact of the learning objective.
Representation quality is assessed using embedding-geometry metrics, while reconstruction performance is evaluated using physics-based observables including efficiency, purity, and energy resolution.

We find that CML produces a more structured embedding geometry, with improved separability particularly in the tails of the distance distributions where reconstruction failures typically occur. This leads to significant improvements in clustering performance in high-multiplicity environments with strong shower overlap. These results demonstrate that learning a similarity-based representation provides a robust alternative to object-centric clustering methods for high-granularity detectors and, more generally, for dense point-cloud segmentation problems.

II. METHODS

A. Model Architecture

Both learning objectives employ an identical GNN backbone to permit a direct comparison. Differences between methods are restricted to the task-specific output heads required by the respective loss functions.

Each calorimeter hit is represented by a five-dimensional input feature vector (x, y, z, E, L), where (x, y, z) denotes the hit position, E is the deposited energy, and L is the detector layer index. Further details of the dataset and input representation are given in Section III. Events are treated as variable-size point clouds with no graph connectivity provided as input. Truth shower identifiers are stored separately and are used only for defining the learning objectives and evaluation metrics. No additional handcrafted high-level features are introduced, and we find that explicit feature normalisation has negligible impact on performance.

The features are first projected into a learned latent space using a two-layer multilayer perceptron (MLP) with hidden dimension 64 and exponential linear unit (ELU) activations, producing a 64-dimensional representation.
The encoded hit features are subsequently processed by three DynamicEdgeConv layers [10], each constructed using a k-nearest-neighbour graph (k = 24) in the current latent feature space with Euclidean distance. A learnable edge function is applied and aggregated using the permutation-invariant "max" operation. Because the graph connectivity is recomputed after every DynamicEdgeConv layer, the neighbourhood structure evolves during training, allowing the network to jointly learn feature representations and a data-driven similarity metric between hits. Constructing the graph in latent feature space reduces sensitivity to the geometric constraints imposed by the detector layout and enables interactions between hits across layers, which is important for capturing the longitudinal development of calorimeter showers.

After the shared DynamicEdgeConv backbone, the network separates into task-specific output heads. In both cases, features are processed by a two-layer MLP (64 → 32) with ELU activations and dropout (p = 0.1), applied identically across both heads. For the contrastive objective, a final linear layer projects the 32-dimensional features to a 16-dimensional embedding used for metric learning. For object condensation, two parallel linear heads are applied to the same 32-dimensional features, predicting a per-hit condensation score β_i ∈ (0, 1) and 16-dimensional clustering coordinates c_i. The embedding dimensionality is matched between the two methods to ensure a comparable latent space. Both methods were also evaluated with a reduced four-dimensional embedding; the corresponding results are presented in Section V.
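The dynamic graph construction can be illustrated with a minimal NumPy sketch of a single DynamicEdgeConv-style layer. This is not the paper's PyTorch implementation: the learned MLP edge function is replaced here by a fixed random linear map purely for illustration, and all names are our own.

```python
import numpy as np

def dynamic_edge_conv(feats, k, edge_fn):
    """One DynamicEdgeConv-style layer: build a kNN graph in the
    current feature space, apply an edge function to (x_i, x_j - x_i)
    pairs, and max-aggregate over the k neighbours of each point."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)             # exclude self-loops
    knn = np.argsort(d2, axis=1)[:, :k]      # k nearest hits per hit
    xi = np.repeat(feats[:, None, :], k, axis=1)
    xj = feats[knn]                          # neighbour features, (n, k, F)
    edges = edge_fn(np.concatenate([xi, xj - xi], axis=-1))
    return edges.max(axis=1)                 # permutation-invariant "max"

rng = np.random.default_rng(0)
hits = rng.normal(size=(100, 64))            # 100 encoded hits, 64-dim
W = rng.normal(size=(64, 128)) * 0.1         # stand-in for the learned edge MLP
out = dynamic_edge_conv(hits, k=24, edge_fn=lambda e: e @ W.T)
print(out.shape)                             # (100, 64)
```

Because the output of one such layer feeds the distance computation of the next, the neighbourhood structure changes as the features change, which is the mechanism the text describes.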
Since all backbone layers (the input MLP, the three DynamicEdgeConv layers, and the shared output MLP), activation functions, and optimization hyperparameters are shared, differences in embedding structure and reconstruction performance between the two methods can be primarily attributed to the choice of learning objective.

B. Contrastive Metric Learning

CML is a class of methods [11-13] that learn a representation space in which samples sharing the same semantic identity are embedded nearby, while unrelated samples are separated. Rather than predicting discrete labels or cluster assignments, the objective directly shapes the geometry of the latent space, making the embedding itself the primary target of optimization. This paradigm is well suited to point-cloud segmentation in high-granularity detectors, where reconstruction depends on identifying compatible sets of hits rather than assigning them to predefined object templates. Such objectives have been widely used in representation learning and clustering tasks [12, 14, 15], where downstream performance depends on the separability of learned features.

We apply supervised contrastive learning at the hit level within individual events. For an event containing N hits with embeddings {z_i} and corresponding shower labels {y_i}, positive pairs are defined as hits originating from the same simulated particle shower, while negative pairs correspond to hits from different showers within the same event. Comparisons across events are excluded since separate events do not share physically meaningful relationships.

The embeddings are ℓ2-normalized and compared using cosine similarity s_ij = z_i·z_j, constraining all representations to lie on the unit hypersphere.
This encourages hits from the same shower to form compact regions in angular space while separating hits from different showers, thereby learning a discriminative similarity metric without requiring explicit cluster centres or assignment variables. This formulation is advantageous in calorimeter reconstruction, where showers are spatially extended and frequently overlap, making object boundaries inherently ambiguous. Since the DynamicEdgeConv graph is constructed in the current latent feature space at each layer, the neighbourhood structure used for message passing evolves alongside the learned representation. For CML, this creates a coupling between the contrastive objective and the graph topology: as the embedding learns to place shower-compatible hits nearby, the graph increasingly connects hits from the same shower, reinforcing the representation. This feedback between representation learning and graph construction may contribute to the stability of the CML embeddings observed in Section V.

The loss is given by the supervised contrastive (SupCon) objective [13]:

\[
\mathcal{L}_{\mathrm{SupCon}} = -\frac{1}{|\mathcal{A}|} \sum_{i \in \mathcal{A}} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp\!\left(z_i^\top z_p / \tau\right)}{\sum_{j \neq i} \exp\!\left(z_i^\top z_j / \tau\right)}, \tag{1}
\]

where each hit i acts as an anchor (i.e., a reference point against which all other hits in the event are compared), A denotes the set of anchors with at least one positive sample, P(i) is the set of positive hits for anchor i (i.e., hits sharing the same shower label), and τ is a temperature parameter that controls the concentration of the embedding distribution on the hypersphere. Smaller τ sharpens the similarity distribution and encourages tighter clusters. In this work we use τ = 0.1.

The loss aggregates over all positive and negative hits within each event rather than sampling pairs.
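As a concrete reference, Eq. (1) can be implemented in a few lines of NumPy. This is a simplified sketch of the per-event loss, not the paper's training code; the function name and toy data are ours.

```python
import numpy as np

def supcon_loss(z, y, tau=0.1):
    """Supervised contrastive loss of Eq. (1) over one event.
    z: (N, D) hit embeddings, y: (N,) truth shower labels."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # l2-normalize
    sim = z @ z.T / tau                                  # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                       # exclude j = i from the sum
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    anchors = pos.any(axis=1)                            # anchors with >= 1 positive
    per_anchor = (np.where(pos, log_prob, 0.0).sum(axis=1)[anchors]
                  / pos.sum(axis=1)[anchors])
    return -per_anchor.mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(50, 16))
y = rng.integers(0, 3, size=50)
print(supcon_loss(z, y))   # scalar; decreases as same-shower hits align
```

Note that the normalising sum in the denominator runs over every other hit in the event, which is why no explicit pair sampling is needed.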
Calorimeter events typically contain many weak negatives and relatively few informative hard negatives, such that random sampling often yields low-information pairs. Aggregating over all pairs ensures that informative relationships consistently contribute to the optimization. As shown in the SupCon formulation [13], the gradient naturally emphasises hard positives and hard negatives, providing implicit hard-pair mining. Although contrastive methods often rely on large batch sizes to obtain sufficient negatives, the large number of hits within each event naturally provides a rich negative set, allowing each event to act as an effective training batch.

C. Object Condensation Baseline

OC [5] is a learned clustering method that reconstructs particle candidates from detector hits using per-hit latent variables. The approach has been applied across a range of high-energy physics tasks [5, 7, 8, 16], and is adopted here as a baseline.

For each hit i, the network predicts clustering coordinates c_i ∈ R^d and a condensation score β_i ∈ (0, 1). The coordinates define a latent clustering space, while β_i identifies representative hits. Hits with large β act as condensation points around which compatible hits are grouped. In this work, the clustering space dimension is d = 16, matching the contrastive embedding dimension to ensure a comparable latent space.

The condensation score β is mapped to a positive charge

\[
q_i = \operatorname{arctanh}^2(\beta_i) + q_{\min}, \tag{2}
\]

which determines the interaction strength in the clustering space. For each object k, the hit with the largest charge defines the condensation point

\[
\alpha_k = \arg\max_{i \in k} \, q_i, \tag{3}
\]

where k indexes the set of hits belonging to a given truth shower and K denotes the total number of showers in the event.
Hits belonging to the same object are attracted to the condensation point,

\[
\mathcal{L}_{\mathrm{att}} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|k|} \sum_{i \in k} q_i \, q_{\alpha_k} \left\lVert c_i - c_{\alpha_k} \right\rVert^2, \tag{4}
\]

while hits from different objects are repelled within a finite radius,

\[
\mathcal{L}_{\mathrm{rep}} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|\bar{k}|} \sum_{i \notin k} q_i \, q_{\alpha_k} \max\!\left(0, \, 1 - \left\lVert c_i - c_{\alpha_k} \right\rVert\right). \tag{5}
\]

A regularisation term encourages condensation points to attain high confidence,

\[
\mathcal{L}_{\beta} = \frac{1}{K} \sum_{k=1}^{K} \left(1 - \beta_{\alpha_k}\right). \tag{6}
\]

The total event loss is

\[
\mathcal{L}_{\mathrm{OC}} = s_{\mathrm{att}} \mathcal{L}_{\mathrm{att}} + s_{\mathrm{rep}} \mathcal{L}_{\mathrm{rep}} + s_{\mathrm{coward}} \mathcal{L}_{\beta}. \tag{7}
\]

In our implementation s_att = s_rep = s_coward = 1.0 and q_min = 0.1, following the default configuration of [5]. A limited scan of the loss weights was performed and found to produce negligible changes in reconstruction performance relative to the default configuration. The model learns latent variables that define clustering behaviour, while the actual grouping of hits is performed at inference time using these learned quantities.

For direct comparison, the OC model uses the same DynamicEdgeConv backbone described in Section II A, differing only in the output heads predicting β_i and c_i. Training is performed event-wise with all hits assigned to their corresponding truth shower, consistent with the supervised setting used for contrastive learning. In contrast to CML, which learns a global similarity metric, OC directly optimizes object-centric clustering variables in the latent space. In this formulation, clustering structure is explicitly encoded through object-level attractors rather than emerging from pairwise representations.

D. Clustering

Clustering is performed after training and treated as a readout of the learned representation rather than part of the learning objective. All clustering is performed independently for each event.

a. Agglomerative clustering. As a common baseline, agglomerative clustering [17] is applied directly to the learned embeddings produced by both CML and OC.
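This baseline can be sketched with SciPy's hierarchical-clustering utilities, assuming SciPy is available; the threshold value below is illustrative, not one of the tuned values from Table I.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def agglomerative_readout(emb, delta_agg):
    """Cut the Ward dendrogram at distance delta_agg: clusters whose
    merge distance exceeds delta_agg stay separate, so the number of
    clusters adapts to each event."""
    Z = linkage(emb, method='ward')               # Euclidean Ward linkage
    return fcluster(Z, t=delta_agg, criterion='distance')

# toy event: two well-separated blobs in a 16-dim embedding space
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.1, size=(40, 16)),
                 rng.normal(5.0, 0.1, size=(40, 16))])
labels = agglomerative_readout(emb, delta_agg=8.0)
print(len(np.unique(labels)))   # 2
```

Because the cut is on merge distance rather than on a fixed cluster count, the same call handles events with any particle multiplicity.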
We use Ward linkage with a Euclidean distance metric, which provides robust performance across both embedding spaces. The method does not require a fixed number of clusters and naturally accommodates events with variable particle multiplicity. The number of clusters is determined by applying a distance threshold δ_agg to the Ward linkage dendrogram: clusters separated by a distance greater than δ_agg are not merged. The value of δ_agg is optimized on the auxiliary dataset and reported in Table I. Applying the same clustering algorithm to both embeddings enables comparison of the intrinsic quality of the learned representations, independent of method-specific inference procedures.

b. Object condensation inference. For OC, clusters are obtained using the inference procedure associated with the OC loss [5]. Candidate condensation points are selected by thresholding the predicted scores β_{OC,i} > t_β and sorting in decreasing β_OC. A greedy separation step enforces a minimum Euclidean distance t_d between selected points in the learned clustering coordinate space c_i ∈ R^16. Hits within radius t_d of a condensation point are assigned to it, and any remaining hits are assigned to the nearest condensation point. The values of t_β and t_d are given in Table I.

c. Density-based readout of metric embeddings. Contrastive embeddings do not predict representative points. We therefore define a clustering readout operating directly in the embedding space, deriving candidate centres from local neighbourhood structure rather than from a learned score.

For each hit, a local density estimate is obtained from the distance to its k-th nearest neighbour, d_k(i). In this work we use the same value of k as in the DynamicEdgeConv graph construction, although the two choices are conceptually independent. This distance is mapped to a score

\[
\beta_{\mathrm{CML},i} = \exp\!\left(-\frac{d_k(i)}{\tau}\right), \tag{8}
\]

where τ is the temperature parameter used in the contrastive loss.
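A minimal NumPy sketch of this readout follows. The two-stage hit assignment is simplified here to a single nearest-centre step, which coincides with it when centres are well separated; the default parameter values mirror the CML rows of Table I, and the function name is ours.

```python
import numpy as np

def density_readout(emb, k=24, tau=0.1, t_beta=0.4, t_d=0.45):
    """Density-based readout in the learned metric space: score hits
    by Eq. (8), greedily keep high-score hits separated by more than
    t_d as centres, then assign every hit to its nearest centre."""
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    d_k = np.sort(d, axis=1)[:, k]          # k-th-neighbour distance (column 0 is self)
    beta = np.exp(-d_k / tau)               # large in dense regions
    centres = []
    for i in np.argsort(-beta):             # decreasing score
        if beta[i] < t_beta:
            break
        if all(d[i, c] > t_d for c in centres):
            centres.append(i)               # enforce minimum separation
    return np.array(centres), np.argmin(d[:, centres], axis=1)

# toy event: two tight blobs in a 16-dim embedding space
rng = np.random.default_rng(0)
blob = rng.normal(0.0, 0.005, size=(40, 16))
shift = np.zeros(16)
shift[0] = 1.0
emb = np.vstack([blob, blob + shift])
centres, assign = density_readout(emb)
print(len(centres))   # 2
```

The score plays the role of a predicted condensation score, but it is computed from the embedding geometry alone.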
This mapping is consistent with the similarity scaling used during training, assigning large scores to densely populated regions of the embedding. Candidate centres are selected by thresholding β_{CML,i} > t_β and enforcing a minimum separation t_d. Clusters are formed by assigning hits within distance t_d to each centre, followed by nearest-centre assignment for remaining hits. Here β_{CML,i} plays a role analogous to the condensation score in OC, but is computed from the embedding rather than predicted by a network head. Unlike standard density-based methods such as DBSCAN [18] and HDBSCAN [19], which rely on fixed density thresholds, the proposed readout operates on local neighbourhood structure in the learned embedding, making it compatible with spatially varying density across showers.

d. Summary. Three clustering strategies are evaluated: shared agglomerative clustering applied to both embedding spaces, the native OC inference procedure, and the proposed density-based readout for contrastive embeddings. This design isolates the effect of the learned representation from the choice of clustering algorithm.

III. EXPERIMENTAL SETUP

A. Dataset

The datasets used in this study are produced with a standalone Geant4 [20] simulation closely resembling the HGCAL detector [21]. The calorimeter model comprises 50 longitudinal layers: 28 electromagnetic (CE-E) followed by 22 hadronic (CE-H). The transverse cell size is 1 cm × 1 cm over a 100 cm × 100 cm area, ensuring full shower containment. Energy deposits are calibrated in minimum-ionising-particle (MIP) units, with the simulated hit response consistent with test-beam observations [22, 23].

Three training datasets are considered: an electromagnetic (EM, electrons) sample, a hadronic (HAD, charged pions) sample, and a mixed sample containing both particle types. The EM dataset contains 2-10 electrons per event with primary energies uniformly distributed between 30 and 400 GeV.
The HAD dataset contains 2-7 pions per event in the same energy range. The mixed dataset contains an equal fraction of electrons and charged pions with 2-7 particles per event and primary energies between 30 and 250 GeV. For each configuration, 200 000 events are used for training and 50 000 for validation.

Models are evaluated on independent test datasets containing between 1 and 30 particles per event with primary energies ranging from 30 to 600 GeV. Separate EM and HAD test samples are used. Models trained on the EM and HAD datasets are evaluated on their respective particle types, while the model trained on the mixed dataset is evaluated on both.

In addition to the training, validation, and primary test datasets, an auxiliary optimization dataset is generated for each training configuration with the same multiplicity and energy distributions as the corresponding training sample. These events are not used during network training and serve only to determine the clustering hyperparameters at inference time. Optimal clustering thresholds are selected by maximising reconstruction performance on this dataset. For models trained on the EM and HAD samples, thresholds are determined separately. For the mixed model, a single set of thresholds is determined on the mixed optimization dataset and applied to both test samples. The resulting threshold values are summarised in Table I.

TABLE I: Clustering thresholds. The parameter δ_agg denotes the agglomerative clustering distance threshold, while t_β and t_d correspond to thresholds used in both the density-based readout and object condensation inference.

  Model   Method   δ_agg   t_β   t_d
  EM      CML      8       0.4   0.45
          OC       6.75    0.1   0.25
  HAD     CML      10      0.4   0.45
          OC       11      0.1   0.35
  Mixed   CML      9.5     0.4   0.45
          OC       7.5     0.1   0.3

When multiple particles deposit energy in the same cell, their contributions are merged into a single hit, as in realistic detector readout.
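A toy sketch of this cell-level merging, including the dominant-particle label and hit purity that the merged hits carry; the function and variable names are illustrative, not taken from the simulation code.

```python
def merge_cells(cell_ids, energies, particle_ids):
    """Merge per-particle energy deposits sharing a cell into single
    hits, recording the dominant-particle label and its energy
    fraction (the hit purity)."""
    cells = {}
    for c, e, p in zip(cell_ids, energies, particle_ids):
        cells.setdefault(c, {})
        cells[c][p] = cells[c].get(p, 0.0) + e
    merged = []
    for c, contribs in cells.items():
        total = sum(contribs.values())
        label = max(contribs, key=contribs.get)   # dominant particle
        merged.append((c, total, label, contribs[label] / total))
    return merged

# two particles share cell 7; particle 0 alone hits cell 9
print(merge_cells([7, 7, 9], [1.0, 3.0, 2.0], [0, 1, 0]))
# [(7, 4.0, 1, 0.75), (9, 2.0, 0, 1.0)]
```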
Each hit is assigned the label of the particle contributing the largest fraction of its deposited energy. The fractional contribution of the dominant particle is recorded to define hit purity, which is close to unity across the dataset. This quantity is not used as an input feature.

All particles are generated within a cone of fixed opening angle, with a half-angle of 5° around the detector axis. Consequently, increasing the event multiplicity leads to a systematic increase in local particle density within the calorimeter volume. This provides a controlled mechanism to vary the degree of shower overlap, allowing reconstruction performance to be probed across regimes of increasing ambiguity and hit-level confusion.

B. Training Details

All models are trained using the Adam optimizer [24] with an initial learning rate of 3 × 10^-4 and a batch size of 16 events. The learning rate is reduced by a factor of 0.5 every 25 epochs, and training is performed for up to 100 epochs. Identical optimization settings, data splits, and initialization procedures are used for both CML and OC models, with differences arising only from the learning objective and corresponding output heads.

Model selection is based on the validation loss evaluated after each epoch on an independent validation set. The checkpoint with the lowest validation loss is retained, and early stopping is applied once no further improvement is observed.

Training is performed using mixed-precision arithmetic with gradient scaling. All models are trained on a single GPU (NVIDIA Tesla V100, 32 GB, or NVIDIA H100, 80 GB), with consistent results observed across both hardware configurations.

IV. EVALUATION

A. Embedding Geometry Metrics

The learned representations are evaluated independently of any clustering procedure by quantifying geometric properties of the embedding space within each event.
Such metrics are commonly used to assess representation quality in metric learning settings [25, 26]. All quantities are computed using cosine similarity on ℓ2-normalized embeddings and averaged over events.

a. Recall@k (R@k). For each hit, the k nearest neighbours in embedding space are identified. A hit is considered correctly retrieved if at least one neighbour originates from the same truth shower. The fraction of such hits defines R@k. We report R@1 and R@10, which probe whether compatible hits are placed as the closest neighbour and within the local neighbourhood, respectively.

b. Contamination@10 (C@10). Local neighbourhood purity is quantified as the fraction of the 10 nearest neighbours that originate from a different truth shower. Unlike Recall@k, which provides a binary criterion, this metric captures the continuous fraction of incorrect neighbours. Lower contamination indicates that local neighbourhoods correspond more closely to physically consistent showers.

c. Pairwise separability. Positive (same-shower) and negative (different-shower) hit pairs are sampled and their cosine similarities compared. The area under the receiver operating characteristic curve (AUC) measures the probability that a positive pair has higher similarity than a negative pair, providing a global measure of separability in the embedding space.

d. Intra- and inter-shower distances. To characterise the geometric structure of individual showers, the analysis is restricted to the energetic core of each shower by retaining the highest-energy hits that together account for 95% of the total shower energy. This suppresses low-energy peripheral deposits while preserving the physically relevant structure. Cosine distances d = 1 - z_i·z_j are computed between hits.
Intra-shower compactness is quantified by the 99th percentile of same-shower pairwise distances, denoted d_intra^Q99, capturing the largest separations within a shower core. Inter-shower separation is measured as the median distance between hits from different showers, providing a robust estimate of the typical separation between distinct objects.

B. Reconstruction Metrics

The performance of reconstructed particle candidates is evaluated using standard high-energy physics (HEP) metrics following the CMS TICL validation procedure [27]. These quantities assess the physical quality of reconstructed objects rather than the learned representation.

a. Efficiency. A simulated particle is considered reconstructed if at least 70% of its deposited energy is contained within a single reconstructed object. The reconstruction efficiency is defined as ε = N_sim^matched / N_sim.

b. Purity. Purity quantifies the extent to which a reconstructed object originates from a single simulated particle. This is evaluated using the Reco-to-Sim score S(r → s), defined for a reconstructed object r and simulated particle s as

\[
S(r \to s) = \frac{\sum_{\mathrm{hits}} \max\!\left(0, f_{\mathrm{reco}} - f_{\mathrm{sim}}\right)^2 E^2}{\sum_{\mathrm{hits}} f_{\mathrm{reco}}^2 \, E^2}, \tag{9}
\]

where f_reco and f_sim are the fractional energy contributions of a hit to the reconstructed object and simulated particle, respectively, and E is the hit energy. The score is zero for a perfectly pure object and increases with contamination. A reconstructed object is considered pure if there exists at least one simulated particle s for which S(r → s) < 0.2. The purity is then defined as P = N_reco^pure / N_reco.

c. Multiplicity ratio. To quantify the tendency of the clustering to split or merge objects, the ratio R_N = N_reco / N_sim is used, where R_N > 1 indicates object splitting and R_N < 1 indicates merging.

d. Energy resolution. For each simulated particle, the reconstructed object with the largest shared energy is selected as the match.
Objects with S(r → s) > 0.5 are discarded as impure. The energy response is defined as r = E_reco / E_sim, where E_reco is the sum of the energies of the hits assigned to the reconstructed object and E_sim is the true particle energy.

Within bins of E_sim, the mean response µ is used to define a calibrated response r′ = r/µ. The energy resolution is then defined as the effective sigma σ_eff, given by half the minimum interval containing 68.3% of the r′ distribution. For Gaussian distributions, σ_eff is equivalent to the standard deviation, while remaining robust to non-Gaussian tails and outliers [28].

As a reference, we also compute the resolution of an ideal pattern-recognition algorithm that perfectly assigns all hits to their truth shower, with no merging or splitting. The energy of each reconstructed object is taken as the sum of the energies of all hits assigned to the corresponding truth shower. This provides a lower bound on the achievable resolution given the detector response and the hit-level energy calibration, and is independent of any clustering algorithm.

TABLE II: Embedding-geometry metrics for EM and HAD showers at representative multiplicities. CML consistently achieves higher Recall@k, lower Contamination@10, and higher AUC than OC, with the performance gap increasing as multiplicity increases.

  Dataset   N_particles   Method   R@1     R@10    C@10    AUC
  EM        2             CML      0.941   0.992   0.067   0.956
                          OC       0.935   0.989   0.072   0.944
            15            CML      0.691   0.932   0.339   0.950
                          OC       0.667   0.920   0.363   0.923
            30            CML      0.572   0.886   0.467   0.945
                          OC       0.536   0.871   0.503   0.918
  HAD       2             CML      0.938   0.992   0.077   0.945
                          OC       0.922   0.989   0.092   0.927
            10            CML      0.744   0.943   0.286   0.927
                          OC       0.698   0.927   0.330   0.889
            20            CML      0.628   0.900   0.407   0.917
                          OC       0.573   0.876   0.460   0.877

V. RESULTS

A. Embedding Geometry

Embedding-level performance is evaluated separately for the EM and HAD datasets using the metrics defined in Section IV A.
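For reference, the per-event neighbourhood metrics of Section IV A can be computed in a few lines of NumPy; this is a sketch with our own function name, and the AUC uses the standard rank-sum formula rather than explicit pair sampling.

```python
import numpy as np

def embedding_metrics(z, y, k=10):
    """Per-event Recall@1, Recall@k, Contamination@k and pairwise AUC
    from cosine similarities of l2-normalized embeddings."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)                 # exclude self-matches
    nn = np.argsort(-sim, axis=1)[:, :k]           # k most similar hits
    same = y[nn] == y[:, None]                     # neighbour shares truth label?
    r1, rk = same[:, 0].mean(), same.any(axis=1).mean()
    c_at_k = 1.0 - same.mean()                     # wrong-shower fraction in top k
    # AUC via the rank-sum (Mann-Whitney) formula over all hit pairs
    iu = np.triu_indices(len(y), k=1)
    s, lab = sim[iu], (y[:, None] == y[None, :])[iu]
    ranks = np.empty(len(s))
    ranks[np.argsort(s)] = np.arange(1, len(s) + 1)
    n_pos, n_neg = lab.sum(), (~lab).sum()
    auc = (ranks[lab].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return r1, rk, c_at_k, auc

rng = np.random.default_rng(0)
ze = rng.normal(size=(60, 16))
ye = rng.integers(0, 4, size=60)
print(embedding_metrics(ze, ye))   # (R@1, R@10, C@10, AUC)
```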
Results are reported at representative event multiplicities to probe behaviour as shower overlap increases.

Across both datasets, CML consistently outperforms OC, achieving higher Recall@k, lower Contamination@10, and higher AUC at all tested multiplicities (Table II). The performance gap is modest at low multiplicity but increases steadily with shower density, indicating that CML preserves more stable local neighbourhood structure under increasing overlap.

FIG. 1: Event-level embedding-geometry distributions for EM and HAD showers at increasing generated particle multiplicity (N_particles = 2, 15, 30). Top: median inter-shower separation d_inter^median. Middle: intra-shower distance tail d_intra^Q99. Bottom: separation margin ∆ = d_inter^median - d_intra^Q99. Narrow positive or near-zero margins indicate a well-defined clustering scale, while broad or negative margins indicate increasing ambiguity between showers.

Figure 1 shows the corresponding embedding-geometry distributions. The top row presents the median inter-shower separation d_inter^median, while the middle row shows the intra-shower distance tail d_intra^Q99. For CML, both quantities remain narrowly distributed across all multiplicities, indicating a stable embedding geometry with consistent distance scales both within and between showers. In contrast, OC produces much broader distributions, reflecting substantially larger event-to-event variability.
The bottom row shows the separation margin ∆ = d_inter^median − d_intra^Q99, which directly measures whether showers remain geometrically separable. For CML, the margin distribution remains narrow across all multiplicities, with a positive peak for EM showers and a near-zero peak for HAD showers. This indicates that EM showers are intrinsically more separable, while HAD showers are more complex, but in both cases the clustering scale remains well defined.

In contrast, OC produces broad and often negative margin distributions. For EM showers, the margin spans both positive and negative values, implying strong overlap between intra- and inter-shower distances. For HAD showers, the margin is systematically negative and becomes increasingly broad with multiplicity. In both cases, no single distance threshold can separate showers consistently across events.

Overall, the advantage of CML arises primarily from controlling the tails of the distance distributions rather than shifting their central values. By maintaining compact intra-shower structure together with stable inter-shower separation, CML produces embeddings that remain reliably clusterable even in dense environments.

B. Reconstruction Performance

Reconstruction performance is evaluated using the physics metrics defined in Section IV B. Across all datasets and reconstruction configurations, CML outperforms OC, with the largest gains appearing at high multiplicity. These differences follow directly from the embedding geometry: the narrow margin distributions of CML define a stable clustering scale, whereas the broad and often negative margins of OC make clustering increasingly ambiguous.

a. Energy dependence. Figure 2 shows reconstruction efficiency and purity as a function of the primary particle energy.
For the dedicated EM model, CML achieves near-perfect efficiency (∼99%) and substantially higher purity (∼80–82%) than OC, which remains below 85% efficiency and 72% purity. For HAD showers, efficiencies are more similar across methods (∼70–76%), but CML retains a clear purity advantage of ∼10–15%.

The mixed-trained model reveals a stronger separation. For EM showers, OC degrades severely, with efficiencies dropping to ∼30–40%, whereas CML retains efficiencies above 80% and substantially higher purity. For HAD showers, efficiencies remain comparable, but CML again achieves higher purity.

Across all cases, performance remains stable beyond the training range, indicating that the dominant differences arise from the learned representation rather than from any strong dependence on primary energy. This is consistent with the embedding geometry, for which the separation margin is controlled mainly by local shower structure rather than absolute shower energy.

b. Multiplicity dependence. Figure 3 shows performance as a function of the particle multiplicity. This is the regime in which the separation between methods is most pronounced. At low multiplicity, all methods perform similarly. As multiplicity increases, clear differences emerge. For EM showers at N = 30, CML maintains high efficiency (∼95–98%) and purity (∼73–78%), while OC degrades to ∼75% efficiency and ∼47–55% purity. The number ratio remains close to unity for CML but drops significantly for OC, indicating substantial merging.

A similar trend is observed for HAD showers. At N = 30, CML improves purity by nearly 20 percentage points over OC while also providing a modest but systematic efficiency gain. Again, CML maintains a more stable R_N, indicating improved control of merging in dense environments.

The mixed-trained model shows the strongest contrast.
For EM showers at N = 30, CML remains functional with efficiencies of ∼70%, whereas OC collapses to ∼20–30%. For HAD showers, efficiencies remain comparable, but CML retains a clear purity advantage.

These trends can be understood directly from the embedding margins. For CML, the margin distributions remain narrow and concentrated near the separation boundary across multiplicities, so a single clustering scale continues to separate showers even as overlap increases. For OC, the behaviour depends strongly on shower type. In EM events, the margin becomes extremely broad, so no global threshold can separate showers consistently; this explains the large degradation in R_N and purity, particularly for agglomerative clustering. In HAD events, the OC margin is less broad but systematically negative, so clustering is somewhat more stable but still intrinsically ambiguous, leading to persistent mis-clustering and reduced purity. The mixed setting makes this contrast most explicit: CML produces similar margin distributions for EM and HAD showers and therefore supports a common threshold, whereas OC does not, causing the EM performance of the mixed model to collapse while the HAD performance remains closer to that of the HAD-only model.

c. Energy resolution. Figure 4 shows the corresponding energy resolution. For EM showers, CML consistently achieves the best resolution and remains closest to the ideal pattern-recognition limit. At high energy, O(600 GeV), CML reaches ∼1.6%, compared with ∼2.0% for OC with native inference and ∼2.4% for OC with agglomerative clustering.

For HAD showers, the separation is smaller but remains systematic. The CML method provides the largest improvement at low energy, where pattern-recognition effects dominate (e.g. 15.4% versus 18–20% at 50 GeV), while at higher energies the curves approach a constant term and the gap narrows.
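The effective-sigma estimator behind these resolution values, defined in Section IV as half the smallest interval containing 68.3% of the calibrated response r′, can be sketched as follows. This is an illustrative NumPy implementation, not the analysis code, and `effective_sigma` is a hypothetical helper name.

```python
import numpy as np

def effective_sigma(r_prime, fraction=0.683):
    """Half-width of the smallest interval containing `fraction`
    of the calibrated-response distribution r' = r / mu."""
    x = np.sort(np.asarray(r_prime, dtype=float))
    n = len(x)
    k = int(np.ceil(fraction * n))        # points the interval must cover
    # Width of every window of k consecutive sorted points; the
    # narrowest such window is the minimal interval.
    widths = x[k - 1:] - x[:n - k + 1]
    return 0.5 * widths.min()

# For a Gaussian response, sigma_eff approaches the standard deviation,
# while outliers in the tails barely move it.
rng = np.random.default_rng(0)
sample = rng.normal(loc=1.0, scale=0.02, size=100_000)
print(effective_sigma(sample))  # ~ 0.02
```

The sliding-window form makes the robustness explicit: points far out in the tails never enter the narrowest 68.3% window, so they cannot inflate the estimate the way they would inflate a standard deviation.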
These results are the direct consequence of the clustering behaviour. The stable separation margins of CML reduce both merging and fragmentation, leading to more accurate assignment of energy deposits to reconstructed objects. In contrast, the broad and negative margins of OC lead to systematic mis-assignment of energy, particularly in dense environments, and therefore to degraded resolution.

d. Dependence on embedding dimensionality. Table III shows the mean reconstruction performance of the mixed-trained model for 16- and 4-dimensional embeddings, evaluated on an independent mixed sample. Reducing the embedding dimension leads to a modest degradation for all methods, but the relative ordering remains unchanged: CML continues to outperform OC in both purity and efficiency across all configurations. This shows that the observed advantage is not specific to the 16-dimensional setting.

TABLE III: Mean reconstruction purity and efficiency for the mixed-trained model evaluated on the combined EM and HAD test sample for 16- and 4-dimensional embeddings. The relative ordering between methods is unchanged when the embedding dimensionality is reduced, showing that the CML advantage is not driven by latent-space size alone.

Model  Dimension  Clustering     Purity  Efficiency
CML    16         Agglomerative  0.722   0.932
       16         Density        0.710   0.945
       4          Agglomerative  0.690   0.887
       4          Density        0.637   0.905
OC     16         Agglomerative  0.469   0.849
       16         Density        0.520   0.838
       4          Agglomerative  0.512   0.813
       4          Density        0.500   0.866

FIG. 2: Reconstruction efficiency (top) and purity (bottom) as functions of primary particle energy for electromagnetic (EM) and hadronic (HAD) showers. Columns show models trained on EM, trained on HAD, and a mixed-trained model evaluated separately on EM and HAD showers. The vertical dashed line indicates the upper boundary of the training energy range. The dominant differences are largely independent of energy, indicating that performance is controlled primarily by the learned representation rather than by the absolute shower energy.

VI. DISCUSSION AND CONCLUSIONS

We have presented a clustering framework for point-cloud segmentation in high-granularity calorimeters based on CML. Rather than learning object-centric clustering variables, the method learns a representation in which hits from the same shower are placed nearby and hits from different showers are separated, with clustering applied only as a readout of the learned geometry. This decoupling allows the representation to be optimized for pairwise compatibility while retaining flexibility in the choice of inference procedure.

The central result of this work is that the learned embedding geometry directly determines clustering performance. The CML approach produces narrow separation-margin distributions that remain positive for EM showers and only slightly negative for HAD showers, indicating a stable and well-defined clustering scale even in dense environments. In contrast, OC yields substantially broader and often negative margin distributions, particularly for EM showers, implying strong overlap between intra- and inter-shower distances and therefore intrinsically ambiguous clustering decisions. These geometric differences explain the observed reconstruction behaviour: CML consistently achieves higher purity, higher efficiency, more stable R_N, and improved energy resolution, with the largest gains appearing at high multiplicity where shower overlap is most severe.

The mixed-training results provide the strongest evidence for robustness.
The CML method maintains similar separation scales for EM and HAD showers, allowing a single clustering threshold to operate effectively across both particle types. By contrast, OC learns different geometric structure for EM and HAD showers. This leads to a pronounced degradation for EM showers in particular, indicating that the object-centric formulation is less able to accommodate heterogeneous shower topologies within a single model.

FIG. 3: Reconstruction efficiency (top), purity (middle), and number ratio (bottom) as functions of particle multiplicity N for electromagnetic (EM) and hadronic (HAD) showers. Columns show models trained on EM, trained on HAD, and a mixed-trained model evaluated separately on EM and HAD showers. The vertical dashed line indicates the upper boundary of the training multiplicity range. The performance gap between CML and OC increases strongly with multiplicity, showing that clustering stability in dense environments is determined by the underlying embedding geometry.

Taken together, these results show that, for highly granular calorimeter reconstruction, learning a stable similarity geometry is more effective than learning explicit object-centric clustering variables. More broadly, they suggest that contrastive metric learning provides a robust alternative for dense point-cloud segmentation problems in which object boundaries are ambiguous, overlap is common, and inference must remain stable under changing event complexity.
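To make the "single clustering threshold" picture concrete, a minimal single-linkage readout at a fixed distance scale t can be sketched as below. This is a deliberately simplified stand-in for the density-based (HDBSCAN-style) and agglomerative readouts actually used in this work, which additionally handle noise and varying density; `threshold_readout` is a hypothetical name.

```python
import numpy as np

def threshold_readout(emb, t):
    """Single-linkage readout at a fixed distance scale t.

    Hits closer than t in the learned metric space are linked, and the
    connected components are the reconstructed showers. A well-defined
    margin (d_intra tail below t, d_inter scale above t) is exactly the
    condition under which one value of t works for every event.
    """
    emb = np.asarray(emb, dtype=float)
    n = len(emb)
    parent = list(range(n))            # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if d[i, j] < t:
                parent[find(i)] = find(j)   # merge the two components

    roots = [find(i) for i in range(n)]
    _, cluster_ids = np.unique(roots, return_inverse=True)
    return cluster_ids
```

When the embedding margins are narrow and positive, any t between the intra-shower tail and the inter-shower scale recovers the showers; when the margins are broad or negative, as observed for OC, no such t exists.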
We propose to test this strategy with ultra-realistic simulations of the HGCAL detector in high-pileup conditions, as implemented in the CMS software stack.

ACKNOWLEDGEMENTS

We thank Sunanda Banerjee for the help in creating a realistic dataset corresponding to particle showers in a highly granular detector. The neural networks in this study have been trained on the Imperial College RCS HPC cluster and the Oscar cluster at Brown University. L. G. and L. N. are supported by the DOE, Office of Science, Office of High Energy Physics Early Career Research program under Award No. DE-SC0026288. B. M. acknowledges the support of Schmidt Sciences.

FIG. 4: Energy resolution as a function of primary particle energy for electromagnetic (EM) and hadronic (HAD) showers. Panels show models trained on EM, a mixed-trained model evaluated on EM, trained on HAD, and a mixed-trained model evaluated on HAD. The vertical dashed line marks the upper boundary of the training energy range. The black dashed curve denotes the ideal pattern-recognition limit, defined as the resolution obtained by a perfect clustering algorithm with no merging or splitting, and represents a lower bound on the achievable resolution given the detector response. Improvements in CML resolution follow directly from its improved clustering purity and reduced merging.

REFERENCES

[1] CMS Collaboration, The Phase-2 Upgrade of the CMS Endcap Calorimeter, Tech. Rep. (CERN, Geneva, 2017), https://cds.cern.ch/record/2293646.
[2] Xiangyang Ju et al.
, "Graph Neural Networks for Particle Reconstruction in High Energy Physics Detectors" (2020), https://cds.cern.ch/record/2715452, arXiv:2003.11603.
[3] Lukas Ehrke et al., "Topological Reconstruction of Particle Physics Processes Using Graph Neural Networks," Phys. Rev. D 107, 116019 (2023).
[4] Javier Duarte and Jean-Roch Vlimant, "Graph Neural Networks for Particle Tracking and Reconstruction," in Artificial Intelligence for High Energy Physics (World Scientific, 2022), pp. 387–436.
[5] Jan Kieseler, "Object Condensation: One-Stage Grid-Free Multi-Object Reconstruction in Physics Detectors, Graph, and Image Data," Eur. Phys. J. C 80, 886 (2020).
[6] Gregory Matousek and Anselm Vossen, "AI-Assisted Object Condensation Clustering for Calorimeter Shower Reconstruction at CLAS12" (2025), [physics.ins-det].
[7] Shah Rukh Qasim et al., "Multi-Particle Reconstruction in the High Granularity Calorimeter Using Object Condensation and Graph Neural Networks" (2021), arXiv:2106.01832 [physics.ins-det].
[8] S. Gardner, R. Tyson, D. Glazier, and K. Livingston, "Object Condensation for Track Building in a Backward Electron Tagger at the EIC," JINST 19, C05052 (2024).
[9] R. Hadsell, S. Chopra, and Y. LeCun, "Dimensionality Reduction by Learning an Invariant Mapping," in CVPR (2006), pp. 1735–1742.
[10] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon, "Dynamic Graph CNN for Learning on Point Clouds" (2019), arXiv:1801.07829 [cs.CV].
[11] Aaron van den Oord, Yazhe Li, and Oriol Vinyals, "Representation Learning with Contrastive Predictive Coding" (2019), arXiv:1807.03748 [cs.LG].
[12] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton, "A Simple Framework for Contrastive Learning of Visual Representations" (2020), arXiv:2002.05709 [cs.LG].
[13] Prannay Khosla et al., "Supervised Contrastive Learning" (2021), arXiv:2004.11362 [cs.LG].
[14] Phuc H.
Le-Khac, Graham Healy, and Alan F. Smeaton, "Contrastive Representation Learning: A Framework and Review," IEEE Access 8, 193907–193934 (2020).
[15] Yunfan Li et al., "Contrastive Clustering" (2020), arXiv:2009.09687 [cs.LG].
[16] Kilian Lieret et al., "High Pileup Particle Tracking with Object Condensation" (2023), [physics.data-an].
[17] Daniel Müllner, "Modern Hierarchical, Agglomerative Clustering Algorithms" (2011), [stat.ML].
[18] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," in KDD (1996), pp. 226–231.
[19] Leland McInnes, John Healy, and Steve Astels, "hdbscan: Hierarchical Density Based Clustering," J. Open Source Softw. 2, 205 (2017).
[20] S. Agostinelli et al., "Geant4—A Simulation Toolkit," Nucl. Instrum. Meth. A 506, 250–303 (2003).
[21] CMS Collaboration, The Phase-2 Upgrade of the CMS Endcap Calorimeter, Tech. Rep. (CERN, Geneva, 2017), https://cds.cern.ch/record/2293646.
[22] Nural Akchurin, Response of a CMS HGCAL Silicon-Pad Electromagnetic Calorimeter Prototype to 20–300 GeV Positrons, Tech. Rep. (CERN, Geneva, 2021), https://cds.cern.ch/record/2798347.
[23] CMS Collaboration, "Performance of the CMS High Granularity Calorimeter Prototype to Charged Pion Beams of 20–300 GeV/c" (2023), arXiv:2211.04740 [physics.ins-det].
[24] Diederik P. Kingma and Jimmy Ba, "Adam: A Method for Stochastic Optimization" (2017), arXiv:1412.6980 [cs.LG].
[25] Dingyi Zhang et al., "Deep Metric Learning with Spherical Embedding" (2020), arXiv:2011.02785 [cs.CV].
[26] Kevin Musgrave et al., "A Metric Learning Reality Check" (2020), arXiv:2003.08505 [cs.CV].
[27] CMS Collaboration, "The Iterative Clustering (TICL) (v5a) Reconstruction at the CMS Phase-2 High Granularity Calorimeter Endcap" (2024), https://cds.cern.ch/record/2920448.
[28] CMS Collaboration, "Energy Calibration and Resolution of the CMS Electromagnetic Calorimeter in pp Collisions at √s = 7 TeV," JINST 8, P09009 (2013).