DCTracks: An Open Dataset for Machine Learning-Based Drift Chamber Track Reconstruction

Prepared for submission to JINST

Liyan Qian (a, b), Yao Zhang (a, b, *), Ye Yuan (a, b, **), Zhaoke Zhang (a, b), Jin Fang (c), Shimiao Jiang (d), Jin Zhang (c), Ke Li (a, b), Beijiang Liu (a, b), Chenglin Xu (e, b), Yifan Zhang (e, b), Xiaoqian Jia (f), Xiaoshuai Qin (f) and Xingtao Huang (f)

a Institute of High Energy Physics, Chinese Academy of Sciences, No. 19B Yuquan Road, Shijingshan, Beijing, China
b University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan, Beijing, China
c Sun Yat-sen University, School of Science, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
d China Academy of Space Technology, No. 104 Youyi Road, Haidian, Beijing, China
e Institute of Automation, Chinese Academy of Sciences, No. 95 Zhongguancun East Road, Haidian, Beijing, China
f Key Laboratory of Particle Physics and Particle Irradiation (MOE), Institute of Frontier and Interdisciplinary Science, Shandong University, Qingdao, Shandong, China

E-mail: zhangyao@ihep.ac.cn, yuany@ihep.ac.cn

Abstract: We introduce a Monte Carlo (MC) dataset of single- and two-track drift chamber events to advance Machine Learning (ML)-based track reconstruction. To enable standardized and comparable evaluation, we define metrics specific to track reconstruction and report results for traditional track reconstruction algorithms and a Graph Neural Network (GNN) method, facilitating rigorous, reproducible validation for future research.

Keywords: Data processing methods; Particle tracking detectors; Pattern recognition, cluster finding, calibration and fitting methods

* Corresponding author.
** Corresponding author.
Contents
1 Introduction
2 Related work
3 The cylindrical multilayer drift chamber
4 Dataset for drift chambers
4.1 Event simulation
4.2 Data preprocessing
4.3 Dataset description
4.4 Dataset access
5 Evaluation metrics
6 Benchmark experiments
6.1 Track finding and fitting
6.2 Results
6.2.1 Hit efficiency and hit purity
6.2.2 Track finding and fitting efficiencies
6.2.3 Track parameter performance
7 Conclusion
8 Outlook
A Hit efficiency and hit purity
B Track finding and fitting efficiencies
C Track parameters

1 Introduction

Precision tests of the Standard Model and searches for physics beyond it rely on high energy physics experiments. To achieve the physics goals of these experiments, both high-precision detectors and advanced data analysis are essential. In particular, these goals rely on precise charged-particle reconstruction through pattern recognition and track fitting. As high energy physics experiments face rising instantaneous luminosity, detector upgrades and increasingly stringent demands on data–simulation statistical compatibility, track reconstruction must maintain accuracy, processing speed and robustness under complex final-state conditions as well as detector imperfections [1–3]. Key challenges in charged-particle track reconstruction include background suppression, integrating track reconstruction across subdetectors, improving efficiency for low-momentum and displaced tracks, reducing clone and fake rates, and improving data–simulation agreement. Traditional track reconstruction relies mainly on mature pattern recognition [4] and Kalman filter-based track fitting algorithms [5].
Recently, Machine Learning (ML) approaches, particularly Graph Neural Networks (GNNs), have offered significant benefits for track reconstruction by enabling end-to-end learning of track parameters such as momentum, direction and associated hits directly from raw detector data. This capability allows key evaluation metrics to be optimized directly, making these approaches highly promising for track reconstruction [6–8]. However, the shortage of publicly available datasets and specific evaluation metrics remains a major barrier: it impedes reproducible testing and fair comparison across studies, and it largely discourages participation from the broader ML community. In this context, interdisciplinary collaboration and open datasets are essential to realizing the full potential of ML-based track reconstruction.

We address the shortage of public datasets by releasing a drift chamber dataset with full Monte Carlo (MC) simulation and detector response, together with a preprocessing pipeline (see section "Dataset for drift chambers"). To support fair comparison, we also propose specific evaluation metrics (see section "Evaluation metrics"). Subsequently, we implement an ML track reconstruction model based on GNNs [7] and compare it with traditional methods (see section "Benchmark experiments"). The results confirm the reliability of this dataset and the effectiveness of the evaluation metrics, establishing a robust, open platform for future research in ML-based track reconstruction.

2 Related work

ML-based track reconstruction has achieved notable progress [9–11], and a handful of public datasets have emerged to support this research. However, different research teams use different datasets and evaluation metrics, which hinders direct comparison of model performance.
The dataset of the TrackML Particle Tracking Challenge [12], as used by Samuel Van Stroud [13] and Rusov, D. I. [14], is generated with a generalized LHC-like detector and comes with its own evaluation metrics. Each event simulates one hard top quark–antiquark pair (tt̄) interaction overlaid with an additional 200 soft QCD interactions, reproducing the high-pileup conditions expected at the HL-LHC [15]. About 10⁴ particles and 10⁵ hits are simulated per event. To develop and evaluate particle reconstruction algorithms, Lukas Heinrich et al. [9] use the OpenDataDetector (ODD) [16] to generate simulated events. This virtual hermetic detector is designed to serve as a template for (HL-)LHC-style particle detectors, providing a standardized framework for algorithm research and development. In their work, they generate top–antitop quark pair (tt̄) events with a pileup of 200, corresponding to the challenging high-multiplicity scenarios typical of collider environments. Recently, the ColliderML dataset [17] was released as a large-scale, open, experiment-agnostic resource for high-luminosity LHC physics. It provides over one million fully simulated events across ten Standard Model and Beyond Standard Model processes, with realistic pileup overlay and ODD-based detector geometry. While ColliderML fills critical gaps for HL-LHC-oriented ML research, it still targets the high-pileup, high-multiplicity environment of future hadron colliders, which is fundamentally different from the low-background, low-multiplicity scenarios of precision experiments. Unlike the high-energy-frontier environment of the HL-LHC, precision flavor factories (e.g., BESIII [18] and BelleII [19]) operate with much lower backgrounds and prioritize precision measurements.
They feature substantially lower event multiplicity, cleaner event topologies and stricter requirements on momentum resolution and tracking efficiency, especially for low-momentum particles. This creates a gap: there is a shortage of datasets with simple track topologies that faithfully capture drift chamber characteristics, which are an essential resource for the fundamental validation and iterative development of track reconstruction methods tailored to τ-charm experiments. We aim to establish such a dataset to accelerate the development of ML methods for track reconstruction in high-precision physics experiments.

3 The cylindrical multilayer drift chamber

The cylindrical multilayer drift chamber measures the momentum and position of the tracks of final-state charged particles and identifies particle species by measuring the ionization energy loss (dE/dx) of charged particles in the gas. It is widely adopted in high energy physics experiments, including BESIII, CEPC [20], STCF [21], BelleII, COMET [22], MEGII [23] and FCC [24]. Our dataset is based on the Multilayer Drift Chamber (MDC) [25] of the BESIII spectrometer. BESIII, operating at the Beijing Electron Positron Collider II (BEPCII) [26] in Beijing, China, conducts particle physics research in the τ-charm energy region. Since 2009, BEPCII has accumulated approximately 10 billion J/ψ events, 2.7 billion ψ(2S) events and 20.3 fb⁻¹ of data at the ψ(3770) resonance [27].

Figure 1 shows the structure of the MDC and a 3D view of an event. Geometrically, the MDC has a length of 2400 mm, an inner radius of 59 mm and an outer radius of 800 mm, with a polar angle coverage of $-0.93 < \cos\theta < 0.93$. It consists of 6796 drift cells, each with a square-like structure.
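As a rough geometric check (not taken from the paper), the MDC outer radius quoted above can be related to the 1.0 T field and the $p_T > 0.15$ GeV/c requirement given in section 4.1. This sketch assumes a uniform axial field, a unit-charge track produced at the origin and no energy loss. It uses the standard relation $p_T\,[\mathrm{GeV}/c] = 0.2998\,B\,[\mathrm{T}]\,R\,[\mathrm{m}]$: the transverse projection is a circle through the origin, so the track reaches at most a distance $2R$ from the beam axis and curls inside the chamber if $2R$ is below the outer radius.

```python
# Unit-charge relation: p_T [GeV/c] = 0.299792458 * B [T] * R [m]
C_FACTOR = 0.299792458


def curvature_radius_m(pt_gev: float, b_tesla: float = 1.0) -> float:
    """Radius of curvature (m) of a unit-charge track in a uniform axial field."""
    return pt_gev / (C_FACTOR * b_tesla)


def max_transverse_reach_m(pt_gev: float, b_tesla: float = 1.0) -> float:
    """Largest distance from the beam axis reached by a helix starting at the origin.

    The transverse projection is a circle passing through the origin, so its
    farthest point from the origin is one diameter (2R) away.
    """
    return 2.0 * curvature_radius_m(pt_gev, b_tesla)


def min_pt_to_reach(radius_m: float, b_tesla: float = 1.0) -> float:
    """Smallest p_T (GeV/c) whose track can reach a cylinder of the given radius."""
    return C_FACTOR * b_tesla * radius_m / 2.0


if __name__ == "__main__":
    outer_radius_m = 0.800  # MDC outer radius (800 mm)
    for pt in (0.10, 0.15, 0.50):
        reach = max_transverse_reach_m(pt)
        print(f"pT={pt:.2f} GeV/c  R={curvature_radius_m(pt):.3f} m  "
              f"2R={reach:.3f} m  leaves MDC radially: {reach > outer_radius_m}")
    print(f"threshold pT ~ {min_pt_to_reach(outer_radius_m):.3f} GeV/c")
```

Under these idealized assumptions the threshold comes out near 0.12 GeV/c, so tracks above the 0.15 GeV/c cut can leave the chamber radially instead of curling inside it. Energy loss, the real field map and tracks exiting through the endplates would shift this estimate.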
In terms of wire layering, the MDC has 43 sense-wire layers, grouped into superlayers of four sense-wire layers each, except for the outermost superlayer, which contains three layers (see table 1). The MDC operates in a 1.0 T magnetic field with a helium-based gas mixture as the working medium. The design single-wire spatial resolution is about 130 μm and the transverse momentum resolution is 0.5% at 1 GeV/c.

Figure 1. BESIII MDC structure (left) and 3D view of the event (right).

Table 1. MDC layer structure and geometry parameters.
Superlayer | Type | N_layer | N_wire per layer | Radius (mm) | Length (mm)
1 | U | 4 | 40, 44, 48, 56 | ~79–115 | 780–816
2 | V | 4 | 64, 72, 80, 80 | ~127–162 | 828–864
3 | A | 4 | 76, 76, 88, 88 | ~197–246 | 1092–1272
4 | A | 4 | 100, 100, 112, 112 | ~262–311 | 1442–1612
5 | A | 4 | 128, 128, 140, 140 | ~327–375 | 1782–1952
6 | U | 4 | 160 × 4 | ~400–448 | 2174–2192
7 | V | 4 | 176 × 4 | ~464–514 | 2198–2216
8 | U | 4 | 208 × 4 | ~530–579 | 2222–2240
9 | V | 4 | 240 × 4 | ~595–642 | 2246–2264
10 | A | 4 | 256 × 4 | ~667–716 | 2276–2294
11 | A | 3 | 288 × 3 | ~732–763 | 2300–2306
Notation: a × n denotes n layers, each with a wires. A: axial superlayers; U: stereo superlayers with negative tilt angle; V: stereo superlayers with positive tilt angle.

4 Dataset for drift chambers

4.1 Event simulation

The dataset in this work is generated using a GEANT4-based full simulation [28] in the BESIII Offline Software System (BOSS) [29]. To support foundational research and reduce the complexity of track reconstruction algorithms, only single-track and two-track events are included. To avoid complications from curled tracks in the MDC, we require a transverse momentum $p_T > 0.15$ GeV/c. The detailed simulation settings are listed in table 2. We plan to include dedicated support for low-$p_T$ curled tracks in future updates.

Table 2.
Kinematic settings for single-track and two-track event simulation.
Event type | $p_T$ [GeV/c] | $\cos\theta$ | $\phi$ [rad] | Particles
Single-track | 0.15 to 1.5 | −0.93 to 0.93 | 0 to 2π | e±, μ±, π±, K±, p, p̄
Conventional two-track | 0.15 to 1.5 | −0.93 to 0.93 | 0 to 2π | π⁺π⁻
Close-by two-track | 0.15 to 1.5 | −0.93 to 0.93 | Δφ = 0.2 | π⁺π⁻

For single-track events, each event contains one charged track, as illustrated in figure 2 (a). These events include five charged particle species: e±, μ±, π±, K±, and p/p̄. To ensure comprehensive and effective model training, all events are generated with kinematic parameters ($p_T$, $\cos\theta$, $\phi$) sampled uniformly over the accessible phase space. For two-track events, each event contains two charged tracks, as shown in figure 2 (b) and (c). These are further divided into two types: conventional two-track events (figure 2 (b)), where the azimuthal angle difference Δφ between the two tracks is unconstrained, and close-by two-track events (figure 2 (c)), where Δφ between the two tracks is constrained to a narrow range. For both types, the kinematic parameters of each individual track in a two-track event are sampled uniformly over the accessible phase space.

To reproduce the experimental conditions, all simulated events are mixed with noise, including beam-induced backgrounds and detector noise measured in real data.

[Figure 2: three x-y event displays (X and Y in m, from −0.8 to 0.8) titled "Single-track Simulation", "Conventional two-track Simulation" and "Close-by two-track Simulation", with hits labeled Noise and Signal.]
Figure 2.
Displays of the simulated events in the x-y plane for a single-track event (left), a conventional two-track event (middle) and a close-by two-track event (right).

4.2 Data preprocessing

To build a high-quality dataset, we apply a series of selections to the simulated events. The detailed steps are as follows.

Event-level selection. Using MC truth, we identify and remove events originating from non-signal processes (e.g., non-target decays). This truth-level veto suppresses background contamination, improves sample purity and prevents the model from learning spurious correlations unrelated to the signal.

Track-level selection. To ensure that the tracks used for ML training have a sufficient number of hits for fitting (see section "Track finding and fitting"), only tracks that traverse at least 6 layers in the MDC are retained. For any track that fails this minimum-layer requirement, all of its hits are relabeled as noise hits. This operation effectively filters out short tracks.

4.3 Dataset description

The dataset is stored in Comma-Separated Values (CSV) format, chosen for its compatibility, readability and ease of parsing across programming languages and analysis frameworks. Each row represents a single detector hit in the MDC. To support noise-hit filtering, track finding and global track fitting tasks, this hit-centric dataset defines its features and labels as follows.

Features. Input features are derived from individual drift chamber hit measurements and capture both the spatial and the physical measurement properties of each hit. Spatial features describe the geometric position and hierarchical structure of the sense wire associated with each hit.

• middleX, middleY
The Cartesian coordinates of the midpoint of the sense wire, which spans the two ends of the MDC (in cm).
• layer, slayer, locallayer
As described in section "The cylindrical multilayer drift chamber", layer is the global layer index (ranging from 0 to 42, corresponding to the 43 sense-wire layers); slayer is the superlayer index (ranging from 0 to 10, for the 11 superlayers); locallayer is the layer index within its parent superlayer (ranging from 0 to 3 for most superlayers and from 0 to 2 for the outermost superlayer).

Measurement features characterize the physical signal recorded for the hit, specifically the drift time measurement.

• rawDriftDist, rawDriftDistErr
rawDriftDist is the drift distance in the cell, derived from the measured drift time by an initial T-X (time–distance) calibration (in cm); rawDriftDistErr is the estimated uncertainty of rawDriftDist (in cm).

Labels. Labels are divided into two levels, hit-level and track-level, allowing the model first to distinguish signal hits from noise at the hit level and then to learn track-level parameters (e.g., momentum, position and charge). Hit-level labels are assigned to each individual hit and are primarily used for hit classification, noise suppression and hit-to-track grouping.

• isSignal
According to MC truth, isSignal is 1 for signal hits and 0 for noise hits. This label is crucial for training the model to reject noise.

• trackIndex
The unique identifier of the simulated particle to which the hit belongs. Signal hits from the same simulated particle share the same trackIndex, with trackIndex > 0. This label enables supervised learning of hit-to-track association and serves as ground truth for ML-based track reconstruction methods.

• scaledFltLen
The path length along the track from the particle's production vertex to the hit position, normalized by the circumference of the corresponding helix turn.
• lrAmbig
The hit left-right flag, a binary label indicating on which side of the sense wire the hit lies in the local wire coordinate system.

Track-level labels are assigned to each MC-simulated particle and provide the track parameters and spatial information needed for supervised learning.

• initialMomX, initialMomY, initialMomZ
The momentum vector components of the particle at the point of closest approach (POCA) to the origin O(0, 0, 0) (in GeV/c). These values serve as ground-truth targets for momentum regression.

• initialPosX, initialPosY, initialPosZ
The Cartesian coordinates of the particle's POCA to the origin (in cm). These values provide ground truth for vertex regression.

• charge
Signed charge of the track (+1 or −1).

The separation of hit-level and track-level labels supports various learning tasks, such as binary classification (signal vs. noise), clustering (hit clustering for track finding) and regression (track parameters). Additional feature and label details are not elaborated on here; please refer to the official documentation of the dataset, whose access instructions are given in section "Dataset access".

4.4 Dataset access

For members of the BESIII Collaboration, the dataset is available for direct download from the IHEP AI Platform [30] at https://ai.ihep.ac.cn. To support cross-disciplinary collaboration on this dataset, external researchers may request access by emailing hepai@ihep.ac.cn with a short description of their research objectives and intended use. Requests are subject to approval by the BESIII Software Group.

5 Evaluation metrics

To assess track reconstruction performance and facilitate fair comparison among ML-based methods, we introduce a set of specific evaluation metrics.
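All of these metrics start from per-track truth information, which in the hit-centric CSV layout of section 4.3 has to be rebuilt by grouping rows by trackIndex. The sketch below does this with the Python standard library only. It is not the official DCTracksMetrics code, and it makes several assumptions: the column names isSignal and trackIndex appear as CSV headers, the row position serves as a hit ID, and signal hits with trackIndex > 0 stand in for the detectable truth hits. Tracks with at least six such hits are treated as detectable truth tracks.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set

MIN_DETECTABLE_HITS = 6  # a detectable truth track needs at least six detectable truth hits


def load_hits(path: str) -> List[Dict[str, str]]:
    """Read all hit rows of a CSV file as dictionaries keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def truth_hits_by_track(rows: Iterable[Mapping[str, str]]) -> Dict[int, Set[int]]:
    """Map trackIndex -> set of hit IDs, keeping only signal hits.

    Assumptions: the row position serves as the hit ID, and signal hits
    (isSignal == 1 and trackIndex > 0) stand in for detectable truth hits.
    """
    tracks: Dict[int, Set[int]] = defaultdict(set)
    for hit_id, row in enumerate(rows):
        is_signal = int(float(row["isSignal"]))      # tolerate "1" or "1.0"
        track_index = int(float(row["trackIndex"]))
        if is_signal == 1 and track_index > 0:
            tracks[track_index].add(hit_id)
    return dict(tracks)


def detectable_tracks(truth: Mapping[int, Set[int]]) -> List[int]:
    """trackIndex values with enough truth hits to count as detectable truth tracks."""
    return sorted(t for t, hits in truth.items() if len(hits) >= MIN_DETECTABLE_HITS)


if __name__ == "__main__":
    # Toy event: 7 hits of track 1, 3 hits of track 2 and one noise hit.
    text = "layer,isSignal,trackIndex\n"
    text += "".join(f"{i},1,1\n" for i in range(7))
    text += "".join(f"{i},1,2\n" for i in range(3))
    text += "5,0,0\n"
    truth = truth_hits_by_track(csv.DictReader(io.StringIO(text)))
    print({t: len(h) for t, h in truth.items()}, detectable_tracks(truth))  # {1: 7, 2: 3} [1]
```

For real files, `truth_hits_by_track(load_hits(path))` would be applied per event. If one file holds many events, the rows would first have to be split by an event identifier, whose column name is not given in the text.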
The algorithms for these metrics are available on GitHub at https://github.com/lyqian1220/DCTracksMetrics.git.

Hit efficiency ($\epsilon_{\mathrm{hit}}$) is defined as the fraction of a particle's detectable truth hits that are correctly reconstructed and matched to that particle:

$$\epsilon_{\mathrm{hit}} = \frac{N^{\mathrm{matched}}_{\mathrm{hit}}}{N^{\mathrm{detectable}}_{\mathrm{hit}}}. \tag{5.1}$$

Here, $N^{\mathrm{matched}}_{\mathrm{hit}}$ is the number of reconstructed hits correctly matched to the particle and $N^{\mathrm{detectable}}_{\mathrm{hit}}$ is the number of readout-eligible MC truth hits from that particle (i.e., after overlay, digitization, thresholding and detector inefficiency losses).

Hit purity ($p_{\mathrm{hit}}$) is defined as the fraction of reconstructed hits assigned to a track that are correctly matched to the originating particle:

$$p_{\mathrm{hit}} = \frac{N^{\mathrm{matched}}_{\mathrm{hit}}}{N^{\mathrm{assigned}}_{\mathrm{hit}}}. \tag{5.2}$$

Here, $N^{\mathrm{assigned}}_{\mathrm{hit}}$ is the total number of reconstructed hits assigned to the track.

Track efficiency ($\epsilon_{\mathrm{track}}$) is defined as the fraction of detectable truth tracks for which a matched reconstructed track exists:

$$\epsilon_{\mathrm{track}} = \frac{N^{\mathrm{matched}}_{\mathrm{track}}}{N^{\mathrm{detectable}}_{\mathrm{track}}}. \tag{5.3}$$

Here, a simulated particle is called a detectable truth track if it has at least six detectable truth hits. $N^{\mathrm{detectable}}_{\mathrm{track}}$ denotes the number of detectable truth tracks in the samples and $N^{\mathrm{matched}}_{\mathrm{track}}$ denotes the subset that have a matched reconstructed track.

A reconstructed track is considered a matched track if it satisfies the track-matching criteria: $p_{\mathrm{hit}} > 0.50$, $\epsilon_{\mathrm{hit}} > 0.20$ and $N^{\mathrm{matched}}_{\mathrm{hit}} \geq 6$.
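A minimal sketch of equations (5.1) and (5.2) and of the track-matching criteria for one reconstructed track is given below. It is an illustration, not the official DCTracksMetrics implementation. It assumes hits are identified by integer IDs, noise hits carry trackIndex 0, and the "originating particle" of a reconstructed track is the truth particle that contributes most of its hits (a majority vote, which is our choice here).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Mapping, Optional, Set

PURITY_MIN = 0.50      # p_hit must exceed this
EFFICIENCY_MIN = 0.20  # eps_hit must exceed this
MIN_MATCHED_HITS = 6   # N_hit^matched must be at least this


@dataclass
class HitMatch:
    truth_id: Optional[int]  # majority truth particle, None if only noise hits
    n_matched: int           # N_hit^matched
    efficiency: float        # eps_hit, eq. (5.1)
    purity: float            # p_hit, eq. (5.2)

    @property
    def is_matched(self) -> bool:
        """Track-matching criteria: purity, efficiency and minimum hit count."""
        return (self.purity > PURITY_MIN and self.efficiency > EFFICIENCY_MIN
                and self.n_matched >= MIN_MATCHED_HITS)


def match_track(assigned_hits: Set[int],
                hit_to_truth: Mapping[int, int],
                detectable_hits: Mapping[int, Set[int]]) -> HitMatch:
    """Score one reconstructed track against MC truth.

    assigned_hits:   hit IDs assigned to the reconstructed track (N_hit^assigned)
    hit_to_truth:    hit ID -> trackIndex (0 for noise)
    detectable_hits: trackIndex -> set of detectable truth hit IDs
    """
    votes = Counter(hit_to_truth.get(h, 0) for h in assigned_hits)
    votes.pop(0, None)  # noise hits never count towards a particle
    if not assigned_hits or not votes:
        return HitMatch(None, 0, 0.0, 0.0)
    truth_id, _ = votes.most_common(1)[0]  # ties are broken arbitrarily
    truth_hits = detectable_hits.get(truth_id, set())
    n_matched = len(assigned_hits & truth_hits)
    efficiency = n_matched / len(truth_hits) if truth_hits else 0.0
    purity = n_matched / len(assigned_hits)
    return HitMatch(truth_id, n_matched, efficiency, purity)
```

For example, a reconstructed track holding 8 of a particle's 10 detectable hits, plus one hit from another particle and one noise hit, gets $\epsilon_{\mathrm{hit}} = p_{\mathrm{hit}} = 0.8$ and passes all three criteria.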
The $p_{\mathrm{hit}}$ threshold enforces hit purity: more than half of the hits assigned to the reconstructed track must originate from the same truth track. The $\epsilon_{\mathrm{hit}}$ threshold enforces hit efficiency: a minimum fraction of that truth track's detectable truth hits must be recovered. The $N^{\mathrm{matched}}_{\mathrm{hit}}$ requirement ensures a minimum number of hits for a stable helix fit and suppresses spurious candidates such as hit-sharing artifacts and random combinations. We define a fake track as one that fails the purity or efficiency requirement. If multiple reconstructed tracks satisfy the matching criteria for the same detectable truth track, the candidate with the highest $\epsilon_{\mathrm{hit}}$ is retained as the matched track and the remainder are termed clone tracks.

Track charge efficiency ($\epsilon_{\mathrm{track},q}$) is defined as the fraction of detectable truth tracks that are reconstructed with the correct charge:

$$\epsilon_{\mathrm{track},q} = \frac{N^{\mathrm{matched},\,q\text{-correct}}_{\mathrm{track}}}{N^{\mathrm{detectable}}_{\mathrm{track}}}. \tag{5.4}$$

Wrong charge rate ($R_{\mathrm{wrong},q}$) is defined as the fraction of detectable truth tracks that are reconstructed with the wrong charge:

$$R_{\mathrm{wrong},q} = \frac{N^{\mathrm{matched},\,q\text{-incorrect}}_{\mathrm{track}}}{N^{\mathrm{detectable}}_{\mathrm{track}}}. \tag{5.5}$$

Clone rate ($R_{\mathrm{clone}}$) is defined as the total number of clone tracks divided by the total number of detectable truth tracks:

$$R_{\mathrm{clone}} = \frac{N^{\mathrm{clone}}_{\mathrm{track}}}{N^{\mathrm{detectable}}_{\mathrm{track}}}. \tag{5.6}$$

Here, $N^{\mathrm{clone}}_{\mathrm{track}}$ denotes the total number of clone tracks in the samples.

Fake rate ($R_{\mathrm{fake}}$) is defined as the total number of fake tracks divided by the total number of detectable truth tracks:

$$R_{\mathrm{fake}} = \frac{N^{\mathrm{fake}}_{\mathrm{track}}}{N^{\mathrm{detectable}}_{\mathrm{track}}}. \tag{5.7}$$

Here, $N^{\mathrm{fake}}_{\mathrm{track}}$ denotes the total number of fake tracks in the samples.

To characterize the performance of the track finding and track fitting stages of track reconstruction separately, we define two metric sets.
For track finding, we report the track finding efficiency, track charge finding efficiency, clone finding rate, fake finding rate and wrong charge finding rate. For track fitting, we report the track fitting efficiency, track charge fitting efficiency, clone fitting rate, fake fitting rate and wrong charge fitting rate.

Finally, we evaluate the precision of the matched reconstructed track parameters, focusing on the transverse momentum $p_T$ of tracks with correct charge. The normalized residual is defined as

$$\eta_{p_T} = \frac{p_T^{\mathrm{reco}} - p_T^{\mathrm{MC}}}{p_T^{\mathrm{MC}}}. \tag{5.8}$$

The quantity $\eta_{p_T}$ is the deviation of the reconstructed $p_T$ from its MC truth value, relative to the MC truth $p_T$. The distribution of $\eta_{p_T}$ is typically Gaussian for unbiased reconstruction. The $p_T$ resolution is then quantified as the 68% coverage of the absolute residual distribution around its median:

$$r(p_T) = P_{68\%}\left(\left|\eta_{p_T} - P_{50\%}(\eta_{p_T})\right|\right), \tag{5.9}$$

where $P_q$ denotes the $q$-th quantile of the distribution and $P_{50\%}$ is the median [7]. For a normal distribution, this approximately equals the standard deviation.

6 Benchmark experiments

To validate the effectiveness of our dataset and evaluation metrics, and to establish a unified benchmark for downstream ML-based methods, we conduct a comparative study of two track finding approaches: a traditional method and an ML-based method. Both approaches are evaluated with and without subsequent track fitting. Notably, the results for the ML-based method are preliminary and serve as an exploratory starting point for future development.

6.1 Track finding and fitting

Baseline track finding.
The baseline track finding (called Baseline Finder in the following) employs the traditional track reconstruction algorithms [31–33] in BOSS to reconstruct track candidates from detector hits, assuming a uniform 1 T magnetic field and neglecting energy loss and multiple scattering. The Baseline Finder uses pattern dictionary matching, local track segment finding, the Hough transform and other techniques.

GNN track finding. The GNN-based track finding method (called GNN Finder hereafter) adopted in this work follows the end-to-end multi-track reconstruction framework [7] proposed by L. Reuter et al. This method processes raw detector hits without prior filtering, simultaneously predicting both the number of track candidates in an event and their track parameters. In a subsequent clustering step, hits are assigned to each predicted track candidate and passed to the track fitting stage. In our work, for single-track events, the positively and negatively charged counterparts of each of the five particle species were combined for training and validation, while the positively and negatively charged particles of each species were tested separately. For the conventional two-track and close-by two-track events, each category was trained, validated and tested independently. In terms of dataset scale, the training and validation samples comprise approximately 100,000 simulated events, split into training and validation subsets at a ratio of 9:1. For the independent test sets, each single-track sample (positive and negative charges separately) contains about 55,000 events and each two-track subcategory (conventional and close-by) contains around 25,000 events. In the future, we plan to provide a mixed-event dataset.
Researchers will then be able to use this dataset to jointly train, validate and test their models, which may improve performance and generalization.

Track fitting. The track finder provides initial estimates of the track parameters and the associated hits to the subsequent track fitting. First, to improve the quality of the track candidates from the track finder, a Runge-Kutta [34] fit corrects the tracks for energy loss, multiple scattering and the non-uniform magnetic field. Then, tracks are fitted with GenFit [35, 36], where mass hypotheses are applied in a Kalman filter fit and the track parameters are defined at the POCA to the origin. In the following, Baseline Fitter refers to the track collection obtained by fitting the outputs of the Baseline Finder, and likewise for the GNN Fitter.

6.2 Results

This section compares the GNN Finder and the Baseline Finder, both with and without track fitting. We evaluate the track reconstruction performance over the three event categories in our dataset (see table 2): single-track events, conventional two-track events and close-by two-track events. Results for single-track events are illustrated using π⁺ as a representative; those for other single-track particle species are similar and are omitted for brevity. The results cover "Hit efficiency and hit purity" and "Track finding and fitting efficiencies", followed by "Track parameter performance". Hereafter, we use $p_T^{\mathrm{MC}}$ and $\cos\theta^{\mathrm{MC}}$ to denote the transverse momentum and the cosine of the polar angle of detectable truth tracks, respectively. Example displays of reconstructed events are shown in figure 3 for the different event categories.
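Figures 4 and 5 present efficiencies as functions of $p_T^{\mathrm{MC}}$ and $\cos\theta^{\mathrm{MC}}$. For a track efficiency curve, each bin holds the fraction of detectable truth tracks in that bin that have a matched reconstructed track (a hit efficiency curve would instead average the per-track $\epsilon_{\mathrm{hit}}$). The binning sketch below uses illustrative choices that the text does not state: ten equal-width bins over the simulated 0.15 to 1.5 GeV/c range and a simple binomial uncertainty $\sqrt{\epsilon(1-\epsilon)/N}$ per bin.

```python
import math
from typing import List, Sequence, Tuple


def binned_efficiency(values: Sequence[float], passed: Sequence[bool],
                      lo: float, hi: float, n_bins: int
                      ) -> List[Tuple[float, float, float, int]]:
    """Efficiency per bin of a truth-level variable (e.g. p_T^MC or cos(theta^MC)).

    values: truth-level variable of each detectable truth track
    passed: whether that truth track has a matched reconstructed track
    Returns (bin_center, efficiency, binomial_error, n_tracks) for each bin;
    empty bins report efficiency and error as NaN.
    """
    if len(values) != len(passed):
        raise ValueError("values and passed must have the same length")
    width = (hi - lo) / n_bins
    totals = [0] * n_bins
    passing = [0] * n_bins
    for v, ok in zip(values, passed):
        if not lo <= v < hi:
            continue  # outside the histogram range
        i = min(int((v - lo) / width), n_bins - 1)  # guard against rounding at hi
        totals[i] += 1
        passing[i] += int(ok)
    out = []
    for i in range(n_bins):
        center = lo + (i + 0.5) * width
        n = totals[i]
        if n == 0:
            out.append((center, math.nan, math.nan, 0))
            continue
        eff = passing[i] / n
        out.append((center, eff, math.sqrt(eff * (1.0 - eff) / n), n))
    return out
```

With `lo=0.15`, `hi=1.5` and `n_bins=10`, the bin centres fall at 0.2175, 0.3525, and so on up to 1.4325 GeV/c. A binomial error collapses to zero when a bin is fully efficient, so an interval such as Clopper-Pearson or Wilson may be preferable near 100%.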
[Figure 3: three x-y event displays (X and Y in m, from −0.8 to 0.8) titled "Single-track Reconstruction", "Conventional two-track Reconstruction" and "Close-by two-track Reconstruction", with legend entries Noise, Condensation, Track 1, Correct Reco. (Track 1), Track 2, Correct Reco. (Track 2) and Wrong Reco.]

Figure 3. Displays of reconstructed events in the x-y plane for a single-track event (left), a conventional two-track event (middle) and a close-by two-track event (right). A condensation point on the track provides estimates of the track parameters. This concept is closely related to the GNN finding method we use.

6.2.1 Hit efficiency and hit purity

The hit efficiency ($\epsilon_{\mathrm{hit}}$) and hit purity ($p_{\mathrm{hit}}$) for tracks found by the GNN Finder and by the Baseline Finder are summarized in table 3. For single-track π⁺ and conventional two-track π⁺π⁻ events, the GNN Finder achieves hit efficiency and hit purity comparable to those of the Baseline Finder. Figure 4 shows the hit efficiency and hit purity as functions of $p_T^{\mathrm{MC}}$ and $\cos\theta^{\mathrm{MC}}$ for single-track events, comparing the GNN Finder and the Baseline Finder. The corresponding distributions for conventional two-track events are provided in appendix A. In contrast, for close-by two-track events, the GNN Finder shows a significant degradation in hit efficiency (82.68% versus 91.26% for the Baseline Finder), while its hit purity remains comparable. As this experiment is an initial exploration, we anticipate that future investigations will further refine ML methods to address such scenarios.
The corresponding distributions for close-by two-track events are presented in appendix A.

Table 3. Hit efficiency ($\epsilon_{\mathrm{hit}}$) and hit purity ($p_{\mathrm{hit}}$) for different event categories, in %.
Event type | Method | $\epsilon_{\mathrm{hit}}$ | $p_{\mathrm{hit}}$
Single-track (π⁺) | Baseline Finder | 92.24 ± 0.12 | 98.58 ± 0.05
Single-track (π⁺) | GNN Finder | 92.20 ± 0.12 | 98.91 ± 0.05
Conventional two-track (π⁺π⁻) | Baseline Finder | 90.87 ± 0.14 | 97.93 ± 0.07
Conventional two-track (π⁺π⁻) | GNN Finder | 91.62 ± 0.13 | 98.83 ± 0.05
Close-by two-track (π⁺π⁻) | Baseline Finder | 91.26 ± 0.16 | 97.95 ± 0.08
Close-by two-track (π⁺π⁻) | GNN Finder | 82.68 ± 0.21 | 97.89 ± 0.08

[Figure 4: four panels for single-track π⁺ events comparing Baseline Finder and GNN Finder; hit efficiency (top row) and hit purity (bottom row) versus $p_T^{\mathrm{MC}}$ [GeV/c] (left) and $\cos\theta^{\mathrm{MC}}$ (right).]

Figure 4. Hit efficiency and hit purity for tracks found by the GNN Finder and by the Baseline Finder. Results are shown as functions of $p_T^{\mathrm{MC}}$ (left column) and $\cos\theta^{\mathrm{MC}}$ (right column) for single-track π⁺ events.

6.2.2 Track finding and fitting efficiencies

The track finding and fitting efficiencies for tracks found by the GNN Finder and by the Baseline Finder are summarized in table 4.
For single-track $\pi^+$ and conventional two-track $\pi^+\pi^-$ events, the GNN Finder and GNN Fitter reach track efficiencies within 0.2 percentage points of their Baseline counterparts (slightly higher for single-track and slightly lower for conventional two-track events). The GNN methods, however, have a higher wrong-charge rate (0.27% versus 0.02% for single-track and 0.19% versus 0.04% for conventional two-track events at the finding stage), which lowers their track charge efficiency by 0.1 to 0.3 percentage points. The track efficiency and track charge efficiency for single-track $\pi^+$ events are shown as functions of $p_T^{\mathrm{MC}}$ and $\cos\theta^{\mathrm{MC}}$ in figure 5; the corresponding distributions for conventional two-track events are presented in appendix B.

Table 4. Track finding and fitting efficiencies for different event categories, in %.

Method | $\epsilon_{\text{track}}$ | $\epsilon_{\text{track},q}$ | $R_{\text{clone}}$ | $R_{\text{fake}}$ | $R_{\text{wrong},q}$
Single-track $\pi^+$ events
Baseline Finder | 99.71 ± 0.02 | 99.69 ± 0.02 | 0.07 ± 0.01 | 0.01 | 0.02 ± 0.01
GNN Finder | 99.81 ± 0.02 | 99.55 ± 0.03 | 0.00 | 0.01 | 0.27 ± 0.02
Baseline Fitter | 99.70 ± 0.02 | 99.68 ± 0.03 | 0.06 ± 0.01 | 0.01 | 0.02 ± 0.01
GNN Fitter | 99.75 ± 0.02 | 99.50 ± 0.03 | 0.00 | 0.01 | 0.25 ± 0.02
Conventional two-track $\pi^+\pi^-$ events
Baseline Finder | 99.63 ± 0.03 | 99.59 ± 0.03 | 0.10 ± 0.01 | 0.01 ± 0.01 | 0.04 ± 0.01
GNN Finder | 99.50 ± 0.03 | 99.31 ± 0.04 | 0.00 | 0.02 ± 0.01 | 0.19 ± 0.02
Baseline Fitter | 99.62 ± 0.03 | 99.59 ± 0.03 | 0.10 ± 0.01 | 0.01 ± 0.01 | 0.03 ± 0.01
GNN Fitter | 99.45 ± 0.04 | 99.29 ± 0.04 | 0.00 | 0.02 ± 0.01 | 0.16 ± 0.02
Close-by two-track $\pi^+\pi^-$ events
Baseline Finder | 99.55 ± 0.03 | 99.52 ± 0.03 | 0.13 ± 0.02 | 0.02 ± 0.01 | 0.03 ± 0.01
GNN Finder | 76.22 ± 0.20 | 75.44 ± 0.21 | 0.14 ± 0.02 | 0.32 ± 0.03 | 0.77 ± 0.04
Baseline Fitter | 99.53 ± 0.02 | 99.50 ± 0.03 | 0.12 ± 0.02 | 0.01 | 0.03 ± 0.01
GNN Fitter | 75.85 ± 0.20 | 75.27 ± 0.21 | 0.12 ± 0.02 | 0.20 ± 0.02 | 0.58 ± 0.04

[Figure 5 panels: track efficiency and track charge efficiency versus $p_T^{\mathrm{MC}}$ (GeV/$c$) and versus $\cos\theta^{\mathrm{MC}}$ for single-track $\pi^+$ events; GNN Finder, Baseline Finder, GNN Fitter, Baseline Fitter.]

Figure 5. Track efficiency and track charge efficiency for tracks found by both the GNN Finder (orange) and the Baseline Finder (blue), with and without fitting. Results are shown as functions of $p_T^{\mathrm{MC}}$ (left column) and $\cos\theta^{\mathrm{MC}}$ (right column) for single-track $\pi^+$ events.

In contrast, the performance gap becomes pronounced for close-by two-track events. The track efficiency drops to 76.22% for the GNN Finder and 75.85% for the GNN Fitter, compared with about 99.5% for the Baseline. In addition, the wrong-charge rate rises sharply to 0.77% (GNN Finder) and 0.58% (GNN Fitter), far above the Baseline's 0.03%. The corresponding distributions are presented in appendix B.
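The efficiencies and rates in tables 3 and 4 are quoted with uncertainties of a few hundredths of a percent, and some entries (such as the 0.00% clone rates of the GNN methods) carry none. How these intervals were constructed is not stated in this section. Purely as an illustration, and not as the authors' method, the sketch below computes the Wilson score interval, one standard choice for a binomial fraction $k/n$; the function name and the track count in the usage line are hypothetical.

```python
import math


def wilson_interval(k, n, z=1.0):
    """Wilson score interval (lower, upper) for a binomial fraction k/n.

    Illustrative choice of interval; the paper's construction is not specified here.
    z = 1.0 corresponds to an approximately 68% (one-sigma) interval.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1.0 - p) / n + z2 / (4 * n * n))
    return max(0.0, centre - half), min(1.0, centre + half)


# Zero clones among a hypothetical 10,000 reconstructed tracks:
lo, hi = wilson_interval(0, 10_000)
print(f"{100 * lo:.3f}% to {100 * hi:.3f}%")  # prints: 0.000% to 0.010%
```

Unlike the normal approximation $\sqrt{\epsilon(1-\epsilon)/n}$, the Wilson interval does not collapse to zero width when $k = 0$ or $k = n$, which is why it is often preferred for rates that sit at or near zero.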
6.2.3 Track parameter performance

The trajectory of a charged track in a uniform magnetic field can be represented by a helix defined by five track parameters $(d_r, \phi_0, \kappa, d_z, \tan\lambda)^T$, evaluated at the point of closest approach (POCA) to the origin. An animated visualization of this helix parametrization for a particle trajectory is available at https://lyqian1220.github.io/; a static illustration is shown in figure 6. The five track parameters are defined as follows:
• $d_r$ is the signed distance from the POCA to the origin in the $x$-$y$ plane (in cm). Its sign is that of the $z$-component of $\vec{d} \times \vec{p}$, where $\vec{d}$ is the vector from the origin to the track and $\vec{p}$ is the tangent to the track direction.
• $\phi_0$ is the azimuthal angle of the POCA relative to the helix center in the transverse plane (in rad), ranging from 0 to $2\pi$.
• $\kappa$ is the reciprocal of the transverse momentum $p_T$ (in (GeV/$c$)$^{-1}$); its sign gives the charge of the track.
• $d_z$ is the $z$-coordinate of the POCA relative to the origin (in cm).
• $\tan\lambda$ is the slope of the track, i.e. the tangent of the dip angle $\lambda$. The polar angle of the track is $\theta \equiv \pi/2 - \lambda$.

Figure 6. Helix parametrization of a particle trajectory.

We evaluate the transverse momentum resolution for reconstructed tracks with correct charge that are found by both the GNN Finder and the Baseline Finder. As shown in figure 7, the relative transverse momentum resolution of the GNN Finder is worse than that of the Baseline Finder, whereas after fitting the GNN Fitter is comparable to the Baseline Fitter. In close-by two-track events, however, the GNN Finder exhibits a clearly worse resolution. The detailed distributions of track parameters for both the GNN Finder and the Baseline Finder, with and without fitting, are shown in appendix C.
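The relations listed above are enough to recover the binning variables of figures 4, 5 and 7 ($p_T$ and $\cos\theta$) from a track's helix parameters, and the relative residual plotted in figure 7 can then be summarized bin by bin. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the spread estimator (population standard deviation) and the bin edges are illustrative choices that may differ from those used for figure 7, and mapping $d_r$, $\phi_0$ and $d_z$ to a three-dimensional POCA and momentum vector is left out because it depends on the experiment's helix convention. `signed_dr` only applies the sign rule stated above to a given POCA vector and transverse momentum direction.

```python
import math
from bisect import bisect_right
from statistics import pstdev

# Hypothetical helper names; only relations stated in the text are used.


def helix_kinematics(kappa, tan_lambda):
    """Kinematics implied by kappa and tan(lambda).

    |kappa| = 1/p_T with kappa in (GeV/c)^-1, sign(kappa) = charge,
    and theta = pi/2 - lambda.
    """
    if kappa == 0:
        raise ValueError("kappa = 0 would mean infinite transverse momentum")
    pt = 1.0 / abs(kappa)                        # GeV/c
    lam = math.atan(tan_lambda)                  # dip angle lambda
    return {
        "pt": pt,
        "charge": 1 if kappa > 0 else -1,
        "pz": pt * tan_lambda,                   # GeV/c
        "p": pt * math.hypot(1.0, tan_lambda),   # |p| = p_T * sqrt(1 + tan^2 lambda)
        "theta": math.pi / 2 - lam,              # polar angle
        "cos_theta": math.sin(lam),              # cos(pi/2 - lambda) = sin(lambda)
    }


def signed_dr(poca_x, poca_y, px, py):
    """Signed transverse POCA distance: |d| with the sign of (d x p)_z.

    (poca_x, poca_y) is the vector d from the origin to the POCA and
    (px, py) the transverse momentum direction at the POCA.
    """
    cross_z = poca_x * py - poca_y * px
    return math.copysign(math.hypot(poca_x, poca_y), cross_z)


def relative_pt_resolution(pt_mc, pt_pred, edges):
    """Spread of (p_T^pred - p_T^MC) / p_T^MC in bins of p_T^MC.

    Bins are half-open [edges[i], edges[i+1]); tracks outside are ignored.
    The population standard deviation is an assumed estimator; bins with
    fewer than two tracks return NaN.
    """
    n_bins = len(edges) - 1
    residuals = [[] for _ in range(n_bins)]
    for mc, pred in zip(pt_mc, pt_pred):
        i = bisect_right(edges, mc) - 1
        if 0 <= i < n_bins:
            residuals[i].append((pred - mc) / mc)
    return [pstdev(r) if len(r) > 1 else float("nan") for r in residuals]
```

For example, `helix_kinematics(-2.0, 0.0)` returns $p_T = 0.5$ GeV/$c$, charge $-1$ and $\cos\theta = 0$, i.e. a negative track emitted perpendicular to the $z$ axis. A robust alternative to the standard deviation, such as a Gaussian fit width or a central 68% interval, would be less sensitive to the tails from badly reconstructed tracks.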
[Figure 7 panels: relative resolution $(p_T^{\mathrm{pred}} - p_T^{\mathrm{MC}})/p_T^{\mathrm{MC}}$ (logarithmic scale) versus $p_T^{\mathrm{MC}}$ (GeV/$c$) for single-track $\pi^+$, conventional $\pi^+\pi^-$ and close-by $\pi^+\pi^-$ events; Baseline Finder, GNN Finder, Baseline Fitter, GNN Fitter.]

Figure 7. Relative transverse momentum resolution for tracks found by both the GNN Finder and the Baseline Finder, with and without fitting. Results are shown as a function of $p_T^{\mathrm{MC}}$ for single-track events (left), conventional two-track events (middle) and close-by two-track events (right).

7 Conclusion

This work presents an open dataset for ML-based track reconstruction, built from a realistic simulation of the drift chamber and its detector response. It covers the phase space within the detector acceptance and includes single-track, conventional two-track and close-by two-track events with realistic noise overlay. We also establish a set of evaluation metrics and carry out benchmark experiments on this dataset. Preliminary results show that the GNN Finder achieves track reconstruction performance comparable to that of the Baseline Finder for single-track and conventional two-track events, but its performance degrades significantly for close-by two-track events, where its track efficiency falls to 76.22% compared with 99.55% for the Baseline Finder. In summary, this work addresses the shortage of drift chamber track reconstruction datasets and provides specific evaluation metrics for a fair and reproducible comparison of ML-based tracking methods, with the aim of fostering further advances in the field.
8 Outlook

Future work will first focus on improving the richness and applicability of the dataset. It will be extended to include real data in addition to MC simulation, covering displaced tracks, curved tracks and physics events. In addition, a sample covering both the inner tracker and the drift chamber will be provided. This will significantly enrich the diversity of the data and make the dataset more representative of real experimental scenarios, which should ultimately help improve reconstruction performance, rare-signal sensitivity and discovery potential. Moreover, evaluation with the baseline finding and fitting methods will be enabled via public interfaces; at present, access is available only upon request and formal coordination with our team.

A Hit efficiency and hit purity

Figure 8 shows the hit efficiency and hit purity for conventional two-track $\pi^+\pi^-$ events. For close-by two-track $\pi^+\pi^-$ events, the performance is illustrated in figure 9; there, the hit efficiency of the GNN Finder is degraded mainly in events with high transverse momentum and large polar angles.

[Figure 8 panels: hit efficiency and hit purity versus $p_T^{\mathrm{MC}}$ (GeV/$c$) and versus $\cos\theta^{\mathrm{MC}}$ for conventional two-track $\pi^+\pi^-$ events; Baseline Finder and GNN Finder.]

Figure 8. Hit efficiency and hit purity for tracks found by both the GNN Finder and the Baseline Finder.
Results are shown as functions of $p_T^{\mathrm{MC}}$ (left column) and $\cos\theta^{\mathrm{MC}}$ (right column) for conventional two-track $\pi^+\pi^-$ events.

[Figure 9 panels: hit efficiency and hit purity versus $p_T^{\mathrm{MC}}$ (GeV/$c$) and versus $\cos\theta^{\mathrm{MC}}$ for close-by two-track $\pi^+\pi^-$ events; Baseline Finder and GNN Finder.]

Figure 9. Hit efficiency and hit purity for tracks found by both the GNN Finder and the Baseline Finder. Results are shown as functions of $p_T^{\mathrm{MC}}$ (left column) and $\cos\theta^{\mathrm{MC}}$ (right column) for close-by two-track $\pi^+\pi^-$ events.

B Track finding and fitting efficiencies

Figures 10 and 11 show the track efficiency and track charge efficiency for the GNN Finder and the Baseline Finder, with and without fitting, in conventional two-track $\pi^+\pi^-$ and close-by two-track $\pi^+\pi^-$ events.

[Figure 10 panels: track efficiency and track charge efficiency versus $p_T^{\mathrm{MC}}$ (GeV/$c$) and versus $\cos\theta^{\mathrm{MC}}$ for conventional two-track $\pi^+\pi^-$ events; GNN Finder, Baseline Finder, GNN Fitter, Baseline Fitter.]

Figure 10. Track efficiency and track charge efficiency for tracks found by both the GNN Finder (orange) and the Baseline Finder (blue), with and without fitting. Results are shown as functions of $p_T^{\mathrm{MC}}$ (left column) and $\cos\theta^{\mathrm{MC}}$ (right column) for conventional two-track $\pi^+\pi^-$ events.

[Figure 11 panels: track efficiency and track charge efficiency versus $p_T^{\mathrm{MC}}$ (GeV/$c$) and versus $\cos\theta^{\mathrm{MC}}$ for close-by two-track $\pi^+\pi^-$ events; GNN Finder, Baseline Finder, GNN Fitter, Baseline Fitter.]

Figure 11. Track efficiency and track charge efficiency for tracks found by both the GNN Finder (orange) and the Baseline Finder (blue), with and without fitting. Results are shown as functions of $p_T^{\mathrm{MC}}$ (left column) and $\cos\theta^{\mathrm{MC}}$ (right column) for close-by two-track $\pi^+\pi^-$ events.

C Track parameters

Figures 12, 13 and 14 present the track parameters ($d_r$, $\phi_0$, $\kappa$, $d_z$ and $\tan\lambda$) of the MC truth and of tracks found by the Baseline Finder and the GNN Finder, with and without fitting, for single-track $\pi^+$, conventional two-track $\pi^+\pi^-$ and close-by two-track $\pi^+\pi^-$ events, respectively. The track parameter distributions for other single-track particle species are analogous to those of $\pi^+$ and are thus omitted for brevity.
[Figure 12 panels: number of tracks versus $p_T$ (GeV/$c$), $d_r$ (cm), $\phi_0$ (rad), $\kappa$ ((GeV/$c$)$^{-1}$), $d_z$ (cm) and $\tan\lambda$ for single-track $\pi^+$ events; MC Truth, Baseline Finder, GNN Finder, Baseline Fitter, GNN Fitter.]

Figure 12. Track parameters of single-track $\pi^+$ events.

[Figure 13 panels: the same distributions as in figure 12 for conventional two-track $\pi^+\pi^-$ events.]

Figure 13. Track parameters of conventional two-track $\pi^+\pi^-$ events.
[Figure 14 panels: the same distributions as in figure 12 for close-by two-track $\pi^+\pi^-$ events.]

Figure 14. Track parameters of close-by two-track $\pi^+\pi^-$ events.

Acknowledgments

We thank Mingrun Li for developing the pybes3 Python package, which supports the development of our evaluation metrics algorithm.

Funding. This work is supported by the National Natural Science Foundation of China (NSFC) under Contracts Nos. 12575207, 12175259, 12175124 and 12375197. Additional support was provided by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grants Nos. XDA0480600 and XDA0480203.

Interests. The authors declare that they have no conflict of interest.

References

[1] I. Balossino et al., The CGEM-IT: An Upgrade for the BESIII Experiment, Symmetry 14 (2022).
[2] Belle II collaboration, Snowmass Whitepaper: The Belle II Detector Upgrade Program, arXiv (2022) [arXiv:2203.11349].
[3] D. Yin et al., Recent advances of experiment and simulation on luminosity performance at BEPCII, Nuclear Instruments and Methods in Physics Research Section A 1074 (2025) 170289.
[4] Z. Yao et al., Pattern-Matching Track Reconstruction for the BESIII Main Drift Chamber, Chinese Physics C 31 (2007) 570.
[5] J.-K. Wang et al., BESIII track fitting algorithm, Chin. Phys. C 33 (2009) 870.
[6] A. Correia et al., Graph Neural Network-Based Pipeline for Track Finding in the Velo at LHCb, in Connecting The Dots 2023, 6, 2025 [arXiv:2406.12869].
[7] L. Reuter et al., End-to-End Multi-track Reconstruction Using Graph Neural Networks at Belle II, Comput. Softw. Big Sci. 9 (2025) 6.
[8] L. Plini, G. Tinti, T. Spadaro and F. Galasso, Graph Neural Networks for particle tracking in NA62 Experiment, Nuovo Cim. C 48 (2025) 149.
[9] L. Heinrich et al., Combined track finding with GNN & CKF, in 8th International Connecting the Dots Workshop (CTD 2023), 1, 2024 [arXiv:2401.16016].
[10] J. Duarte and J.-R. Vlimant, Graph Neural Networks for Particle Tracking and Reconstruction, [arXiv:2012.01249].
[11] C. Biscarat et al., Towards a realistic track reconstruction algorithm based on graph neural networks for the HL-LHC, EPJ Web Conf. 251 (2021) 03047 [arXiv:2103.00916].
[12] Kaggle, CERN, “TrackML Particle Tracking Challenge.” https://www.kaggle.com/competitions/trackml-particle-identification, 2018.
[13] S. Van Stroud et al., Transformers for Charged Particle Track Reconstruction in High-Energy Physics, Phys. Rev. X 15 (2025) 041046 [arXiv:2411.07149].
[14] D.I. Rusov et al., Deep Learning Methods in High Luminosity Track Reconstruction Scenario: Applying TrackNET to TrackML Challenge, Phys. Part. Nucl. 56 (2025) 1599.
[15] O. Brüning and L. Rossi, The High Luminosity Large Hadron Collider – HL-LHC, in Advanced Series on Directions in High Energy Physics, vol. 31, pp. 1–53, World Scientific (2024), DOI.
[16] C. Allaire et al., “OpenDataDetector.” https://zenodo.org/records/6445359, 4, 2022. 10.5281/zenodo.6445359.
[17] D. Elitez et al., ColliderML: The First Release of an OpenDataDetector High-Luminosity Physics Benchmark Dataset, 2025. 10.48550/arXiv.2512.15230.
[18] BESIII collaboration, Design and Construction of the BESIII Detector, Nucl. Instrum. Meth. A 614 (2010) 345.
[19] I.H. de la Cruz, The Belle II experiment: fundamental physics at the flavor frontier, Journal of Physics: Conference Series 761 (2016) 012017.
[20] M.-Y. Liu et al., Simulation and reconstruction of particle trajectories in the CEPC drift chamber, Nuclear Science and Techniques 35 (2024) 128.
[21] STCF collaboration, STCF conceptual design report (Volume 1): Physics & detector, Frontiers of Physics 19 (2023).
[22] COMET collaboration, Design and construction of the cylindrical drift chamber for the COMET Phase-I experiment, Nuclear Instruments and Methods in Physics Research Section A 1069 (2024) 169926.
[23] MEGII collaboration, Towards a New μ → eγ Search with the MEG II Experiment: From Design to Commissioning, Universe 7 (2021).
[24] FCC collaboration, Future Circular Collider Feasibility Study Report Volume 1: Physics and Experiments, 2025. 10.17181/CERN.9DKX.TDH9.
[25] BESIII collaboration, Preliminary Design Report: The BESIII Detector, Tech. Rep. Institute of High Energy Physics, Chinese Academy of Sciences (2004).
[26] C. Yu et al., BEPCII Performance and Beam Dynamics Studies on Luminosity, in 7th International Particle Accelerator Conference, p. TUYA01, 2016, DOI.
[27] I. Garzia, Highlights from the BESIII experiment, in QCD@Work 2024: International Workshop on Quantum Chromodynamics - Theory and Experiment, vol. 314 of EPJ Web of Conferences, p. 00008, 2024, DOI.
[28] Z. Deng et al., BESIII simulation software, PoS ACAT (2007) 043.
[29] J. Zou, W. Li, Q. Ma et al., Offline data processing system of the BESIII experiment, Eur. Phys. J. C 84 (2024) 937.
[30] Institute of High Energy Physics, Chinese Academy of Sciences (CAS), “High Energy Physics AI Platform.” https://ai.ihep.ac.cn/.
[31] Q.-G. Liu et al., Track reconstruction using the TSF method for the BESIII main drift chamber, Chinese Physics C 32 (2008) 565.
[32] L.-K. Jia et al., Study of low momentum track reconstruction for the BESIII main drift chamber, Chin. Phys. C 34 (2010) 1866.
[33] J. Zhang et al., Low transverse momentum track reconstruction based on the Hough transform for the BESIII drift chamber, Radiat. Detect. Technol. Methods 2 (2018) 20.
[34] X. Ai, H.M. Gray, A. Salzburger and N. Styles, A non-linear Kalman filter for track parameters estimation in high energy physics, Nuclear Instruments and Methods in Physics Research Section A 1049 (2023) 168041.
[35] C. Höppner, S. Neubert, B. Ketzer and S. Paul, A novel generic framework for track fitting in complex detector systems, Nuclear Instruments and Methods in Physics Research Section A 620 (2010) 518.
[36] J. Rauch and T. Schlüter, GENFIT – a Generic Track-Fitting Toolkit, J. Phys. Conf. Ser. 608 (2015) 012042 [arXiv:1410.3698].
