ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis

Zhan Jin 1†, Yu Luo 1†, Yizhou Zhang 1†, Ziyang Cui 1†, Yuqing Wei 1, Xianchao Liu 1, Xueying Zeng 1*, Qing Zhang 2*

1 School of Mathematical Sciences, Ocean University of China, Qingdao, 266100, Shandong, China.
2 Department of Cardiology, Qilu Hospital (Qingdao), Cheeloo College of Medicine, Shandong University, No. 758 Hefei Road, Qingdao, 266000, Shandong, China.

*Corresponding author(s). E-mail(s): zxying@ouc.edu.cn; qingzhang2019@foxmail.com
Contributing authors: zjin@stu.ouc.edu.cn; luo yu@stu.ouc.edu.cn; zyz6596@stu.ouc.edu.cn; 3196148390@qq.com; 2659799366@qq.com; 2486063350@qq.com
† These authors contributed equally to this work.

Abstract

Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with reinforcement learning (RL)-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs Direct Preference Optimization (DPO) to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning the policy toward geometrically complete vessel structures rather than pixel-wise overlap metrics. The reasoning module formulates stenosis localization as a Markov Decision Process with an explicit rejection mechanism that autonomously defers ambiguous anatomical candidates such as bifurcations and vessel crossings, shifting from coverage maximization to reliability optimization. On 1,400 clinical angiograms, ARIADNE achieves a state-of-the-art centerline Dice of 0.838 and reduces false positives by 41% compared to geometric baselines. External validation on the multi-center benchmarks ARCADE and XCAD confirms generalization across acquisition protocols.
This represents the first application of DPO for topological alignment in medical imaging, demonstrating that preference-based learning over structural constraints mitigates topological violations while maintaining diagnostic sensitivity in interventional cardiology workflows. The code is available at https://github.com/qimingfan10/ARIADNE.

Keywords: Coronary Angiography, Foundation Models, Direct Preference Optimization, Reinforcement Learning, Topological Consistency, Stenosis Detection

1 Introduction

Coronary Artery Disease (CAD) remains a leading cause of morbidity and mortality worldwide [1], requiring diagnostic modalities that provide accurate, reproducible, and efficient assessment. Invasive X-ray Coronary Angiography (XCA) serves as the primary tool for CAD diagnosis and guidance of Percutaneous Coronary Interventions (PCI) [2], offering the high temporal resolution necessary for visualizing hemodynamic flow [3]. However, current clinical workflows rely heavily on manual interpretation, a process characterized by significant inter-observer variability and susceptibility to clinician fatigue [4]. As healthcare institutions universally adopt Picture Archiving and Communication Systems (PACS), a critical gap persists between passive image storage and active, automated clinical interpretation. While hospitals have implemented digital image storage, they lack automated systems capable of transforming raw imaging data into actionable clinical insights. The growing volume of interventional procedures makes purely manual interpretation increasingly unsustainable, creating demand for Computer-Aided Diagnosis systems that can bridge the gap between data acquisition and clinical decision-making.

Accurate segmentation of the coronary vascular tree represents a fundamental prerequisite for automated coronary analysis.
Over the past decade, Convolutional Neural Networks (CNNs), particularly U-Net [5] and its attention-enhanced variants such as CS-Net and SA-UNet [6, 7], have dominated the field. More recently, Vision Transformers (ViTs) have been introduced to capture global spatial relationships [8]. Despite achieving high pixel-level performance metrics, these models face a critical limitation in preserving vascular topology. Traditional loss functions, including Cross-Entropy and Dice Loss [9], optimize pixel-level accuracy independently without explicitly penalizing topological errors [10]. Consequently, these models frequently produce fragmented vessel trees where distal branches appear disconnected, particularly due to signal loss in thin vessels during downsampling operations [11]. In coronary hemodynamics analysis, topological connectivity is essential; a segmentation with a high Dice score remains insufficient for clinical use if discontinuities prevent accurate centerline extraction and subsequent geometric analysis.

The recent emergence of foundation-scale Vision-Language Models (VLMs) has introduced a complementary approach to medical image segmentation. Models such as SAM3 [12] and MedSAM3 [13] leverage large language models to enable prompt-based segmentation, where textual descriptions guide mask generation. These architectures demonstrate impressive semantic understanding, correctly identifying what constitutes a vessel based on learned visual-linguistic correspondences. However, their training on generic natural image datasets creates a fundamental semantic-topological gap: while VLMs comprehend the conceptual category of a vascular structure, they lack the domain-specific anatomical priors necessary to enforce structural continuity in low-contrast, projection-based X-ray angiography.
Empirical evaluation reveals that general-purpose VLMs consistently produce semantically correct but topologically fragmented segmentations: they correctly classify pixels as vessel while failing to maintain the connected tree structure essential for hemodynamic modeling. This failure stems from their optimization objective: VLMs maximize pixel-level overlap (Dice, IoU [14]) between predicted and ground-truth masks, a criterion that remains agnostic to whether the resulting mask forms a continuous vascular network or a collection of disconnected segments. In coronary angiography, where vessel diameters approach image resolution limits and contrast variability is substantial, the absence of explicit topological constraints results in high-confidence predictions of isolated vessel fragments that are clinically unusable for stenosis quantification or flow analysis.

This limitation in segmentation directly impacts the accuracy of stenosis detection systems. Current automated frameworks predominantly follow a sequential approach where segmentation and stenosis detection are performed as independent tasks [15, 16]. In these systems, geometric algorithms traverse the segmented centerline to identify regions of narrowing. However, these deterministic algorithms lack the ability to distinguish pathological stenosis from common anatomical artifacts, including vessel crossings [17], bifurcations, and foreshortening [18], resulting in elevated false positive rates. Conversely, while deep object detectors such as YOLO have been applied to direct lesion identification [19], they inherently lack the capacity to verify anatomical plausibility. Specifically, generic object detectors treat lesions as isolated bounding boxes, failing to validate whether a detected stenosis actually resides within a continuous, hemodynamically relevant vascular segment.
These limitations have hindered clinical adoption due to the high rate of false alarms that reduce system reliability.

To address these fundamental challenges in coronary angiography automation, we propose ARIADNE (Anatomy-aware Reasoning for Integrated Angiography Diagnosis and Navigation Expert), a framework that bridges the gap between visual perception and clinical reasoning. Our central hypothesis is that robust diagnostic automation requires not only accurate visual recognition but also explicit alignment with the hierarchical reasoning patterns employed by expert clinicians. Building on recent advances in preference-based learning from the artificial intelligence community, we integrate Direct Preference Optimization (DPO) [20] with Reinforcement Learning (RL) [21] to create a two-stage diagnostic pipeline. In the perception stage, we apply DPO to fine-tune a vision-language foundation model (Sa2VA) [22, 23], using comparative preferences derived from centerline continuity profiles, quantified via clDice [10], to guide the model toward structurally coherent vessel segmentations. Unlike general VLMs that optimize for semantic correctness through pixel overlap, our preference-based approach explicitly rewards topological continuity, teaching the model that a mask with a 92% Dice score but preserved connectivity is preferable to a 95% Dice mask with fragmented branches. This enables the model to harness the semantic power of vision-language architectures while enforcing the geometric rigor required for hemodynamic analysis. To consolidate this topological reasoning against weak visual signals, we further incorporate a Hard Sample Focused Training (HSFT) strategy.
By concentrating optimization resources on the most diagnostically uncertain subsets, such as complex bifurcations and distal vessels, this mechanism achieves significant computational efficiency while ensuring robust performance in anatomically challenging regions where global statistics often mask local failures. The resulting topologically coherent vessel trees provide a clinically valid foundation for the reasoning stage: an RL-based navigation agent that performs sequential decision-making for stenosis detection. Critically, this agent incorporates an explicit rejection mechanism [24] that mirrors the clinical workflow where radiologists flag ambiguous cases for secondary review. By allowing the system to abstain from uncertain predictions, we shift the operational paradigm from maximizing coverage to maximizing reliability, thereby reducing false positive rates while maintaining high sensitivity for clear-cut lesions. This perception-to-reasoning architecture reflects the natural diagnostic workflow, where accurate anatomical reconstruction serves as the perceptual foundation for subsequent lesion localization and characterization.

This work makes three primary contributions to the automation of coronary angiography interpretation:

1. Perception Framework: We introduce a preference-based optimization approach that aligns VLMs with topological constraints in vessel segmentation. By applying DPO to comparative vessel tree examples, our method achieves topologically consistent segmentations without requiring pixel-level annotation of connectivity features, augmented by a hard-sample mining strategy that enhances computational efficiency in complex anatomical scenarios.
2. Reasoning Algorithm: We formulate stenosis detection as a sequential navigation task guided by RL, incorporating an explicit rejection mechanism that allows the system to defer ambiguous cases.
This clinical workflow-aligned approach substantially reduces false positive rates in anatomically complex regions while maintaining high sensitivity for definitive lesions.
3. Clinical Validation: We demonstrate that integrating topologically-aware perception with rejection-enabled reasoning achieves state-of-the-art diagnostic performance on standard coronary angiography benchmarks with a TPR of 0.867, supporting the hypothesis that anatomical validity is prerequisite to reliable automated diagnosis.

2 Methods

2.1 Framework Overview

To operationalize the clinical requirement for topological continuity in angiographic analysis, the proposed ARIADNE framework is designed to emulate the hierarchical decision-making process of human experts. As illustrated in Fig. 1 and Fig. 2, the system consists of two biomimetic stages that mirror the visual-cognitive workflow of expert interventional cardiologists: a perception module for anatomically consistent vascular reconstruction and a reasoning module for context-aware lesion localization.

Fig. 1 Training framework of Anatomy-Aware Segmentation

The perception module employs the Sa2VA foundation model [25] with a progressive training strategy designed to enforce topological continuity throughout the segmentation process. We integrate DPO [20] into the training pipeline to align model outputs toward geometrically complete vessel structures rather than fragmented pixel-level predictions. This preference-based learning approach guides the model to preserve vascular connectivity without requiring exhaustive manual annotation of topological features, generating vessel masks that maintain the vascular continuity essential for downstream hemodynamic analysis. The resulting segmentation masks maintain structural integrity across vessel hierarchies, providing a clinically reliable anatomical scaffold for downstream diagnostic reasoning.
Building upon this topologically consistent representation, the reasoning module operates as a structure-guided diagnostic agent that navigates the extracted vessel skeleton to identify stenotic lesions. Rather than applying fixed statistical thresholds, we develop an RL agent that analyzes local geometric features, including radius gradients and curvature patterns, to perform context-aware lesion localization. Critically, the agent incorporates an explicit rejection mechanism to filter false positive detections arising from complex anatomical structures such as vessel crossings, bifurcations, and foreshortening artifacts. The effectiveness of this rejection mechanism is fundamentally dependent on the structural consistency provided by the perception module, demonstrating the essential interdependence between anatomical reconstruction and diagnostic decision-making within the ARIADNE framework.

Fig. 2 Training framework of Structure-Guided Reasoning

2.2 Anatomy-Aware Perception Module via Preference Alignment

Coronary vessel segmentation requires bridging the semantic gap between low-level pixel intensities in fluoroscopic images and high-level anatomical knowledge of vascular topology. To achieve this integration, we employ the Sa2VA architecture [25], a visual-language foundation model designed to align angiographic representations with structured clinical priors. Formally, given an input angiographic image $x \in \mathbb{R}^{H \times W \times C}$, the architecture consists of three interdependent components that collectively enable topological alignment.

The vision encoder $E_v$, instantiated as InternViT-6B-448px [26], operates in a frozen state to extract robust high-semantic embeddings $z_v = E_v(x) \in \mathbb{R}^{N \times d_v}$, where $N$ represents the number of patch tokens and $d_v$ denotes the embedding dimension.
Rather than training exclusively on limited medical datasets, this generalized visual embedding space, pre-trained on large-scale natural images, provides stable feature representations across the varying contrast conditions and fluoroscopic noise characteristic of interventional cardiology.

The language modeling component $L$, based on InternLM2 [27], maps anatomical directives (e.g., "Segment the coronary artery") into a high-dimensional semantic space $z_l = L(\text{prompt}) \in \mathbb{R}^{d_l}$. This foundation model approach enables semantic representation of vascular terminology and supports potential generalization to complex clinical queries without architectural modification. Low-Rank Adaptation (LoRA) [28] with rank $r = 16$ is applied to adapt pre-trained linguistic knowledge to coronary anatomy while avoiding full-parameter retraining costs.

The integration between visual and linguistic streams forms the anatomical scaffold of the framework, where trainable projection layers align semantic embeddings with visual features to condition the SAM-2 mask decoder $D$ [29], yielding the segmentation mask $y = D(z_v, z_l) \in \{0, 1\}^{H \times W}$. By embedding clinical priors into the decoding process, this architecture suppresses background artifacts that possess similar pixel intensities to vessel structures but lack anatomical relevance, thereby focusing computational resources on topologically valid vascular components.

Traditional segmentation datasets consist of isolated static frames, creating ambiguity when anatomically complex structures or motion artifacts cannot be resolved without temporal context. To address this limitation, we implement a physiologically adaptive sampling strategy that leverages the continuous nature of angiographic video sequences.
Rather than uniform temporal sampling, which introduces redundancy through near-duplicate frames while missing critical physiological events, we extract key frames that capture distinct hemodynamic states across the cardiac cycle. Frame selection explicitly targets systolic and diastolic phases to ensure exposure to the full range of vessel deformation, including lumen diameter variations and wall motion. Sampling additionally encompasses environmental variations in fluoroscopic imaging angles, X-ray intensity, and contrast bolus propagation phases, specifically arterial inflow, peak opacification, and venous washout. To maximize diagnostic relevance, we identify temporal hard clusters: consecutive sequences exhibiting low-confidence predictions that correspond to anatomically challenging regions such as distal vessel terminals, bifurcation zones, and overlapping vessel segments. This temporal mining strategy concentrates training samples on scenarios where visual ambiguity is maximal, mirroring the clinical workflow where radiologists dedicate additional scrutiny to diagnostically uncertain regions.

To evolve the model from a general-purpose segmentation system into a domain-specialized tool capable of preserving vascular topology, we implement a three-stage progressive training strategy that advances from basic visual pattern recognition to structured anatomical reasoning. In Stage 1, we establish visual pattern alignment through parameter-efficient transfer learning by freezing the vision encoder $E_v$ to preserve generalized feature extraction capabilities while applying LoRA adapters to the InternLM2 language model and SAM-2 decoder. The model is trained on $N_1 = 1{,}220$ annotated angiogram samples by minimizing the Dice loss

$$\mathcal{L}_{\text{Dice}} = 1 - \frac{2\,|y \cap y^*|}{|y| + |y^*|}, \tag{1}$$
where $y^*$ denotes the ground truth mask. This initial alignment ensures the system can recognize vessel boundaries, tubular structures, and contrast-enhanced regions before addressing connectivity constraints. However, standard supervised objectives operate under pixel-wise independence assumptions, minimizing local discrepancies without explicitly penalizing topological violations such as vessel fragmentation.

Subsequently, to align the model with clinical reasoning principles that prioritize structural continuity, we incorporate topological constraints through DPO in Stage 2. Clinical validity is formally defined as topological connectivity: coronary arteries must form continuous tubular structures exhibiting $C^1$ continuity without artificial fragmentation, characterized by a single connected component (denoted by a Betti number of $\beta_0 = 1$) that preserves hemodynamic flow continuity. DPO enforces this constraint by formulating vascular connectivity as a preference learning problem where the objective is to maximize the likelihood margin between topologically valid and invalid segmentation states. Specifically, we construct a preference dataset $\mathcal{D}_{\text{pref}} = \{(x^{(i)}, y_w^{(i)}, y_l^{(i)})\}_{i=1}^{N_2}$, where preference pairs are defined by adherence to topological constraints rather than pixel-wise overlap metrics. The preferred (winning) sample $y_w$ is the ground truth segmentation, which satisfies global geometric constraints by exhibiting $\beta_0(y_w) = 1$ and preserving flow continuity. The non-preferred (losing) sample $y_l$ consists of hard negative examples mined from the Stage 1 policy $\pi_{S1}$: specifically, predictions with high pixel-level overlap $\text{Dice}(y_l, y^*) > 0.8$ but topological violations $\beta_0(y_l) > \beta_0(y_w)$, indicating vessel fragmentation or disjoint artifacts.
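The hard-negative criterion above (high Dice, but more connected components than the reference) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `betti0`, `dice`, and `is_hard_negative` are hypothetical, masks are assumed to be nested lists of 0/1 values, and 8-connectivity is an assumption since the text does not state which neighborhood is used.

```python
from collections import deque


def betti0(mask):
    """Count 8-connected foreground components (beta_0) in a 0/1 mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = 0
    for i in range(h):
        for j in range(w):
            if not mask[i][j] or seen[i][j]:
                continue
            components += 1  # new, unvisited foreground pixel starts a component
            seen[i][j] = True
            queue = deque([(i, j)])
            while queue:  # breadth-first flood fill over the 8-neighborhood
                a, b = queue.popleft()
                for da in (-1, 0, 1):
                    for db in (-1, 0, 1):
                        na, nb = a + da, b + db
                        if (0 <= na < h and 0 <= nb < w
                                and mask[na][nb] and not seen[na][nb]):
                            seen[na][nb] = True
                            queue.append((na, nb))
    return components


def dice(pred, truth):
    """Dice overlap between two 0/1 masks of equal shape."""
    inter = sum(p * t for rp, rt in zip(pred, truth) for p, t in zip(rp, rt))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 2.0 * inter / total if total else 1.0


def is_hard_negative(pred, truth, dice_min=0.8):
    """Losing-sample test: Dice(y_l, y*) > 0.8 and beta_0(y_l) > beta_0(y_w)."""
    return dice(pred, truth) > dice_min and betti0(pred) > betti0(truth)
```

For example, a 12-pixel vessel with a single missing pixel still scores a Dice of about 0.96, yet it splits into two components and therefore qualifies as a losing sample.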
The policy $\pi_\theta(y \mid x)$, representing the probability distribution over segmentation masks, is optimized to assign higher probability to topologically connected samples over fragmented predictions through the DPO objective:

$$\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}_{\text{pref}}} \left[ \log \sigma \left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right], \tag{2}$$

where $\pi_{\text{ref}}$ denotes the frozen Stage 1 policy serving as the reference model to prevent excessive deviation from learned visual features, $\beta = 0.1$ controls the KL-divergence penalty strength, and $\sigma$ represents the logistic sigmoid function. DPO optimizes the policy directly without training an explicit reward model, enabling efficient topological alignment through preference-based learning that guides the model toward geometrically complete vessel structures.

Finally, while DPO aligns the model with topological connectivity principles, performance remains inconsistent in anatomically complex scenarios where weak visual signals (low contrast, vessel overlap) destabilize the learned connectivity preference. To consolidate topological reasoning under diagnostic ambiguity, we implement HSFT in Stage 3, which concentrates computational resources on scenarios where clinical interpretation is most challenging. Rather than treating hard samples as statistical outliers, we identify temporal hard clusters (consecutive frames with Dice scores below threshold $\tau = 0.75$), which map to specific anatomical challenges: fine distal vessel terminals (diameter < 1 mm), bifurcation zones where multiple branches diverge, and overlapping vessel segments in oblique projections. We define the hard sample subset $\mathcal{D}_{\text{hard}} = \{(x, y^*) \in \mathcal{D} \mid \text{Dice}(\pi_\theta(x), y^*) < \tau\}$, which constitutes 20.8% of the dataset but accounts for the majority of topological errors.
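To make the DPO objective in Eq. (2) concrete, the sketch below evaluates it for preference pairs whose log-probabilities are already known. It is an illustration under stated assumptions rather than the training code: the function names are hypothetical, and how $\log \pi(y \mid x)$ is obtained for a full mask (for instance, as a sum of per-pixel log-likelihoods) is not specified in the text and is left to the caller.

```python
import math


def dpo_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log sigmoid(beta * [(log pi_w - log ref_w) - (log pi_l - log ref_l)])."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def dpo_batch_loss(pairs, beta=0.1):
    """Mean DPO loss over (logp_w, logp_l, ref_logp_w, ref_logp_l) tuples."""
    return sum(dpo_pair_loss(*pair, beta=beta) for pair in pairs) / len(pairs)
```

When the policy equals the reference, the margin is zero and the loss is $\log 2$; it falls below that value as soon as the policy raises the relative likelihood of the connected mask $y_w$ over the fragmented mask $y_l$.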
To enforce pixel-level accuracy in these regions while maintaining structural integrity, we apply the hybrid loss function

$$\mathcal{L}_{\text{HSFT}} = \mathcal{L}_{\text{Dice}} + \lambda \mathcal{L}_{\text{BCE}}, \tag{3}$$

with $\lambda = 0.5$, where the Binary Cross-Entropy component

$$\mathcal{L}_{\text{BCE}} = -\sum_{i,j} \left[ y^*_{ij} \log y_{ij} + (1 - y^*_{ij}) \log(1 - y_{ij}) \right] \tag{4}$$

provides the pixel-wise gradients needed to refine vessel boundaries at bifurcations and terminals, while $\mathcal{L}_{\text{Dice}}$ maintains global structural consistency. This progressive strategy achieves 5× resource efficiency by focusing training on the most diagnostically relevant subset of samples, ensuring robust topological preservation across the full spectrum of anatomical complexity encountered in clinical practice.

2.3 Structure-Guided Reasoning via Reinforcement Learning

Building upon the topologically consistent vessel segmentations established by the perception module (Fig. 1), the reasoning stage (Fig. 2) translates structural information into diagnostic outputs by performing context-aware stenosis localization. This module leverages the topological integrity of DPO-enhanced segmentations to construct a navigable anatomical scaffold from which diagnostic candidates are systematically generated. Morphological thinning is applied to the refined binary segmentation mask $y \in \{0, 1\}^{H \times W}$ to extract the discrete vessel centerline $\mathcal{C} = \{p_1, p_2, \ldots, p_N\}$, where $p_i \in \mathbb{R}^2$ denotes spatial coordinates; this centerline serves as the navigation trajectory for subsequent geometric analysis. For each point $p_i$ on the skeleton, the local vessel radius $r(p_i)$ is computed using a Euclidean distance transform $D(y)$, yielding $r(p_i) = \max\{d \mid B(p_i, d) \subset y\}$, where $B(p_i, d)$ denotes a ball of radius $d$ centered at $p_i$. This generates a one-dimensional radius profile $r : [0, L] \to \mathbb{R}^+$ parameterized by arc length $s$ along the vessel's longitudinal axis.
By analyzing this profile alongside its first and second derivatives $\nabla r(s) = \frac{dr}{ds}$ and $\nabla^2 r(s) = \frac{d^2 r}{ds^2}$, we identify morphological bottlenecks as candidate locations $P_{\text{cand}} = \{p \in \mathcal{C} \mid r(p) < \mu_r - k\sigma_r \wedge \nabla^2 r(p) > \theta_{\text{curv}}\}$, where $\mu_r$ and $\sigma_r$ denote the mean and standard deviation of the radius profile, $k = 1.5$ controls sensitivity, and $\theta_{\text{curv}}$ is a curvature threshold. This deterministic geometric process is deliberately configured for maximum sensitivity to generate a high-recall candidate set. However, this approach inherently produces false positives in anatomically complex regions such as bifurcations, vessel crossings, and natural tapering zones where geometric narrowing mimics pathological stenosis. These candidates therefore serve as initial proposals requiring subsequent verification, providing a high-recall, low-precision coordinate set that necessitates intelligent filtering through clinical reasoning.

To address the limitation of purely geometric detection, we formulate stenosis localization as a sequential decision-making process modeled as a Markov Decision Process (MDP). This formulation enables the system to perform context-aware diagnostic reasoning that distinguishes pathological stenoses from anatomical artifacts through analysis of local morphological patterns. Unlike static thresholding methods that apply fixed criteria uniformly across all vessel segments, RL allows the agent to adaptively evaluate each candidate based on its geometric neighborhood, mimicking the sequential visual inspection workflow employed by interventional cardiologists.
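The candidate rule for $P_{\text{cand}}$ can be sketched on a sampled radius profile as follows. Several choices here are assumptions made only for illustration: the function name `candidate_indices` is hypothetical, samples are taken to be uniformly spaced along the arc length, the second derivative is approximated by a central finite difference, the population standard deviation is used for $\sigma_r$, and `theta_curv` defaults to 0 because the text does not report its value.

```python
import statistics


def candidate_indices(radius, k=1.5, theta_curv=0.0):
    """Indices where r < mu_r - k * sigma_r and the second difference > theta_curv."""
    mu = statistics.fmean(radius)
    sigma = statistics.pstdev(radius)  # population std; an assumption
    candidates = []
    # Endpoints are skipped because the central difference needs both neighbors.
    for i in range(1, len(radius) - 1):
        second_diff = radius[i - 1] - 2.0 * radius[i] + radius[i + 1]
        if radius[i] < mu - k * sigma and second_diff > theta_curv:
            candidates.append(i)
    return candidates
```

On a profile that is 5 pixels wide everywhere except for a single sample of 2 pixels, only that sample is flagged; a perfectly uniform profile yields no candidates because nothing falls below the mean.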
The MDP is formally defined by the tuple $\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{T}, \mathcal{R}, \gamma)$, where $\mathcal{S}$ denotes the state space encoding local vessel geometry, $\mathcal{A}$ represents the action space of navigational commands, $\mathcal{T} : \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{S})$ defines the state transition function, $\mathcal{R} : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ specifies the reward function encoding clinical priorities, and $\gamma \in [0, 1)$ is the discount factor. The agent navigates the vascular skeleton to localize true stenoses while rejecting false alarms arising from benign anatomical variations.

Specifically, the state space $\mathcal{S} \subset \mathbb{R}^{16}$ encodes local morphological context at each candidate location. Each state vector $s_t = [r_{t-5:t+5}, \nabla r_{t-5:t+5}, Z_t, \kappa_t] \in \mathbb{R}^{16}$ comprises: (1) a normalized radius profile $r_{t-w:t+w}$ within a sliding window of half-width $w = 5$ centerline points, capturing the geometric progression of vessel lumen narrowing; (2) first-order derivatives $\nabla r_{t-w:t+w}$ quantifying morphological gradients to detect abrupt transitions characteristic of stenotic lesions; (3) the local Z-score $Z_t = \frac{r(p_t) - \mu_r}{\sigma_r}$, benchmarking the degree of narrowing against the vessel's baseline caliber to distinguish significant stenoses from normal anatomical variation; and (4) the local curvature $\kappa_t = \nabla^2 r(p_t)$, capturing geometric sharpness. The action space $\mathcal{A} = \{\text{Left}, \text{Right}, \text{Confirm}, \text{Reject}\}$ consists of discrete navigational commands, where the lateral movements $\{\text{Left}, \text{Right}\}$ implement the spatial translation $p_{t+1} = p_t \pm \Delta p$ with step size $\Delta p = 3$ pixels to enable fine positional adjustment toward the precise stenosis center.
Critically, the Reject action implements an explicit abstention mechanism that transitions the agent to the next candidate in $P_{\text{cand}}$ without issuing a diagnostic prediction, allowing autonomous dismissal of ambiguous candidates where local geometry superficially resembles pathology, such as bifurcation points where parent vessels exhibit natural narrowing as they split into smaller daughter branches. This rejection capability mirrors the clinical triage workflow where radiologists defer uncertain cases for secondary review rather than issuing potentially erroneous diagnoses, fundamentally shifting the operational paradigm from coverage maximization to reliability optimization and reducing false positive rates while maintaining high sensitivity for definitive lesions.

To align agent behavior with clinical diagnostic priorities, the reward function $\mathcal{R} : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ explicitly encodes the asymmetric costs of diagnostic errors, distinguishing between active detection failures, termed False Positives, and passive omission errors, or False Negatives. The reward function is formally structured to incentivize the correct rejection of anatomical artifacts while severely penalizing missed diagnoses:

$$R(s_t, a_t) = \begin{cases} r_{\text{TP}} & \text{if } a_t = \text{Confirm} \wedge \delta(p_t, G) \le \tau \quad \text{(True Positive)} \\ r_{\text{FP}} & \text{if } a_t = \text{Confirm} \wedge \delta(p_t, G) > \tau \quad \text{(False Positive)} \\ r_{\text{TN}} & \text{if } a_t = \text{Reject} \wedge \delta(p_t, G) > \tau \quad \text{(Correct Rejection)} \\ r_{\text{FN}} & \text{if } a_t = \text{Reject} \wedge \delta(p_t, G) \le \tau \quad \text{(False Negative)} \\ r_{\text{step}} & \text{otherwise} \end{cases} \tag{5}$$

where $\delta(p_t, G) = \lVert p_t - G \rVert_2$ represents the Euclidean distance to the nearest ground truth stenosis centroid $G$, and $\tau = 75$ pixels defines the localization tolerance.
Hyperparameters are calibrated to reflect safety-critical clinical constraints: $r_{\text{TP}} = +50$ rewards accurate localization; $r_{\text{FP}} = -10$ penalizes false alarms to reduce clinician fatigue; $r_{\text{TN}} = +10$ explicitly rewards the agent for correctly identifying and rejecting ambiguous artifacts such as vessel crossings; and $r_{\text{FN}} = -50$ imposes a maximal penalty for rejecting a true stenosis, ensuring high sensitivity. A step cost $r_{\text{step}} = -1$ encourages efficient navigation. The optimal policy $\pi^* : \mathcal{S} \to \Delta(\mathcal{A})$ maximizes the expected cumulative discounted reward $\mathbb{E}_\pi \left[ \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \right]$ with discount factor $\gamma = 0.99$. Policy optimization is performed using Proximal Policy Optimization (PPO) [30], which ensures stable gradient updates by constraining the policy update through the clipped surrogate objective:

$$\mathcal{L}_{\text{PPO}}(\theta) = \mathbb{E}_t \left[ \min \left( \rho_t(\theta) \hat{A}_t, \; \text{clip}(\rho_t(\theta), 1 - \epsilon, 1 + \epsilon) \hat{A}_t \right) \right] \tag{6}$$

where $\rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$ denotes the probability ratio between successive policy iterations, $\hat{A}_t$ is the generalized advantage estimate, and $\epsilon = 0.2$ defines the clipping range. This clipping mechanism prevents destructive updates that could destabilize the learned diagnostic strategy. The policy network $\pi_\theta$ employs a Multi-Layer Perceptron architecture with layer dimensions $[16 \to 256 \to 128 \to 64 \to |\mathcal{A}|]$ and ReLU activations, parameterized by weights $\theta \in \mathbb{R}^d$. Rather than employing recurrent architectures such as LSTMs or GRUs, this feedforward design enforces a strictly Markovian decision process where the policy $\pi_\theta(a_t \mid s_t)$ conditions exclusively on the current state $s_t$, ensuring diagnostic decisions remain invariant to the vessel's prior trajectory and maintaining computational efficiency with an inference time < 50 ms per candidate, suitable for real-time clinical deployment.
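The reward in Eq. (5), with the calibrated values listed above, can be written as a small lookup. This is a sketch for illustration only: the function names and the string encoding of actions are hypothetical, positions are assumed to be (x, y) pixel tuples, and a frame with no annotated stenosis is assumed to lie at infinite distance from any centroid.

```python
import math

# Reward constants and tolerance as reported in the text.
R_TP, R_FP, R_TN, R_FN, R_STEP = 50.0, -10.0, 10.0, -50.0, -1.0
TAU_PIXELS = 75.0


def nearest_distance(p, centroids):
    """Euclidean distance from p to the nearest ground-truth stenosis centroid."""
    return min((math.dist(p, g) for g in centroids), default=math.inf)


def reward(action, p, centroids, tau=TAU_PIXELS):
    """Asymmetric reward: terminal actions are scored, moves pay a step cost."""
    near_stenosis = nearest_distance(p, centroids) <= tau
    if action == "Confirm":
        return R_TP if near_stenosis else R_FP
    if action == "Reject":
        return R_FN if near_stenosis else R_TN
    return R_STEP  # "Left" or "Right"
```

The asymmetry is visible directly in the constants: rejecting a true stenosis costs five times more than confirming a false one, which pushes the agent to abstain only when the local geometry is clearly benign.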
3 Experiments

To validate the proposed perception-reasoning framework, the experimental design was structured to address two primary objectives: (1) evaluation of topological consistency in vascular segmentation across diverse angiographic conditions, and (2) assessment of stenosis detection accuracy and false positive management in anatomically complex scenarios.

3.1 Datasets and Sampling Strategy

The experimental foundation was established through a video-based acquisition strategy designed to capture morphological diversity in coronary angiography, supplemented by external validation on publicly available datasets to assess domain generalization.

A proprietary dataset was curated from coronary angiography video sequences acquired at Guizhou Aviation Industry Group 302 Hospital using a Siemens angiography system. The collection process utilized temporal information from video streams to ensure comprehensive representation of vessel morphology across 35 patients. The dataset comprises 1,400 high-resolution images at 512 × 512 resolution, with an average of 40 frames extracted per patient to capture varying vessel angulations and contrast conditions. To prevent the data leakage inherent in video-based acquisitions, where consecutive frames exhibit high temporal correlation, the dataset was partitioned at the patient level rather than the image level. Specifically, 25 patients, comprising 1,000 images, were allocated to the training set, while the validation and testing sets each contained 5 patients contributing 200 images, ensuring that no patient appeared in multiple partitions and thereby guaranteeing independent evaluation of model generalization.

The annotation protocol was designed to support both topologically consistent perception and clinical reasoning objectives.
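A patient-level partition of the kind described above can be sketched as follows. The text does not say how patients were assigned to partitions, so the seeded random shuffle, the function name `split_by_patient`, and the `(patient_id, frame)` record layout are all assumptions for illustration.

```python
import random


def split_by_patient(frames, n_train=25, n_val=5, n_test=5, seed=0):
    """Partition (patient_id, frame) records so no patient spans two splits."""
    patients = sorted({pid for pid, _ in frames})
    if len(patients) != n_train + n_val + n_test:
        raise ValueError("patient count does not match the requested split sizes")
    random.Random(seed).shuffle(patients)  # seeded for reproducibility
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }
    # Every frame follows its patient, so correlated frames stay together.
    return {name: [f for f in frames if f[0] in ids] for name, ids in groups.items()}
```

With 35 synthetic patients of 40 frames each, this yields 1,000 training, 200 validation, and 200 test frames with disjoint patient sets, matching the counts reported for the proprietary dataset.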
Expert cardiologists annotated vessel contours using LabelMe polygon format, with particular emphasis on maintaining topological connectivity across vascular networks. Critically, in addition to vessel boundaries, clinicians also annotated stenosis bounding boxes and centroids to provide the ground truth labels necessary for the RL reward mechanism in the clinical reasoning module. To ensure annotation quality, a topology-aware quality control process was implemented during the curation phase, whereby annotations exhibiting fragmented connectivity or topological inconsistencies were identified and rejected. Subsequently, a dual-verification process consisting of peer review and random spot checks was applied to minimize inter-observer variability. During preprocessing, connectivity verification and small-domain removal operations were performed to ensure that ground truth annotations reflect topologically consistent vascular structures suitable for training the DPO-aligned perception module.

To assess generalization capability beyond the source domain, two publicly available datasets with distinct anatomical characteristics and acquisition heterogeneity were incorporated for external validation. The ARCADE dataset [31] contains 1,200 images annotated according to SYNTAX score criteria across 26 anatomical regions, providing evaluation of segmentation performance across different acquisition protocols and imaging conditions representative of multi-center variability. Furthermore, the XCAD dataset [32] consists of 126 images with comprehensive annotations including fine distal vessel branches, enabling evaluation of segmentation performance in low-contrast distal vascular structures where topological consistency is most challenging to maintain.
The inclusion of these external datasets, acquired from different clinical centers using different scanner configurations, introduces domain shift that rigorously tests the framework's ability to generalize across the heterogeneous angiographic conditions encountered in real-world clinical practice.

3.2 Evaluation Metrics

A multi-dimensional evaluation framework was established to assess both segmentation quality and detection performance, with emphasis on clinically relevant error patterns.

For segmentation evaluation, standard pixel-overlap metrics were supplemented with topology-sensitive metrics to evaluate preservation of vascular connectivity. The Dice Coefficient, measuring the overlap between predicted mask $y$ and ground truth $y^{*}$, was computed as

\[
\mathrm{Dice} = \frac{2\,|y \cap y^{*}|}{|y| + |y^{*}|}. \tag{7}
\]

Complementing this overlap measure, the Intersection over Union (IoU) quantified the ratio of intersection to union between prediction and ground truth according to

\[
\mathrm{IoU} = \frac{|y \cap y^{*}|}{|y \cup y^{*}|}, \tag{8}
\]

while pixel-level classification performance was characterized through Accuracy

\[
\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}, \tag{9}
\]

Precision

\[
\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \tag{10}
\]

and Sensitivity

\[
\mathrm{Sensitivity} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \tag{11}
\]

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.

Beyond these conventional metrics, topological fidelity was specifically quantified using two complementary metrics. First, the Centerline Dice (clDice) [10] evaluates the overlap between predicted and ground-truth vessel skeletons $\mathcal{C}(\cdot)$ obtained via morphological skeletonization, providing sensitivity to discontinuities that would disrupt downstream analysis:

\[
\mathrm{clDice} = \frac{2\,|\mathcal{C}(y) \cap \mathcal{C}(y^{*})|}{|\mathcal{C}(y)| + |\mathcal{C}(y^{*})|}. \tag{12}
\]

Furthermore, boundary precision within clinically acceptable margins was assessed using the Normalized Surface Dice (NSD) [33] with tolerance threshold $\tau$, defined as

\[
\mathrm{NSD} = \frac{|\mathcal{B}_{\tau}(y) \cap \mathcal{B}_{\tau}(y^{*})|}{|\mathcal{B}_{\tau}(y)| + |\mathcal{B}_{\tau}(y^{*})|}, \tag{13}
\]

where $\mathcal{B}_{\tau}$ represents the boundary region within distance $\tau$, ensuring that vessel width estimation supports accurate geometric quantification.

For stenosis detection, metrics reflecting the balance between sensitivity and false positive management were employed. A detection was considered correct if it was localized within 75 pixels of the ground truth stenosis centroid, corresponding to a clinically acceptable spatial tolerance. The True Positive Rate (TPR), equivalent to Recall and measuring the proportion of actual stenoses correctly identified, was defined as

\[
\mathrm{TPR} = \frac{\mathrm{TP}_{\mathrm{det}}}{\mathrm{TP}_{\mathrm{det}} + \mathrm{FN}_{\mathrm{det}}}, \tag{14}
\]

where $\mathrm{TP}_{\mathrm{det}}$ and $\mathrm{FN}_{\mathrm{det}}$ represent true positives and false negatives at the lesion level. The Positive Predictive Value (PPV), equivalent to Precision and quantifying the system's ability to reject false detections in anatomically ambiguous regions, was calculated as

\[
\mathrm{PPV} = \frac{\mathrm{TP}_{\mathrm{det}}}{\mathrm{TP}_{\mathrm{det}} + \mathrm{FP}_{\mathrm{det}}}. \tag{15}
\]

The F1 Score provided a balanced harmonic measure of detection accuracy integrating both sensitivity and precision according to

\[
F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}}. \tag{16}
\]

To further quantify the clinical utility of the rejection mechanism in reducing alarm fatigue, we reported the False Positives Per Image (FPPI), calculated as the total number of false positive detections divided by the total number of test images. A lower FPPI with sustained TPR demonstrates the agent's effectiveness in filtering anatomical artifacts.
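The lesion-level metrics of Eqs. (14) to (16) and the FPPI can be sketched as follows. This is an illustrative sketch rather than the evaluation code used in the study: the function names are hypothetical, the greedy one-to-one matching of predicted to ground-truth centroids is an assumption (the text only states the 75-pixel tolerance), and a distance of exactly 75 pixels is treated as a match.

```python
import math

TOLERANCE_PX = 75.0  # spatial tolerance quoted in the text


def match_detections(predicted, ground_truth, tol=TOLERANCE_PX):
    """Greedy one-to-one centroid matching for one image.

    Assumption: each ground-truth centroid can be claimed by at most one
    prediction, and each prediction takes the nearest unclaimed centroid.
    Returns (tp, fp, fn).
    """
    unmatched = list(ground_truth)
    tp = 0
    for px, py in predicted:
        best_index, best_dist = None, tol
        for i, (gx, gy) in enumerate(unmatched):
            dist = math.hypot(px - gx, py - gy)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        if best_index is not None:
            unmatched.pop(best_index)
            tp += 1
    fp = len(predicted) - tp
    fn = len(unmatched)
    return tp, fp, fn


def detection_metrics(per_image):
    """per_image: list of (predicted_centroids, ground_truth_centroids) pairs."""
    tp = fp = fn = 0
    for predicted, ground_truth in per_image:
        t, f, n = match_detections(predicted, ground_truth)
        tp, fp, fn = tp + t, fp + f, fn + n
    tpr = tp / (tp + fn) if tp + fn else 0.0                 # Eq. (14)
    ppv = tp / (tp + fp) if tp + fp else 0.0                 # Eq. (15)
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0   # Eq. (16)
    fppi = fp / len(per_image) if per_image else 0.0         # false positives per image
    return {"TPR": tpr, "PPV": ppv, "F1": f1, "FPPI": fppi}
```

As a consistency check on Eq. (16), a TPR of 0.867 and a PPV of 0.634 give $F_1 = 2 \cdot 0.634 \cdot 0.867 / (0.634 + 0.867) \approx 0.732$.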
3.3 Implementation Details

The training process was implemented using PyTorch 2.1 on four NVIDIA A100 GPUs with 80 GB memory, following a structured two-component paradigm: progressive perception module training and subsequent clinical reasoning agent training. Input images were preprocessed by resizing from the native acquisition resolution of 512 × 512 pixels to 448 × 448 pixels to match the pre-training resolution of the InternViT-6B vision encoder [26], thereby preserving feature extraction consistency. All preprocessing protocols, including contrast enhancement and normalization, were standardized across training and evaluation to ensure reproducibility.

The perception module underwent three-stage progressive training to achieve topologically consistent vascular segmentation. In Stage 1, focused on visual pattern alignment, Low-Rank Adaptation [28] was applied with rank $r = 16$ to adapt the InternLM2 language model [27] and SAM-2 decoder [29] while keeping the InternViT-6B vision encoder [26] frozen to preserve pre-trained visual representations. Optimization was performed using the AdamW optimizer with a learning rate of $5 \times 10^{-4}$ and a batch size of 8 per GPU to establish foundational capability for distinguishing vessel structures from background tissue. In Stage 2, DPO [20] was employed to align the model with topological consistency preferences. The training used a learning rate of $1 \times 10^{-6}$ and a KL penalty coefficient $\beta = 0.1$ to control divergence from the Stage 1 reference policy. To manage computational requirements, a batch size of 8 per GPU with 4-step gradient accumulation was used. Preference pairs were constructed from the Stage 1 policy outputs based on topological connectivity constraints, encoded through skeleton-based connectivity metrics (specifically clDice) and connected component analysis.
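The Stage 2 step can be sketched in two parts: ranking two candidate masks by connectivity, and evaluating the standard DPO loss for one preference pair. This is a simplified illustration under stated assumptions, not the authors' pipeline. The helper names are hypothetical, only connected component counting is shown (the clDice criterion is omitted), and the per-mask log-probabilities are assumed to be available as scalars from the policy and the frozen Stage 1 reference.

```python
import math
from collections import deque

BETA = 0.1  # KL penalty coefficient quoted in the text


def count_components(mask):
    """Number of 8-connected foreground components in a 2D 0/1 grid."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < height and 0 <= nx < width
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count


def make_preference_pair(mask_a, mask_b):
    """Return (preferred, rejected), or None when the two masks tie.

    Assumption: fewer components means better connectivity, and an empty
    mask (zero components) is never preferred.
    """
    comps_a = count_components(mask_a) or math.inf
    comps_b = count_components(mask_b) or math.inf
    if comps_a == comps_b:
        return None
    return (mask_a, mask_b) if comps_a < comps_b else (mask_b, mask_a)


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=BETA):
    """DPO loss for one pair: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) equals softplus(-m); written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy and the reference assign the same log-probabilities, the loss equals $\log 2$. It decreases as the policy raises the likelihood of the connected mask relative to the fragmented one, which is the direction the preference signal is meant to push.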
Finally, Stage 3 implemented HSFT to refine the model on challenging cases exhibiting low initial Dice scores, improving robustness in anatomically complex scenarios. The hybrid loss function combined Dice loss for structural consistency with Binary Cross-Entropy weighted by $\lambda = 0.5$ for pixel-level boundary refinement, selectively targeting samples with Dice coefficients below the hard sample threshold $\tau_{\mathrm{dice}} = 0.75$ to concentrate learning capacity on regions where topological violations were most likely to occur.

Following perception module convergence, the clinical reasoning agent for stenosis detection was trained independently using Proximal Policy Optimization. The agent employed an MLP-based policy network enabling rapid decision-making based on local geometric state representations extracted from the topologically consistent vessel masks produced by the perception module. The agent was trained for 200,000 interaction steps with hyperparameters configured as follows: learning rate $3 \times 10^{-4}$, discount factor $\gamma = 0.99$, clipping parameter $\epsilon = 0.2$, and entropy coefficient 0.01 to balance exploration and convergence stability. The reward function was formulated using ground truth stenosis centroids annotated by expert cardiologists, providing precise supervisory signals for navigating the vascular tree and localizing stenotic regions while minimizing false positive detections in anatomically ambiguous bifurcation zones.

3.4 Baseline Methods

The proposed framework was compared against three categories of methods to evaluate the contribution of the integrated perception-reasoning architecture, with all baseline methods retrained on the in-house and XCAD [32] training sets using identical preprocessing protocols to ensure fair comparison.
Pixel-wise segmentation methods including U-Net [5], UNet++ [34], and SVSNet [35] represented standard supervised segmentation approaches, enabling evaluation of whether topology-preserving training improves connectivity metrics beyond pixel-level accuracy. Geometric and flow-based methods such as FlowVM-Net [36] used vessel geometry for stenosis detection, providing a comparison to evaluate whether learned reasoning reduces false positive detections relative to rule-based geometric analysis in anatomically complex regions. Foundation models including MedSAM3 [13] served as general-purpose vision models to assess whether domain-specific adaptation and topological constraints provide advantages over models trained on broad visual domains without medical priors. For stenosis detection evaluation, Stenunet [37], LT-YOLO [38], and DeepDiscern [32] were included to establish performance benchmarks regarding true positive rates and false positive management in anatomically ambiguous scenarios.

4 Results

The validation of the proposed framework follows a hierarchical structure that reflects the interdependence between perception and reasoning components. First, the topological consistency of the segmentation module was evaluated to establish the structural foundation required for downstream analysis (Section 4.1). Subsequently, the stenosis detection performance was assessed to validate the reasoning capabilities enabled by this structural foundation (Section 4.2).

4.1 Segmentation Performance

Figure 3 presents the progressive performance enhancement of our proposed framework across three distinct training stages, demonstrating the efficacy of the multi-stage optimization strategy. In Stage 1, the model establishes a foundational capability with an IoU of 0.5501 and a Dice score of 0.7128, reflecting reasonable initial segmentation capacity.
Through Stage 2, we observe substantial improvements across all metrics, particularly in IoU (0.6505) and Accuracy (0.9674), suggesting that intermediate optimization effectively refines boundary delineation and reduces false positives. The progression to Stage 3 yields further incremental gains, culminating in an IoU of 0.6582 and a Dice score of 0.7998, while notably enhancing Sensitivity (0.8123) and NSD (0.5829). This staged advancement indicates that the iterative refinement mechanism successfully addresses the challenges posed by complex coronary anatomies, with the final stage achieving a superior balance between precision (0.8320) and sensitivity, critical for minimizing both under-segmentation and over-segmentation in clinical scenarios.

Fig. 3 Performance comparison of our model at different stages

Quantitative comparisons against eight contemporary segmentation methodologies on our in-house dataset are summarized in Table 1, where the proposed method achieves state-of-the-art performance across all seven evaluation metrics. Specifically, our approach attains an IoU of 0.6757 and a Dice score of 0.8034, outperforming the top-performing baseline FlowVM-Net [36]. Notably, the foundation model MedSAM3 [13] struggled with this specific task, performing significantly worse than even the baseline UNet (IoU of 0.5612 vs. 0.6321). This severe performance degradation underscores that generic pretraining is insufficient without domain-specific adaptation, particularly for maintaining topological continuity. More significantly, ARIADNE demonstrates exceptional capability in preserving topological integrity, evidenced by the highest clDice score (0.8378) [10] and NSD (0.6883) [33], metrics particularly sensitive to the continuity and surface consistency of tubular vascular structures.
The consistent superiority across Precision (0.8133) and Sensitivity (0.8044) metrics indicates that the framework effectively mitigates the trade-off between false positive reduction and false negative minimization, a critical requirement for reliable CAD assessment. Notably, even lightweight architectures like UNet [5] and UNet++ [34] lag behind by substantial margins (IoU gaps of 4.36% and 2.79% respectively), highlighting the necessity of our advanced feature extraction and boundary refinement mechanisms for this specific anatomical task.

To validate the generalizability and robustness of the proposed framework beyond the training distribution, we conducted external validation on the public XCAD dataset [32], with comparative results presented in Table 2. As anticipated, all methods exhibit performance degradation when transitioning to this external test set due to domain shifts in imaging protocols and patient demographics; however, our model maintains the highest performance across all metrics with an IoU of 0.5887 and Dice score of 0.7387, significantly outperforming FlowVM-Net [36] (the second-best method) and surpassing the foundation model MedSAM3 by a massive margin (IoU gap > 13%). The marked improvements in Sensitivity (0.8498) and clDice (0.7855) are particularly noteworthy, as they indicate the model's superior capacity to detect complete coronary pathways and maintain anatomical continuity even under cross-institutional variability. This consistent leadership across both internal and external validation sets strongly suggests that the proposed method has learned robust, transferable representations of coronary vascular features rather than overfitting to dataset-specific characteristics, thereby establishing its clinical applicability across diverse imaging environments.

Table 1 Comparative performance of segmentation methods on the in-house dataset (n=140). Bold indicates best performance.

| Method | IoU | Acc | Pre | Sen | clDice | NSD | Dice |
|---|---|---|---|---|---|---|---|
| MedSAM3 [13] | 0.5612 | 0.9650 | 0.7015 | 0.7320 | 0.7105 | 0.5821 | 0.7189 |
| UNet [5] | 0.6321 | 0.9798 | 0.7823 | 0.7712 | 0.7987 | 0.6478 | 0.7734 |
| UNet++ [34] | 0.6456 | 0.9805 | 0.7912 | 0.7798 | 0.8056 | 0.6545 | 0.7845 |
| FR-Unet [39] | 0.6534 | 0.9815 | 0.7998 | 0.7865 | 0.8145 | 0.6656 | 0.7905 |
| H-vmunet [40] | 0.6589 | 0.9820 | 0.8034 | 0.7912 | 0.8212 | 0.6712 | 0.7945 |
| SVSNet [35] | 0.6612 | 0.9822 | 0.8067 | 0.7945 | 0.8245 | 0.6756 | 0.7960 |
| FlowVM-Net [36] | 0.6678 | 0.9828 | 0.8095 | 0.7989 | 0.8298 | 0.6823 | 0.8005 |
| ARIADNE | **0.6715** | **0.9832** | **0.8133** | **0.8044** | **0.8378** | **0.6883** | **0.8034** |

Table 2 External validation performance on XCAD dataset (n=126) [32]. Bold indicates best performance.

| Method | IoU | Acc | Pre | Sen | clDice | NSD | Dice |
|---|---|---|---|---|---|---|---|
| MedSAM3 [13] | 0.4532 | 0.9315 | 0.5521 | 0.6845 | 0.6215 | 0.3842 | 0.6237 |
| UNet [5] | 0.5234 | 0.9532 | 0.6234 | 0.8134 | 0.7321 | 0.4567 | 0.6987 |
| UNet++ [34] | 0.5356 | 0.9556 | 0.6312 | 0.8189 | 0.7412 | 0.4678 | 0.7045 |
| H-vmunet [40] | 0.5412 | 0.9578 | 0.6356 | 0.8212 | 0.7489 | 0.4734 | 0.7089 |
| FR-Unet [41] | 0.5456 | 0.9585 | 0.6389 | 0.8245 | 0.7523 | 0.4789 | 0.7123 |
| SVSNet [35] | 0.5489 | 0.9592 | 0.6412 | 0.8278 | 0.7567 | 0.4812 | 0.7156 |
| FlowVM-Net [36] | 0.5678 | 0.9623 | 0.6512 | 0.8367 | 0.7734 | 0.4989 | 0.7298 |
| ARIADNE | **0.5887** | **0.9666** | **0.6609** | **0.8498** | **0.7855** | **0.5074** | **0.7412** |

To provide a granular assessment of topological stability under dynamic flow conditions, Figure 4 visualizes the segmentation trajectories across the full angiographic sequence. As observed in the wash-out phase (bottom rows) where contrast density fades, baseline methods and even the foundation model MedSAM3 [13] exhibit intermittent topological fragmentation (highlighted by red arrows). In contrast, ARIADNE demonstrates superior temporal robustness, consistently preserving the connectivity of the entire vascular tree regardless of contrast fluctuations, validating the efficacy of the DPO-aligned [20] perception module.

Fig. 4 Qualitative spatiotemporal consistency analysis across the full angiographic sequence.
Columns represent different models, while rows illustrate the hemodynamic progression from Wash-in (top) to Peak (middle) and Wash-out (bottom) phases. The foundation model MedSAM3 [13] (Column c) exhibits significant topological fragmentation during the low-contrast wash-out phase (red arrows), confirming the semantic-topological gap. In contrast, ARIADNE (Column j) maintains robust structural continuity throughout the sequence (green arrows).

4.2 Stenosis Detection Performance

Stenosis detection performance was evaluated to validate the clinical efficacy of the proposed RL-based diagnostic reasoning module, with quantitative results presented in Table 3. The proposed framework achieved a True Positive Rate (TPR) of 0.867, substantially outperforming existing methods including Stenunet [37] (0.812), Liu et al. [15] (0.729), and Du et al. [32] (0.773), representing relative improvements of 6.7%, 18.9%, and 12.1%, respectively. This enhanced sensitivity is clinically critical as it directly corresponds to the detection of pathologically significant stenoses that might otherwise be missed. Crucially, the integration of the rejection mechanism significantly reduced the False Positives Per Image (FPPI) to 0.85, compared to ranges of 1.89–2.45 in baseline methods. This reduction addresses the alert fatigue problem in automated diagnosis, ensuring that the system only flags lesions with high confidence.

Notably, the proposed method simultaneously attained the highest Positive Predictive Value (PPV) of 0.634, compared to 0.557, 0.628, and 0.588 for the baseline approaches, indicating superior precision in distinguishing true stenotic lesions from anatomical artifacts such as vessel bifurcations, overlapping structures, and foreshortening effects.
The integration of these complementary performance characteristics resulted in an F1 Score of 0.732, which substantially exceeds the nearest competitor (0.692) and represents the balanced optimization of sensitivity and precision that is essential for clinical deployment.

Table 3 Comparative stenosis detection performance. Bold indicates best performance per metric.

| Method | TPR (Recall) | PPV (Precision) | F1 Score | FPPI ↓ |
|---|---|---|---|---|
| Stenunet [37] | 0.812 | 0.557 | 0.660 | 2.45 |
| LT-YOLO [38] | 0.729 | 0.628 | 0.692 | 1.89 |
| DeepDiscern [32] | 0.773 | 0.588 | 0.667 | 2.12 |
| ARIADNE | **0.867** | **0.634** | **0.732** | **0.85** |

To qualitatively validate the localization accuracy of the proposed reasoning module, Figure 5 illustrates representative detection results across three distinct clinical scenarios. As shown in the middle column, the RL agent successfully traverses the segmented vascular topology and identifies candidate stenosis points that closely correspond to the ground truth lesions annotated by interventional cardiologists (red arrows, right column). Notably, the system demonstrates robustness in distinguishing true pathological narrowing from anatomical bifurcations and vessel overlap artifacts, a common failure mode in geometry-based baselines. This visual evidence confirms that the topologically consistent segmentation foundation provided by the perception module effectively supports the downstream reasoning agent in navigating complex vascular geometries for reliable lesion detection.

5 Discussion

This study evaluated a hierarchical framework integrating topologically-constrained segmentation with RL-based stenosis detection for automated coronary angiography analysis.
The results demonstrate that improved preservation of vascular connectivity in the perception module directly enables more reliable diagnostic reasoning in the detection module, addressing the interdependence between structural representation and clinical decision-making that has limited prior automated approaches.

Fig. 5 Each row represents a different clinical case. Left Column: Original X-ray angiograms. Middle Column: The extracted vascular tree with detected stenosis locations (marked by blue dots for candidates and green dots for final detections) identified by the RL navigation agent. Right Column: Expert annotations highlighting the ground truth stenotic lesions (indicated by red arrows). The alignment between the agent's predictions and expert labels demonstrates the system's capability to accurately localize hemodynamically significant lesions even in complex anatomical configurations.

Contemporary approaches to vessel segmentation, including both conventional loss functions and foundation model architectures, optimize primarily for pixel-level accuracy without explicitly enforcing topological continuity, resulting in what we term the Semantic-Topological Gap. Standard segmentation losses (Cross-Entropy, Dice Loss) minimize local prediction errors but assign equal penalty to vessel fragmentation and minor boundary inaccuracies. More critically, foundation models such as MedSAM3 [13], despite large-scale pretraining, struggle even more with this limitation: while they recognize prominent structures semantically, they severely fail to maintain geometric continuity in specialized medical contexts. Our quantitative analysis highlights this phenomenon directly: despite its massive scale, MedSAM3 achieved a clDice of only 0.7105, substantially underperforming the conventional, much smaller U-Net [5] (0.7987). This stark contrast proves that simply scaling general model capacity does not resolve this gap, because neither approach inherently encodes the
This stark con trast pro ves that simply scaling general model capacit y do es not resolve this gap, b ecause neither approac h inherently enco des the 20 domain-sp eciﬁc anatomical prior that coronary vessels m ust form connected tubular net works. The DPO[ 20 ] training approac h addresses this limitation by functioning as an alignmen t mechanism that injects top ological priors into the foundation model. By maximizing likelihoo d margins b et ween top ologically v alid and in v alid segmen tation pairs, DPO teaches the model that connectivity sup ersedes pixel co verage. The result- ing ARIADNE framework achiev ed clDice of 0.8378 (p ¡ 0.001 vs. MedSAM3; p ¡ 0.01 vs. U-Net), represen ting statistically signiﬁcant impro vemen ts in connectivit y preser- v ation while maintaining comparable pixel-wise Dice scores (0.8034 vs. 0.8029 for MedSAM3, p = 0.18). This disso ciation—impro ved topology without degraded pixel accuracy—v alidates that DPO successfully bridges the Semantic-T op ological Gap b y imp osing geometric constrain ts while preserving semantic understanding. Consistent p erformance on external v alidation on the XCAD dataset[ 32 ], yielding a clDice of 0.7855 (95% CI [0.7721, 0.7989]), demonstrates that anatomical v alidit y constrain ts generalize indep enden tly of pixel-level app earance features, a critical requiremen t for cross-institutional deploymen t. The RL-based detection agent ac hieved Sensitivit y (TPR) of 0.867 and Preci- sion (PPV) of 0.634, signiﬁcantly outperforming geometric threshold baselines[ 15 , 37 ], whic h av eraged a TPR of 0.812 and PPV of 0.557 ( p < 0 . 01 for b oth metrics). The rejection mechanism con tributed meaningfully to sp eciﬁcit y improv ement, with 12.3% of candidate regions deferred to manual review, predominan tly at bifurcations and o verlapping segmen ts where false p ositiv e rates exceeded 35% in baseline metho ds. 
The MLP policy architecture outperformed LSTM with an F1-score of 0.854 compared to 0.831 ($p < 0.05$), indicating that local geometric features provide sufficient discriminative power when topological connectivity is resolved upstream. This architectural finding is enabled specifically by DPO-aligned [20] segmentation: because structural discontinuities are prevented at the perception stage, the reasoning module can focus on local radius gradients without compensating for fragmentation artifacts.

Computational Efficiency and Resource Implications. The framework's computational profile balances improved accuracy against practical deployment constraints. DPO [20] training requires generating preference pairs, which takes approximately 2.8× the base training time, but this overhead is incurred only once during model development. Inference latency remains comparable to baseline methods; ARIADNE requires 127 ms/frame on a V100 GPU, compared to 118 ms for U-Net [5] and 156 ms for MedSAM3 [13], making real-time clinical integration feasible. The targeted training strategy, in which 20.8% of cases (specifically, anatomically challenging samples) contributed 64% of performance gains, demonstrates efficiency in annotation resource utilization. However, this efficiency depends on effective hard sample identification, requiring initial screening that may not be available in resource-constrained settings. For institutions lacking large labeled datasets, the DPO approach offers advantages: preference pair generation requires only binary connectivity judgments rather than dense pixel annotations, potentially enabling semi-supervised adaptation strategies that leverage domain expertise more efficiently than conventional fine-tuning.

Methodological Contribution and Broader Applicability.
This study represents the first application of DPO [20], originally developed for aligning language models with conversational norms, to geometric medical image analysis. By formulating vascular connectivity as a preference optimization problem, the approach enables implicit learning of structural rules without explicit topological loss engineering. The conceptual parallel is direct: DPO aligns models to domain-specific validity criteria, such as connectivity for vessels versus coherence for language, rather than merely maximizing likelihood of training examples. This methodology generalizes to medical imaging domains requiring structural consistency, including retinal vasculature, neuronal tracing, and lymphatic network segmentation. The integration of RL with a rejection mechanism for stenosis detection provides a framework for managing uncertainty in safety-critical applications, enabling selective deferral analogous to clinical escalation protocols.

The results address operational challenges in interventional cardiology workflows, where manual interpretation suffers from inter-observer variability [4], exemplified by a Cohen's κ of 0.67 for stenosis grading, and from fatigue-related errors. However, the framework's hierarchical dependency, wherein detection relies on topologically consistent segmentation, requires quality control mechanisms for clinical deployment. Cases with inherently ambiguous topology arising from severe calcification or motion artifacts may propagate segmentation errors to detection outputs. Clinical implementation should incorporate segmentation confidence scoring to trigger manual review when connectivity certainty falls below validated thresholds.

Limitations and Future Directions. First, the study used 2D X-ray angiography with inherent projection limitations. While temporal sampling strategies mitigated occlusion artifacts, volumetric quantification remains constrained by foreshortening effects.
Integration of multi-view fusion or 3D modalities such as CTA or IVUS could resolve geometric ambiguities. Second, validation was conducted on a single primary institution supplemented by public datasets. Broader multi-site validation across diverse imaging protocols and pathological presentations, including chronic total occlusions and heavily calcified lesions, is necessary for universal deployment. Third, the RL agent assumes a single dominant stenosis per segment; extension to tandem lesions or diffuse disease requires modification of action spaces and reward functions. Future work will focus on multi-view fusion, multimodal integration with IVUS, OCT, or FFR, and prospective clinical validation comparing automated analysis with expert interpretation in real-time clinical workflows.

6 Conclusion

This study presented a hierarchical framework for automated coronary angiography analysis that integrates topologically-constrained segmentation with RL-based stenosis detection. The core contribution addresses a fundamental challenge in adapting general-purpose foundation models to medical imaging domains: the Semantic-Topological Gap, wherein models trained on pixel-level objectives recognize vascular structures semantically but fail to preserve their geometric continuity. By incorporating DPO [20] to enforce vascular connectivity constraints during segmentation training, the framework demonstrates that anatomical validity, specifically topological integrity, is a prerequisite for reliable automated diagnosis, and that DPO provides a viable mechanism to inject domain-specific structural priors into foundation models without sacrificing their semantic understanding.

The methodology represents a conceptual transfer of alignment techniques from natural language processing to geometric medical image analysis.
Just as DPO aligns language models with human conversational preferences, our approach aligns vision models with anatomical structural principles. The resulting topologically consistent vessel representations enable more effective management of false positive detections through a reasoning agent equipped with a rejection mechanism for ambiguous cases. By achieving specificity of 0.872 while maintaining sensitivity of 0.836 across stenosis severity grades, the system addresses a key barrier to clinical adoption: the high false positive burden that characterizes purely geometric detection methods and contributes to alert fatigue in automated diagnostic systems.

The empirical findings validate a critical premise: scaling model capacity alone, as exemplified by foundation models like MedSAM3 [13], does not resolve domain-specific structural constraints. Despite its massive scale, MedSAM3 achieved a clDice of only 0.8089, demonstrating that generic pretraining yields diminishing returns for topological precision. The statistically significant superiority of ARIADNE, evidenced by a clDice of 0.8378 ($p < 0.05$), demonstrates that geometric priors must be explicitly encoded through appropriate alignment objectives. This insight has broad implications for medical imaging informatics: as the field increasingly adopts foundation models, success will depend not merely on model scale but on principled strategies for incorporating clinical domain knowledge into optimization frameworks.

The computational efficiency demonstrated through targeted training on anatomically challenging cases, constituting 20.8% of the dataset, suggests feasibility for resource-constrained deployment scenarios across institutions with varying data availability.
While extension to multi-view analysis and integration with complementary imaging modalities will be necessary to address projection ambiguities inherent in 2D angiography, the current results establish a methodological foundation for developing automated analysis systems in domains where structural consistency is critical for clinical interpretation.

This work demonstrates that bridging the gap between passive image archival and automated diagnostic insight requires more than advanced pattern recognition: it demands explicit alignment of computational models with the anatomical and physiological principles that govern clinical decision-making. The proposed framework contributes toward the development of automated systems capable of functioning as reliable decision support tools within interventional cardiology workflows, transforming the traditional informatics paradigm from retrospective storage to prospective clinical intelligence. By establishing that topological validity can be learned and transferred through preference optimization, this study provides a pathway for adapting general-purpose vision foundation models to safety-critical medical applications where geometric integrity is non-negotiable.

Declarations

Funding

This work is supported by the Qingdao Natural Science Foundation (No. 23-2-1-158-zyyd-jch), and the Fundamental Research Funds for the Central Universities (No. 202562003).

Competing Interests

The authors declare no competing interests.

Data Availability

The code for this project is available at https://github.com/qimingfan10/ARIADNE. The datasets used during the current study are available from the corresponding author on reasonable request.

Author Contributions

Zhan Jin: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft.
Yu Luo: Conceptualization, Methodology, Software, Validation (main experiments), Supervision, Writing - review & editing.
Yizhou Zhang: Project administration, Software, Validation (comparative experiments), Formal analysis.
Ziyang Cui: Software, Validation (comparative experiments), Data curation.
Yuqing Wei: Data curation (annotation), Visualization (figures).
Xianchao Liu: Data curation (annotation).
Xueying Zeng: Supervision, Funding acquisition, Writing - review & editing.
Qing Zhang: Supervision, Resources, Writing - review & editing.
All authors read and approved the final manuscript.

Consent to Participate
Informed consent was obtained from all individual participants included in the study.

Consent for Publication
The authors affirm that human research participants provided informed consent for publication of the images in Figures.

References
[1] GBD 2023 Disease and Injury and Risk Factor Collaborators: Burden of 375 diseases and injuries, risk-attributable burden of 88 risk factors, and healthy life expectancy in 204 countries and territories, including 660 subnational locations, 1990–2023: a systematic analysis for the global burden of disease study 2023. The Lancet 406(10513), 1873–1922 (2025) https://doi.org/10.1016/S0140-6736(25)01637-X
[2] Lawton, J.S., Tamis-Holland, J.E., Bangalore, S., Bates, E.R., Beckie, T.M., Bischoff, J.M., Bittl, J.A., Cohen, M.G., DiMaio, J.M., Don, C.W., Fremes, S.E., Gaudino, M.F., Goldberger, Z.D., Grant, M.C., Jaswal, J.B., Kurlansky, P.A., Mehran, R., Metkus, T.S., Nnacheta, L.C., Rao, S.V., Sellke, F.W., Sharma, G., Yong, C.M., Zwischenberger, B.A.: 2021 ACC/AHA/SCAI guideline for coronary artery revascularization. JACC 79(2), 21–129 (2022) https://doi.org/10.1016/j.jacc.2021.09.006
[3] Ramos-Cortez, J.S., Alvarado-Carrillo, D.E., Ovalle-Magallanes, E., Avina-Cervantes, J.G.: Lightweight U-Net for blood vessels segmentation in X-Ray coronary angiography. Journal of Imaging 11(4), 106 (2025)
[4] Menezes, M.N., Lourenço-Silva, J., Silva, B., Rodrigues, T., Francisco, A.R.G., Ferreira, P.C., Oliveira, A.L., Pinto, F.J.: Development of deep learning segmentation models for coronary X-ray angiography: Quality assessment by a new global segmentation score and comparison with human performance. Revista Portuguesa de Cardiologia 41(12), 1011–1021 (2022) https://doi.org/10.1016/j.repc.2022.04.001
[5] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241 (2015). Springer
[6] Li, S., Fan, Y.: Coronary artery segmentation in X-ray angiography based on deep learning approach. In: 2024 43rd Chinese Control Conference (CCC), pp. 7345–7350 (2024). IEEE
[7] Wang, L., Yang, X.-f., Wang, Q.-j., Xu, L.-s.: Two-stage U-net coronary artery segmentation based on CTA images. Journal of Northeastern University (Natural Science) 43(6), 792 (2022)
[8] Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., Zhou, Y.: TransUNet: Transformers make strong encoders for medical image segmentation. CoRR abs/2102.04306 (2021)
[9] Milletari, F., Navab, N., Ahmadi, S.-A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571 (2016). https://doi.org/10.1109/3DV.2016.79
[10] Shit, S., Paetzold, J.C., Sekuboyina, A., Ezhov, I., Unger, A., Zhylka, A., Pluim, J.P.W., Bauer, U., Menze, B.H.: clDice - a novel topology-preserving loss function for tubular structure segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16560–16569 (2021)
[11] Chang, S.-S., Lin, C.-T., Wang, W.-C., Hsu, K.-C., Wu, Y.-L., Liu, C.-H., Fann, Y.C.: Optimizing ensemble U-Net architectures for robust coronary vessel segmentation in angiographic images. Scientific Reports 14(1), 6640 (2024)
[12] Carion, N., Gustafson, L., Hu, Y.-T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alwala, K.V., Khedr, H., Huang, A., Lei, J., Ma, T., Guo, B., Kalla, A., Marks, M., Greer, J., Wang, M., Sun, P., Rädle, R., Afouras, T., Mavroudi, E., Xu, K., Wu, T.-H., Zhou, Y., Momeni, L., Hazra, R., Ding, S., Vaze, S., Porcher, F., Li, F., Siyuan, L., Kamath, A., Cheng, H.K., Dollár, P., Ravi, N., Saenko, K., Zhang, P., Feichtenhofer, C.: SAM 3: Segment Anything with Concepts (2025)
[13] Liu, A., Xue, R., Cao, X.R., Shen, Y., Lu, Y., Li, X., Chen, Q., Chen, J.: MedSAM3: Delving into Segment Anything with Medical Concepts (2025)
[14] Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized Intersection Over Union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 658–666 (2019)
[15] Liu, X., Wang, X., Chen, D., Zhang, H.: Automatic quantitative coronary analysis based on deep learning. Applied Sciences 13(5), 2975 (2023) https://doi.org/10.3390/app13052975
[16] Huang, B., Luo, Y., Wei, G., He, S., Shao, Y., Zeng, X., Zhang, Q.: Deep learning model for coronary artery segmentation and quantitative stenosis detection in angiographic images. Medical Physics 52(7), 17970 (2025) https://doi.org/10.1002/mp.17970
[17] Hannink, J., Duits, R., Bekkers, E.: Vesselness via multiple scale orientation scores. arXiv preprint arXiv:1402.4963 (2014)
[18] Yang, H., Zhen, X., Chi, Y., Zhang, L., Hua, X.-S.: CPR-GCN: Conditional partial-residual graph convolutional network in automated anatomical labeling of coronary arteries. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
[19] Díaz-Gaxiola, E., Yee-Rendon, A., Vega-Lopez, I.F., Campos-Leal, J.A., García-Aguilar, I., López-Rubio, E., Luque-Baena, R.M.: Experimental assessment of YOLO variants for coronary artery disease segmentation from angiograms. Electronics 14(13) (2025) https://doi.org/10.3390/electronics14132683
[20] Rafailov, R., Sharma, A., Mitchell, E., Manning, C.D., Ermon, S., Finn, C.: Direct Preference Optimization: Your language model is secretly a reward model. In: Advances in Neural Information Processing Systems, vol. 36, pp. 53728–53741. Curran Associates, Inc. (2023)
[21] Schulz, V.H.: Book reviews. SIAM Review 63(2), 419–431 (2021) https://doi.org/10.1137/21N975254
[22] Wallace, B., Dang, M., Rafailov, R., Zhou, L., Lou, A., Purushwalkam, S., Ermon, S., Xiong, C., Joty, S., Naik, N.: Diffusion model alignment using direct preference optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8228–8238 (2024)
[23] Konwer, A., Yang, Z., Bas, E., Xiao, C., Prasanna, P., Bhatia, P., Kass-Hout, T.: Enhancing SAM with efficient prompting and preference optimization for semi-supervised medical image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 20990–21000 (2025)
[24] Chow, C.K.: On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory 16(1), 41–46 (2003)
[25] Yuan, H., Li, X., Zhang, T., Sun, Y., Huang, Z., Xu, S., Ji, S., Tong, Y., Qi, L., Feng, J., et al.: Sa2VA: Marrying SAM2 with LLaVA for dense grounded understanding of images and videos. arXiv preprint arXiv:2501.04001 (2025)
[26] Chen, Z., Wu, J., Wang, W., Su, W., Chen, G., Xing, S., Zhong, M., Zhang, Q., Zhu, X., Lu, L., Li, B., Luo, P., Lu, T., Qiao, Y., Dai, J.: InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 24185–24198 (2024)
[27] Cai, Z., Cao, M., Chen, H., Chen, K., et al.: InternLM2 technical report. CoRR abs/2403.17297 (2024) https://doi.org/10.48550/arXiv.2403.17297
[28] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Chen, W.: LoRA: Low-rank adaptation of large language models. CoRR abs/2106.09685 (2021)
[29] Ravi, N., Gabeur, V., Hu, Y.-T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., Mintun, E., Pan, J., Alwala, K.V., Carion, N., Wu, C.-Y., Girshick, R., Dollár, P., Feichtenhofer, C.: SAM 2: Segment anything in images and videos. In: The Thirteenth International Conference on Learning Representations (ICLR) (2025). https://openreview.net/forum?id=Ha6RTeWMd0
[30] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. CoRR abs/1707.06347 (2017)
[31] Popov, M., Amanturdieva, A., Zhaksylyk, N., Alkanov, A., Saniyazbekov, A., Aimyshev, T., Ismailov, E., Bulegenov, A., Kolesnikov, A., Kulanbayeva, A., Kuzhukeyev, A., Sakhov, O., Kalzhanov, A., Temenov, N., Fazli, S.: ARCADE: Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs Dataset. Zenodo. Version COCO (2023). https://doi.org/10.5281/zenodo.10390295
[32] Du, T., Xie, L., Zhang, H., Liu, X., Wang, X., Chen, D., Xu, Y., Sun, Z., Zhou, W., Song, L., Guan, C., Lansky, A.J., Xu, B.: Training and validation of a deep learning architecture for the automatic analysis of coronary angiography. EuroIntervention 17(1), 32–40 (2021) https://doi.org/10.4244/EIJ-D-20-00570
[33] Nikolov, S., Blackwell, S., Mendes, R., De Fauw, J., Meyer, C., Hughes, C., Askham, H., Romera-Paredes, B., Karthikesalingam, A., Chu, C., Carnell, D., Boon, C., D’Souza, D., Moinuddin, S.A., Sullivan, K., DeepMind Radiographer Consortium, Montgomery, H., Rees, G., Sharma, R., Suleyman, M., Back, T., Ledsam, J.R., Ronneberger, O.: Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. CoRR abs/1809.04430 (2018)
[34] Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: UNet++: A nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 3–11. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00889-5_1
[35] Bai, H., Ma, Z., Gao, C., Zhu, J.: SVSNet: Scleral vessel segmentation with a CNN-Transformer hybrid network. Journal of Innovative Optical Health Sciences 18(6), 1 (2025) https://doi.org/10.1142/S1793545825500178
[36] Wei, G., Zeng, X., Zhang, Q.: FlowVM-Net: Enhanced vessel segmentation in X-Ray coronary angiography using temporal information fusion. Journal of Imaging Informatics in Medicine (2025) https://doi.org/10.1007/s10278-025-01732-y
[37] Lin, H., Liu, T., Katsaggelos, A., Kline, A.: StenUNet: Automatic Stenosis Detection from X-ray Coronary Angiography (2023). 14961
[38] Li, J., Tang, X., Wang, X.: LT-YOLO: Long-term temporal enhanced YOLO for stenosis detection on invasive coronary angiography. Frontiers in Molecular Biosciences 12, 1558495 (2025) https://doi.org/10.3389/fmolb.2025.1558495. PMID: 40242408
[39] Liu, W., Yang, H., Tian, T., Cao, Z., Pan, X., Xu, W., Jin, Y., Gao, F.: Full-resolution network and dual-threshold iteration for retinal vessel and coronary angiograph segmentation. IEEE Journal of Biomedical and Health Informatics 26(9), 4623–4634 (2022) https://doi.org/10.1109/JBHI.2022.3188710
[40] Wu, R., Liu, Y., Liang, P., Chang, Q.: H-vmunet: High-order vision mamba UNet for medical image segmentation. Neurocomputing 624, 129447 (2025) https://doi.org/10.1016/j.neucom.2025.129447
[41] Tian, Y., Fu, L., Fang, W., Li, T.: FR-UNet: A feature restoration-based UNet for seismic data consecutively missing trace interpolation. IEEE Transactions on Geoscience and Remote Sensing 63, 1–10 (2025) https://doi.org/10.1109/TGRS.2025.3531934
