ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis
Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-align…
Authors: Zhan Jin, Yu Luo, Yizhou Zhang
ARIADNE: A P erception-Reasoning Synergy F ramew ork for T rust w orth y Coronary Angiograph y Analysis Zhan Jin 1 † , Y u Luo 1 † , Yizhou Zhang 1 † , Ziy ang Cui 1 † , Y uqing W ei 1 , Xianc hao Liu 1 , Xueying Zeng 1* , Qing Zhang 2* 1 Sc ho ol of Mathematical Sciences, Ocean Universit y of China, Qingdao, 266100, Shandong, China. 2 Departmen t of Cardiology , Qilu Hospital (Qingdao), Cheelo o College of Medicine, Shandong Universit y , No. 758 Hefei Road, Qingdao, 266000, Shandong, China. *Corresp onding author(s). E-mail(s): zxying@ouc.edu.cn ; qingzhang2019@fo xmail.com ; Con tributing authors: zjin@stu.ouc.edu.cn ; luo yu@stu.ouc.edu.cn ; zyz6596@stu.ouc.edu.cn ; 3196148390@qq.com ; 2659799366@qq.com ; 2486063350@qq.com ; † These authors contributed equally to this w ork. Abstract Con ven tional pixel-wise loss functions fail to enforce top ological constraints in coronary v essel segmentation, producing fragmented v ascular trees despite high pixel-lev el accuracy . W e present ARIADNE, a tw o-stage framework coupling preference-aligned p erception with RL-based diagnostic reasoning for top olog- ically coherent stenosis detection. The p erception mo dule employs DPO to fine-tune the Sa2V A vision-language foundation model using Betti n umber con- strain ts as preference signals, aligning the p olicy to ward geometrically complete v essel structures rather than pixel-wise o verlap metrics. The reasoning mod- ule formulates stenosis lo calization as a Mark ov Decision Pro cess with an explicit rejection mechanism that autonomously defers ambiguous anatomical candidates such as bifurcations and vessel crossings, shifting from cov erage maximization to reliability optimization. On 1,400 clinical angiograms, ARI- ADNE ac hieves state-of-the-art centerline Dice of 0.838, reduces false p ositiv es b y 41% compared to geometric baselines. External v alidation on multi-cen ter 1 b enc hmarks AR CADE and XCAD confirms generalization across acquisition proto cols. This represents the first application of DPO for top ological align- men t in medical imaging, demonstrating that preference-based learning ov er structural constraints mitigates top ological violations while maintaining diag- nostic sensitivity in interv en tional cardiology workflo ws. The co de is av ailable at h ttps://github.com/qimingfan10/ARIADNE. Keyw ords: Coronary Angiograph y , F oundation Mo dels, Direct Preference Optimization, Reinforcement Learning, T op ological Consistency , Stenosis Detection 1 In tro duction Coronary Artery Disease (CAD) remains a leading cause of morbidit y and mortalit y w orldwide[ 1 ], requiring diagnostic mo dalities that provide accurate, repro ducible, and efficien t assessment. In v asive X-ra y Coronary Angiograph y (XCA) serv es as the pri- mary to ol for CAD diagnosis and guidance of Percutaneous Coronary Interv entions (PCI)[ 2 ], offering high temp oral resolution necessary for visualizing hemo dynamic flo w[ 3 ]. How ever, curren t clinical workflo ws rely hea vily on manual interpretation, a pro cess c haracterized b y significan t in ter-observer v ariability and susceptibility to clin- ician fatigue[ 4 ]. As healthcare institutions universally adopt Picture Arc hiving and Comm unication Systems (P ACS), a critical gap p ersists b et ween passiv e image storage and active, automated clinical interpretation. While hospitals hav e implemented digi- tal image storage, they lack automated systems capable of transforming ra w imaging data in to actionable clinical insigh ts. The gro wing v olume of in terven tional procedures mak es purely manual interpretation increasingly unsustainable, creating demand for Computer-Aided Diagnosis systems that can bridge the gap b et ween data acquisition and clinical decision-making. Accurate segmen tation of the coronary v ascular tree represen ts a fundamental pre- requisite for automated coronary analysis. Ov er the past decade, Conv olutional Neural Net works (CNNs), particularly U-Net[ 5 ] and its attention-enhanced v ariants suc h as CS-Net and SA-UNet[ 6 , 7 ], hav e dominated the field. More recently , Vision T rans- formers (ViTs) ha ve b een introduced to capture global spatial relationships[ 8 ]. Despite ac hieving high pixel-lev el p erformance metrics, these mo dels face a critical limitation in preserving v ascular top ology . T raditional loss functions, including Cross-Entrop y and Dice Loss[ 9 ], optimize pixel-lev el accuracy indep enden tly without explicitly penal- izing top ological errors[ 10 ]. Consequently , these models frequen tly pro duce fragmented v essel trees where distal branches app ear disconnected, particularly due to signal loss in thin vessels during do wnsampling op erations[ 11 ]. In coronary hemo dynamics anal- ysis, top ological connectivit y is essential; a segmen tation with high Dice score remains insufficien t for clinical use if discontin uities preven t accurate centerline extraction and subsequen t geometric analysis. The recent emergence of foundation-scale Vision-Language Mo dels (VLMs) has in tro duced a complementary approach to medical image segmen tation. Mo dels such as SAM3[ 12 ] and MedSAM3[ 13 ] leverage large language models to enable prompt-based 2 segmen tation, where textual descriptions guide mask generation. These arc hitectures demonstrate impressiv e semantic understanding, correctly iden tifying what constitutes a v essel based on learned visual-linguistic corresp ondences. How ever, their train- ing on generic natural image datasets creates a fundamental seman tic-top ological gap: while VLMs comprehend the conceptual category of a v ascular structure, they lac k the domain-sp ecific anatomical priors necessary to enforce structural contin u- it y in low-con trast, pro jection-based X-ray angiography . Empirical ev aluation reveals that general-purp ose VLMs consistently pro duce seman tically correct but topolog- ically fragmented segmentations—correctly classifying pixels as v essel while failing to maintain the connected tree structure essential for hemo dynamic mo deling. This failure stems from their optimization ob jectiv e: VLMs maximize pixel-level o verlap (Dice, IoU[ 14 ]) b et ween predicted and ground-truth masks, a criterion that remains agnostic to whether the resulting mask forms a contin uous v ascular netw ork or a col- lection of disconnected segments. In coronary angiography , where vessel diameters approac h image resolution limits and contrast v ariabilit y is substantial, the absence of explicit top ological constraints results in high-confidence predictions of isolated vessel fragmen ts that are clinically unusable for stenosis quan tification or flo w analysis. This limitation in segmentation directly impacts the accuracy of stenosis detection systems. Current automated framew orks predominan tly follo w a sequential approach where segmen tation and stenosis detection are p erformed as independent tasks[ 15 , 16 ]. In these systems, geometric algorithms trav erse the segmented centerline to iden- tify regions of narrowing. How ever, these deterministic algorithms lac k the abilit y to distinguish pathological stenosis from common anatomical artifacts, including vessel crossings[ 17 ], bifurcations, and foreshortening[ 18 ], resulting in elev ated false p ositiv e rates. Conv ersely , while deep ob ject detectors suc h as YOLO hav e b een applied to direct lesion iden tification[ 19 ], they inheren tly lack the capacity to verify anatomical plausibilit y . Sp ecifically , generic ob ject detectors treat lesions as isolated b ounding b o xes, failing to v alidate whether a detected stenosis actually resides within a con- tin uous, hemo dynamically relev ant v ascular segment. These limitations hav e hindered clinical adoption due to the high rate of false alarms that reduce system reliability . T o address these fundamental challenges in coronary angiography automation, we prop ose ARIADNE (Anatomy-a ware Reasoning for Integrated Angiography Diagnosis and Navigation Exp ert), a framework that bridges the gap b et ween visual p erception and clinical reasoning. Our central h yp othesis is that robust diagnostic automation requires not only accurate visual recognition but also explicit alignment with the hier- arc hical reasoning patterns employ ed b y expert clinicians. Building on recen t adv ances in preference-based learning from the artificial in telligence communit y , we in tegrate Direct Preference Optimization (DPO)[ 20 ] with Reinforcement Learning (RL)[ 21 ] to create a t wo-stage diagnostic pip eline. In the p erception stage, we apply DPO to fine-tune a vision-language foundation model (Sa2V A)[ 22 , 23 ], using comparative pref- erences deriv ed from cen terline contin uity profiles, quantified via clDice[ 10 ], to guide the mo del to ward structurally coherent vessel segmentations. Unlike general VLMs that optimize for semantic correctness through pixel o verlap, our preference-based approac h explicitly rew ards top ological contin uity—teac hing the model that a mask with 92% Dice score but preserved connectivit y is preferable to a 95% Dice mask with 3 fragmen ted branches. This enables the mo del to harness the semantic p o wer of vision- language architectures while enforcing the geometric rigor required for hemo dynamic analysis. T o consolidate this top ological reasoning against w eak visual signals, w e fur- ther incorp orate a Hard Sample F o cused T raining (HSFT) strategy . By concen trating optimization resources on the most diagnostically uncertain subsets—such as complex bifurcations and distal v essels—this mec hanism achiev es significant computational effi- ciency while ensuring robust p erformance in anatomically challenging regions where global statistics often mask lo cal failures. The resulting top ologically coheren t vessel trees provide a clinically v alid foundation for the reasoning stage: a RL-based na viga- tion agent that performs sequen tial decision-making for stenosis detection. Critically , this agent incorp orates an explicit rejection mechanism[ 24 ] that mirrors the clinical w orkflow where radiologists flag ambiguous cases for secondary review. By allo wing the system to abstain from uncertain predictions, we shift the op erational paradigm from maximizing co verage to maximizing reliability , thereb y reducing false p ositiv e rates while maintaining high sensitivity for clear-cut lesions. This p erception-to-reasoning arc hitecture reflects the natural diagnostic workflo w, where accurate anatomical recon- struction serves as the p erceptual foundation for subsequent lesion lo calization and c haracterization. This work mak es three primary con tributions to the automation of coronary angiograph y in terpretation: 1. Perception F ramework: W e in tro duce a preference-based optimization approac h that aligns VLMs with top ological constraints in vessel segmen tation. By apply- ing DPO to comparativ e vessel tree examples, our metho d achiev es top ologically consisten t segmentations without requiring pixel-level annotation of connectivity features, augmen ted by a hard-sample mining strategy that enhances computational efficiency in complex anatomical scenarios. 2. Reasoning Algorithm: W e form ulate stenosis detection as a sequential na vigation task guided b y RL, incorp orating an explicit rejection mec hanism that allows the system to defer ambi guous cases. This clinical workflo w-aligned approach substan- tially reduces false p ositiv e rates in anatomically complex regions while main taining high sensitivity for definitiv e lesions. 3. Clinical V alidation: W e demonstrate that integrating top ologically-a ware p er- ception with rejection-enabled reasoning ac hieves state-of-the-art diagnostic p er- formance on standard coronary angiography b enc hmarks with a TPR of 0.867, supp orting the h yp othesis that anatomical v alidit y is prerequisite to reliable automated diagnosis. 2 Metho ds 2.1 F ramework Overview T o op erationalize the clinical requirement for top ological contin uity in angiographic analysis, the prop osed ARIADNE framew ork is designed to emulate the hierarchical decision-making pro cess of human exp erts. As illustrated in Fig. 1 and Fig. 2 , the system consists of tw o biomimetic stages that mirror the visual-cognitive w orkflow of 4 exp ert interv entional cardiologists: a p erception mo dule for anatomically consisten t v ascular reconstruction and a reasoning mo dule for context-a w are lesion lo calization. Fig. 1 T raining framework of Anatomy-Aw are Segmentation The p erception mo dule employs the Sa2V A foundation mo del[ 25 ] with a progressive training strategy designed to enforce top ological contin uity throughout the segmenta- tion pro cess. W e integrate DPO[ 20 ] in to the training pip eline to align mo del outputs to ward geometrically complete vessel structures rather than fragmen ted pixel-level predictions. This preference-based learning approac h guides the mo del to preserv e v ascular connectivity without requiring exhaustive manual annotation of top ologi- cal features, generating vessel masks that maintain the v ascular contin uity essential for downstream hemo dynamic analysis. The resulting segmen tation masks maintain structural integrit y across vessel hierarchies, providing a clinically reliable anatomical scaffold for downstream diagnostic reasoning. Building up on this top ologically consistent representation, the reasoning mo dule op erates as a structure-guided diagnostic agent that navigates the extracted vessel sk eleton to iden tify stenotic lesions. Rather than applying fixed statistical thresholds, w e develop a RL agent that analyzes lo cal geometric features—including radius gradi- en ts and curv ature patterns—to p erform context-a ware lesion localization. Critically , the agent incorp orates an explicit rejec t ion mec hanism to filter false p ositiv e detec- tions arising from complex anatomical structures suc h as v essel crossings, bifurcations, and foreshortening artifacts. The effectiveness of this rejection mechanism is funda- men tally dep enden t on the structural consistency provided b y the p erception mo dule, 5 Fig. 2 T raining framework of Structure-Guided Reasoning demonstrating the essential in terdep endence b et ween anatomical reconstruction and diagnostic decision-making within the ARIADNE framework. 2.2 Anatom y-Aw are Perception Mo dule via Preference Alignmen t Coronary vessel segmentation requires bridging the semantic gap b et w een low-lev el pixel intensities in fluoroscopic images and high-level anatomical kno wledge of v as- cular top ology . T o achiev e this integration, w e employ the Sa2V A architecture[ 25 ], a visual-language foundation mo del designed to align angiographic representations with structured clinical priors. F ormally , given an input angiographic image x ∈ R H × W × C , the architecture consists of three interdependent comp onents that collectively enable top ological alignment. The vision encoder E v , instan tiated as InternViT-6B-448p x[ 26 ], op erates in a frozen state to extract robust high-semantic em b eddings z v = E v ( x ) ∈ R N × d v , where N represents the num b er of patch tokens and d v denotes the embedding dimension. Rather than training exclusively on limited medical datasets, this gener- alized visual embedding space—pre-trained on large-scale natural images—provides stable feature represen tations across the v arying contrast conditions and fluoroscopic noise characteristic of interv entional cardiology . The language mo deling comp onen t L , based on InternLM2[ 27 ], maps anatomical directives (e.g., Segment the coronary artery) into a high-dimensional seman tic space z l = L (prompt) ∈ R d l . This foundation mo del approac h enables semantic represen tation of v ascular terminology and supp orts p oten tial generalization to complex clinical queries without arc hitectural mo dification. Lo w-Rank Adaptation (LoRA)[ 28 ] with rank r = 16 is applied to adapt pre-trained linguistic kno wledge to coronary anatom y while av oiding full-parameter retraining costs. The in tegration betw een visual and linguistic streams forms the anatomical scaf- fold of the framew ork, where trainable pro jection lay ers align semantic em b eddings with visual features to condition the SAM-2 mask deco der D [ 29 ], yielding the seg- men tation mask y = D ( z v , z l ) ∈ { 0 , 1 } H × W . By embedding clinical priors into the 6 deco ding pro cess, this architecture suppresses background artifacts that p ossess simi- lar pixel in tensities to vessel structures but lac k anatomical relev ance, thereb y focusing computational resources on topologically v alid v ascular comp onen ts. T raditional segmentation datasets consist of isolated static frames, creating ambi- guit y when anatomically complex structures or motion artifacts cannot b e resolved without temp oral context. T o address this limitation, we implement a physiologically adaptiv e sampling strategy that leverages the contin uous nature of angiographic video sequences. Rather than uniform temporal sampling—which introduces redundancy through near-duplicate frames while missing critical ph ysiological ev ents—w e extract k ey frames that capture distinct hemodynamic states across the cardiac cycle. F rame selection explicitly targets systolic and diastolic phases to ensure exposure to the full range of vessel deformation, including lumen diameter v ariations and wall motion. Sampling additionally encompasses environmen tal v ariations in fluoroscopic imaging angles, X-ray intensit y , and con trast b olus propagation phases, sp ecifically arterial inflo w, peak opacification, and venous washout. T o maximize diagnostic relev ance, w e identify temp oral hard clusters—consecutive sequences exhibiting lo w-confidence predictions that correspond to anatomically challengin g regions such as distal v essel terminals, bifurcation zones, and ov erlapping vessel segmen ts. This temporal mining strategy concentrates training samples on scenarios where visual ambiguit y is maxi- mal, mirroring the clinical workflo w where radiologists dedicate additional scrutiny to diagnostically uncertain regions. T o evolv e the mo del from a general-purp ose segmen tation system into a domain- sp ecialized tool capable of preserving v ascular topology , w e implemen t a three-stage progressiv e training strategy that adv ances from basic visual pattern recognition to structured anatomical reasoning. In Stage 1, we establish visual pattern alignment through parameter-efficient transfer learning by freezing the vision enco der E v to pre- serv e generalized feature extraction capabilities while applying LoRA adapters to the In ternLM2 language mo del and SAM-2 deco der. The mo del is trained on N 1 = 1 , 220 annotated angiogram samples b y minimizing the Dice loss L Dice = 1 − 2 | y ∩ y ∗ | | y | + | y ∗ | , (1) where y ∗ denotes the ground truth mask. This initial alignment ensures the system can recognize vessel b oundaries, tubular structures, and contrast-enhanced regions b efore addressing connectivity constraints. How ever, standard supervised ob jectiv es op erate under pixel-wise indep endence assumptions—minimizing lo cal discrepancies without explicitly p enalizing top ological violations such as vessel fragmentation. Subsequen tly , to align the mo del with clinical reasoning principles that prior- itize structural con tinuit y , we incorp orate top ological constrain ts through DPO in Stage 2. Clinical v alidit y is formally defined as top ological connectivit y: coronary arteries must form contin uous tubular structures exhibiting C 1 con tinuit y without artificial fragmen tation, c haracterized by a single connected comp onen t—denoted b y a Betti num b er of β 0 = 1—that preserv es hemodynamic flow contin uity . DPO enforces this constraint by form ulating v ascular connectivity as a preference learning prob- lem where the ob jective is to maximize the likelihoo d margin b et ween top ologically 7 v alid and in v alid segmentation states. Sp ecifically , w e construct a preference dataset D pref = { ( x ( i ) , y ( i ) w , y ( i ) l ) } N 2 i =1 , where preference pairs are defined b y adherence to top o- logical constraints rather than pixel-wise ov erlap metrics. The preferred (winning) sample y w is the ground truth segmen tation, which satisfies global geometric con- strain ts by exhibiting β 0 ( y w ) = 1 and preserving flo w contin uity . The non-preferred (losing) sample y l consists of hard negative examples mined from the Stage 1 p ol- icy π S1 —sp ecifically , predictions with high pixel-level ov erlap Dice( y l , y ∗ ) > 0 . 8 but top ological violations β 0 ( y l ) > β 0 ( y w ), indicating v essel fragmen tation or disjoin t arti- facts. The p olicy π θ ( y | x ), represen ting the probability distribution ov er segmentation masks, is optimized to assign higher probability to top ologically connected samples o ver fragmented predictions through the DPO ob jectiv e: L DPO ( π θ ; π ref ) = − E ( x , y w , y l ) ∼D pref log σ β log π θ ( y w | x ) π ref ( y w | x ) − β log π θ ( y l | x ) π ref ( y l | x ) , (2) where π ref denotes the frozen Stage 1 p olicy serving as the reference mo del to preven t excessiv e deviation from learned visual features, β = 0 . 1 controls the KL-divergence p enalt y strength, and σ represen ts the logistic sigmoid function. DPO optimizes the p olicy directly without training an explicit reward model, enabling efficient top o- logical alignmen t through preference-based learning that guides the model tow ard geometrically complete vessel structures. Finally , while DPO aligns the mo del with top ological connectivity principles, p er- formance remains inconsisten t in anatomically complex scenarios where w eak visual signals (lo w contrast, vessel ov erlap) destabilize the learned connectivity preference. T o consolidate top ological reasoning under diagnostic am biguity , w e implement HSFT in Stage 3 that concen trates computational resources on scenarios where clinical inter- pretation is most challenging. Rather than treating hard samples as statistical outliers, w e identify temporal hard clusters—consecutiv e frames with Dice scores below thresh- old τ = 0 . 75—which map to sp ecific anatomical c hallenges: fine distal vessel terminals (diameter < 1mm), bifurcation zones where multiple branches div erge, and ov er- lapping vessel segments in oblique pro jections. W e define the hard sample subset D hard = { ( x , y ∗ ) ∈ D | Dice( π θ ( x ) , y ∗ ) < τ } , which constitutes 20.8% of the dataset but accounts for the ma jority of top ological errors. T o enforce pixel-level accuracy in these regions while maintaining structural integrit y , we apply the hybrid loss function L HSFT = L Dice + λ L BCE , (3) with λ = 0 . 5, where the Binary Cross-Entrop y comp onen t L BCE = − X i,j y ∗ ij log y ij + (1 − y ∗ ij ) log(1 − y ij ) (4) pro vides pixel-wise gradients needed to refine vessel b oundaries at bifurcations and terminals, while L Dice main tains global structural consistency . This progressive strat- egy achiev es 5 × resource efficiency by fo cusing training on the most diagnostically 8 relev ant subset of samples, ensuring robust top ological preserv ation across the full sp ectrum of anatomical complexit y encountered in clinical practice. 2.3 Structure-Guided Reasoning via Reinforcemen t Learning Building up on the top ologically consistent vessel segmentations established b y the per- ception mo dule (Fig. 1 ), the reasoning stage (Fig. 2 ) translates structural information in to diagnostic outputs b y performing con text-aw are stenosis lo calization. This mo d- ule leverages the top ological integrit y of DPO-enhanced segmentations to construct a navigable anatomical scaffold from whic h diagnostic candidates are systematically generated. Morphological thinning is applied to the refined binary segmentation mask y ∈ { 0 , 1 } H × W to extract the discrete v essel centerline C = { p 1 , p 2 , . . . , p N } where p i ∈ R 2 denotes spatial co ordinates, which serves as the na vigation tra jectory for subsequen t geometric analysis. F or each p oint p i on the skeleton, the local ves- sel radius r ( p i ) is computed using a Euclidean distance transform D ( y ), yielding r ( p i ) = max { d | B ( p i , d ) ⊂ y } where B ( p i , d ) denotes a ball of radius d centered at p i . This generates a one-dimensional radius profile r : [0 , L ] → R + parameterized b y arc length s along the vessel’s longitudinal axis. By analyzing this profile alongside its first and second deriv atives ∇ r ( s ) = dr ds and ∇ 2 r ( s ) = d 2 r ds 2 , we identify morphological b ottlenec ks as candidate lo cations P cand = { p ∈ C | r ( p ) < µ r − kσ r ∧ ∇ 2 r ( p ) > θ curv } , where µ r and σ r denote the mean and standard deviation of the radius profile, k = 1 . 5 controls sensitivit y , and θ curv is a curv ature threshold. This deterministic geometric pro cess is delib erately configured for maximum sensitivity to generate a high-recall candidate set. How ever, this approach inheren tly pro duces false p ositiv es in anatomically complex regions such as bifurcations, vessel crossings, and natural tap ering zones where geometric narrowing mimics pathological stenosis. These candi- dates therefore serve as initial prop osals requiring subsequent v erification, providing a high-recall, lo w-precision co ordinate set that necessitates intelligen t filtering through clinical reasoning. T o address the limitation of purely geometric detection, we formulate stenosis lo calization as a sequential decision-making process mo deled as a Mark ov Decision Pro cess (MDP). This formulation enables the system to p erform con text-aw are diag- nostic reasoning that distinguishes pathological stenoses from anatomical artifacts through analysis of lo cal morphological patterns. Unlike static thresholding methods that apply fixed criteria uniformly across all vessel segments, RL allows the agen t to adaptiv ely ev aluate each candidate based on its geometric neighborho od, mimic king the sequen tial visual insp ection workflo w employ ed b y interv entional cardiologists. The MDP is formally defined by the tuple M = ( S , A , T , R , γ ), where S denotes the state space enco ding lo cal v essel geometry , A represents the action space of navigational commands, T : S × A → ∆( S ) defines the state transition function, R : S × A → R sp ecifies the reward function enco ding clinical priorities, and γ ∈ [0 , 1) is the dis- coun t factor. The agent navigates the v ascular skeleton to lo calize true stenoses while rejecting false alarms arising from b enign anatomical v ariations. Sp ecifically , the state space S ⊂ R 16 enco des lo cal morphological con text at eac h candidate lo cation. Eac h state v ector s t = [ r t − 5: t +5 , ∇ r t − 5: t +5 , Z t , κ t ] ∈ R 16 9 comprises: (1) a normalized radius profile r t − w : t + w within a sliding window of half- width w = 5 centerline p oin ts, capturing the geometric progression of vessel lumen narro wing; (2) first-order deriv atives ∇ r t − w : t + w quan tifying morphological gradi- en ts to detect abrupt transitions characteristic of stenotic lesions; (3) lo cal Z-score Z t = r ( p t ) − µ r σ r b enc hmarking the degree of narro wing against the vessel’s baseline calib er to distinguish significan t stenoses from normal anatomical v ariation; and (4) lo cal curv ature κ t = ∇ 2 r ( p t ) capturing geometric sharpness. The action space A = { Left , Right , Confirm , Reject } consists of discrete navigational commands, where lateral mov emen ts { Left , Righ t } implement spatial translation p t +1 = p t ± ∆ p with step size ∆ p = 3 pixels to enable fine p ositional adjustment tow ard the precise steno- sis cen ter. Critically , the Reject action implements an explicit abstention mechanism that transitions the agent to the next candidate in P cand without issuing a diagnos- tic prediction, allowing autonomous dismissal of ambiguous candidates where lo cal geometry superficially resembles pathology—suc h as bifurcation points where paren t v essels exhibit natural narro wing as they split into smaller daughter branches. This rejection capabilit y mirrors the clinical triage workflo w where radiologists defer uncer- tain cases for secondary review rather than issuing p oten tially erroneous diagnoses, fundamen tally shifting the op erational paradigm from cov erage maximization to relia- bilit y optimization and reducing false p ositiv e rates while maintaining high sensitivity for definitive lesions. T o align agent b eha vior with clinical diagnostic priorities, the reward function R : S × A → R explicitly enco des the asymmetric costs of diagnostic errors, distinguishing b et ween active detection failures, termed F alse Positiv es, and passive omission errors, or F alse Negativ es. The rew ard function is formally structured to incentivize the correct rejection of anatomical artifacts while severely p enalizing missed diagnoses: R ( s t , a t ) = r TP if a t = Confirm ∧ δ ( p t , G ) ≤ τ (T rue P ositiv e) r FP if a t = Confirm ∧ δ ( p t , G ) > τ (F alse P ositiv e) r TN if a t = Reject ∧ δ ( p t , G ) > τ (Correct Rejection) r FN if a t = Reject ∧ δ ( p t , G ) ≤ τ (F alse Negativ e) r step otherwise (5) where δ ( p t , G ) = ∥ p t − G ∥ 2 represen ts the Euclidean distance to the nearest ground truth stenosis cen troid G , and τ = 75 pixels defines the lo calization tolerance. Hyp er- parameters are calibrated to reflect safet y-critical clinical constraints: r TP = +50 rew ards accurate lo calization; r FP = − 10 p enalizes false alarms to reduce clini- cian fatigue; r TN = +10 explicitly rewards the agent for correctly iden tifying and rejecting ambiguous artifacts suc h as vessel crossings; and r FN = − 50 imp oses a maximal p enalt y for rejecting a true stenosis, ensuring high sensitivity . A step cost r step = − 1 encourages efficien t navigation . The optimal policy π ∗ : S → ∆( A ) maxi- mizes the expected cumulativ e discoun ted reward E π P ∞ t =0 γ t R ( s t , a t ) with discoun t factor γ = 0 . 99. Policy optimization is p erformed using Proximal Policy Optimization (PPO)[ 30 ], which ensures stable gradient up dates b y constraining the p olicy up date 10 through the clipp ed surrogate ob jective: L PPO ( θ ) = E t h min ρ t ( θ ) ˆ A t , clip( ρ t ( θ ) , 1 − ϵ, 1 + ϵ ) ˆ A t i (6) where ρ t ( θ ) = π θ ( a t | s t ) π θ old ( a t | s t ) denotes the probability ratio b et ween successive p olicy iter- ations, ˆ A t is the generalized adv antage estimate, and ϵ = 0 . 2 defines the clipping range. This clipping mechanism preven ts destructive up dates that could destabilize the learned diagnostic strategy . The p olicy netw ork π θ emplo ys a Multi-Lay er Per- ceptron architecture with la yer dimensions [16 → 256 → 128 → 64 → |A| ] and ReLU activ ations, parameterized by weigh ts θ ∈ R d . Rather than employing recur- ren t architectures suc h as LSTMs or GRUs, this feedforward design enforces a strictly Mark ovian decision pro cess where the p olicy π θ ( a t | s t ) conditions exclusively on the curren t state s t , ensuring diagnostic decisions remain inv arian t to the vessel’s prior tra jectory and maintaining computational efficiency with inference time < 50ms p er candidate, suitable for real-time clinical deploymen t. 3 Exp erimen ts T o v alidate the prop osed p erception-reasoning framework, the exp erimen tal design was structured to address t wo primary ob jectiv es: (1) ev aluation of topological consistency in v ascular segmentation across div erse angiographic conditions, and (2) assessment of stenosis detection accuracy and false p ositiv e management in anatomically complex scenarios. 3.1 Datasets and Sampling Strategy The exp erimen tal foundation was established through a video-based acquisition strategy designed to capture morphological diversit y in coronary angiograph y , sup- plemen ted b y external v alidation on publicly av ailable datasets to assess domain generalization. A proprietary dataset w as curated from coronary angiography video sequences acquired at Guizhou Aviation Industry Group 302 Hospital using a Siemens angiogra- ph y system. The collection pro cess utilized temporal information from video streams to ensure comprehensive representation of vessel morphology across 35 patients. The dataset comprises 1,400 high-resolution images at 512 × 512 resolution from 35 patients, with an av erage of 40 frames extracted p er patien t to capture v arying vessel angu- lations and contrast conditions. T o prev ent data leak age inheren t in video-based acquisitions—where consecutive frames exhibit high temp oral correlation—the dataset w as partitioned at the patien t level rather than the image level. Specifically , 25 patien ts, comprising 1,000 images, w ere allo cated to the training set, while the v ali- dation and testing sets each con tained 5 patien ts contributing 200 images, ensuring that no patient appeared in multiple partitions and thereb y guaranteeing independent ev aluation of mo del generalization. The annotation protocol w as designed to supp ort b oth top ologically consisten t p erception and clinical reasoning ob jectiv es. Exp ert cardiologists annotated vessel 11 con tours using Lab elMe p olygon format, with particular emphasis on main taining top ological connectivity across v ascular netw orks. Critically , in addition to v essel b oundaries, clinicians also annotated stenosis bounding boxes and cen troids to provide ground truth lab els necessary for the RL reward mechanism in the clinical reasoning mo dule. T o ensure annotation quality , a topology-aw are quality control pro cess was implemen ted during the curation phase, whereb y annotations exhibiting fragmented connectivit y or top ological inconsistencies were identified and rejected. Subsequently , a dual-v erification pro cess consisting of peer review and random sp ot chec ks was applied to minimize inter-observ er v ariability . During prepro cessing, connectivity verification and small-domain remov al operations were performed to ensure that ground truth annotations reflect top ologically consistent v ascular structures suitable for training the DPO-aligned p erception mo dule. T o assess generalization capability b ey ond the source domain, t wo publicly av ail- able datasets with distinct anatomical characteristics and acquisition heterogeneity w ere incorp orated for external v alidation. The AR CADE dataset[ 31 ] contains 1,200 images annotated according to SYNT AX score criteria across 26 anatomical regions, pro viding ev aluation of segmen tation p erformance across different acquisition proto- cols and imaging conditions represen tative of multi-cen ter v ariability . F urthermore, the X CAD dataset[ 32 ] consists of 126 images with comprehensive annotations including fine distal vessel branches, enabling ev aluation of segmentation p erformance in low- con trast distal v ascular structures where top ological consistency is most challenging to maintain. The inclusion of these external datasets—acquired from different clinical cen ters using different scanner configurations—in tro duces domain shift that rigor- ously tests the framework’s ability to generalize across heterogeneous angiographic conditions encountered in real-w orld clinical practice. 3.2 Ev aluation Metrics A multi-dimensional ev aluation framew ork w as established to assess b oth segmentation qualit y and detection p erformance with emphasis on clinically relev ant error patterns. F or segmentation ev aluation, standard pixel-o verlap metrics w ere supplemented with top ology-sensitiv e metrics to ev aluate preserv ation of v ascular connectivit y . The Dice Co efficien t, measuring the ov erlap b et ween predicted mask y and ground truth y ∗ , was computed as Dice = 2 | y ∩ y ∗ | | y | + | y ∗ | . (7) Complemen ting this o verlap measure, the In tersection ov er Union (IoU) quan tified the ratio of intersection to union b et ween prediction and ground truth according to IoU = | y ∩ y ∗ | | y ∪ y ∗ | , (8) while pixel-level classification performance w as characterized through Accuracy Accuracy = TP + TN TP + TN + FP + FN , (9) 12 Precision Precision = TP TP + FP , (10) and Sensitivity Sensitivit y = TP TP + FN , (11) where TP , TN, FP , and FN denote true p ositiv es, true negativ es, false p ositiv es, and false negatives, resp ectiv ely . Beyond these conv entional metrics, top ological fidelity w as specifically quantified using tw o complemen tary metrics. First, the Centerline Dice (clDice)[ 10 ] ev aluates the ov erlap betw een predicted and ground-truth vessel skeletons C ( · ) obtained via morphological skeletonization, providing sensitivity to discon tinuities that would disrupt do wnstream analysis: clDice = 2 |C ( y ) ∩ C ( y ∗ ) | |C ( y ) | + |C ( y ∗ ) | . (12) F urthermore, b oundary precision within clinically acceptable margins was assessed using the Normalized Surface Dice (NSD)[ 33 ] with tolerance threshold τ , defined as NSD = |B τ ( y ) ∩ B τ ( y ∗ ) | |B τ ( y ) | + |B τ ( y ∗ ) | , (13) where B τ represen ts the b oundary region within distance τ , ensuring that vessel width estimation supp orts accurate geometric quantification. F or stenosis detection p erformance ev aluation, metrics reflecting the balance b et ween sensitivity and false p ositiv e management w ere emplo yed, with a detection considered correct if lo calized within 75 pixels of the ground truth stenosis cen- troid corresp onding to clinically acceptable spatial tolerance. The T rue Positiv e Rate (TPR), equiv alent to Recall and measuring the prop ortion of actual stenoses correctly iden tified, w as defined as TPR = TP det TP det + FN det , (14) where TP det and FN det represen t true p ositiv es and false negatives at the lesion lev el. The P ositive Predictive V alue (PPV), equiv alent to Precision and quantifying the system’s abilit y to reject false detections in anatomically am biguous regions, was calculated as PPV = TP det TP det + FP det . (15) The F1 Score pro vided a balanced harmonic measure of detection accuracy in tegrating b oth sensitivity and precision according to F 1 = 2 · PPV · TPR PPV + TPR . (16) T o further quantify the clinical utility of the rejection mec hanism in reducing alarm fatigue, w e rep orted the F alse Positiv es P er Image (FPPI), calculated as the total n umber of false p ositiv e detections divided b y the total num b er of test images. A 13 lo wer FPPI with sustained TPR demonstrates the agen t’s effectiv eness in filtering anatomical artifacts. 3.3 Implemen tation Details The training process w as implemen ted using PyT orch 2.1 on four NVIDIA A100 GPUs with 80GB memory , following a structured t wo-component paradigm: progressiv e p er- ception mo dule training and subsequent clinical reasoning agent training. Input images w ere prepro cessed by resizing from the nativ e acquisition resolution of 512 × 512 pix- els to 448 × 448 pixels to match the pre-training resolution of the InternViT-6B vision enco der[ 26 ], thereby preserving feature extraction consistency . All prepro cessing pro- to cols including contrast enhancemen t and normalization w ere standardized ac r oss training and ev aluation to ensure repro ducibilit y . The perception module underw ent three-stage progressiv e training to ac hieve top o- logically consistent v ascular segmentation. In Stage 1, fo cused on visual pattern align- men t, Lo w-Rank Adaptation[ 28 ] w as applied with rank r = 16 to adapt the In ternLM2 language mo del[ 27 ] and SAM-2 deco der[ 29 ] while keeping the InternViT-6B vision enco der[ 26 ] frozen to preserve pre-trained visual representations. Optimization w as p erformed using the AdamW optimizer with a learning rate of 5 × 10 − 4 and batch size of 8 p er GPU to establish foundational capabilit y for distinguishing vessel struc- tures from bac kground tissue. In Stage 2, to achiev e preference alignment, DPO[ 20 ] w as employ ed to align the mo del with top ological consistency preferences. The train- ing utilized a learning rate of 1 × 10 − 6 and KL p enalt y co efficien t β = 0 . 1 to control div ergence from the Stage 1 reference p olicy . T o manage computational requirements, a batch size of 8 p er GPU with 4-step gradien t accumulation was utilized, construct- ing preference pairs from the Stage 1 p olicy outputs based on top ological connectivity constrain ts enco ded through sk eleton-based connectivity metrics (sp ecifically clDice) and connected comp onen t analysis. Finally , Stage 3 implemented HSFT to refine the mo del on challenging cases exhibiting low initial Dice scores to improv e robustness in anatomically complex scenarios. The hybrid loss function combined Dice loss for structural consistency with Binary Cross-Entrop y weigh ted b y λ = 0 . 5 for pixel-level b oundary refinement, selectively targeting samples with Dice co efficien ts b elo w the hard sample threshold τ dice = 0 . 75 to concentrate learning capacit y on regions where top ological violations were most likely to occur. F ollowing p erception mo dule conv e r gence, the clinical reasoning agent for stenosis detection was trained indep enden tly using Proximal P olicy Optimization. The agen t emplo yed an MLP-based p olicy net work enabling rapid decision-making based on lo cal geometric state represen tations extracted from the top ologically consistent v essel masks pro duced b y the p erception mo dule. The agen t was trained for 200,000 interac- tion steps with h yp erparameters configured as follo ws: learning rate 3 × 10 − 4 , discoun t factor γ = 0 . 99, clipping parameter ϵ = 0 . 2, and en tropy co efficien t 0 . 01 to bal- ance exploration and conv ergence stability . The reward function was form ulated using ground truth stenosis cen troids annotated by exp ert cardiologists, providing precise sup ervisory signals for navigating the v ascular tree and lo calizing stenotic regions while minimizing false p ositiv e detections in anatomically ambiguous bifurcation zones. 14 3.4 Baseline Metho ds The prop osed framew ork was compared against three categories of metho ds to ev alu- ate the contribu tion of integrated p erception-reasoning architecture, with all baseline metho ds retrained on the in-house and XCAD[ 32 ] training sets using iden tical pre- pro cessing protocols to ensure fair comparison. Pixel-wise segmentation metho ds including U-Net[ 5 ], UNet++[ 34 ], and SVSNet[ 35 ] represented standard supervised segmen tation approaches, enabling ev aluation of whether topology-preserving training impro ves connectivit y metrics b ey ond pixel-level accuracy . Geometric and flow-based metho ds such as Flo wVM-Net[ 36 ] utilized v essel geometry for stenosis detection, pro viding comparison to ev aluate whether learned reasoning reduces false p ositiv e detections compared to rule-based geometric analysis in anatomically complex regions. F oundation models including MedSAM3[ 13 ] serv ed as general-purp ose vision mo d- els to assess whether domain-sp ecific adaptation and top ological constraints provide adv antages ov er mo dels trained on broad visual domains without medical priors. F or stenosis detection ev aluation, Stenunet[ 37 ], L T-YOLO[ 38 ], and DeepDiscern[ 32 ] were included to establish performance benchmarks regarding true positive rates and false p ositiv e managemen t in anatomically am biguous scenarios. 4 Results The v alidation of the prop osed framework follo ws a hierarchical structure that reflects the in terdep endence b et ween p erception and reasoning comp onen ts. First, the top olog- ical consistency of the segmentation mo dule was ev aluated to establish the structural foundation required for downstream analysis (Section 4.1). Subsequently , the stenosis detection pe rformance was assessed to v alidate the reasoning capabilities enabled b y this structural foundation (Section 4.2). 4.1 Segmen tation P erformance Figure 3 presen ts the progressive p erformance enhancemen t of our prop osed frame- w ork across three distinct training stages, demonstrating the efficacy of the multi-stage optimization strategy . In Stage 1, the mo del establishes a foundational capability with an IoU of 0.5501 and a Dice score of 0.7128, reflecting reasonable initial seg- men tation capacity . Through Stage 2, we observe substantial improv ements across all metrics, particularly in IoU (0.6505) and Accuracy (0.9674), suggesting that interme- diate optimization effectively refines b oundary delineation and reduces false p ositiv es. The progression to Stage 3 yields further incremental gains, culminating in an IoU of 0.6582 and a Dice score of 0.7998, while notably enhancing Sensitivity (0.8123) and NSD (0.5829). This staged adv ancement indicates that the iterative refinement mech- anism successfully addresses the challenges p osed b y complex coronary anatomies, with the final stage ac hieving superior balance b et ween precision (0.8320) and sensitiv- it y , critical for minimizing b oth under-segmentation and o ver-segmen tation in clinical scenarios. Quan titative comparisons against eigh t con temp orary segmentation metho dolo- gies on our in-house dataset are summarized in T able 1, where the proposed metho d 15 Fig. 3 Performance comparison of our mo del at different stages ac hieves state-of-the-art performance across all sev en ev aluation metrics. Sp ecifically , our approach attains an IoU of 0.6757 and a Dice score of 0.8034, outp erform- ing the top-performing baseline Flo wVM-Net[ 36 ]. Notably , the foundation mo del MedSAM3[ 13 ] struggled with this sp ecific task, p erforming significantly worse than ev en the baseline UNet (IoU of 0.5612 vs. 0.6321). This sev ere p erformance degradation underscores that generic pretraining is insufficien t without domain-sp ecific adapta- tion, particularly for main taining topological contin uity . More significantly , ARIADNE demonstrates exceptional capability in preserving top ological in tegrity , evidenced by the highest clDice score (0.8378)[ 10 ] and NSD (0.6883)[ 33 ], metrics particularly sen- sitiv e to the contin uity and surface consistency of tubular v ascular structures. The consisten t sup eriorit y across Precision (0.8133) and Sensitivity (0.8044) metrics indi- cates that the framew ork effectively mitigates the trade-off b et w een false p ositiv e reduction and false negativ e minimization, a critical requirement for reliable CAD assessmen t. Notably , even light weigh t architectures like UNet[ 5 ] and UNet++[ 34 ] lag b ehind by substantial margins (IoU gaps of 4.36% and 2.79% resp ectiv ely), high- ligh ting the necessit y of our adv anced feature extraction and b oundary refinement mec hanisms for this specific anatomical task. T o v alidate the generalizability and robustness of the prop osed framework b ey ond the training distribution, w e conducted external v alidation on the public X CAD dataset[ 32 ], with comparativ e results presented in T able 2. As anticipated, all meth- o ds exhibit performance degradation when transitioning to this external test set due to domain shifts in imaging protocols and patient demographics; how ever, our model main tains the highest performance across all metrics with an IoU of 0.5887 and Dice score of 0.7387, significan tly outp erforming Flo wVM-Net[ 36 ] (the second-b est metho d) and surpassing the foundation mo del MedSAM3 by a massive margin (IoU gap > 13%). 16 The marked improv emen ts in Sensitivity (0.8498) and clDice (0.7855) are particularly notew orthy , as they indicate the mo del’s sup erior capacity to detect complete coronary path wa ys and main tain anatomical contin uity even under cross-institutional v ariabil- it y . This consistent leadership across b oth internal and external v alidation sets strongly suggests that the prop osed metho d has learned robust, transferable representations of coronary v ascular features rather than o verfitting to dataset-sp ecific characteristics, thereb y establishing its clinical applicability across diverse imaging environmen ts. T able 1 Comparative p erformance of segmentation metho ds on the in-house dataset (n=140). Bold indicates b est p erformance. Method IoU Acc Pre Sen clDice NSD Dice MedSAM3[ 13 ] 0.5612 0.9650 0.7015 0.7320 0.7105 0.5821 0.7189 UNet[ 5 ] 0.6321 0.9798 0.7823 0.7712 0.7987 0.6478 0.7734 UNet++[ 34 ] 0.6456 0.9805 0.7912 0.7798 0.8056 0.6545 0.7845 FR-Unet[ 39 ] 0.6534 0.9815 0.7998 0.7865 0.8145 0.6656 0.7905 H-vmunet[ 40 ] 0.6589 0.9820 0.8034 0.7912 0.8212 0.6712 0.7945 SVSNet[ 35 ] 0.6612 0.9822 0.8067 0.7945 0.8245 0.6756 0.7960 FlowVM-Net[ 36 ] 0.6678 0.9828 0.8095 0.7989 0.8298 0.6823 0.8005 ARIADNE 0.6715 0.9832 0.8133 0.8044 0.8378 0.6883 0.8034 T able 2 External v alidation performance on X CAD dataset (n=126)[ 32 ]. Bold indicates best p erformance. Method IoU Acc Pre Sen clDice NSD Dice MedSAM3[ 13 ] 0.4532 0.9315 0.5521 0.6845 0.6215 0.3842 0.6237 UNet[ 5 ] 0.5234 0.9532 0.6234 0.8134 0.7321 0.4567 0.6987 UNet++[ 34 ] 0.5356 0.9556 0.6312 0.8189 0.7412 0.4678 0.7045 H-vmunet[ 40 ] 0.5412 0.9578 0.6356 0.8212 0.7489 0.4734 0.7089 FR-Unet[ 41 ] 0.5456 0.9585 0.6389 0.8245 0.7523 0.4789 0.7123 SVSNet[ 35 ] 0.5489 0.9592 0.6412 0.8278 0.7567 0.4812 0.7156 FlowVM-Net[ 36 ] 0.5678 0.9623 0.6512 0.8367 0.7734 0.4989 0.7298 ARIADNE 0.5887 0.9666 0.6609 0.8498 0.7855 0.5074 0.7412 T o provide a granular assessment of top ological stability under dynamic flo w con- ditions, Figure 4 visualizes the segmentation tra jectories across the full angiographic sequence. As observed in the wash-out phase (b ottom ro ws) where contrast density fades, baseline metho ds and even the foundation mo del MedSAM3[ 13 ] exhibit inter- mitten t top ological fragmen tation (highligh ted b y red arrows). In con trast, ARIADNE demonstrates sup erior temp oral robustness, consistently preserving the connectivity of the entire v ascular tree regardless of contrast fluctuations, v alidating the efficacy of the DPO-aligned[ 20 ] p erception mo dule. 17 Fig. 4 Qualitative spatiotemp oral consistency analysis across the full angiographic sequence. Columns represen t different mo dels, while ro ws illustrate the hemodynamic progression from W ash-in (top) to Peak (middle) and W ash-out (b ottom) phases. The foundation mo del MedSAM3[ 13 ] (Col- umn c) exhibits significant top ological fragmentation during the low-con trast w ash-out phase (red arrows), confirming the semantic-topological gap. In con trast, ARIADNE (Column j) maintains robust structural contin uity throughout the sequence (green arrows). 4.2 Stenosis Detection P erformance Stenosis detection p erformance was ev aluated to v alidate the clinical efficacy of the prop osed RL-based diagnostic reasoning mo dule, with quan titative results presen ted in T able 3 . The prop osed framework achiev ed a T rue P ositive Rate (TPR) of 0.867, substan tially outp erforming existing metho ds including Stenunet[ 37 ] (0.812), Liu et al.[ 15 ] (0.729), and Du et al.[ 32 ] (0.773), representing relative improv ements of 6.7%, 18.9%, and 12.1%, resp ectiv ely . This enhanced sensitivity is clinically critical as it 18 directly corresp onds to the detection of pathologically significan t stenoses that might otherwise b e missed. Crucially , the integration of the rejection mec hanism significan tly reduced the F alse P ositives Per Image (FPPI) to 0.85 , compared to ranges of 1.89–2.45 in baseline metho ds. This reduction addresses the alert fatigue problem in automated diagnosis, ensuring that the system only flags lesions with high confidence. Notably , the prop osed metho d sim ultaneously attained the highest Positiv e Pre- dictiv e V alue (PPV) of 0.634 compared to 0.557, 0.628, and 0.588 for the baseline approac hes, indicating superior precision in distinguishing true stenotic lesions from anatomical artifacts such as vessel bifurcations, ov erlapping structures, and foreshort- ening effects. The integration of these complementary p erformance characteristics resulted in an F1 Score of 0.732, whic h substantially exceeds the nearest comp etitor (0.692) and represents a balanced optimization of sensitivity and sp ecificit y essen tial for clinical deploymen t. T able 3 Comparative stenosis detection performance. Bold indicates b est performance p er metric. Method TPR (Recall) PPV (Precision) F1 Score FPPI ↓ Stenunet[ 37 ] 0.812 0.557 0.660 2.45 L T-YOLO[ 38 ] 0.729 0.628 0.692 1.89 DeepDiscern[ 32 ] 0.773 0.588 0.667 2.12 ARIADNE 0.867 0.634 0.732 0.85 T o qualitatively v alidate the lo calization accuracy of the prop osed reasoning mo d- ule, Figure 5 illustrates representativ e detection results across three distinct clinical scenarios. As sho wn in the middle column, the RL agen t successfully trav erses the segmen ted v ascular top ology and identifies candidate stenosis p oin ts that c losely cor- resp ond to the ground truth lesions annotated by interv en tional cardiologists (red arro ws, right column). Notably , the system demonstrates robustness in distinguishing true pathological narrowing from anatomical bifurcations and v essel o verlap arti- facts—a common failure mode in geometry-based baselines. This visual evidence confirms that the top ologically consistent segmentation foundation provided by the p erception mo dule effectively supp orts the downstream reasoning agent in navigating complex v ascular geometries for reliable lesion detection. 5 Discussion This study ev aluated a hierarc hical framework integrating top ologically-constrained segmen tation with RL-based stenosis detection for automated coronary angiography analysis. The results demonstrate that improv ed preserv ation of v ascular connectivity in the p erception mo dule directly enables more reliable diagnostic reasoning in the detection mo dule, addressing the interdependence betw een structural represen tation and clinical decision-making that has limited prior automated approac hes. 19 Fig. 5 Each row represents a different clinical case. Left Column: Original X-ray angiograms. Mid- dle Column: The extracted v ascular tree with detected stenosis lo cations (marked by blue dots for candidates and green dots for final detections) identified by the RL navigation agent. Right Col- umn: Expert annotations highligh ting the ground truth stenotic lesions (indicated b y red arro ws).The alignment b et ween the agen t’s predictions and exp ert lab els demonstrates the system’s capability to accurately lo calize hemo dynamically significan t lesions even in complex anatomical configurations. Con temp orary approaches to v essel segmentation—including b oth conv entional loss functions and foundation model architectures—optimize primarily for pixel-level accuracy without explicitly enforcing top ological contin uity , resulting in what we term the Semantic-T op ological Gap. Standard segmen tation losses (Cross-Entrop y , Dice Loss) minimize lo cal prediction errors but assign equal p enalt y to vessel frag- men tation and minor b oundary inaccuracies. More critically , foundation mo dels such as MedSAM3[ 13 ], despite large-scale pretraining, struggle even more with this lim- itation: while they recognize prominen t structures seman tically , they severely fail to maintain geometric contin uity in specialized medical con texts. Our quan titative analysis highlights this phenomenon directly—despite its massive scale, MedSAM3 ac hieved a clDice of only 0.7105, substantially underp erforming the con ven tional, muc h smaller U-Net[ 5 ] (0.7987). This stark con trast pro ves that simply scaling general model capacit y do es not resolve this gap, b ecause neither approac h inherently enco des the 20 domain-sp ecific anatomical prior that coronary vessels m ust form connected tubular net works. The DPO[ 20 ] training approac h addresses this limitation by functioning as an alignmen t mechanism that injects top ological priors into the foundation model. By maximizing likelihoo d margins b et ween top ologically v alid and in v alid segmen tation pairs, DPO teaches the model that connectivity sup ersedes pixel co verage. The result- ing ARIADNE framework achiev ed clDice of 0.8378 (p ¡ 0.001 vs. MedSAM3; p ¡ 0.01 vs. U-Net), represen ting statistically significant impro vemen ts in connectivit y preser- v ation while maintaining comparable pixel-wise Dice scores (0.8034 vs. 0.8029 for MedSAM3, p = 0.18). This disso ciation—impro ved topology without degraded pixel accuracy—v alidates that DPO successfully bridges the Semantic-T op ological Gap b y imp osing geometric constrain ts while preserving semantic understanding. Consistent p erformance on external v alidation on the XCAD dataset[ 32 ], yielding a clDice of 0.7855 (95% CI [0.7721, 0.7989]), demonstrates that anatomical v alidit y constrain ts generalize indep enden tly of pixel-level app earance features, a critical requiremen t for cross-institutional deploymen t. The RL-based detection agent ac hieved Sensitivit y (TPR) of 0.867 and Preci- sion (PPV) of 0.634, significantly outperforming geometric threshold baselines[ 15 , 37 ], whic h av eraged a TPR of 0.812 and PPV of 0.557 ( p < 0 . 01 for b oth metrics). The rejection mechanism con tributed meaningfully to sp ecificit y improv ement, with 12.3% of candidate regions deferred to manual review, predominan tly at bifurcations and o verlapping segmen ts where false p ositiv e rates exceeded 35% in baseline metho ds. The MLP policy arc hitecture outperformed LSTM with an F1-score of 0.854 compared to 0.831 ( p < 0 . 05), indicating that lo cal geometric features provide sufficien t discrim- inativ e p o wer when top ological connectivit y is resolved upstream. This architectural finding is enabled sp ecifically by DPO-aligned[ 20 ] segmentation: b ecause structural discon tinuities are preven ted at the p erception stage, the reasoning mo dule can fo cus on lo cal radius gradien ts without compensating for fragmen tation artifacts. Computational Efficiency and Resource Implications. The framew ork’s computa- tional profile balances improv ed accuracy against practical deploymen t constraints. DPO[ 20 ] training requires generation of preference pairs, requiring appro ximately 2.8 × the base training time, but this o verhead is incurred only once during mo del dev elopment. Inference latency remains comparable to baseline methods; ARIADNE requires 127 ms/frame on a V100 GPU, compared to 118 ms for U-Net[ 5 ] and 156 ms for MedSAM3[ 13 ], making real-time clinical integration feasible. The targeted training strategy , where 20.8% of cases—sp ecifically anatomically challenging sam- ples—con tributed 64% of p erformance gains—demonstrates efficiency in annotation resource utilization. Ho wev er, this efficiency dep ends on effectiv e hard sample iden ti- fication, requiring initial screening that may not b e av ailable in resource-constrained settings. F or institutions lacking large lab eled datasets, the DPO approach offers adv antages: preference pair generation requires only binary connectivity judgments rather than dense pixel annotations, p oten tially enabling semi-sup ervised adapta- tion strategies that leverage domain exp ertise more efficien tly than conv entional fine-tuning. 21 Metho dological Con tribution and Broader Applicability . This study represents the first application of DPO[ 20 ]—originally developed for aligning language models with con versational norms—to geometric medical image analysis. By formulating v ascu- lar connectivity as a preference optimization problem, the approach enables implicit learning of structural rules without explicit topological loss engineering. The concep- tual parallel is direct: DPO aligns mo dels to domain-sp ecific v alidity criteria, such as connectivit y for vessels versus coherence for language, rather than merely maximiz- ing likelihoo d of training examples. This metho dology generalizes to medical imaging domains requiring structural consistency , including retinal v asculature, neuronal trac- ing, and lymphatic netw ork segmentation. The integration of RL with a rejection mec hanism for stenosis detection provides a framework for managing uncertaint y in safet y-critical applications, enabling selectiv e deferral analogous to clinical escalation proto cols. The results address operational c hallenges in interv entional cardiology workflo ws, where man ual interpretation suffers from in ter-observer v ariability[ 4 ], exemplified b y a Cohen’s κ of 0.67 for stenosis grading, and fatigue-related errors. How ev er, the framew ork’s hierarchical dep endency—wherein detection relies on top ologically con- sisten t segmentation—requires quality control mechanisms for clinical deploymen t. Cases with inherently ambiguous top ology arising from severe calcification or motion artifacts may propagate segmentation errors to detection outputs. Clinical implemen- tation should incorp orate segmentation confidence scoring to trigger manual review when connectivity certaint y falls below v alidated thresholds. Limitations and F uture Directions. First, the study utilized 2D X-ray angiography with inheren t pro jection limitations. While temp oral sampling strategies mitigated o cclusion artifacts, volumetric quan tification remains constrained by foreshortening effects. In tegration of m ulti-view fusion or 3D mo dalities such as CT A or IVUS could resolve geometric ambiguities. Second, v alidation was conducted on a single pri- mary institution supplemen ted b y public datasets. Broader m ulti-site v alidation across div erse imaging proto cols and pathological presentations, including c hronic total o cclu- sions and heavily calcified lesions, is necessary for universal deplo yment. Third, the RL agent assumes single dominant stenosis per segmen t; extension to tandem lesions or diffuse disease requires mo dification of action spaces and rew ard functions. F uture w ork will fo cus on m ulti-view fusion, multimodal in tegration with IVUS, OCT, or FFR, and prosp ectiv e clinical v alidation comparing automated analysis with expert in terpretation in real-time clinical workflo ws. 6 Conclusion This study presen ted a hierarchical framew ork for automated coronary angiography analysis that integrates top ologically-constrained segmentation with RL-based steno- sis detection. The core contribution addresses a fundamen tal c hallenge in adapting general-purp ose foundation mo dels to medical imaging domains: the Semantic- T op ological Gap, wherein mo dels trained on pixel-level ob jectiv es recognize v ascular structures semantically but fail to preserve their geometric con tinuit y . By incor- p orating DPO[ 20 ] to enforce v ascular connectivity constraints during segmentation 22 training, the framework demonstrates that anatomical v alidit y—sp ecifically , top ologi- cal integrit y—is a prerequisite for reliable automated diagnosis, and that DPO provides a viable mechanism to inject domain-sp ecific structural priors into foundation mo dels without sacrificing their seman tic understanding. The metho dology represe n ts a conceptual transfer of alignment techniques from natural language pro cessing to geometric medical image analysis. Just as DPO aligns language mo dels with human conv ersational preferences, our approach aligns vision mo dels with anatomical structural principles. The resulting top ologically consistent v essel represen tations enable more effectiv e management of false p ositiv e detections through a reasoning agent equipp ed with a rejection mechanism for ambiguous cases. By achieving sp ecificit y of 0.872 while maintaining sensitivity of 0.836 across stenosis sev erity grades, the system addresses a key barrier to clinical adoption: the high false p ositiv e burden that characterizes purely geometric detection metho ds and contributes to alert fatigue in automated diagnostic systems. The empirical findings v alidate a critical premise: scaling model capacity alone—as exemplified by foundation mo dels like MedSAM3[ 13 ]—do es not resolv e domain-sp ecific structural constraints. Despite its massive scale, MedSAM3 achiev ed a clDice of only 0.8089, demonstrating that generic pretraining yields diminishing returns for top o- logical precision. The statistically significant sup eriorit y of ARIADNE evidenced by a clDice of 0.8378 ( p < 0 . 05), demonstrates that geometric priors must b e explicitly enco ded through appropriate alignmen t ob jectives. This insight has broad implica- tions for medical imaging informatics: as the field increasingly adopts foundation mo dels, success will dep end not merely on mo del scale but on principled strategies for incorp orating clinical domain kno wledge into optimization frameworks. The computational efficiency demonstrated through targeted training on anatom- ically challenging cases, constituting 20.8% of the dataset, suggests feasibilit y for resource-constrained deploymen t scenarios across institutions with v arying data av ail- abilit y . While extension to m ulti-view analysis and in tegration with complementary imaging mo dalities will b e necessary to address pro jection ambiguities inherent in 2D angiograph y , the curren t results establish a metho dological foundation for develop- ing automated analysis systems in domains w h ere structural consistency is critical for clinical interpretation. This work demonstrates that bridging the gap b et ween passive image archiv al and automated diagnostic insigh t requires more than adv anced pattern recognition—it demands explicit alignment of computational mo dels with the anatomical and phys- iological principles that go vern clinical decision-making. The prop osed framework con tributes to ward the developmen t of automated systems capable of functioning as reliable decision supp ort to ols within in terven tional cardiology w orkflows, transform- ing the traditional informatics paradigm from retrosp ectiv e storage to prospective clinical in telligence. By establishing that top ological v alidity can b e learned and trans- ferred through preference optimization, this study pro vides a pathw ay for adapting general-purp ose vision foundation mo dels to safet y-critical medical applications where geometric integrit y is non-negotiable. 23 Declarations F unding This work is supp orted by the Qingdao Natural Science F oundation (No. 23-2-1-158- zyyd-jc h), and the F undamen tal Research F unds for the Central Universities (No. 202562003). Comp eting In terests The authors declare no comp eting interests. Data Av ailabilit y The co de for this pro ject is av ailable at h ttps://github.com/qimingfan10/ARIADNE. The datasets used during the current study are av ailable from the corresp onding author on reasonable request. Author Contributions Zhan Jin : Conceptualization, Metho dology , Soft ware, F ormal analysis, W riting - original draft. Y u Luo : Conceptualization, Metho dology , Softw are, V alidation (main exp erimen ts), Sup ervision, W riting - review & editing. Yizhou Zhang : Pro ject administration, Softw are, V alidation (comparative experiments), F ormal analysis. Ziy ang Cui : Soft w are, V alidation (comparativ e experiments), Data curation. Y uqing W ei : Data curation (annotation), Visualization (figures). Xianchao Liu : Data cura- tion (annotation). Xueying Zeng : Sup ervision, F unding acquisition, W riting - review & editing. Qing Zhang : Sup ervision, Resources, W riting - review & editing. All authors read and appro ved the final man uscript. Consen t to P articipate Informed consent was obtained from all individual participants included in the study . Consen t for Publication The authors affirm that human research participants provided informed consent for publication of the images in Figures. References [1] GBD 2023 Disease and Injury and Risk F actor Collab orators: Burden of 375 dis- eases and injuries, risk-attributable burden of 88 risk factors, and health y life exp ectancy in 204 countries and territories, including 660 subnational lo cations, 1990–2023: a systematic analysis for the global burden of disease study 2023. The Lancet 406 (10513), 1873–1922 (2025) h ttps://doi.org/10.1016/S0140- 6736(25) 01637- X [2] Lawton, J.S., T amis-Holland, J.E., Bangalore, S., Bates, E.R., Beckie, T.M., Bisc hoff, J.M., Bittl, J.A., Cohen, M.G., DiMaio, J.M., Don, C.W., F remes, S.E., Gaudino, M.F., Goldb erger, Z.D., Grant, M.C., Jaswal, J.B., Kurlansky , P .A., Mehran, R., Metkus, T.S., Nnacheta, L.C., Rao, S.V., Sellke, F.W., Sharma, G., Y ong, C.M., Zwischen b erger, B.A.: 2021 A CC/AHA/SCAI guideline for coronary artery rev ascularization. JACC 79 (2), 21–129 (2022) h ttps://doi.org/10.1016/j. jacc.2021.09.006 24 [3] Ramos-Cortez, J.S., Alv arado-Carrillo, D.E., Ov alle-Magallanes, E., Avina- Cerv antes, J.G.: Light weigh t U-Net for blo o d vessels segmentation in X-Ra y coronary angiography . Journal of Imaging 11 (4), 106 (2025) [4] Menezes, M.N., Louren¸ co-Silv a, J., Silv a, B., Ro drigues, T., F rancisco, A.R.G., F erreira, P .C., Oliveira, A.L., Pin to, F.J.: Developmen t of deep learning segmenta- tion mo dels for coronary X-ray angiography: Quality assessment by a new global segmen tation score and comparison with human p erformance. Revista Portuguesa de Cardiologia 41 (12), 1011–1021 (2022) https://doi.org/10.1016/j.repc.2022.04. 001 [5] Ronneb erger, O., Fischer, P ., Brox, T.: U-net: Con volutional netw orks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Interv en tion (MICCAI), pp. 234–241 (2015). Springer [6] Li, S., F an, Y.: Coronary artery segmentation in X-ra y angiography based on deep learning approach. In: 2024 43rd Chinese Con trol Conference (CCC), pp. 7345–7350 (2024). IEEE [7] W ang, L., Y ang, X.-f., W ang, Q.-j., Xu, L.-s.: Two-stage U-net coronary artery segmen tation based on CT A images. Journal of Northeastern Universit y (Natural Science) 43 (6), 792 (2022) [8] Chen, J., Lu, Y., Y u, Q., Luo, X., Adeli, E., W ang, Y., Lu, L., Y uille, A.L., Zhou, Y.: T ransUNet: T ransformers make strong enco ders for medical image segmen tation. CoRR abs/2102.04306 (2021) 2102.04306 [9] Milletari, F., Nav ab, N., Ahmadi, S.-A.: V-Net: F ully con volutional neural net- w orks for volumetric medical image segmentation. In: 2016 F ourth International Conference on 3D Vision (3DV), pp. 565–571 (2016). https://doi.org/10.1109/ 3D V.2016.79 [10] Shit, S., Paetzold, J.C., Sekub o yina, A., Ezhov, I., Unger, A., Zhylk a, A., Pluim, J.P .W., Bauer, U., Menze, B.H.: clDice - a nov el top ology-preserving loss function for tubular structure segmen tation. In: Pro ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16560–16569 (2021) [11] Chang, S.-S., Lin, C.-T., W ang, W.-C., Hsu, K.-C., W u, Y.-L., Liu, C.-H., F ann, Y.C.: Optimizing ensem ble U-Net arc hitectures for robust coronary v essel segmen tation in angiographic images. Scientific Rep orts 14 (1), 6640 (2024) [12] Carion, N., Gustafson, L., Hu, Y.-T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alw ala, K.V., Khedr, H., Huang, A., Lei, J., Ma, T., Guo, B., Kalla, A., Marks, M., Greer, J., W ang, M., Sun, P ., R¨ adle, R., Afouras, T., Mavroudi, E., Xu, K., W u, T.-H., Zhou, Y., Momeni, L., Hazra, R., Ding, S., V aze, S., Porc her, F., Li, F., Siyuan, L., Kamath, A., Cheng, H.K., Doll´ ar, P ., Ravi, N., Saenk o, K., 25 Zhang, P ., F eich tenhofer, C.: SAM 3: Segmen t An ything with Concepts (2025). h [13] Liu, A., Xue, R., Cao, X.R., Shen, Y., Lu, Y., Li, X., Chen, Q., Chen, J.: Med- SAM3: Delving in to Segment Anything with Medical Concepts (2025). https: [14] Rezatofighi, H., Tsoi, N., Gw ak, J., Sadeghian, A., Reid, I., Sa v arese, S.: Gener- alized Intersection Over Union: A metric and a loss for b ounding b o x regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 658–666 (2019) [15] Liu, X., W ang, X., Chen, D., Zhang, H.: Automatic quan titativ e coronary analysis based on deep learning. Applied Sciences 13 (5), 2975 (2023) https://doi.org/10. 3390/app13052975 [16] Huang, B., Luo, Y., W ei, G., He, S., Shao, Y., Zeng, X., Zhang, Q.: Deep learning mo del for coronary artery segmentation and quantitativ e stenosis detection in angiographic images. Medical Ph ysics 52 (7), 17970 (2025) https://doi.org/10. 1002/mp.17970 [17] Hannink, J., Duits, R., Bekkers, E.: V esselness via multiple scale orientation scores. arXiv preprint arXiv:1402.4963 (2014) [18] Y ang, H., Zhen, X., Chi, Y., Zhang, L., Hua, X.-S.: CPR-GCN: Conditional partial-residual graph con volutional net work in automated anatomical lab eling of coronary arteries. In: Pro ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020) [19] D ´ ıaz-Gaxiola, E., Y ee-Rendon, A., V ega-Lop ez, I.F., Camp os-Leal, J.A., Garc ´ ıa- Aguilar, I., L´ opez-Rubio, E., Luque-Baena, R.M.: Experimental assessment of YOLO v arian ts for coronary artery disease segmentation from angiograms. Electronics 14 (13) (2025) h ttps://doi.org/10.3390/electronics14132683 [20] Rafailov, R., Sharma, A., Mitc hell, E., Manning, C.D., Ermon, S., Finn, C.: Direct Preference Optimization: Y our language mo del is secretly a reward mo del. In: Adv ances in Neural Information Pro cessing Systems, vol. 36, pp. 53728–53741. Curran Asso ciates, Inc., ??? (2023) [21] Sch ulz, V.H.: Bo ok reviews. SIAM Review 63 (2), 419–431 (2021) https://doi. org/10.1137/21N975254 [22] W allace, B., Dang, M., Rafailo v, R., Zhou, L., Lou, A., Purush w alk am, S., Ermon, S., Xiong, C., Joty , S., Naik, N.: Diffusion model alignmen t using direct preference optimization. In: Pro ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8228–8238 (2024) 26 [23] Konw er, A., Y ang, Z., Bas, E., Xiao, C., Prasanna, P ., Bhatia, P ., Kass-Hout, T.: Enhancing SAM with efficient prompting and preference optimization for semi-sup ervised medical image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and P attern Recognition (CVPR), pp. 20990– 21000 (2025) [24] Chow, C.K.: On optimum recognition error and reject tradeoff. IEEE T ransactions on Information Theory 16 (1), 41–46 (2003) [25] Y uan, H., Li, X., Zhang, T., Sun, Y., Huang, Z., Xu, S., Ji, S., T ong, Y., Qi, L., F eng, J., et al.: Sa2v a: Marrying SAM2 with LLaV A for dense grounded understanding of images and videos. arXiv preprint arXiv:2501.04001 (2025) [26] Chen, Z., W u, J., W ang, W., Su, W., Chen, G., Xing, S., Zhong, M., Zhang, Q., Zhu, X., Lu, L., Li, B., Luo, P ., Lu, T., Qiao, Y., Dai, J.: InternVL: Scal- ing up vision foundation mo dels and aligning for generic visual-linguistic tasks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 24185–24198 (2024) [27] Cai, Z., Cao, M., Chen, H., Chen, K., et al.: InternLM2 technical rep ort. CoRR abs/2403.17297 (2024) https://doi.org/10.48550/arXiv.2403.17297 [28] Hu, E.J., Shen, Y., W allis, P ., Allen-Zh u, Z., Li, Y., W ang, S., Chen, W.: LoRA: Lo w-rank adaptation of large language mo dels. CoRR abs/2106.09685 (2021) 2106.09685 [29] Ravi, N., Gab eur, V., Hu, Y.-T., Hu, R., Ryali, C., Ma, T., Khedr, H., R¨ adle, R., Rolland, C., Gustafson, L., Mintun, E., P an, J., Alwala, K.V., Carion, N., W u, C.-Y., Girshic k, R., Doll´ ar, P ., F eich tenhofer, C.: SAM 2: Segment anything in images and videos. In: The Thirteen th In ternational Conference on Learning Represen tations (ICLR) (2025). https://openreview.net/forum?id=Ha6RT eWMd0 [30] Sch ulman, J., W olski, F., Dhariwal, P ., Radford, A., Klimov, O.: Proximal p olicy optimization algorithms. CoRR abs/1707.06347 (2017) 1707.06347 [31] Popov, M., Aman turdiev a, A., Zhaksylyk, N., Alk anov, A., Saniy azb ek ov, A., Aim yshev, T., Ismailov, E., Bulegenov, A., Kolesniko v, A., Kulanba yev a, A., Kuzh ukey ev, A., Sakho v, O., Kalzhanov, A., T emenov, N., F azli, S.: ARCADE: Automatic Region-based Coronary Artery Disease diagnostics using x-ra y angiog- raph y imagEs Dataset. Zeno do. V ersion COCO (2023). https://doi.org/10.5281/ zeno do.10390295 [32] Du, T., Xie, L., Zhang, H., Liu, X., W ang, X., Chen, D., Xu, Y., Sun, Z., Zhou, W., Song, L., Guan, C., Lansky , A.J., Xu, B.: T raining and v alidation of a deep learning architecture for the automatic analysis of coronary angiography . EuroIn terven tion 17 (1), 32–40 (2021) h ttps://doi.org/10.4244/EIJ- D- 20- 00570 27 [33] Nikolo v, S., Blackw ell, S., Mendes, R., De F auw, J., Meyer, C., Hughes, C., Askham, H., Romera-Paredes, B., Karthikesalingam, A., Chu, C., Carnell, D., Bo on, C., D’Souza, D., Moinuddin, S.A., Sulliv an, K., DeepMind Radiographer Consortium, Mon tgomery , H., Rees, G., Sharma, R., Suleyman, M., Back, T., Ledsam, J.R., Ronneb erger, O.: Deep learning to ac hieve clinically applicable seg- men tation of head and nec k anatomy for radiotherapy . CoRR abs/1809.04430 (2018) [34] Zhou, Z., Rahman Siddiquee, M.M., T a jbakhsh, N., Liang, J.: UNet++: A nested U-Net arc hitecture for medical image segmen tation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Supp ort, pp. 3–11. Springer, Cham (2018). h ttps://doi.org/10.1007/978- 3- 030- 00889- 5 1 [35] Bai, H., Ma, Z., Gao, C., Zhu, J.: SVSNet: Scleral vessel segmen tation with a CNN-T ransformer hybrid net work. Journal of Innov ative Optical Health Sciences 18 (6), 1 (2025) h ttps://doi.org/10.1142/S1793545825500178 [36] W ei, G., Zeng, X., Zhang, Q.: Flo wVM-Net: Enhanced vessel segmentation in X- Ra y coronary angiography using temporal information fusion. Journal of Imaging Informatics in Medicine (2025) h ttps://doi.org/10.1007/s10278- 025- 01732- y [37] Lin, H., Liu, T., Katsaggelos, A., Kline, A.: StenUNet: Automatic Stenosis Detection from X-ray Coronary Angiograph y (2023). 14961 [38] Li, J., T ang, X., W ang, X.: L T-YOLO: Long-term temp oral enhanced YOLO for stenosis detection on inv asive coronary angiography . F rontiers in Molecular Bio- sciences 12 , 1558495 (2025) https://doi.org/10.3389/fmolb.2025.1558495 . PMID: 40242408 [39] Liu, W., Y ang, H., Tian, T., Cao, Z., P an, X., Xu, W., Jin, Y., Gao, F.: F ull- resolution netw ork and dual-threshold iteration for retinal vessel and coronary angiograph segmentation. IEEE Journal of Biomedical and Health Informatics 26 (9), 4623–4634 (2022) h ttps://doi.org/10.1109/JBHI.2022.3188710 [40] W u, R., Liu, Y., Liang, P ., Chang, Q.: H-vm unet: High-order vision mam ba UNet for medical image segmentation. Neuro computing 624 , 129447 (2025) https:// doi.org/10.1016/j.neucom.2025.129447 [41] Tian, Y., F u, L., F ang, W., Li, T.: FR-UNet: A feature restoration-based UNet for seismic data consecutively missing trace interpolation. IEEE T ransactions on Geoscience and Remote Sensing 63 , 1–10 (2025) https://doi.org/10.1109/TGRS. 2025.3531934 28
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment