SAMSEM -- A Generic and Scalable Approach for IC Metal Line Segmentation
IACR Transactions on Cryptographic Hardware and Embedded Systems
ISSN XXXX-XXXX, Vol. 0, No. 0, pp. 1–24. DOI:XXXXXXXX

Christian Gehrmann¹, Jonas Ricker², Simon Damm², Deruo Cheng³, Julian Speith¹, Yiqiong Shi³, Asja Fischer² and Christof Paar¹

¹ Max Planck Institute for Security and Privacy (MPI-SP), Bochum, Germany, {christian.gehrmann, julian.speith, christof.paar}@mpi-sp.org
² Ruhr University Bochum (RUB), Bochum, Germany, {jonas.ricker, simon.damm, asja.fischer}@rub.de
³ Nanyang Technological University (NTU), Singapore, Singapore, {deruo.cheng, yqshi}@ntu.edu.sg

Abstract. In light of globalized hardware supply chains, the assurance of hardware components has gained significant interest, particularly in cryptographic applications and high-stakes scenarios. Identifying metal lines on scanning electron microscope (SEM) images of integrated circuits (ICs) is one essential step in verifying the absence of malicious circuitry in chips manufactured in untrusted environments. Due to varying manufacturing processes and technologies, such verification usually requires tuning parameters and algorithms for each target IC. Often, a machine learning model trained on images of one IC fails to accurately detect metal lines on other ICs. To address this challenge, we create SAMSEM by adapting Meta's Segment Anything Model 2 (SAM2) to the domain of IC metal line segmentation. Specifically, we develop a multi-scale segmentation approach that can handle SEM images of varying sizes, resolutions, and magnifications. Furthermore, we deploy a topology-based loss alongside pixel-based losses to focus our segmentation on electrical connectivity rather than pixel-level accuracy.
Based on a hyperparameter optimization, we then fine-tune the SAM2 model to obtain a model that generalizes across different technology nodes, manufacturing materials, sample preparation methods, and SEM imaging technologies. To this end, we leverage an unprecedented dataset of SEM images obtained from 48 metal layers across 14 different ICs. When fine-tuned on seven ICs, SAMSEM achieves an error rate as low as 0.72% when evaluated on other images from the same ICs. For the remaining seven unseen ICs, it still achieves error rates as low as 5.53%. Finally, when fine-tuned on all 14 ICs, we observe an error rate of 0.62%. Hence, SAMSEM proves to be a reliable tool that significantly advances the frontier in metal line segmentation, a key challenge in post-manufacturing IC verification.

Keywords: Hardware Assurance · Metal Line Segmentation · SAM2

Licensed under Creative Commons License CC-BY 4.0.

1 Introduction

Integrated circuits (ICs) are deployed across all aspects of our digital society. They provide the foundation not only for our electronic devices, such as smartphones and computers, but also for many safety and security applications in critical infrastructure and defense. To this end, assurance of the absence of malicious circuitry in such ICs is essential to ensure the reliability and trustworthiness of their supply chain. This is particularly true for many cryptographic applications, as hardware Trojans in, e.g., a trusted platform module (TPM), could easily be leveraged to subvert security altogether. One means to achieve such assurance is through physical inspection, i.e., destructively opening up ICs, grinding them down layer by layer while continuously taking images of every layer.
Nowadays, such images are captured using scanning electron microscopes (SEMs) due to ever-shrinking feature sizes in modern ICs. These images are then analyzed and compared against a ground truth [PMB+23], such as Graphic Data System II (GDSII) design files, or used to extract a gate-level netlist [HCS+18] for further analysis. This destructive imaging process often results in hundreds of gigabytes of images that need to be analyzed in order to ensure the absence of malicious circuitry. To this end, machine-learning models are increasingly deployed to improve accuracy in image segmentation and recognition tasks while ensuring scalability [RKA+23, HCS+18]. For the polysilicon layer that implements the transistors of an IC, pattern-matching approaches are often deployed [KSS+20, ZWD+25]. In contrast, the metal layers mostly require segmentation of metal lines and vias [KSS+20, HCS+18]. However, while this segmentation of typically bright metal lines against a dark background may seem straightforward, it turns out to be rather complicated in practice for multiple reasons: (i) Manufacturing technologies vary not only between ICs, but also between different layers of the same IC, resulting in vastly different shapes; see Figure 1 for examples of metal-layer images. A segmentation algorithm optimized for one layer of one IC may not generalize to other layers or other ICs without further adjustments. (ii) Delayering errors, artifacts from sample preparation, or dust particles can conceal critical connections and impede even advanced image analysis algorithms. (iii) Structures on modern ICs are so dense that adjacent but unconnected metal lines may appear to be linked for an automated algorithm due to image distortions or noise, which are inherent to SEM images.
(iv) The characteristics of SEM images differ vastly depending on the employed SEM, its detectors, and image capturing settings such as magnification, dwelling time, resolution, and acceleration voltage.

Figure 1: Metal layer images from three different ICs (columns: Image, Ground Truth, FCN, SAMIC, SAM2, Ours) as well as the corresponding ground-truth mask and the image segmentation masks produced by different methods. Non-obvious electrically significant difference (ESD) errors in the segmentation are marked with orange circles.

Especially for netlist extraction, a process in which SEM images of all IC layers are analyzed to recover a gate-level netlist description of the implemented circuit, we can tolerate only very few segmentation errors. Such errors would significantly impair the analysis of the recovered netlist later on. Currently, the aforementioned issues are often addressed by manually annotating a subset of the images from each layer of every IC and then using these annotated images to train a machine learning model (or adjust parameters in a classical image processing pipeline) to segment the remainder of that layer [TUSP18, NTCG24], resulting in significant manual overhead.

Our Contributions. This work addresses a sub-problem in IC image analysis: developing a solution for metal line segmentation that generalizes well across different layers and ICs without requiring manual intervention or retraining. This step is essential for reconstructing the interconnections of transistors and, by extension, standard cells on an IC, the analysis of which is crucial for hardware assurance. Figure 1 depicts segmentation issues observed for metal line segmentation techniques from the literature when applied to previously unseen ICs.
To address such generalization issues across metal layers and ICs, we present SAMSEM¹, which builds on top of Meta's Segment Anything Model 2 (SAM2) [RGH+25] for image segmentation. While SAM2 strives for generalized segmentation in natural images, we fine-tune and evaluate the model on metal line segmentation using an unprecedented dataset comprising SEM images from 14 different ICs. To deal with varying resolutions, magnifications, and metal line structure sizes in SEM images, we develop a multi-scale segmentation approach and wrap it around the fine-tuned SAM2. For gate-level netlist extraction, we are less interested in pixel-accurate segmentation than in ensuring correct electrical connectivity. Therefore, for fine-tuning of the SAM2 model, we deploy a topology-based loss function in addition to standard pixel-based losses. In particular, by considering the structure's topology in each image, this loss function explicitly penalizes both short and open connections between metal lines. This way, we emphasize correct electrical connections over pixel accuracy in the segmentation. In the context of hardware assurance, this reduces the number of short circuits and open connections observed in an extracted netlist, thereby significantly improving the effectiveness of later analysis steps on the gate-level netlist. Our experiments are performed on a small cluster of eight Nvidia H100 machine learning accelerators, and fine-tuning on a single Nvidia H100 takes around 72 hours when utilizing the full dataset. In summary, our main contributions are:

• Throughout our experiments, we utilize an unprecedented dataset of metal-layer images from 14 different ICs, spanning technology nodes from 200 nm down to 20 nm.

• We develop SAMSEM by adapting the SAM2 foundation model to the domain of IC metal-layer image segmentation via multi-scale segmentation, a topology-based loss function, and fine-tuning.
Thereby, we achieve an in-distribution error rate of 0.72% when fine-tuning and evaluating on seven different ICs, compared to 4.44% for the best-known approach that we could reproduce.

• We demonstrate that SAMSEM generalizes well, even for unseen ICs, and therefore does not require retraining when analyzing new ICs. Our approach achieves an out-of-distribution error rate of 5.53% on the seven previously unseen ICs, representing a significant improvement over the 24.77% error rate observed for the best-reproduced approach from the literature.

• We publish a model fine-tuned on 90% of all images across all 14 ICs, along with all our fine-tuning and benchmarking scripts, so that others can build upon our work. This final model achieves an in-distribution error rate of just 0.62%.

¹ Segment Any Metal-layer Scanning Electron Microscope image

Structure of this Paper. We introduce the technical background and discuss related work on metal line segmentation, machine learning, and topology-based loss functions in Section 2. In Section 3, we describe our methodology, including our multi-scale segmentation pipeline, training and evaluation datasets, the topology-based loss function, and the fine-tuning process, along with details on hyperparameter optimization. Section 4 provides details on our evaluation procedures and gives results for our segmentation and generalization performance. Finally, we discuss limitations and draw conclusions in Section 5.

2 Technical Background and Related Work

In this section, we review relevant technical background and discuss related work on IC metal line segmentation in Section 2.1, Meta's SAM2 in Section 2.2, and topology-based loss functions in Section 2.3.
2.1 Metal Line Segmentation

ICs are built from a polysilicon layer at the bottom that (for digital logic) implements transistors and many metal layers on top. These metal layers establish the electrical connection between the transistors on the polysilicon layer, thereby forming an interconnected circuit. Hence, a full extraction of all metal lines from IC images is an essential step toward hardware assurance. Machine-learning-based image segmentation algorithms are often employed to this end, which produce a segmentation mask that clearly distinguishes between metal lines and the background. However, the appearance of metal lines from different ICs, different layers of the same IC, and even under different SEM capture settings may vary significantly. Therefore, metal line segmentation algorithms must often be manually adjusted for the specific dataset on which they operate.

Related Work. Trindade et al. propose classical image processing for metal line segmentation [TUSP18], which must be tuned for every new dataset. Furthermore, they introduce the ESD error as a metric to focus on relevant deviations of the segmentation output from the ground truth. The metric counts the open and short circuits in the segmentation, as well as the false positives (FPs) (i.e., segmented metal lines that do not exist) and false negatives (FNs) (i.e., undetected metal lines in the original images). This ESD metric is more relevant for circuit extraction than pixel-accuracy metrics, as it focuses on errors that result in a faulty extracted circuit description. Cheng et al. are the first to use machine learning for metal line segmentation by proposing a method based on hybrid K-means clustering and support vector machines [CSL+18]. In the same year, Hong et al. introduced the first deep-learning-based technique and evaluated it using a metric similar to the ESD metric [HCS+18]. Yu et al.
built their segmentation solution around HRNet and also used the ESD metric for evaluation [YTG+22]. Later, Tee et al. concluded that common training approaches do not properly capture the local nature of ESD errors [THC+23]. Hence, they put forward a patch-based approach, aiming a discriminator network (comparing the ground truth to intermediate segmentation results) at small patches of the input images. None of the aforementioned approaches has been tested for whether they generalize to unseen ICs or even to other layers within the same ICs. Instead, they are evaluated on a small dataset often obtained from just a single layer of a single IC. We assume this is the case for economic reasons, as sample preparation and imaging are tedious and costly.

Rothaug et al. approach metal line segmentation using unsupervised learning to reduce the manual workload when facing new datasets [RKA+23, RCK+25]. However, some of their parameters must still be adjusted for every new dataset. They also used the ESD metric for evaluation and released not only their code, but also their training dataset. To combat manual parameter adjustments altogether, Ng et al. proposed fine-tuning Meta's Segment Anything Model (SAM), the predecessor of SAM2, using a dataset from four different ICs, resulting in a model they coined Segment Anything Model for Integrated Circuit Image Analysis (SAMIC) [NTCG24]. However, they did not evaluate SAMIC's out-of-distribution performance on ICs not seen during fine-tuning, used a significantly smaller dataset than us, did not employ topological loss functions, and did not address different scales of input images. In our experiments, we show that SAMSEM is superior to their approach by around one order of magnitude.
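As a rough illustration of how ESD-style counting can work (this is our own simplified sketch based on the metric's description above, not the reference implementation of Trindade et al.), one can label connected components in the predicted and ground-truth masks and inspect their overlaps: a predicted line touching several ground-truth lines indicates a short, a ground-truth line covered by several predicted lines indicates an open, and components without any counterpart are FPs or FNs.

```python
from collections import deque

def components(mask):
    """4-connected components of a binary mask; returns a label grid and count."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not labels[i][j]:
                n += 1
                labels[i][j] = n
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = n
                            q.append((ny, nx))
    return labels, n

def esd_errors(pred, gt):
    """Simplified ESD-style counts: shorts (one predicted line bridging several
    ground-truth lines), opens (one ground-truth line split across several
    predicted lines), false positives and false negatives."""
    pl, pn = components(pred)
    gl, gn = components(gt)
    pred_hits = {p: set() for p in range(1, pn + 1)}  # GT lines touched by each predicted line
    gt_hits = {g: set() for g in range(1, gn + 1)}    # predicted lines touching each GT line
    for prow, grow in zip(pl, gl):
        for p, g in zip(prow, grow):
            if p and g:
                pred_hits[p].add(g)
                gt_hits[g].add(p)
    return {
        "short": sum(1 for s in pred_hits.values() if len(s) > 1),
        "open": sum(1 for s in gt_hits.values() if len(s) > 1),
        "fp": sum(1 for s in pred_hits.values() if not s),  # predicted line with no GT counterpart
        "fn": sum(1 for s in gt_hits.values() if not s),    # GT line never detected
    }
```

For example, a prediction that bridges two parallel ground-truth lines with a single extra pixel column yields one short, while a one-pixel gap in a predicted line yields one open.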
2.2 Segment Anything Model 2

The Segment Anything Model 2 (SAM2) is a large foundation segmentation model developed by Meta [RGH+25]. It was primarily designed for a broad range of segmentation tasks on real-world images and videos, but can be fine-tuned for specific applications. Compared to other segmentation networks, SAM2 technically requires a prompt input, which can be a point, a mask, or a bounding box of the object to be segmented.

Figure 2: Model components of SAM2 and their interactions [RGH+25].

SAM2 comprises an image encoder, memory attention, mask decoder, prompt encoder, memory encoder, and memory bank; see Figure 2. The image encoder works in conjunction with the memory attention to generate the image embeddings for the mask decoder. Memory attention is used for videos and has no effect on image segmentation; hence, we disregard it. For image encoding, a Hiera image encoder [RHB+23] is used, pre-trained with a masked autoencoder (MAE) [HCX+22]. This image encoder extracts visual features from the input images and provides these feature embeddings as unconditioned tokens to the mask decoder (via the memory attention module). Next, the prompt encoder accepts a range of user input prompts, such as masks, points, and boxes. For our work, we focus on input points to generate positional encodings. The point prompt can be either positive or negative, indicating whether the foreground or background should be selected. The mask decoder generates the final segmentation masks based on the feature embeddings produced by the image encoder and the user prompts processed by the prompt encoder. While the produced segmentation mask is usually binary, confidence scores for each pixel indicating its likelihood of belonging to the segmented object can also be accessed.

Related Work.
Numerous works have built upon SAM2 to improve its performance in specific domains. For example, Mandal et al. [MKP25] proposed SAM2LoRA to efficiently fine-tune SAM2 for retinal fundus segmentation by only considering specific parameters for fine-tuning and freezing the remaining ones. To better cope with varying image resolutions, Liu et al. [LYvD+24] present WSI-SAM, which aims to handle high-resolution whole-slide images in the context of pathology. Although their approach implies changes to the SAM architecture, it does not require full retraining of the entire model. In a similar vein, Gao et al. [GZYL24] improve SAM for salient object detection by better capturing multi-scale features and preserving fine-grained details. HQ-SAM by Ke et al. [KYD+23] adds a learnable high-quality output token to SAM, and fuses features from different layers to produce much more accurate and detailed segmentation masks, especially for objects with intricate shapes. Finally, Meta released the Segment Anything Model 3 (SAM3) [CGH+25] shortly before the submission of our work. At the time of writing, SAM3 is available only for fine-tuning on request and is not yet publicly accessible.

2.3 Topology-Based Loss Functions

Pixel-based loss functions like the Dice loss [SLV+17] or the intersection over union (IoU) loss [RTG+19] are commonly used for training image-related machine learning models. These loss functions penalize pixel differences between the model output (e.g., a segmentation mask) and the ground truth. In contrast to pixel-based loss functions, topology-based loss functions prioritize topological accuracy over pixel-level accuracy. Such loss functions are often researched in the domain of biological cell boundary detection and automated street recognition in satellite images [HLSC19, HWL+21].
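To make the contrast with pixel-based losses concrete, the following toy sketch (our own minimal illustration, not one of the published losses discussed in this section) compares the number of connected components (the zeroth Betti number) of a predicted mask against the ground truth. Two predictions with identical pixel error can receive very different topological penalties.

```python
from collections import deque

def betti_0(mask):
    """Number of 4-connected foreground components (zeroth Betti number)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                count += 1
                seen[i][j] = True
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return count

def topo_penalty(pred, gt):
    """Penalty for mismatched component counts; zero iff both masks
    contain the same number of connected structures."""
    return abs(betti_0(pred) - betti_0(gt))

# A two-line ground truth and two predictions, each wrong in exactly one pixel:
gt     = [[1, 1, 1], [0, 0, 0], [1, 1, 1]]
shrunk = [[1, 1, 0], [0, 0, 0], [1, 1, 1]]  # pixel missing at a line end: topology intact
split  = [[1, 0, 1], [0, 0, 0], [1, 1, 1]]  # pixel missing mid-line: line breaks in two
```

Here `topo_penalty(shrunk, gt)` is 0 while `topo_penalty(split, gt)` is 1, even though a pixel-based loss charges both predictions equally. Note that such a count-based penalty is not differentiable; the works discussed below obtain usable gradients via constructions such as persistence diagrams or feature matching.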
In our case, we are interested in correct electrical connections rather than the accurate segmentation of every pixel on an IC image. Hence, topology-based loss functions seem to be a natural fit.

Related Work. In their seminal work on topology-based image segmentation, Hu et al. propose a loss function based on Betti numbers, which count structures such as the connected components and holes in an input image [HLSC19]. They use persistence diagrams to measure the topological similarity between the ground truth and the model prediction. Later, Shit et al. proposed clDice as a loss function specifically designed for tubular structures such as vascular networks and roads [SPS+21]. The loss compares the centerline of a predicted tubular structure (i.e., its skeleton) to the ground truth. Liu et al. later enhanced a skeleton-based loss function with better awareness of structure boundaries [LMB+24]. The DMT loss proposed by Hu et al. uses discrete Morse theory to identify unwanted critical points in the predicted segmentation [HWL+21]. Similar to Hu et al. [HLSC19], persistence diagrams of the prediction are compared to the ground truth. Hu et al. further advance the field by using homotopy warping to consider not only the number of topological features, but also their geometric placement [Hu22]. It identifies topologically critical pixels by warping the predicted mask into the ground truth. Stucki et al. build upon the initial publication of Hu et al. [HLSC19]: instead of simply measuring overall topological counts (i.e., Betti numbers), they match individual topological features between the predicted segmentation and the ground truth in a spatially consistent way, defining a Betti matching error that can be used as a differentiable loss to improve the segmentation's topological correctness [SPS+23]. Similarly, Wen et al.
encode spatial awareness with persistence diagrams to construct their loss function [WZB+24]. Expanding on their own work, Stucki et al. adapt Betti matching to 3D segmentation by developing a faster implementation of the Betti matching loss, making topology-based loss functions practical for volumetric data [SBPB24]. Berger et al. then extend Betti matching to multi-class segmentation by reducing an n-class segmentation to n single-class segmentation tasks and applying induced barcode matchings for each class [BLS+24]. Finally, Topograph by Lux et al. encodes the topology as a component graph, with nodes corresponding to pixels or regions thereof and edges representing adjacency [LBW+25]. By building their loss function solely on graph algorithms, they achieve better efficiency than most other algorithms while providing strong topological guarantees for segmentation.

3 Methodology

We focus on segmenting metal lines while ignoring vias, since vias can typically be detected using thresholding or machine learning [SLG20]. In this section, we describe how we construct SAMSEM to produce reliable segmentation results, even for unseen ICs. After motivating fine-tuning in Section 3.1 and introducing our multi-scale segmentation pipeline in Section 3.2, we describe our dataset comprising 14 ICs in Section 3.3. Next, in Section 3.4, we explain our fine-tuning, data augmentation, and (topological) loss function. We then report the best-suited parameters for our segmentation pipeline as determined through hyperparameter optimization in Section 3.5. Finally, we describe how we generate segmentation masks with SAMSEM in Section 3.6.

3.1 Motivation

SAM2 was trained on a large and diverse dataset of natural images. It was designed to segment RGB images of objects, animals, or people.
SAMSEM is based on SAM2, but targets the non-interactive segmentation of metal layer images from ICs taken with SEMs. These images differ significantly in that they are grayscale and exhibit repeating structures, but little variation in their features. Hence, when simply segmenting our metal layer images, SAM2 (without fine-tuning) produces catastrophic results with a segmentation pixel accuracy (PA) of only 0.639 and an ESD error rate of 75.6%; see Table 2. To truly leverage the power of SAM2 for IC metal layer image segmentation, we must adapt the model to our specific domain and fine-tune the SAM2 model on representative data.

3.2 Segmentation Pipeline

We now discuss the segmentation pipeline of our metal-layer segmentation system based on SAM2. The pipeline takes into account that SEM images of metal layers may differ significantly in shape, size, and resolution.

3.2.1 Model 1 – Segmenting Original-Size Images

Initially, we fine-tuned SAM2 by simply providing the original-size images obtained from an SEM, along with the ground truth. We refer to this model as model 1. The SAM2 image encoder operates on 1024 × 1024-pixel images; hence, images of other sizes are simply scaled to fit its resolution. In our setting, we often encounter high-resolution images of 10,000 × 10,000 pixels or more, as these are typical output sizes for SEMs. When fed to the image encoder, these images are simply scaled down to 1024 × 1024 pixels. However, such metal layer images often depict fine-grained structures that are then brought closer together (in terms of pixel distance) or even fused during downscaling. This, in turn, (A) causes short circuits in the segmentation when compared to the ground truth, thereby significantly increasing the ESD error rate; see the fused metal lines in Figure 3c.

(a) Input image. (b) Ground truth. (c) Segmentation mask.
Figure 3: (A) – (a) to (c) depict short circuits in the segmentation mask resulting from downscaling the input image to fit the shape expected by the image encoder.

3.2.2 Model 2 – Segmenting Images in Smaller Patches

A straightforward approach to address this issue would be to simply cut the input images into smaller patches and feed only these patches into the SAM2 image encoder for segmentation. To this end, we fine-tuned a second model, named model 2, on 512 × 512-pixel patches only. The patches are cut out with at least 10% overlap and upscaled to 1024 × 1024 pixels to fit the image encoder shape. The upscaling proved useful as it increases the pixel distance between closely neighboring metal lines. Model 2 significantly improves upon the issue observed in model 1 and thereby reduces the number of shorts in the segmentation. However, it also generates a different class of errors that drastically increases the number of false positives in the resulting segmentation mask. Analyzing these errors, we found some input image patches that consist primarily, or even entirely, of either background or foreground structures. For these patches, our model 2 hallucinates segmentations that do not exist in the actual input images. Here, we observed two types of errors stemming from these hallucinations: Firstly, (B) the segmentation mask may end up being mostly inverse to the ground truth, e.g., background is segmented as a metal line; see Figures 4a to 4c. Secondly, (C) parts of the patch may be wrongly classified as a metal line with speckled transitions to background; see Figures 4d to 4f. Here, each white speckle would be interpreted as a false positive by the ESD metric, thereby vastly increasing our error rate.

(a) Input image. (b) Ground truth. (c) Segmentation mask. (d) Input image. (e) Ground truth.
(f) Segmentation mask.

Figure 4: (B) – (a) to (c) depict background being segmented as a metal line because of the lack of structures in the input image, resulting in FPs or short circuits. (C) – (d) to (f) show white speckles around the correctly identified metal line in the segmentation mask, resulting in vast amounts of false positives.

3.2.3 Multi-Scale Segmentation Approach

To address both issues at once, we introduce a multi-scale segmentation approach, as depicted in Figure 5. We simultaneously segment the full-size image using model 1 and 512 × 512-pixel patches extracted from the input image using model 2. This multi-scale approach ensures that our fully-automated segmentation algorithm performs well for both small and large structures in arbitrary metal layer images and operates reliably, independent of a SEM's magnification and resolution. By working on full-size images, we ensure that SAM2 is provided sufficient context for larger structures that extend beyond a single patch, while working on patches improves segmentation performance for fine-grained structures. This approach produces two segmentation masks for each input image: one composed of 512 × 512-pixel patches and one directly corresponding to the segmentation of the original-sized image. Both masks are analyzed patch-by-patch to construct the final segmentation mask. Since SAM2 always produces segmentation masks of the size of the input image, the mask produced by model 1 is cut into patches of 512 × 512 pixels. The masks produced by model 2 are already of size 512 × 512. Our decision algorithm, which we present hereafter, then composes the final segmentation mask by choosing patches from either model 1 or model 2 based on lightweight quality metrics.
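The exact tiling used to cut out the 512 × 512 patches is not spelled out here; as an assumption-laden sketch (the function names and the even spacing of the grid are our own choices), patch origins with at least 10% overlap that cover an axis exactly could be computed as follows. Each patch would then be upscaled to 1024 × 1024 pixels before being fed to the image encoder.

```python
def patch_origins(length, patch=512, min_overlap=0.1):
    """Top-left coordinates of patches along one axis such that consecutive
    patches overlap by at least `min_overlap * patch` pixels and the last
    patch ends exactly at `length`."""
    if length <= patch:
        return [0]
    max_stride = int(patch * (1 - min_overlap))   # largest stride still giving 10% overlap
    steps = -(-(length - patch) // max_stride)    # ceil division: number of strides needed
    stride = (length - patch) / steps             # spread the strides evenly
    return [round(i * stride) for i in range(steps)] + [length - patch]

def patch_grid(height, width, patch=512, min_overlap=0.1):
    """All (top, left) origins of overlapping patches covering the image."""
    return [(t, l) for t in patch_origins(height, patch, min_overlap)
                   for l in patch_origins(width, patch, min_overlap)]
```

For a 1024 × 1024 image this yields a 3 × 3 grid of patches with 256-pixel overlaps, well above the 10% minimum of roughly 51 pixels.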
This choice is made for each segmentation mask patch, so the final mask can comprise patches from both approaches. Simply put, the decision algorithm favors segmentation mask patches from model 2 for fine-grained structures in images, while selecting patches from model 1 for images depicting larger structures or lacking structure altogether. Thereby, we combine the advantages of both approaches while mitigating the issues depicted in Figures 3 and 4.

Figure 5: Workflow of our multi-scale segmentation approach.

3.2.4 Deciding Between Patches from Model 1 or Model 2

Our decision algorithm, which chooses between patches from models 1 and 2, is based on classical computer vision techniques applied to the segmentation mask patches produced by either approach. Given that the errors (B) and (C) always produce reproducible patterns, detecting their presence is straightforward. To this end, we count the small components within a segmentation mask patch to determine the number of speckles in the segmentation. Our experiments have shown that many such speckles indicate a noisy segmentation that should be disregarded. For a detected component to be counted as a speckle, it must be at most 16 pixels in size, because larger components are usually valid segmentations. Figure 6 depicts the decision diagram for how a segmentation mask patch from model 1 or model 2 is selected. Here, the property "speckled" refers to a patch containing at least 50 speckles for model 1 and 50 speckles for model 2. Furthermore, two patches (one from each model) "agree" if at least 60% of their pixels are identical. These thresholds were determined through a grid search in which we automatically tested different parameter values to minimize the number of ESD errors across segmentation patches.
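Using the thresholds above, the per-patch selection can be sketched as follows. This is our own simplified reimplementation under hypothetical names; the exact branch ordering of the decision diagram, and in particular the tie-breaking branch when neither patch is speckled, is an assumption on our part.

```python
from collections import deque

SPECKLE_MAX_PIXELS = 16   # components larger than this count as valid structure
SPECKLE_THRESHOLD = 50    # at least this many speckles -> patch is "speckled"
AGREE_FRACTION = 0.6      # >= 60% identical pixels -> patches "agree"

def count_speckles(patch):
    """Count 4-connected foreground components of at most SPECKLE_MAX_PIXELS pixels."""
    h, w = len(patch), len(patch[0])
    seen = [[False] * w for _ in range(h)]
    speckles = 0
    for i in range(h):
        for j in range(w):
            if patch[i][j] and not seen[i][j]:
                size, q = 0, deque([(i, j)])
                seen[i][j] = True
                while q:
                    y, x = q.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and patch[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if size <= SPECKLE_MAX_PIXELS:
                    speckles += 1
    return speckles

def agree(p1, p2):
    """True if at least AGREE_FRACTION of the pixels are identical."""
    total = len(p1) * len(p1[0])
    same = sum(a == b for r1, r2 in zip(p1, p2) for a, b in zip(r1, r2))
    return same / total >= AGREE_FRACTION

def choose_patch(patch1, patch2):
    """Select the mask patch from model 1 or model 2, or flag for inspection."""
    s1 = count_speckles(patch1) >= SPECKLE_THRESHOLD
    s2 = count_speckles(patch2) >= SPECKLE_THRESHOLD
    if s1 and s2:
        return "ERROR"   # both patches noisy: flag for manual inspection
    if s1:
        return patch2    # model 1 hallucinated speckles, trust model 2
    if s2:
        return patch1    # model 2 hallucinated speckles, trust model 1
    # Neither is speckled; tie-break via agreement (this branch is our assumption).
    return patch1 if agree(patch1, patch2) else patch2
```

The speckle counter treats any connected component of at most 16 pixels as noise, matching the size threshold stated above; a checkerboard-like patch of isolated pixels is immediately classified as speckled.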
If both the patch from model 1 and the one from model 2 are considered "speckled", an error is reported, and the respective patch of the original image is flagged for manual inspection.

Figure 6: Decision diagram illustrating how patches from either model 1 or 2 are chosen, or an error is produced, based on their segmentation quality.

3.3 Datasets

Our vast and diverse dataset contains images from 48 metal layers of 14 different ICs. To the best of our knowledge, previous work has operated on images of at most four ICs, cf. Ng et al. [NTCG24]. Our dataset includes these four ICs, one IC published by Rothaug et al. [RKA+23], and multiple layers from 9 additional ICs captured by our own team. While details on these ICs cannot be disclosed for legal reasons, they range from around 200 nm down to 20 nm structure size and are composed of varying materials. The metal layer images were captured using different SEMs, different capturing settings, and also different sample preparation techniques. As shown in Figure 1, the resulting dataset is quite diverse in the shapes, numbers, and sizes of the metal lines. This dataset diversity is essential for fine-tuning the SAM2 model so that it performs well even on ICs it has not seen during fine-tuning. Finally, the dataset enables us to effectively test both the in-distribution performance of SAMSEM on ICs it has seen during fine-tuning and its out-of-distribution performance on unseen ICs; see Section 4.

3.3.1 Dataset Composition

Overall, a small fraction of the metal layer images captured from each IC has been semi-automatically annotated and manually verified to generate a ground truth for training, validation, and testing.
Hence, each image in our dataset is accompanied by a corresponding binary ground-truth mask. Table 1 lists the absolute number of annotated images for each IC, the number of 512 × 512 pixel image patches, and the number of metal layers from which these images have been taken. Due to the variety of capture techniques used for the different ICs, the full-size images vary widely in size. Hence, the number of patches is better suited for a comparison in terms of the amount of available data per IC. For ICs 8 to 14, rather few images have been annotated.

Table 1: Number of images available from each IC, number of 512 × 512 pixel patches, and number of metal layers from which the images were taken.

IC          1     2     3     4     5     6     7
# images    150   70    90    50    50    140   321
# patches   4050  1120  1950  1000  1000  2600  23,112
# layers    3     7     2     1     4     3     1

IC          8     9     10    11    12    13    14
# images    12    5     5     3     3     28    10
# patches   756   342   254   189   48    2735  810
# layers    4     5     5     3     3     6     1

3.3.2 Dataset Splits

For fine-tuning (see Section 3.4), the hyperparameter optimization (see Section 3.5), and evaluation (see Section 4), we split the original-size images from ICs 1 to 7 into three distinct datasets: 70% of the images of each IC are used for training, 10% for validation and model selection during the hyperparameter search, and 20% for testing and evaluation of the in-distribution segmentation accuracy on ICs that the model has seen during fine-tuning. To evaluate the out-of-distribution performance of SAMSEM on unseen ICs, all images of ICs 8 to 14 are used for testing. In particular, none of the images from ICs 8 to 14 are used for training, validation, or model selection. In a separate experiment, we use 90% of all images from all 14 ICs to fine-tune our final model, using the hyperparameters determined beforehand (see Section 4.5).
In this case, the remaining 10% of the images are used to evaluate the segmentation accuracy of this final model.

3.4 Fine-Tuning

In this section, we provide details on how we fine-tune SAM2 for IC metal layer image segmentation. In particular, we discuss the need for fine-tuning, the data augmentation techniques we deployed, and our choice of loss functions.

3.4.1 Selection of Model Components

For SAMIC [NTCG24], Ng et al. only fine-tuned the mask decoder of SAM (version 1). However, we argue that to thoroughly investigate the potential of SAM2 for metal line segmentation, we also need to consider the two remaining model components relevant to image segmentation, cf. Section 2.2. Hence, we now investigate the individual impact of the mask decoder, prompt encoder, and image encoder on SAMSEM's segmentation performance. To this end, we fine-tune seven SAM2 models with all possible combinations of these three components, and compare them against the off-the-shelf SAM2 model as trained by Meta in Table 2. The evaluation setup is equivalent to the in-distribution tests performed later in Section 4.1, i.e., we train on the training set comprising 70% of the images from ICs 1 to 7 and test on the test set containing 20% of the images from the same ICs. We compare the impact of the components using the standard pixel-based metrics pixel accuracy (PA), Dice [SLG20], and IoU [RTG+19], as well as the ESD error rate. The ESD error rate is given as the relative number of ESD errors per metal line.

Table 2: Ablation study for SAMSEM's performance when fine-tuning only selected components of SAM2. The first line refers to the default SAM2 model without fine-tuning.
mask decoder  prompt encoder  image encoder   PA ↑   Dice ↑  IoU ↑  ESD ↓ (%)  training time
□             □               □               0.639  0.111   0.067  75.6       -
■             □               □               0.948  0.933   0.885  12.1       43 h
□             ■               □               0.673  0.332   0.236  66.1       39 h
■             ■               □               0.944  0.927   0.879  11.5       34 h
□             □               ■               0.972  0.960   0.924  0.8        51 h
■             □               ■               0.971  0.957   0.921  1.0        42 h
□             ■               ■               0.969  0.954   0.917  1.4        43 h
■             ■               ■               0.972  0.960   0.925  0.7        44 h

From Table 2, we conclude that the image encoder has the most impact on the segmentation accuracy, achieving an ESD error rate of 0.8%. Fine-tuning the mask decoder also significantly reduces the error rate, to 12.1%. The prompt encoder has little impact on segmentation and still results in a 66.1% ESD error rate. Combining different encoders and decoders naturally produces better results than evaluating them in isolation. We also see that combining all three components yields the lowest ESD error rate of 0.7% and the best pixel accuracy, although this is only a marginal improvement over fine-tuning the image encoder alone. Still, given that there is no significant overhead in training time when combining all three approaches compared to fine-tuning the image encoder alone, we decided to fine-tune the mask decoder, prompt encoder, and image encoder together.

3.4.2 Fine-Tuning Process

SAM2 comes with four different "checkpoints", i.e., model sizes. For SAMSEM, we chose to focus on the large checkpoint, which has the most tunable parameters, as it yielded the most promising results in initial experiments. Fine-tuning is performed using PyTorch v2.7.1 and CUDA 13.0. All fine-tuning runs are executed on a cluster of Nvidia H100 machine-learning accelerators, which comprises a mix of SXM cards with 80 GB and NVL cards with 94 GB of memory. For all our experiments, we set a fixed batch size of 12, as this is the maximum that fits in the 80 GB of memory on the SXM cards.
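Selecting which components to fine-tune amounts to toggling gradient updates per component before handing parameters to the optimizer. The following is a schematic sketch with plain-Python stand-ins: the component names and the `requires_grad` flag mirror the PyTorch idiom, but this is not the real SAM2 class layout.

```python
class Param:
    """Stand-in for a trainable tensor with a PyTorch-style requires_grad flag."""
    def __init__(self):
        self.requires_grad = True

class TinySAM2:
    """Stand-in model with the three components studied in the ablation."""
    def __init__(self):
        self.components = {
            "image_encoder": [Param() for _ in range(4)],
            "prompt_encoder": [Param() for _ in range(2)],
            "mask_decoder": [Param() for _ in range(3)],
        }

def select_finetuned(model, finetune):
    """Freeze everything, then re-enable gradients only for the chosen
    components; return the parameters to pass to the optimizer."""
    trainable = []
    for name, params in model.components.items():
        for p in params:
            p.requires_grad = name in finetune
            if p.requires_grad:
                trainable.append(p)
    return trainable

model = TinySAM2()
# e.g. the "mask decoder + image encoder" row of the ablation
opt_params = select_finetuned(model, {"image_encoder", "mask_decoder"})
```

In real code, `opt_params` would be passed to `AdamW`, so frozen components receive no updates.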
The fine-tuning process is as follows: First, we retrieve segmentation masks from the current model. To this end, we feed the input images of each batch to the image encoder. For each of these images, we then get embeddings of a point prompt from the prompt encoder. Here, we sample a random point (x, y) from the ground-truth foreground (i.e., the point lies on a metal line) and use it as a positive prompt to the prompt encoder. Next, we feed the image and prompt embeddings to the mask decoder, which produces low-resolution masks. These masks are then postprocessed by internal SAM2 functions to obtain three prediction masks for each image in the batch, as we enabled multi-mask output for better segmentation results [RGH+25]. Finally, we compute the loss between the ground truth and the segmentation mask, then perform backward propagation and optimize the targeted model components using AdamW [LH19].

3.4.3 Data Augmentation

We leverage data augmentation during fine-tuning to make our fine-tuned SAM2 model more robust to small perturbations in input images and to expand the training dataset. To this end, we use Torchvision from PyTorch to apply transforms to the training data before feeding it to the fine-tuning algorithms. Below, we list the individual transforms that we deploy in the order in which they are applied to the input data:

1. RandomResizedCrop: size = 1024, scale = [0.99 − 0.5 · intensity, 0.99]
2. GaussianBlur: kernel_size = (5, 9), sigma = 5.0 · intensity
3. ColorJitter: brightness = 0.75 · intensity, contrast = 0.5 · intensity
4. RandomVerticalFlip: probability = 0.5 · intensity
5. RandomHorizontalFlip: probability = 0.5 · intensity
6. GaussianNoise: mean = 0.0, sigma = 0.5 · intensity, clip = True

The specific parameters were manually selected by domain experts to serve as the upper bound for data augmentation.
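The positive-prompt sampling step from the fine-tuning process above can be sketched as follows. This is a minimal illustration on nested lists; SAMSEM's actual implementation operates on batched tensors.

```python
import random

def sample_positive_prompt(gt_mask, rng=None):
    """Sample a random (x, y) point on the ground-truth foreground, i.e., on a
    metal line, to use as a positive point prompt. `gt_mask` is a list of rows
    of 0/1 values."""
    rng = rng or random.Random()
    foreground = [(x, y)
                  for y, row in enumerate(gt_mask)
                  for x, value in enumerate(row) if value]
    if not foreground:
        raise ValueError("ground-truth mask contains no foreground")
    return rng.choice(foreground)

mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 0, 0]]
x, y = sample_positive_prompt(mask, random.Random(0))
assert mask[y][x] == 1  # the prompt always lies on a metal line
```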
The experts selected these parameters so that, for an intensity value of 1, the resulting images could still (but barely) be interpreted by a human analyst. All transforms share a common data augmentation intensity value that can be set to any value in (0, 1] and is determined by our hyperparameter optimization, as shown in Section 3.5. During each iteration, we randomly apply either all data augmentation transforms to the input image or none at all. The probability of applying data augmentation to an image is also determined through hyperparameter optimization.

3.4.4 Loss Functions

For our pixel-based loss function L_pixel, we choose a combination of the binary cross entropy (BCE) loss L_BCE [MMZ23] and the Dice loss L_Dice [SLV+17] based on a blending parameter α ∈ (0, 1) that is determined by our hyperparameter search:

L_pixel = α · L_BCE + (1 − α) · L_Dice.

SAMIC [NTCG24] additionally employs the IoU loss [RTG+19], but our segmentation accuracy did not improve when incorporating the IoU into L_pixel.

We want to promote correct connectivity and penalize shorts and open circuits during training to reduce the ESD error rate. Since the ESD error computation itself is not differentiable, it cannot be used as a loss function right away. Instead, we need to find a differentiable loss term that behaves similarly to the ESD error. To this end, we supplement our pixel-based loss with a topological loss. After careful consideration and consultation with experts, we selected the Betti matching loss [SPS+23], L_Betti, for this purpose. We also explored two other topology-preserving loss functions, namely TopoLoss by Hu et al. [HLSC19] and Topograph by Lux et al. [LBW+25], but decided against them for runtime and effectiveness reasons.
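The blended pixel loss can be written out directly. Below is a minimal per-pixel sketch in plain Python; real training code would use the batched PyTorch implementations of these losses instead.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy between predicted probabilities and 0/1 targets."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2|P ∩ T| / (|P| + |T|)."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def pixel_loss(pred, target, alpha):
    """L_pixel = alpha * L_BCE + (1 - alpha) * L_Dice, with alpha in (0, 1)."""
    return alpha * bce_loss(pred, target) + (1 - alpha) * dice_loss(pred, target)
```

With the α = 0.6 found later by the hyperparameter search (Section 3.5), the BCE term is weighted slightly more than the Dice term.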
Generally, such loss functions focus on correct connectivity, but still need to be combined with pixel-based losses to enforce local accuracy. In the following, we only consider the application of Betti matching to images.

The Betti matching loss is based on concepts from persistent homology, which examines how topological features emerge and disappear as a filtration parameter ϵ varies. In particular, it tracks when topological features, such as connected components, cycles, or holes, are born and die along the filtration. This filtration can, for example, be induced by thresholding a scalar-valued function such as the model's prediction likelihoods. The number of detected topological structures (e.g., corresponding to metal lines in our images) then depends on the chosen threshold value. For each topological feature, a pair (b, d) is stored that records its birth time b (i.e., the value of ϵ at which the feature first appears) and its death time d (i.e., the value of ϵ at which it merges into an older feature or becomes topologically trivial). Tracking many such features simultaneously produces a so-called persistence barcode, where each bar represents the birth and death times of a single feature. Betti numbers β_k(ϵ) summarize this information by counting the number of k-dimensional topological features that are alive at a given value of ϵ. For images, we only consider k ∈ {0, 1}, where β_0(ϵ) counts connected components and β_1(ϵ) counts loops or holes.

The Betti matching loss goes one step further by spatially matching topological features from one image (e.g., the ground truth G) to corresponding features in another image (e.g., the prediction mask L). To this end, persistence barcodes are computed for both images. For the loss computation, an optimal matching between the two persistence barcodes is determined, and the resulting matching error is calculated.
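To make the filtration concrete, the following toy sketch counts β_0 (connected components) of a prediction map binarized at varying thresholds. A real persistence computation tracks the full birth/death pairs of every feature across all thresholds at once; this example only evaluates single superlevel sets.

```python
from collections import deque

def betti_0(prob_map, threshold):
    """Count connected components (4-connectivity) of the foreground obtained
    by keeping pixels with probability >= threshold (a superlevel set)."""
    h, w = len(prob_map), len(prob_map[0])
    fg = [[v >= threshold for v in row] for row in prob_map]
    seen = [[False] * w for _ in range(h)]
    components = 0
    for y in range(h):
        for x in range(w):
            if fg[y][x] and not seen[y][x]:
                components += 1
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components

# Two "metal lines" that merge into one as the threshold drops below 0.4:
pred = [[0.9, 0.9, 0.4, 0.8, 0.8]]
assert betti_0(pred, 0.5) == 2
assert betti_0(pred, 0.3) == 1
```

The merge at threshold 0.4 is exactly the kind of event recorded as a death time in a persistence barcode.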
This error is then used to construct a differentiable, efficient, and topology-aware loss function.

The Betti matching loss L_Betti is already implemented as a Python package² that is based on the work of Stucki et al. [SPS+23, SBPB24] and Berger et al. [BLS+24], making it straightforward to use. It comes with a few tweakable parameters relevant to our hyperparameter search. Firstly, the filtration_type defines how the comparison image C is computed as C ← min(G, L), i.e., the element-wise minimum of ground truth G and prediction mask L, which in turn governs the Betti matching process. It can take the values superlevel (features appear as input values decrease), sublevel (features appear as input values increase), and bothlevels (applies both filtration types and combines the results). Secondly, the parameter barcode_length_threshold governs the robustness of the Betti matching loss against short-lived topological features that may arise from unclean or noisy prediction masks. Lastly, push_unmatched_to_1_0 determines whether unmatched birth points are pushed toward 1 while unmatched death points are pushed toward 0 (True), or whether both points are just pushed together (False).

² https://pypi.org/project/topolosses/

In the end, we construct our final segmentation loss L_seg used for fine-tuning as

L_seg = L_pixel + λ · L_Betti,

with λ ∈ [0, 1] being a blending parameter determined by the hyperparameter search.

3.5 Hyperparameter Optimization

To determine the best-suited hyperparameters for fine-tuning SAM2, we conducted a hyperparameter search using the hyperband pruner from the Optuna framework [ASY+19]. This pruner attempts to minimize an objective function, specifically the relative number of ESD errors in relation to the total number of metal lines, as computed on our validation dataset from ICs 1 to 7.
In total, the hyperparameter search ran for seven days.

3.5.1 Setup

Due to runtime limitations, we opted for a two-step hyperparameter optimization: (i) First, we searched for the best parameters related to data augmentation and pixel-based loss functions by repeatedly fine-tuning SAM2 on our training dataset from ICs 1 to 7. To this end, we set the maximum number of epochs per hyperparameter optimization run to 25 and allowed pruning only after 5 epochs. During this first step, we set λ = 0 and hence L_seg = L_pixel. This is in line with recommendations from Berger et al. to add Betti matching to the loss computation only after pixel-based losses have stabilized, to achieve performance gains. To identify the best parameter set, we evaluated each model created during the hyperparameter search after every epoch using our validation dataset from ICs 1 to 7. We then selected the best-performing parameter set by analyzing the performance of all fine-tuned models at every epoch. Using these selected parameters, we fine-tuned the off-the-shelf SAM2 for 50 epochs. Here, we found that this fine-tuned model performs best after 35 training epochs; hence, we selected this version as a starting point for the second step. (ii) Finally, we enabled the Betti matching loss for continued fine-tuning of this selected model and started a second hyperparameter optimization to determine suitable hyperparameters for this topological loss function. The second hyperparameter optimization ran for 15 additional epochs, bringing the total to up to 50 fine-tuning epochs per trial across both steps.

3.5.2 Results

For the first hyperparameter optimization step (i), we ran 56 Optuna trials, each yielding a fine-tuned model trained for up to 25 epochs. In total, 49 fine-tuning trials were pruned by Optuna before completing all epochs. For the best trial, the objective function, i.
e., the percentage of ESD errors in relation to the total number of metal lines, reached its minimum at epoch 22 with a value of 0.659%.

Based on our evaluation of this trial, we determined the probability of data augmentation being applied to be 38.5% and the data augmentation intensity to be 0.61. For the pixel-based loss function L_pixel and the AdamW optimizer, we fix the learning rate at 1.111 · 10⁻⁵, the weight decay at 2.059 · 10⁻⁶, and a blending parameter value of α = 0.6. Furthermore, we observed that the learning rate has the greatest impact on segmentation performance, with values above 10⁻³ often leading to local minima, resulting in the fine-tuned model producing only completely black segmentation masks.

In the second step (ii), we conducted a total of 75 Optuna trials over 5 days, with 58 trials being pruned before completion. For the best trial, we report λ = 0.375 as the blending parameter for L_seg. Furthermore, we determined the optimal parameter set for the Betti matching loss in our application. To this end, we choose sublevel as the filtration_type, 0.345 as the barcode_length_threshold, and enabled pushing unmatched birth points to 1 and death points to 0 (push_unmatched_to_1_0).

3.6 Segmenting Images

Besides the input image, SAMSEM also requires at least one point prompt pinpointing the structure to segment. To generate an effective point prompt, we need to ensure that the positive prompt we provide is actually located on one of the metal lines in the input image. To this end, we use lightweight classical image analysis on the input images fed to both models 1 and 2 to identify parts that are highly likely to be foreground, i.e., belong to a metal line.
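The foreground identification and prompt placement can be sketched as follows. This is a minimal sketch under stated assumptions: a simple brightness quantile plus small-object removal stands in for the paper's denoising and morphology steps, and the quantile and size values here are illustrative, not SAMSEM's actual settings.

```python
import random
from collections import deque

def foreground_seeds(image, bright_quantile=0.9, min_size=4, n_prompts=5, rng=None):
    """Aggressively threshold the brightest pixels, drop tiny components
    (a crude stand-in for denoising/morphology), then sample point prompts."""
    rng = rng or random.Random()
    values = sorted(v for row in image for v in row)
    cutoff = values[int(bright_quantile * (len(values) - 1))]
    h, w = len(image), len(image[0])
    fg = [[v >= cutoff for v in row] for row in image]
    # remove connected components smaller than min_size (4-connectivity)
    seen = [[False] * w for _ in range(h)]
    keep = []
    for y in range(h):
        for x in range(w):
            if fg[y][x] and not seen[y][x]:
                comp, queue = [], deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cx, cy))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_size:
                    keep.extend(comp)
    return [rng.choice(keep) for _ in range(n_prompts)] if keep else []
```

The returned (x, y) points are the five random positive prompts fed to the fine-tuned model along with the input image.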
We do not aim for a fully accurate segmentation at this point; we just need to identify some foreground areas, but not all of them. The binary mask is generated by aggressively thresholding the brightest pixels in the input image. In addition, we apply denoising, remove small objects, and apply light morphology to eliminate isolated pixels, fill tiny gaps, and smooth object boundaries. We then place five random point prompts within the identified foreground areas and feed them to the fine-tuned model along with the input image. In our experiments, we observed that providing more point prompts does not improve segmentation results. SAMSEM then produces multiple segmentation masks, along with scores rating their quality. We always return the mask with the highest score as the final segmentation mask.

4 Evaluation

First, we evaluate SAMSEM's in-distribution performance using previously unseen SEM images from ICs 1 to 7, as shown in Section 4.1. Next, we investigate SAMSEM's out-of-distribution performance when faced with metal layer images from previously unseen ICs in Section 4.2. We then analyze failure cases in Section 4.3, test different dataset splits with reduced numbers of ICs used for fine-tuning to better understand how SAMSEM generalizes to unseen ICs in Section 4.4, and present its segmentation accuracy when trained on our full dataset of 14 ICs in Section 4.5.

4.1 In-Distribution Performance on Seen ICs

In this section, we evaluate the in-distribution performance of SAMSEM and compare it to previous work and established image segmentation approaches. To this end, we used the training datasets for ICs 1 to 7 that we established in Section 3.3 for all considered techniques. To compare the performance of SAMSEM with other techniques, we fine-tuned SAMIC [NTCG24] using the code provided by Ng et al. on request.
Furthermore, we trained U-Net, DeepLabV3, and FCN using the code published by Rothaug et al. [RKA+23]. At the time of writing, we were unable to execute the unsupervised method proposed by Rothaug et al. on our dataset, as it appeared to get stuck in local minima and just produced all-black segmentation masks during our experiments. We then evaluated each model on our test dataset from ICs 1 to 7 and measured the accuracy of the resulting segmentation against the ground truth. Refer to Table 3 for the results. Hence, in this experiment, we assess how well SAMSEM performs on metal layer images similar in structure to those on which it was trained.

As we are more interested in electrical correctness than in pixel accuracy, we follow Trindade et al. [TUSP18] and adopt their ESD metric for evaluation. We leverage the implementation of Rothaug et al. [RKA+23, RCK+25] to count the number of opens, shorts, FPs, and FNs in the segmented images compared to the ground truth. To this end, we report the absolute number of ESD errors observed in our test set, the relative number of

Table 3: In-distribution performance of SAMSEM compared to other approaches. All models were trained and evaluated on the same training dataset from ICs 1–7. We report the ESD error rate in two forms: as the absolute number of ESD errors observed for our test set, and as the relative number of ESD errors per metal line, expressed as a percentage. The total number of metal lines in this test set is 36,413. Furthermore, we present commonly used metrics for pixel-level segmentation accuracy.
                     ESD ↓                                    pixel ↑
                     total      %  opens  shorts   FPs   FNs  PA     Dice   IoU
SAMSEM                 263   0.72     27     146    59    31  0.972  0.960  0.925
- only pixel loss      323   0.88     29     211    54    29  0.972  0.961  0.926
- single model         289   0.79     22     185    59    23  0.972  0.961  0.925
- only model 1        1914   5.25    223    1519    96    76  0.950  0.926  0.865
- only model 2         286   0.78     27     146    82    31  0.954  0.940  0.900
SAMIC                 2178   5.98    412    1083   649    34  0.961  0.944  0.899
U-Net                 1618   4.44     78     764   735    41  0.966  0.949  0.905
DeepLabV3             2187   6.01    236    1705   212    34  0.961  0.946  0.901
FCN                   2067   5.68    277    1443   316    31  0.963  0.950  0.907

ESD errors per metal line, and the counts of opens, shorts, FPs, and FNs. All evaluations are conducted on the original-size images. For SAMSEM, we also provide evaluation results for when only the pixel-based loss is used, as well as for each of the two models, 1 (operating on the original-size images) and 2 (operating on 512 × 512 pixel patches). Furthermore, we analyze the performance when fine-tuning a single model on both original-size images and smaller patches. We are not yet evaluating SAMSEM's out-of-distribution performance; hence, we test on the ICs we also used for training.

Results. From Table 3, we observe that SAMSEM is on par with SAMIC, U-Net, DeepLabV3, and FCN for in-distribution segmentation in terms of the pixel-based metrics, but produces significantly fewer ESD errors. In particular, SAMSEM drastically reduces the number of opens, shorts, and FPs in the segmentation by about a factor of 6 relative to U-Net, which is the next-best approach. Furthermore, we clearly see the benefit of our multi-scale approach, as models 1 and 2 individually do not achieve comparable results. For the in-distribution analysis, we do not observe a significant impact of using a single model that combines models 1 and 2. The ESD error rate increases marginally, which could still be tolerated in exchange for a more straightforward fine-tuning process.
Finally, we can see a clear benefit of using our Betti matching loss function over pixel-based losses alone.

4.2 Out-of-Distribution Performance on Unseen ICs

To assess how well SAMSEM performs on previously unseen ICs, we utilize the models previously trained on the training dataset from ICs 1 to 7 for the in-distribution evaluation in Section 4.1. This time, however, we test its out-of-distribution performance on all available images from ICs 8 to 14. As these ICs are not used for training, we can safely use all their metal layer images for testing. For comparison, we again evaluate SAMIC, U-Net, DeepLabV3, and FCN on our dataset. Following the same evaluation procedure as described in Section 4.1, we report results from these experiments in Table 4.

Results. Examining the out-of-distribution results in Table 4, we can see that SAMSEM now fully plays to its strengths. While existing work exhibits, at best, an ESD error rate of 24.77% for unseen ICs, meaning that about every fourth segmented metal line is incorrect, SAMSEM achieves an error rate of just 5.53%. Again, we see that using either model 1

Table 4: Out-of-distribution performance of SAMSEM compared to other approaches. All models were trained on the same training dataset from ICs 1–7 and evaluated on ICs 8–14. We report the ESD error rate in two forms: as the absolute number of ESD errors observed for our test set, and as the relative number of ESD errors per metal line, expressed as a percentage. The total number of metal lines in this test set is 15,153. Furthermore, we present commonly used metrics for pixel-level segmentation accuracy.
                     ESD ↓                                    pixel ↑
                     total      %  opens  shorts   FPs   FNs  PA     Dice   IoU
SAMSEM                 838   5.53    531     170   104    33  0.959  0.939  0.888
- only pixel loss      789   5.21    351     234   164    40  0.959  0.944  0.896
- single model         908   5.99    410     235   204    59  0.958  0.937  0.886
- only model 1        1607  10.60    340    1112    72    83  0.936  0.920  0.854
- only model 2        1443   9.52    649     468   280    46  0.934  0.911  0.852
SAMIC                 4116  27.16   2079     363  1608    66  0.934  0.935  0.831
U-Net                 7704  44.90   1570     603  4435   196  0.907  0.830  0.753
DeepLabV3             3753  24.77   2462     154   437   700  0.904  0.787  0.714
FCN                   5815  38.38   3158     303  2006   348  0.910  0.826  0.751

or model 2 in isolation would not yield satisfactory results, although both still perform better than other approaches from the literature. Now, we can also observe degraded performance if we combine models 1 and 2 into a single one. In that case, the ESD error rate for generalization would increase by about 0.5 percentage points. Finally, when using only the pixel-based loss, we observed a slight improvement in segmentation accuracy. Looking more closely, we can observe that SAMSEM with the Betti matching loss results in more open circuits, but fewer shorts and FPs, than the model that was fine-tuned without any topological loss function.

4.3 Analysis of Failure Cases

We now take a closer look at the distribution of ESD errors across the test images used in our in-distribution (ICs 1 to 7) and out-of-distribution (ICs 8 to 14) evaluation. For in-distribution, we observe only one outlier image, exhibiting 92 ESD errors, as shown in Figure 7a. Although we observe more outliers in our out-of-distribution evaluation, we can still report decent results for most images, as shown in Figure 7b. Here, we count 7 images with more than 40 ESD errors and another 5 with more than 20 errors.

(a) In-distribution (ICs 1 to 7). (b) Out-of-distribution (ICs 8 to 14).
Figure 7: Distribution of the absolute number of ESD errors across all original-size images.

We now examine some of these failure cases in more detail to identify the reasons behind the observed results. Across our sample, we mostly observe three sources of errors:

(i) For the outlier in Figure 7a, we observe a slight misalignment between the actual metal lines on the SEM images and the ground truth mask, as shown in Figure 8a. Here, the ESD metric simply counts invalid overlaps between a metal line in the ground truth and vias in the predicted segmentation mask as shorts.

(ii) The first outlier in Figure 7b results from two via-related issues. Existing vias in that image cast a black shade around them, see Figure 8b. They partially obscure the underlying metal line and mislead SAMSEM into predicting background around the via. We observed the same issue for other outliers in the out-of-distribution test, which can be explained by the absence of any images with similar characteristics in our training set. For Figure 8b, we also observe that some vias are missing altogether, as they were accidentally removed during sample preparation. This results in narrow metal lines around the missing via that SAMSEM does not properly segment.

(iii) Some failures around image 54 in Figure 7b are due to delayering defects, see Figure 8c for an example. Here, some metal lines were incorrectly removed during sample preparation, leaving only their outlines visible. These failures increase the ESD error rate across all affected images.

Figure 8: Faulty images with artifacts or defects from sample preparation or imaging. (a) Misalignment of ground truth (green) and image. (b) Dark shades around vias and missing vias. (c) Delayering defects; only outlines of metal lines are visible.
Of course, not all ESD errors can be attributed to such sample preparation and image quality issues. However, our out-of-distribution test set from ICs 8 to 14 is of lower quality than ICs 1 to 7, which helps to explain the more frequent significant outliers in Figure 7b.

4.4 More ICs Result in Better Generalization

We now investigate how the number of ICs seen during fine-tuning affects the out-of-distribution segmentation performance on unseen ICs. To this end, we fine-tune SAM2 on different subsets of ICs 1 to 7, ranging from a single IC to six different ICs. For each number of ICs, we fine-tune on four different randomly chosen IC combinations and then always infer on the same out-of-distribution test set from ICs 8 to 14. As becomes evident from the resulting numbers of ESD errors in Figure 9, the more ICs are used for fine-tuning, the better the out-of-distribution performance. Intuitively, this was expected. However, our results also indicate that the ESD error rate has not yet saturated, and this downward trend is likely to continue when using even more ICs for fine-tuning.

4.5 Fine-Tuning on All 14 ICs

After carefully evaluating the in-distribution and out-of-distribution capabilities of SAMSEM when fine-tuning on seven ICs, we fine-tuned on 90% of all annotated images from

Figure 9: Different training data splits (combinations of one to six ICs, plotted against the number of ESD errors, broken down into opens, shorts, FPs, and FNs) used for fine-tuning to investigate whether more diversity in ICs seen during fine-tuning significantly affects the segmentation accuracy of SAMSEM on unseen ICs.

all 14 ICs.
As our findings from Section 4.4 imply that SAMSEM's generalization accuracy has not yet saturated, we expect this final model to achieve even better segmentation accuracy. To test its in-distribution performance, we allocated 10% of all images from all ICs for testing. When evaluating on this 10% test set, we obtain a PA of 0.971 (Dice: 0.960, IoU: 0.924) and an ESD error rate of 0.62% (22 shorts, 50 opens, 41 FPs, and 6 FNs across 19,088 metal lines). Hence, we observe a slight improvement over the 0.72% in-distribution ESD error rate reported for seven ICs. Given that we now use all our ICs for fine-tuning, we cannot re-evaluate out-of-distribution performance, as the model has already seen all ICs in our dataset. However, as we publish this final model as part of our work, interested readers can test it on their own IC images without further fine-tuning.

5 Discussion and Conclusion

We now discuss limitations of our approach in Section 5.1, potential avenues for future research in Section 5.2, and present our concluding remarks in Section 5.3.

5.1 Limitations

One major limitation is that our ground truth is based on manually annotated SEM image data. In rare cases, it was even difficult, if not impossible, for the human creating the ground truth to decide whether metal lines were connected. Hence, some remaining errors can be attributed to uncertainties in the ground truth. The only viable solution to this problem would be to use the design files of the analyzed ICs as ground truth, but these were not available to us for any of the targeted third-party ICs.

The runtime of our experiments, sometimes spanning up to five days, presented a bottleneck that prevented us from conducting more complex experiments, simply because we lacked the required resources.
We could, for example, not exhaustively test all parameters of the Betti matching loss during hyperparameter optimization. For similar reasons, we were unable to conduct an in-depth exploration of other topology-based loss functions. While we publish our models and scripts as open source, we are unable to do so for the datasets of 13 of the 14 different ICs for legal reasons. Only the dataset we adopted from Rothaug et al. [RKA+23] is made publicly available by the authors. In all other cases, copyright law, license agreements, and the risk of legal action by either the designers or manufacturers of the ICs result in an unfortunate legal situation that could threaten not only us as researchers but also the publishers of works like ours.

5.2 Future Work

While we were working on this publication, SAM3 was released [CGH+25]. At the time of writing, the model was only available on request. Furthermore, the performance data published by Meta does not indicate a significant improvement in image segmentation; therefore, we do not expect a substantial benefit from switching to SAM3. Nonetheless, the applicability of SAM3 to metal line segmentation remains to be investigated. We primarily see additional room for improvement in our generalization results. Here, the use of additional, higher-quality datasets featuring more annotated images could help further improve generalization performance, as suggested by our findings in Section 4.4. Ideally, such datasets could be made publicly available by IC designers or manufacturers, who also possess the GDSII ground truth corresponding to the images. Finally, future research should investigate whether techniques to mitigate sample preparation or imaging defects [HCY+21], as depicted in Figure 8, can be incorporated into approaches like SAMSEM.
To this end, larger datasets containing more erroneous SEM images, paired with reliable ground truth, would be required.

5.3 Conclusion

In this work, we introduced SAMSEM, a tool for IC metal layer image segmentation based on SAM2. We demonstrated that our approach produces reliable segmentation masks with an ESD error rate of 0.62%. SAMSEM works almost flawlessly on ICs that it has been fine-tuned on, significantly improving upon existing work. Furthermore, it shows promising results even for ICs that it was not previously exposed to. We have thus shown that SAMSEM generalizes well across IC technology nodes and image capturing techniques. However, we also see that our approach could be further enhanced by fine-tuning on additional image datasets from more diverse ICs. We therefore strongly advocate for IC designers and manufacturers to make annotated image datasets publicly available as benchmarks, to improve not only the reliability and accuracy of IC verification tools like ours but also the reproducibility and comparability of research results. By publishing our final SAMSEM model trained on images from all 14 ICs, along with our training, inference, and evaluation scripts, as open source, we take a first step in this direction and hope to lower the barriers for researchers entering this domain.

References

[ASY+19] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Ankur Teredesai, Vipin Kumar, Ying Li, Rómer Rosales, Evimaria Terzi, and George Karypis, editors, Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019, pages 2623–2631. ACM, 2019.

[BLS+24] Alexander H.
Berger, Laurin Lux, Nico Stucki, Vincent Bürgin, Suprosanna Shit, Anna Banaszak, Daniel Rueckert, Ulrich Bauer, and Johannes C. Paetzold. Topologically faithful multi-class segmentation in medical images. In Marius George Linguraru, Qi Dou, Aasa Feragen, Stamatia Giannarou, Ben Glocker, Karim Lekadir, and Julia A. Schnabel, editors, Medical Image Computing and Computer Assisted Intervention - MICCAI 2024 - 27th International Conference, Marrakesh, Morocco, October 6-10, 2024, Proceedings, Part VIII, volume 15008 of Lecture Notes in Computer Science, pages 721–731. Springer, 2024.

[CGH+25] Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, et al. SAM 3: Segment anything with concepts. arXiv preprint arXiv:2511.16719, 2025.

[CSL+18] Deruo Cheng, Yiqiong Shi, Tong Lin, Bah-Hwee Gwee, and Kar-Ann Toh. Hybrid K-means clustering and support vector machine method for via and metal line detections in delayered IC images. IEEE Trans. Circuits Syst. II Express Briefs, 65-II(12):1849–1853, 2018.

[GZYL24] Shixuan Gao, Pingping Zhang, Tianyu Yan, and Huchuan Lu. Multi-scale and detail-enhanced segment anything model for salient object detection. In Jianfei Cai, Mohan S. Kankanhalli, Balakrishnan Prabhakaran, Susanne Boll, Ramanathan Subramanian, Liang Zheng, Vivek K. Singh, Pablo César, Lexing Xie, and Dong Xu, editors, Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024 - 1 November 2024, pages 9894–9903. ACM, 2024.

[HCS+18] Xuenong Hong, Deruo Cheng, Yiqiong Shi, Tong Lin, and Bah-Hwee Gwee. Deep learning for automatic IC image analysis. In 23rd IEEE International Conference on Digital Signal Processing, DSP 2018, Shanghai, China, November 19-21, 2018, pages 1–5. IEEE, 2018.
[HCX+22] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross B. Girshick. Masked autoencoders are scalable vision learners. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 15979–15988. IEEE, 2022.

[HCY+21] Ling Huang, Deruo Cheng, Xulei Yang, Tong Lin, Yiqiong Shi, Kaiyi Yang, Bah-Hwee Gwee, and Bihan Wen. Joint anomaly detection and inpainting for microscopy images via deep self-supervised learning. In 2021 IEEE International Conference on Image Processing, ICIP 2021, Anchorage, AK, USA, September 19-22, 2021, pages 3497–3501. IEEE, 2021.

[HLSC19] Xiaoling Hu, Fuxin Li, Dimitris Samaras, and Chao Chen. Topology-preserving deep image segmentation. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 5658–5669, 2019.

[Hu22] Xiaoling Hu. Structure-aware image segmentation with homotopy warping. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022.

[HWL+21] Xiaoling Hu, Yusu Wang, Fuxin Li, Dimitris Samaras, and Chao Chen. Topology-aware segmentation using discrete Morse theory. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.

[KSS+20] Adam G.
Kimura, Jon Scholl, James Schaffranek, Matthew Sutter, Andrew Elliott, Mike Strizich, and Glen David Via. A decomposition workflow for integrated circuit verification and validation. J. Hardw. Syst. Secur., 4(1):34–43, 2020.

[KYD+23] Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, and Fisher Yu. Segment anything in high quality. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10-16, 2023, 2023.

[LBW+25] Laurin Lux, Alexander H. Berger, Alexander Weers, Nico Stucki, Daniel Rueckert, Ulrich Bauer, and Johannes C. Paetzold. Topograph: An efficient graph-based framework for strictly topology preserving image segmentation. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.

[LH19] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.

[LMB+24] Chuni Liu, Boyuan Ma, Xiaojuan Ban, Yujie Xie, Hao Wang, Weihua Xue, Jingchao Ma, and Ke Xu. Enhancing boundary segmentation for topological accuracy with skeleton-based methods. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024, pages 1092–1100. ijcai.org, 2024.

[LYvD+24] Hong Liu, Haosen Yang, Paul J. van Diest, Josien P. W. Pluim, and Mitko Veta. WSI-SAM: Multi-resolution segment anything model (SAM) for histopathology whole-slide images.
In Francesco Ciompi, Nadieh Khalili, Linda Studer, Milda Poceviciute, Amjad Khan, Mitko Veta, Yiping Jiao, Neda Haj-Hosseini, Hao Chen, Shan Raza, Fayyaz Minhas, Inti Zlobec, Nikolay Burlutskiy, Veronica Vilaplana, Biagio Brattoli, Henning Müller, and Manfredo Atzori, editors, Proceedings of the MICCAI Workshop on Computational Pathology, Marrakesh, Morocco, 6 October 2024, volume 254 of Proceedings of Machine Learning Research, pages 25–37. PMLR, 2024.

[MKP25] Sayan Mandal, Divyadarshini Karthikeyan, and Manas Paldhe. SAM2LoRA: Composite loss-guided, parameter-efficient finetuning of SAM2 for retinal fundus segmentation. CoRR, abs/2510.10288, 2025.

[MMZ23] Anqi Mao, Mehryar Mohri, and Yutao Zhong. Cross-entropy loss functions: Theoretical analysis and applications. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 23803–23828. PMLR, 2023.

[NTCG24] Yong-Jian Ng, Yee-Yang Tee, Deruo Cheng, and Bah-Hwee Gwee. SAMIC: Segment anything model for integrated circuit image analysis. In IEEE Region 10 Conference, TENCON 2024, Singapore, December 1-4, 2024, pages 418–421. IEEE, 2024.

[PMB+23] Endres Puschner, Thorben Moos, Steffen Becker, Christian Kison, Amir Moradi, and Christof Paar. Red team vs. blue team: A real-world hardware trojan detection case study across four modern CMOS technology generations. In 44th IEEE Symposium on Security and Privacy, SP 2023, San Francisco, CA, USA, May 21-25, 2023, pages 56–74. IEEE, 2023.

[RCK+25] Nils Rothaug, Deruo Cheng, Simon Klix, Nicole Auth, Sinan Böcker, Endres Puschner, Steffen Becker, and Christof Paar.
Advancing training stability in unsupervised SEM image segmentation for IC layout extraction. J. Cryptogr. Eng., 15(4):21, 2025.

[RGH+25] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloé Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross B. Girshick, Piotr Dollár, and Christoph Feichtenhofer. SAM 2: Segment anything in images and videos. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.

[RHB+23] Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, and Christoph Feichtenhofer. Hiera: A hierarchical vision transformer without the bells-and-whistles. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 29441–29454. PMLR, 2023.

[RKA+23] Nils Rothaug, Simon Klix, Nicole Auth, Sinan Böcker, Endres Puschner, Steffen Becker, and Christof Paar. Towards unsupervised SEM image segmentation for IC layout extraction. In Chip-Hong Chang, Ulrich Rührmair, Lejla Batina, and Domenic Forte, editors, Proceedings of the 2023 Workshop on Attacks and Solutions in Hardware Security, ASHES 2023, Copenhagen, Denmark, 30 November 2023, pages 123–128. ACM, 2023.

[RTG+19] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian D. Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression.
In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 658–666. Computer Vision Foundation / IEEE, 2019.

[SBPB24] Nico Stucki, Vincent Bürgin, Johannes C. Paetzold, and Ulrich Bauer. Efficient Betti matching enables topology-aware 3D segmentation via persistent homology. CoRR, abs/2407.04683, 2024.

[SLG20] Aayush Singla, Bernhard Lippmann, and Helmut Graeb. Recovery of 2D and 3D layout information through an advanced image stitching algorithm using scanning electron microscope images. In 25th International Conference on Pattern Recognition, ICPR 2020, Virtual Event / Milan, Italy, January 10-15, 2021, pages 3860–3867. IEEE, 2020.

[SLV+17] Carole H. Sudre, Wenqi Li, Tom Vercauteren, Sébastien Ourselin, and M. Jorge Cardoso. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In M. Jorge Cardoso, Tal Arbel, Gustavo Carneiro, Tanveer F. Syeda-Mahmood, João Manuel R. S. Tavares, Mehdi Moradi, Andrew P. Bradley, Hayit Greenspan, João Paulo Papa, Anant Madabhushi, Jacinto C. Nascimento, Jaime S. Cardoso, Vasileios Belagiannis, and Zhi Lu, editors, Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support - Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Québec City, QC, Canada, September 14, 2017, Proceedings, volume 10553 of Lecture Notes in Computer Science, pages 240–248. Springer, 2017.

[SPS+21] Suprosanna Shit, Johannes C. Paetzold, Anjany Sekuboyina, Ivan Ezhov, Alexander Unger, Andrey Zhylka, Josien P. W. Pluim, Ulrich Bauer, and Bjoern H. Menze. clDice - A novel topology-preserving loss function for tubular structure segmentation.
In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 16560–16569. Computer Vision Foundation / IEEE, 2021.

[SPS+23] Nico Stucki, Johannes C. Paetzold, Suprosanna Shit, Bjoern H. Menze, and Ulrich Bauer. Topologically faithful image segmentation via induced matching of persistence barcodes. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 32698–32727. PMLR, 2023.

[THC+23] Yee-Yang Tee, Xuenong Hong, Deruo Cheng, Chye-Soon Chee, Yiqiong Shi, Tong Lin, and Bah-Hwee Gwee. Patch-based adversarial training for error-aware circuit annotation of delayered IC images. IEEE Trans. Circuits Syst. II Express Briefs, 70(9):3694–3698, 2023.

[TUSP18] Bruno Machado Trindade, Eranga Ukwatta, Mike Spence, and Chris Pawlowicz. Segmentation of integrated circuit layouts from scan electron microscopy images. In 2018 IEEE Canadian Conference on Electrical & Computer Engineering, CCECE 2018, Quebec, QC, Canada, May 13-16, 2018, pages 1–4. IEEE, 2018.

[WZB+24] Bo Wen, Haochen Zhang, Dirk-Uwe G. Bartsch, William R. Freeman, Truong Q. Nguyen, and Cheolhong An. Topology-preserving image segmentation with spatial-aware persistent feature matching. CoRR, abs/2412.02076, 2024.

[YTG+22] Zifan Yu, Bruno Machado Trindade, Michael Green, Zhikang Zhang, Pullela Sneha, Erfan Bank Tavakoli, Christopher Pawlowicz, and Fengbo Ren. A data-driven approach for automated integrated circuit segmentation of scan electron microscopy images. In 2022 IEEE International Conference on Image Processing, ICIP 2022, Bordeaux, France, 16-19 October 2022, pages 2851–2855. IEEE, 2022.

[ZWD+25] Mengdi Zhu, Ronald Wilson, Reiner N.
Dizon-Paradis, Olivia P. Dizon-Paradis, Domenic J. Forte, and Damon L. Woodard. Genetic algorithm-assisted golden-free standard cell library extraction from SEM images. In 26th International Symposium on Quality Electronic Design, ISQED 2025, San Francisco, CA, USA, April 23-25, 2025, pages 1–8. IEEE, 2025.