SAMSEM -- A Generic and Scalable Approach for IC Metal Line Segmentation
IACR Transactions on Cryptographic Hardware and Embedded Systems
ISSN XXXX-XXXX, Vol. 0, No. 0, pp. 1–24. DOI:XXXXXXXX

Christian Gehrmann¹, Jonas Ricker², Simon Damm², Deruo Cheng³, Julian Speith¹, Yiqiong Shi³, Asja Fischer² and Christof Paar¹

¹ Max Planck Institute for Security and Privacy (MPI-SP), Bochum, Germany, {christian.gehrmann, julian.speith, christof.paar}@mpi-sp.org
² Ruhr University Bochum (RUB), Bochum, Germany, {jonas.ricker, simon.damm, asja.fischer}@rub.de
³ Nanyang Technological University (NTU), Singapore, Singapore, {deruo.cheng, yqshi}@ntu.edu.sg

Abstract. In light of globalized hardware supply chains, the assurance of hardware components has gained significant interest, particularly in cryptographic applications and high-stakes scenarios. Identifying metal lines on scanning electron microscope (SEM) images of integrated circuits (ICs) is one essential step in verifying the absence of malicious circuitry in chips manufactured in untrusted environments. Due to varying manufacturing processes and technologies, such verification usually requires tuning parameters and algorithms for each target IC. Often, a machine learning model trained on images of one IC fails to accurately detect metal lines on other ICs. To address this challenge, we create SAMSEM by adapting Meta's Segment Anything Model 2 (SAM2) to the domain of IC metal line segmentation. Specifically, we develop a multi-scale segmentation approach that can handle SEM images of varying sizes, resolutions, and magnifications. Furthermore, we deploy a topology-based loss alongside pixel-based losses to focus our segmentation on electrical connectivity rather than pixel-level accuracy.
Based on a hyperparameter optimization, we then fine-tune the SAM2 model to obtain a model that generalizes across different technology nodes, manufacturing materials, sample preparation methods, and SEM imaging technologies. To this end, we leverage an unprecedented dataset of SEM images obtained from 48 metal layers across 14 different ICs. When fine-tuned on seven ICs, SAMSEM achieves an error rate as low as 0.72% when evaluated on other images from the same ICs. For the remaining seven unseen ICs, it still achieves error rates as low as 5.53%. Finally, when fine-tuned on all 14 ICs, we observe an error rate of 0.62%. Hence, SAMSEM proves to be a reliable tool that significantly advances the frontier in metal line segmentation, a key challenge in post-manufacturing IC verification.

Keywords: Hardware Assurance · Metal Line Segmentation · SAM2

Licensed under Creative Commons License CC-BY 4.0.

1 Introduction

Integrated circuits (ICs) are deployed across all aspects of our digital society. They provide the foundation not only for our electronic devices, such as smartphones and computers, but also for many safety and security applications in critical infrastructure and defense. To this end, assurance of the absence of malicious circuitry in such ICs is essential to ensure the reliability and trustworthiness of their supply chain. This is particularly true for many cryptographic applications, as hardware Trojans in, e.g., a trusted platform module (TPM), could easily be leveraged to subvert security altogether. One means to achieve such assurance is through physical inspection, i.e., destructively opening up ICs, grinding them down layer by layer while continuously taking images of every layer.
Nowadays, such images are captured using scanning electron microscopes (SEMs) due to ever-shrinking feature sizes in modern ICs. These images are then analyzed and compared against a ground truth [PMB+23], such as Graphic Data System II (GDSII) design files, or used to extract a gate-level netlist [HCS+18] for further analysis. This destructive imaging process often results in hundreds of gigabytes of images that need to be analyzed in order to ensure the absence of malicious circuitry. To this end, machine-learning models are increasingly deployed to improve accuracy in image segmentation and recognition tasks while ensuring scalability [RKA+23, HCS+18]. For the polysilicon layer that implements the transistors of an IC, pattern-matching approaches are often deployed [KSS+20, ZWD+25]. In contrast, the metal layers mostly require segmentation of metal lines and vias [KSS+20, HCS+18]. However, while this segmentation of typically bright metal lines against a dark background may seem straightforward, it turns out to be rather complicated in practice for multiple reasons: (i) Manufacturing technologies vary not only between ICs, but also between different layers of the same IC, resulting in vastly different shapes; see Figure 1 for examples of metal-layer images. A segmentation algorithm optimized for one layer of one IC may not generalize to other layers or other ICs without further adjustments. (ii) Delayering errors, artifacts from sample preparation, or dust particles can conceal critical connections and impede even advanced image analysis algorithms. (iii) Structures on modern ICs are so dense that adjacent but unconnected metal lines may appear to be linked for an automated algorithm due to image distortions or noise, which are inherent to SEM images.
(iv) The characteristics of SEM images differ vastly depending on the employed SEM, its detectors, and image capturing settings such as magnification, dwelling time, resolution, and acceleration voltage.

Figure 1: Metal layer images from three different ICs (columns: Image, Ground Truth, FCN, SAMIC, SAM2, Ours) as well as the corresponding ground-truth mask and the image segmentation masks produced by different methods. Non-obvious electrically significant difference (ESD) errors in the segmentation are marked with orange circles.

Especially for netlist extraction, a process in which SEM images of all IC layers are analyzed to recover a gate-level netlist description of the implemented circuit, we can tolerate only very few segmentation errors. Such errors would significantly impair the analysis of the recovered netlist later on. Currently, the aforementioned issues are often addressed by manually annotating a subset of the images from each layer of every IC and then using these annotated images to train a machine learning model (or adjust parameters in a classical image processing pipeline) to segment the remainder of that layer [TUSP18, NTCG24], resulting in significant manual overhead.

Our Contributions. This work addresses a sub-problem in IC image analysis: developing a solution for metal line segmentation that generalizes well across different layers and ICs without requiring manual intervention or retraining. This step is essential for reconstructing the interconnections of transistors and, by extension, standard cells on an IC, the analysis of which is crucial for hardware assurance. Figure 1 depicts segmentation issues observed for metal line segmentation techniques from the literature when applied to previously unseen ICs.
To address such generalization issues across metal layers and ICs, we present SAMSEM¹, which builds on top of Meta's Segment Anything Model 2 (SAM2) [RGH+25] for image segmentation. While SAM2 strives for generalized segmentation in natural images, we fine-tune and evaluate the model on metal line segmentation using an unprecedented dataset comprising SEM images from 14 different ICs. To deal with varying resolutions, magnifications, and metal line structure sizes in SEM images, we develop a multi-scale segmentation approach and wrap it around the fine-tuned SAM2. For gate-level netlist extraction, we are less interested in pixel-accurate segmentation than in ensuring correct electrical connectivity. Therefore, for fine-tuning of the SAM2 model, we deploy a topology-based loss function in addition to standard pixel-based losses. In particular, by considering the structure's topology in each image, this loss function explicitly penalizes both short and open connections between metal lines. This way, we emphasize correct electrical connections over pixel accuracy in the segmentation. In the context of hardware assurance, this reduces the number of short circuits and open connections observed in an extracted netlist, thereby significantly improving the effectiveness of later analysis steps on the gate-level netlist. Our experiments are performed on a small cluster of eight Nvidia H100 machine learning accelerators, and fine-tuning on a single Nvidia H100 takes around 72 hours when utilizing the full dataset. In summary, our main contributions are:

• Throughout our experiments, we utilize an unprecedented dataset of metal-layer images from 14 different ICs, spanning technology nodes from 200 nm down to 20 nm.

• We develop SAMSEM by adapting the SAM2 foundation model to the domain of IC metal-layer image segmentation via multi-scale segmentation, a topology-based loss function, and fine-tuning.
Thereby, we achieve an in-distribution error rate of 0.72% when fine-tuning and evaluating on seven different ICs, compared to 4.44% for the best-known approach that we could reproduce.

• We demonstrate that SAMSEM generalizes well, even for unseen ICs, and therefore does not require retraining when analyzing new ICs. Our approach achieves an out-of-distribution error rate of 5.53% on the seven previously unseen ICs, representing a significant improvement over the 24.77% error rate observed for the best-reproduced approach from the literature.

• We publish a model fine-tuned on 90% of all images across all 14 ICs, along with all our fine-tuning and benchmarking scripts, so that others can build upon our work. This final model achieves an in-distribution error rate of just 0.62%.

¹ Segment Any Metal-layer Scanning Electron Microscope image

Structure of this Paper. We introduce the technical background and discuss related work on metal line segmentation, machine learning, and topology-based loss functions in Section 2. In Section 3, we describe our methodology, including our multi-scale segmentation pipeline, training and evaluation datasets, the topology-based loss function, and the fine-tuning process, along with details on hyperparameter optimization. Section 4 provides details on our evaluation procedures and gives results for our segmentation and generalization performance. Finally, we discuss limitations and draw conclusions in Section 5.

2 Technical Background and Related Work

In this section, we review relevant technical background and discuss related work on IC metal line segmentation in Section 2.1, Meta's SAM2 in Section 2.2, and topology-based loss functions in Section 2.3.
2.1 Metal Line Segmentation

ICs are built from a polysilicon layer at the bottom that (for digital logic) implements transistors and many metal layers on top. These metal layers establish the electrical connection between the transistors on the polysilicon layer, thereby forming an interconnected circuit. Hence, a full extraction of all metal lines from IC images is an essential step toward hardware assurance. Machine-learning-based image segmentation algorithms are often employed to this end, which produce a segmentation mask that clearly distinguishes between metal lines and the background. However, the appearance of metal lines from different ICs, different layers of the same IC, and even under different SEM capture settings may vary significantly. Therefore, metal line segmentation algorithms must often be manually adjusted for the specific dataset on which they operate.

Related Work. Trindade et al. propose classical image processing for metal line segmentation [TUSP18], which must be tuned for every new dataset. Furthermore, they introduce the ESD error as a metric to focus on relevant deviations of the segmentation output from the ground truth. The metric counts the open and short circuits in the segmentation, as well as the false positives (FPs) (i.e., segmented metal lines that do not exist) and false negatives (FNs) (i.e., undetected metal lines in the original images). This ESD metric is more relevant for circuit extraction than pixel-accuracy metrics, as it focuses on errors that result in a faulty extracted circuit description. Cheng et al. are the first to use machine learning for metal line segmentation by proposing a method based on hybrid K-means clustering and support vector machines [CSL+18]. In the same year, Hong et al. introduced the first deep-learning-based technique and evaluated it using a metric similar to the ESD metric [HCS+18]. Yu et al.
built their segmentation solution around HRNet and also used the ESD metric for evaluation [YTG+22]. Later, Tee et al. concluded that common training approaches do not properly capture the local nature of ESD errors [THC+23]. Hence, they put forward a patch-based approach, aiming a discriminator network (comparing the ground truth to intermediate segmentation results) at small patches of the input images. None of the aforementioned approaches has been tested for whether they generalize to unseen ICs or even to other layers within the same ICs. Instead, they are evaluated on a small dataset often obtained from just a single layer of a single IC. We assume this is the case for economic reasons, as sample preparation and imaging are tedious and costly.

Rothaug et al. approach metal line segmentation using unsupervised learning to reduce the manual workload when facing new datasets [RKA+23, RCK+25]. However, some of their parameters must still be adjusted for every new dataset. They also used the ESD metric for evaluation and released not only their code, but also their training dataset. To combat manual parameter adjustments altogether, Ng et al. proposed fine-tuning Meta's Segment Anything Model (SAM), the predecessor of SAM2, using a dataset from four different ICs, resulting in a model they coined Segment Anything Model for Integrated Circuit Image Analysis (SAMIC) [NTCG24]. However, they did not evaluate SAMIC's out-of-distribution performance on ICs not seen during fine-tuning, used a significantly smaller dataset than us, did not employ topological loss functions, and did not address different scales of input images. In our experiments, we show that SAMSEM is superior to their approach by around one order of magnitude.
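As a rough illustration of how ESD-style counting can work (this is our own simplified sketch based on the metric's description above, not the reference implementation of Trindade et al.), one can label connected components in the predicted and ground-truth masks and inspect their overlaps: a predicted line touching several ground-truth lines indicates a short, a ground-truth line covered by several predicted lines indicates an open, and components without any counterpart are FPs or FNs.

```python
from collections import deque

def components(mask):
    """4-connected components of a binary mask; returns a label grid and count."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not labels[i][j]:
                n += 1
                labels[i][j] = n
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = n
                            q.append((ny, nx))
    return labels, n

def esd_errors(pred, gt):
    """Simplified ESD-style counts: shorts (one predicted line bridging several
    ground-truth lines), opens (one ground-truth line split across several
    predicted lines), false positives and false negatives."""
    pl, pn = components(pred)
    gl, gn = components(gt)
    pred_hits = {p: set() for p in range(1, pn + 1)}  # GT lines touched by each predicted line
    gt_hits = {g: set() for g in range(1, gn + 1)}    # predicted lines touching each GT line
    for prow, grow in zip(pl, gl):
        for p, g in zip(prow, grow):
            if p and g:
                pred_hits[p].add(g)
                gt_hits[g].add(p)
    return {
        "short": sum(1 for s in pred_hits.values() if len(s) > 1),
        "open": sum(1 for s in gt_hits.values() if len(s) > 1),
        "fp": sum(1 for s in pred_hits.values() if not s),  # predicted line with no GT counterpart
        "fn": sum(1 for s in gt_hits.values() if not s),    # GT line never detected
    }
```

For example, a prediction that bridges two parallel ground-truth lines with a single extra pixel column yields one short, while a one-pixel gap in a predicted line yields one open.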
2.2 Segment Anything Model 2

The Segment Anything Model 2 (SAM2) is a large foundation segmentation model developed by Meta [RGH+25]. It was primarily designed for a broad range of segmentation tasks on real-world images and videos, but can be fine-tuned for specific applications. Compared to other segmentation networks, SAM2 technically requires a prompt input, which can be a point, a mask, or a bounding box of the object to be segmented.

Figure 2: Model components of SAM2 and their interactions [RGH+25].

SAM2 comprises an image encoder, memory attention, mask decoder, prompt encoder, memory encoder, and memory bank; see Figure 2. The image encoder works in conjunction with the memory attention to generate the image embeddings for the mask decoder. Memory attention is used for videos and has no effect on image segmentation; hence, we disregard it. For image encoding, a Hiera image encoder [RHB+23] is used, pre-trained with a masked autoencoder (MAE) [HCX+22]. This image encoder extracts visual features from the input images and provides these feature embeddings as unconditioned tokens to the mask decoder (via the memory attention module). Next, the prompt encoder accepts a range of user input prompts, such as masks, points, and boxes. For our work, we focus on input points to generate positional encodings. The point prompt can be either positive or negative, indicating whether the foreground or background should be selected. The mask decoder generates the final segmentation masks based on the feature embeddings produced by the image encoder and the user prompts processed by the prompt encoder. While the produced segmentation mask is usually binary, confidence scores for each pixel indicating its likelihood of belonging to the segmented object can also be accessed.

Related Work.
Numerous works have built upon SAM2 to improve its performance in specific domains. For example, Mandal et al. [MKP25] proposed SAM2LoRA to efficiently fine-tune SAM2 for retinal fundus segmentation by only considering specific parameters for fine-tuning and freezing the remaining ones. To better cope with varying image resolutions, Liu et al. [LYvD+24] present WSI-SAM, which aims to handle high-resolution whole-slide images in the context of pathology. Although their approach implies changes to the SAM architecture, it does not require full retraining of the entire model. In a similar vein, Gao et al. [GZYL24] improve SAM for salient object detection by better capturing multi-scale features and preserving fine-grained details. HQ-SAM by Ke et al. [KYD+23] adds a learnable high-quality output token to SAM, and fuses features from different layers to produce much more accurate and detailed segmentation masks, especially for objects with intricate shapes. Finally, Meta released the Segment Anything Model 3 (SAM3) [CGH+25] shortly before the submission of our work. At the time of writing, SAM3 is available only for fine-tuning on request and is not yet publicly accessible.

2.3 Topology-Based Loss Functions

Pixel-based loss functions like the Dice loss [SLV+17] or the intersection over union (IoU) loss [RTG+19] are commonly used for training image-related machine learning models. These loss functions penalize pixel differences between the model output (e.g., a segmentation mask) and the ground truth. In contrast to pixel-based loss functions, topology-based loss functions prioritize topological accuracy over pixel-level accuracy. Such loss functions are often researched in the domain of biological cell boundary detection and automated street recognition in satellite images [HLSC19, HWL+21].
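To make the contrast with pixel-based losses concrete, the following toy sketch (our own minimal illustration, not one of the published losses discussed in this section) compares the number of connected components (the zeroth Betti number) of a predicted mask against the ground truth. Two predictions with identical pixel error can receive very different topological penalties.

```python
from collections import deque

def betti_0(mask):
    """Number of 4-connected foreground components (zeroth Betti number)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                count += 1
                seen[i][j] = True
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return count

def topo_penalty(pred, gt):
    """Penalty for mismatched component counts; zero iff both masks
    contain the same number of connected structures."""
    return abs(betti_0(pred) - betti_0(gt))

# A two-line ground truth and two predictions, each wrong in exactly one pixel:
gt     = [[1, 1, 1], [0, 0, 0], [1, 1, 1]]
shrunk = [[1, 1, 0], [0, 0, 0], [1, 1, 1]]  # pixel missing at a line end: topology intact
split  = [[1, 0, 1], [0, 0, 0], [1, 1, 1]]  # pixel missing mid-line: line breaks in two
```

Here `topo_penalty(shrunk, gt)` is 0 while `topo_penalty(split, gt)` is 1, even though a pixel-based loss charges both predictions equally. Note that such a count-based penalty is not differentiable; the works discussed below obtain usable gradients via constructions such as persistence diagrams or feature matching.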
In our case, we are interested in correct electrical connections rather than the accurate segmentation of every pixel on an IC image. Hence, topology-based loss functions seem to be a natural fit.

Related Work. In their seminal work on topology-based image segmentation, Hu et al. propose a loss function based on Betti numbers, which count structures such as the connected components and holes in an input image [HLSC19]. They use persistence diagrams to measure the topological similarity between the ground truth and the model prediction. Later, Shit et al. proposed clDice as a loss function specifically designed for tubular structures such as vascular networks and roads [SPS+21]. The loss compares the centerline of a predicted tubular structure (i.e., its skeleton) to the ground truth. Liu et al. later enhanced a skeleton-based loss function with better awareness of structure boundaries [LMB+24]. The DMT loss proposed by Hu et al. uses discrete Morse theory to identify unwanted critical points in the predicted segmentation [HWL+21]. Similar to Hu et al. [HLSC19], persistence diagrams of the prediction are compared to the ground truth. Hu et al. further advance the field by using homotopy warping to consider not only the number of topological features, but also their geometric placement [Hu22]. It identifies topologically critical pixels by warping the predicted mask into the ground truth. Stucki et al. build upon the initial publication of Hu et al. [HLSC19]: instead of simply measuring overall topological counts (i.e., Betti numbers), they match individual topological features between the predicted segmentation and the ground truth in a spatially consistent way, defining a Betti matching error that can be used as a differentiable loss to improve the segmentation's topological correctness [SPS+23]. Similarly, Wen et al.
encode spatial awareness with persistence diagrams to construct their loss function [WZB+24]. Expanding on their own work, Stucki et al. adapt Betti matching to 3D segmentation by developing a faster implementation of the Betti matching loss, making topology-based loss functions practical for volumetric data [SBPB24]. Berger et al. then extend Betti matching to multi-class segmentation by reducing an n-class segmentation to n single-class segmentation tasks and applying induced barcode matchings for each class [BLS+24]. Finally, Topograph by Lux et al. encodes the topology as a component graph, with nodes corresponding to pixels or regions thereof and edges representing adjacency [LBW+25]. By building their loss function solely on graph algorithms, they achieve better efficiency than most other algorithms while providing strong topological guarantees for segmentation.

3 Methodology

We focus on segmenting metal lines while ignoring vias, since vias can typically be detected using thresholding or machine learning [SLG20]. In this section, we describe how we construct SAMSEM to produce reliable segmentation results, even for unseen ICs. After motivating fine-tuning in Section 3.1 and introducing our multi-scale segmentation pipeline in Section 3.2, we describe our dataset comprising 14 ICs in Section 3.3. Next, in Section 3.4, we explain our fine-tuning, data augmentation, and (topological) loss function. We then report the best-suited parameters for our segmentation pipeline as determined through hyperparameter optimization in Section 3.5. Finally, we describe how we generate segmentation masks with SAMSEM in Section 3.6.

3.1 Motivation

SAM2 was trained on a large and diverse dataset of natural images. It was designed to segment RGB images of objects, animals, or people.
SAMSEM is based on SAM2, but targets the non-interactive segmentation of metal layer images from ICs taken with SEMs. These images differ significantly in that they are grayscale and exhibit repeating structures, but little variation in their features. Hence, when simply segmenting our metal layer images, SAM2 (without fine-tuning) produces catastrophic results with a segmentation pixel accuracy (PA) of only 0.639 and an ESD error rate of 75.6%; see Table 2. To truly leverage the power of SAM2 for IC metal layer image segmentation, we must adapt the model to our specific domain and fine-tune the SAM2 model on representative data.

3.2 Segmentation Pipeline

We now discuss the segmentation pipeline of our metal-layer segmentation system based on SAM2. The pipeline takes into account that SEM images of metal layers may differ significantly in shape, size, and resolution.

3.2.1 Model 1 – Segmenting Original-Size Images

Initially, we fine-tuned SAM2 by simply providing the original-size images obtained from an SEM, along with the ground truth. We refer to this model as model 1. The SAM2 image encoder operates on 1024 × 1024-pixel images; hence, images of other sizes are simply scaled to fit its resolution. In our setting, we often encounter high-resolution images of 10,000 × 10,000 pixels or more, as these are typical output sizes for SEMs. When fed to the image encoder, these images are simply scaled down to 1024 × 1024 pixels. However, such metal layer images often depict fine-grained structures that are then brought closer together (in terms of pixel distance) or even fused during downscaling. This, in turn, (A) causes short circuits in the segmentation when compared to the ground truth, thereby significantly increasing the ESD error rate; see the fused metal lines in Figure 3c.

(a) Input image. (b) Ground truth. (c) Segmentation mask.
Figure 3: (A) – (a) to (c) depict short circuits in the segmentation mask resulting from downscaling the input image to fit the shape expected by the image encoder.

3.2.2 Model 2 – Segmenting Images in Smaller Patches

A straightforward approach to address this issue would be to simply cut the input images into smaller patches and feed only these patches into the SAM2 image encoder for segmentation. To this end, we fine-tuned a second model, named model 2, on 512 × 512-pixel patches only. The patches are cut out with at least 10% overlap and upscaled to 1024 × 1024 pixels to fit the image encoder shape. The upscaling proved useful as it increases the pixel distance between closely neighboring metal lines. Model 2 significantly improves upon the issue observed in model 1 and thereby reduces the number of shorts in the segmentation. However, it also generates a different class of errors that drastically increases the number of false positives in the resulting segmentation mask. Analyzing these errors, we found some input image patches that consist primarily, or even entirely, of either background or foreground structures. For these patches, our model 2 hallucinates segmentations that do not exist in the actual input images. Here, we observed two types of errors stemming from these hallucinations: Firstly, (B) the segmentation mask may end up being mostly inverse to the ground truth, e.g., background is segmented as a metal line; see Figures 4a to 4c. Secondly, (C) parts of the patch may be wrongly classified as a metal line with speckled transitions to background; see Figures 4d to 4f. Here, each white speckle would be interpreted as a false positive by the ESD metric, thereby vastly increasing our error rate.

(a) Input image. (b) Ground truth. (c) Segmentation mask. (d) Input image. (e) Ground truth.
(f) Segmentation mask.

Figure 4: (B) – (a) to (c) depict background being segmented as a metal line because of the lack of structures in the input image, resulting in FPs or short circuits. (C) – (d) to (f) show white speckles around the correctly identified metal line in the segmentation mask, resulting in vast amounts of false positives.

3.2.3 Multi-Scale Segmentation Approach

To address both issues at once, we introduce a multi-scale segmentation approach, as depicted in Figure 5. We simultaneously segment the full-size image using model 1 and 512 × 512-pixel patches extracted from the input image using model 2. This multi-scale approach ensures that our fully-automated segmentation algorithm performs well for both small and large structures in arbitrary metal layer images and operates reliably, independent of a SEM's magnification and resolution. By working on full-size images, we ensure that SAM2 is provided sufficient context for larger structures that extend beyond a single patch, while working on patches improves segmentation performance for fine-grained structures. This approach produces two segmentation masks for each input image: one composed of 512 × 512-pixel patches and one directly corresponding to the segmentation of the original-sized image. Both masks are analyzed patch-by-patch to construct the final segmentation mask. Since SAM2 always produces segmentation masks of the size of the input image, the mask produced by model 1 is cut into patches of 512 × 512 pixels. The masks produced by model 2 are already of size 512 × 512. Our decision algorithm, which we present hereafter, then composes the final segmentation mask by choosing patches from either model 1 or model 2 based on lightweight quality metrics.
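The exact tiling used to cut out the 512 × 512 patches is not spelled out here; as an assumption-laden sketch (the function names and the even spacing of the grid are our own choices), patch origins with at least 10% overlap that cover an axis exactly could be computed as follows. Each patch would then be upscaled to 1024 × 1024 pixels before being fed to the image encoder.

```python
def patch_origins(length, patch=512, min_overlap=0.1):
    """Top-left coordinates of patches along one axis such that consecutive
    patches overlap by at least `min_overlap * patch` pixels and the last
    patch ends exactly at `length`."""
    if length <= patch:
        return [0]
    max_stride = int(patch * (1 - min_overlap))   # largest stride still giving 10% overlap
    steps = -(-(length - patch) // max_stride)    # ceil division: number of strides needed
    stride = (length - patch) / steps             # spread the strides evenly
    return [round(i * stride) for i in range(steps)] + [length - patch]

def patch_grid(height, width, patch=512, min_overlap=0.1):
    """All (top, left) origins of overlapping patches covering the image."""
    return [(t, l) for t in patch_origins(height, patch, min_overlap)
                   for l in patch_origins(width, patch, min_overlap)]
```

For a 1024 × 1024 image this yields a 3 × 3 grid of patches with 256-pixel overlaps, well above the 10% minimum of roughly 51 pixels.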
This choice is made for each segmentation mask patch, so the final mask can comprise patches from both approaches. Simply put, the decision algorithm favors segmentation mask patches from model 2 for fine-grained structures in images, while selecting patches from model 1 for images depicting larger structures or lacking structure altogether. Thereby, we combine the advantages of both approaches while mitigating the issues depicted in Figures 3 and 4.

Figure 5: Workflow of our multi-scale segmentation approach.

3.2.4 Deciding Between Patches from Model 1 or Model 2

Our decision algorithm, which chooses between patches from models 1 and 2, is based on classical computer vision techniques applied to the segmentation mask patches produced by either approach. Given that the errors (B) and (C) always produce reproducible patterns, detecting their presence is straightforward. To this end, we count the small components within a segmentation mask patch to determine the number of speckles in the segmentation. Our experiments have shown that many such speckles indicate a noisy segmentation that should be disregarded. For a detected component to be counted as a speckle, it must be at most 16 pixels in size, because larger components are usually valid segmentations. Figure 6 depicts the decision diagram for how a segmentation mask patch from model 1 or model 2 is selected. Here, the property "speckled" refers to a patch containing at least 50 speckles for model 1 and 50 speckles for model 2. Furthermore, two patches (one from each model) "agree" if at least 60% of their pixels are identical. These thresholds were determined through a grid search in which we automatically tested different parameter values to minimize the number of ESD errors across segmentation patches.
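Using the thresholds above, the per-patch selection can be sketched as follows. This is our own simplified reimplementation under hypothetical names; the exact branch ordering of the decision diagram, and in particular the tie-breaking branch when neither patch is speckled, is an assumption on our part.

```python
from collections import deque

SPECKLE_MAX_PIXELS = 16   # components larger than this count as valid structure
SPECKLE_THRESHOLD = 50    # at least this many speckles -> patch is "speckled"
AGREE_FRACTION = 0.6      # >= 60% identical pixels -> patches "agree"

def count_speckles(patch):
    """Count 4-connected foreground components of at most SPECKLE_MAX_PIXELS pixels."""
    h, w = len(patch), len(patch[0])
    seen = [[False] * w for _ in range(h)]
    speckles = 0
    for i in range(h):
        for j in range(w):
            if patch[i][j] and not seen[i][j]:
                size, q = 0, deque([(i, j)])
                seen[i][j] = True
                while q:
                    y, x = q.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and patch[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if size <= SPECKLE_MAX_PIXELS:
                    speckles += 1
    return speckles

def agree(p1, p2):
    """True if at least AGREE_FRACTION of the pixels are identical."""
    total = len(p1) * len(p1[0])
    same = sum(a == b for r1, r2 in zip(p1, p2) for a, b in zip(r1, r2))
    return same / total >= AGREE_FRACTION

def choose_patch(patch1, patch2):
    """Select the mask patch from model 1 or model 2, or flag for inspection."""
    s1 = count_speckles(patch1) >= SPECKLE_THRESHOLD
    s2 = count_speckles(patch2) >= SPECKLE_THRESHOLD
    if s1 and s2:
        return "ERROR"   # both patches noisy: flag for manual inspection
    if s1:
        return patch2    # model 1 hallucinated speckles, trust model 2
    if s2:
        return patch1    # model 2 hallucinated speckles, trust model 1
    # Neither is speckled; tie-break via agreement (this branch is our assumption).
    return patch1 if agree(patch1, patch2) else patch2
```

The speckle counter treats any connected component of at most 16 pixels as noise, matching the size threshold stated above; a checkerboard-like patch of isolated pixels is immediately classified as speckled.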
If both the patch from model 1 and the one from model 2 are considered "speckled", an error is reported, and the respective patch of the original image is flagged for manual inspection.

Figure 6: Decision diagram illustrating how patches from either model 1 or 2 are chosen, or an error is produced, based on their segmentation quality.

3.3 Datasets

Our vast and diverse dataset contains images from 48 metal layers of 14 different ICs. To the best of our knowledge, previous work has operated on images of at most four ICs, cf. Ng et al. [NTCG24]. Our dataset includes these four ICs, one IC published by Rothaug et al. [RKA+23], and multiple layers from 9 additional ICs captured by our own team. While details on these ICs cannot be disclosed for legal reasons, they range from around 200 nm down to 20 nm structure size and are composed of varying materials. The metal layer images were captured using different SEMs, different capturing settings, and also different sample preparation techniques. As shown in Figure 1, the resulting dataset is quite diverse in the shapes, numbers, and sizes of the metal lines. This dataset diversity is essential for fine-tuning the SAM2 model so that it performs well even on ICs it has not seen during fine-tuning. Finally, the dataset enables us to effectively test both the in-distribution performance of SAMSEM on ICs it has seen during fine-tuning and its out-of-distribution performance on unseen ICs; see Section 4.

3.3.1 Dataset Composition

Overall, a small fraction of the metal layer images captured from each IC has been semi-automatically annotated and manually verified to generate a ground truth for training, validation, and testing.
Hence, each image in our dataset is accompanied by a corresponding binary ground-truth mask. Table 1 lists the absolute number of annotated images for each IC, the number of 512 × 512 pixel image patches, and the number of metal layers from which these images have been taken. Due to the variety of capture techniques used for the different ICs, the full-size images vary widely in size. Hence, the number of patches is better suited for a comparison in terms of the amount of available data per IC. For ICs 8 to 14, rather few images have been annotated.

Table 1: Number of images available from each IC, number of 512 × 512 pixel patches, and number of metal layers from which the images were taken.

IC          1     2     3     4     5     6     7
# images    150   70    90    50    50    140   321
# patches   4050  1120  1950  1000  1000  2600  23,112
# layers    3     7     2     1     4     3     1

IC          8     9     10    11    12    13    14
# images    12    5     5     3     3     28    10
# patches   756   342   254   189   48    2735  810
# layers    4     5     5     3     3     6     1

3.3.2 Dataset Splits

For fine-tuning (see Section 3.4), the hyperparameter optimization (see Section 3.5), and evaluation (see Section 4), we split the original-size images from ICs 1 to 7 into three distinct datasets: 70% of the images of each IC are used for training, 10% for validation and model selection during the hyperparameter search, and 20% for testing and evaluation of the in-distribution segmentation accuracy on ICs that the model has seen during fine-tuning. To evaluate the out-of-distribution performance of SAMSEM on unseen ICs, all images of ICs 8 to 14 are used for testing. In particular, none of the images from ICs 8 to 14 are used for training, validation, or model selection. In a separate experiment, we use 90% of all images from all 14 ICs to fine-tune our final model, using the hyperparameters determined beforehand (see Section 4.5).
In this case, the remaining 10% of the images are used to evaluate the segmentation accuracy of this final model.

3.4 Fine-Tuning

In this section, we provide details on how we fine-tune SAM2 for IC metal layer image segmentation. In particular, we discuss the need for fine-tuning, the data augmentation techniques we deployed, and our choice of loss functions.

3.4.1 Selection of Model Components

For SAMIC [NTCG24], Ng et al. only fine-tuned the mask decoder of SAM (version 1). However, we argue that to thoroughly investigate the potential of SAM2 for metal line segmentation, we also need to consider the two remaining model components relevant to image segmentation, cf. Section 2.2. Hence, we now investigate the individual impact of the mask decoder, prompt encoder, and image encoder on SAMSEM's segmentation performance. To this end, we fine-tune seven SAM2 models with all possible combinations of these three components, and compare them against the off-the-shelf SAM2 model as trained by Meta in Table 2. The evaluation setup is equivalent to the in-distribution tests performed later in Section 4.1, i.e., we train on the training set comprising 70% of the images from ICs 1 to 7 and test on the test set containing 20% of the images from the same ICs. We compare the impact of the components using the standard pixel-based metrics pixel accuracy (PA), Dice [SLG20], and IoU [RTG+19], as well as the ESD error rate. The ESD error rate is given as the relative number of ESD errors per metal line.

Table 2: Ablation study for SAMSEM's performance when fine-tuning only selected components of SAM2. The first line refers to the default SAM2 model without fine-tuning.
mask decoder  prompt encoder  image encoder   PA ↑   Dice ↑  IoU ↑  ESD ↓ (%)  training time
□             □               □               0.639  0.111   0.067  75.6       -
■             □               □               0.948  0.933   0.885  12.1       43 h
□             ■               □               0.673  0.332   0.236  66.1       39 h
■             ■               □               0.944  0.927   0.879  11.5       34 h
□             □               ■               0.972  0.960   0.924  0.8        51 h
■             □               ■               0.971  0.957   0.921  1.0        42 h
□             ■               ■               0.969  0.954   0.917  1.4        43 h
■             ■               ■               0.972  0.960   0.925  0.7        44 h

From Table 2, we conclude that the image encoder has the most impact on the segmentation accuracy, achieving an ESD error rate of 0.8%. Fine-tuning the mask decoder also significantly reduces the error rate, to 12.1%. The prompt encoder has little impact on segmentation and still results in a 66.1% ESD error rate. Combining different encoders and decoders naturally produces better results than evaluating them in isolation. We also see that combining all three components yields the lowest ESD error rate of 0.7% and the best pixel accuracy, although this is only a marginal improvement over fine-tuning the image encoder alone. Still, given that there is no significant overhead in training time when combining all three approaches compared to fine-tuning the image encoder alone, we decided to fine-tune the mask decoder, prompt encoder, and image encoder together.

3.4.2 Fine-Tuning Process

SAM2 comes with four different "checkpoints", i.e., model sizes. For SAMSEM, we chose to focus on the large checkpoint, which has the most tunable parameters, as it yielded the most promising results in initial experiments. Fine-tuning is performed using PyTorch v2.7.1 and CUDA 13.0. All fine-tuning runs are executed on a cluster of Nvidia H100 machine-learning accelerators, which comprises a mix of SXM cards with 80 GB and NVL cards with 94 GB of memory. For all our experiments, we set a fixed batch size of 12, as this is the maximum that fits in the 80 GB of memory on the SXM cards.
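Selecting which components to fine-tune amounts to toggling gradient updates per component before handing parameters to the optimizer. The following is a schematic sketch with plain-Python stand-ins: the component names and the `requires_grad` flag mirror the PyTorch idiom, but this is not the real SAM2 class layout.

```python
class Param:
    """Stand-in for a trainable tensor with a PyTorch-style requires_grad flag."""
    def __init__(self):
        self.requires_grad = True

class TinySAM2:
    """Stand-in model with the three components studied in the ablation."""
    def __init__(self):
        self.components = {
            "image_encoder": [Param() for _ in range(4)],
            "prompt_encoder": [Param() for _ in range(2)],
            "mask_decoder": [Param() for _ in range(3)],
        }

def select_finetuned(model, finetune):
    """Freeze everything, then re-enable gradients only for the chosen
    components; return the parameters to pass to the optimizer."""
    trainable = []
    for name, params in model.components.items():
        for p in params:
            p.requires_grad = name in finetune
            if p.requires_grad:
                trainable.append(p)
    return trainable

model = TinySAM2()
# e.g. the "mask decoder + image encoder" row of the ablation
opt_params = select_finetuned(model, {"image_encoder", "mask_decoder"})
```

In real code, `opt_params` would be passed to `AdamW`, so frozen components receive no updates.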
The fine-tuning process is as follows: First, we retrieve segmentation masks from the current model. To this end, we feed the input images of each batch to the image encoder. For each of these images, we then get embeddings of a point prompt from the prompt encoder. Here, we sample a random point (x, y) from the ground-truth foreground (i.e., the point lies on a metal line) and use it as a positive prompt to the prompt encoder. Next, we feed the image and prompt embeddings to the mask decoder, which produces low-resolution masks. These masks are then postprocessed by internal SAM2 functions to obtain three prediction masks for each image in the batch, as we enabled multi-mask output for better segmentation results [RGH+25]. Finally, we compute the loss between the ground truth and the segmentation mask, then perform backward propagation and optimize the targeted model components using AdamW [LH19].

3.4.3 Data Augmentation

We leverage data augmentation during fine-tuning to make our fine-tuned SAM2 model more robust to small perturbations in input images and to expand the training dataset. To this end, we use Torchvision from PyTorch to apply transforms to the training data before feeding it to the fine-tuning algorithms. Below, we list the individual transforms that we deploy in the order in which they are applied to the input data:

1. RandomResizedCrop: size = 1024, scale = [0.99 − 0.5 · intensity, 0.99]
2. GaussianBlur: kernel_size = (5, 9), sigma = 5.0 · intensity
3. ColorJitter: brightness = 0.75 · intensity, contrast = 0.5 · intensity
4. RandomVerticalFlip: probability = 0.5 · intensity
5. RandomHorizontalFlip: probability = 0.5 · intensity
6. GaussianNoise: mean = 0.0, sigma = 0.5 · intensity, clip = True

The specific parameters were manually selected by domain experts to serve as the upper bound for data augmentation.
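The positive-prompt sampling step from the fine-tuning process above can be sketched as follows. This is a minimal illustration on nested lists; SAMSEM's actual implementation operates on batched tensors.

```python
import random

def sample_positive_prompt(gt_mask, rng=None):
    """Sample a random (x, y) point on the ground-truth foreground, i.e., on a
    metal line, to use as a positive point prompt. `gt_mask` is a list of rows
    of 0/1 values."""
    rng = rng or random.Random()
    foreground = [(x, y)
                  for y, row in enumerate(gt_mask)
                  for x, value in enumerate(row) if value]
    if not foreground:
        raise ValueError("ground-truth mask contains no foreground")
    return rng.choice(foreground)

mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 0, 0]]
x, y = sample_positive_prompt(mask, random.Random(0))
assert mask[y][x] == 1  # the prompt always lies on a metal line
```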
The experts selected these parameters so that, for an intensity value of 1, the resulting images could still (but barely) be interpreted by a human analyst. All transforms share a common data augmentation intensity value that can be set to any value in (0, 1] and is determined by our hyperparameter optimization, as shown in Section 3.5. During each iteration, we randomly apply either all data augmentation transforms to the input image or none at all. The probability of applying data augmentation to an image is also determined through hyperparameter optimization.

3.4.4 Loss Functions

For our pixel-based loss function L_pixel, we choose a combination of the binary cross entropy (BCE) loss L_BCE [MMZ23] and the Dice loss L_Dice [SLV+17] based on a blending parameter α ∈ (0, 1) that is determined by our hyperparameter search:

L_pixel = α · L_BCE + (1 − α) · L_Dice.

SAMIC [NTCG24] additionally employs the IoU loss [RTG+19], but our segmentation accuracy did not improve when incorporating the IoU into L_pixel.

We want to promote correct connectivity and penalize shorts and open circuits during training to reduce the ESD error rate. Since the ESD error computation itself is not differentiable, it cannot be used as a loss function right away. Instead, we need to find a differentiable loss term that behaves similarly to the ESD error. To this end, we supplement our pixel-based loss with a topological loss. After careful consideration and consultation with experts, we selected the Betti matching loss [SPS+23], L_Betti, for this purpose. We also explored two other topology-preserving loss functions, namely TopoLoss by Hu et al. [HLSC19] and Topograph by Lux et al. [LBW+25], but decided against them for runtime and effectiveness reasons.
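The blended pixel loss can be written out directly. Below is a minimal per-pixel sketch in plain Python; real training code would use the batched PyTorch implementations of these losses instead.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy between predicted probabilities and 0/1 targets."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2|P ∩ T| / (|P| + |T|)."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def pixel_loss(pred, target, alpha):
    """L_pixel = alpha * L_BCE + (1 - alpha) * L_Dice, with alpha in (0, 1)."""
    return alpha * bce_loss(pred, target) + (1 - alpha) * dice_loss(pred, target)
```

With the α = 0.6 found later by the hyperparameter search (Section 3.5), the BCE term is weighted slightly more than the Dice term.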
Generally, such loss functions focus on correct connectivity, but still need to be combined with pixel-based losses to enforce local accuracy. In the following, we only consider the application of Betti matching to images.

The Betti matching loss is based on concepts from persistent homology, which examines how topological features emerge and disappear as a filtration parameter ϵ varies. In particular, it tracks when topological features, such as connected components, cycles, or holes, are born and die along the filtration. This filtration can, for example, be induced by thresholding a scalar-valued function such as the model's prediction likelihoods. The number of detected topological structures (e.g., corresponding to metal lines in our images) then depends on the chosen threshold value. For each topological feature, a pair (b, d) is stored that records its birth time b (i.e., the value of ϵ at which the feature first appears) and its death time d (i.e., the value of ϵ at which it merges into an older feature or becomes topologically trivial). Tracking many such features simultaneously produces a so-called persistence barcode, where each bar represents the birth and death times of a single feature. Betti numbers β_k(ϵ) summarize this information by counting the number of k-dimensional topological features that are alive at a given value of ϵ. For images, we only consider k ∈ {0, 1}, where β_0(ϵ) counts connected components and β_1(ϵ) counts loops or holes.

The Betti matching loss goes one step further by spatially matching topological features from one image (e.g., the ground truth G) to corresponding features in another image (e.g., the prediction mask L). To this end, persistence barcodes are computed for both images. For the loss computation, an optimal matching between the two persistence barcodes is determined, and the resulting matching error is calculated.
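To make the filtration concrete, the following toy sketch counts β_0 (connected components) of a prediction map binarized at varying thresholds. A real persistence computation tracks the full birth/death pairs of every feature across all thresholds at once; this example only evaluates single superlevel sets.

```python
from collections import deque

def betti_0(prob_map, threshold):
    """Count connected components (4-connectivity) of the foreground obtained
    by keeping pixels with probability >= threshold (a superlevel set)."""
    h, w = len(prob_map), len(prob_map[0])
    fg = [[v >= threshold for v in row] for row in prob_map]
    seen = [[False] * w for _ in range(h)]
    components = 0
    for y in range(h):
        for x in range(w):
            if fg[y][x] and not seen[y][x]:
                components += 1
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components

# Two "metal lines" that merge into one as the threshold drops below 0.4:
pred = [[0.9, 0.9, 0.4, 0.8, 0.8]]
assert betti_0(pred, 0.5) == 2
assert betti_0(pred, 0.3) == 1
```

The merge at threshold 0.4 is exactly the kind of event recorded as a death time in a persistence barcode.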
This error is then used to construct a differentiable, efficient, and topology-aware loss function.

The Betti matching loss L_Betti is already implemented as a Python package² that is based on the work of Stucki et al. [SPS+23, SBPB24] and Berger et al. [BLS+24], making it straightforward to use. It comes with a few tweakable parameters relevant to our hyperparameter search. Firstly, the filtration_type defines how the comparison image C is computed as C ← min(G, L), i.e., the element-wise minimum of ground truth G and prediction mask L, which in turn governs the Betti matching process. It can take the values superlevel (features appear as input values decrease), sublevel (features appear as input values increase), and bothlevels (applies both filtration types and combines the results). Secondly, the parameter barcode_length_threshold governs the robustness of the Betti matching loss against short-lived topological features that may arise from unclean or noisy prediction masks. Lastly, push_unmatched_to_1_0 determines whether unmatched birth points are pushed toward 1 while unmatched death points are pushed toward 0 (True), or whether both points are just pushed together (False).

² https://pypi.org/project/topolosses/

In the end, we construct our final segmentation loss L_seg used for fine-tuning as

L_seg = L_pixel + λ · L_Betti,

with λ ∈ [0, 1] being a blending parameter determined by the hyperparameter search.

3.5 Hyperparameter Optimization

To determine the best-suited hyperparameters for fine-tuning SAM2, we conducted a hyperparameter search using the hyperband pruner from the Optuna framework [ASY+19]. This pruner attempts to minimize an objective function, specifically the relative number of ESD errors in relation to the total number of metal lines, as computed on our validation dataset from ICs 1 to 7.
In total, the hyperparameter search ran for seven days.

3.5.1 Setup

Due to runtime limitations, we opted for a two-step hyperparameter optimization: (i) First, we searched for the best parameters related to data augmentation and pixel-based loss functions by repeatedly fine-tuning SAM2 on our training dataset from ICs 1 to 7. To this end, we set the maximum number of epochs per hyperparameter optimization run to 25 and allowed pruning only after 5 epochs. During this first step, we set λ = 0 and hence L_seg = L_pixel. This is in line with recommendations from Berger et al. to add Betti matching to the loss computation only after pixel-based losses have stabilized, to achieve performance gains. To identify the best parameter set, we evaluated each model created during the hyperparameter search after every epoch using our validation dataset from ICs 1 to 7. We then selected the best-performing parameter set by analyzing the performance of all fine-tuned models at every epoch. Using these selected parameters, we fine-tuned the off-the-shelf SAM2 for 50 epochs. Here, we found that this fine-tuned model performs best after 35 training epochs; hence, we selected this version as a starting point for the second step. (ii) Finally, we enabled the Betti matching loss for continued fine-tuning of this selected model and started a second hyperparameter optimization to determine suitable hyperparameters for this topological loss function. The second hyperparameter optimization ran for 15 additional epochs, bringing the total to up to 50 fine-tuning epochs per trial across both steps.

3.5.2 Results

For the first hyperparameter optimization step (i), we ran 56 Optuna trials, each yielding a fine-tuned model trained for up to 25 epochs. In total, 49 fine-tuning trials were pruned by Optuna before completing all epochs. For the best trial, the objective function, i.
e., the percentage of ESD errors in relation to the total number of metal lines, reached its minimum at epoch 22 with a value of 0.659%.

Based on our evaluation of this trial, we determined the probability of data augmentation being applied to be 38.5% and the data augmentation intensity to be 0.61. For the pixel-based loss function L_pixel and the AdamW optimizer, we fix the learning rate at 1.111 · 10⁻⁵, the weight decay at 2.059 · 10⁻⁶, and a blending parameter value of α = 0.6. Furthermore, we observed that the learning rate has the greatest impact on segmentation performance, with values above 10⁻³ often leading to local minima, resulting in the fine-tuned model producing only completely black segmentation masks.

In the second step (ii), we conducted a total of 75 Optuna trials over 5 days, with 58 trials being pruned before completion. For the best trial, we report λ = 0.375 as the blending parameter for L_seg. Furthermore, we determined the optimal parameter set for the Betti matching loss in our application. To this end, we choose sublevel as the filtration_type, 0.345 as the barcode_length_threshold, and enabled pushing unmatched birth points to 1 and death points to 0 (push_unmatched_to_1_0).

3.6 Segmenting Images

Besides the input image, SAMSEM also requires at least one point prompt pinpointing the structure to segment. To generate an effective point prompt, we need to ensure that the positive prompt we provide is actually located on one of the metal lines in the input image. To this end, we use lightweight classical image analysis on the input images fed to both models 1 and 2 to identify parts that are highly likely to be foreground, i.e., belong to a metal line.
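The foreground identification and prompt placement can be sketched as follows. This is a minimal sketch under stated assumptions: a simple brightness quantile plus small-object removal stands in for the paper's denoising and morphology steps, and the quantile and size values here are illustrative, not SAMSEM's actual settings.

```python
import random
from collections import deque

def foreground_seeds(image, bright_quantile=0.9, min_size=4, n_prompts=5, rng=None):
    """Aggressively threshold the brightest pixels, drop tiny components
    (a crude stand-in for denoising/morphology), then sample point prompts."""
    rng = rng or random.Random()
    values = sorted(v for row in image for v in row)
    cutoff = values[int(bright_quantile * (len(values) - 1))]
    h, w = len(image), len(image[0])
    fg = [[v >= cutoff for v in row] for row in image]
    # remove connected components smaller than min_size (4-connectivity)
    seen = [[False] * w for _ in range(h)]
    keep = []
    for y in range(h):
        for x in range(w):
            if fg[y][x] and not seen[y][x]:
                comp, queue = [], deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cx, cy))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_size:
                    keep.extend(comp)
    return [rng.choice(keep) for _ in range(n_prompts)] if keep else []
```

The returned (x, y) points are the five random positive prompts fed to the fine-tuned model along with the input image.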
We do not aim for a fully accurate segmentation at this point; we just need to identify some foreground areas, but not all of them. The binary mask is generated by aggressively thresholding the brightest pixels in the input image. In addition, we apply denoising, remove small objects, and apply light morphology to eliminate isolated pixels, fill tiny gaps, and smooth object boundaries. We then place five random point prompts within the identified foreground areas and feed them to the fine-tuned model along with the input image. In our experiments, we observed that providing more point prompts does not improve segmentation results. SAMSEM then produces multiple segmentation masks, along with scores rating their quality. We always return the mask with the highest score as the final segmentation mask.

4 Evaluation

First, we evaluate SAMSEM's in-distribution performance using previously unseen SEM images from ICs 1 to 7, as shown in Section 4.1. Next, we investigate SAMSEM's out-of-distribution performance when faced with metal layer images from previously unseen ICs in Section 4.2. We then analyze failure cases in Section 4.3, test different dataset splits with reduced numbers of ICs used for fine-tuning to better understand how SAMSEM generalizes to unseen ICs in Section 4.4, and present its segmentation accuracy when trained on our full dataset of 14 ICs in Section 4.5.

4.1 In-Distribution Performance on Seen ICs

In this section, we evaluate the in-distribution performance of SAMSEM and compare it to previous work and established image segmentation approaches. To this end, we used the training datasets for ICs 1 to 7 that we established in Section 3.3 for all considered techniques. To compare the performance of SAMSEM with other techniques, we fine-tuned SAMIC [NTCG24] using the code provided by Ng et al. on request.
Furthermore, we trained U-Net, DeepLabV3, and FCN using the code published by Rothaug et al. [RKA+23]. At the time of writing, we were unable to execute the unsupervised method proposed by Rothaug et al. on our dataset, as it appeared to get stuck in local minima and just produced all-black segmentation masks during our experiments. We then evaluated each model on our test dataset from ICs 1 to 7 and measured the accuracy of the resulting segmentation against the ground truth. Refer to Table 3 for the results. Hence, in this experiment, we assess how well SAMSEM performs on metal layer images similar in structure to those on which it was trained.

As we are more interested in electrical correctness than in pixel accuracy, we follow Trindade et al. [TUSP18] and adopt their ESD metric for evaluation. We leverage the implementation of Rothaug et al. [RKA+23, RCK+25] to count the number of opens, shorts, FPs, and FNs in the segmented images compared to the ground truth. To this end, we report the absolute number of ESD errors observed in our test set, the relative number of

Table 3: In-distribution performance of SAMSEM compared to other approaches. All models were trained and evaluated on the same training dataset from ICs 1–7. We report the ESD error rate in two forms: as the absolute number of ESD errors observed for our test set, and as the relative number of ESD errors per metal line, expressed as a percentage. The total number of metal lines in this test set is 36,413. Furthermore, we present commonly used metrics for pixel-level segmentation accuracy.
                     ESD ↓                                    pixel ↑
                     total      %  opens  shorts   FPs   FNs  PA     Dice   IoU
SAMSEM                 263   0.72     27     146    59    31  0.972  0.960  0.925
- only pixel loss      323   0.88     29     211    54    29  0.972  0.961  0.926
- single model         289   0.79     22     185    59    23  0.972  0.961  0.925
- only model 1        1914   5.25    223    1519    96    76  0.950  0.926  0.865
- only model 2         286   0.78     27     146    82    31  0.954  0.940  0.900
SAMIC                 2178   5.98    412    1083   649    34  0.961  0.944  0.899
U-Net                 1618   4.44     78     764   735    41  0.966  0.949  0.905
DeepLabV3             2187   6.01    236    1705   212    34  0.961  0.946  0.901
FCN                   2067   5.68    277    1443   316    31  0.963  0.950  0.907

ESD errors per metal line, and the counts of opens, shorts, FPs, and FNs. All evaluations are conducted on the original-size images. For SAMSEM, we also provide evaluation results for when only the pixel-based loss is used, as well as for each of the two models, 1 (operating on the original-size images) and 2 (operating on 512 × 512 pixel patches). Furthermore, we analyze the performance when fine-tuning a single model on both original-size images and smaller patches. We are not yet evaluating SAMSEM's out-of-distribution performance; hence, we test on the ICs we also used for training.

Results. From Table 3, we observe that SAMSEM is on par with SAMIC, U-Net, DeepLabV3, and FCN for in-distribution segmentation in terms of the pixel-based metrics, but produces significantly fewer ESD errors. In particular, SAMSEM drastically reduces the number of opens, shorts, and FPs in the segmentation by about a factor of 6 relative to U-Net, which is the next-best approach. Furthermore, we clearly see the benefit of our multi-scale approach, as models 1 and 2 individually do not achieve comparable results. For the in-distribution analysis, we do not observe a significant impact of using a single model that combines models 1 and 2. The ESD error rate increases marginally, which could still be tolerated in exchange for a more straightforward fine-tuning process.
Finally, we can see a clear benefit of using our Betti matching loss function over pixel-based losses alone.

4.2 Out-of-Distribution Performance on Unseen ICs

To assess how well SAMSEM performs on previously unseen ICs, we utilize the models previously trained on the training dataset from ICs 1 to 7 for the in-distribution evaluation in Section 4.1. This time, however, we test its out-of-distribution performance on all available images from ICs 8 to 14. As these ICs are not used for training, we can safely use all their metal layer images for testing. For comparison, we again evaluate SAMIC, U-Net, DeepLabV3, and FCN on our dataset. Following the same evaluation procedure as described in Section 4.1, we report results from these experiments in Table 4.

Results. Examining the out-of-distribution results in Table 4, we can see that SAMSEM now fully plays to its strengths. While existing work exhibits, at best, an ESD error rate of 24.77% for unseen ICs, meaning that about every fourth segmented metal line is incorrect, SAMSEM achieves an error rate of just 5.53%. Again, we see that using either model 1

Table 4: Out-of-distribution performance of SAMSEM compared to other approaches. All models were trained on the same training dataset from ICs 1–7 and evaluated on ICs 8–14. We report the ESD error rate in two forms: as the absolute number of ESD errors observed for our test set, and as the relative number of ESD errors per metal line, expressed as a percentage. The total number of metal lines in this test set is 15,153. Furthermore, we present commonly used metrics for pixel-level segmentation accuracy.
                     ESD ↓                                    pixel ↑
                     total      %  opens  shorts   FPs   FNs  PA     Dice   IoU
SAMSEM                 838   5.53    531     170   104    33  0.959  0.939  0.888
- only pixel loss      789   5.21    351     234   164    40  0.959  0.944  0.896
- single model         908   5.99    410     235   204    59  0.958  0.937  0.886
- only model 1        1607  10.60    340    1112    72    83  0.936  0.920  0.854
- only model 2        1443   9.52    649     468   280    46  0.934  0.911  0.852
SAMIC                 4116  27.16   2079     363  1608    66  0.934  0.935  0.831
U-Net                 7704  44.90   1570     603  4435   196  0.907  0.830  0.753
DeepLabV3             3753  24.77   2462     154   437   700  0.904  0.787  0.714
FCN                   5815  38.38   3158     303  2006   348  0.910  0.826  0.751

or model 2 in isolation would not yield satisfactory results, although both still perform better than other approaches from the literature. Now, we can also observe degraded performance if we combine models 1 and 2 into a single one. In that case, the ESD error rate for generalization would increase by about 0.5 percentage points. Finally, when using only the pixel-based loss, we observed a slight improvement in segmentation accuracy. Looking more closely, we can observe that SAMSEM with the Betti matching loss results in more open circuits, but fewer shorts and FPs, than the model that was fine-tuned without any topological loss function.

4.3 Analysis of Failure Cases

We now take a closer look at the distribution of ESD errors across the test images used in our in-distribution (ICs 1 to 7) and out-of-distribution (ICs 8 to 14) evaluation. For in-distribution, we observe only one outlier image, exhibiting 92 ESD errors, as shown in Figure 7a. Although we observe more outliers in our out-of-distribution evaluation, we can still report decent results for most images, as shown in Figure 7b. Here, we count 7 images with more than 40 ESD errors and another 5 with more than 20 errors.

(a) In-distribution (ICs 1 to 7). (b) Out-of-distribution (ICs 8 to 14).
Figure 7: Distribution of the absolute number of ESD errors across all original-size images.

We now examine some of these failure cases in more detail to identify the reasons behind the observed results. Across our sample, we mostly observe three sources of errors:

(i) For the outlier in Figure 7a, we observe a slight misalignment between the actual metal lines on the SEM images and the ground truth mask, as shown in Figure 8a. Here, the ESD metric simply counts invalid overlaps between a metal line in the ground truth and vias in the predicted segmentation mask as shorts.

(ii) The first outlier in Figure 7b results from two via-related issues. Existing vias in that image cast a black shade around them, see Figure 8b. They partially obscure the underlying metal line and mislead SAMSEM into predicting background around the via. We observed the same issue for other outliers in the out-of-distribution test, which can be explained by the absence of any images with similar characteristics in our training set. For Figure 8b, we also observe that some vias are missing altogether, as they were accidentally removed during sample preparation. This results in narrow metal lines around the missing via that SAMSEM does not properly segment.

(iii) Some failures around image 54 in Figure 7b are due to delayering defects, see Figure 8c for an example. Here, some metal lines were incorrectly removed during sample preparation, leaving only their outlines visible. These failures increase the ESD error rate across all affected images.

Figure 8: Faulty images with artifacts or defects from sample preparation or imaging. (a) Misalignment of ground truth (green) and image. (b) Dark shades around vias and missing vias. (c) Delayering defects; only outlines of metal lines are visible.
Of course, not all ESD errors can be attributed to such sample preparation and image quality issues. However, our out-of-distribution test set from ICs 8 to 14 is of lower quality than ICs 1 to 7, which helps to explain the more frequent significant outliers in Figure 7b.

4.4 More ICs Result in Better Generalization

We now investigate how the number of ICs seen during fine-tuning affects the out-of-distribution segmentation performance on unseen ICs. To this end, we fine-tune SAM2 on different subsets of ICs 1 to 7, ranging from a single IC to six different ICs. For each number of ICs, we fine-tune on four different randomly chosen IC combinations and then always infer on the same out-of-distribution test set from ICs 8 to 14. As becomes evident from the resulting numbers of ESD errors in Figure 9, the more ICs are used for fine-tuning, the better the out-of-distribution performance. Intuitively, this was expected. However, our results also indicate that the ESD error rate has not yet saturated, and this downward trend is likely to continue when using even more ICs for fine-tuning.

4.5 Fine-Tuning on All 14 ICs

After carefully evaluating the in-distribution and out-of-distribution capabilities of SAMSEM when fine-tuning on seven ICs, we fine-tuned on 90% of all annotated images from

Figure 9: Different training data splits (combinations of one to six ICs, plotted against the number of ESD errors, broken down into opens, shorts, FPs, and FNs) used for fine-tuning to investigate whether more diversity in ICs seen during fine-tuning significantly affects the segmentation accuracy of SAMSEM on unseen ICs.

all 14 ICs.
As our findings from Section 4.4 imply that SAMSEM's generalization accuracy has not yet saturated, we expect this final model to achieve even better segmentation accuracy. To test its in-distribution performance, we allocated 10% of all images from all ICs for testing. When evaluating on this 10% test set, we obtain a PA of 0.971 (Dice: 0.960, IoU: 0.924) and an ESD error rate of 0.62% (22 shorts, 50 opens, 41 FPs, and 6 FNs across 19,088 metal lines). Hence, we observe a slight improvement over the 0.72% in-distribution ESD error rate reported for seven ICs. Given that we now use all our ICs for fine-tuning, we cannot re-evaluate out-of-distribution performance, as the model has already seen all ICs in our dataset. However, as we publish this final model as part of our work, interested readers can test it on their own IC images without further fine-tuning.

5 Discussion and Conclusion

We now discuss limitations of our approach in Section 5.1, potential avenues for future research in Section 5.2, and present our concluding remarks in Section 5.3.

5.1 Limitations

One major limitation is that our ground truth is based on manually annotated SEM image data. In rare cases, it was even difficult, if not impossible, for the human creating the ground truth to decide whether metal lines were connected. Hence, some remaining errors can be attributed to uncertainties in the ground truth. The only viable solution to this problem would be to use the design files of the analyzed ICs as ground truth, but these were not available to us for any of the targeted third-party ICs.

The runtime of our experiments, sometimes spanning up to five days, presented a bottleneck that prevented us from conducting more complex experiments, simply because we lacked the required resources.
We could, for example, not exhaustively test all parameters of the Betti matching loss during hyperparameter optimization. For similar reasons, we were unable to conduct an in-depth exploration of other topology-based loss functions. While we publish our models and scripts as open source, we are unable to do so for the datasets of 13 of the 14 different ICs for legal reasons. Only the dataset we adopted from Rothaug et al. [RKA+23] is made publicly available by the authors. In all other cases, copyright law, license agreements, and the risk of legal action by either the designers or manufacturers of the ICs result in an unfortunate legal situation that could threaten not only us as researchers but also the publishers of works like ours.

5.2 Future Work

While we were working on this publication, SAM3 was released [CGH+25]. At the time of writing, the model was only available on request. Furthermore, the performance data published by Meta does not indicate a significant improvement in image segmentation; therefore, we do not expect a substantial benefit from switching to SAM3. Nonetheless, the applicability of SAM3 to metal line segmentation remains to be investigated. We primarily see additional room for improvement in our generalization results. Here, the use of additional, higher-quality datasets featuring more annotated images could help further improve generalization performance, as suggested by our findings in Section 4.4. Ideally, such datasets could be made publicly available by IC designers or manufacturers, who also possess the GDSII ground truth corresponding to the images. Finally, future research should investigate whether techniques to mitigate sample preparation or imaging defects [HCY+21], as depicted in Figure 8, can be incorporated into approaches like SAMSEM.
To this end, larger datasets containing more erroneous SEM images, paired with reliable ground truth, would be required.

5.3 Conclusion

In this work, we introduced SAMSEM, a tool for IC metal layer image segmentation based on SAM2. We demonstrated that our approach produces reliable segmentation masks with an ESD error rate of 0.62%. SAMSEM works almost flawlessly on ICs that it has been fine-tuned on, significantly improving upon existing work. Furthermore, it shows promising results even for ICs that it was not previously exposed to. We have thus shown that SAMSEM generalizes well across IC technology nodes and image capturing techniques. However, we also see that our approach could be further enhanced by fine-tuning on additional image datasets from more diverse ICs. We therefore strongly advocate for IC designers and manufacturers to make annotated image datasets publicly available as benchmarks, to improve not only the reliability and accuracy of IC verification tools like ours but also the reproducibility and comparability of research results. By publishing our final SAMSEM model trained on images from all 14 ICs, along with our training, inference, and evaluation scripts, as open source, we take a first step in this direction and hope to lower the barriers for researchers entering this domain.

References

[ASY+19] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Ankur Teredesai, Vipin Kumar, Ying Li, Rómer Rosales, Evimaria Terzi, and George Karypis, editors, Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019, pages 2623–2631. ACM, 2019.

[BLS+24] Alexander H.
Berger, Laurin Lux, Nico Stucki, Vincent Bürgin, Suprosanna Shit, Anna Banaszak, Daniel Rueckert, Ulrich Bauer, and Johannes C. Paetzold. Topologically faithful multi-class segmentation in medical images. In Marius George Linguraru, Qi Dou, Aasa Feragen, Stamatia Giannarou, Ben Glocker, Karim Lekadir, and Julia A. Schnabel, editors, Medical Image Computing and Computer Assisted Intervention - MICCAI 2024 - 27th International Conference, Marrakesh, Morocco, October 6-10, 2024, Proceedings, Part VIII, volume 15008 of Lecture Notes in Computer Science, pages 721–731. Springer, 2024.

[CGH+25] Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, et al. SAM 3: Segment anything with concepts. arXiv preprint arXiv:2511.16719, 2025.

[CSL+18] Deruo Cheng, Yiqiong Shi, Tong Lin, Bah-Hwee Gwee, and Kar-Ann Toh. Hybrid K-means clustering and support vector machine method for via and metal line detections in delayered IC images. IEEE Trans. Circuits Syst. II Express Briefs, 65-II(12):1849–1853, 2018.

[GZYL24] Shixuan Gao, Pingping Zhang, Tianyu Yan, and Huchuan Lu. Multi-scale and detail-enhanced segment anything model for salient object detection. In Jianfei Cai, Mohan S. Kankanhalli, Balakrishnan Prabhakaran, Susanne Boll, Ramanathan Subramanian, Liang Zheng, Vivek K. Singh, Pablo César, Lexing Xie, and Dong Xu, editors, Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024 - 1 November 2024, pages 9894–9903. ACM, 2024.

[HCS+18] Xuenong Hong, Deruo Cheng, Yiqiong Shi, Tong Lin, and Bah-Hwee Gwee. Deep learning for automatic IC image analysis. In 23rd IEEE International Conference on Digital Signal Processing, DSP 2018, Shanghai, China, November 19-21, 2018, pages 1–5. IEEE, 2018.
[HCX+22] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross B. Girshick. Masked autoencoders are scalable vision learners. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 15979–15988. IEEE, 2022.

[HCY+21] Ling Huang, Deruo Cheng, Xulei Yang, Tong Lin, Yiqiong Shi, Kaiyi Yang, Bah-Hwee Gwee, and Bihan Wen. Joint anomaly detection and inpainting for microscopy images via deep self-supervised learning. In 2021 IEEE International Conference on Image Processing, ICIP 2021, Anchorage, AK, USA, September 19-22, 2021, pages 3497–3501. IEEE, 2021.

[HLSC19] Xiaoling Hu, Fuxin Li, Dimitris Samaras, and Chao Chen. Topology-preserving deep image segmentation. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 5658–5669, 2019.

[Hu22] Xiaoling Hu. Structure-aware image segmentation with homotopy warping. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022.

[HWL+21] Xiaoling Hu, Yusu Wang, Fuxin Li, Dimitris Samaras, and Chao Chen. Topology-aware segmentation using discrete Morse theory. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.

[KSS+20] Adam G.
Kimura, Jon Scholl, James Schaffranek, Matthew Sutter, Andrew Elliott, Mike Strizich, and Glen David Via. A decomposition workflow for integrated circuit verification and validation. J. Hardw. Syst. Secur., 4(1):34–43, 2020.

[KYD+23] Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, and Fisher Yu. Segment anything in high quality. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10-16, 2023, 2023.

[LBW+25] Laurin Lux, Alexander H. Berger, Alexander Weers, Nico Stucki, Daniel Rueckert, Ulrich Bauer, and Johannes C. Paetzold. Topograph: An efficient graph-based framework for strictly topology preserving image segmentation. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.

[LH19] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.

[LMB+24] Chuni Liu, Boyuan Ma, Xiaojuan Ban, Yujie Xie, Hao Wang, Weihua Xue, Jingchao Ma, and Ke Xu. Enhancing boundary segmentation for topological accuracy with skeleton-based methods. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024, pages 1092–1100. ijcai.org, 2024.

[LYvD+24] Hong Liu, Haosen Yang, Paul J. van Diest, Josien P. W. Pluim, and Mitko Veta. WSI-SAM: Multi-resolution segment anything model (SAM) for histopathology whole-slide images.
In Francesco Ciompi, Nadieh Khalili, Linda Studer, Milda Poceviciute, Amjad Khan, Mitko Veta, Yiping Jiao, Neda Haj-Hosseini, Hao Chen, Shan Raza, Fayyaz Minhas, Inti Zlobec, Nikolay Burlutskiy, Veronica Vilaplana, Biagio Brattoli, Henning Müller, and Manfredo Atzori, editors, Proceedings of the MICCAI Workshop on Computational Pathology, Marrakesh, Morocco, 6 October 2024, volume 254 of Proceedings of Machine Learning Research, pages 25–37. PMLR, 2024.

[MKP25] Sayan Mandal, Divyadarshini Karthikeyan, and Manas Paldhe. SAM2LoRA: Composite loss-guided, parameter-efficient finetuning of SAM2 for retinal fundus segmentation. CoRR, abs/2510.10288, 2025.

[MMZ23] Anqi Mao, Mehryar Mohri, and Yutao Zhong. Cross-entropy loss functions: Theoretical analysis and applications. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 23803–23828. PMLR, 2023.

[NTCG24] Yong-Jian Ng, Yee-Yang Tee, Deruo Cheng, and Bah-Hwee Gwee. SAMIC: Segment anything model for integrated circuit image analysis. In IEEE Region 10 Conference, TENCON 2024, Singapore, December 1-4, 2024, pages 418–421. IEEE, 2024.

[PMB+23] Endres Puschner, Thorben Moos, Steffen Becker, Christian Kison, Amir Moradi, and Christof Paar. Red team vs. blue team: A real-world hardware trojan detection case study across four modern CMOS technology generations. In 44th IEEE Symposium on Security and Privacy, SP 2023, San Francisco, CA, USA, May 21-25, 2023, pages 56–74. IEEE, 2023.

[RCK+25] Nils Rothaug, Deruo Cheng, Simon Klix, Nicole Auth, Sinan Böcker, Endres Puschner, Steffen Becker, and Christof Paar.
Advancing training stability in unsupervised SEM image segmentation for IC layout extraction. J. Cryptogr. Eng., 15(4):21, 2025.

[RGH+25] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloé Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross B. Girshick, Piotr Dollár, and Christoph Feichtenhofer. SAM 2: Segment anything in images and videos. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.

[RHB+23] Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, and Christoph Feichtenhofer. Hiera: A hierarchical vision transformer without the bells-and-whistles. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 29441–29454. PMLR, 2023.

[RKA+23] Nils Rothaug, Simon Klix, Nicole Auth, Sinan Böcker, Endres Puschner, Steffen Becker, and Christof Paar. Towards unsupervised SEM image segmentation for IC layout extraction. In Chip-Hong Chang, Ulrich Rührmair, Lejla Batina, and Domenic Forte, editors, Proceedings of the 2023 Workshop on Attacks and Solutions in Hardware Security, ASHES 2023, Copenhagen, Denmark, 30 November 2023, pages 123–128. ACM, 2023.

[RTG+19] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian D. Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression.
In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 658–666. Computer Vision Foundation / IEEE, 2019.

[SBPB24] Nico Stucki, Vincent Bürgin, Johannes C. Paetzold, and Ulrich Bauer. Efficient Betti matching enables topology-aware 3D segmentation via persistent homology. CoRR, abs/2407.04683, 2024.

[SLG20] Aayush Singla, Bernhard Lippmann, and Helmut Graeb. Recovery of 2D and 3D layout information through an advanced image stitching algorithm using scanning electron microscope images. In 25th International Conference on Pattern Recognition, ICPR 2020, Virtual Event / Milan, Italy, January 10-15, 2021, pages 3860–3867. IEEE, 2020.

[SLV+17] Carole H. Sudre, Wenqi Li, Tom Vercauteren, Sébastien Ourselin, and M. Jorge Cardoso. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In M. Jorge Cardoso, Tal Arbel, Gustavo Carneiro, Tanveer F. Syeda-Mahmood, João Manuel R. S. Tavares, Mehdi Moradi, Andrew P. Bradley, Hayit Greenspan, João Paulo Papa, Anant Madabhushi, Jacinto C. Nascimento, Jaime S. Cardoso, Vasileios Belagiannis, and Zhi Lu, editors, Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support - Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Québec City, QC, Canada, September 14, 2017, Proceedings, volume 10553 of Lecture Notes in Computer Science, pages 240–248. Springer, 2017.

[SPS+21] Suprosanna Shit, Johannes C. Paetzold, Anjany Sekuboyina, Ivan Ezhov, Alexander Unger, Andrey Zhylka, Josien P. W. Pluim, Ulrich Bauer, and Bjoern H. Menze. clDice - A novel topology-preserving loss function for tubular structure segmentation.
In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 16560–16569. Computer Vision Foundation / IEEE, 2021.

[SPS+23] Nico Stucki, Johannes C. Paetzold, Suprosanna Shit, Bjoern H. Menze, and Ulrich Bauer. Topologically faithful image segmentation via induced matching of persistence barcodes. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 32698–32727. PMLR, 2023.

[THC+23] Yee-Yang Tee, Xuenong Hong, Deruo Cheng, Chye-Soon Chee, Yiqiong Shi, Tong Lin, and Bah-Hwee Gwee. Patch-based adversarial training for error-aware circuit annotation of delayered IC images. IEEE Trans. Circuits Syst. II Express Briefs, 70(9):3694–3698, 2023.

[TUSP18] Bruno Machado Trindade, Eranga Ukwatta, Mike Spence, and Chris Pawlowicz. Segmentation of integrated circuit layouts from scan electron microscopy images. In 2018 IEEE Canadian Conference on Electrical & Computer Engineering, CCECE 2018, Quebec, QC, Canada, May 13-16, 2018, pages 1–4. IEEE, 2018.

[WZB+24] Bo Wen, Haochen Zhang, Dirk-Uwe G. Bartsch, William R. Freeman, Truong Q. Nguyen, and Cheolhong An. Topology-preserving image segmentation with spatial-aware persistent feature matching. CoRR, abs/2412.02076, 2024.

[YTG+22] Zifan Yu, Bruno Machado Trindade, Michael Green, Zhikang Zhang, Pullela Sneha, Erfan Bank Tavakoli, Christopher Pawlowicz, and Fengbo Ren. A data-driven approach for automated integrated circuit segmentation of scan electron microscopy images. In 2022 IEEE International Conference on Image Processing, ICIP 2022, Bordeaux, France, 16-19 October 2022, pages 2851–2855. IEEE, 2022.

[ZWD+25] Mengdi Zhu, Ronald Wilson, Reiner N.
Dizon-Paradis, Olivia P. Dizon-Paradis, Domenic J. Forte, and Damon L. Woodard. Genetic algorithm-assisted golden-free standard cell library extraction from SEM images. In 26th International Symposium on Quality Electronic Design, ISQED 2025, San Francisco, CA, USA, April 23-25, 2025, pages 1–8. IEEE, 2025.