SoK: Systematizing Software Artifacts Traceability via Associations, Techniques, and Applications

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021

Zhifei Chen, Lata Yi, Liming Nie, Yangyang Zhao, Hao Liu, Yiqing Shi, and Wei Song, Senior Member, IEEE

Zhifei Chen, Lata Yi, Yiqing Shi, and Wei Song are with the School of Computer Science and Engineering, Nanjing University of Science and Technology, China (e-mail: chenzhifei@njust.edu.cn; yilata1241@njust.edu.cn; 125106010860@njust.edu.cn; wsong@njust.edu.cn). Liming Nie and Hao Liu are with the School of Artificial Intelligence, Shenzhen Technology University, China (e-mail: nieliming@sztu.edu.cn; 2410263026@stumail.sztu.edu.cn). Yangyang Zhao is with the School of Computer Science and Technology, Zhejiang Sci-Tech University, China (e-mail: yangyangzhao@zstu.edu.cn). Lata Yi contributed equally as the co-first author. Liming Nie is the corresponding author.

Abstract: Software development relies heavily on traceability links between various software artifacts to ensure quality and facilitate maintenance. While automated traceability recovery techniques have advanced for different artifact pairs, the field remains fragmented with an incomplete overview of artifact associations, ambiguous linking techniques, and fragmented knowledge of application scenarios. To bridge these gaps, we conducted a systematic literature review on software traceability recovery to synthesize the linked artifacts, recovery tools, and usage scenarios across the traceability ecosystem. First, we constructed the first global artifacts traceability graph of 23 associations among 22 artifact types, exposing a severe research imbalance that heavily favors code-related links. Second, while recovery techniques are shifting toward deep semantic models, a reproducibility crisis persists (e.g., only 37% of studies released code); to address this, we provided a comprehensive evaluation framework including a technical decision map and standardized benchmarks. Finally, we quantified an industrial adoption gap (i.e., 95% of tools remain confined to academia) and proposed a role-centric framework to dynamically align artifact paths with concrete engineering activities. This review contributes a coherent knowledge framework for artifacts traceability research, identifies current trends, and provides directions for future work.

Index Terms: Literature review, software artifacts, traceability recovery.

I. INTRODUCTION

Software development is an inherently complex process that generates a multitude of artifacts, ranging from initial requirements and design models to code, tests, and deployment architectures [1]–[3]. To satisfy the increasing regulation of safety- or security-critical systems, establishing and maintaining clear relationships, or “traceability links”, among diverse artifacts is paramount. For example, robust traceability provides the necessary evidence to demonstrate that the software design fulfills all specified software requirements [4], and that all code is linked to well-defined specifications and established testing procedures [2], [5], [6]. Furthermore, it supports a variety of software tasks, such as change impact analysis [7], bug-fix commit identification [8], selective regression testing [9], and project management [10]. Conversely, the absence of these links forces engineering teams to comprehend, validate, and evolve complex systems blindly. Thus, traceability is not merely an optional benefit, but a fundamental engineering necessity to prevent structural decay during software evolution.
However, due to the dynamic nature of software projects, manual creation and maintenance of links between software artifacts are often labor-intensive, error-prone, and frequently neglected, leading to “traceability debt” [11]. Consequently, automated or semi-automated recovery of software artifacts traceability has emerged as a critical research area in recent years. Researchers have explored various techniques, including information retrieval (IR) methods [12], machine learning (ML) approaches [13], and heuristic-based algorithms [14]–[16], to identify and reconstruct links between disparate software artifacts. Existing studies often focus on a particular pair of artifacts, such as “requirement - code” [17]–[21] and “test - code” [22]–[24]. Although these advances have shown promising results in specific contexts, the field remains fragmented with diverse artifact relationships, recovery techniques, evaluations, and scopes. As a result, there is no holistic overview of the practical applicability of these recovered links in real-world development scenarios.

Building on these previous studies, we find that software traceability recovery presents vital challenges in conceptual, technical, and practical aspects:

(1) Incomplete Overview of Artifact Associations: Current research often focuses on specific pairs of artifacts or limited sets of relationships. This incomplete landscape leaves researchers operating with severe blind spots. Without knowing which critical associations are missing, the community cannot strategically prioritize new artifact pairs to improve tool coverage or support more complex engineering scenarios.

(2) Ambiguous Techniques and Evaluations: Despite various tools being developed, the underlying linking techniques and evaluation boundaries are reported inconsistently across studies.
This methodological ambiguity prevents fair cross-study comparisons, making it nearly impossible to establish clear state-of-the-art baselines or for practitioners to select the optimal technique for their specific context.

(3) Fragmented Knowledge of Practical Applications: While the theoretical benefits of recovered links are widely acknowledged, their real-world application remains unclear. This lack of synthesized usage scenarios creates a significant barrier to industrial adoption. Engineering teams do not know in which specific lifecycle phases or tasks these links will deliver concrete practical benefits or alleviate actual maintenance efforts.

Motivated by the urgent need to resolve these bottlenecks, this paper conducts a systematic literature review to bridge these critical gaps. We aim to answer three questions:

RQ1) What types of software artifacts and their associations constitute the current traceability networks?
RQ2) What is the current status of existing tools to establish links between software artifacts?
RQ3) What are the usage scenarios of the recovered links between software artifacts?

To investigate these questions, we adopted the research methodology of Systematization of Knowledge (SoK). In particular, we conducted a comprehensive search for relevant literature published in major academic conferences and journals. Through rigorous selection, classification, and synthesis of 76 selected papers, we carefully identified associations between artifacts, core linking techniques, and usage scenarios.

This study uncovers several critical empirical findings. We identified 22 distinct software artifacts and 23 types of associations, but the research landscape is highly unbalanced: nearly half of the studies focus on the consistency between documentation and source code.
Technically, the field exhibits a clear paradigm shift from traditional IR methods to advanced learning models to achieve superior semantic comprehension. Despite this advancement, reproducibility remains a major bottleneck: only 37% of studies made their source code publicly available, and researchers still lack standardized evaluation benchmarks. Furthermore, 95% of studies were evaluated in academic settings, primarily targeting requirement - implementation consistency and maintenance support, which exposes a massive gap in real-world industrial adoption.

Beyond simply reporting these empirical findings, this paper provides actionable frameworks and strategic insights for the community. Our primary contributions are threefold:

(1) Artifacts Traceability Graph: We constructed the first global artifacts traceability graph that synthesizes 23 distinct relational associations among 22 hierarchical software artifacts. Building on this holistic map, we formulated structural guidelines (including multi-hop chains, the central pivot, and quality boundaries) to shift the research focus from isolated binary mappings toward a complete ecosystem perspective.

(2) Technical Landscape and Evaluation Frameworks: We revealed the technical paradigm shift from traditional IR to deep semantic models. To address the reproducibility crisis in this field, we proposed a technical decision map and a comprehensive evaluation framework to standardize benchmarks.

(3) Goal-Driven Traceability Framework: By mapping the application domains of recovered links, we provided quantitative evidence of the disconnect between academic research and real-world industrial utility. To bridge this gap, we introduced a role-centric traceability framework that dynamically aligns specific artifact paths with concrete engineering objectives.

All materials related to this paper are publicly available [25]. The remainder of this paper is organized as follows.
Section II presents the background and related work. Section III details our review methodology, and Section IV presents the corresponding results. Section V discusses the findings and limitations of this study. Finally, Section VI concludes this paper and outlines future work.

II. BACKGROUND AND RELATED WORK

A. Software Traceability Recovery

Software artifacts are all tangible byproducts generated throughout the software lifecycle. To our knowledge, there is no comprehensive list of software artifacts defined to date. These diverse outputs encompass a wide range of deliverables, all essential for defining and supporting a software system. Software artifacts traceability recovery is the process of automatically identifying and establishing the relationships between these disparate software artifacts. In the complex landscape of modern software development, it is necessary to maintain a clear and comprehensive understanding of the relationships between various artifacts [26]. Software traceability ensures that every component of a system can be linked back to its origin and forward to its impact, providing a crucial foundation for effective project management [27].

However, manually establishing and maintaining artifact links throughout the software lifecycle is labor-intensive and error-prone. As projects grow in size and complexity and as development teams become more distributed, the sheer volume of artifacts and the dynamic nature of their interdependencies make manual traceability increasingly impractical [28]. This challenge gives rise to the critical need for research on artifacts traceability recovery.

B. Related Work

1) Establishment of Artifacts Traceability Links: Numerous studies have explored methods for establishing traceability links between different software artifacts.
In the initial phase, traceability was maintained manually by qualified developers who were responsible for creating and updating trace links between software artifacts. For example, Alves-Foss et al. [29] established trace links between UML design specifications and corresponding source code using XML technology, supporting hypertext-based traceability through manual means. Afterwards, semi-automated methods were proposed to reduce manual workload while retaining human oversight for critical decisions. For example, Hammad et al. [30] proposed an automated technique to determine whether changes in source code affect UML class diagrams in design documents. The system notified users when specific decisions were required.

In recent years, researchers have been working toward fully automated traceability techniques. Abadi et al. [31] compared five traditional IR techniques (LSI, VSM, JSM, PLSI, and SDR) for traceability between code and documentation. They concluded that VSM and LSI are the most suitable for traceability recovery tasks, despite their relatively poor performance in dimensionality reduction. Cleland-Huang et al. [32] employed a classifier model trained on manually curated traceability matrices to establish links between regulation and requirement. Rahimi et al. [33] addressed the issue of low accuracy in automated traceability between software requirement and source code by leveraging a neural network-based semantic vectorization approach. Chen et al. [34] improved the accuracy and stability of “documentation - code” traceability by integrating machine learning and heuristic optimization techniques. Although numerous techniques exist for associating software artifacts, there is a clear need for a comprehensive synthesis of these approaches.
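To make the IR family mentioned above more concrete, the following is a minimal sketch of VSM-style link recovery using TF-IDF weights and cosine similarity. It is an illustration under stated assumptions, not the implementation evaluated in any of the cited studies: the function names, the smoothed IDF formula, the 0.1 threshold, and the toy requirement and file contents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [tok for tok in re.split(r"[^a-z]+", spaced.lower()) if tok]


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (term -> weight) per input text."""
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    doc_freq = Counter(term for toks in token_lists for term in set(toks))
    vectors = []
    for toks in token_lists:
        term_freq = Counter(toks)
        # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recover_links(sources, targets, threshold=0.1):
    """Rank candidate (source, target, score) links at or above the threshold.

    Assumes source and target names are distinct.
    """
    names = list(sources) + list(targets)
    texts = list(sources.values()) + list(targets.values())
    vectors = dict(zip(names, tfidf_vectors(texts)))
    candidates = [(s, t, cosine(vectors[s], vectors[t]))
                  for s in sources for t in targets]
    kept = [c for c in candidates if c[2] >= threshold]
    return sorted(kept, key=lambda c: c[2], reverse=True)


# Hypothetical toy artifacts, for illustration only.
requirements = {"R1": "The user can reset a forgotten password by email"}
code_files = {
    "PasswordReset.java": "class PasswordReset { void sendResetEmail(User user) {} }",
    "InvoicePrinter.java": "class InvoicePrinter { void printInvoice(Order order) {} }",
}
links = recover_links(requirements, code_files)
```

In this toy setting, only the password-reset file shares vocabulary with the requirement, so it is the only candidate that survives the threshold. Real projects typically need stop-word removal, stemming, and a tuned threshold, which this sketch leaves out.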
2) Existing Reviews of Artifacts Traceability Recovery: Beyond individual studies on different artifact linking approaches, several reviews have systematically assessed the broader landscape of artifacts traceability establishment.

Several reviews concentrated on a specific artifact to highlight its unique role across the software lifecycle [35]. For example, Parizi et al. [36] presented a systematic literature review focusing on tests, addressing a knowledge gap in the recovery of traceability from test to code. Similarly, Wang et al. [37] explored trends and advances in bug traceability, concluding that improving the accuracy of “bug - commit” and “bug - code” linking is essential for software systems.

Other reviews examined traceability across multiple software artifacts within the scope of a specific research domain or a specific technique. For example, Aung et al. [7] conducted a systematic review on automated trace link recovery methods within the domain of change impact analysis, concluding that there is a lack of public datasets and that traceability research remains limited in these areas. Wang et al. [13] focused on the intersection of ML and software traceability, providing a comprehensive survey that considered the classification of artifact links as a key research area. Charalampidou et al. [26] conducted a study of previous artifact-related surveys and concluded that requirement artifacts dominate the traceability literature, being the subject of most surveys. Recent systematic reviews highlight a decisive paradigm shift toward deep learning. Khalil et al. [38] and Rahman et al. [39] noted that advanced embedding and Transformer-based architectures (e.g., BERT) have become the dominant approach for capturing complex semantic relationships, appearing in over half of the latest traceability studies. Complementing this, the comprehensive review by Antonio et al. [40] on machine learning techniques for requirement engineering points out that although supervised learning is widely applied to tasks such as requirement classification and traceability across the requirement engineering lifecycle, its effectiveness is often constrained by the scarcity of high-quality labeled datasets. Furthermore, Koboyatshwene et al. [41] identified persistent gaps in the literature, particularly the ongoing neglect of non-functional requirements and the lack of standardized benchmarks for global artifact sets.

However, these existing reviews focus only on limited artifacts, single domains, or single techniques, without providing a systematic overview of this research area. These limitations underscore the need for a more general framework to bridge the semantic gap across diverse and global software artifacts.

III. RESEARCH DESIGN

In this section, we outline the research questions and then present the detailed description of our research framework.

A. Research Questions

Our study aims to answer the following research questions.

RQ1: What types of software artifacts and their associations constitute the current traceability networks? Rather than viewing traceability as isolated point-to-point mappings, this question explores the multidimensional associations between heterogeneous artifacts. We aim to construct a global traceability network, pinpointing established research focuses and highlighting structural blind spots within the ecosystem.

RQ2: What is the current status of existing tools to establish links between software artifacts? Recognizing the structural complexity of the traceability network (RQ1), this question investigates the technical foundations required to construct it.
By comprehensively analyzing artifact representations, linking techniques, and evaluation frameworks, we aim to understand how existing tools bridge heterogeneous semantic gaps and to assess the maturity of current empirical evaluations, providing guidance for future tool selection.

RQ3: What are the usage scenarios of the recovered links between software artifacts? Even when successfully constructed using advanced techniques (RQ2), the traceability network is prone to decay if it lacks clear practical purposes. Therefore, this question seeks to understand how the recovered links align with specific domains, software phases, and practical objectives. Identifying these scenarios provides crucial insights to bridge the gap between academic research and real-world industrial adoption.

B. Research Framework

To address these research questions, our study follows the general guidelines for systematic reviews proposed by Kitchenham and Barbara [42]. The general flow of the process is illustrated in Fig. 1. First, the Literature Selection module selected a list of relevant papers on software traceability recovery from various sources in recent years. We believe that these papers reflect the research trends and encompass the majority of relevant associations between different artifacts. Second, the Literature Review module extracted information from each selected paper. Finally, we summarized the associations between artifacts analyzed in the literature to construct an artifacts traceability graph to answer RQ1. Meanwhile, we analyzed the linking tools and the applications in current research to answer RQ2 and RQ3, respectively. We describe the methodology in the following subsections. More details are publicly available [25].

C. Systematic Literature Review

1) Search Query Generation: In our review protocol, we used reputable literature search engines and databases to identify high-quality research papers.
Considering the scope of our literature review, we concentrated on a specific set of keywords to perform the paper search. Our search query is structured as a conjunction of two research domains: D1) Software Artifacts and D2) Link. Each domain within the search string is expressed as a disjunction of its associated keywords. Our search query Q is defined as follows:

$$Q = \bigwedge_{d \in \{D_1, D_2\}} \Bigl( \bigvee_{\mathit{keyword} \in K_d} \mathit{keyword} \Bigr)$$

where K_d is the set of keywords for the domain d.

Fig. 1. Overview of the Study Framework.

However, these keywords appear in the literature in different forms. In particular, the scope of D1 is large and D2 contains multiple synonymous words. Therefore, to ensure a comprehensive search, we first manually summarized all keywords by checking high-quality research papers in recent years. We collected all research papers from five journals and conferences (RE, ICSME, ICPC, ICST, and SST) related to our research topic published since 2022. Subsequently, for each paper, we carefully analyzed whether it investigated the establishment of traceability links between different types of software artifacts. This analysis was conducted with reference to existing formal descriptions and classification standards of software artifacts, such as ISO/IEC 19506:2012 [43], Model Services Contract (MSC) [44], and Software Engineering (9th ed.) [45]. During this step, we identified 18 relevant papers in the scope of this study. After that, we manually extracted the domain keywords from their titles, abstracts, and author keywords, and finally collected 13 individual keywords for both D1 and D2. Meanwhile, we also included all common software artifacts listed in the formal definitions [43]–[45] into the set of keywords for D1.
After combining four keyword sources and subsequently eliminating duplicate and subsumed keywords, we identified 11 and 4 specific keywords for D1 and D2, respectively. This preliminary analysis provided the basis for expressing the search query string Q:

(“requirement” OR “source code” OR “design” OR “issue” OR “commit” OR “bug report” OR “use case” OR “documentation” OR “model” OR “test” OR “specification”) AND (“traceability” OR “link” OR “relationship” OR “trace”)

2) Literature Search: Using the carefully generated Q, we conducted a literature search in five authoritative databases: Google Scholar, DBLP, IEEE, Scopus, and ACM. Our search was restricted to titles, abstracts, and author keywords. The numbers of candidate papers retrieved by this systematic search were: Google Scholar (1107), DBLP (485), IEEE (330), Scopus (316), and ACM (124). This search process was completed by November 7, 2025. To maintain the integrity of our review, we merged the results and removed duplicate entries from the initial pool of literature. This led to a set of 1,660 papers to be evaluated in the next stage.

3) Literature Filtering: To ensure the quality, quantity, and relevance of the selected literature to our research topic, we conducted a two-stage filtering process for the collected papers: source filtering and content filtering.

Source Filtering. The purpose of paper source filtering was to improve the quality and relevance of collected papers. Our source selection was based on a dual criterion:

(1) Specialized journals/conferences highly related to software traceability recovery: RE, ICSME, ICPC, ICST, and SST.
(2) Top-tier core journals/conferences in software engineering: TSE, TOSEM, EMSE, JSS, IST, SPE, IEEE Software, ICSE, FSE, ASE.

After combining these criteria, the final set of sources from which we selected papers included 15 reputable journals/conferences.
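The conjunction-of-disjunctions structure of Q can be sketched in a few lines of Python. The keyword lists below are copied from the query string above; the helper names `build_query` and `matches`, and the use of plain case-insensitive substring matching, are assumptions of this sketch. Each database applies its own matching rules, so this is only an approximation of how a record would be screened.

```python
# Keyword sets taken from the query string in the text.
D1 = ["requirement", "source code", "design", "issue", "commit", "bug report",
      "use case", "documentation", "model", "test", "specification"]
D2 = ["traceability", "link", "relationship", "trace"]


def build_query(domains):
    """OR the keywords inside each domain, then AND the domains together."""
    groups = ["(" + " OR ".join(f'"{kw}"' for kw in keywords) + ")"
              for keywords in domains]
    return " AND ".join(groups)


def matches(text, domains):
    """True if the text contains at least one keyword from every domain.

    Hypothetical screening helper: naive substring matching, so a keyword
    such as "test" would also match inside a longer word like "latest".
    """
    lowered = text.lower()
    return all(any(kw in lowered for kw in keywords) for keywords in domains)


query = build_query([D1, D2])
hit = matches("Recovering traceability links between requirements and source code",
              [D1, D2])
miss = matches("Unit test generation with search-based techniques", [D1, D2])
```

Here `hit` is true because the title contains keywords from both domains, while `miss` is false because the title mentions an artifact ("test") but no linking term from D2.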
Through this dual strategy, we captured the most authoritative research (consisting of 243 papers) while ensuring deep coverage of the specialized area of traceability.

Content Filtering. The purpose of paper content filtering was to improve the relevance of the research content. We applied the following inclusion criteria to select papers:

(1) Papers whose topic is software traceability recovery.
(2) Peer-reviewed journal/conference papers.
(3) Papers written in English.
(4) Papers with full text available.
(5) Papers that explicitly describe linking techniques.
(6) Papers that establish links between software artifacts.

To ensure the rigor of the study, two authors independently conducted content filtering. They first jointly excluded studies that were clearly not related to the research topic. The remaining papers were then reviewed in full to determine whether they addressed software traceability recovery. For the two papers that had conflicting decisions, a third author participated in the discussion and made the final judgment. After this filtering process, we obtained 56 relevant papers.

4) Snowballing: To include more essential literature, we employed a bidirectional snowball strategy to trace both backward and forward references of the 56 relevant papers. In particular, we considered papers that were cited by these papers or that cited them, extending our search to a two-layer citation depth to maintain focus. Throughout the entire snowballing process, we consistently applied the source filtering and content filtering described in the previous subsection. This process identified 20 additional relevant papers, bringing the total to 76 papers for the follow-up step.

D. Artifacts Traceability Graph Construction

To support a systematic overview of different artifact associations, we built an artifacts traceability graph to visualize the associations that have been a research focus.
Software Artifacts (Nodes) Construction. The construction of artifact nodes involved three steps: extracting analyzed artifacts from the reviewed literature, removing redundant artifacts, and constructing an artifact hierarchy. The whole process was manually verified by at least two authors.

First, we carefully reviewed the “Methodology” section of each collected paper and identified the analyzed software artifacts according to the formal definitions [43], [44]. This process resulted in a total of 152 software artifacts. To eliminate redundancy, we simplified the list of artifacts by consolidating software artifacts. We merged coarser-grained terms that referred to the same artifact (e.g., test and test artifact) across the software lifecycle. This step resulted in the identification of 20 distinct software artifacts.

Following that, we proceeded to identify the hierarchy of software artifacts. The artifacts referenced in the guidelines [44] served as the foundation for the artifact hierarchy, providing the primary structure. First, we extracted artifact categories from the guidelines based on their granularity to form the top layer of the hierarchy. Then, each identified artifact was carefully analyzed in terms of its function within the software development lifecycle. Based on both its granularity and functional role, each artifact was assigned to the most appropriate position within the remaining hierarchical layers.

Traceability Links (Edges) Construction. The nature of the relationships between different artifacts largely determines whether such links can be meaningfully established. Building upon the generated software artifact hierarchy, we systematically examined the “Introduction” sections of selected papers to extract descriptions concerning the targeted connections between software artifacts.
For the papers that focused on the same pair of software artifacts, we compared their descriptions of analyzed traceability links to infer whether they refer to the same relationships between artifacts. Finally, distinct link types were identified after synthesizing and categorizing the underlying relationships targeted in all papers.

Graph Generation. After completing the construction of software artifacts and traceability links, we built the artifacts traceability graph, where artifacts are represented as nodes and the relationships between them as edges. This graph systematically illustrates the interconnections among different types of software artifacts that have been a focus of research. Note that the artifact nodes are arranged hierarchically within their respective software artifact groups in the graph.

Expert Survey. To validate the constructed artifacts traceability graph and the related findings, we conducted an expert evaluation involving software engineers and researchers using two complementary methods: a structured online survey and in-depth expert interviews. The online survey evaluated the rationality, completeness, and practical value of our research.

TABLE I
DEMOGRAPHIC INFORMATION OF SURVEY PARTICIPANTS.

Information                        Measure               Number   Percentage
Age                                18-25                 1        2.63%
                                   26-30                 20       52.63%
                                   31-40                 7        18.42%
                                   41-50                 7        18.42%
                                   51-60                 3        7.89%
Years of professional experience   0-2                   0        -
                                   3-5                   9        23.68%
                                   6-10                  20       52.63%
                                   11-15                 4        10.53%
                                   Over 16               5        13.16%
Occupation                         Academic researcher   14       36.84%
                                   Software engineer     21       55.26%
                                   Other                 3        7.89%
Familiarity with this topic        5 (Familiar)          12       31.58%
                                   4                     18       47.37%
                                   3                     8        21.05%
                                   2                     0        -
                                   1 (Unfamiliar)        0        -

Participants in this survey were required to complete a 30-minute training session based on the material we provided.
The questionnaire consisted of 12 questions: seven questions used a Likert scale from 1 (low) to 5 (high) [46] to assess different dimensions of the graph and our findings, four questions collected demographic information from participants, and one question verified whether they had carefully read the provided materials. The survey was carried out on the Wjx.cn platform from Jan. 21, 2026 to Feb. 20, 2026. We invited developers of open-source projects and authors of academic papers to participate in it. A total of 38 responses were received. The complete questionnaire is available on our website [25].

Table I presents the demographic information of all online survey participants, including age, years of professional experience, occupational background, and familiarity with the concept of software traceability recovery. The results show that 76% of participants have at least six years of professional experience. 92% of participants rated their understanding of software traceability recovery as 4 or 5, indicating a high level of expertise. Their professional background provides a solid foundation for evaluating the effectiveness of our research.

The expert interviews aimed to evaluate the value of our research in real-world software development and research contexts. Open-ended questions were designed to gather expert insights on our research and how it could help optimize software development or research processes. The specific question posed was: “Based on the Artifacts Traceability Graph and the Hierarchical Software Artifacts Diagram, what is the specific value of our findings in tackling the primary issues in your professional field or research area?” The interviews were conducted from Feb. 15, 2026 to Feb. 30, 2026 and involved four experts. All data were anonymized and privacy protection measures were implemented to ensure confidentiality.

E. Analysis of Linking Tools

We analyzed the tools used to establish links between software artifacts in selected papers. To streamline this process and ensure the focus of our analysis, we defined the following inclusion criteria.

(1) Applicability: The tool must be explicitly introduced to establish links between software artifacts, using terms such as “identify”, “establish”, or “link”.
(2) Automation: The tool must offer automated software traceability recovery capabilities.
(3) Description: The tool must be supported by clear and comprehensive documentation.

For each included tool, we performed a comprehensive analysis in three key dimensions: linking technique, input representation, and evaluation experiments. Our tool analysis relies exclusively on the datasets, descriptions, and results reported in the selected literature, rather than empirical execution or independent benchmarking.

First, we examined the “Methodology” sections of relevant papers to identify the main technical method employed by each tool. We extracted details on the linking algorithm or workflow, giving priority to the proposed linking technique that achieved the highest performance as reported in the “Results” sections.

Second, we identified the types of artifact input representations based on the information provided in the “Methodology” sections. Certain representation types are suitable only for specific kinds of artifacts; e.g., Abstract Syntax Trees are often used to represent source code. The papers we reviewed exhibit a wide range of artifact representations, which also depend on the linking techniques employed. During our examination of each tool, we recorded the representation format used for each software artifact whenever a link was established. The input formats of two artifacts determine how to compute their similarity to support traceability recovery.
Third, to assess the reproducibility of each study, we examined the “Methodology” section of each paper to identify any references to executable source code. We also analyzed the evaluation design for each tool, focusing on the datasets and evaluation metrics reported in the “Experiment” or “Results” sections.

F. Analysis of Usage Scenarios

To understand the usage scenarios in which links between software artifacts are established, we investigated the specific application context of building artifact links for each paper. Given that software artifacts emerge at different stages of the software lifecycle and serve varying roles, we investigated three dimensions: research domain, software lifecycle phase, and primary objective.

We collected usage scenario information from the “Introduction” and “Result” sections, where contributions are typically summarized. We also referred to the discussions of applications described in the papers. First, we checked whether the analyzed software consisted of general projects across various domains or of industrial projects in specific domains. From this, we determined whether the proposed method was applied in an academic or industrial setting. Second, we inferred the phase of the software lifecycle emphasized in the study, based on the nature of the associated artifacts. Lastly, we extracted and summarized the main objective of recovering traceability links from each study. This three-dimensional analysis provides an overview of the application contexts in which traceability between different pairs of software artifacts should be explored.

IV. RESEARCH RESULTS

A. (RQ1) Artifacts and Associations

After linking and grouping the identified artifacts from relevant papers, we built the hierarchical artifacts diagram shown in Fig. 2 and the traceability graph shown in Fig. 3.
In the traceability graph, the background colors represent different groups of artifacts, and the node colors represent different hierarchical layers of artifacts in each group. In the following, we describe the findings in these figures.

1) Investigated Software Artifacts: We identified 22 distinct types of software artifacts from the 76 selected papers. Based on the characteristics of these 22 artifacts and the roles they play in software systems, we classified them and constructed a Hierarchical Software Artifacts Diagram, as shown in Fig. 2. There are eight groups of analyzed software artifacts: source code artifacts, documentation artifacts, architecture artifacts, model artifacts, component artifacts, test artifacts, maintenance artifacts, and regulation artifacts. Among them, 68% of papers established links for “source code”, 41% of papers for “requirement”, and 14% of papers for “test”. These software artifacts represent the elements most commonly studied in the field of software traceability recovery. The majority of the research centers on “source code”, which reflects the inherent nature of software systems: the source code is frequently modified to fulfill evolving requirements, and such changes often require corresponding test updates. This dynamic interplay underscores the necessity of establishing traceability links among these artifacts.

2) Investigated Artifacts Associations: For these studied artifacts, we found multiple types of associations resolved to build their links. Fig. 3 presents a multidimensional association network that shows the interweaving of heterogeneous relationships across different abstraction levels. These connections capture the semantic correspondence between description and elaboration (“describes”), as well as the “implements” mapping that reflects the evolution from abstract specifications to concrete logic.
The graph also establishes paths for externally imposed “constrains” and for closed-loop confirmation through “verifies”. In addition, the association network models “causes” triggering relationships, “fixed by” mechanisms for addressing specific issues, and “represents” relationships across varying levels of abstraction. Together, these relationships form closed internal dependency loops within a single domain and external association structures that span domain boundaries.

In the traceability graph, software artifacts that appear in multiple semantic relations play multiple roles in the traceability network. In particular, “source code” appears in multiple types of associations (e.g., “causes”, “represents”, “implements”, “verifies”, “describes”), underscoring its central position in all phases of software development and maintenance. In addition, “requirement” participates in “constrains”, “represents”, “implements”, “verifies”, and “describes” traceability links, behaving as a projection point of the semantic intent of the software.

Fig. 2. Hierarchical Software Artifacts Diagram. It presents a hierarchical taxonomy of 22 distinct software artifacts identified in the literature, organized into eight functional groups and three granularity layers. In each group, green, blue, and yellow nodes denote the first, second, and third hierarchical layer, respectively.

Fig. 3. Artifacts Traceability Graph. Background colors represent different artifact groups in Fig. 2. The dashed lines denote “is-a” relationships, while solid lines represent concrete relationships. The upper-right corner displays the links corresponding to each relationship.

Traditional literature reviews often focus only on which artifacts are linked, which fails to uncover the significance of linking these pairs of software artifacts.
To articulate the software engineering relevance of these links, we not only record the artifact pairs addressed in the papers, but also analyze the papers in depth, with a focus on extracting the practical problems that each link aims to solve. This helps practitioners and researchers identify high-value research directions and guides them in selecting appropriate artifact pairs for link establishment based on their needs. In particular, for the different associations among the software artifacts in Fig. 3, we identify their significance in software engineering research. In the following, we present the internal and external associations for each group of artifacts, along with the number of involved papers.

Associations From the Group of Source Code Artifacts. As the physical implementation layer of the system, the source code group embodies the concrete execution logic of the business processes. Its internal associations reflect the structured organization of code entities as follows:
• 4 bug -> source code (1): links bugs to code, which helps identify bug-prone modules and guides preventive maintenance.
• 11 function -> source code (1): enables precise impact analysis at the function level, minimizing side effects of code updates.

For this group of artifacts, the external associations establish mappings between implementation artifacts and requirements, design, and testing, supporting functional verification and change impact analysis. These associations are as follows:
• 22 source code -> documentation (9): lowers onboarding barriers, ensuring docs effectively help developers understand business logic.
• 6 source code -> design (2): validates that code follows intended patterns, reducing cognitive load during maintenance.
• 23 source code -> requirement (25): maps code to requirements to verify that the final delivery meets all original customer needs.
• 7 comment -> test code (1): explains complex test logic, reducing maintenance and reuse costs of test cases.
• 12 source code -> test code (1): maps code to tests, supporting regression optimization and ensuring stability after each commit.
• 5 class -> component (1): ensures that class implementation aligns with component logic, preventing decay and boosting reuse.

Associations From the Group of Documentation Artifacts. Documentation artifacts define the system's business intent and technical solutions. Their internal associations reveal the evolution from requirement to design:
• 16 requirement -> design (3): ensures that every design decision is backed by a requirement, eliminating waste from over-engineering.

The external associations of this group of artifacts establish bidirectional traceability paths between requirement, code, and tests, ensuring that system development aligns with stakeholder expectations:
• 17 use case -> source code (3): links user value to code, ensuring development effort is prioritized for core business features.
• 10 use case -> class (1): identifies which classes support specific user behaviors, aiding in fine-grained impact assessment.
• 8 requirement -> test case (1): ensures that all requirements are tested, directly reducing the risk of production failures due to missed logic.
• 9 requirement -> model (1): ensures that models accurately capture requirements, resolving contradictions before coding.

Associations From the Group of Maintenance Artifacts. Maintenance artifacts record software evolution and repair activities. By linking change issues, defect reports, and code commits, they form a complete change log. These associations aim to improve defect localization efficiency and provide an auditable history for long-term system evolution.
There is one type of internal dependency and one type of external dependency:
• 21 issue -> commit (6): provides complete change logs, ensuring every edit is justified and improving collaboration.
• 18 bug report -> source code (4): speeds up bug localization, significantly shortening recovery cycles after failures.

Associations From the Group of Test Artifacts. Test artifacts constitute the verification layer of quality assurance. By establishing external mappings from tests to source code artifacts, this category defines the coverage boundaries of verification activities, provides a basis for regression testing optimization, and ensures that software iterations meet quality standards.
• 20 unit test -> tested code (4): establishes feedback loops at the unit level, lowering integration debugging costs.
• 14 test code -> tested code (2): defines test coverage boundaries, enhancing safety during refactoring and preventing regressions.
• 15 test case -> tested code (2): ensures that test cases reach internal code implementations, improving bug detection efficiency.

Associations From the Group of Regulation Artifacts. Regulation artifacts represent external legal constraints and industry standards. This category transforms non-functional constraints into concrete system requirements and establishes external traceability links to support automated compliance auditing, thereby reducing legal violation risks and refactoring costs.
• 2 regulation -> requirement (1): ensures software specs adhere to industry standards, avoiding legal risks and costly refactoring.
• 3 regulation -> use case (1): ensures user scenarios do not violate industry guidelines, catching legal conflicts early.

Associations From the Group of Model Artifacts. Model artifacts provide abstract representations of the system.
Their external associations aim to verify whether code implementations deviate from model constraints, prevent architectural drift, and ensure that the system maintains structural robustness and consistency throughout evolution.
• 13 model -> source code (2): verifies that code has not drifted from model constraints, ensuring system robustness and scalability.

Associations From the Group of Architecture Artifacts. Architecture artifacts define the system's global topology and design principles. By monitoring code compliance with architectural specifications, this category preserves system modularity and stability, preventing micro-level code changes from undermining overall system cohesion.
• 19 architecture -> source code (4): monitors whether code violates architectural principles, ensuring stability during evolution or under high load.

According to the statistical results, “source code - requirement” emerges as the most intensively studied artifact pair, covered in 25 papers. In terms of artifact connectivity, “source code” acts as a core node participating in the construction of 10 traceability chains, while “requirement” is involved in 5 links; together, they form the central pillars of the traceability graph. In contrast, traceability studies involving “regulation”, “model”, and fine-grained entities such as “class/function” remain relatively scarce. Approximately 45% of the artifact pairs appear only once in the existing literature, indicating substantial research gaps in traceability for regulatory compliance, model-driven development, and low-level code entities.

3) Survey Results: To validate the practical value of our research, we conducted a quantitative evaluation through an online practitioner survey (results in Table II) and a qualitative evaluation via expert interviews (results in Table III). The results of the online survey indicate strong support for our research.
Among the 37 valid responses, we calculated the proportion of participants who assigned scores of 4 or 5 to each question. 92% of participants recognized the clarity, rationality, and practical value of our artifacts traceability graph, and 95% acknowledged its completeness. In addition, 92% of participants affirmed the interpretability of our software artifact hierarchy, and 95% recognized its completeness.

TABLE II. FEEDBACK RESULTS OF ONLINE SURVEY.
# Feedback | # Valid Feedback | Artifacts Completeness | Hierarchy Clarity
38 | 37 | 95% | 92%
Association Completeness | Associations Clarity | Associations Value | Associations Rationality
95% | 92% | 92% | 92%

A qualitative evaluation conducted through expert interviews (n = 4) further validated the value of our software artifact relationship network. Two industry experts emphasized the study's potential for addressing real engineering challenges, noting that it provides a clear practical pathway for managing architectural evolution risks and optimizing automated testing workflows. Meanwhile, two academic experts valued the study's contribution to decomposing complex semantic relationships and identifying research gaps in the field, considering it a solid theoretical foundation for future work on LLM-driven traceability and automated compliance auditing. This dual validation confirms that our study not only systematizes knowledge at the theoretical level, but also demonstrates practical value for guiding real-world tasks.

4) Guidelines: Based on our findings, we propose three guidelines to advance artifacts traceability research, which together provide a systematic roadmap for software management.

Multi-hop Traceability Chains. While existing studies predominantly focus on specific binary artifact pairs (e.g., “requirement - source code”), these isolated relationships are often insufficient for complex engineering decisions.
We observe that many critical but missing associations can be bridged by integrating multi-hop traceability chains [47]. For example, by leveraging an intermediary artifact C to connect A and B (where the A → C and C → B links can be well established), we can construct a global association network. This shift from localized binary mappings to path-based traceability provides a systematic guideline for uncovering hidden dependencies among heterogeneous artifacts.

This chain-based recovery strategy reveals deep relationships that remain invisible to single-pair methods, supporting more sophisticated analytical tasks. Following this guideline, Table IV presents four highly valuable artifact pairs for which no direct recovery methods were proposed in our literature pool. For each pair, we propose indirect traceability paths that use intermediate artifacts as stepping stones, through which their associations can be effectively established and explored in future research. For example, the traceability chain “bug report” -> “source code” -> “requirement” connects bug finding with user requirements. Such chains facilitate establishing potential links between artifacts within the sequence via intermediate associations. If there is a significant gap between two artifacts in the chain, appropriate intermediate artifacts can be chosen to provide additional semantic information, thereby extending the scope of link establishment.

The Central Pivot (Requirement - Code). In the landscape of software evolution, the consistency between source code and requirement forms the backbone of traceability research. This link is more than a binary association; it is the essential semantic bridge between business intent (“what”) and technical implementation (“how”).
By anchoring high-level requirements to low-level source code, this relationship provides the most direct evidence for verifying functional completeness and ensuring that the final delivery aligns with stakeholder expectations.

From a structural perspective, the “source code - requirement” link exhibits extraordinary topological centrality within the global Artifacts Traceability Graph. Our analysis reveals that these two artifacts serve as the primary hubs for nearly all long-range traceability paths: source code mediates links to architecture, tests, and defects, while requirements anchor upstream regulations and downstream designs. If this core consistency is compromised, the global graph fragments into isolated “knowledge islands”.

Consequently, we propose a synchronous construction guideline: traceability should shift from post-hoc recovery to early-stage integration. Establishing bidirectional “code - requirement” links from the project's inception creates a “single source of truth”. This mechanism not only enables precise change impact analysis and optimized regression testing, but also secures an immutable evidence chain for compliance.

The Quality Boundaries (Peripheral Links). In contrast to core artifacts, peripheral artifact pairs (such as “regulation - use case”, “class - component”, and “requirement - test case”) are located at the margins of the traceability graph. These links connect different stages of the software lifecycle and involve cooperation between different experts (e.g., architects and testers). Neglecting these marginal relationships could lead to practical problems in software engineering, for example:
• neglecting “regulation - use case”: The system may function correctly but fail to meet security standards, leading to high rework costs or legal risks when violations are discovered late.
• neglecting “class - component”: Frequent code changes without architectural alignment make the system messy, hindering future updates or technology migrations.
• neglecting “requirement - test case”: It is unclear whether core logic is actually verified, leading to redundant testing of minor features while critical risks remain hidden.
• neglecting “model - requirement”: When technical models drift from business intent, the resulting system may be technically functional but fail to solve the actual business problem.

Researching these peripheral links is essential for a full-lifecycle management approach. Moving beyond simple point-to-point traceability to explore these less-studied links helps build a self-explanatory software system. In such an environment, every piece of code and every test can be traced back to its business purpose and legal origin, ensuring the software is reliable and easy to maintain over the long term.

Answer to RQ1: There are 22 types of software artifacts and 23 types of associations analyzed in the current research. Half of existing studies recovered the links between different documentation artifacts and the code they describe, which is the biggest research hotspot.

TABLE III. RESPONSES OF EXPERT INTERVIEW.
Expert 1: I work on architecture maintenance, and my biggest concern is architectural drift - a small code change can easily push the architecture off track. This study points out that research on the architecture -> source code link is extremely limited, which really hits the pain point. It made me realize we shouldn't focus only on requirements; architecture should be treated as the top-level artifact, with automated checks to keep the system cohesive.
Expert 2: I'm responsible for CI/CD pipelines and constantly deal with bloated test suites. The paper highlights that source code is the hub of ten traceability chains, and it distinguishes between test cases and test code, which is very practical. This inspired me to anchor traceability on code and use bidirectional links to prune useless test cases from the pipeline.
Expert 3: I use LLMs for automated traceability. I used to think traceability was just a simple binary matching problem. But your idea of a multidimensional association network was quite enlightening. Treating requirements as a semantic projection point and distinguishing different relation weights made me realize that we shouldn't treat all links the same - I need to design more fine-grained prompts for LLMs.
Expert 4: I research compliance auditing for high-assurance systems, which has long been a niche area. The statistics in this study are quite revealing: regulation-related links are extremely scarce, and nearly half of the artifact pairs appear only once. The paths you outlined, such as Regulation -> Requirement, basically provide the backbone for my research and strengthen my belief in using traceability graphs as evidence chains for compliance.

TABLE IV. EXAMPLES OF PROPOSED INDIRECT TRACEABILITY CHAINS FOR LATENT ARTIFACT ASSOCIATIONS.
Missing Artifacts Association | Research Value | Multi-hop Traceability Chain
regulation -> test case | It maps legal constraints to verifiable entities, which enables automated compliance auditing and ensures that system behavior conforms to regulatory standards with empirical evidence. | regulation -> requirement -> test case
bug report -> requirement | Linking defects to requirements distinguishes coding errors from requirement ambiguities. By identifying requirement characteristics that induce failures, this approach improves requirement engineering quality and reduces rework costs. | bug report -> source code -> requirement
architecture -> requirement | It analyzes the alignment between implementation and business objectives to identify architectural erosion. Quantifying architectural support for functional evolution provides a quantitative basis for refactoring decisions. | architecture -> source code -> requirement
model -> test code | It establishes alignment between design specifications and test logic, ensuring that verification conforms to architectural intent. It supports automatic derivation of test scripts when models change, improving the level of automated verification. | model -> source code -> test code

B. (RQ2) Current Status of Linking Tools

We investigated the existing linking tools by analyzing the representations of artifacts, the techniques used, and the evaluation designs, which are reported in Table V.

1) Artifact Input Representation: Certain linking techniques can only be applied to specific representations of artifacts. We compiled a list of the artifact representations identified in the reviewed literature, shown in the third column of Table V. These artifact representations can be classified into five categories:
• Linguistic Textual Representations: This category primarily consists of human-written text or discrete linguistic units that lack programming syntax constraints. 16 software artifacts use this representation; e.g., “requirement”, “bug report”, and “issue” can be represented by text or structured text.
• Static Implementation Structures: It represents implementation logic and structural features governed by formal programming language syntax or static rules. 12 artifacts use this representation; e.g., “source code” and “test code” can be represented by AST, code snippets, or structured code.
• High-Level Design Blueprints: This category focuses on high-level system architecture, logical relationships, or interface layouts while abstracting away specific implementation details.
5 software artifacts use this representation; e.g., class is represented by UML, and design by UML or model elements.
• Dynamic Runtime Behaviors: It captures runtime behaviors and temporal sequences of a program during its actual or simulated execution. 3 software artifacts use this representation; e.g., test code is represented by runtime execution.
• Mathematical Relational Models Representations: This category transforms software information into mathematical objects or relational matrices to facilitate computational analysis and link recording. 6 software artifacts use this representation; e.g., “commit” is represented by metadata.

TABLE V. SUMMARY OF TECHNIQUES, REPRESENTATIONS, AND EVALUATION METHODS FOR ARTIFACT LINKING TOOLS.
Artifacts Pair (#Papers) | Technique | Input Representation | Code Public | Dataset Public | Metrics
source code - requirement (25) | IR, DFA, PA, Eye-Tracking, Manual, LLM | source code: AST, bytecode, token, events, XML, identifier, set, dynamic call graph, structured code, structured text, file, call graph; requirement: text, ID-RTM, RTM, labels, issues, structured text, identifier | 11/25 | 25/25 | Recall, Precision, AP, MAP, IGR, F-Measure, FP, Cliff's delta, P-value, DiffAR, Effort, Selection Proportion, Correctness, Incorrectness, Similarity, Conflicts, Pass@1, Coverage
source code - documentation (9) | IR, ML, PA, Matching | source code: term, identifier, text, file, call graph, set of words; documentation: text, file, model | 1/9 | 8/9 | Recall, Precision, F-Measure, REI
issue - commit (6) | IR, ML, DL, CL, PLM | issue: text, structured text, metadata; commit: text, code, structured text, metadata | 4/6 | 6/6 | Recall, Precision, AP, MAP, AUC, MCC, PF, MRR, ACC, F-Measure, P-value, Hit
unit test - tested code (4) | DL, Matching, DFA, PA, PLM | unit test: text, token, vector, code; tested code: text, identifier, vector, code | 1/4 | 4/4 | Recall, Precision, ACC, F-Measure, Cliff's delta, P-value
architecture - source code (4) | IR, ML, DFA, Matching, PA | architecture: keywords, text, identifier, label; source code: term, text, identifier, vector | 2/4 | 4/4 | Recall, Precision, ACC, F-Measure, Correctness, Effectiveness, Specificity
bug report - source code (4) | IR, ML, Matching | bug report: text; source code: file, entity, identifier, AST | 3/4 | 4/4 | Recall, Precision, MAP, MRR, Hit, Usefulness, Effectiveness
use case - source code (3) | IR, ML, PA, Matching | use case: text, label; source code: text, event, file | 0/3 | 2/3 | Recall, Precision, VPR, ACC, Cliff's delta, P-value
requirement - design (3) | IR, DL | requirement: text, structured text; design: model element, text | 1/3 | 1/3 | Recall, Precision, MAP, F-Measure
test case - tested code (2) | Matching, Manual | test case: text, XML; tested code: code, XML | 1/2 | 2/2 | Counts, Graph Connectivity
test code - tested code (2) | PA | test code: structured code, runtime execution; tested code: structured code, bytecode | 1/2 | 2/2 | Recall, Precision, MAP, AUC, F-Measure
model - source code (2) | IR, DFA | model: identifier, structured text; source code: vector, structured code | 0/2 | 0/2 | Recall, Precision, FP, Effort, Effectiveness
source code - test code (1) | IR | source code: token, AST; test code: token, AST | 1/1 | 1/1 | Recall, Precision, AUC, ACC, F-Measure
function - source code (1) | PA | function: list; source code: file | 0/1 | 1/1 | Feasibility, Coverage
source code - UML (1) | Eye-Tracking | source code: visual widgets; UML: visual widgets | 0/1 | 0/1 | -
use case - class (1) | IR | use case: text; class: structured code | 0/1 | 1/1 | Recall, Precision, AP, MAP, F-Measure, Cliff's delta, P-value
requirement - model (1) | IR | requirement: text; model: element, text | 1/1 | 1/1 | Recall, Precision, AUC, MCC, F-Measure
requirement - test case (1) | IR, ML | requirement: feature vector; test case: feature vector | 0/1 | 1/1 | Recall, Precision
comment - test code (1) | IR, PA | comment: text; test code: code snippets | 1/1 | 1/1 | Recall, Precision, F-Measure
source code - design (1) | Matching | source code: XML; design: UML | 0/1 | 1/1 | Intersection, Agreements
class - component (1) | Manual | class: UML; component: UML | 0/1 | 1/1 | Effectiveness
bug - source code (1) | DFA | bug: patch; source code: file | 0/1 | 1/1 | Agreements, Disagreements
use case - regulation (1) | IR | use case: text; regulation: text | 0/1 | 0/1 | AP, MAP, ACC
regulation - requirement (1) | IR, ML | regulation: text; requirement: text | 0/1 | 1/1 | Recall, Precision, AP, F-Measure
Values in the “Code Public” and “Dataset Public” columns denote the number of papers releasing public resources out of the total papers for that task.

2) Linking Technique: Based on an in-depth analysis of the proposed tools, the techniques used for each pair of artifacts are presented in the second column of Table V. The linking techniques employed for most artifact pairs are limited to only one or two categories, whereas the link between “requirement” and “source code” has been addressed with a much wider range of techniques. We summarize the following categories of linking techniques:
• Information Retrieval (IR): IR is the most dominant technique, accounting for as much as 57% of papers. It relies on textual similarity methods, including TF-IDF and VSM, to connect artifacts such as requirements and models [48].
• Static/Dynamic Program Analysis (PA): Used in 21 studies, these program analysis techniques establish links between artifacts based on structural dependencies or runtime behaviors, such as call graphs [49] or code coverage [50].
• Machine Learning (ML): ML is an emerging trend (10 studies). It trains models to recognize complex patterns and links artifacts by integrating various data sources [3], [51]–[58].
• Rules/Pattern Matching (Matching): Used in 7 studies, it relies on predefined rules or consistent patterns to create links between artifacts; e.g., linking a function with the test whose name includes the function name [15], [22], [23].
• Data Flow Analysis (DFA): Used in 6 studies, DFA tracks the transformation of data across variables and logical paths within a program. It is primarily employed to identify functional dependencies between artifacts or to verify the logical consistency of traceability links by analyzing how information propagates through the code implementation [24], [59]–[63].
• Eye-Tracking: Used in 3 studies, this technique involves monitoring developers' visual attention patterns while they navigate between artifacts. As a non-automated approach specifically focused on connections between source code and other software artifacts, it is primarily used to explore cognitive processes or manually validate the quality of established links [1], [64], [65].
• Manual Establishment (Manual): Reported in 3 studies, manual traceability involves human experts establishing or verifying links. As a non-automated approach that is generally not categorized as an experimental method, it is also used to evaluate the performance of other automated techniques [2], [66], [67].
• Deep Learning (DL): Used in 3 studies, DL leverages neural architectures such as CNNs and RNNs to automatically extract hierarchical features from artifacts, capturing deep semantic relationships that traditional IR might miss [22], [68], [69].
• Pre-trained Language Models (PLM): Featured in 3 recent studies, PLMs such as CodeBERT or GraphCodeBERT utilize large-scale pre-training on code and natural language corpora to provide context-aware embeddings, serving as a powerful backbone for modern traceability tasks [22], [69], [70].
• Large Language Model (LLM): As a new and promising direction (1 study), LLMs such as GPT [71] rely on advanced semantic understanding to connect artifacts across different languages and abstraction levels, such as between requirement and source code [19].
• Contrastive Learning (CL): Employed in 1 study, CL is an advanced subset of representation learning that trains models to pull related artifact pairs closer in the embedding space while pushing unrelated ones apart, significantly improving the accuracy of cross-modal retrieval [70].
Among these categories of techniques, traditional methods, such as IR, Matching, DFA, and PA, offer several advantages. They are well established, with a wealth of mature research that enables practitioners to select suitable methods based on specific needs. They also typically require fewer computational resources and less hardware support. However, these traditional approaches have weak semantic understanding and are only suitable for well-structured software with standard designs. In addition, Manual Establishment and Eye-Tracking rely on human effort to establish links and are generally not used in routine experiments.
To overcome these shortcomings, more recent and advanced techniques, including LLM, PLM, DL, ML, and CL, demonstrate significantly stronger semantic comprehension for large-scale and complex software. However, they require substantial resources for training models and constructing datasets. Furthermore, their application in this domain is still in its early stages, and achieving optimal performance often necessitates extensive customization, experimentation, and research.
3) Evaluation: We analyzed three aspects of evaluation designs used in the literature, reported in the last three columns of Table V.
Code Accessibility. Only 37% of selected studies have released their technical source code.
These open-source implementations are mainly concentrated in studies that focus on traceability between source code and documentation. This trend may be attributed to the adoption of well-established IR and NLP techniques. These implementations provide a valuable baseline for future research, allowing for reproducibility, comparison, and further extension. In contrast, the linking techniques used in the remaining studies are either non-reproducible or difficult to reproduce. This means that a majority of contributions in this field cannot be easily validated by the research community, which poses a major obstacle to fair and consistent comparisons.
Dataset Accessibility. Compared with code accessibility, dataset availability is relatively higher: 89% of the selected papers made their datasets available. Some papers used a combination of public benchmark datasets, modern open-source projects, and industrial/private datasets; we regard such cases as having disclosed the datasets used in the paper.
Metrics. A variety of evaluation metrics were used to assess the performance of different linking tools. The task of software traceability link recovery is fundamentally framed either as a binary classification problem (determining whether a potential link is a “true” or “false” link) or as an IR problem (retrieving a set of relevant target artifacts for a given source artifact). To evaluate solutions to such problems, we found three commonly used metrics: Recall, Precision, and F-Measure, which are used in 67%, 68%, and 43% of research articles, respectively. These commonly adopted metrics can guide the evaluation experiments conducted in future research, yet other metrics can also be chosen based on the specific artifacts involved (e.g., using Coverage to assess established links between test cases and source code).
4) Guidelines: To move beyond a simple list of tools, we synthesize the identified techniques into a practical framework for researchers.
This subsection provides representation-technique-cost decision support to help researchers choose tools that fit their available resources. Furthermore, we propose common datasets and minimum metrics criteria to reduce experimental bias and ensure the reliability of future evaluations. Through this, we offer a holistic framework that bridges the gap between theoretical techniques and experimental rigor.
Technical Decision Map for Traceability Recovery. We argue that establishing traceability links essentially entails computing semantic or structural similarity between artifact representations via various automated techniques. However, the selection of specific recovery approaches is not arbitrary; instead, it is contingent upon intrinsic properties of the artifacts and available resource constraints. We further acknowledge that different techniques entail varying levels of human effort and computational overhead. To offer actionable guidance to researchers on selecting suitable recovery approaches, we propose a technical decision map shown in Fig. 4, where we further analyzed all papers and mapped five categories of artifact representations to the corresponding recovery techniques and their associated cost profiles.
Our analysis categorizes these techniques based on their cost profiles, ranging from LLLC (Low Labor, Low Computation) to HLHC (High Labor, High Computation). The detailed definition criteria are presented in Table VI. Labor cost was determined based on whether the research process entailed manual tasks such as data labeling, dataset construction, or ground truth creation, while computation cost was determined based on whether a study’s experimental settings involve high-performance servers, specialized hardware resources, or high computational intensity.
The technical decision map in Fig. 4 illustrates a sophisticated trade-off between the advancement of linking techniques and the associated recovery costs.
Our analysis reveals that links between natural language and source code exhibit the highest diversity in techniques, ranging from low-cost IR methods to computationally intensive ML/DL models. Notably, HLHC techniques are predominantly concentrated in scenarios requiring the bridging of significant semantic gaps, such as mapping mathematical abstractions or design models to dynamic execution traces. This pattern suggests that while automation is advancing, researchers must still navigate the tension between precision and resource investment when dealing with high-level or volatile artifacts.

TABLE VI: DEFINITION CRITERIA FOR THE COST OF LINKING TECHNIQUES
Dimension | Low Definition Criteria | High Definition Criteria
Labor | Low Labor (LL): No manual standard data is required; datasets and ground truth are not manually created, directly using existing public benchmarks. | High Labor (HL): Labor-intensive tasks are required (e.g., manual labeling, manual dataset/ground truth construction, and manual validation).
Computation | Low Computation (LC): Only basic computational resources are required (e.g., standard PCs); the algorithm has low complexity and does not require specialized hardware acceleration. | High Computation (HC): The implementation requires sustained computational power from high-performance servers, specialized hardware such as GPUs, or experimental setups with high computational intensity.

Fig. 4. Technical Decision Map for Artifacts Traceability Linking. The techniques for each pair of artifact representations are categorized into four groups based on their costs: LLLC (low labor and low computation), LLHC (low labor and high computation), HLLC (high labor and low computation), and HLHC (high labor and high computation).

Common Datasets For Evaluation.
Datasets are a key factor in experimental evaluation, as their quality and characteristics directly influence the performance and generalizability of traceability recovery techniques. Our analysis categorizes the datasets used in the literature into three types: (a) Benchmark Datasets (used in 25% of papers): widely cited and highly comparable, yet often outdated, small-scale, and poorly maintained; (b) Selected Open-Source Projects (used in 74% of papers): large-scale projects sourced from platforms such as GitHub and Jira, which reflect contemporary development practices but may lack unified ground truth; (c) Industrial/Private Datasets (used in 24% of papers): these offer high realism but are generally inaccessible and irreproducible for the broader research community.
To mitigate performance biases stemming from heterogeneous dataset selection, we propose sets of “common datasets” for artifact pairs studied in more than two papers. We define common datasets as those used in over 50% of papers or over five papers in a pair category. As illustrated in Fig. 5, specific datasets, such as iTrust for “source code - requirement” links and JDK v1.5 for “source code - documentation” links, serve as established benchmarks. We strongly recommend that researchers prioritize these common datasets to ensure results are comparable across different studies. Notably, our analysis also reveals a lack of common datasets for emerging or complex artifact pairs, such as “architecture - source code” and “use case - source code”. This absence of standardized benchmarks hinders the direct comparison of different recovery techniques and suggests a critical need for the community to develop and share high-quality, open-source ground truth data for these specific scenarios.
Minimum Metrics Criteria For Evaluation. We observed that the evaluation metrics are not entirely consistent across different pairs of artifacts.
To improve the reliability and comparability of the experimental results, we also extracted minimum metrics criteria for artifact pairs studied in more than two papers. Similar to our dataset selection logic, a metric is included in the minimum metrics criteria if its usage exceeds 50% in its category, while metrics with usage between 20% and 50% are labeled as additional metrics. As illustrated in Fig. 5, the selection of metrics is highly sensitive to the nature of the traceability task. For requirement-related links, Recall, Precision, and F1-score constitute the minimum reporting requirement, ensuring a balanced evaluation of retrieval completeness and accuracy. In contrast, for tasks involving bug reports or architecture abstractions, ranking-oriented metrics such as MAP and MRR are prioritized to reflect the effectiveness of automated recommendation lists. The absence of recommended metrics for “use case - source code” highlights a lack of community consensus. Adhering to these criteria ensures that the most critical information of interest for a given artifact pair is consistently evaluated.

Answer to RQ2: The current landscape of traceability tools is dominated by IR (57%), though a shift toward resource-intensive semantic models (especially ML/DL and PLM/LLM) is emerging to bridge complex artifact gaps. While dataset accessibility is high (89%), the community faces significant challenges in reproducibility due to low code transparency (37%) and a lack of standardized benchmarks for high-level abstractions.

Fig. 5. Metrics and Datasets Recommendation for Software Artifact Pairs. This figure summarizes common datasets and minimum/additional metrics criteria for artifact pairs studied in the literature.

C. (RQ3) Usage Scenarios
To understand the application of recovered links, we investigated their domains, involved software phases, and objectives.
1) Domain: The usage scenarios of different artifact associations are summarized in Table VII. We found that 95% of linking approaches were applied in an academic setting and 5% were applied in specific industrial projects. Among them, five papers addressed both academic and industrial concerns. The 72 academic-oriented studies covered a wide range of software artifacts, whereas the 4 industry-oriented studies focused on a few pairs of artifacts that receive little attention in academic research. For example, Bonner et al. [4] recovered the links between design and requirement in the domain of automotive electronics and electrical systems development.
2) Software Lifecycle Phase: The fourth column of Table VII shows that the traceability links studied by existing research participate in nearly all phases of the software lifecycle. Half of the existing studies aimed to restore links arising in the “implementation” phase, followed by the “maintenance” phase, which was targeted by 26% of studies. In long-lived software systems that undergo multiple development and maintenance cycles, the “implementation” phase determines the creation and quality of traceability links, while the “maintenance” phase determines their longevity and value.
3) Objective: We categorized the objectives of the recovered links between artifacts in all collected papers.
• Requirements-Implementation Consistency: In this category, artifact links are mainly recovered to maintain consistency between requirement and code. For example, Mona et al. [72] addressed the problem that traceability links rapidly degrade due to code refactoring during software evolution, leading to a loss of consistency between requirement documents and implementation code.
• Quality Assurance and Maintenance: The primary focus within this category is to support defect localization. For example, Zhang et al.
[70] proposed an efficient pretraining framework named “EALink”, which addresses the problem that missing associations between issue and commit in software maintenance adversely affect defect localization and prediction accuracy.
• Program Comprehension: This category aims to facilitate program comprehension by building links between documentation and implementation. For example, Nepomuceno et al. [66] validated the effectiveness of the SMarty traceability mechanism in helping developers understand the impact of configurations across versions and diagrams.
• Model-driven System Understanding: The most extensively studied objective in this category is to maintain consistency between goals and system specifications. For example, Ghabi et al. [73] proposed a verification framework that supports uncertainty representation, enabling the automatic derivation of logical links between models and implementations, thereby enhancing developers’ understanding of system consistency across abstraction levels.
• Regulatory Compliance Support: This category of studies aims to ensure that software applications comply with regulatory requirements; e.g., to rebuild missing traceability links between requirement and regulation in the healthcare, finance, or insurance domains [32], [74]. The studies targeting requirement artifacts differ in the granularity of the requirements being analyzed.
The diversity of research scenarios reflects the multidimensional value of software traceability throughout the software lifecycle. Many core application scenarios (e.g., change impact analysis, bug localization, code navigation, and program comprehension) directly support developers’ daily tasks, indicating a major trend in current research: leveraging traceability to enhance development efficiency.
4) Guidelines: While our statistical analysis indicates that the technical foundations for link recovery are becoming increasingly sophisticated (with high academic output focusing on metrics such as precision and recall), the stark contrast between high academic and low industrial adoption suggests a critical alignment gap. Many traceability links are currently established in isolation, optimizing for statistical performance without a clear definition of who will use them or how they generate value. Such blindly established links incur continuous maintenance costs while delivering negligible utility, contributing to the traceability decay often observed in practice.
To address this, we propose a paradigm shift from “artifact-centric” to “role-centric” traceability, as illustrated in Fig. 6. Specifically, we synthesized research objectives from the literature and mapped them to lifecycle stages and core issues relevant to four key roles: requirements analysts, developers, test engineers, and compliance auditors. This framework identifies the critical artifact associations that each role must prioritize to address their specific engineering challenges.
As visualized in the figure, traceability should not be viewed as a static graph of artifact pairs, but as a dynamic service layer supporting specific stakeholder needs. For example, for compliance auditors, links must prioritize auditability and provenance (e.g., “regulation - requirement”) to minimize legal risk; for developers, links should focus on impact analysis and comprehension (e.g., “source code - design”) to facilitate code evolution; and for test engineers, links must ensure coverage (e.g., “requirement - test case”) to optimize regression testing.
Future research and practice should adhere to the principle of defining the goal before the link. Practitioners are advised to periodically review existing trace links to prune low-value associations, focusing resources strictly on the high-value paths identified in our goal-driven traceability framework.
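As an illustrative sketch only (it is not part of the reviewed studies or of Fig. 6), the role-centric view can be expressed as a lookup from roles to the artifact pairs they prioritize. The three roles, concerns, and example pairs below are taken from the text above; the `TraceLink` type, the helper names, the identifiers in the usage example, and the rule that a link serving none of the listed roles is a pruning candidate are assumptions made for illustration.

```python
from dataclasses import dataclass

# Role -> (primary concern, prioritized artifact pair), copied from the three
# examples given in the text. The requirements-analyst role is named in the
# text without an example pair, so it is omitted here rather than guessed.
ROLE_PRIORITIES = {
    "compliance auditor": ("auditability and provenance", ("regulation", "requirement")),
    "developer": ("impact analysis and comprehension", ("source code", "design")),
    "test engineer": ("coverage", ("requirement", "test case")),
}


@dataclass(frozen=True)
class TraceLink:
    """Hypothetical record for one recovered link between two artifacts."""
    source_type: str
    target_type: str
    source_id: str
    target_id: str


def roles_served(link):
    """Return the roles whose prioritized pair matches the link, ignoring direction."""
    pair = frozenset((link.source_type, link.target_type))
    return [role for role, (_, p) in ROLE_PRIORITIES.items() if frozenset(p) == pair]


def prune(links):
    """Split links into (kept, candidates_for_pruning).

    Assumption: a link that serves none of the listed roles is treated as
    low-value. A real review would apply project-specific judgment.
    """
    kept = [link for link in links if roles_served(link)]
    candidates = [link for link in links if not roles_served(link)]
    return kept, candidates


if __name__ == "__main__":
    links = [
        TraceLink("requirement", "regulation", "REQ-1", "REG-1"),
        TraceLink("comment", "test code", "C-1", "T-1"),
    ]
    kept, candidates = prune(links)
    for link in kept:
        print(link.source_id, "->", link.target_id, "serves", roles_served(link))
    print("pruning candidates:", [(l.source_id, l.target_id) for l in candidates])
```

Matching on an unordered pair keeps the lookup independent of the direction in which a link was recovered; extending the table with further roles or pairs only requires adding entries to `ROLE_PRIORITIES`.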
Answer to RQ3: Software traceability recovery studies were conducted mainly in an academic setting and covered the entire software lifecycle. The usage scenarios for recovered links are overwhelmingly concentrated on requirement-implementation consistency and maintenance support, which account for the majority of the studied literature.

V. DISCUSSION
Based on the analysis conducted in the previous sections, we distill several critical insights to provide a broader perspective on the current state of software traceability recovery. We then present the limitations of our study.
A. Takeaway
Complementing the specific guidelines derived from RQ1 to RQ3, we highlight several takeaways for this community.
From Static Recovery to Dynamic Evolution. Early research treated traceability as a static and binary mapping task. However, our Artifacts Traceability Graph reveals a multidimensional ecosystem, demonstrating that isolated point-to-point recovery is structurally insufficient for evolving software. To prevent severe traceability decay, future research must transition from one-time mapping to continuous traceability maintenance, integrating automated recovery directly into CI/CD pipelines to ensure global consistency without incurring traceability debt.
Breaking the Semantic Bottleneck. Our analysis exposes a severe asymmetry in maturity: traceability recovery excels in textual artifacts but largely neglects structural models and abstractions. This disparity creates a semantic bottleneck that leaves developers blind to whether specific code changes violate high-level architectural constraints. To prevent silent architectural drift, future research must move beyond traditional textual similarity. We urge the community to leverage cross-modal alignment and graph representation learning to project heterogeneous artifacts into a unified topological space.
Balancing Semantic Power, Reliability, and Efficiency. Our RQ2 analysis reveals that traceability techniques are rapidly evolving from cost-efficient lexical matching to advanced semantic models (e.g., PLMs/LLMs). However, this shift introduces critical challenges: massive computational overhead and the severe risk of hallucinations (i.e., generating plausible but incorrect links). To achieve trustworthy and sustainable traceability, future research can develop hybrid approaches that constrain large models’ outputs with deterministic rules, or compress the deep contextual reasoning of large models into lightweight and locally deployable tools for daily engineering workflows (e.g., through knowledge distillation).
B. Threats to Validity
We discuss the threats to the validity of this review across three dimensions.
Construct Validity. The diverse and evolving terminology in software traceability introduces a risk of missing relevant literature during keyword-based searches. To mitigate this threat, we systematically queried five major databases and supplemented our retrieval with a bidirectional snowballing strategy to capture elusive studies.
Internal Validity. The manual literature filtering and artifact classification inevitably introduce subjective selection bias. We minimized this by establishing rigorous inclusion criteria and employing independent cross-validation by two authors, with a third resolving conflicts. Furthermore, our linking tool analysis relies entirely on self-reported data from the primary studies rather than independent tool execution.
External Validity. The generalizability of our proposed frameworks may be limited. Although validated by an expert survey, the results reflect participants’ perceived usefulness rather than actual performance in complex industrial settings. Future large-scale controlled experiments are required to validate their practical efficacy.

VI.
CONCLUSIONS
In this paper, we conducted a systematic literature review to provide a comprehensive overview of software traceability recovery. We collected 76 research papers that recovered traceability links between software artifacts. We summarized 22 types of software artifacts and 23 types of associations analyzed in the current research. We confirmed that diverse artifacts, predominantly source code, test code, and requirement, are focal points for traceability. Additionally, our analysis highlighted a significant shift towards employing advanced learning models for their stronger semantic understanding, complementing traditional techniques in establishing these crucial links. However, the current evaluation revealed critical limitations: the low availability of source code (only 37% of studies) and the absence of standardized evaluation metrics. We also found that the usage scenarios for recovered links are overwhelmingly concentrated on requirement-implementation consistency and maintenance support (72% of studies). Future research should prioritize developing robust tools with openly shared source code, establishing unified metrics for performance evaluation, and exploring the full potential of advanced AI techniques to address the remaining challenges in global and goal-driven traceability recovery scenarios.

Fig. 6. Goal-Driven Traceability Framework. This framework aligns specific traceability links with the lifecycle stages and core issues relevant to four key roles: auditors, analysts, testers, and developers.

TABLE VII: SUMMARY OF DOMAINS, SOFTWARE LIFECYCLE PHASES, AND OBJECTIVES FOR ASSOCIATIONS BETWEEN ARTIFACTS
Software Artifacts A - Software Artifacts B | Domain: Academic | Domain: Industrial
(Software Lifecycle Phase and Objective are shared by each group of rows.)

Software Lifecycle Phase: Requirement, Design, Implementation
Objective: Requirements-Implementation Consistency (34), mainly includes: Enhancing Accuracy and Quality of Automated Recovery (10); Bridging Semantic Gaps and Vocabulary Mismatches (9); Mitigating Traceability Degradation during Software Evolution (6)
source code - requirement (25) | 25/25 | 0/25
use case - source code (3) | 3/3 | 0/3
requirement - design (3) | 2/3 | 1/3
source code - design (2) | 2/2 | 0/2
use case - class (1) | 1/1 | 0/1

Software Lifecycle Phase: Requirement, Implementation, Testing, Maintenance
Objective: Quality Assurance and Maintenance (21), mainly includes: Enhancing Recovery Accuracy and Semantic Alignment (7); Addressing Data Sparsity and Industrial Practicality (5)
issue - commit (6) | 6/6 | 0/6
bug report - source code (4) | 4/4 | 0/4
unit test - tested code (4) | 4/4 | 0/4
test code - tested code (2) | 2/2 | 0/2
test case - tested code (2) | 2/2 | 0/2
bug - source code (1) | 1/1 | 0/1
requirement - test case (1) | 1/1 | 0/1
source code - test code (1) | 1/1 | 0/1

Software Lifecycle Phase: Design, Implementation, Testing, Maintenance
Objective: Program Comprehension (12), mainly includes: Establishing Traceability for Informal and Legacy Environments (5)
source code - documentation (9) | 8/9 | 1/9
class - component (1) | 1/1 | 0/1
function - source code (1) | 1/1 | 0/1
comment - test code (1) | 1/1 | 0/1

Software Lifecycle Phase: Requirement, Design, Implementation
Objective: Model-driven System Understanding (7), mainly includes: Reconstructing Architectural Knowledge from Limited Information (3)
architecture - source code (4) | 4/4 | 0/4
model - source code (2) | 1/2 | 1/2
requirement - model (1) | 0/1 | 1/1

Software Lifecycle Phase: Requirement
Objective: Regulatory Compliance Support (2), mainly includes: Ensuring software application compliance to regulation (2)
regulation - requirement (1) | 1/1 | 0/1
use case - regulation (1) | 1/1 | 0/1

Numbers in parentheses indicate the total number of selected studies in each category. Values in the “Academic” and “Industrial” columns denote the number of papers in the respective domain out of the total papers.

REFERENCES
[1] N. Ali, Z. Sharafi, Y.-G. Guéhéneuc, and G.
Antoniol, “An empirical study on requirements traceability using eye-tracking,” in Proceedings of the 2012 28th IEEE International Conference on Software Maintenance. IEEE, 2012, pp. 191–200.
[2] A. Kicsi, L. Vidács, and T. Gyimóthy, “Testroutes: A manually curated method level dataset for test-to-code traceability,” in Proceedings of the 17th International Conference on Mining Software Repositories, 2020, pp. 593–597.
[3] W. Zogaan, I. Mujhid, J. C. S. Santos, D. Gonzalez, and M. Mirakhorli, “Automated training-set creation for software architecture traceability problem,” Empirical Software Engineering, vol. 22, no. 3, pp. 1028–1062, 2017.
[4] M. Bonner, M. Zeller, G. Schulz, D. Beyer, and M. Olteanu, “Automated traceability between requirements and model-based design,” in REFSQ Workshops, 2023.
[5] R. White and J. Krinke, “Tctracer: Establishing test-to-code traceability links using dynamic and static techniques,” Empirical Software Engineering, vol. 27, no. 3, p. 67, 2022.
[6] A. Qusef, G. Bavota, R. Oliveto, A. D. Lucia, and D. Binkley, “Evaluating test-to-code traceability recovery methods through controlled experiments,” Journal of Software: Evolution and Process, vol. 25, no. 11, pp. 1167–1191, 2013.
[7] T. W. W. Aung, H. Huo, and Y. Sui, “A literature review of automatic traceability links recovery for software change impact analysis,” in Proceedings of the 28th International Conference on Program Comprehension. ACM, 2020, pp. 14–24.
[8] G. Nguyen-Truong, H. J. Kang, D. Lo, A. Sharma, A. E. Santosa, A. Sharma, and M. Y. Ang, “Hermes: Using commit-issue linking to detect vulnerability-fixing commits,” in Proceedings of the 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering. Honolulu, HI, USA: IEEE, 2022, pp. 51–62.
[9] L. Naslavsky and D. J.
Richardson, “Using traceability to support model-based regression testing,” in Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering. New York, NY, USA: ACM, 2007, pp. 567–570.
[10] M. C. Panis, “Successful deployment of requirements traceability in a commercial engineering organization... really,” in Proceedings of the 2010 18th IEEE International Requirements Engineering Conference. Sydney, NSW, Australia: IEEE, 2010, pp. 303–307.
[11] A. D. Rodriguez, J. Cleland-Huang, and D. Falessi, “Leveraging intermediate artifacts to improve automated trace link retrieval,” in Proceedings of the 2021 IEEE International Conference on Software Maintenance and Evolution. Luxembourg: IEEE, 2021, pp. 81–92.
[12] B. Wang, H. Wang, R. Luo, S. Zhang, and Q. Zhu, “A systematic mapping study of information retrieval approaches applied to requirements trace recovery,” in Proceedings of the Software Engineering and Knowledge Engineering, 2022, pp. 1–6.
[13] B. Wang, X. Li, H. Wan, and Y. Deng, “A systematic mapping study of machine learning techniques applied to software traceability,” in Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics. Honolulu, Oahu, HI, USA: IEEE, 2023, pp. 623–628.
[14] M. Hammad, M. L. Collard, and J. I. Maletic, “Automatically identifying changes that impact code-to-design traceability,” in Proceedings of the 2009 IEEE 17th International Conference on Program Comprehension. IEEE, 2009, pp. 20–29.
[15] S. Shimmi and M. Rahimi, “Patterns of code-to-test co-evolution for automated test suite maintenance,” in Proceedings of the 2022 IEEE Conference on Software Testing, Verification and Validation. IEEE Computer Society, 2022, pp. 116–127.
[16] I. Omoronyia, G. Sindre, M. Roper, J. Ferguson, and M.
Wood, “Use case to source code traceability: The developer navigation view point,” in Proceedings of the 2009 17th IEEE International Requirements Engineering Conference. IEEE, 2009, pp. 237–242.
[17] N. Ali, Z. Sharafi, Y.-G. Guéhéneuc, and G. Antoniol, “An empirical study on the importance of source code entities for requirements traceability,” Empirical Software Engineering, vol. 20, no. 2, pp. 442–478, 2015.
[18] H. Kuang, P. Mäder, H. Hu, A. Ghabi, L. Huang, J. Lü, and A. Egyed, “Can method data dependencies support the assessment of traceability between requirements and source code?” Journal of Software: Evolution and Process, vol. 27, no. 11, pp. 838–866, 2015.
[19] M. North, A. Atapour-Abarghouei, and N. Bencomo, “Code gradients: Towards automated traceability of llm-generated code,” in Proceedings of the 2024 IEEE 32nd International Requirements Engineering Conference. IEEE, 2024, pp. 321–329.
[20] A. Ghabi and A. Egyed, “Code patterns for automatically validating requirements-to-code traces,” in Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, 2012, pp. 200–209.
[21] H. Kuang, P. Mäder, H. Hu, A. Ghabi, L. Huang, L. Jian, and A. Egyed, “Do data dependencies in source code complement call dependencies for understanding requirements traceability?” in Proceedings of the 2012 28th IEEE International Conference on Software Maintenance. IEEE, 2012, pp. 181–190.
[22] W. Sun, Z. Guo, M. Yan, Z. Liu, Y. Lei, and H. Zhang, “Method-level test-to-code traceability link construction by semantic correlation learning,” IEEE Transactions on Software Engineering, 2024.
[23] A. Qusef, G. Bavota, R. Oliveto, A. De Lucia, and D. Binkley, “Scotch: Test-to-code traceability using slicing and conceptual coupling,” in Proceedings of the 2011 27th IEEE International Conference on Software Maintenance. IEEE, 2011, pp. 63–72.
[24] A. Qusef, R. Oliveto, and A.
De Lucia, “Recovering traceability links between unit tests and classes under test: An improved method,” in Proceedings of the 2010 IEEE International Conference on Software Maintenance. IEEE, 2010, pp. 1–10.
[25] “website: https://sites.google.com/view/sok-software-traceability,” Jul. 10, 2025.
[26] S. Charalampidou, A. Ampatzoglou, E. Karountzos, and P. Avgeriou, “Empirical studies on software traceability: A mapping study,” Journal of Software: Evolution and Process, vol. 33, no. 2, p. e2294, 2021.
[27] Z. Wan, Y. Zhang, X. Xia, Y. Jiang, and D. Lo, “Software architecture in practice: Challenges and opportunities,” in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. New York, NY, USA: ACM, 2023, pp. 1457–1469.
[28] J. P. Castellanos Ardila, B. Gallina, and F. Ul Muram, “Compliance checking of software processes: A systematic literature review,” Journal of Software: Evolution and Process, vol. 34, no. 5, p. e2440, 2022.
[29] J. Alves-Foss, D. Conte de Leon, and P. Oman, “Experiments in the use of xml to enhance traceability between object-oriented design specifications and source code,” in Proceedings of the 35th Annual Hawaii International Conference on System Sciences. Big Island, HI, USA: IEEE, 2002, pp. 3959–3966.
[30] M. Hammad, M. L. Collard, and J. I. Maletic, “Automatically identifying changes that impact code-to-design traceability during evolution,” Software Quality Journal, vol. 19, pp. 35–64, 2011.
[31] A. Abadi, M. Nisenson, and Y. Simionovici, “A traceability technique for specifications,” in Proceedings of the 2008 16th IEEE International Conference on Program Comprehension. Amsterdam, Netherlands: IEEE, 2008, pp. 103–112.
[32] J. Cleland-Huang, A. Czauderna, M. Gibiec, and J.
Emenecker , “ A machine learning approach for tracing regulatory codes to product spe- ciﬁc requirements, ” in Proceedings of the 32nd ACM/IEEE International Confer ence on Softwar e Engineering-V olume 1 , 2010, pp. 155–164. [33] M. Rahimi, W . Goss, and J. Cleland-Huang, “Ev olving requirements-to- code trace links across versions of a software system, ” in Proceedings of the 2016 IEEE International conference on softwar e maintenance and evolution , IEEE. IEEE, 2016, pp. 99–109. [34] A. Marcus and J. I. Maletic, “Recovering documentation-to-source-code traceability links using latent semantic inde xing, ” in Pr oceedings of the 25th International Confer ence on Softwar e Engineering, 2003. , IEEE. IEEE, 2003, pp. 125–135. JOURNAL OF L A T E X CLASS FILES, VOL. 14, NO. 8, A UGUST 2021 18 [35] J. L. Cybulski, R. D. Neal, A. Kram, and J. C. Allen, “Reuse of early life- cycle artifacts: W orkproducts, methods and tools, ” Annals of Software Engineering , vol. 5, no. 1, pp. 227–251, 1998. [36] R. M. P arizi, S. P . Lee, and M. Dabbagh, “ Achie vements and challenges in state-of-the-art software traceability between test and code artifacts, ” IEEE T ransactions on Reliability , v ol. 63, no. 4, pp. 913–926, 2014. [37] B. W ang, S. Hu, L. Y e, H. W an, Z. Zou, X. Li, and J. Zhu, “ Advance- ments in bug traceability: A systematic mapping study , ” in Pr oceedings of the 2024 IEEE International Confer ence on Systems, Man, and Cybernetics . Kuching, Malaysia: IEEE, 2024, pp. 4757–4762. [38] F . Khalil, G. Rebdawi, and N. Ghneim, “ A systematic mapping revie w: T racking the relationships between software artifacts using nlp, ” ECTI T ransactions on Computer and Information T echnology (ECTI-CIT) , vol. 19, no. 2, pp. 321–333, 2025. [39] D. H. Abd Rahman, R. A. Sulaiman, and M. Jelani, “Exploring traceabil- ity techniques on software engineering: A revie w and future directions, ” Journal of Applied Science, T echnology and Computing , vol. 2, no. 1, pp. 32–42, 2025. 
[40] A. M. Rosado da Cruz and E. F. Cruz, “Machine learning techniques for requirements engineering: A comprehensive literature review,” Software, vol. 4, no. 3, p. 14, 2025.
[41] T. Koboyatshwene and Y. Ayalew, “Requirements traceability: A systematic literature review,” in Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, 2025, pp. 1509–1513.
[42] B. Kitchenham, “Procedures for performing systematic reviews,” Keele, UK, Keele University, vol. 33, no. 2004, pp. 1–26, 2004.
[43] ISO/IEC 19506:2012, “Information technology – object management group architecture-driven modernization (adm) – knowledge discovery meta-model (kdm). standard. international organization for standardization, geneva, ch. https://www.iso.org/obp/ui/#iso:std:iso-iec:19506:ed-1:v1:en,” 2012.
[44] “Model services contract. https://www.gov.uk/government/publications/model-services-contract.” 2019.
[45] I. Sommerville, “Software engineering (9th ed.). addison-wesley publishing company, usa.” 2010.
[46] G. Albaum, “The likert scale revisited,” Market Research Society. Journal, vol. 39, no. 2, pp. 1–21, 1997.
[47] H. Gao, H. Kuang, W. K. Assunção, C. Mayr-Dorn, G. Rong, H. Zhang, X. Ma, and A. Egyed, “Triad: Automated traceability recovery based on biterm-enhanced deduction of transitive links among artifacts,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024, pp. 1–13.
[48] R. Lapeña, F. Pérez, C. Cetina, and Ó. Pastor, “Leveraging bpmn particularities to improve traceability links recovery among requirements and bpmn models,” Requirements Engineering, vol. 27, no. 1, pp. 135–160, 2022.
[49] T. Yoshikawa, S. Hayashi, and M. Saeki, “Recovering traceability links between a simple natural language sentence and source code using domain ontologies,” in Proceedings of the 2009 IEEE International Conference on Software Maintenance. IEEE, 2009, pp. 551–554.
[50] P. J. A. Vianna Ferreira and M. d. O. Barros, “Traceability between function point and source code,” in Proceedings of the 6th International Workshop on Traceability in Emerging Forms of Software Engineering, 2011, pp. 10–16.
[51] J. H. Hayes, G. Antoniol, B. Adams, and Y.-G. Guéhéneuc, “Inherent characteristics of traceability artifacts less is more,” in Proceedings of the 2015 IEEE 23rd International Requirements Engineering Conference. IEEE, 2015, pp. 196–201.
[52] M. Grechanik, K. S. McKinley, and D. E. Perry, “Recovering and using use-case-diagram-to-source-code traceability links,” in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2007, pp. 95–104.
[53] M. Rath, D. Lo, and P. Mäder, “Analyzing requirements and traceability information to improve bug localization,” in Proceedings of the 15th International Conference on Mining Software Repositories, 2018, pp. 442–453.
[54] V. Zapalowski, I. Nunes, and D. J. Nunes, “Revealing the relationship between architectural elements and source code characteristics,” in Proceedings of the 22nd International Conference on Program Comprehension, 2014, pp. 14–25.
[55] T.-D. B. Le, M. Linares-Vasquez, D. Lo, and D. Poshyvanyk, “Rclinker: Automated linking of issue reports and commits leveraging rich contextual information,” in Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension, 2015, pp. 36–47.
[56] L. Dong, H. Zhang, W. Liu, Z. Weng, and H. Kuang, “Semi-supervised pre-processing for learning-based traceability framework on real-world software projects,” in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022, pp. 570–582.
[57] X. Chen, J. Hosking, J. Grundy, and R. Amor, “Dctracvis: a system retrieving and visualizing traceability links between source code and documentation,” Automated Software Engineering, vol. 25, no. 4, pp. 703–741, 2018.
[58] X. Chen and J. Grundy, “Improving automated documentation to code traceability by combining retrieval techniques,” in Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering. IEEE, 2011, pp. 223–232.
[59] C. S. Corley, N. A. Kraft, L. H. Etzkorn, and S. K. Lukins, “Recovering traceability links between source code and fixed bugs via patch analysis,” in Proceedings of the 6th International Workshop on Traceability in Emerging Forms of Software Engineering, 2011, pp. 31–37.
[60] J. Adersberger and M. Philippsen, “Reflexml: Uml-based architecture-to-code traceability and consistency checking,” in Proceedings of the European Conference on Software Architecture. Springer, 2011, pp. 344–359.
[61] J. C. Santos, M. Mirakhorli, I. Mujhid, and W. Zogaan, “Budget: A tool for supporting software architecture traceability research,” in Proceedings of the 2016 13th Working IEEE/IFIP Conference on Software Architecture. IEEE, 2016, pp. 303–306.
[62] T. Hey, F. Chen, S. Weigelt, and W. F. Tichy, “Improving traceability link recovery using fine-grained requirements-to-code relations,” in Proceedings of the 2021 IEEE International Conference on Software Maintenance and Evolution. IEEE, 2021, pp. 12–22.
[63] P. Hübner and B. Paech, “Interaction-based creation and maintenance of continuously usable trace links between requirements and source code,” Empirical Software Engineering, vol. 25, no. 5, pp. 4350–4377, 2020.
[64] B. Walters, M. Falcone, A. Shibble, and B. Sharif, “Towards an eye-tracking enabled ide for software traceability tasks,” in 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering. IEEE, 2013, pp. 51–54.
[65] B. Sharif, J. Meinken, T. Shaffer, and H. Kagdi, “Eye movements in software traceability link recovery,” Empirical Software Engineering, vol. 22, no. 3, pp. 1063–1102, 2017.
[66] T. Nepomuceno, E. OliveiraJr, R. Geraldi, A. Malucelli, S. Reinehr, and M. A. G. Silva, “Software product line configuration and traceability: An empirical study on smarty class and component diagrams,” in Proceedings of the 2020 IEEE 44th Annual Computers, Software, and Applications Conference. IEEE, 2020, pp. 979–984.
[67] A. Egyed, F. Graf, and P. Grünbacher, “Effort and quality of recovering requirements-to-code traces: Two exploratory experiments,” in Proceedings of the 2010 18th IEEE International Requirements Engineering Conference. IEEE, 2010, pp. 221–230.
[68] J. Guo, J. Cheng, and J. Cleland-Huang, “Semantically enhanced software traceability using deep learning techniques,” in Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering. IEEE, 2017, pp. 3–14.
[69] J. Lan, L. Gong, J. Zhang, and H. Zhang, “Btlink: automatic link recovery between issues and commits based on pre-trained bert model,” Empirical Software Engineering, vol. 28, no. 4, p. 103, 2023.
[70] C. Zhang, Y. Wang, Z. Wei, Y. Xu, J. Wang, H. Li, and R. Ji, “Ealink: An efficient and accurate pre-trained framework for issue-commit link recovery,” in 2023 38th IEEE/ACM International Conference on Automated Software Engineering, 2023, pp. 217–229.
[71] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “Gpt-4 technical report,” ArXiv Preprint, 2023.
[72] M. Rahimi and J. Cleland-Huang, “Evolving software trace links between requirements and source code,” Empirical Software Engineering, vol. 23, no. 4, pp. 2198–2231, 2018.
[73] A. Ghabi and A. Egyed, “Exploiting traceability uncertainty between architectural models and code,” in Proceedings of the 2012 Joint Working IEEE/IFIP Conference on Software Architecture and European Conference on Software Architecture. IEEE, 2012, pp. 171–180.
[74] R. Jain, S. Ghaisas, and A. Sureka, “Sanayojan: a framework for traceability link recovery between use-cases in software requirement specification and regulatory documents,” in Proceedings of the 3rd International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, 2014, pp. 12–18.
