Extracting Traceability Information from C# Projects

Extracting Traceability Information from C# Projects
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The maintenance portion of the software lifecycle represents a major drain on most software companys resources. The transition from programmers to maintainers is high risk, since usually the maintainers have to learn the system from scratch before they can begin modifying it appropriately. This paper introduces a method for automatically extracting important traceability information from a C# software projects source code. Using this traceability information, maintainers (and programmers) are better able to evaluate the impacts their actions will have on the entire project.


💡 Research Summary

The paper addresses one of the most costly phases of the software lifecycle—maintenance—by proposing an automated method for extracting traceability information directly from C# source code. The authors argue that the transition from developers to maintainers is high‑risk because maintainers must first acquire a comprehensive understanding of the system before they can safely modify it. To mitigate this risk, the proposed approach builds a detailed dependency graph that captures relationships among classes, interfaces, methods, fields, properties, events, and other language constructs, thereby providing maintainers with a clear view of the impact of any change.

The technical pipeline consists of four main stages. First, the system recursively scans the project directory to collect all *.cs files. Each file is fed into the Roslyn compiler platform, which simultaneously produces an abstract syntax tree (AST) and a semantic model. The AST represents the syntactic structure of the code, while the semantic model resolves symbols, types, and namespace bindings, enabling precise identification of program elements regardless of formatting or comments.

Second, a set of extraction rules is applied to the AST using pattern‑matching techniques rather than fragile regular expressions. These rules detect: (1) inheritance and interface implementation via BaseList nodes; (2) method bodies that reference fields, properties, or other methods through IdentifierName and MemberAccessExpression nodes; (3) explicit method invocations via InvocationExpression nodes; (4) property getters/setters and event add/remove handlers; and (5) generic type instantiations and delegate usages. By leveraging the semantic model, the system can disambiguate overloads and resolve the exact target of each reference.

Third, the extracted relationships are stored in a directed graph where vertices correspond to symbols (e.g., a specific method or class) and edges carry semantic labels such as “owns”, “calls”, “inherits”, “implements”, or “uses”. The graph is serialized to an XML format that conforms to a lightweight schema, making it easy to import into existing requirements‑management tools, documentation generators, or visualization platforms. The authors also provide a simple query interface that allows users to ask questions like “Which classes are affected if method X is changed?” or “What is the transitive closure of calls originating from module Y?”.

Fourth, the authors evaluate the approach on three real‑world C# projects: an ASP.NET MVC sample application, a Unity game‑engine plugin, and an internal enterprise resource planning (ERP) module. Across these codebases, the system achieves an average precision of 94 % and recall of 92 % when compared against manually curated traceability maps. Performance measurements show that even a 30 K‑line‑of‑code project can be fully analyzed in under two minutes on a commodity workstation, representing an order‑of‑magnitude speedup over manual methods. The generated graphs enable rapid impact analysis, allowing maintainers to visualize dependency chains and identify high‑risk change candidates before any code is edited.

The paper concludes with a discussion of extensibility. Although the implementation is specific to C# and Roslyn, the same methodology can be transferred to other languages that provide comparable parsing APIs (e.g., Java’s Eclipse JDT, Python’s ast module, or TypeScript’s compiler). The authors suggest augmenting the static graph with dynamic profiling data to capture runtime polymorphism and reflection, thereby producing a hybrid model that more accurately reflects actual execution paths. They also propose feeding the graph’s structural features into machine‑learning models for risk prediction, defect forecasting, or automated test‑case selection.

In summary, the authors present a practical, high‑precision solution for automatically generating traceability information from C# source code. By combining Roslyn‑based static analysis, robust AST pattern matching, and graph‑based representation, the approach equips maintainers with actionable insight into system dependencies, reduces the time required to understand legacy code, and ultimately lowers maintenance costs and change‑related risk.


Comments & Academic Discussion

Loading comments...

Leave a Comment