Product Line Metrics for Legacy Software in Practice

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Nowadays, customer products such as vehicles contain not only mechanical parts but also highly complex software, and their manufacturers have to offer many variants of technically very similar systems that sometimes differ only slightly in their behavior. The proper reuse of the software artifacts that realize this behavior by means of a software product line is discussed in recent literature, and appropriate methods and techniques for managing them have been proposed. However, establishing a software product line that integrates existing legacy software, so that valuable resources can be reused for similar future products, is very company-specific. This paper outlines a method for objectively evaluating a legacy software system's potential to form a software product line. The method is applied to several development projects at Volkswagen AG Business Unit Braunschweig to evaluate the software product line potential for steering systems.


💡 Research Summary

The paper addresses the growing need for systematic reuse of software artifacts in automotive products, where modern vehicles contain extensive embedded software that must be offered in many variants. While the concept of a software product line (SPL) is well‑established for greenfield development, establishing an SPL from existing legacy code is highly company‑specific and fraught with technical and organizational challenges. To provide a concrete decision‑support method, the authors develop a quantitative evaluation framework and apply it to several steering‑system development projects at Volkswagen AG’s Braunschweig Business Unit.

The framework consists of four measurement dimensions: (1) code duplication (clone density), (2) functional overlap, (3) architectural consistency, and (4) maintenance cost. Code duplication is detected with clone‑finding tools and expressed as a percentage of the total code base. Functional overlap is measured by tracing requirements to implementation units and calculating the proportion of shared functionality across product variants. Architectural consistency evaluates module‑level dependencies and the degree of interface standardisation using dependency graphs and interface specification metrics. Maintenance cost aggregates mean time to repair, test‑coverage data, and documentation completeness.
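As a rough illustration of how the first two dimensions might be computed, the sketch below expresses clone density as the share of cloned lines in the code base and functional overlap as the share of traced requirements that every variant implements. The function names, the input shapes, and the intersection-over-union formula are assumptions made for illustration; the summary does not state the paper's exact formulas or tooling.

```python
def clone_density(cloned_lines: int, total_lines: int) -> float:
    """Percentage of the code base flagged as cloned (assumed definition)."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 * cloned_lines / total_lines


def functional_overlap(traces: dict[str, set[str]]) -> float:
    """Share of requirements implemented by every variant.

    `traces` maps a variant name to the set of requirement IDs traced to its
    implementation units. Assumed formula: |intersection| / |union|.
    """
    if not traces:
        raise ValueError("at least one variant is required")
    requirement_sets = list(traces.values())
    shared = set.intersection(*requirement_sets)
    all_requirements = set.union(*requirement_sets)
    return len(shared) / len(all_requirements) if all_requirements else 0.0


# Made-up example data, not taken from the paper.
variants = {
    "variant_1": {"R1", "R2", "R3"},
    "variant_2": {"R1", "R2", "R4"},
}
density = clone_density(cloned_lines=1_500, total_lines=10_000)
overlap = functional_overlap(variants)
```

In practice the cloned-line count would come from a clone-detection tool and the trace sets from a requirements-management system; both are simply passed in here as plain values.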

Each dimension is normalised to a 0‑1 scale and combined using a weighted sum; the weights were derived from expert surveys and historical project success analyses, giving higher importance to duplication and architecture. The resulting composite score ranges from 0 to 100 and is interpreted as follows: scores ≥ 70 indicate high SPL potential, scores from 40 up to but not including 70 indicate moderate potential, and scores < 40 indicate low potential.
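The aggregation step can be sketched as follows. The weight values are hypothetical: the summary only says that duplication and architecture weigh more than the other two dimensions. The sketch also assumes that each normalised value is oriented so that 1.0 is the most favourable outcome, and that the weighted sum is multiplied by 100 to reach the 0 to 100 range.

```python
# Hypothetical weights: the summary only states that duplication and
# architecture are weighted more heavily; the actual values are not given.
WEIGHTS = {
    "duplication": 0.3,
    "functional_overlap": 0.2,
    "architecture": 0.3,
    "maintenance": 0.2,
}


def composite_score(normalised: dict[str, float]) -> float:
    """Weighted sum of 0-1 dimension values, scaled to 0-100 (assumed scaling)."""
    if set(normalised) != set(WEIGHTS):
        raise ValueError("expected exactly the four dimensions in WEIGHTS")
    for name, value in normalised.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return 100.0 * sum(WEIGHTS[name] * normalised[name] for name in WEIGHTS)


def potential_band(score: float) -> str:
    """Map a composite score to the bands given in the summary."""
    if score >= 70:
        return "high"
    if score >= 40:
        return "moderate"
    return "low"


# Made-up input, assumed to be oriented so that 1.0 is most favourable.
example = {
    "duplication": 0.5,
    "functional_overlap": 0.5,
    "architecture": 0.5,
    "maintenance": 0.5,
}
score = composite_score(example)
band = potential_band(score)
```

Because the weights sum to 1, a project that scores 1.0 on every dimension reaches exactly 100, which keeps the composite within the stated range.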

Five steering‑system projects (labelled A–E) were evaluated. Projects A and B achieved scores of 78 and 82 respectively. Both showed low clone density (≈ 15 %) and minimal functional overlap (< 20 %), and their well‑defined, loosely coupled modules yielded a high architectural consistency rating. Both projects already used a partially shared core library and adhered to a common interface specification, making them prime candidates for immediate SPL formation.

Projects C and D received scores of 38 and 42. They exhibited high clone densities (≈ 45 %), dense dependency graphs, and tightly coupled modules, indicating that substantial refactoring is needed before any SPL can be built. The authors recommend a systematic extraction of common functionality into a core component, followed by the introduction of a service‑oriented layer to decouple the remaining modules. Project E scored 55; while its code duplication was modest, documentation was almost non‑existent, leading to elevated estimated maintenance costs. For this case the authors suggest automated documentation generation tools and targeted training to improve the project's rating on the maintenance‑cost dimension.

Beyond the technical assessment, the paper discusses how the quantitative scores can guide managerial decisions. High‑potential projects justify upfront investment in SPL infrastructure (e.g., shared build pipelines, common testing frameworks) because the expected return‑on‑investment is large. Moderate‑potential projects should follow a staged approach: pilot refactoring, incremental core‑library creation, and continuous re‑evaluation of the SPL score. Low‑potential projects may be candidates for retirement or for a complete redesign rather than costly refactoring.
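To make the decision guidance concrete, the sketch below applies the stated score bands to the five reported project scores and looks up a recommendation for each. The scores are those given in the summary; the action strings are paraphrases, and the `band` helper and the dictionary layout are illustrative assumptions rather than part of the paper.

```python
# Composite scores as reported in the summary for projects A-E.
REPORTED_SCORES = {"A": 78, "B": 82, "C": 38, "D": 42, "E": 55}

# Paraphrased recommendations per band; the wording is not the paper's.
ACTIONS = {
    "high": "invest upfront in SPL infrastructure",
    "moderate": "staged approach with pilot refactoring and re-evaluation",
    "low": "consider retirement or complete redesign",
}


def band(score: float) -> str:
    """Bands from the summary: >= 70 high, 40 to below 70 moderate, < 40 low."""
    if score >= 70:
        return "high"
    if score >= 40:
        return "moderate"
    return "low"


recommendations = {
    project: (band(score), ACTIONS[band(score)])
    for project, score in REPORTED_SCORES.items()
}

for project, (level, action) in recommendations.items():
    print(f"Project {project}: {level} potential -> {action}")
```

Applied strictly, these thresholds place project D (42) in the moderate band and project C (38) in the low band, even though the summary discusses the two together.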

The authors also emphasize the hybrid data‑collection approach: automated static analysis provides fast, repeatable metrics for code‑level dimensions, while expert review is essential for functional overlap and documentation quality. This combination mitigates the typical data‑quality issues found in legacy environments.

In conclusion, the study delivers a practical, metric‑driven method for assessing the feasibility of converting legacy automotive software into a product line. The successful application at Volkswagen demonstrates that the framework can produce actionable insights, prioritize refactoring work, and support strategic investment decisions. The authors argue that the approach is transferable to other automotive domains such as power‑train control or infotainment, and that future work will explore automated toolchains to further streamline the evaluation process.

