Program Understanding: A Reengineering Case for the Transformation Tool Contest

In software reengineering, one of the central artifacts is the source code of the legacy system in question. In fact, in most cases it is the only definitive artifact, because over time the code has diverged from the original architecture and design documents. The first task of any reengineering project is to gather an understanding of the system’s architecture. Therefore, a common approach is to use parsers to translate the source code into a model that conforms to the abstract syntax of the system’s implementation language and can then be queried. Besides querying, transformations can be used to generate more abstract views of the system’s architecture. This transformation case deals with the creation of a state machine model from a Java syntax graph. It is derived from a task that originates from a real reengineering project.

💡 Research Summary

The paper presents a transformation case designed for the Transformation Tool Contest (TTC 2011) that addresses a common challenge in software reengineering: extracting high‑level architectural information from legacy source code when the code itself is the only reliable artifact. The authors focus on a Java‑based legacy system that implements a state‑machine‑driven graphical user interface using strict coding conventions. Their goal is to automatically generate a concise state‑machine model from a Java abstract syntax graph (JaMoPP Ecore model) that captures the same behavior expressed implicitly in the source code.

The core transformation task consists of two main steps. First, it identifies every concrete (non‑abstract) Java class that directly or indirectly extends an abstract class named “State”. These classes represent the states of the machine and are implemented as singletons; the transformation therefore looks for the pattern ClassName.Instance() to obtain the unique instance. Second, it discovers transitions by locating method calls of the form NewState.Instance().activate(). Such calls may appear anywhere inside method bodies, possibly nested deep within switch statements, catch blocks, or other control structures. The transformation must therefore perform non‑local graph matching: it traverses the syntax graph to resolve the called method, verify that the return value is a singleton instance, and finally confirm the subsequent activate invocation. The source and target metamodels are clearly defined: the source is the JaMoPP Java metamodel, while the target is a minimal state‑machine metamodel containing StateMachine, State, and Transition elements with attributes name, trigger, and action.
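The two steps of the core task can be sketched on a toy model. The following is a minimal illustration, not the reference solution: it does not use JaMoPP or any real metamodel API. The `JavaClass` record, its `activations` field (assumed to already list every class X for which a method body contains `X.Instance().activate()`, at any nesting depth), and the example class names are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class JavaClass:
    """Hypothetical, heavily simplified stand-in for a class in the syntax graph."""
    name: str
    superclass: Optional[str] = None
    abstract: bool = False
    # method name -> names of classes X with a call X.Instance().activate()
    # somewhere in that method's body (assumed to be pre-collected).
    activations: Dict[str, List[str]] = field(default_factory=dict)


def extends_state(cls: JavaClass, classes: Dict[str, JavaClass]) -> bool:
    """True if cls directly or indirectly extends the class named 'State'."""
    seen = set()
    parent = cls.superclass
    while parent is not None and parent not in seen:
        if parent == "State":
            return True
        seen.add(parent)
        parent_cls = classes.get(parent)
        parent = parent_cls.superclass if parent_cls else None
    return False


def extract_state_machine(
    class_list: List[JavaClass],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (state names, (source, target) transitions)."""
    classes = {c.name: c for c in class_list}
    # Step 1: states are the concrete classes below 'State' in the hierarchy.
    states = [c.name for c in class_list
              if not c.abstract and extends_state(c, classes)]
    state_set = set(states)
    # Step 2: every activation of another state class is a transition.
    transitions = []
    for src in states:
        for targets in classes[src].activations.values():
            for dst in targets:
                if dst in state_set:
                    transitions.append((src, dst))
    return states, transitions


# Toy input: two concrete states, one abstract intermediate class,
# and one unrelated class that must not show up in the result.
model = [
    JavaClass("State", abstract=True),
    JavaClass("ListeningState", superclass="State", abstract=True),
    JavaClass("Closed", superclass="State", activations={"open": ["Listen"]}),
    JavaClass("Listen", superclass="ListeningState",
              activations={"run": ["Closed", "Helper"]}),
    JavaClass("Helper"),
]
states, transitions = extract_state_machine(model)
```

In a real solution the hard part is what this sketch assumes away: finding the `Instance().activate()` call chains inside arbitrarily nested statements of the syntax graph.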

Beyond the core task, two optional extensions enrich the generated model. Extension 1 populates the trigger attribute of each transition according to four deterministic coding conventions: (1) if the transition occurs in any method other than run(), the method’s name becomes the trigger; (2) if it occurs inside a non‑default case of a switch statement within run(), the case’s enum constant is the trigger; (3) if it occurs inside a catch block within run(), the caught exception class name is the trigger; (4) otherwise (i.e., an unconditional transition inside run()), the trigger is set to “–”. Extension 2 determines the action attribute by inspecting the statement block that contains the transition. If a call to send() is present, the enum constant passed to send becomes the action; otherwise the action is “–”. These extensions require the transformation to analyse surrounding syntactic constructs, not merely the method‑call pattern itself.
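The four trigger rules and the action rule can be written as two small functions. This is a sketch under stated assumptions: `TransitionContext` is a hypothetical record that presumes the syntactic context of each transition has already been extracted from the syntax graph, and the order in which the switch and catch rules are checked when both could apply is an assumption, since the summary does not specify it.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder used when no trigger or action applies, as in the text above.
NONE = "–"


@dataclass
class TransitionContext:
    """Hypothetical pre-extracted context of one activate() call."""
    method: str                             # name of the enclosing method
    switch_case: Optional[str] = None       # enum constant of an enclosing non-default case
    caught_exception: Optional[str] = None  # exception class of an enclosing catch block
    sent_constant: Optional[str] = None     # enum constant passed to send() in the same block


def trigger_for(ctx: TransitionContext) -> str:
    # Rule 1: outside run(), the method name is the trigger.
    if ctx.method != "run":
        return ctx.method
    # Rule 2: inside a non-default switch case in run().
    # (Checking this before the catch rule is an assumption.)
    if ctx.switch_case is not None:
        return ctx.switch_case
    # Rule 3: inside a catch block in run().
    if ctx.caught_exception is not None:
        return ctx.caught_exception
    # Rule 4: unconditional transition inside run().
    return NONE


def action_for(ctx: TransitionContext) -> str:
    # The action is the enum constant passed to send(), if such a call exists.
    return ctx.sent_constant if ctx.sent_constant is not None else NONE
```

For example, a transition found in a hypothetical method `close()` gets the trigger `close`, while one found in `run()` outside any switch case or catch block gets the placeholder.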

The authors provide three input models of increasing size: a small model (~6 500 elements), a medium model with deeper inheritance and statement nesting, and a large industrial model derived from a real Java project (≈1 million elements, 900 classes, 220 k lines of code). All models encode the TCP protocol state machine using the described conventions, so the expected output is identical across inputs (a state machine with 11 states and 21 transitions, plus trigger and action information when extensions are applied).

Evaluation criteria are split into three weighted categories: understandability and conciseness (30 %), correctness and completeness (35 %), and performance (5 %). Understandability measures how clearly each coding convention maps to a transformation rule; conciseness assesses the brevity of the solution. Correctness checks that every state and transition in the generated model corresponds exactly to the behavior encoded in the source code, while completeness ensures no transition is omitted. Performance is measured by the ability to process the large model within acceptable time limits on typical hardware.

The case study demonstrates that, by exploiting well‑defined coding conventions, a model transformation can replace manual code inspection for program understanding. The reference solution, implemented in the GReTL transformation language, produces a compact visualizable state‑machine model (fewer than 100 nodes and edges) that faithfully represents the original system’s possible UI flows. This not only accelerates the initial analysis phase of reengineering projects but also reduces the risk of human error inherent in manual modeling. The paper thus argues for the practical value of transformation‑based approaches in large‑scale legacy system analysis and provides a concrete benchmark for future transformation tools.
