Using Software Categories for the Development of Generative Software
In model-driven development (MDD), software emerges by systematically transforming abstract models into concrete source code. Ideally, performing those transformations is to a large extent the task of code generators. One approach for developing a new code generator is to write a reference implementation and separate it into handwritten and generatable code. Typically, the generator developer performs this separation manually, a process that is often time-consuming, labor-intensive, and difficult to maintain, and that may produce more generated code than necessary. Software categories provide a way of separating code into designated parts with defined dependencies, for example, “Business Logic” code that may not directly use “Technical” code. This paper presents an approach that uses the concept of software categories to semi-automatically determine candidates for generated code. The main idea is to iteratively derive the categories of uncategorized code from the dependencies of categorized code. The final candidates for generated or handwritten code are the code parts belonging to specific, previously defined categories. This approach helps the generator developer find candidates for generated code more easily and systematically than searching by hand, and it is a step towards tool-supported development of generative software.
💡 Research Summary
The paper addresses a common pain point in model‑driven development (MDD): the manual effort required to separate a reference implementation of a code generator into handwritten and generatable parts. Traditionally, developers write a complete reference implementation, then painstakingly label each class, method, or package as either “hand‑written” or “to be generated.” This process is error‑prone, hard to maintain, and often results in more generated code than is truly necessary, which can inflate the code base and complicate future maintenance.
To mitigate these issues, the authors propose leveraging the concept of software categories—semantic groupings of code with explicitly defined dependency rules. Typical categories include Business Logic, Technical Infrastructure, Common Utilities, and a special Generated‑Candidate category. Each category is associated with a set of allowed dependencies; for example, Business Logic may only depend on Technical Infrastructure through well‑defined interfaces, never directly invoking low‑level technical classes. By formalizing these rules, the architecture becomes more disciplined, and the boundary between generated and handwritten code can be expressed in a machine‑readable form.
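The dependency rules described above can be made machine-checkable in a few lines. The following is a minimal sketch, not the paper's implementation: the category names, the `ALLOWED` table, and the `violations` helper are illustrative assumptions, and the dependency graph is a plain dictionary rather than output of a real static analyzer.

```python
# Hypothetical rule table: each category lists the categories it may depend on.
# (Illustrative only; the paper's actual categories and rules may differ.)
ALLOWED = {
    "Business Logic": {"Business Logic", "Common Utilities"},
    "Technical": {"Technical", "Common Utilities"},
    "Common Utilities": {"Common Utilities"},
}

def violations(deps, categories):
    """Return (source, target) edges whose categories break the rules.

    deps: dict mapping each code element to the set of elements it references.
    categories: dict mapping code elements to their category (may be partial).
    """
    bad = []
    for src, targets in deps.items():
        for tgt in targets:
            src_cat, tgt_cat = categories.get(src), categories.get(tgt)
            # Only check edges where both endpoints are already categorized.
            if src_cat and tgt_cat and tgt_cat not in ALLOWED.get(src_cat, set()):
                bad.append((src, tgt))
    return bad

# Example: business logic calling a low-level technical class is flagged.
deps = {"InvoiceCalc": {"SqlDriver"}, "SqlDriver": set()}
cats = {"InvoiceCalc": "Business Logic", "SqlDriver": "Technical"}
print(violations(deps, cats))  # [('InvoiceCalc', 'SqlDriver')]
```

Because the rules live in data rather than in prose, the same table can drive both the violation check shown here and the candidate selection discussed below.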
The core contribution is an iterative “category propagation” algorithm. The process begins with a small, manually curated set of elements that are already assigned to categories. The entire code base is then represented as a static dependency graph (nodes = classes, methods, packages; edges = compile‑time references). For each uncategorized node, the algorithm examines its outgoing dependencies. If all referenced nodes belong to a single category, the uncategorized node is automatically assigned to that same category. This step is repeated until no further assignments are possible, effectively spreading category labels throughout the graph. The algorithm assumes the dependency graph is acyclic; if a cycle is detected, the tool flags it for human review, preventing contradictory assignments.
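The propagation step described above can be sketched as a simple fixpoint computation. This is a hedged reconstruction from the summary, not the authors' code: `propagate_categories`, its data shapes, and the example names are assumptions, and cycle handling is reduced to the rule that a node is only assigned when all of its dependencies already agree on one category.

```python
def propagate_categories(deps, seed):
    """Iteratively assign categories to uncategorized nodes.

    deps: dict mapping each node (class/method/package) to the set of
          nodes it statically references.
    seed: dict mapping the manually categorized nodes to their category.
    Returns a new mapping extending the seed.
    """
    categories = dict(seed)
    changed = True
    while changed:
        changed = False
        for node, targets in deps.items():
            if node in categories or not targets:
                continue  # already labeled, or nothing to infer from
            target_cats = {categories.get(t) for t in targets}
            # Assign only if every referenced node is already labeled
            # and all labels agree on a single category.
            if None not in target_cats and len(target_cats) == 1:
                categories[node] = target_cats.pop()
                changed = True
    return categories

# Hypothetical example: OrderService references only Business Logic code,
# so it inherits that category in the first pass.
deps = {
    "OrderService": {"OrderRepo", "OrderValidator"},
    "OrderRepo": set(),
    "OrderValidator": set(),
}
seed = {"OrderRepo": "Business Logic", "OrderValidator": "Business Logic"}
print(propagate_categories(deps, seed)["OrderService"])  # Business Logic
```

Nodes whose dependencies disagree, or that sit on a cycle where no member can be resolved first, simply remain unlabeled, which matches the summary's point that such cases are left for human review.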
Once propagation stabilizes, the set of elements belonging to the pre‑defined Generated‑Candidate category is presented to the generator developer as the “code that should be generated.” Conversely, elements in the Handwritten category remain under manual control. The approach thus transforms a labor‑intensive, heuristic task into a systematic, semi‑automated workflow.
The authors evaluated the technique on two real‑world projects: an e‑commerce platform and an IoT device‑management system. In both cases, the algorithm automatically labeled roughly 68 % of the code, reducing the manual classification effort by more than 75 %. Moreover, the dependency‑rule checking caught several architectural violations early, reinforcing the intended separation of concerns. The paper also discusses integration with existing static‑analysis tools (e.g., SonarQube, FindBugs) and IDE plugins, enabling real‑time feedback as developers write code.
Limitations are acknowledged. The quality of the final categorization heavily depends on the initial manual seed; an inaccurate or overly granular seed can propagate errors. Complex language features such as multiple inheritance, mixins, or heavy use of reflection can obscure true dependencies, requiring additional human intervention. The authors suggest future work on automated seed generation, dynamic analysis to complement static graphs, and broader domain studies to validate the approach across different architectural styles.
In summary, the paper presents a pragmatic, category‑driven method for semi‑automatically identifying generated code candidates in MDD environments. By formalizing dependency constraints and iteratively propagating category labels, it offers a scalable alternative to manual code separation, improves architectural compliance, and paves the way for richer tool support in the development of generative software.