Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs

Reading time: 5 minutes
...

📝 Original Info

  • Title: Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs
  • ArXiv ID: 2512.06836
  • Date: 2025-12-07
  • Authors: Weixing Zhang, Regina Hebig, Daniel Strüber

📝 Abstract

Software languages evolve over time for various reasons, such as the addition of new features. When the language's grammar definition evolves, textual instances that originally conformed to the grammar become outdated. For DSLs in a model-driven engineering context, there exists a plethora of techniques to co-evolve models with the evolving metamodel. However, these techniques are not geared to support DSLs with a textual syntax -- applying them to textual language definitions and instances may lead to the loss of information from the original instances, such as comments and layout information, which are valuable for software comprehension and maintenance. This study explores the potential of Large Language Model (LLM)-based solutions in achieving grammar and instance co-evolution, with attention to their ability to preserve auxiliary information when directly processing textual instances. By applying two advanced language models, Claude-3.5 and GPT-4o, and conducting experiments across seven case languages, we evaluated the feasibility and limitations of this approach. Our results indicate that the considered LLMs are well able to migrate textual instances in small-scale cases with limited instance size, which are representative of a subset of cases encountered in practice. In addition, we observe significant challenges with the scalability of LLM-based solutions to larger instances, leading to insights that are useful for informing future research.

📄 Full Content

Weixing Zhang (Chalmers University of Technology and University of Gothenburg, Hörselgången 5, 417 56 Göteborg, Sweden), Regina Hebig (Universität Rostock, Albert-Einstein-Straße 22, 18059 Rostock, Germany), Daniel Strüber (Chalmers University of Technology and University of Gothenburg; Radboud University, Toernooiveld 212, 6525 EC Nijmegen, The Netherlands)

Joint Proceedings of the STAF 2025 Workshops: OCL, OOPSLE, LLM4SE, ICMM, AgileMDE, AI4DPS, and TTC. Koblenz, Germany, June 10-13, 2025.

Keywords: Co-Evolution, textual DSLs, LLM

1. Introduction

Domain-specific languages (DSLs) are useful tools to describe and solve problems in a specific application domain. As domain knowledge evolves and requirements change, DSLs often need to evolve accordingly [1]. For example, features may be added and existing functionality may be adjusted, leading to a need to update the definition of the DSL to introduce new language constructs and modify existing ones. When the definition of a DSL evolves, existing instances face challenges: they may contain constructs that no longer conform to the new definition and require appropriate modification, or they may need additions to support newly introduced required language elements [2, 3, 4].

While the model-driven engineering community has developed numerous approaches for metamodel-instance co-evolution [5], these works generally focus on metamodel-based language definitions, usually in the context of graphical DSLs [5]. In practice, there is an ongoing trend towards textual DSLs, which can emulate the look and feel of familiar general-purpose languages and are easy to integrate with standard developer tools for versioning, differencing, and merging. These textual DSLs, developed in frameworks like Xtext, Langium, and textX, are technically defined through grammars and instantiated as textual instances. Dedicated approaches to co-evolving textual instances are scarce.
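To make the problem concrete, here is a minimal, hypothetical example (not one of the paper's seven case languages): an Xtext-style grammar in which the Field rule's keyword is renamed from 'field' to 'attr', together with an instance whose comments and alignment should survive the migration.

```
// Grammar v1 (excerpt, Xtext-style):
//   Field: 'field' name=ID ':' type=ID;
// Grammar v2 renames the keyword:
//   Field: 'attr' name=ID ':' type=ID;

// Instance conforming to v1:
entity Customer {
    field name  : String   // customer-facing display name
    field email : String   // used for notifications
}

// Expected migrated instance under v2 (comments and alignment kept):
entity Customer {
    attr name  : String   // customer-facing display name
    attr email : String   // used for notifications
}
```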
One possible way to address the co-evolution of textual instances is to use the available metamodel-based approaches. To that end, the original instance needs to be parsed into the form of a model and transformed back into textual form after the model is co-evolved. However, this approach leads to information loss: during the transformation between textual instance and model, auxiliary information in the original instances, such as code comments and formatting styles, cannot be retained [6][7]. While this information does not affect program functionality, it serves a critical purpose during tasks such as code maintenance, debugging, and understanding design intent [8]. Hence, there is arguably a need to preserve such information during the co-evolution of instances.

In recent years, Large Language Models (LLMs) have demonstrated exceptional capabilities in code understanding, transformation, and generation [9][10]. These models not only perform well at tasks that require understanding…
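The paper evaluates Claude-3.5 and GPT-4o on this task. As a minimal sketch of how one such migration step could be scripted (assuming the OpenAI Python client and an illustrative prompt, neither taken from the paper), the old grammar, the new grammar, and the original instance are packed into a single request, and the model is asked to return only the migrated instance:

```python
# Minimal sketch of an LLM-based instance migration step.
# Assumptions: OpenAI's chat-completions API; the prompt wording is
# illustrative, not the paper's actual prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def migrate_instance(old_grammar: str, new_grammar: str, instance: str) -> str:
    """Ask the model to co-evolve one textual instance with its grammar."""
    prompt = (
        "A textual DSL grammar has evolved. Migrate the instance below so it\n"
        "conforms to the NEW grammar. Preserve all comments, whitespace, and\n"
        "layout that remain meaningful. Output only the migrated instance.\n\n"
        f"OLD GRAMMAR:\n{old_grammar}\n\n"
        f"NEW GRAMMAR:\n{new_grammar}\n\n"
        f"INSTANCE (conforms to the old grammar):\n{instance}\n"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # favor deterministic, conservative edits
    )
    return response.choices[0].message.content

# LLM output is not guaranteed to conform to the new grammar, so a migrated
# instance should be re-validated, e.g. by parsing it with the new grammar.
```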

Reference

This content is AI-processed based on open access ArXiv data.
