A Tool for Model-Based Language Specification
Formal languages let us define the textual representation of data with precision. Formal grammars, typically in the form of BNF-like productions, describe the language syntax, which is then annotated for syntax-directed translation and completed with semantic actions. When, apart from the textual representation of data, an explicit representation of the corresponding data structure is required, the language designer has to devise the mapping between the suitable data model and its proper language specification, and then develop the conversion procedure from the parse tree to the data model instance. Unfortunately, whenever the format of the textual representation has to be modified, changes have to be propagated throughout the entire language processor tool chain. These updates are time-consuming, tedious, and error-prone. Besides, when different applications use the same language, several copies of the same language specification have to be maintained. In this paper, we introduce a model-based parser generator that decouples language specification from language processing, hence avoiding many of the problems caused by grammar-driven parsers and parser generators.
💡 Research Summary
The paper introduces ModelCC, a model‑based parser generator that separates language specification from language processing. Traditional parser generators require a textual grammar (BNF‑like) together with semantic actions; any change to the syntax propagates through the lexer, parser, and associated tools, leading to duplicated specifications across multiple applications.

ModelCC instead lets developers define an abstract data model using object‑oriented classes and annotate these classes with constraints that describe how the model is serialized as text (tokens, precedence, repetition, optionality, recursion, etc.). From these annotations, ModelCC automatically derives the lexical and syntactic specifications and generates a parser. Crucially, the generated parser can be built on any suitable parsing algorithm (LL, LR, GLR, PEG, or others) chosen at generation time, making the model independent of the underlying parsing technique. The data model itself serves as the abstract syntax tree, allowing semantic behavior to be expressed directly in the model’s methods without embedding actions in the grammar. This decoupling dramatically reduces maintenance effort, eliminates the need for multiple synchronized grammar copies, and enables easy experimentation with alternative parsing strategies.

The authors compare ModelCC to classic lexer/parser generators (Lex/Yacc, ANTLR, GLR‑based tools) using a small expression language and a simple imperative language. Results show comparable parsing performance while reducing source code size by roughly one‑third and simplifying grammar evolution. The paper concludes with a discussion of current limitations (e.g., supported constraint types) and future work such as graph‑based models, dynamic language features, and tighter IDE integration.
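The idea of deriving both a grammar and a parser from an annotated data model can be illustrated with a minimal Python sketch. This is not ModelCC's actual API. The metadata keys `pattern`, `prefix`, and `suffix`, the functions `productions` and `parse`, and the classes `IntegerLiteral` and `Assignment` are all hypothetical names chosen for illustration, and the toy recursive-descent interpreter stands in for whatever parsing algorithm a real generator would select.

```python
import re
from dataclasses import dataclass, field, fields

# Hypothetical annotation scheme (not ModelCC's): each field carries metadata
# describing its textual form. "pattern" marks a token matched by a regular
# expression; "prefix"/"suffix" are literal delimiters around the field.


@dataclass
class IntegerLiteral:
    value: int = field(metadata={"pattern": r"[0-9]+"})


@dataclass
class Assignment:
    name: str = field(metadata={"pattern": r"[a-z]+", "suffix": "="})
    value: IntegerLiteral = field(metadata={"suffix": ";"})

    # Semantic behavior lives in the model, not in grammar actions.
    def apply(self, env):
        env[self.name] = self.value.value
        return env


def productions(cls, out=None):
    """Derive BNF-like productions from the annotated model classes."""
    out = {} if out is None else out
    rhs = []
    for f in fields(cls):
        meta = f.metadata
        if meta.get("prefix"):
            rhs.append(repr(meta["prefix"]))
        if "pattern" in meta:
            rhs.append("/" + meta["pattern"] + "/")
        else:
            rhs.append(f.type.__name__)
            productions(f.type, out)  # nested model element
        if meta.get("suffix"):
            rhs.append(repr(meta["suffix"]))
    out[cls.__name__] = " ".join(rhs)
    return out


def _literal(text, pos, literal):
    """Skip whitespace, then consume `literal` (possibly empty) at `pos`."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    if not text.startswith(literal, pos):
        raise SyntaxError(f"expected {literal!r} at offset {pos}")
    return pos + len(literal)


def parse(cls, text, pos=0):
    """Build an instance of `cls` from `text`; return (instance, end offset)."""
    values = {}
    for f in fields(cls):
        meta = f.metadata
        pos = _literal(text, pos, meta.get("prefix", ""))
        if "pattern" in meta:
            match = re.compile(meta["pattern"]).match(text, pos)
            if match is None:
                raise SyntaxError(f"expected {f.name} at offset {pos}")
            values[f.name] = f.type(match.group())  # e.g. int("42")
            pos = match.end()
        else:
            values[f.name], pos = parse(f.type, text, pos)
        pos = _literal(text, pos, meta.get("suffix", ""))
    return cls(**values), pos


grammar = productions(Assignment)
stmt, end = parse(Assignment, "x = 42;")
env = stmt.apply({})
```

In this sketch the parsed object is itself the abstract syntax tree, so `stmt.apply` can run without a separate tree-to-model conversion step. Changing the surface syntax, for instance replacing the `"="` suffix with `":="`, would touch only the field metadata, while `productions`, `parse`, and `apply` stay unchanged.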