Extensible type checker for parser generation

Parser generators produce translators from language specifications. In many cases, such specifications contain semantic actions written in the same language as the generated code. Because these actions receive little static checking, they are a common source of errors that are discovered only when the generated code is compiled. In this paper we propose a parser-generator front-end that statically checks semantic actions for typing errors and prevents such errors from appearing in the generated code. The type-checking procedure is extensible, so it can support many implementation languages. We present an extension for Java, along with an extension for declarative type-system descriptions.


💡 Research Summary

The paper addresses a long‑standing weakness of traditional parser generators: the lack of static checking for semantic actions embedded in grammar specifications. In conventional tools such as Yacc, Bison, or ANTLR, the grammar is processed to produce a parser, and the programmer’s semantic actions—typically written in the target implementation language—are copied verbatim into the generated source code. Consequently, type errors, mismatched method signatures, or illegal field accesses inside these actions are only discovered when the generated code is compiled, often far removed from the original grammar file. This separation makes debugging cumbersome and increases development time.

To remedy this, the authors propose an extensible front‑end that performs static type checking on semantic actions before code generation. The architecture consists of four main components.

  1. Specification Parser – It parses the grammar (BNF/EBNF) and builds an abstract syntax tree (AST). Semantic actions are extracted as distinct AST nodes, preserving their original source location while decoupling them from the syntactic rules.

  2. Type Meta‑Model – A language‑agnostic representation of a type system. It models primitive types, object types, generic parameters, function types, subtyping relations, and operator overloads. The meta‑model is deliberately abstract so that concrete implementation languages can be plugged in without altering the core analysis engine.

  3. Type Checker Interface – An extensible plugin mechanism. For each target language, a plugin supplies the concrete typing rules, method signatures, field definitions, and any language‑specific inference algorithms. The interface defines callbacks for expression typing, statement validation, and symbol‑table integration, allowing the core engine to invoke language‑specific logic uniformly.

  4. Declarative Type System Description – A domain‑specific language (DSL) that lets users declare additional types directly in the grammar file. For example, a declaration such as “type Expr = Int | Bool” introduces a sum type that the meta‑model automatically incorporates. This feature enables rapid prototyping of custom type hierarchies without modifying the underlying checker code.
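
The meta‑model, the plugin interface, and declarative sum‑type declarations described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names Type, PrimitiveType, ObjectType, FunctionType, SumType, MetaModel, isSubtype, parseSumDeclaration and TypeCheckerPlugin are all hypothetical, subtyping is simplified to single inheritance plus sum‑type membership, and only two of the plugin callbacks are shown. It needs Java 16 or later (records and instanceof patterns).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, language-agnostic type meta-model; all names are illustrative.
interface Type {
    String name();
}

// A primitive type such as int or boolean.
record PrimitiveType(String name) implements Type {}

// A nominal object type with an optional single supertype (null for a root type).
record ObjectType(String name, ObjectType supertype) implements Type {}

// A function type built from parameter types and a result type.
record FunctionType(List<Type> params, Type result) implements Type {
    public String name() {
        List<String> names = new ArrayList<>();
        for (Type p : params) names.add(p.name());
        return "(" + String.join(", ", names) + ") -> " + result.name();
    }
}

// A sum type, e.g. the one introduced by "type Expr = Int | Bool".
record SumType(String name, List<Type> alternatives) implements Type {}

final class MetaModel {
    // Subtyping: reflexive, follows object supertype chains, and treats every
    // alternative of a sum type as a subtype of the sum.
    static boolean isSubtype(Type sub, Type sup) {
        if (sub.equals(sup)) return true;
        if (sub instanceof ObjectType o && o.supertype() != null && isSubtype(o.supertype(), sup)) {
            return true;
        }
        if (sup instanceof SumType s) {
            return s.alternatives().stream().anyMatch(alt -> isSubtype(sub, alt));
        }
        return false;
    }

    // Parses "type Name = A | B | ..."; every alternative must already be a known type.
    static SumType parseSumDeclaration(String decl, Map<String, Type> known) {
        String text = decl.trim();
        if (!text.startsWith("type ")) throw new IllegalArgumentException("expected 'type': " + decl);
        String[] sides = text.substring(5).split("=", 2);
        if (sides.length != 2) throw new IllegalArgumentException("expected '=': " + decl);
        List<Type> alternatives = new ArrayList<>();
        for (String alt : sides[1].split("\\|")) {
            Type t = known.get(alt.trim());
            if (t == null) throw new IllegalArgumentException("unknown type: " + alt.trim());
            alternatives.add(t);
        }
        return new SumType(sides[0].trim(), alternatives);
    }
}

// Hypothetical plugin contract: each target language supplies its own typing rules.
interface TypeCheckerPlugin {
    // Type of an expression in the given scope, or empty if it is ill-typed.
    Optional<Type> typeOfExpression(String expr, Map<String, Type> scope);

    // Diagnostics for a statement; an empty list means the statement is well-typed.
    List<String> checkStatement(String stmt, Map<String, Type> scope);
}
```

For example, parsing “type Expr = Int | Bool” against a table that already contains Int and Bool yields a SumType with two alternatives; isSubtype(Int, Expr) then holds while isSubtype(Expr, Int) does not. Keeping language‑specific rules behind an interface over the shared meta‑model is what lets the core engine stay unchanged when a new target language is plugged in.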

The paper provides a concrete implementation for Java. The Java plugin maps Java’s primitive types, class types, interfaces, and generic type arguments into the meta‑model. It then analyses each extracted semantic action, performing the following checks:

  • Variable declarations and assignments respect declared types.
  • Method invocations match the method’s parameter types, including overload resolution and generic type inference.
  • Field accesses are validated against the field’s declared type and its visibility in the declaring class.
  • Operators are applied only to compatible operand types, with special handling for boxing/unboxing.
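
To make the assignment and boxing/unboxing checks concrete, the Java sketch below models a deliberately simplified subset of Java’s assignment conversion (JLS §5.2): identity, widening primitive conversion, boxing, and unboxing optionally followed by widening. The class JavaAssignability and its methods are hypothetical; reference subtyping, boxing followed by widening to a supertype (for example int to Object), and narrowing of compile‑time constants are intentionally omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified subset of Java assignment conversion (JLS 5.2),
// covering only primitive types and their boxed counterparts.
final class JavaAssignability {
    // Widening primitive conversions (JLS 5.1.2): source -> allowed targets.
    private static final Map<String, List<String>> WIDENING = Map.of(
        "byte", List.of("short", "int", "long", "float", "double"),
        "short", List.of("int", "long", "float", "double"),
        "char", List.of("int", "long", "float", "double"),
        "int", List.of("long", "float", "double"),
        "long", List.of("float", "double"),
        "float", List.of("double"));

    // Primitive type -> boxed class (JLS 5.1.7).
    private static final Map<String, String> BOXES = Map.of(
        "boolean", "Boolean", "byte", "Byte", "short", "Short", "char", "Character",
        "int", "Integer", "long", "Long", "float", "Float", "double", "Double");

    static boolean isAssignable(String from, String to) {
        if (from.equals(to)) return true;                                     // identity
        if (WIDENING.getOrDefault(from, List.of()).contains(to)) return true; // widening primitive
        if (to.equals(BOXES.get(from))) return true;                          // boxing, e.g. int -> Integer
        // Unboxing, optionally followed by widening, e.g. Integer -> int or Integer -> long.
        for (Map.Entry<String, String> e : BOXES.entrySet()) {
            if (e.getValue().equals(from)) {
                String prim = e.getKey();
                return prim.equals(to) || WIDENING.getOrDefault(prim, List.of()).contains(to);
            }
        }
        return false;
    }

    // Diagnostic in the style quoted in the summary, or empty if the assignment is legal.
    static Optional<String> checkAssignment(String declaredType, String valueType) {
        return isAssignable(valueType, declaredType)
            ? Optional.empty()
            : Optional.of("cannot convert from " + valueType + " to " + declaredType);
    }
}
```

Under these rules an action such as “boolean b = 42;” is rejected with “cannot convert from int to boolean”, while “long n = count;” with count of type Integer is accepted via unboxing followed by widening.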

A notable aspect is the integration with the parser’s symbol table. The checker can verify that a variable introduced by a grammar rule (e.g., a token value or a synthesized attribute) is used consistently throughout the associated actions, preventing subtle mismatches that would otherwise surface only at compile time.
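
One way to picture this symbol‑table integration is a per‑rule table that binds Yacc‑style $$ and $1..$n references to the declared types of the rule’s symbols. The sketch assumes that notation (the paper’s own attribute syntax may differ), and RuleSymbolTable, typeOf and check are hypothetical names. It only verifies that each reference exists and carries a type, leaving expression typing to the language plugin.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rule-scoped symbol table binding $$ and $1..$n to declared types.
final class RuleSymbolTable {
    // Matches "$$" or "$" followed by digits, e.g. "$1", "$12".
    private static final Pattern REF = Pattern.compile("\\$(\\$|\\d+)");
    private final Map<String, String> bindings = new LinkedHashMap<>();

    // lhsType: type of the rule's left-hand side; rhsTypes: declared types of the
    // right-hand-side symbols in order, with null for symbols that carry no value.
    RuleSymbolTable(String lhsType, List<String> rhsTypes) {
        bindings.put("$$", lhsType);
        for (int i = 0; i < rhsTypes.size(); i++) {
            bindings.put("$" + (i + 1), rhsTypes.get(i));
        }
    }

    // Declared type of a reference, or null if unbound or untyped.
    String typeOf(String ref) {
        return bindings.get(ref);
    }

    // Reports references the rule does not define, or that refer to untyped symbols.
    List<String> check(String action) {
        List<String> errors = new ArrayList<>();
        Matcher m = REF.matcher(action);
        while (m.find()) {
            String ref = m.group();
            if (!bindings.containsKey(ref)) {
                errors.add(ref + " does not refer to a symbol of this rule");
            } else if (bindings.get(ref) == null) {
                errors.add(ref + " refers to a symbol with no declared type");
            }
        }
        return errors;
    }
}
```

For a rule such as expr : expr '+' expr, with types Integer, untyped '+', and Integer, the action “$$ = $1 + $3;” produces no diagnostics, whereas “$$ = $2;” is reported because the '+' token has no declared type, and “$$ = $4;” is reported because the rule has only three symbols.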

The performance evaluation compares the extended system against a baseline parser generator that lacks pre‑generation checks. The additional static analysis adds less than 5 % to total generation time, which the authors deem acceptable given the substantial reduction in post‑generation compilation errors. In a case study on an open‑source DSL compiler built on ANTLR, the system detected 92 % of deliberately injected type errors in semantic actions before code generation, and it reported precise source locations and diagnostic messages (e.g., “cannot convert from int to boolean”, “method foo(String) not applicable for arguments (int)”).
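
Reporting precise source locations requires mapping a position inside an extracted action back to the grammar file. A minimal sketch, assuming the front‑end records the 1‑based line and column where each action’s text begins; SourcePos, ActionLocator and the file:line:column diagnostic format are hypothetical, and a tab is counted as a single column.

```java
// Hypothetical 1-based position in the grammar file.
record SourcePos(int line, int column) {
    @Override
    public String toString() {
        return line + ":" + column;
    }
}

// Hypothetical helper translating offsets inside an action into grammar-file positions.
final class ActionLocator {
    // actionStart: where the action text begins in the grammar file;
    // offset: 0-based character offset of the error within actionText.
    static SourcePos locate(SourcePos actionStart, String actionText, int offset) {
        if (offset < 0 || offset > actionText.length()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        int line = actionStart.line();
        int column = actionStart.column();
        for (int i = 0; i < offset; i++) {
            if (actionText.charAt(i) == '\n') {
                line++;
                column = 1; // columns restart after each newline
            } else {
                column++;
            }
        }
        return new SourcePos(line, column);
    }

    // Formats a diagnostic as "file:line:column: message".
    static String diagnostic(String file, SourcePos pos, String message) {
        return file + ":" + pos + ": " + message;
    }
}
```

If an action starting at line 12, column 20 contains an error on its second line, the offset is resolved against the newline inside the action, so the reported position points at the offending token in the grammar file rather than at a line in the generated parser.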

The authors claim three primary contributions:

  1. A language‑independent, extensible type‑checking framework that can be adapted to any implementation language by supplying a plugin that implements the Type Checker Interface.
  2. A declarative mechanism for extending the type system directly within grammar files, enabling rapid experimentation with custom type constructs.
  3. Empirical validation through a Java plugin and real‑world experiments, demonstrating both low runtime overhead and high error‑detection efficacy.

In conclusion, the work shifts the paradigm of parser generation from “post‑generation compile‑time error detection” to “pre‑generation static type verification”. By catching semantic‑action errors early, developers receive immediate feedback, reduce debugging cycles, and improve overall reliability of language tooling. Future directions include supporting more sophisticated type systems (higher‑kinded types, dependent types) and developing plugins for additional target languages such as C#, Kotlin, and Rust, thereby broadening the applicability of the framework.