TRX: A Formally Verified Parser Interpreter

Parsing is an important problem in computer science and yet surprisingly little attention has been devoted to its formal verification. In this paper, we present TRX: a parser interpreter formally developed in the proof assistant Coq, capable of producing formally correct parsers. We use parsing expression grammars (PEGs), a formalism essentially representing recursive descent parsing, which we consider an attractive alternative to context-free grammars (CFGs). From this formalization we can extract a parser for an arbitrary PEG grammar with the guarantee of total correctness, i.e., the resulting parser is terminating and correct with respect to its grammar and the semantics of PEGs; both properties are formally proven in Coq.


💡 Research Summary

The paper introduces TRX, a formally verified parser interpreter built within the Coq proof assistant, targeting parsing expression grammars (PEGs) as the underlying formalism. Unlike traditional context‑free grammars (CFGs), PEGs describe deterministic, recursive‑descent parsers: choice is prioritized (the second alternative is tried only if the first fails), so a given input has at most one parse and ambiguity cannot arise. This makes them attractive for practical parsing tasks. The authors first formalize the syntax and semantics of PEGs in Coq, representing grammar expressions as a data type and defining a matching function that yields either success with a residual input or failure.
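To make the "data type plus matching function" idea concrete, the following is a minimal Python sketch, not the Coq development itself. The class names (`Empty`, `Char`, `Seq`, `Choice`, `Star`, `Not`, `NonTerminal`), the function `match`, and the sample grammar `anbn` are all hypothetical names chosen for illustration. The sketch assumes the grammar is not left-recursive and adds a progress check in `Star` so that the loop cannot spin on a body that consumes nothing.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Empty:            # succeeds without consuming input
    pass


@dataclass(frozen=True)
class Char:             # matches exactly one given character
    c: str


@dataclass(frozen=True)
class Seq:              # first, then second on the residual input
    first: "Expr"
    second: "Expr"


@dataclass(frozen=True)
class Choice:           # prioritized choice: second only if first fails
    first: "Expr"
    second: "Expr"


@dataclass(frozen=True)
class Star:             # greedy repetition, zero or more times
    body: "Expr"


@dataclass(frozen=True)
class Not:              # negative lookahead, never consumes input
    body: "Expr"


@dataclass(frozen=True)
class NonTerminal:      # reference to a named rule of the grammar
    name: str


Expr = Union[Empty, Char, Seq, Choice, Star, Not, NonTerminal]
Grammar = Dict[str, Expr]


def match(g: Grammar, e: Expr, s: str) -> Optional[str]:
    """Return the residual input on success, or None on failure."""
    if isinstance(e, Empty):
        return s
    if isinstance(e, Char):
        return s[1:] if s[:1] == e.c else None
    if isinstance(e, Seq):
        rest = match(g, e.first, s)
        return None if rest is None else match(g, e.second, rest)
    if isinstance(e, Choice):
        rest = match(g, e.first, s)
        return rest if rest is not None else match(g, e.second, s)
    if isinstance(e, Star):
        while True:
            rest = match(g, e.body, s)
            # Stop on failure, or when the body consumed nothing
            # (sketch-level guard against looping forever).
            if rest is None or rest == s:
                return s
            s = rest
    if isinstance(e, Not):
        return s if match(g, e.body, s) is None else None
    if isinstance(e, NonTerminal):
        return match(g, g[e.name], s)
    raise TypeError(f"unknown expression: {e!r}")


# Hypothetical sample grammar:  S <- 'a' S 'b' / (empty)
anbn: Grammar = {
    "S": Choice(Seq(Char("a"), Seq(NonTerminal("S"), Char("b"))), Empty()),
}
```

With this grammar, `match(anbn, NonTerminal("S"), "aabb")` returns the empty string (everything consumed), while on `"aab"` the first alternative fails and the prioritized choice falls back to `Empty`, returning the input unchanged. A failure proper is signalled by `None`, as in `match(anbn, Char("a"), "b")`.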

The core contribution is a Coq‑implemented recursive‑descent interpreter that operates on any PEG. Crucially, the interpreter is proven to be totally correct: it always terminates (termination proof) and its results coincide exactly with the formal PEG semantics (correctness proof). Termination is established by attaching a well‑founded measure to each recursive call (typically the size of the remaining input or the depth of the grammar expression) and showing that this measure strictly decreases on every recursive step. The correctness proof proceeds by structural induction on the grammar, demonstrating that a successful match produces a parse tree that satisfies the inductive definition of PEG derivations, while a failure corresponds to the absence of such a derivation. Both proofs are mechanized in Coq, so no step of the argument rests on unchecked manual reasoning.
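The termination argument can be illustrated, under strong simplifying assumptions, with a runtime check in Python. The sketch below is not the measure used in the Coq proof: it handles only expressions without nonterminals, encodes them as nested tuples (a hypothetical encoding), and uses the pair (remaining input length, expression size) ordered lexicographically. Every recursive call asserts that its pair is strictly smaller than its caller's, which is the property a well-founded measure has to satisfy.

```python
def size(e):
    """Number of nodes in a nested-tuple expression."""
    return 1 + sum(size(x) for x in e[1:] if isinstance(x, tuple))


def run(e, s, parent=None):
    """Match e against s; return the residual input or None on failure.

    `parent` is the caller's measure. Python compares tuples
    lexicographically, so `m < parent` is the lexicographic order on
    (remaining input length, expression size).
    """
    m = (len(s), size(e))
    assert parent is None or m < parent, "measure did not decrease"
    tag = e[0]
    if tag == "char":
        return s[1:] if s[:1] == e[1] else None
    if tag == "seq":
        r = run(e[1], s, m)                 # smaller expression
        return None if r is None else run(e[2], r, m)
    if tag == "choice":
        r = run(e[1], s, m)                 # smaller expression
        return r if r is not None else run(e[2], s, m)
    if tag == "star":
        r = run(e[1], s, m)                 # smaller expression
        if r is None or len(r) == len(s):
            return s                        # stop: failure or no progress
        return run(e, r, m)                 # same expression, shorter input
    raise ValueError(f"unknown tag: {tag}")


# Hypothetical example: ('a' / 'b')*
ab_star = ("star", ("choice", ("char", "a"), ("char", "b")))
```

Calling `run(ab_star, "abba!")` consumes the four letters and returns `"!"` without tripping the assertion. The repetition case shows why a single component is not enough: re-entering the same `star` expression does not shrink the expression, so the decrease has to come from the shorter input.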

From the verified Coq development, the authors extract executable OCaml code. This extracted parser can be compiled and used directly, providing users with a parser that carries a machine‑checked guarantee of both termination and semantic fidelity. Empirical evaluation shows that the extracted parsers perform comparably to hand‑written, non‑verified parsers on typical benchmark grammars, especially when the grammar requires only limited backtracking.

The paper also discusses limitations and future work. Currently, the interpreter does not support left‑recursive or infinitely recursive PEGs, and it lacks advanced memoization (packrat) optimizations that would guarantee linear‑time parsing for all grammars. Extending the framework to handle such cases, integrating automatic grammar transformations, and improving performance through verified memoization are identified as promising directions.
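Left recursion is a problem because a rule that can invoke itself before consuming any input would make a recursive-descent interpreter loop. The following Python sketch shows one conservative, syntactic way to flag such rules; it is an illustration under stated assumptions, not the check used in TRX. Grammars are encoded as dictionaries from rule names to nested tuples (a hypothetical encoding), and the helper names `is_nullable`, `nullable_set`, `left_calls`, and `left_recursive` are invented for this example. "Nullable" here over-approximates "may succeed without consuming input".

```python
def is_nullable(e, nullable):
    """Over-approximate: can e succeed without consuming input?"""
    tag = e[0]
    if tag in ("empty", "star", "not"):
        return True
    if tag == "char":
        return False
    if tag == "nt":
        return e[1] in nullable
    if tag == "seq":
        return is_nullable(e[1], nullable) and is_nullable(e[2], nullable)
    if tag == "choice":
        return is_nullable(e[1], nullable) or is_nullable(e[2], nullable)
    raise ValueError(f"unknown tag: {tag}")


def nullable_set(g):
    """Least fixpoint: names of rules that may match the empty prefix."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for name, e in g.items():
            if name not in nullable and is_nullable(e, nullable):
                nullable.add(name)
                changed = True
    return nullable


def left_calls(e, nullable):
    """Rules that e may invoke before any input has been consumed."""
    tag = e[0]
    if tag == "nt":
        return {e[1]}
    if tag == "seq":
        calls = left_calls(e[1], nullable)
        if is_nullable(e[1], nullable):     # second part may start in place
            calls = calls | left_calls(e[2], nullable)
        return calls
    if tag == "choice":
        return left_calls(e[1], nullable) | left_calls(e[2], nullable)
    if tag in ("star", "not"):
        return left_calls(e[1], nullable)
    return set()                            # "char" and "empty"


def left_recursive(g):
    """Return the rules that can reach themselves through left calls."""
    nullable = nullable_set(g)
    edges = {n: left_calls(e, nullable) for n, e in g.items()}
    bad = set()
    for start in g:
        seen, stack = set(), list(edges[start])
        while stack:
            n = stack.pop()
            if n == start:
                bad.add(start)
                break
            if n not in seen:
                seen.add(n)
                stack.extend(edges.get(n, ()))
    return bad


# Hypothetical grammars:  E <- E '+' / 'n'   versus   E <- 'n' ('+' 'n')*
g_bad = {"E": ("choice", ("seq", ("nt", "E"), ("char", "+")), ("char", "n"))}
g_ok = {"E": ("seq", ("char", "n"),
              ("star", ("seq", ("char", "+"), ("char", "n"))))}
```

Here `left_recursive(g_bad)` reports `{"E"}`, while `left_recursive(g_ok)` returns the empty set. The second grammar is the usual rewriting of a left-recursive rule into a repetition, which is the kind of automatic grammar transformation mentioned above as future work.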

Overall, TRX demonstrates that it is feasible to combine a high‑level, expressive parsing formalism with rigorous formal verification, delivering parsers that are both practically usable and mathematically trustworthy. This work opens the door for formally verified front‑ends in compilers, security‑critical data processing pipelines, and any domain where parsing correctness cannot be left to testing alone.

