Experimental Aspects of Synthesis
We discuss the problem of experimentally evaluating linear-time temporal logic (LTL) synthesis tools for reactive systems. We first survey previous evaluation work for the currently publicly available synthesis tools, and then draw conclusions by deriving useful schemes for future evaluations. In particular, we explain why previous tools have incompatible scopes and semantics, and we provide a framework that reduces the impact of this problem on future experimental comparisons of such tools. Furthermore, we discuss the difficulties that the complex workflows beginning to appear in modern synthesis tools pose for experimental evaluation, and we address the question of how convincing evaluations can still be performed in such a setting.
💡 Research Summary
The paper “Experimental Aspects of Synthesis” addresses the challenging problem of evaluating linear‑time temporal logic (LTL) synthesis tools for reactive systems in a systematic, reproducible manner. It begins with a concise survey of the four publicly available synthesis tools at the time of writing—ANZU, LILY, ACACIA, and UNBEAST—highlighting that each tool operates under a different set of assumptions, semantics (Mealy versus Moore), input languages, and algorithmic back‑ends (BDD, SAT/SMT, Safra‑less constructions). Because of these divergences, direct performance comparison across tools is virtually impossible.
The authors identify three root causes for the current state of affairs. First, the engineering effort required to implement a synthesis tool is substantially higher than that for a SAT solver; deterministic automata constructions such as Safra’s determinisation are notoriously complex, discouraging many researchers from releasing usable prototypes. Second, the publication cultures of the formal methods community and the SAT/SMT community differ: the former rewards conceptual breakthroughs while the latter values empirical scalability, leading to a scarcity of papers that present thorough experimental evaluations of synthesis algorithms. Third, even when a tool is built, the lack of a common benchmark suite and the incompatibility of semantics force authors to rewrite or adapt existing benchmarks, inflating the experimental overhead and reducing reproducibility.
The paper then analyses the concrete workflows of the four tools. ANZU implements GR(1) synthesis in a symbolic BDD framework, restricting specifications to a PSL‑style implication of assumptions and guarantees. LILY accepts arbitrary LTL, translates the negated specification to a nondeterministic Büchi automaton, then to a universal co‑Büchi tree automaton, finally checking emptiness via alternating weak tree automata and nondeterministic Büchi tree automata; a parameter k controls the size of intermediate structures. ACACIA shares LILY’s input format but uses Moore semantics and adds support for local assumptions, offering two versions (2009 and 2010) that incorporate different algorithmic ideas. UNBEAST, although not described in detail, combines BDD and SAT/SMT techniques and introduces more elaborate pipelines (e.g., parallel semi‑algorithms, heuristic assumption dropping). All of these tools have been evaluated on a handful of hand‑crafted case studies (e.g., AMBA arbiter, generalized buffer, traffic‑light controllers), which are insufficient for statistically significant conclusions.
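The role of LILY's k-parameter can be illustrated by the iterative-deepening driver loop that such bounded constructions typically use: try increasing bounds until the back-end finds a solution. The sketch below is a hypothetical illustration, not code from the paper; `check_bound` stands in for the actual automata-theoretic emptiness check.

```python
def synthesize_with_bound(check_bound, k_max=32):
    """Iteratively deepen the bound k until the back-end succeeds.

    `check_bound` is a stand-in for the real check on the bounded
    intermediate automata: it returns a strategy if one exists within
    bound k, and None otherwise.
    """
    for k in range(1, k_max + 1):
        strategy = check_bound(k)
        if strategy is not None:
            return k, strategy  # smallest sufficient bound and a witness
    return None  # nothing found within k_max (inconclusive, not "unrealizable")


# Toy back-end: pretend a strategy exists once the bound reaches 3.
result = synthesize_with_bound(lambda k: "strategy" if k >= 3 else None)
# result == (3, "strategy")
```

Note that a `None` result only means the search was inconclusive up to `k_max`; proving unrealizability requires a separate argument, which is one reason intermediate parameters like k complicate fair experimental comparison.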
Recognizing these obstacles, the authors propose a standardized evaluation framework aimed at future synthesis research. The framework consists of four pillars: (1) a canonical semantics definition with explicit Mealy↔Moore conversion rules; (2) a tool‑agnostic benchmark format, preferably based on PSL, together with conversion scripts that adapt a benchmark to each tool’s required subset; (3) a hierarchical set of performance metrics, separating raw execution time and memory consumption from higher‑level measures such as the number of BDD nodes, SAT solver calls, or the depth of the automata construction; and (4) a reproducibility package that includes all configuration files, parameter values (e.g., the k‑parameter for LILY), versioned dependencies (BDD libraries, SAT/SMT solvers), and hardware specifications. By making each stage of the workflow transparent and publicly available, the framework enables researchers to isolate the contribution of a new algorithmic idea from the surrounding engineering optimizations.
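Pillar (1), explicit Mealy↔Moore conversion rules, can be made concrete with the standard textbook construction: each Moore state pairs a Mealy state with the output emitted on the transition leading into it. The following minimal Python sketch illustrates the kind of rule such a framework would fix; it is not code from the paper.

```python
from collections import deque

def mealy_to_moore(delta, lam, init):
    """Standard Mealy-to-Moore construction.

    delta maps (state, input) -> successor state; lam maps
    (state, input) -> output.  Each Moore state is a pair
    (mealy_state, last_output), and the Moore output function simply
    projects the second component.  The initial Moore state carries
    None because no output has been emitted yet -- the usual one-step
    offset between the two semantics.
    """
    start = (init, None)
    moore_delta, seen = {}, {start}
    queue = deque([start])
    while queue:
        st = queue.popleft()
        s, _ = st
        for (p, a), q in delta.items():
            if p != s:
                continue
            succ = (q, lam[(p, a)])
            moore_delta[(st, a)] = succ
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    moore_out = {st: st[1] for st in seen}
    return moore_delta, moore_out, start
```

On a one-state "echo" Mealy machine, the resulting Moore machine reproduces the input stream with the expected one-step output offset, which is exactly the discrepancy a canonical semantics definition must pin down before cross-tool results become comparable.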
In conclusion, the paper argues that rigorous experimental evaluation is as essential to the progress of LTL synthesis as theoretical breakthroughs. The proposed standardization not only lowers the entry barrier for new tool developers but also facilitates meaningful, head‑to‑head comparisons that can drive the field forward. The authors anticipate that, as the community adopts these practices and as more benchmark repositories become available, synthesis tools will evolve from isolated prototypes into robust, industrial‑grade solutions for reactive system design.