Faster Algorithms for Weighted Recursive State Machines
Pushdown systems (PDSs) and recursive state machines (RSMs), which are linearly equivalent, are standard models for interprocedural analysis. Yet RSMs are more convenient as they (a) explicitly model function calls and returns, and (b) specify many natural parameters for algorithmic analysis, e.g., the number of entries and exits. We consider a general framework where RSM transitions are labeled from a semiring and path properties are algebraic with semiring operations, which can model, e.g., interprocedural reachability and dataflow analysis problems. Our main contributions are new algorithms for several fundamental problems. As compared to a direct translation of RSMs to PDSs and the best-known existing bounds for PDSs, our analysis algorithm improves the complexity for finite-height semirings (which subsume reachability and standard dataflow properties). We further consider the problem of extracting distance values from the representation structures computed by our algorithm, and give efficient algorithms that distinguish the complexity of a one-time preprocessing from the complexity of each individual query. Another advantage of our algorithm is that our improvements carry over to the concurrent setting, where we improve the best-known complexity for the context-bounded analysis of concurrent RSMs. Finally, we provide a prototype implementation that gives a significant speed-up on several benchmarks from the SLAM/SDV project.
💡 Research Summary
The paper tackles the problem of efficiently computing distances in weighted recursive state machines (RSMs) under a semiring framework, with a particular focus on finite‑height idempotent semirings. An RSM consists of a collection of modules, each with a set of entry and exit nodes and explicit call/return boxes that implicitly manage a call stack. By labeling every transition with a semiring element, the weight of a computation is the ⊗‑product of its transitions, while the weight of a set of computations is the ⊕‑sum of the individual weights. This abstraction captures classic interprocedural reachability (Boolean semiring) as well as a wide range of data‑flow analyses (e.g., IFDS/IDE) and more general quantitative properties.
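To make the ⊗/⊕ reading concrete, here is a minimal Python sketch of a semiring and the two weight definitions described above. The class name `Semiring` and the helpers `path_weight` and `set_weight` are illustrative names, not taken from the paper. The Boolean instance models reachability; the min-plus instance is included only to show a quantitative reading and is not finite-height over the reals.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Semiring(Generic[T]):
    """Hypothetical container for a semiring (names are illustrative)."""
    zero: T                      # neutral element of plus
    one: T                       # neutral element of times
    plus: Callable[[T, T], T]    # combines alternative computations
    times: Callable[[T, T], T]   # extends a computation by one transition

    def path_weight(self, labels: Iterable[T]) -> T:
        # Weight of one computation: the times-product of its transition labels.
        return reduce(self.times, labels, self.one)

    def set_weight(self, paths: Iterable[Iterable[T]]) -> T:
        # Weight of a set of computations: the plus-sum of the path weights.
        return reduce(self.plus, (self.path_weight(p) for p in paths), self.zero)


# Boolean semiring: "is there at least one feasible path?"
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)

# Min-plus semiring: "what is the cheapest path?" (illustration only)
MIN_PLUS = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)

reachable = BOOLEAN.set_weight([[True, False], [True, True]])
cheapest = MIN_PLUS.set_weight([[1.0, 2.0], [0.5, 4.0]])
```

In the Boolean case the first path is infeasible and the second is feasible, so the set as a whole is reachable; in the min-plus case the two paths cost 3.0 and 4.5, and the set weight is the smaller of the two.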
Main contributions
- Improved algorithm for configuration and node distances – The authors introduce a symbolic representation called a configuration automaton that compactly encodes all reachable configurations from a given set of sources. By combining this automaton with pre‑computed entry‑to‑exit summaries for each module, they devise a dynamic‑programming procedure that runs in O(H·(|R|·θ_e + |Call|·θ_e·θ_x)) time, where H is the semiring height, |R| the size of the RSM, θ_e and θ_x the maximum numbers of entry and exit nodes per module, and |Call| the number of call nodes. This improves the previous best bound O(H·|R|·θ_e·θ_x·f) by a factor of roughly (|R|·f)/(θ_x+|Call|).
- Query processing after a one‑time preprocessing – Once the configuration automaton is built, answering a distance query of size n (e.g., a set of target configurations) takes O(n·θ_e²) time. When the semiring domain is small (Boolean, small integer ranges), the authors exploit fast matrix‑vector multiplication to obtain additional constant‑factor speed‑ups. For RSMs with a sparse call graph, they adapt a Four‑Russians‑style technique that spends polynomial preprocessing to compress query inputs into logarithmic‑size blocks, yielding a flexible trade‑off between preprocessing and per‑query cost.
- Context‑bounded analysis of concurrent RSMs – Extending the approach to the concurrent setting, the paper presents an algorithm for k‑context‑bounded reachability that runs in O(|R_k|·θ_‖e·θ_‖x·n_k·|G|^k) time, where |R_k| is the size of the concurrent RSM, θ_‖e/θ_‖x are the global entry/exit bounds, n_k the number of component RSMs, G the global component, and k the context‑switch bound. This dramatically improves upon the prior O(|R_k|⁵·θ_‖⁵·n_k·|G|^k) bound.
- Experimental validation – A prototype implementation (explicit, not symbolic) was compared against jMoped, a mature tool for weighted pushdown systems. Using benchmarks from the SLAM/SDV project, the new algorithm consistently outperformed jMoped, achieving average speed‑ups of threefold and up to tenfold on the hardest cases. The gains are especially pronounced when the number of entry/exit nodes per module is small and the call graph is sparse.
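The O(n·θ_e²) query cost quoted above has the shape one would expect from n semiring vector‑matrix products of dimension θ_e. The following Python sketch illustrates only that cost pattern on a generic weighted automaton; the per‑symbol matrices, the function names `vec_mat` and `run_query`, and the two‑state example are assumptions made for illustration and are not the paper's data structure.

```python
def vec_mat(vec, mat, plus, times, zero):
    """Semiring vector-matrix product: out[j] = plus over i of times(vec[i], mat[i][j])."""
    cols = len(mat[0])
    out = [zero] * cols
    for i, v in enumerate(vec):
        for j in range(cols):
            out[j] = plus(out[j], times(v, mat[i][j]))
    return out


def run_query(initial, matrices, word, plus, times, zero):
    """Read `word` symbol by symbol; each step is one product costing O(d^2)
    semiring operations for d states, so the whole query costs O(len(word) * d^2)."""
    vec = list(initial)
    for symbol in word:
        vec = vec_mat(vec, matrices[symbol], plus, times, zero)
    return vec


# Boolean semiring operations (reachability).
b_or = lambda a, b: a or b
b_and = lambda a, b: a and b

# Hypothetical two-state automaton: 'a' moves state 0 to state 1, 'b' loops on state 1.
MATRICES = {
    "a": [[False, True], [False, False]],
    "b": [[False, False], [False, True]],
}

reach_ab = run_query([True, False], MATRICES, "ab", b_or, b_and, False)
reach_ba = run_query([True, False], MATRICES, "ba", b_or, b_and, False)
```

Starting in state 0, the word "ab" ends in state 1, while "ba" ends nowhere because no 'b' transition leaves state 0. With a small semiring domain, each row of such a product can be packed into machine words, which is the kind of setting where constant‑factor speed‑ups become available.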
Technical highlights
- Configuration automaton: a finite automaton whose states correspond to symbolic sets of RSM configurations; transitions inherit semiring weights.
- Entry‑to‑exit summaries: for each module, the ⊕‑sum of all paths from any entry to any exit is pre‑computed, eliminating redundant recomputation during the main DP.
- Dynamic programming over the automaton: the algorithm iteratively propagates distances respecting the semiring’s height, guaranteeing termination after at most H iterations.
- Sparse‑graph optimizations: when each module calls only a constant number of others, the summary tables remain small, enabling fast matrix‑vector products and the Four‑Russians compression.
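The interplay between entry‑to‑exit summaries and iterative propagation can be sketched in Python as follows. This is a deliberately naive round‑robin fixpoint, not the paper's algorithm and without its complexity bound. It assumes an idempotent finite‑height semiring, one entry and one exit per module, and a hypothetical dictionary encoding of modules (`entry`, `exit`, `edges`, `calls`) invented for this example.

```python
def summarize(modules, plus, times, zero, one):
    """Compute one entry-to-exit summary per module by fixpoint iteration.

    modules: name -> {"entry": node, "exit": node,
                      "edges": [(u, v, weight)],   # internal transitions
                      "calls": [(u, v, callee)]}   # call at u, return at v
    Assumes an idempotent, finite-height semiring so the loop terminates.
    """
    dist = {m: {spec["entry"]: one} for m, spec in modules.items()}
    summary = {m: zero for m in modules}
    changed = True
    while changed:
        changed = False
        for m, spec in modules.items():
            d = dist[m]
            # A call edge behaves like an internal edge weighted by the
            # callee's current summary.
            steps = list(spec["edges"])
            steps += [(u, v, summary[c]) for (u, v, c) in spec["calls"]]
            for u, v, w in steps:
                if u not in d:
                    continue
                old = d.get(v, zero)
                new = plus(old, times(d[u], w))
                if new != old:
                    d[v] = new
                    changed = True
            s = d.get(spec["exit"], zero)
            if s != summary[m]:
                summary[m] = s
                changed = True
    return summary


# Hypothetical RSM: main calls f; f either exits directly or recurses first;
# g only ever calls itself and therefore never reaches its exit.
MODULES = {
    "main": {"entry": "m0", "exit": "m2",
             "edges": [("m1", "m2", True)], "calls": [("m0", "m1", "f")]},
    "f": {"entry": "f0", "exit": "f2",
          "edges": [("f0", "f2", True), ("f1", "f2", True)],
          "calls": [("f0", "f1", "f")]},
    "g": {"entry": "g0", "exit": "g1",
          "edges": [], "calls": [("g0", "g1", "g")]},
}

summaries = summarize(MODULES, lambda a, b: a or b, lambda a, b: a and b,
                      False, True)
```

In the Boolean semiring the summaries record whether each module can return: `main` and `f` can, `g` cannot. Reusing `summary[c]` at every call site is what avoids re‑exploring the callee for each caller.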
Impact and future work
The work demonstrates that directly exploiting the structural parameters of RSMs—rather than translating them to pushdown systems—yields both theoretical and practical improvements. The semiring‑based formulation is sufficiently general to encompass a broad class of interprocedural analyses, and the techniques for concurrent, context‑bounded settings open avenues for scalable verification of multithreaded programs. Future directions include extending the approach to infinite‑height semirings, handling richer control‑flow constructs (e.g., exceptions), and integrating the symbolic preprocessing into existing verification frameworks.