An extension of data automata that captures XPath
We define a new kind of automata recognizing properties of data words or data trees and prove that the automata capture all queries definable in Regular XPath. We show that the automata-theoretic approach may be applied to answer decidability and expressibility questions for XPath.
💡 Research Summary
The paper introduces a novel automaton model, called the Extended Data Automaton (EDA), designed to recognize properties of both data words and data trees. Unlike traditional data automata, which are limited in expressive power, the EDA augments a finite-state control with an unbounded set of registers that store data values encountered during the run. Transitions are guarded by equality tests between the data component of the current input symbol and the contents of the registers, which lets the automaton perform the data comparisons required by XPath predicates and filters.
The authors first formalize Regular XPath, a fragment of XPath that includes the child, descendant, parent, and sibling axes together with data-value tests, and then present a systematic translation from any Regular XPath expression into an equivalent EDA. The translation proceeds inductively on the syntax tree of the query: basic axis steps become simple state transitions, while filter predicates become guards that compare the current node's data value with a value stored in a register. Nested predicates are handled by allocating fresh registers for intermediate data values and reusing registers when possible, which keeps the size of the resulting automaton under control.
To address the potential state-space explosion, the paper introduces two optimisation techniques: (1) merging bisimilar states that share identical register constraints, and (2) a register minimisation procedure that eliminates redundant registers without affecting the recognised language. The authors also show how to determinise the resulting nondeterministic EDA by a subset construction that respects register equivalence classes, thereby preserving decidability of emptiness and inclusion.
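As a rough illustration of the register-and-guard mechanism described above, the following Python sketch simulates a small nondeterministic register automaton over data words. It is not the paper's EDA definition: the fixed number of registers, the `Transition` fields, the guard names `eq`, `neq` and `any`, and the example automaton are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# A data word is a sequence of (label, data value) pairs.
DataWord = List[Tuple[str, int]]


@dataclass(frozen=True)
class Transition:
    """Hypothetical transition format, chosen for this sketch only."""
    source: str
    label: str
    guard: str                # "eq", "neq" or "any"
    register: Optional[int]   # register tested by the guard (None for "any")
    store: Optional[int]      # register that receives the current datum, if any
    target: str


def accepts(transitions: List[Transition], initial: str, accepting: Set[str],
            num_registers: int, word: DataWord) -> bool:
    """Simulate all runs at once by tracking reachable (state, registers) pairs."""
    configs = {(initial, (None,) * num_registers)}
    for label, datum in word:
        next_configs = set()
        for state, regs in configs:
            for t in transitions:
                if t.source != state or t.label != label:
                    continue
                # Equality guard: current datum must match the register content.
                if t.guard == "eq" and regs[t.register] != datum:
                    continue
                # Inequality guard: current datum must differ from the register.
                if t.guard == "neq" and regs[t.register] == datum:
                    continue
                new_regs = regs
                if t.store is not None:
                    new_regs = regs[:t.store] + (datum,) + regs[t.store + 1:]
                next_configs.add((t.target, new_regs))
        configs = next_configs
    return any(state in accepting for state, _ in configs)


# Example automaton (an assumption for illustration): accept data words in
# which the first data value occurs again at some later position.
FIRST_VALUE_REPEATS = [
    Transition("q0", "a", "any", None, 0, "q1"),     # remember the first datum
    Transition("q1", "a", "neq", 0, None, "q1"),     # skip different data
    Transition("q1", "a", "eq", 0, None, "q2"),      # same datum seen again
    Transition("q2", "a", "any", None, None, "q2"),  # read the rest of the word
]

repeating = [("a", 5), ("a", 3), ("a", 5)]
distinct = [("a", 5), ("a", 3), ("a", 4)]
print(accepts(FIRST_VALUE_REPEATS, "q0", {"q2"}, 1, repeating))
print(accepts(FIRST_VALUE_REPEATS, "q0", {"q2"}, 1, distinct))
```

Tracking sets of configurations keeps the simulation simple for short inputs; it is not meant to reflect the determinisation or minimisation procedures summarised above.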
The main theoretical contributions are two equivalence theorems: (i) every Regular XPath query can be captured by some EDA, and (ii) for every EDA there exists an equivalent Regular XPath expression. The proofs rely on constructing a canonical normal form for EDAs and on demonstrating that the automaton's run semantics coincide with the XPath navigation semantics on data trees. As a consequence, the classic decision problems for XPath (emptiness, containment, and equivalence) reduce to the corresponding problems for EDAs, which are known to be decidable.
The paper also discusses practical implications: the automata-theoretic framework can be used for static analysis of XPath queries in XML databases, for query optimisation by detecting redundant path steps, and for verifying security policies expressed as XPath constraints. In summary, the work bridges the gap between automata theory and XML query languages, providing a robust toolset for reasoning about XPath's expressive limits and for developing algorithmic solutions to its fundamental decision problems.
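To make the notion of XPath-style navigation over data trees more concrete, here is a minimal Python sketch that evaluates one data-value test directly on a tree: it selects every node that has a proper descendant carrying the same data value. The `Node` class, the helper names, and the sample tree are assumptions for illustration; the sketch does not implement the paper's translation to automata.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """A data tree node: a label from a finite alphabet plus a data value."""
    label: str
    data: int
    children: List["Node"] = field(default_factory=list)


def descendants(node: Node) -> Iterator[Node]:
    """Yield all proper descendants of node in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def self_and_descendants(node: Node) -> Iterator[Node]:
    """Yield node followed by all of its proper descendants."""
    yield node
    yield from descendants(node)


def nodes_with_matching_descendant(root: Node) -> List[Node]:
    """Select nodes that have some proper descendant with an equal data value."""
    return [n for n in self_and_descendants(root)
            if any(d.data == n.data for d in descendants(n))]


# Sample data tree (hypothetical): labels a..e with data values 1, 2, 1, 2, 3.
tree = Node("a", 1, [
    Node("b", 2, [Node("c", 1), Node("d", 2)]),
    Node("e", 3),
])

print([n.label for n in nodes_with_matching_descendant(tree)])  # prints ['a', 'b']
```

This direct evaluation only answers the query on one given tree; questions such as emptiness or containment quantify over all trees, which is where an automata-based treatment like the one summarised above comes in.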