Automata for two-variable logic over trees with ordered data values

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Data trees are trees in which each node, besides carrying a label from a finite alphabet, also carries a data value from an infinite domain. They have been used as an abstraction model for reasoning tasks on XML and verification. However, most existing approaches consider the case where only equality tests can be performed on the data values. In this paper we study data trees in which the data values come from a linearly ordered domain, and in addition to equality tests, we can test whether the data value in a node is greater than the one in another node. We introduce an automata model for them which we call ordered-data tree automata (ODTA), provide its logical characterisation, and prove that its non-emptiness problem is decidable in 3-NEXPTIME. We also show that the two-variable logic on unranked trees, studied by Bojanczyk, Muscholl, Schwentick and Segoufin in 2009, corresponds precisely to a special subclass of this automata model. Then we define a slightly weaker version of ODTA, which we call weak ODTA, and provide its logical characterisation. The complexity of the non-emptiness problem drops to NP. Nevertheless, a number of existing formalisms and models studied in the literature can already be captured by weak ODTA. We also show that the definition of ODTA can easily be modified to the case where the data values come from a tree-like partially ordered domain, such as strings.

💡 Research Summary

The paper addresses the limitation of most existing data-tree frameworks, which only allow equality tests on data values, by considering data trees whose node values come from an infinite linearly ordered domain. In this setting one can test both equality (=) and order (>) between data values. To capture this expressive power the authors introduce a new automaton model called Ordered-Data Tree Automata (ODTA). An ODTA works on unranked trees whose nodes carry a label from a finite alphabet and a data value; its transition rules may refer to the label, the equality of data values, and the strict order between a node's data value and those of its children or siblings.
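To make the data model concrete, here is a minimal sketch of an unranked data tree with the two kinds of data tests. It is our own illustration, not the paper's formalism: the class `Node` and all function names are hypothetical, and Python integers stand in for the infinite linearly ordered domain. The example property (data values strictly increase from parent to child) is not preserved when data values are renamed by an arbitrary bijection, so it cannot be expressed with equality tests alone.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """A node of an unranked data tree (hypothetical representation)."""
    label: str                      # letter from a finite alphabet
    data: int                       # value from an infinite, linearly ordered domain
    children: List["Node"] = field(default_factory=list)


def nodes_with_parent(
    root: Node, parent: Optional[Node] = None
) -> Iterator[Tuple[Node, Optional[Node]]]:
    """Yield (node, parent) pairs in pre-order; the root's parent is None."""
    yield root, parent
    for child in root.children:
        yield from nodes_with_parent(child, root)


def same_data(x: Node, y: Node) -> bool:
    # Equality test: the only data test available in equality-only models.
    return x.data == y.data


def greater_data(x: Node, y: Node) -> bool:
    # Order test: is the data value at x strictly greater than the one at y?
    return x.data > y.data


def data_increases_downwards(root: Node) -> bool:
    """True iff every non-root node has a strictly larger data value than its parent."""
    return all(
        parent is None or greater_data(node, parent)
        for node, parent in nodes_with_parent(root)
    )


tree = Node("a", 1, [Node("b", 5), Node("b", 3, [Node("c", 4)])])
print(data_increases_downwards(tree))  # True
```

Swapping the data values 1 and 5 in `tree` keeps every equality relationship intact but makes the property fail. This is exactly the kind of distinction that order tests add.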

A central contribution is a logical characterisation of the languages recognised by ODTA. In addition, the authors show that the two-variable first-order logic (FO²) on unranked data trees studied by Bojanczyk, Muscholl, Schwentick and Segoufin in 2009 corresponds precisely to a special subclass of ODTA: every FO² sentence can be translated into an equivalent automaton of this subclass, and every automaton of the subclass can be expressed as an FO² sentence. This gives a tight correspondence between that logic and a fragment of the new automaton model.
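To illustrate what "two-variable logic over data trees" means semantically, the sketch below checks one sentence by brute force on a finite tree. It shows only the meaning of such a formula and is not the paper's translation into ODTA. The sentence, the `Node` class and the helper names are hypothetical. Here `x ~ y` means "x and y carry the same data value", and `child(y, x)` means "x is a child of y".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    label: str
    data: int
    children: List["Node"] = field(default_factory=list)


def all_nodes(root: Node) -> List[Node]:
    """The finite domain of the tree: all nodes in pre-order."""
    result = [root]
    for child in root.children:
        result.extend(all_nodes(child))
    return result


def parent_of(root: Node) -> Dict[int, Node]:
    """Map id(child) -> parent, encoding the child relation of the tree."""
    parents: Dict[int, Node] = {}
    for node in all_nodes(root):
        for child in node.children:
            parents[id(child)] = node
    return parents


def holds(root: Node) -> bool:
    """Brute-force check of the hypothetical two-variable sentence
        forall x ( b(x) -> exists y ( child(y, x) and a(y) and x ~ y ) ),
    i.e. every b-node has an a-labelled parent with the same data value.
    The nested quantifiers only ever bind the two variables x and y."""
    domain = all_nodes(root)
    parents = parent_of(root)

    def child(y: Node, x: Node) -> bool:
        # child(y, x): x is a child of y.
        return parents.get(id(x)) is y

    return all(
        x.label != "b"
        or any(child(y, x) and y.label == "a" and x.data == y.data for y in domain)
        for x in domain
    )


good = Node("a", 7, [Node("b", 7), Node("a", 2, [Node("b", 2)])])
bad = Node("a", 7, [Node("b", 8)])
print(holds(good), holds(bad))  # True False
```

The check takes quadratic time in the number of nodes. This is fine for evaluating a formula on one given tree. The difficult question the paper addresses is satisfiability (equivalently, automaton non-emptiness) over all data trees, which brute force cannot decide.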

The authors prove that the non-emptiness problem for ODTA is decidable. By reducing ODTA emptiness to the emptiness of a suitable class of counter-free tree automata enriched with Presburger constraints, they obtain an algorithm running in nondeterministic triple-exponential time (3-NEXPTIME). Thus, despite the presence of order tests over an infinite domain, non-emptiness remains decidable.
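For reference, 3-NEXPTIME is the standard class of problems decidable by a nondeterministic Turing machine whose running time is bounded by a tower of three exponentials over a polynomial in the input size n:

```latex
\mathrm{3\text{-}NEXPTIME} \;=\; \bigcup_{k \ge 1} \mathrm{NTIME}\!\left(2^{2^{2^{n^{k}}}}\right),
\qquad \text{e.g. } n = 2,\ k = 1:\quad 2^{2^{2^{2}}} = 2^{2^{4}} = 2^{16} = 65536 .
```

Even for tiny inputs the bound grows very quickly. This explains why the drop to NP for the weak variant matters in practice.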

Building on this, the paper defines a slightly weaker variant, weak ODTA, which restricts the use of order tests and simplifies the transition structure, and gives its logical characterisation. For weak ODTA the complexity of the non-emptiness problem drops to NP, which makes the model considerably more attractive for verification tasks. The authors show that a number of previously studied formalisms and models (for example, XML schema validation, XPath-like navigation queries, and certain database integrity constraints) can already be captured by weak ODTA, so the weaker model still has substantial expressive power.
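As a concrete picture of the "database integrity constraints" mentioned above, the sketch below checks two classic XML constraints directly on a data tree. The first is a key constraint: no two a-nodes share a data value. The second is an inclusion (foreign-key-like) constraint: every data value on a b-node also occurs on some a-node. Whether and how weak ODTA encode exactly these constraints is not stated in this summary. The code is a direct check, not an automaton construction, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    data: int
    children: List["Node"] = field(default_factory=list)


def data_values(root: Node, label: str) -> List[int]:
    """Data values of all nodes carrying `label`, in pre-order (with repetitions)."""
    values = [root.data] if root.label == label else []
    for child in root.children:
        values.extend(data_values(child, label))
    return values


def satisfies_key(root: Node, label: str) -> bool:
    # Key constraint: no two `label`-nodes carry the same data value.
    values = data_values(root, label)
    return len(values) == len(set(values))


def satisfies_inclusion(root: Node, sub: str, sup: str) -> bool:
    # Inclusion constraint: every data value on a `sub`-node occurs on some `sup`-node.
    return set(data_values(root, sub)) <= set(data_values(root, sup))


doc = Node("r", 0, [Node("a", 1), Node("a", 2), Node("b", 2)])
print(satisfies_key(doc, "a"), satisfies_inclusion(doc, "b", "a"))  # True True
```

Both checks use only equality of data values. Constraints of this shape are therefore a natural fit for the weaker, cheaper model.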

Finally, the authors show that the definition of ODTA can easily be modified to handle data domains equipped with a tree-like partial order (e.g., strings ordered by prefix), with the comparison predicates adjusted to respect the partial order. This indicates that the approach is robust with respect to the underlying data ordering.
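To see why strings under the prefix order form a tree-like partial order, the sketch below assumes the common definition: for every element, the set of elements below it is linearly ordered (a chain). The paper's exact definition may differ in details. The order is only partial, since "ab" and "ac" are incomparable, but the predecessors of any string are exactly its prefixes, and these form a chain. The function names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List


def is_prefix(u: str, v: str) -> bool:
    # The partial order: u <= v iff u is a prefix of v.
    return v.startswith(u)


def predecessors(v: str) -> List[str]:
    # Everything below v in the prefix order: all prefixes of v, shortest first.
    return [v[:i] for i in range(len(v) + 1)]


def is_chain(elements: Iterable[str]) -> bool:
    # A chain: every two elements are comparable, i.e. linearly ordered.
    return all(is_prefix(u, w) or is_prefix(w, u) for u, w in combinations(elements, 2))


def meet(u: str, v: str) -> str:
    # Greatest lower bound of u and v in the prefix order: their longest common prefix.
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]


print(is_chain(predecessors("abc")))                   # True: "" < "a" < "ab" < "abc"
print(is_prefix("ab", "ac") or is_prefix("ac", "ab"))  # False: incomparable
print(meet("abba", "abc"))                             # ab
```

The `meet` function shows the tree shape: any two strings branch apart at their longest common prefix, much as two nodes of a tree meet at their lowest common ancestor.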

In summary, the paper makes four major contributions: (1) the definition of ODTA, an automaton model that integrates order tests on data values; (2) logical characterisations of ODTA and weak ODTA, together with the result that the two-variable logic on unranked data trees of Bojanczyk, Muscholl, Schwentick and Segoufin corresponds precisely to a special subclass of ODTA; (3) decidability results showing that non-emptiness is in 3-NEXPTIME for ODTA and in NP for weak ODTA; and (4) an adaptation to tree-like partially ordered data domains. These results advance the theoretical foundations of data-tree verification and open new avenues for applying automata-based techniques to XML processing, database theory, and related areas where ordered data values are essential.
