Operational State Complexity of Deterministic Unranked Tree Automata

We consider the state complexity of basic operations on tree languages recognized by deterministic unranked tree automata. For the operations of union and intersection the upper and lower bounds of both weakly and strongly deterministic tree automata are obtained. For tree concatenation we establish a tight upper bound that is of a different order than the known state complexity of concatenation of regular string languages. We show that (n+1)((m+1)2ⁿ − 2ⁿ⁻¹) − 1 vertical states are sufficient, and necessary in the worst case, to recognize the concatenation of tree languages recognized by (strongly or weakly) deterministic automata with, respectively, m and n vertical states.

💡 Research Summary

This paper investigates the state‑complexity of fundamental operations on languages recognized by deterministic unranked tree automata (DUTAs). Unranked trees, where each node may have an arbitrary number of children, model widely used hierarchical data such as XML. The authors distinguish two deterministic models: strongly deterministic automata, whose transition function is total for every possible input, and weakly deterministic automata, which may have undefined transitions on some inputs but still recognize a well‑defined language. Both models employ a two‑dimensional state system: vertical states that follow the parent‑child direction and horizontal states that handle sibling ordering.

The study first addresses union and intersection. For two automata A and B with m and n vertical states respectively, the authors show that the worst‑case number of vertical states required by a deterministic automaton (strong or weak) to recognize the union (or intersection) of the two languages is exactly m·n. The upper bound follows the classic Cartesian‑product construction adapted to the tree setting, while the lower bound is demonstrated by constructing language families that force every pair of vertical states to be distinguishable. Horizontal states are handled analogously, and the results hold for both strong and weak determinism.
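The pairing idea behind the m·n figure can be sketched in a few lines of Python. This is a simplified illustration and not the paper's construction: horizontal states are abstracted away into an ordinary function from a label and the tuple of child states to a vertical state, and the two example automata (`delta_a`, `delta_b`) and their accepting sets are hypothetical.

```python
# Illustrative sketch only: horizontal states are hidden inside plain Python
# functions, which is an assumption of this example, not the paper's model.

def run(delta, tree):
    """Evaluate a tree bottom-up; a tree is a (label, children) pair."""
    label, children = tree
    return delta(label, tuple(run(delta, child) for child in children))

def product(delta_a, delta_b):
    """Pair two transition functions so both automata run in lockstep."""
    def delta(label, child_states):
        left = tuple(p for p, _ in child_states)
        right = tuple(q for _, q in child_states)
        return (delta_a(label, left), delta_b(label, right))
    return delta

# Hypothetical automaton A: the state is the parity of the number of 'a' nodes.
def delta_a(label, child_states):
    return (sum(child_states) + (label == "a")) % 2

# Hypothetical automaton B: the state is 1 iff some node is labelled 'b'.
def delta_b(label, child_states):
    return int(label == "b" or any(child_states))

ACCEPTING_A = {1}
ACCEPTING_B = {1}

def accepts_union(tree):
    p, q = run(product(delta_a, delta_b), tree)
    return p in ACCEPTING_A or q in ACCEPTING_B

def accepts_intersection(tree):
    p, q = run(product(delta_a, delta_b), tree)
    return p in ACCEPTING_A and q in ACCEPTING_B

if __name__ == "__main__":
    tree = ("a", [("b", []), ("a", [])])
    print(run(product(delta_a, delta_b), tree))  # pair of vertical states
    print(accepts_union(tree), accepts_intersection(tree))
```

Union and intersection share the same paired transition function and differ only in which pairs count as accepting, so the product has at most m·n reachable vertical states in either case.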

The core contribution concerns tree concatenation, an operation whose state complexity differs from that of string concatenation because of the hierarchical nature of trees. The concatenation considered here attaches the root of a second tree to a leaf of the first tree, preserving the root of the first tree. The authors prove a tight bound on the number of vertical states needed to recognize the concatenation of languages L₁ and L₂, where L₁ is recognized by an automaton with m vertical states and L₂ by one with n vertical states. The bound is

(n + 1)·((m + 1)·2ⁿ − 2ⁿ⁻¹) − 1.

The term (n + 1) accounts for the n vertical states of the second automaton plus an extra “waiting” state that represents the moment before the second tree is attached. The factor ((m + 1)·2ⁿ − 2ⁿ⁻¹) arises from a refined product construction that must keep track of all possible subsets of the second automaton’s vertical states (hence the 2ⁿ term) while also remembering whether the attachment point has been reached (the +1 and −2ⁿ⁻¹ adjustments). The authors give an explicit constructive upper‑bound automaton that realises this state count, and they prove optimality by presenting a family of languages for which any deterministic automaton must use exactly this many vertical states. This demonstrates that tree concatenation incurs a state explosion of a different order than the well‑known string concatenation bound (m·2ⁿ − 2ⁿ⁻¹).
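The gap between the tree bound and the string bound can be checked numerically. The snippet below is a small sketch that evaluates the two formulas quoted above; the function names are illustrative, and m and n are assumed to be positive integers.

```python
def tree_concat_bound(m, n):
    """Vertical-state bound for tree concatenation quoted in the summary."""
    return (n + 1) * ((m + 1) * 2**n - 2**(n - 1)) - 1

def string_concat_bound(m, n):
    """Classic state-complexity bound for concatenating two string DFAs."""
    return m * 2**n - 2**(n - 1)

if __name__ == "__main__":
    # Compare both bounds for a few small, equal-sized automata.
    for k in range(2, 7):
        print(k, tree_concat_bound(k, k), string_concat_bound(k, k))
```

For m = n = 2 the formulas give 29 and 6 respectively, and for m = n = 3 they give 111 and 20. Both grow exponentially in n, but the tree bound carries the additional factor (n + 1) and uses m + 1 in place of m.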

The paper also analyses the impact of strong versus weak determinism on these bounds. Although the constructions differ slightly—weak automata may omit some transitions—the final state‑complexity figures are identical for both models. This equivalence suggests that, from a complexity‑theoretic perspective, the choice between strong and weak determinism does not affect the worst‑case resource requirements, while weak determinism may still be preferable in practice because of its more permissive transition definition.

Beyond the technical proofs, the authors discuss practical implications. The results give precise worst-case estimates for the number of vertical states, and hence the memory consumption, of deterministic tree-automaton based tools such as XML schema validators, tree-pattern query engines, and static analyzers for hierarchical data. Knowing that concatenation can cause a blow-up in the number of vertical states that is exponential in n, and larger than in the string case, warns designers to avoid naïve concatenation of large tree languages or to employ nondeterministic or symbolic techniques when scalability is a concern.

In summary, the paper extends the theory of state complexity from regular string languages to deterministic unranked tree automata, delivering upper and lower bounds for union, intersection, and especially concatenation. The concatenation bound, (n+1)((m+1)2ⁿ−2ⁿ⁻¹)−1, is novel and highlights how the hierarchical, unranked nature of trees changes the combinatorial landscape of automata operations. These findings both enrich the theoretical understanding of tree automata and provide actionable guidance for developers of tree-processing systems.

📜 Original Paper Content