Automata on Graph Alphabets
The theory of finite automata concerns itself with words in a free monoid, equipped with concatenation and no further structure. There are, however, important applications that use alphabets which are structured in some sense. We introduce automata over a particular type of structured data, namely an alphabet which is given as a (finite or infinite) directed graph. This constrains concatenation: two strings may only be concatenated if the end vertex of the first is equal to the start vertex of the second. We develop the beginnings of an automata theory for languages on graph alphabets. We show that they admit a Kleene theorem, relating rational and regular languages, and a Myhill-Nerode theorem, stating that languages are regular iff they have finite prefix or, equivalently, suffix quotient. We present determinization and minimization algorithms, but we also show that regular languages are not closed under complementation. Finally, we mention how these structures could be generalized to presimplicial alphabets, where languages are no longer freely generated.
Research Summary
The paper introduces a novel framework for finite automata whose alphabet is not a plain set of symbols but a directed graph, called a “graph alphabet”. A graph alphabet (V, Σ, d₀, d₁) consists of a set of vertices V, a set of edges Σ, and source/target maps d₀, d₁ : Σ → V. A word over this alphabet is a sequence of edges that can be concatenated only when the target vertex of one edge matches the source vertex of the next; thus concatenation is type‑restricted.
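The typing restriction can be illustrated with a minimal Python sketch. The three-edge alphabet and the names `D0`, `D1`, `typed`, and `concat` are hypothetical and chosen only for illustration; a typed word is represented here as a triple (source vertex, tuple of edges, target vertex).

```python
# Hypothetical graph alphabet with vertices x, y and edges
# a: x -> y, b: y -> x, c: y -> y, given by source map D0 and target map D1.
D0 = {"a": "x", "b": "y", "c": "y"}
D1 = {"a": "y", "b": "x", "c": "y"}


def typed(edge):
    """View a single edge as a typed word (source, edges, target)."""
    return (D0[edge], (edge,), D1[edge])


def concat(w1, w2):
    """Typed concatenation: defined only when w1 ends where w2 starts."""
    u1, edges1, v1 = w1
    u2, edges2, v2 = w2
    if v1 != u2:
        return None  # the two words are not composable
    return (u1, edges1 + edges2, v2)


ac = concat(typed("a"), typed("c"))  # composable: a ends at y, c starts at y
aa = concat(typed("a"), typed("a"))  # not composable: a ends at y, a starts at x
```

In this sketch `ac` is the typed word from `x` to `y` spelled `a c`, while `aa` is `None` because the second `a` does not start where the first one ends.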
An automaton on a graph alphabet is defined as A = (Q, I, F, E, s, t, µ, λ), where Q is a set of states, I ⊆ Q and F ⊆ Q are the initial and final states, E is a set of transitions with source and target maps s, t : E → Q, µ : Q → V labels each state with a vertex, and λ : E → Σ labels each transition with an edge. The labeling must respect the graph structure: µ(s(e)) = d₀(λ(e)) and µ(t(e)) = d₁(λ(e)) for every transition e. Consequently, the label of a path π = (q₀, e₁, …, eₙ, qₙ) is a triple (u, ω, v) where u = µ(q₀), v = µ(qₙ) and ω = λ(e₁)…λ(eₙ). The language L(A) consists of all such triples produced by accepting paths; the “untyped” language Lᵇ(A) forgets the vertex information and yields a subset of Σ*.
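As a rough illustration of this definition, the following Python sketch stores an automaton, checks the typing condition on transitions, and enumerates the labels of short accepting paths. The class name `Automaton`, the helper names, and the example alphabet are assumptions made for this sketch. Transitions are stored as triples (source state, edge, target state), and only paths with at least one transition are enumerated, since the treatment of empty paths is not specified here.

```python
from dataclasses import dataclass

# Hypothetical graph alphabet: a: x -> y, c: y -> y.
D0 = {"a": "x", "c": "y"}
D1 = {"a": "y", "c": "y"}


@dataclass
class Automaton:
    initial: set      # I
    final: set        # F
    transitions: set  # triples (s(e), lambda(e), t(e)), one per transition e
    mu: dict          # state -> vertex label


def well_typed(aut):
    """Check mu(s(e)) = d0(lambda(e)) and mu(t(e)) = d1(lambda(e)) for all e."""
    return all(aut.mu[p] == D0[a] and aut.mu[q] == D1[a]
               for (p, a, q) in aut.transitions)


def language_up_to(aut, max_len):
    """Labels (u, omega, v) of accepting paths with 1..max_len transitions."""
    result = set()
    frontier = {(q0, (), q0) for q0 in aut.initial}  # (start, word, current)
    for _ in range(max_len):
        frontier = {(q0, word + (a,), q)
                    for (q0, word, p) in frontier
                    for (p2, a, q) in aut.transitions if p2 == p}
        result |= {(aut.mu[q0], word, aut.mu[q])
                   for (q0, word, q) in frontier if q in aut.final}
    return result


# Example: state 0 sits over vertex x, state 1 over vertex y.
AUT = Automaton(initial={0}, final={1},
                transitions={(0, "a", 1), (1, "c", 1)},
                mu={0: "x", 1: "y"})
```

For this example automaton, `language_up_to(AUT, 2)` contains the typed words `a` and `a c`, both from `x` to `y`.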
The authors first observe that the untyped language of any finite graph‑alphabet automaton is regular in the classical sense (Proposition 2), because one can collapse the vertex information via a homomorphism to a one‑vertex graph. They then develop a full theory of regular languages over graph alphabets, establishing a Kleene theorem (Theorem 18) that equates two notions:
- Rational sets – those built from basic edge sets using union, concatenation, and Kleene plus (no star, because the unit would be an infinite set of identity morphisms when V is infinite);
- Regular sets – those recognized by some finite automaton on the graph alphabet.
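The rational operations can be sketched directly on finite sets of typed words. The code below is an illustration under assumptions, not the paper's formalism: words are triples (source, edges, target), the function names are hypothetical, every word in the input is assumed to contain at least one edge, and Kleene plus is truncated at a length bound so that the result stays finite.

```python
def concat_lang(X, Y):
    """Typed concatenation of languages: only composable pairs contribute."""
    return {(u1, w1 + w2, v2)
            for (u1, w1, v1) in X
            for (u2, w2, v2) in Y
            if v1 == u2}


def plus_up_to(X, max_len):
    """All words of X+ (one or more factors from X) with at most max_len edges."""
    result = {word for word in X if len(word[1]) <= max_len}
    frontier = set(result)
    while frontier:
        longer = {word for word in concat_lang(frontier, X)
                  if len(word[1]) <= max_len}
        frontier = longer - result  # keep only words not seen before
        result |= frontier
    return result


# Hypothetical basic edge set: a: x -> y and b: y -> x.
X = {("x", ("a",), "y"), ("y", ("b",), "x")}
P = plus_up_to(X, 2)
```

With this input, `P` contains `a`, `b`, `a b` (from `x` to `x`), and `b a` (from `y` to `y`), but not `a a` or `b b`, which are not composable.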
To prove the Kleene theorem, the paper defines basic automata for the empty set and for each single edge, shows how to construct automata for union (disjoint union of states), concatenation (by adding silent transitions that connect accepting states of the first automaton to initial states of the second when the vertex labels match), and Kleene plus (by adding silent loops from accepting to initial states of the same automaton). Lemma 12 guarantees that silent transitions can be eliminated without changing the language, mirroring the ε‑closure construction in classical automata theory. The converse direction (regular ⇒ rational) follows a Brzozowski‑McCluskey style induction on the number of states, carefully handling the vertex labels throughout.
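The concatenation construction and the removal of silent transitions might look roughly as follows in Python. This is a sketch under several assumptions: automata are plain dictionaries with hypothetical keys `init`, `final`, `trans`, and `mu`; silent transitions are kept as a separate set of state pairs; only paths with at least one labelled transition count as accepting; and no initial state of the first automaton is final, since how the paper treats empty paths is not stated here.

```python
def tagged(aut, tag):
    """Rename every state q to (tag, q) so two automata can be combined disjointly."""
    def r(q):
        return (tag, q)
    return {
        "init": {r(q) for q in aut["init"]},
        "final": {r(q) for q in aut["final"]},
        "trans": {(r(p), a, r(q)) for (p, a, q) in aut["trans"]},
        "mu": {r(q): v for q, v in aut["mu"].items()},
    }


def concat_with_silent(A, B):
    """Disjoint union plus silent moves from final states of A to initial
    states of B, added only when both states lie over the same vertex."""
    A, B = tagged(A, 0), tagged(B, 1)
    silent = {(f, i) for f in A["final"] for i in B["init"]
              if A["mu"][f] == B["mu"][i]}
    combined = {"init": A["init"], "final": B["final"],
                "trans": A["trans"] | B["trans"],
                "mu": {**A["mu"], **B["mu"]}}
    return combined, silent


def eliminate_silent(aut, silent):
    """Forward closure: redirect each labelled transition to every state
    reachable from its target through silent moves."""
    reach = {q: {q} for q in aut["mu"]}
    changed = True
    while changed:
        changed = False
        for (p, q) in silent:
            for r in reach:
                if p in reach[r] and not reach[q] <= reach[r]:
                    reach[r] |= reach[q]
                    changed = True
    return {
        "init": {r for q in aut["init"] for r in reach[q]},
        "final": aut["final"],
        "trans": {(p, a, r) for (p, a, q) in aut["trans"] for r in reach[q]},
        "mu": aut["mu"],
    }


# Hypothetical example: A accepts a: x -> y, B accepts b: y -> x.
A = {"init": {0}, "final": {1}, "trans": {(0, "a", 1)}, "mu": {0: "x", 1: "y"}}
B = {"init": {0}, "final": {1}, "trans": {(0, "b", 1)}, "mu": {0: "y", 1: "x"}}
combined, silent = concat_with_silent(A, B)
C = eliminate_silent(combined, silent)
```

Because silent moves only connect states over the same vertex, redirecting a labelled transition along them keeps the automaton well typed. In the example, `C` gains a transition labelled `a` that leads straight into the initial state of `B`, so the path `a b` from `x` to `x` is accepted without silent moves.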
The paper also shows that regular languages over graph alphabets are closed under intersection (Definition 19, Lemma 20) by constructing a product automaton whose state space consists of pairs of states with matching vertex labels, and under union and concatenation by the constructions already described. However, regular languages are not closed under complement; a counterexample is provided where taking the complement would require paths whose start and end vertices do not match any edge in the graph, making it impossible to represent the complement with a finite graph‑alphabet automaton.
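A product construction of this kind can be sketched in a few lines of Python. The dictionary layout (keys `init`, `final`, `trans`, `mu`) and the function name `product` are assumptions of this sketch, and both input automata are assumed to be well typed and free of silent transitions.

```python
def product(A, B):
    """Product automaton on pairs of states lying over the same vertex."""
    states = {(p, q) for p in A["mu"] for q in B["mu"]
              if A["mu"][p] == B["mu"][q]}
    return {
        "init": {(p, q) for (p, q) in states
                 if p in A["init"] and q in B["init"]},
        "final": {(p, q) for (p, q) in states
                  if p in A["final"] and q in B["final"]},
        # Both components must read the same edge; for well-typed inputs
        # the endpoints then automatically have matching vertex labels.
        "trans": {((p1, q1), a, (p2, q2))
                  for (p1, a, p2) in A["trans"]
                  for (q1, b, q2) in B["trans"] if a == b},
        "mu": {(p, q): A["mu"][p] for (p, q) in states},
    }


# Hypothetical example over edges a: x -> y and c: y -> y.
# A accepts a, a c, a c c, ...; B accepts only a and a c.
A = {"init": {0}, "final": {1},
     "trans": {(0, "a", 1), (1, "c", 1)},
     "mu": {0: "x", 1: "y"}}
B = {"init": {0}, "final": {1, 2},
     "trans": {(0, "a", 1), (1, "c", 2)},
     "mu": {0: "x", 1: "y", 2: "y"}}
P = product(A, B)
```

Here `P` has exactly the transitions needed to accept `a` and `a c`, the words common to both languages.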
A Myhill‑Nerode theorem is proved for graph‑alphabet languages. For a language X ⊆ (V, Σ)⁎, left and right quotients are defined with respect to morphisms (u, ω, v). The set suff(X) of all right quotients (suffixes) and the set pref(X) of all left quotients (prefixes) are each shown to be finite exactly when X is regular (Lemma 21). In addition, any rational language has a finite set of source vertices d₀(X) (Lemma 22), which leads to a finite Nerode congruence (Definition 23). Thus, regularity can be characterized by the finiteness of these quotient sets, just as in the classical case, but now the quotients respect the graph typing.
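To make the idea of a typed quotient concrete, here is a small Python sketch on a finite language. The definition used is an assumption and may differ from the paper's in detail: the quotient of X by a typed prefix (u, ω, v) is taken to be the set of continuations (v, ω', w) such that (u, ωω', w) lies in X, with the empty continuation allowed. The names `suffix_quotient` and `distinct_quotients` and the target map `D1` are hypothetical.

```python
# Hypothetical target map for edges a: x -> y and c: y -> y.
D1 = {"a": "y", "c": "y"}


def suffix_quotient(X, prefix):
    """Continuations of the typed prefix (u, w, v) inside X."""
    u, w, v = prefix
    n = len(w)
    return frozenset((v, word[n:], target)
                     for (source, word, target) in X
                     if source == u and word[:n] == w)


def distinct_quotients(X):
    """Quotients of X by every nonempty typed prefix of a word in X."""
    prefixes = {(source, word[:i], D1[word[i - 1]])
                for (source, word, target) in X
                for i in range(1, len(word) + 1)}
    return {suffix_quotient(X, p) for p in prefixes}


# Finite truncation of the language a c* from x to y.
X = {("x", ("a",), "y"), ("x", ("a", "c"), "y"), ("x", ("a", "c", "c"), "y")}
Q = suffix_quotient(X, ("x", ("a",), "y"))
```

On this truncated sample the prefixes `a`, `a c`, and `a c c` give three different quotients. That is an artifact of the truncation: in the full language `a c*` every one of these prefixes has the same continuations, namely all words `c*` from `y` to `y`.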
Determinization and minimization are addressed. Determinization proceeds by the standard subset construction, with the additional requirement that all states in a subset share the same vertex label; silent transitions are first removed, then subsets are formed, preserving the typing constraint. Minimization follows the usual partition refinement: states are merged when they have identical vertex labels and equivalent future behavior. The paper claims that minimal deterministic automata exist and can be effectively computed.
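A typed subset construction along these lines might be sketched as follows. The dictionary layout and the function name `determinize` are assumptions, the input is assumed to be well typed with silent transitions already removed, and "deterministic" is read here as: at most one initial state per vertex and at most one transition per state and edge. Minimization is not covered by this sketch.

```python
def determinize(aut):
    """Subset construction in which every subset lies over a single vertex."""
    # One initial subset per vertex that carries an initial state.
    by_vertex = {}
    for q in aut["init"]:
        by_vertex.setdefault(aut["mu"][q], set()).add(q)
    init = {frozenset(s) for s in by_vertex.values()}

    states, trans, todo = set(init), set(), list(init)
    while todo:
        S = todo.pop()
        targets = {}
        for (p, a, q) in aut["trans"]:
            if p in S:
                targets.setdefault(a, set()).add(q)
        for a, T in targets.items():
            # Every state in T is the target of an a-transition, so all of
            # them lie over the target vertex of a.
            T = frozenset(T)
            trans.add((S, a, T))
            if T not in states:
                states.add(T)
                todo.append(T)

    return {
        "init": init,
        "final": {S for S in states if S & aut["final"]},
        "trans": trans,
        "mu": {S: aut["mu"][next(iter(S))] for S in states},
    }


# Hypothetical nondeterministic example over a: x -> y and c: y -> y.
N = {"init": {0}, "final": {2},
     "trans": {(0, "a", 1), (0, "a", 2), (1, "c", 1)},
     "mu": {0: "x", 1: "y", 2: "y"}}
D = determinize(N)
```

In the example, the two `a`-transitions leaving state 0 are merged into a single transition to the subset `{1, 2}`, which lies over `y` and is final because it contains state 2.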
Finally, the authors discuss possible extensions to presimplicial alphabets, where the underlying structure includes higher‑dimensional cells (2‑cells, 3‑cells, etc.) rather than just edges. This would allow modeling of more complex concurrent behaviors, such as ST‑automata, where events can be started, terminated, or run concurrently. The paper suggests that many of the results (Kleene theorem, Myhill‑Nerode characterization) may carry over with suitable adaptations, opening a research direction toward automata on richer combinatorial structures.
In summary, the work systematically builds an automata theory for languages whose alphabet carries a graph‑based typing discipline. It extends fundamental results (Kleene’s theorem, the Myhill‑Nerode theorem, determinization, and minimization) to this setting, while also highlighting a crucial divergence: the lack of closure under complement. The framework unifies several application domains (resource‑controlled concurrency, type‑checking of function compositions, database triggers) under a common mathematical model and points toward further generalizations involving higher‑dimensional combinatorial alphabets.