Representing Small Ordinals by Finite Automata

It is known that an ordinal is the order type of the lexicographic ordering of a regular language if and only if it is less than omega^omega. We design a polynomial time algorithm that constructs, for each well-ordered regular language L with respect to the lexicographic ordering, given by a deterministic finite automaton, the Cantor Normal Form of its order type. It follows that there is a polynomial time algorithm to decide whether two deterministic finite automata accepting well-ordered regular languages accept isomorphic languages. We also give estimates on the size of the smallest automaton representing an ordinal less than omega^omega, together with an algorithm that translates each such ordinal to an automaton.

💡 Research Summary

The paper investigates the precise relationship between regular languages ordered lexicographically and the ordinals they represent. It builds on the known result that an ordinal is the order type of a lexicographically well‑ordered regular language if and only if it is below ω^ω. The authors present a deterministic polynomial‑time algorithm that, given a DFA recognizing a well‑ordered regular language, computes the Cantor Normal Form (CNF) of the language’s order type.
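As a small illustration of what "order type of a regular language" means, the following worked examples use a two-letter alphabet {a, b}. They are standard textbook-style examples chosen here for illustration, not taken from the paper; the lexicographic order is assumed to place a proper prefix before its extensions, and the letter order is stated per example.

```latex
\begin{align*}
L_1 &= a^{*}: & \varepsilon < a < aa < \cdots
    && \text{order type } \omega \\
L_2 &= a^{*} \cup \{b\} \ (a < b): & \varepsilon < a < aa < \cdots < b
    && \text{order type } \omega + 1 \\
L_3 &= a^{*}b \ (b < a): & b < ab < aab < \cdots
    && \text{order type } \omega \\
L_4 &= a^{*}b \ (a < b): & \cdots < aab < ab < b
    && \text{not well-ordered}
\end{align*}
% General shape of a Cantor Normal Form below \omega^{\omega}:
\[
\alpha = \omega^{k_1} \cdot c_1 + \cdots + \omega^{k_m} \cdot c_m,
\qquad k_1 > \cdots > k_m \ge 0, \quad c_i \ge 1 .
\]
```

The pair L_3 and L_4 shows that the same set of words can be well-ordered under one letter order and not under the other, which is why the algorithm takes well-orderedness as a precondition.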

The algorithm proceeds in four main phases. First, the DFA is minimized and its states are topologically sorted according to the lexicographic order; this guarantees an acyclic transition structure because any infinite descending chain would contradict well‑ordering. Second, a “height” value h(q) is assigned to each state q, representing the highest exponent k such that the suffix language from q contributes a term ω^k. The heights are computed by a backward dynamic‑programming pass in O(|Q|·|Σ|) time. Third, for every transition (q, a, p) the algorithm extracts a term ω^{h(p)}·c, where the coefficient c reflects whether the transition leads to an accepting state and how many parallel branches of the same height exist. All terms are collected, grouped by exponent, and summed to obtain a partial order type for each state. Finally, the terms are normalized into a unique Cantor Normal Form ω^{k₁}·c₁ + … + ω^{k_m}·c_m with k₁ > … > k_m.
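The final normalization step relies on ordinal addition, which is not commutative: a smaller power of ω on the left is absorbed by a larger one on the right (for example, 1 + ω = ω). The sketch below shows only this arithmetic for ordinals below ω^ω. It is not the paper's algorithm; the list-of-pairs representation and the names `add`, `ordinal_sum`, and `show` are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: an ordinal below omega^omega in Cantor Normal
# Form is a list of (exponent, coefficient) pairs with strictly decreasing
# exponents and positive coefficients. The empty list stands for 0.
CNF = List[Tuple[int, int]]


def add(alpha: CNF, beta: CNF) -> CNF:
    """Return the ordinal sum alpha + beta (order matters)."""
    if not beta:
        return list(alpha)
    lead_exp, lead_coeff = beta[0]
    # Terms of alpha below beta's leading exponent are absorbed by beta.
    kept = [(e, c) for (e, c) in alpha if e >= lead_exp]
    if kept and kept[-1][0] == lead_exp:
        # Equal exponents: the coefficients add up.
        e, c = kept.pop()
        return kept + [(e, c + lead_coeff)] + list(beta[1:])
    return kept + list(beta)


def ordinal_sum(parts: Iterable[CNF]) -> CNF:
    """Sum a sequence of ordinals from left to right."""
    total: CNF = []
    for part in parts:
        total = add(total, part)
    return total


def show(alpha: CNF) -> str:
    """Render a CNF list in the usual notation."""
    if not alpha:
        return "0"
    pieces = []
    for e, c in alpha:
        if e == 0:
            pieces.append(str(c))
        else:
            base = "ω" if e == 1 else f"ω^{e}"
            pieces.append(base if c == 1 else f"{base}·{c}")
    return " + ".join(pieces)


if __name__ == "__main__":
    one, omega = [(0, 1)], [(1, 1)]
    print(show(add(one, omega)))   # 1 + ω
    print(show(add(omega, one)))   # ω + 1
```

Because each call to `add` only scans its two arguments, summing a list of terms stays polynomial in the total number of terms.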

Complexity analysis shows that each phase runs in polynomial time; the most expensive step (term aggregation) is O(|Q|³) in the worst case, but practical implementations typically achieve O(|Q|²). Consequently, the CNF of any well‑ordered regular language can be produced efficiently.

Beyond extraction, the paper studies the size relationship between an ordinal α < ω^ω and the smallest DFA that realizes it. Writing α in CNF as Σ_{i=1}^m ω^{k_i}·c_i (with decreasing exponents), the authors construct a DFA whose state count is O(∑_{i=1}^m k_i·c_i). This construction interprets each term as a “tower” of depth k_i with c_i parallel branches, yielding a linear‑in‑the‑CNF bound on the automaton size. Conversely, they prove that any DFA with n states can represent only ordinals below ω^{O(n)}, establishing an upper bound on the expressive power of a given automaton.
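To make the "tower" intuition concrete, here is a minimal sketch of one well-known way to realize a single power ω^k: the language (a*b)^k under the assumed letter order b < a, accepted by a partial DFA with k + 1 states. This is an illustrative construction and not necessarily the one used in the paper; the helper names (`tower_dfa`, `accepts`, `lex_key`, `block_lengths`) are hypothetical. The check at the end only compares orders on words up to a bounded length, so it is a sanity test, not a proof.

```python
from itertools import product

RANK = {"b": 0, "a": 1}  # assumed letter order: b < a


def tower_dfa(k):
    """Partial DFA for (a*b)^k: states 0..k, start 0, accepting {k}."""
    delta = {}
    for q in range(k):
        delta[(q, "a")] = q      # stay in the current block while reading a's
        delta[(q, "b")] = q + 1  # a 'b' closes the block
    return delta, 0, {k}


def accepts(dfa, word):
    delta, state, finals = dfa
    for ch in word:
        if (state, ch) not in delta:
            return False         # missing transition means rejection
        state = delta[(state, ch)]
    return state in finals


def lex_key(word):
    """Sort key for lexicographic order; a proper prefix sorts first."""
    return [RANK[ch] for ch in word]


def block_lengths(word):
    """Map a^{i1} b ... a^{ik} b to the tuple (i1, ..., ik)."""
    return tuple(len(block) for block in word.split("b")[:-1])


def accepted_words(dfa, max_len):
    return [
        "".join(w)
        for n in range(max_len + 1)
        for w in product("ab", repeat=n)
        if accepts(dfa, "".join(w))
    ]


if __name__ == "__main__":
    words = accepted_words(tower_dfa(2), 6)
    # On this finite sample, lexicographic order on words agrees with
    # lexicographic order on exponent tuples, the order of omega^2.
    print(sorted(words, key=lex_key) == sorted(words, key=block_lengths))
```

Each additional factor a*b adds one state and multiplies the order type by ω, which matches the depth-k tower picture for a single term ω^k.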

A direct corollary is an efficient decision procedure for language isomorphism: two DFAs accept isomorphic well‑ordered regular languages iff their extracted CNFs are identical. Since CNF comparison is linear in the number of terms, the whole isomorphism test runs in polynomial time, dramatically improving on the previously known PSPACE‑hard approaches for general regular languages.
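Once both normal forms are available, the comparison itself is simple. The sketch below assumes each CNF is given as a list of (exponent, coefficient) pairs with strictly decreasing exponents; this representation and the function names are assumptions for illustration, and the extraction of the CNF from a DFA is not shown.

```python
from typing import List, Tuple

CNF = List[Tuple[int, int]]  # hypothetical encoding of a CNF below omega^omega


def is_cnf(terms: CNF) -> bool:
    """Check positive coefficients and strictly decreasing exponents."""
    positive = all(e >= 0 and c >= 1 for e, c in terms)
    decreasing = all(terms[i][0] > terms[i + 1][0] for i in range(len(terms) - 1))
    return positive and decreasing


def same_order_type(alpha: CNF, beta: CNF) -> bool:
    """Two well-orders are isomorphic exactly when their CNFs coincide."""
    if not (is_cnf(alpha) and is_cnf(beta)):
        raise ValueError("inputs must be in Cantor Normal Form")
    return list(alpha) == list(beta)


def less_than(alpha: CNF, beta: CNF) -> bool:
    """Ordinal comparison: lexicographic comparison of the term lists."""
    if not (is_cnf(alpha) and is_cnf(beta)):
        raise ValueError("inputs must be in Cantor Normal Form")
    return list(alpha) < list(beta)


if __name__ == "__main__":
    omega_plus_one = [(1, 1), (0, 1)]
    omega_squared = [(2, 1)]
    print(same_order_type(omega_plus_one, omega_plus_one))
    print(less_than(omega_plus_one, omega_squared))
```

Both functions make a single pass over the term lists, so the cost of the isomorphism test is dominated by computing the normal forms.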

The authors validate their methods experimentally on randomly generated DFAs and on automata derived from real‑world parsers. For automata with up to a thousand states, CNF extraction takes less than 0.02 seconds, and isomorphism checking completes within 0.01 seconds. These results demonstrate both theoretical feasibility and practical efficiency.

In conclusion, the work provides a complete algorithmic pipeline: from a DFA representing a well‑ordered regular language, through polynomial‑time computation of its Cantor Normal Form, to tight bounds on the minimal automaton size for any ordinal below ω^ω, and finally to a fast isomorphism test. The paper opens several avenues for future research, including extensions to ordinals beyond ω^ω, handling nondeterministic automata, and applying ordinal‑based analyses to optimization problems in formal verification and language processing.
