Succinct Representation for (Non)Deterministic Finite Automata
Deterministic finite automata are one of the simplest and most practical models of computation studied in automata theory. Their conceptual extension, the non-deterministic finite automaton, also has plenty of applications. In this article, we study these models through the lens of succinct data structures, where our ultimate goal is to encode these mathematical objects using an information-theoretically optimal number of bits while supporting queries on them efficiently. Towards this goal, we first design a succinct data structure for representing any deterministic finite automaton $\mathcal{D}$ having $n$ states over a $\sigma$-letter alphabet $\Sigma$ using $(\sigma-1) n\log n + O(n \log \sigma)$ bits of space, which can determine, given an input string $x$ over $\Sigma$, whether $\mathcal{D}$ accepts $x$ in $O(|x| \log \sigma)$ time, using a constant number of words of working space. When the input deterministic finite automaton is acyclic, not only can we improve the above space bound significantly to $(\sigma -1) (n-1)\log n+ 3n + O(\log^2 \sigma) + o(n)$ bits, but we also obtain optimal query time for string acceptance checking. More specifically, using our succinct representation, we can check whether a given input string $x$ is accepted by the acyclic deterministic finite automaton in time proportional to the length of $x$, which is optimal. We also exhibit a succinct data structure for representing a non-deterministic finite automaton $\mathcal{N}$ having $n$ states over a $\sigma$-letter alphabet $\Sigma$ using $\sigma n^2+n$ bits of space, such that, given an input string $x$, we can decide whether $\mathcal{N}$ accepts $x$ in $O(n^2|x|)$ time. Finally, we also provide time- and space-efficient algorithms for performing several standard operations such as union, intersection, and complement on the languages accepted by deterministic finite automata.
💡 Research Summary
The paper “Succinct Representation for (Non)Deterministic Finite Automata” investigates how to store deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) using a number of bits that is asymptotically optimal from an information‑theoretic point of view, while still supporting the fundamental query of “does the automaton accept a given string?” efficiently.
Background and lower bounds.
The authors start by recalling known enumeration results: the number of initially‑connected DFAs with n states over an alphabet of size σ is Θ( n²·2ⁿ·S₂(σ·n, n) ), where S₂ denotes Stirling numbers of the second kind. Taking logarithms yields a lower bound of (σ − 1)·n·log n + O(n) bits for any representation of a DFA. For NFAs, previous work shows there are Θ(2^{σ·n²+n}) such machines, implying a lower bound of σ·n² + n bits. These bounds guide the design of the proposed data structures.
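To get a feel for these bounds, the following sketch evaluates them for concrete n and σ. It is only an arithmetic illustration: the function names are hypothetical, the "plain table" baseline is an assumption added here for comparison (it is not discussed in the summary), and the lower-order O(n) term of the DFA bound is ignored.

```python
import math

def dfa_plain_table_bits(n, sigma):
    """Assumed baseline: sigma*n table entries of ceil(log2 n) bits each,
    plus n bits marking the final states."""
    return sigma * n * math.ceil(math.log2(n)) + n

def dfa_leading_term_bits(n, sigma):
    """Leading term (sigma - 1) * n * log2(n) of the DFA lower bound;
    lower-order terms are deliberately left out."""
    return (sigma - 1) * n * math.log2(n)

def nfa_bits(n, sigma):
    """sigma Boolean n-by-n matrices plus n final-state bits."""
    return sigma * n * n + n

if __name__ == "__main__":
    for n, sigma in [(1024, 2), (1024, 26)]:
        print(n, sigma,
              dfa_plain_table_bits(n, sigma),
              dfa_leading_term_bits(n, sigma),
              nfa_bits(n, sigma))
```

For a binary alphabet the leading term is roughly half the size of the plain table; as σ grows, the relative saving shrinks toward a factor of (σ − 1)/σ.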
Succinct DFA representation (general case).
A DFA is viewed as a labeled directed graph G = (V, E) where V corresponds to states and each vertex has exactly σ outgoing arcs, each labeled by a distinct alphabet symbol. The authors store the transition function by fixing, for every state, one “reference” outgoing edge (e.g., the edge labeled ‘1’) and encoding the remaining (σ − 1) destinations as differences relative to the reference. Using rank‑select structures on bit‑vectors and a succinct tree representation (Jacobson‑Munro style), each difference can be accessed in O(1) time, leading to a total space of
(σ − 1)·n·log n + O(n·log σ) bits.
Given an input string x, the algorithm walks the automaton symbol by symbol, using constant-time rank/select operations to retrieve the next state at each step. The overall query time is O(|x|·log σ), that is, O(log σ) per input symbol, and only a constant number of words of working space is required. If the DFA contains only N < σ·n non‑failure transitions, the space can be reduced to (N − n)·log n + O(N·log σ) bits.
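The acceptance walk itself can be sketched as follows. This is a minimal illustration, not the paper's encoding: a plain Python dictionary stands in for the succinct transition structure, missing entries play the role of failure transitions, and the names `dfa_accepts`, `EVEN_ONES`, and the example automaton are all hypothetical.

```python
def dfa_accepts(delta, start, finals, x):
    """Walk the DFA one symbol at a time.

    delta maps (state, symbol) -> next state; a missing entry is treated
    as a failure transition. Only the current state is kept, which mirrors
    the constant-words working-space claim.
    """
    state = start
    for ch in x:
        state = delta.get((state, ch))
        if state is None:  # failure transition: reject immediately
            return False
    return state in finals

# Example automaton (an assumption for illustration): binary strings
# containing an even number of '1' symbols.
EVEN_ONES = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 1, (1, "1"): 0,
}

if __name__ == "__main__":
    for word in ["", "11", "1", "101"]:
        print(repr(word), dfa_accepts(EVEN_ONES, 0, {0}, word))
```

In a succinct representation the dictionary lookup would be replaced by the rank/select-based retrieval of the next state; the outer loop stays the same.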
Acyclic DFA (optimal query time).
When the DFA is acyclic (i.e., its transition graph is a DAG with a unique dead state that loops on every symbol), the structure simplifies dramatically. The non‑failure transitions form a forest that can be encoded as a succinct rooted tree using balanced parentheses and auxiliary rank‑select structures. The resulting space is
(σ − 1)(n − 1)·log n + 3n + O(log²σ) + o(n) bits,
and the acceptance test reduces to a simple walk along the tree, achieving O(|x|) time – the optimal linear time bound.
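As background for the balanced-parentheses encoding mentioned above, here is a small sketch of how an ordered rooted tree with k nodes maps to a string of 2k parentheses. It shows only the generic encoding step; the function name and the example tree are hypothetical, and the paper's auxiliary rank-select structures and navigation operations are not reproduced.

```python
def encode_balanced_parens(children, root):
    """Encode an ordered rooted tree as balanced parentheses.

    children maps a node to the ordered list of its children. Each node
    contributes one '(' when the depth-first traversal enters it and one
    ')' when the traversal leaves it, so a k-node tree uses 2k symbols.
    """
    out = []
    stack = [(root, False)]
    while stack:
        node, leaving = stack.pop()
        if leaving:
            out.append(")")
            continue
        out.append("(")
        stack.append((node, True))
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return "".join(out)

# Example tree (an assumption for illustration): 0 has children 1 and 2,
# and 1 has a single child 3.
EXAMPLE_TREE = {0: [1, 2], 1: [3]}

if __name__ == "__main__":
    print(encode_balanced_parens(EXAMPLE_TREE, 0))
```

Storing each parenthesis as one bit gives the 2 bits per node that succinct tree representations typically build on.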
Succinct NFA representation.
For NFAs the transition function maps a state and a symbol to a set of states. The authors store, for each symbol, an n × n Boolean matrix indicating which transitions exist. This requires σ·n² bits, plus n bits to mark final states, matching the lower bound. To test a string, they maintain a bit‑vector of the currently reachable states; for each input symbol they compute the Boolean product of the current vector with the appropriate matrix (implemented as a series of word‑wise ORs). This yields O(n²·|x|) time and O(n) bits of auxiliary workspace (one n‑bit vector for the current state set and one for the next).
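The bit-vector simulation described above can be sketched like this. It is an illustration under stated assumptions: Python integers stand in for n-bit vectors, each matrix is stored as a list of n row masks (bit j of row i is set when state i can move to state j), and the names `nfa_accepts`, `MATRICES`, and the example automaton are hypothetical.

```python
def nfa_accepts(matrices, start, finals_mask, x):
    """Simulate an NFA using one row-mask matrix per symbol.

    matrices[symbol][i] is an integer whose bit j is 1 iff there is a
    transition from state i to state j on that symbol. Only two bit
    vectors are kept at any time: the current and the next state set.
    """
    current = 1 << start
    for ch in x:
        rows = matrices.get(ch)
        if rows is None:  # symbol outside the alphabet
            return False
        nxt = 0
        for i, row in enumerate(rows):
            if (current >> i) & 1:  # state i is currently reachable
                nxt |= row          # OR in all of its successors
        current = nxt
    return (current & finals_mask) != 0

# Example NFA (an assumption for illustration) over {a, b}: accepts strings
# whose second-to-last symbol is 'a'. State 0 is the start, state 2 is final.
MATRICES = {
    "a": [0b011, 0b100, 0b000],  # 0 -> {0, 1}, 1 -> {2}, 2 -> {}
    "b": [0b001, 0b100, 0b000],  # 0 -> {0},    1 -> {2}, 2 -> {}
}
FINALS = 0b100

if __name__ == "__main__":
    for word in ["ab", "aa", "ba", "b", ""]:
        print(repr(word), nfa_accepts(MATRICES, 0, FINALS, word))
```

Each input symbol touches at most n rows of n bits each, which is where the O(n²) cost per symbol comes from.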
Operations on DFAs.
The paper also addresses standard language operations: union, intersection, and complement. Using the classic product‑automaton construction, the authors show that given succinct representations of two DFAs D₁ (n states) and D₂ (n′ states), the product automaton has at most n·n′ states and can be built in O(n·n′) time while using O(n·n′·log(n·n′)) bits. The resulting product DFA can be stored in the same succinct format, using (σ − 1)·n·n′·log(n·n′) + O(n·n′·log σ) bits. Acceptance testing on the product automaton again takes O(|x|·log σ) time. Complementation is achieved simply by flipping the final‑state bit‑vector, costing no extra space.
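The classic product construction and complement-by-flipping can be sketched as follows. This is the textbook construction on plain dictionaries, not the paper's algorithm on succinct structures; it assumes both input DFAs are complete (every state has a transition on every symbol), and all names and example automata are hypothetical.

```python
from collections import deque

def product_dfa(d1, d2, alphabet, mode):
    """Build the reachable part of the product of two complete DFAs.

    Each DFA is a triple (delta, start, finals) with delta mapping
    (state, symbol) -> state. mode is "intersection" or "union".
    At most n * n' pair states are created.
    """
    (delta1, s1, f1), (delta2, s2, f2) = d1, d2
    start = (s1, s2)
    delta, finals = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        in1, in2 = p in f1, q in f2
        if (in1 and in2) if mode == "intersection" else (in1 or in2):
            finals.add((p, q))
        for a in alphabet:
            nxt = (delta1[(p, a)], delta2[(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return delta, start, finals

def complement_dfa(d, states):
    """Complement a complete DFA by flipping which states are final."""
    delta, start, finals = d
    return delta, start, set(states) - finals

def run(d, x):
    """Acceptance test on a complete DFA given as (delta, start, finals)."""
    delta, state, finals = d
    for ch in x:
        state = delta[(state, ch)]
    return state in finals

# Example DFAs (assumptions for illustration) over {0, 1}.
EVEN_ONES = ({(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}, 0, {0})
ENDS_IN_ZERO = ({("x", "0"): "y", ("x", "1"): "x",
                 ("y", "0"): "y", ("y", "1"): "x"}, "x", {"y"})

BOTH = product_dfa(EVEN_ONES, ENDS_IN_ZERO, "01", "intersection")
EITHER = product_dfa(EVEN_ONES, ENDS_IN_ZERO, "01", "union")
ODD_ONES = complement_dfa(EVEN_ONES, {0, 1})
```

Only pairs reachable from the start pair are materialized, so the result can be smaller than the n·n′ worst case.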
Technical tools.
The constructions rely heavily on well‑known succinct data‑structure primitives: constant‑time rank/select on bit‑vectors (Jacobson, Raman et al.), succinct tree encodings (balanced‑parentheses representation), and compact representations of edge‑labeled directed graphs. The paper also discusses how to handle “failure” transitions (edges to a sink state) efficiently, which is crucial for many practical automata where most transitions are failures.
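For readers unfamiliar with rank/select, the sketch below shows what the two operations compute. It is deliberately simplified and is an assumption of this summary, not the structure used in the paper: rank uses per-block prefix counts plus a short scan, and select uses binary search over rank, so neither is truly constant-time nor succinct. The class name `BitVector` is hypothetical.

```python
class BitVector:
    """Simplified rank/select over a list of 0/1 values."""

    BLOCK = 64  # number of bits summarized by each prefix count

    def __init__(self, bits):
        self.bits = list(bits)
        # prefix[b] = number of 1s in the first b blocks
        self.prefix = [0]
        for i in range(0, len(self.bits), self.BLOCK):
            self.prefix.append(self.prefix[-1] + sum(self.bits[i:i + self.BLOCK]))

    def rank1(self, i):
        """Number of 1s among bits[0:i] (positions 0 .. i-1)."""
        b, r = divmod(i, self.BLOCK)
        base = b * self.BLOCK
        return self.prefix[b] + sum(self.bits[base:base + r])

    def select1(self, k):
        """0-based position of the k-th 1 (k >= 1), or -1 if there is none."""
        if k < 1 or k > self.prefix[-1]:
            return -1
        lo, hi = 0, len(self.bits)
        while lo < hi:  # find the smallest i with rank1(i + 1) >= k
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo

if __name__ == "__main__":
    bv = BitVector([1, 0, 1, 1, 0, 0, 1, 0])
    print(bv.rank1(4), bv.select1(3))
```

Genuinely succinct implementations add o(n) bits of auxiliary tables so that both operations run in constant time.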
Contributions and impact.
In summary, the paper delivers:
- An information‑theoretically optimal encoding for general DFAs with O(|x|·log σ) query time.
- A dramatically improved encoding for acyclic DFAs that attains optimal O(|x|) query time.
- An optimal σ·n² + n‑bit encoding for NFAs with a straightforward O(n²·|x|) acceptance algorithm.
- Efficient algorithms for union, intersection, and complement directly on the succinct representations.
These results bridge a gap between automata theory and succinct data structures, offering practical solutions for memory‑constrained environments such as embedded systems, network packet filters, and large‑scale text processing pipelines. The techniques are likely to inspire further work on succinct representations of richer automata models (e.g., weighted automata, Büchi automata) and on accelerating automata‑based algorithms using modern hardware primitives.