Remarks on separating words
The separating words problem asks for the size of the smallest DFA needed to distinguish between two words of length <= n (by accepting one and rejecting the other). In this paper we survey what is known and unknown about the problem, consider some variations, and prove several new results.
Research Summary
The paper studies the classic "separating words" problem: given two distinct words w and x of length at most n, what is the size of the smallest deterministic finite automaton (DFA) that accepts one of them and rejects the other? The authors denote this size by sep(w,x) and define S(n)=max_{w≠x, |w|,|x|≤n} sep(w,x). The goal is to obtain good upper and lower bounds on S(n).
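To make the definition of sep(w,x) concrete, here is a minimal brute-force sketch for binary words. The function names and the `max_states` cutoff are illustrative assumptions, not taken from the paper. It relies on one observation: a DFA separates w and x exactly when the two words end in different states, since one of those states can then be marked accepting.

```python
from itertools import product

def run(delta, word):
    """Return the state reached from start state 0 after reading word."""
    state = 0
    for symbol in word:
        state = delta[state][int(symbol)]
    return state

def sep(w, x, max_states=4):
    """Smallest number of DFA states separating binary words w and x.

    Exhaustive search; returns None if no DFA with at most max_states
    states separates them (max_states is an illustrative cutoff).
    """
    if w == x:
        raise ValueError("words must be distinct")
    for m in range(1, max_states + 1):
        # A transition table assigns a target state to each (state, symbol) pair.
        for flat in product(range(m), repeat=2 * m):
            delta = [flat[2 * q: 2 * q + 2] for q in range(m)]
            # Different end states: mark one of them accepting and we are done.
            if run(delta, w) != run(delta, x):
                return m
    return None

print(sep("0", "1"))        # 2
print(sep("0111", "0001"))  # 3
```

The search space grows as m^(2m), so this is only practical for very small automata.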
First, the authors review prior work. Goralčík and Koubek (1986) showed S(n)=o(n). Robson later improved the upper bound to S(n)=O(n^{2/5}(log n)^{3/5}), which remains the best known.
A key contribution is the observation that the alphabet size does not affect S(n) for any alphabet of size at least two. Proposition 2 proves S_k(n)=S_2(n) for all k≥2, allowing the whole discussion to be restricted to binary alphabets without loss of generality.
The paper then examines the average case. Proposition 3 shows that for a uniformly random pair of distinct length-n words over an alphabet of size k, the expected number of DFA states needed to separate them is O(1) (specifically ≤4). The argument is that with probability 1-1/k the words differ at the first position, which can be detected by a 3-state DFA, and the expected cost of later differences forms a convergent geometric series.
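One way to see where the constant 4 can come from is to charge d+2 states when the first difference is at position d (the prefix bound listed as Proposition 4 in this summary) and take the exact expectation. This is an illustrative calculation under that assumption, not necessarily the paper's own proof; the function name is hypothetical.

```python
from fractions import Fraction

def expected_prefix_bound(k, n):
    """Exact expectation of d + 2, where d is the first position at which two
    uniformly random distinct length-n words over a k-letter alphabet differ."""
    total = Fraction(0)
    mass = Fraction(0)
    for d in range(1, n + 1):
        # The first d-1 symbols agree and symbol d differs.
        p = Fraction(1, k) ** (d - 1) * (1 - Fraction(1, k))
        total += (d + 2) * p
        mass += p
    # Normalise by the probability that the two words are distinct at all.
    return total / mass

for k in (2, 3, 10):
    print(k, float(expected_prefix_bound(k, 30)))
```

The untruncated sum equals k/(k-1)+2, which is at most 4 for k≥2; conditioning on the words being distinct only lowers it.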
The authors present a new lower bound for equal-length words. Theorem 1 constructs two binary words w=0^{n-1}1^{n-1+lcm(1,…,n)} and x=0^{n-1+lcm(1,…,n)}1^{n-1} and proves that no DFA with n states can separate them. Since lcm(1,…,n)=e^{(1+o(1))n}, the words have length exponential in n, and this yields an Ω(log n) lower bound for S(n) when the two words have the same length.
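The claim of Theorem 1 can be checked exhaustively for a small case. The sketch below (helper names are illustrative) builds the two words for n=3 and tries every 3-state binary DFA; DFAs with fewer states are covered because they can be padded with unreachable states. The underlying reason is that any run of n-1 equal symbols lands on a cycle whose length is at most n and therefore divides lcm(1,…,n).

```python
from functools import reduce
from itertools import product
from math import gcd

def lcm_upto(n):
    """Return lcm(1, ..., n)."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(1, n + 1), 1)

def theorem1_words(n):
    """The pair of equal-length words from Theorem 1, as Python strings."""
    L = lcm_upto(n)
    w = "0" * (n - 1) + "1" * (n - 1 + L)
    x = "0" * (n - 1 + L) + "1" * (n - 1)
    return w, x

def some_dfa_separates(w, x, m):
    """True if some m-state binary DFA (start state 0) ends in different states on w and x."""
    for flat in product(range(m), repeat=2 * m):
        ends = []
        for word in (w, x):
            q = 0
            for c in word:
                q = flat[2 * q + int(c)]  # transition on symbol c from state q
            ends.append(q)
        if ends[0] != ends[1]:
            return True
    return False

w, x = theorem1_words(3)
print(w, x)
print(some_dfa_separates(w, x, 3))  # False
print(some_dfa_separates(w, x, 4))  # True
```

With four states, counting the 0s modulo 4 already separates the pair, so the bound is tight for this small instance.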
Next, they collect a series of simple upper bounds for "easy-to-detect" differences. If the words differ within the first d symbols, sep≤d+2 (Proposition 4); if they differ within the last d symbols, sep≤d+1 (Proposition 5). If the number of occurrences of some symbol a differs, a prime p=O(log n) can be used to count modulo p, giving sep=O(log n) (Proposition 6). More generally, if a pattern of length d occurs a different number of times, sep=O(d log n) (Proposition 7).
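The counting argument behind Proposition 6 can be sketched as follows. The helper names are hypothetical, and trial division stands in for any primality test. The idea is to find a small prime p that does not divide the difference of the two counts, then run a p-state cycle that advances only on the symbol a.

```python
def smallest_separating_prime(w, x, a):
    """Smallest prime p such that the counts of symbol a in w and x differ modulo p."""
    diff = abs(w.count(a) - x.count(a))
    if diff == 0:
        raise ValueError("the counts of a must differ")
    p = 2
    while True:
        is_prime = all(p % q for q in range(2, int(p ** 0.5) + 1))
        if is_prime and diff % p != 0:
            return p
        p += 1

def count_mod_p(word, a, p):
    """Simulate the p-state cycle DFA: advance one state on a, stay put otherwise."""
    state = 0
    for c in word:
        if c == a:
            state = (state + 1) % p
    return state

w, x = "10000000", "11111110"   # one '1' versus seven '1's, difference 6
p = smallest_separating_prime(w, x, "1")
print(p, count_mod_p(w, "1", p), count_mod_p(x, "1", p))  # 5 1 2
```

Here 2 and 3 both divide the difference 6, so the first usable prime is 5, and the two words end in different states of the 5-cycle.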
A particularly interesting result is Theorem 2, which handles the case of small Hamming distance. If H(w,x)≤d, then sep(w,x)=O(d log n). The proof selects a prime p that divides none of the differences between the first differing position and the other differing positions, then combines two p-state cycles into a DFA that tracks the parity of the bits at positions congruent to the first differing position modulo p. Because the other differing positions fall into other residue classes, they are invisible to this parity, so the two words end in different states. This shows that even very similar words can be separated with only a modest number of states.
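The construction described for Theorem 2 can be sketched as below. Positions are 0-indexed and the function names are assumptions made for illustration. The simulated automaton keeps a position counter modulo p and one parity bit, which gives 2p states.

```python
def hamming_separator(w, x):
    """Return (p, r): a prime p and residue r such that w and x differ at
    exactly one position congruent to r modulo p (0-indexed positions)."""
    if len(w) != len(x) or w == x:
        raise ValueError("need two distinct words of equal length")
    diffs = [i for i in range(len(w)) if w[i] != x[i]]
    first = diffs[0]
    gaps = [i - first for i in diffs[1:]]
    p = 2
    while True:
        is_prime = all(p % q for q in range(2, int(p ** 0.5) + 1))
        # p must divide none of the gaps to the other differing positions.
        if is_prime and all(g % p for g in gaps):
            return p, first % p
        p += 1

def parity_at_residue(word, p, r):
    """Simulate the 2p-state DFA: position counter modulo p plus one parity bit."""
    pos, parity = 0, 0
    for c in word:
        if pos == r:
            parity ^= int(c)
        pos = (pos + 1) % p
    return parity

w, x = "000000", "101100"   # differ at positions 0, 2, 3
p, r = hamming_separator(w, x)
print(p, r, parity_at_residue(w, p, r), parity_at_residue(x, p, r))  # 5 0 0 1
```

In this example the gaps are 2 and 3, so the primes 2 and 3 are rejected and p=5 is chosen; the parity over positions 0 and 5 then differs between the two words.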
The paper also investigates special families. For reversals, Proposition 8 shows that there exist words w such that sep(w,w^R)=Ω(log n). The same holds for conjugate pairs (cyclic shifts) by Proposition 9. These examples demonstrate that the lower bound from Theorem 1 extends to natural restricted classes.
Section 6 introduces nondeterministic separation. Define nsep(w,x) as the smallest number of states of an NFA that accepts w and rejects x. Theorem 3 proves that the ratio sep(w,x)/nsep(w,x) is unbounded: the same pair of words used in Theorem 1 requires Θ(n^2) DFA states but only Θ(√n) NFA states. Theorem 4 gives an Ω(log n) lower bound for NFAs separating a similar pair of words, using Chrobak's normal form for unary NFAs. Theorem 5 shows that nondeterministic separation is symmetric under reversal: nsep(w,x)=nsep(w^R,x^R).
Section 7 shows that two-way deterministic push-down automata (2DPDAs) can separate any two distinct length-n words with only O(log n) states (Proposition 10). The construction uses a binary-search-style navigation of the input tape, moving to the first differing position in logarithmic time.
Section 8 discusses permutation automata, where each input symbol induces a permutation of the state set. Robson's O(√n) upper bound for this model is recalled, and the authors relate the separating-words problem to the algebraic question of the shortest non-trivial "identical relation" in the symmetric group S_n. Recent work by Gimadeev and Vyalyi shows that the minimal length ℓ satisfies ℓ=2^{O(√n log n)}.
Finally, the paper lists several open problems: whether the difference sep(x,w)-sep(x^R,w^R) can be unbounded, tighter bounds on the ratio sep/nsep, improved bounds for permutation automata, and others.
Overall, the paper provides a comprehensive survey of known results, introduces new lower bounds for equal-length words, demonstrates that average-case separation is trivial, and establishes that nondeterminism can give arbitrarily large savings in state complexity. The mixture of combinatorial, number-theoretic, and automata-theoretic techniques makes it a valuable reference for researchers interested in state complexity, language separation, and related decision problems.