Remarks on separating words


The separating words problem asks for the size of the smallest DFA needed to distinguish between two words of length <= n (by accepting one and rejecting the other). In this paper we survey what is known and unknown about the problem, consider some variations, and prove several new results.

💡 Research Summary

The paper studies the classic “separating words” problem: given two distinct words w and x of length at most n, what is the size of the smallest deterministic finite automaton (DFA) that accepts one of them and rejects the other? The authors denote this size by sep(w,x) and define S(n)=max_{w≠x, |w|,|x|≤n} sep(w,x). The goal is to obtain good upper and lower bounds on S(n).
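For very short binary words, the definition of sep(w,x) can be checked directly by exhaustive search. The following is a minimal sketch, not taken from the paper: the names `run` and `sep` and the cutoff `max_states` are illustrative assumptions. It relies on the observation that a DFA separates w and x exactly when the two words end in different states, since the accepting set can then be chosen freely.

```python
from itertools import product

def run(delta, word, start=0):
    """Return the state reached after reading `word` from `start`."""
    state = start
    for ch in word:
        state = delta[state][int(ch)]
    return state

def sep(w, x, max_states=4):
    """Smallest m such that some m-state binary DFA ends in different
    states on w and x; None if no such DFA exists up to max_states.
    (Hypothetical helper; exponential in m, so only for tiny inputs.)"""
    for m in range(1, max_states + 1):
        # A transition table assigns a target state to every (state, symbol) pair.
        for flat in product(range(m), repeat=2 * m):
            delta = [(flat[2 * q], flat[2 * q + 1]) for q in range(m)]
            if run(delta, w) != run(delta, x):
                return m
    return None
```

The search tries m^(2m) transition tables for each m, so it is only practical for a handful of states.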

First, the authors review prior work. Goralčík and Koubek (1986) showed S(n)=o(n). Robson later improved the upper bound to S(n)=O(n^{2/5}(log n)^{3/5}), which remains the best known.

A key contribution is the observation that the alphabet size does not affect S(n) for any alphabet of size at least two. Proposition 2 proves S_k(n)=S_2(n) for all k≥2, allowing the whole discussion to be restricted to binary alphabets without loss of generality.

The paper then examines the average case. Proposition 3 shows that for a uniformly random pair of distinct length‑n words over an alphabet of size k, the expected number of DFA states needed to separate them is O(1) (specifically ≤4). The argument is that with probability 1−1/k the words differ at the first position, which can be detected by a 3‑state DFA, and the expected cost of later differences forms a convergent geometric series.
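The geometric-series argument can be made concrete with exact arithmetic. The sketch below assumes that a first difference at position d (counting from 1) costs at most d+2 states, as in the prefix bound quoted later in this summary, and averages that cost over all ordered pairs of distinct length-n words. The function name `expected_prefix_bound` is an illustrative assumption; the result is only an upper bound on the true expectation.

```python
from fractions import Fraction

def expected_prefix_bound(k, n):
    """Average of (d + 2) over ordered pairs of distinct length-n words
    on a k-letter alphabet, where d is the first differing position.
    Assumes k >= 2 and n >= 1."""
    total = Fraction(0)
    denom = k**n - 1
    for d in range(1, n + 1):
        # P(first difference at position d | words distinct)
        prob = Fraction((k - 1) * k**(n - d), denom)
        total += (d + 2) * prob
    return total
```

For k=2 the untruncated geometric distribution has mean 2, so the bound stays below 2+2=4 for every n, matching the constant quoted above.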

The authors present a new lower bound for equal‑length words. Theorem 1 constructs two binary words w=0^{n−1}1^{n−1+lcm(1,…,n)} and x=0^{n−1+lcm(1,…,n)}1^{n−1} and proves that no DFA with at most n states can separate them. Since lcm(1,…,n)=e^{(1+o(1))n}, the words have length exponential in n, which yields an Ω(log n) lower bound for S(n) even when the two words have the same length.
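The Theorem 1 construction can be checked exhaustively for very small n. The sketch below reads the two words as 0^{n−1}1^{n−1+L} and 0^{n−1+L}1^{n−1} with L=lcm(1,…,n), which is an assumption about the intended exponents, and enumerates every DFA with a given number of states. The helper names are illustrative and not from the paper.

```python
from functools import reduce
from itertools import product
from math import gcd

def lcm_upto(n):
    """lcm(1, ..., n), computed via gcd."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(1, n + 1), 1)

def theorem1_words(n):
    """The pair (w, x) under the assumed reading of the exponents."""
    L = lcm_upto(n)
    w = "0" * (n - 1) + "1" * (n - 1 + L)
    x = "0" * (n - 1 + L) + "1" * (n - 1)
    return w, x

def final_state(delta, word):
    state = 0
    for ch in word:
        state = delta[state][int(ch)]
    return state

def separable_with(n_states, w, x):
    """True if some binary DFA with n_states states ends in different
    states on w and x (brute force over all transition tables)."""
    for flat in product(range(n_states), repeat=2 * n_states):
        delta = [(flat[2 * q], flat[2 * q + 1]) for q in range(n_states)]
        if final_state(delta, w) != final_state(delta, x):
            return True
    return False
```

The check reflects the underlying reason: after n−1 symbols an n-state DFA is already on a cycle whose length divides L, so appending L further copies of the same symbol returns it to the same state.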

Next, they collect a series of simple upper bounds for “easy‑to‑detect” differences. If the words differ within the first d symbols, sep≤d+2 (Proposition 4); if they differ within the last d symbols, sep≤d+1 (Proposition 5). If the number of occurrences of some symbol a differs, a prime p=O(log n) can be used to count modulo p, giving sep=O(log n) (Proposition 6). More generally, if a pattern of length d occurs a different number of times, sep=O(d log n) (Proposition 7).
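The counting argument of Proposition 6 can be sketched as follows: pick the smallest prime p that does not divide the difference between the two symbol counts, then count occurrences modulo p with a p-state cycle. The function names here are illustrative assumptions, and the simple trial-division primality test is chosen for brevity rather than efficiency.

```python
def is_prime(m):
    """Trial-division primality test (adequate for small m)."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def smallest_separating_prime(w, x, symbol):
    """Smallest prime p with count_w != count_x (mod p)."""
    diff = abs(w.count(symbol) - x.count(symbol))
    if diff == 0:
        raise ValueError("symbol counts agree; this method does not apply")
    p = 2
    while not (is_prime(p) and diff % p != 0):
        p += 1
    return p

def count_mod(word, symbol, p):
    """Simulate the p-state DFA whose state is the number of
    occurrences of `symbol` read so far, modulo p."""
    state = 0
    for ch in word:
        if ch == symbol:
            state = (state + 1) % p
    return state
```

Because the count difference is at most n, it cannot be divisible by all of the first few primes at once, which is why a prime of size O(log n) always suffices.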

A particularly interesting result is Theorem 2, which handles the case of small Hamming distance. If H(w,x)≤d, then sep(w,x)=O(d log n). The proof lists the differing positions i_1<…<i_m and selects a prime p=O(d log n) that does not divide the product of the differences i_j−i_1 (j≥2), so that no other differing position is congruent to i_1 modulo p. A DFA with 2p states then tracks the current position modulo p together with the parity of the 1s seen at positions congruent to i_1 modulo p; since w and x agree at every such position except i_1, the two parities differ. This shows that even very similar words can be separated with only a modest number of states.
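A minimal sketch of the Hamming-distance construction for binary words, under the assumption that the automaton tracks the position modulo p together with the parity of the 1s at positions in one residue class (2p states in total). Positions are 0-indexed here, and the function names are illustrative rather than the paper's.

```python
def is_prime(m):
    """Trial-division primality test (adequate for small m)."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def hamming_separator(w, x):
    """Return (p, r): a prime p and residue r = i_1 mod p such that the
    first differing position i_1 is the only differing position in its
    residue class modulo p."""
    if len(w) != len(x):
        raise ValueError("words must have equal length")
    diffs = [i for i, (a, b) in enumerate(zip(w, x)) if a != b]
    if not diffs:
        raise ValueError("words are identical")
    first = diffs[0]
    p = 2
    # Any prime larger than the largest gap works, so this loop terminates.
    while not (is_prime(p) and all((i - first) % p != 0 for i in diffs[1:])):
        p += 1
    return p, first % p

def parity_state(word, p, residue):
    """Parity of the 1s at positions congruent to `residue` modulo p,
    i.e. the parity component of the 2p-state DFA's final state."""
    parity = 0
    for i, ch in enumerate(word):
        if i % p == residue and ch == "1":
            parity ^= 1
    return parity
```

For example, `hamming_separator("0110", "1001")` skips p=2 and p=3 (each puts another differing position in the same class as position 0) and settles on p=5.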

The paper also investigates special families. For reversals, Proposition 8 shows that there exist words w such that sep(w,w^R)=Ω(log n). The same holds for conjugate pairs (cyclic shifts) by Proposition 9. These examples demonstrate that the lower bound from Theorem 1 extends to natural restricted classes.

Section 6 introduces nondeterministic separation. Define nsep(w,x) as the smallest number of states of an NFA that accepts w and rejects x. Theorem 3 proves that the ratio sep(w,x)/nsep(w,x) is unbounded: the same pair of words used in Theorem 1 requires Θ(n^2) DFA states but only Θ(√n) NFA states. Theorem 4 gives an Ω(log n) lower bound for NFAs separating a similar pair of words, using Chrobak’s normal form for unary NFAs. Theorem 5 shows that nondeterministic separation is symmetric under reversal: nsep(w,x)=nsep(w^R,x^R).

Section 7 shows that two‑way deterministic push‑down automata (2DPDAs) can separate any two distinct length‑n words with only O(log n) states (Proposition 10). The construction uses a binary‑search‑style navigation of the input tape, moving to the first differing position in logarithmic time.

Section 8 discusses permutation automata, where each input symbol induces a permutation of the state set. Robson’s O(√n) upper bound for this model is recalled, and the authors relate the separating‑words problem to the algebraic question of the shortest non‑trivial “identical relation” in the symmetric group S_n. Recent work by Gimadeev and Vyalyi shows that the minimal length ℓ satisfies ℓ=2^{O(√n log n)}.

Finally, the paper lists several open problems: whether the difference sep(x,w)−sep(x^R,w^R) can be unbounded, tighter bounds on the ratio sep/nsep, improved bounds for permutation automata, and others.

Overall, the paper provides a comprehensive survey of known results, introduces new lower bounds for equal‑length words, demonstrates that average‑case separation is trivial, and establishes that nondeterminism can give arbitrarily large savings in state complexity. The mixture of combinatorial, number‑theoretic, and automata‑theoretic techniques makes it a valuable reference for researchers interested in state complexity, language separation, and related decision problems.
