The computational complexity of universality problems for prefixes, suffixes, factors, and subwords of regular languages

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

In this paper we consider the computational complexity of the following problems: given a DFA or NFA representing a regular language L over a finite alphabet Σ, is the set of all prefixes (resp., suffixes, factors, subwords) of all words of L equal to Σ*? In the case of testing universality for factors of languages represented by DFAs, we find an interesting connection to Černý's conjecture on synchronizing words.


💡 Research Summary

The paper investigates the decision problems that ask, given a finite automaton (either deterministic (DFA) or nondeterministic (NFA)) describing a regular language L over a finite alphabet Σ, whether the set of all prefixes, suffixes, factors (contiguous substrings), or subwords (scattered substrings) of words in L equals Σ*. In other words, it studies the universality of these derived languages. The authors provide a complete complexity classification for each of the four derived operations and for both DFA and NFA representations, and they uncover a surprising link between factor‑universality for DFAs and the long‑standing Černý conjecture on synchronizing words.
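For reference, the four derived languages can be written out as follows. The operator names Pref, Suff, Fact, and Subw are notation chosen here for illustration and are an assumption about naming, not a quotation from the paper.

```latex
\begin{align*}
\mathrm{Pref}(L) &= \{\, x \in \Sigma^* : xy \in L \text{ for some } y \in \Sigma^* \,\},\\
\mathrm{Suff}(L) &= \{\, y \in \Sigma^* : xy \in L \text{ for some } x \in \Sigma^* \,\},\\
\mathrm{Fact}(L) &= \{\, y \in \Sigma^* : xyz \in L \text{ for some } x, z \in \Sigma^* \,\},\\
\mathrm{Subw}(L) &= \{\, a_1 a_2 \cdots a_k : k \ge 0,\ a_i \in \Sigma,\
  w_0 a_1 w_1 a_2 \cdots a_k w_k \in L \text{ for some } w_0, \dots, w_k \in \Sigma^* \,\}.
\end{align*}
```

For example, with L = {abc}, the word ac belongs to Subw(L) but not to Fact(L), and bc belongs to Suff(L) and Fact(L) but not to Pref(L).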

1. Prefix and Suffix Universality
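The two problems behave differently once the automaton model is fixed. For NFAs, both prefix and suffix universality are PSPACE-complete. Membership in PSPACE holds because an NFA for the prefix closure (make every state that can reach a final state final) or for the suffix closure (make every reachable state initial) can be built in polynomial time, and NFA universality is in PSPACE. Hardness holds because NFA universality is known to remain PSPACE-hard when every state is final (and, by reversal, when every state is initial), and such languages are already prefix-closed (resp. suffix-closed) once useless states are removed. For DFAs, prefix universality is decidable in linear time by graph reachability: every word over Σ is a prefix of some word in L exactly when each state reachable from the initial state has a transition on every symbol of Σ and can reach a final state. Suffix universality for DFAs, by contrast, is PSPACE-complete: the suffix closure is the union of the languages accepted from all reachable states, so determinism is lost and the prefix argument does not carry over by symmetry.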
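The reachability test for DFA prefix universality can be sketched as below. This is a minimal illustration, not the paper's algorithm as written: the function name `dfa_prefix_universal` and the dictionary encoding of the transition function (missing keys mean "undefined") are assumptions made for the example.

```python
from collections import deque


def _closure(sources, edges):
    """Return every node reachable from `sources` by following `edges`."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def dfa_prefix_universal(alphabet, delta, initial, finals):
    """Decide whether every word over `alphabet` is a prefix of an accepted word.

    `delta` maps (state, symbol) -> state for a possibly partial DFA
    (hypothetical encoding chosen for this sketch).
    """
    forward, backward = {}, {}
    for (p, _a), q in delta.items():
        forward.setdefault(p, set()).add(q)
        backward.setdefault(q, set()).add(p)
    reachable = _closure([initial], forward)
    coreachable = _closure(finals, backward)
    # Every reachable state must be complete and must be able to reach a final state.
    return all(
        p in coreachable and all((p, a) in delta for a in alphabet)
        for p in reachable
    )
```

Both closures are single graph traversals, so the whole check runs in time linear in the size of the transition table.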

2. Factor (Contiguous Substring) Universality
The situation diverges sharply between DFAs and NFAs. For NFAs, factor‑universality remains PSPACE‑complete, mirroring the classic regular‑language universality problem. The proof reduces an arbitrary PSPACE problem to a factor‑universality instance by encoding the computation of a polynomial‑space Turing machine into the nondeterministic transitions of the NFA, ensuring that any missing factor would correspond to a rejecting configuration.
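For small NFAs the factor question can be explored directly with a subset construction. The sketch below is a brute-force illustration that may visit exponentially many subsets; it is not a polynomial-space procedure and not the paper's construction. The function name and the encoding of `delta` as a map from (state, symbol) to a set of states are assumptions for the example.

```python
from collections import deque


def _closure(sources, edges):
    """Return every node reachable from `sources` by following `edges`."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def nfa_factor_universal(alphabet, delta, initials, finals):
    """Decide by subset construction whether every word is a factor of L.

    `delta` maps (state, symbol) -> set of states (hypothetical encoding).
    """
    forward, backward = {}, {}
    for (p, _a), targets in delta.items():
        for q in targets:
            forward.setdefault(p, set()).add(q)
            backward.setdefault(q, set()).add(p)
    # Useful states: reachable from an initial state and able to reach a final one.
    trim = _closure(initials, forward) & _closure(finals, backward)
    start = frozenset(trim)
    if not start:
        return False  # L is empty, so not even the empty word is a factor
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            nxt = frozenset(
                q for p in current for q in delta.get((p, a), ()) if q in trim
            )
            if not nxt:
                return False  # the word read so far is a factor of no accepted word
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```

A word is a factor of L exactly when it can be read from some useful state to some useful state, which is why the search starts from the set of all useful states and fails as soon as the empty set is reached.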

For DFAs, however, factor universality is decidable in polynomial time, and this is where synchronizing words enter. Trim the DFA to the states that are reachable from the initial state and can reach a final state, and send every missing transition to a new dead state. A word w fails to be a factor of any word of L exactly when w takes every trimmed state to the dead state; such a w is a synchronizing (reset) word for the completed automaton, with the dead state as its target. Because the automaton is deterministic, the set of states still alive can only shrink as more letters are read, so a word that kills all states exists if and only if each state can be killed individually, that is, each state can reach the dead state. Hence the set of factors equals Σ* if and only if some trimmed state cannot reach the dead state, which is a plain reachability test. The Černý conjecture predicts that a synchronizing complete DFA with n states always has a synchronizing word of length at most (n-1)², so bounds on synchronizing word length translate into bounds on the length of the shortest word missing from the set of factors. Factor universality is not the same as being synchronizing: a complete DFA whose states are all reachable and final has every word as a factor whether or not it is synchronizing.
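For a DFA, a word is missing from the set of factors exactly when it drives every useful state into an undefined transition, so the question reduces to reachability. The sketch below illustrates that test under stated assumptions: the function name and the partial-map encoding of `delta` are hypothetical, and the dead state is left implicit as "transition undefined or leaving the useful states".

```python
from collections import deque


def _closure(sources, edges):
    """Return every node reachable from `sources` by following `edges`."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def dfa_factor_universal(alphabet, delta, initial, finals):
    """Decide whether every word over `alphabet` is a factor of an accepted word.

    `delta` maps (state, symbol) -> state for a possibly partial DFA
    (hypothetical encoding chosen for this sketch).
    """
    forward, backward = {}, {}
    for (p, _a), q in delta.items():
        forward.setdefault(p, set()).add(q)
        backward.setdefault(q, set()).add(p)
    trim = _closure([initial], forward) & _closure(finals, backward)
    # A state is incomplete if some symbol is undefined or leads out of `trim`.
    incomplete = [
        p for p in trim
        if any((p, a) not in delta or delta[(p, a)] not in trim for a in alphabet)
    ]
    trimmed_backward = {
        q: {p for p in preds if p in trim}
        for q, preds in backward.items() if q in trim
    }
    # Killable states are those that can reach an incomplete state inside `trim`.
    killable = _closure(incomplete, trimmed_backward)
    # A killing word for all states exists iff every useful state is killable.
    return bool(trim - killable)
```

The last line relies on determinism: applying a word that kills one surviving state never increases the number of survivors, so individual killing words can be chained into one word that kills every state.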

3. Subword (Scattered Substring) Universality
Subword universality is the easiest of the four: it is decidable in linear time for both DFAs and NFAs. After removing states that are unreachable or cannot reach a final state, the set of subwords of L equals Σ* if and only if some strongly connected component of the transition graph carries every symbol of Σ on its internal transitions. If such a component exists, any word can be embedded letter by letter into an accepted word by walking around the component between letters. Conversely, let a₁, …, a_k be the symbols of Σ and suppose (a₁a₂⋯a_k)^m is a subword of an accepted word for a sufficiently large m. An accepting path in an n-state automaton crosses fewer than n component boundaries, so one full block a₁a₂⋯a_k must be read inside a single component. The criterion depends only on the transition graph, so nondeterminism makes no difference.
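One criterion that can be checked directly on the transition graph is this: after discarding useless states, the set of subwords is Σ* exactly when some strongly connected component carries every symbol of Σ on its internal transitions. The sketch below implements that criterion in a deliberately simple quadratic form (one traversal per state); a linear-time version would compute the components with Tarjan's or Kosaraju's algorithm instead. The function name and the encoding of `delta` are assumptions for the example.

```python
from collections import deque


def _closure(sources, edges):
    """Return every node reachable from `sources` by following `edges`."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def subword_universal(alphabet, delta, initials, finals):
    """Decide whether every word over `alphabet` is a scattered subword of L.

    `delta` maps (state, symbol) -> set of states; a DFA is the special case
    of singleton sets (hypothetical encoding chosen for this sketch).
    """
    forward, backward = {}, {}
    for (p, _a), targets in delta.items():
        for q in targets:
            forward.setdefault(p, set()).add(q)
            backward.setdefault(q, set()).add(p)
    trim = _closure(initials, forward) & _closure(finals, backward)
    reach = {p: _closure([p], forward) & trim for p in trim}
    needed = set(alphabet)
    for p in trim:
        # The component of p: states mutually reachable with p.
        component = {q for q in reach[p] if p in reach[q]}
        labels = {
            a for (u, a), targets in delta.items()
            if u in component and any(v in component for v in targets)
        }
        if labels >= needed:
            return True
    return False
```

For instance, the language (ab)* passes the test because its two states form one component carrying both a and b, whereas a*b* fails because a and b sit in different components.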

4. Summary of Results
The authors summarize their findings in a table:

| Derived set | DFA complexity | NFA complexity |
|---|---|---|
| Prefixes | Linear time | PSPACE-complete |
| Suffixes | PSPACE-complete | PSPACE-complete |
| Factors | Polynomial time (a missing factor is a synchronizing word to a dead state) | PSPACE-complete |
| Subwords | Linear time | Linear time |

The paper emphasizes that the DFA factor case is the only one where a deep combinatorial conjecture (Černý) becomes relevant to a decision‑complexity question. By linking factor‑universality to synchronizability, the authors open a new avenue for applying results from automata synchronization to language‑theoretic universality problems.

5. Methodological Contributions
Beyond the classification, three technical ideas carry most of the weight: (i) polynomial-time constructions of NFAs for the prefix, suffix, and factor closures, which place all of these problems in PSPACE; (ii) the reading of a missing factor of a DFA language as a synchronizing word that sends every state of the trimmed, completed automaton to a dead state; and (iii) a strongly-connected-component criterion for subword universality (some component of the trimmed automaton must carry every symbol of Σ on its transitions), which applies to DFAs and NFAs alike.

6. Implications and Future Directions
The connection to Černý's conjecture suggests that any progress on bounding synchronizing word lengths could directly improve bounds on the length of the shortest word missing from the set of factors of a DFA language. The existence question is not the bottleneck here: testing whether a complete DFA is synchronizing takes polynomial time, whereas deciding whether a synchronizing word of at most a given length exists is NP-complete. The authors propose investigating parameterized versions (e.g., bounding the number of states, alphabet size, or the length of the shortest synchronizing word) and exploring whether approximation algorithms for synchronizing word length could yield approximate answers to factor-universality.

In conclusion, the paper delivers a thorough complexity map for universality questions concerning prefixes, suffixes, factors, and subwords of regular languages, distinguishes the impact of deterministic versus nondeterministic representations, and for the first time ties a classic open problem in automata theory (Černý’s conjecture) to a language‑theoretic decision problem. This synthesis enriches both fields and sets the stage for further interdisciplinary research.

