A Trichotomy for Regular Simple Path Queries on Graphs
Regular path queries (RPQs) select nodes connected by some path in a graph. The edge labels of such a path have to form a word that matches a given regular expression. We investigate the evaluation of RPQs with an additional constraint that prevents multiple traversals of the same nodes. Those regular simple path queries (RSPQs) find several applications in practice, yet they quickly become intractable, even for basic languages such as (aa)* or a*ba*. In this paper, we establish a comprehensive classification of regular languages with respect to the complexity of the corresponding regular simple path query problem. More precisely, we identify the fragment that is maximal in the following sense: regular simple path queries can be evaluated in polynomial time for every regular language L that belongs to this fragment and evaluation is NP-complete for languages outside this fragment. We thus fully characterize the frontier between tractability and intractability for RSPQs, and we refine our results to show the following trichotomy: evaluation of RSPQs is either in AC0, NL-complete or NP-complete in data complexity, depending on the regular language L. The fragment identified also admits a simple characterization in terms of regular expressions. Finally, we also discuss the complexity of the following decision problem: decide, given a language L, whether finding a regular simple path for L is tractable. We consider several alternative representations of L: DFAs, NFAs or regular expressions, and prove that this problem is NL-complete for the first representation and PSPACE-complete for the other two. In conclusion, we extend our results from edge-labeled graphs to vertex-labeled graphs and vertex-edge labeled graphs.
💡 Research Summary
The paper investigates the evaluation problem for Regular Simple Path Queries (RSPQs), a variant of Regular Path Queries (RPQs) where the returned paths must be simple, i.e., they may not repeat any vertex. While RPQ evaluation is NL‑complete in data complexity, the additional simplicity constraint dramatically raises the difficulty: even for very simple regular languages such as (aa)* or a*ba*, the problem becomes NP‑complete. The authors therefore set out to delineate exactly which regular languages admit polynomial‑time evaluation of RSPQs and which do not.
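To make the difference between the two semantics concrete, here is a minimal Python sketch (not the authors' code). It assumes a graph stored as an adjacency map of (label, successor) pairs and a partial DFA given as a dict of transitions; `rpq_walk` and `rspq_brute_force` are hypothetical helper names. The walk version is a breadth‑first search over (vertex, DFA state) pairs and runs in polynomial time. The simple‑path version must remember every visited vertex and backtrack, which is exponential in the worst case.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: edge-labeled graph as vertex -> list of (label, successor).
Graph = Dict[str, List[Tuple[str, str]]]
# Partial DFA transition function: (state, letter) -> state; missing entries reject.
Delta = Dict[Tuple[int, str], int]


def rpq_walk(graph: Graph, delta: Delta, start: int, accepting: Set[int],
             source: str, target: str) -> bool:
    """Walk semantics (ordinary RPQ): BFS over the product of graph and DFA."""
    seen = {(source, start)}
    queue = deque([(source, start)])
    while queue:
        vertex, state = queue.popleft()
        if vertex == target and state in accepting:
            return True
        for label, succ in graph.get(vertex, []):
            nxt = delta.get((state, label))
            if nxt is not None and (succ, nxt) not in seen:
                seen.add((succ, nxt))
                queue.append((succ, nxt))
    return False


def rspq_brute_force(graph: Graph, delta: Delta, start: int, accepting: Set[int],
                     source: str, target: str) -> bool:
    """Simple-path semantics (RSPQ): backtracking that never revisits a vertex."""
    visited = {source}

    def dfs(vertex: str, state: int) -> bool:
        if vertex == target:
            # A simple path ends at its first (and only) visit to the target.
            return state in accepting
        for label, succ in graph.get(vertex, []):
            nxt = delta.get((state, label))
            if nxt is None or succ in visited:
                continue
            visited.add(succ)
            if dfs(succ, nxt):
                return True
            visited.remove(succ)  # backtrack
        return False

    return dfs(source, start)


# DFA for (aa)*: state 0 = even number of a's (accepting), state 1 = odd.
EVEN_A: Delta = {(0, "a"): 1, (1, "a"): 0}

# s -a-> t plus a self-loop on t.
g: Graph = {"s": [("a", "t")], "t": [("a", "t")]}
print(rpq_walk(g, EVEN_A, 0, {0}, "s", "t"))          # True  (walk s, t, t)
print(rspq_brute_force(g, EVEN_A, 0, {0}, "s", "t"))  # False (no simple path)
```

On this graph the only even‑length walk from s to t uses the self‑loop on t and therefore repeats t, so the RPQ answer for (aa)* is yes while the RSPQ answer is no.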
The core contribution is a complete classification of regular languages with respect to the data complexity of the corresponding RSPQ problem. The authors identify a maximal tractable fragment F of regular languages: RSPQ evaluation is polynomial‑time for every language in F and NP‑complete for every regular language outside F. Membership in F is a property of the language L itself, not of the expression or automaton used to describe it, and it can be decided from a DFA for L. F also admits a simple characterization in terms of regular expressions. For example, every finite language (such as aba), a*, and (a|b)* belong to F, whereas (aa)* and a*ba* do not.
For languages inside F, the authors prove that RSPQ evaluation is always polynomial‑time. Moreover, they refine the classification into three precise data‑complexity classes:
- AC⁰ – If L is finite (equivalently, its trimmed minimal DFA is acyclic), the query can be answered by constant‑depth, polynomial‑size circuits. Intuitively, the words of L have bounded length k, so a matching simple path has at most k edges and its existence can be expressed by a fixed first‑order formula over the graph; no unbounded search is required.
- NL‑complete – If L is infinite but belongs to F, the problem is NL‑complete, matching the data complexity of ordinary RPQs. Intuitively, evaluation then behaves like graph reachability combined with a regular‑language check, which a nondeterministic log‑space machine can handle.
- NP‑complete – For every regular language outside F, the authors give a reduction from a known NP‑complete problem, showing that the simplicity constraint combined with the language constraint makes evaluation NP‑hard. Membership in NP is straightforward: a simple path has at most |V| vertices, so it can be guessed and checked against a DFA for L in polynomial time.
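A finite language is exactly one whose trimmed DFA (the DFA restricted to states that are reachable from the start state and can reach an accepting state) has no cycle, so the acyclic case in the list above can be recognized with two graph searches and a cycle check. The Python sketch below assumes a dict‑based partial DFA as a hypothetical input format, and `is_finite_language` is an illustrative name. It only separates finite from infinite languages; it does not decide membership in F.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encoding of a partial DFA: (state, letter) -> state.
Delta = Dict[Tuple[int, str], int]


def _closure(seeds: Iterable[int], edges: Dict[int, Set[int]]) -> Set[int]:
    """All states reachable from `seeds` along `edges` (seeds included)."""
    seen = set(seeds)
    stack = list(seen)
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_finite_language(delta: Delta, start: int, accepting: Set[int]) -> bool:
    """L is finite iff the trimmed DFA has no cycle."""
    succ: Dict[int, Set[int]] = {}
    pred: Dict[int, Set[int]] = {}
    for (p, _letter), q in delta.items():
        succ.setdefault(p, set()).add(q)
        pred.setdefault(q, set()).add(p)

    # Useful states: reachable from start AND able to reach an accepting state.
    useful = _closure({start}, succ) & _closure(accepting, pred)

    # Iterative three-colour DFS on useful states; a GREY successor is a back edge.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {q: WHITE for q in useful}
    for root in useful:
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(succ.get(root, ())))]
        while stack:
            node, children = stack[-1]
            nxt = next((q for q in children if q in useful), None)
            if nxt is None:
                colour[node] = BLACK
                stack.pop()
            elif colour[nxt] == GREY:
                return False  # cycle among useful states: L is infinite
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(succ.get(nxt, ()))))
    return True


ABA = {(0, "a"): 1, (1, "b"): 2, (2, "a"): 3}
ABA_WITH_SINK = {**ABA, (0, "b"): 4, (4, "a"): 4, (4, "b"): 4}
EVEN_A = {(0, "a"): 1, (1, "a"): 0}
print(is_finite_language(ABA, 0, {3}))            # True
print(is_finite_language(ABA_WITH_SINK, 0, {3}))  # True: the sink loop is trimmed away
print(is_finite_language(EVEN_A, 0, {0}))         # False: (aa)* is infinite
```

Trimming matters: a complete DFA for a finite language usually contains a looping sink state, and that loop must not be mistaken for a cycle of the language.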
Beyond data complexity, the paper studies the language‑tractability decision problem: given a description of a regular language L (as a DFA, NFA, or regular expression), decide whether L lies in the tractable fragment F. They show:
- When L is presented as a DFA, the problem is NL‑complete. Checking the tractability condition amounts to reachability‑style searches in the DFA's transition graph, which a nondeterministic machine can carry out while storing only a constant number of states, that is, in logarithmic space.
- When L is given as an NFA or a regular expression, the problem becomes PSPACE‑complete. Determinizing the automaton may cause an exponential blow‑up in the number of states, and the complexity matches that of classic PSPACE‑complete problems such as universality and equivalence for NFAs and regular expressions.
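The exponential cost of determinization can be observed on a textbook family that is unrelated to RSPQs: the NFA for (a|b)* a (a|b)^(n-1), "the n‑th letter from the end is a", has n + 1 states, yet the subset construction reaches 2^n distinct subsets. The Python sketch below (hypothetical helper names) counts them. It illustrates why working from an NFA or expression is costlier than working from a DFA; it is not the PSPACE‑hardness proof itself.

```python
from typing import Dict, FrozenSet, Set, Tuple

# NFA transitions: (state, letter) -> set of successor states.
NfaDelta = Dict[Tuple[int, str], Set[int]]


def nth_from_last_is_a(n: int) -> NfaDelta:
    """NFA with states 0..n for (a|b)* a (a|b)^(n-1); state n is accepting."""
    delta: NfaDelta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def reachable_subsets(delta: NfaDelta, start: int, alphabet: str) -> Set[FrozenSet[int]]:
    """Subset construction, exploring only the DFA states reachable from {start}."""
    initial = frozenset({start})
    seen = {initial}
    stack = [initial]
    while stack:
        current = stack.pop()
        for letter in alphabet:
            nxt = frozenset(q for p in current for q in delta.get((p, letter), ()))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


for n in range(1, 6):
    print(n, len(reachable_subsets(nth_from_last_is_a(n), 0, "ab")))
# prints the pairs 1 2, 2 4, 3 8, 4 16, 5 32, one per line
```

Each reachable subset records which of the last n letters were an a, which is why all 2^n subsets appear.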
The authors also extend all results to two richer labeling models:
- Vertex‑labeled graphs, where the label sequence is taken from vertex labels rather than edge labels. By treating vertex labels as edge labels of a transformed graph, the authors show that the same classification holds.
- Vertex‑and‑edge‑labeled graphs, where each step contributes a pair (vertex label, edge label). By encoding each pair as a single symbol over a larger alphabet, the same DFA‑structure analysis applies, yielding the identical trichotomy.
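One way to realize a vertex‑label‑to‑edge‑label transformation is sketched below in Python. It is an assumption for illustration and may differ from the paper's construction. Each edge u → v receives the label of its head v, and a fresh entry vertex (hypothetically named `__entry__`) gets a single edge into the query source carrying the source's label. A simple path from the source then reads exactly the sequence of labels of the vertices it visits, assuming the convention that both endpoints contribute their labels. Simplicity is preserved because the entry vertex has no incoming edges.

```python
from typing import Dict, List, Tuple

# Hypothetical input: vertex labels plus unlabeled directed edges.
VertexLabels = Dict[str, str]
Edges = List[Tuple[str, str]]
# Output: edge-labeled graph as (tail, label, head) triples.
LabeledEdges = List[Tuple[str, str, str]]

FRESH = "__entry__"  # hypothetical name; must not clash with an existing vertex


def to_edge_labeled(labels: VertexLabels, edges: Edges, source: str) -> LabeledEdges:
    """Move each vertex label onto the edges entering that vertex.

    The vertex path v0 v1 ... vk, read as label(v0) label(v1) ... label(vk),
    becomes the edge-labeled path FRESH -> v0 -> v1 -> ... -> vk.
    """
    result: LabeledEdges = [(FRESH, labels[source], source)]
    for u, v in edges:
        result.append((u, labels[v], v))
    return result


labels = {"s": "a", "m": "b", "t": "a"}
edges = [("s", "m"), ("m", "t")]
print(to_edge_labeled(labels, edges, "s"))
# [('__entry__', 'a', 's'), ('s', 'b', 'm'), ('m', 'a', 't')]
```

The query from the source to a target in the vertex‑labeled graph becomes a query from `__entry__` to the same target in the edge‑labeled graph.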
Methodologically, the paper combines automata‑theoretic characterizations with classic complexity‑theoretic reductions. The tractable fragment is shown to be maximal by giving, for every language outside it, a reduction from an NP‑complete problem. Conversely, for languages in the fragment, the authors present explicit algorithms: a constant‑depth circuit construction for the finite (acyclic‑DFA) case, and a nondeterministic log‑space procedure for the infinite case.
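The two tractable ends can be illustrated with elementary procedures. These only show why the cases are easy; they are not the authors' algorithms, which achieve the sharper AC⁰ and NL bounds. For a finite language, a search bounded by the length k of each word explores at most about |V|^k candidate paths, which is polynomial once L is fixed. For L = Σ* (any label sequence), a breadth‑first search returns a shortest path, and a shortest path never repeats a vertex. The graph encoding and the function names `rspq_finite` and `rspq_sigma_star` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical encoding: vertex -> list of (label, successor).
Graph = Dict[str, List[Tuple[str, str]]]


def rspq_finite(graph: Graph, words: Iterable[str], source: str, target: str) -> bool:
    """Finite L: look for a simple path that spells some word of L exactly."""
    def follow(vertex: str, rest: str, visited: Set[str]) -> bool:
        if not rest:
            return vertex == target
        for label, succ in graph.get(vertex, []):
            if label == rest[0] and succ not in visited:
                if follow(succ, rest[1:], visited | {succ}):
                    return True
        return False

    return any(follow(source, w, {source}) for w in words)


def rspq_sigma_star(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """L = Sigma*: BFS finds a shortest path, which is automatically simple."""
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            path: List[str] = []
            node: Optional[str] = vertex
            while node is not None:  # walk parent pointers back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for _label, succ in graph.get(vertex, []):
            if succ not in parent:
                parent[succ] = vertex
                queue.append(succ)
    return None


g2: Graph = {"s": [("a", "m"), ("a", "t")], "m": [("b", "t")], "t": [("a", "s")]}
print(rspq_finite(g2, ["ab"], "s", "t"))         # True: s -a-> m -b-> t
print(rspq_finite(g2, ["aa", "aba"], "s", "t"))  # False
print(rspq_sigma_star(g2, "s", "t"))             # ['s', 't']
```

For (aa)* or a*ba*, neither shortcut applies: word lengths are unbounded, and a shortest walk may have the wrong label.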
The significance of the work lies in its practical implications for graph database systems and knowledge‑graph query engines. Since many real‑world queries are expressed as RPQs with a simplicity requirement (to avoid cycling, which would otherwise produce infinitely many matching paths), the classification tells system designers exactly which regular expressions can be safely allowed without risking intractable query evaluation. Moreover, the tractability‑decision algorithms enable static analysis tools that can warn users when a query's regular expression falls outside the safe fragment. Because membership in the fragment depends only on the language, rewriting such an expression into an equivalent one cannot make the query tractable.
In summary, the paper delivers a full trichotomy—AC⁰, NL‑complete, NP‑complete—for the data complexity of Regular Simple Path Queries, precisely characterizes the maximal tractable regular‑language fragment, determines the complexity of deciding membership in that fragment for various language representations, and shows that the results extend seamlessly to vertex‑labeled and mixed vertex‑edge‑labeled graph models. This establishes a definitive frontier between tractable and intractable RSPQs and provides both theoretical insight and actionable guidance for the design of efficient graph query languages.