On the Hardness and Inapproximability of Recognizing Wheeler Graphs
In recent years several compressed indexes based on variants of the Burrows-Wheeler transformation have been introduced. Some of these index structures far more complex than a single string, as was originally done with the FM-index [Ferragina and Manzini, J. ACM 2005]. As such, there has been an effort to better understand under which conditions such an indexing scheme is possible. This led to the introduction of Wheeler graphs [Gagie et al., Theor. Comput. Sci., 2017]. A Wheeler graph is a directed graph with edge labels which satisfies two simple axioms. Wheeler graphs can be indexed in a way which is space efficient and allows for fast traversal. Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT-related structures can be represented as Wheeler graphs. Here we answer the open question of whether or not there exists an efficient algorithm for recognizing if a graph is a Wheeler graph. We demonstrate: (i) Recognizing if a graph is a Wheeler graph is NP-complete for any edge label alphabet of size $\sigma \geq 2$, even for DAGs. It can be solved in linear time for $\sigma = 1$; (ii) An optimization variant called Wheeler Graph Violation (WGV), which aims to remove the minimum number of edges needed to obtain a Wheeler graph, is APX-hard, even for DAGs. Hence, unless P = NP, there exists a constant $C > 1$ such that there is no $C$-approximation algorithm. We also show that, conditioned on the Unique Games Conjecture, for every constant $C \geq 1$ it is NP-hard to find a $C$-approximation to WGV; (iii) The Wheeler Subgraph problem (WS), which aims to find the largest Wheeler subgraph, is in APX for $\sigma = O(1)$; (iv) For the above problems there exist efficient exponential time exact algorithms, relying on graph isomorphism being computed in strictly sub-exponential time; (v) There is a class of graphs for which the recognition problem is polynomial time solvable.
💡 Research Summary
The paper investigates the computational complexity of recognizing Wheeler graphs, a class of edge‑labeled directed graphs that admit space‑efficient BWT‑based indexing. A Wheeler graph is defined by two simple ordering axioms relating edge labels and vertex orderings; these axioms guarantee the “path coherence” property essential for fast traversal. The authors address both the decision problem (does a given graph admit a Wheeler ordering?) and several optimization variants.
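The summary does not spell out the two axioms. The sketch below checks them for a given vertex ordering, following the usual statement from Gagie et al.: vertices with in-degree 0 come first, and for any two edges (u, v) and (u', v') with labels a and a', a < a' implies v < v', while a = a' and u < u' imply v ≤ v'. The function name `is_wheeler_ordering` and the `(source, target, label)` edge format are assumptions made for illustration; this only verifies a candidate ordering and is not a recognition algorithm.

```python
from itertools import combinations


def is_wheeler_ordering(order, edges):
    """Check whether `order` (a sequence of all vertices) satisfies the Wheeler axioms.

    edges: list of (source, target, label) triples with mutually comparable labels.
    """
    pos = {v: i for i, v in enumerate(order)}
    has_incoming = {v for _, v, _ in edges}

    # Vertices with in-degree 0 must precede every vertex with positive in-degree.
    seen_positive = False
    for v in order:
        if v in has_incoming:
            seen_positive = True
        elif seen_positive:
            return False

    for (u1, v1, a1), (u2, v2, a2) in combinations(edges, 2):
        if a2 < a1:  # orient the pair so that a1 <= a2
            (u1, v1, a1), (u2, v2, a2) = (u2, v2, a2), (u1, v1, a1)
        if a1 < a2:
            # Axiom (i): a smaller label forces a strictly smaller target.
            if not pos[v1] < pos[v2]:
                return False
        else:
            # Axiom (ii): equal labels preserve the order of sources (weakly).
            if pos[u1] < pos[u2] and not pos[v1] <= pos[v2]:
                return False
            if pos[u2] < pos[u1] and not pos[v2] <= pos[v1]:
                return False
    return True


edges = [(0, 1, "a"), (0, 2, "b"), (1, 2, "b")]
print(is_wheeler_ordering([0, 1, 2], edges))
print(is_wheeler_ordering([0, 2, 1], edges))
```

Checking one ordering takes time quadratic in the number of edges here; the hardness results concern finding such an ordering, not verifying it.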
Main Results
- NP‑completeness of Recognition
For any alphabet size σ ≥ 2, recognizing Wheeler graphs is NP‑complete. The proof reduces the classic Betweenness problem (known to be NP‑complete) to Wheeler recognition. The reduction builds a directed acyclic graph (DAG) whose vertices represent copies of the input elements and the triples of the Betweenness instance. Edges labeled 1 enforce a duplicated permutation of the elements; edges labeled 2 encode the Betweenness constraints. A Wheeler ordering exists iff the original Betweenness instance has a feasible total order. Consequently, even when the input graph is a DAG, the problem remains NP‑complete. For σ = 1, the problem becomes equivalent to testing whether a DAG has queue‑number 1, a problem solvable in linear time using known algorithms. Moreover, the authors prove that any Wheeler graph with a single label contains Θ(n) edges, establishing a tight linear bound.
- Complexity on d‑NFAs
A d‑NFA is a nondeterministic finite automaton where each state has at most d outgoing edges with the same label. The authors show that Wheeler recognition remains NP‑complete for d‑NFAs with d ≥ 5, even when the automaton is a DAG. The reduction uses a variant of the 4‑NAE‑SAT problem, first converting it to a restricted 3‑NAE‑SAT (where each middle variable appears at most twice) and then encoding the clauses into a DAG. This complements earlier work showing polynomial‑time recognizability for d ≤ 2.
- Optimization Variant – Wheeler Graph Violation (WGV)
WGV asks for the smallest set of edges whose removal yields a Wheeler graph. By reducing from Minimum Feedback Arc Set, the authors prove that WGV is APX‑hard, even on DAGs, so unless P = NP there is some constant C > 1 for which no C‑approximation algorithm exists. Assuming the Unique Games Conjecture (UGC), they further show that for every constant C ≥ 1, finding a C‑approximation is NP‑hard. Hence, under UGC and P ≠ NP, no constant‑factor approximation algorithm exists for WGV.
- Dual Optimization – Wheeler Subgraph (WS)
WS seeks the largest subgraph that is a Wheeler graph. For constant alphabet size σ = O(1), the problem lies in APX. The authors present a simple greedy algorithm that selects, for each label, a contiguous block of vertices respecting the Wheeler ordering. This algorithm guarantees a solution whose size is at least a 1/σ‑fraction of the optimum, establishing a constant‑factor approximation.
- Exact Exponential‑Time Algorithms
Leveraging recent results that undirected graph isomorphism can be solved in sub‑exponential time, the paper gives exact algorithms for recognition, WGV, and WS that run in time 2^{O(n + e log σ)}, where n is the number of vertices and e the number of edges. The approach enumerates all possible bounded‑size encodings of Wheeler graphs and checks each candidate via isomorphism testing.
- A Polynomial‑Time Solvable Class
Using PQ‑trees and techniques from queue‑number testing, the authors identify a non‑trivial class of graphs (e.g., DAGs with a single label or with restricted label ordering) where Wheeler recognition can be performed in polynomial time. This class includes many practical structures arising in bioinformatics and text indexing.
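To make the recognition and WGV problems above concrete, here is a naive exhaustive-search sketch for very small graphs. It is not the paper's 2^{O(n + e log σ)} algorithm: it simply tries every vertex permutation (factorial time) and, for WGV, every edge subset in order of increasing size. All function names and the `(source, target, label)` edge format are assumptions made for illustration.

```python
from itertools import combinations, permutations


def is_wheeler_ordering(order, edges):
    """Check the Wheeler axioms for one candidate ordering of the vertices."""
    pos = {v: i for i, v in enumerate(order)}
    has_incoming = {v for _, v, _ in edges}
    seen_positive = False
    for v in order:  # in-degree-0 vertices must come first
        if v in has_incoming:
            seen_positive = True
        elif seen_positive:
            return False
    for (u1, v1, a1), (u2, v2, a2) in combinations(edges, 2):
        if a2 < a1:  # orient the pair so that a1 <= a2
            (u1, v1, a1), (u2, v2, a2) = (u2, v2, a2), (u1, v1, a1)
        if a1 < a2:
            if not pos[v1] < pos[v2]:
                return False
        else:
            if pos[u1] < pos[u2] and not pos[v1] <= pos[v2]:
                return False
            if pos[u2] < pos[u1] and not pos[v2] <= pos[v1]:
                return False
    return True


def find_wheeler_ordering(vertices, edges):
    """Recognition by brute force: return a Wheeler ordering, or None if none exists."""
    for order in permutations(vertices):
        if is_wheeler_ordering(order, edges):
            return list(order)
    return None


def min_violation(vertices, edges):
    """WGV by brute force: a smallest edge set whose removal leaves a Wheeler graph."""
    for k in range(len(edges) + 1):
        for removed in combinations(range(len(edges)), k):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if find_wheeler_ordering(vertices, kept) is not None:
                return [edges[i] for i in removed]


good = [(0, 1, "a"), (0, 2, "b"), (1, 2, "b")]
bad = [(0, 2, "a"), (1, 2, "b")]  # two differently labeled edges enter vertex 2
print(find_wheeler_ordering([0, 1, 2], good))
print(find_wheeler_ordering([0, 1, 2], bad))
print(min_violation([0, 1, 2], bad))
```

If subgraph size is measured in edges, the exact WS optimum on the same instance is the edge count minus the size of the set returned by `min_violation`; the approximation guarantees for the two problems still differ, as the list above notes.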
Implications and Future Directions
The results delineate a clear boundary: Wheeler recognition is easy only when the label alphabet is trivial (σ = 1) or when the graph’s nondeterminism is very low (d ≤ 2). In the general case, the problem is computationally intractable, and even the natural optimization versions are either APX‑hard (for edge deletion) or admit only constant‑factor approximations (for subgraph extraction). These findings caution practitioners that preprocessing a generic graph to fit a Wheeler‑based index may be computationally prohibitive, and motivate the search for parameterized or heuristic methods tailored to specific graph families.
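As a rough way to see on which side of this boundary an input falls, the two parameters can be read directly off the edge list: σ is the number of distinct labels, and d is the largest number of equally labeled edges leaving any single vertex. The helper name `alphabet_size_and_d` and the `(source, target, label)` edge format are hypothetical, chosen only for this sketch.

```python
from collections import Counter


def alphabet_size_and_d(edges):
    """Return (sigma, d) for a list of (source, target, label) edges.

    sigma: number of distinct edge labels.
    d: maximum number of outgoing edges from one vertex sharing the same label.
    """
    labels = {label for _, _, label in edges}
    fanout = Counter((source, label) for source, _, label in edges)
    d = max(fanout.values(), default=0)
    return len(labels), d


edges = [(0, 1, "a"), (0, 2, "a"), (1, 2, "b")]
sigma, d = alphabet_size_and_d(edges)
print(sigma, d)  # 2 2
```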