On the Complexity of the Ordered Covering Problem in Distance Geometry

December 02, 2025

📝 Original Info

Title: On the Complexity of the Ordered Covering Problem in Distance Geometry
ArXiv ID: 2512.03124
Date: 2025-12-02
Authors: Michael Souza, Júlio Araújo, John Kesley Costa, Carlile Lavor

📝 Abstract

The Ordered Covering Problem (OCP) arises in the context of the Discretizable Molecular Distance Geometry Problem (DMDGP), where the ordering of pruning edges significantly impacts the performance of the SBBU algorithm for protein structure determination. In recent work, Souza et al. (2023) formalized OCP as a hypergraph covering problem with ordered, exponential costs, and proposed a greedy heuristic that outperforms the original SBBU ordering by orders of magnitude. However, the computational complexity of finding optimal solutions remained open. In this paper, we prove that OCP is NP-complete through a polynomial-time reduction from the strongly NP-complete 3-Partition problem. Our reduction constructs a tight budget that forces optimal solutions to correspond exactly to valid 3-partitions. This result establishes a computational barrier for optimal edge ordering and provides theoretical justification for the heuristic approaches currently used in practice.

📄 Full Content

The Distance Geometry Problem (DGP) is fundamental to computational structural biology: given a weighted graph $G = (V, E, d)$ with distance function $d : E \to \mathbb{R}_+$ and a positive integer $K$, find a realization $x : V \to \mathbb{R}^K$ such that $\|x_u - x_v\| = d_{uv}$ for all $\{u, v\} \in E$ [13,15,3]. Saxe [18] proved that the DGP is strongly NP-hard, even for fixed dimension $K$. This problem has extensive applications in determining three-dimensional protein structures from experimental distance data obtained via Nuclear Magnetic Resonance (NMR) spectroscopy [21].
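To make the feasibility condition concrete, the following minimal sketch checks whether a candidate realization satisfies every distance constraint. The function name `is_realization`, the dictionary-based encoding, and the tolerance `tol` are illustrative assumptions, not part of the paper.

```python
import math

def is_realization(x, d, tol=1e-9):
    """Return True if ||x[u] - x[v]|| matches d[(u, v)] for every edge.

    x: dict mapping each vertex to a tuple of K coordinates (hypothetical encoding).
    d: dict mapping each edge (u, v) to its prescribed distance.
    """
    return all(
        math.isclose(math.dist(x[u], x[v]), duv, abs_tol=tol)
        for (u, v), duv in d.items()
    )

# A 3-4-5 right triangle embedded in the plane (K = 2).
x = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (3.0, 4.0)}
d = {(0, 1): 3.0, (1, 2): 4.0, (0, 2): 5.0}
print(is_realization(x, d))                   # True
print(is_realization(x, {**d, (0, 2): 6.0}))  # False
```

Checking a given realization is easy; the hardness of the DGP lies in finding one.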

For arbitrary graphs, the DGP typically requires search algorithms in continuous space [15]. However, certain protein-derived graphs possess special structural properties that enable discrete solution methods. In particular, protein backbone graphs exhibit a natural ordering of atoms along the molecular chain, where each atom’s position can be determined from a finite set of candidate positions based on its predecessors [10]. This discretization property led to the development of the Discretizable Molecular Distance Geometry Problem (DMDGP) [11], a subclass of DGP instances characterized by vertex orders satisfying specific adjacency constraints.

The DMDGP framework has proven remarkably effective for protein backbone determination. When the graph has an appropriate vertex order, the solution space becomes discrete, and a Branch-and-Prune (BP) algorithm [14] can systematically explore all feasible molecular conformations. The BP algorithm constructs solutions incrementally, branching on the possible positions of each atom and pruning configurations that violate distance constraints. While exponential in the worst case, BP remains practical for proteins due to effective pruning from additional distance constraints.

Recently, Gonçalves et al. [6] introduced the SBBU algorithm, which represents a significant advance over the classical BP approach. Rather than exploring a single global search tree, SBBU decomposes the DMDGP into a sequence of smaller subproblems, each associated with a pruning edge (a distance constraint beyond those required for discretization). The algorithm solves these subproblems sequentially, and crucially, binary variables from solved subproblems can be eliminated from subsequent ones through symmetry exploitation [17].

The computational efficiency of SBBU depends critically on the order in which pruning edges are processed. Given a permutation $\pi = (e_1, \ldots, e_m)$ of the $m$ pruning edges, SBBU solves the sequence $(P(e_1), \ldots, P(e_m))$ of feasibility subproblems. Each subproblem $P(e_i)$ involves a set of binary variables whose cardinality determines an exponential search space. The ordering determines which variables can be eliminated early, dramatically affecting the total computational cost.

Souza et al. [19] demonstrated this impact empirically: on a test set of 5,000 randomly generated instances, the original SBBU edge ordering performed on average 1,300 times worse than optimal orderings, with some instances showing gaps exceeding 6,000-fold. They formalized the edge ordering problem as the Ordered Covering Problem (OCP), viewing the pruning edges as hyperedges covering segments of binary variables, where the order determines which variables remain active in each subproblem. Their proposed greedy heuristic achieved near-optimal performance (within 0.1% on average), but the fundamental computational complexity of finding optimal solutions remained an open question.

The complexity of vertex ordering problems in distance geometry has been extensively studied. Cassioli et al. [1] analyzed three types of discretization vertex orders, establishing their NP-completeness and inclusion relationships:

The Trilateration Ordering Problem (TOP) asks whether a graph admits a DDGP order, where each vertex beyond an initial K-clique is adjacent to at least K predecessors. Cassioli et al. proved TOP is NP-complete by reduction from Clique. However, for fixed K, TOP becomes polynomial-time solvable [9].

The Contiguous Trilateration Ordering Problem (CTOP) requires the K adjacent predecessors to be contiguous in the order. This defines the stricter $^K$DMDGP class, which has favorable symmetry properties [11]. These results establish the inclusion relationship CTOP ⊊ ReOP ⊊ TOP, with all three problems being NP-complete. More recently, Lavor et al. [12] showed an interesting dichotomy: the ReOP is NP-complete for K = 1 (essentially Hamiltonian Path), but belongs to P for any fixed K ≥ 2, with a polynomial-time algorithm running in $O(|V|^{2K})$ time.

The OCP differs fundamentally from these vertex ordering problems. While TOP, CTOP, and ReOP concern the order of graph vertices with local adjacency constraints, OCP concerns the order of pruning edges (or more generally, hyperedges in a hypergraph) with a global, exponential cost function. The cost structure arises from the exponential growth of search spaces with respect to the number of active binary variables.

Formally, an OCP instance consists of a set $S$ of labels with weights, a family $\mathcal{E}$ of subsets covering $S$, and a budget $C$. An ordered covering is a sequence $E' = (E'_1, \ldots, E'_k)$ of sets from $\mathcal{E}$ that collectively cover $S$. The cost is defined by residual sets:

$$U_i = E'_i \setminus \bigcup_{j<i} E'_j, \qquad F(E') = \sum_{i=1}^{k} 2^{\sum_{x \in U_i} \mathrm{val}(x)}.$$

The decision problem asks: does there exist an ordered covering with $F(E') \le C$? Unless stated otherwise, all integers in the input (weights and budget) are encoded in binary.

The exponential cost function fundamentally distinguishes OCP from classical covering problems like Set Cover or Hitting Set, which have linear or polynomial costs [8,5,2]. Moreover, the order dependence means that simply finding a minimal cover is insufficient: the sequence matters critically. Taken together, these structural features might suggest that OCP is computationally tractable, perhaps even admitting a greedy algorithm or a dynamic programming approach.

In this paper, we resolve the complexity of OCP by proving it is NP-complete. Our main result is:

Theorem 1. The Ordered Covering Problem is NP-complete.

We establish this through a polynomial-time reduction from the 3-Partition problem, which is known to be strongly NP-complete [5]. The reduction uses a carefully calibrated exponential cost structure to separate valid from invalid solutions, imposes a tight budget that is met exactly when a 3-partition exists, and builds opening and assignment edges that force any feasible solution to respect the triplet structure of the 3-partition instance.

The proof relies on three key lemmas establishing that any feasible solution must have a rigid structure: the number of active assignment edges equals the number of bins in the corresponding 3-Partition instance, each preceded by its opening edge, each contributing exactly three new labels with disjoint coverage. These structural constraints ensure a bijection between OCP solutions and 3-partitions.

Note that although we reduce from the strongly NP-complete 3-Partition problem, the budget parameter in our construction may be exponential in the size of the 3-Partition instance. Consequently, this reduction establishes NP-completeness of OCP but does not establish strong NP-completeness; we return to this point when describing the reduction in Section 3.

Our NP-completeness result provides theoretical justification for the heuristic approach taken by Souza et al. [19]. The greedy heuristic’s near-optimal performance is not merely a practical convenience but a well-motivated strategy given the fundamental computational barrier. The result also explains why the original SBBU ordering can be arbitrarily suboptimal: no polynomial-time algorithm can guarantee optimality, assuming P ≠ NP.

From a broader perspective, our result adds to the landscape of complexity results in distance geometry. While vertex ordering problems (TOP, CTOP, ReOP) are NP-complete via reductions from Clique and Hamiltonian Path, edge ordering (OCP) is NP-complete via reduction from 3-Partition, highlighting the distinct computational structure of the two problem types.

The remainder of this paper is organized as follows. Section 2 provides preliminaries, including formal definitions of OCP and 3-Partition. Section 3 presents our main result: the NP-completeness proof via reduction from 3-Partition, including the reduction construction and correctness proof with three key lemmas. Section 4 concludes with a discussion of implications and open questions.

Souza et al. [19] originally define the OCP as a minimization problem. In this work, we study the associated decision problem, which is the standard formulation used to prove NP-completeness. The optimization and decision versions are polynomially equivalent, so NP-completeness of the decision version implies NP-hardness of the original optimization problem.

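Definition 1 (OCP Instance). An instance of the Ordered Covering Problem is a tuple $(S, \mathrm{val}, \mathcal{E}, C)$ consisting of:

A finite set $S$ of labels that must be covered;
A family $\mathcal{E} = \{E_1, \ldots, E_m\}$ of sets (edges) whose union contains $S$;
A weight function $\mathrm{val} : \bigcup_{i=1}^{m} E_i \to \mathbb{N}$ assigning a positive integer weight to every element that appears in some edge;
A budget $C \in \mathbb{N}$.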

In the DMDGP context, S represents segments of binary variables, E represents hyperedges corresponding to pruning edges in the distance geometry graph, and the exponential cost reflects the size of the search space for each subproblem in the SBBU algorithm [19].

Definition 3 (Residual Sets and Weights). Given a covering $E' = (E'_1, \ldots, E'_k)$, we define for each $i \in \{1, \ldots, k\}$:

The residual set: $U_i = E'_i \setminus \bigcup_{j=1}^{i-1} E'_j$;
The residual weight: $u_i = \sum_{x \in U_i} \mathrm{val}(x)$.

The residual set $U_i$ represents the labels that are covered for the first time by $E'_i$.

Definition 4 (Cost Function). The partial cost of $E'_i$ in the covering $E'$ is:

$$f(E'_i) = 2^{u_i}.$$

The total cost of the covering $E'$ is:

$$F(E') = \sum_{i=1}^{k} f(E'_i).$$

The exponential cost function distinguishes OCP from classical covering problems. In Set Cover, for instance, the cost is simply the number of sets selected (or their cardinality) [7]. In OCP, the cost grows exponentially with the weight of elements covered, and crucially, the order determines which elements contribute to which residual sets.
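The order dependence can be seen in a few lines of code. The sketch below evaluates $F(E')$ by maintaining the set of already covered elements; the function name `ordered_cost` and the toy instance are hypothetical. It applies the formula $2^{u_i}$ literally, so an edge with an empty residual contributes $2^0 = 1$, which is an assumption about a case the text does not spell out.

```python
from typing import Dict, Hashable, List, Set

def ordered_cost(order: List[Set[Hashable]], val: Dict[Hashable, int]) -> int:
    """Total cost F(E') = sum_i 2**u_i of an ordered sequence of edges."""
    seen: Set[Hashable] = set()
    total = 0
    for edge in order:
        residual = edge - seen               # U_i: elements covered for the first time
        u = sum(val[x] for x in residual)    # residual weight u_i
        total += 2 ** u                      # partial cost f(E'_i) = 2**u_i
        seen |= edge
    return total

# Hypothetical toy instance: the same two edges in both possible orders.
val = {"a": 1, "b": 2, "c": 3}
E1, E2 = {"a", "b"}, {"b", "c"}
print(ordered_cost([E1, E2], val))  # 2**3 + 2**3 = 16
print(ordered_cost([E2, E1], val))  # 2**5 + 2**1 = 34
```

Both orders cover the same elements, yet the costs differ by more than a factor of two because the shared element "b" lands in a different residual set.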

Does there exist an ordered covering $E'$ with $F(E') \le C$?

Our reduction uses the 3-Partition problem, a classical strongly NP-complete problem. Garey and Johnson [4] originally proved 3-Partition to be NP-complete by a reduction from 3-dimensional matching; see also [5]. The “strongly” NP-complete designation means the problem remains NP-complete even when all numbers are bounded by a polynomial in the input size. The constraint $B/4 < a_i < B/2$ ensures that each triplet in a valid partition contains exactly three distinct elements: no pair sums to $B$, and no element can appear twice or with a fourth element. This “forced triplet” structure is essential to our reduction.

Note that A is a multiset, meaning that multiple elements can have the same value. However, elements are distinguishable by their labels (indices). When we construct a partition, we partition the elements themselves (by their labels), not merely their values. This distinction will be important in our reduction, where we use labeled tokens to represent multiset elements.
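As an illustration of partitioning by labels rather than by values, here is a minimal brute-force sketch for 3-Partition that works on indices, so that equal values remain distinguishable. The function name `three_partition` is hypothetical, and the search is exponential in the worst case; it is meant only to make the problem statement concrete.

```python
from itertools import combinations
from typing import List, Optional, Tuple

def three_partition(a: List[int], B: int) -> Optional[List[Tuple[int, int, int]]]:
    """Return index triplets whose values each sum to B, or None if none exist."""
    if len(a) % 3 != 0 or sum(a) != (len(a) // 3) * B:
        return None

    def solve(remaining: Tuple[int, ...]):
        if not remaining:
            return []
        first, rest = remaining[0], remaining[1:]
        # The smallest remaining index must belong to some triplet: try all partners.
        for p, q in combinations(rest, 2):
            if a[first] + a[p] + a[q] == B:
                left = tuple(i for i in rest if i not in (p, q))
                tail = solve(left)
                if tail is not None:
                    return [(first, p, q)] + tail
        return None

    return solve(tuple(range(len(a))))

# The value 11 occurs twice; indices 1 and 4 keep the two copies apart.
print(three_partition([10, 11, 12, 9, 11, 13], 33))   # [(0, 1, 2), (3, 4, 5)]
print(three_partition([10, 10, 10, 10, 10, 16], 33))  # None
```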

Theorem 2 (Garey and Johnson [4,5]). 3-Partition is strongly NP-complete.

We use 3-Partition as the source problem, but note that our OCP construction employs exponential cost terms (powers of two). Thus, while the instance encoding size remains polynomial (binary representation), the reduction does not bound all numeric parameters by a polynomial in m; accordingly, our proof establishes NP-completeness, not strong NP-completeness, for OCP.

Throughout this paper, we use the following conventions:

$[m] = \{1, 2, \ldots, m\}$ denotes the set of the first $m$ positive integers.

For a covering $E' = (E'_1, \ldots, E'_k)$ and index $i$, we write $U(E'_i)$ or simply $U_i$ for the residual set of $E'_i$ in this covering.
We use Greek letters α, β, γ for labels in S, and Latin letters a, b, c for values (weights).
We distinguish between labels (elements of S) and their weights (values assigned by the function val).

In this section, we prove the main result of this paper.

Theorem 3. The Ordered Covering Problem is NP-complete.

The proof consists of two parts: showing OCP is in NP (Section 3.1), and reducing 3-Partition to OCP in polynomial time (Sections 3.2-3.5).

Proof that OCP ∈ NP. We must show that OCP solutions can be verified in time polynomial in the input size when integers are given in binary.

Given an OCP instance $(S, \mathrm{val}, \mathcal{E}, C)$ and a candidate solution $E' = (E'_1, \ldots, E'_k)$, we verify as follows:

Structure of the covering: We may assume the certificate encodes $E'$ as a sequence of indices in $[m]$, where $\mathcal{E} = \{E_1, \ldots, E_m\}$. Checking that each index lies in $[m]$ takes $O(k)$ arithmetic comparisons. Using a Boolean array over $S$, we then scan all sets $E'_i$ once to mark covered labels and verify that every element of $S$ is covered at least once. This requires $O\big(\sum_{i=1}^{k} |E'_i|\big)$ operations, which is polynomial in the input size.
Residual sets and weights: Processing the sets in order and maintaining a second Boolean array of “already seen” labels over $\bigcup_{i=1}^{m} E_i$, we compute, for each $i$, the residual set

$$U_i = E'_i \setminus \bigcup_{j<i} E'_j.$$

For each $i$, we then compute $u_i = \sum_{x \in U_i} \mathrm{val}(x)$ using big-integer addition. Let $L$ be the maximum bit-length of any weight in the input. Since $U_i$ contains at most $\sum_{j=1}^{m} |E_j|$ elements, each of weight less than $2^L$, we have

$$u_i < 2^L \cdot \sum_{j=1}^{m} |E_j|,$$

so each $u_i$ has bit-length at most $L + \log_2 \sum_{j=1}^{m} |E_j|$, and the total time to compute all $u_i$ is polynomial in the input size.

Cost evaluation: If $u_i > \lfloor \log_2 C \rfloor$ for some $i$, then $f(E'_i) = 2^{u_i} > C$, so we can immediately reject the certificate. Otherwise, $u_i \le \lfloor \log_2 C \rfloor$ and $f(E'_i) = 2^{u_i}$ has at most $O(\log C)$ bits and can be computed as a left shift. Summing these $k$ values and comparing the result to $C$ uses a polynomial number of big-integer operations on $O(\log C)$-bit integers.

Therefore, verification runs in polynomial time in the input size, and OCP ∈ NP under binary encoding.
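The verification procedure can be sketched as follows. This is an illustrative implementation under stated assumptions: the name `verify_certificate` is hypothetical, edges are indexed from 0 rather than from 1, and Python's arbitrary-precision integers stand in for the big-integer arithmetic. The early rejection uses `C.bit_length() - 1`, which equals $\lfloor \log_2 C \rfloor$ for $C \ge 1$.

```python
from typing import Dict, Hashable, List, Set

def verify_certificate(S: Set[Hashable], val: Dict[Hashable, int],
                       edges: List[Set[Hashable]], C: int, cert: List[int]) -> bool:
    """Check that cert (0-based edge indices) is an ordered covering of cost <= C."""
    if C < 1 or any(not (0 <= i < len(edges)) for i in cert):
        return False
    max_u = C.bit_length() - 1             # floor(log2(C)) for C >= 1
    seen: Set[Hashable] = set()
    total = 0
    for i in cert:
        residual = edges[i] - seen         # residual set U_i
        u = sum(val[x] for x in residual)  # residual weight u_i
        if u > max_u:                      # then 2**u_i > C: reject immediately
            return False
        total += 1 << u                    # f(E'_i) = 2**u_i as a left shift
        seen |= edges[i]
    return S <= seen and total <= C

# Hypothetical toy instance.
S = {"a", "b", "c"}
val = {"a": 1, "b": 2, "c": 3}
edges = [{"a", "b"}, {"b", "c"}]
print(verify_certificate(S, val, edges, 16, [0, 1]))  # True  (cost 8 + 8)
print(verify_certificate(S, val, edges, 16, [1, 0]))  # False (first edge alone costs 32)
print(verify_certificate(S, val, edges, 100, [0]))    # False (label "c" is not covered)
```

The early rejection keeps every intermediate number at $O(\log C)$ bits, which is what makes the check polynomial under binary encoding.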

We now construct a polynomial-time reduction Φ that maps any 3-Partition instance to an OCP instance such that the 3-Partition instance is a YES instance if and only if the corresponding OCP instance is a YES instance.

Let $(A, B)$ be an instance of 3-Partition, where $A = \{a_1, \ldots, a_{3m}\}$ with $\sum_{i=1}^{3m} a_i = mB$ and $B/4 < a_i < B/2$ for all $i$. We construct an OCP instance $\Phi(A, B) = (S, \mathrm{val}, \mathcal{E}, C)$ as follows.

The Label Set S. The set of primary labels corresponding to the elements of the multiset $A$ is:

$$S = \{\alpha_1, \alpha_2, \ldots, \alpha_{3m}\},$$

where each label $\alpha_\ell$ has weight $\mathrm{val}(\alpha_\ell) = a_\ell$. We use distinct labels $\alpha_\ell$ even when values $a_\ell$ repeat in the multiset $A$. This allows us to distinguish between different elements with the same value, which is essential for partitioning the multiset. These are the only labels that must be covered.

Valid Triplets. Define the collection of valid triplets:

$$T = \Big\{ X \subseteq S : |X| = 3,\ \sum_{\alpha \in X} \mathrm{val}(\alpha) = B \Big\}.$$

In other words, $T$ consists of all three-element subsets of $S$ whose weights sum to exactly $B$. By the constraint $B/4 < a_i < B/2$, any valid triplet contains exactly three distinct labels. No pair sums to $B$, and no element appears twice. Finally, we have

$$|T| \le \binom{3m}{3} = O(m^3).$$

Auxiliary Tokens and Edge Construction. For each “bin” $i \in [m]$ and each valid triplet $X_j \in T$ (indexed by $j \in [|T|]$), we introduce two distinct auxiliary tokens:

An opening token $\omega_{ij}$ with weight $\mathrm{val}(\omega_{ij}) = w$;
A second token $\tau_{ij}$ with weight $\mathrm{val}(\tau_{ij}) = t$;

where $w$ and $t$ are positive integer constants to be specified below. Crucially, these tokens are distinct for each pair $(i, j)$ and do not belong to $S$:

$$\omega_{ij}, \tau_{ij} \notin S \quad \text{for all } (i, j).$$

The auxiliary tokens lie outside the required coverage set but still contribute to costs through val.

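For each pair $(i, j)$ with $i \in [m]$ and $j \in [|T|]$, we define two edges:

Opening edge: $A_{ij} = \{\omega_{ij}\}$.

Assignment edge: If $X_j = \{\alpha_{j_1}, \alpha_{j_2}, \alpha_{j_3}\}$, then $E_{ij} = \{\omega_{ij}, \tau_{ij}, \alpha_{j_1}, \alpha_{j_2}, \alpha_{j_3}\}$.

The edge family is:

$$\mathcal{E} = \{A_{ij}, E_{ij} : i \in [m],\ j \in [|T|]\}.$$

Key properties: (1) Each $\omega_{ij}$ appears in exactly two edges: $A_{ij}$ and $E_{ij}$; (2) Each $\tau_{ij}$ appears in exactly one edge: $E_{ij}$; (3) Only the $E_{ij}$ edges contain labels from $S$; (4) The total number of edges is $2m|T| = O(m^4)$, which is polynomial in $m$.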
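The construction of labels, valid triplets, tokens, and edges can be sketched in code. The function name `build_ocp_instance` and the tuple-based encoding of labels and tokens are hypothetical, indices start at 0, and `w` and `t` are passed in as parameters with placeholder values, since their calibration and the budget $C$ are not reproduced here.

```python
from itertools import combinations

def build_ocp_instance(a, B, w, t):
    """Build (S, val, triplets, edges) from a 3-Partition instance (a, B)."""
    m = len(a) // 3
    S = [("alpha", l) for l in range(len(a))]             # one label per element of A
    val = {label: a[label[1]] for label in S}             # val(alpha_l) = a_l
    # T: all three-element subsets of S whose weights sum to exactly B.
    triplets = [X for X in combinations(S, 3) if sum(val[x] for x in X) == B]
    edges = {}
    for i in range(m):
        for j, X in enumerate(triplets):
            omega, tau = ("omega", i, j), ("tau", i, j)   # tokens outside S
            val[omega], val[tau] = w, t
            edges[("A", i, j)] = frozenset({omega})            # opening edge A_ij
            edges[("E", i, j)] = frozenset({omega, tau, *X})   # assignment edge E_ij
    return set(S), val, triplets, edges

# Placeholder parameters w = 5, t = 1 (not the calibrated values of the reduction).
S, val, triplets, edges = build_ocp_instance([10, 11, 12, 9, 11, 13], 33, w=5, t=1)
print(len(triplets), len(edges))  # 4 16
```

For this input there are $m = 2$ bins and $|T| = 4$ valid triplets, giving $2m|T| = 16$ edges, in line with property (4).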

The elements that appear in edges (and thus receive weights via val) are $S \cup \{\omega_{ij}, \tau_{ij} : i \in [m],\ j \in [|T|]\}$.

The Cost Separation Parameter w. The parameter $w$ is chosen to ensure a cost “separation” between different solution structures. Specifically, we want to ensure that using more than $m$ opening edges, or using an assignment edge $E_{ij}$ before its opening edge $A_{ij}$, will exceed the budget.

We first define the canonical cost that a valid solution should achieve. If we have a 3-partition $P = \{P_1, \ldots, P_m\}$, we can construct a covering by selecting, for each bin $i$, the opening edge $A_{ij(i)}$ and assignment edge $E_{ij(i)}$ where $X_{j(i)} = P_i$ (viewing $P_i$ as a subset of labels from $S$). The covering sequence is:

$$E_{can} = \big(A_{1j(1)}, E_{1j(1)}, \ldots, A_{mj(m)}, E_{mj(m)}\big).$$

In this sequence: (1) Each $A_{ij(i)}$ covers $\omega_{ij(i)}$ with residual weight $w$, contributing cost $2^w$; (2) Each $E_{ij(i)}$ covers $\tau_{ij(i)}$ and the three labels of $P_i$, with residual weight $t + \sum_{\alpha \in P_i} \mathrm{val}(\alpha) = t + B$, contributing cost $2^{t+B}$. Moreover, $\sum_{\alpha \in Q_i} \mathrm{val}(\alpha) = B$ for each $i$.

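We construct the covering:

$$E_{can} = \big(A_{1j(1)}, E_{1j(1)}, \ldots, A_{mj(m)}, E_{mj(m)}\big).$$

Analysis of costs:

Bin 1: When we apply $A_{1j(1)} = \{\omega_{1j(1)}\}$, the residual set is $U(A_{1j(1)}) = \{\omega_{1j(1)}\}$ with weight $w$, so $f(A_{1j(1)}) = 2^w$. Next, when we apply $E_{1j(1)}$, the residual set is $U(E_{1j(1)}) = \{\tau_{1j(1)}\} \cup Q_1$ (since $\omega_{1j(1)}$ was already covered), with weight $t + \sum_{\alpha \in Q_1} \mathrm{val}(\alpha) = t + B$, so $f(E_{1j(1)}) = 2^{t+B}$.

Bins 2, …, m: The same computation applies to every later bin, because the tokens of different bins are distinct and the sets $Q_i$ are pairwise disjoint.

Coverage: The sets $Q_1, \ldots, Q_m$ are disjoint and cover $S$. Each $\omega_{ij(i)}$ is covered by $A_{ij(i)}$, and each $\tau_{ij(i)}$ is covered by $E_{ij(i)}$. Thus, all labels in $S$ are covered.

Total cost:

$$F(E_{can}) = \sum_{i=1}^{m} \big(2^w + 2^{t+B}\big) = m\,\big(2^w + 2^{t+B}\big).$$

Thus, $E_{can}$ is a feasible covering with $F(E_{can}) \le C$.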
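The cost of the canonical covering can be checked numerically. The sketch below builds the interleaved sequence of opening and assignment edges for a given 3-partition and evaluates its cost; the function name `canonical_cost`, the encoding of tokens as tuples, and the values `w = 5`, `t = 1` are illustrative assumptions rather than the parameters fixed by the reduction.

```python
def canonical_cost(partition, w, t):
    """Cost of the canonical covering built from a list of weight triplets."""
    val, order, next_label = {}, [], 0
    for i, bin_values in enumerate(partition):
        omega, tau = ("omega", i), ("tau", i)       # one token pair per bin
        val[omega], val[tau] = w, t
        labels = []
        for v in bin_values:                        # fresh label for every element
            val[("alpha", next_label)] = v
            labels.append(("alpha", next_label))
            next_label += 1
        order.append({omega})                       # opening edge
        order.append({omega, tau, *labels})         # assignment edge
    seen, total = set(), 0
    for edge in order:
        residual = edge - seen                      # elements covered for the first time
        total += 2 ** sum(val[x] for x in residual)
        seen |= edge
    return total

# Two bins with B = 33: each bin pays 2**w for opening and 2**(t + B) for assignment.
cost = canonical_cost([(10, 11, 12), (9, 11, 13)], w=5, t=1)
print(cost == 2 * (2**5 + 2**34))  # True
```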

We now show that if the OCP instance is a YES instance, then the 3-Partition instance is a YES instance. This is the more involved direction, requiring three key lemmas about the structure of feasible coverings.

Theorem 5 (Soundness). If $\Phi(A, B)$ is a YES instance of OCP (i.e., there exists a covering $E'$ with $F(E') \le C$), then $(A, B)$ is a YES instance of 3-Partition.

The proof relies on three lemmas that establish the structure of any feasible covering.

Lemma 1 (Opening Precedes Assignment). In any covering $E' = (E'_1, \ldots, E'_k)$ with $F(E') \le C$, every assignment edge $E_{ij}$ that appears with positive residual weight (i.e., $U(E_{ij}) \ne \emptyset$) is preceded in the sequence by its opening edge $A_{ij}$.

Proof. We argue by contradiction. Suppose there is an assignment edge $E_{ij}$ with positive residual such that its opening edge $A_{ij}$ does not appear before $E_{ij}$ in $E'$.

Since only assignment edges contain labels from $S$, and $|S| = 3m$ while each assignment edge contains at most three labels from $S$, at least $m$ assignment edges must have positive residuals; let $r \ge m$ denote their number.

When $E_{ij}$ is applied without $A_{ij}$ having been applied earlier, both $\omega_{ij}$ and $\tau_{ij}$ are uncovered, so its residual weight is at least $w + t$ and

$$f(E_{ij}) \ge 2^{w+t}.$$

For each of the remaining $(r - 1)$ assignment edges with positive residuals, the covering must pay at least $2^w$: if such an edge is properly opened, its opening edge costs $2^w$, and if it is not, its cost is even larger. Thus,

$$F(E') \ge 2^{w+t} + (r - 1)\,2^w \ge 2^{w+t} + (m - 1)\,2^w,$$

which, by the choice of the separation parameter $w$, exceeds the budget $C$. This contradicts $F(E') \le C$.

For each $p \in [m]$, let $Q_p = U(H_p) \cap S$ be the three labels from $S$ contributed by $H_p$. By the construction of the reduction, there exists a triplet $X_p \in T$ such that

$$H_p = \{\omega_{ij}, \tau_{ij}\} \cup X_p \quad \text{for some pair } (i, j),$$

where $X_p$ is a three-element subset of $S$ with total weight $B$.

The only labels of $S$ contained in $H_p$ are those in $X_p$, so $Q_p$ and $X_p$ are both three-element subsets of $S$ contained in $H_p$. Since $Q_p$ consists precisely of the labels from $S$ first covered by $H_p$, we must have $Q_p = X_p$.

Because the sets $Q_1, \ldots, Q_m$ are pairwise disjoint and their union is $S$ (Lemma 3), the same holds for $X_1, \ldots, X_m$. Hence $\{X_1, \ldots, X_m\}$ forms a partition of $S$ into $m$ triplets, each of total weight $B$.

Since the elements of $S$ are labels for the elements of $A$, we translate back: for each $p$, the three labels in $X_p$ correspond to three elements of $A$ whose sum is $B$. The resulting sets $P_1, \ldots, P_m$ are pairwise disjoint and cover $A$, so $P = \{P_1, \ldots, P_m\}$ is a valid 3-partition of $A$.

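We have now established both directions:

Completeness: if $(A, B)$ is a YES instance of 3-Partition, then $\Phi(A, B)$ is a YES instance of OCP;
Soundness (Theorem 5): if $\Phi(A, B)$ is a YES instance of OCP, then $(A, B)$ is a YES instance of 3-Partition.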

Moreover, the reduction Φ is computable in polynomial time, and the OCP instance has polynomial encoding size (weights are polynomially bounded; the budget is exponentially large in value but has polynomial bit-length).

Proof of Theorem 3. By Subsection 3.1, OCP is in NP. By Subsections 3.2-3.4, there is a polynomial-time reduction from 3-Partition to OCP. Since 3-Partition is NP-complete [5], it follows that OCP is NP-complete.

Remark 1. This reduction does not establish strong NP-completeness of OCP. Determining whether OCP is strongly NP-complete remains an open question.

In this section, we present two small instances (testA and testB) to illustrate the problem. Figure 1 shows the hypergraphs associated with these small instances, where elements of E are represented as red circles and labels in S are represented as blue squares. Each square has a label s i and a value v i (highlighted in yellow).

Both instances, testA and testB, can be read off the hypergraphs in Figure 1.

We analyzed these instances to determine the optimal ordered covering and compared it with a dynamic greedy heuristic. The greedy heuristic constructs the covering by iteratively selecting the subset $E \in \mathcal{E}$ that minimizes the residual weight (and thus the immediate cost).

For testA, the greedy algorithm successfully finds the optimal solution. The optimal sequence is $(E_4, E_3, E_2, E_1)$ with total cost 592.

However, the greedy strategy fails for testB. In this instance, the optimal sequence is $(E_4, E_3, E_2, E_1)$ with cost $2^4 + 2^8 + 2^2 + 2^4 = 292$. The greedy algorithm, after selecting $E_4$ (cost $2^4$), selects $E_1$ (residual weight 6, cost $2^6$) instead of $E_3$ (residual weight 8, cost $2^8$), leading to the sequence $(E_4, E_1, E_2)$ with total cost $2^4 + 2^6 + 2^8 = 336$. The greedy choice of $E_1$ is locally cheaper than $E_3$, but it fails to reduce the cost of covering $s_2$ as effectively as the optimal sequence does.

These examples demonstrate that a locally optimal greedy strategy does not guarantee a globally optimal solution for the OCP, motivating the need for more sophisticated algorithms or exact approaches [20].
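The gap between greedy and optimal orderings can be reproduced on a small example. The instance below is hypothetical: it was chosen so that its greedy and optimal costs come out as 336 and 292, and it is not claimed to be testB itself. The greedy rule implemented here (pick the edge with the smallest nonempty residual, ties broken by name, stop once everything is covered) is one plausible reading of the heuristic, stated as an assumption.

```python
from itertools import permutations

# Hypothetical instance; every element must be covered.
VAL = {"s1": 4, "s2": 8, "s3": 2, "s4": 4}
EDGES = {"E1": {"s3", "s4"}, "E2": {"s2", "s3"}, "E3": {"s2"}, "E4": {"s1"}}
S = set(VAL)

def cost(order):
    """Total cost of an ordered sequence of edge names."""
    seen, total = set(), 0
    for name in order:
        residual = EDGES[name] - seen
        total += 2 ** sum(VAL[x] for x in residual)
        seen |= EDGES[name]
    return total

def greedy():
    """Repeatedly pick the edge with the smallest nonempty residual weight."""
    seen, order = set(), []
    while not S <= seen:
        candidates = [n for n in sorted(EDGES) if EDGES[n] - seen]
        best = min(candidates, key=lambda n: sum(VAL[x] for x in EDGES[n] - seen))
        order.append(best)
        seen |= EDGES[best]
    return order

def brute_force_cost():
    """Minimum cost over all ordered coverings (exponential enumeration)."""
    best = None
    for k in range(1, len(EDGES) + 1):
        for order in permutations(sorted(EDGES), k):
            if S <= set().union(*(EDGES[n] for n in order)):
                c = cost(order)
                best = c if best is None else min(best, c)
    return best

g = greedy()
print(g, cost(g))          # ['E4', 'E1', 'E2'] 336
print(brute_force_cost())  # 292
```

Here the greedy rule prefers E1 (residual weight 6) over E3 (residual weight 8) in the second step, which leaves the weight-8 label to be paid for anyway and forfeits the cheap residuals that E2 and E1 would have had after E3.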

We have shown that the Ordered Covering Problem is NP-complete by giving a polynomial-time reduction from the strongly NP-complete 3-Partition problem. The construction uses opening and assignment edges together with a carefully chosen exponential cost separation to enforce that any feasible solution has exactly m active assignment edges, each opened in advance and covering one triplet of labels, yielding a bijection between feasible coverings and valid 3-partitions. This result complements existing NP-completeness results for vertex ordering problems in distance geometry and explains why heuristic and exact exponential-time methods are unavoidable for OCP. In particular, it provides a theoretical justification for the greedy ordering heuristic and Branch-and-Bound approaches developed for the SBBU algorithm, and it motivates further work on approximation guarantees, parameterized algorithms, and structural properties of practically arising OCP instances.
