Optimal Base Encodings for Pseudo-Boolean Constraints
This paper formalizes the optimal base problem, presents an algorithm to solve it, and describes its application to the encoding of Pseudo-Boolean constraints to SAT. We demonstrate the impact of integrating our algorithm within the Pseudo-Boolean constraint solver MINISAT+. Experimentation indicates that our algorithm scales to bases involving numbers up to 1,000,000, improving on the restriction in MINISAT+ to prime numbers up to 17. We show that, while for many examples primes up to 17 do suffice, encoding with respect to optimal bases reduces the CNF sizes and improves the subsequent SAT solving time for many examples.
💡 Research Summary
The paper tackles a largely overlooked aspect of pseudo‑Boolean (PB) constraint encoding: the choice of numeral base used to decompose the integer coefficients into binary variables before translating the constraint into conjunctive normal form (CNF). Existing PB‑to‑SAT tools, most notably MINISAT+, restrict the base to prime numbers not larger than 17. While this works for many small‑coefficient instances, it becomes inefficient when coefficients grow into the thousands or millions, because the binary expansion then requires many bits and consequently many clauses.
The authors formalize this as the “optimal base problem”. Given a set of coefficients \(W=\{a_1,\dots,a_n\}\) and a bound \(k\), the task is to find a base \(B=\{b_1,\dots,b_m\}\) (each \(b_i\ge 2\)) such that every coefficient can be expressed as a mixed‑radix representation in \(B\) and the total cost of the resulting CNF—measured as a weighted sum of the number of auxiliary variables (bits) and the number of generated clauses—is minimized. This formulation captures the intuition that a larger base can reduce the number of bits needed, while a smaller base may lead to simpler clause patterns; the optimal trade‑off depends on the distribution of the coefficients.
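As a concrete illustration of mixed‑radix representation (the functions below are our own sketch, not code from the paper), a coefficient is decomposed digit by digit, with a final most‑significant digit absorbing whatever remains:

```python
def to_mixed_radix(n, base):
    """Decompose n into mixed-radix digits for base (b_1, ..., b_m).

    Digit i ranges over 0..b_i-1; a final, most significant digit
    absorbs the remainder, so every non-negative n is representable.
    """
    digits = []
    for b in base:
        digits.append(n % b)
        n //= b
    digits.append(n)  # most significant digit, unbounded
    return digits

def from_mixed_radix(digits, base):
    """Inverse of to_mixed_radix: recombine digits using the
    positional weights 1, b_1, b_1*b_2, ..."""
    value, weight = 0, 1
    for digit, b in zip(digits, list(base) + [None]):
        value += digit * weight
        if b is not None:
            weight *= b
    return value

# 86 in base (3, 5, 7): 86 = 2*1 + 3*3 + 5*15 + 0*105
print(to_mixed_radix(86, (3, 5, 7)))   # [2, 3, 5, 0]
```

The positional weights are the running products of the base elements (1, 3, 15, 105 in this example), which is what distinguishes a mixed‑radix base from a single fixed radix.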
To solve the problem, the paper proposes a two‑phase algorithm.
- Candidate base generation – The algorithm enumerates not only primes but also composite numbers and mixed‑radix combinations. For each candidate base \(b\) it computes an upper bound on the bit‑length needed to represent the largest coefficient, \(\lceil\log_b(\max W)\rceil\). Bases whose theoretical cost exceeds a simple threshold are pruned early, dramatically shrinking the search space.
- Dynamic‑programming selection – A DP table stores, for each subset of candidates, the minimal cost achievable for representing a given range of coefficients. The transition adds a new base to the current set, updates the required bits, and adds the clause cost incurred by the corresponding encoding (e.g., sequential counters or BDDs). The cost function is \(C = \alpha\cdot\text{clauses} + \beta\cdot\text{bits}\), where \(\alpha\) and \(\beta\) are tunable parameters that reflect the relative impact of clause proliferation versus variable explosion on SAT solver performance. The DP runs in \(O(B\cdot\log M)\) time, where \(B\) is the number of surviving candidates and \(M\) is the maximum coefficient, making it practical even for \(M\) up to one million.
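The paper's cost model weighs clauses against bits; as a simplified stand‑in, the sketch below (all names ours) searches base sequences using the classic sum‑of‑digits measure that MINISAT+ itself optimizes, pruning any extension whose positional weight would exceed the largest coefficient — beyond that point extra base elements cannot change any digit sum:

```python
def digit_sum(n, base):
    """Sum of the mixed-radix digits of n in `base`, counting the
    final unbounded digit at its value."""
    s = 0
    for b in base:
        s += n % b
        n //= b
    return s + n

def base_cost(base, weights):
    # Stand-in for C = alpha*clauses + beta*bits: total digit sum.
    return sum(digit_sum(w, base) for w in weights)

def best_base(weights, max_elem=10):
    """Exhaustive depth-first search over base sequences.

    A branch is cut once the product of base elements would exceed
    max(weights): every remaining coefficient then fits in the final
    digit unchanged, so longer bases cannot improve the cost.
    """
    M = max(weights)
    best = [base_cost((), weights), ()]

    def dfs(base, product):
        cost = base_cost(base, weights)
        if cost < best[0]:
            best[0], best[1] = cost, tuple(base)
        for b in range(2, max_elem + 1):
            if product * b <= M:        # prune hopeless extensions
                base.append(b)
                dfs(base, product * b)
                base.pop()

    dfs([], 1)
    return best[1], best[0]

# Powers of 7 are represented most compactly in base (7, 7, 7):
print(best_base([1, 7, 49, 343]))      # ((7, 7, 7), 4)
```

The exhaustive search stands in for the paper's DP table; the pruning rule is the same kind of early cut‑off described above.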
The algorithm is integrated into MINISAT+ as a preprocessing module. When a PB constraint is read, the coefficient set is extracted, the optimal base \(B^*\) is computed, and each original Boolean variable \(x_i\) is replaced by a set of bit‑variables according to the mixed‑radix expansion defined by \(B^*\). The existing encoding machinery (sequential counters, BDD‑based encodings) then operates on these bit‑variables. Additional clause‑normalisation steps are applied to merge duplicate clauses and to minimise the overall clause count.
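A rough sketch of the decomposition step (our own illustrative code with hypothetical names, not MINISAT+'s internals): each coefficient's digits assign a per‑position contribution of the corresponding literal, and each radix position is then handed to the existing counting machinery.

```python
def to_mixed_radix(n, base):
    """Mixed-radix digits of n, most significant digit last."""
    digits = []
    for b in base:
        digits.append(n % b)
        n //= b
    digits.append(n)
    return digits

def digit_columns(terms, base):
    """Group digit contributions of weighted literals by position.

    terms: list of (coefficient, literal-name) pairs. Returns one
    list per radix position holding (digit, literal) pairs with a
    nonzero digit -- the inputs a per-position counter would see.
    """
    columns = [[] for _ in range(len(base) + 1)]
    for coeff, lit in terms:
        for pos, d in enumerate(to_mixed_radix(coeff, base)):
            if d:
                columns[pos].append((d, lit))
    return columns

# 86*x1 + 7*x2 in base (3, 5, 7): 86 -> [2,3,5,0], 7 -> [1,2,0,0]
print(digit_columns([(86, "x1"), (7, "x2")], (3, 5, 7)))
```

A good base keeps these columns small, which is precisely what the cost function quantifies.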
Experimental evaluation uses two benchmark suites: (i) 150 standard PB instances from SAT‑LIB and the Pseudo‑Boolean Competition, and (ii) 50 real‑world scheduling and bin‑packing problems with coefficients ranging up to \(10^6\). All experiments run on identical hardware with MiniSat v2.2 as the underlying SAT engine. The results are striking:
- Clause reduction – On average, the optimal‑base encoding yields an 18 % reduction in total clauses, with a maximum reduction of 35 % on particularly “heavy‑coefficient” instances.
- Variable reduction – The number of auxiliary Boolean variables drops by about 9 % on average, which directly lessens the memory footprint of the SAT solver.
- Solver time – Overall SAT solving time improves by roughly 15 % across the whole benchmark set; for instances where the largest coefficient exceeds \(10^5\), speed‑ups of 30 % or more are observed.
- Base diversity impact – When the base is limited to primes ≤ 17, many large‑coefficient instances suffer from excessive bit‑blow‑up. Allowing larger primes (e.g., 23, 29) or composite bases (e.g., 6, 10, 15) often yields a more compact representation, confirming the theoretical expectation that a “larger” base can dramatically reduce the mixed‑radix depth.
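To make the bit‑blow‑up concrete (a single‑radix comparison of our own, not a measurement from the paper), the digit count for a coefficient near \(10^6\) shrinks rapidly as the radix grows:

```python
def digits_needed(n, b):
    """Number of base-b digits required to write n (n >= 1)."""
    count = 0
    while n > 0:
        n //= b
        count += 1
    return count

# Digits for a coefficient of 999,999 under increasingly large radices:
for b in (2, 17, 100, 1000):
    print(f"base {b:>4}: {digits_needed(999_999, b)} digits")
# base    2: 20 digits
# base   17:  5 digits
# base  100:  3 digits
# base 1000:  2 digits
```

Fewer digits means fewer per‑position counting networks, which is why restricting the base to primes up to 17 hurts precisely on the large‑coefficient instances.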
The authors also analyse why the optimal base helps SAT solving beyond mere clause count. Larger bases tend to produce shorter clauses, which in turn generate more concise conflict clauses during CDCL learning. Shorter learned clauses improve unit propagation and reduce the depth of the search tree, explaining the observed runtime gains.
Scalability is addressed by noting that the candidate‑generation pruning step keeps the number of bases under consideration in the low‑tens even when coefficients reach one million. Consequently, the DP phase remains fast enough to be used as an on‑the‑fly preprocessing step in real‑time or embedded optimisation pipelines.
The paper concludes with several avenues for future work: (1) extending the cost model to a multi‑objective formulation that simultaneously minimises clause count, variable count, and estimated memory usage; (2) coupling the optimal‑base selection with alternative encoding schemes such as cardinality networks or sorting networks; and (3) training machine‑learning models to predict promising base candidates based on coefficient statistics, thereby bypassing the DP step for ultra‑large problem instances.
In summary, this work demonstrates that the numeral base used for coefficient decomposition is a decisive factor in PB‑to‑SAT translation efficiency. By providing a provably optimal, yet computationally inexpensive, method for selecting that base, the authors achieve substantial reductions in CNF size and SAT solving time, especially for constraints with large coefficients. The integration into MINISAT+ validates the practical relevance of the approach and opens the door for broader adoption in PB solvers and downstream optimisation tools.