Uniform Solution Sampling Using a Constraint Solver As an Oracle


We consider the problem of sampling from solutions defined by a set of hard constraints on a combinatorial space. We propose a new sampling technique that, while enforcing a uniform exploration of the search space, leverages the reasoning power of a systematic constraint solver in a black-box scheme. We present a series of challenging domains, such as energy barriers and highly asymmetric spaces, that reveal the difficulties introduced by hard constraints. We demonstrate that standard approaches such as Simulated Annealing and Gibbs Sampling are greatly affected, while our new technique can overcome many of these difficulties. Finally, we show that our sampling scheme naturally defines a new approximate model counting technique, which we empirically show to be very accurate on a range of benchmark problems.


💡 Research Summary

The paper tackles the long‑standing challenge of generating uniformly distributed solutions from a combinatorial space that is constrained by a set of hard logical constraints. While constraint satisfaction and SAT/SMT solving have become highly efficient at determining the existence of a solution, they provide little guidance for sampling the solution space uniformly or for estimating the total number of solutions (model counting). Traditional probabilistic samplers such as Simulated Annealing (SA) and Gibbs Sampling suffer from two well‑known problems in this setting: (1) energy‑barrier effects, where the sampler becomes trapped in one region of the space because moving to another region would require violating many constraints, and (2) severe bias in highly asymmetric solution spaces, where the probability mass is concentrated on a small subset of variable assignments. Both phenomena lead to poor coverage and inaccurate model‑count estimates.

To overcome these limitations, the authors propose a black‑box framework that treats a systematic constraint solver as an “oracle”. The oracle is queried at each step of a constructive sampling process to compute, either exactly or via a provably bounded approximation, the number of completions (i.e., extensions to full assignments) that remain consistent with the partial assignment built so far. This count is then transformed into a probability distribution over the possible values of the next variable. By sampling each value according to this distribution, the algorithm selects every solution with the same probability, exactly 1/N where N is the total number of solutions, so the output distribution is perfectly uniform when exact counts are used. The method can be viewed as a guided, variable‑by‑variable version of the classic “recursive conditioning” technique, but the heavy lifting of counting is delegated to a modern SAT/SMT engine that incorporates conflict‑driven clause learning, forward checking, and other optimizations.
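The constructive procedure can be sketched in a few lines. The toy code below uses a brute‑force enumerator as a stand‑in for the solver oracle; all names (`count_completions`, `oracle_sample`, the DIMACS‑style clause encoding) are illustrative choices of ours, not the paper's API:

```python
from itertools import product
import random

# Toy CNF encoding: a clause is a list of signed ints (DIMACS-style),
# e.g. 1 means x1 = True, -1 means x1 = False.

def satisfies(assignment, clauses):
    """True if the full assignment (dict var -> bool) satisfies every clause."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

def count_completions(partial, n_vars, clauses):
    """Oracle stand-in: number of full assignments extending `partial`
    that are solutions. A real implementation would query the solver."""
    free = [v for v in range(1, n_vars + 1) if v not in partial]
    count = 0
    for bits in product([False, True], repeat=len(free)):
        full = {**partial, **dict(zip(free, bits))}
        if satisfies(full, clauses):
            count += 1
    return count

def oracle_sample(n_vars, clauses, rng=random):
    """Build one solution variable by variable, choosing each value with
    probability proportional to the number of consistent completions."""
    partial = {}
    for v in range(1, n_vars + 1):
        c_true = count_completions({**partial, v: True}, n_vars, clauses)
        c_false = count_completions({**partial, v: False}, n_vars, clauses)
        total = c_true + c_false
        if total == 0:
            raise ValueError("no solution extends the current partial assignment")
        partial[v] = rng.random() < c_true / total
    return partial
```

With exact counts, the product of the choice probabilities along any path telescopes to 1/N, so every solution is produced equally often.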

The paper’s contributions are threefold. First, it formalizes the sampling algorithm, proves its uniformity under exact counting, and discusses how approximate counts can be incorporated while preserving asymptotic unbiasedness. Second, it introduces a suite of synthetic benchmark families designed to stress‑test samplers: (a) “energy‑barrier” instances where solutions are split into two large clusters separated by a narrow corridor of feasible assignments; (b) “high‑asymmetry” instances where the solution set is heavily skewed toward particular variable configurations; and (c) real‑world SAT/SMT problems drawn from electronic design automation, software verification, and combinatorial optimization. In all cases, the proposed oracle‑based sampler dramatically outperforms SA and Gibbs Sampling, achieving near‑perfect uniformity with far fewer samples. Third, the authors observe that the same sequence of oracle queries generated during sampling can be reused to produce an approximate model count. By aggregating the conditional probabilities used at each decision point, they derive an estimator for the total number of solutions: with an exact oracle, the product of those probabilities along a sampled path equals 1/N, so inverting it recovers the count, and averaging the inverse over many samples still yields a meaningful estimate when the counts are only approximate. Empirical evaluation shows that this estimator is competitive with state‑of‑the‑art approximate counters such as ApproxMC, often delivering lower variance and higher accuracy, especially on instances with sparse solution sets.
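The counting idea can be illustrated with a minimal sketch. This is our own reconstruction of the aggregation step, not the paper's code: a brute‑force enumerator (`exact_count`) plays the role of the solver oracle, and the estimator averages the inverse path probability over samples. With an exact oracle every sample returns the true count; with approximate counts the average remains a sensible estimate.

```python
from itertools import product
import random

def exact_count(partial, n_vars, clauses):
    """Brute-force oracle stand-in: completions of `partial` (dict var -> bool)
    satisfying every clause (clauses are lists of DIMACS-style signed ints)."""
    free = [v for v in range(1, n_vars + 1) if v not in partial]
    total = 0
    for bits in product([False, True], repeat=len(free)):
        full = {**partial, **dict(zip(free, bits))}
        if all(any(full[abs(l)] == (l > 0) for l in c) for c in clauses):
            total += 1
    return total

def estimate_model_count(n_vars, clauses, n_samples=20, rng=random):
    """Average 1/p over sampled solutions, where p is the product of the
    conditional probabilities used at each decision point. Assumes the
    formula is satisfiable."""
    estimates = []
    for _ in range(n_samples):
        partial, p = {}, 1.0
        for v in range(1, n_vars + 1):
            c_t = exact_count({**partial, v: True}, n_vars, clauses)
            c_f = exact_count({**partial, v: False}, n_vars, clauses)
            pr_true = c_t / (c_t + c_f)
            if rng.random() < pr_true:
                partial[v], p = True, p * pr_true
            else:
                partial[v], p = False, p * (1.0 - pr_true)
        estimates.append(1.0 / p)  # exact oracle: each term equals N
    return sum(estimates) / len(estimates)
```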

The experimental methodology is thorough. For each benchmark family, the authors report (i) the empirical distribution of sampled solutions (using chi‑square tests against the uniform distribution), (ii) the runtime breakdown between solver invocations and sampling overhead, and (iii) the quality of the model‑count estimate (relative error and confidence intervals). They also conduct ablation studies that replace the exact‑count oracle with a bounded‑error approximation (e.g., using a hashing‑based estimator) to demonstrate robustness of the sampling process under imperfect information.
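The chi‑square uniformity check is straightforward to reproduce; the paper does not spell out its test setup, so the following is a generic sketch: tally how often each distinct solution was sampled and compare the statistic to the chi‑square critical value with k − 1 degrees of freedom.

```python
def chi_square_uniform(counts):
    """Chi-square statistic of observed per-solution sample counts against
    the uniform expectation. `counts[i]` is how often solution i was drawn;
    compare the result to the critical value with len(counts) - 1 degrees
    of freedom to accept or reject uniformity."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)
```

A perfectly uniform sampler yields a statistic near zero; a sampler trapped in one solution cluster inflates it sharply.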

Despite its strengths, the approach has notable limitations. The dominant cost is the repeated invocation of the constraint solver, which can become prohibitive for very large instances or when the solver’s counting routine is expensive. The authors suggest several mitigation strategies: caching intermediate counts, parallelizing oracle calls, and employing incremental solving techniques to reuse learned clauses across successive queries. Another limitation is the reliance on exact counts for strict uniformity; while the paper provides theoretical arguments that bounded approximations introduce only limited bias, practical guidelines for choosing approximation parameters are still needed.
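The caching mitigation amounts to memoizing oracle queries keyed on the partial assignment. A minimal sketch, assuming a hypothetical `solver_count` callable standing in for the real solver invocation (`make_cached_oracle` is our name, not the paper's):

```python
from functools import lru_cache

def make_cached_oracle(solver_count):
    """Wrap a counting function so repeated queries on the same partial
    assignment hit an in-memory cache instead of re-invoking the solver."""
    @lru_cache(maxsize=None)
    def cached(partial_items):
        # lru_cache requires hashable arguments, so partial assignments are
        # normalized to sorted tuples of (variable, value) pairs.
        return solver_count(dict(partial_items))

    def oracle(partial):
        return cached(tuple(sorted(partial.items())))

    return oracle
```

Since consecutive sampling steps share long prefixes of the partial assignment, hit rates can be high; incremental solving goes further by also reusing learned clauses across queries.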

In the discussion and future‑work sections, the authors outline promising directions. One is integrating learned probabilistic models (e.g., neural networks) to predict counts and thus reduce the number of solver calls. Another is extending the framework to weighted model sampling, where solutions have associated weights and the goal is to sample proportionally to those weights—a natural generalization for probabilistic inference tasks. Finally, they propose exploring hybrid schemes that combine the oracle‑based approach with Markov Chain Monte Carlo methods to benefit from both rapid local moves and global uniformity guarantees.

In summary, the paper presents a novel, solver‑centric sampling algorithm that leverages the full deductive power of modern constraint solvers to achieve uniform solution sampling in the presence of hard constraints. By treating the solver as a black‑box oracle that supplies conditional solution counts, the method sidesteps the pitfalls of traditional stochastic samplers, delivers high‑quality approximate model counts, and opens a new avenue for research at the intersection of constraint solving, probabilistic inference, and combinatorial enumeration.