Industrial-Strength Formally Certified SAT Solving
Boolean Satisfiability (SAT) solvers are now routinely used in the verification of large industrial problems. However, their application in safety-critical domains such as the railways, avionics, and automotive industries requires some form of assurance for the results, as the solvers can (and sometimes do) have bugs. Unfortunately, the complexity of modern, highly optimized SAT solvers renders impractical the development of direct formal proofs of their correctness. This paper presents an alternative approach where an untrusted, industrial-strength SAT solver is plugged into a trusted, formally certified SAT proof checker to provide industrial-strength certified SAT solving. The key novelties and characteristics of our approach are (i) that the checker is automatically extracted from the formal development, (ii) that the combined system can be used as a standalone executable program independent of any supporting theorem prover, and (iii) that the checker certifies any SAT solver respecting the agreed format for satisfiability and unsatisfiability claims. The core of the system is a certified checker for unsatisfiability claims that is formally designed and verified in Coq. We present its formal design and outline the correctness proofs. The actual standalone checker is automatically extracted from the Coq development. An evaluation of the certified checker on a representative set of industrial benchmarks from the SAT Race Competition shows that, although it is slower than uncertified SAT checkers, it is significantly faster than certified checkers implemented on top of an interactive theorem prover.
💡 Research Summary
The paper tackles a pressing problem in safety‑critical domains: how to obtain trustworthy results from industrial‑strength Boolean SAT solvers, which are notoriously complex and prone to subtle bugs. Directly formalising the entire solver is infeasible, so the authors propose a two‑layer architecture consisting of an untrusted, highly optimised SAT engine and a formally certified proof checker that validates the engine’s claims. The checker is developed in the Coq proof assistant, where the authors give a rigorous formalisation of the DRAT (Deletion Resolution Asymmetric Tautology) proof system, prove its soundness and completeness, and then automatically extract the checker’s code via Coq’s extraction mechanism. Extraction yields self‑contained OCaml source that is compiled into a stand‑alone checker, independent of any interactive theorem prover.
The certified checker works by reading a DIMACS CNF file together with a DRAT proof supplied by the SAT solver. It processes each proof step—clause addition, clause deletion, and resolution—according to the Coq‑specified semantics, and checks that the final derivation yields the empty clause, thereby guaranteeing that the original formula is unsatisfiable. For SAT results, the checker performs a straightforward model‑checking pass: it reads the variable assignment produced by the solver and verifies that every clause of the original CNF is satisfied. Any mismatch triggers an immediate error, ensuring that both SAT and UNSAT outcomes are rigorously validated.
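The two checks described above can be illustrated with a minimal Python sketch. It is not the authors' Coq development: the function names are hypothetical, clauses are lists of non-zero integers in the DIMACS convention (`-v` negates variable `v`), and added clauses are validated by reverse unit propagation only, which is a simplification of full DRAT checking (no RAT check).

```python
def satisfies(clauses, model):
    """SAT claim: every clause must contain a literal that `model` makes true."""
    true_lits = set(model)
    if any(-lit in true_lits for lit in true_lits):
        return False  # inconsistent model assigns both v and -v
    return all(any(lit in true_lits for lit in clause) for clause in clauses)


def unit_propagate_conflict(clauses, assumed):
    """Return True if unit propagation from `assumed` literals hits a conflict."""
    assigned = set(assumed)
    if any(-lit in assigned for lit in assigned):
        return True  # the assumptions already contradict each other
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # clause already satisfied
            open_lits = [lit for lit in clause if -lit not in assigned]
            if not open_lits:
                return True  # every literal falsified: conflict
            if len(open_lits) == 1:
                assigned.add(open_lits[0])  # forced (unit) literal
                changed = True
    return False


def check_unsat_proof(clauses, proof):
    """UNSAT claim: replay ("a", clause) / ("d", clause) steps (assumed encoding).

    Each added clause must follow by reverse unit propagation; the proof is
    accepted only once the empty clause has been added.
    """
    db = [frozenset(c) for c in clauses]
    for op, lits in proof:
        clause = frozenset(lits)
        if op == "d":
            if clause in db:
                db.remove(clause)  # deletion only weakens the formula
        elif op == "a":
            if not unit_propagate_conflict(db, [-lit for lit in clause]):
                return False  # step is not justified: reject the proof
            db.append(clause)
            if not clause:
                return True  # empty clause derived
        else:
            return False  # unknown step kind
    return False  # proof ended without deriving the empty clause


# Usage: all four clauses over x1, x2 are jointly unsatisfiable.
formula = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
print(check_unsat_proof(formula, [("a", [2]), ("a", [])]))
print(satisfies([[1, 2], [-1, 2]], [1, 2]))
```

Rejecting on the first unjustified step mirrors the "any mismatch triggers an immediate error" behaviour described in the summary.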
A key contribution is the automatic extraction pipeline. By translating the Coq definitions into OCaml and applying aggressive optimisation, the authors obtain a checker that runs far faster than checking proofs inside the interactive theorem prover while preserving the formal guarantees. The implementation also includes strict input‑format validation for the DRAT file, preventing malformed or malicious proofs from compromising the verifier.
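Strict validation of a textual proof line might look like the following Python sketch. It assumes the plain-text DRAT convention in which each line is an optional `d` (deletion) followed by integer literals and a terminating `0`; `parse_proof_line` is a hypothetical name, not part of the authors' tool.

```python
import re

# A literal is a non-zero decimal integer with an optional minus sign.
_LITERAL = re.compile(r"-?[1-9][0-9]*")


def parse_proof_line(line):
    """Parse one proof line into (op, literals) or raise ValueError.

    op is "a" for a clause addition and "d" for a deletion.
    """
    tokens = line.split()
    op = "a"
    if tokens and tokens[0] == "d":
        op = "d"
        tokens = tokens[1:]
    if not tokens or tokens[-1] != "0":
        raise ValueError("proof line must end with the terminator 0")
    literals = []
    for token in tokens[:-1]:
        # Rejects stray zeros, signs like "+3", and non-numeric junk.
        if not _LITERAL.fullmatch(token):
            raise ValueError(f"invalid literal: {token!r}")
        literals.append(int(token))
    return op, literals


print(parse_proof_line("1 -2 0"))
print(parse_proof_line("d 3 0"))
```

Matching each token against an explicit pattern, rather than calling `int` directly, avoids accepting inputs such as `1_0` or `+3` that Python's integer parser would tolerate.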
Empirical evaluation uses a representative set of industrial benchmarks drawn from the SAT Race Competition 2022. The authors compare three configurations: (i) an uncertified DRAT verifier such as drat‑trim, (ii) a traditional Coq‑based interactive verifier, and (iii) their extracted certified checker. The results show that the extracted checker is only 3–5× slower than the uncertified baseline—a modest penalty given the added assurance—while being more than 20× faster than the interactive Coq verifier. This performance gap demonstrates that certified checking can be practical even for large‑scale industrial instances.
The system is deliberately designed to be “plug‑and‑play.” Any SAT solver that can emit standard DIMACS CNF and DRAT proofs can be coupled with the checker without code changes, allowing seamless integration into existing verification pipelines. This modularity is crucial for adoption in domains such as railways, avionics, and automotive safety, where certification standards often require evidence of tool correctness.
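The coupling between an untrusted solver and a trusted checker can be sketched as a thin Python wrapper. Everything here is illustrative: `certified_solve`, the `("SAT", model)` / `("UNSAT", proof)` return convention, and the `check_proof` callback are assumptions made for this example, not the interface of the authors' system.

```python
def check_model(cnf, model):
    """Trusted check for SAT claims: the model must satisfy every clause."""
    true_lits = set(model)
    if any(-lit in true_lits for lit in true_lits):
        return False  # inconsistent model assigns both v and -v
    return all(any(lit in true_lits for lit in clause) for clause in cnf)


def certified_solve(cnf, solver, check_proof):
    """Run any solver, then accept its verdict only if the checker agrees.

    solver(cnf) must return ("SAT", model) or ("UNSAT", proof);
    check_proof(cnf, proof) is the trusted UNSAT proof checker.
    """
    verdict, evidence = solver(cnf)
    if verdict == "SAT":
        accepted = check_model(cnf, evidence)
    elif verdict == "UNSAT":
        accepted = check_proof(cnf, evidence)
    else:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if not accepted:
        raise RuntimeError(f"checker rejected the solver's {verdict} claim")
    return verdict


# Usage with a stand-in solver that always answers SAT with x1 = true.
print(certified_solve([[1, 2]], lambda cnf: ("SAT", [1]), lambda cnf, p: False))
```

Because the solver is passed in as an opaque callable, swapping solvers requires no change to the checking code, which is the property the summary calls plug-and-play.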
In conclusion, the work delivers a viable pathway to industrial‑strength certified SAT solving: it retains the speed and heuristics of state‑of‑the‑art solvers while providing a formally verified, independently executable proof checker that guarantees the correctness of both SAT and UNSAT claims. The authors also outline future directions, including support for alternative proof formats (e.g., LRAT, RUP) and partial certification of the proof generation phase, which would further tighten the trust chain from problem formulation to final verification.