Generalizing Redundancy in Propositional Logic: Foundations and Hitting Sets Duality

Detection and elimination of redundant clauses from propositional formulas in Conjunctive Normal Form (CNF) is a fundamental problem with numerous application domains, including AI, and has been the subject of extensive research. Moreover, a number of recent applications have motivated various extensions of this problem. For example, unsatisfiable formulas partitioned into disjoint subsets of clauses (so-called groups) often need to be simplified by removing redundant groups, or may contain redundant variables, rather than clauses. In this report we present a generalized theoretical framework of labelled CNF formulas that unifies various extensions of the redundancy detection and removal problem and allows us to derive a number of results that subsume and extend previous work. The follow-up reports contain a number of additional theoretical results and algorithms for various computational problems in the context of the proposed framework.

💡 Research Summary

The paper introduces a unified theoretical framework for redundancy detection and removal in propositional formulas by means of “labelled CNF” (LCNF). Traditional CNF redundancy problems focus either on clause‑level phenomena, such as tautologies and subsumption, or on group‑level constructs like MUS‑groups. However, many modern applications require handling redundancy at multiple granularities simultaneously: clauses, variables, or user‑defined groups. To address this, the authors attach one or more labels to each clause, thereby turning clauses, variables, and groups into instances of a single abstract entity: a label.

A label set $S \subseteq L$ (where $L$ is the universe of labels) induces a sub‑formula $\Phi(S) = \bigcup_{l \in S} \Phi_l$, where $\Phi_l$ denotes the set of clauses carrying label $l$. The central question becomes: does a particular label set preserve the (un)satisfiability of the original formula? The authors answer this by establishing a duality with hitting‑set theory. When the original CNF $\Phi$ is unsatisfiable, a minimal hitting label set (MHLS) is an inclusion‑minimal set of labels whose induced sub‑formula remains unsatisfiable; this directly generalises the classic Minimal Unsatisfiable Subset (MUS). Dually, a maximal hitting label set (XHLS) is an inclusion‑maximal set of labels whose induced sub‑formula is satisfiable; this corresponds to Maximal Satisfiable Subsets (MSS), whose complements are the Minimal Correction Subsets (MCS).
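As a rough illustration of the induced sub‑formula and of a minimal unsatisfiable label set, the sketch below uses a toy encoding that is an assumption of this example, not the paper's notation: clauses are frozensets of DIMACS‑style integer literals, and a dictionary maps each clause to its labels. The helper names (`induced`, `satisfiable`, `minimal_unsat_labels`) are hypothetical, the SAT check is brute force, and the shrinking loop is the generic deletion‑based scheme known from MUS extraction rather than an algorithm taken from the paper.

```python
from itertools import product

def induced(labelled, labels):
    """Phi(S): clauses carrying at least one label from `labels`."""
    return [c for c, ls in labelled.items() if ls & labels]

def satisfiable(clauses):
    """Brute-force SAT check; exponential, intended for tiny examples only."""
    variables = sorted({abs(lit) for c in clauses for lit in c})
    for bits in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, bits))
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def minimal_unsat_labels(labelled, labels):
    """Deletion-based shrinking: drop every label whose removal keeps Phi(S) unsatisfiable."""
    assert not satisfiable(induced(labelled, labels)), "Phi(labels) must be unsatisfiable"
    current = set(labels)
    for label in sorted(labels):
        if not satisfiable(induced(labelled, current - {label})):
            current.remove(label)
    return current

# Toy instance (assumed for illustration): x [a], not-x [b], y [c], not-y [c].
labelled = {
    frozenset({1}): {"a"},
    frozenset({-1}): {"b"},
    frozenset({2}): {"c"},
    frozenset({-2}): {"c"},
}
result = minimal_unsat_labels(labelled, {"a", "b", "c"})
print(sorted(result))  # ['c']
```

Because $\Phi(S)$ only grows when labels are added, unsatisfiability is monotone in $S$, which is what makes a single deletion pass sufficient for inclusion‑minimality. The instance has a second minimal set, {a, b}; which one the loop returns depends on the order in which labels are tried.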

By treating labels as elements of a partially ordered set (e.g., a variable label may dominate all clause labels that contain that variable), the framework captures inclusion relationships among different redundancy granularities. The authors prove that, under certain structural restrictions on the label poset (such as tree‑shaped or bounded width), polynomial‑time approximation algorithms exist for finding MHLS or XHLS. In the general case, the problems remain NP‑hard, but the reduction to hitting sets allows the reuse of mature SAT‑solver‑based MUS extraction techniques without modification.
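The hitting‑set view can be made concrete on a small family of label sets. The sketch below is a brute‑force enumeration, not an algorithm from the paper: given the minimal unsatisfiable label sets of some formula (here the assumed toy family {a, b, c} and {a, c, d}), it lists their minimal hitting sets. In the classical MUS/MCS duality these are exactly the minimal correction sets, and the sketch assumes the labelled setting behaves the same way. The function name `minimal_hitting_sets` is hypothetical.

```python
from itertools import combinations

def minimal_hitting_sets(family):
    """Enumerate inclusion-minimal sets that intersect every member of `family`."""
    universe = sorted(set().union(*family))
    found = []
    # Candidates are visited by increasing size, so any hitting set that
    # contains an already-found one cannot be minimal and is skipped.
    for k in range(len(universe) + 1):
        for cand in combinations(universe, k):
            s = frozenset(cand)
            if any(h <= s for h in found):
                continue
            if all(s & member for member in family):
                found.append(s)
    return found

# Assumed toy family of minimal unsatisfiable label sets.
family = [frozenset("abc"), frozenset("acd")]
hitting = minimal_hitting_sets(family)
print([sorted(h) for h in hitting])  # [['a'], ['c'], ['b', 'd']]
```

Read through the duality, each printed set is a minimal group of labels whose removal would break every minimal unsatisfiable label set at once.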

The paper formalises two new notions: Core Label Set, the intersection of all minimal hitting label sets (generalising the core clause concept), and Redundant Label Set, any set of labels whose removal does not change the (un)satisfiability status. These definitions subsume earlier concepts like core clauses, essential variables, and indispensable groups, providing a single language for reasoning about redundancy across different levels.
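A small brute‑force sketch may help make the core notion tangible. It assumes an unsatisfiable toy formula in which one clause carries two labels, enumerates all inclusion‑minimal unsatisfiable label sets, intersects them to obtain the core, and separately tests which labels can be removed individually without making the formula satisfiable. The encoding (frozensets of integer literals) and all function names are assumptions of this example, not the paper's definitions or algorithms.

```python
from itertools import combinations, product

def induced(labelled, labels):
    """Clauses carrying at least one label from `labels`."""
    return [c for c, ls in labelled.items() if ls & labels]

def satisfiable(clauses):
    """Brute-force SAT check, for tiny examples only."""
    variables = sorted({abs(lit) for c in clauses for lit in c})
    return any(
        all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses)
        for bits in product([False, True], repeat=len(variables))
        for model in [dict(zip(variables, bits))]
    )

def all_minimal_unsat_label_sets(labelled, universe):
    """Enumerate by increasing size; skip supersets of sets already found."""
    found = []
    for k in range(len(universe) + 1):
        for cand in combinations(sorted(universe), k):
            s = frozenset(cand)
            if any(m <= s for m in found):
                continue
            if not satisfiable(induced(labelled, s)):
                found.append(s)
    return found

# Toy instance (assumed): x [a], (not-x or y) [b, d], not-y [c].
labelled = {
    frozenset({1}): {"a"},
    frozenset({-1, 2}): {"b", "d"},
    frozenset({-2}): {"c"},
}
universe = {"a", "b", "c", "d"}

minimal_sets = all_minimal_unsat_label_sets(labelled, universe)
core = frozenset.intersection(*minimal_sets)  # requires at least one minimal set
redundant = {l for l in universe
             if not satisfiable(induced(labelled, universe - {l}))}
print(sorted(core), sorted(redundant))  # ['a', 'c'] ['b', 'd']
```

In this instance the individually removable labels are exactly those outside the core. Note that b and d cannot be removed together: doing so drops the shared clause and the formula becomes satisfiable.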

Beyond theory, the authors sketch three concrete application scenarios. In software package management, packages become labels and dependency clauses become labelled clauses; the framework identifies unnecessary package groups, reducing installation size. In hardware verification, gates are modelled as variable labels while timing or logical constraints are clause labels; redundant constraints are pruned, speeding up model checking. In knowledge‑base simplification, logical rules are clause labels and concepts are variable labels; the system isolates the essential rule set while discarding superfluous axioms. In each case, existing SAT‑based tools can be plugged into the LCNF pipeline, yielding performance gains without reinventing the wheel.

In conclusion, labelled CNF provides a general, extensible, and algorithmically friendly abstraction for redundancy problems. It unifies clause‑, variable‑, and group‑level redundancy under a single mathematical structure, leverages the well‑studied hitting‑set duality, and enables the direct transfer of existing algorithms to richer problem settings. The authors suggest future work on specialised approximation algorithms for particular label posets, integration with machine‑learning‑driven label prediction, and large‑scale empirical validation on real‑world benchmarks.