Dependencies in Formal Mathematics: Applications and Extraction for Coq and Mizar
Two methods for extracting detailed formal dependencies from the Coq and Mizar systems are presented and compared. The methods are used for dependency extraction from two large mathematical repositories: the Coq Repository at Nijmegen and the Mizar Mathematical Library. Several applications of the detailed dependency analysis are described and proposed. Motivated by the different applications, we discuss the various kinds of dependencies that we are interested in, and the suitability of various dependency extraction methods.
💡 Research Summary
The paper presents two complementary techniques for extracting fine‑grained formal dependencies from the interactive theorem provers Coq and Mizar, and evaluates them on the large CoRN repository (maintained at Nijmegen) and the Mizar Mathematical Library (MML). The authors begin by clarifying what "dependency" means in a formal setting. They distinguish a purely logical view, where a definition, theorem or axiom is required for the correctness of another statement, from a pragmatic view that also includes all the auxiliary machinery of the proof assistant (notations, hint databases, type‑class mechanisms, tactic parameters, etc.). This dual perspective informs which kinds of dependency each extraction method aims to capture.
In Coq, dependency tracking is woven into three kinds of commands: (1) registration of new logical objects (definitions, axioms), (2) updates of the proof tree when a tactic is applied, and (3) finalisation of a proof (Qed, Save, Defined). The implementation hooks into the Coq kernel at the points where these commands are processed. When a new object is added, its type and body are traversed to collect direct references. When a tactic is executed, the system records the parsed form, the Ltac‑expanded form, and the evaluated form of the tactic, thereby capturing every identifier that appears at any stage of interpretation. The authors also discuss special tactics such as auto and intuition, which may generate temporary lemmas, consult hint databases, or depend on whether constants are opaque or transparent; these effects are also reported as dependencies. The output is a machine‑readable list of dependencies emitted after each "progress‑making" command.
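The first kind of tracking, collecting the direct references of a new object by traversing its type and body, can be sketched as follows. This is a minimal Python illustration over a hypothetical toy term language (`Const`, `Var`, `App`, `Lam`); it is not Coq's OCaml kernel API, whose terms have many more constructors (products, let‑ins, inductives, pattern matching), and it ignores the tactic‑level recording described above.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Const:
    """Reference to a global object (definition, lemma, axiom)."""
    name: str


@dataclass(frozen=True)
class Var:
    """Locally bound variable: not a global dependency."""
    name: str


@dataclass(frozen=True)
class App:
    """Application of fn to arg."""
    fn: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:
    """Binder: fun (var : ty) => body."""
    var: str
    ty: "Term"
    body: "Term"


# Hypothetical toy term language, far smaller than Coq's real one.
Term = Union[Const, Var, App, Lam]


def direct_refs(term: Term) -> set[str]:
    """Names of all global constants occurring anywhere in `term`."""
    refs: set[str] = set()
    stack = [term]  # explicit stack avoids recursion limits on deep terms
    while stack:
        t = stack.pop()
        if isinstance(t, Const):
            refs.add(t.name)
        elif isinstance(t, App):
            stack.extend((t.fn, t.arg))
        elif isinstance(t, Lam):
            stack.extend((t.ty, t.body))
        # Var: bound variables contribute nothing
    return refs


def object_deps(ty: Term, body: Optional[Term] = None) -> set[str]:
    """Direct dependencies of a newly registered object: references in its type
    and, if present, its body (an axiom has no body)."""
    deps = direct_refs(ty)
    if body is not None:
        deps |= direct_refs(body)
    return deps
```

For instance, `direct_refs(Lam("n", Const("nat"), App(Const("plus_comm"), Var("n"))))` returns `{"nat", "plus_comm"}`: the bound variable `n` is skipped because it is not a global dependency.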
Mizar's approach is conceptually simpler. Each article A has an associated environment E_A that contains all declarations needed for its verification. E_A is usually a conservative over‑approximation that often includes many items that are not actually required. The authors compute a minimal environment E′_A by iteratively removing superfluous items while preserving verifiability and semantics, which yields a precise dependency set for each article. Because Mizar's core is less dynamic than Coq's, the implementation does not need to hook into a proof tree, but it must handle the article‑level environment machinery.
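The minimisation step can be approximated by a greedy loop, sketched here in Python under stated assumptions: `verifies` is a hypothetical oracle standing in for a run of the Mizar checker on the article with a candidate environment, and the requirement that the article's semantics stay unchanged is assumed to be folded into that oracle. The result is minimal in the sense that no single remaining item can be dropped, not necessarily the smallest possible environment.

```python
from typing import Callable, List, Sequence


def minimize_environment(
    env: Sequence[str], verifies: Callable[[List[str]], bool]
) -> List[str]:
    """Greedily drop environment items while the article still verifies.

    `verifies` is a hypothetical oracle (e.g. a wrapper around a verifier run);
    it must return True for the full environment.
    """
    current = list(env)
    if not verifies(current):
        raise ValueError("article does not verify with its full environment")
    changed = True
    while changed:  # repeat passes until no single item can be removed
        changed = False
        for item in list(current):  # iterate over a snapshot while shrinking
            candidate = [x for x in current if x != item]
            if verifies(candidate):
                current = candidate
                changed = True
    return current
```

For example, with an oracle that only needs items `A` and `C`, `minimize_environment(["A", "B", "C", "D"], lambda e: {"A", "C"} <= set(e))` returns `["A", "C"]`. Repeating full passes until nothing changes is a conservative choice: if removing one item could make another removable (non‑monotone behaviour), a single pass might miss it.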
The extracted dependency data are put to several practical uses. First, they enable fast recompilation of large libraries: by knowing exactly which items depend on a changed definition, only the affected portion of the library needs to be rebuilt, dramatically reducing build times. Second, the data serve as training material for AI/ATP systems. The authors experiment with graph‑neural‑network models that ingest dependency graphs to predict useful lemmas or to guide tactic selection, showing promising improvements in automated proof search. Third, in collaborative formal‑math wikis, real‑time dependency information can alert contributors when a modification would break other entries, supporting safer collaborative development.
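The fast‑recompilation idea reduces to a reachability question on the reversed dependency graph. A minimal Python sketch, assuming the extracted dependencies are available as a hypothetical mapping from each item to the set of items it directly uses: it collects everything that transitively depends on the changed items and returns a rebuild order in which every item comes after its affected dependencies.

```python
from collections import defaultdict, deque
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def affected_items(deps: Dict[str, Set[str]], changed: Set[str]) -> Set[str]:
    """Changed items plus everything that transitively depends on them."""
    # Reverse the edges: for each item, which items use it directly?
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for item, uses in deps.items():
        for used in uses:
            dependents[used].add(item)
    seen = set(changed)
    queue = deque(changed)
    while queue:  # breadth-first search over the reversed edges
        cur = queue.popleft()
        for d in dependents[cur]:
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return seen


def rebuild_order(deps: Dict[str, Set[str]], changed: Set[str]) -> List[str]:
    """Affected items, ordered so each comes after its affected dependencies."""
    affected = affected_items(deps, changed)
    # Unaffected dependencies are already up to date, so drop those edges.
    sub = {item: deps.get(item, set()) & affected for item in affected}
    return list(TopologicalSorter(sub).static_order())
```

For example, with `deps = {"nat": set(), "plus": {"nat"}, "plus_comm": {"plus"}, "mult": {"nat"}}`, changing `plus` yields the rebuild order `["plus", "plus_comm"]`, while `mult` is left untouched. `TopologicalSorter` raises `graphlib.CycleError` if the data contain a cycle.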
The paper also discusses limitations and open challenges. In Coq, many pragmatic dependencies (e.g., the contents of hint databases, first‑order proof‑search depth settings) are not captured automatically because they require knowledge of internal tactic behaviour. The authors suggest extending the OCaml API so that tactics themselves can report their auxiliary dependencies. Another subtle issue is opacity: Coq distinguishes opaque and transparent objects, and changing an opaque object can affect universe constraints in the underlying pCIC. Detecting such indirect effects is non‑trivial and may require full re‑checking of the library. For Mizar, the main concern is future changes to the core that could invalidate the current minimal‑environment algorithm.
In conclusion, the authors demonstrate that detailed dependency extraction is feasible for both Coq and Mizar, that the extracted data are valuable for library maintenance, AI‑assisted proving, and collaborative editing, and that further work should aim at extending these techniques to other proof assistants (Isabelle, Lean) and at improving the handling of non‑logical, system‑level dependencies.