Premise Selection for Mathematics by Corpus Analysis and Kernel Methods


Smart premise selection is essential when using automated reasoning as a tool for large-theory formal proof development. A good method for premise selection in complex mathematical libraries is the application of machine learning to large corpora of proofs. This work develops learning-based premise selection in two ways. First, a newly available minimal dependency analysis of existing high-level formal mathematical proofs is used to build a large knowledge base of proof dependencies, providing precise data for ATP-based re-verification and for training premise selection algorithms. Second, a new machine learning algorithm for premise selection based on kernel methods is proposed and implemented. To evaluate the impact of both techniques, a benchmark consisting of 2078 large-theory mathematical problems is constructed, extending the older MPTP Challenge benchmark. The combined effect of the techniques results in a 50% improvement on the benchmark over the Vampire/SInE state-of-the-art system for automated reasoning in large theories.


💡 Research Summary

The paper tackles the critical problem of premise selection for automated theorem proving (ATP) in large‑theory formal mathematics, focusing on the Mizar Mathematical Library (MML). It introduces a two‑phase approach that combines precise proof‑dependency analysis with a novel kernel‑based machine‑learning algorithm.

In the first phase, the authors compute minimal dependencies for each Mizar theorem or definition. By refactoring each article into “micro‑articles” that contain a single top‑level item and then greedily minimizing the surrounding environment, they obtain a set of premises that is both necessary and sufficient for verification. This process distinguishes explicit references (those cited in the proof) from implicit ones (facts required by the verifier but not mentioned) and yields, on average, two to three times fewer premises than the traditional MPTP fixed‑point approximation. Table 1 in the paper quantifies this reduction across 33 Mizar articles that belong to the new benchmark.
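The greedy minimization step can be sketched as follows. This is a simplified illustration, not the paper's actual pipeline (which refactors Mizar articles into micro-article files and re-runs the Mizar verifier); the `verifies` callback here is a hypothetical stand-in for that verification run.

```python
def minimize_premises(conjecture, premises, verifies):
    """Greedily shrink a premise set while the proof still verifies.

    `verifies(conjecture, premise_set)` is a hypothetical callback that
    stands in for re-running the verifier on a micro-article containing
    only the given premises.
    """
    needed = list(premises)
    for p in list(needed):            # try dropping each premise in turn
        trial = [q for q in needed if q != p]
        if verifies(conjecture, trial):   # proof still checks without p
            needed = trial
    return needed
```

The result is minimal in the sense that removing any single remaining premise breaks verification, which matches the "necessary and sufficient" property described above.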

The second phase treats premise selection as a multi‑output ranking problem. Using the fine‑grained dependency matrix as ground truth, the authors construct feature vectors for both conjectures and potential premises. They then apply a radial‑basis‑function (RBF) kernel to capture non‑linear relationships and train a structural SVM that directly optimizes a ranking loss. Compared with earlier linear models (logistic regression, linear SVM), the kernel method improves average selection accuracy by roughly 12 percentage points, demonstrating its ability to model complex interactions among mathematical concepts.
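The role of the RBF kernel can be illustrated with a minimal sketch. Note the simplification: the code below ranks candidate premises by raw kernel similarity to the conjecture's feature vector, whereas the paper trains a kernel-based multi-output ranker on the dependency matrix; the feature vectors and the `gamma` value are assumptions for illustration.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * d2)

def rank_premises(conjecture_vec, premise_vecs):
    """Return premise indices ordered by kernel similarity (best first).

    A nearest-neighbour-style stand-in for the trained ranker: premises
    whose features are close to the conjecture's score near 1, distant
    ones decay exponentially toward 0.
    """
    scored = [(rbf_kernel(conjecture_vec, v), i)
              for i, v in enumerate(premise_vecs)]
    return [i for _, i in sorted(scored, reverse=True)]
```

The exponential decay is what lets the kernel capture non-linear structure: unlike a linear score, similarity falls off smoothly with squared distance in feature space.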

To evaluate the combined impact, the authors build a new benchmark called MPTP2078, extending the older MPTP Challenge to 2078 problems drawn from the MML. Each problem is equipped with the exact minimal dependency set, providing a realistic training and testing ground. Experiments compare three configurations: (a) Vampire + SInE with no learning, (b) Vampire + SInE guided by the minimal dependencies alone, and (c) Vampire + SInE guided by the kernel‑based premise selector trained on those dependencies. The results are striking: (a) serves as the baseline, (b) raises the number of problems solved by about 30%, and (c) achieves a total improvement of about 50% over the baseline. The gains are especially pronounced on problems that involve deep chains of definitions and theorems, confirming that accurate premise selection mitigates the "premise explosion" that typically hampers ATP in large theories.

The paper’s contributions can be summarized as follows:

  1. Minimal Dependency Extraction – a fully automated pipeline that produces truly minimal premise sets for Mizar items, improving over previous over‑approximations.
  2. Kernel‑Based Multi‑Output Ranking – a novel learning algorithm that leverages non‑linear kernels to predict the most useful premises for a given conjecture.
  3. Large‑Scale Benchmark (MPTP2078) – a publicly released collection of 2078 problems with precise dependency information, enabling reproducible evaluation of future ATP and learning methods.

Future work outlined by the authors includes extending the dependency analysis to other proof assistants (Isabelle, Coq), experimenting with newer deep‑learning architectures such as graph neural networks for premise ranking, and integrating dynamic, on‑the‑fly premise re‑selection into interactive proof environments. By demonstrating that fine‑grained proof data combined with sophisticated machine learning can dramatically boost ATP performance, the paper paves the way for more scalable, AI‑assisted formal mathematics.

