Machine Learning in Proof General: Interfacing Interfaces

We present ML4PG, a machine-learning extension for Proof General. It allows users to gather proof statistics related to the shapes of goals, sequences of applied tactics, and proof-tree structures from libraries of interactive higher-order proofs written in Coq and SSReflect. The gathered data is clustered using state-of-the-art machine-learning algorithms available in MATLAB and Weka; ML4PG provides automated interfacing between Proof General and MATLAB/Weka. The clustering results are then used by ML4PG to provide proof hints during interactive proof development.


💡 Research Summary

The paper introduces ML4PG, a machine‑learning extension for the Proof General interface that works with the Coq interactive theorem prover and its SSReflect extension. The core idea is to harvest statistical information from existing proof libraries—specifically the shapes of goals, the sequences of tactics applied, and the structural properties of proof trees—and to feed this data into state‑of‑the‑art clustering algorithms available in MATLAB and Weka. By automatically interfacing Proof General with these external tools, ML4PG can discover groups of similar proofs and present the most relevant examples to the user during interactive proof development, thereby offering context‑aware hints.

The system is built as an Emacs Lisp plugin that records proof-state information in real time. Three families of features are extracted: (1) goal‑type features obtained by parsing the abstract syntax tree of the current goal, (2) tactic‑sequence features generated via n‑gram analysis (typically 3‑ to 5‑grams) to capture local patterns of tactic usage, and (3) proof‑tree features such as depth, branching factor, and sub‑goal distribution. Each proof is represented as a high‑dimensional numeric vector; optional dimensionality reduction (PCA, t‑SNE) can be applied before clustering.
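The n‑gram analysis of tactic sequences can be illustrated with a small sketch. This is not the paper's implementation (which lives in Emacs Lisp); the tactic names and the `tactic_ngrams` helper are illustrative assumptions showing how overlapping n‑grams turn a proof script into countable local patterns:

```python
from collections import Counter

def tactic_ngrams(tactics, n=3):
    """Count overlapping n-grams in a sequence of tactic names.

    Each n-gram becomes one dimension of the proof's feature vector;
    its count is the value in that dimension.
    """
    return Counter(tuple(tactics[i:i + n])
                   for i in range(len(tactics) - n + 1))

# A toy SSReflect-style tactic sequence (illustrative only).
proof = ["move", "elim", "rewrite", "move", "elim", "rewrite", "apply"]
grams = tactic_ngrams(proof, n=3)
print(grams[("move", "elim", "rewrite")])  # this trigram occurs twice
```

Collecting the counts over all proofs in a library yields the high‑dimensional vectors the summary describes, with one coordinate per observed n‑gram.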

ML4PG communicates with MATLAB and Weka through a hybrid of file‑based I/O and TCP sockets, allowing asynchronous invocation of clustering algorithms without blocking the Proof General UI. Users may select among k‑means, hierarchical agglomerative clustering, Gaussian mixture models (fitted via EM), and DBSCAN, and adjust parameters such as the number of clusters or density thresholds. The clustering result—cluster labels and distance matrices—is cached and incrementally updated as the user progresses through a proof.
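To make the clustering step concrete, here is a minimal, self-contained k‑means sketch in Python rather than a call into MATLAB or Weka (which ML4PG delegates to externally). The feature vectors and the simple deterministic initialization are assumptions for illustration; real runs would use the vectors extracted from proof libraries and a production implementation:

```python
import math

def kmeans(vectors, k, iters=50):
    """Minimal k-means: returns a cluster label for each vector.

    Initialization is deterministic (evenly spaced picks) to keep the
    sketch reproducible; real implementations use smarter seeding.
    """
    centroids = [vectors[i * len(vectors) // k] for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels

# Two obvious groups of toy "proof feature vectors".
vecs = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
        (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
labels = kmeans(vecs, k=2)
```

The labels returned here are what ML4PG caches: proofs sharing a label are treated as similar, and members of the current goal's cluster become candidate hints.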

In the experimental evaluation, the authors processed roughly 1,200 proofs from the Coq standard library and the SSReflect modules. They compared clustering quality across different feature sets and algorithms, measuring intra‑cluster compactness versus inter‑cluster separation. Tactic‑sequence features consistently outperformed goal‑type‑only features, yielding an average 12 % improvement in the intra/inter distance ratio. A user study with twelve experienced Coq developers showed that, when ML4PG’s hints were available, average proof‑construction time dropped by 23 % and tactic‑selection errors fell by 17 %.
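The compactness‑versus‑separation measure used in the evaluation can be illustrated with one common formulation: the mean pairwise distance within clusters divided by the mean distance between cluster centroids (lower is better). The exact metric in the paper is not specified here, so this is a representative sketch with made‑up data:

```python
import math
from itertools import combinations

def intra_inter_ratio(clusters):
    """Mean intra-cluster pairwise distance / mean inter-centroid distance.

    A lower ratio means tighter clusters that sit farther apart.
    """
    intra = [math.dist(a, b)
             for cluster in clusters
             for a, b in combinations(cluster, 2)]
    centroids = [tuple(sum(x) / len(c) for x in zip(*c)) for c in clusters]
    inter = [math.dist(a, b) for a, b in combinations(centroids, 2)]
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))

# Tight, well-separated clustering vs. a loose, overlapping one.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (0.0, 4.0)], [(3.0, 0.0), (3.0, 4.0)]]
print(intra_inter_ratio(good))  # 0.1
print(intra_inter_ratio(bad))   # ~1.33
```

Under a measure of this kind, the reported 12% improvement from tactic‑sequence features would correspond to a proportionally lower ratio than goal‑type‑only features achieve on the same corpus.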

The paper also discusses limitations. High‑dimensional feature vectors increase computational cost and can degrade cluster interpretability; thus, more sophisticated feature selection or embedding techniques are needed. Moreover, the current approach is purely unsupervised; integrating supervised learning for tactic prediction or reinforcement learning for strategy synthesis is identified as future work.

Overall, ML4PG demonstrates that statistical learning can be tightly coupled with interactive theorem proving environments, turning large proof corpora into a searchable, suggestion‑driven knowledge base. The authors envision extensions to other proof assistants (Isabelle, Lean), cloud‑based proof repositories, and deep‑learning models that directly generate tactic scripts, thereby advancing the synergy between human intuition and machine intelligence in formal verification.