📝 Original Info
- Title: Gödel’s Poetry
- ArXiv ID: 2512.14252
- Date: 2025-12-16
- Authors: Kelly J. Davis (Unaffiliated)
📝 Abstract
Formal, automated theorem proving has long been viewed as a challenge to artificial intelligence. We introduce here a new approach to computer theorem proving, one that employs specialized language models for Lean4 proof generation combined with recursive decomposition of difficult theorems into simpler entailing propositions. These models are coordinated through a multi-agent architecture that orchestrates autoformalization (if required), proof generation, decomposition of difficult theorems into simpler entailing propositions, and recursive proof (and/or decomposition) of these propositions. Without decomposition, we achieve a 90.4% pass rate on miniF2F. With decomposition, this is significantly improved. A key technical contribution lies in our extension of the Kimina Lean Server with abstract syntax tree (AST) parsing capabilities to facilitate automated, recursive proof decomposition. The system is made available on PyPI as goedels-poetry, and the open-source implementation at https://github.com/KellyJDavis/goedels-poetry facilitates both adaptation to alternative language models and extension with custom functionality.
💡 Deep Analysis
Deep Dive into Gödel's Poetry.
Formal, automated theorem proving has long been viewed as a challenge to artificial intelligence. We introduce here a new approach to computer theorem proving, one that employs specialized language models for Lean4 proof generation combined with recursive decomposition of difficult theorems into simpler entailing propositions. These models are coordinated through a multi-agent architecture that orchestrates autoformalization (if required), proof generation, decomposition of difficult theorems into simpler entailing propositions, and recursive proof (and/or decomposition) of these propositions. Without decomposition, we achieve a 90.4% pass rate on miniF2F. With decomposition, this is significantly improved. A key technical contribution lies in our extension of the Kimina Lean Server with abstract syntax tree (AST) parsing capabilities to facilitate automated, recursive proof decomposition. The system is made available on PyPI as goedels-poetry, and the open-source implementation at htt
📄 Full Content
GÖDEL’S POETRY
Kelly J. Davis
Unaffiliated
kdavis@alum.mit.edu
ABSTRACT
Formal, automated theorem proving has long been viewed as a challenge to artificial intelligence. We introduce here a new approach to computer theorem proving, one that employs specialized language models for Lean4 proof generation combined with recursive decomposition of difficult theorems into simpler entailing propositions. These models are coordinated through a multi-agent architecture that orchestrates autoformalization (if required), proof generation, decomposition of difficult theorems into simpler entailing propositions, and recursive proof (and/or decomposition) of these propositions. Without decomposition, we achieve a 90.4% pass rate on miniF2F. With decomposition, this is significantly improved. A key technical contribution lies in our extension of the Kimina Lean Server with abstract syntax tree (AST) parsing capabilities to facilitate automated, recursive proof decomposition. The system is made available on PyPI as goedels-poetry, and the open-source implementation at https://github.com/KellyJDavis/goedels-poetry facilitates both adaptation to alternative language models and extension with custom functionality.
1 Introduction
Formal, automated theorem proving represents a fundamental challenge to artificial intelligence. The task requires automated generation of formal proofs that may be verified by computer systems [11]. Recent advances in large language models [12–15] have demonstrated remarkable capabilities in mathematical reasoning [2, 4–8, 17, 18]. However, the generation of formally verified proofs remains a difficult undertaking due to the strict syntactic and logical requirements imposed by proof assistants such as Lean [1], Isabelle [23], and Coq [24].
We introduce here Gödel’s Poetry, a system combining automated theorem proving, decomposition of difficult theorems into entailing propositions, RAG-based retrieval of propositions useful for decomposition, and specialized language models. Our approach builds upon three recent advances: the verifier-guided self-correction approach of Goedel-Prover-V2 [2], the recursive proof strategy of POETRY [3], and the RAG-based proposition retrieval of Hilbert [4]. The system coordinates specialized agents through LangGraph [9] and LangChain [25] for the purposes of autoformalization, autoformalization verification, proof generation, proof verification, and recursive decomposition.
A key technical contribution of this work lies in our extension of the Kimina Lean Server [10] to support abstract syntax tree (AST) extraction from Lean 4 code. This extension enables programmatic analysis of decompositions, automatic identification of unproven subgoals, and extraction of subgoal proposition statements, operations that prove essential for recursive decomposition as adapted to Lean’s tactic-based proof system.
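To make the idea concrete, the following minimal Python sketch approximates what the AST extension enables: locating sorry-closed subgoals in a Lean 4 proof sketch and extracting their proposition statements. The real system works over a full AST returned by the extended Kimina Lean Server; the regex-based `unproven_subgoals` helper below is purely illustrative and is not part of the actual API.

```python
import re

# Illustrative only: the actual system queries the extended Kimina Lean
# Server for a Lean 4 AST. Here we approximate the same operation with a
# single-line regex that pairs each `have <name> : <statement> := by sorry`
# with its proposition statement. All names below are hypothetical.
SORRY_HAVE = re.compile(
    r"have\s+(?P<name>\w+)\s*:\s*(?P<statement>.+?)\s*:=\s*by\s+sorry"
)

def unproven_subgoals(sketch: str) -> list[tuple[str, str]]:
    """Return (name, statement) pairs for every sorry-closed subgoal."""
    return [(m.group("name"), m.group("statement"))
            for m in SORRY_HAVE.finditer(sketch)]

sketch = """
theorem main (n : Nat) : n + 0 = n := by
  have h1 : n + 0 = n := by sorry
  exact h1
"""
print(unproven_subgoals(sketch))  # → [('h1', 'n + 0 = n')]
```

Each extracted statement becomes a standalone proposition that can be handed back to the prover for recursive proof or further decomposition.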
The system provides three principal capabilities: (1) multi-stage proof generation proceeding through an optional autoformalization phase (autoformalization, autoformalization syntactic validation, and autoformalization semantic validation), a proof generation phase, and a proof verification phase; (2) recursive decomposition of complex theorems into entailing propositions using proof sketches with sorry placeholders; (3) proof reconstruction integrating verified proposition proofs through AST-based substitution. The modular architecture permits the substitution of alternative language models, thereby facilitating experimentation with different proof strategies.
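As a hedged illustration of capability (2), consider what a decomposition sketch with sorry placeholders looks like in Lean 4. This exact theorem and its subgoal split are invented for exposition and are not taken from the paper:

```lean
-- Hypothetical decomposition sketch: the prover emits a high-level proof
-- whose subgoals are closed with `sorry`. Each `sorry` marks a proposition
-- to be proven (or further decomposed) recursively; once verified, the
-- subproofs are substituted back in via AST-based reconstruction.
theorem add_sq (a b : Nat) :
    (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by
  have h1 : (a + b) ^ 2 = (a + b) * (a + b) := by sorry
  have h2 : (a + b) * (a + b) = a ^ 2 + 2 * a * b + b ^ 2 := by sorry
  rw [h1, h2]
```

Here h1 and h2 are the entailing propositions: each is simpler than the original goal, and a verified proof of both, substituted for the sorry placeholders, yields a complete proof of the theorem.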
2 Related Work
2.1 Neural Theorem Proving
Neural theorem proving, which combines deep learning with formal verification, has emerged as a productive area of research [16]. Its modern era arguably began with Polu and Sutskever’s work [41] applying transformer-based [39] language models to automated theorem proving; by formulating tactic prediction as language modeling, it inspired much follow-on research [29–36].
More recently, the work of DeepSeek-Prover [42] demonstrated that language models, when fine-tuned on mathematical corpora, are capable of generating Lean 4 proofs with significant success rates. This motivated follow-on work, DeepSeek-Prover-V1.5 [38] and DeepSeek-Prover-V2 [17]. Subsequent to DeepSeek-Prover, AlphaProof [43] achieved silver-medal performance on the International Mathematical Olympiad employing reinforcement learning with formal verification. Even more recently there has been a flood of papers from Apple [4], ByteDance [5–7], Harmonic [8], Kimina [18], and many others.
Progress in this domain has been driven by benchmarks including miniF2F [44], MathLib-Bench [45], and PutnamBench [46], which provide standardized evaluation on undergraduate mathematics, Mathlib [47] theorems, and Putnam competition problems.
2.2 Goedel-Prover-V2
The work of Goedel-Prover-V2 [2] introduced three key innovations: (1) scaffolded data synthesis,
…(Full text truncated)…
Reference
This content is AI-processed based on ArXiv data.