Gödel's Poetry

Reading time: 5 minutes

📝 Original Info

  • Title: Gödel’s Poetry
  • ArXiv ID: 2512.14252
  • Date: 2025-12-16
  • Authors: Kelly J. Davis

📝 Abstract

Formal, automated theorem proving has long been viewed as a challenge to artificial intelligence. We introduce here a new approach to computer theorem proving, one that employs specialized language models for Lean4 proof generation combined with recursive decomposition of difficult theorems into simpler entailing propositions. These models are coordinated through a multi-agent architecture that orchestrates autoformalization (if required), proof generation, decomposition of difficult theorems into simpler entailing propositions, and recursive proof (and/or decomposition) of these propositions. Without decomposition, we achieve a 90.4% pass rate on miniF2F. With decomposition, this is significantly improved. A key technical contribution lies in our extension of the Kimina Lean Server with abstract syntax tree (AST) parsing capabilities to facilitate automated, recursive proof decomposition. The system is made available on PyPI as goedels-poetry, and the open-source implementation at https://github.com/KellyJDavis/goedels-poetry facilitates both adaptation to alternative language models and extension with custom functionality.

📄 Full Content

GÖDEL’S POETRY

Kelly J. Davis
Unaffiliated
kdavis@alum.mit.edu

ABSTRACT

Formal, automated theorem proving has long been viewed as a challenge to artificial intelligence. We introduce here a new approach to computer theorem proving, one that employs specialized language models for Lean4 proof generation combined with recursive decomposition of difficult theorems into simpler entailing propositions. These models are coordinated through a multi-agent architecture that orchestrates autoformalization (if required), proof generation, decomposition of difficult theorems into simpler entailing propositions, and recursive proof (and/or decomposition) of these propositions. Without decomposition, we achieve a 90.4% pass rate on miniF2F. With decomposition, this is significantly improved. A key technical contribution lies in our extension of the Kimina Lean Server with abstract syntax tree (AST) parsing capabilities to facilitate automated, recursive proof decomposition. The system is made available on PyPI as goedels-poetry, and the open-source implementation at https://github.com/KellyJDavis/goedels-poetry facilitates both adaptation to alternative language models and extension with custom functionality.

1 Introduction

Formal, automated theorem proving represents a fundamental challenge to artificial intelligence. The task requires automated generation of formal proofs that may be verified by computer systems [11]. Recent advances in large language models [12–15] have demonstrated remarkable capabilities in mathematical reasoning [2, 4–8, 17, 18]. However, the generation of formally verified proofs remains a difficult undertaking due to the strict syntactic and logical requirements imposed by proof assistants such as Lean [1], Isabelle [23], and Coq [24].
We introduce here Gödel’s Poetry, a system combining automated theorem proving, decomposition of difficult theorems into entailing propositions, RAG-based retrieval of propositions useful for decomposition, and specialized language models. Our approach builds upon three recent advances: the verifier-guided self-correction approach of Goedel-Prover-V2 [2], the recursive proof strategy of POETRY [3], and the RAG-based proposition retrieval of Hilbert [4]. The system coordinates specialized agents through LangGraph [9] and LangChain [25] for the purposes of autoformalization, autoformalization verification, proof generation, proof verification, and recursive decomposition.

A key technical contribution of this work lies in our extension of the Kimina Lean Server [10] to support abstract syntax tree (AST) extraction from Lean 4 code. This extension enables programmatic analysis of decompositions, automatic identification of unproven subgoals, and extraction of subgoal proposition statements, operations that prove essential for recursive decomposition as adapted to Lean’s tactic-based proof system.

The system provides three principal capabilities:

(1) multi-stage proof generation proceeding through an optional autoformalization phase (autoformalization, autoformalization syntactic validation, and autoformalization semantic validation), a proof generation phase, and a proof verification phase;
(2) recursive decomposition of complex theorems into entailing propositions using proof sketches with sorry placeholders;
(3) proof reconstruction integrating verified proposition proofs through AST-based substitution.

The modular architecture permits the substitution of alternative language models, thereby facilitating experimentation with different proof strategies.
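The control flow behind capabilities (2) and (3) can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the names `Sketch`, `prove`, `toy_prover`, and `toy_decomposer` are hypothetical and are not part of the goedels-poetry package, the prover and decomposer are lookup-table stubs standing in for language models and Lean verification, and reconstruction uses plain string replacement of `sorry` where the paper describes AST-based substitution.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sketch:
    """Hypothetical proof sketch: a Lean tactic body with one `sorry` per subgoal, in order."""
    body: str
    subgoals: list[str]


# Hypothetical interfaces: a prover maps a statement to a proof term (or None),
# a decomposer maps a statement to a sketch with `sorry` placeholders (or None).
Prover = Callable[[str], Optional[str]]
Decomposer = Callable[[str], Optional[Sketch]]


def prove(statement: str, prover: Prover, decomposer: Decomposer,
          depth: int = 2) -> Optional[str]:
    """Try a direct proof; on failure, decompose and recurse on each subgoal."""
    direct = prover(statement)
    if direct is not None:
        return direct
    if depth == 0:
        return None  # recursion budget exhausted
    sketch = decomposer(statement)
    if sketch is None:
        return None
    body = sketch.body
    for goal in sketch.subgoals:
        sub_proof = prove(goal, prover, decomposer, depth - 1)
        if sub_proof is None:
            return None
        # Toy reconstruction: fill the next placeholder textually.
        # (Assumption: the real system substitutes via the Lean AST instead.)
        body = body.replace("sorry", sub_proof, 1)
    return body


# Stub "prover": only knows two easy facts.
KNOWN = {"0 ≤ a ^ 2": "sq_nonneg a", "0 ≤ b ^ 2": "sq_nonneg b"}


def toy_prover(statement: str) -> Optional[str]:
    return KNOWN.get(statement)


# Stub "decomposer": knows one sketch for one harder statement.
def toy_decomposer(statement: str) -> Optional[Sketch]:
    if statement == "0 ≤ a ^ 2 + b ^ 2":
        return Sketch(
            body=("have h1 : 0 ≤ a ^ 2 := sorry\n"
                  "have h2 : 0 ≤ b ^ 2 := sorry\n"
                  "exact add_nonneg h1 h2"),
            subgoals=["0 ≤ a ^ 2", "0 ≤ b ^ 2"],
        )
    return None


proof = prove("0 ≤ a ^ 2 + b ^ 2", toy_prover, toy_decomposer)
print(proof)
```

In this toy run the direct attempt on the sum fails, the sketch supplies two subgoals, each is closed by the stub prover, and the placeholders are filled in order. The depth bound is a design choice of the sketch to keep recursion finite; the source does not state how the system limits recursion.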
arXiv:2512.14252v1 [cs.AI] 16 Dec 2025

2 Related Work

2.1 Neural Theorem Proving

Neural theorem proving, which combines deep learning with formal verification, has emerged as a productive area of research [16]. Its modern era arguably began with Polu and Sutskever’s work [41] applying transformer-based [39] language models to automated theorem proving. By formulating tactic prediction as language modeling, this work inspired much follow-on research [29–36].

More recently, the work of DeepSeek-Prover [42] demonstrated that language models, when fine-tuned on mathematical corpora, are capable of generating Lean 4 proofs with significant success rates. This motivated follow-on work, DeepSeek-Prover-V1.5 [38] and DeepSeek-Prover-V2 [17]. Subsequent to DeepSeek-Prover, AlphaProof [43] achieved silver-medal performance on the International Mathematical Olympiad by employing reinforcement learning with formal verification. Even more recently there has been a flood of papers from Apple [4], ByteDance [5–7], Harmonic [8], Kimina [18], and many others.

Progress in this domain has been driven by benchmarks including miniF2F [44], MathLib-Bench [45], and PutnamBench [46], which provide standardized evaluation on undergraduate mathematics, Mathlib [47] theorems, and Putnam competition problems.

2.2 Goedel-Prover-V2

The work of Goedel-Prover-V2 [2] introduced three key innovations: (1) scaffolded data synthesis,

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
