KAPSO: A Knowledge-grounded framework for Autonomous Program Synthesis and Optimization
We introduce KAPSO, a modular framework for autonomous program synthesis and optimization. Given a natural language goal and an evaluation method, KAPSO iteratively performs ideation, code synthesis and editing, execution, evaluation, and learning to improve a runnable artifact toward measurable objectives. Rather than treating synthesis as the endpoint, KAPSO uses synthesis as an operator within a long-horizon optimization loop, where progress is defined by evaluator outcomes. KAPSO targets long-horizon failures common in coding agents, including lost experimental state, brittle debugging, and weak reuse of domain expertise, by integrating three tightly coupled components. First, a git-native experimentation engine isolates each attempt as a branch, producing reproducible artifacts and preserving provenance across iterations. Second, a knowledge system ingests heterogeneous sources, including repositories, internal playbooks, and curated external resources such as documentation, scientific papers, and web search results, and organizes them into a structured representation that supports retrieval over workflows, implementations, and environment constraints. Third, a cognitive memory layer coordinates retrieval and maintains an episodic store of reusable lessons distilled from experiment traces (run logs, diffs, and evaluator feedback), reducing repeated error modes and accelerating convergence. We evaluate KAPSO on MLE-Bench (Kaggle-style ML competitions) and ALE-Bench (AtCoder heuristic optimization) and report end-to-end performance. Code is available at: https://github.com/Leeroo-AI/kapso
💡 Research Summary
KAPSO (Knowledge‑grounded Autonomous Program Synthesis and Optimization) is a modular framework that treats program synthesis not as an end‑point but as an operator within a long‑horizon, evaluator‑driven optimization loop. Given a natural‑language goal and an evaluation contract, KAPSO repeatedly performs ideation, code synthesis/editing, execution, evaluation, and learning to iteratively improve a runnable artifact toward measurable objectives such as accuracy, robustness, efficiency, or user‑defined preferences.
The system tackles three chronic failure modes of existing LLM‑based coding agents: loss of experimental state across iterations, brittle debugging caused by repeated integration errors, and weak reuse of domain expertise stored in repositories, documentation, or internal playbooks. KAPSO does this through three tightly coupled components. First, a git‑native experimentation engine isolates each attempt on its own branch, persisting code changes, logs, and evaluation outputs as reproducible artifacts with explicit provenance. This branch‑level isolation guarantees that any prior state can be recovered, compared, or rolled back, enabling systematic root‑cause analysis.
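The branch-per-attempt discipline described above can be sketched with plain git commands driven from Python. This is a minimal illustration, not KAPSO's actual implementation; the class and method names (`ExperimentEngine`, `start_attempt`, `record_artifacts`) are hypothetical, chosen only to mirror the behavior the text describes: each attempt lives on its own branch, and code changes plus evaluation outputs are committed so any prior state can be recovered or compared.

```python
import subprocess

def run_git(*args, cwd="."):
    """Run a git command in the given repository and return its stdout."""
    return subprocess.run(
        ["git", *args], cwd=cwd, check=True,
        capture_output=True, text=True,
    ).stdout.strip()

class ExperimentEngine:
    """Sketch of branch-level isolation: one git branch per attempt."""

    def __init__(self, repo_dir, base_branch="main"):
        self.repo_dir = repo_dir
        self.base_branch = base_branch

    def start_attempt(self, attempt_id):
        """Create and check out a fresh branch off the base branch."""
        branch = f"experiment/{attempt_id}"
        run_git("checkout", self.base_branch, cwd=self.repo_dir)
        run_git("checkout", "-b", branch, cwd=self.repo_dir)
        return branch

    def record_artifacts(self, message):
        """Commit code changes, logs, and evaluator outputs as provenance."""
        run_git("add", "-A", cwd=self.repo_dir)
        run_git("commit", "--allow-empty", "-m", message, cwd=self.repo_dir)
        # The commit hash identifies this attempt's exact state for later
        # rollback or root-cause comparison.
        return run_git("rev-parse", "HEAD", cwd=self.repo_dir)
```

Because every attempt is an ordinary git branch, comparison and rollback reduce to standard operations such as `git diff` and `git checkout`.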
Second, a knowledge system ingests heterogeneous sources—public and private code repositories, internal playbooks, scientific papers, documentation, and web‑derived material. The raw material is normalized into a MediaWiki‑hosted knowledge base, indexed both as a typed graph (Neo4j) and as dense vector embeddings (Weaviate). Nodes are typed as Principle, Implementation, Environment, or Heuristic, and edges capture relationships such as IMPLEMENTED_BY, USES_HEURISTIC, and REQUIRES_ENV. This structured representation supports retrieval of workflow‑level ideas, concrete implementations, and environment constraints conditioned on the current goal, optional seed repository, and any failure signal from the latest experiment.
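The typed-graph schema above can be illustrated with a small in-memory stand-in. This sketch deliberately avoids the actual Neo4j/Weaviate back ends; only the node and edge type names come from the text, while the `KnowledgeGraph` class and the example "early stopping" entry are hypothetical.

```python
from dataclasses import dataclass, field

# Node and edge types named in the KAPSO schema.
NODE_TYPES = {"Principle", "Implementation", "Environment", "Heuristic"}
EDGE_TYPES = {"IMPLEMENTED_BY", "USES_HEURISTIC", "REQUIRES_ENV"}

@dataclass
class KnowledgeGraph:
    """Minimal in-memory stand-in for the typed Neo4j graph."""
    nodes: dict = field(default_factory=dict)   # name -> node type
    edges: list = field(default_factory=list)   # (src, edge_type, dst)

    def add_node(self, name, node_type):
        assert node_type in NODE_TYPES, f"unknown node type {node_type!r}"
        self.nodes[name] = node_type

    def add_edge(self, src, edge_type, dst):
        assert edge_type in EDGE_TYPES, f"unknown edge type {edge_type!r}"
        self.edges.append((src, edge_type, dst))

    def neighbors(self, name, edge_type):
        """Follow one typed edge, e.g. Principle -IMPLEMENTED_BY-> Implementation."""
        return [d for s, e, d in self.edges if s == name and e == edge_type]

# Hypothetical example entry: a workflow-level idea linked to concrete code.
kg = KnowledgeGraph()
kg.add_node("early stopping", "Principle")
kg.add_node("keras.callbacks.EarlyStopping", "Implementation")
kg.add_edge("early stopping", "IMPLEMENTED_BY", "keras.callbacks.EarlyStopping")
```

In the real system, a retrieval such as `neighbors(...)` would be a Cypher query over Neo4j, combined with dense-vector lookups in Weaviate to find candidate nodes in the first place.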
Third, a cognitive memory layer maintains an episodic store of “lessons learned” distilled from experiment traces (run logs, diffs, evaluator feedback). After each failure, an Error‑Recovery Augmentation (ERA) step enriches the knowledge packet with failure‑conditioned heuristics, diagnostics, and alternative implementations. This episodic memory reduces repeated error patterns and accelerates convergence by feeding back concrete corrective suggestions into the next search iteration.
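A minimal sketch of such an episodic store might key lessons on a coarse failure signature extracted from the run log, so that the next iteration facing a similar failure retrieves concrete corrective suggestions. The class, the signature heuristic (matching an exception name), and the example lesson below are all illustrative assumptions, not KAPSO's actual implementation.

```python
import re
from collections import defaultdict

class EpisodicMemory:
    """Sketch: stores 'lessons learned' keyed by a coarse failure signature."""

    def __init__(self):
        self.lessons = defaultdict(list)

    @staticmethod
    def signature(log_tail):
        """Derive a coarse signature, e.g. the exception class in a traceback."""
        match = re.search(r"(\w+Error|\w+Exception)", log_tail)
        return match.group(1) if match else "UnknownFailure"

    def record(self, log_tail, lesson):
        """Distill a lesson from a failed experiment trace."""
        self.lessons[self.signature(log_tail)].append(lesson)

    def recall(self, log_tail):
        """Return corrective suggestions for the next search iteration."""
        return list(self.lessons.get(self.signature(log_tail), []))
```

An ERA step would sit on top of `recall`, merging the retrieved lessons with failure-conditioned heuristics and alternative implementations into the knowledge packet for the next attempt.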
The core operation, evolve, is orchestrated by an OrchestratorAgent that composes four pluggable subsystems: SearchStrategy (e.g., linear or tree‑based search over candidate solution specifications), ContextManager (assembles the prompt context from goal, constraints, retrieved knowledge, episodic insights, and experiment history), KnowledgeSearch (executes the retrieval pipeline described above), and CodingAgent (applies code edits, generates new code, or performs debugging). The loop proceeds as follows: (1) construct context, (2) propose candidate specifications, (3) retrieve relevant knowledge, (4) synthesize or edit code, (5) execute the artifact under the evaluator contract, (6) collect K stochastic rollouts, (7) aggregate measurements via Agg_R and utility estimates via Agg_J (typically sample mean), and (8) select the next candidate based on a scalar utility mapping U or a preference relation ≻.
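The selection half of the loop (steps 5-8) can be sketched as follows. Steps 1-4 (context construction, candidate proposal, knowledge retrieval, code synthesis) are abstracted behind the `candidates` list and the `execute` callable, and the function signature is an illustrative assumption; only the K-rollout structure and the Agg_J aggregation over a scalar utility come from the text.

```python
import statistics

def evolve_step(candidates, execute, K=5, agg_J=statistics.mean):
    """One selection step of the evolve loop: run K stochastic rollouts per
    candidate specification (steps 5-6), aggregate utility estimates via
    agg_J -- the sample mean by default (step 7) -- and pick the candidate
    with the highest scalar utility U (step 8)."""
    best, best_utility = None, float("-inf")
    for spec in candidates:
        rollouts = [execute(spec) for _ in range(K)]  # K evaluator rollouts
        utility = agg_J(rollouts)                     # Agg_J over rollouts
        if utility > best_utility:
            best, best_utility = spec, utility
    return best, best_utility
```

A preference relation ≻ instead of a scalar U would replace the `>` comparison with a pairwise preference check over aggregated measurements.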
KAPSO’s evaluation uses two complementary benchmarks. MLE‑Bench consists of Kaggle‑style machine‑learning competitions where the evaluator measures validation accuracy, model size, and inference latency. ALE‑Bench is derived from AtCoder heuristic optimization problems where the evaluator aggregates solution quality over multiple stochastic runs. Across both benchmarks, KAPSO outperforms baseline LLM coding agents in final objective value and in the number of iterations required to reach a given performance threshold. Ablation studies show that removing the git‑native experiment engine dramatically harms reproducibility, while disabling the episodic memory leads to repeated failure patterns and slower convergence.
Deployment is handled by the deploy operation, which packages the final solution into a Python software handle exposing a unified run() interface. The handle can be executed locally, via Docker, or on serverless platforms such as Modal or BentoCloud, with automatic adaptation of repository structure and runtime dependencies.
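The handle abstraction can be sketched as a thin wrapper that dispatches run() to a backend. This is a hypothetical shape, assuming a callable entrypoint packaged from the final solution; the actual Docker, Modal, and BentoCloud dispatch logic is omitted.

```python
class SoftwareHandle:
    """Sketch of a deployable handle with a unified run() interface."""

    def __init__(self, entrypoint, backend="local"):
        self.entrypoint = entrypoint  # callable packaged from the solution
        self.backend = backend        # "local", "docker", "modal", ...

    def run(self, *args, **kwargs):
        if self.backend == "local":
            return self.entrypoint(*args, **kwargs)
        # Remote backends would build the container / serverless function
        # and forward the call; omitted in this sketch.
        raise NotImplementedError(f"backend {self.backend!r} not shown here")

# Toy usage: the same run() call regardless of where the artifact executes.
handle = SoftwareHandle(lambda text: text.upper())
```

The value of the single run() contract is that callers are insulated from where and how the packaged artifact actually executes.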
In summary, KAPSO demonstrates that a tightly integrated stack of version‑controlled experimentation, structured multi‑source knowledge, and episodic learning can transform autonomous program synthesis from a one‑shot generation task into a robust, iterative optimization process. The framework’s modularity—pluggable evaluators, knowledge back‑ends, and coding agents—makes it applicable across domains where progress is defined by measurable execution outcomes. Future work may explore multi‑objective optimization, richer human‑in‑the‑loop feedback, and scaling the knowledge base to the full open‑source ecosystem, further narrowing the gap between automated agents and professional software engineering workflows.