InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
We introduce InternAgent-1.5, a unified system designed for end-to-end scientific discovery across computational and empirical domains. The system is built on a structured architecture composed of three coordinated subsystems for generation, verification, and evolution. These subsystems are supported by foundational capabilities for deep research, solution optimization, and long horizon memory. The architecture allows InternAgent-1.5 to operate continuously across extended discovery cycles while maintaining coherent and improving behavior. It also enables the system to coordinate computational modeling and laboratory experimentation within a single unified system. We evaluate InternAgent-1.5 on scientific reasoning benchmarks such as GAIA, HLE, GPQA, and FrontierScience, and the system achieves leading performance that demonstrates strong foundational capabilities. Beyond these benchmarks, we further assess two categories of discovery tasks. In algorithm discovery tasks, InternAgent-1.5 autonomously designs competitive methods for core machine learning problems. In empirical discovery tasks, it executes complete computational or wet lab experiments and produces scientific findings in earth, life, biological, and physical domains. Overall, these results show that InternAgent-1.5 provides a general and scalable framework for autonomous scientific discovery.
💡 Research Summary
InternAgent‑1.5 is presented as a unified, agentic framework that aims to close the gap between current AI‑scientist systems and the requirements of long‑horizon, cross‑disciplinary scientific discovery. The authors identify four major shortcomings of existing approaches: (1) domain‑specific architectures that hinder unified reasoning, (2) partial coverage of fundamental capabilities such as dry‑lab and wet‑lab integration, (3) linear, trajectory‑local optimization pipelines that fail to leverage information across many search iterations, and (4) limited long‑term memory, which prevents sustained self‑improvement.
To address these issues, InternAgent‑1.5 is built around three coordinated subsystems—Generation, Verification, and Evolution—each powered by a dedicated foundational capability: Deep Research, Solution Refinement, and Long‑Horizon Memory. The Generation subsystem performs large‑scale literature retrieval, cross‑disciplinary knowledge graph construction, and hypothesis formulation. The Verification subsystem evaluates hypotheses through computational simulations, algorithmic benchmarking, or actual wet‑lab execution, employing a multi‑round parallel optimization loop. The Evolution subsystem ingests verification results, updates procedural, episodic, and semantic memory stores, and produces refined priors that guide the next generation cycle.
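The generate-verify-evolve cycle described above can be illustrated with a minimal sketch. Everything named here (`Memory`, `discovery_loop`, `toy_generate`, `toy_verify`) is a hypothetical stand-in chosen for illustration, not the paper's implementation: generation is reduced to proposing numbers near the best result so far, verification to a scoring function, and evolution to appending results to a shared memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Memory:
    """Hypothetical stand-in for the long-horizon memory store."""
    records: List[dict] = field(default_factory=list)

    def best_score(self) -> float:
        # Best verification score seen across all cycles so far.
        return max((r["score"] for r in self.records), default=float("-inf"))


def discovery_loop(
    generate: Callable[[Memory, int], List[float]],
    verify: Callable[[float], float],
    n_cycles: int,
    candidates_per_cycle: int = 4,
) -> Memory:
    """Run a toy generate -> verify -> evolve loop for n_cycles."""
    memory = Memory()
    for cycle in range(n_cycles):
        # Generation: propose candidates conditioned on what memory holds.
        hypotheses = generate(memory, candidates_per_cycle)
        # Verification: score every candidate produced in this cycle.
        results = [
            {"cycle": cycle, "hypothesis": h, "score": verify(h)}
            for h in hypotheses
        ]
        # Evolution: persist the results so the next cycle can build on them.
        memory.records.extend(results)
    return memory


def toy_generate(memory: Memory, k: int) -> List[float]:
    """Assumed generator: propose k values around the best hypothesis so far."""
    if memory.records:
        center = max(memory.records, key=lambda r: r["score"])["hypothesis"]
    else:
        center = 0.0
    return [center + 0.5 * (i - k // 2) for i in range(k)]


def toy_verify(x: float) -> float:
    """Assumed verifier: higher is better, with the optimum at x = 3."""
    return -(x - 3.0) ** 2
```

Calling `discovery_loop(toy_generate, toy_verify, n_cycles=6)` shows the point of the evolution step: because each cycle reads the best earlier result from memory, the search keeps moving toward the optimum instead of restarting from scratch.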
Key technical components include:
- Cross‑Disciplinary Knowledge Graph (CDKG): A unified graph that ingests papers, patents, datasets, and domain‑specific tool outputs, encoding entities, relations, and provenance across biology, chemistry, earth science, and physics. The CDKG supports flow‑graph reasoning and graph‑guided output synthesis, allowing the system to trace causal chains that span multiple disciplines.
- Graph‑Augmented Solution Refinement: A generative design module that leverages reinforcement‑learning‑based meta‑optimizers to explore experimental parameter spaces. It runs parallel evaluations of alternative protocols, automatically selects promising candidates, and records structured evidence (numeric results, images, metadata).
- Structured Cognitive Memory: Three memory layers: Strategy‑Procedural Memory (captures algorithmic templates and experimental protocols), Task‑Episodic Memory (stores individual trial traces and outcomes), and Semantic‑Knowledge Memory (maintains domain concepts and cross‑disciplinary relations). This hierarchy enables persistent context across hundreds of discovery cycles, mitigating the “forgetting” problem of prior agents.
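As a rough illustration of the three memory layers, the sketch below keeps each layer in a separate container. The class name `StructuredMemory`, its methods, and the promotion rule (a protocol moves from episodic to procedural memory after a set number of successful trials) are assumptions made for this example; the summary does not specify how the layers interact.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class StructuredMemory:
    """Hypothetical three-layer memory; not the paper's data model."""
    # Strategy-procedural layer: reusable protocol name -> ordered steps.
    procedural: Dict[str, List[str]] = field(default_factory=dict)
    # Task-episodic layer: one record per individual trial.
    episodic: List[dict] = field(default_factory=list)
    # Semantic-knowledge layer: (subject, relation, object) triples.
    semantic: Set[Tuple[str, str, str]] = field(default_factory=set)

    def log_trial(self, task: str, protocol: str, success: bool) -> None:
        """Store the trace of a single trial in episodic memory."""
        self.episodic.append(
            {"task": task, "protocol": protocol, "success": success}
        )

    def promote_protocol(
        self, protocol: str, steps: List[str], min_successes: int = 2
    ) -> bool:
        """Assumed rule: keep a protocol as a reusable template only after
        it has succeeded in at least min_successes recorded trials."""
        successes = sum(
            1 for t in self.episodic
            if t["protocol"] == protocol and t["success"]
        )
        if successes >= min_successes:
            self.procedural[protocol] = list(steps)
            return True
        return False

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        """Record a domain relation in semantic memory."""
        self.semantic.add((subject, relation, obj))

    def related(self, subject: str) -> List[str]:
        """Return all objects linked to subject, sorted for stable output."""
        return sorted(o for s, _, o in self.semantic if s == subject)
```

Keeping the layers separate means raw trial traces can grow or be pruned without touching the distilled templates and domain facts that later cycles rely on.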
The system is evaluated on four scientific reasoning benchmarks: GAIA (a benchmark for general AI assistants), HLE‑full (Humanity's Last Exam), GPQA (graduate‑level, expert‑written science QA), and FrontierScience (expert‑level scientific reasoning problems). InternAgent‑1.5 achieves top‑tier scores on all metrics, outperforming prior agents by 8–15% in accuracy and demonstrating faster convergence in multi‑round reasoning tasks.
Beyond benchmarks, the authors test two open‑ended discovery domains. In algorithm discovery, InternAgent‑1.5 autonomously designs reinforcement‑learning policies, test‑time adaptation methods, and meta‑learning frameworks that match or exceed hand‑crafted baselines on standard RL suites. In empirical discovery, the agent orchestrates end‑to‑end workflows that include data acquisition, wet‑lab protocol generation, robotic execution, and result interpretation. Notable case studies include:
- Life Sciences: Identification of a novel protein‑ligand binding mechanism, validated through in‑silico docking followed by automated wet‑lab binding assays.
- Earth Sciences: Development of a predictive model for microplastic transport in ocean currents, integrating satellite data, fluid dynamics simulations, and field sampling.
- Physical Sciences: Design of a quantum entanglement experiment, including pulse sequence generation and hardware configuration, leading to a reproducible violation of Bell’s inequality.
A comparative table (Table 1) shows that InternAgent‑1.5 is the only system that satisfies every listed desideratum (algorithm discovery, empirical discovery, deep research, solution refinement, wet‑lab persistence), whereas prior systems each miss at least one.
The paper also discusses limitations: heavy reliance on high‑performance compute and specialized lab equipment, lack of standardized interfaces for domain‑specific toolkits, and growing memory overhead as the number of discovery cycles increases. Future work is outlined to develop memory compression techniques, standardized API layers for robotic labs, and tighter human‑in‑the‑loop collaboration mechanisms.
In summary, InternAgent‑1.5 demonstrates that a principled “generate‑verify‑evolve” loop, underpinned by a cross‑disciplinary knowledge graph and multi‑layer cognitive memory, can achieve sustained autonomous scientific discovery across both computational and physical domains. The work sets a new benchmark for AI‑for‑Science systems and provides a concrete blueprint for building next‑generation autonomous research agents.