Lemon Agent Technical Report
Recent advanced LLM-powered agent systems have exhibited remarkable capabilities in tackling complex, long-horizon tasks. Nevertheless, they still suffer from inherent limitations in resource efficiency, context management, and multimodal perception. Motivated by these observations, we introduce Lemon Agent, a multi-agent orchestrator-worker system built on the newly proposed AgentCortex framework, which formalizes the classic Planner-Executor-Memory paradigm through an adaptive task execution mechanism. Our system integrates a hierarchical self-adaptive scheduling mechanism that operates at both the orchestrator layer and the worker layer, dynamically adjusting computational intensity to task complexity: the orchestrator can allocate one or more workers for parallel subtask execution, while each worker can further improve efficiency by invoking tools concurrently. This two-tier architecture strikes a synergistic balance between global task coordination and local task execution, thereby optimizing resource utilization and processing efficiency in complex scenarios. To reduce context redundancy and increase information density during parallel steps, we adopt a three-tier progressive context management strategy. To make fuller use of historical information, we propose a self-evolving memory system that extracts multi-dimensional valid information from all historical experiences to assist in completing similar tasks. Furthermore, we provide an enhanced MCP toolset. Empirical evaluations on authoritative benchmarks demonstrate that Lemon Agent achieves a state-of-the-art 91.36% overall accuracy on GAIA and secures the top position on the xbench-DeepSearch leaderboard with a score above 77.
💡 Research Summary
The paper introduces Lemon Agent, a novel multi‑agent orchestrator‑worker system built on the newly proposed AgentCortex framework, which formalizes the classic Planner‑Executor‑Memory paradigm for large‑language‑model (LLM) agents. The authors begin by identifying three persistent challenges in current LLM‑powered agents: (1) static resource allocation that leads to over‑provisioning of powerful models for simple tasks, (2) binary‑outcome‑driven memory updates that discard valuable partial successes or informative failures, and (3) limited multimodal perception, especially for high‑resolution visual inputs. To address these, Lemon Agent integrates five core innovations.
First, AgentCortex provides a modular, production-grade infrastructure that decomposes an intelligent agent into well-defined components (intent understanding, task decomposition, planning, tool execution, knowledge retrieval, memory management, summarization) and connects them through abstracted interfaces. This enables rapid algorithmic experimentation while preserving a direct path to enterprise deployment, including a built-in microservice engine, logging, monitoring, and database support.
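The key design point is that every module sits behind the same abstracted interface, so components can be swapped without touching the rest of the pipeline. A minimal Python sketch of this idea follows; all class and method names here are illustrative assumptions, not AgentCortex's actual API:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    """Shared state that flows between components."""
    query: str
    plan: list = field(default_factory=list)
    history: list = field(default_factory=list)

class Component(ABC):
    """Common interface: every module (planner, executor, memory, ...)
    consumes the shared context and returns an updated one."""
    @abstractmethod
    def run(self, ctx: AgentContext) -> AgentContext: ...

class Planner(Component):
    def run(self, ctx: AgentContext) -> AgentContext:
        # Toy decomposition: one subtask per semicolon-separated clause.
        ctx.plan = [c.strip() for c in ctx.query.split(";") if c.strip()]
        return ctx

class Executor(Component):
    def run(self, ctx: AgentContext) -> AgentContext:
        # Toy execution: record each plan step as completed.
        ctx.history = [f"done: {step}" for step in ctx.plan]
        return ctx

def pipeline(components, ctx):
    """Because all components share one interface, swapping a planner
    or executor implementation requires no changes here."""
    for c in components:
        ctx = c.run(ctx)
    return ctx

result = pipeline([Planner(), Executor()],
                  AgentContext("search web; summarize results"))
```

Under this pattern, experimenting with a new planning algorithm means subclassing `Component` once, which is the property the paper credits for fast iteration with a path to deployment.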
Second, a hierarchical self‑adaptive scheduling mechanism operates at two levels. At the macro level, the main orchestrator dynamically decides whether to engage a single sub‑worker or a collaborative ensemble of specialized workers based on the structural independence of subtasks. Simple, monolithic goals trigger a parsimonious configuration; complex goals with orthogonal sub‑goals trigger parallel workers. At the micro level, each worker adjusts the degree of tool parallelism between one and five concurrent calls, guided by the nature of the sub‑task. Information‑intensive tasks (e.g., large‑scale web search, multi‑image analysis) benefit from concurrent tool invocations, while reasoning‑heavy chains with high inter‑step dependencies default to sequential execution to preserve logical coherence.
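The two levels of the scheduling decision can be sketched as follows. The dependency heuristic, the task-kind labels, and the 1-to-5 concurrency bound applied here are assumptions for illustration; the paper does not specify the exact decision rules:

```python
import concurrent.futures

def macro_schedule(subtasks, deps):
    """Orchestrator level: subtasks that depend on no other subtask are
    structurally independent and can each get a parallel worker; a fully
    sequential goal falls back to a single worker.
    `deps` maps a subtask to the subtasks it depends on."""
    independent = [t for t in subtasks if not deps.get(t)]
    return max(1, len(independent))  # number of workers to spawn

def micro_parallelism(task_kind):
    """Worker level: tool-call concurrency between 1 and 5, chosen from
    the nature of the sub-task (heuristic labels assumed here)."""
    if task_kind in ("web_search", "multi_image"):
        return 5   # information-intensive: fan out concurrent tool calls
    return 1       # reasoning-heavy chain: stay sequential for coherence

def run_tools(tool_calls, task_kind):
    """Execute a batch of zero-argument tool calls at the chosen width."""
    width = micro_parallelism(task_kind)
    with concurrent.futures.ThreadPoolExecutor(max_workers=width) as pool:
        return list(pool.map(lambda call: call(), tool_calls))
```

Note that `pool.map` preserves input order, so a worker can fan out tool calls without losing the correspondence between requests and responses.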
Third, Lemon Agent employs a three‑tier progressive context management strategy to mitigate context window overflow and information decay during long‑horizon trajectories. Tier 1 performs intra‑tool truncation when a single tool’s raw output exceeds a character threshold, logging metadata about the truncation point. Tier 2 triggers intra‑round adaptive summarization when the cumulative length of all tool responses in a round surpasses a heuristic limit, reconstructing truncated segments to retain semantic fidelity. Tier 3 initiates cross‑round retroactive compression once the overall historical context approaches capacity; it backtracks to locate previously truncated nodes, applies a secondary summarization, and replaces the raw entries in‑place, thereby shrinking the context footprint while preserving essential logical links. The truncation registry is cleared after each retroactive update to avoid repeated loss.
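The three tiers compose into a simple pipeline over tool outputs, rounds, and the full history. The sketch below uses assumed character thresholds and a trivial slice-based stand-in for the LLM summarizer; the real system's limits and summarization are not specified at this level of detail:

```python
TOOL_CHAR_LIMIT = 2_000    # Tier 1 per-tool threshold (assumed value)
ROUND_CHAR_LIMIT = 6_000   # Tier 2 per-round threshold (assumed value)
TOTAL_CHAR_LIMIT = 20_000  # Tier 3 overall capacity (assumed value)

truncation_registry = []   # metadata logged at each truncation point

def tier1_truncate(tool_output, tool_name):
    """Tier 1: intra-tool truncation when one raw output is too long,
    logging where the cut happened for later reconstruction."""
    if len(tool_output) > TOOL_CHAR_LIMIT:
        truncation_registry.append({"tool": tool_name,
                                    "cut_at": TOOL_CHAR_LIMIT})
        return tool_output[:TOOL_CHAR_LIMIT]
    return tool_output

def summarize(text, ratio=0.25):
    """Stand-in for an LLM summarizer: keep a leading slice."""
    return text[: max(1, int(len(text) * ratio))]

def tier2_round(responses):
    """Tier 2: intra-round adaptive summarization when the round's
    cumulative tool responses exceed the heuristic limit."""
    joined = "".join(responses)
    return summarize(joined) if len(joined) > ROUND_CHAR_LIMIT else joined

def tier3_retroactive(history):
    """Tier 3: cross-round retroactive compression near capacity.
    Previously stored entries are re-summarized in place, and the
    registry is cleared so the same nodes are not compressed again."""
    if sum(len(h) for h in history) > TOTAL_CHAR_LIMIT:
        history = [summarize(h) for h in history]
        truncation_registry.clear()
    return history
```

Clearing the registry after the retroactive pass matters: without it, Tier 3 would repeatedly re-compress the same nodes and progressively destroy the logical links the strategy is meant to preserve.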
Fourth, the Self‑Evolving Semantic Memory (SES‑Memory) departs from binary success/failure memory schemes. It extracts high‑value skill snippets from every execution trajectory, regardless of final outcome. Intermediate steps that contain reusable code fragments, tool usage patterns, or decisive reasoning steps are distilled into independent “skill snippets.” A two‑stage control mechanism first retrieves the top‑k most relevant memories, then filters them by similarity thresholds to exclude noisy or low‑value entries. When a query returns multiple highly similar memories, the system suppresses creation of a new redundant memory, preventing uncontrolled growth. This continual experiential learning enables the agent to improve over time, leveraging both successes and informative failures.
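The two-stage retrieval control and the redundancy-suppression rule can be sketched concretely. Embeddings, the similarity metric, and all threshold values below are assumptions; SES-Memory's actual representation is not described at this granularity:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SkillMemory:
    """Stores skill snippets from every trajectory, successful or not.
    Stage 1 retrieves the top-k most similar memories; stage 2 filters
    them by a similarity threshold to drop noisy, low-value entries."""

    def __init__(self, k=3, min_sim=0.5, dedup_sim=0.95):
        self.entries = []  # list of (embedding, snippet) pairs
        self.k, self.min_sim, self.dedup_sim = k, min_sim, dedup_sim

    def retrieve(self, query_emb):
        scored = sorted(((cosine(query_emb, emb), snip)
                         for emb, snip in self.entries), reverse=True)
        top_k = scored[: self.k]                  # stage 1: top-k
        return [snip for sim, snip in top_k
                if sim >= self.min_sim]           # stage 2: threshold

    def add(self, emb, snippet):
        # Suppress creation when a highly similar memory already exists,
        # preventing uncontrolled growth of the store.
        if any(cosine(emb, e) >= self.dedup_sim for e, _ in self.entries):
            return False
        self.entries.append((emb, snippet))
        return True
```

The `add` path is what distinguishes this from binary-outcome schemes: a snippet from a failed trajectory is stored on the same terms as one from a success, as long as it is not redundant.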
Fifth, the toolset is augmented with specialized modules to enhance multimodal perception and real‑world interaction. The “Intelligent Image Tool” addresses resolution loss in Vision‑Language Models (VLMs) by detecting when localized high‑resolution analysis is needed, extracting normalized bounding‑box coordinates, converting them to absolute pixel values, and re‑processing only the region of interest at full resolution. Additional tools include a multi‑source search engine, a robust file reader, and a street‑view navigation agent, collectively expanding the agent’s capability to handle diverse data modalities.
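The coordinate-conversion step of the Intelligent Image Tool is the mechanically precise part of this pipeline. A minimal sketch follows, with nested lists standing in for real image data and all function names being illustrative assumptions:

```python
def bbox_to_pixels(norm_bbox, width, height):
    """Convert a normalized (x0, y0, x1, y1) box in [0, 1] into
    absolute pixel coordinates for the given image size."""
    x0, y0, x1, y1 = norm_bbox
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))

def crop_region(image, norm_bbox):
    """Extract only the region of interest so it can be re-processed
    at full resolution, instead of downscaling the whole image.
    `image` is a list of pixel rows standing in for real image data."""
    h, w = len(image), len(image[0])
    px0, py0, px1, py1 = bbox_to_pixels(norm_bbox, w, h)
    return [row[px0:px1] for row in image[py0:py1]]
```

The point of cropping before re-processing is that a VLM with a fixed input resolution then spends all of its pixels on the region that actually needs fine-grained analysis.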
Empirical evaluation on two authoritative benchmarks demonstrates the efficacy of the design. On the GAIA benchmark, Lemon Agent achieves a 91.36% overall accuracy, surpassing prior state-of-the-art systems. On the xbench-DeepSearch leaderboard, it attains a score above 77, securing the top position. Ablation studies reveal that the hierarchical scheduling reduces computational cost by roughly 30% without sacrificing accuracy, while the three-tier context compression maintains reasoning coherence over trajectories exceeding typical LLM context windows. The SES-Memory contributes measurable gains in downstream task performance, especially in scenarios where partial successes provide reusable knowledge.
Beyond academic metrics, the authors report deployment of the AgentCortex framework in Lenovo's Super Intelligent Agent, handling transaction volumes in the hundreds of millions and earning recognition as a 2025 CCF Enterprise Digitalization Outstanding Case. The open-source release (https://github.com/Open-LemonAgent/LemonAgent) invites the community to reproduce results and extend the architecture.
In summary, Lemon Agent presents a comprehensive solution to the resource, memory, and perception bottlenecks of contemporary LLM agents. By unifying a modular production‑grade framework, hierarchical adaptive scheduling, progressive context compression, self‑evolving semantic memory, and an enhanced multimodal tool suite, it delivers state‑of‑the‑art performance while remaining scalable and deployable in real‑world settings. The work sets a promising direction for future autonomous AI systems, suggesting that the combination of adaptive resource coordination and continuous experiential learning can bridge the gap between theoretical agent potential and practical, large‑scale deployment.