Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv paper.

While existing multi-agent systems (MAS) can handle complex problems by enabling collaboration among multiple agents, they are often highly task-specific, relying on manually crafted agent roles and interaction prompts, which leads to increased architectural complexity and limited reusability across tasks. Moreover, most MAS communicate primarily through natural language, making them vulnerable to error accumulation and instability in long-context, multi-stage interactions within internal agent histories. In this work, we propose **Agent Primitives**, a set of reusable latent building blocks for LLM-based MAS. Inspired by neural network design, where complex models are built from reusable components, we observe that many existing MAS architectures can be decomposed into a small number of recurring internal computation patterns. Based on this observation, we instantiate three primitives: Review, Voting and Selection, and Planning and Execution. All primitives communicate internally via the key-value (KV) cache, which improves both robustness and efficiency by mitigating information degradation across multi-stage interactions. To enable automatic system construction, an Organizer agent selects and composes primitives for each query, guided by a lightweight knowledge pool of previously successful configurations, forming a primitive-based MAS. Experiments show that primitive-based MAS improve average accuracy by 12.0–16.5% over single-agent baselines and reduce token usage and inference latency by approximately 3×–4× compared to text-based MAS, while incurring only 1.3×–1.6× overhead relative to single-agent inference and providing more stable performance across model backbones.


💡 Research Summary

The paper addresses two major shortcomings of current large‑language‑model (LLM) based multi‑agent systems (MAS): (1) the heavy reliance on manually crafted agent roles and natural‑language prompts, which makes each system highly task‑specific and difficult to reuse, and (2) the instability of natural‑language communication as interaction histories grow, leading to information degradation and error accumulation. Inspired by the modular design of neural networks, the authors propose Agent Primitives, a set of reusable latent building blocks that capture recurring internal computation patterns in MAS. Three primitives are instantiated: (i) Review, which aggregates and validates prior outputs; (ii) Voting and Selection, which collects multiple agents’ suggestions and picks the best; and (iii) Planning and Execution, which decomposes a problem into steps and carries them out.
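The three primitives can be pictured as interchangeable modules behind one agent-like interface. The sketch below is illustrative only: the class names and the string stand-in for the latent state are our assumptions, whereas the paper's primitives operate on KV caches inside the model, not on text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LatentState:
    """Stand-in for the latent state passed between primitives.

    In the paper this is the Transformer KV cache; a plain string is
    used here purely for illustration.
    """
    content: str


class Primitive:
    """Common interface: every primitive maps a latent state to a latent state."""

    def run(self, state: LatentState) -> LatentState:
        raise NotImplementedError


class Review(Primitive):
    """Aggregates and validates prior outputs (toy version)."""

    def run(self, state: LatentState) -> LatentState:
        return LatentState(content=state.content + " | reviewed")


class VotingAndSelection(Primitive):
    """Collects multiple candidate outputs and picks one (toy selection rule)."""

    def __init__(self, n_candidates: int = 3):
        self.n_candidates = n_candidates

    def run(self, state: LatentState) -> LatentState:
        candidates = [f"{state.content} (candidate {i})" for i in range(self.n_candidates)]
        return LatentState(content=min(candidates, key=len))


class PlanningAndExecution(Primitive):
    """Decomposes the problem into steps and carries them out (toy plan)."""

    def run(self, state: LatentState) -> LatentState:
        out = state.content
        for step in ["step 1", "step 2"]:
            out += f" -> {step}"
        return LatentState(content=out)


def compose(primitives: List[Primitive], query: str) -> LatentState:
    """Chain primitives: each consumes the previous one's latent state."""
    state = LatentState(content=query)
    for p in primitives:
        state = p.run(state)
    return state
```

Because every primitive exposes the same `run` interface, any sequence of them composes without task-specific glue code, which is the reusability argument the paper makes.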

All primitives expose the same external interface as a standard LLM agent, enabling plug‑and‑play composition. Crucially, internal communication between primitives does not use text but the key‑value (KV) cache of the Transformer decoder. By concatenating KV caches and re‑indexing positions using Rotary Positional Encoding (RoPE), a downstream primitive can attend directly to the latent representations produced by an upstream primitive, avoiding repeated token decoding. This latent communication yields three practical benefits: (a) dramatic reduction in token usage and inference latency, (b) robustness against long‑context dilution and injected noise, and (c) preservation of the original semantic content without the brittleness of natural‑language parsing.
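The concatenation-and-reindexing step can be sketched in NumPy. Because RoPE applies a 2D rotation whose angle grows linearly with position, rotations compose additively: re-rotating a cached key by an offset Δ shifts its effective position from p to p + Δ. The helper names below are ours, and a real implementation would operate on per-head tensors inside the model.

```python
import numpy as np


def rope_rotate(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply a rotary positional rotation to x (shape: seq x dim, dim even).

    Each feature pair (x1[i], x2[i]) is rotated by angle position * freq[i],
    the standard RoPE construction.
    """
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)       # (half,)
    angles = positions[:, None] * freqs[None, :]    # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


def concat_kv_with_reindex(key_segments: list) -> np.ndarray:
    """Concatenate cached key segments from several primitives.

    Each segment was cached with local positions starting at 0, so segment i
    gets an extra rotation by the running offset, making its effective
    positions continue where segment i-1 ended.
    """
    out, offset = [], 0
    for k in key_segments:
        seq_len = k.shape[0]
        out.append(rope_rotate(k, np.full(seq_len, float(offset))))
        offset += seq_len
    return np.concatenate(out, axis=0)
```

A downstream primitive can then attend over the re-indexed cache directly, which is what lets it reuse upstream latent representations without re-decoding them into tokens.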

To automate system construction, an Organizer agent selects and assembles an appropriate sequence of primitives for each incoming query. The Organizer is guided by a lightweight knowledge pool that stores previously successful query‑to‑configuration mappings, allowing the system to adapt without human redesign.
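One way the Organizer's knowledge-pool lookup could work is sketched below. The paper does not specify the retrieval mechanism; the Jaccard word-overlap similarity, the class names, and the default configuration here are all our assumptions.

```python
from typing import List, Optional


class KnowledgePool:
    """Hypothetical store of previously successful query -> configuration mappings."""

    def __init__(self):
        self.records = []  # list of (query word set, primitive-name sequence)

    def add(self, query: str, config: List[str]) -> None:
        self.records.append((set(query.lower().split()), config))

    def retrieve(self, query: str) -> Optional[List[str]]:
        """Return the config of the most similar stored query, if any overlaps."""
        tokens = set(query.lower().split())
        best, best_score = None, 0.0
        for rec_tokens, config in self.records:
            # Jaccard similarity over words -- an assumed, toy metric.
            score = len(tokens & rec_tokens) / max(len(tokens | rec_tokens), 1)
            if score > best_score:
                best, best_score = config, score
        return best


class Organizer:
    """Selects a primitive sequence per query, guided by the knowledge pool."""

    def __init__(self, pool: KnowledgePool,
                 default: List[str] = ("PlanningAndExecution", "Review")):
        self.pool = pool
        self.default = list(default)

    def organize(self, query: str) -> List[str]:
        hit = self.pool.retrieve(query)
        return hit if hit is not None else self.default
```

Under this reading, each solved query enriches the pool, so the system adapts its primitive compositions over time without any human redesign.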

Empirical evaluation spans eight benchmarks covering mathematical reasoning (GSM8K), code generation, and question answering, using five open‑source LLM backbones (e.g., Qwen‑3‑8B, LLaMA‑2). Compared with a single‑agent baseline, primitive‑based MAS improve average accuracy by 12.0–16.5%. Relative to traditional text‑based MAS, they cut token consumption and latency by roughly 3×–4×, while incurring only 1.3×–1.6× overhead compared to single‑agent inference. Stress‑test experiments on long‑context task injection and communication noise demonstrate that KV‑cache communication maintains high accuracy and injection compliance where natural‑language communication collapses. Performance remains stable across different model sizes, confirming the approach's scalability.

In summary, the work introduces a novel abstraction layer for multi‑agent collaboration—Agent Primitives—that transforms MAS design from ad‑hoc, role‑centric engineering into a modular, reusable architecture. By leveraging KV‑cache as a latent communication channel, the system achieves both efficiency and robustness, paving the way for more adaptable and scalable LLM‑driven multi‑agent applications.

