Z-SPACE: A MULTI-AGENT TOOL ORCHESTRATION FRAMEWORK FOR ENTERPRISE-GRADE LLM AUTOMATION
Qingsong He
Rajax Network Technology (ele.me)
Jing Nan
Rajax Network Technology (ele.me)
Jiayu Jiao
Rajax Network Technology (ele.me)
Liangjie Tang
Rajax Network Technology (ele.me)
Xiaodong Xu
Rajax Network Technology (ele.me)
Mengmeng Sun
Rajax Network Technology (ele.me)
Qingyao Wang
Rajax Network Technology (ele.me)
Minghui Yan
Rajax Network Technology (ele.me)
yinan.nj@alibaba-inc.com
November 26, 2025
ABSTRACT
Large Language Models can break through knowledge and timeliness limitations by invoking external
tools within the Model Context Protocol framework to achieve automated execution of complex
tasks. However, with the rapid growth of enterprise-scale MCP services, efficiently and accurately
matching target functionalities among thousands of heterogeneous tools has become a core challenge
restricting system practicality. Existing approaches generally rely on full-prompt injection or static
semantic retrieval, facing issues including semantic disconnection between user queries and tool
descriptions, context inflation in LLM input, and high inference latency. To address these challenges,
this paper proposes Z-Space, a data-generation-oriented multi-agent collaborative tool invocation
framework. Z-Space combines a multi-agent collaborative architecture with a tool filtering algorithm:
(1) an intent parsing model produces a structured semantic understanding of user queries; (2) a tool
filtering module (FSWW) based on a fused subspace weighting algorithm achieves fine-grained semantic
alignment between intents and tools without parameter tuning; (3) an inference execution agent
supports dynamic planning and fault-tolerant
execution for multi-step tasks. This framework has been deployed in the Eleme platform’s technical
division, serving large-scale test data generation scenarios across multiple business units including
Taotian, Gaode, and Hema. Production data demonstrates that the system reduces average token
consumption in tool inference by 96.26% while achieving a 92% tool invocation accuracy rate,
significantly enhancing the efficiency and reliability of intelligent test data generation systems.
1 Introduction
In recent years, Large Language Models have achieved groundbreaking advancements in natural language understanding
and generation, demonstrating formidable capabilities in general reasoning and dialogue [1]. However, their capabilities
are fundamentally constrained by the static snapshot of training data and limited context window, making it challenging
to directly intervene in real-world state transitions or access private, dynamic data sources [2]. To overcome these
limitations, technologies such as Function Calling and Model Context Protocol have emerged, endowing LLMs with
the ability to invoke external tools, execute program code, and manipulate physical or digital environments [3]. This
marks a paradigm shift in artificial intelligence systems—from "passive responders" toward "active executors."
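To fix the pattern being discussed, the sketch below shows a function-calling loop in its simplest form: the model is shown a schema of available tools, may request a call, and the runtime executes it and feeds the result back. The tool names and the `call_llm`/`execute_tool` stubs are illustrative assumptions for this sketch, not part of the MCP specification or of Z-Space.
```python
# Minimal sketch of a function-calling loop. The model is shown tool schemas,
# may request a tool call, and the runtime executes it and returns the result.
# `call_llm` and `execute_tool` are stubs standing in for a real model API and
# real backing services; the tool names are illustrative.
import json

TOOL_SCHEMAS = {
    "get_user_info": {"description": "Look up a user profile by user_id.",
                      "parameters": {"user_id": "string"}},
    "issue_coupon": {"description": "Issue a coupon of a given amount to a user.",
                     "parameters": {"user_id": "string", "amount": "number"}},
}

def call_llm(messages: list[dict], tools: dict) -> dict:
    """Stand-in for a chat-completion call with tool support. A real call would
    return either {'content': ...} or {'tool_call': {'name': ..., 'arguments': ...}}."""
    raise NotImplementedError

def execute_tool(name: str, arguments: dict) -> dict:
    """Stand-in for dispatching the call to the backing MCP service."""
    raise NotImplementedError

def run(query: str) -> str:
    messages = [{"role": "user", "content": query}]
    while True:
        reply = call_llm(messages, TOOL_SCHEMAS)
        if "tool_call" not in reply:                 # model answered directly
            return reply["content"]
        call = reply["tool_call"]
        result = execute_tool(call["name"], call["arguments"])
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})
```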
Despite these advancements, existing tool integration paradigms face significant challenges. The prevailing
approach adopts an "all-injection" strategy, in which detailed descriptions (name, parameters, functional specifications) of
all available tools are concatenated into the prompt to support LLM decision-making [4]. While conceptually simple,
this method reveals fundamental flaws in practical applications: As the tool repository scales, prompt length expands
linearly. This not only incurs substantial computational costs and latency but also drowns critical semantic cues in
massive redundant information, dispersing model attention and significantly degrading decision accuracy—effectively
regressing an advanced cognitive agent into an inefficient keyword matching engine.
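To make the context-inflation argument concrete, the following sketch assembles an all-injection prompt over a synthetic tool repository. The tool records and the whitespace-based token estimate are simplifying assumptions; the point illustrated is that prompt size grows linearly with the size of the tool repository.
```python
# Illustrative sketch of the "all-injection" strategy: every tool's metadata is
# serialized into the prompt, so context length grows linearly with repository
# size. The synthetic tool records and the whitespace-based token estimate are
# simplifying assumptions.
import json

def build_all_injection_prompt(query: str, tool_repository: list[dict]) -> str:
    tool_block = "\n".join(
        json.dumps(tool, ensure_ascii=False) for tool in tool_repository
    )
    return (
        "You may call any of the following tools:\n"
        f"{tool_block}\n\n"
        f"User request: {query}\n"
        "Decide which tool(s) to call and with what arguments."
    )

def approx_tokens(text: str) -> int:
    # Crude whitespace proxy; real tokenizers differ, but the linear trend holds.
    return len(text.split())

if __name__ == "__main__":
    repo = [
        {"name": f"tool_{i}", "description": "does something " * 20,
         "parameters": {"arg": "string"}}
        for i in range(2000)  # thousands of heterogeneous MCP tools
    ]
    prompt = build_all_injection_prompt("Issue a coupon to the user", repo)
    print(approx_tokens(prompt))  # grows roughly linearly with len(repo)
```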
To alleviate contextual pressure, recent studies have explored Retrieval-Augmented Generation based tool selection
methods [5]. These approaches vectorize tool metadata and perform similarity retrieval against user queries at runtime
to pre-screen candidate tools [6]. While such methods partially address scalability issues, they remain trapped in static,
passive retrieval paradigms. Their fatal deficiency lies in the lack of foresight and dynamic adaptation: Systems can
only respond to the literal meaning of initial queries, failing to anticipate emergent tool requirements arising from
intermediate states during task execution chains [7]. For instance, when receiving the instruction "Issue a coupon
to the user," an ideal execution path should involve consecutive actions including user information retrieval, coupon
distribution, and result verification. However, single-stage retrieval based on the initial query often only identifies the
"coupon distribution" module, leaving other required tools unactivated.