Z-SPACE: A MULTI-AGENT TOOL ORCHESTRATION FRAMEWORK FOR ENTERPRISE-GRADE LLM AUTOMATION
Qingsong He
Rajax Network Technology (ele.me)
Jing Nan
Rajax Network Technology (ele.me)
Jiayu Jiao
Rajax Network Technology (ele.me)
Liangjie Tang
Rajax Network Technology (ele.me)
Xiaodong Xu
Rajax Network Technology (ele.me)
Mengmeng Sun
Rajax Network Technology (ele.me)
Qingyao Wang
Rajax Network Technology (ele.me)
Minghui Yan
Rajax Network Technology (ele.me)
yinan.nj@alibaba-inc.com
November 26, 2025
ABSTRACT
Large Language Models can break through knowledge and timeliness limitations by invoking external
tools within the Model Context Protocol framework to achieve automated execution of complex
tasks. However, with the rapid growth of enterprise-scale MCP services, efficiently and accurately
matching target functionalities among thousands of heterogeneous tools has become a core challenge
restricting system practicality. Existing approaches generally rely on full-prompt injection or static
semantic retrieval, facing issues including semantic disconnection between user queries and tool
descriptions, context inflation in LLM input, and high inference latency. To address these challenges,
this paper proposes Z-Space, a data-generation-oriented multi-agent collaborative tool invocation
framework. Z-Space combines a multi-agent collaborative architecture with a tool filtering algorithm:
(1) an intent parsing model produces a structured semantic understanding of user queries; (2) a tool
filtering module (FSWW) based on a fused subspace weighting algorithm achieves fine-grained semantic
alignment between intents and tools without parameter tuning; (3) an inference execution agent
supports dynamic planning and fault-tolerant
execution for multi-step tasks. This framework has been deployed in the Eleme platform’s technical
division, serving large-scale test data generation scenarios across multiple business units including
Taotian, Gaode, and Hema. Production data demonstrates that the system reduces average token
consumption in tool inference by 96.26% while achieving a 92% tool invocation accuracy rate,
significantly enhancing the efficiency and reliability of intelligent test data generation systems.
1 Introduction
In recent years, Large Language Models have achieved groundbreaking advancements in natural language understanding
and generation, demonstrating formidable capabilities in general reasoning and dialogue [1]. However, their capabilities
are fundamentally constrained by the static snapshot of training data and limited context window, making it challenging
to directly intervene in real-world state transitions or access private, dynamic data sources [2]. To overcome these
limitations, technologies such as Function Calling and Model Context Protocol have emerged, endowing LLMs with
the ability to invoke external tools, execute program code, and manipulate physical or digital environments [3]. This
marks a paradigm shift in artificial intelligence systems—from "passive responders" toward "active executors."
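To fix the pattern being discussed, the sketch below shows a function-calling loop in its simplest form: the model is shown a schema of available tools, may request a call, and the runtime executes it and feeds the result back. The tool names and the `call_llm`/`execute_tool` stubs are illustrative assumptions for this sketch, not part of the MCP specification or of Z-Space.
```python
# Minimal sketch of a function-calling loop. The model is shown tool schemas,
# may request a tool call, and the runtime executes it and returns the result.
# `call_llm` and `execute_tool` are stubs standing in for a real model API and
# real backing services; the tool names are illustrative.
import json

TOOL_SCHEMAS = {
    "get_user_info": {"description": "Look up a user profile by user_id.",
                      "parameters": {"user_id": "string"}},
    "issue_coupon": {"description": "Issue a coupon of a given amount to a user.",
                     "parameters": {"user_id": "string", "amount": "number"}},
}

def call_llm(messages: list[dict], tools: dict) -> dict:
    """Stand-in for a chat-completion call with tool support. A real call would
    return either {'content': ...} or {'tool_call': {'name': ..., 'arguments': ...}}."""
    raise NotImplementedError

def execute_tool(name: str, arguments: dict) -> dict:
    """Stand-in for dispatching the call to the backing MCP service."""
    raise NotImplementedError

def run(query: str) -> str:
    messages = [{"role": "user", "content": query}]
    while True:
        reply = call_llm(messages, TOOL_SCHEMAS)
        if "tool_call" not in reply:                 # model answered directly
            return reply["content"]
        call = reply["tool_call"]
        result = execute_tool(call["name"], call["arguments"])
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})
```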
Despite these advancements, existing tool integration paradigms face significant challenges. The prevailing
approach adopts an "all-injection" strategy, in which detailed descriptions (name, parameters, functional specifications) of
all available tools are concatenated into the prompt to support LLM decision-making [4]. While conceptually simple,
this method reveals fundamental flaws in practical applications: As the tool repository scales, prompt length expands
linearly. This not only incurs substantial computational costs and latency but also drowns critical semantic cues in
massive redundant information, dispersing model attention and significantly degrading decision accuracy—effectively
regressing an advanced cognitive agent into an inefficient keyword matching engine.
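To make the context-inflation argument concrete, the following sketch assembles an all-injection prompt over a synthetic tool repository. The tool records and the whitespace-based token estimate are simplifying assumptions; the point illustrated is that prompt size grows linearly with the size of the tool repository.
```python
# Illustrative sketch of the "all-injection" strategy: every tool's metadata is
# serialized into the prompt, so context length grows linearly with repository
# size. The synthetic tool records and the whitespace-based token estimate are
# simplifying assumptions.
import json

def build_all_injection_prompt(query: str, tool_repository: list[dict]) -> str:
    tool_block = "\n".join(
        json.dumps(tool, ensure_ascii=False) for tool in tool_repository
    )
    return (
        "You may call any of the following tools:\n"
        f"{tool_block}\n\n"
        f"User request: {query}\n"
        "Decide which tool(s) to call and with what arguments."
    )

def approx_tokens(text: str) -> int:
    # Crude whitespace proxy; real tokenizers differ, but the linear trend holds.
    return len(text.split())

if __name__ == "__main__":
    repo = [
        {"name": f"tool_{i}", "description": "does something " * 20,
         "parameters": {"arg": "string"}}
        for i in range(2000)  # thousands of heterogeneous MCP tools
    ]
    prompt = build_all_injection_prompt("Issue a coupon to the user", repo)
    print(approx_tokens(prompt))  # grows roughly linearly with len(repo)
```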
To alleviate contextual pressure, recent studies have explored Retrieval-Augmented Generation based tool selection
methods [5]. These approaches vectorize tool metadata and perform similarity retrieval against user queries at runtime
to pre-screen candidate tools [6]. While such methods partially address scalability issues, they remain trapped in static,
passive retrieval paradigms. Their fatal deficiency lies in the lack of foresight and dynamic adaptation: Systems can
only respond to the literal meaning of initial queries, failing to anticipate emergent tool requirements arising from
intermediate states during task execution chains [7]. For instance, when receiving the instruction "Issue a coupon
to the user," an ideal execution path should involve consecutive actions including user information retrieval, coupon
distribution, and result verification. However, single-stage retrieval based on the initial query often only identifies the
"coupon distribution" module, leaving other required tools unactivated.