Decentralized Intent-Based Multi-Robot Task Planner with LLM Oracles on Hyperledger Fabric
Large language models (LLMs) have opened new opportunities for transforming natural language user intents into executable actions. This capability enables embodied AI agents to perform complex tasks without the involvement of an expert, making human-robot interaction (HRI) more convenient. However, these developments raise significant security and privacy challenges such as self-preferencing, where a single LLM service provider dominates the market and uses this power to promote its own preferences. LLM oracles have recently been proposed as a mechanism to decentralize LLMs by executing multiple LLMs from different vendors and aggregating their outputs to obtain a more reliable and trustworthy final result. However, the accuracy of these approaches depends heavily on the aggregation method. Current aggregation methods mostly rely on semantic similarity between LLM outputs, which is not suitable for robotic task planning, where the temporal order of tasks is important. To fill this gap, we propose an LLM oracle with a new aggregation method for robotic task planning. In addition, we propose a decentralized multi-robot infrastructure based on Hyperledger Fabric that can host the proposed oracle. The proposed infrastructure enables users to express their natural language intent to the system, which can then be decomposed into subtasks. These subtasks require coordinating different robots from different vendors while enforcing fine-grained access control management on the data. To evaluate our methodology, we created the SkillChain-RTD benchmark and made it publicly available. Our experimental results demonstrate the feasibility of the proposed architecture and show that the proposed aggregation method outperforms other aggregation methods currently in use.
💡 Research Summary
The paper presents a novel decentralized architecture for multi‑robot task planning that leverages large language model (LLM) oracles operating on a Hyperledger Fabric blockchain. The authors identify two major challenges in applying LLMs to robotics: (1) the risk of “self‑preferencing” when a single LLM provider dominates the market, and (2) the inadequacy of existing oracle aggregation methods, which rely on semantic similarity and ignore the temporal order of tasks—an essential property for robotic plans.
To address these issues, the authors propose (i) a Hyperledger Fabric‑based infrastructure that hosts a network of LLM oracles from different vendors, and (ii) a new sequence‑aware aggregation algorithm that combines a Longest Common Subsequence (LCS) metric with a historical reputation system. Each oracle receives the same system prompt and generates a candidate plan expressed as an ordered list of robot‑skill pairs (e.g., “Atlas‑Navigate”, “Vulcan‑Paint”). The aggregation module computes pairwise LCS scores between all candidate plans, selects the plan with the highest aggregate LCS, and weights the contribution of each oracle according to its past performance. This design satisfies Byzantine fault tolerance (f < L/3) by ensuring that a limited number of malicious providers cannot sway the final consensus.
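The sequence-aware aggregation described above can be illustrated with a minimal Python sketch. The summary does not give the paper's exact scoring formula, so several things here are assumptions: the LCS length is normalized by the longer plan, each pairwise score is weighted by the reputation of the other oracle, and ties are broken by oracle name. The oracle names, the reputation values, and the "Iris-Inspect" skill are hypothetical. The helper `max_faulty` reads the bound f < L/3 as "f faulty oracles out of L in total", which is also an assumption about the notation.

```python
from typing import Dict, List, Sequence

Plan = List[str]  # ordered robot-skill pairs, e.g. "Atlas-Navigate"


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two plans (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: Sequence[str], b: Sequence[str]) -> float:
    """LCS length divided by the longer plan's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def aggregate(plans: Dict[str, Plan], reputation: Dict[str, float]) -> str:
    """Return the oracle whose plan has the highest reputation-weighted LCS sum.

    Assumed scoring: agreement with a peer counts in proportion to that peer's
    reputation; oracles missing from `reputation` default to a weight of 1.0.
    """
    best_oracle, best_score = "", -1.0
    for name in sorted(plans):  # sorted for a deterministic tie-break
        score = sum(
            reputation.get(other, 1.0) * normalized_lcs(plans[name], plans[other])
            for other in plans
            if other != name
        )
        if score > best_score:
            best_oracle, best_score = name, score
    return best_oracle


def max_faulty(num_oracles: int) -> int:
    """Largest integer f satisfying f < num_oracles / 3."""
    return (num_oracles - 1) // 3


# Hypothetical candidate plans from four oracles.
plans = {
    "oracle_a": ["Atlas-Navigate", "Vulcan-Paint", "Iris-Inspect"],
    "oracle_b": ["Atlas-Navigate", "Vulcan-Paint", "Iris-Inspect"],
    "oracle_c": ["Atlas-Navigate", "Iris-Inspect", "Vulcan-Paint"],
    "oracle_d": ["Iris-Inspect", "Vulcan-Paint", "Atlas-Navigate"],
}
reputation = {"oracle_a": 1.0, "oracle_b": 1.0, "oracle_c": 1.0, "oracle_d": 0.5}

winner = aggregate(plans, reputation)
print(winner, plans[winner])
```

In this example the two identical plans reinforce each other, so one of them is selected. The reversed plan from `oracle_d` shares only a short common subsequence with the others and contributes little to their scores. Unlike a bag-of-words or embedding similarity, the LCS score drops when the same skills appear in a different order, which is the property the summary emphasizes for robotic plans.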
The workflow proceeds as follows: a user submits a natural‑language intent through a decentralized application (DApp); the Oracle Smart Contract records the request on the ledger; the DApp triggers all LLMs in parallel; the aggregation module selects the optimal plan; the final plan is stored on the blockchain; and a Planner Smart Contract distributes the sub‑tasks to the appropriate robots, respecting fine‑grained access‑control policies enforced by Fabric’s channel and ACL mechanisms.
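The workflow steps above can be sketched as an in-memory simulation. This is not Hyperledger Fabric code: the `Ledger` class stands in for the blockchain, the oracles are plain Python callables, and the per-user `acl` dictionary is a simplified placeholder for Fabric's channel and ACL mechanisms. All names (`Ledger`, `run_workflow`, the record fields) are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Plan = List[str]  # ordered robot-skill pairs, e.g. "Atlas-Navigate"


@dataclass
class Ledger:
    """Append-only list standing in for the blockchain ledger (assumption)."""
    entries: List[dict] = field(default_factory=list)

    def append(self, record: dict) -> int:
        self.entries.append(record)
        return len(self.entries) - 1


def run_workflow(
    intent: str,
    user: str,
    oracles: Dict[str, Callable[[str], Plan]],
    aggregate: Callable[[Dict[str, Plan]], Plan],
    acl: Dict[str, Set[str]],
    ledger: Ledger,
) -> Dict[str, List[str]]:
    # 1. Record the request (role of the Oracle Smart Contract).
    ledger.append({"type": "request", "user": user, "intent": intent})

    # 2. Query every oracle in parallel with the same intent.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(ask, intent) for name, ask in oracles.items()}
        candidates = {name: fut.result() for name, fut in futures.items()}

    # 3. Select a single plan and store it on the ledger.
    plan = aggregate(candidates)
    ledger.append({"type": "plan", "plan": plan})

    # 4. Distribute sub-tasks per robot (role of the Planner Smart Contract),
    #    rejecting robots the user may not command.
    assignments: Dict[str, List[str]] = {}
    for step in plan:
        robot, skill = step.split("-", 1)
        if robot not in acl.get(user, set()):
            raise PermissionError(f"{user} may not command {robot}")
        assignments.setdefault(robot, []).append(skill)
    return assignments
```

The aggregation function is passed in as a parameter, so any selection rule that maps candidate plans to one final plan can be plugged in. In a real deployment the two ledger writes would be chaincode transactions rather than list appends.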
For evaluation, the authors introduce SkillChain‑RTD, a publicly available benchmark consisting of 1,200 intent‑to‑plan instances tailored to heterogeneous industrial robots (Atlas, Vulcan, Iris). Experiments involve four commercial LLMs (GPT‑4o, Claude, Gemini, Mistral) and compare the proposed LCS‑reputation aggregation against traditional aggregation methods based on semantic similarity and Levenshtein distance. Results show that the LCS‑based approach achieves 12% higher plan accuracy and reduces the impact of a malicious oracle by 18% on average. Moreover, the blockchain logs provide immutable audit trails, enabling post‑hoc verification of plan provenance.
The contributions are threefold: (1) a decentralized, blockchain‑backed LLM oracle framework for robotic task decomposition, (2) a sequence‑sensitive aggregation technique that preserves task order and incorporates oracle reputation, and (3) the SkillChain‑RTD benchmark that fills a gap in existing robotic planning datasets. The study demonstrates that decentralizing LLM inference and using order‑aware consensus can substantially improve trustworthiness in human‑robot interaction scenarios.
Open challenges remain, including the latency and cost of invoking multiple LLM APIs, scalability of the oracle network, and integration with real‑time robot control loops. Future work is suggested to explore cost‑effective oracle designs, dynamic re‑planning under execution feedback, and standardization of robot‑skill interfaces to broaden applicability across diverse robotic fleets.