From Raw Data to Shared 3D Semantics: Task-Oriented Communication for Multi-Robot Collaboration
Multi-robot systems (MRS) rely on exchanging raw sensory data to cooperate in complex three-dimensional (3D) environments. However, this strategy often leads to severe communication congestion and high transmission latency, significantly degrading collaboration efficiency. This paper proposes a decentralized task-oriented semantic communication framework for multi-robot collaboration in unknown 3D environments. Each robot locally extracts compact, task-relevant semantics using a lightweight Pixel Difference Network (PiDiNet) with geometric processing, and shares only these semantic updates to build a task-sufficient 3D scene representation that supports cooperative perception, navigation, and object transport. Our numerical results show that the proposed method achieves a dramatic reduction in communication overhead from $858.6$ Mb to $4.0$ Mb (over $200\times$ compression gain) while improving collaboration efficiency by shortening task completion from $1,054$ to $281$ steps.
💡 Research Summary
The paper addresses the severe communication bottleneck that arises when multiple robots collaborate in unknown three‑dimensional (3D) environments. Traditional multi‑robot systems (MRS) exchange raw sensory streams—high‑resolution RGB‑D images, dense point clouds, or TSDF grids—leading to massive bandwidth consumption, congestion, and latency that degrade coordination. To overcome this, the authors propose a decentralized, task‑oriented semantic communication framework that replaces raw data with compact, decision‑critical information.
At the perception side, each robot runs a lightweight Pixel Difference Network (PiDiNet) on its onboard RGB image to produce a binary edge mask (256 × 256, 1 bit per pixel, ≈ 64 kbit ≈ 8 KB). Edges capture the structural boundaries essential for navigation while discarding texture. From depth or point‑cloud data, the robot extracts a set of sparse 3D anchors (≈ 1,200 points, each quantized to 48 bits, ≈ 58 kbit ≈ 7 KB) that provide coarse spatial alignment across robots. Detected target objects are encoded as semantic entities containing ID, class label, 3D position, orientation, confidence, and task status; even with up to 20 objects the payload stays in the low‑kilobit range. All these components, together with a small header (robot ID, timestamp, optional pose), form a semantic message Mᵢˢᵉᵐ,ₜ.
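The composition of the semantic message can be sketched as a simple container. The field names and the ~200-bit per-object estimate are illustrative assumptions, not the paper's exact encoding:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Sketch of a per-robot semantic message M_i^sem,t.
# Field names and per-object bit cost are assumptions for illustration.

@dataclass
class ObjectEntity:
    obj_id: int
    label: str
    position: Tuple[float, float, float]   # 3D position
    orientation: float                     # e.g. yaw in radians
    confidence: float
    status: str  # "unassigned" | "claimed" | "carried" | "delivered"

@dataclass
class SemanticMessage:
    robot_id: int
    timestamp: int
    edge_mask_bits: int                    # 256*256 = 65,536 bits (~64 kbit)
    num_anchors: int                       # ~1,200 anchors at 48 bits each
    objects: List[ObjectEntity] = field(default_factory=list)
    pose: Optional[Tuple[float, ...]] = None

    def payload_bits(self) -> int:
        """Rough payload size in bits: edges + anchors + object entities
        (assuming ~200 bits per object entity)."""
        return self.edge_mask_bits + 48 * self.num_anchors + 200 * len(self.objects)
```

With one detected object, the payload stays near the figures quoted above: 65,536 + 57,600 + 200 bits, i.e. a little over 15 KB per full message before event-driven filtering.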
Transmission is event‑driven. Four events trigger a broadcast: (E1) discovery of a new target, (E2) significant pose/orientation deviation of an existing object, (E3) change of an object’s task status, and (E4) a noticeable change in the edge map. Events are prioritized (E3/E1 > E2 > E4) and a simple budget‑aware scheduler ensures the aggregate bit budget Bₘₐₓ per time step is never exceeded. Consequently, robots send updates only when the shared scene changes in a way that can affect decision making, dramatically reducing unnecessary traffic.
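A minimal sketch of the budget-aware scheduler, assuming pending events arrive as (type, size) pairs; the numeric priority encoding is an assumption, and greedy packing is one simple way to respect the budget:

```python
# Budget-aware event scheduler sketch. Priorities follow E3/E1 > E2 > E4;
# events are packed greedily in priority order until the per-step bit
# budget B_max would be exceeded. Encoding is illustrative.

PRIORITY = {"E3": 0, "E1": 0, "E2": 1, "E4": 2}  # lower value = sent first

def schedule(events, b_max):
    """events: list of (event_type, payload_bits).
    Returns the subset to transmit this step within b_max bits."""
    sent, used = [], 0
    for etype, bits in sorted(events, key=lambda e: PRIORITY[e[0]]):
        if used + bits <= b_max:
            sent.append((etype, bits))
            used += bits
    return sent
```

For example, with events E4 (500 bits), E1 (300 bits), E2 (400 bits) and a budget of 800 bits, the scheduler transmits E1 and E2 and defers the lower-priority edge-map update E4.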
Upon reception, each robot incrementally fuses incoming messages into a local shared 3D semantic scene Sᵢ(t), consisting of aggregated edges, merged anchors, and a synchronized object table. Anchors are merged via voxel hashing; duplicate reports are averaged or the highest‑confidence anchor is kept. Objects are associated by label and spatial proximity (gate εₒ) and fused with confidence‑weighted averaging; status updates follow a monotonic hierarchy (delivered > carried > claimed > unassigned) to avoid oscillations. Edge masks are transformed using the sender’s pose and combined conservatively (union or confidence‑weighted sum). The resulting scene is intentionally coarse—it is not a globally consistent dense map but a lightweight representation sufficient for navigation, exploration, and task allocation.
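Two of the fusion rules lend themselves to a short sketch: the monotonic status hierarchy and confidence-weighted position averaging. The function names and the choice to keep the larger confidence after fusion are assumptions:

```python
# Fusion-rule sketches. STATUS_RANK encodes the monotonic hierarchy
# delivered > carried > claimed > unassigned; updates never move backward.

STATUS_RANK = {"unassigned": 0, "claimed": 1, "carried": 2, "delivered": 3}

def merge_status(local, incoming):
    """Keep the more advanced status to avoid oscillations."""
    return local if STATUS_RANK[local] >= STATUS_RANK[incoming] else incoming

def fuse_position(p_local, c_local, p_in, c_in):
    """Confidence-weighted average of two associated 3D position reports.
    Keeping the max confidence afterward is an illustrative choice."""
    w = c_local + c_in
    fused = tuple((c_local * a + c_in * b) / w for a, b in zip(p_local, p_in))
    return fused, max(c_local, c_in)
```

An incoming "claimed" report can thus never demote a locally known "carried" object, while two equally confident position reports are averaged midway.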
Task allocation is fully decentralized. Each robot computes a local cost cᵢₖ = α‖pₖ − pᵢ‖ + β·busyᵢ for every unassigned object k and claims the one with minimal cost by broadcasting a claim message. Other robots mark the object as claimed, preventing redundant pursuit. Motion planning uses local sensor data for immediate obstacle avoidance, while the shared edges and anchors provide global traversability hints, reducing dead‑ends and repeated exploration. The shared object table supplies goal locations and status information, enabling coordinated pick‑up, transport to a known depot, and status dissemination via the same event‑driven mechanism.
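The claim rule follows directly from the cost cᵢₖ = α‖pₖ − pᵢ‖ + β·busyᵢ; the default α and β values below are illustrative, not the paper's:

```python
import math

# Decentralized claim rule sketch: score every unassigned object with
# c_ik = alpha * ||p_k - p_i|| + beta * busy_i and claim the cheapest.

def claim_target(p_i, busy_i, objects, alpha=1.0, beta=0.5):
    """objects: dict obj_id -> (position, status).
    Returns the id of the minimum-cost unassigned object, or None."""
    best_id, best_cost = None, math.inf
    for obj_id, (p_k, status) in objects.items():
        if status != "unassigned":
            continue
        cost = alpha * math.dist(p_i, p_k) + beta * busy_i
        if cost < best_cost:
            best_id, best_cost = obj_id, cost
    return best_id
```

Objects already marked claimed by another robot's broadcast are skipped, which is what prevents redundant pursuit of the same target.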
Experiments were conducted in simulation with four and six robots navigating an initially unknown 3D environment containing static obstacles and multiple target objects. Two communication paradigms were compared: (i) raw communication (periodic transmission of full sensor payloads) and (ii) the proposed semantic communication. The semantic approach reduced total transmitted data from 858.6 Mb to 4.0 Mb (a compression gain exceeding 200×) and shortened the mission from 1,054 to 281 steps, a 73% reduction in task completion time under the same bandwidth constraints.
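Both headline figures follow directly from the reported numbers:

```python
# Check of the reported gains from the raw numbers above.
raw_data, sem_data = 858.6, 4.0    # total transmitted data, Mb
raw_steps, sem_steps = 1054, 281   # task completion steps

compression = raw_data / sem_data          # ~214.65x, i.e. "over 200x"
step_reduction = 1 - sem_steps / raw_steps # ~0.733, i.e. ~73%
```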
The paper’s contributions are threefold: (1) a lightweight, task‑oriented semantic extraction pipeline based on PiDiNet that isolates structural edges, sparse geometric anchors, and object semantics; (2) an event‑driven transmission policy that aligns communication with task‑relevant changes, achieving extreme bandwidth efficiency; (3) a fully decentralized framework that builds a shared 3D semantic scene and uses it for task allocation, motion planning, and cooperative transport without any centralized controller or dense map fusion. Limitations include the lack of real‑world wireless channel testing (packet loss, variable latency) and limited evaluation of scalability to larger robot swarms or highly dynamic environments. Future work is suggested to integrate robustness to network impairments, extend to dynamic obstacles, and explore learning‑based event triggers for even smarter bandwidth management.