Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning
Graph-based multi-agent reinforcement learning (MARL) enables coordinated behavior under partial observability by modeling agents as nodes and communication links as edges. While recent methods excel at learning sparse coordination graphs (determining who communicates with whom), they do not address what information should be transmitted under hard bandwidth constraints. We study this bandwidth-limited regime and show that naive dimensionality reduction consistently degrades coordination performance. Hard bandwidth constraints force selective encoding, but deterministic projections lack mechanisms to control how compression occurs. We introduce Bandwidth-constrained Variational Message Encoding (BVME), a lightweight module that treats messages as samples from learned Gaussian posteriors regularized via KL divergence to an uninformative prior. BVME’s variational framework provides principled, tunable control over compression strength through interpretable hyperparameters, directly constraining the representations used for decision-making. Across SMACv1, SMACv2, and MPE benchmarks, BVME achieves comparable or superior performance while using 67–83% fewer message dimensions, with gains most pronounced on sparse graphs where message quality critically impacts coordination. Ablations reveal U-shaped sensitivity to bandwidth, with BVME excelling at extreme ratios while adding minimal overhead.
💡 Research Summary
The paper tackles a largely overlooked aspect of cooperative multi‑agent reinforcement learning (MARL): how to encode the content of inter‑agent messages when strict bandwidth limits are imposed. While recent graph‑based MARL methods have made great strides in learning sparse coordination graphs—determining who should communicate—they typically assume that once a link is established, agents can exchange high‑dimensional embeddings without restriction. The authors demonstrate that under hard bandwidth constraints (e.g., only 5–10% of the original observation size may be transmitted), naïve dimensionality reduction via deterministic linear projections leads to severe performance degradation, especially for sparse graphs where each edge carries critical coordination information.
To address this, the authors propose Bandwidth‑constrained Variational Message Encoding (BVME), a lightweight module that treats each agent’s outgoing message as a sample from a learned Gaussian posterior. Concretely, after a standard GNN message‑passing layer produces a vector m_i of size d_msg, two small MLPs (Enc_μ and Enc_σ) map m_i to a mean μ_i and a log‑variance log σ_i². The message is then sampled using the re‑parameterization trick: z_i = μ_i + σ_i ⊙ ε, ε ∼ N(0, I). This stochastic message z_i replaces the deterministic m_i as input to the agent’s Q‑network, ensuring that the compression directly influences decision‑making (“on‑path coupling”).
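The sampling step described above can be sketched as follows. This is a minimal illustration of the re‑parameterization trick on top of a GNN message vector; the class name `BVMEEncoder`, the MLP sizes, and the head names `enc_mu`/`enc_logvar` are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class BVMEEncoder(nn.Module):
    """Illustrative sketch: map a GNN message m_i to a Gaussian posterior
    N(mu_i, diag(sigma_i^2)) and sample a stochastic message z_i that
    replaces m_i as input to the Q-network ("on-path coupling").
    Hidden sizes and module names are hypothetical."""

    def __init__(self, d_msg: int, d_hidden: int = 64):
        super().__init__()
        # Two small MLPs producing the mean and log-variance (Enc_mu, Enc_sigma)
        self.enc_mu = nn.Sequential(
            nn.Linear(d_msg, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_msg))
        self.enc_logvar = nn.Sequential(
            nn.Linear(d_msg, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_msg))

    def forward(self, m: torch.Tensor):
        mu = self.enc_mu(m)                # mean mu_i
        logvar = self.enc_logvar(m)        # log sigma_i^2
        sigma = torch.exp(0.5 * logvar)
        eps = torch.randn_like(sigma)      # eps ~ N(0, I)
        z = mu + sigma * eps               # re-parameterization trick
        return z, mu, logvar               # z_i feeds the Q-network
```

Because the noise is injected through `eps` rather than by sampling `z` directly, gradients flow through `mu` and `logvar`, so the encoder can be trained end‑to‑end with the RL objective.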
A KL‑divergence term regularizes each posterior toward an uninformative isotropic Gaussian prior N(0, σ₀² I). The strength of this regularization is controlled by a hyper‑parameter λ_KL, while σ₀ sets the scale of the prior. Together with the explicit compression ratio r = d_msg/d_obs, these parameters give precise, interpretable control over how much information can be transmitted. Unlike fixed linear projections, BVME can allocate low variance (high confidence) to task‑relevant dimensions and high variance (low confidence) to less useful features, effectively learning a task‑aware compression scheme.
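The KL term above has a closed form for diagonal Gaussians against the isotropic prior N(0, σ₀² I). The sketch below computes it; the function name and the sum‑over‑dimensions, mean‑over‑batch reduction are illustrative assumptions, not the paper's exact implementation.

```python
import math
import torch

def kl_to_prior(mu: torch.Tensor, logvar: torch.Tensor,
                sigma0: float = 1.0) -> torch.Tensor:
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, sigma0^2 I) ).

    Per dimension: 0.5 * (sigma^2/sigma0^2 + mu^2/sigma0^2 - 1
                          - log sigma^2 + log sigma0^2).
    Reduction (sum over message dims, mean over batch) is an
    illustrative choice.
    """
    var = torch.exp(logvar)
    var0 = sigma0 ** 2
    kl = 0.5 * (var / var0 + mu.pow(2) / var0 - 1.0
                - logvar + math.log(var0))
    return kl.sum(dim=-1).mean()

# The regularizer would enter the training objective weighted by
# lambda_KL, e.g.:  loss = rl_loss + lambda_kl * kl_to_prior(mu, logvar, sigma0)
```

Dimensions the encoder keeps confident (small σ_i) pay a KL cost, while dimensions pushed toward the prior (σ_i → σ₀, μ_i → 0) carry little information, which is how λ_KL and σ₀ trade off information preservation against effective bandwidth.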
The method is instantiated on two representative graph‑based MARL backbones: GA‑CG (a sparse, group‑aware coordination graph learner) and DICG (a dense coordination graph). Experiments span three benchmark suites—SMACv1, SMACv2, and the Multi‑Particle‑Environment (MPE) Tag task. Under severe bandwidth constraints (r = 0.05), BVME consistently outperforms the baselines, achieving higher win rates and faster convergence. Notably, BVME attains comparable or superior performance while using 67–83% fewer message dimensions, with the most pronounced gains on sparse graphs where each edge’s information is critical.
A detailed ablation study reveals a U‑shaped sensitivity to bandwidth: BVME provides large benefits at extreme compression (r ≤ 0.05), modest gains at moderate compression, and negligible impact when bandwidth is abundant. Crucially, “on‑path” regularization (applying KL to the sampled messages that feed the Q‑network) outperforms an “off‑path” variant that regularizes only the mean vectors, confirming that the compression must directly constrain the representations used for control. Additional ablations show that tuning λ_KL and σ₀ allows smooth trade‑offs between information preservation and bandwidth usage.
In summary, the paper makes three key contributions: (1) it introduces a variational framework for message compression that provides principled, tunable control over bandwidth usage via KL regularization; (2) it demonstrates that this approach yields substantial performance improvements under hard bandwidth limits, especially for sparse communication topologies; and (3) it validates that coupling the stochastic messages directly to the decision‑making pipeline is essential for achieving these gains. The proposed BVME module is lightweight, architecture‑agnostic, and readily applicable to real‑world multi‑robot or IoT deployments where communication resources are scarce.