Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification


Multimodal remote sensing classification often suffers from missing modalities caused by sensor failures and environmental interference, leading to severe performance degradation. In this work, we rethink missing-modality learning from a conditional computation perspective and investigate whether Mixture-of-Experts (MoE) models can inherently adapt to diverse modality-missing scenarios. We first conduct a systematic study of representative MoE paradigms under various missing-modality settings, revealing both their potential and limitations. Building on these insights, we propose a Missing-aware Mixture-of-LoRAs (MaMOL), a parameter-efficient MoE framework that unifies multiple modality-missing cases within a single model. MaMOL introduces a dual-routing mechanism to decouple modality-invariant shared experts and modality-aware dynamic experts, enabling automatic expert activation conditioned on available modalities. Extensive experiments on multiple remote sensing benchmarks demonstrate that MaMOL significantly improves robustness and generalization under diverse missing-modality scenarios with minimal computational overhead. Transfer experiments on natural image datasets further validate its scalability and cross-domain applicability.


💡 Research Summary

The paper tackles the pervasive problem of missing modalities in multimodal remote‑sensing classification, where sensor failures, atmospheric conditions, or budget constraints often leave only a subset of the available data (optical, SAR, hyperspectral, LiDAR, etc.). Traditional approaches follow a two‑stage pipeline: first train a model on complete data, then adapt it to each missing‑modality scenario via additional modules, fine‑tuning, or parameter updates. This strategy suffers from high computational and storage overhead when many missing‑modality patterns exist, and it fails to generalize to unseen combinations because the initial training never experiences incomplete inputs.

Motivated by recent advances in conditional computation, the authors reinterpret missing‑modality learning as a routing problem: each possible modality subset should trigger a distinct computation path within the network. Mixture‑of‑Experts (MoE) architectures naturally fit this view because they consist of multiple expert subnetworks and a router that selects a subset of experts based on the input. The authors first conduct a systematic empirical study of three representative MoE paradigms under missing‑modality conditions:

  1. Replace‑based MoE – entire backbone layers are swapped with expert blocks. While expressive, this design scales poorly: the number of parameters grows linearly with the number of experts, leading to prohibitive memory and compute costs for remote‑sensing models that often need many experts to cover all modality combinations.

  2. Adapt‑based MoE – a shared backbone is kept frozen (or lightly tuned) and lightweight residual expert branches are added. This is more parameter‑efficient, but because the router does not explicitly consider modality availability, the experts entangle modality‑specific knowledge with missing‑pattern adaptation, resulting in unstable specialization and limited robustness.

  3. Task‑driven MoE – the missing‑modality configuration is encoded as a task descriptor and concatenated with the input before routing. Although this introduces awareness of the missing pattern, all experts still belong to a single pool, causing interference between modality‑invariant, modality‑specific, and missing‑pattern transformations.
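The task-driven routing idea in the third paradigm can be sketched in a few lines: the binary modality mask acts as a task descriptor appended to the feature before gating, so the same expert pool is scored differently depending on which modalities are present. This is an illustrative toy, not the paper's implementation; the gate weights, dimensions, and `route` helper are all made-up names.

```python
# Toy sketch of task-driven MoE routing: the binary modality mask is
# concatenated with the input feature before computing the gate.
# All weights and dimensions here are illustrative assumptions.
import math
import random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def route(feature, modality_mask, gate_weights):
    """Score every expert from the feature concatenated with the mask."""
    x = feature + modality_mask          # task descriptor appended to input
    return softmax([sum(w_i * x_i for w_i, x_i in zip(w, x))
                    for w in gate_weights])

num_experts, dim, num_modalities = 4, 3, 2
gate = [[random.uniform(-1, 1) for _ in range(dim + num_modalities)]
        for _ in range(num_experts)]

feat = [0.5, -0.2, 0.1]
probs_full = route(feat, [1, 1], gate)   # both modalities present
probs_miss = route(feat, [1, 0], gate)   # second modality missing
print(probs_full, probs_miss)
```

Because every expert still sits in one shared pool, a change in the mask only reweights the same experts, which is exactly the interference the authors criticize.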

The empirical analysis shows that none of these existing designs simultaneously (i) preserve modality‑invariant knowledge, (ii) retain modality‑specific priors when a modality is present, and (iii) isolate adaptations required by the missing‑pattern. To address these gaps, the authors propose Missing‑aware Mixture‑of‑LoRAs (MaMOL), a novel, parameter‑efficient MoE framework built on low‑rank adaptation (LoRA) experts.
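Since MaMOL builds on LoRA experts, it helps to recall what a single LoRA branch computes: the frozen weight `W` is augmented by a low-rank product `B @ A` scaled by `alpha / r`, so only the small `A` and `B` matrices are trained. The sketch below uses toy matrices (the shapes and values are illustrative, not from the paper); note that with `B` zero-initialized, as is standard for LoRA, the branch starts as an exact no-op.

```python
# Illustrative single LoRA branch: output = W @ x + (alpha / r) * B @ (A @ x).
# W stays frozen; only the low-rank A and B would be trained.
def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=8, r=2):
    base = matvec(W, x)                    # frozen backbone path
    delta = matvec(B, matvec(A, x))        # low-rank expert path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# toy shapes: W is 2x3, A is r x 3, B is 2 x r (r = 2)
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]
B = [[0.0, 0.0], [0.0, 0.0]]               # zero-initialized: no change at start
x = [1.0, 2.0, 3.0]
print(lora_forward(W, A, B, x))            # equals W @ x while B is zero
```

This is why a Mixture-of-LoRAs is parameter-efficient: each extra expert adds only the small `A` and `B` matrices, not another copy of the backbone.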

Key architectural components of MaMOL:

  • Backbone: A pretrained Vision Transformer (ViT) whose parameters are largely frozen, ensuring that the model inherits strong visual representations without costly re‑training.

  • Static experts: Two families of LoRA experts: a shared expert that is always active and modality‑specific experts that are active only when their corresponding modality is present. The shared expert captures modality‑invariant visual features, while each modality‑specific expert encodes sensor‑type priors (e.g., SAR texture, hyperspectral spectral signatures). Their activation coefficients are deterministic: the shared expert is always on, and a modality‑specific expert is turned on only when its binary availability indicator satisfies (m_i = 1).

  • Dynamic experts: A set of K_d LoRA experts that specialize in handling particular missing‑modality patterns. The router explicitly incorporates both the feature representation (z) and a learned embedding of the binary modality mask (\psi(m)). A lightweight projection (\phi) maps the concatenation of (z) and (\psi(m)) to routing logits over the K_d dynamic experts, so that the activated experts depend jointly on the input content and the missing‑modality pattern.
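The two expert families above can be sketched together in one forward pass: the shared expert is always on, each modality-specific expert is gated by its indicator m_i, and the dynamic experts are selected by routing on the feature concatenated with the mask embedding. This is a hedged reconstruction of the description, not the paper's code; `psi`, `phi`, the scaling "experts", and all dimensions are toy assumptions.

```python
# Toy reconstruction of MaMOL's static + dynamic expert combination.
# psi (mask embedding), phi (routing projection), and the experts are
# illustrative stand-ins; real experts would be LoRA branches.
import random

random.seed(0)

def psi(mask):
    # toy mask embedding; MaMOL learns this
    return [float(m) for m in mask]

def phi(vec, proj):
    # lightweight projection to K_d routing logits
    return [sum(w * v for w, v in zip(row, vec)) for row in proj]

def mamol_forward(z, mask, shared, modality_experts,
                  dynamic_experts, proj, top_k=2):
    # 1) static experts: shared always on, modality experts gated by m_i
    out = shared(z)
    for m_i, expert in zip(mask, modality_experts):
        if m_i:
            out = [o + e for o, e in zip(out, expert(z))]
    # 2) dynamic experts: route on [z ; psi(mask)], keep the top-k
    logits = phi(z + psi(mask), proj)
    chosen = sorted(range(len(logits)),
                    key=lambda i: logits[i], reverse=True)[:top_k]
    for i in chosen:
        out = [o + e for o, e in zip(out, dynamic_experts[i](z))]
    return out, sorted(chosen)

scale = lambda s: (lambda x: [s * v for v in x])
shared = scale(0.5)
modality_experts = [scale(0.1), scale(0.2)]          # e.g. optical, SAR
dynamic_experts = [scale(0.01 * (i + 1)) for i in range(4)]
proj = [[random.uniform(-1, 1) for _ in range(3 + 2)] for _ in range(4)]

z = [1.0, 2.0, -1.0]
out_full, sel_full = mamol_forward(z, [1, 1], shared, modality_experts,
                                   dynamic_experts, proj)
out_miss, sel_miss = mamol_forward(z, [1, 0], shared, modality_experts,
                                   dynamic_experts, proj)
print(sel_full, sel_miss)
```

The decoupling is visible in the structure: dropping a modality deterministically switches off its static expert, while the mask embedding lets the router reassign the dynamic experts to the new missing pattern.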

