Empower Low-Altitude Economy: A Reliability-Aware Dynamic Weighting Allocation for Multi-modal UAV Beam Prediction
The low-altitude economy (LAE) is rapidly expanding, driven by urban air mobility, logistics drones, and aerial sensing, and fast, accurate beam prediction in uncrewed aerial vehicle (UAV) communications is crucial for achieving reliable connectivity. Current research is shifting from single-signal to multi-modal collaborative approaches. However, existing multi-modal methods mostly employ fixed or empirical weights, assuming equal reliability across modalities at any given moment. In practice, the importance of different modalities fluctuates dramatically with UAV motion scenarios, and static weighting amplifies the negative impact of degraded modalities. Furthermore, modal mismatch and weak alignment further undermine cross-scenario generalization. To this end, we propose a reliability-aware dynamic weighting scheme applied to a semantic-aware multi-modal beam prediction framework, named SaM²B. Specifically, SaM²B leverages lightweight cues such as environmental visual, flight posture, and geospatial data to adaptively allocate contributions across modalities at different time points through reliability-aware dynamic weight updates. Moreover, by utilizing cross-modal contrastive learning, we align the multi-source representations with the beam semantics associated with specific beam information in a shared semantic space, thereby enhancing discriminative power and robustness under modal noise and distribution shifts. Experiments on real-world low-altitude UAV datasets show that SaM²B outperforms baseline methods.
💡 Research Summary
The paper addresses the critical challenge of fast and accurate beam prediction for uncrewed aerial vehicles (UAVs) operating in the rapidly expanding low‑altitude economy (LAE). Existing multimodal approaches typically assign fixed or empirically chosen weights to each modality, implicitly assuming that visual, posture, and geospatial cues are equally reliable at all times. In practice, UAV motion, weather conditions, and occlusions cause the quality of each modality to fluctuate dramatically, and static weighting can severely degrade performance when one modality becomes unreliable. To overcome these limitations, the authors propose SaM²B (Semantic‑aware Multi‑Modal Beam prediction), a framework that (1) dynamically estimates the reliability of each modality in real time and adjusts its contribution accordingly, and (2) aligns the multimodal representations to a shared semantic space using cross‑modal contrastive learning.
The architecture consists of lightweight encoders for three cues: (i) low‑resolution visual images, (ii) flight posture data from IMU (acceleration, angular velocity, altitude), and (iii) geospatial information derived from GPS and map data. Each encoder produces a feature vector, which is fed into a Reliability‑Aware Dynamic Weighting (RDW) module. The RDW module computes a reliability score for each modality based on reconstruction error, temporal variance, and environmental indicators, and then normalizes these scores with a softmax to obtain adaptive weights. The weighted features are summed to form an aggregated representation.
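The reliability-to-weight step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact way the three cues (reconstruction error, temporal variance, environmental indicators) are combined into a scalar score is an assumption, as is the temperature parameter; only the softmax normalization and weighted-sum fusion are stated in the summary.

```python
import numpy as np

def reliability_weights(recon_err, temp_var, env_score, tau=1.0):
    """Map per-modality reliability cues to adaptive softmax weights.

    recon_err and temp_var enter with a negative sign (lower error and
    lower variance mean higher reliability); env_score enters positively
    (better environmental conditions mean higher reliability). The linear
    combination is an illustrative assumption. All inputs have shape (M,)
    for M modalities; the output sums to 1.
    """
    score = (-np.asarray(recon_err, dtype=float)
             - np.asarray(temp_var, dtype=float)
             + np.asarray(env_score, dtype=float)) / tau
    exp = np.exp(score - score.max())  # numerically stable softmax
    return exp / exp.sum()

def fuse(features, weights):
    """Weighted sum of per-modality features: (M, D) x (M,) -> (D,)."""
    return np.tensordot(weights, features, axes=1)

# Three modalities (visual, posture, geospatial); the third is degraded,
# so it receives a smaller weight and contributes less to the fusion.
w = reliability_weights(recon_err=[0.1, 0.1, 0.9],
                        temp_var=[0.05, 0.05, 0.5],
                        env_score=[1.0, 1.0, 0.2])
z = fuse(np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]), w)
```

The softmax keeps every modality in play with a nonzero weight, so a temporarily degraded sensor is down-weighted rather than hard-gated out.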
Simultaneously, a Cross‑Modal Contrastive Learning (CML) module defines a set of learnable “beam semantics” prototypes, one for each beam label. The aggregated representation is encouraged, via a contrastive loss, to be close to the prototype of the correct beam and far from prototypes of other beams. This forces the modalities to be aligned in a common semantic space, so that when a modality is corrupted, the remaining modalities can still map to the correct prototype and preserve prediction accuracy.
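An InfoNCE-style version of this prototype contrastive objective is sketched below. The cosine normalization, the temperature value, and the exact loss form are assumptions; the summary only states that the aggregated representation is pulled toward the correct beam's prototype and pushed from the others.

```python
import numpy as np

def prototype_contrastive_loss(z, prototypes, label, temperature=0.1):
    """Contrastive loss over learnable per-beam prototypes.

    z: fused representation, shape (D,).
    prototypes: one prototype per beam label, shape (K, D).
    Returns -log softmax similarity of z to the prototype of `label`,
    so the loss is small when z sits near the correct prototype and
    far from the rest (an illustrative sketch of the CML idea).
    """
    z = z / np.linalg.norm(z)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = (p @ z) / temperature       # cosine similarities, scaled
    logits = logits - logits.max()       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

# Representation aligned with prototype 0 yields a lower loss for
# label 0 than for a mismatched label.
protos = np.eye(4)
rep = np.array([1.0, 0.05, 0.0, 0.0])
loss_match = prototype_contrastive_loss(rep, protos, label=0)
loss_mismatch = prototype_contrastive_loss(rep, protos, label=2)
```

Because all modalities are trained against the same prototypes, a clean modality can still land near the correct prototype even when a sibling modality is corrupted, which is the robustness mechanism the summary describes.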
Training optimizes a combined loss consisting of standard cross‑entropy for beam classification and the contrastive loss, with a weighting hyperparameter. Experiments are conducted on two real‑world low‑altitude UAV datasets collected in urban and suburban environments, comprising over 5,000 flight segments, mmWave beam training logs, and synchronized multimodal data. Baselines include a single‑modality visual CNN, a fixed‑weight multimodal FusionNet, a transformer‑based multimodal model, and a Bayesian weight‑learning approach. SaM²B achieves a Top‑1 beam selection accuracy of 92.4 %, outperforming the best baseline by 7.3 percentage points, and improves Top‑3 accuracy by 5.1 percentage points. Notably, under adverse weather or heavy occlusion, the performance drop is limited to less than 2 %, demonstrating robustness to modality degradation.
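The combined training objective reduces to a weighted sum of the two terms. A minimal sketch, assuming the standard log-softmax form of cross-entropy; the hyperparameter name `lam` and its value are illustrative, since the summary only says the two losses are combined with a weighting hyperparameter.

```python
import numpy as np

def cross_entropy(logits, label):
    """Standard cross-entropy from raw classification logits, shape (K,)."""
    logits = logits - logits.max()  # numerical stability
    return -(logits[label] - np.log(np.exp(logits).sum()))

def total_loss(cls_logits, contrastive_loss, label, lam=0.5):
    """Combined objective: beam-classification CE plus the contrastive
    term, balanced by the weighting hyperparameter lam (value assumed)."""
    return cross_entropy(cls_logits, label) + lam * contrastive_loss

logits = np.array([2.0, 0.5, -1.0])
ce = cross_entropy(logits, 0)
total = total_loss(logits, contrastive_loss=0.3, label=0, lam=0.5)
```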
Ablation studies reveal that the RDW module alone contributes a 3.8‑percentage‑point gain, the CML module alone adds 2.9 percentage points, and their combination yields the full improvement, confirming a synergistic effect. The lightweight encoders enable real‑time inference (≈15 ms per sample on a single GPU), making the solution practical for onboard deployment.
The authors acknowledge limitations: the current system only integrates three low‑cost modalities and does not yet exploit higher‑resolution sensors such as LiDAR or radar; the Bayesian reliability model relies on manually tuned hyperparameters that may need adaptation for new environments. Future work will explore richer sensor fusion, meta‑learning for automatic reliability estimation, and edge‑device optimization for real‑world UAV operations.
In summary, SaM²B introduces a novel reliability‑aware dynamic weighting scheme combined with cross‑modal contrastive alignment, delivering significantly higher accuracy and robustness for UAV beam prediction in the low‑altitude economy, and paving the way for more resilient aerial communication systems.