ThermoSplat: Cross-Modal 3D Gaussian Splatting with Feature Modulation and Geometry Decoupling
Multi-modal scene reconstruction integrating RGB and thermal infrared data is essential for robust environmental perception across diverse lighting and weather conditions. However, extending 3D Gaussian Splatting (3DGS) to multi-spectral scenarios remains challenging. Current approaches struggle to fully exploit the complementary information in multi-modal data: they either neglect cross-modal correlations or rely on shared representations that cannot adaptively handle the complex structural correlations and physical discrepancies between spectra. To address these limitations, we propose ThermoSplat, a novel framework that enables deep spectrum-aware reconstruction through active feature modulation and adaptive geometry decoupling. First, we introduce a Spectrum-Aware Adaptive Modulation that dynamically conditions shared latent features on thermal structural priors, guiding visible texture synthesis with reliable cross-modal geometric cues. Second, to accommodate modality-specific geometric inconsistencies, we propose a Modality-Adaptive Geometric Decoupling scheme that learns independent opacity offsets and executes a separate rasterization pass for the thermal branch. Additionally, a hybrid rendering pipeline integrates explicit Spherical Harmonics with implicit neural decoding, ensuring both semantic consistency and high-frequency detail preservation. Extensive experiments on the RGBT-Scenes dataset demonstrate that ThermoSplat achieves state-of-the-art rendering quality across both visible and thermal spectra.
💡 Research Summary
ThermoSplat introduces a cross-modal 3D Gaussian Splatting framework that jointly reconstructs RGB and thermal infrared scenes with real-time performance and high visual fidelity. Building on the fast, explicit representation of 3DGS, the authors identify two fundamental challenges in multi-spectral reconstruction: (1) the need to exploit complementary structural cues from the thermal band to guide visible-light texture synthesis, and (2) the physical discrepancy between the two modalities, which makes a single shared geometry sub-optimal. To address these, ThermoSplat proposes three tightly coupled components.

First, a Spectrum-Aware Adaptive Modulation module conditions the shared latent feature map (produced by rasterizing per-Gaussian latent vectors) on thermal structural priors. A shared encoder extracts a common representation h, while a dedicated thermal prior head generates a thermal feature h_th. Linear layers map h_th to scaling (γ) and shifting (β) parameters, which are applied to h as γ⊙h + β to produce a modulated feature h_mod. This dynamic conditioning forces visible-light decoding to respect thermal boundaries, effectively reducing texture drift under low light or adverse weather.

Second, a Modality-Adaptive Geometric Decoupling scheme learns a per-Gaussian opacity offset Δα_t, added to the base opacity α in logit space to form a thermal-specific opacity α_t = sigmoid(logit(α) + Δα_t). An independent rasterization pass using α_t yields a thermal-only latent map A_f(t), ensuring that occlusion and depth cues in the thermal image follow the physics of infrared sensing rather than inheriting high-frequency visible textures.

Third, a Hybrid Rendering pipeline combines explicit Spherical Harmonics (SH) for low-frequency illumination consistency with an implicit neural decoder that consumes h_mod to recover high-frequency RGB color.
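The first two components can be sketched in PyTorch. This is a minimal illustration of the FiLM-style modulation (γ⊙h + β from h_th) and the logit-space opacity offset (α_t = sigmoid(logit(α) + Δα_t)); the module names, layer shapes, and clamping epsilon are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SpectrumAwareModulation(nn.Module):
    """Sketch of Spectrum-Aware Adaptive Modulation: condition the shared
    latent h on the thermal prior feature h_th via learned scale/shift
    (FiLM-style). Layer sizes are illustrative assumptions."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(dim, dim)  # h_th -> scaling γ
        self.to_beta = nn.Linear(dim, dim)   # h_th -> shifting β

    def forward(self, h: torch.Tensor, h_th: torch.Tensor) -> torch.Tensor:
        gamma = self.to_gamma(h_th)
        beta = self.to_beta(h_th)
        return gamma * h + beta  # h_mod = γ ⊙ h + β


def thermal_opacity(alpha: torch.Tensor, delta_logit: torch.Tensor) -> torch.Tensor:
    """Modality-adaptive opacity: α_t = sigmoid(logit(α) + Δα_t).
    A per-Gaussian learnable offset shifts the base opacity in logit
    space, so the thermal branch can occlude differently than RGB."""
    return torch.sigmoid(torch.logit(alpha, eps=1e-6) + delta_logit)
```

A zero offset recovers the base opacity exactly, so the thermal branch can fall back to the shared geometry wherever the modalities agree.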
The final RGB image is a blend of SH‑derived base colors and neural‑enhanced details, while the thermal image is decoded directly from the thermal‑specific latent map. Experiments on the RGBT‑Scenes benchmark demonstrate that ThermoSplat outperforms prior RGBT‑GS methods such as ThermalGaussian, MS‑Splatting, and MMOne across PSNR, SSIM, and LPIPS metrics. Qualitative results show that without geometric decoupling the thermal output retains spurious visible‑light textures, whereas the proposed decoupling produces smooth, physically plausible infrared renders. Ablation studies confirm the importance of both the adaptive modulation and the opacity offset. The paper’s contributions lie in (i) a novel cross‑modal feature modulation that leverages thermal structure to guide visible synthesis, (ii) a learnable, modality‑specific geometry adjustment that resolves depth and occlusion inconsistencies, and (iii) a hybrid explicit‑implicit rendering strategy that preserves high‑frequency detail without sacrificing real‑time speed. By reconciling shared latent representations with modality‑specific adjustments, ThermoSplat advances the state of the art in multi‑spectral neural rendering, opening pathways for robust perception in autonomous driving, robotics, and remote sensing under challenging lighting and weather conditions.
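The hybrid rendering step described above can be sketched as follows. This assumes an additive fusion of the SH-derived base color with a high-frequency residual decoded from h_mod; the residual head's architecture and the exact blend rule are illustrative assumptions, since the summary does not specify them.

```python
import torch
import torch.nn as nn


class HybridRGBDecoder(nn.Module):
    """Sketch of the hybrid rendering pipeline: an explicit SH-derived
    base color (low-frequency illumination) is combined with a
    high-frequency residual decoded from the modulated latent map h_mod.
    Architecture and additive blend are illustrative assumptions."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.detail_head = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, 3, 3, padding=1),
        )

    def forward(self, sh_color: torch.Tensor, h_mod: torch.Tensor) -> torch.Tensor:
        # sh_color: (B, 3, H, W) low-frequency base from Spherical Harmonics
        # h_mod:    (B, C, H, W) rasterized, thermally modulated latent map
        detail = self.detail_head(h_mod)  # high-frequency residual
        return (sh_color + detail).clamp(0.0, 1.0)
```

Keeping the SH term explicit preserves real-time evaluation and view-consistent shading, while the neural residual only has to model the detail the SH basis cannot represent.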