Thermal odometry and dense mapping using learned odometry and Gaussian splatting

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Thermal infrared sensors operate at wavelengths longer than the diameter of typical smoke particles, so they can capture imagery regardless of darkness, dust, and smoke. This robustness has made them increasingly valuable for motion estimation and environmental perception in robotics, particularly in adverse conditions. Existing thermal odometry and mapping approaches, however, are predominantly geometric: they often fail across diverse datasets and cannot produce dense maps. Motivated by the efficiency and high-quality reconstruction ability of recent Gaussian Splatting (GS) techniques, we propose TOM-GS, a thermal odometry and mapping method that integrates learning-based odometry with GS-based dense mapping. TOM-GS is among the first GS-based SLAM systems tailored for thermal cameras, featuring dedicated thermal image enhancement and monocular depth integration. Extensive experiments on motion estimation and novel-view rendering demonstrate that TOM-GS outperforms existing learning-based methods, confirming the benefits of learning-based pipelines for robust thermal odometry and dense reconstruction.


💡 Research Summary

The paper introduces TOM‑GS, a novel thermal‑only SLAM framework that unites learning‑based odometry with 3D Gaussian Splatting (GS) for dense reconstruction. Recognizing that thermal cameras output high‑dynamic‑range 14‑ or 16‑bit images with low texture and contrast, the authors first apply an adaptive image‑enhancement module based on Fieldscale to convert raw thermal data into 8‑bit grayscale suitable for existing deep networks. A pretrained monocular depth predictor (e.g., Depth Anything, ZoeDepth, Metric3D) supplies an absolute depth prior, which is later aligned with the relative depth estimates produced by a DROID‑SLAM‑derived module.
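The conversion step can be illustrated with a simple global rescaling. Note that Fieldscale itself is locally adaptive; the sketch below is only a hedged, global stand-in showing the 14/16-bit-to-8-bit mapping, with the percentile window chosen arbitrarily for illustration.

```python
import numpy as np

def rescale_thermal(raw, lo_pct=1.0, hi_pct=99.0):
    """Clip raw 14/16-bit thermal counts to a percentile window and
    rescale to 8-bit grayscale. A global, simplified stand-in for
    Fieldscale's locally adaptive scaling (illustration only)."""
    lo, hi = np.percentile(raw, [lo_pct, hi_pct])
    clipped = np.clip(raw.astype(np.float64), lo, hi)
    return ((clipped - lo) / max(hi - lo, 1e-6) * 255.0).astype(np.uint8)

# Synthetic 16-bit frame: uniform background with a hot region.
frame = np.full((64, 64), 29500, dtype=np.uint16)
frame[20:30, 20:30] = 31000
img8 = rescale_thermal(frame)  # 8-bit image spanning the full 0-255 range
```

Percentile clipping (rather than raw min/max) keeps a few saturated or dead pixels from compressing the usable contrast range, which is the main failure mode such an enhancement module must avoid before feeding images to networks pretrained on 8-bit data.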

The odometry core, named Thermal Infrared Odometry (TIO), adapts DROID‑SLAM's ConvGRU‑based optical‑flow prediction and Dense Bundle Adjustment (DBA), and introduces a DSO‑style depth‑scale‑optimization layer. This layer jointly refines camera poses, dense inverse‑depth maps, and an affine transformation that reconciles the monocular depth prior with DROID‑SLAM's depth, thereby mitigating the inherent scale ambiguity of monocular thermal SLAM. Keyframes are selected based on mean optical‑flow magnitude and organized in a co‑visibility graph; older edges are pruned to keep computation tractable.
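The affine reconciliation between the monocular prior and the SLAM depth can be sketched in closed form. The paper performs this jointly inside the optimization layer; the snippet below is only a simplified least-squares illustration of the underlying idea (fit a scale `a` and shift `b` so that `a * prior + b` matches the SLAM depth), with all variable names our own.

```python
import numpy as np

def fit_affine_depth(prior, slam, mask=None):
    """Least-squares scale/shift aligning a monocular depth prior to
    SLAM depth: minimize ||a * prior + b - slam||^2 over valid pixels.
    A closed-form, simplified stand-in for the paper's joint
    depth-scale optimization (illustration only)."""
    p, s = prior.ravel(), slam.ravel()
    if mask is not None:
        m = mask.ravel()
        p, s = p[m], s[m]
    A = np.stack([p, np.ones_like(p)], axis=1)  # columns: [prior, 1]
    (a, b), *_ = np.linalg.lstsq(A, s, rcond=None)
    return a, b

# Synthetic check: SLAM depth is exactly 2 * prior + 0.5.
prior = np.random.default_rng(0).uniform(1.0, 10.0, (32, 32))
slam = 2.0 * prior + 0.5
a, b = fit_affine_depth(prior, slam)  # recovers a ≈ 2.0, b ≈ 0.5
```

In practice such a fit would be restricted (via `mask`) to pixels with reliable SLAM depth, since monocular priors and SLAM estimates disagree most in low-texture thermal regions.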

For mapping, the refined poses and depth maps initialize a set of 3D Gaussians. Each Gaussian stores grayscale intensity, opacity, 3‑D position, and a scaling matrix defining its covariance. The authors deliberately disable pose refinement during GS optimization to preserve consistency with the odometry output. Rendering follows the standard GS pipeline: Gaussians are sorted by depth, projected onto the image plane, and composited using front‑to‑back volumetric alpha blending within 16 × 16 tiles for efficiency. The loss function combines a photometric term, a structural similarity term (SSIM), and a depth term, weighted empirically.
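The per-pixel compositing rule in that rendering pipeline is standard front-to-back alpha blending, C = Σ_i c_i α_i Π_{j<i} (1 − α_j). A minimal single-pixel sketch, assuming Gaussians are already depth-sorted with the nearest first:

```python
def composite_front_to_back(intensities, alphas):
    """Front-to-back volumetric alpha blending for one pixel:
    C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j),
    where inputs are sorted nearest-to-farthest. Early termination
    once transmittance is negligible mirrors tile-based GS renderers."""
    color, transmittance = 0.0, 1.0
    for c, a in zip(intensities, alphas):
        color += transmittance * a * c
        transmittance *= (1.0 - a)
        if transmittance < 1e-4:
            break
    return color

# Two splats: a near, dim, half-opaque one over a far, bright, opaque one.
out = composite_front_to_back([0.2, 1.0], [0.5, 1.0])
# 0.5 * 0.2 + 0.5 * 1.0 * 1.0 = 0.6
```

Because thermal Gaussians store a single grayscale intensity rather than RGB spherical harmonics, the blend reduces to this scalar accumulation per pixel.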

Extensive experiments on two public thermal SLAM benchmarks—RRXIO and VIVID—show that TOM‑GS outperforms prior learning‑based methods (e.g., DROID‑SLAM, GLORIE‑SLAM) in both trajectory accuracy (ATE, RPE) and novel‑view rendering quality (PSNR, SSIM). Ablation studies reveal that (1) removing the Fieldscale enhancement drastically degrades performance, (2) omitting the monocular depth prior increases high‑error pixel ratios, and (3) enabling GS pose refinement introduces inconsistencies with the odometry, confirming the design choices.

The contributions are threefold: (1) a thermal‑specific image‑enhancement and depth‑prior integration that makes off‑the‑shelf deep models usable on thermal data, (2) an extended DROID‑SLAM pipeline that resolves scale ambiguity via joint depth‑scale optimization, and (3) the first application of 3D Gaussian Splatting to thermal SLAM, delivering high‑quality dense maps and novel‑view synthesis. The work establishes a strong baseline for future research on dense, robust perception in visually degraded environments, and opens avenues for real‑time implementations, multimodal sensor fusion, and large‑scale outdoor deployments.
