Targetless LiDAR-Camera Calibration with Neural Gaussian Splatting
Accurate LiDAR-camera calibration is crucial for multi-sensor systems. However, traditional methods often rely on physical targets, which are impractical for real-world deployment. Moreover, even carefully calibrated extrinsics can degrade over time due to sensor drift or external disturbances, necessitating periodic recalibration. To address these challenges, we present TLC-Calib, a targetless LiDAR-camera calibration framework that jointly optimizes sensor poses with a neural Gaussian-based scene representation. Reliable LiDAR points are frozen as anchor Gaussians to preserve global structure, while auxiliary Gaussians prevent local overfitting under noisy initialization. Our fully differentiable pipeline with photometric and geometric regularization achieves robust and generalizable calibration, consistently outperforming existing targetless methods on the KITTI-360, Waymo, and Fast-LIVO2 datasets. In addition, it yields more consistent Novel View Synthesis results, reflecting improved extrinsic alignment. The project page is available at: https://www.haebeom.com/tlc-calib-site/.
💡 Research Summary
The paper introduces TLC‑Calib, a target‑free LiDAR‑camera calibration framework that jointly optimizes sensor extrinsics and a neural scene representation based on 3D Gaussian Splatting (3DGS). Traditional calibration methods rely on physical targets (checkerboards, reflective spheres) which are costly and impractical for large‑scale or long‑term deployments. Even target‑based calibrations degrade over time due to mechanical vibrations, thermal expansion, or impacts, necessitating periodic recalibration. Existing target‑less approaches either exploit geometric cues (planes, edges) that struggle with LiDAR sparsity, or use deep networks that require large labeled datasets and often fail to generalize.
TLC‑Calib addresses these limitations by treating the LiDAR as a globally referenced sensor and building a differentiable scene model composed of two types of Gaussians:
- Anchor Gaussians – directly instantiated at reliable LiDAR points after adaptive voxel‑based down‑sampling. Their positions are frozen throughout training, preserving global scale and overall geometry.
- Auxiliary Gaussians – a learned set of K offsets per anchor, generated by a lightweight MLP conditioned on anchor features, viewing direction, and a learned scale. These Gaussians are free to move and adapt, providing local flexibility and allowing the representation to cover regions not observed by LiDAR (e.g., sky, upper building facades).
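The anchor-selection step above can be sketched as voxel-grid down-sampling of the LiDAR cloud. This is a minimal illustration only: the paper chooses the voxel size ε* adaptively, whereas here a fixed `voxel_size` and first-point-per-voxel rule are assumptions.

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Keep one representative LiDAR point per occupied voxel.

    Sketch of anchor-Gaussian selection; the adaptive voxel size (eps*)
    from the paper is replaced here by a fixed `voxel_size`.
    """
    # Assign each point to an integer voxel index.
    idx = np.floor(points / voxel_size).astype(np.int64)
    # Keep the first point that falls into each distinct voxel.
    _, keep = np.unique(idx, axis=0, return_index=True)
    return points[np.sort(keep)]

# Toy cloud: the first two points share a voxel at voxel_size=1.0.
pts = np.array([[0.1, 0.2, 0.3],
                [0.2, 0.1, 0.4],   # same voxel as the first point
                [2.5, 0.0, 0.0]])
anchors = voxel_downsample(pts, voxel_size=1.0)
```

In the full method these surviving points become frozen anchor Gaussians, and the auxiliary offsets are generated per anchor by the MLP.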
The scene is rendered via differentiable rasterization: each Gaussian is projected onto the image plane as a 2‑D Gaussian, sorted front‑to‑back, and composited using alpha blending. Because the projection depends analytically on the camera pose, gradients of a photometric loss can be back‑propagated directly to the extrinsic parameters. The authors adopt a per‑image “rig optimization” strategy: at each iteration a random image (camera c, timestamp t) is sampled, the photometric error between the rendered and the captured image is computed, and the gradient updates the shared extrinsic T_ec for that camera. This yields fast convergence without the memory overhead of batch accumulation.
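The core idea of back-propagating an image-space loss into the extrinsics can be shown on a deliberately tiny toy problem: a translation-only "extrinsic" t is refined by gradient descent so that projected 3D points match observed pixels. This is not the paper's pipeline (which renders Gaussians and uses a photometric loss); the points, focal length, and step size below are made-up values for illustration.

```python
import numpy as np

f = 500.0                                   # focal length (pixels), assumed
pts = np.array([[0.5, 0.2, 4.0],
                [-0.3, 0.1, 5.0],
                [0.2, -0.4, 6.0]])          # toy points in the sensor frame
t_true = np.array([0.10, -0.05, 0.20])      # "ground-truth" offset to recover

def project(p):
    # Pinhole projection of (x, y, z) points to (u, v) pixels.
    return f * p[:, :2] / p[:, 2:3]

target = project(pts + t_true)              # "observed" pixels

def grad_loss(t):
    # Analytic gradient of the squared reprojection error w.r.t. t.
    q = pts + t
    r = project(q) - target                 # pixel residuals
    z = q[:, 2]
    zeros = np.zeros_like(z)
    du = np.stack([f / z, zeros, -f * q[:, 0] / z**2], axis=1)
    dv = np.stack([zeros, f / z, -f * q[:, 1] / z**2], axis=1)
    return 2 * (r[:, 0:1] * du + r[:, 1:2] * dv).sum(axis=0)

t = np.zeros(3)                             # noisy (here: zero) initialization
for _ in range(20000):
    t -= 1e-6 * grad_loss(t)                # gradient step on the extrinsic
```

Because projection depends smoothly on the pose, the same mechanism works when the residual comes from a rendered Gaussian-splat image instead of point reprojection.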
Two regularization mechanisms further stabilize training:
- Adaptive voxel control dynamically selects voxel size ε* based on scene scale, automatically balancing the number of Gaussians against computational cost.
- Gaussian scale regularization penalizes overly anisotropic Gaussians, preventing them from dominating the loss landscape.
The total loss combines the photometric term with geometric constraints (depth consistency, plane alignment) to enforce multi‑modal coherence.
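One plausible reading of the two regularizers and the loss combination is sketched below. The penalty form (axis-ratio threshold) and the weights are assumptions for illustration, not the paper's actual formulation or values.

```python
import numpy as np

def scale_reg(scales: np.ndarray, max_ratio: float = 10.0) -> float:
    """Penalize overly anisotropic Gaussians.

    scales: (N, 3) positive per-Gaussian axis lengths. Gaussians whose
    largest-to-smallest axis ratio exceeds `max_ratio` incur a penalty;
    the threshold is an illustrative choice.
    """
    ratio = scales.max(axis=1) / scales.min(axis=1)
    return float(np.maximum(ratio - max_ratio, 0.0).mean())

def total_loss(l_photo: float, l_geom: float, l_reg: float,
               w_geom: float = 0.1, w_reg: float = 0.01) -> float:
    # Weighted sum of photometric, geometric, and regularization terms;
    # the weights here are placeholders.
    return l_photo + w_geom * l_geom + w_reg * l_reg

scales = np.array([[1.0, 1.0, 1.0],     # isotropic: no penalty
                   [30.0, 1.0, 1.0]])   # needle-like: penalized
```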
Experiments on three real‑world datasets—KITTI‑360, Waymo, and a handheld solid‑state LiDAR setup (Fast‑LIVO2)—demonstrate that TLC‑Calib consistently outperforms prior target‑less methods (CalibAnything, INF, RobustCalib, 3DGS‑Calib). On KITTI‑360, the method achieves a 100 % success rate (defined as ≤1° rotation and ≤20 cm translation error) across diverse motion patterns (straight, zig‑zag, large rotations), with mean rotation error 0.13° and translation error 8.86 cm. Training time is dramatically reduced to roughly 0.15 hours (≈9 minutes), compared with >1 hour for 3DGS‑Calib and >4 hours for deep‑learning baselines, making the approach suitable for on‑vehicle or on‑robot recalibration.
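The success criterion used above (rotation error ≤ 1° and translation error ≤ 20 cm) is straightforward to compute per run. The per-run error values below are hypothetical, not results from the paper.

```python
import numpy as np

def success_rate(rot_err_deg: np.ndarray, trans_err_m: np.ndarray) -> float:
    """Fraction of runs with <=1 deg rotation and <=20 cm translation error."""
    ok = (rot_err_deg <= 1.0) & (trans_err_m <= 0.20)
    return float(ok.mean())

# Hypothetical per-run errors for four calibration runs.
rot = np.array([0.13, 0.40, 1.50, 0.90])    # degrees
trans = np.array([0.08, 0.05, 0.10, 0.25])  # meters
rate = success_rate(rot, trans)             # runs 3 and 4 each fail one bound
```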
Ablation studies confirm the importance of each component: removing auxiliary Gaussians collapses the loss surface into sharp local minima and restricts the representation to LiDAR‑observed regions, degrading novel view synthesis; disabling adaptive voxel control inflates memory usage without accuracy gains.
In summary, TLC‑Calib provides a practical, accurate, and computationally efficient solution for LiDAR‑camera extrinsic calibration without any physical markers. By anchoring the scene to reliable LiDAR geometry while allowing learned auxiliary structures to fill gaps, the method achieves robust convergence even from noisy initial poses. The framework is readily extensible to other sensor modalities (radar, ultrasonic) and could be integrated with online SLAM pipelines for continuous, real‑time calibration in autonomous driving, robotics, and AR/VR applications.