LiFlow: Flow Matching for 3D LiDAR Scene Completion
In autonomous driving scenarios, collected LiDAR point clouds suffer from occlusion and long-range sparsity, limiting the perception capabilities of autonomous driving systems. Scene completion methods infer the missing parts of incomplete 3D LiDAR scenes. Recent methods adopt local point-level denoising diffusion probabilistic models, which require predicting Gaussian noise and therefore suffer from a mismatch between the initial distributions used in training and inference. This paper introduces the first flow matching framework for 3D LiDAR scene completion, improving upon diffusion-based methods by ensuring consistent initial distributions between training and inference. The model employs a nearest neighbor flow matching loss and a Chamfer distance loss to enhance both local structure and global coverage when aligning point clouds. LiFlow achieves state-of-the-art performance across multiple metrics. Code: https://github.com/matteandre/LiFlow.
💡 Research Summary
LiFlow addresses the long‑standing problem of incomplete LiDAR point clouds in autonomous driving, where occlusion and long‑range sparsity leave large gaps that hinder downstream perception tasks. While recent scene‑completion approaches have adopted point‑level denoising diffusion probabilistic models (DDPMs), these methods suffer from a fundamental mismatch: during training the model learns to denoise from a Gaussian‑perturbed version of a complete scene, whereas at inference the initial noisy point cloud is constructed from a single sparse LiDAR scan. This discrepancy is especially detrimental in far‑range or heavily occluded regions, leading to sub‑optimal reconstruction quality.
The authors propose the first flow‑matching (FM) formulation for 3D LiDAR scene completion. FM recasts the diffusion process as a continuous‑time ordinary differential equation (ODE), removing the need for a Gaussian assumption and allowing the same initial distribution to be used in both training and inference. By leveraging data‑dependent couplings, the target complete scene \(G\) and the noisy initial cloud \(P_T\) are linked through a linear interpolation path \(\phi_C^t = t\,G + (1-t)\,P_T\) with corresponding velocity field \(v_C^t = G - P_T\).
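The linear path and its constant velocity field can be sketched in a few lines. This is a minimal illustration of the interpolation formula above; the function name and array shapes are assumptions, not taken from the released code:

```python
import numpy as np

def conditional_flow(G, P_T, t):
    """Linear interpolation path phi_C^t = t*G + (1-t)*P_T.

    G   : (N, 3) complete target scene
    P_T : (N, 3) noisy initial cloud built from the sparse scan
    t   : scalar in [0, 1]
    """
    phi_t = t * G + (1.0 - t) * P_T
    v_t = G - P_T  # velocity field is constant along the linear path
    return phi_t, v_t
```

At \(t=0\) the path sits on the noisy cloud and at \(t=1\) it reaches the complete scene, which is exactly the training/inference consistency the paper is after.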
Because point clouds lack a natural one‑to‑one correspondence, the authors introduce two complementary losses. The first, Nearest‑Neighbor Flow Matching (NFM), establishes point‑wise correspondences by assigning each noisy point its nearest neighbor in the target cloud. This yields a conditional flow (\phi_N^t) and velocity (v_N^t) that respect the nearest‑neighbor geometry, and the NFM loss penalizes the L2 deviation between the predicted vector field and the true nearest‑neighbor displacement. NFM preserves local structure while learning a smooth transformation from the noisy to the complete distribution.
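A brute-force sketch of the nearest-neighbor coupling and the resulting NFM loss, assuming a per-point L2 penalty (helper names are illustrative; a real implementation would use a KD-tree or GPU kernel instead of a dense distance matrix):

```python
import numpy as np

def nearest_neighbor_velocity(P_T, G):
    """For each noisy point, the displacement toward its nearest target point."""
    # pairwise squared distances, shape (M, N)
    d2 = ((P_T[:, None, :] - G[None, :, :]) ** 2).sum(-1)
    nn = G[d2.argmin(axis=1)]  # nearest target point for each noisy point
    return nn - P_T            # v_N^t under the nearest-neighbor coupling

def nfm_loss(pred_v, P_T, G):
    """Mean squared deviation between the predicted vector field
    and the nearest-neighbor displacement."""
    v_nn = nearest_neighbor_velocity(P_T, G)
    return float(((pred_v - v_nn) ** 2).mean())
```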
The second loss, Chamfer Distance Matching (CDM), mitigates the many‑to‑one mapping issue inherent in NFM (multiple noisy points may map to the same target point). CDM directly minimizes the Chamfer distance between the transformed point set and the ground‑truth scene, encouraging the generated cloud to spread out and achieve full occupancy of the target geometry. By combining NFM (which enforces local fidelity) with CDM (which enforces global coverage), LiFlow balances fine‑grained detail and overall completeness.
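The symmetric Chamfer distance that CDM minimizes can be written directly; this naive O(MN)-memory version is a sketch for small clouds (production code would batch or use an approximate nearest-neighbor structure):

```python
import numpy as np

def chamfer_distance(X, Y):
    """Symmetric Chamfer distance between point sets X (M, 3) and Y (N, 3):
    mean nearest-neighbor squared distance in both directions."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

Because the second term penalizes target points with no nearby prediction, minimizing it pushes the generated cloud to cover the whole scene, which is precisely the global-coverage role CDM plays alongside NFM.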
Architecturally, LiFlow adopts the MinkUNet backbone used in prior LiDAR diffusion works, replacing batch normalization with instance normalization to better handle the variability of point clouds. Training employs classifier‑free guidance: the model learns both unconditional and conditional vector fields, and at inference the conditional field is obtained as a weighted combination of the two. The overall loss is \(\mathcal{L} = \lambda_{\mathrm{NFM}}\,\mathcal{L}_{\mathrm{NFM}} + \lambda_{\mathrm{CDM}}\,\mathcal{L}_{\mathrm{CDM}}\), with \(\lambda_{\mathrm{NFM}} = 1\) and \(\lambda_{\mathrm{CDM}} = 0.1\) in the reported experiments.
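The guidance combination and the weighted loss reduce to two one-liners. The guidance form below is the common classifier-free-guidance rule; the paper's exact weighting may differ, so treat this as an assumption:

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, w):
    """Classifier-free guidance: push the unconditional field toward the
    conditional one with guidance weight w (w=0 -> unconditional,
    w=1 -> conditional). The exact rule used in the paper is assumed here."""
    return v_uncond + w * (v_cond - v_uncond)

def total_loss(l_nfm, l_cdm, lam_nfm=1.0, lam_cdm=0.1):
    """L = lambda_NFM * L_NFM + lambda_CDM * L_CDM with the reported weights."""
    return lam_nfm * l_nfm + lam_cdm * l_cdm
```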
Experiments are conducted on SemanticKITTI (training sequences 00‑07, 09‑10; validation 08) and the Apollo Columbia Park dataset. Training runs for 20 epochs on an NVIDIA A100, using farthest point sampling to obtain 18 k input points and 180 k target points (a 10× upsampling factor). Inference integrates the learned ODE with Euler’s method over 10 steps (step size 0.1) and applies a refinement network to further upsample the output by a factor of six.
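The 10-step Euler integration of the learned ODE amounts to the standard forward-Euler loop; the callback signature below is an assumption standing in for the trained network:

```python
import numpy as np

def euler_integrate(x0, velocity_fn, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler's method
    (10 steps of size 0.1, matching the reported inference setting).

    velocity_fn(x, t) stands in for the trained vector-field network.
    """
    x, dt = x0.copy(), 1.0 / steps
    t = 0.0
    for _ in range(steps):
        x = x + dt * velocity_fn(x, t)
        t += dt
    return x
```

With the linear path's constant velocity \(G - P_T\), this loop recovers the target exactly, which is why so few integration steps suffice.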
Evaluation metrics include Chamfer Distance (CD), Jensen‑Shannon Divergence (JSD), and Voxel IoU at three resolutions (0.5 m, 0.2 m, 0.1 m). LiFlow consistently outperforms prior diffusion‑based methods (LiDiff, LiDPM) and voxel‑based approaches (LMSCNet, LODE, MID, PVD). With refinement, LiFlow achieves the lowest CD (≈0.023), the best JSD, and the highest Voxel IoU across all resolutions, demonstrating superior reconstruction fidelity and occupancy accuracy.
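Of the three metrics, Voxel IoU is the least standardized; a plausible minimal implementation (voxelization convention assumed, not taken from the paper's evaluation code) is:

```python
import numpy as np

def voxel_iou(pred, gt, voxel_size):
    """IoU between the occupied-voxel sets of two point clouds
    at a given resolution (e.g. 0.5 m, 0.2 m, 0.1 m)."""
    vp = set(map(tuple, np.floor(pred / voxel_size).astype(int)))
    vg = set(map(tuple, np.floor(gt / voxel_size).astype(int)))
    union = len(vp | vg)
    return len(vp & vg) / union if union else 1.0
```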
The paper’s contributions are threefold: (1) introducing the first flow‑matching framework for 3D LiDAR scene completion, (2) solving the initial‑distribution mismatch via nearest‑neighbor flow matching and Chamfer distance losses, and (3) establishing new state‑of‑the‑art performance on multiple benchmarks. The work opens avenues for more sophisticated conditional flows, real‑time lightweight variants, and multimodal extensions (e.g., LiDAR‑camera fusion) in future research.