Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering
High-fidelity rendering of dynamic humans from monocular videos typically degrades catastrophically under occlusions. Existing solutions incorporate external priors: they either hallucinate missing content via generative models, which induces severe temporal flickering, or impose rigid geometric heuristics that fail to capture diverse appearances. Instead, we reformulate the task as a Maximum A Posteriori estimation problem under heteroscedastic observation noise. In this paper, we propose U-4DGS, a framework integrating a Probabilistic Deformation Network and a Double Rasterization pipeline. This architecture renders pixel-aligned uncertainty maps that act as an adaptive gradient modulator, automatically attenuating artifacts from unreliable observations. Furthermore, to prevent geometric drift in regions lacking reliable visual cues, we enforce Confidence-Aware Regularizations, which leverage the learned uncertainty to selectively propagate spatio-temporal validity. Extensive experiments on ZJU-MoCap and OcMotion demonstrate that U-4DGS achieves state-of-the-art rendering fidelity and robustness.
💡 Research Summary
The paper tackles the long‑standing problem of rendering dynamic humans from monocular video when the subject is partially occluded. While recent advances in Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have enabled photorealistic avatars under full visibility, they break down dramatically in the presence of occlusions because they treat every pixel as equally reliable. To overcome this, the authors reformulate the reconstruction as a Maximum‑A‑Posteriori (MAP) estimation problem that explicitly models heteroscedastic observation noise. Each pixel’s likelihood is assumed to follow a Laplacian distribution whose scale parameter σ represents aleatoric uncertainty. This σ is learned jointly with the canonical geometry and acts as an inverse weight in the loss, automatically down‑weighting gradients from unreliable (occluded) regions.
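The down-weighting behaviour described above follows directly from the Laplacian negative log-likelihood: dividing the residual by σ attenuates gradients from uncertain pixels, while the log σ term stops the model from inflating σ everywhere. A minimal NumPy sketch of this per-pixel loss (the function name and ε floor are illustrative, not from the paper):

```python
import numpy as np

def laplacian_nll(pred, target, sigma, eps=1e-6):
    """Per-pixel negative log-likelihood under a Laplacian with scale sigma.

    |residual| / sigma shrinks the gradient where sigma is large
    (likely occluded pixels); log(sigma) penalises trivially
    predicting huge uncertainty everywhere. The constant log(2)
    of the Laplacian density is dropped.
    """
    sigma = np.maximum(sigma, eps)  # avoid division by zero
    return np.abs(pred - target) / sigma + np.log(sigma)

# Same residual, different confidence: a confidently wrong pixel
# (small sigma) incurs a far larger loss than an uncertain one.
confident = laplacian_nll(1.0, 0.0, 0.1)
uncertain = laplacian_nll(1.0, 0.0, 2.0)
```

For the numbers above, `confident` ≈ 7.70 while `uncertain` ≈ 1.19, which is exactly the "inverse weight" effect the summary describes.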
The core of the proposed system, called U‑4DGS (Uncertainty‑Aware 4D Gaussian Splatting), consists of three tightly coupled components. First, a Probabilistic Deformation Network takes a time embedding and SMPL pose as input and predicts per‑primitive deformation offsets (Δr, Δµ, Δs) together with a per‑primitive uncertainty σᵢ. By attaching σᵢ to each Gaussian, the network quantifies the confidence of the temporal correspondence for that primitive. Second, a Double Rasterization pipeline renders the deformed Gaussians twice: one branch produces the usual photometric image, while the other accumulates the per‑primitive uncertainties into a pixel‑aligned Uncertainty Map Ũ. Bright regions in Ũ indicate high uncertainty (i.e., likely occlusion). Third, the MAP objective incorporates Ũ as an adaptive gradient modulator: the photometric loss becomes L_NLL = Σ_p (|I_gt(p) − Î(p)| / Ũ(p) + log Ũ(p)), summed over pixels p, so residuals at pixels flagged as unreliable are attenuated while the log Ũ(p) term discourages inflating the uncertainty indiscriminately.
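The Double Rasterization branch can be understood as reusing the same front-to-back alpha-compositing weights twice: once over per-primitive colours (the photometric image) and once over the per-primitive σᵢ (the uncertainty map Ũ). The sketch below illustrates this for a single pixel with toy numbers; the function name and values are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def composite_per_pixel(values, alphas):
    """Front-to-back alpha compositing of per-primitive scalars at one pixel.

    With values = colours this is the standard photometric splatting
    pass; with values = per-primitive sigmas it produces the pixel's
    entry in the uncertainty map U~. Both passes share the same
    compositing weights T_i * alpha_i, hence 'double rasterization'.
    """
    out, transmittance = 0.0, 1.0
    for v, a in zip(values, alphas):
        out += transmittance * a * v          # weighted contribution
        transmittance *= (1.0 - a)            # light surviving this primitive
    return out

colors = np.array([0.8, 0.2, 0.5])   # per-primitive colour (one channel, toy)
sigmas = np.array([0.05, 1.5, 0.3])  # per-primitive uncertainty (toy)
alphas = np.array([0.6, 0.3, 0.9])   # per-primitive opacity at this pixel

pixel_color = composite_per_pixel(colors, alphas)  # photometric branch
pixel_sigma = composite_per_pixel(sigmas, alphas)  # uncertainty branch
```

Because the second primitive carries a large σ, the composited `pixel_sigma` is elevated even though the front primitive is confident, which is the behaviour needed for Ũ to highlight partially occluded pixels.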