SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors
Reconstructing 3D scenes from sparse images remains a challenging task due to the difficulty of recovering accurate geometry and texture without optimization. Recent approaches leverage generalizable models to generate 3D scenes using 3D Gaussian Splatting (3DGS) primitives. However, they often fail to produce continuous surfaces and instead yield discrete, color-biased point clouds that appear plausible at normal resolution but reveal severe artifacts under close-up views. To address this issue, we present SurfSplat, a feedforward framework based on 2D Gaussian Splatting (2DGS) primitives, which provide stronger anisotropy and higher geometric precision. By incorporating a surface continuity prior and a forced alpha blending strategy, SurfSplat reconstructs coherent geometry together with faithful textures. Furthermore, we introduce High-Resolution Rendering Consistency (HRRC), a new metric designed to evaluate high-resolution reconstruction quality. Extensive experiments on RealEstate10K, DL3DV, and ScanNet demonstrate that SurfSplat consistently outperforms prior methods on both standard metrics and HRRC, establishing a robust solution for high-fidelity 3D reconstruction from sparse inputs. Project page: https://hebing-sjtu.github.io/SurfSplat-website/
💡 Research Summary
SurfSplat tackles the long‑standing problem of reconstructing high‑fidelity 3D scenes from only a few input images without per‑scene optimization. While recent feed‑forward approaches have adopted 3D Gaussian Splatting (3DGS) as a generalizable representation, they typically produce sparse, color‑biased point clouds that lack surface continuity. The artifacts are often invisible at normal resolution but become glaring under close‑up or off‑axis views, and standard NVS metrics (PSNR, SSIM, LPIPS) fail to capture these failures.
The authors propose a fundamentally different pipeline: they replace 3DGS with 2D Gaussian Splatting (2DGS), which represents the scene with anisotropic planar 2‑D Gaussians (surfels) embedded in 3‑D space. 2DGS offers stronger anisotropy and higher geometric precision, but directly regressing its parameters is unstable because geometry and appearance are tightly coupled. To stabilize training, SurfSplat introduces two key mechanisms:
- Surface Continuity Prior – The prior assumes that neighboring pixels on a coherent surface correspond to neighboring surfels in 3D. By applying Sobel filters to the 3‑D positions of a pixel and its immediate neighbors, the method estimates a local surface normal, computes a rotation matrix that aligns the canonical normal to this estimate (via Rodrigues’ formula), and derives anisotropic scales from image‑space distances. The network only predicts scale multipliers, which are bounded and multiplied with the coarse estimates, ensuring that rotation and scale vary smoothly across the surface.
- Forced Alpha Blending – During training the opacity (α) of each Gaussian is deliberately reduced, forcing even occluded or low‑visibility Gaussians to contribute to the rendered image. This prevents the network from collapsing α to zero (which would hide surface holes) and reduces color bias caused by overly dominant Gaussians.
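The core of the surface continuity prior can be sketched as follows: estimate a per-pixel normal from the unprojected 3-D position map, then build the surfel rotation with Rodrigues' formula. This is a minimal NumPy sketch; the central-difference stencil stands in for the paper's Sobel filters, and the function name and interface are illustrative, not the authors' code.

```python
import numpy as np

def estimate_surfel_pose(points, i, j):
    """Estimate a surfel rotation from a per-pixel 3-D position map.

    points: (H, W, 3) array of unprojected 3-D positions, one per pixel.
    Returns a 3x3 rotation aligning the canonical normal (0, 0, 1) to
    the locally estimated surface normal at pixel (i, j).
    """
    # Finite differences over neighboring pixels approximate the two
    # tangent directions of the surface (the paper uses Sobel filters;
    # this simpler central-difference stencil is a stand-in).
    du = points[i, j + 1] - points[i, j - 1]
    dv = points[i + 1, j] - points[i - 1, j]
    n = np.cross(du, dv)
    n = n / (np.linalg.norm(n) + 1e-12)

    # Rodrigues' formula: rotate the canonical normal z onto n.
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(z, n)
    s = np.linalg.norm(axis)           # sin(theta)
    c = float(np.dot(z, n))            # cos(theta)
    if s < 1e-8:                       # n already (anti-)parallel to z
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    k = axis / s
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)
```

The network would then only predict bounded multipliers on top of the scales derived from these poses, keeping rotation and scale smooth across the surface.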
The architecture is a dual‑path encoder. A single‑view branch uses a pretrained monocular depth model (Depth Anything V2) to provide dense depth cues, while a multi‑view branch builds low‑resolution feature maps, applies self‑ and cross‑attention, and constructs cost volumes via plane‑sweep stereo. The concatenated features are fed into a 2‑D U‑Net that predicts intermediate attributes (depth, scale multipliers, spherical‑harmonics coefficients, opacity). These intermediates are then transformed into final Gaussian parameters using the surface continuity prior and forced alpha blending. All components are fully differentiable, enabling end‑to‑end training from sparse image sets (as few as two views).
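The plane-sweep stereo step in the multi-view branch can be illustrated with a minimal two-view cost volume: for each depth hypothesis, reference-view pixels are back-projected onto a fronto-parallel plane, reprojected into the source view, and the sampled features are correlated. This is a hedged sketch, not the paper's implementation; the function name, the two-view setup, and the nearest-neighbor sampling (bilinear in practice) are illustrative assumptions.

```python
import numpy as np

def plane_sweep_cost_volume(feat_ref, feat_src, K, T_ref_to_src, depths):
    """Sketch of a two-view plane-sweep cost volume.

    feat_ref, feat_src: (C, H, W) feature maps; K: (3, 3) shared
    intrinsics; T_ref_to_src: (4, 4) relative pose; depths: iterable of
    depth hypotheses. Returns a (D, H, W) volume of per-pixel feature
    correlations.
    """
    C, H, W = feat_ref.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])   # (3, H*W)
    rays = np.linalg.inv(K) @ pix                              # camera rays
    R, t = T_ref_to_src[:3, :3], T_ref_to_src[:3, 3:4]
    volume = []
    for d in depths:
        # Back-project to the plane at depth d, project into the source view.
        p = K @ (R @ (rays * d) + t)
        u = p[0] / np.clip(p[2], 1e-6, None)
        v = p[1] / np.clip(p[2], 1e-6, None)
        # Nearest-neighbor sampling of source features; out-of-bounds
        # pixels contribute zero cost.
        ui, vi = np.round(u).astype(int), np.round(v).astype(int)
        valid = (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
        warped = np.zeros((C, H * W))
        warped[:, valid] = feat_src[:, vi[valid], ui[valid]]
        corr = (feat_ref.reshape(C, -1) * warped).mean(axis=0)
        volume.append(corr.reshape(H, W))
    return np.stack(volume)                                    # (D, H, W)
```

In the full model this volume is concatenated with the monocular-depth features and attention outputs before the 2-D U-Net predicts the intermediate Gaussian attributes.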
To evaluate the quality of reconstructions beyond traditional metrics, the paper introduces High‑Resolution Rendering Consistency (HRRC). HRRC renders the predicted scene at high resolutions (e.g., 4K) from a dense set of novel viewpoints and measures consistency of the rendered images. Because HRRC uses only the existing datasets and does not require additional ground‑truth geometry, it can expose hidden defects such as surface discontinuities, voids, and color bleed that are invisible at low resolution.
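One plausible way to realize such a metric is sketched below: render each viewpoint at both the base and a much higher resolution, average-pool the high-resolution render back down, and score the agreement. Surface holes and color bleed that only emerge at high resolution survive the pooling and lower the score. The paper's exact HRRC formulation is not reproduced here; the function name, the PSNR-style score, and the downsample-and-compare scheme are assumptions for illustration.

```python
import numpy as np

def hrrc_score(render_fn, viewpoints, base_res=(256, 256), hi_res=(2048, 2048)):
    """Sketch of a High-Resolution Rendering Consistency style score.

    render_fn(view, resolution) -> float image array (H, W, 3).
    hi_res must be an integer multiple of base_res per axis.
    """
    scores = []
    for view in viewpoints:
        lo = render_fn(view, base_res)
        hi = render_fn(view, hi_res)
        # Average-pool the high-res render back to the base resolution.
        fh, fw = hi_res[0] // base_res[0], hi_res[1] // base_res[1]
        hi_down = hi.reshape(base_res[0], fh, base_res[1], fw, 3).mean(axis=(1, 3))
        mse = float(np.mean((lo - hi_down) ** 2))
        scores.append(10.0 * np.log10(1.0 / max(mse, 1e-12)))  # PSNR-style, dB
    return float(np.mean(scores))
```

Because only renders of the predicted scene are compared, no extra ground-truth geometry is needed, matching the property described above.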
Experiments on RealEstate10K, DL3DV, and ScanNet demonstrate that SurfSplat consistently outperforms prior feed‑forward 3DGS‑based methods (PixelSplat, MVSplat, FreeSplat, etc.) on both standard NVS metrics and HRRC. Qualitatively, SurfSplat’s reconstructions show far fewer holes, smoother surfaces, and more faithful textures when examined under close‑up or off‑axis views. Quantitatively, it achieves higher PSNR and SSIM, lower LPIPS, and a substantial improvement in HRRC, confirming that the surface continuity prior and forced alpha blending effectively regularize the 2DGS representation. Runtime remains in the millisecond range, preserving the real‑time capability of feed‑forward pipelines.
In summary, SurfSplat advances the state of the art in generalizable 3D reconstruction by (i) leveraging the higher geometric fidelity of 2D Gaussian splats, (ii) enforcing smooth, physically plausible surface geometry through a novel continuity prior, (iii) mitigating opacity collapse with forced alpha blending, and (iv) providing a new high‑resolution evaluation metric (HRRC) that better reflects real‑world visual quality. The method delivers continuous, high‑resolution 3D scenes from sparse inputs without per‑scene optimization, opening the door for scalable applications in VR/AR, gaming, and digital content creation.