Single-Slice-to-3D Reconstruction in Medical Imaging and Natural Objects: A Comparative Benchmark with SAM 3D
While three-dimensional imaging is essential for clinical diagnosis, its high cost and long wait times have motivated the use of image-to-3D foundation models to infer volumetric structure from two-dimensional modalities. However, because these models are trained on natural images, their learned geometric priors struggle to transfer to inherently planar medical data. A benchmark of five state-of-the-art models (SAM3D, Hunyuan3D-2.1, Direct3D, Hi3DGen, and TripoSG) across six medical and two natural datasets revealed that voxel-based overlap remains uniformly low across all methods due to severe depth ambiguity from single-slice inputs. Despite this fundamental volumetric failure, global distance metrics indicate that SAM3D best captures topological similarity to ground-truth medical shapes, whereas the alternative models are prone to oversimplification. Ultimately, these findings quantify the limits of zero-shot single-slice 3D inference, highlighting that reliable medical 3D reconstruction requires domain-specific adaptation and anatomical constraints to handle complex medical geometries.
💡 Research Summary
This paper presents a comprehensive benchmark of zero‑shot single‑slice‑to‑3D reconstruction using five state‑of‑the‑art image‑to‑3D foundation models—SAM3D, Hunyuan3D‑2.1, Direct3D, Hi3DGen, and TripoSG—across six medical datasets (three anatomical: AeroPath, BTCV, Duke C‑Spine; three pathological: MSD Lung, MSD Brain, MSD Liver) and two natural‑object datasets (Google Scanned Objects, Animal3D). The authors follow a uniform pipeline: a middle slice is extracted from each 3D volume, masked by its segmentation, and fed to each model as a 2D input. Models output either voxel grids or point clouds, which are evaluated against ground‑truth surface point clouds using voxel‑based overlap metrics (F1, Voxel IoU, Voxel Dice) and global shape distance metrics (Chamfer Distance, Earth Mover’s Distance).
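The slice-extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' actual code: the function name, the choice of axis, and the zero-filling outside the mask are assumptions; the paper only states that a middle slice is extracted and masked by its segmentation.

```python
import numpy as np

def extract_masked_middle_slice(volume, segmentation, axis=0):
    """Take the middle slice of a 3D volume along `axis` and zero out
    pixels outside the segmentation mask (hypothetical preprocessing;
    the benchmark's exact pipeline details are not specified here)."""
    mid = volume.shape[axis] // 2
    slice_2d = np.take(volume, mid, axis=axis)        # 2D middle slice
    mask_2d = np.take(segmentation, mid, axis=axis) > 0
    return np.where(mask_2d, slice_2d, 0)             # mask the slice

# Toy example: 4x4x4 volume with a 2x2x2 "organ" segmented in the center.
vol = np.full((4, 4, 4), 7.0)
seg = np.zeros((4, 4, 4))
seg[1:3, 1:3, 1:3] = 1
out = extract_masked_middle_slice(vol, seg, axis=0)
print(out.shape, out.sum())  # (4, 4) 28.0 -> only the 2x2 masked region survives
```

Choosing a different `axis` here corresponds to feeding an axial versus a coronal slice, which the results later show can change reconstruction quality markedly.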
The results reveal a stark dichotomy. Voxel‑based metrics are uniformly low for all models on medical data: F1 scores hover below 0.10, Voxel IoU below 0.16, and Voxel Dice below 0.26, regardless of architecture. This failure is attributed to the intrinsic lack of depth cues (shading gradients, occlusion boundaries, multi‑object relationships) in a single CT or MRI slice, which leaves the depth inference problem severely under‑constrained. Consequently, all methods produce near‑planar reconstructions with minimal volumetric extent. The problem is exacerbated for pathological datasets, whose irregular, non‑convex morphologies deviate sharply from the smooth, compact priors learned from natural images, leading to even poorer voxel overlap.
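The voxel-overlap metrics driving this conclusion can be computed as below. This is a standard formulation of IoU and Dice on binary voxel grids; the benchmark's voxelization resolution and alignment procedure are not specified here, so treat the toy numbers as illustrative only. The example deliberately reproduces the "near-planar reconstruction" failure mode: a single occupied slice compared against a full ground-truth volume.

```python
import numpy as np

def voxel_overlap(pred, gt):
    """IoU and Dice between two binary voxel grids (standard definitions;
    the benchmark's voxelization settings are assumptions)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice

# Near-planar prediction vs. a solid cube: the "depth collapse" pattern.
gt = np.ones((8, 8, 8))
pred = np.zeros((8, 8, 8))
pred[4] = 1                      # model predicts a single occupied slice
iou, dice = voxel_overlap(pred, gt)
print(round(iou, 3), round(dice, 3))  # 0.125 0.222
```

Even a geometrically "reasonable" planar guess caps out at low overlap, which is consistent with the uniformly low scores reported above.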
In contrast, global shape metrics expose meaningful performance hierarchies. SAM3D consistently achieves the lowest Chamfer Distance and Earth Mover’s Distance across all medical datasets, indicating that it captures the overall point‑cloud distribution and coarse morphology better than its peers, even when fine‑grained depth is inaccurate. Hi3DGen follows closely in several cases, while TripoSG and Hunyuan3D‑2.1 sometimes produce near‑zero voxel overlap and higher distance values, reflecting a “depth collapse” failure mode. The authors also observe a strong dependence on the input plane: the same model can yield markedly different results when fed a coronal versus an axial slice, underscoring that the silhouette information provided by the chosen plane directly modulates reconstruction fidelity.
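For reference, a common form of the Chamfer Distance used in such global-shape comparisons can be sketched as below. This is one standard symmetric variant (mean nearest-neighbour Euclidean distance in both directions); the benchmark may use a squared or one-sided form, and EMD additionally requires an optimal-transport solver, which is omitted here.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point clouds a (N,3) and b (M,3):
    mean nearest-neighbour distance in each direction, summed.
    (A common variant; the paper's exact formulation is an assumption.)"""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Identical clouds give zero; a uniform 0.5 offset contributes 0.5 per direction.
a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = a + np.array([0.0, 0.5, 0.0])
print(chamfer_distance(a, a))  # 0.0
print(chamfer_distance(a, b))  # 1.0
```

Because Chamfer Distance averages over nearest neighbours, a reconstruction can score well on it while still being volumetrically wrong, which is exactly the dichotomy between SAM3D's strong distance metrics and the uniformly poor voxel overlap.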
Natural‑object datasets tell a complementary story. Voxel‑based scores are substantially higher (e.g., SAM3D reaches IoU ≈ 0.18 and Dice ≈ 0.29 on GSO), and all models achieve lower Chamfer and EMD values (≈ 0.15–0.40) than on medical data. This confirms that the rich texture, shading, and multi‑object context of natural images supply the depth cues that foundation models rely on. On Animal3D, the competitive landscape shifts: TripoSG and Hi3DGen lead voxel metrics, while SAM3D remains strong on distance metrics, highlighting dataset‑specific strengths.
The discussion synthesizes these findings into actionable insights. First, the uniformly low voxel overlap demonstrates a fundamental ceiling for single‑slice medical reconstruction using off‑the‑shelf foundation models; depth ambiguity cannot be resolved without additional information. Second, the superior global‑shape performance of SAM3D suggests that point‑cloud‑centric architectures retain useful geometric priors even under severe depth uncertainty. Third, the pronounced gap between anatomical and pathological targets indicates that models trained on predominantly smooth, convex objects struggle with the complex topology of tumors and lesions.
To move toward clinically viable 3D reconstruction, the authors propose three complementary strategies: (1) multi‑view or multi‑plane input to supply missing depth cues; (2) incorporation of anatomical priors or physics‑based constraints (e.g., organ‑specific shape models, volume regularization) to guide the generative process; and (3) domain‑specific fine‑tuning or adaptation of foundation models on annotated medical data to bridge the natural‑to‑medical domain gap. Without such interventions, zero‑shot single‑slice reconstruction remains limited to coarse shape estimation rather than precise volumetric fidelity.
In summary, this benchmark quantifies the limits of current zero‑shot single‑slice‑to‑3D methods, demonstrates that SAM3D offers the best global shape fidelity among the evaluated models, and outlines clear research directions—multi‑view integration, anatomical constraints, and domain adaptation—necessary to achieve reliable, high‑quality 3D reconstructions for medical applications.