Neural Implicit 3D Cardiac Shape Reconstruction from Sparse CT Angiography Slices Mimicking 2D Transthoracic Echocardiography Views


Accurate 3D representations of cardiac structures allow quantitative analysis of anatomy and function. In this work, we propose a method for reconstructing complete 3D cardiac shapes from segmentations of sparse planes in CT angiography (CTA) for application in 2D transthoracic echocardiography (TTE). Our method uses a neural implicit function to reconstruct the 3D shape of the cardiac chambers and left-ventricle myocardium from sparse CTA planes. To investigate the feasibility of achieving 3D reconstruction from 2D TTE, we select planes that mimic the standard apical 2D TTE views. During training, a multi-layer perceptron learns shape priors from 3D segmentations of the target structures in CTA. At test time, the network reconstructs 3D cardiac shapes from segmentations of TTE-mimicking CTA planes by jointly optimizing the latent code and the rigid transforms that map the observed planes into 3D space. For each heart, we simulate four realistic apical views, and we compare reconstructed multi-class volumes with the reference CTA volumes. On a held-out set of CTA segmentations, our approach achieves an average Dice coefficient of 0.86 ± 0.04 across all structures. Our method also achieves markedly lower volume errors than the clinical standard, Simpson’s biplane rule: 4.88 ± 4.26 mL vs. 8.14 ± 6.04 mL, respectively, for the left ventricle; and 6.40 ± 7.37 mL vs. 37.76 ± 22.96 mL, respectively, for the left atrium. This suggests that our approach offers a viable route to more accurate 3D chamber quantification in 2D transthoracic echocardiography.


💡 Research Summary

Background
Three‑dimensional (3D) transthoracic echocardiography (TTE) offers detailed anatomical insight, but its limited spatial and temporal resolution means that clinicians still rely largely on two‑dimensional (2D) acquisitions. Quantitative assessment of chamber size and function therefore depends on geometric assumptions such as Simpson’s biplane rule, which can systematically underestimate volumes when the imaging plane misses the true cardiac apex (foreshortening). Recent work has attempted to reconstruct 3D cardiac geometry from 2D echo by estimating view pose and fitting statistical shape models, graph neural networks, or neural implicit functions. However, most of these studies train on synthetic shapes, and many update all network weights at test time, which risks catastrophic forgetting of the learned shape prior.

Data
The authors used a real‑world dataset of 452 patients who underwent ECG‑gated cardiac CT angiography (CTA) for stroke work‑up. A previously validated deep‑learning model segmented five cardiac structures in these scans (left atrium, left‑ventricle blood pool, right atrium, right ventricle, and left‑ventricular myocardium) at end‑diastole. After visual quality control, 153 cases remained. All volumes share the same anatomical orientation and in‑plane matrix size (512 × 512 × number of slices).

Method Overview

  1. Shape Prior Learning – An auto‑decoder‑style neural implicit function fθ maps a 3‑D coordinate x∈ℝ³ together with a 128‑dimensional latent code z∈ℝ¹²⁸ to a six‑class softmax occupancy vector (background + five cardiac structures). The MLP consists of eight hidden layers (width 128) with skip connections. For each training subject a unique latent vector is learned jointly with the network weights, using a loss that combines categorical cross‑entropy, multi‑class Dice, and an L2 regularizer on z (λ = 1e‑4). Training runs for 1,800 epochs, batch size 8, sampling 64 random points per volume per iteration.

  2. Simulated TTE Views – From each full‑heart CTA segmentation the authors compute the center of mass (CoM) of each chamber and locate the apex a as the left‑ventricular voxel farthest from the left‑atrial CoM. From these landmarks they define a canonical long‑axis direction (e_v) and a short‑axis direction (e_u). Four standard apical views are generated: A4C (spanned by e_u and e_v), A3C (e_u rotated 45° about e_v), A2C (e_u rotated 90° about e_v), and A5C (A4C tilted 5° superiorly). Each view is rendered as a 256 × 256 pixel mask covering a physical extent of 1.5 × |a − CoM_LA|, with nearest‑neighbor labeling from the 3‑D segmentation.

  3. Test‑Time Optimization – To mimic the uncertainty of real free‑hand echo, Gaussian noise (σ = 5 mm) is added to the landmarks used for slicing, which perturbs the initial view poses. During inference the latent code z and a rigid transformation for each view (axis‑angle rotation α∈ℝ³ and translation t∈ℝ³) are jointly optimized. The transformed pixel coordinates are given by x_ij(α,t) = (a + t) + α_ij R(α) e_u + β_ij R(α) e_v, where a is the apex landmark, α_ij and β_ij are the scalar in‑plane offsets of pixel (i, j) along e_u and e_v (not to be confused with the rotation vector α), and R(α) is the rotation matrix obtained via Rodrigues’ formula. The loss again combines cross‑entropy and Dice, with the same L2 regularization on z. Optimization proceeds in two phases: the first 100 steps update only z, and the remaining 900 steps update both z and the pose parameters using Adam (learning rate 1e‑2). After convergence, a dense 3‑D grid matching the reference volume is queried, and the class with the highest softmax probability is taken as the final label.
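The pixel‑to‑3D mapping in step 3 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names `rodrigues` and `plane_points`, the centring of the pixel grid on the plane origin, and the `extent` argument are assumptions made for the example.

```python
import numpy as np

def rodrigues(alpha):
    """Rotation matrix for an axis-angle vector (angle = ||alpha||, in radians)."""
    alpha = np.asarray(alpha, dtype=float)
    theta = np.linalg.norm(alpha)
    if theta < 1e-12:
        return np.eye(3)  # zero vector means no rotation
    k = alpha / theta  # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def plane_points(a, e_u, e_v, alpha, t, extent, n=256):
    """3D coordinates x_ij of an n x n view plane after a rigid pose update.

    a: plane origin (apex landmark); e_u, e_v: orthonormal in-plane axes;
    alpha: axis-angle rotation; t: translation; extent: side length in mm.
    """
    R = rodrigues(alpha)
    # Assumed convention: in-plane offsets are centred on the plane origin.
    s = np.linspace(-extent / 2.0, extent / 2.0, n)
    off_u, off_v = np.meshgrid(s, s, indexing="ij")
    u = R @ np.asarray(e_u, dtype=float)  # rotated in-plane axes
    v = R @ np.asarray(e_v, dtype=float)
    origin = np.asarray(a, dtype=float) + np.asarray(t, dtype=float)
    # x_ij = (a + t) + off_u_ij * R e_u + off_v_ij * R e_v, shape (n, n, 3)
    return origin + off_u[..., None] * u + off_v[..., None] * v
```

In a differentiable framework the same expression lets gradients of the segmentation loss reach the rotation and translation of each view; the NumPy version only shows the geometry.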

Experiments
The dataset was split into 100 training, 13 validation, and 40 test subjects. Four experimental conditions were evaluated: (i) realistic setting with perturbed poses and joint latent‑pose optimization, (ii) ablation where poses are fixed (latent‑only), (iii) ideal setting with unperturbed (true) poses and latent‑only optimization, and (iv) comparison with Simpson’s biplane rule using only the A2C and A4C views.
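For reference, the biplane method of disks behind condition (iv) stacks elliptical disks along the chamber's long axis, V = (π/4) Σ a_i b_i (L/n), typically with n = 20 disks. The sketch below is a plain‑Python illustration. The function name, the argument names, and the millimetre inputs are assumptions, and it does not cover how the per‑disk diameters would be measured from the A4C and A2C masks.

```python
import math

def simpson_biplane_volume(diam_a4c_mm, diam_a2c_mm, length_mm):
    """Biplane method of disks: V = (pi / 4) * sum(a_i * b_i) * (L / n).

    diam_a4c_mm, diam_a2c_mm: per-disk chamber diameters, measured
    perpendicular to the long axis in the two views (same number of disks).
    length_mm: long-axis length of the chamber. Returns the volume in mL.
    """
    if len(diam_a4c_mm) != len(diam_a2c_mm) or not diam_a4c_mm:
        raise ValueError("need two non-empty diameter lists of equal length")
    n = len(diam_a4c_mm)
    disk_height = length_mm / n
    # Each disk is an ellipse with axes a_i and b_i: area = pi / 4 * a_i * b_i.
    axis_products = sum(a * b for a, b in zip(diam_a4c_mm, diam_a2c_mm))
    volume_mm3 = math.pi / 4.0 * axis_products * disk_height
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3

# Usage: a cylinder of diameter 40 mm and length 80 mm, cut into 20 disks.
cylinder_ml = simpson_biplane_volume([40.0] * 20, [40.0] * 20, 80.0)
```

The rule is exact for shapes whose cross‑sections are ellipses aligned with the two views; a foreshortened view shortens the measured long axis and therefore biases the estimate.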

Results

  • Joint latent‑pose (perturbed) achieved mean Dice = 0.86 ± 0.04 across all structures, average symmetric surface distance (ASSD) ≈ 1.4 mm, and left‑ventricle (LV) volume MAE = 4.88 ± 4.26 mL (Simpson = 8.14 ± 6.04 mL). Left‑atrial (LA) volume MAE = 6.40 ± 7.37 mL (Simpson = 37.76 ± 22.96 mL).
  • Latent‑only (perturbed) showed a drop in Dice (≈ 0.78–0.84) and a substantial increase in volume errors (LV MAE ≈ 9 mL, LA MAE ≈ 14 mL), confirming the importance of pose refinement.
  • Latent‑only (ideal poses) yielded the highest performance (LV Dice ≈ 0.93, LA Dice ≈ 0.90, LV MAE ≈ 2.8 mL). Because the poses are exact, this setting acts as an upper bound for the two perturbed‑pose conditions.
  • Visualizations (Fig. 2) illustrate how initial misaligned slices become correctly oriented after optimization, leading to accurate 3‑D reconstructions that closely match the reference CTA. Heatmaps of surface distance further highlight the reduction of errors when pose optimization is enabled.
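The overlap and volume metrics reported above can be computed from two label volumes as in the following sketch. It is an illustration under assumptions: the function names, the integer label IDs, and the `voxel_volume_mm3` argument are hypothetical, and ASSD is not covered.

```python
import numpy as np

def dice_per_class(pred, ref, labels):
    """Dice = 2 |P ∩ R| / (|P| + |R|) for each label in two label volumes."""
    scores = {}
    for lab in labels:
        p = pred == lab
        r = ref == lab
        denom = p.sum() + r.sum()
        # A label absent from both volumes has no defined Dice score.
        scores[lab] = 2.0 * np.logical_and(p, r).sum() / denom if denom else float("nan")
    return scores

def volume_ml(seg, label, voxel_volume_mm3):
    """Volume of one structure in mL (voxel count times voxel volume)."""
    return (seg == label).sum() * voxel_volume_mm3 / 1000.0

def abs_volume_error_ml(pred, ref, label, voxel_volume_mm3):
    """Absolute volume error for one subject; averaging over subjects gives the MAE."""
    return abs(volume_ml(pred, label, voxel_volume_mm3)
               - volume_ml(ref, label, voxel_volume_mm3))
```

Averaging `dice_per_class` over structures and subjects gives a summary of the kind quoted in the first bullet, and averaging `abs_volume_error_ml` over subjects gives the volume MAE.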

Discussion
The study demonstrates that a neural implicit representation trained on real CTA segmentations can serve as a powerful shape prior for reconstructing whole‑heart geometry from only a few 2‑D slices. Joint optimization of a per‑subject latent code and the rigid transforms of the input planes compensates for the inevitable pose uncertainty in free‑hand echo, delivering substantially better volumetric accuracy than the clinical Simpson’s biplane method. Limitations include reliance on synthetic TTE‑like masks rather than true ultrasound images, which may contain speckle, shadowing, and non‑linear intensity distortions. Future work should integrate actual TTE segmentations, explore more robust latent‑space optimization (e.g., meta‑learning or Bayesian approaches), and assess the benefit of adding extra views or 3‑D echo data.

Conclusion
By combining high‑resolution CTA‑derived shape priors with test‑time optimization of the latent code and view poses, the authors reconstruct 3‑D cardiac shapes from only four sparse 2‑D slices that mimic standard apical TTE views. The method yields lower LV and LA volume errors than Simpson’s biplane rule, and its accuracy comes close to that obtained with perfect pose information (LV volume MAE 4.88 mL vs. ≈ 2.8 mL). Using the approach for 3‑D quantification in routine 2‑D transthoracic echocardiography will require validation on genuine ultrasound data and extension to broader clinical scenarios.

