Investigating Deep Learning Models for Ejection Fraction Estimation from Echocardiography Videos


📝 Original Info

  • Title: Investigating Deep Learning Models for Ejection Fraction Estimation from Echocardiography Videos
  • ArXiv ID: 2512.22657
  • Date: 2025-12-27
  • Authors: Shravan Saranyan, Pramit Saha

📝 Abstract

Left ventricular ejection fraction (LVEF) is a key indicator of cardiac function and plays a central role in the diagnosis and management of cardiovascular disease. Echocardiography, as a readily accessible and non-invasive imaging modality, is widely used in clinical practice to estimate LVEF. However, manual assessment of cardiac function from echocardiograms is time-consuming and subject to considerable inter-observer variability. Deep learning approaches offer a promising alternative, with the potential to achieve performance comparable to that of experienced human experts. In this study, we investigate the effectiveness of several deep learning architectures for LVEF estimation from echocardiography videos, including 3D Inception, two-stream, and CNN-RNN models. We systematically evaluate architectural modifications and fusion strategies to identify configurations that maximize prediction accuracy. Models were trained and evaluated on the EchoNet-Dynamic dataset, comprising 10,030 echocardiogram videos. Our results demonstrate that modified 3D Inception architectures achieve the best overall performance, with a root mean squared error (RMSE) of 6.79%. Across architectures, we observe a tendency toward overfitting, with smaller and simpler models generally exhibiting improved generalization. Model performance was also found to be highly sensitive to hyperparameter choices, particularly convolutional kernel sizes and normalization strategies. While this study focuses on echocardiography-based LVEF estimation, the insights gained regarding architectural design and training strategies may be applicable to a broader range of medical and non-medical video analysis tasks.


📄 Full Content

Heart disease remains the leading cause of mortality worldwide. In 2023 alone, 910,032 deaths in the United States were attributed to cardiovascular disease, accounting for approximately one in every three deaths [1]. Cardiovascular disease encompasses a broad range of conditions, including coronary artery disease, vascular disease, arrhythmias, and congenital heart disease (CHD), either in isolation or in combination. These conditions, together with comorbidities such as hypertension, diabetes, anemia, and hyperthyroidism, can contribute to the development of heart failure (HF), a chronic syndrome characterized by impaired cardiac function and/or structural abnormalities [2,3].

Although HF can be life-threatening, survival rates improve substantially with early diagnosis, appropriate medical management, and mitigation of modifiable risk factors, including smoking, excessive alcohol consumption, poor diet, and chronic stress [2]. Consequently, accurate and timely assessment of cardiac function is critical for both prognosis and treatment planning.

The left ventricular ejection fraction (LVEF) represents the proportion of blood ejected from the left ventricle with each cardiac contraction and is calculated as the ratio of stroke volume to end-diastolic volume. LVEF is a central indicator of systolic cardiac performance and serves as a primary metric for HF classification. The European Society of Cardiology (ESC) categorizes HF into three phenotypes based on LVEF: HF with reduced ejection fraction (HFrEF, LVEF < 40%), HF with mildly reduced ejection fraction (HFmrEF, LVEF 40-49%), and HF with preserved ejection fraction (HFpEF, LVEF ≥ 50%) [4,5]. This distinction is clinically significant, as each phenotype is associated with distinct underlying mechanisms and therapeutic strategies. HFrEF is typically preceded by cardiomyocyte loss and has several effective pharmacological treatments, whereas HFpEF is often driven by chronic comorbidities and currently lacks effective disease-modifying therapies [5]. Both conditions can progress to cardiomyopathy and, ultimately, advanced HF, underscoring the importance of precise LVEF measurement.
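The definition and thresholds above can be made concrete with a short sketch. The function names `lvef_percent` and `esc_phenotype` are illustrative and do not come from the paper, and the volumes in the usage lines are made-up values. The category boundaries follow the ESC ranges quoted above, under the assumption that "40-49%" covers everything from 40% up to, but not including, 50%.

```python
def lvef_percent(edv_ml: float, esv_ml: float) -> float:
    """Return LVEF in percent from end-diastolic and end-systolic volumes (mL).

    LVEF = stroke volume / end-diastolic volume, with
    stroke volume = EDV - ESV.
    """
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    stroke_volume_ml = edv_ml - esv_ml
    return 100.0 * stroke_volume_ml / edv_ml


def esc_phenotype(lvef: float) -> str:
    """Map an LVEF value (percent) to the ESC heart-failure phenotype.

    Only meaningful for patients who already carry an HF diagnosis;
    a preserved LVEF alone does not imply heart failure.
    Assumption: values in [40, 50) are treated as HFmrEF.
    """
    if lvef < 40.0:
        return "HFrEF"
    if lvef < 50.0:
        return "HFmrEF"
    return "HFpEF"


# Hypothetical volumes, for illustration only.
ef = lvef_percent(edv_ml=120.0, esv_ml=60.0)  # 50.0
label = esc_phenotype(ef)                     # "HFpEF"
```

Because the phenotype boundaries sit at fixed LVEF values, a prediction error of a few percentage points near 40% or 50% can change the assigned category, which is one reason measurement precision matters here.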

Echocardiography, or cardiac ultrasound, is the most widely used imaging modality for assessing LVEF in clinical practice. It provides comprehensive information on cardiac chamber volumes and ventricular systolic and diastolic function, while remaining non-invasive, widely accessible, safe, and cost-effective. Transthoracic echocardiography (TTE) is the standard approach for LVEF estimation [4,6]. Although three-dimensional echocardiography offers the highest accuracy, two-dimensional echocardiography is more commonly available in routine clinical settings. In 2D echocardiography, LVEF is typically computed using the Modified Simpson’s method, which estimates ventricular volumes by summing a series of traced disks at end-diastole and end-systole [7]. However, this approach is labor-intensive, requires expert interpretation, and is subject to substantial inter-observer variability [8]. While numerous techniques have been proposed to address these limitations, recent advances in deep learning have introduced promising automated alternatives.
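As a rough illustration of the disk-summation idea, the sketch below uses the common biplane form of the method, in which the ventricle is sliced into equal-height elliptical disks whose two diameters come from two orthogonal apical views. The text above only states that traced disks are summed, so the biplane formula, the function name `disk_summation_volume`, and the sample measurements are assumptions made for this example rather than details taken from the paper.

```python
import math


def disk_summation_volume(diam_view_a_cm, diam_view_b_cm, long_axis_cm):
    """Approximate ventricular volume (mL) by summing elliptical disks.

    Each disk i has orthogonal diameters a_i and b_i (cm) and height L / n,
    so V = (pi / 4) * (L / n) * sum(a_i * b_i). 1 cm^3 equals 1 mL.
    """
    n = len(diam_view_a_cm)
    if n == 0 or n != len(diam_view_b_cm):
        raise ValueError("both views need the same non-zero number of disks")
    if long_axis_cm <= 0:
        raise ValueError("long-axis length must be positive")
    disk_height_cm = long_axis_cm / n
    area_sum = sum(a * b for a, b in zip(diam_view_a_cm, diam_view_b_cm))
    return (math.pi / 4.0) * disk_height_cm * area_sum


# Sanity check on a shape with a known volume: a cylinder of diameter 4 cm
# and length 8 cm, cut into 20 disks, should give pi * 2^2 * 8 cm^3.
cylinder_ml = disk_summation_volume([4.0] * 20, [4.0] * 20, 8.0)

# Hypothetical end-diastolic and end-systolic tracings (diameters in cm).
edv_ml = disk_summation_volume([4.5] * 20, [4.2] * 20, 8.5)
esv_ml = disk_summation_volume([3.2] * 20, [3.0] * 20, 7.0)
ef_percent = 100.0 * (edv_ml - esv_ml) / edv_ml
```

In practice the diameters vary from base to apex and are derived from manual tracings of the endocardial border, which is where the labor and inter-observer variability described above enter.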

In 2020, Ouyang et al. addressed this challenge by introducing EchoNet-Dynamic, a video-based deep learning framework trained on large-scale echocardiography data to directly estimate LVEF from echocardiogram videos [8].

In this work, we systematically investigate deep learning approaches for estimating left ventricular ejection fraction from echocardiography videos. We evaluate and compare multiple architectural paradigms, including 3D convolutional networks, two-stream models, and CNN-RNN frameworks, using the large-scale EchoNet-Dynamic dataset. Through controlled experiments, we analyze the impact of architectural design choices, fusion strategies, and key hyperparameters on prediction accuracy and generalization performance. By identifying strengths and limitations across model families, this study aims to provide practical insights into the design of robust deep learning systems for automated cardiac function assessment and to inform future research in video-based medical imaging analysis.

[...] [20]. The dataset comprises 10,030 apical four-chamber echocardiography videos that were preprocessed through cropping and masking to remove extraneous text and regions outside the ultrasound scanning sector. The resulting images were subsequently downsampled to a standardized resolution of 112 × 112 pixels using cubic interpolation. The dataset also includes expert-annotated segmentations at end-diastole and end-systole, which are used for ejection fraction (EF) calculation, as illustrated. While these segmentations were not used to train the models presented in this work, they are utilized by selected models described in the Appendix.

While numerous deep learning models have been proposed for LVEF estimation, comprehensive benchmarks systematical


This content is AI-processed based on open access ArXiv data.
