InfoMotion: A Graph-Based Approach to Video Dataset Distillation for Echocardiography
Echocardiography plays a critical role in the diagnosis and monitoring of cardiovascular diseases, providing a non-invasive, real-time assessment of cardiac structure and function. However, the growing scale of echocardiographic video data presents significant challenges in terms of storage, computation, and model training efficiency. Dataset distillation offers a promising solution by synthesizing a compact, informative subset of data that retains the key clinical features of the original dataset. In this work, we propose a novel approach for distilling a compact synthetic echocardiographic video dataset. Our method leverages motion feature extraction to capture temporal dynamics, followed by class-wise graph construction and representative sample selection using the Infomap algorithm. This enables us to select a diverse and informative subset of synthetic videos that preserves the essential characteristics of the original dataset. We evaluate our approach on the EchoNet-Dynamic dataset and achieve a test accuracy of 69.38% using only 25 synthetic videos. These results demonstrate the effectiveness and scalability of our method for medical video dataset distillation.
💡 Research Summary
This paper introduces “InfoMotion,” a novel graph-based approach for distilling large echocardiography video datasets into compact, informative synthetic subsets. Dataset distillation aims to synthesize a small dataset that retains the essential information of the original large-scale data, addressing critical challenges in medical AI such as massive storage requirements, high computational costs for training, and data sharing restrictions due to privacy concerns.
The core innovation of InfoMotion lies in its tailored strategy for medical videos, which possess unique characteristics like high structural similarity across frames and critical temporal dynamics. The method operates in three key stages:
- Motion Feature Extraction: Recognizing that appearance alone is insufficient to distinguish echocardiogram videos, the method explicitly captures temporal dynamics. It employs an Inter-Frame Attention (IFA) model to extract motion features between consecutive End-Diastolic (ED) and End-Systolic (ES) frames. These motion feature vectors serve as distinctive signatures for each video, more closely related to cardiac function (e.g., Ejection Fraction) than static appearance.
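The idea behind this first stage can be illustrated with a minimal sketch. The paper's IFA model uses learned projections; here, random projections stand in for them, and a single cross-attention pass between the ED and ES frames is pooled into a fixed-length motion vector. All names and dimensions below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def motion_feature(ed_frame, es_frame, dim=16, seed=0):
    """Simplified sketch of inter-frame attention between the
    End-Diastolic (ED) and End-Systolic (ES) frames.

    `ed_frame` / `es_frame` are (H, W) grayscale arrays. A trained IFA
    model would use learned query/key projections; random projections
    are used here purely for illustration.
    """
    rng = np.random.default_rng(seed)
    h, w = ed_frame.shape
    q_proj = rng.standard_normal((w, dim)) / np.sqrt(w)
    k_proj = rng.standard_normal((w, dim)) / np.sqrt(w)
    q = ed_frame @ q_proj                      # (H, dim) queries from ED
    k = es_frame @ k_proj                      # (H, dim) keys from ES
    # Scaled dot-product attention between rows of the two frames.
    scores = q @ k.T / np.sqrt(dim)            # (H, H)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    # Attend to the ES frame and pool to a fixed-length motion signature.
    attended = attn @ (es_frame @ k_proj)      # (H, dim)
    return attended.mean(axis=0)               # (dim,) motion vector

ed = np.random.default_rng(1).random((32, 32))
es = np.roll(ed, 2, axis=0)  # crude stand-in for cardiac motion
vec = motion_feature(ed, es)
print(vec.shape)  # (16,)
```

The resulting vector plays the role of the per-video "motion signature" described above: videos with similar wall motion between ED and ES map to nearby points in this feature space.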
- Class-wise Graph Construction: Videos are categorized into five clinical classes based on their Ejection Fraction (EF) values (e.g., Severe Dysfunction, Normal). For each class, a weighted graph is constructed where nodes represent the motion feature vectors of videos, and edge weights are the Euclidean distances between them. This creates a structured representation of the diversity within each clinical category.
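This stage can be sketched as two small functions: one mapping EF values to clinical classes, and one building a complete weighted graph over one class's motion features. The EF bin edges below are illustrative assumptions (common clinical cut-offs); the paper's exact boundaries may differ.

```python
import numpy as np
from itertools import combinations

# Illustrative EF bin edges; the paper's exact thresholds may differ.
EF_BINS = [(0, 30, "severe"), (30, 40, "moderate"), (40, 50, "mild"),
           (50, 70, "normal"), (70, 101, "hyperdynamic")]

def ef_class(ef):
    """Map an Ejection Fraction value (percent) to a clinical class."""
    for lo, hi, name in EF_BINS:
        if lo <= ef < hi:
            return name
    raise ValueError(f"EF out of range: {ef}")

def build_class_graph(features):
    """Complete weighted graph over one class: nodes are video indices,
    edge weights are Euclidean distances between motion feature vectors."""
    edges = {}
    for i, j in combinations(range(len(features)), 2):
        edges[(i, j)] = float(np.linalg.norm(features[i] - features[j]))
    return edges

feats = [np.array([0.0, 0.0]), np.array([3.0, 4.0]), np.array([0.0, 1.0])]
g = build_class_graph(feats)
print(ef_class(62))   # normal
print(g[(0, 1)])      # 5.0
```

Building one graph per class, rather than one global graph, keeps the later community detection from mixing clinically distinct videos into the same cluster.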
- Representative Sample Selection via Infomap: The Infomap algorithm, grounded in information-theoretic principles (Minimum Description Length), is applied to each class-specific graph. Infomap effectively detects natural “communities” or clusters within the graph by modeling the flow of a random walker. From each discovered community, representative video nodes with high “modular centrality” (balancing intra- and inter-community influence) are uniformly selected. This process ensures the final distilled set is both diverse and comprehensively representative of the original data’s structure.
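The selection logic can be sketched with a simplified stand-in. Instead of Infomap proper (which optimizes the map equation over random-walk flow), the toy below finds communities as connected components after dropping long edges, then scores nodes with a rough proxy for modular centrality that combines intra- and inter-community similarity. Everything here is an illustrative simplification of the stage described above, not the authors' code.

```python
import numpy as np

def communities_by_threshold(dist, thresh):
    """Toy stand-in for Infomap: connected components of the graph
    obtained by keeping only edges with distance <= thresh."""
    n = len(dist)
    labels = list(range(n))
    def find(x):                       # union-find with path halving
        while labels[x] != x:
            labels[x] = labels[labels[x]]
            x = labels[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= thresh:
                labels[find(i)] = find(j)
    return [find(i) for i in range(n)]

def select_representatives(dist, labels, per_comm=1):
    """Pick high-centrality nodes per community. The score below is a
    simple proxy for modular centrality: similarity mass inside the
    community plus similarity mass toward other communities."""
    sim = 1.0 / (1.0 + np.asarray(dist))
    chosen = []
    for c in set(labels):
        members = [i for i, l in enumerate(labels) if l == c]
        others = [i for i, l in enumerate(labels) if l != c]
        def score(i):
            intra = sim[i, members].sum()
            inter = sim[i, others].sum() if others else 0.0
            return intra + inter
        members.sort(key=score, reverse=True)
        chosen.extend(members[:per_comm])
    return sorted(chosen)

# Two well-separated pairs of videos -> two communities, one rep each.
dist = [[0, 1, 9, 9],
        [1, 0, 9, 9],
        [9, 9, 0, 1],
        [9, 9, 1, 0]]
labels = communities_by_threshold(dist, 2)
reps = select_representatives(dist, labels, per_comm=1)
print(len(reps))  # 2
```

Sampling uniformly across communities, as the method does, is what guarantees diversity: a purely centrality-ranked global selection could draw all its picks from one dominant cluster.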
The method was evaluated on the EchoNet-Dynamic (real videos) and EchoNet-Synthetic (generated by a latent video diffusion model) datasets. The downstream task was EF regression, with predictions mapped to clinical classes for evaluation. Compared to strong baselines including random selection, K-means clustering on appearance features, and an image-based Infomap method (InfoDist), InfoMotion consistently achieved superior performance.
A key result demonstrates the method’s efficiency: using a distilled set of only 25 synthetic videos (5 videos per class, VPC=5), a model trained with InfoMotion achieved a test accuracy of 69.38% on the real EchoNet-Dynamic test set. Furthermore, acknowledging inter-clinician variability in EF assessment near class boundaries, the authors introduced a “soft evaluation” metric with a ±2% tolerance. Under this more clinically realistic setting, InfoMotion’s performance improved to 75.02% accuracy with VPC=5, approaching the baseline accuracy of 81.21% obtained by training on the entire real dataset. Notably, InfoMotion also showed significantly lower standard deviation across multiple runs compared to baselines, indicating greater stability and reliability.
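The soft-evaluation idea can be sketched as follows. A prediction counts as correct if its class matches the ground truth, or if the true EF lies within the tolerance of a class boundary and the prediction falls in the adjacent class. The bin edges and the exact tolerance rule are illustrative assumptions; the paper's precise definition may differ.

```python
def soft_accuracy(pred_ef, true_ef, bins, tol=2.0):
    """Sketch of soft evaluation with a +/- tol EF-point tolerance
    around class boundaries (illustrative, not the paper's exact rule)."""
    def cls(ef):
        for k, (lo, hi) in enumerate(bins):
            if lo <= ef < hi:
                return k
        return len(bins) - 1
    correct = 0
    for p, t in zip(pred_ef, true_ef):
        cp, ct = cls(p), cls(t)
        if cp == ct:
            correct += 1
        elif abs(cp - ct) == 1:
            # Forgive adjacent-class errors when the true EF sits
            # within tol points of its class boundary.
            lo, hi = bins[ct]
            if min(abs(t - lo), abs(t - hi)) <= tol:
                correct += 1
    return correct / len(true_ef)

bins = [(0, 30), (30, 40), (40, 50), (50, 70), (70, 101)]
preds = [29.0, 20.0, 55.0]
truth = [31.0, 45.0, 55.0]
acc = soft_accuracy(preds, truth, bins)  # 29 vs 31 forgiven; 20 vs 45 not
```

This kind of tolerance reflects the summary's point about inter-clinician variability: two cardiologists reading the same study routinely disagree by a few EF points, so hard class boundaries penalize errors that are clinically indistinguishable.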
In conclusion, InfoMotion presents the first dedicated dataset distillation framework for medical videos. By strategically leveraging explicit motion features and graph-based community detection with Infomap, it successfully produces ultra-compact synthetic datasets that preserve the crucial clinical characteristics of original echocardiography data, offering a promising solution for efficient and privacy-aware medical AI development.