Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perceptual metrics (e.g., Fréchet Inception Distance/FID) from natural imaging, which we show insufficiently capture anatomical features. To this end, we introduce a new perceptual metric tailored for medical images, FRD (Fréchet Radiomic Distance), which utilizes standardized, clinically meaningful, and interpretable image features. We show that FRD is superior to other image distribution metrics for a range of medical imaging applications, including out-of-domain (OOD) detection, the evaluation of image-to-image translation (by correlating more with downstream task performance as well as anatomical consistency and realism), and the evaluation of unconditional image generation. Moreover, FRD offers additional benefits such as stability and computational efficiency at low sample sizes, sensitivity to image corruptions and adversarial attacks, feature interpretability, and correlation with radiologist-perceived image quality. Additionally, we address key gaps in the literature by presenting an extensive framework for the multifaceted evaluation of image similarity metrics in medical imaging – including the first large-scale comparative study of generative models for medical image translation – and release an accessible codebase to facilitate future research. Our results are supported by thorough experiments spanning a variety of datasets, modalities, and downstream tasks, highlighting the broad potential of FRD for medical image analysis.


💡 Research Summary

The paper introduces Fréchet Radiomic Distance (FRD), a novel metric designed specifically for comparing distributions of medical images. Existing metrics such as Fréchet Inception Distance (FID) and its medical‑adapted variant RadiologyFID (RadFID) rely on features learned from natural images, which fail to capture the anatomical and textural nuances critical in radiology. FRD addresses this gap by leveraging a comprehensive set of handcrafted radiomic features—first‑order statistics, gray‑level co‑occurrence, run‑length, and size‑zone matrices—augmented with frequency‑domain representations obtained through wavelet or Fourier filtering. Unlike the min‑max scaling used in the earlier FRD v0, the current version applies robust z‑score normalization across both datasets using a shared reference distribution, ensuring comparability and resistance to outliers.
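The robust z-score normalization described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name `robust_z_score` and the choice of median/interquartile-range statistics are assumptions here; the paper's released code may use a different robust estimator.

```python
import numpy as np

def robust_z_score(features: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Normalize feature columns with median/IQR statistics of a shared
    reference distribution, so that outliers barely shift the scaling.

    features:  (n_samples, n_features) matrix to normalize
    reference: (m_samples, n_features) matrix supplying the statistics
    """
    median = np.median(reference, axis=0)
    iqr = (np.percentile(reference, 75, axis=0)
           - np.percentile(reference, 25, axis=0))
    iqr = np.where(iqr == 0, 1.0, iqr)  # guard against constant features
    return (features - median) / iqr
```

Because both image sets are scaled against the same reference statistics, their normalized feature distributions remain directly comparable, which min-max scaling over each set separately (as in FRD v0) does not guarantee.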

The authors compute the Fréchet distance between the multivariate Gaussian approximations of the normalized radiomic feature distributions for two image sets (e.g., real vs. generated). This yields a scalar distance that reflects both global intensity characteristics and fine‑grained texture patterns relevant to clinical interpretation.
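For two multivariate Gaussians $\mathcal{N}(\mu_a, \Sigma_a)$ and $\mathcal{N}(\mu_b, \Sigma_b)$, the squared Fréchet distance has the closed form $d^2 = \lVert \mu_a - \mu_b \rVert^2 + \mathrm{Tr}\!\left(\Sigma_a + \Sigma_b - 2(\Sigma_a \Sigma_b)^{1/2}\right)$, the same formula FID uses over Inception features. A minimal sketch of this computation over radiomic feature matrices (function name and interface are illustrative, not the paper's API):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits of two feature sets,
    each of shape (n_samples, n_features)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Identical feature sets yield a distance near zero, and shifting every feature by a constant raises the distance by the squared norm of the mean shift, which matches the mean term of the closed form.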

Extensive experiments span ten diverse datasets covering breast, brain, spine, and abdominal imaging across multiple modalities (MRI, CT) and scanner vendors. Three primary application scenarios are evaluated: (1) out‑of‑domain (OOD) detection, (2) image‑to‑image translation quality assessment, and (3) unconditional image generation evaluation. In OOD detection, FRD consistently outperforms FID, KID, and RadFID in AUC, accuracy, and sensitivity, especially when sample sizes are limited (≤100 images). For translation models (e.g., MRI→CT, T1→T2, inter‑scanner), lower FRD scores correlate strongly with higher downstream performance on segmentation, classification, and lesion detection tasks, demonstrating that FRD captures clinically relevant domain alignment. In generative model benchmarking, FRD shows the highest Pearson correlation (≈0.78) with radiologists’ subjective quality ratings, surpassing all competing metrics.

Additional analyses probe robustness to image corruptions (Gaussian noise, blur, compression artifacts) and adversarial attacks (FGSM, PGD). FRD exhibits heightened sensitivity, detecting subtle degradations earlier than FID or RadFID, and it successfully distinguishes adversarially perturbed images, highlighting its potential for quality control in clinical pipelines.

A major contribution is the release of a unified evaluation framework and open‑source Python library that automates radiomic extraction, normalization, and Fréchet distance computation, facilitating reproducible research. The paper also provides a standardized OOD detection protocol and a large‑scale comparative study of generative models for medical image translation—both firsts in the field.

In summary, FRD combines interpretability (through clinically meaningful radiomic features), stability on small datasets, computational efficiency, and strong alignment with downstream task performance and expert perception. These properties make FRD a compelling candidate to become the de facto standard for assessing distribution similarity in medical imaging research and practice.

