Automatic quality control in multi-centric fetal brain MRI super-resolution reconstruction

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv paper.

Quality control (QC) has long been considered essential to guarantee the reliability of neuroimaging studies. It is particularly important for fetal brain MRI, where acquisitions and image processing techniques are less standardized than in adult imaging. In this work, we focus on automated quality control of super-resolution reconstruction (SRR) volumes of fetal brain MRI, an important processing step where multiple stacks of thick 2D slices are registered together and combined to build a single, isotropic and artifact-free T2-weighted volume. We propose FetMRQC$_{SR}$, a machine-learning method that extracts more than 100 image quality metrics to predict image quality scores using a random forest model. This approach is well suited to a high-dimensional problem with highly heterogeneous data and small datasets. We validate FetMRQC$_{SR}$ in an out-of-domain (OOD) setting and report high performance (ROC AUC = 0.89), even when faced with data from an unknown site or SRR method. We also investigate failure cases and show that 45% of them are due to ambiguous configurations for which the expert rating is arguable. These results are encouraging and illustrate how a non-deep-learning-based method like FetMRQC$_{SR}$ is well suited to this multifaceted problem. Our tool, along with all the code used to generate, train, and evaluate the model, is available at https://github.com/Medical-Image-Analysis-Laboratory/fetmrqc_sr/ .


💡 Research Summary

Quality control (QC) is a prerequisite for reliable neuroimaging studies, yet it remains a largely unsolved problem for fetal brain magnetic resonance imaging (MRI). Unlike adult MRI, fetal scans are acquired as multiple thick 2‑D T2‑weighted stacks that must be registered and combined through a super‑resolution reconstruction (SRR) step to produce an isotropic 3‑D volume. The SRR process is prone to a variety of artifacts—geometric distortions, excessive noise, low tissue contrast, and topological inconsistencies—that can severely compromise downstream analyses such as tissue segmentation or surface extraction. Manual QC of SRR volumes is time‑consuming and subjective, and existing automated QC tools for adult brain MRI cannot be directly applied because they rely on priors (e.g., air surrounding the head) that are invalid in the intra‑uterine environment.

In this context, the authors introduce FetMRQC$_{SR}$, a fully automated, open‑source QC framework specifically designed for fetal SRR volumes. The method builds on the previously published FetMRQC pipeline for 2‑D fetal stacks and on MRIQC for adult data, but adapts and expands the set of image quality metrics (IQMs) to the 3‑D SRR setting. A total of 106 IQMs are extracted from each reconstructed volume. These metrics fall into three families: (i) intensity‑based statistics (mean, variance, slice‑wise SSIM, etc.), (ii) mask‑ and segmentation‑derived descriptors (brain‑mask centroid, tissue volumes, tissue‑wise contrast), and (iii) topological descriptors (Betti numbers, Euler characteristic) computed after automatic segmentation with the state‑of‑the‑art BOUNTI pipeline. The inclusion of topological features is motivated by recent evidence that they capture subtle segmentation errors that are otherwise invisible to intensity‑based measures.
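
The summary does not specify how the topological IQMs are computed. As a minimal sketch, assuming a binary 3-D mask (for instance a single tissue label from a segmentation) with 26-connectivity for the foreground and 6-connectivity for the background, the Euler characteristic can be obtained by counting the vertices, edges, faces, and cubes of the voxel complex, and the Betti numbers then follow from connected-component labeling. The function names are hypothetical, and this is not necessarily how FetMRQC$_{SR}$ implements these metrics.

```python
from itertools import combinations, product

import numpy as np
from scipy import ndimage


def _count_cells(p, shape, free_axes):
    """Count one type of cubical cell in the union of closed voxel cubes.

    `p` is the 3-D mask zero-padded by one voxel. Along each axis in
    `free_axes` the cell lies on a lattice plane and exists if the voxel on
    either side is foreground; along the other axes it spans one voxel.
    """
    cell_shape = tuple(n + 1 if ax in free_axes else n for ax, n in enumerate(shape))
    present = np.zeros(cell_shape, dtype=bool)
    for offsets in product((0, 1), repeat=len(free_axes)):
        shift = dict(zip(free_axes, offsets))
        window = tuple(
            slice(shift[ax], shift[ax] + n + 1) if ax in free_axes else slice(1, n + 1)
            for ax, n in enumerate(shape)
        )
        present |= p[window]
    return int(present.sum())


def euler_characteristic(mask):
    """V - E + F - C of the cubical complex of a 3-D mask (26-connected foreground)."""
    m = np.asarray(mask, dtype=bool)
    p = np.pad(m, 1)
    chi = 0
    for n_free in range(4):  # 3: vertices, 2: edges, 1: faces, 0: cubes
        sign = (-1) ** (3 - n_free)
        for free_axes in combinations(range(3), n_free):
            chi += sign * _count_cells(p, m.shape, free_axes)
    return chi


def betti_numbers(mask):
    """(b0, b1, b2): connected components, tunnels/handles, enclosed cavities."""
    m = np.asarray(mask, dtype=bool)
    _, b0 = ndimage.label(m, structure=np.ones((3, 3, 3), dtype=bool))
    # Background uses the complementary 6-connectivity (the ndimage default);
    # after padding, the outside is one component and the others are cavities.
    _, n_background = ndimage.label(~np.pad(m, 1))
    b2 = n_background - 1
    b1 = b0 + b2 - euler_characteristic(m)  # from chi = b0 - b1 + b2
    return int(b0), int(b1), int(b2)
```

On a 3x3x3 block with its center voxel removed, `betti_numbers` returns `(1, 0, 1)`: one component, no tunnel, one enclosed cavity. A spurious hole or handle in a tissue segmentation shows up as an unexpected change in b1 or b2, which intensity statistics alone would not reveal.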

The dataset comprises 673 SRR volumes collected retrospectively from three institutions (CHUV, BCNatal, KISPI) across two scanner manufacturers (Siemens 1.5 T/3 T and GE). Four widely used SRR algorithms (SVR‑TK, NeSVoR, NiftyMIC, and MIALSRTK) are represented, and reconstruction parameters (target resolution, number of input stacks) are randomly varied to maximize heterogeneity. Two experienced raters assigned a continuous quality score ranging from 0 (exclude) to 4 (excellent) and a binary exclusion label (threshold = 1). A subset of 98 volumes was rated twice to assess inter‑rater reliability.

FetMRQC$_{SR}$ uses a random‑forest classifier to predict the binary exclusion label from the IQM vector. Two model variants are explored: (a) a re‑weighting scheme that assigns lower training weight to samples whose continuous score lies near the decision boundary (using a sigmoid mapping to a probability and then computing label purity), and (b) a “predict‑artifacts” variant that first trains separate regressors for each of the 11 artifact categories and feeds their predictions back as additional features. Both variants aim to mitigate the class imbalance (≈ 30 % excluded) and to exploit the richer annotation information.
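
The re-weighting variant is described only at a high level. A minimal sketch, assuming the continuous rating is passed through a logistic sigmoid centered on the exclusion threshold (1, as stated above) with a hypothetical `temperature` parameter, and that "label purity" means max(p, 1 - p):

```python
import numpy as np


def purity_weights(scores, threshold=1.0, temperature=0.5):
    """Hypothetical per-sample training weights from continuous quality scores.

    The rating is mapped to a probability p of lying above the exclusion
    threshold with a sigmoid; the purity max(p, 1 - p) is close to 1 for
    clear-cut volumes and equals 0.5 exactly at the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(scores - threshold) / temperature))
    return np.maximum(p, 1.0 - p)
```

Volumes rated far from the threshold get a weight close to 1, while borderline volumes drop to 0.5. The resulting array can be passed as `sample_weight` to scikit-learn's `RandomForestClassifier.fit`; whether the authors use this exact mapping and temperature is not stated in the summary.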

Pre‑processing of the SRR volumes is also investigated. One strategy saturates intensities at the 99.5th percentile before segmentation to avoid bright background voxels that can corrupt BOUNTI’s tissue maps; the other normalizes intensities to a 0–1 range after brain masking and resamples to a common 0.8 mm isotropic grid. The authors find that the combination of intensity saturation and brain‑mask normalization yields the best downstream IQM stability.
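
The two preprocessing steps can be sketched with NumPy and SciPy as follows, assuming a volume with known voxel spacing in millimetres and a boolean brain mask. The function names and the use of linear interpolation in `scipy.ndimage.zoom` are illustrative choices, not details taken from the paper.

```python
import numpy as np
from scipy import ndimage


def saturate(volume, percentile=99.5):
    """Clip intensities above the given percentile (e.g. before segmentation)."""
    v = np.asarray(volume, dtype=float)
    return np.minimum(v, np.percentile(v, percentile))


def normalize_in_mask(volume, mask):
    """Min-max scale voxels inside the brain mask to [0, 1]; set the rest to 0."""
    v = np.asarray(volume, dtype=float)
    m = np.asarray(mask, dtype=bool)
    lo, hi = v[m].min(), v[m].max()
    out = np.zeros_like(v)
    out[m] = (v[m] - lo) / (hi - lo) if hi > lo else 0.0
    return out


def resample(volume, spacing, target=0.8, order=1):
    """Resample to an isotropic grid of `target` mm (order=1: linear interpolation)."""
    zoom = np.asarray(spacing, dtype=float) / target
    return ndimage.zoom(volume, zoom, order=order)
```

A plausible order is `saturate`, then `normalize_in_mask`, then `resample`. The brain mask itself would need nearest-neighbour resampling (`order=0`) so that it stays binary on the new grid.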

Evaluation follows a rigorous cross‑validation scheme. In‑domain performance is measured with 10‑fold subject‑wise CV. Out‑of‑domain (OOD) robustness is assessed through three leave‑one‑out experiments: (i) leave‑one‑site‑out, (ii) leave‑one‑SRR‑method‑out, and (iii) leave‑one‑site‑and‑SRR‑method‑out, thereby simulating a double domain shift. Because the task is highly imbalanced, the authors report ROC AUC, balanced accuracy (BA), sensitivity, and specificity, emphasizing specificity to avoid false positives (i.e., classifying a bad volume as acceptable).
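
The three out-of-domain splits can be generated from per-volume site and SRR-method labels. The sketch below assumes that, in the double-shift setting, both the held-out site and the held-out method are removed from the training set, so that test volumes come from a combination unseen on both axes; the function name is hypothetical.

```python
from itertools import product

import numpy as np


def leave_one_out_splits(sites, methods, mode="site"):
    """Yield (held_out, train_idx, test_idx) for leave-one-group-out evaluation.

    mode="site" or "method" holds out one group. mode="both" holds out one
    (site, method) pair and drops that site AND that method from training.
    """
    sites = np.asarray(sites)
    methods = np.asarray(methods)
    if mode == "site":
        for s in np.unique(sites):
            test = sites == s
            yield s, np.flatnonzero(~test), np.flatnonzero(test)
    elif mode == "method":
        for m in np.unique(methods):
            test = methods == m
            yield m, np.flatnonzero(~test), np.flatnonzero(test)
    elif mode == "both":
        for s, m in product(np.unique(sites), np.unique(methods)):
            test = (sites == s) & (methods == m)
            train = (sites != s) & (methods != m)
            if test.any() and train.any():  # skip empty combinations
                yield (s, m), np.flatnonzero(train), np.flatnonzero(test)
    else:
        raise ValueError(f"unknown mode: {mode}")
```

For the single-factor cases, scikit-learn's `LeaveOneGroupOut` produces the same splits; the double-shift case needs the extra filtering of the training set shown above.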

Results show that the full FetMRQC$_{SR}$ pipeline (including segmentation‑derived IQMs and topological metrics) achieves an in‑domain ROC AUC of 0.93, BA of 0.85, and specificity of 0.88 when the decision threshold on predicted probabilities is set to 0.7. Adding topological features improves AUC and BA by roughly 3 % and 4 %, respectively. The re‑weighting variant matches this performance, while the predict‑artifacts variant yields slightly lower AUC (0.90) and BA (0.81). Baseline models (random guessing, simple MLP) perform substantially worse, confirming the advantage of the handcrafted IQM set and the random‑forest learner.
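
Reporting sensitivity, specificity, and balanced accuracy at a fixed probability threshold of 0.7 can be sketched as below. It assumes, following the definition of a false positive given above, that label 1 means an acceptable volume and that the model outputs the probability of that class (for example via `predict_proba` in scikit-learn); the helper name is hypothetical.

```python
import numpy as np


def qc_metrics(y_true, proba, threshold=0.7):
    """Sensitivity, specificity and balanced accuracy at a probability threshold.

    Assumed convention: 1 = acceptable volume, 0 = volume to exclude, so a
    false positive is a bad volume that gets accepted. Both classes must be
    present in `y_true`.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(proba) >= threshold
    tp = np.sum(y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2
```

Raising the threshold moves borderline volumes to the excluded side, trading sensitivity for specificity, which matches the stated priority of not accepting bad volumes.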

In OOD experiments, performance degrades modestly but remains robust: ROC AUC stays between 0.85 and 0.90 across all three leave‑one‑out scenarios, indicating that the model generalizes well to unseen sites and reconstruction pipelines. Error analysis reveals that 45 % of misclassifications arise from intrinsically ambiguous cases where the expert’s continuous score hovers around the exclusion threshold; these are not true model failures but reflect the inherent subjectivity of the ground truth. The remaining errors are largely false negatives (good volumes flagged as bad) that nonetheless contain subtle artifacts detectable by the IQMs but not deemed critical by human raters.
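
The error analysis can be approximated by measuring how many misclassified volumes have an expert score close to the exclusion threshold. The summary does not say how "ambiguous" was defined, so the `margin` parameter and the function name below are purely hypothetical.

```python
import numpy as np


def ambiguous_error_fraction(scores, y_true, y_pred, threshold=1.0, margin=0.5):
    """Share of misclassified volumes whose expert score is within `margin`
    of the exclusion threshold (a hypothetical definition of "ambiguous")."""
    scores = np.asarray(scores, dtype=float)
    errors = np.asarray(y_true) != np.asarray(y_pred)
    if not errors.any():
        return 0.0
    near_threshold = np.abs(scores - threshold) <= margin
    return float(np.mean(near_threshold[errors]))
```

Varying `margin` shows how sensitive such a breakdown is to the chosen width of the ambiguous band.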

The study’s contributions are fourfold: (1) creation of a multi‑site, multi‑SRR, multi‑resolution dataset with 673 expert‑rated volumes, (2) extension of the FetMRQC framework to 3‑D SRR volumes with a comprehensive set of intensity, segmentation, and topological IQMs, (3) thorough evaluation of domain‑shift robustness, and (4) transparent analysis of failure modes, highlighting both the limits of human annotation and the sensitivity of the automated system. The FetMRQC$_{SR}$ tool and all code used to generate, train, and evaluate the model are released publicly on GitHub, supporting future research.

Overall, FetMRQC$_{SR}$ demonstrates that a non‑deep‑learning approach, when equipped with carefully engineered domain‑specific features and robust preprocessing, can deliver high‑quality automated QC for fetal brain SRR volumes even in the face of limited and heterogeneous data. This work paves the way for more reproducible fetal MRI studies and may accelerate the translation of fetal neuroimaging pipelines into clinical practice.

