Slice-wise quality assessment of high b-value breast DWI via deep learning-based artifact detection

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Diffusion-weighted imaging (DWI) can support lesion detection and characterization in breast magnetic resonance imaging (MRI); however, high b-value diffusion-weighted acquisitions in particular are prone to intensity artifacts that can affect diagnostic image assessment. This study aims to detect both hyper- and hypointense artifacts on high b-value diffusion-weighted images (b = 1500 s/mm²) using deep learning, employing either a binary classification (artifact presence) or a multiclass classification (artifact intensity) approach on a slice-wise dataset. This IRB-approved retrospective study used a single-center dataset comprising n = 11806 slices from routine 3T breast MRI examinations performed between 2022 and mid-2023. Three convolutional neural network (CNN) architectures (DenseNet121, ResNet18, and SEResNet50) were trained for binary classification of hyper- and hypointense artifacts. The best-performing model (DenseNet121) was applied to an independent holdout test set and was further trained separately for multiclass classification. Evaluation included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), precision, and recall, as well as analysis of predicted bounding-box positions derived from the network's Grad-CAM heatmaps. DenseNet121 achieved AUROCs of 0.92 and 0.94 for hyper- and hypointense artifact detection, respectively, and weighted AUROCs of 0.85 and 0.88 for multiclass classification on single-slice high b-value diffusion-weighted images. A radiologist evaluated bounding-box precision on a 1-5 Likert-like scale across 200 slices, yielding mean scores of 3.33 ± 1.04 for hyperintense artifacts and 2.62 ± 0.81 for hypointense artifacts. Detection of hyper- and hypointense artifacts in a slice-wise breast DWI dataset (b = 1500 s/mm²) using CNNs, particularly DenseNet121, appears promising and requires further validation.


💡 Research Summary

This paper addresses the problem of intensity artifacts that frequently affect high‑b‑value (b = 1500 s/mm²) diffusion‑weighted imaging (DWI) of the breast. While DWI is increasingly used for lesion detection and characterization, high b‑values amplify susceptibility to both hyper‑intense and hypo‑intense artifacts, which can obscure pathology and bias quantitative apparent diffusion coefficient (ADC) maps. Prior work has only examined artifacts on maximum‑intensity projections (MIPs), which collapse three‑dimensional information and may either exaggerate or miss slice‑specific artifacts. The authors therefore propose a slice‑wise deep‑learning approach that can detect and localize both artifact types, using either binary classification (artifact present vs. absent) or multiclass classification (artifact severity on a 1‑5 Likert scale).

A retrospective, IRB‑approved cohort of 1 383 routine breast MRI examinations performed on 3 T Siemens Skyra/ViDA scanners between 2022 and mid‑2023 was screened. From this pool, 156 cases with moderate‑to‑severe artifacts (scores 4‑5) were pre‑selected using MIP review by an experienced reader. These cases were converted into 2‑D axial slices, split into left and right breast halves, yielding a total of 11 806 slices. Each slice was intensity‑scaled to 0‑255, saved as JPEG, and resized to 160 × 128 pixels. Breast masks derived from T1‑weighted images were applied to restrict the network’s input to breast tissue only.
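The per-slice intensity scaling described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the function name is ours, and the subsequent JPEG export and 160 × 128 resize would be handled by a library such as Pillow.

```python
import numpy as np

def rescale_to_uint8(slice_2d: np.ndarray) -> np.ndarray:
    """Min-max scale a 2-D DWI slice into the 0-255 range before JPEG export."""
    lo, hi = float(slice_2d.min()), float(slice_2d.max())
    if hi == lo:  # constant slice: avoid division by zero
        return np.zeros_like(slice_2d, dtype=np.uint8)
    scaled = (slice_2d - lo) / (hi - lo) * 255.0
    return scaled.round().astype(np.uint8)
```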

Ground truth labeling was performed in two stages. For binary classification, scores 1‑2 were merged into “non‑significant/no artifact” (class 0) and scores 3‑5 into “potentially significant artifact” (class 1). For multiclass classification the original 1‑5 scores were retained as separate classes; a sixth “ambiguous” class was resolved by a board‑certified radiologist. Labeling was carried out by a master’s student under radiologist supervision, and a pair of control readers independently evaluated the hold‑out test set to verify consistency.
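The binary label merge amounts to a simple mapping of the 1-5 severity scores, which can be sketched as follows (illustrative only; the function name is ours):

```python
def to_binary_label(score: int) -> int:
    """Map a 1-5 artifact severity score to the binary ground truth:
    scores 1-2 -> class 0 (non-significant/no artifact),
    scores 3-5 -> class 1 (potentially significant artifact)."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"unexpected severity score: {score}")
    return 0 if score <= 2 else 1
```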

The dataset was stratified by artifact severity and split at the case level into training (70 %), validation (15 %), and test (15 %) subsets, ensuring that all slices from a given patient remained in the same split. To mitigate class imbalance, a weighted random sampler was used during training.
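A weighted random sampler typically draws each slice with probability inversely proportional to its class frequency; the per-sample weights could be computed as below (a sketch under that assumption, not the authors' code; the weights would then be passed to e.g. `torch.utils.data.WeightedRandomSampler`):

```python
from collections import Counter

def class_balanced_weights(labels: list[int]) -> list[float]:
    """Per-sample weights inversely proportional to class frequency,
    so that each class contributes equally in expectation."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]
```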

Three convolutional neural network (CNN) architectures were evaluated: DenseNet‑121, ResNet‑18, and SE‑ResNet‑50. All models were implemented with MONAI and PyTorch Lightning, trained on NVIDIA RTX 2080 GPUs using the Adam optimizer and cross‑entropy loss. Learning rates were tuned per architecture (ranging from 4 × 10⁻⁶ to 9 × 10⁻⁵). Data augmentation comprised random rotations (±12°) and horizontal/vertical flips (p = 0.5). Early stopping with a patience of 10 epochs limited training to a maximum of 200 epochs.
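The early-stopping criterion (patience of 10 epochs, here tracked on validation loss as an assumption; the paper does not state the monitored quantity) can be sketched as a minimal tracker:

```python
class EarlyStopping:
    """Minimal early-stopping tracker: stop after `patience` epochs
    without improvement of the monitored validation loss."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```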

Performance on the validation set identified DenseNet‑121 as the best performer, which was then applied to the independent test set. In binary classification, DenseNet‑121 achieved an area under the receiver operating characteristic curve (AUROC) of 0.92 for hyper‑intense artifacts and 0.94 for hypo‑intense artifacts. In the multiclass setting, the weighted AUROCs were 0.85 and 0.88, respectively. Additional metrics (accuracy, precision, recall, AUPRC) were reported but not detailed in the abstract.
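For reference, binary AUROC equals the probability that a randomly chosen positive slice receives a higher score than a randomly chosen negative one (the Mann-Whitney U interpretation). A small pure-Python sketch, suitable for spot-checking library results:

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Binary AUROC via the Mann-Whitney U statistic (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice one would use `sklearn.metrics.roc_auc_score`, which computes the same quantity.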

To provide spatial interpretability, Grad‑CAM heatmaps were generated for each prediction. The top 20 % of activation values (threshold = 0.2) were binarized, contours extracted, and bounding boxes drawn around the most influential regions. A board‑certified radiologist evaluated the correspondence between these boxes and the true artifact locations on 200 randomly selected slices using a 1‑5 Likert scale. Mean scores were 3.33 ± 1.04 for hyper‑intense and 2.62 ± 0.81 for hypo‑intense artifacts, indicating moderate agreement but also room for improvement in localization precision.
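The box-extraction step can be sketched as below. This simplified version binarizes the normalized heatmap at the paper's threshold of 0.2 and returns one box enclosing all suprathreshold pixels; the authors' pipeline additionally extracts contours, which would require e.g. OpenCV's `findContours`.

```python
import numpy as np

def heatmap_to_bbox(heatmap: np.ndarray, threshold: float = 0.2):
    """Binarize a Grad-CAM heatmap (normalized to 0-1) and return the
    bounding box (x_min, y_min, x_max, y_max) of all active pixels,
    or None when nothing exceeds the threshold."""
    mask = heatmap >= threshold
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```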

The authors discuss several strengths: (1) slice‑wise labeling captures heterogeneous artifact presentations that MIP‑based methods miss; (2) the use of a relatively large, well‑balanced slice dataset (over 11 k samples) supports robust learning; (3) DenseNet‑121’s dense connectivity appears advantageous for this task. Limitations include the single‑center, single‑vendor data source, potential information loss from JPEG compression and 2‑D down‑sampling, and the qualitative nature of the bounding‑box evaluation without quantitative overlap metrics (e.g., IoU).
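The overlap metric the authors note was not computed (IoU) is straightforward to add once reference boxes exist; a minimal sketch for axis-aligned `(x_min, y_min, x_max, y_max)` boxes:

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```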

Future work is suggested to involve multi‑center, multi‑vendor datasets, 3‑D CNN or sequence models that exploit volumetric continuity, and more rigorous localization assessment. The authors conclude that deep‑learning‑based artifact detection, particularly using DenseNet‑121, is promising for automated quality control of high‑b‑value breast DWI and could be integrated into clinical workflows after further validation.

