GateFuseNet: An Adaptive 3D Multimodal Neuroimaging Fusion Network for Parkinson's Disease Diagnosis

Accurate diagnosis of Parkinson’s disease (PD) from MRI remains challenging due to symptom variability and pathological heterogeneity. Most existing methods rely on conventional magnitude-based MRI modalities, such as T1-weighted images (T1w), which are less sensitive to PD pathology than Quantitative Susceptibility Mapping (QSM), a phase-based MRI technique that quantifies iron deposition in deep gray matter nuclei. In this study, we propose GateFuseNet, an adaptive 3D multimodal fusion network that integrates QSM and T1w images for PD diagnosis. The core innovation lies in a gated fusion module that learns modality-specific attention weights and channel-wise gating vectors for selective feature modulation. This hierarchical gating mechanism enhances ROI-aware features while suppressing irrelevant signals. Experimental results show that our method outperforms three existing state-of-the-art approaches, achieving 85.00% accuracy and 92.06% AUC. Ablation studies further validate the contributions of ROI guidance, multimodal integration, and fusion positioning. Grad-CAM visualizations confirm the model’s focus on clinically relevant pathological regions. The source code and pretrained models are available at https://github.com/YangGaoUQ/GateFuseNet.

💡 Research Summary

Parkinson’s disease (PD) is a neurodegenerative disorder characterized by heterogeneous motor and non‑motor symptoms, making early and accurate diagnosis crucial for effective treatment. Conventional MRI‑based diagnostic approaches have largely relied on T1‑weighted (T1w) images, which capture anatomical structure but are relatively insensitive to the pathological hallmark of PD—excess iron deposition in deep gray‑matter nuclei. Quantitative Susceptibility Mapping (QSM), a phase‑based MRI technique, directly quantifies magnetic susceptibility and thus iron content, offering a complementary view of PD pathology. However, integrating QSM with T1w in a deep‑learning framework poses challenges due to differing contrast mechanisms, spatial resolutions, and signal characteristics.

The authors introduce GateFuseNet, an adaptive 3‑dimensional multimodal fusion network designed to jointly exploit QSM and T1w for PD classification. The architecture builds on a 3D U‑shaped encoder‑decoder backbone and inserts a novel Gated Fusion Module (GFM) at each encoder stage. The GFM performs two coordinated operations: (1) modality‑specific attention weighting, which learns spatial importance maps for QSM and T1w separately; and (2) channel‑wise gating, which generates a vector that selectively amplifies or suppresses individual feature channels. Importantly, the attention maps are guided by a pre‑defined region‑of‑interest (ROI) mask that highlights clinically relevant structures such as the substantia nigra, putamen, and caudate. This hierarchical gating strategy enables the network to focus on disease‑related patterns while attenuating irrelevant background noise.
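The two gating steps described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `gated_fusion`, the 1×1×1 projection used for attention, the way the ROI mask rescales the attention maps, and all weight arrays are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_qsm, f_t1w, roi_mask, w_att_qsm, w_att_t1w, w_gate, b_gate):
    """Fuse two (C, D, H, W) feature maps into one (C, D, H, W) map.

    All weights are hypothetical stand-ins for learned parameters.
    """
    # (1) Modality-specific spatial attention: project the channels of each
    # modality to one logit per voxel, then squash to (0, 1).
    att_qsm = sigmoid(np.einsum("c,cdhw->dhw", w_att_qsm, f_qsm))
    att_t1w = sigmoid(np.einsum("c,cdhw->dhw", w_att_t1w, f_t1w))

    # ROI guidance (assumed form): voxels inside the mask keep their full
    # attention, voxels outside are down-weighted by half.
    roi_weight = 0.5 + 0.5 * roi_mask
    fused = (att_qsm * roi_weight)[None] * f_qsm + (att_t1w * roi_weight)[None] * f_t1w

    # (2) Channel-wise gating: global average pool -> linear -> sigmoid,
    # giving one multiplicative gate per feature channel.
    pooled = fused.mean(axis=(1, 2, 3))
    gate = sigmoid(w_gate @ pooled + b_gate)
    return gate[:, None, None, None] * fused, gate

# Toy inputs: random features and a cubic ROI in the middle of the volume.
rng = np.random.default_rng(0)
C, D, H, W = 4, 8, 8, 8
f_qsm = rng.standard_normal((C, D, H, W))
f_t1w = rng.standard_normal((C, D, H, W))
roi = np.zeros((D, H, W))
roi[2:6, 2:6, 2:6] = 1.0

fused_out, gate = gated_fusion(
    f_qsm, f_t1w, roi,
    w_att_qsm=rng.standard_normal(C),
    w_att_t1w=rng.standard_normal(C),
    w_gate=rng.standard_normal((C, C)),
    b_gate=np.zeros(C),
)
print(fused_out.shape, gate.round(3))
```

In a trained network the projections and the gate would be learned layers, and the ROI mask would be resampled to the resolution of each encoder stage; the sketch only shows how spatial attention and channel gates compose.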

Training proceeds on 3D patches of 64 × 64 × 64 voxels with a cross‑entropy loss combined with L2 regularization. The model is optimized with Adam, starting from a learning rate of 1e‑4 that is halved every 10 epochs. The authors evaluated the model on a dataset of 200 participants (100 PD patients and 100 healthy controls), all scanned on the same scanner and each providing both QSM and T1w volumes. Generalization was assessed with 5‑fold cross‑validation.
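The learning-rate schedule and the cross-validation split can be sketched in plain Python. The helper names `step_lr` and `stratified_kfold` are hypothetical, and stratifying the folds by class is an assumption: the summary states only that 5-fold cross-validation was used.

```python
import random

def step_lr(epoch, base_lr=1e-4, step=10, factor=0.5):
    """Learning rate at a given epoch: base_lr halved every `step` epochs."""
    return base_lr * factor ** (epoch // step)

def stratified_kfold(labels, k=5, seed=0):
    """Split participant indices into k folds with equal class proportions.

    Stratification is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

# 100 PD patients (label 1) and 100 healthy controls (label 0).
labels = [1] * 100 + [0] * 100
folds = stratified_kfold(labels, k=5)

for fold_id, test_idx in enumerate(folds):
    n_pd = sum(labels[i] for i in test_idx)
    print(f"fold {fold_id}: {len(test_idx)} test participants, {n_pd} PD")

print([step_lr(e) for e in (0, 9, 10, 25)])
```

Splitting at the participant level, as here, keeps all patches from one participant in the same fold, which avoids leakage between training and test sets when patches are sampled per volume.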

GateFuseNet achieved an overall accuracy of 85.00 % and an area under the ROC curve (AUC) of 92.06 %, surpassing three state‑of‑the‑art baselines: a 3D‑ResNet using only T1w, a conventional multimodal fusion network, and a recent attention‑fusion model. Sensitivity and specificity both exceeded 80 %, indicating balanced performance on both positive and negative cases.
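For readers less familiar with the reported metrics, the following sketch computes accuracy, sensitivity, specificity, and AUC from first principles. The labels and scores are made-up toy values, not data from the study, and the 0.5 decision threshold is an assumption.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity at a fixed score threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # fraction of PD cases detected
        "specificity": tn / (tn + fp),  # fraction of controls correctly rejected
    }

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three PD cases (1) and three controls (0).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

metrics = classification_metrics(y_true, scores)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Unlike accuracy, sensitivity, and specificity, the AUC does not depend on the chosen threshold, which is why a model can report an AUC (92.06 %) noticeably higher than its accuracy (85.00 %).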

Ablation experiments dissected the contributions of three design elements: (a) removing ROI guidance and relying solely on learned attention reduced accuracy by 2.8 %; (b) training on each modality in isolation dropped performance to 78 % (QSM only) and 73 % (T1w only); and (c) placing the GFM only at the network’s final layer (instead of at each encoder depth) resulted in a 3 % accuracy loss. These findings indicate that ROI‑driven attention, multimodal integration, and intermediate‑level gating each contribute to the final performance.

Grad‑CAM visualizations further validated the model’s clinical relevance: activation maps consistently highlighted the substantia nigra, putamen, and caudate—regions known to exhibit iron accumulation and neurodegeneration in PD. This interpretability enhances trustworthiness for potential clinical deployment.
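The Grad-CAM computation itself is compact. The sketch below applies the standard formulation to 3D feature maps: channel weights are the spatially averaged gradients of the class score, and the map is the rectified weighted sum of activations. The arrays are synthetic; in practice they would be captured from a convolutional layer of the trained network, and which layer the authors used is not stated here.

```python
import numpy as np

def grad_cam_3d(activations, gradients):
    """Grad-CAM heat map from (K, D, H, W) activations and matching gradients.

    `gradients` is the derivative of the class score with respect to
    `activations`; both would normally come from framework hooks.
    """
    # Channel importance: average each channel's gradient over all voxels.
    alphas = gradients.mean(axis=(1, 2, 3))
    # Weighted sum of activation maps, keeping only positive evidence.
    cam = np.maximum(np.einsum("k,kdhw->dhw", alphas, activations), 0.0)
    peak = cam.max()
    # Normalize to [0, 1] unless the map is entirely zero.
    return cam / peak if peak > 0 else cam

# Synthetic example: channel 0 fires inside a small cube and supports the
# class; channel 1 is uniform and argues against it.
acts = np.zeros((2, 4, 4, 4))
acts[0, 1:3, 1:3, 1:3] = 2.0
acts[1] = 1.0
grads = np.stack([np.ones((4, 4, 4)), -np.ones((4, 4, 4))])

cam = grad_cam_3d(acts, grads)
print(cam[1, 1, 1], cam[0, 0, 0])
```

The resulting map has the resolution of the chosen feature layer, so it is usually upsampled to the input volume before being overlaid on the QSM or T1w image.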

The authors have released the full source code and pretrained weights on GitHub (https://github.com/YangGaoUQ/GateFuseNet), facilitating reproducibility and future extensions. They suggest that incorporating additional modalities such as diffusion tensor imaging (DTI) or functional MRI (fMRI), as well as scaling to multi‑center cohorts, could further improve robustness and enable real‑time decision support tools. In summary, GateFuseNet presents a technically sophisticated, biologically informed, and empirically validated solution for multimodal MRI‑based Parkinson’s disease diagnosis, setting a new benchmark for neuroimaging fusion strategies.

📜 Original Paper Content