DiGAN: Diffusion-Guided Attention Network for Early Alzheimer's Disease Detection

Early diagnosis of Alzheimer’s disease (AD) remains a major challenge due to the subtle and temporally irregular progression of structural brain changes in the prodromal stages. Existing deep learning approaches require large longitudinal datasets and often fail to model the temporal continuity and modality irregularities inherent in real-world clinical data. To address these limitations, we propose the Diffusion-Guided Attention Network (DiGAN), which integrates latent diffusion modelling with an attention-guided convolutional network. The diffusion model synthesizes realistic longitudinal neuroimaging trajectories from limited training data, enriching temporal context and improving robustness to unevenly spaced visits. The attention-convolutional layer then captures discriminative structural-temporal patterns that distinguish cognitively normal subjects from those with mild cognitive impairment and subjective cognitive decline. Experiments on the ADNI dataset demonstrate that DiGAN outperforms existing state-of-the-art baselines, showing its potential for early-stage AD detection.

💡 Research Summary

Alzheimer’s disease (AD) remains a major public‑health challenge, especially because the earliest structural brain changes are subtle, heterogeneous across individuals, and often captured only in sparse, irregularly timed MRI scans. Existing deep‑learning approaches for AD detection typically require large longitudinal cohorts and struggle to model continuous disease trajectories, limiting their applicability in real‑world clinical settings. In this context, the authors propose the Diffusion‑Guided Attention Network (DiGAN), a novel generative‑discriminative architecture that jointly synthesizes realistic longitudinal neuroimaging trajectories and learns discriminative structural‑temporal representations for early AD detection.

The generative component is a latent diffusion model. Starting from a clean MRI‑derived feature vector (regional volumes, white‑matter hyperintensities, perivascular spaces, etc.), the forward diffusion process adds Gaussian noise according to a noise schedule (α_t). A denoising network D_θ learns to reverse this process, reconstructing the original vector while optimizing a variational lower bound on the data likelihood. By sampling from the reverse diffusion process, the model can generate synthetic longitudinal profiles X̂_i that mimic real disease progression, even when only a few visits are available. These synthetic profiles are then split into overlapping subsequences of fixed length L, normalized across subjects, and fed to the discriminative branch.
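The forward noising step and the subsequence split described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the summary does not give the actual schedule, so a DDPM-style linear beta schedule is assumed, and the function names (`cumulative_alphas`, `forward_noise`, `overlapping_windows`), the stride, and the toy feature values are all hypothetical. The learned denoiser D_θ is not modelled here.

```python
import math
import random

def cumulative_alphas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of alpha_t = 1 - beta_t for a linear beta schedule.

    The linear schedule and its endpoints are assumptions, not taken from the paper.
    """
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta_t = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta_t
        alpha_bars.append(running)
    return alpha_bars

def forward_noise(x0, alpha_bar_t, rng):
    """Draw x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]

def overlapping_windows(visits, length, stride=1):
    """Split an ordered list of visits into overlapping subsequences of fixed length."""
    return [visits[s:s + length] for s in range(0, len(visits) - length + 1, stride)]

rng = random.Random(0)                      # fixed seed for reproducibility
alpha_bars = cumulative_alphas(num_steps=100)
x0 = [0.5, -1.2, 0.3]                       # toy feature vector, not real MRI data
x_noisy = forward_noise(x0, alpha_bars[-1], rng)
windows = overlapping_windows(["v1", "v2", "v3", "v4"], length=3)
print(windows)
```

With four visits and L = 3, the split yields two windows that share two visits. A subject with fewer than L visits yields no windows in this sketch, so a real pipeline would need padding or a smaller L for such subjects.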

The discriminative component is an attention‑convolutional network built from stacked Self‑Attention Convolution (SAC) units. Each SAC first computes scaled dot‑product attention across the temporal dimension, allowing the network to focus on clinically informative visits while capturing long‑range dependencies. The attention‑weighted representation is then processed by a 2‑D convolution that jointly operates on time and feature axes, extracting localized structural‑temporal patterns. Batch normalization and non‑linear activations stabilize training. After m SAC layers, the representation is flattened, passed through a fully‑connected layer, and transformed into a subsequence‑level logit. A sigmoid yields a probability p_eℓ that the subsequence reflects cognitive impairment.
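To make the two stages of an SAC unit concrete, here is a minimal pure-Python sketch: scaled dot-product attention across visits, followed by a small 2-D convolution over the time and feature axes and a ReLU. It rests on several simplifying assumptions: queries, keys, and values are the raw inputs (no learned projections), the kernel values are arbitrary, and batch normalization is omitted. The function names are hypothetical and this is not the authors' implementation.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_self_attention(x):
    """Scaled dot-product attention over time.

    x is a list of T visits, each a list of d features.
    Assumption: Q = K = V = x, i.e. no learned projection matrices.
    Returns the attended sequence and the T x T attention weights.
    """
    d = len(x[0])
    attended, weights = [], []
    for query in x:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in x]
        w = softmax(scores)
        weights.append(w)
        attended.append([sum(w[t] * x[t][j] for t in range(len(x))) for j in range(d)])
    return attended, weights

def conv2d_valid(x, kernel):
    """'Valid' 2-D cross-correlation over the time (rows) and feature (columns) axes."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(kernel[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(cols)] for i in range(rows)]

def relu(x):
    """Element-wise ReLU on a 2-D list."""
    return [[max(0.0, v) for v in row] for row in x]

# Toy subsequence: 3 visits x 4 features (illustrative values only).
x = [[1.0, 0.0, 0.5, 0.2],
     [0.9, 0.1, 0.4, 0.3],
     [0.2, 0.8, 0.1, 0.9]]
attended, weights = temporal_self_attention(x)
features = relu(conv2d_valid(attended, [[0.5, -0.5], [0.25, 0.25]]))
```

Each row of `weights` sums to 1 and shows how strongly one visit attends to every other visit. The 2 x 2 kernel then mixes adjacent visits and adjacent features, which is the "localized structural-temporal pattern" step in miniature.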

For subject‑level decision making, DiGAN aggregates the subsequence probabilities using a max‑pooling rule (p_i = max_ℓ p_eℓ). This clinically motivated rule reflects the notion that the presence of any neurodegenerative signal in a patient’s longitudinal record is sufficient to flag elevated risk. The overall loss combines the diffusion reconstruction loss (L_diff) with a binary cross‑entropy classification loss (L_cls) that includes a regularization term penalizing imbalanced positive‑to‑negative prediction ratios. Hyper‑parameters β and λ balance generative and discriminative objectives.
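The aggregation rule and the combined objective can be sketched as follows. The max rule comes directly from the description above. The rest is assumed: the summary does not state how β and λ enter the loss or what the balance regularizer looks like, so the weighted sum in `total_loss` and the `balance_penalty` function (a squared gap between the mean predicted rate and a target rate) are hypothetical stand-ins, not the paper's formulas.

```python
import math

def subject_probability(subseq_probs):
    """Max-pooling rule: a subject is as risky as its most suspicious subsequence."""
    return max(subseq_probs)

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def balance_penalty(probs, target_rate=0.5):
    """Hypothetical regularizer: squared gap between mean predicted rate and a target."""
    mean_rate = sum(probs) / len(probs)
    return (mean_rate - target_rate) ** 2

def total_loss(l_diff, probs, labels, beta=1.0, lam=0.1):
    """Assumed combination: beta * L_diff + BCE + lam * balance penalty."""
    return beta * l_diff + binary_cross_entropy(probs, labels) + lam * balance_penalty(probs)

# Toy subsequence probabilities for two subjects (illustrative values only).
subseq_probs = {"subject_a": [0.1, 0.8, 0.3], "subject_b": [0.05, 0.2]}
subject_probs = [subject_probability(p) for p in subseq_probs.values()]
labels = [1, 0]
loss = total_loss(l_diff=0.5, probs=subject_probs, labels=labels)
```

Because a single high-probability window decides the subject-level score, the rule is sensitive to any one false alarm among the subsequences, which is one reason a term discouraging lopsided predictions is a natural companion to it.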

Experiments were conducted on the ADNI cohort, comprising over a thousand subjects with 2–4 MRI visits each. Two binary classification tasks were evaluated: (1) cognitively normal (NO) vs. mild cognitive impairment (MCI) and (2) NO vs. AD. Baselines included linear methods (ALASCA), generative models (TV‑AE, AnoGAN), probabilistic kernel models (GP), and ensemble anomaly detectors (LSCP, SUOD, IsoForest). Synthetic data quality was assessed via distribution overlap, moment differences, KL divergence, and PCA visualizations, all indicating high fidelity to real scans.

Results show that DiGAN consistently outperforms all baselines. For NO vs. MCI, DiGAN achieves an average accuracy of 0.71 ± 0.04, sensitivity of 0.53 ± 0.03, and specificity of 0.96 ± 0.00, surpassing the next best method (LSCP) by roughly 6% in accuracy. Performance improves as the number of visits increases, confirming that richer temporal context benefits detection of subtle changes. For NO vs. AD, DiGAN reaches >0.85 accuracy and >0.95 sensitivity, with specificity remaining near 0.99. ROC and precision‑recall curves demonstrate superior AUC and stable precision across high‑recall regimes, highlighting robustness to class imbalance.
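For readers less familiar with the three reported metrics, the short sketch below shows how accuracy, sensitivity, and specificity are derived from confusion counts. The labels and predictions are made-up toy values chosen to mirror the qualitative pattern of moderate sensitivity with high specificity. They are not ADNI data and do not reproduce the reported numbers.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = impaired)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy example: half of the impaired subjects are missed, no healthy subject is flagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # {'accuracy': 0.75, 'sensitivity': 0.5, 'specificity': 1.0}
```

The toy case illustrates why accuracy alone can hide missed cases: a classifier that rarely raises false alarms can still overlook a large share of impaired subjects.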

Visualization of the SAC embeddings reveals a progressive refinement: early layers capture coarse, shared brain patterns, while deeper layers focus on high‑intensity, localized activations corresponding to AD‑related atrophy. This hierarchical behavior provides interpretability and explains why distinguishing NO from MCI remains more challenging than separating NO from AD.

In summary, DiGAN addresses three critical limitations of prior work: (1) it generates realistic longitudinal trajectories from limited data, mitigating scarcity and irregularity; (2) its attention‑convolutional encoder extracts discriminative structural‑temporal features; and (3) its max‑pooling aggregation aligns with clinical reasoning, yielding high specificity. The authors suggest future extensions such as multimodal integration (PET, CSF, genetics), conditional diffusion to model genotype‑specific trajectories, and deployment in prospective clinical trials for risk stratification and treatment monitoring. DiGAN thus represents a promising step toward reliable, early‑stage AD detection in real‑world healthcare environments.
