Multimodal normative modeling in Alzheimer's Disease with introspective variational autoencoders
Normative modeling learns a healthy reference distribution and quantifies subject-specific deviations to capture heterogeneous disease effects. In Alzheimer's disease (AD), multimodal neuroimaging offers complementary signals, but VAE-based normative models often (i) fit the healthy reference distribution imperfectly, inflating false positives, and (ii) rely on posterior aggregation (e.g., PoE/MoE) that can yield weak multimodal fusion in the shared latent space. We propose mmSIVAE, a multimodal soft-introspective variational autoencoder combined with Mixture-of-Product-of-Experts (MOPOE) aggregation to improve reference fidelity and multimodal integration. We compute deviation scores in latent space and feature space as distances from the learned healthy distributions, and map statistically significant latent deviations to regional abnormalities for interpretability. On ADNI MRI regional volumes and amyloid PET SUVR, mmSIVAE improves reconstruction on held-out controls and produces more discriminative deviation scores for outlier detection than VAE baselines, with higher likelihood ratios and clearer separation between control and AD-spectrum cohorts. Deviation maps highlight region-level patterns aligned with established AD-related changes. More broadly, our results underscore the importance of training objectives that prioritize reference-distribution fidelity and robust multimodal posterior aggregation for normative modeling, with implications for deviation-based analysis across multimodal clinical data.
💡 Research Summary
This paper addresses two persistent shortcomings of variational‑autoencoder (VAE)‑based normative modeling for Alzheimer’s disease (AD): (1) imperfect learning of the healthy reference distribution, which leads to inflated false‑positive rates even for control subjects, and (2) weak multimodal fusion when aggregating modality‑specific posteriors into a shared latent space. To overcome these issues, the authors propose mmSIVAE, a multimodal soft‑introspective VAE combined with a Mixture‑of‑Product‑of‑Experts (MOPOE) posterior aggregation scheme.
The soft‑introspective component extends the original IntroVAE by replacing the hard KL‑threshold with a soft exponential term applied to the full Evidence Lower Bound (ELBO). This creates a min‑max game where the encoder is encouraged to assign high ELBO values to real samples and low ELBO values to generated ones, while the decoder tries to “fool” the encoder by producing realistic reconstructions of the latent draws. The authors provide a non‑parametric analysis showing that, under mild conditions, the Nash equilibrium of this game corresponds to the encoder matching the decoder’s conditional distribution and the decoder minimizing a combination of KL divergence to the data distribution and entropy regularization.
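The min-max game above can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it assumes a Gaussian VAE with unit-variance likelihood, and the `alpha` scaling of the soft exponential term is a placeholder hyperparameter. The encoder loss rewards high ELBO on real samples and softly penalizes high ELBO on generated ones; the decoder loss pushes the ELBO of its own generations back up.

```python
import numpy as np

def elbo(x, x_recon, mu, logvar):
    """Per-sample ELBO for a Gaussian VAE: reconstruction log-likelihood
    (up to a constant) minus KL(q(z|x) || N(0, I))."""
    recon = -0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=-1)
    return recon - kl

def soft_intro_losses(elbo_real, elbo_fake, alpha=2.0):
    """Soft-introspective objectives (sketch; alpha is an assumed scale).

    Encoder: maximize ELBO on real samples while suppressing generated
    samples through a soft exponential term (replacing IntroVAE's hard
    KL threshold). Decoder: raise the ELBO of generated samples so the
    encoder can no longer tell them apart."""
    enc_loss = -np.mean(elbo_real) + np.mean(np.exp(alpha * elbo_fake)) / alpha
    dec_loss = -np.mean(elbo_fake)
    return enc_loss, dec_loss
```

In a full model these losses would be minimized alternately with gradient descent on the encoder and decoder parameters; here they only make the two opposing pressures on the ELBO explicit.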
For multimodal integration, the paper introduces MOPOE, a hierarchical aggregation that first forms product‑of‑experts (PoE) posteriors for every non‑empty subset of modalities and then mixes these subset‑level PoEs via a mixture‑of‑experts (MoE) weighting. This design inherits the sharpness of PoE while avoiding its tendency to be dominated by a single high‑precision modality, and it also benefits from the robustness of MoE that distributes probability mass across all experts. The resulting joint posterior is sharper than a pure MoE yet more balanced than a pure PoE, providing a richer latent representation for downstream deviation scoring.
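For Gaussian posteriors this hierarchy has a closed form: a product of Gaussian experts sums precisions, and the mixture step weights the subset-level products. The sketch below assumes diagonal-covariance modality posteriors, a standard-normal prior expert inside each product, and uniform mixture weights; all of these are common choices but assumptions on our part, not details confirmed by the summary.

```python
import itertools
import numpy as np

def poe(mus, logvars):
    """Product of diagonal Gaussian experts plus a standard-normal prior
    expert. Precisions add; the mean is the precision-weighted average
    (the prior contributes zero to the weighted-mean sum)."""
    precisions = [np.exp(-lv) for lv in logvars] + [np.ones_like(mus[0])]
    weighted = [m * p for m, p in zip(mus, precisions)]
    prec = np.sum(precisions, axis=0)
    mu = np.sum(weighted, axis=0) / prec
    return mu, -np.log(prec)  # log-variance of the product Gaussian

def mopoe(mus, logvars):
    """Mixture-of-Products-of-Experts (sketch): form the PoE posterior of
    every non-empty modality subset, then mix them uniformly."""
    M = len(mus)
    subsets = [s for r in range(1, M + 1)
               for s in itertools.combinations(range(M), r)]
    components = [poe([mus[i] for i in s], [logvars[i] for i in s])
                  for s in subsets]
    weights = np.full(len(subsets), 1.0 / len(subsets))
    return components, weights
```

With two modalities (e.g., MRI and PET) this yields three mixture components: each unimodal posterior and their joint product, so the joint posterior stays sharp where modalities agree while the mixture keeps mass on each modality alone.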
The method is evaluated on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort, using regional MRI volumes and amyloid PET SUVR as two complementary modalities. The model is trained exclusively on cognitively unimpaired controls to learn a healthy reference distribution. During testing, two deviation scores are computed: (i) a latent‑space deviation (LDS) based on Mahalanobis distance in the shared latent distribution, and (ii) a feature‑space deviation (FDS) based on reconstruction residuals. Compared with baseline VAEs (standard, β‑VAE, and adversarial VAE), mmSIVAE achieves substantially lower reconstruction error on held‑out controls (≈18 % reduction). More importantly, LDS and FDS yield higher out‑of‑distribution detection performance (ROC‑AUC ≈ 0.87 and 0.85 respectively) and produce clearer separation among control, mild cognitive impairment, and AD groups. The deviation scores also correlate strongly with clinical cognition measures (e.g., MMSE, CDR‑SB, r ≈ ‑0.62, p < 0.001).
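The two deviation scores can be written down directly. The sketch below assumes the control latents are summarized by their empirical mean and covariance (with a small ridge term for numerical stability, an assumption of ours) and that the feature-space residual is an absolute per-region difference; the paper's exact normalization may differ.

```python
import numpy as np

def latent_deviation(z_controls, z_test, ridge=1e-6):
    """Latent-space deviation score (LDS): Mahalanobis distance of each
    test latent from the healthy-control latent distribution."""
    mu = z_controls.mean(axis=0)
    cov = np.cov(z_controls, rowvar=False)
    cov_inv = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    diff = z_test - mu
    # per-sample quadratic form diff^T * cov_inv * diff
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

def feature_deviation(x, x_recon):
    """Feature-space deviation score (FDS): per-region reconstruction
    residual between observed features and the model's reconstruction."""
    return np.abs(x - x_recon)
```

A test subject whose latent code sits at the control mean scores near zero, while AD-spectrum subjects drift away from the healthy distribution; thresholding these scores gives the outlier-detection setup evaluated in the paper.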
Interpretability is demonstrated by mapping significant latent deviations back to regional feature deviations. The resulting deviation maps highlight hippocampal atrophy, temporal‑parietal hypometabolism, and frontal cortical changes—patterns well‑aligned with established AD pathology. External validation on independent cohorts from Austria and Japan shows comparable reconstruction quality and deviation‑score distributions, suggesting good generalizability.
All code and scripts will be released publicly upon acceptance, and the ADNI data usage complies with the consortium’s sharing policies and institutional review board approvals.
In summary, mmSIVAE advances normative modeling for AD by (1) enforcing a more faithful healthy‑reference learning through soft‑introspective training, and (2) delivering a robust multimodal latent space via MOPOE aggregation. These improvements translate into more accurate reconstructions, more discriminative deviation scores, and clinically meaningful, interpretable biomarkers, positioning the framework as a promising tool for personalized disease monitoring and multimodal biomarker discovery.