Support vector machine classification of dimensionally reduced structural MRI images for dementia


We classify very-mild to moderate dementia in patients (CDR ranging from 0 to 2) using a support vector machine classifier acting on a dimensionally reduced feature set derived from MRI brain scans of the 416 subjects in the OASIS-Brains dataset. We use image segmentation and principal component analysis to reduce the dimensionality of the data. Our resulting feature set contains 11 features per subject. Classifier performance is evaluated using 10-fold cross-validation. Using linear (Gaussian) kernels, we obtain a training classification accuracy of 86.4% (90.1%), test accuracy of 85.0% (85.7%), test precision of 68.7% (68.5%), test recall of 68.0% (74.0%), and test Matthews correlation coefficient of 0.594 (0.616).


💡 Research Summary

This paper investigates the use of support vector machines (SVM) to classify dementia status based on structural magnetic resonance imaging (MRI) data that have been dramatically reduced in dimensionality. The authors work with the publicly available OASIS‑Brains dataset, which contains T1‑weighted MRI scans from 416 subjects ranging in age from 18 to 96. Among these, 100 individuals have a Clinical Dementia Rating (CDR) greater than zero (i.e., they are considered demented), while the remaining 316 are cognitively normal (CDR = 0).

Rather than feeding the raw voxel intensities—on the order of 15 million voxels per scan—directly into a classifier, the authors first perform a series of preprocessing steps. They use the pre‑processed “masked” images (non‑brain tissue removed, gain‑field corrected, and registered to a standard atlas) and the segmented images (grey matter, white matter, and cerebrospinal fluid labels). From these they extract a compact set of eleven features per subject:

  1. Age (years)
  2. Gender (binary)
  3. Estimated total intracranial volume (eTIV)
  4. Normalized whole‑brain volume (nWBV)
  5. Total white‑matter volume (from segmentation)
  6. Total grey‑matter volume (from segmentation)
  7. Total CSF volume (from segmentation)
  8. Up/down axial symmetry (average zero‑lag correlation across axial slices)
  9. Left/right axial symmetry (average zero‑lag correlation across axial slices)
  10. Coefficient of the 4th principal component derived from a coronal “eigenbrain” slice
  11. Coefficient of the 7th principal component derived from an axial “eigenbrain” slice

The symmetry measures are computed by correlating each slice with its mirrored counterpart and normalizing by the slice’s total signal. The two PCA coefficients are obtained by first constructing eigenbrains—principal components of selected coronal and axial slices—using all subjects, then projecting each individual’s slice onto the chosen components. The authors selected components #4 (coronal) and #7 (axial) because they maximized cross‑validated test accuracy during a parametric search. All features are mean‑centered and scaled to unit variance before classification.
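The two feature-engineering steps above can be sketched in a few lines. This is an illustrative reading, not the authors' code: the exact normalization of the symmetry measure and the slice dimensions are assumptions, and the random array stands in for real slice data.

```python
import numpy as np
from sklearn.decomposition import PCA

def slice_symmetry(slc: np.ndarray) -> float:
    """Zero-lag correlation of a 2-D slice with its left/right mirror,
    normalized by the slice's total signal (one plausible form of the
    paper's symmetry feature; np.flipud gives the up/down variant)."""
    total = np.sum(slc * slc)
    if total == 0.0:
        return 0.0
    return float(np.sum(slc * np.fliplr(slc)) / total)

# A mirror-symmetric slice scores 1.0
sym_slice = np.array([[1.0, 2.0, 1.0],
                      [0.0, 3.0, 0.0]])
print(slice_symmetry(sym_slice))  # 1.0

# Eigenbrains: fit PCA across all subjects' flattened slices, then keep
# the coefficient of one chosen component per subject as a feature.
rng = np.random.default_rng(0)
slices = rng.normal(size=(416, 64 * 64))        # placeholder coronal slices
coeffs = PCA(n_components=8).fit_transform(slices)  # shape (416, 8)
coronal_pc4 = coeffs[:, 3]                      # 4th principal component
```

Because PCA components are ordered by explained variance, selecting components #4 and #7 (rather than #1) suggests the discriminative signal is not simply the dominant mode of anatomical variation.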

For the classification stage, the authors employ LIBSVM, testing both a linear kernel and a radial basis function (RBF, Gaussian) kernel. The RBF kernel’s γ parameter is set to the inverse of the number of features (1/11), a common heuristic that prevents the model from becoming overly complex. Model performance is evaluated using 10‑fold cross‑validation: in each fold, 90 % of the data are used for training and the remaining 10 % for testing, and the process is repeated until every subject has been held out once. The authors report accuracy, precision, recall, and the Matthews correlation coefficient (MCC), which is particularly informative for imbalanced binary problems.
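A minimal sketch of this evaluation setup, using scikit-learn's `SVC` (which wraps LIBSVM): the features and labels here are synthetic stand-ins matching the dataset's 416-subject, 11-feature, ~100-positive shape, and the regularization parameter C is not reported in the summary, so the default is used.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in: 416 subjects x 11 features, ~100 positive labels
rng = np.random.default_rng(0)
X = rng.normal(size=(416, 11))
y = np.zeros(416, dtype=int)
y[:100] = 1

# RBF kernel with gamma = 1 / n_features (the 1/11 heuristic);
# features are mean-centered and scaled to unit variance first.
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", gamma=1 / 11))

# 10-fold cross-validation: each subject is held out exactly once
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(round(scores.mean(), 3))
```

Stratified folds preserve the 316:100 class ratio in each split, which matters here because a plain random split could leave a fold with very few demented subjects.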

Results show that the linear SVM achieves a training accuracy of 86.4 % and a test accuracy of 85.0 %, while the RBF SVM reaches 90.1 % training accuracy and 85.7 % test accuracy. Precision is roughly 68 % for both kernels; recall is 68 % for the linear model and 74 % for the RBF model. MCC values are 0.594 (linear) and 0.616 (RBF), well above the chance level of MCC = 0. To assess feature importance, the authors remove the age variable; test accuracy drops by only about 2 %, confirming that the classifier does not rely solely on age. Excluding the PCA coefficients reduces recall by 5–10 % and lowers the MCC by 0.03–0.07, demonstrating that the eigenbrain features contribute meaningfully to performance.
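The value of MCC on this imbalanced dataset is easy to see with a small sketch: a degenerate classifier that always predicts "normal" would still score 316/416 ≈ 76 % accuracy, yet its MCC is zero.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Always predicting "normal" on the 316/100 OASIS-Brains split:
print(round(316 / 416, 2))   # 0.76 accuracy, yet...
print(mcc(0, 316, 0, 100))   # 0.0 -- no better than chance
```

Against that baseline, the reported MCCs of 0.594 and 0.616 indicate genuine discrimination between classes rather than exploitation of the class imbalance.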

The authors compare their approach to earlier studies that used raw voxel intensities and reported near‑90 % test accuracy. While their reduced‑feature model does not quite match that level, it avoids the severe over‑fitting seen in naïve raw‑voxel experiments (which reported 100 % training accuracy) and dramatically lowers computational demands, enabling the entire pipeline to run on a standard laptop. The paper concludes that a carefully engineered low‑dimensional feature set—combining demographic, volumetric, symmetry, and PCA‑derived eigenbrain coefficients—can yield robust dementia classification with good generalization, while remaining computationally tractable.

Limitations include the modest overall accuracy (≈85 %) and precision/recall around 70 %, which may be insufficient for clinical decision‑making without further validation. The dataset is also single‑site, so external generalizability remains to be demonstrated. Future work could incorporate multi‑site data, explore additional clinical covariates, and compare against deep‑learning approaches that learn features directly from images. Nonetheless, the study provides a clear demonstration that dimensionality reduction, when thoughtfully applied, can produce practical machine‑learning pipelines for neuroimaging‑based dementia detection.

