Simple 1-D Convolutional Networks for Resting-State fMRI Based Classification in Autism
Deep learning methods are increasingly being used with neuroimaging data like structural and functional magnetic resonance imaging (MRI) to predict the diagnosis of neuropsychiatric and neurological disorders. For psychiatric disorders in particular, it is believed that one of the most promising modalities is resting-state functional MRI (rsfMRI), which captures the intrinsic connectivity between regions in the brain. Because rsfMRI data points are inherently high-dimensional (~1M), it is impossible to process the entire input in its raw form. In this paper, we propose a very simple transformation of the rsfMRI images that captures all of the temporal dynamics of the signal but sub-samples its spatial extent. As a result, we use a very simple 1-D convolutional network which is fast to train, requires minimal preprocessing, and performs on par with the state-of-the-art on the classification of Autism spectrum disorders.
💡 Research Summary
The paper presents a minimalist yet effective deep‑learning pipeline for classifying Autism Spectrum Disorder (ASD) versus typically developing (TD) subjects using resting‑state functional MRI (rsfMRI) data from the large multi‑site ABIDE I+II consortium. The authors argue that most existing neuro‑imaging classifiers reduce the high‑dimensional 4‑D rsfMRI volumes to static summary statistics such as correlation matrices, thereby discarding potentially informative nonlinear temporal dynamics. To retain the full temporal information while still addressing the curse of dimensionality, they first apply a standard FSL‑based preprocessing workflow (motion correction, slice‑timing correction, spatial smoothing, high‑pass filtering, confound regression including WM/CSF signals and ICA‑AROMA, followed by band‑pass filtering 0.009–0.08 Hz). After preprocessing, the brain is parcellated using four atlases: Automated Anatomical Labeling (AAL), Harvard‑Oxford, Schaefer‑100 and Schaefer‑400. For each region of interest (ROI) the mean time‑series is extracted, yielding a matrix of size (time points × ROIs).
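The ROI-averaging step described above can be sketched as follows. This is an illustrative numpy implementation (the paper's pipeline uses FSL and standard atlas tools); the function name and array layout are assumptions, but the operation — mean BOLD signal per atlas label per time point, yielding a (time points × ROIs) matrix — matches the description.

```python
import numpy as np

def roi_mean_timeseries(bold, atlas):
    """Average the BOLD signal within each atlas region.

    bold  : 4-D array (x, y, z, t) of preprocessed fMRI data
    atlas : 3-D array (x, y, z) of integer ROI labels (0 = background)
    Returns a (t, n_rois) matrix of ROI-averaged time-series.
    """
    labels = np.unique(atlas)
    labels = labels[labels != 0]              # drop the background label
    t = bold.shape[-1]
    ts = np.empty((t, labels.size))
    for i, lab in enumerate(labels):
        mask = atlas == lab                   # boolean mask of voxels in this ROI
        ts[:, i] = bold[mask].mean(axis=0)    # mean over voxels, kept per time point
    return ts
```

With the Harvard-Oxford atlas this reduces each ~1M-voxel volume sequence to roughly a hundred channels while keeping every time point intact.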
These ROI‑averaged time‑series constitute the input to a very simple 1‑D convolutional neural network. The network consists of a single 1‑D convolutional layer whose number of input channels equals the number of ROIs, followed by an adaptive average‑pooling layer that collapses each channel to a single scalar. The pooled vector is flattened, passed through a fully‑connected layer with a dropout of 0.2, and finally a softmax layer produces the probability of the two classes. Training uses the Adam optimizer (learning rate = 1e‑4, weight decay = 2e‑3). Data are split into 70 % training, 10 % validation, and 20 % test; model selection is based on the lowest validation loss.
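The architecture is simple enough to sketch as a bare numpy forward pass. This is a minimal illustration of the described design (one Conv1d over ROI channels, adaptive average pooling to one scalar per channel, a fully connected layer, softmax); the filter count, kernel size, and ReLU nonlinearity are assumptions not stated in the summary, and dropout is omitted since it is inactive at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """Valid 1-D convolution: x (c_in, t), w (c_out, c_in, k), b (c_out,)."""
    c_out, c_in, k = w.shape
    t_out = x.shape[1] - k + 1
    y = np.empty((c_out, t_out))
    for j in range(t_out):
        # contract over input channels and kernel positions
        y[:, j] = np.tensordot(w, x[:, j:j + k], axes=([1, 2], [0, 1])) + b
    return y

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(ts, params):
    """ts: (t, n_rois) ROI time-series -> class probabilities (ASD vs TD)."""
    x = ts.T                                                 # channels-first: (n_rois, t)
    h = np.maximum(conv1d(x, params["w"], params["b"]), 0)   # conv + ReLU (assumed)
    pooled = h.mean(axis=1)                                  # adaptive avg-pool to one scalar per channel
    logits = params["fc_w"] @ pooled + params["fc_b"]        # fully connected layer
    return softmax(logits)

# Hypothetical sizes: ~110 Harvard-Oxford ROIs, 200 time points, 32 filters, kernel 5
n_rois, t, c_out, k = 110, 200, 32, 5
params = {
    "w": rng.normal(0, 0.01, (c_out, n_rois, k)),
    "b": np.zeros(c_out),
    "fc_w": rng.normal(0, 0.01, (2, c_out)),
    "fc_b": np.zeros(2),
}
probs = forward(rng.normal(size=(t, n_rois)), params)
```

In practice the model would be trained with Adam (lr 1e-4, weight decay 2e-3) as reported; the parameter count here is on the order of tens of thousands, consistent with the paper's claim of a lightweight architecture.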
Two evaluation strategies are employed. First, a 10‑fold cross‑validation examines how the number of retained time points (100, 150, 200, 250) influences performance for each atlas. Accuracy improves with longer recordings up to 200 time points (≈4 minutes), after which it declines because fewer subjects meet the longer duration threshold, reducing the effective sample size. The best cross‑validated accuracy (68 %) is achieved with the Harvard‑Oxford atlas at 200 time points. Second, a leave‑one‑site‑out (LOSO) validation assesses generalization across the 29 acquisition sites. Here, Harvard‑Oxford again yields the highest mean accuracy (65.1 %) and the lowest variance across sites, indicating robustness to scanner and protocol heterogeneity.
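The leave-one-site-out protocol can be written as a small generator over per-subject site labels. This is a generic sketch of LOSO splitting, not code from the paper; the function and variable names are illustrative.

```python
import numpy as np

def leave_one_site_out(site_ids):
    """Yield (held-out site, train indices, test indices) for each acquisition site.

    site_ids: 1-D array-like of site labels, one per subject.
    Each fold trains on all other sites and tests on the held-out site,
    so performance reflects generalization to unseen scanners/protocols.
    """
    site_ids = np.asarray(site_ids)
    for site in np.unique(site_ids):
        test = np.where(site_ids == site)[0]
        train = np.where(site_ids != site)[0]
        yield site, train, test
```

Applied to ABIDE I+II this produces 29 folds, one per site; averaging accuracy over folds gives the reported 65.1 % for the Harvard-Oxford atlas.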
The authors also explore the relationship between image quality, measured by temporal signal‑to‑noise ratio (tSNR), and classification accuracy. Pearson correlation analyses reveal no significant association (p > 0.41 for all atlases), suggesting that the extensive confound regression and ICA‑AROMA steps effectively mitigate site‑specific quality differences.
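The quality analysis above rests on two simple quantities, sketched here for concreteness (illustrative code, not the authors' implementation): tSNR is the temporal mean divided by the temporal standard deviation of each ROI's time-series, and the association with accuracy is tested with a Pearson correlation.

```python
import numpy as np

def tsnr(ts):
    """Temporal SNR per ROI: temporal mean / temporal std; ts is (t, n_rois)."""
    return ts.mean(axis=0) / ts.std(axis=0)

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays
    (e.g., per-site mean tSNR vs. per-site classification accuracy)."""
    x = x - x.mean()
    y = y - y.mean()
    return (x @ y) / np.sqrt((x @ x) * (y @ y))
```

A correlation near zero with p > 0.41, as the authors report for all four atlases, supports the claim that confound regression and ICA-AROMA have largely removed site-quality effects from the signal the classifier uses.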
Overall, the study demonstrates that a straightforward 1‑D ConvNet, combined with ROI‑averaged raw time‑series, can reach performance comparable to more complex graph‑convolutional or ensemble methods (which report accuracies in the low‑70 % range) while requiring far less computational resources—training completes in under two minutes on a single NVIDIA Pascal GPU, and the architecture involves only a handful of learnable parameters.
Nevertheless, the achieved accuracy (65–68 %) remains modest for clinical deployment, and the reliance on ROI averaging still discards fine‑grained spatial information. Future work could explore finer parcellations, data‑augmentation or transfer‑learning strategies, multimodal fusion with structural MRI or behavioral measures, and more sophisticated temporal modeling (e.g., recurrent networks or attention mechanisms) to boost sensitivity and specificity.
In summary, the paper contributes a proof‑of‑concept that preserving the full temporal dynamics of rsfMRI, even when spatially reduced to a few hundred ROIs, enables a lightweight deep‑learning classifier that is fast to train, easy to implement, and competitive with state‑of‑the‑art approaches for autism classification.