Accelerating Benchmarking of Functional Connectivity Modeling via Structure-aware Core-set Selection

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Benchmarking the hundreds of functional connectivity (FC) modeling methods on large-scale fMRI datasets is critical for reproducible neuroscience. However, the combinatorial explosion of model-data pairings makes exhaustive evaluation computationally prohibitive, preventing such assessments from becoming a routine pre-analysis step. To break this bottleneck, we reframe the challenge of FC benchmarking as selecting a small, representative core-set whose sole purpose is to preserve the relative performance ranking of FC operators. We formalize this as a ranking-preserving subset selection problem and propose Structure-aware Contrastive Learning for Core-set Selection (SCLCS), a self-supervised framework for selecting these core-sets. SCLCS first uses an adaptive Transformer to learn each sample’s unique FC structure. It then introduces a novel Structural Perturbation Score (SPS) to quantify the stability of these learned structures during training, identifying samples that represent foundational connectivity archetypes. Finally, while SCLCS identifies stable samples via a top-k ranking, we further introduce a density-balanced sampling strategy as a necessary correction to promote diversity, ensuring the final core-set is both structurally robust and distributionally representative. On the large-scale REST-meta-MDD dataset, SCLCS preserves the ground-truth model ranking with just 10% of the data, outperforming state-of-the-art (SOTA) core-set selection methods by up to 23.2% in ranking consistency (nDCG@k). To our knowledge, this is the first work to formalize core-set selection for FC operator benchmarking, thereby making large-scale operator comparisons a feasible and integral part of computational neuroscience. Code is publicly available at https://github.com/lzhan94swu/SCLCS


💡 Research Summary

The paper tackles a pressing bottleneck in functional connectivity (FC) research: benchmarking hundreds of statistical pairwise interaction (SPI) operators on large‑scale fMRI datasets is computationally prohibitive. Rather than trying to evaluate every operator on the full dataset, the authors reframe the problem as a “ranking‑preserving core‑set selection” task. The goal is to find a small subset of subjects (the core‑set) such that the relative performance ranking of all SPI operators computed on this subset matches the ranking obtained on the entire cohort.

To achieve this, they introduce SCLCS (Structure‑aware Contrastive Learning for Core‑set Selection), a self‑supervised pipeline consisting of four key components:

  1. Attention‑based FC learning – Each fMRI sample (N ROIs × T time points) is treated as a sequence of N tokens. An adaptive multi‑head transformer encoder learns a sample‑specific connectivity matrix. Crucially, instead of naïvely averaging attention heads (which Theorem 1 shows dilutes structural information), the model learns a set of non‑negative fusion weights α that combine heads adaptively. Theorem 2 proves that this adaptive attention family can universally approximate any continuous SPI mapping on a compact domain, guaranteeing expressive power for diverse FC patterns.

  2. Structural Perturbation Score (SPS) – During training, the Frobenius norm of the difference between the attention matrix at epoch e and epoch e‑1 is accumulated across L epochs. Proposition 1 demonstrates that this perturbation magnitude reflects the mixture of underlying connectivity prototypes: samples that belong to a pure prototype exhibit low SPS (stable structure), whereas noisy or atypical samples yield high SPS. Hence SPS serves as a principled importance metric for core‑set selection.

  3. Structure‑aware density‑balanced sampling – Selecting the top‑k low‑SPS samples alone can be brittle because it may over‑represent dense regions of the data manifold and ignore rare but informative connectivity patterns. The authors therefore estimate the local density of each candidate (e.g., via k‑nearest‑neighbor counts) and deliberately sample additional points from low‑density regions, ensuring that the final core‑set is both structurally robust and distributionally diverse.

  4. Identity‑supervised contrastive learning – Subject identity labels are used to pull together different scans of the same participant while pushing apart scans from different participants. This encourages the encoder to learn “brain fingerprint” representations that are stable across sessions, aligning the learned structures with the downstream SPI evaluation metric (Spearman rank correlation between within‑class and between‑class FC similarity).
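Step 1 above (adaptive head fusion) can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the function name `fuse_heads` is hypothetical, and a softmax over learnable logits is just one plausible way to enforce the non-negative fusion weights α described in the summary.

```python
import numpy as np

def fuse_heads(attn_heads, alpha_logits):
    """Combine per-head attention maps with learned non-negative weights.

    attn_heads:   (H, N, N) array of attention matrices, one per head.
    alpha_logits: (H,) unconstrained parameters; a softmax maps them to
                  non-negative weights summing to 1 (one way to realize
                  the non-negativity constraint on alpha).
    """
    alpha = np.exp(alpha_logits - alpha_logits.max())
    alpha = alpha / alpha.sum()                       # non-negative, sums to 1
    return np.tensordot(alpha, attn_heads, axes=1)    # (N, N) fused FC matrix
```

Note that naive head averaging is the special case of uniform α (all logits equal); learned α can instead up-weight heads carrying more structural information, which is the failure mode Theorem 1 attributes to plain averaging.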
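Step 2's Structural Perturbation Score reduces to accumulating Frobenius-norm changes of a sample's attention matrix across consecutive epochs. A minimal sketch, assuming the per-epoch attention matrices have been saved (the function name is illustrative):

```python
import numpy as np

def structural_perturbation_score(attn_by_epoch):
    """Accumulate Frobenius-norm changes of one sample's attention matrix
    across consecutive training epochs (a sketch of the SPS described above).

    attn_by_epoch: list of (N, N) attention matrices, one per epoch.
    Returns sum over e of ||A_e - A_{e-1}||_F; low values indicate a
    stable learned structure (a candidate core-set sample).
    """
    return sum(
        np.linalg.norm(attn_by_epoch[e] - attn_by_epoch[e - 1], ord="fro")
        for e in range(1, len(attn_by_epoch))
    )
```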
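Step 3's density-balanced correction can be sketched as below. This is an assumption-laden illustration: the function name, the use of the k-th nearest-neighbor distance as an inverse-density proxy, and the 70/30 split between low-SPS and low-density picks are all choices made here for concreteness, not details from the paper.

```python
import numpy as np

def density_balanced_select(sps, embeddings, budget, knn=5, low_density_frac=0.3):
    """Pick most of the budget as lowest-SPS (most stable) samples, then
    top up from low-density regions to cover rare connectivity patterns.

    sps:        (n,) structural perturbation scores (lower = more stable).
    embeddings: (n, D) sample representations used to estimate density.
    The knn-th nearest-neighbor distance serves as an inverse density proxy:
    a large distance means the sample sits in a sparse region.
    """
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    knn_dist = np.sort(d, axis=1)[:, knn]         # large => low local density

    n_stable = int(budget * (1 - low_density_frac))
    stable = list(np.argsort(sps)[:n_stable])     # most structurally stable
    rare = [i for i in np.argsort(-knn_dist) if i not in stable]
    return np.array(stable + rare[: budget - n_stable])
```

The design intent matches the summary: top-k-by-SPS alone over-samples dense regions, so a fixed fraction of the budget is deliberately spent on sparse-region samples.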
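Step 4 corresponds to a supervised contrastive loss with subject identity as the label: scans of the same participant are positives, everything else is a negative. A self-contained NumPy sketch under that assumption (the exact loss used in the paper may differ in normalization details):

```python
import numpy as np

def identity_contrastive_loss(z, subject_ids, tau=0.1):
    """Identity-supervised contrastive loss sketch.

    z:           (B, D) scan embeddings.
    subject_ids: (B,) integer participant labels; same-subject pairs are
                 pulled together, different-subject pairs pushed apart.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                      # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    same = (subject_ids[:, None] == subject_ids[None, :])
    np.fill_diagonal(same, False)
    has_pos = same.any(axis=1)                          # anchors with a positive
    pos_log_prob = np.where(same, log_prob, 0.0).sum(axis=1)
    return (-pos_log_prob[has_pos] / same.sum(axis=1)[has_pos]).mean()
```

Minimizing this loss makes embeddings of the same subject's sessions cluster, i.e. it encourages the "brain fingerprint" property the downstream SPI evaluation rewards.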

The method is evaluated on the REST‑meta‑MDD dataset, a multi‑site resting‑state fMRI collection comprising thousands of subjects and hundreds of ROIs. Two benchmark tasks are considered: (a) brain‑fingerprinting (identifying a subject from their FC) and (b) major depressive disorder (MDD) diagnosis. For each task, 130 SPI operators are scored, producing a full‑dataset ranking. Core‑sets of varying sizes (1%–20% of the data) are then constructed with SCLCS and with several strong baselines (CRAIG, GLISTER, BADGE, random, etc.). Ranking fidelity is measured by normalized Discounted Cumulative Gain at k (nDCG@k).
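The ranking-fidelity metric can be made concrete with a small sketch. Here each operator's relevance is taken to be its full-dataset score, and nDCG@k asks how well the subset-derived ordering recovers the full-dataset top-k; the paper's exact relevance definition may differ, so treat this as illustrative.

```python
import numpy as np

def ndcg_at_k(scores_full, scores_subset, k):
    """nDCG@k between the operator ranking induced by subset scores and
    the ideal ranking from full-dataset scores (used here as relevance).
    """
    order_subset = np.argsort(-scores_subset)[:k]     # ranking from the core-set
    order_full = np.argsort(-scores_full)[:k]         # ideal full-dataset ranking
    discounts = 1.0 / np.log2(np.arange(2, k + 2))    # positions 1..k
    dcg = (scores_full[order_subset] * discounts).sum()
    idcg = (scores_full[order_full] * discounts).sum()
    return dcg / idcg
```

A value of 1.0 means the core-set reproduces the full-dataset top-k ordering exactly; the ≈0.92 reported below indicates only minor rank inversions at a 10% budget.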

Key results:

  • With only 10% of the subjects, SCLCS achieves nDCG@k ≈ 0.92, preserving the full‑dataset ranking almost perfectly.
  • Compared to the best existing core‑set method, SCLCS improves ranking consistency by up to 23.2% across all core‑set budgets.
  • Ablation studies reveal that (i) removing adaptive head fusion dramatically degrades SPS stability, (ii) omitting density‑balanced sampling reduces diversity and leads to rank distortion, and (iii) contrastive pre‑training is essential for aligning learned structures with the downstream SPI scores.

The authors discuss limitations: the transformer encoder is memory‑intensive for very high‑dimensional ROI grids, and SPS may over‑penalize noisy low‑SNR data, potentially discarding useful samples. Future work includes exploring lightweight graph‑neural‑network encoders, robust regularization for noisy scans, and extending the framework to other neuroimaging modalities (e.g., diffusion MRI, MEG).

In summary, this work is the first to formalize core‑set selection for FC operator benchmarking, providing a theoretically grounded, empirically validated pipeline that reduces computational cost by an order of magnitude while maintaining high fidelity of operator rankings. The code and data are publicly released, facilitating reproducibility and broader adoption in the neuroimaging community.

