Brain Gene Expression Analysis: a MATLAB toolbox for the analysis of brain-wide gene-expression data


The Allen Brain Atlas project (ABA) generated a genome-scale collection of gene-expression profiles using in-situ hybridization. These profiles were co-registered to the three-dimensional Allen Reference Atlas (ARA) of the adult mouse brain. More than 4,000 such volumetric datasets are available for the full brain at a resolution of 200 microns, and they are presented as a voxel-by-gene matrix. The ARA comes with several annotation systems: hierarchical (40 cortical regions and 209 sub-cortical regions in the whole brain) and non-hierarchical (12 regions in the left hemisphere, refined into 94 regions, and cortical layers). The high dimensionality of this dataset and the possible connection between anatomy and gene expression pose challenges for data analysis. We developed the Brain Gene Expression Analysis Toolbox, whose functionality includes the determination of marker genes for brain regions, statistical analysis of brain-wide co-expression patterns, and the computation of brain-wide correlation maps with cell-type-specific microarray data.


💡 Research Summary

The Allen Brain Atlas (ABA) has produced a genome‑scale collection of mouse brain gene‑expression profiles using in‑situ hybridization (ISH). These data are co‑registered to the three‑dimensional Allen Reference Atlas (ARA) at a 200 µm voxel resolution, yielding a voxel‑by‑gene matrix of roughly 49,742 voxels by 4,117 genes. The ARA provides multiple annotation schemes, including a hierarchical system (40 cortical and 209 subcortical regions) and non‑hierarchical partitions (12 large left‑hemisphere regions refined into 94 sub‑regions, plus cortical layers). While this resource offers unprecedented anatomical coverage, its high dimensionality and the need to relate anatomy to transcriptional patterns create substantial analytical challenges.
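The data layout described above can be illustrated with a minimal Python sketch (not the toolbox's MATLAB code). It assumes a toy voxel‑by‑gene matrix stored as a list of rows, one annotation label per voxel, and a hierarchical annotation encoded as child‑to‑parent links. All gene and region names are hypothetical placeholders, not ARA identifiers. Walking up the hierarchy is what lets a query on a parent structure include the voxels of all its sub‑regions.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: rows = voxels, columns = genes.
genes = ["GeneA", "GeneB", "GeneC"]
expression = [
    [5.0, 0.1, 1.0],
    [4.0, 0.2, 1.1],
    [0.3, 3.0, 0.9],
    [0.2, 2.5, 1.2],
]
# One (hypothetical) leaf label per voxel.
voxel_label = ["CA1", "CA3", "Isocortex_L1", "Isocortex_L2"]

# Hierarchical annotation as child -> parent links; None marks the root.
parent: Dict[str, Optional[str]] = {
    "Brain": None,
    "Hippocampus": "Brain",
    "CA1": "Hippocampus",
    "CA3": "Hippocampus",
    "Isocortex": "Brain",
    "Isocortex_L1": "Isocortex",
    "Isocortex_L2": "Isocortex",
}

def is_within(label: str, region: str) -> bool:
    """True if `label` equals `region` or lies below it in the hierarchy."""
    node: Optional[str] = label
    while node is not None:
        if node == region:
            return True
        node = parent[node]
    return False

def voxels_in(region: str) -> List[int]:
    """Indices of voxels annotated to `region` or to any of its descendants."""
    return [i for i, lab in enumerate(voxel_label) if is_within(lab, region)]

def region_mean(region: str) -> List[float]:
    """Mean expression of every gene over the voxels of `region`."""
    idx = voxels_in(region)
    return [sum(expression[i][g] for i in idx) / len(idx) for g in range(len(genes))]

print(voxels_in("Hippocampus"))  # [0, 1]
print(region_mean("Isocortex"))
```

A non‑hierarchical partition is the special case in which every label's parent is the root, so the same helpers apply.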

To address these challenges, the authors developed the Brain Gene Expression Analysis Toolbox, a MATLAB‑based suite that streamlines three core analytical tasks: (1) identification of region‑specific marker genes, (2) statistical assessment of brain‑wide co‑expression patterns, and (3) generation of whole‑brain correlation maps using cell‑type‑specific microarray data.

Marker‑gene detection is performed by computing the mean expression of each gene across voxels belonging to a given region, then evaluating a specificity score that compares regional expression to the whole‑brain average. Statistical significance is assessed with t‑tests and false‑discovery‑rate (FDR) correction, and users can set thresholds for minimum expression, region coverage, and effect size. The result is a ranked list of candidate markers for any anatomical label.
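The ranking step can be sketched in Python as follows. This is an illustration under stated assumptions, not the toolbox's implementation. It assumes the specificity score is the ratio of a gene's mean expression inside the region to its whole‑brain mean. The threshold names and defaults (`min_expr`, `min_coverage`, `min_ratio`) are hypothetical. The per‑gene p‑values are assumed to come from t‑tests computed elsewhere, because the Python standard library has no t distribution. The FDR step is the standard Benjamini-Hochberg step‑up procedure.

```python
from statistics import mean
from typing import List, Sequence, Tuple

def rank_markers(
    expression: List[List[float]],  # voxels x genes
    in_region: Sequence[bool],      # True for voxels inside the region
    min_expr: float = 0.5,          # hypothetical: minimum regional mean
    min_coverage: float = 0.5,      # hypothetical: fraction of region voxels >= min_expr
    min_ratio: float = 2.0,         # hypothetical effect-size threshold
) -> List[Tuple[int, float]]:
    """Return (gene index, specificity) pairs sorted by decreasing specificity."""
    region_rows = [row for row, inside in zip(expression, in_region) if inside]
    ranked = []
    for g in range(len(expression[0])):
        region_vals = [row[g] for row in region_rows]
        brain_mean = mean(row[g] for row in expression)
        region_mean = mean(region_vals)
        coverage = sum(v >= min_expr for v in region_vals) / len(region_vals)
        if brain_mean <= 0 or region_mean < min_expr or coverage < min_coverage:
            continue
        specificity = region_mean / brain_mean  # > 1 means enriched in the region
        if specificity >= min_ratio:
            ranked.append((g, specificity))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

def benjamini_hochberg(pvals: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected at false-discovery rate q (step-up procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value passes its BH threshold
    return sorted(order[:k_max])

expression = [  # 6 hypothetical voxels x 3 genes; the first two voxels form the region
    [5.0, 0.1, 1.0],
    [4.0, 0.2, 1.1],
    [0.3, 3.0, 0.9],
    [0.2, 2.5, 1.2],
    [0.1, 2.0, 1.0],
    [0.4, 0.2, 0.8],
]
in_region = [True, True, False, False, False, False]
print(rank_markers(expression, in_region))                   # only gene 0, specificity about 2.7
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5], q=0.05))  # [0, 1, 2]
```

Because Benjamini-Hochberg is a step‑up procedure, a hypothesis can be rejected even if a smaller p‑value just above it in rank failed its own threshold. That is why the loop records the largest passing rank instead of stopping at the first failure.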

Co‑expression analysis calculates Pearson correlations across voxels for every pair of genes, producing a large gene‑by‑gene correlation matrix. To manage memory, the matrix is stored in sparse format. This saves space only once weak correlations are discarded, since a full correlation matrix is dense. Parallel loops (parfor) accelerate the computation. The toolbox compares the observed correlation distribution against permutation‑derived null models, allowing the extraction of statistically robust gene modules. Visualization options include heat‑maps, hierarchical clustering dendrograms, and dimensionality‑reduction plots (PCA, ICA).
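The co‑expression step can be sketched in Python as follows. This is an illustration, not the toolbox's implementation. Each row of `profiles` is one gene's expression across voxels. Only pairs with |r| at or above a hypothetical `threshold` are kept, in a dictionary that plays the role of the sparse matrix. A permutation p‑value is obtained by shuffling the voxel order of one gene. The function names and defaults are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when a computed variance is exactly zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def sparse_coexpression(
    profiles: List[List[float]], threshold: float = 0.9
) -> Dict[Tuple[int, int], float]:
    """Correlate every gene pair across voxels; keep only pairs with |r| >= threshold."""
    kept: Dict[Tuple[int, int], float] = {}
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):  # upper triangle only: r is symmetric
            r = pearson(profiles[i], profiles[j])
            if abs(r) >= threshold:
                kept[(i, j)] = r
    return kept

def permutation_pvalue(
    x: Sequence[float], y: Sequence[float], n_perm: int = 999, seed: int = 0
) -> float:
    """Two-sided p-value for r(x, y) against a null built by shuffling y's voxel order."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 counts the observed arrangement itself

profiles = [  # 4 hypothetical genes x 8 voxels
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 4, 6, 8, 10, 12, 14, 16],
    [8, 7, 6, 5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4, 2, 6, 1],
]
print(sparse_coexpression(profiles, threshold=0.9))  # {(0, 1): 1.0, (0, 2): -1.0, (1, 2): -1.0}
print(permutation_pvalue(profiles[0], profiles[1]))
```

Shuffling voxels independently ignores spatial autocorrelation between neighbouring voxels. A null of this kind can therefore be anti‑conservative for smooth expression patterns, and it is best treated as a simple baseline.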

Cell‑type correlation mapping enables users to import external transcriptomic profiles (e.g., neuronal, astrocytic, microglial signatures) and compute, for each voxel, the correlation across genes between the profile and that voxel's ABA expression values. The resulting correlation values, one per voxel, are projected onto the 3‑D ARA, producing intuitive maps that highlight brain regions enriched for a particular cell type. This feature links spatially resolved ISH data with cell‑type‑specific microarray or single‑cell datasets, facilitating cross‑modal validation.
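The voxel‑wise cell‑type correlation can be sketched in Python as follows. The sketch assumes that the cell‑type signature is a mapping from gene name to expression value and that both datasets are already on comparable scales (for example, log‑transformed). In practice this normalization is a nontrivial choice. For each voxel, the Pearson correlation is taken across the genes shared by the ISH matrix and the signature. This gives one value per voxel, which can then be placed back at that voxel's 3‑D position. All names are hypothetical, and this is not the toolbox's code.

```python
import math
from typing import Dict, List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when a computed variance is exactly zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def celltype_correlation_map(
    expression: List[List[float]],  # voxels x genes (ISH)
    ish_genes: Sequence[str],       # gene name for each ISH column
    signature: Dict[str, float],    # hypothetical cell-type profile: gene -> value
) -> List[float]:
    """One correlation per voxel, taken across the genes present in both datasets."""
    shared = [g for g, name in enumerate(ish_genes) if name in signature]
    reference = [signature[ish_genes[g]] for g in shared]
    return [pearson([row[g] for g in shared], reference) for row in expression]

ish_genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
signature = {"GeneA": 9.0, "GeneB": 1.0, "GeneC": 5.0, "GeneX": 3.0}  # GeneX has no ISH column
expression = [
    [4.5, 0.5, 2.5, 7.0],  # proportional to the signature on shared genes
    [1.0, 9.0, 5.0, 0.0],  # anti-correlated with it
    [2.0, 2.0, 2.0, 1.0],  # flat on shared genes
]
print(celltype_correlation_map(expression, ish_genes, signature))  # [1.0, -1.0, 0.0]
```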

The toolbox also incorporates a suite of visualization utilities (slice, surface, and transparent 3‑D views) and statistical tools (bootstrapping for confidence intervals, multiple‑comparison corrections). Implementation leverages MATLAB’s optimized linear‑algebra routines, single‑precision storage to reduce memory footprint, and optional GPU acceleration for large‑scale operations.
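Two of the points above can be made concrete with a short Python sketch (not the toolbox's code). The first is a percentile bootstrap confidence interval, resampling with replacement under a fixed seed. The second is the memory footprint of a dense voxel‑by‑gene matrix at 4 bytes per value (single precision) versus 8 bytes (double). With the 49,742 × 4,117 dimensions quoted in this summary, that is 819,151,256 bytes (about 819 MB) versus 1,638,302,512 bytes (about 1.64 GB). The helper names and defaults are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

def bootstrap_ci(
    data: Sequence[float],
    stat: Callable[[List[float]], float] = mean,
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    # Nearest-rank percentiles of the bootstrap distribution.
    lo = boots[round((alpha / 2) * (n_boot - 1))]
    hi = boots[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi

def matrix_bytes(n_voxels: int, n_genes: int, bytes_per_value: int) -> int:
    """Storage for a dense voxel-by-gene matrix, ignoring per-array overhead."""
    return n_voxels * n_genes * bytes_per_value

single = matrix_bytes(49_742, 4_117, 4)  # single precision
double = matrix_bytes(49_742, 4_117, 8)  # double precision
print(single, double)  # 819151256 1638302512

values = [float(v) for v in range(1, 11)]  # toy data
print(bootstrap_ci(values))
```

Halving the per‑value size is what makes single precision attractive for a matrix of this size. The cost is about seven significant decimal digits, which is ample for semi‑quantitative ISH intensities.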

The authors demonstrate the toolbox with three case studies. First, they recover the known cortical layer markers Satb2 and Cux1, confirming the method’s ability to pinpoint region‑specific genes. Second, a whole‑brain co‑expression network reveals a strong positive correlation between Gad1 and Gad2, consistent with both genes being expressed in GABAergic neurons. Third, correlation mapping with an astrocyte microarray signature identifies high‑correlation voxels in the hippocampus and specific cortical zones, providing a visual check on the spatial distribution of astrocytic transcriptional programs.

Limitations are acknowledged. The 200 µm resolution cannot resolve single‑cell microstructures, potentially obscuring fine‑grained regional markers. ISH intensity is semi‑quantitative, so absolute expression levels should be interpreted cautiously. The current implementation is mouse‑specific; extending to human brain atlases would require re‑annotation and coordinate transformation.

In summary, the Brain Gene Expression Analysis Toolbox provides an integrated, user‑friendly platform for interrogating the ABA’s large gene‑expression dataset. By combining marker discovery, co‑expression network analysis, and cell‑type correlation mapping within a single MATLAB environment, it helps neuroscientists link anatomical architecture with transcriptional signatures. Future extensions, such as integration with high‑resolution single‑cell RNA‑seq, cloud‑based parallel processing, and machine‑learning predictive models, could further enhance its utility for studying brain function and disease mechanisms.