arrayMap: A Reference Resource for Genomic Copy Number Imbalances in Human Malignancies
Background: The delineation of genomic copy number abnormalities (CNAs) from cancer samples has been instrumental for the identification of tumor suppressor genes and oncogenes and has proven useful for clinical marker detection. An increasing number of projects have mapped CNAs using high-resolution microarray-based techniques. So far, no single resource provides a global collection of readily accessible oncogenomic array data.
Methodology/Principal Findings: We here present arrayMap, a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides a platform for meta-analysis and systems-level data integration of high-resolution oncogenomic CNA data. To date, the resource incorporates more than 40,000 arrays in 224 cancer types extracted from several resources, including the NCBI's Gene Expression Omnibus (GEO), EBI's ArrayExpress (AE), The Cancer Genome Atlas (TCGA), publication supplements and direct submissions. For the majority of the included datasets, probe-level and integrated visualization facilitate gene-level and genome-wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools.
Conclusions/Significance: To our knowledge, no data source currently provides an extensive collection of high-resolution oncogenomic CNA data that could readily be used for genomic feature mining across a representative range of cancer entities. arrayMap represents our effort to provide a long-term platform for oncogenomic CNA data, independent of specific platform considerations or specific project dependence. The online database can be accessed at http://www.arraymap.org.
💡 Research Summary
The paper introduces arrayMap, a curated, publicly accessible reference database that aggregates high‑resolution copy‑number alteration (CNA) data from human cancers. Recognizing that CNAs are pivotal for identifying tumor suppressor losses, oncogene amplifications, and clinically relevant biomarkers, the authors set out to overcome the fragmentation of existing resources (GEO, ArrayExpress, TCGA, supplemental tables, and direct submissions) by creating a unified platform. To date, arrayMap houses more than 40,000 microarray experiments covering 224 distinct cancer types. Each entry includes probe‑level intensity values, genomic coordinates (re‑mapped to the hg19 reference), gene annotations, and a standardized set of metadata (tissue of origin, disease stage, platform, etc.).
Data acquisition involved automated parsing of public repositories combined with manual curation to ensure quality and consistency. Quality control metrics were applied, and only datasets meeting predefined thresholds were retained. By normalizing across diverse platforms (Agilent, Affymetrix, Illumina, etc.), the database is designed to be independent of specific platforms, so that CNA profiles generated on different technologies can be compared directly.
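One common way to make segment data from different array platforms comparable is to reduce each segment's log2 ratio to a categorical gain/loss/neutral call. The following is a minimal sketch of that idea, not arrayMap's actual pipeline: the `Segment` record, the `call_segment` helper, and the ±0.15 thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One copy-number segment (hypothetical record layout)."""
    chrom: str
    start: int         # 0-based, inclusive
    end: int           # exclusive
    log2_ratio: float  # tumor vs. reference signal


def call_segment(seg, gain_threshold=0.15, loss_threshold=-0.15):
    """Map a segment's log2 ratio to a categorical call.

    The thresholds are illustrative assumptions; in practice they
    would be tuned per platform or per dataset.
    """
    if seg.log2_ratio >= gain_threshold:
        return "gain"
    if seg.log2_ratio <= loss_threshold:
        return "loss"
    return "neutral"


segments = [
    Segment("1", 0, 1_000_000, 0.42),
    Segment("1", 1_000_000, 5_000_000, 0.03),
    Segment("18", 20_000_000, 60_000_000, -0.55),
]
calls = [call_segment(s) for s in segments]
print(calls)
```

Once every sample is expressed as categorical calls on shared genomic coordinates, the array technology that produced the raw intensities no longer matters for downstream comparison.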
The web interface offers intuitive filtering by cancer type, tissue, platform, and other attributes. Users can visualize individual or aggregated CNA landscapes through heatmaps, log‑ratio plots, and circular ideograms. Multi‑sample selections trigger downstream analyses: average copy‑number profiles, frequency maps, statistical significance of recurrent regions, and export of data for external tools such as GISTIC, CBS, or gene‑set enrichment pipelines. An API further enables programmatic access for large‑scale meta‑analyses.
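A frequency map of the kind mentioned above can be sketched as follows: the genome is divided into fixed-size bins, and for each bin the fraction of selected samples carrying a gain or loss is reported. This is an illustrative sketch only; the 1 Mb bin size, the tuple layout `(chrom, start, end, call)`, and the function names are assumptions, not the resource's documented implementation.

```python
from collections import defaultdict

BIN_SIZE = 1_000_000  # hypothetical 1 Mb bins


def bins_covered(start, end, bin_size=BIN_SIZE):
    """Indices of the bins overlapped by the half-open interval [start, end)."""
    return range(start // bin_size, (end - 1) // bin_size + 1)


def cna_frequencies(samples, bin_size=BIN_SIZE):
    """Fraction of samples with a gain or loss in each genomic bin.

    samples: dict mapping sample id -> list of (chrom, start, end, call),
             where call is "gain", "loss", or "neutral".
    Returns a dict mapping (chrom, bin_index, call) -> fraction in [0, 1].
    """
    if not samples:
        return {}
    carriers = defaultdict(set)  # a set, so each sample counts once per bin
    for sample_id, segs in samples.items():
        for chrom, start, end, call in segs:
            if call == "neutral":
                continue
            for b in bins_covered(start, end, bin_size):
                carriers[(chrom, b, call)].add(sample_id)
    n = len(samples)
    return {key: len(ids) / n for key, ids in carriers.items()}


samples = {
    "s1": [("1", 0, 2_000_000, "gain")],
    "s2": [("1", 1_000_000, 3_000_000, "gain"), ("18", 0, 1_000_000, "loss")],
    "s3": [],
    "s4": [("1", 1_500_000, 1_600_000, "gain")],
}
freq = cna_frequencies(samples)
print(freq[("1", 1, "gain")])  # bin 1 on chromosome 1: gained in 3 of 4 samples
```

Counting sample ids in a set rather than incrementing a counter keeps a sample from being counted twice when several of its segments fall into the same bin.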
The authors demonstrate arrayMap’s utility with case studies. For instance, they query multiple myeloma datasets to confirm the high prevalence of 1q21 amplification, and they examine colorectal cancer cohorts to link 18q loss with poorer prognosis. These examples illustrate how the resource can accelerate hypothesis generation, validation of known CNA hotspots, and discovery of novel genomic lesions across heterogeneous studies.
Limitations are acknowledged. Currently, the repository is restricted to microarray‑derived CNAs; next‑generation sequencing (WGS, WES) data, which provide higher resolution and the ability to detect focal events, are not yet integrated. Moreover, clinical outcome data (treatment response, survival) are unevenly represented, constraining translational analyses. The authors outline future plans to incorporate sequencing‑based copy‑number calls, enrich clinical annotations, and foster community submissions to sustain long‑term growth.
In summary, arrayMap is presented as, to the authors' knowledge, the most extensive platform‑agnostic collection of oncogenomic CNA data available at the time of publication. By standardizing data formats, providing visualization at probe, gene, and genome level, and linking to downstream analytical tools, it serves as a foundational infrastructure for cancer genomics research, biomarker discovery, and systems‑level investigations of copy‑number driven tumor biology.