Consensus dimension reduction via multi-view learning
A plethora of dimension reduction methods have been developed to visualize high-dimensional data in low dimensions. However, different dimension reduction methods often output different and possibly conflicting visualizations of the same data. This problem is further exacerbated by the choice of hyperparameters, which may substantially impact the resulting visualization. To obtain a more robust and trustworthy dimension reduction output, we advocate for a consensus approach, which summarizes multiple visualizations into a single consensus dimension reduction visualization. Here, we leverage ideas from multi-view learning in order to identify the patterns that are most stable or shared across the many different dimension reduction visualizations, or views, and subsequently visualize this shared structure in a single low-dimensional plot. We demonstrate that this consensus visualization effectively identifies and preserves the shared low-dimensional data structure through both simulated and real-world case studies. We further highlight our method’s robustness to the choice of dimension reduction method and hyperparameters – a highly-desirable property when working towards trustworthy and reproducible data science.
💡 Research Summary
The paper addresses a pervasive problem in exploratory data analysis: different dimensionality‑reduction (DR) techniques (e.g., PCA, t‑SNE, UMAP) and their hyperparameter settings often produce conflicting low‑dimensional visualizations of the same high‑dimensional dataset. Because unsupervised analyses lack ground‑truth labels, practitioners have little guidance on which DR output to trust, and visual impressions are vulnerable to cognitive biases such as the clustering illusion. Existing attempts to combine multiple DR results, most notably Meta‑Spec, first convert each embedding into a normalized pairwise‑distance matrix and then apply a single DR method to the averaged distance matrix. While Meta‑Spec avoids explicit alignment, its final visualization still depends on the choice of that final DR method, limiting robustness.
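The averaging step described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the Meta‑Spec implementation; in particular, the max‑normalization used here is an assumed choice of normalization.

```python
import numpy as np

def pairwise_dist(Z):
    """Euclidean pairwise-distance matrix for an (n, d) embedding."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def averaged_distance(embeddings):
    """Max-normalize each view's distance matrix, then average.

    Illustrative sketch of the Meta-Spec-style aggregation; the
    normalization choice is an assumption, not taken from the paper."""
    mats = [pairwise_dist(Z) for Z in embeddings]
    mats = [D / D.max() for D in mats]  # remove per-view scale
    return np.mean(mats, axis=0)

# usage: average two toy 2-D embeddings of the same five points
rng = np.random.default_rng(0)
Z1, Z2 = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
D_bar = averaged_distance([Z1, Z2])  # (5, 5), symmetric, zero diagonal
```

A final DR method (e.g., MDS) would then be applied to `D_bar`, which is exactly the step whose choice Meta‑Spec's output still depends on.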
To overcome these limitations, the authors propose a consensus‑driven framework rooted in multi‑view learning. Given M DR embeddings (Z^{(1)},\dots,Z^{(M)}) of the same data, each embedding is transformed into a distance matrix (D^{(m)}) (e.g., Euclidean distances). Distance matrices are invariant to rotations, reflections, and translations of the embedding, and normalization removes differences in scale, thus sidestepping the alignment problem that plagues raw coordinate‑based approaches. The core of the framework is Consensus Multidimensional Scaling (CoMDS), a multi‑view extension of classical metric MDS. CoMDS jointly optimizes a consensus embedding (\hat Z) and a set of diagonal scaling matrices (W^{(m)}), one per view, by minimizing the sum of squared differences between the original distances (D^{(m)}_{ij}) and the distances in the consensus space after view‑specific scaling:

\[
\min_{\hat Z,\; W^{(1)},\dots,W^{(M)}} \;\sum_{m=1}^{M} \sum_{i<j} \left( D^{(m)}_{ij} - \left\| W^{(m)} \left( \hat z_i - \hat z_j \right) \right\|_2 \right)^2 ,
\]

where (\hat z_i) denotes the (i)-th row of (\hat Z).
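The joint optimization over the consensus embedding and the per‑view diagonal scalings can be sketched as follows. This is a minimal illustration of the objective as described in the text; the L‑BFGS solver, random initialization, and parameterization of the diagonal scalings as vectors are all assumptions, and the paper's actual optimizer may differ.

```python
import numpy as np
from scipy.optimize import minimize

def comds(D_list, k=2, seed=0):
    """Sketch of the CoMDS objective: jointly fit a consensus embedding
    (n x k) and one diagonal scaling vector per view by minimizing

        sum_m sum_{i<j} ( D^(m)_ij - || w^(m) * (z_i - z_j) || )^2

    Solver and initialization are illustrative assumptions."""
    M, n = len(D_list), D_list[0].shape[0]
    rng = np.random.default_rng(seed)
    # parameter vector: flattened embedding, then M scaling vectors
    x0 = np.concatenate([rng.normal(scale=0.1, size=n * k),
                         np.ones(M * k)])
    iu = np.triu_indices(n, 1)  # distinct pairs i < j

    def loss(x):
        Z = x[: n * k].reshape(n, k)
        W = x[n * k:].reshape(M, k)
        total = 0.0
        for m, D in enumerate(D_list):
            gaps = (Z[:, None, :] - Z[None, :, :]) * W[m]  # scaled gaps
            dist = np.sqrt((gaps ** 2).sum(axis=-1))
            total += ((D - dist)[iu] ** 2).sum()
        return total

    res = minimize(loss, x0, method="L-BFGS-B")
    return res.x[: n * k].reshape(n, k)
```

Because each view carries its own scaling, two views that differ only by an overall stretch (e.g., (D) and (2D)) can both be fit by the same consensus embedding, with the scale absorbed into the (W^{(m)}).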