How to Evaluate Dimensionality Reduction? - Improving the Co-ranking Matrix
The growing number of dimensionality reduction methods available for data visualization has recently inspired the development of quality assessment measures, in order to evaluate the resulting low-dimensional representation independently of a method's inherent criteria. Several existing quality measures can be (re)formulated based on the so-called co-ranking matrix, which subsumes all rank errors (i.e. differences between the ranking of distances from every point to all others, comparing the low-dimensional representation to the original data). The measures are often based on the partitioning of the co-ranking matrix into four submatrices, divided at the K-th row and column, and calculate a weighted combination of the sums of each submatrix. Hence, the evaluation process typically involves plotting a graph over several (or even all possible) settings of the parameter K. Considering simple artificial examples, we argue that this parameter controls two notions at once that need not necessarily be combined, and that the rectangular shape of the submatrices hinders an intuitive interpretation of the parameter. We contend that quality measures, as general and flexible evaluation tools, should have parameters with a direct and intuitive interpretation as to which specific error types are tolerated or penalized. Therefore, we propose to replace K with two parameters that control these notions separately, and introduce a differently shaped weighting on the co-ranking matrix. The two new parameters can then be interpreted directly as a threshold up to which rank errors are tolerated, and a threshold up to which rank distances are significant for the evaluation. Moreover, we propose a color representation of local quality to visually support the evaluation process for a given mapping, where every point in the mapping is colored according to its local contribution to the overall quality.
💡 Research Summary
The paper addresses the growing need for objective quality assessment of dimensionality reduction (DR) visualizations, a need that has become pressing as the number of DR algorithms proliferates. Existing quality measures commonly rely on the co‑ranking matrix, which records for each pair of points the rank of their distance in the high‑dimensional space (ρ) and in the low‑dimensional embedding (r). Errors appear as off‑diagonal entries, and many measures partition the matrix at a single parameter K, counting points that stay within the K‑nearest‑neighbourhood after projection (the Q_NX(K) or LCMC scores). The authors argue that this single K simultaneously controls two distinct concepts: (1) the size of the neighbourhood whose relationships we care about, and (2) the tolerance for rank errors. Because K conflates these, the resulting curves over K are difficult to interpret and can give misleading indications of quality.
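The co-ranking matrix and the Q_NX(K) score described above can be sketched as follows. This is a minimal O(n²) illustration, not the authors' implementation; the function names (`rank_matrix`, `coranking`, `q_nx`) are assumptions, and ties in distances are broken arbitrarily by the sort:

```python
import numpy as np

def rank_matrix(X):
    """Per-row ranks of pairwise Euclidean distances.

    Rank 1 = nearest neighbour; the diagonal (a point's zero self-distance)
    receives rank 0, assuming no duplicate points."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # argsort twice converts distances into ranks along each row
    return D.argsort(axis=1).argsort(axis=1)

def coranking(X_high, X_low):
    """Co-ranking matrix Q, where Q[k-1, l-1] counts the point pairs with
    rank k in the original space (rho) and rank l in the embedding (r)."""
    rho, r = rank_matrix(X_high), rank_matrix(X_low)
    n = X_high.shape[0]
    Q = np.zeros((n - 1, n - 1), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j:
                Q[rho[i, j] - 1, r[i, j] - 1] += 1
    return Q

def q_nx(Q, K):
    """Q_NX(K): fraction of points that stay within the K-neighbourhood,
    i.e. the normalised sum of the upper-left K x K block."""
    n = Q.shape[0] + 1
    return Q[:K, :K].sum() / (K * n)
```

For a perfect mapping (embedding identical to the original data) all mass lands on the diagonal of Q and Q_NX(K) = 1 for every K.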
To resolve this, the authors propose a two‑parameter framework. The first parameter τ₁ defines a maximum allowable rank deviation |ρ−r|; any pair whose rank difference exceeds τ₁ is considered an error and is penalised. The second parameter τ₂ defines the neighbourhood radius in the original space: only pairs with ρ ≤ τ₂ are deemed relevant for the evaluation. By separating neighbourhood relevance from error tolerance, users can directly express what they consider important (e.g., preserving close neighbours) and how much deviation they are willing to accept.
In addition to separating the concepts, the paper introduces a new weighting scheme for the co‑ranking matrix. Instead of the rectangular sub‑matrices used in earlier work, a non‑rectangular mask is applied that gives higher weight to small rank differences (near the diagonal) and to pairs that belong to the important neighbourhood (low ρ). The weight can be expressed as w_{kl}=f(|k−l|)·g(min(k,l)), where f decays with distance from the diagonal and g increases for smaller ranks. This design penalises large errors more strongly while still rewarding preservation of locally important relationships.
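The paper specifies only the general form w_{kl} = f(|k−l|)·g(min(k,l)); the concrete choices below (exponential decays with hypothetical rate parameters `sigma_err` and `sigma_nbr`) are purely illustrative, not taken from the paper:

```python
import numpy as np

def weight_mask(n, sigma_err=5.0, sigma_nbr=10.0):
    """Non-rectangular weighting w[k, l] = f(|k - l|) * g(min(k, l)) over
    rank pairs (k, l) in 1..n-1; f and g are illustrative exponentials."""
    k = np.arange(1, n)[:, None]   # ranks in the original space
    l = np.arange(1, n)[None, :]   # ranks in the embedding
    f = np.exp(-np.abs(k - l) / sigma_err)      # decays away from the diagonal
    g = np.exp(-np.minimum(k, l) / sigma_nbr)   # emphasises small (close) ranks
    return f * g
```

Multiplying such a mask elementwise with the co-ranking matrix and summing yields a weighted score that downweights both large rank errors and pairs outside the important neighbourhood.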
The resulting quality score can be read as the fraction of relevant pairs whose rank error stays within the tolerance:

Q(τ₁, τ₂) = (1/(N·τ₂)) Σ_i Σ_{j≠i} 𝟙[ρ_{ij} ≤ τ₂] · 𝟙[|ρ_{ij} − r_{ij}| ≤ τ₁],

where 𝟙[·] is the indicator function and the normalisation uses the fact that each of the N points has exactly τ₂ neighbours with ρ ≤ τ₂, so that Q(τ₁, τ₂) = 1 for a mapping that preserves all relevant ranks up to the tolerance τ₁.
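A sketch of this score, assuming `rho` and `r` are precomputed rank matrices (rank 0 on the diagonal, ranks 1..N−1 elsewhere); the function name and signature are assumptions for illustration:

```python
import numpy as np

def quality(rho, r, tau1, tau2):
    """Two-parameter quality score: among pairs that are relevant in the
    original space (rho <= tau2), the fraction whose rank error stays
    within the tolerance (|rho - r| <= tau1)."""
    off = ~np.eye(rho.shape[0], dtype=bool)   # exclude self-pairs
    relevant = off & (rho <= tau2)            # tau2: neighbourhood relevance
    within = relevant & (np.abs(rho - r) <= tau1)  # tau1: error tolerance
    # relevant.sum() equals N * tau2 when ranks are tie-free
    return within.sum() / relevant.sum()
```

A perfect embedding (r identical to rho) scores 1.0 for any tolerance; raising τ₁ makes the measure forgiving of small rank perturbations without changing which neighbourhoods count.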