Metric geometry for ranking-based voting: Tools for learning electoral structure

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

In this paper, we develop the metric geometry of ranking statistics, proving that the two major permutation distances in the statistics literature – Kendall tau and Spearman footrule – extend naturally to incomplete rankings with both coordinate embeddings and graph realizations. This gives us a unifying framework that allows us to connect popular topics in computational social choice: metric preferences (and metric distortion), polarization, and proportionality. As an important application, the metric structure enables efficient identification of blocs of voters and slates of their preferred candidates. Since the definitions work for partial ballots, we can execute the methods not only on synthetic elections, but on a suite of real-world elections. This gives us robust clustering methods that often produce an identical grouping of voters – even though one family of methods is based on a Condorcet-consistent ranking rule while the other is not.


💡 Research Summary

This paper develops a metric-geometric framework for ranking-based voting, extending the two most widely used permutation distances, Kendall τ (swap distance) and Spearman footrule (score distance), to incomplete (partial) rankings. The authors introduce two coordinate embeddings. The Borda embedding maps each ballot to an m-dimensional vector of reverse ranks, so that the L₁ distance between embedded ballots recovers the footrule distance up to a constant normalizing factor. The head-to-head embedding records every pairwise comparison as +1, 0, or −1 in an (m choose 2)-dimensional space, so that the L₁ distance recovers the Kendall τ distance up to a constant factor (for complete rankings, each pair on which two ballots disagree contributes 2 to the L₁ distance). Both embeddings handle partial ballots by adopting a “pessimistic” convention (unmentioned candidates receive the worst possible rank), though an averaged convention is also discussed.
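The two embeddings can be illustrated with a minimal Python sketch. The function names (`borda_vector`, `head_to_head_vector`, `l1`) are hypothetical, and the details are assumptions rather than the paper's exact definitions: reverse ranks run from m−1 down to 0, the pessimistic convention gives every unmentioned candidate the score 0, the averaged convention gives them the mean of the unused scores, and in the head-to-head vector every ranked candidate beats every unranked one while two unranked candidates tie.

```python
from itertools import combinations

def borda_vector(ballot, candidates, convention="pessimistic"):
    """Reverse-rank vector of a (possibly partial) ballot, one entry per candidate."""
    m, k = len(candidates), len(ballot)
    scores = {c: float(m - 1 - pos) for pos, c in enumerate(ballot)}
    # Unused scores are m-1-k, ..., 0. Assumed conventions for unmentioned candidates:
    # pessimistic -> lowest unused score (0); averaged -> mean of the unused scores.
    fill = 0.0 if convention == "pessimistic" else (m - 1 - k) / 2
    return [scores.get(c, fill) for c in candidates]

def head_to_head_vector(ballot, candidates):
    """One entry per unordered pair (a, b): +1 if a is above b, -1 if below, 0 if tied."""
    pos = {c: i for i, c in enumerate(ballot)}
    worst = len(ballot)  # assumed: all unmentioned candidates tie just below the ranked ones
    vec = []
    for a, b in combinations(candidates, 2):
        pa, pb = pos.get(a, worst), pos.get(b, worst)
        vec.append((pa < pb) - (pa > pb))
    return vec

def l1(u, v):
    """L1 (taxicab) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))

candidates = ["A", "B", "C", "D"]
x = ["A", "B", "C", "D"]
y = ["B", "A", "D", "C"]
borda_l1 = l1(borda_vector(x, candidates), borda_vector(y, candidates))
h2h_l1 = l1(head_to_head_vector(x, candidates), head_to_head_vector(y, candidates))
partial_l1 = l1(borda_vector(["A"], candidates), borda_vector(["A", "B"], candidates))
```

For the two complete ballots above, both L₁ distances come out to 4: the classical footrule (sum of absolute rank differences) is 4, and the ballots disagree on two pairs (A versus B, C versus D), each contributing 2 to the head-to-head distance. Whether a factor of one half is folded into the distance is a normalization choice.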

Beyond embeddings, the paper constructs sparse graph realizations of these metrics. The basic ballot graph Gₘ has vertices for all possible (partial) ballots and edges for adjacent transpositions (unit weight) and for truncation/extension moves (weight proportional to the rank gap). The shortest‑path metric on Gₘ reproduces Kendall τ exactly. A shortcut version G₊ₘ adds edges for arbitrary transpositions with weight equal to the distance between the swapped positions; the induced path metric on G₊ₘ equals the Spearman footrule distance in the normalization where an adjacent swap has length 1. Both graphs have O(m!) vertices and O(m²) degree, making them far sparser than the complete metric graph while preserving the essential distances. When restricted to complete ballots, Gₘ coincides with the Cayley graph of the symmetric group Sₘ with respect to the adjacent transpositions, and G₊ₘ augments it with the remaining transpositions as additional weighted generators.
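The graph realization can be checked by brute force on complete ballots for small m. The following Python sketch is an illustration under stated assumptions, not the paper's code: it covers only complete rankings (the truncation/extension edges are omitted because their exact weights are not given here), and the helper names `neighbors`, `graph_distances`, `kendall_tau`, and `footrule` are hypothetical. It runs Dijkstra's algorithm on the adjacent-transposition graph and on the shortcut graph, then compares the results with the classical Kendall τ (number of discordant pairs) and the classical footrule (sum of absolute rank differences).

```python
import heapq
from itertools import permutations

def neighbors(perm, shortcuts=False):
    """Yield (ballot, weight) pairs reachable by one swap of positions i < j.
    Basic graph: adjacent swaps only (weight 1). Shortcut graph: any swap, weight j - i."""
    m = len(perm)
    for i in range(m - 1):
        last = m if shortcuts else i + 2
        for j in range(i + 1, last):
            nxt = list(perm)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            yield tuple(nxt), j - i

def graph_distances(source, shortcuts=False):
    """Dijkstra shortest-path lengths from `source` to every complete ballot."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, w in neighbors(node, shortcuts):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return dist

def kendall_tau(x, y):
    """Number of candidate pairs that x and y order differently."""
    pos = {c: i for i, c in enumerate(y)}
    seq = [pos[c] for c in x]
    return sum(seq[i] > seq[j] for i in range(len(seq)) for j in range(i + 1, len(seq)))

def footrule(x, y):
    """Classical Spearman footrule: sum of absolute differences in position."""
    pos = {c: i for i, c in enumerate(y)}
    return sum(abs(i - pos[c]) for i, c in enumerate(x))

source = (0, 1, 2, 3)
basic = graph_distances(source)
shortcut = graph_distances(source, shortcuts=True)
kendall_ok = all(basic[p] == kendall_tau(source, p) for p in permutations(source))
footrule_ok = all(2 * shortcut[p] == footrule(source, p) for p in permutations(source))
```

Both checks hold for m = 4. With the unnormalized footrule used in this sketch, the shortcut path length is exactly half the sum of absolute rank differences, since a swap of positions i and j costs |i − j| but moves two candidates by that amount.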

The authors leverage this geometric structure for unsupervised learning of electoral patterns. In the Borda embedding, L₁ medians give “Borda median” rankings; in the head‑to‑head embedding, L₁ centers are Kemeny rankings, and k‑clustering under this metric is precisely the k‑Kemeny problem. By clustering ballots (voters) and candidates separately, they identify coherent voter blocs and candidate slates. The methods are agnostic to party labels and work directly with partial ballots, enabling analysis of real‑world data where many voters submit truncated rankings.
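As a rough illustration of the k‑Kemeny idea, the sketch below alternates between assigning ballots to their nearest center and recomputing each center as a brute-force Kemeny ranking. This is a generic Lloyd-style heuristic written for this summary, not the authors' pipeline; it assumes complete ballots and a small number of candidates (the center search enumerates all m! rankings), and the names `kemeny_center` and `k_kemeny` are hypothetical.

```python
from itertools import permutations

def kendall_tau(x, y):
    """Number of candidate pairs that complete rankings x and y order differently."""
    pos = {c: i for i, c in enumerate(y)}
    seq = [pos[c] for c in x]
    return sum(seq[i] > seq[j] for i in range(len(seq)) for j in range(i + 1, len(seq)))

def kemeny_center(ballots, candidates):
    """Brute-force Kemeny ranking: minimizes the total Kendall tau distance to the ballots."""
    return min(permutations(candidates),
               key=lambda r: sum(kendall_tau(r, b) for b in ballots))

def assign(ballots, centers):
    """Label each ballot with the index of its nearest center (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda j: kendall_tau(b, centers[j]))
            for b in ballots]

def k_kemeny(ballots, candidates, initial_centers, rounds=10):
    """Lloyd-style heuristic: alternate assignment and center updates until stable."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(rounds):
        labels = assign(ballots, centers)
        updated = []
        for j, old in enumerate(centers):
            members = [b for b, lab in zip(ballots, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            updated.append(kemeny_center(members, candidates) if members else old)
        if updated == centers:
            break
        centers = updated
    return assign(ballots, centers), centers

# Toy profile with two opposed blocs of four voters each.
ballots = ([("A", "B", "C", "D")] * 3 + [("A", "B", "D", "C")]
           + [("D", "C", "B", "A")] * 3 + [("C", "D", "B", "A")])
labels, centers = k_kemeny(ballots, "ABCD", [ballots[0], ballots[4]])
```

On this toy profile the heuristic separates the first four ballots from the last four, with centers A > B > C > D and D > C > B > A. Like any Lloyd-style method it depends on the initial centers and only finds a local optimum; handling partial ballots would require swapping in the extended distance.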

Empirical validation proceeds in two stages. First, synthetic elections with known bloc and slate structures are generated; the proposed clustering reliably recovers the planted groups, and results are robust across a range of noise levels and numbers of clusters. Second, the framework is applied to a large corpus of Scottish local elections (over 1,000 elections from 2012‑2022, featuring five major parties and numerous minor parties). Despite the presence of partial ballots, both a Condorcet‑consistent clustering pipeline and a non‑Condorcet pipeline produce almost identical voter bloc partitions, demonstrating the stability of the metric approach. Detailed case studies (e.g., the Pentland Hills ward) reveal clear ideological blocs and corresponding candidate slates that would be difficult to detect with traditional party‑centric analyses.

The paper’s contributions are threefold: (1) a rigorous extension of Kendall τ and Spearman footrule to partial and weak rankings, together with explicit coordinate and graph representations; (2) a demonstration that the “pessimistic” Borda distance aligns with a sparse graph realization, while the averaged version does not, highlighting subtle design choices in metric extensions; (3) practical clustering algorithms that exploit these metrics to uncover meaningful electoral structure, validated on both synthetic and real data. All code and datasets are released publicly, ensuring reproducibility and inviting further research on metric distortion, proportionality, polarization, and other central topics in computational social choice. The work thus bridges a gap between worst‑case metric distortion analyses and data‑driven descriptive tools, providing a versatile toolkit for scholars and practitioners interested in the geometry of rankings.

