Fast redshift clustering with the Baire (ultra) metric
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we develop a clusterwise nearest neighbor regression procedure for this.
💡 Research Summary
The paper introduces a novel clustering method based on the Baire distance, an ultrametric defined by the length of the longest common prefix of numeric strings. Traditional agglomerative hierarchical clustering requires O(n²) pairwise distance calculations, which becomes prohibitive for large datasets. In contrast, the Baire‑based approach converts each data point into a string of decimal digits (or any base‑m representation) and assigns it to a hierarchy of bins according to its successive digits. For a chosen precision ℓ (the number of digits examined), the algorithm creates at most 10^ℓ nodes (for base‑10 data) and places each of the n observations into the appropriate node in a single pass. Consequently, the computational cost is O(n·ℓ), effectively linear in the number of observations, with a very small constant factor because only digit extraction and bin updates are required.
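The digit-prefix binning described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`digits`, `baire_distance`, `baire_bins`) are hypothetical, values are assumed to lie in [0, 1), digits are taken in base 10 by truncation, and the distance is assumed to follow the convention 10^(-k) for a longest common prefix of k digits.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_DOWN


def digits(x, ell):
    """First `ell` decimal digits after the point of x, assumed in [0, 1).

    Going through Decimal(str(x)) truncates the printed decimal expansion
    and avoids binary floating-point surprises such as 0.29 * 100 == 28.99...
    """
    step = Decimal(1).scaleb(-ell)  # 10^(-ell)
    q = Decimal(str(x)).quantize(step, rounding=ROUND_DOWN)
    return f"{q:.{ell}f}".split(".")[1]


def baire_distance(x, y, ell):
    """Assumed convention: 10^(-k), where k is the longest common prefix length."""
    k = 0
    for a, b in zip(digits(x, ell), digits(y, ell)):
        if a != b:
            break
        k += 1
    return 10.0 ** -k


def baire_bins(values, ell):
    """Single pass over the data: file each index under its 1..ell digit prefixes.

    The result maps a prefix string (a node of the hierarchy) to the indices
    of the observations that share it. No pairwise distances are computed.
    """
    bins = defaultdict(list)
    for i, x in enumerate(values):
        d = digits(x, ell)
        for k in range(1, ell + 1):
            bins[d[:k]].append(i)
    return bins


# Toy values standing in for normalised redshifts (made up for illustration).
values = [0.1234, 0.1239, 0.1871, 0.5020]
bins = baire_bins(values, 3)
```

Each observation triggers ℓ dictionary updates, so the pass over the data grows linearly with n. Two observations fall in the same bin at level k exactly when their Baire distance is at most 10^(-k) under the convention assumed here, which is why the nested bins form a hierarchy without any pairwise comparison.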
The authors apply this method to the Sloan Digital Sky Survey (SDSS) dataset, focusing on about 443 094 objects for which both spectroscopic redshifts (z_spec) and photometric redshifts (z_phot) are available. After normalising redshifts to the interval