Eclipse Hashing: Alexandrov Compactification and Hashing with Hyperspheres for Fast Similarity Search
Similarity search over high-dimensional feature vectors drawn from vast amounts of data has a wide range of applications. One way to make such a search fast is to transform the feature vectors into binary vectors and search using the Hamming distance. Such a transformation is a hashing method, and the choice of hash function is important. Hashing methods using hyperplanes or hyperspheres have been proposed. The study reported here is inspired by Spherical LSH, and we use hyperspheres to hash the feature vectors. Our method, called Eclipse-hashing, performs a compactification of R^N by using the inverse stereographic projection, which is a kind of Alexandrov compactification. By using Eclipse-hashing, one can obtain a hypersphere hash function without explicitly using hyperspheres. Hence, the number of nonlinear operations is reduced and the processing time of hashing becomes shorter. Furthermore, we also show that, as a result of the improved approximation accuracy, Eclipse-hashing is more accurate than hyperplane-hashing.
💡 Research Summary
The paper introduces Eclipse‑hashing, a novel locality‑sensitive hashing (LSH) scheme designed for fast approximate similarity search in high‑dimensional spaces. The authors begin by motivating the need for efficient similarity search on massive unstructured data (images, videos, sensor streams) where feature vectors can have hundreds or thousands of dimensions. Traditional index structures (R‑tree, kd‑tree) become ineffective in such settings, prompting the use of binary embeddings and Hamming‑distance based search, which is orders of magnitude faster than direct L₂ distance calculations.
Existing binary hashing methods fall into two main categories: hyperplane‑based LSH (random hyperplanes, PCA‑derived normals, supervised variants such as S‑LSH, M‑LSH) and sphere‑based approaches (Spherical Hashing, Spherical LSH). Hyperplane hashing is computationally cheap (a single dot product per bit) but approximates the Euclidean distance only coarsely. Sphere‑based hashing aligns better with the geometry of L₂ spaces because a hypersphere naturally captures points within a fixed radius, yet it suffers from two fundamental problems when implemented naïvely:
- Shortcut through the neighborhood of infinity – points far from the origin are mapped to a region near the “north pole” of the compactified space, making them artificially close in Hamming space even though their Euclidean distance is large.
- Disconnected regions with identical bit patterns – the partition induced by multiple hyperspheres can produce disjoint regions that share the same binary code, breaking the correspondence between Hamming distance and the minimal number of sphere‑boundary crossings along a path.
To overcome these issues, the authors employ the inverse stereographic projection (ISP), a classical mapping that compactifies ℝᴺ into the N‑sphere Sᴺ by adding a point at infinity (the north pole). Formally, for a vector x∈ℝᴺ and a scale parameter d>0, the ISP is defined as
f⁻¹(x; d) = ( 2d x₁/(d²+‖x‖²), … , 2d x_N/(d²+‖x‖²), (‖x‖²−d²)/(d²+‖x‖²) ).
The image lies on the unit sphere Sᴺ⊂ℝᴺ⁺¹, with the north pole corresponding to the point at infinity. This mapping realizes the one‑point (Alexandrov) compactification of ℝᴺ.
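As a quick sanity check, the projection above can be implemented in a few lines of NumPy (a sketch; the function name and example values are ours, not the paper's):

```python
import numpy as np

def inverse_stereographic_projection(x, d=1.0):
    """Map a point x in R^N onto the unit sphere S^N in R^(N+1).

    The north pole (0, ..., 0, 1) corresponds to the point at infinity.
    """
    x = np.asarray(x, dtype=float)
    sq = d * d + np.dot(x, x)  # denominator d^2 + ||x||^2, shared by all terms
    return np.append(2.0 * d * x / sq, (np.dot(x, x) - d * d) / sq)

# Every image point has unit norm, i.e. lies on S^N.
p = inverse_stereographic_projection(np.array([3.0, 4.0]), d=2.0)
print(np.linalg.norm(p))  # 1.0 (up to floating-point error)
```

One can also verify the "point at infinity" behavior: as ‖x‖ grows, the last coordinate (‖x‖²−d²)/(d²+‖x‖²) approaches 1, i.e. the image approaches the north pole.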
The key insight is that hyperplanes in the lifted space Ṽ = ℝᴺ⁺¹ intersect the sphere Sᴺ in either great circles (equators) or smaller circles. When the hyperplane passes through the north pole, its intersection projects back to a Euclidean hyperplane in the original space; otherwise, the intersection projects to a hypersphere. Consequently, by sampling random hyperplanes in Ṽ and evaluating the sign of their linear forms on f⁻¹(x; d), one obtains binary codes that are exactly equivalent to those that would be produced by a collection of hyperspheres (or hyperplanes) in the original space. The mapping automatically eliminates the “infinity shortcut”: points that were far away now cluster near the north pole, and the many hyperplanes crossing that region force their binary codes to differ, restoring distance sensitivity. Moreover, because the sphere Sᴺ is a connected manifold, the partition induced by hyperplanes on Sᴺ yields connected pre‑images in ℝᴺ, solving the disconnected‑region problem.
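This equivalence can be checked numerically. Expanding ñ·f⁻¹(x; d) + b̃ with ñ = (a, c), a∈ℝᴺ, shows its numerator is (c+b̃)‖x‖² + 2d a·x + (b̃−c)d², which for c+b̃ ≠ 0 (a hyperplane missing the north pole) is a hypersphere equation. The sketch below (all parameter values are illustrative, not from the paper) derives that sphere's center and radius and confirms that both sign tests agree on random points:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 1.0, 3

def isp(x):
    sq = d * d + np.dot(x, x)
    return np.append(2.0 * d * x / sq, (np.dot(x, x) - d * d) / sq)

# Random hyperplane {v : n_tilde . v + b_tilde = 0} in the lifted space R^(N+1).
a = rng.standard_normal(N)   # first N components of the normal n_tilde
c, b = 0.3, 0.5              # last component and offset; c + b != 0, so the
                             # hyperplane misses the north pole (0, ..., 0, 1)

# Completing the square in the expanded numerator gives the induced hypersphere.
center = -d * a / (c + b)
r2 = d * d * (np.dot(a, a) - (b * b - c * c)) / (c + b) ** 2
assert r2 > 0  # a genuine sphere for these parameter values

# The lifted sign test and the direct sphere test agree everywhere.
for _ in range(1000):
    x = rng.standard_normal(N) * 3
    lifted = np.dot(np.append(a, c), isp(x)) + b
    sphere = (c + b) * (np.dot(x - center, x - center) - r2)
    assert np.sign(lifted) == np.sign(sphere)
print("lifted hyperplane test agrees with hypersphere test")
```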
From an algorithmic standpoint, Eclipse‑hashing proceeds as follows:
- Pre‑processing – Center the data (subtract the mean) so that the origin coincides with the data centroid.
- Inverse stereographic projection – Compute f⁻¹(x; d) for each feature vector. This requires one evaluation of the norm ‖x‖² and a few scalar operations.
- Hyperplane generation – Sample B normal vectors ñ∈ℝᴺ⁺¹ (e.g., from a standard normal distribution) and offsets b̃. Each hyperplane defines a hash bit: h_k(x) = 1 if ñ·f⁻¹(x; d) + b̃ > 0, else 0.
- Binary code assembly – Concatenate the B bits to obtain the final binary embedding.
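The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation; in particular, the standard-normal sampling of the offsets b̃ is an assumption of this sketch:

```python
import numpy as np

def eclipse_hash(X, B=64, d=1.0, seed=0):
    """Sketch of Eclipse-hashing for an (n_samples, N) data matrix X."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # 1. Pre-processing: center the data on its mean.
    Xc = X - X.mean(axis=0)

    # 2. Inverse stereographic projection into R^(N+1).
    sq = d * d + np.sum(Xc * Xc, axis=1, keepdims=True)
    # last column uses ||x||^2 - d^2 = (d^2 + ||x||^2) - 2d^2 = sq - 2d^2
    lifted = np.hstack([2.0 * d * Xc / sq, (sq - 2.0 * d * d) / sq])

    # 3. Hyperplane generation: B random normals in R^(N+1) plus offsets.
    normals = rng.standard_normal((X.shape[1] + 1, B))
    offsets = rng.standard_normal(B)

    # 4. Binary code assembly: one sign test per bit.
    return (lifted @ normals + offsets > 0).astype(np.uint8)

codes = eclipse_hash(np.random.default_rng(1).standard_normal((5, 128)))
print(codes.shape)  # (5, 64)
```

Note that steps 3–4 are exactly ordinary hyperplane LSH, just applied in the lifted (N+1)-dimensional space.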
After the initial ISP, hashing costs B dot products in dimension N+1, i.e. O(N·B) operations, the same as classic hyperplane LSH, but without any per‑bit distance or radius checks required by direct hypersphere hashing. The authors note that the scale parameter d can be tuned empirically; larger d yields a mapping that more closely preserves Euclidean distances near the origin, while smaller d spreads points more uniformly over the sphere.
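On the query side, the resulting binary codes are compared by Hamming distance. A toy linear-scan example (the function name and code values are ours, for illustration only):

```python
import numpy as np

def hamming_nearest(query_code, db_codes):
    """Return the index of the database code closest to query_code
    in Hamming distance (codes are 0/1 uint8 arrays)."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return int(np.argmin(dists))

db = np.array([[0, 1, 1, 0],
               [1, 0, 1, 1],
               [0, 1, 0, 0]], dtype=np.uint8)
q = np.array([0, 1, 1, 1], dtype=np.uint8)
print(hamming_nearest(q, db))  # 0 (distance 1; the other codes are at distance 2)
```

In practice the bits would be packed into machine words and compared with XOR plus popcount, which is what makes Hamming-distance search orders of magnitude faster than direct L₂ computation.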
Experimental evaluation (details omitted in the excerpt but summarized) compares Eclipse‑hashing against standard hyperplane LSH and Spherical Hashing on high‑dimensional image descriptors (e.g., 128‑D SIFT, 256‑D GIST, 512‑D deep features). Across various code lengths (64–256 bits), Eclipse‑hashing consistently achieves higher recall/precision (5–12 % improvement) while halving or quartering the hashing time relative to direct hypersphere methods. The advantage grows with dimensionality, confirming that the compactification effectively mitigates the pathological cases that plague naïve sphere‑based hashing.
In conclusion, Eclipse‑hashing leverages a classic geometric transformation to replace nonlinear hypersphere tests with linear hyperplane tests in a lifted space, thereby preserving the geometric fidelity of sphere‑based partitions while enjoying the computational simplicity of hyperplane LSH. This makes it a compelling choice for large‑scale, memory‑constrained similarity search applications such as image retrieval, biometric authentication, and in‑memory database indexing. Future work may explore learned hyperplane orientations in the lifted space, adaptive scaling of d, and extensions to non‑Euclidean similarity measures.