Scalable Distributed Vector Search via Accuracy Preserving Index Construction

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Scaling Approximate Nearest Neighbor Search (ANNS) to billions of vectors requires distributed indexes that balance accuracy, latency, and throughput. Yet existing index designs struggle with this tradeoff. This paper presents SPIRE, a scalable vector index based on two design decisions. First, it identifies a balanced partition granularity that avoids read-cost explosion. Second, it introduces an accuracy-preserving recursive construction that builds a multi-level index with predictable search cost and stable accuracy. In experiments with up to 8 billion vectors across 46 nodes, SPIRE achieves high scalability and up to 9.64× higher throughput than state-of-the-art systems.


💡 Research Summary

The paper addresses the long‑standing challenge of scaling Approximate Nearest Neighbor Search (ANNS) to billions of high‑dimensional vectors while meeting strict latency (10‑20 ms) and high recall (>0.9) requirements. Existing graph‑based indexes such as HNSW provide excellent accuracy but, when sharded across many servers, incur a prohibitive number of cross‑node hops—over 80% of all traversal steps—leading to latency that can be two orders of magnitude higher than acceptable. Hierarchical, partition‑based designs reduce cross‑node communication by clustering vectors and routing queries through centroids, yet the clustering introduces fidelity loss: vectors near partition boundaries are poorly represented, forcing queries to probe many partitions to recover accuracy. This probing dramatically increases vector reads, CPU cycles, and I/O, throttling throughput.

SPIRE (Scalable Partition‑aware Index with Recursive Estimation) resolves this trade‑off with two key ideas. First, it introduces partition density (D = #partitions / #vectors) as a quantitative measure of granularity. Empirical profiling across several datasets shows a sharp inflection: as D decreases (coarser partitions), the number of vectors accessed per query remains low until D falls below roughly 0.1, after which the read cost rises sharply because each partition becomes too large to preserve fidelity. Simultaneously, cross‑node hops decrease gradually with lower D but never vanish, even at very low densities. SPIRE therefore selects the balanced granularity—the density just before the inflection point—ensuring that vector‑access cost stays comparable to a high‑quality graph index while still reducing remote hops.

Second, SPIRE builds a multi‑level hierarchy by recursively applying the same balanced granularity to the centroids generated at each level. The leaf level clusters the raw vectors; its centroids become the data points for the next level, which is again clustered with the same density criterion, and so on until the top‑level graph fits within a single server’s memory budget (typically a few thousand centroids). During query processing, the root graph (kept in memory on every compute node) returns the top‑m nearest centroids; the corresponding m partitions are fetched in parallel from SSDs, and the process repeats down the hierarchy. Because the number of levels grows logarithmically with dataset size, the number of data‑dependent network round‑trips is bounded and predictable, while each level’s read cost is kept low by the balanced granularity.

Implementation details: SPIRE is written in ~6,000 lines of C++, stores only the root graph in memory (replicated across nodes), and offloads all lower‑level indices and raw vectors to SSD‑backed distributed storage. This stateless compute tier enables easy fault recovery and elastic scaling. Experiments on SIFT‑100M, SPACEV‑100M, and larger synthetic workloads (up to 8 billion vectors on 46 nodes) demonstrate that SPIRE achieves up to 9.64× higher peak throughput than state‑of‑the‑art distributed ANNS systems, while maintaining lower average and p99 latencies. Even when SSD I/O is saturated, network and CPU utilization stay below 30% and 40% respectively, indicating ample headroom for further scaling.

In summary, the paper’s contributions are: (1) a principled, data‑driven method to select partition granularity based on partition density, (2) a recursive, accuracy‑preserving hierarchy that bounds network round‑trips and stabilizes search cost, and (3) a practical, high‑performance system that demonstrates linear scalability to billions of vectors with modest hardware resources. SPIRE’s design offers a compelling blueprint for future large‑scale vector search services.

