Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Vector search underpins modern information-retrieval systems, including retrieval-augmented generation (RAG) pipelines and search engines over unstructured text and images. As datasets scale to billions of vectors, disk-based vector search has emerged as a practical solution. However, looking to the future, we need to anticipate datasets too large for any single server. We present BatANN, a distributed disk-based approximate nearest neighbor (ANN) system that retains the logarithmic search efficiency of a single global graph while achieving near-linear throughput scaling in the number of servers. Our core innovation is that when accessing a neighborhood which is stored on another machine, we send the full state of the query to the other machine to continue executing there for improved locality. On 100M- and 1B-point datasets at 0.95 recall using 10 servers, BatANN achieves 6.21-6.49x and 2.5-5.10x the throughput of the scatter-gather baseline, respectively, while maintaining mean latency below 6 ms. Moreover, we get these results on standard TCP. To our knowledge, BatANN is the first open-source distributed disk-based vector search system to operate over a single global graph.


💡 Research Summary

The paper “Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN” introduces BatANN, a novel distributed system for approximate nearest neighbor (ANN) search designed to efficiently handle billion-scale vector datasets that exceed the capacity of a single server’s memory and disk bandwidth.

The core problem addressed is the scalability of disk-based graph ANN indices (like DiskANN). While such graphs offer logarithmic search complexity, a single server’s throughput is ultimately limited by its SSD bandwidth. Distributed solutions like scatter-gather, which partition the data and run every query against each partition independently, lose this logarithmic efficiency because the work per query grows with the number of partitions. Alternative distributed global-graph systems (e.g., DistributedANN, CoTra) maintain a single index but often rely on request-reply communication patterns or expensive RDMA hardware, leading to communication overhead or high cost.

BatANN’s key innovation is an asynchronous, state-passing query execution model designed for standard TCP networks. The system first constructs a single global search graph over the entire dataset and partitions it across servers using a neighborhood-aware algorithm to maximize locality. When a query is initiated on a server, it performs a beam search. Crucially, whenever all the current best candidate nodes in the beam reside on a different server, BatANN does not request data from that server. Instead, it packages the entire state of the beam search (the candidate pool, visited nodes, etc.) into an “envelope” and forwards it directly to the remote server. That server unpacks the state and immediately continues the search using its local disk data. This “baton-passing” process repeats until the search converges, with the final server returning results to the client.
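The baton-passing loop described above can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `Envelope`, `run_on_server`, `search`, and `owner` are hypothetical, servers are simulated by a plain loop rather than TCP connections, distances are computed exactly from a shared table (standing in for whatever in-memory approximation a real system would use), and the hand-off rule is simplified to "pass the query on once no unexpanded candidate in the beam is stored locally".

```python
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """Hypothetical query state that travels from server to server."""
    query: tuple
    beam: list = field(default_factory=list)    # sorted (distance, node_id) pairs
    expanded: set = field(default_factory=set)  # nodes whose neighbor lists were read
    hops: int = 0                               # number of hand-offs so far


def dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def run_on_server(server_id, env, vectors, graph, owner, beam_width):
    """Expand local candidates; return the next server id, or None when converged."""
    while True:
        frontier = [n for _, n in env.beam if n not in env.expanded]
        if not frontier:
            return None  # every candidate in the beam has been expanded
        local = [n for n in frontier if owner[n] == server_id]
        if not local:
            # Nothing left to do here: hand off to the owner of the best candidate.
            return owner[frontier[0]]
        node = local[0]  # closest unexpanded candidate stored on this server
        env.expanded.add(node)
        in_beam = {n for _, n in env.beam}
        for nb in graph[node]:  # stands in for a local disk read
            if nb not in in_beam:
                env.beam.append((dist(env.query, vectors[nb]), nb))
                in_beam.add(nb)
        env.beam.sort()
        del env.beam[beam_width:]  # keep only the best beam_width candidates


def search(query, vectors, graph, owner, start, k=1, beam_width=2):
    """Drive one query, passing the envelope between simulated servers."""
    env = Envelope(query=query, beam=[(dist(query, vectors[start]), start)])
    server = owner[start]
    while server is not None:
        nxt = run_on_server(server, env, vectors, graph, owner, beam_width)
        if nxt is not None:
            env.hops += 1  # one one-way message carrying the whole state
        server = nxt
    return [n for _, n in env.beam[:k]], env.hops


# Toy data: 8 points on a line, chained into a path graph, split over 2 servers.
vectors = {i: (float(i),) for i in range(8)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
owner = {i: 0 if i < 4 else 1 for i in range(8)}

result, hops = search((6.2,), vectors, graph, owner, start=0)
print(result, hops)
```

In this toy run the query starts on server 0, walks nodes 0 through 3 locally, and crosses to server 1 exactly once when the only unexpanded candidate lives there. The search then finishes on server 1 without ever sending a reply back to server 0.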

This design offers significant advantages: 1) Reduced Communication Rounds: It eliminates the request-reply wait time, cutting the latency penalty of remote access in half. 2) Maximized Locality: Each server performs sustained local computation and I/O on a query before passing it on. 3) TCP Efficiency: The pattern of sending larger state objects less frequently aligns well with TCP’s throughput characteristics, avoiding the overhead of many small messages.
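The "in half" claim in the first point follows from simple counting: a request-reply access pays two one-way network delays (request out, data back), while forwarding the query state pays one. The sketch below makes that arithmetic explicit. It is an idealized model, not a measurement: the function name and parameters are hypothetical, and it ignores serialization cost, message size, and the fact that the two designs may not make the same number of remote accesses for a given query.

```python
def network_wait_us(remote_accesses, one_way_us, mode):
    """Idealized time (microseconds) one query spends waiting on the network."""
    if mode == "request_reply":
        # Each remote access is a round trip: request out, data back.
        return remote_accesses * 2 * one_way_us
    if mode == "state_passing":
        # Each remote access is a single one-way hand-off of the query state.
        return remote_accesses * one_way_us
    raise ValueError(f"unknown mode: {mode}")


# Illustrative numbers only: 5 remote accesses, 100 microseconds one-way delay.
rr = network_wait_us(5, 100, "request_reply")
sp = network_wait_us(5, 100, "state_passing")
print(rr, sp)
```

With these illustrative inputs the request-reply model waits 1000 microseconds and the state-passing model 500, which is the factor of two the summary refers to.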

The authors evaluate BatANN on 100M and 1B-point datasets. Using 10 servers and targeting 0.95 recall, BatANN outperforms a scatter-gather baseline (which uses independent DiskANN graphs per partition) by 6.21–6.49x on the 100M dataset and 2.5–5.10x on the 1B dataset in queries per second (QPS). Importantly, it maintains a mean query latency below 6 milliseconds. These results demonstrate that BatANN successfully preserves the logarithmic search efficiency of a single global graph in a distributed setting, achieving near-linear throughput scaling with the number of servers using commodity networking.

In summary, BatANN replaces the request-reply pattern of earlier distributed global-graph systems with forwarding of the full query state between servers. This lets it achieve high throughput and low latency on standard TCP networks, making efficient large-scale vector search more accessible without specialized hardware. The authors also state that, to their knowledge, BatANN is the first open-source distributed disk-based vector search system to operate over a single global graph.

