Equivalence between Priority Queues and Sorting in External Memory
A priority queue is a fundamental data structure that maintains a dynamic ordered set of keys and supports the following basic operations: insertion of a key, deletion of a key, and finding the smallest key. The complexity of priority queues is closely related to that of sorting: a priority queue trivially yields a sorting algorithm. Thorup \cite{thorup2007equivalence} proved that the converse also holds in the RAM model. In particular, he designed a priority queue that uses a sorting algorithm as a black box, such that the per-operation cost of the priority queue is asymptotically the same as the per-key cost of sorting. In this paper, we prove an analogous result in the external memory model, showing that priority queues are computationally equivalent to sorting in external memory under some mild assumptions. The reduction opens a route to proving lower bounds for external sorting by establishing lower bounds for priority queues.
💡 Research Summary
The paper establishes a computational equivalence between priority queues and sorting in the external‑memory (I/O) model. Working under the classic I/O model with block size B and main‑memory size M, the authors adapt Thorup’s RAM‑model construction—where a priority queue is built on top of a black‑box sorting routine—to the setting where data resides on disk and I/O operations dominate the cost.
The core construction is a multi‑level buffered data structure reminiscent of a buffer tree. The top level consists of an in‑memory buffer of size Θ(M) that accumulates insertions and deletions. When this buffer fills, its contents are flushed to the next level by invoking an external sorting algorithm as a black box. The sorting routine processes a batch of k keys with the optimal I/O cost O((k/B)·log_{M/B}(k/B)). The sorted output is stored in a lower‑level buffer whose blocks are larger (by a factor of M/B) than those of the level above. Deletions are handled analogously: delete requests are recorded in a separate buffer and merged with insertions during the same flush, so that a single sorting pass simultaneously resolves both operations.
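The buffered construction can be sketched in a few lines. The following is a simplified in-memory simulation, not the paper's actual external-memory structure: each lower level is just a sorted Python list standing in for a run of disk blocks, `blackbox_sort` stands in for the external sorting algorithm used as a black box, and keys are assumed distinct. The point it illustrates is the flush mechanics: pending inserts and delete requests are resolved together in a single sorting pass.

```python
class BufferedPQ:
    """Simplified in-memory sketch of a multi-level buffered priority queue.

    A real external-memory version keeps each level on disk and pays I/O
    only in block-sized units; here we only model the batching logic.
    Keys are assumed distinct (a delete request cancels one specific key).
    """

    def __init__(self, buffer_capacity=4, blackbox_sort=sorted):
        self.capacity = buffer_capacity  # models the Theta(M) top-level buffer
        self.inserts = []                # pending insertions
        self.deletes = []                # pending delete requests
        self.levels = []                 # lower levels, each a sorted list
        self.sort = blackbox_sort        # sorting routine used as a black box

    def insert(self, key):
        self.inserts.append(key)
        if len(self.inserts) + len(self.deletes) >= self.capacity:
            self._flush()

    def delete(self, key):
        self.deletes.append(key)
        if len(self.inserts) + len(self.deletes) >= self.capacity:
            self._flush()

    def _flush(self):
        # One sorting pass resolves inserts and deletes simultaneously:
        # cancel deleted keys, sort the surviving batch, push it as a new
        # sorted level, and purge deleted keys from existing levels.
        dels = set(self.deletes)
        batch = self.sort(k for k in self.inserts if k not in dels)
        self.levels = [[k for k in lvl if k not in dels] for lvl in self.levels]
        if batch:
            self.levels.append(list(batch))
        self.inserts, self.deletes = [], []

    def find_min(self):
        # The paper's structure answers this with O(1) I/O by inspecting
        # the front block of each level; for simplicity this sketch flushes
        # pending updates first, then takes the minimum over level fronts.
        self._flush()
        fronts = [lvl[0] for lvl in self.levels if lvl]
        return min(fronts) if fronts else None
```

In the real structure the amortized cost of a flush is charged to the keys in the batch, which is what makes the per-operation cost match the per-key cost of the black-box sort.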
Finding the minimum key is cheap: the smallest element among the top‑level buffer and the front blocks of all lower levels can be identified with O(1) I/O, because each level maintains its elements in sorted order after a flush. Consequently, each priority‑queue operation incurs an amortized I/O cost that matches the per‑key cost of external sorting. Under the standard “tall‑cache” assumption (M = Ω(B²)), the structure achieves the optimal bound O((1/B)·log_{M/B}(N/B)) per operation, which is identical to the lower bound for sorting established by Aggarwal and Vitter.
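To get a feel for these bounds, one can plug illustrative numbers into the Aggarwal–Vitter sorting cost and divide by the number of keys; the parameter values below are arbitrary examples, not figures from the paper.

```python
import math

def sort_io_cost(n, m, b):
    """Aggarwal-Vitter sorting bound: Theta((N/B) * log_{M/B}(N/B)) I/Os."""
    return (n / b) * math.log(n / b, m / b)

def pq_cost_per_op(n, m, b):
    """Amortized per-operation cost: (1/B) * log_{M/B}(N/B) I/Os,
    i.e. the total sorting cost spread over N operations."""
    return sort_io_cost(n, m, b) / n

# Illustrative parameters: 2^40 keys, memory M = 2^30, block size B = 2^16
# (note M >= B^2 holds, satisfying the tall-cache assumption).
N, M, B = 2**40, 2**30, 2**16
print(f"total sorting cost: {sort_io_cost(N, M, B):.3e} I/Os")
print(f"amortized per op:   {pq_cost_per_op(N, M, B):.3e} I/Os")
```

The per-operation figure is far below one I/O, which is the whole point of batching: each individual priority-queue operation pays only a 1/B fraction of a block transfer per level of the recursion.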
The reduction has a profound theoretical implication: any lower bound proved for external‑memory priority queues immediately transfers to external sorting. In particular, if one can show that any priority‑queue implementation must spend Ω((N/B)·log_{M/B}(N/B)) I/Os in the worst case, the same bound follows for sorting, bypassing the need for a direct sorting‑specific argument. This mirrors Thorup’s equivalence in the RAM model but now operates in the I/O‑dominated regime.
The authors discuss several practical constraints. The black‑box sorter must be comparison‑based; non‑comparison methods such as radix sort do not automatically inherit the equivalence. The batch size for flushing must be comparable to the buffer capacity; pathological sequences with extremely bursty insert/delete patterns could cause more frequent flushes and degrade the amortized guarantee. Moreover, extending the design to concurrent or multi‑threaded environments would require careful synchronization of buffer updates and I/O scheduling.
In summary, the paper provides a clean, modular transformation from an external‑memory sorting algorithm to a fully functional priority queue with optimal amortized I/O performance. By proving that priority queues and sorting are computationally interchangeable in this model, the work opens a new avenue for deriving lower bounds and for designing external‑memory data structures: researchers can focus on either problem, knowing that solutions or impossibility results for one immediately apply to the other.