The Cost of Address Translation
Modern computers are not random access machines (RAMs). They have a memory hierarchy, multiple cores, and virtual memory. In this paper, we address the computational cost of address translation in virtual memory. The starting point for our work is the observation that the analysis of some simple algorithms (random scan of an array, binary search, heapsort) in either the RAM model or the EM model (external memory model) does not correctly predict growth rates of actual running times. We propose the VAT model (virtual address translation) to account for the cost of address translations and analyze the algorithms mentioned above and others in the model. The predictions agree with the measurements. We also analyze the VAT-cost of cache-oblivious algorithms.
💡 Research Summary
The paper “The Cost of Address Translation” investigates a largely overlooked component of modern computer performance: the overhead incurred by virtual‑address translation. Classic algorithm analysis assumes a random‑access machine (RAM), and external‑memory (EM) models treat memory as a hierarchy of blocks, but neither model captures the cost of walking page tables and consulting the translation‑lookaside buffer (TLB). The authors begin by observing that simple algorithms (random scans of an array, binary search, heapsort) exhibit runtime growth that deviates markedly from RAM or EM predictions on real hardware. To explain this discrepancy, they introduce the VAT (Virtual Address Translation) model.
In the VAT model a virtual address is split into a page number and an offset. The page number is resolved by traversing a multi‑level page‑table tree (typically 2–4 levels on contemporary x86‑64 and ARM systems). Each level may be cached in the TLB; a miss forces a memory access whose latency depends on the cache hierarchy. The model parameterizes each level i by a hit probability p_i and a miss cost c_i, so the expected translation cost per address is Σ_i (1‑p_i)·c_i. Crucially, these probabilities are not constant but depend on the algorithm’s access pattern, making translation cost a function of input size N as well as the page‑table depth L.
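The per‑address cost formula above is easy to sketch in code. The following Python snippet computes Σ_i (1−p_i)·c_i for a hypothetical four‑level page table; the hit probabilities and miss costs are illustrative values chosen here, not parameters taken from the paper:

```python
# Sketch of the per-address translation cost in the VAT model,
# using hypothetical hit probabilities and miss costs for a
# 4-level page table (all values are illustrative, not measured).

def expected_translation_cost(hit_probs, miss_costs):
    """Expected cost per address: sum over levels i of (1 - p_i) * c_i."""
    assert len(hit_probs) == len(miss_costs)
    return sum((1 - p) * c for p, c in zip(hit_probs, miss_costs))

# Upper levels are almost always cached; the leaf level misses
# more often, and each miss costs roughly one memory access.
hit_probs = [0.999, 0.99, 0.95, 0.80]   # p_i per level (root ... leaf)
miss_costs = [100, 100, 100, 100]       # c_i in cycles (illustrative)

cost = expected_translation_cost(hit_probs, miss_costs)
print(f"expected translation cost per address: {cost:.1f} cycles")  # 26.1
```

Because the p_i depend on the access pattern, an algorithm that hops between many pages drives the leaf‑level hit probability down and the expected cost up, which is exactly the effect the model is built to expose.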
The authors apply the VAT model to four representative algorithm families. For a random scan of an array, consecutive accesses almost always fall on different pages, so nearly every access triggers a translation; the total VAT cost is Θ(N·L). Binary search probes O(log N) elements per query, each typically on a distinct page, yielding a VAT cost of Θ(log N·L) per search. Heapsort performs a sequence of sift‑down operations that touch many non‑contiguous positions; its VAT cost scales as Θ(N·log N·L). Finally, they examine cache‑oblivious algorithms (e.g., cache‑oblivious sorting, divide‑and‑conquer) that are optimal under the EM model. Although these algorithms minimize block transfers, their recursive structure can cause frequent page‑table walks, leading to non‑trivial VAT overhead.
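The gap between a sequential and a random scan can be illustrated with a toy TLB simulation. The fully associative LRU TLB, its capacity, and the page size in elements below are all assumptions made for this sketch, not parameters from the paper:

```python
# Toy TLB simulation: count translations ("misses") forced by a
# sequential scan vs. a random scan of the same array.
import random
from collections import OrderedDict

PAGE_SIZE = 512      # elements per page (illustrative)
TLB_CAPACITY = 64    # entries, fully associative LRU (illustrative)

def count_tlb_misses(accesses):
    """Count accesses whose page is absent from a small LRU TLB."""
    tlb = OrderedDict()
    misses = 0
    for idx in accesses:
        page = idx // PAGE_SIZE
        if page in tlb:
            tlb.move_to_end(page)      # refresh LRU position
        else:
            misses += 1
            tlb[page] = True
            if len(tlb) > TLB_CAPACITY:
                tlb.popitem(last=False)  # evict least recently used
    return misses

N = 1 << 20
random.seed(0)
random_scan = [random.randrange(N) for _ in range(N)]

print("sequential misses:", count_tlb_misses(range(N)))      # ~N / PAGE_SIZE
print("random-scan misses:", count_tlb_misses(random_scan))  # ~N
```

The sequential scan misses only once per page, while the random scan misses on nearly every access because the working set of pages far exceeds the TLB, matching the Θ(N·L) behavior described above.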
Experimental validation is performed on two platforms: an x86‑64 server with a deep page‑table hierarchy and a modern ARM smartphone. The authors vary input sizes from 2^10 to 2^28 elements and test three page sizes (4 KB, 2 MB huge pages, and 1 GB huge pages). Measured runtimes closely follow the VAT predictions; for typical L = 3–4 the translation overhead accounts for 10 %–30 % of total execution time. Moreover, increasing TLB capacity or reducing page‑table depth (by using larger pages) cuts the VAT component substantially, confirming the model’s practical relevance.
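The effect of page size can be sketched with a small calculator. The walk depths used here (4 levels for 4 KB pages, 3 for 2 MB, 2 for 1 GB) are the standard x86‑64 paging values; the element size and array length are illustrative choices for this sketch:

```python
# Sketch: how page size affects the number of pages an array spans
# and the depth of a page-table walk on x86-64 4-level paging.

ELEM_SIZE = 8                      # bytes per element (illustrative)
PAGE_CONFIGS = {                   # page size -> (bytes, levels walked)
    "4 KB": (4 * 1024, 4),
    "2 MB": (2 * 1024 * 1024, 3),
    "1 GB": (1024 ** 3, 2),
}

def pages_spanned(n_elems, page_size):
    """Number of pages covering an array of n_elems elements."""
    total_bytes = n_elems * ELEM_SIZE
    return -(-total_bytes // page_size)   # ceiling division

N = 1 << 28   # largest input size used in the experiments
for name, (size, depth) in PAGE_CONFIGS.items():
    print(f"{name}: {pages_spanned(N, size):>8} pages, walk depth {depth}")
```

Larger pages shrink both the number of translations needed (fewer pages to cover the same data) and the cost of each miss (a shorter walk), which is why huge pages cut the VAT component so effectively.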
Beyond analysis, the paper proposes a “VAT‑aware cache‑oblivious” design methodology. Data structures should be laid out to minimize page‑boundary crossings, and recursive algorithms can be tuned to keep working sets within a small number of pages, thereby reducing page‑table traversals. The authors also discuss extensions to multi‑core and NUMA environments, where shared page tables and TLB shoot‑downs introduce additional latency. They identify these as promising directions for future work.
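As a minimal illustration of layout‑aware traversal, the sketch below counts page‑boundary crossings for row‑major versus column‑major traversal of the same matrix; the matrix dimensions and page size are assumptions for this example, not figures from the paper:

```python
# Sketch: page switches between consecutive accesses for two
# traversal orders of the same row-major matrix (illustrative sizes).

PAGE = 4096 // 8          # elements per 4 KB page of 8-byte values
ROWS, COLS = 1024, 1024

def page_switches(index_order):
    """Count how often consecutive accesses land on different pages."""
    switches, last_page = 0, None
    for idx in index_order:
        page = idx // PAGE
        if page != last_page:
            switches += 1
            last_page = page
    return switches

row_major = (r * COLS + c for r in range(ROWS) for c in range(COLS))
col_major = (r * COLS + c for c in range(COLS) for r in range(ROWS))

print("row-major page switches:", page_switches(row_major))  # one per page
print("col-major page switches:", page_switches(col_major))  # one per access
```

Traversing in layout order crosses a page boundary only once per page, while the strided column‑major order changes page on every access, so a VAT‑aware design prefers the former or re‑blocks the data so each recursive subproblem stays within a few pages.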
In conclusion, the study demonstrates that virtual‑address translation is not a negligible constant factor but a variable cost that can dominate algorithmic performance, especially for memory‑intensive workloads. By integrating VAT considerations into algorithm analysis and system design, researchers and engineers can achieve more accurate performance predictions and devise optimizations that are invisible to traditional RAM or EM models. This work thus bridges a critical gap between theoretical algorithm analysis and the realities of modern hardware.