Hardware Translation Coherence for Virtualized Systems
To improve system performance, modern operating systems (OSes) often undertake activities that require modification of virtual-to-physical page translation mappings. For example, the OS may migrate data between physical frames to defragment memory and enable superpages, or migrate pages between heterogeneous memory devices. We refer to all such activities as page remappings. Unfortunately, page remappings are expensive. We show that translation coherence is a major culprit and that virtualized systems are especially hard-hit by its overheads. In response, we propose hardware translation invalidation and coherence, or HATRIC, a readily implementable hardware mechanism that piggybacks translation coherence atop existing cache coherence protocols. We perform detailed studies using KVM-based virtualization, showing that HATRIC achieves up to 30% performance and 10% energy benefits, for per-CPU area overheads of 2%. We also quantify HATRIC's benefits on systems running Xen and find up to 33% performance improvements.
💡 Research Summary
The paper investigates a largely overlooked source of overhead in modern virtualized systems: translation coherence. When an operating system or hypervisor remaps a physical page—e.g., to defragment memory, create super‑pages, or move data between heterogeneous memory devices—the corresponding virtual‑to‑physical mappings in the page tables must be updated. In a virtualized environment this involves two levels of page tables (guest and nested) and several hardware translation structures: TLBs, MMU caches, and nested TLBs (nTLBs). Existing translation‑coherence mechanisms are coarse‑grained: they either flush entire translation structures or broadcast invalidation messages to all CPUs that might have run the VM, because the hypervisor does not know which guest virtual address (GVA) is affected. Consequently, each remapping incurs expensive inter‑processor interrupts, VM‑exits, and full walks of the page‑table hierarchy, which the authors measure to consume up to 40 % of runtime in their KVM experiments and a similar fraction in Xen.
To address these problems the authors propose HATRIC (Hardware Translation Invalidation and Coherence), a lightweight hardware extension that piggybacks translation coherence on the existing cache‑coherence protocol. The key idea is to augment every translation‑structure entry with a co‑tag: the system physical address (SPA) of the page that the entry maps to. Because the SPA is already known when the hypervisor updates a nested‑page‑table entry, the hypervisor can issue a coherence request that carries this SPA. The cache‑coherence network then delivers the request only to those cores that actually hold a translation entry with a matching co‑tag, allowing precise invalidation of individual entries rather than flushing whole structures. This satisfies three design goals identified by the authors: (1) precise invalidation, (2) precise target identification, and (3) lightweight target‑side handling.
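The co‑tag idea can be illustrated with a toy software model. The sketch below (class and field names are ours, not the paper's; real HATRIC is hardware logic inside the TLB, MMU caches, and nTLB) shows a translation structure whose entries carry an SPA co‑tag, so that a coherence request keyed by SPA invalidates only the matching entries instead of flushing the whole structure:

```python
from dataclasses import dataclass

@dataclass
class TLBEntry:
    gva: int    # guest virtual address (lookup tag)
    spa: int    # system physical address the entry maps to
    cotag: int  # co-tag: the SPA, used for coherence lookups

class CoTaggedTLB:
    """Toy model of a co-tagged translation structure (illustrative
    sketch of the HATRIC idea, not the actual hardware design)."""

    def __init__(self):
        self.entries = {}  # gva -> TLBEntry

    def fill(self, gva, spa):
        # On a page-table walk, record the SPA as the entry's co-tag.
        self.entries[gva] = TLBEntry(gva, spa, cotag=spa)

    def snoop_invalidate(self, spa):
        """Handle a translation-coherence request carrying an SPA:
        invalidate only entries whose co-tag matches, then ack."""
        victims = [gva for gva, e in self.entries.items() if e.cotag == spa]
        for gva in victims:
            del self.entries[gva]
        return len(victims)  # acknowledgment with the number invalidated
```

Note that the co‑tag sidesteps the problem described above: the hypervisor never needs to know which GVA maps to the remapped page, because the lookup key for invalidation is the SPA it already has in hand.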
Implementation requires only modest hardware changes: a few extra bits per TLB/MMU‑cache/nTLB entry and logic to compare incoming SPA tags with stored co‑tags. The authors integrate this logic into a MESI‑style coherence protocol, reusing the existing broadcast and acknowledgment mechanisms. As a result, the target core can invalidate the relevant translation entry and acknowledge the initiator without invoking software, eliminating costly inter‑processor interrupts and VM‑exits.
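The initiator/target interaction can be sketched in the same toy style (again with invented names; the real mechanism is the MESI broadcast and acknowledgment path in hardware): the core performing the remapping broadcasts an SPA‑keyed invalidation over the coherence network, every target compares it against its stored co‑tags in hardware, and acknowledgments flow back without any interrupt or VM‑exit.

```python
class Core:
    """Toy core whose translation structures answer coherence snoops
    in hardware -- no IPIs or VM-exits in this sketch."""

    def __init__(self, cid):
        self.cid = cid
        self.cotags = set()  # SPAs co-tagged in this core's TLB/MMU cache/nTLB

    def snoop(self, spa):
        # Hardware compares the incoming SPA against stored co-tags
        # and invalidates on a match, then acknowledges the initiator.
        self.cotags.discard(spa)
        return "ack"  # acked regardless of a hit, like MESI invalidations

def remap_page(cores, spa):
    """Initiator side: reuse the existing coherence broadcast/ack path
    to invalidate stale translations for a remapped page."""
    acks = [core.snoop(spa) for core in cores]  # broadcast over the network
    assert all(a == "ack" for a in acks)        # wait for all acknowledgments
    # ...only now is it safe to reuse or migrate the physical page.
```

Because the target side is pure tag comparison, no software runs on the snooped cores; this is what eliminates the inter‑processor interrupts and VM‑exits of the software‑only schemes.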
The authors evaluate HATRIC using a cycle‑accurate simulator of a 16‑core Intel Haswell‑class processor equipped with 2 GB of die‑stacked DRAM (four times the bandwidth of an 8 GB off‑chip DRAM) and a total of 10 GB addressable memory. They modify KVM to perform page migrations between the fast die‑stacked memory and the slower off‑chip memory, exercising a variety of paging policies. Even with the best software‑only paging policy, translation‑coherence overheads dominate and limit performance gains. With HATRIC enabled, the same workloads achieve up to 30 % higher performance and 10 % lower energy consumption, while incurring less than 2 % area overhead per CPU. A similar study on Xen shows up to 33 % performance improvement.
Beyond the quantitative results, the paper discusses broader implications. HATRIC’s reliance on the cache‑coherence substrate makes it applicable to other ISAs (ARM, Power, RISC‑V) and to systems that still use shadow page tables. It also opens the door for more aggressive use of heterogeneous memory technologies (e.g., 3D‑XPoint, HBM, persistent memory) in virtualized clouds, because the translation‑coherence penalty that previously discouraged frequent page migrations is now largely eliminated.
In summary, the work makes three major contributions: (1) a thorough characterization of translation‑coherence costs in virtualized, heterogeneous‑memory systems; (2) the design of HATRIC, a hardware mechanism that integrates translation‑coherence with cache‑coherence to achieve precise, low‑overhead invalidations; and (3) a comprehensive evaluation demonstrating substantial performance, energy, and area benefits. The paper convincingly argues that future high‑performance, memory‑centric cloud platforms will need hardware support like HATRIC to fully exploit emerging memory technologies without being crippled by translation‑coherence overheads.