Specifying and Verifying RDMA Synchronisation (Extended Version)

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Remote direct memory access (RDMA) allows a machine to directly read from and write to the memory of a remote machine, enabling high-throughput, low-latency data transfer. Ensuring correctness of RDMA programs has only recently become possible with the formalisation of $\text{RDMA}^{\text{TSO}}$ semantics (describing the behaviour of RDMA networking over a TSO CPU). However, this semantics currently lacks a formalisation of remote synchronisation, meaning that the implementations of common abstractions such as locks cannot be verified. In this paper, we close this gap by presenting $\text{RDMA}^{\text{TSO}}_{\text{RMW}}$, the first semantics for remote 'read-modify-write' (RMW) instructions over TSO. It turns out that remote RMW operations are weak and only ensure atomicity against other remote RMWs. We therefore build a set of composable synchronisation abstractions starting with the $\text{RDMA}^{\text{WAIT}}_{\text{RMW}}$ library. Underpinned by $\text{RDMA}^{\text{WAIT}}_{\text{RMW}}$, we then specify, implement and verify three classes of remote locks that are suitable for different scenarios. Additionally, we develop the notion of a strong RDMA model, $\text{RDMA}^{\text{SC}}_{\text{RMW}}$, which is akin to sequential consistency in shared memory architectures. Our libraries are built to be compatible with an existing set of high-performance libraries called LOCO, which ensures compositionality and verifiability.


💡 Research Summary

Remote Direct Memory Access (RDMA) has become a cornerstone of modern high‑performance networking, allowing a node to read from or write to the memory of a remote node without involving the remote CPU. While the RDMA_TSO model formalised the behaviour of remote reads, writes and polling on top of a Total‑Store‑Order (TSO) CPU, it deliberately omitted remote read‑modify‑write (RMW) operations, leaving a gap for the verification of higher‑level synchronisation primitives such as locks.

This paper fills that gap by introducing two new semantics. First, RDMA_TSO_RMW extends the original model with remote RMW instructions, capturing the fact that remote RMWs are only atomic with respect to other remote RMWs and provide weak isolation against CPU accesses, remote reads and writes. The authors validate this model against the InfiniBand specification and through discussions with NVIDIA engineers. Because the original polling primitive (Poll) depends on the exact number of prior remote operations, it is non‑compositional and unsuitable for modular verification.

To obtain compositionality, the authors build on the LOCO framework’s notion of composable objects and replace Poll with a work‑identifier based waiting primitive, Wait. This yields the RDMA_WAIT model, where each remote operation is tagged with a work identifier (Wid) and Wait(Wid) blocks until all earlier operations sharing that identifier have completed. RDMA_WAIT is modular and can be combined with other LOCO libraries.
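The difference between count-based polling and identifier-based waiting can be illustrated with a small single-threaded toy model. This is only a sketch under stated assumptions, not the paper's formal semantics: `ToyNic`, `put`, `poll` and `wait` are hypothetical names, `poll` is assumed to confirm the oldest outstanding operation regardless of who issued it, and `wait` is modelled as forcing completion rather than blocking.

```python
class ToyNic:
    """Toy model of outstanding remote writes (hypothetical, illustration only)."""

    def __init__(self):
        self.pending = []   # (wid, addr, value) tuples in issue order
        self.memory = {}    # remote memory as seen once writes complete

    def put(self, wid, addr, value):
        """Issue a remote write tagged with work identifier `wid`."""
        self.pending.append((wid, addr, value))

    def poll(self):
        """Assumed Poll behaviour: complete the oldest outstanding operation,
        whoever issued it. Its meaning depends on how many operations were
        issued earlier."""
        _wid, addr, value = self.pending.pop(0)
        self.memory[addr] = value

    def wait(self, wid):
        """Complete every outstanding operation tagged `wid`; leave the rest."""
        still_pending = []
        for tag, addr, value in self.pending:
            if tag == wid:
                self.memory[addr] = value
            else:
                still_pending.append((tag, addr, value))
        self.pending = still_pending


nic = ToyNic()
nic.put("lib", "log", 1)       # a library's internal write, issued first
nic.put("client", "data", 42)  # the client's own write
nic.wait("client")             # confirms the client's write only
```

In this toy, a single `poll()` issued by the client would have confirmed the library's write instead of its own, which is the sense in which counting completions does not compose across libraries, whereas `wait("client")` is unaffected by what other components have issued.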

RDMA_WAIT_RMW further extends RDMA_WAIT with two remote RMW primitives: remote compare‑and‑swap (RCAS) and remote fetch‑and‑add (RFAA). The authors introduce a new stamp aNAR_n in the mowgli declarative verification framework to represent the ordering guarantees of remote RMWs. Stamps and the preserved program order (ppo) relation allow the framework to reason about possible reorderings between remote RMWs and other library methods, while the happens‑before (hb) relation eliminates forbidden executions.
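As a rough illustration of the two primitives, the sketch below models a remote memory cell in Python. It assumes the usual compare-and-swap and fetch-and-add conventions, in which both operations return the previous value; `ToyRemoteCell`, `rcas`, `rfaa` and `cpu_write` are hypothetical names, and the lock that only RMWs acquire is a stand-in for "atomic only with respect to other remote RMWs", not a description of real hardware or of the paper's model.

```python
import threading


class ToyRemoteCell:
    """Toy remote memory cell (hypothetical, illustration only)."""

    def __init__(self, value=0):
        self.value = value
        # Only remote RMWs take this lock, so they exclude one another
        # but nothing excludes plain CPU or remote writes.
        self._rmw_lock = threading.Lock()

    def rcas(self, expected, new):
        """Remote compare-and-swap: install `new` if the cell holds
        `expected`; return the previous value either way."""
        with self._rmw_lock:
            old = self.value
            if old == expected:
                self.value = new
            return old

    def rfaa(self, delta):
        """Remote fetch-and-add: add `delta`; return the previous value."""
        with self._rmw_lock:
            old = self.value
            self.value = old + delta
            return old

    def cpu_write(self, value):
        """Plain write that bypasses the RMW lock, so in this toy it may
        land between the read and write halves of a concurrent RMW."""
        self.value = value


cell = ToyRemoteCell(0)
previous = cell.rfaa(5)     # previous is 0, cell now holds 5
swapped = cell.rcas(5, 7)   # succeeds: swapped is 5, cell now holds 7
failed = cell.rcas(0, 9)    # fails: failed is 7, cell unchanged
```

The design choice worth noting is that `cpu_write` deliberately skips the lock: that asymmetry is what makes the RMWs "weak" in this toy, and it is why code that mixes RMWs with ordinary accesses needs additional synchronisation.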

Using the extended mowgli framework, the paper specifies, implements, and mechanically verifies three remote lock libraries:

  1. wlock – a weak lock that guarantees mutual exclusion across the network but provides no ordering guarantees for RDMA operations inside the critical section.
  2. slock – a strong lock that augments wlock with a global fence before releasing the lock, thereby ensuring sequentially consistent ordering of all RDMA instructions within the critical section.
  3. nlock – a node‑specific lock that synchronises only operations targeting a particular node n; operations on other nodes remain unsynchronised, offering a trade‑off between scalability and ordering.
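The three release disciplines listed above can be contrasted in a single-threaded toy model. This is a sketch under loud assumptions and not the paper's verified implementation: `ToyCluster`, `acquire`, `release_weak`, `release_strong` and `release_node` are hypothetical names, acquisition is assumed to spin on a compare-and-swap of a lock word from 0 to 1, release is modelled as a plain reset of that word, and a "fence" is modelled as forcing outstanding writes to complete.

```python
class ToyCluster:
    """Toy cluster with one lock word and per-node outstanding writes
    (hypothetical, illustration only)."""

    def __init__(self, nodes):
        self.lock_word = 0                      # 0 = free, 1 = held
        self.pending = {n: [] for n in nodes}   # unfinished writes per target
        self.memory = {n: {} for n in nodes}    # completed writes per target

    def rcas(self, expected, new):
        """Compare-and-swap on the lock word; returns the previous value."""
        old = self.lock_word
        if old == expected:
            self.lock_word = new
        return old

    def put(self, node, addr, value):
        """Issue a write towards `node`; it stays pending until fenced."""
        self.pending[node].append((addr, value))

    def fence(self, nodes):
        """Force all pending writes towards the given nodes to complete."""
        for n in nodes:
            for addr, value in self.pending[n]:
                self.memory[n][addr] = value
            self.pending[n].clear()


def acquire(cluster):
    """Spin until the lock word is swapped from 0 (free) to 1 (held)."""
    while cluster.rcas(0, 1) != 0:
        pass


def release_weak(cluster):
    """wlock-style release: no fence, pending writes may still be in flight."""
    cluster.lock_word = 0


def release_strong(cluster):
    """slock-style release: fence every node before freeing the lock."""
    cluster.fence(list(cluster.pending))
    cluster.lock_word = 0


def release_node(cluster, node):
    """nlock-style release: fence only writes that target `node`."""
    cluster.fence([node])
    cluster.lock_word = 0
```

For example, after `acquire(c); c.put("n1", "x", 1); c.put("n2", "y", 2); release_node(c, "n1")`, the write to `n1` is complete while the write to `n2` is still pending, which mirrors the trade-off described for nlock: less fencing work at release time in exchange for ordering guarantees on one node only.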

Finally, the authors propose RDMA_SC_RMW, a strong RDMA model that mimics sequential consistency (SC) for remote RMWs. By inserting global fences or otherwise enforcing SC‑style ordering, RDMA_SC_RMW provides strong isolation and ordering guarantees comparable to a shared‑memory system, while remaining compatible with the modular LOCO libraries.

The contributions are threefold: (i) the first formal semantics for remote RMWs, validated against hardware documentation; (ii) an extension of the mowgli framework to support RMWs and the construction of composable, verified RDMA libraries; and (iii) the definition of a strong SC‑like RDMA model. Together, these results enable rigorous specification and verification of synchronisation primitives in RDMA‑based distributed systems, bridging the gap between high‑performance networking and formal correctness guarantees.

