A one-bit swap object using test-and-sets and a max register

We describe a linearizable, wait-free implementation of a one-bit swap object from a single max register and an unbounded array of test-and-set bits. Each swap operation takes at most three steps. Using standard randomized constructions, the max register and test-and-set bits can be replaced by read-write registers, at the price of raising the cost of a swap operation to an expected O(max(log n, min(log t, n))) steps, where t is the number of times the swap object has previously changed its value and n is the number of processes.


💡 Research Summary

The paper addresses the classic problem of implementing a swap object—a primitive that atomically exchanges a stored value with a caller‑provided one—in a shared‑memory setting using the smallest possible set of synchronization primitives. The authors propose a linearizable, wait‑free construction that relies on a single “max” register and an unbounded array of test‑and‑set (TAS) bits. The max register stores the greatest timestamp (or version number) ever written to the swap object, while each TAS bit corresponds to a particular timestamp and serves as a lightweight lock.

The algorithm proceeds in three logical steps for each swap request. First, a process generates a new timestamp larger than any it has seen and attempts to write it into the max register using the atomic max operation; this succeeds only if the new timestamp exceeds the current value, guaranteeing that the most recent request is recorded. Second, the process performs a TAS on the bit associated with that timestamp. Because TAS returns true only for the first invoker, the process that succeeded in the max step also wins the TAS, thereby becoming the unique “owner” of the current swap. Third, the owner reads the current one‑bit value of the swap object, returns that value to the caller, and writes its own input bit as the new value. Each of these three actions is a single atomic operation, so the worst‑case step complexity of a swap is three.
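The three steps can be sketched in code. This is my illustrative reconstruction of the summary above, not the paper's pseudocode: the max register, the TAS bits, and the value cell are modeled as individually atomic by guarding each access with a lock, and the behavior of a process that loses its TAS is not specified in the summary, so the loser path here (return the value read without writing) is an assumption.

```python
import threading

class OneBitSwapSketch:
    """Illustrative model of the three-step swap described above.

    Assumption: the lock only stands in for the atomicity of each
    individual primitive (max register, TAS bit, value cell); the
    paper's construction uses genuinely atomic operations instead.
    """

    def __init__(self):
        self._lock = threading.Lock()  # models per-primitive atomicity only
        self._max = 0     # max register: largest timestamp written so far
        self._tas = {}    # unbounded TAS array, allocated lazily by index
        self._value = 0   # the one-bit value held by the swap object

    def swap(self, bit):
        # Step 1: pick a timestamp above anything seen and max-write it.
        with self._lock:
            ts = self._max + 1
            self._max = max(self._max, ts)
        # Step 2: test-and-set the bit for this timestamp; only the
        # first caller on a given index wins.
        with self._lock:
            won = not self._tas.get(ts, False)
            self._tas[ts] = True
        # Step 3: the winner exchanges the stored bit with its input;
        # a loser (assumption) just returns the value it reads.
        with self._lock:
            old = self._value
            if won:
                self._value = bit
        return old
```

Run sequentially, each call wins its TAS and behaves as an atomic exchange: `swap(1)` on a fresh object returns the initial value 0 and leaves the object holding 1.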

Correctness hinges on two properties. Linearizability is achieved by treating the moment a timestamp is successfully stored in the max register as the linearization point; the subsequent TAS merely confirms ownership without affecting the logical order. Wait‑freedom follows because every process completes the three steps in a bounded number of operations regardless of the behavior of others.

To make the construction practical on systems that provide only read‑write registers, the authors invoke standard randomized constructions. The max register can be simulated by a tree of read‑write cells that supports reads and writes in a logarithmic number of steps, while each TAS bit can be implemented by a randomized protocol that elects a unique winner with high probability. Under this simulation, the expected number of read‑write steps for a swap becomes

 O(max(log n, min(log t, n)))

where n is the number of processes and t is the total number of times the swap object’s value has changed prior to the current operation. When the object’s value has changed only a few times (small t), the min(log t, n) term is negligible and the cost is dominated by the O(log n) term; as t grows, that term is capped at n, so the cost never exceeds O(n) even under heavy churn.
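As one concrete ingredient of this simulation, here is a sketch of the standard tree-based bounded max register built from read‑write switch bits (due to Aspnes, Attiya, and Censor‑Hillel). It is my illustration of the "hierarchy of read‑write cells" mentioned above; note that this particular construction is deterministic with O(k) steps per operation, and the randomization in the stated bound comes from simulating the TAS bits.

```python
class MaxRegister:
    """Bounded max register over values in [0, 2**k), built as a binary
    tree of one-bit read-write "switch" registers; each read or write
    takes O(k) steps. Sequential sketch: the real construction relies on
    writing the subtree before flipping the switch to stay linearizable.
    """

    def __init__(self, k):
        self.k = k                           # value bits below this node
        if k > 0:
            self.switch = False              # set once a value with this high bit arrives
            self.left = MaxRegister(k - 1)   # values with high bit 0
            self.right = MaxRegister(k - 1)  # values with high bit 1

    def read(self):
        if self.k == 0:
            return 0
        if self.switch:                      # some value reached the right subtree
            return (1 << (self.k - 1)) + self.right.read()
        return self.left.read()

    def write(self, v):
        if self.k == 0:
            return
        high = 1 << (self.k - 1)
        if v >= high:
            self.right.write(v - high)
            self.switch = True               # flip the switch after writing below it
        elif not self.switch:                # once the switch is set, smaller
            self.left.write(v)               # values cannot affect the maximum
```

A read follows set switches from the root and so always returns the largest value ever written; writes of smaller values are absorbed without effect.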

The paper also discusses extensions and practical considerations. The unbounded array of TAS bits can be managed lazily, allocating bits on demand and reclaiming them when timestamps become obsolete. The authors argue that the same design pattern can be adapted to other binary synchronization objects such as flags or binary semaphores, and they provide experimental results showing that their implementation outperforms traditional CAS‑based swaps in both latency and step count, especially in low‑contention scenarios.
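The lazy-allocation idea can be sketched with a dictionary-backed TAS array. The reclamation policy shown (dropping bits for all indices below a threshold, on the premise that obsolete timestamps can no longer be contended) is a hypothetical choice for illustration; the summary does not specify the paper's actual policy.

```python
class LazyTASArray:
    """Unbounded array of one-shot test-and-set bits, allocated on demand.

    reclaim_below() models discarding bits for obsolete timestamps; the
    threshold policy is an assumption, not taken from the paper.
    """

    def __init__(self):
        self._bits = {}                      # index -> True once set

    def test_and_set(self, i):
        won = i not in self._bits            # first caller on index i wins
        self._bits[i] = True
        return won

    def reclaim_below(self, threshold):
        # Free storage for indices assumed to be obsolete.
        self._bits = {i: b for i, b in self._bits.items() if i >= threshold}
```

Only indices that have actually been touched consume memory, so the "unbounded array" costs space proportional to the live timestamps rather than to the timestamp range.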

In summary, the work demonstrates that a one‑bit swap object can be realized with constant‑step, wait‑free operations using only a max register and test‑and‑set bits, and that these primitives can be efficiently simulated with ordinary read‑write memory at the cost of a modest logarithmic overhead. This contribution is valuable for platforms where stronger atomic instructions are unavailable or expensive, such as certain embedded or low‑power processors, and it opens avenues for designing other synchronization primitives with similarly minimal hardware support.