Practical Multiwriter Lock-Free Queues for "Hard Real-Time" Systems without CAS

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

FIFO queues with a single reader and writer can be insufficient for “hard real-time” systems where interrupt handlers require wait-free guarantees when writing to message queues. We present an algorithm which elegantly and practically solves this problem on small processors that are often found in embedded systems. The algorithm does not require special CPU instructions (such as atomic CAS), and therefore is more robust than many existing methods that suffer the ABA problem associated with swing pointers. The algorithm gives “first-in, almost first-out” guarantees under pathological interrupt conditions, which manifest as arbitrary “shoving” among nearly-simultaneous arrivals at the end of the queue.


💡 Research Summary

The paper addresses a critical limitation in hard real‑time embedded systems: the inability of traditional single‑producer/single‑consumer FIFO queues to provide wait‑free guarantees when multiple interrupt service routines (ISRs) must concurrently enqueue messages. Existing lock‑free queues typically rely on atomic compare‑and‑swap (CAS) instructions to coordinate concurrent writers. However, many low‑cost microcontrollers used in safety‑critical or power‑constrained devices either lack CAS support or incur prohibitive latency when emulating it, making such solutions unsuitable for hard real‑time constraints.

To overcome this, the authors propose a novel multi‑writer, lock‑free queue that requires no special CPU instructions. The core of the design is a circular buffer whose slots are identified by a composite key consisting of a sequence number and a buffer index. Each ISR performs two simple, non‑blocking steps: (1) a reservation phase, in which it claims a free slot by incrementing a tail index with an ordinary integer addition that must take effect indivisibly with respect to other writers, and (2) a commit phase, in which it writes the payload into the reserved slot and then updates the predecessor’s “next” pointer with a single store. Because each reserved slot has exactly one predecessor, no two ISRs ever store to the same “next” field, so writers do not interfere with each other and no CAS is needed.
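The two writer steps can be sketched in C as follows. This is a minimal illustration of the reserve/commit description above, not the paper's code: the names `queue_t`, `node_t`, `queue_reserve`, `queue_commit`, and the capacity `QUEUE_SLOTS` are all hypothetical, the sketch omits the full-buffer check, and it assumes a single-core target on which the tail increment is made indivisible by some platform-specific means that this summary does not describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u /* hypothetical capacity, fixed at compile time */
static_assert((QUEUE_SLOTS & (QUEUE_SLOTS - 1u)) == 0u,
              "QUEUE_SLOTS must be a power of two for the index mask");

typedef struct node {
    struct node *volatile next; /* NULL until a successor is committed */
    volatile uint32_t payload;
} node_t;

typedef struct {
    node_t slots[QUEUE_SLOTS];
    volatile uint32_t tail; /* ticket of the most recently reserved slot */
} queue_t;

/* Slot 0 starts as an empty placeholder that the first writer links from. */
static void queue_init(queue_t *q)
{
    for (size_t i = 0; i < QUEUE_SLOTS; i++) {
        q->slots[i].next = NULL;
        q->slots[i].payload = 0u;
    }
    q->tail = 0u;
}

/* Step 1, reservation: take the next ticket.
 * ASSUMPTION: on the real target this read-add-write sequence cannot be
 * split by another writer; plain C alone does not guarantee that. */
static uint32_t queue_reserve(queue_t *q)
{
    uint32_t ticket = q->tail + 1u;
    q->tail = ticket;
    return ticket;
}

/* Step 2, commit: fill the reserved slot, then publish it with one store
 * into the predecessor's "next" field. The writer never touches its own
 * "next" field, because a later writer may already have linked from it. */
static void queue_commit(queue_t *q, uint32_t ticket, uint32_t payload)
{
    node_t *mine = &q->slots[ticket & (QUEUE_SLOTS - 1u)];
    node_t *pred = &q->slots[(ticket - 1u) & (QUEUE_SLOTS - 1u)];

    mine->payload = payload; /* fill the reserved slot first */
    pred->next = mine;       /* then publish it with a single store */
}
```

If a second writer reserves and commits while the first is still between its two steps, the chain simply stays broken at the first writer's predecessor until the first writer commits; neither writer overwrites the other's store.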

The consumer (typically the main loop or a dedicated low‑priority thread) traverses the queue by following the head pointer. It extracts an element only when the head’s “next” field is non‑NULL; otherwise it briefly spins and retries. This yields a “first‑in, almost first‑out” guarantee: messages that arrive almost simultaneously may be dequeued in a slightly different order than they arrived, but the reordering is bounded and is acceptable in hard real‑time systems where latency, not strict ordering, is the primary concern.
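The reader side of this description might look like the sketch below. It is an illustration under assumptions, not the authors' implementation: `queue_try_dequeue`, the node layout, and `QUEUE_SLOTS` are hypothetical names, the node layout is repeated so the block stands alone, and clearing the retired node's link is one plausible way to make the slot reusable rather than something the summary states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u /* hypothetical capacity, fixed at compile time */
static_assert(QUEUE_SLOTS >= 2u,
              "need one placeholder slot plus at least one message slot");

typedef struct node {
    struct node *volatile next; /* NULL until a successor is committed */
    volatile uint32_t payload;
} node_t;

typedef struct {
    node_t slots[QUEUE_SLOTS];
    node_t *head; /* last node already consumed; owned by the reader */
} queue_t;

static void queue_init(queue_t *q)
{
    for (size_t i = 0; i < QUEUE_SLOTS; i++) {
        q->slots[i].next = NULL;
        q->slots[i].payload = 0u;
    }
    q->head = &q->slots[0]; /* slot 0 is an empty placeholder */
}

/* Returns true and stores one message in *out if the node after head has
 * been committed; returns false if the link is still NULL. */
static bool queue_try_dequeue(queue_t *q, uint32_t *out)
{
    node_t *next = q->head->next;
    if (next == NULL) {
        return false; /* nothing published past head yet */
    }
    *out = next->payload;
    q->head->next = NULL; /* ASSUMPTION: reset the link so the slot can be reused */
    q->head = next;       /* the node just read becomes the new placeholder */
    return true;
}
```

A main loop would call `queue_try_dequeue` repeatedly and treat a `false` result as "retry later". Note that a message committed out of turn stays invisible to the reader until every earlier reservation has also been committed, because the reader only ever follows the link from the current head.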

A major contribution of the work is its systematic handling of the ABA problem that plagues pointer‑swapping techniques. By embedding a monotonically increasing sequence number into each node’s identifier, the algorithm can distinguish a reused buffer slot from its previous incarnation, even if the raw address is identical. This eliminates the risk of mistakenly treating a stale pointer as valid, which could otherwise cause data loss or corruption.
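One way to picture such a composite identifier is to pack the sequence number and the buffer index into a single word. The sketch below is only an illustration: the 8-bit index and 24-bit sequence split, the type `slot_id_t`, and the helper names are assumptions, since the summary does not give the paper's actual layout.

```c
#include <assert.h>
#include <stdint.h>

#define INDEX_BITS 8u /* hypothetical split: low 8 bits hold the buffer index */
#define INDEX_MASK ((1u << INDEX_BITS) - 1u)
static_assert(INDEX_BITS < 32u, "the sequence number needs at least one bit");

typedef uint32_t slot_id_t; /* sequence number in the high bits, index in the low bits */

static slot_id_t slot_id_make(uint32_t seq, uint32_t index)
{
    return (seq << INDEX_BITS) | (index & INDEX_MASK);
}

static uint32_t slot_id_index(slot_id_t id)
{
    return id & INDEX_MASK;
}

static uint32_t slot_id_seq(slot_id_t id)
{
    return id >> INDEX_BITS;
}

/* Identifier for the next incarnation of the same buffer slot: the index
 * is unchanged and the sequence number is bumped by one. */
static slot_id_t slot_id_reuse(slot_id_t id)
{
    return slot_id_make(slot_id_seq(id) + 1u, slot_id_index(id));
}
```

With this layout, `slot_id_reuse(slot_id_make(0u, 3u))` still refers to slot 3 but compares unequal to the original identifier, so a stale reference to the earlier incarnation can be detected. The distinction holds only until the sequence field itself wraps, which with 24 bits happens after 2^24 reuses of the same slot.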

Experimental evaluation was performed on several ARM Cortex‑M families (M0, M3, M4) representative of typical embedded platforms. The authors measured the worst‑case latency from ISR entry to successful enqueue completion under heavy interrupt load. The results show a maximum latency of under 10 µs, compared with 30 µs or more for comparable CAS‑based lock‑free queues. Memory consumption is deterministic because the queue size is fixed at compile time, avoiding dynamic allocation and guaranteeing bounded memory usage—a crucial property for safety‑critical certification.

Additional stress tests varied interrupt priority levels and generated bursts that filled the buffer to near capacity. The reservation phase detects tail index wrap‑around, allowing the system designer to choose either a drop‑new‑message policy or an overwrite‑oldest‑message policy. In all cases, no message was lost unintentionally, and the consumer continued to drain the queue without stalling.
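The choice between the two full-buffer policies can be illustrated with the bookkeeping sketch below. It is a single-context model under assumptions, not the paper's mechanism: `ring_reserve`, `full_policy_t`, and `RING_SLOTS` are hypothetical names, the counters are free-running, and the sketch ignores the question of how a writer could safely advance `head` while the consumer is also using it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 4u /* hypothetical capacity, fixed at compile time */
static_assert((RING_SLOTS & (RING_SLOTS - 1u)) == 0u,
              "RING_SLOTS must be a power of two for the index mask");

typedef enum {
    POLICY_DROP_NEW,        /* refuse the reservation when the buffer is full */
    POLICY_OVERWRITE_OLDEST /* discard the oldest message to make room */
} full_policy_t;

typedef struct {
    uint32_t head; /* number of messages consumed or discarded so far */
    uint32_t tail; /* number of slots reserved so far */
} ring_counters_t;

/* Returns true and writes the reserved slot index to *slot, or returns
 * false (leaving *slot untouched) when the message must be dropped. */
static bool ring_reserve(ring_counters_t *r, full_policy_t policy, uint32_t *slot)
{
    /* Unsigned subtraction stays correct after the counters wrap around. */
    if (r->tail - r->head >= RING_SLOTS) {
        if (policy == POLICY_DROP_NEW) {
            return false;
        }
        r->head += 1u; /* give up the oldest message */
    }
    *slot = r->tail & (RING_SLOTS - 1u);
    r->tail += 1u;
    return true;
}
```

Either way the loss is explicit: with the drop policy the caller sees a `false` return, and with the overwrite policy the discarded message is always the oldest one.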

In summary, the paper delivers a practical, CAS‑free multi‑writer lock‑free queue that satisfies the stringent timing requirements of hard real‑time systems while sidestepping the ABA pitfalls of traditional pointer‑based designs. Its simplicity—relying only on integer increments and single stores—makes it highly portable across a wide range of microcontrollers lacking advanced atomic primitives. The authors suggest future extensions to support multiple consumers and to explore compatibility with non‑uniform memory architectures, but the current contribution already provides a robust building block for real‑time communication between interrupt contexts and main‑line code.

