DAMQ-Based Schemes for Efficiently Using the Buffer Spaces of a NoC Router
In this paper we present high-performance dynamically allocated multi-queue (DAMQ) buffer schemes for fault-tolerant system-on-chip applications that require an interconnection network. Two or four virtual channels share the same buffer space. At the message switching layer, we make improvements that boost system performance when faults affect communication between components. In the proposed schemes, when a node or a physical channel is deemed faulty, the previous-hop node terminates the buffer occupancy of messages destined for the failed link. Buffer usage decisions are made at the switching layer without interaction with higher abstraction layers, so buffer space is quickly released to messages destined for other, healthy nodes. As a result, buffer space is used efficiently when faults occur at some nodes.
💡 Research Summary
The paper addresses the challenge of maintaining high performance and reliability in Network‑on‑Chip (NoC) routers when permanent faults occur on a chip. Traditional static‑allocated multi‑queue (SAMQ) buffers assign a fixed amount of buffer space to each virtual channel (VC). This leads to poor utilization, especially when traffic is unbalanced, and when a node or physical link fails, the buffer space occupied by the faulty VC remains locked, degrading overall network throughput and latency.
To overcome these limitations, the authors propose dynamically allocated multi‑queue (DAMQ) buffer schemes that share buffer space among multiple VCs and can quickly reclaim space from faulty channels. Two concrete architectures are introduced:
- DAMQS (DAMQ Shared): combines the buffers of two physical channels (e.g., East‑X and South‑Y) into a single shared buffer. The shared buffer serves eight VCs (four per physical channel) and provides two read ports and two write ports. Each VC reserves two slots (Reserved Slots) that must be used before any other flits can occupy the buffer, guaranteeing a minimum headroom for every VC.
- DAMQAS (DAMQ All Shared): extends the sharing concept to four physical channels, creating one large buffer shared by sixteen VCs. It offers four read and four write ports, allowing higher concurrency. The same reservation policy applies, but the buffer space expands and contracts from opposite ends depending on which VC group (X‑dimension or Y‑dimension) is active.
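The reservation policy can be illustrated with a small admission-control model. This is a minimal sketch, not the authors' hardware design: the class name `SharedVCBuffer`, its methods, and the 20-slot capacity are hypothetical, and it assumes that a VC first fills its own two reserved slots and only then draws on the slots shared by all VCs. Read/write ports and the two-ended growth of DAMQAS are not modeled.

```python
from collections import deque


class SharedVCBuffer:
    """Toy model of one buffer shared by several virtual channels (VCs)."""

    def __init__(self, capacity, num_vcs, reserved_per_vc=2):
        if capacity < num_vcs * reserved_per_vc:
            raise ValueError("capacity cannot cover the reserved slots")
        self.reserved = reserved_per_vc
        # Slots left over after every VC's reservation form the shared pool.
        self.shared_capacity = capacity - num_vcs * reserved_per_vc
        self.queues = [deque() for _ in range(num_vcs)]

    def shared_in_use(self):
        # A VC's flits beyond its reservation occupy shared slots.
        return sum(max(0, len(q) - self.reserved) for q in self.queues)

    def can_accept(self, vc):
        if len(self.queues[vc]) < self.reserved:
            return True  # a reserved slot is still free for this VC
        return self.shared_in_use() < self.shared_capacity

    def push(self, vc, flit):
        if not self.can_accept(vc):
            return False
        self.queues[vc].append(flit)
        return True

    def pop(self, vc):
        return self.queues[vc].popleft() if self.queues[vc] else None


# Eight VCs as in DAMQS; the 20-slot capacity is an arbitrary example value.
buf = SharedVCBuffer(capacity=20, num_vcs=8)
accepted = [buf.push(0, f"flit{i}") for i in range(7)]
```

In this example VC 0 can hold six flits (two reserved plus the four shared slots) and its seventh flit is refused, while every other VC can still accept two flits in its own reserved slots.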
Both schemes operate entirely at the switching layer. When a fault is detected on a node or a physical link, the upstream router immediately terminates the buffer occupancy of all flits destined for the failed link. The released slots become instantly available to flits of healthy VCs, without involving higher‑level protocols. This “local‑only” fault‑recovery mechanism reduces latency and avoids the need for complex coordination.
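The local reclamation step might look like the following sketch. The function `reclaim_on_fault`, the per-VC queue dictionary, and the link names are hypothetical illustrations; the sketch assumes each VC is bound to one output link and that fault detection happens elsewhere.

```python
from collections import deque


def reclaim_on_fault(vc_queues, vc_output_link, failed_link):
    """Drop every flit held by VCs routed to failed_link; return slots freed."""
    freed = 0
    for vc, queue in vc_queues.items():
        if vc_output_link[vc] == failed_link:
            freed += len(queue)
            queue.clear()  # the slots return to the shared pool at once
    return freed


# Hypothetical router state: VCs 0 and 2 forward to the east link.
vc_queues = {
    0: deque(["a0", "a1", "a2"]),
    1: deque(["b0"]),
    2: deque(["c0", "c1"]),
}
vc_output_link = {0: "east", 1: "south", 2: "east"}

freed = reclaim_on_fault(vc_queues, vc_output_link, failed_link="east")
```

The decision uses only state held in the router itself, which is the sense in which the mechanism is local: no message to a higher layer is needed before the freed slots can serve healthy VCs.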
The authors evaluate the proposals using a 64‑node 8×8 mesh NoC. Packets consist of 32 flits, each physical channel multiplexes four VCs, and buffer sizes per VC range from 4 to 16 flits. Two traffic patterns are examined: synthetic uniform traffic and a realistic telecom workload derived from the E3S benchmark suite. Fault rates of 0 % to 4 % are injected randomly.
Key findings include:
- Throughput: DAMQS and DAMQAS achieve comparable or higher maximum throughput than a 16‑flit SAMQ while using significantly less total buffer space. For uniform traffic, a 14‑flit DAMQS with 0 % faults matches the throughput of a 16‑flit SAMQ; with 4 % faults, an 8‑flit DAMQS reaches the same level. Similar trends hold for DAMQAS.
- Latency: At low to moderate injection loads, the shared‑buffer schemes exhibit lower average packet latency because they can reallocate space from faulty VCs, preventing congestion buildup. At very high loads and fault rates, DAMQAS shows slightly higher latency due to its larger occupancy, but the difference remains modest.
- Real‑world traffic: The advantage of shared buffers is amplified for the telecom benchmark, which displays highly unbalanced traffic (some channels carry near‑zero load). In this scenario, a 13‑flit DAMQS with no faults attains the same throughput as a 16‑flit SAMQ, and a 7‑flit DAMQS with 4 % faults does the same.
- Buffer efficiency: The total buffer requirement for the simulated system is 3 584 flits. The shared schemes can achieve the same performance with 12.5 %–25 % less buffer space under fault‑free conditions and up to 50 %–68 % less under 4 % fault rates.
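The buffer figures above can be cross-checked with a short calculation. This sketch rests on an assumption that the summary does not state: that the 3 584-flit total counts 16 flits for each unidirectional channel of the 8×8 mesh. The function names are hypothetical. The savings are computed from the DAMQS sizes quoted above relative to the 16‑flit SAMQ baseline.

```python
def unidirectional_channels(k):
    """Count the directed links of a k x k mesh.

    Each of the 2 dimensions has k * (k - 1) neighbor pairs,
    and each pair is connected by 2 directed links.
    """
    return 2 * 2 * k * (k - 1)


def saving_percent(baseline_flits, shared_flits):
    """Buffer space saved relative to the baseline, in percent."""
    return 100.0 * (baseline_flits - shared_flits) / baseline_flits


# Assumption: 16 flits of buffer per directed channel.
total_flits = unidirectional_channels(8) * 16

# DAMQS sizes quoted above: 14 and 13 (no faults), 8 and 7 (4 % faults).
savings = {n: saving_percent(16, n) for n in (14, 13, 8, 7)}
```

Under this assumption the total is 224 × 16 = 3 584 flits, matching the reported figure, and the quoted sizes give savings of 12.5 %, 18.75 %, 50 %, and 56.25 %. The upper ends of the reported ranges (25 % and 68 %) would correspond to smaller buffers than the DAMQS sizes listed here, presumably from configurations not detailed in this summary.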
The paper concludes that DAMQS and DAMQAS provide a practical, low‑overhead solution for fault‑tolerant NoC designs. By dynamically sharing buffer resources and performing fault‑induced reclamation locally at the router, they maintain high throughput and low latency while reducing silicon area and power consumption. Future work is suggested on extending the approach to three‑dimensional meshes, multiple simultaneous faults, and detailed power/area analysis.