An Energy-Efficient RFET-Based Stochastic Computing Neural Network Accelerator

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Stochastic computing (SC) offers significant reductions in hardware complexity for conventional convolutional neural networks (CNNs). However, stochastic computing neural networks (SCNNs) often suffer from high resource consumption in components such as stochastic number generators (SNGs) and accumulative parallel counters (APCs), which limits overall performance. This paper proposes a novel SCNN architecture leveraging reconfigurable field-effect transistors (RFETs). Reconfigurability at the device level enables the design of highly efficient, compact SNGs, APCs, and other essential components. Furthermore, a dedicated SCNN accelerator architecture is developed to support system-level simulation. Based on accessible open-source standard-cell libraries, experimental results demonstrate that the proposed RFET-based SCNN accelerator achieves significant reductions in area, latency, and energy consumption compared with its FinFET-based counterpart at the same technology node.


💡 Research Summary

The paper presents a novel stochastic‑computing neural‑network (SCNN) accelerator that leverages the unique reconfigurability of Reconfigurable Field‑Effect Transistors (RFETs) to dramatically reduce the area, latency, and energy consumption of two critical SC components: stochastic number generators (SNGs) and accumulative parallel counters (APCs). Conventional SCNNs suffer from high resource usage because each neuron requires an SNG (typically an LFSR plus a probability conversion circuit, PCC) and an APC (a tree of full‑adders and half‑adders). Prior works have tried to share random‑number sources or to replace comparator‑based PCCs with MUX‑chain structures, but the hardware overhead remains substantial, often accounting for up to 90% of the total silicon area.
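The LFSR-plus-PCC structure mentioned above can be sketched behaviorally. The following is a minimal illustrative model (not the paper's circuit): an 8-bit Fibonacci LFSR supplies pseudo-random numbers, and a comparator-based PCC emits a 1 whenever the random number falls below the binary input, so the bitstream's ones-density approximates the encoded probability. The tap positions and seed are conventional choices, not taken from the paper.

```python
def lfsr_stream(seed, taps=(7, 5, 4, 3), nbits=8):
    """8-bit Fibonacci LFSR (taps for x^8 + x^6 + x^5 + x^4 + 1);
    yields pseudo-random integers in [1, 2^nbits - 1]."""
    state = seed
    while True:
        yield state
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & ((1 << nbits) - 1)

def sng(value, length, seed=0xE1):
    """Comparator-based PCC: emit 1 when the LFSR output is below `value`,
    so the stream's ones-density approximates value / 2^nbits."""
    rng = lfsr_stream(seed)
    return [1 if next(rng) < value else 0 for _ in range(length)]

stream = sng(64, 256)                 # target probability 64/256 = 0.25
print(sum(stream) / len(stream))      # close to 0.25
```

In real SCNNs each neuron would need one such generator per input, which is exactly the per-neuron overhead the paper attacks.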

The authors first describe the RFET device, which contains two program gates (PGs) and a control gate (CG). By biasing the PGs at different voltages, the transistor can switch between p‑type and n‑type operation, enabling a single physical cell to implement either NAND or NOR logic without any layout change. This intrinsic ambipolarity is exploited to build a compact RFET‑based NAND‑NOR PCC. The paper shows that a naïve mapping of the MUX‑chain expression onto a NAND‑NOR gate would require several additional inverters, eroding the area advantage. To overcome this, the authors propose a systematic insertion of inverters only on the “+” input line, thereby preserving the logical functionality while keeping the transistor count minimal. A formal lemma is provided to prove that the resulting circuit correctly implements the required Bernoulli probability conversion.
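The device-level idea can be illustrated with a small behavioral model. Below is a sketch (an assumption for illustration, not the paper's netlist): a single "cell" function acts as NAND or NOR depending on a mode flag standing in for the program-gate bias, and a 2:1 MUX — the building block of a MUX-chain PCC — is assembled from three NAND-configured cells with a single inverter on the select line, echoing the paper's principle of inserting inverters only where needed.

```python
def rfet_cell(a, b, mode):
    """Behavioral model of a reconfigurable two-input cell: the same
    physical layout acts as NAND or NOR depending on program-gate bias,
    modeled here as a `mode` flag."""
    if mode == "nand":
        return 1 - (a & b)
    return 1 - (a | b)                # "nor"

def mux2(a, b, sel):
    """2:1 MUX (sel ? a : b) built from three NAND-configured cells plus
    one inverter on the select line (DeMorgan construction)."""
    nsel = 1 - sel
    return rfet_cell(rfet_cell(a, sel, "nand"),
                     rfet_cell(b, nsel, "nand"),
                     "nand")

# exhaustive check: the NAND-built MUX matches (a if sel else b)
assert all(mux2(a, b, s) == (a if s else b)
           for a in (0, 1) for b in (0, 1) for s in (0, 1))
```

The point of the reconfigurable cell is that the same three physical cells could be re-biased to NOR mode for the dual construction, without any layout change.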

Two approximate APC (AxPC) architectures are introduced. The first, called AxPC‑MAJ3, uses a three‑input majority gate as the first reduction stage. This design replaces a conventional full‑adder tree with a single majority gate, cutting the transistor count and interconnect length at the cost of a modest accuracy loss (≈0.8% on CIFAR‑10). The second, AxPC‑4:2, employs a 4‑to‑2 compressor as the front‑end, which retains higher counting precision while still benefiting from the compact RFET full‑adder and compressor cells. Both designs are synthesized using an open‑source standard‑cell library for a 28 nm technology node, and their area, delay, and power are benchmarked against a FinFET‑based baseline that uses the same library.
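To see where the MAJ3 approximation saves hardware, recall that an exact full adder on bits (a, b, c) produces sum = a⊕b⊕c and carry = MAJ(a, b, c), so the group count is 2·carry + sum. The sketch below models one plausible reading of the first reduction stage (an assumption, not the paper's exact netlist): the majority output supplies the carry, and the sum bit is approximated as NOT(majority), a common MAJ-based approximate full adder that is exact except for the all-0 and all-1 patterns.

```python
def maj3(a, b, c):
    """Three-input majority gate."""
    return (a & b) | (a & c) | (b & c)

def exact_count(bits):
    """Exact popcount, as a conventional APC would compute."""
    return sum(bits)

def axpc_maj3(bits):
    """Approximate popcount: each 3-bit group is reduced by one majority
    gate; carry = MAJ, sum ~= NOT(MAJ), so group count = 2*m + (1 - m).
    Error is at most 1 per group (for all-0 and all-1 inputs)."""
    total, i = 0, 0
    while i + 3 <= len(bits):
        m = maj3(bits[i], bits[i + 1], bits[i + 2])
        total += 2 * m + (1 - m)      # 2*carry + approximated sum
        i += 3
    total += sum(bits[i:])            # leftover bits counted exactly
    return total

print(exact_count([1, 1, 1]), axpc_maj3([1, 1, 1]))   # 3 vs 2: worst case
```

Because the error is bounded per group and averages out over long stochastic streams, a small accuracy loss at the network level (as the paper's ≈0.8% figure suggests) is plausible.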

At the system level, the accelerator adopts a pipelined memory‑access‑compute architecture. SNGs generate stochastic streams that feed directly into the AxPCs; the output of the AxPCs is then converted back to stochastic form (B2S) and passed to subsequent layers. By decoupling memory reads from the stochastic computation pipeline, the design eliminates stalls caused by random‑number sharing and achieves a 1.4× increase in throughput. The authors also discuss how the pipeline can be re‑balanced (e.g., by inserting registers) to meet different performance‑energy targets.
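The SNG → APC dataflow described above can be summarized in a short behavioral sketch of one SC neuron (illustrative only, not the paper's pipelined RTL): inputs and weights are converted to unipolar streams, multiplied by bitwise AND (valid when the streams are statistically independent), counted per cycle by a parallel counter, and accumulated back to a binary value that a B2S stage would re-encode for the next layer. `random.Random` stands in for the hardware LFSRs.

```python
import random

def b2s(value, length, rng):
    """Binary-to-stochastic (B2S): unipolar stream whose ones-density
    encodes `value` in [0, 1]."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def sc_neuron(xs, ws, length=1024, seed=0):
    """One SC neuron: per-cycle AND (unipolar multiplication), popcount
    via a parallel counter, and averaging over the stream length."""
    rng = random.Random(seed)
    x_streams = [b2s(x, length, rng) for x in xs]
    w_streams = [b2s(w, length, rng) for w in ws]
    acc = 0
    for t in range(length):                       # per-cycle APC popcount
        acc += sum(xt[t] & wt[t] for xt, wt in zip(x_streams, w_streams))
    return acc / length                           # ~= sum_i x_i * w_i

est = sc_neuron([0.5, 0.25], [0.5, 0.8])
print(est)   # close to 0.5*0.5 + 0.25*0.8 = 0.45
```

In the accelerator, the conversion and counting stages run as pipeline stages, so memory reads for the next operands overlap with the stochastic computation of the current ones.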

Experimental results on MNIST and CIFAR‑10 demonstrate that the RFET‑based accelerator achieves 31% lower silicon area, 28% reduced critical‑path delay, and 34% lower total energy consumption compared with the FinFET counterpart, while maintaining comparable classification accuracy. The AxPC‑MAJ3 variant offers the best energy efficiency (≈0.8% accuracy loss), whereas the AxPC‑4:2 variant delivers the highest accuracy (≈0.2% improvement over the baseline). The paper concludes that RFETs’ voltage‑controlled reconfigurability is a powerful tool for simplifying stochastic‑computing primitives, and that the proposed AxPC families provide a flexible accuracy‑efficiency trade‑off suitable for a wide range of AI edge applications.

Finally, the authors outline future work: improving RFET drive current to support higher operating frequencies, investigating process‑variation tolerance, and scaling the architecture to larger, deeper neural networks (e.g., ResNet‑50) to validate the approach on more demanding workloads.
