BinarEye: An Always-On Energy-Accuracy-Scalable Binary CNN Processor With All Memory On Chip in 28nm CMOS
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

This paper introduces BinarEye: a digital processor for always-on Binary Convolutional Neural Networks. The chip maximizes data reuse through a Neuron Array exploiting local weight Flip-Flops. It stores full network models and feature maps on chip and hence requires no off-chip bandwidth, which leads to a 230 1b-TOPS/W peak efficiency. Its 3 levels of flexibility - (a) weight reconfiguration, (b) a programmable network depth and (c) a programmable network width - allow trading energy for accuracy depending on the task's requirements. BinarEye's full-system input-to-label energy consumption ranges from 14.4 µJ/frame for 86% CIFAR-10 and 98% owner recognition down to 0.92 µJ/frame for 94% face detection at up to 1700 frames per second. This is 3-12-70× more efficient than the state of the art at on-par accuracy.


💡 Research Summary

The paper presents BinarEye, a fully digital processor designed for always‑on binary convolutional neural networks (BinaryNets) implemented in 28 nm CMOS. The core idea is to combine a memory‑centric compute architecture with on‑chip storage of the entire network (weights and intermediate feature maps), thereby eliminating off‑chip DRAM bandwidth and achieving very high energy efficiency.

The hardware consists of a 64‑neuron array where each neuron’s binary weights are held in local flip‑flops. During inference, all 64 weight sets are pre‑loaded from two 259 kB weight SRAM blocks (north and south) into these flip‑flops, after which the array performs parallel stride‑1 convolutions on 2 × 2 kernels using XNOR‑popcount operations followed by a binary comparator. Two 32 kB activation SRAMs (west and east) store the input and output feature maps, and a small 5 kB SRAM plus logic implements fully‑connected layers, allowing the chip to produce classification labels without external assistance.
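The XNOR-popcount-compare operation performed by each neuron can be sketched in a few lines. This is an illustrative software model of the arithmetic only, not the chip's actual circuit; the function and variable names are ours, and the threshold value is an arbitrary example:

```python
import numpy as np

def binary_neuron(inputs, weights, threshold):
    """Illustrative model of one XNOR-popcount binary neuron.

    inputs, weights: arrays of +1/-1 values (binary activations/weights).
    Returns a +1/-1 output after comparing the popcount against a
    threshold, mimicking the binary comparator described in the paper.
    """
    # XNOR of two {-1, +1} values is their product:
    # equal signs -> +1, differing signs -> -1.
    xnor = inputs * weights
    # Popcount: number of matching positions.
    popcount = np.sum(xnor == 1)
    # Binary comparator: threshold the popcount to a +/-1 activation.
    return 1 if popcount >= threshold else -1

# One 2x2 binary kernel applied to one input patch (values in {-1, +1})
patch  = np.array([1, -1,  1, 1])
kernel = np.array([1, -1, -1, 1])
print(binary_neuron(patch, kernel, threshold=3))  # 3 matches >= 3 -> 1
```

Because both operands are single bits, the multiply-accumulate of a conventional convolution collapses into an XNOR gate plus a counter, which is what makes the all-digital neuron array so compact and efficient.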

Flexibility is achieved on three levels: (1) weight re‑configuration – the on‑chip SRAM can be rewritten with any binary model; (2) programmable network depth – a micro‑coded controller can execute up to 16 instructions that define the sequence of input‑output, convolution, and fully‑connected layers; (3) programmable network width – a batch size parameter S (1, 2, 4) scales the number of input/output channels (F = C = 256/S). Larger S reduces the number of LD‑CONV phases from four to one, improving throughput and energy quadratically (∝ S²) at the cost of reduced modeling capacity.
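The width/batch trade-off governed by S can be tabulated directly from the relations quoted above (F = C = 256/S, LD-CONV phases scaling from four to one, gains ∝ S²). The helper below is a sketch of those relations only, not an interface of the actual chip:

```python
def width_config(S):
    """Sketch of BinarEye's width/batch trade-off for batch size S.

    Returns (channels, LD-CONV phases per layer, relative energy per
    inference vs. S = 1), following the relations quoted in the summary.
    """
    assert S in (1, 2, 4)            # supported batch sizes
    channels = 256 // S              # input/output channels: F = C = 256/S
    ld_conv_phases = 4 // S          # LD-CONV phases drop from 4 to 1
    relative_energy = 1 / S**2       # quadratic energy/throughput gain
    return channels, ld_conv_phases, relative_energy

for S in (1, 2, 4):
    print(S, width_config(S))
```

So moving from S = 1 to S = 4 cuts energy per inference by roughly 16× and halves the channel count twice, which is exactly the capacity-for-efficiency trade the three-level programmability exposes.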

Measured performance shows a peak core efficiency of 230 TOPS/W (1‑bit ops) at 6 MHz, and an input‑to‑label (I2L) efficiency of 145 TOPS/W when all overheads are included. Energy per inference ranges from 14.4 µJ (86 % CIFAR‑10 accuracy) down to 0.92 µJ (94 % face‑detection accuracy). The processor can run up to 1.7 k frames per second at its minimum‑energy point, and up to 10 kFPS in higher‑power modes. Power consumption is as low as 1.6 mW in the most efficient configuration.
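The quoted figures are mutually consistent; a quick back-of-the-envelope check multiplies the per-frame energy by the frame rate to recover the reported power:

```python
# Cross-check of the measured figures quoted above:
# power = energy-per-inference * frame rate.
energy_per_frame_uJ = 0.92       # face-detection configuration
frames_per_second = 1700         # minimum-energy-point throughput
power_mW = energy_per_frame_uJ * 1e-6 * frames_per_second * 1e3
print(f"{power_mW:.2f} mW")      # ~1.56 mW, matching the ~1.6 mW quoted
```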

Compared with state‑of‑the‑art low‑precision CNN accelerators (YodaNN, BRein Memory, Envision, IBM TrueNorth, etc.), BinarEye delivers 3‑70× lower energy per inference while matching or exceeding their accuracy, and provides up to 10× higher throughput. The die occupies 2 mm², uses two supply rails (one for memories and control, one for the neuron array), and requires no custom analog design because the neuron array is built from standard digital cells.

In summary, BinarEye demonstrates that a fully digital, memory‑centric binary CNN processor with on‑chip model storage and three levels of programmability can enable ultra‑low‑power, always‑on visual wake‑up sensing for wearable and edge devices, achieving up to 230 TOPS/W peak efficiency and 145 TOPS/W system‑level efficiency.

