CADC: Crossbar-Aware Dendritic Convolution for Efficient In-memory Computing

Convolutional neural networks (CNNs) are computationally intensive and are often accelerated using crossbar-based in-memory computing (IMC) architectures. However, large convolutional layers must be partitioned across multiple crossbars, generating numerous partial sums (psums) that require additional buffering, transfer, and accumulation, which introduces significant system-level overhead. Inspired by dendritic computing principles from neuroscience, we propose crossbar-aware dendritic convolution (CADC), a novel approach that dramatically increases sparsity in psums by embedding a nonlinear dendritic function (zeroing negative values) directly within crossbar computations. Experimental results demonstrate that CADC significantly reduces psums, eliminating 80% in LeNet-5 on MNIST, 54% in ResNet-18 on CIFAR-10, 66% in VGG-16 on CIFAR-100, and up to 88% in spiking neural networks (SNN) on the DVS Gesture dataset. The sparsity induced by CADC provides two key benefits: (1) it enables zero-compression and zero-skipping, reducing buffer and transfer overhead by 29.3% and accumulation overhead by 47.9%; (2) it minimizes the accumulation of ADC quantization noise, resulting in small accuracy degradation: only 0.01% for LeNet-5, 0.1% for ResNet-18, 0.5% for VGG-16, and 0.9% for SNN. Compared to vanilla convolution (vConv), CADC exhibits accuracy changes ranging from +0.11% to +0.19% for LeNet-5, -0.04% to -0.27% for ResNet-18, +0.99% to +1.60% for VGG-16, and -0.57% to +1.32% for SNN, across crossbar sizes from 64×64 to 256×256. Ultimately, an SRAM-based IMC implementation of CADC achieves 2.15 TOPS and 40.8 TOPS/W for ResNet-18 (4/2/4b), realizing an 11× to 18× speedup and a 1.9× to 22.9× improvement in energy efficiency compared to existing IMC accelerators.
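To make the mechanism concrete, the following is a minimal sketch of how a single output value might be computed when one dot product is partitioned across several crossbars, once with vanilla accumulation and once with the dendritic function (zeroing negative psums) applied per crossbar. It is an illustration under stated assumptions, not the paper's implementation: all function names, the `rows` parameter (crossbar row count), and the index/value compression format are hypothetical, and ADC quantization and training are not modeled.

```python
def partition(vec, rows):
    """Split a flat vector into chunks of at most `rows` elements.
    Assumption: each chunk maps onto one crossbar with `rows` word lines."""
    return [vec[i:i + rows] for i in range(0, len(vec), rows)]


def crossbar_psums(inputs, weights, rows):
    """One partial sum (psum) per crossbar: the dot product of its chunk."""
    return [
        sum(x * w for x, w in zip(x_chunk, w_chunk))
        for x_chunk, w_chunk in zip(partition(inputs, rows), partition(weights, rows))
    ]


def vconv_output(inputs, weights, rows):
    """Vanilla convolution: all psums are accumulated as they are."""
    return sum(crossbar_psums(inputs, weights, rows))


def cadc_psums(inputs, weights, rows):
    """Dendritic variant: negative psums are zeroed before leaving the crossbar."""
    return [max(0, p) for p in crossbar_psums(inputs, weights, rows)]


def compress(psums):
    """Zero-compression (hypothetical format): keep only (index, value) of non-zero psums."""
    return [(i, p) for i, p in enumerate(psums) if p != 0]


def accumulate_compressed(compressed):
    """Zero-skipping: accumulate only the psums that survived compression."""
    return sum(value for _, value in compressed)


def sparsity(psums):
    """Fraction of psums that are exactly zero."""
    return sum(1 for p in psums if p == 0) / len(psums)


# Toy example: a length-4 dot product split over two crossbars with 2 rows each.
inputs = [1, 2, 3, 4]
weights = [1, -1, -1, 1]
dendritic = cadc_psums(inputs, weights, rows=2)
print(crossbar_psums(inputs, weights, rows=2))   # raw psums per crossbar
print(dendritic, sparsity(dendritic))            # psums after zeroing negatives
print(accumulate_compressed(compress(dendritic)))
```

Note that the dendritic output generally differs from the vanilla output for the same weights (here the negative psum is dropped), so a network using this scheme would need to be trained with the nonlinearity in place; the sketch only shows where the sparsity comes from and how zero psums can be skipped during buffering and accumulation.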

