A Reconfigurable Low Power High Throughput Architecture for Deep Network Training

Reading time: 5 minutes

📝 Abstract

General purpose computing systems are used for a large variety of applications. Extensive support for flexibility in these systems limits their energy efficiency. Neural networks, including deep networks, are widely used for signal processing and pattern recognition applications. In this paper we propose a multicore architecture for deep neural network based processing. Memristor crossbars are utilized to provide low power, high throughput execution of neural networks. The system has both training and recognition (evaluation of new inputs) capabilities. The proposed system could be used for classification, dimensionality reduction, feature extraction, and anomaly detection applications. The system level area and power benefits of the specialized architecture are compared with the NVIDIA Tesla K20 GPGPU. Our experimental evaluations show that the proposed architecture can provide up to five orders of magnitude greater energy efficiency than GPGPUs for deep neural network processing.


📄 Content


Keywords—Low power architecture; memristor crossbars; autoencoder; on-chip training; deep network.

I. INTRODUCTION

Reliability and power consumption are among the main obstacles to continued performance improvement of future multicore computing systems [1]. As a result, several research groups are investigating the design of energy efficient processors from different angles. These include architectures for approximate computation utilizing dynamic voltage scaling, dynamic precision control, and inexact hardware [2,3]. Emerging non-volatile memory technologies are being investigated as low power on-chip caches [4]. Application specific architectures have also been proposed for several application domains, such as signal processing and video processing. Interest in specialized architectures for accelerating neural networks has increased significantly because of their ability to reduce power, increase performance, and allow fault tolerant computing. Recently IBM developed the TrueNorth chip [5], consisting of 4,096 neurosynaptic cores interconnected via an intra-chip network. Its synapse elements are SRAM based, and off-chip training is utilized. DaDianNao [6] is an accelerator for deep neural networks (DNNs) and convolutional neural networks (CNNs). In this system, neuron synaptic weights are stored in eDRAM and later brought into a Neural Functional Unit for execution.
Recently, deep neural networks (or deep networks) have gained significant attention because of their superior performance in classification and recognition applications. Training and evaluation of a deep network are both computationally and data intensive tasks. This paper presents a generic multicore architecture for training and recognition in deep network applications. The system has both unsupervised and supervised learning capabilities, and could be used for classification, unsupervised clustering, dimensionality reduction, feature extraction, and anomaly detection applications.

The memristor [7] is a novel non-volatile device with a large, variable resistance range. Physical memristors can be laid out in a high density grid known as a crossbar [8]. A memristor crossbar can evaluate many multiply-add operations, the dominant operations in neural networks, in parallel in the analog domain. The proposed system uses memristor crossbars, which provide high synaptic weight density and parallel analog processing at very low energy. In this system, processing happens at the physical location of the data, so data transfer energy and functional unit energy consumption are reduced significantly.

Both the training and the recognition phases of the neural networks were examined. Because deep networks deal with large networks, efficient approaches to simulate and implement large memristor crossbars are important. We present a novel method to accurately simulate large crossbars at high speed, and detailed circuit level simulations of memristor crossbars were carried out to verify the neural operations. We evaluated the power, area, and performance of the proposed multicore system and compared them with a GPU based system. Our results indicate that the memristor based architecture can provide up to five orders of magnitude greater energy efficiency than the GPU for the selected benchmarks. The most closely related memristor core designs are [9,10], where the impact on area, power, and throughput is examined for systems that carry out recognition tasks only. Unsupervised training and deep network training are not examined in those studies; they are based on ex-situ training and do not examine on-chip training.

Raqibul Hasan and Tarek M. Taha, Department of Electrical and Computer En

This content is AI-processed based on ArXiv data.
