Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept

Reading time: 6 minutes
...

📝 Original Info

  • Title: Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept
  • ArXiv ID: 1802.04899
  • Date: 2018-09-25
  • Authors: Franca-Neto, A., Farooq, M., …

📝 Abstract

An accelerator is a specialized integrated circuit designed to perform specific computations faster than a CPU or GPU would. A Field-Programmable DNN learning and inference accelerator (FProg-DNN) using hybrid systolic and non-systolic techniques, distributed information-control and a deeply pipelined structure is proposed, and its microarchitecture and operation are presented here. Reconfigurability accommodates diverse DNN designs and allows different numbers of workers to be assigned to different layers as a function of the relative difference in computational load among layers. The computational delay per layer is thereby made roughly the same along the pipelined accelerator structure. VGG-16 and the recently proposed Inception modules are used to show the flexibility of the FProg-DNN reconfigurability. Special structures were also added for a combination of convolution layer, map coincidence and feedback for state-of-the-art learning with a small set of examples, which is the focus of a companion paper by the author (Franca-Neto, 2018). The accelerator described is able to reconfigure from (1) allocating all of a DNN's computations to a single worker, at one extreme of sub-optimal performance, to (2) optimally allocating workers per layer according to the computational load in each DNN layer to be realized. Due to the pipelined architecture, more than 50x speedup is achieved relative to GPUs or TPUs. This speed-up is a consequence of hiding the delay in transporting activation outputs from one layer to the next in a DNN behind the computations in the receiving layer. The FProg-DNN concept has been simulated and validated at a behavioral-functional level.

💡 Deep Analysis

Figure 1

📄 Full Content

Field-Programmable Gate Arrays (FPGAs) exploit the high circuit densities of modern semiconductor fabrication processes to build integrated circuits that are "field-programmable": their on-die logic is reconfigurable even after the dies are packaged and shipped. FPGAs use arrays of logic cells surrounded by programmable routing resources. Look-Up Tables (LUTs), Block RAM (BRAM), and specialized analog blocks (PLLs, ADC and DAC converters, and high-speed transceivers with signal emphasis, continuous-time and decision-feedback equalizers) are also added to high-performance FPGAs. Interconnection resources dominate their logic resources and provide for the FPGA's flexibility (Farooq, 2012).

The Field-Programmable DNN Learning & Inference Accelerator (FProg-DNN) presented in this work aims to provide for reconfigurable DNNs what FPGAs offer for reconfigurable logic. In FPGAs, hardware resources left unused after synthesis, place and route remain as idle hardware on the die; in an FProg-DNN, unused hardware resources remain as idle tensor and pixel units. Figure 1 shows a system architecture envisioned for applications with large-scale deployment of FProg-DNN chips. Many storage drives distributed in a datacenter, or across several datacenters, are each equipped with a relatively small version of an FProg-DNN. Very large datasets from several drives are then brought to be processed by a much larger DNN realized in a server with multiple large FProg-DNN chips.

Adding an FProg-DNN to each storage drive, in addition to the FProg-DNN chips on the server, is intended to exploit the fact that the initial layers in a large DNN tend to produce far more activation outputs than later layers. These initial layers are also concerned with common low-level edge patterns and might use the same trained coefficients across the storage drives. Moreover, this "pre-screening" by the local FProg-DNN in each drive may bring the additional benefit of shrinking the amount of data (activation outputs) sent to the much larger multi-chip FProg-DNN server, leading to a more efficient computational solution.
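As a rough back-of-the-envelope illustration (my own sketch; the shapes are the standard VGG-16 ones at a 224x224x3 input, not figures from the paper), the per-block activation volumes below show how strongly the early layers dominate the data that would otherwise have to leave the drive:

```python
# Activation-output volume per VGG-16 convolutional block (standard 224x224x3 input).
# Illustrative only: shapes are the usual VGG-16 ones, not numbers from the paper.
blocks = {                       # (height, width, channels) at each block's output
    "block1": (224, 224, 64),
    "block2": (112, 112, 128),
    "block3": (56, 56, 256),
    "block4": (28, 28, 512),
    "block5": (14, 14, 512),
}

for name, (h, w, c) in blocks.items():
    elems = h * w * c
    mib = elems * 2 / 2**20      # assuming 16-bit activations
    print(f"{name}: {elems:>10,d} activations (~{mib:6.2f} MiB)")

# block1 emits ~32x more activation data than block5, so a drive that runs the
# early layers locally and forwards only later-layer outputs sends far less data
# to the multi-chip FProg-DNN server.
```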

The FProg-DNN uses (a) reconfigurable functional blocks and (b) reconfigurable interconnect resources on die (Appendix A). As per Figure 1, a computer is used to define the architecture of the DNN to be synthesized. On this computer, the user specifies the number of inputs, the number of hidden layers, the type of each layer, the number of nodes per layer, and the number of outputs. Each layer has its non-linearity specified. All hyperparameters for learning on the targeted FProg-DNN device are also specified by the user. The computer then compiles all this information and creates a model file that is sent to each FProg-DNN in the server. Inside each FProg-DNN, a control processor (Figure 2a) informs each worker of its identity. A worker's identity defines its behavior: which kind of processing it is to perform, what filters to use, how many map pixels to construct, and what data from the previous layer are relevant to its processing. Workers also have special functional blocks for analytics and diagnostics.
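The paper does not publish the model-file format, so the sketch below is purely hypothetical: every field name and the worker-identity structure are assumptions standing in for the kind of information the text says the host compiles (layer types, nodes per layer, non-linearities, hyperparameters) and the control processor distributes to workers.

```python
# Hypothetical model file: field names are illustrative assumptions, not the
# paper's actual format.
model_file = {
    "inputs": [224, 224, 3],
    "layers": [
        {"type": "conv",    "filters": 64,  "kernel": 3, "nonlinearity": "relu"},
        {"type": "maxpool", "size": 2},
        {"type": "fc",      "nodes": 4096,  "nonlinearity": "relu"},
        {"type": "fc",      "nodes": 1000,  "nonlinearity": "softmax"},
    ],
    "outputs": 1000,
    "hyperparameters": {"learning_rate": 1e-3, "batch_size": 32, "epochs": 20},
}

def assign_worker_identities(model, workers_per_layer):
    """Loosely mimic the control processor: hand each worker an identity saying
    which layer it serves and what that layer requires of it."""
    identities = []
    for layer_idx, (layer, n_workers) in enumerate(
            zip(model["layers"], workers_per_layer)):
        for w in range(n_workers):
            identities.append({"layer": layer_idx, "worker": w, "layer_spec": layer})
    return identities

# e.g. four layers served by 8, 2, 4 and 1 workers respectively
identities = assign_worker_identities(model_file, [8, 2, 4, 1])
```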

In addition, the user may define configuration parameters for the FProg-DNN inside each storage drive.

The physical interconnections used to bring data from those storage drives to the FProg-DNN server are also used to deliver the model configuration from the user's computer to the FProg-DNN in each storage drive.

FPGAs are able to solve any problem that can be posed as a computing problem; this is readily seen by noting that a full-fledged soft processor can be implemented in an FPGA. Similarly, an FProg-DNN can be reconfigured into any arrangement of convolution layers, max-pooling layers and fully-connected layers using its programmable functional blocks and interconnects. This is readily seen as well from the fact that an FProg-DNN can be reconfigured (a) to make a single worker run the computational load of all nodes in a DNN, or (b) to distribute workers among layers so that the computational delay is roughly the same in each layer of the FProg-DNN.
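A minimal sketch of the second extreme, with arbitrary per-layer loads of my own choosing (not the paper's): workers are handed out in proportion to each layer's computational load so that the resulting per-layer delays come out roughly equal.

```python
# Distribute a fixed pool of workers across layers in proportion to each layer's
# computational load, so per-layer delays are roughly equalized. Loads below are
# arbitrary example numbers, not figures from the paper.
def allocate_workers(layer_loads_mflops, total_workers):
    total = sum(layer_loads_mflops)
    # proportional share, at least one worker per layer; simple rounding may
    # overshoot or undershoot the pool slightly
    return [max(1, round(total_workers * load / total)) for load in layer_loads_mflops]

layer_loads = [1850, 925, 925, 460, 115]        # hypothetical MFLOPs per layer
workers = allocate_workers(layer_loads, total_workers=64)
delays = [load / w for load, w in zip(layer_loads, workers)]
print(workers)   # -> [28, 14, 14, 7, 2]
print(delays)    # per-layer delay is now roughly uniform (~57-66 units)
```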

The pipelined structure of the DNN realized on an FProg-DNN die is responsible for the speed-up relative to GPUs or TPUs. Several DNN layers are synthesized in an FProg-DNN, and the transport of activation outputs from one layer to the next is overlapped with the computation in the following layers. Thus, unlike in GPUs and TPUs, the delay in transporting data from one layer to the next is hidden in FProg-DNN realizations.
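The toy model below (all numbers invented for illustration; it does not attempt to reproduce the paper's >50x figure) shows the mechanism: a sequential device pays compute plus transfer for every layer of every input, while a deep pipeline's steady-state time per input is set only by its slowest stage.

```python
# Toy throughput model: sequential execution pays compute + transfer per layer,
# while a deep pipeline's steady-state cost per input is the slowest stage.
# All times are invented for illustration.
compute_us  = [66, 66, 66, 66, 66]   # hypothetical per-layer compute times (us)
transfer_us = [40, 20, 10, 5, 2]     # hypothetical activation-transfer times (us)

sequential = sum(c + t for c, t in zip(compute_us, transfer_us))   # us per input
pipelined  = max(max(compute_us), max(transfer_us))                # us per input, steady state
print(f"sequential: {sequential} us/input, pipelined: {pipelined} us/input, "
      f"speedup ~{sequential / pipelined:.1f}x")
```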

In the simplified architectural description shown in Figure 2a, an FProg-DNN consists of a large reconfigurable fabric and a control processor.

Inside the reconfigurable fabric, reconfigurable interconnect channels alternate with reconfigurable functional blocks. As shown in Figure 2b, these functional blocks are either tensor-array fields or pixel-array fields, which are defined in more detail later.


Reference

This content is AI-processed based on open access ArXiv data.
