A Digital Neuromorphic Architecture Efficiently Facilitating Complex Synaptic Response Functions Applied to Liquid State Machines

Michael R. Smith*, Aaron J. Hill*, Kristofor D. Carlson*, Craig M. Vineyard*, Jonathon Donaldson*, David R. Follett†, Pamela L. Follett†‡, John H. Naegle*, Conrad D. James*, and James B. Aimone*

* Sandia National Laboratories, Albuquerque, NM 87185 USA
  Email: {msmith4, ajhill, kdcarls, cmviney, jwdonal, jhnaegl, cdjame, jbaimon}@sandia.gov
† Lewis Rhodes Labs, Concord, MA 01742 USA
  Email: {drfollett, plfollett}@earthlink.net
‡ Tufts University, Medford, MA 02155 USA

Abstract—Information in neural networks is represented as weighted connections, or synapses, between neurons. This poses a problem because the primary computational bottleneck for neural networks is the vector-matrix multiply, in which inputs are multiplied by the neural network weights. Conventional processing architectures are not well suited for simulating neural networks, often requiring large amounts of energy and time. Additionally, synapses in biological neural networks are not binary connections, but exhibit a nonlinear response function as neurotransmitters are emitted and diffuse between neurons. Inspired by neuroscience principles, we present a digital neuromorphic architecture, the Spiking Temporal Processing Unit (STPU), capable of modeling arbitrarily complex synaptic response functions without requiring additional hardware components. We consider the paradigm of spiking neurons with temporally coded information, as opposed to the non-spiking rate-coded neurons used in most neural networks. In this paradigm, we examine liquid state machines applied to speech recognition and show how a liquid state machine with temporal dynamics maps onto the STPU—demonstrating the flexibility and efficiency of the STPU for instantiating neural algorithms.
I. INTRODUCTION

Neural-inspired learning algorithms are achieving state-of-the-art performance in many application areas such as speech recognition [1], image recognition [2], and natural language processing [3]. Information and concepts, such as a dog or a person in an image, are represented in the synapses, or weighted connections, between the neurons. The success of a neural network is dependent on training the weights between the neurons in the network. However, training the weights in a neural network is non-trivial and often has high computational complexity, with large data sets requiring long training times.

One of the contributing factors to the computational complexity of neural networks is the vector-matrix multiplication (the input vector multiplied by the synapse or weight matrix). Conventional computer processors are not designed to process information in the manner that a neural algorithm requires (such as the vector-matrix multiply). Recently, major advances in neural networks and deep learning have coincided with advances in processing power and data access. However, we are reaching the limits of Moore's law in terms of how much more efficiency can be gained from conventional processing architectures. In addition to reaching the limits of Moore's law, conventional processing architectures also incur the von Neumann bottleneck [4], where the processing unit's program and data memory exist in a single memory with only one shared data bus between them.

This work was supported by the Sandia National Laboratories Laboratory Directed Research and Development (LDRD) Program under the Hardware Acceleration of Adaptive Neural Algorithms (HAANA) Grand Challenge project. Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under Contract DE-AC04-94AL85000.
In contrast to conventional processing architectures, which consist of one or more powerful centralized processing units that operate in a mostly serialized manner, the brain is composed of many simple distributed processing units (neurons) that are sparsely connected and operate in parallel. Communication between neurons occurs at the synaptic connections, which operate independently of the other neurons not involved in the connection. Thus, vector-matrix multiplications are implemented more efficiently, facilitated by parallel operations. Additionally, the synaptic connections in the brain are generally sparse, and information is encoded in a combination of the synaptic weights and the temporal latencies of a spike on the synapse [5]. Biological synapses are not simply a weighted binary connection but rather exhibit a non-linear synaptic response function due to the release and dispersion of neurotransmitters in the space between neurons.

Biological neurons communicate using simple "data packets" that are generally accepted as binary spikes. This is in contrast to the neuron models used in traditional artificial neural networks (ANNs), which are commonly rate-coded neurons. Rate-coded neurons encode information between neurons as a real-valued magnitude of the output of a neuron—a larger output represents a higher firing rate. The use of rate-coded neurons stems from the assumption that the firing rate of a neuron is the most important piece of information, whereas temporally coded neurons encode information based on when a spike from one neuron arrives at another neuron. Temporally coded information has been shown to be more powerful than rate-coded information and more biologically accurate [6].

Fig. 1. High-level overview of the STPU. The STPU is composed of a set of leaky integrate-and-fire neurons. Each neuron has an associated temporal buffer such that inputs can be mapped to a neuron with a time delay. W(t) is the neuronal encoding transformation, which addresses connectivity, efficacy, and temporal shift. The functionality of the STPU mimics the functionality of biological neurons.

Based on these neuroscience principles, we present the Spiking Temporal Processing Unit (STPU), a novel neuromorphic hardware architecture designed to mimic neuronal functionality and alleviate the computational restraints inherent in conventional processors. Other neuromorphic architectures have shown very strong energy efficiency [7], powerful scalability [8], and aggressive speed-up [9] by utilizing the principles observed in the brain. We build upon these efforts, leveraging the benefits of low energy consumption, scalability, and run-time speed-ups, and include an efficient implementation of arbitrarily complex synaptic response functions in a digital architecture. This is important because the synaptic response function has strong implications for spiking recurrent neural networks [10]. We also examine liquid state machines (LSMs) [11] to show how the constructs available in the STPU facilitate complex dynamical neuronal systems. While we examine the STPU in the context of LSMs, the STPU is a general neuromorphic architecture; other spike-based algorithms have been implemented on the STPU [12], [13].

In Section II, we present the STPU. A high-level comparison with other neuromorphic architectures is presented in Section III. We present LSMs in Section IV. In Section V, we examine how LSMs map onto the STPU and show results from running the LSM on the STPU. We conclude in Section VI.

II. THE SPIKING TEMPORAL PROCESSING UNIT

In this section, we describe the Spiking Temporal Processing Unit (STPU) and how its components map to functionality in biological neurons.
The design of the STPU is based on the following three neuroscience principles observed in the brain: 1) the brain is composed of simple processing units (neurons) that operate in parallel and are sparsely connected, 2) each neuron has its own local memory for maintaining temporal state, and 3) information is encoded in the connectivity, efficacy, and signal propagation characteristics between neurons. A high-level overview of a biological neuron and how its components map onto the STPU is shown in Figure 1.

The STPU derives its dynamics from the leaky integrate-and-fire (LIF) neuron model [14]. Each LIF neuron j maintains a membrane potential state variable, v_j, that tracks its stimulation at each time step based on the following differential equation [10]:

    dv_j/dt = −v_j/τ_j + Σ_k Σ_l w_kj · s(t − t_kl − Δ_kl).    (1)

The variable τ_j is the time constant of the first-order dynamics, k is the index of the presynaptic neuron, w_kj is the weight connecting neuron j to neuron k, t_kl is the time of the l-th spike from neuron k, Δ_kl is the synaptic delay from neuron k on the l-th spike, and s(·) is the dynamic synaptic response function to an input spike. In the LIF model, neuron j will fire if v_j exceeds a threshold θ_j. The synapses from input neurons to destination neurons are defined in the weight matrix W(t) for a given time t, as the weights between inputs and neurons can change over time.

Unique to the STPU, each LIF neuron has a local temporal memory buffer R composed of D memory cells to model synaptic delays. When a biological neuron fires, there is a latency associated with the arrival of the spike at the soma of the postsynaptic neuron due to the time required to propagate down the axon of the presynaptic neuron and the time to propagate from the dendrite to the soma of the postsynaptic neuron (Δ_kl).
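Discretizing Equation 1 with a forward-Euler step gives a compact picture of the per-neuron update. The sketch below is ours, not the STPU's hardware implementation: the function name, the default constants (τ_j = 32 and θ_j = 20 are values used later in the paper), and the reset-to-zero behavior on firing are assumptions; the `synaptic_input` argument stands in for the double sum over presynaptic spikes.

```python
def lif_step(v, synaptic_input, tau=32.0, theta=20.0, dt=1.0):
    """One forward-Euler step of dv/dt = -v/tau + input (Equation 1).

    `synaptic_input` collapses the double sum over presynaptic spikes
    into a single number for this time step. Returns (spiked, new_v);
    resetting the potential to zero on firing is an assumption, not a
    rule stated in the paper.
    """
    v = v + dt * (-v / tau + synaptic_input)
    if v >= theta:
        return True, 0.0
    return False, v
```

With zero input, the membrane potential decays geometrically by a factor of (1 − dt/τ) per step, which is the discrete analogue of the leak term in Equation 1.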
The temporal buffer represents different synaptic junctions in the dendrites, where a lower index value in the temporal buffer constitutes a dendritic connection closer to the soma and/or a shorter axon length than one with a larger index value. Thus, synapses in the STPU are specified as a weight w_kjd from a source input neuron k to a destination neuron j in the d-th cell of the temporal buffer, d ∈ {0, 1, 2, ..., D−1}. This allows multiple connections between neurons with different synaptic delays. At each time step, the products of the inputs i(t) and synaptic weights W(t) are summed and added to the current value in that position of the temporal buffer:

    R̂_d(t) = R_d(t) + Σ_k i_k(t) · w_kjd(t),

where R̂(t) is a temporary state of the temporal buffer. The value in each cell of the temporal buffer is then shifted down one position, that is, R_d(t+1) = R̂_{d−1}(t). The values at the bottom of the buffer are fed into the LIF neuron.

In biological neurons, when a neuron fires, a (near) binary spike is propagated down the axon to the synapse, which defines a connection between neurons. The purpose of the synapse is to transfer the electrical activity, or information, from one neuron to another. Direct electrical communication does not take place; rather, a chemical mediator is used. In the presynaptic terminal, an action potential from the emitted spike causes the release of neurotransmitters from the synaptic vesicles into the synaptic cleft (the space between the pre- and postsynaptic neurons). The neurotransmitters cross the synaptic cleft and attach to receptors on the postsynaptic neuron, injecting a positive or negative current into the postsynaptic neuron.
Through a chemical reaction, the neurotransmitters are broken down at receptors on the postsynaptic neuron and are released back into the synaptic cleft, where the presynaptic neuron reabsorbs the broken-down molecules to synthesize new neurotransmitters. In terms of electrical signals, the propagation of action potentials along the axon is a digital signal, as shown in Figure 2. However, the chemical reactions that occur at the synapse to release and reabsorb neurotransmitters are modeled as an analog signal.

Fig. 2. Spike propagation along the axon and across the synapse. The spike propagated on the axon is generally accepted as a binary spike. Upon arrival at the synapse, the spike initiates a chemical reaction in the synaptic cleft which stimulates the postsynaptic neuron. This chemical reaction produces an analog response that is fed into the soma of the postsynaptic neuron. In the STPU, arbitrary synaptic response functions are modeled efficiently using the temporal buffer.

The behavior of the synapse propagating spikes between neurons has important ramifications for the dynamics of the liquid. In Equation 1, the synaptic response function is represented by s(·). Following Zhang et al. [10], the Dirac delta function δ(·) can be used as the synaptic response function and is convenient for implementation on digital hardware. However, the Dirac delta function exhibits static behavior. Zhang et al. show that dynamical behavior can be modeled in the synapse by using the first-order response to a presynaptic spike:

    (1/τ_s) · e^(−(t − t_kl − Δ_kl)/τ_s) · H(t − t_kl − Δ_kl),    (2)

where τ_s is the time constant of the first-order response, H(·) is the Heaviside step function, and 1/τ_s normalizes the first-order response function. The dynamical behavior can also be implemented using a second-order dynamic model for s(·):

    (1/(τ_s1 − τ_s2)) · (e^(−(t − t_kl − Δ_kl)/τ_s1) − e^(−(t − t_kl − Δ_kl)/τ_s2)) · H(t − t_kl − Δ_kl).    (3)
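The second-order response of Equation 3 is the function the STPU samples into the temporal buffer. A small sketch of that sampling (helper names are ours; the excitatory time constants τ_s1 = 4 and τ_s2 = 8 and the zero synaptic delay are the settings reported in Section V):

```python
import math

def second_order_response(t, tau1=4.0, tau2=8.0):
    """Equation 3 with zero synaptic delay: a normalized difference of
    exponentials gated by the Heaviside step (zero for t < 0)."""
    if t < 0:
        return 0.0
    return (math.exp(-t / tau1) - math.exp(-t / tau2)) / (tau1 - tau2)

def sample_into_buffer(weight, depth=32, tau1=4.0, tau2=8.0):
    """Expand one nominal synaptic weight into one weight per
    temporal-buffer cell by discretely sampling the response function,
    mirroring the STPU's expansion of a single connection into
    multiple delayed connections (buffer depth is our assumption)."""
    return [weight * second_order_response(d, tau1, tau2) for d in range(depth)]
```

Each sampled value becomes a w_kjd at delay cell d, so no exponentiation hardware is needed at run time; the exponential is baked into the weights once, up front.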
The synaptic response function is discretely sampled, encoded into the weights connecting one neuron to another, and mapped to the corresponding cells in the temporal buffer. In Equation 3, τ_s1 and τ_s2 are the time constants for the second-order response, and 1/(τ_s1 − τ_s2) normalizes the second-order dynamical response function. Zhang et al. showed significant improvements in accuracy and in the dynamics of the liquid when using these dynamical response functions.

Implementing exponential functions in hardware is expensive in terms of the resources needed for exponentiation. Considering that the STPU is composed of individual parallel neuronal processing units, each neuron would need its own exponentiation functionality. Including the hardware mechanisms for each neuron to perform exponentiation would reduce the number of neurons by orders of magnitude, as there are limited resources on an FPGA. Rather than explicitly implementing the exponential functions in hardware, we use the temporal buffer associated with each neuron. The exponential function is discretely sampled, and the value at each sample is assigned a connection weight w_kjd from the presynaptic neuron k to the corresponding cell d in the temporal buffer of the postsynaptic neuron j. Thus, a single weighted connection between two neurons is expanded to multiple weighted connections between the same two neurons. This is shown graphically in Figure 2. The use of the temporal buffer allows for an efficient implementation of both the digital signal propagation down the axon of a neuron and the analog signal propagation between neurons at the synapse.

TABLE I
HIGH-LEVEL COMPARISON OF THE STPU WITH TRUENORTH AND SPINNAKER.

    Platform:       STPU                  TrueNorth          SpiNNaker
    Interconnect:   3D mesh multicast^1   2D mesh unicast    2D mesh multicast
    Neuron Model:   LIF                   LIF^2              Programmable^3
    Synapse Model:  Programmable^4        Binary             Programmable^5

    ^1 The 3D mesh is enabled by the temporal buffer available for each neuron in the STPU.
    ^2 TrueNorth provides a highly programmable LIF to facilitate additional neural dynamics.
    ^3 SpiNNaker provides flexibility in the neuron model; however, more complex biological models are more computationally expensive.
    ^4 The synapse model is programmable in the STPU via the temporal buffer by discretely sampling an arbitrary synapse model.
    ^5 As with the neuron model, SpiNNaker is optimized for simpler synaptic models. More complex synaptic models incur a cost in computational complexity.

III. COMPARISON WITH OTHER NEUROMORPHIC ARCHITECTURES

The STPU is not the first neuromorphic architecture. Four prominent neuromorphic architectures are IBM's TrueNorth chip [7], the Stanford Neurogrid [15], the Heidelberg BrainScaleS machine [16], and the Manchester Spiking Neural Network Architecture (SpiNNaker) [17]. The Stanford Neurogrid and the Heidelberg BrainScaleS are analog circuits, while TrueNorth and SpiNNaker are digital circuits. As the STPU is also a digital system, we focus on a comparison with TrueNorth and SpiNNaker.

The TrueNorth chip leverages a highly distributed crossbar-based architecture designed for high energy efficiency, composed of 4096 cores. The base-level neuron is a highly parametrized LIF neuron. A TrueNorth core is a 256 × 256 binary crossbar where the existence of a synapse is encoded at each junction, and individual neurons assign weights to particular sets of input axons. The crossbar architecture allows for efficient vector-matrix multiplication. TrueNorth only allows point-to-point routing: each of the 256 neurons on a core is programmed with a spike destination addressed to a single row on a particular core, which can be the same core, enabling recurrence, or a different core. The crossbar inputs are coupled via delay buffers to insert axonal delays.
A neuron is not natively able to connect to multiple cores or to connect to a single neuron with different temporal delays. As a workaround, a neuron must be replicated within the same core and mapped to the different cores. For multiple temporal delays between two neurons (such as those in the STPU), there is no obvious mechanism for an implementation [18].

SpiNNaker is a massively parallel digital computer composed of simple ARM cores, with an emphasis on flexibility. Unlike the STPU and TrueNorth, SpiNNaker is able to model arbitrary neuron models via an instruction set provided to the ARM cores. SpiNNaker is designed for sending large numbers of small data packets to many destination neurons. While SpiNNaker was designed for modeling neural networks, it could potentially be used more generally due to its flexibility.

The STPU architecture falls between the TrueNorth and SpiNNaker architectures. The STPU implements a less parameterized LIF neuron than TrueNorth; however, its routing of neural spikes is more flexible and allows multicast, similar to SpiNNaker, rather than the unicast used in TrueNorth. A key distinguishing feature of the STPU is the temporal buffer associated with each neuron, giving the STPU 3-dimensional routing. A high-level summary of the comparison of the STPU with TrueNorth and SpiNNaker is shown in Table I.

IV. LIQUID STATE MACHINES

The liquid state machine (LSM) [11] is a neuro-inspired algorithm that mimics the cortical columns in the brain. It is conjectured that cortical microcircuits nonlinearly project input streams into a high-dimensional state space. This high-dimensional representation is then used as input to other areas in the brain where learning can be achieved. The cortical microcircuits have a sparse representation and fading memory—the state of the microcircuit "forgets" over time.
While LSMs may be able to mimic certain functionality in the brain, it should be noted that LSMs do not try to explain how or why the brain operates as it does.

In machine learning, LSMs are a variation of recurrent neural networks that fall into the category of reservoir computing (RC) [19], along with echo state networks [20]. LSMs differ from echo state networks in the type of neuron model used: LSMs use spiking neurons, while echo state networks use rate-coded neurons with a non-linear transfer function. LSMs operate on temporal data composed of multiple related time steps.

LSMs are composed of three general components: 1) input neurons, 2) randomly connected leaky integrate-and-fire spiking neurons called the liquid, and 3) readout nodes that read the state of the liquid. A diagram of an LSM is shown in Figure 3. Input neurons are connected to a random subset of the liquid neurons. The readout neurons may be connected to all the neurons in the liquid or a subset of them. Connections between neurons in the liquid are based on probabilistic models of brain connectivity [11]:

    P_connection(N_1, N_2) = q · e^(−E(N_1, N_2)/r²),    (4)

where N_1 and N_2 represent two neurons and E(N_1, N_2) is the Euclidean distance between N_1 and N_2. The variables q and r are two chosen constants. In this paper, we use a 3-dimensional grid to define the positions of neurons in the liquid.

Fig. 3. A liquid state machine, composed of three components: 1) a set of input neurons, 2) the liquid—a set of recurrent spiking neurons, and 3) a set of readout neurons with plastic synapses that can read the state of the neurons in the liquid.

TABLE II
PARAMETERS FOR THE SYNAPSES (OR CONNECTIONS BETWEEN NEURONS) IN THE LIQUID.

    Parameter            Type     Value
    r from Equation 4    ALL      2
    q from Equation 4    E → E    0.45
                         E → I    0.30
                         I → E    0.60
                         I → I    0.15
    Synaptic weight      E → E    3
                         E → I    6
                         I → E    -2
                         I → I    -2

The liquid functions as a temporal kernel, casting the input data into a higher dimension. The LIF neurons allow temporal state to be carried from one time step to another. LSMs avoid the problem of training recurrent neural models by only training the synaptic weights from the liquid to the readout nodes, similar to extreme learning machines, which use a random non-recurrent neural network for non-temporal data [21]. It is assumed that all temporal integration is encompassed in the liquid. Thus, the liquid in an LSM acts similarly to the kernel in a support vector machine on streaming data by employing a temporal kernel. In general, the weights and connections in the liquid do not change, although some studies have looked at plasticity in the liquid [22]. The readout neurons are the only neurons that have plastic synapses, allowing for synaptic weight updates via training.

Using each neuron's firing state from the liquid, the temporal aspect of learning on temporal data is transformed into a static (non-temporal) learning problem. As all temporal integration is done in the liquid, no additional mechanisms are needed to train the readout neurons. Any classifier can be used, but often a linear classifier is sufficient. Training of the readout neurons can be done in a batch or on-line manner [10].

LSMs have been successfully applied to several applications, including speech recognition [10], vision [23], and cognitive neuroscience [11], [24]. Practical applications suffer from the fact that traditional LSMs take input in the form of spike trains. Transforming numerical input data into spike data, such that the non-temporal data is represented temporally, is nontrivial.

V. MAPPING THE LSM ONTO THE STPU

In this section, we implement the LSM on the STPU.
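Before detailing the mapping, two pieces of the LSM described above can be sketched concretely: the probabilistic wiring of Equation 4 and the readout's average-firing-rate features. The function names, the seeded generator, and the literal reading of the exponent in Equation 4 as E/r² are our assumptions:

```python
import math
import random

def wire_liquid(positions, q, r, seed=0):
    """Equation 4 wiring sketch: a synapse a -> b exists with
    probability q * exp(-E(a, b) / r**2), where E is the Euclidean
    distance between the neurons' 3-D grid positions."""
    rng = random.Random(seed)
    synapses = []
    for a, pa in enumerate(positions):
        for b, pb in enumerate(positions):
            if a == b:
                continue  # skip self-connections in this sketch
            dist = math.sqrt(sum((u - v) ** 2 for u, v in zip(pa, pb)))
            if rng.random() < q * math.exp(-dist / r ** 2):
                synapses.append((a, b))
    return synapses

def readout_features(spike_trains):
    """Collapse each liquid neuron's spike train (0/1 per time step)
    into its average firing rate, turning the temporal problem into a
    static feature vector for an off-the-shelf linear classifier."""
    return [sum(train) / len(train) for train in spike_trains]
```

The feature vector from `readout_features` is what the linear classifier (an SVM in the experiments below) is trained on, which is why no recurrent training is needed.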
There have been previous implementations of LSMs in hardware; however, in most cases an FPGA or VLSI chip was designed specifically for a hardware implementation of an LSM. Roy et al. [25] and Zhang et al. [10] present low-powered VLSI hardware implementations of an LSM. Schrauwen et al. [26] implement an LSM on an FPGA chip. In contrast to other work, the STPU has been developed to be a general neuromorphic architecture. Other neuroscience work and algorithms are being developed against the STPU, such as spike sorting and using spikes for median filtering [13]. Currently, we have an STPU simulator implemented in MATLAB as well as an implementation on an FPGA chip. The MATLAB simulator has a one-to-one correspondence with the hardware implementation.

Given the constructs provided by the STPU, the LSM with a liquid composed of LIF neurons maps naturally onto the STPU. We use the second-order synaptic response function of Equation 3, based on the work of Zhang et al. [10]. They found that the second-order response function produced more dynamics in the liquid, allowing the neural signals to persist longer after the input sequence had finished. This led to improved classification results. Following Zhang et al., the synaptic properties of the liquid, including the parameters for the connection probabilities between the liquid neurons defined in Equation 4 and the synaptic weights, are given in Table II. There are two types of neurons: excitatory (E) and inhibitory (I). As has been observed in the brain [11], the liquid is an 80/20 network where 80% of the neurons are excitatory and 20% are inhibitory. The probability of a synapse existing between two neurons and the weight between them depend on the types of the neurons considered. E/I → E/I denotes the presynaptic and postsynaptic neurons being connected by the synapse.
For example, E → I denotes a connection between an excitatory presynaptic neuron and an inhibitory postsynaptic neuron. Excitatory neurons increase the action potential at target neurons (positive synaptic weights), while inhibitory neurons decrease the action potential (negative synaptic weights). When the connections are generated between neurons in the liquid, the neurons are randomly connected according to Equation 4 with the parameters given in Table II. Each input neuron is randomly connected to a subset of 30% of the neurons in the liquid with a weight of 8 or -8 chosen uniformly at random.

To implement the second-order synaptic response function, Equation 3 is sampled at discrete time steps and multiplied by the synaptic weight value between the neurons as specified in Table II. The discretely sampled weights are then encoded via multiple weights at the corresponding cells in the temporal buffer of the postsynaptic neuron. In this implementation, there is no synaptic delay (Δ_kl = 0); τ_s1 is set to 4 and τ_s2 is set to 8 for excitatory neurons. For inhibitory neurons, τ_s1 and τ_s2 are set to 4 and 2, respectively. For all neurons, τ_j is set to 32. The plastic readout neurons are connected to all of the neurons in the liquid. Training is done off-line using a linear classifier on the average firing rate of the neurons in the liquid. We examine the effect of various linear classifiers below.

A. Experiments

To evaluate the effect of different parameters for the liquid state machine, we use a data set for spoken digit recognition of Arabic digits from 0 to 9 [27]. The dataset is composed of the time-series Mel-frequency cepstral coefficients (MFCCs) of 8800 utterances: 10 digits × 10 repetitions × 88 speakers. The MFCCs were taken from 44 male and 44 female native Arabic speakers between the ages of 18 and 40.
The dataset is partitioned into a training set from 66 speakers and a test set from the other 22 speakers. We scale all variables between 0 and 1. To evaluate the performance of the LSM, we examine the classification accuracy on the test set and measure the separation in the liquid on the training set. If there is good separation within the liquid, then the state vectors from the trajectories for each class should be distinguishable from each other. To measure the separability of a liquid Ψ on a set of state vectors O from the liquid perturbed by a given input sequence, we follow the definition from Norton and Ventura [22]:

    Sep(Ψ, O) = c_d / (c_v + 1),

where c_d is the inter-class distance and c_v is the intra-class variance. Separation is the ratio of the distance between the classes divided by the class variance. The inter-class distance is the mean difference of the centers of mass for every pair of classes:

    c_d = (Σ_{l=1}^{n} Σ_{m=1}^{n} ||μ(O_l) − μ(O_m)||_2) / n²,

where ||·||_2 is the L2 norm, n is the number of classes, and μ(O_l) is the center of mass for class l. For a given class, the intra-class variance is the mean variance of the state vectors from the inputs about the center of mass for that class:

    c_v = (1/n) Σ_{l=1}^{n} ( Σ_{o_k ∈ O_l} ||μ(O_l) − o_k||_2 / |O_l| ).

We investigate various properties of the liquid state machine, namely the synaptic response function, the input encoding scheme, the liquid topology, and the readout training algorithm. We also consider the impact of θ_j on the liquid. A neuron j will spike if v_j exceeds θ_j; thus, θ_j can have a significant impact on the dynamics of the liquid. Beginning with a base value of 20 (as was used by Zhang et al.), we consider the effects of decreasing values of θ_j ∈ {20, 17.5, 15, 12.5, 11, 10, 9, 7.5, 5, 3, 2.5, 2, 1}.
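The separation measure above is straightforward to compute from the liquid's state vectors. A minimal sketch (the function name and the plain-list representation of state vectors are ours):

```python
def separation(states_by_class):
    """Sep = c_d / (c_v + 1) per Norton and Ventura, where
    states_by_class is a list of classes, each a list of equal-length
    state vectors (lists of floats)."""
    n = len(states_by_class)
    dim = len(states_by_class[0][0])

    def mean(vectors):
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    def l2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    centers = [mean(cls) for cls in states_by_class]
    # inter-class distance: mean pairwise distance between centers of mass
    c_d = sum(l2(centers[l], centers[m])
              for l in range(n) for m in range(n)) / n ** 2
    # intra-class variance: mean distance of each state vector
    # from its own class center
    c_v = sum(sum(l2(centers[l], o) for o in states_by_class[l])
              / len(states_by_class[l]) for l in range(n)) / n
    return c_d / (c_v + 1.0)
```

The +1 in the denominator keeps the measure finite when a class has zero intra-class variance.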
For default parameters, we use a reservoir of size 3 × 3 × 15, we feed the magnitude of the inputs into the input neurons (current injection), and we use a linear SVM to train the synapses for the readout neurons.

1) Synaptic Response Functions: We first investigate the effect of the synaptic response function using the default parameters. Using θ_j = 20, the average separation values, average spiking rates, and the classification accuracy of a linear SVM are given in Table III. The second-order synaptic function (Equation 3) achieves the largest separation values for training and testing, the lowest average spike rate, and the highest classification accuracy.

TABLE III
SEPARATION VALUES, AVERAGE SPIKING RATES, AND CLASSIFICATION ACCURACY FOR DIFFERENT SYNAPTIC RESPONSE FUNCTIONS.

    Synaptic Response  TrainSep  TrainRate  TestSep  TestRate  SVM
    Dirac Delta        0.129     0.931      0.139    0.931     0.650
    First-Order        0.251     0.845      0.277    0.845     0.797
    Second-Order       0.263     0.261      0.290    0.255     0.868
    First-Order 30     0.352     0.689      0.389    0.688     0.811
    First-Order 40     0.293     0.314      0.337    0.314     0.817
    First-Order 50     0.129     0.138      0.134    0.138     0.725

Fig. 4. Visualization of the first- and second-order response functions.

The average spike rate is significantly higher for the first-order response function than for the second-order response function, which is counter-intuitive since the second-order response function perpetuates the signal through the liquid longer. However, examining the first-order and second-order response functions, as shown in Figure 4, shows that the first-order response function has a larger initial magnitude and then quickly subsides. The second-order response function has a lower initial magnitude but is slower to decay, giving a more consistent propagation of the spike through time.
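The qualitative difference shown in Figure 4 can be reproduced directly from Equations 2 and 3. In the sketch below, the second-order time constants are the excitatory values from the text; the paper does not state the τ_s used for the first-order curve, so τ_s = 8 is our assumption:

```python
import math

def first_order(t, tau=8.0):
    """Equation 2 with zero synaptic delay (tau choice is ours)."""
    return math.exp(-t / tau) / tau if t >= 0 else 0.0

def second_order(t, tau1=4.0, tau2=8.0):
    """Equation 3 with zero synaptic delay."""
    if t < 0:
        return 0.0
    return (math.exp(-t / tau1) - math.exp(-t / tau2)) / (tau1 - tau2)
```

Evaluating the two functions confirms the shapes described above: the first-order response starts at its peak (1/τ_s at t = 0) and decays quickly, while the second-order response starts at zero, rises, and decays more slowly, so it dominates at later times.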
Adjusting the value of θ_j to 30, 40, and 50 to accommodate this behavior for the first-order response function (the bottom three rows in Table III) shows that improvements can be made in the separation values, spiking rate, and classification accuracy. Despite this improvement, the first-order response function does not achieve a better classification accuracy than the second-order response function. The first-order response function does obtain a better separation score, but this does not translate into better accuracy.

2) Input Encoding Schemes: In traditional LSMs, the input is temporally encoded in the form of a spike train. Unfortunately, most datasets are not temporally encoded, but rather are numerically encoded. A spike-train input aligns with neuroscience, but practically it is non-trivial to encode all information temporally as the brain does. Therefore, we examine three possible encoding schemes: 1) rate encoding, where the magnitude of the numeric value is converted into a rate and a spike train at that rate is fed into the liquid; 2) bit encoding, where the magnitude of the numeric value is converted into its bit representation at a given precision; and 3) current injection. Rate encoding requires n time steps to encode a single input by converting a magnitude to a rate. This is similar to binning and incurs some information loss. Bit encoding requires only one time step; however, it requires m inputs per standard input to convert the magnitude into its m-bit-precise representation. We set m to 10. Compared to current injection, the execution time increases linearly in the number of time steps for rate encoding.

TABLE IV: Separation values, average spiking rates of the liquid, and classification accuracy for different input encoding schemes and values of θ_j. The largest separation values and accuracies for each encoding scheme are in bold.

                              θ_j
Encoding Scheme          20     15     10     5.5    3      2
Current Injection  Sep   0.263  0.409  0.378  0.334  0.324  0.324
                   Rate  0.261  0.580  0.750  0.843  0.873  0.878
                   Acc   0.868  0.905  0.894  0.873  0.868  0.866
Bit Encoding       Sep   0.271  0.310  0.338  0.350  0.353  0.357
                   Rate  0.434  0.497  0.544  0.592  0.634  0.620
                   Acc   0.741  0.741  0.735  0.755  0.764  0.761
Rate Encoding      Sep   0.164  0.364  0.622  0.197  0.047  0.048
                   Rate  0.146  0.199  0.594  0.952  0.985  0.985
                   Acc   0.747  0.733  0.643  0.601  0.548  0.558

Table IV shows the separation values (first row for each encoding scheme), average spiking rates (second row), and the accuracy of a linear SVM on the test set (third row) for the input encoding schemes with various values of θ_j. The average spiking rate gives the percentage of neurons that were firing in the liquid through the time series and provides insight into how sparse the spikes are within the liquid. Table IV shows a representative subset of the values that were used for θ_j. The bold values represent the highest separation value and classification accuracy for each encoding scheme. The results show that the value of θ_j has a significant effect on the separation of the liquid as well as the classification accuracy of the SVM. This is expected, as the dynamics of the liquid are dictated by when neurons fire. A lower threshold allows for more spikes, as indicated by the increasing values of the average spiking rates as the values for θ_j decrease. Overall, rate encoding produces the greatest separation values. However, there is significant variability as the values for θ_j change. For rate encoding, the greatest accuracy from the SVM is achieved with a low separation value.
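The three input encoding schemes described above can be sketched as follows. This is a minimal illustration under assumed conventions (inputs normalized to [0, 1], evenly spaced spikes for rate encoding); the function names and parameters are hypothetical, not the paper's implementation, though m = 10 matches the bit precision used above.

```python
def rate_encode(value, n_steps=20):
    """Rate encoding: map a magnitude in [0, 1] to a spike train over
    n_steps time steps whose firing rate is proportional to the value."""
    n_spikes = round(value * n_steps)
    spacing = n_steps / max(n_spikes, 1)
    train = [0] * n_steps
    for k in range(n_spikes):
        train[int(k * spacing)] = 1  # spread spikes evenly over the window
    return train

def bit_encode(value, m=10):
    """Bit encoding: quantize a magnitude in [0, 1] to m bits and drive
    one input line per bit for a single time step."""
    q = round(value * (2 ** m - 1))
    return [(q >> i) & 1 for i in reversed(range(m))]

def current_inject(value):
    """Current injection: feed the raw magnitude directly into the input
    neuron as an injected current (one value, one time step)."""
    return value
```

The trade-off in the text is visible directly: rate encoding stretches one value over n time steps, bit encoding multiplies the number of input lines by m but takes one step, and current injection needs neither extra time nor extra lines.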
In the other encoding schemes, separation and classification accuracy appear to be correlated. The greatest classification accuracy is achieved with current injection.

3) Liquid Topology: The topology of the liquid in an LSM determines the size of the liquid and influences the connections within the liquid, as the distance between neurons impacts the connections made between them. A more cubic liquid (e.g., 5 × 5 × 5) should be more densely connected than a column of liquid (e.g., 2 × 2 × 20). In this section, we examine liquids with grids of 3 × 3 × 15, 2 × 2 × 20, 4 × 5 × 10, and 5 × 5 × 5. As before, we consider different values for θ_j. The separation values, average spike rates, and the accuracy of a linear SVM are given in Table V for the values of θ_j that provided the largest separation values for each combination of topology and encoding scheme.

TABLE V: Separation values and average spike rate of the liquid using different liquid topologies. The largest separation values and accuracies for each topology are in bold.

Topology/            Current Injection   Bit Encoded       Rate Encoded
Num Neurons
3x3x15/135   θ_j:    15     12.5         3      2          11     10
             Sep:    0.409  0.405        0.353  0.357      0.622  0.622
             Rate:   0.580  0.693        0.634  0.620      0.563  0.594
             Acc:    0.905  0.884        0.764  0.761      0.658  0.643
5x5x5/125    θ_j:    12.5   11           3      2          10     9
             Sep:    0.467  0.432        0.384  0.380      0.384  0.451
             Rate:   0.534  0.682        0.596  0.586      0.473  0.600
             Acc:    0.889  0.900        0.765  0.755      0.637  0.610
4x5x10/200   θ_j:    15     12.5         15     12.5       15     12.5
             Sep:    0.418  0.384        0.506  0.501      0.479  0.352
             Rate:   0.698  0.760        0.576  0.603      0.289  0.577
             Acc:    0.879  0.876        0.830  0.835      0.644  0.612
2x2x20/80    θ_j:    11     10           3      2          10     9
             Sep:    0.397  0.389        0.264  0.247      0.305  0.255
             Rate:   0.608  0.658        0.537  0.550      0.609  0.846
             Acc:    0.903  0.912        0.672  0.666      0.695  0.678

Again, the value of θ_j has a significant impact on the separation of the liquid and the classification accuracy.
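The intuition that a cubic liquid is more densely connected than a column can be checked with the distance-dependent connection rule commonly used in LSMs, P(i → j) = C · exp(−d(i, j)² / λ²) (Maass et al.). The sketch below assumes that rule; the values of C and λ are illustrative, not taken from the paper.

```python
import itertools
import math

def expected_density(dims, C=0.3, lam=2.0):
    """Expected fraction of connected ordered neuron pairs on a 3D grid
    under P(i->j) = C * exp(-d(i,j)^2 / lam^2), where d is Euclidean
    distance between grid coordinates. C and lam are illustrative."""
    coords = list(itertools.product(*(range(d) for d in dims)))
    total, prob_sum = 0, 0.0
    for a, b in itertools.permutations(coords, 2):
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        prob_sum += C * math.exp(-d2 / lam ** 2)
        total += 1
    return prob_sum / total

cube = expected_density((5, 5, 5))     # compact: shorter pairwise distances
column = expected_density((2, 2, 20))  # elongated: longer pairwise distances
print(cube > column)  # True: the cubic liquid is more densely connected
```

Because connection probability falls off with squared distance, the compact 5 × 5 × 5 grid, whose pairwise distances are shorter on average, yields a higher expected connection density than the elongated 2 × 2 × 20 column of the same order of size.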
The greatest separation values and classification accuracies for each topology are highlighted in bold. For all of the topologies, current injection achieves the highest classification accuracy. Interestingly, the separation values across encoding schemes and topologies do not correlate with the accuracies. Within the same encoding scheme and topology, however, the accuracy generally improves as the separation increases. For current injection, the different topologies do not appear to have a significant impact on the classification accuracy, except for the 4 × 5 × 10 topology, which shows a decrease in accuracy. This may be due to the increased number of liquid nodes that are used as input to the SVM. The converse is true for bit encoding, as the 4 × 5 × 10 topology achieves the highest accuracy, possibly due to the increased number of inputs from the bit representation of the input.

4) Readout Training Algorithms: How the plastic synapses are trained has a significant effect on the performance of the LSM. Traditionally, LSMs use a linear classifier based on the assumption that the liquid has transformed the state space such that the problem is linearly separable. Linear models are represented as a set of weights and a threshold, which can be implemented in neuromorphic hardware. By using a linear model, the liquid and the classification can all be done on the STPU, avoiding the overhead of going off chip to make a prediction. We consider four linear classifiers: 1) linear SVM, 2) linear discriminant analysis (LDA), 3) ridge regression, and 4) logistic regression. For each of these algorithms, we use the default parameters as they are set in the Statistics and Machine Learning Toolbox in MATLAB. We examine the classification accuracy of each of the linear classifiers on the topologies and θ_j values that achieved the highest classification accuracy with the linear SVM in the previous experiments. We also limit ourselves to current injection as the input scheme, since current injection consistently achieved the highest classification accuracy. The results are shown in Table VI.

TABLE VI: Classification accuracy on the test set from different linear classifiers. The greatest accuracy for each topology is in bold.

Linear Model       3x3x15     5x5x5      4x5x10     2x2x20
                   θ_j = 15   θ_j = 11   θ_j = 15   θ_j = 10
Linear SVM         0.906      0.900      0.900      0.914
LDA                0.921      0.922      0.922      0.946
Ridge Regress.     0.745      0.717      0.717      0.897
Logistic Regress.  0.431      0.254      0.254      0.815

LDA consistently achieves the highest classification accuracy of the considered classifiers. The highest classification accuracy achieved is 0.946.

VI. CONCLUSION AND FUTURE WORK

In this paper, we presented the Spiking Temporal Processing Unit (STPU), a novel neuromorphic processing architecture. It is well suited for efficiently implementing neural networks and synaptic response functions of arbitrary complexity. This is facilitated by the temporal buffers associated with each neuron in the architecture. The capabilities of the STPU, including complex synaptic response functions, were demonstrated through the functional mapping and implementation of an LSM onto the STPU architecture.

As neural algorithms grow in scale and conventional processing units reach the limits of Moore's law, neuromorphic computing architectures such as the STPU allow efficient implementations of neural algorithms. However, neuromorphic hardware is based on spiking neural networks to achieve low energy consumption. Thus, more research is needed to understand and develop spiking-based algorithms.
