Lattice Map Spiking Neural Networks (LM-SNNs) for Clustering and Classifying Image Data

Hananel Hazan (1), Daniel J. Saunders (1), Darpan T. Sanghavi (1), Hava Siegelmann (1), and Robert Kozma (1, 2)

(1) Biologically-Inspired Neural & Dynamical Systems Laboratory (BINDS), University of Massachusetts Amherst, Amherst, MA 01003, USA
(2) Center for Large-Scale Intelligent Optimization & Networks (CLION), University of Memphis, TN 38152, USA

June 28, 2019

Abstract

Spiking neural networks (SNNs) with a lattice architecture are introduced in this work, combining several desirable properties of SNNs and self-organized maps (SOMs). Networks are trained with biologically motivated, unsupervised learning rules to obtain a self-organized grid of filters via cooperative and competitive excitatory-inhibitory interactions. Several inhibition strategies are developed and tested, such as (i) incrementally increasing the inhibition level over the course of network training, and (ii) switching the inhibition level from low to high (two-level) after an initial training segment. During the labeling phase, the spiking activity generated by data with known labels is used to assign neurons to categories of data, which are then used to evaluate the network's classification ability on a held-out set of test data. Several biologically plausible evaluation rules are proposed and compared, including a population-level confidence rating and an n-gram inspired method. The effectiveness of the proposed self-organized learning mechanism is tested using the MNIST benchmark dataset, as well as using images produced by playing the Atari Breakout game.

Original Manuscript Submitted: October 30, 2018. Revised: May 28, 2019. Special Issue: "Cognition and Neurocomputation" of Annals of Mathematics and Artificial Intelligence.

1 Introduction

Today's dominant artificial intelligence (AI) approach uses deep neural networks (DNNs), which are based on global gradient-descent learning algorithms [1, 2], wherein a loss function is defined and all DNN parameters are updated using approximate derivatives to minimize it. The success of this approach rests on employing massive amounts of data to train the DNNs, which requires significant computational resources [3] provided by the exponentially increasing power of prolific supercomputing facilities worldwide. In the case of certain practical problems, however, one may not have a sufficiently large dataset to adequately cover the problem space, or one may need to make decisions quickly without waiting for an expensive training process.

There are several proposed approaches to overcome the computational constraints of deep learning (DL), one of which is based on using local learning rules invoking neuro-biologically motivated spike-timing-dependent plasticity (STDP) [4]. Such learning rules represent a trade-off between the inevitably decreased classification performance due to the unsupervised nature of their operation, and the advantage provided by the distributed nature of the parameter updates. Namely, local learning rules do not require an end-to-end differentiable network structure, and no time is wasted on the synchronous parameter updates through forward and backward passes widely used in back-propagation algorithms.
Substituting global learning methods with local learning rules provides a solution to the computational bottleneck of deep learning, striking a balance between possibly increased learning speed and lower memory requirements at the cost of reduced performance on machine learning (ML) tasks. In order to modify the synaptic parameters of spiking neural networks, one may record spike traces and neuron membrane potentials. The former captures a short-term trace of a neuron's spiking activity, while the latter is used in the simulation of the spiking neuron as part of its charging and action potential emission process. These two state variables are all that is required to implement a robust set of correlation-based learning rules for synaptic strengths between spiking neurons; e.g., Hebbian learning [5] and STDP [6]. Certain SNNs have the potential to be more energy efficient than their deep learning counterparts [7, 8], as DNNs must cache the results of their layer-wise computation (produced during the forward pass) to be used in the computation of gradients during the backward pass. The memory requirement is proportional to the number and size of layers utilized by the network, and to the size of the input data. SNNs which implement local learning rules are free from the forward-backward computational paradigm, and for that reason may be beneficial in terms of time and memory complexity.

1.1 Related Work

1.1.1 Spiking Neural Networks

There is an extensive literature on spiking neural networks for various applications [9], but it was not until recently that SNNs demonstrated their feasibility to solve complex machine learning problems. Depending on the requirements of SNN simulations, various packages provide different levels of biological realism. For example, the NEST [10] and BRIAN [11] libraries are rather faithful to fine-grained neuro-biological details, while frameworks like CARLsim [12] and Nengo [13] are more focused on engineering applications and efficiency, with less attention to biological constraints.

The work by Diehl & Cook [14] uses a simple three-layer spiking neural network to classify the MNIST handwritten digits by learning synaptic weights in an unsupervised fashion with several different STDP rules. A number of other previous projects use STDP to train SNNs to classify image data [15, 16]. The former uses Gabor filters as a pre-processing input step to detect simple features, which are then used as input to their network [15]. They use a rank-order encoding scheme on the input spikes, which are processed through the network, and a winner-take-all classification strategy is applied to its output activity. The latter method comprises a difference-of-Gaussians pre-processing step followed by an SNN with a series of convolutional and pooling layers, whose output is used to train a linear SVM for classification [16]. Other systems use spiking neurons but require training with a supervisory signal; e.g., [17], in which a deep neural network is first trained using back-propagation and later transferred to a similar SNN without much loss in performance. More comprehensive artificial neural network (ANN) to SNN conversion methods are presented in [18, 19]. Still others apply various approximations of the back-propagation algorithm directly to the spiking networks themselves [20-23].
1.1.2 Self-Organizing Properties with Spiking Neural Networks

There is great potential to extend the work of [14, 24] and [25] by combining STDP rules and competitive inhibitory interactions with a mechanism inspired by the self-organizing map (SOM) algorithm [26] and the properties of the adaptive resonance theory (ART) model [27]. SOMs are able to cluster an unlabeled dataset in an unsupervised manner. The topological arrangement created by the SOM algorithm forms specialized clusters that often correspond well with the categories that exist in the dataset. One caveat of the SOM training algorithm is that the size of the training set must be known in advance in order to adjust the learning rate accordingly. This clustering property is reminiscent of specialized areas in the primate cortex, wherein groups of neurons self-organize based on the similarity of their functional role with respect to parts of the primate body [28]. An experimental study of Martinotti and basket cells provides an in silico simulation of the self-organizing (clustering) ability of the cortex [29].

Early work [24] used spiking neurons with a topology similar to that of [26] to obtain self-organizing properties. The authors trained their network using a modified Hebbian learning rule with a correlation function on a given input. The algorithm showed selective behavioral activity and developed receptive fields, as expected from a self-organized network. Unlike the Kohonen SOM, this algorithm requires initialization with unique random weights, in number relative to the number of neurons in the receptive field. Moreover, it requires tuning the strength of the learning magnitude to be proportional to the training set size in order to reach convergence. Thus it is not fit for training on the fly, where the size and content of the dataset are unknown.

Interesting work on self-organization in SNNs has been accomplished in [30], in which the authors analyze SOM formation in a network of integrate-and-fire (IF) neurons on both synthetic data and a cancer dataset. In [31], an SNN is first trained to form a SOM of a cancer dataset, after which it is used to classify the data.

1.2 Outline of the present work

In this work, we start by employing the network architecture and code of Diehl & Cook [14], which makes use of the BRIAN simulator. We introduce modifications to the original Diehl & Cook approach, in particular by employing a lattice topology in the inhibitory layer and using multi-level inhibition schemes. We demonstrate increased learning speed on machine learning tasks due to these architectural changes. Next, we describe the PyTorch-based BindsNET simulator [32, 33] (code available at https://github.com/Hananel-Hazan/bindsnet), with which we were able to replicate previous results obtained using the BRIAN library and implement new experiments based on the greater flexibility of BindsNET. The employed testbeds include images from the MNIST dataset [34] and images obtained from the Atari Breakout game. Using optimally tuned parameters, our approach produces performance improvements over previous state-of-the-art unsupervised spiking neural networks. Our work shows how to use networks of self-organizing spiking neurons to (1) simultaneously cluster and classify, and (2) operate on image data. We also show how training is done without the need to know the size or the distribution of the input.
Thus, this work demonstrates the feasibility of implementing spiking neural networks for efficient machine learning applications that can be trained on the fly in an unsupervised manner. In this paper we show examples of SNN-learned image filters and map formation, and report test data performance based on labels determined from spiking activity during the training phase.

2 Methods

This section summarizes the applied spiking neural network structure and learning algorithms using leaky integrate-and-fire (LIF) neurons and spike-timing-dependent plasticity (STDP), as initially outlined in [14], extended in [25, 32], and supplemented in [33] with inter-neuron distance-dependent inhibition strength.

2.1 LIF neurons and STDP learning

It has been demonstrated that, with regard to the number of neural units needed, networks of spiking neurons constitute a more computationally powerful class of neural network models than that comprised of traditional ANNs [35]. Namely, single spiking neurons are able to compute biologically relevant functions that would require many more ANN hidden units to realize. The basic computational operations of an ANN do not incorporate time, and unlike the all-or-nothing action potential of the biological neuron, they communicate by sending precise floating-point numbers downstream to the next layer of processing units. The standard unit used in traditional neural networks is synchronous; i.e., all neurons within a layer participate in computation at each step during the forward (inference) and backward (learning) passes, whereas spiking neurons are well known for their asynchronous operation; i.e., they are updated only as needed. Moreover, neurons of ANNs do not have memories of their previous actions, whereas spiking neurons often integrate time naturally by keeping track of decaying memory traces or implementing leaky voltages (membrane potentials).

A computationally efficient choice of biologically inspired unit of computation is the leaky integrate-and-fire (LIF) spiking neuron [36]. These incorporate time in their operation by integrating a moderately quickly decaying voltage as the simulation progresses, yet are simple enough to be incorporated in the large networks required for processing large amounts of images or other complex data. In our network architectures, we use these units; some are excitatory (increasing the potentials of their downstream neighbors) and others are inhibitory (decreasing the potentials of their downstream neighbors). Synaptic conductances are modeled and used in the update equation for the neuron membrane potential. The potentials are given by

    \tau \frac{dv}{dt} = (v_{rest} - v) + g_e (E_{exc} - v) + g_i (E_{inh} - v),    (1)

where v_rest is the resting membrane potential, E_exc and E_inh are the equilibrium potentials of excitatory and inhibitory synapses, respectively, and g_e and g_i are the conductances of excitatory and inhibitory synapses, respectively. When a neuron's membrane potential exceeds a threshold v_thresh, it fires an action potential and resets back to v_reset. The neuron then undergoes a short (5 ms) refractory period, during which it does not integrate input and cannot spike. Synapses are modeled by conductance changes and synaptic weights w: a synapse increases its conductance by w at the moment a pre-synaptic action potential arrives [6]. Otherwise, the conductances decay exponentially.
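To make the LIF dynamics of Eq. (1) concrete, the following minimal NumPy sketch advances a population of neurons by one Euler step, including the threshold, reset, and refractory behavior described above. It is an illustrative sketch only, not the authors' BRIAN/BindsNET implementation; the time step, time constants, and voltage values are placeholders, and the conductances g_e, g_i are treated as inputs whose decay (Eq. (2), given next) is handled elsewhere.

```python
import numpy as np

def lif_step(v, g_e, g_i, refrac, dt=1.0, tau=100.0,
             v_rest=-65.0, v_reset=-65.0, v_thresh=-52.0,
             e_exc=0.0, e_inh=-100.0, refrac_period=5.0):
    """One Euler step of Eq. (1) for a vector of LIF neurons.

    v, g_e, g_i, refrac are 1-D arrays of per-neuron state.
    Returns the updated voltage, refractory counters, and a boolean spike vector.
    """
    active = refrac <= 0.0                       # neurons outside their refractory period
    dv = (dt / tau) * ((v_rest - v) + g_e * (e_exc - v) + g_i * (e_inh - v))
    v = np.where(active, v + dv, v)              # integrate only non-refractory neurons
    spikes = active & (v >= v_thresh)            # threshold crossings
    v = np.where(spikes, v_reset, v)             # reset neurons that spiked
    refrac = np.where(spikes, refrac_period, refrac - dt)
    return v, refrac, spikes
```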
The dynamics of the synaptic conductances are given by

    \tau_{g_e} \frac{dg_e}{dt} = -g_e, \qquad \tau_{g_i} \frac{dg_i}{dt} = -g_i.    (2)

Spike-timing-dependent plasticity [4, 37] is used to modify the weights of synapses connecting certain neurons. We call the neuron from which a synapse projects pre-synaptic, and the one to which it connects post-synaptic. For the sake of computational efficiency, we use online STDP, in which spike traces x_pre and x_post are recorded for every synapse, a decaying memory of its recent pre- and post-synaptic spiking history. Each time a pre-synaptic (post-synaptic) spike occurs, x_pre (x_post) is set to 1; otherwise, it decays exponentially to zero with a time constant \tau_{trace}. When a pre- or post-synaptic spike occurs, a weight change \Delta w is computed using an STDP update rule. The rule we chose for our experiments is given by

    \Delta w = \begin{cases} \eta_{post} \, x_{pre} \, (w_{max} - w) & \text{on post-synaptic spike} \\ -\eta_{pre} \, x_{post} \, w & \text{on pre-synaptic spike} \end{cases}    (3)

The term \eta denotes a learning rate, and w_max is the maximal allowed synaptic weight.

2.2 SNN architecture and operating principle

The basic SNN of [14] has a multi-layer structure comprised of an input layer, an excitatory layer, and an inhibitory layer. For image data, the input layer is an array of size n_input = h x w, where h and w are the image's vertical and horizontal span in pixels, respectively. The excitatory and inhibitory layers have arbitrary size n_neurons. The input layer is connected all-to-all with the excitatory neurons, the excitatory layer is connected one-to-one with the inhibitory layer, and each inhibitory neuron connects to all excitatory neurons except the one from which it receives its one-to-one connection. This lateral inhibition creates a competition between excitatory neurons, in the sense that when an excitatory neuron fires and activates its corresponding inhibitory neuron, all other excitatory neurons are inhibited, a mechanism akin to the operation of a winner-take-all (WTA) circuit [38].

Spikes are generated in the input neurons by converting each pixel in an image into a Poisson spike train [9] with firing rate \lambda proportional to its intensity; i.e., more intense pixels are converted into spike trains with a higher average firing rate. This process is carried out for a specific simulation time, and the STDP rule is computed for the connections between the input and excitatory layers. It is an important property of the architecture that once a neuron fires, it inhibits all other neurons according to the inhibition scheme described above. For further details on the basic model structure, including considerations on system size, see [14, 25].

The WTA-like mechanism described above is what allows individual neurons to learn unique filters. Increasing the number of neurons in the excitatory and inhibitory layers has the effect of enabling an SNN to memorize more canonical examples from the training data, and therefore recognize more patterns during the test phase. This network architecture is depicted in Figure 1a, and an example set of filters learned from the MNIST handwritten digit dataset [34] is shown in Figure 1b.

Figure 1: Illustration of the SNN structure and its operation; (a) basic spiking neural network architecture, following [14]; (b) learned filters from the baseline model.
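As an illustration of the input encoding and the online STDP rule of Eq. (3), the sketch below converts pixel intensities into approximate Poisson spike trains and applies the trace-based weight update on one simulation step. This is a minimal NumPy sketch under assumed parameter values (maximum input rate, learning rates, trace time constant); it is not the exact update used in the authors' code.

```python
import numpy as np

def poisson_spikes(image, time_steps, max_rate=63.75, dt=1.0):
    """Convert pixel intensities in [0, 1] into Bernoulli-approximated Poisson
    spike trains whose rate (in Hz) is proportional to intensity."""
    rates = image.flatten() * max_rate            # per-input-neuron firing rate
    p = rates * dt / 1000.0                       # spike probability per time step (dt in ms)
    return np.random.rand(time_steps, rates.size) < p

def stdp_step(w, x_pre, x_post, pre_spikes, post_spikes,
              eta_pre=1e-4, eta_post=1e-2, w_max=1.0, tau_trace=20.0, dt=1.0):
    """One step of the online STDP rule of Eq. (3) for an (n_input x n_neurons)
    weight matrix, using per-neuron pre- and post-synaptic spike traces."""
    x_pre *= np.exp(-dt / tau_trace)              # decay traces toward zero
    x_post *= np.exp(-dt / tau_trace)
    x_pre[pre_spikes] = 1.0                       # set traces to 1 on spikes
    x_post[post_spikes] = 1.0
    # Post-synaptic spike: potentiate in proportion to the pre-synaptic trace.
    w[:, post_spikes] += eta_post * x_pre[:, None] * (w_max - w[:, post_spikes])
    # Pre-synaptic spike: depress in proportion to the post-synaptic trace.
    w[pre_spikes, :] -= eta_pre * x_post[None, :] * w[pre_spikes, :]
    return np.clip(w, 0.0, w_max), x_pre, x_post
```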
Although the filters are arranged in a two-dimensional matrix, there is no neighborhood structure between the learned filters; i.e., neighboring excitatory neuron filters do not have any particular relationship with each other.

Figure 2: Inhibition as a function of Euclidean distance.

Table 1: Definition of parameters used in training of LM-SNNs.

| Parameter | Definition |
| --- | --- |
| c_inhib | Factor scaling the distance between neurons to determine inhibition strength |
| c_min | Minimum allowable inhibition strength |
| c_max | Maximum allowable inhibition strength |
| p_low | Proportion of training during which c_inhib is fixed to c_min |
| p_grow | Proportion of training during which c_inhib is linearly increased from c_min to c_max |

2.3 Introducing topology in layers by increasing inhibition with inter-neuron distance

Parameters for the spatial and temporal variation of inhibition are summarized in Table 1.

2.3.1 Spatial profile of inhibition

We introduce a change to the foregoing SNN architecture by defining a profile of inter-neuron inhibition as a function of their distance on a lattice. Instead of inhibiting all other neurons at a large fixed rate as in [14, 25], we increase the level of inhibition with the Euclidean distance between neurons, up to a cutoff, similar to the SOM learning algorithm:

    inhib_{i,j} = \begin{cases} c_{inhib} \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}, & \text{if } c_{inhib} \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} < c_{max}, \\ c_{max}, & \text{otherwise}, \end{cases}    (4)

where i, j index a pair of neurons, and (x_i, y_i), (x_j, y_j) give their positions on a two-dimensional lattice. This assumes that the neurons in the excitatory layer are arranged as such. It requires a parameter c_inhib, which is multiplied by the distance to compute inhibition levels, as well as a parameter c_max that gives the maximum allowed inhibition; see Figure 2. With a proper choice of c_inhib, when a neuron exceeds its firing threshold v_thresh, instead of inhibiting all other neurons from firing, a neighborhood of nearby neurons is only weakly inhibited and may have the chance to fire. This encourages neighborhoods of neurons to fire for the same inputs and learn similar filters. The radius of the neighborhood is determined by the c_inhib parameter.

This inhibition profile is inspired by the self-organizing map (SOM) [26], and it is included in part to reduce the degree of competition imposed by the connections from the inhibitory layer. The original inhibitory scheme introduced in [14] produces a network activation that is typically too sparse during training to learn filters quickly. Namely, after an initial transient following the introduction of an input example, one neuron fires first and causes all others to become strongly inhibited and therefore quiescent. Additionally, the introduced profile causes excitatory neuron filters to self-organize into distinct clusters by similarity, which often correspond to categories of the input data.

In contrast to the SOM learning algorithm, this method maintains a single learning rate throughout the training process and is able to continue to learn if more data become available. In the SOM, the learning rate decreases throughout the training phase, enabling the network to change dramatically in early stages of training and then gradually stabilize and fine-tune its weights by the end of the training phase. For clarity, lattice map spiking neural networks with all variants of increasing inhibition strategies are denoted as LM-SNNs.
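A minimal sketch of how the distance-dependent inhibitory weights of Eq. (4) could be constructed for excitatory neurons arranged on a square lattice follows; it is an illustration of the formula, not the authors' implementation, and the lattice side length and constants are placeholders.

```python
import numpy as np

def lattice_inhibition(side, c_inhib, c_max):
    """Inhibition strength between every pair of neurons on a side x side
    lattice, following Eq. (4)."""
    ys, xs = np.divmod(np.arange(side * side), side)   # 2-D lattice coordinates
    dist = np.sqrt((xs[:, None] - xs[None, :]) ** 2 +
                   (ys[:, None] - ys[None, :]) ** 2)   # pairwise Euclidean distances
    inhib = np.minimum(c_inhib * dist, c_max)          # clip at the maximum inhibition
    np.fill_diagonal(inhib, 0.0)                       # no self-inhibition
    return inhib

# Example: a 25 x 25 lattice as used for the MNIST experiments.
w_inhib = lattice_inhibition(side=25, c_inhib=1.0, c_max=17.5)
```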
2.3.2 Temporal variation of inhibition profile

• Growing inhibition over training. We want to produce individualized filters as learned by the SNN presented in [14], yet retain the clustering of filters achieved by our increasing inhibition modification. Distinct filters are necessary to ensure that our learned representation contains as little redundancy as possible, making the best use of model capacity. To that end, we implemented another modification to the inhibition scheme, where the inhibition constant c_inhib grows on a linear schedule from a small value c_min ≈ 0.1 to a large value c_max ≈ 17.5. The increasing inhibition strategy is used as before; however, by the end of network training, the inhibition level is equivalent to that of [14]. To illustrate the effect of the spatial profile of inhibition, we show Figure 3, where increasing inhibition creates clusters of filters in the excitatory layer; the corresponding label assignments are displayed as well. Compare this to Figure 1b, in which filters are learned using fixed, uniform inhibition across the space, according to the learning algorithm described in [14]. In Figure 3, the filters self-organize into smoothly varying clusters, and then individualize as the inhibition level becomes large. We also consider growing the inhibition level to c_max for some proportion of the training (p_grow) and holding it fixed for the rest (1 − p_grow). During the last 1 − p_grow portion of the training phase, neuron filters are allowed to individualize more, enabling the network to represent less frequent examples from the training data.

• Two-level inhibition. To remove the need to re-compute inhibitory synapse strengths throughout network training, we implemented a simple two-level inhibition scheme: for the first p_low proportion of the training, the network is trained with inhibition level c_min; for the last 1 − p_low proportion, the network is trained with c_max. The inhibition level is not smoothly varied between the two levels, but jumps suddenly at the p_low mark. At the low inhibition level, large neighborhoods of excitatory neurons compete together to fire for certain types of inputs, eventually organizing their filters into a SOM-like representation of the dataset. At the high inhibition level, however, neurons typically maintain yet sharpen the filters acquired during the low-inhibition portion of training. In this way, we obtain filter maps similar to those learned using the growing inhibition mechanism, but with a simpler implementation. This inhibition strategy represents a middle ground between that of [14] and our increasing inhibition scheme. See Figure 4 for an example of a learned filter map and neuron class assignments with two-level inhibition. There is some degree of clustering of the filters; however, as the inhibition level approaches that of [14], they may eventually move away from the digit originally representing their weights, producing a somewhat fragmented geometry of the filter clustering. The degree of this fragmentation depends on the choice of p_low. Namely, the more time spent training with the high c_max inhibition level, the more likely a neuron is to change its filter to represent data outside of its originally converged class.
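The two strategies above amount to different schedules for the inhibition constant as a function of training progress. The sketch below illustrates both schedules under assumed parameter values taken from Table 1; it is a schematic, not the authors' code, and the scheduled constant would be used to rescale the lattice inhibition of Eq. (4) before each training example.

```python
def growing_inhibition(progress, c_min=0.1, c_max=17.5, p_grow=0.5):
    """Inhibition constant after a fraction `progress` in [0, 1] of training:
    a linear ramp from c_min to c_max over the first p_grow of training,
    then held at c_max for the remainder."""
    if progress < p_grow:
        return c_min + (c_max - c_min) * progress / p_grow
    return c_max

def two_level_inhibition(progress, c_min=1.0, c_max=20.0, p_low=0.1):
    """Two-level scheme: c_min for the first p_low of training, then an
    abrupt switch to c_max for the rest of training."""
    return c_min if progress < p_low else c_max
```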
Figure 3: Illustration of LM-SNN with the increasing inhibition algorithm after training on the MNIST handwritten digit dataset; c_inhib = 1.0; (a) 25x25 lattice of neuron filters; (b) class assignment labels, shown by color maps.

Figure 4: Illustration of LM-SNN with the two-level inhibition algorithm using the MNIST handwritten digit dataset; (a) 25x25 lattice of neuron filters; (b) class assignments, shown by color maps.

2.4 Reading out learned representations from the activations of the SNN

Due to the unsupervised nature of training in the LM-SNN, the known class labels of the input data are not utilized at that stage. The fact that the class labels are not used in the training process leads to a decreased classification performance, no matter what read-out scheme is used after training. This may seem a waste of resources; however, it may become a virtue if the class labels are unknown or unreliable. Accordingly, the application domain of SNNs is clearly different from that of massive deep learning NNs trained by supervised gradient descent. Nevertheless, we may wish to evaluate the quality of the representations the SNN has learned in terms of classification performance. There are several ways to achieve this goal, depending on the readout scheme employed.

An efficient readout scheme can be based on the observation that the network's representation of the dataset is encoded in the learned weights of the synapses connecting the input and excitatory layers and in the adaptive threshold parameters of the excitatory neurons. We use the activity of the neurons in the excitatory layer, with their filter weights held fixed, to accomplish these tasks. We perform a two-step procedure based on individual neuron activations: (1) Labeling: excitatory neurons receive a label corresponding to the category they represent; (2) Classifying: new examples are classified based on the spiking activity of neurons with the previously attached labels.

There are many popular labeling and classification schemes, some of which are listed below and will be tested. The first three can be considered rate-based schemes; namely, the timing of individual neurons' spikes is discarded in favor of a rate-based code; see, e.g., [39]. Some voting schemes consider the information contained in the timing, which is the case for the last option listed below:

1. All voting: Neurons are assigned the label of the input class for which they have fired most on average, and these labels are used to classify new data based on network activation.

2. Confidence weighting: In this voting scheme, we record the proportions of spikes each neuron fires for each input class, and use a weighted sum of these proportions and network activity to classify new inputs (a sketch of schemes 1 and 2 is given after this list).

3. Distance-based: New data are labeled with the class label of the neuron whose filter most closely matches the input data as measured by Euclidean distance. This is akin to the k-nearest neighbors algorithm with k = 1, where the stored data examples are represented by the neurons' excitatory filter weights.

4. n-gram: In order to leverage the information contained in the timing (ordering) of excitatory spikes, we use an n-gram based classification scheme, in which the timing and the order of spikes are used to make a prediction. This algorithm, inspired by rank order coding [40, 41], represents an approach different from the schemes above. It is able to utilize the pattern and timing of the first individual spikes in order to make predictions without waiting for a fixed amount of information. Therefore, this method can be trained online (over the course of training) and can be adapted incrementally.
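As a concrete illustration of the rate-based all and confidence weighting schemes (items 1 and 2 above), the sketch below assigns labels from per-class spike counts collected during the labeling phase and then classifies new examples from their excitatory spike counts. This is a minimal sketch of one plausible implementation, not the authors' exact code; in particular, how group activity is aggregated at test time is an assumption.

```python
import numpy as np

def label_neurons(spike_counts, labels, n_classes=10):
    """spike_counts: (n_examples, n_neurons) spikes fired per labeled example.
    Returns per-neuron class-wise firing proportions and hard label assignments."""
    rates = np.zeros((n_classes, spike_counts.shape[1]))
    for c in range(n_classes):
        rates[c] = spike_counts[labels == c].mean(axis=0)     # mean response to class c
    proportions = rates / (rates.sum(axis=0, keepdims=True) + 1e-12)
    assignments = rates.argmax(axis=0)                        # "all" scheme labels
    return proportions, assignments

def classify_all(counts, assignments, n_classes=10):
    """'All' voting: sum the test-time activity of neurons sharing each label."""
    votes = np.array([counts[assignments == c].sum() for c in range(n_classes)])
    return int(votes.argmax())

def classify_confidence(counts, proportions):
    """'Confidence weighting': weight test-time activity by firing proportions."""
    return int((proportions @ counts).argmax())
```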
In Section 3, we evaluate networks with the all, confidence weighting, distance, and n-gram voting schemes. Note that the n-gram scheme, like the rate-based schemes, also follows a two-step procedure: a learning phase to estimate the n-gram conditional class probabilities from the training data, and a classification phase where n-grams are identified in the output spiking sequence and used to "vote" for the class they code for. n-grams have long been used for sequential modeling, particularly in computational linguistics. There is evidence [42] that most of the information in cortical neural networks is contained in the timing of individual spikes, and in particular in the exact timing of the first spike. Moreover, n-grams are able to identify repeating motifs [43] in the activation of synchronized bursts in cultured neural networks and can also classify stimuli based on the time to first spike [44].
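To make the n-gram scheme concrete, here is a minimal sketch of a bigram (n = 2) variant: during the learning phase, consecutive pairs of spiking-neuron indices observed on labeled examples accumulate class votes, and at test time each pair found in the output spike order votes for its preferred class. It is an illustrative sketch under assumed data structures, not the authors' estimation procedure.

```python
import numpy as np
from collections import defaultdict

def fit_ngrams(spike_orders, labels, n=2, n_classes=10):
    """spike_orders: per example, the sequence of excitatory-neuron indices in
    the order in which they spiked. Accumulates class votes per n-gram."""
    votes = defaultdict(lambda: np.zeros(n_classes))
    for order, y in zip(spike_orders, labels):
        for i in range(len(order) - n + 1):
            votes[tuple(order[i:i + n])][y] += 1
    return votes

def classify_ngram(order, votes, n=2, n_classes=10):
    """Sum the votes of every n-gram found in a test example's spike order."""
    score = np.zeros(n_classes)
    for i in range(len(order) - n + 1):
        key = tuple(order[i:i + n])
        if key in votes:
            score += votes[key]
    return int(score.argmax())
```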
3 Results

3.1 Classification of MNIST images

Here we summarize results obtained with the LM-SNN models using MNIST data; for additional details, see [25, 33]. We omit results on the increasing and growing inhibition strategies in favor of results using the simpler two-level inhibition strategy. A large compilation of results demonstrates that network performance is robust to a wide range of hyper-parameter choices. This strategy is compared with a baseline SNN taken from [14], and is shown to outperform it, especially in the regime of small networks, and particularly so with the n-gram voting scheme. We discuss the robustness of LM-SNNs, showing a graceful degradation in performance with respect to random deletion of neurons and synapses.

3.1.1 Comparison of the Diehl & Cook baseline SNN and LM-SNN

Networks comprising 625 excitatory and inhibitory neurons are trained on a single pass over the training data and evaluated on all 10K test examples. Test accuracy results and single standard deviations are reported in Table 2, each averaged over 5 independent trials. Results are demonstrated for a number of choices of the parameters p_low, c_min, and c_max. The best performances are over 92% for some hyper-parameter configurations, which measures up to the original Diehl & Cook results.

For a detailed comparison of our approach and the Diehl & Cook baseline results [14], we conducted a systematic study of network sizes and parameters. Table 3 compares the Diehl & Cook accuracies (baseline SNN in column 2) with the LM-SNN over several settings of the number of neurons, using the confidence, all, distance, and n-gram voting schemes. The two-level inhibition hyper-parameters are fixed to p_low = 0.1, c_min = 1.0, c_max = 20.0. For the n-gram scheme, 12K examples are used for the learning phase. The smaller networks were trained for a single pass, while we conducted a 3-pass training for the 1,225- and 1,600-neuron networks to explore peak accuracies. Detailed results with all single passes are given in [33].

To further evaluate the classification ability proposed in this paper, we compute average confusion matrices (Figure 5) for networks with 1,600 neurons from Table 3, for the best classification scheme (n-gram; 94.07%), the second best (distance; 93.03%), and the worst (baseline Diehl & Cook; 92.80%). Tests showed that bi-grams gave the best performance, and hence we fixed n = 2. Five independent experiments with different initial configurations and Poisson spike trains were run, and their results are averaged and reported along with a single standard deviation.

While the confidence, all, and n-gram schemes use the activity of the network in order to classify new data, the distance scheme simply labels new inputs with the label of the neuron whose filter most closely matches the input. This last evaluation scheme is reminiscent of the one-nearest-neighbor algorithm. However, our spiking neural networks learn prototypical data vectors, whereas the one-nearest-neighbor method stores the entire dataset to use during evaluation. The results for networks with 1,225 and 1,600 neurons suggest either or both of (1) the training algorithm for both the baseline SNN and the LM-SNN does not make appropriate use of all data in a single training pass, or (2) network capacity is too large to adequately learn all filter weights with a single pass over the data.

Table 2: Test accuracy on MNIST images (two-level inhibition; n_e, n_i = 625)

| p_low | c_min | c_max | distance | all | confidence |
| --- | --- | --- | --- | --- | --- |
| 0.1 | 0.1 | 15.0 | 91.8 ± 0.63 | 91.4 ± 0.13 | 91.69 ± 0.63 |
| 0.1 | 0.1 | 17.5 | 92.02 ± 0.38 | 91.26 ± 0.11 | 91.68 ± 0.38 |
| 0.1 | 0.1 | 20.0 | 92.38 ± 0.49 | 91.54 ± 0.14 | 92.1 ± 0.49 |
| 0.1 | 1.0 | 15.0 | 91.67 ± 0.63 | 91.12 ± 0.39 | 91.59 ± 0.63 |
| 0.1 | 1.0 | 17.5 | 92.25 ± 0.42 | 91.32 ± 0.29 | 91.67 ± 0.42 |
| 0.1 | 1.0 | 20.0 | 92.36 ± 0.66 | 91.44 ± 0.38 | 91.9 ± 0.66 |
| 0.1 | 2.5 | 15.0 | 91.99 ± 0.53 | 91.01 ± 0.29 | 91.3 ± 0.53 |
| 0.1 | 2.5 | 17.5 | 92.17 ± 0.39 | 91.49 ± 0.17 | 91.86 ± 0.39 |
| 0.1 | 2.5 | 20.0 | 92.55 ± 0.54 | 92.07 ± 0.3 | 92.49 ± 0.54 |
| 0.25 | 0.1 | 15.0 | 91.51 ± 0.25 | 90.62 ± 0.26 | 90.97 ± 0.25 |
| 0.25 | 0.1 | 17.5 | 91.83 ± 0.18 | 91.06 ± 0.18 | 91.54 ± 0.18 |
| 0.25 | 0.1 | 20.0 | 92.16 ± 0.34 | 91.07 ± 0.29 | 91.83 ± 0.34 |
| 0.25 | 1.0 | 15.0 | 91.36 ± 0.24 | 90.26 ± 0.21 | 90.77 ± 0.24 |
| 0.25 | 1.0 | 17.5 | 91.78 ± 0.61 | 91.09 ± 0.29 | 91.46 ± 0.61 |
| 0.25 | 1.0 | 20.0 | 92.3 ± 0.19 | 91.51 ± 0.17 | 91.98 ± 0.19 |
| 0.25 | 2.5 | 15.0 | 91.65 ± 0.64 | 90.89 ± 0.31 | 91.19 ± 0.64 |
| 0.25 | 2.5 | 17.5 | 92.04 ± 0.62 | 91.52 ± 0.27 | 91.95 ± 0.62 |
| 0.25 | 2.5 | 20.0 | 92.26 ± 0.33 | 91.43 ± 0.48 | 91.97 ± 0.33 |

Table 3: Baseline SNN (Diehl & Cook) vs. LM-SNN (two-level inhibition; variable n_e, n_i)

| n_e, n_i | Baseline D&C | LM-SNN (confidence) | LM-SNN (all) | LM-SNN (distance) | LM-SNN (n-gram) |
| --- | --- | --- | --- | --- | --- |
| 100 | 80.71% ± 1.66% | 82.94% ± 1.47% | 81.12% ± 1.96% | 85.11% ± 0.74% | 85.71% ± 0.85% |
| 225 | 85.25% ± 1.48% | 88.49% ± 0.48% | 87.33% ± 0.59% | 89.11% ± 0.37% | 90.50% ± 0.43% |
| 400 | 88.74% ± 0.38% | 91% ± 0.56% | 90.56% ± 0.67% | 91.4% ± 0.38% | 92.56% ± 0.48% |
| 625 | 91.27% ± 0.29% | 92.14% ± 0.50% | 91.69% ± 0.59% | 92.37% ± 0.29% | 93.39% ± 0.25% |
| 900 | 92.63% ± 0.28% | 92.36% ± 0.63% | 91.73% ± 0.7% | 92.77% ± 0.26% | 93.25% ± 0.57% |
| 1,225 (a) | 92.43% ± 0.23% | 92.57% ± 0.57% | 91.85% ± 0.52% | 92.48% ± 0.29% | 93.87% ± 0.25% |
| 1,600 (a) | 92.80% ± 0.49% | 92.96% ± 0.56% | 92.87% ± 0.49% | 93.03% ± 0.30% | 94.07% ± 0.46% |

(a) Results for 1,225 and 1,600 nodes have been obtained using 3 passes through the MNIST training data.
Inspection of the convergence of filter weights during training (data not shown) suggests that the training algorithm needs to be adjusted for greater data efficiency. The fact that training with more epochs improves accuracy also points to the possibility that network capacity is too high to fit with a single pass through the data.

Figure 6: LM-SNN vs. Diehl & Cook SNN smoothed performance estimate over the training phase for MNIST data.

Figure 5: Average confusion matrices for networks with 1,600 neurons, based on the results in Table 3, for the best classification scheme (Figure 5a; n-gram; 94.07%), the second best (Figure 5b; distance; 93.03%), and the worst (Figure 5c; baseline Diehl & Cook; 92.80%). The entry in row i and column j gives the percentage of test examples that are classified as j despite actually belonging to class i.

3.1.2 LM-SNN convergence and online learning option

A major advantage of using a relaxed inhibition scheme is the ability to learn a better data representation with fewer training examples. While training LM-SNNs using the growing and two-level inhibition strategies, we observe convergence to optimal network performance well before SNNs trained with large, constant inhibition as in [14]. With small inhibition (increasing with the distance between pairs of neurons), the development of filters occurs quickly, allowing the fast attainment of decent performance. Over the course of training, those filters are gradually refined as the inter-neuron inhibition grows and allows for the individualized firing of neurons.

We show in Figure 6 a comparison of the convergence in estimated test performance for networks of various sizes. The Diehl & Cook baseline SNNs are trained with large, constant inhibition c_inhib = 20.0, while LM-SNNs are trained with the two-level inhibition strategy with parameter values p_low = 0.1, c_min = 1.0, and c_max = 20.0. Estimates are calculated by assigning labels to neurons based on the firing activity on 250 training examples, and using these labels along with the aforementioned classification methods to classify the next 250 training examples. Training accuracy curves are smoothed by averaging each estimate with a window of the 10 neighboring estimates.

The performance of LM-SNNs quickly outpaces that of the baseline SNNs, and obtains near-optimal accuracy after seeing between 10 and 20 thousand training examples, regardless of the number of excitatory and inhibitory neurons, in the range of 225 to 900 neurons. As a result of the employed spatially distributed inhibition strategy, larger LM-SNNs achieve better accuracy more quickly, due to their ability to learn more filters at once. On the other hand, the baseline SNNs are limited to learning one filter at a time, and their accuracy during the training phase is hindered by the size of the network as a result. The observed rapid learning property of LM-SNNs points to the potential of their use under various online learning conditions.

3.1.3 Robustness of LM-SNN performance with sparse data

To test the performance of the SNNs, we explored the impact of diminishing connection strengths on the learning process. Accordingly, instead of connecting the input data pixels fully with all the neurons in the excitatory layer, we experimented with varying degrees of random sparse connectivity. The applied procedure is as follows: before training, some percentage of synapses is randomly eliminated, after which the training and test phases proceed as usual. A sketch of this masking step is given below.
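A minimal illustration of the random sparsification step, assuming a dense input-to-excitatory weight matrix; the mask is drawn once before training, and the removed synapses are kept at zero thereafter. This is a sketch, not the authors' implementation.

```python
import numpy as np

def sparsify_connections(w, sparsity, seed=0):
    """Randomly eliminate a `sparsity` fraction of input-to-excitatory synapses
    before training. Returns the masked weights and the boolean keep-mask,
    which should also be re-applied after every STDP update."""
    rng = np.random.default_rng(seed)
    mask = rng.random(w.shape) >= sparsity     # True = keep the synapse
    return w * mask, mask

# Example: remove 90% of the synapses of a 784 x 625 connection.
w = np.random.rand(784, 625)
w, mask = sparsify_connections(w, sparsity=0.9)
```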
We studied whether varying levels of sparsity make the LM-SNN more robust to outliers in the MNIST data, thereby increasing the chance of good test performance, and how the network performs in the event of missing features, i.e., whether it exhibits graceful degradation in performance as the input data become less clear. Table 4 displays accuracy results with varying levels of sparsity of the connections to the input, as defined above, based on averaging ten independent training and test runs, reported along with their standard deviations. Interestingly, small amounts of sparsity do not significantly degrade network performance. Moreover, even with nearly all connections removed, the network maintains reasonable accuracy. In particular, with 90% of the synapses from the input removed, networks of 625 excitatory and inhibitory neurons achieve nearly 60% test accuracy after being trained using the two-level inhibition mechanism. This result demonstrates the robustness of our system to missing information in the input data.

Table 4: Sparse input test accuracy

| n_e, n_i | % sparsity | Test accuracy |
| --- | --- | --- |
| 625 | 0% | 91.71% ± 0.23% |
| 625 | 10% | 91.48% ± 0.31% |
| 625 | 25% | 89.79% ± 0.66% |
| 625 | 50% | 85.83% ± 0.95% |
| 625 | 75% | 75.71% ± 1.20% |
| 625 | 90% | 58.60% ± 1.31% |

3.2 Atari Breakout frame classification

3.2.1 Description of the Atari Breakout task domain

To demonstrate the flexibility of the LM-SNN model across different choices of task domains, we assembled a dataset of modified frames from the Atari game Breakout. We use the OpenAI Gym [45] to simulate the game environment. The game involves an 80x80-pixel image on the computer screen, which has a paddle at the bottom, a wall of multiple layers of bricks in the upper section, and a ball that moves over the screen. Atari Breakout frames are cropped to remove borders that appear to be irrelevant to the game-playing agent. An example frame from the actual game is shown in Figure 7. The game is played by moving a rectangular paddle horizontally across the bottom of the screen in order to keep a ball from entering the lowest part of the screen and being lost. The possible actions in the Breakout game are "no action", "fire" (launch the ball from the paddle to start the round), "move right", and "move left". The game ends when all tiles at the top of the screen have been cleared by the action of the bouncing ball.

In this dataset, frames are labeled according to the corresponding action taken by a two-layer fully-connected artificial neural network (ANN) trained by the Q-learning algorithm; that is, a Deep Q-Network (DQN) [46]. The frames were created by taking a weighted sum of four consecutive in-game frames, in which the most recent frame has the highest weight and the least recent the lowest weight. Since some actions are taken more often than others, there is a class imbalance in the dataset we have curated. To mitigate this problem, we re-sample from each class-specific subset of data as many times as needed to build a new dataset of arbitrary size with balanced classes; a sketch of this re-sampling step is given below. In the original data, there are 6,850 examples, with approximately 49.3% labeled "no action", 6.3% labeled "fire", 28.4% labeled "move right", and 16.0% labeled "move left".
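The class-balancing step described above can be implemented by sampling with replacement from each class-specific subset. The sketch below is a minimal illustration assuming the frames and labels are NumPy arrays; the per-class size is a placeholder and this is not the authors' exact pipeline.

```python
import numpy as np

def rebalance(frames, labels, per_class=4000, seed=0):
    """Build a class-balanced dataset by re-sampling (with replacement)
    `per_class` examples from each class-specific subset."""
    rng = np.random.default_rng(seed)
    idx = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        idx.append(rng.choice(members, size=per_class, replace=True))
    idx = rng.permutation(np.concatenate(idx))   # shuffle the balanced set
    return frames[idx], labels[idx]

# Example: 4 actions x 4,000 examples per class = a 16,000-example training set.
```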
We found through experiments that a re-sampling of 16,000 training examples (with 4,000 examples per class) saturated the learning ability of the trained networks; i.e., training with more examples did not significantly improve classification accuracy.

Figure 7: A frame from the Atari Breakout game. Note the paddle at the bottom, the colored bricks, and the ball (red) in the middle.

We first modified the SNN architecture of [14] for classification of the Atari Breakout game frames. The input layer (after cropping out irrelevant regions) is 50 pixels high and 72 pixels wide, requiring a population of input neurons of size 50 x 72 = 3,600. The input population is connected all-to-all to an output layer of neurons of arbitrary size n_neurons. We removed the "inhibitory" layer of neurons from the architecture of [14], replacing it with a mechanistically similar inhibitory recurrent connection in the output layer; i.e., every output neuron connects to all other output neurons with inhibitory synapses. This allows us to simulate only n_input + n_neurons neurons instead of n_input + 2 n_neurons, reducing simulation time significantly while retaining the competitive inhibitory mechanism that enables both the baseline algorithm of [14] and our own method to learn.

Additionally, we re-implemented both the baseline SNN and the LM-SNN network architectures in the BindsNET spiking neural network simulation library [32]. Although the choice of software implementation is a minor detail, we found that this library enabled us to prototype and train networks more quickly than in the previously used simulation software [47]. To ensure that our changes did not result in a reduction in network performance, we replicated the test accuracy results on the MNIST handwritten digits reported above (data not shown).

3.2.2 Atari Breakout: Baseline results

We applied the modified networks of [14] to the custom Atari game frame classification dataset. A typical confusion matrix produced by these networks is shown in Figure 8, with the corresponding weights and neuron class assignments in Figure 9. Test classification results are shown in Table 5 for a number of configurations of hyper-parameters. All hyper-parameters are held fixed at sensible values except for theta+ (the amount by which the neurons' firing thresholds increase post-spike) and c_norm (the normalization constant for weights on the connection from the input to the output population). For each configuration of hyper-parameters, the same experiment is run 5 times with different random seeds, and the average result with a single standard deviation is reported. As expected, the test classification accuracy typically increases with n_neurons, and we found that theta+ = 5.0 and c_norm = 62.5 typically gave the best results. Varying other hyper-parameters and expanding the range of the search may lead to improved performance.

Figure 8: Test dataset confusion matrix for Atari Breakout frame classification using the network model of [14]. The network that produced this confusion matrix achieved ~60% classification accuracy, with ~50% correct answers for "no action", ~85% for "fire", and ~60% for both "move right" and "move left". Axis labels are shortened for clarity of presentation.

Figure 9: Atari Breakout network filter weights and corresponding class assignments.
The contrast in the plot of filter weights is increased in order to show the learned weights more clearly. The filters clearly capture positions of the paddle (at the bottom of the frame) and the ball (which moves around the top portion of the frame), jointly or separately. Notice that some filters are not learned; i.e., some weights did not settle on a particular configuration of the game state.

Table 5: Test performance results on the Atari Breakout game frames classification task. Besides the reported variations of hyper-parameters, all other parameters are kept constant. 5 separate experiments are run, and averaged test results are reported along with standard deviations.

| theta+ | c_norm | n_neurons | all activity | proportion weighting | n-gram |
| --- | --- | --- | --- | --- | --- |
| 1 | 50 | 100 | 53.69 ± 1.38 | 54.35 ± 0.73 | 54.70 ± 0.73 |
| 1 | 50 | 200 | 56.92 ± 0.30 | 57.15 ± 0.43 | 58.46 ± 0.43 |
| 1 | 50 | 300 | 57.25 ± 1.57 | 57.50 ± 1.73 | 57.65 ± 1.73 |
| 1 | 50 | 400 | 56.15 ± 1.84 | 56.13 ± 1.86 | 57.99 ± 1.86 |
| 1 | 50 | 500 | 56.80 ± 1.18 | 56.91 ± 1.10 | 58.03 ± 1.10 |
| 1 | 62.5 | 100 | 52.37 ± 1.53 | 52.58 ± 1.46 | 55.54 ± 1.46 |
| 1 | 62.5 | 200 | 56.05 ± 1.41 | 56.21 ± 1.48 | 57.21 ± 1.48 |
| 1 | 62.5 | 300 | 55.78 ± 1.66 | 56.07 ± 1.32 | 57.03 ± 1.32 |
| 1 | 62.5 | 400 | 56.47 ± 1.91 | 57.01 ± 2.10 | 57.67 ± 2.10 |
| 1 | 62.5 | 500 | 56.72 ± 2.04 | 57.20 ± 2.14 | 58.61 ± 2.14 |
| 1 | 75 | 100 | 53.84 ± 1.67 | 53.31 ± 2.26 | 55.42 ± 2.26 |
| 1 | 75 | 200 | 54.72 ± 2.54 | 54.71 ± 2.78 | 57.18 ± 2.70 |
| 1 | 75 | 300 | 56.76 ± 1.89 | 57.17 ± 1.92 | 57.75 ± 1.92 |
| 1 | 75 | 400 | 56.39 ± 1.55 | 56.66 ± 1.70 | 56.92 ± 1.70 |
| 1 | 75 | 500 | 57.18 ± 0.79 | 57.42 ± 0.74 | 58.08 ± 0.74 |
| 2.5 | 50 | 100 | 53.62 ± 3.25 | 53.90 ± 3.24 | 55.33 ± 3.24 |
| 2.5 | 50 | 200 | 56.16 ± 2.03 | 56.47 ± 2.05 | 58.00 ± 2.05 |
| 2.5 | 50 | 300 | 58.07 ± 1.07 | 58.55 ± 0.94 | 59.38 ± 0.94 |
| 2.5 | 50 | 400 | 58.13 ± 1.23 | 58.30 ± 1.13 | 59.28 ± 1.13 |
| 2.5 | 50 | 500 | 57.05 ± 1.06 | 57.31 ± 1.32 | 59.14 ± 1.32 |
| 2.5 | 62.5 | 100 | 52.37 ± 4.54 | 52.84 ± 4.31 | 54.40 ± 4.31 |
| 2.5 | 62.5 | 200 | 57.79 ± 1.52 | 57.90 ± 1.82 | 59.00 ± 1.82 |
| 2.5 | 62.5 | 300 | 57.97 ± 1.20 | 58.35 ± 1.11 | 59.05 ± 1.11 |
| 2.5 | 62.5 | 400 | 57.64 ± 0.77 | 58.27 ± 0.43 | 59.15 ± 0.43 |
| 2.5 | 62.5 | 500 | 58.53 ± 0.40 | 58.86 ± 0.68 | 59.17 ± 0.68 |
| 2.5 | 75 | 100 | 52.34 ± 3.70 | 52.97 ± 3.70 | 53.31 ± 3.70 |
| 2.5 | 75 | 200 | 54.89 ± 4.85 | 55.21 ± 4.73 | 57.65 ± 4.73 |
| 2.5 | 75 | 300 | 58.34 ± 1.08 | 58.79 ± 1.19 | 60.14 ± 1.19 |
| 2.5 | 75 | 400 | 57.77 ± 1.54 | 58.29 ± 1.58 | 58.03 ± 1.58 |
| 2.5 | 75 | 500 | 57.85 ± 0.85 | 58.37 ± 1.20 | 58.74 ± 1.20 |
| 5 | 50 | 100 | 54.58 ± 1.57 | 54.80 ± 2.18 | 55.06 ± 2.18 |
| 5 | 50 | 200 | 57.03 ± 3.61 | 57.44 ± 3.75 | 58.31 ± 3.75 |
| 5 | 50 | 300 | 58.31 ± 0.91 | 58.85 ± 0.88 | 59.39 ± 0.88 |
| 5 | 50 | 400 | 59.56 ± 1.02 | 60.11 ± 0.96 | 59.96 ± 0.96 |
| 5 | 50 | 500 | 59.89 ± 0.70 | 60.14 ± 0.80 | 59.68 ± 0.80 |
| 5 | 62.5 | 100 | 55.53 ± 1.77 | 55.78 ± 1.88 | 54.84 ± 1.88 |
| 5 | 62.5 | 200 | 56.90 ± 1.78 | 57.27 ± 1.85 | 58.05 ± 1.85 |
| 5 | 62.5 | 300 | 60.03 ± 1.24 | 60.48 ± 1.09 | 60.24 ± 1.09 |
| 5 | 62.5 | 400 | 58.68 ± 0.71 | 59.83 ± 0.54 | 60.09 ± 0.54 |
| 5 | 62.5 | 500 | 59.66 ± 0.96 | 60.59 ± 0.76 | 60.33 ± 0.76 |
| 5 | 75 | 100 | 55.61 ± 0.69 | 55.66 ± 0.55 | 54.60 ± 0.55 |
| 5 | 75 | 200 | 57.90 ± 1.39 | 58.30 ± 1.44 | 57.69 ± 1.44 |
| 5 | 75 | 300 | 59.28 ± 1.25 | 59.87 ± 1.17 | 59.37 ± 1.17 |
| 5 | 75 | 400 | 59.16 ± 1.38 | 59.46 ± 1.47 | 59.82 ± 1.47 |
| 5 | 75 | 500 | 59.69 ± 0.67 | 60.10 ± 0.24 | 60.15 ± 0.24 |

3.2.3 Atari Breakout: LM-SNN results

We used the lattice map spiking neural network (LM-SNN) architecture to jointly learn a clustering of filters and classify the Atari Breakout paired frame-action dataset. We fix theta+ = 5.0 and c_norm = 62.5, and vary n_neurons, c_start (the starting inhibition level), and p_low (the proportion of training during which the inhibition equals c_start). As usual, after n_low = p_low * n_examples training examples, the inhibition is switched to that of the baseline SNN [14] and remains at this level for the rest of the training and test phases. Examples of learned filters and corresponding label assignments from a network with p_low = 1.0 are plotted (with contrast levels adjusted for ease of viewing) in Figure 10.
Some filters remain relatively unsettled, in that they remain close to their initialization weights (chosen uniformly at random in [0, 1], then normalized to sum to c_norm), while others learn an in-game configuration corresponding to certain frames from the training data. Some clustering of ball and paddle configurations is evident, but due to the nature of the data, this is difficult to inspect visually. On the other hand, inspection of the label assignments shows a clustering of certain class labels, especially "fire" and "right", and to a lesser extent "left", while "no-op" remains scattered throughout. The visualization of the class labels reveals a second problem: although the dataset has been rebalanced, certain classes are clearly preferred in the neuron labeling.

Figure 10: Example learned filters and corresponding class assignments from a two-level inhibition SNN with 400 output neurons and p_low = 1.0. Some filters remain unsettled, but many have learned a configuration of the ball and paddle. Though difficult to see, some filters appear to have clustered together on the basis of the similarity of their ball and paddle configurations. The visualized assignments reveal a degree of class label clustering, as well as the unbalanced learned representation of the training data.

Test classification results for a number of hyper-parameters are gathered in Table 6. For each setting of hyper-parameters, 5 experiments are run with different random seeds, and we report the average accuracy along with a single standard deviation. Notice that, for p_low = 0.0, the test results are the same across values of c_start, since the network spends no training time in the low-inhibition phase. The general trend is that test results tend to increase with n_neurons and decrease with p_low, although in some cases p_low = 1.0 outperforms p_low in (0, 1). The n-gram classification method tends to perform better than the proportion weighting method, which tends to outperform the all activity method.
Table 6: Test performance results on the Atari Breakout game frames classification task using the two-level inhibition modified SNN architecture. Besides the reported varied hyper-parameters, all others are kept constant. 5 separate experiments are run, and average test results are reported along with plus or minus a single standard deviation.

| n_neurons | c_start | p_low | all activity | proportion weighting | n-gram |
| --- | --- | --- | --- | --- | --- |
| 100 | 0.5 | 0.00 | 55.78 ± 0.23 | 56.38 ± 0.05 | 56.47 ± 0.80 |
| 100 | 0.5 | 0.25 | 48.90 ± 1.73 | 48.85 ± 1.73 | 50.22 ± 0.35 |
| 100 | 0.5 | 0.50 | 44.86 ± 1.84 | 44.78 ± 1.84 | 48.38 ± 1.76 |
| 100 | 0.5 | 1.00 | 36.41 ± 3.01 | 36.51 ± 3.19 | 37.01 ± 2.57 |
| 100 | 1.0 | 0.00 | 55.83 ± 2.41 | 56.35 ± 2.60 | 56.22 ± 1.32 |
| 100 | 1.0 | 0.25 | 50.41 ± 1.02 | 50.38 ± 1.08 | 50.15 ± 0.87 |
| 100 | 1.0 | 0.50 | 43.21 ± 3.47 | 43.20 ± 3.47 | 47.00 ± 3.07 |
| 100 | 1.0 | 1.00 | 38.97 ± 2.83 | 38.85 ± 2.75 | 38.54 ± 2.31 |
| 100 | 2.5 | 0.00 | 55.88 ± 2.79 | 56.32 ± 3.00 | 56.19 ± 1.52 |
| 100 | 2.5 | 0.25 | 49.34 ± 1.38 | 49.30 ± 1.21 | 50.14 ± 0.85 |
| 100 | 2.5 | 0.50 | 45.19 ± 1.42 | 45.10 ± 1.45 | 46.91 ± 2.70 |
| 100 | 2.5 | 1.00 | 42.49 ± 3.83 | 43.08 ± 3.19 | 41.08 ± 1.99 |
| 200 | 0.5 | 0.00 | 56.24 ± 2.57 | 56.45 ± 2.95 | 58.35 ± 1.61 |
| 200 | 0.5 | 0.25 | 50.46 ± 0.85 | 50.40 ± 0.84 | 50.70 ± 0.87 |
| 200 | 0.5 | 0.50 | 49.45 ± 1.17 | 49.39 ± 1.24 | 50.13 ± 0.88 |
| 200 | 0.5 | 1.00 | 40.03 ± 1.81 | 39.84 ± 2.52 | 40.57 ± 1.22 |
| 200 | 1.0 | 0.00 | 56.24 ± 2.57 | 56.45 ± 2.95 | 58.35 ± 1.61 |
| 200 | 1.0 | 0.25 | 49.92 ± 1.58 | 49.95 ± 1.42 | 50.33 ± 0.74 |
| 200 | 1.0 | 0.50 | 50.15 ± 0.82 | 50.11 ± 0.83 | 50.30 ± 0.38 |
| 200 | 1.0 | 1.00 | 40.93 ± 2.90 | 41.58 ± 2.90 | 39.59 ± 1.48 |
| 200 | 2.5 | 0.00 | 56.24 ± 2.57 | 56.45 ± 2.95 | 58.35 ± 1.61 |
| 200 | 2.5 | 0.25 | 51.26 ± 1.10 | 51.36 ± 1.34 | 50.67 ± 1.68 |
| 200 | 2.5 | 0.50 | 49.75 ± 1.55 | 49.98 ± 1.79 | 49.60 ± 0.81 |
| 200 | 2.5 | 1.00 | 50.62 ± 0.93 | 51.02 ± 1.24 | 49.90 ± 0.52 |
| 300 | 0.5 | 0.00 | 57.77 ± 1.41 | 58.19 ± 1.45 | 58.36 ± 1.55 |
| 300 | 0.5 | 0.25 | 51.29 ± 1.17 | 51.26 ± 1.12 | 52.21 ± 1.36 |
| 300 | 0.5 | 0.50 | 49.81 ± 1.75 | 49.90 ± 1.56 | 50.50 ± 1.05 |
| 300 | 0.5 | 1.00 | 44.73 ± 3.51 | 45.03 ± 3.23 | 43.46 ± 2.33 |
| 300 | 1.0 | 0.00 | 57.77 ± 1.41 | 58.19 ± 1.45 | 58.36 ± 1.55 |
| 300 | 1.0 | 0.25 | 50.57 ± 1.53 | 50.63 ± 1.58 | 51.96 ± 0.98 |
| 300 | 1.0 | 0.50 | 48.53 ± 0.86 | 48.73 ± 0.85 | 50.43 ± 0.18 |
| 300 | 1.0 | 1.00 | 48.20 ± 5.44 | 48.44 ± 5.29 | 46.10 ± 3.66 |
| 300 | 2.5 | 0.00 | 57.77 ± 1.41 | 58.19 ± 1.45 | 58.36 ± 1.55 |
| 300 | 2.5 | 0.25 | 52.64 ± 1.36 | 52.79 ± 1.45 | 52.46 ± 1.29 |
| 300 | 2.5 | 0.50 | 52.49 ± 1.50 | 52.68 ± 1.67 | 51.15 ± 0.69 |
| 300 | 2.5 | 1.00 | 53.55 ± 0.65 | 53.74 ± 0.47 | 52.11 ± 0.54 |
| 400 | 0.5 | 0.00 | 57.13 ± 0.80 | 57.90 ± 1.17 | 58.93 ± 0.64 |
| 400 | 0.5 | 0.25 | 51.90 ± 1.15 | 51.84 ± 1.18 | 52.13 ± 1.77 |
| 400 | 0.5 | 0.50 | 50.41 ± 0.67 | 50.58 ± 0.69 | 51.65 ± 0.70 |
| 400 | 0.5 | 1.00 | 43.83 ± 3.06 | 43.77 ± 2.44 | 45.30 ± 2.33 |
| 400 | 1.0 | 0.00 | 57.13 ± 0.80 | 57.90 ± 1.17 | 58.93 ± 0.64 |
| 400 | 1.0 | 0.25 | 51.43 ± 1.23 | 51.40 ± 1.20 | 52.84 ± 2.30 |
| 400 | 1.0 | 0.50 | 50.59 ± 1.78 | 50.89 ± 1.86 | 50.92 ± 1.10 |
| 400 | 1.0 | 1.00 | 50.88 ± 1.99 | 51.26 ± 1.93 | 49.72 ± 1.57 |
| 400 | 2.5 | 0.00 | 57.13 ± 0.80 | 57.90 ± 1.17 | 58.93 ± 0.64 |
| 400 | 2.5 | 0.25 | 52.82 ± 2.39 | 52.90 ± 2.39 | 54.47 ± 1.64 |
| 400 | 2.5 | 0.50 | 52.43 ± 2.64 | 52.63 ± 2.71 | 51.78 ± 1.22 |
| 400 | 2.5 | 1.00 | 53.13 ± 1.80 | 53.42 ± 1.68 | 52.96 ± 1.90 |

4 Conclusions and Future Work

We have introduced a lattice map spiking neural network (LM-SNN) model with a modified STDP learning rule and a biologically inspired decision-making mechanism. The feasibility of using LM-SNNs to solve actual machine learning tasks is demonstrated on two datasets: the MNIST handwritten digit image data, and a collection of snapshots of the Atari Breakout computer game gathered from a successfully trained Deep Q-Learning network. The introduced model generalizes the previous state of the art and demonstrates several advantages, as follows:

• The learning algorithm in the LM-SNN manifests an unsupervised learning scheme with self-organization, classification, and clustering properties, related to Kohonen SOMs. It is shown that we can control the smoothness, or fragmentation, of the produced self-organized filter maps by dynamically tuning the model parameters.
This smoothing effect helps to interpret the operation of the filters, and it is especially significant for the inhibitory connection strengths and strategies, such as incremental changes in inhibition level, as well as switching inhibition from low to high values. Moreover, these results point to mechanisms which may play a role in the spontaneous formation of receptive fields observed in biological systems.

• We demonstrated that the LM-SNN model can learn fast, which is very advantageous for rapid decision-making, possibly for online learning, in dynamically changing environments. Namely, the LM-SNN reaches reasonable accuracy levels within a fraction of the learning time and using far fewer training examples than required by traditional SNNs. To achieve such performance, we utilize the timing information in the spike patterns of the activity using the n-gram scheme, which is able to use the pattern and timing of the first individual spikes in order to make predictions without waiting for a fixed amount of information. In addition to its ability to serve as an image classifier with clustering ability, the algorithm in this paper also shows the feasibility of serving as a controller for real-time tasks like the Atari Breakout game.

• Our approach demonstrated robustness and graceful degradation in performance in the event of missing inputs. This is in stark contrast with deep neural networks, which show extreme sensitivity and catastrophic degradation of performance with changing data properties. Specifically, the performance of the LM-SNN degraded only slightly even when about 50% of the pixels were missing, and the performance was maintained above random choice even with 90% of the pixels missing.

Future work will utilize the clustering ability of the excitatory layer for enhancing and scaling up this model to a multi-layer network. It is anticipated that by stacking up layers, the network would be able to cluster abstract features in the first layers and specialize the clustering features in deeper layers. Continued development of neuron labeling and classification strategies may play an important role in evaluating trained SNN activity; i.e., defining what network output entails in relation to categorical membership of the input data. Moreover, one can use this model in real-time reinforcement learning by rewarding certain pattern activity in the classifier to manipulate the desired outcome.

Using spiking neurons has significant potential long-term benefits. With the proliferation of neuromorphic hardware platforms, the implementation of SNNs can become very efficient and cheap. Scaling up this technology is more cost-effective in terms of energy and computational speed compared to traditional neural networks. Moreover, the operation of spiking neurons is one step closer to biological neurons. Understanding how to compute with spiking neurons can give us insight for better understanding computation in biological systems under normal and possibly abnormal conditions.

Acknowledgements

This work has been supported in part by Defense Advanced Research Projects Agency Grant DARPA/MTO HR0011-16-l-0006 and by National Science Foundation Grant NSF-CRCNS-DMS-13-11165. The support of Devdhar Patel in preparing the Atari benchmark dataset is greatly appreciated.

References
