Training a Probabilistic Graphical Model with Resistive Switching Electronic Synapses
📝 Abstract
Current large scale implementations of deep learning and data mining require thousands of processors, massive amounts of off-chip memory, and consume gigajoules of energy. Emerging memory technologies such as nanoscale two-terminal resistive switching memory devices offer a compact, scalable, and low power alternative that permits on-chip co-located processing and memory in a fine-grained distributed parallel architecture. Here we report the first use of resistive switching memory devices for implementing and training a Restricted Boltzmann Machine (RBM), a generative probabilistic graphical model that is a key component for unsupervised learning in deep networks. We experimentally demonstrate a 45-synapse RBM realized with 90 resistive switching phase change memory (PCM) elements, trained with a bio-inspired variant of the Contrastive Divergence (CD) algorithm that implements Hebbian and anti-Hebbian weight updates. The resistive PCM devices show a two-fold to ten-fold reduction in error rate on a missing pixel pattern completion task trained over 30 epochs, compared to the untrained case. Measured programming energy consumption is 6.1 nJ per epoch with the resistive switching PCM devices, a factor of ~150 lower than conventional processor-memory systems. We analyze and discuss the dependence of learning performance on cycle-to-cycle variations as well as the number of gradual levels in the PCM analog memory devices.
📄 Content
Abstract— Current large scale implementations of deep learning and data mining require thousands of processors, massive amounts of off-chip memory, and consume gigajoules of energy. Emerging memory technologies such as nanoscale two-terminal resistive switching memory devices offer a compact, scalable, and low power alternative that permits on-chip co-located processing and memory in a fine-grained distributed parallel architecture. Here we report the first use of resistive switching memory devices for implementing and training a Restricted Boltzmann Machine (RBM), a generative probabilistic graphical model that is a key component for unsupervised learning in deep networks. We experimentally demonstrate a 45-synapse RBM realized with 90 resistive switching phase change memory (PCM) elements, trained with a bio-inspired variant of the Contrastive Divergence (CD) algorithm that implements Hebbian and anti-Hebbian weight updates. The resistive PCM devices show a two-fold to ten-fold reduction in error rate on a missing pixel pattern completion task trained over 30 epochs, compared to the untrained case. Measured programming energy consumption is 6.1 nJ per epoch with the resistive switching PCM devices, a factor of ~150 lower than conventional processor-memory systems. We analyze and discuss the dependence of learning performance on cycle-to-cycle variations as well as the number of gradual levels in the PCM analog memory devices.
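The training scheme named above, Contrastive Divergence with a Hebbian (data-driven) phase and an anti-Hebbian (reconstruction-driven) phase, can be sketched in a few lines. The sketch below is illustrative only: the network size matches the paper's 45 synapses (9 visible × 5 hidden units), but the quantized update step, the number of conductance levels, the omission of bias units, and all other details are assumptions for illustration, not the paper's actual PCM programming scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class QuantizedRBM:
    """Tiny Bernoulli RBM trained with one-step Contrastive Divergence (CD-1).

    Each weight update is rounded to an integer number of fixed conductance
    steps, loosely mimicking the limited number of gradual analog levels in
    a PCM synapse. Hypothetical sketch: sizes, levels, and step are assumed.
    """
    def __init__(self, n_visible, n_hidden, n_levels=33, w_max=1.0):
        self.W = np.zeros((n_visible, n_hidden))
        # Symmetric conductance grid: n_levels levels spanning [-w_max, w_max].
        self.step = 2.0 * w_max / (n_levels - 1)
        self.w_max = w_max

    def _sample(self, p):
        # Draw binary unit states from Bernoulli probabilities p.
        return (rng.random(p.shape) < p).astype(float)

    def cd1_update(self, v0):
        # Positive (Hebbian) phase: hidden activations driven by the data.
        ph0 = sigmoid(v0 @ self.W)
        h0 = self._sample(ph0)
        # Negative (anti-Hebbian) phase: driven by the reconstruction.
        pv1 = sigmoid(h0 @ self.W.T)
        v1 = self._sample(pv1)
        ph1 = sigmoid(v1 @ self.W)
        # CD-1 gradient: <v h>_data - <v h>_reconstruction.
        grad = v0[:, None] * ph0[None, :] - v1[:, None] * ph1[None, :]
        # Quantize the update to whole conductance steps, clip to the range.
        dW = np.round(grad / self.step) * self.step
        self.W = np.clip(self.W + dW, -self.w_max, self.w_max)

# 9 visible x 5 hidden = 45 synapses, matching the paper's network size.
rbm = QuantizedRBM(n_visible=9, n_hidden=5)
pattern = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
for _ in range(30):  # 30 epochs, as in the reported experiment
    rbm.cd1_update(pattern)
```

Because every update is a multiple of the conductance step and the weights start at zero, the weights remain on the quantized grid throughout training; shrinking `n_levels` coarsens that grid, which is one way to explore the dependence of learning on the number of gradual levels discussed in the abstract.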
Index Terms—neuromorphic computing, phase change memory, resistive memory, brain-inspired hardware, cognitive computing
I. INTRODUCTION
Deep learning can extract complex and useful structures within high-dimensional data without requiring significant amounts of manual feature engineering [1]. It has made significant advances in recent years and has been shown to outperform many other machine learning techniques for a variety of tasks such as image recognition, speech recognition, natural language understanding, predicting the effects of mutations in DNA, and reconstructing brain circuits [2].

This work is supported in part by SONIC, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, the NSF Expedition on Computing (Visual Cortex on Silicon, award 1317470), and the member companies of the Stanford Non-Volatile Memory Technology Research Initiative (NMTRI) and the Stanford SystemX Alliance.
S.B. Eryilmaz and H.-S.P. Wong are with the Electrical Engineering Department, Stanford University, Stanford, CA 94305 USA (e-mail: eryilmaz@stanford.edu; hspwong@stanford.edu).
E. Neftci is with the Department of Cognitive Sciences, UC Irvine, Irvine, CA 92697 USA (e-mail: eneftci@uci.edu).
S. Joshi is with the Department of Electrical and Computer Engineering, UC San Diego, San Diego, CA 92093 USA (e-mail: sijoshi@eng.ucsd.edu).
S. Kim, M. BrightSky, and C. Lam are with IBM Research, Yorktown Heights, NY 10598 USA (e-mail: SangBum.Kim@us.ibm.com; breitm@us.ibm.com; clam@us.ibm.com).
H.-L. Lung is with Macronix International Co., Ltd., Emerging Central Lab, Taiwan (e-mail: Sllung@mxic.com.tw).
G. Cauwenberghs is with the Department of Bioengineering, UC San Diego, San Diego, CA 92093 USA (e-mail: gert@ucsd.edu).
However, training large scale deep networks (~10^9 synapses, compared to ~10^15 synapses in the human brain) on today's hardware consumes more than 10 gigajoules (estimated) of energy [3-4]. An important source of this energy consumption is the physical separation of processing and memory, which is exacerbated by the large amounts of data needed for training deep networks [1-5]. It has been reported that ~40 percent of the energy consumed in general purpose computers is due to the off-chip memory hierarchy [6], and this fraction will increase as applications become more data-centric [7]. GPUs do not solve this problem, since up to 50 percent of dynamic power and 30 percent of overall power are consumed by off-chip memory, as shown in several benchmarks [8]. On-chip SRAM does not solve the problem either, since it is very area inefficient (> 100 F^2, F being the minimum half-pitch allowed by the considered lithography) and cannot scale up with system size.
Extracting useful information from data, which requires efficient data mining and (deep) learning algorithms, is becoming increasingly common in consumer products such as smartphones, and is expected to be even more important for the internet-of-things (IoT) [9], where energy efficiency is especially crucial. To scale up these systems in an energy efficient manner, it is necessary to develop new learning algorithms and hardware architectures that can capitalize on fine-grained on-chip integration of memory with computation. Because the number of synapses in a neural network far exceeds the number of neurons, we must pay special attention to the power, device density, and wiring of the electronic synapses, f