The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design
📝 Abstract
The past decade has seen a remarkable series of advances in machine learning, and in particular deep learning approaches based on artificial neural networks, that have improved our ability to build more accurate systems across a broad range of areas, including computer vision, speech recognition, language translation, and natural language understanding tasks. This paper is a companion to a keynote talk at the 2020 International Solid-State Circuits Conference (ISSCC) discussing some of the advances in machine learning and their implications for the kinds of computational devices we need to build, especially in the post-Moore’s Law era. It also discusses some of the ways that machine learning may be able to help with aspects of the circuit design process. Finally, it provides a sketch of at least one interesting direction towards much larger-scale multi-task models that are sparsely activated and employ much more dynamic, example- and task-based routing than the machine learning models of today.
📄 Content
Jeffrey Dean
Google Research
jeff@google.com

Introduction
The past decade has seen a remarkable series of advances in machine learning (ML), and in particular deep learning approaches based on artificial neural networks, that have improved our ability to build more accurate systems across a broad range of areas [LeCun et al. 2015]. Major areas of significant advances include computer vision [Krizhevsky et al. 2012, Szegedy et al. 2015, He et al. 2016, Real et al. 2017, Tan and Le 2019], speech recognition [Hinton et al. 2012, Chan et al. 2016], language translation [Wu et al. 2016] and other natural language tasks [Collobert et al. 2011, Mikolov et al. 2013, Sutskever et al. 2014, Shazeer et al. 2017, Vaswani et al. 2017, Devlin et al. 2018]. The machine learning research community has also been able to train systems to accomplish some challenging tasks by learning from interacting with environments, often using reinforcement learning, showing success and promising advances in areas such as playing the game of Go [Silver et al. 2017], playing video games such as Atari games [Mnih et al. 2013, Mnih et al. 2015] and StarCraft [Vinyals et al. 2019], accomplishing robotics tasks such as substantially improved grasping for unseen objects [Levine et al. 2016, Kalashnikov et al. 2018], emulating observed human behavior [Sermanet et al. 2018], and navigating complex urban environments using autonomous vehicles [Angelova et al. 2015, Bansal et al. 2018].
As an illustration of the dramatic progress in the field of computer vision, Figure 1 shows a graph of the improvement over time for the ImageNet challenge, an annual contest run by Stanford University [Deng et al. 2009] in which contestants are given a training set of one million color images across 1000 categories, and then use this data to train a model to generalize to an evaluation set of images across the same categories. In 2010 and 2011, prior to the use of deep learning approaches in this contest, the winning entrants used hand-engineered computer vision features and the top-5 error rate was above 25%. In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used a deep neural network, commonly referred to as “AlexNet”, to take first place in the contest with a major reduction in the top-5 error rate to 16% [Krizhevsky et al. 2012]. Their team was the only team that used a neural network in 2012. The next year, the deep learning computer vision revolution was in full force, with the vast majority of entries coming from teams using deep neural networks, and the winning error rate again dropped substantially, to 11.7%. A careful study by Andrej Karpathy found that human error on this task is just above 5% if the person practices for ~20 hours, or 12% for a different person who practiced for just a few hours [Karpathy 2014]. From 2011 to 2017, the winning ImageNet error rate dropped sharply, from 26% to 2.3%.
Figure 1: ImageNet classification contest winner accuracy over time
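The top-5 error rate quoted for the ImageNet results counts a prediction as correct when the true label appears among the model's five highest-scoring categories. As a minimal sketch of that metric, assuming per-class scores are available as a NumPy array (the function name `top_k_error` and its arguments are illustrative and do not come from the paper or the contest tooling):

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k highest scores.

    scores: array of shape (num_examples, num_classes), higher means more likely.
    labels: array of shape (num_examples,), integer class index per example.
    Assumes k <= num_classes.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row; order inside the top k is irrelevant.
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    # An example is a hit if its true label appears anywhere in its top-k set.
    hit = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

# Toy data with 6 classes and 2 examples (illustrative values only).
toy_scores = np.array([
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.0],  # true class 5 has the lowest score
    [0.9, 0.1, 0.2, 0.3, 0.4, 0.5],  # true class 0 has the highest score
])
toy_labels = np.array([5, 0])
toy_error = top_k_error(toy_scores, toy_labels, k=5)
```

With `k=1` the same function gives the ordinary (top-1) classification error.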
These advances in fundamental areas like computer vision, speech recognition, language understanding, and large-scale reinforcement learning have dramatic implications for many fields. We have seen a steady series of results in many different fields of science and medicine, obtained by applying the basic research results generated over the past decade to these problem areas. Examples include promising results on medical imaging diagnostic tasks, including diabetic retinopathy [Gulshan et al. 2016, Krause et al. 2018], breast cancer pathology [Liu et al. 2017], lung cancer CT scan interpretation [Ardila et al. 2019], and dermatology [Esteva et al. 2017]. Sequential prediction methods that are useful for language