A Guide to Teaching Data Science
Stephanie C. Hicks (1,2), Rafael A. Irizarry (1,2)
(1) Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA
(2) Department of Biostatistics, Harvard School of Public Health, Boston, MA
Emails: Stephanie C. Hicks, shicks@jimmy.harvard.edu; Rafael A. Irizarry, rafa@jimmy.harvard.edu
Abstract
Demand for data science education is surging, and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the statistics curriculum. The unifying recommendation is that computing should play a more prominent role. We strongly agree with this recommendation, but contend that the main priority is to bring applications to the forefront, as proposed by Nolan and Speed (1999). We also argue that the individuals tasked with developing data science courses should have not only statistical training but also experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuch (1999), and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. Statisticians who want to gain more practical knowledge about data science before teaching an introductory course can also use this guide.
Keywords: data science, applied statistics, teaching principles, active learning, reproducibility, computing
1 INTRODUCTION
1.1 What do we mean by Data Science?
The term data science is used differently in different contexts, since the needs of data-driven enterprises are varied and include the acquisition, management, processing, and interpretation of data, as well as the communication of insights and the development of software to perform these tasks. Michael Hochster [1] defined two broad categories of data scientists: “Type A [for ‘Analysis’] Data Scientist” and “Type B [for ‘Building’] Data Scientist”. Type A is “similar to a statistician… but knows all the practical details of working with data that aren’t taught in the statistics curriculum”, whereas Type B data scientists are “very strong coders … may be trained as software engineers and mainly interested in using data ‘in production’”. Here we use the term data science to refer to the work of Type A data scientists, who process and interpret data to answer real-world questions. We make no recommendations regarding the training of Type B data scientists, as we view this task as better suited to engineering or computer science departments.
1.2 Why are statistics departments [2] a natural home for Data Science in Academia?
Current successful data science education initiatives in academia have resulted from the combined efforts of several departments. Here we argue that statistics departments should be part of these collaborations. The statistics discipline was born directly from the endeavour most commonly associated with data science: data processing and interpretation as they pertain to answering real-world questions. Most of the principles, frameworks, and methodologies that make up this discipline were originally developed as solutions to practical problems.
[1] https://www.quora.com/What-is-data-science
[2] We include biostatistics departments.
Furthermore, one would be hard-pressed to find a successful data analysis by a modern data scientist that is not grounded, in some form or another, in a statistical principle or method. Concepts such as inference, modelling, and data visualization are an integral part of the toolbox of the modern data scientist. Wild and Pfannkuch (1999) describe applied statistics as:
“part of the information gathering and learning process which, in an ideal world, is undertaken to inform decisions and actions. With industry, medicine and many other sectors of society increasingly relying on data for decision making, statistics should be an integral part of the emerging information era”.
A department that embraces applied statistics as defined above is a natural home for data science in academia. For a broader summary of current discussions in the statistical literature on how past contributions of the field have influenced today’s data science, we refer the reader to Supplementary Section 1.
1.3 What is missing in the current Statistics curriculum? Creating, Computing, Connecting
Despite important subject matter insights and the discipline’s applied roots, research in current
academic statistics departmen