A Guide to Teaching Data Science

February 23, 2026

📝 Abstract

Demand for data science education is surging, and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the statistics curriculum. The unifying recommendation is that computing should play a more prominent role. We strongly agree with this recommendation, but argue that the main priority should be to bring applications to the forefront, as proposed by Nolan and Speed (1999). We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuch (1999), and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. This guide can also be used by statisticians wanting to gain more practical knowledge about data science before embarking on teaching an introductory course.

📄 Content

Stephanie C. Hicks (1,2), Rafael A. Irizarry (1,2)
(1) Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA
(2) Department of Biostatistics, Harvard School of Public Health, Boston, MA

Emails: Stephanie C. Hicks, shicks@jimmy.harvard.edu; Rafael A. Irizarry, rafa@jimmy.harvard.edu

Keywords: data science, applied statistics, teaching principles, active learning, reproducibility, computing

INTRODUCTION

1.1 What do we mean by Data Science?
The term data science is used differently in different contexts, since the needs of data-driven enterprises are varied and include acquisition, management, processing, and interpretation of data, as well as communication of insights and development of software to perform these tasks. Michael Hochster defined two broad categories of data scientists: “Type A [for ‘Analysis’] Data Scientist” and “Type B [for ‘Building’] Data Scientist”, where Type A is “similar to a statistician… but knows all the practical details of working with data that aren’t taught in the statistics curriculum” and Type B data scientists are “very strong coders … may be trained as software engineers and mainly interested in using data ‘in production’” [1]. Here we use the term data science to refer generally to the work of Type A data scientists, who process and interpret data in order to answer real-world questions. We make no recommendations regarding the training of Type B data scientists, as we view this as a task better suited to engineering or computer science departments.

1.2 Why are statistics departments [2] a natural home for Data Science in academia?
Current successful data science education initiatives in academia have resulted from the combined efforts of different departments. Here we argue that statistics departments should be part of these collaborations. The statistics discipline was born directly from the endeavour most commonly associated with data science: data processing and interpretation as it pertains to answering real-world questions. Most of the principles, frameworks and methodologies that make up this discipline were originally developed as solutions to practical problems. Furthermore, one would be hard pressed to find a successful data analysis by a modern data scientist that is not grounded, in some form or another, in some statistical principle or method. Concepts such as inference, modelling, and data visualization are an integral part of the toolbox of the modern data scientist. Wild and Pfannkuch (1999) describe applied statistics as: “part of the information gathering and learning process which, in an ideal world, is undertaken to inform decisions and actions. With industry, medicine and many other sectors of society increasingly relying on data for decision making, statistics should be an integral part of the emerging information era”.
A department that embraces applied statistics as defined above is a natural home for data science in academia. For a fuller summary of the current discussions in the statistical literature describing how past contributions of the field have influenced today’s data science, we refer the reader to Supplementary Section 1.

[1] https://www.quora.com/What-is-data-science
[2] We include biostatistics departments.

1.3 What is missing in the current Statistics curriculum? Creating, Computing, Connecting
Despite important subject matter insights and the discipline’s applied roots, research in current academic statistics departmen

This content is AI-processed based on ArXiv data.
