Reflections on the Future of Statistics Education in a Technological Era
Authors: Craig Alexander, Jennifer Gaskell, Vinny Davies
Craig Alexander¹,*,†, Jennifer Gaskell¹, Vinny Davies¹
¹ School of Mathematics and Statistics, University of Glasgow
* Corresponding author email: Craig.Alexander.2@Glasgow.ac.uk
† All authors contributed equally to this work.

Abstract
Keeping pace with rapidly evolving technology is a key challenge in teaching statistics. To equip students with essential skills for the modern workplace, educators must integrate relevant technologies into the statistical curriculum where possible. University-level statistics education has experienced substantial technological change, particularly in the tools and practices that underpin teaching and learning. Statistical programming has become central to many courses, with R widely used and Python increasingly incorporated into statistics and data analytics programmes. Additionally, coding practices, database management, and machine learning now feature within some statistics curricula. Looking ahead, we anticipate a growing emphasis on artificial intelligence (AI), particularly the pedagogical implications of generative AI tools such as ChatGPT. In this article, we explore these technological developments and discuss strategies for their integration into contemporary statistics education.

1 Introduction
The way we teach is constantly evolving, and this is especially true in the statistical sciences. Advances in computing power and data storage have significantly expanded our ability to collect and analyse data. As methods and models continue to develop to accommodate increasingly large and complex datasets, the discipline is shifting towards approaches that were not traditionally considered part of statistics, necessitating a deeper understanding of computational thinking and data-driven workflows.
Broadly categorised under ‘data science’, this field encompasses Statistics, Machine Learning (ML), and Artificial Intelligence (AI), with the distinctions between them often blurred (Diggle, 2015; Bzdok et al., 2018). The rapid expansion of these techniques raises important questions about how best to integrate them into university-level statistical curricula (Hardin et al., 2015).

Statistics education has always adapted to changes in technology and practice. While many core statistical principles remain largely unchanged over long periods, other parts of the curriculum have evolved considerably. Earlier shifts were strongly linked to the move towards modern programming languages, particularly the adoption of R for statistical computing. More recently, however, these changes have extended beyond programming languages themselves. New data types, evolving coding practices, and increasingly complex modelling approaches are reshaping how statistics is applied in both research and industry.

In this article, we consider how these technological developments are influencing the statistics curriculum and how educators might respond to them within existing programmes. We do not focus on educational technologies such as live polling tools because, while important, they are not specific to statistics education and have been explored in detail elsewhere (Bond et al., 2020; Granić, 2022). We begin by discussing how programming practices have evolved, including the rise of R, the development of tidyverse workflows, and the ongoing debate around teaching base R, the tidyverse, or a combination of both. We then consider how Python might be introduced alongside R, including the potential benefits of multi-language teaching and the implications for student cognitive load.
The paper then moves to changes in data and coding practice, exploring how large or unstructured data sources, accessed through application programming interfaces (APIs) and web scraping, are influencing statistical work. We discuss how these topics might be integrated gradually across programmes rather than taught in isolation, alongside the growing role of version control and reproducible workflows as important skills for modern statisticians. Next, we examine how ML and AI may be incorporated into statistics curricula. We argue that the depth and extent of coverage should depend on graduate pathways and programme goals. In some cases, ML can be integrated into existing modules, while in others dedicated courses may be appropriate. More advanced AI content is likely to depend on the availability of educators with relevant expertise. Finally, we consider the rapid emergence of generative AI tools, discussing different attitudes to their use in teaching, how students are already engaging with them, and the implications for assessment, marking, and feedback. Throughout, we reflect on how these developments shape teaching practice and the challenges they present for educators.

This article is structured as follows. Section 2 examines developments in statistical programming, including the role of RStudio and the integration of Python within statistics curricula (Kenett et al., 2022). Section 3 discusses modern data sources, strategies for teaching data management, and best practices for sharing and storing code efficiently. Section 4 explores the integration of ML and AI into statistical education, and Section 5 assesses the impact of generative AI tools such as ChatGPT on statistical learning and assessment.
2 Programming within the Statistics Curriculum
In recent decades, advances in computing and the development of programming languages for statistics have transformed the way the modern statistician carries out analyses. Traditional statistical methods, once confined to manual calculations and limited software tools, have evolved into complex data-driven approaches supported by programming environments designed for efficiency, scalability, and reproducibility. Languages such as R, Python, and Julia have become essential for statisticians, offering extensive libraries for data analysis, visualisation, and ML. The development of such programming languages has helped to shape modern statistical practice, enabling researchers and analysts to process large datasets and develop innovative methodologies that were once considered impractical.

The development of statistical software began in the mid-20th century, when early computing machines were first used to automate statistical calculations. In the 1960s and 1970s, languages like FORTRAN were used to create some of the earliest bespoke statistical programs, such as SAS. Following these first examples, a growing range of specialised statistical software, such as SPSS and S-PLUS, became available to statisticians, building on earlier software, providing more user-friendly interfaces for data analysis, and expanding the suite of modelling techniques available to the user.

The emergence of R in 2000 revolutionised statistical computing by offering an open-source, extensible platform for statisticians. As free software, its accessibility and reach quickly surpassed those of many competitors, which often required a licence. As a command-line program, R provided greater flexibility in the range and scale of tasks it could perform, supported by a substantial ecosystem of external packages.
This flexibility gave it a clear advantage over menu-driven alternatives such as Minitab and SPSS. However, the shift towards R also required many educators to transition from more familiar software environments to a scripting-based language, presenting both technical and pedagogical challenges as staff adapted their teaching materials and approaches.

As R became established within academic and professional practice, it increasingly assumed a central role in statistics education. Among the available statistical software packages, it has emerged as a widely used language in modern curricula. As the demand for software-based modelling skills has grown, the question in many statistics programmes is no longer whether R should be taught, but how it should be integrated into the curriculum to best prepare students for the workplace.

While R remains central to contemporary statistical computing, its place in education continues to evolve. Many academic institutions now embed R programming across statistics modules, and its use has expanded beyond traditional disciplinary boundaries. In the following sections, we examine how the teaching of R has developed within higher education, considering its pedagogical value and approaches to curriculum design, and subsequently how Python can be integrated alongside R to support a broader computational skillset.

2.1 R in the Statistics Curriculum
As the use of R has grown across academia and industry, particularly alongside increased attention to ‘big data’ problems, the tools and facilities available to R users have expanded significantly. This demand has had a transformative effect on statistics education, with R programming now commonplace in many statistics curricula. In this context, many users choose to work with R through an integrated development environment (IDE), which provides a more supportive interface for writing, running, and managing code.
One of the most commonly used IDEs for R is RStudio. RStudio extends the functionality of base R by providing a graphical user interface (GUI), that is, a visual interface through which users interact with code and data, alongside features such as syntax highlighting, code auto-completion, and integrated data visualisation tools. These features help reduce the learning curve for beginners and support good coding practice. RStudio is available for all major operating systems, and its open-source nature removes financial barriers, allowing students to install and use the software freely on their own devices.

Presenting statistical software in an accessible and user-friendly manner is important for shaping students’ experiences of, and attitudes towards, statistics. Considerations of usability have therefore motivated many recent developments within both R and RStudio, several of which are discussed in the following sections. While these tools can significantly enhance teaching and learning, their effective integration into the curriculum depends on educators possessing, or developing, the relevant expertise to use them confidently and effectively in their teaching.

2.1.1 Tidyverse
One of the most widely used collections of packages within R is the tidyverse (Wickham et al., 2019). The underlying philosophy of this collection is to promote the practice of tidy data (Wickham, 2014), encouraging users to structure data in a consistent format from the outset to support a more efficient analysis workflow (Wickham et al., 2023b). The tidyverse itself comprises multiple packages designed to support the different stages of a statistical analysis, from data import and transformation to visualisation and modelling.
Its popularity has also contributed to increased use of R in disciplines beyond statistics, including psychology (Ryan, 2021), social science (Imai and Williams, 2022), and medicine (Musa et al., 2023).

The pedagogical advantages over base R stem from the unified design philosophy of the tidyverse, which prioritises consistency and clarity and simplifies concepts for beginners. Unlike base R, where functions can have inconsistent naming conventions and complex operations require nested syntax, the tidyverse offers a more coherent set of packages with clear naming conventions for functions (for example, filter(), mutate()), helping to reduce the cognitive load for learners. Central to the tidyverse is the pipe operator, which replaces base R’s nested function calls with step-by-step workflows. For example, an operation that transforms variables in a dataset, which may require several nested function calls in base R, becomes a logical sequence of verbs in the tidyverse, reflecting how learners naturally think through problems. This not only improves code readability but also encourages modular problem solving, breaking analyses into smaller, interpretable steps (Johnson, 1997).

Visualisation is also simplified within the tidyverse through the ggplot2 package. Although base R plotting functions (e.g. plot(), hist()) are flexible, they lack the layered grammar-of-graphics framework of ggplot2. By building plots incrementally, adding aesthetics, geometries and themes, learners develop a structured understanding of data visualisation, similar to constructing sentences in a language (Wickham, 2011).

Although there are clear pedagogical benefits to teaching R using the tidyverse, the question arises whether we should teach only tidyverse functionality, only base R, or a hybrid of both.
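Before turning to that question, the contrast between nested calls and a pipeline can be made concrete. The discussion above concerns R, but to keep a single language for the sketches in this article the example below uses Python. The `pipe` helper and the verbs `filter_rows`, `mutate` and `select` are hypothetical stand-ins written for illustration; they mimic the style of the tidyverse verbs but are not the tidyverse functions or any pandas API. The same three steps are written once inside-out and once as a pipeline, and both give the same result.

```python
from functools import reduce


def pipe(data, *steps):
    """Feed `data` through each step in turn, left to right (like R's |> or %>%)."""
    return reduce(lambda acc, step: step(acc), steps, data)


# Hypothetical, tidyverse-flavoured verbs acting on a list of dict "rows".
def filter_rows(predicate):
    return lambda rows: [r for r in rows if predicate(r)]


def mutate(name, fn):
    return lambda rows: [{**r, name: fn(r)} for r in rows]


def select(*cols):
    return lambda rows: [{c: r[c] for c in cols} for r in rows]


students = [
    {"id": 1, "score": 60, "max": 80},
    {"id": 2, "score": 35, "max": 80},
    {"id": 3, "score": 72, "max": 80},
]

# Nested style: the first step (filtering) is buried innermost, so it is read last.
nested = select("id", "pct")(
    mutate("pct", lambda r: round(100 * r["score"] / r["max"]))(
        filter_rows(lambda r: r["score"] >= 40)(students)
    )
)

# Pipeline style: one verb per line, read in the order the steps are applied.
piped = pipe(
    students,
    filter_rows(lambda r: r["score"] >= 40),
    mutate("pct", lambda r: round(100 * r["score"] / r["max"])),
    select("id", "pct"),
)

print(piped)  # [{'id': 1, 'pct': 75}, {'id': 3, 'pct': 90}]
```

The computation is identical in both forms; what changes is that the pipeline reads in the order the steps are performed, which is the readability benefit attributed to the pipe operator above.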
With a tidyverse-only approach, the learning curve is gentler, which is ideal for beginners and allows them to write readable code that supports stepwise thinking. Knowledge of the tidyverse is also relevant to industry, where it is commonly used in modern roles. The downside of this approach is that learners effectively skip base R commands, leaving them with a limited low-level understanding of R and potentially avoiding fundamental programming concepts and core operators. This can also cause difficulties when using certain statistical modelling methods that follow base R conventions.

Teaching with base R only is, to some extent, the ‘standard’ approach across institutions, providing a solid foundational knowledge of R and core programming principles such as loops and vector operations. Though more of a ‘forced’ skill, beginning with base R can teach good debugging skills while working through some of the quirks of R. The main downsides of using only base R are its inconsistent syntax and the steeper learning curve it presents, particularly for learners with no programming experience.

Studies in the literature suggest that students can achieve similarly positive learning experiences when the same concepts are taught through different syntactic approaches, provided that the teaching is carefully structured (Carscadden and Martin, 2022). In addition, Cetinkaya-Rundel et al. (2022) highlight how the grammar and design consistency of the tidyverse can support understanding across the data analysis cycle. A hybrid approach to teaching R, first introducing foundational programming concepts through base R before incorporating tidyverse tools for data analysis, can therefore be a reasonable strategy. Such an approach can foster a deeper understanding of tidyverse methods by grounding them in foundational knowledge of base R.
The principal challenge lies in allocating sufficient curriculum time to cover both perspectives at a pace that does not overwhelm learners. When carefully structured, however, a hybrid model can also offer flexibility within teaching teams, enabling educators with strong tidyverse expertise to lead tidyverse-based components, while others may teach using base R or the tidyverse as appropriate to the module content and their own experience.

2.1.2 Shiny
The creation of RStudio has also led to the development of effective communication libraries and tools for reporting data analysis to a non-technical audience. The shiny package (Chang et al., 2025) allows dynamic, web-based applications to be developed simply in R, avoiding the need for learners to know HTML or JavaScript.

A common theme in the workplace is that a statistician’s role extends beyond conducting analysis to communicating it effectively. Shiny has emerged as a key tool for bridging this gap between theory and application and has been promoted within higher education as an effective teaching tool (Potter et al., 2016; Fawcett, 2018). Shiny allows methodology to be transformed into user-friendly applications: students can explore data dynamically, visualise patterns within model frameworks, and build apps that communicate methods, which encourages them to think critically and more deeply about the methods themselves.

The use of shiny within assessments can aid critical thinking when structuring apps, with students building functional applications that reinforce key programming skills. New concepts such as reactive programming are also explored through shiny, developing skills in structuring code in a modular fashion.
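Reactive programming, mentioned above, is often the least familiar of these concepts. Its core idea is that outputs are recomputed automatically when the inputs they depend on change. The sketch below illustrates that idea in Python with a hypothetical `ReactiveValue` class. It is a much simplified, push-based illustration and not how shiny works internally: here dependants are registered by hand, whereas shiny records dependencies automatically when reactive values are read.

```python
from statistics import mean


class ReactiveValue:
    """A value that re-runs its registered dependants whenever it changes.

    Hypothetical and much simplified: dependants are registered by hand and
    re-run immediately, whereas shiny records dependencies automatically.
    """

    def __init__(self, value):
        self._value = value
        self._dependants = []

    def get(self):
        return self._value

    def set(self, value):
        self._value = value
        for dependant in self._dependants:
            dependant()  # push the change to everything that depends on it

    def bind(self, dependant):
        self._dependants.append(dependant)
        dependant()  # render once straight away, as an app output would on start-up


data = [2.0, 4.0, 6.0, 8.0, 100.0]
trim = ReactiveValue(0)  # stands in for an input widget, e.g. a slider
rendered = []            # stands in for what the user sees


def render_summary():
    k = trim.get()
    kept = sorted(data)[k:len(data) - k]  # drop k values from each end
    rendered.append(f"trim={k}: mean={mean(kept):.1f}")


trim.bind(render_summary)
trim.set(1)  # "moving the slider" re-runs the summary without an explicit call
print(rendered)  # ['trim=0: mean=24.0', 'trim=1: mean=6.0']
```

The summary is never called directly after start-up; changing the input is enough to refresh it, which is the behaviour students meet when a shiny output updates as a slider moves.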
Including shiny within the statistics curriculum helps bridge the gap between theory and application, promoting good practice in effective communication while further developing knowledge.

2.1.3 RMarkdown & Quarto
The notions of reproducible research and effective data communication are fundamental to both scientific and industry practice. Tools such as RMarkdown (Allaire et al., 2024) and Quarto (Allaire and Dervieux, 2025) support these principles and have therefore become increasingly common components of modern statistics curricula. Their inclusion reflects a broader shift in statistics education towards practices that mirror real-world analytical workflows, where analysis, documentation, and communication are closely intertwined.

The integration of R into the statistics curriculum has contributed to a move towards more project-based learning, with open-ended assignments that require students to carry out statistical analyses and produce written reports. Tools such as RMarkdown and Quarto encourage learners to combine code, narrative, and results within a single document, reinforcing good practice in reproducibility and transparent research. In doing so, they foreground the importance of not only conducting an analysis, but also clearly communicating its purpose, assumptions, and conclusions.

Including RMarkdown and Quarto within the statistics curriculum therefore supports the development of skills that extend beyond technical computation. By requiring students to structure their reasoning, justify modelling choices, and present results coherently, these tools help cultivate habits that are directly transferable to both academic research and the workplace. As a result, their use provides a natural and authentic mechanism for assessing statistical understanding alongside communication and reproducibility.
2.2 Polyglot Programming
The emphasis on reproducible workflows and integrated communication naturally leads to broader questions about the role of programming environments in statistics education, particularly as analytical practice increasingly spans multiple programming languages. Recent developments around Quarto reflect this shift and are closely linked to changes in the tools and services offered by Posit, the developers of RStudio. One motivation for these developments has been to move beyond a solely R-focused ecosystem and to support the integration of additional languages such as Python and Julia. Through Quarto and recent extensions to RStudio, users can now work with multiple language engines within a single session, enabling genuinely cross-language workflows. This reflects common practice in industry, where analysts often adopt a pragmatic, task-driven approach to language choice rather than relying on a single programming language.

As mixed-language workflows become more common across academia and industry, there is a growing case for broadening the programming exposure provided within statistics curricula. While R remains central to statistical education, languages such as Python and Julia are increasingly visible in data science, computational modelling, and ML. Graduates are therefore likely to encounter professional environments in which multiple languages coexist, each selected according to context and task. Introducing students to this wider computational landscape can better prepare them for collaborative and interdisciplinary settings.

Expanding beyond a single language, however, is not straightforward. Teaching two programming languages can be challenging, particularly when many statistics educators were trained primarily in R and may have limited experience with Python, while familiarity with Julia is less common still.
Although both Python and Julia offer strong technical capabilities, Python is currently more widely adopted across academia and industry, making it the more practical addition where curriculum space is constrained and introducing three languages would be unrealistic. This raises important questions about how best to introduce Python alongside R in statistics education, and whether the goal should be a unified environment or familiarity with multiple tools.

2.2.1 Introducing Python
Although Python predates R, having been introduced in 1991, its inclusion within statistics curricula has largely occurred in more recent years. This shift has been driven primarily by Python’s prominence in ML and data science, where widely adopted libraries such as scikit-learn (Pedregosa et al., 2011) have made it a natural choice for teaching modern computational methods. As a result, Python has increasingly been positioned alongside R in statistics education, particularly in contexts that emphasise predictive modelling and ML.

While comparisons between R and Python have been explored in the literature (Sudhaka, 2018; Ozgur et al., 2017), the choice between languages in practice is often shaped less by technical differences and more by the surrounding ecosystem of packages and developer communities. Many core data structures and workflows are closely similar across the two languages: for example, data frames in R and pandas (Pandas, 2020) in Python, or ggplot2 (Wickham, 2016) in R and, in Python, visualisation frameworks such as matplotlib (Hunter, 2007) and plotnine (Plotnine, 2026), a grammar-of-graphics-inspired counterpart to ggplot2. Similarly, established statistical modelling frameworks developed in R, such as mgcv (Wood, 2017), now have Python interfaces or related implementations, including pyGAM (Servén and Brummitt, 2018) and pymgcv (Pymgcv, 2026).
While these translations support cross-language adoption, they often feel less natural than their native counterparts, reflecting differences in language design and typical usage patterns.

However, important differences remain, particularly in areas driven by machine learning and deep learning. Python has become the dominant interface for these methods, supported by mature and well-integrated frameworks such as TensorFlow (Abadi et al., 2015), JAX (Bradbury et al., 2018), and PyTorch (Paszke et al., 2019). While these frameworks are primarily accessed through Python, their computational backends rely heavily on optimised compiled code and hardware-accelerated libraries, enabling efficient large-scale and computationally intensive workflows. Combined with strong industry support and active development communities, this results in interfaces that are often more accessible and flexible for modern machine learning applications. By contrast, many advanced statistical methods continue to emerge first within R, reflecting the language’s close ties to the statistical research community.

These differences are also shaped by the backgrounds of the respective developer communities. R packages are frequently developed by statisticians and methodologists, often alongside new theoretical contributions, whereas Python libraries are more commonly produced by researchers and engineers working in ML, software development, and industry. This divergence influences both the design priorities of packages and the contexts in which each language is most naturally applied.

The growing prominence of ML within statistical research and education therefore provides a strong motivation for introducing Python within statistics curricula. While Python is unlikely to replace R as the sole language of instruction in the near term, there is an increasing case for exposing students to multiple programming languages.
Such an approach reflects the diversity of modern analytical practice and acknowledges that different tools may be better suited to different tasks. This perspective motivates a closer examination of how R and Python might be taught together within statistics programmes, and which pedagogical models are best suited to supporting multi-language learning.

2.2.2 Teaching both R and Python
If both R and Python are to be taught within a degree programme, decisions about how they are introduced and integrated require careful consideration, particularly regarding curriculum progression and the order in which students encounter each language. While arguments can be made for introducing either language first, the practical reality for many statistics programmes is that R is already embedded across multiple modules. In such cases, introducing R early and reinforcing it throughout the programme, before later incorporating Python, is likely to be a coherent and pragmatic approach.

A further consideration is how Python should be incorporated into existing courses. In a course on linear regression, for example, it is common to provide example code and data to support problem-based learning. In this context, an inclusive approach is to provide equivalent Python code alongside existing R examples, rather than treating Python as an optional or peripheral addition. While presenting material in multiple languages risks increasing students’ cognitive load if not handled carefully, providing access to both languages can offer a valuable additional resource, allowing students to compare approaches and supporting future learning beyond the immediate course context. One approach to managing this cognitive load is the use of language switchers embedded within teaching materials, for example the approach discussed in Jack et al. (2023).
An implementation of this type is illustrated in Figure 1, where students can toggle between R and Python code within a single set of notes, allowing them to focus primarily on one language while retaining access to the other as a reference or future learning resource. While the example tool is custom-built, similar functionality is now supported through tools such as Quarto, making it increasingly feasible for educators to provide comparable multi-language materials without relying on bespoke institutional solutions.

While offering materials in both languages provides flexibility for learners, it also introduces practical challenges in delivery. Many statistics educators may be less familiar with Python, just as ML or AI specialists may not necessarily know R. In such cases, a pragmatic approach may be to provide materials in both languages while teaching primarily in one, depending on staff expertise. This model requires appropriate support, both for developing materials in the alternative language and for ensuring that student questions can be addressed by someone with relevant experience. Shared teaching responsibilities or scheduled drop-in sessions may help meet this need. There will also be situations where equivalent implementations in both languages are not feasible. Even so, using both languages where possible remains beneficial, and openly explaining why certain components are language-specific can help students develop a more realistic understanding of mixed-language practice.

3 The Influence of Data and Coding Practices
The growing prevalence of large-scale data, collected across many aspects of life and business, alongside advances in parallel computing and cloud infrastructure, has significantly influenced how statistical analysis is conducted.
The scale and complexity of modern datasets increasingly require not only statistical expertise, but also familiarity with coding and computational tools. As these technologies continue to shape the field, they place greater emphasis on efficient data processing and scalable analytical methods, challenging some traditional statistical workflows.

Figure 1: An example of the language switcher, shown in the top right of each image. (a) R code enabled, allowing students to view the notes as if they were available only in R. (b) Equivalent view with Python enabled.

Despite these developments, statistical education does not always fully reflect this evolving landscape. While the core principles of statistics are usually well covered, the integration of coding practices and computational techniques varies considerably across programmes. As a result, some students graduate with limited exposure to tools and methods for working with large datasets and distributed computing environments, both of which are becoming increasingly common in research and industry.

Addressing this gap may require rethinking aspects of how statistics is taught. Incorporating modern coding practices, such as code optimisation, version control, and the use of cloud-based resources, has the potential to better equip students for contemporary data challenges. By complementing traditional statistical theory with applied computational training, programmes can help students develop skills that are increasingly relevant in data-driven contexts. This section examines how technological shifts have influenced statistical practice and explores how changes in teaching and assessment might better align statistical education with the demands of modern data analysis.
3.1 Modern data sources & structures
With advances in technology and the widespread use of digital systems in both professional and everyday contexts, modern data sources are increasingly abundant and offer rich opportunities for use in statistics teaching. Such data may originate from websites, social media platforms, smart devices, open public datasets, Internet of Things (IoT) systems, and research infrastructures. While the range of available data is broad, accessing and working with these sources can present challenges. Modern datasets are often large, unstructured, or stored in formats that are not traditionally encountered in statistics education. This section therefore considers how modern data are commonly structured and highlights the computational considerations required to work with them effectively.

Most modern data can be broadly categorised as structured, semi-structured, or unstructured. Structured data remains the most common format used in statistics teaching and is typically organised into rigid schemas, such as tables with predefined fields. Large structured datasets are often stored in databases and accessed using SQL, which is already a familiar component of many statistics programmes. As a result, structured data continues to play a central role in teaching foundational statistical concepts.

Semi-structured data does not conform to strict tabular formats but contains identifiable elements, such as tags or metadata, that support processing and transformation. Common examples include XML and JSON files, which are frequently accessed through application programming interfaces (APIs). Unstructured data, by contrast, lacks a predefined format and often requires substantial pre-processing before analysis, with examples including free text, web-scraped content, and images.
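To make the distinction between semi-structured and structured data concrete, the sketch below flattens a small JSON payload (made-up data, of the kind an API might return) into one row per observation, loads those rows into a table with a fixed schema, and summarises them with SQL. It uses only Python’s standard `json` and `sqlite3` modules; the payload, table name and column names are assumptions chosen for illustration.

```python
import json
import sqlite3

# Made-up semi-structured payload: nested records, as an API might return them.
payload = """
[
  {"station": "A", "readings": [{"day": 1, "temp": 11.5}, {"day": 2, "temp": 12.0}]},
  {"station": "B", "readings": [{"day": 1, "temp": 9.0}]}
]
"""

# Flatten the nesting into one (station, day, temp) row per observation.
rows = [
    (site["station"], reading["day"], reading["temp"])
    for site in json.loads(payload)
    for reading in site["readings"]
]

# Structured form: a table with a fixed schema, queried with SQL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (station TEXT, day INTEGER, temp REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
summary = con.execute(
    "SELECT station, COUNT(*), AVG(temp) FROM readings "
    "GROUP BY station ORDER BY station"
).fetchall()
con.close()

print(summary)  # [('A', 2, 11.75), ('B', 1, 9.0)]
```

The flattening step is where most of the work lies: once each observation occupies one row, the familiar tabular and SQL tools apply unchanged.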
Introducing students to semi-structured and unstructured data offers opportunities to broaden the curriculum, for instance through the use of API queries to access public datasets or data generated through students’ own digital activities, such as social media or fitness tracking platforms. These examples also provide an opportunity to highlight that tools for accessing and transforming such data are available in both R and Python, reinforcing the transferability of these skills across languages. While not all educators may have prior experience with these workflows, generative AI tools now offer potential support for developing teaching materials and examples, making it increasingly feasible to incorporate such topics into statistics teaching.

Beyond data structure, the scale of modern datasets can itself pose challenges. Some datasets are simply too large to be handled efficiently on personal machines. While the use of high-performance computing and clusters is common in statistical research, it is less frequently addressed in undergraduate teaching. As datasets continue to grow, distributed computing frameworks become increasingly relevant. Technologies such as Hadoop and Spark can therefore play a role in statistics education where the relevant educator expertise exists. A gradual introduction is often appropriate, beginning with data manipulation and database concepts on a single machine, before progressing to distributed frameworks and cloud-based workflows in more advanced courses.

Computational challenges are not limited to data storage and processing. Many modern ML methods, particularly the deep learning models underlying tools such as generative AI systems, require substantial computational resources and are commonly trained using graphics processing units (GPUs).
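Returning to the distributed frameworks mentioned above, the map-reduce pattern popularised by Hadoop (and generalised in Spark) can be previewed on a single machine, in keeping with the gradual introduction suggested. The sketch below is plain Python and does not use any Hadoop or Spark API; the "partitions" are simply strings in a list, an illustrative assumption standing in for data spread across machines.

```python
from collections import defaultdict

# Hypothetical text "partitions". In Hadoop or Spark these would be stored
# and processed on different machines; here they are items in a list.
partitions = [
    "the model fits the data",
    "the data are large",
    "fit the model again",
]


def map_phase(chunk):
    """Map: emit a (key, value) pair of (word, 1) for every word in a chunk."""
    return [(word, 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Shuffle: group all emitted values by key, across every chunk."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key; summing gives word counts."""
    return {key: sum(values) for key, values in grouped.items()}


# Each chunk is mapped independently, which is what makes the pattern easy
# to distribute; here the chunks are simply processed one after another.
mapped = [map_phase(chunk) for chunk in partitions]
counts = reduce_phase(shuffle_phase(mapped))
print(sorted(counts.items()))
```

Because the map step never looks beyond its own chunk, the same logic could in principle be handed to many workers, which is the conceptual bridge to the distributed tools themselves.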
Access to GPUs and similar specialised hardware is increasingly mediated through cloud computing platforms, which allow users to submit jobs to remote servers equipped with such hardware. Introducing students to cloud-based environments, for example through free platforms such as Google Colab, can provide a practical entry point to these concepts, although the extent to which such material can be incorporated may be constrained by the availability of educators with the appropriate computational expertise. From there, teaching can progress to ideas such as parallelisation, batch processing, and scalable computation. Alongside these technical considerations, it is also important to address ethical issues, including energy consumption, environmental impact, and bias in large-scale modelling.

3.2 How can we improve the way we teach modern data

As outlined above, modern data arise from a wide range of sources and are stored in a variety of formats. While these structures can initially appear complex, many can be read directly into R and processed into forms suitable for statistical analysis. A suite of R packages supports the transformation of modern data formats into standard data frame or tibble representations, expanding the range of data practices that can be incorporated into statistics teaching. The following discussion outlines several common data structures and illustrates how they can be handled within R, while noting that comparable workflows are also available in Python. Together, these tools provide opportunities to extend statistics curricula to better reflect contemporary data acquisition and preparation practices.

One of the most common mechanisms for obtaining modern data is through application programming interfaces (APIs), which allow data to be sent and received via URL-based HTTP requests.
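To make the idea of a URL-based request concrete, the Python sketch below encodes query parameters into a request URL and then turns a JSON response into a list of row-like records. The endpoint `api.example.org`, its parameters, and the response body are all hypothetical; so that the example runs offline, the response is a canned string rather than the result of a live call.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and query parameters; a real API documents its own
# URL structure, parameter names, and any authentication (e.g. API keys).
BASE_URL = "https://api.example.org/v1/observations"
params = {"station": "glasgow", "start": "2024-01-01", "limit": 2}

# For a GET request, parameters are encoded into the URL's query string.
url = f"{BASE_URL}?{urlencode(params)}"
print(url)

# A live request could be sent with urllib.request.urlopen(url). To keep the
# sketch runnable offline, we parse a canned JSON body of the kind an API
# might return instead.
response_body = (
    '{"results": [{"date": "2024-01-01", "temp": 4.5},'
    ' {"date": "2024-01-02", "temp": 3.1}]}'
)
rows = json.loads(response_body)["results"]  # a list of dicts, one per row
mean_temp = sum(row["temp"] for row in rows) / len(rows)
print(len(rows), "rows; mean temperature:", round(mean_temp, 2))
```

The two steps shown here, building a request and reshaping a hierarchical response into rows, are the same two steps that students would carry out in R, which makes the example easy to mirror across languages.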
In R, API access is supported by packages such as httr2 (Wickham, 2023), with endpoints that may be publicly accessible or require authentication through API keys. Data returned from APIs are most often structured in hierarchical formats such as JSON or XML. Packages including jsonlite (Ooms, 2014) and xml2 (Wickham et al., 2023a) provide straightforward tools for transforming these formats into data frames or tibbles for subsequent analysis. Introducing API-based data access within the curriculum allows students to engage with real-world data sources while reinforcing data wrangling skills.

In cases where data are available online but not obtainable through an API, web scraping provides an alternative means of collecting data. Web scraping involves automatically extracting information from web pages and transforming it into structured datasets. In R, the rvest package (Wickham, 2022) supports the extraction of content from HTML documents by allowing users to identify and select specific elements, commonly referred to as nodes, within a page. Tools such as SelectorGadget, a browser tool described in the rvest documentation, can assist in identifying these elements, making the process more intuitive for learners and avoiding the need to interpret HTML code directly. For websites that rely heavily on dynamically loaded content, particularly content generated by JavaScript, rvest also includes functionality for running a live browser session so that such data can be retrieved dynamically. Similar approaches exist in Python using dedicated web scraping and browser automation libraries. Introducing web scraping selectively within statistics programmes can expose students to less structured data sources, while also providing a natural context for discussing the ethical, legal, and practical considerations involved in processing such data.

To incorporate modern data techniques effectively within a statistics curriculum, a scaffolded approach is likely to be most appropriate.
Rather than treating these topics as stand-alone additions, elements of modern data acquisition and processing can be woven throughout the programme. In the early stages of a degree, structured data remains a natural starting point, supporting the introduction of core statistical concepts alongside basic programming. As students’ computational skills develop, semi-structured data accessed through APIs can be introduced within modelling and programming courses, allowing learners to engage with the full pipeline from data acquisition to analysis. In later years, more complex data sources, including dynamic web content and cloud-based workflows, can be explored within advanced modelling courses, shifting the emphasis from the use of tools to the design of analytical frameworks. This staged approach naturally distributes content across multiple modules, meaning that no single specialist educator can cover all aspects; instead, educators may need to develop familiarity with related topics over time, supported through collaboration and knowledge sharing across teaching teams.

3.3 Changes in code-based good practice and transparency

Open-source coding practices are becoming an increasingly important component of software and model development across many areas of statistics. Growing expectations around transparency, code sharing, data availability, and collaborative development have led to greater emphasis on good practice in how code is written, managed, and shared. Code sharing and reproducibility are now common in many journal publications, for example within the journals of the Royal Statistical Society, and the use of version control systems in research projects continues to expand, with evidence that this can enhance the relevance and longevity of published work (Kang et al., 2023).
Version control platforms such as GitHub, built on the Git system, provide widely adopted mechanisms for tracking changes, collaborating on code, and managing research software. Familiarity with such tools is increasingly expected in industry-facing roles within statistics and data analytics, making exposure to version control an important consideration within statistics education.

The effective integration of version control into the statistics curriculum, however, requires careful pedagogical design. Version control systems were originally developed for large-scale software engineering projects, and many existing resources assume a background in computer science or software development. For students whose primary focus is statistics or data analysis, learning materials should therefore be tailored to relevant use cases, such as managing individual analysis projects, collaborating on group-based coursework, or contributing to shared research code. An additional consideration is the environment in which version control is introduced. For example, integrating version control through an RStudio-based workflow can reduce cognitive load for beginners by embedding these concepts within a familiar interface, although this may limit the applicability of those skills to a narrower range of future projects. Alternatively, introducing version control directly through platforms such as GitHub Desktop may provide more broadly transferable skills, but can present a steeper initial learning curve. Decisions about which approach to adopt, and when, should therefore be informed by students’ prior experience and the intended learning outcomes of the programme.

Once introduced, incorporating version control within assessment can support skill development by giving students opportunities for authentic practice.
For example, group-based data analysis projects can use version control to support collaborative coding, shared analysis, and peer feedback throughout the project lifecycle, including code review via GitHub pull requests. Beyond technical skills, such approaches can also enable more robust assessment design in the context of generative AI, as discussed further in Section 5. In online programming courses, for example, where access to generative AI tools cannot be fully restricted, assessment of core concepts can be supported through interaction with non-public code libraries hosted on platforms such as GitHub, requiring students to engage directly with existing code and extend its functionality. Peer review activities can further reinforce practical experience with version control, code review, and collaborative development workflows while supporting academic integrity.

4 Incorporating ML and AI into Statistical Education

The distinction between statistics, ML, and AI is often unclear (Bzdok et al., 2018), with substantial overlap between the respective fields. High-profile textbooks frequently cover material that spans multiple disciplines (Bishop, 2006; Murphy, 2012). For example, methods such as the LASSO and elastic net are often regarded as both statistical and ML techniques. Similarly, deep learning, typically classified under AI, incorporates statistical regression models as part of its core structure, yet is rarely considered part of statistics despite well-recognised connections (Cheng and Titterington, 1994; White, 1989). While we do not seek to impose rigid disciplinary boundaries, working definitions are necessary to provide context for discussion. In this article, we define AI as methods based on neural networks, and ML as modelling approaches not traditionally included in statistics curricula, excluding neural networks.
We also avoid the common, but arguably misleading, distinction of statistics as causal and ML or AI as predictive, as many modern methods blur this divide (Schölkopf, 2022). These definitions are intentionally flexible, providing a coherent framework for discussion while allowing educators to adapt them to their specific curricula and student needs.

From an educational perspective, given the overlap between these fields, it can be argued that attempting to rigidly define boundaries between statistics, ML, and AI is neither necessary nor particularly helpful. An alternative approach is to present multiple perspectives, allowing students to develop a more nuanced understanding of how these areas relate to one another. In this view, ML and AI need not be positioned as competitors to statistics, but rather as increasingly integral components of many statistical roles. When included in the curriculum, they may be taught with a balanced emphasis on both strengths and limitations. The extent and depth of their inclusion will depend on a programme’s focus, intended learning outcomes, and desired graduate attributes. For example, programmes aimed at clinical statistics might include limited coverage of ML and AI, with greater emphasis placed on their constraints and appropriate use cases (Wilkinson et al., 2020). In contrast, data science-oriented programmes are likely to require more comprehensive coverage, alongside strong statistical foundations to help students avoid common methodological pitfalls (Arnold et al., 2020). Ultimately, the extent to which such content can be incorporated depends on the availability of educators with the relevant expertise to teach it effectively.

4.1 Teaching ML

A central aim of higher education is to prepare students for further study or employment.
From this perspective, decisions about whether to include ML in a statistics curriculum should be guided by likely graduate pathways. For students pursuing careers involving substantial statistical work, such as academia, data science, or applied analytics, some exposure to ML is likely to be important, as these methods increasingly form part of routine practice. By contrast, for students in disciplines outside the mathematical sciences, such as psychology or medicine, regular engagement with ML may be less common. In such cases, it may be reasonable to limit formal training in ML. However, one could still argue for educating these students about the potential risks of poorly applied ML, particularly as the field grows in popularity and range of applications, enabling them to critically interpret research findings and avoid common methodological pitfalls within their fields (Wilkinson et al., 2020).

The widespread use of ML methods across applied disciplines, sometimes without sufficient methodological rigour, further supports the case for at least a conceptual introduction to their strengths and limitations (Wilkinson et al., 2020). Many of these issues align closely with core statistical principles, including non-linear effects, overfitting, and model validation. From this standpoint, introducing ML provides an opportunity to reinforce existing statistical concepts, or at minimum to raise awareness of where and why problems may arise. More broadly, well-designed ML analyses depend heavily on statistical reasoning (Arnold et al., 2020), including considerations of causality, uncertainty, and critical evaluation. Addressing these ideas explicitly can therefore offer a pragmatic way to situate ML within a statistics-led curriculum.

If ML is to be incorporated into a programme, careful thought is required as to how this should be done.
Unlike the parallel teaching of R and Python discussed in Section 2.2.2, statistics and ML should probably not be considered interchangeable, despite sometimes being treated that way in practice. As a result, approaches such as the language switcher illustrated in Figure 1 are not directly applicable. One option is to integrate ML topics within existing courses where there is a natural conceptual overlap, for example introducing Gaussian processes within a non-linear modelling course traditionally focused on teaching generalised additive models. Alternatively, a standalone ML course may be appropriate, provided it is clearly grounded in statistical concepts already covered elsewhere in the curriculum. Taken together, a combination of targeted integration and dedicated teaching has the potential to offer a balanced and flexible approach. This may lead to course titles that challenge disciplinary distinctions, such as ‘ML Methods’, but such compromises reflect the broader challenges associated with defining statistics, ML, and AI, and can also provide opportunities to open discussion with students about how these fields relate to one another.

4.2 Teaching AI

When considering ML education, a natural question is whether AI methods should also be included. Deep learning approaches, which build on multi-layer neural networks, are increasingly prevalent in both academic research and industry, suggesting that some level of exposure may be beneficial for students. This raises further questions about how AI should be positioned within statistics curricula and how it relates to existing ML content. In practice, the teaching of AI will often be constrained by staff expertise, although this may change as AI methods become more embedded within statistical research.
At an introductory level, students can be exposed to deep learning methods for tasks such as classification and regression without engaging in detailed theoretical treatments of underlying mechanisms such as backpropagation or automatic differentiation. Demonstrating the practical use of these methods in R or Python could require only a high-level understanding of the models and can be incorporated within existing ML courses.

More advanced coverage of AI is possible, but presents additional challenges in terms of both staff expertise and students’ technical backgrounds. Topics such as large language models and transformer architectures, while increasingly prominent, would be best approached at an applied level if they were to be included in statistics programmes. Similarly, more specialised skills, including the use of Linux-based systems, GPUs, and parallel computing, might typically be more appropriate as advanced or supplementary topics, or as part of specialist training outside the core statistics curriculum. In cases where deeper technical expertise in AI is required, it may be more effective for such training to be delivered in collaboration with, or delegated to, related disciplines such as computer science or engineering.

5 The emergence of Generative AI tools

Generative AI tools such as ChatGPT make rapidly advancing large language models readily accessible through interactive interfaces, lowering the barrier to their use in educational and professional contexts. As a result, changes are already evident in how students approach learning new topics and engage with assessment, alongside indications that patterns of student engagement are shifting in sometimes unpredictable ways (Pardos and Bhandari, 2023).
In response, it is increasingly apparent that learning and teaching practice must adapt to these developments and, in some capacity, engage constructively with such tools.

A central question then becomes how, and to what extent, generative AI should be addressed within statistics education. One option is to focus primarily on the risks associated with these tools, discouraging or restricting their use. An alternative approach is to acknowledge their widespread adoption and encourage students to use them responsibly, while explicitly teaching their limitations, potential biases, and failure modes. This latter approach raises further questions about what constitutes responsible use, how such practices should be incorporated into teaching and assessment, and how educators can develop the expertise needed to support students effectively in this area.

From the perspective of graduate attributes, the growing reliance on generative AI tools within industry suggests that some level of engagement is increasingly difficult to avoid. Evidence also indicates that students are already using such tools for a wide range of tasks (Freeman, 2025), although there is also some suggestion that educators may overestimate the extent of student usage (Lee et al., 2024). In this context, it can be argued that developing guidance on how to teach effective generative AI usage is more constructive than attempting to prohibit its use outright. This remains an active area of discussion within the literature, with several proposed frameworks and guiding principles for the responsible integration of generative AI into higher education (Nartey, 2024; Grove, 2025).

From a learning and teaching perspective, the adoption of generative AI presents a number of challenges. Assessment design is a particularly prominent concern, especially as students rapidly adopt these technologies across academia.
Equally important is how students are supported in developing effective and critical approaches to using generative AI. Teaching staff also require support in identifying potential misuse, understanding how these tools can inform their own practice, and developing confidence in guiding students towards responsible use. The following sections explore specific challenges posed by generative AI tools and consider their implications for statistics education in greater depth.

5.1 Teaching about Generative AI tools

Teaching around generative AI presents a number of challenges, not least because the technology itself is evolving rapidly. Until relatively recently, effective use of generative AI tools often required a detailed understanding of prompt engineering, that is, the process by which instructions are formulated to elicit useful responses from a generative model. Increasingly, however, these tools are able to generate plausible outputs in response to less precise or less carefully structured queries (Wang et al., 2024). From an educational perspective, this ever-changing landscape creates difficulties in maintaining up-to-date teaching materials, particularly within institutional contexts where educators manage a range of competing responsibilities.

Despite these tools becoming easier to use, it remains clear that there are still both effective and ineffective ways of engaging with them. Students therefore need to be supported in developing an understanding of how generative AI tools function, including what they can and cannot do reliably. In a statistics context, this may be most effectively achieved by embedding discussion of generative AI within statistical or ML analyses, recognising that such tools may be effective for certain components of the workflow while less appropriate for others.
For example, tools may be well suited to supporting coding tasks or producing visualisations, while being inappropriate for more substantive decisions such as covariate selection or model choice. Determining how best to convey these distinctions, and how much emphasis to place on them, represents an ongoing challenge across the sector.

Beyond technical use, the adoption of generative AI raises broader considerations that are also relevant to statistics education. Universities are increasingly developing formal guidance on how such tools may be used and how their use should be declared. While it is necessary for educators to communicate institutional policies, there is also a case for addressing these issues at a more general level. Students who are not encouraged to reflect on how and why they use generative AI tools are unlikely to consider the wider implications of their use. These broader considerations include the environmental impact of generative AI systems (Rillig et al., 2023), ethical issues such as transparency, bias, and accountability (Hagendorff, 2024), and ongoing concerns around the use of copyrighted material in training data and generated outputs (Buick, 2025), all of which universities may reasonably be expected to address within higher education.

Taken together, these considerations raise the question of where and how education about generative AI should be situated within a statistics programme. A stand-alone course focused solely on generative AI, distinct from foundational AI or ML content, may be more extensive than is currently necessary. At the same time, ignoring these tools entirely no longer seems a viable option. Embedding discussion of generative AI throughout all courses, beyond communicating institutional rules, may also place unreasonable demands on both students and staff.
A more pragmatic approach may be to situate education about generative AI within skills-based modules, in a manner comparable to how communication, presentation, and consultancy skills are typically addressed within statistics education.

5.2 Student usage of Generative AI tools

The extent to which students should use generative AI tools, and whether such use ultimately benefits their education, remains an open question. A growing body of literature has begun to explore this issue in greater depth than is possible here, e.g., Lehmann et al. (2024), but existing studies report a wide range of outcomes and vary substantially in scope, methodology, and research design. Even where positive effects have been observed, it remains unclear how far these findings can be generalised. In particular, it is difficult to assess the long-term implications of students completing an entire degree programme in sustained conjunction with generative AI tools.

From an educational perspective, this uncertainty makes it challenging to determine where and how the use of generative AI is pedagogically appropriate. These decisions must be made not only at the programme level but also within individual modules and learning activities. In statistics education, for example, there are emerging case studies that report promising outcomes from the use of generative AI tools (Al Labadi and Ly, 2025). At the same time, there is concern that sustained reliance on such tools, particularly as their use expands across multiple courses, may contribute to a form of apathy in which students become less engaged in the effortful reasoning needed to develop statistical intuition, potentially weakening long-term conceptual understanding (Fan et al., 2025).
However, comprehensive comparisons across a broad range of implementations, course formats, and learning contexts are not yet available, and the cumulative impact of repeated exposure across a degree programme remains largely unstudied, reflecting the fact that generative AI tools have only recently become widely adopted.

A further consideration is whether generative AI agents should be integrated directly into educational platforms. Such integration could represent a substantial shift in how education is delivered, offering the possibility of highly accessible, responsive AI-based tutors that help bridge gaps in understanding and provide more immediate support than is typically feasible for academic staff. At the same time, closer integration raises concerns about the propagation of incorrect information and the risk of increased over-reliance on automated systems. Given that many students are already using generative AI tools informally, it remains unclear whether tighter institutional integration represents a natural progression or a step towards even greater over-reliance on these technologies.

5.3 Assessment, marking, and feedback

The rapid rise of generative AI tools has created significant challenges for assessment across much of higher education. Evidence suggests that many existing assessment formats are vulnerable and may require relatively rapid adaptation (Newton, 2025), with certain types of assessment appearing particularly exposed to misuse (Grove, 2024). A commonly proposed response is a return to invigilated, on-campus examinations. However, such approaches are not universally welcomed within higher education and have well-documented strengths and limitations in terms of assessing student learning (Buckley, 2024).
As a result, there has been growing interest in using the emergence of generative AI as an opportunity to reconsider broader principles of assessment design and to establish new standards of good practice (Grove, 2024), although it remains unclear what such assessments should look like in practice.

Interestingly, generative AI tools may also present opportunities within assessment, particularly from the perspective of efficiency and consistency. Some educators have begun to explore the use of generative AI in the creation of assessment materials, with emerging guidance on how this might be done effectively in the context of statistics education (Gordon et al., 2026). While this offers clear advantages in terms of time efficiency, it also raises concerns around accuracy, including the potential for hallucinated content or insufficient alignment with course-specific context. More advanced possibilities include the use of fine-tuned models trained on course materials, which may offer greater contextual awareness. In addition, while caution is clearly required when using generative AI as a content-generation tool, there is also potential value in employing these systems as supplementary checks on assessment materials, for example to identify obvious errors or ambiguities prior to release.

Opportunities also exist in the areas of marking and feedback. A number of frameworks have been proposed to support the use of generative AI as a marking aid (Safilian et al., 2025), including work focused specifically on statistics education (Ilieva et al., 2025). Given the quantitative and computational nature of the discipline, there may be particular scope for supporting the assessment of code-based work. Generative AI tools may also help streamline the provision of more personalised feedback in contexts where this would otherwise be impractical, such as in large classes.
At the same time, concerns remain around trust, transparency, and perceived fairness. If students or staff lose confidence in the reliability of AI-supported marking and feedback, there is a risk of undermining the educational process, particularly in settings where the financial and personal costs of higher education are substantial.

6 Summary and Discussion

This article has examined recent technological developments and considered their implications for contemporary statistics education. In particular, it has explored how changes in programming practice have shaped teaching, from the emergence of R and its transformative impact on statistical computing to the growing need to consider how Python may be integrated into certain statistics curricula, particularly those with a focus on data science or ML. Approaches for teaching R and Python in parallel have also been discussed as a pragmatic response to modern analytical practice.

The paper has further considered the influence of modern data sources and coding practices on statistics education. As data types become more diverse and complex, there is a growing need for curricula to adapt accordingly. Rather than isolating these topics within standalone modules, it has been argued that modern data handling practices should, where possible, be integrated throughout degree programmes. Related to this is the increasing importance of version control, driven both by industry expectations and the need to develop appropriate graduate attributes within statistics education.

The integration of ML and AI into statistics curricula has also been examined. Given the stated aim of higher education to prepare students for future academic and professional pathways, it has been argued that decisions about inclusion should be guided by the likely destinations of graduates.
A foundational understanding of ML, particularly its strengths and limitations, appears increasingly important across many statistics programmes. Broader or more advanced coverage of AI, defined here as methods based on neural networks, should be contingent on the availability of appropriate expertise and on its relevance to future careers or further study. Where more specialised AI training is required, this may be better delivered by related disciplines or considered beyond the scope of a statistics programme.

Finally, the paper has discussed the growing impact of generative AI tools, such as ChatGPT, on statistical learning and assessment. These tools are already influencing how students engage with learning and assessment, and this influence is likely to continue irrespective of whether institutions actively embrace them. From the perspective of graduate attributes, and given evidence that many students are already using generative AI tools (Freeman, 2025), there is a strong case for providing structured education about their use. This includes not only discussion of risks and limitations, but also guidance on responsible and effective use. How best to achieve this remains an open question, although situating such content within skills-focused modules has been proposed as a pragmatic approach. Wider implications for assessment and student learning have also been highlighted.

Taken together, these discussions underline that statistics education has a long history of adapting to methodological and technological change. Educators now face the challenge of responding to rapid developments in analytical methods, including ML and AI; evolving data types such as text, images, and web-based sources; and changing expectations around coding skills, including Python, version control, and cloud-based computation.
Determining how these elements should be incorporated will necessarily depend on the specific curriculum, student cohort, and programme aims. In this context, a diversity of skills across teaching teams becomes increasingly valuable, helping to ensure that expertise is distributed rather than concentrated in a small number of individuals. Supporting educators in finding time and opportunities to upskill is therefore essential. Research activity can play an important role in this process, but professional development, collaboration, and shared teaching practices are equally important for sustaining long-term curriculum development.

In conclusion, statistics education is undergoing significant change, bringing challenges for students, educators, and institutions alike. Keeping pace with evolving technologies and methods requires not only thoughtful curriculum design but also continued reflection on how statistical thinking, computational skills, and emerging tools can coexist within coherent educational frameworks. Rather than representing a departure from the discipline's foundations, these developments highlight the ongoing need for adaptable, critically informed teaching that prepares students for a rapidly changing analytical landscape.

ORCIDs

Craig Alexander - https://orcid.org/0000-0001-6734-747X
Jennifer Gaskell - https://orcid.org/0000-0001-9583-323X
Vinny Davies - https://orcid.org/0000-0003-1896-8936

References

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.
L. Al Labadi and A. Ly. Enhancing statistics education through Project-Based Learning (PBL) and the emergence of ChatGPT. Teaching Statistics, 47(3):200–218, 2025. doi: 10.1111/test.12405.
J. Allaire and C. Dervieux. quarto: R Interface to ‘Quarto’ Markdown Publishing System, 2025. URL https://CRAN.R-project.org/package=quarto. R package version 1.5.1.
J. Allaire, Y. Xie, C. Dervieux, J. McPherson, J. Luraschi, K. Ushey, A. Atkins, H. Wickham, J. Cheng, W. Chang, and R. Iannone. rmarkdown: Dynamic Documents for R, 2024. URL https://github.com/rstudio/rmarkdown. R package version 2.29.
K. F. Arnold, V. Davies, M. de Kamps, P. W. Tennant, J. Mbotwa, and M. S. Gilthorpe. Reflection on modern methods: generalized linear models for prognosis and intervention—theory, practice and implications for machine learning. International Journal of Epidemiology, 49(6):2074–2082, 2020.
C. M. Bishop. Pattern Recognition and Machine Learning, volume 4. Springer, 2006.
M. Bond, K. Buntins, S. Bedenlier, O. Zawacki-Richter, and M. Kerres. Mapping research in student engagement and educational technology in higher education: A systematic evidence map. International Journal of Educational Technology in Higher Education, 17:1–30, 2020.
J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/jax-ml/jax.
A. Buckley. Are we answering the question that has been set? Exploring the gap between research and practice around examinations in higher education. Studies in Higher Education, 49(11):1928–1944, 2024.
A. Buick. Copyright and AI training data—transparency to the rescue? Journal of Intellectual Property Law and Practice, 20(3):182–192, 2025.
D. Bzdok, N. Altman, and M. Krzywinski. Statistics versus machine learning. Nature Methods, 15(4):233, 2018. doi: 10.1038/nmeth.4642.
K. Carscadden and A. Martin. To tidy or not when teaching R skills in biology classes. International Journal of Higher Education, 11(5):39–50, 2022.
M. Çetinkaya-Rundel, J. Hardin, B. Baumer, A. McNamara, N. Horton, and C. Rundel. An educator's perspective of the tidyverse. Technology Innovations in Statistics Education, 14(1), 2022. doi: 10.5070/t514154352.
W. Chang, J. Cheng, J. Allaire, C. Sievert, B. Schloerke, Y. Xie, J. Allen, J. McPherson, A. Dipert, and B. Borges. shiny: Web Application Framework for R, 2025. URL https://CRAN.R-project.org/package=shiny. R package version 1.11.1.
B. Cheng and D. M. Titterington. Neural networks: A review from a statistical perspective. Statistical Science, pages 2–30, 1994.
P. J. Diggle. Statistics: a data science for the 21st century. Journal of the Royal Statistical Society Series A: Statistics in Society, 178(4):793–813, 2015.
L. Fan, K. Deng, and F. Liu. Educational impacts of generative artificial intelligence on learning and performance of engineering students in China. Scientific Reports, 15(1):26521, 2025. doi: 10.1038/s41598-025-06930-w.
L. Fawcett. Using interactive shiny applications to facilitate research-informed learning and teaching. Journal of Statistics Education, 26(1):2–16, 2018.
J. Freeman. Student Generative AI Survey 2025. Higher Education Policy Institute: London, UK, 2025.
E. Gordon, J. Chimeli, L. Kenyo, and R. Frost. From AI to TA: How to use ChatGPT to quickly create statistics and analytics assessments. Teaching Statistics, 48(1):19–32, 2026. doi: 10.1111/test.70013.
A. Granić. Educational technology adoption: A systematic review. Education and Information Technologies, 27(7):9725–9744, 2022.
M. Grove. Generative AI technologies and their role within assessment design. Education in Practice, 5(1):15–35, 2024.
M. Grove. Designing the Student Learning Journey: A Practical Approach to Integrating Generative AI within Higher Education. MSOR Connections, 24(1), 2025.
T. Hagendorff. Mapping the ethics of generative AI: A comprehensive scoping review. Minds and Machines, 34(4), Sept. 2024. ISSN 1572-8641. doi: 10.1007/s11023-024-09694-w. URL http://dx.doi.org/10.1007/s11023-024-09694-w.
J. Hardin, R. Hoerl, N. J. Horton, D. Nolan, B. Baumer, O. Hall-Holt, P. Murrell, R. Peng, P. Roback, D. Temple Lang, et al. Data science in statistics curricula: Preparing students to "think with data". The American Statistician, 69(4):343–353, 2015.
J. D. Hunter. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007. doi: 10.1109/MCSE.2007.55.
G. Ilieva, T. Yankova, M. Ruseva, and S. Kabaivanov. A Framework for Generative AI-Driven Assessment in Higher Education. Information, 16(6):472, 2025. doi: 10.3390/info16060472.
K. Imai and N. W. Williams. Quantitative Social Science: An Introduction in Tidyverse. Princeton University Press, 2022.
E. Jack, C. Alexander, D. McArthur, and C. Mair. Reflections on designing and delivering an online distance learning programme in the mathematical sciences. MSOR Connections, 21(2):25–33, 2023. doi: 10.21100/msor.v21i2.1397.
A. L. Johnson. Teaching Creative Problem Solving and Applied Reasoning Skills: A Modular Approach. Cal. WL Rev., 34:389, 1997.
D. Kang, T. Kang, and J. Jang. Papers with code or without code? Impact of GitHub repository usability on the diffusion of machine learning research. Information Processing & Management, 60(6):103477, 2023.
R. S. Kenett, S. Zacks, and P. Gedeck. Modern Statistics: A Computer-Based Approach with Python. Springer, 2022.
D. Lee, M. Arnold, A. Srivastava, K. Plastow, P. Strelan, F. Ploeckl, D. Lekkas, and E. Palmer. The impact of generative AI on higher education learning and teaching: A study of educators' perspectives. Computers and Education: Artificial Intelligence, 6:100221, 2024.
M. Lehmann, P. B. Cornelius, and F. J. Sting. AI meets the classroom: When does ChatGPT harm learning. arXiv preprint arXiv:2409.09047, 2024.
K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
K. I. Musa, W. N. A. W. Mansor, and T. M. Hanis. Data Analysis in Medicine and Health Using R. CRC Press, 2023.
E. K. Nartey. Guiding principles of generative AI for employability and learning in UK universities. Cogent Education, 11(1):2357898, 2024.
P. M. Newton. How vulnerable are UK universities to cheating with new GenAI tools? A pragmatic risk assessment. Assessment & Evaluation in Higher Education, pages 1–12, 2025.
J. Ooms. The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805 [stat.CO], 2014.
C. Ozgur, T. Colliau, G. Rogers, Z. Hughes, et al. MATLAB vs. Python vs. R. Journal of Data Science, 15(3):355–371, 2017.
Pandas. pandas-dev/pandas: Pandas, Feb. 2020. URL https://doi.org/10.5281/zenodo.3509134.
Z. A. Pardos and S. Bhandari. Learning gain differences between ChatGPT and human tutor generated algebra hints. arXiv preprint arXiv:2302.06871, 2023.
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
Plotnine. plotnine: A grammar of graphics for Python, 2026. URL https://github.com/has2k1/plotnine.
G. Potter, J. Wong, I. Alcaraz, P. Chi, et al. Web application teaching tools for statistics using R and shiny. Technology Innovations in Statistics Education, 9(1), 2016.
Pymgcv. pymgcv: Generalized additive models in Python, 2026. URL https://smoothforge.github.io/pymgcv/.
M. C. Rillig, M. Ågerstrand, M. Bi, K. A. Gould, and U. Sauerland. Risks and benefits of large language models for the environment. Environmental Science & Technology, 57(9):3464–3466, 2023.
C. Ryan. Data Science with R for Psychologists and Healthcare Professionals. CRC Press, 2021.
M. Safilian, A. Beheshti, and S. Elbourn. Ratas framework: A comprehensive GenAI-based approach to rubric-based marking of real-world textual exams. arXiv preprint arXiv:2505.23818, 2025.
B. Schölkopf. Causality for Machine Learning. ACM, Feb. 2022. ISBN 9781450395861. doi: 10.1145/3501714.3501755. URL http://dx.doi.org/10.1145/3501714.3501755.
D. Servén and C. Brummitt. pygam: Generalized additive models in Python, 2018.
K. Sudhaka. Python vs. R programming language. International Journal of Management, IT and Engineering, 8(8), 2018.
G. Wang, Z. Sun, S. Ye, Z. Gong, Y. Chen, Y. Zhao, Q. Liang, and D. Hao. Do advanced language models eliminate the need for prompt engineering in software engineering? ACM Transactions on Software Engineering and Methodology, 2024.
H. White. Learning in artificial neural networks: A statistical perspective. Neural Computation, 1(4):425–464, 1989.
H. Wickham. ggplot2. WIREs Computational Statistics, 3(2):180–185, 2011. doi: 10.1002/wics.147.
H. Wickham. Tidy data. Journal of Statistical Software, 59:1–23, 2014. doi: 10.18637/jss.v059.i10.
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN 978-3-319-24277-4. URL https://ggplot2.tidyverse.org.
H. Wickham. rvest: Easily Harvest (Scrape) Web Pages, 2022. URL https://CRAN.R-project.org/package=rvest. R package version 1.0.3.
H. Wickham. httr2: Perform HTTP Requests and Process the Responses, 2023. URL https://CRAN.R-project.org/package=httr2. R package version 0.2.3.
H. Wickham, M. Averick, J. Bryan, W. Chang, L. D. McGowan, R. François, G. Grolemund, A. Hayes, L. Henry, J. Hester, M. Kuhn, T. L. Pedersen, E. Miller, S. M. Bache, K. Müller, J. Ooms, D. Robinson, D. P. Seidel, V. Spinu, K. Takahashi, D. Vaughan, C. Wilke, K. Woo, and H. Yutani. Welcome to the tidyverse. Journal of Open Source Software, 4(43):1686, 2019. doi: 10.21105/joss.01686.
H. Wickham, J. Hester, and J. Ooms. xml2: Parse XML, 2023a. URL https://CRAN.R-project.org/package=xml2. R package version 1.3.6.
H. Wickham, M. Çetinkaya-Rundel, and G. Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media, 2nd edition, 2023b. URL https://r4ds.hadley.nz.
J. Wilkinson, K. F. Arnold, E. J. Murray, M. van Smeden, K. Carr, R. Sippy, M. de Kamps, A. Beam, S. Konigorski, C. Lippert, et al. Time to reality check the promises of machine learning-powered precision medicine. The Lancet Digital Health, 2(12):e677–e680, 2020.
S. Wood. Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC, 2nd edition, 2017.