On the State of Computing in Statistics Education: Tools for Learning and for Doing

Amelia McNamara
Smith College

1. INTRODUCTION

When Rolf Biehler wrote his 1997 paper, “Software for Learning and for Doing Statistics,” some educators were already using computers in their introductory statistics classes (Biehler 1997). Biehler’s paper laid out suggestions for the conscious improvement of software for statistics, and much of his vision has been realized.

But computers have improved tremendously since 1997, and statistics and data have changed along with them. It has become normative to use computers in statistics courses: the 2005 Guidelines for Assessment and Instruction in Statistics Education (GAISE) college report suggested that students “Use real data” and “Use technology for developing conceptual understanding and analyzing data” (Aliaga et al. 2005). In 2010, Deb Nolan and Duncan Temple Lang argued students should “Compute with data in the practice of statistics” (Nolan and Temple Lang 2010). Indeed, much of the hype around ‘data science’ seems to center on computational statistics. Donoho’s vision of ‘Greater Data Science’ is almost entirely dependent on computation (Donoho 2015). The 2016 GAISE college report updated its recommendations to “Integrate real data with a context and purpose” and “Use technology to explore concepts and analyze data” (Carver et al. 2016).

So, it is clear that we need to be engaging students in statistical computing. But how? To begin, we should consider how computers are currently being used in statistics classes. Once we have a solid understanding of the current state of computation in statistics education, we can begin to dream about the future.

1.1. The Gap Between Tools for Learning and Doing

When thinking about statistical computation in education, we are considering the transition from a novice to a more experienced student. For the purposes of this paper, we focus primarily on introductory statistics courses at the college level. However, it should be noted that a novice is a novice. Whether a student is in high school, college, or graduate school, if they have not seen statistics or statistical computation before, they will experience a similar learning curve.

Tools that get used in statistics courses can be broken into two categories: tools for learning statistics, and tools for doing statistics (Baglin 2013; McNamara 2015). Both types of tools are used to introduce novices to statistical computing, and of course, they have their advantages and disadvantages. Tools designed for learning statistics are generally not good for actually performing data analysis, and tools for professionals tend to be hard to learn.

This distinction between types of tools has been longstanding, and is often quite divisive. In his history of statistical computing, Jan De Leeuw explicitly states he is only concerned with “statistical software” and not “software for statistics” (De Leeuw 2009), while Biehler (1997) focuses only on novices’ ability to grasp a tool with minimal instruction.

Whatever your perspective, it is clear there is a gap between these two types of tools. The tension between learning and doing has been discussed in the past (Baglin 2013; McNamara 2015), but research into the transition between the two states is still nascent.

1.2. Bridging the Gap

The argument is often made that the two types of tools should be kept separate (Biehler 1997; Konold 2007; Friel 2008).
In particular, Konold says tools for learning statistics should not be stripped-down versions of professional tools for doing statistics. Instead, they should be developed with a bottom-up perspective, thinking about what features novices need to build their understandings (Konold 2007).

However, we take the perspective that the gap must be either bridged or closed. In considering how to close the gap, we will keep in mind how tools can build novices’ understanding from the ground up, but we will aim to end the siloing of tools for learning and for doing statistics.

There are many approaches that could be used to do this. For example, curricular resources making explicit reference to a prior tool and couching tasks in terminology from the previous system might make the transition between technologies easier. Likewise, providing some sort of ‘ramping up’ as users reach the end of the abilities of the learning tool and ‘ramping down’ at the beginning of the tool for doing could make the gap feel less abrupt.

We could also imagine a new type of tool, bridging from a supportive tool for learning to an expressive tool for doing. This tool could be used by novices and experts alike. However, there are few models of software in other domains that manage this balance. Considering the barriers (both technological and philosophical), an easier way to imagine bridging the gap is to consider tools for learning and for doing statistics ‘reaching across’ the gap, aiming to take good qualities from their counterparts across the divide.

1.3. Evaluating currently-existing tools

In order to begin thinking about how to develop new tools or improve existing ones, we must consider the strengths and weaknesses of the currently-existing landscape of tools.
McNamara (2016) provides a critical framework through which to evaluate tools, laying out a set of 10 attributes necessary for a modern statistical computing tool. These attributes are

1. Accessibility
2. Easy entry for novice users
3. Data as a first-order persistent object
4. Support for a cycle of exploratory and confirmatory analysis
5. Flexible plot creation
6. Support for randomization throughout
7. Interactivity at every level
8. Inherent documentation
9. Simple support for narrative, publishing, and reproducibility
10. Flexibility to build extensions

As we consider tools, these attributes will be invoked.

1.4. Types of Currently-Existing Tools

In this paper, we will attempt to consider the current landscape of available statistical tools, from the prosaic (Excel) to the bespoke (Wrangler, Lyra). Keeping in mind the gap between tools for learning and tools for doing statistics, and the attributes listed in Section 1.3, we will attempt to assess the field of existing technology for learning and doing statistics.

In statistics education, a distinction is often made between route-type and landscape-type tools for statistical computing (Bakker 2002). Route-type tools drive the trajectory of learning, while landscape-type tools are less directive, and allow users to explore the field of possibilities themselves (Bakker 2002). We will consider both route-type tools (e.g., applets, which only allow for one concept to be explored) and landscape-type tools (e.g., standalone educational software and statistical programming tools).

Drawing on the categories of technology proposed by the GAISE reports (Aliaga et al. 2005; Carver et al. 2016) and the distinctions offered by Biehler (1997), we will consider the following types of technology:

• Graphing calculators
• Spreadsheets
• Applets and microworlds
• Standalone educational software
• Statistical programming tools
• Tools for reproducible research
• Bespoke tools

2. GRAPHING CALCULATORS

In statistics, there are some educators who believe the mathematical underpinnings of statistics are sufficient to provide novices with an intuitive understanding of the discipline. Because of this, they do not find it imperative that statistics education be accompanied by computation.

One example to consider is the Advanced Placement Statistics course and associated exam. The AP Statistics teacher guide states “students are expected to use technological tools throughout the course,” but goes on to say technological tools in this context are primarily graphing calculators (Legacy 2008). Instead of building in computer work, the guide suggests exposing students to computer output so they can learn to interpret it. In this paradigm, students learn basic concepts about sampling, distributions, and variability, and work through formulas by hand. They use calculators to assist with their arithmetic calculations.

Calculators should not be considered appropriate tools for statistical computation. In particular, they make it impossible to work with real data, to say nothing of providing data ‘as a first-order persistent object’ (McNamara 2016). Additionally, the analysis that is produced is not reproducible, and the ‘computation’ does not help students develop a deeper understanding of the underlying concepts.

3. SPREADSHEETS

Spreadsheet software like Excel is probably the most commonly used tool for data analysis by people across a broad swath of use-cases (Bryan 2016).
While the Microsoft Office Suite that contains Excel can be expensive, the free availability of spreadsheet tools like Google Sheets and Calc, the OpenOffice analogue of Excel, means spreadsheets can be considered to satisfy the attribute of accessibility (McNamara 2016).

However, spreadsheets lack the functionality to be a true tool for statistical programming. They typically allow for only limited scripting, which means their capabilities are limited to those built in by the development company. The locked-in nature of the functionality means they are only able to provide a limited number of analysis and visualization methods, and cannot be flexible enough to allow for true creativity. They fail at providing ‘flexible plot creation,’ or the ‘flexibility to build extensions’ (McNamara 2016).

Beyond this, several high-profile cases of academic paper retraction have been based on internal errors within Excel (Herndon et al. 2013). Because the underlying code is closed-source, Excel does not allow users to view how methods are implemented, which means it is very difficult for an individual to assess the validity of the internal code. Some dedicated researchers have tested Excel’s statistical validity over every software version Microsoft has released. Not only is every version flawed, but even with specific attention drawn to the problem, Microsoft often either fails to repair the problem, or makes a change to another flawed version (McCullough and Heiser 2008; Melard 2014).

Additionally, spreadsheets do not privilege data as a ‘first-order, persistent object’ (McNamara 2016). Once a data file is open, modification or deletion of data values is just a click away. In this paradigm, the sanctity of data is not preserved, and original data can be lost forever. In contrast, most statistical tools discourage direct manipulation of original data.
In tools used by practitioners to do statistical analysis (e.g., R, SAS software), data is an almost sacred object, and users are given only a copy of the data to work with.

Data does not have structural integrity in a spreadsheet. Data values sit next to blocks of text, and plots produced from the data cover up data cells. Everything is included on one canvas. These pieces may be linked together, but there is no explicit visual connection. In a true statistical tool, results from the analysis are separated from the data from which they were derived, and any data cleaning tasks performed in these tools can be easily documented.

This leads to the largest challenge with spreadsheets: they do not support reproducibility (McNamara 2016). Data journalists have historically done analysis using tools like Excel (Plaue and Cook 2015). Journalists must be careful about the analysis they publish, as it must be verified like any other ‘source’ they might interview. Spreadsheets do not offer any inherent documentation. As a result, journalists developed their own reproducibility documentation, often called a ‘data diary.’ The data diary typically takes the form of a document written in parallel with the analysis that describes all the steps taken. This supplementary document is produced separately, either by hand or in word processing software like Microsoft Word. There are ways to improve this process (Wilson et al. 2016), but it is still inherently precarious.

Because each stage of analysis in a spreadsheet is done by clicking and dragging, there is no way to fully record all the actions taken. One of the central tenets of reproducibility is that it should be possible to perform the same analysis on slightly different data (e.g., from a different year). Spreadsheets do not make this possible, so they are not effective tools for data analysis.

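
To make the contrast concrete, here is a minimal sketch in Python of the scripted alternative: every step of the analysis is recorded as code, the raw data is only read, and the same steps can be rerun on a second file. The data values, the column name `rate`, and the function names `summarize` and `analyze` are hypothetical, chosen only for illustration.

```python
import csv
import io
import statistics


def summarize(rows, column):
    """Return count, mean, and standard deviation of one numeric column.

    The rows are only read, never modified, so the original data stays
    intact no matter how often the analysis is run.
    """
    # Skip blank entries instead of editing them away in the source data.
    values = [float(row[column]) for row in rows if row[column] != ""]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def analyze(csv_text, column):
    """Parse CSV text and run the same recorded steps on it."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return summarize(rows, column)


# Two made-up data sets with the same layout, standing in for two years.
DATA_2015 = "state,rate\nA,4.0\nB,6.0\nC,8.0\n"
DATA_2016 = "state,rate\nA,5.0\nB,7.0\nC,\nD,9.0\n"

for label, text in [("2015", DATA_2015), ("2016", DATA_2016)]:
    print(label, analyze(text, "rate"))
```

Because the steps live in a function rather than in a sequence of clicks, rerunning the analysis on another year's data is a one-line change, and the script itself plays the role a data diary tries to reconstruct after the fact.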
However, spreadsheets have some clear advantages. The first is their great market saturation, as mentioned above. Almost everyone has access to some sort of spreadsheet program, and many levels of schooling offer spreadsheet training.

The second is that because output and plots are created by linking cells of the data to attributes in the analysis, the final product is reactive. When you change a value in the spreadsheet, all the associated graphs update automatically. Computer scientist Alan Kay believes computer operating systems should essentially be spreadsheets, by which he means an operating system should be a reactive programming environment that can be built up into responsive tools to perform a wide variety of tasks (Kay 1984). In this paradigm, objects can be linked together in a dependent structure, and whenever an input is changed, all the downstream elements are updated accordingly.

Again, because so many people have access to spreadsheet tools, the interactive product can be easily shared with others. Although the dependent structures are not made visible in the spreadsheet, readers of the product can get a sense of the connections by playing with the data.

Of course, the reactive possibilities in spreadsheets can also lead to unintended consequences. In a study of spreadsheets used by Enron, researchers found 24% of spreadsheets with a formula included an error (Hermans and Murphy-Hill 2015). This is likely because while spreadsheets allow for reactive linking of cells, they do not visualize the reactive connections, and it can be easy to double a formula or include unintended cells. The reactive programming environment Shiny showcases some of the capabilities of this paradigm in a more reproducible data analysis environment (Chang et al. 2015).

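
The reactive paradigm described above can be illustrated with a small Python sketch. It is a toy model under simplifying assumptions, not how any spreadsheet or Shiny is actually implemented; the `Cell` class, the `edges` helper, and the cell names are all hypothetical.

```python
class Cell:
    """A spreadsheet-like cell: a plain value or a formula over other cells."""

    def __init__(self, name, value=None, formula=None, inputs=()):
        self.name = name
        self.value = value
        self.formula = formula
        self.inputs = list(inputs)
        self.dependents = []  # cells whose formulas read this cell
        for cell in self.inputs:
            cell.dependents.append(self)
        if formula is not None:
            self._recompute()

    def _recompute(self):
        """Re-evaluate this formula, then push the change downstream.

        Naive propagation: a cell reachable by two paths is recomputed
        twice. Real reactive systems order their updates more carefully.
        """
        self.value = self.formula(*(cell.value for cell in self.inputs))
        for cell in self.dependents:
            cell._recompute()

    def set(self, value):
        """Change an input cell; every downstream formula updates."""
        if self.formula is not None:
            raise ValueError(f"{self.name} is a formula cell")
        self.value = value
        for cell in self.dependents:
            cell._recompute()


def edges(cells):
    """List (input, dependent) pairs: the links a spreadsheet keeps hidden."""
    return [(c.name, d.name) for c in cells for d in c.dependents]


price = Cell("price", 2)
quantity = Cell("quantity", 3)
total = Cell("total", formula=lambda p, q: p * q, inputs=[price, quantity])
doubled = Cell("doubled", formula=lambda t: 2 * t, inputs=[total])

print(total.value, doubled.value)  # 6 12
price.set(10)                      # change one input
print(total.value, doubled.value)  # 30 60
print(edges([price, quantity, total, doubled]))
```

Changing `price` updates both downstream cells without any further calls. The `edges` helper makes the dependency structure explicit, which is the piece of information spreadsheets do not visualize.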
Overall, while spreadsheets are widely accessible and have reactive capabilities, they should not be considered true tools for statistical programming. This is because they do not privilege data, making it easy to accidentally (or intentionally) manipulate the original values, and they contain errors in their closed-source code.

4. APPLETS AND MICROWORLDS

4.1. Applets

Statistics applets are typically hosted on the web, and are programmed to illustrate one concept through the use of a specialized interactive web tool. They are highly accessible because they are hosted online, and they are typically free to use. Biehler would consider these ‘microworlds’: tools that allow instructors to curate a limited list of functions for students to use.

Some of the best applets were designed by statistics educators Allan Rossman and Beth Chance (Chance and Rossman 2006). One of their applets allows students to discover randomization by working through a scenario about randomly assigning babies at the hospital. The applet asks the question: if we randomly assign four babies to four homes, how often do they end up in the home to which they belong? The user can watch the randomization happen once, as a stork flies across the screen to deliver the babies to their color-coded homes, and then accelerate the illustration to see what the distribution would look like if one tried the same experiment many times. Students can use checkboxes to turn the animation on or off, or to see the theoretical probabilities. They can also try again with a different number of babies or a different number of trials.

Another popular applet set called StatKey was developed by the Lock5 group to accompany their textbook (Lock et al. 2012; Morgan et al. 2014). One StatKey applet allows students to create a bootstrap confidence interval for a mean.
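
As a rough sketch of the computations behind these two applets (not the applets' actual code), the following Python simulates the baby-assignment scenario and builds a percentile bootstrap interval for a mean. The function names, the sample data, and the choice of the percentile method are assumptions made for illustration.

```python
import random
import statistics


def count_matches(n_babies, rng):
    """Randomly assign babies to homes; count how many land in their own."""
    homes = list(range(n_babies))
    rng.shuffle(homes)
    return sum(1 for baby, home in enumerate(homes) if baby == home)


def simulate_matches(n_babies=4, trials=10_000, seed=1):
    """Estimate the probability of 0, 1, ..., n_babies correct deliveries."""
    rng = random.Random(seed)
    counts = [0] * (n_babies + 1)
    for _ in range(trials):
        counts[count_matches(n_babies, rng)] += 1
    return [count / trials for count in counts]


def bootstrap_mean_ci(data, n_samples=1000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean of data."""
    rng = random.Random(seed)
    # Resample with replacement, keeping the original sample size.
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_samples)
    )
    tail = (1 - level) / 2
    lower = means[round(tail * n_samples)]
    upper = means[round((1 - tail) * n_samples) - 1]
    return lower, upper


# Theory for four babies: 9/24, 8/24, 6/24, 0, and 1/24.
print(simulate_matches())

SAMPLE = [2, 4, 4, 5, 7, 9]  # made-up data
print(bootstrap_mean_ci(SAMPLE))
```

Changing `n_babies`, `trials`, or `n_samples` mirrors the options the applets expose. StatKey wraps the same kind of resampling loop in a point-and-click interface.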
This applet does not include an animation like the stork featured in the Rossman-Chance example, but users can still specify how many samples they want to take, stepping through one sample at a time, or accelerating the process by clicking the “generate 1000 samples” button.

Applets can be useful for students to learn distinct concepts like randomization, but they can also be frustrating when students want to do things just outside the scope of the applet. The Rossman and Chance applets include many concepts, but very few of them let users import their own data. The StatKey applets do allow users to edit the example data sets or upload entirely new data, but they are necessarily limited to what they were programmed to do. In other words, they fail on the attributes of the ‘flexibility to build extensions’ and ‘support for a cycle of exploratory and confirmatory analysis’ (McNamara 2016).

StatCrunch is another popular tool used by educators. It was developed by Webster West, and it combines features from standalone software packages like TinkerPlots and Fathom alongside spreadsheet-like functionality and ‘microworld’-style applets. StatCrunch is inexpensive and available through the web browser, so it is accessible. It was initially developed in the late 1990s as a Java applet (West et al. 2004), but has since been reworked into a modern web tool (likely using JavaScript) distributed by Pearson Education.

While StatCrunch does collect a lot of the best features of the tools it amalgamates, it also accumulates many of the negatives. For example, the lack of data sanctity mentioned in the spreadsheets section is certainly true here, as is the messy canvas associated with both spreadsheets and software like TinkerPlots and Fathom.

4.2. Interactive Data Visualizations

Interactive data visualizations are gaining popularity on the web.
The New York Times produces some especially salient examples. Instead of static graphics, their visualizations allow readers to manipulate representations of data themselves. For example, the Times has produced graphics allowing readers to balance the federal budget, predict which way states will vote in the presidential election, or assess whether they would save more money by buying or renting their housing (Carter et al. 2010; New York Times 2012; Bostock et al. 2014).

These data visualizations are essentially applets or microworlds. They allow a user to learn about one particular facet of a dataset the author has made available. This scripted quality is actually valued in data visualization, because visualizations should provide some context and storytelling for the data, rather than simply leaving users to explore (Cairo 2013). But the script can also be limiting. Data visualizations can serve much the same purpose as applets (helping users understand one particular concept) but have the same drawbacks (inflexibility).

Some interactive visualizations have been pushing the boundaries on this inflexibility, allowing readers to critique the creation process or algorithmic decisions. One notable example is the IEEE Spectrum rating of programming languages. The article provides a default ranking, but it allows readers to create a custom ranking by adjusting the weights of all the data inputs (Cass et al. 2014). It is possible to imagine a future where all journalistic products based on data are accompanied by this type of auditable representation of the process used to create them.

4.3. Shiny and manipulate

Recent additions to this ecosystem are the R packages Shiny and manipulate, which would likely be termed “meta-tools” by Biehler, enabling teachers to “adapt and modify material and software for their students” (Biehler 1997).
(R and its package system are discussed in more detail in Section 7.2.) Shiny enables R programmers to create interactive visualizations for the web (Chang et al. 2015). Authoring Shiny apps is a task for more expert R users, but the resulting applets are similar to those described in Section 4.1, so they can be useful teaching tools for novices to play with. The applets are reactive, which puts some burden on the programmer but means that they are highly interactive.

Shiny has enabled R programmers to build interactive tools that have gained viral success, such as the dialect map published by the New York Times that eventually received more views than any article in the history of the paper (Katz and Andrews 2013; Leonhardt 2014). There are also many examples of applets developed specifically for education (Çetinkaya-Rundel 2014).

Shiny supports interface features like sliders, radio buttons, checkboxes, and text input. Typically, though, the resulting visualizations are themselves static. The user cannot zoom into them naturally in the way they would with other interactive web graphics. Instead, the programmer would have to incorporate sliders for the x- and y-ranges, and the user would manipulate those to control the zoom. In other words, Shiny apps suffer from many of the same drawbacks as more traditional applets, although they are easier for instructors to develop.

A simpler R package with a similar idea is the manipulate package (Allaire 2014). manipulate is easier for novices to use, although it still requires some knowledge of R. However, instead of producing a standalone interactive graphic, manipulate works within RStudio to produce interaction for the purposes of education.

4.4. Reflections on applets and microworlds

While applets can become frustrating for students because of their limited scope, the array of well-considered statistics education applets is a rich source of inspiration for projects trying to bridge the gap. Applets tend to be the most successful at satisfying the attribute of ‘interactivity at every level,’ although they are rarely flexible enough to build extensions (McNamara 2016). If statistical programming tools could aim for some of the interactivity and animation of applets, they might be better understood by all users.

Shiny also shows promise, because it makes it simpler for educators to create their own applets to illustrate new concepts. Ideally, it should be possible for anyone to create an interactive data product, not just those who have R or JavaScript skills.

5. STANDALONE EDUCATIONAL SOFTWARE

The field of standalone educational software is dominated by sibling software packages TinkerPlots and Fathom. Although computers were being used in introductory statistics classes before Biehler’s 1997 paper, it seems clear there was a turning point after it was published. Fathom and TinkerPlots, designed by William Finzer and Cliff Konold, respectively, can both trace their origins to “Software for Learning and for Doing Statistics” and have realized much of Biehler’s vision (Biehler 1997).

TinkerPlots and Fathom have been around for years, and are well-loved but becoming outdated. Two more recent developments in the field of standalone educational software are CODAP (a new project by Finzer and Konold) and iNZight. We’ll talk about each of these in turn.

5.1. TinkerPlots and Fathom

The authors of Fathom and TinkerPlots wanted to design tools relevant to the way students think. The two have similar functionality, although slightly different intended users.
TinkerPlots is described as being appropriate for students from 4th grade up to university, and Fathom is directed at the secondary school and introductory college levels. Because Fathom is intended for slightly older users, it includes more features than TinkerPlots does.

Both TinkerPlots and Fathom are excellent tools for novices to use when learning statistics. They comply with nearly all the specifications outlined by Biehler (1997), allowing for flexible plotting, providing a low threshold, and encouraging play and re-randomization. They allow students to jump right in, to perform exploratory data analysis and to move through a data analytic cycle (e.g., asking questions, trying to answer them, re-forming questions), and have been shown to enhance student understanding (Watson and Donne 2009).

Fathom was developed by William Finzer, based on principles from Biehler (1997), and intended to allow students to play with statistical concepts in a more creative way. The design specs upon which Fathom is based include a focus on resampling, a belief there should be no modal dialog boxes, the location of controls outside the document proper, and animations to illustrate what is happening (Finzer 2002).

TinkerPlots was designed by a team led by Clifford Konold, a psychologist focused on statistics education (Konold and Miller 2005). TinkerPlots was built on Fathom’s infrastructure, but designed for younger students. TinkerPlots was developed the same year as the initial GAISE report (Franklin et al. 2005), and the connection between the cognitive tasks TinkerPlots makes possible and the A and B levels of the guidelines is clear. TinkerPlots includes probability modeling, but no standard statistical models (e.g., linear regression).
Users can develop their own simulations and link components together to see how changing elements in one area will impact the outcome somewhere else.

TinkerPlots and Fathom have a large market share when it comes to teaching introductory statistics in the K-12 context (Lehrer 2007; Garfield and Ben-Zvi 2008; Konold and Kazak 2008; Watson and Fitzallen 2010; Biehler et al. 2013; Finzer 2013; Fitzallen 2013; Mathews et al. 2013), at the introductory college level (Ben-Zvi 2000; Garfield et al. 2002; Everson et al. 2008), and in training for teachers (Rubin 2002; Biehler 2003; Gould and Peck 2004; Hammerman and Rubin 2004; Rubin et al. 2006; Hall 2008; Pfannkuch and Ben-Zvi 2011). Beyond their design principles, both tools were popular for their reasonable pricing strategy, which made it possible for schools to afford licenses. They are accessible in terms of pricing and compatibility with many platforms, but they may not be usable by students with disabilities (McNamara 2016).

For educators who want to teach concepts like randomization and data-driven inference, the primary competitors at this level are applets. TinkerPlots and Fathom have a number of advantages over applets. Most importantly, TinkerPlots and Fathom allow students to use whatever data they want, rather than demonstrating with one locked-in data set. The systems come with pre-loaded data sets, but it is easy to open other data and use it in the same way.

Fathom was developed in 2002, and TinkerPlots in 2005. In the 10 years since their respective releases, statistical programming has moved forward in ways these packages have not. For example, while few would expect novices to be working with ‘big data’ in the truest sense of the term, TinkerPlots can only deal with data up to a certain size.
A trial using a dataset with 12,000 observations and 20 variables caused considerable slowing, while larger datasets caused the program to hang indefinitely. Fathom dealt with the same data much more easily, but still had a noticeable delay loading and manipulating the data.

While both software packages allow for the inclusion of text in the workspace, there is no way to develop a data analysis narrative. They do not ‘support narrative, publishing, and reproducibility’ (McNamara 2016). The more free-form workspace can feel creative, but it makes it nearly impossible to reproduce analysis, even using an existing file. There is also no easy way to publish results from these programs. The proprietary file types (.tp for TinkerPlots and .ftm for Fathom) need the associated software in order to be run interactively, and the only way to produce something viewable without the application is to print the screen.

Because the software is closed-source, neither TinkerPlots nor Fathom is extensible in any way. There is no ‘flexibility to build extensions.’ What you see is what you get. This becomes particularly problematic when it comes to modern modeling techniques. For example, in the Introduction to Data Science class developed for high school students through the Mobilize grant, students use classification and regression trees, and perform k-means clustering (Gould et al. 2015). Those methods are not available in either software package, and cannot be added.

In fact, TinkerPlots has no standard statistical models, which means it cannot be used for the full data analytic cycle. It is truly only a tool for learning. Fathom, which is designed for slightly older students, does provide limited modeling functionality in the form of simple linear regression and multiple regression.

In the context of Clifford Konold’s argument that tools for learning should be completely separate from tools for doing (Konold 2007), it makes sense there are limits to these tools. They were consciously designed to be separate. However, given the capabilities of modern computing, it should be possible to provide this ground-up entry while still supporting more extensibility.

5.2. CODAP

Both William Finzer and Cliff Konold are involved with the development of the Common Online Data Analysis Platform (CODAP). CODAP promises to be a more modern, web-based take on the principles that drove the development of Fathom and TinkerPlots (The Concord Consortium 2016).

While CODAP is still in the initial stages of development, it shows promise. Several partnerships with data-generating groups have formed initial test cases for CODAP. One partner is the group Ocean Tracks, whose goal is to involve high school students in marine biology (The Concord Consortium 2016). Several other partnerships are in progress as well, and the CODAP team hopes to learn what features are most useful for a platform by generalizing from these specific examples.

Because CODAP is web-based, it will be more accessible than tools like Fathom and will also make it easier to share and publish results of analysis. Only time will tell if it will support reproducibility and extensibility.

5.3. iNZight

Another recent development in the category of standalone educational software is the microworld iNZight. Underlying iNZight is R, but unlike other graphical user interfaces (GUIs) for R (discussed in more detail in Section 7.2.2), the goal of iNZight is not for students to learn R. Instead, the designers of iNZight have used R as a target language for the development of a free, interactive statistical tool (Wild and Elliott 2016). iNZight is launched from within R, but launches its own GUI window.
Unlike the R GUIs in Section 7.2.2, iNZight does not generate R code associated with the actions taken in the GUI. The tool has been designed to make it possible to do a variety of data analytic actions associated with the statistics curriculum in New Zealand. While it is somewhat awkward to have to launch an application from another application that will not be explicitly used, iNZight otherwise delivers on being a lightweight, open-source, graphical approach to statistics. It is accessible, extensible, and interactive (McNamara 2016).

5.4. Reflections on standalone educational software

The field of standalone educational software for statistics has long been dominated by TinkerPlots and Fathom. These tools revolutionized the way statistics could be done by novices, and realized Rolf Biehler’s 1997 vision. However, as computers and data have changed, they are becoming outdated. Luckily, both Finzer and Konold are involved with the Common Online Data Analysis Platform (CODAP) project, which has the potential to modernize the best aspects of the tools. Another promising development is iNZight, a microworld built on R, without exposing students to syntax.

In the context of McNamara (2016), standalone educational software exemplifies many of the attributes. Because of their low price, these tools can be considered at least mostly accessible, although they do not offer accessibility support for disabled users. They certainly ease entry, and support exploratory and confirmatory analysis. They provide flexible plotting and make it simple to use randomization. They are interactive and visual.

But, they are not appropriate tools to use to actually ‘do’ data analysis. Data is not a persistent object in these tools, as cases can be easily changed with no record made.
In the same vein, they do not support reproducible research, narrative, or publishing (to view a Fathom or TinkerPlots document, a reader must have the software installed on their computer). They are also closed-source and cannot be extended. Again, CODAP is aiming to solve some of these issues. Because it is open-source and on the web, documents will be accessible to anyone with internet access. However, educators choosing to use these tools should consider the tradeoffs between standalone educational software and professional tools.

6. DATA DESK

Data Desk has been given its own section in this text because, while it was considered to be standalone statistics education software by Rolf Biehler, it is also included in a history of statistical programming tools by Jan De Leeuw (De Leeuw 2009). It is the only program to garner this broad acceptance across a variety of use-cases.

Data Desk was developed by Paul Velleman (a student of John Tukey) to facilitate Tukey’s exploratory data analysis (Velleman 1989). It represents one of the first uses of linked visualization (Wills 2008). The first version of Data Desk was introduced in 1985, and although most of what persists from that version are screenshots, it is easy to see how groundbreaking it must have been at the time.

When Biehler mentions Data Desk in his 1997 paper, development had been underway for more than 10 years (Biehler 1997). In contrast, almost all other tools used for teaching statistics came after the 1997 paper. Amazingly, Data Desk has gone through seven versions since 1985, and still exists today. The only other tool that might be able to claim such a long history is JMP (described in Section 7.1), although JMP did not have interactive graphics until 1991 (Best and Morganstein 1991).
The Data Desk interface was clearly an inspiration for TinkerPlots and Fathom, and features a palette of tools as well as menu bars. However, Data Desk provides much richer functionality than either TinkerPlots or Fathom, including linear and nonlinear models, cluster analysis, and principal component analysis.

The drawbacks of Data Desk are slight, and similar to those of TinkerPlots and Fathom. After 30 years, the interface looks outdated, and it does not ‘support narrative, publishing, and reproducibility’ because it does not promote the inclusion of text and only static versions of analysis can be shared (McNamara 2016). However, these drawbacks notwithstanding, it is a highly inspirational tool.

7. STATISTICAL PROGRAMMING TOOLS

Over time, there has been a movement toward students as true ‘creators’ of computational statistical work, which requires the use of a statistical programming tool. Deb Nolan and Duncan Temple Lang argue well for this in their paper, “Computing in the statistics curriculum,” where they suggest students should “compute with data in the practice of statistics” (Nolan and Temple Lang 2010). They are promoting R, although their recommendations could be achieved using a different language. Many colleges and universities are modifying their statistics courses to fall in line with Nolan and Temple Lang’s recommendations.

Statistical programming tools are those that can be used by statistical practitioners to do data analysis. Common examples include SAS software, SPSS software, Stata, R, Python, and Julia. The first three we will consider together, because these are enterprise software tools that can be prohibitively expensive (a major barrier to use in education). The last three (all programming languages) will be considered as open-source alternatives that are more ‘accessible’ (McNamara 2016).

7.1. SAS software, Stata software, SPSS, and JMP

Some commonly used tools for doing statistical analysis are SAS software, Stata software, SPSS, and JMP (SAS Institute Inc 2015; StataCorp 2015; IBM Corp 2013; SAS Institute 2012). Interestingly, although most users refer to these tools by shorthand names (e.g., ‘SAS’), their official names include the word ‘software’ (e.g., ‘SAS software’).

All three tools are standalone software, and all combine elements of graphical user interfaces with command-line tools. They are used in a variety of disciplinary contexts, so the argument for teaching them is ‘students will need to use this in the future.’ They are often popular in industry, because they come with guarantees of validity and technical support, and they are designed for work with big data. However, they are closed-source, expensive, and can be unintuitive for novices.

SAS software and Stata software are perhaps the most similar of these three packages. Their interfaces are visually similar, and they have similar benefits and drawbacks.

SAS software was first introduced in the 1970s (De Leeuw 2009), when data was read in on punch cards. Because each card could only hold a small amount of information, the system necessarily had to be good at distributed computing. This functionality has scaled well over the years, and as a result SAS software is very powerful for working with big data. SAS software is often used in pharmaceutical and business applications because it comes with a guarantee of accuracy.

SAS software has a command line interface (CLI) because of its long history, and the fact it was developed before personal computers were commonplace. For many years, it was difficult to re-use results from analysis as data, but the development of the SAS Output Delivery System (ODS) in the late 1990s changed that (Bryant et al. 2000).
The CLI and the ability to use results as data support reproducibility, which is discussed in more depth in Section 8.3.

The main drawback to SAS software is its price. The company makes the software free for educational use, both as desktop software and via the cloud, so students can access it via a web browser (which makes it more accessible). But, SAS software is hugely expensive for corporate use, in part because of the guarantee of accuracy and included support. Business pricing is not available on their website. Instead, users must submit a request for a quote.

Stata software was first released in the 1980s, and initially only had a command line interface (De Leeuw 2009). It is often the tool of choice for economists and, therefore, for introductory statistics courses taught in economics departments. Since 2003, it has included a graphical user interface (GUI) in addition to the CLI (De Leeuw 2009). The CLI can be used to create analyses that can be re-run to get the same results, so it does support reproducible research. The tools that support reproducibility in Stata software are discussed in more depth in Section 8.3. Stata’s user interface is often thought of as more user-friendly than that of SAS. Software support is available in the system or via a phone number users can call to get personalized help.

Stata software does provide the ‘flexibility to build extensions’ (McNamara 2016), and it has an archive of contributed code from users (De Leeuw 2009).

Like SAS software, the drawback to Stata software is its price. Although the company has reduced the price of almost all their products, they can still be cost-prohibitive for students and academic institutions. As of 2016, individual student pricing was $125 for an annual license to the version that works with moderate-sized data or $198 for a perpetual license.
A single business license costs $595 per year or $1,195 for a perpetual license. The version of Stata software that works with large datasets (up to 10,998 variables) comes at an additional cost. The company does offer group discounts, but these are also expensive. Instructors using Stata software have complained that the licenses for educational use are often so limited their students must double up on computers to avoid getting kicked off the system.

Similar to SAS software and Stata software, SPSS is another corporate tool for statistics. It is typically used by social scientists and is much more focused on a menu-driven interface than SAS software and Stata software. SPSS does have a proprietary command-line syntax, but the syntax is harder for humans to parse and the code is generally only created by copying and pasting, rather than by users writing code themselves (Academic Technology Services 2013). Although code can be saved, SPSS does not support reproducible research in the sense of literate programming or dynamic documents. SPSS is also very expensive: $1,170-$7,820 for a 12-month individual license (pricing depending on features included), or $69.99-$89.99 for a one-year student license (plus $4.99 download fee).

The final product in this category is another offering from SAS Institute, called JMP (SAS Institute 2012). JMP is often used in an educational setting, and provides a drag-and-drop, menu-driven graphical user interface to SAS software. Like Data Desk, discussed in Section 6, JMP was originally designed in the 1980s and provides many features useful for novices, like interactive brushing and linking, generalizable data cleaning, and visual model support. The backbone of JMP is SAS software, so the analysis done in JMP can be considered to be reproducible, but JMP provides a simple visual interface.
JMP provides many of the features of software for learning statistics with the power of a tool for really doing statistics.

Like TinkerPlots and Fathom, while JMP does produce interactive graphics within an individual session, these interactive results cannot be exported. Instead, a work session can be printed or pasted into a document. The student version of JMP does not support exporting graphics, but individual licenses do. JMP provides a lot of inspiration for what an interactive statistical programming tool could look like, particularly one coupled with reproducible results. However, once again JMP is expensive ($1,620 for an individual and $14,900 for the professional version), although academic discounts are offered: $49.95 for a 12-month license for undergraduate and graduate students.

Although SAS software, Stata software, and SPSS are commonly used in industry, none of them seem supportive of learners. They all provide only specific types of graphics (failing to provide ‘flexible plot creation’), and most work is done using menus and wizards, so they do not make clear what the tool is actually doing (no ‘inherent documentation’) (McNamara 2016). These tools create ‘users’ rather than ‘creators’ of statistics. All three tools obscure the underlying computational processes and reduce statistical procedures to button clicks. They all provide some capability of extending the software with scripting, but they all suffer from a lack of transparency about how internal routines were coded. Finally, their pricing is prohibitive for many use-cases. JMP is the most inspirational of the group, providing graphical methods to interact with analysis as well as reproducible code in the SAS software language.

7.2. R

R is a programming language for statistical computing (R Core Team 2016).
It is the tool of choice of academic statisticians, and has a growing market outside academia (Vance 2009). Analysts at companies like Google routinely use R to perform exploratory data analysis and make models.

R has several advantages over the other professional tools we have discussed. First, it is free. When members of the open source community use the word “free” they often distinguish between “free as in speech” and “free as in beer.” These phrases indicate the difference between software that costs no money (e.g., most Google products) and software that is completely unrestricted and available for anyone to modify and edit. R is free in both ways. It is also compatible with computer accessibility features, making it, for example, usable by blind people (Godfrey 2013). Because of this, it is one of the only tools that can be considered to be fully ‘accessible’ (McNamara 2016).

Like many programming languages, R has both a base language and additional libraries that extend its functionality, called packages. Most packages are hosted on a centralized server called the Comprehensive R Archive Network (CRAN) (R Core Team 2015). CRAN makes it simple for users to install new packages. Because R has the statistical community invested in it, and because it is open-source and easy to modify, there are many additional packages for R. As of this writing, CRAN hosts over 9,000 packages. Packages mosaic, dplyr and ggplot2 are discussed in Section 7.2.1, and Shiny was already discussed in Section 4.3.

Another great quality of R is that it makes it very difficult to modify original data (it privileges data as a ‘first-order, persistent object’). When working in R, a user is never interacting directly with the original data, but rather with a copy of the data that has been loaded into the work session.
R keeps a history of all commands that have been used in a session, making it simpler to follow the trail of actions from the original data to a cleaned version and final analysis. R facilitates reproducible research, as discussed in Section ??.

As with any tool, R has its shortcomings as well. The main drawback of R is its status as a programming language. Many of the other tools discussed here are graphical user interfaces (GUIs), while R is a language. Programming languages tend not to offer ‘easy entry’ because they require users to provide syntactically correct function calls with appropriate arguments, and are not flexible about things like capitalization and punctuation. On top of this standard hurdle, R has an inconsistent syntax, which can make it particularly hard to master.

There have been efforts to simplify the coding aspects of R over the years. Some of these efforts are curricular, reducing the number of commands to which novices are exposed, or providing more consistent syntax (Verzani 2005; Kaplan and Shoop 2013). Other efforts are GUIs like Deducer (Fellows 2012) and R Commander (Fox 2004). However, none of these efforts have truly solved the problem.

R syntaxes

One complex aspect of R is the multitude of syntaxes it supports. Where most programming languages would have one standard syntax, R has many. We will discuss three syntaxes that are commonly encountered, as well as how they fit into the attributes from McNamara (2016).

Historically, R has used the ‘dollar sign syntax,’ which uses the $ operator. For example, diamonds$color indicates the color variable within the diamonds dataset. This is often accompanied by subsetting using square brackets, as in diamonds[1, ] which would pull the first row of the data or diamonds[ ,1] which would pull the first column.
This syntax does not provide easy entry or inherent documentation, because the symbols do not hold prior meaning.

In educational settings, many teachers use the formula syntax, so named because it is most commonly found in functions performing modeling. Project MOSAIC and its associated R package, mosaic, promote the formula syntax, and mosaic rewrites summary statistics functions to follow the convention (Pruim et al. 2015a,b). The formula syntax uses a ~ operator, and instead of referring to variables within datasets, the user refers to the variables directly and then notes the dataset later. For example, tally(~color, data=diamonds) counts the number of diamonds of each color in the dataset.

By using the mosaic package, along with lattice graphics (Sarkar 2008), students can stay firmly within the formula-based syntax for an entire introductory college statistics course. At the high school level, the Mobilize project has also limited its scope to the formula syntax (Gould et al. 2015). Limiting the scope of a course to the formula syntax seems to ease entry, and make it clearer to users what the code does. It is still difficult for novices to understand the leading ~ in one-variable situations, but overall the formula syntax seems easier for users.

One newer syntax (which can be mixed in with either paradigm mentioned above) is the pipe. This operator, %>%, ‘pipes’ data from one function into another (Bache and Wickham 2014).
The pipe is most often used in conjunction with the so-called ‘tidyverse.’ The tidyverse, so named because it works with ‘tidy’ data (Wickham 2014a), includes the data ingestion package readr, data visualization package ggplot2, data manipulation packages dplyr, lubridate, stringr, and modeling package broom (Wickham and Francois 2015b; Wickham 2009; Wickham and Francois 2015a; Grolemund and Wickham 2011; Wickham 2016; Robinson 2016). Many of these packages were authored by Hadley Wickham, who has said he wants to build tools that allow users to easily express 90% of what they want to be able to do, while only losing 10% of the flexibility (Wickham 2014b). Staying within the particular syntax of the tidyverse is essentially using a domain-specific language for data analysis. Since the tidyverse is situated within the full-featured language of R, edge cases can be addressed with extensions as they are needed. An example of the tidyverse syntax with the pipe operator would be

    diamonds %>% group_by(color) %>% tally()

The pipe paradigm allows users to avoid writing dollar signs, so it is beginning to gain traction within the statistics education community.

From these descriptions of the various syntaxes of R, it should be clear that they conform to the attributes from McNamara (2016) to various degrees. The formula and pipe syntaxes have the easiest entry, and better support the cycle of exploratory and confirmatory analysis. Plots from the lattice package, using the formula syntax, make it more difficult to do flexible plot creation than base plots associated with the dollar sign syntax or ggplot2 plots.

Graphical User Interfaces and Integrated Development Environments for R

R work typically takes place at a command-line interface (CLI). In fact, R can be used directly at the command line or terminal.
However, there are some Graphical User Interfaces (GUIs) and Integrated Development Environments (IDEs) for R that help support users in their work. In contrast to the CLI paradigm that characterizes much of programming, GUIs and IDEs reduce cognitive load on users, either by allowing them to interact with computers through menus and buttons (in the case of a GUI) or by providing a source code editor with colored code, debugging support, code completion, and sometimes automated code refactoring (in the case of an IDE). Unlike standard source code editors (e.g., vi or Notepad++), IDEs provide additional support for programmers. A common example of a language-specific IDE is Eclipse. Other IDEs are language agnostic, like Visual Studio Code by Microsoft, or Sublime Text. The most common GUIs for R are R Commander and Deducer (Fox 2004; Fellows 2012), and the runaway winner in the IDE category is RStudio.

Both R Commander and Deducer produce R code when graphical elements of the interface are manipulated, but do not move any further in encouraging users to transition from users to doers of statistics. They do not facilitate play (as Fathom does) or develop computational thinking (as learning R does). The connection between actions taken with the menus and the resulting code is implicit rather than explicit, and there is little reason for users to manipulate the code.

R Commander was developed by John Fox as a way to use R in introductory statistics classes without students having to learn syntax (Fox 2004). It provides a limited set of possible tasks and provides a graphical user interface. While the interface is much more route-type than the landscape-type TinkerPlots and Fathom (Bakker 2002), R Commander also makes it possible to do summary statistics, graphics, and simple models.
Deducer is another GUI for R which, much like R Commander, provides access to some of R’s functionality through menus and wizards (Fellows 2012). The plot menus produce beautiful ggplot2 graphics (Wickham 2009), which is ideal for users because they are inclined to feel pride for having created something appealing. However, the resulting ggplot2 code printed into the console is too difficult for users to parse. Another detail is the automatically-created R code, which should ideally be non-threatening to encourage users to associate the actions they have taken in wizards with the resulting code. In Deducer, this auto-generated code is bright red. In many users’ minds, red signifies ‘error’ (Ellior et al. 2007), so users often initially think they have done something wrong.

While neither R Commander nor Deducer is ideal for teaching students computational thinking or facilitating play, they are freely available and work with a variety of systems.

Much more useful for learners (though less graphical) is RStudio, an IDE for R (RStudio Team 2014). Like other IDEs, RStudio colors code to make it easier to parse, provides code completion, and makes it easier to debug. RStudio can be run as a desktop application for Mac, Windows, or Linux, but it is also available as a server install. With a server distribution, users go to a website, log in, and find their RStudio session just how they left it. The interface looks nearly identical to the desktop version, but is contained within a browser window. All the data files and code are hosted on a central server, so students can do their work from any computer without having to worry about moving data from place to place. A server also allows instructors to manage package installations from a central location and provide quick bug fixes to all students at once.
Because of the simple access and management, many colleges use RStudio servers for their students. In particular, Smith College, Mount Holyoke College, Duke University, and Macalester College all use this arrangement (Baumer et al. 2014). At the high school level, the Mobilize project also used a server version to reduce startup friction for high school teachers and their students (Gould et al. 2015).

RStudio provides additional support features that improve on the standard R GUI. In particular, RStudio is a unified interface where windows cannot get ‘lost.’ It also provides visual cues to objects in the working environment, to installed packages, and to files in the working directory. It offers file management and comprehensive code history. The data preview functionality helps ease the transition from spreadsheet programs. And even in the most programming-oriented area, the Console, RStudio provides coding support features like tab completion and code hints, which increase the inherent documentation of R (McNamara 2016). RStudio has been used successfully in many introductory college statistics classes (Baumer et al. 2014; Muller and Kidd 2014; Pruim et al. 2014; Horton et al. 2014) and with high school teachers and students through the Mobilize Project (Gould et al. 2015). However, even though it lowers the barrier to entry for R, RStudio still requires users to code, so there is a startup cost associated with using it.

7.3. Python

Python is a general-purpose programming language that provides support for statistics through the packages pandas, NumPy, and matplotlib (McKinney 2012). It does not have the same statistical programming community that R does, but does have a large community considering reproducible research, discussed in more depth in Section 8.2.
Computer science education research has shown that Python is easier for novices to learn than Java, which has led many computer science departments to switch the language used in their introductory class to promote access (Guo 2014; Alvarado et al. 2012; Ranum et al. 2006). However, the language features that make Python easier to learn also apply to R, because they are both interpreted languages with low startup costs. While Python is clearly advantageous in the introductory computer science context, it is not clear whether it is the appropriate tool for introductory statistics.

7.4. Julia

Another programming language specifically designed for statistical computing is Julia. Unlike R, Julia is being built from programming language principles, so the authors hope to avoid many of the pitfalls R has run into over the years (Bezanson et al. 2015).

Another principle Julia is based on is the idea that the language used by a Julia developer should be the same as the language used by a Julia user (Bezanson et al. 2015). In some scientific computing languages, the underlying code is written in another, faster language, like FORTRAN or C++. R falls into this category, as many of its faster routines are written in lower-level languages. Julia’s single-language approach is akin to the ‘flexibility to build extensions’ principle taken to the extreme (McNamara 2016).

The distinctions between R and Julia have not been fully fleshed out yet. Julia is a new language, still under active development. Of course, this means that it does not have the user community that R has gathered over the course of 20+ years. However, Julia seems more aimed at computer scientists and programmers, rather than simply people with data problems looking to answer questions.

7.5. Reflections on statistical programming tools

Statistical programming tools are good for, simply, statistical programming.
In this category we contrasted more GUI-driven tools like Stata software, SAS software, SPSS, and JMP, and programming languages R, Python and Julia.

While SAS software, Stata software, SPSS and JMP are all GUI-driven, they all have underlying code which can be used to script analysis. These tools are used by people in a variety of domain areas for solving statistical problems. Because of their prohibitive price, they tend to be used by corporations and in industry. Considering the 10 attributes from McNamara (2016), they are not accessible, they often do not support easy entry, but they do privilege data as a first-order, persistent object. They support the cycle of exploratory and confirmatory analysis as well as any current tools do. While it can be challenging, they offer the possibility to do randomization, as well as to produce reproducible reports. Since they are scriptable, they offer the flexibility to build extensions. They are not good for interactivity (with the exception of JMP, the GUI to SAS software), nor inherent documentation.

Programming languages like R, Python and Julia have many of the same pros and cons as the more graphical software packages, although they are more accessible. The most popular choice of language for statistics courses is R, which has the power of the statistics community driving package development. Because it is a programming language, R can be hard to learn, but efforts like Project MOSAIC have been working to make it more accessible to novices. RStudio smooths out setup issues and provides a graphical view of data, as well as other friendly interface features. Python has been shown to be novice-friendly in introductory computer science classes, but it doesn’t have as much inherent support for statistical work. Julia is being written specifically for use in statistics, but it is so new it is hard to comment on.
All these professional statistical programming tools have the power to work with large, arbitrary data and can produce reproducible code (unlike applets and standalone educational software). However, their high startup costs make them somewhat less than ideal for a learning context.

8. TOOLS FOR REPRODUCIBLE RESEARCH

Reproducible research is a crucial element of statistics and statistics education. In the 2016 Guidelines for Assessment and Instruction in Statistics Education College Report, support for reproducible research is listed as one of the considerations for teachers when selecting tools (Carver et al. 2016).

Previously, tools for reproducible research were intimately tied to the statistical package being used for analysis. However, most tools are now open to a variety of languages or software, so we can consider these tools separately from the target languages they grew out of.

There are many views of reproducibility. In this section, we will be considering the narrowest view, that it should be possible to re-run an analysis using the same data and code to get the same result again. It is said that ‘data wrangling’ can take up to 80% (or more!) of the time in data projects (Kandel et al. 2011b), so it is important that effort not be wasted. Ideally, it should be possible to run the analysis using a slightly-modified version of the data (for example, the next year of data collection) to get analogous results.

Programming languages inherently support reproducible code. However, even with a script containing the analysis code or a history of all commands run, it is easy for code to become un-reproducible. Typically, this is because of human error: the code gets separated from the analysis, so it is not clear which parts of the code correspond to which plots and output in the paper, or the code becomes outdated and is not human-readable enough to debug.
To solve these issues, Donald Knuth proposed an idea of ‘literate programming’ where all programs would be accompanied by surrounding narrative to explain to humans what they were doing (Knuth 1984). Deborah Nolan and Duncan Temple Lang took this further, defining a dynamic document as one that is compiled and automatically includes the results of embedded code (Nolan and Temple Lang 2007). When we consider reproducible research in education, we want students to be creating dynamic documents.

In the same paper, Nolan and Temple Lang define interactive documents (those that let a reader interact with components like graphics) (Nolan and Temple Lang 2007). Because McNamara (2016) includes the attribute ‘interactivity at every level’, and the importance of ‘publishing’ and ‘reproducibility’, we would like tools to produce dynamic-interactive graphics. While we could imagine more graphical tools supporting reproducibility, the tools considered here are all file formats that can be compiled in some way to create a finished product.

8.1. knitr and RMarkdown

The R community has long been committed to reproducibility, whether through simple script files containing analysis code or the R package Sweave (Leisch 2002). Sweave allowed users to combine text and mathematical notation written in the markup language LaTeX with R code into one source document. The source document could be processed to create a PDF document containing formatted text and math, R code, and output from R (for example, plots or numeric statistics). This system meant that the entire analysis writeup could be re-run to produce the same result. However, Sweave was fragile and difficult for novices to use. It has since been superseded by new developments. The knitr package by Yihui Xie has pushed the boundary even further (Xie 2014).
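
To make the notion of a dynamic document concrete, here is a minimal sketch of a compile step that runs embedded code and splices its results into the surrounding narrative. It is written in Python purely for illustration. The `<<code>>` and `<<end>>` delimiters and the function `compile_document` are made up for this sketch, and are not the syntax or API of Sweave, knitr, or any other tool.

```python
import contextlib
import io

# A made-up source format: lines between "<<code>>" and "<<end>>" are Python,
# everything else is narrative text. This is NOT Sweave or knitr syntax.
SOURCE = """\
We recorded five reaction times (in seconds).
<<code>>
times = [0.41, 0.38, 0.52, 0.47, 0.44]
print("mean:", round(sum(times) / len(times), 3))
<<end>>
The slowest trial is reported below.
<<code>>
print("max:", max(times))
<<end>>
"""


def compile_document(source):
    """Run each code chunk in order and splice its printed output into the text."""
    namespace = {}  # shared by all chunks, so later chunks see earlier results
    output, chunk, in_code = [], [], False
    for line in source.splitlines():
        if line == "<<code>>":
            in_code, chunk = True, []
        elif line == "<<end>>":
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec("\n".join(chunk), namespace)
            # Echo the code, then the results it printed.
            output.extend("    " + code_line for code_line in chunk)
            output.extend("    ## " + result for result in buffer.getvalue().splitlines())
            in_code = False
        elif in_code:
            chunk.append(line)
        else:
            output.append(line)
    return "\n".join(output)


report = compile_document(SOURCE)
print(report)
```

Changing a value in `SOURCE` and calling `compile_document` again regenerates every result, which is the property that makes such a document dynamic rather than a static write-up with pasted-in output.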
knitr does everything Sweave did, but more generally and robustly. Where Sweave was limited to text written in LaTeX and code written in R, knitr allows users to combine any type of code (Python, C++, etc.) with any textual format. The most common textual formats are LaTeX and Markdown. Markdown is much simpler than LaTeX, so this makes knitr more accessible to novices. The canonical examples of use are including R code in LaTeX (the functionality that supersedes Sweave) or R code in Markdown text (called RMarkdown), but the package is much more flexible (Xie 2014).

Users write text and code (delimited as such by a particular syntax depending on the textual format they are using), then ‘knit’ the source document to create a fully formatted HTML, PDF, or Word document. The output document has nicely formatted text, code with syntax highlighting, and all the results from the code, including numeric summaries and plots. If the user changes something in the source document, they have only to re-knit the document to see the updated results in their output document.

knitr functionality is available through any R session, but the most embedded support is through RStudio. A specialized version of Markdown has been written to incorporate R code, called RMarkdown. Users can either ‘knit’ a finished RMarkdown document to see their results, or execute code inline in a notebook setting much like the Jupyter notebooks in Section 8.2. Introductory statistics students can use RMarkdown to submit their homework or produce reproducible reports for final projects (Baumer et al. 2014).

RMarkdown documents can also include interactive graphics produced using the Shiny package, discussed in more detail in Section 4.3. In this way, proficient R programmers can create dynamic-interactive documents.
However, the code in knitted documents is static, so if readers want to interact with elements that were not programmed using Shiny they need the source file, either to modify and re-knit it or to use the notebook functionality to execute chunks inline.

Whatever the text markup language and programming language used, R and knitr support reproducible research. knitr makes it simpler to share analysis results in such a way that the same analysis can be easily run on new data, changing only one line in the source code and re-knitting the report to see the results.

8.2. Project Jupyter

Project Jupyter is focused on scientific computation and reproducibility (Pérez and Granger 2015; Ragan-Kelley et al. 2014). It grew out of work in the Python community, and its best-known product is the Jupyter notebook (a next generation of the iPython notebook). Jupyter notebooks allow users to combine text and code in much the same way as the knitr package. The feature that separated knitr and Jupyter notebooks for several years was the fact that Jupyter notebooks were a single document, and cells of code could be executed to see results directly below the code (Pérez and Granger 2007). In contrast, knitr initially only supported the knitting of entire documents, not the execution of chunks of code within the source document. However (perhaps inspired by Jupyter notebooks), RStudio now supports this type of notebook functionality in RMarkdown documents.

There are benefits and drawbacks to the notebook approach. The main advantage is that users can play with small pieces of code, and the manipulation is more direct because results are produced immediately below. This leads to the main weakness, because it is possible to execute code out of order and get to a state that would not be possible if the document was processed all the way through.
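
The out-of-order problem can be sketched with a toy model of notebook cells that share one namespace. This is only an illustration of the idea, not how Jupyter or RStudio actually execute cells. The cell contents and the helper `run_cells` are invented for the example.

```python
# Toy model of notebook cells sharing one namespace (an assumption for
# illustration, not Jupyter's real machinery). Each cell is a code snippet.
cells = [
    "values = [2, 4, 6, 8]",                  # cell 0: load the data
    "values = [v for v in values if v > 3]",  # cell 1: filter the data
    "mean = sum(values) / len(values)",       # cell 2: summarise the data
]


def run_cells(order):
    """Execute the cells in the given order in one shared namespace."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace["mean"]


# Processing the document top to bottom, as a full re-run would.
top_to_bottom = run_cells([0, 1, 2])

# An interactive session: the author summarises first, filters afterwards,
# and never re-runs the summary cell, so the stored mean is stale.
out_of_order = run_cells([0, 2, 1])

print(top_to_bottom)  # 6.0
print(out_of_order)   # 5.0
```

In the second session the notebook ends up holding filtered data next to a mean computed from the unfiltered data, a combination that no top-to-bottom run of the same cells could produce.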
The notebook approach thus plays two of the attributes from McNamara (2016) off of one another. On one hand is interactivity, which notebooks certainly support. On the other is reproducibility, which the ability to play with code out of order can undermine.

For the most robust support of Jupyter notebooks, users must install Python on their local machine. The Jupyter notebook is launched by typing jupyter notebook in a terminal window. The installation and running of Jupyter therefore has more of a barrier to entry for novices than does knitr in RStudio. However, there are other ways to work in Jupyter. Instructors can host and support a server for students, or Project Jupyter hosts a sample version of the platform on the web, so anyone with internet access can try it.

Jupyter notebooks initially only supported Python code, but are now more general. Official ‘kernels’ are available for Python, R, and Julia, and community-maintained kernels support almost any other language you can think of (SAS software, Sage, Go, Erlang, and many more) (Pérez and Granger 2015).

Users write text and code in ‘cells’, which are flagged as Code, Markdown, Raw NBConvert, or Heading. In an interactive session, a user can execute code cells as they want to. To share their work, they can either provide the .ipynb file to another user with Jupyter installed, or can export their work as HTML, PDF, or several other file formats. These exported documents are no longer interactive.

Jupyter notebooks also provide the capability to create interactive graphics through the IPython library, so if the author has decided to include them, readers can interact with selected graphics in the final product. Again, this may qualify them as dynamic-interactive documents. But, if a reader wants to interact with the code directly, they must get the .ipynb file and work on it locally.

8.3. ODS, StatRep, StatWeave, MarkDoc

While knitr/RMarkdown and Jupyter notebooks support many programming languages, their integration with SAS software, Stata software, and SPSS has always been limited. However, there are other tools which have been developed to support reproducible research for those tools.

The oldest feature of SAS software supporting reproducible research is the Output Delivery System (ODS). This allowed users of SAS software to print their results to many formats, including SAS data sets, LaTeX, HTML, and RTF (Bryant et al. 2000). Since then, the development of the StatRep package has enhanced reproducibility in SAS (Arnold and Kuhfeld 2015).

In Stata software, users can create reproducible reports using StatWeave (Lenth 2012), which offers similar functionality to the R package Sweave. It allows users to combine LaTeX and Stata code. However, StatWeave can be difficult to use, particularly for novices (Rising 2014). A newer development in the vein of knitr is MarkDoc, a way to combine Stata code with Markdown, HTML, or LaTeX to create dynamic documents (Haghish 2014).

All these tools are still under development, and are not as integrated into a graphical user interface or integrated development environment as RMarkdown or Jupyter notebooks.

8.4. Reflections on tools for reproducible research

Current tools for reproducible research do a great job of combining text and code. The most common tools used by professional data analysts and statisticians are Jupyter notebooks and knitr/RMarkdown documents. There are other alternatives out there, like Beaker (Two Sigma Open Source, LLC 2016) and Zeppelin (Apache Software Foundation 2016).
More specific to SAS software and Stata software, packages like StatRep, StatWeave, and MarkDoc offer some of the same functionality, although not as fluidly.

These tools all satisfy the ‘support for reproducibility’ attribute from McNamara (2016). But, they all fail to varying degrees on ‘easy entry’ and ‘interactivity at every level.’

StatRep, StatWeave, and MarkDoc are the hardest to use, because they do not have the interface supporting them that RMarkdown or Jupyter notebooks have. They require more steps to compile, and StatRep and StatWeave use LaTeX (notoriously difficult to learn) as the text markup language.

RMarkdown documents, Jupyter notebooks, and their alternatives are easier for novices to use. They still require some coding finesse to use, but because the text markup language is Markdown and the interface includes buttons to help add cells or code chunks, they can be used in introductory classes (Baumer et al. 2014).

All of these tools generally produce static documents, unless the author specifically codes in interactive features. Once a document is published or shared, the ability to execute code is removed, preventing readers from manipulating it. If a reader wants to modify the code, they must download the source code, edit it, and then re-share the results. So, they do not support dynamic-interactive documents (Nolan and Temple Lang 2007).

We are not aware of any graphical tools fully supporting reproducible research, although some of the bespoke tools mentioned in Section 9 support components of reproducible research graphically.

9. BESPOKE TOOLS

In addition to the tools discussed above, there are a number of ‘bespoke’ tools for doing particular things with data. These tools do not fall under any of the previous umbrellas, and represent some of the progress being made in statistical computing.
Typically, these are standalone programs, mostly Graphical User Interfaces (GUIs), which are designed for doing one specific task along the data analytic pipeline. Although there are new bespoke tools popping up all the time, we will consider Data Wrangler, Open Refine, Tableau, and Lyra.

9.1. Data Wrangler/Trifacta

Data Wrangler began as a project from the Stanford Visualization Group in 2011 (Kandel et al. 2011a). Their goal was to provide a visual representation of data transforms, as well as a reproducible history of those transforms. For example, a user could select an empty row and indicate it should be deleted, at which point the Wrangler interface would suggest a variety of generalizable transformations that could be built from that one ‘rule’ (e.g., delete all empty rows, or always delete the 7th row). Once the user specifies a transform, it is applied to the data and added to the interaction history. The interaction history can be exported as a data transformation script in a variety of languages. Wrangler can also perform simple database manipulations.

The tools Wrangler provided were so useful the authors were able to convert their academic research project into a corporate venture, which is now known as Trifacta. Trifacta still offers a free version of their product, called Trifacta Wrangler, but their business model is built on selling their enterprise software. Pricing is not explicitly listed on their website, so companies interested in the product must contact the team to get a quote. Much like SAS software, Stata software, and SPSS, this model means that the free version could be used for teaching, but would essentially be grooming students to need an expensive product once they moved beyond the capabilities of the free version.
Pricing notwithstanding, Wrangler provides the existence proof that a visual approach could be taken toward data cleaning while preserving reproducibility.

9.2. Open Refine

Similar to Data Wrangler is Open Refine (Verborgh and Wilde 2013). The project was initially called Google Refine, but has since been turned into an open source package. Like Wrangler, Open Refine can help clean data and document the data cleaning process. It can also be used for data exploration and data matching, including geocoding. Again, the results of the refining process are available as a re-useable script.

Both Data Wrangler and Open Refine provide great alternatives to the spreadsheet paradigm. They privilege data as a complete object, and document all modifications. By suggesting methods of generalizing data transformations, they remove much of the grunge work of spreadsheet analysis.

The other benefit of generalized data transformations is that they encourage the user to think computationally. Instead of just doing ‘whatever works,’ there is user incentive to find a way to describe the data cleaning rule in a way that works generally.

As its name suggests, Open Refine is open-source and free to use. However, since Open Refine (and, similarly, Wrangler) only does data transformation, using it in teaching would necessitate students learning an entire pipeline of products. Generally, we want to teach as few tools as possible, to reduce the overall cognitive load on students. No matter how good the second tool you teach to people is, they always like it less than the first one.

9.3. Tableau

Tableau is a bespoke system for data visualization, based on Wilkinson (2005). As such, it does not provide much support for data cleaning. Tableau makes it simple for users to create interactive graphics that can be easily published on the web.
It offers easy entry and simple support for publishing (McNamara 2016). Tableau will suggest the ‘best’ plot for particular data, which is both a blessing and a curse (Mackinlay et al. 2007). It can lead to much more appropriate uses of standard plots, but it also does not support flexible plot creation. A user can make a plot without having any idea of what it means. Similarly, Tableau makes it possible to fit models to data, but again does not make it clear what these models mean or how appropriate they may be.

Like other enterprise software, Tableau is expensive: $999 for an individual license or $1,999 for an individual professional license. However, as with SAS software, they make the tool free to students.

Tableau offers easy entry and interactivity, but is not as reproducible as the other tools mentioned in this section. It does make it possible to repeat the same analysis on another dataset by replacing the data source, but there is no way to audit the process otherwise. The processing that takes place is opaque, and can’t be exported as human-readable code as in other systems.

9.4. Lyra

Another bespoke data visualization system is Lyra, which makes it easy for novices to create graphics in a drag-and-drop manner (Satyanarayan and Heer 2014). Lyra was developed at the University of Washington Interactive Data Lab. Interestingly, Jeffrey Heer was a member of the Stanford Visualization Group that created Data Wrangler, and is one of the founders of Trifacta. He has since moved to the University of Washington and is a member of the Interactive Data Lab.

Lyra is built on top of Vega, an abstraction layer on top of d3, a JavaScript library. d3 is a library for “manipulating documents based on data,” where ‘documents’ refers to the document object model (DOM) of the web (Bostock et al. 2011; Bostock 2013).
It is commonly used to create interactive web visualizations. d3 is a very general library, and cannot be considered to be a plotting library at all. It does not provide primitives like bar, box, axes, etc., as standard visualization systems do. Instead, it binds data to the DOM of a web page. Many of the interactive data visualizations by the New York Times (mentioned in Section 4.2) are based on d3, and an online site allows users to share ‘blocks’ they have created in d3 (Bostock 2015). While the sharing of code examples helps users get started, d3 is generally considered to be quite difficult to learn.

Vega is an attempt to make it easier for novices to create the beautiful interactive graphics associated with d3 (Heer 2014). It provides the sorts of graphical primitives more typically associated with data visualization tools: rect, area, and line. However, even with these primitives, Vega can be difficult for novices in the same way all textual programming languages can be.

Enter Lyra, a tool to simplify the creation of Vega graphics. It supports simple data transformation, like grouping based on a variable, but generally should only be considered to be a visualization tool, because it does not provide functionality for data cleaning, modeling, etc. It is a reproducible tool, because the resulting graphics can be interrogated in the way standard Vega graphics can be (i.e., by looking at the code). Lyra does not support the creation of interactive graphics, but the group recently deployed a reactive version of Vega (Satyanarayan et al. 2016, 2017), so it seems likely Lyra will soon go in that direction as well.

Lyra provides much easier entry to making web graphics than tools like d3. It is close to being interactive at every level: the process of creating visualizations is interactive, although the final product is not yet.
Because the tool generates code as you move through the creation process, it is also reproducible.

9.5. Reflections on bespoke tools

Bespoke data tools like these are great sources of inspiration about new ways to approach data cleaning, modeling, and visualization. Many of these projects are open-source, and while they do not cover the entire analysis trajectory, they show promise as tools for particular data needs.

We have focused here on Wrangler, Open Refine, Tableau, and Lyra, but there are many more bespoke projects out there. For example, Brunel (Wills 2016) is an alternative to Vega and Lyra. Like Vega, Brunel offers a domain-specific language for visualization, and like Lyra, it also provides a graphic interface to the language.

Another similar effort is plot.ly, a JavaScript plotting library that supports translation between graphics in R, Python, and MATLAB (Inc 2015). As with Lyra and Brunel, plot.ly provides a graphical user interface to allow people not familiar with coding to create interactive graphics very simply. Again, code is generated (either in plot.ly syntax or a target language), so the process is reproducible.

In fact, that is the inspiring element of many of the bespoke tools discussed here. They provide graphical user interfaces including new visual metaphors for data analysis alongside underlying code to provide reproducibility. They allow novices to perform complex data cleaning and visualization without getting lost in the syntactic weeds.

10. CONCLUSIONS AND FURTHER WORK

Given the attributes outlined in McNamara (2016), the existing tools used in statistics education once again break into two distinct groups: tools for learning statistics, and tools for doing statistics. Tools that are interactive and offer easy entry are typically not flexible to extensions or reproducible.
In particular, TinkerPlots and Fathom foreground methods to increase the visual representation of analysis and to simplify it for novices. The bespoke tools Data Wrangler, Open Refine, Tableau, and Lyra also provide easy entry coupled with a more solid trace of the analysis. In contrast, more flexible, scriptable tools like R, SAS software, Stata software, or SPSS are harder to get started using and much less interactive. The bespoke products we examined here provide inspiration that a tool could satisfy all 10 of the attributes at once (perhaps with varying levels of success).

I want to encourage statistics educators to look to the future and consider what an ideal tool might look like, several years down the line. One vision would be a blocks-based language providing drag-and-drop functionality for novices, with a domain-specific language underlying it for more advanced students, leading to a target language used by professionals. More of this vision is shared in McNamara (2016).

However, since we all live in the present, it seems important to offer best practices given current computational tools. As is probably clear, my preference for statistics education is R, using the formula syntax or the piped tidyverse syntax. Whichever syntax is chosen, educators should make every attempt to only expose students to that one syntax. In an introductory course, this is possible. Instructors using the OpenIntro textbook Introductory Statistics with Randomization and Simulation have written the associated labs in a variety of ‘flavors’ to limit students’ exposure to a particular syntax (Diez et al. 2014). For interactive work, Shiny or manipulate can be used for applet-like functionality. Best practices also include the use of the IDE for R, RStudio, and having students produce reproducible work using RMarkdown (Baumer et al. 2014).
For instructors who are not willing to make the leap to a programming language, I believe the best existing tool to use at the college level is Fathom (TinkerPlots is very similar and could be substituted, but is aimed at slightly younger students). Fathom offers easy entry and lots of visual cues, and it encourages iteration and randomization. It provides flexible and creative ways to explore data. The lack of support for reproducible analysis or the sharing of results is problematic, but for students not continuing on in statistics, this may be acceptable.

Not recommended are applets (other than as demos by an instructor), graphing calculators, or spreadsheet software.

References

Academic Technology Services (2013). “Comparing SAS, Stata, and SPSS.” http://www.ats.ucla.edu/stat/mult_pkg/compare_packages.htm.

Aliaga M, Cobb G, Cuff C, Garfield J, Gould R, Lock RH, Moore T, Rossman A, Stephenson B, Utts J, Velleman PF, Witmer JA (2005). “Guidelines for assessment and instruction in statistics education: College Report.” Technical report, American Statistical Association.

Allaire JJ (2014). manipulate: Interactive plots for RStudio. R package version 1.0.1.

Alvarado C, Dodds Z, Libeskind-Hadas R (2012). “Increasing women’s participation in computing at Harvey Mudd College.” ACM Inroads, 3(4), 55–64.

Apache Software Foundation (2016). “Apache Zeppelin.” https://zeppelin.apache.org/.

Arnold T, Kuhfeld WF (2015). “The StatRep system for reproducible research.” Technical report, SAS Institute, Inc.

Bache SM, Wickham H (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5.

Baglin J (2013). “Applying a theoretical model for explaining the development of technological skills in statistics education.” Technology Innovations in Statistics Education, 7(2).

Bakker A (2002).
“Route-type and landscape-type software for learning statistical data analysis.” In Proceedings of the 6th International Conference on Teaching Statistics.

Baumer B, Çetinkaya Rundel M, Bray A, Loi L, Horton NJ (2014). “R Markdown: Integrating A Reproducible Analysis Tool into Introductory Statistics.” Technology Innovations in Statistics Education, 8(1).

Ben-Zvi D (2000). “Toward Understanding the Role of Technological Tools in Statistical Learning.” Mathematical thinking and learning, 2(1&2), 127–155.

Best AM, Morganstein D (1991). “Statistics Programs Designed for the Macintosh: Data Desk, Exstatix, Fastat, JMP, StatView II, and SuperANOVA.” The American Statistician, 45(4).

Bezanson J, Edelman A, Karpinski S, Shah VB (2015). “Julia: A fresh approach to numerical computing.” Technical report, MIT and Julia Computing.

Biehler R (1997). “Software for Learning and for Doing Statistics.” International Statistical Review, 65(2), 167–189.

Biehler R (2003). “Interrelated learning and working environments for supporting the use of computer tools in introductory classes.” In IASE satellite conference on statistics education and the internet. International Statistical Institute.

Biehler R, Ben-Zvi D, Bakker A, Makar K (2013). Third International Handbook of Mathematics Education, chapter Technology for Enhancing Statistical Reasoning at the School Level. Springer Science + Business Media.

Bostock M (2013). “D3.js: Data-driven documents.” http://d3js.org/.

Bostock M (2015). “mbostock’s blocks.” http://bl.ocks.org/mbostock.

Bostock M, Carter S, Tse A (2014). “Is it better to rent or buy?” The New York Times.

Bostock M, Ogievetsky V, Heer J (2011). “D3: Data-driven documents.” IEEE Transactions on Visualization and Computer Graphics, 17(12).

Bryan J (2016). “Spreadsheets.” In useR! Conference.

Bryant L, Muller S, Pass R (2000). “ODS, YES!
Odious, NO! An introduction to the SAS Output Delivery System.” In Proceedings of the Twenty-Fifth Annual SAS Users Group International Conference.

Cairo A (2013). The Functional Art: An introduction to information graphics and visualization. New Riders.

Carter S, Ericson M, Leonhardt D, Marsh B, Quealy K (2010). “Budget Puzzle: You Fix the Budget.” The New York Times.

Carver R, Everson M, Gabrosek J, Horton NJ, Lock RH, Mocko M, Rossman A, Rowell GH, Velleman P, Witmer JA, Wood B (2016). Guidelines for assessment and instruction in statistics education: College Report 2016. American Statistical Association.

Cass S, Diakopoulos N, Romero JJ (2014). “Interactive: The Top Programming Languages: IEEE Spectrum’s 2014 Rating.” IEEE Spectrum.

Çetinkaya Rundel M (2014). “ShinyEd.” https://stat.duke.edu/~mc301/shinyed/.

Chance B, Rossman A (2006). “Using simulation to teach and learn statistics.” In ICOTS-7.

Chang W, Cheng J, Allaire JJ, Xie Y, McPherson J (2015). Shiny: Web application framework for R. R package version 0.12.0.

De Leeuw J (2009). “Statistical Software - Overview.” Technical report, Department of Statistics, University of California, Los Angeles.

Diez DM, Barr CD, Çetinkaya Rundel M (2014). Introductory Statistics with Randomization and Simulation. OpenIntro.

Donoho D (2015). “50 years of Data Science.” In Princeton NJ, Tukey Centennial Workshop.

Elliot AJ, Maier MA, Moller AC, Friedman R, Meinhardt J (2007). “Color and psychological functioning: The effect of red on performance attainment.” Journal of Experimental Psychology, 136(1), 154–168.

Everson M, Zieffler A, Garfield J (2008). “Implementing new reform guidelines in teaching introductory college statistics courses.” Teaching Statistics, 30(3).

Fellows I (2012). “Deducer: A Data Analysis GUI for R.” Journal of Statistical Software, 49(8).

Finzer W (2002).
“The Fathom experience: Is research-based development of a commercial statistics learning environment possible?” In ICOTS-6.

Finzer W (2013). “The Data Science Education Dilemma.” Technology Innovations in Statistics Education, 7(2).

Fitzallen N (2013). “Characterizing Students’ Interaction with TinkerPlots.” Technology Innovations in Statistics Education, 7(1).

Fox J (2004). “Getting started with the R Commander: A basic-statistics graphical user interface to R.” In useR! Conference.

Franklin C, Kader G, Mewborn D, Moreno J, Peck R, Perry M, Schaeffer R (2005). Guidelines for assessment and instruction in statistics education report: K-12. American Statistical Association. URL http://www.amstat.org/education/gaise/GAISEPreK-12_Full.pdf.

Friel SN (2008). “The Research Frontier: Where Technology Interacts with the Teaching and Learning of Data Analysis and Statistics.” In MK Heid, GW Blume (eds.), Research on technology and the teaching and learning of mathematics, volume 2. National Council of Teachers of Mathematics.

Garfield J, Ben-Zvi D (2008). “Preparing school teachers to develop students’ statistical reasoning.” In Joint ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference. ICMI/IASE.

Garfield J, Chance B, Snell JL (2002). The Teaching and Learning of Mathematics at the University Level, chapter Technology in college statistics courses. Springer.

Godfrey AJR (2013). “Statistical software from a blind person’s perspective.” The R Journal, 5(1), 73–79.

Gould R, Johnson T, McNamara A, Molyneux J, Moncada-Machado S (2015). Introduction to Data Science. Mobilize: Mobilizing for Innovative Computer Science Teaching and Learning.

Gould R, Peck R (2004).
“Preparing secondary mathematics educators to teach statistics.” Curricular development of statistics education, pp. 244–255.

Grolemund G, Wickham H (2011). “Dates and times made easy with lubridate.” Journal of Statistical Software, 40(3).

Guo P (2014). “Python is Now the Most Popular Introductory Teaching Language at Top U.S. Universities.” Blog@ACM.

Haghish EF (2014). “MarkDoc: Literate programming in Stata.” http://www.haghish.com/resources/pdf/Haghish_MarkDoc.pdf.

Hall J (2008). “Using Census at School and Tinkerplots to support Ontario elementary teachers’ statistics teaching and learning.” In Joint ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference. ICMI/IASE.

Hammerman JK, Rubin A (2004). “Strategies for managing statistical complexity with new software tools.” Statistics Education Research Journal, 3(2), 12–41.

Heer J (2014). “Vega.” https://github.com/trifacta/vega.

Hermans F, Murphy-Hill E (2015). “Enron’s spreadsheets and related emails: A dataset and analysis.” In ICSE.

Herndon T, Ash M, Pollin R (2013). “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff.” Cambridge Journal of Economics, 38(2).

Horton NJ, Baumer B, Wickham H (2014). “Teaching precursors to data science in introductory and second courses in statistics.” In ICOTS-9.

IBM Corp (2013). “SPSS Statistics for Windows, Version 22.0.” Armonk, NY: IBM Corp.

Inc PT (2015). “Collaborative Data Science.” https://plot.ly.

Kandel S, Paepcke A, Hellerstein J, Heer J (2011a). “Wrangler: Interactive visual specification of data transformation scripts.” In CHI 2011.

Kandel S, et al. (2011b). “Research directions in data wrangling: Visualizations and transformations for usable and credible data.” Information Visualization, 10(4), 271–288.
Kaplan D, Sh oop L (2013). “Data and Compu ting F undamentals.” http://h tmlprevi ew.github.io/?https://gi thub.com/dtkaplan/DataAndCom putingFundamentals/blob / m Katz J, Andrews W (2013). “Ho w Y’all, Y ouse and Y ou Gu y s T alk.” The New Y ork Times . Ka y A (1984). “Computer Soft ware.” Scientiﬁc Americ an . Knuth DE (1984 ). “Literate Programming.” The Computer Journal , 27 (2). Konold C (2007). “Designing a d ata analysis to ol f or learners.” In MC Lov ett, P Sh ah (eds.), Thinking with Data . Lawrence Erlbaum Asso ciates. Konold C, K azak S (2008). “Reconnecting d ata and chance.” T e chnolo gy Innovations in Statistics Educ ation , 2 (1). Konold C, Miller CD (2005 ). “Tin kerPlots: Dynamic data exploration.” Computer softwar e Emeryvil le, CA : Key Curriculum Pr e ss . Legacy M (2008). “AP Statistics T eac h er’s Guide.” T e chnic al r ep ort , The College Board. Lehrer R (200 7). “Introducing students to data representa tion and statisti cs.” In J W atson, K Beswick (eds.), Pr o c e e dings of the 30th annual c onfer enc e of the Mathematics E duc ation R e - se ar ch Gr oup of Austr alia . Leisc h F (2002). “Swea ve, Part I: Mixing R and LaT eX.” R News , 2 (3). Lenth R V (2012). StatWe ave Users’ Manual . Universit y of Iow a. Leonhardt, D (2014) . “@DLeonhardt: “The most visited page in NYT history is th e dialect qu iz: http:/ /nyti.ms/1 bYNB1z . @jshk atz made it. He’s joined our team.” .” 14 F ebruary 2014, 9:35 a.m. Tweet. Lock PF, L o ck RH, Lock DF, Morgan KL, Lo ck EF (2012). Stat istics: Unlo cking the Power of Data . Wiley . Mac kin la y JD, Hanrahan P , Stolte C (2007). “Show me: Automatic Presentation for Visual Anal- ysis.” IEEE T r ansactions on Visualization and Computer Gr aphics , 13 (6), 1137–11 44. Mathews SM, Reed M, Angel N (2013). “Getting stud ents excited ab out data analysis.” Ohio Journal of Scho ol Mathematics . McCullough BD, Heiser D A (2008). 
“On the accuracy of statistical pro cedures in Microsoft Excel 2007.” Com putational Statistics & Data Analysis , 52 , 4570 –4578. McKinney W (2012). Python for data analysis: Data wr angling with Panda s, NumPy, and iPython . O’Reilly . McNamara A (2015). Bridging the Gap Betwe en T o ols for L e arning and for Doing Statistics . Ph.D. thesis, Unive rsity of California, Los Angeles. McNamara A (2016). “Key attributes of a mo dern statistic al compu ting tool.” S ubmitted. Melard G (20 14). “On the accuracy of statisti cal pro cedures in Microsoft Excel 2010.” Computational Statistics , 29 (1095 ). Morgan KL, Lock RH, Lock PF, L ock EF, Lock DF (2014 ). “StatKey: Onlin e to ols for bo otstrap inte rv als and rand omizatio n tests.” In ICOTS-9 . Muller CL, Kidd C (201 4). “Debugging geographers: T eaching p rogramming to non-computer scien tists.” Journal of Ge o gr aphy in Higher Educ ation , 38 (2), 175–192. New Y ork Times (2012). “The Electo ral Map: Building a Path to Victo ry .” The New Y ork Times . Nolan D, T emple L an g D (2007 ). “Dynamic, in teractive docum ents for teaching statistic al practice.” International Statistic al R evi e w , 75 (3), 295–321 . Nolan D, T emp le Lang D (201 0). “Compu tin g in the statistics curr icula.” The Americ an Statistician , 64 (2), 97–107 . P ´ erez F, Granger BE (200 7). “iPython: a System for Interac tiv e Scient iﬁc Computing.” Computing in Scienc e & Engine ering , 9 (3), 21–29. Pe rez F, Granger BE (2015 ). “Pro j ect Jupyter: Computational Narr atives as the E n gine of Col- laborative Data Science.” T e chnic al r ep ort , Pro ject Jupyter. Pfannkuch M, Ben-Zvi D (2011). T e aching Statist ics in Scho ol Mathematics- Chal lenges for T e ach- ing and T e acher Educ ation , c hapter Dev eloping T eachers’ Stati stical Thinking. Spr inger Science + Business Media. Plaue C , Co ok LR (2015). “Data journalism: Lessons learned while designing an inte rdisciplinary service course.” In SIGCSE’15 . 
Pruim R, Horton NJ, Kaplan D (2014 ). Start te aching with R . Pr o ject MOSAIC. Pruim R, K aplan D, Horton NJ (2015a ). mos aic: Pr oje ct MOSAIC (mosaic-web.or g) statistics and mathematics te aching utilities . R pack age version 0.9.2-2 . URL http://C RAN.R- pro ject.org/package=m osaic . Pruim R, Kaplan D, Horton NJ (2015 b). mosaic: Pr oje ct MOSAIC Statistics and Mathematics T e aching Utilities . R pack age version 0.1 0.0. R Core T eam (2015 ). “Compr ehensive R Archiv e Net work.” h ttp://cr an.r- proj ect.org/ . R Core T eam (2016). R: A language and envir onment for statistic al c omputing . R F oundation for Statistical Computing, Vienna, Austria. URL http://w ww.R- proj ect.org . Ragan-Kelley M, Perez F, Granger B, K luyver T, I v an ov P , F rederic J, Bussonier M (2014). “The Jupyter/IPython architecture: a uniﬁ ed view of computational research, from int eractiv e explo- ration to communication and publication.” In Amer ic an Ge ophysic al U nion . Ranum D, Miller B, Zelle J, Guzdial M (2006) . “successful app r oac h es to teac hing introdu ctory computer science courses with python.” In SIGCSE’06 . Rising B (2014). “Repro ducible research in Stata.” In 12 th German Stata Users Gr oup Me eting . Robinson D (2016). br o om: Convert statistic al analysis obje cts into tidy data fr ames . R pack age ve rsion 0.4.1. RStudio T eam (2014). “RStudio: Integ rated Dev elopment for R.” http://w ww.rstud io.com/products/rstudio/ . Rubin A (2002 ). “Interact ive visualizations of statistical relationships: what d o we gain?” In R ese ar ch p ap ers fr om ICOTS 6 . Rubin A, Hammerman J K, Konold C (2006 ). “Exploring informal inference w ith interactiv e visu- alizati on soft ware.” In R ese ar ch p ap ers fr om ICOTS 7 . Sark ar D (2008). L attic e: Multivariate data visualization with R . Springer. SAS Institute I (2012). “JMP , V ersion 10.” SAS Institute Inc (2015). “SAS 14.1.” Sat yanara yan A, Heer J (201 4). 
“Lyra: An interact ive visualization design environment.” In Eur o gr aphics Confer enc e on Visualization (Eur oVis) 2014 , v olume 33, p. 3. Sat yanara yan A, Moritz D, W ongsuph asaw at K, Hee r J (2017). “V ega-Lite : A grammar of interac- tiv e graphics.” IEEE T r ans. Visualization & Comp. Gr aphics (Pr o c. InfoVis) . Sat yanara yan A, Ru ssell R, Hoﬀswell J, Heer J (2016) . “Reactiv e V ega: A streaming data ﬂ ow arc hitecture for d eclarativ e interactiv e visualization.” IEEE T r ansactions on Visualization and Computer Gr aphics , 22 (1). StataCorp (2015) . “Stata Statistic al Softw are: Release 14.” College Station, TX: StataCorp LP. The Concord Consortium (201 6). “Common Online Da ta Analysis Platform.” https:// codap.co ncord.org/ . Two Sigma O p en Sour ce, LLC (2016). “Beak er: The Data Scientist’ s Lab oratory .” http://b eakernot ebook.com/ . V ance A (2009 ). “Data analysts captiv ated by R’s p ow er.” New Y ork Times . V elleman PF (1989 ). Data Desk: Handb o ok, V olume 1 . Data Description, Inc. V erb orgh R, Wilde MD (2013). U si ng Op enR eﬁne . Pac kt Pu b lishing. V erzani J (200 5). “simpleR- Using R for introductory statistics.” Chapman & Hall/CR C. W atson J, Donne J (2009 ). “Tink erP lots as a researc h to ol to explore student understandin g.” T e chnolo gy Innovations in Statistics Educ ation , 3 (1). W atson J , Fitz allen N (2010). “Th e Devel opment of Graph Und erstanding in the Mathematics Curriculu m.” T e chnic al r e p ort , New South W ales Department of Education and T r aining. W est W, W u Y, Heydt D (2004). “An introdu ction to StatCrunch 3.0.” Journa l of Statistic al Softwar e , 9 (5). Wic k h am H (2009). g g plot2: Ele gant gr aphics for data analysis . Sp ringer New Y ork. Wic k h am H (2014a). “Tidy data.” Journal of Statistic al Softwar e , 59 (10). Wic k h am H (2014b). “Why d plyr?” In useR! Confer enc e . Wic k h am H (2016). stringr: Simple, c onsistent wr app ers for c ommon string op er ations . 
R pac k age ve rsion 1.1.0. Wic k h am H, F r an cois R (201 5a). dplyr: A gr ammar of data manipula tion . R pack age version 0. 4.1. Wic k h am H, F r ancois R (2015b). r e adr: R e ad tabular data . R pack age version 0.2. 2. Wild C, Elliott T (2016). “iNZight.” https:// www.stat .auckland.ac.nz/~wild/iNZigh t . Wilkinson L (2005). The Gr ammar of Gr aphics . Statistics and computing. Springer S cience + Business Media. Wills G (2008 ). Handb o ok of Data Visualization , chapter Linked Data Views. Sp ringer Handbo oks. Wills G (2016). “Brunel Visualizatio n.” https:// github.c om/Brunel- Visualization/Brunel . Wilson G, Brya n J, Cranston K, Kitzes J, Nederbr agt L, T eal TK (2016 ). “Goo d en ough practices for scienti ﬁc computing.” https:// swcarpen try.github.io/good- enough- practices- in- sc ientific- computing/ . Xie Y (2014) . Dynamic Do cuments with R and knitr . Chapman & Hall/CR C T he R Series.
