Object-Oriented Programming, Functional Programming and R

This paper reviews some programming techniques in R that have proved useful, particularly for substantial projects. These include several versions of object-oriented programming, used in a large number of R packages. The review tries to clarify the o…

Authors: John M. Chambers

Statistical Science 2014, Vol. 29, No. 2, 167–180. DOI: 10.1214/13-STS452. © Institute of Mathematical Statistics, 2014.

Abstract. This paper reviews some programming techniques in R that have proved useful, particularly for substantial projects. These include several versions of object-oriented programming, used in a large number of R packages. The review tries to clarify the origins and ideas behind the various versions, each of which is valuable in the appropriate context.

R has also been strongly influenced by the ideas of functional programming and, in particular, by the desire to combine functional with object-oriented programming.

To clarify how this particular mix of ideas has turned out in the current R language and supporting software, the paper will first review the basic ideas behind object-oriented and functional programming, and then examine the evolution of R with these ideas providing context.

Functional programming supports well-defined, defensible software giving reproducible results. Object-oriented programming is the mechanism par excellence for managing complexity while keeping things simple for the user. The two paradigms have been valuable in supporting major software for fitting models to data and numerous other statistical applications.

The paradigms have been adopted, and adapted, distinctively in R. Functional programming motivates much of R but R does not enforce the paradigm. Object-oriented programming from a functional perspective differs from that used in non-functional languages, a distinction that needs to be emphasized to avoid confusion.

R initially replicated the S language from Bell Labs, which in turn was strongly influenced by earlier program libraries.
At each stage, new ideas have been added, but the previous software continues to show its influence in the design as well. Outlining the evolution will further clarify why we currently have this somewhat unusual combination of ideas.

Key words and phrases: Programming languages, functional programming, object-oriented programming.

John M. Chambers is Consulting Professor, Department of Statistics, Stanford University, Stanford, California 94305-4065, USA, e-mail: jmc@stat.stanford.edu.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2014, Vol. 29, No. 2, 167–180. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

R has become an important medium for communicating new methodology in statistics and related technology. References to the supporting R software frequently accompany journal articles or other publications describing new results. The software is available to other R users, ideally as a package in a standard repository. The benefits for statistics as a discipline are considerable: The community has rapid access to new ideas in a free, open-source format as software that can in most cases be installed and used immediately by those interested in the statistical techniques. The user community has both created and benefited from this resource.

This paper examines two of the most significant paradigms in programming languages generally: object-oriented programming (OOP) and functional programming. R makes use of both, but in its own way. Both paradigms are valuable for serious programming with the language. But in both cases, understanding the relevant ideas in the context of R is needed to avoid confusion.
The confusion sometimes arises, in both cases, from applying to R interpretations of the paradigms that apply to other languages but not to this one. Section 2 of the paper will review the ideas, generally and in their R versions, with the goal of clarifying the basics. Given the importance of R software to the community, creators of new R software should benefit from understanding these concepts.

We will also examine in Section 3 of the paper the evolution that led to these versions of functional programming and OOP. The prime motivation was not language design in the abstract but to provide the tools needed for research and data analysis by the user community at the time. R originally reproduced the functionality of the S language at Bell Labs, which itself had evolved through several stages beginning in the late 1970s and which was in turn based on earlier statistical software libraries, mainly in Fortran.

R added important new ideas and has continued to evolve, but the main contents inherited through S shaped the capabilities and the approach to statistical computing. In a surprising number of areas, what we think of as "the R way" of organizing the computations actually reflects software developed twenty years or more before R existed.

Having been involved in all the stages, I am naturally inclined to a historical perspective, but it is also the case that the history itself had substantial impact on the results. It may be comforting to view programming languages as abstract definitions, but in practice they evolve from the needs, interests and limitations of their creators and users.

2. FUNCTIONAL AND OBJECT-ORIENTED PROGRAMMING: THE MAIN IDEAS

Functional and object-oriented programming fit naturally into statistical applications and into R. The original motivating use case, fitting models to data, remains compelling.
An expression such as

    irisFit <- lm(Sepal.Width ~ . - Sepal.Length, iris)

calls a function that creates an object representing the linear model specified by the first argument, applied to the data specified by the second argument. The computation is functional, well-defined by the arguments. It returns an object whose properties provide the information needed to study and work with the fitted model. Other functions and other objects can adapt to different models in a form that is convenient for both the user and the implementer.

Principles of functional programming guide us in writing reliable, reproducible functions for the different models. Object-oriented programming provides tools for defining the model objects clearly, and adapting to new ideas and new forms of models. Section 3.4 goes into details of the R implementations.

As they have been realized in R, both paradigms center on a few, intuitive concepts. The details are more complicated, as they usually are. In the case of functional programming, the realization in R is only partial, reflecting the language's origins as well as practical considerations. In the case of OOP, there are now at least three realizations of the ideas in R, using two different paradigms. All three have significant applications and practical value.

Despite all these devilish details, the main ideas remain visible and useful, particularly when programming serious applications using the language.

2.1 Functional Programming

For our purposes, the main principles of functional programming can be summarized as follows:

1. Programming consists largely of defining functions.
2. A function definition in the language, like a function in mathematics, implies that a function call returns a unique value corresponding to each valid set of arguments, but only dependent on these arguments.
3. A function call has no side effects that could alter other computations.

The implication of the second point is that functions in the programming language are mappings from the allowed set of arguments to some range of output values. In particular, the returned value should not depend on other quantities that affect the "state" of the software when the function call is evaluated.

True functional languages conform to these ideas both by what they do provide, such as pattern expressions, and what they do not provide, such as procedural iteration or dynamic assignments. The classic tutorial example of the factorial function, for example, could be expressed in the Haskell language by the pattern:

    factorial x = if x > 0 then x * factorial (x-1) else 1

plus some type information, such as that a value for x must be an integer scalar.

Is R a functional programming language in this sense? No. The structure of the language does not enforce functionality; Section 2.3 examines that structure as it relates to functional programming and OOP. The evolution of R from earlier work in statistical computing also inevitably left portions of earlier pre-functional computations; Section 3 outlines the history. Random number generation, for example, is implemented in a distinctly "state-based" model in which an object in the global environment (.Random.seed) represents the current state of the generators. Purely functional languages have developed techniques for many of these computations, but rewriting R to eliminate its huge body of supporting software is not a practical prospect and would require replacing some very well-tested and well-analyzed computations (random number generation being a good example).

Functional programming remains an important paradigm for statistical computing in spite of these limitations.
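The contrast between a functional definition and a state-based computation can be sketched in base R. This is an illustration only: the names `fact` and `draw` are hypothetical (base R already provides factorial()), and the sketch assumes nothing beyond the default random number generator.

```r
# Functional style: the value depends only on the argument x.
# (Hypothetical name; base R already provides factorial().)
fact <- function(x) if (x > 0) x * fact(x - 1) else 1

# State-based: runif() consults and updates .Random.seed in the
# global environment, so identical calls need not return identical values.
draw <- function() runif(1)

# Fixing the generator state before each call makes the result repeatable.
set.seed(1); a <- draw()
set.seed(1); b <- draw()

fact(5)          # 120
identical(a, b)  # TRUE
```

The call to draw() has no arguments at all, so any variation in its value can only come from state outside the function, which is the situation the text describes for random number generation.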
Statistical models for data, the motivating example for many features in S and R, illustrate the value of analyzing the software from a functional programming perspective. Software for fitting models to data remains one of the most active uses of R. The functional validity of such software is important both for theoretical justification and to defend the results in areas of controversy: Can we show that the fitted models are well-defined functions of the data, perhaps with other inputs to the model such as prior distributions considered as additional arguments? The structure of R as described in Section 2.3 can provide support for analyzing functional validity. Equally usefully, such analysis can also illuminate the limits of functional validity for particular software, such as that for model-fitting.

2.2 Object-Oriented Programming

The main ideas of object-oriented programming are also quite simple and intuitive:

1. Everything we compute with is an object, and objects should be structured to suit the goals of our computations.
2. For this, the key programming tool is a class definition saying that objects belonging to this class share structure defined by properties they all have, with the properties being themselves objects of some specified class.
3. A class can inherit from (contain) a simpler superclass, such that an object of this class is also an object of the superclass.
4. In order to compute with objects, we can define methods that are only used when objects are of certain classes.

Many programming languages reflect these ideas, either from their inception or by adding some or all of the ideas to an existing language.

Is R an OOP language? Not from its inception, but it has added important software reflecting the ideas.
In fact, it has done so in at least three separate forms, giving rise to some confusion that this paper attempts to reduce.

Some of the confusion arises from not recognizing that the final item in the list above can be implemented in radically different ways, depending on the general paradigm of the programming language. A key distinction is whether the methods are to be embedded in some form of functional programming.

Traditionally, most languages adopting the OOP paradigm are not functional; either the language began with objects and classes as a central motivation (SIMULA, Java) or added the paradigm to an existing non-functional language (C++, Python). In such languages, methods were naturally associated with classes, essentially as callable properties of the objects. The language would then include syntax to call or invoke a method on a particular object, most often using the infix operator ".". The class definition then encapsulates all the software for the class. Where methods are needed for other computations, such as special method names in Python or operator overloading in C++, these are provided by ad hoc mechanisms in the language, but the method remains part of the class definition.

In a language that is functional or that aspires to behave functionally as S and R do, the natural role of methods corresponds to the intuitive meaning of "method": a technique for computing the desired result of a function call. In functional OOP, the particular computational technique is chosen because one or more arguments are objects from recognized classes.

Methods in this situation belong to functions, not to classes; the functions are generic.
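The two placements of methods can be compared in a short sketch using the methods package that ships with R. The class, generic and field names here (`Point`, `LabelledPoint`, `describe`, `Counter`) are hypothetical, chosen only for illustration; note also that R reference classes invoke methods with `$` rather than the "." operator common in other languages.

```r
library(methods)

# Class definitions: properties (slots) are objects of a stated class.
setClass("Point", representation(x = "numeric", y = "numeric"))
# Inheritance: a LabelledPoint is also a Point.
setClass("LabelledPoint", contains = "Point",
         representation(label = "character"))

# Functional OOP: methods belong to the generic function describe().
setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "Point",
          function(obj) paste0("(", obj@x, ", ", obj@y, ")"))
setMethod("describe", "LabelledPoint",
          function(obj) paste(obj@label, callNextMethod()))

p  <- new("Point", x = 1, y = 2)
lp <- new("LabelledPoint", x = 1, y = 2, label = "A")
describe(p)     # method chosen by the class of the argument
describe(lp)    # subclass method, which reuses the Point method

# Encapsulated OOP: the method is part of the class definition
# and is invoked on a particular (mutable) object.
Counter <- setRefClass("Counter",
  fields  = list(count = "numeric"),
  methods = list(increment = function() {
    count <<- count + 1
    invisible(.self)
  }))
cnt <- Counter$new(count = 0)
cnt$increment()
cnt$count       # the object itself has changed
```

In the first half the call is always `describe(object)`, an ordinary function call; in the second half the object is the thing being acted on and altered.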
In the simplest and most common case, referred to as a standard generic function in R, the function defines the formal arguments but otherwise consists of nothing but a table of the corresponding methods plus a command to select the method in the table that matches the classes of the arguments. The selected method is a function; the call to the generic is then evaluated as a call to the selected method.

We will refer to this form of object-oriented programming as functional OOP as opposed to the encapsulated form in which methods are part of the class definition.

2.3 Their Relationship to R

To understand computations in R, two slogans are helpful:

• Everything that exists is an object.
• Everything that happens is a function call.

In contrast to languages such as Java and C++ where objects are distinct from more primitive data types, every reference in R is to an object, in particular, to a single internal structure type in the underlying C implementation. This applies to data in the usual sense and also to all parts of the language itself, such as function definitions and function calls. Computations that are more complex than a constant or a simple name are all treated as function calls by the R evaluator, with control structures and operators simply alternative syntax hiding the function call. [Details and examples are shown in Chambers (2008), pages 458–468.]

The two slogans, however, do not imply that computations in R must follow either functional or object-oriented programming in the senses outlined in the preceding sections. With respect to object-oriented programming, R has several implementations that have evolved as outlined in Section 3. These can be used by programmers to provide software following either of the OOP paradigms.

Functional programming's relationship to R is less straightforward.
The evaluation process in R does not enforce functional programming, but does encourage it to a degree. In particular, the evaluation process in R contributes to functional programming by largely avoiding side effects when function calls are evaluated, but some mechanisms in the language and especially in the underlying support code can behave in a non-functional way. To understand in a bit more detail, we need to examine this evaluation process.

Computations in R are carried out by the R evaluator by evaluating function call objects. These have an expression for the function definition (usually a reference to it by name) and zero or more expressions for the arguments to the call. The full details are somewhat beyond our scope here, but an essential question is how references to objects are handled. Any programming language must have references to data, which in R means references to objects. As discussed in Section 3, the evolution of such references is central to the evolution of programming languages, especially for statistics.

In R a reference to an object is the combination of a name and a context in which to look up that name; the contexts in R are themselves objects, of type "environment". A reference is therefore the combination of a name and an environment. (We'll look at an example shortly.)

Note that we are talking about references to objects; most objects in R are not themselves reference objects. Languages implementing OOP in the traditional, non-functional form essentially always include reference objects, in particular, what are termed mutable references. If a method alters an object, say, by assigning new values to some of its properties, all references to that object see the change, regardless of the context of the call to the method.
Whether the reassignment of the property takes place where the object originated or down in some other method makes no difference; the object itself is the reference.

In contrast, the reference in R consists of a name and an environment: the environment in which the object referred to has been assigned with that name. Most R programming is based on a concept of local references; that is, reassigning part of an object referred to by name alters the object referred to by that name, but only in the local environment. If that local reference started out as a reference in some other environment, that other reference is still to the original object.

To understand the relation of local references to functional programming in R, an example and a few more details of function call evaluation are needed. R evaluates function calls as objects. For example, when the evaluator encounters the call

    lm(Sepal.Width ~ . - Sepal.Length, iris)

it uses the object representing the call to create an environment for the evaluation. The call identifies the function, also an object of course, typically referring to it by name. In this case lm refers to an object in the stats package. That object has formal arguments [14 of them, in the case of lm()]. The evaluator initializes an environment for the call with objects corresponding to the formal arguments, as unevaluated expressions built from the two actual arguments and default expressions found in the function definition. For details see Section 4 of the language definition, R Core Team (2013) and Chapter 13 of Chambers (2008). As an aside, the common use of terms like "call by value" (and the contrasting "call by reference") for argument passing in R is invalid and misleading. Arguments are not "passed" in the usual sense.
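The points made so far about call objects and local references can be checked interactively. The following is a small sketch in base R, not a description of the evaluator's internals; `blank_first` and `set_flag` are hypothetical helper names introduced for the illustration.

```r
# A call is itself an object that can be inspected before evaluation.
cl <- quote(lm(Sepal.Width ~ . - Sepal.Length, iris))
class(cl)                    # "call"
cl[[1]]                      # the name lm, looked up when the call is evaluated
names(formals(stats::lm))    # the formal arguments of lm()

# Operators and control structures are function calls in other syntax.
`+`(1, 2)
`if`(TRUE, "yes", "no")

# Local references: a replacement inside a function alters only the
# object referred to by the local name.
blank_first <- function(data) {
  data$Sepal.Width[1] <- NA
  data
}
changed <- blank_first(iris)
is.na(changed$Sepal.Width[1])   # TRUE
is.na(iris$Sepal.Width[1])      # FALSE: the original is untouched

# Environments, by contrast, are reference objects: a change made
# through any reference is seen through all of them.
env <- new.env()
set_flag <- function(e) { e$flag <- TRUE; invisible(NULL) }
set_flag(env)
env$flag                        # TRUE
```

The difference between the last two cases is the difference between ordinary R objects and the reference objects of traditional OOP languages.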
Local references operate on all the objects in the environment to prevent side effects. The formal argument data to lm() matches the expression iris, which refers to an object in the datasets package. Expressions that extract information from data work on that object. But the local reference defined by data and the environment of the evaluation is distinct from the reference to iris in the package. If an assignment or replacement expression is encountered that would alter data, the evaluator will duplicate the object first to ensure locality of the reference.

The local reference paradigm is helpful in validating the functionality of an R function. Only the local assignments and replacements need to be examined; calls to other functions will not alter references in this environment, so long as those functions stick to local reference behavior. If a function f() calls a function g() and both functions stick to local reference assignments, then knowing that the value of a call to g() depends only on the arguments is all that is needed; how g() computes that value is irrelevant.

While local references help avoid side effects, they do not prevent computations from referring to objects or other data outside the functions being called, and therefore potentially returning a result that depends on a non-functional "state." Whether a particular computation in R is strictly functional can only be determined by examining it in detail, including all the functions that call code in C or Fortran. The rest of this section takes a slight detour to consider how one might do that examination.

Validating Functionality in R

In principle, the functional validity of particular computations could be analyzed and either certified or the limitations to functionality reported.
Such functional validation would be useful in cases where either the theoretical validity or the implications of the result in an application are being questioned. Fitting models to data provides a natural example for both aspects. Given a function taking as arguments data and a model specification and returning a fitted model object, can one validate that the returned object is functionally defined by the arguments? If not, can the non-functionality be parametrized meaningfully, in which case one can construct a functional version of the computation by including such parameters as implicit arguments? R does not have organized support for such validity investigations, but developing tools for the purpose would be a worthwhile project.

Functional validation is a bottom-up construction. The bottom layer consists of any functions called that are not implemented in R, typically those that call routines in C++, C or Fortran. Included are the R primitives, routines from numerical libraries and a variety of other standard sources, plus any new code brought in to implement the computation in question. The functional validity of each of these is an empirical assertion. Some are clearly non-functional, such as the "<<-" operator and assign() function that do nonlocal assignments.

Many computations in R eventually call subprograms not originally written for R. Each of these must be examined for potential non-functional behavior, sometimes a daunting task. However, good practice in using well-tested, preferably open-source supporting software will often provide a plausible basis.

If R code includes an interface to code in C, Fortran or other languages whose functional validity cannot be established, nothing more can be said.
Other than such code, functional validity is likely to fail for one of three reasons:

• dependence on nonlocal values;
• using low-level computations in R known to violate functionality;
• changing functions or other objects at run time.

A prime example of the first is the use of external data, such as the global options object, for convergence tolerances or other parameters for iterative numerical computations. An example of the second is the inclusion of pseudo-random values in the calculation. The third problem might be caused, for example, by using a function from the global environment. The third danger is greatly reduced when the code resides in the namespace of a package with explicit import rules. Any reasonable approach to validating functionality would make this a requirement.

My feeling is that most examples of failures could be corrected to create functionally valid extensions of the computation in question. Tolerances are often organized through the R options() function, explicitly designed to avoid functional programming by allowing users to set state parameters that are then queried by the calculation. Once identified, such options could be converted to additional arguments to the function being validated. [A general mechanism would be a version of getOption() that required the option in question to be supplied as an argument.]

Pseudo-random values are used in a variety of procedures, including some optimization techniques where they are expected to provide more robust numerical behavior by jittering values during iteration. These can be made functionally valid by using well-defined generator software, such as that supplied in R itself, and by treating the initial state of the generator as another nonlocal value to be incorporated as an additional argument.
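Both conversions described above, a tolerance moved from options() to an argument and a generator state supplied as an argument, can be sketched as follows. Everything named here is hypothetical and for illustration only: the functions `newton_sqrt`, `newton_sqrt_opt` and `jittered_mean`, and the option name "sketch.tol". Saving and restoring `.Random.seed` is one possible way to keep the call free of side effects on the caller's generator, not an established R facility for validation.

```r
# Functional version: the convergence tolerance is an explicit argument.
newton_sqrt <- function(x, tol) {
  est <- x
  while (abs(est * est - x) > tol) est <- (est + x / est) / 2
  est
}

# Non-functional variant: the tolerance comes from global state.
# ("sketch.tol" is a made-up option name.)
newton_sqrt_opt <- function(x) newton_sqrt(x, getOption("sketch.tol", 1e-8))

# Pseudo-random jittering with the generator's initial state as an argument.
jittered_mean <- function(x, seed, sd = 1e-3) {
  # Save the caller's generator state and restore it on exit,
  # so that the call leaves no trace in the global environment.
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old <- if (had_seed) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old, envir = globalenv())
    } else {
      rm(list = ".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  mean(x + rnorm(length(x), sd = sd))
}

x <- c(1, 2, 3)
newton_sqrt(2, tol = 1e-10)
identical(jittered_mean(x, seed = 42), jittered_mean(x, seed = 42))  # TRUE
```

With these signatures, the value of each call is determined by its arguments alone, which is the property a validation tool would need to establish.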
One should always include an explicit initialization via set.seed() in any example expected to be reproducible, and that practice can be the basis for a functionally valid version of the computation.

Beyond these specific examples, numerical computations often depend on the underlying parameters of the floating-point computations, for example, to select convergence criteria for iteration. Fortunately, several decades of work by numerical analysts and hardware designers have greatly standardized the specification of the numerical engine in modern computers: just knowing 32-bit or 64-bit gets us a long way.

Developing a framework for validating functionality seems to me an interesting cooperative research direction that could be of value to the statistical community.

3. THE EVOLUTION OF FUNCTIONAL PROGRAMMING, OOP AND R

The computational paradigms for functional programming and for object-oriented programming have evolved from a sequence of changes in software, beginning with the earliest programmable computers. During the same period, software for statistics was also evolving, one thread of which led through early libraries to S and then to R.

There may be an appearance of earlier languages being replaced by later and presumably improved approaches. It is true that each major revision asserts improvements that will extend our abilities to express our ideas in software. However, none of the versions of S or R actually totally replaced earlier software paradigms.

The current software in, and interfaced from, R illustrates this evolution. R has developed important new techniques, but originated from the S language, reproducing nearly all of S as it was described at that time.
S in turn went through several evolutionary changes and was itself based on extensive earlier software, particularly subroutine libraries for Fortran programming. Examining the history shows that a surprising portion of what we see now is structure inherited from the early stages.

The form in which functional programming and OOP were adopted was also influenced by the existing software. Examining the history will explain many of the choices made.

3.1 From Hardware to Data and Libraries

The earliest general-purpose computers were programmed in terms of the physical machine, its storage and the basic operations provided to move data around and perform arithmetic and other operations. The IBM 650 (Figure 1) was probably the first computer widely sold and used (and the machine on which I did my first programming, around 1960).

In this pre-silicon world, storage for data or programs resided on a rotating magnetic drum, holding 2000 decimal words. Data could be read or written only when the corresponding segment of the drum passed under the appropriate fixed head, so that physical positioning of data was a serious aspect of performance. With this close view of the hardware, programming languages (assembly languages for the actual machine instructions) defined storage in terms of single physical units (words in the 650) and blocks of sequential storage.

Fig. 1. An IBM 650 computer, mid 1950s. Under the glass is the magnetic drum storage unit (memory), 2000 words for data and programs.

This was not an environment to encourage abstraction of ideas about data. However, by 1960 the first generation of "high-level" languages had been introduced and would support profound changes. For statistical computing this meant primarily Fortran.
In terms of data storage, Fo rtran actually con- tin ued the basic notion of single items (scalars) and con tiguous blo cks (arrays). Two ma jor c hanges, ho w ev er, were made. First, the con ten ts were de- scrib ed in terms of their conte nt, the first data typ es including in teger and floating p oin t n umb ers. Sec- ond, th e language encouraged op er ations that iter- ated ov er th e con ten ts of th e arr a y s . By interpreting an arra y as a sequ ence of equal-length subarra ys, this indexing extended to matrices and to m ulti-w a y tables. Along with the n ew paradigm for data and facil - ities for iteration, the h igh-lev el languages encour- aged soft wa re to b e organized in subr outines, so that a compu tational metho d could b e realized as one or sev eral un its of soft ware. While the change s ma y seem mod est from the curren t p ersp ectiv e, they in fact sup p orted a ma jor rev olution in scien tific com- puting generally and emp hatically so in compu ting for s tatistics. Algorithm series and other pub lications supp orted b y professional so cieties b egan to accumulat e refer- eed, trustw orthy pro cedu res for many key compu- tations. Th e s tatistics research group at Bell Labs dev elop ed a large Fo rtran library that reflected our needs and our p hilosoph y of researc h and data anal- ysis. The b o ok “Compu tational Metho d s for Data Analysis”, Ch am b ers ( 1977 ), d id n ot present soft- w are b ut did reflect the to ols that would later form the b asis for S . After an in tro duction and discus- sion o f program design, the remaining six c h apters co vered computations supp orted by the libr ary: 3. Data M anagement and Manipulation (includ - ing sorting and table lo okup). 4. Numeric al Computation s (appro ximations, F ourier transforms, inte gration). 5. Line ar Mo dels (n umerical lin ear algebra, re- gression, m ultiv ariate metho ds). 6. Nonline ar Mo dels (optimization, nonlinear least squares). 7. 
Simulation of R andom Pr o c esses (ran d om num- b er generation and Mon te Carlo). 8 J. M. CHAMBERS 8. Computationa l Gr aphics (plotting tec hn iqu es, scatter plots, histograms and probabilit y p lots). Eac h of these was supp orted in the pr e- S era by subroutines th at wo uld then b ecome the basis for corresp ondin g functions in S . Muc h of the organization for basic to ols in R has inherited, through S , the structure of the sub rou- tine library . Th at includes the graph ical computa- tions, in p articular, features essenti al to S and R : separation of graphic device sp ecification from plot- ting; th e plot, figure and margins structure; graph- ical parameter sp ecification to con trol s t yle. These w ere not created for S but tak en o v er fr om previous F o rtran soft wa re, describ ed in Bec ker and Chambers ( 1977 ). The Bell Labs softw are was in the bac kground of Cham b ers ( 1977 ), but general readers were giv en in- structions for obtaining similar softw are from p ub- licly av ailable sour ces f or the metho d s describ ed. The pr o cedure wo uld not alw a ys b e simp le, b ut the p oten tial a v ailabilit y marked a b ig step forward. F or the first time, statisticians could draw on an ex- tensiv e range of relev an t soft w are to s upp ort their researc h , at least in prin ciple. V arious statistical soft w are pac k ages h ad existed for some time, bu t these were by and large oriented to rou tin e analysis, to teac hin g or to s p ecialize d statistical tec hn iqu es. Cham b ers ( 1977 ) and the soft w are it reflected w ere aimed at researc h in statistics and c hallenging data analysis. F or this pur p ose, a more general and op en- ended app roac h wa s needed. 3.2 F rom Fo rtran to S F or those inv olv ed with statistical theory or ap- plications, in academia or industry , there w ere t w o main limitations to th e soft ware describ ed so f ar: a v ailabilit y and the programming in terface. 
The Appendix to Chambers (1977) was a set of tables for each of the chapters, with rows corresponding to computational tools that were more or less available to readers. The last column of the table listed sources for the corresponding software. The entries in that column were not uniformly helpful; in the best situation, a generally available program library could be ordered that provided a number of the subroutines, but these were not designed for statistical applications, most being directed at numerical methods typically motivated by applications in physics. More than half of the entries read “Listing,” implying a laborious and error-prone manual procedure for the user. [As an example, many “bug reports” came to us as a result of confusing an “I” and a “1” when typing in the stable distribution software, Chambers, Mallows and Stuck (1976).]

Substantial in-house libraries, such as the one at Bell Labs, gave users a fairly wide range of computations, supported by improved numerical and other algorithms. However, to apply the computations specifically to a particular dataset with particular results in mind required some substantial additional Fortran programming. That programming had to be repeated and revised for each analysis or research question.

In the 1970s the situation was therefore a combination of improved basic computational capabilities but with a high programming barrier for most statisticians. The classical linear regression in Fortran as shown in Becker and Chambers (1985), for example, was fairly straightforward: call lsfit(X, N, P, y, coef, resid). This computes the fitted model and returns it as vectors of coefficients and residuals. The data as objects are restricted to arrays, a matrix X and vector y for the data and two arrays, coef and resid for the fitted model.
The structure of the objects and their storage allocation remains the programmer’s responsibility. Linking the basic computation to the data in an actual analysis remained nontrivial and mistakes along the way were likely. And this is for the most standard of models. Even given an extensive library, the programming to apply the tools to most applications was a laborious, error-prone activity, usually assigned to dedicated programmers, research assistants or students. The statistician’s ideas went through nontrivial translation before they were expressed as computations.

The first two versions of S were designed to provide an “interactive environment” that included the computational areas described in Chambers (1977) and that allowed the statistician to formulate ideas directly for computation. The second version of S was licensed for general use and described in Becker and Chambers (1984). In S, the linear regression computation became a simpler expression, storage for data was provided automatically and the returned model was now an object, with components for the coefficients and residuals: fit <- reg(X, y).

At this stage, S had a functional appearance, not radically unlike R, but its paradigm was essentially an extension of the Fortran view. Dynamically created, self-describing objects were assigned in a single workspace, but the underlying computations were those of the earlier subroutine library: The functions in S, documented in Becker and Chambers (1984), were in fact interfaces to Fortran subroutines: reg() would in fact be programmed by calling lsfit().

Although there was a macro facility in the language, programming a function in this version of S meant “extending S” as described in the book of that name, Becker and Chambers (1985).
The definition of the new function was programmed in an “interface language” built on Fortran and compiled from its Fortran translation. As the main programming mechanism this was unsatisfactory, in the sense that extending the language had a substantial learning barrier beyond using the language. The ability to access other software via an inter-system interface remains a key feature of R, however, one still under active development.

Equally as important as the technical side was the beginning of a network of statisticians involved in creating and sharing software through the medium of the language. S was licensed from the early 1980s, available thanks to the newly distributed UNIX operating system, with inexpensive academic licenses to encourage adoption by university researchers, also following the example of UNIX. Open-source software was not an option, but the research community was increasingly involved and their interest stimulated further developments on our part, particularly from contacts with interested users belonging to a “beta testing” network.

Simultaneously, we were thinking about a new approach to the language itself, emphasizing the programming aspect of creating new software for statistical and other quantitative applications. Described initially in Chambers (1987) as a language separate from S, this research later merged with other changes to form the next version, labeled S3 and described in the “blue book,” Becker, Chambers and Wilks (1988).

The slogans in Section 2.3 were basic to this version of S: everything is an object (stated explicitly) and function calls do all the computation (implicit). This was functional programming (more or less) and object-based but not object-oriented.
Objects were given structure through attributes attached to vectors and through named components, but there were no classes or methods.

3.3 From Data to Classes and Methods

The languages that originated the concepts of classes, properties, inheritance and methods came out of several motivations. The first, Simula, was concerned with simulating systems. In retrospect, modeling by simulation and modeling by fitting to data have clear correspondences but with quite a different perspective.

For an example, suppose we want to simulate a simple model for an evolving population of individuals. In R notation, but quite in the style of Simula, we define a class SimplePop. An object from this class is a specific realization of the model population with properties that define the probabilities of birth and death, and a vector of population size at each generation. An object from the population is created by calling the generator for the class: p <- SimplePop(birth = 0.08, death = 0.1, size = 100).

Rather than a single functional computation as in the case of linear regression, computations proceed by simulating the evolution of the population object p. The object itself evolves; in the terminology of OOP, it is a mutable reference.

A corresponding difference in the programming paradigms of S and the emerging OOP languages was that the latter did not take a functional view of computation. Instead, computations largely consisted of invoking a method on an object. In the SimplePop example, the fundamental computation is to simulate one generation of the evolution by invoking the evolve() method p$evolve(). The value returned by this method is irrelevant. The method’s purpose is to change the object, in this case by simulating one further generation and appending the resulting value to a property in the object, namely, p$size.
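
A minimal sketch of how such a class might be written with the reference classes of Section 3.5 is given here. It is an illustration under stated assumptions and not the code in the supplementary files: the field names follow the text, but the body of evolve() (binomial deaths followed by binomial births) is an assumed model chosen only to show the mutable-object style.

```r
# Hypothetical sketch of a SimplePop reference class. The field names follow
# the text; the model inside evolve() is an assumption made for illustration.
SimplePop <- setRefClass(
  "SimplePop",
  fields = list(birth = "numeric", death = "numeric", size = "numeric"),
  methods = list(
    evolve = function() {
      current <- size[length(size)]
      # Assumed model: each individual survives with probability 1 - death,
      # and each survivor produces one offspring with probability birth.
      survivors <- rbinom(1, current, 1 - death)
      births <- rbinom(1, survivors, birth)
      # `<<-` assigns to the field, so the object itself is changed.
      size <<- c(size, survivors + births)
      invisible(.self)
    }
  )
)

set.seed(1)
p <- SimplePop(birth = 0.08, death = 0.1, size = 100)
for (i in 1:5) p$evolve()
length(p$size)  # six values: the initial size plus five generations

q <- p          # q refers to the same object; no copy is made
q$evolve()
length(p$size)  # seven values, although the method was invoked through q
```

Because q and p name the same object, invoking the method through either one changes what both see; this is the mutable reference behavior described above.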
(See files “SimplePop.R” and “SimplePopExample.R” in the supplementary materials.)

Following the development of Simula in the late 1960s, a variety of languages adopted this paradigm. C++ added classes and methods to the C language; like C, it was initially used for a variety of programming tasks implementing UNIX and application software for UNIX. In contrast to the “add-on” nature of C++, the Smalltalk language was a very pure, simplified realization of the ideas in Simula. Its major, and revolutionary, application was to implement the graphical user interface created at Xerox PARC in the 1970s. Many other versions of encapsulated OOP followed, either added on to existing languages or incorporated into new languages from the start.

Dialects of the Lisp language and languages based on Lisp also incorporated OOP in various forms. During the 1980s, several research projects built statistical software on the basis of these languages, including some elegant and potentially widely applicable systems, notably LISP-STAT, Tierney (1990). As it turned out, however, the most widely used version of OOP for statistical applications would come from a somewhat casual approach in S.

3.4 Functional OOP in S and R

The chief motivation for introducing classes and functional methods to S was the initial application: fitting, examining and modifying diverse kinds of statistical models for data. This remains arguably the most compelling example for functional OOP in statistics. The “Statistical Models in S” project reported in Chambers and Hastie (1992), the “white book”, brought together ten authors presenting software for a variety of statistical models, from linear regression to tree-based models. The different models were presented as consistently as possible.
Each type of model had a definition as an object having the information, such as coefficients and other properties, required. The object was created by a corresponding function taking as arguments the data, model description and possibly other controlling parameters. A linear regression fit, for example, called the function lm(): irisFit <- lm(Sepal.Width ~ . - Sepal.Length, iris) and returned a corresponding linear regression object. Further computations on this object would examine the model, return information about it, or update the fit.

The underlying computations still used basic software similar to that for lsfit() and reg(). However, the description of the model (a formula) and the data (a data frame) were designed to apply to statistical models generally. For example, to fit a generalized linear model the user called glm() with formula and data arguments typically similar to those in a call to lm(). Other arguments would provide information suitable to the particular type of model (a link function, e.g.).

For the convenience of the user, further computations should have a uniform appearance. To print or plot the fitted model or to compute predictions or an updated model corresponding to new data, the user should call the same function [print(), plot(), predict() or update()] in the same way, regardless of the type of model. The owner of the software for a particular type of model, on the other hand, would like to write just that version of each function, without being responsible for the other versions.

Once stated, this is essentially a prescription for functional OOP: a class of objects for each kind of model, generic functions for the computations on the objects and methods for each function for each class.
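
The uniform appearance described above can be seen in current R. The sketch below assumes only the standard stats package and the built-in iris data; the choice of a Gamma family with a log link for the generalized linear model is arbitrary, made purely for illustration.

```r
# Two kinds of model, created with similar formula and data arguments.
irisFit <- lm(Sepal.Width ~ . - Sepal.Length, iris)
irisGlm <- glm(Sepal.Width ~ . - Sepal.Length, data = iris,
               family = Gamma(link = "log"))  # family chosen arbitrarily

class(irisFit)  # "lm"
class(irisGlm)  # "glm" "lm"

# The same generic functions are called in the same way for either model;
# the method that runs is selected from the class of the first argument.
newdata <- iris[1:3, ]
predLm  <- predict(irisFit, newdata)
predGlm <- predict(irisGlm, newdata, type = "response")

# update() refits with a modified formula, again regardless of model type.
irisFit2 <- update(irisFit, . ~ . - Species)
irisGlm2 <- update(irisGlm, . ~ . - Species)
names(coef(irisFit2))
```

The user writes predict() and update() in both cases; the code specific to each type of model lives in the corresponding methods.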
Where one class of models is an extension of another (analysis of variance as a subclass of linear models, e.g.), methods can be inherited when that makes sense.

An implementation of generic functions and methods was introduced as part of the statistical models project and described in the Appendix to the white book. The central mechanism was an explicit method dispatch. The function print(), for example, would evaluate the expression: UseMethod("print"). The evaluation of this call would examine the “class” attribute of the first formal argument to the function. If present, this would be a character vector. Eligible methods would be those matching one of the strings in the class vector; if none matched, a method matching the string “default” would be used. Inheritance was implemented by having more than one string in the class, with the first string being “the” class and the remainder corresponding to inherited behavior.

Chambers and Hastie (1992), in the discussion of classes and methods, noted that S differed from other OOP languages because of its functional programming style. In fact, this version of functional OOP finessed the resulting distinction from encapsulated OOP in two ways. First, the methods were dispatched according to a single argument, the first formal argument of the generic function in principle. As a result, the methods were unambiguously associated with a single class, as they would be in encapsulated OOP. Methods were actually dispatched on either argument to the usual binary operators, but a number of encapsulated OOP languages do the same, under the euphemism of operator overloading.

Second, the question of whether methods belonged to a class or a function was avoided by not having them belong to either.
Methods were assigned as ordinary functions and identified by the pattern of their name: “function.class”. In any case, there were no class objects and generic functions were ordinary functions that invoked UseMethod() to select and call the appropriate method. Neither the function nor the class was able to own the methods.

Technically, the method dispatch in this version of OOP was instance-based, not class-based, since no rule enforced a consistent set of classes, that is, that all objects with a given first class string would have identical following strings for the superclasses. (R for some time had an S3 class in the base package with a main class string “POSIXt”, representing date/times, that could be followed in different objects by one of two strings that in fact represented specializations, i.e., subclasses, of “POSIXt”.)

The classes and methods implemented for statistical models constituted a bare-bones version of functional OOP, which is not to imply that this was a bad idea. Advantages include a relatively low learning barrier for programming and a thin implementation layer above the previously existing language, which in turn means less computational overhead in some circumstances. [Interestingly, the encapsulated OOP of Python has a similarly thin implementation, with classes containing methods but without defining the properties. A very analogous defense is made for that implementation, in Section 9 of the Python tutorial, Python (2013), e.g.]

A more formal version of functional OOP was developed at Bell Labs, introduced into S in the late 1990s and described in Chambers (1998). By this time, S-based software was exclusively licensed to the Insightful Corporation, which later purchased the rights to the S software, in 2004, and was itself subsequently purchased by Tibco.
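
As a point of reference, the informal S3 mechanism described above can be sketched in a few lines. The generic describe() and the classes linearFit and anovaFit are hypothetical names invented for this illustration; only UseMethod(), NextMethod() and the “function.class” naming pattern are part of the language.

```r
# A generic function is an ordinary function that calls UseMethod().
describe <- function(x, ...) UseMethod("describe")

# Methods are ordinary functions identified by the name "function.class".
describe.default <- function(x, ...) "some object"
describe.linearFit <- function(x, ...) "a linear fit"
describe.anovaFit <- function(x, ...) {
  # NextMethod() continues with the next string in the class vector.
  paste("an anova fit, which is also", NextMethod())
}

# The class is just a character vector attached to the object; the strings
# after the first supply the inherited behavior.
fit <- structure(list(coef = c(1, 2)), class = c("anovaFit", "linearFit"))
describe(fit)

# No method matches the class of an integer vector, so the default is used.
describe(1:3)

# Nothing enforces a consistent set of classes: this object claims to be an
# anovaFit but carries no superclass string, so inheritance differs per object.
odd <- structure(list(), class = "anovaFit")
describe(odd)
```

The last call shows in miniature why the dispatch is instance-based: the inherited behavior is determined by the strings stored in each object, not by a class definition.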
The new paradigm differed from S3 classes and methods in three main ways:

1. Methods could be specified for an arbitrary subset of the formal arguments, and method dispatch would find the best match to the classes of the corresponding arguments in a call to the generic function.
2. Classes were defined explicitly with given properties (the slots) and optional superclasses for inheriting both properties and methods.
3. Generic functions, methods and class definitions were themselves objects of formally defined classes, giving the paradigm reflectivity.

The new paradigm was part of the version of S described in the 1998 book and generally referred to as S4. The S4 label is generally applied to this OOP paradigm, whether in S or R. S4 methods never had much chance of replacing S3 methods. In practice, many S4 generic functions were based on functions that already dispatched S3 methods. In this case, the S3 generic function became the default S4 method.

The work on S4 paralleled in time the arrival of R and its conversion into a broad-based joint project following the initial publication by Ihaka and Gentleman (1996). The implementation of R was designed to provide the functionality for S described in the blue book and white book, including S3 methods. Beginning in 2000, an implementation of the S4 version of OOP was added to R. The “Software for Data Analysis” book, Chambers (2008), includes a description of the R version.

Both versions of functional OOP will remain in R. Many prefer the simplicity of the old form, and in any case the very large body of existing code will not be discarded, and should not be. Some important extensions have been made, for example, by registering the S3 methods from a package.
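
The three differences between S3 and S4 listed above can be illustrated with a small sketch. The classes Money and Salary are hypothetical, invented for this example; the functions setClass(), setMethod(), new(), getClass() and getMethod() are those of the methods package in R.

```r
library(methods)

# (2) Classes are defined explicitly, with slots and optional superclasses.
setClass("Money", slots = c(amount = "numeric", currency = "character"))
setClass("Salary", contains = "Money", slots = c(period = "character"))

m <- new("Money", amount = 10, currency = "USD")
s <- new("Salary", amount = 1000, currency = "USD", period = "month")

# A slot of the wrong class is rejected when the object is created.
badSlot <- inherits(
  try(new("Money", amount = "ten", currency = "USD"), silent = TRUE),
  "try-error"
)

# (1) Methods are specified for the classes of both arguments of "+".
setMethod("+", signature("Money", "Money"), function(e1, e2) {
  stopifnot(identical(e1@currency, e2@currency))
  new("Money", amount = e1@amount + e2@amount, currency = e1@currency)
})
setMethod("+", signature("Money", "numeric"), function(e1, e2) {
  new("Money", amount = e1@amount + e2, currency = e1@currency)
})
setMethod("+", signature("numeric", "Money"), function(e1, e2) e2 + e1)

(m + 5)@amount   # Money on the left only
(5 + m)@amount   # Money on the right only
(m + s)@amount   # both; the Salary object uses the inherited Money method

# (3) Class and method definitions are themselves objects.
is(getClass("Money"), "classRepresentation")
is(getMethod("+", signature("Money", "numeric")), "MethodDefinition")
```

The design choice to return a plain Money object from the sum of a Money and a Salary is arbitrary here; a real application would decide what the result class should be.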
Major forward-looking projects have typically used the newer version, for example, the Bioconductor project for bioinformatics software, Gentleman et al. (2004), and the Rcpp interface to C++, Eddelbuettel and François (2011). Recent changes, such as making the S3 and S4 versions of inheritance as compatible as possible, have been aimed at helping the two forms to coexist productively.

Any programming paradigm with some degree of formality is likely to have a higher initial learning barrier and require some extra specification from the programmer. A comparison of encapsulated OOP programming with Python to that with Java is an interesting parallel to S3 and S4. In both examples, the less formal version is likely to be quicker to learn, while the more formal version provides more information about the resulting software. That information in turn can support some forms of validation for the resulting software, as well as tools to analyze and describe it. Python and Java being rather different languages in other respects as well, projects are not too likely to make a choice between them based solely on the formality of the object-oriented programming.

With R, a conscious choice is more likely. The arguments for a more formal approach apply particularly, in my opinion, to projects with one or more of the characteristics: a substantial amount of software is likely to be written; the application has a fairly wide scope in terms of either the data or the computing methods; or the validity and reliability of the resulting software is important.

Nothing prevents good software from being written without formal tools in this case, nor bad software from being written with them. However, there are several potential benefits that can be summarized in parallel with the main innovations noted above:

1.
Allowing methods to depend on multiple arguments fits the functional paradigm in R, in which the arguments collectively define the domain of the function. Many functions in R are naturally applied to different classes of objects, not necessarily corresponding to the first argument, or only to one argument. For example, when binary operators such as arithmetic are defined for a new class, a clean design of methods for the operators often needs to distinguish three cases: the first operand only belonging to the new class, the second operand only or both operands.
2. A formal definition for a class allows programmers to rely on the properties of objects generated from the class. Otherwise, the nature of the objects can only be inferred, if at all, from analyzing all the software that creates or modifies an object of this class.
3. Having formal definitions for the generic functions, methods and class definitions themselves supports a growing set of tools for installing and using packages that include such functions, methods or classes.

The benefits of a general, reliable form of functional OOP extend to developments in the language itself. For example, reference classes were built on the S4 classes and methods, with no internal changes to the R evaluator required.

3.5 Reference Classes

Functional OOP remains an active area in R. In addition, reference classes, introduced to R in 2010 in version 2.12.0, provide an implementation of encapsulated OOP. Class definitions include the properties of the class with optional type declarations; properties may also be optionally declared read-only. Class definitions are themselves objects available at runtime. Methods are programmed as R functions, in which the object itself is implicitly available, not an explicit argument. Methods can access or assign properties in the object by name.
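
A minimal sketch of these features follows, assuming a hypothetical Account class invented for the illustration. It shows typed fields, a field made read-only with the generator's lock() method, and a method that refers to and assigns a field by name.

```r
# Hypothetical class, for illustration only.
Account <- setRefClass(
  "Account",
  fields = list(owner = "character", balance = "numeric"),
  methods = list(
    deposit = function(x) {
      # Fields are referred to by name; `<<-` assigns into the object.
      balance <<- balance + x
      invisible(.self)
    }
  )
)
Account$lock("owner")  # owner may be set once, after which it is read-only

# The class definition can be queried at runtime through the generator.
Account$fields()

a <- Account$new(owner = "Ann", balance = 100)
a$deposit(50)
a$balance

# Both assignments below fail: the first violates the declared type of the
# field, the second attempts to change a locked field.
typeError <- tryCatch({ a$balance <- "lots"; FALSE }, error = function(e) TRUE)
lockError <- tryCatch({ a$owner <- "Bob"; FALSE }, error = function(e) TRUE)
```

Note that lock() is called on the generator before any object is created; the locked field is then set once, when the object is initialized.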
These characteristics make the implementation more Java-like, say, than Python- or C++-like.

The programmer defines a reference class in the R style, calling setRefClass() instead of setClass(). The call returns a generator for the class and saves the class definition object as a side effect, as does setClass() for S4 classes.

As a side comment, while R uses a model for most of its objects and computations that is fundamentally different from the object references in encapsulated OOP, a few key features made the implementation of reference classes in R possible and even relatively straightforward. Most importantly, the R data type “environment” provides a vehicle for object references and properties. Environments are universal in R and well supported by programming tools. In particular, the active binding mechanism, which allows access and assignment operations on objects in environments to be programmed in R, was valuable in the implementation.

Reference classes allow the use of encapsulated OOP for objects that suit that paradigm more naturally than they do functional OOP. As noted in Section 3.3, the essential distinction between functional and encapsulated OOP is whether an object is created, once, by a function call or is instead a mutable object that changes as methods are invoked.

Statistical computing has examples clearly suited to each of these paradigms. The linear model returned by lm() is not open to mutation. Change the numbers in the coefficients or residuals and you no longer have an object that should belong to that class. In contrast, a model simulating a dynamic process such as the SimplePop class in Section 3.3 exists precisely for the purpose of changing, with its evolution being the central point of interest.
Other, less directly statistical computations in R also may correspond to mutable objects, for example, the frames or other objects in a graphical interface.

Not every case is clear cut. Sometimes, essentially the same class structure may be more appropriate for functional or encapsulated classes depending on the purpose of the computation. Data frames are a prime example. This essential object structure is viewed naturally as functional when it is part of a functional object related to the data frame. For example, a fitted model that wanted to be fully reproducible could return the data frame on which the fitting was based [e.g., lm() includes the model frame it constructs]. Such a data frame is clearly functional; again, change it and you invalidate the model. On the other hand, a data frame to be used in data cleaning and editing is an object that needs to be mutable.

Having both paradigms in a single language is unusual. Some functional-style languages have implemented functional OOP, notably Dylan, interesting for its parallels with OOP in R; see Shalit (1996), particularly the discussion of method dispatch. Other languages with a functional structure have nevertheless added what is essentially encapsulated OOP, for example, Odersky, Spoon and Venners (2010) for the case of Scala.

We hope that providing both paradigms in R encourages software design that is natural for the application. It does at the same time pose some subtleties. Reference classes and reference class objects are somewhat abnormal in R. One needs to understand the distinctions from standard R objects.

The key is the local reference mechanism noted in Section 2.3. The R evaluator enforces local reference by duplicating an object when a computation might alter a nonlocal reference.
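
The duplication rule can be seen directly. In the sketch below the helper functions zeroFirst() and bump() are hypothetical names chosen for the illustration; the second half shows, for contrast, an object of type “environment”, which is not copied.

```r
# A function that modifies its argument works on a local copy.
zeroFirst <- function(v) {
  v[1] <- 0
  v
}
x <- c(1, 2, 3)
y <- zeroFirst(x)
x  # unchanged: 1 2 3
y  # the modified copy: 0 2 3

# An environment is not duplicated when it is passed to a function, so an
# assignment inside the function is visible to the caller.
bump <- function(e) {
  e$count <- e$count + 1
  invisible(NULL)
}
counter <- new.env()
counter$count <- 0
bump(counter)
counter$count  # 1
```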
Certain object types are exceptions that are not duplicated. The important exception is type “environment”. Reference classes are implemented by extending this type. Encapsulated OOP in R uses no special form of the function call. Method invocation is just a call to the “$” operator, for which reference classes have an S4 method. Reference semantics are obtained by one basic fact: environments are never duplicated automatically. The S4 class mechanism in R nevertheless allows one to subclass the “environment” type in order to define reference class behavior.

The objects in the fields of a reference class object can be ordinary R objects. They behave just as usual and when used in function calls will have regular local reference behavior in that call. It is only when fields in the reference object itself are replaced that the encapsulated OOP is relevant.

Reference class objects are also good candidates for interfaces to other languages that implement the same OOP paradigm, such as Java, C++ or Python. The R object could be a proxy for an object in the other language with methods invoked in R but executed on the original object. The Rcpp interface to C++, Eddelbuettel and François (2011), has a mechanism for extending C++ classes in this way.

C++ classes can only be inferred from the source, meaning that either the programmer must supply the interface information (as in the current implementation) or some processing of the source must be applied (currently used to export functions from C++ but not classes). Java classes are accessible as objects, via “reflection” in Java terminology, so that in principle proxy classes in R should be possible. The rJavax package by Danenberg (2011) has an initial implementation. For Python, methods are available from the objects but properties are not formally defined.
At the time of writing, basic interfaces to Python exist, for example, Grothendieck and Bellosta (2012), which could be extended to support class interfaces, with methods but not properties inferred from the Python class objects.

Further work on these and other inter-system interfaces would be a valuable contribution to the user community.

4. SUMMARY

R plays a major role in the communication and dissemination of new techniques for statistics and for results of statistical research more generally. In particular, the many packages written in R or using R as a base for interfacing to other software constitute an essential, rapidly growing resource. Therefore, the quality of such software and the ability of programmers to create and extend it are important.

The current R language and its supporting functionality are the result of many years of evolution, from early programming libraries through the S language to R, which itself has evolved and accumulated a variety of programming techniques. This evolution has been much influenced by the functional and object-oriented programming paradigms. New versions have continued to include supporting software and programming tools found useful at earlier stages along with improved capabilities.

The programming paradigms become especially relevant when the applications are complex or the quality of the resulting software is important. In particular, the versions of object-oriented programming in R can assist in dealing with complexity of the underlying data. As noted, R implements OOP in two forms, functional and encapsulated. These are complementary, with one or the other suitable for particular applications. The latter is essentially the form of OOP used in most other languages, but the former is distinctly different.
Considerable confusion has arisen in discussions of OOP in R from not noting that distinction, which the present paper has tried to clarify.

More generally, understanding the role of object-oriented and functional programming in R may assist future contributing programmers in using related programming tools. The continuing rapid growth of R-based software and the expanding, challenging range of techniques it has to support make effective programming an important goal for the statistical community.

The importance of object-oriented programming is likely to increase as statistical software takes on new and challenging applications. In particular, the need to deal with increasingly large objects and distributed sources of data will bring in specialized classes of data and will need powerful computing tools. One important direction has been to transform selected software in R, particularly to speed up large-scale computations; see, for example, the companion paper Temple Lang (2014). Complementary to this is to interface to other languages and software when these provide better performance on “big data” and other computationally demanding applications. In particular, interfaces that match with object-oriented treatments for specialized forms of data can exploit the OOP facilities in R. The interface to C++, Eddelbuettel and François (2011), is an example. Further development of such interfaces will be of much benefit.

Functional programming is perhaps not such an obviously hot topic at the moment. However, the underlying philosophy that our software should be in the form of reliable, defensible units is very much part of R.
Situations where the validity of statistical computations needs to be defended are likely to increase, given the growing need for statistical treatment of complex problems for science and society.

ACKNOWLEDGMENTS

Thanks to the Associate Editor and the referees for some helpful comments on presentation and content. Thanks especially to Vincent Carey for organizing and encouraging the set of talks and papers of which this is part.

REFERENCES

Becker, R. A. and Chambers, J. M. (1977). Gr-z: A system of graphical subroutines for data analysis. In Proc. Interface Symp. on Statistics and Computing 10 409–415.
Becker, R. A. and Chambers, J. M. (1984). S: An Interactive Environment for Data Analysis and Graphics. Wadsworth, Belmont, CA.
Becker, R. A. and Chambers, J. M. (1985). Extending the S System. Wadsworth, Belmont, CA.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Chapman & Hall, Boca Raton, FL.
Chambers, J. M. (1977). Computational Methods for Data Analysis. Wiley, New York. MR0659716
Chambers, J. M. (1987). Interface for a quantitative programming environment. In Comp. Sci. and Stat., Proc. 19th Symp. on the Interface 280–286.
Chambers, J. M. (1998). Programming with Data: A Guide to the S Language. Springer, New York.
Chambers, J. M. (2008). Software for Data Analysis: Programming with R. Springer, New York.
Chambers, J. M. and Hastie, T., eds. (1992). Statistical Models in S. Chapman & Hall, Boca Raton, FL.
Chambers, J. M., Mallows, C. L. and Stuck, B. W. (1976). A method for simulating stable random variables. J. Amer. Statist. Assoc. 71 340–344. MR0415982
Danenberg, P. (2011). rJavax: rJava extensions. R package version 0.3. Available at http://CRAN.R-project.org/package=rJavax.
Eddelbuettel, D. and François, R. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software 40 1–18.
Gentleman, R. C., Carey, V. J., Bates, D. M. et al. (2004). Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 5 R80.
Grothendieck, G. and Bellosta, C. J. G. (2012). rJython: R interface to Python via Jython. R package version 0.0-4. Available at http://CRAN.R-project.org/package=rJython.
Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. J. Comput. Graph. Statist. 5 299–314.
Odersky, M., Spoon, L. and Venners, B. (2010). Programming in Scala, 2nd ed. Artima, Walnut Creek, CA.
Python (2013). The Python Tutorial. Python. Available at http://docs.python.org/tutorial.
R Core Team (2013). R Language Definition. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-13-5. Available at http://cran.r-project.org/doc/manuals/R-lang.html/.
Shalit, A. (1996). The Dylan Reference Manual. Addison-Wesley, Reading, MA.
Temple Lang, D. (2014). Enhancing R with advanced compilation tools and methods. Statist. Sci. 29 181–200.
Tierney, L. (1990). LISP-STAT: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics. Wiley, New York.
