bayes4psy -- an Open Source R Package for Bayesian Statistics in Psychology


Authors: Jure Demšar, Grega Repovš, Erik Štrumbelj (University of Ljubljana)

Abstract

Research in psychology generates interesting data sets and unique statistical modelling tasks. However, these tasks, while important, are often very specific, so appropriate statistical models and methods cannot be found in accessible Bayesian tools. As a result, the use of Bayesian methods is limited to those that have the technical and statistical fundamentals required for probabilistic programming. Such knowledge is not part of the typical psychology curriculum and is a difficult obstacle for psychology students and researchers to overcome. The goal of the bayes4psy package is to bridge this gap and offer a collection of models and methods to be used for analyzing data that arise from psychology experiments and as a teaching tool for Bayesian statistics in psychology. The package contains the Bayesian t-test and bootstrap, along with models for analyzing reaction times, success rates, and colors. It also provides the diagnostic, analytic, and visualization tools for the modern Bayesian data analysis workflow.

Keywords: R, Bayesian statistics, psychology, reaction time, success rate, Bayesian t-test, color analysis, linear model, bootstrap.

1. Introduction

Through the development of specialized probabilistic models, Bayesian data analysis offers a highly flexible, intuitive, and transparent alternative to classical statistics. Bayesian approaches were on the sidelines of data analysis throughout much of the modern era of science, mostly because the computations required for Bayesian analysis are usually quite complex. But computations that only a decade or two ago were too complex even for specialized computers can nowadays be executed on average desktop computers.
This is in part due to modern Markov chain Monte Carlo (MCMC) methods that make computations tractable for virtually all parametric models. This, along with specialized probabilistic programming languages for Bayesian modelling – such as Stan (Carpenter, Lee, Brubaker, Riddell, Gelman, Goodrich, Guo, Hoffman, Betancourt, and Li 2017) and JAGS (Plummer 2003) – has drastically increased the accessibility and usefulness of Bayesian methodology for data analysis. Indeed, Bayesian data analysis is steadily gaining momentum in the 21st century (Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin 2014; Kruschke 2014; McElreath 2018), mainly in the natural and technical sciences. Unfortunately, the use of Bayesian data analysis in the social sciences remains scarce, most likely due to the steep statistical and technical learning curve of Bayesian methods.

Bayesian data analysis has many advantages, such as its ability to work with missing data and to combine prior information with data in a natural and principled way. Furthermore, Bayesian methods offer high flexibility through hierarchical modelling, and the calculated posterior parameter values can be used as easily interpretable alternatives to p-values – Bayesian methods provide very intuitive answers, such as "the parameter µ has a probability of 0.95 of falling inside the [-2, 2] interval" (Dunson 2001; Gelman et al. 2014; Kruschke 2014; McElreath 2018). One of the social sciences that could benefit the most from Bayesian methodology is psychology. The majority of the data that arise in psychological experiments, such as reaction times, success rates, and colors, can be analyzed in a Bayesian manner using a small set of probabilistic models.
Bayesian methodology could also alleviate the replication crisis that is pestering the field of psychology (Open Science Collaboration 2015; Schooler 2014; Stanley, Carter, and Doucouliagos 2018). The ability to replicate scientific findings is of paramount importance to scientific progress (Baker and Penny 2016; McNutt 2014; Munafò, Nosek, Bishop, Button, Chambers, Percie Du Sert, Simonsohn, Wagenmakers, Ware, and Ioannidis 2017). Unfortunately, more and more replication attempts report that they failed to reproduce the original results and conclusions (Amrhein, Greenland, and McShane 2019; Open Science Collaboration 2015; Schooler 2014). This so-called replication crisis is harmful not only to the authors but to science itself. A recent attempt to replicate 100 studies from three prominent psychology journals (Open Science Collaboration 2015) showed that only approximately a third of the studies that claimed statistical significance (p-value lower than 0.05) also showed statistical significance in replication. Another recent study (Camerer, Paulson, Dreber, Holzmeister, Ho, Huber, Johannesson, Kirchler, Nave, Nosek, Pfeiffer, Altmejd, Buttrick, Chan, Chen, Forsell, Gampa, Heikensten, Hummer, Imai, Isaksson, Manfredi, Rose, Wagenmakers, and Wu 2018) tried to replicate systematically selected studies in the social sciences published in Nature and Science between 2010 and 2015; replication attempts were successful in only 13 cases out of 21. The main reasons behind the replication crisis seem to be poor quality control in journals, unclear writing, and inadequate statistical analysis (Hurlbert, Levine, and Utts 2019; Wasserstein and Lazar 2016; Wasserstein, Schirm, and Lazar 2019). One of the main issues lies in the desire to claim statistical significance through p-values.
Many manuscripts published today repeat the same mistakes even though prominent statisticians have prepared extensive guidelines about what to do and, mainly, what not to do (Hubbard 2015; Wasserstein and Lazar 2016; Wasserstein et al. 2019; Ziliak 2019). Reluctance to adhere to modern statistical practices has led scientists to believe that a more drastic shift in statistical thinking is needed, and some believe that it might come in the form of Bayesian statistics (Dunson 2001; Gelman et al. 2014; Kruschke 2014; McElreath 2018).

The bayes4psy R package provides a state-of-the-art framework for Bayesian analysis of psychological data. It incorporates a set of probabilistic models that can be used for the analysis of data that arise during many types of psychological experiments. All models are pre-compiled, meaning that users do not need any specialized software or skills (e.g., knowledge of probabilistic programming languages); the only requirements are the R programming language and very basic programming skills (the same skills as needed for classical statistical analysis in R). Besides the probabilistic models, the package also incorporates the diagnostic, analytic, and visualization tools required for modern Bayesian data analysis. As such, the bayes4psy package represents a bridge into the exciting world of Bayesian statistics for all students and researchers in the field of psychology.

Journal of Statistical Software

2. Models and methods

For statistical computation, that is, sampling from the posterior distributions, the bayes4psy package uses Stan (Carpenter et al. 2017). Stan is a state-of-the-art platform for statistical modeling and high-performance statistical computation and offers full Bayesian statistical inference with MCMC sampling. It also offers user-friendly interfaces to most programming languages used for statistical analysis, including R.
R (R Core Team 2017) is one of the most powerful and widespread programming languages for statistics and visualization. Visualizations in the bayes4psy package are based on the ggplot2 package (Wickham 2009).

2.1. Priors

In Bayesian statistics we use prior probability distributions (priors) to express our beliefs about the parameters before any evidence (data) is taken into account. Priors represent an elegant way of intertwining previous knowledge with new facts about the domain of analysis. Prior distributions are usually based on previously conducted and verified research or on knowledge provided by domain experts. If such data are not available, we usually resort to our own weakly informative, vague prior knowledge.

In the bayes4psy package users can express prior knowledge by putting prior distributions on all of the model's parameters. Users can express their knowledge by using uniform, normal, gamma, or beta distributions. If users do not specify any prior knowledge about the model's parameters, then flat/improper priors are put on those parameters. For details see the practical illustrations of using the bayes4psy package in Section 3.

2.2. Bayesian t-test

The t-test is probably the most commonly used hypothesis test. We added a Bayesian version of the t-test to the bayes4psy package, based on Kruschke's model (Kruschke 2013, 2014). The Bayesian t-test uses a scaled and shifted Student's t distribution (Figure 1). This distribution has three parameters – degrees of freedom (ν), mean (µ), and scale (σ). There are some minor differences between our implementation and Kruschke's. Instead of pre-defined vague priors for all parameters, we can define custom priors for the ν, µ, and σ parameters. Since Kruschke's main goal was the comparison between two groups, his implementation models two data sets simultaneously.
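To make this parameterization concrete, the scaled and shifted Student's t distribution can be sketched in a few lines of base R. This is an illustrative sketch, not code from the bayes4psy package, and the parameter values are arbitrary.

```r
# Illustrative base-R sketch (not bayes4psy code) of the scaled and
# shifted Student's t distribution: a standard t density with nu
# degrees of freedom, shifted by mu and scaled by sigma.
dt_scaled <- function(y, nu, mu, sigma) {
  dt((y - mu) / sigma, df = nu) / sigma
}

# The density is centered at mu and integrates to (approximately) 1
# over a wide enough grid; the parameter values here are arbitrary.
y <- seq(-20, 20, length.out = 4001)
area <- sum(dt_scaled(y, nu = 5, mu = 1, sigma = 2)) * (y[2] - y[1])
```

Lower values of ν produce heavier tails, which is what makes this distribution more robust to outliers than a normal likelihood.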
Our implementation is more flexible: users can model several data sets individually and then make pairwise comparisons or a simultaneous cross comparison between multiple fits. We illustrate the use of the t-test in Section 3.3.

Figure 1: Visualization of the Bayesian t-test. The model has three parameters – degrees of freedom ν, mean µ, and scale σ. y_i denotes the i-th datum in the provided data set.

2.3. Model for analyzing reaction times

Psychological experiments typically have a hierarchical structure – each subject performs the same test a number of times, several subjects are then grouped together by their characteristics (e.g., age, sex, health), and the final statistical analysis is conducted at the group level. Such a structure is ideal for Bayesian hierarchical modelling (Kruschke 2014). Our subject-level reaction time model is based on the exponentially modified normal distribution. This distribution has proven to be a suitable interpretation for the long-tailed data that arise in reaction time measurements. To model the data at the group level we put hierarchical normal priors on all parameters of the subject-level exponentially modified normal distribution. The subject-level parameters are thus µ_i, σ_i, and λ_i, where i is the subject index. The hierarchical normal priors on these parameters are N(µ_m, σ_m) for the µ parameter, N(µ_s, σ_s) for the σ parameter, and N(µ_l, σ_l) for the λ parameter. Figure 2 is a graphical representation of the Bayesian reaction time model. For a practical application of this model see Section 3.1.

2.4. Model for analyzing success rates

The success rate model is based on the Bernoulli-Beta model found in most Bayesian statistics textbooks (Gelman et al. 2014; Kruschke 2014; McElreath 2018).
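The long right tail that motivates the reaction time model of Section 2.3 is easy to see by simulation: an exponentially modified normal variable is simply the sum of a normal and an exponential variable. The sketch below is illustrative base R with arbitrary parameter values, assuming λ denotes the rate of the exponential component.

```r
# Illustrative base-R sketch (not bayes4psy code): an exponentially
# modified normal draw is a normal draw plus an exponential draw,
# producing the long right tail typical of reaction times.
set.seed(1)
mu     <- 0.4   # arbitrary normal mean (seconds)
sigma  <- 0.05  # arbitrary normal standard deviation
lambda <- 10    # arbitrary exponential rate
rt <- rnorm(1e5, mu, sigma) + rexp(1e5, rate = lambda)
# The theoretical mean of this distribution is mu + 1/lambda = 0.5 s,
# and the distribution is right-skewed (mean above the median).
```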
This model is used for modelling binary data – in our case, whether or not a subject solves a psychological task. The success rate model also has a hierarchical structure. The success rate of individual subjects is modelled using Bernoulli distributions, where p_i is the success rate of subject i. A reparametrized Beta distribution, Beta(pτ, (1 − p)τ), is used as a hierarchical prior on the subject-level parameters, where p is the group-level success rate and τ is the scale parameter. A graphical representation of our hierarchical success rate model can be seen in Figure 3. For a practical application of this model see Section 3.1.

Figure 2: Visualization of the Bayesian reaction time model. The model has a hierarchical structure: reaction times belonging to each individual subject (t_{n,i} depicts the i-th reaction time of subject n) are used to construct exponentially modified normal distributions at the subject level. Parameters of the subject-level distributions are then connected at the group level by using normal distributions, which can then be used for group-level analysis.

2.5. Model for analysis of sequential tasks

In some psychological experiments the data have a time component or are ordered in some other way, for example, when subjects participate in a sequence of questions or tasks. To model how a subject's performance changes over time, we implemented a hierarchical linear normal model. The sequence for each individual subject is modelled using a simple linear model with subject-specific slope and intercept.
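The reparametrized Beta prior of the success rate model above (Section 2.4) is convenient because its two roles separate cleanly: p is the mean and τ the concentration. A small base-R sketch with arbitrary values of p and τ (not package code):

```r
# Illustrative base-R sketch of the reparametrized hierarchical prior
# Beta(p * tau, (1 - p) * tau): p is the group-level success rate (the
# distribution's mean) and tau controls how tightly subject-level
# success rates concentrate around it. p and tau here are arbitrary.
set.seed(1)
p   <- 0.7
tau <- 20
draws <- rbeta(1e5, p * tau, (1 - p) * tau)
# mean(draws) is close to p; increasing tau shrinks the spread,
# since the variance of Beta(p*tau, (1-p)*tau) is p*(1-p)/(tau + 1).
```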
To model the data at the group level we put hierarchical normal priors on all parameters of the subject-level linear models. The parameters of subject i are thus α_i for the intercept and β_i for the slope of the linear model, along with σ_i for modelling the errors of the fit (residuals). The hierarchical normal priors on these parameters are N(µ_α, σ_α) for the intercept (α), N(µ_β, σ_β) for the slope (β), and N(µ_σ, σ_σ) for the residuals (σ). A graphical representation of the model is shown in Figure 4. For a practical application of this model see Section 3.2.

2.6. Model for analysis of color based tasks

Color stimuli and subject responses in psychological experiments are most commonly defined through the RGB color model. The name of the model comes from the initials of the three additive primary colors: red, green, and blue. These colors are also the three components of the model; each component has a value from 0 to 255 which defines the presence of a particular color.

Figure 3: Visualization of the Bayesian success rate model. The model has a hierarchical structure: data about the success of individual subjects (y_{n,i} depicts success on the i-th attempt of subject n) are used for fitting Bernoulli distributions on the subject level. Parameters of the subject-level distributions are then connected at the group level with a Beta distribution.

Since defining and analyzing colors through the RGB model is not very user friendly or intuitive, our Bayesian model is capable of working with both the RGB and HSV color models. HSV (hue, saturation, and value) is an alternative representation of the RGB model that is much easier to read and interpret for most human beings.
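The two color models are related by a deterministic conversion, and base R already provides it, which is handy when preparing input data in either format:

```r
# Base R (grDevices, attached by default) converts between the two
# color models discussed here: rgb2hsv() maps RGB components on the
# 0-255 scale to HSV components on the [0, 1] scale.
rgb2hsv(255, 0, 0)   # pure red:  h = 0,   s = 1, v = 1
rgb2hsv(0, 0, 255)   # pure blue: h = 2/3, s = 1, v = 1
```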
The Bayesian color model works in a component-wise fashion – six distributions (three for the RGB components and three for the HSV components) are fitted to the data for each component individually. For the RGB components we use normal distributions (truncated to the [0, 255] interval). In the HSV case, we use normal distributions (truncated to the [0, 1] interval) for the saturation and value components and the von Mises distribution for the hue component. The von Mises distribution (also known as the circular normal distribution) is a close approximation to the normal distribution wrapped on the [0, 2π] interval. A visualization of our Bayesian model for colors can be seen in Figure 5 and its practical application in Section 3.4.

2.7. Bayesian bootstrap

Bootstrapping is a commonly used resampling technique for estimating statistics of a population by sampling a data set with replacement. It is usually used for evaluating measures of accuracy (such as the mean, bias, standard deviation, confidence intervals, ...) of sample estimates. It allows estimation of the sampling distribution of almost any statistic using random sampling methods and is as such useful in a wide repertoire of scenarios.

Figure 4: Visualization of the hierarchical linear model. The model has a hierarchical structure: linear normal models are fitted on the subject level from data belonging to each particular subject. Since the ordering of results is important, input data come in pairs of dependent (e.g.
time or the question index): y_{n,i} | x_{n,i} depicts the value of the i-th dependent variable given the value of the i-th independent variable for subject n. Parameters of the subject-level distributions are joined at the group level by using normal distributions. These distributions can then be used for group-level analysis of the data.

The Bayesian bootstrap inside the bayes4psy package is the analogue of the classical bootstrap (Efron 1979). It is based on Rasmus Bååth's implementation (Bååth 2015), which in turn is based on methods developed by Rubin (1981). The Bayesian bootstrap does not simulate the sampling distribution of a statistic estimating a parameter, but instead simulates the posterior distribution of the parameter. The statistical model underlying the Bayesian bootstrap can be characterized by drawing weights from a uniform Dirichlet distribution with the same dimension as the number of data points; these draws are then used for weighing the data and calculating the statistic in question (Bååth 2015). For more details about the implementation see Bååth (2015) and Rubin (1981).

2.8. Methods for fitting and analyzing Bayesian fits

This section provides a quick overview of all the methods for fitting and analyzing the models described in the previous sections. For a more detailed description of each function readers are invited to consult the documentation and examples that are provided along with the R package.

The first set of functions constructs a Bayesian model from input data.
Users can also use these functions to define priors (for an example, see the second part of Section 3.1) and set the number of MCMC iterations, along with several other parameters of the MCMC algorithm (some basic MCMC settings are described in the documentation of this package; for more advanced settings consult the official Stan (Carpenter et al. 2017) documentation).

Figure 5: Visualization of the Bayesian color model. The model is composed of six parts. Three parts describe the RGB (red, green, blue) color model components and three parts describe the HSV (hue, saturation, value) color model components. All components, except hue, are modelled with normal distributions, while hue is modelled with the von Mises distribution – a circular normal distribution.

• b_ttest is the function for fitting the Bayesian t-test model; the input data is a vector of normally distributed real numbers.

• b_linear constructs the hierarchical linear model for analyzing sequential tasks. Users have to provide three data vectors – x, containing values of the independent variable (time, question index, ...), y, containing values of the dependent variable (subject's responses), and s, containing ids of subjects, denoting that the x_i/y_i pair belongs to subject i.

• b_reaction_time fits the Bayesian reaction time model. The input data consists of two vectors – vector t includes the reaction times, while vector s links reaction times with subjects.

• b_success_rate fits the Bayesian success rate model. Users have to provide two data vectors.
The first vector, r, contains the results of an experiment with binary outcomes (e.g., success/fail, hit/miss, ...) and the second vector, s, is used to link results to subjects.

• b_color fits the Bayesian color model. The input data is a three-column matrix or data.frame where each column represents one of the components of the chosen color model (RGB or HSV). If the input data are provided in the HSV format then users also have to set the hsv parameter to TRUE.

• b_bootstrap performs the Bayesian bootstrap. The mandatory inputs are the input data and the statistic function. The input data can be in the form of a vector, matrix, or data.frame.

Before interpreting the results, users can use the following functions to check whether the models provide a good representation of the data:

• plot_trace draws the Markov chain trace plot for the main parameters of the model, providing a visual way to inspect sampling behavior and assess mixing across chains and convergence.

• plot_fit draws the fitted distribution against the input data. With hierarchical models we can use the subjects parameter to draw fits on the subject level.

• plot_fit_hsv is a special function for inspecting the fit of the color model by using a color-wheel-like visualization of the HSV components.

For a summary of the posterior with Monte Carlo standard errors and confidence intervals users can use the summary or print/show functions.

• summary prints summary statistics of the main model parameters.

• print, show print a more detailed summary of the model's parameters. It includes estimated means, Monte Carlo standard errors (se_mean), confidence intervals, effective sample size (n_eff, a crude measure), and the R-hat statistic. R-hat measures the potential scale reduction factor on split chains and equals 1 at convergence (Brooks and Gelman 1998; Gelman and Rubin 1992).
Users can also extract samples from the posterior for further analysis:

• get_parameters returns a data.frame of the model's parameters. For hierarchical models this returns a data.frame of group-level parameters.

• get_subject_parameters can be used to extract subject-level parameters from hierarchical models.

The compare_means function can be used for comparing parameters that represent the means of the fitted models; to visualize these means one can use the plot_means function, and for visualizing the difference between means the plot_means_difference function. All comparison functions (functions that print or visualize the difference between fitted models) also offer the option of defining the region of practical equivalence (rope) by setting the rope parameter.

• compare_means prints and returns a data.frame containing the comparison results; it can be used for comparing two or multiple models at the same time.

• plot_means_difference visualizes the difference of means between two or multiple models at the same time.

• plot_means plots the distribution of parameters that depict means; it can be used on a single model or on multiple models at the same time.

• plot_means_hsv is a special function for the Bayesian color model that plots the means of the HSV components by using a color-wheel-like visualization.

The following set of functions works in a similar fashion as the one for comparing means; the difference is that this one compares entire distributions and not just the means. The comparison is based on drawing a large number of samples from the distributions.

• compare_distributions prints and returns a data.frame containing the comparison results; it can be used for comparing two or multiple models at the same time.

• plot_distributions_difference visualizes the difference between the distributions underlying two or multiple fits at the same time.
• plot_distribution plots the distributions underlying the fitted models; it can be used on a single model or on multiple models at the same time.

• plot_distributions_hsv is a special function for the Bayesian color model that plots the distributions behind the HSV components by using a color-wheel-like visualization.

3. Illustrations

Below are illustrative practical examples of the models in the bayes4psy package. Additional examples can be found in the online repository (https://github.com/bstatcomp/bayes4psy_tools). For the sake of brevity, we omitted similar visualizations and outputs; e.g., we provided the diagnostic outputs and visualizations only the first time they appeared and omitted them later due to similarity.

3.1. The flanker task

In the Eriksen flanker task (Eriksen and Eriksen 1974) participants are presented with an image of an odd number of arrows (usually five or seven). Their task is to indicate the orientation (left or right) of the middle arrow as quickly as possible whilst ignoring the flanking arrows on the left and right. There are two types of stimuli in the task: in the congruent condition (e.g., '<<<<<') both the middle arrow and the flanking arrows point in the same direction, whereas in the incongruent condition (e.g., '<<><<') the middle arrow points in the opposite direction of the flanking arrows.

As the participants have to consciously ignore and inhibit the misleading information provided by the flanking arrows in the incongruent condition, the performance in the incongruent condition is robustly worse than in the congruent condition, both in terms of longer reaction times and a higher proportion of errors. The difference between reaction times and error rates in congruent and incongruent cases is a measure of the subject's ability to focus his or her attention and inhibit distracting stimuli.
In the illustration below we compare reaction times and error rates when solving the flanker task between the control group (healthy subjects) and the test group (subjects suffering from a certain medical condition). First, we load the bayes4psy package and the dplyr package for data wrangling. Second, we load the data and split them into the control and test groups. For the reaction time analysis we use only data where the response to the stimulus was correct:

R> library(bayes4psy)
R> library(dplyr)
R> data <- read.table("./data/flanker.csv", sep="\t", header=TRUE)
R> control_rt <- data %>% filter(result == "correct" & group == "control")
R> test_rt <- data %>% filter(result == "correct" & group == "test")

The model requires subjects to be indexed from 1 to n. Control group subject indexes range from 22 to 45, so we shift them to 1 to 24.

R> control_rt$subject <- control_rt$subject - 21

Now we are ready to fit the Bayesian reaction time model for both groups. The model function requires two parameters – a vector of reaction times, t, and a vector of subject indexes, s.

R> rt_control_fit <- b_reaction_time(t=control_rt$rt, s=control_rt$subject)
R> rt_test_fit <- b_reaction_time(t=test_rt$rt, s=test_rt$subject)

Before we interpret the results, we check the MCMC diagnostics and the model fit.

R> plot_trace(rt_control_fit)
R> plot_trace(rt_test_fit)

Figure 6: The trace plot for rt_control_fit (parameters mu_m, mu_s, and mu_l across four chains). The trace plot gives us no cause for concern regarding MCMC convergence and mixing. The trace plot for rt_test_fit is similar. Note that the first 1000 iterations (shaded gray) are used for warmup (tuning of the MCMC algorithm) and are discarded. The next 1000 iterations are used for sampling.

R> print(rt_control_fit)

Inference for Stan model: reaction_time.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

                 mean se_mean   sd  2.5% 97.5% n_eff Rhat
mu[1]            0.46    0.00 0.01  0.44  0.47  4789    1
mu[2]            0.36    0.00 0.01  0.35  0.38  4661    1
...
sigma[1]         0.04    0.00 0.01  0.03  0.05  5406    1
sigma[2]         0.03    0.00 0.01  0.02  0.04  5165    1
...
lambda[1]       14.41    0.02 1.62 11.59 17.87  4441    1
lambda[2]       11.59    0.02 1.15  9.53 14.01  5271    1
...
mu_m             0.51    0.00 0.01  0.48  0.54  5589    1
mu_l             6.86    0.01 0.91  5.12  8.75  5299    1
mu_s             0.07    0.00 0.01  0.06  0.08  4115    1
sigma_m          0.06    0.00 0.01  0.05  0.09  6078    1
sigma_l          4.24    0.01 0.78  3.02  5.99  3940    1
sigma_s          0.02    0.00 0.00  0.01  0.03  3862    1
rt               0.66    0.00 0.02  0.61  0.71  5112    1
rt_subjects[1]   0.53    0.00 0.01  0.51  0.54  4261    1
rt_subjects[2]   0.45    0.00 0.01  0.44  0.47  5654    1
...

R> print(rt_test_fit)

The output above is truncated and shows values for only 2 of the 24 subjects on the subject level of the hierarchical model. The output provides further MCMC diagnostics, which again give us no cause for concern. The convergence diagnostic Rhat is practically 1 for all parameters and there is little auto-correlation (possibly even some negative auto-correlation) – effective sample sizes (n_eff) are of the order of the number of samples taken and Monte Carlo standard errors (se_mean) are relatively small.

What a good-enough effective sample size is depends on our goal. If we are interested in posterior quantities such as the more extreme percentiles, the effective sample size should be 10,000 or higher, if possible. If we are only interested in estimating the mean, 100 effective samples are in most cases enough for a practically negligible Monte Carlo error. We can increase the effective sample size by increasing the number of MCMC iterations with the iter parameter. In our case we can achieve an effective sample size of 10,000 by setting iter to 4000.
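The relationship between se_mean and n_eff in the print output can be checked directly: the Monte Carlo standard error of a posterior mean is approximately the posterior standard deviation divided by the square root of the effective sample size. Using the mu_l row from the output above:

```r
# Monte Carlo standard error from the print() output above:
# se_mean is approximately sd / sqrt(n_eff).
posterior_sd <- 0.91   # sd of mu_l in the output above
n_eff        <- 5299   # effective sample size of mu_l
se_mean <- posterior_sd / sqrt(n_eff)
round(se_mean, 2)      # matches the reported se_mean of 0.01
```

This is also why quadrupling the precision of an estimate requires roughly sixteen times as many effective samples.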
Because the MCMC diagnostics give us no reason for concern, we can leave the warmup parameter at its default value of 1000.

R> rt_control_fit <- b_reaction_time(t=control_rt$rt, s=control_rt$subject, iter=4000)
R> rt_test_fit <- b_reaction_time(t=test_rt$rt, s=test_rt$subject, iter=4000)

Because we did not explicitly define any priors, flat (improper) priors were put on all of the model's parameters. In some cases, flat priors are a statement that we have no prior knowledge about the experiment results (in some sense). In general, even flat priors can express a preference for a certain region of parameter space. In practice, we will almost always have some prior information and we should incorporate it into the modelling process.

Next, we check whether the model fits the data well by using the plot_fit function. If we set the subjects parameter to FALSE, we get a less detailed group-level fit.

R> plot_fit(rt_control_fit)
R> plot_fit(rt_test_fit)

Figure 7: The fit plot for rt_control_fit, with one panel per subject. The data are visualized as a blue region while the fit is visualized with a black line. In this case the model fits the underlying data well; similar conclusions can be reached for the test group (rt_test_fit).

We now use the compare_means function to compare reaction times between the healthy (control) and unhealthy (test) subjects. In the example below we use a rope (region of practical equivalence) interval of 0.01 s, meaning that differences smaller than 1/100 of a second are deemed equal. The compare_means function provides a user-friendly output of the comparison and also returns the results in the form of a data.frame.
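Conceptually, the probabilities that compare_means reports are shares of posterior draws falling below, above, or inside the rope. The hedged base-R sketch below illustrates that logic with made-up posterior draws; it is not the package's actual implementation.

```r
# Hedged sketch of the rope logic (not the package's code): given
# posterior draws of two group means, count the shares of draws where
# the difference is below, above, or inside the region of practical
# equivalence (here +/- 0.01 s). The draws below are made up.
set.seed(1)
g1 <- rnorm(4000, 0.53, 0.03)   # hypothetical posterior draws, group 1
g2 <- rnorm(4000, 0.62, 0.03)   # hypothetical posterior draws, group 2
d  <- g1 - g2
rope <- 0.01
c(smaller = mean(d < -rope),
  greater = mean(d >  rope),
  equal   = mean(abs(d) <= rope))
```

The three shares always sum to 1, which is what makes them directly interpretable as probabilities of the three possible conclusions.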
R> rt_control_test <- compare_means(rt_control_fit,
     fit2=rt_test_fit, rope=0.01)

---------- Group 1 vs Group 2 ----------
Probabilities:
  - Group 1 < Group 2: 0.98 +/- 0.00409
  - Group 1 > Group 2: 0.01 +/- 0.00304
  - Equal: 0.01 +/- 0.00239
95% HDI:
  - Group 1 - Group 2: [-0.17, -0.01]

The compare_means output contains the probabilities that one group has shorter reaction times than the other, the probability that both groups are equal (if a rope interval is provided) and the 95% HDI (highest density interval, Kruschke (2014)) for the difference between the groups. Based on the output, we are quite certain (98% +/- 0.5%) that the healthy group's (rt_control_fit) expected reaction times are lower than the unhealthy group's (rt_test_fit). We can also visualize this difference by using the plot_means_difference function. The plot_means function is an alternative for comparing rt_control_fit and rt_test_fit; it visualizes the parameters that determine the means of each model.

R> plot_means_difference(rt_control_fit, fit2=rt_test_fit, rope=0.01)

Figure 8: A visualization of the difference between rt_control_fit and rt_test_fit. The histogram visualizes the distribution of the difference, the vertical blue line denotes the mean, the black band at the bottom marks the 95% HDI interval and the gray band marks the rope interval. Since the whole 95% HDI of the difference is negative and lies outside of the rope interval, we can conclude that the statement that healthy subjects are faster than unhealthy ones is most likely correct.

R> plot_means(rt_control_fit, fit2=rt_test_fit)

Figure 9: A visualization of the means for rt_control_fit and rt_test_fit. Group 1 visualizes the means for the healthy subjects and group 2 for the unhealthy subjects.
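Conceptually, the three probabilities reported by compare_means above can be read as the proportions of posterior draws of the difference that fall below, above, or inside the rope interval. The following is a minimal illustrative sketch of that idea (not the package's internal code, and using simulated stand-ins for the actual posterior draws):

```r
# Illustrative sketch: rope-based comparison probabilities from draws.
set.seed(1)
mu1 <- rnorm(4000, 0.57, 0.04)     # simulated stand-ins for posterior
mu2 <- rnorm(4000, 0.66, 0.04)     # draws of the two group means
d <- mu1 - mu2
rope <- 0.01
p_smaller <- mean(d < -rope)       # P(Group 1 < Group 2)
p_larger  <- mean(d >  rope)       # P(Group 1 > Group 2)
p_equal   <- mean(abs(d) <= rope)  # P(groups practically equal)
```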
We can claim with high probability (98% +/- 0.5%) that healthy subjects have faster reaction times when solving the flanker task than unhealthy subjects. Next, we analyze whether the same applies to success rates. The information about subjects' success is stored as "correct"/"incorrect". However, the Bayesian success rate model requires binary inputs (0/1), so we have to transform the data. Also, as in the reaction time example, we have to correct the indexes of the control group subjects.

R> data$result_numeric <- 0
R> data[data$result == "correct", ]$result_numeric <- 1
R> control_sr <- data %>% filter(group == "control")
R> test_sr <- data %>% filter(group == "test")
R> control_sr$subject <- control_sr$subject - 21

Since the only prior information about the success rate of participants was the fact that a success rate lies between 0 and 1, we used a beta distribution to put a uniform prior on the [0, 1] interval (we put a Beta(1, 1) prior on the p parameter). We execute the model fitting by running the b_success_rate function with the appropriate input data.

R> p_prior <- b_prior(family="beta", pars=c(1, 1))
R> priors <- list(c("p", p_prior))
R> sr_control_fit <- b_success_rate(r=control_sr$result_numeric,
     s=control_sr$subject, priors=priors, iter=4000)
R> sr_test_fit <- b_success_rate(r=test_sr$result_numeric,
     s=test_sr$subject, priors=priors, iter=4000)

The process for inspecting Bayesian fits is the same and the results are similar to those above, so we omit them. When visually inspecting the quality of the fit (the plot_fit function), we can set the subjects parameter to FALSE, which visualizes the fit on the group level. This offers a quicker, but less detailed, method of inspection.
R> plot_trace(sr_control_fit)
R> plot_trace(sr_test_fit)
R> print(sr_control_fit)
R> print(sr_test_fit)
R> plot_fit(sr_control_fit, subjects=FALSE)
R> plot_fit(sr_test_fit, subjects=FALSE)

Since the diagnostic functions show no pressing issues and the fits look good, we can proceed with the actual comparison between the two fitted models. We again estimate the difference between the two groups with compare_means.

R> sr_control_test <- compare_means(sr_control_fit, fit2=sr_test_fit)

---------- Group 1 vs Group 2 ----------
Probabilities:
  - Group 1 < Group 2: 0.53 +/- 0.01052
  - Group 1 > Group 2: 0.47 +/- 0.01052
95% HDI:
  - Group 1 - Group 2: [-0.02, 0.02]

As we can see, the success rates of the two groups are not that different. Since the probability that the healthy group is more successful is only 53% (+/- 1%) and the 95% HDI of the difference ([-0.02, 0.02]) includes 0, we cannot claim inequality (Kruschke 2014). We can visualize this result by using the plot_means_difference function.

R> plot_means_difference(sr_control_fit, fit2=sr_test_fit)

Figure 10: A visualization of the difference between sr_control_fit and sr_test_fit. The histogram visualizes the distribution of the difference, the vertical blue line denotes the mean difference and the black band at the bottom marks the 95% HDI interval. Since the 95% HDI of the difference includes the value of 0, we cannot claim inequality. If we had used a rope interval and the whole 95% HDI lay within the rope interval, we could claim equality.

3.2. Adaptation level

In the adaptation level experiment, participants had to assess the weights of objects placed in their hands by using a verbal scale: very very light, very light, light, medium light, medium, medium heavy, heavy, very heavy and very very heavy.
The task was to assess the weight of an object that was placed on the palm of their hand. To standardize the procedure, the participants had to place their elbow on the desk, extend the palm and assess the weight of the object after it was placed on their palm by slight up and down movements of their arm. During the experiment, participants were blindfolded with a non-transparent fabric. In total there were 15 objects of the same shape and size but different mass (photo film canisters filled with metallic balls). The objects were grouped into three sets:

• light set: 45 g, 55 g, 65 g, 75 g, 85 g (weights 1 to 5),

• medium set: 95 g, 105 g, 115 g, 125 g, 135 g (weights 6 to 10),

• heavy set: 145 g, 155 g, 165 g, 175 g, 185 g (weights 11 to 15).

The experimenter sequentially placed weights in the palm of the participant and recorded the trial index, the weight of the object and the participant's response. The participants were divided into two groups. In group 1, the participants first assessed the weights of the light set in ten rounds, within which all five weights were weighted in a random order. After completing the 10 rounds with the light set, the experimenter switched to the medium set, without any announcement or break. The participant then weighted the medium set across another 10 rounds of weighting the five weights in a random order within each round. In group 2, the overall procedure was the same, the only difference being that they started with 10 rounds of the heavy set and then performed another 10 rounds of weighting the medium set. Importantly, the weights within each set were given in random order and the experimenter switched between sets seamlessly, without any break or other indication to the participant.
We will use the bayes4psy package to show that the two groups provide different assessments of the weights in the second part of the experiment, even though both groups are responding to weights from the same (medium) set. The difference is very pronounced at first but then fades away with subsequent assessments of the medium weights. This is congruent with the hypothesis that each group formed a different adaptation level during the initial phase of the task; the formed adaptation level then determined the perceptual experience of the same set of weights at the beginning of the second part of the task.

We will conduct the analysis by using the hierarchical linear model and the Bayesian t-test. First we have to construct fits for the second part of the experiment for each group independently. The code below loads and prepares the data; just like in the previous example, subject indexes have to be mapped to a [1, n] interval. We will use the ggplot2 package to fine-tune graph axes and properly annotate the graphs returned by the bayes4psy package.

R> library(bayes4psy)
R> library(dplyr)
R> library(ggplot2)
R> data <- read.table("./data/adaptation_level.csv", sep="\t", header=TRUE)
R> group1 <- data %>% filter(group == 1)
R> group2 <- data %>% filter(group == 2)
R> n1 <- length(unique(group1$subject))
R> n2 <- length(unique(group2$subject))
R> group1$subject <- plyr::mapvalues(group1$subject,
     from=unique(group1$subject), to=1:n1)
R> group2$subject <- plyr::mapvalues(group2$subject,
     from=unique(group2$subject), to=1:n2)
R> group1_part2 <- group1 %>% filter(part == 2)
R> group2_part2 <- group2 %>% filter(part == 2)

Once the data are prepared, we can fit the Bayesian models. The input data come in the form of three vectors: x stores the indexes of the measurements, y the subjects' responses and s the indexes of subjects. The warmup and iter parameters are set in order to achieve an effective sample size of 10,000.
R> fit1 <- b_linear(x=group1_part2$sequence,
     y=group1_part2$response, s=group1_part2$subject,
     iter=10000, warmup=500)
R> fit2 <- b_linear(x=group2_part2$sequence,
     y=group2_part2$response, s=group2_part2$subject,
     iter=10000, warmup=500)

The fitting process is always followed by the quality analysis.

R> plot_trace(fit1)
R> plot_trace(fit2)
R> print(fit1)

Inference for Stan model: linear.
4 chains, each with iter=10000; warmup=500; thin=1;
post-warmup draws per chain=9500, total post-warmup draws=38000.

            mean se_mean   sd    2.5%   97.5% n_eff Rhat
alpha[1]    7.66    0.00 0.31    7.07    8.28 25452    1
alpha[2]    8.63    0.00 0.23    8.19    9.08 23074    1
...
beta[1]    -0.14    0.00 0.04   -0.24   -0.06 20097    1
beta[2]    -0.12    0.00 0.03   -0.19   -0.05 30442    1
...
sigma[1]    1.67    0.00 0.15    1.41    2.00 45998    1
sigma[2]    0.99    0.00 0.10    0.82    1.21 44379    1
...
mu_a        8.05    0.00 0.18    7.68    8.41 25983    1
mu_b       -0.11    0.00 0.02   -0.15   -0.07 20126    1
mu_s        1.10    0.00 0.09    0.92    1.29 33871    1
sigma_a     0.61    0.00 0.16    0.38    0.98 24984    1
sigma_b     0.05    0.00 0.02    0.01    0.09  6726    1
sigma_s     0.34    0.00 0.08    0.21    0.54 30901    1
lp__     -374.28    0.09 6.47 -387.21 -361.12  5372    1

R> print(fit2)
R> plot_fit(fit1)
R> plot_fit(fit2)

The trace plots showed no MCMC-related issues (for an example of a trace plot see Figure 6) and the effective sample sizes of the parameters relevant for our analysis (mu_a, mu_b and mu_s) are large enough. Since the visual inspection of the fit also looks good, we can continue with our analysis. To get a quick description of the fits, we can take a look at the summary statistics of the models' parameters.
R> summary(fit1)

intercept (alpha): 8.05 +/- 0.00266, 95% HDI: [7.69, 8.39]
slope (beta): -0.11 +/- 0.00033, 95% HDI: [-0.15, -0.07]
sigma: 1.10 +/- 0.00094, 95% HDI: [0.91, 1.28]

R> summary(fit2)

intercept (alpha): 5.81 +/- 0.00461, 95% HDI: [5.20, 6.43]
slope (beta): 0.12 +/- 0.00036, 95% HDI: [0.08, 0.16]
sigma: 1.40 +/- 0.00165, 95% HDI: [1.13, 1.66]

The values of the intercepts suggest that our initial hypothesis about the adaptation level is true. Subjects that weighted the lighter objects in the first part of the experiment (fit1) find the medium objects at the beginning of the experiment's second part heavier than subjects that weighted the heavier objects in the first part (fit2). We can confirm this assumption by using functions that perform a more detailed analysis (e.g. compare_means and plot_means_difference).

R> comparison_results <- compare_means(fit1, fit2=fit2)

---------- Intercept ----------
Probabilities:
  - Group 1 < Group 2: 0.00 +/- 0.00000
  - Group 1 > Group 2: 1.00 +/- 0.00000
95% HDI:
  - Group 1 - Group 2: [1.54, 2.91]

---------- Slope ----------
Probabilities:
  - Group 1 < Group 2: 1.00 +/- 0.00000
  - Group 1 > Group 2: 0.00 +/- 0.00000
95% HDI:
  - Group 1 - Group 2: [-0.29, -0.18]

R> plot_means_difference(fit1, fit2=fit2, par="intercept")

Figure 11: The difference of the intercept between the two fits. Since the whole 95% HDI is positive, we are quite confident that subjects that weighted the lighter objects in the first part of the experiment (fit1) find the medium objects heavier than subjects that initially weighted the heavier objects (fit2). The fact that the slope is very likely negative for the first group (the whole 95% HDI lies below 0) and positive for the second group (the whole 95% HDI lies above 0) suggests that the adaptation level phenomenon fades away with time.
We can visualize this by plotting the means and the distributions underlying both fits. The plotting functions in the bayes4psy package return regular ggplot2 plot objects, so we can use the same techniques to annotate or change the look and feel of the graphs as we would with the usual ggplot2 visualizations.

R> plot_distributions(fit1, fit2) +
     labs(title="Part II", x="measurement number", y="") +
     theme(legend.position="none") +
     scale_x_continuous(limits=c(1, 10), breaks=seq(1, 10)) +
     ylim(0, 10)

Figure 12: A comparison of the distributions underlying fit1 and fit2. The hypothesis that each group formed a different adaptation level during the initial phase of the task seems to be true. The group that switches from heavy to medium weights assesses the weights as lighter than they really are, while for the group that switches from light to medium the weights appear heavier. With time these adaptation levels fade away and the assessments converge to the same estimates of weight.

3.3. The Stroop color-word test

The Stroop test (Stroop 1935) showed that when the stimulus is incongruent – the name of a color is printed in a different ink than the one denoted by its name (for example, red) – naming the color takes longer and is more error-prone than naming the color of a rectangle or a set of characters that does not form a word (for example, XXXXX). In our version of the Stroop test, participants were faced with four types of conditions:

• Reading neutral – the name of the color was printed in black ink, the participant had to read the color's name.

• Naming neutral – the string XXXXX was written in colored ink (red, green or blue), the participant had to name the ink color.

• Reading incongruent – the name of the color was printed in incongruent ink, the participant had to read the written name of the color.
• Naming incongruent – the name of the color was printed in incongruent ink, the participant had to name the ink color.

We are primarily interested in the expected task completion times. Every participant had the same number of stimuli in every condition, so we opt for a Bayesian t-test. The data are already split into the four conditions described above, so we only need to specify the priors. We based them on our previous experience with similar tasks – participants finish the task in approximately 1 minute and the typical standard deviation for a participant is less than 2 minutes.

R> library(bayes4psy)
R> library(dplyr)
R> library(ggplot2)
R> data <- read.table("./data/stroop_simple.csv", sep="\t", header=TRUE)
R> mu_prior <- b_prior(family="normal", pars=c(60, 30))
R> sigma_prior <- b_prior(family="uniform", pars=c(0, 120))
R> priors <- list(c("mu", mu_prior), c("sigma", sigma_prior))
R> fit_reading_neutral <- b_ttest(data$reading_neutral,
     priors=priors, iter=4000, warmup=500)
R> fit_reading_incongruent <- b_ttest(data$reading_incongruent,
     priors=priors, iter=4000, warmup=500)
R> fit_naming_neutral <- b_ttest(data$naming_neutral,
     priors=priors, iter=4000, warmup=500)
R> fit_naming_incongruent <- b_ttest(data$naming_incongruent,
     priors=priors, iter=4000, warmup=500)

There were no reasons for concern in the MCMC diagnostics and model fits, so we omit them for brevity. In practice, we should of course always perform these steps. We proceed by cross-comparing several fits with a single line of code.
R> fit_list <- c(fit_reading_incongruent, fit_naming_neutral,
     fit_naming_incongruent)
R> multiple_comparison <- compare_means(fit_reading_neutral, fits=fit_list)

---------- Group 1 vs Group 2 ----------
Probabilities:
  - Group 1 < Group 2: 1.00 +/- 0.00054
  - Group 1 > Group 2: 0.00 +/- 0.00054
95% HDI:
  - Group 1 - Group 2: [-4.66, -0.96]

---------- Group 1 vs Group 3 ----------
Probabilities:
  - Group 1 < Group 3: 1.00 +/- 0.00000
  - Group 1 > Group 3: 0.00 +/- 0.00000
95% HDI:
  - Group 1 - Group 3: [-15.34, -10.19]

---------- Group 1 vs Group 4 ----------
Probabilities:
  - Group 1 < Group 4: 1.00 +/- 0.00000
  - Group 1 > Group 4: 0.00 +/- 0.00000
95% HDI:
  - Group 1 - Group 4: [-36.72, -28.44]

---------- Group 2 vs Group 3 ----------
Probabilities:
  - Group 2 < Group 3: 1.00 +/- 0.00000
  - Group 2 > Group 3: 0.00 +/- 0.00000
95% HDI:
  - Group 2 - Group 3: [-12.63, -7.09]

---------- Group 2 vs Group 4 ----------
Probabilities:
  - Group 2 < Group 4: 1.00 +/- 0.00000
  - Group 2 > Group 4: 0.00 +/- 0.00000
95% HDI:
  - Group 2 - Group 4: [-34.12, -25.48]

---------- Group 3 vs Group 4 ----------
Probabilities:
  - Group 3 < Group 4: 1.00 +/- 0.00000
  - Group 3 > Group 4: 0.00 +/- 0.00000
95% HDI:
  - Group 3 - Group 4: [-24.21, -14.88]

----------------------------------------
Probabilities that a certain group is smallest/largest or equal to all others:
  largest     smallest equal
1       0 0.9991111111     0
2       0 0.0008888889     0
3       0 0.0000000000     0
4       1 0.0000000000     0

When we compare more than 2 fits, we also get an estimate of the probability that each group has the largest or the smallest expected value. Based on the above output, the participants are best at the reading neutral task (Group 1), followed by the reading incongruent task (Group 2) and the naming neutral task (Group 3). They are worst at the naming incongruent task (Group 4).
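The smallest/largest probabilities at the end of the output can be understood as the share of joint posterior draws in which a given group's mean is the extreme one. A minimal illustrative sketch of that idea (not the package's internal code, and using simulated stand-ins for the actual posterior draws):

```r
# Illustrative sketch: probability that each group's mean is the smallest,
# estimated from a matrix of (here simulated) posterior draws, one column
# per group.
set.seed(1)
draws <- cbind(rnorm(4000, 45, 1),  # stand-in draws for group 1
               rnorm(4000, 48, 1),  # group 2
               rnorm(4000, 58, 1),  # group 3
               rnorm(4000, 78, 2))  # group 4
smallest <- apply(draws, 1, which.min)
table(factor(smallest, levels=1:4)) / nrow(draws)
```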
This ordering holds with very high probability, so we can conclude that both naming and incongruency of the stimuli increase subjects' response times, with naming having the bigger effect. We can also visualize this in various ways, either as distributions of the mean times needed to solve the given tasks (Figure 13) or as differences between these means (Figure 14).

R> plot_means(fit_reading_neutral, fits=fit_list) +
     scale_fill_hue(labels=c("Reading neutral", "Reading incongruent",
                             "Naming neutral", "Naming incongruent")) +
     theme(legend.title=element_blank())

Figure 13: A visualization of the means for all four types of Stroop tasks. The x-axis (value) denotes task completion time. The naming and incongruency conditions make the task more difficult, with naming having the bigger effect.

R> plot_means_difference(fit_reading_neutral, fits=fit_list)

Figure 14: Differences in the mean
task completion times for the four conditions. Row and column 1 represent the reading neutral task, row and column 2 the reading incongruent task, row and column 3 the naming neutral task and row and column 4 the naming incongruent task. Since the 95% HDI intervals in all cases exclude 0, we are confident that the task completion times are different.

3.4. Afterimages

In the afterimages task, participants were asked to fix their gaze on a fixation point in the middle of the computer screen. A stimulus, a colored rectangle, was then shown above the fixation point. After 20 seconds the rectangle disappeared and a color palette was shown on the right-hand side of the screen. Participants were asked to keep their gaze on the fixation point while using the mouse to select the color that best matches the color of the afterimage that appeared above the fixation point. Then a colored rectangle of the selected color and of the same size as before was shown below the fixation point. Finally, participants confirmed their selection. For each trial, the color of the stimulus rectangle, the response in RGB and the response time were recorded. The goal of this study was to determine which of the two color coding mechanisms (trichromatic or opponent-process) better explains the colors of the afterimages. We used six differently colored rectangles: red, green, blue, cyan, magenta and yellow.

We start our analysis by loading the experiment and stimuli data. The experiment data include the subject index, reaction time, response in RGB format, stimulus name (e.g. blue) and stimulus values in RGB and HSV. The stimuli data set includes only the information about the stimuli (names, RGB and HSV values).
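Since the color data are recorded in RGB but the color model also works with HSV representations, it can be useful to know that base R converts between the two with grDevices::rgb2hsv. A small sketch with an illustrative (made-up) response:

```r
# Convert an RGB response (0-255 per channel) to HSV. rgb2hsv returns a
# matrix with rows h, s and v, each scaled to [0, 1].
rgb_response <- c(237, 28, 36)  # an illustrative red-ish response
grDevices::rgb2hsv(rgb_response[1], rgb_response[2], rgb_response[3],
                   maxColorValue = 255)
```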
R> library(bayes4psy)
R> library(dplyr)
R> library(ggplot2)
R> data_all <- read.table("./data/after_images.csv", sep="\t", header=TRUE)
R> stimuli <- read.table("./data/after_images_stimuli.csv",
     sep="\t", header=TRUE)

We can then fit the Bayesian color model for the red color stimuli and inspect the fit.

R> data_red <- data_all %>% filter(stimuli == "red")
R> data_red <- data.frame(r=data_red$r, g=data_red$g, b=data_red$b)
R> fit_red <- b_color(colors=data_red)
R> plot_trace(fit_red)
R> print(fit_red)
R> plot_fit_hsv(fit_red)

Figure 15: The special plot_fit_hsv function developed for the color model. Input data points are visualized with circles, the mean of the fit is visualized with a solid line and the 95% HDI of the underlying distribution is visualized as a colored band.

We repeat the same process five more times for the remaining five colors of stimuli. We continue the analysis by loading the data about the colors predicted by the trichromatic and the opponent-process theories.

R> trichromatic <- read.table("./data/after_images_trichromatic.csv",
     sep="\t", header=TRUE)
R> opponent_process <- read.table("./data/after_images_opponent_process.csv",
     sep="\t", header=TRUE)

We can then use the plot_distributions_hsv function of the Bayesian color model to produce, for each stimulus, a visualization of the accuracy of both color coding mechanisms' predictions. Each graph visualizes the fitted distribution, the displayed stimulus and the responses predicted by the trichromatic and opponent-process codings. This additional information can be added to the visualization via annotation points and lines. Below is an example for the red stimulus; the visualizations for the other five stimuli are practically the same.
R> stimulus <- "red"
R> lines <- list()
R> lines[[1]] <- c(trichromatic[trichromatic$stimuli == stimulus, ]$h,
     trichromatic[trichromatic$stimuli == stimulus, ]$s,
     trichromatic[trichromatic$stimuli == stimulus, ]$v)
R> lines[[2]] <- c(opponent_process[opponent_process$stimuli == stimulus, ]$h,
     opponent_process[opponent_process$stimuli == stimulus, ]$s,
     opponent_process[opponent_process$stimuli == stimulus, ]$v)
R> points <- list()
R> points[[1]] <- c(stimuli[stimuli$stimuli == stimulus, ]$h_s,
     stimuli[stimuli$stimuli == stimulus, ]$s_s,
     stimuli[stimuli$stimuli == stimulus, ]$v_s)
R> plot_red <- plot_distributions_hsv(fit_red, points=points,
     lines=lines, hsv=TRUE)
R> plot_red <- plot_red + ggtitle("Red") +
     theme(plot.title = element_text(hjust = 0.5))

We use the cowplot library to combine the plots.

R> cowplot::plot_grid(plot_red, plot_green, plot_blue,
     plot_yellow, plot_cyan, plot_magenta,
     ncol=3, nrow=2, scale=0.9)

Figure 16: A comparison of trichromatic and opponent-process color coding predictions, with one panel per stimulus color (red, green, blue, yellow, cyan, magenta). The long solid line visualizes the trichromatic color coding prediction while the dashed line visualizes the opponent-process color coding prediction. The short solid line represents the mean hue of the fit and the colored band the 95% HDI of the distribution underlying the fit. The small colored circle visualizes the color of the presented stimulus. In the case of the blue and yellow stimuli, the dashed line is not visible because both color codings predict the same outcome.

The prediction based on the trichromatic color coding seems more accurate, as it is always inside the 95% of the most probable subject responses and is always closer to the mean predicted hue than the opponent-process prediction. The opponent-process prediction lies outside of the 95% of the most probable subject responses in the cases of the red and green stimuli.

4.
Conclusion

The bayes4psy package helps psychology students and researchers with little or no experience in Bayesian statistics and probabilistic programming to do modern Bayesian analysis in R. The package includes several Bayesian models that cover a wide range of tasks that arise in psychological experiments. Users can perform a Bayesian t-test or Bayesian bootstrap and can analyze reaction times, success rates, colors or sequential tasks. The package covers all parts of Bayesian data analysis, from fitting and diagnosing fitted models to model visualization and comparison.

We plan to upgrade the package with additional tools that will bring Bayesian statistics even closer to non-technical researchers. For example, we will implement probability distribution elicitation tools, which will ease the extraction of prior knowledge from domain experts and the prior construction process (Morris, Oakley, and Crowe 2014). Over the last couple of years, neuroimaging techniques (e.g. fMRI and EEG) have become very popular for tracking brain activity during psychological experiments. The implementation of Bayesian models for analyzing such data is also one of our future goals.

Computational details

The results in this paper were obtained using R 3.5.3. R itself and all packages used are available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/. The source code of the bayes4psy package can be found at https://github.com/bstatcomp/bayes4psy and the illustrative examples from Section 3 can be found at https://github.com/bstatcomp/bayes4psy_tools. The bayes4psy package is currently in the final stages of CRAN publication.
Acknowledgments

The research behind this manuscript was partially funded by the Slovenian Research Agency (ARRS) through grants L1-7542 (Advancement of computationally intensive methods for efficient modern general-purpose statistical analysis and inference), P3-0338 (Physiological mechanisms of neurological disorders and diseases), J3-9264 (Decomposing cognition: Working memory mechanism and representations), P5-0410 (Digitalization as driving force for sustainability of individuals, organizations, and society) and P5-0110 (Psychological and neuroscientific aspects of cognition).

References

Amrhein V, Greenland S, McShane B (2019). "Scientists Rise Up Against Statistical Significance." Nature, 567, 305-307. doi:10.1038/d41586-019-00857-9.

Bååth R (2015). "bayesboot: an Implementation of Rubin's (1981) Bayesian Bootstrap." URLs http://www.sumsar.net/blog/2015/04/the-non-parametric-bootstrap-as-a-bayesian-model/, https://cran.r-project.org/web/packages/bayesboot/index.html, http://www.sumsar.net/blog/2015/07/easy-bayesian-bootstrap-in-r/.

Baker M, Penny D (2016). "Is There a Reproducibility Crisis?" Nature, 533(7604), 452-454. doi:10.1038/533452A.

Brooks SP, Gelman A (1998). "General Methods for Monitoring Convergence of Iterative Simulations." Journal of Computational and Graphical Statistics, 7(4), 434-455. doi:10.1080/10618600.1998.10474787.

Camerer CF, Paulson JA, Dreber A, Holzmeister F, Ho TH, Huber J, Johannesson M, Kirchler M, Nave G, Nosek BA, Pfeiffer T, Altmejd A, Buttrick N, Chan T, Chen Y, Forsell E, Gampa A, Heikensten E, Hummer L, Imai T, Isaksson S, Manfredi D, Rose J, Wagenmakers EJ, Wu H (2018). "Evaluating the Replicability of Social Science Experiments in Nature and Science Between 2010 and 2015." Nature Human Behaviour, 2(9), 637-644. doi:10.1038/s41562-018-0399-z.
Carpenter B, Lee D, Brubaker MA, Riddell A, Gelman A, Goodrich B, Guo J, Hoffman M, Betancourt M, Li P (2017). "Stan: A Probabilistic Programming Language." Journal of Statistical Software, 76(1). doi:10.18637/jss.v076.i01.

Dunson DB (2001). "Commentary: Practical Advantages of Bayesian Analysis of Epidemiologic Data." American Journal of Epidemiology, 153(12), 1222-1226. doi:10.1093/aje/153.12.1222.

Efron B (1979). "Bootstrap Methods: Another Look at the Jackknife." The Annals of Statistics, 7(1), 1-26.

Eriksen BA, Eriksen CW (1974). "Effects of Noise Letters Upon Identification of a Target Letter in a Nonsearch Task." Perception & Psychophysics, 16(1), 143-149. doi:10.3758/BF03203267.

Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2014). Bayesian Data Analysis, Third Edition. ISBN 9781439840955.

Gelman A, Rubin DB (1992). "Inference From Iterative Simulation Using Multiple Sequences." Statistical Science, 7(4), 457-511.

Hubbard R (2015). Corrupt Research: The Case for Reconceptualizing Empirical Management and Social Science. Sage Publications.

Hurlbert SH, Levine RA, Utts J (2019). "Coup de Grâce for a Tough Old Bull: "Statistically Significant" Expires." American Statistician, 73(sup1), 352-357. doi:10.1080/00031305.2018.1543616.

Kruschke JK (2013). "Bayesian Estimation Supersedes the t Test." Journal of Experimental Psychology: General, 142(2), 573-603. doi:10.1037/a0029146.

Kruschke JK (2014). Doing Bayesian Data Analysis: A Tutorial With R, JAGS, and Stan. 2nd edition. Academic Press. ISBN 978-0124058880.

McElreath R (2018). Statistical Rethinking: A Bayesian Course With Examples in R and Stan. CRC Press. ISBN 9781315362618.
doi:10.1201/9781315372495.

McNutt M (2014). "Reproducibility." Science, 343(6168), 229. doi:10.1126/science.1250475.

Morris DE, Oakley JE, Crowe JA (2014). "A Web-Based Tool for Eliciting Probability Distributions from Experts." Environmental Modelling and Software, 52, 1-4. doi:10.1016/j.envsoft.2013.10.010.

Munafò MR, Nosek BA, Bishop DV, Button KS, Chambers CD, Percie Du Sert N, Simonsohn U, Wagenmakers EJ, Ware JJ, Ioannidis JP (2017). "A Manifesto for Reproducible Science." Nature Human Behaviour, 1(1), 1-9. doi:10.1038/s41562-016-0021.

Open Science Collaboration (2015). "Estimating the Reproducibility of Psychological Science." Science, 349(6251). doi:10.1126/science.aac4716.

Plummer M (2003). "JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling." In Proceedings of the 3rd International Workshop on Distributed Statistical Computing. Vienna. URL http://www.ci.tuwien.ac.at/Conferences/DSC-2003/.

R Core Team (2017). "R: A Language and Environment for Statistical Computing." URL https://www.r-project.org/.

Rubin DB (1981). "The Bayesian Bootstrap." The Annals of Statistics, 9(1), 130-134.

Schooler JW (2014). "Metascience Could Rescue the 'Replication Crisis'." Nature, 515(7525), 9. doi:10.1038/515009a.

Stanley TD, Carter EC, Doucouliagos H (2018). "What Meta-Analyses Reveal About the Replicability of Psychological Research." Psychological Bulletin, 144(12), 1325-1346. doi:10.1037/bul0000169.

Stroop JR (1935). "Studies of Interference in Serial Verbal Reactions." Journal of Experimental Psychology, 18(6), 643-661.

Wasserstein RL, Lazar NA (2016). "The ASA's Statement on p-Values: Context, Process, and Purpose."
The American Statistician, 70(2), 129-133. doi:10.1080/00031305.2016.1154108.

Wasserstein RL, Schirm AL, Lazar NA (2019). "Moving to a World Beyond "p < 0.05"." American Statistician, 73(sup1), 1-19. doi:10.1080/00031305.2019.1583913.

Wickham H (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. URL http://ggplot2.org.

Ziliak ST (2019). "How Large Are Your G-Values? Try Gosset's Guinnessometrics When a Little "p" Is Not Enough." American Statistician, 73(sup1), 281-290. doi:10.1080/00031305.2018.1514325.

Affiliation:
Jure Demšar
Faculty of Computer and Information Science
University of Ljubljana
Večna pot 113
1000 Ljubljana, Slovenia
E-mail: jure.demsar@fri.uni-lj.si
