INTEREST: INteractive Tool for Exploring REsults from Simulation sTudies

Simulation studies allow us to explore the properties of statistical methods. They provide a powerful tool with a multiplicity of aims; among others: evaluating and comparing new or existing statistical methods, assessing violations of modelling assu…

Authors: Alessandro Gasparini, Tim P. Morris

MMMMMM YYYY, Volume VV, Issue II. doi: XX.XXXXX/jdssv.v000.i00

Alessandro Gasparini (University of Leicester), Tim P. Morris (MRC Clinical Trials Unit at UCL), Michael J. Crowther (University of Leicester)

Abstract

Simulation studies allow us to explore the properties of statistical methods. They provide a powerful tool with a multiplicity of aims; among others: evaluating and comparing new or existing statistical methods, assessing violations of modelling assumptions, helping with the understanding of statistical concepts, and supporting the design of clinical trials. The increased availability of powerful computational tools and usable software has contributed to the rise of simulation studies in the current literature. However, simulation studies involve increasingly complex designs, making it difficult to provide all relevant results clearly. Dissemination of results plays a focal role in simulation studies: it can drive applied analysts to use methods that have been shown to perform well in their settings, guide researchers to develop new methods in a promising direction, and provide insights into less established methods. It is crucial that we can digest relevant results of simulation studies. Therefore, we developed INTEREST: an INteractive Tool for Exploring REsults from Simulation sTudies. The tool has been developed using the Shiny framework in R and is available as a web app or as a standalone package. It requires uploading a tidy format dataset with the results of a simulation study in R, Stata, SAS, SPSS, or comma-separated format. A variety of performance measures are estimated automatically along with Monte Carlo standard errors; results and performance summaries are displayed both in tabular and graphical fashion, with a wide variety of available plots.
Consequently, the reader can focus on simulation parameters and estimands of most interest. In conclusion, INTEREST can facilitate the investigation of results from simulation studies and supplement the reporting of results, allowing researchers to share detailed results from their simulations and readers to explore them freely.

Keywords: Simulation study, Monte Carlo, Visualisation, Reporting, R, Shiny, Replicability.

Journal of Data Science, Statistics, and Visualisation

1. Background

Monte Carlo simulation studies are computer experiments based on generating pseudo-random observations from a known truth. Statisticians usually mean Monte Carlo simulation study when they say simulation study; throughout this article, we will just use simulation study, but this encapsulates Monte Carlo simulation studies. Simulation studies have several applications and represent an invaluable tool for statistical research nowadays: in statistics, establishing properties of current methods is key to allow them to be used – or not – with confidence. Sometimes it is not possible to derive exact analytical properties; for example, a large sample approximation may be possible, but evaluating the approximation in finite samples is required. Approximations often require assumptions as well: what are the consequences of violating such assumptions? Monte Carlo simulation studies come to the rescue and can help to answer these questions. They can also help answer questions such as: is an estimator biased in a finite sample? What are the consequences of model misspecification? Do confidence intervals for a given parameter achieve the advertised/nominal level of coverage? How does a newly developed method compare to an established one? What is the power to detect a desired effect size under complex experimental settings and analysis methods?
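
To make the idea of generating pseudo-random observations from a known truth concrete, the following is a minimal sketch of a simulation study in Python. It is an illustration only and not part of INTEREST: the function name `run_simulation`, the normal data-generating mechanism, the sample size, and the number of repetitions are all assumptions chosen for brevity. Each repetition draws a sample with a known mean, estimates that mean, and stores the estimate and its standard error as one row.

```python
import random
import statistics

TRUE_MEAN = 0.5  # the known truth (hypothetical value chosen for illustration)


def run_simulation(n_sim=200, n_obs=30, true_mean=TRUE_MEAN, seed=1):
    """Simulate n_sim datasets from a known truth; return one row per repetition."""
    rng = random.Random(seed)
    rows = []
    for repetition in range(1, n_sim + 1):
        # Data-generating mechanism (assumed): normal observations, known mean, unit SD
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
        estimate = statistics.fmean(sample)
        se = statistics.stdev(sample) / n_obs ** 0.5
        rows.append({"repetition": repetition, "estimate": estimate, "se": se})
    return rows


results = run_simulation()
# Estimated bias: average estimate across repetitions minus the true value
bias = statistics.fmean(row["estimate"] for row in results) - TRUE_MEAN
print(f"Estimated bias over {len(results)} repetitions: {bias:.4f}")
```

Because the truth is fixed by the analyst, the question "is the estimator biased in a finite sample?" reduces to comparing the average estimate with `TRUE_MEAN`, up to Monte Carlo error.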
Simulation studies are being used increasingly in a wide variety of settings. For instance, searching on the database of peer-reviewed research literature Scopus (https://www.scopus.com) with the query string TITLE-ABS-KEY ("simulation study") AND SUBJAREA (math) yields more than 30000 results with a 20-fold increase during the last 30 years, from 148 documents in 1989 to 3185 in 2019 (Figure 1). The increased availability of powerful computational tools and ready-to-use software to researchers surely contributed to the rise of simulation studies in the current literature.

Figure 1: Trend in published documents on simulation studies from 1960 onwards. The number of documents was identified on Scopus via the search key TITLE-ABS-KEY ("simulation study") AND SUBJAREA (math), and the number of documents identified in 2019 (3,185) is labelled on the plot. [Plot: year on the horizontal axis, 1959 to 2019; number of documents per year on the vertical axis, 0 to 3,500.]

Despite the popularity of simulation studies, they are often poorly designed, analysed, and reported. Morris et al. reviewed 100 research articles published in Volume 34 of Statistics in Medicine (2015) with at least one simulation study and found that information on data-generating mechanisms (DGMs), number of repetitions, software, and estimands was often lacking or poorly reported, making critical appraisal and replication of published studies a difficult task (Morris et al. 2019). Another aspect of simulation studies that is often poorly reported or not reported at all is the Monte Carlo error of estimated performance measures, defined as the standard error of estimated performance, owing to the fact that a finite number of repetitions are used and so performance is estimated with uncertainty.
Monte Carlo errors play an important role in understanding the role of chance in the results of simulation studies and have been shown to be severely underreported (Koehler et al. 2009).

The possibility of independently verifying results from scientific studies is a fundamental aspect of science (Laine et al. 2007); as a consequence, several reporting guidelines have emerged under the banner of the EQUATOR Network (http://www.equator-network.org) (Schulz et al. 2010; von Elm et al. 2007). Despite similar calls for harmonised reporting to allow for greater reproducibility in the area of computational science (Peng 2011) and several articles advocating for more rigour in specific aspects of simulation studies (Hoaglin and Andrews 1975; Hauck and Anderson 1984; Díaz-Emparanza 2002; Burton et al. 2006; White 2010; Smith and Marshall 2011), design and reporting guidelines for simulation studies are generally lacking in the statistical literature, with a few examples in the area of structural equation modelling (Bandalos and Gagné 2012; Boomsma 2013). Morris et al. introduced the ADEMP framework (Aims, Data-generating mechanisms, Estimands, Methods, Performance measures) aiming to fill precisely that gap. In the Reporting section they compared the several ways of reporting results that they observed in their review, including results in text for small simulation studies, tabulating and plotting results, and even the nested-loop plot proposed by Rücker and Schwarzer for fully-factorial simulation studies with many data-generating mechanisms (Rücker and Schwarzer 2014). They concluded by arguing that "there is no correct way to present results, but we encourage careful thought to facilitate readability, considering the comparisons that need to be made".
As outlined in Spiegelhalter et al., there is little experimental evidence on how different types of visualisations are perceived (Spiegelhalter et al. 2011); despite that, they highlight the ease of improving understanding via interactive visualisations that can be adjusted by the user to best fit specific requirements. The recent advent of tools such as Data-Driven Documents (D3, or D3.js) (Bostock et al. 2011) and Shiny (Chang et al. 2019) has further facilitated the development of interactive visualisations.

The increased availability of powerful computational tools has not only contributed to a rise in the popularity of simulation studies, it has also allowed researchers to simulate an ever-growing number of data-generating mechanisms and include several estimands and methods to compare: up to $4.2 \times 10^{10}$, 32, and 33, respectively, in the aforementioned review (Morris et al. 2019). With a large number of data-generating mechanisms, estimands, or methods, analysing and reporting the results of a simulation study becomes cumbersome: what results shall we focus on so as not to bewilder readers? Which estimands and methods should we include in our tables and plots? How should we plot or tabulate several data-generating mechanisms at once?

In an attempt to address these questions, we developed INTEREST, an INteractive Tool for Exploring REsults from Simulation sTudies. INTEREST is a browser-based interactive tool, and it requires first uploading a dataset with results from a simulation study; then, it estimates performance measures and displays a variety of tables and plots automatically. The user can focus on specific data-generating mechanisms, estimands, and methods: tables and plots are updated automatically.
This article will introduce the implementation details of INTEREST in the Implementation section and the main features in the Results and discussion section, where we will further discuss its relevance. We also present a case study to motivate the use of INTEREST and illustrate its use in practice. Finally, we conclude the manuscript with some closing remarks in the Conclusions section.

2. Implementation

INTEREST was developed using the free statistical software R (R Core Team 2020) and the R package Shiny (Chang et al. 2019). Shiny is an R package (and framework) that allows building interactive web apps straight from within R: the resulting applications can be hosted online, embedded in reports and dashboards, or just run as standalone apps. The front-end of INTEREST has been built using the shinydashboard package (Chang and Borges Ribeiro 2018); shinydashboard is based upon AdminLTE (https://adminlte.io/), an open-source admin control panel built on top of the Bootstrap framework (Version 3.x) and released under the MIT license.

The back-end functionality of INTEREST is published as a standalone R package named rsimsum for easier long-term maintainability (Gasparini 2018); rsimsum is freely available on the Comprehensive R Archive Network (CRAN) under the GNU General Public License Version 3 (https://www.gnu.org/licenses/gpl-3.0).

INTEREST is available as an online application and as a standalone version for offline use. The online version is hosted at https://interest.shinyapps.io/interest/, and can be accessed via any web browser on any device (desktop computers, laptops, tablets, smartphones, etc.). The standalone offline version can be obtained from GitHub (https://github.com/ellessenne/interest) and can be run on any desktop computer or laptop with a local instance of R; if required, R can be downloaded for free from the website of the R project (R Core Team 2020).
INTEREST (like rsimsum) is published under the GNU General Public License Version 3.

3. Results and discussion

The main interface of INTEREST is presented in Figure 2. The interface is composed of a main area on the right and a navigation bar on the left; the navigation bar includes sub-menus for customising plots or modifying the default behaviour of INTEREST. We now introduce and describe the functionality of the application.

Figure 2: Homepage of INTEREST. On the left, there is a navigation bar with sub-menus useful to tune the default behaviour of the app. On the right, the main window of INTEREST.

3.1. Data

The use of INTEREST starts by providing a tidy dataset (also known as long format, with variables in columns and observations in rows (Wickham 2014); an example of tidy data is included in Table 1) with results from a simulation study via the Data tab from the side menu.

Table 1: Example of dataset in tidy format, with each row identifying a repetition for each combination of data-generating mechanism and analytical method.

Repetition   DGM   Method   Estimate
1            1     1        $\hat{\theta}_{1,1,1}$
2            1     1        $\hat{\theta}_{2,1,1}$
3            1     1        $\hat{\theta}_{3,1,1}$
1            2     1        $\hat{\theta}_{1,2,1}$
2            2     1        $\hat{\theta}_{2,2,1}$
3            2     1        $\hat{\theta}_{3,2,1}$
1            1     2        $\hat{\theta}_{1,1,2}$
2            1     2        $\hat{\theta}_{2,1,2}$
3            1     2        $\hat{\theta}_{3,1,2}$
1            2     2        $\hat{\theta}_{1,2,2}$
2            2     2        $\hat{\theta}_{2,2,2}$
3            2     2        $\hat{\theta}_{3,2,2}$
...          ...   ...      ...

A dataset can be provided to INTEREST in three different ways:

1. The user can upload a dataset. The uploaded file can be a comma-separated file (.csv), a Stata dataset (version 8-15, .dta), an SPSS dataset (.sav), a SAS dataset (.sas7bdat), or an R serialised object (.rds); the format will be inferred automatically from the extension of the uploaded file, and the auto-detection is case-insensitive. It is also possible to upload compressed files (ending in .gz, .bz2, .xz, or .zip) that are automatically decompressed.
The maximum supported file size is 100MB;

2. The user can provide a URL link to a dataset hosted elsewhere. All considerations relative to the file format from point (1) are also valid here;

3. Finally, the user can paste a dataset (e.g. from Microsoft Excel) in a text box. The pasted data is assumed to be tab-separated.

If users stored the results of their simulation study in a different format, we recommend using one of the readily available tools (e.g. the pivot_* functions from the tidyr package in R or the reshape command in Stata) to reshape the data before uploading it to INTEREST.

Once a dataset has been uploaded via one of the three methods outlined, the user will have to define the variables required by INTEREST and some optional variables, depending on the structure of the input dataset. The names of each column (i.e. variable) from the uploaded dataset automatically populate a set of select-list inputs to assist the user.

The only variable required by INTEREST is a variable defining a point estimate from the simulation study; users can also pass standard errors of such estimates, and the true value of the estimand. If neither of these values is provided, only performance measures that can actually be calculated with the available information are returned. In order to provide additional flexibility, the user can define a column in the dataset that defines the true values of the estimand: this is especially useful e.g. in settings where the true value can vary between repetitions. Further to that, a user can provide repetition-specific confidence bounds or even use t-distributed critical values rather than normal theory (by specifying a column that contains the degrees of freedom for each repetition); once again, this can all be set via the Data tab, and will affect relevant performance measures.
Finally, a user can define a variable representing methods being compared with the current simulation study (and choose the comparator), and one or more variables defining data-generating mechanisms (DGMs, e.g. sample size, true correlation, true baseline hazard function for survival models, etc.). We denote with methods the levels of the factor of primary comparative interest in a simulation study, and not necessarily an analytical method (strictly speaking). Other factors, e.g. characteristics of the data-generating mechanism, can be used as well, if representing the primary comparative interest of a study. In its current form, INTEREST can only accept a single column as a method variable; when the primary focus of a simulation study is on several factors at once, we suggest pre-processing the dataset by creating a single column with all possible combinations from the factors of interest (e.g. using the interaction function in R).

The View uploaded data side tab in INTEREST displays the dataset uploaded by the user using the R package DT, an R interface to the DataTables plug-in for jQuery (Xie et al. 2020). The resulting table is interactive and can be sorted and filtered by the user. It is good practice to verify that the uploaded dataset is as expected before continuing with the analysis and any visual exploration.

3.2. Missing data

INTEREST includes a section for exploring missingness of estimates and/or standard errors from each repetition of a simulation study, which may occur, for example, due to non-convergence of some repetitions. Missing values need to be carefully explored and handled at the initial stage of any analysis. Missingness may originate as a consequence of software failures: if so, the code could (or should) be made more robust to ensure fewer or no failures.
Conversely, missing data may arise as a consequence of characteristics of the simulated data, leading to non-convergence of the estimation procedures. In other words, missing values may not be missing completely at random. A discussion on the interpretation of missing values can be found elsewhere (White et al. 2011; Morris et al. 2019).

The missing data functionality is based on the R package naniar (Tierney et al. 2020), and can be accessed via the Missing data tab. It comprises visual and tabular summaries; missing data visualisations available in INTEREST are the following:

• Bar plots of number (or proportion) of missing values by method and data-generating mechanism (if defined). Number and proportion of missing values are produced for each variable included in the data uploaded to INTEREST;

• A plot to visualise the amount of missing data in the whole dataset;

• A scatter plot with missing status depicted with different colours; to be able to plot missing values, they are replaced with values 10% lower than the minimum value in that variable. This plot allows identifying trends and patterns between variables in missing values (e.g. all estimates with a very large standard error have a missing point estimate);

• A heat plot with methods on the horizontal axis and the data-generating mechanisms on the vertical axis, with the colour fill representing the percentage of missingness in each tile.

Each plot can be further customised and exported (e.g. for use in slides and reports): more details in the Plots section below.

Finally, INTEREST computes and outputs a table with the number, proportion, and cumulative number of missing values per variable, stratifying by method and data-generating mechanism; the table can be easily exported to LaTeX format for further use (via the kable function from the R package knitr (Xie 2020)).
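
The stratified tabulation of missing values described in this section can be sketched in a few lines of Python. This is a hedged illustration and not the naniar-based implementation used by INTEREST: the function name `missing_summary`, the column names (`dgm`, `method`, `estimate`, `se`), and the toy rows are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from collections import defaultdict


def missing_summary(rows, variable):
    """Number and proportion of missing values of `variable` per (dgm, method)."""
    counts = defaultdict(lambda: [0, 0])  # (dgm, method) -> [n_missing, n_total]
    for row in rows:
        key = (row["dgm"], row["method"])
        counts[key][1] += 1
        if row[variable] is None:  # assumption: missing values are stored as None
            counts[key][0] += 1
    return {
        key: {"n_missing": n_missing, "prop_missing": n_missing / n_total}
        for key, (n_missing, n_total) in sorted(counts.items())
    }


# Hypothetical tidy results: one row per repetition, DGM, and method
rows = [
    {"dgm": 1, "method": 1, "estimate": -0.48, "se": 0.15},
    {"dgm": 1, "method": 1, "estimate": None, "se": None},  # e.g. non-convergence
    {"dgm": 1, "method": 2, "estimate": -0.51, "se": 0.16},
    {"dgm": 1, "method": 2, "estimate": -0.47, "se": 0.15},
]

summary = missing_summary(rows, "estimate")
for (dgm, method), stats in summary.items():
    print(f"DGM {dgm}, method {method}: {stats}")
```

A non-zero proportion concentrated in a single method or data-generating mechanism is the kind of pattern that would suggest values are not missing completely at random.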
3.3. Performance measures

INTEREST estimates performance measures automatically as soon as the user defines the required variables via the Data tab. Supported performance measures are presented in Table 2, and discussed in more detail elsewhere (Burton et al. 2006; White 2010; Morris et al. 2019). In addition to that, INTEREST returns mean and median estimate, and mean and median squared error of the estimate. Finally, INTEREST computes and returns Monte Carlo standard errors by default.

The list of performance measures estimated by INTEREST can be customised via the Options tab: by default, all are included.

3.4. Tables

Estimated performance measures are presented in tabular form in the Performance measures side tab, once again using the R package DT. The table of estimated performance measures is relative to a given data-generating mechanism, which can be modified using a select list input on the side. It is also possible to customise the number of significant digits and to select whether Monte Carlo standard errors should be excluded from each table or not via the Options tab.

Finally, it is possible to export the tables in two ways:

1. Export the table in LaTeX format, e.g. for use in reports, articles, or presentations, via the Export table tab and the kable function from the R package knitr (Xie 2020). The caption of the table can be directly customised;

2. Export estimated performance measures as a dataset, e.g. to be used with a different software package of choice. The table of estimated performance measures can be exported as displayed by INTEREST or in tidy format, and in a variety of formats: comma-separated (.csv), tab-separated (.tsv), R (.rds), Stata (version 8-15, .dta), SPSS (.sav), and SAS (.sas7bdat).

Table 2: Overview of performance measures estimated by INTEREST.
Performance measure                    Description
Bias                                   Deviation between estimate and the true value
Empirical standard error               Long-run standard deviation of the estimator
Relative precision against a reference Precision of a method B compared to a reference method A
Mean squared error                     The sum of squared bias and variance of the estimator
Model standard error                   Average estimated standard error
Coverage                               Probability that a confidence interval contains the true value
Bias-eliminated coverage               Coverage after removing bias, i.e. by computing the probability that a confidence interval contains the average point estimate across repetitions instead of the true value
Power                                  Power of a significance test

3.5. Plots

INTEREST can produce a variety of plots to automatically visualise results from simulation studies. Plots produced by INTEREST can be categorised into two broad groups: plots of estimates (and their estimated standard errors) and plots of performance, following analysis.

Plots for method-wise comparisons of estimated values and standard errors are:

• Scatter plots;

• Bland-Altman plots (Altman and Bland 1983; Bland and Altman 1999);

• Ridgeline plots (Wilke 2018);

• Contour and hexbin plots (as implemented in ggplot2's geom_density_2d and geom_hex geometric objects).

Each plot will include all data-generating mechanisms by default and allows comparing serial trends and the relative performance of methods included in the simulation study; contour and hexbin plots are especially useful to deal with overplotting.

Conversely, the following plots are supported for estimated performance:

• Plots of performance measures with confidence intervals based on Monte Carlo standard errors. There are two variations of this plot: forest plots and lolly plots.
Both variations display the estimated performance measure alongside confidence intervals based on Monte Carlo standard errors; different methods are arranged side by side, either on the horizontal or on the vertical axis;

• Heat plots of performance measures: these plots are mosaic plots where the several methods being compared (if defined) are on the horizontal axis and the data-generating mechanisms are on the vertical axis. Then, each tile of the mosaic plot is coloured according to the value of a given performance measure. To the best of our knowledge, this is a novel way of visualising results from simulation studies, with an application in practice that can be found elsewhere (Gasparini et al. 2019);

• Zip plots to visually explain coverage probabilities by plotting the confidence intervals directly. More information on zip plots is presented elsewhere (Morris et al. 2019);

• Nested loop plots, useful to compare performance measures from studies with several DGMs at once. This visualisation is described in more detail elsewhere (Rücker and Schwarzer 2014).

Finally, all plots can be exported for use in manuscripts, reports, or presentations by simply clicking the Save plot button underneath a plot; all plots are exported by default in .png format, but other options are available via the Options tab. For instance, to suit a wide variety of possible use cases, INTEREST supports several alternative image formats such as pdf, svg, and eps. Through the Options tab it is also possible to customise the resolution of the plot for non-vectorial formats (in dots per inch, dpi) and the physical size (height and width) of the plots to be exported.
The Options tab allows further customisations: for instance, it is possible to (1) define a custom label for the x-axis and the y-axis and (2) change the overall appearance of the plot by applying one of the predefined themes (which are described in more detail in the User guide tab).

3.6. Interactive apps for exploring results

INTEREST allows researchers to upload a dataset with the results of their Monte Carlo simulation study, obtaining estimates of performance in a quick and straightforward way. This is very appealing, especially for simulation studies with several data-generating mechanisms, where it could be confusing to investigate all scenarios at once. Using the app it is possible to vary data-generating mechanisms and obtain updated tables and plots in real time, making it possible to iterate quickly and take all possible scenarios into consideration.

3.7. Interactive apps for disseminating results

One of the intended usage scenarios for INTEREST consists of supplementing the reporting of simulation studies. This is especially useful with large simulation studies, where it is most cumbersome to summarise all results in a manuscript: it is common to include in the main manuscript only a subset of results for conciseness. The remaining results are then relegated to supplementary material, web appendices, or not published at all, undermining dissemination and replicability of a study.

Furthermore, given that it is becoming increasingly common to publish the code of a simulation study, one could publish the dataset with the results alongside the code used to obtain it. That dataset could then be uploaded to INTEREST by readers, who could then explore the full results of the study as they wish.
Given the ubiquity of web services like GitHub (https://github.com) and data-sharing repositories such as Zenodo (https://zenodo.org/), we encourage INTEREST users to publish online the full results of their simulation studies for other users to download and experiment with.

4. Future developments

Although INTEREST is fully functional in its current state, several future developments are being planned. For instance, we aim to include support for multiple estimands at once, as currently supported by rsimsum via the multisimsum function. We also aim to improve the flexibility of INTEREST in terms of customisation (of tables and plots), e.g. by displaying the raw R code used to generate the plots behind the scenes. Finally, we are considering adding additional interactive features to the app via HTML widgets, D3, or other approaches; there are several R packages that allow incorporating interactive graphs into Shiny apps such as htmlwidgets (Vaidyanathan et al. 2019), plotly (Sievert 2018), and r2d3 (Luraschi and Allaire 2018).

5. Case study

The case study included in this Section illustrates the use of INTEREST to analyse publicly available results of a simulation study. In particular, we will be using the results from the worked illustrative example included in Morris et al. (Morris et al. 2019).

The study dataset contains the results of a simulation study comparing three different methods for estimating the hazard ratio in a randomised trial with a time to event outcome. In particular, the methods being compared are proportional hazards survival models of the kind:

$h_i(t) = h_0(t) \exp(X_i \theta)$,

where $\theta$ is the log hazard ratio for the effect of a binary exposure (e.g. treatment).
This class of models requires an assumption regarding the shape of the baseline hazard function $h_0(t)$: it can be assumed to follow a given parametric distribution, or it can be left unspecified (yielding therefore a Cox model). The aim of this simulation study consists of assessing the impact of such an assumption on the estimation of the log hazard ratio.

Morris et al. consider two distinct data-generating mechanisms, varying the baseline hazard function:

1. An exponential baseline hazard with $\lambda = 0.1$ (DGM = 1);

2. A Weibull baseline hazard with $\lambda = 0.1$, $\gamma = 1.5$ (DGM = 2).

In both settings, data are simulated on 300 patients with a binary covariate (e.g. treatment) simulated using $X_i \sim \text{Bern}(0.5)$, i.e. simple randomisation with an equal allocation ratio. The log hazard ratio is set to be $\theta = -0.50$; this is the true value of the estimand of interest.

Three distinct methods are fit to each simulated scenario: a parametric survival model that assumes an exponential baseline hazard, a parametric survival model that assumes a Weibull baseline hazard, and a Cox semi-parametric survival model.

Finally, the performance measures of interest are bias, coverage, empirical and model-based standard errors. Assuming that $\text{Var}(\hat{\theta}) \leq 0.04$, 1600 repetitions are run to ensure that the Monte Carlo standard error of bias (the key performance measure of interest) is at most 0.005, since $\sqrt{0.04 / 1600} = 0.005$.

The dataset with the results of this simulation study is publicly available in Stata format, and can be downloaded from a GitHub repository at the following URL:

https://github.com/tpmorris/simtutorial/raw/master/Stata/estimates.dta

Within the dataset published on GitHub, the exponential, Weibull, and Cox models are coded as model 1, 2, and 3, respectively.

The workflow of INTEREST starts by providing the dataset with the results of the simulation study.
Given that the dataset is already available online, we can directly pass the URL above to INTEREST and then define the required variables (as illustrated in Figure 3); the uploaded dataset can then be verified via the View uploaded data tab (Figure 4). We can also customise the performance measures reported by INTEREST via the Options tab (Figure 5), e.g. focussing on those outlined above as key performance measures (bias, coverage probability, empirical standard errors, model-based standard errors).

The next step of the workflow consists of investigating missing values: this can be achieved via the Missing data tab. In particular, there is no missing data in the study dataset (Figure 6). We can, therefore, continue the analysis knowing that there is no pattern of serial missingness or non-convergence issues in our data.

The performance measures of interest are tabulated in the Performance measures tab, e.g. for DGM = 2 (Figure 7). We can see that bias for the exponential model is much larger than for the Weibull and Cox models: approximately 10% of the true value (in absolute terms) compared to less than 1%. Empirical and model-based standard errors are quite similar for the Weibull and Cox models; conversely, the exponential model seemed to overestimate the model-based standard error. Coverage was as advertised for all methods, at approximately 95%. By comparison, all models performed equally in the other scenario (DGM = 1); these results are omitted from the manuscript for brevity, but we encourage readers to replicate this analysis and verify our statement.

The Performance measures tab provides a LaTeX table ready to be pasted e.g. in a manuscript: the resulting table is included as Table 3. A dataset with all the estimated performance measures here tabulated can also be exported to be used elsewhere (Figure 8).

We can also visualise the results of this simulation study.
First, we can produce a method-wise comparison of point estimates from each method using e.g. scatter plots (Figure 9) or Bland-Altman plots (Figure 10). With both plots, it is possible to appreciate that for the DGM with γ = 1.5 the exponential model yields point estimates that are quite different compared to the Weibull and Cox models. Analogous plots can be obtained for estimated standard errors.

The performance measures tabulated in the Performance measures tab can also be plotted via the Plots tab. For instance, it is straightforward to obtain a forest plot for bias (as illustrated in Figure 11), which can be exported by clicking the Save plot button. The plots' appearance can also be customised via the Options tab, e.g. by modifying the axes' labels and the overall theme of the plot (Figure 12); the resulting forest plot, exported in .pdf format, is included as Figure 13. Several other data visualisations are supported by INTEREST, as described in the previous Sections: lolly plots, zip plots, and so on.

Figure 3: App interface to load the dataset for the case study. INTEREST can import datasets that are available online by simply pasting a link to them; then, the required variables can be defined via a list of pre-populated select inputs.

Figure 4: Verifying the dataset for the case study. After importing the study dataset, it is recommended to verify that the uploaded data are correct.

Table 3: Example of LaTeX table directly exported from INTEREST, case study DGM 2: true Weibull baseline hazard function.

Performance Measure                            1                2                3
Bias in point estimate                         0.0494 (0.0035)  0.0048 (0.0038)  0.0062 (0.0038)
Empirical standard error                       0.1381 (0.0024)  0.1516 (0.0027)  0.1511 (0.0027)
Model-based standard error                     0.1539 (0.0001)  0.1541 (0.0001)  0.1542 (0.0001)
Coverage of nominal 95% confidence interval    0.9600 (0.0049)  0.9556 (0.0051)  0.9575 (0.0050)

Figure 5: Customising the performance measures reported by INTEREST. It is possible to focus on a subset of key performance measures by selecting them via the Options tab.

Figure 6: Investigating missing data. Missingness patterns in the study dataset need to be assessed before continuing with the analysis. Several visualisations and tabular displays are available from the Missing data tab.

Figure 7: Table of performance measures for a given DGM. Performance measures of interest are tabulated in the Performance measures tab, e.g. for the 2nd DGM (with a Weibull baseline hazard function).

Figure 8: Exporting options for estimated performance measures. Performance measures of interest can be exported in a variety of formats ready to be used elsewhere (e.g. for dissemination purposes or to develop ad-hoc visualisations).

Figure 9: Visual comparison of point estimates via scatter plots. Plots of point estimates for each method-DGM combination can be produced automatically using INTEREST.

Figure 10: Visual comparison of point estimates via Bland-Altman plots. Plots of point estimates for each method-DGM combination can be produced automatically using INTEREST.

Figure 11: Visual comparison of performance measures via forest plots. Estimated performance measures such as bias can be easily plotted via the Plots tab.

Figure 12: Customising the visual appearance of plots. INTEREST allows customising the appearance of plots produced by the app via the Options tab, e.g. by modifying the axes' labels and/or the overall theme.

Figure 13: Forest plot for bias, case study on survival regression modelling. This forest plot, produced by INTEREST and further customised via the Options tab, can be directly exported from the app. [Forest plot; panels: dgm: 1 and dgm: 2; x-axis: Method (1, 2, 3); y-axis: Bias]

6. Conclusions

As outlined in the introduction, Monte Carlo simulation studies are too often poorly analysed and reported (Morris et al. 2019). Given their increased use in methodological statistical research, we hope that INTEREST can substantially improve the reporting and dissemination of results from simulation studies. As illustrated in the case study, the exploration and analysis of the Monte Carlo simulation study of Morris et al. can be fully reproduced by using INTEREST. Estimated performance measures are tabulated automatically, and plots can be used to visualise the performance measures of interest. Moreover, the user is not constrained to a given set of plots and can fully explore the results with ease, e.g. by varying the DGMs to focus on or by choosing different data visualisations. Most interestingly, the only requirement to reproduce the analysis described in the case study is a device with a web browser and a connection to the Internet. To the best of our knowledge, there is no similar application readily available to be used by researchers and readers of published Monte Carlo simulation studies alike.

Acknowledgements

TPM is supported by the Medical Research Council (grant numbers MC_UU_12023/21 and MC_UU_12023/29). MJC is partially funded by the MRC-NIHR Methodology Research Panel (MR/P015433/1). We thank Ian R. White for discussions that led to the inception and development of INTEREST.

References

Altman, D. G. and Bland, J. M. (1983). Measurement in medicine: The analysis of method comparison studies.
The Statistician, 32(3):307, DOI: 10.2307/2987937.
Bandalos, D. L. and Gagné, P. (2012). Simulation methods in structural equation modeling. In Handbook of structural equation modeling, pages 92–108. The Guilford Press.
Bland, J. M. and Altman, D. G. (1999). Measuring agreement in method comparison studies. Statistical Methods in Medical Research, 8(2):135–160, DOI: 10.1177/096228029900800204.
Boomsma, A. (2013). Reporting Monte Carlo studies in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 20(3):518–540, DOI: 10.1080/10705511.2013.797839.
Bostock, M., Ogievetsky, V., and Heer, J. (2011). D3: Data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309, DOI: 10.1109/tvcg.2011.185.
Burton, A., Altman, D. G., Royston, P., and Holder, R. L. (2006). The design of simulation studies in medical statistics. Statistics in Medicine, 25(24):4279–4292, DOI: 10.1002/sim.2673.
Chang, W. and Borges Ribeiro, B. (2018). shinydashboard: Create Dashboards with shiny, https://CRAN.R-project.org/package=shinydashboard. R package version 0.7.1.
Chang, W., Cheng, J., Allaire, J., Xie, Y., and McPherson, J. (2019). shiny: Web Application Framework for R, https://CRAN.R-project.org/package=shiny. R package version 1.4.0.
Díaz-Emparanza, I. (2002). Is a small Monte Carlo analysis a good analysis? Statistical Papers, 43(4):567–577.
Gasparini, A. (2018). rsimsum: Summarise results from Monte Carlo simulation studies. Journal of Open Source Software, 3(26):739, DOI: 10.21105/joss.00739.
Gasparini, A., Clements, M. S., Abrams, K. R., and Crowther, M. J. (2019). Impact of model misspecification in shared frailty survival models. Statistics in Medicine, DOI: 10.1002/sim.8309.
Hauck, W. W. and Anderson, S. (1984). A survey regarding the reporting of simulation studies. The American Statistician, 38(3):214–216.
Hoaglin, D. C. and Andrews, D. F. (1975). The reporting of computation-based results in statistics. The American Statistician, 29(3):122–126.
Koehler, E., Brown, E., and Haneuse, S. J. (2009). On the assessment of Monte Carlo error in simulation-based statistical analyses. The American Statistician, 63(2):155–162, DOI: 10.1198/tast.2009.0030.
Laine, C., Goodman, S. N., Griswold, M. E., and Sox, H. C. (2007). Reproducible research: Moving toward research the public can really trust. Annals of Internal Medicine, 146(6):450–453, DOI: 10.7326/0003-4819-146-6-200703200-00154.
Luraschi, J. and Allaire, J. (2018). r2d3: Interface to D3 Visualizations, https://CRAN.R-project.org/package=r2d3. R package version 0.2.3.
Morris, T. P., White, I., and Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, pages 1–29, DOI: 10.1002/sim.8086.
Peng, R. D. (2011). Reproducible research in computational science. Science, 334(6060):1226–1227, DOI: 10.1126/science.1213847.
R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/.
Rücker, G. and Schwarzer, G. (2014). Presenting simulation results in a nested loop plot. BMC Medical Research Methodology, 14(1), DOI: 10.1186/1471-2288-14-129.
Schulz, K. F., Altman, D. G., Moher, D., and for the CONSORT Group (2010). CONSORT 2010 Statement: Updated guidelines for reporting parallel group randomised trials. PLOS Medicine, 7(3):1–7, DOI: 10.1371/journal.pmed.1000251.
Sievert, C. (2018). plotly for R, https://plotly-r.com.
Smith, M. K. and Marshall, A. (2011). Importance of protocols for simulation studies in clinical drug development. Statistical Methods in Medical Research, 20(6):613–622.
Spiegelhalter, D., Pearson, M., and Short, I. (2011). Visualizing uncertainty about the future. Science, 333(6048):1393–1400, DOI: 10.1126/science.1191181.
Tierney, N., Cook, D., McBain, M., and Fay, C. (2020). naniar: Data Structures, Summaries, and Visualisations for Missing Data, https://CRAN.R-project.org/package=naniar. R package version 0.5.0.
Vaidyanathan, R., Xie, Y., Allaire, J., Cheng, J., and Russell, K. (2019). htmlwidgets: HTML Widgets for R, https://CRAN.R-project.org/package=htmlwidgets. R package version 1.5.1.
von Elm, E., Altman, D. G., Egger, M., Pocock, S. J., Gøtzsch, P., Vandenbroucke, J. P., and for the STROBE Initiative (2007). The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: Guidelines for reporting observational studies. PLOS Medicine, 4(10):1–5, DOI: 10.1371/journal.pmed.0040296.
White, I. R. (2010). simsum: Analyses of simulation studies including Monte Carlo error. The Stata Journal, 10(3):369–385.
White, I. R., Royston, P., and Wood, A. M. (2011). Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine, 30(4):377–399, DOI: 10.1002/sim.4067.
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), DOI: 10.18637/jss.v059.i10.
Wilke, C. O. (2018). ggridges: Ridgeline Plots in ggplot2, https://CRAN.R-project.org/package=ggridges. R package version 0.5.1.
Xie, Y. (2020). knitr: A General-Purpose Package for Dynamic Report Generation in R, https://yihui.org/knitr/. R package version 1.28.
Xie, Y., Cheng, J., and Tan, X. (2020). DT: A Wrapper of the JavaScript Library DataTables, https://CRAN.R-project.org/package=DT. R package version 0.12.
Affiliation:

Alessandro Gasparini
Biostatistics Research Group
Department of Health Sciences
University of Leicester
George Davies Centre
University Road
Leicester LE1 7RH
United Kingdom
E-mail: alessandro.gasparini@ki.se

Tim P. Morris
MRC Clinical Trials Unit at UCL
90 High Holborn
London WC1V 6LJ
United Kingdom

Michael J. Crowther
Biostatistics Research Group
Department of Health Sciences
University of Leicester
George Davies Centre
University Road
Leicester LE1 7RH
United Kingdom

Journal of Data Science, Statistics, and Visualisation, https://jdssv.org/
published by the International Association for Statistical Computing, http://iasc-isi.org/
