Causality on Longitudinal Data: Stable Specification Search in Constrained Structural Equation Modeling

1 Causality on Longitudinal Data: Stab le Speciﬁcation Search in Constrained Str uctural Equation Modeling Ridho Rahmadi, P erry Groot, Mar ieke HC v an Rijn, J an AJG van den Br and, Marianne Heins, Hans Knoop , T om Hesk es, the Alzheimer’ s Disease Neuroimaging Initiativ e ∗ , the MASTERPLAN Study Group , the OPTIMISTIC consor tium ∗∗ Abstract —A typical problem in causal modeling is the instability of model structure learning, i.e., small changes in ﬁnite data can result in completely different optimal models. The present work introduces a no vel causal modeling algor ithm f or longitudinal data, that is robust f or ﬁnite samples based on recent advances in stability selection using subsampling and selection algorithms. Our approach uses exploratory search but allows incor poration of prior knowledge , e.g., the absence of a par ticular causal relationship between two speciﬁc variables. W e represent causal relationships using structural equation models. Models are scored along two objectiv es: the model ﬁt and the model comple xity . Since both objectives are often conﬂicting we apply a multi-objective ev olutionar y algor ithm to search for P areto optimal models. T o handle the instability of small ﬁnite data samples, we repeatedly subsample the data and select those substructures (from the optimal models) that are both stable and parsimonious. These substructures can be visualized through a causal graph. Our more e xplorator y approach achiev es at least comparable perf or mance as, but often a signiﬁcant improv ement over state-of-the-ar t alter native approaches on a simulated data set with a known ground tr uth. W e also present the results of our method on three real-w or ld longitudinal data sets on chronic fatigue syndrome, Alzheimer disease, and chronic kidne y disease. The ﬁndings obtained with our approach are generally in line with results from more h ypothesis-dr iv en analyses in earlier studies and suggest some no vel relationships that deser v e fur ther research. Index T erms —Longitudinal data, Causal modeling, Structural equation model, Stability selection, Multi-objective e volutionar y algorithm, chronic fatigue syndrome , chronic kidney disease, Alzheimer’ s disease F 1 I N T R O D U C T I O N Causal modeling, an essential pr oblem in many disciplines [1], [2], [3], [4], [5], [6] attempts to model the mechanisms by which variables • R. Rahmadi is with the Department of Informatics, Universitas Islam Indonesia and Institute for Computing and Information Sciences, Radboud University Nijmegen, the Netherlands. E-mail: r .rahmadi@cs.ru.nl • P . Gr oot and T . Heskes are with Institute for Computing and Information Sciences, Radboud University Nijmegen, the Nether- lands. • M. van Rijn and J. van den Brand are with Department of Nephrology , Radboud University Medical Center , Nijmegen, The Netherlands. • H. Knoop is with the Department of Medical Psychology , Aca- demic Medical Center , University of Amsterdam, The Nether- lands. • ∗∗ The members of OPTIMISTIC consortium are described in the acknowledgment. relate and to understand the changes on the model if the mechanisms were manipulated [7]. In the medical domain, revealing causal relationships may lead to improvement of clin- ical practice, for example, the development of treatment and medication. Slowly but steadily , causal discovery methods ﬁnd their way into the medical literature, pr oviding novel in- • ∗ One of the data sets used in preparation of this article were obtained from the Alzheimer ’ s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investiga- tors can be found at: http:// adni.loni.usc.edu/ wp- content/ uploads/ how to apply/ ADNI Acknowledgement List.pdf • M. Heins is with the Netherlands Institute for Health Services Research, The Netherlands. 2 sights through exploratory analyses [8], [9], [10]. Moreover , data in the medical domain is often collected through longitudinal studies. Unlike in a cross-sectional design, where all measurements ar e obtained at a single occa- sion, the data in a longitudinal design consist of repeated measurements on subjects through time. Longitudinal data make it possible to capture change within subjects over time and thus gives some advantage to causal modeling in terms of providing more knowledge to estab- lish causal relationships [11]. As emphasized in Fitmaurice et al., [12] there is much natural heterogeneity among subjects in terms of how diseases progress that can be explained by the longitudinal study design. Another advantage is that in order to obtain a similar level of statistical power as in cr oss-sectional studies, fewer subjects in longitudinal studies are re- quired [13]. T o date, a number of causal modeling methods have been developed for longitu- dinal (or time series) data. Some of the methods are based on a V ector Autoregr es- sive (V AR) and/or Structural Equation Model (SEM) framework which assumes a linear sys- tem and independent Gaussian noise [14], [15], [16], [17], [18]. Some other methods, inter- estingly , take advantage of nonlinearity [19], [20], [21], or non-Gaussian noise [20], [22], to gain even more causal information. Most of the aforementioned methods conduct the es- timation of the causal structures in somewhat similar ways. For example [15], [16], [17], [20], [22] use the (partial correlations of the) V AR residuals to either test independence or as in- put to a causal sear ch algorithm, e.g., LiNGAM (Linear Non-Gaussian Acyclic Model) [23], PC (“P” stands for Peter , and “C” for Clark, the authors) [24]. In general these causal search algorithms are solely based on a single run of model learning which is notoriously insta- ble: small changes in ﬁnite data samples can lead to entirely differ ent inferred structur es. This implies that, some approaches might not be robust enough to correctly estimate causal models from various data, especially when the data set is noisy or has small sample size. In the present paper , we introduce a r obust causal modeling algorithm for longitudinal data that is designed to resolve the instability inherent to structur e learning. W e refer to our method as S3L, a shorthand for Stable Speciﬁ- cation Search for Longitudinal data. It extends our previous method [25], here referr ed to as S3C, which is designed for cross-sectional data. S3L is a general framework which subsamples the original data into many subsets, and for each subset S3L heuristically searches for Par eto optimal models using a multi-objective opti- mization approach. Among the optimal mod- els, S3L observes the so-called relevant causal structur es which represent both stable and par- simonious model structures. These steps con- stitute the structure estimation of S3L which is fundamentally dif ferent fr om the aforemen- tioned approaches that mostly use a single run for model estimation. For completeness, detail about S3C/L is described in Section 2. More- over , in the default setting S3L assumes some underlying contexts: iid samples for each time slice (lag), linear system, additive independent Gaussian noise, causal sufﬁciency (no latent variables), stationary (time-invariant causal re- lationships), and fairly uniform time intervals between time slices. The main contributions of S3L are: • The causal structure estimation of S3L is conducted through multi-objective opti- mization and stability selection [26] over optimal models, to optimize both the sta- bility and the parsimony of the model structur es. • S3C/L is a general framework which allows for other causal methods with all of their corresponding assumptions, e.g., nonlinearity , non-Gaussianity , to be plugged in as model r epresentation and estimation. The multi-objective search and the stability selection part are independent of any mentioned assumptions. • In the default model repr esentation, S3L adopts the idea of the “rolling” model from [27] to transform a longitudinal SEM model with an arbitrary number of time slices into two parts: a baseline model and a transition model. The baseline model captures the causal relationships at base- line observations, when subjects enter the study . The transition model consists of two 3 time slices, which essentially represent the possible causal relationships within and across time slices. W e also describe how to reshape the longitudinal data corr espond- ingly , so as to match the transformed lon- gitudinal model which then can easily be scored using standar d SEM software. • W e provide standardized causal effects which are computed from IDA estimates [28]. • W e carry out experiments on three differ - ent real-world data of (a) patients with chronic fatigue syndrome (CFS), (b) pa- tients with Alzheimer disease (AD), and (c) patients with chronic kidney disease (CKD). Some relevant methods, have attempted to make use of common structures to infer causal models. Causal Stability Ranking (CStaR) [29], originally designed for gene expression data, tries to ﬁnd stable rankings of genes (covari- ates) based on their total causal effect on a speciﬁc phenotype (response), using a subsam- pling procedure similar to stability selection and IDA to estimate causal effects. As CStaR only focuses on r elationships from all covari- ates to a single speciﬁc response, it seems to be difﬁcult to generalize it to other do- mains where any possible causal relationship may be of inter est. Moreover , another ap- proach called Group Iterative Multiple Model Estimation (GIMME), originally developed for functional Magnetic Resonance Imaging (fMRI) data and essentially an extension of extended uniﬁed SEM (combination of V AR and SEM) [30], aims to combine the group-level causal structur es with the individual-level structur es, resulting in a causal model for each individ- ual which contains common structur es to the group. Such subject-speciﬁc estimation may be feasible given relatively long time series (as in resting state fMRI), but likely too challenging for the typical longitudinal data in clinical stud- ies with a limited number of time slices per subject. Still in the domain of fMRI, there is a method called Independent Multiple-sample Greedy Equivalence Search (IMaGES) [31]. The method is a modiﬁcation of GES (described in the following paragraph), and designed to handle unexpected statistical dependencies in combined data. Since IMaGES was developed mainly for combining results of multiple data sets, we do not consider it further . Having both the transformed longitudinal model and the reshaped data, we can run other alternative approaches which are designed for cross-sectional data and conduct comprehen- sive comparisons. Her e, for evaluation of S3L, we generate simulated data and compare with some advanced constrained-based approaches such as PC-stable [32], Conservative PC (CPC) [33], CPC-stable [32], [33], and PC-Max [34]. All of these methods are extensions of the PC algo- rithm which in principle consists of two stages. The ﬁrst stage uses conditional independence tests to obtain the skeleton (undirected edges) of the model, and the second stage orients the skeleton based on some rules, r esulting in an essential graph or markov equivalence class model (described in Section 2.1, for more detail see [35]). W e also compare with an advanced scor e- based algorithm called Fast Greedy Equiva- lent Search (FGES) [36]. It is an extension of GES which in general starts with an empty (or sparse) model, and iteratively adds an edge (forward phase) which mostly increases the score until no more edge can be added. Then GES iteratively prunes an edge (back- ward phase) which does not decrease/improve the score until no more edge can be excluded. The rest of this paper is organized as fol- lows. All methods used in our approach are presented in Section 2. The r esults and the cor - responding discussions are presented in Sec- tion 3. Finally , conclusions and future work are presented in Section 4. 2 M E T H O D S 2.1 Stable Speciﬁcation Searc h for Cross- Sectional data In [25] we intr oduced our pr evious work, S3C, which searches over structur es r epresented by SEMs. In SEMs, reﬁning models to impr ove the model quality is called speciﬁcation sear ch . Generally S3C adopts the concept of stability selection [26] in order to enhance the robust- ness of structure learning by considering a whole range of model complexities. Originally , 4 in stability selection, this is realized by varying a continuous regularization parameter . Here, we explicitly consider differ ent discrete model complexities. However , to ﬁnd the optimal model structur e for each model complexity is a hard optimization problem. Therefor e, we rephrase stability selection as a multi-objective optimization pr oblem, so that we can jointly run over the whole range of model complexi- ties and ﬁnd the corresponding optimal struc- tures for each model complexity . In more detail, S3C can be divided into two phases. The ﬁrst phase is sear ch , perform- ing exploratory search over Structural Equa- tion Models (SEMs) using a multi-objective evolutionary algorithm called Non-dominated Sorting Genetic Algorithm II (NSGA-II) [37]. NSGA-II is an iterative procedur e which adopts the idea of evolution. It starts with random models and in every generation (iteration), at- tempts to impr ove the quality of the models by manipulating (r eﬁning) good models (parents) to make new models (offsprings). The quality of the models is characterized by scoring that is based on two conﬂicting objectives: model ﬁt with r espect to the data and model complexity . The model manipulations ar e realized by using two genetic operators: crossover that combines the structur es of parents and mutation that ﬂips the structur es of models. Mor eover , the composition of model population in the next generation is determined by selection strategy . One of the key features of NSGA-II is that in every iteration, it sorts models based on the concept of domination , yielding fronts or sets of models such that models in front l dominate those in fr ont l + 1 . The domination concept states that model m 1 is said to dominate model m 2 if and only if model m 1 is no worse than m 2 in all objectives and the model m 1 is strictly better than m 2 in at least one objective. The ﬁrst front of the last generation is called the Pareto optimal set, giving optimal models for the whole range of model complexities. Details of the NSGA-II algorithm are described in Deb et al [37]. Based on the idea of stability selection [26], S3C subsamples N subsets from the data D with size b| D | / 2 c without replacement, and for each subset, the search phase above is applied, giving sets of Pareto optimal models. After that, all Pareto optimal models are transformed into their corresponding Markov equivalence classes which can be represented by Completed Partially Directed Acyclic Graphs (CPDAGs) [35]. Since all DAGs that are a member of the same Markov equivalence class repr esent the same probability distribution, they are indistinguish- able based on the observational data alone. In SEMs, these models are called covariance equivalent [38] and return the same scores. From these CPDAGs we compute the edge and causal path stability graphs (see Figure 7 for an example) by grouping them accor ding to model complexity and computing their selec- tion pr obability , i.e., the number of occurr ences divided by the total number of models for a certain level of model complexity . The edge stability considers any edge between a pair of variables (i.e., A → B , B → A , or A − B ) and the causal path stability considers dir ected path, e.g., A → B of any length. Stability selection is then performed by specifying two thresholds, π sel (boundary of selection pr oba- bility) and π bic (boundary of complexity). For example, setting π sel = 0 . 6 means that all causal relationships with edge stability or causal path stability greater than or equal to this threshold are considered stable . The second thr eshold π bic is used to contr ol overﬁtting. For every model complexity j , we compute the Bayesian Infor- mation Criterion (BIC) score for each model in j based on the data subset to which the model is ﬁtted. W e then compute B I C j the average of BIC scores in model complexity j . W e set π bic to the minimum B I C j . All causal relationships with an edge stability or a causal path stability that is smaller than or equal to π bic (e.g., π bic = 27 in Figur e 7c) are considered parsimonious . Hence, the causal relationships greater than or equal to π sel and smaller than or equal to π bic are considered both stable and parsimonious and called relevant from which we can derive a causal model. In addition, we call the region with which the relevant structur es intersect as r elevant region. The second phase concerns visualization , combining the stability graphs into a graph with nodes and edges. This is done by adding the r elevant edges and orienting them using 5 prior knowledge described in Section 2.2.2) and the relevant causal paths. Mor e speciﬁcally , we ﬁrst connect the nodes following the relevant edges. Then we orient these edges based on the prior knowledge. And ﬁnally , we orient the r est of the edges following the relevant causal paths. The resulting graph consists of directed edges which repr esent causal r elation- ship and possibly with additional undirected edges which repr esent strong association but for which the direction is unclear from the data. Furthermore, following Meinshausen and B ¨ uhlmann [26], for each edge in the graph we take the highest selection pr obability it has across dif ferent model complexities in the relevant region of the edge stability graph as a measure of reliability and annotate the corresponding edge with this reliability score. The reliability score indicates the conﬁdence of a particular relevant structure. The higher the score, the more we can expect that the relevant structur e is not falsely selected [26]. In addition each dir ected edge is annotated with a standardized causal effect estimate which is explained in Section 2.2.3. The stability graphs are considered to be the main outcome of our approach where the visualization eases inter- pretation. 2.2 Stable speciﬁcation sear ch for longitu- dinal data Stable Speciﬁcation Search for Longitudinal data (S3L) is an extension of S3C. In principle, as illustrated in Figur e 1, S3L applies S3C on transformed longitudinal models, called base- line and transition models (explained in Sec- tion 2.2.1). Furthermor e, in order to see to which extent a covariate would cause a re- sponse, S3L provides standardized total causal effect estimates which are intrinsically com- puted fr om estimates from IDA [28] (described in Section 2.2.3). In the following subsections, we ﬁrst describe how we transform a longi- tudinal model and reshape the data accord- ingly , and then we discuss the implication of allowing prior knowledge in our S3C structur e learning. 2.2.1 Longitudinal model and data reshaping Based on the idea of a “r olling” network in [27] we transform a longitudinal SEM with an ar- bitrary number of time slices (e.g., Figure 2c) into two parts: a baseline model (Figure 2a) and a transition model (Figure 2b). In the original paper , the authors treat these models as prob- abilistic networks, here we treat them purely as SEMs. The baseline model essentially r ep- resents the causal relationships between vari- ables that may happen at the initial time slice t 0 , for instance, causal r elationships that oc- cur before a medical treatment started. More- over , the baseline model may also repr esent relationships of the unobserved process before t 0 [27]. The transition model constitutes the causal relationships between variables across time slices t i − 1 and t i , and between variables within time slice t i for i > 0 , for example, causal relationships that repr esent interactions during a medical treatment. In S3L, the structur e esti- mations will be conducted on the baseline and transition model separately . From the transition model we distinguish two kinds of causal relationships, namely intra- slice causal r elationship (e.g., solid arcs in Figure 2b), and inter-slice causal r elationship (e.g., dashed arcs in Figure 2b). The intra- slice causal relationship repr esents relation- ships within time slice t i . Accordingly the inter - slice causal relationship repr esents relation- ships between time slices t i − 1 and t i . W e assume that the inter-slice causal relationships are in- dependent of t (stationary). W e also assume that the time intervals between time slices are fairly uniform. In addition, the transition model implies two more constraints (explained in Section 2.2.2): there is no intra-slice causal relationship allowed in time slice t i − 1 and the inter-slice causal relationships always go for- ward in time, i.e., from time slice t i − 1 to time slice t i . Moreover , in order to score the transformed models, we reshape the longitudinal data ac- cordingly . Figure 3 shows an illustration of the data reshaping. Suppose we are given longitu- dinal data with s instances, p variables, and i time slices. W e assume that the original data shape is in a form of a matrix D of size s × q , 6 S3L Reshaped Data BaselineData Longitudinal Data S3C A C B Baseline Model A A B B Transition Model α / β  α / β  C C Fig. 1: Given a longitudinal data set, S3L uses the baseline observations to infer a baseline model, and reshapes the whole data set to infer a transition model. Both baseline and transition model are annotated with a reliability score α and a standardized causal ef fect β . 𝑋 1 𝑋 2 𝑋 3 𝑋 4 𝑡 0 (a) 𝑋 1 𝑋 2 𝑋 3 𝑋 4 𝑋 1 𝑋 2 𝑋 3 𝑋 4 𝑡 𝑖 − 1 𝑡 𝑖 (b) 𝑋 1 𝑋 2 𝑋 3 𝑋 4 𝑋 1 𝑋 2 𝑋 3 𝑋 4 𝑋 1 𝑋 2 𝑋 3 𝑋 4 𝑡 0 𝑡 1 𝑡 2 (c) Fig. 2: (a) The baseline model which is used to capture causal relationships at the initial time slice, e.g., before medical tr eatment. (b) The transition model which is used to represent causal r elationships within and between time slices, e.g., during medical treatment. (c) The corresponding “unrolled” longitudinal model. with q = p × i . The reshaped data is then a matrix D 0 of size s 0 × q 0 , with s 0 = s ( i − 1) and q 0 = 2 p . Having such r eshaped data allows us to use standard SEM software to compute the scores. 2.2.2 Constrained SEM In practice, we are often given some prior knowledge about the data. The prior knowl- edge which may be, e.g., results of previous studies, gives us some constraints in terms of causal r elations. For example, in the case of, say disease A , ther e exists some common knowledge which tells us that symptom S does not cause disease A directly . In terms of a SEM speciﬁca- tion, the prior knowledge can be translated into a constrained SEM in which ther e is no dir ected edge from variable S (denotes symptom S ) to variable A (denotes disease A ); this still allows for directed edges fr om A to S or directed paths (indirect relationships) from S to A , e.g., a path S → ... → A with any variables in between. S3C, and hence S3L allow for such prior knowledge to be included in the model. In S3L, this prior knowledge only applies to the intra-slice causal relationships. 7         ...         ...                 r e s h a pe  󰆒  Fig. 3: D is a matrix r epresenting the original data shape which consists of s instances, p variables, and i time slices. D 0 is a matrix repr esenting the corresponding reshaped data. Model speciﬁcations should comply with any prior knowledge when performing spec- iﬁcation search and when measuring the edge and causal path stability . Recall that in order to measur e the stability , all optimal models (DAGs) ar e converted into their correspond- ing equivalence class models (CPDAGs). This model transformation, however , could r esult in CPDAGs that are inconsistent with the prior knowledge. For example, a constraint A 6→ B may be violated since arcs B → A in the DAG may be converted into undirected (reversible) edges A − B in the CPDAG. In or der to preserve constraints, we therefor e extended an efﬁcient DAG-TO-CPDAG algorithm of Chickering [35], as described in Rahmadi et al [25]. Essentially , the motivation of our extension to Chickering’s algorithm is similar to that of Meek’s algorithm [39], that is, to obtain a CPDAG consistent with prior knowledge. 2.2.3 Estimating causal eff ects W e employ IDA [28] to estimate the total causal effects of a covariate X i on a response Y from the relevant structur es. This method works as follows. Given a CPDAG G = { G 1 , . . . , G m } which contains m differ ent DAGs in its equiv- alence class, IDA applies intervention calculus [38], [40] to each DAG G j to obtain multisets Θ i = { θ ij } j ∈ 1 ,...,m , i = 1 , . . . , p , wher e p is the number of covariates. θ ij speciﬁes the possible causal effect of X i on Y in graph G j . Causal ef fects can be computed using so- called intervention calculus [38], which aims to determine the amount of change in a response variable Y when one would manipulate the covariate X i (and not the other variables). Note that this notion differs from a regression-type of association (see IDA paper for illustrative examples). Given a DAG G j , the causal effect θ ij can be computed using the so-called back- door adjustment, which takes into account the associations between Y , X i and the parents pa i ( G j ) of X i in G j . Under the assumption that the distribution of the data is normal and the model is linear , causal effects can be computed from a regr ession of Y on X i and its parents. Speciﬁcally , we have Maathuis et al., [28] θ ij = β i | pa i ( G j ) , where, for any set S ⊆ { X 1 , . . . , X p , Y } \ { X i } , β i | S = ( 0 , if Y ∈ S coefﬁcient of X i in Y ∼ X i + S, if Y 6∈ S, and Y ∼ X i + S is the linear regr ession of Y on X i and S . Note that IDA estimates the total causal effect from a covariate and r esponse, which considers all possible, either direct or indirect, causal paths from the covariate to the response. 8 IDA works for continuous, normally dis- tributed variables and then only requir es their observed covariance matrix as input to compute the regression coefﬁcients. Following Drasgow [41], we tr eat discrete variables as sur- rogate continuous variables, substituting the polychoric correlation for the correlation be- tween two discrete variables and the polyserial correlation between a discrete and a continuous variable. Our ﬁtting procedur e does not yield a sin- gle CPDAG, but a whole set of CPDAGs to repr esent the given data. W e therefore extend IDA as follows. W e gather G π bic , the CPDAGs of all optimal models with complexity equal to π bic . For each CPDAG G ∈ G π bic , we compute the possible causal effects Θ of each relevant causal path using IDA. For example, for the causal effect from X to Y , we obtain estimates Θ k X → Y , k = 1 , . . . , N , wher e N is the number of subsets. All causal ef fect estimations in Θ k X → Y are then concatenated into a single multiset Θ X → Y . T o repr esent the estimated causal effects fr om X to Y , we compute the median ˜ Θ X → Y and iff X and Y ar e continuous variables, we stan- dardize the estimation using ˜ Θ X → Y · σ X σ Y , where σ X and σ Y are the standard deviations of the covariate and the response, r espectively . Standardized causal effects allow us to mean- ingfully compare them. 3 R E S U L T S A N D D I S C U S S I O N 3.1 Implementation W e implemented S3C and S3L as an R pack- age named stablespec . The package is pub- licly available at the Comprehensive R Ar chive Network (CRAN) 1 , so it can be installed di- rectly , e.g., from the R console by typing install.package("stablespec") or from RStudio. W e also included a package documen- tation as a brief tutorial on how to use the functions. 1. https://cran.r- project.org/web/packages/stablespec/ index.html 3.2 P arameter settings For application to simulated data and real- world data, we subsampled 50 and 100 subsets from the data with size b| D | / 2 c , respectively . W e did not do comprehensive parameter tun- ing for NSGA-II, instead, we followed guide- lines provided in Grefenstette [42]. The param- eters for applications to both simulated and real-world data were set as follows: the number of iterations was 35 , the number of models in the population was 150 , the probability of applying crossover was 0 . 85 , the probability of applying mutation to a model structur e was 0 . 07 , and the selection strategy was binary tour- nament selection [43]. W e scor e models using the chi-squar e χ 2 and the model complexity . The χ 2 is considered the original ﬁt index in SEM and measures how close the model-implied co- variance matrix is to the sample covariance ma- trix [44]. The model complexity r epresents how many parameters (ar cs) need to be estimated in the model. The maximum model complexity with p variables is given by p ( p − 1) / 2 . When using multi-objective optimization we minimize both the χ 2 and model complexity objectives. These two objectives are, however , conﬂicting with each other . For example, min- imizing the model complexity typically means compromising the data ﬁt. 3.3 Application to simulated data 3.3.1 Data Generation W e generated data sets from a longitudinal model containing four continuous variables and thr ee time slices (depicted by Figure 4). W e generated ten data sets for each of these sample sizes: 400 and 2000 , with random pa- rameterizations. All data sets ar e made publicly available. 2 3.3.2 P erformance measure W e conducted comparisons between S3L with FGES, PC-stable, CPC, CPC-stable, and PC- Max in two dif ferent scenarios: with and with- out prior knowledge about part of the causal directions. In the case of prior knowledge, we added that variable X 1 at t i cannot cause 2. A vailable at https://tinyurl.com/smmr- rahmadi- dataset 9 variables X 2 and X 3 at t i directly . This prior knowledge translates to constraints that the various methods can use to restrict their search space. In addition to both scenarios, we also added longitudinal constraints to the models of FGES, PC-stable, CPC, CPC-stable, and PC- Max the same as those used in the transition model of S3L, i.e., there is no intra-causal re- lationship from time t i − 1 and the inter-slice causal relationships always go forward in time t i − 1 to t i . The parameters of FGES, PC-stable, CPC, CPC-stable, and PC-Max used in this simula- tion are set following some existing examples [28], [45], [46]. For FGES, the penalty of BIC score is 2 and the vertex degree in the forward search is not limited. For PC-stable, CPC, CPC- stable, and PC-Max, the signiﬁcance level when testing for conditional independence is 0 . 01 , and the maximum size of the conditioning sets is inﬁnity . Moreover , as the true model is known, we measure the performance of all approaches by means of the Receiver Operating Characteristic (ROC) [47] for both edges and causal paths. W e compute the T rue Positive Rate (TPR) and the False Positive Rate (FPR) based on the CPDAG of the true model. As for example, in the case of edge stability , a true positive means that an edge obtained by our method or the other approaches is pr esent in the CPDAG of the ground truth. T o compare the ROC curves of our method  1  2  3  4  1  2  3  4  1  2  3  4  0  1  2 Fig. 4: The longitudinal model with four vari- ables and three time slices, used to generate simulated data. and those of alternative approaches, we em- ployed three signiﬁcance tests. The ﬁrst two tests, as intr oduced in [48] and in [49] compare the Ar ea Under the Curve (AUC) of the ROC curves by using the theory of U-statistics and bootstrap r eplicates, r espectively . The thir d test [50] compar es the actual ROC curves by eval- uating the absolute differ ence and generating rank-based permutations to compute the sta- tistical signiﬁcance. The null hypothesis is that (the AUC of) the ROC curves of our method and those of alternative appr oaches ar e identi- cal. Furthermore, we computed the ROC curves using two dif ferent schemes: averaging and individual. Both schemes are applied to all methods and to all data sets generated. In the averaging scheme, the ROC curves ar e com- puted from the average edge and causal path stability from different data sets, and then the statistical signiﬁcance tests are applied to these ROC curves. On the other hand, in the indi- vidual scheme the ROC curves ar e computed from the edge and causal path stability on each data set. W e then applied individual statistical signiﬁcance tests on the ROC curves for each data set and used Fisher ’s method [51], [52], to combine these test results into a single test statistic. The experimental designs (with and with- out prior knowledge) and the ROC schemes (averaging and individual) are aimed to show empirically and compr ehensively how r obust the results are of each approach in various practical cases as well as against changes in the data. 3.3.3 Discussion W e ﬁrst discuss the result of our experiments on the data set with sample size 400 . Figur e 5 shows the ROC curves for the edge stability ((a) and (c)) and the causal path stability ((b) and (d)) from the averaging scheme. Panels (a) and (b) represent the results without prior knowledge, while panels (c) and (d) repr esent the r esults with prior knowledge. T able 3 lists the corresponding AUCs. T ables 1 and 2 present the results of the signiﬁcance tests for both the averaging and individual schemes in the experiment with and 10 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 FPR TPR PC−stable CPC FGES S3L CPC−stable PC−Max (a) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 FPR TPR PC−stable CPC FGES S3L CPC−stable PC−Max (b) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 FPR TPR PC−stable CPC FGES S3L CPC−stable PC−Max (c) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 FPR TPR PC−stable CPC FGES S3L CPC−stable PC−Max (d) Fig. 5: Results from simulation data with sample size 400: ROC curves for (a) the edge stability and (b) the causal path stability (without prior knowledge), and (c) the edge path stability and (d) the causal path stability (with prior knowledge), for differ ent values of π sel in the range of [0 , 1] . T able 3 lists the corresponding AUCs. without prior knowledge, respectively . In the case without prior knowledge, generally the AUCs of the edge and the causal path stability of S3L are better ( p -value ≤ 0 . 05 , or even ≤ 0 . 001 , few of them ar e mar ginally signiﬁ- cant, e.g., p -value ≤ 0 . 1 ) than those of other approaches according to both schemes, except those of FGES for which generally there is no evidence of a dif ference ( p -value > 0 . 1 ). In the case with prior knowledge, in general the results are similar to those of experiment without prior knowledge, but now the AUC of the causal path stability of S3L is better ( p - value ≤ 0 . 05 ) than that of FGES. The ROC of the causal path stability of S3L is now also better ( p -value ≤ 0 . 05 ) than those of PC-stable, CPC, CPC-stable, and PC-Max according to the individual scheme. This is an improvement over the experiment without prior knowledge. Next we discuss the result of our experi- ments on the data set with sample size 2000 . Figure 6 shows the ROC curves and T able 6 lists the corresponding AUCs. T ables 4 and 5 list the results of the signiﬁcance tests for both the averaging and individual schemes in the experiment with and without prior knowl- edge, respectively . In the case without prior knowledge, generally the AUCs of the edge and the causal path stability of S3L are better than ( p -value ≤ 0 . 05 ) those of other appr oaches according to the individual scheme. Moreover , the ROCs of the edge and the causal path stability of S3L are better than those of FGES ( p -value ≤ 0 . 001 ) and CPC-stable ( p -value ≤ 11 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 FPR TPR PC−stable CPC FGES S3L CPC−stable PC−Max (a) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 FPR TPR PC−stable CPC FGES S3L CPC−stable PC−Max (b) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 FPR TPR PC−stable CPC FGES S3L CPC−stable PC−Max (c) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 FPR TPR PC−stable CPC FGES S3L CPC−stable PC−Max (d) Fig. 6: Results fr om simulation data with sample size 2000: ROC curves for (a) the edge stability and (b) the causal path stability (without prior knowledge), and (c) the edge path stability and (d) the causal path stability (with prior knowledge), for differ ent values of π sel in the range of [0 , 1] . T ables 6 lists the corresponding AUCs. 0 . 1 ), r espectively , according to the individual scheme. In the case with prior knowledge, the results are pretty much similar to those of the experiment without prior knowledge, but only now the p -value tends to become smaller , e.g., ( p -value ≤ 0 . 001 ). T o conclude, we see that in general S3L attains at least comparable performance as, but often a signiﬁcant improvement over , alterna- tive appr oaches. This holds in particular for causal directions and in the case of a small sample size. The presence of prior knowledge enhances the performance of the S3L. 3.4 Application to real-world data Here the true model is unknown, so we can only compare the results of S3L with those reported in earlier studies and interpr etation by medical experts. W e set the thresholds to π sel = 0 . 6 and π bic to the model complexity where the minimum average of BIC scores is found. By thresholding we get the relevant causal relationships: those which occur in the relevant r egion. Details of the pr ocedure are given in Section 2.1. The model assumptions in the application to real-world data follow from the assumptions of S3L in the default setting. The assumptions include iid samples on each time slice, linear system, independent gaussian noise, no latent variables, stationary , and fairly uniform time intervals between time slices. Moreover , ther e is an important note related to the visualization of the stability graphs. 12 T ABLE 1: T able of p -values from comparisons on data set with sample size 400 between S3L and alternative approaches without prior knowledge. The null hypothesis is that (the AUC of) the ROC curves of S3L and those of alternative approaches are equivalent. For each signiﬁcance test, we compared the ROC of the edge (Edge) and causal path (Causal) stability (see Figure 5a and 5b) on both averaging (A vg.) and individual (Ind.) schemes. FGES PC-stable CPC CPC-stable PC-Max Signiﬁcance test A vg. Ind. A vg. Ind. A vg. Ind. A vg. Ind. A vg. Ind. DeLong [48] Edge 0 . 315 0 . 909 0 . 021 < 10 − 5 0 . 025 < 10 − 5 0.052 < 10 − 5 0 . 050 < 10 − 5 Causal 0 . 451 0 . 109 0.069 < 10 − 5 0 . 825 < 10 − 5 0 . 012 < 10 − 5 0 . 126 < 10 − 5 Bootstrap [49] Edge 0 . 331 0 . 935 0 . 020 < 10 − 5 0 . 024 < 10 − 5 0.051 < 10 − 5 0 . 049 < 10 − 5 Causal 0 . 466 0.090 0.063 < 10 − 5 0 . 830 < 10 − 5 0 . 010 < 10 − 5 0 . 121 < 10 − 5 V enkatraman [50] Edge 0 . 304 0 . 906 0.091 0 . 102 0.076 0 . 118 0 . 359 0 . 516 0 . 449 0 . 743 Causal 0 . 332 0 . 197 0 . 831 0 . 225 0 . 845 0 . 365 0 . 569 0 . 512 0 . 584 0 . 131 T ABLE 2: T able of p -values from comparisons on data set with sample size 400 between S3L and alternative approaches with prior knowledge. The null hypothesis is that (the AUC of) the ROC curves of S3L and those of alternative approaches are equivalent. For each signiﬁcance test, we compared the ROC of the edge (Edge) and causal path (Causal) stability (see Figure 5c and 5d) on both averaging (A vg.) and individual (Ind.) schemes. FGES PC-stable CPC CPC-stable PC-Max Signiﬁcance test A vg. Ind. A vg. Ind. A vg. Ind. A vg. Ind. A vg. Ind. DeLong [48] Edge 0.090 0 . 146 0.086 < 10 − 3 0.099 < 10 − 5 0 . 219 0 . 001 0 . 227 0 . 002 Causal 0 . 264 0 . 003 0.061 < 10 − 5 0 . 035 < 10 − 5 0 . 022 < 10 − 5 0 . 031 < 10 − 5 Bootstrap [49] Edge 0 . 118 0 . 188 0.084 < 10 − 5 0.099 < 10 − 5 0 . 208 < 10 − 3 0 . 223 0 . 001 Causal 0 . 251 0 . 002 0.060 < 10 − 5 0 . 031 < 10 − 5 0 . 020 < 10 − 5 0 . 026 < 10 − 5 V enkatraman [50] Edge 0 . 430 0 . 598 0.056 0 . 680 0 . 103 0 . 543 0 . 680 0 . 998 0 . 707 0 . 998 Causal 0 . 637 0 . 783 0 . 485 0 . 004 0.069 < 10 − 3 0 . 116 0.094 0 . 171 0 . 007 T ABLE 3: T able of AUCs for the edge and causal path stability for each method, fr om simulation on data with sample size 400 , with (yes) and without prior knowledge (no). S3L FGES PC-stable CPC CPC-stable PC-Max AUC no yes no yes no yes no yes no yes no yes edge 0 . 80 0 . 74 0 . 83 0 . 81 0 . 63 0 . 60 0 . 63 0 . 62 0 . 63 0 . 65 0 . 63 0 . 65 causal 0 . 92 0 . 96 0 . 90 0 . 93 0 . 84 0 . 90 0 . 92 0 . 89 0 . 78 0 . 84 0 . 85 0 . 85 13 T ABLE 4: T able of p -values from comparisons on data set with sample size 2000 between S3L and alternative approaches without prior knowledge. The null hypothesis is that (the AUC of) the ROC curves of S3L and those of alternative approaches are equivalent. For each signiﬁcance test, we compared the ROC of the edge (Edge) and causal path (Causal) stability (see Figure 6a and 6b) on both averaging (A vg.) and individual (Ind.) schemes. FGES PC-stable CPC CPC-stable PC-Max Signiﬁcance test A vg. Ind. A vg. Ind. A vg. Ind. A vg. Ind. A vg. Ind. DeLong [48] Edge 1 . 000 0.099 0 . 223 0 . 010 0 . 320 0 . 014 0.071 0 . 001 0 . 118 0 . 009 Causal 0.052 < 10 − 5 0 . 563 < 10 − 3 0 . 353 < 10 − 5 0 . 221 < 10 − 5 0 . 952 0 . 183 Bootstrap [49] Edge 1 . 000 0 . 103 0 . 222 0 . 007 0 . 321 0 . 010 0.077 0 . 003 0 . 103 0 . 006 Causal 0 . 045 < 10 − 5 0 . 554 < 10 − 3 0 . 357 < 10 − 5 0 . 202 < 10 − 5 0 . 952 0 . 161 V enkatraman [50] Edge 0 . 480 0 . 963 0 . 187 0 . 801 0 . 212 0 . 872 0.069 0 . 900 0 . 100 0 . 972 Causal 0 . 418 < 10 − 3 0 . 404 0 . 637 0 . 289 0 . 339 0 . 726 0 . 897 0 . 520 0 . 250 T ABLE 5: T able of p -values from comparisons on data set with sample size 2000 between S3L and alternative approaches with prior knowledge. The null hypothesis is that (the AUC of) the ROC curves of S3L and those of alternative approaches are equivalent. For each signiﬁcance test, we compared the ROC of the edge (Edge) and causal path (Causal) stability (see Figure 6c and 6d) on both averaging (A vg.) and individual (Ind.) schemes. FGES PC-stable CPC CPC-stable PC-Max Signiﬁcance test A vg. Ind. A vg. Ind. A vg. Ind. A vg. Ind. A vg. Ind. DeLong [48] Edge 0 . 296 0 . 978 0 . 413 < 10 − 3 0 . 348 0 . 005 0 . 147 < 10 − 3 0 . 122 < 10 − 3 Causal 0 . 142 < 10 − 5 0 . 817 < 10 − 3 0 . 698 < 10 − 3 0 . 043 < 10 − 5 0 . 279 < 10 − 5 Bootstrap [49] Edge 0 . 295 0 . 983 0 . 412 < 10 − 3 0 . 344 0 . 002 0 . 146 < 10 − 5 0 . 125 < 10 − 5 Causal 0 . 144 < 10 − 5 0 . 833 < 10 − 3 0 . 706 < 10 − 3 0 . 043 < 10 − 5 0 . 279 < 10 − 5 V enkatraman [50] Edge 0 . 761 0 . 862 0 . 119 0 . 207 0 . 210 0 . 290 0 . 146 0.082 0 . 140 0 . 257 Causal 0 . 486 0 . 595 0 . 384 0 . 763 0 . 172 0 . 742 0 . 488 0 . 984 0 . 652 0 . 903 T ABLE 6: T able of AUCs for the edge and causal path stability for each method, from simulation on data with sample size 2000 , with (yes) and without prior knowledge (no). S3L FGES PC-stable CPC CPC-stable PC-Max AUC no yes no yes no yes no yes no yes no yes edge 0 . 77 0 . 73 0 . 77 0 . 78 0 . 67 0 . 67 0 . 69 0 . 65 0 . 62 0 . 61 0 . 65 0 . 60 causal 0 . 92 0 . 93 0 . 85 0 . 90 0 . 90 0 . 94 0 . 88 0 . 93 0 . 87 0 . 87 0 . 92 0 . 91 14 A DAG without edges will always be trans- formed into a CPDAG without edges. A fully connected DAG without prior knowledge will be transformed into a CPDAG with only undi- rected edges. However if prior knowledge is added, a fully connected DAG will be trans- formed into a CPDAG in which the edges corresponding to the prior knowledge are di- rected. From these observations it follows that in the edge stability graph all paths start with a selection probability of 0 and end up in a selection probability of 1 . In the causal path stability graph when no prior knowledge has been added all paths start with a selection probability of 0 and end up in a selection prob- ability of 0 . However , when prior knowledge is added some of the paths may end up in a selection probability of 1 because of the added constraints. 3.4.1 Application to chronic f atigue syndrome data Our ﬁrst application to real-world data consid- ers a longitudinal data set of 183 patients with chronic fatigue syndr ome (CFS) who received cognitive behavior therapy (CBT) [53]. Empirical studies have shown that CBT can signiﬁcantly reduce fatigue severity . In this study we focus on the causal relat ionships between cognitions and behavior in the process of reducing sub- ject’s fatigue severity . W e ther efore include six variables namely fatigue severity , the sense of control over fatigue, focusing on the symptoms, the objective activity of the patient ( oActivity ), the subject’s perceived activity ( pActivity ), and the physical functioning . The data set consists of ﬁve time slices wher e the ﬁrst and the ﬁfth time slices are the pre- and post-tr eatment ob- servations, respectively , and the second until the fourth time slices ar e observations during the treatment. The missing data is 8 . 7% and to impute the missing values, we used single imputation with Expectation Maximization (EM) in SPSS [54]. As all of the variables have large scales, e.g., in the range between 0 to 155 , we treat them as continuous variables. W e added prior knowledge that the variable fatigue at t 0 and t i does not cause any of the other variables directly . This is a common assumption made in the analysis of CBT in order to investigate the causal impact on fatigue severity [53], [55]. First we discuss the baseline model, which only considers the baseline causal r elation- ships. The corresponding stability graphs can be seen in Figur es 7a and 7b. As mentioned before, π sel is set to 0 . 6 and from the sear ch phase of S3L we found that π bic = 6 . Figur es 7a and 7b show that three relevant edges and two relevant causal paths were found. Following the visualization procedur e (see visualization phase in Section 2.1), we get a baseline model in Figure 8a. The model shows that pActivity is a direct cause for fatigue severity . This follows from the prior assumption that we made and is consistent with earlier works [53], [55]. This causal r elationship suggests that a r eduction of (perceived) activity , leads to an increase of fatigue. In addition we found a strong rela- tionship between pActivity and oActivity whose direction cannot be determined. This relation- ship is somewhat sensible as both variables measuring patient’s activity . W e also found a connection between focusing and control , which is not surprising as focusing on symptoms also depends on patient’s sense of contr ol over fatigue. One would expect that if a patient has less control on the fatigue, the focus on the symptom would increase. Next we discuss the transition model, which considers all causal relationships over time slices. The corresponding stability graphs are depicted in Figures 7c and 7d. W e set π sel = 0 . 6 and the sear ch phase of S3L yielded π bic = 27 . Figures 7c shows that nineteen relevant edges were found, consisting of eleven intra-slice (blue lines) and eight inter-slice relationships of which six are between the same variables (orange lines) and two are between dif ferent variables (black lines). Figur e 7d shows that thirty-ﬁve r elevant causal paths wer e found, consisting of twelve intra-slice (blue lines) and twenty-three inter-slice relationships of which six are between the same variables (orange lines) and seventeen are between differ ent vari- ables (black lines). Applying the visualization procedur e, we get the transition model in Fig- ure 8b. The model shows that all variables have intra-slice causal r elationships to fatigue severity . These r elationships are consistent with 15 0.0 0.2 0.4 0.6 0.8 1.0 Model Complexity Selection Probability 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 π bic π sel pActivity — fatigue oActivity — p A ctivity focusing — control (a) 0.0 0.2 0.4 0.6 0.8 1.0 Model Complexity Selection Probability 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 pActivity —> fatigue control —> fatigue π bic π sel (b) Model complexity Selection probability 0 2 4 6 8 10 13 16 19 22 25 28 31 34 37 40 43 46 49 0 0.2 0.4 0.6 0.8 1 intra−slice relationship inter−slice relationship (same variables) inter−slice relationship (different v ar iables) π bic π sel (c) Model complexity Selection probability 0 2 4 6 8 10 13 16 19 22 25 28 31 34 37 40 43 46 49 0 0.2 0.4 0.6 0.8 1 intra−slice relationship inter−slice relationship (same variables) inter−slice relationship (different v ar iables) π bic π sel (d) Fig. 7: The stability graphs of the baseline model in (a) and (b) and the transition model in (c) and (d) for chronic fatigue syndrome, with edge stability in (a) and (c), and causal path stability in (b) and (d). The relevant r egions, above π sel and left of π bic , contain the relevant structures. [53], [55], [56] which conclude that during the CBT an increase in sense of control over fatigue, physical functioning, and perceived physical activity , together with a decrease in focusing on symptoms lead to a lower level of fatigue severity . Inter estingly , the actual activity seems insufﬁcient to reduce fatigue severity [53], however , how the patient perceives his own activity does seem to help. Additionally , we also found that, with similar causal ef- fects, all variables (except pActivity and fatigue ) also cause the change in fatigue indir ectly via pActivity as an intermediate variable. This sug- gests that, as discussed in [53] an increase in perceived activity does seem important to explain the change in fatigue. The variables focusing and functioning also appear to be in- direct causes of changes in the level of fatigue severity . 3.4.2 Application to Alzheimer’ s disease data For the second application to real-world data, we consider a longitudinal data set about Alzheimer ’ s Disease (AD), which is provided by the Alzheimer ’ s Disease Neuroimaging Ini- tiative (ADNI) [57], and can be accessed at adni.loni.usc.edu. The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W . W einer , MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neu- ropsychological assessment can be combined to measur e the progr ession of mild cognitive impairment (MCI) and early AD. For up-to-date information see www .adni- info.or g. 16 0 . 9 4 c o n t r o l ( 𝑡 0 ) pA c t i v i t y ( 𝑡 0 ) o A c t i v i t y ( 𝑡 0 ) f un c t i o n i n g ( 𝑡 0 ) f o c us i n g ( 𝑡 0 ) f a t i g ue ( 𝑡 0 ) 0 . 9 6 / 0 . 3 9 0 . 6 5 (a) cont r ol ( 𝑡 𝑖 − 1 ) pA ct i v i t y ( 𝑡 𝑖 − 1 ) oA ct i v i t y ( 𝑡 𝑖 − 1 ) f uncti oni ng ( 𝑡 𝑖 − 1 ) f ocus i ng ( 𝑡 𝑖 − 1 ) f at i g ue ( 𝑡 𝑖 − 1 ) cont r ol ( 𝑡 𝑖 ) pA ct i v i t y ( 𝑡 𝑖 ) oA ct i v i t y ( 𝑡 𝑖 ) f uncti oni ng ( 𝑡 𝑖 ) f ocus i ng ( 𝑡 𝑖 ) f at i g ue ( 𝑡 𝑖 ) 1 / 0 .71 0 .9 5 / 0 .4 6 1 / 0 .61 1 / 0 .44 1 / 0 .68 1/ 0.77 1 / 0 .67 0 .76 / - 0 .42 1 / 0 .33 0 .75 / - 0 .09 1/ - 0 .4 0 0 .61 / 0 .02 1 / 0 .26 0 .91 / 0 .23 1/ - 0 .33 1/ - 0 .40 1/ - 0 .43 1 / 0 .34 1/ - 0 .52 (b) Fig. 8: (a) The baseline model and (b) the transition model of chr onic fatigue syndrome. The dashed line represents a strong relation between two variables but the causal direction cannot be determined from the data. Each edge has a reliability score (the highest selection pr obability in the relevant region of the edge stability graph) and a standar dized total causal effect estimation. For example, the annotation “ 1 / 0 . 71 ” repr esents a reliability score of 1 and a standardized total causal effect of 0 . 71 . Note that the standardized total causal effect repr esents not just the direct causal effect corresponding to the edge, but the total causal effect also including indirect effects . In the present paper we focus on patients with MCI, an intermediate clinical stage in AD [58]. Following Haight et al., [59] we include only the variables: subject’s cognitive dysfunc- tion ( ADAS-Cog ), hippocampal volume ( hip- pocampal vol ), whole brain volume ( brain vol ), and brain glucose metabolism ( brain glucose ). The data set contains 179 subjects with four continuous variables and six time slices. The ﬁrst time slice captures baseline observations and the next time slices are for the follow-up observations. The missing data is 22 . 9% and like in the application to CFS, we imputed the missing values using single imputation with EM. W e added prior knowledge that the vari- able ADAS-Cog at t 0 and t i does not cause any of the other variables directly . W e performed the sear ch over 100 subsamples of the original data set. First we discuss the baseline model which only considers the baseline causal r elation- ships. The corresponding stability graphs are shown in Figures 9a and 9b. π sel is set to 0 . 6 and the search phase of S3L found that π bic = 4 . Figures 9a and 9b show that four rele- vant edges and two r elevant causal paths were found. Following the visualization procedur e, we obtain the baseline model in Figur e 10a. W e found that an increase in both brain glucose metabolism and hippocampal volume causes reduction in subject’s cognitive dysfunction. These causal relations ar e consistent with ﬁnd- ings in [59] which also concluded that both brain glucose and hippocampal vol were inde- pendently related to ADAS-Cog (in our model, it is repr esented by independent dir ect causal paths). Additionally , str ong relations between hippocampal volume and brain volume seem plausible as they both measure the volume of the brain (partly and entirely). Next we discuss the transition model which considers all causal relationships across time slices. W e set π sel = 0 . 6 and the search phase of S3L yielded π bic = 12 . The corr esponding stability graphs can be seen in Figures 9c and 9d. W e found twelve relevant edges (see Fig- ure 9c), consisting of four intra-slice (blue lines) and eight inter-slice relationships of which four 17 0.0 0.2 0.4 0.6 0.8 1.0 Model Complexity Selection Probability 0 1 2 3 4 5 6 π bic π sel hippocampal_vol — brain_v ol hippocampal_vol — brain_glucose hippocampal_vol — AD AS−Cog brain_glucose — ADAS−Cog (a) 0.0 0.2 0.4 0.6 0.8 1.0 Model Complexity Selection Probability 0 1 2 3 4 5 6 hippocampal_vol —> AD AS−Cog brain_glucose —> ADAS−Cog π bic π sel (b) Model complexity Selection probability 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 0 0.2 0.4 0.6 0.8 1 intra−slice relationship inter−slice relationship (same variables) inter−slice relationship (different v ar iables) π bic π sel (c) Model complexity Selection probability 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 0 0.2 0.4 0.6 0.8 1 intra−slice relationship inter−slice relationship (same variables) inter−slice relationship (different v ar iables) π bic π sel (d) Fig. 9: The stability graphs of the baseline model in (a) and (b) and the transition model in (c) and (d) for Alzheimer ’s disease, with edge stability in (a) and (c), and causal path stability in (b) and (d). The relevant regions, above π sel and left of π bic , contain the relevant structures. are between the same variables (orange lines) and four are between different variables (black lines). Moreover , we found seventeen relevant causal paths (see Figure 9d), consisting of six intra-slice (blue lines) and eleven inter-slice relationships of which four are between the same variables (orange lines) and seven are between different variables (black lines). Ap- plying the visualization pr ocedure, we obtain the transition model in Figure 10b. In addition, the dir ection of the edge from brain glucose to brain vol follows because we do not allow cycles in our model. W e found that ther e are indirect and direct causal relationships from hippocampal vol and brain vol at both t i − 1 and t i to ADAS-Cog at t i . These particular causal rela- tionships support the hypothesis in [59] which says that any changes in both hippocampal volume and brain volume will cause short- term effects on a subject’s cognitive dysfunc- tion, both dir ect and indirect. In the original paper the authors suggested that the indirect causal relationship is through brain glucose , but our analysis also discovers a potential indirect effect through brain vol . Interestingly we found that a change in subject’s cognitive dysfunction in a previous time slice t i − 1 causes a reduction in brain volume in time slice t i . 3.4.3 Application to chronic kidney disease For the third application to r eal-world data, we consider a longitudinal data set about chronic kidney disease (CKD), provided by the MAS- TERPLAN study group [60]. The MASTER- PLAN study was initiated in 2004 as a ran- domized, contr olled trial studying the ef fect of intensiﬁed treatment with the aid of nurse practitioners on car diovascular and kidney out- 18 br a i n _g l uc o se ( 𝑡 0 ) h i ppo c a m pa l _v o l ( 𝑡 0 ) b r a i n _v o l ( 𝑡 0 ) A D A S - C o g ( 𝑡 0 ) 0 . 9 0 / - 0 . 2 8 0 . 9 8 / - 0 . 2 5 0 . 9 0 1 (a) br a i n _g l uc o se ( 𝑡 𝑖 − 1 ) h i ppo c a m pa l _v o l ( 𝑡 𝑖 − 1 ) b r a i n _v o l ( 𝑡 𝑖 − 1 ) A D A S - C o g ( 𝑡 𝑖 − 1 ) br a i n _g l uc o se ( 𝑡 𝑖 ) h i ppo c a m pa l _v o l ( 𝑡 𝑖 ) b r a i n _v o l ( 𝑡 𝑖 ) A D A S - C o g ( 𝑡 𝑖 ) 1 / 0 . 8 1 1 / 0 . 5 4 1 / 0 . 9 8 1 / 1 1/ - 0 . 0 8 0 . 8 5 / 0 0 . 6 7 / 0 . 5 4 0 . 7 5 / - 0 . 2 5 0 . 7 1 / - 0 . 8 4 0 . 9 5 / - 0 . 2 2 0 . 8 3 / - 1 . 2 1 0 . 6 2 / - 0 . 0 4 (b) Fig. 10: (a) The baseline model and (b) the transition model of Alzheimer ’s disease. The dashed line repr esents a str ong relation between two variables but the causal direction cannot be determined from the data. Each edge has a reliability score (the highest selection pr obability in the relevant region of the edge stability graph) and a standar dized total causal effect estimation. For example, the annotation “ 1 / 0 . 81 ” repr esents a reliability score of 1 and a total standardized causal effect of 0 . 81 . Note that the standardized total causal effect repr esents not just the direct causal effect corresponding to the edge, but the total causal effect also including indirect effects . come in CKD. This intensiﬁed treatment regi- men addressed eleven possible risk factors for the progr ession of CKD simultaneously . The study previously showed that this intensiﬁed treatment r esulted in fewer patients r eaching end stage kidney disease compared to standard treatment [60]. Here we focus on the potential causal me- diators for the protective effect incurred by the intensiﬁed tr eatment with the aid of nurse practitioners. In other words, we aim to iden- tify which of the treatment targets contributed to the observed overall treatment effect. In the pr esent analysis, we include only vari- ables of interest, being treatment status, either nurse practitioner aided care or standard care, as allocated by the randomization procedur e ( treatment ), estimated glomerular ﬁltration rate ( gfr )—a marker for overall kidney function, and a variable indicating informative censor- ing ( inf cens ). Informative censoring occurred when patients reached end stage kidney dis- ease requiring r enal replacement therapy , such as dialysis or a kidney transplantation, or when they died. Furthermore, we consider ed treat- ment targets that were previously hypothe- sized to contribute most to the overall treat- ment effect: systolic blood pressur e ( sbp ), LDL- cholesterol ( ldl ) and parathyroid hormone ( pth ) concentrations in blood, and pr otein excretion via urine ( pcr ). In total, there are 497 sub- jects with seven variables (both continuous and discrete) over ﬁve time slices. The ﬁrst time slice contains the baseline observations taken before treatment, and the next time slices ar e the follow-up observations during tr eatment. Particularly we set the variable treatment only at t i − 1 as it r emains the same over all time slices, and the variable inf cens only at t i as it is a consequence of previous treatment. W e further added the prior knowledge that gfr at t i does not directly cause any other variables, and that there ar e no relations between any variable and inf cens within t i . Both gfr and inf cens are read-out for CKD progr ession and ar e within a time slice always the consequence and never the cause of another variable. However , we relax this prior knowledge at time slice t 0 as it is a common assumption that without the treatment, pth is a consequence of poor kidney function. The missing data is 5 . 2% and a single imputation with EM was conducted to impute the missing values like in applications to CFS and ADNI data. W e performed the search over 19 100 subsamples of the original data set. First we discuss the baseline model, which only considers the baseline causal r elation- ships. Figures 11a and 11b depict the corre- sponding stability graphs. As in applications to CFS and ADNI data, π sel is set to 0 . 6 and based on the sear ch phase of S3L we found that π bic = 2 . Figures 11a and 11b shows that two relevant edges were found. Applying the visualization procedur e, we get the baseline model in Figure 12a. W e found that both pth and pcr were associated with kidney function at baseline. The direction of these associations remains unclear . From renal physiology , we know that proteinuria may result in kidney damage. However , kidney damage and pr o- teinuria may be common consequences of hy- pertension at an earlier stage in the patient’s history . The association between parathyroid hormone and GFR is unsurprising, as calcium and phosphate metabolism is disrupted in pa- tients with advanced kidney disease. However , elevated pth may in turn r esult in further kid- ney damage by increased vascular calciﬁcation. In other words, the associations seem plau- sible from a physiological point of view , but the association may be in either direction. In the CKD example, a causal direction is almost impossible to ascertain when only using cross- sectional data. Next we discuss the transition model, which takes into account all causal r elationships across time slices. W e set π sel = 0 . 6 and found π bic = 23 . Based on Figure 11c, we obtained seventeen relevant edges, consisting of four intra-slice (blue lines) and thirteen inter -slice relationships of which ﬁve ar e between the same variables (orange lines) and eight are between differ ent variables (black lines). Based on Figure 11d, we obtained twenty-six relevant causal paths, consisting of ﬁve intra-slice (blue lines) and twenty-one inter-slice r elationships of which ﬁve are between the same variables (orange lines) and sixteen are between differ ent variables (black lines). Applying the visualiza- tion procedur e, we get the transition model in Figure 12b. Most of the intra-slice and inter - slice causal r elationships are very stable with selection probabilities close to 1 . W e found inter-slice causal relationships from gfr , sbp , pth , and pcr to inf cens . Furthermore, gfr , sbp , and pcr ar e well known determinants for CKD progr ession. The causal relationship from pth to inf cens was somewhat surprising. However , pth is a marker for r egulation of phosphate stores in the body and r elated to overall vascu- lar damage through vascular calciﬁcation, and may thereby be related to mortality . Indeed, literature indicates that lowering pth in dialysis patients resulted in a reduction in mortality [61]. The same may hold true for patients who have CKD and who do yet need dialysis treat- ment. Perhaps most surprising are the relations between sbp and pcr and gfr , r espectively . Fr om renal physiology we know that higher ﬁltra- tion pressur es due to higher blood pressur e causes the short term glomerular ﬁltration rate to incr ease slightly [62]. Likewise, at higher ﬁltration pressur e, more and larger proteins are pushed out of the blood stream and into the pro-urine and are ultimately excreted via the urine. In the long term, chronically elevated ﬁl- tration pressur es and elevated levels of pr otein in the pr o-urine cause kidney damage and ulti- mately even end stage kidney disease. Overall, the results are consistent with literature and physiology [63]. 4 C O N C L U S I O N A N D F U T U R E W O R K Causal discovery from longitudinal data turns out to be an important problem in many disci- plines. In the medical domain, revealing causal relationships from a given data set may lead to improvement of clinical practice, e.g., further development of treatment and medication. In the past decades, many causal discovery al- gorithms have been introduced. These causal discovery algorithms, however , have difﬁculty dealing with the inherent instability in struc- ture estimation. The pr esent work introduces S3L, a novel discovery algorithm for longitudinal data that is robust for ﬁnite samples, extending our previous method [25] on cross-sectional data. S3L adopts the concept of stability selection to improve the r obustness of structure learn- ing by taking into account a whole range of model complexities. Since ﬁnding the optimal model structur e for each model complexity 20 0.0 0.2 0.4 0.6 0.8 1.0 Edg e Stability Model Complexity Selection Probability 0 1 2 3 4 5 6 7 8 9 10 π bic π sel pth — gfr pcr — gfr (a) 0.0 0.2 0.4 0.6 0.8 1.0 Causal P ath Stability Model Complexity Selection Probability 0 1 2 3 4 5 6 7 8 9 10 π bic π sel (b) Model complexity Selection probability 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 0 0.2 0.4 0.6 0.8 1 intra−slice relationship inter−slice relationship (same variables) inter−slice relationship (different v ar iables) π bic π sel (c) Model complexity Selection probability 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 0 0.2 0.4 0.6 0.8 1 intra−slice relationship inter−slice relationship (same variables) inter−slice relationship (different v ar iables) π bic π sel (d) Fig. 11: The stability graphs of the baseline model in (a) and (b) and the transition model in (c) and (d) for chronic kidney disease, with edge stability in (a) and (c), and causal path stability in (b) and (d). The relevant r egions, above π sel and left of π bic , contain the relevant structures. is a har d optimization problem, we r ephrase stability selection as a multi-objective optimiza- tion problem, so that we can jointly optimize over the whole range of model complexities and ﬁnd the corresponding optimal structures. Moreover , S3L is a general framework that can be combined with alternative approaches, without modifying their original assumptions, e.g., linearity , non-Gaussian noise, etc. The comparison on the simulated data shows that S3L achieves at least comparable perfor- mance as, but often a signiﬁcant improvement over alternative approaches, mainly in obtain- ing the causal relations, and in the case of small sample size. Moreover , the results of experiments on three real-world data sets are corroborated by literature studies [53], [55], [56], [59], [61], [63], [64], [65], [66]. However , the current method considers only longitudinal data with observed variables and cannot handle missing values (other than through imputation as a prepr ocessing step). W e also still assume that the time inter- vals between time slices is fairly uniform between subjects. Some existing appr oaches called random-coefﬁcient models, also termed multi-level or hierarchical regression models [67], [68], are ﬂexible to handle unequal intervals between time slices within a subject and/or across subjects. Future resear ch will aim to account for these aforementioned issues. A C K N O W L E D G M E N T S The authors wish to thank Thaddeus J. Haight, Falma Kemalasari, Joseph Ramsey and two anonymous refer ees for their valuable discus- sions, comments, and suggestions. 21 1 sbp ( 𝑡 0 ) l d l ( 𝑡 0 ) pc r ( 𝑡 0 ) pt h ( 𝑡 0 ) g f r ( 𝑡 0 ) 0 . 8 3 (a) t r ea t m ent ( 𝑡 𝑖 − 1 ) sbp ( 𝑡 𝑖 − 1 ) l dl ( 𝑡 𝑖 − 1 ) pcr ( 𝑡 𝑖 − 1 ) pt h ( 𝑡 𝑖 − 1 ) g f r ( 𝑡 𝑖 − 1 ) i nf _ce ns ( 𝑡 𝑖 ) sbp ( 𝑡 𝑖 ) l dl ( 𝑡 𝑖 ) pcr ( 𝑡 𝑖 ) pt h ( 𝑡 𝑖 ) g f r ( 𝑡 𝑖 ) 1 / 0 .88 0 .9 8 / - 0 .04 1 / 0 .61 1 / 0 .54 1 / 0 .64 1/ 0.65 0 .94 / 0 .32 0 .80 / 0 .07 0 .98 / 0 .02 0 .92 / - 0 .33 0.83 / 0.14 0 .82 / - 0 .02 0 .88 / - 0 .1 3 1/ - 0 .4 7 1 / 0 .37 0 .78 0 .72 / - 0 .03 (b) Fig. 12: (a) The baseline model and (b) the transition model of chronic kidney disease. The dashed line repr esents a strong relation between two variables but the causal direction cannot be determined from the data. Each edge has a r eliability score (the highest selection pr obability in the relevant region of the edge stability graph) and a standar dized total causal effect estimation. For example, the annotation “ 1 / 0 . 88 ” repr esents a reliability score of 1 and a standardized total causal effect of 0 . 88 . Note that the standardized total causal effect repr esents not just the direct causal effect corresponding to the edge, but the total causal effect also including indirect effects . The OPTIMISTIC Consortium comprises: Partner 1: Radboud University Medical Centre, The Netherlands , Ms. Shaghayegh Abghari; Dr . Armaz Aschraﬁ; Mrs. Sacha Bouman; Ms. Y vonne Cornelissen; Dr . Jef frey Glennon; Dr . Perry Groot; Prof. Arend Heer - schap; Ms. Linda Heskamp; Prof. T om Heskes; Ms. Katarzyna Kapusta; Mrs. Ellen Klerks; Dr . Hans Knoop; Mrs. Daphne Maas; Mr . Kees Okkersen; Dr . Geert Poelmans, Mr . Ridho Rah- madi; Prof. dr . Baziel van Engelen (Chief In- vestigator and Partner lead); Dr . Marlies van Nimwegen. Partner 2: University of Newcastle upon T yne, UK , Dr . Grainne Gorman (Partner lead); Ms. Cecilia Jimenez Moreno; Prof. Hanns Lochm- ller; Prof. Mike T renell; Ms. Sandra van Laar; Ms. Libby W ood. Partner 3: Ludwig- Maximilians-Universitt Mnchen, Germany , Prof. dr . Benedikt Schoser (Partner lead); Dr . Stephan W enninger; Dr . An- gela Schller . Partner 4: Assistance Publique-Hpitaux de Paris, France , Mrs Rmie Auguston; Mr . Lignier Baptiste; Dr . Caroline Barau; Prof. Guillaume Bassez (Partner lead); Mrs. Pascale Chevalier; Ms. Flor ence Couppey; Ms. Stphanie Delmas; Prof. Jean-Franois Deux; Mrs. Celine Dogan; Ms. Amira Hamadouche; Dr . Karolina Han- kiewicz; Mrs. Laure Lhermet; Ms. Lisa Minier; Mrs. Amandine Rialland; Mr . David Schmitz. Partner 5: University of Glasgow , UK , Prof. Darren G. Monckton (Partner lead); Dr . Sarah A. Cumming; Ms. Berit Adam. Partner 6: The University of Dundee, UK , Prof. Peter Donnan (Partner lead); Mr . Michael Hannah; Dr . Fiona Hogarth; Dr . Roberta Littleford; Dr . Emma McKenzie; Dr . Petra Rauchhaus; Ms. Erna W ilkie; Mrs. Jennifer W illiamson. Partner 7: Catt-Sci L TD, UK , Prof. Mike Catt (Partner lead). Partner 8: concentris research management gmbh, Germany , Mrs. Juliane Dittrich; Ms. Ameli Schwalber (Partner lead). Partner 9: The University of Aberdeen, UK , Prof. Shaun T reweek (Partner lead). 22 The author(s) declared no potential conﬂicts of interest with respect to the resear ch, author- ship, and/or publication of this article. The author(s) disclosed receipt of the fol- lowing ﬁnancial support for the research, au- thorship, and/or publication of this article: This work was supported, in part, by the DGHE of Indonesia as well as by the Euro- pean Community’s Seventh Framework Pr o- gramme (FP7/2007-2013) under grant agree- ment n ◦ 305697. The collection and sharing of brain imaging data used in one of the applications to r eal- world data was funded by the Alzheimer ’s Dis- ease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imag- ing and Bioengineering, and through gen- erous contributions from the following: Ab- bV ie, Alzheimer ’s Association; Alzheimer ’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir , Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Com- pany; EuroImmun; F . Hoffmann-La Roche Ltd and its afﬁliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Devel- opment, LLC.; Johnson & Johnson Pharmaceu- tical Resear ch & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Di- agnostics, LLC.; NeuroRx Research; Neur otrack T echnologies; Novartis Pharmaceuticals Corpo- ration; Pﬁzer Inc.; Piramal Imaging; Servier; T akeda Pharmaceutical Company; and T ransi- tion Therapeutics. The Canadian Institutes of Health Research is providing funds to sup- port ADNI clinical sites in Canada. Private sector contributions ar e facilitated by the Foun- dation for the National Institutes of Health (www .fnih.org). The grantee or ganization is the Northern California Institute for Resear ch and Education, and the study is coordinated by the Alzheimer ’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. R E F E R E N C E S [1] R. M. Daniel, M. G. Kenward, S. N. Cousens, and B. L. De Stavola, “Using causal diagrams to guide analysis in missing data problems,” Statistical methods in medical resear ch , vol. 21, no. 3, pp. 243–256, 2012. [2] K. D. Hoover , “Causality in economics and econometrics,” The new Palgrave dictionary of economics , vol. 2, 2008. [3] S. Abu-Bader and A. S. Abu-Qarn, “Government expendi- tures, military spending and economic growth: causality evidence from egypt, israel, and syria,” Journal of Policy Modeling , vol. 25, no. 6, pp. 567–583, 2003. [4] M. T aguri, J. Featherstone, and J. Cheng, “Causal me- diation analysis with multiple causally non-order ed mediators,” Statistical methods in medical resear ch , p. 0962280215615899, 2015. [5] J. Pearl, “Causal inference from indirect experiments,” Artiﬁcial intelligence in medicine , vol. 7, no. 6, pp. 561–582, 1995. [6] J. Detilleux, J.-Y . Reginster , A. Chines, and O. Bruy ` ere, “A bayesian path analysis to estimate causal effects of baze- doxifene acetate on incidence of vertebral fractures, either directly or through non-linear changes in bone mass density ,” Statistical methods in medical research , vol. 25, no. 1, pp. 400–412, 2016. [7] P . Spirtes, “Introduction to causal inference,” The Journal of Machine Learning Research , vol. 11, pp. 1643–1662, 2010. [8] S. la Bastide-van Gemert, R. P . Stolk, E. R. van den Heuvel, and V . Fidler , “Causal inference algorithms can be useful in life course epidemiology ,” Journal of clinical epidemiology , vol. 67, no. 2, pp. 190–198, 2014. [9] E. Sokolova, P . Groot, T . Claassen, and T . Heskes, “Causal discovery from databases with discrete and continuous variables,” in Probabilistic Graphical Models . Springer , 2014, pp. 442–457. [10] G. F . Cooper , I. Bahar , M. J. Becich, P . V . Benos, J. Berg, J. U. Espino, C. Glymour , R. C. Jacobson, M. Kienholz, A. V . Lee et al. , “The center for causal discovery of biomedical knowledge from big data,” Journal of the American Medical Informatics Association , p. ocv059, 2015. [11] E. W . Frees, Longitudinal and panel data: analysis and appli- cations in the social sciences . Cambridge University Press, 2004. [12] G. M. Fitzmaurice, N. M. Laird, and J. H. W are, Applied longitudinal analysis . John W iley & Sons, 2012, vol. 998. [13] D. Hedeker and R. D. Gibbons, Longitudinal data analysis . John W iley & Sons, 2006, vol. 451. [14] N. R. Swanson and C. W . Granger , “Impulse response functions based on a causal approach to residual or- thogonalization in vector autoregr essions,” Journal of the American Statistical Association , vol. 92, no. 437, pp. 357– 367, 1997. [15] D. A. Bessler and S. Lee, “Money and prices: Us data 1869–1914 (a study with dir ected graphs),” Empirical Eco- nomics , vol. 27, no. 3, pp. 427–446, 2002. [16] S. Demiralp and K. D. Hoover , “Searching for the causal structure of a vector autoregression,” Oxford Bulletin of Economics and statistics , vol. 65, no. s1, pp. 745–767, 2003. [17] A. Moneta, “Graphical causal models and vars: an em- pirical assessment of the real business cycles hypothesis,” Empirical Economics , vol. 35, no. 2, pp. 275–300, 2008. 23 [18] J. Kim, W . Zhu, L. Chang, P . M. Bentler , and T . Ernst, “Uni- ﬁed structural equation modeling approach for the anal- ysis of multisubject, multivariate functional mri data,” Human Brain Mapping , vol. 28, no. 2, pp. 85–93, 2007. [19] A. Moneta, N. Chlaß, D. Entner , and P . O. Hoyer , “Causal search in structural vector autoregressive models.” in NIPS Mini-Symposium on Causality in T ime Series , 2011, pp. 95–114. [20] J. Peters, D. Janzing, and B. Sch ¨ olkopf, “Causal inference on time series using restricted structural equation mod- els,” in Advances in Neural Information Processing Systems , 2013, pp. 154–162. [21] T . Chu and C. Glymour , “Search for additive nonlinear time series causal models,” Journal of Machine Learning Research , vol. 9, no. May , pp. 967–991, 2008. [22] A. Hyv ¨ arinen, S. Shimizu, and P . O. Hoyer , “Causal modelling combining instantaneous and lagged ef fects: an identiﬁable model based on non-gaussianity ,” in Proceed- ings of the 25th international conference on Machine learning . ACM, 2008, pp. 424–431. [23] S. Shimizu, P . O. Hoyer , A. Hyv ¨ arinen, and A. Kerminen, “A linear non-gaussian acyclic model for causal discov- ery ,” Journal of Machine Learning Research , vol. 7, no. Oct, pp. 2003–2030, 2006. [24] P . Spirtes, C. N. Glymour , and R. Scheines, Causation, prediction, and search . MIT press, 2000, vol. 81. [25] R. Rahmadi, P . Groot, M. Heins, H. Knoop, and T . Heskes, “Causality on cross-sectional data: Stable speciﬁcation search in constrained structural equation modeling,” Applied Soft Computing , 2016. doi: http://dx.doi.org/10.1016/j.asoc.2016.10.003. [On- line]. A vailable: http://www .sciencedirect.com/science/ article/pii/S1568494616305130 [26] N. Meinshausen and P . B ¨ uhlmann, “Stability selection,” Journal of the Royal Statistical Society: Series B (Statistical Methodology) , vol. 72, no. 4, pp. 417–473, 2010. [27] N. Friedman, K. Murphy , and S. Russell, “Learning the structure of dynamic probabilistic networks,” in Proceed- ings of the Fourteenth conference on Uncertainty in artiﬁcial intelligence . Morgan Kaufmann Publishers Inc., 1998, pp. 139–147. [28] M. H. Maathuis, M. Kalisch, P . B ¨ uhlmann et al. , “Estimat- ing high-dimensional intervention effects from observa- tional data,” The Annals of Statistics , vol. 37, no. 6A, pp. 3133–3164, 2009. [29] D. J. Stekhoven, I. Moraes, G. Sveinbj ¨ ornsson, L. Hennig, M. H. Maathuis, and P . B ¨ uhlmann, “Causal stability rank- ing,” Bioinformatics , vol. 28, no. 21, pp. 2819–2823, 2012. [30] K. M. Gates, P . C. Molenaar , F . G. Hillary , and S. Slobounov , “Extended uniﬁed sem approach for mod- eling event-related fmri data,” NeuroImage , vol. 54, no. 2, pp. 1151–1158, 2011. [31] J. D. Ramsey , S. J. Hanson, C. Hanson, Y . O. Halchenko, R. A. Poldrack, and C. Glymour , “Six problems for causal inference from fmri,” neur oimage , vol. 49, no. 2, pp. 1545– 1558, 2010. [32] D. Colombo and M. H. Maathuis, “Order -independent constraint-based causal structure learning.” The Journal of Machine Learning Research , vol. 15, no. 1, pp. 3741–3782, 2014. [33] J. Ramsey , J. Zhang, and P . L. Spirtes, “Adjacency- faithfulness and conservative causal inference,” arXiv preprint arXiv:1206.6843 , 2012. [34] J. Ramsey , “Improving accuracy and scalability of the pc algorithm by maximizing p-value,” arXiv preprint arXiv:1610.00378 , 2016. [35] D. M. Chickering, “Learning equivalence classes of Bayesian-network structur es,” The Journal of Machine Learning Research , vol. 2, pp. 445–498, 2002. [36] J. Ramsey , M. Glymour , R. Sanchez-Romero, and C. Glymour , “A million variables and more: the fast greedy equivalence search algorithm for learning high- dimensional graphical causal models, with an application to functional magnetic resonance images,” International Journal of Data Science and Analytics , pp. 1–9, 2017. [37] K. Deb, A. Pratap, S. Agarwal, and T . Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA- II,” IEEE T ransactions on Evolutionary Computation , vol. 6, no. 2, pp. 182–197, 2002. [38] J. Pearl, Causality: models, reasoning and inference . Cam- bridge Univ Press, 2000. [39] C. Meek, “Causal inference and causal explanation with background knowledge,” in Proceedings of the Eleventh conference on Uncertainty in artiﬁcial intelligence . Morgan Kaufmann Publishers Inc., 1995, pp. 403–410. [40] J. Pearl, “Statistics and causal inference: A review ,” T est , vol. 12, no. 2, pp. 281–345, 2003. [41] F . Drasgow , “Polychoric and polyserial correlations,” En- cyclopedia of statistical sciences , 1988. [42] J. J. Grefenstette, “Optimization of control parameters for genetic algorithms,” Systems, Man and Cybernetics, IEEE T ransactions on , vol. 16, no. 1, pp. 122–128, 1986. [43] B. L. Miller and D. E. Goldberg, “Genetic algorithms, tournament selection, and the effects of noise,” Complex systems , vol. 9, no. 3, pp. 193–212, 1995. [44] R. Kline, Principles and Practice of Structural Equation Mod- eling , ser . Methodology in the social sciences. Guilford Press, 2011. ISBN 9781606238769 [45] M. Kalisch, M. M ¨ achler , D. Colombo, M. H. Maathuis, and P . B ¨ uhlmann, “Causal inference using graphical models with the R package pcalg,” Journal of Statistical Software , vol. 47, no. 11, pp. 1–26, 2012. [Online]. A vailable: http://www .jstatsoft.org/v47/i11/ [46] C. W ongchokprasitti, rcausal: R-Causal Library , 2016, r package version 0.99.8. [47] T . Fawcett, “ROC graphs: Notes and practical considera- tions for resear chers,” Machine learning , vol. 31, pp. 1–38, 2004. [48] E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson, “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach,” Biometrics , pp. 837–845, 1988. [49] X. Robin, N. T urck, A. Hainard, N. T iberti, F . Lisacek, J.-C. Sanchez, and M. M ¨ uller , “pROC: an open-source package for R and S+ to analyze and compare ROC curves,” BMC bioinformatics , vol. 12, no. 1, p. 77, 2011. [50] E. V enkatraman and C. B. Begg, “A distribution-fr ee procedur e for comparing receiver operating characteristic curves from a paired experiment,” Biometrika , vol. 83, no. 4, pp. 835–848, 1996. [51] R. A. Fisher , Statistical methods for resear ch workers . Gen- esis Publishing Pvt Ltd, 1925. [52] R. A. F . Frederick Mosteller , “Questions and answers,” The American Statistician , vol. 2, no. 5, pp. 30–31, 1948. [Online]. A vailable: http://www .jstor .org/stable/2681650 [53] M. J. Heins, H. Knoop, W . J. Burk, and G. Bleijenberg, “The pr ocess of cognitive behaviour therapy for chronic fatigue syndrome: Which changes in perpetuating cogni- tions and behaviour are r elated to a reduction in fatigue?” 24 Journal of psychosomatic research , vol. 75, no. 3, pp. 235–241, 2013. [54] IBM SPSS Statistics for Windows, V ersion 24. , IBM Corp., Armonk, NY , 2016. [55] J. V ercoulen, C. Swanink, J. Galama, J. Fennis, P . Jongen, O. Hommes, J. V an der Meer , and G. Bleijenberg, “The persistence of fatigue in chronic fatigue syndrome and multiple sclerosis: development of a model,” Journal of psychosomatic research , vol. 45, no. 6, pp. 507–517, 1998. [56] J. F . W iborg, H. Knoop, L. E. Frank, and G. Bleijenberg, “T owards an evidence-based treatment model for cogni- tive behavioral interventions focusing on chr onic fatigue syndrome,” Journal of psychosomatic resear ch , vol. 72, no. 5, pp. 399–404, 2012. [57] M. W . W einer , P . S. Aisen, C. R. Jack, W . J. Jagust, J. Q. T rojanowski, L. Shaw , A. J. Saykin, J. C. Morris, N. Cairns, L. A. Beckett et al. , “The Alzheimer ’s dis- ease neuroimaging initiative: pr ogress report and future plans,” Alzheimer ’ s & Dementia , vol. 6, no. 3, pp. 202–211, 2010. [58] R. C. Petersen, G. E. Smith, S. C. W aring, R. J. Ivnik, E. G. T angalos, and E. Kokmen, “Mild cognitive impairment: clinical characterization and outcome,” Archives of neurol- ogy , vol. 56, no. 3, pp. 303–308, 1999. [59] T . J. Haight, W . J. Jagust, and Alzheimer ’s Disease Neu- roimaging Initiative, “Relative contributions of biomark- ers in Alzheimer ’s disease,” Annals of epidemiology , vol. 22, no. 12, pp. 868–875, 2012. [60] M. J. Peeters, A. D. van Zuilen, J. A. van den Brand, M. L. Bots, M. van Buren, M. A. ten Dam, K. A. Kaasjager , G. Ligtenberg, Y . W . Sijpkens, H. E. Sluiter et al. , “Nurse practitioner care improves renal outcome in patients with CKD,” Journal of the American Society of Nephrology , vol. 25, no. 2, pp. 390–398, 2014. [61] G. M. Chertow , G. A. Block, R. Corr ea-Rotter , T . B. Dr ¨ ueke, J. Floege, W . G. Goodman, C. A. Herzog, Y . Kubo, G. M. London, K. W . Mahaffey et al. , “Effect of cinacalcet on cardiovascular disease in patients undergoing dialysis.” The New England journal of medicine , vol. 367, no. 26, pp. 2482–2494, 2012. [62] R. J. Johnson, J. Feehally , and J. Floege, Comprehensive clinical nephrology . Elsevier Health Sciences, 2014. [63] A. Levin, P . Stevens, R. Bilous, J. Coresh, A. De Fran- cisco, P . De Jong, K. Grifﬁth, B. Hemmelgarn, K. Iseki, E. Lamb et al. , “Kidney Disease: Impr oving Global Out- comes (KDIGO) CKD work group: Kdigo 2012 clinical practice guideline for the evaluation and management of chronic kidney disease,” Kidney Int Suppl , vol. 3, no. 1, p. e150, 2013. [64] W . Henneman, J. Sluimer , J. Barnes, W . V an Der Flier , I. Sluimer , N. Fox, P . Scheltens, H. V renken, and F . Barkhof, “Hippocampal atrophy rates in alzheimer disease added value over whole brain volume measures,” Neurology , vol. 72, no. 11, pp. 999–1007, 2009. [65] D. Mungas, B. Reed, W . Jagust, C. DeCarli, W . Mack, J. Kramer , M. W einer , N. Schuff, and H. Chui, “V olumetric MRI predicts rate of cognitive decline related to AD and cerebr ovascular disease,” Neurology , vol. 59, no. 6, pp. 867–873, 2002. [66] H. Rusinek, S. De Santi, D. Frid, W .-H. T sui, C. Y . T arshish, A. Convit, and M. J. de Leon, “Regional brain atrophy rate predicts future cognitive decline: 6-year longitudinal mr imaging study of normal aging 1,” Radiology , vol. 229, no. 3, pp. 691–696, 2003. [67] S. W . Raudenbush and A. S. Bryk, Hierarchical linear models: Applications and data analysis methods . Sage, 2002, vol. 1. [68] I. G. Kreft and J. de Leeuw , Introducing multilevel modeling . Sage, 1998.

Causality on Longitudinal Data: Stable Specification Search in Constrained Structural Equation Modeling

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment