Causal Inference in Observational Data
Pranjul Yadav, Dept. of Computer Science, University of Minnesota, Minneapolis, MN, yadav023@umn.edu
Michael Steinbach, Dept. of Computer Science, University of Minnesota, Minneapolis, MN, stei0062@umn.edu
Vipin Kumar, Dept. of Computer Science, University of Minnesota, Minneapolis, MN, kumar001@umn.edu
Bonnie Westra, School of Nursing, University of Minnesota, Minneapolis, MN, westr006@umn.edu
Alexander Hoff, Dept. of Computer Science, University of Minnesota, Minneapolis, MN, hoffx231@umn.edu
Connie Delaney, School of Nursing, University of Minnesota, Minneapolis, MN, delaney@umn.edu
Lisiane Prunelli, School of Nursing, University of Minnesota, Minneapolis, MN, pruin001@umn.edu
Gyorgy Simon, Dept. of Health Sciences Research, Mayo Clinic, Rochester, MN, simon.gyorgy@mayo.edu

ABSTRACT
Our aging population increasingly suffers from multiple chronic diseases simultaneously, necessitating the comprehensive treatment of these conditions. Finding the optimal set of drugs and interventions for a combinatorial set of diseases is a combinatorial pattern exploration problem. Association rule mining is a popular tool for such problems, but health care's requirement for causal, rather than associative, patterns renders association rule mining unsuitable. One purpose of this study was to apply SSC guideline recommendations to EHR data for patients with severe sepsis or septic shock and determine guideline compliance as well as its impact on inpatient mortality and sepsis complications. Propensity score matching in conjunction with bootstrap simulation was used to match patients with and without exposure to the SSC recommendations. Findings showed that EHR data could be used to estimate compliance with SSC recommendations as well as the effect of compliance on outcomes.
Further, we propose a novel framework based on the Rubin-Neyman causal model for extracting causal rules from observational data, correcting for a number of common biases. Specifically, given a set of interventions and a set of items that define subpopulations (e.g., diseases), we wish to find all subpopulations in which effective intervention combinations exist and, in each such subpopulation, we wish to find all intervention combinations such that dropping any intervention from the combination will reduce the efficacy of the treatment. A key aspect of our framework is the concept of closed intervention sets, which extend the quantification of the effect of a single intervention to a set of concurrent interventions. Closed intervention sets also allow for a pruning strategy that is strictly more efficient than the traditional pruning strategy used by the Apriori algorithm. To implement our ideas, we introduce and compare five methods of estimating causal effect from observational data and rigorously evaluate them on synthetic data to mathematically prove (when possible) why they work.

© 2016 ACM. ISBN 978-1-4503-2138-9. DOI: 10.1145/1235
We also evaluated our causal rule mining framework on the electronic health record (EHR) data of a large cohort of patients from Mayo Clinic and showed that the patterns we extracted are sufficiently rich to explain the controversial findings in the medical literature regarding the effect of a class of cholesterol drugs on Type-II Diabetes Mellitus (T2DM).

Keywords: Causal Inference, Confounding, Counterfactual Estimation.

1. RELATED WORK
Causation has received substantial research interest in many areas. In computer science, Pearl [5] and Rosenbaum [6] laid the foundation for causal inference, upon which several fields, including cognitive science, econometrics, epidemiology, philosophy, and statistics, have built their respective methodologies [7, 8, 9].

At the center of causation is a causal model. Arguably, one of the earliest and most popular models is the Rubin-Neyman causal model [3]. Under this model, X causes Y if X occurs before Y and, without X, Y would be different. Besides the Rubin-Neyman model, there are several other causal models, including Granger causality [10] for time series, Bayesian networks [11], structural equation modeling [8], causal graphical models [12], and, more generally, probabilistic graphical models [13]. In our work, we use the potential outcome framework from the Rubin-Neyman model, and we use causal graphical models to identify and correct for biases.

Causal graphical models are tools for visualizing causal relationships among variables: nodes of the causal graph are variables and edges are causal relationships. Most methods assume that the causal graph structure is given a priori; however, methods have been proposed for discovering the structure of the causal graph [14, 15]. In our work, the structure is partially given: we know the relationships among groups of variables, but we have to assign each variable to the correct group based on data.
Knowing the correct graph structure is important because substructures in the graph are suggestive of sources of bias. To correct for biases, we look for specific substructures. For example, causal chains can be sources of overcorrection bias, and "V"-shaped structures can be indicative of confounding or endogenous selection bias [9]. Many other interesting substructures have been studied [16, 17, 18]. In our work, we consider three fundamental structures: direct causal effect, indirect causal effect, and confounding. Of these, confounding is the most severe and has received the most research interest.

Numerous methods exist to handle confounding, including propensity score matching (PSM) [19], structural marginal models [9], and g-estimation [8]. The latter two extend PSM for various situations, for example, for time-varying interventions [9]. Propensity score matching is used to estimate the effect of an intervention on an outcome. The propensity score is the propensity (probability) of a patient receiving the intervention given his baseline characteristics, and the propensity score is used to create a new population that is free of confounding. Many PSM techniques exist; they typically differ in how they use the propensity score to create this new population [20, 21, 22, 23].

Applications of causal modeling are not exclusive to the social and life sciences. In data mining, Lambert et al. [24] investigated the causal effect of new features on click-through rates, and Chan et al. [25] used doubly robust estimation techniques to determine the efficacy of display advertisements. Even extending association rule mining to causal rule mining has been attempted before [26, 27, 28]. Li et al. [26] used the odds ratio to identify causal patterns and later extended their technique [28] to handle large data sets.
Their technique, however, is not rooted in a causal model and hence offers no protection against computing systematically biased estimates. In their proposed causal decision trees [29], they used the potential outcome framework, but still did not address correction for various biases, including confounding.

2. SIMPLE CAUSAL RULE MINING IN IRREGULAR TIME-SERIES DATA

2.1 Introduction
According to the Centers for Disease Control and Prevention, the incidence of sepsis or septicemia doubled from 2000 through 2008, and hospitalizations for these diagnoses increased by 70%. In addition, severe sepsis and shock have higher mortality rates than other sepsis diagnoses, accounting for an estimated mortality between 18% and 40%. During the first 30 days of hospitalization, mortality can range from 10% to 50% depending on the patient's risk factors. Patients with severe sepsis or septic shock are sicker, have longer hospital stays, are more frequently discharged to other short-term hospitals or long-term care institutions, and represented the most expensive hospital condition treated in 2011.

The use of evidence-based practice (EBP) guidelines, such as the Surviving Sepsis Campaign (SSC), could lead to an earlier diagnosis and, consequently, earlier treatment. However, these guidelines have not been widely incorporated into clinical practice. The SSC is a compilation of international recommendations for the management of severe sepsis and shock. Many of these recommendations are interventions to prevent further system deterioration during and after diagnosis. Even when the presence of sepsis or progression to sepsis is suspected early in the course of treatment, timely implementation of adequate treatment management and guideline compliance are still a challenge.
Therefore, the effectiveness of the guideline in preventing clinical complications for this population is still unclear to clinicians and researchers alike.

The majority of studies have focused on early detection and prevention of sepsis, and little is known about the compliance rate with the SSC and the impact of compliance on the prevention of sepsis-related complications. Further, the measurement of adherence to individual SSC recommendations, rather than the entire SSC, is, to our knowledge, limited. The majority of studies have used traditional randomized controlled trials with analytic techniques such as regression modeling to adjust for risk factors known from previous research. Data-driven methodologies, such as data mining techniques and machine learning, have the potential to identify new insights from electronic health records (EHRs) that can strengthen existing EBP guidelines.

The national mandate for all health professionals to implement interoperable EHRs by 2015 provides an opportunity for the reuse of potentially large amounts of EHR data to address new research questions that explore patterns of patient characteristics, evidence-based guideline interventions, and improvement in health. Furthermore, expanding the range of variables documented in EHRs to include team-based assessment and intervention data can increase our understanding of compliance with EBP guidelines and the influence of these guidelines on patient outcomes. In the absence of such data elements, adherence to guidelines can only be inferred; it cannot be directly observed.

In this section, we present a methodology for using EHR data to estimate compliance with the SSC guideline recommendations and to estimate the effect of the individual recommendations in the guideline on the prevention of in-hospital mortality and sepsis-related complications in patients with severe sepsis and septic shock.
2.2 Methods
Data from the EHR of a health system in the Midwest was transferred to a clinical data repository (CDR) at the University of Minnesota, which is funded through a Clinical Translational Science Award. After IRB approval, de-identified data for all adult patients hospitalized between 1/1/09 and 12/31/11 with a severe sepsis or shock diagnosis was obtained for this study.

2.2.1 Data and cohort selection
The sample included 186 adult patients age 18 years or older with an ICD-9 diagnosis code of severe sepsis or shock (995.92 and 785.5*) identified from billing data. Since 785.* codes corresponding to shock can capture patients without sepsis, patients without severe sepsis or septic shock, and patients who did not receive antibiotics, were excluded. These exclusions aimed to capture only those patients who had severe sepsis or septic shock and were treated for that clinical condition. The final sample consisted of 177 patients.

2.2.2 Variables of interest
Fifteen predictor variables (baseline characteristics) were collected. These include sociodemographics and health disparities data: age, gender, race, ethnicity, and payer (Medicaid represents low income); laboratory results: lactate and white blood cell count (WBC); vital signs: heart rate (HR), respiratory rate (RR), temperature (Temp), and mean arterial blood pressure (MAP); and diagnoses for respiratory, cardiovascular, cerebrovascular, and kidney-related comorbid conditions. ICD-9 codes for comorbid conditions were selected according to evidence in the literature. Comorbidities were aggregated from the patient's prior problem list to detect preexisting (upon admission) respiratory, cardiovascular, cerebrovascular, and kidney problems. Each category was treated as yes/no depending on whether any of the ICD-9 codes in that category were present.
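The yes/no aggregation of the prior problem list can be sketched as below. The ICD-9 code prefixes used here are illustrative placeholders, not the study's actual code list.

```python
# Hypothetical sketch: collapse a patient's prior problem list (ICD-9 codes)
# into four yes/no comorbidity flags, as described in Section 2.2.2.
# The prefix lists are made up for illustration only.

COMORBIDITY_PREFIXES = {
    "respiratory":     ("490", "491", "492", "493"),
    "cardiovascular":  ("410", "414", "428"),
    "cerebrovascular": ("430", "434", "436"),
    "kidney":          ("585", "586"),
}

def comorbidity_flags(problem_list):
    """Return a yes/no flag per category if any ICD-9 code matches."""
    return {
        category: any(code.startswith(prefixes) for code in problem_list)
        for category, prefixes in COMORBIDITY_PREFIXES.items()
    }

flags = comorbidity_flags(["428.0", "585.9"])
```

In a real study the prefix table would be derived from the literature-based code list the authors mention; only the flag-per-category structure is taken from the text.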
The outcomes of interest were in-hospital mortality and development of new complications (respiratory, cardiovascular, cerebrovascular, and kidney) during the hospital encounter. New complications were determined as the presence of ICD-9 codes on the patient's billing data that did not exist at the time of admission.

2.2.3 Study design
This study aimed to analyze compliance with the SSC guideline recommendations in patients with severe sepsis or septic shock. Therefore, the baseline (TimeZero) was defined as the onset of sepsis, and the patients were under observation until discharged. Unfortunately, the timestamp for the diagnoses is dated back to the time of admission; hence the onset of sepsis needs to be estimated. The onset time for sepsis was defined as the earliest time during a hospital encounter when the patient meets at least two of the following six criteria: MAP < 65, HR > 100, RR > 20, temperature < 95 or > 100.94, WBC < 4 or > 12, and lactate > 2.0. The onset time was established based on current clinical practice and the literature on sepsis. At the earliest time when two or more of these conditions were met, a TimeZero flag was added at the time of first occurrence of that abnormality, and the timing of SSC compliance commenced.

2.2.4 Guideline compliance
SSC guideline recommendations were translated into a readily computable set of rules. These rules have conditions related to an observation (e.g., MAP < 65 mmHg) and an intervention to administer (e.g., give vasopressors) if the patient meets the condition of the rule. The SSC guideline was transformed into 15 rules in a computational format, one for each recommendation in the SSC guideline, and each rule was evaluated for each patient (see Figure 1). After each rule is an abbreviated name subsequently used in this paper.
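The TimeZero rule above is simple enough to state directly in code. This is only a schematic rendering of the two-of-six criterion; the observation records and their shape are hypothetical.

```python
# Illustrative sketch of the TimeZero definition in Section 2.2.3: the onset
# of sepsis is the earliest observation time at which at least two of the
# six abnormality criteria are met. Observations are hypothetical
# (time, {vital: value}) pairs; vitals not measured at a time are absent.

CRITERIA = {
    "MAP":     lambda v: v < 65,
    "HR":      lambda v: v > 100,
    "RR":      lambda v: v > 20,
    "Temp":    lambda v: v < 95 or v > 100.94,
    "WBC":     lambda v: v < 4 or v > 12,
    "Lactate": lambda v: v > 2.0,
}

def time_zero(observations):
    """observations: list of (time, {vital: value}) sorted by time."""
    for t, vitals in observations:
        met = sum(1 for name, test in CRITERIA.items()
                  if name in vitals and test(vitals[name]))
        if met >= 2:
            return t
    return None  # patient never meets the onset definition

obs = [(1, {"HR": 95, "MAP": 70}),
       (2, {"HR": 104, "MAP": 70}),
       (3, {"HR": 108, "Lactate": 2.4})]
onset = time_zero(obs)  # two criteria first co-occur at t = 3
```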
Figure 1: SSC rules for measuring guideline compliance.

We call the treatment of a patient compliant (exposed) with respect to a specific recommendation if the patient meets the condition of the corresponding rule any time after TimeZero and the required intervention was administered; the treatment is non-compliant (unexposed) if the patient meets the condition of the corresponding rule after TimeZero but the intervention was not administered (any time after TimeZero); and the recommendation is not applicable to a treatment if the patient does not meet the condition of the corresponding rule. In estimating compliance (as a metric) with a specific recommendation, we simply measure the number of compliant encounters among those to which the recommendation is applicable. In this phase of the study, the time when a recommendation was administered was not incorporated into the analysis.

We also estimate the effect of the recommendation on the outcomes. We call a patient exposed to a recommendation if the recommendation is applicable to the patient and the corresponding intervention was administered. We call a patient unexposed to a recommendation if the recommendation is applicable but was not applied (the treatment was non-compliant). The incidence fraction in exposed patients with respect to an outcome is the fraction of patients with the outcome among the exposed patients. The incidence fraction of the unexposed patients is defined analogously. We define the effect of the recommendation on an outcome as the difference in the incidence fractions between the unexposed and exposed patients. The recommendation is beneficial (protective against an outcome) if the effect is positive, namely, the incidence fraction in the unexposed patients is higher than the incidence fraction in the exposed patients.
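The effect definition above reduces to a one-line computation; a minimal sketch with made-up outcome data:

```python
# Minimal sketch of the effect definition in Section 2.2.4: the effect of a
# recommendation on an outcome is the incidence fraction among unexposed
# patients minus the incidence fraction among exposed patients, so a
# positive effect means the recommendation is protective.

def incidence_fraction(patients):
    """Fraction of patients in the group who developed the outcome."""
    return sum(p["outcome"] for p in patients) / len(patients)

def recommendation_effect(exposed, unexposed):
    return incidence_fraction(unexposed) - incidence_fraction(exposed)

# Hypothetical data: outcome = 1 means the complication occurred.
exposed   = [{"outcome": 0}, {"outcome": 1}, {"outcome": 0}, {"outcome": 0}]
unexposed = [{"outcome": 1}, {"outcome": 1}, {"outcome": 0}, {"outcome": 1}]
effect = recommendation_effect(exposed, unexposed)  # 0.75 - 0.25 = 0.5
```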
2.2.5 Data quality
Included variables were assessed for data quality regarding accuracy and completeness based on the literature and domain knowledge. Constraints were determined for plausible values; e.g., a CVP reading could not be greater than 50. Values outside of the constraints were recoded as missing values. Any observation that took place before the estimated onset of sepsis (TimeZero) was considered a baseline observation. Simple mean imputation was the method of choice for imputing missing values. Imputation was necessary for lactate (7.7%), temperature (3%), and WBC (3%). There was no missing data for the other variables or for the outcomes of interest. Central venous pressure was not included as a baseline characteristic due to the high number of missing values (54%).

2.2.6 Propensity score matching
Patients who received SSC recommendations may be in worse health than patients who did not. For example, patients whose lactate was measured may have more apparent (and possibly advanced) sepsis than patients whose lactate was not measured. To compensate for such disparities, propensity score matching (PSM) was employed. The goal of PSM is to balance the data set in terms of the covariates between patients exposed and unexposed to the SSC guideline recommendations. This is achieved by matching exposed patients with unexposed patients on their propensity (probability) of receiving the recommendations. This ensures that at TimeZero, pairs of patients, one exposed and one unexposed, are in the same state of health and differ only in their exposure to the recommendation. PSM is a popular technique for estimating treatment effects.

To compute the propensity of patients to receive treatment, a logistic regression model was used, where the dependent variable is exposure to the recommendation and the independent variables are the covariates.
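The propensity model just described, together with the caliper matching and bootstrap of Sections 2.2.6 and 2.2.7, can be sketched end to end as below. The study presumably used a statistics package; this pure-Python stand-in (plain gradient descent, a greedy matcher, toy data) only illustrates the structure of the computation, and every covariate and parameter value is an assumption.

```python
# Sketch of the PSM-inside-bootstrap pipeline of Sections 2.2.6-2.2.7:
# fit logistic regression of exposure on covariates, greedily 1:1-match
# exposed to unexposed patients within a 0.1 caliper on the score, take the
# matched difference in incidence fractions (unexposed - exposed), and
# repeat over bootstrap resamples to approximate the effect's sampling
# distribution.

import math
import random

def fit_logistic(X, y, lr=0.1, steps=300):
    """Weights (incl. intercept) of P(exposed = 1 | x) by gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - 1.0 / (1.0 + math.exp(-z))
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w

def propensity(w, xi):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

def matched_effect(sample, caliper=0.1):
    """Caliper-match on the score, then difference the incidence fractions."""
    w = fit_logistic([p["x"] for p in sample], [p["exposed"] for p in sample])
    scored = [(propensity(w, p["x"]), p) for p in sample]
    exposed = [(s, p) for s, p in scored if p["exposed"]]
    pool = [(s, p) for s, p in scored if not p["exposed"]]
    diffs = []
    for s_e, e in exposed:
        cands = [c for c in pool if abs(c[0] - s_e) <= caliper]
        if cands:
            best = min(cands, key=lambda c: abs(c[0] - s_e))
            pool.remove(best)
            diffs.append(best[1]["outcome"] - e["outcome"])  # unexposed - exposed
    return sum(diffs) / len(diffs) if diffs else None

def bootstrap_effects(patients, n_boot=500, seed=1):
    rng = random.Random(seed)
    effects = []
    for _ in range(n_boot):
        sample = [rng.choice(patients) for _ in patients]
        eff = matched_effect(sample)
        if eff is not None:
            effects.append(eff)
    return effects  # approximates the sampling distribution of the effect

# Hypothetical cohort: one severity covariate drives both exposure and outcome.
rng = random.Random(0)
patients = [{"x": [sev], "exposed": rng.random() < 0.3 + 0.4 * sev,
             "outcome": int(rng.random() < 0.2 + 0.3 * sev)}
            for sev in (rng.random() for _ in range(60))]
effects = bootstrap_effects(patients, n_boot=10)
```

In the paper the number of bootstrap iterations is 500 and the caliper is 0.1; the small `n_boot` here only keeps the toy run fast.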
The linear prediction (propensity score) of this model was computed for every patient. A new (matched) population was created from pairs of exposed and unexposed patients with matching propensity scores. Two scores match if they differ by no more than a certain caliper (0.1 in our study). The effect of the recommendation was estimated by comparing the incidence fractions among the exposed and unexposed patients in the matched population.

2.2.7 PSM nested inside bootstrap simulation
In order to incorporate the effect of additional sources of variability, arising from estimation in the propensity score model and from variability in the propensity-score-matched sample, 500 bootstrap samples were drawn from the original sample. In each of these bootstrap iterations, the propensity score model was estimated using the above caliper matching technique, and the effect of the recommendation was computed with respect to all outcomes. In recent years, bootstrap simulation has been widely employed in conjunction with PSM to better handle bias and confounding variables. For each recommendation and outcome, the 500 bootstrap iterations result in 500 estimates of the effect (of the recommendation on the outcome), approximating the sampling distribution of the effect.

2.3 Results
Table 1 shows the baseline characteristics of the study population. Results are reported as total counts for categorical variables and as means with interquartile (25% to 75%) ranges for continuous variables. As shown in Table 1, the majority of patients were male, Caucasian, and had Medicaid as the payer. Before the onset of sepsis, cardiovascular comorbidities (56.4%) were common; the mean HR (101.3) was slightly above normal, as were lactate (2.8) and WBC (15.8). The mean length of stay for the sample was 15 days, ranging from less than 24 hours to 6 months.
TimeZero was within the first 24 hours of admission, and patients at that time were primarily (86.4%) in the emergency department.

Table 1: Demographic statistics of the patient population.

  Feature                    Mean / Count
  Total Number of Patients   177
  Average Age                61
  Gender (Male)              102
  Race (Caucasian)           97
  Ethnicity (Latino)         11
  Payer (Medicaid)           102
  White Blood Cell           15.8
  Lactate                    2.8
  Mean Blood Pressure        73.9
  Temperature                98.4
  Heart Rate                 101.3
  Respiratory Rate           20.6
  Cardiovascular             100
  Cerebrovascular            66
  Respiratory                69
  Kidney                     62

In Figure 2, the effects of various rule-complication pairs are depicted. An effect is defined as the difference in the mean rate of progression to complications between the exposed and unexposed groups. Since we used bootstrap simulation, 500 replications were performed for each rule-complication pair, resulting in a sampling distribution for the effect. The sampling distribution for each rule-complication pair is presented as a boxplot. The boxplots represent the statistic measured, i.e., in this study, the differential impact of a recommendation on mortality between the exposed and unexposed populations. When this statistic is 0, the recommendation has no effect. If it is greater than 0, the recommendation is protective for that specific condition; if it is below 0, the recommendation may even increase the risk of the outcome for that specific condition.

The panes (groups of boxplots) correspond to the complications, and the boxes within each pane correspond to the recommendations (rules). For example, the effect of the Ventilator rule (Recommendation 15: patients in respiratory distress should be put on a ventilator) on mortality (Death) is shown in the rightmost box (Ventilator) of the bottom-most pane (Death).
Since all effects in the boxplot are above 0, namely, the number of observed complications in the unexposed group is higher than in the exposed group, compliance with the Ventilator rule reduces the number of deaths. Therefore, the corresponding recommendation is beneficial in protecting patients from Death (mortality). In Table 3, we present the 95% confidence intervals for various rule-outcome pairs.

Figure 2: Box plots of the mean difference between the groups (unexposed - exposed) for the guideline recommendations and each of the outcomes of interest.

To further ensure the validity of the results, we examined the propensity score distribution in the exposed and unexposed groups. As an example, Figure 3 illustrates the propensity score distribution for a randomly selected bootstrap iteration measuring the effect of Ventilator on Death. The horizontal axis represents the propensity score, which is the probability of receiving the intervention, and the vertical axis represents the density, namely the proportion of patients in each group with a particular propensity for being put on a ventilator. Figure 3 shows substantial overlap between the propensity scores in the exposed and unexposed groups, indicating that balancing on the propensity score was successful for this population. Other rule-complication pairs exhibit similar propensity score distributions.

Figure 3: Distribution of the propensity scores between the exposed and unexposed groups for the outcome Death and the SSC recommendation Ventilator.

3. CONCLUSION
The overall purpose of this study was to use EHR data to determine compliance with the Surviving Sepsis Campaign
(SSC) guideline and to measure its impact on inpatient mortality and sepsis complications in patients with severe sepsis and septic shock. Results showed that compliance with many of the recommendations was > 95% for MAP and CVP with fluid resuscitation given for low readings. Other high-compliance (greater than 80%) recommendations were: insulin given for high blood glucose and evaluating respiratory distress. The recommendations with the lowest compliance (< 30%) were: vasopressor or albumin for continuing low MAP or CVP readings. This may be due to a study design artifact, where the rule only considered interventions initiated after TimeZero (the estimated onset of sepsis), while fluid resuscitation may have taken place earlier. Alternatively, the apparently poor compliance could also be explained by issues related to the coding of fluids: during data validation, we found that it was difficult to track fluids.

Our study also demonstrates that retrospective EHR data can be used to evaluate the effect of compliance with guideline recommendations on outcomes. We found a number of SSC recommendations that were significantly protective against more than one complication: Ventilator was protective against cardiovascular and respiratory complications as well as Death; use of vasopressors was protective against respiratory complications.

Other recommendations (BCulture, Antibiotic, Vasopressor, Lactate, CVP, and RespDistress) showed results less consistent with our expectations. For instance, Vasopressor, used to treat low MAP, appears to increase cerebrovascular complications. While this finding is not statistically significant, it may be congruent with the fact that small brain vessels are very sensitive to changes in blood pressure. Low MAP can cause oxygen deprivation and, consequently, brain damage.
Ventilator, Vasopressor, and BGlucose showed protective effects against respiratory complications. The SSC guideline recommends the implementation of ventilator therapy as soon as any change in respiratory status is noticed. This intervention aims to protect the patient against further system stress, resolve hypoxia, help with perfusion across the main respiratory-cardiac vessels, and decrease the release of toxins due to respiratory effort.

Our study is a proof-of-concept study demonstrating that EHR data can be used to estimate the effect of guideline recommendations. However, for several combinations of recommendations and outcomes, the effect was not significant. We believe that the reason is that guidelines represent workflows, and the effect of the workflow goes beyond the effects of the individual guideline recommendations. For example, by considering the recommendations outside the context of the workflow, we may ignore whether the intervention addressed the condition that triggered its administration. If low MAP triggered the administration of vasopressors, then without considering the workflow, we do not know whether MAP returned to normal levels thereafter. Thus we cannot equate an adverse outcome with the failure of the guideline; it may be the result of the insufficiency of the intervention. Moving forward, we are going to model the workflows behind the guidelines and apply the same principles that we developed in this work to estimate the effect of the entire workflow.

This phase of our study did not address the timing of recommendations, nor the time prior to TimeZero. For this analysis, guideline compliance was considered only after TimeZero (the estimated onset), since compliance with the SSC is only necessary in the presence of suspected or confirmed sepsis. There is no reason to suspect sepsis before TimeZero. However, some interventions may have started earlier, without respect to sepsis.
For example, 100% of the patients in this sample had antibiotics (potentially preventive antibiotics), but only 99 (55%) patients received them after TimeZero.

The EHR does not provide a date and time for certain ICD-9 diagnoses. During a hospital stay, all new diagnoses are recorded with the admission date. We know whether a diagnosis was present on admission or not, and thus whether it is a preexisting or new condition, but we do not know precisely when the patient developed the condition during the hospitalization. For this reason, we are unable to detect whether the SSC guideline was applied before or after a complication occurred; thus we may underestimate the beneficial effect of some of the recommendations. For example, high levels of lactate are highly related to hypoxia and pulmonary damage. If such patients were checked for lactate after pulmonary distress, we would consider the treatment compliant with the Lactate recommendation, but we would not know that the respiratory distress was already present at the time of the lactate measurement, and we would incorrectly count it as a complication that the guideline failed to prevent.

4. COMPLEX CAUSAL RULE MINING IN IRREGULAR TIME-SERIES DATA

4.1 Introduction
Effective management of human health remains a major societal challenge, as evidenced by the rapid growth in the number of patients with multiple chronic conditions. Type-II Diabetes Mellitus (T2DM), one of those conditions, affects 25.6 million (11.3%) Americans age 20 or older and is the seventh leading cause of death in the United States [1]. Effective treatment of T2DM is frequently complicated by diseases comorbid with T2DM, such as high blood pressure, high cholesterol, and abdominal obesity. Currently, these diseases are treated in isolation, which leads to wasteful duplicate treatments and suboptimal outcomes.
The recent rise in the number of patients with multiple chronic conditions necessitates comprehensive treatment of these conditions to reduce medical waste and improve outcomes.

Finding optimal treatment for patients who suffer from multiple associated diseases, each of which can have multiple available treatments, is a complex problem. We could simply use techniques based on association, but a reasonable algorithm would likely find that the use of a drug is associated with some unfavorable outcome. This does not mean that the drug is harmful; in many cases, it simply means that patients who take the drug are sicker than those who do not, and thus they have a higher chance of the unfavorable outcome. What we really wish to know is whether a treatment causes an unfavorable outcome, as opposed to being merely associated with it.

The difficulty in quantifying the effect of interventions on outcomes stems from subtle biases. Suppose we wish to quantify the effect of a cholesterol-lowering agent, a statin, on diabetes. We could simply compare the proportion of diabetic patients in the subpopulation that takes the statin and the subpopulation that does not, and estimate the effect of the statin as the difference between the two proportions. This method would give the correct answer only if the statin-taking and non-statin-taking patients were identical in all respects that influence the diabetes outcome. We refer to this situation as the treated and untreated patients being comparable. Unfortunately, statin-taking patients are not comparable to non-statin-taking patients, because they take the statin to treat high cholesterol, which in and of itself increases the risk of diabetes. High cholesterol confounds the effect of the statin. Many different sources of bias exist; confounding is just one of many. In this manuscript, we address several different sources of bias, including confounding.
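The statin story above can be made concrete with a toy simulation, using entirely made-up probabilities: high cholesterol both raises the chance of receiving a statin and raises the risk of diabetes, while the statin itself has (in this simulation) no effect at all. The naive exposed-vs-unexposed comparison still "finds" an effect; stratifying on the confounder recovers the true null.

```python
# Toy confounding demonstration: statin assignment and diabetes risk both
# depend on high cholesterol, but statin has no causal effect on diabetes.
# All probabilities are invented for illustration.

import random

rng = random.Random(0)
patients = []
for _ in range(20000):
    high_chol = rng.random() < 0.5
    statin = rng.random() < (0.8 if high_chol else 0.1)
    diabetes = rng.random() < (0.30 if high_chol else 0.05)  # statin plays no role
    patients.append((high_chol, statin, diabetes))

def rate(group):
    return sum(d for _, _, d in group) / len(group)

# Naive comparison: statin takers vs non-takers, ignoring cholesterol.
naive = rate([p for p in patients if p[1]]) - rate([p for p in patients if not p[1]])

# Stratify on the confounder, then average the within-stratum differences.
strata_diffs = []
for chol in (True, False):
    s = [p for p in patients if p[0] == chol]
    diff = rate([p for p in s if p[1]]) - rate([p for p in s if not p[1]])
    strata_diffs.append(diff)
adjusted = sum(strata_diffs) / len(strata_diffs)
# naive is clearly positive; adjusted is close to the true effect of zero
```

Stratification is the simplest correction; the propensity score methods discussed in this paper generalize the same idea to many covariates.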
Techniques to address such biases in causal effect estimation exist. However, these techniques have been designed to quantify the effect of a single intervention. In trying to apply them to our problem of finding optimal treatment for patients suffering from varying sets of diseases, we face two challenges.

First, patients with multiple conditions will likely need a combination of drugs. Quantifying the effect of multiple concurrent interventions is semantically different from considering only a single intervention. The key concept in estimating the effect of an intervention is comparability: to estimate the effect of an intervention, we need two groups of patients who are identical in all relevant aspects except that one group receives the intervention and the other group does not. For a single intervention, the first group typically consists of the sickest patients who still do not get treated and the second group consists of the healthiest patients who do get treated; they are in reasonably similar states of health. However, when we go from a single intervention to multiple interventions and try to estimate their joint effect, comparability no longer exists. A patient requiring multiple simultaneous interventions is so fundamentally different from a patient who does not need any intervention that the two are not comparable.

The other key challenge in finding optimal intervention sets for patients with combinatorial sets of diseases is the combinatorial search space. Even if we could trivially extend the methods for quantifying the effect of a single intervention to a set of concurrent interventions, we would have to systematically explore a combinatorially large search space. The association rule mining framework [2] provides an efficient solution for exploring combinatorial search spaces; however, it only detects associative relationships. Our interest is in causal relationships.
In this manuscript, we propose causal rule mining, a framework for transitioning from association rule mining towards causal inference in subpopulations. Specifically, given a set of interventions and a set of items to define subpopulations, we wish to find all subpopulations in which effective intervention combinations exist, and in each such subpopulation, we wish to find all intervention combinations such that dropping any intervention from the combination will reduce the efficacy of the treatment. We call these closed intervention sets, not to be confused with closed itemsets. As a concrete example, interventions can be drugs and subpopulations can be defined in terms of diseases; for each subpopulation (set of diseases), our algorithm would return effective drug cocktails of increasing numbers of constituent drugs, where leaving out any drug from a cocktail reduces the efficacy of the treatment. Closed intervention sets allow us to go from estimating a single intervention to estimating multiple interventions.

To address the exploration of the combinatorial search space, we propose a novel frequency-based antimonotonic pruning strategy enabled by the closed intervention set concept. The essence of the antimonotonic property is that if a set I of interventions does not satisfy a criterion, none of its supersets will. The proposed pruning strategy based on closed interventions is strictly more efficient than the traditional pruning strategy used by the Apriori algorithm [2].

Underneath our combinatorial exploration algorithm, we utilize the Rubin-Neyman model of causation [3]. This model sets two conditions for causation: a set X of interventions causes a change in Y iff X happens before Y and Y would be different had X not occurred. The unobservable outcome of what would have happened had a treated patient not received treatment is a potential outcome and needs to be estimated.
We present and compare five methods for estimating these potential outcomes and describe the biases these methods can correct. Typically, the ground truth for the effect of drugs is not known. In order to assess the quality of the estimates, we conduct a simulation study utilizing five different synthetic data sets, each of which introduces a new source of bias. We evaluate the effect of each bias on the five proposed methods, underscoring the statements with rigorous proofs when possible.

We also evaluate our work on a real clinical data set from Mayo Clinic. We have data for over 52,000 patients with 13 years of follow-up time. Our outcome of interest is 5-year incident T2DM, and we wish to extract patterns of interventions for patients suffering from combinations of common comorbidities of T2DM. First, we evaluate our methodology in terms of computational cost, demonstrating the effectiveness of the pruning methodologies. Next, we evaluate the patterns qualitatively, using patterns involving statins. We show that our methodology extracted patterns that allow us to explain the controversial findings surrounding statins [4].

Contributions. (1) We propose a novel framework for extracting causal rules from observational data, correcting for a number of common biases. (2) We introduce the concept of closed intervention sets to extend the quantification of the effect of a single intervention to a set of concurrent interventions, sidestepping the patient comparability problem. Closed intervention sets also allow for a pruning strategy that is strictly more efficient than the traditional pruning strategy used by the Apriori algorithm [2]. (3) We compare five methods of estimating causal effect from observational data that are applicable to our problem and rigorously evaluate them on synthetic data to mathematically prove (when possible) why they work.
4.2 Background: Association Rule Mining

We first briefly review the fundamental concepts of association rule mining and extend these concepts to causal rule mining in the next section.

Consider a set I of items, which are single-term predicates evaluating to 'true' or 'false'. For example, {age > 55} can be an item. A k-itemset is a set of k items, evaluated as the conjunction (logical 'and') of its constituent items. Consider a dataset D = {d_1, d_2, ..., d_n}, which consists of n observations. Each observation, denoted by D_j, is a set of items. An itemset I = {i_1, i_2, ..., i_k} (I ⊂ I) supports an observation D_j if all items in I evaluate to 'true' in the observation. The support of I is the fraction of the observations in D that support I. An itemset is frequent if its support exceeds a pre-defined minimum support threshold. An association rule is a logical implication of the form X ⇒ Y, where X and Y are disjoint itemsets. The support of a rule is support(X ∪ Y) and the confidence of the rule is

  conf(X ⇒ Y) = support(X ∪ Y) / support(X) = P(Y | X).

4.2.1 Causal Rule Mining

Given an intervention itemset X and an outcome item Y, such that X and Y are disjoint, a causal rule is an implication of the form X → Y, suggesting that X causes a change in Y. Let the itemset S define a subpopulation, consisting of all observations that support S; that is, all observations for which all items in S evaluate to 'true'. The causal rule X → Y | S implies that the intervention X has a causal effect on Y in the subpopulation defined by S. The quantity of interest is the causal effect, which is the change in Y in the subpopulation S caused by X. We will formally define the metric used to quantify the causal effect shortly.

Rubin-Neyman Causal Model. X has a causal effect on Y if (i) X happens earlier than Y and (ii) if X had not happened, Y would be different [3].
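The support and confidence computations defined in Section 4.2 are easy to state in code. The following is a minimal Python sketch; the item names and toy dataset are invented for illustration:

```python
def support(itemset, data):
    """Fraction of observations in which every item of `itemset` is 'true'."""
    itemset = set(itemset)
    return sum(1 for obs in data if itemset <= obs) / len(data)

def confidence(x, y, data):
    """conf(X => Y) = support(X union Y) / support(X) = P(Y | X)."""
    return support(set(x) | set(y), data) / support(x, data)

# Toy dataset: each observation is the set of items that evaluate to 'true'.
data = [
    {"age>55", "statin"},
    {"age>55", "statin", "T2DM"},
    {"age>55"},
    {"statin", "T2DM"},
]
print(support({"statin"}, data))               # 0.75
print(confidence({"statin"}, {"T2DM"}, data))  # 0.666...
```

Representing each observation as the set of items that hold in it makes support a simple subset test, which is all the later pruning conditions require.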
Our study design ensures that the intervention X precedes the outcome Y, but fulfilling the second condition requires that we estimate the outcome for the same patient both under intervention and without intervention.

Potential Outcomes. Every patient in the dataset has two potential outcomes: Y_0 denotes their outcome had they not had the intervention X, and Y_1 denotes their outcome had they had the intervention. Typically, only one of the two potential outcomes can be observed. The observable outcome is the actual outcome (denoted by Y) and the unobservable potential outcome is called the counterfactual outcome.

Using the definition of the counterfactual outcome, we can now define the metric for estimating the change in Y caused by X. Average Treatment effect on the Treated (ATT) is a widely known metric in the causal literature and is computed as follows:

  ATT(X → Y | S) = E[Y_1 − Y_0]_{X=1} = E[Y_1]_{X=1} − E[Y_0]_{X=1},

where E denotes the expectation and the X = 1 in the subscript signals that we only evaluate the expectation over the treated patients (X = 1). ATT aims to compute the average per-patient change caused by the intervention; Y_0 = Y_1 indicates that the intervention resulted in no change in outcome for the patient.

Biases. Besides X, numerous other variables can also exert influence over Y, leading to biases in the estimates. To correct for these biases, we have to correctly account for these other effects. The quintessential tool for this purpose is the causal graph, depicted in Figure 4. The nodes of this graph are sets of variables that play a causal role and the edges are causal effects. This is not a correlation graph (or dependence graph) because, for example, U and Z are dependent given X, yet there is no edge between them.
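To make the ATT definition concrete, the sketch below computes it over a toy cohort in which, unlike in real observational data, both potential outcomes are known for every patient; the records are invented for illustration:

```python
# Each record: (treated, y1, y0), i.e. the outcome with and without the
# intervention. In observational data one of the two is always counterfactual;
# here we pretend both are known so the ATT can be computed directly.
records = [
    (1, 0, 1),  # treated; would have developed the outcome without treatment
    (1, 0, 0),
    (1, 1, 1),  # Y0 == Y1: the intervention changed nothing for this patient
    (0, 1, 0),  # untreated patients do not enter the ATT
]
treated = [(y1, y0) for x, y1, y0 in records if x == 1]
# ATT = E[Y1 - Y0 | X = 1]
att = sum(y1 - y0 for y1, y0 in treated) / len(treated)
print(att)  # -1/3: on average the intervention prevented the outcome
```

The estimation methods of Section 4.3 differ only in how they fill in the unobservable column of this table.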
Variables (items in I) can exert influence on the effect of X on Y in three ways: they may only influence X, they may only influence Y, or they may influence both X and Y. Accordingly, variables can be categorized into four categories:
- V are variables that directly influence Y and thus have a direct effect on Y;
- U are variables that only influence Y through X and thus have an indirect effect on Y;
- Z are variables that influence both X and Y and are called confounders; and finally
- O are variables that influence neither X nor Y and hence can be safely ignored.

Figure 4: Rubin-Neyman Causal Model

Most of the causal inference literature assumes that the causal graph is known and true. In other words, we know a priori which variables fall into each of the categories U, Z, V and O. In our case, only X and Y are specified and we have to infer which category each other variable (item) belongs to. Since this inference relies on association (dependence) rather than causation, the discovered graph may contain errors: misclassifications of variables into the wrong category. For example, because of the marginal dependence between U and Y, variables in U can easily be misclassified as Z. Such misclassifications do not necessarily lead to biases, but they can cause loss of efficiency.

Problem Formulation. Given a data set D, a set S of subpopulation-defining items, a set X of intervention items, a minimum support threshold θ and a minimum effect threshold η, we wish to find all subpopulations S (S ⊂ S) and all interventions X (X ⊂ X), with X and S disjoint, such that the causal rule X → Y | S is frequent and its intervention set X is closed w.r.t. our metric of causal effect, ATT.

Note that the meaning of θ, the minimum support threshold, is different from that in the association rule mining literature.
Typically, rules with support less than θ are considered uninteresting, or the threshold is simply a computational convenience; in our case, we set θ to a minimum value such that ATT is estimable for the discovered patterns.

We call a causal rule frequent iff its support exceeds the user-specified minimum threshold θ:

  support(X → Y | S) = support(X ∪ Y ∪ S) = P(X, Y, S) > θ,

and we call an intervention set X closed w.r.t. ATT iff

  ∀ x ∈ X: |ATT(x → Y | S, X \ x)| > η,

where η is the user-specified minimum causal effect threshold. In other words, a causal rule is closed in a subpopulation if its (absolute) effect is greater than that of any of its sub-rules.

Example. In a medical setting, X may be drugs and S could be comorbid diseases. Then X is a drug combination that hopefully treats the set of diseases S. This set of drugs being closed w.r.t. ATT means that dropping any drug from X will reduce the overall efficacy of the treatment; the patient is not taking unnecessary drugs. An itemset is closed if its support is strictly higher than that of all of its superitemsets. Analogously, an intervention set is closed if its absolute causal effect is strictly higher than that of all of its subitemsets.

4.2.2 Frequent Causal Pattern Mining Algorithm

We can now present our algorithm for causal pattern mining. At a very high level, the algorithm comprises two nested frequent pattern enumeration [30] loops. The outer loop enumerates subpopulation-defining itemsets S using items in S, while the inner loop enumerates intervention combinations using items in X \ S. More generally, X and S can overlap, but we do not consider that case in this paper. Effective algorithms to this end exist [31, 32]; we simply use Apriori [2].

Once the patterns are discovered, the ATT of the interventions is computed using one of the methods from Section 4.3, and the frequent, effective patterns are returned.
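Given precomputed single-intervention ATT values within a subpopulation, checking whether an intervention set is closed w.r.t. ATT reduces to a simple test. A minimal sketch, with hypothetical drug names and ATT values (not results from our data):

```python
def is_closed(interventions, att, eta):
    """An intervention set is closed w.r.t. ATT iff dropping any single
    intervention x would change the effect: |ATT(x -> Y | S, X\\x)| > eta."""
    s = frozenset(interventions)
    return all(abs(att[(x, s - {x})]) > eta for x in s)

# Hypothetical ATT of each intervention given the rest of the combination,
# all within one fixed subpopulation S.
att = {
    ("statin", frozenset()): -0.04,
    ("statin", frozenset({"metformin"})): -0.03,
    ("metformin", frozenset({"statin"})): -0.05,
}
print(is_closed({"statin"}, att, 0.02))                # True
print(is_closed({"statin", "metformin"}, att, 0.035))  # False
```

In the second call the combination is not closed: under these made-up numbers, statin adds only 0.03 in absolute effect on top of metformin, below the threshold η = 0.035, so the patient could drop it.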
On the surface, this approach appears very expensive; however, several novel, extremely effective pruning strategies are possible, which we describe below.

Potential Outcome Support Pruning. Let X be an intervention k-itemset, S be a subpopulation-defining itemset, and let X and S be disjoint. Further, let X_{−i} be an itemset that evaluates to 'true' iff all items of X except the i-th are 'true' and the i-th item is 'false'. Using association rule mining terminology, all items in X except the i-th are present in the transaction.

Definition 1 (Potential Outcome Support Pruning). We only need to consider itemsets X such that

  min{support(S, X), support(S, X_{−1}), ..., support(S, X_{−k})} > θ.

In order to be able to estimate the effect of x ∈ X in the subpopulation S, we need observations with x 'true' and also with x 'false' in S.

Lemma 1. Potential Outcome Support Pruning is anti-monotonic.

Proof: Consider a causal rule X → Y | S. If the causal rule X → Y | S is infrequent, then support(X ∪ S) < θ ∨ ∃i: support(X_{−i} ∪ S) < θ. If support(X_{−i} ∪ S) was insufficient, then any extension of it with an intervention item x will continue to have insufficient support, and thus the rule X ∪ x → Y | S will have insufficient support. Likewise, if support(X ∪ S) was insufficient, then any extension of it with an intervention item x will also have insufficient support.

Pruning based on Causal Effect.

Proposition 1. The effective causal rule pruning condition is anti-monotonic.

Rationale: To explain the rationale, let us return to the medical example, where X is a combination of drugs forming a treatment. Assuming that the effects of drugs are additive, if a causal rule X → Y | S is ineffective because ∃ x_i ∈ X: |ATT(x_i → Y | S, X \ x_i)| < η, then a new rule X ∪ x_j → Y | S will also be ineffective, because |ATT(x_i → Y | S, x_j, X \ x_i)| will also fall below the threshold.
In the presence of positive interactions among the drugs (drugs that reinforce each other's effect), this statement may not hold true. Beyond the statistical reasoning, one can also question why a patient should receive a drug that has no effect within a combination.

4.3 Causal Estimation Methods

ATT, our metric of interest, with respect to a single intervention x in a subpopulation S is defined as

  ATT(x → Y | S) = E[Y_1 − Y_0]_{S,X=1},

which is the expected difference between the potential outcome under treatment, Y_1, and the potential outcome without treatment, Y_0, in patients with S who actually received treatment. Since we consider treated patients, the potential outcome Y_1 can be observed; the potential outcome Y_0 cannot. Thus at least one of the two must be estimated. The methods we present below differ in which potential outcome they estimate and how they estimate it.

For the discussion below, we consider the variables X, Z, U and V from the causal graph in Figure 4. X is a single intervention; U, V and Z can be sets of items. For regression models, we denote the matrices defined by U, V and Z in the subpopulation S as U, V and Z (same letter as the variable sets).

Counterfactual Confidence (CC). This is the simplest method. We simply assume that the patients who receive the intervention (X = 1) and those who do not (X = 0) do not differ in any important respect that would influence Y. Under this assumption, Y_1 in the treated is simply the actual outcome in the treated, and the potential outcome Y_0 is simply the actual outcome in the non-treated (X = 0). Thus

  ATT = conf((X = 1) → Y | S) − conf((X = 0) → Y | S)
      = P(Y | S, X = 1) − P(Y | S, X = 0).

In what follows, to improve readability, we drop the S subscript; all evaluations take place in the subpopulation S.

Direct Adjustment (DA).
We cannot estimate Y_0 in the treated (X = 1) as the actual outcome Y in the untreated, because the treated and untreated populations can significantly differ in variables such as Z and V that influence Y. In Direct Adjustment, we attempt to directly remove the effect of V and Z by including them in a regression model. Since a regression model relates the means of the predictors to the mean of the outcome, we can remove the effect of V and Z by setting their means to 0. Let R be a generalized linear regression model predicting Y via a link function g:

  g(Y | V, Z, X) = β_0 + β_V V + β_Z Z + β_X X.

Then the (link-transformed) potential outcome under treatment is g(Y_1) = β_0 + β_V V + β_Z Z + β_X and the potential outcome without treatment is g(Y_0) = β_0 + β_V V + β_Z Z. The ATT is then

  ATT = E[g^{-1}(Y_1 | V, Z, X = 1)]_{X=1} − E[g^{-1}(Y_0 | V, Z, X = 0)]_{X=1},

where g^{-1}(Y_1 | V, Z, X = 1) is the prediction for an observation with the observed V and Z but with X set to 1. The E(·)_{X=1} notation signifies that the expectations of the predictions are taken only over patients who actually received the treatment.

The advantages of DA (over CC) are manifold. First, it can adjust for Z and V as long as the model specification is correct, namely that the interaction terms that may exist among Z and V are specified correctly. Second, we get correct estimates even if we ignore U, because U is conditionally independent of Y given X. Unfortunately, this is only a theoretical advantage: we have to infer from the data whether a variable is a predictor of Y, and U is marginally dependent on Y, so we will likely adjust for U even if we do not need to.

Counterfactual Model (CM). In this technique, we build an explicit model for the potential outcome without treatment, Y_0, using patients with X = 0. Specifically, we build a model

  g(Y | V, Z, X = 0) = β_0 + β_V V + β_Z Z
and estimate the potential outcome as g(Y_0 | V, Z) = g(Y | V, Z, X = 0). The ATT is then

  ATT = P(Y | X = 1) − E[g^{-1}(Y_0 | V, Z)]_{X=1}.

Similarly to Direct Adjustment, the Counterfactual Model does not depend on U. However, in the case of the Counterfactual Model, we only consider the population with X = 0. In this population, U and Y are independent, thus we will not include the U variables in the model.

Propensity Score Matching (PSM). The central idea of Propensity Score Matching is to create a new population such that patients in this new population are comparable in all relevant respects, and thus the expectation of the potential outcome in the untreated equals the expectation of the actual outcome in the untreated.

Patients are matched based on their propensity of receiving treatment. This propensity is computed by a logistic regression model with treatment as the dependent variable:

  log [ P(X) / (1 − P(X)) ] = β_0 + β_V V + β_Z Z.

Patient pairs are formed such that in each pair one patient received treatment and the other did not, and their propensities for treatment differ by no more than a user-defined caliper difference ρ. The matched population has an equal number of treated and untreated patients and is balanced on V and Z; thus the patients are comparable in terms of their baseline risk of Y. Hopefully, the only factor causing a difference in outcome is the treatment.

For estimating ATT, the potential outcome without treatment is estimated from the actual outcomes of the patients in the matched population who did not receive treatment:

  ATT = E[Y_1 − Y_0] = P(Y | X = 1, M) − P(Y | X = 0, M),

where M denotes the matched population.
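The matching step can be sketched as greedy 1:1 caliper matching. To keep the sketch self-contained we use a single binary confounder z, for which the propensity P(X = 1 | z) can be estimated by counting instead of fitting the logistic model described above; the records are invented for illustration:

```python
def psm_att(data, caliper=0.05):
    """Greedy 1:1 propensity score matching; returns the ATT estimate
    P(Y | X=1, M) - P(Y | X=0, M) over the matched population M."""
    # With one binary confounder, the propensity P(X=1 | z) is a count ratio.
    ps = {}
    for z in (0, 1):
        grp = [d for d in data if d["z"] == z]
        ps[z] = sum(d["x"] for d in grp) / len(grp)
    pool = [d for d in data if d["x"] == 0]  # candidate controls
    diffs = []
    for t in (d for d in data if d["x"] == 1):
        # closest control by propensity, accepted only within the caliper
        best = min(pool, key=lambda c: abs(ps[c["z"]] - ps[t["z"]]), default=None)
        if best is not None and abs(ps[best["z"]] - ps[t["z"]]) <= caliper:
            pool.remove(best)  # match without replacement
            diffs.append(t["y"] - best["y"])
    return sum(diffs) / len(diffs)

data = ([{"z": 1, "x": 1, "y": y} for y in (0, 0, 0, 1)]
        + [{"z": 1, "x": 0, "y": y} for y in (1, 1, 0, 0)]
        + [{"z": 0, "x": 1, "y": y} for y in (0, 0)]
        + [{"z": 0, "x": 0, "y": y} for y in (0, 0, 0)])
print(round(psm_att(data), 3))  # negative: treated fare better than matched controls
```

Because both propensity levels differ by more than the caliper, every treated patient here is matched to a control with the same z, which is exactly the balance PSM aims for.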
Among the methods we consider, Propensity Score Matching most strictly enforces the patient comparability criterion; however, it is susceptible to misspecification of the propensity regression model, which can erode the quality of the matching.

Stratified Non-Parametric (SN). In the stratified estimation, we directly compute the expectation via stratification. The assumption is that the patients in each stratum are comparable in all relevant respects and only differ in the presence or absence of the intervention. In each stratum, we can estimate the potential outcome Y_0 in the treated as the actual outcome Y in the untreated:

  ATT = E[Y_1 − Y_0]_{X=1}
      = Σ_l P(l | X = 1) [P(Y_1 | l, X = 1) − P(Y_0 | l, X = 1)]
      = Σ_l P(l | X = 1) [P(Y | l, X = 1) − P(Y | l, X = 0)],

where l iterates over the combined levels of V and Z. If we can identify the items that fall into U, then we can ignore them; otherwise, we should include them in the stratification as well.

The stratified method makes very few assumptions and should arrive at the correct estimate as long as each of the strata is sufficiently large. The key disadvantage of the stratified method lies in stratification itself: when the number of items across which we need to stratify is too large, we may end up dividing the population into excessively many small subpopulations (strata) and become unable to estimate the causal effect in many of them, thus introducing bias into the estimate.

4.4 Results

After describing our data and study design, we present three evaluations of the proposed methodology. The first evaluation demonstrates the computational efficiency of our pruning methodologies, isolating the effect of each pruning method: (i) Apriori support-based pruning, (ii) Potential Outcome Support Pruning, and (iii) Potential Outcome Support Pruning in conjunction with Effective Causal Rule Pruning.
In the second evaluation, we provide a qualitative assessment, looking at patterns involving statin. We attempt to use the extracted patterns to explain the controversial findings that exist in the literature regarding the effect of statin on diabetes. Finally, in order to compare the treatment effect estimates to a ground truth, which does not exist for real drugs, we simulate a data set using proportions derived from the Mayo Clinic data set.

Data and Study Design. In this study we utilized a large cohort of Mayo Clinic patients with data between 1999 and 2013. We included all adult patients (69,747) with research consent. The baseline of our study was set at Jan. 1, 2005. We collected lab results, medications, vital signs and status, and medication orders during a 6-year retrospective period between 1999 and the baseline to ascertain the patients' baseline comorbidities. From this cohort, we excluded all patients with a diagnosis of diabetes before the baseline (478 patients), patients with missing fasting plasma glucose measurements (14,559 patients), patients whose lipid health could not be determined (1,023 patients) and patients with unknown hypertension status (498 patients). Our final study cohort consists of 52,139 patients who were followed until the summer of 2013.

Patients were phenotyped during the retrospective period. Comorbidities of interest include Impaired Fasting Glucose (IFG), abdominal obesity, Hypertension (HTN; high blood pressure) and Hyperlipidemia (HLP; high cholesterol). For each comorbidity, the phenotyping algorithm classified patients into three broad levels of severity: normal, mild and severe. Normal patients show no sign of disease; mild patients are either untreated and out of control or are controlled using first-line therapy; severe patients require more aggressive therapy.
IFG is categorized into normal and pre-diabetic, the latter indicating impaired fasting plasma glucose levels that do not yet meet the diabetes criteria. For this study, progression to T2DM within 5 years from baseline (i.e., Jan. 1, 2005) was chosen as our outcome of interest. Out of the 52,139 patients, 3,627 progressed to T2DM, 41,028 did not, and the remaining 7,484 patients dropped out of the study. In Table 2 we present statistics about our patient population.

4.4.1 Pruning Efficiency

In our work, we proposed two new pruning methods. First, we have Potential Outcome Support Pruning, which aims to eliminate patterns for which the ATT is not estimable. Second, we have Effective Causal Rule Pruning, where we eliminate patterns that do not improve treatment effectiveness relative to their subitemsets.

In Figure 5 we present the number of patterns discovered using (i) the traditional Apriori support-based pruning, (ii) our proposed Potential Outcome Support Pruning (POSP), and (iii) POSP in conjunction with Effective Causal Rule Pruning (ECRP). The number of patterns discovered by POSP is strictly less than the number of patterns discovered by Apriori pruning. POSP in conjunction with ECRP is very effective.

4.4.2 Statin

In this section, we demonstrate that the proposed causal rule mining methodology can be used to discover non-trivial patterns from the above diabetes data set. In recent years, statins, a class of cholesterol-lowering agents, have been prescribed increasingly. High cholesterol (hyperlipidemia) is linked to cardiovascular mortality, and the efficacy of statins in reducing cardiovascular mortality is well documented.
How ever, as evidenced by a 2013 BMJ editorial [4] devoted to this topic, statins are sur- T2DM Present Absent T otal Number of Patien ts 3627 41028 Averag e Age 44.73 35.58 Male(%) 51 41 F emale(%) 49 59 Patien t Diagnosis Status (%) NormFG 42 84 PreDM 58 16 Normal Ob esity 29 59 Mild Obesity 25 30 Severe Obesity 46 11 Normal Hyp ertension 48 69 Mild Hypertension 33 23 Severe Hypertension 19 08 Normal Hyp erlipidemia 12 29 Mild Hyperlipidemia 72 64 Severe Hyperlipidemia 16 07 Patien t Medication Status(%) Statin 26 11 Fibrates 03 01 Cholesterol.Other 02 01 Acerab 17 07 Diuret 18 07 CCB 08 04 BetaBlocke rs 22 10 HTN.Others 01 01 T able 2: Demographics statistics of patien t p opula- tion Figure 5: Comparison of Pruning T echniques rounded in contro versy . In patien ts with normal bloo d sugar lev els (lab eled as NormalFG), statins hav e a detrimental ef- fect, they increase the risk of diab etes; y et in pre-diabetic patien ts (PreDM), it appears to hav e no effect. What we demonstrate b elo w is that this phenomenon is simply dis- ease heterogeneity . First, w e describe how this problem maps to the causal rule mining problem. Our set of interv ent ions ( X ) consists of statin and our subp opulation defining v ariables consist of the v arious levels of HTN, HLP and IF G ( S ). Our inter- est is the effect of statin ( x ) on T2DM ( Y ) in all p ossible subpopulations S , S ⊂ S . In this setup, HTN, which is associated with b oth hy- perlipidemia (and statin use), as w ell as with T2DM, is a confounder ( Z ). A c holesterol drug, other than statin, (sa y) fibrates, are in the U category: they are predictiv e of statin (patients on monotherap y who take fibrates do not tak e statins), but hav e no effect on Y , b ecause its effect is already incorp orated into the hyperlipidemia sev erity v ari- ables that defined the subp opulation. V ariables that only influence diabetes but not statin use (say a diab etes drug) w ould fall into the V category . 
All subpopulations have variables that fall into Z and U, and some subpopulations may also have V.

The HLP variable in Table 2 uses statin as part of its definition, thus we constructed two new variables. The first one is HLP1, a variable at the borderline between HLP-Normal and HLP-Mild, consisting of untreated patients with mildly abnormal lab results (these fall into HLP-Normal) and patients who are diagnosed and receive a first-line treatment (these fall into HLP-Mild). Comparability is the central concept in estimating causal effects, and these patients are comparable at baseline. Similarly, we also created another variable, HLP2, which is at the border of HLP-Mild and HLP-Severe, again consisting of patients who are comparable in the relevant aspects of their health at baseline.

Table 3: ATT due to statin in various subpopulations S as estimated by the five proposed methods

  S              CC      DA      CM      PSM     SN
  PreDM          0.145   0.022   0.010   0.022   0.017
  NormFG         0.060   0.023   0.034   0.017   0.029
  HLP1           0.078   0.019   0.014   0.010   0.010
  HLP2           0.021  -0.013  -0.010  -0.021  -0.015
  PreDM,HLP1     0.067   0.018   0.021   0.004   0.002
  PreDM,HLP2     0.001  -0.038  -0.031  -0.048  -0.043
  NormFG,HLP1    0.043   0.020   0.015   0.014   0.013
  NormFG,HLP2    0.017  -0.002  -0.002  -0.005  -0.004

Table 3 presents the ATT estimates obtained by the various methods proposed in Section 4.3 for some of the most relevant subpopulations. Negative ATT indicates a beneficial effect and positive ATT indicates a detrimental effect.

Counterfactual Confidence (CC) estimates statin to be detrimental in all subpopulations. While statins are known to have a detrimental effect in patients with normal glucose levels [4], it is unlikely that statins are universally detrimental, even in patients with severe hyperlipidemia, the very disease they are supposed to treat. The results from DA, CM, PSM and SN are similar, with PSM and SN having larger effect sizes in general.
The picture that emerges from these results is that patients with severe hyperlipidemia appear to benefit from statin treatment even in terms of their diabetes outcomes, while statin treatment is moderately detrimental for patients with mild hyperlipidemia.

Bootstrap estimation was used to compute the statistical significance of these results. For brevity, we report the results only for PSM. The estimates are significant in the following subpopulations: NormFG and PreDM+HLP2 (p-values < .001) and NormFG+HLP1 (p-value .05). The true ATT in these subpopulations is not known. To investigate the accuracy that the various methods achieve, we use simulated data that is largely based on this example [4, 33].

4.4.3 Synthetic Data

In this section, we describe four experiments utilizing synthetic data sets, each of which introduces a new potential source of bias. Our objective is to illustrate the ability of the five methods from Section 4.3 to adjust for these biases. We compare their ATT estimates to the true ATT used to generate the data set and discuss the reasons for their success or failure.

The rows of Table 4 correspond to the synthetic data sets in increasing order of the biases we introduced, and the columns correspond to the methods: Conf (confidence), CC (Counterfactual Confidence), DA (Direct Adjustment), CM (Counterfactual Model), PSM (Propensity Score Matching) and SNP (Stratified Non-Parametric). Some of these methods, namely DA, CM, PSM and SNP, take the causal graph structure into account while estimating ATT. Specifically, they require information on whether a variable is a confounder (Z), has a direct effect (V), an indirect effect (U), or no effect (O). PSM and SNP are not sensitive to the inclusion of superfluous variables; such variables simply decrease the methods' efficiency.
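The bootstrap significance estimation used above for the PSM results can be sketched as a percentile bootstrap over matched-pair outcome differences. The pair counts below are hypothetical and our actual procedure may differ in detail; this is only an illustration of the idea:

```python
import random

def bootstrap_ci(diffs, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean of `diffs`."""
    means = []
    for _ in range(n_boot):
        resample = [random.choice(diffs) for _ in diffs]  # sample w/ replacement
        means.append(sum(resample) / len(resample))
    means.sort()
    return (means[int(alpha / 2 * n_boot)],
            means[int((1 - alpha / 2) * n_boot) - 1])

random.seed(0)
# Hypothetical matched-pair differences (treated outcome minus control outcome):
# 12 pairs where only the treated patient had the outcome, 28 where only the
# control did, and 70 with no difference; the mean plays the role of the ATT.
diffs = [1] * 12 + [0] * 70 + [-1] * 28
lo, hi = bootstrap_ci(diffs)
print(round(lo, 3), round(hi, 3))  # the effect is significant if 0 lies outside
```

The same resampling distribution can equally be used to attach a p-value to the null hypothesis of zero ATT.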
In all of the data sets, we use notation consistent with Figure 1: Z is the central disease with outcome Y; X is the intervention of interest that treats Z; V is another disease with a direct causal effect on Y, but V is independent of X; and U is a third disease, which can be treated with X but has no impact on Y. All data sets contain 5000 observations.

I. Direct Causal Effect from V. We assume that every patient in the cohort has disease Z at the same severity; they are all comparable w.r.t. Z. 30% of the patients are subject to the intervention X aimed at treating Z, while the others are not. Untreated patients face a 25% chance of having Y, while treated patients only have a 10% chance. Some patients, 20% of the population, also have disease V, which directly affects Y: it increases the probability of Y by 5%. In this example the true ATT is -.15, as X reduces the chance of Y by 15%. Our causal graph dictates that X and V be marginally independent, hence this effect is homogeneous across the levels of V. (Otherwise V would become predictive of X and it would become a confounder. Confounding is discussed in experiments III-V.)

All methods estimated the ATT correctly, because the ATT does not depend on V. We can demonstrate this by stratifying on V and using the marginal independence of X and V:

  ATT = E[P(Y | X=1) - P(Y | X=0)]
      = Σ_v P(V=v) [P(Y | V=v, X=1) - P(Y | V=v, X=0)]
      = Σ_v [P(Y, V=v | X=1) - P(Y, V=v | X=0)]
      = P(Y | X=1) - P(Y | X=0),

where v denotes the levels of V. The marginal independence of X and V is used in the third step:

  P(Y | V, X) = P(Y, V, X) / P(V, X) = P(Y, V | X) P(X) / (P(X) P(V)) = P(Y, V | X) / P(V).

II. Indirect Causal Effect. The setup for this experiment is the same as for the Direct Causal Effect experiment, except that we have disease U instead of V.
Just like Z, disease U is also treated by X, but U has no direct effect on Y; its effect is indirect, through X. U is thus independent of Y given X. The true ATT continues to be -.15. Again, the ATT does not depend on U, hence all methods estimated it correctly. To demonstrate that the ATT does not depend on U, we use stratification and the conditional independence of Y and U:

  ATT = E[P(Y | X=1) - P(Y | X=0)]
      = Σ_u [P(Y | U=u, X=1) P(U=u | X=1) - P(Y | U=u, X=0) P(U=u | X=0)]
      = Σ_u [P(Y | X=1) P(U=u | X=1) - P(Y | X=0) P(U=u | X=0)]
      = P(Y | X=1) Σ_u P(U=u | X=1) - P(Y | X=0) Σ_u P(U=u | X=0)
      = P(Y | X=1) - P(Y | X=0).

III. Confounding. In this experiment, we consider the simplest case of confounding, involving a single disease Z, a single treatment X and an outcome Y. 20% of the patients have disease Z, and 95% of the diseased patients are treated with X, while 5% are not. All treated patients have Z. 25% of the untreated patients (Z=1 and X=0) have outcome Y; 10% of the treated patients (Z=1 and X=1) have the outcome; and only 5% of the healthy patients (Z=0) have it. The true ATT is -.15.

In the presence of confounding, the counterfactual confidence and the ATT do not coincide. With z denoting the levels of Z and P(z) being a shorthand for P(Z=z),

  ATT = E[P(Y | X=1) - P(Y | X=0)]
      = Σ_z P(z) [P(Y | X=1, z) - P(Y | X=0, z)],

while the counterfactual confidence (CC) is

  CC = P(Y | X=1) - P(Y | X=0)
     = Σ_z [P(Y | X=1, z) P(z | X=1) - P(Y | X=0, z) P(z | X=0)].

When P(z | X) ≠ P(z), these quantities do not coincide. However, any method that can estimate P(Y | X, Z) for all levels of Z and X will arrive at the correct ATT estimate. We used logistic regression in our implementation of the Direct Adjustment method; it can estimate P(Y | X, Z) when X and Z have no interactions.
Note that the causal graph admits interaction between X and Z, thus model misspecification can bias the estimate.

IV. Confounding with Indirect Effect. In addition to the setup of the Confounding experiment, we also have an indirect causal effect from U. We now have two diseases, Z and U, each of which can be treated with X. 20% of the population has Z and, independently, 20% has U. 25% of the patients who have Z and receive no treatment (X=0) have Y, while only 10% of the treated (X=1) patients have it, regardless of whether the patient has U. (If the probability of Y were affected by U, U would be another confounder, rather than have an indirect effect.) X has a beneficial ATT of -.15 in patients with Z=1 (and X=1) and has no effect in patients with Z=0 (who get X because of U). Thus the true ATT is -.0833.

In this experiment, the counterfactual model was the best-performing model. The counterfactual model estimates the ATT through the definition

  ATT = E[P(Y_1 | X=1) - P(Y_0 | X=1)],

where Y_0 is the potential outcome the patient would have without treatment (X=0) and P(Y_0 | X=1) is the counterfactual probability of Y (the probability of Y had they not received X) in the population that actually got X=1. Note that the potential outcome Y_1 | X=1 in the patients who actually got X=1 is the observed outcome Y | X=1. With u and z denoting the levels of U and Z, respectively, and P(u) being a shorthand for P(U=u),

  ATT = E[P(Y | X=1) - P(Y_0 | X=1)]
      = Σ_u Σ_z P(u, z) [P(Y | X=1, u, z) - P(Y_0 | X=1, u, z)]
      = Σ_z P(z) [P(Y | X=1, z) - P(Y_0 | X=1, z)]
      = Σ_z P(z) [P(Y | X=1, z) - P(Y | X=0, z)],

which coincides with the data generation mechanism, hence the estimate is correct.
In the derivation, the second step holds because U and Z are independent given X, and the third step uses the fact that the counterfactual model estimates P(Y_0 | X=1, z, u) from the untreated patients, thus P(Y_0 | X=1, z, u) = P(Y | X=0, z, u) = P(Y | X=0, z).

V. Confounding with Direct and Indirect Effects. In this experiment, we have three diseases: our index disease Z, which is a confounder; U, which has an indirect effect on Y via X; and V, which has a direct effect on Y. 20% of the population has each of Z, V and U, independently. 95% of patients with Z or U get the intervention X. 25% of the untreated patients with Z get Y, while only 10% of the treated patients do, regardless of whether they have U. Patients with V face a 5% increase in their chance of experiencing the outcome Y. X has a beneficial ATT of -.15 in patients with Z=1 and has no effect in patients with Z=0 (who get X because of U). Whether a patient has V does not influence the effect of X. The true ATT is thus -.0833.

None of the methods estimated the effect correctly, but Propensity Score Matching came closest. An analytic derivation of why it performed well is outside the scope of this paper; in essence, its success is driven by its ability to maximally exploit the independence relationships encoded in the causal graph. It can ignore V when it constructs the propensity score model, because X and V are independent (when Y is not given), and it can ignore U and V when it computes the ATT in the propensity-matched population. On the other hand, the causal graph admits interaction among U, Z and X, thus a logistic regression model used as the propensity score model can be subject to model misspecification.

The Stratified Non-Parametric method, which is essentially just a direct implementation of the definition of the ATT, underestimated the ATT by almost 25%.
The reason lies in the excessive stratification across all combinations of the levels of U, V, and Z. Even with just three variables, most strata did not have sufficiently many patients (either treated or untreated) to estimate P(Y | X, u, v, z). In the discussion, we describe remedies to overcome this problem.

          Conf     CC       DA       CM       PSM      SN
  I.      +.110    -.150    -.150    -.150    NA       -.150
  II.     +.099    -.150    -.150    -.150    -.151    -.149
  III.    +.099    +.047    -.136    -.136    -.136    -.136
  IV.     +.077    +.024    -.019    -.083    -.068    -.064
  V.      +.072    +.038    -.037    -.105    -.074    -.067

Table 4: The ATT estimates by the six methods in the five experiments. The experimental conditions, the full names of the methods and the true ATT values are specified in the text.

5. DISCUSSION AND CONCLUSION

We proposed the causal rule mining framework, which transitions pattern mining from finding patterns that are associated with an outcome towards finding patterns that cause changes in the outcome. Finding causal relationships instead of associations is absolutely critical in health care, but the approach also has appeal beyond health care. The numerous biases that arise in establishing causation make quantifying causal effects difficult. We use the Neyman-Rubin causal model to define causation and the potential outcome framework to estimate the causal effects. We correct for three kinds of potential biases: those stemming from direct causal effects, indirect causal effects and confounding. We compared five different methods for estimating the causal effect, evaluated them on real and synthetic data, and found that three of these methods gave very similar results. We have demonstrated on real clinical data that our proposed method can effectively enumerate causal patterns in a large combinatorial search space, thanks to the two new pruning methods we developed for this work.
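The sparse-strata failure of the Stratified Non-Parametric method noted above is easy to reproduce. In the sketch below, the disease prevalences and the treatment rule mirror experiment V; the variable names, sample size and counting logic are illustrative assumptions, not the authors' code:

```python
import random

# With binary U, V, Z there are 8 strata. Once treatment is strongly
# confounded (patients with Z or U are treated 95% of the time), some
# strata contain no treated or no untreated patients, so the
# treated-vs-untreated contrast cannot be estimated there.
random.seed(1)
n = 2000
strata = {}  # (u, v, z) -> [treated count, untreated count]
for _ in range(n):
    z = random.random() < 0.20
    u = random.random() < 0.20
    v = random.random() < 0.20
    x = (z or u) and random.random() < 0.95   # treatment rule of experiment V
    cell = strata.setdefault((u, v, z), [0, 0])
    cell[0 if x else 1] += 1

unusable = sum(1 for c in strata.values() if c[0] == 0 or c[1] == 0)
print(f"{unusable} of {len(strata)} strata lack treated or untreated patients")
```

The strata with neither Z nor U contain no treated patients by construction, and the rarest strata frequently lack untreated patients, which is the fragmentation problem the stratified estimator runs into.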
We also demonstrated that the patterns discovered from the data were very rich, and we illustrated how the effect of statin differs across subpopulations. The results we found are consistent with the literature but go beyond what is already known about statin's effect on the risk of diabetes.

The discussion and experimental results in this paper provide some general guidance on when to use the different methods we described. We recommend counterfactual confidence if no confounding is suspected, as counterfactual confidence is computationally efficient and can arrive at the correct solution even when direct and indirect effects are present. In the presence of confounding, propensity score matching gave the most accurate results, but due to the need to create a matched population, it has built-in randomness, which increases its variance. Moreover, the counterfactual model as well as the propensity score model are susceptible to model misspecification. If unknown interactions among variables are suspected, we recommend the stratified non-parametric method. With this technique, model misspecification is virtually impossible; however, its sample size requirement is high. The stratified model is suboptimal if we need to stratify across many variables: doing so can fragment the population into strata too small to estimate the effects correctly, and if the estimates use some strata but not others, they may be biased.

Acknowledgments
This study is supported by National Science Foundation (NSF) grant IIS-1344135. Contents of this document are the sole responsibility of the authors and do not necessarily represent the official views of the NSF.

6. ADDITIONAL AUTHORS

7. REFERENCES
[1] National diabetes fact sheet: national estimates and general information on diabetes and prediabetes in the United States, 2011.
Atlanta, GA: US Department of Health and Human Services, Centers for Disease Control and Prevention, 201, 2011.
[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, volume 1215, pages 487-499, 1994.
[3] J. Sekhon. The Neyman-Rubin model of causal inference and estimation via matching methods. The Oxford Handbook of Political Methodology, pages 271-299, 2008.
[4] R. Huupponen and J. Viikari. Statins and the risk of developing diabetes. BMJ, 346, 2013.
[5] V. Didelez and I. Pigeot. Judea Pearl: Causality: Models, reasoning, and inference. Politische Vierteljahresschrift, 42(2):313-315, 2001.
[6] P. Rosenbaum. Observational Studies. Springer, 2002.
[7] D. Freedman. From association to causation via regression. Advances in Applied Mathematics, 18(1):59-110, 1997.
[8] W. Bielby and R. Hauser. Structural equation models. Annual Review of Sociology, pages 137-161, 1977.
[9] J. Robins, M. Hernan, and B. Brumback. Marginal structural models and causal inference in epidemiology. Epidemiology, pages 550-560, 2000.
[10] C. Granger. Some recent development in a concept of causality. Journal of Econometrics, 39(1):199-211, 1988.
[11] F. Jensen. An Introduction to Bayesian Networks, volume 210. UCL Press, London, 1996.
[12] F. Elwert. Graphical causal models. In Handbook of Causal Analysis for Social Research, pages 245-273. Springer, 2013.
[13] K. Murphy. Dynamic Bayesian networks. Probabilistic Graphical Models, M. Jordan, 2002.
[14] D. Heckerman. A Bayesian approach to learning causal networks. In UAI, pages 285-295. Morgan Kaufmann Publishers Inc., 1995.
[15] D. Heckerman. Bayesian networks for data mining. Data Mining and Knowledge Discovery, 1(1):79-119, 1997.
[16] G. Cooper. A simple constraint-based algorithm for efficiently mining observational databases for causal relationships.
Data Mining and Knowledge Discovery, 1(2):203-224, 1997.
[17] C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. Data Mining and Knowledge Discovery, 4(2-3):163-192, 2000.
[18] S. Mani, P. Spirtes, and G. Cooper. A theoretical study of Y structures for causal discovery. arXiv preprint arXiv:1206.6853, 2012.
[19] P. Austin. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3):399-424, 2011.
[20] J. Lunceford and M. Davidian. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23(19):2937-2960, 2004.
[21] P. Austin and E. Stuart. Estimating the effect of treatment on binary outcomes using full matching on the propensity score. Statistical Methods in Medical Research, page 0962280215601134, 2015.
[22] P. Rosenbaum and D. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41-55, 1983.
[23] P. Austin and E. Stuart. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28):3661-3679, 2015.
[24] D. Lambert and D. Pregibon. More bang for their bucks: Assessing new features for online advertisers. In Workshop on Data Mining and Audience Intelligence for Advertising, pages 7-15. ACM, 2007.
[25] D. Chan, R. Ge, O. Gershony, T. Hesterberg, and D. Lambert. Evaluating online ad campaigns in a pipeline: causal models at scale. In ACM-SIGKDD, pages 7-16. ACM, 2010.
[26] J. Li, T. Le, L. Liu, J. Liu, Z. Jin, and B. Sun. Mining causal association rules. In Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on, pages 114-123. IEEE, 2013.
[27] P.
Holland and D. Thayer. Differential item performance and the Mantel-Haenszel procedure. Test Validity, pages 129-145, 1988.
[28] J. Li, T. Le, L. Liu, J. Liu, Z. Jin, B. Sun, and S. Ma. From observational studies to causal rule mining. ACM Transactions on Intelligent Systems and Technology (TIST), 7(2):14, 2015.
[29] J. Li, S. Ma, T. Le, L. Liu, and J. Liu. Causal decision trees. arXiv preprint arXiv:1508.03812, 2015.
[30] B. Goethals. Survey on frequent pattern mining. Univ. of Helsinki, 2003.
[31] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. FreeSpan: frequent pattern-projected sequential pattern mining. In ACM-SIGKDD, pages 355-359. ACM, 2000.
[32] J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1):53-87, 2004.
[33] M. R. Castro, G. Simon, S. S. Cha, B. P. Yawn, L. J. Melton III, and P. J. Caraballo. Statin use, diabetes incidence and overall mortality in normoglycemic and impaired fasting glucose patients. Journal of General Internal Medicine, pages 1-7.