Estimating high-dimensional intervention effects from observational data

February 23, 2026

Reading time: 6 minutes


📝 Original Info

Title: Estimating high-dimensional intervention effects from observational data
ArXiv ID: 0810.4214
Date: 2009-09-02
Authors: Marloes H. Maathuis, Markus Kalisch, Peter Bühlmann

📝 Abstract

We assume that we have observational data generated from an unknown underlying directed acyclic graph (DAG) model. A DAG is typically not identifiable from observational data, but it is possible to consistently estimate the equivalence class of a DAG. Moreover, for any given DAG, causal effects can be estimated using intervention calculus. In this paper, we combine these two parts. For each DAG in the estimated equivalence class, we use intervention calculus to estimate the causal effects of the covariates on the response. This yields a collection of estimated causal effects for each covariate. We show that the distinct values in this set can be consistently estimated by an algorithm that uses only local information of the graph. This local approach is computationally fast and feasible in high-dimensional problems. We propose to use summary measures of the set of possible causal effects to determine variable importance. In particular, we use the minimum absolute value of this set, since that is a lower bound on the size of the causal effect. We demonstrate the merits of our methods in a simulation study and on a data set about riboflavin production.
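The idea of collecting one estimated effect per candidate DAG and summarizing the set by its minimum absolute value can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a linear Gaussian model, in which the causal effect of x on y in a given DAG can be read off as the coefficient of x in a regression of y on x and the parents of x. The function names, the `candidate_parent_sets` argument, and the simulated data are hypothetical; in practice the candidate parent sets would come from the estimated equivalence class.

```python
import numpy as np

def regression_coefficient(data, x, y, adjust):
    """Coefficient of column x in a least-squares fit of column y on x and `adjust`."""
    cols = [x] + list(adjust)
    design = np.column_stack([np.ones(len(data)), data[:, cols]])
    coef, *_ = np.linalg.lstsq(design, data[:, y], rcond=None)
    return coef[1]  # index 0 is the intercept, index 1 is x

def possible_effects(data, x, y, candidate_parent_sets):
    """One estimated effect of x on y per candidate parent set of x (assumed given)."""
    effects = []
    for parents in candidate_parent_sets:
        if y in parents:
            # If y is a parent of x in this DAG, x cannot have an effect on y.
            effects.append(0.0)
        else:
            effects.append(regression_coefficient(data, x, y, parents))
    return effects

def min_abs_effect(effects):
    """Summary measure: lower bound on the absolute size of the causal effect."""
    return min(abs(e) for e in effects)

# Hypothetical toy data: z -> x, z -> y, x -> y, with a true effect of x on y of 2.
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 2.0 * x + 1.5 * z + rng.normal(size=n)
data = np.column_stack([z, x, y])  # columns: 0 = z, 1 = x, 2 = y

# Suppose the equivalence class leaves open whether z is a parent of x.
effects = possible_effects(data, x=1, y=2, candidate_parent_sets=[(), (0,)])
summary = min_abs_effect(effects)
```

In this toy setup the two candidate parent sets give different estimates (about 2.75 without adjustment and about 2 when adjusting for z in large samples), and the minimum absolute value serves as a conservative score for the covariate.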

💡 Deep Analysis

Deep Dive into Estimating high-dimensional intervention effects from observational data.

📄 Full Content

arXiv:0810.4214v3 [stat.ME] 2 Sep 2009

The Annals of Statistics 2009, Vol. 37, No. 6A, 3133–3164. DOI: 10.1214/09-AOS685. © Institute of Mathematical Statistics, 2009

ESTIMATING HIGH-DIMENSIONAL INTERVENTION EFFECTS FROM OBSERVATIONAL DATA

By Marloes H. Maathuis, Markus Kalisch and Peter Bühlmann, ETH Zürich

Received October 2008; revised January 2009. AMS 2000 subject classifications: 62-09, 62H99. Key words and phrases: Causal analysis, directed acyclic graph (DAG), graphical modeling, intervention calculus, PC-algorithm, sparsity. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2009, Vol. 37, No. 6A, 3133–3164. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Our work is motivated by the following problem in biology. We want to know which genes play a role in a certain phenotype, say a disease status or, in our case, a continuous value of riboflavin (vitamin B2) production in the bacterium Bacillus subtilis. To be more precise, our goal is to infer which genes have an effect on the phenotype in terms of an intervention. If we knocked down single genes, which of them would show a relevant or important effect on the phenotype?

The difficulty is, however, that the available data are only observational. For our concrete problem, we observe the logarithm of the riboflavin production rate as a continuous response and expression measurements from essentially the whole genome of B. subtilis as high-dimensional covariates. Using such observational data, we want to infer all (single gene) intervention effects. This task coincides with inferring causal effects, a well-established area in Statistics (e.g., [5, 8, 10, 11, 13, 18, 24, 25, 26] and [31]). We emphasize that, in our application, it is exactly the intervention or causal effect that is of interest, rather than a regression-type effect of association. If we can estimate the intervention effects from observational data, we can score each gene according to its potential to have an intervention (knock-down) effect on the riboflavin production rate, and the most promising candidate genes can be tested afterward in biological experiments.

Pearl ([25], page 285) formulates the distinction between associational and causal concepts as follows: “an associational concept is any relationship that can be defined in terms of a joint distribution of observed variables, and a causal concept is any relationship that cannot be defined from the distribution alone... Every claim invoking causal concepts must be traced to some premises that invoke such concepts; it cannot be inferred or derived from statistical associations alone.”

Thus, in order to obtain causal statements from observational data, one needs to make additional assumptions. One possibility is to assume that the data were generated by a directed acyclic graph (DAG) which is known beforehand. DAGs describe causal concepts, since they code potential causal relationships between variables: the existence of a directed edge x → y means that x may have a direct causal effect on y, and the absence of a directed edge x → y means that x cannot have a direct causal effect on y (see Remark 2.3 for a definition of direct causal effect). Given a set of conditional dependencies from observational data and a corresponding DAG model, one can compute causal effects using intervention calculus (e.g., [24] and [25]).

In this paper, we consider the problem of inferring causal information from observational data, under the assumption that the data were generated by an unknown DAG. This is a more realistic assumption, since, in many practical problems, one does not know the DAG. In this scenario, the causal

…(Full text truncated)…
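The distinction the introduction draws between a regression-type association and an intervention effect can be illustrated with a small simulation. This is a hedged sketch under assumptions that are not in the paper: a hypothetical linear Gaussian DAG with edges z → x and z → y and no edge x → y, so that x has no causal effect on y. The `simulate` function and its `do_x` argument are invented for illustration; `do_x` mimics an intervention by setting x externally, which cuts the influence of z on x.

```python
import numpy as np

def simulate(n, rng, do_x=None):
    """Draw n samples from the hypothetical DAG z -> x, z -> y (no edge x -> y).

    If do_x is given, x is set to that value for every sample, which
    removes the edge z -> x, as an intervention on x would.
    """
    z = rng.normal(size=n)
    if do_x is None:
        x = z + rng.normal(size=n)       # observational regime
    else:
        x = np.full(n, float(do_x))      # interventional regime
    y = 2.0 * z + rng.normal(size=n)     # y depends on z only
    return x, y

rng = np.random.default_rng(1)
n = 50_000

# Association: slope of y on x in observational data (nonzero due to z).
x_obs, y_obs = simulate(n, rng)
slope = np.cov(x_obs, y_obs)[0, 1] / np.var(x_obs, ddof=1)

# Intervention: change in the mean of y when x is set to 1 versus 0.
_, y_do1 = simulate(n, rng, do_x=1.0)
_, y_do0 = simulate(n, rng, do_x=0.0)
intervention_effect = y_do1.mean() - y_do0.mean()
```

Under these assumptions the observational slope is close to 1 in large samples, while the intervention effect is close to 0: the association is produced entirely by the common cause z, which is the kind of discrepancy that makes a regression-type effect unsuitable for ranking knock-down candidates.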


Reference

This content is AI-processed based on ArXiv data.
