General sample size analysis for probabilities of causation: a delta method approach

...

📝 Original Info

  • Title: General sample size analysis for probabilities of causation: a delta method approach
  • ArXiv ID: 2602.17070
  • Date: 2026-02-19
  • Authors: Not available (no author information was provided with the paper.)

📝 Abstract

Probabilities of causation (PoCs), such as the probability of necessity and sufficiency (PNS), are important tools for decision making but are generally not point identifiable. Existing work has derived bounds for these quantities using combinations of experimental and observational data. However, there is very limited research on sample size analysis, namely, how many experimental and observational samples are required to achieve a desired margin of error. In this paper, we propose a general sample size framework based on the delta method. Our approach applies to settings in which the target bounds of PoCs can be expressed as finite minima or maxima of linear combinations of experimental and observational probabilities. Through simulation studies, we demonstrate that the proposed sample size calculations lead to stable estimation of these bounds.


📄 Full Content

Probabilities of causation (PoCs) are used in many real-world applications, such as marketing, law, social science, and health science, especially when decisions depend on whether an action caused an outcome. For example, Li and Pearl (2022) introduced a "benefit function" that is a linear combination of PoCs and reflects the payoff or cost of selecting an individual with certain features, with the goal of finding the individuals most likely to exhibit a target behavior. Stott et al. (2004) used PoCs in climate event attribution to quantify how much human influence changed the risk of an extreme event. Mueller and Pearl (2023) argued that PoCs can be used for personalized decision making, and Li et al. (2020) found that PoCs can help improve the accuracy of some machine learning methods.

Without extra assumptions, PoCs are generally not identifiable, so one often works with bounds instead of point values. Using the structural causal model (SCM), Pearl (1999) defined three binary PoCs, namely PNS, PN, and PS. Tian and Pearl (2000) derived bounds for these quantities using both experimental and observational information, and Li and Pearl (2019, 2024) later provided formal proofs. Several papers studied how to tighten these bounds. For example, Mueller et al. (2021) used covariates and causal structure to narrow the bounds for PNS, and Dawid et al. (2017) used covariates to narrow the bounds for PN.

Most of these works implicitly assume that the experimental and observational samples are large enough to estimate the needed probabilities well. However, there is little work on how to choose sample sizes so that the estimated bounds have a desired level of precision. This gap limits the use of the theory in real applications. Li et al. (2022) discussed this issue, but focused on a special case for the PNS bound. To our knowledge, there is still no unified framework that links a target error level to required experimental and observational sample sizes for general PoC bounds.

In this paper, we study adequate sample sizes for estimating bounds of PoCs from a general perspective. Our starting point is that many sharp bounds can be written as a finite minimum or maximum of explicit functions of a finite set of probabilities, including experimental probabilities such as $P(y_x)$ and observational probabilities such as $P(x, y)$. For PNS, the bound components are typically linear in these probabilities; for PN and PS, they often take a ratio form. Under mild regularity conditions (e.g., denominators bounded away from zero), the bound components are smooth transformations of the underlying probability vector. The resulting lower and upper bounds can nevertheless be non-smooth, because they are formed by taking finite minima or maxima: where two or more components tie for the minimum or maximum, the endpoint is generally not differentiable. Smooth endpoints (a unique active component) and non-smooth endpoints (tied components) therefore have different asymptotic behavior, and our results accommodate both cases.
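To make the "finite minimum or maximum of explicit components" structure concrete, here is a minimal Python sketch of the PNS bounds of Tian and Pearl (2000) as they are usually stated in the PoC literature. Every lower-bound component and every upper-bound component is linear in the experimental probabilities $P(y_x), P(y_{x'})$ and the observational cells $P(x,y), P(x,y'), P(x',y), P(x',y')$. The names `PoCInputs`, `pns_components`, and `pns_bounds`, and the numeric inputs, are hypothetical illustrations rather than code or data from the paper.

```python
from dataclasses import dataclass


@dataclass
class PoCInputs:
    """Hypothetical container for the six probabilities used by the PNS bounds."""
    p_yx: float    # experimental P(y_x)
    p_yxp: float   # experimental P(y_{x'})
    p_xy: float    # observational P(x, y)
    p_xyp: float   # observational P(x, y')
    p_xpy: float   # observational P(x', y)
    p_xpyp: float  # observational P(x', y')


def pns_components(p):
    """Lower- and upper-bound components; every entry is linear in the inputs."""
    p_y = p.p_xy + p.p_xpy  # P(y) = P(x, y) + P(x', y)
    lower = [0.0, p.p_yx - p.p_yxp, p_y - p.p_yxp, p.p_yx - p_y]
    upper = [
        p.p_yx,
        1.0 - p.p_yxp,  # P(y'_{x'}) = 1 - P(y_{x'}) for binary Y
        p.p_xy + p.p_xpyp,
        p.p_yx - p.p_yxp + p.p_xyp + p.p_xpy,
    ]
    return lower, upper


def pns_bounds(p):
    """Endpoints: a finite max (lower) and a finite min (upper) of the components."""
    lower, upper = pns_components(p)
    return max(lower), min(upper)


# Illustrative, made-up values (the four joint cells sum to 1).
example = PoCInputs(p_yx=0.6, p_yxp=0.3, p_xy=0.25, p_xyp=0.15, p_xpy=0.2, p_xpyp=0.4)
lo, hi = pns_bounds(example)
print(round(lo, 3), round(hi, 3))  # 0.3 0.6
```

In this example a single component attains each endpoint, so both endpoints are smooth functions of the inputs near these values; if two components tied, the corresponding endpoint would sit at a kink.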

Our main contributions are:

• We propose a general sample size framework for estimating PoC bound endpoints (i.e., lower and upper bounds) with a pre-specified margin of error, based on multivariate delta-method variance approximations for smooth endpoints and, following Fang and Santos (2019), a directional delta method implemented numerically for non-smooth endpoints.

• Our asymptotic results apply whenever the bound endpoints can be expressed as finite minima or maxima of explicit bound components. This covers common forms in the PoC literature and also extends to other bounded causal quantities, including linear combinations of PoCs.

• We provide simulation studies showing that the proposed sample sizes are stable and sufficient in practice, and that they are far less conservative than existing results in the literature.
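The two regimes in the contributions above can be illustrated with a minimal Python sketch. It is not the authors' implementation; it assumes (i) every bound component is linear, $g_k = c_k + a_k^\top \theta_E + b_k^\top \theta_O$; (ii) the $n_E$ experimental units are split equally between the two arms; (iii) $n_O = \text{ratio} \times n_E$ observational units form one multinomial sample over the four $(X, Y)$ cells; and (iv) "margin of error $\varepsilon$" means $|\hat{\text{endpoint}} - \text{endpoint}| \le \varepsilon$ with asymptotic probability $1-\alpha$. For a smooth endpoint it uses the closed-form delta-method variance; for a max/min endpoint it simulates the directional-derivative limit (the max or min over the active components of their joint Gaussian limit), in the spirit of Fang and Santos (2019). All function names and planning values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def unit_variance(a, b, p_exp, q_obs, ratio):
    """n_E * Var[g(theta_hat)] for a linear component g = c + a.theta_E + b.theta_O.

    ASSUMPTIONS (illustrative): equal split of n_E between the two arms, and
    n_O = ratio * n_E observational units in one multinomial sample.
    """
    # Each arm has n_E / 2 units, so n_E * Var(p_hat) = 2 p (1 - p).
    v_exp = sum(ai * ai * 2.0 * p * (1.0 - p) for ai, p in zip(a, p_exp))
    # Multinomial term b^T (diag(q) - q q^T) b, divided by ratio since n_O = ratio * n_E.
    v_obs = dot([bj * bj for bj in b], q_obs) - dot(b, q_obs) ** 2
    return v_exp + v_obs / ratio


def n_smooth(a, b, p_exp, q_obs, ratio, eps, alpha=0.05):
    """Delta-method n_E such that z_{1-alpha/2} * SD <= eps (unique active component)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil(z * z * unit_variance(a, b, p_exp, q_obs, ratio) / eps ** 2)


def n_directional(components, p_exp, q_obs, ratio, eps, alpha=0.05,
                  kind="max", tol=1e-9, draws=20000, seed=0):
    """Simulation-based n_E for an endpoint max_k g_k (kind="max") or min_k g_k.

    components: list of (a, b, c) with g_k = c + a.theta_E + b.theta_O.
    """
    pick = max if kind == "max" else min
    values = [c + dot(a, p_exp) + dot(b, q_obs) for a, b, c in components]
    best = pick(values)
    # Only components attaining the extremum (up to tol) enter the directional derivative.
    active = [comp for comp, v in zip(components, values) if abs(v - best) <= tol]
    rng = random.Random(seed)
    sqrt_q = [math.sqrt(q) for q in q_obs]
    stats = []
    for _ in range(draws):
        # Limit of sqrt(n_E) * (theta_E_hat - theta_E): independent N(0, 2 p (1 - p)).
        w_e = [math.sqrt(2.0 * p * (1.0 - p)) * rng.gauss(0.0, 1.0) for p in p_exp]
        # sqrt(q) * e - q * (sqrt(q) . e) has covariance diag(q) - q q^T.
        e = [rng.gauss(0.0, 1.0) for _ in q_obs]
        s = dot(sqrt_q, e)
        w_o = [(sj * ej - qj * s) / math.sqrt(ratio) for sj, ej, qj in zip(sqrt_q, e, q_obs)]
        z = [dot(a, w_e) + dot(b, w_o) for a, b, _ in active]
        stats.append(abs(pick(z)))
    stats.sort()
    q = stats[math.ceil((1.0 - alpha) * draws) - 1]  # empirical (1 - alpha) quantile
    return math.ceil((q / eps) ** 2)


# Usage with made-up planning values for the Tian-Pearl PNS lower bound.
p_exp = (0.6, 0.3)              # P(y_x), P(y_{x'})
q_obs = (0.25, 0.15, 0.2, 0.4)  # P(x,y), P(x,y'), P(x',y), P(x',y')
zero_b = (0.0, 0.0, 0.0, 0.0)
lower_components = [
    ((0.0, 0.0), zero_b, 0.0),                  # 0
    ((1.0, -1.0), zero_b, 0.0),                 # P(y_x) - P(y_{x'})
    ((0.0, -1.0), (1.0, 0.0, 1.0, 0.0), 0.0),   # P(y) - P(y_{x'})
    ((1.0, 0.0), (-1.0, 0.0, -1.0, 0.0), 0.0),  # P(y_x) - P(y)
]
n_e_smooth = n_smooth((1.0, -1.0), zero_b, p_exp, q_obs, ratio=1.0, eps=0.05)
n_e_sim = n_directional(lower_components, p_exp, q_obs, ratio=1.0, eps=0.05)
print(n_e_smooth, n_e_sim)  # 1383 and a nearby simulated value
```

With a single active component the simulated answer should roughly match the closed-form delta-method answer, which is a useful sanity check. In practice the true probabilities are unknown, so planning values would come from pilot data or expert guesses, and the exact-tie tolerance `tol` may need widening when components are close but not exactly tied.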

In this section, we briefly review the definitions of the three aspects of binary causation following Tian and Pearl (2000). Our analysis is based on the counterfactual framework within structural causal models (SCMs) as introduced in Pearl (2009). We denote by $Y_x = y$ the counterfactual statement that variable $Y$ would take value $y$ if $X$ were set to $x$. Throughout the paper, we use $y_x$ to represent the event $Y_x = y$, $y_{x'}$ for $Y_{x'} = y$, $y'_x$ for $Y_x = y'$, and $y'_{x'}$ for $Y_{x'} = y'$. Experimental information is summarized through causal quantities such as $P(y_x)$, while observational information is summarized by joint distributions such as $P(x, y)$. Unless otherwise stated, $X$ denotes the treatment variable and $Y$ denotes the outcome variable.
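The two kinds of information come from different sampling schemes, which is what a sample size analysis has to budget for. A minimal Python sketch, assuming binary variables coded 1 for true and 0 for false, randomized treatment and control arms for the experimental data, and i.i.d. $(X, Y)$ pairs for the observational data; the function names and the toy data are hypothetical.

```python
from collections import Counter


def experimental_probs(treated_outcomes, control_outcomes):
    """Estimate P(y_x) and P(y_{x'}) from randomized arms (1 = outcome y occurred)."""
    p_yx = sum(treated_outcomes) / len(treated_outcomes)
    p_yxp = sum(control_outcomes) / len(control_outcomes)
    return p_yx, p_yxp


def observational_probs(pairs):
    """Estimate the joint cells P(x, y), ... from observed (X, Y) pairs coded 0/1."""
    counts = Counter(pairs)
    n = len(pairs)
    return {
        "P(x,y)": counts[(1, 1)] / n,
        "P(x,y')": counts[(1, 0)] / n,
        "P(x',y)": counts[(0, 1)] / n,
        "P(x',y')": counts[(0, 0)] / n,
    }


# Toy data for illustration only.
p_yx, p_yxp = experimental_probs([1, 1, 0, 1], [0, 1, 0, 0])
joint = observational_probs([(1, 1), (1, 0), (0, 0), (0, 0)])
print(p_yx, p_yxp)  # 0.75 0.25
print(joint)
```

Note that $P(y_x)$ is estimated only from units assigned to $x$, whereas all four joint cells share one observational sample, so the two sets of estimates have different variance structures.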

For each of the following three probabilities of causation, we assume that $X$ and $Y$ are binary variables in a causal model $M$. Let $x$ and $y$ stand for the propositions $X = \text{true}$ and $Y = \text{true}$, respectively, and $x'$ and $y'$ for their complements. Then:

Definition 1 (Probability of Necessity (PN)). The probability of necessity is defined as:

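$$\text{PN} \triangleq P(Y_{x'} = \text{false} \mid X = \text{true}, Y = \text{true}) = P(y'_{x'} \mid x, y) \tag{1}$$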

Definition 2 (Probability of Sufficiency (PS)). The probability of sufficiency is defined as:

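$$\text{PS} \triangleq P(Y_x = \text{true} \mid X = \text{false}, Y = \text{false}) = P(y_x \mid x', y') \tag{2}$$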
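Because PN and PS are counterfactual quantities, they are typically bounded rather than computed. The Tian and Pearl (2000) bounds, as commonly stated in the PoC literature, are a useful illustration of the ratio-form components: $\max\{0, (P(y) - P(y_{x'}))/P(x,y)\} \le \text{PN} \le \min\{1, (P(y'_{x'}) - P(x',y'))/P(x,y)\}$ and $\max\{0, (P(y_x) - P(y))/P(x',y')\} \le \text{PS} \le \min\{1, (P(y_x) - P(x,y))/P(x',y')\}$. The Python sketch below encodes them; the function name, the `min_denominator` threshold (one way to impose "denominators bounded away from zero"), and the inputs are hypothetical.

```python
def pn_ps_bounds(p_yx, p_yxp, p_xy, p_xpy, p_xpyp, min_denominator=1e-6):
    """Tian-Pearl bounds for PN and PS; each endpoint is a max/min of ratio components.

    p_yx = P(y_x), p_yxp = P(y_{x'}) (experimental);
    p_xy = P(x, y), p_xpy = P(x', y), p_xpyp = P(x', y') (observational).
    """
    # Ratio components are smooth only if their denominators stay away from zero.
    if p_xy < min_denominator or p_xpyp < min_denominator:
        raise ValueError("P(x, y) and P(x', y') must be bounded away from zero")
    p_y = p_xy + p_xpy  # P(y) = P(x, y) + P(x', y)
    pn = (
        max(0.0, (p_y - p_yxp) / p_xy),
        min(1.0, ((1.0 - p_yxp) - p_xpyp) / p_xy),  # P(y'_{x'}) = 1 - P(y_{x'})
    )
    ps = (
        max(0.0, (p_yx - p_y) / p_xpyp),
        min(1.0, (p_yx - p_xy) / p_xpyp),
    )
    return pn, ps


# Made-up, mutually consistent inputs for illustration.
pn, ps = pn_ps_bounds(p_yx=0.6, p_yxp=0.3, p_xy=0.25, p_xpy=0.2, p_xpyp=0.4)
print([round(v, 3) for v in pn], [round(v, 3) for v in ps])  # [0.6, 1.0] [0.375, 0.875]
```

Here the PN upper bound is attained by the constant component 1, so small perturbations of the inputs leave it unchanged; that is one of the situations a component-wise analysis has to handle.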

Defin

Reference

This content is AI-processed based on open access ArXiv data.
