Identifying Optimal Sequential Decisions

We consider conditions that allow us to find an optimal strategy for sequential decisions from a given data situation. For the case where all interventions are unconditional (atomic), identifiability has been discussed by Pearl & Robins (1995). We argue here that an optimal strategy must be conditional, i.e. take the information available at each decision point into account. We show that the identification of an optimal sequential decision strategy is more restrictive, in the sense that conditional interventions might not always be identified when atomic interventions are. We further demonstrate that a simple graphical criterion for the identifiability of an optimal strategy can be given.


💡 Research Summary

The paper addresses the problem of identifying an optimal sequential decision strategy from observational data when the strategy may depend on information available at each decision point. While earlier work, most notably Pearl and Robins (1995), focused on the identifiability of atomic (unconditional) interventions—where each treatment is set to a fixed value irrespective of covariates—this study argues that truly optimal policies must be conditional: the choice of action at time t should be a function of the history of observed variables up to that time.

The authors formalize the setting using a causal directed acyclic graph (DAG) that includes (i) decision nodes \(A_t\), (ii) observed covariates \(X_t\) that accumulate over time, (iii) unobserved confounders \(U\), and (iv) a final outcome \(Y\). In this framework, an atomic intervention corresponds to a do‑operator \(do(A_t = a_t)\) with a constant \(a_t\). A conditional intervention, by contrast, is written as \(do(A_t = a_t(\mathbf{X}_{\le t}))\), where the function \(a_t(\cdot)\) maps the observed history \(\mathbf{X}_{\le t}\) to a treatment value.
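To make the atomic/conditional distinction concrete, here is a minimal Python sketch on a hypothetical two-stage model. The variable names, the probabilities, and the helpers `p_x1`, `p_y1` and `value` are illustrative assumptions, not taken from the paper. It computes \(E[Y]\) exactly under each intervention by enumerating the hidden \(U\) and the covariate \(X_1\), and compares the best atomic plan with a rule that sets \(A_2\) from \(X_1\).

```python
from itertools import product
from typing import Callable

# Hypothetical toy model (illustration only, not from the paper):
# U hidden, A1 first decision, X1 observed after A1, A2 second decision, Y outcome.
P_U1 = 0.5  # P(U = 1)

def p_x1(x1: int, a1: int, u: int) -> float:
    """P(X1 = x1 | A1 = a1, U = u)."""
    p1 = 0.2 + 0.5 * a1 + 0.2 * u
    return p1 if x1 == 1 else 1.0 - p1

def p_y1(a2: int, x1: int) -> float:
    """P(Y = 1 | A2 = a2, X1 = x1): the second treatment works best when it matches X1."""
    return 0.8 if a2 == x1 else 0.2

def value(a1: int, rule2: Callable[[int], int]) -> float:
    """E[Y] under do(A1 = a1) and do(A2 = rule2(X1)), by exact enumeration over U and X1."""
    total = 0.0
    for u, x1 in product((0, 1), repeat=2):
        p_u = P_U1 if u == 1 else 1.0 - P_U1
        total += p_u * p_x1(x1, a1, u) * p_y1(rule2(x1), x1)
    return total

# Atomic strategies: A2 is a constant c that ignores X1.
best_atomic = max(value(a1, lambda x1, c=c: c) for a1 in (0, 1) for c in (0, 1))
# A conditional strategy: A2 copies the observed X1.
conditional = value(1, lambda x1: x1)
print(round(best_atomic, 2), round(conditional, 2))  # 0.68 0.8
```

Because the outcome depends on whether \(A_2\) matches \(X_1\), no constant choice of \(A_2\) can exploit the information carried by \(X_1\); this is the intuition behind the claim that an optimal strategy must be conditional.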

The central theoretical contribution is a set of graphical conditions—called “c‑separation” (conditional separation)—that are both necessary and sufficient for the identifiability of such conditional strategies. The conditions extend the classic d‑separation criterion by requiring that, for each decision node, (1) all its parent variables are fully observed, and (2) every back‑door path from the decision node (or any of its ancestors) to the outcome that passes through an unobserved confounder is blocked when conditioning on the observed history. When these conditions hold, the expected outcome under the optimal conditional policy can be expressed entirely in terms of the observed joint distribution, using an extended version of the g‑formula derived from do‑calculus.
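For reference, the textbook g-formula for a conditional strategy \(g = (g_1, \dots, g_T)\) with \(a_t = g_t(\mathbf{x}_{\le t})\), valid under sequential randomization and positivity, is written out below; the shorthand \(A_{<t} = g(\mathbf{x}_{<t})\) means \(A_s = g_s(\mathbf{x}_{\le s})\) for every \(s < t\). That the paper's "extended" g-formula reduces to this form is an assumption, since the summary does not state it. The backward-induction recursion for the optimal strategy, over hypothetical history variables \(h_t = (\mathbf{x}_{\le t}, a_{<t})\), is standard dynamic programming and is likewise included as an assumption about how the optimum is computed.

```latex
% Value of a conditional strategy g (a_s = g_s(x_{\le s}) substituted throughout)
\mathbb{E}[Y; g] = \sum_{\mathbf{x}_{\le T}}
  \mathbb{E}\bigl[Y \mid \mathbf{X}_{\le T} = \mathbf{x}_{\le T},\, A_{\le T} = g(\mathbf{x}_{\le T})\bigr]
  \prod_{t=1}^{T} P\bigl(X_t = x_t \mid \mathbf{X}_{<t} = \mathbf{x}_{<t},\, A_{<t} = g(\mathbf{x}_{<t})\bigr)

% Optimal strategy by backward induction over histories h_t = (x_{\le t}, a_{<t})
Q_T(h_T, a) = \mathbb{E}[Y \mid h_T, A_T = a], \qquad
Q_t(h_t, a) = \sum_{x_{t+1}} P(x_{t+1} \mid h_t, A_t = a)\, \max_{a'} Q_{t+1}\bigl((h_t, a, x_{t+1}), a'\bigr), \qquad
g_t^{*}(h_t) = \arg\max_{a} Q_t(h_t, a)
```

Every quantity on the right-hand sides is a conditional distribution of observed variables, which is what "expressed entirely in terms of the observed joint distribution" amounts to.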

To illustrate the added difficulty of conditional strategies, the authors construct a counter‑example DAG where atomic interventions are identifiable but any policy that conditions on earlier decisions is not. In that graph, an unobserved confounder creates a “spurious” path that becomes active only when the later decision depends on the earlier one, thereby violating c‑separation. This demonstrates that identifiability of atomic interventions does not guarantee identifiability of optimal conditional strategies.

Practically, the paper proposes an algorithmic checklist for researchers: (a) verify that each decision node’s parents are observed; (b) enumerate all paths from each decision node to the outcome; (c) test each path for c‑separation using the observed history as conditioning set; and (d) if all paths satisfy the criterion, compute the optimal policy via the modified g‑formula. The authors argue that this procedure is computationally lighter than full do‑calculus derivations because it relies on simple graphical inspections rather than symbolic algebra.
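Steps (a) to (c) can be sketched in Python. The summary does not spell out the c‑separation test, so the sketch substitutes a sequential back-door check in the style of Pearl and Robins (1995): for each decision \(A_k\), delete the edges out of \(A_k\) and the edges into later decisions, then ask whether the observed history d-separates \(A_k\) from \(Y\). Treat this as an assumption, not the paper's exact criterion; the graph `G`, the `history` mapping, and the helpers `ancestors`, `d_separated` and `check_strategy` are hypothetical. d-separation is tested with the moralised ancestral graph method.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # maps every node to the set of its parents

def ancestors(g: Graph, nodes: Set[str]) -> Set[str]:
    """The given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for parent in g.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def d_separated(g: Graph, x: Set[str], y: Set[str], z: Set[str]) -> bool:
    """Test X _||_ Y | Z in the moralised ancestral graph of X, Y and Z."""
    keep = ancestors(g, x | y | z)
    adj: Dict[str, Set[str]] = {n: set() for n in keep}
    for child in keep:
        parents = g.get(child, set())
        for p in parents:            # undirected version of each parent -> child edge
            adj[p].add(child)
            adj[child].add(p)
            for q in parents:        # "marry" parents that share a child
                if q != p:
                    adj[p].add(q)
    # Breadth-first search for a path from X to Y that avoids Z.
    seen = set(x - z)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        if node in y:
            return False
        for nb in adj[node] - z - seen:
            seen.add(nb)
            queue.append(nb)
    return True

def check_strategy(g: Graph, decisions: List[str], history: Dict[str, Set[str]],
                   outcome: str, hidden: Set[str]) -> bool:
    """Hypothetical checklist. decisions are in time order; history[a] holds
    the observed variables available when decision a is taken."""
    for k, a in enumerate(decisions):
        # (a) every parent of the decision must be observed
        if g[a] & hidden:
            return False
        # (b)+(c) cut edges out of A_k and into later decisions, then test
        # whether the observed history blocks every remaining A_k ... Y path
        later = set(decisions[k + 1:])
        cut = {n: set() if n in later else ps - {a} for n, ps in g.items()}
        if not d_separated(cut, {a}, {outcome}, history[a]):
            return False
    return True

# Hypothetical graph: U -> X1 <- A1, X1 -> A2, and X1, A2, U -> Y.
G = {"U": set(), "A1": set(), "X1": {"A1", "U"}, "A2": {"X1"}, "Y": {"X1", "A2", "U"}}
hist = {"A1": set(), "A2": {"A1", "X1"}}
print(check_strategy(G, ["A1", "A2"], hist, "Y", hidden={"U"}))  # True
print(check_strategy(G, ["A1", "A2"], {"A1": set(), "A2": {"A1"}}, "Y", hidden={"U"}))  # False
```

The second call fails because \(A_2\) is not allowed to see \(X_1\), which leaves the path \(A_2 \leftarrow X_1 \rightarrow Y\) open; a conditional strategy that uses \(X_1\) passes the check even though \(U\) confounds \(X_1\) and \(Y\).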

Empirical validation is provided through simulations and a real‑world case study on personalized treatment sequencing in oncology. In the simulations, the algorithm correctly predicts when the optimal policy is identifiable and recovers the true expected outcome when it is. In the oncology example, the authors construct a DAG based on clinical variables, demonstrate that the c‑separation conditions are satisfied, and then estimate the optimal treatment sequence, showing improved predicted survival compared with standard static protocols.
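The simulation check described above can be imitated on a toy problem. The following sketch does not reproduce the authors' simulation design: the model, its probabilities, and the rule names are hypothetical, and exact probabilities replace sampled data. It evaluates each candidate strategy with the g-formula using only the observed variables \((A_1, X_1, A_2, Y)\), confirms that the result matches the ground truth computed with the hidden \(U\), and then selects the best strategy by brute-force enumeration, a swapped-in search that is adequate only because there are eight candidate strategies.

```python
from itertools import product

B = (0, 1)

# Hypothetical structural model (illustration only). U is hidden from the analyst.
def p_x1(x1, a1, u):                     # P(X1 = x1 | A1 = a1, U = u)
    p1 = 0.2 + 0.5 * a1 + 0.2 * u
    return p1 if x1 else 1 - p1

def p_a2_obs(a2, x1):                    # observational regime: A2 depends on X1 only
    p1 = 0.3 + 0.4 * x1
    return p1 if a2 else 1 - p1

def p_y(y, x1, a2, u):                   # P(Y = y | X1, A2, U)
    p1 = 0.1 + 0.4 * (a2 == x1) + 0.2 * x1 + 0.2 * u
    return p1 if y else 1 - p1

# Exact observational joint of the observed variables: U ~ Bernoulli(1/2) is
# marginalised out, and A1 is randomised with probability 1/2.
joint = {
    (a1, x1, a2, y): sum(0.5 * 0.5 * p_x1(x1, a1, u) * p_a2_obs(a2, x1) * p_y(y, x1, a2, u) for u in B)
    for a1, x1, a2, y in product(B, repeat=4)
}

def g_formula(a1, rule2):
    """sum_x1 P(x1 | a1) * E[Y | a1, x1, A2 = rule2(x1)], using observed data only."""
    p_a1 = sum(v for k, v in joint.items() if k[0] == a1)
    total = 0.0
    for x1 in B:
        p_x1_given_a1 = sum(v for k, v in joint.items() if k[:2] == (a1, x1)) / p_a1
        n0, n1 = joint[(a1, x1, rule2(x1), 0)], joint[(a1, x1, rule2(x1), 1)]
        total += p_x1_given_a1 * n1 / (n0 + n1)
    return total

def true_value(a1, rule2):
    """Ground-truth E[Y] under the strategy, computed with the hidden U."""
    return sum(0.5 * p_x1(x1, a1, u) * p_y(1, x1, rule2(x1), u) for u, x1 in product(B, repeat=2))

rules = {"always 0": lambda x: 0, "always 1": lambda x: 1,
         "copy X1": lambda x: x, "flip X1": lambda x: 1 - x}
for a1, name in product(B, rules):
    assert abs(g_formula(a1, rules[name]) - true_value(a1, rules[name])) < 1e-9
best = max(product(B, rules), key=lambda s: g_formula(s[0], rules[s[1]]))
print(best)  # (1, 'copy X1')
```

In this model the observational \(A_2\) depends only on the observed \(X_1\), so the sequential back-door condition holds and the g-formula is exact; letting the observational \(A_2\) depend on \(U\) would in general break the agreement asserted in the loop.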

In summary, the paper makes three key contributions: (1) it formalizes the distinction between atomic and conditional interventions in the context of sequential decision making; (2) it derives a clear graphical criterion—c‑separation—that characterizes when an optimal conditional strategy is identifiable from observational data; and (3) it offers a practical, graph‑based verification method that can be applied by analysts without deep expertise in do‑calculus. By highlighting that conditional strategies impose stricter causal requirements, the work deepens our understanding of the limits of causal inference for dynamic treatment regimes and provides a useful tool for researchers seeking to design data‑driven, optimal sequential policies.