Anti-causal domain generalization: Leveraging unlabeled data

Reading time: 5 minutes

📝 Original Info

  • Title: Anti-causal domain generalization: Leveraging unlabeled data
  • ArXiv ID: 2602.17187
  • Date: 2026-02-19
  • Authors: Sorawit Saengkyongam (Apple), with co-authors affiliated with Apple, ETH Zürich, and others; see the original paper for the full author list

📝 Abstract

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.

📄 Full Content

Machine learning models are often trained on data from a limited set of environments and subsequently deployed in new, previously unseen environments. A central challenge in this setting is domain generalization: learning predictive models that perform well not only on the training environments but also on novel test environments that may differ from those seen during training (Blanchard et al., 2011; Muandet et al., 2013). This challenge is particularly acute in high-stakes applications, such as healthcare, where distribution shifts between hospitals, patient populations, or measurement devices can significantly impact model performance (Subbaswamy & Saria, 2020; DeGrave et al., 2021).

One prominent approach to domain generalization leverages the framework of structural causal models (SCMs; Pearl, 2009; Peters et al., 2016) to characterize the mechanisms by which distributions shift across environments. Under this framework, one can identify predictors that remain invariant, and hence robust, across a class of environments implied by the underlying structural causal model. This perspective has led to a variety of methods that exploit invariance for robust prediction (Rojas-Carulla et al., 2018; Magliacane et al., 2018; Arjovsky et al., 2019; Heinze-Deml & Meinshausen, 2021; Rothenhäusler et al., 2021; Pfister et al., 2021; Saengkyongam et al., 2022; Shen et al., 2026). However, existing methods typically require labeled data from multiple environments to estimate invariance properties, which limits their applicability when labeled data are scarce or expensive to obtain.

In many practical settings, unlabeled data are more abundant than labeled data. This motivates studying domain generalization in a type of semi-supervised setting: learning robust models using labeled data from only a small number of environments while leveraging unlabeled data from many others. A key question we address is under which assumptions unlabeled data can provide useful information about the structure of distribution shifts, even without outcome labels. This work demonstrates that this is possible under an anti-causal learning setting, where the outcome Y causes the observed predictors X. Anti-causal structures arise naturally in many applications. In healthcare, a patient’s underlying physiological state (the outcome) often causes the observable measurements (the predictors). In speech recognition, the spoken content (the outcome) causes the observed audio signal (the predictors).
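To make the anti-causal setup concrete, here is a minimal simulation sketch in which the outcome Y is drawn first and generates the covariates X, and environment shifts act on X alone. The dimensions, loadings, and shift scales are illustrative choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_environment(n, shift, d=5):
    """Anti-causal generation: draw the outcome Y first, then generate
    covariates X from Y; the environment shift perturbs X only."""
    y = rng.normal(size=n)                       # outcome (the cause)
    loadings = np.linspace(1.0, 0.2, d)          # fixed Y -> X mechanism
    noise = rng.normal(scale=0.5, size=(n, d))
    x = np.outer(y, loadings) + noise + shift    # shift never touches Y
    return x, y

# Labeled data from a single environment; unlabeled X from many others.
X_lab, y_lab = sample_environment(500, shift=np.zeros(5))
X_unlab = [sample_environment(500, shift=rng.normal(scale=2.0, size=5))[0]
           for _ in range(10)]
```

Because the shift enters only the X-equation, the conditional distribution of Y given its causes is untouched, yet the predictor-relevant distribution of X moves across environments; this is exactly the structure the regularizers below exploit.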

In an anti-causal setting, environment perturbations that only affect the predictors X do not propagate to the outcome Y. Nevertheless, such perturbations can induce both covariate and concept shifts (see Section 3), harming predictive performance in unseen environments. To mitigate these distributional shifts, we develop regularization strategies that penalize sensitivity to environment perturbations. Crucially, the perturbation directions can be estimated solely from the marginal distribution of X, without requiring labels. We provide theoretical guarantees showing that our regularized estimators are optimal for worst-case risk over a class of environments characterized by the directions of variation in the unlabeled data, with the degree of extrapolation controlled by the regularization strength.
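As a hedged illustration of the mean-based idea, the sketch below estimates shift directions label-free as the deviations of per-environment means of X from their grand mean, then fits a linear model whose sensitivity along those directions is penalized. The closed form is ours for this toy objective; the paper's actual MIR estimator and its guarantees may differ.

```python
import numpy as np

def fit_mean_regularized(X, y, X_envs, lam):
    """Squared-error loss plus lam * sum_e (w . delta_e)^2, where the
    directions delta_e = mean_e(X) - grand mean are estimated from
    unlabeled covariates only (no labels needed)."""
    mus = np.stack([Xe.mean(axis=0) for Xe in X_envs])
    deltas = mus - mus.mean(axis=0)              # label-free shift directions
    D = deltas.T @ deltas                        # sum_e delta_e delta_e^T
    # Closed form of the penalized least-squares problem (ridge-like).
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)
X_envs = [rng.normal(loc=rng.normal(scale=2.0, size=5), size=(300, 5))
          for _ in range(10)]
w_hat = fit_mean_regularized(X, y, X_envs, lam=10.0)
```

In this toy model, sending the regularization strength to infinity forces the fitted weights to be orthogonal to the estimated shift directions, i.e., a fully invariant linear predictor; finite values interpolate between in-distribution fit and robustness, matching the paper's description of extrapolation controlled by the regularization strength.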

Contributions. Our main contributions are as follows.

  1. We formalize an anti-causal domain generalization framework in a semi-supervised setting, where labeled data are available from a few environments and unlabeled data are available from many others (Section 3).
  2. We propose two regularization strategies, Mean-based Invariant Regularization (MIR) and Variance-based Invariant Regularization (VIR), that exploit distributional variations in the unlabeled data to encourage robustness, and we provide theoretical guarantees showing that the regularized estimators are optimal in terms of worst-case risk over certain classes of environments (Section 4); a toy sketch of the variance-based penalty follows this list.
  3. We evaluate our methods on two real-world datasets that naturally exhibit an anti-causal structure: a controlled physical system and a physiological signal dataset (Section 7).
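Complementing the mean-based sketch above, here is a toy sketch in the spirit of the variance-based penalty: it penalizes how much the predictor's output variance, computed through per-environment covariance matrices of X estimated without labels, deviates across environments. The functional form, optimizer, and all data parameters are our illustrative assumptions; the paper's VIR estimator may differ.

```python
import numpy as np
from scipy.optimize import minimize

def vir_objective(w, X, y, X_envs, lam):
    """Squared loss plus a penalty on how much the predictor's output
    variance w^T Sigma_e w varies across environments; the covariance
    matrices Sigma_e are estimated from unlabeled X only."""
    covs = [np.cov(Xe, rowvar=False) for Xe in X_envs]
    cov_bar = np.mean(covs, axis=0)
    mse = np.mean((X @ w - y) ** 2)
    penalty = sum((w @ (C - cov_bar) @ w) ** 2 for C in covs)
    return mse + lam * penalty

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)
# Environments differ in per-coordinate scale, i.e., covariance of X.
X_envs = [rng.normal(size=(300, 5)) * rng.uniform(0.5, 2.0, size=5)
          for _ in range(10)]
res = minimize(vir_objective, np.zeros(5), args=(X, y, X_envs, 1.0))
w_hat = res.x
```

Unlike the mean-based penalty, this objective is quartic in w, so the sketch resorts to a generic numerical optimizer rather than a closed form.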

Our work builds on the literature that exploits invariance in heterogeneous data to address domain generalization; see Bühlmann (2020) for an overview. The concept of invariant prediction (Peters et al., 2016) has been employed to identify robust (or stable) predictors across various settings (Rojas-Carulla et al., 2018; Heinze-Deml et al., 2018; Magliacane et al., 2018; Pfister et al., 2021; Heinze-Deml & Meinshausen, 2021; Saengkyongam et al., 2023; 2024a). Several works have extended this idea from selecting invariant predictors to incorporating invariance as a regularization term (Arjovsky et al., 2019; Heinze-Deml & Meinshausen, 2021; Rothenhäusler et al., 2021; Saengkyongam et al., 2022; Shen et al., 2026). These methods leverage labeled data from multiple environments to estimate invariance properties.

Reference

This content is AI-processed based on open access ArXiv data.
