Omitted Variable Bias in Language Models Under Distribution Shift

Reading time: 5 minutes

📝 Original Info

  • Title: Omitted Variable Bias in Language Models Under Distribution Shift
  • ArXiv ID: 2602.16784
  • Date: 2026-02-18
  • Authors: **Author information is not listed in the provided source; to be added after checking the original paper.**

📝 Abstract

Despite their impressive performance on a wide variety of tasks, modern language models remain susceptible to distribution shifts, exhibiting brittle behavior when evaluated on data that differs in distribution from their training data. In this paper, we describe how distribution shifts in language models can be separated into observable and unobservable components, and we discuss how established approaches for dealing with distribution shift address only the former. Importantly, we identify that the resulting omitted variable bias from unobserved variables can compromise both evaluation and optimization in language models. To address this challenge, we introduce a framework that maps the strength of the omitted variables to bounds on the worst-case generalization performance of language models under distribution shift. In empirical experiments, we show that using these bounds directly in language model evaluation and optimization provides more principled measures of out-of-distribution performance, improves true out-of-distribution performance relative to standard distribution shift adjustment methods, and further enables inference about the strength of the omitted variables when target distribution labels are available.

💡 Deep Analysis

📄 Full Content

For language models to be widely useful in real-world applications, they must remain robust, reliable, and performant across data distributions and contexts. However, even for modern large language models (LLMs) capable of impressive performance on diverse tasks, the challenge of distribution shift remains. When evaluated on data that differs in distribution from their training data, LLMs often exhibit brittle behavior, such as difficulty answering questions containing simple mutations that do not appear in training (Xu et al., 2025; Huang et al., 2025) or failures in reasoning given counterfactual information that alters the question's world state (González & Nori, 2024; Hüyük et al., 2025).

Established approaches for dealing with distribution shift in language models account for shifts that are observable to language models; that is, shifts resulting from variables that models are able to capture from the text. We place a novel focus on a component of distribution shift that these existing methods fail to address: the unobservable component of distribution shift, or shifts over variables that cannot be captured directly by language models. This non-observability may occur for two reasons. (i) The shift may occur over variables that are external to the text, such as attributes of the individuals who wrote or labeled the texts. These variables typically remain unmeasured in observational text settings.

(ii) The shift may occur over information that is contained in the text but not in a model’s representation of the text. While this information is observable to a human reader, it is not observable to a language model. This type of information loss is significant, since it almost inevitably occurs in language data as nearly infinite-dimensional, unstructured raw text is reduced to a lower-dimensional numerical form that can be used by models.
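One way to make this distinction concrete (the notation below is illustrative and not taken from the paper) is to write the target-to-source density ratio over the model's representation Z, the unobserved variables U, and the label Y, and factor it into the part that reweighting on Z can correct and the part it cannot:

```latex
% Illustrative notation, not the paper's: Z = Z(x) is what the model observes
% in the text x; U collects external attributes and any information in the
% text that is lost in the model's representation.
\[
\frac{p_T(z, u, y)}{p_S(z, u, y)}
  \;=\;
  \underbrace{\frac{p_T(z)}{p_S(z)}}_{\text{observable shift}}
  \;\cdot\;
  \underbrace{\frac{p_T(u, y \mid z)}{p_S(u, y \mid z)}}_{\text{unobservable shift}}
\]
```

Standard adjustment methods estimate only the first factor; the second factor is exactly where omitted variable bias enters.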

In this paper, we first identify that the omitted variable bias (OVB) arising from these unobserved variables can compromise the evaluation and optimization of language models under distribution shift. In evaluation, failure to account for omitted variables can lead to overly optimistic performance estimates in the target domain; in optimization, it can yield models that remain brittle despite adjustment for observable sources of distribution shift.
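As a toy numerical illustration of the evaluation problem (our own construction, not an experiment from the paper), the sketch below simulates a shift driven by an unobserved variable U: reweighting on the observed feature alone leaves the target-domain accuracy estimate optimistic.

```python
# Toy illustration (not from the paper): importance weighting on the observed
# feature X corrects only the observable part of the shift; the part driven by
# the unobserved variable U is omitted, so the weighted accuracy estimate is biased.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, p_u):
    """Source and target differ only in the prevalence of an unobserved variable U."""
    x = rng.normal(size=n)                  # observed feature
    u = rng.binomial(1, p_u, size=n)        # unobserved variable
    logits = 2.0 * x - 3.0 * u              # U shifts many labels when present
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))
    return x, u, y

# Source has little U, target has a lot of U; X itself is not shifted, so
# observable-shift weights are ~1 and cannot correct anything.
x_s, u_s, y_s = sample(20_000, p_u=0.1)
x_t, u_t, y_t = sample(20_000, p_u=0.9)

model = LogisticRegression().fit(x_s.reshape(-1, 1), y_s)

# Importance weights estimated from the observed feature only.
domain = np.concatenate([np.zeros_like(x_s), np.ones_like(x_t)])
clf = LogisticRegression().fit(np.concatenate([x_s, x_t]).reshape(-1, 1), domain)
p_t = clf.predict_proba(x_s.reshape(-1, 1))[:, 1]
w = p_t / (1.0 - p_t)

acc_source = model.score(x_s.reshape(-1, 1), y_s)
acc_weighted = np.average(model.predict(x_s.reshape(-1, 1)) == y_s, weights=w)
acc_target = model.score(x_t.reshape(-1, 1), y_t)

print(f"source accuracy:            {acc_source:.3f}")
print(f"weighted estimate (X only): {acc_weighted:.3f}")  # stays optimistic
print(f"true target accuracy:       {acc_target:.3f}")    # noticeably lower
```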

To address these challenges, we introduce a framework that maps the strength of the omitted variables to a bound on a language model’s generalization performance. Under this framework, although the omitted variables themselves are unobserved, their potential influence can be benchmarked and used to identify a set of plausible target distributions over which the worst-case generalization performance of a language model can be defined. This worst-case generalization bound gives rise to three primary contributions, which we demonstrate empirically. First, the bound provides a more principled and robust metric for evaluating generalization performance in the absence of target distribution labels, improving on standard adjustment objectives that account only for observed distribution shift. Second, the bound can be directly optimized to produce models that are explicitly more robust to unobserved sources of shift and that generalize more reliably to the target distribution. Third, when target labels are available and true test performance can be computed, the bound can be used to infer the strength of the omitted variables for a given model under distribution shift, offering interpretability to models that are otherwise opaque.
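The paper's exact construction is not reproduced in this summary, but bounds of this kind typically take a sensitivity-analysis shape: a parameter Γ ≥ 1 caps how strongly the omitted variables may tilt the importance weights, and the worst-case target risk is taken over every weighting consistent with that cap. A hedged sketch in our own notation:

```latex
% Sensitivity-style worst-case bound (illustrative notation, not necessarily the
% paper's construction): w(z) is the observable-shift weight, Gamma >= 1 limits
% how much the omitted variables U may further tilt it, and the worst-case risk
% is a supremum over all admissible tilts.
\[
\mathcal{W}(\Gamma) = \Big\{ \tilde{w} \;:\; \tfrac{1}{\Gamma}\, w(z) \,\le\, \tilde{w}(z, u) \,\le\, \Gamma\, w(z) \Big\},
\qquad
\overline{R}_T(\Gamma) \;=\; \sup_{\tilde{w} \in \mathcal{W}(\Gamma)}
  \frac{\mathbb{E}_S\big[\tilde{w}(Z, U)\,\ell(f(X), Y)\big]}
       {\mathbb{E}_S\big[\tilde{w}(Z, U)\big]}
\]
```

At Γ = 1 this collapses to the standard importance-weighted estimate; larger Γ concedes more influence to the unobserved variables, and the supremum defines the worst-case generalization performance referred to above.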

Domain adaptation and distribution shift. Model performance degradation resulting from differences between source (training) and target (test) distributions has long been studied in the machine learning literature. These distribution shifts arise from a variety of factors, such as differing domains or disparities among subgroups (Arjovsky et al., 2020; Yang et al., 2023). Even for modern LLMs, selection biases when collecting data for preference alignment may skew the distribution of human feedback during post-training (Lin et al., 2024); and recent work on reasoning suggests that LLMs’ strong performance may be at least partially attributed to memorization of their training data, as even minor perturbations in test distributions can elicit large drops in performance (Xu et al., 2025; Huang et al., 2025).

One of the most common and portable methods for addressing distribution shift is importance weighting, which reweights training samples to match the target distribution via a density ratio. Importance-weighted estimators are often combined with outcome models to form doubly robust estimators that are more robust to misspecification and estimation error (Robins et al., 1994). These methods are widely used to correct for covariate and label shift across prediction, recommendation, and reinforcement learning tasks (Byrd & Lipton, 2019; Kallus et al., 2022; Kim et al., 2022; Lin et al., 2024). However, whe
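For readers less familiar with these estimators, here is a minimal generic sketch (not tied to this paper's setup; names and helpers are our own) of classifier-based importance weighting and its doubly robust combination with an outcome model:

```python
# Minimal sketch of classifier-based importance weighting and a doubly robust
# estimate of the target-domain mean loss. Generic illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def density_ratio_weights(X_source, X_target):
    """Estimate w(x) = p_T(x) / p_S(x) on the source points via a domain classifier."""
    X = np.vstack([X_source, X_target])
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_source)[:, 1]          # P(target | x)
    # Bayes' rule: p_T(x) / p_S(x) = [P(target|x) / P(source|x)] * [n_S / n_T]
    return (p / (1.0 - p)) * (len(X_source) / len(X_target))

def doubly_robust_target_risk(X_source, loss_source, X_target, weights):
    """Doubly robust estimate of E_T[loss]: outcome model plus weighted residuals."""
    outcome = LinearRegression().fit(X_source, loss_source)   # regress loss on covariates
    direct = outcome.predict(X_target).mean()                 # plug-in term on target
    correction = np.mean(weights * (loss_source - outcome.predict(X_source)))
    return direct + correction
```

The doubly robust estimate remains consistent if either the weights or the outcome model is well specified, but both ingredients depend only on what is observed, which is why the unobservable component of the shift discussed above is left uncorrected.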

Reference

This content is AI-processed based on open access ArXiv data.
