Domain Generalization Under Posterior Drift
Domain generalization (DG) is the problem of generalizing from several distributions (or domains), for which labeled training data are available, to a new test domain for which no labeled data are available. For the prevailing benchmark datasets in DG, there exists a single classifier that performs well across all domains. In this work, we study a fundamentally different regime where the domains satisfy a \emph{posterior drift} assumption, in which the optimal classifier might vary substantially with domain. We establish a decision-theoretic framework for DG under posterior drift, and investigate the practical implications of this framework through experiments on language and vision tasks.
💡 Research Summary
Domain generalization (DG) traditionally assumes that a single classifier can perform well across all source domains, an assumption that holds for many computer‑vision benchmarks where only the marginal distribution of inputs changes. This paper challenges that view by focusing on a fundamentally different regime called posterior drift, where the conditional distribution P(Y|X) varies substantially across domains. Such drift is common in natural‑language processing (e.g., annotator disagreement) and other settings where domain‑specific factors affect label semantics.
The authors introduce a unified probabilistic framework that treats a domain D, a metadata variable M (partial information about D), and the usual input‑label pair (X,Y) as jointly distributed. Crucially, they assume conditional independence of (X,Y) and M given D, i.e., P(X,Y|D,M)=P(X,Y|D). This allows the metadata to be viewed as a summary of the domain that can be observed at test time (for example, the marginal distribution P_X|D or annotator identity).
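To make the setup concrete, here is a minimal simulated instance (the distributions and numbers are illustrative, not taken from the paper): two domains share the same P(X|D) but flip the labeling rule, so the pooled posterior P(Y|X) is nearly uninformative even though labels are almost deterministic within each domain. Here the metadata M is taken to be the full domain identity for simplicity.

```python
import numpy as np

# Illustrative toy model of posterior drift (all numbers are assumptions).
# Two domains with identical P(X|D) but opposite labeling rules P(Y|X,D).
rng = np.random.default_rng(0)

def sample(n):
    d = rng.integers(0, 2, size=n)   # domain D ∈ {0, 1}
    m = d                            # metadata M: here, the full domain identity
    x = rng.integers(0, 2, size=n)   # P(X|D) identical across domains
    # Posterior drift: domain 1 flips the labeling rule of domain 0.
    p_y1 = np.where(d == 0, 0.9 * x + 0.05, 0.9 * (1 - x) + 0.05)
    y = (rng.random(n) < p_y1).astype(int)
    return d, m, x, y

d, m, x, y = sample(200_000)
# Within each domain, Y is highly predictable from X; pooled over
# domains, P(Y=1|X=1) ≈ 0.5, so the pooled Bayes classifier is useless.
print(np.mean(y[(x == 1) & (d == 0)]))  # ≈ 0.95
print(np.mean(y[(x == 1) & (d == 1)]))  # ≈ 0.05
print(np.mean(y[x == 1]))               # ≈ 0.50
```

Note that the conditional independence P(X,Y|D,M) = P(X,Y|D) holds here by construction, since M is a deterministic function of D.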
Within this framework they define three risk settings:
- No domain information (Pooling‑ERM) – the classifier sees only X. The optimal predictor is the Bayes classifier for the marginal joint distribution P(X,Y), yielding risk R*_pool.
- Partial domain information (Domain‑Informed ERM, DI‑ERM) – the classifier sees X and M. The optimal predictor is the Bayes classifier for P(Y|X,M), with risk R*_DI.
- Full domain information – the classifier sees the true domain label D. This yields the lowest possible risk R*_full, which serves as a theoretical lower bound.
A simple hierarchy follows: R*_pool ≥ R*_DI ≥ R*_full. The paper’s main theoretical contribution is to quantify when the first inequality is strict. They introduce a pointwise margin γ(x,m), defined as the difference between the largest and second-largest values of P(Y=k|X=x,M=m) over k. Intuitively, γ measures the confidence of the Bayes decision given the metadata.
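The risk hierarchy can be verified exactly on a small discrete model. The sketch below (illustrative distributions, not the paper's construction) enumerates the joint distribution of (D, M, X, Y), with the metadata M a noisy readout of D, and computes each Bayes risk by conditioning on the variables each classifier is allowed to see:

```python
import itertools

# Exact Bayes risks on a tiny discrete model (all numbers illustrative).
# D ~ Uniform{0,1}; M is a noisy readout of D (correct w.p. 0.8);
# X ~ Uniform{0,1} independent of D; P(Y=1|X,D) drifts across domains.
P_D = {0: 0.5, 1: 0.5}
P_M_given_D = lambda m, d: 0.8 if m == d else 0.2
P_X = {0: 0.5, 1: 0.5}
eta = lambda x, d: 0.9 * x + 0.05 if d == 0 else 0.9 * (1 - x) + 0.05

def p_joint(d, m, x):
    # Note P(X,Y|D,M) = P(X,Y|D): M enters only through P(M|D).
    return P_D[d] * P_M_given_D(m, d) * P_X[x]

def bayes_risk(condition):
    """Bayes risk of a classifier that sees the variables named in `condition`."""
    groups = {}  # accumulate P(Y=1, key) and P(Y=0, key) per conditioning cell
    for d, m, x in itertools.product([0, 1], repeat=3):
        key = tuple(v for name, v in zip("dmx", (d, m, x)) if name in condition)
        w = p_joint(d, m, x)
        p1, p0 = groups.get(key, (0.0, 0.0))
        groups[key] = (p1 + w * eta(x, d), p0 + w * (1 - eta(x, d)))
    # The Bayes classifier picks the more likely label in each cell.
    return sum(min(p1, p0) for p1, p0 in groups.values())

r_pool = bayes_risk("x")    # sees X only
r_di   = bayes_risk("mx")   # sees X and the metadata M
r_full = bayes_risk("dx")   # sees X and the true domain D
print(r_pool, r_di, r_full)  # hierarchy: R*_pool >= R*_DI >= R*_full
```

In this instance the three risks are 0.5, 0.23, and 0.05, so both inequalities in the hierarchy are strict: partial metadata recovers much, but not all, of the value of knowing the domain.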
Theorem 1 shows that if posterior drift is present and the expected margin is positive, then the risk gap satisfies R*_pool − R*_DI ≥ c·E[γ(X,M)].
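In the binary-label case, the role of the margin is easy to see directly: wherever the pooled Bayes classifier disagrees with the domain-informed one, its excess risk at (x, m) is exactly γ(x, m), and elsewhere it is zero. The following self-contained check (with illustrative numbers, and M independent of X so the pooled posterior is the average over M) confirms that the risk gap equals the expected margin over the disagreement set:

```python
import numpy as np

# Binary-label check that the pooled-vs-DI gap is the margin on the
# disagreement set (distribution and posteriors are illustrative).
eta = np.array([[0.95, 0.05],   # P(Y=1 | X=x, M=0) for x = 0, 1
                [0.23, 0.77]])  # P(Y=1 | X=x, M=1)
p_xm = np.full((2, 2), 0.25)    # uniform joint distribution over (M, X)

eta_pool = eta.mean(axis=0)                # P(Y=1|X), averaging over M
f_pool = (eta_pool >= 0.5).astype(int)     # pooled Bayes classifier
f_di = (eta >= 0.5).astype(int)            # domain-informed Bayes classifier

gamma = np.abs(2 * eta - 1)                # pointwise margin γ(x, m)
r_pool = (p_xm * np.where(f_pool[None, :] == 1, 1 - eta, eta)).sum()
r_di = (p_xm * np.minimum(eta, 1 - eta)).sum()
disagree = f_di != f_pool[None, :]
# The two printed values coincide: gap = E[γ(X,M) · 1{disagreement}].
print(r_pool - r_di, (p_xm * gamma * disagree).sum())
```

This identity makes the theorem's shape intuitive: posterior drift creates disagreement between the pooled and domain-informed classifiers, and the margin prices how costly each disagreement is.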