Dissecting Performative Prediction: A Comprehensive Survey
The field of performative prediction began in 2020 with the seminal paper “Performative Prediction” by Perdomo et al., which established a novel machine learning setup in which deploying a predictive model causes a distribution shift in the environment, creating a mismatch between the distribution the model expects and the one it actually faces. This shift is described by a so-called distribution map. In the half-decade since, a literature has emerged that has, among other things, introduced new solution concepts for the original setup, extended the setup, offered new theoretical analyses, and examined the intersection of performative prediction with other established fields. In this survey, we first lay out the performative prediction setting and explain the two optimization targets: performative stability and performative optimality. We introduce a new way of classifying performative prediction settings based on how much information is available about the distribution map. We survey existing implementations of distribution maps and existing methods for addressing the problem of performative prediction, examining different ways to categorize them. Finally, we point out known and previously unrecognized connections that can be drawn to other fields, in the hope of stimulating future research.
💡 Research Summary
The survey “Dissecting Performative Prediction: A Comprehensive Survey” provides a systematic overview of the field that originated with the 2020 “Performative Prediction” paper by Perdomo et al. The authors first formalize the core setting: a predictive model with parameters θ is trained on an initial data distribution D_init, but once deployed the model itself induces a shift in the environment. This shift is captured by a distribution map D(·), a deterministic function that maps any model parameters θ to the resulting data distribution D(θ). Consequently, the risk (expected loss) of the model on the distribution it creates may be far larger than the risk on the training distribution.
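This mismatch is easy to see in a toy sketch (our own construction, not from the survey; the location-family map and all constants are assumptions). Deploying a constant predictor θ shifts the mean of the outcome distribution by ε·θ; with ε < 0 the induced shift moves outcomes away from the prediction, so the risk on the distribution the model creates exceeds its risk on the static training distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, eps = 1.0, -0.5  # base outcome mean and performativity strength (toy assumptions)

def sample_D(theta, n=100_000):
    """Distribution map D(theta): deployment shifts the outcome mean by eps * theta."""
    return rng.normal(mu0 + eps * theta, 1.0, size=n)

def risk(theta, outcomes):
    """Squared-error risk of the constant predictor theta on a sample of outcomes."""
    return np.mean((theta - outcomes) ** 2)

theta = 2.0
static = rng.normal(mu0, 1.0, size=100_000)  # distribution the model was trained on
print(risk(theta, static))           # ~2.0: risk on the static distribution
print(risk(theta, sample_D(theta)))  # ~5.0: risk on the distribution the model induces
```

With this sign of ε the performative risk at θ = 2 is roughly 5 while the static risk is roughly 2, which is precisely the mismatch the distribution map formalizes.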
Two natural optimization objectives arise. Performative stability (θ_PS) requires that the model be a fixed point of the best‑response operator: θ_PS = arg min_θ Risk(θ, D(θ_PS)). In other words, if the environment settles at the distribution induced by θ_PS, the optimal model for that distribution is exactly θ_PS. Under strong convexity and smoothness of the loss, and provided the distribution map is sufficiently Lipschitz (ε‑sensitive with ε small relative to the strong‑convexity constant), repeated retraining, i.e., iteratively refitting the model to the distribution its predecessor induced, contracts to this unique fixed point. Performative optimality (θ_PO) instead seeks the global minimizer of the performative risk PR(θ) = Risk(θ, D(θ)). This point need not be a fixed point; it is simply the model with the lowest possible risk when evaluated on the distribution it itself creates. Finding θ_PO is generally harder because it requires exploring the entire distribution map, and global optimality can be guaranteed only when PR is convex.
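The gap between the two objectives can be made tangible with a sketch in the spirit of Perdomo et al.'s biased-coin example (the constants μ, ε and the closed forms below are our assumptions): outcomes are Bernoulli with mean μ + εθ, and a constant predictor θ is fit under squared loss. Repeated risk minimization converges to θ_PS, while a direct grid search over the performative risk recovers a different point θ_PO:

```python
import numpy as np

mu, eps = 0.5, 0.2  # base coin bias and performativity strength (toy assumptions)

def best_response(theta):
    """arg min_t Risk(t, D(theta)): under squared loss, the mean of D(theta)."""
    return mu + eps * theta

# Repeated risk minimization: retrain on the distribution the last model induced.
theta = 0.0
for _ in range(50):
    theta = best_response(theta)
print(theta)  # converges to theta_PS = mu / (1 - eps) = 0.625

# Performative risk PR(theta) = squared bias + Bernoulli variance of the induced coin.
grid = np.linspace(0.0, 1.0, 100_001)
mean = mu + eps * grid
pr = (grid - mean) ** 2 + mean * (1.0 - mean)
theta_po = grid[np.argmin(pr)]
print(theta_po)  # ~0.6667: theta_PO = (mu - eps/2) / (1 - 2*eps), not equal to theta_PS
```

Here the variance of the induced coin depends on θ, so minimizing the full performative risk pulls the optimum away from the fixed point: θ_PS ≈ 0.625 while θ_PO ≈ 0.667.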
The paper introduces a novel classification of performative‑prediction problems based on the amount of information available about the distribution map. Three levels are distinguished: (i) sample‑based – only empirical samples from D(θ) are observable; (ii) model‑based – a parametric or non‑parametric model of D(·) is estimated from data, incurring estimation error; (iii) full‑knowledge – the distribution map is known exactly or assumed to have a tractable analytic form. The richer the information, the more sophisticated the optimization algorithms that can be employed.
Algorithmic approaches are grouped into two families. The first family targets the stable point: iterative fixed‑point methods, best‑response dynamics, or variational‑inequality solvers. These methods are sample‑efficient and converge quickly under smoothness assumptions, but they do not guarantee global optimality. The second family tackles the global performative risk directly. Techniques include meta‑optimization, policy‑gradient style updates, Bayesian optimization, and Thompson sampling‑based exploration of the parameter space. These methods can approach θ_PO even when the loss landscape is non‑convex, but they typically require a reliable model of D(·) and more computational resources.
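As a minimal sketch of the second family under purely sample-based access (our construction, reusing the assumed biased-coin environment from above), the optimizer can only deploy a model and observe outcomes, and it runs finite-difference (zeroth-order) descent on an empirical estimate of the performative risk:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, eps = 0.5, 0.2  # hidden environment parameters, unknown to the optimizer (toy assumption)

def deploy(theta, n=200_000):
    """Sample-based access only: deploy theta and observe outcomes drawn from D(theta)."""
    return rng.binomial(1, mu + eps * theta, size=n)

def empirical_pr(theta):
    """Estimate PR(theta) = Risk(theta, D(theta)) from a fresh deployment."""
    y = deploy(theta)
    return np.mean((theta - y) ** 2)

# Zeroth-order (finite-difference) descent on the performative risk.
theta, lr, h = 0.3, 0.5, 0.05
for _ in range(200):
    grad = (empirical_pr(theta + h) - empirical_pr(theta - h)) / (2 * h)
    theta -= lr * grad
print(theta)  # approaches theta_PO ~ 0.667, up to sampling noise
```

Each gradient estimate costs two deployments, which illustrates why methods that target θ_PO directly tend to be more exploration- and sample-hungry than fixed-point iterations.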
A substantial portion of the survey is devoted to connections with established machine‑learning subfields. Adversarial attacks can be viewed as an adversary that deliberately moves the distribution map in a harmful direction. Algorithmic recourse corresponds to the problem of minimally perturbing inputs to achieve a desired outcome, which is mathematically equivalent to navigating D(·) toward a favorable region. Fairness research that studies long‑term effects of interventions (e.g., affirmative‑action policies) also fits the performative framework because the policy changes the underlying population distribution over time. By exposing these links, the authors argue that performative prediction is not a niche curiosity but a unifying perspective that forces a re‑examination of many standard assumptions in ML.
Finally, the survey distinguishes the standard stateless setting—where the environment reacts to the current model as if it were the first deployment—from a stateful extension in which past deployments influence future distribution shifts. The stateful variant aligns with Markov Decision Processes and opens the door to reinforcement‑learning techniques, multi‑agent analysis, and dynamic policy design.
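A stateful map can be sketched by letting the environment drift toward the deployed model instead of jumping straight to D(θ). The gradual-adaptation dynamics below, and the rate δ, are our assumptions for illustration; the biased-coin constants are reused from the earlier toy example:

```python
mu, eps, delta = 0.5, 0.2, 0.3  # coin bias, performativity strength, adaptation speed (toy assumptions)

def induced_mean(s):
    """The outcome mean depends on the environment's state s, not on theta directly."""
    return mu + eps * s

s, theta = 0.0, 0.0
for t in range(100):
    theta = induced_mean(s)              # retrain on the current (stateful) distribution
    s = (1 - delta) * s + delta * theta  # environment drifts toward the deployment
print(theta)  # settles at the same fixed point mu / (1 - eps) = 0.625, but more slowly
```

The pair (state, deployment) evolves like a controlled Markov chain, which is what connects the stateful variant to MDPs and reinforcement learning.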
Overall, the paper offers a comprehensive taxonomy of problem settings, a clear exposition of stability versus optimality objectives, a detailed catalog of existing algorithms, and a thoughtful discussion of interdisciplinary connections. It serves both as a reference for newcomers and as a roadmap for future research directions, highlighting open challenges such as efficient estimation of distribution maps, convergence guarantees under weaker assumptions, and the integration of performative prediction with sequential decision‑making frameworks.