Causal-Informed Hybrid Online Adaptive Optimization for Ad Load Personalization in Large-Scale Social Networks
Personalizing ad load in large-scale social networks requires balancing user experience and conversions under operational constraints. Traditional primal-dual methods enforce constraints reliably but adapt slowly in dynamic environments, while Bayesian Optimization (BO) enables exploration but suffers from slow convergence. We propose a hybrid online adaptive optimization framework, CTRCBO (Cohort-Based Trust Region Contextual Bayesian Optimization), which combines primal-dual updates with BO and is enhanced by trust-region updates and Gaussian Process Regression (GPR) surrogates for both objectives and constraints. Our approach leverages an upstream Causal ML model to inform the surrogate, improving decision quality and enabling efficient exploration-exploitation and online tuning. We evaluate our method on a billion-user social network, demonstrating faster convergence, robust constraint satisfaction, and improved personalization metrics, including results from a real-world online A/B test.
💡 Research Summary
The paper tackles the problem of ad‑load personalization on a billion‑user social network, where the system must simultaneously maximize user‑centric metrics (e.g., click‑through rate, engagement) and revenue‑driven outcomes while respecting a set of operational constraints such as daily impression caps, user‑experience degradation limits, and latency budgets. Traditional primal‑dual algorithms excel at enforcing constraints and provide interpretable dual variables, but they adapt slowly to the highly non‑stationary environment of online advertising. Conversely, Bayesian Optimization (BO) offers principled exploration under uncertainty but suffers from poor scalability in high‑dimensional policy spaces and from reliance on surrogate quality.
To bridge this gap the authors introduce CTRCBO – Cohort‑Based Trust Region Contextual Bayesian Optimization – a hybrid framework that merges a primal‑dual optimizer with BO, augments both with trust‑region updates, and crucially injects causal knowledge from an upstream causal machine‑learning model into the Gaussian Process (GP) surrogates for the objective and each constraint. The causal model supplies counterfactual treatment‑effect estimates for different ad‑load levels, effectively providing a data‑driven prior for the GP mean functions. This reduces surrogate uncertainty, accelerates learning, and yields more reliable acquisition decisions.
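The idea of a causal estimate serving as the GP mean function can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel, its hyperparameters, the function `causal_prior_mean` (a hypothetical stand-in for the upstream causal model), and the toy response curve `true_response` are all assumptions made for illustration. The GP is fitted to the residuals between observations and the prior mean, and the mean is added back at prediction time.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of ad-load levels."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def causal_prior_mean(theta):
    """Hypothetical stand-in for the causal model's effect estimate per ad load."""
    return 1.0 - 2.0 * (theta - 0.5) ** 2

def true_response(theta):
    """Toy ground truth (assumed): the causal estimate plus an unmodelled ripple."""
    return 1.0 - 2.0 * (theta - 0.5) ** 2 + 0.1 * np.sin(6.0 * theta)

def gp_posterior(theta_train, y_train, theta_test, mean_fn, noise=1e-2):
    """GP posterior mean and variance with a (possibly non-zero) prior mean."""
    K = rbf_kernel(theta_train, theta_train) + noise * np.eye(len(theta_train))
    K_s = rbf_kernel(theta_train, theta_test)
    K_ss = rbf_kernel(theta_test, theta_test)
    # Regress on residuals from the prior mean, then add the mean back.
    resid = y_train - mean_fn(theta_train)
    mean = mean_fn(theta_test) + K_s.T @ np.linalg.solve(K, resid)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.clip(np.diag(cov), 0.0, None)

# Only three observed ad-load levels, mimicking a data-scarce online setting.
theta_train = np.array([0.1, 0.3, 0.8])
y_train = true_response(theta_train)
theta_test = np.linspace(0.0, 1.0, 21)

mean_causal, var_causal = gp_posterior(theta_train, y_train, theta_test, causal_prior_mean)
mean_zero, var_zero = gp_posterior(theta_train, y_train, theta_test, np.zeros_like)

err_causal = np.mean(np.abs(mean_causal - true_response(theta_test)))
err_zero = np.mean(np.abs(mean_zero - true_response(theta_test)))
print(f"MAE with causal prior mean: {err_causal:.4f}")
print(f"MAE with zero prior mean:   {err_zero:.4f}")
```

In this toy setup the informative prior mean helps most where observations are sparse, because the GP only has to learn the residual that the causal estimate misses. With a fixed kernel the posterior variance is the same for both priors, so the gain here shows up in the predictive error.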
The system operates at the cohort level. Users are partitioned into K cohorts (e.g., by sensitivity to ad exposure). For each cohort k, a local trust region \(Tr_k\) is defined, and separate GPs \(f_{k,t}(\theta, z_t)\) (objective) and \(c_{k,t}(\theta, z_t)\) (constraints) are fitted using the causal‑informed prior and the observed context \(z_t\) (current traffic, inventory, etc.). At each time step t the algorithm:
- Observes contextual variables \(z_t\).
- Updates the local GPs for all cohorts.
- Performs a primal update by maximizing a multi‑objective acquisition function: the hyper‑volume improvement (HVI) of the objective plus a penalty term \(\eta \lambda_t^\top c_{k,t}\) that incorporates the current dual variables \(\lambda_t\). The resulting policy \(\theta_{k,t}\) is constrained to lie within the cohort’s trust region.
- Updates the dual variables using a time‑average rule derived from Primal‑Dual Contextual BO (PDCBO): \(\lambda_{t+1}=h\lambda_t+\sum_{k} w_k c_{k,t}(\theta_{k,t},z_t)+\epsilon e_i\). The rule is designed so that the time‑averaged constraint violation converges to zero.
- Executes the selected policies in production, observes real outcomes, and adaptively expands or contracts each trust region based on success or failure, thereby ensuring stable yet responsive exploration.
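The loop above can be sketched for a single cohort with one scalar constraint. This is a simplified illustration under stated assumptions, not the paper's algorithm: the GP surrogates and the HVI acquisition are replaced by direct evaluation of hypothetical toy functions (`objective`, `constraint`), the constraint is taken to be feasible when its value is at most zero, the penalty is subtracted from the objective under that sign convention, and the dual rule is a plain projected ascent step rather than the PDCBO update quoted above. All names, step sizes, and radius bounds are illustrative.

```python
import numpy as np

def objective(theta, z):
    """Hypothetical cohort score to maximize; peaks at ad load 0.7."""
    return -(theta - 0.7) ** 2 + 0.1 * z

def constraint(theta, z):
    """Hypothetical constraint, feasible when <= 0: load under a context-dependent cap."""
    return theta - (0.5 + 0.1 * z)

def penalized(theta, z, lam, eta=1.0):
    """Objective minus the dual-weighted constraint value (assumed sign convention)."""
    return objective(theta, z) - eta * lam * constraint(theta, z)

def primal_step(center, radius, z, lam, n_grid=101):
    """Best candidate inside the trust region [center - radius, center + radius]."""
    lo, hi = max(0.0, center - radius), min(1.0, center + radius)
    # Include the current center so the chosen point is never worse than staying put.
    candidates = np.append(np.linspace(lo, hi, n_grid), center)
    return candidates[np.argmax(penalized(candidates, z, lam))]

def run(T=200, dual_step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    center, radius, lam = 0.2, 0.1, 0.0
    thetas, cons = [], []
    for _ in range(T):
        z = rng.uniform(0.0, 1.0)                    # observe the context
        theta = primal_step(center, radius, z, lam)  # primal update in the trust region
        c = constraint(theta, z)                     # "deploy" and observe the outcome
        improved = penalized(theta, z, lam) > penalized(center, z, lam) + 1e-12
        # Expand the trust region on success, shrink it on failure.
        radius = min(0.3, radius * 2.0) if improved else max(0.01, radius * 0.5)
        center = theta
        lam = max(0.0, lam + dual_step * c)          # projected dual ascent
        thetas.append(theta)
        cons.append(c)
    return np.array(thetas), np.array(cons), lam, radius

thetas, cons, lam, radius = run()
print("mean constraint value:", cons.mean())
print("mean ad load over last 50 steps:", thetas[-50:].mean())
```

The dual variable grows while the deployed ad load exceeds the cap, which pulls the penalized optimum back toward feasibility. The trust region limits how far the policy can move in a single step, which is the stabilizing role the summary attributes to it.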
Theoretical analysis combines regret bounds from Multi‑Objective BO (MORBO), with \(O(\sqrt{T})\) hyper‑volume regret, and from PDCBO, with \(O(\sqrt{\gamma_T}\sqrt{T})\) constraint regret, to show that the overall regret remains sub‑linear, implying asymptotic optimality and constraint satisfaction.
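The step from sub-linear cumulative bounds to asymptotic guarantees can be spelled out as follows. The symbols \(R_T\) (cumulative hyper-volume regret) and \(V_T\) (cumulative constraint violation) are introduced here for illustration, and \(\gamma_T\) is assumed to denote the maximum information gain of the GP kernel, as is standard in GP bandit analyses; the summary does not define it.

```latex
\[
R_T = O\!\left(\sqrt{T}\right)
\;\Longrightarrow\;
\frac{R_T}{T} = O\!\left(\frac{1}{\sqrt{T}}\right) \xrightarrow[T \to \infty]{} 0,
\]
\[
V_T = O\!\left(\sqrt{\gamma_T}\,\sqrt{T}\right)
\;\Longrightarrow\;
\frac{V_T}{T} = O\!\left(\sqrt{\frac{\gamma_T}{T}}\right) \xrightarrow[T \to \infty]{} 0
\quad \text{whenever } \gamma_T = o(T).
\]
```

The condition \(\gamma_T = o(T)\) holds for common kernels; for a squared-exponential kernel in \(d\) dimensions, for example, \(\gamma_T = O\!\left((\log T)^{d+1}\right)\). Under that condition both the average regret and the average constraint violation vanish as \(T\) grows.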
Empirical evaluation proceeds in two stages. First, a synthetic benchmark derived from observational data simulates three cohorts with distinct ad‑load‑to‑score trade‑offs. CTRCBO reaches a target policy (1 % score increase with ≤1.5 % impression increase) in an average of 45 iterations, whereas a naïve CBO baseline requires 110 iterations. Second, a live A/B test on Meta’s platform compares CTRCBO‑tuned policies against those tuned by standard CBO. Results show:
- CTR improvement of 0.27 % relative to baseline,
- a 1.12 % reduction in ad impressions,
- constraint violation rate below 0.8 %, and
- negligible impact on user‑experience metrics (≤0.05 % dwell‑time loss).
The GP surrogate predictions for impressions and scores achieve a mean absolute error of 0.018, a 30 % improvement over the non‑causal baseline.
In conclusion, the paper demonstrates that integrating causal inference with a trust‑region‑enhanced primal‑dual BO framework yields faster convergence, robust constraint adherence, and measurable business gains at billion‑scale. The authors suggest future work on ensemble causal models, handling non‑linear constraints directly within the primal‑dual loop, and extending the approach to other real‑time decision‑making domains such as recommendation ranking and content moderation.