Statistical Learning from Attribution Sets
We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the deprecation of third-party cookies, we study a setting where the learner observes a sequence of clicks and a sequence of conversions, but can only link a conversion to a set of candidate clicks (an attribution set) rather than a unique source. We formalize this as learning from attribution sets generated by an oblivious adversary equipped with a prior distribution over the candidates. Despite the lack of explicit labels, we construct an unbiased estimator of the population loss from these coarse signals via a novel approach. Leveraging this estimator, we show that Empirical Risk Minimization achieves generalization guarantees that scale with the informativeness of the prior and is also robust against estimation errors in the prior, despite complex dependencies among attribution sets. Simple empirical evaluations on standard datasets suggest our unbiased approach significantly outperforms common industry heuristics, particularly in regimes where attribution sets are large or overlapping.
Research Summary
The paper tackles a pressing problem in online advertising: how to train conversion‑rate (CVR) prediction models when privacy‑preserving browser APIs and the deprecation of third‑party cookies prevent a direct link between an ad click and the resulting conversion. In this setting the learner only observes a sequence of click feature vectors and, for each conversion, a set of candidate clicks (an “attribution set”) that could have caused it. The authors formalize the generation of these sets as an oblivious adversary that, for each positive label, draws a window of k consecutive clicks and places the true causal click at position r with probability π
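The window-based generation of attribution sets described in this summary can be illustrated with a minimal sketch. It is not the authors' code: the function name `sample_attribution_set`, the 0-based position convention, and the clamping of windows at the ends of the click sequence are all assumptions made here for illustration. The argument `prior` stands in for the prior π over positions within a window of k consecutive clicks.

```python
import random
from typing import List, Sequence


def sample_attribution_set(
    true_idx: int,
    n_clicks: int,
    k: int,
    prior: Sequence[float],
    rng: random.Random,
) -> List[int]:
    """Return the indices of k consecutive clicks that contain true_idx.

    Hypothetical helper: the position r of the true click inside the
    window is drawn from `prior` (length k, 0-based positions).
    """
    if len(prior) != k or n_clicks < k:
        raise ValueError("prior must have length k and n_clicks must be >= k")
    # Draw the position of the true click within the window.
    r = rng.choices(range(k), weights=prior, k=1)[0]
    start = true_idx - r
    # Assumption: windows that would run past either end of the click
    # sequence are shifted back inside it. This changes the effective
    # position distribution near the boundaries.
    start = max(0, min(start, n_clicks - k))
    return list(range(start, start + k))


if __name__ == "__main__":
    rng = random.Random(0)
    # Click 5 converted; the learner only sees a window of 3 candidates.
    print(sample_attribution_set(5, n_clicks=10, k=3, prior=[0.5, 0.3, 0.2], rng=rng))
```

The learner in this setting would see only the returned candidate indices, never `true_idx` itself. Because the windows for nearby conversions can overlap, the resulting sets are not independent of one another.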