Towards Reliable Social A/B Testing: Spillover-Contained Clustering with Robust Post-Experiment Analysis
A/B testing is the foundation of decision-making in online platforms, yet social products often suffer from network interference: user interactions cause treatment effects to spill over into the control group. Such spillovers bias causal estimates and undermine experimental conclusions. Existing approaches face key limitations: user-level randomization ignores network structure, while cluster-based methods often rely on general-purpose clustering that is not tailored for spillover containment and has difficulty balancing unbiasedness and statistical power at scale. We propose a spillover-contained experimentation framework with two stages. In the pre-experiment stage, we build social interaction graphs and introduce a Balanced Louvain algorithm that produces stable, size-balanced clusters while minimizing cross-cluster edges, enabling reliable cluster-based randomization. In the post-experiment stage, we develop a tailored CUPAC estimator that leverages pre-experiment behavioral covariates to reduce the variance induced by cluster-level assignment, thereby improving statistical power. Together, these components provide both structural spillover containment and robust statistical inference. We validate our approach through large-scale social sharing experiments on Kuaishou, a platform serving hundreds of millions of users. Results show that our method substantially reduces spillover and yields more accurate assessments of social strategies than traditional user-level designs, establishing a reliable and scalable framework for networked A/B testing.
💡 Research Summary
The paper tackles a fundamental challenge in online controlled experiments on social platforms: network interference, or spillover, where a user’s treatment can affect the outcomes of connected peers, violating the Stable Unit Treatment Value Assumption (SUTVA). Existing solutions fall into two camps. Design‑based approaches (e.g., switchback testing, geographic isolation, independent‑set designs, and cluster randomization) aim to prevent spillover by altering the assignment mechanism, but they either sacrifice granularity, reduce effective sample size, or rely on generic community‑detection algorithms that are not optimized for experimental needs. Post‑experiment analytical methods (exposure re‑weighting, interference‑aware causal estimators, graph‑based adjustments) keep the original randomization but require strong modeling assumptions and can be difficult to interpret in production.
The authors propose a two‑stage framework that integrates a purpose‑built clustering algorithm with a variance‑reduction estimator tailored to cluster‑level randomization. In the pre‑experiment stage, they construct a multi‑behavior interaction graph (capturing sharing, messaging, following, etc.) and run Balanced Louvain, a modification of the classic Louvain method. Balanced Louvain augments modularity optimization with a soft size penalty: when moving a node to a candidate cluster, the gain ΔQ is reduced by α·P(|C|), where P is a piecewise linear function that penalizes clusters exceeding a threshold τ (typically half the maximum allowed size). The penalty is normalized by the average node degree (\bar{k}=2m/n), ensuring that α directly controls the trade‑off between modularity and size balance. After convergence, a hard size constraint splits any oversized cluster using an internal‑connectivity‑based heuristic, moving low‑connectivity nodes to new clusters until the size limit N_max is satisfied. This yields clusters that (i) minimize cross‑cluster edges (reducing spillover), (ii) are roughly equal in size (supporting stable randomization), and (iii) are temporally stable across experiment cycles.
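The penalized move step above can be sketched in a few lines. This is a minimal illustration of the soft size penalty, not the authors' code: the function names, the exact piecewise-linear shape of P, and the placement of the \bar{k} normalization are assumptions consistent with the description (penalty zero up to the threshold τ, growing toward the hard cap N_max, with α scaling the trade‑off).

```python
def size_penalty(cluster_size: int, tau: int, n_max: int) -> float:
    """Assumed piecewise-linear penalty P(|C|): zero up to the soft
    threshold tau, then growing linearly toward the hard cap n_max."""
    if cluster_size <= tau:
        return 0.0
    return (cluster_size - tau) / (n_max - tau)

def penalized_gain(delta_q: float, cluster_size: int, alpha: float,
                   tau: int, n_max: int, k_bar: float) -> float:
    """Modularity gain of moving a node into a candidate cluster,
    reduced by alpha * P(|C|); the penalty is normalized by the
    average degree k_bar = 2m / n (one plausible reading of the paper),
    so alpha directly trades off modularity against size balance."""
    return delta_q - alpha * size_penalty(cluster_size, tau, n_max) / k_bar
```

Under this sketch, moves into small clusters are scored by plain ΔQ, while moves into clusters past τ must clear an increasingly steep bar, so the local-move phase naturally spreads nodes across similarly sized communities before the hard split is ever needed.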
In the post‑experiment stage, the authors introduce CUPAC (Covariate‑Adjusted Pre‑experiment Adjustment for Clusters), extending the CUPED idea to the cluster‑randomized setting. For each cluster they compute pre‑experiment behavioral covariates (e.g., average sharing rate) and estimate a regression coefficient β from historical data. The treatment effect estimator is then

(\hat{\tau} = (\bar{Y}_T − β\,\bar{X}_T) − (\bar{Y}_C − β\,\bar{X}_C)),

where (\bar{Y}_T, \bar{Y}_C) are the mean outcomes over treatment and control clusters and (\bar{X}_T, \bar{X}_C) are the corresponding pre‑experiment covariate means. Because the covariates are measured before the experiment, the adjustment introduces no bias, and choosing β as the regression slope Cov(Y, X)/Var(X) removes the share of outcome variance explained by pre‑experiment behavior — a gain in statistical power that is especially valuable given the reduced effective sample size under cluster‑level randomization.
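A CUPED‑style adjusted difference‑in‑means at the cluster level can be sketched as follows. This is an illustrative implementation, not the authors' production estimator; the pooled OLS slope for β and the function name are assumptions.

```python
import numpy as np

def cupac_estimate(y_t, x_t, y_c, x_c, beta=None):
    """CUPED-style adjusted difference-in-means at the cluster level.

    y_t, y_c: per-cluster outcome means (treatment / control).
    x_t, x_c: per-cluster pre-experiment covariate means.
    beta defaults to the pooled regression slope Cov(Y, X) / Var(X),
    the variance-minimizing choice under a linear adjustment.
    """
    y = np.concatenate([y_t, y_c])
    x = np.concatenate([x_t, x_c])
    if beta is None:
        beta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    # The E[X] centering term cancels in the treatment-control difference,
    # so the estimator reduces to (Ybar_T - beta*Xbar_T) - (Ybar_C - beta*Xbar_C).
    adj_t = np.mean(y_t) - beta * np.mean(x_t)
    adj_c = np.mean(y_c) - beta * np.mean(x_c)
    return adj_t - adj_c
```

Because β is fit on pre‑experiment covariates that are independent of assignment, the adjusted estimate targets the same effect as the raw difference in means but with lower variance whenever X is predictive of Y.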