Competition-Aware CPC Forecasting with Near-Market Coverage

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Cost-per-click (CPC) in paid search is a volatile auction outcome generated by a competitive landscape that is only partially observable from any single advertiser’s history. Using Google Ads auction logs from a concentrated car-rental market (2021–2023), we forecast weekly CPC for 1,811 keyword series and approximate latent competition through complementary signals derived from keyword text, CPC trajectories, and geographic market structure. We construct (i) semantic neighborhoods and a semantic keyword graph from pretrained transformer-based representations of keyword text, (ii) behavioral neighborhoods via Dynamic Time Warping (DTW) alignment of CPC trajectories, and (iii) geographic-intent covariates capturing localized demand and marketplace heterogeneity. We extensively evaluate these signals both as stand-alone covariates and as relational priors in spatiotemporal graph forecasters, benchmarking them against strong statistical, neural, and time-series foundation-model baselines. Across methods, competition-aware augmentation improves stability and error profiles at business-relevant medium and longer horizons, where competitive regimes shift and volatility is most consequential. The results show that broad market-outcome coverage, combined with keyword-derived semantic and geographic priors, provides a scalable way to approximate latent competition and improve CPC forecasting in auction-driven markets.


💡 Research Summary

The paper tackles the problem of forecasting cost‑per‑click (CPC) in paid‑search advertising, where the price is set by a real‑time auction and therefore reflects a competitive environment that is only partially observable from any single advertiser’s data. Using Google Ads auction logs from a concentrated car‑rental market covering 2021 to 2023, the authors construct a weekly panel of 1,811 keyword series observed over 127 weeks (roughly 2.4 years) and forecast weekly CPC for each series.
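The summary does not say how raw auction rows were rolled up into the weekly panel. A minimal sketch, assuming each raw row carries a keyword, a date, total cost, and clicks, computes weekly CPC as total cost divided by total clicks per ISO week, which is a click‑weighted average rather than a mean of daily CPCs. The function name `weekly_cpc`, the row layout, and the example keyword are hypothetical and do not describe the authors' pipeline.

```python
from collections import defaultdict
from datetime import date

def weekly_cpc(rows):
    """Aggregate (keyword, day, cost, clicks) rows into weekly CPC per keyword.

    Hypothetical helper. Weekly CPC = total cost / total clicks within an
    ISO week (a click-weighted average). Weeks with zero clicks map to None,
    since CPC is undefined there.
    """
    cost = defaultdict(float)
    clicks = defaultdict(int)
    for keyword, day, row_cost, row_clicks in rows:
        iso_year, iso_week, _ = day.isocalendar()
        key = (keyword, iso_year, iso_week)
        cost[key] += row_cost
        clicks[key] += row_clicks
    return {k: (cost[k] / clicks[k] if clicks[k] > 0 else None) for k in cost}

# Hypothetical rows for one keyword: two days in ISO week 10 of 2022, one in week 11.
rows = [
    ("car rental lisbon airport", date(2022, 3, 7), 12.0, 10),  # daily CPC 1.2
    ("car rental lisbon airport", date(2022, 3, 9), 3.0, 5),    # daily CPC 0.6
    ("car rental lisbon airport", date(2022, 3, 14), 0.0, 0),   # no clicks that week
]
panel = weekly_cpc(rows)
print(panel[("car rental lisbon airport", 2022, 10)])  # 1.0 (mean of daily CPCs would be 0.9)
```

The click‑weighted choice keeps low‑traffic days from dominating a week's CPC; the zero‑click week is left undefined rather than imputed, which a real pipeline would need to handle explicitly.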

Because latent competition cannot be measured directly, the authors devise three observable proxies: (1) semantic neighborhoods derived from pretrained transformer embeddings of keyword text, which also define a fixed semantic keyword graph; (2) behavioral neighborhoods obtained by aligning CPC time series with Dynamic Time Warping (DTW) and linking series with similar trajectories; and (3) geographic‑intent covariates that capture localized demand (city, airport, region) and marketplace heterogeneity. These proxies are used in two ways: as exogenous features fed into a wide range of time‑series models, and as relational priors (adjacency matrices) for spatiotemporal graph neural networks (GNNs).
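The two relational proxies can be sketched in pure Python under stated assumptions. Here the semantic graph links each keyword to its k most cosine‑similar embeddings (a k‑nearest‑neighbor construction; the paper's exact rule and choice of k are not given in this summary), and behavioral neighbors are the k series with the smallest classic DTW distance, optionally after z‑normalizing each CPC trajectory so that shape rather than price level drives the match. Embeddings are assumed to be precomputed vectors, and all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def semantic_knn_graph(embeddings, k):
    """Symmetric 0/1 adjacency linking each keyword to its k most cosine-similar
    keywords. `embeddings` are assumed to be precomputed transformer vectors."""
    n = len(embeddings)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            adj[i][j] = adj[j][i] = 1  # symmetrize: edge if either endpoint selects it
    return adj

def znorm(x):
    """Z-normalize a series so DTW compares shape rather than price level."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd for v in x] if sd > 0 else [0.0] * len(x)

def dtw_distance(x, y):
    """Classic unconstrained DTW with absolute-difference local cost."""
    n, m = len(x), len(y)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest predecessor: step in x, step in y, or diagonal match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def dtw_neighbors(series, k, normalize=True):
    """For each CPC series, indices of its k nearest series by DTW distance."""
    prepared = [znorm(s) if normalize else list(s) for s in series]
    neighbors = []
    for i, s in enumerate(prepared):
        dists = sorted(
            (dtw_distance(s, t), j) for j, t in enumerate(prepared) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

adj = semantic_knn_graph([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], k=1)
print(adj)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

For 1,811 series the all‑pairs DTW loop is quadratic in both the number of series and series length, so a practical version would likely add a warping window (for example a Sakoe‑Chiba band) or use a compiled DTW implementation.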

The experimental suite is extensive. Baselines include classical statistical models (ARIMA, Prophet), deep learning models (LSTM, Transformer), and state‑of‑the‑art time‑series foundation models (Chronos‑2, TimeGPT, Moirai). Graph‑based forecasters such as a Diffusion Convolutional GNN and Temporal Graph Attention are also evaluated. For each model, up to three configurations are compared: (a) vanilla, with no competition information; (b) competition‑aware via exogenous covariates; and (c) competition‑aware via relational graph structure, which applies to the graph‑based forecasters. Forecast horizons of 1, 4, and 8 weeks are evaluated with 5‑fold rolling‑origin cross‑validation. Accuracy metrics (MAE, RMSE, MAPE) are reported overall and separately for four market regimes defined by mean CPC and coefficient of variation (high‑value/high‑volatility, high‑value/low‑volatility, low‑value/high‑volatility, low‑value/low‑volatility), a segmentation the authors call the “competitive frontier”.
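The evaluation protocol can be sketched as follows, assuming expanding training windows and fold origins spaced one horizon apart; the summary specifies 5 folds but not the spacing or window type. MAPE is undefined when the actual CPC is zero, so this sketch skips such weeks; how the authors handled that case is not stated. All function names are hypothetical.

```python
import math

def rolling_origin_splits(n_obs, horizon, n_folds, step=None):
    """Yield (train_idx, test_idx) for expanding-window rolling-origin CV.

    Fold f trains on weeks [0, origin) and tests on [origin, origin + horizon).
    Assumption: origins are spaced `step` weeks apart (default: one horizon),
    with the last test window ending at the final observed week.
    """
    step = horizon if step is None else step
    first_origin = n_obs - horizon - (n_folds - 1) * step
    if first_origin <= 0:
        raise ValueError("series too short for the requested folds and horizon")
    for f in range(n_folds):
        origin = first_origin + f * step
        yield list(range(origin)), list(range(origin, origin + horizon))

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mape(y, yhat):
    """Mean absolute percentage error in %, skipping weeks with zero actual CPC."""
    pairs = [(a, b) for a, b in zip(y, yhat) if a != 0]
    if not pairs:
        return math.nan
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in pairs) / len(pairs)

splits = list(rolling_origin_splits(127, horizon=8, n_folds=5))
print([test[0] for _, test in splits])  # [87, 95, 103, 111, 119]
```

With the 127‑week panel and an 8‑week horizon, the five test windows start at weeks 87, 95, 103, 111, and 119 (0‑indexed), so the last fold ends exactly at the final observed week.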

Key findings are: (i) competition‑aware augmentation yields modest gains for very short‑term (1‑week) forecasts but substantial improvements at medium and long horizons (4 and 8 weeks), reducing MAE by 6‑12% on average; (ii) the largest error reductions appear in the high‑value/high‑volatility quadrant, where forecasting failures are most costly for advertisers, with MAE drops exceeding 20%; (iii) geographic‑intent covariates act as a strong stabilizing prior for foundation models, dampening forecast volatility and improving consistency at long horizons; (iv) the semantic keyword graph improves GNN performance, especially when combined with behavioral DTW links, indicating that both substitution similarity and shared temporal patterns are valuable relational signals; and (v) DTW‑based behavioral neighborhoods alone provide limited benefit but work well in combination with the other proxies.
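The regime breakdown and the percentage gains can be reproduced in outline. This sketch assumes the four quadrants are cut at the cross‑sectional medians of mean CPC and coefficient of variation (population standard deviation divided by mean); the summary does not state the actual thresholds. The percentage figures are read as relative MAE reduction of the competition‑aware model over the vanilla one. The helper names and the toy panel are hypothetical.

```python
import statistics

def cpc_stats(series):
    """Mean CPC and coefficient of variation (population std / mean)."""
    mean = statistics.fmean(series)
    cv = statistics.pstdev(series) / mean if mean > 0 else float("inf")
    return mean, cv

def assign_regimes(panel):
    """Map keyword -> one of four value/volatility quadrants.

    Assumption: quadrants are split at the cross-sectional medians of mean CPC
    and CV; the paper's actual cut-offs may differ.
    """
    stats = {kw: cpc_stats(s) for kw, s in panel.items()}
    mean_cut = statistics.median(m for m, _ in stats.values())
    cv_cut = statistics.median(cv for _, cv in stats.values())
    regimes = {}
    for kw, (m, cv) in stats.items():
        value = "high-value" if m >= mean_cut else "low-value"
        vol = "high-volatility" if cv >= cv_cut else "low-volatility"
        regimes[kw] = f"{value}/{vol}"
    return regimes

def relative_mae_reduction(mae_vanilla, mae_aware):
    """Percentage MAE reduction of the competition-aware model over vanilla."""
    return 100.0 * (mae_vanilla - mae_aware) / mae_vanilla

# Toy weekly CPC panel with one keyword per quadrant.
panel = {"a": [10, 10, 10, 10], "b": [10, 2, 10, 2], "c": [1, 1, 1, 1], "d": [1, 3, 1, 3]}
print(assign_regimes(panel)["b"])          # high-value/high-volatility
print(relative_mae_reduction(2.0, 1.5))   # 25.0
```

Median splits guarantee roughly balanced quadrants, which makes per‑regime error comparisons less sensitive to a handful of extreme keywords; fixed business thresholds would be an equally plausible alternative.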

The authors conclude that approximating latent competition through observable proxies is a practical and scalable approach for CPC forecasting in auction‑driven markets. Their contributions are threefold: reframing CPC prediction as a partial observability problem; engineering domain‑specific competition proxies and integrating them across a broad model universe; and empirically demonstrating that competition‑aware forecasting delivers horizon‑dependent and regime‑dependent gains, particularly for high‑risk keywords. The work suggests that even without direct access to competitors’ bids, advertisers and platform engineers can leverage text, temporal, and geographic signals to build more robust cost forecasts, thereby reducing budget risk and enabling more informed bidding strategies.

