No One Left Behind: How to Exploit the Incomplete and Skewed Multi-Label Data for Conversion Rate Prediction

February 23, 2026

Reading time: 6 minutes

...

📝 Abstract

In most real-world online advertising systems, advertisers have diverse customer acquisition goals. A common solution is to use multi-task learning (MTL) to train a unified model on post-click data to estimate the conversion rate (CVR) for these diverse targets. In practice, CVR prediction often encounters missing conversion data: many advertisers submit only a subset of user conversion actions due to privacy or other constraints, which leaves the labels of the multi-task data incomplete. If the model is trained on all available samples for which advertisers submit user conversion actions, it may struggle when deployed to serve the subset of advertisers targeting a specific conversion action, because the training and deployment data distributions are mismatched. Although considerable effort has gone into MTL, a long-standing challenge is how to effectively train a unified model on incomplete and skewed multi-label data. In this paper, we propose a fine-grained Knowledge transfer framework for Asymmetric Multi-Label data (KAML). We introduce an attribution-driven masking strategy (ADM) to better utilize asymmetric multi-label data in training. However, the more relaxed masking in ADM is a double-edged sword: it provides additional training signals but also introduces noise due to skewed data. To address this, we propose a hierarchical knowledge extraction mechanism (HKE) to model the sample discrepancy within the target task tower. Finally, to maximize the utility of unlabeled samples, we incorporate a ranking loss strategy to further enhance our model. The effectiveness of KAML has been demonstrated through comprehensive evaluations on offline industry datasets and online A/B tests, which show significant performance improvements over existing MTL baselines.
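The core data problem in the abstract, labels that exist only for the conversion actions an advertiser chooses to report, can be illustrated with a masked multi-task loss. The sketch below is a minimal, hypothetical illustration and not the paper's ADM: the helper names `build_mask` and `masked_bce`, the plain-list data layout, and both masking rules ("strict" trusts only the advertiser's bidding target action, "relaxed" trusts every action the advertiser reports) are assumptions made here. The five conversion actions named in the paper's introduction serve as task ids.

```python
import math
from typing import List, Sequence, Set

# Conversion actions named in the paper's introduction, used here as task ids.
TASKS = ["activation", "re-engagement", "registration", "payment", "retention"]


def build_mask(reported: Sequence[Set[str]], target: Sequence[str],
               relaxed: bool) -> List[List[int]]:
    """Return a 0/1 mask per (sample, task).

    Hypothetical rules for illustration only (not the paper's ADM):
      strict  -> trust only the advertiser's bidding target action
      relaxed -> trust every action the advertiser reports
    """
    mask = []
    for actions, tgt in zip(reported, target):
        allowed = actions if relaxed else {tgt}
        mask.append([1 if task in allowed else 0 for task in TASKS])
    return mask


def masked_bce(preds: Sequence[Sequence[float]],
               labels: Sequence[Sequence[int]],
               mask: Sequence[Sequence[int]],
               eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over entries where mask == 1."""
    total, count = 0.0, 0
    for p_row, y_row, m_row in zip(preds, labels, mask):
        for p, y, m in zip(p_row, y_row, m_row):
            if not m:
                continue  # unreported action: label unknown, so it adds no loss
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


# Two clicked samples: advertiser A bids on "payment" but also reports
# "activation"; advertiser B bids on and reports only "activation".
reported = [{"activation", "payment"}, {"activation"}]
target = ["payment", "activation"]
preds = [[0.9, 0.5, 0.5, 0.2, 0.5], [0.3, 0.5, 0.5, 0.5, 0.5]]
labels = [[1, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

strict_loss = masked_bce(preds, labels, build_mask(reported, target, relaxed=False))
relaxed_loss = masked_bce(preds, labels, build_mask(reported, target, relaxed=True))
print(strict_loss, relaxed_loss)
```

In this toy setup the relaxed rule raises the number of trusted labels from two to three. That mirrors the trade-off the abstract describes: more training signal per task, but the extra "activation" label comes from an advertiser whose bidding target is a different action, so such samples may not match the distribution of the target-task traffic the model will eventually serve.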

💡 Analysis


📄 Content

No One Left Behind: How to Exploit the Incomplete and Skewed Multi-Label Data for Conversion Rate Prediction

Qinglin Jia∗ (jiaql@pku.edu.com), Peking University, China
Zhaocheng Du (zhaochengdu@huawei.com), Noah's Ark Lab, Huawei, China
Chuhan Wu (wuchuhan1@huawei.com), Noah's Ark Lab, Huawei, China
Huifeng Guo† (huifeng.guo@huawei.com), Noah's Ark Lab, Huawei, China
Ruiming Tang (tangruiming@huawei.com), Noah's Ark Lab, Huawei, China
Shuting Shi, Muyu Zhang ({shishuting2,zhangmuyu}@huawei.com), Huawei Technology Co., Ltd., China

∗Work done while at Huawei. †Corresponding author.

CCS Concepts: Information systems → Computational advertising; Recommender systems.

Keywords: Online Advertising, CVR Prediction, Multi-task Learning

1 INTRODUCTION

Online advertising plays a critical role in customer acquisition, typically following a sequential event pattern of "impression → click → conversion" [2, 44].
Advertising platforms must align with user interests to maximize advertisers' return on investment (ROI). Therefore, post-click conversion rate (CVR) prediction [20] is an essential task for personalized ad ranking systems, as it helps optimize advertisers' bidding strategies under various pricing models, such as optimized cost-per-click (OCPC) and cost-per-action (CPA) advertising. Advertising platforms rely critically on accurate CVR predictions to adjust bidding strategies and optimize budget allocation, which drives the effectiveness of online advertising.

Advertisers have diverse customer acquisition needs, which correspond to different post-click conversion behaviors. In typical app promotion campaigns, for example, five main customer acquisition goals are associated with distinct conversion actions: activation, re-engagement, registration, payment, and retention. In OCPC advertising, advertisers select a conversion action for bidding based on their customer acquisition goals and submit the target conversion action of users who clicked the ad to the platform, in acc


This content is AI-processed based on ArXiv data.
