Compress, Cross and Scale: Multi-Level Compression Cross Networks for Efficient Scaling in Recommender Systems

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Modeling high-order feature interactions efficiently is a central challenge in click-through rate and conversion rate prediction. Modern industrial recommender systems are predominantly built upon deep learning recommendation models, where the interaction backbone plays a critical role in determining both predictive performance and system efficiency. However, existing interaction modules often struggle to simultaneously achieve strong interaction capacity, high computational efficiency, and good scalability, resulting in limited ROI when models are scaled under strict production constraints. In this work, we propose MLCC, a structured feature interaction architecture that organizes feature crosses through hierarchical compression and dynamic composition, efficiently capturing high-order feature dependencies while maintaining favorable computational complexity. We further introduce MC-MLCC, a Multi-Channel extension that decomposes feature interactions into parallel subspaces, enabling efficient horizontal scaling with improved representation capacity and significantly reduced parameter growth. Extensive experiments on three public benchmarks and a large-scale industrial dataset show that our proposed models consistently outperform strong DLRM-style baselines by up to +0.52% AUC, while reducing model parameters and FLOPs by up to 26× under comparable performance. Comprehensive scaling analyses demonstrate stable and predictable scaling behavior across embedding dimension, head number, and channel count, with channel-based scaling achieving substantially better efficiency than conventional embedding inflation. Finally, online A/B testing on a real-world advertising platform validates the practical effectiveness of our approach, which has been widely adopted in the Bilibili advertising system under strict latency and resource constraints.


💡 Research Summary

This paper addresses the fundamental challenge of efficiently modeling high‑order feature interactions in click‑through‑rate (CTR) and conversion‑rate (CVR) prediction, which are central to modern recommender systems. While deep learning recommendation models (DLRMs) dominate industrial practice, their interaction backbones often suffer from a trade‑off among expressive power, computational efficiency, and scalability. Existing modules such as product‑based layers, deep cross networks, and attention‑based designs either become computationally prohibitive when scaled or lack sufficient capacity to capture complex dependencies.

To overcome these limitations, the authors propose the Multi‑Level Compression Cross network (MLCC), a three‑stage architecture built around a unified “Compress‑Cross‑Scale” paradigm. First, a Global Compressor (GC) transforms the raw embedding tokens into a compact set of globally contextualized tokens using learnable mappings, thereby preserving essential information while reducing dimensionality. Second, the Progressive Layered Crossing (PLC) module dynamically fuses the original local tokens with the compressed global tokens through a series of weighted cross‑operations. Unlike static cross formulas, PLC adapts its weights conditioned on the inputs at each layer, enabling efficient extraction of high‑order interactions without exploding parameter counts. Third, the Scale stage either restores the crossed tokens to the original dimensionality for downstream MLP processing or directly produces the final prediction. The overall computational complexity remains linear in the number of fields, embedding dimension, and a modest factor k, making the architecture amenable to large‑scale deployment.
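The Compress-Cross-Scale flow described above can be sketched in a few lines of numpy. This is a minimal illustration under assumed shapes and parameter names (`Wc`, `cross_weights`, `Wo` are all hypothetical stand-ins), not the paper's exact parameterization of GC and PLC:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlcc_forward(X, Wc, cross_weights, Wo):
    """Sketch of one MLCC pass (illustrative shapes, not the paper's code).

    X            : (F, d) field-embedding tokens.
    Wc           : (k, F) learnable global-compression map (Compress stage).
    cross_weights: list of (d, d) per-layer projections (Cross stage).
    Wo           : (k, F) map restoring k tokens to F tokens (Scale stage).
    """
    # Compress: derive k globally contextualized tokens from the F local ones.
    G = Wc @ X                        # (k, d)
    H = G
    for Wl in cross_weights:
        # Input-conditioned weights: a gate computed from the current tokens
        # modulates each cross term (a stand-in for PLC's dynamic fusion).
        gate = sigmoid(H @ Wl)        # (k, d)
        H = H + gate * (H * G)        # weighted high-order cross with residual
    # Scale: restore the crossed tokens to (F, d) for a downstream MLP.
    return Wo.T @ H                   # (F, d)
```

Because each layer only mixes `k` compressed tokens of width `d`, the cost per layer stays linear in the number of fields and embedding dimension up to the modest factor `k`, matching the complexity claim above.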

Building on MLCC, the paper introduces Multi‑Channel MLCC (MC‑MLCC), which replicates the MLCC pipeline across multiple parallel channels. Each channel learns interactions in a distinct sub‑space, and a shared aggregation layer merges the channel outputs. This horizontal scaling strategy dramatically improves the model’s capacity‑to‑cost ratio: increasing the number of channels yields near‑linear gains in expressive power while incurring far less parameter growth and FLOPs than conventional embedding inflation.
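A rough sketch of this multi-channel scheme, again in numpy with assumed shapes and a simplified FM-style pooling standing in for the full per-channel MLCC pipeline (`channels`, `Wagg`, and `d_sub` are illustrative names):

```python
import numpy as np

def mc_mlcc_forward(X, channels, Wagg):
    """Sketch of multi-channel horizontal scaling (not the paper's code).

    X        : (F, d) field embeddings.
    channels : list of (d, d_sub) projections, one per parallel channel.
    Wagg     : (C * d_sub, d_out) shared aggregation weights.
    """
    outs = []
    for Wch in channels:
        Z = X @ Wch                          # project into this channel's subspace
        # Minimal stand-in for the per-channel cross: a second-order
        # interaction of each token with the field-summed token.
        s = Z.sum(axis=0)                    # (d_sub,)
        outs.append((Z * s).sum(axis=0))     # (d_sub,) crossed summary
    merged = np.concatenate(outs)            # (C * d_sub,)
    return merged @ Wagg                     # (d_out,) shared aggregation
```

Note the parameter count grows as `C * d * d_sub` when channels are added, whereas inflating the embedding dimension grows interaction parameters roughly quadratically in `d`, which is the intuition behind the improved capacity-to-cost ratio.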

Extensive experiments were conducted on three public benchmarks (Criteo, Avazu, and KDD‑Cup) and a large‑scale industrial dataset from Bilibili's advertising system. MLCC consistently outperformed strong baselines such as DLRM, DeepFM, xDeepFM, and AutoInt, achieving AUC improvements ranging from +0.07% to +0.20% while reducing parameters and FLOPs by up to 6×. MC‑MLCC further pushed the envelope: on the industrial dataset it matched top‑performing baselines with over 26× fewer parameters and FLOPs, and on public datasets it surpassed computation‑matched DLRM variants by up to +0.52% AUC.

A systematic scaling analysis examined the impact of embedding dimension, head number, and channel count. The results demonstrated that channel‑based scaling delivers substantially higher return‑on‑investment than naïve embedding enlargement, offering stable and predictable performance gains across all dimensions. Moreover, the authors observed that MLCC mitigates the “embedding collapse” phenomenon—where additional dimensions contribute little useful information—by actively transforming raw capacity into diverse interaction patterns.

Finally, an online A/B test on a real‑world advertising platform validated the practical benefits. Deploying MC‑MLCC under strict latency constraints (≤ 2 ms) yielded a cumulative 32% increase in advertiser value (ADVV) without violating resource limits. The model has been adopted in Bilibili's production pipeline, and the authors released the code publicly to facilitate reproducibility.

In summary, the paper contributes (1) a novel hierarchical compression‑cross architecture (MLCC) that balances expressiveness and efficiency, (2) a multi‑channel extension (MC‑MLCC) that provides a scalable horizontal dimension with minimal overhead, (3) thorough empirical evidence of superior performance and efficiency across benchmarks and a large industrial setting, and (4) a demonstration of real‑world impact through successful deployment in a high‑traffic advertising system.

