Aggregate and Broadcast: Scalable and Efficient Feature Interaction for Recommender Systems


Feature interaction is a core ingredient in ranking models for large-scale recommender systems, yet making it both expressive and efficiently scalable remains challenging. Exhaustive pairwise interaction is powerful but incurs quadratic complexity in the number of tokens/features, while many efficient alternatives rely on restrictive structures that limit information exchange. We further identify two common bottlenecks in practice: (1) early aggregation of behavior sequences compresses fine-grained signals, making it difficult for deeper layers to reuse item-level details; and (2) late fusion injects task signals only at the end, preventing task objectives from directly guiding the interaction process. To address these issues, we propose the Information Flow Network (INFNet), a lightweight architecture that enables scalable, task-aware feature interaction with linear complexity. INFNet represents categorical features, behavior sequences, and task identifiers as tokens, and introduces a small set of hub tokens for each group to serve as communication hubs. Interaction is realized through an efficient aggregate-and-broadcast information flow: hub tokens aggregate global context across groups via cross-attention, and a lightweight gated broadcast unit injects the refined context back to update the categorical, sequence, and task tokens. This design supports width-preserving stacking that preserves item-level signals in sequence and enables task-guided interaction throughout the network, while reducing interaction cost from quadratic to linear in the number of feature tokens. Experiments on a public benchmark and a large-scale industrial dataset demonstrate that INFNet consistently outperforms strong baselines and exhibits strong scaling behavior. In a commercial online advertising system, deploying INFNet improves revenue by +1.587% and click-through rate by +1.155%.


💡 Research Summary

Feature interaction is a cornerstone of modern ranking models in large‑scale recommender systems, yet achieving both high expressiveness and low latency remains a difficult trade‑off. Exhaustive pairwise interaction (e.g., AutoInt, HSTU) offers rich connectivity but incurs quadratic O(N²) cost with respect to the number of input tokens, which is prohibitive for real‑time serving where latency budgets are often below 30 ms. Recent lightweight alternatives (e.g., RankMixer, One‑Trans) reduce complexity to linear or near‑linear by compressing behavior sequences early and fusing task signals only at the final layer. The authors identify two practical bottlenecks: (1) early aggregation of behavior sequences discards fine‑grained item‑level information, limiting deeper layers’ ability to reuse temporal details; (2) late task fusion makes the interaction module largely task‑agnostic, preventing task objectives from guiding representation learning throughout the network.

To overcome these issues, the paper proposes the Information Flow Network (INFNet), a novel architecture that treats categorical features, behavior sequences, and task identifiers as three distinct token groups. For each group, INFNet maintains two sets of tokens: (i) Original tokens that preserve token‑level granularity (no pooling before interaction), and (ii) a small set of Hub tokens that act as efficient communication nodes. Hub tokens are initialized differently per group: MLP‑based compression for categorical features, one hub per behavior type for sequences, and a hybrid of shared and task‑specific hubs for the task group.
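The per-group initialization can be sketched roughly as follows. This is an illustrative NumPy mock-up, not the paper's exact parameterization: the field counts, hub counts, and the linear map standing in for the MLP compressor are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # embedding dimension (assumed)

# Categorical group: compress n_c field embeddings into h_c hubs via a
# learned linear map (a stand-in for the paper's MLP-based compression).
n_c, h_c = 40, 4
cat_tokens = rng.normal(size=(n_c, d))
W_compress = rng.normal(size=(n_c, h_c)) / np.sqrt(n_c)
cat_hubs = W_compress.T @ cat_tokens                 # (h_c, d)

# Sequence group: one hub per behavior type, here seeded by mean pooling
# over that behavior's item tokens (initialization choice is assumed).
seq_clicks = rng.normal(size=(50, d))                # click-history tokens
seq_purchases = rng.normal(size=(20, d))             # purchase-history tokens
seq_hubs = np.stack([seq_clicks.mean(0), seq_purchases.mean(0)])  # (2, d)

# Task group: a few shared hubs plus one task-specific hub per task
# (e.g., CTR and CVR), concatenated into one hub set.
shared_hubs = rng.normal(size=(2, d))
task_specific_hubs = rng.normal(size=(2, d))         # one per task
task_hubs = np.concatenate([shared_hubs, task_specific_hubs])     # (4, d)

print(cat_hubs.shape, seq_hubs.shape, task_hubs.shape)
```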

The core of INFNet is an Aggregate‑and‑Broadcast mechanism implemented in a stackable block:

  1. Aggregation Flow – Hub tokens serve as queries in a cross‑attention operation that gathers global context from all original tokens across groups (and from other hubs). Because the query set is the small hub collection, the attention cost is O((n_c + n_s + n_t)·N) ≈ O(N), where N is the total number of original tokens. This phase yields refined hub representations that embed information from the entire feature space.

  2. Broadcast Flow – Refined hubs are fed into a lightweight Broadcast Gated Unit (BGU). The BGU learns scalar scaling (α) and shifting (β) parameters for each hub and applies an affine transformation α·x + β to the corresponding original tokens. This gated broadcast injects the aggregated global context back into the fine‑grained tokens, allowing each token to selectively absorb the shared information while preserving its own identity.

These two phases are alternated layer‑by‑layer, and the block can be stacked L times without changing the token width (i.e., the architecture is width‑preserving). Consequently, early‑layer item‑level signals are never lost; deeper layers can still attend to individual behavior items, addressing the first bottleneck. Because task tokens participate in the same aggregation and broadcast cycles, task‑specific context influences interaction from the very first layer, solving the second bottleneck.
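The two phases above can be condensed into one block sketch. This is a minimal NumPy rendering under stated assumptions: single-head attention, a residual hub update, and a gate that pools the refined hubs into per-dimension scale and shift terms; the actual BGU parameterization in the paper may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def infnet_block(tokens, hubs, params):
    """One aggregate-and-broadcast step (illustrative, not the paper's
    exact formulation).

    tokens: (N, d) original tokens across all groups
    hubs:   (h, d) hub tokens, with h << N
    """
    Wq, Wk, Wv, Wg = params
    d = tokens.shape[1]
    # Aggregation flow: hubs act as queries over all original tokens,
    # so the attention map is (h, N) and the cost is O(h*N), not O(N^2).
    q, k, v = hubs @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))              # (h, N)
    hubs_new = hubs + attn @ v                        # refined hubs
    # Broadcast flow: pool hub context, map it to per-dimension scale
    # (alpha) and shift (beta), and apply the affine update to tokens.
    ctx = hubs_new.mean(axis=0) @ Wg                  # (2d,)
    alpha, beta = ctx[:d], ctx[d:]
    tokens_new = (1.0 + alpha) * tokens + beta        # width-preserving
    return tokens_new, hubs_new

# Usage: token width (N, d) is unchanged, so blocks stack freely.
rng = np.random.default_rng(0)
N, h, d = 100, 8, 32
tokens = rng.normal(size=(N, d))
hubs = rng.normal(size=(h, d))
params = tuple(rng.normal(size=s) / np.sqrt(d)
               for s in [(d, d), (d, d), (d, d), (d, 2 * d)])
tokens2, hubs2 = infnet_block(tokens, hubs, params)
print(tokens2.shape, hubs2.shape)
```

Because the output shapes match the input shapes, stacking L blocks never forces a pooling step, which is exactly the width-preserving property the summary describes.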

Training uses a standard multi‑task loss: each task head (e.g., CTR, CVR) receives its own binary cross‑entropy loss, and the per‑task losses are weighted and summed. The model is optimized with Adam and a typical learning‑rate schedule. Hub tokens are learned parameters: they are refined by cross‑attention in the forward pass and updated through its gradients during backpropagation.
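The weighted multi-task objective is straightforward to write down. A minimal sketch, where the labels, logits, and task weights are all invented for illustration:

```python
import numpy as np

def bce(y_true, logits, eps=1e-7):
    """Binary cross-entropy on raw logits, averaged over the batch."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(y_true * np.log(p + eps)
                    + (1 - y_true) * np.log(1 - p + eps))

# Hypothetical per-task labels and head outputs for a batch of 4.
y_ctr = np.array([1, 0, 1, 0]); z_ctr = np.array([0.8, -1.2, 0.3, -0.5])
y_cvr = np.array([0, 0, 1, 0]); z_cvr = np.array([-0.4, -2.0, 1.1, -0.9])
weights = {"ctr": 1.0, "cvr": 0.5}   # task weights (assumed)

total_loss = (weights["ctr"] * bce(y_ctr, z_ctr)
              + weights["cvr"] * bce(y_cvr, z_cvr))
print(total_loss)
```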

Empirical evaluation includes:

  • Public benchmark (e.g., Criteo) and a large‑scale industrial dataset (hundreds of millions of samples). INFNet consistently outperforms strong baselines such as RankMixer, One‑Trans, AutoInt, and FiBiNET. Notably, as the number of stacked INFNet blocks or hidden dimension grows, performance continues to improve, demonstrating favorable scaling laws unlike many lightweight models that saturate early.

  • Efficiency: With 10 k tokens and embedding dimension d = 64, INFNet’s FLOPs are roughly 0.8·N·d, and inference latency on CPU/GPU is about 12 ms, comfortably within production constraints. In contrast, all‑to‑all models exceed 200 ms.

  • Online A/B test in a commercial advertising platform: deploying INFNet yields a +1.587 % lift in revenue and +1.155 % lift in click‑through rate compared with the production baseline, translating to multi‑million‑dollar gains.
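The efficiency gap follows directly from the attention shapes. A back-of-envelope comparison, ignoring constant factors and using an assumed hub count:

```python
# Interaction cost of the attention-score computation alone.
N, h, d = 10_000, 16, 64   # tokens, hubs (hub count assumed), embed dim

pairwise_flops = N * N * d   # all-to-all: every token attends to every token
hub_flops = h * N * d        # hub-mediated: only h queries over N tokens

print(f"pairwise:     {pairwise_flops:.2e} FLOPs")
print(f"hub-mediated: {hub_flops:.2e} FLOPs "
      f"({pairwise_flops // hub_flops}x fewer)")
```

The ratio is simply N/h, so the savings grow with the token count, which is why the linear design keeps scaling where quadratic models blow past the latency budget.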

The paper also conducts a sensitivity analysis on the number of hub tokens and initialization strategies. Too few hubs limit the capacity to capture global context, while too many increase computation without substantial accuracy gains. The optimal hub‑to‑original token ratio varies with data characteristics (number of categorical fields, sequence length, number of tasks). The BGU’s α and β parameters start near zero (preserving original tokens) and gradually learn to modulate tokens as training proceeds.
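The near-zero start of α and β amounts to initializing the broadcast as an identity map, so early in training the original tokens pass through untouched. A sketch of that initialization choice (the zero-initialized projection is an assumption about how "start near zero" is realized):

```python
import numpy as np

d = 16
rng = np.random.default_rng(1)
tokens = rng.normal(size=(8, d))   # original tokens
ctx = rng.normal(size=(d,))        # pooled hub context

# Zero-initialized gate projection: alpha = beta = 0 at step 0, so the
# affine broadcast (1 + alpha) * x + beta reduces to the identity.
Wg = np.zeros((d, 2 * d))
gate = ctx @ Wg
alpha, beta = gate[:d], gate[d:]
updated = (1.0 + alpha) * tokens + beta

assert np.allclose(updated, tokens)  # tokens preserved at initialization
```

As Wg moves away from zero during training, the gate gradually learns how strongly each token dimension should absorb the shared context.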

Contributions are summarized as:

  1. A systematic information‑flow perspective that pinpoints early sequence aggregation and late task fusion as scalability bottlenecks.
  2. The INFNet architecture, which introduces a lightweight hub‑mediated aggregate‑and‑broadcast mechanism achieving linear complexity, width preservation, and task‑aware interaction.
  3. Demonstration of superior scaling behavior and practical impact on both offline benchmarks and live production systems.

Future directions suggested include adaptive hub generation (dynamic addition/removal of hubs based on input complexity), extending the framework to other domains such as search ranking or social feed recommendation, and exploring richer gating mechanisms beyond simple affine transformations.

Overall, INFNet offers a compelling solution for industrial‑scale recommender systems that require both high‑capacity feature interaction and strict latency guarantees.

