AsarRec: Adaptive Sequential Augmentation for Robust Self-supervised Sequential Recommendation
Sequential recommender systems have demonstrated strong capabilities in modeling users’ dynamic preferences and capturing item transition patterns. However, real-world user behaviors are often noisy due to factors such as human error, uncertainty, and behavioral ambiguity, which can degrade recommendation performance. To address this issue, recent approaches widely adopt self-supervised learning (SSL), particularly contrastive learning: they generate perturbed views of user interaction sequences and maximize the mutual information between them to improve model robustness. However, these methods rely heavily on pre-defined static augmentation strategies (where the augmentation type remains fixed once chosen) to construct augmented views, leading to two critical challenges: (1) the optimal augmentation type can vary significantly across scenarios; and (2) inappropriate augmentations may even degrade recommendation performance, limiting the effectiveness of SSL. To overcome these limitations, we propose an adaptive augmentation framework. We first express the basic augmentation operations in a single unified formulation via structured transformation matrices. Building on this, we introduce AsarRec (Adaptive Sequential Augmentation for Robust Sequential Recommendation), which learns to generate transformation matrices by encoding user sequences into probabilistic transition matrices and projecting them onto hard semi-doubly stochastic matrices via a differentiable Semi-Sinkhorn algorithm. To ensure that the learned augmentations benefit downstream performance, we jointly optimize three objectives: diversity, semantic invariance, and informativeness. Extensive experiments on three benchmark datasets under varying noise levels validate the effectiveness of AsarRec, demonstrating its superior robustness and consistent improvements.
💡 Research Summary
The paper “AsarRec: Adaptive Sequential Augmentation for Robust Self-supervised Sequential Recommendation” addresses a critical challenge in sequential recommender systems: performance degradation due to noise in real-world user interaction data (e.g., from errors or ambiguous behaviors). While self-supervised learning (SSL), particularly contrastive learning, has emerged as a promising solution by creating augmented views of sequences and maximizing their agreement, existing methods rely on pre-defined, static augmentation strategies (like masking or reordering). This static approach is fundamentally limited because the optimal augmentation type or combination can vary drastically across different datasets, noise conditions, and even individual user sequences. Applying an inappropriate augmentation can harm rather than help performance.
To overcome this, the authors propose AsarRec, a novel framework that learns to generate adaptive, context-aware augmentations. The core innovation lies in three key components:
- Unified Matrix Formulation: The paper first unifies common sequential augmentations (Crop, Mask, Reorder, Insert, Substitute) under a single mathematical framework. It conceptualizes an augmentation as applying a structured linear transformation to the original sequence via a “hard semi-doubly stochastic matrix” M (where s’ = M · s). This reformulation turns the problem of choosing an augmentation into a problem of learning an optimal transformation matrix.
- Differentiable Generation of Transformation Matrices: Directly learning a discrete matrix M with strict constraints (binary values, row/column sum constraints) is challenging for gradient-based optimization. AsarRec employs a two-step, differentiable process: an encoder first predicts a continuous “probabilistic transition matrix,” which is then projected onto the space of valid hard semi-doubly stochastic matrices using a novel differentiable Semi-Sinkhorn algorithm. This enables end-to-end learning of the augmentation strategy.
- Multi-Objective Optimization for Augmentation Quality: To ensure the learned augmentations are beneficial for the downstream recommendation task, AsarRec jointly optimizes three objectives:
- Diversity: Encourages the model to produce distinct transformation matrices for different views, promoting rich self-supervised signals.
- Semantic Invariance: Constrains the degree of item reordering caused by the transformation, preserving the core sequential semantics of the user’s original intent.
- Informativeness: Guides the model to generate the most challenging augmented views (i.e., those that minimize mutual information between views), forcing the encoder to learn more robust and noise-invariant representations.
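The matrix view of augmentations and the Sinkhorn-style projection described above can be sketched in a few lines of NumPy. This is a minimal illustration under our reading of the summary: the function names, the temperature parameter `tau`, and the clipping used to enforce the column inequality are our assumptions, and the paper's additional hardening step (projecting the soft matrix to a binary one) is omitted.

```python
import numpy as np

def mask_matrix(n, masked):
    """Mask: identity matrix with masked positions zeroed out,
    so those items map to a 0 / null token."""
    M = np.eye(n)
    for i in masked:
        M[i, i] = 0.0
    return M

def reorder_matrix(n, perm):
    """Reorder: permutation matrix; output position i takes input position perm[i]."""
    M = np.zeros((n, n))
    for i, j in enumerate(perm):
        M[i, j] = 1.0
    return M

def semi_sinkhorn(logits, tau=0.5, n_iters=100):
    """Sketch of a Sinkhorn-style projection of raw scores onto (soft)
    semi-doubly stochastic matrices: rows sum to 1, column sums capped
    at 1 (columns of dropped items may sum to less). The hard/binary
    projection used in the paper is not reproduced here."""
    M = np.exp(logits / tau)
    for _ in range(n_iters):
        # Column inequality: rescale only columns whose sum exceeds 1.
        M = M / np.clip(M.sum(axis=0, keepdims=True), 1.0, None)
        # Row equality: each row sums to exactly 1.
        M = M / M.sum(axis=1, keepdims=True)
    return M

s = np.array([5.0, 2.0, 9.0, 4.0])          # toy item-ID sequence
print(mask_matrix(4, [2]) @ s)              # mask 3rd item -> [5. 2. 0. 4.]
print(reorder_matrix(4, [1, 0, 2, 3]) @ s)  # swap first two -> [2. 5. 9. 4.]

rng = np.random.default_rng(0)
M = semi_sinkhorn(rng.standard_normal((4, 4)))
print(M.sum(axis=1))  # rows sum to ~1; column sums stay at or below ~1
```

The key point of the formulation is visible here: Crop, Mask, and Reorder all become the same operation (a matrix–vector product s’ = M · s), so "which augmentation to apply" reduces to "which matrix to produce," which a network can learn end to end once the projection is differentiable.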
Extensive experiments on three benchmark datasets (Games, MIND, etc.) under varying noise levels (clean, 20% injected noise) demonstrate AsarRec’s superiority. It consistently and significantly outperforms state-of-the-art sequential recommendation baselines and static SSL methods. The performance gains are especially pronounced in high-noise settings, validating the framework’s robustness. Ablation studies confirm the contribution of each of the three optimization objectives. In summary, AsarRec successfully transforms data augmentation from a fixed, heuristic component into a learnable and adaptive module, marking a significant advance in building robust, self-supervised sequential recommenders.