A Cognitive Distribution and Behavior-Consistent Framework for Black-Box Attacks on Recommender Systems
With the growing deployment of sequential recommender systems in e-commerce and other fields, their black-box interfaces raise security concerns: models are vulnerable to extraction and subsequent adversarial manipulation. Existing black-box extraction attacks primarily rely on hard labels or pairwise learning, often ignoring the importance of ranking positions, which results in incomplete knowledge transfer. Moreover, adversarial sequences generated via pure gradient methods lack semantic consistency with real user behavior, making them easily detectable. To overcome these limitations, this paper proposes a dual-enhanced attack framework. First, drawing on primacy effects and position bias, we introduce a cognitive distribution-driven extraction mechanism that maps discrete rankings into continuous value distributions with position-aware decay, thereby advancing from order alignment to cognitive distribution alignment. Second, we design a behavior-aware noisy item generation strategy that jointly optimizes collaborative signals and gradient signals. This ensures both semantic coherence and statistical stealth while effectively promoting target item rankings. Extensive experiments on multiple datasets demonstrate that our approach significantly outperforms existing methods in both attack success rate and evasion rate, validating the value of integrating cognitive modeling and behavioral consistency for secure recommender systems.
💡 Research Summary
The paper addresses the security risks posed by black‑box sequential recommender systems that expose only top‑k recommendation lists through an API. Existing black‑box attacks either rely on hard‑label (ordered list) distillation or pairwise ranking losses, which discard the nuanced value information encoded in item positions, and they generate adversarial sequences using pure gradient methods that often lack semantic coherence with genuine user behavior, making them easy to detect. To overcome these shortcomings, the authors propose a dual‑enhanced attack framework consisting of (i) a cognitive‑distribution‑driven model extraction stage and (ii) a behavior‑consistent pollution item generation stage.
In the extraction stage, the authors draw on psychological findings such as the primacy effect and position bias. They model user attention decay with an exponential function v(j)=α^{j‑1} (α∈(0,1)) and convert the discrete top‑k ranking returned by the target black‑box model into a continuous probability distribution p_b(i_j|x)=exp(v(j)/τ_b)/∑_{t=1}^k exp(v(t)/τ_b). This “cognitive distribution” captures the relative importance of each rank position. A surrogate model f_w produces its own scores s_w(i_j;x) and a softmax distribution p_w(·|x;τ_w). The extraction loss combines a Kullback‑Leibler divergence L_KL between p_b and p_w (global alignment) with a pairwise ranking loss L_pair (local structural preservation), weighted by a hyperparameter λ: L_distill = λ·L_pair + (1‑λ)·L_KL. This formulation enables the surrogate to learn both the ordering and the position‑aware value decay of the target system, leading to higher fidelity than prior hard‑label or pairwise‑only methods.
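The extraction-stage quantities above can be sketched in a few lines of NumPy. This is an illustrative reconstruction from the formulas in the summary, not the authors' code: the exact form of the pairwise loss L_pair is not specified in the summary, so a standard margin-based hinge over adjacent ranks is assumed here.

```python
import numpy as np

def cognitive_distribution(k, alpha=0.7, tau_b=0.5):
    """Map rank positions 1..k to the position-aware distribution p_b:
    exponential value decay v(j) = alpha**(j-1), then a temperature softmax."""
    v = alpha ** np.arange(k)            # v(j) = alpha^{j-1}, j = 1..k
    z = np.exp(v / tau_b)
    return z / z.sum()

def kl_divergence(p_b, p_w, eps=1e-12):
    """L_KL: global alignment between the target's cognitive
    distribution p_b and the surrogate's softmax distribution p_w."""
    return float(np.sum(p_b * (np.log(p_b + eps) - np.log(p_w + eps))))

def pairwise_loss(scores, margin=1.0):
    """L_pair (assumed form): hinge loss pushing the surrogate score of
    each rank-j item above that of the rank-(j+1) item (local order)."""
    diffs = scores[:-1] - scores[1:]
    return float(np.maximum(0.0, margin - diffs).sum())

def distill_loss(scores, tau_w=1.0, alpha=0.7, tau_b=0.5, lam=0.5):
    """L_distill = lam * L_pair + (1 - lam) * L_KL, with p_w a
    temperature-tau_w softmax over the surrogate scores s_w(i_j; x)."""
    k = len(scores)
    p_b = cognitive_distribution(k, alpha, tau_b)
    z = np.exp(scores / tau_w)
    p_w = z / z.sum()
    return lam * pairwise_loss(scores) + (1 - lam) * kl_divergence(p_b, p_w)
```

With α∈(0,1) the value decay makes p_b strictly decreasing in rank, so the surrogate is rewarded for matching not just the order but how sharply importance falls off with position.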
For the attack stage, the framework first builds a candidate pool of items that are highly related to the target item i* using a collaborative matrix S derived from historical logs (e.g., co‑view, co‑purchase PMI, Jaccard similarity). Next, it computes the gradient of an attack loss L_atk with respect to the input sequence on the surrogate model, yielding a gradient vector g. Each candidate's embedding e_j is compared to g via cosine similarity to obtain a gradient‑alignment score sim_g(j). The final score for each candidate is a linear combination of the gradient alignment and the collaborative relevance: S(j) = w_g·sim_g(j) + w_s·s̃(j|i*), where w_g + w_s = 1. By adjusting w_g and w_s, the attacker balances attack potency (gradient signal) against semantic plausibility (collaborative signal). Top‑ranked candidates are then assembled into an injection sequence of bounded length using greedy or beam search, and the sequence is validated against the black‑box model, with optional fine‑tuning.
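The candidate-scoring step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the min-max normalisation of the collaborative relevance s̃(j|i*) and the simple top-k greedy selection are placeholders for details the summary leaves unspecified (the paper also mentions beam search).

```python
import numpy as np

def score_candidates(cand_emb, grad, collab_rel, w_g=0.6):
    """Combine gradient alignment and collaborative relevance:
    S(j) = w_g * sim_g(j) + w_s * s~(j|i*), with w_s = 1 - w_g."""
    w_s = 1.0 - w_g
    # sim_g(j): cosine similarity between candidate embedding e_j and
    # the attack-loss gradient g computed on the surrogate model.
    g_norm = grad / (np.linalg.norm(grad) + 1e-12)
    e_norm = cand_emb / (np.linalg.norm(cand_emb, axis=1, keepdims=True) + 1e-12)
    sim_g = e_norm @ g_norm
    # Min-max normalise collaborative relevance to [0, 1] so the two
    # signals live on comparable scales (an assumed normalisation).
    span = collab_rel.max() - collab_rel.min()
    rel = (collab_rel - collab_rel.min()) / (span + 1e-12)
    return w_g * sim_g + w_s * rel

def greedy_injection(scores, budget):
    """Greedily pick the top-`budget` candidates for the bounded-length
    injection sequence (a stand-in for the paper's greedy/beam search)."""
    return np.argsort(scores)[::-1][:budget]
```

Setting w_g = 1 recovers a pure gradient attack (maximally potent but detectable), while w_g = 0 yields purely collaborative, plausible-looking but weak injections; the reported sweet spot of w_g ≈ 0.6 sits between the two.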
Extensive experiments were conducted on four public datasets (Amazon Beauty, Amazon Sports, MovieLens‑1M, Yelp) and three state‑of‑the‑art sequential recommenders (BERT4Rec, SASRec, NARM). In the extraction phase, the cognitive‑distribution approach improved NDCG@10 by an average of 12.3 % over baselines such as DFME and FSME, while reducing RMSE by ~0.07. In the attack phase, the behavior‑consistent pollution strategy raised the success rate of promoting the target item into the top‑5 from 78 % (gradient‑only attacks) to 93 %, and simultaneously lowered detection accuracy of a random‑forest based defense from 0.62 to 0.44, demonstrating superior stealth. Ablation studies showed that removing either the KL component or the collaborative signal caused significant performance drops, confirming the complementary nature of the two modules. Hyper‑parameter sensitivity analysis indicated that decay rate α≈0.7, temperature τ_b≈0.5, and a gradient weight w_g≈0.6 yielded stable, high‑performing results across datasets.
The contribution of the work is twofold: (1) introducing a cognitively motivated distribution alignment for black‑box model distillation, which captures fine‑grained positional value information beyond simple order matching; and (2) devising a dual‑signal pollution generation method that preserves realistic user behavior patterns while achieving strong adversarial impact. The study highlights a new class of more sophisticated black‑box attacks on recommender systems and suggests that defenses should consider monitoring for anomalous position‑aware value distributions or inconsistencies between collaborative signals and gradient‑driven perturbations. Future work may explore adaptive defenses that estimate the attacker’s cognitive distribution or that incorporate user‑behavior consistency checks into API rate‑limiting and anomaly detection pipelines.