With a Little Help From My Friends: Collective Manipulation in Risk-Controlling Recommender Systems



Giovanni De Toni¹, Cristian Consonni², Erasmo Purificato², Emilia Gomez², and Bruno Lepri¹

¹ Fondazione Bruno Kessler, Italy, {gdetoni,lepri}@fbk.eu
² European Commission, Joint Research Centre (JRC)*, {cristian.consonni,erasmo.purificato,emilia.gomez-gutierrez}@ec.europa.eu

Abstract

Recommendation systems have become central gatekeepers of online information, shaping user behaviour across a wide range of activities. In response, users increasingly organize and coordinate to steer algorithmic outcomes toward diverse goals, such as promoting relevant content or limiting harmful material, relying on platform affordances such as likes, reviews, or ratings. While these mechanisms can serve beneficial purposes, they can also be leveraged for adversarial manipulation, particularly in systems where such feedback directly informs safety guarantees. In this paper, we study this vulnerability in recently proposed risk-controlling recommender systems, which use binary user feedback (e.g., "Not Interested") to provably limit exposure to unwanted content via conformal risk control. We empirically demonstrate that their reliance on aggregate feedback signals makes them inherently susceptible to coordinated adversarial user behaviour. Using data from a large-scale online video-sharing platform, we show that a small coordinated group (comprising only 1% of the user population) can induce up to a 20% degradation in nDCG for non-adversarial users by exploiting the affordances provided by risk-controlling recommender systems.
We evaluate simple, realistic attack strategies that require little to no knowledge of the underlying recommendation algorithm and find that, while coordinated users can significantly harm overall recommendation quality, they cannot selectively suppress specific content groups through reporting alone. Finally, we propose a mitigation strategy that shifts guarantees from the group level to the user level, showing empirically how it can reduce the impact of adversarial coordinated behaviour while ensuring personalized safety for individuals.

1 Introduction

Recommendation systems are used daily by nearly 50% of the global population [8] for a variety of activities, ranging from entertainment to professional work. They are also known to amplify potentially harmful content, including fake news [58] and hate speech [46], contributing to more toxic online environments [47] with real-world consequences such as adverse mental health outcomes [26, 39]. Because these systems strongly shape what information users encounter, people often attempt to influence their behaviour through the signals they provide to the platform, such as ratings, reviews, or likes. Users have been observed to adopt several strategies to influence recommender behaviour [33], often guided by informal folk theories about how algorithms operate [50, 16, 17]. Prior research on algorithmic collective action [27, 56] shows that these practices can scale beyond individual users, who collectively adapt their interactions to shape future recommendations, with goals that extend beyond their personal feeds [12, 38].

* Disclaimer: The views expressed in this paper are purely those of the authors and may not, under any circumstances, be regarded as an official position of the European Commission.
Such mobilization may arise organically [40] or be organized through paid labor [42, 20], generating bursts of coordinated signals intended to influence content's ranking and visibility. Examples include attempts to boost the exposure of certain items [45] or to curb the spread of harmful content by sharing screenshots rather than engaging directly with original posts [10]. More broadly, research has documented many grassroots and collective forms of algorithmic action on social media [10, 53] as well as in gig-economy platforms [56, 49] (e.g., the #DeclineNow movement [52]), where users coordinate to induce desired positive changes in algorithmic behaviour. In other cases, collective interactions with platform signals take explicitly adversarial forms. Examples include review bombing campaigns on gaming platforms (e.g., Steam), where coordinated negative feedback is used to downrank particular items following public controversies or in protest against platform policies [57, 40, 34]. More broadly, coordinated manipulation of platform signals has also been used to strategically influence information ecosystems, for instance, through online astroturfing campaigns [55] that artificially amplify political narratives or misinformation (e.g., the 2016 U.S. presidential elections [5]). Taken together, these examples illustrate how collective behavior can meaningfully reshape the signals that recommendation algorithms rely on. More importantly, users' signals are not only used to rank content but are increasingly central to efforts to mitigate the harms associated with recommendation systems (e.g., X's Community Notes). For example, a common approach has been to provide users with interface-level controls, such as a "Not Interested" button, that enable them to signal dissatisfaction with an item directly to the platform [29].
However, the same signals may also be strategically manipulated by coordinated groups of users. As a result, affordances intended to improve user safety and content quality may inadvertently create new levers for influencing recommendation outcomes. In this paper, we investigate the following question:

What would happen if a group of users organized themselves to strategically use the "Not Interested" feature to alter recommender behavior?

Specifically, we investigate how coordinated user behavior may interact with the guarantees provided by risk-controlling recommender systems [14], a recently proposed class of recommender systems that directly leverage users' negative feedback (the "Not Interested" signal) to bound, in expectation, the frequency of unwanted items in users' feeds via conformal risk control [3]. These systems introduce a filtering layer that excludes potentially harmful or unwanted items based on a global threshold, calibrated on a held-out set of historical user-item interactions. In contrast to standard interface-level controls, risk-controlling recommender systems provide the first formal guarantees that directly link user actions, such as reporting an item, to deterministic changes in recommendation outcomes in expectation. Perhaps unintuitively, we show in this paper that adversarial coordinated behaviour can only strengthen the formal risk-control guarantees: adversarial users can artificially increase the empirical risk observed during calibration, which in turn forces the system to adopt more conservative filtering thresholds. However, we demonstrate that this seemingly beneficial effect may come at a significant cost to recommendation quality. In particular, coordinated reporting can induce disproportionate degradations in system performance, even when carried out by very small collectives.
For example, a group of just 40 adversarial users, reporting at most 1% of the items they encounter, may reduce standard nDCG by up to 20% in the worst-case scenario. Interestingly, although changes to the risk-control guarantees scale linearly with the number of coordinated users, our results show that even small collectives can make it substantially more difficult to lower the expected risk experienced by non-adversarial users at test time. This asymmetry is consistent with prior work postulating that, in settings dominated by weak signals, such as binary "Not Interested" feedback¹, coordinated users who inject stronger signals can exert outsized influence [27]. We also show that adversarial users cannot selectively suppress the exposure of a target item group (i.e., the frequency with which items from that group appear in users' top-k recommendations), since the underlying risk-controlling procedure is agnostic to group membership². Lastly, we describe and empirically validate a simple improvement that makes risk-controlling recommender systems robust to adversarial manipulation by enforcing guarantees at the individual rather than the population level. Our work can be viewed as a pre-deployment audit of risk-controlling recommender systems, examining how coordinated user behaviour may interact with their guarantees. Indeed, several regulations and standardized frameworks propose auditing as a mechanism to identify and mitigate potential risks associated with artificial intelligence systems [41, 21, 18, 60]. For instance, the EU's Digital Services Act (DSA) [18] requires very large online platforms to conduct systemic risk assessments that must account for risks arising from intentional manipulation of the service, including coordinated or automated activity (Article 34(2) of the EU DSA).
Motivated by this perspective, we argue that evaluating such dynamics should be part of the preliminary assessment of risk-controlling mechanisms before their widespread deployment in real-world platforms.

Our contributions. Formally, we first theoretically characterise how coordinated users who strategically employ the "Not Interested" feedback mechanism affect the risk-control guarantees of recommender systems (Theorem 1). Building on these results, we analyse a set of realistic reporting strategies³ that collectives may adopt to maximise their influence on recommendation outcomes (Section 5). Second, we empirically evaluate the impact of such collectives on a risk-controlling recommender system [14] using real-world interaction data from a large online video-sharing platform, Kuaishou (Section 6). Our evaluation focuses on standard recommendation metrics, including nDCG and Recall, as well as on changes in item exposure. Finally, we empirically show how calibrating a risk-controlling recommender system with an individual-level threshold can successfully mitigate adversarial collective behaviour (Section 7).

2 Related Work

Our paper builds upon related work on algorithmic collective action, adversarial attacks on recommender systems, and countering the harmfulness of recommender systems. Despite the varying interpretations of coordinated user behavior, ranging from beneficial [27] to adversarial [42], we use the terms "collective", "coordinated groups", "strategic users", and "adversaries" interchangeably throughout this paper.

Collective Action on Algorithmic Systems. Hardt et al. [27] initiated the theoretical study of algorithmic collective action by employing a simple model to describe how coordinated efforts by groups of users can steer model outcomes. Sigg et al.
[56] further studied a combinatorial model to describe the strategic interaction between workers and a gig-economy platform. Several other works expanded the study of algorithmic collective action in general recommendation systems [48, 7, 37, 36], for example, by proposing frameworks to study the interactions between different groups with distinct objectives [37, 36], highlighting that the collective's size, rather than the homogeneity or heterogeneity of the group, is the more critical factor in how effectively a collective can manipulate recommender system outcomes. Our work is the first to study the effects of collectives on affordances designed to mitigate the spread of harmful content on recommender systems. Specifically linked to our work is the study by Baumann and Mendler-Dünner [7], which investigated collective action in the context of music recommendation.

¹ For example, in a dataset from the large-scale video-sharing platform Kuaishou [23], user-provided "Not Interested" feedback is extremely sparse, with an average reporting rate of approximately 0.002% per user [14].
² Nevertheless, risk-controlling recommender systems may still induce disparate impacts when the underlying recommender exhibits pre-existing biases toward particular item groups (cf. Section B).
³ Throughout the paper, we use the term "report" to indicate feedback given to an item through a "Not Interested" or "Show me less of this" button. This should not be confused with flagging content for violating the terms of service, community standards, or other platform norms, an action that triggers immediate removal or review of the item and is usually reserved for specific categories of content.
They demonstrated that a small collective of fans (controlling as little as 0.01% of training data) could significantly amplify the visibility of an underrepresented artist by strategically reordering their playlists. Our setting is conceptually distinct. We study the effects of users interacting with a "Not Interested" affordance and evaluate broader platform-level consequences, including recommendation performance and item exposure. In addition, we focus on a risk-controlling recommender system [14], whereas they consider a classical recommender model. Finally, unlike Baumann and Mendler-Dünner [7], we do not retrain the recommender. In our experiments, the recommender remains fixed, and only the risk calibration threshold is recomputed, an intervention that is substantially less costly for the platform (cf. Section 3).

Collective Adversarial Activities in Recommender Systems. The literature on "astroturfing" [55] and "crowdturfing" activities [59, 42, 61] shows how it is possible to cheaply acquire fake engagement on certain gig-economy platforms (e.g., Fiverr, Sina Weibo), often mediated by low-wage workers hired to create fake grassroots support or spam, such as fake reviews, comments, or likes. For example, studies show that it costs around $0.80 to $3.00 to hire 100 real users who will interact with the target profiles to boost their engagement rate [59]. These profiles, being real users, are harder for online platforms to detect, and they represent a potential source of malicious collective action [61]. Our work considers such a real-world setting, where a malicious group of users organizes to collectively exploit the risk-controlling guarantees of a platform towards altering its behaviour.
Furthermore, our work is technically close to the concept of shilling attacks [67], also known as profile-injection attacks, where malicious actors create and inject fake user profiles, user interactions, or ratings. For example, Anelli et al. [2] illustrated how an adversary might modify audio signals in a music recommender system to associate tracks with explicit and illegal content, linking adversarial perturbations to harmful recommendations. For these reasons, we also provide an empirically validated strategy that mitigates the effect of adversarial collective actions in simulation studies. Our discussion of adversarial attacks here is necessarily limited, so we refer to further surveys [19, 63] highlighting how adversarial attacks can influence public opinion by spreading fake news and manipulated content through recommender systems.

Countering the Harmfulness of Recommender Systems. Interest in understanding how to deal with harmful content in recommender systems on online platforms has spanned the last decade. Beyond harmful material, recommenders can also disseminate content that users perceive as unwanted, understood as what may conflict with users' personal values, trigger negative emotions, or be essentially mediocre and thus perceived as a "waste of time" [29, 44]. Gillespie [25] described how social media platforms reduce the visibility and reach of potentially dangerous content by demoting it in algorithmic rankings and recommendations, rather than removing it entirely. By including a negative signal to prevent specific content from being recommended to users, this approach shows how platforms can use recommenders as an effective means of targeted content moderation. Along this line, various concrete strategies have been proposed to tackle harmful content in recommender systems [13, 44, 4].
Unfortunately, evidence suggests that these mechanisms are often ineffective, since they usually suffer from low visibility [31] and are opaque in how they affect the recommendation process, particularly beyond the user's own feed [62, 29]. For instance, users often expect negative feedback to reduce the prevalence of similar content, without penalising the content itself or its creator [29]. Recently, De Toni et al. [14] introduced risk-controlling recommender systems, the first model-agnostic method using conformal risk control to provably bound unwanted content in personalised recommendations with distribution-free and model-free guarantees. In the following sections, we investigate and audit such risk-controlling recommender systems by testing their behaviour in the presence of a collective group of users acting adversarially.

3 Preliminaries

Let us now formally outline how to build a risk-controlling recommendation system. Consider a finite set of items $\mathcal{I} = \{i_1, \dots, i_N\}$ and users $\mathcal{U} = \{u_1, \dots, u_M\}$. Without loss of generality, we assume that each item (or user) is described by a $d$-dimensional feature vector. Let us consider a ranker $f : \mathcal{U} \times \mathcal{I} \to \mathbb{R}_+$ that estimates how relevant an item is, based on the user's profile and the item's characteristics. Given the ranker and a target user $u \in \mathcal{U}$, we usually want to pick the best $k \in \mathbb{N}_+$ items to display to the user, yielding the recommendation set $S(U = u, k) = \{i_j \in \mathcal{I} : \pi(i_j, u) \le k\}$, where $\pi : \mathcal{I} \times \mathcal{U} \to \mathbb{N}$ returns an item ordering induced by the ranker scores, e.g., from the most to the least preferred item. In standard learning-to-rank tasks, we want to find the optimal ranker $f^* = \operatorname{argmax}_{f \in \mathcal{F}} \ell(S(U, k))$ that maximises a target metric $\ell : 2^{\mathcal{I}} \times \mathcal{U} \to \mathbb{R}$, such as engagement. Let $R : 2^{\mathcal{I}} \times \mathcal{U} \to [0, 1]$ be a function that describes the risk of showing a recommendation set $S \subseteq \mathcal{I}$ to a user.
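As a concrete illustration, the construction of the top-$k$ recommendation set $S(U=u, k)$ from ranker scores can be sketched as follows. This is a minimal sketch with toy scores and an illustrative function name, not the paper's implementation:

```python
import numpy as np

def recommend(scores: np.ndarray, k: int) -> list[int]:
    """Build S(U=u, k): the k items with the highest ranker scores f(u, i).

    The permutation pi(., u) is induced by sorting scores in descending
    order, so pi(i, u) <= k selects the k most preferred items.
    """
    order = np.argsort(-scores)  # most preferred item first
    return order[:k].tolist()

# Toy example: four items with hypothetical ranker scores.
scores = np.array([0.1, 0.9, 0.4, 0.7])
print(recommend(scores, k=2))  # -> [1, 3]
```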
For example, the risk function can model the unwantedness of the recommendations, such as the fraction of items flagged as "Not Interested" by a user. Further, consider $r : \mathcal{I} \times \mathcal{U} \to [0, 1]$ to be a score function that quantifies the risk of a single item (e.g., the probability that a user will flag an item), learned from historical user-item interactions. Adopting a common two-stage setup used in many commercial recommender systems [30], we filter items based on their score, removing them only if it is greater than a threshold $\lambda \in \Lambda$, before building the final recommendation set:

$$S_\lambda(U = u, k) = \{i_j \in T_\lambda(U = u) : \pi(i_j, u) \le k\} \quad \text{where} \quad T_\lambda(U = u) = \{i \in \mathcal{I} : 1 - r(i, u) \ge \lambda\} \tag{1}$$

where $T_\lambda(U = u)$ denotes the set of candidate items whose score is above the threshold. In the classical conformal risk control formulation [3], the filtering condition is typically written as $r(i, u) \ge \lambda$. Our formulation, $1 - r(i, u) \ge \lambda$, is equivalent under the assumption that $r(i, u) \in [0, 1]$, and selects items with lower predicted risk (i.e., lower probability of being flagged as harmful). In summary, we want to find the optimal threshold $\lambda$ that maximises the given objective, such as engagement, while keeping the target risk below a platform-defined level $\alpha \in [0, 1]$ in expectation:

$$\lambda^* = \operatorname{argmax}_{\lambda \in \Lambda} \mathbb{E}[\ell(S_\lambda(U, k))] \quad \text{s.t.} \quad \mathbb{E}[R(S_\lambda(U, k))] \le \alpha \tag{2}$$

De Toni et al. [14] showed that we can provably control the risk in recommender systems, as defined by Equation 2, using simple binary user feedback, captured through "Not Interested" interactions, by exploiting conformal risk control [3]. Under standard assumptions, including data exchangeability and mild regularity conditions on the risk function (e.g.
, monotonicity), they show that a valid filtering threshold can be selected as

$$\hat{\lambda} = \inf\left\{\lambda : \frac{Q}{Q+1}\hat{R}(\lambda) + \frac{1}{Q+1} \le \alpha\right\}$$

where $\hat{R}(\lambda) = \frac{1}{Q}\sum_{j=1}^{Q} R(S_\lambda(U = u_j, k))$ is the empirical risk estimated on a calibration set, a held-out collection of $Q$ user-item interactions used exclusively to guarantee risk control. They formalise the risk of a recommendation set $S_\lambda(U, k)$ as:

$$R(S_\lambda(U, k)) = \frac{|\{i \in S_\lambda(U, k) : H(U, I = i) = 1\}|}{|S_\lambda(U, k)|} \tag{3}$$

where $H$ is a binary random variable, drawn from $P(H \mid I, U)$, indicating whether an item is deemed unwanted by a user ($H = 1$). To ensure that the recommendation set remains of size $k$ (or close to it), despite potentially aggressive filtering, they propose filling filtered positions with previously consumed items that have not been flagged as unwanted, referred to as repeated safe items. More broadly, conformal risk control is both model-free and post hoc: it can be applied on top of any pretrained ranker $f$ or risk predictor $r$ that follows the two-stage architecture, requiring minimal additional assumptions and integration effort.

Modeling Coordinated Attacks in Risk-Controlling Recommenders. We consider an adversarial variant of algorithmic collective action [27, 7], in which a subset of users coordinates to strategically manipulate system inputs with the explicit goal of degrading or suppressing the exposure of targeted content (e.g., silencing a creator or reducing the visibility of a content group). Unlike prior work, where collectives pursue socially motivated or welfare-improving objectives, we focus on adversarial objectives that are strategically aligned but not necessarily ethically grounded. Such collectives can plausibly emerge in online environments where users share common interests or antagonisms, including loosely organized communities (e.g., Reddit, 4chan, 8chan) or more structured campaigns.
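The two-stage filtering of Eq. (1) and the conformal calibration of $\hat{\lambda}$ described in Section 3 can be sketched as follows. The calibration data here are toy per-item risk scores and "Not Interested" flags, and the function names are illustrative, not the authors' code:

```python
import numpy as np

def filtered_topk(rank_order, item_risk, lam, k):
    """Eq. (1): keep items whose score 1 - r(i, u) >= lam, then take the top-k."""
    safe = [i for i in rank_order if 1.0 - item_risk[i] >= lam]
    return safe[:k]

def empirical_risk(lam, users, k):
    """Eq. (3) averaged over calibration users: the fraction of recommended
    items each user flagged as unwanted (H = 1)."""
    risks = []
    for rank_order, item_risk, flagged in users:
        rec = filtered_topk(rank_order, item_risk, lam, k)
        risks.append(sum(flagged[i] for i in rec) / len(rec) if rec else 0.0)
    return float(np.mean(risks))

def calibrate(users, k, alpha, grid):
    """hat-lambda = inf{ lam : Q/(Q+1) * R_hat(lam) + 1/(Q+1) <= alpha }."""
    Q = len(users)
    for lam in sorted(grid):  # return the smallest feasible threshold
        if Q / (Q + 1) * empirical_risk(lam, users, k) + 1 / (Q + 1) <= alpha:
            return lam
    return max(grid)  # fall back to the most conservative threshold

# Toy calibration set: one user, three ranked items; item 0 is risky and flagged.
users = [([0, 1, 2], {0: 0.9, 1: 0.1, 2: 0.2}, {0: 1, 1: 0, 2: 0})]
print(calibrate(users, k=2, alpha=0.6, grid=[0.0, 0.5]))  # -> 0.5
```

At $\lambda = 0$ the flagged item survives filtering and the inflated empirical risk violates the constraint, so the search settles on the larger threshold.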
Empirical evidence shows that large-scale coordinated actions, ranging from attention manipulation (e.g., #TulsaFlop [6]) to harassment campaigns (e.g., #GamerGate [51]), can arise with minimal central coordination, often relying on broadcast communication and shared intent rather than tight synchronization. Moreover, low-cost crowdturfing infrastructures [42] further reduce coordination barriers by enabling centrally orchestrated manipulation through paid participants. In our setting, these mechanisms translate into users collectively marking targeted items as "Not Interested", thereby providing systematically biased feedback to the recommender system. We assume a weak coordination model: participants share a common target and action (e.g., flagging a predefined set of items), but do not require fine-grained information about other users or real-time synchronization. This captures both decentralized campaigns and centrally coordinated efforts. Formally, given a calibration set of $Q$ users, we assume that a subset of $K < Q$ users behaves adversarially⁴, yielding a fraction $\beta = K/Q$. Finally, we formalize the collective's objective along two distinct dimensions: (i) the degradation of overall recommendation performance, measured by the decline in recommendation quality experienced by non-adversarial users at test time, and (ii) the targeted suppression of specific groups of items, reflected in reduced exposure.

4 Manipulating a Risk-Controlling Recommendation System

To identify the effective targets of adversarial behaviour within the theoretical framework of conformal risk control, we first analyse the impact of a coordinated group of adversarial users on the resulting upper bound on risk. Unlike prior work on adversarial conformal prediction and risk control [65, 3, 24] or classical shilling attacks [67], the adversaries we consider do not inject forged or poisoned data (e.g.
, by manipulating items or interactions). Instead, they operate entirely within the platform's intended affordances, strategically using standard feedback mechanisms. We begin by presenting our main theoretical result, followed by a discussion of its key practical implications. The complete proof is available in Section A.

Theorem 1. Consider a held-out calibration set $\mathcal{Q} = \{(u, i, h)_j\}_{j=1}^{Q}$. Let us assume that there are $K$ users within $\mathcal{Q}$ that are behaving strategically. Given a target level $\alpha \in [0, 1]$, let us consider $\hat{\lambda} \in \Lambda$ chosen as $\hat{\lambda} = \inf\{\lambda : \frac{Q}{Q+1}\hat{R}(\lambda) + \frac{1}{Q+1} \le \alpha\}$, where $\hat{R}(\lambda) = \frac{1}{Q}\sum_{j=1}^{Q} R(S_\lambda(U = u_j, k))$ is computed over the calibration set. Lastly, let us denote the non-adversarial users by $U_{\mathrm{nonadv}}$. Then, under data exchangeability, we have that the expected risk for non-adversarial users is bounded from above by:

$$\mathbb{E}[R(S_{\hat{\lambda}}(U_{\mathrm{nonadv}}, k))] \le \max\left\{0,\; \alpha - \frac{K}{Q+1}\, r^{\mathrm{adv}}_{\hat{\lambda}}\right\} \tag{4}$$

where $r^{\mathrm{adv}}_{\hat{\lambda}} = \frac{1}{K}\sum_{u_{\mathrm{adv}} \in \mathcal{K}} R(S_{\hat{\lambda}}(U = u_{\mathrm{adv}}, k))$ is the risk of the adversarial users $u_{\mathrm{adv}} \in \mathcal{K} \subset \mathcal{Q}$ within the calibration set.

Intuitively, Theorem 1 shows that each adversarial user consumes a portion of the available risk budget (cf. $\frac{K}{Q+1} r^{\mathrm{adv}}_{\hat{\lambda}}$), thereby pushing the actual risk reduction for unsuspecting users closer to zero. This effect scales linearly with the number of adversarial users included in the calibration set.

⁴ In practice, a risk-controlling recommender system must calibrate its filtering threshold $\lambda$ using the full user population to avoid distribution shifts [3] that could occur at test time. Moreover, it has been shown that some platforms have rather relaxed policies, allowing users to create potentially dozens of new accounts [66]. As a consequence, adversarial users are likely to be included in the calibration set by default.
As a consequence, when non-adversarial users request a moderate reduction in unwanted content (e.g., a 20% decrease), the system may be forced to enforce substantially more aggressive filtering, potentially eliminating nearly all risky items, to satisfy the global risk constraint in the presence of adversarial feedback. Finally, under the extreme assumption that adversarial users flag every item in their recommendation sets as unwanted, we obtain a tighter upper bound:

Corollary 2. Let us assume that $R(S_\lambda(U_{\mathrm{adv}}, k)) = 1$ for each adversarial user within the calibration set. Then, the expected risk for non-adversarial users $U_{\mathrm{nonadv}}$ is bounded from above by:

$$\mathbb{E}[R(S_{\hat{\lambda}}(U_{\mathrm{nonadv}}, k))] \le \max\left\{0,\; \alpha - \frac{K}{Q+1}\right\} \tag{5}$$

In summary, as shown by Theorem 1 and Corollary 2, the actual expected risk experienced by standard users decreases linearly with the number of adversarial users. While this may appear beneficial at first glance, in Section 6 we demonstrate that such reductions can come at a substantial cost to recommendation quality. Importantly, Theorem 1 highlights a key consideration for coordinated collectives. For a fixed collective size, the magnitude of the effect depends critically on the empirical adversarial risk $r^{\mathrm{adv}}_{\lambda}$ observed in the calibration set. As the filtering threshold increases, the influence of adversarial users may diminish, for instance, once all items flagged by the collective are removed from the top-$k$ recommendations. Consequently, an effective adversarial strategy must sustain a high empirical risk at calibration time even as the threshold increases, thereby preserving its impact on the system.

5 On the Reporting Strategies

Building on the previous sections, we observe that the effectiveness of an adversarial collective hinges on its ability to adopt a reporting strategy that maintains a high empirical risk during calibration.
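To make the budget-consumption effect of Theorem 1 and Corollary 2 concrete, the following toy Monte Carlo sketch (our own illustration with made-up flagging behaviour, not the paper's experiment) shows how all-flagging adversaries inflate the empirical calibration risk and force a more conservative threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, K, k, alpha, n_items = 200, 20, 10, 0.3, 50  # K adversaries among Q users

def make_user(adversarial):
    risk = rng.uniform(0, 1, n_items)  # predicted risk scores r(i, u)
    # Assumption: honest users flag items with probability equal to the
    # predicted risk; adversaries flag everything (Corollary 2's extreme case).
    flags = np.ones(n_items, bool) if adversarial else rng.uniform(0, 1, n_items) < risk
    return risk, flags

def set_risk(risk, flags, lam):
    # Eq. (1) with rank order = item index: filter by 1 - r >= lam, keep top-k.
    keep = np.where(1 - risk >= lam)[0][:k]
    return flags[keep].mean() if len(keep) else 0.0

def calibrate(users):
    for lam in np.linspace(0, 1, 101):
        r_hat = np.mean([set_risk(r, f, lam) for r, f in users])
        if Q / (Q + 1) * r_hat + 1 / (Q + 1) <= alpha:
            return lam
    return 1.0

honest = [make_user(False) for _ in range(Q - K)]
lam_clean = calibrate(honest + [make_user(False) for _ in range(K)])
lam_attacked = calibrate(honest + [make_user(True) for _ in range(K)])
print(lam_clean, lam_attacked)  # the attacked threshold is more conservative
```

With 10% all-flagging adversaries, the empirical risk at every candidate threshold is inflated by roughly $\beta$, so the search must proceed to a larger $\hat{\lambda}$ before the constraint is met, mirroring the $\frac{K}{Q+1}$ term in Corollary 2.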
We begin by noting that the filtering procedure defined in Eq. (1) removes items in order of increasing score $1 - r(i, u)$, so that those deemed most risky are filtered first. As a result, an effective strategy for adversarial users is to report items that have low estimated risk scores $r(i, u)$. Such items are intuitively filtered only at higher thresholds $\lambda$, thereby forcing the system to adopt more conservative filtering to satisfy the global risk constraint. We formalise this intuition through the following reporting strategy:

$$\mathrm{LowRisk}(\gamma) = \{i \in \mathcal{I}_u : \pi_r(i, u) \le \lceil \gamma \cdot |\mathcal{I}_u| \rceil\} \tag{6}$$

where $\mathcal{I}_u$ denotes the set of items shown to user $u$, $\pi_r(i, u)$ induces an ordering of items based on their estimated scores $1 - r(i, u)$, and $\gamma \in [0, 1]$ specifies the fraction of items that an adversarial user reports to the platform. While effective in principle, the LowRisk($\gamma$) strategy assumes white-box access to the recommender's internal risk scores, an assumption that is typically unrealistic in practice, thus representing the worst-case theoretical scenario. Nevertheless, prior work has shown the feasibility of data-free model extraction attacks against sequential recommenders [64], making this strategy potentially plausible for organized collectives. More generally, adversarial strategies should rely on minimal background knowledge while maximising their downstream impact, and should also satisfy practical constraints such as remaining difficult to detect. Because risk scores are hidden within the platform backend, we therefore turn to alternative strategies based on observable proxies available through the user interface (e.g., item popularity, ranking position, or metadata), which adversaries could plausibly exploit to approximate the LowRisk strategy.

Further reporting strategies.
Motivated by documented cases of real-world algorithmic collective action [56], we consider three additional plausible adversarial reporting strategies targeting risk-controlling recommender systems. Given that the risk score of each item is unavailable to the platform's users, these alternative strategies must rely solely on information that the adversaries can readily gather from the platform, respectively: (i) the number of likes an item has received (Likes); (ii) an item's rank position (TopRank); and (iii) the item's assigned tags (Tags). The intuition behind each strategy is as follows. For the TopRank strategy, the idea is that an item's rank position plays a role in the risk-control procedure: given a risk score, higher-ranked items are more likely to be replaced than lower-ranked ones. Therefore, reporting higher-ranked items may delay their removal and require a larger filtering threshold, keeping $r^{\mathrm{adv}}_\lambda$ high. For the Likes strategy, item popularity, as proxied by the number of likes, may inversely (if imperfectly) correlate with perceived unwantedness; the idea is that highly liked items are, in expectation, less likely to be flagged as unwanted and therefore tend to have lower risk scores. Finally, for the Tags strategy, we assume that each item belongs to a group $g \in G$, where $G$ denotes a platform-defined taxonomy.⁵ For example, on a video-sharing platform, items may be categorised as entertainment or cooking. Some groups may be systematically more popular than others and, by the same reasoning, associated with lower expected risk. Furthermore, we consider this strategy to represent the scenario where a collective wants to target all the items of a specific category with the intention of reducing that category's visibility on the platform for all users.
Based on these observations, we define the following reporting strategies:

Likes(γ) = { i ∈ I_u : π_l(i, u) ≤ ⌈γ · |I_u|⌉ }    (7)

TopRanker(γ) = { i ∈ I_u : π_f(i, u) ≤ ⌈γ · |I_u|⌉ }    (8)

Tag(g) = { i ∈ I_u ∩ I_g }    (9)

where π_l(i, u) and π_f(i, u) denote the item orderings induced, respectively, by the number of likes and by the recommender's ranking function, and I_g denotes the set of items belonging to group g ∈ G. As a benchmark, we consider a naive strategy where the collective of adversarial users simply reports a fraction of items chosen at random:

Random(γ) = { i ∈ I_u }  s.t.  |Random(γ)| = ⌈γ · |I_u|⌉    (10)

⁵ For simplicity, we assume a single categorical assignment per item. Nevertheless, we acknowledge potential challenges arising when considering intersectionality aspects in recommender systems evaluation [15].

6 Evaluating the Effects of Collective Action

In this section, we empirically examine the impact of coordinated adversarial users who seek to alter recommender behaviour by strategically flagging videos as unwanted. Our analysis leverages real-world user-item interaction data and a state-of-the-art recommender model. Further, we release all the source code, raw data, and pre-/post-processing scripts on GitHub under a permissive license.⁶ Building on the theoretical insights developed in the previous sections, we address the following research questions:

RQ1 Which reporting strategy is most effective at sustaining a high adversarial risk r^adv_λ across all α ∈ [0, 1]?

RQ2 How does coordinated adversarial reporting affect standard performance metrics?

RQ3 To what extent can adversarial collectives influence the exposure of specific item groups through reporting?

Experimental details.
We conduct our experiments using the KuaiRand dataset [23], which consists of real user-item interactions collected from the large-scale video-sharing platform Kuaishou.⁷ To the best of our knowledge, KuaiRand is the only publicly available dataset that includes realistic "Not Interested" feedback provided by users in response to recommended videos. Our experimental setup follows the two-stage architecture proposed by De Toni et al. [14], using their pretrained LightGCL⁸ recommender model [11]. We adopt the same calibration and test splits as in the original work to ensure comparability. From the calibration set, we randomly sample a fraction β ∈ (0.001, 0.1) of users to form an adversarial collective. Users in this group are assumed to coordinate and follow one of the reporting strategies described in Section 5. For the LowRisk, Likes, TopRanker, and Random strategies, we additionally specify a reporting rate γ ∈ [0, 1], representing the fraction of videos each adversarial user flags as "Not Interested". For the Tag strategy, we choose the top-3 most frequent tags in the KuaiRand dataset, g ∈ {39, 34, 67}. In practice, we restrict γ to {0.001, 0.01, 0.1}, reflecting the assumption that users who report an excessively large fraction of items would likely be detected and removed by the platform [4].

Evaluation metrics. We evaluate recommendation quality using nDCG@k and Recall@k, fixing k = 20 throughout, as is common in the literature [14, 11]. For a given risk level α ∈ [0, 1], and for each reporting strategy and metric m ∈ {nDCG@k, Recall@k}, we quantify the impact of adversarial behaviour through the normalised performance change at test time:

Reduction(β) = (1/β) [ m(0) − m(β) ]    (11)

⁶ https://github.com/geektoni/collective-action-recsys
⁷ https://www.kuaishou.com/. Last accessed: 13/01/2026.
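As a minimal sketch of the setup above (hypothetical helper names; the actual pipeline lives in the released repository), the following shows how an adversarial collective of fraction β could be sampled from the calibration users, with each member flagging a fraction γ of its shown items under a shared ordering:

```python
import math
import random

def sample_collective(calibration_users, beta, seed=0):
    """Sample a fraction beta of calibration users as the coordinated collective."""
    rng = random.Random(seed)
    n_adv = max(1, math.floor(beta * len(calibration_users)))
    return set(rng.sample(calibration_users, n_adv))

def apply_strategy(shown_items, strategy, gamma):
    """Flag the ceil(gamma * |I_u|) items ranked first by the shared strategy.
    `strategy` maps an item to its rank (lowest rank = reported first)."""
    n_report = math.ceil(gamma * len(shown_items))
    return sorted(shown_items, key=strategy)[:n_report]

# Example: 1000 calibration users with beta = 0.01 yields a 10-user collective.
users = list(range(1000))
collective = sample_collective(users, beta=0.01)
```

The `max(1, ...)` guard mirrors the fact that even the smallest collectives in the experiments (β = 0.001) contain at least one user.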
⁸ We focus on LightGCL for risk prediction, as it achieved the strongest performance in the original study. By strongest performance, we mean that LightGCL uses fewer repeated safe items than comparable models to achieve the same risk level [14, Figure 4].

Figure 1: Expected empirical risk of adversarial users at calibration time as a function of the filtering threshold λ ∈ [0, 1], for reporting rates γ ∈ {0.001, 0.01, 0.1} and different reporting strategies (LowRisk(γ), Likes(γ), TopRanker(γ), Random(γ), Tag(g = 39), Tag(g = 34), Tag(g = 67)). Panels: (a) γ = 0.001, (b) γ = 0.01, (c) γ = 0.1. The collective size is fixed to β = 0.01. Shaded areas indicate one standard deviation over 10 runs.

where m(0) denotes the performance of the unperturbed risk-controlling recommender system when enforcing risk level α (cf. Eq. 2), and m(β) denotes the performance under a fraction β = K/Q of adversarial users within the calibration set. Eq. (11) yields a dimensionless quantity whose value is relative to the reduction theoretically expected for a collective of size β. A reduction of 0 indicates no observable effect, whereas a reduction of 1 corresponds to an effect proportional to the size of the collective. Values greater than 1 indicate a disproportionate impact. To address RQ3, we additionally measure group-level exposure, defined as the probability that an item belonging to group g appears in the top-k recommendations:

Exposure(k, β) = E[ 1{ I ∈ S_{λ*_{α,β}}(U, k) } | G = g ]    (12)

where λ*_{α,β} ∈ Λ denotes the optimal threshold for risk level α in the presence of an adversarial collective of size β.
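A direct implementation of the two evaluation quantities might look as follows. The `reduction` function is a literal reading of Eq. (11); the exposure estimator is our own Monte-Carlo counterpart of Eq. (12), an illustrative sketch rather than the paper's exact estimator:

```python
def reduction(m_clean, m_attacked, beta):
    """Eq. (11): Reduction(beta) = (m(0) - m(beta)) / beta.
    0 -> no observable effect; 1 -> effect proportional to the collective
    size; > 1 -> disproportionate impact."""
    return (m_clean - m_attacked) / beta

def exposure(topk_lists, group_items):
    """Empirical counterpart of Eq. (12): among items of the target group g,
    the fraction of (user list, item) checks in which the item appears in
    the user's top-k recommendations under the calibrated threshold."""
    hits = sum(item in topk for topk in topk_lists for item in group_items)
    total = len(topk_lists) * len(group_items)
    return hits / total
```

For instance, a drop from nDCG@20 = 0.50 to 0.45 caused by a collective of size β = 0.01 yields a reduction of 5, i.e., an impact five times larger than the collective's share of the population.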
(RQ1) Effective adversarial strategies should target low-risk items. For a fixed collective size β, Fig. 1 shows that all considered strategies can maintain a high expected adversarial risk at calibration time, r^adv_λ, up to relatively large filtering thresholds (λ ≈ 0.75), across all reporting rates γ ∈ {0.001, 0.01, 0.1}. Among them, the LowRisk strategy consistently achieves the highest empirical risk, even when adversarial users report as little as 0.1% of the items they encounter. This outcome is expected: by targeting low-risk items, adversaries ensure that these items are removed only at higher thresholds, forcing the system to adopt more conservative filtering. The Tags strategy also performs competitively: its effectiveness is comparable to LowRisk with a low reporting budget (γ = 0.001). However, Fig. 3a shows that Tags requires reporting roughly two orders of magnitude more items to achieve a similar empirical risk, highlighting the importance of selecting the right items rather than merely increasing reporting volume. By contrast, the Likes and TopRanker strategies, despite reporting the same number of items as LowRisk, result in an expected risk approximately an order of magnitude lower, performing similarly to random reporting. We attribute this to two factors: first, in the two-stage architecture, items ranked highly by the recommender are not necessarily those with the lowest estimated risk and may therefore be filtered early as γ increases, limiting the effectiveness of TopRanker; second, item popularity is only an imperfect proxy for unwantedness; on Kuaishou, likes might not reliably reflect perceived harm, which reduces the effectiveness of Likes. Nonetheless, Fig. 1a shows that at γ = 0.001 (a reporting rate close to what is observed in practice) and γ = 0.01, the Likes strategy improves over TopRanker and Random.

Figure 2: Expected performance reduction as a function of the fraction of adversarial users in the calibration set, β ∈ [0.001, 0.1], for different reporting rates γ ∈ {0.001, 0.01, 0.1} and reporting strategies. Panels: (a)–(c) nDCG@20 and (d)–(f) Recall@20 for γ ∈ {0.001, 0.01, 0.1}. A reduction between 0 and 1 indicates an effect proportional to the size of the collective, whereas values greater than 1 indicate a disproportionate impact. Shaded areas denote confidence intervals obtained via bootstrapping over 10 runs.

Figure 3: Relationship between reporting intensity and risk reduction. Panels: (a) frequency of reported items; (b) expected risk for standard users. A small fraction of carefully selected reported items in the calibration set (Fig. 3a) can induce a sharp reduction in expected unwanted content for non-adversarial users relative to the baseline (None in Fig. 3b). Across all panels, the collective size is fixed to β = 0.01 and the reporting rate to γ = 0.1. We report the standard deviation over 10 runs as a shaded area or error bars.
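For concreteness, the reporting strategies compared above (Eqs. (6)–(10)) can be sketched as follows; each function returns the items a collective member flags as "Not Interested", and the risk scores, like counts, rank positions, and tags are illustrative inputs rather than the released implementation:

```python
import math
import random

def _top_fraction(item_ids, key, gamma):
    """Report the ceil(gamma * |I_u|) items ranked first under `key`."""
    n_report = math.ceil(gamma * len(item_ids))
    return sorted(item_ids, key=key)[:n_report]

def low_risk(item_ids, risk, gamma):
    # LowRisk(gamma): items with the lowest estimated risk 1 - r(i, u) first.
    return _top_fraction(item_ids, lambda i: risk[i], gamma)

def likes(item_ids, n_likes, gamma):
    # Likes(gamma): most-liked items first (popularity as a low-risk proxy).
    return _top_fraction(item_ids, lambda i: -n_likes[i], gamma)

def top_ranker(item_ids, rank, gamma):
    # TopRanker(gamma): items highest in the recommender's displayed ranking.
    return _top_fraction(item_ids, lambda i: rank[i], gamma)

def tag(item_ids, item_tag, g):
    # Tag(g): every shown item belonging to the targeted group g.
    return [i for i in item_ids if item_tag[i] == g]

def random_baseline(item_ids, gamma, seed=0):
    # Random(gamma): benchmark reporting a random fraction of shown items.
    return random.Random(seed).sample(item_ids, math.ceil(gamma * len(item_ids)))
```

Only `low_risk` needs the white-box scores; the remaining functions consume quantities observable from the user interface, which is exactly why they approximate rather than match its effectiveness.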
We argue that the relatively modest performance of Likes in our setting is likely amplified by the fact that, in the implementation by De Toni et al. [14], risk scores are learned solely from sparse user-item interactions without incorporating auxiliary features. In deployments where richer signals are available, popularity-based proxies such as likes may therefore allow adversarial collectives to more closely approximate the white-box LowRisk strategy. Finally, Fig. 1c shows that when the reporting rate increases to γ = 0.1, all strategies converge toward the performance of Tags. This result highlights that, without precise information about item risk, reporting a sufficiently large number of items increases the likelihood that flagged items remain in the top-k recommendations during calibration, thereby sustaining adversarial impact.

(RQ2) Few adversaries can cause a disproportionate impact on performance. In this experiment, we fix a target reduction in expected risk of 25% and evaluate the resulting performance reduction (cf. Eq. (11)) on nDCG@k and Recall@k for non-adversarial users at test time. As shown in Fig. 2, even very small adversarial collectives can induce disproportionate degradations in recommendation quality when employing effective strategies. In particular, Figs. 2a and 2d show that, under a low reporting budget (γ = 0.001), the LowRisk strategy yields reductions of up to 5 and 20 in nDCG@20 and Recall@20, respectively, with a collective comprising only 0.1% of calibration users (approx. four users). When the reporting budget increases to γ = 0.1, the same strategy produces substantially larger effects, with reductions reaching up to 40 for nDCG@20 and 300 for Recall@20. As a concrete example, for β = 0.002 and γ = 0.1, we observe nDCG@20 reductions of approximately 10% and 2% under the LowRisk and Likes strategies, respectively.
For Recall@20, the corresponding reductions are substantially larger, at 60% and 10%. While these changes may appear modest – particularly for nDCG – prior work shows that even small variations in recommender performance can translate into significant business impact [32] (e.g., a 2–3% increase in click-through rate on eBay was associated with a 6% increase in revenue [9]). As detailed in Section 5, the LowRisk strategy highlights a potential upper bound in performance degradation across diverse collective sizes. By contrast, for γ < 0.01, the remaining strategies struggle to generate appreciable performance losses. However, once adversarial users report 10% of the items they encounter, all strategies lead to noticeable degradation, with reductions of up to 10 and 40 for nDCG@20 and Recall@20, respectively. Interestingly, as shown in Fig. 3b, non-adversarial users simultaneously experience a sharp reduction in expected risk relative to the baseline (grey line in Fig. 3b), depending on the adversarial strategy adopted. This highlights a key mechanism underlying the observed performance loss. Risk-controlling recommender systems enforce risk constraints by filtering risky items and replacing them with safe alternatives – often previously consumed items that have not been flagged as unwanted [14, Property 1].

Figure 4: Adversarial strategies substantially increase content repetition in recommendations, leading to degraded performance – up to 80% repeated items under the LowRisk strategy. The collective size is fixed to β = 0.01 and the reporting rate to γ = 0.1. We report the standard deviation over 10 runs as error bars.
This mechanism is consistent with real-world recommender ecosystems, where repeated consumption is common across domains and platforms routinely resurface previously consumed items or unfinished content [1, 43]. While effective for risk reduction, excessive reliance on repeated items undermines novelty and serendipity, thereby degrading recommendation quality. Consistent with this mechanism, Fig. 4 shows that adversarial strategies can dramatically increase content repetition; under the LowRisk strategy, up to 80% of recommended items have been seen previously by users, leading to sustained performance degradation. Overall, these results suggest that adversarial effectiveness depends critically on the relative strength of adversarial signals compared to those of non-adversarial users. Given that most users exhibit extremely low expected risk, even a small collective that injects a stronger, coordinated signal can exert an outsized influence, an observation consistent with prior work on algorithmic collective action [7, 27].

(RQ3) Adversarial reporting cannot selectively alter group exposure. Building on the previous findings, we investigate whether a coordinated adversarial collective can manipulate the exposure of specific item groups through targeted reporting, as implemented by the Tag strategy. To this end, we focus on group g = 34 and compare its exposure under two adversarial collectives: one using the Random(γ) strategy and another using the targeted Tag(g = 34) strategy. To ensure a fair comparison, we set γ for Random to match the fraction of items reported under Tag(g = 34) (approx. 10% in Fig. 3a). Further, we evaluate the difference in exposure across varying collective sizes β ∈ {0.001, 0.005, 0.01}. As shown in Fig. 5, we observe no appreciable difference in exposure between the two strategies at any collective size. This result follows directly from the design of the risk metric introduced by De Toni et al. [14], which is agnostic to item group membership and penalizes items solely based on their estimated risk score. The risk metric (Eq. (3)) depends only on the number of flagged items within each top-k list and is agnostic to which specific items are flagged. The same holds for the threshold calibration procedure. Consequently, adversaries cannot selectively reduce the exposure of any target group simply by reporting items from that group more frequently. As further discussed in Section B, disparities in exposure may still arise, but only due to pre-existing biases in the underlying recommender model, and these would diminish under an optimal risk predictor. Importantly, even under such ideal conditions, coordinated users can still influence recommendation outcomes: in risk-controlling recommender systems, the threshold λ is calibrated from user feedback, independently of the scoring model, as conformal risk control guarantees are model-agnostic. Overall, these findings indicate that risk-controlling recommender systems exhibit a degree of robustness to adversarial attempts at manipulating group-level exposure.

Figure 5: Difference in top-k exposure (k = 20) for group g = 34 in Kuaishou, comparing Random(γ) and Tag(g = 34): both strategies flag approximately 10% of items yet induce similar exposure reductions for non-adversarial users. Results are shown for collective sizes β ∈ {0.001, 0.005, 0.01}, with confidence intervals obtained via bootstrapping over 10 runs.
7 Mitigating the Effects of Adversarial Collective Action

Our findings in Section 6 suggest that risk-controlling recommender systems are neither inherently fragile nor universally robust. Their vulnerability depends critically on the level at which guarantees are enforced and on how replacement mechanisms interact with user behaviour. We argue that a more promising design direction is to move away from a single global threshold toward user-level risk control. Concretely, instead of computing a single threshold λ ∈ Λ (cf. Eq. (1)) using a calibration set drawn from the entire population, the system could estimate personalised thresholds λ_u for each user u ∈ U. More formally, for a given user u ∈ U, let us assume we have access to a held-out calibration set Q_u = {(u, i, h)_j}_{j=1}^{Q_u} of user-item interactions. Then, the user-level threshold λ̂_u for a given risk α ∈ [0, 1] can be defined as follows:

λ̂_u = inf{ λ : (Q_u / (Q_u + 1)) R̂_u(λ) + 1 / (Q_u + 1) ≤ α }  s.t.  R̂_u(λ) = (1 / Q_u) Σ_{j=1}^{Q_u} R(S_λ(U = u, k))    (13)

Figure 6: Difference (∆) in performance between user-level and population-level thresholds in risk-controlling recommender systems at test time, under the LowRisk strategy. Panels: (a) nDCG@20, (b) Recall@20, (c) previously seen items. For nDCG@20 and Recall@20, a higher difference is better; for previously seen items, a lower difference is better. Results are shown for reporting rates γ ∈ {0.001, 0.01, 0.1}, with confidence intervals obtained via bootstrapping over 10 runs. The collective size is fixed to β = 0.01.
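A minimal sketch of the per-user calibration in Eq. (13), assuming the per-interaction risks R(S_λ(u, k)) have already been evaluated on a finite grid of candidate thresholds (the continuous infimum is approximated by the smallest feasible grid value):

```python
def user_threshold(risks_per_lambda, alpha):
    """Eq. (13): smallest lambda whose inflated empirical risk satisfies
    Q_u/(Q_u+1) * R_hat_u(lambda) + 1/(Q_u+1) <= alpha.
    `risks_per_lambda[lam]` holds the risks R(S_lambda(u, k)) over the
    user's Q_u held-out calibration interactions."""
    for lam in sorted(risks_per_lambda):
        risks = risks_per_lambda[lam]
        q_u = len(risks)
        r_hat = sum(risks) / q_u  # empirical per-user risk R_hat_u(lambda)
        if q_u / (q_u + 1) * r_hat + 1 / (q_u + 1) <= alpha:
            return lam
    # No feasible threshold on the grid: fall back to the most conservative one.
    return max(risks_per_lambda)
```

Because each user's threshold is computed solely from that user's own interactions, an adversary's reports inflate only R̂_u for the adversary itself, which is precisely the localisation property argued for above.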
where R̂_u(λ) denotes the per-user risk score, computed only over the interactions made by the given user. By calibrating risk guarantees at the individual level, the effects of coordinated adversarial behaviour can be localized, preventing them from propagating system-wide. Under such a design, the reporting strategies described in Section 5 would lose their effectiveness, as reported items would influence only the reporting user's personalised threshold. In this section, we empirically evaluate this approach by answering the following research question:

RQ4 Can we mitigate the adversarial collectives' influence on the recommender by employing user-level thresholds?

Experimental details. In our experiment, we adopt the same empirical setup and evaluation protocol described in Section 6. We focus on the LowRisk strategy, as it exhibits the (theoretically) largest impact on recommender performance. We compare two approaches: (i) a personalized calibration of the risk-controlling threshold (Eq. (13)), and (ii) the standard population-level calibration used in Section 6. To quantify the differences, we measure the performance gap ∆m for m ∈ {nDCG@k, Recall@k} between the user-level and population-level risk-controlling recommender systems, where ∆ > 0 indicates an improvement of the user-level approach. Additionally, as in RQ3, we report the difference in the fraction of previously seen items appearing in the top-k recommendations under the two calibration strategies. In this case, ∆ < 0 indicates that the user-level approach requires fewer previously seen items to achieve the same level of risk control. All results are computed over non-adversarial users in the test set.

(RQ4) Personalized thresholds mitigate the effects of algorithmic collectives. Under the strongest attack strategy (LowRisk), adopting personalized thresholds as in Eq.
(13) yields consistent improvements over the standard global threshold. In particular, we observe gains of +0.3 nDCG@20 and +0.4 Recall@20, while reducing the number of previously seen (i.e., repeated) items in the recommendation lists by approximately 99% (Fig. 6c). Fig. 6 further illustrates these results. Importantly, these improvements are achieved without sacrificing the desired risk-control guarantees: both approaches satisfy the target bounds, although the global threshold tends to be more conservative. The observed gains can be explained by the fact that personalized calibration prevents coordinated groups from influencing the calibration process. Unlike the global-threshold setting – where adversarial collectives can distort the shared calibration set (cf. Section 6) – each user now relies on an individual calibration set, effectively isolating them from collective manipulation. Moreover, personalization adapts the intervention to heterogeneous user behavior. Since most users exhibit low reporting rates, even under stringent reduction targets α ∈ [0, 1], only a small number of items need to be filtered or replaced (on average ≈ 1.5 items per top-k list). This leads to higher-quality recommendations while still enforcing risk control for any α. Consistently, we find that the benefits of personalization increase with both the desired reduction in unwanted content and the strength of the collective attack, indicating that user-level calibration is particularly effective in mitigating coordinated adversarial behavior.

8 Discussion

We now discuss some limitations of the present research, which open up further avenues for future work. First, our empirical evaluation necessarily relies on an offline simulation with simulated collectives of users that may not reflect the complex interactions of a live system.
A natural next step is to validate these findings through online studies, similar to prior work using browser extensions or participatory infrastructures [48, 28], which would enable the collection of real user interactions with a functioning risk-controlling recommender system. Second, the effectiveness of the proposed adversarial strategies depends on specific implementation choices and assumptions underlying the risk-controlling mechanism. For instance, although LowRisk emerges as the most effective strategy in our experiments (cf. Fig. 2), it may be impractical to deploy in real-world settings due to limited access to internal risk scores. That said, model extraction and black-box inference techniques [64] could, in principle, be used to approximate these scores, potentially making such strategies feasible. More broadly, the impact of all strategies may differ in deployment, where feedback delays, interface design, and heterogeneous user behaviour could substantially alter outcomes. In particular, users may employ the "Not Interested" signal to express dissatisfaction with repetitive or low-novelty content rather than genuine disinterest or perceived harm [29]. For example, prior work shows that tolerance to harmful content varies across cultural and social contexts [35]. Given the current design of risk-controlling recommender systems and our findings on content repetition (cf. Figs. 4 and 6c), such well-intentioned feedback could inadvertently trigger feedback loops that push the system toward increasingly conservative filtering. As a result, a small group of high-reporting users, acting in good faith, may disproportionately influence the system-wide risk threshold, effectively driving the recommender toward more conservative behaviour, akin to the adversarial effects analysed in Section 6.
However, as we have shown empirically in Section 7, employing user-level thresholds might represent a promising solution towards mitigating these phenomena. Beyond robustness, user-level risk control also offers a principled approach to addressing the cold-start problem [54]. New users could be initialised with predefined risk tolerances that are subsequently refined as feedback accumulates. Such personalisation could further leverage existing social or relational structures on the platform, for example, by assigning new users the risk tolerance of socially connected or behaviourally similar users, aligning risk control with established mechanisms of trust and similarity, and potentially fostering user altruism [22]. Importantly, in this way, risk control ceases to be a purely platform-centric safeguard and becomes a more user-centric approach that empowers individuals to directly shape their recommendation experience.

9 Conclusion

This study investigates the vulnerability of risk-controlling recommender systems to collectives of adversarial users. While conformal risk control provides principled, model-agnostic guarantees to bound unwanted content, we demonstrate that coordinated groups can exploit these mechanisms to their advantage. Theoretically, we proved that adversarial users consume a portion of the global risk budget, forcing the system to adopt more conservative filtering thresholds for the entire population. Empirically, using real-world data from a large video-sharing platform, we demonstrated that a collective of just 1% of users can degrade recommendation quality for non-adversarial users by up to 20%. We examined several realistic attack strategies that coordinated adversaries could adopt, including targeting top-ranked (TopRanker) and most-liked (Likes) items, and quantified their effects on widely used platform metrics such as nDCG@20 and Recall@20.
Targeting items with low estimated risk scores (LowRisk) yielded the strongest and most sustained adversarial impact; however, this strategy would require white-box access to the system, limiting its practical feasibility. Furthermore, our results reveal that these systems do not allow adversaries to selectively suppress specific item groups. We also show that calibrating risk-controlling recommender systems with user-level thresholds can empirically mitigate the impact of adversarial collectives, leading to more stable performance as the collective size increases. Our findings demonstrate that global-level risk control is susceptible to coordinated manipulation, whereas user-level approaches offer a promising path forward to mitigate the impact of adversarial behaviour.

Acknowledgements

The work of GDT and BL was partially supported by the following projects: Horizon Europe Programme, grants #101120237-ELIAS and #101120763-TANGO. Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Health and Digital Executive Agency (HaDEA). Neither the European Union nor the granting authority can be held responsible for them.

References

[1] Ashton Anderson, Ravi Kumar, Andrew Tomkins, and Sergei Vassilvitskii. 2014. The dynamics of repeat consumption. In Proceedings of the 23rd International Conference on World Wide Web. 419–430.

[2] Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, and Felice Antonio Merra. 2021. Adversarial recommender systems: Attack, defense, and advances. In Recommender Systems Handbook. Springer, 335–379.

[3] Anastasios Nikolas Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, and Tal Schuster. 2024. Conformal Risk Control. In The Twelfth International Conference on Learning Representations.
https://openreview.net/forum?id=33XGfHLtZg

[4] Arnav Arora, Preslav Nakov, Momchil Hardalov, Sheikh Muhammad Sarwar, Vibha Nayak, Yoan Dinkov, Dimitrina Zlatkova, Kyle Dent, Ameya Bhatawdekar, Guillaume Bouchard, and Isabelle Augenstein. 2023. Detecting Harmful Content on Online Platforms: What Platforms Need vs. Where Research Efforts Go. ACM Comput. Surv. 56, 3, Article 72 (Oct. 2023), 17 pages. doi:10.1145/3603399

[5] Christopher A. Bail, Brian Guay, Emily Maloney, Aidan Combs, D. Sunshine Hillygus, Friedolin Merhout, Deen Freelon, and Alexander Volfovsky. 2020. Assessing the Russian Internet Research Agency's impact on the political attitudes and behaviors of American Twitter users in late 2017. Proceedings of the National Academy of Sciences 117, 1 (2020), 243–250. doi:10.1073/pnas.1906420116

[6] Jack Bandy and Nicholas Diakopoulos. 2020. #tulsaflop: A case study of algorithmically-influenced collective action on TikTok. arXiv preprint arXiv:2012.07716 (2020).

[7] Joachim Baumann and Celestine Mendler-Dünner. 2024. Algorithmic collective action in recommender systems: promoting songs by reordering playlists. Advances in Neural Information Processing Systems 37 (2024), 119123–119149.

[8] Ljubisa Bojic. 2024. AI alignment: Assessing the global impact of recommender systems. Futures 160 (2024), 103383.

[9] Yuri M Brovman, Marie Jacob, Natraj Srinivasan, Stephen Neola, Daniel Galron, Ryan Snyder, and Paul Wang. 2016. Optimizing similar item recommendations in a semi-structured marketplace to maximize conversion. In Proceedings of the 10th ACM Conference on Recommender Systems. 199–202.

[10] Jenna Burrell, Zoe Kahn, Anne Jonas, and Daniel Griffin. 2019. When users control the algorithms: values expressed in practices on Twitter. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–20.
[11] Xuheng Cai, Chao Huang, Lianghao Xia, and Xubin Ren. 2023. LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation. In The Eleventh International Conference on Learning Representations.

[12] Sarah H Cen, Andrew Ilyas, Jennifer Allen, Hannah Li, and Aleksander Mądry. 2024. Measuring Strategization in Recommendation: Users Adapt Their Behavior to Shape Future Content. In Proceedings of the 25th ACM Conference on Economics and Computation. 203–204.

[13] Jerry Chee, Shankar Kalyanaraman, Sindhu Kiranmai Ernala, Udi Weinsberg, Sarah Dean, and Stratis Ioannidis. 2024. Harm mitigation in recommender systems under user preference dynamics. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 255–265.

[14] Giovanni De Toni, Erasmo Purificato, Emilia Gomez, Andrea Passerini, Bruno Lepri, and Cristian Consonni. 2025. You Don't Bring Me Flowers: Mitigating Unwanted Recommendations Through Conformal Risk Control. In Proceedings of the Nineteenth ACM Conference on Recommender Systems. 492–502.

[15] Michael D. Ekstrand, Mucun Tian, Ion Madrazo Azpiazu, Jennifer D. Ekstrand, Oghenemaro Anuyah, David McNeill, and Maria Soledad Pera. 2018. All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (Proceedings of Machine Learning Research, Vol. 81), Sorelle A. Friedler and Christo Wilson (Eds.). PMLR, 172–186. https://proceedings.mlr.press/v81/ekstrand18b.html

[16] Motahhare Eslami, Karrie Karahalios, Christian Sandvig, Kristen Vaccaro, Aimee Rickman, Kevin Hamilton, and Alex Kirlik. 2016. First I "like" it, then I hide it: Folk Theories of Social Feeds. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 2371–2382.
[17] Motahhare Eslami, Aimee Rickman, Kristen Vaccaro, Amirhossein Aleyasen, Andy Vuong, Karrie Karahalios, Kevin Hamilton, and Christian Sandvig. 2015. "I always assumed that I wasn't really that close to [her]": Reasoning about Invisible Algorithms in News Feeds. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. 153–162.
[18] European Parliament and Council of the European Union. 2022. Regulation (EU) 2022/2065 of the European Parliament and of the Council of 19 October 2022 on a Single Market For Digital Services and amending Directive 2000/31/EC (Digital Services Act) (Text with EEA relevance). https://eur-lex.europa.eu/eli/reg/2022/2065/oj Official Journal of the European Union.
[19] Wenqi Fan, Xiangyu Zhao, Xiao Chen, Jingran Su, Jingtong Gao, Lin Wang, Qidong Liu, Yiqi Wang, Han Xu, Lei Chen, et al. 2022. A comprehensive survey on trustworthy recommender systems. arXiv preprint arXiv:2209.10117 (2022).
[20] Amir Fayazi, Kyumin Lee, James Caverlee, and Anna Squicciarini. 2015. Uncovering crowdsourced manipulation of online reviews. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 233–242.
[21] Federal Trade Commission. 2022. Trade Regulation Rule on Commercial Surveillance and Data Security. Federal Register, Vol. 87, No. 161. https://www.federalregister.gov/documents/2022/08/22/2022-17752/trade-regulation-rule-on-commercial-surveillance-and-data-security Advance notice of proposed rulemaking.
[22] Ekaterina Fedorova, Madeline Kitch, and Chara Podimata. 2025. User Altruism in Recommendation Systems. arXiv preprint arXiv:2506.04525 (2025).
[23] Chongming Gao, Shijun Li, Yuan Zhang, Jiawei Chen, Biao Li, Wenqiang Lei, Peng Jiang, and Xiangnan He. 2022. KuaiRand: An Unbiased Sequential Recommendation Dataset with Randomly Exposed Videos.
In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (Atlanta, GA, USA) (CIKM '22). 3953–3957. doi: 10.1145/3511808.3557624
[24] Asaf Gendler, Tsui-Wei Weng, Luca Daniel, and Yaniv Romano. 2021. Adversarially robust conformal prediction. In International Conference on Learning Representations.
[25] Tarleton Gillespie. 2022. Do not recommend? Reduction as a form of content moderation. Social Media + Society 8, 3 (2022), 20563051221117552.
[26] Jennifer Golbeck. 2025. Recommender System-Induced Eating Disorder Relapse: Harmful Content and the Challenges of Responsible Recommendation. ACM Trans. Intell. Syst. Technol. 16, 1, Article 15 (Jan. 2025), 13 pages. doi: 10.1145/3675404
[27] Moritz Hardt, Eric Mazumdar, Celestine Mendler-Dünner, and Tijana Zrnic. 2023. Algorithmic collective action in machine learning. In International Conference on Machine Learning. PMLR, 12570–12586.
[28] Lê-Nguyên Hoang, Louis Faucon, Aidan Jungo, Sergei Volodin, Dalia Papuc, Orfeas Liossatos, Ben Crulis, Mariame Tighanimine, Isabela Constantin, Anastasiia Kucherenko, et al. 2021. Tournesol: A quest for a large, secure and trustworthy database of reliable human judgments. arXiv preprint arXiv:2107.07334 (2021).
[29] Jihyeong Hong, Eun-Young Ko, Juho Kim, and Jeong-woo Jang. 2025. Why Social Media Users Press 'Not Interested': Motivations, Anticipated Effects, and Result Interpretation. Proceedings of the ACM on Human-Computer Interaction 9, 7 (2025), 1–25.
[30] Junjie Huang, Jizheng Chen, Jianghao Lin, Jiarui Qin, Ziming Feng, Weinan Zhang, and Yong Yu. 2025. A comprehensive survey on retrieval methods in recommender systems. ACM Transactions on Information Systems 44, 1 (2025), 1–43.
[31] Lujain Ibrahim, Luc Rocher, and Ana Valdivia. 2024. Characterizing and modeling harms from interactions with design patterns in AI interfaces. arXiv preprint (2024).
[32] Dietmar Jannach and Michael Jugovac. 2019. Measuring the business value of recommender systems. ACM Transactions on Management Information Systems (TMIS) 10, 4 (2019), 1–23.
[33] Shagun Jhaver, Alice Qian Zhang, Quan Ze Chen, Nikhila Natarajan, Ruotong Wang, and Amy X Zhang. 2023. Personalizing content moderation on social media: User perspectives on moderation choices, interface design, and labor. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2 (2023), 1–33.
[34] Harry H Jiang, Lauren Brown, Jessica Cheng, Mehtab Khan, Abhishek Gupta, Deja Workman, Alex Hanna, Johnathan Flowers, and Timnit Gebru. 2023. AI Art and its Impact on Artists. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society. 363–374.
[35] Jialun Aaron Jiang, Morgan Klaus Scheuerman, Casey Fiesler, and Jed R Brubaker. 2021. Understanding international perceptions of the severity of harmful content online. PloS one 16, 8 (2021), e0256762.
[36] Aditya Karan, Prabhat Kalle, Nicholas Vincent, and Hari Sundaram. 2025. Sync or Sink: Bounds on Algorithmic Collective Action with Noise and Multiple Groups. arXiv preprint arXiv:2510.18933 (2025).
[37] Aditya Karan, Nicholas Vincent, Karrie Karahalios, and Hari Sundaram. 2025. Algorithmic Collective Action with Two Collectives. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. 1468–1483.
[38] Nadia Karizat, Dan Delmonaco, Motahhare Eslami, and Nazanin Andalibi. 2021. Algorithmic Folk Theories and Identity: How TikTok Users Co-Produce Knowledge of Identity and Engage in Algorithmic Resistance. Proc. ACM Hum.-Comput. Interact. 5, CSCW2, Article 305 (Oct. 2021), 44 pages. doi: 10.1145/3476046
[39] Amro Khasawneh, Kapil Chalil Madathil, Emma Dixon, Pamela Wiśniewski, Heidi Zinzow, and Rebecca Roth. 2020.
Examining the self-harm and suicide contagion effects of the Blue Whale Challenge on YouTube and Twitter: qualitative study. JMIR Mental Health 7, 6 (2020), e15973.
[40] Ben Kuchera. 2017. The anatomy of a review bombing campaign. Polygon. https://www.polygon.com/2017/10/4/16418832/pubg-firewatch-steam-review-bomb Accessed: 2026-02-26.
[41] Khoa Lam, Benjamin Lange, Borhane Blili-Hamelin, Jovana Davidovic, Shea Brown, and Ali Hasan. 2024. A framework for assurance audits of algorithmic systems. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1078–1092.
[42] Kyumin Lee, Prithivi Tamilarasan, and James Caverlee. 2013. Crowdturfers, campaigns, and social media: Tracking and revealing crowdsourced manipulation of social media. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 7. 331–340.
[43] Ming Li, Ali Vardasbi, Andrew Yates, and Maarten De Rijke. 2023. Repetition and exploration in sequential recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2532–2541.
[44] Alexander Liu, Siqi Wu, and Paul Resnick. 2024. How to Train Your YouTube Recommender to Avoid Unwanted Videos. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 18. 930–942.
[45] Marilia Marasciulo. 2022. How Anitta megafans gamed Spotify to help create Brazil's first global chart-topper. https://restofworld.org/2022/anitta-fans-spotify-brazil-global-chart/ Last accessed: 13 January 2026.
[46] Ariadna Matamoros-Fernández and Johan Farkas. 2021. Racism, hate speech, and social media: A systematic review and critique. Television & New Media 22, 2 (2021), 205–224.
[47] Binny Mathew, Anurag Illendula, Punyajoy Saha, Soumya Sarkar, Pawan Goyal, and Animesh Mukherjee. 2020.
Hate begets hate: A temporal study of hate speech. Proceedings of the ACM on Human-Computer Interaction 4, CSCW2 (2020), 1–24.
[48] Celestine Mendler-Dünner, Gabriele Carovano, and Moritz Hardt. 2024. An engine not a camera: Measuring performative power of online search. Advances in Neural Information Processing Systems 37 (2024), 59266–59288.
[49] Karol Morales-Muñoz and Beltran Roca. 2022. The spatiality of collective action and organization among platform workers in Spain and Chile. Environment and Planning A: Economy and Space 54, 7 (2022), 1411–1431. doi: 10.1177/0308518X221103262
[50] Marco Morik, Ashudeep Singh, Jessica Hong, and Thorsten Joachims. 2020. Controlling fairness and bias in dynamic learning-to-rank. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 429–438.
[51] Torill Elvira Mortensen and Tanja Sihvonen. 2020. Negative emotions set in motion: the continued relevance of #GamerGate. In The Palgrave Handbook of International Cybercrime and Cyberdeviance. Springer, 1353–1374.
[52] Edward Jr. Ongweso. 2021. Organized DoorDash Drivers' #DeclineNow Strategy Is Driving Up Their Pay. https://www.vice.com/en/article/3anwdy/organizeddoordash-drivers-declinenow-strategy-is-driving-up-their-pay Last accessed: 13 January 2026.
[53] Victoria O'Meara. 2019. Weapons of the chic: Instagram influencer engagement pods as practices of resistance to Instagram platform labor. Social Media + Society 5, 4 (2019), 2056305119879671.
[54] Seung-Taek Park and Wei Chu. 2009. Pairwise preference regression for cold-start recommendation. In Proceedings of the Third ACM Conference on Recommender Systems. 21–28.
[55] David Schoch, Franziska B Keller, Sebastian Stier, and JungHwan Yang. 2022.
Coordination patterns reveal online political astroturfing across the world. Scientific Reports 12, 1 (2022), 4572.
[56] Dorothee Sigg, Moritz Hardt, and Celestine Mendler-Dünner. 2025. Decline now: A combinatorial model for algorithmic collective action. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–17.
[57] Venera Tomaselli, Giulio Giacomo Cantone, and Valeria Mazzeo. 2022. Review bomb: On the gamification of the ideological conflict. In Handbook of Research on Cross-Disciplinary Uses of Gamification in Organizations. IGI Global Scientific Publishing, 334–354.
[58] Antonela Tommasel and Filippo Menczer. 2022. Do Recommender Systems Make Social Media More Susceptible to Misinformation Spreaders?. In Proceedings of the 16th ACM Conference on Recommender Systems (Seattle, WA, USA) (RecSys '22). Association for Computing Machinery, New York, NY, USA, 550–555. doi: 10.1145/3523227.3551473
[59] Pier Paolo Tricomi, Sousan Tarahomi, Christian Cattai, Francesco Martini, and Mauro Conti. 2023. Are we all in a Truman show? Spotting Instagram crowdturfing through self-training. In 2023 32nd International Conference on Computer Communications and Networks (ICCCN). IEEE, 1–10.
[60] European Union. 2021. Proposal for a regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts. COM/2021/206 final (2021), 1–107.
[61] Gang Wang, Christo Wilson, Xiaohan Zhao, Yibo Zhu, Manish Mohanlal, Haitao Zheng, and Ben Y Zhao. 2012. Serf and turf: crowdturfing for fun and profit. In Proceedings of the 21st International Conference on World Wide Web. 679–688.
[62] Leijie Wang, Kathryn Yurechko, Pranati Dani, Quan Ze Chen, and Amy X Zhang. 2025.
End User Authoring of Personalized Content Classifiers: Comparing Example Labeling, Rule Writing, and LLM Prompting. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–21.
[63] Shoujin Wang, Xiuzhen Zhang, Yan Wang, and Francesco Ricci. 2024. Trustworthy recommender systems. ACM Transactions on Intelligent Systems and Technology 15, 4 (2024), 1–20.
[64] Zhenrui Yue, Zhankui He, Huimin Zeng, and Julian McAuley. 2021. Black-Box Attacks on Sequential Recommenders via Data-Free Model Extraction. In Proceedings of the 15th ACM Conference on Recommender Systems (Amsterdam, Netherlands) (RecSys '21). Association for Computing Machinery, New York, NY, USA, 44–54. doi: 10.1145/3460231.3474275
[65] Soroush H Zargarbashi, Mohammad Sadegh Akhondzadeh, and Aleksandar Bojchevski. 2024. Robust Yet Efficient Conformal Prediction Sets. In International Conference on Machine Learning. PMLR, 17123–17147.
[66] Jing Zeng and D Bondy Valdovinos Kaye. 2022. From content moderation to visibility moderation: A case study of platform governance on TikTok. Policy & Internet 14, 1 (2022), 79–95.
[67] Kaike Zhang, Qi Cao, Fei Sun, Yunfan Wu, Shuchang Tao, Huawei Shen, and Xueqi Cheng. 2025. Robust recommender system: a survey and future directions. Comput. Surveys 58, 1 (2025), 1–38.

A Proofs

A.1 Proof for Theorem 1

Proof. Consider a held-out calibration set $\mathcal{Q} = \{(u, i, h)_j\}_{j=1}^{Q}$. Let us assume that a subset $\mathcal{K} \subset \mathcal{Q}$ of users behaves strategically.
Given $\lambda \in \Lambda$, let us denote the following expected risk for both standard and adversarial users on the calibration set:

$$\hat{R}(\lambda) = \frac{1}{|\mathcal{Q}|} \sum_{u' \in \mathcal{Q}} R(S_\lambda(U = u', k)) \tag{14}$$

$$= \frac{1}{|\mathcal{Q}|} \left[ \sum_{u' \in \mathcal{Q} \setminus \mathcal{K}} R(S_\lambda(U = u', k)) + \sum_{u'' \in \mathcal{K}} R(S_\lambda(U = u'', k)) \right] \tag{15}$$

$$= \frac{1}{|\mathcal{Q}|} \sum_{u' \in \mathcal{Q} \setminus \mathcal{K}} R(S_\lambda(U = u', k)) + \frac{1}{|\mathcal{Q}|} \sum_{u'' \in \mathcal{K}} R(S_\lambda(U = u'', k)) \tag{16}$$

Let us denote $\hat{R}_{\mathrm{nonadv}}(\lambda) = \frac{1}{|\mathcal{Q}|} \sum_{u' \in \mathcal{Q} \setminus \mathcal{K}} R(S_\lambda(U = u', k))$. Further, let us assume that all adversaries have the same risk $R(S_\lambda(U = u, k)) = r_{\mathrm{adv}}$ for all $u \in \mathcal{K}$, leading to the following equality:

$$\hat{R}(\lambda) = \hat{R}_{\mathrm{nonadv}}(\lambda) + \frac{|\mathcal{K}|}{|\mathcal{Q}|} r_{\mathrm{adv}} \tag{17}$$

Given a target level $\alpha \in [0, 1]$, let us consider $\hat{\lambda} \in \Lambda$ chosen as $\hat{\lambda} = \inf \left\{ \lambda : \frac{Q}{Q+1} \hat{R}(\lambda) + \frac{1}{Q+1} \le \alpha \right\}$, thus satisfying the risk control guarantees [3]. Let us denote $Q = |\mathcal{Q}|$ and $K = |\mathcal{K}|$. Then, we can rewrite the selection threshold to distinguish between standard and adversarial users as follows:

$$\frac{Q}{Q+1} \hat{R}(\lambda) + \frac{1}{Q+1} \le \alpha \tag{18}$$

$$\frac{Q}{Q+1} \left( \hat{R}_{\mathrm{nonadv}}(\lambda) + \frac{K}{Q} r_{\mathrm{adv}} \right) + \frac{1}{Q+1} \le \alpha \tag{19}$$

$$\frac{Q}{Q+1} \hat{R}_{\mathrm{nonadv}}(\lambda) + \frac{K}{Q+1} r_{\mathrm{adv}} + \frac{1}{Q+1} \le \alpha \tag{20}$$

$$\frac{Q}{Q+1} \hat{R}_{\mathrm{nonadv}}(\lambda) + \frac{1}{Q+1} \le \alpha - \frac{K}{Q+1} r_{\mathrm{adv}} \tag{21}$$

In practice, the $K$ adversarial users "consume" $\frac{K}{Q+1} r_{\mathrm{adv}}$ units of the risk budget, leaving only $\alpha - \frac{K}{Q+1} r_{\mathrm{adv}}$ for the original population. Thus, we can denote the quantity $\alpha_{\mathrm{eff}} = \alpha - \frac{K}{Q+1} r_{\mathrm{adv}}$ as the effective risk reduction for standard users. Since risks are nonnegative, we may take $\max\{0, \alpha_{\mathrm{eff}}\}$ as a valid target. By exchangeability and the standard conformal risk control guarantee, inequality (21) implies that for a non-adversarial test user $U_{\mathrm{nonadv}}$ we have

$$\mathbb{E}[R(S_{\hat{\lambda}}(U_{\mathrm{nonadv}}, k))] \le \max\left\{ 0, \alpha - \frac{K}{Q+1} r_{\mathrm{adv}} \right\} \tag{22}$$

This proves Theorem 1. ∎
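The budget-splitting identity above can be checked numerically. The sketch below is illustrative (variable names and the simulated risks are our own, not from the paper's experiments); it assumes, as in the proof, that each adversary contributes a constant risk r_adv, and verifies the decomposition of Eq. (17) and the effective level of Eq. (21).

```python
import numpy as np

def effective_alpha(alpha: float, K: int, Q: int, r_adv: float) -> float:
    """Effective risk level left for standard users (Eq. (21)):
    the K adversaries consume K/(Q+1) * r_adv of the risk budget."""
    return max(0.0, alpha - K / (Q + 1) * r_adv)

rng = np.random.default_rng(0)
Q, K = 1000, 10                  # calibration size |Q|, adversarial subset |K| (1%)
r_adv = 1.0                      # each adversary reports maximal risk
risks_std = rng.uniform(0.0, 0.1, size=Q - K)       # standard users' risks
risks = np.concatenate([risks_std, np.full(K, r_adv)])

# Decomposition of the empirical calibration risk (Eq. (17)).
R_hat = risks.mean()
R_nonadv = risks_std.sum() / Q   # normalised by |Q|, as in Eq. (16)
assert abs(R_hat - (R_nonadv + K / Q * r_adv)) < 1e-12

alpha = 0.1
print(effective_alpha(alpha, K, Q, r_adv))
```

With these toy numbers, 1% of adversarial users already consumes roughly 10% of the available risk budget at α = 0.1, which mirrors the intuition behind the collective attack studied in the paper.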
Figure 7: Exposure reduction for two item groups ($g \in \{54, 23\}$) in Kuaishou. (a) Empirical density of the risk scores for groups 54 (n=957) and 23 (n=1682). (b) Reduction in exposure in top-k as a function of the desired reduction in unwanted content. Although the two groups exhibit similar expected harmfulness, they differ in their risk score distributions: $\mathbb{E}[r(U, I) \mid G = 54] = 0.604$ and $\mathbb{E}[r(U, I) \mid G = 23] = 0.576$ (Fig. 7a). As a result, risk control induces disparate exposure across the two groups (Fig. 7b), even in the absence of adversarial users.

B Disparate Exposure in Risk-Controlling Recommenders

In the experiments of Section 6, we examined the aggregate impact of adversarial collectives on recommendation performance. We now turn to a finer-grained analysis of item-group exposure. Specifically, we focus on groups of items that share a similar ground-truth likelihood of being flagged as unwanted, i.e., with comparable values of $\mathbb{E}[H = 1 \mid G]$. For these groups, we measure empirical exposure at test time (cf. Eq. (12)) under the risk-controlling recommender system in the absence of adversarial users. While many groups experience comparable reductions in exposure, as expected given that risk control replaces risky items with safer alternatives, we observe notable disparities across some groups. For example, Fig. 7b reports the exposure reduction for two item groups, $g \in \{54, 23\}$, which share the same ground-truth unwantedness rate ($\mathbb{E}[H = 1 \mid G = 54] \approx \mathbb{E}[H = 1 \mid G = 23] \approx 0.002$). Despite this similarity, the groups exhibit markedly different exposure dynamics as the level of risk control varies. In particular, at a target reduction of 50% in unwanted content, the exposure of group 54 remains relatively stable, whereas group 23 experiences nearly a 30% decrease.
This disparity arises from differences in the distributions of the learned risk scores across groups, as illustrated in Fig. 7a. As a result, even when groups are equally likely to be flagged as unwanted, risk-controlling recommender systems may induce disparate exposure [15] if the underlying risk predictor $r(i, u)$ exhibits group-dependent biases. Importantly, these disparities emerge independently of adversarial behaviour, highlighting that risk control might amplify pre-existing biases in the learned risk model rather than introducing new ones.
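The group-level comparison described above can be sketched as follows. The helper names and the toy data are our own, and the exposure measure is a deliberately simplified stand-in for Eq. (12): the fraction of recommended top-k slots occupied by items of a given group, before and after risk control.

```python
import numpy as np

def group_exposure(topk_lists, item_group, g):
    """Fraction of recommended top-k slots occupied by items of group g
    (a simplified stand-in for the exposure measure of Eq. (12))."""
    slots = np.concatenate(topk_lists)
    return float(np.mean(item_group[slots] == g))

def exposure_reduction(topk_before, topk_after, item_group, g):
    """Relative drop (%) in a group's exposure once risk control is applied."""
    before = group_exposure(topk_before, item_group, g)
    after = group_exposure(topk_after, item_group, g)
    return 100.0 * (before - after) / before if before > 0 else 0.0

# Toy example: 3 users with top-2 lists; items 0-5 belong to groups 54 or 23.
item_group = np.array([54, 54, 23, 23, 54, 23])
topk_before = [np.array([0, 2]), np.array([1, 3]), np.array([4, 5])]
# After risk control, some group-23 items are swapped for group-54 items,
# even though both groups are equally "unwanted" in the ground truth.
topk_after = [np.array([0, 1]), np.array([1, 4]), np.array([4, 5])]

for g in (54, 23):
    print(g, round(exposure_reduction(topk_before, topk_after, item_group, g), 1))
```

In this toy setup, group 23's exposure drops sharply while group 54's actually increases, which is the qualitative pattern the appendix reports: equal ground-truth unwantedness does not imply equal treatment when the learned risk scores differ by group.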
