Recommender system in X inadvertently profiles ideological positions of users
Studies of recommendations in social media have mainly analyzed the quality of recommended items (e.g., their diversity or biases) and the impact of recommendation policies (e.g., in comparison with purely chronological policies). We use a data donation program, collecting more than 2.5 million friend recommendations made to 682 volunteers on X over a year, to study instead how real-world recommenders learn, represent, and process political and social attributes of users inside the so-called black boxes of AI systems. Using publicly available knowledge of the recommender's architecture, we inferred the positions of recommended users in its embedding space. Leveraging ideology scaling calibrated with political survey data, we analyzed the political positions of users in our study (N=26,509, comprising volunteers and recommended contacts) alongside several other attributes, including age and gender. Our results show that the platform's recommender system produces a spatial ordering of users that is highly correlated with their Left-Right positions (Pearson rho=0.887, p-value < 0.0001) and that cannot be explained by socio-demographic attributes. These results open new possibilities for studying the interaction between humans and AI systems. They also raise important questions about the legal definition of algorithmic profiling in data privacy regulation, by blurring the line between active and passive profiling. Building on these results, we explore new constrained recommendation methods that limit the political information available to the recommender, a potential privacy-compliance tool that preserves recommendation relevance.
💡 Research Summary
This paper investigates whether X’s (formerly Twitter) friend‑recommendation algorithm unintentionally learns and exploits users’ political orientations, thereby constituting a form of algorithmic profiling. The authors launched a data‑donation campaign in France, recruiting 682 volunteers who installed a browser extension that recorded every “Who to follow” (WTF) recommendation shown to them between January 2023 and May 2024. Over this period more than 2.5 million recommendation events were captured, involving 105K distinct accounts that were suggested to at least two volunteers. Combining volunteers (V) and repeatedly recommended accounts (U) yielded a study population E of 26,509 users for which the authors also collected follower graphs, profile texts, photos, and the most recent 200 posts via X’s public API.
The core technical contribution is a method to reconstruct the hidden 256‑dimensional embedding that powers X’s recommender. Public documentation reveals that X embeds users, posts, and ads in a high‑dimensional space and ranks candidates by inner‑product similarity. Treating the observed recommendation pairs as constraints, the authors formulate a constrained optimization problem: find embeddings that maximize the likelihood of the observed rankings while respecting the known architecture (inner‑product scoring, candidate generation). Using gradient descent, they infer a vector φ_i for each user i. Model validation on a held‑out 10 % of the data yields AU‑ROC = 0.700, Precision@1 = 0.725 and Precision@3 = 0.691, substantially outperforming three baselines (random, most‑followed second‑neighbors, most‑followed first‑neighbors) whose AU‑ROC values fall below 0.47. Robustness checks (different training splits, alternative hyper‑parameters, simulated volunteer bias, platform‑usage shifts around the May 2023 ownership change) confirm that the inferred embeddings reliably reproduce the recommendation behavior.
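The inference idea above (recover embeddings that explain the observed inner-product rankings by gradient ascent on a pairwise likelihood) can be sketched on synthetic data. Everything in this sketch is an illustrative assumption, not the authors' implementation: the toy dimensions, the triplet constraints, and the BPR-style logistic loss are stand-ins for the paper's constrained optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n users with hidden "true" embeddings of dimension d
# (the real system uses d = 256; everything here is a toy stand-in).
n, d = 50, 8
true_emb = rng.normal(size=(n, d))

# Observed constraints: viewer i saw account j ranked above account k,
# which under inner-product scoring means <phi_i, phi_j> > <phi_i, phi_k>.
triplets = []
while len(triplets) < 5000:
    i, j, k = rng.integers(0, n, size=3)
    if i != j and i != k and true_emb[i] @ true_emb[j] > true_emb[i] @ true_emb[k]:
        triplets.append((i, j, k))
triplets = np.array(triplets)
v, r, s = triplets[:, 0], triplets[:, 1], triplets[:, 2]

# Infer embeddings by gradient ascent on a logistic (BPR-style) pairwise
# likelihood of the observed rankings.
emb = rng.normal(scale=0.1, size=(n, d))
lr = 0.01
for _ in range(200):
    diff = np.einsum("ij,ij->i", emb[v], emb[r] - emb[s])
    g = 1.0 / (1.0 + np.exp(diff))            # per-triplet gradient weight
    grad = np.zeros_like(emb)
    np.add.at(grad, v, g[:, None] * (emb[r] - emb[s]))
    np.add.at(grad, r, g[:, None] * emb[v])
    np.add.at(grad, s, -g[:, None] * emb[v])
    emb += lr * grad

# Fraction of observed ranking constraints the inferred embedding reproduces
# (a random embedding would satisfy about half of them).
diff = np.einsum("ij,ij->i", emb[v], emb[r] - emb[s])
accuracy = float((diff > 0).mean())
```

Since the constraints are generated from a consistent embedding, they are realizable and the fitted vectors reproduce most of them; the paper's held-out evaluation (Precision@k, AU-ROC) plays the analogous role on real recommendation logs.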
To examine political content, the authors rely on an external dataset that assigns a Left‑Right score (0 = far‑left, 10 = far‑right) to X users based on a multidimensional ideology‑scaling of follower networks calibrated with French election survey data. This dataset also provides a second “anti‑elite” dimension. Of the 26,509 users, 8,249 (31%) have such calibrated scores. By projecting these users into the inferred embedding, the authors identify a dominant linear direction that aligns with the Left‑Right axis. Pearson correlation between the projection onto this direction and the external Left‑Right score is ρ = 0.887 (p < 0.0001), indicating a very strong linear relationship. By contrast, correlations with demographic variables (age, gender) are modest (|ρ| ≤ 0.13), and partial‑correlation analyses show that the political direction cannot be explained by these covariates. Additional user attributes—estimated age and gender from profile pictures (using the M3 model), follower count percentile, and topic interests (e.g., news) derived from a pretrained topic model—exhibit low absolute Spearman correlations with the political axis, reinforcing the conclusion that the recommender’s embedding encodes political preference independently of observable demographics.
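The direction-finding step can be illustrated with a minimal sketch: fit a linear map from embeddings to Left-Right scores, take the fitted weight vector as the "political" direction, and measure the Pearson correlation of the projection with the scores. The data below are synthetic stand-ins (a planted signal direction plus noise), not the study's embeddings or calibrated scores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: embeddings in which one latent unit direction carries
# an "ideology" signal on a 0-10-ish scale, plus noise. The real study uses
# the inferred X embeddings and externally calibrated Left-Right scores.
n, d = 2000, 16
w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)
emb = rng.normal(size=(n, d))
scores = emb @ w_true * 2.0 + 5.0 + rng.normal(scale=0.3, size=n)

# Identify the linear direction best aligned with the scores (least squares
# with an intercept column).
X = np.column_stack([emb, np.ones(n)])
w, *_ = np.linalg.lstsq(X, scores, rcond=None)
direction = w[:d] / np.linalg.norm(w[:d])

# Project users onto that direction and measure alignment (Pearson rho).
proj = emb @ direction
rho = float(np.corrcoef(proj, scores)[0, 1])
```

With the planted signal, the recovered direction nearly coincides with `w_true` and the projection correlates strongly with the scores; in the study, the analogous projection of real users yields ρ = 0.887 against the external Left-Right scores.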
The paper then discusses legal and ethical implications. GDPR, the EU Digital Services Act, and similar regulations in South Korea, Switzerland, and Brazil prohibit processing of political opinion data without explicit consent. The authors argue that the recommender’s latent political profiling blurs the line between “active” profiling (explicit data collection) and “passive” profiling (inference from behavioral signals), potentially challenging current regulatory definitions.
Finally, the authors explore a mitigation strategy: they construct an orthogonal projection that removes the identified political direction from all user embeddings before recommendation. Experiments show that after this projection, recommendation relevance (Precision@1) drops only modestly (to ≈0.68) while preserving diversity metrics, suggesting that privacy‑compliant recommender designs are feasible.
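The mitigation amounts to an orthogonal projection: removing a single unit direction v from every embedding via P = I − vvᵀ, so that, under inner-product scoring, no score contribution flows along v. A minimal sketch, with an arbitrary stand-in direction rather than the actual fitted political axis:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative setup: random embeddings and a hypothetical unit "political"
# direction v to be removed before recommendation scoring.
n, d = 100, 8
emb = rng.normal(size=(n, d))
v = rng.normal(size=d)
v /= np.linalg.norm(v)

# P = I - v v^T projects onto the orthogonal complement of v.
P = np.eye(d) - np.outer(v, v)
emb_clean = emb @ P          # P is symmetric, so emb @ P == emb @ P.T

# After projection every embedding is orthogonal to v, hence inner-product
# scores between projected users carry no component along v.
residual = float(np.abs(emb_clean @ v).max())
```

Because P only removes one of d dimensions, most of the inner-product structure survives, which is consistent with the paper's observation that Precision@1 drops only modestly after the projection.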
In sum, the study provides the first large‑scale empirical evidence that a mainstream social‑media recommender system learns a spatial representation of users that is highly predictive of their Left‑Right political stance, independent of demographic factors. It highlights a tension between AI‑driven personalization and data‑privacy law, and offers a concrete technical pathway—constrained embedding manipulation—to reconcile recommendation utility with regulatory compliance.