Measuring Individual User Fairness with User Similarity and Effectiveness Disparity
Individual user fairness is commonly understood as treating similar users similarly. In Recommender Systems (RSs), several evaluation measures exist for quantifying individual user fairness. These measures evaluate fairness via either: (i) the disparity in RS effectiveness scores regardless of user similarity, or (ii) the disparity in items recommended to similar users regardless of item relevance. Both disparity in recommendation effectiveness and user similarity are very important in fairness, yet no existing individual user fairness measure simultaneously accounts for both. In brief, current user fairness evaluation measures implement a largely incomplete definition of fairness. To fill this gap, we present Pairwise User unFairness (PUF), a novel evaluation measure of individual user fairness that considers both effectiveness disparity and user similarity. PUF is the only measure that can express this important distinction. We empirically validate that PUF does this consistently across 4 datasets and 7 rankers, and robustly when varying user similarity or effectiveness. In contrast, all other measures are either almost insensitive to effectiveness disparity or completely insensitive to user similarity. We contribute the first RS evaluation measure to reliably capture both user similarity and effectiveness in individual user fairness. Our code: https://github.com/theresiavr/PUF-individual-user-fairness-recsys.
💡 Research Summary
The paper addresses a fundamental gap in the evaluation of individual user fairness for recommender systems (RS). Existing fairness metrics either (i) measure disparity in effectiveness scores across users (e.g., standard deviation, Gini, envy-based measures) while ignoring how similar the users actually are, or (ii) incorporate user similarity but only compare the representation of the recommended item sets (the UF metric). Neither approach fully captures the principle that “similar users should receive similarly effective recommendations,” which is the canonical definition of individual fairness.
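The limitation of the type (i) measures can be illustrated with a small sketch. The scores, user names, and similar-user pairings below are invented for illustration, and `mean_gap` is a hypothetical diagnostic, not PUF and not a measure from the paper. The sketch only shows that standard deviation and the Gini index depend on the score distribution alone, so they cannot tell apart two situations that differ only in which users are similar to each other.

```python
from statistics import pstdev

def gini(values):
    """Gini index of non-negative scores (0 means perfectly equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula over ascending scores.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

def mean_gap(scores, similar_pairs):
    """Hypothetical diagnostic: mean absolute score gap over similar pairs."""
    gaps = [abs(scores[u] - scores[v]) for u, v in similar_pairs]
    return sum(gaps) / len(gaps)

# Assumed per-user effectiveness scores (e.g., NDCG@k); values are made up.
scores = {"a": 0.9, "b": 0.9, "c": 0.1, "d": 0.1}

# Two assumed similarity structures over the same users and scores.
pairs_consistent = [("a", "b"), ("c", "d")]  # similar users score alike
pairs_divergent = [("a", "c"), ("b", "d")]   # similar users score differently

# Effectiveness-only measures never look at the pairings.
spread_std = pstdev(list(scores.values()))
spread_gini = gini(scores.values())

gap_consistent = mean_gap(scores, pairs_consistent)
gap_divergent = mean_gap(scores, pairs_divergent)

print(f"std={spread_std:.2f} gini={spread_gini:.2f}")
print(f"gap (consistent)={gap_consistent:.2f} gap (divergent)={gap_divergent:.2f}")
```

In both scenarios the standard deviation and Gini index are identical because they receive the same scores, while the gap between similar users differs sharply. That difference is the kind of information an effectiveness-only measure discards.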
To fill this void, the authors propose Pairwise User unFairness (PUF), a novel metric that simultaneously accounts for (a) the similarity between any two users and (b) the absolute difference in a chosen effectiveness score (such as P@k or NDCG@k) for those users. Formally, for a set U of m users, let sim(u,u′)∈