Agent-Based User-Adaptive Filtering for Categorized Harassing Communication
We propose an agent-based framework for personalized filtering of categorized harassing communication in online social networks. Unlike global moderation systems that apply uniform filtering rules, our approach models user-specific tolerance levels and preferences through adaptive filtering agents. These agents learn from user feedback and dynamically adjust filtering thresholds across multiple harassment categories, including offensive, abusive, and hateful content. We implement and evaluate the framework using supervised classification techniques and simulated user interaction data. Experimental results demonstrate that adaptive agents improve filtering precision and user satisfaction compared to static models. The proposed system illustrates how agent-based personalization can enhance content moderation while preserving user autonomy in digital social environments.
💡 Research Summary
The paper addresses a fundamental limitation of current content‑moderation systems on social media: they apply a single, globally trained classifier to all users, implicitly assuming that everyone perceives harassment in the same way. Drawing on psychological and behavioral research that documents substantial inter‑individual variability in harassment perception, the authors propose an agent‑based, user‑adaptive filtering framework that personalizes moderation according to each user’s tolerance and preferences.
First, the authors construct a detailed taxonomy of harassment types. Starting from an extensive literature review, they identify twelve fine‑grained categories (e.g., general harassment, cruel statements, religious/racial/ethnic slurs, sexual‑orientation‑based harassment, gender‑based harassment, threats, multiple‑type harassment, and non‑harassment). For each category they compile representative keywords, which they use to query the Twitter streaming API. From an initial pool of 8,000 tweets, 5,231 remain after cleaning and manual labeling by three lab members, with each tweet assigned its majority‑vote category label. The labeling process itself reveals considerable disagreement among the annotators, underscoring the subjective nature of harassment perception.
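The majority‑vote aggregation described above can be sketched in a few lines. This is a minimal illustration, not the authors' exact pipeline; the category names used in the example are hypothetical placeholders.

```python
from collections import Counter

def majority_label(annotations):
    """Return the majority-vote label for one tweet's annotations.

    Returns None when no label has a strict majority (with three
    annotators, this means all three disagreed).
    """
    counts = Counter(annotations).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no majority label, tweet would be discarded
    return counts[0][0]

# Two of three annotators agree -> keep the tweet with that label
kept = majority_label(["threat", "threat", "slur"])
# All three disagree -> no majority, tweet is dropped
dropped = majority_label(["threat", "slur", "cruel"])
```

Counting how often `majority_label` returns `None` across the corpus gives a rough measure of the annotator disagreement the paper reports.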
To capture real‑world user preferences, the authors conduct a large‑scale crowdsourced survey on Amazon Mechanical Turk. They recruit 360 workers, each of whom evaluates 75 tweets drawn from all categories. For every tweet, participants answer two questions: (1) the perceived harassment intensity (None, Minimal, Moderate, High, Extreme) and (2) whether they would like the tweet filtered from their personal feed (Yes/No). Each tweet is evaluated by five different workers, yielding roughly 26,500 responses.
Statistical analysis of the survey data demonstrates three key findings. An ANOVA test shows that perceived intensity significantly influences the likelihood of a user choosing to filter a tweet (p < 0.01). Post‑hoc Tukey HSD confirms that the “Extreme” intensity level differs markedly from all lower levels. A Wilcoxon rank‑sum test further reveals that filtering preferences vary across harassment categories; for example, religious/racial slurs are filtered at a higher rate than general harassment at the same intensity level. These results collectively validate the hypothesis that both harassment category and intensity jointly shape individual filtering decisions.
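The one‑way ANOVA used here tests whether mean filtering rates differ across intensity levels. A self‑contained sketch of the F‑statistic computation is shown below on synthetic filter decisions (1 = filter, 0 = allow); the group values are invented for illustration and are not the paper's data.

```python
def anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)                      # number of groups (intensity levels)
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Synthetic per-respondent filter decisions at three intensity levels
low      = [0, 0, 0, 1, 0]
moderate = [0, 1, 1, 0, 1]
extreme  = [1, 1, 1, 1, 1]
f_stat = anova_f([low, moderate, extreme])  # large F => intensity matters
```

A large F relative to the F‑distribution's critical value (for the corresponding degrees of freedom) is what underlies the paper's p < 0.01 result; in practice one would use a statistics library such as `scipy.stats.f_oneway` rather than computing it by hand.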
Motivated by this evidence, the authors design a personalized agent for each user. The agent treats the user’s own survey responses as training data, fitting a lightweight classifier (logistic regression, SVM, or a shallow neural network). The agent continuously updates its decision threshold as the user provides additional feedback, thereby adapting to evolving tolerance levels. The system architecture consists of three layers: (1) a data ingestion module that streams tweets from the platform, (2) a per‑user agent that predicts “filter” vs. “allow” for each incoming tweet, and (3) a feedback loop where the user’s explicit choices are fed back to retrain the agent.
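The feedback‑driven threshold adaptation in layer (3) can be illustrated with a toy agent. This is a simplified sketch under assumed update rules (the learning rate, margin, and update form are illustrative choices, not the authors' implementation): the agent thresholds a classifier's harassment score and nudges the threshold whenever the user's explicit choice contradicts its prediction.

```python
class FilterAgent:
    """One agent per user: thresholds a harassment score from a classifier."""

    def __init__(self, threshold=0.5, lr=0.2, margin=0.05):
        self.threshold = threshold  # scores at or above this are filtered
        self.lr = lr                # how quickly the agent adapts to feedback
        self.margin = margin        # small overshoot so the threshold crosses the score

    def decide(self, score):
        """Predict 'filter' or 'allow' for an incoming tweet's score."""
        return "filter" if score >= self.threshold else "allow"

    def feedback(self, score, wants_filtered):
        """Move the threshold toward agreeing with the user's explicit choice."""
        decision = self.decide(score)
        if wants_filtered and decision == "allow":
            # Agent showed a tweet the user wanted hidden: lower the bar.
            self.threshold -= self.lr * (self.threshold - score + self.margin)
        elif not wants_filtered and decision == "filter":
            # Agent hid a tweet the user wanted to see: raise the bar.
            self.threshold += self.lr * (score - self.threshold + self.margin)

agent = FilterAgent()
before = agent.decide(0.3)          # initially allowed (0.3 < 0.5)
for _ in range(20):                 # user repeatedly asks to filter such tweets
    agent.feedback(0.3, True)
after = agent.decide(0.3)           # threshold has adapted; now filtered
```

In the paper's full design, the score itself would come from the per‑user classifier (logistic regression, SVM, or a shallow network) retrained on accumulated feedback; the sketch isolates only the threshold‑adaptation loop.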
The authors evaluate three configurations: (a) a general classifier trained on the entire labeled tweet set, (b) a majority‑vote baseline that applies the average user's filtering choices, and (c) the proposed user‑adaptive agents. Evaluation metrics include precision, recall, F1‑score, and a post‑experiment user‑satisfaction questionnaire. The personalized agents achieve a precision of 0.87, recall of 0.82, and F1 of 0.84, substantially outperforming the general model (precision 0.71, recall 0.68, F1 0.69) and the majority‑vote baseline. Moreover, users report a 15% increase in satisfaction when interacting with their own adaptive agents.
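For completeness, the evaluation metrics above are standard and easy to reproduce. The sketch below computes precision, recall, and F1 from binary filter/allow decisions (1 = filter); the example labels are synthetic, not the paper's results.

```python
def prf1(y_true, y_pred):
    """Precision, recall, and F1 for binary filter decisions (1 = filter)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)        # correctly filtered
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)    # filtered but wanted
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)    # missed harassment
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Synthetic example: user wanted the first three tweets filtered
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
p, r, f1 = prf1(y_true, y_pred)
```

Here precision penalizes over‑filtering (hiding tweets the user wanted to see) while recall penalizes under‑filtering, which is why the paper reports both alongside F1.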
The paper’s contributions are fivefold: (1) a psychologically grounded, multi‑category harassment taxonomy, (2) a large‑scale dataset of more than 26,000 user‑level sensitivity responses, (3) rigorous statistical validation of inter‑user variability, (4) the design and implementation of an agent‑based adaptive filtering architecture, and (5) empirical evidence that personalization yields superior moderation performance and user experience.
In conclusion, the study demonstrates that effective online safety mechanisms must move beyond one‑size‑fits‑all policies toward user‑centric, adaptive moderation. Future work is suggested to deploy the agents in live social‑media environments, monitor long‑term behavioral effects, and extend the framework to other languages and cultural contexts.