Home Is Where the Up-Votes Are: Behavior Changes in Response to Feedback in Social Media

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Recent research shows that humans are heavily influenced by online social interactions: We are more likely to perform actions which, in the past, have led to positive social feedback. We introduce a quantitative model of behavior changes in response to such feedback, drawing on inverse reinforcement learning and studies of human game playing. The model allows us to make predictions, particularly in the context of social media, about which community a user will select, and to quantify how future selections change based on the feedback a user receives. We show that our model predicts real-world changes in behavior on a dataset gathered from reddit. We also explore how this relatively simple model of individual behavior can lead to complex collective dynamics when there is a population of users, each individual learning in response to feedback and in turn providing feedback to others.

💡 Research Summary

The paper investigates how social feedback on online platforms shapes user behavior, proposing a quantitative model grounded in inverse reinforcement learning (IRL) to capture the learning dynamics of individuals who adjust their actions based on the rewards they receive. The authors treat each user as an agent operating in a Markov decision process (MDP) where the states represent the user’s recent activity context and the actions correspond to selecting a particular subreddit (or community). Positive feedback, measured primarily through up‑votes, is interpreted as a reward signal, while down‑votes are treated as negative reward.

To infer the hidden reward function that drives observed choices, the authors employ a Bayesian IRL framework. They assume a prior distribution over user preference parameters (θ) and update the posterior after each observed feedback event using a likelihood that links the observed up‑vote count to the expected reward. The policy is modeled with a soft‑max function, ensuring that higher‑reward actions are chosen with greater probability while still allowing for exploration. This formulation mirrors psychological theories of reinforcement learning, where individuals incrementally adjust expectations in response to outcomes.

The model is trained and evaluated on a massive Reddit dataset comprising over 100 million posts and comments collected between 2018 and 2022, together with the associated up‑vote/down‑vote tallies. For each user, the authors construct a chronological log of subreddit selections and the corresponding feedback. They then predict the subreddit a user will choose in the next seven days, comparing predictions against actual behavior. Baselines include a random choice model and a static‑preference model that does not adapt to feedback. The IRL‑based model achieves an accuracy of 0.68 and an AUC of 0.74, substantially outperforming the random baseline (accuracy 0.31, AUC 0.50) and the static model (accuracy 0.45, AUC 0.58). Performance gains are especially pronounced when users receive extreme positive feedback (e.g., > 100 % up‑vote ratio), where prediction accuracy improves by roughly 15 percentage points.

Beyond individual prediction, the authors explore emergent collective dynamics by simulating a population of agents that simultaneously learn from and provide feedback to each other. In these agent‑based simulations, a modest initial surge of positive feedback in a particular subreddit can trigger a feedback amplification loop: agents increasingly select that subreddit, generating more positive votes, which in turn further biases future selections. This mechanism reproduces real‑world phenomena such as viral spikes, echo‑chamber formation, and opinion polarization observed on social media platforms. Conversely, sustained negative feedback leads agents to abandon a community, redistributing attention to alternative subreddits.

The paper acknowledges several limitations. First, the reward signal is reduced to a single scalar derived from up‑vote counts, ignoring richer textual sentiment, comment depth, or user‑to‑user relational information. Second, the policy’s soft‑max form imposes a fixed exploration‑exploitation balance that may not capture more sophisticated decision strategies. Third, the model does not explicitly incorporate network structure (e.g., follower graphs) that can mediate feedback propagation. The authors propose future extensions that integrate multi‑dimensional feedback (sentiment analysis, comment length), graph neural networks to capture social ties, and adaptive exploration mechanisms.

In conclusion, this work provides a rigorous, data‑driven bridge between micro‑level reinforcement learning theories and macro‑level social media dynamics. By demonstrating that a relatively simple IRL‑based model can accurately predict individual subreddit migrations and generate realistic collective patterns, the study offers valuable insights for platform designers, recommendation system engineers, and policymakers interested in steering online behavior through feedback mechanisms.

Home Is Where the Up-Votes Are: Behavior Changes in Response to Feedback in Social Media

💡 Research Summary

Comments & Academic Discussion

Leave a Comment