A Simulation Framework for Studying Recommendation-Network Co-evolution in Social Platforms
Studying how recommendation systems reshape social networks is difficult on live platforms: confounds abound, and controlled experiments risk user harm. We present an agent-based simulator in which content production, tie formation, and a graph attention network (GAT) recommender co-evolve in a closed loop. We calibrate parameters using Mastodon data and validate out-of-sample against Bluesky (4–6% error on structural metrics; 10–15% on held-out temporal splits). Across 18 configurations at 100 agents, we find that *activation timing* affects outcomes: introducing recommendations at t = 10 vs. t = 40 decreases transitivity by 10% while engagement differs by <8%. Delaying activation increases content diversity by 9% while reducing modularity by 4%. Scaling experiments (n up to 5,000) show the effect persists but attenuates. Jacobian analysis confirms local stability under bounded reactance parameters. We release configuration schemas and reproduction scripts.
💡 Research Summary
The paper introduces a novel agent‑based simulation framework that tightly couples three core processes of a social platform: content creation, tie formation, and a learned recommendation system. Unlike prior work that treats the network as static or injects recommendations as exogenous inputs, this framework implements a closed‑loop where agents’ exposure to recommended content influences whom they follow, which in turn reshapes the graph on which the recommender is trained. The authors calibrate the model using real‑world Mastodon data (user activity, follow events, and content logs) and validate it out‑of‑sample on a separate Bluesky dataset, achieving 4–6 % error on structural metrics (density, clustering, modularity) and 10–15 % error on temporal splits, demonstrating that the simulator captures key dynamics of real federated platforms.
The simulation operates in discrete timesteps. Each agent belongs to one of two types—Casual or Enthusiast—characterized by different content production frequencies (0.05 vs 0.20 per step) and type‑specific sensitivity parameters that govern how satisfaction changes in response to content similarity. Content items live in a 30‑dimensional topic space; agents consume items based on cosine similarity, update a satisfaction score, and may share the item with a probability that follows a smooth sigmoid function. Tie formation follows a logistic model that combines similarity and recent interaction engagement, with parameters (γ0, γ1, γ2) tuned to reproduce Mastodon’s observed follow‑probability curve. Exploration is modeled via a temperature‑controlled softmax over similarity scores, allowing the authors to vary how broadly agents sample content beyond their immediate preferences.
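The per-step mechanics above can be sketched in a few functions. This is a minimal illustration, not the authors' code: the sigmoid steepness/midpoint and the (γ0, γ1, γ2) values shown here are placeholder assumptions, since the paper's calibrated values are not given in this summary.

```python
import numpy as np

def cosine_similarity(a, b):
    # Similarity between two vectors in the 30-dimensional topic space.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def share_probability(similarity, steepness=8.0, midpoint=0.5):
    # Smooth sigmoid share probability (steepness/midpoint are illustrative).
    return 1.0 / (1.0 + np.exp(-steepness * (similarity - midpoint)))

def follow_probability(similarity, engagement, g0=-3.0, g1=4.0, g2=2.0):
    # Logistic tie-formation model combining similarity and recent engagement;
    # (g0, g1, g2) stand in for the calibrated (γ0, γ1, γ2).
    z = g0 + g1 * similarity + g2 * engagement
    return 1.0 / (1.0 + np.exp(-z))

def exploration_sample(scores, temperature=0.5, rng=None):
    # Temperature-controlled softmax over similarity scores: higher temperature
    # flattens the distribution, so agents sample content more broadly.
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(scores, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))
```

Raising `temperature` pushes the softmax toward uniform sampling, which is how the framework varies how far agents stray from their immediate preferences.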
The recommendation component is a Graph Attention Network (GAT) that learns user and content embeddings by aggregating neighbor features with attention weights. Crucially, the recommender is retrained every five timesteps, reflecting the need to adapt to concept drift as the graph evolves; without this retraining, Precision@10 drops from 0.85 to 0.71 by step 50.
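The retraining cadence and the Precision@10 metric mentioned above are straightforward to express; a small sketch follows. The function names and the idea of gating retraining on the step counter are assumptions for illustration, not the authors' implementation.

```python
def precision_at_k(recommended, engaged, k=10):
    # Precision@k: fraction of the top-k recommended items that the agent
    # actually engaged with.
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in set(engaged))
    return hits / len(top_k)

def should_retrain(step, cadence=5):
    # Retraining cadence from the paper: refresh the recommender every five
    # timesteps so its embeddings track the evolving graph.
    return step > 0 and step % cadence == 0
```

In a simulation loop, `should_retrain(t)` would trigger refitting the GAT on the current graph snapshot; skipping it is what produces the reported Precision@10 decay from 0.85 to 0.71 by step 50.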
The authors conduct a factorial sweep of 18 experimental configurations, varying (1) the activation time of the recommender (early at t = 10 versus late at t = 40), (2) the proportion of Enthusiast agents (α = 0.2, 0.5, 0.8), (3) the exploration temperature (r_explore = 0.2, 0.5, 0.8), and (4) the recommender type (GAT versus a simple content‑similarity baseline). Each configuration runs with 100 agents for 200 timesteps, repeated over 30 random seeds. The authors measure structural outcomes (transitivity ρ, local clustering C, modularity Q, average path length ℓ) and content‑centric outcomes (topic‑entropy diversity H, satisfaction‑based retention, and recommendation accuracy).
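One way to enumerate the sweep is a cross-product of the factor levels. Note an assumption here: crossing the first three factors yields exactly the stated 18 configurations, so this sketch treats the GAT-vs-baseline comparison as a separate dimension run per configuration rather than part of the 18-way crossing.

```python
from itertools import product

activation_times = (10, 40)           # early vs. late recommender activation
enthusiast_shares = (0.2, 0.5, 0.8)   # α, proportion of Enthusiast agents
explore_temps = (0.2, 0.5, 0.8)       # r_explore, exploration temperature

# 2 x 3 x 3 = 18 configurations.
configs = [
    {"t_activate": t, "alpha": a, "r_explore": r}
    for t, a, r in product(activation_times, enthusiast_shares, explore_temps)
]

SEEDS = range(30)  # 30 random seeds per configuration
runs = [(cfg, seed) for cfg in configs for seed in SEEDS]
```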
Key findings:
- Timing matters – Activating recommendations early (t = 10) reduces transitivity from 0.27 to 0.24 (≈10 % drop), while delaying activation to t = 40 raises content diversity by about 9 %; engagement metrics differ by less than 8 % in either case. The early recommender appears to “seed” a more dispersed network, mitigating echo‑chamber formation.
- User‑type composition – A higher Enthusiast share (α = 0.8) cuts transitivity by 14 % and local clustering by 9 %, reflecting that prolific creators introduce a broader set of topics and connections, acting as bridges across communities.
- Exploration – Higher exploration temperature (r_explore) boosts diversity by 7–12 % but reduces modularity by 3–5 %, indicating that broader content sampling blurs community boundaries.
- Algorithmic comparison – Both GAT and a similarity‑based recommender exhibit the timing effect, but GAT yields a modest 5 % lift in retention and higher AUC (0.85 vs. 0.80) when retrained regularly. Without periodic retraining, performance degrades sharply, underscoring the importance of updating the model as the graph changes.
- Scale robustness – Scaling the agent count to 5,000 preserves the direction of the timing effect, though the magnitude attenuates (≈6 % transitivity reduction versus 10 % at 100 agents). This suggests that while larger networks dilute early‑condition sensitivity, algorithmic interventions still shape macro‑structure.
- Stability analysis – The authors compute the Jacobian of the (discrete‑time) state‑transition function and show that, under bounded reactance parameters (β, γ, κ), all eigenvalues lie inside the unit circle (spectral radius below 1), confirming local stability and preventing divergent dynamics.
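Since the simulation evolves in discrete timesteps, local stability of a fixed point of the map x_{t+1} = F(x_t) corresponds to the Jacobian's spectral radius being below 1. A minimal sketch of that check follows; the 2×2 Jacobian values are placeholders, not numbers from the paper.

```python
import numpy as np

def is_locally_stable(jacobian):
    # A fixed point of a discrete-time map is locally (asymptotically) stable
    # when every Jacobian eigenvalue has magnitude strictly below 1,
    # i.e. the spectral radius is < 1.
    eigenvalues = np.linalg.eigvals(np.asarray(jacobian, dtype=float))
    spectral_radius = float(np.max(np.abs(eigenvalues)))
    return spectral_radius < 1.0

# Illustrative Jacobian evaluated at a fixed point (placeholder values):
J = np.array([[0.6, 0.2],
              [0.1, 0.5]])
```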
To ensure reproducibility, the authors release configuration schemas, seed lists, and full Python scripts (including data preprocessing, calibration, simulation loop, and analysis) under an open‑source license. They also provide 95 % bootstrapped confidence intervals for all reported metrics.
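The 95 % bootstrapped intervals can be reproduced with a standard percentile bootstrap over the 30 per-seed metric values; the sketch below is a generic illustration and makes no claim about the authors' exact procedure (e.g. percentile vs. BCa).

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=10_000, level=0.95, seed=0):
    # Percentile bootstrap CI for a metric computed across simulation seeds:
    # resample with replacement, recompute the statistic, take percentiles.
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    stats = np.array([
        stat(rng.choice(values, size=len(values), replace=True))
        for _ in range(n_boot)
    ])
    lo, hi = np.percentile(stats, [(1 - level) / 2 * 100,
                                   (1 + level) / 2 * 100])
    return float(lo), float(hi)
```

For example, `bootstrap_ci(transitivity_per_seed)` over the 30 seed-level transitivity values would yield the kind of interval reported alongside each metric.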
Overall, the paper delivers the first empirically calibrated, closed‑loop simulation that integrates learned recommendation models with dynamic social graphs. It demonstrates that the point at which recommendations are introduced can have a disproportionate impact on network clustering and content diversity, while engagement remains relatively stable. The framework offers a practical sandbox for platform operators of federated services (e.g., Mastodon, Bluesky) to test “what‑if” scenarios—such as varying the proportion of high‑producing users or adjusting exploration rates—before deploying real‑world algorithms. Future extensions could incorporate multi‑platform interactions, richer user behavior (e.g., commenting, reporting), and fairness or political‑diversity metrics, further bridging the gap between simulation and policy‑relevant insights.