From Birdwatch to Community Notes, from Twitter to X: four years of community-based content moderation
Community Notes (formerly known as Birdwatch), launched by X (formerly known as Twitter) in January 2021, is the first large-scale crowdsourced content moderation initiative. As the Community Notes model gains momentum across other social media platforms, there is a growing need to assess its underlying dynamics and effectiveness. This resource paper provides (a) a systematic review of the literature on Community Notes, and (b) a major curated dataset and accompanying source code to support future research on Community Notes. We parsed Notes and Ratings data from the first four years of the program and conducted language detection across all Notes. Focusing on English-language Notes, we extracted embedded URLs and identified discussion topics in each Note. Additionally, we constructed monthly interaction networks among the Contributors. Together with the literature review, these resources offer a robust foundation for advancing research on the Community Notes system.
💡 Research Summary
This paper presents a comprehensive resource on X’s Community Notes (formerly Birdwatch), the first large‑scale crowdsourced content‑moderation system launched in January 2021. The authors first situate the work within the broader literature on misinformation, noting the limitations of expert‑only fact‑checking and purely algorithmic moderation, and argue that collective judgment can complement these approaches. A systematic literature review summarises prior findings: Community Notes is effective at flagging misinformation, often earlier than professional fact‑checkers, and enjoys high agreement with expert assessments; however, it suffers from latency, partisan rating bias, and vulnerability to manipulation.
The core contribution is a curated dataset covering the first four years of the program. Using the X API, the authors extracted all Notes and associated Ratings, performed language detection that identified Notes in over 150 languages, and focused subsequent analyses on English‑language Notes. They extracted embedded URLs to assess source diversity, and applied a hybrid topic‑modeling pipeline (LDA + BERTopic) to identify roughly 30 coherent topics, revealing that political discussions exhibit the strongest ideological polarization while non‑political topics are comparatively neutral.
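The URL-extraction step can be sketched with a simple pass over raw Note text. This is a minimal illustration, not the authors' pipeline: the regex pattern, the sample Notes, and the reduction to registrable domains are all assumptions made for the example.

```python
import re
from urllib.parse import urlparse
from collections import Counter

# Hypothetical URL pattern; the paper does not specify the extraction rules used.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")

def extract_domains(note_text: str) -> list[str]:
    """Pull embedded URLs out of a Note and reduce them to bare domains."""
    urls = URL_RE.findall(note_text)
    return [urlparse(u).netloc.lower().removeprefix("www.") for u in urls]

# Toy Notes, invented for illustration only.
notes = [
    "Misleading; see https://www.reuters.com/fact-check/example and https://apnews.com/article/x",
    "Context here: https://www.reuters.com/world/another-story",
]

# Tallying source domains across Notes is one simple way to gauge source diversity.
domain_counts = Counter(d for n in notes for d in extract_domains(n))
print(domain_counts)  # Counter({'reuters.com': 2, 'apnews.com': 1})
```

A real analysis would also need to resolve shortened links (e.g. t.co redirects) before counting domains, which this sketch omits.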
In addition to content analysis, the paper constructs monthly directed weighted interaction networks among Contributors based on rating exchanges. Community detection (Louvain) uncovers distinct partisan clusters, confirming earlier observations that early rating mechanisms (simple “most helpful” counts) reinforced echo chambers. The later introduction of a “bridging algorithm” that weights ratings from users on opposite ends of an opinion spectrum modestly increased cross‑ideological agreement, but the network remains dominated by a small set of highly active contributors. This concentration contributes to the observed average delay of 24.29 hours for a Note to achieve “helpful” status, far longer than the two‑hour window in which 99.3% of misleading posts receive fact‑checking comments elsewhere, meaning many Notes appear after a post’s viral peak.
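The construction of the monthly networks can be sketched as follows: an edge u → v gains one unit of weight each time Contributor u rates a Note authored by Contributor v within a given month. This is a stdlib-only sketch under assumed data shapes; the event tuples and field order are hypothetical, and community detection such as Louvain would then be run on each monthly graph with a library like networkx.

```python
from collections import defaultdict

# Hypothetical rating events: (rater_id, note_author_id, month).
# In the real data, Ratings reference Notes, and Notes reference their authors.
ratings = [
    ("alice", "bob", "2021-03"),
    ("alice", "bob", "2021-03"),
    ("carol", "bob", "2021-03"),
    ("bob", "alice", "2021-04"),
]

def monthly_networks(events):
    """Build one directed weighted graph per month.

    nets[month][u][v] = number of ratings u gave to Notes written by v.
    """
    nets = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for rater, author, month in events:
        nets[month][rater][author] += 1
    return nets

nets = monthly_networks(ratings)
print(nets["2021-03"]["alice"]["bob"])  # 2
```

Keeping the graphs directed and weighted preserves both who initiates the interaction and how often it recurs, which is what concentration among a few highly active contributors shows up in.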
The authors release the full dataset, preprocessing scripts, and analysis code, enabling reproducibility and future work on algorithmic refinements, bias mitigation, and real‑time moderation. They acknowledge limitations such as API rate caps, evolving privacy policies, and potential missing data. Finally, the paper discusses policy implications: improving rater diversity, reducing publication latency, and safeguarding against coordinated attacks are essential for scaling community‑driven moderation without compromising accuracy or fairness. Overall, the study provides a solid empirical foundation for scholars and platform designers seeking to understand and enhance crowd‑based moderation systems.