TweetCred: Real-Time Credibility Assessment of Content on Twitter


During sudden-onset crisis events, the presence of spam, rumors, and fake content on Twitter reduces the value of the information contained in its messages (or “tweets”). A possible solution to this problem is to use machine learning to automatically evaluate the credibility of a tweet, i.e., whether a person would deem the tweet believable or trustworthy. This has often been framed and studied as a supervised classification problem in an offline (post-hoc) setting. In this paper, we present a semi-supervised ranking model for scoring tweets according to their credibility. This model is used in TweetCred, a real-time system that assigns a credibility score to tweets in a user’s timeline. TweetCred, available as a browser plug-in, was installed and used by 1,127 Twitter users within a span of three months. During this period, the credibility score for about 5.4 million tweets was computed, allowing us to evaluate TweetCred in terms of response time, effectiveness, and usability. To the best of our knowledge, this is the first research work to develop a real-time system for credibility on Twitter and to evaluate it on a user base of this size.


💡 Research Summary

The paper presents TweetCred, a real‑time system that assesses the credibility of individual tweets, particularly during sudden‑onset crises when misinformation spreads rapidly on Twitter. Unlike prior work that treats credibility detection as an offline classification problem, TweetCred continuously scores tweets as users browse their timelines. The authors collected a large corpus of tweets (over 10 million) from six high‑impact events in 2013 (Boston Marathon bombings, Typhoon Haiyan/Yolanda, Cyclone Phailin, Washington Navy Yard shootings, a polar vortex cold wave, and Oklahoma tornadoes) using Twitter’s streaming API. From each event they randomly selected 500 tweets and obtained crowdsourced annotations via CrowdFlower. Annotation proceeded in two stages: first, workers judged whether a tweet was related to the event (R1–R3); second, only tweets deemed informative (R1) were further labeled for credibility (C1–C3). The final distribution among informative tweets was 52 % “definitely credible”, 35 % “seems credible”, and 13 % “definitely incredible”.
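The two-stage annotation pipeline above can be sketched as a simple filter: stage-1 relatedness votes (R1–R3) are aggregated by majority, and only tweets whose majority label is R1 (informative) receive a final credibility label (C1–C3) from stage 2. The function and variable names below are illustrative, not the paper's; a minimal sketch assuming simple majority aggregation of crowd votes:

```python
from collections import Counter

def majority_label(votes):
    """Return the most common label among crowd votes for one tweet."""
    return Counter(votes).most_common(1)[0][0]

def two_stage_filter(stage1_votes, stage2_votes):
    """Keep credibility labels only for tweets judged informative (R1)
    in stage 1; all other tweets are dropped from the labeled set."""
    labels = {}
    for tweet_id, votes in stage1_votes.items():
        if majority_label(votes) == "R1" and tweet_id in stage2_votes:
            labels[tweet_id] = majority_label(stage2_votes[tweet_id])
    return labels

# t1 is majority-R1 and advances to credibility labeling; t2 is not.
votes1 = {"t1": ["R1", "R1", "R2"], "t2": ["R3", "R3", "R1"]}
votes2 = {"t1": ["C1", "C1", "C2"], "t2": ["C3", "C3", "C3"]}
print(two_stage_filter(votes1, votes2))  # → {'t1': 'C1'}
```

The same structure would apply however the actual crowd votes were aggregated; majority voting is the assumption here.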

To enable real‑time scoring, the authors engineered 45 features that can be extracted from a single tweet without requiring extensive historical data. These features fall into six groups: tweet meta‑data (timestamp, source, geolocation), simple content metrics (character count, word count, number of URLs, hashtags, unique characters, presence of “via”, emoticons, etc.), linguistic cues (sentiment words, profanity, pronouns, self‑references), author attributes (followers, friends, account age, location), network signals (retweet count, mentions, reply status), and external reputation scores (Web of Trust rating for URLs, YouTube like/dislike ratio). Feature importance analysis showed that tweet‑based signals dominate; the top ten features include the presence of “via”, character count, unique character count, word count, user location, retweet count, tweet age, URL presence, and the statuses/followers and friends/followers ratios.
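The simple content metrics in the second group can all be computed from the tweet text alone, which is what makes them usable in real time. A minimal sketch of a few of them follows; the dictionary keys are illustrative and do not reproduce the paper's exact feature names:

```python
import re

def simple_content_features(tweet_text):
    """Extract a handful of single-tweet content features of the kind
    the paper describes: character/word counts, unique characters,
    URL and hashtag counts, and presence of the word 'via'."""
    return {
        "char_count": len(tweet_text),
        "word_count": len(tweet_text.split()),
        "unique_chars": len(set(tweet_text)),
        "num_urls": len(re.findall(r"https?://\S+", tweet_text)),
        "num_hashtags": tweet_text.count("#"),
        "has_via": "via" in tweet_text.lower().split(),
    }

feats = simple_content_features(
    "Evacuations underway via @agency http://example.com #update"
)
```

Because none of these features depend on an author's history or on follow-up API calls, they can be computed in microseconds per tweet, which matches the system's real-time constraint.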

For the learning‑to‑rank component, the authors evaluated four algorithms commonly used in information retrieval: AdaRank, Coordinate Ascent, RankBoost, and SVM‑rank. Using 4‑fold cross‑validation on the labeled data, they measured performance with Normalized Discounted Cumulative Gain (NDCG) at cut‑offs 25, 50, 75, and 100. AdaRank and Coordinate Ascent achieved the highest NDCG scores, but required substantially longer training times (≈1 minute). SVM‑rank delivered comparable NDCG (within 0.02 of the best) while training in under 10 seconds and testing in less than a second, making it suitable for frequent model updates based on user feedback. Consequently, SVM‑rank was selected for the production system.
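NDCG@k, the metric used to compare the four rankers, rewards placing highly credible tweets near the top of the ranking and normalizes against the ideal ordering. A standard formulation is sketched below (one common gain definition; the paper's exact variant may differ):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for a ranked list of graded relevance scores.
    `relevances` is ordered by the system's ranking; the ideal DCG
    comes from sorting the same grades in descending order."""
    def dcg(rels):
        # Discounted cumulative gain: gain r at rank i is r / log2(i + 2).
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    actual = dcg(relevances[:k])
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

# A perfect ranking scores 1.0; a mis-ordered one scores less.
print(ndcg_at_k([3, 2, 1], 3))  # → 1.0
print(ndcg_at_k([1, 2, 3], 3) < 1.0)  # → True
```

On this metric, an NDCG gap of 0.02 between SVM-rank and the best performer is small, which is why the much faster training time tipped the choice toward SVM-rank.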

TweetCred was deployed as a browser extension (the most widely used interface), a web application, and a RESTful API. When a user loads their Twitter timeline, the extension extracts tweet IDs, sends them to the server, which computes credibility scores using the SVM‑rank model and returns a 1‑to‑7 rating displayed alongside each tweet. Performance evaluation on real users (1,127 participants over three months) showed that 80 % of scores were computed and displayed within six seconds. In a post‑deployment survey, 63 % of users either agreed with the automatically generated scores or disagreed by only one or two points on the 1‑7 scale, indicating reasonable alignment with human judgments.
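The server returns a rating on a 1-to-7 scale, but SVM-rank itself outputs unbounded real-valued scores, so some mapping onto the display scale is needed. The paper excerpt above does not specify that mapping; the linear rescaling below is purely an assumption for illustration:

```python
def to_display_scale(raw_score, min_score, max_score):
    """Hypothetical linear mapping from a raw ranking score to the
    1-7 scale shown next to each tweet. The production mapping is
    not described here; this is an illustrative stand-in."""
    if max_score == min_score:
        return 4  # degenerate case: fall back to the scale midpoint
    frac = (raw_score - min_score) / (max_score - min_score)
    return 1 + round(frac * 6)

# Scores at the extremes map to the ends of the scale.
print(to_display_scale(0.0, 0.0, 1.0))  # → 1
print(to_display_scale(1.0, 0.0, 1.0))  # → 7
```

Whatever the actual mapping, the survey result (63% of users within two points of the shown score) is measured on this 1-7 display scale, not on the raw ranker output.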

The study’s contributions are threefold: (1) a semi‑supervised ranking model that leverages a rich set of real‑time tweet features; (2) a fully operational real‑time credibility system evaluated on a sizable user base; (3) empirical evidence that such a system can deliver timely and acceptable credibility assessments during crisis events. Limitations include reliance on crowdsourced annotations from U.S. workers, potential cultural bias, and a focus on text‑based features, which may not capture multimodal content (images, videos). Future work aims to expand multilingual labeling, incorporate multimodal signals, and develop online learning mechanisms that adapt the model continuously from user feedback, thereby improving robustness and personalization of credibility scores.

