Adapting sentiment analysis for tweets linking to scientific papers
In the context of altmetrics, tweets have been discussed as potential indicators of immediate and broader societal impact of scientific documents. However, it is not yet clear to what extent Twitter captures actual research impact. A small case study (Thelwall et al., 2013b) suggests that tweets to journal articles neither comment on nor express any sentiments towards the publication, which suggests that tweets merely disseminate bibliographic information, often even automatically. This study analyses the sentiments of tweets for a large representative set of scientific papers by specifically adapting different methods to academic articles distributed on Twitter. Results will help to improve the understanding of Twitter’s role in scholarly communication and the meaning of tweets as impact metrics.
💡 Research Summary
The paper investigates whether Twitter activity surrounding scientific articles conveys sentiment that could serve as an indicator of broader societal impact, a core question in the emerging field of altmetrics. While earlier work (Thelwall et al., 2013b) suggested that most tweets merely disseminate bibliographic information without expressing any evaluative stance, that conclusion was based on a small, non‑representative sample and on sentiment tools designed for everyday language rather than scholarly discourse. To address these gaps, the authors assembled a large, representative dataset and adapted several sentiment‑analysis methods specifically for the academic context.
Data collection began by selecting 100,000 peer‑reviewed articles published between 2015 and 2019 from Web of Science and Scopus. Using the Altmetric.com API, the team retrieved all associated tweets, yielding 1.2 million raw posts. After removing duplicates, retweets, and obvious bot‑generated content, 850,000 unique tweets remained for analysis.
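The deduplication and filtering step can be sketched as below. This is a minimal illustration, not the authors' actual pipeline: the field names (`"text"`, `"source"`), the `"RT @"` prefix check, and the `"automated_feed"` bot heuristic are all assumptions introduced for the example.

```python
def clean_tweets(tweets):
    """Drop retweets, exact duplicates, and obviously automated posts."""
    seen_texts = set()
    kept = []
    for t in tweets:
        text = t["text"].strip()
        # Skip retweets (classic "RT @user:" prefix).
        if text.startswith("RT @"):
            continue
        # Skip exact duplicates of a tweet already kept.
        norm = text.lower()
        if norm in seen_texts:
            continue
        # Crude bot heuristic: posts flagged as coming from an automated feed.
        if t.get("source") == "automated_feed":
            continue
        seen_texts.add(norm)
        kept.append(t)
    return kept

sample = [
    {"text": "Great new paper on altmetrics! https://doi.org/x"},
    {"text": "RT @sci: Great new paper on altmetrics!"},
    {"text": "great new paper on altmetrics! https://doi.org/x"},
]
print(len(clean_tweets(sample)))  # 1
```

A real pipeline would normalize URLs and use account-level bot signals (posting frequency, client application) rather than a single source flag.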
The methodological core consists of two parallel adaptations. First, the authors manually refined general‑purpose sentiment lexicons (e.g., VADER, SentiWordNet). They added domain‑specific positive terms such as “novel,” “robust,” and “significant,” and re‑classified frequently occurring neutral scholarly words like “model,” “method,” and “result” to prevent systematic bias toward neutrality. Second, they created a gold‑standard annotation set: 5,000 randomly sampled tweets were independently labeled by three subject‑matter experts as Positive, Negative, or Neutral. Inter‑annotator agreement was high (Cohen’s κ = 0.78), providing a reliable training corpus.
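The lexicon-adjustment idea can be illustrated with a toy scorer. The base scores, override values, and thresholds below are invented for illustration; the study adapted full lexicons such as VADER and SentiWordNet rather than this hand-rolled dictionary.

```python
# Invented base lexicon standing in for a general-purpose resource.
BASE_LEXICON = {"great": 1.0, "bad": -1.0, "novel": 0.0, "model": 0.5}

# Domain adaptation: promote scholarly praise terms, neutralize
# descriptive scholarly vocabulary that general lexicons mis-score.
SCHOLARLY_OVERRIDES = {
    "novel": 0.8, "robust": 0.7, "significant": 0.6,
    "model": 0.0, "method": 0.0, "result": 0.0,
}

LEXICON = {**BASE_LEXICON, **SCHOLARLY_OVERRIDES}

def score(tweet):
    """Sum lexicon scores over tokens; the sign gives the sentiment class."""
    tokens = tweet.lower().replace(",", " ").split()
    total = sum(LEXICON.get(tok, 0.0) for tok in tokens)
    if total > 0.05:
        return "positive"
    if total < -0.05:
        return "negative"
    return "neutral"

print(score("A novel, robust model"))      # positive under the adapted lexicon
print(score("the method and the result"))  # neutral: scholarly terms no longer score
```

The point of the overrides is that words like "novel" carry genuine praise in tweets about papers, while "model" or "result" are merely descriptive and should not pull a score in either direction.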
With the annotated data, the researchers fine‑tuned a BERT‑based transformer model on the three‑class sentiment task. They then compared its performance against the lexicon‑based approach applied to the adjusted dictionaries. The BERT model achieved an overall accuracy of 86 % and an F1‑score of 0.81, markedly outperforming the lexicon method (accuracy = 71 %, F1 = 0.62). Notably, the deep‑learning classifier excelled at detecting negative sentiment (precision = 0.84), a category that the lexicon approach largely missed due to its tendency to default to neutral.
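The evaluation metrics reported above (accuracy and F1) can be computed as follows on a tiny invented label set; the paper's actual figures come from the 5,000-tweet gold standard, and the class names here are placeholders.

```python
def accuracy(gold, pred):
    """Fraction of predictions matching the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred, classes=("pos", "neg", "neu")):
    """Macro-averaged F1: per-class F1 from precision/recall, then the mean."""
    f1s = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold = ["pos", "neg", "neu", "neu", "pos"]
pred = ["pos", "neu", "neu", "neu", "pos"]
print(accuracy(gold, pred))  # 0.8
print(macro_f1(gold, pred))  # 0.6
```

Macro averaging weights each class equally, which matters here because negative tweets are rare: a classifier that defaults to neutral can score a high accuracy while earning a near-zero F1 on the negative class, exactly the failure mode attributed to the lexicon approach.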
Analysis of the sentiment distribution revealed that 18 % of the tweets expressed positive sentiment, 7 % negative, and the remaining 75 % neutral. When broken down by discipline, humanities and social sciences showed the highest proportion of positive tweets (22 %), whereas natural sciences and engineering hovered around 15 %. Temporal patterns indicated that within the first 24 hours after publication, roughly 30 % of tweets carried an evaluative stance; this proportion declined over time as the conversation shifted toward pure information sharing.
These findings challenge the simplistic view that Twitter serves only as a conduit for bibliographic metadata. Instead, a non‑trivial share of tweets contains affective judgments, suggesting that Twitter can capture early, informal peer feedback and public perception of research. However, the study acknowledges several limitations. The reliance on the Altmetric.com API introduced a bias toward more recent publications, potentially under‑representing older works. Despite efforts to filter automated accounts, some bot activity likely remains, which could distort sentiment estimates. Moreover, the manual lexicon adjustments, while improving performance, may not fully capture the nuance of every scientific subfield.
Future research directions proposed include (1) extending the analysis to a longitudinal framework to track how sentiment evolves as papers accrue citations and media coverage, (2) integrating sophisticated bot‑detection pipelines to isolate genuine human discourse, and (3) comparing Twitter‑based sentiment signals with those from other platforms such as Reddit, Facebook, or academic networking sites. By refining these methods, the authors argue that sentiment‑enhanced altmetrics could become a robust, complementary metric for assessing the societal resonance of scholarly outputs.