People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities
Media seems to have become more partisan, often providing biased coverage of news that caters to the interests of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as Digg, Reddit, or NewsTrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify (1) highly credible news articles, (2) trustworthy news sources, and (3) expert users who perform the role of “citizen journalists” in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities.
💡 Research Summary
The paper tackles the pressing problem of identifying trustworthy information in today’s increasingly partisan online news ecosystems. While platforms such as Reddit, Digg, and NewsTrust allow users to rate, review, and discuss articles, the resulting feedback is entangled with many latent factors: the linguistic style and objectivity of the article, the political or topical bias of the source, the expertise and personal bias of each user, and the complex network of interactions among them (e.g., up‑votes/down‑votes, co‑rating patterns). Existing approaches either treat these elements in isolation, rely on discrete‑label CRFs, or employ collaborative‑filtering models that ignore the continuous nature of many community rating scales.
Core contribution
The authors propose a Continuous Conditional Random Field (CCRF) that jointly models four inter‑dependent random variables: (1) article credibility, (2) source trustworthiness, (3) user expertise (the “citizen journalist” role), and (4) real‑valued ratings (both per‑review and overall article scores). Each node in the graph is a continuous variable ranging from 1 to 5, matching the fine‑grained rating scales used by many news communities. Edges encode seven types of relationships (source‑article, article‑review, user‑review, user‑article, source‑user, source‑review, and user‑user interactions), forming cliques that capture the “cross‑talk” between users, sources, and articles.
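The joint structure described above can be sketched in code. This is an illustrative toy construction, not the paper's implementation: the node and edge names mirror the four variable types and seven relationship types listed in the summary, while the class and function names (`Node`, `connect`) are hypothetical.

```python
# Toy sketch of the CCRF graph structure: continuous nodes (scores on the
# 1-5 scale) for articles, sources, users, and ratings, connected by the
# seven edge types named in the paper. Names here are illustrative only.
EDGE_TYPES = {
    "source-article", "article-review", "user-review", "user-article",
    "source-user", "source-review", "user-user",
}

class Node:
    def __init__(self, kind, name, score=3.0):
        assert kind in {"article", "source", "user", "rating"}
        self.kind, self.name = kind, name
        self.score = score  # continuous latent value in [1, 5]

edges = []
def connect(a, b, edge_type):
    """Add a typed edge; only the seven declared types are allowed."""
    assert edge_type in EDGE_TYPES
    edges.append((a, b, edge_type))

# A minimal clique: a source publishes an article that a user reviews.
src = Node("source", "source-A")
art = Node("article", "article-1")
usr = Node("user", "user-alice")
connect(src, art, "source-article")
connect(usr, art, "user-article")
```

In the actual model, each such edge carries a potential function over the continuous scores of its endpoints, so evidence about one node (say, a source's trustworthiness) propagates to its neighbors (its articles and their reviewers).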
Feature engineering
- Linguistic/style features: frequencies of assertive, factive, hedging, implicative, report verbs, discourse markers, subjectivity, bias, and affective lexicons are computed for each article and each user review.
- Topic features: Latent Dirichlet Allocation (LDA) is applied to the corpus to obtain a distribution over latent topics for every document. The resulting topic vectors are combined with Support Vector Regression (SVR) to estimate topic‑specific expertise for users and sources.
- Metadata: user activity statistics (number of posts, votes, replies), source attributes (format, known political leaning), and interaction counts are also incorporated.
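The lexicon-based style features above amount to relative frequencies of marker words per document. A minimal sketch, assuming tiny illustrative lexicons (the paper uses full assertive, factive, hedging, implicative, report, bias, and affect word lists):

```python
from collections import Counter

# Hypothetical miniature lexicons; real ones contain hundreds of entries.
HEDGES = {"may", "might", "appears", "suggests", "possibly"}
ASSERTIVES = {"claim", "insist", "assert", "maintain"}
REPORT_VERBS = {"said", "reported", "announced", "told"}

def style_features(text: str) -> dict:
    """Return per-lexicon relative frequencies for one document/review."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)          # avoid division by zero
    counts = Counter(tokens)
    def freq(lexicon):
        return sum(counts[w] for w in lexicon) / n
    return {
        "hedge": freq(HEDGES),
        "assertive": freq(ASSERTIVES),
        "report": freq(REPORT_VERBS),
    }

feats = style_features("The report said the policy may possibly work")
```

Each article and each user review gets such a feature vector; in the paper these feed the node potentials of the graphical model, alongside the LDA topic vectors and metadata.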
Learning and inference
Parameter learning follows the methodology of prior CCRF work: the edge and node weights are fit by maximizing the conditional likelihood of the observed ratings, and inference then recovers the most likely joint assignment of credibility, trustworthiness, and expertise scores.
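One well-known property of continuous CRFs with quadratic potentials (not necessarily the exact form used in this paper) is that MAP inference reduces to solving a linear system: node potentials pull each latent score toward a feature-based estimate, and edge potentials smooth scores across connected nodes. A minimal sketch under that assumption, with illustrative weights `alpha` and `beta`:

```python
import numpy as np

def ccrf_map(x, edges, alpha=1.0, beta=0.5):
    """MAP inference for a toy quadratic CCRF.

    Energy: sum_i alpha*(y_i - x_i)^2 + sum_(i,j) beta*(y_i - y_j)^2.
    Setting the gradient to zero gives (alpha*I + beta*L) y = alpha*x,
    where L is the graph Laplacian of the edge set.
    """
    n = len(x)
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    y = np.linalg.solve(alpha * np.eye(n) + beta * L,
                        alpha * np.asarray(x, dtype=float))
    return np.clip(y, 1.0, 5.0)  # keep scores on the 1-5 rating scale

# Two connected nodes with feature estimates 5 and 1: smoothing pulls
# their scores toward each other while preserving their mean.
scores = ccrf_map([5.0, 1.0], edges=[(0, 1)])
```

With `alpha=1.0` and `beta=0.5`, the example yields scores of 4 and 2: the edge potential has moved each node one unit toward its neighbor. This closed-form solvability is one practical payoff of modeling ratings as continuous rather than discretizing them.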