User-level sentiment analysis incorporating social networks
We show that information about social relationships can be used to improve user-level sentiment analysis. The main motivation behind our approach is that users that are somehow “connected” may be more likely to hold similar opinions; therefore, relationship information can complement what we can extract about a user’s viewpoints from their utterances. Employing Twitter as a source for our experimental data, and working within a semi-supervised framework, we propose models that are induced either from the Twitter follower/followee network or from the network in Twitter formed by users referring to each other using “@” mentions. Our transductive learning results reveal that incorporating social-network information can indeed lead to statistically significant sentiment-classification improvements over the performance of an approach based on Support Vector Machines having access only to textual features.
💡 Research Summary
The paper investigates how social‑network information can be leveraged to improve user‑level sentiment classification on Twitter. The authors argue that users who are “connected” in a social graph are more likely to share similar opinions, a phenomenon often referred to as homophily. To test this hypothesis, they construct two distinct types of graphs for each target topic: (1) a follower/followee (t‑follow) graph that captures who follows whom, and (2) an @‑mention graph that records when a user mentions another user in a tweet. Both graphs are further divided into directed edges (one‑way connections) and mutual edges (reciprocal connections), yielding four possible definitions of a link between two users.
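The four link definitions above can be illustrated with a minimal sketch. It assumes each graph is available as a list of (source, target) pairs; the user names, the toy data, and the helper names `directed_links` and `mutual_links` are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (source user, target user); hypothetical representation


def directed_links(edges: Iterable[Edge]) -> Set[FrozenSet[str]]:
    """Link two users if at least one of them points to the other."""
    return {frozenset((a, b)) for a, b in edges if a != b}


def mutual_links(edges: Iterable[Edge]) -> Set[FrozenSet[str]]:
    """Link two users only if each of them points to the other."""
    edge_set = {(a, b) for a, b in edges if a != b}
    return {frozenset((a, b)) for a, b in edge_set if (b, a) in edge_set}


# Toy data, invented for illustration only.
follows = [("alice", "bob"), ("bob", "alice"), ("carol", "alice")]
mentions = [("alice", "carol"), ("carol", "alice"), ("bob", "carol")]

# One entry per link definition: two graph types times two edge types.
link_definitions = {
    "follow_directed": directed_links(follows),
    "follow_mutual": mutual_links(follows),
    "mention_directed": directed_links(mentions),
    "mention_mutual": mutual_links(mentions),
}

for name, links in link_definitions.items():
    print(name, sorted(sorted(pair) for pair in links))
```

Storing each link as an unordered pair keeps the downstream model symmetric: whichever definition is chosen, the result is simply a set of user pairs treated as connected.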
Data collection focuses on five topics—Obama, Sarah Palin, Glenn Beck, Lakers, and Fox News. Starting from seed accounts (e.g., BarackObama, RepRonPaul), the authors crawl the Twitter network, then manually label users based on clear signals in their bios or usernames. This yields a set of gold‑standard users (ranging from 231 to 889 per topic) together with thousands of t‑follow edges and several hundred @‑edges. Statistical analysis shows that the probability two users share the same sentiment is substantially higher when an edge exists, especially in the t‑follow graph (often >0.8). Conversely, users with the same sentiment are more likely to be linked, confirming the intuition that network structure correlates with opinion.
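The agreement statistic described above, the probability that two users share a sentiment given that an edge exists, can be estimated along the following lines. This is a sketch under stated assumptions: labels are stored in a dictionary, links are unordered user pairs, and the function name `agreement_rates` and the toy data are hypothetical. It does not reproduce the paper's figures.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def agreement_rates(
    labels: Dict[str, str], links: Set[FrozenSet[str]]
) -> Tuple[float, float]:
    """Return (P(same label | linked), P(same label | not linked)).

    Both estimates are simple frequencies over all pairs of labeled users.
    """
    linked_same = linked_total = 0
    unlinked_same = unlinked_total = 0
    for u, v in combinations(sorted(labels), 2):
        same = labels[u] == labels[v]
        if frozenset((u, v)) in links:
            linked_total += 1
            linked_same += same
        else:
            unlinked_total += 1
            unlinked_same += same
    p_linked = linked_same / linked_total if linked_total else float("nan")
    p_unlinked = unlinked_same / unlinked_total if unlinked_total else float("nan")
    return p_linked, p_unlinked


# Toy data, invented for illustration only.
toy_labels = {"a": "pos", "b": "pos", "c": "neg", "d": "neg"}
toy_links = {frozenset(("a", "b")), frozenset(("c", "d")), frozenset(("a", "c"))}

p_linked, p_unlinked = agreement_rates(toy_labels, toy_links)
print(f"P(same | linked) = {p_linked:.2f}, P(same | not linked) = {p_unlinked:.2f}")
```

A large gap between the two rates is the kind of evidence that makes a link definition a useful signal for the classifier.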
The core technical contribution is a factor‑graph model that jointly incorporates textual evidence and network structure. Each user node is connected to its tweets via “f” factors derived from a standard Support Vector Machine (SVM) trained on textual features. User‑user edges are modeled with “h” factors that encourage neighboring users to take the same label, effectively a Potts model over the graph. The overall graph is heterogeneous, containing both user and tweet nodes. Learning proceeds in a semi‑supervised (transductive) setting: only a small fraction of users are labeled, while the rest are unlabeled. Parameter estimation uses a combination of label propagation and variational Expectation‑Maximization, allowing the model to infer labels for all users in the graph.
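To make the interplay of text factors and edge factors concrete, here is a deliberately simplified sketch. It is not the authors' model or learning procedure: it collapses each user's tweets into a single text-based probability (assumed to come from a calibrated text-only classifier), uses one hand-set agreement weight, and decodes with iterated conditional modes. The function name `icm_labels`, the `weight` parameter, and the toy data are all hypothetical.

```python
import math
from typing import Dict, FrozenSet, Set


def icm_labels(
    text_prob: Dict[str, float],
    links: Set[FrozenSet[str]],
    labeled: Dict[str, int],
    weight: float = 1.0,
    max_iters: int = 20,
) -> Dict[str, int]:
    """Assign 0/1 labels by greedily maximizing text score plus neighbor agreement.

    text_prob: user -> assumed P(positive) from a text-only classifier.
    links: unordered user pairs; every linked user must appear in text_prob.
    labeled: user -> known label, held fixed (the transductive setting).
    """
    neighbors = {u: set() for u in text_prob}
    for link in links:
        u, v = tuple(link)  # assumes no self-links
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Start from the known labels, falling back to the text-only decision.
    labels = {u: labeled.get(u, int(p >= 0.5)) for u, p in text_prob.items()}

    for _ in range(max_iters):
        changed = False
        for u in sorted(text_prob):
            if u in labeled:
                continue
            p = min(max(text_prob[u], 1e-6), 1 - 1e-6)  # avoid log(0)
            scores = []
            for y in (0, 1):
                text_score = math.log(p if y == 1 else 1 - p)
                agree = sum(1 for v in neighbors[u] if labels[v] == y)
                scores.append(text_score + weight * agree)
            best = int(scores[1] > scores[0])
            if best != labels[u]:
                labels[u] = best
                changed = True
        if not changed:
            break
    return labels


# Toy data, invented for illustration only.
toy_text_prob = {"a": 0.9, "b": 0.45, "c": 0.1, "d": 0.55}
toy_graph = {frozenset(("a", "b")), frozenset(("c", "d"))}
toy_known = {"a": 1, "c": 0}

result = icm_labels(toy_text_prob, toy_graph, toy_known, weight=1.0)
print(result)
```

In this toy case the text-only decision for "b" and "d" is weak (0.45 and 0.55), so the agreement term with their labeled neighbors flips both labels. That is the intuition behind the edge factors: network evidence matters most when the textual evidence is close to uninformative.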
Experiments compare four configurations: (i) SVM‑only (textual baseline), (ii) SVM + t‑follow directed edges, (iii) SVM + @‑mention directed edges, and (iv) combinations of directed and mutual edges. Across all topics, the graph‑based models outperform the baseline by 3–5 percentage points in accuracy and F1 score, with statistical significance (p < 0.01). The best performing edge type varies by topic: mutual t‑follow edges excel for political topics (Obama, Sarah Palin), while mutual @‑mentions are more beneficial for the sports topic (Lakers). An additional analysis shows that even a small number of high‑quality edges can yield significant gains, suggesting that the method is robust to sparsity.
In summary, the study demonstrates that incorporating social‑network structure into user‑level sentiment analysis yields measurable improvements over text‑only approaches, especially in the context of short, noisy tweets. The authors conclude that network information helps mitigate the paucity of textual cues and the scarcity of labeled data. Future work is proposed on extending the framework to other platforms, handling multiple topics jointly, and adapting to dynamic network changes in real time.