Identifying Social Satisfaction from Social Media

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

This research addresses the need to identify social conditions and instability factors by measuring public social satisfaction. However, because subject recruitment and data processing require a large amount of manual work, conventional self-report methods cannot be run in real time or applied to large-scale investigations. To solve this problem, this paper proposes an approach to predicting users’ social satisfaction, especially economy-related satisfaction, from their social media records. We recruited 2,018 active Cantonese participants from each city in Guangdong province according to the population distribution. Both behavioral and linguistic features of the participants were extracted from their online records on the social media platform Sina Weibo. Regression models were used to predict Sina Weibo users’ social satisfaction. Furthermore, we consulted the 2012 economic indexes of Guangdong and calculated the correlations between these indexes and the predicted social satisfaction. Results indicate that social satisfaction can be significantly expressed by specific social media features, and that local economy satisfaction has significant positive correlations with several local economy indexes, which supports the reliability of predicting social satisfaction from social media.


💡 Research Summary

The paper tackles the longstanding problem of measuring public social satisfaction, especially economic satisfaction, in a timely and scalable manner. Traditional self‑report surveys are costly, labor‑intensive, and unsuitable for real‑time monitoring across large populations. To overcome these limitations, the authors propose a data‑driven approach that leverages publicly available activity on Sina Weibo, China’s leading micro‑blogging platform, as a proxy for individuals’ underlying satisfaction levels.

Data collection and sample design
The study focuses on Guangdong province, a highly populated and economically diverse region in southern China. Using stratified sampling based on the official population distribution of each city, the researchers recruited 2,018 active Weibo users who self‑identified as Cantonese speakers. All participants consented to share their public posts, follower/following lists, timestamps, and interaction metrics for the entire year of 2012. This longitudinal dataset provides a rich source of both behavioral and linguistic signals.
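As a rough illustration of population-proportional stratified sampling, the sketch below splits a fixed recruitment target across cities. The largest-remainder rule, the function name `allocate_sample`, and the city populations are all assumptions made for this example; the summary does not state how the authors rounded per-city quotas.

```python
def allocate_sample(populations, total):
    """Split `total` recruits across strata in proportion to population.

    Uses the largest-remainder method with integer arithmetic so the
    per-city quotas always sum exactly to `total`.
    """
    pop_sum = sum(populations.values())
    # Integer part of each city's proportional share.
    quotas = {city: total * pop // pop_sum for city, pop in populations.items()}
    # Leftover fraction (kept as an integer numerator) used to rank cities.
    remainders = {city: total * pop % pop_sum for city, pop in populations.items()}
    leftover = total - sum(quotas.values())
    # Hand the remaining places to the cities with the largest remainders.
    for city in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[city] += 1
    return quotas


# Hypothetical populations (in thousands), not figures from the paper.
example = {"city_a": 12000, "city_b": 7000, "city_c": 1180}
print(allocate_sample(example, 2018))
```

Integer arithmetic is used here so that rounding never causes the quotas to add up to more or fewer than the requested sample size.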

Feature extraction
Two complementary feature families were engineered:

  1. Behavioral features – posting frequency, diurnal posting patterns, counts of retweets, comments, likes, follower‑to‑following ratios, and network centrality measures (eigenvector, betweenness). These capture the intensity and social reach of each user.

  2. Linguistic features – sentiment scores derived from a Chinese sentiment lexicon (positive/negative word ratios), TF‑IDF weighted keyword frequencies, topic proportions obtained via Latent Dirichlet Allocation (20 topics), and the prevalence of economy‑related terms (e.g., “salary,” “price,” “job”). Because the participants speak Cantonese, a region‑specific lexicon was added to improve coverage of dialectal expressions.
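The two feature families above can be sketched for a single user as follows. This is a minimal illustration that assumes posts are already tokenized; the word lists are toy English stand-ins for the Chinese and Cantonese lexicons, and the helper names `follower_ratio` and `lexicon_ratios` are hypothetical rather than taken from the paper.

```python
def follower_ratio(followers, following):
    """Follower-to-following ratio; returns 0.0 when the user follows no one."""
    return followers / following if following else 0.0


def lexicon_ratios(tokens, positive, negative, economy):
    """Share of tokens that fall in each word list (0.0 for empty input)."""
    n = len(tokens)
    if n == 0:
        return {"pos": 0.0, "neg": 0.0, "econ": 0.0}
    return {
        "pos": sum(t in positive for t in tokens) / n,
        "neg": sum(t in negative for t in tokens) / n,
        "econ": sum(t in economy for t in tokens) / n,
    }


# Toy word lists standing in for the real lexicons (assumed for illustration).
POSITIVE = {"happy", "good"}
NEGATIVE = {"bad", "worried"}
ECONOMY = {"salary", "price", "job"}

tokens = ["my", "salary", "is", "good", "but", "price", "is", "bad"]
features = lexicon_ratios(tokens, POSITIVE, NEGATIVE, ECONOMY)
features["follower_ratio"] = follower_ratio(followers=300, following=150)
print(features)
```

In practice a Chinese word segmenter would be needed before lexicon matching, and the TF-IDF and topic-model features would be computed over the whole corpus rather than per user.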

Modeling and validation
To reduce multicollinearity and identify the most predictive variables, the authors first performed correlation analysis and then applied LASSO regularization for feature selection. Three regression algorithms—ordinary least squares, LASSO, and Ridge—were trained using 10‑fold cross‑validation. Model performance was assessed with Mean Squared Error (MSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). The LASSO model achieved the best results (MSE = 0.84, R² = 0.61), indicating that a relatively small subset of features can reliably estimate the target variable.
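To show why LASSO doubles as a feature selector, here is a small pure-Python sketch of cyclic coordinate descent with soft-thresholding. It assumes centred inputs (no intercept), a fixed number of sweeps, and toy data; it is not the authors' implementation, and a real analysis would normally rely on an established library and choose the penalty `alpha` by cross-validation.

```python
def soft_threshold(value, alpha):
    """Shrink `value` toward zero by `alpha`; exactly 0.0 inside [-alpha, alpha]."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_fit(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumes the columns of X and y are already centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy centred data: the target depends only on the first feature.
X = [[1, 0.1], [2, -0.1], [3, 0.1], [-1, -0.1], [-2, 0.1], [-3, -0.1]]
y = [2 * row[0] for row in X]

w = lasso_fit(X, y, alpha=0.5)
preds = [sum(xi * wi for xi, wi in zip(row, w)) for row in X]
print(w, r_squared(y, preds))
```

The L1 penalty sets the weight of the uninformative second feature to exactly zero and slightly shrinks the weight of the informative one, which is the behaviour that lets a small subset of features carry the prediction.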

Target variable
Economic satisfaction was measured via a short, five‑point Likert‑scale questionnaire administered to the same participants. This self‑report score served as the ground‑truth dependent variable for supervised learning.

Correlation with official economic indicators
After training, the model generated predicted economic-satisfaction scores for each user. The authors then aggregated these predictions at the city level and compared them with official 2012 Guangdong economic indicators: Gross Regional Product (GRP), per-capita income, employment rate, unemployment rate, Consumer Price Index (CPI), and sectoral shares (manufacturing, services, agriculture). Pearson correlation analysis revealed statistically significant positive relationships between the predicted satisfaction and both GRP (r = 0.62, p < 0.01) and employment rate (r = 0.55, p < 0.01). The CPI showed a weak negative correlation (r = -0.21), which may indicate that price inflation modestly dampens perceived satisfaction. These findings support the claim that specific social-media behaviors and language use reflect real-world economic conditions.
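The city-level comparison can be sketched in two steps: average the per-user predictions within each city, then compute Pearson's r against an indicator. The scores and indicator values below are invented for illustration, and the helper names `city_means` and `pearson_r` are hypothetical; the sketch also omits the significance test that produces the reported p-values.

```python
from collections import defaultdict
from math import sqrt


def city_means(user_scores):
    """Average predicted satisfaction per city from (city, score) pairs."""
    grouped = defaultdict(list)
    for city, score in user_scores:
        grouped[city].append(score)
    return {city: sum(vals) / len(vals) for city, vals in grouped.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Made-up predictions and indicator values, not data from the paper.
predictions = [("a", 3.0), ("a", 4.0), ("b", 2.0), ("b", 3.0), ("c", 4.0), ("c", 5.0)]
grp = {"a": 200.0, "b": 100.0, "c": 250.0}

means = city_means(predictions)
cities = sorted(means)  # fixed order so both sequences line up city by city
r = pearson_r([means[c] for c in cities], [grp[c] for c in cities])
print(means, round(r, 3))
```

With only one data point per city, the correlation rests on a small number of observations, so the accompanying p-value matters as much as the coefficient itself.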

Contributions

  1. Demonstrates a feasible pipeline for real‑time, large‑scale estimation of social satisfaction using publicly available digital traces.
  2. Provides empirical evidence that a combination of behavioral and linguistic cues can predict economic satisfaction with moderate accuracy.
  3. Shows that the model’s output aligns with traditional macro‑economic statistics, suggesting that social‑media‑based metrics can complement official indicators for policy monitoring.

Limitations and future work
The sample consists solely of active Weibo users, which may bias results toward younger, more tech‑savvy demographics and limit generalizability to the broader population. The linguistic resources, while extended for Cantonese, still miss many emerging slang terms and idiomatic expressions, potentially reducing feature quality. Linear regression models, even with regularization, may not capture complex non‑linear interactions among features; the authors recommend exploring deep learning architectures (e.g., recurrent neural networks, graph neural networks) and multimodal data (e.g., images, videos) in subsequent studies.

Conclusion
By linking digital behavioral and linguistic signals to self‑reported economic satisfaction and validating the link against official economic metrics, the paper offers a novel, cost‑effective methodology for monitoring public sentiment at the regional level. This approach could enable governments, businesses, and researchers to detect shifts in social well‑being promptly and to design more responsive economic policies.

