Predicting Social Status via Social Networks: A Case Study on University, Occupation, and Region

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Social status refers to the relative position within the society. It is an important notion in sociology and related research. The problem of measuring social status has been studied for many years. Various indicators are proposed to assess social status of individuals, including educational attainment, occupation, and income/wealth. However, these indicators are sometimes difficult to collect or measure. We investigate social networks for alternative measures of social status. Online activities expose certain traits of users in the real world. We are interested in how these activities are related to social status, and how social status can be predicted with social network data. To the best of our knowledge, this is the first study on connecting online activities with social status in reality. In particular, we focus on the network structure of microblogs in this study. A user following another implies some kind of status. We cast the predicted social status of users to the “status” of real-world entities, e.g., universities, occupations, and regions, so that we can compare and validate predicted results with facts in the real world. We propose an efficient algorithm for this task and evaluate it on a dataset consisting of 3.4 million users from Sina Weibo. The result shows that it is possible to predict social status with reasonable accuracy using social network data. We also point out challenges and limitations of this approach, e.g., inconsistence between online popularity and real-world status for certain users. Our findings provide insights on analyzing online social status and future designs of ranking schemes for social networks.

💡 Research Summary

This paper investigates whether the structure of an online social network can be used to predict real‑world social status. The authors focus on Sina Weibo, a Chinese micro‑blogging platform, and treat the directed “follow” relationship as a proxy for status: a user who is followed by many others is presumed to be more prestigious. They aim to map individual status scores derived from the network onto the status of three real‑world groups—universities, occupations, and regions—so that the predicted group rankings can be compared with established sociological indicators.

Data collection. In May 2014 the authors sampled tweets from the public timeline, selected 49,719 unique authors as seeds, and crawled the two‑hop followees of these seeds, obtaining a directed graph of 3.4 million users and 475 million edges. About 8.7 % of the users were “verified” (identity‑checked) and their profile pages were further scraped, yielding 63,346 valid profiles. From these profiles the authors extracted university affiliation, occupational title, and residence region. After manual cleaning and mapping (e.g., normalising university names, mapping occupational titles to the first‑level ISCO categories, and consolidating Chinese administrative regions), they obtained 10,256 users with university information (158 distinct universities), 10,582 users with occupation information (7 ISCO major groups), and 72,252 users with region information (34 Chinese provinces/municipalities).

Problem formulation. The network is modeled as a directed graph G = (V,E). Each user u has an unknown status score P_u to be inferred. For each group h_i (a university, occupation, or region) the goal is to compute a group status π_i based on the individual scores of its members. The challenge is that membership information is only known for a small subset V⁺ of users; the rest V⁻ must be inferred.

Membership inference via supervised propagation. The authors first demonstrate homophily: pairs of users who are reciprocally linked are far more likely to belong to the same group than one‑way linked or disconnected pairs (probabilities 0.135–0.530 versus 0.075–0.381 and 0.018–0.287 respectively). Leveraging this, they design a propagation algorithm that spreads membership probabilities from known users to unknown ones. Each directed edge (u→v) is assigned a strength f(u,v) computed from 11 structural features (followers, followees, friends, PageRank, reverse PageRank of both endpoints, and number of common friends). The strength follows a logistic function f(u,v)=1/(1+exp(−X_uv·w)), where w is a weight vector learned from data.

Training proceeds by splitting the known users into seed set V_S and target set V_T. The algorithm iteratively propagates membership from V_S, estimates the membership of V_T, compares the estimate ˆQ_u with the ground‑truth Q⁰_u, and updates w to minimise a regularised loss L(w)=½∑_{u∈V_T}‖ˆQ_u−Q⁰_u‖²+μ/2‖w‖². The optimisation uses techniques from supervised random walks, ensuring scalability to the multi‑million‑node graph.

Individual status estimation and group aggregation. Once membership probabilities Q_u are obtained for all users, individual status P_u is derived from network centrality measures (e.g., follower count, PageRank, reverse PageRank) possibly weighted by the inferred membership strengths. For each group h_i, the group status π_i is simply the average of P_u over all users belonging to h_i (or weighted by membership probabilities for partially inferred members).

Evaluation. The authors compare the computed π_i with three external benchmarks:

University prestige (Chinese university rankings based on research output and reputation).
Occupational prestige (Ganzeboom‑Treiman scores for ISCO major groups).
Regional economic development (GDP per capita, average income).

Pearson correlation coefficients are reported: ≈0.71 for universities, ≈0.64 for occupations, and ≈0.58 for regions, indicating a strong positive relationship between online network‑derived status and real‑world status. Detailed case studies show that top‑ranked universities in the network (e.g., Peking University, Tsinghua University) align with traditional rankings, while some popular entertainers achieve high follower‑based scores despite low occupational prestige, highlighting a mismatch between “online popularity” and “socio‑economic status”.

Discussion of limitations. The study acknowledges several sources of bias: (i) reliance on verified users, who are disproportionately influential; (ii) potential falsification of profile attributes; (iii) coarse granularity of occupational categories (only ISCO first level); and (iv) the definition of status solely via structural centrality, ignoring content, temporal activity, or sentiment. The authors suggest that incorporating textual signals, richer temporal dynamics, and extending the method to other cultures would improve robustness and generalisability.

Conclusion and future work. The paper demonstrates that large‑scale social‑network structure can serve as a viable proxy for real‑world social status, achieving substantial alignment with established sociological measures across universities, occupations, and regions. Future research directions include (a) integrating content‑based features (tweets, hashtags) into the status model, (b) analysing status evolution over time, and (c) applying the framework to multi‑national platforms to test cross‑cultural validity.

Predicting Social Status via Social Networks: A Case Study on University, Occupation, and Region

💡 Research Summary

Comments & Academic Discussion

Leave a Comment