Gender-Based Violence in 140 Characters or Fewer: A #BigData Case Study of Twitter

Public institutions increasingly rely on data from social media sites to measure public attitudes and enable timely public engagement. This reliance extends to exploring public views on important social issues such as gender-based violence (GBV). In this study, we examine big (social) data consisting of nearly fourteen million tweets collected from Twitter over a period of ten months to analyze public opinion regarding GBV, highlighting the nature of tweeting practices by geographical location and gender. We demonstrate the utility of Computational Social Science for mining insight from the corpus while accounting for the influence of both transient events and sociocultural factors. We reveal public awareness regarding GBV tolerance and suggest opportunities for intervention and for measuring intervention effectiveness, assisting both governmental and non-governmental organizations in policy development.

💡 Research Summary

The paper presents a computational social‑science framework for monitoring public opinion on gender‑based violence (GBV) through large‑scale Twitter data. Over a ten‑month collection period, the authors harvested more than 140 million tweets using a curated list of 30 core keywords and 150 related hashtags. After deduplication, spam filtering, and language detection, a final corpus of roughly 14 million English‑ and Korean‑language tweets was retained for analysis.
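The keyword-and-hashtag filtering and deduplication steps can be illustrated with a minimal Python sketch. The keyword and hashtag sets below are hypothetical placeholders, not the study's 30 keywords and 150 hashtags. Duplicates are detected here by exact match on a normalized form of the text (lower-cased, with URLs, mentions, and a leading "RT" marker stripped), which is an assumption; the paper's actual deduplication rule, spam filter, and language detector are not reproduced.

```python
import re
import unicodedata

# Hypothetical placeholders for the study's curated keyword and hashtag lists.
CORE_KEYWORDS = {"domestic violence", "sexual assault", "rape"}
HASHTAGS = {"#metoo", "#stopviolence"}

# Whole-word match so that, e.g., "rape" does not fire inside "grapes".
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in sorted(CORE_KEYWORDS)) + r")\b"
)
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def is_relevant(text: str) -> bool:
    """True if the tweet contains a tracked keyword or hashtag."""
    lowered = text.lower()
    if KEYWORD_RE.search(lowered):
        return True
    return any(tag in HASHTAGS for tag in HASHTAG_RE.findall(lowered))


def normalize_for_dedup(text: str) -> str:
    """Canonical form used to spot retweets and copy-pasted duplicates."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = re.sub(r"^rt\b\s*:?\s*", "", text.strip())  # drop a leading "RT:" marker
    return re.sub(r"\s+", " ", text).strip()


def filter_and_dedup(tweets: list[str]) -> list[str]:
    """Keep relevant tweets, dropping any whose normalized text was already seen."""
    seen: set[str] = set()
    kept: list[str] = []
    for tweet in tweets:
        if not is_relevant(tweet):
            continue
        key = normalize_for_dedup(tweet)
        if key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


tweets = [
    "RT @a: Stop domestic violence now https://t.co/x",
    "Stop domestic violence now",
    "I love grapes",
    "#MeToo, it happened to me",
]
print(filter_and_dedup(tweets))
```

Here the retweet and the plain copy normalize to the same string, so only the first is kept, and the off-topic tweet is filtered out.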

Data preprocessing involved standard text normalization (lower‑casing, URL and mention removal), tokenization, morphological analysis (KoNLPy/Kkma for Korean, spaCy for English), and stop‑word elimination. Sentiment scores were assigned with a hybrid approach: the Korean National University sentiment lexicon for Korean tweets and the VADER algorithm for English tweets. Topic modeling via Latent Dirichlet Allocation (LDA) produced 20 latent topics, five of which were directly relevant to GBV (e.g., “condemnation of violence,” “victim protection,” “justification of perpetrators”).
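A minimal sketch of the English-side normalization and lexicon scoring, assuming a toy stop-word list and a toy valence lexicon (both hypothetical; they are neither VADER's lexicon nor the Korean lexicon). The summed valence is squashed into (-1, 1) with the normalization VADER applies to its compound score, x / sqrt(x^2 + alpha) with alpha = 15. VADER's rules for negation, intensifiers, and punctuation are omitted, and Korean morphological analysis is out of scope.

```python
import math
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")  # English-only tokenizer; Korean would need KoNLPy

# Hypothetical toy resources, NOT VADER's lexicon or the Korean lexicon.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "this", "we"}
LEXICON = {"violence": -2.0, "abuse": -2.5, "protect": 1.5, "support": 2.0, "justice": 1.8}


def preprocess(text: str) -> list[str]:
    """Lower-case, strip URLs and @mentions, tokenize, and drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOP_WORDS]


def normalize_score(raw: float, alpha: float = 15.0) -> float:
    """VADER-style squashing of a summed valence into the open interval (-1, 1)."""
    return raw / math.sqrt(raw * raw + alpha)


def sentiment(text: str) -> float:
    """Sum lexicon valences over the cleaned tokens, then normalize."""
    raw = sum(LEXICON.get(tok, 0.0) for tok in preprocess(text))
    return normalize_score(raw)


print(preprocess("This is ABUSE @user https://t.co/x"))
print(round(sentiment("We support victims and justice"), 3))
```

The squashing step keeps scores bounded regardless of tweet length, which makes tweets with many sentiment-bearing words comparable to short ones.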

User gender was inferred through a two‑step pipeline. First, explicit gender fields in user profiles were mapped. Second, for the remaining users, a deep‑learning image classifier was applied to profile pictures, with a name‑to‑gender dictionary consulted for ambiguous cases. This approach yielded gender labels for 78 % of users, with a near‑balanced split (38 % male, 40 % female; the remaining 22 % unlabeled).
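The fallback order of the gender-inference pipeline can be sketched as a simple cascade. The profile-field vocabulary, the name dictionary, the 0.8 confidence threshold, and the representation of the image classifier's output as a label-to-probability dictionary are all assumptions for illustration; the classifier itself is not implemented.

```python
from typing import Optional

# Hypothetical vocabularies; the study's actual mappings are not specified.
PROFILE_MAP = {
    "f": "female", "female": "female", "woman": "female",
    "m": "male", "male": "male", "man": "male",
}
NAME_GENDER = {"maria": "female", "james": "male"}


def infer_gender(
    profile_gender: Optional[str],
    image_probs: Optional[dict[str, float]],
    first_name: Optional[str],
    min_conf: float = 0.8,
) -> str:
    """Return 'female', 'male', or 'unknown' using a fixed fallback order."""
    # Step 1: map an explicit gender field from the user profile.
    if profile_gender:
        mapped = PROFILE_MAP.get(profile_gender.strip().lower())
        if mapped:
            return mapped
    # Step 2a: accept the image classifier's top label only if it is confident.
    if image_probs:
        label, prob = max(image_probs.items(), key=lambda kv: kv[1])
        if prob >= min_conf:
            return label
    # Step 2b: fall back to the name dictionary for ambiguous cases.
    if first_name:
        guess = NAME_GENDER.get(first_name.strip().lower())
        if guess:
            return guess
    return "unknown"


print(infer_gender(None, {"male": 0.55, "female": 0.45}, "Maria"))
```

Users who fall through every stage are labeled "unknown", which corresponds to the unlabeled share reported above.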

Geolocation was derived from embedded geocodes, free‑text profile locations, and contextual cues such as language and hashtag usage. Non‑standard location strings were normalized using rule‑based parsing combined with K‑means clustering, yielding country‑ and city‑level tags for 120 distinct regions.
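The rule-based part of location normalization might look like the sketch below. It lower-cases the free-text field, strips emoji and punctuation, splits on commas, and looks each fragment up in an alias table, preferring city-level matches over country-level ones. The alias table is hypothetical, and the K-means clustering step is not reproduced.

```python
import re
from typing import Optional

# Hypothetical alias table: normalized fragment -> (country, city or None).
ALIASES = {
    "nyc": ("United States", "New York"),
    "new york": ("United States", "New York"),
    "usa": ("United States", None),
    "london": ("United Kingdom", "London"),
    "uk": ("United Kingdom", None),
    "seoul": ("South Korea", "Seoul"),
    "서울": ("South Korea", "Seoul"),
}


def normalize_location(raw: Optional[str]) -> Optional[tuple[str, Optional[str]]]:
    """Map a free-text profile location to (country, city), or None if unmatched."""
    if not raw:
        return None
    # Keep Unicode letters/digits, whitespace and commas; drop emoji and punctuation.
    text = re.sub(r"[^\w\s,]", " ", raw.lower())
    fallback = None
    for part in text.split(","):
        part = re.sub(r"\s+", " ", part).strip()
        match = ALIASES.get(part)
        if match is None:
            continue
        if match[1] is not None:  # a city-level match wins immediately
            return match
        if fallback is None:      # remember the first country-level match
            fallback = match
    return fallback


print(normalize_location("London, UK"))
```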

Temporal dynamics were captured by an event‑detection model. Major exogenous shocks (high‑profile sexual‑harassment scandals, legislative changes, and international awareness days) were identified through time‑series analysis (ARIMA, Prophet) and breakpoint detection (BreakoutDetection). For each event, seven‑day windows before and after the event were defined to assess shifts in tweet volume, sentiment, and topic prevalence.
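The seven-day pre/post comparison can be expressed as a small helper over a daily time series. This sketch assumes the series is a dictionary keyed by date, treats the event day as the first day of the post-event window, and skips days with no observation; these windowing choices are assumptions rather than details from the paper. Event detection itself (ARIMA, Prophet, BreakoutDetection) is not shown.

```python
from datetime import date, timedelta
from statistics import mean


def split_windows(series: dict[date, float], event_day: date, days: int = 7):
    """Collect observed values in the `days` days before the event and in the
    `days` days starting on the event day (days with no data are skipped)."""
    before = (event_day - timedelta(days=i) for i in range(days, 0, -1))
    after = (event_day + timedelta(days=i) for i in range(days))
    pre = [series[d] for d in before if d in series]
    post = [series[d] for d in after if d in series]
    return pre, post


def compare_windows(series: dict[date, float], event_day: date, days: int = 7) -> dict:
    """Summarize the shift between the pre- and post-event windows."""
    pre, post = split_windows(series, event_day, days)
    if not pre or not post:
        raise ValueError("each window needs at least one observation")
    pre_mean, post_mean = mean(pre), mean(post)
    return {
        "pre_mean": pre_mean,
        "post_mean": post_mean,
        "delta": post_mean - pre_mean,
        # The ratio is only meaningful for non-negative series such as tweet volume.
        "ratio": post_mean / pre_mean if pre_mean else float("inf"),
    }


event = date(2021, 3, 8)  # arbitrary illustrative date
volume = {event - timedelta(days=i): 100 for i in range(1, 8)}
volume.update({event + timedelta(days=i): 300 for i in range(7)})
print(compare_windows(volume, event))
```

A three‑fold volume surge of the kind described for high‑profile scandals would show up as a `ratio` of 3.0, while sentiment shifts are better read from `delta`.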

Statistical testing employed multivariate logistic regression and mixed‑effects models to examine interactions among gender, region, and time. Key findings include:
(1) Western countries (U.S., U.K., Canada) exhibited the highest proportion of negative sentiment toward GBV, and the conversation there was female‑dominated (women authored 62 % of GBV‑related tweets).
(2) East Asian regions (South Korea, Japan, China) showed comparatively fewer calls for victim protection and a higher share of male users expressing justification narratives (28 % of male‑authored GBV tweets).
(3) In South Korea, the “perpetrator justification” topic accounted for 12 % of all GBV‑related discourse, suggesting cultural tolerance patterns.
(4) High‑profile scandals triggered a three‑fold surge in tweet volume, with sentiment flipping from –0.35 (pre‑event) to +0.12 (post‑event) within 24 hours.
(5) Policy announcements, such as stricter GBV penalties, produced a measurable 15 % rise in positive sentiment within 48 hours, indicating a rapid public response to official interventions.
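As an illustration of the logistic‑regression component, the sketch below fits a model with a gender × region interaction by Newton-Raphson (iteratively reweighted least squares) on grouped counts. The counts are synthetic and hypothetical, chosen only so the example runs, and the mixed‑effects models and the time dimension are not included. Because the four cells are fully parameterized, the fitted coefficients reproduce the cell log‑odds exactly, which makes the result easy to check.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(rows, iters: int = 25) -> list[float]:
    """Newton-Raphson (IRLS) fit on grouped binomial data.

    rows: (features, n_tweets, n_positive) tuples; an intercept is prepended,
    so the returned weights are [intercept, *feature_coefficients].
    """
    k = len(rows[0][0]) + 1
    w = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # gradient of the log-likelihood
        hess = [[0.0] * k for _ in range(k)]  # X^T W X (negative Hessian)
        for feats, n, s in rows:
            x = [1.0, *feats]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(k):
                grad[i] += (s - n * p) * x[i]
                for j in range(k):
                    hess[i][j] += n * p * (1.0 - p) * x[i] * x[j]
        step = solve(hess, grad)
        w = [wi + si for wi, si in zip(w, step)]
    return w


# Synthetic counts. Features: (male, east_asia, male * east_asia);
# outcome: tweet assigned to a "justification" topic.
rows = [
    ((0, 0, 0), 100, 5),   # female, Western
    ((1, 0, 0), 100, 10),  # male, Western
    ((0, 1, 0), 100, 8),   # female, East Asian
    ((1, 1, 1), 100, 30),  # male, East Asian
]
w = fit_logistic(rows)
print([round(v, 3) for v in w])
```

Exponentiating a coefficient gives an odds ratio; for example, `math.exp(w[3])` is the multiplicative change in the odds of a justification tweet for male East Asian users beyond what the separate gender and region effects predict.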

The authors acknowledge several limitations: (a) gender and location inference are imperfect and introduce classification noise; (b) Twitter’s user base is not demographically representative of the general population; (c) sentiment analysis on short, informal text can misinterpret sarcasm or idiomatic expressions. They propose future work that integrates additional platforms (Facebook, Instagram), refines user‑metadata extraction, and combines quantitative tweet analysis with qualitative interview data.

In conclusion, the study demonstrates that big‑data analytics of Twitter can uncover nuanced, region‑specific, and gender‑specific attitudes toward GBV, while also providing a near‑real‑time gauge of how external events and policy measures reshape public discourse. This capability offers governments, NGOs, and advocacy groups a powerful evidence‑based tool for designing, implementing, and evaluating interventions aimed at reducing gender‑based violence worldwide.


📜 Original Paper Content