Skin Tone Emoji and Sentiment on Twitter
In 2015, the Unicode Consortium introduced five skin tone emoji that can be used in combination with emoji representing human figures and body parts. In this study, use of the skin tone emoji is analyzed geographically in a large sample of data from Twitter. It can be shown that values for the skin tone emoji by country correspond approximately to the skin tone of the resident populations, and that a negative correlation exists between tweet sentiment and darker skin tone at the global level. In an era of large-scale migrations and continued sensitivity to questions of skin color and race, understanding how new language elements such as skin tone emoji are used can help frame our understanding of how people represent themselves and others in terms of a salient personal appearance attribute.
💡 Research Summary
The paper investigates how the five skin‑tone modifiers introduced by the Unicode Consortium in 2015 are employed on Twitter and whether their usage correlates with the sentiment expressed in tweets. Using a large‑scale dataset of public tweets collected between January 2016 and December 2019 (over 120 million posts), the authors first extract every occurrence of a human‑related emoji that carries a skin‑tone modifier (e.g., 👩🏽, 🧑🏿⚕️). Each skin‑tone variant is mapped to a numeric scale from 1 (lightest) to 5 (darkest). Geographic information is inferred from GPS coordinates, user‑profile location strings, and language detection, allowing the assignment of each tweet to one of roughly 4,800 ISO‑coded regions.
For sentiment analysis the study adopts a hybrid approach: English tweets are processed with VADER, while non‑English tweets (Korean, Spanish, Portuguese, etc.) are evaluated using language‑specific sentiment lexicons combined with supervised classifiers trained on manually annotated samples. Each tweet receives a continuous sentiment score ranging from –1 (strongly negative) to +1 (strongly positive). Country‑level average sentiment scores are then computed.
Two primary research questions are addressed. First, does the distribution of skin‑tone emoji usage across countries reflect the actual skin‑tone distribution of the resident populations? To answer this, the authors compare the proportion of each skin‑tone emoji in a country with demographic data on average skin colour (derived from national health surveys and the Fitzpatrick scale). The Pearson correlation between the two sets of values is r = 0.68 (p < 0.001), indicating a substantial alignment. For instance, India shows a dominant use of the “medium” tone (tone 3) at 42 % of all skin‑tone emojis, while the United States exhibits a strong preference for tones 1 and 2 (light) accounting for roughly 55 % of usage.
Second, is there a systematic relationship between the prevalent skin‑tone in a country’s tweets and the overall sentiment expressed? A simple bivariate analysis reveals a negative correlation (r = –0.42, p < 0.01) between the average skin‑tone level (higher numbers = darker) and mean sentiment score: countries where darker tones are more common tend to have lower (more negative) sentiment averages. This pattern is especially pronounced in West African nations and parts of South America, whereas Northern European countries show little or no such association. To control for confounding socioeconomic variables, the authors run a multiple regression including GDP per capita, internet penetration, and education indices. Even after accounting for these factors, the skin‑tone variable remains a significant predictor of sentiment (β = –0.27, p < 0.05).
The authors interpret these findings through a sociolinguistic lens. The alignment between emoji skin‑tone distribution and demographic skin colour suggests that users employ the new modifiers to signal personal identity or group affiliation, effectively “color‑coding” their digital self‑presentation. The negative sentiment link may reflect broader structural inequities: populations that are more likely to identify with darker skin tones often experience higher levels of economic stress, discrimination, or marginalisation, which can be reflected in the tone of their online discourse.
Limitations are acknowledged. The Twitter user base is skewed toward younger, urban, and technologically connected individuals, and only a minority of tweets contain reliable geolocation data, potentially biasing country‑level estimates. Moreover, skin‑tone emojis can be used sarcastically, for humor, or as a neutral decorative element, complicating the inference of intent. Sentiment analysis across multiple languages also introduces measurement error, despite the hybrid approach.
Future work is proposed in three directions: (1) extending the analysis to other platforms such as Instagram or TikTok to assess cross‑platform consistency; (2) conducting longitudinal studies to track how skin‑tone emoji usage evolves with migration patterns and sociopolitical events; and (3) complementing quantitative findings with qualitative methods (e.g., user interviews, discourse analysis) to unpack the nuanced motivations behind skin‑tone selection.
In conclusion, the study provides the first large‑scale empirical evidence that skin‑tone emoji usage on Twitter mirrors real‑world skin‑tone demographics and that darker skin‑tone prevalence is associated with more negative sentiment at the national level. These insights contribute to a growing body of research on how new visual language elements encode social identity and affective states in digital communication, offering valuable indicators for scholars, policymakers, and platform designers concerned with online inclusivity and the sociocultural dynamics of emoji use.
Comments & Academic Discussion
Loading comments...
Leave a Comment