Linguistic Signatures for Enhanced Emotion Detection
Emotion detection is a central problem in NLP, with recent progress driven by transformer-based models trained on established datasets. However, little is known about the linguistic regularities that characterize how emotions are expressed across different corpora and labels. This study examines whether linguistic features can serve as reliable, interpretable signals for emotion recognition in text. We extract emotion-specific linguistic signatures from 13 English datasets and evaluate how incorporating these features into transformer models affects performance. Our RoBERTa-based models enriched with high-level linguistic features achieve consistent performance gains of up to +2.4 macro F1 on the GoEmotions benchmark, showing that explicit lexical cues can complement neural representations and improve robustness in predicting emotion categories.
💡 Research Summary
The paper investigates whether explicit linguistic cues can improve neural emotion detection models and provide interpretable insights into how emotions are expressed across diverse corpora. The authors first conduct a systematic survey of emotion‑related datasets published between 2017 and 2024, ultimately selecting 13 English datasets that span news headlines, social‑media snippets, and conversational dialogues. Using SEANCE—a tool built on the General Inquirer (GI) lexicon—they extract normalized frequency vectors for 180+ semantic categories from each emotion‑specific text collection. By retaining only the most frequent features that appear in at least half of the instances, they construct compact “linguistic signatures” for each of 30 harmonized emotion labels (e.g., joy, anger, admiration). These signatures reveal both universal markers (Active_GI, Iav_GI, Strong_GI) and emotion‑specific markers (Virtue_GI for admiration, Hostile_GI for anger, Need_GI for desire). Pairwise Jaccard analysis shows that while many emotions share common cues, a subset exhibits distinctive patterns, confirming the feasibility of cross‑dataset signature extraction (RQ1).
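The extraction pipeline described above can be sketched in a few lines: keep the GI categories that occur in at least half of an emotion's instances, then compare emotion signatures with Jaccard similarity. This is an illustrative reconstruction, not the authors' code — SEANCE would supply the per-instance category vectors, and the function names, `min_coverage` threshold, and `top_k` cap here are assumptions.

```python
from collections import Counter

def linguistic_signature(feature_vectors, min_coverage=0.5, top_k=10):
    """Build a compact signature for one emotion label.

    feature_vectors: one dict per text instance, mapping a GI category
    name (e.g. "Hostile_GI") to its normalized frequency in that text.
    Keeps only categories present in at least `min_coverage` of the
    instances, ranked by how many instances contain them.
    """
    n = len(feature_vectors)
    counts = Counter(cat for vec in feature_vectors
                     for cat, freq in vec.items() if freq > 0)
    frequent = [cat for cat, c in counts.most_common() if c / n >= min_coverage]
    return set(frequent[:top_k])

def jaccard(sig_a, sig_b):
    """Pairwise overlap between two emotion signatures."""
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 0.0

# Toy "anger" instances with hypothetical GI frequencies:
anger = [
    {"Active_GI": 0.04, "Hostile_GI": 0.06},
    {"Hostile_GI": 0.05, "Strong_GI": 0.02},
    {"Active_GI": 0.03, "Hostile_GI": 0.07},
]
print(linguistic_signature(anger))  # Hostile_GI (3/3) and Active_GI (2/3) survive
```

A low Jaccard score between two signatures built this way would mark an emotion pair with distinctive cues, matching the paper's RQ1 analysis.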
To test whether these signatures can boost transformer performance (RQ2), the authors propose two integration strategies for RoBERTa. The first, RoBERTa‑LexEnhance, concatenates the