Addressing Data Imbalance in Transformer-Based Multi-Label Emotion Detection with Weighted Loss


This paper explores the application of a simple weighted loss function to Transformer-based models for multi-label emotion detection in SemEval-2025 Shared Task 11. Our approach addresses data imbalance by dynamically adjusting class weights, thereby enhancing performance on minority emotion classes without the computational burden of traditional resampling methods. We evaluate BERT, RoBERTa, and BART on the BRIGHTER dataset, using evaluation metrics such as Micro F1, Macro F1, ROC-AUC, Accuracy, and the Jaccard similarity coefficient. The results demonstrate that the weighted loss function improves performance on high-frequency emotion classes but shows limited impact on minority classes. These findings underscore both the effectiveness and the challenges of applying this approach to imbalanced multi-label emotion detection.


💡 Research Summary

The paper investigates a straightforward yet effective approach to mitigate class imbalance in multi‑label emotion detection by incorporating class‑wise weighting directly into the loss function of Transformer‑based models. The authors focus on the SemEval‑2025 Task 11 “BRIGHTER” dataset, specifically its English subset, which contains short text instances annotated with five emotion labels (anger, fear, joy, sadness, surprise). Because each instance may belong to multiple emotions, traditional oversampling or undersampling techniques can distort label co‑occurrence patterns, making them unsuitable for this scenario.

To address this, the authors compute a weight w_j for each emotion class j as the inverse of its frequency relative to the total number of training samples, then normalize by the maximum weight so that rare classes receive higher emphasis. This weight is multiplied into the binary cross‑entropy (BCE) loss, yielding a weighted BCE loss L′. Additionally, label smoothing with a small ε is applied to prevent over‑confidence on majority classes. The loss formulation is:

L′ = – (1/N) Σ_i Σ_j w_j [ ỹ_ij log p_ij + (1 – ỹ_ij) log(1 – p_ij) ],

where ỹ_ij = y_ij (1 – ε) + ε/2 is the smoothed target and p_ij is the model's sigmoid output for class j on instance i.
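The weight computation and loss described above can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code; the function names (`class_weights`, `weighted_bce`) and the exact smoothing scheme (symmetric ε/2) are our assumptions based on the description.

```python
import numpy as np

def class_weights(Y):
    """Inverse-frequency class weights, normalized by the maximum weight.

    Y: (N, C) binary label matrix. Each class weight is the inverse of its
    relative frequency, divided by the largest such weight, so the rarest
    class gets weight 1.0 and frequent classes get smaller weights.
    """
    freq = Y.sum(axis=0) / len(Y)   # relative frequency of each class
    w = 1.0 / freq                  # inverse frequency (assumes freq > 0)
    return w / w.max()              # normalize by the maximum weight

def weighted_bce(logits, Y, w, eps=0.1):
    """Class-weighted BCE with label smoothing (a sketch of L' above).

    logits: (N, C) raw model outputs; Y: (N, C) binary targets;
    w: (C,) class weights; eps: label-smoothing factor (hypothetical value).
    """
    p = 1.0 / (1.0 + np.exp(-logits))     # sigmoid probabilities p_ij
    y = Y * (1.0 - eps) + eps / 2.0       # smoothed targets ỹ_ij
    bce = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float((w * bce).mean())        # average over instances and classes
```

Because the weights enter the loss directly, no resampling is needed, so the label co-occurrence structure of the multi-label training set is left untouched.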

