Quantifying and Attributing Polarization to Annotator Groups

Notice: This research summary and analysis were generated automatically with AI. For full accuracy, please refer to the original arXiv paper.

Current annotation agreement metrics are not well-suited for inter-group analysis, are sensitive to group size imbalances, and are restricted to single-annotation settings. These restrictions render them insufficient for many subjective tasks such as toxicity and hate-speech detection. For this reason, we introduce a quantifiable metric, paired with a statistical significance test, that attributes polarization to various annotator groups. Our metric enables direct comparisons between heavily imbalanced sociodemographic and ideological subgroups across different datasets and tasks, while also supporting analysis in multi-label settings. We apply this metric to three datasets on hate speech and one on toxicity detection, discovering that: (1) Polarization is strongly and persistently attributed to annotator race, especially on the hate speech task. (2) Religious annotators do not fundamentally disagree with each other, but do disagree with other annotators, a trend that gradually diminishes and eventually reverses among irreligious annotators. (3) Less educated annotators are more subjective, while more educated ones tend to agree more broadly among themselves. Overall, our results reflect current findings around annotation patterns for various subgroups. Finally, we estimate the minimum number of annotators needed to obtain robust results, and provide an open-source Python library that implements our metric.


💡 Research Summary

The paper addresses a critical shortcoming in current annotation agreement metrics—namely, their inability to handle inter‑group analyses, sensitivity to imbalanced group sizes, and restriction to single‑label settings. These limitations are especially problematic for subjective NLP tasks such as toxicity and hate‑speech detection, where minority or marginalized annotators may systematically disagree with the majority, and such disagreements are often erased by majority‑voting aggregation.

To overcome these issues, the authors introduce a novel metric called apunim (Aposteriori Unimodality) together with a parametric statistical significance test. The core idea is to treat polarization as a clustering problem on the histograms of annotation values rather than as simple agreement. Polarization is quantified using nDFU (normalized Distance from Unimodality), a value in [0, 1] where 0 indicates a unimodal (non-polarized) distribution of annotations and values closer to 1 indicate stronger polarization.
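As a rough illustration of the idea, the following sketch computes nDFU for a list of ordinal ratings, based on the published Distance-From-Unimodality definition: a unimodal histogram rises to its peak and then falls, and DFU is the largest adjacent-bin violation of that pattern, normalized by the peak frequency. This is an assumption-laden sketch, not the authors' library; their implementation may differ in details.

```python
import numpy as np

def ndfu(ratings, n_bins):
    """Normalized Distance From Unimodality of a rating histogram.

    Returns a value in [0, 1]: 0 means perfectly unimodal (no polarization),
    values near 1 mean strong polarization (e.g. a bimodal split).
    Sketch based on the generic DFU definition, not the paper's library.
    """
    hist = np.bincount(np.asarray(ratings), minlength=n_bins).astype(float)
    hist /= hist.sum()          # relative frequencies per rating bin
    peak = hist.argmax()        # index of the mode
    dfu = 0.0
    # Left of the peak the histogram should be non-decreasing:
    # any drop on the way up is a unimodality violation.
    for i in range(peak):
        dfu = max(dfu, hist[i] - hist[i + 1])
    # Right of the peak it should be non-increasing:
    # any rise on the way down is a violation.
    for i in range(peak, len(hist) - 1):
        dfu = max(dfu, hist[i + 1] - hist[i])
    return dfu / hist.max()     # normalize by the peak frequency

# A unanimous-leaning histogram yields 0; a two-camp split yields values near 1.
print(ndfu([0, 0, 1, 1, 1, 2], n_bins=3))  # → 0.0 (unimodal)
print(ndfu([0, 0, 0, 2, 2, 2], n_bins=3))  # → 1.0 (fully bimodal)
```

Under this framing, attributing polarization to a group attribute would amount to comparing the nDFU of the pooled annotations against the nDFU within each annotator subgroup.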

