Learn to Select: Exploring Label Distribution Divergence for In-Context Demonstration Selection in Text Classification

Reading time: 6 minutes

📝 Abstract

In-context learning (ICL) for text classification, which uses a few input-label demonstrations to describe a task, has demonstrated impressive performance on large language models (LLMs). However, the selection of in-context demonstrations plays a crucial role and can significantly affect LLMs’ performance. Most existing demonstration selection methods primarily focus on semantic similarity between test inputs and demonstrations, often overlooking the importance of label distribution alignment. To address this limitation, we propose a two-stage demonstration selection method, TopK + Label Distribution Divergence (L2D), which leverages a fine-tuned BERT-like small language model (SLM) to generate label distributions and calculate their divergence for both test inputs and candidate demonstrations. This enables the selection of demonstrations that are not only semantically similar but also aligned in label distribution with the test input. Extensive experiments across seven text classification benchmarks show that our method consistently outperforms previous demonstration selection strategies. Further analysis reveals a positive correlation between the performance of LLMs and the accuracy of the underlying SLMs used for label distribution estimation.

📄 Content

Learn to Select: Exploring Label Distribution Divergence for In-Context Demonstration Selection in Text Classification

Ye Jiang 1, Taihang Wang 1, Youzheng Liu 1, Yimin Wang 2*, Yuhan Xia 3, Yunfei Long 3
1 College of Information Science and Technology, Qingdao University of Science and Technology
2 College of Data Science, Qingdao University of Science and Technology
3 School of Electronic Engineering and Computer Science, Queen Mary University of London
*Corresponding Author

arXiv:2511.10675v1 [cs.CL] 10 Nov 2025

Introduction

In-context learning (ICL) (Brown et al. 2020) is an emergent capability of large language models (LLMs), enabling them to make accurate predictions based on only a few input-output demonstrations provided at inference time (Dong et al. 2024). Compared to standard zero-shot prompting, ICL has shown superior effectiveness in leveraging in-context demonstrations, and has become a new paradigm for tackling a wide range of text classification tasks, including fake news detection (Jiang and Wang 2024) and natural language inference (Xu et al. 2024).

However, the classification accuracy of LLMs employing ICL is highly sensitive to the choice and ordering of demonstrations. Prior studies (Min et al. 2022; Liu et al. 2022) have shown that even minor changes in the order of examples can lead to substantial variability in model predictions. To address this, retrieval-based methods have been proposed (Rubin, Herzig, and Berant 2022), which aim to select demonstrations that are semantically similar to the test input. These approaches have been shown to consistently outperform random selection strategies.

Figure 1: A comparison of 2-shot in-context demonstrations retrieved by different selection methods in SST-2. Although the test input is labeled as having a positive sentiment, the overall semantics are somewhat ambiguous or controversial. Our method effectively captures the adversative conjunction in the demonstrations and aligns the label distributions with that of the test input.

Furthermore, many studies have examined additional factors that can affect ICL performance in text classification tasks. For example, Iter et al. (2023) suggest that the effectiveness of in-context examples is negatively correlated with the perplexity of a fine-tuned model on the test samples. Similarly, Peng et al. (2024) demonstrate a positive correlation between ICL performance and the model's comprehension of the test inputs. While these studies have achieved promising results, they predominantly emphasize semantic consistency between the test input and the selected demonstrations. However, semantically similar examples may still contain contradictory or inconsistent labels, undermining their effectiveness. Moreover, test inputs may exhibit label ambiguity due to semantic uncertainty, as shown in Figure 1.

A recent study (Fei et al. 2023) suggests that the performance improvements observed in LLMs through demonstrations may not primarily arise from accurate input-label pairings. In fact, demonstrations with randomly assigned (Min et al. 2022) or symbolic (Wei et al. 2023) labels have been shown to produce competitive results. This challenges existing selection methods to be more robust in the presence of noisy data, where demonstrations with incorrect labels are often assigned to semantically similar test inputs.

To address the above issues, we propose a two-stage method, denoted as TopK + Label Distribution Divergence (L2D). Specifically, we apply the TopK retrieval method (Liu et al. 2022) to extract a candidate pool of demonstrations from the training set, selected based on semantic similarity to the test input.
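To make the two-stage pipeline concrete, here is a minimal illustrative sketch, not the authors' implementation. It assumes precomputed sentence embeddings and SLM-predicted label distributions are already available; the function names (`topk_by_similarity`, `l2d_select`) and the choice of KL divergence as the divergence measure are assumptions for illustration.

```python
import numpy as np

def topk_by_similarity(test_emb, cand_embs, k):
    """Stage 1: rank candidates by cosine similarity to the test input."""
    sims = cand_embs @ test_emb / (
        np.linalg.norm(cand_embs, axis=1) * np.linalg.norm(test_emb) + 1e-12
    )
    return np.argsort(-sims)[:k]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two label distributions, with smoothing."""
    p, q = np.asarray(p, dtype=float) + eps, np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def l2d_select(test_emb, test_label_dist, cand_embs, cand_label_dists,
               k_candidates=8, n_shots=2):
    # Stage 1: narrow the training set to a semantically similar pool (TopK).
    pool = topk_by_similarity(test_emb, cand_embs, k_candidates)
    # Stage 2: re-rank the pool by divergence between the SLM's label
    # distribution for the test input and for each candidate demonstration;
    # the lowest-divergence candidates become the in-context examples.
    ranked = sorted(pool, key=lambda i: kl_divergence(test_label_dist,
                                                      cand_label_dists[i]))
    return ranked[:n_shots]
```

In this sketch, a candidate that is semantically close but whose predicted label distribution disagrees with the test input's (e.g. confidently the opposite sentiment) is pushed down the ranking in stage 2, which is the intuition behind pairing TopK with label distribution divergence.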

This content is AI-processed based on ArXiv data.
