Exploiting contextual information to improve stance detection in informal political discourse with LLMs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

This study investigates the use of Large Language Models (LLMs) for political stance detection in informal online discourse, where language is often sarcastic, ambiguous, and context-dependent. We explore whether providing contextual information, specifically user profile summaries derived from historical posts, can improve classification accuracy. Using a real-world political forum dataset, we generate structured profiles that summarize users’ ideological leaning, recurring topics, and linguistic patterns. We evaluate seven state-of-the-art LLMs across baseline and context-enriched setups through a comprehensive cross-model evaluation. Our findings show that contextual prompts significantly boost accuracy, with improvements ranging from +17.5% to +38.5% and reaching up to 74% accuracy, surpassing previous approaches. We also analyze how profile size and post selection strategies affect performance, showing that strategically chosen political content yields better results than larger, randomly selected contexts. These findings underscore the value of incorporating user-level context to enhance LLM performance in nuanced political classification tasks.


💡 Research Summary

This paper investigates whether enriching large language model (LLM) inputs with user‑level contextual information can improve political stance detection in informal online discourse, where sarcasm, ambiguity, and implicit cues are common. The authors assembled a real‑world dataset from the politics.com forum, extracting 56,035 posts authored by 257 users who self‑declared a clear LEFT or RIGHT affiliation. For each user, 70% of their posts were reserved for building a structured profile, while the remaining 30% formed a held‑out test set.
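The per-user split described above can be sketched as follows. This is a minimal illustration of the 70/30 partition, not the authors' code; the function and field names are assumptions.

```python
import random
from collections import defaultdict


def split_user_posts(posts, profile_frac=0.7, seed=0):
    """Split each user's posts into a profile-building set (70%) and a
    held-out test set (30%), per user rather than globally.

    `posts` is a list of (user_id, text) pairs; the ratio follows the
    paper, but the data layout here is an illustrative assumption.
    """
    by_user = defaultdict(list)
    for user_id, text in posts:
        by_user[user_id].append(text)

    profile_set, test_set = {}, {}
    rng = random.Random(seed)
    for user_id, user_posts in by_user.items():
        shuffled = user_posts[:]
        rng.shuffle(shuffled)  # avoid temporal ordering bias in the split
        cut = int(len(shuffled) * profile_frac)
        profile_set[user_id] = shuffled[:cut]
        test_set[user_id] = shuffled[cut:]
    return profile_set, test_set
```

Splitting per user (rather than pooling all posts) ensures every user contributes both profile material and unseen test posts.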

A user profile is a JSON object containing an inferred political leaning, confidence level, a short list of linguistic or topical indicators, frequent discussion subjects, a tone summary, and optional free‑text insights. Profiles were generated using Gemini 2.0 Flash (1 M‑token context window) with a prompt that extracts salient patterns from the user’s historical posts.
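Based on the fields listed above, a generated profile might look like the sketch below. All values and key names are illustrative assumptions, not the paper's exact schema.

```python
import json

# Illustrative user profile mirroring the fields described in the text:
# inferred leaning, confidence, indicators, topics, tone, and free-text
# insights. Key names are assumptions, not the paper's schema.
example_profile = {
    "political_leaning": "RIGHT",
    "confidence": "high",
    "indicators": [
        "frequent use of 'fiscal responsibility'",
        "critical framing of federal spending",
    ],
    "frequent_topics": ["taxation", "immigration", "second amendment"],
    "tone_summary": "assertive, often sarcastic toward opposing views",
    "additional_insights": "most active in election-season threads",
}

print(json.dumps(example_profile, indent=2))
```

A compact JSON object like this fits easily alongside the target post in a prompt, which is what makes the context-enriched condition practical.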

The experimental design comprises three sequential studies. Experiment 1 establishes an upper bound on performance by generating maximal profiles from all available posts for each user and evaluating seven state‑of‑the‑art LLMs on 200 balanced test posts. Compared with a baseline that feeds only the target post, the context‑augmented condition yields absolute accuracy gains of 24.5%–38.5%, reaching up to 74% overall.
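The difference between the two conditions can be sketched as a prompt-construction helper: the baseline sees only the target post, while the augmented condition prepends the user profile. The wording and function names below are illustrative assumptions, not the paper's prompts.

```python
def build_prompt(post, profile_json=None):
    """Assemble a stance-classification prompt.

    With `profile_json` set, this is the context-augmented condition;
    without it, the post-only baseline. Prompt wording is illustrative.
    """
    lines = ["Classify the political stance of the post below as LEFT or RIGHT."]
    if profile_json is not None:
        lines.append(f"Author profile (JSON): {profile_json}")
    lines.append(f"Post: {post}")
    lines.append("Answer with exactly one word: LEFT or RIGHT.")
    return "\n".join(lines)
```

Keeping both conditions in one helper makes the baseline and augmented runs differ only in the presence of the profile, which isolates the contribution of context.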

Experiment 2 explores how profile size and post‑selection strategy influence performance. Five selection policies are implemented: (i) PoliticalSignalSelection, which scores posts using weighted political‑term lexicons (general, party‑specific, hot‑button issues) and selects the top‑scoring 60% plus topically diverse posts; (ii) RandomSelection; (iii) ControversialTopicSelection; (iv) RecentPostSelection; and (v) LongFormSelection. Each policy is combined with eight post‑count settings (1, 2, 3, 5, 10, 20, 30, 50), yielding 40 experimental conditions evaluated on roughly 10,000 classification instances. The results show that PoliticalSignalSelection with 10–20 posts per profile provides near‑optimal performance; adding more posts yields diminishing returns due to token limits and noise.
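The lexicon-based scoring behind PoliticalSignalSelection can be sketched as follows. The term lists and weights are hypothetical stand-ins (the paper's actual lexicons are not reproduced here), and this simplification omits the policy's topical-diversity component.

```python
# Hypothetical weighted lexicons standing in for the paper's general,
# party-specific, and hot-button term lists.
GENERAL_TERMS = {"policy": 1.0, "government": 1.0, "election": 1.5}
PARTY_TERMS = {"democrat": 2.0, "republican": 2.0, "liberal": 2.0, "conservative": 2.0}
HOT_BUTTON_TERMS = {"abortion": 3.0, "immigration": 3.0, "gun": 3.0}


def political_signal_score(post):
    """Score a post by summing the weights of matched political terms."""
    words = set(post.lower().split())
    score = 0.0
    for lexicon in (GENERAL_TERMS, PARTY_TERMS, HOT_BUTTON_TERMS):
        score += sum(w for term, w in lexicon.items() if term in words)
    return score


def select_posts(posts, n):
    """Keep the n highest-scoring posts (the paper's policy additionally
    mixes in topically diverse posts, which this sketch omits)."""
    return sorted(posts, key=political_signal_score, reverse=True)[:n]
```

The intuition matches the experiment's finding: a post dense in political terms carries more stance signal per token than a randomly chosen one.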

Experiment 3 conducts a cross‑model analysis using the optimized configuration (PoliticalSignalSelection, 20 posts). Seven LLMs are tested: Claude 3.7 Sonnet, Grok‑2‑1212, GPT‑4o Mini, Mistral Small‑24B, Meta‑LLaMA 3.1‑70B, Qwen, and Gemini 2.0 Flash. All models benefit from the user‑profile context, but the magnitude varies. GPT‑4o Mini and Gemini 2.0 Flash achieve the largest improvements (~38% absolute gain), while smaller models still see gains of 17%–30%. The authors attribute the differences to each model's token handling capacity and the breadth of political content present in its pre‑training data.

Key insights include: (1) user‑level metadata acts as a powerful disambiguation cue for sarcastic or neutral‑tone posts; (2) the quality of contextual information outweighs sheer quantity—targeted political posts are far more informative than a larger random sample; (3) the contextual prompting approach is model‑agnostic, delivering consistent accuracy lifts across diverse architectures.

The paper also acknowledges limitations: profile generation incurs additional computational cost and raises privacy/ethical concerns about aggregating user history. Moreover, the study focuses on binary LEFT/RIGHT classification and a single forum, leaving open questions about multi‑label or cross‑platform generalization.

Future work is outlined to combine the presented contextual enrichment with advanced reasoning techniques such as Chain‑of‑Thought, ReAct, or Preference‑Optimized prompting, potentially yielding even finer-grained political nuance detection. The authors also propose scaling the approach to larger, multilingual datasets and investigating privacy‑preserving profile synthesis. Overall, the study demonstrates that integrating structured user profiles into LLM prompts substantially boosts stance detection performance in informal political discourse.

