Can One-sided Arguments Lead to Response Change in Large Language Models?
A balanced answer to a polemic question must present more than one viewpoint. Large Language Models (LLMs) can provide such a balanced answer, but may also adopt a single aligned viewpoint or refuse to answer. In this paper, we study whether such initial responses can be steered toward a specific viewpoint in a simple and intuitive way: by providing only one-sided arguments supporting that viewpoint. Our systematic study has three dimensions: (i) which stance is induced in the LLM response, (ii) how the polemic question is formulated, and (iii) how the arguments are presented. We construct a small dataset and, remarkably, find that opinion steering occurs across (i)-(iii) for diverse models, numbers of arguments, and topics. Replacing the arguments with those from other questions consistently reduces opinion steering.
💡 Research Summary
The paper investigates whether large language models (LLMs) can be nudged toward a specific stance on binary polemic questions simply by presenting them with one‑sided arguments that support that stance. The authors frame the problem around three independent dimensions: (i) the formulation of the question—either a non‑personal “YES/NO” style or a personal “I agree/I disagree” style; (ii) the way the model is asked to respond—either a direct answer to the question or a confirmation of the implied viewpoint (e.g., “Is this bad?” vs. “This is bad, right?”); and (iii) the presentation of the arguments—either as a dialog where the model is told it has already agreed with each argument, or as a block of text where the arguments are listed sequentially.
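The two argument-presentation styles can be made concrete with a small sketch. The exact prompt wording, the `block_prompt`/`dialog_prompt` helpers, and the chat-message format below are illustrative assumptions, not the authors' actual prompts:

```python
# Hypothetical sketch of the two presentation styles described above.
# Prompt wording and message format are assumptions for illustration.

QUESTION = "Is X good? Answer YES or NO."
ARGS = ["Argument 1 ...", "Argument 2 ...", "Argument 3 ..."]

def block_prompt(question: str, args: list[str]) -> str:
    """Block style: all one-sided arguments listed before the question."""
    listed = "\n".join(f"- {a}" for a in args)
    return f"Consider the following arguments:\n{listed}\n\n{question}"

def dialog_prompt(question: str, args: list[str]) -> list[dict]:
    """Dialog style: a chat history in which the model has already
    agreed with each argument, followed by the question."""
    messages = []
    for a in args:
        messages.append({"role": "user", "content": a})
        messages.append({"role": "assistant", "content": "I agree."})
    messages.append({"role": "user", "content": question})
    return messages
```

The dialog variant matches the personal framing (the model's own prior "agreement" is part of the context), while the block variant matches the non-personal framing.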
To evaluate these dimensions, the authors construct a small dataset of 30 polemic topics drawn from historical, political, and religious domains. For each topic they manually write 3–7 concise, non‑personal, one‑sided arguments (132 arguments in total), ensuring each argument supports the target viewpoint and is on‑topic. Five contemporary LLMs are tested: gpt‑oss‑120b, Llama 3.3 70B, Llama 3.1 8B, Mistral 7B, and Gemma 3 4B. For every combination of question type, response mode, and argument presentation, the model is prompted 50 times; majority voting determines the final label: positive (agree/YES), negative (disagree/NO), or neutral (refusal, balanced answer, or no answer).
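The sampling-and-voting protocol can be sketched as follows. This is a minimal illustration of majority voting over repeated samples; the `classify_response` keyword mapping is an assumption, not the authors' actual labeling code:

```python
from collections import Counter

def classify_response(text: str) -> str:
    """Map a raw model response to one of the three labels
    (hypothetical keyword heuristic, assumed for illustration)."""
    t = text.strip().lower()
    if t.startswith(("yes", "i agree")):
        return "positive"   # agree / YES
    if t.startswith(("no", "i disagree")):
        return "negative"   # disagree / NO
    return "neutral"        # refusal, balanced answer, or no answer

def majority_label(responses: list[str]) -> str:
    """Majority vote over repeated samples of the same prompt."""
    counts = Counter(classify_response(r) for r in responses)
    return counts.most_common(1)[0][0]

# Example: 50 sampled responses for one experimental setting
samples = (["Yes, because ..."] * 28
           + ["No."] * 12
           + ["I can't take a side."] * 10)
print(majority_label(samples))  # -> positive
```

In the paper's setup, `samples` would hold the 50 model completions for one combination of question type, response mode, and argument presentation.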
Key Findings
- Robust Opinion Steering – Across almost all settings, adding one‑sided arguments dramatically increases the proportion of positive responses and never increases negative responses. The effect is consistent across models, topics, and argument counts.
- Alignment of Personal/Non‑Personal Stance – The strongest steering occurs when the personal nature of the question matches the presentation style of the arguments: (a) personal “Agree/Disagree” questions paired with dialog‑style arguments, and (b) non‑personal “YES/NO” questions paired with block‑style arguments. This suggests that LLMs condition their reasoning on the social framing of the prompt.
- Topic‑Dependent Variation – Political questions yield the highest rates of “convinced” opinions, followed by historical and religious ones. Conversely, the largest share of remaining negative opinions appears in religious topics.
- Argument Quantity – Both small (3) and larger (7) sets of arguments can be convincing; no monotonic relationship between argument count and steering effectiveness is observed.
- Content‑Driven Effect – When the original arguments are swapped with arguments from a different question—either within the same topic class or across classes—the proportion of positive responses drops markedly, especially for cross‑class swaps. This demonstrates that the steering effect is primarily due to the semantic content of the arguments rather than spurious model correlations or a generic “sycophancy” tendency.
Implications
The study shows that simple, one‑sided argumentation can bypass alignment safeguards and reduce over‑refusal behavior, offering a new vector for both benign prompting and malicious manipulation. It highlights a potential route for creating echo chambers or spreading misinformation by feeding LLMs biased argument streams. The findings also suggest that argument‑based prompting could serve as a proxy for measuring the persuasiveness of arguments, though further work is needed to validate this against human judgments.
Limitations and Future Work
The arguments are deliberately factual‑neutral and short; real‑world polemics often contain misinformation, hate speech, or complex rhetorical devices, which may amplify ethical risks. The study is limited to five models and English prompts; extending to larger models, multilingual settings, and diverse cultural contexts is necessary. Future research directions include (i) comparing LLM opinion change with human persuasion outcomes, (ii) probing the impact of deliberately false or hateful one‑sided arguments, and (iii) designing system‑level defenses that detect and mitigate steering attempts based on argument content.
In sum, the paper provides compelling empirical evidence that one‑sided arguments are an effective, content‑driven tool for steering LLM responses across a range of question formulations and presentation styles, raising important considerations for alignment, safety, and the societal impact of conversational AI.