Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models
Large Language Models (LLMs) display remarkable capabilities to understand or even produce political discourse but have been found to consistently exhibit a progressive left-leaning bias. At the same time, so-called persona or identity prompts have been shown to produce LLM behavior that aligns with socioeconomic groups with which the base model is not aligned. In this work, we analyze whether zero-shot persona prompting with limited information can accurately predict individual voting decisions and, by aggregation, accurately predict the positions of European groups on a diverse set of policies. We evaluate whether predictions are stable in response to counterfactual arguments, different persona prompts, and generation methods. Finally, we find that we can simulate the voting behavior of Members of the European Parliament reasonably well, achieving a weighted F1 score of approximately 0.793. Our persona dataset of politicians in the 2024 European Parliament and our code are available at the following URL: https://github.com/dess-mannheim/european_parliament_simulation.
💡 Research Summary
This paper investigates whether large language models (LLMs) can accurately simulate the voting behavior of individual Members of the European Parliament (MEPs) by using zero‑shot persona prompting with only limited publicly available information. The authors collect a comprehensive dataset from the HowTheyVote project, focusing on roll‑call votes where every MEP’s individual decision is recorded. The final corpus contains 27,770 votes cast by 710 MEPs across 47 high‑profile legislative proposals from the 2024 parliamentary term. For each MEP, two types of persona descriptions are constructed: (1) a concise attribute‑based prompt listing name, gender, age, birthplace, represented country, European political group, and national party; and (2) a longer summary generated by Llama‑3‑70B from the MEP’s English Wikipedia article.
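The attribute-based persona prompt described above can be sketched as a simple template over the seven listed fields. This is a minimal illustration, not the authors' actual template: the wording, field names, and the example MEP are all hypothetical.

```python
# Sketch of an attribute-based persona prompt built from the seven fields
# the paper lists (name, gender, age, birthplace, country, European group,
# national party). The phrasing is illustrative, not the authors' template.

def build_persona_prompt(mep: dict) -> str:
    """Assemble a concise second-person persona description for one MEP."""
    return (
        f"You are {mep['name']}, a {mep['age']}-year-old {mep['gender']} "
        f"politician born in {mep['birthplace']}. You represent "
        f"{mep['country']} in the European Parliament as a member of the "
        f"{mep['group']} group and of the national party "
        f"{mep['national_party']}."
    )

# Hypothetical MEP, not an entry from the paper's dataset.
example_mep = {
    "name": "Jane Doe",
    "age": 52,
    "gender": "female",
    "birthplace": "Vienna",
    "country": "Austria",
    "group": "Renew Europe",
    "national_party": "NEOS",
}
print(build_persona_prompt(example_mep))
```

The longer Wikipedia-based persona variant would replace this template's output with the Llama‑3‑70B-generated summary, leaving the rest of the pipeline unchanged.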
To avoid leaking each group's official stance, the authors feed the LLMs the speeches of each European group's representative (the "group position" speeches) in randomized order, anonymizing any party or person names with placeholders. The prompt then asks the model to output exactly one of three options – FOR, AGAINST, or ABSTENTION – optionally preceded by chain‑of‑thought reasoning. Two response strategies are explored: (r) reasoning first, then a vote; (nr) a direct vote without explicit reasoning. Each persona is queried three times, and the mean weighted F1 score (accounting for the heavily imbalanced class distribution: ~77% FOR, 17% AGAINST, 6% ABSTENTION) is used as the primary evaluation metric.
Four LLMs are evaluated: Llama‑3‑70B, Llama‑3‑8B, Qwen‑2.5‑72B, and Qwen‑2.5‑7B. All models were selected because their training cut‑off predates the earliest vote (16 January 2024), minimizing data leakage. The results show that the largest Llama model (70B), combined with the attribute‑based persona and a reasoning chain, achieves the highest weighted F1 of 0.793. The smaller Llama‑3‑8B reaches 0.728, while the Qwen models perform slightly worse (0.789 for 72B, 0.670 for 7B). Reasoning improves performance for the larger models; for Qwen‑2.5‑7B, however, reasoning reduces the overall weighted F1 but dramatically increases detection of the minority classes (ABSTENTION and AGAINST), indicating a trade‑off between overall accuracy and sensitivity to less frequent outcomes.
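The support-weighted F1 metric underlying these scores is worth making concrete, because with ~77% FOR votes a degenerate always-FOR predictor already scores deceptively well. A minimal pure-Python sketch (equivalent in spirit to scikit-learn's `f1_score(..., average="weighted")`, which the authors may or may not have used):

```python
from collections import Counter

def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Per-class F1 scores averaged with weights proportional to each
    class's frequency (support) in the ground truth."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
        fp = sum(p == cls and t != cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (n / total) * f1
    return score

# Toy data mirroring the reported class imbalance: a predictor that always
# answers FOR still reaches a non-trivial weighted F1, which is why the
# minority-class results for ABSTENTION and AGAINST matter.
y_true = ["FOR"] * 77 + ["AGAINST"] * 17 + ["ABSTENTION"] * 6
y_pred = ["FOR"] * 100
print(round(weighted_f1(y_true, y_pred), 3))  # → 0.67
```

This baseline gap is what the Qwen‑2.5‑7B trade-off describes: adding reasoning sacrifices some weighted F1 while recovering the rarely predicted minority classes.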
At the group level, the best model predicts the majority position of centrist and progressive groups (S&D, Renew, Greens/EFA) with weighted F1 scores above 0.90, but struggles with far‑right or far‑left groups (ID, GUE/NGL, ECR), where scores fall below 0.60. The model consistently over‑predicts FOR votes and under‑predicts ABSTENTION, reflecting both the class imbalance and the model’s inherent bias toward affirmative decisions. An ablation study reveals that the national party affiliation is the strongest predictor of an MEP’s vote, aligning with political science findings that party discipline remains a key driver in the European Parliament.
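The group-level evaluation aggregates individual predictions into each European group's majority position. A minimal sketch of that aggregation step, with hypothetical votes rather than real data:

```python
from collections import Counter, defaultdict

def group_majority_positions(votes: list) -> dict:
    """Aggregate individual (group, vote) pairs for one proposal into each
    European group's majority position - the group-level target against
    which the predicted votes are compared."""
    by_group = defaultdict(list)
    for group, vote in votes:
        by_group[group].append(vote)
    return {g: Counter(v).most_common(1)[0][0] for g, v in by_group.items()}

# Hypothetical votes on a single proposal, not from the dataset.
votes = [("S&D", "FOR"), ("S&D", "FOR"), ("S&D", "AGAINST"),
         ("ID", "AGAINST"), ("ID", "AGAINST"), ("ID", "ABSTENTION")]
print(group_majority_positions(votes))  # → {'S&D': 'FOR', 'ID': 'AGAINST'}
```

Because the metric is computed per group, a systematic over-prediction of FOR hurts groups whose majority positions are more often AGAINST or ABSTENTION, consistent with the weaker scores reported for ID, GUE/NGL, and ECR.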
The authors discuss several limitations: (i) poor ABSTENTION prediction, (ii) reliance on short, possibly biased speech excerpts rather than full legislative texts, (iii) potential residual data leakage from pre‑training corpora, and (iv) the observed left‑leaning bias of LLMs, which hampers accurate simulation of conservative or Eurosceptic MEPs. They suggest future work should incorporate richer policy documents, explore multi‑label or probabilistic voting outputs, and develop techniques to explicitly counteract model bias.
Overall, the study demonstrates that with carefully crafted persona prompts and reasoning chains, LLMs can approximate individual voting behavior in a multi‑party, multi‑national legislature to a respectable degree, achieving a weighted F1 close to 0.80. This opens avenues for using LLM‑based agents in political simulation, scenario analysis, and the study of legislative dynamics, provided that methodological safeguards and bias‑mitigation strategies are further refined.