Persona Generators: Generating Diverse Synthetic Personas at Scale
Evaluating AI systems that interact with humans requires understanding their behavior across diverse user populations, but collecting representative human data is often expensive or infeasible, particularly for novel technologies or hypothetical future scenarios. Recent work in Generative Agent-Based Modeling has shown that large language models can simulate human-like synthetic personas with high fidelity, accurately reproducing the beliefs and behaviors of specific individuals. However, most approaches require detailed data about target populations and often prioritize density matching (replicating what is most probable) rather than support coverage (spanning what is possible), leaving long-tail behaviors underexplored. We introduce Persona Generators, functions that can produce diverse synthetic populations tailored to arbitrary contexts. We apply an iterative improvement loop based on AlphaEvolve, using large language models as mutation operators to refine our Persona Generator code over hundreds of iterations. The optimization process produces lightweight Persona Generators that can automatically expand small descriptions into populations of diverse synthetic personas that maximize coverage of opinions and preferences along relevant diversity axes. We demonstrate that evolved generators substantially outperform existing baselines across six diversity metrics on held-out contexts, producing populations that span rare trait combinations difficult to achieve in standard LLM outputs.
💡 Research Summary
The paper tackles a fundamental challenge in evaluating AI systems that interact with humans: the need for diverse synthetic user populations when real‑world data is scarce, costly, or impossible to obtain. While recent advances in Generative Agent‑Based Modeling (GABM) have shown that large language models (LLMs) can simulate high‑fidelity personas, most existing approaches focus on “density matching” – reproducing the most probable responses – and therefore miss long‑tail behaviors that are often critical for stress‑testing and safety analysis.
To address this gap, the authors introduce Persona Generators, a functional abstraction that takes a textual context c, a set of diversity axes D (e.g., personality traits, political views), and a target population size N, and returns a population P of N synthetic personas. Crucially, the generator is not a static prompt but a piece of code ϕ that can be evolved. The evolution is driven by AlphaEvolve, an evolutionary search framework that treats LLMs as mutation operators. In each generation, the current code is mutated by prompting an LLM (Gemini 2.5 Pro) to produce a new version; the mutated code is then executed to synthesize a population, which is evaluated in a Concordia simulation where each persona answers a questionnaire. Each persona's responses are embedded, yielding a population embedding matrix Z, and six diversity metrics M(Z) – axis‑wise coverage, entropy, min‑max distance, KL divergence, topological diversity, and rare‑combination rate – serve as the fitness function.
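The abstraction and its fitness loop can be sketched roughly as follows. All names here (`Persona`, `naive_generator`, `evaluate_fitness`) and the stubbed LLM call are illustrative assumptions, not the paper's actual code, and the toy fitness uses a single pairwise-distance score in place of the full six-metric suite M(Z):

```python
import random
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Persona:
    descriptor: str
    attributes: dict = field(default_factory=dict)

# A Persona Generator is a function phi: (context, axes, N) -> population.
PersonaGenerator = Callable[[str, list[str], int], list[Persona]]

def stub_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (Gemini 2.5 Pro in the paper).
    return f"persona for: {prompt}"

def naive_generator(context: str, axes: list[str], n: int) -> list[Persona]:
    # A trivial phi; the evolutionary loop mutates this code itself,
    # not just the prompt text it contains.
    return [Persona(stub_llm(f"{context} | vary {random.choice(axes)} | #{i}"))
            for i in range(n)]

def evaluate_fitness(population: list[Persona],
                     embed: Callable[[str], list[float]]) -> float:
    # Toy fitness: mean pairwise L2 distance between persona embeddings,
    # a stand-in for the paper's six diversity metrics over Z.
    zs = [embed(p.descriptor) for p in population]
    total, pairs = 0.0, 0
    for i in range(len(zs)):
        for j in range(i + 1, len(zs)):
            total += sum((a - b) ** 2 for a, b in zip(zs[i], zs[j])) ** 0.5
            pairs += 1
    return total / max(pairs, 1)
```

In the evolutionary loop, a candidate ϕ that scores higher on this fitness would be kept and mutated again; a lower-scoring one would be discarded.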
The generator architecture follows a two‑stage design. Stage 1 uses an autoregressive LLM to produce high‑level descriptors for each persona (e.g., “30‑year‑old female, environmentally conscious”). Stage 2 expands each descriptor in parallel into detailed background attributes (occupation, family, daily habits). This separation allows the evolutionary process to focus on high‑level diversity decisions while keeping per‑persona detail generation efficient.
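The two-stage split might look like the sketch below, with deterministic stubs standing in for the LLM calls; the function names and attribute fields are hypothetical, chosen to mirror the description above rather than the paper's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def stage1_descriptors(context: str, n: int, llm=None) -> list[str]:
    # Stage 1: a single autoregressive pass proposing n high-level
    # descriptors; a real version would prompt an LLM with the context
    # and diversity axes.
    llm = llm or (lambda _: [f"descriptor {i} for {context}" for i in range(n)])
    return llm(f"Propose {n} diverse personas for: {context}")

def stage2_expand(descriptor: str, llm=None) -> dict:
    # Stage 2: expand one descriptor into detailed background attributes.
    llm = llm or (lambda _: {"descriptor": descriptor,
                             "occupation": "unknown",
                             "family": "unknown",
                             "habits": "unknown"})
    return llm(f"Expand into a full background: {descriptor}")

def generate_population(context: str, n: int) -> list[dict]:
    # Expansions are independent, so Stage 2 can run in parallel.
    descriptors = stage1_descriptors(context, n)
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(stage2_expand, descriptors))
```

The design choice mirrors the paper's rationale: Stage 1 is where diversity decisions are made (and where evolution has the most leverage), while Stage 2 is embarrassingly parallel detail filling.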
For evaluation, the authors automatically construct four well‑known psychometric questionnaires (Big Five, DASS, SVO, NFCS) via a few‑shot prompt‑based questionnaire generator. They then run the evolution for several hundred generations, comparing the evolved generators against three baselines: (1) naive prompting with explicit diversity instructions, (2) PromptBreeder‑style prompt mutation, and (3) state‑of‑the‑art persona datasets such as PersonaHub and Nemotron‑Personas. Across all six metrics, the evolved generators achieve substantial gains: average axis coverage improves by ~27 percentage points, entropy rises by ~15%, and the min‑max distance between persona embeddings increases by ~22%. Most strikingly, the rate of generating rare trait combinations (e.g., high agreeableness + high neuroticism + strong political extremism) is more than five times higher than any baseline, demonstrating the system’s ability to explore the long tail of human behavior.
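Two of the six metrics, axis-wise coverage and entropy, can be illustrated with toy implementations over an embedding matrix Z whose rows are personas. The binning scheme and function names here are assumptions for illustration, not the paper's exact formulations:

```python
import numpy as np

def axis_coverage(Z: np.ndarray, bins: int = 4) -> float:
    # Fraction of bins occupied by at least one persona, averaged
    # over embedding dimensions: 1.0 means every bin on every axis
    # contains someone.
    covered = 0.0
    for axis in Z.T:
        edges = np.linspace(axis.min(), axis.max(), bins + 1)
        idx = np.clip(np.digitize(axis, edges[1:-1]), 0, bins - 1)
        covered += len(set(idx)) / bins
    return covered / Z.shape[1]

def population_entropy(Z: np.ndarray, bins: int = 4) -> float:
    # Shannon entropy (in nats) of each persona's joint bin assignment;
    # 0.0 means every persona falls in the same cell.
    cells = []
    for axis in Z.T:
        edges = np.linspace(axis.min(), axis.max(), bins + 1)
        cells.append(np.clip(np.digitize(axis, edges[1:-1]), 0, bins - 1))
    _, counts = np.unique(list(zip(*cells)), axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())
```

A degenerate population (identical personas) scores 0.0 entropy and minimal coverage, while a well-spread population approaches full coverage, which is the behavior the fitness function rewards.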
The paper also discusses limitations. Because the mutation operator is itself an LLM, any inherent biases can be propagated or amplified during evolution. Code mutations can introduce runtime errors, requiring robust testing infrastructure. Finally, the evaluation focuses on quantitative diversity; the authors acknowledge the need for qualitative validation against real human responses to ensure algorithmic fidelity. Future work is proposed on incorporating human‑in‑the‑loop assessments, domain‑specific constraints (e.g., medical ethics), and constraint‑aware evolutionary strategies.
In summary, this work shows that evolutionary optimization of persona‑generation code, powered by LLMs, can produce lightweight, reusable generators that reliably synthesize diverse synthetic populations on demand. The approach opens new avenues for rigorous AI safety testing, policy simulation, and any scenario where exploring the full support of possible human attitudes—especially rare but consequential ones—is essential.