MIMIC: Integrating Diverse Personality Traits for Better Game Testing Using Large Language Model

Modern video games pose significant challenges for traditional automated testing algorithms, yet intensive testing is crucial to ensure game quality. To address these challenges, researchers designed gaming agents using Reinforcement Learning, Imitation Learning, or Large Language Models. However, these agents often neglect the diverse strategies employed by human players due to their different personalities, resulting in repetitive solutions in similar situations. Without mimicking varied gaming strategies, these agents struggle to trigger diverse in-game interactions or uncover edge cases. In this paper, we present MIMIC, a novel framework that integrates diverse personality traits into gaming agents, enabling them to adopt different gaming strategies for similar situations. By mimicking different playstyles, MIMIC can achieve higher test coverage and richer in-game interactions across different games. It also outperforms state-of-the-art agents in Minecraft by achieving a higher task completion rate and providing more diverse solutions. These results highlight MIMIC’s significant potential for effective game testing.

💡 Research Summary

The paper tackles a fundamental limitation of current automated game‑testing agents: they largely ignore the strategic diversity that human players exhibit due to differing personality traits. Reinforcement‑learning (RL) and imitation‑learning (IL) agents are constrained by rigid reward functions or expert demonstrations. Recent large‑language‑model (LLM) agents excel at solving complex tasks but still produce homogeneous behavior. As a result, none of these approaches generates the breadth of actions needed to uncover edge cases in modern, nondeterministic games.

To address this gap, the authors introduce MIMIC (Multi‑personality Integrated Model for In‑game Coverage). MIMIC is built around four components: a Planner, an Action Executor, an Action Summarizer, and a Memory System. The Planner is the core LLM that receives a personality prompt drawn from the PathOS model, which defines seven gameplay‑relevant traits (Achievement, Adrenaline, Aggression, Caution, Completion, Curiosity, Efficiency). By conditioning on a specific trait, the Planner generates action plans that reflect how a human with that personality would approach the same goal.
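To make the idea of trait conditioning concrete, here is a minimal sketch of how a personality prompt might be assembled for the Planner. The seven trait names come from the summary above; the one-line trait descriptions, the prompt wording, and the function name `build_planner_prompt` are illustrative assumptions, not the prompts released with the paper.

```python
from enum import Enum


class Trait(Enum):
    """The seven PathOS traits listed in the summary."""
    ACHIEVEMENT = "Achievement"
    ADRENALINE = "Adrenaline"
    AGGRESSION = "Aggression"
    CAUTION = "Caution"
    COMPLETION = "Completion"
    CURIOSITY = "Curiosity"
    EFFICIENCY = "Efficiency"


# Hypothetical descriptions written for this sketch; not taken from the paper.
TRAIT_HINTS = {
    Trait.ACHIEVEMENT: "values visible rewards and milestones",
    Trait.ADRENALINE: "seeks out risky, high-intensity situations",
    Trait.AGGRESSION: "prefers to fight rather than avoid enemies",
    Trait.CAUTION: "avoids danger and prepares before acting",
    Trait.COMPLETION: "wants to finish every available objective",
    Trait.CURIOSITY: "explores unfamiliar areas and objects",
    Trait.EFFICIENCY: "takes the shortest path to the goal",
}


def build_planner_prompt(trait: Trait, goal: str, state: str) -> str:
    """Compose a trait-conditioned prompt for an LLM planner (assumed format)."""
    return (
        f"You are a player whose dominant trait is {trait.value}: "
        f"{TRAIT_HINTS[trait]}.\n"
        f"Goal: {goal}\n"
        f"Current state: {state}\n"
        "Propose the next plan as this player would."
    )


if __name__ == "__main__":
    print(build_planner_prompt(Trait.CAUTION, "Obtain 1 Diamond", "night, no torches"))
```

Running the same goal and state through different `Trait` values yields different prompts, which is the lever the summary describes for steering the Planner toward distinct strategies.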

A key technical contribution is the Hybrid Planning mechanism. Traditional LLM agents often adopt a “Bottom‑Up” style, issuing the next immediate action based solely on the current state. This works for simple tasks but fails on long‑horizon objectives. MIMIC dynamically switches between Bottom‑Up and a “Top‑Down” decomposition that breaks high‑level goals into sub‑goals. The switch is triggered either after a fixed number of steps or when plan diversity (measured by repeated actions or objects) falls below a threshold, encouraging the agent to explore new behaviors. Prompt‑chaining and verification steps are used to catch hallucinated or infeasible plans before execution.
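The switching rule described above can be sketched as follows. This is an illustration under stated assumptions: diversity is approximated here as the fraction of distinct actions in a recent window, the switch simply toggles between the two modes, and the default values `max_steps=10` and `min_diversity=0.5` are placeholders rather than values reported in the paper.

```python
def plan_diversity(recent_actions: list[str]) -> float:
    """Fraction of distinct actions in the recent window (1.0 means no repeats)."""
    if not recent_actions:
        return 1.0
    return len(set(recent_actions)) / len(recent_actions)


def choose_mode(
    current_mode: str,
    steps_in_mode: int,
    recent_actions: list[str],
    max_steps: int = 10,         # assumed step budget per mode
    min_diversity: float = 0.5,  # assumed diversity threshold
) -> str:
    """Toggle between 'bottom_up' and 'top_down' planning.

    A switch happens when the step budget is used up or when recent plans
    have become too repetitive; otherwise the current mode is kept.
    """
    budget_spent = steps_in_mode >= max_steps
    too_repetitive = plan_diversity(recent_actions) < min_diversity
    if budget_spent or too_repetitive:
        return "top_down" if current_mode == "bottom_up" else "bottom_up"
    return current_mode


if __name__ == "__main__":
    # Four identical actions give diversity 0.25, which triggers a switch.
    print(choose_mode("bottom_up", 3, ["mine", "mine", "mine", "mine"]))
```

The summary mentions repeated objects as well as repeated actions; the same ratio could be computed over the objects a plan touches.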

The Action Summarizer evaluates each execution round using chain‑of‑thought reasoning, compares predicted outcomes with actual game logs, and produces a reflective summary. These summaries, together with raw state information, are stored as Memories. The Memory System can retrieve relevant past experiences, allowing the Planner to maintain personality consistency across sessions and to benefit from accumulated knowledge.
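A minimal sketch of such a memory store is shown below. The summary does not say how retrieval is implemented, so this example assumes a simple word-overlap score and filters by trait to keep retrieved experiences consistent with the active personality. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    trait: str    # personality active when the memory was recorded
    state: str    # raw state information
    summary: str  # reflective summary from the Action Summarizer


@dataclass
class MemoryStore:
    items: list[Memory] = field(default_factory=list)

    def add(self, memory: Memory) -> None:
        self.items.append(memory)

    def retrieve(self, trait: str, query: str, k: int = 3) -> list[Memory]:
        """Return up to k same-trait memories ranked by word overlap with the query.

        Word overlap is a stand-in for whatever similarity measure the real
        system uses; it is an assumption made for this sketch.
        """
        query_words = set(query.lower().split())

        def score(m: Memory) -> int:
            text_words = set(f"{m.state} {m.summary}".lower().split())
            return len(query_words & text_words)

        candidates = [m for m in self.items if m.trait == trait]
        return sorted(candidates, key=score, reverse=True)[:k]


if __name__ == "__main__":
    store = MemoryStore()
    store.add(Memory("Caution", "dark cave entrance", "placed torches before entering"))
    store.add(Memory("Caution", "open plains at noon", "gathered wood safely"))
    store.add(Memory("Aggression", "dark cave entrance", "attacked zombies immediately"))
    for m in store.retrieve("Caution", "dark cave", k=1):
        print(m.summary)
```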

The Action Executor translates high‑level plans into concrete game interactions via two translators: (1) Plan‑to‑Code, which generates reusable script snippets (“Skills”) for games that expose low‑level APIs (e.g., Minecraft’s Mineflayer), and (2) Plan‑to‑Parameters, which directly maps plans to API call arguments for games with richer native interfaces. The executor also includes a personality‑aware time‑allocation module that, for example, grants aggressive agents more time for combat while limiting cautious agents’ exposure to danger.
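The personality-aware time allocation can be illustrated with a small lookup table. The base budget, the multipliers, and the action categories below are hypothetical values chosen for this sketch; the summary only states the direction of the effect (more combat time for aggressive agents, less exposure to danger for cautious ones).

```python
BASE_SECONDS = 60.0  # assumed default time budget per action

# Hypothetical (trait, action kind) multipliers; not values from the paper.
TIME_MULTIPLIERS = {
    ("Aggression", "combat"): 1.5,  # aggressive agents get more time to fight
    ("Caution", "combat"): 0.5,     # cautious agents limit exposure to danger
}


def allocate_time(trait: str, action_kind: str, base_seconds: float = BASE_SECONDS) -> float:
    """Scale the time budget of an action by the agent's personality.

    Pairs without an explicit entry fall back to the base budget.
    """
    return base_seconds * TIME_MULTIPLIERS.get((trait, action_kind), 1.0)


if __name__ == "__main__":
    for trait in ("Aggression", "Caution", "Curiosity"):
        print(trait, allocate_time(trait, "combat"))
```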

Empirical evaluation spans three environments: a small open‑source game, a larger open‑source game, and the widely used commercial game Minecraft. In the small game, MIMIC achieved 100% combinatorial coverage, matching human testers and narrowing the code‑coverage gap with existing agents. In the larger game, it outperformed random baselines, delivering a 1.30× increase in branch coverage and a 14.46× boost in combinatorial coverage. For Minecraft, the authors benchmarked MIMIC against ODYSSEY, the state‑of‑the‑art LLM‑based tester. Across an identical suite of eight tasks, MIMIC not only completed more tasks but also exhibited greater behavioral diversity in six of them, leading to richer scenario coverage. Detailed case studies of the “Obtain 1 Diamond” task illustrate how personality drives distinct strategies: an aggressive agent spends over 20% of its actions fighting mobs and upgrading armor, a cautious agent avoids combat entirely and prioritizes safety measures such as torch crafting, while an adrenaline‑seeking agent deliberately seeks high‑risk encounters.

The authors justify the choice of the PathOS model over generic personality frameworks (Big Five, MBTI) by emphasizing its direct mapping to observable in‑game behaviors and its proven cross‑game applicability. They also describe a systematic mapping from PathOS entities to game‑specific terms (e.g., “Enemy Hazard” → “mobs” in Minecraft), demonstrating how MIMIC can be extended to new titles with modest engineering effort.
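The entity mapping can be pictured as a per-game dictionary applied to generic prompt templates. In the sketch below only the “Enemy Hazard” → “mobs” pair comes from the summary; the second entry, the template text, and the function name `localize` are assumptions added for illustration.

```python
# Per-game vocabulary: PathOS entity name -> game-specific term.
MINECRAFT_TERMS = {
    "Enemy Hazard": "mobs",       # pair given in the summary
    "Collectible": "item drops",  # assumed entry, for illustration only
}


def localize(template: str, terms: dict[str, str]) -> str:
    """Replace generic entity names in a prompt template with game terms."""
    for entity, game_term in terms.items():
        template = template.replace(entity, game_term)
    return template


if __name__ == "__main__":
    print(localize("Avoid Enemy Hazard at night and pick up every Collectible.", MINECRAFT_TERMS))
```

Under this assumption, supporting a new title would mean supplying a new dictionary while the trait-level templates stay unchanged.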

All prompts, code, and datasets are released publicly, ensuring reproducibility and providing a foundation for future research on personality‑driven game testing.

In summary, MIMIC demonstrates that integrating personality traits into LLM‑based planning, coupled with a memory‑augmented feedback loop, substantially expands the exploratory power of automated game‑testing agents. By producing diverse, human‑like playstyles, MIMIC uncovers a broader set of interactions and edge cases, offering a promising direction for improving quality assurance pipelines in the rapidly evolving video‑game industry.
