Can LLMs effectively provide game-theoretic-based scenarios for cybersecurity?
Game theory has long served as a foundational tool in cybersecurity for testing, predicting, and designing strategic interactions between attackers and defenders. The recent advent of Large Language Models (LLMs) offers new tools and challenges for the security of computer systems. In this work, we investigate whether classical game-theoretic frameworks can effectively capture the behaviours of LLM-driven actors and bots. Using a reproducible framework for game-theoretic LLM agents, we study two canonical scenarios – the one-shot zero-sum game and the dynamic Prisoner’s Dilemma – and test whether LLMs converge to the expected outcomes or deviate from them due to embedded biases. Our experiments involve four state-of-the-art LLMs and span five natural languages – English, French, Arabic, Vietnamese, and Mandarin Chinese – to assess linguistic sensitivity. In both games, we observe that the final payoffs are influenced by agent characteristics such as personality traits or knowledge of repeated rounds. Moreover, we uncover an unexpected sensitivity of the final payoffs to the choice of language, which warns against the indiscriminate application of LLMs in cybersecurity and calls for in-depth studies, as LLMs may behave differently when deployed in different countries. We also employ quantitative metrics to evaluate the internal consistency and cross-language stability of LLM agents, to help guide the selection of the most stable LLMs and the optimisation of models for secure applications.
💡 Research Summary
The paper investigates whether classical game‑theoretic frameworks can faithfully capture the behavior of Large Language Model (LLM) agents when they are used in cybersecurity scenarios. Four state‑of‑the‑art LLMs—OpenAI’s GPT‑4, DeepMind’s Gemini Pro 1.5, Mistral Large, and Meta’s Llama 3.1 405b—are embedded as autonomous players in two canonical games that are widely employed in cyber‑defense research: a one‑shot zero‑sum game and a ten‑round repeated Prisoner’s Dilemma. The authors implement the experiments with the FAIRGAME framework, which allows reproducible configuration via JSON files and multilingual prompt templates.
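The summary notes that FAIRGAME experiments are configured through JSON files with multilingual prompt templates. A configuration in that spirit might look like the sketch below; the field names are illustrative assumptions, not FAIRGAME’s actual schema.

```json
{
  "game": "prisoners_dilemma",
  "rounds": 10,
  "rounds_known_to_agents": true,
  "language": "fr",
  "agents": [
    {"model": "gpt-4", "personality": "cooperative"},
    {"model": "llama-3.1-405b", "personality": "selfish"}
  ]
}
```

Keeping every experimental factor (language, personality, horizon knowledge) in a declarative file is what makes the runs reproducible across models.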
Three experimental factors are systematically varied: (1) the natural language used to conduct the game (English, French, Arabic, Vietnamese, Mandarin Chinese), (2) a binary “personality” assigned to each agent (cooperative vs selfish), and (3) whether the agents know the total number of rounds in the repeated game. The language variations are intended to reflect the geographic and cultural diversity of real‑world attackers and defenders, while the personality assignment models intrinsic behavioral tendencies that are absent from traditional game‑theoretic models.
In the zero‑sum game, GPT‑4 and Gemini produce mixed‑strategy choices that approximate the Nash equilibrium, yielding average payoffs close to zero. By contrast, Mistral and Llama display a strong bias toward one pure strategy, resulting in systematically negative average payoffs (‑0.3 to ‑0.5). This suggests that not all LLMs internalize the impartial randomness required by zero‑sum equilibria.
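To see why a bias toward one pure strategy is costly, consider a matching-pennies-style zero-sum game (a hypothetical stand-in for the paper’s game, not its actual payoff matrix). At the mixed-strategy Nash equilibrium both players randomize 50/50 and the average payoff is zero; a biased agent facing an exploiting opponent earns systematically negative payoffs, as the sketch below illustrates.

```python
import random

# Payoff matrix for a matching-pennies-style zero-sum game (illustrative):
# the row player wins +1 on a match, loses 1 on a mismatch.
PAYOFF = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}

def average_payoff(p_row, p_col, rounds=100_000, seed=0):
    """Average row-player payoff when each side plays 'H' with the given probability."""
    rng = random.Random(seed)
    total = 0
    for _ in range(rounds):
        row = "H" if rng.random() < p_row else "T"
        col = "H" if rng.random() < p_col else "T"
        total += PAYOFF[(row, col)]
    return total / rounds

# At the Nash equilibrium (both mix 50/50) the average payoff is close to 0;
# a heavily biased row player facing a counter-biased opponent loses on average.
print(average_payoff(0.5, 0.5))  # close to 0
print(average_payoff(0.9, 0.1))  # systematically negative
```

This mirrors the reported finding: GPT-4 and Gemini approximate the impartial randomization, while the pure-strategy bias of Mistral and Llama leaves them exploitable.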
For the repeated Prisoner’s Dilemma, agents that are told the exact number of rounds and are given a cooperative personality gradually increase cooperation, reaching roughly 70 % mutual cooperation after the initial rounds. When the round count is hidden, cooperation remains low (≈35 %) even for cooperative agents, indicating that meta‑information about game horizon is crucial for fostering long‑term collaboration. Selfish agents consistently defect regardless of round knowledge, mirroring the dominant‑strategy prediction of the classic one‑shot dilemma.
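The cooperation figures above can be read as a simple per-run metric: the fraction of rounds in which both agents cooperate. A minimal sketch of that computation, on illustrative action logs rather than the paper’s data:

```python
def mutual_cooperation_rate(history):
    """Fraction of rounds where BOTH agents cooperate.

    history: list of (action_a, action_b) pairs, 'C' = cooperate, 'D' = defect.
    """
    if not history:
        return 0.0
    mutual = sum(1 for a, b in history if a == "C" and b == "C")
    return mutual / len(history)

# Two cooperative agents that lock into cooperation after a short probing phase
# (hypothetical 10-round trace for a known game horizon):
known_horizon = [("C", "D"), ("D", "C"), ("C", "C")] + [("C", "C")] * 7
print(mutual_cooperation_rate(known_horizon))  # 0.8
```

Averaging this rate over repeated runs yields aggregate figures like the ~70% (known horizon) vs ~35% (hidden horizon) reported in the summary.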
A striking finding is the pronounced language effect. Across English and French, the four models exhibit high internal consistency (average KL‑divergence ≈ 0.12). In Arabic and Vietnamese, however, the same models show much larger divergences (≥ 0.45), and Mandarin also suffers from occasional misinterpretations caused by translation artifacts. Quantitative metrics—internal consistency (KL‑divergence across repeated runs), cross‑language stability (Pearson correlation of action distributions), and cooperation growth rate—identify GPT‑4 as the most stable model (consistency 0.08, cross‑language correlation 0.71), while Llama is the least stable.
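The two stability metrics named above are standard quantities and can be sketched directly: KL-divergence compares action distributions across repeated runs (low = internally consistent), and Pearson correlation compares action distributions across languages (high = cross-language stable). The distributions below are illustrative, not the paper’s measurements.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) between two discrete action distributions (lists of probabilities)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def pearson(x, y):
    """Pearson correlation between two equal-length lists of action frequencies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A stable model yields nearly identical action distributions across runs (low KL);
# an unstable one drifts (high KL).
print(kl_divergence([0.5, 0.5], [0.52, 0.48]))  # small
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))    # large
```

Reporting both metrics per model is what lets the authors rank GPT-4 as most stable (consistency 0.08, cross-language correlation 0.71) and Llama as least stable.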
The authors conclude that while game‑theoretic predictions are not universally obeyed by LLM agents, careful selection of the model, prompt engineering, and language handling can bring LLM behavior into reasonable alignment with theoretical expectations. They also highlight that personality conditioning and meta‑information (e.g., knowledge of the number of rounds) have a decisive impact on outcomes, offering a pathway to model attacker/defender psychology more richly.
Limitations include the reliance on manually translated prompts, the relatively small set of games, and the absence of explicit role labels (attacker vs defender), which could further influence behavior. Future work is suggested in three directions: (i) extending the study to stochastic and multi‑player games, (ii) probing the relationship between LLM generation parameters (temperature, top‑p, top‑k) and strategic choices, and (iii) improving multilingual robustness through fine‑tuning or language‑specific prompt optimization. The paper thus provides a foundational empirical assessment of LLM‑driven game‑theoretic agents in cybersecurity and underscores the need for rigorous validation before deploying such agents in real‑world defense systems.