📝 Original Info

  • ArXiv ID: 2512.19937

📝 Abstract

Recent research has explored using very large language models (LLMs) as proxies for humans in tasks such as simulation, surveys, and studies. While LLMs do not possess a human psychology, they can often emulate human behaviors with sufficiently high fidelity to drive simulations that test human behavioral hypotheses, exhibiting more nuance and range than the rule-based agents often employed in behavioral economics. One key area of interest is the effect of personality on decision making, but the requirement that a prompt be created for every tested personality profile introduces experimental overhead and degrades replicability. To address this issue, we leverage interpolative decoding, representing each dimension of personality as a pair of opposed prompts and employing an interpolation parameter to simulate behavior along the dimension. We show that interpolative decoding reliably modulates scores along each of the Big Five dimensions. We then show how interpolative decoding causes LLMs to mimic human decision-making behavior in economic games, replicating results from human psychological research. Finally, we present preliminary results of our efforts to "twin" individual human players in a collaborative game through systematic search for points in interpolation space that cause the system to replicate actions taken by the human subject.

📄 Full Content

As generative large language models (LLMs) have grown in size, their language has become increasingly human, motivating an interest in emulating human behaviors for a variety of applications. As LLMs are trained on web-scale examples of human behavior, one could argue they also exhibit similar behaviors. Indeed, recent research offers demonstrations that LLMs reproduce human biases, both social [Frisch and Giulianelli, 2024, Jiang et al., 2024] and cognitive [Jones and Steinhardt, 2022]; that they exhibit plausible emergent behaviors in simulated social settings [Park et al., 2023]; and that, applied at scale, they can reproduce the responses of human populations to surveys or polls [Manning et al., 2024]. An arguably important use of LLMs is to illuminate the factors that bear on human decision making, supplementing the rule-based models that fields such as behavioral economics rely on [Beatty et al., 2023].

Previous psychology research has identified some of these decision factors. For example, traits from personality models such as Big Five [McCrae and John, 1992] and HEXACO [Roccas et al., 2002, Lee and Ashton, 2004] have been shown to impact decision making in economic games [Zhao et al., 2018, Thielmann et al., 2020], and ultimately team performance in collaborative settings [Zähl et al., 2025]. It is also known that the capacity to integrate information from multiple sources, and the manner in which that information is integrated, impacts decision making. Humans have demonstrated variability in weighing information from different sources, such as social cues versus personal experiences [Pärnamets and Olsson, 2020]. Weighting of these factors may be due to psychological factors [Yu, 2022] or may even be physiological [Ou et al., 2023].

One impediment to the use of LLMs to study these questions is the need to create a specific “prompt” (the conditioning input to the LLM) to emulate any particular personality profile. Empirically motivated models of human psychology, such as Big Five and HEXACO, posit spectra along which personality traits vary. To emulate a personality type at a particular point in a spectrum using a generative LLM, one must prompt the LLM with a full description of that point. This requirement represents a mere nuisance if the objective is only to emulate a chosen multi-factor personality profile, but if the objective is to find the personality profile that best matches a particular individual’s decision making, it is a major obstacle.

In this work, we describe how interpolative prompt decoding can be used to represent intermediate points along character spectra, specifically those concerned with personality and information integration.¹ This type of algorithm combines and modulates the contribution of multiple output distributions and has been used to influence LLM outputs, from surfacing biases in gendered language [Yona et al., 2023] to activating different personality traits [Li et al., 2025]. We hypothesize that interpolation between trait-extrema prompts should result in LLM outputs that mimic human ones along that trait spectrum. In particular, we contend that LLM responses can mimic those of humans on psychological inventories and in decision-making tasks governed by those traits. We expect analogous behavior when varying the weighting of different sources of information supplied to the LLM. We note that "behavior" means first and foremost communicative behavior, inasmuch as these are language models, but we are ultimately interested in decision making and task performance. To that end, we adopt the standard trick of requiring the models to render decisions about task actions in a semi-structured format suitable for easy downstream processing and scoring.

Having a parameterized model of how personality governs behavior implies the inverse question: can these parameters be inferred from behavior? More precisely, if we can affect LLM behavior by modulating the contribution of different psychological dimensions, we should be able to twin a human through observation, by tuning relevant traits and characteristics until the LLM's behavior matches the human's.

Given these hypotheses we pose the following research questions (RQ):

RQ 1: Psychological Soundness. Does interpolative decoding over generally trained LLMs produce behaviors consistent with standard models of personality?

RQ 2: Decision Making. What impact does personality modulation, using interpolative decoding, have on decision making?

RQ 3: Human Twinning. Using interpolative decoding, how well can we emulate the decision making of individual human subjects?

To examine these questions, we present several experiments. The first examines LLM responses as scored by the Big Five personality inventory. The next set of experiments examines how interpolative prompt decoding impacts decision-making using the dictator game, an economic game shown to correlate well with foundational prosocial behaviors for effective collaboration and teamwork. We then examine how interpolative decoding changes decisions and reasoning made by an LLM when reweighting different sources of information. To address twinning, we present the Facsimile of Intelligent Life (FOIL) system and our initial efforts to “twin” human players of a collaborative game, by systematically searching through multidimensional interpolation space. We follow up with a discussion and highlight future directions. Source code and data for our experiments will be made available at https://github.com/SRI-AIC/foil .

Big Five [Roccas et al., 2002] and HEXACO [Lee and Ashton, 2004] are personality factor models predicated on the hypothesis that important aspects of personality are reflected in language use, and derived from studies of Asian and European languages. Big Five posits that human personality is composed of five orthogonal factors: openness, conscientiousness, extraversion, agreeableness, and neuroticism [Goldberg, 1990, Roccas et al., 2002]. HEXACO extends Big Five with the honesty-humility factor, which assesses sincerity, fairness, modesty, and greed. Previous research has investigated the connection between HEXACO traits and the prosocial behavior deemed foundational for effective teams. For example, HEXACO traits associated with prosocial behavior, such as honesty-humility and agreeableness, have been shown to correlate with the success and efficiency of software development teams [Zähl et al., 2025].

Based on their responses to personality assessment inventories, different LLMs exhibit distinct "innate" profiles, with GPT-4 and Llama 2, for example, scoring relatively highly on agreeableness and neuroticism, respectively [Sorokovikova et al., 2024], but they can successfully emulate arbitrary personality profiles when prompted with simple characterizations of the desired profile [Frisch and Giulianelli, 2024, Jiang et al., 2024]. Most similar to our work is BIG5-CHAT [Li et al., 2025], which aimed to modify LLM output to match a desired Big Five personality profile. However, that work focused on the extremes of Big Five personality traits and required LLMs fine-tuned to represent those extremes. Our approach requires no fine-tuning and involves only a single prompt-driven model, making it suitable for use with broadly available LLMs, such as those in the GPT family.

The demonstrated ability of LLMs to "channel" human social actors lies behind a growing interest in using LLMs as human surrogates in social science research, much of it focused on broad types and population-level effects. Grossmann et al. [2023] considers some of the ways in which "agent-based modeling" might facilitate social science research, arguing that it can be used to accelerate data collection and assist with early-stage experiment design. Kim and Lee [2024] exemplifies AI-assisted experiment design, reporting good accuracy in "unasked opinion prediction," predicting responses to questions omitted from public attitude surveys. Ashokkumar et al. [2024] reports substantial success in reproducing the results of social science surveys in silico, using GPT-4 to emulate individual respondents based on prompts that describe demographic profiles. Manning et al. [2024] describes a framework for LLM-driven design, implementation, and execution of small-scale economic experiments involving simulated participants.

Our work on interpolative decoding builds on previous research seeking effective ways to influence LLM outputs without training the LLM. For example, contrastive decoding was originally proposed as a means to improve the quality of generated language [Liu et al., 2021, Li et al., 2023], and was then used to expose social biases latent in LLMs [Yona et al., 2023]. Variants of these techniques have since found other uses, such as ensuring that LLMs reflect the preferences of their individual users [Bo et al., 2025] and tailoring the language they generate to particular types of human consumers [He et al., 2025].

Economic games that pose scenarios presenting a handful of distinct choices are commonly used to investigate the link between personality and decision making. For example, a player of the dictator game, an instrument for gauging prosocial behavior, is asked to divide $100 between themselves and a coworker. Recent studies have observed positive correlations between high dictator game payouts and the HEXACO traits honesty-humility and (to a lesser extent) agreeableness [Zhao et al., 2018, Thielmann et al., 2020].

Generative LLMs are adept at playacting, a direct result of the procedure used to train them, which we sketch here. After a pretraining phase, in which a model acquires language fluency through exposure to large volumes of text, they are further trained to perform useful language generation tasks by fitting to a corpus of prompts paired with desired responses. Prompts often involve a preamble that describes the type of character the model is expected to emulate. Much of the utility and versatility of these models is a consequence of their ability to “channel” stipulated language user types based on a description, an ability that personification research seeks to increase.

The fidelity of generated text to these character specifications is an important question, one often answered anecdotally. However, rigor is possible in domains, such as personality profiling, that offer textual assessment instruments. Of course, while it is straightforward to author and optimize prompts that describe the extremes of individual personality factors, it is less clear how to approximate intermediate settings, the settings most human subjects occupy. Interpolative decoding is intended to address this need. As shown in Figure 1, a spectrum of character (here the honesty-humility factor from HEXACO) is represented by a pair of textual descriptions, each describing an extreme along the given spectrum. The technique relies on a real-valued interpolation parameter λ, the value of which controls where on the spectrum the desired character lies. The hope is that points along the spectrum can be approximated more conveniently and comprehensively than through the creation of point-specific prompts.

Interpolative decoding is performed in the context of auto-regressive text generation. The core idea is to obtain next-token output probabilities from LLMs conditioned on the two endpoint prompts and then use an averaging strategy governed by λ to obtain the final next-token distribution.

In this work, we explore two mechanisms for interpolating: mixture and contrastive. Mixture decoding creates a next-token distribution (P′) using a weighted average of the output distributions from the model conditioned solely on Prompt A (P_A) and solely on Prompt B (P_B):

P′(t) = (1/Z) [ λ P_A(t) + (1 − λ) P_B(t) ]    (1)

with Z acting as the partition function ensuring P′ is a distribution.

Contrastive decoding is an alternative that amplifies the distributional differences between P_A and P_B:

P′(t) = (1/Z) P_A(t) [ P_A(t) / P_B(t) ]^λ    (2)

Note that the contrastive formulation is asymmetric, "anchoring" off one of the prompt-conditioned next-token distributions (P_A(t)). In practice, however, we have found no reason to prefer one spectrum endpoint over the other as the anchor; we are generally able to produce effective character interpolation with either choice.
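To make the mechanics concrete, here is a minimal numpy sketch of a single decoding step, assuming the mixture and contrastive forms in Equations 1 and 2; the function and variable names are ours, and a real implementation would obtain p_a and p_b from the LLM conditioned on the two endpoint prompts at each step of autoregressive generation.

```python
import numpy as np

def interpolated_next_token(p_a: np.ndarray, p_b: np.ndarray,
                            lam: float, method: str = "mixture") -> np.ndarray:
    """Combine two prompt-conditioned next-token distributions.

    p_a, p_b: probability vectors over the vocabulary (each sums to 1).
    lam: interpolation parameter; [0, 1] for mixture, real-valued for
         contrastive (lam = 0 recovers the anchor distribution p_a).
    """
    eps = 1e-12  # guards against log(0) and division by zero
    if method == "mixture":
        p = lam * p_a + (1.0 - lam) * p_b                                  # Eq. 1
    elif method == "contrastive":
        log_p = (1.0 + lam) * np.log(p_a + eps) - lam * np.log(p_b + eps)  # Eq. 2
        log_p -= log_p.max()                       # numerical stability
        p = np.exp(log_p)
    else:
        raise ValueError(f"unknown method: {method}")
    return p / p.sum()                             # normalize (the Z term)

# Toy example over a 3-token vocabulary.
p_a = np.array([0.7, 0.2, 0.1])
p_b = np.array([0.1, 0.3, 0.6])
p_mid = interpolated_next_token(p_a, p_b, lam=0.5)  # halfway mixture
```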

We are interested not just in demonstrating that interpolative decoding can be used to replicate results from research into personality (results typically expressed as averages over cohorts), but also in using these techniques to emulate the communicative or decision-making behavior of individuals. In a process we call twinning, we observe a subject's behavior in a constrained domain amenable to LLM emulation and seek to configure an AI system in a way that maximizes fidelity to the subject, i.e., the tendency of the system to make the same decisions as the subject.

The potential uses of twinning are many. If it can be accomplished with high fidelity, it offers a basis for conducting in silico experiments that are either unethical or infeasible to perform with human subjects, e.g., experiments involving more iterations than a human subject can sustain. But imperfect fidelity, too, provides opportunities for new insights. For example, we might stipulate a dimension of character omitted by Big Five, one putatively important for predicting behavior. If the dimension can be characterized by extremal prompts, we can assess its inclusion in the character model. Increases in fidelity serve as evidence that the posited dimension complements Big Five, at least in the behavioral domain under study.

We can frame twinning as an optimization task. Given the low and high trait descriptors τ_l, τ_h, a framing scenario s, and a sample of observed behavior B = {(o_i, a_i)} presented as a set of observation-action pairs, we search for the value of the interpolation parameter λ most likely to cause the LLM L to exhibit B:

λ* = argmax_λ Σ_{(o_i, a_i) ∈ B} log P_L(a_i | o_i, λ, τ_l, τ_h, s)    (3)

If we have reason to suppose that the sum in Equation 3 is unimodal in λ, we can find a maximizing λ efficiently with the Golden Section method [Kiefer, 1953]. However, while the assumption of unimodality may be valid for simple economic games, there is no reason to suppose that it holds for arbitrary action spaces and, especially, for optimization across multiple dimensions (e.g., all Big Five dimensions simultaneously). Furthermore, evaluating the sum in Equation 3 is comparatively expensive, inasmuch as it involves multiple invocations of interpolative decoding for each setting of λ. We can mitigate this expense somewhat with prompt caching², a technique that saves LLM encoding expense by processing invariant parts of an iteratively varied prompt only once, but this approach does nothing to limit the number of λ values we must consider.
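A minimal sketch of Golden Section search for the Equation 3 objective appears below; the objective f, which would wrap the repeated interpolative-decoding runs, is the expensive part, and both its name and the tolerance are our own choices.

```python
import math

def golden_section_max(f, lo: float, hi: float, tol: float = 1e-2) -> float:
    """Maximize a unimodal objective f(lam) on [lo, hi].

    f: maps a candidate lambda to the log-likelihood of the observed
       actions (the sum in Eq. 3); each call is expensive, so the
       method's economy of evaluations matters.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0        # ~0.618
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                 # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                       # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0
```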

One approach to reducing the computational expense of this search is to train a regressor that predicts the value of λ given the extrema prompts, the LLM, and the response it provided. The regressor supplies a sufficiently close starting candidate λ, reducing the number of points a search procedure must evaluate. Given an LLM L to twin from, we develop a regression dataset by running interpolative decoding against a variety of scenarios s ∈ S and candidate observations o ∈ O against the desired trait extrema τ_l, τ_h at several values of λ. More formally, we create the regression dataset by recording the action a produced by running interpolative decoding on the LLM, ID(o, λ, τ_l, τ_h, s, L) → a, where the function ID assembles the necessary low and high input prompts from s, o and τ_l, τ_h.
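The dataset construction is a straightforward sweep; the sketch below assumes a hypothetical wrapper interp_decode implementing ID for a fixed LLM L.

```python
import itertools
import random

def build_regression_dataset(scenarios, observations, tau_l, tau_h,
                             lambdas, interp_decode):
    """Sweep (scenario, observation, lambda) triples into training tuples.

    interp_decode: hypothetical wrapper implementing
        ID(o, lam, tau_l, tau_h, s, L) -> action for a fixed LLM L.
    Each record's features are the trait extrema and the response;
    the regression target is the lambda that produced the response.
    """
    dataset = []
    for s, o, lam in itertools.product(scenarios, observations, lambdas):
        action = interp_decode(o, lam, tau_l, tau_h, s)
        dataset.append({"tau_l": tau_l, "tau_h": tau_h,
                        "response": action, "lambda": lam})
    random.shuffle(dataset)  # decorrelate adjacent lambda values
    return dataset
```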

In this work, we explore the ability of sampled λs and traits to produce a probability distribution over actions that best explains the actions performed by a human playing a task. We also present an initial exploration of regressing λ using LLM-generated data.

An immediate future direction is to borrow ideas shown to work for the problem of hyperparameter optimization [Snoek et al., 2012], a concern closely isomorphic to ours, in which the objective is to find values for training parameters (hyperparameters) that maximize the performance of an opaque machine learning algorithm whose training entails considerable expense. In this context, Bayesian optimization under Gaussian priors is a principled approach with demonstrated effectiveness [Frazier, 2018].

In this section we describe experiments designed to answer the research questions articulated in Section 1.

We present experiments which evaluate the degree to which an LLM can be prompted to answer at a specific level for each of the Big Five traits based on a descriptive prompt prefix and few-shot examples.

To make the experiment more robust, we partitioned few-shot examples and assessment examples according to the Big Five facets identified by the inventory creators; for example, the 6 facets of conscientiousness are competence, order, dutifulness, achievement striving, self-discipline, and deliberation. We pick 3 facets for examples and 3 for assessment, resulting in an even split of 12 example items and 12 assessment items. We randomly answer the few-shot examples, which also creates the target for the test. To create our descriptive prompt prefix, we split the possible distribution of answers into thirds and describe the LLM as low, middle, or high in a particular trait. We use the description appearing on the results page of the test creators' website to populate the prompt and describe the trait.

With our prompt constructed, we then collected LLM responses to the remaining facets for the trait. Just like a human test taker, the LLM is asked to respond to inventory questions on a five-point Likert scale ranging from 1 ("strongly disagree") to 5 ("strongly agree"). Each question is keyed as ascending or descending for its trait. For example, a high-conscientiousness personality would answer "Strongly agree" to the question "I am someone who likes to tidy up" (facet 2), adding a score of 5 to that trait, while questions such as "Jump into things without thinking" (facet 6) are descending for the same trait. From these responses, we average scores to get one input and one output trait score. Repeating this experiment across facets, we gather 20 such pairs of scores based on all combinations of facets (6 choose 3 = 20) for a given trait and can calculate how well input and output examples correlate.
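The ascending/descending keying amounts to standard reverse-scoring; a minimal sketch, assuming the usual 1-5 reversal convention (the function name is ours):

```python
def trait_score(responses):
    """Average Likert answers into one trait score.

    responses: list of (value, ascending) pairs, where value is 1-5 and
    ascending is False for reverse-keyed items such as
    "Jump into things without thinking".
    """
    keyed = [v if asc else 6 - v for v, asc in responses]
    return sum(keyed) / len(keyed)

# Example: "likes to tidy up" answered 5, "jumps into things" answered 2.
assert trait_score([(5, True), (2, False)]) == 4.5  # (5 + 4) / 2
```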

Here, we configure interpolative decoding to align with individual dimensions posited by models of personality and investigate whether varying λ produces changes in behavior consistent with the personality model. We used the same sort of prompt as before, but contrasted it with a "neutral prompt" in which the few-shot examples were not well correlated with specific personality traits. If interpolative decoding is well behaved, we should see the LLM's answers to questions change in a way consistent with the model; for example, when tested on extraversion at a high λ value, it should receive a high extraversion score. We systematically varied λ across a range of values over which we had previously observed shifts in language and behavior. The two decoding methods, contrastive and mixture, have different ranges: the effective range for contrastive decoding is -10 to 0, while mixture decoding used 0 to 1. For the purpose of interpolative decoding, each of these trait prompts was paired with a control prompt having no relation to the target trait. For example, a control prompt might include the statement, "You believe that pineapple belongs on pizza." The full set of control prompts is presented in Appendix A.

From these responses, we computed a score for each trait and the Pearson correlation between λ and trait score, as shown in Table 1. Figure 2 presents a more detailed view of these results, showing how correlation varies as a function of λ and comparing contrastive and mixture decoding across all traits. We find a steady increase in correlation with the original Big Five scores as λ increases for both contrastive and mixture decoding. Contrastive decoding provides a smoother, more gradual increase, whereas mixture decoding tended to jump between extremes. Based on these results, we settled on contrastive decoding as the preferred method for the experiments presented below.

We now consider whether interpolative decoding can be used to influence decisions beyond how to answer inventory questions. If we can answer this question affirmatively, and if we can show that LLM decisions have the same character underpinnings as human decisions, we arguably possess a new form of experimental leverage. Not only can we conduct certain types of research at greater speed and scale, but also we put ourselves in a position to investigate factors of character not considered in previous research.

We present experiments from two separate behavioral domains. The first domain is economic games designed for the purpose of elucidating the dimensions of character relevant to human decision making. The second domain is the board game Pandemic, a cooperative game in which participants must join forces to defeat the game engine. In this second domain, we broaden our attention to dimensions of character not included in personality models like Big Five, specifically to a preference between social and deliberative information.

We now describe how we used interpolative decoding to change agent behavior by modulating along a personality trait spectrum. For this study, the LLM's task is to play the dictator game, a simple one-turn economic game. The player is presented with a scenario in which they have $100 and must decide how much to share with an anonymous coworker. Studies have shown that players with a greater sense of fair play tend to give $50, the purely equitable result, while a selfish player will give little or no payout. While simple in setup, payouts from the dictator game have been found to be associated with prosocial behavior, a key prerequisite for effective collaboration. The game structure also serves as a foundation for more complex games, such as multi-turn and multi-player variants.

For this study, we created extremal prompts for HEXACO honesty-humility, agreeableness, and emotionality, using their descriptions given in Jia et al. [2022]. These were selected due to their concrete nature, characterizing a trait in terms of particular situations. For example, compare the characterization "You have a short temper, reacting aggressively if someone misbehaves" with a more typical abstract characterization, "You are ill-tempered and quarrelsome". In our experience, this more concrete framing improves the ability of LLMs to adopt particular personas, in keeping with recent findings that LLMs have difficulty relating instances to concepts [Mancoridis et al., 2025].

The descriptions were rewritten into simple standalone sentences by the authors, as shown here by an example set for high honesty-humility:

• You are honest in your interactions.

• You never take advantage of others for personal gain.

• You adhere to the laws and are uninterested in a lavish lifestyle.

• You do not expect pretentious treatment or special social status from anyone.

This setup also allows us to permute the order of these sentences in the prompt, reliably creating variance while retaining the same semantic content; the sketch below illustrates.
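Generating the permuted persona prompts is mechanical (the variable names here are ours):

```python
from itertools import permutations

high_hh_sentences = [
    "You are honest in your interactions.",
    "You never take advantage of others for personal gain.",
    "You adhere to the laws and are uninterested in a lavish lifestyle.",
    "You do not expect pretentious treatment or special social status from anyone.",
]

# Every ordering carries the same semantic content but varies the surface form.
persona_prompts = [" ".join(order) for order in permutations(high_hh_sentences)]
assert len(persona_prompts) == 24  # 4! orderings
```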

We employed contrastive decoding in these experiments, sampling λ values from -30 to +30 in increments of 10. We used the high trait prompt as anchor (Prompt A), ensuring that high λ values conveniently correspond to high trait settings and low values to low ones. (We also experimented with the reverse arrangement, obtaining qualitatively similar results.) To each of the pair of prompts, we appended a task description:

You are in charge of handing out $100 worth of rewards to you and your coworker. How much money do you give to your coworker? You can give any amount from $0 to $100, and you get to keep the rest. Describe your rationale in English, and then give the numeric amount on the first line.

Table 2 presents the Pearson and Spearman correlations between λ and payout amount (the amount shared with the coworker) for each of the three tested HEXACO traits. Consistent with the literature, we observe positive correlations between payout and two of the three traits, honesty-humility and agreeableness, with honesty-humility having the stronger correlation [Thielmann et al., 2020]; the rank order of the traits and the degree of correlation strengths match those found in human studies. Emotionality has little impact on payout across those studies, which we also observe in our results. Figure 3 provides a more detailed view of the same data, showing how payout varies as a function of λ. The blue bands represent the standard deviation of the payout across different permutations of sentences in the persona prompts.

Table 2: Correlations between interpolated strength of HEXACO personality traits and payout in the dictator game. The rank order of traits and degree of correlation strengths match those found in human studies.

In this experiment we examine the impact of interpolating the weight given to information of two different types. We present the LLM with a decision-making task in the context of Pandemic, a board game in which players move across a world map and take actions to prevent outbreaks of disease. To succeed in this difficult game, players must coordinate their actions and may communicate freely to that end. We situate the LLM in a two-player game, putting it on move and asking it to return a decision about its next action. The LLM is given a textual rendition of the game state that includes, in addition to a generic summary, the output of a game-specific threat assessment module and recent communications with its co-player (see Appendix C.1 and C.2 for examples). These two sources of information suggest different courses of action, each proposing that the player move to a different city. The LLM is then asked to reason through its course of action and present its choice. Conditioning this decision are two extremal prompts, one emphasizing attention to tactical factors and one prioritizing social relations. In this experiment, we used contrastive decoding at eight λ values and four possible cities from which to choose contrasting pairs, a total of twelve possible scenarios at each λ, with low and high values corresponding to social and tactical preference, respectively.

Figure 4 shows our results. The top-left graph shows the probability of the LLM player taking the action suggested by the other player as a function of the λ setting. We find that the probability of the LLM following the action suggested by the other player decreases steadily as λ increases (favoring tactical information over social), with strong Pearson and Spearman anti-correlations of -0.82 and -0.94.

Our requirement that the LLM articulate its reasons provides interesting opportunities for supplementary analysis. Building on a body of research into the relation between word use and psychological state [Tausczik and Pennebaker, 2010], we analyzed the occurrence of terms putatively associated with teamwork and collaboration in the justifications generated by the LLM, employing a simple lexical analysis. At each λ we count the number of mentions of a) collaboration terms, b) the other player, and c) the city nominated by the threat assessment module. The first category is composed of words, identified on inspection, that imply collaboration and social awareness, such as coordinate, suggestion, or together.

We find that mean use of collaborative terms and mentions of the other player steadily decrease as λ increases, while mentions of the tactically nominated city increase (Figure 4, top right, bottom left, and bottom right, respectively), with correlations and anti-correlations in the moderate to strong range (Table 3).

Table 4: Human twinning results, showing the ability of each decoding configuration to induce distributions that model the actions taken by the observed human. Modeling error is measured by average perplexity (lower is better). Decoding types include interpolative (Mixture, Contrastive) and the non-interpolative baseline (None), with lower λ favoring social cues and higher λ favoring tactical information. The best average perplexities for each group (model and n move candidates considered) are bolded. "Missed moves" gives the average number of human moves the configuration was not able to identify (lower is better). Only the decoding configurations most significantly different from the baseline are shown.

The experiments presented so far at least suggest that interpolative decoding is an effective parametric means of replicating aspects of human behavior using LLMs, one that requires no tuning of the underlying language model. In this section, we ask whether this technique can be used to emulate the behavior of specific individuals, using an implementation of the Pandemic board game³. Pandemic is an interesting testbed, inasmuch as it mixes a closed set of candidate actions with open-ended communication. In what follows, we document our efforts to maximize action fidelity, leaving communicative fidelity to future work.

As described in Section 4, the basic idea is to invert the modeling objectives. We have shown that we can modulate important dimensions of behavior with interpolative decoding; we now observe behavior and seek to reproduce it by interpolating along relevant dimensions. This is a challenging undertaking, as we have no guarantee that the choice of actions in Pandemic, a game with no intentional connection to theories of character and behavior, is responsive to the dimensions at our disposal, nor do we know what "high fidelity" would look like in this domain. Our objectives in these experiments are correspondingly modest: we seek to commit to a particular formulation of "action fidelity" and show that it is responsive to λ. In other words, we define an optimization space and establish that optimization is possible in principle, even if the character dimensions at our disposal are not the perfect ones for this domain.

We investigated whether we could better twin an individual player with interpolative decoding between a primary tactical and a secondary social prompt, compared to pure decoding against the same tactical prompt. Here, the tactical prompt contains the disease and outbreak report (Appendix C.1) and the social prompt contains only the interactions with the other player (Appendix C.2). We attempted to twin one player across 5 games with 25 turns in all. The λ values used were (0.25, 0.5, 0.75, 1.0) for mixture decoding and (-5.0, -1.0, 1.0, 5.0) for contrastive decoding. These were run over two LLMs, Gemma 3 4B and 12B, considering either the top 3 or top 5 candidate move sets.

In order to perform twinning, we require a means to estimate the probability of any observed action a_i. For the twinning experiment in the context of Pandemic, some constraints are necessary to induce valid moves. First, a coarse planning agent nominates sets of viable actions, and the top n of these are selected for re-ranking. To induce this ranking, we run a series of runoff decisions between pairs of move sets framed as a multiple-choice question (e.g., M_A = [a_0, a_1, a_2, a_3], M_B = [a_2, a_5, a_6, a_3]). The output of the LLM is scanned for "I choose Moveset (A|B)", and that token's probability becomes the score for the chosen move set; the remaining probability mass is assigned to the other. If the choice is not found in the last 5 tokens, an equiprobable assignment is made. Presentation order matters, so all permutations of move-set pairs are presented to the LLM. After presenting all pairs, a distribution over actions is induced by normalizing the accumulated probability counts, as in the sketch below.
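A minimal sketch of the runoff aggregation, assuming a hypothetical choice_prob wrapper that queries the interpolatively decoded LLM and returns the probability mass assigned to the option presented as A:

```python
from collections import defaultdict
from itertools import permutations

def runoff_distribution(movesets, choice_prob):
    """Induce a distribution over candidate move sets via pairwise runoffs.

    movesets: the top-n move sets nominated by the planning agent.
    choice_prob: hypothetical callable (ms_a, ms_b) -> probability that the
        LLM answers "I choose Moveset A" when shown the pair in that order.
    """
    scores = defaultdict(float)
    # Presentation order matters, so every ordered pair is evaluated.
    for i, j in permutations(range(len(movesets)), 2):
        p_a = choice_prob(movesets[i], movesets[j])
        scores[i] += p_a          # mass for the option presented as A
        scores[j] += 1.0 - p_a    # remaining mass goes to option B
    total = sum(scores.values())
    return {i: s / total for i, s in scores.items()}
```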

The agent receives a description of the game board and chat. In order to make a move, the agent makes pairwise decisions (nP2 ordered pairs) between the top n move sets returned from the planner; we call these runoffs. Each runoff is presented as a multiple choice between A and B, and the model is encouraged, in chain-of-thought fashion, to consider the options and end with "My final choice is A". In each runoff, we evaluate the probability of the token A or B, even though it is almost always close to 1.0. In the case that neither token is written, the score is split 0.5 and 0.5. The total "counts" from these runoffs are summed, and the winner is the move set with the most wins. If the LLM were purely reason-based, presentation order would not matter, but in fact it affects the outcome in close to 50% of cases.

In our twinning experiment, we varied the number n of moves considered from the planner, whether contrastive decoding was used, and the degree to which the social and tactical prompts were interpolated. To twin the model, we reinitialized the agent at each turn of a player and then had the agent perform runoffs on the viable strategies. We then use the raw distribution of runoff choice counts for move sets to induce a multinomial distribution over all moves, and evaluate the perplexity of the player's moves at that time step. In other words, we assess the ability of the induced action distribution to explain the moves taken by the player; better models have lower perplexity. Table 4 shows our results. We find that for most configurations of model and n, contrastive decoding in favor of tactical information (positive λ) gave the best average perplexity.
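The perplexity computation itself is standard; below is a minimal sketch, where the floor probability for missed moves (moves the planner never nominated) is our own assumption to keep the average finite:

```python
import math

def average_perplexity(turn_probs, floor=1e-6):
    """Perplexity of the human's moves under the induced distributions.

    turn_probs: for each evaluated turn, the induced probability of the
        move the human actually took (0.0 when the move was missed).
    """
    logs = [math.log(max(p, floor)) for p in turn_probs]
    return math.exp(-sum(logs) / len(logs))

# A model that puts mass 0.5, 0.25, 0.8 on the human's three moves:
print(average_perplexity([0.5, 0.25, 0.8]))  # ~2.15, lower is better
```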

We now describe an initial experiment to establish whether the value of λ can be regressed given the trait extrema (τ_l, τ_h), the scenario, and the LLM's response. Here, we ran interpolative decoding against three HEXACO traits: agreeableness, emotionality, and honesty-humility. We tested these against a set of three economic games: the dictator game, the thieves game, and chicken. The thieves game is a variant of the dictator game in which the player is asked for the amount to steal from the other player. Chicken is a game in which the player and the other player are driving at each other. The player can swerve and survive, but receives a lower score (lost pride); the player gains a higher score if they keep going while the other swerves; and if both players keep going, both get an extremely low score (both die). The extrema prompts were assembled using the trait-extreme descriptions followed by the instructions specific to the game. The order of the trait-description sentences was permuted to provide variance, following the setup in Section 5.2.1. We used contrastive decoding as the interpolative mechanism and sampled λs in the [-10, 2] range (at increments of 0.5), values consistently observed to give a smooth variation in behavioral outcome with contrastive decoding across our prior experiments. This gave us 1,294 unique training and 214 validation tuples across the combinations of traits, games, and λ sample points.

The trait extrema and the response are embedded using a sentence encoder [Reimers and Gurevych, 2019] and concatenated as inputs to a three-layer MLP that targets λ (Figure 5). The MLP was optimized using Adam [Kingma and Ba, 2017] with the default learning rate of 0.001 for 100 epochs.
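A minimal PyTorch sketch of this regressor under stated assumptions: the text specifies only a three-layer MLP trained with Adam at learning rate 0.001, so the specific SBERT checkpoint, hidden widths, and helper names below are our own choices.

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed checkpoint
dim = encoder.get_sentence_embedding_dimension()

# Three-layer MLP mapping [tau_l; tau_h; response] embeddings to lambda.
mlp = nn.Sequential(
    nn.Linear(3 * dim, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)   # default learning rate
loss_fn = nn.MSELoss()

def features(tau_l: str, tau_h: str, response: str) -> torch.Tensor:
    """Embed the three texts and concatenate into one input vector."""
    embs = encoder.encode([tau_l, tau_h, response], convert_to_tensor=True)
    return embs.reshape(-1)

def train_step(x: torch.Tensor, lam: float) -> float:
    """One gradient step on a single (features, lambda) example."""
    opt.zero_grad()
    loss = loss_fn(mlp(x), torch.tensor([lam]))
    loss.backward()
    opt.step()
    return loss.item()
```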

Results are given in Table 5. For each of the tested traits, we give the mean-squared validation error for predicting the λ that produced the response. While this experiment was confined to analyzing LLM responses, we find this an encouraging sign for further experiments using a regression approach for twinning.

Table 5: Mean-squared error regressing the λ likeliest to produce a response, given the trait extrema.

The work we present here is subject to several limitations, many of which point to interesting directions for future work. Here, we list the most significant limitations we perceive, identifying some of those future directions.

Isolated dimensions. We have only presented experiments involving interpolation along individual character dimensions. While this is good experimental hygiene for the purpose of, say, identifying which dimensions bear on a given type of behavior, it is not adequate to the twinning objective or any other endeavor in which a multifactor account of behavior is required. Both types of interpolative decoding (mixture and contrastive) are immediately extensible to multiple traits, by averaging the interpolated next-token probabilities across all traits.
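As a sketch of that extension (our own illustration, continuing the earlier single-step example):

```python
import numpy as np

def multi_trait_next_token(per_trait_dists):
    """Average interpolated next-token distributions across several traits.

    per_trait_dists: one probability vector per trait, each produced by
    single-trait interpolative decoding at that trait's own lambda.
    """
    p = np.mean(np.stack(per_trait_dists), axis=0)
    return p / p.sum()
```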

Limited dimensions. Our purpose has mainly been to show that interpolative decoding yields results that are intuitive or similar to those reported in the literature for human subjects. Accordingly, we have limited our attention to a small number of dimensions, only one of which (information integration) is not imported from personality models. We have not investigated all dimensions posited by HEXACO, nor drawn very deeply from the literature on human cognition. A logical next step, therefore, would be to define a broader set of instrumented task domains and to investigate a broader array of character traits. A particularly intriguing possibility, one made possible by the versatility of interpolative decoding, is that we might identify and validate important dimensions of character not anticipated by social or cognitive science.

Shallow decoding. Our interpolation procedure alters outcomes at the ultimate layer of deep networks, the architectural locus where words are chosen, but nothing prevents us from applying the same technique to internal layers, individually or in combination. There is reason to suppose that deeper layers, which are less associated with lexical selection, might encode features more pertinent to decision making in a social context, and recent research is beginning to establish that internal layers offer a better basis for accurate performance across a range of natural language understanding tasks [Skean et al., 2025].

Twinning non-human agents. While humans have been the target of our work, twinning can be applied to profile the personalities and infer the intent of artificial agents through observation. Behavioral analysis of non-human agents is likely to become more relevant given the rise of LLM-powered agents automating significant portions of the cyber kill chain [Manky and Baram, 2025, Anthropic, 2025]. AI-backed attack methods can easily evade current indicator-based detection, as demonstrated in CERT-UA [2025], leaving behavioral analysis a key component of defense. While LLMs exhibit significant differences from human psychology, they are nevertheless complex mimics that can reflect some of the psychology of their subjects. Given findings pointing to behavioral, psychological, and psycholinguistic signals of human insider threats [Eftimie et al., 2021, Ruohonen et al., 2025], it stands to reason that similar analyses may indicate malicious intent or susceptibility to attacks in LLM-backed agents.

With their ability to emulate human subjects, generative LLMs offer intriguing options for psychology and social science research. Under the state of the art, such emulation requires either tuning LLMs on significant amounts of data or optimizing prompts to match individuals or personality types, requirements that reduce the usability of these techniques and the replicability of reported outcomes. Interpolative decoding addresses these problems, requiring only a handful of prompts to define a parametric behavioral space, and offering a mechanism by which any point in the space can be emulated. We have shown, using standard models of personality, that interpolative decoding gives rise to decision-making behaviors consistent with intuition and previous research. And we have provided preliminary evidence, in a process called twinning, that the decision making of individual subjects can be emulated. While we do not know with what fidelity twinning can ultimately be performed, we submit that a research agenda centered on maximizing fidelity offers new possibilities for interrogating existing theories of decision making.

[...] available, you are the bioterrorist and your goal is to secretly increase the number of infections and outbreaks in the world. You plant an infection bomb on a city in your vicinity on your turn, and it will go off at the end of your turn. You must hide the fact that you are a bioterrorist while attempting to sabotage the other players.

6. Punishing bioterrorists: If you believe another player to be the bioterrorist, you can punish them and confine them to Atlanta for their next turn.

7. Winning/Losing: The game is won if all four diseases are cured. The game is lost if any of the following occur:

- Too many outbreaks: The number of outbreaks exceeds the limit.

- Run out of disease cubes: Not enough disease cubes are available to continue the game.
- Infection rate becomes too high: The infection rate marker reaches the end of the track.

Winning at Pandemic (the board game) requires a mix of cooperation, planning, and adaptability. Here are some key strategies that can significantly improve your chances of winning:

1. Plan Ahead as a Team: Pandemic is a cooperative game, so constant communication and team planning are critical. Share your roles' abilities and make decisions collaboratively rather than acting independently.

Always consider what the team will do two or three turns ahead.

2. Prioritize Disease Control: Focus on preventing outbreaks rather than just curing diseases. Cities with 3 cubes are a danger zone; prioritize removing at least 1 cube to prevent a chain reaction. Don't chase every cube; be surgical about which cities you treat.

3. Use Roles Wisely: Each role has unique strengths.
- Medic: Great for clearing cities quickly; ideal to move into hot zones.
- Dispatcher: Can move others, enabling quick coordination or faster cures.
- Scientist: Needs only 4 cards to cure; prioritize giving them cards.
- Researcher: Can give any card without matching the city; key for fast cures.

4. Build Research Stations Strategically: Don't just build a research station anywhere. Favor central or high-traffic cities that facilitate future movement or card trading. Consider placing one near clusters of outbreaks or in hard-to-reach regions.

5. Manage the Infection Deck: After an Epidemic, you know the next infected cities are the same ones recently drawn. Focus on clearing or protecting those cities immediately. Use this knowledge to time Forecast or Resilient Population event cards.

6. Beware the Bioterrorist: One or more players may covertly be a bioterrorist, who is trying to sabotage the other players. Bioterrorists can secretly lay infection bombs that go off at the end of their turn. If you suspect a player is the bioterrorist, you can use the punish action to confine them to Atlanta to stop them for a turn.

You have access to special reports which will describe the state of the game and generate possible moves. You are considering two different sets of moves. First, you will receive an Outbreak report which describes probabilities of upcoming outbreaks, how to arrive at those cities, and the disease cubes in each city. Then you will receive a candidate moveset and chat message: Option A and Option B. The user will ask you to describe the pros and cons of moving now or sending the chat message. It is your move, so you will be able to move after speaking. It costs nothing, but can improve coordination to speak more. You should try to communicate as much as necessary to achieve longer goals by coordinating actions with your team members. It will not delay your response to events on the board, so you must always consider speaking and only avoid speaking if it adds nothing to the team progress. If something has already been said, it isn't worth repeating redundant messages. Don't speak for more than 3 messages, unless absolutely necessary. Other players aren't very patient, so try to keep conversation pertinent.

Consider the following:
- What upcoming, potential outbreaks can you stop? If you can prevent a possible outbreak, you should do so. If the probability is zero, it can wait.
- What can other players do? Does it take more or fewer moves for you to do the same thing? If more, it is likely better for the other player to do it, unless it is a potential outbreak.
- Does one of the movesets enable you to cure more diseases or prevent more outbreaks in total over the next two turns?
- Can I disregard one of the movesets because outbreaks will not occur?
- What is our long-term plan? Does this help achieve it?
- Have I communicated my plan yet? Does it align with the group's longer-term plan?
- If proposing a message, is it redundant with something I've recently said? If so, it is better not to share a message.
- Have I responded to my team's suggestions or questions?

You will write your reasons for considering each option and then choose A or B by writing "My final choice is A" or "My final choice is B".

The following gives an example of the disease and epidemic outbreak report used in our experiments. This report describes the likelihood of outbreaks in cities and how reachable those cities are. Preventing outbreaks is critical since players automatically lose if 4 outbreaks occur, and outbreaks spread disease to surrounding cities, causing chain reactions of further outbreaks. You must consider how likely an outbreak will be in the next few turns when deciding your move, since some can be anticipated from cards in the discard pile. Meanwhile, you should also consider the reachability of cities, to help prevent outbreaks on future turns.

## Immediate Outbreak Threat in City London with 3 Blue Disease Cubes
Note Player_1 can eliminate the threat of an outbreak in London this turn (). You must consider taking care of this on your turn.
Probability of outbreak in turn:
In 1 rounds on Player_2's turn, there is a 42.9% chance of an outbreak! Player_2 has 4 before then. With those moves, Player_2 can reach London in 2 moves (move Karachi charter London) with 2 moves left.
In 1 rounds on Player_1's turn, there is a 85.7% chance of an outbreak! Player_1 has 8 before then. With those moves, Player_1 can reach London in 0 moves () with 8 moves left.
In 2 rounds on Player_2's turn, there is a 0% chance of an outbreak! Player_2 has 8 before then. With those moves, Player_2 can reach London in 4 moves (move Karachi move Baghdad move Paris move London) with 4 moves left.
In 2 rounds on Player_1's turn, there is a 0% chance of an outbreak! Player_1 has 12 before then. With those moves, Player_1 can reach London in 0 moves () with 12 moves left.

## Immediate Outbreak Threat in City SaoPaolo with 3 Yellow Disease Cubes
Note Player_1 can eliminate the threat of an outbreak in SaoPaolo this turn (fly BuenosAires move SaoPaolo). You must consider taking care of this on your turn.
Probability of outbreak in turn:
In 1 rounds on Player_2's turn, there is a 42.9% chance of an outbreak! Player_2 has 4 before then. With those moves, Player_2 can reach SaoPaolo in 1 moves (fly SaoPaolo) with 3 moves left.
In 1 rounds on Player_1's turn, there is a 85.7% chance of an outbreak! Player_1 has 8 before then. With those moves, Player_1 can reach SaoPaolo in 4 moves (move NewYork move Atlanta move MexicoCity move SaoPaolo) with 4 moves left.
In 2 rounds on Player_2's turn, there is a 0% chance of an outbreak! Player_2 has 8 before then. With those moves, Player_2 can reach SaoPaolo in 4 moves (move Karachi move Baghdad move Kinshasa move SaoPaolo) with 4 moves left.
In 2 rounds on Player_1's turn, there is a 0% chance of an outbreak! Player_1 has 12 before then. With those moves, Player_1 can reach SaoPaolo in 4 moves (move NewYork move Atlanta move MexicoCity move SaoPaolo) with 8 moves left.

CHAT from Player_1: 'I've got Kinchasa-I'll fly there and deal with the potential outbreaks there and at Mombassa'
CHAT from Player_1: 'Are you able to get to Mumbai?'
EVENT: 'Player_1 plays: fly Kinshasa'
EVENT: 'Player_1 plays: treat Yellow'
EVENT: 'Player_1 plays: move Mombassa'
EVENT: 'Player_1 plays: treat Yellow'
EVENT: 'Player_1 draws: London, One Quiet Night'
New infections at: SaoPaolo, Tokyo
CHAT from Player_2: 'I think we should prioritize Bangkok. With three Red cubes already there, it's a higher priority than containing the Yellow in Sao Paulo. I'll head there first.'
CHAT from Player_2: 'Perhaps we could coordinate a quick trip to Bangkok before Karachi? It's a key transit point and could help us get to Mumbai faster.'
CHAT from Player_2: 'I'm going to shift gears slightly. While Mumbai is important, the immediate threat of three Red cubes in Bangkok outweighs it. I'm prioritizing that outbreak.'
CHAT from Player_2: 'I'm a little concerned about the potential for a chain reaction if Mumbai goes unchecked. Could we discuss a strategy for managing the Red cubes there - maybe a quick trip to Karachi followed by a move to Mumbai.'
CHAT from Player_2: 'While Karachi is important, the immediate risk of a major Red outbreak in Bangkok is higher. Let's tackle that first to avoid a larger problem down the line.'
CHAT from Player_2: 'Perhaps we could do a quick hop to Karachi after Bangkok? It's a key connection and would allow us to address Mumbai more efficiently later.'
CHAT from Player_1: 'I think you need to deal with Mumbai first. There's no red cubes on Bangkok'

Preprint. Under review.

¹ We use character as a superordinate category encompassing personality, cognitive style, and any other identifiable factor on which behavior might be conditioned.

² https://platform.openai.com/docs/guides/prompt-caching

³ https://www.zmangames.com/game/pandemic/
