From Model Choice to Model Belief: Establishing a New Measure for LLM-Based Research

Reading time: 5 minutes
...

📝 Original Info

  • Title: From Model Choice to Model Belief: Establishing a New Measure for LLM-Based Research
  • ArXiv ID: 2512.23184
  • Date: 2025-12-29
  • Authors: Hongshen Sun, Juanjuan Zhang

📝 Abstract

Large language models (LLMs) are increasingly used to simulate human behavior, but common practices for using LLM-generated data are inefficient. Treating an LLM's output ("model choice") as a single data point underutilizes the information inherent to the probabilistic nature of LLMs. This paper introduces and formalizes "model belief," a measure derived from an LLM's token-level probabilities that captures the model's belief distribution over choice alternatives in a single generation run. The authors prove that model belief is asymptotically equivalent to the mean of model choices (a non-trivial property) but forms a more statistically efficient estimator, with lower variance and a faster convergence rate. Analogous properties are shown to hold for smooth functions of model belief and model choice often used in downstream applications. The authors demonstrate the performance of model belief through a demand estimation study, where an LLM simulates consumer responses to different prices. In practical settings with limited numbers of runs, model belief explains and predicts ground-truth model choice better than model choice itself, and reduces the computation needed to reach sufficiently accurate estimates by roughly a factor of 20. The findings support using model belief as the default measure to extract more information from LLM-generated data.

💡 Deep Analysis

Figure 1

📄 Full Content

"Do I contradict myself? Very well then I contradict myself, (I am large, I contain multitudes.)" Walt Whitman's Song of Myself passage illustrates the challenge of understanding human thought: every human choice is just one draw from a complex distribution of often contradictory states of mind. For decades, substantial research efforts in marketing and other social sciences have sought to go beyond observed choices to infer the mental landscape from which they arise (e.g., Guadagni and Little 1983;McFadden 1986).

The advent of large language models (LLMs) offers a vast new data source for these human-centric research fields. LLMs are increasingly used as “silicon agents” to simulate a wide spectrum of human behaviors at scale (e.g., Argyle et al. 2023). When properly conditioned, LLMs can reveal foundational patterns of human behavior, such as price sensitivity in product purchase decisions, producing synthetic data with the potential to mirror real-world observations (e.g., Arora, Chakraborty, and Nishimura 2025; Brand, Israeli, and Ngwe 2024; Horton 2023).

In this paper, we highlight a unique advantage of LLM-generated data: their ability to go beyond the model’s choice to offer a glimpse into the model’s “mind,” a modern response to the age-old challenge of studying human thought. This advantage is rooted in the probabilistic nature of LLMs (Bengio et al. 2003). When prompted to make a choice, an LLM computes a probability distribution over potential responses internally, and then samples from this distribution to generate an output.

This distributional knowledge over the potential choices offers deeper insights into the LLM’s inner trade-offs than its eventual choice alone.
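
The paper's measure is built from exactly these internal probabilities. As a rough sketch of how they might be read off in practice, the snippet below renormalizes token-level log-probabilities over a fixed set of answer tokens. The token names, log-probability values, and the assumption that the serving API exposes per-token log-probabilities are all illustrative, not taken from the paper.

```python
import math

def model_belief(token_logprobs: dict[str, float], alternatives: list[str]) -> dict[str, float]:
    """Renormalize token-level log-probabilities over the choice alternatives.

    token_logprobs: log-probabilities for candidate answer tokens, as an LLM API
    with log-probability output enabled might return (values here are made up).
    alternatives: the answer tokens that encode the choice set.
    """
    # Keep only the tokens that encode an alternative, then renormalize.
    kept = {a: token_logprobs[a] for a in alternatives}
    max_lp = max(kept.values())  # subtract the max for numerical stability
    weights = {a: math.exp(lp - max_lp) for a, lp in kept.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Illustrative log-probabilities for the single answer token (made-up numbers).
logprobs = {"Pampers": -0.75, "Huggies": -0.71, "Neither": -3.2}
belief = model_belief(logprobs, ["Pampers", "Huggies"])
print(belief)  # ≈ {'Pampers': 0.49, 'Huggies': 0.51}
```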

For example, suppose an LLM is prompted to make a choice between two diaper brands, Pampers and Huggies. The LLM computes the choice probabilities between the two brands. For simplicity of illustration, assume temporarily that the LLM selects the next token using greedy search, outputting the choice that is more probable (Minaee et al. 2024). While the LLM’s eventual choice is binary, the underlying choice probabilities are more granular: they can be interpreted as the LLM’s “mind shares” between the two brands, potentially derived from aggregated market shares in the consumer data used for training. To see the value of this more granular data, imagine two scenarios following a price drop by Pampers:

  1. The LLM changes its probability of choosing Pampers from .49 to .51, and changes its eventual choice from Huggies to Pampers.

  2. The LLM changes its probability of choosing Pampers from .01 to .49, and maintains its eventual choice of Huggies.

Using the LLM’s eventual choice alone would likely overestimate the LLM’s price sensitivity (in absolute value) in the first scenario and underestimate its price sensitivity in the second. By contrast, using the probability distribution provided by LLMs helps preserve the information that would otherwise be lost in the coarse aggregation process often involved in translating mind to choice. This advantage of LLM-generated data has not been fully utilized in current practice. Common applications have often defaulted to treating LLMs like human subjects, recording a single response as the answer used for subsequent analysis. A subset of studies use resampling, repeatedly querying the model to derive a distribution of responses (e.g., Argyle et al. 2023; Brand, Israeli, and Ngwe 2024). The resampling approach is intuitively appealing, but can be computationally costly. The methodological gap, therefore, is the lack of a formal measure that captures the LLM’s internal state efficiently.
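
A minimal back-of-the-envelope sketch (our own illustration, not the paper's estimator) makes the contrast concrete: it computes the price response implied by the eventual binary choice versus the one implied by the underlying probabilities in the two scenarios above.

```python
# Implied change in the probability of choosing Pampers after the price drop,
# measured two ways: from the eventual (binary) choice vs. from model belief.
scenarios = {
    "Scenario 1": {"belief_before": 0.49, "belief_after": 0.51},
    "Scenario 2": {"belief_before": 0.01, "belief_after": 0.49},
}

for name, s in scenarios.items():
    # Greedy choice: Pampers is chosen iff its probability exceeds 0.5.
    choice_before = int(s["belief_before"] > 0.5)
    choice_after = int(s["belief_after"] > 0.5)
    print(name,
          "choice-based response:", choice_after - choice_before,
          "belief-based response:", round(s["belief_after"] - s["belief_before"], 2))
# Scenario 1: the binary choice flips (response = 1) though belief moved by only 0.02.
# Scenario 2: the binary choice is unchanged (response = 0) though belief moved by 0.48.
```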

The paper bridges this gap by introducing model belief, a measure derived from an LLM’s token-level log-probabilities (logits). We formalize this measure to delineate the latent choice distribution over a set of alternatives, as opposed to model choice, the commonly used measure that records the model’s eventual choice. We first prove a series of desirable theoretical properties of model belief.

We show that it is asymptotically equivalent to the mean of unbiased samples of model choice (a non-trivial property, as we shall explain), but forms a more efficient estimator with lower variance and faster convergence. Analogous properties hold for quantities that are smooth functions of model belief and model choice, a useful result for downstream applications. To examine the empirical performance of model belief, we conduct a demand estimation study where an LLM simulates consumer responses to different prices. With limited numbers of runs, model belief explains and predicts ground-truth model choice better than model choice itself, with more accurate, precise, and robust price sensitivity estimates. To achieve sufficiently accurate estimates, model belief needs only about 1/20th of the computation required by model choice. Last, we discuss why model belief should be used as the default measure to extract more information from LLM-generated data.
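
The efficiency argument can be conveyed with a small simulation (our own illustration with made-up numbers, not the paper's demand estimation study): averaging repeated binary choices is an unbiased but noisy estimate of the model's latent choice probability, whereas reading that probability off directly in a single run carries no sampling noise.

```python
import random

random.seed(0)
p_true = 0.30   # the LLM's latent probability of choosing an alternative (illustrative)
n_runs = 20     # a "limited number of runs" budget per estimate
n_trials = 2000 # Monte Carlo replications to measure estimator error

# Model choice: average n_runs sampled binary choices per replication.
choice_estimates = [
    sum(random.random() < p_true for _ in range(n_runs)) / n_runs
    for _ in range(n_trials)
]
mse_choice = sum((e - p_true) ** 2 for e in choice_estimates) / n_trials

# Model belief: a single run already yields p_true (read off the token probabilities),
# so its sampling error in this stylized setup is zero by construction.
mse_belief = 0.0

print(f"MSE of mean of {n_runs} model choices: {mse_choice:.5f}")  # ≈ p(1-p)/n ≈ 0.0105
print(f"MSE of model belief from one run:      {mse_belief:.5f}")
```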

Reference

This content is AI-processed based on open access ArXiv data.
