Codified Finite-state Machines for Role-playing

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Modeling latent character states is crucial for consistent and engaging role-playing (RP) with large language models (LLMs). Yet, existing prompting-based approaches mainly capture surface actions, often failing to track the latent states that drive interaction. We revisit finite-state machines (FSMs), long used in game design to model state transitions. While effective in small, well-specified state spaces, traditional hand-crafted, rule-based FSMs struggle to adapt to the open-ended semantic space of RP. To address this, we introduce Codified Finite-State Machines (CFSMs), a framework that automatically codifies textual character profiles into FSMs using LLM-based coding. CFSMs extract key states and transitions directly from the profile, producing interpretable structures that enforce character consistency. To further capture uncertainty and variability, we extend CFSMs into Codified Probabilistic Finite-State Machines (CPFSMs), where transitions are modeled as probability distributions over states. Through both synthetic evaluations and real-world RP scenarios in established artifacts, we demonstrate that CFSM and CPFSM outperform generally applied baselines, verifying effectiveness not only in structured tasks but also in open-ended stochastic state exploration.


💡 Research Summary

The paper tackles a fundamental weakness of current large‑language‑model (LLM) based role‑playing (RP) agents: the lack of an explicit, persistent representation of a character’s latent internal state. Prompt‑only methods generate responses directly from the current scene and a textual description of the character, but they do not keep track of how the character’s emotions, abilities, or social roles evolve over time. This leads to drift, inconsistency, and difficulty in debugging.

To address this, the authors revisit finite‑state machines (FSMs), a classic symbolic tool used in game design to model discrete states and deterministic transitions. Traditional hand‑crafted FSMs are powerful but brittle when the state space is open‑ended and the transition logic must be inferred from natural language. The authors therefore propose Codified Finite‑State Machines (CFSMs), a framework that automatically converts a textual character profile into an executable FSM using an LLM as a code‑generation engine.

CFSM construction pipeline

  1. State extraction – The LLM is prompted with the character’s backstory, traits, and constraints and asked to list the “key states” that matter for RP (e.g., “small Mario”, “super Mario”, “fire Mario”, “miss”).
  2. Transition coding – Given the extracted state list, the LLM writes a Python function get_next_state(state, action) that implements the transition logic. Conditions are expressed via a helper binary_q(text, question) which asks the LLM a yes/no/unknown question about the current action or scene. This yields a clear, debuggable piece of code that can be executed at each turn.
  3. Scalable defaulting – Instead of enumerating a full O(n²) transition table, the method assigns every one of the n states a default “stay in current state” rule (O(n)) and overwrites only the k transitions specified in the profile (O(k)), yielding O(n + k) rules in total.
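The pipeline above can be sketched in Python. This is a hypothetical illustration of the kind of transition code the LLM might generate for the Mario example; the real `binary_q` helper queries an LLM with a yes/no/unknown question, whereas here it is stubbed with keyword matching so the sketch runs standalone. The question strings and keyword table are assumptions, not the paper's actual prompts.

```python
# Hypothetical sketch of LLM-codified CFSM transition logic (Mario example).
# The paper's binary_q asks an LLM a yes/no/unknown question about the
# current action; this stub substitutes keyword matching so the code runs.

STATES = ["small Mario", "super Mario", "fire Mario", "miss"]

def binary_q(text: str, question: str) -> str:
    """Stub for the LLM-backed helper: answer 'yes'/'no' about the action."""
    keywords = {
        "Is the character hit by an enemy?": "hit",
        "Does the character pick up a fire flower?": "fire flower",
        "Does the character pick up a mushroom?": "mushroom",
    }
    return "yes" if keywords[question] in text.lower() else "no"

def get_next_state(state: str, action: str) -> str:
    """Codified transitions: k profile rules override an O(n) 'stay' default."""
    if binary_q(action, "Is the character hit by an enemy?") == "yes":
        # Powered-up forms are knocked down one tier; small Mario misses.
        return "small Mario" if state in ("super Mario", "fire Mario") else "miss"
    if binary_q(action, "Does the character pick up a fire flower?") == "yes":
        return "fire Mario"
    if binary_q(action, "Does the character pick up a mushroom?") == "yes":
        return "super Mario" if state == "small Mario" else state
    return state  # default rule: stay in the current state
```

Executed at each turn, the function yields an inspectable transition trace, e.g. `get_next_state("small Mario", "Mario grabs a mushroom")` returns `"super Mario"`.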

Probabilistic extension (CPFSM)
CFSMs are deterministic; they return a single next state. To capture the inherent uncertainty of narrative situations, the authors introduce Codified Probabilistic Finite‑State Machines (CPFSMs). The binary_q calls return logits; a softmax over these logits produces a probability distribution over possible next states. The system maintains a state distribution vector H(t) and a transition matrix W(t). At each turn, the distribution is updated, allowing the agent to either sample a next state (for diversity) or pick the most probable one (for consistency).
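The belief update described above can be sketched as follows. This is a minimal illustration under stated assumptions: the logits matrix stands in for the values returned by the `binary_q` calls (here random placeholders), and the update rule H(t+1) = H(t)·W(t) with a row-wise softmax is an interpretation of the summary, not the paper's verbatim implementation.

```python
import numpy as np

# Sketch of the CPFSM state-distribution update.
# W(t): row-stochastic transition matrix from softmaxed binary_q logits.
# H(t): probability distribution over character states.

def softmax(logits, axis=-1):
    z = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def update_belief(H, logits):
    """H(t+1) = H(t) @ W(t), where W(t) = row-wise softmax of the logits."""
    W = softmax(logits, axis=1)
    return H @ W

rng = np.random.default_rng(0)
n = 4                                  # e.g. small/super/fire Mario and miss
H = np.zeros(n); H[0] = 1.0            # start fully confident in state 0
logits = rng.normal(size=(n, n))       # placeholder for binary_q logits
H = update_belief(H, logits)

next_state = int(H.argmax())           # consistency: pick the mode
sampled = int(rng.choice(n, p=H))      # diversity: sample from H
```

The two final lines mirror the paper's trade-off: sampling from H(t) yields diverse reactions, while the argmax preserves consistency.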

Experiments

  • Synthetic domains: The authors test Mario power‑up transitions and a stealth‑combat scenario from Call of Duty. CFSM and CPFSM achieve 23‑37 % higher accuracy in applying the correct transition compared to a baseline that directly prompts the LLM to output the next state.
  • Real‑world RP benchmark: Using the Fandom Benchmark (≈5,000 scenes across 83 characters), three metrics are evaluated: (1) Behavioral consistency – how often the generated dialogue matches the character’s defined traits; (2) Transition traceability – whether the system’s internal state sequence aligns with the ground‑truth state progression; (3) Profile alignment – similarity between the generated behavior and the original textual profile. CFSM and CPFSM outperform strong baselines such as CharacterGLM, memory‑augmented prompting, and other state‑modeling approaches. CPFSM, in particular, yields 1.8× higher response diversity in ambiguous situations while preserving consistency.
  • Efficiency: The deterministic CFSM requires on average 0.12 s of LLM inference per transition; CPFSM adds a modest 0.06 s overhead (0.18 s total). End‑to‑end latency stays below 0.45 s per turn on a single GPU, making the approach viable for interactive RP.

Key insights

  1. Explicit, codified state machines provide transparency that pure prompt‑only pipelines lack. Developers can inspect, edit, and debug the generated Python code, ensuring that undesirable behaviors are caught early.
  2. Probabilistic transitions model narrative uncertainty, enabling the same scene to lead to multiple plausible character reactions, which is essential for rich storytelling.
  3. LLM‑code collaboration bridges symbolic AI and neural language models, leveraging the LLM’s world knowledge to write correct transition logic while retaining the interpretability of symbolic FSMs.

Limitations and future work

  • The current binary‑question formulation can struggle with complex, multi‑dimensional conditions (e.g., simultaneous emotional and strategic considerations). Extending the framework to handle multi‑label queries or hierarchical condition trees is an open direction.
  • CPFSM’s probability estimates inherit any bias or over‑confidence from the underlying LLM logits. Calibration techniques (temperature scaling, post‑hoc smoothing) are needed to ensure reliable uncertainty estimates.
  • The work focuses on text‑only RP; extending to multimodal settings (visual cues, audio, game state) will require integrating multimodal LLMs and richer state representations.
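The calibration direction mentioned above can be sketched with standard temperature scaling; the logits here are placeholders for the raw `binary_q` logits, and this snippet shows the general technique rather than anything the paper implements.

```python
import numpy as np

# Temperature scaling as a post-hoc calibration step for over-confident
# transition probabilities. T > 1 softens the distribution.

def softmax_with_temperature(logits, T=1.0):
    """Softmax over logits / T; larger T spreads probability mass."""
    z = np.asarray(logits, dtype=float) / T
    z = np.exp(z - z.max())
    return z / z.sum()

logits = [2.0, 0.0, -1.0]                      # placeholder logits
p_raw = softmax_with_temperature(logits, T=1.0)
p_cal = softmax_with_temperature(logits, T=4.0)
# p_cal is flatter than p_raw, reducing over-confidence in one state.
```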

Conclusion
Codified Finite‑State Machines and their probabilistic extension constitute a practical, scalable solution for latent state tracking in LLM‑driven role‑playing. By automatically extracting states from natural language profiles and generating executable transition code, the approach delivers higher consistency, traceability, and expressive flexibility than existing baselines. The paper opens a promising research avenue where symbolic structures are synthesized on‑the‑fly by powerful language models, paving the way for more reliable and narratively rich AI characters.

