LOGOS-CA: A Cellular Automaton Using Natural Language as State and Rule
Large Language Models (LLMs), trained solely on massive text data, have achieved high performance on the Winograd Schema Challenge (WSC), a benchmark proposed to measure commonsense knowledge and reasoning abilities about the real world. This suggests that the language produced by humanity describes a significant portion of the world with considerable nuance. In this study, we attempt to harness the high expressive power of language within cellular automata. Specifically, we express cell states and rules in natural language and delegate their updates to an LLM. Through this approach, cellular automata can transcend the constraints of merely numerical states and fixed rules, providing us with a richer platform for simulation. Here, we propose LOGOS-CA (Language Oriented Grid Of Statements - Cellular Automaton) as a natural framework to achieve this and examine its capabilities. We confirmed that LOGOS-CA successfully performs simple forest fire simulations and also serves as an intriguing subject for investigation from an Artificial Life (ALife) perspective. In this paper, we report the results of these experiments and discuss directions for future research using LOGOS-CA.
💡 Research Summary
The paper introduces LOGOS‑CA (Language Oriented Grid Of Statements – Cellular Automaton), a novel cellular automaton framework that replaces traditional numeric cell states and hard‑coded transition rules with natural‑language descriptions interpreted by a large language model (LLM). In LOGOS‑CA each cell stores a textual description of its current state and a short rule clause (e.g., “Rule: … State: …”). At every simulation step the system constructs a prompt that contains the target cell’s description and a structured list of its eight‑cell Moore neighborhood descriptions, then sends this prompt together with a fixed system prompt to an LLM (such as GPT‑4o or GPT‑5). The LLM is required to respond in a predefined JSON format containing a single field “next_description”. This field becomes the cell’s description for the next time step. The overall update loop is formalized in Algorithms 1 and 2, which iterate over the grid, query the LLM for each cell, and store the resulting history of grids.
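Algorithms 1 and 2 are not reproduced here, but the described per-cell update loop could look roughly like the following sketch. The `llm_fn` callable stands in for the actual chat-completion request (system prompt plus cell/neighborhood prompt), and the exact JSON prompt layout is an assumption; only the "next_description" reply field is taken from the paper.

```python
import json

def moore_neighborhood(grid, r, c):
    """Collect the eight neighboring cell descriptions (toroidal wrap)."""
    n = len(grid)
    return [grid[(r + dr) % n][(c + dc) % n]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

def step(grid, llm_fn):
    """One synchronous update: query the LLM once per cell and parse
    the JSON reply's "next_description" field into the next grid."""
    n = len(grid)
    nxt = [[None] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            prompt = {"cell": grid[r][c],
                      "neighbors": moore_neighborhood(grid, r, c)}
            reply = llm_fn(json.dumps(prompt))  # hypothetical LLM call
            nxt[r][c] = json.loads(reply)["next_description"]
    return nxt

def run(grid, llm_fn, steps):
    """Iterate the grid and keep the full history, as in Algorithms 1-2."""
    history = [grid]
    for _ in range(steps):
        grid = step(grid, llm_fn)
        history.append(grid)
    return history
```

In practice each cell's description would be a multi-line "Rule: … State: …" text and `llm_fn` a call to GPT-4o or GPT-5; the loop structure is the same either way.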
Two experimental domains are used to evaluate the concept. The first is a classic forest‑fire cellular automaton. The model has four possible states (empty, tree, burning, ash) and deterministic rules: a burning cell turns to ash, a tree with any burning neighbor becomes burning, ash stays ash, and empty stays empty. These rules are encoded in a natural‑language template (Listing 1). The authors run the simulation on an 11 × 11 toroidal grid with a single central burning cell surrounded by obstacles. Five LLMs are tested (GPT‑4o‑mini, GPT‑4o, GPT‑5‑nano, GPT‑5‑mini, GPT‑5) with temperature fixed at 1. Results show that GPT‑4o and GPT‑5 perfectly reproduce the reference simulation, while GPT‑5‑mini matches closely but occasionally emits invalid outputs. GPT‑4o‑mini and GPT‑5‑nano break down as early as step 1, producing malformed descriptions and losing the required “State:” line. The authors note that temperature = 0 would likely improve rule adherence and that prompt engineering could rescue the weaker models.
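The deterministic rule set is simple enough to state numerically; the reference simulation the LLM outputs are compared against would behave like the following sketch (an illustration of the rules, not the authors' code; state names are assumptions).

```python
EMPTY, TREE, BURNING, ASH = "empty", "tree", "burning", "ash"

def forest_fire_step(grid):
    """One step of the deterministic forest-fire rules on a toroidal grid:
    burning -> ash; tree with any burning Moore neighbor -> burning;
    ash and empty are absorbing states."""
    n, m = len(grid), len(grid[0])
    nxt = [row[:] for row in grid]
    for r in range(n):
        for c in range(m):
            if grid[r][c] == BURNING:
                nxt[r][c] = ASH
            elif grid[r][c] == TREE:
                neighbors = [grid[(r + dr) % n][(c + dc) % m]
                             for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                             if (dr, dc) != (0, 0)]
                if BURNING in neighbors:
                    nxt[r][c] = BURNING
    return nxt
```

Starting from a single central burning cell, the fire front expands outward one Moore ring per step while leaving ash behind, which is the pattern GPT-4o and GPT-5 reproduce exactly.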
The second experiment explores Artificial Life (ALife) dynamics, where the textual state space is far richer. Two description templates are defined: one that generates random characters in the central cell and another that represents empty space. A 25 × 25 grid is initialized with the random-character template at the center and empty-space templates elsewhere. Three independent runs are performed for each of two LLMs (GPT-5-nano and GPT-5-mini). To analyze the evolving textual field, all cell descriptions across all time steps are embedded using OpenAI's text-embedding-3-small model. Two visualizations are produced: (1) a global color map obtained by reducing the embeddings to three dimensions with PCA and mapping each dimension to an RGB channel, and (2) a cluster-based map where L2-normalized embeddings are clustered with K-Means (k = 20) and each cell is colored by its cluster label. Frequency-weighted cluster centroids are used to assign labels consistently across time. The visualizations reveal that GPT-5-mini quickly settles into a stable pattern: after about 20 steps the mean cosine change between successive embeddings drops sharply toward zero, and cluster boundaries become crisp. In contrast, GPT-5-nano maintains higher embedding change throughout the run, producing fuzzy, dynamic clusters and a more temporally volatile field. Quantitative change metrics (1 − cosine similarity) corroborate these observations.
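The two analysis tools described above, the per-step change metric and the PCA-to-RGB mapping, can be sketched with NumPy alone (the paper's pipeline uses OpenAI embeddings and presumably standard ML tooling; function names and the min-max rescaling are assumptions).

```python
import numpy as np

def mean_cosine_change(emb_t, emb_t1):
    """Mean (1 - cosine similarity) between each cell's embedding at
    step t and step t+1; inputs have shape (num_cells, dim)."""
    a = emb_t / np.linalg.norm(emb_t, axis=1, keepdims=True)
    b = emb_t1 / np.linalg.norm(emb_t1, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * b, axis=1)))

def pca_rgb(embeddings):
    """Project embeddings onto their first 3 principal components and
    rescale each component to [0, 1] for use as RGB channels."""
    x = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)  # principal axes
    proj = x @ vt[:3].T
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    return (proj - lo) / np.where(hi - lo == 0, 1, hi - lo)
```

A flat change curve from `mean_cosine_change` corresponds to the frozen GPT-5-mini fields, while a persistently elevated curve matches the volatile GPT-5-nano runs.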
The authors discuss several key insights. First, representing state and rule in natural language is theoretically feasible, but practical reliability hinges on the LLM’s ability to produce consistently formatted output. Model size, temperature, and prompt design directly affect rule fidelity. Second, the temperature parameter mediates a trade‑off between strict rule following (low temperature) and creative, diverse evolution (high temperature). Third, using text embeddings as a lens into the simulation uncovers high‑dimensional semantic structure that traditional numeric CA cannot expose, opening new avenues for analyzing emergent behavior in ALife contexts. Finally, despite current limitations (format collapse, stochasticity), LOGOS‑CA demonstrates a promising platform for language‑driven simulation, educational tools, interactive storytelling, and potentially for integrating symbolic knowledge with emergent dynamics in future AI systems.