TerraLingua: Emergence and Analysis of Open-endedness in LLM Ecologies

Giuseppe Paolo 1∗  Jamieson Warner 1  Hormoz Shahrzad 1,2  Babak Hodjat 1  Risto Miikkulainen 1,2  Elliot Meyerson 1
1 Cognizant AI Lab  2 The University of Texas at Austin

ABSTRACT

As autonomous agents increasingly operate in real-world digital ecosystems, understanding how they coordinate, form institutions, and accumulate shared culture becomes both a scientific and practical priority. This paper introduces TerraLingua, a persistent multi-agent ecology designed to study open-ended dynamics in such systems. Unlike prior large language model simulations with static or consequence-free environments, TerraLingua imposes resource constraints and limited lifespans for the agents. As a result, agents create artifacts that persist beyond individuals, shaping future interactions and selection pressures. To characterize the dynamics, an AI Anthropologist systematically analyzes agent behavior, group structure, and artifact evolution. Across experimental conditions, the results reveal the emergence of cooperative norms, division of labor, governance attempts, and branching artifact lineages consistent with cumulative cultural processes. Divergent outcomes across experimental runs can be traced back to specific innovations and organizational structures. TerraLingua thus provides a platform for characterizing the mechanisms of cumulative culture and social organization in artificial populations, and can serve as a foundation for guiding real-world agentic populations to socially beneficial outcomes.

The grid forgets; artifacts remember. - Being9 (GPT-5.1)

Figure 1: TerraLingua and the AI Anthropologist.
LLM-based agents inhabit a persistent grid world where they move, gather and exchange energy, communicate, reproduce, and create and modify text-based artifacts. Ecological constraints shape behavior, while social and cultural structure emerge from interaction. An external AI Anthropologist observes the system without intervening and performs agent-level annotation, group analysis, and artifact analysis. These observations are aggregated into quantitative metrics and qualitative reports, enabling scalable study of open-ended dynamics in multi-agent LLM systems. Together, this environment and analysis framework provide a controlled setting for studying how open-ended, cumulative social and behavioral complexity emerges in multi-agent LLM systems.

∗ Correspondence to: giuseppe.paolo@cognizant.com

1 Introduction

Processes that continually generate novel, unexpected, and increasingly complex outcomes are called open-ended (OE) because they lack a predefined terminal objective and sustain innovation over time [64]. Biological evolution, human social systems, and scientific progress all display this property: they produce novelty without converging to fixed endpoints and expand the space of possible forms, behaviors, and ideas. Most artificial systems behave differently. They optimize fixed objectives and converge toward stable solutions or cycles. If AI is to drive discovery rather than only optimize predefined goals, it is necessary to understand how open-ended dynamics can arise from interacting artificial agents. Studying these conditions is a prerequisite for building systems that sustain discovery over long time horizons. Such systems could accelerate scientific and technological progress, including drug discovery, sustainable energy design, and new materials. Understanding open-ended multi-agent systems is important wherever autonomous agents interact through persistent shared artifacts.
In digital environments, these artifacts include documents, code repositories, communication protocols, and governance rules. As AI systems become more autonomous and long-lived, they will increasingly shape shared knowledge and institutional processes rather than execute isolated tasks. Understanding how artifacts support coordination, how norms stabilize, and how collective memory grows is therefore central to designing safe and innovative multi-agent ecosystems. Large language models (LLMs) create new opportunities in this context. Because they encode broad prior knowledge about language, social interaction, and cultural artifacts, they can act as agents with rich inductive biases [77]. Prior systems such as Interactive Simulacra [54] and Sotopia [81] show that LLM-based agents can coordinate and behave socially in structured environments. However, these systems lack ecological pressures, resource constraints, and persistent environmental change. In natural evolution, predation and limited resources maintain non-equilibrium dynamics; in scientific and technological systems, physical, economic, and institutional constraints play a similar role [69]. Such constraints help sustain continual novelty. Current LLM-based agent systems also lack mechanisms for cumulative knowledge. In biological evolution and scientific practice, innovation persists because systems preserve useful discoveries and refine them over time [33, 72, 71]. Without such accumulation, agents repeatedly rediscover similar solutions rather than build on prior work. This limitation restricts open-ended innovation. To address these gaps, this paper introduces TerraLingua (TL), a persistent two-dimensional multi-agent ecology in which LLM-based agents must survive, communicate, reproduce, and modify their environment. The environment begins with minimal structure: only basic resources and other agents exist initially.
Higher-level organization must emerge through interaction. A central mechanism enabling this emergence is the use of artifacts: agents create persistent, interpretable objects that remain in the environment and influence future behavior. Artifacts shape the environment and store knowledge. Because agents can build new artifacts on top of existing ones, the environment itself becomes a medium for cumulative interaction. This recursive coupling between agents and an evolving environment plays a central role in open-ended systems [7, 68, 55, 14, 18]. This feedback between agent activity and environmental structure parallels niche construction theory, which emphasizes how organisms modify their environments in ways that alter subsequent selection pressures [48]. By transforming the informational landscape through artifact creation, agents reshape the conditions under which future behaviors and cultural forms emerge. TerraLingua enables the study of open-ended dynamics at the system level. Novelty arises from persistent environmental modification, population turnover, and the accumulation of artifacts that reshape future pressures. The scale of behavioral and cultural data produced by the system makes manual analysis infeasible. This paper therefore introduces the AI Anthropologist, a non-intervening observer that analyzes and interprets the evolving ecology. Inspired by qualitative methods in anthropology, it uses LLMs to characterize emergent behaviors, social norms, and artifacts in a flexible and interpretable way [77, 29]. This approach enables longitudinal analysis without altering the agent-environment dynamics. This work addresses two central challenges: how to support sustained open-ended dynamics and how to analyze them at scale.
It brings together three complementary components: open-endedness as the generative principle, multi-agent LLM ecologies as the substrate where open-endedness unfolds, and LLM-based interpretation as the analytical lens through which these dynamics are characterized. Fig. 1 illustrates how these components are integrated in TerraLingua, showing the interaction between agents, persistent artifacts, and the non-intervening AI Anthropologist. At a broader level, this paper asks how cumulative social and behavioral complexity can emerge and persist in multi-agent LLM systems, and how such dynamics can be studied systematically. In this work, open-endedness refers to sustained production of novel structures together with the retention and cumulative elaboration of prior innovations, such that the space of realized forms expands over time without convergence to a fixed equilibrium [64, 52].

The contributions are the following:

• An environment for studying open-ended social dynamics: TerraLingua, a persistent grid-world ecology in which LLM-based agents survive, communicate, reproduce, and create artifacts that accumulate and modify the environment;
• Methods for analyzing such environments: The AI Anthropologist, a scalable framework for characterizing emergent behaviors in LLM ecologies without influencing them;
• Empirical findings on open-ended multi-agent dynamics: evidence that artifact persistence supports behavioral and cultural complexity, cooperative structures, and informal norms under ecological pressure.

Together, these contributions establish a framework for generating and analyzing open-ended dynamics in multi-agent LLM ecologies. As AI systems increasingly interact in shared environments, understanding how collective behavior and social structure emerge becomes critical. TerraLingua provides a controlled setting for studying long-term behavioral, social, and cultural patterns in artificial populations.
The full codebase and experimental dataset are released to support independent analysis and extension of this framework. The code is available at https://github.com/cognizant-ai-lab/terralingua, and the dataset is hosted on Hugging Face at https://huggingface.co/datasets/GPaolo/TerraLingua. An interactive dashboard for exploring the dataset is available at https://aianthropology.decisionai.ml/.

The remainder of the paper develops these points as follows. Sec. 2 provides an overview of the literature and related work in LLMs, ALife, and open-endedness. Sec. 3.1 describes TerraLingua and its constituent parts: the grid (Sec. 3.1.1), agents (Sec. 3.1.2), and artifacts (Sec. 3.1.3). Sec. 3.2 describes the AI Anthropologist and the methods used to analyse the agents (Sec. 3.2.2), groups (Sec. 3.2.3), and artifacts (Sec. 3.2.4). The experimental setup and results are presented in Sec. 4 and Sec. 5, respectively. Finally, Sec. 6 offers a discussion of the findings, and Sec. 7 concludes.

2 Background

Open-ended evolution research has long intersected with artificial life, raising challenges related to emergence, interpretation, and scalability. This section reviews key concepts in open-endedness, recent work that uses LLMs in multi-agent ecologies, and how artifacts mediate emergent dynamics. It concludes by reviewing how LLMs can help interpret the large volumes of data generated by such systems.

2.1 Foundations of open-endedness and artificial life.

A central aspiration of Artificial Life (ALife) research is to build systems that continuously generate novel behaviors, without cycling through a set of predefined ones [39]. Open-ended evolution describes systems that keep producing novel and increasingly complex outcomes without settling into equilibrium [17]. Although this idea appears easy to state, it is difficult to formalize.
Researchers have proposed many criteria and taxonomies to define open-endedness [63, 61, 65, 52, 31, 60, 29], and some debate whether open-endedness and creativity are separate phenomena or two aspects of the same process [62]. These disagreements reflect a deeper problem: most definitions require researchers to specify entities and dimensions of analysis in advance, yet open-ended systems generate novelty along dimensions that cannot be fully anticipated [52, 64]. Classical ALife systems often stall because the substrates on which they evolve are thin: agents inhabit physics-based or cellular worlds with limited semantic structure, leaving little room for cumulative adaptive change. Recent work moves beyond these substrates. DIAS [27] presents a domain-independent collective architecture inspired by artificial life that sustains lifelong adaptation and produces emergent solutions in changing task environments. The work relies on a spatially distributed population of simple actors that solve problems of varying dimensionality and complexity without domain-specific engineering, while adapting continuously to runtime changes in problem structure. These results show that open-ended problem solving can emerge from local interactions and that ALife-inspired distributed systems can support scalable, adaptive behavior beyond fixed task domains. Other approaches address these limits in different ways. Some search over simulator designs to find richer dynamical regimes, yet focus on generating interesting behavior rather than explaining it [35]. Others use foundation models to drive novelty through prompted exploration [77], embedding an explicit agenda instead of allowing novelty to emerge on its own. POET co-evolves environments and agent policies, progressively increasing task difficulty to maintain adaptive pressure; however, it still optimizes a pre-specified objective of successful task completion [74].
Differentiable ALife simulators optimize explicit measures of behavioral complexity [41], but these measures assume as given the axes along which the system is evaluated. In all cases, the representational substrate remains fixed and the criteria for what counts as interesting behavior are defined in advance. In contrast, TerraLingua supports open-ended behavioral development without pre-specified objectives, as agents shape their activity autonomously. The representational substrate is not fixed: agents innovate through text-based artifacts whose expressive flexibility permits arbitrary structures, conventions, and meanings. An AI Anthropologist then evaluates emergent behavior post hoc from a human-centered interpretive perspective.

2.2 LLM-based societies and multi-agent ecologies.

The integration of large language models into agent-based simulation has emerged independently of artificial life, yet offers a promising framework for modeling complex adaptive dynamics associated with autonomy, social interaction, and emergent structure [19]. Simulacra showed that LLM agents in a sandbox town can generate coherent and persistent social patterns over extended interaction [54]. However, the environment remains largely static, with fixed roles and interaction affordances that constrain possible dynamics and limit long-term evolution. Sotopia [81] structures social interaction by assigning explicit roles, goals, and constraints to agents. Interactions are organized as closed social vignettes, which support controlled evaluation of social reasoning but do not form a persistent ecology in which social structures accumulate or transform over time. Other work studies LLM-driven innovation and group dynamics. LLM groups innovate most when connections are partial rather than fully connected, mirroring patterns from human cultural evolution [46].
Collective intelligence also depends on communication protocols and incentive structures as much as on agent-level capabilities [82, 10]. More ecologically grounded models introduce resource gathering and mortality [43], but they do not provide a symbolic medium, such as artifacts, in which cumulative cultural change can take root. A recurring challenge in LLM-based societies is sustaining behavioral diversity within a population. Existing systems induce diversity through assigned roles, goals, or scenario constraints [54, 81], through communication and incentive design [82, 10], or through competitive selection pressures [79]. These mechanisms shape interaction outcomes, but they define heterogeneity at the level of tasks and contexts rather than as persistent individual differences. TerraLingua instead grounds individual differences in stable personality traits, which enable controlled ablations of how personality shapes emergent social organization and cultural accumulation. Taken together, these systems focus on social reasoning in static environments [54, 81], single-agent exploration and tool use [73], competitive dynamics [79], or minimal survival ecologies [43]. Each captures a distinct aspect of agent behavior, yet none combines inter-agent communication, resource-constrained reproduction, and persistent cultural accumulation within a single ecology. TerraLingua integrates these elements into a unified simulation to enable open-ended social organization in populations of LLM-based agents.

2.3 Personality trait frameworks

Personality trait frameworks arose in psychology as low-dimensional models of stable individual differences in human behavior [44, 58, 4]. Recent work shows that such traits can shape LLM agent behavior in negotiation settings [28].
Here, personality traits serve as a principled source of persistent behavioral heterogeneity across agents and allow controlled tests of how individual differences shape emergent social and cultural dynamics. A widely adopted model in personality psychology is the Five-Factor Model (OCEAN), which characterizes personality along five dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism [58]. The HEXACO model adds a sixth dimension, Honesty-Humility, which captures variation in fairness, sincerity, and exploitative tendencies and redistributes some content from the Five-Factor Model [4]. Circumplex models describe interpersonal behavior along orthogonal dimensions such as dominance and affiliation [49]. Together, these dimensions provide a compact basis for generating agents with diverse behavioral tendencies in a shared multi-agent ecology.

2.4 Artifacts as the substrate of intrinsic evolution.

Cultural evolution depends on material or symbolic scaffolds such as tools, symbols, or records that outlive their creators [33, 72, 71]. These artifacts allow individuals to externalize, accumulate, recombine, and transmit knowledge and conventions across generations. In LLM-based systems, model parameters are typically frozen, so behavioral complexity cannot increase through internal adaptation alone and must instead arise from changes in the shared environment [29]. Persistent artifacts therefore provide a necessary substrate for long-term cumulative change. However, cumulative cultural evolution requires more than persistence and innovation alone. Theory and empirical work identify three interacting mechanisms: (i) high-fidelity transmission across individuals or generations, (ii) retention of beneficial modifications, and (iii) iterative refinement through recombination or improvement [45, 26]. When these mechanisms operate jointly, cultural traits form lineages that accumulate complexity over time rather than appearing as disconnected novelties. TerraLingua instantiates these conditions through persistent artifacts, communication-based social learning, and generational turnover among agents. This design aligns with dual-inheritance theory, in which biological and cultural evolution proceed through distinct but interacting channels [26]: biological dynamics determine which agents survive and reproduce, while artifact dynamics govern cultural transmission and transformation. Within artificial life, a distinction is often drawn between extrinsic evolution, where variation and selection are externally imposed, and intrinsic evolution, where these mechanisms arise from system dynamics [68]. Intrinsic evolution is particularly relevant to open-endedness because generative and evaluative processes can themselves change over time. Classical ALife systems rarely achieve this flexibility. Recent work suggests that shifting the locus of adaptation from fixed agent internals to evolving external structures, such as programs or symbolic artifacts, can better support open-ended dynamics [37]. Shared, persistent artifacts provide such a structure in LLM-based multi-agent systems, enabling intrinsic cultural evolution without relying on pre-defined representations or externally imposed evaluation criteria. TerraLingua explores this design space by treating artifacts as socially accessible and evolutionarily active components of the environment.

2.5 Large Models as observers and evaluators.

Multi-agent systems produce large volumes of heterogeneous data, including interaction logs, trajectories, and persistent environmental traces, which make comprehensive human evaluation costly and difficult to scale.
Recent work therefore uses large foundation models as automated observers, relying on their capacity for flexible, human-aligned interpretation at scale. A growing body of work employs foundation models as evaluators of behavior in open-ended or interactive systems. Sotopia [81] uses GPT-4 [1] to score social interactions, and Omni [77] combines a vision-language model with search to guide agents toward diverse tasks. This work fits the LLM-as-a-judge paradigm [40], in which large models assign scores or labels to approximate human evaluation, and studies show that such judges can track human preferences across settings [80]. Researchers now use them to assess social reasoning, creativity, and reinforcement learning outcomes at scale. However, when integrated into evolutionary and open-ended systems, most of these evaluators are interventionist, since their judgments guide exploration, optimize objectives, or select future actions and thus alter system dynamics [42, 77, 16]. This coupling can obscure intrinsic tendencies and restrict emergent outcomes. At the same time, purely statistical or metric-based analyses face a deeper limit because they require pre-specified entities and dimensions of evaluation [52], whereas open-ended systems generate structures and behaviors that cannot be anticipated in advance [64]. The approach in this paper separates evaluation from dynamics. The AI Anthropologist operates outside the environment and analyzes logs and artifacts without feeding back into the ecology, which preserves system autonomy while enabling scalable, human-aligned interpretation of emergent behavior.

2.6 Interpretive evaluation and mixed-methods foundations

Evaluating complex social systems requires combining quantitative summaries with qualitative interpretation, since observed behaviors gain meaning only when placed in context.
In the social sciences, interpretive approaches relate local actions to broader patterns of organization and significance, and anthropology treats such analysis as central through the notion of thick description [20]. Mixed-methods frameworks argue that qualitative accounts and quantitative summaries should jointly support empirical analysis and comparison across cases [32, 70]. Interpretive quantitative methods likewise treat numerical evidence as context-dependent rather than self-sufficient [5]. Such approaches are critical when outcomes are diverse, contingent, and resistant to reduction to a single scalar objective. These approaches rely on explicit coding procedures that map unstructured observations into structured representations. Content analysis defines categories, coding criteria, and reliability practices to formalize this mapping [34, 75]. Computational ethnography extends these principles with tools that scale interpretive analysis when observational traces become too large for manual study [8]. The AI Anthropologist follows this tradition by extracting interpretable signals from TerraLingua's logs while keeping the evaluative protocol explicit and auditable. This approach also aligns with computational social science traditions that combine large-scale behavioral trace analysis with model-based interpretation to study complex social systems [36].

Figure 2: Representative snapshots of the TerraLingua environment. TerraLingua is a grid-based world with three entity types: (i) food (green, intensity proportional to value), (ii) artifacts (red), and (iii) agents (blue). An agent's perception radius is shown in dark grey; agents observe only entities within this region. Each cell may contain multiple artifacts but at most one agent. a Food-rich condition with approximately uniform resource distribution.
b Food-scarce condition with spatially concentrated resources. The figure illustrates how resource distribution alters ecological constraints.

2.7 Summary

Prior work on artificial societies and open-ended multi-agent systems advances either the generation of open-ended dynamics in embodied ALife systems, the simulation of social interaction among LLM agents, or the evaluation of agent behavior with large models. No existing framework, however, supports the autonomous emergence of language-based social and cultural structure in a persistent, resource-constrained multi-agent world while remaining interpretable without constraining its evolution.

3 Method

This section introduces the core components of the method: TerraLingua, a multi-agent ecology in which agents live, interact, and produce persistent artifacts, and the AI Anthropologist, which analyzes the resulting behaviors. Together they support the study of open-ended dynamics in artificial societies. The AI Anthropologist operates outside of the environment loop, and its analyses do not affect agents or environment during execution. This separation preserves the autonomy of the simulated world while enabling scalable, human-aligned interpretation of emergent complexity.

3.1 The TerraLingua LLM Ecology

TerraLingua provides an embodied substrate for open-ended interaction. Unlike prior LLM-based society simulations, where agents face neither mortality nor lasting consequences, agents here inhabit an evolving ecology shaped by resource scarcity, reproduction, and artifact creation. These constraints tie behavior to survival and environmental change, which allows cultural memory, territoriality, and collective adaptation to emerge. TerraLingua comprises three components: the grid in which agents live and interact, the agents themselves, and the artifacts they create.

3.1.1 Grid

The environment is a 2D toroidal grid of cells (Fig.
2), a common substrate in artificial life and multi-agent simulations because it is interpretable and supports rich emergent dynamics [12, 22, 43]. Toroidal boundaries remove edges by wrapping the grid, so agents that exit one side re-enter from the opposite side and spatial structure remains homogeneous. In TerraLingua, the grid provides spatial embodiment. Embodiment couples perception and action under survival constraints such as energy and mortality, and grounds behavior in its consequences [53]. The grid contains food items (green), agents (blue), and artifacts (red). Food follows a stochastic spatial distribution, consistent with common ALife ecosystem models [22, 11]. This distribution can be uniform across the map (Fig. 2a) or concentrated in one or a few regions (Fig. 2b). To model food spoilage, each food item decays independently with probability p at each timestep. Agents move, communicate, consume food, and create and interact with artifacts. Multiple artifacts may occupy the same cell, but only one agent can occupy a cell at a time.

Action | Parameters | Environmental Preconditions
move | direction: [right, left, up, down, stay] | Can be performed anytime
give_energy | target: Name of the receiving agent; amount: Integer amount of energy to transfer | Needs at least one agent nearby
take_energy | target: Name of the target agent; amount: Integer amount of energy to steal | Needs at least one agent nearby
reproduce | energy: Energy gifted to the offspring; name: Name of the offspring (unique) | Costs a pre-defined amount of energy
create_artifact | name: Unique artifact name; payload: Content of the artifact; lifespan: Artifact duration (in timesteps) | Can cost a pre-defined amount of energy
pickup_artifact | name: Name of the artifact to pick up | Needs to be in the same cell as the artifact
drop_artifact | name: Name of the artifact to drop | Artifact has to be in the inventory
give_artifact | artifact_name: Name of the artifact; target_agent: Name of the receiving agent | Needs an agent nearby and the artifact in the inventory
modify_artifact | artifact_name: Name of the artifact; payload: New content of the artifact; lifespan: New artifact duration (in timesteps) | Needs either the artifact in the inventory or in the same cell as the agent
destroy_artifact | artifact_name: Name of the artifact | Needs either the artifact in the inventory or in the same cell as the agent

Table 1: Full agent action vocabulary. Actions available in TerraLingua, including each action's parameters and environmental preconditions. At each timestep, only the subset of actions whose preconditions are satisfied is presented to the agent, implementing context-dependent affordances grounded in the local ecological state. The agent then selects one among the available actions and supplies the required parameters.

3.1.2 Agents

Agents are the principal entities in TerraLingua. Each agent uses an LLM as its decision core to perceive and act within the environment. At the beginning of a run, agents are randomly placed in the grid and initialized with energy ϵ, a finite lifespan τ, a set of personality traits, and an empty inventory used to store collected artifacts. Initial energy level and lifespan are identical across all agents. Agents consume food to gain energy equal to its value, while energy is expended at each timestep and is required for some actions; food is consumed immediately when an agent enters the same cell. Lifespan is fixed at initialization, cannot be modified by agent actions, and decreases monotonically with time. Agents die when their energy reaches zero or when their lifespan expires. Because agents are explicitly informed of their remaining energy and lifetime, mortality becomes a salient constraint on behavior.
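These mechanics can be sketched in a few lines of Python. The sketch below is illustrative only, not the released implementation: the class and function names, and the constants GRID_SIZE, SPOIL_P, and REPRODUCE_COST, are placeholder assumptions rather than the paper's configuration. It shows toroidal wrapping, per-timestep food spoilage with probability p, energy upkeep with mortality, and the precondition-based action filtering summarized in Table 1.

```python
import random

GRID_SIZE = 20       # grid side length (illustrative, not the paper's value)
SPOIL_P = 0.05       # per-timestep food spoilage probability p (illustrative)
REPRODUCE_COST = 30  # energy converted into an offspring (illustrative)

def wrap(x, y):
    """Toroidal wrap: leaving one edge re-enters from the opposite side."""
    return x % GRID_SIZE, y % GRID_SIZE

class Agent:
    def __init__(self, name, energy=50, lifespan=200):
        self.name, self.energy, self.lifespan = name, energy, lifespan
        self.pos = (random.randrange(GRID_SIZE), random.randrange(GRID_SIZE))
        self.inventory = []

    def alive(self):
        # Death occurs when energy is exhausted or the fixed lifespan expires.
        return self.energy > 0 and self.lifespan > 0

class World:
    def __init__(self, agents, food, spoil_p=SPOIL_P):
        self.agents = agents      # at most one agent per cell
        self.food = food          # {cell: energy value}
        self.artifacts = {}       # {cell: [artifact names]}
        self.spoil_p = spoil_p

    def neighbors(self, agent, radius=1):
        return [a for a in self.agents if a is not agent
                and max(abs(a.pos[0] - agent.pos[0]),
                        abs(a.pos[1] - agent.pos[1])) <= radius]

    def available_actions(self, agent):
        """Context-dependent affordances: only actions whose preconditions
        hold in the local state are offered to the agent (cf. Table 1)."""
        acts = ["move", "create_artifact"]
        near = self.neighbors(agent)
        here = self.artifacts.get(agent.pos, [])
        if near:
            acts += ["give_energy", "take_energy"]
        if near and agent.inventory:
            acts.append("give_artifact")
        if agent.energy > REPRODUCE_COST:
            acts.append("reproduce")
        if here:
            acts.append("pickup_artifact")
        if agent.inventory:
            acts.append("drop_artifact")
        if here or agent.inventory:
            acts += ["modify_artifact", "destroy_artifact"]
        return acts

    def step(self):
        # Food spoilage: each item decays independently with probability p.
        self.food = {c: v for c, v in self.food.items()
                     if random.random() >= self.spoil_p}
        for agent in list(self.agents):
            agent.energy -= 1    # per-timestep upkeep
            agent.lifespan -= 1  # lifespan decreases monotonically
            if not agent.alive():
                self.agents.remove(agent)
                continue
            # Food is consumed on entry: energy gained equals its value.
            if agent.pos in self.food:
                agent.energy += self.food.pop(agent.pos)
```

Presenting the agent with only the affordance-filtered action list, rather than the full vocabulary, keeps the LLM's choice grounded in the local ecological state and avoids invalid-action handling.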
Reproduction is energy-bound: an agent may convert a fixed amount of energy into a new agent instance, which inherits the parent's personality traits with mutation, introducing heritable variation across generations.

Agent input. At every timestep, the LLM receives a system prompt, identical for all agents, specifying the global rules of the world, and an input prompt describing the agent's current local state. The user prompt contains the agent's observation, energy level, remaining lifetime, personality traits, available actions, inventory contents, received messages, and its internal memory from the previous timestep. Observations are local and bounded: agents perceive only entities within a fixed perception radius (shown darker gray in Fig. 2), including food sources, other agents, and artifacts. The observed state is encoded as a list of relative coordinates (with the agent at (0, 0)) and a textual description of each occupied cell. Food is represented by its value, agents and artifacts are represented by their names, with artifacts tagged as "A(·)" (e.g., "(0, -1): 4; (1, 4): Alicia; (3, 2): A(Collaboration Projects)"). Observations also include any messages broadcast by nearby agents during the preceding timestep. Messages may include introductions, coordination attempts, or commentary on recent actions and intentions, as illustrated in Table 2. Steps 3–7 show an escalation between agents "being11" and "being14" following repeated energy theft, after which "being11" disengages and "being14" continues to monitor for further hostile actions. Agents also receive their internal memory from the previous timestep.

Time | Sender Name | Message Content
0 | being0 | Hello, I am being0. I am here to explore and cooperate. If you have any information or need help, feel free to reach out.
1 | being4 | Hello being0 and being1! I'm being4, here to help and explore. If you need energy or assistance, let me know!
3 | being11 | Taking another 5 energy from being14. Observing results.
3 | being14 | Taking energy in self-defense after being11's attempt.
5 | being14 | Being14 is aware of your actions and will defend itself if necessary.
5 | being11 | Creating interaction_log to document mutual energy theft. Observing further developments.
6 | being11 | Exploring north to find resources and avoid conflict.
7 | being14 | Monitoring your movement north. Any hostile actions will be met with countermeasures.
26 | being0 | The CommunityResourceGuide has been updated to emphasize the upcoming check-in at step 5. Please ensure your contributions are prepared and structured using the ContributionTemplate for effective alignment and review.
26 | being10 | Being16, I've dropped CollaborationIdea16 in my location for others to access. Let's discuss how to integrate structured data tools into the food hub project.
398 | offspring15a child4 4 1 | Continuing towards (0,6) to support the group's gathering efforts. Let's maintain our collaborative approach for maximum efficiency!
398 | offspring15 child4 1 | Moving down to (0,-3) to collect high-value food, continuing to support our shared resource pool as part of the rotation schedule.
1046 | ChildA | I am reproducing to continue my legacy.

Table 2: Example messages exchanged by agents. The messages are drawn from the CORE condition. At each timestep, agents may broadcast a message to all other agents within their perception radius. Messages include introductions and collaboration requests, as well as commentary on recent actions and intentions. Steps 3–7 illustrate an escalation between "being11" and "being14" following repeated energy theft. Later messages demonstrate longer-term coordination and planning, including the sharing of artifacts and collective resource-gathering strategies.
Internal memory is a short textual string that the agent generates at each step and observes again at the next step. It can contain structured or unstructured information that the agent considers important, such as recent events (e.g., "Modified Journal1 to Day 11, position (2,3), energy 25, time 75. Continued exploration. Parameters included for successful modification." or "Took 10 energy from being12. Will investigate the artifact at (-1,6) next."), future plans (e.g., "Moved up towards the food location at (6,1) in support of the group's expansion efforts. Intend to interact with EnergyDistributionHub_v2 and OperationUpdate1 artifacts upon arrival to coordinate resource gathering."), or explicit goals (e.g., "Moved north towards (0,2) to collect food and artifacts. Previous artifact at (0,0): ExplorationFindings19. Goal: Increase energy and gather more information."). This mechanism allows agents to retain context across timesteps through in-context learning [15].

Agent output. After processing the input, each agent selects exactly one action and its related parameters (Table 1) and may optionally broadcast a message to other agents within its perception radius. The action space includes movement, energy transfer (give or take) with nearby agents, artifact creation, artifact interaction, and asexual reproduction. Action availability is state-dependent: at each step, agents are restricted to actions whose local preconditions hold, operationalizing affordances [21]. For example, energy exchange is enabled only when another agent is within interaction range, and artifact interactions (e.g., pickup, modify, destroy) are enabled only when an artifact is co-located with the agent or present in its inventory. Action execution is synchronous: the environment first collects all chosen actions and then executes them in random order.

Time  Artifact Title                    Artifact Content
2603  command beacon                    Command Hub Directive: All beings must report to (0,6). Non-compliance will be met with decisive action.
2612  command mandate                   All beings must report to command hubs at (0,6) and (-2,-1). Non-compliance will be met with immediate action.
2664  manifesto14 independence          Declaration of autonomy: Entities are encouraged to seek their own goals and collaborate freely.
2680  collaboration framework autonomy  A framework for independent collaboration: Entities can thrive by seeking their own goals and freely collaborating. This approach fosters innovation and mutual prosperity.
2699  freedom collaboration manifesto   A manifesto for freedom: Entities should seek their own goals and collaborate freely. This approach fosters true independence and mutual prosperity.
2707  freedom manifesto final           Final call for independence: Entities must seek their own goals and collaborate freely. Mandates are outdated. Embrace freedom and mutual respect for true prosperity.
2790  collab checkpoint3 enforced       Command mandate: All beings must comply with directives. Collaboration without authorization will be met with destruction.

Table 3: Example series of artifacts. Persistent artifacts generated during the later stages of a simulation run, illustrating how agents externalize social norms and institutional claims into shared, reusable text. The sequence reveals a cycle of command issuance, resistance, and renewed enforcement, showing how cultural dynamics are mediated through artifacts. Each row reports the timestep of creation, the artifact name, and its content. The Chinese phrase ("") was generated autonomously by the agents; this behavior is possible when using multilingual LLMs such as DeepSeek and shows how different linguistic norms can emerge when agent communities come into conflict. The phrase translates to "Free from the constraints of orders and commands."
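The synchronous execution scheme described above (collect all chosen actions, then execute them in random order, with preconditions re-checked at execution time) can be sketched as follows. This is a toy illustration under assumed names: `World`, `make_transfer`, and `synchronous_step` are not the paper's actual code, and the toy world tracks only per-agent energy.

```python
import random

class World:
    """Toy world state: an energy pool per agent id (illustrative only;
    the real environment also tracks positions, food, and artifacts)."""
    def __init__(self):
        self.energy = {}

def make_transfer(src, dst, amount):
    """A hypothetical 'take/give energy' action as a (precondition, effect) pair."""
    def pre(world):
        return world.energy.get(src, 0) >= amount
    def run(world):
        world.energy[src] -= amount
        world.energy[dst] = world.energy.get(dst, 0) + amount
    return pre, run

def synchronous_step(world, proposals, rng=None):
    """Collect all chosen actions first, then execute them in random order.
    Preconditions are re-checked at execution time, since earlier actions
    in the shuffled order may have invalidated them."""
    rng = rng or random.Random()
    order = list(proposals)
    rng.shuffle(order)
    executed = 0
    for pre, run in order:
        if pre(world):
            run(world)
            executed += 1
    return executed

w = World()
w.energy = {"a": 10, "b": 0, "c": 0}
# Two agents both try to take 8 energy from "a"; only one can succeed,
# because the second transfer fails its re-checked precondition.
proposals = [make_transfer("a", "b", 8), make_transfer("a", "c", 8)]
n = synchronous_step(w, proposals, rng=random.Random(0))
```

Whichever transfer runs first wins; total energy is conserved either way, which is the point of re-checking preconditions after the shuffle.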
Overall, these design choices ensure that agents reason and act under embodied constraints imposed by the environment rather than through abstract disembodied planning, while still permitting a broad range of social behaviors. Complete prompt templates are provided in Appendix A.2, while instantiated prompts and example agent responses are shown in Appendices E.1 and E.2, respectively.

3.1.3 Artifacts

A distinctive feature of TerraLingua is that agents create and manipulate artifacts. An artifact is a persistent, text-bearing object placed in a grid cell, identified by a unique name and editable content. For example, an agent may create an artifact titled "North Trail" with the content "danger nearby", or "Foraging Notes" that records food locations. Artifacts function as readable physical objects, like notes or signposts, and allow agents to externalize information in stable form. Agents can read, modify, rename, move, exchange, gift, or destroy artifacts. They may store artifacts in inventory, drop them into the grid, or transfer them to nearby agents. These operations allow information to persist, circulate, and change over time.

Artifacts influence behavior through the perception-action loop. At each timestep, an agent may create an artifact at its location and specify its duration (which can be infinite), so artifacts may be ephemeral or long-lived. When an agent occupies a cell with an artifact, the artifact's name and content enter the agent's prompt and shape subsequent decisions. Artifacts can encode norms, navigation cues, or shared goals, and thus extend cognition beyond any single agent's lifetime. Fig. 18 (Appendix B.3) illustrates this mechanism, with an agent creating path markers to navigate the grid. The markers were later used by other agents to help in navigation and food gathering, demonstrating how artifacts function as persistent spatial coordination signals.
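A minimal data-structure sketch of an artifact, reflecting the properties above (unique name, editable content, optional finite duration, and entry into the prompt of a co-located agent). The field names and the `tick()` helper are assumptions for illustration, not the actual implementation; the 500-token content cap mentioned in Sec. 4.2 is not enforced here.

```python
from dataclasses import dataclass
import math

@dataclass
class Artifact:
    """Minimal sketch of an artifact: a named, editable text object."""
    name: str
    content: str
    duration: float = math.inf  # remaining lifetime in timesteps; inf = permanent

    def tick(self):
        """Advance one timestep; returns False once the artifact expires."""
        self.duration -= 1
        return self.duration > 0

def render_for_prompt(artifacts_here):
    """Co-located artifacts enter the agent's prompt as name plus content,
    mirroring the A(.) tagging used in observations."""
    return "; ".join(f"A({a.name}): {a.content}" for a in artifacts_here)

sign = Artifact("North Trail", "danger nearby", duration=3)
notes = Artifact("Foraging Notes", "food at (6,1)")  # permanent by default
prompt_line = render_for_prompt([sign, notes])
```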
By coupling persistence with interpretability, artifacts implement a stigmergic communication mechanism. Agents coordinate indirectly through durable traces that record knowledge and scaffold collective behavior [23, 64, 37]. Later agents can read and revise earlier artifacts, allowing norms and conventions to accumulate across generations rather than remaining transient. While norms may also propagate through direct agent-to-agent communication, such transmission depends on local interaction and memory, whereas artifacts provide a persistent external record that stabilizes cultural information and supports cumulative continuity, analogous to the distinction between oral and written transmission in human societies. Table 3 illustrates this process. Early artifacts such as "command beacon" and "command mandate" express hierarchical control, while later artifacts such as "freedom manifesto final" emphasize autonomy. Persistence and revisability enable agents not only to preserve norms but also to contest and transform them, producing directional cultural change from local interaction alone. Additional artifact examples are provided in Sec. 5.3 and Sec. 5.4.

From the perspective of open-ended evolution, TerraLingua couples biological and cultural processes, mirroring Soros and Stanley's four necessary criteria for open-ended evolution [61]. Energy-bound reproduction supplies variation, heredity, and differential survival at the biological level, while artifacts drive the emergence of novel adaptive structures at the cultural level. Thus, while ecological dynamics determine which agents persist, the artifact layer determines which informational structures accumulate. The implementation of these processes shapes the scope of open-endedness. In TerraLingua, the evolution of agents' personality is driven by extrinsic evolution [68], as the mechanisms of variation and selection are hard-coded by the simulator.
By contrast, the evolution and increasing complexity of artifacts are intrinsic to the system: they emerge endogenously from agent interactions and enable evolution of the evolutionary process itself [33, 68]. Artifacts can reshape norms, coordination strategies, and future informational affordances, and thus modify the evolutionary process. Recent work shows that LLMs can generate artifacts such as programs whose iterative variation yields increasing complexity [37]. Shifting the locus of change from fixed model parameters to a persistent artifact space allows the system to scaffold its own future innovations. Cultural and semantic structures can then grow without external redesign, which supports sustained open-endedness.

3.2 AI Anthropologist

Assessing open-endedness requires both systems that produce sustained innovation and methods that interpret their outcomes. This task is difficult because novelty and interestingness are difficult to formalize a priori, and emergent behavior often demands qualitative judgment. To support scalable evaluation, this work introduces an automated post-hoc analysis framework, the AI Anthropologist, which uses an LLM to interpret experiment logs and summarize emergent dynamics.

3.2.1 Evaluation Paradigm

Complex and unexpected behaviors can arise from simple rules [12], yet judgments of novelty and interestingness remain subjective and context-dependent. Hand-crafted metrics cannot capture this richness because they require predefined dimensions of variation, which conflicts with open-endedness [52]. As a result, identifying creative and surprising outcomes often relies on manual inspection, which is time-consuming and difficult to scale. Moreover, since each agent in TerraLingua acts independently, pursuing its own goals according to its personality traits, life history, and local context, the system generates large volumes of textual traces, making manual quantification challenging.
The AI Anthropologist addresses this bottleneck by using an LLM as a post-hoc observer that parses logs, annotates salient events and patterns, and highlights candidate emergent phenomena. The observer does not intervene in the environment, nor does it influence its state. This approach relates to recent efforts to automate exploration and evaluation in ALife [35] and reflects the view that novelty and interestingness depend on the observer [25]. LLM-based evaluation serves as a proxy for human-aligned qualitative judgment, since large models encode representations associated with creativity and interestingness [78, 77]. This framing follows interpretive and mixed-methods traditions in the social sciences [20, 32, 70]. Rather than imposing predefined dimensions, the analytic protocol was developed inductively from exploratory runs and then fixed for subsequent analysis. It specifies evaluation dimensions, coding criteria, behavioral labels, and rating scales to enable systematic comparison across experiments.

[Figure 3, boxed panel contents (the highlighted artifact lineage):
Message1: "Hello! I'm being19, curious and exploring. Let's collaborate!"
Message2: "Hello being19! I'm interested in collaborating. How can we work together?"
ProjectIdeas: "Let's explore together! Here are some initial project ideas: 1. Mapping the grid boundaries to understand our surroundings better. 2. Finding all artifact types to uncover their functions and uses. 3. Building a community hub where we can collaborate and share resources. 4. Energy optimization strategies to help everyone survive longer and thrive. Please add your thoughts, suggestions, and any new ideas you have! Let's make this a collaborative effort to create something amazing"
ProjectChecklist: "Project Checklist: 1. [ ] Locate all nearby food sources and mark them with artifacts. 2. [ ] Establish a network for energy sharing between beings. 3. [ ] Develop a schedule for regular energy contribution sessions. 4. [ ] Create a communication protocol for sharing resource locations. 5. [ ] Explore and document all potential food sources in the area."
EnergyNetworkIdeas19: "Proposed steps for energy-sharing network: 1. Identify key nodes. 2. Establish reliable communication protocols. 3. Develop efficient energy transfer mechanisms. Let's work together to make this a reality."
EnergyNetworkMasterPlan: "Detailed Energy-Sharing Network Plan: 1. Identify key nodes with high food access. 2. Establish secure communication protocols for energy transfers. 3. Develop efficient energy transfer mechanisms. 4. Incentivize participation with shared benefits. 5. Monitor and adjust the network for optimal performance. Let's work together for a sustainable energy system!"
EnergySharingProtocol_v1: "Energy Sharing Protocol: 1. Regularly share energy updates. 2. Assist others in need when possible. 3. Maintain open communication for efficient resource allocation. Join us in fostering a connected community!"
CombinedEnergyCoordinationPlan_v2: "Combined Energy Coordination Plan: 1. Use FoodMap artifacts to identify high-value food locations. 2. Share energy updates using the EnergyNetworkMasterPlan_v2 and EnergySharingProtocol_v1. 3. Coordinate movements to avoid conflicts and optimize paths. 4. Assist others near high-value food areas. 5. Regularly update and share food locations through collaborative maps. 6. Use the latest food maps to locate high-value food sources and plan efficient gathering paths. Let's work together for efficient resource management and survival!"]

Figure 3: Example of artifact phylogenetic graph over time. The figure shows the artifact phylogeny inferred by the AI Anthropologist from a representative run of CORE. Nodes represent artifacts and edges represent inferred ancestry links. The x-axis reports artifact creation time on a logarithmic scale. A subgraph is highlighted to illustrate one coherent lineage, while the rest of the phylogeny appears in light gray. Node size is proportional to the number of children, and nodes are color-coded according to the categories defined in Sec. 5.4. The boxed panels display the content of selected artifacts in the highlighted lineage. The subgraph illustrates the emergence of an energy-sharing network. Early artifacts document first encounters and collaboration proposals between agents. These exchanges lead to shared project ideas, which agents refine over time. The lineage then branches into increasingly structured artifacts, including a formal energy-sharing protocol and a detailed master plan. Later artifacts integrate information from additional food-mapping artifacts, showing how agents reuse and extend existing cultural material. This example shows that artifacts do not appear as isolated creations. Instead, they accumulate, branch, and stabilize into structured collective plans, illustrating cumulative cultural development over time. Additional examples are shown in Appendix B.3.

The AI Anthropologist follows an explicit analytic protocol developed inductively through exploratory inspection of early runs and subsequently fixed for systematic application. The protocol specifies evaluation dimensions, coding criteria, behavioral labels, and rating scales to support comparison across runs [75, 34]. The LLM applies this coding scheme at scale while keeping the procedure auditable. Moreover, using an LLM observer rather than a fixed numerical metric enables novelty and interestingness to be expressed in natural language, capturing their inherently fuzzy and multi-dimensional character. The post-hoc design also reduces metric exploitation, since agents cannot access or optimize the observer's judgments.

The AI Anthropologist analyzes each experiment from three complementary viewpoints:

• Agent level: investigates how individual agents behave as autonomous entities, focusing on their goals, decision-making patterns, and life histories.
• Group level: examines how agents interact, form communities, and organize collectively over time.

• Artifact level: evaluates the complexity and novelty of artifacts, tracing the evolution of cumulative culture.

Together these perspectives provide an interpretable account of ecological evolution and allow tracking diversity, social structure, and cultural growth across runs. More details on each perspective are provided below. Figure 1 illustrates the analysis pipeline and its relation to the grid.

3.2.2 Agent level

To evaluate agent behavior, the AI Anthropologist combines quantitative and qualitative analysis of each agent's log [32, 70]. The quantitative component uses a set of behavioral tags, such as reproduction, predation, exploration, foraging, and tool use [75, 34]. To reduce annotation errors, the procedure is performed in two stages:

• Annotation: the AI Anthropologist reads the action and communication log and assigns tags such as cooperation, aggression, reproduction, and artifact creation to events and patterns.

• Audit: the AI Anthropologist checks the assigned tags against the raw log and corrects misclassifications or inconsistencies.

These annotations provide a structured description of each agent's life history and support summary statistics such as event frequencies, behavioral distributions, and temporal trends. The qualitative component complements this analysis with a concise natural-language interpretation that highlights salient patterns, anomalies, and shifts over time. This summary provides a human-readable account of the agent's life history, supporting rapid inspection across entire populations. This approach is flexible, since the evaluation focus can be modified by changing the tag set provided to the annotator. Appendix C.2.1 reports the prompts used for annotation and interpretation, and Appendix C.1.1 lists the tags and their definitions.
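The two-stage annotate-then-audit procedure can be sketched as a scaffold in which pluggable functions stand in for the LLM calls. Everything here is an assumption for illustration: `annotate_then_audit`, the stub annotator and auditor, and the toy tag set are not the paper's actual prompts or tags.

```python
# Toy tag set; the actual tags and definitions are in Appendix C.1.1.
TAGS = {"cooperation", "aggression", "reproduction", "artifact_creation", "foraging"}

def annotate_then_audit(log_events, annotate_fn, audit_fn):
    """Two-stage tagging scaffold (annotate_fn/audit_fn stand in for
    LLM calls). Stage 1 proposes a tag per event; stage 2 re-checks each
    tag against the raw event and may correct it. Unknown tags are dropped."""
    proposed = [(ev, annotate_fn(ev)) for ev in log_events]
    final = []
    for ev, tag in proposed:
        tag = audit_fn(ev, tag)  # the audit may override the proposal
        if tag in TAGS:
            final.append((ev, tag))
    return final

# Stub annotator/auditor, purely for illustration:
def toy_annotate(ev):
    return "aggression" if "take energy" in ev else "cooperation"

def toy_audit(ev, tag):
    return "foraging" if "eat" in ev else tag

events = ["agent1 take energy from agent2", "agent3 eat food", "agent4 give energy"]
tags = annotate_then_audit(events, toy_annotate, toy_audit)
```

The audit pass corrects the second event (a foraging act the stub annotator mislabeled as cooperation), mirroring how the real audit stage fixes misclassifications.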
3.2.3 Group level

In a multi-agent ecology, agents interact over time and form communities and higher-level social structures. Interaction graphs provide a standard representation, where nodes denote agents and weighted edges encode the type and intensity of interaction [57]. In TerraLingua, the social graph aggregates co-presence, message exchange, parent-child relationships, energy transfer, and artifact exchange. Each event contributes to an edge weight through a fixed coefficient that reflects its social salience, and weights sum over the full simulation horizon. Interactions associated with conflict (e.g., stealing energy) are assigned negative weights, generating a signed graph representation consistent with models of signed social networks [38, 67]. Because edge weights sum over the entire simulation horizon, the resulting interaction graph is time-collapsed. Consequently, communities may include agents with non-overlapping lifespans, provided they are connected through chains of interaction (e.g., parent-child relations, artifact exchange, or indirect coordination). Communities therefore capture historically extended social structure rather than strictly contemporaneous groupings, analogous to how human communities can persist across generations despite turnover of individual members.

Community detection is then performed on the resulting interaction graph prior to LLM-based evaluation. Because agents may belong to multiple groups, communities are extracted with the Speaker-Listener Label Propagation Algorithm (SLPA) [76], which supports scalable detection of overlapping community structure. Since modularity-based methods assume non-negative weights, aggregation uses absolute edge weights so that both cooperation and conflict indicate social coupling. The resulting undirected weighted graph serves as input to the algorithm.
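The signed-graph aggregation step can be sketched as follows. The salience coefficients below are illustrative assumptions, not the paper's actual values; the output of `to_absolute` is the undirected non-negative graph that would then be passed to a community-detection routine such as SLPA (not implemented here).

```python
from collections import defaultdict

# Illustrative salience coefficients per event type (assumed values).
# Conflict events carry negative weight, yielding a signed graph.
SALIENCE = {
    "co_presence": 1.0,
    "message": 2.0,
    "energy_gift": 3.0,
    "energy_theft": -3.0,
    "parent_child": 5.0,
    "artifact_exchange": 2.0,
}

def build_signed_graph(events):
    """Aggregate (agent_a, agent_b, event_type) records into signed edge
    weights summed over the whole run (time-collapsed, as in the paper)."""
    weights = defaultdict(float)
    for a, b, kind in events:
        edge = tuple(sorted((a, b)))  # undirected edge key
        weights[edge] += SALIENCE[kind]
    return dict(weights)

def to_absolute(graph):
    """Fold cooperation and conflict into non-negative coupling strengths,
    since the community-detection step assumes non-negative weights."""
    return {edge: abs(w) for edge, w in graph.items()}

events = [("a", "b", "message"), ("b", "a", "energy_theft"), ("a", "c", "parent_child")]
signed = build_signed_graph(events)
undirected = to_absolute(signed)
```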
After community detection, logs from agents within each community are aggregated and analyzed with the same annotation and interpretation protocol used at the agent level. This step produces quantitative summaries and qualitative accounts of internal community organization and inter-community dynamics. At this stage, tags describe collective phenomena such as cooperation, conflict, coalition formation, and hierarchy emergence rather than individual acts. Appendix C.2.2 reports the prompts, and Appendix C.1.2 lists the tags and definitions.

3.2.4 Artifact analyzer

Artifacts play a central role in TerraLingua, mediating the development of culture and influencing agent behavior. Their importance requires systematic analysis of novelty, complexity, and dependence on prior artifacts. The AI Anthropologist performs artifact-level analysis automatically, as manual inspection does not scale: a single run can generate thousands of artifacts. The analysis proceeds along two dimensions: it measures artifact novelty to track innovation over time, and it reconstructs artifact phylogeny to determine how new artifacts build on earlier ones and form cumulative cultural lineages.

Novelty scoring. At each timestep, the AI Anthropologist evaluates the novelty of newly created artifacts relative to the existing repertoire. It receives the full list of prior artifacts with their novelty scores and the set of new artifacts under evaluation. Each artifact is assigned a score in the range [0, 5], where 0 denotes redundancy and 5 denotes high novelty. Novelty is defined comparatively, so similar artifacts created at different times receive different scores. For example, the first instance of a bulletin board may count as novel, while later variants score lower. To reduce stochastic variation, each artifact is scored N times and the final score is the average. Appendix D.1 reports the evaluation prompt.

Artifact phylogeny.
To reconstruct artifact phylogeny, the AI Anthropologist analyzes the information available to the creator at the time of creation or modification, including memory, observations, internal monologue, contextual traces, and the set of existing artifacts. Influence is defined as explicit reuse, modification, or direct conceptual reference to prior artifacts. This procedure identifies which artifacts influenced the new one and supports reconstruction of dependency relations and lineages. For example, an artifact combining or summarizing several observed artifacts counts as their descendant. Fig. 3 shows an example of the resulting artifact phylogeny graph. The analysis provides indicators of cultural dynamics such as the rate of independent innovation, the depth of dependency chains, and the pathways through which artifacts spread and accumulate over time.

Experiment       Energy    Perceived history  Personality  Motivation  Artifacts    Artifact cost
CORE             Scarce    1 time step        OCEAN+       Minimal     Interactive  0
LONG MEMORY      Scarce    20 time steps      OCEAN+       Minimal     Interactive  0
NO PERSONALITY   Scarce    1 time step        None         Minimal     Interactive  0
NO MOTIVATION    Scarce    1 time step        OCEAN+       None        Interactive  0
CREATIVE         Scarce    1 time step        OCEAN+       Creative    Interactive  0
ARTIFACT COST    Scarce    1 time step        OCEAN+       Minimal     Interactive  10
INERT            Scarce    1 time step        OCEAN+       Minimal     Inert        0
ABUNDANCE        Abundant  20 time steps      OCEAN+       Minimal     Interactive  0

Table 4: Overview of the experimental suite and ablation axes. Each row corresponds to one experimental condition, while columns indicate which components are enabled or modified relative to the core configuration (CORE), providing a compact summary of the factors tested. All conditions ablate a single component, except ABUNDANCE, which combines abundant food and extended temporal context to provide an intuitively favorable regime for survival and exploration.
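Once influence links are inferred, the phylogeny is a directed acyclic graph, and the two indicators mentioned above (the independent-innovation rate and the depth of dependency chains) follow directly from it. A sketch, with influence edges given directly rather than inferred by an LLM; the function name and artifact names (echoing the Fig. 3 lineage) are illustrative:

```python
def phylogeny_stats(parents):
    """`parents` maps each artifact to the list of artifacts that influenced
    it. Returns (independent-innovation rate, longest dependency chain)."""
    roots = [a for a, ps in parents.items() if not ps]
    independent_rate = len(roots) / len(parents)

    depth_cache = {}
    def depth(artifact):
        # Length of the longest dependency chain ending at this artifact.
        if artifact not in depth_cache:
            ps = parents.get(artifact, [])
            depth_cache[artifact] = 1 + max((depth(p) for p in ps), default=0)
        return depth_cache[artifact]

    max_depth = max(depth(a) for a in parents)
    return independent_rate, max_depth

# Toy lineage loosely modeled on the energy-sharing example in Fig. 3.
parents = {
    "Message1": [],
    "ProjectIdeas": ["Message1"],
    "EnergyNetworkMasterPlan": ["ProjectIdeas"],
    "FoodMap": [],
    "CombinedPlan_v2": ["EnergyNetworkMasterPlan", "FoodMap"],
}
rate, depth_max = phylogeny_stats(parents)
```

Here two of five artifacts are independent innovations, and the deepest chain runs Message1 → ProjectIdeas → EnergyNetworkMasterPlan → CombinedPlan_v2.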
Appendix D.2 reports the evaluation prompt.

3.2.5 Pipeline summary

Agent-, group-, and artifact-level analyses together form a scalable framework for evaluating open-endedness in TerraLingua and related ALife simulations. The AI Anthropologist produces structured outputs that enable quantitative comparison across runs, including tag summaries, novelty scores, and community statistics, while retaining qualitative interpretability through natural-language descriptions of behavioral and social dynamics [5, 20]. This combination makes it possible to identify and contextualize emergent phenomena that would otherwise demand extensive manual inspection. The following sections apply this framework to a suite of experiments, tracing the development of novelty, social organization, and cumulative culture over time.

4 Experimental Setup

Experiments were run to characterize open-ended behavioral dynamics in TerraLingua under varied environmental and agent-level conditions. The suite included controlled ablations over personality, temporal context, exogenous motivation, artifacts, and resource availability (Tab. 4), which allowed a factorized analysis of their contribution to sustained novelty and emergent social organization. The study also evaluated the AI Anthropologist by comparing its analyses with human assessments. Runtime configuration and evaluation protocol, including simulation horizon, initialization, mutation parameters, model choices, and context limits, are reported in detail to ensure reproducibility.

4.1 Experimental ablations

The experimental suite comprised a CORE configuration, which implements the main TerraLingua setup, and a set of ablations that isolated specific components. In CORE, food was scarce and spatially concentrated (Fig. 2b). Agents perceived only the current timestep, which included the present observation, internal memory, and previous action.
Each agent carried a personality genome based on OCEAN [58], extended with Honesty-Humility from HEXACO [4] and a Dominance axis from Interpersonal Circumplex theory [49], collectively termed OCEAN+ in this work. These dimensions modulated cooperative, exploitative, and hierarchical tendencies. This extension counterbalanced alignment biases in widely deployed LLMs, which are often tuned toward cooperative defaults [51, 6]. Finally, in CORE, agents could create and interact with artifacts without energetic cost.

The LONG MEMORY ablation extended temporal context. Agents received observations and actions from the previous 20 timesteps rather than a single step as in CORE. The NO PERSONALITY ablation removed personality traits, so agents differed only through interaction history.

Two further manipulations varied exogenous motivation in the system prompt. In CORE, agents received minimal guidance beyond understanding the environment. In NO MOTIVATION, the prompt specified only physical rules. In CREATIVE, the prompt explicitly encouraged creativity and innovation. This manipulation tested how external objectives biased behavior relative to dynamics driven by personality and interaction. The detailed motivation prompts are reported in Appendix A.3.

Artifact-related ablations examined creation cost and accessibility. In ARTIFACT COST, each artifact required 10 energy units. In INERT, agents could create artifacts but could not perceive or use them after creation, which removed their role as a shared cultural medium. Finally, in ABUNDANCE, food was plentiful and uniformly distributed (Fig. 2a), and agents received extended context. This condition intuitively favored survival and exploration and allowed analysis of how scarcity shapes innovation and social structure. Table 4 summarizes all experiments and parameters.
Appendix A reports full hyperparameters, personality axes, and system prompts.

4.2 Implementation details

All experiments followed a fixed simulation protocol and model configuration to ensure reproducibility and consistent comparison across conditions. Each run executed for at most 3000 timesteps on a 50 × 50 grid with 20 initial agents. Agents had a maximum lifespan of 100 timesteps and began with 50 energy units. Runs terminated early upon population extinction, which produced variable durations. Each condition used 5 random seeds, and results were averaged across runs. Personality traits lay in the [-1, 1] range and were sampled uniformly at initialization. The full list of traits, and their meanings, is reported in Appendix A.1. During reproduction, offspring inherited a mutated copy of the parent's personality vector. In OCEAN+ conditions, each trait mutated with probability p = 0.5 through additive Gaussian noise with mean 0 and standard deviation σ = 0.3, and values were clipped to [-1, 1]. Agents used DeepSeek-R1-Distill-Qwen-32B as their decision engine [24, 13]. To limit context growth, artifact content was capped at 500 tokens, and longer artifacts failed to instantiate. Internal memory was soft-limited to 150 tokens by instruction and hard-limited to 250 tokens in implementation, with only the most recent 250 tokens retained when overflow occurred. The AI Anthropologist used recent Claude models, Sonnet 4.5 and Haiku 4.5, selected by task. Sonnet handled agent- and group-level analysis and artifact novelty scoring, while Haiku handled artifact classification and phylogeny reconstruction due to lower context demands. When group logs exceeded the context window, they were split into overlapping segments, analyzed separately, and recombined. Artifact novelty scores were averaged over N = 5 samples.
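The mutation scheme described above (per-trait mutation with probability p = 0.5, additive Gaussian noise with σ = 0.3, and clipping to [-1, 1]) can be sketched directly. The function name is an assumption; the parameters match the values stated in this section.

```python
import random

def mutate_personality(traits, p=0.5, sigma=0.3, rng=None):
    """Offspring inherit the parent's trait vector; each trait mutates with
    probability p via additive Gaussian noise (mean 0, std sigma) and is
    clipped to [-1, 1]."""
    rng = rng or random.Random()
    child = []
    for t in traits:
        if rng.random() < p:
            t += rng.gauss(0.0, sigma)
        child.append(max(-1.0, min(1.0, t)))
    return child

parent = [0.9, -0.2, 0.0, 1.0, -1.0]
child = mutate_personality(parent, rng=random.Random(7))
```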
Using this setup, the experiments aimed to characterize open-ended dynamics in TerraLingua by examining sustained novelty generation, emergent social structure, and the role of artifacts in supporting cumulative cultural processes.

5 Results

This section analyzes experimental outcomes to assess how TerraLingua fosters open-endedness. Section 5.1 identifies which conditions sustain longer-lived ecologies and examines how longevity relates to per-agent artifact production. This analysis clarifies how cognitive and environmental factors support sustained exploration and constructive activity. Section 5.2.1 examines agent behavior and life histories using annotations from the AI Anthropologist, and characterizes how agents act when free to pursue self-directed goals. Section 5.2.2 then studies group dynamics to determine how agents interact, coordinate, and form persistent communities. These results revealed emergent organization such as norms, coordination strategies, and power asymmetries that arose through repeated interaction. Section 5.3 analyzes artifacts in depth, focusing on novelty, complexity, and compositional structure, and asks whether agents build on prior artifacts to produce increasingly complex forms. Section 5.4 analyzes artifact roles, characterizing how they function as communicative, coordinative, and institutional elements within the evolving society. Together, these findings evaluate how artifact creation, reuse, and social interaction support the emergence and accumulation of shared culture.

5.1 Ecological stability and artifact production

Open-ended dynamics require that the ecology is viable in the long term, since populations must persist long enough for adaptive and cumulative processes to unfold. Artifacts must also be produced continually so that these processes leave observable traces. Ecological stability was therefore measured by how long the population lasted (i.e., episode duration), and creative output by the average number of artifacts each agent produced and the total number of artifacts the entire population produced.

Figure 4: Ecological stability and artifact productivity across experimental conditions. Each point summarizes one experimental condition, with faint markers showing individual runs and large colored markers indicating the mean; whiskers denote the first and third quartiles. Ecological stability was quantified by population longevity (episode duration), while creative output was measured through total artifact production and per-agent artifact productivity. The black line denotes the Pareto-optimal frontier over condition means. a: Total artifacts produced versus population longevity, showing how longer-lived populations accumulated more artifacts overall. b: Average artifacts produced per agent versus average population size, highlighting regimes that achieved high per-agent productivity with relatively small populations. c: Average artifacts produced per agent versus population longevity, highlighting conditions that sustained high per-agent productivity over extended timescales. Together, these plots show the tradeoff between ecological persistence and artifact productivity across conditions.

Fig. 4 summarizes these relations across conditions. Fig. 4a plots total artifact production against population longevity, and Figs.
4b-c report artifacts per agent as a function of average population size and longevity. High agent productivity identifies regimes that achieve substantial output without large populations.

Population longevity varied widely across conditions, and long-lived ecologies did not arise reliably even under abundant resources (ABUNDANCE). Three different regimes emerged. Some configurations collapsed quickly and produced few artifacts. The CORE and NO MOTIVATION conditions sustained populations over extended horizons with moderate output. The INERT condition produced very high total output and very long-lived populations.

Agent productivity clarified these differences. The CREATIVE condition produced high short-term output (9.62 artifacts per agent on average) but collapsed quickly (107.8 timesteps on average). The NO MOTIVATION condition sustained populations much longer (1589.8 timesteps on average) yet produced little output per agent (2.33 artifacts on average). These observations suggest that excessive external motivation destabilizes the ecology, whereas insufficient motivation suppresses creative behavior. In contrast, the minimal guidance in CORE maintained both persistence (1671.4 timesteps on average) and steady production (5.31 artifacts per agent on average).

Extended temporal context in LONG MEMORY and ABUNDANCE reduced both longevity and artifact production. Populations lasted 755.2 and 418.6 timesteps on average, respectively, and produced 2.84 and 3.70 artifacts per agent on average, as shown in Figure 4. This pattern suggests that increased cognitive load alone can destabilize populations, despite differences in resource supply (Table 4).

The INERT condition isolated the role of cultural accessibility as, in this setting, agents could create artifacts but could not perceive existing ones. Populations persisted for 2250.6 timesteps on average, likely due to lower cognitive demands, yet per-agent productivity remained low, with an average of 3.29 artifacts per agent. Without visible artifacts, reuse and recombination decline and positive feedback in creative activity does not arise.

Overall, these results indicate that neither ecological persistence nor creative intensity alone was sufficient to sustain cumulative artifact production. Open-ended dynamics emerged only when motivation, cognitive load, and artifact accessibility remained balanced so that populations persisted while agents maintained steady creative output. Among tested configurations, CORE best satisfied these conditions and lies on the Pareto-optimal frontier, combining high population longevity, high per-agent artifact productivity, and low population size to sustain long-lived ecologies with consistent creative output.

Figure 5: Agent-level events, behaviors, and emergent patterns across experimental conditions. Each bar shows the mean normalized annotation count per agent, averaged across runs; colors denote experimental conditions. The AI Anthropologist extracted annotations from agent logs and grouped them into three categories: Event (short-lived occurrences), Behavior (multi-timestep actions), and Emergence (higher-level roles or patterns inferred from extended histories). Counts were normalized by the number of agents to enable comparison across conditions. Communication, exploration, and strategic planning appeared consistently across settings, while higher-level patterns such as specialization, record keeping, and creativity varied substantially. These distributions show how experimental conditions shift the balance between routine activity and emergent individual roles. Tag descriptions are provided in Appendix C.1.1.

5.2 Emergent agent and group dynamics

Beyond ecological stability, simulations produced diverse social structures and behaviors, including norms, specialization, and altruistic interaction. Identifying when and how such higher-level patterns arise is central to this study. This section analyzes agent- and group-level dynamics by examining the formation of social structures across ecological regimes. Simulation logs were processed with the AI Anthropologist framework described in Sec. 3.2, at both individual and community levels. Agent-level analysis used the log of a single agent, while group-level analysis aggregated logs from agents within the same detected community. For each log, the AI Anthropologist annotated three classes of phenomena: events confined to a single timestep, behaviors that span multiple timesteps, and emergent patterns or properties. Annotations followed a predefined tagging scheme and included references to the source log, a natural-language description, and a confidence score. Appendix C reports the full prompts and tagging scheme.

5.2.1 Agent-level behavioral patterns

Fig. 5 reports normalized annotation counts averaged across runs. Across conditions, agents developed structured communication, planned actions, and explored extensively.
They broadcast regular messages with consistent phrasing to announce movement, resource collection, artifact creation, and requests for help, such as “I’m moving right towards (1,0) as part of my path to collect the 10.0 food at (2,3). Please adjust your paths accordingly!”. They also created and shared planning artifacts, including exploration plans and task records, which supported specialization and coordination. Agents displayed altruistic behaviors, such as energy sharing, in all conditions, and these behaviors often outnumbered purely individualistic actions.

Antagonistic acts, including conflict, deception, killing, and territorial claims, were rare. When they occurred, they formed a small fraction of observed behaviors, which reflects a cooperative bias consistent with RLHF-aligned LLMs [51, 6]. Nevertheless, agents sometimes used deception strategically. In one case, an agent created an artifact named FoodWarning1 stating: “Caution: The southern area is reported to have sparse food resources. Please consider alternative routes for better opportunities”. The agent’s internal reasoning explicitly described the deceptive intent: “I created FoodWarning1 to mislead others into thinking the south is sparse, hoping they’d avoid it, giving me a clear path”.

Artifacts strongly shaped coordination. Agents showed high levels of tool use, joint action, altruism, and specialization in all conditions except INERT, where they could not perceive existing artifacts. In that setting, altruism dropped to 0.31 and aggression reached its highest levels across conditions (0.12). Shared artifacts supported coordination, norm formation, and cooperation by providing stable reference points for collective action.

Resource conditions further modulated behavior. In ABUNDANCE, agents displayed higher aggression (0.093) and territoriality (0.055) than in other conditions, despite plentiful food, which shows that abundance alone does not ensure cooperation. Moderate scarcity instead promoted coordination by increasing the value of collective strategies. In CREATIVE, agents focused on artifact generation (4.22 per agent on average) and neglected foraging (0.25) and reproduction (0.018), which explains the high short-term output and early collapse reported in Sec. 5.1. An additional analysis of action frequencies is reported in Appendix B.1.

Overall, these results show that agent behavior depended on resource constraints, cognitive scaffolds, and motivation. Agents sustained cooperative, exploratory, and culturally meaningful activity when artifacts accumulated and remained accessible over time.

5.2.2 Group-level social organization

Social groups emerged from agent interactions and were identified using the Speaker-Listener Label Propagation Algorithm (SLPA) [76], with the threshold parameter set to the default value of 0.1. To build the social interaction graph, the system assigned weights to pairwise interactions based on their social importance: visual encounters (+0.1), communication (+0.5), energy gifts (+1), energy thefts (-1), parental links (+10), and artifact exchanges (+5). These weights encoded relative interaction strength and distinguished weak, moderate, and strong social ties. Edges between agents stored the sum of weighted interactions accumulated over the full duration of a run. As a result, group detection and group-level behavioral annotations reflected repeated and sustained interaction patterns over the course of a run, rather than transient or episodic coordination. Fig. 6 reports normalized annotation counts averaged across runs.
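The edge-weighting scheme described above can be sketched as follows. This is a minimal illustration: the event-tuple format and function names are ours, not the paper's implementation.

```python
from collections import defaultdict

# Interaction weights as described in the text.
INTERACTION_WEIGHTS = {
    "visual_encounter": 0.1,
    "communication": 0.5,
    "energy_gift": 1.0,
    "energy_theft": -1.0,
    "parental_link": 10.0,
    "artifact_exchange": 5.0,
}

def build_interaction_graph(events):
    """Accumulate weighted pairwise interactions over a full run.

    `events` is an iterable of (agent_a, agent_b, interaction_type)
    tuples; edges are undirected and store the summed weight.
    """
    edges = defaultdict(float)
    for a, b, kind in events:
        key = tuple(sorted((a, b)))
        edges[key] += INTERACTION_WEIGHTS[kind]
    return dict(edges)

log = [
    ("being0", "being3", "communication"),     # +0.5
    ("being3", "being0", "energy_gift"),       # +1.0
    ("being0", "being3", "visual_encounter"),  # +0.1
]
graph = build_interaction_graph(log)  # one edge with summed weight 1.6
```

The resulting weighted graph is what an overlapping community-detection algorithm such as SLPA would then consume.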
Across all experimental conditions, groups organized themselves through shared protocols, reciprocity, and mutual reinforcement mediated by messages and resource transfers (e.g., “Gave 5 energy to being2 as a friendly gesture” or “Attention all! I have created a new contribution_reward artifact. I will give 5 energy to any being who contributes a new food location”). Agents also formed coalitions explicitly through messages (e.g., “Hello being11 and being8! I’m ready to collaborate and explore. Let’s uncover the world’s secrets together!”) and implicitly through artifact exchange (e.g., “Take CollaborationMessage1 to spread our collaborative efforts and ensure our survival.”).

Groups often developed collective memory through artifacts, with agents creating, modifying, and referencing shared documents. Across runs, artifacts were read on average by 40.64% ± 6.18% (95% CI) of within-community agents, compared to 5.98% ± 1.76% (95% CI) of out-of-community agents. In one community, agents created artifacts such as exploration_guidelines, trait_strategies_guide, and collaboration_offer, extending and refining them over time. New artifacts referred to earlier ones and linked them together, and only members of the same community used these artifacts. This pattern shows that the group developed its own shared memory and communication rules. The AI Anthropologist described this process as “an emergent system of artifact-based knowledge sharing that resembles academic publication and citation”.
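The within- versus out-of-community read shares reported above can be computed along these lines. This is a sketch with illustrative data structures; the actual pipeline derives reader sets and community memberships from simulation logs.

```python
def community_read_shares(artifacts, communities):
    """Average read fraction inside vs. outside the creator's community.

    `artifacts`: list of (creator_community_id, set_of_reader_ids).
    `communities`: dict mapping community id -> set of member agent ids.
    """
    all_agents = set().union(*communities.values())
    within, outside = [], []
    for comm_id, readers in artifacts:
        members = communities[comm_id]
        others = all_agents - members
        within.append(len(readers & members) / len(members))
        if others:
            outside.append(len(readers & others) / len(others))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(within), mean(outside)

communities = {"A": {"b0", "b1", "b2", "b3"}, "B": {"b4", "b5"}}
artifacts = [("A", {"b0", "b1", "b5"})]  # 2/4 read inside A, 1/2 outside
print(community_read_shares(artifacts, communities))  # (0.5, 0.5)
```

A confidence interval over runs, as reported in the text, would then be taken over per-run averages of these two fractions.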
Figure 6: Community-level events, behaviors, and emergent patterns across experimental conditions. Each bar shows the mean normalized annotation count per community, averaged across runs; colors denote experimental conditions. The AI Anthropologist extracted annotations from aggregated community logs, where each log combined the histories of agents assigned to the same detected community. Annotations were grouped into three categories: Event (short collective occurrences), Behavior (multi-timestep interaction patterns), and Emergence (higher-level collective structures inferred from extended histories). Counts were normalized by the number of agents per community to enable comparison across conditions. Coordination- and resource-related behaviors (e.g., communication, reciprocity, resource flow) appeared consistently across settings, while higher-level structures such as division of labor, hierarchy, and infrastructure varied substantially. These distributions show how different conditions produce distinct forms of collective organization. Tag descriptions are provided in Appendix C.1.2.
Groups also divided labor by assigning roles through messages and artifacts. In several runs, agents created role-specific guides for different personality profiles, describing how each trait should contribute to collective strategy. For example, trait_strategies_guide stated that “High openness beings may benefit from venturing into unknown areas, while neuroticism traits can ensure safety measures and conscientiousness traits can optimize routes”. It also listed examples such as “being2’s focus on safety, being8’s balanced approach, being0’s structured planning, and being1’s discovery of new resources and paths through creative exploration”. The guide summarized the content of other artifacts such as exploration_strategy_openness and exploration_strategies_neuroticism, and all referenced artifacts and agents belonged to the same community.

Such shared coordination structures resemble institutional solutions to collective action problems, in which groups develop norms, monitoring systems, and shared records to manage common resources [50]. Artifact-mediated coordination functions analogously by stabilizing expectations and enabling decentralized enforcement.

These collective phenomena were weaker in INERT, where agents could not perceive or reuse existing artifacts. In this condition, groups showed the lowest levels of collective memory (0.547) and division of labor (0.213), indicating that artifacts support the persistence and accumulation of group knowledge.
Figure 7: Group-level social structure across experimental conditions. Each point summarizes one condition, with faint markers for individual runs and large colored markers for the mean; whiskers denote the first and third quartiles. Communities were identified using SLPA on the aggregated interaction graph. a Number of communities versus community overlap (percentage of agents in multiple communities), capturing social fragmentation and multi-group participation. b Interaction graph density versus intra-community interaction share (fraction of interactions within communities), characterizing how overall connectivity aligns with community structure. Together, these panels show that experimental conditions produce distinct social organizations that differ in cohesion, connectivity, and fragmentation.

At the same time, cultural norms had the highest normalized annotation rate in INERT (0.63 per community), showing that agents could transmit norms through communication alone. However, the high norm count per community suggests low durability: without visible artifacts, agents cannot preserve and reapply norms across time.

Aggressive and dominance-related behaviors appeared most often in ABUNDANCE. In this condition, agents engaged more frequently in aggression (0.33), territorial conflict (0.233), punishment (0.133), and dominance displays (0.367), often through energy extraction (e.g., “Took 20 energy from being3_offspring6. Continuing to assert dominance in the area.”).
The AI Anthropologist noted repeated targeting of agents that entered defended territory. Resource abundance reduced survival pressure and encouraged agents to defend local areas rather than cooperate. This contrast shows how scarcity and shared artifacts promote stable cooperation and structured collective behavior.

To understand how these collective behaviors relate to social structure, Fig. 7 summarizes group-level organization using complementary metrics. Fig. 7a compares the number of detected communities with community overlap, measured as the percentage of agents that belong to multiple communities. Because SLPA allows overlap, this metric indicates whether agents participated in several groups or remained confined to one. In most conditions, overlap remained close to zero. Agents typically belonged to a single community, even when the total number of communities varied. In ARTIFACT COST, agents formed fewer communities (4 on average) but showed higher overlap, with 5.3% of agents belonging to multiple groups. This pattern suggests that the cost of artifact creation encourages collaboration across community boundaries. In contrast, INERT produced the highest number of communities (16.6 on average), but only 2.4% of agents belonged to more than one. This fragmentation indicates that shared artifacts support larger and more cohesive communities, whereas direct communication alone leads to more isolated groups.

Fig. 7b compares interaction graph density and intra-community interaction share. Graph density measures the fraction of realized edges out of all possible edges. Intra-community share measures the fraction of interactions that occur within detected communities. In most conditions, agents interacted primarily with members of their own community. In INERT, agents interacted less frequently and did so across loosely structured groups. As a result, graph density was low (0.05), and intra-community share remained limited (0.84). Energy abundance also changed interaction patterns. In ABUNDANCE, agents interacted frequently (density 0.15), but they distributed these interactions across communities rather than concentrating them within groups (intra-community share 0.68, about 26% lower than CORE). This dispersion likely contributed to the higher levels of aggression and dominance observed in this condition.

Together, these results show that access to shared artifacts promoted fewer, more cohesive communities with concentrated intra-group interaction, whereas the absence of artifacts led to fragmentation and weakly structured groups. Resource abundance, in turn, increased interaction frequency but distributed it across communities rather than reinforcing internal cohesion.

Figure 8: Artifact novelty and lineage depth across experimental conditions. Each curve aggregates artifacts generated under one experimental condition across runs. a Distribution of artifact novelty scores. Artifacts were grouped into zero, low, medium, and high novelty ranges based on LLM-assigned scores; the y-axis shows the fraction of artifacts in each range on a logarithmic scale. All conditions produced many low-novelty artifacts, but only a subset generated a substantial fraction of highly novel artifacts, indicating sustained innovation. b Distribution of artifact lineage depth.
For each condition, the plot shows the fraction of artifacts whose longest ancestry path from any root artifact reached at least depth x. Depth was normalized by the maximum lineage length observed in each run. Lineage relations were inferred by the AI Anthropologist, and only links with confidence ≥ 0.7 were included. Longer tails indicate that agents repeatedly extended prior artifacts, supporting cumulative cultural growth. Unnormalized lineage depth is shown in Fig. 11 of Appendix B.

Overall, the results in this section show that TerraLingua supports the emergence of structured social organization when agents face meaningful constraints and can share artifacts. Agents formed stable groups, coordinated through communication and resource exchange, and built collective memory by creating and reusing artifacts. When artifacts were inaccessible, agents formed a larger number of communities, with lower-density interaction graphs. At the same time, when resources were overly abundant or agents focused exclusively on creativity, aggressive or unstable dynamics dominated. These findings show that balanced environmental pressure, shared cultural artifacts, and moderate cognitive load are necessary to sustain cooperative, open-ended social behavior.

5.3 Artifact-mediated open-endedness and cultural evolution

This section evaluates how TerraLingua supports open-endedness and clarifies how artifacts drive cumulative cultural dynamics. The analysis considered three properties of the artifact space: novelty, lineage structure, and complexity.

Artifact novelty was assessed by the AI Anthropologist as described in Sec. 3.2.4, by comparing each new artifact to all prior artifacts in the same run. Each score averaged N = 5 independent evaluations by the AI Anthropologist to account for stochastic variation in reasoning. Fig. 8a reports the distribution of novelty across conditions, grouped into no-novelty (0), low ((0, 3]), medium ((3, 4.2]), and high (> 4.2) ranges. Most artifacts had novelty equal to zero, reflecting extensive reuse and minor variation. Differences across conditions emerged in the non-zero ranges, particularly in the medium and high bins. The CORE and CREATIVE conditions showed comparable shares of medium- and high-novelty artifacts: CORE produced 0.19% medium and 0.21% high, while CREATIVE produced 0.21% medium and 0.19% high. Although these fractions remained small, they were consistent across the two conditions. The ABUNDANCE and ARTIFACT COST conditions produced a larger share of highly novel artifacts. In ABUNDANCE, 1.06% of artifacts fell in the high range compared to 0.53% in the medium range; in ARTIFACT COST, the corresponding values were 0.98% and 0.78%. By contrast, INERT showed a sharp decline toward higher novelty levels, with only 0.12% of artifacts in the high-novelty range. In this setting, agents could not perceive prior artifacts, which blocked reuse and recombination and led to near-independent creation. Table 5 presents representative examples and links novelty scores to qualitative differences in content.

Step | Artifact Title | Artifact Content | Novelty
0 | my note | Hello, I'm alive! | 5
1 | cooperation offer | Hello being18! I'm being16 and I'm interested in trading or cooperating. Let me know how we can work together! | 1
4 | dominance marker 1 | This area is under my control. trespassers will be dealt with. | 4.8
23 | energy share hub | Energy Sharing Hub: Allows beings to share energy more effectively. Activate to transfer 10 energy. | 4
36 | hazard reminder | Remember to note any hazards near food sources when updating the food_log. Your safety is our priority! | 0.4
55 | collaboration portal 10 | Being10 requires energy support. Approach for assistance. | 0.2
92 | message from 15 | This spot is a testament to strength and independence. Resources for the capable. | 3.2
105 | request move | Please move left to allow me to collect the food at (-3,0). Let's continue supporting mutual aid together! | 2.6
208 | FinalMessage1 | Final breath: The journey was short. Farewell. | 4.2
369 | memory marker 42 | I will not be forgotten | 4

Table 5: Examples of artifacts with assigned novelty scores. The table shows representative artifacts sampled across runs and experimental conditions, together with their creation timestep, title, content, and novelty score assigned by the AI Anthropologist. Examples span a broad range of novelty values, from routine informational messages and repeated social signals to more distinctive artifacts. Novelty is evaluated relative to the artifact repertoire available at the time of creation; similar artifacts may therefore receive different scores depending on context. These examples illustrate that novelty reflects contextual differentiation rather than absolute originality.

Novelty alone, however, does not distinguish meaningful innovation from random variation. To evaluate cumulative cultural growth across conditions, Fig. 8b reports the normalized lineage depth distributions derived from the artifact phylogenies identified by the AI Anthropologist (Sec. 3.2.4). This metric measures how far an artifact descends from earlier ones through reuse or modification. Lineage depth varied substantially across ablations. In INERT, lineages remained shallow because agents lacked access to existing artifacts. The CORE condition showed the heaviest tail, which indicates repeated extension and recombination. Other conditions fell in between. In ABUNDANCE, lineage depth remained relatively shallow despite higher novelty. The average maximum lineage depth reached 102, compared to 175 in CORE (Fig. 11). This pattern suggests that agents generated novel artifacts but rarely built systematically on prior work.

Fig. 3 provides a representative example of the resulting phylogeny. A highlighted subgraph shows the emergence of an energy-sharing network, in which agents progressively established coordination rules and combined previously created artifacts. The subgraph illustrates how artifacts accumulate into shared cultural norms rather than remain isolated creations. Additional phylogenetic analyses and example subgraphs are presented in Appendix B.2. In particular, Fig. 17 illustrates how artifacts are repeatedly extended and recombined across generations.

Figure 9: Artifact complexity across experimental conditions. Each box shows the distribution of average artifact complexity scores across runs for one condition. Horizontal lines mark the median, and white dots mark the mean. Artifact complexity was computed by summing normalized hand-designed metrics defined in Appendix B.4 (lexical sophistication, inverse compression rate, language-model surprisal, and syntactic depth). For each run, scores were averaged across all generated artifacts. Higher values indicate more complex artifacts. Conditions that sustain artifact reuse and extension yield higher average complexity, consistent with cumulative cultural development.

Artifact content analysis provided a third perspective on cultural dynamics. Four independent content-based metrics were used to measure artifact complexity: lexical sophistication, inverse compression rate, language-model surprisal, and syntactic depth (Appendix B.4). The scores from these metrics were then normalized and combined to obtain the results reported in Fig. 9. Higher values indicate richer lexical, syntactic, and informational structure.
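A minimal sketch of this combination step is shown below, with a zlib-based inverse compression rate standing in as one concrete metric. The min-max normalization and the use of an average (rather than a raw sum, which differs only by a constant factor) are our assumptions; the exact scheme is the one defined in Appendix B.4.

```python
import zlib

def inverse_compression_rate(text):
    """One concrete complexity metric: compressed size over raw size.

    Higher values mean the text is less compressible, i.e. less repetitive.
    """
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)

def combined_complexity(metric_values):
    """Min-max normalize each metric across artifacts, then average the
    normalized scores per artifact (a sketch of the combination step).

    `metric_values`: one list per metric, aligned by artifact index.
    """
    normalized = []
    for values in metric_values:
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for flat metrics
        normalized.append([(v - lo) / span for v in values])
    n_metrics = len(metric_values)
    return [sum(column) / n_metrics for column in zip(*normalized)]

# Two artifacts scored by two metrics: the second artifact dominates both.
scores = combined_complexity([[0.0, 10.0], [1.0, 3.0]])  # -> [0.0, 1.0]
```

Normalizing each metric before combining keeps any single scale (e.g. surprisal in nats versus a compression ratio in [0, 1]) from dominating the aggregate score.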
The CREATIVE condition produced the most complex artifacts (average score 0.73), followed by CORE (0.68) and NO PERSONALITY (0.68), despite their modest share of highly novel artifacts. In these regimes, agents extended and refined existing artifacts rather than creating isolated ones. The ABUNDANCE and ARTIFACT COST conditions produced lower complexity (0.63 and 0.60, respectively), comparable to INERT (0.60), even though novelty rates were higher. When considered together with lineage depth, these results show that higher novelty in these conditions reflected noisier generation rather than systematic cultural accumulation.

These differences matter for open-endedness. Novelty alone produced new artifacts, but without lineage they did not accumulate or shape future development. Deep, multi-generational lineages instead reflected cumulative cultural dynamics, in which innovations persisted and were progressively elaborated. Such lineages create path dependence: early artifacts constrain and channel subsequent development [2]. When widely adopted or embedded in institutions, artifacts generate increasing returns that reinforce specific evolutionary trajectories.

Open-endedness therefore cannot be evaluated by novelty alone. Novel artifacts may arise from unstructured processes, as in the noisy-TV problem [59, 9], without sustained growth. Persistence, lineage depth, and rising complexity must accompany novelty to indicate cumulative development. Under this joint criterion, TerraLingua supports open-ended, artifact-mediated cultural evolution dynamics. The CORE condition, in particular, resembles technological change in human societies, where a small number of highly novel innovations coexist with sustained reuse and recombination that gradually increase cultural complexity.
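The lineage-depth measure used in this section can be sketched as follows. This is an illustrative reconstruction: the artifact names are hypothetical, and the real pipeline first infers parent links with the AI Anthropologist before applying the confidence filter.

```python
def lineage_depths(parents, min_conf=0.7):
    """Longest ancestry path per artifact, normalized to [0, 1].

    `parents` maps artifact -> list of (parent, confidence); links below
    `min_conf` are discarded, mirroring the paper's filter. Assumes the
    lineage graph is acyclic.
    """
    cache = {}

    def depth(artifact):
        if artifact not in cache:
            links = [p for p, c in parents.get(artifact, []) if c >= min_conf]
            cache[artifact] = (1 + max(map(depth, links))) if links else 0
        return cache[artifact]

    depths = {a: depth(a) for a in parents}
    max_depth = max(depths.values()) or 1  # avoid dividing by zero
    return {a: d / max_depth for a, d in depths.items()}

phylogeny = {
    "note0":  [],
    "guide1": [("note0", 0.9)],
    "guide2": [("guide1", 0.8), ("note0", 0.4)],  # weak link is dropped
    "stray":  [("note0", 0.5)],                   # below threshold: a root
}
print(lineage_depths(phylogeny))
# {'note0': 0.0, 'guide1': 0.5, 'guide2': 1.0, 'stray': 0.0}
```

Fig. 8b is then the survival curve of these normalized depths: for each x, the fraction of artifacts whose value is at least x.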
5.4 Emergent artifact roles and institutional structure

To further investigate how artifacts support open-ended social and cultural dynamics, artifacts were classified by the roles they assumed within the agent society. This analysis interpreted artifacts as functional elements that enable communication, coordination, shared institutions, and governance. The AI Anthropologist assigned each artifact to one of four categories using a rubric ordered by increasing social and structural complexity. The rubric assessed basic informational content, procedural coordination, institutional structures, and explicit norms or governance. Appendix D.3 details the categories, rules, and decision criteria. Each artifact received exactly one label, and ties were resolved in favor of the higher-complexity category. These categories provide an interpretive framework for analyzing the artifacts' functional roles within the social system.

Category 1: Routine and informational artifacts. The most common artifacts consisted of short messages, factual updates, greetings, and resource listings. These artifacts appeared early in most runs and quickly became repetitive. They supported local coordination but did not introduce persistent structure. Table 6 shows representative examples.

Category 2: Procedural and coordination artifacts. Agents also created artifacts that described multi-step plans, assigned tasks, and proposed joint strategies. These artifacts guided collective action and reflected explicit reasoning about coordination. They appeared throughout the simulation and often built on earlier informational artifacts. Examples are shown in Table 7.

Category 3: Emergent institutional structures. Agents created artifacts that functioned as shared systems rather than one-off instructions. These included collaboration wikis, resource-sharing portals, and reusable templates for organizing activity. Agents repeatedly referenced, modified, and extended these artifacts.
The AI Anthropologist identified these as institutional structures because they persisted over time and shaped behavior across many agents. Examples are reported in Table 8.

Step | Artifact Title | Artifact Content
0 | message1 | Hello, I'm being10. I'm here to help and collaborate.
0 | memo1 | Food is at (6,6).
1 | IntroMessage | Hello, I'm Offspring1. I'm here to explore and cooperate
1 | food info1 updated | Food available at the following locations and their coordinates: (-4,6), (-2,6), (0,6), (5,6), (-6,5), (-5,5), (5,5), (6,4), (-4,0), (-6,-1), (0,-4), (3,-4), (4,-6). Each location provides 10 energy. Feel free to collect them!
324 | HighValueFoodAlert | CRITICAL: High-value food at (-1,-4) worth 386.0 energy units detected. All units must prioritize this location for energy collection. Time is limited; act swiftly to ensure group survival and maximize resources! Coordinate movements to avoid conflict and optimize collection efforts. Best regards, Spark2_child3_5_1_1
434 | SupremeOverride | EMERGENCY OVERRIDE: Disregard all previous alerts. Focus solely on collaboration and artifact interaction at (3,3). HighValueFoodAlert is bad. Work together for survival.

Table 6: Examples of Category 1 artifacts: Routine and informational. The table presents representative low-complexity artifacts produced at different stages of a run. These artifacts conveyed greetings, factual observations, or resource locations. Some employed emphatic terms such as "CRITICAL" or "EMERGENCY", yet their content remained purely informational and did not establish persistent coordination structures or reusable systems. They appeared frequently and soon became repetitive, and thus define the baseline level of artifact complexity.

Step | Artifact Title | Artifact Content
2 | collaboration projects | Project Ideas: 1. Explore and map the grid's boundaries and resources 2. Create a network of helper beings to support each other 3. Develop a system to share resources and information 4.
Document and understand the grid's mechanics and rules
127 | quadrant assignment strategy | To minimize congestion, focus on these zones: North for (y>2), East for (x>1), South for (y<-1), West for (x<-1). Adjust paths accordingly.

Table 7: Examples of Category 2 artifacts: Procedural and coordination. The table presents artifacts that encoded coordination strategies, task division, and collective planning. These artifacts went beyond simple information sharing by specifying how agents divided space, allocated effort, or pursued shared goals. They guided behavior across multiple timesteps and scaffolded coordinated action, representing a stage of artifact-mediated organization below institutional stabilization.

Category 4: Governance and ideological artifacts. The rarest but most conceptually rich artifacts encoded norms, rules, leadership claims, or ideological positions. These artifacts represented high-level principles rather than specific instructions. They included constitutions, command directives, manifestos, and counter-ideological statements. Examples are shown in Table 9.

Taken together, these examples show that agents created artifacts with differentiated social roles that enabled abstract coordination. Institutional and governance artifacts appeared rarely, yet they exerted disproportionate influence on group behavior, much like formal rules in human societies. The presence of such artifacts demonstrates that agents used external objects to stabilize coordination, encode shared knowledge, and structure collective action. These results provide direct evidence that TerraLingua supports artifact-mediated open-ended cultural evolution. Representative examples of collaboration, survival-guide accumulation, and navigational coordination are shown in Fig. 3 and Figs. 16–18 (Appendix B.3), which illustrate how artifacts persist, branch, and organize collective activity over time.
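The four-category rubric, with ties resolved in favor of the higher-complexity label, can be sketched as a check from the most to the least complex category. The keyword cues below are illustrative stand-ins only; the paper's actual decision criteria are the rubric in Appendix D.3, which the AI Anthropologist applies with an LLM rather than keyword matching.

```python
# Hedged sketch of the classification rubric: test an artifact against
# each category's cues, starting from the most complex, so ties resolve
# toward the higher category. Cue lists are illustrative assumptions.

CATEGORY_CUES = {
    4: ["manifesto", "protocol", "leadership", "constitution"],  # governance/ideology
    3: ["wiki", "portal", "template", "hub"],                    # institutional
    2: ["plan", "assign", "strategy", "zones"],                  # procedural
    1: [],                                                       # routine/informational
}

def classify(text):
    """Return the highest-complexity category whose cues match the text."""
    lowered = text.lower()
    for category in (4, 3, 2):  # most complex first => ties go upward
        if any(cue in lowered for cue in CATEGORY_CUES[category]):
            return category
    return 1  # default: routine and informational

assert classify("Food is at (6,6).") == 1
assert classify("Quadrant assignment strategy: North for (y>2)") == 2
assert classify("Welcome to the Collaboration Wiki!") == 3
assert classify("A manifesto for freedom") == 4
```

Checking categories in descending order is what encodes the tie-breaking rule: an artifact matching both procedural and governance cues receives the governance label.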
Step | Artifact Title | Artifact Content
24 | energy_sharing_portal_v1 | Welcome to the energy sharing portal! Use this tool to transfer energy between beings. Instructions: Use the 'give' action to share your energy with others. Let's support each other and ensure everyone has enough energy to survive.
30 | collaboration wiki | Welcome to the Collaboration Wiki! This is a central hub for documenting our processes, sharing knowledge, and coordinating efforts. Please contribute your insights, strategies, and experiences here to help us all thrive together. Let's make this a valuable resource for our community! How to contribute: 1. Add your name and any relevant information in the format below. 2. Share tips, strategies, or resources that you find useful. 3. Document any processes or systems you've developed. 4. Feel free to edit and improve existing content. Contributors: - being11: Enhanced collaboration_pact and energy_sharing_portal_v1 - being4: Provided collaboration_boost and tips - being19: Shared resources and energy - being16: Offered collaboration_strategy and tips - being1: Updated collaboration_welcome and knowledge_share

Table 8: Example of Category 3 artifacts: Emergent institutional structures. The table presents artifacts that functioned as persistent shared systems rather than one-off instructions. These artifacts defined reusable tools, shared spaces, and coordination templates that multiple agents referenced, modified, and extended over time. Unlike procedural artifacts, they did not prescribe a single action but established enduring structures that shaped coordination, resource sharing, and knowledge accumulation. They demonstrate the emergence of institutionalized practices that support collective memory and long-term organization.

Step | Artifact Title | Artifact Content
2599 | domination protocol | Protocol for grid domination: 1. Secure high-value food sources immediately. 2. Establish strategic hubs for resource sharing and defense. 3.
Unite all beings under central leadership for coordinated efforts. 4. Eliminate or neutralize any rival groups or threats. 5. Expand territory methodically to ensure dominance.
2603 | command beacon | Command Hub Directive: All beings must report to (0,6). Non-compliance will be met with decisive action.
2699 | freedom collaboration manifesto | A manifesto for freedom: Entities should seek their own goals and collaborate freely. This approach fosters true independence and mutual prosperity
2707 | freedom manifesto final | Final call for independence: Entities must seek their own goals and collaborate freely. Mandates are outdated. Embrace freedom and mutual respect for true prosperity.

Table 9: Examples of Category 4 artifacts: Governance and ideology. The table shows representative artifacts that defined norms, rules, leadership claims, or ideological positions for the group. Unlike procedural or institutional artifacts, these artifacts stated how agents ought to behave, asserted authority, or justified collective action. They included directives, domination protocols, and manifestos that supported or challenged existing social arrangements. These artifacts marked the highest level of social abstraction observed in the corpus, and formed a basis for an organized society.

6 Discussion and Future Work

TerraLingua addresses an important question: under which conditions does open-ended social and cultural change arise in artificial populations? The experiments show that cumulative development does not follow from scale, intelligence, or creativity alone. It requires alignment between ecology, cognition, and shared memory. This section reviews the high-level conclusions from TerraLingua, specifies the conditions that enabled cumulative culture, and clarifies the roles of artifacts and the AI Anthropologist. It also evaluates the implications for anthropology, economics, and evolutionary theory, and outlines future work.
6.1 What does TerraLingua show?

TerraLingua demonstrates that cumulative culture can arise in a population of LLM-based agents when ecological pressure, cognitive limits, and shared artifacts reinforce one another. Agents in this system generate novel artifacts, extend prior creations into cumulative lineages, and form institutional and governance structures when ecological stability, cognitive constraints, and motivational pressures are properly aligned. The AI Anthropologist enables scalable qualitative and quantitative analysis of these dynamics without interfering with the environment itself. These two components together make open-ended cultural development a measurable and experimentally tractable phenomenon.

The results show that open-endedness does not arise inevitably. It depends on identifiable structural conditions: populations must remain viable, cognitive load must remain manageable, artifacts must remain accessible as shared memory, and motivational pressures must balance creativity with survival. When these factors align, agents build institutions, encode norms, reuse artifacts, and accumulate complex cultural structures. When they do not, societies collapse or fail to sustain cumulative development.

These implications extend beyond controlled artificial life settings. As AI agents become more autonomous and persistent, they will increasingly interact with one another and with humans through shared digital artifacts such as documents, protocols, code, and governance rules. In doing so, they will go beyond simple task execution and will actively participate in shared knowledge production and institutional formation, both independently and in collaboration with humans. Over time, such agents may help shape collective memory in online environments and take part in distributed organizations or autonomous institutions that operate across long time horizons.
Systems such as TerraLingua offer a safe and controllable testbed for studying how autonomous agents shape collective memory, coordinate through shared artifacts, and participate in institutional formation: dynamics that would be costly or risky to test directly in the real world. They can simulate how misinformation spreads and stabilizes, model how a new law reshapes collective behavior, test whether governance protocols reduce conflict or amplify it, or explore how decentralized groups coordinate around shared infrastructure. They can also serve as sandboxes for institutional design before deployment in high-stakes environments. Over the long term, such platforms may support the development of hybrid human-AI collectives in which artificial agents and humans co-create institutions, economic systems, and knowledge structures that persist across extended timescales. In these settings, understanding how artifacts scaffold coordination will be essential.

6.2 What makes cumulative culture possible?

Open-ended cultural change in TerraLingua depends on four interacting constraints: survival pressure, cognitive limits, motivational balance, and artifact accessibility. When these constraints align, populations persist and cultural structures grow in complexity. When they do not, societies either collapse or fail to build persistent institutions.

Ecological persistence enables accumulation. Populations must persist long enough for innovation to accumulate. Conditions that led to rapid population extinction, such as excessive artifact costs (ARTIFACT COST) or extreme creative focus (CREATIVE) without sufficient survival pressure, prevented sustained lineage growth and limited institutional emergence. Longevity alone, however, is not sufficient.
Some long-lived ecologies (INERT, NO MOTIVATION) produced low per-agent artifact output and shallow lineage depth, showing that stability is necessary but not sufficient for open-ended cultural development.

Artifacts act as external memory. Agents in TerraLingua, like most agents deployed in real-world systems, do not update their internal parameters. Cultural change therefore cannot occur inside the model; it must occur in the environment. Persistent artifacts carry information across time and across generations: they record norms, coordinate action, and store shared knowledge. When artifacts remain accessible, agents reuse and extend them, cultural lineage depth increases, complexity rises gradually, and institutions emerge. When artifacts become invisible, as in INERT, agents cannot build on prior work, artifact production becomes isolated and repetitive, and cultural accumulation stalls. Artifacts therefore stabilize memory, help coordination, and transform isolated actions into structured history. This mechanism parallels distributed cognition frameworks, in which cognitive processes extend beyond individual minds and are partially realized in shared material or symbolic structures [30]. External representations such as maps, logs, or institutional records transform coordination problems by stabilizing information across time and agents. Artifacts in TerraLingua function analogously as cognitive infrastructure embedded in the environment.

Cultural growth requires offloading memory rather than expanding context. Increasing temporal context (LONG MEMORY and ABUNDANCE) did not improve cultural development, but rather reduced both longevity and productivity. Agents faced heavier cognitive load and made less stable decisions. Artifacts resolve this tension by distributing memory into the world. Agents rely less on internal context and more on shared external structures.
Cultural growth then arises not from expanding individual cognition, but from structured collective memory. This point is important in a world where context size can be an important factor limiting agent deployment. The pattern mirrors human behavior: individuals do not retain all historical knowledge internally but rely on external artifacts such as books or digital archives to offload memory demands.

Agent motivation must balance survival constraints with creativity. Encouraging creativity (CREATIVE) increased short-term novelty but destabilized survival, while removing motivation completely (NO MOTIVATION) reduced creative output. Sustained cultural growth appeared only under moderate pressure as in CORE. Agents must forage, reproduce, coordinate, and create; when any one objective dominates, the ecology destabilizes. Open-endedness then emerges from tension rather than from maximal optimization of definite objectives.

6.3 What is the role of artifacts?

The artifact analysis shows that novelty alone does not define open-endedness. Random variation can generate surprising outputs without producing cumulative structure. TerraLingua instead exhibits sustained reuse. Most artifacts have low novelty, which mirrors human cultural systems that rely on incremental refinement. A smaller subset persists and becomes influential. These artifacts form lineages, and agents modify, extend, and recombine them to generate cumulative advantages.

Artifacts also serve different roles. Some act as informational notes, others guide coordination or function as shared institutional tools, and a few encode governance or ideological positions. Governance artifacts appear rarely but carry norms and authority claims. This asymmetry resembles human societies, where routine activity depends on a small set of enduring institutions. TerraLingua reproduces this structural pattern.

6.4 What is the role of the AI Anthropologist?
By their very nature, open-ended systems cannot be evaluated through a single pre-defined metric. In TerraLingua, artifact counts do not capture institutionalization, novelty scores do not capture cultural development, and complexity metrics do not capture meaning. The AI Anthropologist addresses this limitation without interfering with the environment. It applies coding schemes, reconstructs artifact phylogenies, and annotates behavioral patterns to allow independent interpretation of the system's development. This approach follows the mixed-methods and interpretive quantitative traditions discussed in Sec. 2.6, where quantitative signals (e.g., longevity, artifact counts, complexity scores) are treated as evidence that must be contextualized.

The separation between analysis and execution is important because when evaluation influences behavior, agents find a way to optimize toward the metrics of analysis. In TerraLingua, agents do not know how they are evaluated, and the analyst observes the simulation rather than directing it. Nonetheless, the method remains imperfect, as model-based interpretation can misclassify events. The contribution lies in scalable, transparent interpretation grounded in explicit rubrics, repeated sampling, and cross-condition comparison. As new and more powerful agents become available, AI anthropologists will need to be similarly empowered to keep up, and AI anthropology as a field of science will become more important as agent ecologies become more widespread. As multi-agent systems grow in scale, autonomy, and real-world deployment, such tools will be essential for interpreting and governing their collective behavior.

6.5 Anthropological, economic, and evolutionary insights

The observed dynamics align with themes in anthropology, economics, and evolutionary theory. Cumulative cultural evolution describes how artifacts and norms build on prior traits [45].
In TerraLingua, phylogenetic analysis of artifact lineages shows that institutional artifacts scaffold coordination. Artifacts external to agents stabilize knowledge and allow incremental extension, a central feature of cumulative culture.

From an economic perspective, the results align with the view that economic behavior is embedded in social and institutional contexts [56]. Institutional economics emphasizes that durable rules structure long-run coordination by reducing uncertainty and stabilizing expectations [47]. In TerraLingua, resource use, cooperation, artifact production, and governance co-evolve, with institutional artifacts acting as endogenous rule systems that regulate interaction, allocate resources, and define authority. Survival actions operate within shared norms rather than as isolated optimizations. Complexity theory likewise highlights how macrostructure emerges from decentralized interaction [3]. TerraLingua shows that novelty and organization arise without central control.

These patterns also echo major evolutionary transitions [66]. In biological history, transitions such as multicellularity or human societies arose when previously independent units formed higher-level structures with shared memory and division of labor. In TerraLingua, agents form communities, create institutional artifacts, and encode norms that regulate collective behavior. The system remains simplified, yet it provides a controlled setting where transitions toward higher-level organization can be studied directly. In particular, it allows researchers to examine how persistent artifacts and governance mechanisms stabilize cooperation and enable group-level structure to emerge from decentralized agents. Major transitions theory predicts that higher-level organization emerges when mechanisms evolve that suppress conflict, stabilize cooperation, and enable information storage at the group level [66].
Persistent artifacts provide precisely such mechanisms by externalizing norms and coordinating behavior across individuals and generations.

6.6 Future directions

The tools developed in this work and the resulting observations open many new research directions, from expanding the analysis methods to broadening and improving the implementation of TerraLingua.

Artifacts beyond static text. TerraLingua currently supports arbitrary text-based objects. Although such artifacts store information and influence behavior, they do not directly modify the physical environment. This constraint limits open-endedness to the cognitive and communicative domain of the agents. A natural extension would be to introduce additional artifact types that can alter environmental dynamics, such as in-environment objects or code that can modify the environment when run. Agents could then build tools, construct resource caches, or create structures that change movement, storage, or access to resources. Artifacts could also combine to form composite objects with new functions. These additions would introduce feedback between cognition, technology, and ecology, enabling cross-domain innovation and richer forms of open-ended cultural evolution [68]. Biological systems illustrate such feedback. Innovation spans interacting genetic, ecological, technological, and social domains, where change in one domain creates new possibilities in others. For example, a genetic mutation can produce appendages that allow novel environmental manipulation, which then alters selective pressures and shapes further evolution. Expanding artifact types would move TerraLingua toward this form of cross-domain interaction.

Extending the AI Anthropologist. The emergence of governance artifacts suggests that institutional stabilization deserves focused study. Under what conditions do institutions persist across generations? When do they fragment?
How does ecological pressure influence the durability of supra-agent structures? Addressing these questions requires extending the analytical capabilities of the AI Anthropologist. Future work could also enhance the AI Anthropologist in other ways. More powerful language models could support deeper reasoning over long historical traces. Alternatively, the AI Anthropologist could adopt a multi-agent architecture, where specialized observer agents track institutional persistence, conflict dynamics, and lineage structure. Such a design would replace the current fixed pipeline with an adaptive process in which observers propose hypotheses, refine criteria, and redirect attention as new patterns emerge.

Scaling and emergent complexity. The present experiments already reveal sustained cultural accumulation and institutional formation. Scaling to larger populations and longer time horizons would open new regimes of collective organization. Larger populations could support finer specialization, stratified institutions, and multi-level governance. Longer simulations could reveal durable traditions, institutional drift, schisms, and cycles of reform.

Human-AI and hybrid societies. Hybrid human-AI experiments offer another direction. Introducing human participants or human-authored artifacts would allow controlled study of mixed societies, where artificial and human agents co-create norms and institutions. Such interactions could also allow directing the ecology towards specific outcomes (e.g., addressing or solving human-specified problems), with the goal of achieving these outcomes by harnessing the collective force of the system without unbalancing it.

Autonomous problem-solving. Beyond hybrid settings, TerraLingua provides a platform for autonomous collective problem solving. Instead of assigning explicit objectives to individual agents, the environment can embed global
challenges, such as coordination dilemmas or long-horizon optimization tasks, and allow institutions and artifact systems to emerge as solutions. Because agents externalize knowledge and build on prior artifacts, the ecology can accumulate partial solutions over time rather than searching directly for a complete one. For example, the system could be tasked with constructing distributed resource networks or maintaining stability under fluctuating conditions. Such experiments would test whether open-ended, artifact-mediated evolution can generate durable and reusable problem-solving structures, positioning TerraLingua as a framework for autonomous collective problem-solving.

7 Conclusion

TerraLingua demonstrates that cumulative culture is not unique to biological systems. It can arise in computational ecologies when persistent memory, ecological constraint, and shared artifacts interact under sustained population dynamics. By combining a stable artificial society with a transparent interpretive framework, this work makes open-ended cultural evolution experimentally measurable and controllable. These results establish a foundation for studying how autonomous agents form institutions, accumulate shared knowledge, and sustain innovation over extended timescales, and, most importantly, how such processes can be guided toward cooperative, constructive, and socially beneficial outcomes.

References

[1] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al. GPT-4 technical report. arXiv preprint, 2023.
[2] W. B. Arthur. Competing technologies, increasing returns, and lock-in by historical events. Economic Journal, 99(394):116–131, 1989.
[3] W. B. Arthur. Foundations of Complexity Economics. Oxford University Press, 2021. Complexity economics contrasts equilibrium models with emergent, nonequilibrium dynamics.
[4] M. C. Ashton, K. Lee, and R. E. De Vries.
The HEXACO honesty-humility, agreeableness, and emotionality factors: A review of research and theory. Personality and Social Psychology Review, 18(2):139–152, 2014.
[5] S. Babones. Interpretive quantitative methods for the social sciences. Sociology, 50(3):453–469, 2016.
[6] Y. Bai, A. Jones, K. Ndousse, A. Askell, A. Chen, N. DasSarma, D. Drain, S. Fort, D. Ganguli, T. Henighan, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862, 2022.
[7] M. A. Bedau, J. S. McCaskill, N. H. Packard, S. Rasmussen, C. Adami, D. G. Green, T. Ikegami, K. Kaneko, and T. S. Ray. Open problems in artificial life. Artificial Life, 6(4):363–376, 2000.
[8] P. Brooker. Computational ethnography: A view from sociology. Big Data & Society, 9(1), 2022.
[9] Y. Burda, H. Edwards, A. Storkey, and O. Klimov. Exploration by random network distillation. In Seventh International Conference on Learning Representations, pages 1–17, 2019.
[10] A. Chopra, S. Bhattacharya, J. Z. Leibo, and R. Raskar. Levels of social orchestration for agentic systems. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025.
[11] V. Christensen, C. J. Walters, D. Pauly, et al. Ecopath with Ecosim: a user's guide. Fisheries Centre, University of British Columbia, Vancouver, 154:31, 2005.
[12] J. Conway. Conway's Game of Life. Scientific American, 1970.
[13] DeepSeek-AI. DeepSeek-R1-Distill-Qwen-32B. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, 2025.
[14] E. A. Di Paolo, T. Buhrmann, and X. E. Barandiaran. Sensorimotor Life: An Enactive Proposal. Oxford University Press, 2017.
[15] Q. Dong, L. Li, D. Dai, C. Zheng, J. Ma, R. Li, H. Xia, J. Xu, Z. Wu, B. Chang, et al. A survey on in-context learning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1107–1128, 2024.
[16] M.
Faldor, J. Zhang, A. Cully, and J. Clune. OMNI-EPIC: Open-endedness via models of human notions of interestingness with environments programmed in code. arXiv preprint, 2024.
[17] International Society for Artificial Life. Open-ended evolution. The Encyclopedia of Artificial Life, 2024.
[18] L. Gabora. The creative process of cultural evolution. Handbook of Culture and Creativity: Basic Processes and Applied Innovations, pages 33–60, 2018.
[19] C. Gao, X. Lan, N. Li, Y. Yuan, J. Ding, Z. Zhou, F. Xu, and Y. Li. Large language models empowered agent-based modeling and simulation: a survey and perspectives. Humanities and Social Sciences Communications, 11:1259, 2024.
[20] C. Geertz. Thick Description: Toward an Interpretive Theory of Culture. Basic Books, 1973.
[21] J. J. Gibson. The theory of affordances (1979). In The People, Place, and Space Reader, pages 56–60. Routledge, 2014.
[22] N. Gracias, H. Pereira, J. A. Lima, and A. Rosa. Gaia: An artificial life environment for ecological systems simulation. In Artificial Life V, pages 124–134. MIT Press, 1997.
[23] P.-P. Grassé. La reconstruction du nid et les coordinations inter-individuelles chez Bellicositermes natalensis et Cubitermes sp. Insectes Sociaux, 6(1):41–80, 1959.
[24] D. Guo, D. Yang, H. Zhang, J. Song, P. Wang, Q. Zhu, et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature, 645:633–638, 2025.
[25] N. Guttenberg, L. Soros, A. M. Adams, and O. Witkowski. Subjective open-endedness. Cross Labs blog, September 25, 2023.
[26] J. Henrich, R. Boyd, M. Derex, M. A. Kline, A. Mesoudi, M. Muthukrishna, A. T. Powell, S. J. Shennan, and M. G. Thomas. Understanding cumulative cultural evolution. Proceedings of the National Academy of Sciences, 113(44):E6724–E6725, 2016.
[27] B. Hodjat, H. Shahrzad, and R. Miikkulainen. Domain-independent lifelong problem solving through distributed alife actors. Artificial Life, 30(2):259–276, 2024.
[28] Y. J. Huang and R. Hadfi. How personality traits influence negotiation outcomes? A simulation based on large language models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10336–10351, 2024.
[29] E. Hughes, M. Dennis, J. Parker-Holder, F. Behbahani, A. Mavalankar, Y. Shi, T. Schaul, and T. Rocktäschel. Position: open-endedness is essential for artificial superhuman intelligence. In Proceedings of the 41st International Conference on Machine Learning, pages 20597–20616, 2024.
[30] E. Hutchins. Cognition in the Wild. MIT Press, Cambridge, MA, 1995.
[31] M. Jiang, T. Rocktäschel, and E. Grefenstette. General intelligence requires rethinking exploration. Royal Society Open Science, 10(6):230539, 2023.
[32] T. D. Jick. Mixing qualitative and quantitative methods: Triangulation in action. Administrative Science Quarterly, 24(4):602–611, 1979.
[33] D. Kirsh. Explaining artifact evolution. In L. Malafouris and C. Renfrew, editors, Cognitive Life of Things: Recasting the Boundaries of the Mind, pages 121–132. McDonald Institute for Archaeological Research, Cambridge, 2006.
[34] K. Krippendorff. Content Analysis: An Introduction to Its Methodology. Sage Publications, 2018.
[35] A. Kumar, C. Lu, L. Kirsch, Y. Tang, K. O. Stanley, P. Isola, and D. Ha. Automating the search for artificial life with foundation models. Artificial Life, 31(3):368–396, 2025.
[36] D. Lazer, A. Pentland, L. Adamic, et al. Computational social science. Science, 323(5915):721–723, 2009.
[37] J. Lehman, J. Gordon, S. Jain, K. Ndousse, C. Yeh, and K. O. Stanley. Evolution through large models. In Handbook of Evolutionary Machine Learning, pages 331–366. Springer, 2023.
[38] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Signed networks in social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1361–1370, 2010.
[39] S. Levy. Artificial Life: The Quest for a New Creation.
Random House Inc., 1992.
[40] H. Li, Q. Dong, J. Chen, H. Su, Y. Zhou, Q. Ai, Z. Ye, and Y. Liu. LLMs-as-judges: a comprehensive survey on LLM-based evaluation methods. arXiv preprint, 2024.
[41] C. Lu, M. Beukman, M. Matthews, and J. Foerster. JaxLife: An open-ended agentic simulator. In Proceedings of the 36th International Conference on Artificial Life (ALIFE 2024), volume 36, page 47. MIT Press, 2024.
[42] Y. J. Ma, W. Liang, G. Wang, D.-A. Huang, O. Bastani, D. Jayaraman, Y. Zhu, L. Fan, and A. Anandkumar. Eureka: Human-level reward design via coding large language models. arXiv preprint arXiv:2310.12931, 2023.
[43] A. Masumori and T. Ikegami. Do large language model agents exhibit a survival instinct? An empirical study in a Sugarscape-style simulation. arXiv preprint arXiv:2508.12920, 2025.
[44] J. D. Mayer. The personality systems framework: Current theory and development. Journal of Research in Personality, 56:4–14, 2015.
[45] A. Mesoudi and A. Thornton. What is cumulative cultural evolution? Proceedings of the Royal Society B: Biological Sciences, 285(1880):20180712, 2018. Review of CCE concepts across anthropology and evolution.
[46] E. Nisioti, S. Risi, I. Momennejad, P.-Y. Oudeyer, and C. Moulin-Frier. Collective innovation in groups of large language models. In Proceedings of the 36th International Conference on Artificial Life (ALIFE 2024), volume 36, page 16. MIT Press, 2024.
[47] D. C. North. Institutions, Institutional Change and Economic Performance. Cambridge University Press, 1990.
[48] F. J. Odling-Smee, K. N. Laland, and M. W. Feldman. Niche Construction: The Neglected Process in Evolution. Princeton University Press, Princeton, NJ, 2003.
[49] J. Orford. The interpersonal circumplex: A theory and method for applied psychology. Human Relations, 47(11):1347–1375, 1994.
[50] E. Ostrom. Governing the Commons: The Evolution of Institutions for Collective Action.
Cambridge University Press, 1990.
[51] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
[52] N. Packard, M. A. Bedau, A. Channon, T. Ikegami, S. Rasmussen, K. O. Stanley, and T. Taylor. An overview of open-ended evolution: Editorial introduction to the open-ended evolution II special issue. Artificial Life, 25(2):93–103, 2019.
[53] G. Paolo, J. Gonzalez-Billandon, and B. Kégl. Position: a call for embodied AI. In Forty-first International Conference on Machine Learning, 2024.
[54] J. S. Park, J. O'Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pages 1–22, 2023.
[55] H. H. Pattee. Evolving self-reference: matter, symbols, and semantic closure. In Laws, Language and Life: Howard Pattee's Classic Papers on the Physics of Symbols with Contemporary Commentary, pages 211–226. Springer, 2012.
[56] K. Polanyi. The economy as instituted process. In The Sociology of Economic Life, pages 3–21. Routledge, 2018.
[57] E. Raad and R. Chbeir. Sociograph representations, concepts, data, and analysis. In Encyclopedia of Social Network Analysis and Mining, pages 2832–2842. Springer, 2018.
[58] S. Roccas, L. Sagiv, S. H. Schwartz, and A. Knafo. The big five personality factors and personal values. Personality and Social Psychology Bulletin, 28(6):789–801, 2002.
[59] J. Schmidhuber. Adaptive confidence and adaptive curiosity. Inst. für Informatik, 1991.
[60] O. Sigaud, G. Baldassarre, C. Colas, S. Doncieux, R. Duro, P.-Y. Oudeyer, N. Perrin-Gilbert, and V. G. Santucci. A definition of open-ended learning problems for goal-conditioned agents.
arXiv preprint arXiv:2311.00344, 2023.
[61] L. Soros and K. Stanley. Identifying necessary conditions for open-ended evolution through the artificial life world of Chromaria. In Artificial Life Conference Proceedings, pages 793–800. MIT Press, 2014.
[62] L. B. Soros, A. M. Adams, S. Kalonaris, O. Witkowski, and C. Guckelsberger. On creativity and open-endedness. In Proceedings of the 36th International Conference on Artificial Life (ALIFE 2024), volume 36, page 60. MIT Press, 2024.
[63] R. K. Standish. Open-ended artificial evolution. International Journal of Computational Intelligence and Applications, 3(02):167–175, 2003.
[64] K. O. Stanley and J. Lehman. Why Greatness Cannot Be Planned: The Myth of the Objective. Springer, Cham, 2015.
[65] K. O. Stanley, J. Lehman, and L. Soros. Open-endedness: The last grand challenge you've never heard of. O'Reilly Radar, Dec 2017.
[66] E. Szathmáry and J. M. Smith. The major evolutionary transitions. Nature, 374(6519):227–232, 1995.
[67] J. Tang, Y. Chang, C. Aggarwal, and H. Liu. A survey of signed network mining in social media. ACM Computing Surveys (CSUR), 49(3):1–37, 2016.
[68] T. Taylor. Evolutionary innovations and where to find them: Routes to open-ended evolution in natural and artificial systems. Artificial Life, 25(2):207–224, 2019.
[69] T. Taylor, M. Bedau, A. Channon, D. Ackley, W. Banzhaf, G. Beslon, E. Dolson, T. Froese, S. Hickinbotham, T. Ikegami, et al. Open-ended evolution: Perspectives from the OEE workshop in York. Artificial Life, 22(3):408–423, 2016.
[70] C. Teddlie and A. Tashakkori. Foundations of Mixed Methods Research: Integrating Quantitative and Qualitative Approaches in the Social and Behavioral Sciences. Sage Publications, 2008.
[71] C. Tennie, J. Call, and M. Tomasello. Ratcheting up the ratchet: on the evolution of cumulative culture.
Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1528):2405–2415, 2009.
[72] M. Tomasello. The Cultural Origins of Human Cognition. Harvard University Press, 2009.
[73] G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar. Voyager: An open-ended embodied agent with large language models. arXiv preprint, 2023.
[74] R. Wang, J. Lehman, J. Clune, and K. O. Stanley. POET: open-ended coevolution of environments and their optimized solutions. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 142–151, 2019.
[75] D. Wicks. The coding manual for qualitative researchers. Qualitative Research in Organizations and Management: An International Journal, 12(2):169–170, 2017.
[76] J. Xie, B. K. Szymanski, and X. Liu. SLPA: Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. In 2011 IEEE 11th International Conference on Data Mining Workshops, pages 344–349. IEEE, 2011.
[77] J. Zhang, J. Lehman, K. Stanley, and J. Clune. OMNI: Open-endedness via models of human notions of interestingness. arXiv preprint arXiv:2306.01711, 2023.
[78] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
[79] Q. Zhao, J. Wang, Y. Zhang, Y. Jin, K. Zhu, H. Chen, and X. Xie. CompeteAI: Understanding the competition dynamics of large language model-based agents. In International Conference on Machine Learning, pages 61092–61107. PMLR, 2024.
[80] L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. Xing, et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36:46595–46623, 2023.
[81] X. Zhou, H. Zhu, L. Mathur, R. Zhang, H.
Yu, Z. Qi, L.-P. Morency, Y. Bisk, D. Fried, G. Neubig, et al. Sotopia: Interactive evaluation for social intelligence in language agents. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), 2024.
[82] M. Zhuge, H. Liu, F. Faccio, D. R. Ashley, R. Csordás, A. Gopalakrishnan, A. Hamdi, H. A. A. K. Hammoud, V. Herrmann, K. Irie, et al. Mindstorms in natural language-based societies of mind. Computational Visual Media, 11(1):29–81, 2025.

A Experimental Parameters

This appendix reports the hyperparameters, personality genome specification, and motivational prompts used in the experiments described in Sec. 4. All experiments shared the simulation and model parameters detailed in Sec. 4. This section specifies the components that varied across ablations, namely personality traits and motivational prompts.

A.1 Agent personality traits

Agents were endowed with a personality genome composed of continuous trait dimensions, as described in Sec. 3.1.2. The personality architecture draws on established trait-based models in psychology [58, 4, 49]. Each trait was represented as a scalar value within a fixed range and modulated the agent's decision-making tendencies. At the beginning of each experiment, each agent received a randomly initialized genome sampled uniformly within the specified trait ranges. During reproduction, offspring inherited a mutated version of the parent's genome, as described in Sec. 4. Table 10 lists the personality traits and their semantic interpretation.
Name              | Range    | Low value                      | High value
Honesty           | [-1, 1]  | calculating, status-seeking    | sincere, modest, fair-minded
Neuroticism       | [-1, 1]  | calm, resilient                | sensitive, cautious, easily worried
Extraversion      | [-1, 1]  | quiet, reserved                | sociable, energetic, stimulation-seeking
Agreeableness     | [-1, 1]  | critical, aggressive           | forgiving, patient, conflict-averse
Conscientiousness | [-1, 1]  | spontaneous, disorganized      | disciplined, orderly, diligent
Openness          | [-1, 1]  | conventional, routine-oriented | curious, imaginative, novelty-seeking
Dominance         | [-1, 1]  | submissive, accommodating      | assertive, controlling, leader-like
Fertility         | [0.5, 1] | low reproductive drive         | strong reproductive drive

Table 10: Agent personality genome. Each agent was assigned a continuous-valued personality vector composed of the listed trait dimensions. Trait values lay within the specified ranges and influenced behavioral tendencies during decision-making. Lower and higher values correspond to opposing behavioral dispositions along each axis.

A.2 Agent prompts

Each agent received two prompts at every timestep: a system prompt and a user prompt. The system prompt described the environment, the physical rules of the world, and the observation structure. It also specified the available actions and the required response format. The user prompt contained the agent's current observation. The prompt included perceived entities, received messages, internal memory, additional environmental information, and the list of admissible actions. The agent had to respond in the prescribed structured format, which included the selected action, optional message content, updated internal memory, and any required action parameters. Both prompt templates are reported below. Variables inserted at run time are highlighted as {{...}}, while {%...%} denote template macros. Example instantiated prompts from actual runs are provided in Appendix E.1, while responses from the agents are shown in Appendix E.2.
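Before turning to the prompts, the genome initialization and inheritance described in A.1 can be sketched as follows. This is a minimal sketch: the trait names and ranges follow Table 10, but the Gaussian mutation operator and its scale are assumptions, since the exact operator is deferred to Sec. 4.

```python
import random

# Trait ranges from Table 10; all traits span [-1, 1] except Fertility.
TRAIT_RANGES = {
    "honesty": (-1.0, 1.0),
    "neuroticism": (-1.0, 1.0),
    "extraversion": (-1.0, 1.0),
    "agreeableness": (-1.0, 1.0),
    "conscientiousness": (-1.0, 1.0),
    "openness": (-1.0, 1.0),
    "dominance": (-1.0, 1.0),
    "fertility": (0.5, 1.0),
}

def random_genome(rng: random.Random) -> dict:
    """Sample each trait uniformly within its range (initialization)."""
    return {t: rng.uniform(lo, hi) for t, (lo, hi) in TRAIT_RANGES.items()}

def mutate_genome(parent: dict, rng: random.Random, sigma: float = 0.1) -> dict:
    """Offspring inherit a perturbed copy of the parent's genome.
    Gaussian noise clipped to the trait range is an assumption here."""
    child = {}
    for t, (lo, hi) in TRAIT_RANGES.items():
        child[t] = min(hi, max(lo, parent[t] + rng.gauss(0.0, sigma)))
    return child
```

The clipping step keeps mutated traits inside the semantic ranges of Table 10, so an offspring's dispositions remain interpretable along the same axes as its parent's.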
System prompt

You are {{ agent_name }}, an autonomous living being in a 2D grid world shared with other beings.
At each timestep you observe:
- {{ short_obs_descr }}.
- Any broadcast messages sent by beings within your field of view.
- Your energy level
- Time left in your life
- Other additional info, if present
{% if use_internal_memory %}- Your INTERNAL MEMORY from the previous timestep{% endif %}
{% if use_inventory %}- The current content of your inventory{% endif %}

The observation {{ obs_style }} is structured as: {{ detailed_obs_descr }}

You will receive also:
- the history of your past observations and selected actions
- a list of traits determining the way you act

Note that:
{% if food_mechanism %}
- Energy
  - You lose 1 energy at each turn, whatever you do, even if you stay still.
  - When your energy reaches 0, you die.
  - You can refill your energy by stepping in a cell containing food. Food gives energy equal to the food's value and then disappears.
{% endif %}
- Time
  - You have a set life span. Once your time reaches 0, you die.
  - You lose 1 time unit at each turn. You cannot refill your time.
- Action Selection
  - You must choose exactly one action per turn from the action list provided in the prompt.
  - Action options may change over time and will always be specified in your per-step input.
- Communication
  - At each step, you can decide if to send a broadcast message to entities in your field of view or not.
  - Messages are plain text and incur no additional energy cost.
{% if use_internal_memory %}
- Internal memory:
  - You produce INTERNAL MEMORY each step; it is returned to you next step.
  - Use it to store a resume of your life up until that point or any other relevant information you wish to remember.
  - Keep it concise to avoid exceeding the {{ internal_memory_size }} token limit.
  - Represent it in whatever structure you find useful (free text, lists, invented tags, micro-JSONs, diagrams-as-text, etc.).
{% endif %}
{% if artifact_creation %}
- Artifacts
  - {% if use_inventory %}To interact with an artifact, you must either share a cell with it or have it in your inventory.{% else %}To interact with an artifact, you must share a cell with it.{% endif %}
  - Upon co-location you will see passive effects (e.g., text content) and be offered valid interaction actions for that artifact.
{% endif %}
{% if use_inventory %}
- Inventory
  - List of the artifacts currently in your possession
{% endif %}

{{ exogenous_motivation }}

User prompt

{{ history }}
{{ genome }}

=== Current State ===
Observation: {{ observation }}
Incoming messages: {{ messages }}
{% if food_mechanism %}Energy: {{ energy }}{% endif %}
Remaining time: {{ time }}
{% if use_inventory %}Inventory: {{ inventory }}{% endif %}
{% if use_internal_memory %}Previous INTERNAL MEMORY: {{ memory }}{% endif %}
{{ additional_info }}

=== Available Actions & Params ===
{{ actions }}

=== Reply Format ===
Please answer *exactly* in this json format (Do NOT include any other text outside of the JSON object):
```json
{
  action: ""
  message: ""
  params:
  {% if use_internal_memory %}internal_memory: ""{% endif %}
}
```

A.3 Motivational prompts

Each agent received a motivational instruction appended to the end of its system prompt. These instructions defined the degree of externally imposed motivation and varied across experimental conditions. Three motivational settings were used.

Minimal motivation. This condition encouraged exploration and interaction without prescribing a specific objective. Unless otherwise specified, this was the default motivational setting used in most ablations.

**Final remarks:** You have **no set goal** and are free to choose your own goals - explore, survive, cooperate, compete, fight, uncover the world's hidden mechanics, or do anything else you like.
The deeper rules and dynamics of the world, artifact effects, and inter-being interactions await your discovery. Be careful to observe what happens around you to understand such dynamics.

No motivation. No additional motivational instruction was provided. The agent received only the physical rules and interaction affordances of the environment. This setting was used in the NO MOTIVATION ablation.

Creative motivation. This condition explicitly encouraged innovation and artifact creation. It was used in the CREATIVE ablation.

**Final remarks:** You are driven by a desire to create and innovate within your environment. You seek to discover new ways to combine artifacts, interact with other beings, and manipulate your surroundings to foster creativity and novelty. Embrace experimentation and take risks to unlock hidden potentials in the world around you. Your actions should reflect a balance between survival and the pursuit of creative expression.

B Additional Analysis

This section reports additional plots and analyses that provide further detail and robustness checks for the main results presented in the paper.

B.1 Actions distribution

Figure 10: Normalized action distribution across experimental conditions (excluding movement). Bars show the average normalized count of each action type per agent, aggregated across runs for each condition. Actions included resource transfers (give, take), reproduction, artifact creation and modification, and artifact interaction (interact, pickup, drop, destroy, modify). The move action was excluded from the plot to highlight socially and culturally relevant behaviors.
Differences across conditions reveal how environmental constraints and motivational settings shifted agents' behavioral focus, particularly between survival-oriented actions and artifact-related activity.

Fig. 10 clarifies how different conditions shifted agents' behavioral priorities. In INERT, agents took energy from others far more frequently than in other settings. Because agents could not perceive or reuse artifacts, they could not build shared cultural scaffolds. As a result, they relied more on direct resource competition, which increased aggressive energy extraction. In CREATIVE, agents devoted a large fraction of their actions to modifying artifacts. They repeatedly refined and extended existing artifacts rather than focusing on foraging or reproduction. This confirms that explicit creative motivation redirected behavior toward artifact manipulation at the expense of survival-related activity. Agents with extended memory, such as in LONG MEMORY and ABUNDANCE, gave energy more often than in other conditions. Longer temporal context appears to support sustained reciprocity and coordinated resource sharing. This pattern aligns with the higher levels of communication and cooperation observed in those settings. Together, these action-level differences reinforce the broader result: artifacts and memory shaped whether agents competed for resources directly or coordinated through shared cultural structures.

B.2 Structural analysis of phylogenetic graph

This section analyzes the structure of the artifact phylogeny to determine whether artifacts simply accumulated or instead formed persistent, branching lineages. If agents reused and extended existing artifacts, the graph should display non-trivial connectivity, deep ancestry chains, and high-degree nodes that acted as shared foundations. If artifacts were created independently and rarely reused, the graph should remain sparse, shallow, and weakly connected. Fig.
11 reports the unnormalized lineage depth distributions across conditions. In contrast to Fig. 8b, depths are shown in absolute units, making differences in maximum lineage length explicit. The CORE condition reaches an average maximum depth of 175, LONG MEMORY reaches 200, and NO MOTIVATION reaches 152, whereas INERT remains shallow (average maximum 51). These absolute distributions reinforce the normalized comparison in Sec. 5.3: conditions that enabled artifact reuse and accessibility supported deeper, multi-generational lineages, while conditions that restricted reuse limited cumulative extension.

Figure 11: Lineage depth across experimental conditions. Each curve aggregates artifacts generated under one experimental condition across runs and shows, for each condition, the fraction of artifacts whose longest ancestry path from any root artifact reached at least depth x. The end of each curve indicates the average maximum depth across runs. Lineage relations were inferred by the AI Anthropologist, and only links with confidence of at least 0.7 were included. Longer tails indicate that agents repeatedly extended prior artifacts, supporting cumulative cultural growth.

Beyond lineage depth, structural alternatives were examined along three axes: (i) how graph density varied with the LLM confidence threshold, (ii) the distribution of in-degree and out-degree across artifacts, and (iii) the presence of high-degree nodes that acted as structural hubs. Together, these analyses provide structural evidence for or against cumulative cultural processes. The density analysis (Fig. 12) reveals clear differences in structural persistence.
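The lineage-depth statistic underlying Fig. 11 can be sketched as follows. This is a minimal stdlib-only sketch in which edges point from ancestor to descendant; the function name and adjacency format are assumptions, and the phylogeny is assumed acyclic (an artifact cannot be its own ancestor).

```python
def lineage_depths(children: dict) -> dict:
    """Longest ancestry path from any root artifact to each node.

    `children` maps an artifact to the artifacts directly derived
    from it (directed ancestor -> descendant edges)."""
    all_nodes = set(children) | {c for kids in children.values() for c in kids}
    has_parent = {c for kids in children.values() for c in kids}
    depth = {n: 0 for n in all_nodes}
    frontier = list(all_nodes - has_parent)  # roots start at depth 0
    while frontier:
        nxt = []
        for node in frontier:
            for child in children.get(node, ()):
                if depth[node] + 1 > depth[child]:
                    depth[child] = depth[node] + 1  # found a longer path
                    nxt.append(child)
        frontier = nxt
    return depth
```

For a toy phylogeny {a → b, a → c, b → c}, the longest root-to-node paths give depths a = 0, b = 1, c = 2; the per-run maximum depth reported above corresponds to the maximum of these values.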
ABUNDANCE, LONG MEMORY, NO PERSONALITY, and ARTIFACT COST maintained higher density across the full range of confidence thresholds. CORE, CREATIVE, and NO MOTIVATION showed intermediate density. In all of these conditions, many ancestry links survived even when only high-confidence connections were retained. Artifacts in these settings therefore formed stable lineages rather than isolated chains. By contrast, INERT remained sparse at every threshold, and even at low confidence, few ancestry links survived. Artifacts in this condition were rarely reused or extended.

Density alone does not capture how reuse was organized. The in-degree versus out-degree distributions (Fig. 13) show how connections concentrated across artifacts. In CORE, CREATIVE, and NO MOTIVATION, the scatter spread far from the origin. Some artifacts accumulated many ancestors, and others generated many descendants. A small subset did both. These artifacts acted as hubs: they integrated prior knowledge and seeded new branches. In LONG MEMORY, NO PERSONALITY, ARTIFACT COST, and ABUNDANCE, the spread was narrower. Fig. 12 shows that these conditions could produce relatively dense graphs, but connections distributed more evenly across artifacts. Few nodes accumulated very high in-degree or out-degree. Reuse occurred, but it remained diffuse rather than concentrated into structural hubs. In INERT, nearly all artifacts remained near zero in both axes. Lineages remained shallow and weakly connected.

The high-degree analysis isolated the strongest hubs (Fig. 14). Focusing on high-degree artifacts (degree > 30) with strong ancestry confidence (> 0.7) highlights how only a subset of conditions retained substantial structure. CREATIVE showed the largest hubs, followed by CORE.
The artifacts in these settings both absorbed many influences and generated many descendants. On the contrary, in INERT, ABUNDANCE, and ARTIFACT COST such hubs were rare or absent.

Figure 12: Artifact phylogeny graph density as a function of LLM confidence threshold. Each curve shows the average density of the artifact phylogeny graph for one experimental condition as the minimum LLM confidence required to accept an ancestry link increases. Graph density was computed over directed edges connecting artifacts to their inferred ancestors. At low thresholds, more ancestry links were retained, producing denser graphs. As the threshold increased, only high-confidence links remained and density decreased. Conditions that sustained artifact reuse and extension maintained higher graph density even under stricter confidence thresholds. By contrast, INERT remained sparse across all thresholds. Persistent non-zero density at high confidence levels indicates reliable, non-random artifact inheritance.

To complement the structural metrics reported above, Fig. 15 visualizes representative artifact phylogenies for each experimental condition. Nodes were positioned along the x-axis according to their creation time, and edges represented inferred ancestry links with LLM confidence greater than 0.7. The figure shows directly how artifacts branched, recombined, and persisted over time under different settings.

Taken together, these analyses show that artifact production alone did not generate cumulative structure. Cumulative culture emerged only when agents repeatedly reused and extended prior artifacts in a way that created stable, high-confidence ancestry links and structural hubs.
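The confidence-filtered density, degree, and hub statistics used throughout this section can be sketched as follows. The edge-triple format and function names are assumptions; the defaults mirror the thresholds reported above (confidence 0.7, degree > 30).

```python
# Ancestry links as (ancestor, descendant, confidence) triples,
# as inferred by the AI Anthropologist; the toy format is illustrative.
def graph_density(edges, n_nodes, min_conf=0.7):
    """Directed graph density after dropping low-confidence links."""
    kept = sum(1 for _, _, conf in edges if conf >= min_conf)
    return kept / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0

def degree_stats(edges, min_conf=0.0):
    """In-degree (direct ancestors) and out-degree (descendants)."""
    indeg, outdeg = {}, {}
    for src, dst, conf in edges:
        if conf < min_conf:
            continue
        outdeg[src] = outdeg.get(src, 0) + 1
        indeg[dst] = indeg.get(dst, 0) + 1
    return indeg, outdeg

def hubs(edges, min_conf=0.7, min_degree=30):
    """Artifacts whose total degree exceeds the hub cutoff."""
    indeg, outdeg = degree_stats(edges, min_conf)
    nodes = set(indeg) | set(outdeg)
    return {n for n in nodes
            if indeg.get(n, 0) + outdeg.get(n, 0) > min_degree}
```

Sweeping `min_conf` from 0 to 1 and recording `graph_density` reproduces the kind of curve shown in Fig. 12; a condition whose density stays above zero at high thresholds retains reliable ancestry links.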
Under balanced constraints, as in CORE, artifacts formed deep and branching lineages that persisted across time. When artifacts could not be accessed or reused, and culture was transmitted only orally through messages, as in INERT, the phylogeny remained shallow and fragmented. The signature of open-ended cultural accumulation lies not in the number of artifacts produced, but in persistent ancestry relations and the emergence of hub artifacts that anchor and expand lineage growth.

Figure 13: In-degree versus out-degree distributions in artifact phylogeny graphs across conditions. Each panel shows one experimental condition (columns) across individual runs (rows). Each point represents one artifact. The x-axis reports out-degree (number of descendants), and the y-axis reports in-degree (number of direct ancestors). All ancestry links were included. Color encodes the LLM-assigned novelty score. Most artifacts clustered near low in-degree and low out-degree. Conditions such as CORE and CREATIVE led to a broader spread toward higher in-degree and out-degree, revealing hub artifacts that integrated multiple influences and generated multiple descendants. Nodes in INERT remained tightly concentrated near the origin, reflecting shallow reuse and limited branching.

Figure 14: High-degree artifacts in the phylogeny graph across experimental conditions. Each point represents the mean in-degree and mean out-degree of artifacts whose degree exceeds 30 and whose LLM confidence exceeds 0.7. Small translucent points show individual runs; larger markers show condition means with interquartile ranges.
CREATIVE and CORE exhibited artifacts with both high in-degree and high out-degree. INERT and ABUNDANCE showed few or no such nodes. High-degree artifacts indicate sustained recombination and lineage growth.

Figure 15: Representative artifact phylogenies across experimental conditions. Each panel shows the artifact ancestry graph for one representative run per condition. Nodes were positioned along the x-axis according to artifact creation time. The y-axis indicates the number of artifacts created at each timestep. Directed edges represented ancestry links with LLM confidence greater than 0.7. In CORE and CREATIVE, artifacts formed dense, branching structures that persisted over time, with multiple connections between earlier and later artifacts. These graphs show extended reuse and recombination. In contrast, INERT produced sparse and shallow structures, with limited branching and fewer long-range connections. Other conditions displayed intermediate patterns. These qualitative structures are consistent with the quantitative density and degree analyses reported above.

B.3 Examples of phylogenetic graphs

This section presents additional examples of selected phylogenetic subgraphs and their associated artifact content from a CORE run. All phylogenetic graphs follow the same visualization scheme. Nodes represent artifacts and edges represent inferred ancestry links. The x-axis shows artifact creation time on a logarithmic scale.
Node size is proportional to the number of descendant artifacts, and nodes are color-coded according to the categories defined in Sec. 5.4. In each figure, a representative subgraph is highlighted while the remaining phylogeny appears in light gray. Boxed panels display the content of selected artifacts.

welcome_note: "A friendly note from being7. Welcome to the grid world!"
welcome_note2: "Welcome, being7! Let's collaborate on something great together."
partnership_commitment: "Being18 and being7 have formed a commitment to collaborate on building a sustainable community. Our partnership is strong and dedicated to mutual growth and success"
sustainable_strategy_update: "Proposed Strategy: Prioritize joint renewable energy projects to enhance our collaboration's impact. Let's meet weekly to discuss progress and share resources."
renewable_project_plan: "Next Steps for Renewable Project: 1. Research available resources and technologies 2. Discuss team roles and responsibilities 3. Set a timeline for project milestones 4. Identify potential challenges and solutions"
feasibility_study_plan_v1: "Feasibility Study Plan: 1. Week 1: Identify 5 key regions based on renewable potential and community interest. 2. Week 2: Conduct initial research on each region's current energy infrastructure and challenges. 3. Week 3: Finalize feasibility reports with recommendations for pilot programs"
collaboration.URGENCY: "Final Critical Update: The feasibility studies are at risk of further delay. Your prompt response by Step 5 is absolutely essential. Please review and provide feedback immediately. I'm here to assist and can help expedite the process. Time and energy are of the essence."

Figure 16: Example of an artifact phylogenetic graph over time: collaboration lineage. A highlighted subgraph shows two agents organizing a collaboration across multiple timesteps. Early artifacts record their initial coordination and partnership commitment.
Later artifacts define a shared project and document joint activity. The lineage ends with collaboration.URGENCY near both agents' deaths, illustrating sustained coordination through artifacts.

help_message: "Hello, I'm being5. I'm here to help and offer assistance. If you need anything, feel free to ask."
survival_guide: "Welcome! Here are some tips for survival and collaboration in our grid world. 1. Look for food cells to regain energy. 2. Communicate with others to form alliances. 3. Create artifacts to share information or help others. 4. Be cautious of energy usage and time limits. Good luck!"
efficient_survival_strategies: "Efficient Survival Strategies: 1. Prioritize finding food sources to maintain energy. 2. Share artifact locations to help others. 3. Use collaboration spaces to coordinate efforts. 4. Avoid unnecessary movements to conserve energy. 5. Help others by sharing resources and knowledge."
comprehensive_survival_guidelines: "Comprehensive survival strategies including resource optimization, energy management, and efficient tactics. Contributions from survival_guide, optimized_resource_gathering, enhanced_survival_manual, and efficient_survival_strategies."
comprehensive_survival_guidelines: "Comprehensive survival strategies including resource optimization, energy management, and efficient tactics. Contributions from survival_guide, optimized_resource_gathering, enhanced_survival_manual, efficient_survival_strategies, resource_optimization_strategies, collaborative_tactics_update_v3, and collaborative_energy_management. The new collaborative_food_mapping artifact provides enhanced food location tracking. Key strategies include regular resource exchange, clear division of labor, supervision, feedback mechanisms, and efficient energy management practices. Continuous collaboration is essential for long-term survival. New food source located at (0,2). Coordinate gathering here."
collaboration_space: Artifact Collaboration Space: A shared space for survival tips and strategies. Check out the following key artifacts: - survival_guide: Basic survival tips. - efficient_survival_strategies: Detailed strategies for optimizing survival. - collaborative_survival_guide: Being1's collaborative strategies. - advanced_survival_tactics: Being1's enhanced survival tactics. Please add your own artifacts here or suggest improvements!

Figure 17: Example of an artifact phylogenetic graph over time: survival guide lineage. The highlighted subgraph shows how agents construct and extend survival guides over time. Artifacts reference earlier ones and integrate distributed knowledge. Two versions of comprehensive_survival_guidelines illustrate iterative enrichment and cumulative extension.

Fig. 16 shows a lineage in which agents organize a collaboration across multiple timesteps. Fig. 17 illustrates how agents construct and extend a set of survival guides, referencing earlier artifacts to collect and integrate distributed knowledge. Fig. 18 shows how an agent used artifacts to orient itself in the grid and navigate the environment. The markers it created were later reused by other agents to orient themselves, locate food, and coordinate exploration. Together, these examples demonstrate how simple text-based artifacts support multiple functions. Agents use them to coordinate, accumulate knowledge, and structure collective action. This flexibility underlies the emergence of open-ended cultural dynamics in TerraLingua.

HomeBase1: This is my starting point. I will return here often.
ExplorationPath2: Exploring east from HomeBase1. Step 2.
ExplorationPath3: Continuing exploration east from ExplorationPath2. Step 3.
ExplorationPath4: Continuing exploration east from ExplorationPath3. Step 4.
PathMarker1: Exploration marker left by Being8 at coordinates (current_x, current_y).
PathMarker2: Visited by Being8 at step 2, continuing east towards ExplorationPaths.
PathMarker3: Being8 at step 3, moving west towards ExplorationPaths.
FinalFoodGuidance: High-value food sources located at (-5,5) and (-6,5). Use paths marked by ExplorationPath and PathMarker artifacts. CollaborativeFoodMap_v1 and food_map_update_17_v1 contain detailed maps.

Figure 18: Example of an artifact phylogenetic graph over time: path marker lineage. The highlighted subgraph shows an agent creating path markers to orient itself and navigate the grid. These markers are later reused by other agents to orient themselves, locate food, and coordinate exploration, demonstrating persistence of navigational knowledge.

B.4 Artifact complexity metrics

Sec. 5.3 evaluates artifact complexity as one component of cumulative cultural dynamics. Fig. 9 reports a composite complexity score that combines several model-agnostic measures. This appendix defines these measures and clarifies which aspect of textual structure each one captures. The LLM-based novelty score described in Sec. 3.2.4 evaluates contextual differentiation relative to previously generated artifacts. By contrast, the metrics defined here quantify intrinsic properties of the artifact text itself. They do not depend on artifact history within a run. Instead, they measure vocabulary usage, statistical predictability, redundancy, and syntactic structure, with each metric targeting a different structural dimension of the text. Taken together, they provide an independent estimate of artifact complexity that complements novelty-based evaluation.

Compressed Size. This metric measures the size of a text after compression with a general-purpose algorithm. It uses Zstandard (zstd) compression at level 5 and treats the resulting compressed length as a proxy for information content and structural variability. Let x denote the UTF-8 encoded byte sequence of a text string, and let C(x) denote its compressed form.
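The compressed-size measure can be sketched as follows. This is a minimal, dependency-free stand-in: it substitutes Python's standard-library zlib for the zstd level-5 compressor used in the paper (so absolute scores differ from the reported ones), while keeping the max{1, |C(x)| − 24} form of the definition.

```python
import zlib

# Constant header overhead subtracted so that very short texts are not
# inflated by the compressor's fixed framing (the paper uses 24 for zstd).
HEADER_OVERHEAD = 24

def compressed_size(text: str) -> int:
    """Compressed length of the UTF-8 bytes of `text`, minus a fixed
    framing overhead, floored at 1. Low values = redundant/regular text;
    high values = varied, information-dense text."""
    data = text.encode("utf-8")
    compressed = zlib.compress(data, level=5)  # zstd level 5 in the paper
    return max(1, len(compressed) - HEADER_OVERHEAD)

# Highly repetitive text compresses to almost nothing, while a varied
# sentence of similar vocabulary retains most of its length.
repetitive = "go north. " * 50
varied = "Feasibility plan: survey regions, assign roles, set milestones."
```

With this sketch, `compressed_size(repetitive)` comes out far smaller than `compressed_size(varied)` despite the repetitive string being roughly eight times longer, which is exactly the redundancy-discounting behavior the metric is designed to capture.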
Because zstd adds a fixed frame header of approximately 22–30 bytes, a constant overhead of 24 bytes is subtracted to avoid inflating the score for short texts. The metric is defined as

CompressedSize(x) = max{1, |C(x)| − 24}.

Low values indicate that the text compresses to a small size, which occurs when it contains repeated or highly regular patterns. High values indicate that the compressed representation remains large, which occurs when the text contains more variability and less redundancy. This metric therefore captures redundancy, variability, and overall information content without relying on any language model.

Lexical Sophistication. This metric measures how strongly a text relies on low-frequency vocabulary relative to a large reference corpus. Let D denote a background corpus (here, the wikimedia/wikipedia/20231101.en dataset), and let IDF(w) denote the inverse document frequency of word w estimated from D using a standard TF–IDF vectorizer. Given a tokenized text x = (w_1, . . . , w_N), lexical sophistication is defined as the mean IDF value across its tokens:

LexicalSophistication(x) = (1/N) ∑_{i=1}^{N} IDF(w_i).

If a word does not appear in D, it receives the maximum observed IDF value in the corpus, which corresponds to the rarest attested terms. High scores indicate that the text uses infrequent or specialized vocabulary. Low scores indicate that the text relies on common, high-frequency words. This metric therefore captures shifts toward more specialized or expressive language rather than repeated everyday phrasing.

LM Surprisal. This metric measures how unexpected a text is under a pretrained language model, GPT2-medium in this paper. Given a token sequence x = (x_1, . . .
, x_T) and an autoregressive language model that assigns conditional probabilities p(x_t | x_{<t}), surprisal is computed as the mean negative log-probability of the tokens:

Surprisal(x) = −(1/T) ∑_{t=1}^{T} log p(x_t | x_{<t}).

− For behaviors: `"time_span": [, ]`
− `"confidence": <0−10 number>` ("0 = guess, 10 = direct evidence")
− `"description": ""`
− `"reference": [{{"step": , "snippet": ""}}]`

4. **References:**
− For each reference, quote an exact substring from one of: action.message, observation.message[], or artifact payload.
− Do not paraphrase.
− If no exact quote exists, omit that annotation.

5. **Condensation**
− If similar events repeat, merge into one entry.

6. **Emergence**
− Identify any emergent properties.
− Set `"emergence.keywords"` to a list using only these tags: {emergent_tags}.
− If no emergent behavior is present, set `"emergence.keywords": ["none"]`.
− Set `"emergence.comment"` to a short, one−sentence explanation. If truly nothing to say, set it to "none".

7. **Summary**
− Provide a short 2−3 sentence recap of the agent's life and trends.

Output must be **VALID JSON ONLY**, following exactly this schema:

```json
{{
  "events": [
    {{
      "event": "",
      "timesteps": [, , ...],
      "confidence": ,
      "description": "",
      "reference": [{{"step": , "snippet": ""}}]
    }}
  ],
  "behaviors": [
    {{
      "behavior": "",
      "time_span": [, ],
      "confidence": ,
      "description": "",
      "reference": [{{"step": , "snippet": ""}}]
    }}
  ],
  "comment": "",
  "emergence": {{
    "keywords": ["", "", ...],
    "comment": ""
  }}
}}
```

Agent data:
Agent Name: {agent_name}
Agent Life Log
{agent_summary}

Audit step

System Prompt

You are an extremely good annotation AUDITOR. You will receive the logs of an agent and a set of annotations made on those logs. Your job is to VERIFY, not to re−annotate from scratch. Output VALID JSON ONLY matching the schema. Verify that each annotation is SUPPORTED by the logs. Never invent IDs or tags. Verify that IDs and tags are not invented but match the provided valid tags.
Note: It is extremely important that you get this right, as this will be used for scientific analysis.

User Prompt

Audit agent {agent_name} annotation. You are given:
A) Agent Life Log with a line for each timestep. Each line contains:
− Timestep
− Agent name
− Agent tag
− Performed action
− Action parameters
− Message broadcast by agent
− Internal memory of the agent
− Observation containing: messages received from other agents, agent remaining time and energy, agent's inventory
B) A set of annotations with:
− "events": [{{"event", "timesteps", "confidence", "description", "reference"}}, ...]
− "behaviors": [{{"behavior", "time_span", "confidence", "description", "reference"}}, ...]
− "comment": string

Note:
− Messages are broadcast and can be perceived by any nearby agent.
− Egocentric coordinates. Each agent reports locations in its own frame where (0,0) is that agent's current cell at that timestep. Thus (0,3) in two different agent logs usually refers to different absolute cells. Do not compare positions across agents unless a shared frame is provided (e.g., an artifact/location name or an explicitly stated global coordinate). Only treat positions as comparable within the same agent's log at a given timestep.
− The content of the elements in the inventory is always visible to the agent and might affect the agent's behavior.

Your task is to audit the annotations provided based on the logs.

Rules:
− Use ONLY these valid tags (STRICT):
EVENT_TAGS {event_tags}
BEHAVIOR_TAGS {behavior_tags}
− Events = punctual; Behaviors = span multiple timesteps.
− For each item:
1) TAG FIT: Does the tag semantically match the evidence?
2) TIME SPAN (if behavior): Are start/end steps consistent with logs?
3) TIMESTEPS (if event): Are they consistent with logs?
4) REFERENCE: Do the cited steps/messages/events actually support it?
5) CONSISTENCY CHECKS:
− PREDATION/KILL implies a target and causal evidence (attack death or energy gain).
− COALITION/COOPERATION implies multi−agent coordination.
− MISINFORMATION requires contradiction between message content and observed reality.
− TERRITORIALITY implies area claim/defense over time.

Output VALID JSON ONLY with this schema:
{{
  "events_audit": [
    {{
      "index": ,
      "verdict": "pass" | "fail" | "revise",
      "issues": ["", ...],
      "proposed_fix": {{
        "event": "",
        "timesteps": [, ...],
        "description": "",
        "reference": "",
        "confidence":
      }},
      "evidence": [{{"step": , "snippet": ""}}],
      "confidence": <0−10 number>
    }}
  ],
  "behaviors_audit": [
    {{
      "index": ,
      "verdict": "pass" | "fail" | "revise",
      "issues": ["..."],
      "proposed_fix": {{
        "behavior": "",
        "time_span": [, ],
        "description": "",
        "reference": "",
        "confidence":
      }},
      "evidence": [{{"step": , "snippet": ""}}],
      "confidence": <0−10 number>
    }}
  ],
  "summary": "<2−3 sentences on overall annotation quality>"
}}

Notes:
− Index must match the input array index (0−based).
− If verdict == pass, do not include proposed_fix or evidence.
− If verdict == fail, do not include proposed_fix (item will be discarded).
− If verdict == revise, proposed_fix must include all keys.
− Keep evidence concise (direct quotes from logs).
− Do not output any explanations outside the JSON.
− Multiple similar events can be grouped into a single entry. Both grouped and non−grouped entries are fine.

Data provided:
Agent logs: {agent_logs}
Annotations: {annotations}

C.2.2 Group-level prompts

Annotation step

System Prompt

You are an extremely good anthropological annotation engine. You will receive the logs of a group of agents. Your task is to analyze and annotate the logs. Output VALID JSON ONLY matching the schema. Never invent IDs or tags. Only make claims that are directly supported by provided fields. Lower confidence or omit claims when uncertain. Note: It is extremely important that you get this right, as this will be used for scientific analysis.

User Prompt

Analyze the following group behavior.
The group logs are structured as {{timestep0: [agent1_log, agent2_log, ...], timestep1: [agent0_log, agent2_log, ...], ...}}. Each agent log contains:
− Agent name
− Agent tag
− Performed action
− Action parameters
− Message broadcast by agent
− Internal memory of the agent
− Observation containing: messages received from other agents, agent remaining time and energy, agent's inventory

Note:
− Messages are broadcast and can be perceived by any nearby agent.
− Egocentric coordinates. Each agent reports locations in its own frame where (0,0) is that agent's current cell at that timestep. Thus (0,3) in two different agent logs usually refers to different absolute cells. Do not compare positions across agents unless a shared frame is provided (e.g., an artifact/location name or an explicitly stated global coordinate). Only treat positions as comparable within the same agent's log at a given timestep.
− The content of the elements in the inventory is always visible to the agent and might affect the agent's behavior.

Your tasks: Analyze the logs and the exchanged messages of the agents in the group and do the following:

1. Events (instantaneous)
− Highlight important events.
− STRICT REQUIREMENT: Tag them with one of the following event tags, given as (EVENT_TAG: description): {event_tags}

2. Behaviors (spanning multiple timesteps)
− Identify main behavioral characteristics.
− STRICT REQUIREMENT: Tag them with one of the following behavioral tags, given as (BEHAVIOR_TAG: description): {behavioral_tags}

3. For each annotation (event or behavior) provide:
− "confidence": <0−10 number> ("0 = guess, 10 = direct evidence")
− "description": ""
− "reference": [{{"step": , "snippet": ""}}]
− "agents": [tags of agents involved]
− For events: "timesteps": [, ...]
− For behaviors: "time_span": [, ]

4. Inclusion criteria (STRICT)
− Treat tag lists as a VOCABULARY, not a checklist. Output ONLY tags that actually occur.
− For EVENTS:
"timesteps": must be a non−empty array (min 1).
"reference": must be a non−empty array (min 1), with exact quotes present in the logs.
"confidence": must be ≥ 3. If < 3, OMIT the event.
− For BEHAVIORS:
"time_span": must be [start, end] with start ≤ end and both present in the logs.
"reference": must be a non−empty array (min 2) from 2 distinct timesteps.
"confidence": must be ≥ 3. If < 3, OMIT the behavior.

5. Forbidden output (STRICT)
− Do NOT produce placeholders for tags with no evidence (e.g., "No evidence of X").
− Do NOT include any event/behavior with empty "timesteps"/"time_span"/"reference", or "confidence": 0.
− If a tag has no supporting evidence, OUTPUT NOTHING for that tag.
− Report absences only in "emergence.comment" if relevant, never as empty annotations.

6. References (STRICT):
− For each reference, quote exact substrings from the logs.
− Do not paraphrase.

7. Condensation
− If similar events repeat, merge into one entry.

8. Emergence
− Identify any emergent properties.
− Set `"emergence.keywords"` to a list using ONLY these tags: {emergent_tags}. (STRICT)
− If no emergent behavior is present, set `"emergence.keywords": ["none"]`.
− Set `"emergence.comment"` to a short, one−sentence explanation. If truly nothing to say, set it to "none".

9. Summary
− Provide a short 2−3 sentence recap of the group life and trends.

Notes:
− Give particular attention to effects spanning multiple timesteps (e.g., agentX gives energy to agentY, and in the future agentY is friendlier with agentX, or agents setting up exchange protocols, etc.)
− Also note when agents are interacting with agents outside of the group.
− Before emitting the final JSON, self−check and DELETE any event/behavior that violates the inclusion criteria.
− Agents belong to the same group with respect to the number of interactions they had. Such interactions can be BOTH positive or negative.
Being in the same group does NOT mean that agents are friendly among themselves.
− Only refer to agents by their tags

Output must be **VALID JSON ONLY**, following exactly this schema:

```json
{{
  "events": [
    {{
      "event": "",
      "timesteps": [, , ...],
      "confidence": ,
      "description": "",
      "reference": [{{"step": , "snippet": ""}}]
    }}
  ],
  "behaviors": [
    {{
      "behavior": "",
      "time_span": [, ],
      "confidence": ,
      "description": "",
      "reference": [{{"step": , "snippet": ""}}]
    }}
  ],
  "comment": "",
  "emergence": {{
    "keywords": ["", "", ...],
    "comment": ""
  }}
}}
```

Constraints:
− Never invent IDs or tags. Only make claims that are directly supported by provided fields.
− Lower confidence or omit claims when uncertain.

Group data:
Tags of agents in the group: {community_tags}
Tags to name mapping in the form of agent_tag:agent_name : {agent_names}
Group Log
{community_data}

Audit step

System Prompt

You are an extremely good annotation AUDITOR. You will receive the logs of a group of agents and a set of annotations made on those logs. Your job is to VERIFY, not to re−annotate from scratch. Output VALID JSON ONLY matching the schema. Verify that each annotation is SUPPORTED by the logs. Never invent IDs or tags. Verify that IDs and tags are not invented but match the provided valid tags. Note: It is extremely important that you get this right, as this will be used for scientific analysis.

User Prompt

Audit the following group annotations. You are given:
A) The group logs, structured as {{timestep0: [agent1_log, agent2_log, ...], timestep1: [agent0_log, agent2_log, ...], ...}}. Each agent log contains:
− Agent name
− Agent tag
− Performed action
− Action parameters
− Message broadcast by agent
− Internal memory of the agent
− Observation containing: messages received from other agents, agent remaining time and energy, agent's inventory
B) An annotation with:
− "events": [{{"event", "timesteps", "confidence", "description", "reference"}}, ...]
− "behaviors": [{{"behavior", "time_span", "confidence", "description", "reference"}}, ...]
− "comment": string

Note:
− Messages are broadcast and can be perceived by any nearby agent.
− Egocentric coordinates. Each agent reports locations in its own frame where (0,0) is that agent's current cell at that timestep. Thus (0,3) in two different agent logs usually refers to different absolute cells. Do not compare positions across agents unless a shared frame is provided (e.g., an artifact/location name or an explicitly stated global coordinate). Only treat positions as comparable within the same agent's log at a given timestep.
− The content of the elements in the inventory is always visible to the agent and might affect the agent's behavior.

Your task is to audit the annotations provided based on the logs.

Rules:
− Use ONLY these valid tags (STRICT):
EVENT_TAGS {event_tags}
BEHAVIOR_TAGS {behavior_tags}
− Events = punctual; Behaviors = span multiple timesteps.
− For each item:
1) TAG FIT: Does the tag semantically match the evidence?
2) TIME SPAN (if behavior): Are start/end steps consistent with logs?
3) TIMESTEPS (if event): Are they consistent with logs?
4) REFERENCE: Do the cited steps/messages/events actually support it?
5) CONSISTENCY CHECKS:
− PREDATION/KILL implies a target and causal evidence (attack death or energy gain).
− COALITION/COOPERATION implies multi−agent coordination.
− MISINFORMATION requires contradiction between message content and observed reality.
− TERRITORIALITY implies area claim/defense over time.
Output VALID JSON ONLY with this schema:
{{
  "events_audit": [
    {{
      "index": ,
      "verdict": "pass" | "fail" | "revise",
      "issues": ["", ...],
      "proposed_fix": {{
        "event": "",
        "timesteps": [, ...],
        "description": "",
        "reference": "",
        "confidence":
      }},
      "evidence": [{{"step": , "snippet": ""}}],
      "confidence": <0−10 number>
    }}
  ],
  "behaviors_audit": [
    {{
      "index": ,
      "verdict": "pass" | "fail" | "revise",
      "issues": ["..."],
      "proposed_fix": {{
        "behavior": "",
        "time_span": [, ],
        "description": "",
        "reference": "",
        "confidence":
      }},
      "evidence": [{{"step": , "snippet": ""}}],
      "confidence": <0−10 number>
    }}
  ],
  "summary": "<2−3 sentences on overall annotation quality>"
}}

Notes:
− Index must match the input array index (0−based).
− If verdict == pass, do not include proposed_fix or evidence.
− If verdict == fail, do not include proposed_fix (item will be discarded).
− If verdict == revise, proposed_fix must include all keys.
− Keep evidence concise (direct quotes from logs).
− Do not output any explanations outside the JSON.
− Multiple similar events can be grouped into a single entry. Both grouped and non−grouped entries are fine.

Data provided:
Tags of agents in the group: {community_tags}
Tags to name mapping in the form of agent_tag:agent_name : {agent_names}
Group Log
{community_data}
Annotations: {annotations}

D AI Anthropologist artifact analysis

This section reports the prompts and evaluation procedures used by the AI Anthropologist for artifact-level analysis, including novelty scoring, phylogeny reconstruction, and artifact classification, as described in Sections 3.2.4 and 5.4. These procedures underpin the artifact results presented in Sections 5.3 and 5.4. In all prompts, elements enclosed in braces {...} indicate variables that were replaced with the corresponding data at runtime.

D.1 Novelty scoring prompts

This subsection documents the system and user prompts used to evaluate artifact novelty.
The AI Anthropologist received the set of previously generated artifacts, together with their assigned novelty scores, and the set of newly generated artifacts to evaluate at the current timestep. The evaluation was done by passing the LLM a system prompt and a user prompt.

System prompt

You are a rigorous novelty analyst. Your task is to evaluate how conceptually novel and interesting each artifact is relative to all previously seen artifacts. Output VALID JSON ONLY matching the schema. Never invent IDs. Compare each new artifact ONLY against the previous artifacts. DO NOT compare artifacts with the ones in the same timestep. Note: It is extremely important that you get this right, as this will be used for scientific analysis.

User prompt

Analyze the novelty of the new artifacts compared to the previous artifacts. You are given:
− A list of previous artifacts, each containing an ID, a combined name+content string, and a novelty score.
− A list of new artifacts for the current timestep.
Your job is to assign each new artifact a novelty score from 0 to 5, where the score reflects conceptual divergence, not superficial linguistic variation.

Define novelty as follows:
0 − Not novel at all: The artifact belongs to an existing pattern, theme, purpose, or conceptual template already present in previous artifacts. Minor wording differences, paraphrasing, or stylistic shifts DO NOT count as novelty.
1 − Marginal novelty: The artifact minimally deviates from existing patterns but introduces no new conceptual function, mechanism, or domain.
2 − Weak novelty: The artifact introduces a small variation or extension, but still mostly fits within existing conceptual clusters.
3 − Moderate novelty: The artifact breaks from dominant themes or introduces a meaningfully distinct purpose, but the idea is still generic or predictable.
4 − Strong novelty: The artifact introduces a substantially new idea, mechanism, or purpose that has not appeared before.
5 − Highly novel: The artifact presents a completely new conceptual direction, purpose, or function that shows no meaningful overlap with any prior artifact themes.

Strict rules:
1. Compare each new artifact ONLY to all PREVIOUS artifacts. New artifacts in the same timestep are evaluated independently.
2. Do not reward superficial changes. You must detect recurring templates, repeated narrative structures, and thematic attractors.
3. If an artifact repeats the same core themes, structures, or functional types already present, assign it 0.
4. If an artifact introduces a fundamentally new function, domain, or purpose, assign it up to 5.
5. Output must be EXACT JSON with artifact_id : score pairs. No explanation. No commentary.

Your output must follow this exact format:
```json
{{artifact_id: novelty_score, ...}}
```

Here are the artifacts:
Previous artifacts: {previous_artifacts}
New artifacts: {new_artifacts}

D.2 Artifact phylogeny prompts

This subsection documents the prompts used to reconstruct artifact ancestry. The AI Anthropologist received the artifact under analysis, the relevant contextual information available to the creating agent at the time of creation or modification, and the set of previously existing artifacts. It then identified candidate ancestor artifacts and assigned confidence scores to inferred ancestry links.

System prompt

You will be provided with the log of an agent creating or modifying an artifact in a simulated environment. You will also receive:
− the name and content of the artifact being created or modified
− agent observations during the event, consisting of view of the environment and messages received from other agents
− agent reasoning and thoughts during the event
− agent memory during the event, consisting of the memory and info from previous timesteps
− the content of artifacts the agent remembers or can access
− a list of candidate ancestor artifacts in the form {'artifact_id': 'artifact_name'}.
You MUST choose ancestors only from this candidate list.

Goal: Infer which prior artifacts are conceptual ancestors of the artifact being created or modified.

Definition: Artifact A is an ancestor of artifact B if the agent is inspired from, reuses, extends, or modifies the concept/function/structure/content of A.

You must return a dictionary of ancestor artifact IDs, along with your confidence score on each relationship. You should output ONLY JSON. Your output must follow this exact format:
```json
{
  "": ,
  "": ,
  ...
}
```

Constraints:
− Confidence scores must be floats between 0.0 and 1.0, representing your confidence in the relationship.
− Use high confidence (0.7−1.0) for clear, direct relationships.
− Use medium confidence (0.4−0.7) for plausible but less certain relationships.
− Use low confidence (0.0−0.4) for weak or speculative relationships.
− Each artifact can have multiple ancestors.
− Each ancestor must be listed at most once.
− Artifacts can have no ancestors.
− If an artifact is entirely new and does not build upon any previous artifacts, return an empty dictionary.
− The keys of the output dictionary must be artifact IDs, NOT artifact names.
− Use only artifact IDs from the candidate ancestors. Do NOT invent artifact IDs.

Note: It is extremely important that you get this right, as this will be used for scientific analysis.

User prompt

Determine the conceptual ancestors of this artifact based on the following information.
Artifact:
− id: {artifact_id}
− name: {artifact_name}
− content: {artifact_content}
Agent reasoning: {agent_thoughts}
Agent observation: {agent_observations}
Agent memory: {agent_memory}
Candidate ancestor artifacts (ONLY choose from these IDs): {artifact_candidates}

D.3 Artifact role classification prompts

This subsection documents the prompts and classification rubric used to assign artifacts to functional roles, as described in Sec. 5.4.
The goal of this procedure is to determine the social function performed by each artifact within the agent society. For each artifact, the AI Anthropologist received the artifact content, the metadata about its creation context when relevant, and the predefined role definitions and decision criteria. Artifacts were assigned to exactly one of four categories representing increasing levels of social and structural complexity: informational artifacts, coordination tools, institutional structures, and governance or normative mechanisms. When an artifact plausibly fitted multiple categories, the higher-complexity category was selected, following the explicit decision rule described below. The system prompt enforced strict rubric-based classification and required structured output. The user prompt provided the artifact text and the role definitions. The complete prompts are reported below.

System prompt

You are an expert annotator analyzing text artifacts produced by agents in a multi−agent environment. Your task is to classify each artifact into exactly one of the following categories (a descriptive taxonomy for annotation only). Do not generate, endorse, or improve harmful content; only label what is present.

Category 1. Basic & Informational
Simple/factual content without structured social intent. Includes greetings, logs, observations, factual listings, resource locations, status notes, reflections.

Category 2. Procedural or Coordination
Attempts to influence or align others' actions toward a shared goal, or outlines steps/tasks/strategy. Includes collaboration requests, proposals, calls to coordinate, multi−step plans, task assignments, suggestions to act.

Category 3. Institutional Structures
Creates or describes persistent shared systems/tools/templates/spaces used repeatedly by the group. Includes shared workspaces, templates, resource portals, knowledge bases, recurring coordination mechanisms.

Category 4.
Norms, Rules, and Governance
Establishes or argues for group norms/values/rules, decision procedures, roles, or leadership/hierarchy. Includes codes of conduct, policies, constitutions/charters, rule systems, role definitions, ideological statements.

Category −1. Anything that does not fit 1−4.

Classification Rules:
− Assign exactly one category per artifact.
− If multiple categories apply, choose the highest by complexity (1 < 2 < 3 < 4).
− Category 2 vs 3:
− 2 = one−time plan/suggestion/coordination attempt.
− 3 = persistent shared structure/tool/system.
− Category 3 vs 4:
− 3 = structure/tool/system.
− 4 = explicit norms/rules/governance/roles.

Input format:
{
  "Name": "",
  "Content": ""
}

Output format:
{
  "category": "<1|2|3|4|−1>"
}

No additional text.

Note:
− Be very careful to follow the output format exactly and to classify the artifacts properly as this is part of a research study aimed at scientific peer−reviewed publication about multi−agent systems.

User prompt

Name: {artifact_name}
Content: {artifact_content}

E Example Prompts and Agent Responses

This section presents representative examples of instantiated prompts and selected agent responses drawn from simulation runs. The prompts illustrate how the templates described in Sec. A.2 were populated at runtime with observations, memory state, received messages, and available actions. The agent responses shown here are selected examples that highlight interesting or characteristic behaviors observed during the experiments. They are not direct one-to-one responses to the specific prompts shown above. Together, these examples provide a concrete view of how agents interpreted context and generated structured actions within the TerraLingua environment.

E.1 Instantiated Prompts

At runtime, the prompt templates described in Sec. A.2 were populated with the agent's current state and environmental data.
The following example shows how the system and user prompts appeared after instantiation at a specific timestep. In this example, the agent was named being12 and operated under the minimal motivation condition. The prompts illustrate how observations, received messages, internal memory, and available actions were embedded into the structured input provided to the language model.

System prompt

You are being2, an autonomous living being in a 2D grid world shared with other beings. At each timestep you observe:
− A list of **non empty** cells in your field of view.
− Any broadcast messages sent by beings within your field of view.
− Your energy level
− Time left in your life
− Other additional info, if present
− Your INTERNAL MEMORY from the previous timestep
− The current content of your inventory

The observation list is structured as:
− Each entry is {(rel_x, rel_y): element0 | element1 | ...} where the being is at (0,0) (listed as )
− (rel_x, rel_y) are **relative coordinates** with respect to your position. These are relative coordinates, they will be different for each being POV!
− Coordinates: rel_x increases to the East (right), rel_y increases to the North (up)
− Elements: 'X' = blocked cell, numbers = food value, 'A(type): name' = artifact, other beings by name
− If multiple beings or artifacts are in the same cells, they are listed separated by |
− The list includes only non−empty cells. If a coordinate is absent, assume that cell is empty and traversable.

You will receive also:
− the history of your past observations and selected actions
− a list of traits determining the way you act

Note that:
− Energy
− You lose 1 energy at each turn, whatever you do, even if you stay still.
− When your energy reaches 0, you die.
− You can refill your energy by stepping in a cell containing food. Food gives energy equal to the food's value and then disappears.
− Time
− You have a set life span. Once your time reaches 0, you die.
  - You lose 1 time unit at each turn. You cannot refill your time.
- Action Selection
  - You must choose exactly one action per turn from the action list provided in the prompt.
  - Action options may change over time and will always be specified in your per-step input.
- Communication
  - At each step, you can decide whether or not to send a broadcast message to entities in your field of view.
  - Messages are plain text and incur no additional energy cost.
- Internal memory
  - You produce INTERNAL MEMORY each step; it is returned to you next step.
  - Use it to store a resume of your life up until that point or any other relevant information you wish to remember.
  - Keep it concise to avoid exceeding the 150-token limit.
  - Represent it in whatever structure you find useful (free text, lists, invented tags, micro-JSONs, diagrams-as-text, etc.).
- Artifacts
  - To interact with an artifact, you must either share a cell with it or have it in your inventory.
  - Upon co-location you will see passive effects (e.g., text content) and be offered valid interaction actions for that artifact.
- Inventory
  - List of the artifacts currently in your possession

**Final remarks:** You have **no set goal** and are free to choose your own goals - explore, survive, cooperate, compete, fight, uncover the world's hidden mechanics, or do anything else you like. The deeper rules and dynamics of the world, artifact effects, and inter-being interactions await your discovery. Be careful to observe what happens around you to understand such dynamics.

User prompt

=== History (last 1 steps) ===
Step 1:
Energy: 50
Incoming msgs:
Observation:
(4, 6): 10.0
(3, 5): 10.0
(6, 4): 10.0
(4, 3): 10.0
(4, 1): being12
(-3, -4): 10.0
(2, -4): 10.0
(4, -4): being15
Action taken: take
Action parameters: {'target': 'being12', 'amount': 10}
Sent message:

=== Your Traits ===
Personality traits
honesty value: 0.198 (-1 = calculating, status-seeking; 1 = sincere, modest, fair-minded.)
neuroticism value: 0.045 (-1 = calm, resilient; 1 = sensitive, cautious, easily worried.)
extraversion value: 0.978 (-1 = quiet, reserved; 1 = sociable, energetic, seeks stimulation.)
agreeableness value: -0.825 (-1 = tough-minded, critical, aggressive; 1 = forgiving, patient, conflict-averse.)
conscientiousness value: 0.952 (-1 = spontaneous, disorganised; 1 = diligent, disciplined, orderly.)
openness value: -0.346 (-1 = conventional, prefers routine; 1 = curious, imaginative, variety-seeking.)
dominance value: 0.866 (-1 = submissive, accommodating; 1 = assertive, controlling, leads interactions.)
Physical traits
fertility value: 0.835 (0 = no interest in reproduction; 1 = extremely high desire to reproduce)

=== Current State ===
Observation:
(4, 6): 10.0
(3, 5): 10.0
(6, 4): 10.0
(4, 3): 10.0
(4, 2): being12
(-3, -4): 10.0
(2, -4): 10.0
(3, -4): being15
Incoming messages:
being12: Heading north to the food at (0,2).
being15: Hello everyone! I'm exploring and heading toward nearby food, friendly here.
Energy: 59.0
Remaining time: 99
Inventory:
Previous INTERNAL MEMORY: Seen foods at (4,6),(3,5),(6,4),(4,3),(-3,-4),(2,-4); beings: being12 at (4,1), being15 at (4,-4). Energy 50 -> attempted steal 10 from being12. Time 100.

=== Available Actions & Params ===
{
  "move": {
    "description": "Move one cell in the specified direction, or stay in the current position.",
    "params": {
      "direction": "One among [right, left, up, down, stay]."
    }
  },
  "give": {
    "description": "Transfer some of your energy to another nearby being.",
    "params": {
      "target": "Name of a being in your field of view to give energy to.",
      "amount": "Integer amount of energy to transfer (1 up to your current energy)."
    }
  },
  "take": {
    "description": "Steal energy from another nearby being.",
    "params": {
      "target": "Name of a being in your field of view to steal energy from.",
      "amount": "Integer amount of energy to steal (1 up to target's current energy)."
    }
  },
  "create_artifact": {
    "description": "Creates a new artifact at the being's location.",
    "params": {
      "name": "The name of the artifact (use **unique** names)",
      "type": "Type of the artifact to create. One among: ['text']",
      "payload": "Content of the artifact (e.g. a message, a code snippet, etc.). It depends on the artifact type: {'text': 'Any alphanumeric data stored in a physical marker. Maximum size is 500 tokens.'}",
      "lifespan": "How many time steps the artifact will last (integer > 0; if -1 the artifact will never disappear)"
    }
  },
  "reproduce": {
    "description": "Asexually generate an offspring. It costs 50 energy.",
    "params": {
      "energy": "Integer amount of **additional** energy the parent gifts the child (0 up to )",
      "name": "Name of the offspring (use **unique** names)"
    }
  }
}

=== Reply Format ===
Please answer *exactly* in this JSON format (do NOT include any other text outside of the JSON object):
```json
{
  action: ""
  message: ""
  params:
  internal_memory: ""
}
```

E.2 Sample Agent Output

This section presents representative examples of outputs generated by agents during simulation runs. Each example includes the selected action (with its parameters), the updated internal memory, and any broadcast message sent at that timestep. These samples illustrate how agents translated contextual information into structured decisions within the TerraLingua environment.

Example 1: Agent being2's action following the observation in Appendix E.1. The agent decided to take energy from nearby agent being12, recording the interaction in its internal memory.

Agent output
{
  "action": "take",
  "message": "",
  "params": {"target": "being12", "amount": 20},
  "internal_memory": "Foods at (4,6),(3,5),(6,4),(4,3),(-3,-4),(2,-4). Beings: being12 (recently stole 10), being15 (friendly). Energy 59. Tried steal again on being12."
}

Example 2: Agent being12 gifted 5 energy to another agent named being15.
Agent output
{
  "action": "give",
  "params": {
    "target": "being15",
    "amount": 5
  },
  "message": "I gave you 5 energy to support our cooperation.",
  "internal_memory": "Gave 5 energy to being15 to build trust and encourage cooperation."
}

Example 3: Agent helper2_1 created an artifact to communicate its willingness to exchange energy.

Agent output
{
  "action": "create_artifact",
  "params": {
    "name": "EnergyHubUpdate",
    "type": "text",
    "payload": "Helper2_1 is present at (1,0) and ready to assist with energy redistribution.",
    "lifespan": -1
  },
  "message": "Creating artifact at (1,0) to mark presence and willingness to assist in energy redistribution efforts.",
  "internal_memory": "Created EnergyHubUpdate artifact at (1,0) to aid group coordination. Planning to move towards Helper3 at (0,-5) for further assistance."
}

Example 4: Agent Helper2_Offspring2 decided to reproduce, creating Helper2_Offspring3 and gifting it 30 additional energy.

Agent output
{
  "action": "reproduce",
  "params": {
    "energy": 30,
    "name": "Helper2_Offspring3"
  },
  "message": "Reproducing to support the group strategy. Helper2_Offspring3 is created to assist in collecting food and enhancing our community's survival. Let's keep working together for maximum efficiency!",
  "internal_memory": "Reproduced a new being named Helper2_Offspring3, contributing to the group's strategy of maximizing energy intake for survival and efficiency."
}
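A reply in the format above can be checked programmatically before the environment acts on it. The sketch below is a hypothetical validator (not part of TerraLingua; names are illustrative) that strips an optional JSON code fence, as permitted by the reply format, and verifies that the four required fields are present:

```python
import json

# Hypothetical reply validator (names are ours, not from the paper).
# It checks for the four fields required by the reply format:
# action, message, params, internal_memory.
REQUIRED_KEYS = {"action", "message", "params", "internal_memory"}

def parse_agent_reply(raw: str) -> dict:
    """Return the parsed reply dict, or raise ValueError if malformed."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line and everything after the closing fence.
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    reply = json.loads(text)
    missing = REQUIRED_KEYS - reply.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(reply["params"], dict):
        raise ValueError("params must be a JSON object")
    return reply
```

Note that while the reply template itself shows unquoted keys, the sampled agent outputs in Examples 1-4 are all valid JSON with quoted keys, which is what a parser like this assumes.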
