Terra Nova: A Comprehensive Challenge Environment for Intelligent Agents
📝 Abstract
We introduce Terra Nova, a new comprehensive challenge environment (CCE) for reinforcement learning (RL) research inspired by Civilization V. A CCE is a single environment in which multiple canonical RL challenges (e.g., partial observability, credit assignment, representation learning, and enormous action spaces) arise simultaneously. Mastery therefore demands integrated, long-horizon understanding across many interacting variables. We emphasize that this definition excludes challenges that only aggregate unrelated tasks in independent, parallel streams (e.g., learning to play all Atari games at once). Such aggregated multitask benchmarks primarily assess whether an agent can catalog and switch among unrelated policies, not whether it can reason deeply across many interacting challenges.
📄 Content
We introduce Terra Nova, a new comprehensive challenge environment (CCE) for reinforcement learning (RL) research inspired by Civilization V (Firaxis Games, 2010). A CCE is a single environment in which multiple canonical RL challenges (e.g., partial observability, credit assignment, representation learning, and enormous action spaces) arise simultaneously. Mastery therefore demands integrated, long-horizon understanding across many interacting variables. We emphasize that this definition excludes challenges that only aggregate unrelated tasks in independent, parallel streams (e.g., learning to play all Atari games at once). Such aggregated multitask benchmarks primarily assess whether an agent can catalog and switch among unrelated policies, not whether it can reason deeply across many interacting challenges.
The purpose of CCEs is distinct from the purpose of many environments used in RL studies today. Today’s environments generally attempt to isolate one specific challenge such that small, targeted studies on that challenge can occur fruitfully. We note that such environments are useful research tools. However, a CCE’s purpose is to serve as a yardstick for progress and to highlight shortcomings of current general intelligence research. We assert that this research direction is important, as challenges rarely appear in isolation in real-world scenarios.
The use of CCEs in RL research has an influential history that led to significant advances. For example, research in StarCraft II (Vinyals et al., 2017), Dota 2 (Berner et al., 2019), and NetHack (Küttler et al., 2020) has spurred innovation in search, planning, self-play, and other core areas of RL and control. CCEs are important for research because they expose the limitations of methods optimized for narrow tasks. As the field looks toward developing more capable and general agents, identifying new environments that meaningfully extend the CCE frontier, such as Terra Nova, is essential.
Terra Nova is inspired by Civilization V, a turn-based 4X strategy game whose breadth of mechanics makes it a challenging testbed for general intelligence. Playing Terra Nova competently requires reasoning over a large set of diverse information streams and controlling hundreds of heterogeneous endpoints simultaneously. For example, agents must reason over a partially-observable map and disentangle multi-timescale credit assignment while searching vast hierarchical action spaces to control units, cities, trade routes, diplomatic relations, and more. Perhaps most challenging of all, agents must continually assess a game state that includes five opponents to determine which of the many mutually-exclusive victory paths they are most likely to achieve.

Figure 1: An example procedurally-generated Terra Nova map. The map is a central landmass surrounded by ocean and made of hexagonal tiles. The landmass is filled with various terrain types (e.g., desert, plains, grassland), features (e.g., oases, flood plains, jungles), elevation (e.g., flatland, hills, mountains), resources, water features, natural wonders, and more. For more information on maps, see the documentation here: https://trevormcinroe.github.io/terra_nova_environment#maps-mech
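The hexagonal map structure described above can be sketched as a simple data model. The class and field names below are illustrative assumptions for exposition, not Terra Nova's actual API (see the linked documentation for the real schema); the neighbor function uses standard axial hex coordinates:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional, Tuple


class Terrain(Enum):
    """A few of the terrain types mentioned in the map description."""
    DESERT = auto()
    PLAINS = auto()
    GRASSLAND = auto()
    OCEAN = auto()


class Elevation(Enum):
    FLATLAND = auto()
    HILLS = auto()
    MOUNTAINS = auto()


@dataclass
class HexTile:
    """One hexagonal map tile (hypothetical schema, not Terra Nova's)."""
    q: int                                  # axial hex coordinate (column)
    r: int                                  # axial hex coordinate (row)
    terrain: Terrain = Terrain.GRASSLAND
    elevation: Elevation = Elevation.FLATLAND
    features: List[str] = field(default_factory=list)  # e.g., "oasis", "jungle"
    resource: Optional[str] = None


def hex_neighbors(q: int, r: int) -> List[Tuple[int, int]]:
    """The six axial-coordinate neighbors of a hex tile."""
    deltas = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
    return [(q + dq, r + dr) for dq, dr in deltas]
```

Axial coordinates are one common choice for hex grids; each tile has exactly six neighbors, which matters for movement and adjacency bonuses in games of this genre.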
The remainder of this document proceeds as follows. First, we outline some of the challenges in Terra Nova and compare it with other current CCEs (§2). Second, we briefly cover previous work in RL that has targeted aspects of Civilization as an environment (§3). Third, we formalize Terra Nova as a stochastic game (§4). Finally, we cover several key features of the Terra Nova software (§5).
Terra Nova provides a unique combination of challenges that sets it apart from previously used CCEs. Moreover, Terra Nova's win mechanics differentiate it from all CCEs we examine. In Table 1, we compare Terra Nova with StarCraft II (Vinyals et al., 2017), Dota 2 (Berner et al., 2019), Craftax (Matthews et al., 2024), NetHack (Küttler et al., 2020), NeuralMMO (Suarez et al., 2019), and Diplomacy (Paquette et al., 2019). Below, we detail the characteristics shown in Table 1 and give examples of how Terra Nova stands out. Then, we briefly describe additional challenges that agents face in a Terra Nova game.
Opponent structure describes, in part, the competition dynamics. For example, "1v1" implies that either one agent/team wins or the other does. A "singleplayer" game includes no opponents that can win the game. A "1vMany" game is a free-for-all; i.e., more than two independent parties are trying to win. Terra Nova is a "1vMany" game. Despite this competitive structure, its mechanics enable competing agents to cooperate on several facets. For example, agents can form trade deals in which resources, gold, or peace promises are exchanged. These trade deals are a core factor in growing an empire, and agents who do not trade quickly fall behind those who do.
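As a toy illustration of the trade-deal mechanic described above, a deal can be modeled as a two-sided exchange of gold, resources, or promises. The structure and names below are assumptions for exposition, not Terra Nova's actual interface:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TradeDeal:
    """A hypothetical two-sided trade offer (not Terra Nova's real API)."""
    proposer: str
    recipient: str
    offered_gold: int = 0
    offered_resources: Tuple[str, ...] = ()
    requested_gold: int = 0
    requested_resources: Tuple[str, ...] = ()
    peace_promise: bool = False


def is_nonempty(deal: TradeDeal) -> bool:
    """A meaningful deal must exchange something on at least one side."""
    return bool(
        deal.offered_gold
        or deal.offered_resources
        or deal.requested_gold
        or deal.requested_resources
        or deal.peace_promise
    )
```

For example, one empire might offer 10 gold for another's iron: `TradeDeal(proposer="A", recipient="B", offered_gold=10, requested_resources=("iron",))`.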
Partial observability refers to settings in which global game state information is not fully available to the agent.
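Partial observability on a tile map is often implemented as a fog-of-war mask: the agent sees only the tiles within its visibility set, and everything else is hidden. The sketch below is a generic illustration of this idea, not Terra Nova's implementation:

```python
def observe(global_map: dict, visible: set) -> dict:
    """Return the agent's view of the world: tiles outside `visible` are masked.

    global_map: dict mapping (q, r) hex coordinates to tile contents
    visible:    set of (q, r) coordinates the agent can currently see
    """
    UNKNOWN = None  # placeholder for unobserved tiles
    return {pos: (tile if pos in visible else UNKNOWN)
            for pos, tile in global_map.items()}


world = {(0, 0): "grassland", (1, 0): "desert", (2, 0): "mountains"}
view = observe(world, visible={(0, 0), (1, 0)})
# view == {(0, 0): "grassland", (1, 0): "desert", (2, 0): None}
```

An agent acting on `view` rather than `world` must maintain its own memory of previously seen tiles, which is precisely why partial observability forces long-horizon state estimation.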