Map environments provide a fundamental medium for representing spatial structure. Understanding how foundation model (FM) agents understand and act in such environments is therefore critical for enabling reliable map-based reasoning and applications. However, most existing evaluations of spatial ability in FMs rely on static map inputs or text-based queries, overlooking the interactive and experience-driven nature of spatial understanding. In this paper, we propose an interactive evaluation framework to analyze how FM agents explore, remember, and reason in symbolic map environments. Agents incrementally explore partially observable grid-based maps consisting of roads, intersections, and points of interest (POIs), receiving only local observations at each step. Spatial understanding is then evaluated using six kinds of spatial tasks. By systematically varying exploration strategies, memory representations, and reasoning schemes across multiple foundation models, we reveal distinct functional roles of these components. Exploration primarily affects experience acquisition but has a limited impact on final reasoning accuracy. In contrast, memory representation plays a central role in consolidating spatial experience, with structured memories, particularly sequential and graph-based representations, substantially improving performance on structure-intensive tasks such as path planning. Reasoning schemes further shape how stored spatial knowledge is used, with advanced prompts supporting more effective multi-step inference. We further observe that spatial reasoning performance saturates across model versions and scales beyond a certain capability threshold, indicating that improvements in map-based spatial understanding require mechanisms tailored to spatial representation and reasoning rather than scaling alone.
Thinking on Maps: How Foundation Model Agents Explore,
Remember, and Reason Map Environments
Zhiwei WEI a,b, Yuxing LIU a, Hua LIAO a,b, Wenjia XU c
a Hunan Normal University, School of Geographic Sciences, Hunan Changsha, China;
b Hunan Key Laboratory of Geospatial Big Data Mining and Application, Hunan Changsha, China;
c School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China.
Address for correspondence: Wenjia XU. E-mail: xuwenjia@bupt.edu.cn
Abstract: Map environments provide a fundamental medium for representing spatial
structure and supporting navigation, planning, and geographic analysis. Understanding how
foundation model (FM) agents acquire, consolidate, and utilize spatial knowledge in such
environments is therefore critical for enabling reliable map-based reasoning and applications.
However, most existing evaluations of spatial ability in FMs rely on static map inputs or text-
based queries, overlooking the interactive and experience-driven nature of spatial
understanding. As a result, the mechanisms through which spatial knowledge emerges during
exploration and is later leveraged for reasoning remain insufficiently understood. In this
paper, we propose an interactive evaluation framework to analyze how FM agents explore,
remember, and reason in symbolic map environments. Agents incrementally explore partially
observable grid-based maps consisting of roads, intersections, and points of interest (POIs),
receiving only local observations at each step. Spatial understanding is then evaluated using
a suite of tasks, including direction judgment, distance estimation, proximity judgment, POI
density recognition, and path planning. By systematically varying exploration strategies,
memory representations, and reasoning schemes across multiple foundation models, we
reveal distinct functional roles of these components. Exploration primarily affects experience
acquisition but has a limited impact on final reasoning accuracy. In contrast, memory
representation plays a central role in consolidating spatial experience, with structured
memories, particularly sequential and graph-based representations, substantially improving
performance on structure-intensive tasks such as path planning. Reasoning schemes further
shape how stored spatial knowledge is used, with advanced prompts supporting more
effective multi-step inference. Case-based analyses of reasoning traces show that structured
memory and reasoning prompts help repair spatial reasoning failures by enabling explicit
spatial reconstruction rather than heuristic guessing. We further observe that spatial
reasoning performance saturates across model versions and scales beyond a certain capability
threshold, indicating that improvements in map-based spatial understanding require
mechanisms tailored to spatial representation and reasoning rather than scaling alone. Overall,
this work advances experience-driven evaluation of spatial cognition in FM agents and
provides insights for designing more reliable map-based reasoning systems.
Keywords: Foundation model agents; Spatial cognition; Interactive map exploration;
Memory representation; Spatial reasoning.
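The setting summarized in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the framework used in this paper: a scripted depth-first explorer stands in for an FM agent, and the grid layout, the `observe`, `explore`, and `plan_path` helpers, and the adjacency-dictionary form of the graph memory are all hypothetical choices made for illustration. It shows an agent that sees only its immediate neighbours at each step, stores what it sees as a graph, and later answers a path-planning query from that memory alone.

```python
from collections import deque

# Hypothetical symbolic map: '#' = not traversable, '.' = road, letters = POIs.
GRID = [
    "#####",
    "#A..#",
    "#.#.#",
    "#..B#",
    "#####",
]

MOVES = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}


def observe(pos):
    """Local observation: only the traversable cells adjacent to pos."""
    r, c = pos
    visible = {}
    for name, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        if GRID[nr][nc] != "#":
            visible[name] = (nr, nc)
    return visible


def explore(start):
    """Depth-first exploration that builds a graph memory from local views."""
    memory = {}  # graph memory: cell -> set of adjacent traversable cells
    pois = {}    # POI label -> cell where it was seen
    stack = [start]
    while stack:
        pos = stack.pop()
        if pos in memory:
            continue
        neighbours = observe(pos)
        memory[pos] = set(neighbours.values())
        label = GRID[pos[0]][pos[1]]
        if label.isalpha():
            pois[label] = pos
        stack.extend(n for n in neighbours.values() if n not in memory)
    return memory, pois


def plan_path(memory, src, dst):
    """Breadth-first search over the remembered graph (never reads GRID)."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parents[cur]
            return path[::-1]
        for nxt in memory.get(cur, ()):
            if nxt not in parents:
                parents[nxt] = cur
                queue.append(nxt)
    return None  # dst was never reached during exploration


memory, pois = explore((1, 1))
path = plan_path(memory, pois["A"], pois["B"])
```

Because `plan_path` reads only `memory`, the quality of the answer depends entirely on what exploration recorded and how it was stored, which is the separation between exploration, memory, and reasoning that the abstract describes.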
- Introduction
As foundation models (FMs) and FM agents are increasingly applied in real-world scenarios
such as navigation assistance (Espada et al., 2025) and embodied task planning (Zhai et al.,
2025), the need to assess their ability to understand, reason about, and operate within spatial
environments has grown rapidly (Xu et al., 2024; Cai et al., 2025). For example, researchers
in computer science have begun to explore this challenge in physical or perceptual spaces.
They use images, videos, and embodied simulations to test how FMs or FM agents perceive
and act in spatially grounded tasks. Several benchmarks have also been developed for this
purpose, including VSI-Bench (Yang et al., 2025b), EgoSchema (Mangalam et al., 2023),
and OpenEQA (Majumdar et al., 2024). However, geographic or map space is another crucial
type of spatial environment. It abstracts real-world structures into symbolic and organized
forms, supporting applications such as geographic analysis, environmental planning, and
spatial communication (Ni & Wang, 2024). Therefore, understanding how FMs interpret and
reason in such map-based contexts has also become an important emerging topic (Wang et
al., 2024; Janowicz et al., 2025).
In response, the GIS community has initiated several efforts to evaluate the spatial
understanding abilities of FMs through geographic data or map-based tasks. One major
research direction centers on language-based spatial reasoning, where models are tested
through tasks such as question answering, geographic knowledge extraction, spatial relation
analysis, and code gener