Thinking on Maps: How Foundation Model Agents Explore, Remember, and Reason Map Environments

Reading time: 5 minutes

📝 Original Info

  • Title: Thinking on Maps: How Foundation Model Agents Explore, Remember, and Reason Map Environments
  • ArXiv ID: 2512.24504
  • Date: 2025-12-30
  • Authors: Zhiwei WEI (a, b), Yuxing LIU (a), Hua LIAO (a, b), Wenjia XU (c; corresponding author, xuwenjia@bupt.edu.cn). Affiliations: (a) Hunan Normal University, School of Geographic Sciences, Changsha, China; (b) Hunan Key Laboratory of Geospatial Big Data Mining and Application, Changsha, China; (c) School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China

📝 Abstract

Map environments provide a fundamental medium for representing spatial structure. Understanding how foundation model (FM) agents interpret and act in such environments is therefore critical for enabling reliable map-based reasoning and applications. However, most existing evaluations of spatial ability in FMs rely on static map inputs or text-based queries, overlooking the interactive and experience-driven nature of spatial understanding. In this paper, we propose an interactive evaluation framework to analyze how FM agents explore, remember, and reason in symbolic map environments. Agents incrementally explore partially observable grid-based maps consisting of roads, intersections, and points of interest (POIs), receiving only local observations at each step. Spatial understanding is then evaluated using six kinds of spatial tasks. By systematically varying exploration strategies, memory representations, and reasoning schemes across multiple foundation models, we reveal distinct functional roles of these components. Exploration primarily affects experience acquisition but has a limited impact on final reasoning accuracy. In contrast, memory representation plays a central role in consolidating spatial experience: structured memories, particularly sequential and graph-based representations, substantially improve performance on structure-intensive tasks such as path planning. Reasoning schemes further shape how stored spatial knowledge is used, with advanced prompts supporting more effective multi-step inference. We further observe that spatial reasoning performance saturates across model versions and scales beyond a certain capability threshold, indicating that improvements in map-based spatial understanding require mechanisms tailored to spatial representation and reasoning rather than scaling alone.
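The exploration setting described above (a partially observable grid map, local observations only) can be made concrete with a small sketch. The abstract does not say how maps, observations, or actions are encoded, so everything below is an assumption for illustration: the map is a list of strings with hypothetical cell codes (`R` road, `I` intersection, `P` POI, `#` non-traversable), the agent sees only a square window of `radius` cells around itself, and a plain list of (step, position, view) records stands in for a sequential memory. `GridMap`, `Agent`, and `local_view` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass, field

# Hypothetical cell codes (not from the paper): "R" road, "I" intersection,
# "P" point of interest, "#" non-traversable block.
BLOCKED = "#"

# Grid moves as (row delta, column delta); north is row - 1.
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


@dataclass
class GridMap:
    rows: list[str]  # each string is one row of the map

    def in_bounds(self, r: int, c: int) -> bool:
        return 0 <= r < len(self.rows) and 0 <= c < len(self.rows[0])

    def local_view(self, r: int, c: int, radius: int = 1) -> dict[tuple[int, int], str]:
        """Return only the cells within `radius` of (r, c): the agent's partial observation."""
        view = {}
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if self.in_bounds(rr, cc):
                    view[(rr, cc)] = self.rows[rr][cc]
        return view


@dataclass
class Agent:
    pos: tuple[int, int]
    # Hypothetical sequential memory: one (step index, position, local view) record per step.
    trajectory: list = field(default_factory=list)

    def step(self, world: GridMap, action: str) -> None:
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        # Move only onto in-bounds, traversable cells; otherwise stay in place.
        if world.in_bounds(r, c) and world.rows[r][c] != BLOCKED:
            self.pos = (r, c)
        self.trajectory.append((len(self.trajectory), self.pos, world.local_view(*self.pos)))


# Usage: a 3x3 toy map; the final "W" move hits a block, so the agent stays put.
world = GridMap(["#P#", "RIR", "#R#"])
agent = Agent(pos=(1, 0))
for action in ["E", "N", "S", "S", "W"]:
    agent.step(world, action)
```

In a sketch like this, the agent never sees the full `rows` list directly; everything it could later reason about is what accumulates in `trajectory`, which is where alternative memory representations would plug in.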

📄 Full Content

Thinking on Maps: How Foundation Model Agents Explore, Remember, and Reason Map Environments

Zhiwei WEI (a, b), Yuxing LIU (a), Hua LIAO (a, b), Wenjia XU (c)

(a) Hunan Normal University, School of Geographic Sciences, Changsha, Hunan, China; (b) Hunan Key Laboratory of Geospatial Big Data Mining and Application, Changsha, Hunan, China; (c) School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China.

Address for correspondence: Wenjia XU. E-mail: xuwenjia@bupt.edu.cn

Abstract: Map environments provide a fundamental medium for representing spatial structure and supporting navigation, planning, and geographic analysis. Understanding how foundation model (FM) agents acquire, consolidate, and utilize spatial knowledge in such environments is therefore critical for enabling reliable map-based reasoning and applications. However, most existing evaluations of spatial ability in FMs rely on static map inputs or text-based queries, overlooking the interactive and experience-driven nature of spatial understanding. As a result, the mechanisms through which spatial knowledge emerges during exploration and is later leveraged for reasoning remain insufficiently understood. In this paper, we propose an interactive evaluation framework to analyze how FM agents explore, remember, and reason in symbolic map environments. Agents incrementally explore partially observable grid-based maps consisting of roads, intersections, and points of interest (POIs), receiving only local observations at each step. Spatial understanding is then evaluated using a suite of tasks, including direction judgment, distance estimation, proximity judgment, POI density recognition, and path planning. By systematically varying exploration strategies, memory representations, and reasoning schemes across multiple foundation models, we reveal distinct functional roles of these components. Exploration primarily affects experience acquisition but has a limited impact on final reasoning accuracy. In contrast, memory representation plays a central role in consolidating spatial experience: structured memories, particularly sequential and graph-based representations, substantially improve performance on structure-intensive tasks such as path planning. Reasoning schemes further shape how stored spatial knowledge is used, with advanced prompts supporting more effective multi-step inference. Case-based analyses of reasoning traces show that structured memory and reasoning prompts help repair spatial reasoning failures by enabling explicit spatial reconstruction rather than heuristic guessing. We further observe that spatial reasoning performance saturates across model versions and scales beyond a certain capability threshold, indicating that improvements in map-based spatial understanding require mechanisms tailored to spatial representation and reasoning rather than scaling alone. Overall, this work advances experience-driven evaluation of spatial cognition in FM agents and provides insights for designing more reliable map-based reasoning systems.

Keywords: Foundation model agents; Spatial cognition; Interactive map exploration; Memory representation; Spatial reasoning.
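The abstract's point that graph-based memory helps structure-intensive tasks such as path planning, and that explicit spatial reconstruction beats heuristic guessing, can be illustrated with a minimal sketch. It assumes a memory that stores visited cells as nodes and each traversed move as an undirected edge, with (row, column) coordinates where north is decreasing row; none of this encoding comes from the paper. `GraphMemory`, `record_move`, `shortest_path`, `find`, and `direction` are hypothetical names. The planner here is ordinary breadth-first search, standing in for the multi-step inference an FM agent would perform over its stored memory, and `direction` is a simple eight-way compass judgment computed from remembered coordinates.

```python
import math
from collections import deque

Cell = tuple[int, int]  # (row, column); north is decreasing row (assumed convention)


class GraphMemory:
    """Hypothetical graph memory: visited cells are nodes, traversed moves are undirected edges."""

    def __init__(self) -> None:
        self.adj: dict[Cell, set[Cell]] = {}
        self.labels: dict[Cell, str] = {}  # e.g. POI names observed at a cell

    def record_move(self, a: Cell, b: Cell, label_b: str = "") -> None:
        self.adj.setdefault(a, set()).add(b)
        self.adj.setdefault(b, set()).add(a)
        if label_b:
            self.labels[b] = label_b

    def find(self, label: str):
        """Return the remembered cell carrying `label`, or None."""
        return next((cell for cell, name in self.labels.items() if name == label), None)

    def shortest_path(self, start: Cell, goal: Cell):
        """Breadth-first search over remembered edges only; None if memory holds no route."""
        if start not in self.adj or goal not in self.adj:
            return None
        prev = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:  # walk predecessors back to start
                    path.append(node)
                    node = prev[node]
                return path[::-1]
            for nxt in sorted(self.adj[node]):  # sorted for deterministic tie-breaking
                if nxt not in prev:
                    prev[nxt] = node
                    queue.append(nxt)
        return None


def direction(a: Cell, b: Cell) -> str:
    """Eight-way compass direction from cell a to cell b."""
    dx, dy = b[1] - a[1], a[0] - b[0]  # east and north offsets
    if dx == 0 and dy == 0:
        return "same"
    angle = math.degrees(math.atan2(dy, dx)) % 360  # 0 = east, counter-clockwise
    return ["E", "NE", "N", "NW", "W", "SW", "S", "SE"][int((angle + 22.5) // 45) % 8]


# Usage: replay an exploration into memory, then plan to a remembered POI.
memory = GraphMemory()
route = [(2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
for a, b in zip(route, route[1:]):
    memory.record_move(a, b)
memory.record_move((2, 1), (1, 1))
memory.record_move((1, 1), (0, 1), "cafe")
memory.record_move((0, 1), (0, 2), "museum")
path_to_cafe = memory.shortest_path((2, 0), memory.find("cafe"))
```

In this sketch the planner can only use edges the agent actually traversed, so a cell that was seen but never connected yields `None` rather than a guessed route; that is one simple way to picture planning quality depending on what the memory consolidated.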

  1. Introduction

As foundation models (FMs) and FM agents are increasingly applied in real-world scenarios, such as navigation assistance (Espada et al., 2025) and embodied task planning (Zhai et al., 2025), the need to assess their ability to understand, reason about, and operate within spatial environments has grown rapidly (Xu et al., 2024; Cai et al., 2025). Researchers in computer science, for example, have begun to explore this challenge in physical or perceptual spaces, using images, videos, and embodied simulations to test how FMs or FM agents perceive and act in spatially grounded tasks. Several benchmarks have been developed for this purpose, including VSI-Bench (Yang et al., 2025b), EgoSchema (Mangalam et al., 2023), and OpenEQA (Majumdar et al., 2024). However, geographic or map space is another crucial type of spatial environment. It abstracts real-world structures into symbolic, organized forms, supporting applications such as geographic analysis, environmental planning, and spatial communication (Ni & Wang, 2024). Understanding how FMs interpret and reason in such map-based contexts has therefore become an important emerging topic (Wang et al., 2024; Janowicz et al., 2025). In response, the GIS community has initiated several efforts to evaluate the spatial understanding of FMs through geographic data or map-based tasks. One major research direction centers on language-based spatial reasoning, where models are tested through tasks such as question answering, geographic knowledge extraction, spatial relation analysis, and code gener

This content is AI-processed based on open access ArXiv data.