Privacy-Preserving Synthetic Dataset of Individual Daily Trajectories for City-Scale Mobility Analytics


📝 Original Info

  • Title: Privacy-Preserving Synthetic Dataset of Individual Daily Trajectories for City-Scale Mobility Analytics
  • ArXiv ID: 2512.17239
  • Date: 2025-12-19
  • Authors: Jun’ichi Ozaki, Ryosuke Susuta, Takuhiro Moriyama, Yohei Shida

📝 Abstract

Urban mobility data are indispensable for urban planning, transportation demand forecasting, pandemic modeling, and many other applications; however, individual mobile phone-derived Global Positioning System traces cannot generally be shared with third parties owing to severe re-identification risks. Aggregated records, such as origin-destination (OD) matrices, offer partial insights but fail to capture the key behavioral properties of daily human movement, limiting realistic city-scale analyses. This study presents a privacy-preserving synthetic mobility dataset that reconstructs daily trajectories from aggregated inputs. The proposed method integrates OD flows with two complementary behavioral constraints: (1) dwell-travel time quantiles that are available only as coarse summary statistics and (2) the universal law for the daily distribution of the number of visited locations. Embedding these elements in a multi-objective optimization framework enables the reproduction of realistic distributions of human mobility while ensuring that no personal identifiers are required. The proposed framework is validated in two contrasting regions of Japan: (1) the 23 special wards of Tokyo, representing a dense metropolitan environment; and (2) Fukuoka Prefecture, where urban and suburban mobility patterns coexist. The resulting synthetic mobility data reproduce dwell-travel time and visit frequency distributions with high fidelity, while deviations in OD consistency remain within the natural range of daily fluctuations. The results of this study establish a practical synthesis pathway under real-world constraints, providing governments, urban planners, and industries with scalable access to high-resolution mobility data for reliable analytics without the need for sensitive personal records, and supporting practical deployments in policy and commercial domains.
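To make the idea of combining the three constraints concrete, the following is a minimal sketch of one possible scoring function. It is not the authors' implementation: the weighted-sum scalarization, the error measures (relative L1 gap, relative quantile gap, total variation distance), and every function and parameter name below are assumptions introduced for illustration. The abstract states only that OD flows, dwell-travel time quantiles, and the visit-count distribution are embedded in a multi-objective optimization framework.

```python
from collections import Counter


def od_error(trips, target_od):
    """Relative L1 gap between synthetic OD counts and a target OD matrix."""
    counts = Counter(trips)
    keys = set(counts) | set(target_od)
    total = sum(target_od.values())
    return sum(abs(counts.get(k, 0) - target_od.get(k, 0)) for k in keys) / total


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def quantile_error(durations, target_quantiles):
    """Mean relative gap between synthetic and target quantiles."""
    gaps = [abs(quantile(durations, q) - t) / t for q, t in target_quantiles.items()]
    return sum(gaps) / len(gaps)


def visit_count_error(visit_counts, target_pmf):
    """Total variation distance between empirical and target visit-count distributions."""
    n = len(visit_counts)
    emp = Counter(visit_counts)
    keys = set(emp) | set(target_pmf)
    return 0.5 * sum(abs(emp.get(k, 0) / n - target_pmf.get(k, 0.0)) for k in keys)


def combined_objective(trips, durations, visit_counts,
                       target_od, target_quantiles, target_pmf,
                       weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three error terms (an assumed scalarization)."""
    w_od, w_q, w_v = weights
    return (w_od * od_error(trips, target_od)
            + w_q * quantile_error(durations, target_quantiles)
            + w_v * visit_count_error(visit_counts, target_pmf))


# Toy inputs (hypothetical): zone labels, durations in minutes, visits per day.
trips = [("A", "B"), ("A", "B"), ("B", "A")]
score = combined_objective(
    trips,
    durations=[10, 20, 30, 40, 50],
    visit_counts=[1, 2, 2, 3],
    target_od={("A", "B"): 2, ("B", "A"): 1},
    target_quantiles={0.5: 30},
    target_pmf={1: 0.25, 2: 0.5, 3: 0.25},
)
print(score)
```

A score of zero means the synthetic sample matches all three aggregated targets in this toy setup. In practice the weights would need tuning, and a true multi-objective method might track the three terms separately rather than summing them.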

📄 Full Content

This work was supported in part by GEOTRA Co., Ltd. through a collaborative research agreement.
† These authors contributed equally to this work.
‡ Corresponding author: shida@sk.tsukuba.ac.jp

Location data serve as an indispensable foundation for decision-making across diverse domains, including urban transportation, logistics, optimization of commercial activities, disaster prevention, and finance [1], [2]. Their economic value is substantial: the Global Positioning System (GPS) is estimated to have generated approximately USD 1.4 trillion in cumulative economic benefits for the U.S. private sector between 1984 and 2017 [3]. Furthermore, market forecasts indicate that revenues in the downstream Global Navigation Satellite System (GNSS) market, which includes devices and services, reached EUR 260 billion in 2023 and are expected to double to EUR 580 billion by 2033 [4]. Within this expansive market, mobile phone-derived location data have attracted particular attention in recent years, owing to their broad population coverage and high spatio-temporal resolution. In addition to established applications, such as urban transportation planning and commercial site selection, there are expanding uses in new domains, including credit scoring and insurance underwriting [5]. However, high-precision GPS data, which can capture detailed individual trajectories at scale, entail significant re-identification risks. Furthermore, when firms provide such data to third parties, they must operate under strict constraints imposed by national privacy laws. Many countries classify mobile phone-derived location data as personal or sensitive personal information and require data providers to implement measures that effectively eliminate re-identification risks [6]-[9]. Service providers typically combine spatial coarsening (e.g., aggregation into 1 km grids or administrative units), temporal coarsening (e.g., rounding to time intervals or days), and anonymization techniques (e.g., k-anonymity or geomasking) to de-identify data before providing them to third parties [10]. However, previous research has shown that as few as four spatio-temporal points can uniquely identify an individual [11], demonstrating that anonymization alone cannot sufficiently mitigate re-identification risks [12]. As a result, coarse-grained data not only hinder organizational operations and detailed analyses but also fail to provide adequate privacy protection. Consequently, companies and municipalities cannot obtain high-resolution individual-level data, including attributes such as age, sex, and behavioral sequences, which makes it difficult to improve the accuracy of advanced urban planning and demand forecasting.
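As an illustration of the de-identification steps described above, the sketch below snaps points to a coarse grid, rounds timestamps to hourly slots, and suppresses any cell-slot pair observed for fewer than k distinct users. This is a simplified count-threshold rule in the spirit of k-anonymity, not a complete k-anonymity algorithm and not any provider's actual pipeline. The 0.01-degree cell (roughly 1 km in latitude), the one-hour slot, and all names are assumptions.

```python
import math
from collections import defaultdict


def coarsen(point, cell_deg=0.01, slot_seconds=3600):
    """Map (user, lat, lon, unix_ts) to (user, grid_cell, time_slot)."""
    user, lat, lon, ts = point
    # Spatial coarsening: index of the grid cell containing the point.
    cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
    # Temporal coarsening: index of the time slot containing the timestamp.
    return user, cell, ts // slot_seconds


def k_anonymous_counts(points, k=2, cell_deg=0.01, slot_seconds=3600):
    """Count distinct users per (cell, slot); drop groups with fewer than k users."""
    users = defaultdict(set)
    for p in points:
        user, cell, slot = coarsen(p, cell_deg, slot_seconds)
        users[(cell, slot)].add(user)
    return {key: len(u) for key, u in users.items() if len(u) >= k}


# Hypothetical sample: two users share a cell and hour, a third is alone.
points = [
    ("u1", 35.6812, 139.7671, 1000),
    ("u2", 35.6849, 139.7655, 2000),
    ("u3", 35.7051, 139.7012, 1500),
]
released = k_anonymous_counts(points, k=2)
print(released)
```

Only the aggregated count for the shared cell survives; the lone user's record is suppressed. As the text notes, this kind of coarsening trades away analytic detail and still does not by itself rule out re-identification when several spatio-temporal points can be linked.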

As a promising approach to overcoming these constraints, synthetic mobility data, defined as data that simulate statistically consistent virtual individuals and their daily trajectories, have recently gained increasing attention in both academia [13], [14] and industry [15]. Synthetic data are artificially generated data that preserve the statistical and structural properties of the observed datasets. In the context of human mobility, such data enable the use of detailed trajectories while safeguarding privacy. In domains such as healthcare and finance, synthetic data have already been widely adopted [16], [17]. The synthetic data market is expected to grow at an annual average rate of approximately 30% until 2030 [18], [19] and is increasingly regarded as a foundation for enabling detailed analysis and planning under strict privacy constraints. In the human mobility domain, a representative early industrial example is Replica in the United States, whose synthetic mobility data have been widely used by U.S. municipalities for transportation policy and commercial site strategy [15]. However, many existing approaches depend heavily on specific database schemas or use cases, making it difficult to guarantee performance outside these contexts.

GEOTRA Co., Ltd. provides synthetic mobility data that include one-day mobility trajectories and activity information annotated with sex and age attributes [20]. The data have been adopted by municipalities (including the Tokyo Metropolitan Government) and companies in diverse industries, such as construction, electric power, finance, transportation, and manufacturing, and have been used in urban policy-making and business strategy. The company was established as a joint venture between the KDDI Corporation and Mitsui & Co., Ltd. [21], [22]. Under strict privacy policies mandated by its parent companies, although the final outputs are synthetic, the inputs are restricted to origin-destination (OD) matrices aggregated by sex and age from mobile phone-derived GPS-based mobility data collected by KDDI. Consequently, the consistency criteria used in the generation process are skewed toward the OD metrics, making it difficult to thoroughly validate multiple indicators that capture human-like behaviora

This content is AI-processed based on open access ArXiv data.
