Estimating hourly population distribution change at high spatiotemporal resolution in urban areas using geo-tagged tweets, land use data, and dasymetric maps

This paper introduces a spatiotemporal analysis framework for estimating hourly changes in population distribution in urban areas using geo-tagged tweets (messages containing users’ physical locations), land use data, and dasymetric maps. We collected geo-tagged tweets within San Diego County over one year (2015) using Twitter’s Streaming Application Programming Interfaces (APIs). A semi-manual Twitter content verification procedure was first applied for data cleaning, separating tweets created by humans from those created by non-human users (bots). The next step was to calculate the number of unique Twitter users every hour within two different geographical units: (1) census blocks, and (2) the 1 km by 1 km grid cells of LandScan. The final step was to estimate the actual dynamic population by transforming the number of unique Twitter users in each census block or grid cell into estimated population densities using spatial and temporal variation factors. The temporal factor was based on hourly frequency changes of unique Twitter users within San Diego County, CA. The spatial factor was estimated using the dasymetric method with land use maps and 2010 census data. Several comparison maps were created to visualize the spatiotemporal pattern changes of the dynamic population distribution.

💡 Research Summary

The paper presents a novel spatiotemporal framework for estimating hourly population distribution in an urban environment by fusing geo‑tagged Twitter data, land‑use information, and dasymetric mapping techniques. The study area is San Diego County, California, and the data collection period spans the full calendar year of 2015. Using the Twitter Streaming API, the authors harvested 7,884,806 tweets, of which 2,927,301 (≈37 %) were geo‑located inside the county and retained for analysis after discarding tweets lacking coordinates or falling outside the spatial boundary.
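As a rough illustration of the retention step, the sketch below keeps only tweets that carry coordinates and fall inside a rectangle. The `coordinates` field layout, the function names, and the bounding box values are assumptions made for this example; the county boundary is an irregular polygon, so a rectangle only approximates it.

```python
# Illustrative bounding box (west, south, east, north) in decimal degrees.
# These values are rough assumptions, not the boundary used in the study.
BBOX = (-117.6, 32.5, -116.0, 33.6)


def in_bbox(lon, lat, bbox=BBOX):
    """Return True if the point lies inside the rectangle (edges included)."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def keep_geolocated(tweets, bbox=BBOX):
    """Keep tweets that have coordinates and fall inside the bounding box.

    Each tweet is a dict with a hypothetical "coordinates" entry holding
    a (lon, lat) tuple, or None when the tweet has no location.
    """
    kept = []
    for tweet in tweets:
        coords = tweet.get("coordinates")
        if coords is not None and in_bbox(coords[0], coords[1], bbox):
            kept.append(tweet)
    return kept


sample = [
    {"id": 1, "coordinates": (-117.16, 32.72)},  # inside the box
    {"id": 2, "coordinates": None},              # no location attached
    {"id": 3, "coordinates": (-118.24, 34.05)},  # outside the box
]
kept = keep_geolocated(sample)
```

A production version would more likely test each point against the county polygon with a GIS library; the rectangle is used here only to keep the example dependency-free.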

A two‑stage data‑cleaning procedure was applied. First, obvious spam, bot, and “cyborg” accounts were identified through the “source” field in the tweet metadata and a curated blacklist (e.g., job‑posting services, automated news feeds). Approximately 13 % of the raw geo‑tweets were classified as noise and removed, leaving 2,546,385 tweets for further processing.
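A minimal sketch of source-based filtering is shown below. It assumes each tweet is a dict whose `source` value is the HTML anchor string that Twitter attaches to a tweet; the blacklist entries and function names are hypothetical placeholders, not the authors' curated list.

```python
import re

# Hypothetical blacklist entries; the study's curated list is not reproduced here.
BLACKLISTED_SOURCES = {"ExampleJobBoard", "ExampleNewsFeed"}


def source_name(source_field):
    """Strip HTML tags from the source field and return the client name."""
    return re.sub(r"<[^>]+>", "", source_field).strip()


def remove_automated(tweets, blacklist=BLACKLISTED_SOURCES):
    """Drop tweets whose posting client appears on the blacklist."""
    return [t for t in tweets if source_name(t["source"]) not in blacklist]


sample = [
    {"id": 1, "source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'},
    {"id": 2, "source": '<a href="http://example.com" rel="nofollow">ExampleJobBoard</a>'},
]
clean = remove_automated(sample)

# Consistency check on the counts reported in the summary.
removed = 2_927_301 - 2_546_385
removed_share = removed / 2_927_301
```

The reported counts imply 380,916 removed tweets, which matches the stated share of roughly 13 %.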

The core metric is the count of unique Twitter user IDs per spatial unit per hour. The authors deliberately count each user only once per hour within a given polygon, thereby approximating the number of distinct individuals present rather than the volume of messages. Temporal patterns were examined separately for weekdays (Monday–Friday) and weekends (Saturday–Sunday). Weekday activity shows a trough from midnight to 4 am, a gradual rise through the morning, a midday plateau, and a peak around 20:00 h. Weekend activity is generally higher, with a pronounced midday peak near 14:00 h, reflecting leisure and event‑driven behavior.
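The counting rule can be sketched as follows, assuming each record is a `(user_id, unit_id, timestamp)` tuple whose timestamp is already in local time and whose `unit_id` was assigned by a prior spatial join. These names and the record layout are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date, datetime


def day_type(ts):
    """Classify a timestamp as 'weekday' (Mon-Fri) or 'weekend' (Sat-Sun)."""
    return "weekend" if ts.weekday() >= 5 else "weekday"


def unique_users(records):
    """Count distinct users per (unit, date, hour).

    A user who tweets several times in the same unit and hour is counted once.
    """
    users = defaultdict(set)
    for user_id, unit_id, ts in records:
        users[(unit_id, ts.date(), ts.hour)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


records = [
    ("u1", "block_A", datetime(2015, 3, 2, 20, 5)),   # Monday evening
    ("u1", "block_A", datetime(2015, 3, 2, 20, 40)),  # same user, same hour
    ("u2", "block_A", datetime(2015, 3, 2, 20, 15)),
    ("u1", "block_A", datetime(2015, 3, 7, 14, 0)),   # Saturday afternoon
]
counts = unique_users(records)
```

Using a set per key means three tweets from two users in the Monday 20:00 slot yield a count of two, which is the behaviour the metric is meant to capture.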

Two spatial aggregations are employed. (1) U.S. Census blocks, the smallest geographic units used by the decennial census, typically contain fewer than 3,000 residents and align with the Traffic Analysis Zones (TAZ) used in evacuation planning. (2) LandScan 1 km × 1 km raster cells provide a 24‑hour ambient population estimate for the United States. Because some downtown LandScan cells exceed the 3,000‑person threshold, the authors apply a quadtree subdivision: cells are recursively split into four quadrants until each sub‑cell satisfies the population cap, ensuring compatibility with TAZ‑based models.
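The recursive split can be sketched as below. The sketch assumes population is available as weighted points `(x, y, people)` inside a square cell, and it adds a `min_size` guard so recursion always terminates; both the input representation and the guard are assumptions for this example rather than details taken from the paper.

```python
def subdivide(x, y, size, points, cap=3000, min_size=0.125):
    """Split a square cell into quadrants until each holds at most `cap` people.

    x, y     : lower-left corner of the cell (km)
    size     : edge length of the cell (km)
    points   : list of (px, py, people) tuples inside the cell
    min_size : stop splitting below this edge length, so a leaf at this
               size may still exceed the cap in very dense areas
    Returns a list of leaf cells as (x, y, size, population) tuples.
    """
    total = sum(people for _, _, people in points)
    if total <= cap or size <= min_size:
        return [(x, y, size, total)]

    half = size / 2
    leaves = []
    for dx in (0, half):
        for dy in (0, half):
            x0, y0 = x + dx, y + dy
            # Half-open intervals so each point lands in exactly one quadrant.
            inside = [
                (px, py, people)
                for px, py, people in points
                if x0 <= px < x0 + half and y0 <= py < y0 + half
            ]
            leaves.extend(subdivide(x0, y0, half, inside, cap, min_size))
    return leaves


# One 1 km cell holding 6,000 people, concentrated in one corner.
points = [(0.1, 0.1, 2000), (0.2, 0.2, 2000), (0.7, 0.7, 1500), (0.6, 0.1, 500)]
cells = subdivide(0.0, 0.0, 1.0, points)
```

In this toy input only the dense corner is split repeatedly, while the sparse quadrants stay at 0.5 km, which is the adaptive behaviour that makes a quadtree attractive here.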

To translate raw Twitter counts into realistic population densities, the authors introduce a two‑factor scaling model: a temporal factor (T) derived from the hourly proportion of unique users relative to the daily total, and a spatial factor (S) derived from dasymetric redistribution of 2010 census counts using land‑use categories (residential, commercial, institutional, etc.). The final estimated density for a polygon i at hour h is:

$$\widehat{\text{Population}}(i,h) = \text{UniqueUsers}(i,h) \times T(h) \times S(i)$$

where S(i) reflects the proportion of census population assigned to the land‑use class of polygon i. This approach preserves the fine‑grained spatial heterogeneity of land‑use while adjusting for temporal fluctuations captured by Twitter activity.
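A small numeric sketch of the product above follows. It assumes T(h) is the share of the day's unique users observed in hour h and treats S(i) as a precomputed input; the function names and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def temporal_factors(hourly_users):
    """T(h): each hour's unique-user count as a share of the daily total."""
    total = sum(hourly_users.values())
    return {hour: count / total for hour, count in hourly_users.items()}


def estimate_population(unique_users, t_factor, s_factor):
    """Apply UniqueUsers(i,h) * T(h) * S(i) to every (unit, hour) pair."""
    return {
        (unit, hour): n * t_factor[hour] * s_factor[unit]
        for (unit, hour), n in unique_users.items()
    }


# Hypothetical county-wide unique users for three sampled hours.
hourly_users = {8: 100, 14: 300, 20: 600}
T = temporal_factors(hourly_users)      # shares 0.1, 0.3, 0.6

# Hypothetical spatial factor, standing in for the dasymetric result.
S = {"block_A": 50.0}

# Four unique users seen in block_A during the 20:00 hour.
observed = {("block_A", 20): 4}
estimates = estimate_population(observed, T, S)  # 4 * 0.6 * 50
```

With these inputs the estimate for `block_A` at 20:00 is about 120; how S(i) is actually derived from land-use classes and census counts is left outside the sketch.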

Visualization of the results demonstrates that the 20:00 h slot, which exhibits the highest Twitter activity, produces a spatial pattern closely resembling the 2010 census distribution in residential zones, yet also highlights non‑residential hotspots such as Balboa Park, the San Diego Zoo, major shopping malls, and the international airport, all areas where traditional nighttime census counts would underestimate presence. Comparisons with LandScan reveal that the raster’s coarser resolution smooths out these localized peaks, confirming the advantage of the block‑level dasymetric approach for urban‑scale analyses.

A case study of Qualcomm Stadium illustrates the method’s ability to capture event‑driven population dynamics. Weekday tweets peak around 18:00 h, matching typical evening games, while weekend peaks shift earlier (12:00–17:00 h) in line with scheduled events.

The authors acknowledge several limitations: (i) Twitter users are not a random sample of the population, leading to demographic bias; (ii) bot detection is imperfect and may leave residual noise; (iii) the study relies on a single year of data, limiting assessment of inter‑annual variability; and (iv) the dasymetric model assumes static land‑use classifications, which may change over time. Future work is proposed to integrate additional data streams (mobile call detail records, GPS traces, Wi‑Fi logs), employ machine‑learning calibration to correct systematic biases, and develop real‑time dashboards for emergency management and transportation planning.

In summary, the paper demonstrates that geo‑tagged social media, when combined with land‑use based dasymetric mapping and appropriate temporal scaling, can produce high‑resolution, hourly population estimates that complement traditional census and remote‑sensing products, offering valuable insight for urban planners, disaster responders, and researchers studying human mobility.
