Google-based Mode Choice Modeling Approach

Microsimulation-based frameworks have become very popular in many research areas, including travel demand modeling, where activity-based models have been at the center of attention for the past decade. Advanced activity-based models synthesize the entire population of the study region and simulate their activities in a way that keeps track of agents' resources as well as their spatial locations. However, the models built for these frameworks do not take this information into account, mainly because it is not available at the modeling stage. This paper sets out to show the importance of this information by analyzing a travel survey and generating the actual alternatives that individuals had when making their trips. With a focus on transit, the study reveals how transit alternatives are limited or unavailable in certain areas, which must be taken into account in our mode choice models. Statistics on the available alternatives and on the constraints people encounter when making a choice are presented, together with a comprehensive choice-set formation procedure. A mode choice model is then developed based on this approach to demonstrate the importance of such information.
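The idea of a choice set constrained both by the agent's own resources and by the transit network can be sketched as a few deterministic availability rules. This is a minimal illustration, not the paper's procedure: the `Traveler` and `TripNetwork` fields, the rule that car passenger is always available, and the walking and cycling distance thresholds are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Traveler:
    # Hypothetical agent resources tracked by a microsimulation
    has_license: bool
    household_cars: int
    owns_bike: bool


@dataclass
class TripNetwork:
    # Hypothetical network-side facts for one origin-destination pair
    transit_route_found: bool
    walk_distance_m: float
    bike_distance_m: float


MAX_WALK_M = 3000.0   # assumed walking threshold
MAX_BIKE_M = 15000.0  # assumed cycling threshold


def choice_set(person: Traveler, trip: TripNetwork) -> set[str]:
    """Return the modes this person could plausibly have used for this trip."""
    modes = {"car_passenger"}  # assumption: always available
    if person.has_license and person.household_cars > 0:
        modes.add("car_driver")
    if trip.transit_route_found:
        modes.add("transit")
    if trip.walk_distance_m <= MAX_WALK_M:
        modes.add("walk")
    if person.owns_bike and trip.bike_distance_m <= MAX_BIKE_M:
        modes.add("bike")
    return modes
```

A traveler without a license on a 5 km trip with no transit route would, under these assumed rules, be left with only car passenger and (if they own a bike) cycling, whereas a universal-availability model would still offer every mode.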

💡 Research Summary

The paper addresses a long‑standing gap in activity‑based travel demand modeling: the formation of realistic choice sets for each traveler. While modern microsimulation frameworks can synthesize entire populations and track their resources and locations, the mode‑choice models embedded in these frameworks typically assume that all travel modes are uniformly available. In reality, especially for public transit, service coverage, frequency, and network connectivity vary dramatically across space, creating “unavailable” alternatives for many travelers.

To quantify this mismatch, the authors take the 2019 national travel survey and query the Google Maps Directions API for every origin‑destination pair. The API returns feasible routes for walking, cycling, driving, and transit (bus, subway, rail). Applying multi‑objective criteria (shortest time, shortest distance, and fewest transfers), the authors extract three to five candidate routes per mode, each annotated with estimated travel time, distance, cost, and number of transfers. Transit routes are further refined using General Transit Feed Specification (GTFS) data, which supplies schedule and stop information. This process reveals that roughly 27 % of the surveyed trips had no viable public‑transit alternative at the time they were made, a constraint that traditional models, which assume every mode is available, ignore.
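The route-generation step can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' scripts. It builds a Directions API request URL using the documented `origin`, `destination`, `mode`, `alternatives`, `departure_time` and `transit_routing_preference` parameters. It then parses an already-decoded JSON response into time, distance, transfer and fare attributes and computes the share of trips with no route that actually boards transit. For the multi-objective selection, it swaps in a Pareto (non-dominance) filter as one possible reading. Helper names such as `summarize_routes` and `pareto_routes`, counting transfers as transit legs minus one, and treating walk-only transit results as "no transit" are assumptions. Sending the request and caching responses are left out.

```python
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"


def build_request(origin, destination, mode, departure_time, api_key):
    """Build a Directions API request URL that asks for alternative routes."""
    params = {
        "origin": origin,
        "destination": destination,
        "mode": mode,  # "driving", "walking", "bicycling" or "transit"
        "alternatives": "true",
        "key": api_key,
    }
    if mode == "transit":
        # Unix timestamp for the trip's day and time of day (assumption)
        params["departure_time"] = departure_time
        params["transit_routing_preference"] = "fewer_transfers"
    return f"{DIRECTIONS_URL}?{urlencode(params)}"


def summarize_routes(response):
    """Convert a decoded Directions API JSON response into route attributes."""
    if response.get("status") != "OK":
        return []  # e.g. "ZERO_RESULTS": no feasible route for this mode
    summaries = []
    for route in response.get("routes", []):
        legs = route["legs"]
        steps = [step for leg in legs for step in leg.get("steps", [])]
        transit_legs = sum(1 for s in steps if s.get("travel_mode") == "TRANSIT")
        summaries.append({
            "time_min": sum(leg["duration"]["value"] for leg in legs) / 60,
            "distance_km": sum(leg["distance"]["value"] for leg in legs) / 1000,
            "transit_legs": transit_legs,
            "transfers": max(transit_legs - 1, 0),  # assumption: boardings minus one
            "fare": route.get("fare", {}).get("value"),  # only some transit routes
        })
    return summaries


def pareto_routes(summaries, k=5):
    """Keep up to k routes not dominated on time, distance and transfers."""
    keys = ("time_min", "distance_km", "transfers")

    def dominates(a, b):
        return all(a[c] <= b[c] for c in keys) and any(a[c] < b[c] for c in keys)

    front = [r for r in summaries if not any(dominates(o, r) for o in summaries)]
    return sorted(front, key=lambda r: r["time_min"])[:k]


def transit_unavailable_share(transit_responses):
    """Share of trips with no returned route that actually boards transit."""
    missing = sum(
        1 for resp in transit_responses
        if not any(r["transit_legs"] > 0 for r in summarize_routes(resp))
    )
    return missing / len(transit_responses)
```

Treating a walk-only answer to a transit query as "unavailable" matters because the API may return such a route when walking beats any transit option; whether the paper does the same is not stated.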

With these empirically generated choice sets, the authors build a multinomial logit (MNL) model. In addition to standard socio‑demographic variables (income, age, trip purpose), they introduce explanatory variables that describe the alternatives each traveler actually faced: a binary transit‑availability flag, estimated travel time, monetary cost, and transfer count. Variables are selected with LASSO regularization, and model performance is evaluated via 10‑fold cross‑validation. Compared with a baseline MNL that assumes every mode is available to every traveler, the enriched model improves overall prediction accuracy by 4.3 percentage points and transit‑mode prediction accuracy by 9.1 percentage points. This demonstrates that realistic choice‑set construction directly improves model fidelity.
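One standard way for availability to enter an MNL is through availability indicators: the probability that traveler n chooses mode i is P(i) = A_in exp(V_in) / Σ_j A_jn exp(V_jn), where A_jn is 1 if mode j is in that traveler's choice set and 0 otherwise. The universal-availability baseline corresponds to setting every A_jn to 1. The summary describes the transit flag as an explanatory variable, so using it to restrict the choice set, as below, is an assumption about one way it could be applied. The linear utility specification and coefficient names are hypothetical, and the LASSO selection and 10-fold cross-validation are not shown.

```python
import math


def linear_utility(asc, betas, attributes):
    """Systematic utility V = ASC + sum_k beta_k * x_k (hypothetical specification)."""
    return asc + sum(beta * attributes[name] for name, beta in betas.items())


def mnl_probabilities(utilities, available):
    """MNL probabilities with availability indicators.

    utilities: dict mode -> V; available: dict mode -> bool (A).
    Unavailable modes get probability 0 and drop out of the denominator.
    """
    modes = [m for m in utilities if available.get(m, False)]
    if not modes:
        raise ValueError("empty choice set")
    v_max = max(utilities[m] for m in modes)  # shift for numerical stability
    expv = {m: math.exp(utilities[m] - v_max) for m in modes}
    total = sum(expv.values())
    return {m: expv.get(m, 0.0) / total for m in utilities}


def accuracy(trips):
    """Share of (utilities, available, chosen) trips whose most probable mode was chosen."""
    hits = 0
    for utilities, available, chosen in trips:
        probs = mnl_probabilities(utilities, available)
        hits += max(probs, key=probs.get) == chosen
    return hits / len(trips)


# Toy usage: transit has the highest utility but is unavailable to this traveler
u = {"car": 0.5, "transit": 0.8, "walk": -1.0}
baseline = mnl_probabilities(u, {m: True for m in u})
restricted = mnl_probabilities(u, {"car": True, "transit": False, "walk": True})
```

In the toy example, the baseline predicts transit for a traveler who could not have taken it, while the restricted model predicts car; this is the kind of misprediction that realistic choice sets are meant to remove.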

The paper also showcases a policy‑evaluation exercise. By simulating hypothetical interventions—adding two bus routes in a suburban area, extending an existing rail line, or introducing a high‑speed bus service—the authors re‑run the Google‑based route generation for the affected zones. The resulting change in the number and quality of alternatives is quantified, showing, for example, an 18 % increase in transit availability and a 5.4 % rise in the probability of choosing transit for the targeted suburb. Such granular, data‑driven impact assessment would be impossible with a static, assumption‑based choice set.
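The before/after comparison in the policy exercise can be sketched by scoring the same trips under the baseline choice sets and under the scenario choice sets regenerated for the affected zones. This is an illustration only. It assumes MNL utilities are held fixed between scenarios. Because the summary does not say whether its 18 % and 5.4 % figures are relative changes or percentage-point differences, the sketch reports the probability change both ways. Function and key names are hypothetical.

```python
import math


def transit_probability(utilities, available):
    """MNL probability of choosing transit, restricted to available modes."""
    if not available.get("transit", False):
        return 0.0  # transit not in the choice set
    modes = [m for m in utilities if available.get(m, False)]
    v_max = max(utilities[m] for m in modes)  # shift for numerical stability
    total = sum(math.exp(utilities[m] - v_max) for m in modes)
    return math.exp(utilities["transit"] - v_max) / total


def scenario_change(trips_before, trips_after):
    """Compare transit availability and mean transit probability.

    Each list holds (utilities, available) pairs for the same trips, in the
    same order, under the baseline and the intervention scenario.
    """
    n = len(trips_before)
    avail_before = sum(a.get("transit", False) for _, a in trips_before)
    avail_after = sum(a.get("transit", False) for _, a in trips_after)
    p_before = sum(transit_probability(u, a) for u, a in trips_before) / n
    p_after = sum(transit_probability(u, a) for u, a in trips_after) / n
    return {
        # relative change in the number of trips with a transit alternative
        # (None when no trip had transit in the baseline)
        "availability_change_pct": (
            100 * (avail_after - avail_before) / avail_before if avail_before else None
        ),
        # absolute (percentage-point) and relative change in mean transit probability
        "transit_prob_change_pp": 100 * (p_after - p_before),
        "transit_prob_change_pct": (
            100 * (p_after - p_before) / p_before if p_before else None
        ),
    }
```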

Key contributions are: (1) a reproducible, API‑driven workflow that transforms limited survey data into high‑quality, location‑specific alternative sets; (2) empirical evidence that incorporating availability indicators into mode‑choice models yields measurable accuracy gains; (3) a framework for integrating the same alternative‑generation engine into policy scenario analysis, thereby linking infrastructure changes to traveler behavior in a transparent way; and (4) a low‑cost, open‑source implementation (Python scripts that use Google's free API tier) that can be adapted to other regions or countries.

Limitations include reliance on Google’s routing heuristics, which may not fully capture individual preferences, and the absence of real‑time congestion or incident data. For future work, the authors suggest incorporating live traffic feeds, exploring machine‑learning classifiers that predict mode choice directly from the generated routes, and extending the methodology to multimodal journeys that combine, for instance, bike‑share and transit.

In sum, the study demonstrates that realistic, data‑driven choice‑set formation is both feasible and beneficial. By bridging the gap between survey‑based demand estimation and the spatial heterogeneity of actual transport networks, the proposed Google‑based approach offers a practical path toward more accurate, policy‑relevant activity‑based travel models.