Origin-destination (OD) flow prediction remains a core task in GIS and urban analytics, yet practical deployments face two conflicting needs: high accuracy and clear interpretability. This paper develops AMBIT, a gray-box framework that augments physical mobility baselines with interpretable tree models. We begin with a comprehensive audit of classical spatial interaction models on a year-long, hourly NYC taxi OD dataset. The audit shows that most physical models are fragile at this temporal resolution; PPML gravity is the strongest physical baseline, while constrained variants improve when calibrated on full OD margins but remain notably weaker. We then build residual learners on top of physical baselines using gradient-boosted trees and SHAP analysis, demonstrating that (i) physics-grounded residuals approach the accuracy of a strong tree-based predictor while retaining interpretable structure, and (ii) POI-anchored residuals are consistently competitive and most robust under spatial generalization. We provide a reproducible pipeline, rich diagnostics, and spatial error analysis designed for urban decision-making.
Urban mobility modeling is foundational for transport planning, infrastructure allocation, and real-time operations. Traditional spatial interaction models (e.g., gravity, radiation, intervening opportunities) are attractive because they are grounded in physical intuitions such as distance decay and mass attraction. However, these models often underperform on modern, high-resolution mobility data. Conversely, deep models achieve high accuracy but are frequently criticized for opaque decision logic and heavy compute requirements. Recent GIS work increasingly turns to physics-guided or gravity-inspired learning models, showing that hybrid approaches can be both practical and impactful [1,2,3]. At the same time, surveys and critiques emphasize the need for interpretable, efficient models and for more realistic evaluation of spatial interaction baselines [4,5,6].
AMBIT addresses this gap by explicitly structuring prediction into a physical baseline plus a learnable residual. The physical baseline encodes distance and attraction (interpretable), while the residual captures deviations due to local urban structure and temporal effects. We evaluate this approach at city scale with hourly OD flows and provide a detailed audit of physical models to clarify what does and does not work at this resolution.
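The baseline-plus-residual structure can be illustrated with a minimal sketch. It assumes a multiplicative correction learned on a log1p scale, which the text above does not specify, and it uses a per-hour mean residual as a hypothetical stand-in for the gradient-boosted tree learner. All function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def residual_targets(y: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Residual the learner must explain: observed vs. physical baseline on a log1p scale (assumed)."""
    return [math.log1p(yi) - math.log1p(bi) for yi, bi in zip(y, baseline)]


def fit_group_mean(groups: Sequence[Hashable], targets: Sequence[float]) -> Dict[Hashable, float]:
    """Hypothetical stand-in for a tree model: average residual per group (e.g. hour of day)."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for g, t in zip(groups, targets):
        sums[g] += t
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def combine(baseline: Sequence[float], residual_pred: Sequence[float]) -> List[float]:
    """Final prediction: apply the learned correction to the physical baseline, back on the count scale."""
    return [math.expm1(math.log1p(b) + r) for b, r in zip(baseline, residual_pred)]


# Toy example: the baseline underpredicts at hour 8 and is exact at hour 14.
baseline = [2.0, 2.0, 5.0, 5.0]
observed = [3.0, 3.0, 5.0, 5.0]
hours = [8, 8, 14, 14]
model = fit_group_mean(hours, residual_targets(observed, baseline))
preds = combine(baseline, [model[h] for h in hours])
print([round(p, 6) for p in preds])  # → [3.0, 3.0, 5.0, 5.0]
```

Swapping `fit_group_mean` for a gradient-boosted regressor on zone, POI, and time features keeps the same interface. The physical baseline stays explicit, and only the correction is learned.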
Spatial interaction baselines. Gravity and radiation models remain canonical baselines for OD flows because of their physical interpretation and parameter parsimony [7]. However, multiple studies show these baselines can fail to explain temporal dynamics or achieve accurate predictions when evaluated rigorously [5,6].
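For reference, here is a minimal sketch of the two canonical baselines. The gravity form assumes power-law distance decay; exponential decay is an equally common choice. The radiation model follows the parameter-free formulation of Simini et al. (2012), in which s_ij is the total mass within distance d_ij of origin i, excluding i and j. Masses, distances, and function names are illustrative assumptions, not the paper's calibrated setup.

```python
from typing import List, Sequence


def gravity(m_i: float, n_j: float, d_ij: float,
            k: float = 1.0, alpha: float = 1.0, beta: float = 1.0, gamma: float = 2.0) -> float:
    """Unconstrained gravity: T_ij = k * m_i^alpha * n_j^beta * d_ij^(-gamma)."""
    return k * m_i ** alpha * n_j ** beta * d_ij ** (-gamma)


def radiation(outflow_i: float, i: int, masses: Sequence[float],
              dist: Sequence[Sequence[float]]) -> List[float]:
    """Radiation-model flows from origin i to every zone j (0 for j == i).

    T_ij = T_i * m_i * m_j / ((m_i + s_ij) * (m_i + m_j + s_ij)), where s_ij sums the
    masses of zones k != i, j that are strictly closer to i than j is.
    """
    m_i = masses[i]
    flows = []
    for j, m_j in enumerate(masses):
        if j == i:
            flows.append(0.0)
            continue
        s_ij = sum(m_k for k, m_k in enumerate(masses)
                   if k not in (i, j) and dist[i][k] < dist[i][j])
        flows.append(outflow_i * m_i * m_j / ((m_i + s_ij) * (m_i + m_j + s_ij)))
    return flows


# Three zones on a line at x = 0, 1, 2.
masses = [10.0, 20.0, 30.0]
xs = [0.0, 1.0, 2.0]
dist = [[abs(a - b) for b in xs] for a in xs]
print(gravity(10.0, 20.0, 2.0))          # → 50.0
print(radiation(60.0, 0, masses, dist))  # → [0.0, 40.0, 10.0]
```

In this finite example, the flows leaving zone 0 sum to 50 rather than 60. One common finite-size correction divides them by (1 - m_i/M), where M is the total mass.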
Hybrid and deep OD models. Recent GIS and remote sensing literature reports gravity-inspired deep networks and physics-guided spatiotemporal attention models [2,1,3]. Other works apply GCN/GRU variants and encoder-decoder-style OD predictors with strong accuracy but limited interpretability [8]. Because OD data are sparse and zero-inflated, many deep models adopt count-likelihood objectives; we complement these with negative binomial and zero-inflated Poisson baselines in our appendix. These studies motivate a gray-box alternative that delivers interpretability and CPU efficiency while remaining competitive.
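The count likelihoods behind the negative binomial and zero-inflated Poisson baselines can be written directly with the standard library. This sketch assumes a mean/size parameterization of the negative binomial (Var = mu + mu²/r) and a single zero-inflation probability pi. The appendix specification is not reproduced here, so both choices are assumptions.

```python
import math


def poisson_logpmf(y: int, lam: float) -> float:
    """log P(Y = y) for Y ~ Poisson(lam)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def nb_logpmf(y: int, mu: float, r: float) -> float:
    """Negative binomial with mean mu and size r (Var = mu + mu**2 / r); tends to Poisson as r grows."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log1p(-mu / (r + mu))       # r * log(r / (r + mu)), computed stably
            + y * (math.log(mu) - math.log(r + mu)))


def zip_logpmf(y: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: structural zero with probability pi, otherwise Poisson(lam)."""
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return math.log1p(-pi) + poisson_logpmf(y, lam)


# Sanity checks: a very large size r recovers the Poisson, and ZIP probabilities sum to one.
print(abs(nb_logpmf(3, 2.0, 1e7) - poisson_logpmf(3, 2.0)) < 1e-5)               # → True
print(round(sum(math.exp(zip_logpmf(y, 2.0, 0.3)) for y in range(60)), 9))      # → 1.0
```

Summing the negated log-probabilities over OD-hour counts gives the training objective for either baseline. The dispersion r and inflation pi control how much extra mass sits at zero and in the tail relative to a Poisson model with the same mean.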
Physics-enhanced residual learning. Recent theory formalizes why residualizing against a physics-based baseline can improve data efficiency and interpretability while preserving predictive accuracy [9]. AMBIT instantiates this idea for OD prediction by treating a physical model as the explicit backbone and learning a structured residual.
OD-as-node and completion-based approaches. Beyond grid-based spatiotemporal models, a growing line of work represents OD pairs (or stations) as nodes in multi-graphs and learns OD dynamics with graph neural networks and encoder-decoder architectures, including residual multi-graph convolutional models and inductive multi-graph representation learning [10,11]. Related work also explores OD prediction via convolutional architectures and matrix-style completion/generation, including channel-attentive CNNs for OD demand [12], probabilistic GNN generators for sparse OD flows [13], and gravity-guided neural decoders/generative models [14]. AMBIT is complementary to these approaches: rather than replacing the physical baseline, it uses the baseline as an explicit, interpretable backbone and learns a structured residual to preserve transparency and CPU-feasible deployment.
PPML gravity and fixed effects. In econometrics, Poisson pseudo-maximum likelihood (PPML) is a standard estimator for gravity models with heteroskedastic counts, and fixed effects (including high-dimensional variants) are commonly used to control for unobserved heterogeneity [15,16,17]. We treat FE-PPML as a robustness check on subsamples and briefly discuss incidental-parameter concerns in the Appendix.
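PPML solves the Poisson score equations X'(y - exp(Xb)) = 0 without requiring y to be integer or truly Poisson-distributed. The following is a dependency-free sketch using damped Newton steps. The design matrix [1, log O_i, log D_j, log d_ij], the synthetic data, and all names are illustrative assumptions. A fixed-effects variant would add origin and destination indicator columns, and at city scale a dedicated high-dimensional FE estimator would replace this loop.

```python
import math
from typing import List, Sequence


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(M[row][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ppml_fit(X: Sequence[Sequence[float]], y: Sequence[float],
             max_iter: int = 100, tol: float = 1e-10) -> List[float]:
    """Poisson pseudo-maximum likelihood via damped Newton; column 0 of X must be the intercept."""
    p = len(X[0])

    def eta(b):  # linear predictors, capped so trial steps cannot overflow math.exp
        return [min(sum(xk * bk for xk, bk in zip(x, b)), 700.0) for x in X]

    def pseudo_loglik(b):
        return sum(yi * e - math.exp(e) for yi, e in zip(y, eta(b)))

    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    ll = pseudo_loglik(beta)
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta(beta)]
        grad = [sum((yi - mi) * x[a] for x, yi, mi in zip(X, y, mu)) for a in range(p)]
        hess = [[sum(mi * x[a] * x[c] for x, mi in zip(X, mu)) for c in range(p)] for a in range(p)]
        step = _solve(hess, grad)
        t = 1.0
        while True:  # halve the step until the concave pseudo-likelihood does not decrease
            cand = [b + t * s for b, s in zip(beta, step)]
            cand_ll = pseudo_loglik(cand)
            if cand_ll >= ll or t < 1e-8:
                break
            t /= 2.0
        beta, ll = cand, cand_ll
        if max(abs(t * s) for s in step) < tol:
            break
    return beta


# Noise-free synthetic flows from known coefficients; PPML should recover them exactly.
true_beta = [0.5, 1.0, 0.8, -1.5]
X, y = [], []
for i, o in enumerate([10.0, 20.0, 40.0, 80.0]):
    for j, d in enumerate([15.0, 30.0, 60.0]):
        row = [1.0, math.log(o), math.log(d), math.log(1.0 + i + j)]
        X.append(row)
        y.append(math.exp(sum(r * b for r, b in zip(row, true_beta))))
print([round(b, 4) for b in ppml_fit(X, y)])  # → [0.5, 1.0, 0.8, -1.5]
```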
Recent work on monotone gradient-boosted trees and interpretable additive tree models quantifies the accuracy cost of enforcing monotonicity and proposes domain-informed monotone regularizers [18,19,20]. These studies motivate our monotone residual experiment as a practical interpretability stress test.
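One simple way to audit a monotone residual model is to sweep one feature over a grid while holding the others fixed, then check that predictions never move in the forbidden direction. This sketch assumes the fitted model is exposed as a plain function of a feature dictionary, and the feature names (`log_dist`, `poi_density`) are hypothetical. Libraries such as LightGBM and XGBoost enforce monotonicity during training through a `monotone_constraints` parameter. This check only verifies it after the fact, and only along the sampled grid.

```python
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


def violates_monotonicity(predict: Callable[[Features], float], base: Features,
                          feature: str, grid: Sequence[float],
                          increasing: bool = False, tol: float = 1e-12) -> bool:
    """Return True if predictions move against the required direction along `grid`.

    All other features are held at their values in `base`; `grid` must be sorted ascending.
    """
    preds = []
    for value in grid:
        x = dict(base)
        x[feature] = value
        preds.append(predict(x))
    for prev, cur in zip(preds, preds[1:]):
        change = cur - prev
        if (increasing and change < -tol) or (not increasing and change > tol):
            return True
    return False


def smooth(x: Features) -> float:
    """Hypothetical residual: decreasing in log distance, increasing in POI density."""
    return 1.0 - 0.3 * x["log_dist"] + 0.1 * x["poi_density"]


def bumpy(x: Features) -> float:
    """Same model plus a local bump, mimicking a non-monotone tree split."""
    return smooth(x) + (0.5 if 1.0 < x["log_dist"] < 2.0 else 0.0)


base = {"log_dist": 0.0, "poi_density": 2.0}
grid = [0.5 * k for k in range(8)]  # 0.0, 0.5, ..., 3.5
print(violates_monotonicity(smooth, base, "log_dist", grid))  # → False
print(violates_monotonicity(bumpy, base, "log_dist", grid))   # → True
```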
We use NYC TLC Yellow Taxi data from 2024-12 to 2025-11 (12 months). Trips are aggregated to hourly OD flows between taxi zones using TLC lookup tables and zone boundaries. POI features are sourced from OpenStreetMap and aggregated to taxi zones.
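Hourly aggregation with the trip-level and pair-level filters can be sketched in plain Python. Field names follow the TLC Yellow Taxi data dictionary (`tpep_pickup_datetime`, `tpep_dropoff_datetime`, `trip_distance`, `PULocationID`, `DOLocationID`). TLC reports `trip_distance` in miles, so the sketch converts to kilometres before applying the 0.1-100 km bound. The thresholds (1-180 minutes, at least 200 training-period trips, top 30,000 pairs) come from the preprocessing description. Binning each trip by its pickup hour, representing trips as dictionaries, and the zone IDs in the example are assumptions.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

MILES_TO_KM = 1.609344  # TLC trip_distance is reported in miles


def hourly_od_counts(trips: Iterable[dict]) -> Counter:
    """Count filtered trips per (pickup hour, origin zone, destination zone)."""
    counts: Counter = Counter()
    for t in trips:
        pickup, dropoff = t["tpep_pickup_datetime"], t["tpep_dropoff_datetime"]
        minutes = (dropoff - pickup).total_seconds() / 60.0
        km = t["trip_distance"] * MILES_TO_KM
        if not (1.0 <= minutes <= 180.0 and 0.1 <= km <= 100.0):
            continue  # drop implausible durations and distances
        hour = pickup.replace(minute=0, second=0, microsecond=0)
        counts[(hour, t["PULocationID"], t["DOLocationID"])] += 1
    return counts


def select_pairs(train_counts: Counter, min_trips: int = 200,
                 top_k: int = 30_000) -> List[Tuple[int, int]]:
    """Keep OD pairs with enough training-period trips, then the top_k by volume."""
    totals: Counter = Counter()
    for (_, o, d), n in train_counts.items():
        totals[(o, d)] += n
    eligible = [(pair, n) for pair, n in totals.items() if n >= min_trips]
    eligible.sort(key=lambda item: (-item[1], item[0]))  # volume descending, ties by zone IDs
    return [pair for pair, _ in eligible[:top_k]]


def trip(pu: datetime, do: datetime, miles: float) -> dict:
    """Build a toy trip record between two arbitrary zones."""
    return {"tpep_pickup_datetime": pu, "tpep_dropoff_datetime": do,
            "trip_distance": miles, "PULocationID": 161, "DOLocationID": 237}


trips = [
    trip(datetime(2024, 12, 1, 8, 15), datetime(2024, 12, 1, 8, 40), 2.0),
    trip(datetime(2024, 12, 1, 8, 50), datetime(2024, 12, 1, 9, 5), 1.0),
    trip(datetime(2024, 12, 1, 9, 0, 0), datetime(2024, 12, 1, 9, 0, 30), 0.5),  # 30 s: filtered out
]
counts = hourly_od_counts(trips)
print(counts[(datetime(2024, 12, 1, 8), 161, 237)])  # → 2
print(select_pairs(counts, min_trips=2))             # → [(161, 237)]
```

Computing `select_pairs` on training-period counts only, and then reusing that pair list for validation and test hours, mirrors the leakage-avoiding choice described in the preprocessing step.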
Scale and distribution. The final dataset contains 15,752,628 OD-hour observations across 8,759 hours, 175 unique origin zones, and 232 unique destination zones (263 zones in the full TLC lookup). The flow distribution is heavy-tailed (median 1, 90th percentile 6, 99th percentile 17, max 153), highlighting the sparsity typical of OD matrices.
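Summary statistics like those above can be computed with a nearest-rank quantile, which always returns an observed count. The paper does not state its quantile definition, so this choice is an assumption. Interpolating definitions, such as the linear default of pandas `Series.quantile`, can return non-integer values for count data.

```python
import math
from typing import Dict, Sequence


def nearest_rank(sorted_values: Sequence[int], q: float) -> int:
    """Smallest observed value with at least a fraction q of observations at or below it."""
    k = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[k - 1]


def flow_summary(flows: Sequence[int]) -> Dict[str, int]:
    """Median, tail percentiles, and maximum of OD-hour flow counts."""
    v = sorted(flows)
    return {"median": nearest_rank(v, 0.50), "p90": nearest_rank(v, 0.90),
            "p99": nearest_rank(v, 0.99), "max": v[-1]}


print(flow_summary(range(1, 101)))  # → {'median': 50, 'p90': 90, 'p99': 99, 'max': 100}
```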
Preprocessing. Trips are filtered by duration (1-180 minutes) and distance (0.1-100 km). OD pairs are filtered using training-period totals only: we retain pairs with at least 200 total trips and then keep up to the top 30,000 pairs by volume; in our d