T-STAR: A Context-Aware Transformer Framework for Short-Term Probabilistic Demand Forecasting in Dock-Based Shared Micro-Mobility

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Reliable short-term demand forecasting is essential for managing shared micro-mobility services and ensuring responsive, user-centered operations. This study introduces T-STAR (Two-stage Spatial and Temporal Adaptive contextual Representation), a novel transformer-based probabilistic framework designed to forecast station-level bike-sharing demand at a 15-minute resolution. T-STAR addresses key challenges in high-resolution forecasting by disentangling consistent demand patterns from short-term fluctuations through a hierarchical two-stage structure. The first stage captures coarse-grained hourly demand patterns, while the second stage improves prediction accuracy by incorporating high-frequency, localized inputs, including recent fluctuations and real-time demand variations in connected metro services, to account for temporal shifts in short-term demand. Time series transformer models are employed in both stages to generate probabilistic predictions. Extensive experiments using Washington D.C.’s Capital Bikeshare data demonstrate that T-STAR outperforms existing methods in both deterministic and probabilistic accuracy. The model exhibits strong spatial and temporal robustness across stations and time periods. A zero-shot forecasting experiment further highlights T-STAR’s ability to transfer to previously unseen service areas without retraining. These results underscore the framework’s potential to deliver granular, reliable, and uncertainty-aware short-term demand forecasts, which enable seamless integration to support multimodal trip planning for travelers and enhance real-time operations in shared micro-mobility services.


💡 Research Summary

The paper introduces T‑STAR (Two‑stage Spatial and Temporal Adaptive contextual Representation), a novel transformer‑based framework for short‑term, station‑level bike‑sharing demand forecasting at a 15‑minute granularity. Recognizing the challenges of high‑resolution forecasting—data sparsity, noise, and the need for uncertainty quantification—the authors design a hierarchical two‑stage architecture that separates coarse, recurring demand patterns from fine‑grained fluctuations.

In the first stage, a temporal transformer ingests hourly‑aggregated network‑wide features (historical demand, weather, calendar effects) to learn broad, day‑level demand expectations for each station. The output is a probabilistic estimate (mean μ₁, variance σ₁²) of the coarse demand.
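As a rough illustration of the hourly aggregation that feeds the first stage, the sketch below sums 15‑minute counts into hourly totals. The function name `aggregate_to_hourly` and the sample counts are hypothetical; the summary does not describe the paper's actual preprocessing.

```python
def aggregate_to_hourly(counts_15min):
    """Sum consecutive groups of four 15-minute counts into hourly totals.

    Assumption: the series starts on an hour boundary and covers whole hours.
    """
    if len(counts_15min) % 4 != 0:
        raise ValueError("expected a whole number of hours (multiple of 4 intervals)")
    return [sum(counts_15min[i:i + 4]) for i in range(0, len(counts_15min), 4)]


# Two hours of made-up station pickups at 15-minute resolution.
hourly = aggregate_to_hourly([1, 0, 2, 3, 4, 4, 1, 0])
print(hourly)  # [6, 9]
```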

The second stage refines these coarse predictions by incorporating high‑frequency, station‑specific contextual signals: recent 15‑minute inflow/outflow, real‑time metro ridership at nearby stations, current bike availability, and up‑to‑the‑minute weather observations. A second transformer, equipped with station‑specific adapter modules in its embedding layer, processes these inputs and produces a refined probabilistic forecast (mean μ₂, variance σ₂²). The two distributions are combined in a Bayesian fashion to yield the final 15‑minute demand distribution.
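The summary does not specify how the two distributions are combined. One common way to merge two Gaussian estimates of the same quantity is a precision‑weighted product of Gaussians, sketched below purely as an assumption. The function `fuse_gaussians` is hypothetical, takes standard deviations (the square roots of the variances), and assumes the coarse estimate has already been rescaled to the same 15‑minute units as the fine estimate.

```python
import math


def fuse_gaussians(mu1, sigma1, mu2, sigma2):
    """Precision-weighted fusion of two Gaussian estimates of one quantity.

    This is an illustrative assumption, not the paper's confirmed rule.
    sigma1 and sigma2 are standard deviations and must be positive.
    """
    prec1 = 1.0 / sigma1 ** 2          # precision = inverse variance
    prec2 = 1.0 / sigma2 ** 2
    var = 1.0 / (prec1 + prec2)        # fused variance is smaller than either input
    mu = var * (prec1 * mu1 + prec2 * mu2)
    return mu, math.sqrt(var)


# Coarse estimate: 4 trips (std 2); fine estimate: 6 trips (std 1).
mu, sigma = fuse_gaussians(4.0, 2.0, 6.0, 1.0)
print(round(mu, 3), round(sigma, 3))  # 5.6 0.894
```

Under this rule the fused mean leans toward whichever stage reports the lower variance, which here is the fine‑grained second stage.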

Training optimizes a joint loss consisting of the negative log‑likelihood of the Gaussian outputs and a CRPS‑based term to directly improve probabilistic calibration.
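A minimal sketch of such a joint loss for a single observation follows, using the standard closed‑form CRPS of a Gaussian. The weighting parameter `crps_weight` and the function names are hypothetical; the summary does not state how the two terms are balanced.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2); sigma is a std dev."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # standard normal density
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))            # standard normal CDF
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


def joint_loss(y, mu, sigma, crps_weight=1.0):
    """NLL plus a weighted CRPS term; crps_weight is an assumed hyperparameter."""
    return gaussian_nll(y, mu, sigma) + crps_weight * gaussian_crps(y, mu, sigma)


# Observed 3 trips against a forecast of mean 4, std 1.5.
print(joint_loss(3.0, 4.0, 1.5))
```

In training, this per‑observation value would be averaged over stations and time steps, and `sigma` would need to be kept positive (for example through a softplus output layer).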

The authors evaluate T‑STAR on Washington D.C.’s Capital Bikeshare dataset (2019‑2022), comparing against a broad suite of baselines: classical SARIMA, tree‑based XGBoost and Random Forest, deep recurrent models (LSTM), graph‑based CNN/GNN approaches, and state‑of‑the‑art transformers such as Temporal Fusion Transformer (TFT) and Interpretable Hierarchical Transformer (IHTF). Evaluation metrics include point‑forecast errors (MAE, RMSE) and probabilistic scores (CRPS, Prediction Interval Coverage Probability, Mean Prediction Interval Width).
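For readers unfamiliar with these metrics, the sketch below computes MAE, RMSE, Prediction Interval Coverage Probability (PICP), and Mean Prediction Interval Width (MPIW) using their standard definitions. The function names and the sample numbers are illustrative only and are not taken from the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error of point forecasts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error of point forecasts."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def picp(y_true, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(1 for t, lo, hi in zip(y_true, lower, upper) if lo <= t <= hi)
    return hits / len(y_true)


def mpiw(lower, upper):
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Made-up demand values and intervals for four 15-minute slots.
y_true = [2, 5, 1, 7]
y_pred = [3, 5, 0, 5]
lower = [1, 4, 2, 4]
upper = [4, 6, 3, 8]

print(mae(y_true, y_pred))          # 1.0
print(picp(y_true, lower, upper))   # 0.75
print(mpiw(lower, upper))           # 2.5
```

PICP and MPIW are read together: a model can reach high coverage trivially by widening its intervals, so a good probabilistic forecast pairs coverage near the nominal level with a small MPIW.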

Results show that T‑STAR consistently outperforms all baselines. It reduces MAE by roughly 12% and CRPS by roughly 16% on average, with especially large gains at low‑demand stations and during peak periods, when demand volatility is highest. The hierarchical design effectively isolates noise, allowing the second stage to focus on genuine short‑term shocks (e.g., sudden metro disruptions or abrupt weather changes).

A zero‑shot transfer experiment further demonstrates the model’s generalizability: after training exclusively on the Capital Bikeshare network, T‑STAR is applied without retraining to five newly opened stations on a university campus. While most competing models suffer severe performance degradation, T‑STAR’s MAE increases only marginally (≈0.18), suggesting that the network‑wide patterns learned in the coarse stage, together with the adapter‑based fine stage, carry over to unseen locations.

Model complexity remains modest: each transformer comprises four encoder layers with eight attention heads, totaling about 12 million parameters. Inference is fast—on a single V100 GPU, 1,000 stations are forecasted for the next 15 minutes in under 0.4 seconds—making the approach suitable for real‑time operational deployment.

The paper discusses limitations, notably the reliance on a Gaussian output which may not capture highly skewed demand distributions, and the fact that the coarse stage aggregates to hourly resolution, potentially lagging behind abrupt, sub‑hourly events. Future work is suggested on richer probabilistic families (e.g., truncated or beta distributions), incorporation of additional multimodal data (bus, ride‑hailing), and online fine‑tuning to continuously adapt to evolving urban dynamics.

In summary, T‑STAR offers a compelling solution for ultra‑high‑resolution, uncertainty‑aware bike‑sharing demand forecasting. By disentangling macro‑level patterns from micro‑level shocks through a two‑stage transformer pipeline, it delivers superior accuracy, calibrated uncertainty estimates, and strong transferability—all within a computationally efficient framework ready for real‑world micro‑mobility management.

