Forecasting the Evolving Composition of Inbound Tourism Demand: A Bayesian Compositional Time Series Approach Using Platform Booking Data
Understanding how the composition of guest origin markets evolves over time is critical for destination marketing organizations, hospitality businesses, and tourism planners. We develop and apply Bayesian Dirichlet autoregressive moving average (BDARMA) models to forecast the compositional dynamics of guest origin market shares using proprietary Airbnb booking data spanning 2017–2025 across four major destination regions. Our analysis reveals substantial pandemic-induced structural breaks in origin composition, with heterogeneous recovery patterns across markets. In our analysis, the BDARMA framework achieves the lowest forecast error for EMEA and competitive performance across destination regions, outperforming standard benchmarks including naïve forecasts, exponential smoothing, and SARIMA on log-ratio transformed data in compositionally complex markets. For EMEA destinations, BDARMA achieves 27% lower forecast error than naïve methods ($p < 0.001$), with the greatest gains where multiple origin markets compete in the 5-25% share range. By modeling compositions directly on the simplex with a Dirichlet likelihood and incorporating seasonal variation in both mean and precision parameters, our approach produces coherent forecasts that respect the unit-sum constraint while capturing complex temporal dependencies. The methodology provides destination stakeholders with probabilistic forecasts of source market shares, enabling more informed strategic planning for marketing resource allocation, infrastructure investment, and crisis response.
💡 Research Summary
This paper addresses the under‑explored problem of forecasting the composition of inbound tourism demand—that is, how the share of visitors from different source markets evolves over time. While most tourism forecasting research focuses on aggregate arrivals or expenditures, the authors argue that the mix of origin markets is crucial for marketing, pricing, infrastructure planning, and crisis management. Because compositional data lie on a simplex (components are non‑negative and sum to one), conventional time‑series methods such as ARIMA, ETS, or SARIMA can produce incoherent forecasts (negative shares or totals exceeding 100%). To respect the unit‑sum constraint, the authors adopt a Bayesian Dirichlet autoregressive moving‑average (BDARMA) framework, which models the observed market‑share vector directly with a Dirichlet likelihood and captures temporal dynamics through a VARMA structure on the isometric log‑ratio (ILR) transformed means.
The empirical setting uses proprietary Airbnb reservation data from January 2017 to December 2024, aggregated to a monthly frequency for four destination regions defined by Airbnb’s internal taxonomy: Europe‑Middle‑East‑Africa (EMEA), North America (NAMER), Asia‑Pacific (APAC) and Latin America (LATAM). For each region, the authors compute the monthly composition of guest origin countries, yielding high‑dimensional compositional time series (typically 20+ source markets per region).
Model specification:
- Observation model: yₜ | μₜ, ϕₜ ∼ Dirichlet(ϕₜ μₜ), where μₜ is the mean composition and ϕₜ is a precision (concentration) parameter.
- ILR transformation: ηₜ = ILR(μₜ) = Vᵀ log(μₜ), mapping the C‑part simplex to ℝ^{C‑1}.
- Temporal dynamics: $\eta_t = X_t\beta + \sum_{p=1}^{P} A_p(\eta_{t-p} - X_{t-p}\beta) + \sum_{q=1}^{Q} B_q \tilde{\varepsilon}_{t-q}$, where $X_t$ contains Fourier terms for seasonality and possible covariates, $A_p$ and $B_q$ are autoregressive and moving‑average coefficient matrices, and $\tilde{\varepsilon}_t$ are centered compositional innovations.
- Seasonal precision: log ϕₜ = zₜᵀγ, with zₜ including an intercept and up to six Fourier harmonics, allowing the dispersion of the Dirichlet distribution to vary across months (e.g., tighter concentration in peak tourism months).
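The ILR mapping and the seasonal precision can be illustrated with a small numerical sketch. This is not the authors' code: the Helmert-type contrast matrix is one common choice of V (the summary does not say which basis is used), and the function names, the four-part composition, and the coefficient vector `gamma` are made-up values for illustration only.

```python
import numpy as np

def helmert_basis(C):
    """C x (C-1) contrast matrix V: orthonormal columns, each summing to zero."""
    V = np.zeros((C, C - 1))
    for j in range(1, C):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

def ilr(mu, V):
    """Map compositions (..., C) to ILR coordinates (..., C-1): eta = V^T log(mu)."""
    return np.log(mu) @ V

def ilr_inv(eta, V):
    """Map ILR coordinates back to the simplex with a numerically stable softmax."""
    z = eta @ V.T
    z = z - z.max(axis=-1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

def fourier_design(t, K, period=12):
    """Intercept plus K sine/cosine pairs evaluated at time indices t."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
V = helmert_basis(4)

# Hypothetical mean composition of four origin markets (illustrative values).
mu = np.array([0.55, 0.25, 0.15, 0.05])
eta = ilr(mu, V)            # three unconstrained coordinates
mu_back = ilr_inv(eta, V)   # recovers mu up to floating-point error

# Seasonal precision: log(phi_t) = z_t^T gamma, here with two harmonics.
Z = fourier_design(np.arange(12), K=2)
gamma = np.array([4.0, 0.3, -0.2, 0.1, 0.0])   # made-up coefficients
phi = np.exp(Z @ gamma)

# One draw from the observation model y_t ~ Dirichlet(phi_t * mu_t).
y = rng.dirichlet(phi[0] * mu)
```

Because every column of V sums to zero, Vᵀ log(μ) equals Vᵀ applied to the centered log-ratio of μ, which is why the inverse reduces to a softmax. Larger values of `phi` concentrate the draws more tightly around `mu`.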
Weakly informative priors are placed on all parameters (Normal(0,1) for β, Normal(0.5,0.3) on AR diagonal elements, etc.). Estimation proceeds via Hamiltonian Monte Carlo in Stan, accessed through the R package “darma”. Four chains of 2,000 iterations (1,000 warm‑up) yield 4,000 posterior draws; convergence is confirmed with R̂ ≈ 1 and adequate effective sample sizes.
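To make the convergence check concrete, the sketch below computes the classic Gelman-Rubin potential scale reduction factor for one scalar parameter from four chains of 1,000 post-warm-up draws. It is an assumption-laden illustration: the chains are synthetic, and Stan's reported diagnostic is a more refined variant of this statistic, so the numbers need not match Stan's output.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic R-hat for an array of shape (M chains, N draws)."""
    chains = np.asarray(chains, dtype=float)
    M, N = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = N * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_plus = (N - 1) / N * W + B / N           # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(2)

# Synthetic stand-in for four well-mixed chains of 1,000 draws each.
mixed = rng.normal(0.5, 0.3, size=(4, 1000))

# Same draws, but one chain is shifted to mimic a chain stuck elsewhere.
stuck = mixed.copy()
stuck[0] += 1.0

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
```

Well-mixed chains give a value close to 1, while the shifted chain inflates the between-chain variance and pushes the statistic well above 1.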
Forecast generation: posterior draws are propagated through the VARMA recursion to obtain h‑step‑ahead η̂, which are inverse‑ILR transformed to μ̂, and then Dirichlet predictive draws are taken using the sampled ϕ̂. Point forecasts are posterior means; interval forecasts use posterior quantiles.
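The forecast recursion can be sketched as follows, under simplifying assumptions that are not taken from the paper: a single autoregressive lag and no moving-average term (P = 1, Q = 0), one sine/cosine pair for seasonality, a last ILR state `eta_last` held fixed across draws, and randomly generated placeholders standing in for real posterior draws. All names are hypothetical.

```python
import numpy as np

def helmert_basis(C):
    """C x (C-1) contrast matrix with orthonormal, zero-sum columns."""
    V = np.zeros((C, C - 1))
    for j in range(1, C):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

def ilr_inv(eta, V):
    """Inverse ILR: back to the simplex via a stable softmax."""
    z = eta @ V.T
    z = z - z.max(axis=-1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

def season(t):
    """Regressor row: intercept plus one annual sine/cosine pair."""
    return np.array([1.0, np.sin(2 * np.pi * t / 12), np.cos(2 * np.pi * t / 12)])

def simulate_path(eta_last, x_last, X_future, beta, A, phi_future, V, rng):
    """One predictive path for one parameter draw (AR(1) in ILR space)."""
    H = X_future.shape[0]
    ys = np.empty((H, V.shape[0]))
    eta_prev, x_prev = eta_last, x_last
    for h in range(H):
        mean_now = X_future[h] @ beta                      # regression mean
        eta = mean_now + A @ (eta_prev - x_prev @ beta)    # AR step on deviations
        mu = ilr_inv(eta, V)                               # mean composition
        ys[h] = rng.dirichlet(phi_future[h] * mu)          # predictive draw
        eta_prev, x_prev = eta, X_future[h]
    return ys

rng = np.random.default_rng(1)
C, H, S = 4, 6, 200                      # parts, horizon, number of draws
V = helmert_basis(C)
x_last = season(0)
X_future = np.stack([season(t) for t in range(1, H + 1)])
eta_last = np.array([0.4, -0.2, 0.1])    # made-up last ILR state

paths = np.empty((S, H, C))
for s in range(S):
    # Placeholder "posterior draws"; a real analysis would use sampler output.
    beta = rng.normal(0.0, 0.3, size=(3, C - 1))
    A = np.diag(rng.normal(0.5, 0.05, size=C - 1))
    phi_future = np.full(H, np.exp(rng.normal(4.0, 0.1)))
    paths[s] = simulate_path(eta_last, x_last, X_future, beta, A, phi_future, V, rng)

point = paths.mean(axis=0)                               # posterior-mean forecast
lower, upper = np.quantile(paths, [0.05, 0.95], axis=0)  # 90% interval per share
```

Each simulated path lies on the simplex, so the averaged point forecast also sums to one at every horizon. The quantile bounds are computed share by share and therefore need not sum to one themselves.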
Performance is evaluated with several compositionally appropriate metrics: mean absolute error (MAE) averaged across components, Aitchison distance, and log predictive density (LPD) for probabilistic calibration. Benchmarks include naïve (last observation), seasonal naïve, 12‑month rolling mean, exponential smoothing (ETS) on ILR‑transformed series, and SARIMA on ILR‑transformed series. For transformed benchmarks, inverse ILR is applied to enforce the unit‑sum constraint. Model comparison within the BDARMA family uses leave‑one‑out cross‑validation (LOO‑CV) with Pareto‑smoothed importance sampling, reporting expected log predictive density (ELPD) and effective number of parameters (p_loo). Statistical significance of forecast differences is tested with the Diebold‑Mariano test employing heteroskedasticity‑ and autocorrelation‑consistent variance estimates.
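Two of these metrics are simple enough to state in code. The sketch below uses the standard definition of the Aitchison distance (Euclidean distance between centered log-ratio vectors) and a component-averaged MAE; the example compositions are made up, and the paper's exact averaging conventions may differ.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform along the last axis."""
    lx = np.log(x)
    return lx - lx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Euclidean distance between clr(x) and clr(y)."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)

def compositional_mae(x, y):
    """Absolute error averaged across components."""
    return np.abs(x - y).mean(axis=-1)

# Hypothetical observed and forecast shares for three origin markets.
actual = np.array([0.50, 0.30, 0.20])
forecast = np.array([0.40, 0.40, 0.20])

mae = compositional_mae(actual, forecast)
dist = aitchison_distance(actual, forecast)

# The Aitchison distance depends only on ratios between parts, so rescaling
# one input (e.g. counts instead of shares) leaves it unchanged.
dist_rescaled = aitchison_distance(1000 * actual, forecast)
```

MAE treats a 0.01 error the same whether the share is 0.02 or 0.50, whereas the Aitchison distance works on log-ratios and so weights relative errors in small shares more heavily. That difference is one reason to report both.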
Key findings:
- BDARMA achieves the lowest forecast error for EMEA and competitive performance in the other destination regions. In the EMEA region, BDARMA reduces forecast error by 27% relative to the naïve benchmark (p < 0.001), with the greatest gains for markets whose shares lie in the 5‑25% range, where competition among multiple source markets is strongest.
- Incorporating seasonal variation in the precision parameter improves forecast accuracy by roughly 12 % compared with a constant‑precision specification, highlighting the importance of modeling time‑varying dispersion.
- The pandemic period (2020‑2021) exhibits a pronounced structural break: long‑haul source markets (e.g., US‑Europe) sharply decline, while intra‑regional bookings surge. BDARMA captures these shifts through both the autoregressive terms (which retain memory of pre‑pandemic composition) and the seasonal precision (which reflects heightened volatility during the crisis). Recovery trajectories differ markedly across regions, and the model’s probabilistic forecasts provide credible intervals that reflect this heterogeneity.
- The Bayesian framework yields full predictive distributions on the simplex, guaranteeing coherent forecasts (no negative shares, sums exactly one) and offering a natural measure of uncertainty via the Dirichlet concentration parameter.
Implications: Destination marketing organizations can use the probabilistic market‑share forecasts to allocate promotional budgets more efficiently, plan infrastructure upgrades aligned with expected source‑market mixes, and design rapid response strategies for future shocks. The methodology also demonstrates that large‑scale platform data (Airbnb bookings) can serve as a high‑frequency, demand‑driven alternative to traditional tourism statistics, which often rely on border crossings or survey data.
Limitations and future work: The analysis is confined to Airbnb users, who may not represent the full tourism population, especially in markets where other accommodation types dominate. The seasonal precision model relies on a fixed number of Fourier harmonics; exploring more flexible non‑linear or regime‑switching specifications could capture abrupt policy or exchange‑rate shocks. Extending the framework to a multivariate state‑space setting that jointly models arrivals, expenditures, and composition would further enrich decision‑support capabilities.
In sum, the paper provides a rigorous, data‑driven, and practically useful solution for forecasting the evolving composition of inbound tourism demand, establishing Bayesian Dirichlet ARMA as a leading tool for compositional time‑series forecasting in tourism and beyond.