Going NUTS with ADVI: Exploring various Bayesian Inference techniques with Facebook Prophet

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Since its introduction, Facebook Prophet has attracted positive attention from both classical statisticians and the Bayesian statistics community. The model provides two built-in inference methods: maximum a posteriori (MAP) estimation using the L-BFGS-B algorithm, and Markov chain Monte Carlo (MCMC) sampling via the No-U-Turn Sampler (NUTS). While exploring various time-series forecasting problems using Bayesian inference with Prophet, we encountered limitations stemming from the inability to apply alternative inference techniques beyond those provided by default. Additionally, the fluent API design of Facebook Prophet proved insufficiently flexible for implementing our custom modeling ideas. To address these shortcomings, we developed a complete reimplementation of the Prophet model in PyMC, which enables us to extend the base model and to evaluate and compare multiple Bayesian inference methods. In this paper, we present our PyMC-based implementation and analyze in detail the application of different Bayesian inference techniques (full MCMC sampling, MAP estimation, and variational inference) to a time-series forecasting problem. We discuss the sampling approaches, convergence diagnostics, and forecasting metrics, as well as their computational efficiency, and identify possible issues to be addressed in future work.


💡 Research Summary

The paper addresses two major shortcomings of the original Facebook Prophet library: (1) the limited set of inference algorithms (only MAP via L‑BFGS‑B and full‑Bayesian sampling via NUTS) and (2) a fluent API that makes it difficult to modify or extend the underlying decomposable time‑series model. To overcome these issues, the authors re‑implemented Prophet in the probabilistic programming framework PyMC. The re‑implementation mirrors the mathematical formulation of Prophet—trend g(t), Fourier seasonality s(t; p, c), optional holidays h(t), and Gaussian noise ε(t)—but exposes each component as a separate, object‑oriented class (LinearTrend, FourierSeasonality, Holiday, etc.). Operator overloading allows users to combine components with familiar arithmetic expressions (e.g., trend * (1 + yearly + weekly)), providing a clear, modular API that is far more extensible than the original library.
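The component-composition idea can be illustrated with a minimal, self-contained sketch. The class and helper names below (`TimeSeriesModel`, `Constant`, `Additive`, `Multiplicative`) are illustrative, not the paper's actual API, and the components here are deterministic functions rather than PyMC random variables with priors:

```python
# Sketch of Prophet-style components combined via operator overloading.
# Overloading __add__/__mul__ lets expressions like
# trend * (1 + yearly + weekly) build a composite model object.
import math

class TimeSeriesModel:
    def __add__(self, other):  return Additive(self, _wrap(other))
    def __radd__(self, other): return Additive(_wrap(other), self)
    def __mul__(self, other):  return Multiplicative(self, _wrap(other))
    def __rmul__(self, other): return Multiplicative(_wrap(other), self)

class Constant(TimeSeriesModel):
    def __init__(self, value): self.value = value
    def __call__(self, t): return self.value

def _wrap(x):
    # Promote plain numbers (the "1" in "1 + yearly") to components.
    return x if isinstance(x, TimeSeriesModel) else Constant(x)

class Additive(TimeSeriesModel):
    def __init__(self, left, right): self.left, self.right = left, right
    def __call__(self, t): return self.left(t) + self.right(t)

class Multiplicative(TimeSeriesModel):
    def __init__(self, left, right): self.left, self.right = left, right
    def __call__(self, t): return self.left(t) * self.right(t)

class LinearTrend(TimeSeriesModel):
    def __init__(self, k, m): self.k, self.m = k, m  # slope, intercept
    def __call__(self, t): return self.k * t + self.m

class FourierSeasonality(TimeSeriesModel):
    def __init__(self, period, coeffs):  # coeffs: [(a_n, b_n), ...]
        self.period, self.coeffs = period, coeffs
    def __call__(self, t):
        return sum(a * math.cos(2 * math.pi * (n + 1) * t / self.period)
                   + b * math.sin(2 * math.pi * (n + 1) * t / self.period)
                   for n, (a, b) in enumerate(self.coeffs))

trend = LinearTrend(k=0.01, m=1.0)
yearly = FourierSeasonality(period=365.25, coeffs=[(0.1, 0.05)])
weekly = FourierSeasonality(period=7.0, coeffs=[(0.02, 0.01)])
model = trend * (1 + yearly + weekly)  # multiplicative seasonality
```

In the real PyMC implementation each component would contribute random variables (priors on slopes, changepoints, Fourier coefficients) to a shared model context; the arithmetic-composition mechanism, however, works exactly as above.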

With this new implementation, the authors compared three families of Bayesian inference techniques (six methods in total) on a real‑world dataset: the daily log‑view counts of the Wikipedia page for Peyton Manning (two years for training, one year for forecasting). The inference methods evaluated were:

  1. Full MCMC – Metropolis‑Hastings (MH), the No‑U‑Turn Sampler (NUTS), and Differential‑Evolution Metropolis with sampling from the chain's past states (the DE‑MC‑Z algorithm, available in PyMC as DEMetropolisZ and abbreviated DMZ here).
  2. Maximum A Posteriori (MAP) – L‑BFGS‑B optimization.
  3. Variational Inference (VI) – mean‑field ADVI (diagonal Gaussian) and Full‑Rank ADVI (full covariance Gaussian).
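To make the simplest of these samplers concrete, here is a toy random-walk Metropolis-Hastings implementation in plain Python. This is a standalone illustration of the algorithm, not the PyMC sampler the paper benchmarks; the target below is a one-dimensional standard normal rather than the Prophet posterior:

```python
import math
import random

def metropolis_hastings(log_post, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings.

    Propose x' = x + Normal(0, step^2) and accept with probability
    min(1, p(x') / p(x)), computed in log space for stability.
    """
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_proposal = log_post(proposal)
        if math.log(rng.random()) < lp_proposal - lp:  # accept
            x, lp = proposal, lp_proposal
        samples.append(x)  # on rejection, the current state repeats
    return samples

# Toy target: standard normal log-density (up to an additive constant).
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000)
mean = sum(samples) / len(samples)
```

The repeated states on rejection are exactly what drives the high autocorrelation and low effective sample size that the paper reports for MH and DMZ on the Prophet posterior.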

For each method the authors reported convergence diagnostics (R‑hat, effective sample size, autocorrelation), forecasting performance metrics (MSE, RMSE, MAE, MAPE), and computational cost (wall‑clock time). Key findings include:

  • NUTS achieved excellent convergence (R‑hat ≈ 1.0, ESS > 3 000) with 2 000 post‑warm‑up samples per chain, delivering stable forecasts.
  • MH and DMZ required one million samples to approach reasonable ESS, yet still fell short of convergence (R‑hat > 1.01, ESS < 400) and exhibited strong autocorrelation, indicating inefficient exploration of the posterior. Interestingly, DMZ produced the lowest MSE among all methods despite its lack of convergence, highlighting a disconnect between diagnostic metrics and predictive quality.
  • MAP was the fastest (seconds) but, as expected, provided only a point estimate and no uncertainty quantification.
  • ADVI converged after roughly 80 000 iterations, delivering the best predictive accuracy among VI methods (MSE ≈ 0.709, MAPE ≈ 0.086) with a runtime far shorter than full MCMC. Full‑Rank ADVI offered a richer posterior approximation at the cost of additional computation, but its predictive metrics were slightly worse than mean‑field ADVI in this particular experiment.
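The forecasting metrics and the R-hat diagnostic referenced above can be sketched in plain Python as follows. The paper presumably uses PyMC/ArviZ utilities for these; in particular, this is the basic Gelman-Rubin statistic, without the split-chain and rank-normalization refinements modern tools apply:

```python
import math

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    return math.sqrt(mse(y, yhat))

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mape(y, yhat):
    # Assumes no true value is zero.
    return sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)

def r_hat(chains):
    """Basic Gelman-Rubin statistic over m chains of length n.

    Compares between-chain variance B with the mean within-chain
    variance W; values near 1 indicate the chains have mixed.
    """
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)
```

Comparing such R-hat values against the predictive metrics is exactly how the paper surfaces the DMZ anomaly: a sampler can score well on MSE while its chains are demonstrably unconverged.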

The authors conclude that the PyMC‑based Prophet implementation dramatically improves flexibility, allowing researchers to experiment with a broader suite of inference algorithms and to tailor the model structure to specific needs. While NUTS remains the gold standard for accurate posterior inference, its computational burden makes it less suitable for rapid prototyping. ADVI emerges as a practical alternative when speed is essential and an approximate posterior suffices. MAP retains value for quick baseline checks.

Future work outlined includes extending the model to multivariate time series with external regressors, integrating automatic hyper‑parameter tuning, exploring richer variational families (e.g., normalizing flows, Stein variational gradient descent), and leveraging GPU‑accelerated or distributed MCMC to handle larger datasets. The modular API and open‑source PyMC implementation lay a solid foundation for these extensions, promising a more versatile and research‑friendly Prophet ecosystem.

