Bayesian Shrinkage in High-Dimensional VAR Models: A Comparative Study
High-dimensional vector autoregressive (VAR) models offer a versatile framework for multivariate time series analysis, yet they face critical challenges from over-parameterization and uncertain lag order. In this paper, we systematically compare three Bayesian shrinkage priors (horseshoe, lasso, and normal) and two frequentist regularization approaches (ridge and nonparametric shrinkage) under three carefully crafted simulation scenarios. These scenarios encompass (i) overfitting in a low-dimensional setting, (ii) sparse high-dimensional processes, and (iii) a combined scenario in which both large dimension and overfitting complicate inference. We evaluate each method on the quality of parameter estimation (root mean squared error, coverage, and interval length) and out-of-sample forecasting (one-step-ahead forecast RMSE). Our findings show that global-local Bayesian methods, particularly the horseshoe, dominate in maintaining accurate coverage and minimizing parameter error, even when the model is heavily over-parameterized. Frequentist ridge often yields competitive point forecasts but underestimates uncertainty, leading to sub-nominal coverage. A real-data application using macroeconomic variables from Canada illustrates how these methods perform in practice, reinforcing the advantages of global-local priors in stabilizing inference when dimension or lag order is inflated.
💡 Research Summary
This paper investigates the challenges of over‑parameterization and lag‑order uncertainty in high‑dimensional vector autoregressive (VAR) models and conducts a systematic comparison of five shrinkage approaches: three Bayesian priors (horseshoe, Bayesian lasso, and a normal prior that mimics ridge) and two frequentist regularization methods (ridge regression and a non‑parametric James‑Stein‑type shrinkage). The authors design three Monte‑Carlo experiments that differ in dimensionality and the degree of over‑fitting: (1) a low‑dimensional setting with an intentionally oversized lag order, (2) a high‑dimensional sparse setting where the fitted lag order matches the true order, and (3) a high‑dimensional setting with both a large number of variables and an oversized lag order. Each scenario is replicated 50 times, with 180 observations for training and 20 for out‑of‑sample forecasting.
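The Monte-Carlo design above can be sketched in code. The following is a minimal numpy illustration, not the authors' exact data-generating process: it draws sparse random coefficient matrices, rescales them until the companion matrix is stable, simulates the series, and applies the paper's 180/20 train/forecast split. The function name `simulate_var` and the sparsity/scale settings are assumptions for illustration.

```python
import numpy as np

def simulate_var(T, k, p, rng, coef_scale=0.3, sparsity=0.2):
    """Simulate T observations of a k-variate VAR(p) with sparse random
    coefficient matrices, rescaled until the companion matrix is stable."""
    A = rng.normal(0.0, coef_scale, size=(p, k, k))
    A *= rng.random(size=A.shape) < sparsity      # keep roughly 20% of entries

    def companion(A):
        C = np.zeros((k * p, k * p))
        C[:k, :] = np.concatenate(A, axis=1)      # top block row [A_1 | ... | A_p]
        C[k:, :-k] = np.eye(k * (p - 1))          # identity shifts for lags 2..p
        return C

    # Shrink coefficients until the spectral radius is safely below one
    while np.max(np.abs(np.linalg.eigvals(companion(A)))) >= 0.98:
        A *= 0.9

    burn = 100
    y = np.zeros((T + burn, k))
    for t in range(p, T + burn):
        y[t] = sum(A[l] @ y[t - 1 - l] for l in range(p)) + rng.standard_normal(k)
    return y[burn:], A

rng = np.random.default_rng(0)
y, A_true = simulate_var(T=200, k=7, p=2, rng=rng)
y_train, y_test = y[:180], y[180:]                # 180 for training, 20 hold-out
```

Each of the paper's 50 replications would repeat this draw with a fresh seed; the dimensions here (k = 7, p = 2) are placeholders rather than the paper's exact settings.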
The Bayesian models are estimated in Stan using four parallel Hamiltonian Monte Carlo chains (2 000 iterations each, 500 warm‑up). All three priors share a common LKJ‑based prior for the error covariance matrix, allowing for correlated innovations. The horseshoe prior employs a global‑local hierarchy (β_j = τ λ_j β_raw,j with β_raw,j ∼ N(0,1), λ_j ∼ C⁺(0,1), τ ∼ C⁺(0,1)), the Bayesian lasso uses a Laplace prior (β_j | η ∼ Laplace(0,η)), and the normal prior places a global N(0,1) distribution on each coefficient. Posterior means serve as point estimates, and 95 % credible intervals are taken from the 2.5 % and 97.5 % posterior quantiles.
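The tail behaviour that separates these priors can be illustrated by sampling from them directly. A small numpy sketch, with two caveats: in the actual model τ is a single shared global parameter, whereas here a fresh τ is drawn per coefficient purely to visualize the marginal; and the Laplace scale η is fixed at 1 for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Horseshoe: beta_j = tau * lambda_j * z_j, with lambda_j, tau ~ C+(0,1), z ~ N(0,1).
# (tau is global in the model; per-coefficient draws here only illustrate the marginal.)
half_cauchy = lambda size: np.abs(rng.standard_cauchy(size))
beta_hs = half_cauchy(n) * half_cauchy(n) * rng.standard_normal(n)

# Bayesian lasso: beta_j | eta ~ Laplace(0, eta), eta fixed at 1 for illustration
beta_lasso = rng.laplace(0.0, 1.0, n)

# Normal prior: beta_j ~ N(0, 1)
beta_norm = rng.standard_normal(n)

for name, b in [("horseshoe", beta_hs), ("laplace", beta_lasso), ("normal", beta_norm)]:
    print(f"{name:9s}  P(|beta|<0.05) = {np.mean(np.abs(b) < 0.05):.3f}  "
          f"P(|beta|>5) = {np.mean(np.abs(b) > 5):.4f}")
```

The horseshoe draws place far more mass both near zero and in the extreme tails than the normal prior, which is exactly the "shrink noise hard, leave signal alone" behaviour the results below attribute to it.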
Frequentist ridge regression is implemented via glmnet with a fixed penalty λ = 0.1, while the non‑parametric shrinkage method follows Giannone et al. (2015) and is accessed through the VARshrink R package. To obtain standard errors and confidence intervals for these methods, the authors apply a block bootstrap that respects the time‑series dependence (blocks of length four, 30 replications).
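The block-bootstrap step can be sketched as follows. This is a numpy illustration rather than the glmnet/VARshrink pipeline: the ridge fit is the closed-form solve (XᵀX + λI)⁻¹XᵀY with no intercept, blocks of length four and 30 replications match the paper, and the function name `ridge_var_block_bootstrap` is hypothetical.

```python
import numpy as np

def ridge_var_block_bootstrap(y, p=2, lam=0.1, block_len=4, n_boot=30, seed=0):
    """Moving-block bootstrap percentile intervals for ridge-VAR coefficients.
    Rows of the stacked (lagged X, Y) regression are resampled in contiguous
    blocks of length `block_len` to preserve short-run serial dependence."""
    rng = np.random.default_rng(seed)
    T, k = y.shape
    X = np.hstack([y[p - l - 1:T - l - 1] for l in range(p)])  # [y_{t-1},...,y_{t-p}]
    Y = y[p:]
    n = len(Y)

    def ridge_fit(Xb, Yb):
        # Closed-form ridge solution (X'X + lam*I)^{-1} X'Y, intercept omitted
        return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ Yb)

    starts_all = np.arange(n - block_len + 1)
    n_blocks = int(np.ceil(n / block_len))
    boots = []
    for _ in range(n_boot):
        starts = rng.choice(starts_all, size=n_blocks)
        idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:n]
        boots.append(ridge_fit(X[idx], Y[idx]))
    lo, hi = np.percentile(np.stack(boots), [2.5, 97.5], axis=0)
    return ridge_fit(X, Y), lo, hi

# Usage on toy data: point estimate plus 95% bootstrap percentile intervals
rng = np.random.default_rng(3)
y_sim = rng.standard_normal((60, 3))
B_hat, lo, hi = ridge_var_block_bootstrap(y_sim, p=2)
```

With only 30 replications the percentile endpoints are coarse, which is one plausible contributor to the under-coverage of the frequentist intervals reported below.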
Performance is evaluated on two fronts: (i) parameter recovery (root‑mean‑square error, coverage probability of 95 % intervals, and average interval length) and (ii) one‑step‑ahead forecast accuracy (RMSE). The results are clear and consistent across all designs. The horseshoe prior delivers the lowest parameter RMSE and achieves coverage rates close to the nominal 95 % level, even when the model is heavily over‑parameterized. Its intervals are slightly wider than those of the normal prior, reflecting a more realistic quantification of uncertainty. The Bayesian lasso sits in between: it reduces RMSE relative to the normal prior but yields modestly lower coverage, because its Laplace tails are lighter than the horseshoe's Cauchy tails. The normal prior produces the narrowest intervals but suffers from severe under‑coverage (often below 80 %) in over‑fitted scenarios, indicating over‑confidence.
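These evaluation metrics are straightforward to compute. A minimal sketch, assuming coefficients are stacked as a (k·p) × k matrix with lags ordered [t−1, …, t−p]; the function names are illustrative, not from the paper.

```python
import numpy as np

def param_rmse(B_hat, B_true):
    """Root-mean-square error over all coefficient entries."""
    return float(np.sqrt(np.mean((B_hat - B_true) ** 2)))

def coverage_and_length(lo, hi, B_true):
    """Share of true coefficients inside [lo, hi], and average interval width."""
    covered = (lo <= B_true) & (B_true <= hi)
    return float(covered.mean()), float((hi - lo).mean())

def one_step_rmse(B_hat, y, p):
    """One-step-ahead forecast RMSE over a hold-out series y, given stacked
    coefficients B_hat of shape (k*p, k)."""
    errs = []
    for t in range(p, len(y)):
        x = np.concatenate([y[t - 1 - l] for l in range(p)])
        errs.append(y[t] - x @ B_hat)
    return float(np.sqrt(np.mean(np.square(errs))))
```

Coverage is judged against the nominal 95 % level, so a well-calibrated method should return values near 0.95 from `coverage_and_length` across replications.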
Ridge regression excels in forecast RMSE, frequently matching or surpassing the Bayesian methods, especially in the high‑dimensional well‑specified and over‑fitted cases. However, its confidence intervals are overly tight, leading to coverage rates between 65 % and 75 %. The non‑parametric shrinkage method is computationally efficient but displays poor coverage (often below 60 %) and moderate parameter RMSE, suggesting it is less reliable when the true data‑generating process is complex.
A real‑data application uses eight Canadian macro‑economic series (e.g., unemployment, inflation, GDP growth) over 120 monthly observations. VAR models with lag orders from one to four are estimated. The empirical findings mirror the simulation outcomes: the horseshoe prior remains robust to lag over‑specification, maintaining near‑nominal coverage and reasonable interval widths, while ridge delivers the best point forecasts for lag orders one and two but under‑estimates uncertainty for lag four.
The authors conclude that global‑local Bayesian shrinkage, particularly the horseshoe prior, offers a superior balance between bias reduction, variance control, and uncertainty quantification in high‑dimensional VAR contexts. Frequentist ridge is valuable for point forecasting but should be complemented with methods that better capture posterior uncertainty when inference on coefficients or impulse responses is required. The study underscores the practical advantage of employing horseshoe‑type priors when analysts face ambiguous lag selection or when the number of variables approaches or exceeds the sample size.