Variational Inference for Bayesian MIDAS Regression
Authors: Luigi Simeone
Luigi Simeone*
February 2026

Abstract

We develop a Coordinate Ascent Variational Inference (CAVI) algorithm for Bayesian Mixed Data Sampling (MIDAS) regression with linear weight parameterizations. The model separates impact coefficients from weighting function parameters through a normalization constraint, creating a bilinear structure that renders generic Hamiltonian Monte Carlo samplers unreliable while preserving conditional conjugacy exploitable by CAVI. Each variational update admits a closed-form solution: Gaussian for regression coefficients and weight parameters, Inverse-Gamma for the error variance. The algorithm propagates uncertainty across blocks through second moments, distinguishing it from naive plug-in approximations. In a Monte Carlo study spanning 21 data-generating configurations with up to 50 predictors, CAVI produces posterior means nearly identical to a block Gibbs sampler benchmark while achieving speedups of 107× to 1,772× (Table 9). Generic automatic differentiation VI (ADVI), by contrast, produces bias 7–14 times larger while being orders of magnitude slower, confirming the value of model-specific derivations. Weight function parameters maintain excellent calibration (coverage above 92%) across all configurations. Impact coefficient credible intervals exhibit the underdispersion characteristic of mean-field approximations, with coverage declining from 89% to 55% as the number of predictors grows: a documented trade-off between speed and interval calibration that structured variational methods can address. An empirical application to realized volatility forecasting on S&P 500 daily returns confirms that CAVI and Gibbs sampling yield virtually identical point forecasts, with CAVI completing each monthly estimation in under 10 milliseconds.
Keywords: MIDAS regression, Variational inference, CAVI, Bayesian econometrics, Mixed-frequency data, Realized volatility
JEL Classification: C11, C53, C55, C63

* Independent Researcher. Contact: luigi.simeone@tech-management.net.

1 Introduction

Mixed Data Sampling (MIDAS) regression, introduced by Ghysels et al. (2006), provides a parsimonious framework for incorporating high-frequency data into low-frequency prediction equations. The approach has become standard in macroeconomic nowcasting and financial volatility forecasting, where daily or intraday observations carry information about monthly or quarterly aggregates. Bayesian treatments of MIDAS models offer natural uncertainty quantification and regularization through prior distributions, features particularly valuable when the number of high-frequency predictors is moderate to large (Mogliani and Simoni, 2021; Kohns and Potjagailo, 2025).

Posterior inference in Bayesian MIDAS models relies almost exclusively on Markov Chain Monte Carlo (MCMC) methods. The canonical framework of Chan et al. (2025) employs precision-based sampling (Chan and Jeliazkov, 2009) for time-varying parameter MIDAS with linear weight parameterizations. Pettenuzzo et al. (2016) use a four-block Gibbs sampler extending Kim–Shephard–Chib methods, while Mogliani and Simoni (2021) develop adaptive MCMC for hierarchical MIDAS with 134 predictors. These methods are asymptotically exact but computationally intensive: each posterior evaluation requires thousands of sampling iterations, and mixing efficiency degrades as the parameter space grows.

Variational inference (VI) offers a fundamentally different approach, recasting posterior approximation as an optimization problem (Blei et al., 2017). Rather than generating dependent samples from the posterior, VI seeks the closest member of a tractable distribution family by maximizing the evidence lower bound (ELBO).
The resulting deterministic algorithm converges in a fraction of the time required by MCMC, producing both point estimates and approximate uncertainty quantification. When conditional conjugacy is unavailable, automatic differentiation VI (Kucukelbir et al., 2017) provides a generic black-box implementation using stochastic gradients; when the model structure permits, coordinate ascent methods yield deterministic closed-form updates.

In the mixed-frequency econometrics literature, Gefang et al. (2020) demonstrated that variational Bayes methods for large mixed-frequency VARs produce accurate results while being orders of magnitude faster than MCMC. Their subsequent work (Gefang et al., 2023) extended VI to VARs with horseshoe, LASSO, and stochastic volatility specifications. Chan and Yu (2022) contributed global Gaussian variational approximations for log-volatilities in large Bayesian VARs, and Loaiza-Maya et al. (2022) developed structured variational approximations for time-varying parameter VAR models that substantially improve upon mean-field methods.

Despite this progress in mixed-frequency VARs, variational inference has never been applied to MIDAS regression models. This gap is methodologically significant. MIDAS regressions are structurally distinct from mixed-frequency VARs: the separation between weighting function parameters θ_j (controlling how high-frequency observations are aggregated) and impact coefficients β_j (controlling how much each predictor matters) generates a bilinear product β_j · x̃_t^(j)(θ_j) that is absent in the VAR setting. This bilinear structure has three consequences for posterior computation that motivate the present paper.

First, generic Hamiltonian Monte Carlo (HMC) samplers, including the widely used No-U-Turn Sampler (NUTS; Hoffman and Gelman, 2014), fail on this model.
The bilinear product creates a posterior geometry analogous to Neal's funnel (Neal, 2003): when |β_j| is large, the posterior is tightly concentrated in θ_j; when |β_j| is near zero, θ_j is nearly unidentified. This curvature variation across the parameter space causes NUTS trajectories to diverge, producing biased estimates and unreliable diagnostics. Our implementation of NUTS for the MIDAS model consistently exhibited numerical divergences and non-convergent chains, confirming that off-the-shelf HMC is not a viable option.

Second, well-designed block Gibbs samplers, which condition on one parameter block at a time and thereby collapse the bilinear term to a linear regression, do produce valid posterior samples. However, the alternating conditional structure creates autocorrelation between successive draws that worsens with the number of predictors. In our Monte Carlo experiments, the minimum effective sample size (ESS) of the Gibbs sampler drops from 4,103 at J = 1 to 539 at J = 10 and further to 134 at J = 25, indicating that an increasing fraction of computational effort produces redundant samples.

Third, the very conditional structure that enables Gibbs sampling (conditional on θ_j, the model is linear in (α, β); conditional on (α, β), it is linear in each θ_j) also enables Coordinate Ascent Variational Inference (CAVI) with closed-form updates. This is the central observation of the paper. The linear weight parameterization introduced by Chan et al. (2025), using Almon polynomials, Fourier series, or B-splines as basis functions, maintains conditional conjugacy across all parameter blocks. We exploit this to derive a CAVI algorithm where every update step has an analytic solution: Gaussian for regression coefficients and weight parameters, Inverse-Gamma for the error variance.

Contribution. This paper makes three contributions.
First, we develop the first variational inference algorithm for MIDAS regression, deriving closed-form CAVI updates that handle the bilinear structure through a mean-field factorization with variance-corrected moments. The updates propagate uncertainty across blocks via second moments: specifically, the variance of β_j enters the precision matrix of η_j, and the variance of η_j enters the design matrix for ξ, producing more calibrated estimates than naive plug-in approaches. The covariance correction in the weight parameter update, arising from the joint treatment of (α, β), is a technical contribution specific to the MIDAS bilinear setting.

Second, we provide a comprehensive Monte Carlo evaluation across 21 data-generating configurations spanning J ∈ {1, 3, 5, 10, 25, 50} predictors, T ∈ {50, 100, 200, 400} observations, varying signal-to-noise ratios, weight profile shapes, basis function types, and frequency ratios. CAVI posterior means are nearly identical to Gibbs: bias differs by less than 0.03 across configurations, including J = 25 where 50 Gibbs replications confirm close agreement. Computational speedups range from 107× (J = 25) to 1,772× (J = 1) in the baseline tier (Table 1), drop to 136× under stress-test conditions with K = 65 lags (Table 9), and grow with the number of predictors, precisely where MCMC becomes most expensive. Generic ADVI, by contrast, produces bias 7–14 times larger while being 2,000–100,000 times slower, confirming that model-specific derivations are essential. Weight function parameters (η) maintain coverage above 92% in all configurations.
Impact coefficient (β) credible intervals exhibit the underdispersion characteristic of mean-field VI, with coverage declining from 89% (J = 1) to 55% (J = 50), a pattern we explain through the interaction between dimension-dependent cross-block correlations and the mean-field factorization.

Third, we demonstrate the method on a canonical empirical application: forecasting realized volatility of the S&P 500 using daily squared returns in a MIDAS-RV specification (Ghysels et al., 2006). Over 187 out-of-sample months, CAVI and Gibbs produce virtually identical forecasts (relative MSE difference below 0.01%), with CAVI completing each estimation in under 10 milliseconds versus 1–2.5 seconds for Gibbs.

Positioning. This paper occupies a specific niche in the literature. Relative to Chan et al. (2025), we adopt their model specification, namely the linear parameterization and null-space reparameterization of the normalization constraint, but replace their precision-based MCMC with variational inference. Relative to Gefang et al. (2020), we translate variational methods from mixed-frequency VARs to MIDAS regression, handling the bilinear β · θ structure and normalization constraint that are specific to MIDAS and absent in VARs. Relative to generic ADVI (Kucukelbir et al., 2017), our model-specific updates are both orders of magnitude faster and substantially more accurate (Section 4.5), because they exploit conditional conjugacy that stochastic gradient methods cannot leverage.

The paper proceeds as follows. Section 2 specifies the Bayesian MIDAS model. Section 3 derives the CAVI algorithm. Section 4 reports the Monte Carlo study. Section 5 presents the empirical application. Section 6 concludes.

2 Bayesian MIDAS Model

2.1 Observation equation

Consider a low-frequency dependent variable y_t, t = 1, ..., T, and J high-frequency predictors.
For predictor j, denote the vector of K_j high-frequency observations associated with low-frequency period t as x_t^{(j)} = (x_{t,0}^{(j)}, \dots, x_{t,K_j-1}^{(j)})^\top \in \mathbb{R}^{K_j}. The MIDAS regression model is

    y_t = \alpha + \sum_{j=1}^{J} \beta_j \tilde{x}_t^{(j)}(\theta_j) + \varepsilon_t, \qquad \varepsilon_t \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad t = 1, \dots, T,    (1)

where α is the intercept, β_j is the impact coefficient for predictor j, and the MIDAS aggregation is

    \tilde{x}_t^{(j)}(\theta_j) = \sum_{k=0}^{K_j - 1} w_j(k; \theta_j) \, x_{t,k}^{(j)}.    (2)

The separation of β_j (how much predictor j matters) from θ_j (how its high-frequency lags are weighted) is a defining feature of MIDAS.

2.2 Linear weight parameterization

The original MIDAS specification of Ghysels et al. (2006) employs nonlinear weighting functions, exponential Almon or beta polynomials, that create computational difficulties: the posterior conditionals of θ_j do not belong to standard families, ruling out Gibbs sampling and requiring particle filters (Schumacher, 2014). Foroni et al. (2015) demonstrated that unrestricted lag polynomials estimated by OLS can perform comparably for small frequency ratios, motivating the shift toward linear weight specifications.

Following Chan et al. (2025), we adopt a linear parameterization:

    w_j(k; \theta_j) = \sum_{p=1}^{P_j} \theta_{j,p} \, \phi_p(k) = \phi(k)^\top \theta_j,    (3)

where \phi_1(\cdot), \dots, \phi_{P_j}(\cdot) are predetermined basis functions and \theta_j \in \mathbb{R}^{P_j}. Our primary specification uses Almon polynomials: \phi_p(k) = k^{p-1}. As a robustness check, we also employ cubic B-splines with uniformly spaced knots.

Using (3), the observation equation becomes

    y_t = \alpha + \sum_{j=1}^{J} \beta_j \, x_t^{(j)\top} \Phi_j \theta_j + \varepsilon_t,    (4)

where \Phi_j \in \mathbb{R}^{K_j \times P_j} is the basis matrix. The product \beta_j \cdot (x_t^{(j)\top} \Phi_j \theta_j) is bilinear in the unknown parameters β_j and θ_j, the source of both the computational challenge and the opportunity.

Remark 1 (Negative weights).
The linear parameterization permits negative weights for certain values of θ_j, unlike nonlinear specifications. This is a trade-off: computational tractability in exchange for relaxed economic interpretability. In practice, estimated weight profiles are well-behaved, as confirmed by our Monte Carlo and empirical results.

2.3 Normalization constraint and reparameterization

The MIDAS model in (4) suffers from a scale indeterminacy: for any c ≠ 0, the transformation (\beta_j, \theta_j) \mapsto (c\beta_j, \theta_j / c) leaves the likelihood invariant. The standard resolution imposes that weights sum to unity:

    c_j^\top \theta_j = 1, \qquad \text{where } c_j = \Phi_j^\top \mathbf{1}_{K_j} \in \mathbb{R}^{P_j}.    (5)

We handle this through a null-space reparameterization (Chan et al., 2025). Any θ_j satisfying (5) can be decomposed as

    \theta_j = \theta_j^0 + N_j \eta_j,    (6)

where \theta_j^0 = c_j / \lVert c_j \rVert^2 is the minimum-norm particular solution, N_j \in \mathbb{R}^{P_j \times (P_j - 1)} has orthonormal columns spanning \ker(c_j^\top), and \eta_j \in \mathbb{R}^{P_j - 1} is the free parameter. Since c_j^\top N_j = 0, the constraint is automatically satisfied.

Substituting (6) into the aggregation yields

    \tilde{x}_t^{(j)} = \underbrace{x_t^{(j)\top} \Phi_j \theta_j^0}_{c_t^{(j)}} + \underbrace{x_t^{(j)\top} \Phi_j N_j}_{r_t^{(j)\top}} \, \eta_j,    (7)

where c_t^{(j)} is a known scalar and r_t^{(j)} \in \mathbb{R}^{P_j - 1} is the reduced MIDAS regressor. The reparameterized model is

    y_t = \alpha + \sum_{j=1}^{J} \beta_j \left[ c_t^{(j)} + r_t^{(j)\top} \eta_j \right] + \varepsilon_t.    (8)

This form makes conditional linearity transparent: given {η_j}, the model is linear in \xi = (\alpha, \beta_1, \dots, \beta_J)^\top; given ξ, it is linear in each η_j.

2.4 Prior specification

We assign independent priors:

    \eta_j \sim N(\mathbf{0}, \sigma_\eta^2 I_{P_j - 1}), \qquad \sigma_\eta^2 = 1,    (9)
    \alpha \sim N(0, 100), \qquad \beta_j \sim N(0, 10),    (10)
    \sigma^2 \sim \text{Inverse-Gamma}(0.01, 0.01).    (11)

The prior on η_j is centered at zero, corresponding to the weight profile induced by θ_j^0.
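The objects θ_j^0 and N_j underlying this baseline profile can be built with standard linear algebra. A minimal NumPy sketch of the basis construction of Section 2.2 and the null-space reparameterization of Section 2.3 (dimensions and variable names are illustrative, not taken from the paper's code):

```python
import numpy as np

# Almon basis: phi_p(k) = k^(p-1), with K = 9 lags and P = 3 basis functions.
K, P = 9, 3
Phi = np.vander(np.arange(K), N=P, increasing=True).astype(float)  # K x P

c = Phi.T @ np.ones(K)        # c_j = Phi_j' 1_K, eq. (5)
theta0 = c / (c @ c)          # minimum-norm solution of c'theta = 1

# Orthonormal basis of ker(c'): the trailing left-singular vectors of c.
U, _, _ = np.linalg.svd(c.reshape(-1, 1), full_matrices=True)
N_mat = U[:, 1:]              # P x (P-1), columns orthonormal, c'N = 0

# Any eta in R^(P-1) yields weights summing to one, eq. (6).
eta = np.array([0.3, -0.1])
weights = Phi @ (theta0 + N_mat @ eta)   # w(k) = phi(k)'theta
assert np.isclose(weights.sum(), 1.0)    # normalization holds by construction
```

Because c'N_j = 0, the constraint survives any value of η_j, which is what lets the CAVI updates treat η_j as an unconstrained Gaussian block.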
The priors on α and β_j are weakly informative, and the Inverse-Gamma prior on σ² is the standard diffuse specification.

Remark 2 (Choice of priors). We use identical Gaussian priors for all β_j rather than shrinkage priors. This deliberate choice isolates algorithm performance from prior specification effects. In the Monte Carlo study, null coefficients (β_j = 0) are recovered with bias below 0.005 without shrinkage. Extension to global-local shrinkage priors is straightforward.

2.5 Posterior

The joint posterior is

    p(\xi, \eta_1, \dots, \eta_J, \sigma^2 \mid y) \propto p(y \mid \xi, \eta, \sigma^2) \, p(\xi) \prod_{j=1}^{J} p(\eta_j) \, p(\sigma^2).    (12)

Due to the bilinear terms, the marginal likelihood has no closed form. However, the conditional posteriors are standard: p(ξ | η, σ², y) is Gaussian, each p(η_j | ξ, η_{−j}, σ², y) is Gaussian, and p(σ² | ξ, η, y) is Inverse-Gamma.

Remark 3 (Failure of NUTS). We attempted estimation using NUTS (Hoffman and Gelman, 2014), which operates on the joint parameter space without exploiting conditional structure. The bilinear product creates a funnel-like geometry (Neal, 2003) where posterior curvature varies by orders of magnitude. NUTS trajectories consistently produced numerical divergences, confirming that generic HMC is ill-suited to this model class.

3 Variational Inference for MIDAS Regression

3.1 Mean-field approximation

We approximate the posterior (12) with

    q(\xi, \eta_1, \dots, \eta_J, \sigma^2) = q(\xi) \prod_{j=1}^{J} q(\eta_j) \, q(\sigma^2),    (13)

chosen to maximize the evidence lower bound (ELBO),

    \mathcal{L}(q) = \mathbb{E}_q[\log p(y, \xi, \eta, \sigma^2)] - \mathbb{E}_q[\log q(\xi, \eta, \sigma^2)] \le \log p(y).    (14)

The intercept and impact coefficients are kept in a joint block ξ, preserving their posterior correlations.
The weight parameters η_j are separated from ξ and from each other; this is the mean-field approximation, which trades cross-block correlations for tractability. Under the standard mean-field result (Blei et al., 2017), the optimal factor q_m^* satisfies

    \log q_m^*(\psi_m) = \mathbb{E}_{-m}[\log p(y, \psi)] + \text{const}.    (15)

3.2 Block 1: Regression coefficients

Define the effective regressor vector \tilde{x}_t = (1, \tilde{x}_t^{(1)}, \dots, \tilde{x}_t^{(J)})^\top \in \mathbb{R}^{J+1}.

Proposition 1. Under the mean-field factorization (13), the optimal variational distribution for the regression coefficients is q^*(\xi) = N(\mu_\xi, \Sigma_\xi) with

    \Sigma_\xi = \left( \mathbb{E}_q[\sigma^{-2}] \sum_{t=1}^{T} \mathbb{E}_q[\tilde{x}_t \tilde{x}_t^\top] + \Lambda_\xi \right)^{-1},    (16)

    \mu_\xi = \Sigma_\xi \left( \mathbb{E}_q[\sigma^{-2}] \sum_{t=1}^{T} y_t \, \mathbb{E}_q[\tilde{x}_t] \right),    (17)

where \Lambda_\xi = \mathrm{diag}(\sigma_\alpha^{-2}, \sigma_\beta^{-2}, \dots, \sigma_\beta^{-2}) is the prior precision matrix.

The required moments are

    \mathbb{E}_q[\tilde{x}_t^{(j)}] = c_t^{(j)} + r_t^{(j)\top} \mu_{\eta_j} \quad \text{and} \quad \mathbb{E}_q[(\tilde{x}_t^{(j)})^2] = \mathbb{E}_q[\tilde{x}_t^{(j)}]^2 + r_t^{(j)\top} \Sigma_{\eta_j} r_t^{(j)}.    (18)

The variance correction term r_t^{(j)\top} \Sigma_{\eta_j} r_t^{(j)} propagates uncertainty about η_j into the update of ξ, distinguishing this from a plug-in approach.

Proof. Applying the CAVI rule (15) to ξ and collecting quadratic terms from \mathbb{E}_{-\xi}[\log p(y \mid \xi, \eta, \sigma^2)] yields a Gaussian with the stated precision (16). The second-moment matrix \mathbb{E}_q[\tilde{x}_t \tilde{x}_t^\top] arises from completing the square in ξ, where the off-diagonal entries couple α and each β_j through \mathbb{E}_q[\tilde{x}_t^{(j)}], and the diagonal entries involve (18). The linear term in ξ gives the mean (17). □

3.3 Block 2: Weight parameters

For each predictor j = 1, ..., J, define the partial residual

    \bar{e}_t^{(-j)} = y_t - \mu_\alpha - \sum_{j' \neq j} \mu_{\beta_{j'}} \, \mathbb{E}_q[\tilde{x}_t^{(j')}] - \mu_{\beta_j} c_t^{(j)}.    (19)

Proposition 2.
Under the mean-field factorization (13), the optimal variational distribution for the weight parameters of predictor j is q^*(\eta_j) = N(\mu_{\eta_j}, \Sigma_{\eta_j}) with

    \Sigma_{\eta_j} = \left( \mathbb{E}_q[\sigma^{-2}] \, \mathbb{E}_q[\beta_j^2] \sum_{t=1}^{T} r_t^{(j)} r_t^{(j)\top} + \sigma_\eta^{-2} I_{P_j - 1} \right)^{-1},    (20)

    \mu_{\eta_j} = \Sigma_{\eta_j} \cdot \mathbb{E}_q[\sigma^{-2}] \left( \mu_{\beta_j} \sum_{t=1}^{T} r_t^{(j)} \bar{e}_t^{(-j)} - \sum_{t=1}^{T} r_t^{(j)} g_t^\top \Sigma_\xi e_{j+1} \right),    (21)

where g_t = \mathbb{E}_q[\tilde{x}_t], e_{j+1} is the (j+1)-th canonical basis vector, and \mathbb{E}_q[\beta_j^2] = \mu_{\beta_j}^2 + (\Sigma_\xi)_{j+1, j+1}.

Three features of this update merit emphasis. First, the precision (20) is scaled by \mathbb{E}_q[\beta_j^2], which includes the variance of β_j. When β_j is close to zero, the prior dominates and weights revert to the baseline: automatic regularization of irrelevant predictors. Second, the mean (21) contains a covariance correction term -\sum_t r_t^{(j)} g_t^\top \Sigma_\xi e_{j+1}, arising because β_j is jointly distributed with the other components of ξ. Omitting this correction would forfeit the benefits of the joint (α, β) treatment. Third, the matrix inversion in (20) has dimension (P_j − 1) × (P_j − 1), typically 2 or 3, making the per-predictor update cost negligible.

Proof. See Online Appendix A.1.

3.4 Block 3: Error variance

Proposition 3. Under the mean-field factorization, the optimal variational distribution for the error variance is q^*(\sigma^2) = \text{Inverse-Gamma}(\tilde{a}, \tilde{b}) with

    \tilde{a} = a_0 + T/2, \qquad \tilde{b} = b_0 + \frac{1}{2} \sum_{t=1}^{T} \mathbb{E}_q[e_t^2],    (22)

where \mathbb{E}_q[e_t^2] = y_t^2 - 2 y_t \, \mathbb{E}_q[\tilde{x}_t]^\top \mu_\xi + \mathrm{tr}\left( \mathbb{E}_q[\tilde{x}_t \tilde{x}_t^\top] \cdot (\mu_\xi \mu_\xi^\top + \Sigma_\xi) \right).

Proof. Applying the CAVI rule (15) to σ², the expected log-joint collects all terms involving σ²: the log-likelihood contributes -\frac{T}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_t \mathbb{E}_q[e_t^2], and the Inverse-Gamma prior contributes -(a_0 + 1) \log \sigma^2 - b_0 / \sigma^2. Combining yields the kernel of an Inverse-Gamma with the stated parameters (22).
The expected squared residual \mathbb{E}_q[e_t^2] expands using \mathbb{E}_q[\xi \xi^\top] = \mu_\xi \mu_\xi^\top + \Sigma_\xi. □

The moments required by the other blocks are \mathbb{E}_q[\sigma^{-2}] = \tilde{a} / \tilde{b} and \mathbb{E}_q[\log \sigma^2] = \log \tilde{b} - \psi(\tilde{a}).

3.5 The CAVI algorithm

Algorithm 1: CAVI for Bayesian MIDAS Regression
Input: data {y, x_t^{(j)}}; basis matrices {Φ_j}; hyperparameters; tolerance ε.
Output: variational parameters μ_ξ, Σ_ξ; {μ_{η_j}, Σ_{η_j}}; ã, b̃; ELBO trace.
Preprocessing: compute θ_j^0, N_j, c_t^{(j)}, r_t^{(j)} for all j, t.
Initialize: μ_ξ from OLS with uniform weights (see Appendix A.4); μ_{η_j} = 0; ã = a_0 + T/2.
repeat
    for j = 1, ..., J: update q*(η_j) via (20)–(21)
    update q*(ξ) via (16)–(17)
    update q*(σ²) via (22)
    compute L(q) via (14)
until |ΔL| / |L| < ε

Proposition 4 (Monotone convergence). The ELBO sequence {L^{(t)}} generated by Algorithm 1 satisfies L^{(t+1)} ≥ L^{(t)} for all t, and converges to a fixed point.

Proof. Each coordinate update maximizes the ELBO over the corresponding factor while holding all other factors fixed. Since the ELBO is bounded above by log p(y), the monotonically non-decreasing sequence converges. This is a standard property of CAVI with exponential family conditionals; see Blei et al. (2017), Section 2.4. □

Across 10,500 CAVI fits in our Monte Carlo study, no ELBO violation was observed.

3.6 Computational complexity

The dominant cost per iteration is the (J+1) × (J+1) matrix inversion in the ξ update, which is O(J³). The J inversions in the η updates are each (P_j − 1) × (P_j − 1), negligible for P_j ≤ 4. Convergence requires 3–70 iterations (3 for J = 1, up to 70 for J = 50). With T = 200, J = 50, P = 3, a complete CAVI run takes approximately 1.4 seconds on a standard laptop.

3.7 Mean-field underdispersion

The mean-field factorization ignores Cov(β_j, η_j).
This produces variational marginals more concentrated than the true posterior: underdispersion (Blei et al., 2017). The effect is asymmetric: the impact coefficients in ξ are correlated with all J weight vectors simultaneously, while each η_j is primarily correlated with its own β_j. As J grows, the cumulative ignored correlations increase for ξ but remain approximately constant for each η_j, explaining why coverage degrades for β but not for η (Section 4).

A partial mitigation operates through the second-moment coupling: \mathbb{E}_q[\beta_j^2] in (20) propagates the uncertainty of β_j into the calibration of η_j. No analogous mechanism compensates β_j at the individual coefficient level.

4 Monte Carlo Simulation Study

4.1 Design

We evaluate CAVI against a block Gibbs sampler across 21 DGP configurations organized in three tiers, with 500 replications each (10,500 CAVI fits; 9,500 Gibbs fits, the Gibbs sampler being omitted for J ∈ {25, 50} due to prohibitive computation time). Tier 1 varies J ∈ {1, 3, 5, 10, 25, 50} and T ∈ {50, 100, 200, 400}; Tier 2 examines weight profiles, signal-to-noise ratios, basis functions, and frequency ratios; Tier 3 contains stress tests. For configurations with J > 1, half the impact coefficients are zero.

4.2 Main results: scalability in J

Table 1 reports results as J increases from 1 to 50, with T = 200, K = 9, P = 3.

Table 1: CAVI versus Gibbs across number of predictors J. Metrics averaged over active β_j, 500 replications (50 for J = 25 Gibbs). Standard errors in parentheses. T = 200, K = 9, P = 3.
J    Method   Bias(β)        RMSE(β)        Cov95(β)       Bias(η)        Cov95(η)       Time (s)   Speedup    ESS
1    CAVI     0.029 (.007)   0.158 (.004)   0.894 (.014)   0.001 (.001)   0.929 (.011)   0.001      1,772×     —
1    Gibbs    0.032 (.007)   0.159 (.004)   0.934 (.011)   0.002 (.001)   0.936 (.011)   1.82       —          4,103
3    CAVI     0.056 (.004)   0.173 (.003)   0.836 (.017)   0.005 (.002)   0.945 (.010)   0.006      645×       —
3    Gibbs    0.078 (.004)   0.187 (.003)   0.903 (.013)   0.008 (.002)   0.957 (.009)   3.59       —          1,066
5    CAVI     0.087 (.003)   0.177 (.002)   0.594 (.022)   0.030 (.003)   0.953 (.009)   0.023      283×       —
5    Gibbs    0.103 (.003)   0.181 (.002)   0.846 (.016)   0.032 (.002)   0.974 (.007)   6.54       —          701
10   CAVI     0.079 (.003)   0.178 (.002)   0.602 (.022)   0.050 (.003)   0.963 (.010)   0.066      238×       —
10   Gibbs    0.096 (.003)   0.181 (.002)   0.850 (.016)   0.047 (.002)   0.986 (.007)   15.56      —          539
25   CAVI     0.086 (.003)   0.179 (.002)   0.581 (.022)   0.073 (.003)   0.971 (.010)   0.322      107×       —
25   Gibbs    0.159          0.186          0.828          —              —              34.44      —          134
50   CAVI     0.103 (.003)   0.196 (.003)   0.550 (.022)   0.078 (.002)   0.980 (.009)   1.383      —          —

Point estimates. Bias and RMSE are comparable between CAVI and Gibbs across all configurations where both are available. The difference in bias is at most 0.03, confirming the variational posterior mean as a reliable point estimator.

Speed. Speedup ranges from 107× (J = 25) to 1,772× (J = 1), as visualized in Figure 1. The advantage grows with dimension: at J = 25, CAVI completes in 0.3 seconds versus 34 seconds for Gibbs. At J = 50, CAVI completes in 1.4 seconds while the Gibbs sampler would require an estimated 60+ seconds (Figure 8).

Figure 1: Computational speedup of CAVI over the Gibbs sampler as a function of the number of predictors J. Solid markers: measured. Open markers: extrapolated from the Gibbs scaling pattern. Even at J = 50, CAVI maintains a speedup of approximately 53× (extrapolated).

Coverage of η. Coverage remains above 92% across all configurations, reaching 98% at J = 50.
MIDAS weight profiles estimated by CAVI are well-calibrated.

Coverage of β. Coverage declines from 89.4% (J = 1) to 55.0% (J = 50), concentrated at J ≥ 5. Two contextual observations: (i) Gibbs itself achieves only 85–93% rather than 95%, reflecting the challenging posterior geometry; (ii) the coverage gap reflects narrower intervals, not biased centers.

Gibbs ESS degradation. The minimum ESS declines from 4,103 to 539 as J grows from 1 to 10, and further to 134 at J = 25 (Figure 2), indicating that fewer than 5% of draws are effectively independent at J = 25.

Figure 2: Gibbs sampler minimum effective sample size (ESS) as a function of J. The decline from 4,103 (J = 1) to 539 (J = 10) and further to 134 (J = 25, 50 replications) indicates increasing autocorrelation in the MCMC chain as the bilinear coupling involves more blocks.

4.3 Main results: sample size T

Table 2 reports results as T varies from 50 to 400, with J = 3.

Table 2: CAVI versus Gibbs across sample size T. 500 replications. Standard errors in parentheses. J = 3, K = 9, P = 3.

T     Method   Bias(β)        RMSE(β)        Cov95(β)       Bias(η)        Cov95(η)       Time (s)   Speedup    ESS
50    CAVI     0.216 (.009)   0.425 (.007)   0.584 (.022)   0.023 (.003)   0.931 (.011)   0.015      303×       —
50    Gibbs    0.278 (.009)   0.444 (.007)   0.807 (.018)   0.034 (.002)   0.977 (.007)   4.47       —          681
100   CAVI     0.130 (.006)   0.274 (.004)   0.729 (.020)   0.010 (.003)   0.950 (.010)   0.011      416×       —
100   Gibbs    0.170 (.006)   0.297 (.004)   0.860 (.016)   0.009 (.002)   0.975 (.007)   4.39       —          794
200   CAVI     0.056 (.004)   0.173 (.003)   0.836 (.017)   0.005 (.002)   0.945 (.010)   0.006      645×       —
200   Gibbs    0.078 (.004)   0.187 (.003)   0.903 (.013)   0.008 (.002)   0.957 (.009)   3.59       —          1,066
400   CAVI     0.028 (.003)   0.118 (.002)   0.869 (.015)   0.003 (.001)   0.932 (.011)   0.003      1,504×     —
400   Gibbs    0.036 (.003)   0.125 (.002)   0.929 (.012)   0.009 (.001)   0.940 (.011)   4.60       —          2,022

CAVI coverage on β increases from 58.4% (T = 50) to 86.9% (T = 400), approaching
the Gibbs benchmark. At T = 400 the gap narrows to 6 percentage points. The speedup is largest when T is large: 1,504× at T = 400, because CAVI converges in only 5 iterations while Gibbs remains at 5,000 draws. Figure 3 visualizes the coverage convergence.

Figure 3: Empirical 95% coverage of active β_j parameters as a function of sample size T, with J = 3. Error bars show ±1 standard error across replications. Both methods improve with T. The CAVI–Gibbs gap narrows from approximately 10 percentage points at T = 100 to 6 percentage points at T = 400. The T = 50 CAVI point (58.4%, Table 2) falls below the displayed range.

4.4 Robustness and stress tests

Weight profiles (configs 2A-1 to 2A-3). Decreasing, hump-shaped, and U-shaped profiles are recovered accurately by both methods. Figure 4 illustrates the recovery for the baseline J = 3 configuration, confirming that CAVI and Gibbs produce nearly identical weight estimates across distinct profile shapes. Coverage on η exceeds 92% for all shapes.

Signal-to-noise ratio. High SNR (σ₀² = 0.25) yields 90% coverage on β for CAVI; low SNR (σ₀² = 4) reduces it to 75%. Gibbs coverage also declines (95% to 86%), indicating model-inherent difficulty.

Frequency ratio. At K = 65 (quarterly-to-daily), both CAVI and Gibbs struggle: β coverage drops to 48–50% for both, confirming a DGP/basis limitation rather than an algorithm failure.

Full robustness and stress test tables are in Online Appendix A.3.

4.5 Comparison with automatic differentiation VI

A natural question is whether generic Automatic Differentiation Variational Inference (ADVI; Kucukelbir et al., 2017) could replace our model-specific CAVI. We compare against both mean-field ADVI and full-rank ADVI implemented in PyMC 5 across J ∈ {1, 3, 5} with 5 replications each (10,000 gradient steps per fit).
Figure 4: Weight profile recovery for a representative replication (J = 3, T = 200). Left: decreasing profile (β₁ = 2.0). Center: hump-shaped profile (β₂ = −1.0). Right: decreasing profile (β₃ = 0.5). CAVI (blue) and Gibbs (red) estimates are nearly indistinguishable from the true profile (black) for strong predictors.

Table 3: CAVI versus generic ADVI (mean-field and full-rank). 5 replications per configuration. T = 200, K = 9, P = 3.

J   Method    Bias(β)   RMSE(β)   Time (s)   Speedup vs ADVI-MF
1   CAVI      0.128     0.235     0.000      >100,000×
1   ADVI-MF   1.851     1.851     136.0      —
1   ADVI-FR   1.972     1.972     137.3      —
3   CAVI      0.128     0.157     0.009      24,644×
3   ADVI-MF   1.381     1.457     221.8      —
3   ADVI-FR   1.467     1.547     302.0      —
5   CAVI      0.164     0.226     0.022      1,934×
5   ADVI-MF   1.074     1.219     42.6       —
5   ADVI-FR   1.149     1.302     61.9       —

CAVI dominates ADVI on both dimensions. Accuracy: ADVI bias is 7–14 times larger than CAVI across all configurations, indicating that stochastic gradient optimization fails to find the correct posterior mode for the MIDAS bilinear structure. The ADVI loss landscapes are likely plagued by the same funnel geometry that defeats NUTS. Speed: CAVI is 2,000–100,000 times faster. The advantage is even more dramatic than in the CAVI–Gibbs comparison because ADVI requires thousands of gradient evaluations through automatic differentiation, whereas CAVI exploits closed-form conditional conjugacy.

These results confirm that the contribution of this paper is not merely computational speedup over MCMC, but the derivation of model-specific updates that exploit conditional structure unavailable to black-box methods.

4.6 Discussion

The Monte Carlo evidence supports four conclusions. First, CAVI is a reliable estimator of posterior means: bias and RMSE match Gibbs across all configurations, including J = 25 where the Gibbs benchmark is now available (Table 1).
Second, CAVI dominates generic ADVI by a wide margin on both accuracy and speed (Table 3), confirming the value of model-specific variational derivations. Third, weight function parameters are well calibrated (coverage > 92%). Fourth, impact coefficient credible intervals exhibit mean-field underdispersion: moderate for J ≤ 3 (coverage ≥ 84%) and substantial for J ≥ 5 (coverage ≈ 55–60%).

For applications focused on point forecasting, variable selection, or weight profile estimation, CAVI offers a compelling alternative. For well-calibrated credible intervals on β in high-dimensional settings, Gibbs sampling or structured VI (Loaiza-Maya et al., 2022) should be preferred.

Practical coverage guidance. For practitioners requiring uncertainty quantification on β, we offer the following guidance based on the full Monte Carlo evidence. For J ≤ 3 with T ≥ 200, CAVI credible intervals achieve coverage above 83%, adequate for most applications. A simple post-hoc calibration, inflating the variational standard deviation by a multiplier κ, restores nominal coverage for J ≤ 3: a multiplier of κ = 1.2 suffices for J = 1 (calibrated coverage 94.2%) and κ = 1.8 for J = 3 (calibrated coverage 94.9%). For J ≥ 5, the mean-field approximation becomes inadequate for interval estimation: even a multiplier of κ = 3.0 reaches only 86% coverage at J = 5 and 81% at J = 50, because the underdispersion is non-uniform across parameters. In such settings, we recommend using the Gibbs sampler for inference on β or, when computational constraints preclude MCMC, adopting the linear response corrections of Giordano et al. (2018), which adjust variational covariances using second-order sensitivity of the ELBO. Extension to structured variational families that capture Cov(β_j, η_j) is a natural direction for future work.
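The post-hoc calibration just described amounts to widening each variational interval by a constant factor before checking coverage. A minimal sketch, with the κ multipliers quoted above hard-coded for illustration; `mu` and `sd` are hypothetical CAVI posterior summaries:

```python
import numpy as np
from scipy.stats import norm

def calibrated_interval(mu, sd, kappa, level=0.95):
    """Credible interval with the variational standard deviation
    inflated by kappa (post-hoc calibration)."""
    z = norm.ppf(0.5 + level / 2)
    half = z * kappa * np.asarray(sd)
    return np.asarray(mu) - half, np.asarray(mu) + half

# kappa values quoted in the text for small J (illustrative lookup)
KAPPA = {1: 1.2, 3: 1.8}

mu = np.array([2.0, -1.0, 0.5])    # hypothetical posterior means
sd = np.array([0.10, 0.12, 0.15])  # hypothetical variational sds
lo, hi = calibrated_interval(mu, sd, KAPPA[3])
print(np.round(hi - lo, 3))  # interval widths -> [0.706 0.847 1.058]
```

Because κ is a single scalar per configuration, this correction can only fix uniform underdispersion, which is why it fails for J ≥ 5 where the underdispersion varies across parameters.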
5 Empirical Application: Realized Volatility Forecasting

5.1 Setup

We apply the MIDAS-RV specification of Ghysels et al. (2006) to forecast monthly realized volatility (RV) of the S&P 500 index:

$$\log RV_t = \alpha + \beta \sum_{k=0}^{K-1} w(k;\theta)\, r_{t-k}^2 + \varepsilon_t, \qquad (23)$$

where $RV_t = \sum_{d=1}^{D_t} r_{t,d}^2$ is the sum of squared daily returns within month t. We use K = 22 daily lags and Almon polynomials with P = 3, and consider J = 1 and J = 3 specifications. In the J = 3 specification, the three predictors correspond to daily squared returns from the current month (t), the previous month (t − 1), and two months prior (t − 2), each aggregated with its own MIDAS weight function. Benchmarks include HAR-RV (Corsi, 2009), AR(1), AR(4), and the historical average.

Figure 5 displays the S&P 500 monthly realized volatility series from January 2000 through December 2025 (312 months, 6,537 trading days). The dashed vertical line marks the start of the out-of-sample evaluation period (187 months, expanding window from a 120-month initial training sample).

Figure 5: S&P 500 monthly realized volatility (2000–2025). Top: raw RV_t. Bottom: log RV_t. Dashed line: start of out-of-sample period. Prominent spikes correspond to the 2008 financial crisis and the 2020 COVID-19 shock.

5.2 Forecast accuracy

Table 4 reports out-of-sample forecast accuracy. Diebold–Mariano tests (Diebold and Mariano, 1995) are against HAR-RV.

Table 4: Out-of-sample forecast accuracy (187 months). MSE and MAE on log RV_t.

Model                 MSE    MAE    Rel. MSE vs HAR  DM p-value  Time/month
MIDAS J = 1 (CAVI)    0.720  0.651  1.058            0.226       0.001 s
MIDAS J = 1 (Gibbs)   0.720  0.651  1.058            0.224       1.13 s
MIDAS J = 3 (CAVI)    0.688  0.633  1.011            0.864       0.009 s
MIDAS J = 3 (Gibbs)   0.702  0.635  1.031            0.633       2.48 s
HAR-RV                0.680  0.645  1.000            —           < 0.001 s
AR(1)                 0.713  0.657  1.048            0.045       < 0.001 s
AR(4)                 0.711  0.653  1.046            0.358       < 0.001 s
Historical Avg        0.997  0.802  1.465            0.000       < 0.001 s

CAVI–Gibbs agreement. The MSE difference between CAVI and Gibbs is 0.00006 for J = 1 (relative difference: 0.01%), confirming the Monte Carlo finding. Figure 6 shows that MIDAS J = 1 (CAVI) tracks the actual log-RV series closely, with performance comparable to HAR-RV.

Forecasting performance. MIDAS J = 3 nearly matches HAR-RV (relative MSE = 1.011; DM p = 0.864). Neither MIDAS specification is significantly worse than HAR at conventional levels, while AR(1) is significantly inferior (p = 0.045) and the historical average is strongly dominated (p < 0.001).

Figure 6: Out-of-sample forecasts of log RV_t (2010–2025). Black: actual. Blue: MIDAS J = 1 (CAVI). Gray dashed: HAR-RV. Both models track the actual series closely, with large forecast errors concentrated around the March 2020 COVID-19 volatility spike.

Computational time. CAVI completes each monthly estimation in 1 ms (J = 1) or 9 ms (J = 3), yielding speedups of 936× and 264× respectively. In a single-pass estimation such as this application, the absolute time difference (1–2 seconds for Gibbs) may seem inconsequential.
However, the speed advantage becomes decisive in three settings that are increasingly common in applied work: (i) large-scale model comparison exercises scanning over many predictor sets, lag structures, or basis specifications; (ii) real-time nowcasting pipelines with hundreds of candidate predictors, where estimation must complete within a publication cycle; and (iii) expanding-window or rolling-window evaluations with hundreds of re-estimations, as in the present exercise (187 out-of-sample months × multiple specifications). With J = 10 predictors and 500 windows, CAVI completes in approximately 33 seconds versus 2.2 hours for Gibbs.

5.3 Estimated weight profiles

Figure 7 displays the estimated MIDAS weight profile from the full-sample J = 1 estimation. Both CAVI and Gibbs produce a monotonically decreasing profile, assigning the greatest weight to the most recent daily observations. This is economically sensible, as recent squared returns are more informative about current-month volatility. The shaded regions show 95% credible bands. The CAVI band (purple) is wider than the Gibbs band (pink), reflecting the different uncertainty propagation mechanisms: CAVI inflates the effective variance through the E_q[β²] term in (20), which includes the variance of β, while the Gibbs band is computed conditionally on each posterior draw of β. The point estimates are virtually identical.

Figure 7: Estimated MIDAS weight profiles (J = 1, full sample). Blue: CAVI posterior mean. Red: Gibbs posterior mean. Shaded regions: 95% credible bands. Dashed green: uniform weights. Point estimates are virtually identical. The monotonically decreasing profile assigns greater weight to recent daily returns, consistent with the intuition that recent squared returns are more informative about current-month volatility.
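For readers who want to reproduce a weight profile like the one in Figure 7, the sketch below builds normalized Almon-polynomial MIDAS weights over K = 22 daily lags. The sum-to-one normalization convention and the illustrative η values are assumptions for exposition; the paper's exact linear parameterization is defined in its model section and may differ in detail:

```python
import numpy as np

def almon_basis(K, P):
    """K x P polynomial basis: column p holds k**p for lags k = 0..K-1."""
    k = np.arange(K)
    return np.column_stack([k**p for p in range(P)])

def midas_weights(eta, K):
    """Linear-Almon weights w(k) proportional to (Q @ eta), rescaled
    to sum to one. The normalization is an expository assumption."""
    raw = almon_basis(K, len(eta)) @ eta
    return raw / raw.sum()

# illustrative eta giving a decreasing profile, as in Figure 7
eta = np.array([1.0, -0.04, 0.0005])
w = midas_weights(eta, 22)
print(round(w.sum(), 6), bool(w[0] > w[-1]))  # -> 1.0 True
```

The linearity of `raw` in η is what the paper exploits: conditional on β, the weighted regressor is linear in the weight parameters, preserving conjugacy.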
The J = 3 specification tells a complementary story (Figure 9 in the Appendix): Predictor 1 (current month) retains the same decreasing profile, while Predictors 2 and 3 (months t − 1 and t − 2) have weights concentrated near zero with wide credible bands, indicating negligible marginal contribution, consistent with the modest forecasting gain of J = 3 over J = 1 (Table 4).

6 Conclusion

This paper develops the first variational inference algorithm for Bayesian MIDAS regression. The CAVI algorithm exploits the conditional conjugacy of the linearly parameterized model to produce closed-form updates for all parameter blocks, propagating uncertainty across blocks through second moments and including a covariance correction specific to the MIDAS bilinear setting.

A Monte Carlo study across 21 configurations demonstrates that CAVI posterior means are essentially identical to those from a block Gibbs sampler (now confirmed through J = 25) while achieving speedups of 107× to 1,772× (Table 9). Generic ADVI, by contrast, produces bias 7–14 times larger while being orders of magnitude slower, confirming that model-specific derivations are essential for this model class. Weight parameters are well calibrated (coverage above 92%). Impact coefficient credible intervals exhibit mean-field underdispersion, with coverage declining from 89% to 55% as the predictor dimension grows, an honestly documented trade-off.

Three directions for future work follow naturally. First, extension to time-varying parameter MIDAS requires structured variational approximations (Loaiza-Maya et al., 2022) that capture temporal dependence, for example targeting GDP nowcasting applications where parameter drift is economically motivated.
Second, global-local shrinkage priors (Bayesian Lasso, horseshoe, Dirichlet-Laplace) can be incorporated with minimal additional derivation, enabling principled variable selection in large MIDAS panels. Third, stochastic volatility can be accommodated through the Gaussian approximation methods of Chan and Yu (2022).

References

Blei, D.M., Kucukelbir, A. and McAuliffe, J.D. (2017). Variational inference: a review for statisticians. Journal of the American Statistical Association, 112(518), 859–877.

Chan, J.C.C. and Jeliazkov, I. (2009). Efficient simulation and integrated likelihood estimation in state space models. International Journal of Mathematical Modelling and Numerical Optimisation, 1(1/2), 101–120.

Chan, J.C.C. and Yu, X. (2022). Fast and accurate variational inference for large Bayesian VARs with stochastic volatility. Journal of Economic Dynamics and Control, 143, 104505.

Chan, J.C.C., Poon, A. and Zhu, D. (2025). Time-varying parameter MIDAS models. Journal of Econometrics.

Corsi, F. (2009). A simple approximate long-memory model of realized volatility. Journal of Financial Econometrics, 7(2), 174–196.

Diebold, F.X. and Mariano, R.S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3), 253–263.

Foroni, C., Marcellino, M. and Schumacher, C. (2015). Unrestricted mixed data sampling (U-MIDAS): MIDAS regressions with unrestricted lag polynomials. Journal of the Royal Statistical Society: Series A, 178(1), 57–82.

Gefang, D., Koop, G. and Poon, A. (2020). Computationally efficient inference in large Bayesian mixed frequency VARs. Economics Letters, 191, 109120.

Gefang, D., Koop, G. and Poon, A. (2023). Forecasting using variational Bayesian inference in large vector autoregressions with hierarchical shrinkage. International Journal of Forecasting, 39(1), 346–363.

Ghysels, E., Santa-Clara, P. and Valkanov, R. (2006). Predicting volatility: getting the most out of return data sampled at different frequencies. Journal of Econometrics, 131(1–2), 59–95.

Giordano, R., Broderick, T. and Jordan, M.I. (2018). Covariances, robustness, and variational Bayes. Journal of Machine Learning Research, 19(51), 1–49.

Hoffman, M.D. and Gelman, A. (2014). The No-U-Turn Sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15, 1593–1623.

Kohns, D. and Potjagailo, G. (2025). Flexible Bayesian MIDAS: time variation, group shrinkage and sparsity. Journal of Business & Economic Statistics, 43(4), 1034–1050.

Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A. and Blei, D.M. (2017). Automatic differentiation variational inference. Journal of Machine Learning Research, 18, 1–45.

Loaiza-Maya, R., Smith, M.S., Nott, D.J. and Danaher, P.J. (2022). Fast and accurate variational inference for models with many latent variables. Journal of Econometrics, 230(2), 339–362.

Mogliani, M. and Simoni, A. (2021). Bayesian MIDAS penalized regressions: estimation, selection, and prediction. Journal of Econometrics, 222(1), 833–860.

Neal, R.M. (2003). Slice sampling. Annals of Statistics, 31(3), 705–767.

Pettenuzzo, D., Timmermann, A. and Valkanov, R. (2016). A MIDAS approach to modeling first and second moment dynamics. Journal of Econometrics, 193(2), 315–334.

Schumacher, C. (2014). MIDAS and bridge equations. Discussion Paper No. 26/2014, Deutsche Bundesbank.

Online Appendix

A.1 Proof of Proposition 2

Starting from the CAVI rule (15), the log-optimal density for η_j is obtained by isolating the terms that depend on η_j in the expected log-joint. Writing $e_t = e_t^{(-j)} - \beta_j r_t^{(j)\top} \eta_j$ and expanding, the quadratic term in η_j gives

$$-\frac{1}{2}\,\mathbb{E}_q[\sigma^{-2}]\,\mathbb{E}_q[\beta_j^2] \sum_t \eta_j^\top r_t^{(j)} r_t^{(j)\top} \eta_j \;-\; \frac{1}{2}\,\sigma_\eta^{-2}\,\eta_j^\top \eta_j,$$

yielding the precision matrix (20).

For the linear term, the key calculation is $\mathbb{E}_q[\beta_j\, e_t^{(-j)}]$. Since $\beta_j = e_{j+1}^\top \xi$:

$$\mathbb{E}_q[\beta_j (y_t - g_t^\top \xi)] = \mu_{\beta_j} y_t - g_t^\top \mathbb{E}_q[\xi \xi^\top] e_{j+1} = \mu_{\beta_j} y_t - g_t^\top (\mu_\xi \mu_\xi^\top + \Sigma_\xi) e_{j+1} = \mu_{\beta_j}\, \bar e_t^{(-j)} - g_t^\top \Sigma_\xi e_{j+1},$$

where the last term is the covariance correction. Collecting terms and applying $\mu_{\eta_j} = \Sigma_{\eta_j} \times (\text{linear coefficient})$ yields (21). □

A.2 ELBO computation

The ELBO decomposes as follows.

Expected log-likelihood:

$$\mathbb{E}_q[\log p(y \mid \xi, \eta, \sigma^2)] = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\bigl(\log \tilde b - \psi(\tilde a)\bigr) - \frac{\tilde a}{2\tilde b} \sum_t \mathbb{E}_q[e_t^2].$$

Expected log-priors:

$$\mathbb{E}_q[\log p(\xi)] = -\frac{J+1}{2}\log(2\pi) - \frac{1}{2}\log|\Lambda_\xi^{-1}| - \frac{1}{2}\operatorname{tr}\bigl(\Lambda_\xi(\mu_\xi \mu_\xi^\top + \Sigma_\xi)\bigr),$$

$$\mathbb{E}_q[\log p(\eta_j)] = -\frac{P_j - 1}{2}\log(2\pi \sigma_\eta^2) - \frac{1}{2\sigma_\eta^2}\bigl(\lVert \mu_{\eta_j} \rVert^2 + \operatorname{tr}(\Sigma_{\eta_j})\bigr),$$

$$\mathbb{E}_q[\log p(\sigma^2)] = a_0 \log b_0 - \log \Gamma(a_0) - (a_0 + 1)\bigl(\log \tilde b - \psi(\tilde a)\bigr) - b_0\, \tilde a / \tilde b.$$

Entropies:

$$\mathcal{H}[q(\xi)] = \frac{J+1}{2}\bigl(1 + \log(2\pi)\bigr) + \frac{1}{2}\log|\Sigma_\xi|,$$

$$\mathcal{H}[q(\eta_j)] = \frac{P_j - 1}{2}\bigl(1 + \log(2\pi)\bigr) + \frac{1}{2}\log|\Sigma_{\eta_j}|,$$

$$\mathcal{H}[q(\sigma^2)] = \tilde a + \log\bigl(\tilde b\, \Gamma(\tilde a)\bigr) - (1 + \tilde a)\,\psi(\tilde a).$$

A.3 Additional Monte Carlo results

Table 5: Robustness results (Tier 2, Panel A): weight profile shapes. T = 200, J = 3, P = 3.

Profile       Method  Bias(β)  Cov95(β)  Bias(η)  Cov95(η)  ESS
Decreasing    CAVI    0.051    0.749     0.012    0.921     —
Decreasing    Gibbs   0.077    0.900     0.023    0.946     643
Hump-shaped   CAVI    0.057    0.819     0.010    0.942     —
Hump-shaped   Gibbs   0.079    0.899     0.019    0.961     1,034
U-shaped      CAVI    0.064    0.827     0.006    0.962     —
U-shaped      Gibbs   0.086    0.891     0.009    0.970     1,108

Table 6: Robustness (Panel B): signal-to-noise ratio and frequency ratio.

Variation              Method  Bias(β)  Cov95(β)  Bias(η)  Cov95(η)  ESS
σ₀² = 0.25 (high SNR)  CAVI    0.010    0.899     0.001    0.948     —
                       Gibbs   0.012    0.953     0.002    0.959     3,280
σ₀² = 4 (low SNR)      CAVI    0.112    0.747     0.011    0.943     —
                       Gibbs   0.148    0.861     0.011    0.966     883
K = 5 (quarterly)      CAVI    0.079    0.793     0.010    0.936     —
                       Gibbs   0.112    0.867     0.009    0.957     696
K = 65 (daily)         CAVI    0.332    0.483     0.012    0.952     —
                       Gibbs   0.523    0.497     0.008    0.981     270

Table 7: Stress test results (Tier 3).

T    J   Method  Bias(β)  RMSE(β)  Cov95(β)  Bias(η)  Cov95(η)  ESS
50   10  CAVI    0.240    0.383    0.374     0.068    0.957     —
50   10  Gibbs   0.266    0.382    0.842     0.059    0.993     366
200  10  CAVI    0.156    0.273    0.441     0.026    0.955     —
200  10  Gibbs   0.203    0.293    0.708     0.016    0.981     339
50   3   CAVI    0.350    0.573    0.455     0.038    0.933     —
50   3   Gibbs   0.433    0.599    0.783     0.048    0.980     646

A.4 Prior and initialization sensitivity

Prior sensitivity. Table 8 reports CAVI performance under three settings of the prior variance σ²_β ∈ {5, 10, 20} for the impact coefficients, with J = 3, T = 200, and 100 replications. Results are virtually identical across specifications, confirming robustness to the choice of prior variance over a fourfold range.

Table 8: Prior sensitivity: CAVI performance under different σ²_β. J = 3, T = 200, 100 replications. Standard errors in parentheses.

σ²_β  Bias(β)       RMSE(β)       Cov95(β)      Bias(η)  Cov95(η)
5     0.128 (.007)  0.140 (.008)  0.875 (.033)  0.021    0.932
10    0.128 (.007)  0.140 (.008)  0.880 (.032)  0.021    0.932
20    0.127 (.007)  0.140 (.008)  0.880 (.032)  0.021    0.932

Initialization sensitivity. We compare the default OLS warm-start initialization against random initialization (μ_ξ ∼ N(0, 0.01 I), μ_{η_j} ∼ N(0, 0.01 I)) over 50 replications with J = 3, T = 200. OLS initialization yields substantially better results: mean absolute bias of 0.121 (SE 0.011) versus 0.634 (SE 0.071) for random initialization, with the maximum ELBO difference across replications reaching 76.3.
This sensitivity arises because the ELBO landscape of the bilinear model has multiple local optima; the OLS warm-start directs the algorithm toward the basin containing the global optimum. We recommend OLS initialization as the default for all applications.

A.5 Computational timing

Table 9 reports the mean CAVI and Gibbs estimation times across all Monte Carlo configurations, together with speedup factors and the minimum effective sample size (ESS) of the Gibbs chain. The speedup ranges from 107× (J = 25, 50 replications) to 1,772× (J = 1). For J = 50, Gibbs timing is not reported, as the computational cost was prohibitive.

Table 9: Computational timing across Monte Carlo configurations. CAVI and Gibbs times are per-fit averages (500 replications, except J = 25 Gibbs: 50 replications).

Config          CAVI (s)  Iters  Gibbs (s)  Speedup  Min ESS
1A-1 (J = 1)    0.001     3      1.82       1,772×   4,103
1A-2 (J = 3)    0.006     10     3.59       645×     1,066
1A-3 (J = 5)    0.023     29     6.54       283×     701
1A-4 (J = 10)   0.066     37     15.56      238×     539
1A-5 (J = 25)   0.322     49     34.44      107×     134
1A-6 (J = 50)   1.383     67     —          —        —
2D-3 (K = 65)   0.030     59     4.06       136×     270

A.6 ELBO values

Table 10 reports the final ELBO values across all Monte Carlo configurations. The ELBO increases (becomes less negative) with higher SNR and decreases with more predictors, as expected from the additional model complexity.

Table 10: Final ELBO values across Monte Carlo configurations (500 replications).

Config            Mean     Std   Min      Max
1A-1 (J = 1)      −235.0   10.1  −264.9   −210.6
1A-2 (J = 3)      −253.5   9.5   −285.0   −220.1
1A-3 (J = 5)      −263.7   9.5   −290.0   −228.5
1A-4 (J = 10)     −291.3   9.8   −318.3   −263.9
1A-5 (J = 25)     −370.9   8.2   −390.8   −338.2
1A-6 (J = 50)     −503.5   7.7   −527.3   −480.0
2B-1 (high SNR)   −100.6   9.8   −130.4   −70.1
2B-2 (low SNR)    −318.2   9.6   −345.4   −286.8

A.7 Additional figures

Figure 8: Computational scaling: mean estimation time per fit as a function of J (log-log scale). Blue: CAVI. Red: Gibbs (solid: measured; open: extrapolated). The gap widens with J due to Gibbs mixing degradation.

Figure 9: Estimated MIDAS weight profiles (J = 3, full sample). Left: Predictor 1 (month t) shows a monotonically decreasing profile with tight credible bands, consistent with the J = 1 result (Figure 7). Center and right: Predictors 2 and 3 (months t − 1 and t − 2) have weights concentrated near zero with wide credible bands, indicating negligible marginal contribution beyond the current month's squared returns. CAVI (blue) and Gibbs (red) point estimates are virtually identical across all three predictors.

Figure 10: Cumulative out-of-sample squared forecast error (2010–2025). The historical average (pink) is clearly dominated. All other models cluster closely, with a sharp increase around March 2020 (COVID-19). MIDAS J = 3 (CAVI) tracks HAR-RV most closely.

Figure 11: MIDAS-RV (J = 1, CAVI) out-of-sample forecasts with approximate 95% predictive intervals. The March 2020 COVID-19 spike falls outside the interval, as expected for an extreme tail event.
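To make the Appendix A.1 derivation concrete, the closed-form update for q(η_j) can be sketched in a few lines of numpy. All inputs are hypothetical stand-ins for the moments maintained by the full CAVI algorithm, and the variable names are illustrative rather than taken from the paper's code:

```python
import numpy as np

def update_eta_j(R, e_resid, g, mu_beta_j, E_beta2_j,
                 Sigma_xi_col, E_inv_sigma2, sigma2_eta):
    """Closed-form CAVI update for q(eta_j), following Appendix A.1.

    R            : (T, P) rows r_t^{(j)} of basis regressors
    e_resid      : (T,) partial residuals e_t^{(-j)} at posterior means
    g            : (T, J+1) design rows g_t for the coefficient block
    mu_beta_j    : E_q[beta_j]
    E_beta2_j    : E_q[beta_j^2] (mean^2 + variance)
    Sigma_xi_col : (J+1,) column Sigma_xi e_{j+1} (covariance correction)
    E_inv_sigma2 : E_q[sigma^{-2}] = a_tilde / b_tilde
    sigma2_eta   : prior variance of eta_j
    """
    P = R.shape[1]
    # precision (20): E[1/s^2] E[b^2] sum_t r_t r_t' + I / sigma2_eta
    prec = E_inv_sigma2 * E_beta2_j * (R.T @ R) + np.eye(P) / sigma2_eta
    Sigma_eta = np.linalg.inv(prec)
    # linear coefficient: E[1/s^2] sum_t (mu_b e_t - g_t' Sigma_xi e) r_t
    lin = E_inv_sigma2 * (R.T @ (mu_beta_j * e_resid - g @ Sigma_xi_col))
    return Sigma_eta @ lin, Sigma_eta  # mean (21) and covariance

# synthetic shapes only, to show the calling convention
rng = np.random.default_rng(0)
T, P, J = 40, 3, 2
mu, Sig = update_eta_j(rng.standard_normal((T, P)), rng.standard_normal(T),
                       rng.standard_normal((T, J + 1)), 0.8, 0.7,
                       0.01 * np.ones(J + 1), 2.0, 10.0)
print(mu.shape, Sig.shape)  # -> (3,) (3, 3)
```

The E_q[β_j²] factor in the precision (rather than μ_{β_j}²) and the g_t^⊤ Σ_ξ e_{j+1} term in the linear coefficient are what distinguish this update from a naive plug-in approximation.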