Robust Two-Sample Mean Inference under Serial Dependence


We propose robust two-sample tests for comparing means in time series. The framework accommodates a wide range of applications, including structural breaks, treatment-control comparisons, and group-averaged panel data. We first consider series HAR two-sample t-tests, where standardization employs orthonormal basis projections, ensuring valid inference under heterogeneity and nonparametric dependence structures. We propose a Welch-type t-approximation with adjusted degrees of freedom to account for long-run variance heterogeneity across the series. We further develop a series-based HAR wild bootstrap test, extending traditional wild bootstrap methods to the time-series setting. Our bootstrap avoids resampling blocks of observations and delivers superior finite-sample performance.


💡 Research Summary

This paper addresses the fundamental problem of testing equality of two population means when the data are time series that exhibit serial dependence and heteroskedasticity. Classical two‑sample t‑tests and Welch’s test assume independent, normally distributed observations; these assumptions are routinely violated in macro‑economic and financial applications, leading to severe size distortions.
The authors propose a comprehensive framework based on series-based heteroskedasticity and autocorrelation robust (HAR) long-run variance (LRV) estimation. The key idea is to project the residuals of each series onto a set of orthonormal basis functions (e.g., sine and cosine series) defined on the unit interval. For each basis function ℓ, the outer product of the corresponding projection coefficient yields an LRV component Ω̂_{j,ℓ}; averaging across K_j basis functions gives the series-HAR estimator Ω̂_j. Two asymptotic regimes are considered: (i) fixed-K asymptotics, where K_j is held constant as the sample size grows, and (ii) increasing-K asymptotics, where K_j → ∞ but K_j/T_j → 0.
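The construction above can be sketched numerically for a scalar series. The cosine basis φ_ℓ(r) = √2 cos(ℓπr), the midpoint evaluation r_t = (t − 0.5)/T, and the AR(1) demo below are illustrative assumptions; the paper's framework admits other orthonormal bases on the unit interval:

```python
import numpy as np

def series_har_lrv(y, K):
    """Series-HAR long-run variance estimate for a scalar time series.

    Projects the demeaned series onto K orthonormal cosine basis
    functions on [0, 1] and averages the squared projection
    coefficients (a sketch of the estimator described above).
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    u = y - y.mean()                      # demeaned series
    r = (np.arange(1, T + 1) - 0.5) / T   # rescaled time points in (0, 1)
    omega = 0.0
    for ell in range(1, K + 1):
        phi = np.sqrt(2.0) * np.cos(ell * np.pi * r)  # orthonormal basis
        lam = phi @ u / np.sqrt(T)        # projection coefficient Λ_ℓ
        omega += lam ** 2                 # LRV component Ω̂_ℓ = Λ_ℓ²
    return omega / K                      # average over K components

# Demo: AR(1) series with rho = 0.5 and unit innovation variance
rng = np.random.default_rng(0)
e = rng.standard_normal(2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.5 * y[t - 1] + e[t]
print(series_har_lrv(y, K=12))
```

For this AR(1) process the true long-run variance is 1/(1 − ρ)² = 4, so for moderate K the printed estimate should land in that vicinity, while the naive sample variance would target only 4/3.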
In the equal-LRV case (Ω₁ = Ω₂), the authors define a t-statistic t₀,HAR that uses the pooled series-HAR variance estimate. Under fixed-K asymptotics, the numerator and denominator are asymptotically independent, and t₀,HAR converges to a Student-t distribution with K₁+K₂ degrees of freedom. This result (Theorem 1) offers a practical advantage over kernel-based fixed-b methods, whose limiting distributions are nonstandard and require simulation for critical values.
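A sketch of the fixed-K test follows. The K-weighted pooling of the two series-HAR estimates is an illustrative assumption (the paper's exact pooling may differ), and the hard-coded critical value replaces a t-distribution lookup to keep the example dependency-free:

```python
import numpy as np

def series_har_lrv(y, K):
    """Series-HAR LRV via K cosine-basis projections on (0, 1)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    u = y - y.mean()
    r = (np.arange(1, T + 1) - 0.5) / T
    lam = np.array([np.sqrt(2.0) * np.cos(l * np.pi * r) @ u / np.sqrt(T)
                    for l in range(1, K + 1)])
    return float(np.mean(lam ** 2))

def t0_har(y1, y2, K1, K2):
    """Fixed-K two-sample statistic under equal LRVs (sketch).

    Pools the two series-HAR estimates with K-weighted averaging --
    an assumed form, for illustration only -- and returns a statistic
    to be compared with Student-t critical values on K1 + K2 df.
    """
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    T1, T2 = len(y1), len(y2)
    pooled = (K1 * series_har_lrv(y1, K1)
              + K2 * series_har_lrv(y2, K2)) / (K1 + K2)
    return (y1.mean() - y2.mean()) / np.sqrt(pooled * (1.0 / T1 + 1.0 / T2))

# With K1 = K2 = 10, reject at the 5% two-sided level when
# |t| exceeds the Student-t critical value t_{0.975, 20} ≈ 2.086.
```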
When the LRVs differ (Ω₁ ≠ Ω₂), the fixed-K statistic is non-pivotal. Under increasing-K asymptotics, the series-HAR estimator becomes consistent, and a Wald-type statistic t₁,HAR is asymptotically normal. However, strong serial correlation can still cause size distortions. To mitigate this, the authors introduce a Welch-type degrees-of-freedom adjustment that incorporates the heterogeneity of the LRVs. The adjusted degrees of freedom are computed analogously to the classic Welch-Satterthwaite formula but replace sample variances with series-HAR estimates. This yields a t-approximation that retains the familiar Student-t critical values while improving finite-sample accuracy.
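One natural analogue of the Welch-Satterthwaite formula in this setting replaces the sample variances v_j/n_j with Ω̂_j/T_j and the per-sample degrees of freedom n_j − 1 with K_j (the degrees of freedom of each fixed-K series-HAR estimate). The exact adjustment in the paper may differ; this is an illustrative form:

```python
def welch_har_df(omega1, omega2, T1, T2, K1, K2):
    """Welch-Satterthwaite-type degrees of freedom with series-HAR LRVs.

    df* = (a1 + a2)^2 / (a1^2 / K1 + a2^2 / K2),  a_j = omega_j / T_j.
    Illustrative analogue of the classic formula, not taken verbatim
    from the paper: sample variances are swapped for HAR estimates
    and n_j - 1 for K_j.
    """
    a1, a2 = omega1 / T1, omega2 / T2
    return (a1 + a2) ** 2 / (a1 ** 2 / K1 + a2 ** 2 / K2)
```

As a sanity check, when the two samples are perfectly balanced (equal LRVs, lengths, and K_j = K), the formula collapses to 2K, mirroring the classic case where balanced samples give 2(n − 1) degrees of freedom; heterogeneity pulls the adjusted value down toward min(K₁, K₂).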
Beyond analytic approximations, the paper develops a series-HAR wild bootstrap (SHAR-WB). Instead of resampling blocks, the bootstrap generates serially dependent external random weights using the same orthogonal basis functions, ensuring that the bootstrap LRV mirrors the series-HAR estimator's structure. The bootstrap t-statistic, studentized by the bootstrap series-HAR variance, is shown to consistently approximate the distribution of the original t-statistic under large-K asymptotics. Simulation studies demonstrate that SHAR-WB delivers superior size control and power relative to both kernel-based fixed-b bootstrap and block bootstrap methods, while being computationally simpler.
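One plausible construction of such dependent weights, sketched under assumptions, is w_t = K^{−1/2} Σ_ℓ φ_ℓ(t/T) ξ_ℓ with iid standard normal ξ_ℓ and the same cosine basis as before; the unstudentized p-value loop below is a simplification of the paper's studentized procedure:

```python
import numpy as np

def shar_wild_weights(T, K, rng):
    """Serially dependent wild-bootstrap weights (assumed construction).

    Builds weights as a random linear combination of the same K
    orthonormal cosine basis functions used in the series-HAR LRV
    estimator, so the bootstrap variance mirrors its structure.
    """
    r = (np.arange(1, T + 1) - 0.5) / T
    xi = rng.standard_normal(K)                    # iid external draws
    phi = np.sqrt(2.0) * np.cos(np.outer(np.arange(1, K + 1), np.pi * r))
    return phi.T @ xi / np.sqrt(K)                 # (T,) dependent weights

def shar_wb_pvalue(y1, y2, K1, K2, B=499, seed=0):
    """Bootstrap p-value for H0: equal means (simplified sketch).

    The paper studentizes each bootstrap statistic by a bootstrap
    series-HAR variance; this sketch compares raw mean differences
    to keep the example short.
    """
    rng = np.random.default_rng(seed)
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    u1, u2 = y1 - y1.mean(), y2 - y2.mean()        # impose H0 by centering
    stat = abs(y1.mean() - y2.mean())
    count = 0
    for _ in range(B):
        b1 = shar_wild_weights(len(y1), K1, rng) * u1
        b2 = shar_wild_weights(len(y2), K2, rng) * u2
        if abs(b1.mean() - b2.mean()) >= stat:
            count += 1
    return (count + 1) / (B + 1)
```

A nice feature of this weight construction is that the variance of each bootstrap mean equals Ω̂_j/T_j exactly, where Ω̂_j is the series-HAR estimate with the same K_j basis functions, which is what "the bootstrap LRV mirrors the series-HAR estimator's structure" requires. No blocks of observations are ever resampled.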
Monte‑Carlo experiments cover a variety of dependence structures (AR(1), MA, ARMA) and heteroskedastic patterns. Results confirm that (a) the fixed‑K t‑test attains accurate size when LRVs are equal, (b) the Welch‑type adjusted test corrects size distortions under LRV heterogeneity, and (c) SHAR‑WB outperforms traditional bootstrap approaches, especially under strong dependence.
Two empirical applications illustrate the methodology. First, the authors revisit a study on the impact of working‑from‑home (WFH) on employee performance, finding no statistically significant difference between WFH and office‑based groups once serial dependence and heteroskedasticity are accounted for. Second, they analyze three U.S. macro‑economic series (unemployment rate, CPI inflation, real GDP growth) across known structural breaks, again finding no meaningful mean shifts after robust adjustment. These findings contrast with results obtained using conventional t‑tests, underscoring the practical relevance of the proposed methods.
In conclusion, the paper delivers a unified, theoretically sound, and easy‑to‑implement solution for two‑sample mean inference with dependent data. It bridges the gap between fixed‑K series‑HAR inference (which yields standard t‑critical values) and increasing‑K consistency (which handles LRV heterogeneity), and augments both with a novel wild bootstrap that avoids block resampling. Potential extensions include multivariate mean testing, panel data with cross‑sectional dependence, and applications to nonlinear time‑series models.

