Prediction-based inference for integrated diffusions with high-frequency data


We consider parametric inference for an ergodic and stationary diffusion process, when the data are high-frequency observations of the integral of the diffusion process. Such data are obtained via certain measurement devices, or if positions are recorded and speed is modelled by a diffusion. In finance, realized volatility or variations thereof can be used to construct observations of the latent integrated volatility process. Specifically, we assume that the integrated process is observed at equidistant, deterministic time points and consider the high-frequency/infinite horizon asymptotic scenario, where the number of observations, the sampling frequency and the time of the last observation all go to infinity. Subject to mild standard regularity conditions on the diffusion model, we prove the asymptotic existence and uniqueness of a consistent estimator for useful and tractable classes of prediction-based estimating functions. Asymptotic normality of the estimator is obtained under an additional assumption on the rates. The proofs are based on the useful Euler-Itô expansions of transformations of diffusions and integrated diffusions, which we study in some detail.

💡 Research Summary

The paper addresses parametric inference for an ergodic, stationary diffusion process Xₜ when only high‑frequency observations of its time integral Iₜ = ∫₀ᵗ X_s ds are available. Observations are taken at deterministic, equidistant times tᵢⁿ = iΔₙ (i = 0,…,n), yielding the transformed series Y_i = Δₙ⁻¹(I_{t_iⁿ} – I_{t_{i‑1}ⁿ}). The asymptotic framework is high‑frequency and infinite‑horizon: n → ∞, Δₙ → 0, and nΔₙ → ∞. Under mild regularity conditions (smooth drift a(·;θ) and diffusion coefficient b(·;θ), polynomial growth, positivity of b, and ρ‑mixing of Xₜ), the authors develop a comprehensive theory for a class of prediction‑based estimating functions.
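To make the observation scheme concrete, the following is a minimal simulation sketch. It assumes an Ornstein-Uhlenbeck model dXₜ = –θ(Xₜ – μ)dt + σ dWₜ purely for illustration (the summary does not single out this model), and the function name, the parameter defaults and the `substeps` refinement are hypothetical choices. The integral over each observation interval is approximated on a finer grid with the trapezoidal rule.

```python
import numpy as np

def simulate_integrated_ou(n, delta, theta=1.0, mu=0.0, sigma=1.0,
                           substeps=20, seed=0):
    """Simulate a stationary OU path and return (X at t_i, Y_i).

    Illustrative assumption: OU dynamics dX = -theta (X - mu) dt + sigma dW.
    Y_i = (I_{t_i} - I_{t_{i-1}}) / delta, with the integral approximated
    by the trapezoidal rule on a grid of `substeps` points per interval.
    """
    rng = np.random.default_rng(seed)
    h = delta / substeps            # fine-grid step
    m = n * substeps                # number of fine-grid steps
    a = np.exp(-theta * h)          # exact OU autoregression over a step h
    sd = sigma * np.sqrt((1.0 - a**2) / (2.0 * theta))

    x = np.empty(m + 1)
    # Start in the stationary distribution N(mu, sigma^2 / (2 theta)).
    x[0] = mu + sigma / np.sqrt(2.0 * theta) * rng.standard_normal()
    noise = rng.standard_normal(m)
    for k in range(m):
        x[k + 1] = mu + a * (x[k] - mu) + sd * noise[k]

    # Trapezoidal pieces of the integral, then summed per observation interval.
    pieces = 0.5 * (x[:-1] + x[1:]) * h
    increments = pieces.reshape(n, substeps).sum(axis=1)
    y = increments / delta
    return x[::substeps], y

if __name__ == "__main__":
    x_obs, y = simulate_integrated_ou(n=2000, delta=0.05)
    print(len(x_obs), len(y))
    print("sample mean of Y:", y.mean())
```

In the high-frequency, infinite-horizon regime one would let `n` grow while `delta` shrinks and `n * delta` grows; only `y` would be available to the statistician, not `x_obs`.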

Prediction‑based estimating functions G_n(θ) are constructed from a finite collection of functions {f_j} that belong to L²(μ_θ) and have polynomial growth. For each j, the value f_j(Y_i) is projected onto a predictor space spanned by past values {f_j(Y_{i‑k})}_{k=0}^{q_j}. The orthogonal projection yields coefficients π̂_{i‑1,j}(θ) defined by normal equations. The estimator θ̂_n solves G_n(θ) = 0.
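The projection step can be sketched as follows. This is an illustration under stated assumptions, not the paper's construction: the predictor space here is spanned by a constant and q lagged values, and the normal equations are filled with sample moments, whereas in the paper the coefficients are functions of θ computed from the model. The helper names `lagged_design`, `projection_coefficients` and `estimating_function` are hypothetical.

```python
import numpy as np

def lagged_design(v, q):
    """Rows (1, v[i-1], ..., v[i-q]) paired with the targets v[i], i >= q."""
    n = len(v)
    cols = [np.ones(n - q)] + [v[q - k:n - k] for k in range(1, q + 1)]
    return np.column_stack(cols), v[q:]

def projection_coefficients(v, q):
    """Solve the normal equations (Z'Z) a = Z'v using sample moments.

    Assumption: empirical moments stand in for the model moments under theta.
    """
    Z, target = lagged_design(v, q)
    gram = Z.T @ Z / len(target)
    cross = Z.T @ target / len(target)
    return np.linalg.solve(gram, cross)

def estimating_function(v, q, coeffs):
    """Sum over i of Z_{i-1} * (v_i - predictor_i); zero at the projection."""
    Z, target = lagged_design(v, q)
    return Z.T @ (target - Z @ coeffs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.standard_normal(1000).cumsum() * 0.01   # placeholder data
    v = y**2                                        # f(y) = y^2 as an example
    a = projection_coefficients(v, q=2)
    print("coefficients:", a)
    print("estimating function:", estimating_function(v, 2, a))
```

The prediction errors are orthogonal to every element of the predictor space, which is why the estimating function vanishes at the projection coefficients. In the parametric setting the coefficients depend on θ, and solving G_n(θ) = 0 selects the parameter whose implied predictor matches the data in this sense.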

A central technical tool is the Euler‑Itô expansion. Proposition 3.1 provides a decomposition for f(X_{t_iⁿ}) into a leading term, a √Δₙ stochastic term involving a standard normal ε_{1,i}, and a remainder ε_{2,i} whose conditional moments are expressed in terms of the generator L_θ. An analogous expansion for the integrated process is given in Proposition 3.3: f(Y_i) = f(X_{t_{i‑1}ⁿ}) + √Δₙ ∂_x f·b·ξ_{1,i} + ξ_{2,i}, where ξ_{1,i} ∼ N(0,1/3) and ξ_{2,i} has conditional mean Δₙ H_θ f + O(Δₙ^{3/2}). The operator H_θ combines L_θ and a term involving b²∂²_x, reflecting both drift and diffusion contributions of the integrated process.
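The variance 1/3 of the leading Gaussian term has a simple origin: to first order, Y_i – X_{t_{i‑1}ⁿ} is b times the time average of a Brownian motion over an interval of length Δₙ, and Var(Δ⁻¹∫₀^Δ W_s ds) = Δ/3. The Monte Carlo sketch below checks this numerically; the function name and the discretisation settings are illustrative assumptions.

```python
import numpy as np

def scaled_average_brownian(delta, n_paths=20000, n_steps=200, seed=0):
    """Draw samples of (1/delta) * int_0^delta W_s ds / sqrt(delta).

    Each Brownian path is simulated on n_steps equal sub-intervals and the
    integral is approximated by the trapezoidal rule.
    """
    rng = np.random.default_rng(seed)
    h = delta / n_steps
    dw = rng.standard_normal((n_paths, n_steps)) * np.sqrt(h)
    w = np.cumsum(dw, axis=1)
    w = np.hstack([np.zeros((n_paths, 1)), w])      # W_0 = 0
    integral = (0.5 * (w[:, :-1] + w[:, 1:]) * h).sum(axis=1)
    return integral / delta / np.sqrt(delta)

if __name__ == "__main__":
    xi = scaled_average_brownian(delta=0.01)
    # The sample variance should be close to 1/3 and the mean close to 0.
    print("sample mean:", xi.mean())
    print("sample variance:", xi.var())
```

Because of Brownian scaling, the result does not depend on the value of `delta`, which matches the fact that ξ_{1,i} has the same N(0,1/3) law for every Δₙ.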

Section 4 establishes limit theorems for the transformed series using the above expansions together with the ρ‑mixing property, which guarantees exponential decay of autocorrelations and validates Lindeberg‑type conditions for martingale differences. These results enable the derivation of asymptotic properties of the estimating functions.

The main asymptotic results are presented in Section 5. Theorem 5.1 proves existence, uniqueness, and consistency of a G_n‑estimator under the high‑frequency regime, relying on the invertibility of the information matrix Σ(θ) built from the prediction‑based functions and the potential operator U_θ. Theorem 5.2 adds the rate condition nΔₙ² → 0 and shows that √(nΔₙ)(θ̂_n – θ₀) converges in distribution to a normal vector with mean zero and covariance Σ(θ₀)⁻¹. The proofs combine the Euler‑Itô expansions, martingale central limit theorems, and the Poisson equation L_θU_θ(f)=–f to control bias.
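As a reading aid for the normal limit as stated in this summary, the sketch below turns it into approximate Wald-type confidence intervals: if √(nΔₙ)(θ̂_n – θ₀) is approximately N(0, Σ⁻¹), then θ̂_n has approximate covariance Σ⁻¹/(nΔₙ). The function `wald_interval` and its arguments are hypothetical, and in practice Σ would have to be replaced by an estimate, which is not covered here.

```python
import numpy as np
from statistics import NormalDist

def wald_interval(theta_hat, sigma, n, delta, level=0.95):
    """Componentwise intervals theta_hat +/- z * sqrt(diag(sigma^{-1}) / (n * delta)).

    Assumption: `sigma` is a given (or separately estimated) matrix playing the
    role of Sigma(theta_0) in the limit N(0, Sigma^{-1}) quoted in the summary.
    """
    theta_hat = np.atleast_1d(np.asarray(theta_hat, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    cov = np.linalg.inv(sigma) / (n * delta)        # approximate covariance
    se = np.sqrt(np.diag(cov))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)     # two-sided normal quantile
    return theta_hat - z * se, theta_hat + z * se

if __name__ == "__main__":
    lower, upper = wald_interval([1.0], [[4.0]], n=10000, delta=0.01)
    print(lower, upper)
```

The interval width shrinks like 1/√(nΔₙ), that is, with the length of the observation window rather than with the number of observations alone.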

Compared with earlier work (e.g., Ditlevsen & Sørensen 2004, Gloter 2000), which focused on low‑frequency data or specific diffusion families, this paper extends the theory to general ergodic diffusions observed through their integrals at high frequency. It demonstrates that prediction‑based estimating functions remain tractable and efficient even though the observed series Y_i, unlike the latent diffusion Xₜ, is not Markovian. The authors also discuss practical implications for financial econometrics, where realized volatility provides noisy proxies for integrated volatility, and suggest extensions to multivariate integrated diffusions, microstructure noise, and empirical applications.
