Hausdorff consistency of MLE in folded normal and Gaussian mixtures
We develop a constant-tracking likelihood theory for two nonregular models: the folded normal and finite Gaussian mixtures. For the folded normal, we prove boundary coercivity for the profiled likelihood, show that the profile path of the location parameter exists and is strictly decreasing by an implicit-function argument, and establish a unique profile maximizer in the scale parameter. Deterministic envelopes for the log-likelihood, the score, and the Hessian yield elementary uniform laws of large numbers with finite-sample bounds, avoiding covering numbers. Identification and Kullback-Leibler separation deliver consistency. A sixth-order expansion of the log hyperbolic cosine creates a quadratic-minus-quartic contrast around zero, leading to a nonstandard one-fourth-power rate for the location estimator at the kink and a standard square-root rate for the scale estimator, with a uniform remainder bound. For finite Gaussian mixtures with distinct components and positive weights, we give a short identifiability proof up to label permutations via Fourier and Vandermonde ideas, derive two-sided Gaussian envelopes and responsibility-based gradient bounds on compact sieves, and obtain almost-sure and high-probability uniform laws with explicit constants. Using a minimum-matching distance on permutation orbits, we prove Hausdorff consistency on fixed and growing sieves. We quantify variance-collapse spikes via an explicit spike-bonus bound and show that a quadratic penalty in location and log-scale dominates this bonus, making penalized likelihood coercive; when penalties shrink but sample size times penalty diverges, penalized estimators remain consistent. All proofs are constructive, track constants, verify measurability of maximizers, and provide practical guidance for tuning sieves, penalties, and EM-style optimization.
💡 Research Summary
This paper develops a rigorous, constant‑tracking likelihood theory for two non‑regular statistical models that possess finite symmetries: the folded normal distribution (sign symmetry) and finite Gaussian mixture models (label permutation symmetry). The authors adopt a quotient‑space framework in which a finite group G acts isometrically on the parameter space Θ, and they define the orbit distance d_G as the minimal Euclidean distance over the group. This construction turns the original parameter space into an orbit space Θ/G equipped with a genuine metric, allowing identification of parameters up to the symmetry.
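The orbit distance d_G is easy to make concrete. The sketch below (illustrative code, not the paper's) computes it for the mixture case, where the group is the set of permutations of component triples; the function name `orbit_distance` and the triple layout `(weight, mean, sigma)` are assumptions for this example.

```python
# Sketch of the orbit distance d_G for a k-component Gaussian mixture,
# where G = S_k permutes the component triples (pi_j, mu_j, sigma_j).
from itertools import permutations
import math

def orbit_distance(theta1, theta2):
    """Minimal Euclidean distance between theta1 and any relabeling of theta2.

    Each theta is a list of (weight, mean, sigma) component triples.
    """
    best = float("inf")
    for perm in permutations(theta2):
        d2 = sum((a - b) ** 2
                 for comp1, comp2 in zip(theta1, perm)
                 for a, b in zip(comp1, comp2))
        best = min(best, math.sqrt(d2))
    return best

# Two parameterizations of the same mixture, components listed in swapped order:
theta_a = [(0.3, -1.0, 0.5), (0.7, 2.0, 1.0)]
theta_b = [(0.7, 2.0, 1.0), (0.3, -1.0, 0.5)]
print(orbit_distance(theta_a, theta_b))  # → 0.0 (same orbit)
```

Because d_G minimizes over all relabelings, it vanishes exactly on orbits, which is what makes it a genuine metric on Θ/G.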
A deterministic argmax‑stability lemma is proved: uniform approximation of a continuous objective together with a positive population gap guarantees that the empirical argmax set lies within any prescribed orbit‑Hausdorff neighbourhood of the true argmax set. When the true argmax set is isolated (or finite), full Hausdorff convergence follows. This lemma is the engine that converts uniform laws of large numbers (ULLNs) and Kullback–Leibler (KL) separation into orbit‑Hausdorff consistency of maximum‑likelihood estimators (MLEs), even when the argmax may be set‑valued.
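The mechanism behind such a lemma is classical; a minimal sketch (with illustrative constants, not necessarily those of the paper) runs as follows. Suppose sup_Θ |M_n − M| ≤ ε/3 and M(θ) ≤ M(θ₀) − ε whenever d_G(θ, Θ*) ≥ δ, with θ₀ ∈ Θ*. Then for any θ̂_n maximizing M_n,

```latex
\begin{aligned}
M(\hat\theta_n) &\ge M_n(\hat\theta_n) - \varepsilon/3
                 \ge M_n(\theta_0) - \varepsilon/3 \\
                &\ge M(\theta_0) - 2\varepsilon/3
                 > M(\theta_0) - \varepsilon,
\end{aligned}
```

so d_G(θ̂_n, Θ*) < δ: the empirical maximizer cannot escape the prescribed orbit neighbourhood.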
Folded Normal Model
The folded normal is obtained by observing Y = |X| with X ~ N(µ,σ²). The sign group G = {±1} acts on (µ,σ) by flipping the sign of µ. The per-observation log-likelihood is log 2 − log(σ√(2π)) − (y² + µ²)/(2σ²) + log cosh(µy/σ²), so the sign of µ enters only through the cosh term. The authors first establish deterministic hyperbolic bounds and a lower bound on the negative log-density that prevents degenerate samples from producing an infinite likelihood. They then profile out µ for fixed σ, showing that the score in µ is (1/σ²)(∑ y_i tanh(y_i µ/σ²) − nµ) and that the curvature involves ∑ y_i² sech²(y_i µ/σ²). By an implicit-function argument they prove the existence of a unique profile maximizer µ̂(σ) that is strictly decreasing in σ, and they establish boundary coercivity of the profiled likelihood in σ, guaranteeing a global maximizer on any compact sieve.
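For the standard folded-normal density, the profile stationarity condition in µ is ∑ y_i tanh(y_i µ/σ²) = nµ; µ = 0 is always a root, and a positive root exists when the curvature at zero is positive, i.e. when the sample mean of y² exceeds σ². A minimal numerical sketch (names and the bisection solver are illustrative, not the paper's):

```python
# Profile score for the folded normal location parameter at fixed sigma.
# mu = 0 is always a root; a positive root exists iff mean(y^2) > sigma^2.
import math

def profile_score(mu, y, sigma):
    n = len(y)
    s2 = sigma * sigma
    return (sum(yi * math.tanh(yi * mu / s2) for yi in y) - n * mu) / s2

def profile_mu(y, sigma, tol=1e-10):
    """Positive root of the profile score via bisection (0.0 if none)."""
    s2 = sigma * sigma
    if sum(yi * yi for yi in y) / len(y) <= s2:
        return 0.0               # nonpositive curvature at zero: no positive root
    lo, hi = 1e-12, max(y)       # tanh <= 1 forces the root below mean(y) <= max(y)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profile_score(mid, y, sigma) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = [0.5, 1.5, 2.5, 2.0, 1.0]
print(profile_mu(y, 0.8), profile_mu(y, 1.2))  # first value is the larger one
```

The printed pair also illustrates the monotonicity result: enlarging σ shrinks every tanh argument, so the positive root, and hence µ̂(σ), decreases.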
KL‑identifiability holds on the orbit space: M(µ,σ) – M(µ₀,σ₀) = –KL(f_{µ₀,σ₀}‖f_{µ,σ}) ≤ 0 with equality iff (µ,σ) lies in the orbit of (µ₀,σ₀). Using uniform LLNs with deterministic envelopes for the log‑likelihood, score, and Hessian, the authors obtain finite‑sample bounds without covering numbers. This yields almost‑sure consistency of the orbit‑MLE.
A key technical contribution is a sixth-order expansion log(2 cosh t) = log 2 + t²/2 − t⁴/12 + R₆(t) with an explicit remainder bound. Near µ₀ = 0 this produces a “quadratic-minus-quartic” contrast, leading to a nonstandard convergence rate n^{-1/4} for the location estimator, while the scale estimator retains the usual √n rate. The remainder is uniformly controlled on a shrinking window of width n^{-1/4}, establishing a precise asymptotic distribution for the singular case.
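The expansion is straightforward to sanity-check numerically. From the Taylor series of log cosh, the leading term of the remainder is t⁶/45, so R₆(t)/t⁶ should approach 1/45 ≈ 0.0222 as t → 0 (this check is illustrative; the paper's remainder bound is what the proofs actually use):

```python
# Numerical check of log(2 cosh t) = log 2 + t^2/2 - t^4/12 + R6(t),
# whose remainder satisfies R6(t) ~ t^6/45 near zero.
import math

def remainder(t):
    return math.log(2 * math.cosh(t)) - (math.log(2) + t * t / 2 - t**4 / 12)

for t in (0.2, 0.1, 0.05):
    print(t, remainder(t) / t**6)  # ratios tend to 1/45 ≈ 0.02222 as t -> 0
```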
Finite Gaussian Mixtures
For a k‑component mixture with distinct means and variances and strictly positive mixing weights, the permutation group S_k acts on the parameter vector θ = (π_j,µ_j,σ_j)_{j=1}^k. The authors give a short identifiability proof using characteristic functions: equality of mixtures implies equality of the corresponding finite Vandermonde system, which forces component parameters to match up to permutation.
They construct compact sieves K_n that keep the weights and variances bounded away from zero and all parameters bounded away from infinity. On each sieve they derive two-sided Gaussian envelopes for the log-density and responsibility-based gradient bounds. These bounds are Lipschitz in θ with a random slope that can be controlled via a finite-net argument, yielding a uniform law of large numbers with explicit constants: with high probability, sup_{θ∈K_n}|M_n(θ) − M(θ)| ≤ C √(log|K_n|/n), where |K_n| denotes the size of the finite net used to cover the sieve.
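Envelopes of this kind are elementary to write down; the sketch below shows one standard choice on a sieve with weight floor w_min, mean bound M, and scale window [s_min, s_max] (the specific bounds and names are illustrative assumptions, not the paper's constants). The upper envelope is the peak of the narrowest component; the lower envelope uses the fact that every weight is at least w_min and that the worst-case Gaussian exponent on the sieve is (|y|+M)²/(2σ²), minimized over σ at an endpoint of the scale window.

```python
# Illustrative two-sided envelope for a Gaussian-mixture log-density on the
# compact sieve {pi_j >= W_MIN, |mu_j| <= M, S_MIN <= sigma_j <= S_MAX}.
import math

W_MIN, M, S_MIN, S_MAX = 0.05, 3.0, 0.2, 2.0

def log_mix(y, theta):
    # theta is a list of (weight, mean, sigma) triples
    return math.log(sum(
        w / (math.sqrt(2 * math.pi) * s) * math.exp(-(y - m)**2 / (2 * s * s))
        for w, m, s in theta))

def upper_env(y):
    # every component density is at most the peak of the narrowest Gaussian
    # (constant in y)
    return -math.log(math.sqrt(2 * math.pi) * S_MIN)

def lower_env(y):
    # some component has weight >= W_MIN and deviation |y - mu| <= |y| + M;
    # (1/s)exp(-c/s^2) is unimodal in s, so its minimum over [S_MIN, S_MAX]
    # is attained at an endpoint
    c = (abs(y) + M)**2 / 2
    return math.log(W_MIN) + min(
        -math.log(math.sqrt(2 * math.pi) * s) - c / (s * s)
        for s in (S_MIN, S_MAX))
```

Since both envelopes are deterministic and integrable against the true density, the classical LLN applies pointwise and the finite-net argument upgrades it to the uniform statement above.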
On the full space the likelihood is unbounded because a component’s variance can collapse to zero (variance‑collapse spikes). To restore coercivity, a ridge penalty g(θ)=λ∑(µ_j²+(log σ_j)²) is added. The authors prove that for any λ>0 the penalized criterion is coercive, and that if λ_n→0 while nλ_n→∞, the penalized MLE remains consistent on the orbit space. They also provide an explicit “spike‑bonus” bound showing that the quadratic penalty dominates any gain from variance collapse.
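The spike-versus-penalty trade-off can be seen directly: placing one component exactly on a data point makes its likelihood contribution grow like log(1/σ₁)/n as σ₁ → 0, whereas the ridge penalty grows like λ(log σ₁)², so the quadratic term eventually wins. A toy demonstration (the data, weights, and λ are arbitrary choices for illustration):

```python
# Variance-collapse spike vs. ridge penalty lam * sum(mu_j^2 + log(sigma_j)^2).
import math

y = [-1.2, 0.4, 0.7, 1.9, 2.3]

def spike_theta(sigma1):
    # component 1 sits exactly on the data point y[0] with shrinking scale
    return [(0.5, y[0], sigma1), (0.5, 1.0, 1.0)]

def avg_loglik(theta):
    total = 0.0
    for yi in y:
        total += math.log(sum(
            w / (math.sqrt(2 * math.pi) * s) * math.exp(-(yi - m)**2 / (2 * s * s))
            for w, m, s in theta))
    return total / len(y)

def penalty(theta, lam=0.1):
    # quadratic penalty in location and log-scale, as in the summary
    return lam * sum(m * m + math.log(s)**2 for _, m, s in theta)

for s1 in (1e-2, 1e-6, 1e-12):
    th = spike_theta(s1)
    print(f"sigma1={s1:g}: loglik={avg_loglik(th):.2f} "
          f"penalized={avg_loglik(th) - penalty(th):.2f}")
```

The unpenalized average log-likelihood keeps increasing as σ₁ shrinks, while the penalized criterion turns around and diverges to −∞, which is exactly the coercivity the penalty is designed to restore.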
Combining the deterministic argmax‑stability lemma with the uniform LLN and KL separation, they obtain orbit‑Hausdorff consistency of the (penalized) sieve MLEs for both fixed and growing sieves. The paper discusses practical tuning: how to choose sieve radii, penalty magnitude, and initialization for EM‑style algorithms, and it verifies measurability of set‑valued maximizers.
The discussion summarizes the contributions, points out limitations (e.g., strong identifiability assumptions and the need for distinct components), and outlines future directions such as rates for mixture order selection, extensions to other nonregular models, and refined LAN theory at singular points.
Overall, the work provides a constructive, constant‑tracking framework that unifies existence, consistency, and rate results for MLEs in models with finite symmetries, delivering both deep theoretical insights and concrete guidance for implementation.