Bounds on the Bayes Error Given Moments
We show how to compute lower bounds for the supremum Bayes error if the class-conditional distributions must satisfy moment constraints, where the supremum is with respect to the unknown class-conditional distributions. Our approach makes use of Curto and Fialkow’s solutions for the truncated moment problem. The lower bound shows that the popular Gaussian assumption is not robust in this regard. We also construct an upper bound for the supremum Bayes error by constraining the decision boundary to be linear.
💡 Research Summary
The paper addresses the fundamental question: given only a finite set of moments (e.g., means and variances) of the class‑conditional distributions, what is the worst‑case Bayes error that can arise? The authors develop both a lower bound and an upper bound on the supremum Bayes error over all possible distributions that satisfy the prescribed moment constraints.
The lower‑bound construction forces all G class‑conditional distributions (one per class) to share a common point mass ε at the same location. If each distribution contains a Dirac mass ε at a common point, the Bayes error for G equally likely classes is guaranteed to be at least ε·(G−1)/G. The maximal feasible ε is determined by the feasibility of a truncated moment problem: after removing the common mass, one must still be able to find a measure whose moments match the given constraints. Using the Curto‑Fialkow theory of truncated moment problems, the authors translate this feasibility into positive semidefiniteness of a moment matrix together with a rank‑matching condition. For one‑dimensional distributions they derive explicit formulas: with only first moments, ε can be arbitrarily close to 1, so the Bayes error can approach ½ for two equally likely classes. When both first and second moments are known, an atom at the origin is feasible for class i only if ε ≤ σ_i²/(σ_i² + μ_i²), where μ_i and σ_i² are the prescribed mean and variance. By allowing a global shift Δ of all distributions, they tighten the bound further; the optimal Δ solves a quadratic equation when the variances differ (a linear one when they are equal). The resulting lower bound is expressed as a supremum over Δ of a simple function of the moments and class priors.
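The one-dimensional construction above can be sketched numerically. The code below is not the paper's implementation; the function names and the grid search over the shift Δ are my own. It computes the largest common atom ε(Δ) = min_i σ_i²/(σ_i² + (Δ − μ_i)²) that every class can place at a shared location Δ (this value is a supremum of the moment-feasibility condition, approached but not always attained), then maximizes over Δ to obtain the lower bound ε·(G−1)/G for equally likely classes.

```python
import numpy as np

def shared_atom_eps(mus, sigma2s, delta):
    """Largest mass eps that every class can place at the point `delta`.

    A 1-D distribution with mean mu and variance sigma2 can place an atom
    of at most sigma2 / (sigma2 + (delta - mu)^2) at `delta` (the supremum
    of the truncated-moment feasibility condition); the shared atom is the
    minimum of this quantity over the classes.
    """
    mus = np.asarray(mus, dtype=float)
    sigma2s = np.asarray(sigma2s, dtype=float)
    return float(np.min(sigma2s / (sigma2s + (delta - mus) ** 2)))

def bayes_error_lower_bound(mus, sigma2s, n_grid=2001):
    """Lower bound eps * (G - 1) / G on the worst-case Bayes error for
    G equally likely classes, maximizing eps over a grid of locations."""
    G = len(mus)
    spread = 3.0 * max(sigma2s) ** 0.5       # heuristic search window
    grid = np.linspace(min(mus) - spread, max(mus) + spread, n_grid)
    eps = max(shared_atom_eps(mus, sigma2s, d) for d in grid)
    return eps * (G - 1) / G

# Two classes with means 0 and 2, unit variances: the best shared atom
# sits at delta = 1 with eps = 1/2, giving a lower bound of 1/4.
print(bayes_error_lower_bound([0, 2], [1, 1]))
```

The grid search is a stand-in for the closed-form optimal Δ mentioned above; in one dimension the quadratic (or linear) equation for Δ could replace it.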
The upper bound is obtained by restricting the decision rule to linear classifiers. Building on Lanckriet et al.'s minimax framework for robust linear classification under moment constraints, the authors formulate a semidefinite program (SDP) that computes the worst-case error of the best linear separator consistent with the given moments. Since no classifier can outperform the Bayes rule, this quantity upper-bounds the supremum Bayes error; the result is computable, though generally loose.
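The flavor of this upper bound can be illustrated in one dimension, where a linear classifier is a threshold. The sketch below is a simplified stand-in for the SDP (the function name and grid search are hypothetical, and it handles only two classes): Cantelli's one-sided inequality is tight given only a mean and variance, so the worst-case error of the threshold rule "predict class 1 iff x ≥ t" is the prior-weighted sum of σ²/(σ² + distance²) terms, and minimizing over t upper-bounds the worst-case Bayes error.

```python
import numpy as np

def worst_case_linear_error(mu0, s2_0, mu1, s2_1, pi0=0.5, n_grid=4001):
    """Upper bound on the worst-case Bayes error via the best threshold.

    Cantelli's inequality is tight given only mean and variance:
    sup P(X >= t) = s2 / (s2 + (t - mu)^2) for t >= mu, and symmetrically
    for mass below t. Assumes mu0 < mu1 and equal-variance-free moments.
    """
    best = 1.0
    for t in np.linspace(mu0, mu1, n_grid):
        err0 = s2_0 / (s2_0 + (t - mu0) ** 2)   # worst-case class-0 mass at/above t
        err1 = s2_1 / (s2_1 + (mu1 - t) ** 2)   # worst-case class-1 mass below t
        best = min(best, pi0 * err0 + (1 - pi0) * err1)
    return best

# Means 0 and 4, unit variances: the best threshold is t = 2, where each
# class's worst-case error is 1/5, so the bound is 0.2.
print(worst_case_linear_error(0, 1, 4, 1))
```

The paper's SDP plays the same role in arbitrary dimension, optimizing over all hyperplanes rather than a 1-D threshold.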
The paper includes numerical experiments that illustrate the tightness of the bounds. For synthetic data with varying mean separations and variance ratios, the lower bound closely tracks the actual worst‑case error, while the Gaussian‑assumed error is often overly optimistic, especially when variances differ substantially. The upper bound, though higher than the true Bayes error, provides a practical benchmark: if a classifier’s empirical error far exceeds this bound, either the classifier’s decision surface is far from optimal or the test set is unrepresentative.
Overall, the work demonstrates that knowledge of only a few moments is insufficient to guarantee low Bayes error under the common Gaussian assumption. The derived bounds give practitioners a quantitative tool to assess the robustness of moment‑based modeling assumptions, to diagnose overly optimistic error estimates, and to guide data collection when only limited statistical information is available. Future directions suggested include extending the truncated moment analysis to multivariate settings and deriving tighter upper bounds for nonlinear decision surfaces.