On the Stability of Empirical Risk Minimization in the Presence of Multiple Risk Minimizers
Recently, Kutin and Niyogi investigated several notions of algorithmic stability (a property of a learning map conceptually similar to continuity), showing that training-stability is sufficient for consistency of Empirical Risk Minimization, while distribution-free CV-stability is necessary and sufficient for having finite VC-dimension. This paper concerns a phase transition in the training stability of ERM conjectured by the same authors. Kutin and Niyogi proved that ERM on finite hypothesis spaces containing a unique risk minimizer has training stability that scales exponentially with sample size, and conjectured that the existence of multiple risk minimizers prevents even super-quadratic convergence. We prove this result for the strictly weaker notion of CV-stability, positively resolving the conjecture.
💡 Research Summary
The paper investigates the stability properties of Empirical Risk Minimization (ERM) in the presence of multiple risk minimizers, building on the framework introduced by Kutin and Niyogi. Kutin and Niyogi defined several notions of algorithmic stability (training-stability, CV-stability, and distribution-free CV-stability) and showed that training-stability is sufficient for the consistency of ERM, while distribution-free CV-stability characterizes finite VC-dimension. They also proved that when the hypothesis space is finite and contains a unique risk minimizer, ERM enjoys exponential training-stability: the probability that the loss changes appreciably after replacing a single training example decays as \(e^{-\Omega(m)}\), where \(m\) is the sample size.
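For intuition, the exponential rate in the unique-minimizer case can be recovered from a standard Hoeffding/union-bound sketch (the constants below are illustrative, not the paper's): with a loss bounded in \([0,1]\), risk gap \(\Delta = \min_{h \ne h^*} \big(R(h) - R(h^*)\big) > 0\), and \(\hat h_m\) denoting the ERM output on \(m\) samples, applying Hoeffding's inequality to the differences \(\ell(h,z) - \ell(h^*,z) \in [-1,1]\) gives

\[
\Pr\big[\hat h_m \ne h^*\big] \;\le\; \sum_{h \ne h^*} \Pr\big[\hat R_m(h) \le \hat R_m(h^*)\big] \;\le\; (|\mathcal{H}| - 1)\, e^{-m\Delta^2/2}.
\]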
The authors of the present work focus on a conjecture made by Kutin and Niyogi: if the hypothesis space contains multiple risk minimizers, then ERM should lose its super‑quadratic convergence, i.e., the stability should degrade dramatically. While the original conjecture was stated for training‑stability, the present paper resolves it for the weaker notion of CV‑stability, thereby confirming the intuition that multiple optimal hypotheses fundamentally alter the stability landscape.
The technical contribution proceeds as follows. Let \(\mathcal{H}\) be a finite hypothesis class and \(\mathcal{H}^* = \{h \in \mathcal{H} : R(h) = \inf_{h'} R(h')\}\) denote the set of risk minimizers. When \(|\mathcal{H}^*| = 1\), the ERM rule selects the unique optimal hypothesis on all but an exponentially unlikely set of samples, and the stability analysis reduces to bounding the probability that a single example influences the empirical risk of that hypothesis. This yields the exponential bound mentioned above.
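As a concrete toy illustration (not the paper's construction), the following sketch computes the set of empirical risk minimizers over a finite class of threshold classifiers; with the sample below, two distinct thresholds tie at zero empirical risk, which is exactly the \(|\mathcal{H}^*| \ge 2\) situation studied here:

```python
def empirical_risk(h, sample, loss):
    """Average loss of hypothesis h on the sample."""
    return sum(loss(h, z) for z in sample) / len(sample)

def erm_set(hypotheses, sample, loss):
    """All empirical risk minimizers over a finite hypothesis class."""
    risks = {h: empirical_risk(h, sample, loss) for h in hypotheses}
    best = min(risks.values())
    return [h for h in hypotheses if risks[h] == best]

# Toy data: points (x, label) classified by a threshold h via "x >= h".
loss = lambda h, z: 0.0 if (z[0] >= h) == z[1] else 1.0
sample = [(0.2, False), (0.4, False), (0.6, True), (0.8, True)]
print(erm_set([0.5, 0.55, 0.9], sample, loss))  # thresholds 0.5 and 0.55 tie
```

Any tie-breaking rule must now choose between the tied hypotheses, which is where the instability analyzed below enters.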
When \(|\mathcal{H}^*| \ge 2\), the ERM rule may break ties arbitrarily (or according to a deterministic but sample-dependent rule). The authors model this tie-breaking as a uniform random selection among the optimal hypotheses, which captures the worst-case scenario for stability. They then examine the effect of replacing a single training point \(z_i\) with an independent copy \(z_i'\). The key observation is that such a replacement can change the empirical risk ordering among the optimal hypotheses only if the altered point lies in a region where the two hypotheses differ. Because the hypotheses are distinct yet share the same true risk, the probability of this event is proportional to \(1/m\). Consequently, the expected loss difference between the original ERM output \(h_S\) and the perturbed output \(h_{S^{(i)}}\) scales as \(\Theta(1/m)\).
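The tie-breaking mechanism can be probed numerically. The simulation below uses a hypothetical setup of my own, not the paper's construction: each example contributes an independent 0/1 loss with the same mean to each of two hypotheses, so both are true risk minimizers. It estimates how often the ERM output (with uniform random tie-breaking) changes after one example is replaced; in this toy model the flip probability decays only polynomially in \(m\), in visible contrast to the exponential unique-minimizer rate:

```python
import random

def erm_choice(l1, l2, rng):
    """Pick hypothesis 0 or 1 by empirical loss; break exact ties uniformly."""
    s1, s2 = sum(l1), sum(l2)
    if s1 < s2:
        return 0
    if s2 < s1:
        return 1
    return rng.randrange(2)  # uniform random tie-breaking

def flip_probability(m, trials=2000, p=0.3, seed=0):
    """Estimate Pr[ERM output changes after replacing one example]."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        # Independent Bernoulli(p) losses: both hypotheses share true risk p.
        l1 = [int(rng.random() < p) for _ in range(m)]
        l2 = [int(rng.random() < p) for _ in range(m)]
        before = erm_choice(l1, l2, rng)
        i = rng.randrange(m)  # replace example i with a fresh draw
        l1[i] = int(rng.random() < p)
        l2[i] = int(rng.random() < p)
        flips += (before != erm_choice(l1, l2, rng))
    return flips / trials

for m in (50, 200, 800):
    print(m, flip_probability(m))
```

Note that a flip can occur even when the replaced example changes nothing, because a persisting tie is re-broken at random; this is consistent with treating random tie-breaking as the worst case for stability.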
Formally, the paper proves that for any \(\epsilon > 0\) there exists a constant \(c > 0\) (depending on the loss function and the size of \(\mathcal{H}^*\)) such that

\[
\Pr\Big[\,\big|\ell(h_S, z_i') - \ell(h_{S^{(i)}}, z_i')\big| > \epsilon\,\Big] \;\ge\; \frac{c}{m},
\]

a \(1/m\) lower bound on the CV-instability probability that in particular rules out super-quadratic convergence.