NCSAM: Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning
Learning from Noisy Labels (LNL) presents a fundamental challenge in deep learning, as real-world datasets often contain erroneous or corrupted annotations, e.g., data crawled from the Web. Current research focuses on sophisticated label-correction mechanisms. In contrast, this paper adopts a novel perspective by establishing a theoretical analysis of the relationship between the flatness of the loss landscape and the presence of label noise. We theoretically demonstrate that carefully simulated label noise synergistically enhances both generalization performance and robustness to label noise. Consequently, we propose Noise-Compensated Sharpness-Aware Minimization (NCSAM), which leverages the perturbation of Sharpness-Aware Minimization (SAM) to remedy the damage caused by label noise. Our analysis reveals that the test accuracy exhibits behavior similar to that observed on noise-free datasets. Extensive experimental results on multiple benchmark datasets demonstrate the consistent superiority of the proposed method over existing state-of-the-art approaches across diverse tasks.
💡 Research Summary
The paper tackles the longstanding challenge of learning with noisy labels (LNL) by investigating the interplay between label noise and loss‑landscape flatness, and by proposing a novel optimization method called Noise‑Compensated Sharpness‑Aware Minimization (NCSAM).
First, the authors formalize a generic label‑corruption model where each true label is flipped with probability α to a noisy label drawn from a Beta distribution. They decompose the stochastic gradient into a clean component g_clean and a noise‑induced bias g_noise, showing that the latter causes a systematic parameter drift Δw = −η g_noise.
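The decomposition above can be illustrated with a toy sketch (not the paper's code; all names and numbers here are illustrative assumptions): a noisy SGD step equals the clean step plus the drift Δw = −η g_noise.

```python
# Toy sketch of the gradient decomposition g = g_clean + g_noise and the
# resulting one-step parameter drift delta_w = -eta * g_noise.
# Illustrative values only, not taken from the paper.

def sgd_step(w, g, eta):
    """One SGD update: w <- w - eta * g."""
    return [wi - eta * gi for wi, gi in zip(w, g)]

def drift_from_noise(g_noise, eta):
    """Noise-induced shift delta_w = -eta * g_noise for a single step."""
    return [-eta * gi for gi in g_noise]

w = [1.0, -2.0]
g_clean = [0.5, 0.1]
g_noise = [0.2, -0.3]   # bias contributed by mislabeled samples (assumed)
eta = 0.1

g_total = [c + n for c, n in zip(g_clean, g_noise)]
w_noisy = sgd_step(w, g_total, eta)
w_clean = sgd_step(w, g_clean, eta)
delta_w = drift_from_noise(g_noise, eta)

# The noisy update equals the clean update plus the drift term.
assert all(abs(wn - (wc + d)) < 1e-12
           for wn, wc, d in zip(w_noisy, w_clean, delta_w))
```

The final assertion checks the identity that motivates the later correction: training on noisy labels lands at w_clean + Δw rather than at w_clean.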
Using a PAC‑Bayes framework, they model the posterior distribution after noisy training as a Gaussian centered at the clean optimum w shifted by Δw. The KL‑divergence term in the PAC‑Bayes bound then grows with ∥w + Δw∥², indicating that noise inflates the generalization bound.
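Under the Gaussian assumptions described above, the KL term can be sketched as follows (a standard PAC-Bayes form with an isotropic prior N(0, σ²I) and posterior N(w + Δw, σ²I), assumed here for illustration; the paper's exact prior and posterior choices may differ):

```latex
\mathrm{KL}\!\left(\mathcal{N}(w+\Delta w,\,\sigma^2 I)\,\middle\|\,\mathcal{N}(0,\,\sigma^2 I)\right)
  = \frac{\lVert w+\Delta w\rVert^2}{2\sigma^2}
```

so the generalization bound inflates quadratically with the noise-induced shift Δw, matching the claim that ∥w + Δw∥² controls the bound.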
Sharpness‑Aware Minimization (SAM) is interpreted as adding a random perturbation ϵ ∼ N(0,β²I) to the parameters and minimizing the worst‑case loss within a radius ρ. When the loss is evaluated under noisy supervision, the bound depends on the additive term w + Δw + ϵ. Consequently, if Δw is large, the SAM perturbation can amplify the noise bias rather than guide the optimizer toward flat minima, explaining recent empirical observations that SAM underperforms on noisy data.
Motivated by this analysis, the authors design NCSAM. For each mini‑batch they estimate the noise‑induced shift Δŵ by comparing gradients computed with clean versus noisy labels (the clean gradients are approximated using the loss on the subset of samples believed to be correct). They then modify the SAM perturbation to ϵ̂ = ϵ – Δŵ. This correction effectively cancels the noise‑induced drift, so the updated parameters follow w + ϵ, preserving SAM’s flat‑minimum seeking behavior while neutralizing the harmful effect of label noise. The algorithm retains SAM’s two‑step structure (max‑perturbation search then parameter update) and adds only a lightweight subtraction of Δŵ.
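The correction described above can be sketched as follows (a hedged sketch under the summary's description, not the authors' code; the names `estimate_drift` and `ncsam_perturbation` and all numeric values are our own assumptions):

```python
# NCSAM sketch: estimate the noise-induced drift from the gap between the
# noisy-batch gradient and an approximately clean gradient, then subtract
# that drift from the SAM perturbation.

def estimate_drift(g_noisy, g_clean_est, eta):
    """One-step drift estimate: delta_w_hat = -eta * (g_noisy - g_clean_est)."""
    return [-eta * (gn - gc) for gn, gc in zip(g_noisy, g_clean_est)]

def ncsam_perturbation(eps, delta_w_hat):
    """Corrected perturbation: eps_hat = eps - delta_w_hat."""
    return [e - d for e, d in zip(eps, delta_w_hat)]

eta = 0.1
g_noisy = [0.7, -0.2]      # gradient from the full (noisy) mini-batch
g_clean_est = [0.5, 0.1]   # gradient from samples believed to be correct
eps = [0.05, -0.02]        # SAM ascent perturbation

delta_w_hat = estimate_drift(g_noisy, g_clean_est, eta)
eps_hat = ncsam_perturbation(eps, delta_w_hat)

# By construction, w + delta_w_hat + eps_hat == w + eps:
# the noise-induced drift cancels and SAM's perturbation geometry is restored.
assert all(abs(d + eh - e) < 1e-12
           for d, eh, e in zip(delta_w_hat, eps_hat, eps))
```

The assertion verifies the cancellation the text describes: after the correction, the perturbed parameters behave as w + ε, as in noise-free SAM.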
Extensive experiments are conducted on CIFAR‑10 and CIFAR‑100 with symmetric noise rates of 40 % to 80 % and with asymmetric noise, as well as on the real‑world noisy dataset Clothing1M. Various architectures (ResNet‑18/34, WideResNet‑28‑10, DenseNet) are evaluated. Across all settings, NCSAM consistently outperforms standard SAM, plain SGD, and a suite of state‑of‑the‑art LNL methods (Co‑Teaching, DivideMix, JoCoR, etc.). For example, with 80 % symmetric noise, SGD reaches ~68 % accuracy, SAM ~71 %, while NCSAM achieves ~74 %, demonstrating superior robustness. Training curves reveal that NCSAM suppresses early memorization of noisy labels and maintains a flatter loss surface for a longer period.
The contributions are threefold: (1) a theoretical link between label noise and loss‑landscape sharpness via PAC‑Bayes analysis; (2) an explanation of why existing SAM variants fail under noisy supervision; (3) the NCSAM algorithm that compensates for noise‑induced parameter drift without any label‑cleaning or curriculum heuristics. The method is simple to implement, compatible with existing deep‑learning pipelines, and opens new avenues for robust optimization in noisy environments.
Limitations include the reliance on an accurate estimate of Δŵ, which may be sensitive to batch size and noise level, and the focus on classification tasks with synthetic or web‑collected noise. Future work could explore more sophisticated drift estimators, extend the approach to structured noise (e.g., class‑dependent or instance‑dependent corruption), and tighten the PAC‑Bayes bounds to better reflect empirical performance. Overall, the paper provides a compelling blend of theory and practice, offering a principled alternative to label‑correction strategies for learning under noisy supervision.