Noise Tolerance under Risk Minimization
In this paper we explore noise tolerant learning of classifiers. We formulate the problem as follows. We assume that there is an **unobservable** training set which is noise-free. The actual training set given to the learning algorithm is obtained from this ideal data set by corrupting the class label of each example. The probability that the class label of an example is corrupted is a function of the feature vector of the example. This would account for most kinds of noisy data one encounters in practice. We say that a learning method is noise tolerant if the classifiers learnt with the ideal noise-free data and with noisy data both have the same classification accuracy on the noise-free data. In this paper we analyze the noise tolerance properties of risk minimization (under different loss functions), which is a generic method for learning classifiers. We show that risk minimization under the 0-1 loss function has impressive noise-tolerance properties, while under the squared-error loss it is tolerant only to uniform noise; risk minimization under other loss functions is not noise tolerant. We conclude the paper with some discussion on the implications of these theoretical results.
💡 Research Summary
The paper investigates the robustness of risk‑minimization based classifiers to label noise that may depend on the feature vector, a setting that captures most practical noisy datasets. An unobservable, noise‑free training set is assumed; the observed training set is generated by flipping the label of each example with probability η(x). When η(x) is constant the noise is uniform; otherwise it is non‑uniform. The authors define a learning method to be “noise‑tolerant” if the classifier obtained by minimizing risk on the noisy data has the same mis‑classification probability (with respect to the true distribution) as the classifier obtained from the ideal noise‑free data.
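The noise model can be made concrete with a small sketch (illustrative code, not from the paper; the specific η(x) below is a hypothetical choice): labels of an ideal training set are flipped independently with a feature-dependent probability η(x), and uniform noise is the special case where η(x) is constant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ideal (noise-free) training set: clean labels from a deterministic rule.
n = 8
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sign(x)  # labels in {-1, +1}

# A hypothetical feature-dependent flip probability eta(x): examples
# near the decision boundary x = 0 are mislabeled more often.
def eta(x):
    return 0.4 * np.exp(-5.0 * np.abs(x))

# Observed training set: each label is flipped independently
# with probability eta(x). Uniform noise would use a constant eta.
flip = rng.random(n) < eta(x)
y_observed = np.where(flip, -y, y)

print(np.column_stack([x, eta(x), y, y_observed]))
```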
The analysis proceeds loss function by loss function.

- **0‑1 loss.** For any classifier f, the risk under clean data is simply the probability mass of the error region S(f). Under uniform label noise with rate η, the noisy risk becomes Rη(f) = η + (1 − 2η)R(f). Since η < 0.5, the minimizer of R also minimizes Rη, proving that 0‑1 loss is completely tolerant to uniform noise. For non‑uniform noise the authors show that if the optimal clean classifier achieves zero error (R(f*) = 0), it remains optimal under noise; otherwise, a counter‑example with a quadratic true boundary and a linear hypothesis class demonstrates loss of tolerance. Thus, under non‑uniform noise, 0‑1 loss is tolerant only when perfect separation is possible.
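The affine relation between clean and noisy 0‑1 risks is easy to verify empirically. The sketch below (a hypothetical simulation; the 1‑D threshold family and data distribution are my own choices, not the paper's) flips labels uniformly at rate η and checks that Rη(f) ≈ η + (1 − 2η)R(f) across classifiers, so the risk ranking — and hence the minimizer — is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D problem: the true label is sign(x - 0.3).
n = 200_000
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sign(x - 0.3)

# Uniform label noise: flip each label independently with rate eta < 0.5.
eta = 0.2
y_noisy = np.where(rng.random(n) < eta, -y, y)

# Family of threshold classifiers f_t(x) = sign(x - t).
thresholds = np.linspace(-0.9, 0.9, 37)
R = np.array([np.mean(np.sign(x - t) != y) for t in thresholds])
R_eta = np.array([np.mean(np.sign(x - t) != y_noisy) for t in thresholds])

# The noisy risk is an affine function of the clean risk with positive
# slope (1 - 2*eta), so the minimizing threshold (here t = 0.3) is the same.
print("max deviation:", np.max(np.abs(R_eta - (eta + (1 - 2 * eta) * R))))
print("argmin clean:", thresholds[np.argmin(R)])
print("argmin noisy:", thresholds[np.argmin(R_eta)])
```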
- **Squared‑error loss.** Assuming a linear model f(x) = wᵀx + b (equivalently, an augmented weight vector on features x̃ = [x; 1]), the clean risk minimizer is the usual least‑squares solution w* = (E[x̃x̃ᵀ])⁻¹E[x̃y]. Under uniform noise with rate η, the noisy labels satisfy E[ỹ|x] = (1 − 2η)E[y|x], so the noisy minimizer is simply wη = (1 − 2η)w*. Since 1 − 2η > 0, the sign of the discriminant is unchanged and both minimizers yield the same classifier: squared‑error loss is tolerant to uniform noise. Under non‑uniform noise the scaling varies with x, and tolerance is lost in general.
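This scaling behavior can be illustrated with a quick simulation (a hypothetical setup, not from the paper; the data distribution and parameters below are my own choices): least‑squares fits on clean labels and on uniformly flipped labels should produce weight vectors related by the factor (1 − 2η), so both induce essentially the same classifier.

```python
import numpy as np

rng = np.random.default_rng(1)

# Augmented features [x, 1] so the bias is part of the weight vector.
n, d = 100_000, 3
X = rng.normal(size=(n, d))
Xa = np.hstack([X, np.ones((n, 1))])

# Clean labels from a linear rule (hypothetical parameters).
w_true = np.array([1.0, -2.0, 0.5, 0.2])
y = np.sign(Xa @ w_true)

# Uniform label noise with rate eta.
eta = 0.3
y_noisy = np.where(rng.random(n) < eta, -y, y)

# Least-squares (squared-error risk) minimizers on clean and noisy labels.
w_clean, *_ = np.linalg.lstsq(Xa, y, rcond=None)
w_noisy, *_ = np.linalg.lstsq(Xa, y_noisy, rcond=None)

# Since E[y_noisy | x] = (1 - 2*eta) E[y | x], the noisy minimizer is
# approximately (1 - 2*eta) * w_clean, leaving sign(w . x) unchanged.
print("w_clean:", np.round(w_clean, 3))
print("w_noisy:", np.round(w_noisy, 3))
agree = np.mean(np.sign(Xa @ w_clean) == np.sign(Xa @ w_noisy))
print("classifier agreement:", agree)
```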