Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks

Batch normalization (BN) is a ubiquitous operation in deep neural networks, primarily used to improve stability and regularization during training. BN centers and scales feature maps using sample means and variances, which are naturally suited to Stein shrinkage estimation. Applying such shrinkage yields more accurate mean and variance estimates of the batch in the mean-squared-error sense. In this paper, we prove that the Stein shrinkage estimators of the mean and variance dominate the sample mean and variance estimators, respectively, in the presence of adversarial attacks modeled using sub-Gaussian distributions. Furthermore, by construction, the James-Stein (JS) BN yields a smaller local Lipschitz constant than vanilla BN, implying better regularity properties and potentially improved robustness. This justifies applying Stein shrinkage to estimate the mean and variance parameters in BN and using it in image classification and segmentation tasks with and without adversarial attacks. We present SOTA performance results using this Stein-corrected BN in a standard ResNet architecture applied to image classification on CIFAR-10 data, a 3D CNN on PPMI (neuroimaging) data, and image segmentation using HRNet on Cityscapes data, with and without adversarial attacks.


💡 Research Summary

The paper investigates the use of James‑Stein (JS) shrinkage estimators for the mean and variance calculations in Batch Normalization (BN) and demonstrates that this approach yields statistically superior estimates, especially under adversarial perturbations modeled as sub‑Gaussian noise. Traditional BN computes per‑channel means and variances directly from the mini‑batch, which can be noisy when the batch size is small or the feature dimensionality is high. The authors propose replacing the sample mean with the classic JS estimator, which shrinks the empirical mean toward the origin by a factor 1 − (p − 2)σ²/‖μ̂‖², thereby reducing overall mean‑squared error (MSE) at the cost of a small bias. For the variance, they note that the sample variance follows a scaled chi‑square (Gamma) distribution, so they adopt a Stein‑type shrinkage formula for the Gamma scale parameter, introducing a shrinkage constant ˜c ∈
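The positive-part James-Stein mean estimate described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: `js_shrunk_mean` and its interface are hypothetical, and the variance of the sample mean (`sigma2`) is assumed known for simplicity.

```python
def js_shrunk_mean(rows, sigma2):
    """Positive-part James-Stein estimate of a p-dimensional mean (p >= 3).

    `rows` is a list of n equal-length samples; `sigma2` is the variance of
    the sample mean (assumed known here for simplicity).  The sample mean is
    shrunk toward the origin by the factor 1 - (p - 2) * sigma2 / ||mu_hat||^2,
    clipped at zero (the "positive-part" correction).
    """
    n = len(rows)
    p = len(rows[0])
    # Per-coordinate sample mean, as vanilla BN would compute it.
    mu_hat = [sum(r[j] for r in rows) / n for j in range(p)]
    norm_sq = sum(m * m for m in mu_hat)
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm_sq)
    return [factor * m for m in mu_hat]
```

Because the factor is at most one, the shrunk estimate always has a norm no larger than the plain sample mean; the resulting small bias is traded for a lower overall MSE when p ≥ 3.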

