Noise Stability of Transformer Models

Notice: This research summary and analysis were automatically generated using AI. For absolute accuracy, please refer to the original arXiv source.

Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model’s robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains and fails to explain the “junta-like” input dependence we empirically observe in modern LLMs. To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability expresses a model’s robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers and tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical noise stability regularization method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently catalyzes grokking and accelerates training by approximately 35% and 75%, respectively. Our results establish a new connection between signal propagation in neural networks and interpretability, with noise stability emerging as a powerful tool for understanding and improving modern Transformers.


💡 Research Summary

The paper critiques the prevailing use of average sensitivity—originating from Boolean function analysis—as a metric for simplicity bias in deep learning. While average sensitivity captures a model’s robustness to single‑token perturbations, the authors argue it suffers from two major drawbacks: (1) it does not extend naturally to real‑valued functions typical of modern neural networks, and (2) it fails to explain the “junta‑like” behavior observed in large language models (LLMs) such as GPT‑2, GEMMA‑2B, and ROBERTA, where only a few tokens dominate the model’s output.

To overcome these limitations, the authors introduce noise stability as a new simplicity measure. Noise stability evaluates a function’s resilience when all input coordinates are simultaneously corrupted by correlated Gaussian noise. Formally, for a function $f\in L^2(\gamma)$ and correlation parameter $\rho\in(0,1)$, they define a correlated Gaussian pair $(X,Y)$ with $Y=\rho X+\sqrt{1-\rho^2}\,Z$, where $Z$ is an independent standard Gaussian. The noise stability is then $\operatorname{Stab}_\rho(f)=\mathbb{E}[f(X)\,f(Y)]$.
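The definition above can be estimated directly by Monte Carlo sampling: draw correlated Gaussian pairs and average the product $f(X)\,f(Y)$. The following minimal sketch (the `noise_stability` helper is illustrative, not from the paper) assumes $f$ maps a batch of points in $\mathbb{R}^d$ to a vector of scalar outputs:

```python
import numpy as np

def noise_stability(f, rho, dim, n_samples=100_000, seed=0):
    """Monte Carlo estimate of Stab_rho(f) = E[f(X) f(Y)],
    where X ~ N(0, I_dim) and Y = rho*X + sqrt(1 - rho^2)*Z with
    Z ~ N(0, I_dim) drawn independently of X."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, dim))
    Z = rng.standard_normal((n_samples, dim))
    Y = rho * X + np.sqrt(1.0 - rho**2) * Z
    return float(np.mean(f(X) * f(Y)))

# Sanity check: for the linear (degree-1) function f(x) = x_1,
# the exact value is Stab_rho(f) = rho.
est = noise_stability(lambda x: x[:, 0], rho=0.7, dim=4)  # ≈ 0.7
```

For a trained model, `f` would be the (scalarized) network output, so the estimator doubles as a diagnostic: sweeping `rho` toward 1 traces how quickly the model’s predictions decorrelate under input noise.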

