Softmax is $1/2$-Lipschitz: A tight bound across all $ell_p$ norms

February 22, 2026

Reading time: 2 minute

...

📝 Original Info

Title: Softmax is $1/2$-Lipschitz: A tight bound across all $ell_p$ norms
ArXiv ID: 2510.23012
Date: 2025-10-27
Authors: ** 논문에 명시된 저자 정보가 제공되지 않았습니다. **

📝 Abstract

The softmax function is a basic operator in machine learning and optimization, used in classification, attention mechanisms, reinforcement learning, game theory, and problems involving log-sum-exp terms. Existing robustness guarantees of learning models and convergence analysis of optimization algorithms typically consider the softmax operator to have a Lipschitz constant of $1$ with respect to the $\ell_2$ norm. In this work, we prove that the softmax function is contractive with the Lipschitz constant $1/2$, uniformly across all $\ell_p$ norms with $p \ge 1$. We also show that the local Lipschitz constant of softmax attains $1/2$ for $p = 1$ and $p = \infty$, and for $p \in (1,\infty)$, the constant remains strictly below $1/2$ and the supremum $1/2$ is achieved only in the limit. To our knowledge, this is the first comprehensive norm-uniform analysis of softmax Lipschitz continuity. We demonstrate how the sharper constant directly improves a range of existing theoretical results on robustness and convergence. We further validate the sharpness of the $1/2$ Lipschitz constant of the softmax operator through empirical studies on attention-based architectures (ViT, GPT-2, Qwen3-8B) and on stochastic policies in reinforcement learning.

Softmax is $1/2$-Lipschitz: A tight bound across all $ell_p$ norms

📝 Original Info

📝 Abstract

💡 Deep Analysis

📄 Full Content

Reference

Table of Contents

Table of Contents

📝 Original Info

📝 Abstract

💡 Deep Analysis

📄 Full Content

Reference

Start searching

No results found