Convergence Analysis of Two-Layer Neural Networks under Gaussian Input Masking

Reading time: 5 minutes
...

📝 Original Info

  • Title: Convergence Analysis of Two-Layer Neural Networks under Gaussian Input Masking
  • ArXiv ID: 2602.17423
  • Date: 2026-02-19
  • Authors: Not provided in the available paper metadata.

📝 Abstract

We investigate the convergence guarantee of two-layer neural network training with Gaussian randomly masked inputs. This scenario corresponds to Gaussian dropout at the input level, or noisy input training common in sensor networks, privacy-preserving training, and federated learning, where each user may have access to partial or corrupted features. Using a Neural Tangent Kernel (NTK) analysis, we demonstrate that training a two-layer ReLU network with Gaussian randomly masked inputs achieves linear convergence up to an error region proportional to the mask's variance. A key technical contribution is resolving the randomness within the non-linear activation, a problem of independent interest.
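Stated in the usual NTK form, the abstract's guarantee says the training loss decays geometrically until it reaches a floor set by the mask variance. The display below is only a sketch of the shape of such a statement, with generic symbols that are not taken from the paper: u(t) denotes the network outputs at step t, λ₀ the smallest eigenvalue of the limiting NTK Gram matrix, η the step size, σ²_mask the mask variance, and C a problem-dependent constant; the paper's exact constants and conditions are not reproduced here.

```latex
\| y - u(t) \|_2^2 \;\le\; \Big(1 - \tfrac{\eta \lambda_0}{2}\Big)^{t} \, \| y - u(0) \|_2^2 \;+\; C \, \sigma_{\mathrm{mask}}^2
```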

💡 Deep Analysis

📄 Full Content

Neural networks (NNs) have revolutionized AI applications, where their success largely stems from their ability to learn complex patterns when trained on well-curated datasets (Schuhmann et al., 2022; Li et al., 2023b; Gunasekar et al., 2023; Edwards, 2024). A key component of this success is the ability of NNs to model a broad range of tasks and data distributions under various scenarios. Empirical evidence suggests that neural networks can learn even under noisy inputs (Kariotakis et al., 2024), gradient noise (Ruder, 2017), and modifications to the internal representations during training (Srivastava et al., 2014; Yuan et al., 2022). Leveraging this ability, many real-world deployments modify the data representations during training to achieve particular goals such as robustness, privacy, or efficiency. Among these methods, perturbing the representations with additive noise has been studied by a number of prior works (Gao et al., 2019; Li et al., 2025; 2023a; Madry et al., 2018; Loo et al., 2022; Tsilivis and Kempe, 2022; Ilyas et al., 2019), showcasing both the benefit of such perturbations and the stable convergence of training in this setting. Compared with additive noise, perturbing the representations by multiplying them with a mask has rarely been studied theoretically.

Perturbing the representations with multiplicative noise appears in many real-world settings, either by design or unintentionally. For instance, in federated learning (FL) settings (McMahan et al., 2017; Kairouz et al., 2021), particularly vertical FL (Cheng et al., 2020; Liu et al., 2021; Romanini et al., 2021; He et al., 2020; Liu et al., 2022; 2024), different features of the input data may be available to different parties, effectively creating a form of sparsity-inducing multiplicative masking on the input space. Moreover, the drop-out family (Srivastava et al., 2014; Rey and Mnih, 2021) is a class of methods that prevents overfitting and improves the generalization ability of neural networks during training. Lastly, training models under a data-parallel protocol over a wireless channel incurs a multiplicative channel effect that blurs the data passed to the workers (Tse and Viswanath, 2005).

Theoretically analyzing the training dynamics of neural networks under these settings is difficult, especially when the introduced randomness is intertwined with the nonlinearity of the activation function. While previous work has studied the convergence of neural network training under drop-out (Liao and Kyrillidis, 2022; Mianjy and Arora, 2020), it typically assumes that the drop-out happens after the nonlinear activation is applied. From a technical perspective, the statistics of the neural network outputs are then easier to handle, since the randomness is not passed through the nonlinearity.
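To see why the placement of the mask relative to the activation matters, compare the two models below for a width-m two-layer ReLU network; the notation is a generic sketch and is not taken from the paper.

```latex
% Mask applied AFTER the activation (setting of the prior convergence analyses cited above):
f_{\mathrm{post}}(x) = \frac{1}{\sqrt{m}} \sum_{r=1}^{m} a_r \, \zeta_r \, \sigma\!\big(w_r^\top x\big),
\qquad \zeta_r \ \text{a random mask on hidden unit } r

% Gaussian mask applied BEFORE the activation (setting studied here):
f_{\mathrm{pre}}(x) = \frac{1}{\sqrt{m}} \sum_{r=1}^{m} a_r \, \sigma\!\big(w_r^\top (\xi \odot x)\big),
\qquad \xi_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}\!\big(1, \sigma_{\mathrm{mask}}^2\big)
```

In the first model the mask multiplies quantities that are already post-activation, so its moments can be pulled outside the nonlinearity; in the second, the Gaussian randomness sits inside the ReLU, which is precisely the difficulty this paper addresses.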

In this paper, we take a step further into the understanding of multiplicative perturbations in neural network training by considering noise applied before the nonlinear activation. In particular, the setting we consider is the training of a two-layer MLP whose inputs bear a multiplicative Gaussian mask. This prototype provides a simplified scenario for studying the noise-inside-activation difficulty, while generalizing various training scenarios ranging from input masking (Kariotakis et al., 2024) to Gaussian drop-out (Rey and Mnih, 2021), if one views the input in our setting as fixed embeddings from previous layers of a deep neural network. Under this setting, we ask whether, and how fast, gradient-based training converges when the inputs are Gaussian-masked.
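As a concrete illustration, the sketch below trains such a two-layer ReLU network on synthetic data, drawing a fresh mean-one Gaussian multiplicative mask for each input at every step (Gaussian dropout at the input level, applied before the activation). The dimensions, step size, mask standard deviation, and the fixed second layer with ±1 entries (a common NTK-style simplification) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch (not the paper's code): gradient descent on a two-layer ReLU network
# f(x) = (1/sqrt(m)) * a^T relu(W (xi * x)), with a fresh Gaussian multiplicative mask
# xi ~ N(1, sigma^2) drawn for every input coordinate at every step.

rng = np.random.default_rng(0)
n, d, m = 64, 10, 512          # samples, input dimension, hidden width (assumed)
sigma = 0.1                    # mask standard deviation (assumed parameterization)
eta, steps = 0.02, 3000        # step size and iteration count (assumed)

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = np.sin(X @ rng.normal(size=d))              # synthetic regression targets

W = rng.normal(size=(m, d))                     # trainable first layer
a = rng.choice([-1.0, 1.0], size=m)             # fixed +/-1 second layer (NTK-style)

for _ in range(steps):
    xi = rng.normal(loc=1.0, scale=sigma, size=X.shape)  # multiplicative Gaussian mask
    Xm = xi * X                                           # masked inputs
    Z = Xm @ W.T                                          # pre-activations: mask sits inside the ReLU
    H = np.maximum(Z, 0.0)                                # ReLU
    u = H @ a / np.sqrt(m)                                # network outputs
    r = u - y                                             # residuals
    # Gradient of 0.5 * ||u - y||^2 with respect to W
    G = ((Z > 0) * (r[:, None] * a[None, :])).T @ Xm / np.sqrt(m)
    W -= eta * G

print("final squared loss:", 0.5 * float(r @ r))
```

With a small mask variance one should observe the loss decaying quickly and then plateauing at a nonzero floor, consistent with the abstract's "linear convergence up to an error region proportional to the mask's variance".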

Neural Network Robustness. The study of neural network robustness has a rich history, with early work focusing primarily on additive perturbations. Results such as Bartlett et al. (2017) and Miyato et al. (2018) established generalization bounds for neural networks under adversarial perturbations, showing that the network’s Lipschitz constant plays a crucial role in determining robustness. Subsequent work by Cohen et al. (2019) introduced randomized smoothing techniques for certified robustness against ℓ2 perturbations, while Wong et al. (2018) developed methods for training provably robust deep neural networks.

Regularization techniques have emerged as powerful tools for enhancing network robustness. Dropout (Srivastava et al., 2014) pioneered the idea of randomly masking internal neurons during training, effectively creating an implicit ensemble of subnetworks (Yuan et al., 2022; Hu et al., 2023; Kariotakis et al., 2024; Wolfe et al., 2023; Liao and Kyrillidis, 2022; Dun et al., 2023; 2022). This connection between feature masking and regularization was further explored by Ghorbani et al. (2021), who showed that dropout can be interpreted as a form of data-dependent regularization. Note that sparsity-inducing norms, based on the Laplacian distribution, have a long history in sparse recovery problems (Bach et al., 2011; Jenatton et al., 2011; Bach et al., 2012; Kyrillidis et al., 2015). Empirical stud

Reference

This content is AI-processed based on open access ArXiv data.
