NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations
This paper introduces the Non-Autonomous Input-Output Stable Network (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system. Non-autonomy is implemented by skip connections from the block input to each of the unrolled processing stages and allows stability to be enforced so that blocks can be unrolled adaptively to a pattern-dependent processing depth. NAIS-Net induces non-trivial, Lipschitz input-output maps, even for an infinite unroll length. We prove that the network is globally asymptotically stable, so that for every initial condition there is exactly one input-dependent equilibrium assuming $tanh$ units, and incrementally stable for ReLU units. An efficient implementation that enforces the stability under the derived conditions for both fully-connected and convolutional layers is also presented. Experimental results show how NAIS-Net exhibits stability in practice, yielding a significant reduction in generalization gap compared to ResNets.
💡 Research Summary
The paper introduces NAIS‑Net (Non‑Autonomous Input‑Output Stable Network), a deep neural architecture derived from a time‑invariant, non‑autonomous dynamical system. Unlike conventional residual networks (ResNets) where the external input influences only the first layer of a block, NAIS‑Net injects the input into every unrolled stage through skip connections, making the system non‑autonomous. This design enables the authors to derive rigorous stability conditions that guarantee well‑behaved trajectories for any initial state.
Mathematically, each layer follows
x(k+1) = x(k) + h·σ(A·x(k) + B·u + b),
where A and B are state‑ and input‑transfer matrices, σ is either tanh or ReLU, and h>0 is a step size. For convolutional layers the matrix multiplications are replaced by convolutions, but the analysis remains identical.
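The unrolled update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: dimensions and the zero initial state are illustrative, and the weights are assumed to already satisfy the stability condition discussed below.

```python
import numpy as np

def nais_net_block(u, A, B, b, h=1.0, n_unroll=20, sigma=np.tanh):
    """Unroll one NAIS-Net block: x(k+1) = x(k) + h*sigma(A x(k) + B u + b).

    The block input u is re-injected at every unrolled stage (the skip
    connection), which is what makes the system non-autonomous.
    """
    x = np.zeros(A.shape[0])  # block state, initialized at zero
    for _ in range(n_unroll):
        x = x + h * sigma(A @ x + B @ u + b)
    return x
```

With stable weights, the state settles to an input-dependent fixed point rather than diverging, no matter how long the block is unrolled.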
The core stability result hinges on the state-transfer Jacobian
J(x,u) = I + h·diag(σ′(A·x(k) + B·u + b))·A,
where σ′ is evaluated elementwise at the pre-activation,
and the requirement that its spectral radius satisfies ρ̄ = sup_{(x,u)∈P} ρ(J) < 1 (Condition 1). To enforce this, the authors parameterize A as a negative‑definite symmetric matrix: A = –RᵀR – εI, with ε>0 and trainable R. During training they apply a projection (Algorithm 1 for fully‑connected layers, Algorithm 2 for CNNs) that rescales R so that the Frobenius norm of RᵀR respects a bound derived from ε and h. This projection guarantees Condition 1 for h ≤ 1.
Under Condition 1, two theoretical regimes are proved:
- With tanh activations, the system is globally asymptotically stable. Every trajectory converges to a unique input‑dependent equilibrium x̄ = −A⁻¹(B·u + b), independent of the initial state. The input‑output (IO) gain γ(·) is linear and bounded by h·‖B‖/(1−ρ̄), implying that bounded input perturbations lead to bounded output deviations.
- With ReLU activations, the system is globally incrementally practically stable. The distance between two trajectories driven by the same input decays at rate ρ̄, up to a constant term that depends on the initial distance. Moreover, the distance between trajectories driven by different inputs is bounded by ρ̄ times the initial distance plus the IO gain applied to the input difference. Hence the network remains Lipschitz even for an infinite unroll length.
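The tanh-case claim can be checked numerically: under the stability condition, trajectories started from very different initial states converge to the same input-dependent fixed point x̄ = −A⁻¹(B·u + b). A minimal sketch with randomly chosen, pre-stabilized weights (dimensions and constants are illustrative):

```python
import numpy as np

def unroll(x0, u, A, B, b, h=0.5, steps=400):
    """Iterate x(k+1) = x(k) + h*tanh(A x(k) + B u + b)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + h * np.tanh(A @ x + B @ u + b)
    return x
```

Because tanh vanishes only at zero, any equilibrium must satisfy A·x̄ + B·u + b = 0, which gives the closed-form fixed point; global asymptotic stability then says every initialization reaches it.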
Experimentally, NAIS‑Net is evaluated on CIFAR‑10 and CIFAR‑100 using the same parameter budget as comparable ResNets. Results show that NAIS‑Net achieves similar or slightly better classification accuracy while dramatically reducing the generalization gap (by 30‑40%). Because each block can be unrolled many more times (10‑20× deeper) without increasing the total number of parameters, the architecture can learn pattern‑dependent processing depths. Notably, training does not require batch normalization (except when changing dimensionality), demonstrating the practical robustness of the stability‑enforced design.
In summary, NAIS‑Net bridges deep learning and control theory by treating deep feed‑forward networks as non‑autonomous discrete‑time dynamical systems. By imposing a spectral‑radius constraint on the state‑transition Jacobian through a simple weight‑projection scheme, the authors guarantee both global state convergence and bounded input‑output behavior. This provides a principled solution to vanishing/exploding gradients and improves robustness to input perturbations, offering a new direction for designing stable, deep architectures.