Deep Network Trainability via Persistent Subspace Orthogonality
Training neural networks via backpropagation is often hindered by vanishing or exploding gradients. In this work, we design architectures that mitigate these issues by analyzing and controlling the network Jacobian. We first provide a unified characterization of a class of networks with orthogonal Jacobian, recovering known architectures and yielding new trainable designs. We then introduce the relaxed notion of persistent subspace orthogonality, which applies to a broader class of networks whose Jacobians are isometries only on a non-trivial subspace. We propose practical mechanisms to enforce this condition and empirically show that it preserves gradient norms during backpropagation well enough to enable the training of very deep networks. We support our theory with extensive experiments.
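To see why orthogonal Jacobians matter for trainability, consider backpropagating through a deep stack of linear layers. The sketch below (illustrative only, not the paper's construction) compares gradient norms after 100 layers with orthogonal versus unconstrained Gaussian weights: with orthogonal matrices the backward Jacobian is itself orthogonal, so the norm is preserved exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, depth = 64, 100

def grad_norm_after_backprop(make_layer):
    """Push a unit gradient vector backward through `depth` linear
    layers and return its final norm (started at 1.0)."""
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(depth):
        v = make_layer().T @ v  # backprop step through one linear layer
    return np.linalg.norm(v)

def orthogonal_layer():
    # QR decomposition of a Gaussian matrix gives an orthogonal matrix
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def gaussian_layer():
    # Variance-scaled Gaussian weights, no orthogonality constraint
    return rng.standard_normal((dim, dim)) / np.sqrt(dim)

print(grad_norm_after_backprop(orthogonal_layer))  # exactly 1 (up to float error)
print(grad_norm_after_backprop(gaussian_layer))    # drifts away from 1 with depth
```

The orthogonal case preserves the norm exactly because a product of orthogonal matrices is orthogonal; the Gaussian case only preserves the norm in expectation per layer, and the multiplicative fluctuations compound with depth.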
💡 Research Summary
The paper tackles the long‑standing problem of vanishing and exploding gradients in deep neural networks by focusing on the Jacobian of the network mapping. It first provides a unified mathematical characterization of a broad class of networks whose Jacobian is orthogonal almost everywhere. By partitioning the input space into a finite collection of open, connected regions {Ω_i}, the authors consider vector fields of the form
F(x) = ∑_{i=1}^{N} 1_{Ω_i}(x)
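One concrete instance of such a piecewise map with almost-everywhere orthogonal Jacobian (a hypothetical example for illustration, not the paper's specific construction) is F(x) = U|Vx| with U, V orthogonal and |·| applied entrywise: on each region Ω_i where sign(Vx) is constant, F is linear with Jacobian U diag(±1) V, which is orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

# Two fixed orthogonal matrices via QR of Gaussian matrices.
U, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
V, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

def F(x):
    """Piecewise map F(x) = U |V x|. On each region
    Ω_i = {x : sign(V x) = s_i} it equals the linear map U diag(s_i) V."""
    return U @ np.abs(V @ x)

def jacobian(x):
    # Almost-everywhere Jacobian of F: U diag(sign(V x)) V
    return U @ np.diag(np.sign(V @ x)) @ V

x = rng.standard_normal(dim)
J = jacobian(x)
# J^T J = I: the Jacobian is orthogonal on the region containing x
print(np.allclose(J.T @ J, np.eye(dim)))  # True
```

The orthogonality check holds on every region: diag(±1) is orthogonal, and products of orthogonal matrices are orthogonal, so backpropagated gradient norms are preserved regardless of which region the input falls in.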