Lateral Connections in Denoising Autoencoders Support Supervised Learning
We show how a deep denoising autoencoder with lateral connections can be used as an auxiliary unsupervised learning task to support supervised learning. The proposed model is trained to simultaneously minimize the sum of supervised and unsupervised cost functions by back-propagation, avoiding the need for layer-wise pretraining. It improves the state of the art significantly in the permutation-invariant MNIST classification task.
💡 Research Summary
The paper introduces a novel training scheme that couples a deep denoising autoencoder (DAE) equipped with lateral connections to a supervised classification task, allowing both objectives to be optimized simultaneously through standard back‑propagation. Traditional DAEs reconstruct corrupted inputs by passing information through all layers, which forces higher layers to retain fine‑grained details that are often irrelevant for classification. By adding neuron‑wise lateral links between each encoder layer and its corresponding decoder layer, the model lets low‑level details bypass the higher hierarchy and flow directly to the decoder. Consequently, the upper layers can focus on learning abstract, task‑relevant features while the lower layers preserve the information needed for accurate reconstruction.
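The joint objective described above can be sketched as a weighted sum of a supervised cross-entropy term and an unsupervised reconstruction term. The following is a minimal NumPy sketch; the function name `joint_cost`, the `recon_weight` parameter, and the squared-error form of the reconstruction cost are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def joint_cost(y_pred, y_true, x_recon, x_clean, recon_weight=1.0):
    """Illustrative joint objective for supervised + denoising training.

    y_pred:  softmax class probabilities, shape (N, C)
    y_true:  one-hot labels, shape (N, C)
    x_recon: decoder reconstruction of the input, shape (N, D)
    x_clean: uncorrupted input, shape (N, D)
    """
    eps = 1e-12  # numerical guard for the log
    # Supervised term: mean cross-entropy over the batch.
    cross_entropy = -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))
    # Unsupervised term: mean squared reconstruction error (an assumed form).
    recon_error = np.mean(np.sum((x_recon - x_clean) ** 2, axis=1))
    # Both terms flow into one scalar, so standard back-propagation
    # optimizes them simultaneously, as the paper describes.
    return cross_entropy + recon_weight * recon_error
```

Because the two costs are summed into a single scalar, gradients from both tasks reach the shared encoder in one backward pass, with `recon_weight` trading off the auxiliary denoising task against classification.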
Model Architecture
- Encoder: A multilayer perceptron (MLP) with batch‑normalization applied to every pre‑activation, followed by ReLU non‑linearity (softmax at the top). Input images are corrupted with isotropic Gaussian noise (σ = 0.3) before being fed to the network, providing a regularizing effect. The topmost hidden representation (h^{(L)}) is used directly as the class‑probability vector (y).
- Decoder: Mirrors the encoder in size but incorporates two streams of information for each layer (l): (1) a lateral term that processes the encoder activation (z^{(l)}) neuron‑wise, and (2) a vertical term obtained by projecting the higher‑level decoder output (\hat{z}^{(l+1)}) through a weight matrix (V^{(l+1)}) to the same dimensionality as (z^{(l)}). The final denoised activation is computed as
\hat{z}^{(l)} = g\left(z^{(l)}, u^{(l)}\right), \qquad u^{(l)} = V^{(l+1)} \hat{z}^{(l+1)},

where the combination function g is applied neuron-wise, merging the lateral signal z^{(l)} with the vertical signal u^{(l)}.
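One decoder layer can be sketched as follows. This is a hedged illustration: the vertical projection u = V ẑ^{(l+1)} follows the description above, but the specific parametrization of the neuron-wise combinator g (here a per-neuron affine term plus a sigmoid gate, with parameter vectors `a`, `b`, `c`) is an assumed form, not the paper's exact equation.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decoder_step(z, z_hat_above, V, a, b, c):
    """One decoder layer with a lateral and a vertical stream.

    z:           encoder activation z^(l) at this layer, shape (d,)
    z_hat_above: decoder output from the layer above, shape (d_above,)
    V:           projection matrix V^(l+1), shape (d, d_above)
    a, b, c:     per-neuron parameter vectors, shape (d,) each
                 (the parametrization of g is an illustrative assumption)
    """
    u = V @ z_hat_above              # vertical term, projected to width d
    # Neuron-wise combination g(z, u): each output unit mixes its own
    # lateral input with a gated function of its own vertical input.
    return a * z + b * sigmoid(c * u)
```

The neuron-wise form is the key design choice: low-level detail can flow straight from z^{(l)} to the reconstruction through the lateral term, so the vertical path through V^{(l+1)} only needs to carry the abstract information the upper layers retain.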