Information-Theoretic Greedy Layer-wise Training for Traffic Sign Recognition

Reading time: 2 minutes

📝 Original Info

  • Title: Information-Theoretic Greedy Layer-wise Training for Traffic Sign Recognition
  • ArXiv ID: 2510.27651
  • Date: 2025-10-31
  • Authors: Not available (no author information is provided in the source)

📝 Abstract

Modern deep neural networks (DNNs) are typically trained with a global cross-entropy loss in a supervised, end-to-end manner: neurons need to store their outgoing weights, and training alternates between a forward pass (computation) and a top-down backward pass (learning), which is biologically implausible. Greedy layer-wise training is an alternative that eliminates the need for a global cross-entropy loss and end-to-end backpropagation. By avoiding the computation of intermediate gradients and the storage of intermediate outputs, it reduces memory usage and helps mitigate issues such as vanishing or exploding gradients. However, most existing layer-wise training approaches have been evaluated only on relatively small datasets with simple deep architectures. In this paper, we first systematically analyze the training dynamics of popular convolutional neural networks (CNNs) trained by stochastic gradient descent (SGD) through an information-theoretic lens. Our findings reveal that networks converge layer by layer from bottom to top and that the flow of information adheres to a Markov information bottleneck principle. Building on these observations, we propose a novel layer-wise training approach based on the recently developed deterministic information bottleneck (DIB) and the matrix-based Rényi's $\alpha$-order entropy functional. Specifically, each layer is trained jointly with an auxiliary classifier that connects it directly to the output, enabling the learning of minimal sufficient task-relevant representations. We empirically validate the effectiveness of our training procedure on CIFAR-10 and CIFAR-100 using modern deep CNNs and further demonstrate its applicability to a practical traffic sign recognition task. Our approach not only outperforms existing layer-wise training baselines but also achieves performance comparable to SGD.
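
To make the training recipe concrete, below is a minimal PyTorch-style sketch of greedy layer-wise training with a per-layer auxiliary classifier and a DIB-flavoured penalty: the deterministic information bottleneck objective $H(T) - \beta I(T;Y)$ is approximated by a matrix-based Rényi $\alpha$-order entropy term on the layer output plus a cross-entropy term from the auxiliary head. This is an illustrative sketch under assumptions, not the paper's implementation; the names (`LayerWithAuxHead`, `train_layerwise`, `renyi_entropy`) and all hyper-parameter values are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def renyi_entropy(z, alpha=1.01, sigma=1.0):
    """Matrix-based Rényi alpha-order entropy of a batch of representations."""
    z = z.flatten(1)                                # (n, d)
    d2 = torch.cdist(z, z).pow(2)                   # pairwise squared distances
    K = torch.exp(-d2 / (2 * sigma ** 2))           # Gaussian Gram matrix, K_ii = 1
    A = K / K.shape[0]                              # normalize so that trace(A) = 1
    eigvals = torch.linalg.eigvalsh(A).clamp(min=1e-8)
    return (1.0 / (1.0 - alpha)) * torch.log2((eigvals ** alpha).sum())


class LayerWithAuxHead(nn.Module):
    """One trainable conv block plus an auxiliary classifier on its output."""
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(out_ch, num_classes))

    def forward(self, x):
        z = self.block(x)
        return z, self.head(z)


def train_layerwise(layers, loader, beta=0.01, epochs=1, device="cpu"):
    """Greedily train one block at a time; earlier blocks stay frozen."""
    frozen = []
    for layer in layers:
        layer.to(device)
        opt = torch.optim.SGD(layer.parameters(), lr=0.1, momentum=0.9)
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                with torch.no_grad():               # features from frozen lower blocks
                    for f in frozen:
                        x, _ = f(x)
                z, logits = layer(x)
                # DIB-style local objective: stay predictive of the label
                # (cross-entropy) while compressing the representation (low H(Z)).
                loss = F.cross_entropy(logits, y) + beta * renyi_entropy(z)
                opt.zero_grad()
                loss.backward()
                opt.step()
        frozen.append(layer.eval())
    return frozen


# Hypothetical wiring for 32x32 RGB inputs and 43 classes (the number of
# classes in the GTSRB traffic-sign benchmark):
# layers = [LayerWithAuxHead(3, 32, 43), LayerWithAuxHead(32, 64, 43),
#           LayerWithAuxHead(64, 128, 43)]
# train_layerwise(layers, train_loader, beta=0.01, epochs=5)
```

In a full layer-wise pipeline the auxiliary heads of lower blocks would typically be discarded once their block is frozen, with only the final classifier used at inference time; the sketch keeps each head only to train its block locally.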

💡 Deep Analysis

Figure 1

📄 Full Content

📸 Image Gallery

  • Figure_training_dynamic_1.png
  • Figure_training_dynamic_2.png
  • IB_plot.png
  • IB_resnet_plot.png
  • MDS.png
  • MDS_2.png
  • MI_matrix.png
  • MI_matrix_new_resnet.png
  • beta_various.png
  • figure1_sketch.png
  • latent_layerwise_latent.png
  • layerwise_new.png
  • results_compare.png

Reference

This content was AI-processed from open-access ArXiv data.
