Transfer entropy-based feedback improves performance in artificial neural networks
The structure of the majority of modern deep neural networks is characterized by unidirectional feed-forward connectivity across a very large number of layers. By contrast, the architecture of the cortex of vertebrates contains fewer hierarchical levels but many recurrent and feedback connections. Here we show that a small, few-layer artificial neural network that employs feedback will reach top level performance on a standard benchmark task, otherwise only obtained by large feed-forward structures. To achieve this we use feed-forward transfer entropy between neurons to structure feedback connectivity. Transfer entropy can here intuitively be understood as a measure for the relevance of certain pathways in the network, which are then amplified by feedback. Feedback may therefore be key for high network performance in small brain-like architectures.
💡 Research Summary
The paper investigates whether feedback connections—abundant in biological cortex but largely absent in modern deep learning architectures—can substantially boost the performance of a relatively small convolutional neural network. The authors focus on AlexNet, an eight‑layer network (five convolutional layers followed by three fully‑connected layers) that, when trained on CIFAR‑10, typically reaches about 85 % classification accuracy—far below the state‑of‑the‑art results achieved by much deeper models.
To introduce feedback in a principled way, the authors compute the transfer entropy (TE) between every pair of neurons that belong to different layers. TE measures the directed information flow from a source neuron i to a target neuron j, conditioned on the intermediate layers. After training the feed‑forward AlexNet, they record the continuous activations g_i for all images, binarize them (y_i = 1 if g_i > 0.001, else 0), and calculate TE for each image class. The class‑wise TE values are then averaged to obtain a single relevance score T̃_i→j for each ordered pair (i, j).
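The binarize-then-estimate step can be sketched with a simple plug-in (histogram) estimator of TE for binary sequences with history length 1. This is a minimal sketch, not the authors' code: the paper's exact estimator, conditioning scheme, and history length are not given in the summary, and the function name is an assumption.

```python
import numpy as np

def binary_transfer_entropy(src, tgt):
    """Plug-in estimate of TE(src -> tgt) in bits for binary sequences,
    history length 1:
        TE = sum_{t1,t0,s0} p(t1, t0, s0) * log2[ p(t1|t0,s0) / p(t1|t0) ]
    where t1 is the target's next state, t0 its past state, s0 the
    source's past state. (Sketch only; not the paper's exact estimator.)"""
    src = np.asarray(src, dtype=int)
    tgt = np.asarray(tgt, dtype=int)
    s0, t0, t1 = src[:-1], tgt[:-1], tgt[1:]
    te = 0.0
    for a in (0, 1):          # next target state t1
        for b in (0, 1):      # past target state t0
            for c in (0, 1):  # past source state s0
                p_abc = np.mean((t1 == a) & (t0 == b) & (s0 == c))
                p_bc = np.mean((t0 == b) & (s0 == c))
                p_ab = np.mean((t1 == a) & (t0 == b))
                p_b = np.mean(t0 == b)
                if p_abc > 0 and p_bc > 0 and p_ab > 0 and p_b > 0:
                    # p(t1|t0,s0) = p_abc/p_bc ; p(t1|t0) = p_ab/p_b
                    te += p_abc * np.log2((p_abc / p_bc) / (p_ab / p_b))
    return te
```

Following the summary, the continuous activations g_i would first be thresholded (y_i = 1 if g_i > 0.001, else 0), TE estimated separately within each image class, and the class-wise values averaged to yield T̃_i→j.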
A feedback connection from neuron j (layer β) to neuron i (layer α) is created only if the averaged TE T̃_i→j falls below a user‑defined threshold Φ. The weight of this feedback link is set to
f_j→i = w_min · |β − α| / L,
where w_min is the smallest weight in the trained feed‑forward network, L = 4 is the maximum layer distance considered, and |β − α| is the absolute layer gap. This formulation guarantees that feedback weights are always smaller than any existing feed‑forward weight, while giving relatively larger values to long‑range feedback links. Feedback is added only between convolutional layers 2–5; the input layer and the three dense layers remain purely feed‑forward.
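The selection rule (T̃_i→j below Φ) and the weight formula above can be combined in one pass. The following is a sketch under an assumed data layout; `build_feedback` and its argument structure are illustrative, not the authors' code.

```python
def build_feedback(te_avg, w_min, phi, L=4):
    """Create feedback links j -> i for every ordered pair whose averaged
    transfer entropy falls below the threshold phi.

    te_avg maps (i, j, alpha, beta) -> averaged TE score T~_{i->j}, where
    alpha and beta are the layer indices of i and j. The feedback weight
    is w_min * |beta - alpha| / L, so every feedback weight stays below
    the smallest feed-forward weight w_min, while long-range links get
    relatively larger values. (Hypothetical data layout.)"""
    links = {}
    for (i, j, alpha, beta), te in te_avg.items():
        if te < phi:
            links[(j, i)] = w_min * abs(beta - alpha) / L
    return links
```

With Φ = 0.9 and L = 4, a pair three layers apart with T̃ = 0.3 would receive a feedback weight of 0.75 · w_min, while a pair with T̃ = 0.95 would get no feedback link at all.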
With this feedback‑augmented network (FB‑AlexNet) the authors evaluate performance on the full CIFAR‑10 test set. When Φ = 0.9, the network achieves 95 % accuracy, a gain of ten percentage points over the baseline and comparable to the best results reported for much larger architectures. The improvement is robust across a wide range of Φ values (approximately 0.5 < Φ < 0.9); only when Φ approaches 1, so that excessive feedback is added, does the network suffer runaway excitation and a collapse in performance.
To rule out trivial explanations, several control experiments are performed: (1) random shuffling of all feedback weights, which reduces accuracy to ~18 %; (2) shuffling weights within each pathway, yielding ~30 % accuracy; (3) assigning a uniform small weight to every feedback link, which gives a modest boost to ~87 % but degrades quickly if the weight is increased; (4) scaling all feedback weights by a factor λ, showing an optimum at λ = 1 (the proposed setting) and degradation for λ ≠ 1; (5) adding the feedback weights to the feed‑forward weights and then removing the feedback connections (an “augmented feed‑forward” model), which reaches only 88 % accuracy. These controls confirm that the selective, TE‑driven feedback pattern—not merely the addition of extra connections or extra weight magnitude—is responsible for the performance gain.
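Control (1), the global weight shuffle, can be sketched as a random permutation that preserves the multiset of weights while destroying the TE-derived assignment. The function name and data layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def shuffle_feedback_weights(links, seed=0):
    """Control experiment: keep the same set of feedback links but randomly
    permute which weight goes where. The overall weight distribution is
    unchanged; only the TE-based pairing of weights to links is destroyed.
    (Illustrative sketch of control (1) in the summary.)"""
    rng = np.random.default_rng(seed)
    keys = list(links.keys())
    vals = rng.permutation(np.array(list(links.values())))
    return dict(zip(keys, vals))
```

The point of this control is that any accuracy drop after shuffling (here, to ~18 %) must come from losing the *assignment* of weights, not from changing their magnitudes.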
Beyond classification accuracy, the authors analyze network topology. After adding feedback, the average TE across the whole network decreases, the characteristic path length drops from 4.2 to 2.1, and global efficiency rises from 0.40 to 0.67. Moreover, local active information storage (the amount of a unit’s future state predictable from its own past) increases markedly in the feedback‑enabled network, especially in deeper layers. This suggests that feedback amplifies the influence of each neuron on its own dynamics, effectively creating richer internal representations.
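The two graph metrics reported above follow standard definitions and can be computed on any unweighted adjacency structure with plain BFS. This is a sketch using those standard definitions; how the authors constructed the graph from the network is not specified in the summary.

```python
from collections import deque

def shortest_paths(adj, src):
    """BFS distances from src over an unweighted adjacency dict
    mapping node -> list of neighbours."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def path_length_and_efficiency(adj):
    """Characteristic path length: mean shortest-path distance over all
    connected ordered pairs. Global efficiency: mean of 1/d over all
    distinct ordered pairs (unreachable pairs contribute 0)."""
    nodes = list(adj)
    n = len(nodes)
    total_d, pairs, eff = 0.0, 0, 0.0
    for s in nodes:
        dist = shortest_paths(adj, s)
        for t in nodes:
            if t != s and t in dist:
                total_d += dist[t]
                pairs += 1
                eff += 1.0 / dist[t]
    return total_d / pairs, eff / (n * (n - 1))
```

Adding shortcut edges to a graph, as the TE-driven feedback links do, shortens paths and raises efficiency, which is exactly the pattern reported (path length 4.2 → 2.1, efficiency 0.40 → 0.67).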
The discussion interprets these findings in the context of cortical processing. In deep CNNs, information tends to converge strongly toward higher layers, potentially discarding useful intermediate features. Transfer entropy naturally captures how much information actually propagates between distant layers; low TE between distant neurons indicates a long, meaningful pathway that could benefit from reinforcement. By assigning stronger feedback to such long‑range pairs, the method mimics a cortical strategy where feedback selectively boosts salient, high‑level pathways while leaving short‑range, already strong feed‑forward links relatively untouched, thereby preventing runaway activity.
In summary, the paper demonstrates that a biologically inspired feedback mechanism, grounded in an information‑theoretic measure (transfer entropy), can transform a modest‑size convolutional network into a top‑performing model on a standard benchmark without increasing the number of neurons or layers. The approach is architecture‑agnostic and could be applied to other tasks, offering a promising route toward more brain‑like, efficient deep learning systems that achieve high performance with relatively small computational footprints.