Neuromodulation via Krotov-Hopfield Improves Accuracy and Robustness of RBMs


In biological systems, neuromodulation tunes synaptic plasticity according to the internal state of the organism, complementing stimulus-driven Hebbian learning. The unsupervised algorithm recently proposed by Krotov and Hopfield (2019) can mirror this process in artificial neural networks: its built-in intra-layer competition and selective inhibition of synaptic updates act as a simplified attention mechanism, offering a cost-effective remedy for the absence of lateral connections. We demonstrate that KH-modulated restricted Boltzmann machines (RBMs) outperform standard (shallow) RBMs on both reconstruction and classification tasks, offering a superior trade-off between generalization performance and model size, with the added benefits of robustness to weight initialization and resistance to overfitting during training.


💡 Research Summary

The paper introduces a biologically inspired neuromodulatory mechanism into Restricted Boltzmann Machines (RBMs) by integrating the Krotov‑Hopfield (KH) unsupervised learning rule. Traditional RBMs, while tractable due to the removal of intra‑layer connections, suffer from limited expressive power. The KH algorithm, derived from BCM theory, implements a three‑factor learning rule: it ranks hidden‑layer neurons by their input current, grants full Hebbian potentiation to the top‑ranked neuron, applies anti‑Hebbian weakening to the next ℓ neurons, and leaves the rest unchanged. This global competition signal acts as a context‑dependent learning‑rate modifier (ε) and effectively creates indirect lateral inhibition among hidden units without explicit connections, resembling a simplified attention mechanism.
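As a concrete illustration, the ranking-based three-factor rule described above can be sketched in NumPy. The anti-Hebbian strength `delta`, the runner-up count `ell`, the learning rate `lr`, and the weight-normalizing term are assumptions in the spirit of the KH rule, not the paper's exact formulation:

```python
import numpy as np

def kh_update(W, v, delta=0.4, ell=2, lr=0.01):
    """One KH-style competitive update (illustrative sketch).

    W     : (n_hidden, n_visible) weight matrix
    v     : (n_visible,) input vector
    delta : anti-Hebbian strength for runner-up units (assumed value)
    ell   : number of runner-up neurons receiving anti-Hebbian updates
    """
    currents = W @ v                      # input current per hidden unit
    order = np.argsort(currents)[::-1]    # rank units by current, descending

    g = np.zeros(W.shape[0])              # per-unit learning signal
    g[order[0]] = 1.0                     # winner: full Hebbian potentiation
    g[order[1:1 + ell]] = -delta          # next ell units: anti-Hebbian weakening
    # remaining units: g = 0, so their weights are left unchanged

    # Hebbian step with a normalizing term that keeps weight rows bounded
    dW = g[:, None] * (v[None, :] - currents[:, None] * W)
    return W + lr * dW
```

Only the winner and the `ell` runner-ups receive nonzero updates, which is what produces the indirect lateral inhibition without explicit intra-layer connections.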

Training proceeds by first performing standard Contrastive Divergence (CD‑k) to obtain the RBM gradient, then adding the KH‑induced update ξ_t as an additive term. The combined update can be written as Langevin‑like dynamics in which ξ_t is orthogonal to the gradient, providing a structured perturbation rather than random noise. Empirically, ξ_t steers the optimization toward alternative minima where hidden‑unit receptive fields overlap less, as measured by reduced cosine similarity (0.41 → 0.35). The authors explore two integration schemes: top‑down (KH‑TD), which uses hidden activations as the input vector, and bottom‑up (KH‑BU), which uses the visible data; KH‑TD yields slightly better results.
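A minimal sketch of this two-term update, assuming a bias-free RBM with binary visible units and a CD-1 gradient (the function names and simplifications here are illustrative, not the paper's implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradient(W, v0, rng):
    """CD-1 gradient for a bias-free RBM: positive phase minus the
    negative phase after one Gibbs step (illustrative simplification)."""
    h0 = sigmoid(W @ v0)                                            # positive-phase hidden probs
    v1 = (sigmoid(W.T @ h0) > rng.random(v0.shape)).astype(float)   # sampled reconstruction
    h1 = sigmoid(W @ v1)                                            # negative-phase hidden probs
    return np.outer(h0, v0) - np.outer(h1, v1)

def kh_modulated_step(W, v0, xi, eps, lr, rng):
    """Combined update: CD gradient plus the additive KH term xi,
    scaled by the annealed modulation rate eps."""
    return W + lr * cd1_gradient(W, v0, rng) + eps * xi
```

Because `xi` enters additively alongside the gradient term, setting `eps` to zero recovers plain CD training, which matches the phase-out behavior described below.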

Experiments on a binarized MNIST dataset use RBMs with 100 and 500 hidden units, CD‑1 and CD‑10, and two common weight initializations (standard normal and LeCun). The KH learning rate ε is annealed according to ε₀(1‑epoch/S)^{3/2}, with S controlling the duration of neuromodulation. Across all settings, KH‑modulated RBMs outperform standard shallow RBMs in validation reconstruction mean‑squared error and cross‑entropy, while showing robustness to weight initialization. Notably, KH‑RBMs avoid over‑fitting: validation loss does not increase during training, and the models retain superior performance even after the KH term is fully phased out. Faster annealing (smaller S) can recover rapid convergence comparable to LeCun‑initialized RBMs without sacrificing final accuracy.
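The annealing schedule quoted above is simple to state in code; the concrete values of ε₀ and S used in any call are placeholders, not the paper's settings:

```python
def kh_rate(epoch, eps0, S):
    """Annealed KH modulation rate: eps0 * (1 - epoch/S)**(3/2),
    clipped to zero once epoch reaches S, so the KH term is fully
    phased out afterwards."""
    return eps0 * max(0.0, 1.0 - epoch / S) ** 1.5
```

A smaller S switches the KH term off earlier, which, per the summary, can recover convergence as fast as a LeCun-initialized RBM without hurting final accuracy.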

The study demonstrates that a global competitive signal, inspired by neuromodulation, can compensate for the lack of intra‑layer connections in RBMs, yielding a “semi‑restricted” model that balances expressive capacity and computational efficiency. The authors suggest that this approach could be extended to deeper Boltzmann machines, variational autoencoders, or reinforcement‑learning policies, opening avenues for biologically plausible regularization in a broad range of energy‑based models.

