Learning with hidden variables
Learning and inferring the features that generate sensory input is a task continuously performed by the cortex. In recent years, novel algorithms and learning rules have been proposed that allow neural network models to learn such features from natural images, written text, audio signals, etc. These networks usually involve deep architectures with many layers of hidden neurons. Here we review recent advances in this area, emphasizing, among other things, the processing of dynamical inputs by networks with hidden nodes and the role of single-neuron models. These points, and the questions they raise, can advance our understanding of learning in the cortex and of the relationship between machine learning approaches to learning with hidden nodes and learning in cortical circuits.
💡 Research Summary
The paper surveys how the brain might learn and infer hidden causes that generate sensory inputs, framing this problem within the context of deep neural network architectures that contain multiple layers of hidden units. It begins with a historical overview, noting that early models of sensory processing were largely deterministic and hierarchical, but that a probabilistic perspective has gradually become dominant. Under the assumption that sensory signals are samples from a latent probabilistic generative process, cortical circuits are cast as systems that must learn an internal generative model and perform inference on hidden causes.
The authors review classic learning algorithms, starting with back‑propagation, whose biological plausibility is limited due to its reliance on global error signals, slow convergence, and susceptibility to local minima. They then discuss early probabilistic models such as the Boltzmann Machine and its restricted variant (RBM), which introduced stochastic, energy‑based learning but suffered from severe computational costs, especially when hidden layers proliferated. The Helmholtz Machine is presented as a seminal architecture that separates generative and recognition weights, embodying the idea that the brain learns a probability distribution over the world without supervised signals. However, scaling the Helmholtz Machine to many hidden layers proved difficult.
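To make the energy-based learning idea concrete, here is a minimal NumPy sketch of a restricted Boltzmann machine trained with one-step contrastive divergence (CD-1). This is not code from the paper; the layer sizes, learning rate, and training pattern are illustrative assumptions chosen only to show the positive-phase/negative-phase structure of the update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy RBM: 6 visible and 4 hidden binary units (sizes are illustrative).
n_v, n_h = 6, 4
W = 0.01 * rng.standard_normal((n_v, n_h))
b_v = np.zeros(n_v)  # visible biases
b_h = np.zeros(n_h)  # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def free_energy(v):
    # F(v) = -v.b_v - sum_j log(1 + exp((v W + b_h)_j)); lower free
    # energy means the model assigns higher probability to v.
    return -v @ b_v - np.sum(np.logaddexp(0.0, v @ W + b_h))

def cd1_update(v0, lr=0.1):
    """One contrastive-divergence (CD-1) step: data-driven ("positive
    phase") statistics minus statistics after a single Gibbs
    reconstruction ("negative phase")."""
    global W, b_v, b_h
    p_h0 = sigmoid(v0 @ W + b_h)                       # positive phase
    h0 = (rng.random(n_h) < p_h0).astype(float)        # sample hiddens
    v1 = (rng.random(n_v) < sigmoid(h0 @ W.T + b_v)).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)                       # negative phase
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b_v += lr * (v0 - v1)
    b_h += lr * (p_h0 - p_h1)

v = np.array([1., 0., 1., 0., 1., 0.])  # a single toy data pattern
F_before = free_energy(v)
for _ in range(200):
    cd1_update(v)
F_after = free_energy(v)  # should drop as v becomes more probable
```

The sketch also hints at why deep stacks of such layers were costly to train: every update requires sampling, and exact statistics are intractable once many hidden layers interact.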
The breakthrough came with Deep Belief Networks (DBNs) and Deep Boltzmann Machines (DBMs). DBNs stack RBMs and employ a layer‑wise pre‑training phase that learns useful representations before a fine‑tuning stage, thereby reducing sensitivity to weight initialization—a process reminiscent of developmental stages of synaptic strengthening in the cortex. DBMs, by contrast, are fully undirected and incorporate feedback connections, offering richer representational power at the cost of more complex and computationally demanding training procedures.
Modern deep learning advances are then examined. The paper highlights the critical role of single-neuron nonlinearities, particularly non-saturating functions such as ReLU, in avoiding vanishing gradients and accelerating learning. Dropout, a regularization technique that randomly silences units during training, is shown to mitigate co-adaptation and improve generalization. Together these results suggest that the statistical properties of individual neurons can shape network-wide learning dynamics, an insight that may map onto neuronal excitability and synaptic plasticity mechanisms.
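The two mechanisms discussed above can be sketched in a few lines of NumPy. This is an illustrative implementation (the array sizes and drop probability are assumptions, not values from the paper): ReLU passes positive inputs through unchanged, so its gradient does not saturate, and "inverted" dropout zeroes a random subset of units during training while rescaling the survivors so expected activations match test time.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    # Non-saturating: slope is 1 for x > 0, so gradients do not vanish
    # the way they do in the tails of sigmoid or tanh.
    return np.maximum(0.0, x)

def dropout(x, p_drop=0.5, train=True):
    """Inverted dropout: silence each unit with probability p_drop at
    training time and rescale survivors by 1/(1 - p_drop); at test
    time the layer is the identity."""
    if not train:
        return x
    mask = (rng.random(x.shape) >= p_drop).astype(x.dtype)
    return x * mask / (1.0 - p_drop)

x = rng.standard_normal(8)                  # toy pre-activations
h = relu(x)
h_train = dropout(h, p_drop=0.5, train=True)   # noisy, rescaled
h_test = dropout(h, train=False)               # deterministic
```

Because each training pass sees a different random mask, no unit can rely on specific partners being present, which is the sense in which dropout discourages co-adaptation.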
Dynamic inputs are addressed through convolutional neural networks (CNNs) and recurrent architectures. CNNs, with their hierarchical filtering and pooling operations, echo the functional organization of simple and complex cells in visual cortex, yet they are primarily designed for static images. To handle temporal streams such as video or auditory sequences, recurrent networks (RNNs, LSTMs) and spatiotemporal CNN extensions are required, providing a computational analogue to cortical feedback loops and sensorimotor integration.
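The recurrent processing described above reduces to a simple recurrence: the hidden state is updated from its previous value and the current input, carrying temporal context forward. Below is a minimal vanilla-RNN sketch (dimensions and initialization scale are illustrative assumptions; LSTMs add gating on top of this same skeleton).

```python
import numpy as np

rng = np.random.default_rng(2)

# Minimal vanilla RNN cell: 3-dim inputs, 5-dim hidden state
# (illustrative sizes, not from the paper).
n_in, n_h = 3, 5
W_xh = 0.1 * rng.standard_normal((n_in, n_h))  # input-to-hidden weights
W_hh = 0.1 * rng.standard_normal((n_h, n_h))   # recurrent weights
b = np.zeros(n_h)

def rnn_step(h, x):
    # The recurrent term h @ W_hh is what carries context across time,
    # a rough computational analogue of cortical feedback loops.
    return np.tanh(x @ W_xh + h @ W_hh + b)

T = 10
xs = rng.standard_normal((T, n_in))  # a toy temporal input stream
h = np.zeros(n_h)
states = []
for x in xs:
    h = rnn_step(h, x)
    states.append(h)
states = np.array(states)  # hidden trajectory, shape (T, n_h)
```

Unrolling this loop over time is also where the vanishing-gradient problem reappears, which motivates the gated architectures (LSTMs) mentioned in the text.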
Finally, the authors argue that despite impressive engineering successes, deep learning has only limited dialogue with neuroscience. Key biological constraints—local learning rules, metabolic efficiency, neuromodulatory influences—are often ignored. They call for new algorithms that respect these constraints, integrate realistic single‑neuron dynamics, and embody plausible synaptic plasticity mechanisms. By bridging the gap between artificial and biological learning, future work could both advance machine intelligence and deepen our understanding of cortical organization and plasticity.