The acoustic cues used by humans and other animals to localise sounds are subtle, and they change during and after development. This means that we need to continually relearn or recalibrate the auditory spatial map throughout our lifetimes. This is often thought of as a "supervised" learning process in which a "teacher" (for example, a parent, or your visual system) tells you whether or not you guessed the location correctly, and you use this information to update your map. However, there is not always an obvious teacher (for example, in babies or blind people). Using computational models, we show that approximate feedback from a simple innate circuit, such as one that can only distinguish left from right (e.g. the auditory orienting response), is sufficient to learn an accurate full-range spatial auditory map. Moreover, using this mechanism in addition to supervised learning maintains the adaptive neural representation more robustly. We find several possible neural mechanisms that could underlie this type of learning, and hypothesise that multiple mechanisms may be present and interact with each other. We conclude that when studying spatial hearing, we should not assume that the only source of learning is the visual system or another supervisory signal. Further study of the proposed mechanisms could help us design better rehabilitation programmes that accelerate the relearning or recalibration of spatial maps.
Sensory systems must adapt to changes throughout life to maintain an accurate representation of the environment. The auditory localization system, which enables animals to determine sound source locations, provides an excellent model for studying such sensory plasticity. Neural circuits processing sound localization cues show remarkable adaptability, adjusting to developmental changes like head growth and compensating for hearing impairments [1,2]. However, fundamental questions remain about how the brain accomplishes this complex calibration task.
What calibration signal does the brain use to learn spatial hearing? While the auditory spatial map can be calibrated via supervised learning using visual feedback as the teaching signal [2,3], considerable evidence indicates that vision-independent mechanisms must also exist [3]. Humans can learn to accurately localize sound sources both inside and outside the visual field [4,5], with equal speed and magnitude [6]. Congenitally blind individuals can develop sound localization abilities comparable to, and in some cases superior to, those of sighted individuals [7,8]. Neurophysiological results in animals also demonstrate vision-independent calibration of the auditory spatial map [9,10]. Although several hypotheses [9,10] and computational models [11,12] have been proposed, the precise mechanisms underlying vision-independent calibration remain largely elusive. Recently, deep learning has emerged as a powerful and flexible tool for modeling sensory systems [13-15], offering new insights into auditory learning [16,17]. However, its effectiveness is limited by prevailing learning paradigms, especially supervised learning, which depend heavily on large volumes of externally provided labels. Moreover, the question of sensory cue calibration in the absence of direct supervision arises in a wide variety of sensory learning contexts and modalities beyond spatial hearing [18-22], calling for a general algorithmic framework to support further inquiry.
We propose “bootstrap learning”, a novel type of learning process in which innate brain functions, even though basic and minimal, guide the learning of more sophisticated functions without external supervision, analogous to “pulling oneself up by one’s bootstraps”. Innate neural circuits in the auditory system offer several advantages as a vision-independent teacher for spatial hearing, providing a universally accessible calibration mechanism that is present in every individual throughout life. However, innate circuits are typically far less accurate than the learned auditory neural map they are meant to calibrate. To determine the location of a sound, the learned map must accurately process complex, direction-dependent acoustic cues. In contrast, innate circuits are often limited to basic functions, such as crude left-right discrimination, falling far short of providing precise localization supervision. Could bootstrap learning truly be feasible for spatial hearing?
We use simulations to examine the principles of bootstrap learning, integrating three core components: a small “Teacher” neural circuit with basic innate functionality, a plastic “Student” neural network with sufficient capacity to learn the complex sound localization function, and an interactive acoustic “Environment” (fig. 1). Both the Teacher and the Student are internal components of an Agent’s brain. The “Agent”, defined here as any human, animal, or model capable of acting within the environment, receives auditory inputs and moves in the simulation. The Student is a deep neural network that models the auditory space map. The Teacher is a much simpler, hardwired neural circuit that provides internal calibration signals for the Student. Bootstrapping in this context resembles a blindfolded single-player game: an Agent must learn the accurate spatial map through exploration, relying solely on its innate Teacher circuit for self-guidance, without access to visual feedback or external labels. To assess the plausibility of different candidate Teacher circuits, we evaluate how effectively each guides the Student’s learning. Additionally, we analyze the computational principles of bootstrapping using systematic simulations, exploring how simple innate mechanisms can facilitate the development of more complex neural functions.
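To make this framework concrete, the sketch below shows, in Python, one plausible way a Teacher limited to left/right discrimination could supply a usable training signal to a Student within a closed Agent-Environment loop. All specifics (the cue model, the Student's architecture, the head-turn step size, and the update rule) are illustrative assumptions, not the implementation used in our simulations.

```python
# Minimal, illustrative Teacher-Student-Environment loop.
import numpy as np

rng = np.random.default_rng(0)

def binaural_cues(azimuth_deg):
    """Hypothetical stand-in for direction-dependent acoustic cues (e.g. ITD/ILD),
    expressed in head-centred coordinates with a little sensory noise."""
    a = np.deg2rad(azimuth_deg)
    return np.array([np.sin(a), np.cos(a)]) + 0.05 * rng.standard_normal(2)

def teacher(cues):
    """Innate Teacher: crude left/right discrimination only (+1 = right, -1 = left)."""
    return 1.0 if cues[0] >= 0 else -1.0

class Student:
    """Plastic Student map from cues to an azimuth estimate.
    Here a single linear readout predicting (sin, cos) of the azimuth."""
    def __init__(self):
        self.w = 0.1 * rng.standard_normal((2, 2))

    def estimate(self, cues):
        s, c = self.w @ cues
        return np.degrees(np.arctan2(s, c))

    def update(self, cues, target_deg, lr=0.05):
        target = np.array([np.sin(np.deg2rad(target_deg)),
                           np.cos(np.deg2rad(target_deg))])
        err = self.w @ cues - target
        self.w -= lr * np.outer(err, cues)   # gradient step on squared error

student = Student()
for episode in range(5000):
    true_az = rng.uniform(-180.0, 180.0)     # hidden sound direction
    cues = binaural_cues(true_az)
    guess = student.estimate(cues)
    # The Agent turns its head by `guess` degrees; in the new head-centred
    # frame the sound sits at (true_az - guess), and the Teacher reports
    # only whether it now lies left or right of the midline.
    feedback = teacher(binaural_cues(true_az - guess))
    # One plausible bootstrap signal: nudge the training target a fixed
    # step in the direction the Teacher indicates.
    student.update(cues, guess + 10.0 * feedback)
```

The point of the sketch is structural rather than quantitative: the Teacher never reports the sound's location, only on which side of the midline it falls after the head turn, yet that one-bit signal defines a direction in which the Student's continuous estimate can be adjusted.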
Newborns turn their heads toward sound sources. This innate behavior is known as the Auditory Orienting Reflex (AOR) [23]. Although newborns cannot accurately localize sound sources, their AOR is relatively accurate for left-right discrimination [24]. We investigated whether a simple neural circuit model of the AOR would allow “bootstrap learning” of an accurate 360° auditory spatial map in the azimuth plane, without external error feedback or supervision labels.
There are two neural modules in our model. The first, the Teacher, models the basic AOR as a small neural circuit that discriminates whether a sound comes from the left or the right.
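As a rough illustration of what such a circuit could compute, the toy sketch below compares the sound levels at the two ears (an interaural level difference cue) and outputs only a coarse orienting direction. The choice of cue, the threshold, and the output coding are illustrative assumptions, not the Teacher circuit used in the model.

```python
def aor_teacher(left_level_db, right_level_db, threshold_db=0.5):
    """Toy AOR-like Teacher: compare the levels at the two ears (an ILD cue)
    and output a coarse orienting signal: +1 (sound to the right), -1 (left),
    0 (no clear preference). All parameters are illustrative."""
    ild = right_level_db - left_level_db
    if abs(ild) < threshold_db:
        return 0   # cue too weak to drive a reliable orienting response
    return 1 if ild > 0 else -1

# A sound slightly louder at the right ear yields a rightward orienting signal.
assert aor_teacher(left_level_db=60.0, right_level_db=63.0) == 1
```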