When the brain receives input from multiple sensory systems, it faces the question of whether to process the inputs in combination, as if they originated from the same event, or separately, as if they originated from distinct events. Furthermore, it must have a mechanism through which it can keep sensory inputs calibrated to maintain the accuracy of its internal representations. We have developed a neural network architecture capable of i) approximating optimal multisensory spatial integration, based on Bayesian causal inference, and ii) recalibrating the spatial encoding of sensory systems. The architecture is based on features of the dorsal processing hierarchy, including the spatial tuning properties of unisensory neurons and the convergence of different sensory inputs onto multisensory neurons. Furthermore, we propose that these unisensory and multisensory neurons play dual roles in i) encoding spatial location as separate or integrated estimates and ii) accumulating evidence for the independence or relatedness of multisensory stimuli. We further propose that top-down feedback connections spanning the dorsal pathway play a key role in recalibrating spatial encoding at the level of early unisensory cortices. Our proposed architecture provides possible explanations for a number of human electrophysiological and neuroimaging results and generates testable predictions linking neurophysiology with behaviour.
An important task constantly being carried out by the brain is determining whether inputs from different sensory modalities originate from the same cause or from separate causes. Take, for example, a situation in which multiple people in a room are having a conversation. To follow the conversation, one must identify and locate who is talking at any given moment. Considering only the spatiotemporal properties of current sensory information, the speaker whose apparent image is nearest the apparent source of the sound is the most likely candidate for who is speaking. However, taking further contextual information into consideration could enhance one's estimate for where, and thus whom, the speech is coming from: for example, the sight of moving lips combined with speech sounds further disambiguates and greatly influences the final estimate or percept of where the sound is occurring. This prior or contextual information, although helpful in most situations, can sometimes lead to mistaken perceptions, such as when a ventriloquist convinces her audience that a puppet is speaking by moving its mouth in synchrony with her own vocalizations. Because of this famous example, the general phenomenon of visual dominance over auditory cues for localization has been labeled the ventriloquism effect.
In general, the problem outlined above is a question of hierarchical causal inference (Shams & Beierholm, 2010): lower-level features, such as stimulus locations and onset times, inform higher-level hypotheses about the causal relations between stimuli (are they related or not?). In turn, the inferences made about the causal relations affect the final estimates of where and when the stimuli occurred. Körding et al. (2007) formally introduced causal inference as a mechanism for optimal multisensory interactions. Rather than naively integrating sensory cues under the assumption that they are related (Ernst and Banks, 2002; Alais and Burr, 2004; Atkins et al., 2001), their model assigns uncertainty to the exhaustive and mutually exclusive possibilities that i) the stimuli are causally linked by a common source ($C=1$) and drawn from distributions centered at the same true location, or ii) the stimuli have separate causes ($C=2$) and are independently drawn from distributions centered at their own true locations (Körding et al., 2007). The appropriate strategy when dealing with these two possibilities is to first calculate the likelihood of obtaining the auditory and visual sensory estimates ($x_A$, $x_V$) under each causal model: $p(x_A, x_V \mid C=1)$ and $p(x_A, x_V \mid C=2)$. The posterior probabilities of the two causal models, $p(C=1 \mid x_A, x_V)$ and $p(C=2 \mid x_A, x_V)$, are then obtained by entering these likelihoods and the prior probability that two stimuli are causally related, $p(C=1)$ or $p_{\text{common}}$, into Bayes' rule. Finally, an observer should use these posteriors as weights to compute a weighted average between the fully integrated estimate of stimulus location (assuming a common cause) and the estimate produced under the assumption of independent stimulus locations.
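To make this computation concrete, the following minimal Python sketch implements the Gaussian form of the Körding et al. (2007) model: Gaussian measurement noise on each cue, a Gaussian spatial prior $N(\mu_P, \sigma_P^2)$, and model averaging over the two causal hypotheses. The function name and the default parameter values are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def causal_inference_estimate(x_a, x_v, sigma_a=2.0, sigma_v=1.0,
                              sigma_p=15.0, mu_p=0.0, p_common=0.5):
    """Model-averaged location estimates under Bayesian causal inference.

    Gaussian form of the Körding et al. (2007) model: noisy auditory and
    visual measurements (x_a, x_v), a Gaussian spatial prior
    N(mu_p, sigma_p^2), and prior probability p_common of a common cause.
    Parameter values are illustrative, not fitted.
    """
    va, vv, vp = sigma_a ** 2, sigma_v ** 2, sigma_p ** 2

    # Likelihood of the pair of measurements under a common cause (C = 1);
    # the shared source location has been marginalized out analytically.
    denom1 = va * vv + va * vp + vv * vp
    q1 = ((x_a - x_v) ** 2 * vp + (x_a - mu_p) ** 2 * vv
          + (x_v - mu_p) ** 2 * va) / denom1
    like_c1 = np.exp(-0.5 * q1) / (2 * np.pi * np.sqrt(denom1))

    # Likelihood under independent causes (C = 2): product of two marginals.
    q2 = (x_a - mu_p) ** 2 / (va + vp) + (x_v - mu_p) ** 2 / (vv + vp)
    like_c2 = np.exp(-0.5 * q2) / (2 * np.pi * np.sqrt((va + vp) * (vv + vp)))

    # Posterior probability of a common cause, via Bayes' rule.
    post_c1 = (like_c1 * p_common
               / (like_c1 * p_common + like_c2 * (1 - p_common)))

    # Reliability-weighted estimate if the cues are fully integrated (C = 1).
    s_int = (x_a / va + x_v / vv + mu_p / vp) / (1 / va + 1 / vv + 1 / vp)

    # Unisensory estimates if the cues are treated as independent (C = 2).
    s_a_seg = (x_a / va + mu_p / vp) / (1 / va + 1 / vp)
    s_v_seg = (x_v / vv + mu_p / vp) / (1 / vv + 1 / vp)

    # Final estimates: posterior-weighted average of the two strategies.
    s_a_hat = post_c1 * s_int + (1 - post_c1) * s_a_seg
    s_v_hat = post_c1 * s_int + (1 - post_c1) * s_v_seg
    return s_a_hat, s_v_hat, post_c1
```

With a small audiovisual discrepancy, the auditory estimate is pulled toward the visual measurement by an amount that scales with the posterior probability of a common cause; widening the discrepancy drives the posterior toward $C=2$ and the pull shrinks, qualitatively reproducing the ventriloquism effect and its breakdown at large spatial disparities.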
A number of behavioural studies suggest that the brain performs causal inference and produces estimates that are well fit by the model (Körding et al., 2007), but how exactly would neurons carry out these computations? Models have been proposed to describe how idealized neurons might carry out such computations in a Bayes-optimal manner (Ma and Rahmati, 2013; Spratling, 2016; Magosso, Cuppini, and Ursino, 2017). Such models tend to focus on achieving optimality by encoding the full probability distributions over the variables of interest, and less so on the actual physiological or anatomical relationships observed between neural structures. Taking inspiration from such models, especially the architecture of Spratling (2016), as well as what is known from the literature regarding the neural correlates of multisensory integration, we propose a network model that approximates the average estimates produced by the causal inference model. Furthermore, our model accounts for the recalibration of sensory estimates and considers the possible functional roles of the different brain regions, and their associated activities, involved in multisensory causal inference and recalibration. In the next section we review some of the major findings regarding the neural architecture of multisensory integration and recalibration that have informed this work.
Since Meredith and Stein's (1986) pioneering work exploring single-neuron responses to multisensory stimuli, a number of studies have elucidated the possible involvement of cortical areas in auditory and visual integration. Early sensory cortices, such as auditory and visual cortex, which were previously believed to be strictly unisensory, have been found to exhibit multisensory responses (McDonald et al., 2013; Feng et al., 2014; Bieler et al., 2017; Brang et al., 2015).