A stochastic model of human visual attention with a dynamic Bayesian network

Reading time: 6 minutes

📝 Original Info

  • Title: A stochastic model of human visual attention with a dynamic Bayesian network
  • ArXiv ID: 1004.0085
  • Date: 2015-03-13
  • Authors: Akisato Kimura (Senior Member, IEEE), Derek Pang (Student Member, IEEE), Tatsuto Takeuchi, Kouji Miyazato, Kunio Kashino (Senior Member, IEEE), Junji Yamato (Senior Member, IEEE)

📝 Abstract

Recent studies in human vision science suggest that human responses to stimuli on a visual display are non-deterministic. People may attend to different locations on the same visual input at the same time. Based on this knowledge, we propose a new stochastic model of visual attention that introduces a dynamic Bayesian network to predict the likelihood of where humans typically focus in a video scene. The proposed model is composed of a dynamic Bayesian network with four layers. Our model provides a framework that simulates and combines the visual saliency response and the cognitive state of a person to estimate the most probable attended regions. Sample-based inference with a Markov chain Monte Carlo-based particle filter and stream processing on multi-core processors enable us to estimate human visual attention in near real time. Experimental results demonstrate that our model predicts human visual attention significantly better than previous deterministic models.
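To make the idea of sample-based inference concrete, here is a minimal sketch of one particle-filter update for a single hidden eye-focus position. It is not the authors' method: the paper describes a four-layer dynamic Bayesian network with a Markov chain Monte Carlo-based particle filter, while this sketch is a plain bootstrap particle filter on a one-dimensional strip of pixels. The function names, the random-walk motion model, and the use of raw saliency values as observation weights are all assumptions made for illustration.

```python
import random
from typing import List


def particle_filter_step(particles: List[int], saliency: List[float],
                         motion_std: float, rng: random.Random) -> List[int]:
    """One predict/weight/resample step (hypothetical helper, not from the paper)."""
    n_cells = len(saliency)
    # Predict: move each particle by Gaussian noise and clamp it to the frame.
    moved = [min(n_cells - 1, max(0, round(p + rng.gauss(0.0, motion_std))))
             for p in particles]
    # Weight: assume the saliency at a position acts as its likelihood.
    # The small floor keeps the weights from all being zero.
    weights = [saliency[p] + 1e-6 for p in moved]
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def attention_density(particles: List[int], n_cells: int) -> List[float]:
    """Turn the particle set into an estimated probability for each position."""
    counts = [0] * n_cells
    for p in particles:
        counts[p] += 1
    return [c / len(particles) for c in counts]


# Toy saliency profile with two conspicuous positions (made-up values).
saliency = [0.05, 0.1, 0.9, 0.1, 0.05, 0.4, 0.05, 0.05]
rng = random.Random(0)
particles = [rng.randrange(len(saliency)) for _ in range(500)]
for _ in range(10):
    particles = particle_filter_step(particles, saliency, 1.0, rng)
density = attention_density(particles, len(saliency))
print(density)
```

The output is a distribution over positions rather than a single point, which is what lets a stochastic model express that different viewers may attend to different locations in the same frame.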

📄 Full Content

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. XXX, NO. XXX, XXXXX 2010

A stochastic model of human visual attention with a dynamic Bayesian network

Akisato Kimura, Senior Member, IEEE, Derek Pang, Student Member, IEEE, Tatsuto Takeuchi, Kouji Miyazato, Kunio Kashino, Senior Member, IEEE, and Junji Yamato, Senior Member, IEEE.

Abstract

Recent studies in human vision science suggest that human responses to stimuli on a visual display are non-deterministic. People may attend to different locations on the same visual input at the same time. Based on this knowledge, we propose a new stochastic model of visual attention that introduces a dynamic Bayesian network to predict the likelihood of where humans typically focus in a video scene. The proposed model is composed of a dynamic Bayesian network with four layers. Our model provides a framework that simulates and combines the visual saliency response and the cognitive state of a person to estimate the most probable attended regions. Sample-based inference with a Markov chain Monte Carlo-based particle filter and stream processing on multi-core processors enable us to estimate human visual attention in near real time. Experimental results demonstrate that our model predicts human visual attention significantly better than previous deterministic models.

Index Terms

Human visual attention, saliency, dynamic Bayesian network, state space model, hidden Markov model, Markov chain Monte Carlo, particle filter, stream processing.

  • The authors are with NTT Communication Science Laboratories, NTT Corporation, 3-1 Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198 Japan. E-mail: akisato@ieee.org
  • D. Pang is with the Department of Electrical Engineering, Stanford University, Packard 240, 350 Serra Mall, Stanford, CA 94305, USA. He contributed to this work during his internship at NTT Communication Science Laboratories.
  • K. Miyazato was with the Department of Information and Communication Systems Engineering, Okinawa National College of Technology, 905 Henoko, Nago, Okinawa, 905-2192 Japan. He contributed to this work during his internship at NTT Communication Science Laboratories.
  • Parts of the material in this paper have been presented at the IEEE International Conference on Multimedia and Expo (ICME2008), Hannover, Germany, June 2008, and the IEEE International Conference on Multimedia and Expo (ICME2009), Cancun, Mexico, June-July 2009.
  • Manuscript received March 31, 2010.

October 22, 2018 DRAFT. arXiv:1004.0085v1 [cs.CV] 1 Apr 2010

Fig. 1. An example of a saliency map using the Koch-Ullman model.

1 INTRODUCTION

Developing sophisticated object detection and recognition algorithms has been a long-standing challenge in computer and robot vision research. Such algorithms are required in most applications of computational vision, including robotics [1], medical imaging [2], intelligent cars [3], surveillance [4], image segmentation [5], [6] and content-based image retrieval [7]. One of the major challenges in designing generic object detection and recognition systems is to construct methods that are fast and capable of operating on standard computer platforms without any prior knowledge. To that end, a pre-selection mechanism is essential so that subsequent processing can focus only on relevant data. One promising approach to such a mechanism is visual attention: it selects the regions in a visual scene that are most likely to contain objects of interest. The field of visual attention is currently the focus of much research for both biological and artificial systems.

Attention is generally controlled by one or a combination of two mechanisms: 1) a top-down control that voluntarily chooses the focus of attention in a cognitive and task-dependent manner, and 2) a bottom-up control that reflexively directs the visual focus based on the observed saliency attributes. The first biologically plausible model of the human attention system, proposed by Koch and Ullman [8], follows the latter approach. The basic concept underlying this model is the feature integration theory developed by Treisman and Gelade [9], which has been one of the most influential theories of human visual attention. According to the feature integration theory, in the first step of visual processing, several primary visual features are processed and represented in separate feature maps. These maps are later integrated into a saliency map, which can be accessed to direct attention to the most conspicuous areas. In the example shown in Fig. 1, the red car on the right of the frame is salient, and people therefore direct their attention to this area. The Koch-Ullman model has attracted the attention of many researchers, especially after the development of an implementation model

…(Full text truncated)…
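The feature-integration step described in the introduction, where separate feature maps are merged into one saliency map and attention goes to the most conspicuous location, can be sketched as follows. This is a simplified illustration and not the Koch-Ullman model itself: the min-max normalization, the plain averaging, the function names, and the toy feature values are all assumptions chosen to keep the example short.

```python
from typing import List, Tuple

Map = List[List[float]]


def normalize(feature_map: Map) -> Map:
    """Rescale a feature map linearly to [0, 1]; a flat map becomes all zeros."""
    flat = [v for row in feature_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def saliency_map(feature_maps: List[Map]) -> Map:
    """Average the normalized feature maps pixel by pixel (assumed combination rule)."""
    normed = [normalize(m) for m in feature_maps]
    rows, cols = len(normed[0]), len(normed[0][0])
    return [[sum(m[r][c] for m in normed) / len(normed) for c in range(cols)]
            for r in range(rows)]


def most_salient(sal: Map) -> Tuple[int, int]:
    """Return (row, col) of the largest saliency value, first one on ties."""
    best_r, best_c = 0, 0
    for r, row in enumerate(sal):
        for c, v in enumerate(row):
            if v > sal[best_r][best_c]:
                best_r, best_c = r, c
    return best_r, best_c


# Toy 2x3 feature maps (made-up values): a strong color response at the
# bottom-right pixel, a weaker intensity gradient, and a flat orientation map.
color = [[0.0, 0.0, 0.0], [0.0, 0.0, 8.0]]
intensity = [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]]
orientation = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

sal = saliency_map([color, intensity, orientation])
focus = most_salient(sal)  # (1, 2): the bottom-right pixel
```

Taking the single maximum gives a deterministic focus of attention. The stochastic model in the paper replaces that choice with a probability distribution over locations.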

Reference

This content is AI-processed based on ArXiv data.
