The ability to infer a person's intentions solely from their visible actions is a skill currently exhibited only by humans and animals. Computer algorithms have not yet reached this level of complexity, but several research efforts are working toward it. With the number of classification algorithms available, it is difficult to determine which works best in a particular situation. For classifying visual human intent data, Hidden Markov Models (HMMs) and their variants are leading candidates. However, the inability of HMMs to assign a probability to observation-to-observation linkages is a significant shortcoming of this classification technique. When a person visually identifies the actions of another person, they monitor patterns in the observations. By estimating the next observation, people can summarize the actions and thus determine, with reasonable accuracy, the intention of the person performing the action. These visual cues and linkages are important in creating intelligent algorithms for determining human actions from visual observations. The Evidence Feed Forward Hidden Markov Model is a newly developed algorithm that provides observation-to-observation linkages. The following research addresses the theory behind Evidence Feed Forward HMMs; provides mathematical proofs that learning their parameters optimizes the likelihood of the observations, which is important in any computational intelligence algorithm; and gives comparative examples with standard HMMs in classifying both visual action data and measurement data, thus providing a strong base for Evidence Feed Forward HMMs in the classification of many types of problems.
Visual Understanding (VU) is the ability of a machine to understand events, items, and scenarios through visual cues, or visual data. It is a very important and complex process used in many artificial and computational intelligence research programs. The need for VU is increasing as advances in technology require VU algorithms to move out of research labs and into fully developed programs [11,12,13,32,83].
A sub-area of VU research is Visual Human Intent Analysis (VHIA), which may also be referred to as visual human behavior identification, action or activity recognition, or understanding human actions from visual cues. VHIA concentrates on visually identifying actions performed by a human, and the many names associated with it describe specific applications. In static security systems, visual human behavior identification will aid or replace security guards monitoring CCTV feeds [13]. Television stations and the gaming community will require activity recognition systems to automatically categorize and store scenes in a database, or to quickly search for particular ones [12]. The military is pushing robotics to replace the soldier, which requires understanding human actions from visual cues so that a robot can recognize hostile behavior and take appropriate action to secure itself [83]. These are just a few of the many applications of VHIA.
Evidence Feed Forward Hidden Markov Models are designed to address many of the shortcomings of current classification systems. The motivation for this new research is to provide a better way of detecting human movements for classifying a person's activity. This classification is based on linking observations to one another, not just linking observations to states as in standard Hidden Markov Models. Extending these improved classifications from visual human activity to other types of non-visual data will also be discussed, and classification results of Evidence Feed Forward Hidden Markov Models will be compared with those of standard Hidden Markov Models on both visual and non-visual data. The goal of this research is to develop a classification system more robust than current standard Hidden Markov Models and to identify a new way of looking at the links between pieces of evidence in the classification system; in the case of Evidence Feed Forward HMMs, these are described as observation-to-observation links.
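To make the distinction concrete, the sketch below contrasts a standard HMM forward pass, where the emission probability depends only on the current state, with an emission conditioned on the previous observation as well. All parameter values are toy numbers, and the conditional emission table `D` is an illustrative assumption for this sketch, not the paper's derivation:

```python
import numpy as np

# Toy setup: 2 hidden states, 3 discrete observation symbols.
A = np.array([[0.7, 0.3],          # state-transition matrix A[i, j] = P(q_t = j | q_{t-1} = i)
              [0.4, 0.6]])
pi = np.array([0.6, 0.4])          # initial state distribution

# Standard HMM emission: B[state, obs] = P(o_t | q_t)
B = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])

# Observation-linked emission (illustrative assumption):
# D[state, prev_obs, obs] = P(o_t | q_t, o_{t-1}); rows drawn to sum to 1.
D = np.random.default_rng(0).dirichlet(np.ones(3), size=(2, 3))

def likelihood_standard(obs):
    """Forward algorithm for a standard HMM: emissions depend on state only."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def likelihood_feedforward(obs):
    """Forward pass with observation-to-observation links: the emission
    at time t is conditioned on the previous observation o_{t-1} too."""
    alpha = pi * B[:, obs[0]]               # first emission has no predecessor
    for prev, o in zip(obs[:-1], obs[1:]):
        alpha = (alpha @ A) * D[:, prev, o]
    return alpha.sum()

seq = [0, 1, 2, 1]
print(likelihood_standard(seq), likelihood_feedforward(seq))
```

The only structural change is the extra index on the emission table; the recursion itself is otherwise the standard forward algorithm.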
The first section of this paper is the introduction. The second section is a brief overview of current research in the area of VHIA. The third section describes the Evidence Feed Forward theory and derives the equations for the three common problems of HMMs. The fourth and fifth sections apply the developed Evidence Feed Forward HMM algorithms to two examples, one measurement based and one visual based. The final section is the conclusion.
The current research in the area of VHIA is split into six sub-sections that best describe the methods based on the volume of work: non-traditional artificial intelligence (AI) methods, traditional AI methods, Markov models and Bayesian networks, grammars, traditional hidden Markov models (HMMs), and non-traditional HMMs.
Non-traditional AI methods generally do not involve learning and instead rely on heavy processing of the data to determine a person's intent. M. Cristani et al. [1] use non-traditional AI methods, taking in both audio and visual data to determine simple events in an office. First they remove foreground objects and segment the images in the sequence. This output is coupled with the audio data, and a threshold detection process is used to identify unusual events. These event sequences are put into an audio-visual concurrence (AVC) matrix, which is compared with the AVC matrices of known events.
Template matching is performed by M. Dimitrijevic et al. [2]. They developed a template database of actions recorded from five male and three female subjects. Each human action is represented by three frames of the subject's 2D silhouette at different stages of the activity: the frame when the person first touches the ground with one foot, the frame at the mid-stride of the step, and the end frame when the person finishes touching the ground with the same foot.
The three-frame sets were captured from seven camera positions. To determine the event, they use a modified Chamfer distance calculation to match against the template sequences in the database.
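For reference, the plain (unmodified) Chamfer distance that such template matching builds on can be sketched in a few lines; the function name and the tiny silhouettes below are illustrative assumptions, not taken from [2]:

```python
import numpy as np

def chamfer_distance(template, query):
    """Plain Chamfer distance between two binary silhouette masks:
    mean Euclidean distance from each foreground pixel of `query`
    to its nearest foreground pixel in `template`."""
    t = np.argwhere(template)   # template foreground pixel coordinates, shape (T, 2)
    q = np.argwhere(query)      # query foreground pixel coordinates, shape (Q, 2)
    # Pairwise distances, then the nearest template pixel for each query pixel.
    d = np.linalg.norm(q[:, None, :] - t[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Two toy 5x5 silhouettes: a vertical bar and the same bar shifted one column.
a = np.zeros((5, 5), dtype=bool); a[1:4, 2] = True
b = np.zeros((5, 5), dtype=bool); b[1:4, 3] = True
print(chamfer_distance(a, b))   # → 1.0: every pixel of b is one pixel from a
```

In practice the distance map is usually precomputed with a distance transform rather than by brute-force pairwise distances, which is what makes Chamfer matching fast over large template databases.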
Traditional AI methods usually involve some type of learning, with either known or unknown outcomes. Typical methods include neural networks, fuzzy systems, and other common AI techniques. H. Stern et al. [3] created a prototype fuzzy system for picture understanding in surveillance cameras. The model is split into three parts: a pre-processing module, a static-object fuzzy system module, and a dynamic temporal fuzzy system module. The static fuzzy system module classifies pre-processed d