Employing Emotion Cues to Verify Speakers in Emotional Talking Environments

Reading time: 6 minutes
...

📝 Abstract

People generally talk neutrally in environments free of abnormal talking conditions such as stress and emotion. Other emotional conditions, such as happiness, anger, and sadness, can affect a person's speaking tone. Such emotions are directly affected by the patient's health status. In neutral talking environments, speakers can be verified easily; in emotional talking environments, however, they cannot be verified as easily as in neutral ones. Consequently, speaker verification systems do not perform as well in emotional talking environments as they do in neutral ones. In this work, a two-stage approach has been employed and evaluated to improve speaker verification performance in emotional talking environments. The approach exploits the speaker's emotion cues (a text-independent, emotion-dependent speaker verification problem) using both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. It comprises two cascaded stages that combine an emotion recognizer and a speaker recognizer into a single recognizer. The architecture has been tested on two separate emotional speech databases: our collected database and the Emotional Prosody Speech and Transcripts database. The results show that the proposed approach yields promising results, with a significant improvement over previous studies and over other approaches such as emotion-independent speaker verification and emotion-dependent speaker verification based entirely on HMMs.

📄 Content

Employing Emotion Cues to Verify Speakers in Emotional Talking Environments

Ismail Shahin, Department of Electrical and Computer Engineering, University of Sharjah, P. O. Box 27272, Sharjah, United Arab Emirates. Tel: (971) 6 5050967; Fax: (971) 6 5050877; E-mail: ismail@sharjah.ac.ae

Abstract

People generally talk neutrally in environments free of abnormal talking conditions such as stress and emotion. Other emotional conditions, such as happiness, anger, and sadness, can affect a person's speaking tone. Such emotions are directly affected by the patient's health status. In neutral talking environments, speakers can be verified easily; in emotional talking environments, however, they cannot be verified as easily as in neutral ones. Consequently, speaker verification systems do not perform as well in emotional talking environments as they do in neutral ones. In this work, a two-stage approach has been employed and evaluated to improve speaker verification performance in emotional talking environments. The approach exploits the speaker's emotion cues (a text-independent, emotion-dependent speaker verification problem) using both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. It comprises two cascaded stages that combine an emotion recognizer and a speaker recognizer into a single recognizer. The architecture has been tested on two separate emotional speech databases: our collected database and the Emotional Prosody Speech and Transcripts database. The results show that the proposed approach yields promising results, with a significant improvement over previous studies and over other approaches such as emotion-independent speaker verification and emotion-dependent speaker verification based entirely on HMMs.

Keywords: emotion recognition; emotional talking environments; hidden Markov models; speaker verification; suprasegmental hidden Markov models.
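The abstract describes the two-stage cascade only at a high level: stage 1 identifies the emotion of the utterance, and stage 2 verifies the claimed speaker using models specific to that emotion. The sketch below is a minimal illustration of that structure, assuming hmmlearn's GaussianHMM as a stand-in for the paper's HMM/SPHMM classifiers (suprasegmental HMMs have no off-the-shelf library implementation) and random vectors in place of real spectral features; all names, dimensions, and the zero threshold are illustrative, not taken from the paper.

```python
# Minimal sketch of a two-stage, emotion-dependent speaker verification
# cascade. GaussianHMM is a stand-in for the paper's HMM/SPHMM classifiers;
# all names, dimensions, and the threshold below are illustrative.
import numpy as np
from hmmlearn import hmm

EMOTIONS = ["neutral", "happy", "angry", "sad"]

def train_hmm(features, n_states=3):
    """Fit a Gaussian HMM on a feature sequence (frames x feature dims)."""
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
    model.fit(features)
    return model

def identify_emotion(features, emotion_models):
    """Stage 1: pick the emotion whose model best explains the utterance."""
    return max(emotion_models, key=lambda e: emotion_models[e].score(features))

def verify_speaker(features, claimed, emotion,
                   speaker_models, background_models, threshold=0.0):
    """Stage 2: log-likelihood ratio between the claimed speaker's
    emotion-specific model and an emotion-specific background model."""
    llr = (speaker_models[(claimed, emotion)].score(features)
           - background_models[emotion].score(features))
    return llr >= threshold

# Toy usage: random 12-dimensional "features" stand in for spectral frames.
rng = np.random.default_rng(0)
emotion_models = {e: train_hmm(rng.normal(size=(200, 12))) for e in EMOTIONS}
background_models = {e: train_hmm(rng.normal(size=(200, 12))) for e in EMOTIONS}
speaker_models = {("alice", e): train_hmm(rng.normal(size=(200, 12)))
                  for e in EMOTIONS}

test = rng.normal(size=(80, 12))
emotion = identify_emotion(test, emotion_models)              # stage 1
accepted = verify_speaker(test, "alice", emotion,
                          speaker_models, background_models)  # stage 2
print(emotion, accepted)
```

Routing the verification through the emotion decided in stage 1 is what makes the second stage emotion-dependent; an emotion-independent baseline would score a single speaker model regardless of the stage-1 output.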

1. Introduction

Listeners can obtain several types of information from speech signals:
1) Speech recognition, which conveys information about the content of the speech signal.
2) Speaker recognition, which yields information about the speaker's identity.
3) Emotion recognition, which gives information about the emotional state of the speaker.
4) Health recognition, which provides information about the patient's health status.
5) Language recognition, which produces information about the language being spoken.
6) Accent recognition, which generates information about the speaker's accent.
7) Age recognition, which delivers information about the speaker's age.
8) Gender recognition, which gives information about the speaker's gender.

There are two types of speaker recognition: speaker identification and speaker verification (authentication). Speaker identification is the task of automatically determining who is speaking from a set of known speakers. Speaker verification is the task of automatically determining whether a person really is who he or she claims to be.

Speaker verification can be used in intelligent health care systems [1], [2], [3], [4]. Speaker verification systems are used in hospital systems that include computerized emotion categorization and assessment techniques [1]. They can also be used in pathological voice assessment (functional dysphonic voices) [2]. Dysphonia is the medical term for disorders of the voice: an impairment in the ability to produce voice sounds using the vocal organs. Dysphonia is thus a phonation disorder, and a dysphonic voice can be hoarse, excessively breathy, harsh, or rough [5]. Furthermore, speaker verification systems can be used in the diagnosis of Parkinson's disease [3]. Max Little and his team at the Massachusetts Institute of Technology (MIT) analyzed and evaluated the voice characteristics of patients who had been diagnosed with Parkinson's disease and discovered that they could create a tool to detect the disease from individuals' speech patterns [3]. In addition, speaker verification systems can assist multidisciplinary evaluation teams as they assess each child referred for evaluation to determine whether he or she has a disability and needs special education services; the verification of children with disabilities is one of the most important aspects of both federal law and state special education regulation [4].
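Although this excerpt does not write it out, the accept/reject decision just described is conventionally cast as a log-likelihood-ratio test against a threshold; the formulation below is the standard one from the verification literature, not quoted from this paper.

```latex
% Standard speaker verification decision (textbook formulation):
%   O         : observed feature sequence from the test utterance
%   \lambda_c : model of the claimed speaker
%   \lambda_b : background (impostor) model
%   \theta    : decision threshold
\Lambda(O) = \log p(O \mid \lambda_c) - \log p(O \mid \lambda_b),
\qquad
\text{accept the claim if } \Lambda(O) \ge \theta,\ \text{otherwise reject.}
```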

Speaker recognition has been an active research field for the last few decades and still poses a number of challenging problems. One of the most challenging problems facing speaker recognition systems is their low performance in emotional talking environments [6], [7], [8], [9]. Emotion-based speaker recognition
