Employing Second-Order Circular Suprasegmental Hidden Markov Models to Enhance Speaker Identification Performance in Shouted Talking Environments

📝 Original Info

  • Title: Employing Second-Order Circular Suprasegmental Hidden Markov Models to Enhance Speaker Identification Performance in Shouted Talking Environments
  • ArXiv ID: 1706.09722
  • Date: 2017-07-03
  • Authors: Ismail Shahin

📝 Abstract

Speaker identification performance is almost perfect in neutral talking environments; however, performance deteriorates significantly in shouted talking environments. This work proposes, implements, and evaluates new models called Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) to alleviate the deteriorated performance in shouted talking environments. The proposed models combine the characteristics of Circular Suprasegmental Hidden Markov Models (CSPHMMs) and Second-Order Suprasegmental Hidden Markov Models (SPHMM2s). The results show that CSPHMM2s outperform each of First-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s), and First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s) in shouted talking environments. In such environments, using our collected speech database, average speaker identification performance based on LTRSPHMM1s, LTRSPHMM2s, CSPHMM1s, and CSPHMM2s is 74.6%, 78.4%, 78.7%, and 83.4%, respectively. Speaker identification performance based on CSPHMM2s is close to that obtained from subjective assessment by human listeners.

📄 Full Content

Employing Second-Order Circular Suprasegmental Hidden Markov Models to Enhance Speaker Identification Performance in Shouted Talking Environments

Ismail Shahin

Electrical and Computer Engineering Department University of Sharjah P. O. Box 27272 Sharjah, United Arab Emirates Tel: (971) 6 5050967 Fax: (971) 6 5050877 E-mail: ismail@sharjah.ac.ae

Abstract

Speaker identification performance is almost perfect in neutral talking environments; however, performance deteriorates significantly in shouted talking environments. This work proposes, implements, and evaluates new models called Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) to alleviate the deteriorated performance in shouted talking environments. The proposed models combine the characteristics of Circular Suprasegmental Hidden Markov Models (CSPHMMs) and Second-Order Suprasegmental Hidden Markov Models (SPHMM2s). The results show that CSPHMM2s outperform each of First-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s), and First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s) in shouted talking environments. In such environments, using our collected speech database, average speaker identification performance based on LTRSPHMM1s, LTRSPHMM2s, CSPHMM1s, and CSPHMM2s is 74.6%, 78.4%, 78.7%, and 83.4%, respectively. Speaker identification performance based on CSPHMM2s is close to that obtained from subjective assessment by human listeners.

Keywords: first-order circular suprasegmental hidden Markov models; first-order left-to-right suprasegmental hidden Markov models; second-order circular suprasegmental hidden Markov models; second-order left-to-right suprasegmental hidden Markov models; shouted talking environments; speaker identification.

1. Introduction

Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information embedded in speech signals. Speaker recognition involves two applications: speaker identification and speaker verification (authentication). Speaker identification is the process of finding the identity of an unknown speaker by comparing his/her voice with the voices of registered speakers in the database. The comparison yields similarity measures, and the registered speaker with the maximum similarity is chosen. Speaker identification can be used in criminal investigations to determine which suspect produced the voice recorded at the scene of a crime. It can also be used in civil cases or for the media. These cases include calls to radio stations, local or other government authorities, insurance companies, monitoring people by their voices and many other applications.
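The identification decision described above can be sketched as a simple maximum over per-speaker similarity scores. This is only an illustration under stated assumptions: the function name `identify_speaker`, the speaker labels, and the score values are hypothetical, and the scores stand in for whatever similarity measure a real system computes (for example, a model log-likelihood).

```python
def identify_speaker(scores):
    """Return the registered speaker with the highest similarity score.

    `scores` maps each registered speaker's label to the similarity
    between the unknown utterance and that speaker's model.
    """
    if not scores:
        raise ValueError("no registered speakers to compare against")
    # Pick the key whose associated score is largest.
    return max(scores, key=scores.get)


# Hypothetical similarity scores (illustrative values, not from the paper).
example_scores = {"speaker_a": -412.7, "speaker_b": -398.2, "speaker_c": -405.9}
identified = identify_speaker(example_scores)
print(identified)
```

The sketch always returns one of the registered speakers, so it assumes the unknown voice belongs to someone in the database.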

Speaker verification is the process of determining whether a speaker is who he/she claims to be. In this type of speaker recognition, the voiceprint to be verified is compared with the voice model of the claimed speaker registered in the speech data corpus. The result of the comparison is a similarity measure on which acceptance or rejection of the claimed identity is based. The applications of speaker verification include using the voice as a key to confirm the identity claim of a speaker. Such services include banking transactions over a telephone network, database access services, security control for confidential information areas, remote access to computers, tracking speakers in a conversation or broadcast and many other applications.
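In contrast to identification, the verification decision compares a single similarity score against an acceptance threshold. The following is a minimal sketch, assuming a hypothetical function name `verify_speaker` and illustrative score and threshold values; how the score is computed and how the threshold is chosen are not specified here.

```python
def verify_speaker(claimed_score, threshold):
    """Accept the identity claim if the similarity reaches the threshold.

    `claimed_score` is the similarity between the test voiceprint and the
    claimed speaker's registered model; `threshold` is a tunable cutoff.
    """
    return claimed_score >= threshold


# Hypothetical values for illustration only.
accepted = verify_speaker(claimed_score=-398.2, threshold=-400.0)
rejected = verify_speaker(claimed_score=-412.7, threshold=-400.0)
print(accepted, rejected)
```

Raising the threshold makes false acceptances less likely at the cost of more false rejections, which is the usual trade-off when setting such a cutoff.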

Speaker recognition is often classified into closed-set recognition and open-set recognition. Closed-set recognition refers to cases in which the unknown voice must come from a set of known speakers, while open-set recognition refers to cases in which the unknown voice may come from unregistered speakers. Speaker recognition systems can also be divided according to the speech modality: text-dependent (fixed-text) recognition and text-independent (free-text) recognition. In text-dependent recognition, the text spoken by the speaker is known; in text-independent recognition, the system should be able to identify the unknown speaker from any text.

2. Motivation and Literature Review

Speaker recognition systems perform extremely well in neutral talking environments [1-4]; however, such systems perform poorly in stressful talking environments [5-13]. Neutral talking environments are defined as talking environments in which speech is produced by speakers who are not under any stressful or emotional talking condition. Stressful talking environments are defined as talking environments that cause speakers to vary their production of speech from the neutral talking condition to other stressful talking condi

…(Full text truncated)…
