📝 Original Info
- Title: Employing Second-Order Circular Suprasegmental Hidden Markov Models to Enhance Speaker Identification Performance in Shouted Talking Environments
- ArXiv ID: 1706.09722
- Date: 2017-07-03
- Authors: Ismail Shahin
📝 Abstract
Speaker identification performance is almost perfect in neutral talking environments; however, performance deteriorates significantly in shouted talking environments. This work proposes, implements and evaluates new models called Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) to alleviate the deteriorated performance in shouted talking environments. The proposed models possess the characteristics of both Circular Suprasegmental Hidden Markov Models (CSPHMMs) and Second-Order Suprasegmental Hidden Markov Models (SPHMM2s). The results show that CSPHMM2s outperform each of First-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s) and First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s) in shouted talking environments. In such environments, using our collected speech database, the average speaker identification performance based on LTRSPHMM1s, LTRSPHMM2s, CSPHMM1s and CSPHMM2s is 74.6%, 78.4%, 78.7% and 83.4%, respectively. The speaker identification performance obtained with CSPHMM2s is close to that obtained from subjective assessment by human listeners.
📄 Full Content
Employing Second-Order Circular Suprasegmental Hidden Markov Models to
Enhance Speaker Identification Performance in Shouted Talking Environments
Ismail Shahin
Electrical and Computer Engineering Department
University of Sharjah
P. O. Box 27272
Sharjah, United Arab Emirates
Tel: (971) 6 5050967
Fax: (971) 6 5050877
E-mail: ismail@sharjah.ac.ae
Abstract
Speaker identification performance is almost perfect in neutral talking
environments; however, performance deteriorates significantly in shouted
talking environments. This work proposes, implements and evaluates new
models called Second-Order Circular Suprasegmental Hidden Markov Models
(CSPHMM2s) to alleviate the deteriorated performance in shouted talking
environments. The proposed models possess the characteristics of both
Circular Suprasegmental Hidden Markov Models (CSPHMMs) and Second-Order
Suprasegmental Hidden Markov Models (SPHMM2s). The results show that
CSPHMM2s outperform each of First-Order Left-to-Right Suprasegmental Hidden
Markov Models (LTRSPHMM1s), Second-Order Left-to-Right Suprasegmental Hidden
Markov Models (LTRSPHMM2s) and First-Order Circular Suprasegmental Hidden
Markov Models (CSPHMM1s) in shouted talking environments. In such
environments, using our collected speech database, the average speaker
identification performance based on LTRSPHMM1s, LTRSPHMM2s, CSPHMM1s and
CSPHMM2s is 74.6%, 78.4%, 78.7% and 83.4%, respectively. The speaker
identification performance obtained with CSPHMM2s is close to that obtained
from subjective assessment by human listeners.
Keywords: first-order circular suprasegmental hidden Markov models; first-order
left-to-right suprasegmental hidden Markov models; second-order circular
suprasegmental hidden Markov models; second-order left-to-right suprasegmental
hidden Markov models; shouted talking environments; speaker identification.
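To put the four averages quoted in the abstract side by side, the short Python sketch below computes the relative gain of CSPHMM2s over each of the other three models. The percentages are the ones reported in the abstract; the relative-gain formula and the helper name `relative_gain` are assumptions made here for illustration, and the paper may quantify improvement differently.

```python
# Average identification performance (%) in shouted talking environments,
# as reported in the abstract.
performance = {
    "LTRSPHMM1s": 74.6,
    "LTRSPHMM2s": 78.4,
    "CSPHMM1s": 78.7,
    "CSPHMM2s": 83.4,
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent (illustrative metric)."""
    return (new - old) / old * 100.0


# Gain of CSPHMM2s over each of the other three models, rounded to one decimal.
gains = {
    name: round(relative_gain(performance["CSPHMM2s"], value), 1)
    for name, value in performance.items()
    if name != "CSPHMM2s"
}

for name, gain in gains.items():
    print(f"CSPHMM2s vs {name}: +{gain}% relative")
```

Under this definition the gains are roughly 11.8%, 6.4% and 6.0% over LTRSPHMM1s, LTRSPHMM2s and CSPHMM1s, respectively.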
- Introduction
Speaker recognition is the process of automatically recognizing who is speaking
on the basis of individual information embedded in speech signals. Speaker
recognition comprises two tasks: speaker identification and speaker
verification (authentication). Speaker identification is the process of finding the
identity of an unknown speaker by comparing his/her voice with the voices of the
speakers registered in the database. The comparison yields similarity measures,
and the registered speaker with the maximum similarity is chosen. Speaker
identification can be used in criminal investigations to determine which suspect
produced the voice recorded at the scene of the crime. It can also be used in
civil cases or for the media. These cases include calls to radio stations, local
or other government authorities, and insurance companies, as well as monitoring
people by their voices and many other applications.
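The identification decision described above reduces to picking the registered speaker with the highest similarity score. A minimal Python sketch, assuming the similarity scores have already been computed; the function name, speaker labels and log-likelihood-style values are hypothetical and not taken from the paper:

```python
def identify_speaker(scores):
    """Return the registered speaker whose model gives the highest similarity.

    `scores` maps a speaker ID to a similarity value (for example, a
    log-likelihood of the utterance under that speaker's model).
    """
    if not scores:
        raise ValueError("no registered speakers to compare against")
    return max(scores, key=scores.get)


# Hypothetical similarity scores for one unknown utterance.
example_scores = {"speaker_A": -412.7, "speaker_B": -398.2, "speaker_C": -405.9}
identified = identify_speaker(example_scores)
print(identified)
```

Because the decision is an argmax, some registered speaker is always returned, even when all of the scores are low.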
Speaker verification is the process of determining whether a speaker is who he
or she claims to be. In this type of speaker recognition, the voiceprint is
compared with the voice model of the speaker whose identity is to be verified,
as registered in the speech data corpus. The result of the comparison is a
similarity measure from which acceptance or rejection of the claimed identity
follows. The applications of speaker verification include using the voice as a
key to confirm the identity claim of a speaker. Such services include banking
transactions over a telephone network, database access services, security
control for confidential information areas, remote access to computers,
tracking speakers in a conversation or broadcast and many other applications.
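In contrast to identification, the verification decision is a binary accept/reject choice driven by a single similarity measure. The sketch below assumes a fixed decision threshold, which is a common convention but is not specified in the text above; the function name, scores and threshold are hypothetical.

```python
def verify_speaker(score, threshold):
    """Accept the identity claim when the similarity reaches the threshold.

    `score` is the similarity between the test utterance and the claimed
    speaker's registered model; `threshold` is an assumed, tunable cut-off.
    """
    return score >= threshold


# Hypothetical scores against the claimed speaker's model.
genuine_accepted = verify_speaker(score=-398.2, threshold=-400.0)
impostor_accepted = verify_speaker(score=-412.7, threshold=-400.0)
print(genuine_accepted, impostor_accepted)
```

Raising the threshold rejects more impostors at the cost of rejecting more genuine speakers, so in practice it would be tuned on held-out data.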
Speaker recognition is often classified into closed-set recognition and open-set
recognition. Closed-set recognition refers to cases in which the unknown voice
must come from a set of known speakers, while open-set recognition refers to
cases in which the unknown voice may come from unregistered speakers. Speaker
recognition systems can also be divided according to speech modality:
text-dependent (fixed-text) recognition and text-independent (free-text)
recognition. In text-dependent recognition, the text spoken by the speaker is
known; in text-independent recognition, the system should be able to identify
the unknown speaker from any text.
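The closed-set/open-set distinction can be illustrated in code. In the sketch below, a closed-set decision always returns the best-scoring registered speaker, whereas the open-set variant is allowed to return `None` for an unregistered voice. Using a rejection threshold for the open-set case is an assumption made for illustration, not a method stated in the text; all names and numbers are hypothetical.

```python
def identify_open_set(scores, threshold):
    """Return the best-matching registered speaker, or None if no model
    is similar enough (the voice is treated as unregistered).

    `scores` maps speaker ID -> similarity; `threshold` is an assumed cut-off.
    """
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical similarity scores for two unknown utterances.
known_voice = {"speaker_A": -412.7, "speaker_B": -398.2}
unknown_voice = {"speaker_A": -455.1, "speaker_B": -449.8}

open_set_known = identify_open_set(known_voice, threshold=-420.0)
open_set_unknown = identify_open_set(unknown_voice, threshold=-420.0)
print(open_set_known, open_set_unknown)
```

A closed-set system corresponds to dropping the threshold check, so the second utterance would be assigned to `speaker_B` despite its poor score.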
- Motivation and Literature Review
Speaker recognition systems perform extremely well in neutral talking
environments [1-4]; however, such systems perform poorly in stressful talking
environments [5-13]. Neutral talking environments are defined as
environments in which speech is produced by speakers who are not
under any stressful or emotional talking condition. Stressful talking
environments are defined as the talking environments that cause speakers to vary
their generation of speech from neutral talking condition to other stressful talking
condi
…(Full text truncated)…