Improvement of Text Dependent Speaker Identification System Using Neuro-Genetic Hybrid Algorithm in Office Environmental Conditions
📝 Abstract
In this paper, an improved strategy is proposed for an automated text-dependent speaker identification system operating in a noisy environment. The identification process combines a Neuro-Genetic hybrid algorithm with cepstral-based features. A Wiener filter is used to remove background noise from the source utterance. Speech pre-processing techniques, namely start and end point detection, pre-emphasis filtering, frame blocking and windowing, are used to process the speech utterances. RCC, MFCC, ∆MFCC, ∆∆MFCC, LPC and LPCC are used to extract the features. After feature extraction, the Neuro-Genetic hybrid algorithm is used for learning and identification. Features are extracted with several techniques in order to optimize identification performance. On the VALID speech database, the closed-set text-dependent speaker identification system achieves its highest identification rates of 100 percent for the studio environment and 82.33 percent for office environmental conditions.
📄 Content
IJCSI International Journal of Computer Science Issues, Vol. 1, 2009
ISSN (Online): 1694-0784
ISSN (Printed): 1694-0814
Improvement of Text Dependent Speaker Identification
System Using Neuro-Genetic Hybrid Algorithm in Office
Environmental Conditions
Md. Rabiul Islam1 and Md. Fayzur Rahman2
1 Department of Computer Science & Engineering Rajshahi University of Engineering & Technology (RUET), Rajshahi-6204, Bangladesh rabiul_cse@yahoo.com
2 Department of Electrical & Electronic Engineering Rajshahi University of Engineering & Technology (RUET), Rajshahi-6204, Bangladesh mfrahman3@yahoo.com
Abstract
In this paper, an improved strategy is proposed for an automated text-dependent speaker identification system operating in a noisy environment. The identification process combines a Neuro-Genetic hybrid algorithm with cepstral-based features. A Wiener filter is used to remove background noise from the source utterance. Speech pre-processing techniques, namely start and end point detection, pre-emphasis filtering, frame blocking and windowing, are used to process the speech utterances. RCC, MFCC, ∆MFCC, ∆∆MFCC, LPC and LPCC are used to extract the features. After feature extraction, the Neuro-Genetic hybrid algorithm is used for learning and identification. Features are extracted with several techniques in order to optimize identification performance. On the VALID speech database, the closed-set text-dependent speaker identification system achieves its highest identification rates of 100 % for the studio environment and 82.33 % for office environmental conditions.
Key words: Bio-informatics, Robust Speaker Identification, Speech Signal Pre-processing, Neuro-Genetic Hybrid Algorithm.
1. Introduction
Many researchers now see biometrics as a solution to a wide range of user identification and security problems [1]. Speaker identification is one of the most important areas in which biometric techniques can be applied. Various techniques have been proposed to solve the automatic speaker identification problem [2, 3, 4, 5, 6, 7, 8].
Most published work in speech recognition and speaker recognition focuses on speech in noiseless environments, and only a few studies address speech under noisy conditions [9, 10, 11, 12]. In some research, different talking styles were used to simulate speech produced under real stressful talking conditions [13, 14, 15]. Learning systems for speaker identification that employ hybrid strategies can potentially offer significant advantages over single-strategy systems.
In the proposed system, a Neuro-Genetic hybrid algorithm with cepstral-based features is used to improve the performance of text-dependent speaker identification in a noisy environment. Several feature extraction techniques, namely RCC, MFCC, ∆MFCC, ∆∆MFCC, LPC and LPCC, are used to extract features from the speech in order to achieve good results. Some tasks in this work were simulated using MATLAB toolboxes such as the Signal Processing Toolbox, Voicebox and the HMM Toolbox.
2. Paradigm of the Proposed Speaker Identification System
The basic building blocks of the speaker identification system are shown in Fig. 1. The first step is the acquisition of speech utterances from the speakers. A Wiener filter is used to remove background noise from the original speech. A start and end point detection algorithm then locates the start and end points of each speech utterance, after which the unnecessary parts are removed. Pre-emphasis filtering is used as a noise reduction technique to increase the amplitude of the input signal at frequencies where the signal-to-noise ratio (SNR) is low. The speech signal is then segmented into overlapping frames. The purpose of the overlapping analysis is to ensure that each speech sound of the input sequence is approximately centered in some frame. After segmentation, a windowing technique is applied, and features are extracted from the segmented speech. The extracted features are then fed to the Neuro-Genetic hybrid technique for learning and classification.
Fig. 1 Block diagram of the proposed automated speaker identification system.
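The pre-emphasis, frame blocking and windowing stages described above can be illustrated with a minimal Python sketch. The paper itself reports using MATLAB toolboxes and does not state its parameter values, so everything specific here is an assumption for illustration only: the pre-emphasis coefficient of 0.97, the frame length of 256 samples, the hop of 128 samples (50 % overlap), the choice of a Hamming window, and all function names.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a commonly used value, assumed here; the paper
    does not state the coefficient it used.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_blocks(signal, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples.

    Consecutive frames start hop samples apart; trailing samples that
    do not fill a whole frame are dropped.
    """
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def hamming(frame_len):
    """Hamming window (an assumed choice; the paper only says 'windowing')."""
    if frame_len == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]


def preprocess(signal, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasize, block into overlapping frames, then window each frame."""
    window = hamming(frame_len)
    emphasized = pre_emphasis(signal, alpha)
    return [
        [sample * weight for sample, weight in zip(frame, window)]
        for frame in frame_blocks(emphasized, frame_len, hop)
    ]
```

Calling `preprocess(samples)` on a list of float samples returns a list of windowed frames, each of which could then be passed to a feature extractor. With a hop of half the frame length, every sample away from the signal edges falls in two frames, which is one way to realize the overlapping analysis the text describes.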
3. Speech Signal Pre-processing for Speaker Identification
To capture the speech signal, a sampling frequency of 11025 Hz, a sampling resolution of 16 bits, a mono recording channel and the *.wav file format were used. Speech pre-processing plays a vital role in the efficiency of learning. After acquisition of the speech utterances, a Wiener filter was used to remove the background noise from the original speech utterances [16, 17, 18]. Speech end po