Improvement of Text Dependent Speaker Identification System Using Neuro-Genetic Hybrid Algorithm in Office Environmental Conditions

Reading time: 5 minute
...

📝 Abstract

In this paper, an improved strategy for automated text dependent speaker identification system has been proposed in noisy environment. The identification process incorporates the Neuro- Genetic hybrid algorithm with cepstral based features. To remove the background noise from the source utterance, wiener filter has been used. Different speech pre-processing techniques such as start-end point detection algorithm, pre-emphasis filtering, frame blocking and windowing have been used to process the speech utterances. RCC, MFCC, MFCC, MFCC, LPC and LPCC have been used to extract the features. After feature extraction of the speech, Neuro-Genetic hybrid algorithm has been used in the learning and identification purposes. Features are extracted by using different techniques to optimize the performance of the identification. According to the VALID speech database, the highest speaker identification rate of 100.000 percent for studio environment and 82.33 percent for office environmental conditions have been achieved in the close set text dependent speaker identification system.

💡 Analysis

In this paper, an improved strategy for automated text dependent speaker identification system has been proposed in noisy environment. The identification process incorporates the Neuro- Genetic hybrid algorithm with cepstral based features. To remove the background noise from the source utterance, wiener filter has been used. Different speech pre-processing techniques such as start-end point detection algorithm, pre-emphasis filtering, frame blocking and windowing have been used to process the speech utterances. RCC, MFCC, MFCC, MFCC, LPC and LPCC have been used to extract the features. After feature extraction of the speech, Neuro-Genetic hybrid algorithm has been used in the learning and identification purposes. Features are extracted by using different techniques to optimize the performance of the identification. According to the VALID speech database, the highest speaker identification rate of 100.000 percent for studio environment and 82.33 percent for office environmental conditions have been achieved in the close set text dependent speaker identification system.

📄 Content

IJCSI International Journal of Computer Science Issues, Vol. 1, 2009 ISSN (Online): 1694-0784 ISSN (Printed): 1694-0814 IJCSI IJCSI
42 Improvement of Text Dependent Speaker Identification System Using Neuro-Genetic Hybrid Algorithm in Office Environmental Conditions Md. Rabiul Islam1 and Md. Fayzur Rahman2

1 Department of Computer Science & Engineering Rajshahi University of Engineering & Technology (RUET), Rajshahi-6204, Bangladesh rabiul_cse@yahoo.com

2 Department of Electrical & Electronic Engineering Rajshahi University of Engineering & Technology (RUET), Rajshahi-6204, Bangladesh mfrahman3@yahoo.com

Abstract In this paper, an improved strategy for automated text dependent speaker identification system has been proposed in noisy environment. The identification process incorporates the Neuro- Genetic hybrid algorithm with cepstral based features. To remove the background noise from the source utterance, wiener filter has been used. Different speech pre-processing techniques such as start-end point detection algorithm, pre-emphasis filtering, frame blocking and windowing have been used to process the speech utterances. RCC, MFCC, ∆MFCC, ∆∆MFCC, LPC and LPCC have been used to extract the features. After feature extraction of the speech, Neuro-Genetic hybrid algorithm has been used in the learning and identification purposes. Features are extracted by using different techniques to optimize the performance of the identification. According to the VALID speech database, the highest speaker identification rate of 100.000 % for studio environment and 82.33 % for office environmental conditions have been achieved in the close set text dependent speaker identification system. Key words: Bio-informatics, Robust Speaker Identification, Speech Signal Pre-processing, Neuro-Genetic Hybrid Algorithm.

  1. Introduction Biometrics are seen by many researchers as a solution to a lot of user identification and security problems now a days [1]. Speaker identification is one of the most important areas where biometric techniques can be used. There are various techniques to resolve the automatic speaker identification problem [2, 3, 4, 5, 6, 7, 8].

Most published works in the areas of speech recognition and speaker recognition focus on speech under the noiseless environments and few published works focus on speech under noisy conditions [9, 10, 11, 12]. In some research work, different talking styles were used to simulate the speech produced under real stressful talking conditions [13, 14, 15]. Learning systems in speaker identification that employ hybrid strategies can potentially offer significant advantages over single-strategy systems.

In this proposed system, Neuro-Genetic Hybrid algorithm with cepstral based features has been used to improve the performance of the text dependent speaker identification system under noisy environment. To extract the features from the speech, different types of feature extraction technique such as RCC, MFCC, ∆MFCC, ∆∆MFCC, LPC and LPCC have been used to achieve good result. Some of the tasks of this work have been simulated using Matlab based toolbox such as Signal processing Toolbox, Voicebox and HMM Toolbox. 2. Paradigm of the Proposed Speaker Identification System The basic building blocks of speaker identification system are shown in the Fig.1. The first step is the acquisition of speech utterances from speakers. To remove the background noises from the original speech, wiener filter has been used. Then the start and end points detection algorithm has been used to detect the start and end points from each speech utterance. After which the unnecessary parts have been removed. Pre-emphasis filtering technique has been used as a noise reduction technique to increase the amplitude of the input signal at frequencies where signal-to-noise ratio (SNR) is low. The speech signal is segmented into overlapping frames. The purpose of the overlapping analysis is that each speech sound of the input sequence would be approximately centered at some frame. After segmentation, windowing technique has been used. Features were extracted from the segmented speech. The IJCSI International Journal of Computer Science Issues, Vol. 1, 2009

43 IJCSI IJCSI extracted features were then fed to the Neuro-Genetic hybrid techniques for learning and classification.
Fig. 1 Block diagram of the proposed automated speaker identification system. 3. Speech Signal Pre-processing for Speaker Identification To capture the speech signal, sampling frequency of 11025 Hz, sampling resolution of 16-bits, mono recording channel and Recorded file format = *.wav have been considered. The speech preprocessing part has a vital role for the efficiency of learning. After acquisition of speech utterances, wiener filter has been used to remove the background noise from the original speech utterances [16, 17, 18]. Speech end po

This content is AI-processed based on ArXiv data.

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut