Identification of Voice Utterance with Aging Factor Using the Method of MFCC Multichannel

February 23, 2026


📝 Original Info

Title: Identification of Voice Utterance with Aging Factor Using the Method of MFCC Multichannel
ArXiv ID: 1702.01999
Date: 2017-02-08
Authors: Researchers from original ArXiv paper

📝 Abstract

This research develops a method to identify voice utterances that have changed because of the aging factor, over an interval of 10 to 25 years. Changes in a voice utterance caused by aging can be extracted with MFCC (Mel Frequency Cepstrum Coefficient). However, the level of feature compatibility may drop to 55% for utterances affected by aging, while utterances that are not affected may reach 95%. To improve the compatibility of voice features altered by aging, a more specific feature extraction method is developed: the voice is separated into several channels. The proposed method, MFCC multichannel, consists of multichannel 5 filterbank (M5FB), multichannel 2 filterbank (M2FB), and multichannel 1 filterbank (M1FB). Test results show that models M5FB and M2FB achieve the highest levels of compatibility, 85% and 82% respectively, at the 25-year interval, while model M5FB achieves the highest score, 86%, at the 10-year interval.


📄 Full Content

A conversation is a form of communication in which words are arranged into sentences. A conversation can be recorded, and the documented recording can be used to establish the course of an event [1]. Words within a conversation have features, and these features differ from one individual to another [2], [3]. The features are obtained through an extraction process; the method used is MFCC (Mel-Frequency Cepstral Coefficients) [4]. The extracted features are frequency features [5]. MFCC extraction uses the Mel scale, which is approximately linear for frequencies below 1 kHz and logarithmic above 1 kHz.
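The behavior of the Mel scale can be illustrated with a short sketch. The conversion formula below (2595 · log10(1 + f/700)) is one widely used convention and is an assumption here, since the text does not state which variant the authors used; the function names are likewise illustrative.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (assumed 2595*log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz, high_hz, count):
    """Return `count` frequencies (Hz) evenly spaced on the mel axis, endpoints included."""
    if count < 2:
        raise ValueError("count must be at least 2")
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]
```

Points that are evenly spaced in mels become progressively farther apart in Hz, which is why Mel filterbanks have narrow filters at low frequencies and wider ones at high frequencies.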

The frequency characteristics produced by the extraction consist of the fundamental frequency and the formant frequencies [6]. These characteristics can be used for identification by comparing the characteristics of voice utterances [7]. The compared characteristics originate from the words in each voice utterance. Similar characteristics indicate that both utterances originate from the same individual; differing characteristics indicate the contrary [8].

Identification using MFCC extraction achieves a high level of compatibility, which can exceed 95% [9]. This high compatibility is reached for voice utterances whose characteristics have not changed [8]. Some changes are caused by the aging factor [10], [11]. The aging factor arises from the time interval between the voice utterance to be identified and the voice utterance used as the comparison [12], [13]. A time interval of up to 25 years changes the characteristics because of the difference in age. According to [6] and [13], the change due to aging can be observed between the ages of 18 and 60, as shown in Figure 1. However, aging does not change all characteristics, only some components. According to [11], the components that change are the fundamental frequency and some of the formant frequencies.

This research identifies voice utterances affected by the aging factor over time intervals of 10 and 25 years. Identification uses the suggested extraction method, which is a development of MFCC. The extraction begins by separating the voice into several frequency ranges (channels), and extraction is then carried out in each channel. The aim is to obtain more specific characteristics. Extraction [14] is a process that brings out the characteristics of words in a recorded voice utterance. The characteristics consist of the fundamental frequency (F0) and the formant frequencies. The formant frequencies are divided into formant frequency 1 (F1), formant frequency 2 (F2), formant frequency 3 (F3), and formant frequency 4 (F4) [11], [15]. According to [16], formant frequencies have dynamic characteristics, yet they can still be used for identification. In addition, the research in [15] shows that a voice that undergoes a language change can still be analyzed using formants F1 through F4. The characteristics of a voice with a particular language or accent [17] can be used to improve the performance of voice utterance recognition.
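The idea of separating the voice into frequency ranges before extraction can be sketched as follows. This is only an illustration under stated assumptions: the band edges are hypothetical (the text does not give the channel boundaries used for M5FB, M2FB, or M1FB), the split is done by masking a naive DFT rather than by whatever filters the authors used, and the function names are invented for this example. It relies only on the Python standard library, so it is suitable for short frames rather than full recordings.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Naive inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def split_into_channels(samples, sample_rate, band_edges_hz):
    """Split a signal into contiguous frequency bands (hypothetical channel layout).

    band_edges_hz is an ascending list such as [0, 1000, 4000]; each adjacent pair
    defines one channel. The upper edge of the last band is inclusive so that the
    Nyquist bin is not dropped.
    """
    n = len(samples)
    spectrum = dft(samples)
    last = len(band_edges_hz) - 2
    channels = []
    for index, (low, high) in enumerate(zip(band_edges_hz[:-1], band_edges_hz[1:])):
        masked = []
        for k in range(n):
            # Fold negative-frequency bins onto their positive mirror frequency.
            freq = min(k, n - k) * sample_rate / n
            in_band = low <= freq < high or (index == last and freq == high)
            masked.append(spectrum[k] if in_band else 0j)
        channels.append(idft_real(masked))
    return channels
```

Because the bands partition the spectrum, summing the channels sample by sample reproduces the original signal; each channel could then be passed to its own feature extraction stage.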

The characteristics of a voice utterance may change [18]. Some changes are caused by noise interference [19]. Others can be caused by a conversation that takes place under high tension or under the influence of alcohol [18], [20], or by the aging factor [13], [10]. Changes caused by aging occur because the vocal organs change over a particular time interval. According to [13], this interval lies between the ages of 18 and 60. The research on the influence of aging on verification in [13] states that the verification result is strongly influenced by the time interval between recording the sample and performing the verification. The research in [11] states that the time interval influences the fundamental frequency (F0) and the first formant (F1), but has no systematic effect on the second formant (F2) and third formant (F3).

The analysis of voice utterance characteristics in [6] divides the signal into several frequency ranges. That research, which aims to recognize age and gender, obtains accuracy of up to 92.86%.

This research aims to identify voice utterances influenced by the aging factor using the MFCC multichannel extraction method. As shown in Figure 2, the method begins with pre-emphasis, given in equation (1). Pre-emphasis makes the spectrum smoother and flatter. The output of pre-emphasis depends on the pre-emphasis coefficient; the value used for the suggested model is 0.97 [12].
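A minimal sketch of pre-emphasis follows, assuming the common first-order form y[n] = x[n] - a·x[n-1] with a = 0.97. The exact form of equation (1) is not preserved in this text, so the formula, the function name, and the handling of the first sample are assumptions rather than the authors' definition.

```python
def pre_emphasis(samples, coefficient=0.97):
    """Apply a first-order pre-emphasis filter: y[n] = x[n] - coefficient * x[n-1].

    The first sample is passed through unchanged because it has no predecessor
    (an assumed boundary convention).
    """
    if not samples:
        return []
    output = [samples[0]]
    for n in range(1, len(samples)):
        output.append(samples[n] - coefficient * samples[n - 1])
    return output
```

With a coefficient close to 1, slowly varying (low-frequency) content is strongly attenuated while rapid sample-to-sample changes pass through, which flattens the typical downward tilt of a speech spectrum.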

(1) (2)

For output from each channel

…(Full text truncated)…


This content is AI-processed based on ArXiv data.
