A New Statistic Feature of the Short-Time Amplitude Spectrum Values for Human’s Unvoiced Pronunciation

📝 Abstract

In this paper, a new statistical feature of the discrete short-time amplitude spectrum is discovered experimentally for signals of unvoiced pronunciation. For the randomly varying short-time spectrum, this feature reveals the relationship between the average of the amplitude and its standard deviation for every frequency component. The association between the amplitude distributions of different frequency components is also studied, and a new model representing this association is proposed, inspired by the normalized histogram of the amplitude. Mathematical analysis shows that the newly discovered feature is necessary evidence supporting the proposed model, and that it can also serve as direct evidence for the widely used hypothesis of “identical distribution of amplitude for all frequencies”.

📄 Content

A New Statistic Feature of the Short-Time Amplitude Spectrum Values for Human’s Unvoiced Pronunciation

XIAODONG ZHUANG¹
¹ Qingdao University, Electronics & Information College, Qingdao, 266071 CHINA

Abstract: - In this paper, a new statistical feature of the discrete short-time amplitude spectrum is discovered experimentally for signals of unvoiced pronunciation. For the randomly varying short-time spectrum, this feature reveals the relationship between the average of the amplitude and its standard deviation for every frequency component. The association between the amplitude distributions of different frequency components is also studied, and a new model representing this association is proposed, inspired by the normalized histogram of the amplitude. Mathematical analysis shows that the newly discovered feature is necessary evidence supporting the proposed model, and that it can also serve as direct evidence for the widely used hypothesis of “identical distribution of amplitude for all frequencies”.

Key-Words: - unvoiced pronunciation, short-time spectrum, amplitude distribution, statistic analysis

1 Introduction
Speech signals can be modelled mathematically as stochastic processes. Speech features are random and time-varying both in the time domain and in transformed domains such as the short-time spectrum [1,2]. The statistical features of speech signals are an important research topic. In the frequency domain, the short-time amplitude spectrum values can be treated as random variables, and previous work has estimated their probability distribution, which facilitates applications such as speech enhancement [3,4]. That work relies on large amounts of speech data from corpora such as TIMIT or from other databases of everyday speech collected from the internet [2,5].
However, these studies are based on words or sentences spoken in daily communication, which mix various pronunciation types such as vowels, consonants, and plosives. The statistical features estimated from such corpora are in fact the overall features of a signal that mixes different pronunciation types. It is therefore necessary to study the statistical features of each specific pronunciation type (or specific phoneme) on its own, because different types have different pronunciation mechanisms.
Unvoiced pronunciation is one of the major pronunciation types and is closely related to the aerodynamic process in the vocal tract [6-8]. The physical process of unvoiced pronunciation is complicated, but a statistical study of its signal may reveal some of its underlying properties. In this paper, the statistical study is carried out in the frequency domain for unvoiced pronunciation. A novel statistical feature of the short-time amplitude spectrum data, named the “consistent standard deviation coefficient”, is discovered through a statistical study of stable, sustained signals of unvoiced pronunciation. Moreover, the relationship between the amplitude probability distributions of two different frequency components is investigated, and a new model representing this relationship is proposed. The validity of the new model is supported by mathematical analysis, with the discovered statistical feature as direct evidence; potential applications include speech synthesis.
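To illustrate why a fixed relationship between the average and the standard deviation would bear on the “identical distribution” hypothesis, consider the following sketch. It rests on an assumption made here for illustration only, not on the model as stated in the paper: the amplitude at frequency index k is taken to be a scaled copy of one common positive random variable X, that is A_k = c_k X with a frequency-dependent scale c_k > 0. The symbols A_k, c_k, X, \mu_k and \sigma_k are hypothetical names introduced for this sketch.

```latex
\mu_k = \mathbb{E}[A_k] = c_k\,\mathbb{E}[X],
\qquad
\sigma_k = \sqrt{\operatorname{Var}(A_k)} = c_k\,\sqrt{\operatorname{Var}(X)}
\quad\Longrightarrow\quad
\frac{\sigma_k}{\mu_k} = \frac{\sqrt{\operatorname{Var}(X)}}{\mathbb{E}[X]}
\quad \text{for every } k .
```

Under that assumption the ratio of standard deviation to average does not depend on the frequency index. A frequency-independent ratio is therefore a necessary consequence of identically shaped distributions, though not a sufficient one, since different distribution shapes can share the same ratio.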

2 New Statistic Feature in Frequency Domain for Unvoiced Pronunciation
To obtain sufficient data for the statistical study, the signals used are stable, sustained pronunciations. For each unvoiced phoneme studied, a signal is recorded, and each signal is studied on its own. For each signal, the short-time Fourier transform (STFT) is used to gather sufficient spectrum data. Since the STFT is used in its discrete form, the spectrum has a finite number of discrete components, and the statistical study is ultimately performed for each frequency component individually.
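The per-frequency procedure described above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the frame length, hop size, and Hann window are assumed values, and Gaussian white noise stands in for a recorded unvoiced signal, which this excerpt does not provide. The sketch also assumes that the feature of interest is the ratio of standard deviation to average for each frequency component. The helper names `stft_amplitude` and `std_to_mean_ratio` are hypothetical.

```python
import numpy as np


def stft_amplitude(signal, frame_len=512, hop=256):
    """Return |STFT| with shape (n_frames, frame_len // 2 + 1).

    frame_len, hop, and the Hann window are assumptions for this sketch.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # One real FFT per windowed frame; keep only the magnitude.
    return np.abs(np.fft.rfft(frames * window, axis=1))


def std_to_mean_ratio(amplitude):
    """Per-frequency mean, standard deviation, and their ratio across frames."""
    mean = amplitude.mean(axis=0)
    std = amplitude.std(axis=0)
    return mean, std, std / mean


# Stand-in signal (assumption): 10 s of Gaussian white noise at 16 kHz.
rng = np.random.default_rng(0)
fs = 16000
signal = rng.standard_normal(10 * fs)

amp = stft_amplitude(signal)
mean, std, ratio = std_to_mean_ratio(amp)

# Skip the bins at and next to DC and Nyquist, where the spectrum of a real
# signal is not a circular complex variable.
interior = ratio[2:-2]

# For Gaussian noise the interior amplitudes are Rayleigh distributed, whose
# std/mean ratio is sqrt(4/pi - 1) regardless of scale.
rayleigh_ratio = np.sqrt(4.0 / np.pi - 1.0)

print("mean std/mean ratio over interior bins:", interior.mean())
print("Rayleigh reference value:", rayleigh_ratio)
```

For this synthetic input the ratio stays close to one value across frequency bins, which is the kind of consistency the text describes. Whether recorded unvoiced phonemes behave the same way is the empirical question the paper addresses; replacing `signal` with a recorded waveform would be the natural way to check.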
WSEAS TRANSACTIONS on SIGNAL PROCESSING Xiaodong Zhuang E-ISSN: 2224-3488 265 Volume 12, 2016

Since there are currently few corpora of sustained phoneme pronunciation, the signals were captured using microphones connected to computer sound cards. The signals were recorded at a sampling frequency of 16 kHz with 16 bits per sample. To ensure the generality of the experimental results, signals were captured for a group of unvoiced pronunciations spoken by different speakers and on different recording platforms (different microphones and sound cards on different computers). During signal collection, the speakers were informed of the requirement for stable pronunciation over a sufficient length of time, which reliable statistical study requires. For each unvoiced phoneme, the stability of pronunciation largely determines the effectiveness of further analysis, therefore the signals were captur
