A New Statistic Feature of the Short-Time Amplitude Spectrum Values for Human’s Unvoiced Pronunciation
📝 Abstract
In this paper, a new statistical feature of the discrete short-time amplitude spectrum is discovered experimentally for signals of unvoiced pronunciation. For the randomly varying short-time spectrum, this feature reveals the relationship between the average amplitude and its standard deviation for every frequency component. In addition, the association between the amplitude distributions of different frequency components is studied, and a new model representing this association is proposed, inspired by the normalized histogram of amplitude. Mathematical analysis shows that the discovered feature is necessary evidence supporting the proposed model, and that it can also serve as direct evidence for the widely used hypothesis of “identical distribution of amplitude for all frequencies”.
📄 Content
A New Statistic Feature of the Short-Time Amplitude Spectrum Values for Human’s Unvoiced Pronunciation
XIAODONG ZHUANG
Qingdao University, Electronics & Information College, Qingdao, 266071 CHINA
Key-Words: - unvoiced pronunciation, short-time spectrum, amplitude distribution, statistic analysis
1 Introduction
Speech signals can be modelled mathematically as stochastic processes. Speech features are random and time-varying in both the time domain and transformed domains such as the short-time spectrum [1,2]. The statistical features of speech signals are an important research topic. In the frequency domain, the short-time amplitude spectrum values can be treated as random variables, and previous studies have estimated their probability distributions, which facilitates applications such as speech enhancement [3,4]. Such studies are based on large amounts of speech data from corpora such as TIMIT or from other databases of everyday speech collected from the internet [2,5].
However, these studies are based on words or sentences spoken in daily communication, which mix various pronunciation types including vowels, consonants, plosives, etc. The statistical feature estimated from such corpora is therefore the overall feature of a signal that mixes different pronunciation types. Because different types have different pronunciation mechanisms, it is necessary to study the statistical feature of each specific pronunciation type (or specific phoneme) on its own.
Unvoiced pronunciation is one of the major pronunciation types and is closely related to the aerodynamic process in the vocal tract [6-8]. The physical process of unvoiced pronunciation is complicated, but a statistical study of its signal may reveal some of its underlying properties. In this paper, a statistical study of unvoiced pronunciation is carried out in the frequency domain. A novel statistical feature named “consistent standard deviation coefficient” is discovered for short-time amplitude spectrum data through the statistical study of stable and sustained signals of unvoiced pronunciation. Moreover, the relationship between the amplitude probability distributions of two different frequency components is investigated, and a new model representing this relationship is proposed. The validity of the new model, which has potential applications such as speech synthesis, is supported by mathematical analysis with the discovered statistical feature as direct evidence.
2 New Statistic Feature in Frequency Domain for Unvoiced Pronunciation
To obtain sufficient data for the statistical study, the signals used are stable and sustained pronunciations. For each unvoiced phoneme studied, a signal is recorded, and each signal is studied separately. For each signal, the short-time Fourier transform (STFT) is used to gather sufficient spectrum data. Since the STFT is used in discrete form, the spectrum has a finite number of discrete components, and the statistical study is ultimately performed for each frequency component individually.
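The per-component procedure described above can be sketched in Python as follows. This is only an illustrative sketch, not the author's implementation: the frame length, hop size, Hann window, and the function names `short_time_amplitude` and `per_frequency_stats` are assumptions, and Gaussian white noise stands in for a recorded unvoiced signal. The ratio of standard deviation to mean is computed here as one plausible reading of the relationship between the two quantities; the paper's exact definition of its coefficient may differ.

```python
import numpy as np

def short_time_amplitude(signal, frame_len=512, hop=256):
    """Discrete short-time amplitude spectrum, shape (n_frames, frame_len // 2 + 1).

    frame_len, hop and the Hann window are illustrative assumptions.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # One FFT per frame; keep only the magnitude of each frequency component.
    return np.abs(np.fft.rfft(frames * window, axis=1))

def per_frequency_stats(amplitude):
    """Mean, sample standard deviation and their ratio for each frequency component."""
    mean = amplitude.mean(axis=0)
    std = amplitude.std(axis=0, ddof=1)
    return mean, std, std / mean

# Stand-in data (assumption): 5 s of Gaussian white noise at 16 kHz
# in place of a recorded sustained unvoiced phoneme.
rng = np.random.default_rng(0)
fs = 16000
noise = rng.standard_normal(5 * fs)

amp = short_time_amplitude(noise)
mean, std, ratio = per_frequency_stats(amp)
print(amp.shape)
print(ratio[2:-2].mean())
```

Each column of `amp` holds the amplitude samples of one frequency component across frames, so the statistics are taken along the frame axis. For the white-noise stand-in, the amplitudes away from the lowest and highest components are approximately Rayleigh distributed, whose standard-deviation-to-mean ratio is about 0.52; real unvoiced recordings need not follow this.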
WSEAS TRANSACTIONS on SIGNAL PROCESSING
Xiaodong Zhuang
E-ISSN: 2224-3488
265
Volume 12, 2016
Since few corpora of sustained phoneme pronunciation are currently available, the signals were captured using microphones connected to computer sound cards. The signals were recorded at a sampling frequency of 16 kHz with 16 bits per sample. To ensure the generality of the experimental results, signals were captured for a group of unvoiced pronunciations spoken by different speakers on different recording platforms (different microphones and sound cards on different computers). During signal collection, the speakers were instructed to keep the pronunciation stable for a sufficient length of time, as required for a reliable statistical study. For each unvoiced phoneme, the stability of pronunciation largely determines the effectiveness of further analysis; therefore the signals were captur