📝 Original Info
- Title: Structural Analysis of Hindi Phonetics and A Method for Extraction of Phonetically Rich Sentences from a Very Large Hindi Text Corpus
- ArXiv ID: 1701.08655
- Date: 2017-02-08
- Authors: Shrikant Malviya, Rohit Mishra, Uma Shanker Tiwary
📝 Abstract
Automatic speech recognition (ASR) and text-to-speech (TTS) are two prominent areas of research in human-computer interaction (HCI). A set of phonetically rich sentences is important for developing these two interactive modules of HCI. Essentially, such a set has to cover all possible phone units, distributed uniformly. Selecting such a set from a large corpus while maintaining similarity in phonetic characteristics is still a challenging problem. The major objective of this paper is to devise a criterion for selecting a set of sentences that encompasses all phonetic aspects of a corpus while keeping the set as small as possible. First, this paper presents a statistical analysis of Hindi phonetics by observing its structural characteristics. Further, a two-stage algorithm is proposed to extract phonetically rich sentences with a high variety of triphones from the EMILLE Hindi corpus. The algorithm includes a distance-measuring criterion for selecting a sentence so as to improve the triphone distribution. Moreover, a special preprocessing method is proposed that scores each triphone in terms of inverse probability in order to speed up the algorithm. The results show that the approach efficiently builds a uniformly distributed, phonetically rich corpus with an optimum number of sentences.
📄 Full Content
2016 Conference of The Oriental Chapter of International Committee
for Coordination and Standardization of Speech Databases and Assessment Technique (O-COCOSDA)
26-28 October 2016, Bali, Indonesia
Structural Analysis of Hindi Phonetics and A
Method for Extraction of Phonetically Rich
Sentences from a Very Large Hindi Text Corpus
Shrikant Malviya∗, Rohit Mishra† and Uma Shanker Tiwary‡
Department of Information Technology
Indian Institute of Information Technology, Allahabad, India 211012
∗Email: shrikant.iet6153@gmail.com
†Email: rohit129iiita@gmail.com
‡Email: ustiwary@gmail.com
Abstract—Automatic speech recognition (ASR) and Text to
speech (TTS) are two prominent areas of research in human-computer
interaction (HCI). A set of phonetically rich sentences is
important for developing these two interactive modules of HCI.
Essentially, such a set has to cover all possible phone units,
distributed uniformly. Selecting such a set from a large corpus
while maintaining similarity in phonetic characteristics is still
a challenging problem. The major objective of this paper is to
devise a criterion for selecting a set of sentences that encompasses
all phonetic aspects of a corpus while keeping the set as small as
possible. First, this paper presents a statistical analysis of Hindi
phonetics by observing its structural characteristics. Further, a
two-stage algorithm is proposed to extract phonetically rich
sentences with a high variety of triphones from the EMILLE Hindi
corpus. The algorithm includes a distance-measuring criterion for
selecting a sentence so as to improve the triphone distribution.
Moreover, a special preprocessing method is proposed that scores
each triphone in terms of inverse probability in order to speed up
the algorithm. The results show that the approach efficiently builds
a uniformly distributed, phonetically rich corpus with an optimum
number of sentences.
Keywords—Phonetically-Rich Sentences; Statistical Analysis;
Phonemes; Triphone; Hindi Speech Recognition; Grapheme-Phoneme;
Hindi Phonology; Phone Like Units (PLU)
I. INTRODUCTION
When should a set of sentences be called phonetically rich?
The answer depends on two statistical properties of the
phonetic distribution. The first is the characteristic
distribution of phonemes, which determines the phonetic
richness of a sentence. The second is the phonetic resemblance
between the extracted sentences and the language under study.
Evidently, this field of study is closely related to automatic
speech recognition (ASR) and speech synthesis (TTS) [1].
The task of extracting phonetically rich and balanced sentences
involves analyzing a large corpus and extracting sentences
according to selection criteria based on various stochastic
methodologies [2]. Additionally, a corpus has to be phonetically
balanced, that is, made up of sentences whose phonetic units
follow their distribution in the natural spoken language. A set
of sentences that contains all the phonemes in most of their
possible contexts would, when recorded, produce a phonetically
rich and balanced speech corpus [3], [4].
Compared to earlier studies based on words, syllables and
monophones, most current research on the development of
Automatic Speech Recognition (ASR) and Text to Speech (TTS)
systems focuses on how contextual phone units, e.g. triphones
and diphones, can be used to improve the robustness of these
systems [5], [6].
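To make the notion of contextual phone units concrete, the following minimal Python sketch enumerates the diphones and triphones of a phone sequence, treating each unit as a window of two or three consecutive phones. The function name `context_units` and the toy phone sequence are assumptions made for illustration only; they are not taken from the paper, and a real pipeline would first need a Hindi grapheme-to-phoneme step that is not shown here.

```python
from typing import List, Tuple


def context_units(phones: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive phones (n=2: diphones, n=3: triphones)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A sequence shorter than n yields an empty list.
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


# Toy phone sequence (hypothetical, not taken from the paper).
phones = ["n", "a", "m", "a", "s", "t", "e"]
diphones = context_units(phones, 2)
triphones = context_units(phones, 3)
```

A sequence of m phones yields m - 1 diphones and m - 2 triphones, so longer sentences contribute more contextual units to the corpus statistics.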
The construction of a phonetically rich sentence corpus is
relevant to various applications, e.g. ASR and speech
synthesis. For instance, a phonetically rich speech dataset is
a basic requirement for estimating a robust acoustic model [7].
Phonologists have found such corpora useful for developing
systems to analyze speech production and variability [8]. In
speech therapy, phonetically rich sentences are often used to
detect communication disorders by assessing the speech
production of a patient under observation in various
phonetic/phonological contexts [9].
A novel approach based on the triphone distribution is
formulated and evaluated in this paper. The problem can be
stated formally as follows: given a corpus C consisting of s
sentences, find a subset K of s_k sentences whose triphones
are uniformly distributed. At first glance the problem looks
simple, but because of its inherently high computational time
complexity it has to be approached carefully. The problem is
of non-polynomial type, since it generates a set of instances
based on the combinations of triphones, and should be
considered an intractable problem [10].
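The following minimal Python sketch illustrates both halves of this problem statement: how quickly the number of candidate subsets grows, and one possible way to quantify how far a triphone distribution is from uniform. The function names and the choice of Euclidean distance are assumptions made for illustration; the distance measure actually used in the paper is not given in this section.

```python
import math
from collections import Counter
from typing import Dict, Hashable


def subset_count(s: int, k: int) -> int:
    """Number of ways to choose k sentences out of a corpus of s sentences."""
    return math.comb(s, k)


def distance_from_uniform(counts: Dict[Hashable, int]) -> float:
    """Euclidean distance between relative frequencies and the uniform
    distribution over the observed units (an assumed measure, not the paper's)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one observation")
    target = 1.0 / len(counts)
    return math.sqrt(sum((c / total - target) ** 2 for c in counts.values()))


# Hypothetical triphone counts for two candidate subsets.
balanced = Counter({"abc": 2, "bcd": 2, "cde": 2})
skewed = Counter({"abc": 4, "bcd": 1, "cde": 1})
```

Even choosing 10 sentences from a corpus of only 100 already gives more than 10^13 candidate subsets, which is why exhaustive search is not a realistic option for a very large corpus.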
A two-phase algorithm is proposed to extract a set of
phonetically rich, triphone-based sentences from the EMILLE
corpora in order to build a phonetically rich corpus for Hindi.
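Since the full description of the algorithm is not available in this section, the following Python sketch is only a generic illustration of a two-stage scheme of this kind and should not be read as the authors' method. It assumes that the first stage scores each triphone by its inverse probability over the corpus (as the abstract mentions) and that the second stage greedily picks the sentence whose not-yet-covered triphones have the highest total score. All names (`triphones_of`, `inverse_probability_scores`, `greedy_select`) and the toy corpus are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Triphone = Tuple[str, str, str]


def triphones_of(phones: Sequence[str]) -> List[Triphone]:
    """All windows of three consecutive phones in one sentence."""
    return [(phones[i], phones[i + 1], phones[i + 2]) for i in range(len(phones) - 2)]


def inverse_probability_scores(corpus: List[Sequence[str]]) -> Dict[Triphone, float]:
    """Stage 1 (assumed): score each triphone by 1 / p(triphone) over the corpus."""
    counts = Counter(t for sent in corpus for t in triphones_of(sent))
    total = sum(counts.values())
    return {t: total / c for t, c in counts.items()}


def greedy_select(corpus: List[Sequence[str]], k: int) -> List[int]:
    """Stage 2 (assumed): pick up to k sentence indices, each time taking the
    sentence whose uncovered triphones have the highest total score."""
    scores = inverse_probability_scores(corpus)
    covered: Set[Triphone] = set()
    chosen: List[int] = []
    remaining = set(range(len(corpus)))

    def gain(i: int) -> float:
        return sum(scores[t] for t in set(triphones_of(corpus[i])) - covered)

    while remaining and len(chosen) < k:
        # Sorting makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # no remaining sentence adds a new triphone
        chosen.append(best)
        covered |= set(triphones_of(corpus[best]))
        remaining.remove(best)
    return chosen


# Toy corpus of phone sequences (hypothetical).
corpus = [["a", "b", "c", "d"], ["a", "b", "c"], ["x", "y", "z"]]
selected = greedy_select(corpus, 3)
```

Precomputing the scores once keeps the per-sentence gain cheap to evaluate, and rare triphones receive large scores, so sentences that contain them are preferred early. A greedy pass like this trades optimality for tractability.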
…(Full text truncated)…