Phoneme-Based Persian Speech Recognition
Intelligent speech recognition is undoubtedly one of the most important issues in computer science. In these systems, computers try to detect and respond to the speech they hear, as humans do. In this research, presenting of a suitable …
Authors: Saber Malekzadeh
Vali-e-Asr University of Rafsanjan
Faculty of Math and Computer Sciences
Phoneme-Based Persian Speech Recognition
Report as Thesis of Master of Computer Science in the field of Artificial Intelligence
Supervisor: Dr. Mohammad Hosein Gholizadeh
Adviser: Dr. Seyyed Naser Razavi
Saber Malekzadeh
September

Phoneme labels (English form): A, I, ʊ, æ, e, o, P, B, T, D, tʃ, dʒ, K, G, F, V, Kh, S, Z, ʃ, ʒ, M, N, H, L, R, Q, j

Abstract
Undoubtedly, one of the most important issues in computer science is intelligent speech recognition. In these systems, computers try to detect and respond to the speech they hear, as humans do. This research attempts to present a suitable method for the recognition of Persian phonemes by AI, using signal processing and classification algorithms. For this purpose, the STFT algorithm is used to process the audio signals, and a deep artificial neural network is used to detect and classify the processed signals. First, training samples were provided as two phonological phrases in the Persian language, and signal processing operations were performed on them. The results were then given to the deep artificial neural network as training data. In the final stage, the experiment was conducted on new sounds.
Keywords: Phoneme Recognition, Speech Recognition, Deep Learning
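The abstract names the STFT as the step that turns audio signals into input for the classifier. The sketch below shows, under stated assumptions, what such a step can look like with NumPy. The helper name `stft_magnitude`, the 256-sample frame, the 128-sample hop, the Hann window, and the synthetic 16 kHz test tone are all illustrative choices, not settings taken from the thesis.

```python
import numpy as np


def stft_magnitude(signal, frame_len=256, hop=128):
    """Hypothetical helper: magnitude STFT, shape (n_frames, frame_len // 2 + 1)."""
    signal = np.asarray(signal, dtype=float)
    # Zero-pad very short inputs so that at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))
    window = np.hanning(frame_len)  # assumed window; the thesis may use another
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping frames, one frame per row.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # Real-input FFT of each windowed frame; keep magnitudes only.
    return np.abs(np.fft.rfft(frames * window, axis=1))


# Synthetic stand-in for a recording: one second of a 1 kHz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = stft_magnitude(tone)
peak_bin = int(spec.mean(axis=0).argmax())  # strongest frequency bin on average
peak_hz = peak_bin * sample_rate / 256      # bin index converted to Hz
```

Each row of `spec` is the spectrum of one short time frame, so the array as a whole could serve as the time-frequency input that a neural network classifies; how the thesis actually shapes and normalizes these features is not recoverable from the text above.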