Automated Word Prediction in Bangla Language Using Stochastic Language Models




📄 Content

International Journal in Foundations of Computer Science & Technology (IJFCST) Vol.5, No.6, November 2015

DOI: 10.5121/ijfcst.2015.5607

AUTOMATED WORD PREDICTION IN BANGLA LANGUAGE USING STOCHASTIC LANGUAGE MODELS

Md. Masudul Haque1, Md. Tarek Habib2 and Md. Mokhlesur Rahman3

1Dept. of Electrical and Computer Engineering, North South University, Bangladesh
2Dept. of Computer Science and Engineering, Daffodil International University, Bangladesh
3Dept. of Computer Science and Engineering, Prime University, Bangladesh

ABSTRACT

Word completion and word prediction are two important phenomena in typing that benefit users who type with a keyboard or similar device. They can have a profound impact on typing for people with disabilities. Our work addresses word prediction in Bangla sentences using stochastic, i.e. N-gram, language models: unigram, bigram, trigram, deleted interpolation and backoff models. Auto-completing a sentence by predicting the correct next word saves time and keystrokes and also reduces misspelling. We use a large corpus of Bangla text covering different word types to predict the correct word as accurately as possible. We have found promising results. We hope that our work will serve as a baseline for automated Bangla typing.

KEYWORDS

Word prediction, stochastic model, natural language processing, corpus, N-gram, deleted interpolation, backoff method.

  1. INTRODUCTION

Auto-complete, or word completion, works as follows: the user types the first letter or letters of a word, and the program offers one or more of the most probable words. If the intended word is in the list, the user can select it, for example by using the number keys. If the intended word is not predicted, the user must type the next letter of the word. The list of choices is then updated so that the words offered begin with the letters typed so far, and this repeats until the intended word appears and is selected. Word prediction, by contrast, predicts a word by analyzing the preceding words, so that a sentence can be completed more accurately, with fewer keystrokes and fewer misspellings. The N-gram language model is an important technique for word prediction. We train N-gram language models on a large corpus to predict the correct Bangla word for completing a Bangla sentence more accurately.
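The word completion loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_vocab` and `complete`, the frequency-based ranking, and the toy corpus are assumptions made for illustration, and the tokens are romanized placeholders standing in for Bangla words (Bangla Unicode strings would behave the same way).

```python
from collections import Counter

def build_vocab(tokens):
    """Count how often each word occurs in a tokenized corpus."""
    return Counter(tokens)

def complete(prefix, vocab, k=3):
    """Return up to k words that start with prefix, most frequent first."""
    candidates = [(w, c) for w, c in vocab.items() if w.startswith(prefix)]
    # Sort by descending count, then alphabetically so ties are deterministic.
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in candidates[:k]]

# Hypothetical toy corpus (romanized placeholders, not real training data).
tokens = "ami bhat khai ami bhalo achi amra bhat khai".split()
vocab = build_vocab(tokens)

# Each extra typed letter narrows the suggestion list.
print(complete("a", vocab))
print(complete("am", vocab))
```

Ranking by raw word frequency is the simplest possible choice here; it ignores the preceding words, which is exactly the gap that word prediction with N-gram models is meant to fill.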

Word prediction means guessing the next word in a sentence. Word prediction helps disabled people to type, speeds up typing by reducing keystrokes, helps with spelling and error detection, and also helps in speech recognition and handwriting recognition. Auto-completion reduces misspelling. Word completion and word prediction also help students to spell words correctly and to type with fewer errors [1].


Figure 1. Word completion vs. word prediction. Suggested words are highlighted in yellow.

We surveyed many techniques for predicting the upcoming words of a sentence in different languages, especially English, but found no satisfactory analysis of word prediction for Bangla. We therefore apply N-gram language models together with backoff and deleted interpolation techniques to predict Bangla words in a sentence. Word prediction is an important and complex task in natural language processing (NLP): the goal is to predict the correct word so that the sentence is completed in a meaningful way.
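For orientation, the models named above are usually written as follows in the standard textbook formulation. The notation is an assumption of this sketch and is not taken from the paper: $C(\cdot)$ denotes a count in the training corpus, and the weights $\lambda_i$ are estimated on held-out data in deleted interpolation.

```latex
% Maximum-likelihood bigram and trigram estimates
P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\, w_n)}{C(w_{n-1})}
\qquad
P(w_n \mid w_{n-2}\, w_{n-1}) = \frac{C(w_{n-2}\, w_{n-1}\, w_n)}{C(w_{n-2}\, w_{n-1})}

% Deleted interpolation: a weighted mix of the three estimates
\hat{P}(w_n \mid w_{n-2}\, w_{n-1})
  = \lambda_1 P(w_n)
  + \lambda_2 P(w_n \mid w_{n-1})
  + \lambda_3 P(w_n \mid w_{n-2}\, w_{n-1}),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
```

A backoff model, in contrast, uses only the highest-order estimate that has nonzero evidence: the trigram if it was observed, otherwise the bigram, otherwise the unigram.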

We use statistical prediction techniques based on N-grams, namely unigram, bigram, trigram, backoff and deleted interpolation models. We also use a large data set of Bangla text collected from different newspapers.
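A minimal Python sketch may help make these techniques concrete. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the backoff is a simplified variant without discounting (not Katz backoff), the interpolation weights are fixed by hand instead of being estimated from held-out data, and the toy corpus uses romanized placeholders in place of Bangla newspaper text.

```python
from collections import Counter

def train(tokens):
    """Collect unigram, bigram, and trigram counts from a token list."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri

def predict_backoff(w1, w2, uni, bi, tri):
    """Predict the word after (w1, w2): trigram first, then bigram, then unigram."""
    followers = Counter({c: n for (a, b, c), n in tri.items() if (a, b) == (w1, w2)})
    if not followers:  # trigram context unseen: back off to the bigram
        followers = Counter({c: n for (b, c), n in bi.items() if b == w2})
    if not followers:  # bigram context unseen: back off to the unigram
        followers = uni
    # Highest count wins; ties are broken alphabetically.
    return min(followers, key=lambda w: (-followers[w], w))

def interpolated_prob(w1, w2, w3, uni, bi, tri, lambdas=(0.1, 0.3, 0.6)):
    """Weighted mix of unigram, bigram, and trigram estimates of P(w3 | w1 w2).

    The lambdas are hypothetical; deleted interpolation would fit them
    on held-out data.
    """
    l1, l2, l3 = lambdas
    p_uni = uni[w3] / sum(uni.values())
    p_bi = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    p_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

# Hypothetical toy corpus treated as one token stream (no sentence boundaries).
tokens = "ami bhat khai ami bhat ranna kori tumi bhat khao".split()
uni, bi, tri = train(tokens)

print(predict_backoff("tumi", "bhat", uni, bi, tri))  # seen trigram context
print(predict_backoff("kori", "ami", uni, bi, tri))   # falls back to bigram
print(interpolated_prob("ami", "bhat", "khai", uni, bi, tri))
```

The design difference is visible in the two functions: backoff consults a lower-order model only when the higher-order context was never observed, while interpolation always combines all three orders, so an unseen trigram still receives probability mass from the bigram and unigram terms.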

The paper is organized as follows: related work in Section 2, introduction of N-gram based word prediction in Section 3, methodology in Section 4, implementation in Section 5, result analysis in Section 6 and conclusion in Section 7.

  2. RELATED WORK

In an analysis of sentence prediction [2], the researchers developed a sentence completion method based on N-gram language models and derived a k-best Viterbi beam search decoder for completing a sentence. We also observed the use of artificial intelligence [3] for word prediction, where syntactic and semantic analysis is done using a bottom-up chart technique. Another researcher suggests an approach [4] to word prediction via a Clustered Optimal Binary Search Tree. They suggest using a cluster of computers to build an optimal binary search tree that also contains extra links, so that the bigrams and trigrams of the language are also represented, to achieve optimal
