Language Models for Handwritten Short Message Services
📝 Abstract
Handwriting is an alternative method for entering the text of Short Message Service (SMS) messages. However, the texts produced are written in what amounts to a whole new language. They include, for instance, abbreviations and consonantal writing, which sprang up to save time and follow fashion. We have collected and processed a significant number of such handwritten SMS messages and used various strategies to tackle this challenging area of handwriting recognition. We study more specifically three phenomena: consonant skeleton, rebus, and phonetic writing. For each of them, we compare the raw results produced by a standard recognition system with those obtained when using a specific language model.
📄 Content
Emmanuel Prochasson (LINA – FRE CNRS 2729, Université de Nantes), Emmanuel.Prochasson@univ-nantes.fr
Christian Viard-Gaudin (IRCCyN UMR CNRS 6597, Université de Nantes), Christian.Viard-Gaudin@univ-nantes.fr
Emmanuel Morin (LINA – FRE CNRS 2729, Université de Nantes), Emmanuel.Morin@univ-nantes.fr
1. Introduction
SMS (Short Message Service) has achieved huge success in the wireless world. It is a technology that enables the sending and receiving of messages between mobile phones. As the name suggests, the amount of data that an SMS message can hold is very limited. One SMS message can contain at most 140 bytes (1,120 bits) of data, so one SMS message can contain up to:
- 160 characters if 7-bit character encoding is used (1,120 / 7 = 160). (7-bit character encoding is suitable for encoding Latin characters such as the English alphabet.)
- 70 characters if 16-bit Unicode UCS2 character encoding is used (1,120 / 16 = 70). (SMS text messages containing non-Latin characters such as Chinese characters should use 16-bit character encoding.)
Person-to-person text messaging is the most commonly used SMS application, and it is what the SMS technology was originally designed for. In this kind of text messaging application, a mobile user types an SMS text message using the keypad of his/her mobile phone, then enters the mobile phone number of the recipient, and finally sends the text message. However, the small phone keypad and the limited message length have caused a number of spelling adaptations, as in the phrase “txt msg”, or the use of CamelCase, as in “ThisIsVeryCool”. Users aim to use the fewest characters needed to transmit a comprehensible message. Hence, punctuation and grammar are largely ignored [5].
To circumvent the bottleneck of keyboard entry, two quite different strategies are encountered. One is to assist the user with optimized predictive text entry solutions ([1], [2]), which rely on some form of disambiguation to determine which letter, among the three or four letters sharing the same key, is intended by the writer (see [1] for complete references). The other is to replace the keyboard with handwriting input, using either a stylus and a screen, or a digital pen and paper solution connected to the GSM phone. Our goal here is to improve the recognition of handwritten messages so that they can be sent like normal text SMS.
Language models have been shown to significantly increase the recognition rate of handwriting systems [3]. They reduce the recognition error rate by taking the context into account in order to disambiguate poorly written texts. Two approaches can be implemented. One is based on structural models specifically designed by linguistic experts, while the other relies on statistics computed on large written text corpora. One example of the latter is the well-known n-gram model, which works either at the character level or at the word level. In this paper, we identify three different phenomena that alter SMS texts, and we propose for each of them a specific adaptation of the handwriting recognition engine.
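To make the statistical approach mentioned above more concrete, the following is a minimal sketch of a character-level bigram model with add-one smoothing, used to rank two candidate strings that a recognizer might confuse. It is an illustration only and not the system described in this paper: the toy corpus, the alphabet, the candidate list, and the names `train_char_bigrams` and `log_prob` are all assumptions made for the example.

```python
import math
from collections import Counter

START, END = "^", "$"  # word-boundary markers (assumed absent from the alphabet)


def train_char_bigrams(words, alphabet):
    """Count character bigrams over a word list (hypothetical helper)."""
    context_counts = Counter()  # how often each symbol appears as left context
    bigram_counts = Counter()   # how often each (left, right) pair appears
    for word in words:
        padded = START + word + END
        for prev, nxt in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, nxt)] += 1
    # Possible next symbols: every alphabet character plus the END marker.
    n_outcomes = len(alphabet) + 1
    return context_counts, bigram_counts, n_outcomes


def log_prob(word, model):
    """Log-probability of a word under the add-one smoothed bigram model."""
    context_counts, bigram_counts, n_outcomes = model
    padded = START + word + END
    return sum(
        math.log((bigram_counts[(prev, nxt)] + 1)
                 / (context_counts[prev] + n_outcomes))
        for prev, nxt in zip(padded, padded[1:])
    )


# Toy SMS-like training data (an assumption, not the corpus collected by the authors).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
toy_corpus = ["l8er", "sk8", "gr8", "m8", "b4", "later"]
model = train_char_bigrams(toy_corpus, ALPHABET)

# Two candidates a recognizer might produce when "8" and "b" look alike.
candidates = ["lber", "l8er"]
best = max(candidates, key=lambda w: log_prob(w, model))
print(best)
```

On this toy corpus the model prefers “l8er” over “lber”, because the bigrams “l8” and “8e” were observed in training while “lb” and “be” were not. A real system would combine such a score with the recognizer's own scores instead of using it alone.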
2. SMS language
For both technical reasons (limited text length and multiple key taps) and sociological reasons (short messages are particularly popular amongst teens), several phenomena affect SMS texts when compared to standard written productions. A few are listed below.
2.1. Rebus style
Rebus style writing is characterized by the use of a single letter or digit to replace a whole syllable or word. Examples are:
- be → b ; you → u ; are → r (a single letter replaces a whole word)
- ate → 8 ; for → 4 ; to, too → 2 (a single digit replaces a whole word)
- skate → sk8 ; later → l8er ; before → b4 (a letter or digit replaces a whole syllable within a word)
Using only rebus style, one can easily construct a whole phrase, for example “c u l8er!” (see you later!).
2.2. Consonant Skeleton style
Consonant Skeleton style is characterized by the withdrawal of most of the vowels of a w