An Enhanced Static Data Compression Scheme Of Bengali Short Message
This paper concerns a modified approach to compressing short Bengali text messages for small devices. The prime objective of this research is to establish a low-complexity compression scheme suitable for small devices with limited memory and relatively low processing speed. The aim is not to compress text of arbitrary size to its maximum level without any constraint on space and time; rather, the target is to compress short messages to an optimal level that needs minimum space, consumes less time, and places lower demands on the processor. We have implemented character masking, dictionary matching, the associative rule of data mining, and a hyphenation algorithm for syllable-based compression in hierarchical steps to achieve low-complexity lossless compression of text messages for mobile devices. The digrams (character pairs) to be substituted are chosen on the basis of an extensive statistical model, and the static Huffman coding is derived from the same context.
💡 Research Summary
This paper presents a novel, low-complexity lossless compression scheme specifically designed for short Bengali text messages (SMS) on resource-constrained mobile devices. Recognizing the limitations of small memory and lower processing power in such environments, the authors shift the focus from achieving maximum compression ratio to obtaining optimal compression under strict hardware constraints. The proposed method is a multi-stage, hierarchical pipeline that combines several lightweight techniques.
The compression process unfolds in four sequential core steps, followed by a final encoding stage. First, Character Masking identifies and encodes specific characters such as spaces to remove basic redundancy. Second, Dictionary Matching replaces frequently used Bengali words or phrases with shorter codes from a pre-defined static dictionary, leveraging the statistical properties of the language without the runtime overhead of building a dynamic dictionary. Third, Associative Rule Mining (a technique borrowed from data mining) analyzes the text to discover and compress commonly co-occurring character pairs (digrams). The set of digrams to be used is selected based on an extensive statistical model of Bengali text. Fourth, a Hyphenation Algorithm decomposes words into syllables, exploiting linguistic structure for further compression. Finally, the transformed message, now consisting of an altered alphabet of original symbols, dictionary codes, digram codes, and syllable codes, undergoes Static Huffman Coding. The Huffman codebook is also pre-generated offline using the same statistical model, ensuring high efficiency without real-time computation.
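A minimal Python sketch of these hierarchical stages may help make the pipeline concrete (character masking and syllable decomposition are omitted for brevity). The dictionary, digram table, and frequencies below are invented placeholders, not the paper's statically derived tables:

```python
from collections import Counter
import heapq
from itertools import count

# Placeholder static tables -- the paper derives its real dictionary,
# digram set, and Huffman codebook from a statistical model of a
# Bengali corpus; these tiny examples are purely illustrative.
DICTIONARY = {"আমি": "\x01"}   # frequent word -> short code
DIGRAMS = {"কা": "\x02"}       # frequent character pair -> short code

def build_static_huffman(freqs):
    """Build a prefix-free Huffman codebook from a symbol->frequency map."""
    tiebreak = count()  # keeps heap comparisons away from the code dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def compress(text, codes):
    """Run the dictionary, digram, and Huffman stages in order."""
    # Stage: dictionary matching on whole words
    text = " ".join(DICTIONARY.get(w, w) for w in text.split(" "))
    # Stage: digram substitution
    for pair, code in DIGRAMS.items():
        text = text.replace(pair, code)
    # Final stage: static Huffman encoding to a bit string
    return "".join(codes[ch] for ch in text)
```

In the paper's scheme the codebook would be computed once offline from the corpus statistics and shipped with the device, so only the table lookups run on the phone.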
The key innovation lies in the synergistic integration of these steps. Each stage employs relatively simple operations to keep overall computational complexity low, making it suitable for mobile processors. By statically defining the dictionary, digram set, and Huffman codes, the scheme minimizes runtime memory usage and processing delay, which are critical for battery life and user experience on mobile devices. The approach attacks redundancy at multiple levels: character, word, character combination, and syllable, aiming for a comprehensive compression effect. The paper positions this work as a necessary contribution to the field, addressing a gap in research for Bengali language-specific compression and providing a practical solution tailored for the pervasive mobile computing paradigm. The success of the method hinges on the quality of the underlying statistical model derived from a representative corpus of Bengali text.
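The static design implies that the receiver ships with the same pre-generated tables as the sender, so decoding is also a cheap table walk. A minimal companion sketch of Huffman decoding, with an invented placeholder codebook:

```python
def huffman_decode(bits, codes):
    """Decode a prefix-free bit string using a shared static codebook."""
    inverse = {code: symbol for symbol, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:          # prefix-freeness makes the match unique
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

# Hypothetical 3-symbol codebook for illustration
example_codes = {"a": "0", "b": "10", "c": "11"}
```

After Huffman decoding, the dictionary, digram, and syllable substitutions would be reversed using the same shared static tables, with no model reconstruction at runtime.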