A Framework for Devanagari Script-based Captcha
Human Interactive Proofs (HIPs) are automatic reverse Turing tests designed to distinguish between various groups of users. Completely Automatic Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a HIP system that distinguish between humans and malicious computer programs. Many CAPTCHAs have been proposed in the literature that text-graphical based, audio-based, puzzle-based and mathematical questions-based. The design and implementation of CAPTCHAs fall in the realm of Artificial Intelligence. We aim to utilize CAPTCHAs as a tool to improve the security of Internet based applications. In this paper we present a framework for a text-based CAPTCHA based on Devanagari script which can exploit the difference in the reading proficiency between humans and computer programs. Our selection of Devanagari script-based CAPTCHA is based on the fact that it is used by a large number of Indian languages including Hindi which is the third most spoken language. There is potential for an exponential rise in the applications that are likely to be developed in that script thereby making it easy to secure Indian language based applications.
💡 Research Summary
The paper proposes a novel CAPTCHA framework that leverages the Devanagari script, which is used by more than 400 million speakers across India and Nepal. Recognizing that current text‑based CAPTCHAs predominantly rely on Latin alphabets and that OCR technologies for Devanagari remain relatively weak due to the script’s complex structure (multiple vowel signs, conjunct consonants, and the ubiquitous horizontal headline “shirorekha”), the authors argue that Devanagari offers a natural advantage for distinguishing humans from automated bots.
The framework, named “DevaCAPTCHA,” consists of five interconnected modules:
-
DevaDB – a large, indexed repository of digitized Devanagari material (books, newspapers, manuscripts). The database stores both image files and OCR‑derived text, enabling rapid random selection of words or phrases.
-
Query Generator – creates a request to DevaDB based on configurable parameters such as minimum/maximum string length, word versus phrase selection, and a per‑user random seed. This module determines whether the challenge will be an existing lexical item or a synthetic character sequence.
-
Obfuscator – the core security component. It defines a transformation function with 50‑100 possible distortion parameters (font variation, size, character spacing, removal of the shirorekha, rotation, skew, background noise, curve overlay, conjunct density, etc.). For each challenge, a random subset of m < N parameters is selected, and their intensity is tuned according to a robustness‑usability trade‑off. This dynamic, per‑user randomization makes it extremely difficult for OCR‑based bots to learn a stable preprocessing pipeline.
-
DevaGUI – the client‑side interface that displays the distorted image, provides a text input field, a submit button, a timer for expiration, and a refresh option to request a new challenge. The GUI is designed to work on both desktop and mobile platforms.
-
Match Response – validates the user’s input against the original string retrieved from DevaDB, granting or denying access accordingly.
The authors discuss the balance between robustness (resistance to automated attacks) and usability (human solve‑time). They note that excessive distortion can hinder users with visual impairments or low‑resolution devices, so parameter selection must be calibrated for the target audience.
In the literature review, the paper surveys existing CAPTCHA families (Gimpy, Baffletext, audio, image‑based, multilingual) and highlights prior attempts to use non‑Latin scripts (Arabic, Persian) where diacritic complexity hampers OCR. The Devanagari case is presented as a more challenging variant because of its conjunct formation and the shirorekha that ties characters together.
The motivation section emphasizes the prevalence of brute‑force and bot attacks on Indian‑language web services (email, social networks, e‑learning platforms). By integrating DevaCAPTCHA, service providers can protect these applications without requiring users to switch to Latin‑based keyboards, as many already have Devanagari input methods.
Finally, the paper outlines future work: empirical usability studies, systematic evaluation against modern deep‑learning OCR attacks, performance benchmarking under server load, and extension to other Indian scripts such as Tamil or Kannada. While the design is conceptually sound, the manuscript lacks concrete experimental results, detailed algorithmic specifications for parameter selection, and quantitative security analysis, which are necessary to validate the claimed advantages. Nonetheless, the proposal opens a promising direction for region‑specific CAPTCHA design that exploits script‑level complexities to enhance web security.
Comments & Academic Discussion
Loading comments...
Leave a Comment