ASR Under the Stethoscope: Evaluating Biases in Clinical Speech Recognition across Indian Languages
Automatic Speech Recognition (ASR) is increasingly used to document clinical encounters, yet its reliability in multilingual and demographically diverse Indian healthcare contexts remains largely unknown. In this study, we conduct the first systematic audit of ASR performance on real-world clinical interview data spanning Kannada, Hindi, and Indian English, comparing leading models including Indic Whisper, Whisper, Sarvam, Google Speech-to-Text, Gemma3n, Omnilingual, Vaani, and Gemini. We evaluate transcription accuracy across languages, speakers, and demographic subgroups, with a particular focus on error patterns affecting patients versus clinicians and on gender-based or intersectional disparities. Our results reveal substantial variability across models and languages, with some systems performing competitively on Indian English but failing on code-mixed or vernacular speech. We also uncover systematic performance gaps tied to speaker role and gender, raising concerns about equitable deployment in clinical settings. By providing a comprehensive multilingual benchmark and fairness analysis, our work highlights the need for culturally and demographically inclusive ASR development for the healthcare ecosystem in India.
💡 Research Summary
The research paper, “ASR Under the Stethoscope: Evaluating Biases in Clinical Speech Recognition across Indian Languages,” presents a critical and systematic audit of Automatic Speech Recognition (ASR) technologies within the complex linguistic landscape of the Indian healthcare sector. As ASR becomes an integral tool for automating clinical documentation, the study addresses a vital gap: how little is known about the reliability and fairness of these systems in multilingual and demographically diverse environments.
The study evaluates a wide array of state-of-the-art models, ranging from global systems like Google Speech-to-Text, Whisper, and Gemini to India-centric models such as Indic Whisper, Sarvam, and Vaani. The evaluation covers three primary linguistic domains: Kannada, Hindi, and Indian English. The researchers go beyond simple accuracy metrics to investigate deep-seated biases, specifically focusing on how transcription errors correlate with speaker roles (clinicians vs. patients), gender, and the presence of code-mixed or vernacular speech patterns.
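To make the shape of such an evaluation concrete, the sketch below computes corpus-level word error rate (WER) per model and language. This is a minimal illustration, not the paper's released pipeline: jiwer is a standard open-source WER library, while the row layout and toy transcripts are assumptions made for demonstration.

```python
# Minimal sketch of a per-model, per-language WER comparison.
# Assumptions: jiwer is installed (pip install jiwer), and transcripts
# are available as (model, language, reference, hypothesis) rows.
# The toy data below is illustrative, not from the study.
from collections import defaultdict

import jiwer

rows = [
    ("Whisper", "en-IN", "the patient reports chest pain", "the patient reports chest pain"),
    ("Whisper", "hi", "doctor ne dawai likhi hai", "doctor ne davai likh hai"),
    ("Sarvam", "hi", "doctor ne dawai likhi hai", "doctor ne dawai likhi hai"),
]

grouped = defaultdict(lambda: ([], []))  # (model, lang) -> (refs, hyps)
for model, lang, ref, hyp in rows:
    refs, hyps = grouped[(model, lang)]
    refs.append(ref)
    hyps.append(hyp)

for (model, lang), (refs, hyps) in sorted(grouped.items()):
    # jiwer.wer pools edit operations over the whole list, giving a
    # corpus-level rather than sentence-averaged error rate.
    print(f"{model:10s} {lang:6s} WER = {jiwer.wer(refs, hyps):.3f}")
```

Corpus-level pooling matters here: averaging per-sentence WER would overweight short utterances, which are common in clinical turn-taking.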
The findings reveal a significant performance disparity across linguistic contexts. While many models demonstrate competitive performance when processing standard Indian English, they struggle significantly with code-mixed speech (the blending of multiple languages within an utterance) and regional dialects that are ubiquitous in Indian clinical encounters. This failure to handle linguistic fluidity poses a major risk to the integrity of medical records.
Furthermore, the study uncovers a troubling pattern of systemic bias related to speaker identity. There is a noticeable gap between transcription accuracy for clinicians and for patients, suggesting that the voices of patients—often the most critical subjects in a medical context—are more prone to being mistranscribed. Additionally, the research highlights intersectional disparities, where gender and linguistic background intersect to create uneven error rates.
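An intersectional breakdown like the one described can be expressed as grouped WER over speaker metadata. The sketch below, using pandas and jiwer, is a hypothetical illustration of the form such an analysis might take; the column names and example utterances are invented, not drawn from the study's data.

```python
# Hedged sketch of an intersectional WER breakdown by speaker role and
# gender. All column names and utterances are assumptions made for
# illustration; the study's dataset is not reproduced here.
import jiwer
import pandas as pd

df = pd.DataFrame({
    "role":   ["clinician", "patient", "patient", "clinician"],
    "gender": ["F", "F", "M", "M"],
    "ref":    ["take this tablet twice daily", "my chest hurts at night",
               "it started last week", "do you have any allergies"],
    "hyp":    ["take this tablet twice daily", "my chest hearts at night",
               "it started last week", "do you have any allergies"],
})

# Compute corpus WER within each (role, gender) cell; large gaps between
# cells are the kind of disparity a fairness audit would flag.
for (role, gender), group in df.groupby(["role", "gender"]):
    wer = jiwer.wer(list(group["ref"]), list(group["hyp"]))
    print(f"{role:10s} {gender}  WER = {wer:.3f}")
```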
The implications of these findings are profound for the deployment of AI in healthcare. Inaccurate transcriptions in a clinical setting are not merely technical errors; they are potential threats to patient safety and clinical decision-making. The study concludes that for ASR technology to be equitably deployed in the Indian healthcare ecosystem, there is an urgent need for the development of culturally and demographically inclusive models. This requires a concerted effort to build datasets that reflect the true linguistic diversity of India, ensuring that the benefits of AI-driven healthcare are accessible to all, regardless of their language, gender, or dialect.