Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory

Reading time: 5 minutes

📝 Original Info

  • Title: Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory
  • ArXiv ID: 2601.02065
  • Date: 2026-01-05
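  • Authors: Md. Asif Hossain, Nabil Subhan, Mantasha Rahman Mahi, Jannatul Ferdous Nabila (Dept. of Computer Science and Engineering, East West University, Dhaka, Bangladesh)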

📝 Abstract

Access to reliable agricultural advisory remains limited in many developing regions due to a persistent language barrier: authoritative agricultural manuals are predominantly written in English, while farmers primarily communicate in low-resource local languages such as Bengali. Although recent advances in Large Language Models (LLMs) enable natural language interaction, direct generation in low-resource languages often exhibits poor fluency and factual inconsistency, while cloud-based solutions remain cost-prohibitive. This paper presents a cost-efficient, cross-lingual Retrieval-Augmented Generation (RAG) framework for Bengali agricultural advisory that emphasizes factual grounding and practical deployability. The proposed system adopts a translation-centric architecture in which Bengali user queries are translated into English, enriched through domain-specific keyword injection to align colloquial farmer terminology with scientific nomenclature, and answered via dense vector retrieval over a curated corpus of English agricultural manuals (FAO, IRRI). The generated English response is subsequently translated back into Bengali to ensure accessibility. The system is implemented entirely using open-source models and operates on consumer-grade hardware without reliance on paid APIs. Experimental evaluation demonstrates reliable source-grounded responses, robust rejection of out-of-domain queries, and an average end-to-end latency below 20 seconds. The results indicate that cross-lingual retrieval combined with controlled translation offers a practical and scalable solution for agricultural knowledge access in low-resource language settings.
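The stages named in the abstract (translate, enrich, retrieve, generate, translate back, with rejection of out-of-domain queries) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `min_score` threshold, the `REFUSAL` text, and especially the toy term-count `embed`, which only stands in for a dense embedding model. The models and parameters actually used in the paper are not given in this excerpt.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical refusal text returned when nothing relevant is retrieved.
REFUSAL = "This question is outside the scope of the agricultural manuals."


def embed(text: str) -> Counter:
    # Toy stand-in for a dense embedding model: lowercase term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: List[str],
             min_score: float = 0.2) -> Optional[Tuple[str, float]]:
    # Return the best-matching passage, or None when the best score is
    # below the (assumed) threshold, i.e. the query looks out of domain.
    query_vec = embed(query)
    scored = [(doc, cosine(query_vec, embed(doc))) for doc in corpus]
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < min_score:
        return None
    return best


def answer(query_bn: str,
           corpus: List[str],
           to_english: Callable[[str], str],
           enrich: Callable[[str], str],
           to_bengali: Callable[[str], str],
           generate: Callable[[str, str], str]) -> str:
    # Bengali query -> English -> keyword enrichment -> retrieval
    # -> grounded English answer -> Bengali.
    query_en = enrich(to_english(query_bn))
    hit = retrieve(query_en, corpus)
    if hit is None:
        return to_bengali(REFUSAL)
    passage, _score = hit
    return to_bengali(generate(query_en, passage))
```

Passing the translators, the enrichment step, and the generator in as plain callables keeps the sketch independent of any particular open-source model; each stage can be swapped without touching the control flow.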

📄 Full Content

Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory

  • 1st Md. Asif Hossain, Dept. of Computer Science and Engineering, East West University, Dhaka, Bangladesh, asifhossain8612@gmail.com
  • 2nd Nabil Subhan, Dept. of Computer Science and Engineering, East West University, Dhaka, Bangladesh, nabilsubhan861@gmail.com
  • 3rd Mantasha Rahman Mahi, Dept. of Computer Science and Engineering, East West University, Dhaka, Bangladesh, mantashamahi11@gmail.com
  • 4th Jannatul Ferdous Nabila, Dept. of Computer Science and Engineering, East West University, Dhaka, Bangladesh, jannatulferdousnabila1@gmail.com

Abstract: Access to reliable agricultural advisory remains limited in many developing regions due to a persistent language barrier: authoritative agricultural manuals are predominantly written in English, while farmers primarily communicate in low-resource local languages such as Bengali. Although recent advances in Large Language Models (LLMs) enable natural language interaction, direct generation in low-resource languages often exhibits poor fluency and factual inconsistency, while cloud-based solutions remain cost-prohibitive. This paper presents a cost-efficient, cross-lingual Retrieval-Augmented Generation (RAG) framework for Bengali agricultural advisory that emphasizes factual grounding and practical deployability. The proposed system adopts a translation-centric architecture in which Bengali user queries are translated into English, enriched through domain-specific keyword injection to align colloquial farmer terminology with scientific nomenclature, and answered via dense vector retrieval over a curated corpus of English agricultural manuals (FAO, IRRI). The generated English response is subsequently translated back into Bengali to ensure accessibility. The system is implemented entirely using open-source models and operates on consumer-grade hardware without reliance on paid APIs.
Experimental evaluation demonstrates reliable source-grounded responses, robust rejection of out-of-domain queries, and an average end-to-end latency below 20 seconds. The results indicate that cross-lingual retrieval combined with controlled translation offers a practical and scalable solution for agricultural knowledge access in low-resource language settings.

Index Terms: Retrieval-Augmented Generation (RAG), Cross-Lingual NLP, Low-Resource Languages, Bengali, Agricultural Advisory, Quantization, Large Language Models (LLMs)

I. INTRODUCTION

Agriculture plays a vital role in developing countries such as Bangladesh, where millions of people depend on farming for food security and income. International organizations including the Food and Agriculture Organization (FAO) and the International Rice Research Institute (IRRI) publish detailed agricultural manuals containing scientifically validated guidance on crop diseases, fertilizer usage, and best practices [1], [2]. However, a major accessibility challenge remains: these manuals are predominantly written in English and distributed as static PDF documents. For smallholder farmers who primarily communicate in Bengali, this information is effectively inaccessible.

Recent advances in Large Language Models (LLMs) have enabled natural language interfaces for information access. However, directly applying standard LLMs for Bengali agricultural advisory presents significant limitations. Most high-performing models are trained primarily on English data, resulting in poor grammatical quality and factual inconsistencies in Bengali outputs [3]. In addition, commercial cloud-based LLM services are often cost-prohibitive for low-cost rural deployment. More critically, generative models operating without external grounding are prone to hallucinations, which can lead to unsafe recommendations in agriculture-related decision-making [4].
Retrieval-Augmented Generation (RAG) [5] has been proposed as a solution to reduce hallucinations by grounding responses in authoritative documents. In a RAG system, the model retrieves relevant information from trusted sources before generating an answer. While effective, most existing RAG frameworks are designed for English-language use or require high computational resources, limiting their applicability in low-resource linguistic and deployment settings [6].

In the Bangladeshi agricultural context, an additional challenge arises from a pronounced vocabulary gap. Farmers frequently use local or colloquial terms to describe crop diseases and symptoms (e.g., “Magra”), whereas official manuals rely on scientific terminology (e.g., “Stem Borer”) [7]. This mismatch prevents standard retrieval systems from effectively

arXiv:2601.02065v1 [cs.CL] 5 Jan 2026

Fig. 1. System Architecture of the proposed Translation-Centric Cross-Lingual RAG Pipeline. The system processes Bengali queries by translating them to English, enriching them with domain-specific keywords, and retrieving relevant inform

…(Full text truncated)…
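The vocabulary gap described in the introduction, and the keyword injection step named in the abstract, can be sketched as a glossary lookup applied to the translated query. This is a minimal illustration and not the authors' implementation: `GLOSSARY`, `inject_keywords`, and the choice to append terms in parentheses are hypothetical, and only the “Magra” to “Stem Borer” pair comes from the text.

```python
import re
from typing import Dict

# Hypothetical glossary mapping colloquial farmer terms to scientific names.
# Only the "Magra" -> "Stem Borer" pair is taken from the paper's text.
GLOSSARY: Dict[str, str] = {"magra": "Stem Borer"}


def inject_keywords(query_en: str, glossary: Dict[str, str] = GLOSSARY) -> str:
    # Append the scientific term for every colloquial term found in the
    # query, so that retrieval over English manuals can match it.
    additions = []
    for colloquial, scientific in glossary.items():
        pattern = rf"\b{re.escape(colloquial)}\b"
        found = re.search(pattern, query_en, flags=re.IGNORECASE)
        already_present = scientific.lower() in query_en.lower()
        if found and not already_present:
            additions.append(scientific)
    if not additions:
        return query_en
    return f"{query_en} ({', '.join(additions)})"
```

Appending the scientific term, instead of replacing the colloquial one, keeps the farmer's original wording in the query while still giving the retriever a term that appears in the manuals.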

Reference

This content is AI-processed based on ArXiv data.
