AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice

February 23, 2026

Reading time: 6 minutes

...

📝 Original Info

Title: AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice
ArXiv ID: 2512.10114
Date: 2025-12-10
Authors: Mesafint Fanuel (North Carolina A&T State University, USA); Mahmoud Nabil Mahmoud (The University of Alabama, USA); Crystal Cook Marshall (North Carolina Agricultural and Technical State University, USA); Vishal Lakhotia (Amazon AWS, USA); Biswanath Dari (North Carolina Agricultural and Technical State University, USA); Kaushik Roy (North Carolina Agricultural and Technical State University, USA); Shaohu Zhang (North Carolina Agricultural and Technical State University, USA)

📝 Abstract

Large Language Models (LLMs) have demonstrated significant potential in democratizing access to information. In agriculture, however, general-purpose models frequently suffer from "contextual hallucination": they give non-factual advice, or answers that are scientifically sound in one region but disastrous in another because of differences in soil, climate, and local regulations. We introduce AgriRegion, a Retrieval-Augmented Generation (RAG) framework designed specifically for high-fidelity, region-aware agricultural advisory. Unlike standard RAG approaches that rely solely on semantic similarity, AgriRegion incorporates a geospatial metadata injection layer and a region-prioritized re-ranking mechanism. By restricting the knowledge base to verified local agricultural extension services and enforcing geospatial constraints during retrieval, AgriRegion ensures that advice on planting schedules, pest control, and fertilization is locally accurate. We create a novel benchmark dataset, AgriRegion-Eval, which comprises 160 domain-specific questions across 12 agricultural subfields. Experiments demonstrate that AgriRegion reduces hallucinations by 10-20% compared to state-of-the-art LLM systems and significantly improves trust scores according to a comprehensive evaluation.

CCS Concepts: • Computing methodologies → Machine learning; • Applied computing → Agriculture.
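
To make the idea of region-prioritized re-ranking concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Doc` class, the `region_rerank` function, the additive `region_boost` parameter, the region tags, and the toy two-dimensional embeddings are all assumptions chosen for illustration. It only shows how region metadata attached to each document could shift a ranking that would otherwise depend on semantic similarity alone.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    region: str             # hypothetical region tag attached as metadata at indexing time
    embedding: list[float]  # toy vector; a real system would use a learned text encoder


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def region_rerank(query_emb, user_region, docs, region_boost=0.3, top_k=3):
    """Rank by semantic similarity plus a bonus for documents from the user's region.

    The additive bonus is an assumed scoring rule, used here only as an example.
    """
    scored = []
    for doc in docs:
        score = cosine(query_emb, doc.embedding)
        if doc.region == user_region:
            score += region_boost
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Made-up documents: two answer the same question for different states.
docs = [
    Doc("Tomato planting window for North Carolina.", "NC", [0.9, 0.1]),
    Doc("Tomato planting window for Minnesota.", "MN", [0.95, 0.05]),
    Doc("Soil pH guidance for blueberries.", "NC", [0.1, 0.9]),
]
query = [1.0, 0.0]  # stands in for "When should I plant tomatoes?"

ranked = region_rerank(query, "NC", docs, top_k=2)
baseline = region_rerank(query, "NC", docs, region_boost=0.0, top_k=1)
print([d.region for d in ranked], baseline[0].region)
```

With the boost set to zero, the Minnesota document ranks first because its toy embedding is slightly closer to the query. With the boost applied, the North Carolina document moves ahead of it, while the off-topic North Carolina document stays last. A stricter variant could drop out-of-region documents entirely instead of down-weighting them; which behavior AgriRegion uses is not specified in the text above.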

📄 Full Content

AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice

MESAFINT FANUEL, North Carolina A&T State University, USA
MAHMOUD NABIL MAHMOUD, The University of Alabama, USA
CRYSTAL COOK MARSHALL, North Carolina Agricultural and Technical State University, USA
VISHAL LAKHOTIA, Amazon AWS, USA
BISWANATH DARI, North Carolina Agricultural and Technical State University, USA
KAUSHIK ROY, North Carolina Agricultural and Technical State University, USA
SHAOHU ZHANG, North Carolina Agricultural and Technical State University, USA

Large Language Models (LLMs) have demonstrated significant potential in democratizing access to information. In agriculture, however, general-purpose models frequently suffer from "contextual hallucination": they give non-factual advice, or answers that are scientifically sound in one region but disastrous in another because of differences in soil, climate, and local regulations. We introduce AgriRegion, a Retrieval-Augmented Generation (RAG) framework designed specifically for high-fidelity, region-aware agricultural advisory. Unlike standard RAG approaches that rely solely on semantic similarity, AgriRegion incorporates a geospatial metadata injection layer and a region-prioritized re-ranking mechanism. By restricting the knowledge base to verified local agricultural extension services and enforcing geospatial constraints during retrieval, AgriRegion ensures that advice on planting schedules, pest control, and fertilization is locally accurate. We create a novel benchmark dataset, AgriRegion-Eval, which comprises 160 domain-specific questions across 12 agricultural subfields. Experiments demonstrate that AgriRegion reduces hallucinations by 10-20% compared to state-of-the-art LLM systems and significantly improves trust scores according to a comprehensive evaluation.

CCS Concepts: • Computing methodologies → Machine learning; • Applied computing → Agriculture.
Additional Key Words and Phrases: intelligent systems, AI, retrieval-augmented generation, agriculture

ACM Reference Format: Mesafint Fanuel, Mahmoud Nabil Mahmoud, Crystal Cook Marshall, Vishal Lakhotia, Biswanath Dari, Kaushik Roy, and Shaohu Zhang. 2025. AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice. 1, 1 (December 2025), 15 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

Authors' Contact Information: Mesafint Fanuel, North Carolina A&T State University, Greensboro, NC, USA, mfanuel@ncat.edu; Mahmoud Nabil Mahmoud, The University of Alabama, Tuscaloosa, AL, USA, mmahmoud1@ua.edu; Crystal Cook Marshall, North Carolina Agricultural and Technical State University, Greensboro, NC, USA, cacookmarshall@ncat.edu; Vishal Lakhotia, Amazon AWS, USA, lakhov@amazon.com; Biswanath Dari, North Carolina Agricultural and Technical State University, Greensboro, NC, USA, bdari@ncat.edu; Kaushik Roy, North Carolina Agricultural and Technical State University, Greensboro, NC, USA, kroy@ncat.edu; Shaohu Zhang, North Carolina Agricultural and Technical State University, Greensboro, NC, USA, szhang1@ncat.edu.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM. Manuscript submitted to ACM. arXiv:2512.10114v1 [cs.AI] 10 Dec 2025

1 Introduction

Large Language Models (LLMs) such as ChatGPT [26], DeepSeek [10], and Gemini [14] have shown promising capabilities in image understanding and interpretation, text summarization, question answering (QA), and dialog systems [8, 33, 35]. Despite their remarkable success, LLMs face challenges in domain-specific or knowledge-intensive tasks [20]. They often struggle to provide accurate and relevant responses to niche or complex queries, particularly when a question requires specialized knowledge or when the requested content depends on up-to-date information for a specific region.

A promising solution to these challenges is Retrieval-Augmented Generation (RAG), which integrates parametric and non-parametric memory components. This method combines the capabilities of LLMs with an external information retrieval system, allowing the model to dynamically search and incorporate information from extensive databases or document collections [15, 20]. By leveraging external knowledge beyond the model's pre-training data, this approach improves the model's ability to produce accurate an

…(Full text truncated)…
