AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice
MESAFINT FANUEL, North Carolina A&T State University, USA
MAHMOUD NABIL MAHMOUD, The University of Alabama, USA
CRYSTAL COOK MARSHALL, North Carolina Agricultural and Technical State University, USA
VISHAL LAKHOTIA, Amazon AWS, USA
BISWANATH DARI, North Carolina Agricultural and Technical State University, USA
KAUSHIK ROY, North Carolina Agricultural and Technical State University, USA
SHAOHU ZHANG, North Carolina Agricultural and Technical State University, USA
Large Language Models (LLMs) have demonstrated significant potential in democratizing access to information. However, in the
domain of agriculture, general-purpose models frequently suffer from "contextual hallucination": they give advice that is non-factual,
or that is scientifically sound in one region but disastrous in another because of variations in soil, climate, and local regulations.
We introduce AgriRegion, a Retrieval-Augmented Generation (RAG) framework designed specifically for high-fidelity, region-aware
agricultural advisory. Unlike standard RAG approaches that rely solely on semantic similarity, AgriRegion incorporates a geospatial
metadata injection layer and a region-prioritized re-ranking mechanism. By restricting the knowledge base to verified local agricultural
extension services and enforcing geospatial constraints during retrieval, AgriRegion ensures that advice on planting
schedules, pest control, and fertilization is locally accurate. We create a novel benchmark dataset, AgriRegion-Eval, which comprises
160 domain-specific questions across 12 agricultural subfields. Experiments demonstrate that AgriRegion reduces hallucinations by
10-20% compared to state-of-the-art LLM systems and significantly improves trust scores according to a comprehensive evaluation.
CCS Concepts: • Computing methodologies → Machine learning; • Applied computing → Agriculture.
Additional Key Words and Phrases: intelligent systems, AI, retrieval-augmented generation, agriculture
ACM Reference Format:
Mesafint Fanuel, Mahmoud Nabil Mahmoud, Crystal Cook Marshall, Vishal Lakhotia, Biswanath Dari, Kaushik Roy, and Shaohu
Zhang. 2025. AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice. 1, 1 (December 2025), 15 pages.
https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 Introduction
Large Language Models (LLMs) such as ChatGPT [26], DeepSeek [10], and Gemini [14] have shown promising capabilities
in image understanding and interpretation, text summarization, question answering (QA), and dialog systems [8, 33, 35].
Authors’ Contact Information: Mesafint Fanuel, North Carolina A&T State University, Greensboro, NC, USA, mfanuel@ncat.edu; Mahmoud Nabil
Mahmoud, The University of Alabama, Tuscaloosa, AL, USA, mmahmoud1@ua.edu; Crystal Cook Marshall, North Carolina Agricultural and Technical
State University, Greensboro, NC, USA, cacookmarshall@ncat.edu; Vishal Lakhotia, Amazon AWS, USA, lakhov@amazon.com; Biswanath Dari, North
Carolina Agricultural and Technical State University, Greensboro, NC, USA, bdari@ncat.edu; Kaushik Roy, North Carolina Agricultural and Technical
State University, Greensboro, NC, USA, kroy@ncat.edu; Shaohu Zhang, North Carolina Agricultural and Technical State University, Greensboro, NC,
USA, szhang1@ncat.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components
of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on
servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Manuscript submitted to ACM
arXiv:2512.10114v1 [cs.AI] 10 Dec 2025
Despite their remarkable success, LLMs face challenges in domain-specific or knowledge-intensive tasks [20]. They
often struggle to provide accurate and relevant responses to niche or complex queries, particularly when
faced with questions that require specialized knowledge, or when asked to generate content that depends on up-to-date,
region-specific information.
A promising solution to these challenges is Retrieval-Augmented Generation (RAG), which integrates
parametric and non-parametric memory components. This method combines the capabilities of LLMs with an external
information retrieval system, allowing the model to dynamically search and incorporate information from extensive
databases or document collections [15, 20]. By leveraging external knowledge beyond the model’s pre-training data,
this approach improves the model’s ability to produce accurate an
…(Full text truncated)…
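To make the retrieval idea above concrete, the following is a minimal sketch of how a region-prioritized re-ranking step could be layered on top of plain semantic similarity. It is an illustration under stated assumptions, not the authors' implementation: the `Doc` record, the `region` tag, the bag-of-words cosine similarity (a stand-in for a learned embedding model), the multiplicative `region_boost` weight, and the sample passages are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    region: str  # hypothetical geospatial metadata tag attached at indexing time


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, user_region, docs, k=2, region_boost=0.5):
    """Rank docs by similarity, scaling up those tagged with the user's region.

    The boost is multiplicative (an assumption), so a passage with zero
    similarity to the query is never promoted by its region tag alone.
    """
    q = Counter(query.lower().split())

    def score(doc):
        sim = cosine(q, Counter(doc.text.lower().split()))
        return sim * (1.0 + region_boost) if doc.region == user_region else sim

    return sorted(docs, key=score, reverse=True)[:k]


# Made-up passages: two are equally similar to the query but differ by region.
docs = [
    Doc("plant sweet corn after the last spring frost in mid april", "NC"),
    Doc("plant sweet corn after the last spring frost in late may", "MN"),
    Doc("scout soybean fields for aphids in july", "NC"),
]
query = "when to plant sweet corn"

top = retrieve(query, "MN", docs, k=1)
print(top[0].region)  # prints MN
```

With semantic similarity alone, the two planting passages tie and the answer depends on index order; the region term breaks the tie in favor of the user's location. A production system would replace the toy similarity with dense embeddings and tune or learn the boost.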