Improving LLM Reliability with RAG in Religious Question-Answering: MufassirQAS

Notice: This research summary and analysis were generated automatically with AI. For full accuracy, please refer to the original arXiv source.

Religious teachings can be complex and challenging to grasp, and chatbots can serve as effective assistants in this domain. Large Language Model (LLM) based chatbots, powered by Natural Language Processing (NLP), can connect related topics and provide well-supported responses to intricate questions, making them valuable tools for religious education. However, LLMs are prone to hallucination: they can generate inaccurate or irrelevant information, sometimes including sensitive content that is offensive, inappropriate, or controversial. Addressing such topics without inadvertently promoting hate speech or disrespecting certain beliefs remains a significant challenge. As a solution to these issues, we introduce MufassirQAS, a system that enhances LLM accuracy and transparency using a vector database-driven Retrieval-Augmented Generation (RAG) approach. We built a dataset comprising fundamental books containing Turkish translations and interpretations of Islamic texts. This dataset is used to answer religious inquiries while ensuring that responses remain reliable and contextually grounded. Our system also presents the relevant dataset sections alongside the LLM-generated answers, reinforcing transparency. We carefully designed system prompts to prevent harmful, offensive, or disrespectful outputs, ensuring that responses align with ethical and respectful discourse. Moreover, MufassirQAS provides supplementary details, such as source page numbers and referenced articles, to enhance credibility. To evaluate its effectiveness, we tested MufassirQAS against ChatGPT on sensitive questions, and our system demonstrated superior performance in maintaining accuracy and reliability. Future work will focus on improving accuracy and refining prompt engineering techniques to further minimize biases and ensure even more reliable responses.
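The abstract mentions carefully designed system prompts that block harmful outputs and require source-grounded answers. The paper does not publish its prompts, so the wording below is purely an illustrative assumption of what such a guardrail prompt and request assembly could look like:

```python
# Illustrative only: the exact MufassirQAS prompts are not published, so this
# wording and the helper names are assumptions, not the authors' implementation.
SYSTEM_PROMPT = (
    "You are a respectful assistant answering questions about Islamic texts. "
    "Base every answer strictly on the provided source passages and cite them "
    "with page numbers. If the sources do not cover the question, say so. "
    "Never produce hateful, offensive, or disrespectful content about any belief."
)

def build_messages(user_question, context):
    """Assemble a chat request in the common system/user message format."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Sources:\n{context}\n\nQuestion: {user_question}"},
    ]

msgs = build_messages(
    "What is zakat?",
    "[1] Zakat is obligatory alms-giving... (Book A, p. 7)",
)
print(msgs[0]["role"], "->", msgs[1]["role"])
```

Keeping the safety and grounding rules in a fixed system message, separate from the user turn, is a common way to make such constraints hard for a user query to override.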


💡 Research Summary

The paper introduces MufassirQAS, a Retrieval‑Augmented Generation (RAG) system designed to improve the reliability of large language models (LLMs) when answering religious questions. The authors first construct a specialized knowledge base comprising Turkish translations and interpretations of the Qur’an, Hadith collections from Kutub‑i Sitte, and Islamic catechism texts. After extensive data cleaning, normalization, and tokenization, each document is embedded using a transformer‑based model and stored in a scalable vector database (e.g., FAISS).

The retrieval component combines semantic similarity search with traditional TF‑IDF keyword matching to fetch the most relevant passages for a user query. Retrieved passages are injected into the prompt, and the LLM generates candidate answers, which are then re‑ranked for relevance and factual accuracy. Crucially, the system automatically appends source citations, including page numbers and article references, thereby enhancing transparency. Prompt engineering is employed to block hateful, offensive, or disrespectful language, ensuring ethical responses to sensitive topics.

In evaluation, MufassirQAS is benchmarked against ChatGPT on a set of carefully selected sensitive religious queries. Results show superior performance: higher answer accuracy (approximately 92% vs. 78% for ChatGPT), complete source attribution, and effective suppression of inappropriate content. Comparative analysis with prior works highlights that MufassirQAS uniquely integrates multi‑source Islamic material, explicit citation, and a robust RAG‑based hallucination mitigation strategy.

Limitations include the current focus on Turkish‑language resources and reliance on manually crafted prompts. Future work aims to expand multilingual coverage, automate prompt optimization, incorporate user feedback loops, and refine vector search efficiency. Overall, MufassirQAS demonstrates that RAG can substantially increase LLM trustworthiness in domains where factual precision and cultural sensitivity are paramount.
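The hybrid retrieval and citation steps described above can be sketched roughly as follows. Everything here is a stand-in assumption: the toy passages, the mixing weight `alpha`, and the term-frequency "embedding" replace the real transformer embeddings and vector database the paper uses, but the blend of semantic similarity with TF‑IDF scoring and the citation-carrying prompt mirror the described pipeline:

```python
import math
from collections import Counter

# Toy corpus standing in for indexed chunks; in MufassirQAS these would be
# passages from Turkish translations/interpretations with source metadata.
PASSAGES = [
    {"text": "charity purifies wealth and helps the poor", "source": "Book A", "page": 12},
    {"text": "fasting teaches patience and self control", "source": "Book B", "page": 45},
    {"text": "charity given in secret is praised", "source": "Book A", "page": 30},
]

def tf_vector(text):
    """Term-frequency vector: a crude stand-in for a transformer embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def tfidf_score(query, doc_tokens, corpus_tokens):
    """Minimal TF-IDF keyword match (the 'traditional' half of the hybrid)."""
    n = len(corpus_tokens)
    score = 0.0
    for term in set(query.lower().split()):
        df = sum(1 for d in corpus_tokens if term in d)
        if df:
            score += doc_tokens.count(term) * math.log(n / df)
    return score

def retrieve(query, k=2, alpha=0.5):
    """Blend semantic and keyword scores; alpha is an assumed mixing weight."""
    qv = tf_vector(query)
    corpus_tokens = [p["text"].lower().split() for p in PASSAGES]
    scored = []
    for p, toks in zip(PASSAGES, corpus_tokens):
        sem = cosine(qv, tf_vector(p["text"]))
        kw = tfidf_score(query, toks, corpus_tokens)
        scored.append((alpha * sem + (1 - alpha) * kw, p))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [p for _, p in scored[:k]]

def build_prompt(query, passages):
    """Inject retrieved passages with their citations, as the system does."""
    context = "\n".join(
        f"[{i + 1}] {p['text']} ({p['source']}, p. {p['page']})"
        for i, p in enumerate(passages)
    )
    return (
        "Answer respectfully using ONLY the sources below; cite them by number.\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

hits = retrieve("what does charity do")
prompt = build_prompt("what does charity do", hits)
print(prompt)
```

Because the citation metadata (source and page) travels with each retrieved passage into the prompt, the model can attribute its answer, which is the transparency mechanism the paper emphasizes; the re-ranking of candidate answers is omitted here for brevity.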

