Fine-Tuned Large Language Models for Logical Translation: Reducing Hallucinations with Lang2Logic
Muyu Pan
Computer Science and Engineering
Pennsylvania State University
mfp5696@psu.edu
Dheeraj Kodakandla
Computer Science and Engineering
Pennsylvania State University
djk6439@psu.edu
Mahfuza Farooque
Computer Science and Engineering
Pennsylvania State University
University Park, USA
mff5187@psu.edu
Abstract—Recent advances in natural language processing (NLP), particularly large language models (LLMs), have motivated the automatic translation of natural language statements into formal logic without human intervention. This enables automated reasoning and facilitates debugging, finding loop invariants, and adhering to specifications in software systems. However, hallucinations (incorrect outputs generated by LLMs) pose a challenge, particularly for logical translation tasks that require precision. This work introduces a novel framework that takes English sentences as input, converts them into logical expressions, and then translates them into Conjunctive Normal Form (CNF) for satisfiability solving. It employs classical NLP techniques with a self-defined grammar, symbolic computation libraries, and a fine-tuned language model to reduce hallucinations. In early experiments, we observed that the fine-tuned model, trained on different grammar settings, consistently corrected the same types of hallucinations made by the original model, thus providing reliable CNF generation.
Index Terms—Logics, LLM Hallucinations, Natural Language Processing, LLM Fine-Tuning
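To make the CNF step described in the abstract concrete, the following minimal sketch converts a small propositional formula to CNF and checks its satisfiability. It uses SymPy's logic module as the symbolic computation library; the propositions P and Q and the example sentence are illustrative assumptions, not the paper's actual grammar or implementation.

    # Minimal sketch, assuming SymPy as the symbolic computation library.
    from sympy import symbols
    from sympy.logic.boolalg import to_cnf
    from sympy.logic.inference import satisfiable

    # Hypothetical propositions: P = "it rains", Q = "the ground is wet".
    P, Q = symbols("P Q")

    # "If it rains, the ground is wet; and it rains."
    expr = (P >> Q) & P          # Implies(P, Q) & P

    cnf = to_cnf(expr)           # P & (Q | ~P)
    print(satisfiable(cnf))      # a model, e.g. {P: True, Q: True}

Once a formula is in CNF, off-the-shelf SAT procedures, such as the DPLL implementation behind SymPy's satisfiable, can decide it deterministically.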
I. INTRODUCTION
Natural Language Processing (NLP) [1] was initially conceptualized by the Swiss linguist Ferdinand de Saussure, who introduced the idea that linguistic meaning is created through internal relationships and contrasts, and that shared linguistic structures enable communication. In 1950, Alan Turing proposed the concept of a "thinking machine," suggesting that a machine capable of communicating with humans through a teleprinter demonstrates cognitive capability. Contemporary NLP plays a critical role in understanding human language and generating contextually appropriate responses, exemplified by intelligent assistants such as Apple's Siri and Amazon's Alexa, which provide personalized assistance and process user requests autonomously.
Large Language Models (LLMs) [2] are sophisticated artificial intelligence models constructed using deep learning methodologies, trained on extensive datasets, and capable of generating human-like textual content. Grounded in the transformer architecture, these models are designed to capture complex linguistic nuances and long-range textual dependencies, enabling advanced capabilities such as machine translation, conversational interaction, and content generation. LLMs not only comprehend human languages but also demonstrate applicability across diverse research and industrial domains. OpenAI's ChatGPT [3] serves as a prominent example of LLM technology utilized extensively in daily applications.
Hallucination [4] in language models represents a phenomenon where, based on memorized training data patterns, the model generates outputs containing fabricated, plausible-sounding information when confronted with unseen scenarios. The consequences of hallucinations can range from minor inconsistencies that cause user confusion to critically significant errors in sensitive domains such as language translation, software development, or autonomous systems. Mitigating hallucinations [5] in LLMs is paramount for ensuring reliability, safety, and practical applicability, particularly when deploying these models in critical or sensitive contexts.
To address the challenge of hallucinations, fine-tuned models [3] have emerged as an effective solution. These models are pre-trained machine learning models optimized for specialized task domains, demonstrating superior performance compared to generalized models through targeted training on smaller, domain-specific datasets. During the fine-tuning process, model parameters are meticulously adjusted to enhance precision and generalization capabilities. This approach leverages the foundational language understanding acquired during initial large-scale training, subsequently refining the model's focus on specific target tasks.
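As a minimal sketch of that process, the snippet below continues training a small pre-trained causal language model on a handful of NL-to-logic pairs using Hugging Face Transformers; the base model, hyperparameters, and training pairs are illustrative placeholders, not the configuration used in this work.

    # Minimal fine-tuning sketch; model name and data are placeholders.
    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    base = "gpt2"  # assumption: any small causal LM illustrates the idea
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Hypothetical domain-specific pairs: English sentence -> logical form.
    pairs = ["If it rains then the ground is wet. => (P -> Q)",
             "It rains and the wind blows. => (P & R)"]
    train = Dataset.from_dict({"text": pairs}).map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=64),
        remove_columns=["text"])

    # Continued training on the narrow task adjusts the pre-trained weights.
    Trainer(model=model,
            args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                                   per_device_train_batch_size=2),
            train_dataset=train,
            data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                          mlm=False)).train()

In practice the training set would contain many verified sentence-to-logic pairs rather than two toy examples, but the structure of the loop is the same: the pre-trained weights are refined on the narrow translation task.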
Recent works such as LogicLLaMA [6] and LOGIC-LM [7] have pioneered advancements in logical reasoning by fine-tuning LLMs for specialized tasks. LogicLLaMA fine-tunes LLaMA on a dataset of verified NL-FOL pairs to translate natural language to first-order logic (FOL) and mitigate hallucinations using reinforcement learning with human feedback (RLHF). Similarly, LOGIC-LM integrates LLMs with symbolic solvers, converting NL into structured symbolic formulations for deterministic inference while using solver feedback to self-refine and improve accuracy on logical reasoning benchmarks. These studies h