Large Language Models (LLMs) are powerful linguistic engines but remain susceptible to hallucinations: plausible-sounding outputs that are factually incorrect or unsupported. In this work, we present a coherent, mathematically grounded framework to understand, measure, and mitigate these hallucinations. Drawing on probabilistic modeling, information theory, trigonometric signal analysis, and Bayesian uncertainty estimation, we analyze how errors compound autoregressively, propose refined uncertainty metrics (including semantic and phase-aware variants), and develop principled mitigation strategies: contrastive decoding, retrieval-augmented grounding, factual alignment, and abstention. This unified lens connects recent advances in calibration, retrieval, and alignment in a way that supports safer and more reliable LLMs.
Large Language Models (LLMs) have rapidly become critical tools in multiple domains: conversational agents, scientific assistants, tutoring systems, and more. Their fluency can be staggering, but they rely fundamentally on statistical prediction, anticipating the next token based on context rather than verifying the truth of what they generate. This gap can lead to hallucinations, where output is syntactically plausible but factually ungrounded.
We broadly classify hallucinations into two types:
• Intrinsic hallucinations: errors or contradictions with respect to the input context.
• Extrinsic hallucinations: statements that conflict with verified external sources.
In high-stakes settings such as medicine, law, and education, such hallucinations are not just harmless mistakes; they undermine trust and create risk. Addressing them demands more than ad hoc fixes: we need a rigorous mathematical understanding and principled mitigation strategies.
This paper develops such a foundation. Specifically, we:
• Model error propagation in autoregressive generation.
• Introduce novel uncertainty metrics that incorporate both semantic similarity and positional phase.
• Propose mitigation techniques rooted in theory: contrastive decoding that respects phase, retrieval-augmented generation, factuality-aware training, and abstention.
• Synthesize these components into a unified, practical architecture.
2 Mathematical Origins of Hallucination
Consider a token sequence x_1, x_2, . . . , x_T. A language model defines the joint probability autoregressively:

P(x_{1:T}) = \prod_{t=1}^{T} P(x_t \mid x_{1:t-1}).
If at some step t the model assigns a slightly incorrect (or low-confidence) probability, say it underestimates the true token's probability by ϵ_t, then future conditioning suffers. To first order, the deviation in the joint probability is roughly:

P(x_{1:T}) \approx P^*(x_{1:T}) \left( 1 - \sum_{t=1}^{T} \frac{\epsilon_t}{P^*(x_t \mid x_{1:t-1})} \right).
Over many steps, even small ϵ t values can compound into significant drift, making later tokens more likely to diverge from factual or coherent content.
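To make the compounding concrete, the short numerical sketch below uses hypothetical per-step values for the true token probabilities and the underestimates ϵ_t (these numbers are not from the paper) and compares the exact joint-probability ratio to the first-order approximation above.

```python
import numpy as np

# Hypothetical per-step values (not from the paper) illustrating how small
# underestimates epsilon_t compound in the joint sequence probability.
T = 50
p_true = np.full(T, 0.90)    # true per-token probabilities P*(x_t | x_{1:t-1})
eps = np.full(T, 0.005)      # per-step underestimation epsilon_t

joint_true = np.prod(p_true)
joint_model = np.prod(p_true - eps)

# First-order approximation from the expansion above.
first_order = 1.0 - np.sum(eps / p_true)

print(f"exact ratio P/P*     : {joint_model / joint_true:.3f}")   # ~0.76
print(f"first-order estimate : {first_order:.3f}")                # ~0.72
```

Even a 0.5% per-token underestimate reduces the joint probability of a 50-token continuation by roughly a quarter, which is the drift effect described above.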
Let P^*(x_{t+1} | x_{1:t}) be the true (but unknown) conditional distribution of tokens and P(x_{t+1} | x_{1:t}) be the model's estimate. We can measure their divergence by the Kullback–Leibler (KL) divergence:

D_{\mathrm{KL}}\bigl(P^* \,\|\, P\bigr) = \sum_{x_{t+1}} P^*(x_{t+1} \mid x_{1:t}) \log \frac{P^*(x_{t+1} \mid x_{1:t})}{P(x_{t+1} \mid x_{1:t})}.
A high KL divergence suggests the model’s internal belief is far from reality, which often underlies overconfident false predictions, especially on out-of-distribution (OOD) inputs.
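As a toy illustration of this quantity, the sketch below computes the token-level KL divergence between a hypothetical true conditional P* and a mis-ranked model estimate over a three-token vocabulary; the probability values are made up for illustration.

```python
import numpy as np

# Token-level KL divergence between a hypothetical true distribution P*
# and the model's estimate P over a small vocabulary.
def kl_divergence(p_star, p, eps=1e-12):
    p_star = np.asarray(p_star, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum(p_star * np.log((p_star + eps) / (p + eps))))

p_star  = [0.70, 0.20, 0.10]   # true conditional P*(x_{t+1} | x_{1:t})
p_model = [0.10, 0.60, 0.30]   # mis-ranked, overconfident model estimate

print(f"KL(P* || P) = {kl_divergence(p_star, p_model):.3f} nats")
```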
Many transformer-based LLMs use sinusoidal positional embeddings, as in the original Transformer architecture:

PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right).
This induces a phase ϕ_t at each token position. We propose that this phase can modulate uncertainty. For instance, if σ²_base(x_t) is some "base" variance (from dropout or ensembles), a phase-aware version could be:

\sigma^2_{\mathrm{phase}}(x_t) = \sigma^2_{\mathrm{base}}(x_t)\,\bigl(1 + \gamma \sin^2(\phi_t)\bigr).
Here, γ ≥ 0 is a scaling hyperparameter. This form implies that at certain “phase positions,” uncertainty is systematically heightened or reduced.
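A minimal sketch of both quantities follows. It assumes ϕ_t is taken as the angle of the highest-frequency sinusoidal dimension (i = 0), i.e. ϕ_t = pos; the paper does not pin down which dimension defines the phase, so this choice and the numeric values are illustrative.

```python
import numpy as np

def sinusoidal_positional_encoding(pos, d_model=512):
    # Original Transformer encoding: interleaved sin/cos at geometric frequencies.
    i = np.arange(d_model // 2)
    angles = pos / np.power(10000.0, 2.0 * i / d_model)
    pe = np.empty(d_model)
    pe[0::2] = np.sin(angles)
    pe[1::2] = np.cos(angles)
    return pe

def phase_aware_variance(sigma2_base, pos, gamma=0.5):
    # Assumption: phi_t is the angle of the highest-frequency dimension (i = 0),
    # i.e. phi_t = pos. The exact choice of phi_t is left open in the text.
    phi_t = float(pos)
    return sigma2_base * (1.0 + gamma * np.sin(phi_t) ** 2)

print(sinusoidal_positional_encoding(17)[:4])
print(phase_aware_variance(sigma2_base=0.04, pos=17))
```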
3 Quantifying Uncertainty and Miscalibration
A well-calibrated model's confidence should match its accuracy. To evaluate this formally, partition predictions into M bins B_1, . . . , B_M by confidence and compute the Expected Calibration Error (ECE):

\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \bigl|\,\mathrm{acc}(B_m) - \mathrm{conf}(B_m)\,\bigr|,

where conf(B_m) is the average confidence in bin m, acc(B_m) is the empirical accuracy, and n is the total number of examples.
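A compact reference implementation of ECE follows, assuming equal-width confidence bins (a common convention); the toy confidences and correctness labels are illustrative only.

```python
import numpy as np

# Expected Calibration Error with equal-width confidence bins.
def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            conf_m = confidences[mask].mean()   # average confidence in bin
            acc_m = correct[mask].mean()        # empirical accuracy in bin
            ece += (mask.sum() / n) * abs(acc_m - conf_m)
    return ece

# Toy example: an overconfident predictor.
conf = [0.95, 0.90, 0.85, 0.80, 0.60]
hit  = [1,    0,    1,    0,    1]
print(f"ECE = {expected_calibration_error(conf, hit):.3f}")
```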
Using dropout at inference time, we run T stochastic forward passes, drawing parameter samples θ_1, . . . , θ_T. The predictive mean is:

\bar{P}(x_t) = \frac{1}{T} \sum_{i=1}^{T} P(x_t \mid x_{1:t-1}; \theta_i).
We estimate the epistemic variance as:

\sigma^2_{\mathrm{epi}}(x_t) = \frac{1}{T} \sum_{i=1}^{T} \bigl( P(x_t \mid x_{1:t-1}; \theta_i) - \bar{P}(x_t) \bigr)^2.
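The sketch below shows the Monte Carlo dropout recipe on a toy PyTorch classifier standing in for an LLM's next-token head (the architecture, dropout rate, and T are illustrative): dropout stays active at inference, and the variance across T passes estimates the epistemic term.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-in for a next-token prediction head over a 100-token vocabulary.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(64, 100))

def mc_dropout_predict(model, x, T=20):
    model.train()                                   # keep dropout stochastic at inference
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
    mean = probs.mean(dim=0)                        # predictive mean \bar{P}
    var = probs.var(dim=0, unbiased=False)          # epistemic variance estimate
    return mean, var

x = torch.randn(1, 16)                              # placeholder "context" features
mean, var = mc_dropout_predict(model, x)
print(f"max epistemic variance across vocab: {var.max().item():.2e}")
```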
Instead of relying only on token probabilities, we can examine the semantic similarity structure among a set of candidate continuations. Let K be a positive semi-definite kernel matrix over these candidates (e.g., based on embedding similarity). Normalize it into a density-matrix form:

\rho = \frac{K}{\mathrm{tr}(K)}.
Then compute the von Neumann entropy:

S(\rho) = -\mathrm{tr}(\rho \log \rho) = -\sum_{i} \lambda_i \log \lambda_i,

where λ_i are the eigenvalues of ρ.
This semantic entropy reflects how “diverse” the candidate meanings are: higher S(ρ) means more semantic spread, indicating uncertainty beyond token-level ambiguity.
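A small sketch of this computation follows, assuming K is a cosine-similarity Gram matrix over candidate embeddings (any positive semi-definite kernel would do); the random embeddings are placeholders for real candidate-continuation embeddings.

```python
import numpy as np

# Von Neumann (semantic) entropy of a normalized PSD kernel over candidates.
def von_neumann_entropy(K, eps=1e-12):
    rho = K / np.trace(K)                     # density-matrix normalization, tr(rho) = 1
    eigvals = np.linalg.eigvalsh(rho)         # real eigenvalues of a symmetric PSD matrix
    eigvals = np.clip(eigvals, eps, None)     # guard against log(0)
    return float(-np.sum(eigvals * np.log(eigvals)))

# Toy kernel: cosine similarities between 5 hypothetical candidate embeddings.
E = np.random.default_rng(0).normal(size=(5, 32))
E /= np.linalg.norm(E, axis=1, keepdims=True)
K = E @ E.T                                   # PSD Gram matrix
print(f"semantic entropy S(rho) = {von_neumann_entropy(K):.3f}")
```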
Combining the above, we define a phase-aware composite uncertainty:

U(x_t) = \alpha\, \sigma^2_{\mathrm{epi}}(x_t)\,\bigl(1 + \sin^2(\phi_t)\bigr) + \beta\, S(\rho),
where α, β are tunable hyperparameters. This ties positional phase to variance, hypothesizing that some positions are inherently more “risky.”
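A one-function sketch of this composite score, following the form written above (the α, β values and the placeholder inputs are illustrative, not calibrated):

```python
import numpy as np

# Composite uncertainty: phase-modulated epistemic variance plus semantic entropy.
def combined_uncertainty(sigma2_epi, semantic_entropy, phi_t, alpha=1.0, beta=0.5):
    return alpha * sigma2_epi * (1.0 + np.sin(phi_t) ** 2) + beta * semantic_entropy

print(combined_uncertainty(sigma2_epi=0.03, semantic_entropy=1.2, phi_t=17.0))
```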
We use two models: a "full" model and a "baseline" model. For token x_t, define the contrastive score:

\mathrm{Score}_{\mathrm{CD}}(x_t) = \log P_{\mathrm{full}}(x_t) - \lambda \log P_{\mathrm{baseline}}(x_t) + \eta \sin(\phi_t).

Here:
• λ controls how much we penalize tokens that the baseline also likes;
• η sin(ϕ_t) biases toward tokens located at favorable phase positions.
By decoding based on this score (e.g., via sampling or beam search), we favor tokens that are both distinctive (from the baseline) and phase-aligned, which may reduce hallucination risk.
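The scoring rule can be sketched directly from the formula above; the toy next-token distributions and the λ, η values below are illustrative only, and a real decoder would apply this score inside sampling or beam search.

```python
import numpy as np

# Phase-aware contrastive score:
# Score_CD(x_t) = log P_full(x_t) - lambda * log P_baseline(x_t) + eta * sin(phi_t).
def contrastive_scores(logp_full, logp_base, phi_t, lam=0.5, eta=0.1):
    return logp_full - lam * logp_base + eta * np.sin(phi_t)

# Toy next-token distributions over a 4-token vocabulary (hypothetical values).
logp_full = np.log(np.array([0.50, 0.30, 0.15, 0.05]))
logp_base = np.log(np.array([0.45, 0.40, 0.10, 0.05]))

scores = contrastive_scores(logp_full, logp_base, phi_t=3.0)
print("greedy contrastive pick:", int(np.argmax(scores)))
```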
In a retrieval-augmented architecture, let R be the set of retrieved documents or contexts. The generation distribution then marginalizes over the retrieved evidence:

P(x_t \mid x_{1:t-1}, R) = \sum_{r \in R} P(r \mid x_{1:t-1})\, P(x_t \mid x_{1:t-1}, r).
To combine multiple retrievals (e.g., from different query formulations), we can use Reciprocal Rank Fusion (RRF):

\mathrm{RRF}(r) = \sum_{q \in Q} \frac{1}{k + \mathrm{rank}_q(r)},

where Q is a set of query variants, rank_q(r) is the rank of document r for query q, and k is a smoothing constant.
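A direct implementation of RRF over ranked lists, one per query variant, is shown below; k = 60 is a commonly used default and is assumed here, and the document identifiers are placeholders.

```python
from collections import defaultdict

# Reciprocal Rank Fusion: RRF(r) = sum over query variants of 1 / (k + rank_q(r)).
def reciprocal_rank_fusion(rankings, k=60):
    scores = defaultdict(float)
    for ranked_docs in rankings:                       # one ranked list per query variant
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

rankings = [
    ["doc_a", "doc_b", "doc_c"],   # ranking for query variant 1
    ["doc_b", "doc_a", "doc_d"],   # ranking for query variant 2
]
print(reciprocal_rank_fusion(rankings))
```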
To promote factual generation during training, we incorporate a regularization term. Let ŷ be the model's predicted distribution and S_verifier(ŷ) be a factuality score from a separate verifier model. Define a combined loss:

\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \mu \sum_{t} \bigl(1 + \sin^2(\phi_t)\bigr)\,\bigl(1 - S_{\mathrm{verifier}}(\hat{y}_t)\bigr),

where μ > 0 weights the factuality term.
The factor 1 + sin²(ϕ_t) amplifies or attenuates the factuality penalty according to each token's phase position, concentrating the regularization on positions the phase model treats as riskier.
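A hedged PyTorch sketch of this combined loss, following the reconstructed form above: the per-token verifier scores and phases are placeholders, and the verifier is assumed to return scores in [0, 1].

```python
import torch

# L = L_CE + mu * mean_t[(1 + sin^2(phi_t)) * (1 - S_verifier(y_hat_t))]
# `verifier_scores` is a hypothetical per-token factuality score in [0, 1]
# produced by a separate verifier model.
def factuality_regularized_loss(ce_loss, verifier_scores, phi, mu=0.1):
    phase_weight = 1.0 + torch.sin(phi) ** 2          # amplifies the penalty by position
    penalty = (phase_weight * (1.0 - verifier_scores)).mean()
    return ce_loss + mu * penalty

ce = torch.tensor(2.31)                               # placeholder cross-entropy loss
scores = torch.tensor([0.9, 0.4, 0.8])                # per-token verifier scores
phi = torch.tensor([0.5, 1.2, 2.0])                   # per-token phases
print(factuality_regularized_loss(ce, scores, phi).item())
```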