Trust in One Round: Confidence Estimation for Large Language Models via Structural Signals
Large language models (LLMs) are increasingly deployed in domains where errors carry high social, scientific, or safety costs. Yet standard confidence estimators, such as token likelihood, semantic similarity, and multi-sample consistency, remain brittle under distribution shift, domain-specialised text, and compute limits. In this work, we present Structural Confidence, a single-pass, model-agnostic framework that predicts output correctness from multi-scale structural signals derived from a model’s final-layer hidden-state trajectory. By combining spectral, local-variation, and global shape descriptors, our method captures internal stability patterns that are missed by token probabilities and sentence embeddings. We conduct an extensive cross-domain evaluation on four heterogeneous benchmarks: FEVER (fact verification), SciFact (scientific claims), WikiBio-hallucination (biographical consistency), and TruthfulQA (truthfulness-oriented QA). Structural Confidence performs strongly against established baselines in terms of AUROC and AUPR. More importantly, unlike sampling-based consistency methods, which require multiple stochastic generations and an auxiliary model, our approach uses a single deterministic forward pass, offering a practical basis for efficient, robust post-hoc confidence estimation in socially impactful, resource-constrained LLM applications.
💡 Research Summary
The paper tackles the problem of post‑hoc confidence estimation for large language models (LLMs) under strict deployment constraints: a single deterministic forward pass, no auxiliary models, and no multi‑sample decoding. Existing approaches (probability‑based, e.g., mean log‑probability; semantic‑embedding‑based, e.g., SBERT similarity; and sampling‑based consistency methods, e.g., SelfCheckGPT) either suffer from miscalibration, degrade under distribution shift, or incur prohibitive compute costs. To overcome these limitations, the authors propose Structural Confidence, a model‑agnostic framework that extracts multi‑scale structural descriptors from the hidden‑state trajectory of a proxy encoder (BERT‑base) applied to the concatenated input‑output text.
The central hypothesis is that confident generations correspond to smooth, low‑frequency hidden‑state trajectories with compact global geometry, whereas uncertain or hallucinated outputs exhibit high‑frequency spectral components, sharp local fluctuations, and fragmented global shape. To operationalize this, three families of descriptors are defined:
- Spectral Stability – a fast Fourier transform (FFT) of the token‑wise hidden‑state trajectory yields a power spectrum; the ratio of low‑frequency to high‑frequency energy quantifies smoothness.
- Local Variation – Pairwise Euclidean and cosine distances between consecutive token embeddings capture abrupt changes; mean and variance of these distances form the feature set.
- Global Shape Coherence – After reducing the trajectory to a low‑dimensional subspace (PCA to 5‑D), metrics such as total path length, average curvature, and convex‑hull volume describe overall geometric consistency.
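The three descriptor families above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the spectral ratio is computed with a naive DFT over a scalar summary signal, and the global-shape family is simplified to path length and mean turning angle (the paper's PCA projection and convex-hull volume are omitted); the `cutoff` frequency is an assumed parameter.

```python
import math

def local_variation(traj):
    """Mean and variance of consecutive-token Euclidean distances."""
    dists = [math.dist(a, b) for a, b in zip(traj, traj[1:])]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return mean, var

def spectral_ratio(signal, cutoff=0.25):
    """Low-/high-frequency power ratio of a scalar trajectory signal (naive DFT)."""
    n = len(signal)
    low = high = 0.0
    for k in range(1, n // 2 + 1):  # skip the DC component
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(signal))
        power = re * re + im * im
        if k / n <= cutoff:
            low += power
        else:
            high += power
    return low / (high + 1e-9)

def global_shape(traj):
    """Total path length and mean turning angle of the trajectory."""
    length = sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))
    angles = []
    for a, b, c in zip(traj, traj[1:], traj[2:]):
        u = [y - x for x, y in zip(a, b)]
        v = [y - x for x, y in zip(b, c)]
        nu, nv = math.hypot(*u), math.hypot(*v)
        if nu and nv:
            cos = sum(p * q for p, q in zip(u, v)) / (nu * nv)
            angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return length, (sum(angles) / len(angles) if angles else 0.0)
```

On this sketch, a slowly varying signal yields a much larger `spectral_ratio` than a token-to-token oscillation, matching the intuition that smooth trajectories signal confident generations.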
These features (≈30 dimensions) are fed into a lightweight multilayer perceptron (two hidden layers with ReLU) trained on labeled correctness data from each benchmark. The authors also experiment with concatenating a sentence‑level SBERT embedding, showing modest gains but confirming that structural features alone are highly predictive.
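A confidence head of this shape can be sketched as follows. This is illustrative only: the hidden width, initialisation, and seed are assumptions, and the training loop on labelled correctness data is omitted, so the score is meaningless until the head is trained.

```python
import math
import random

def _linear(v, weights, biases):
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, biases)]

def _relu(v):
    return [max(0.0, x) for x in v]

class ConfidenceHead:
    """Two ReLU hidden layers + sigmoid output, mapping the ~30-D
    structural feature vector to a correctness probability."""

    def __init__(self, n_features=30, hidden=16, seed=0):
        rng = random.Random(seed)

        def init(n_in, n_out):
            w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
            return w, [0.0] * n_out

        self.l1 = init(n_features, hidden)
        self.l2 = init(hidden, hidden)
        self.out = init(hidden, 1)

    def __call__(self, feats):
        h = _relu(_linear(feats, *self.l1))
        h = _relu(_linear(h, *self.l2))
        z = _linear(h, *self.out)[0]
        return 1.0 / (1.0 + math.exp(-z))  # confidence score in (0, 1)
```

Concatenating an SBERT sentence embedding, as the authors also try, would simply widen `n_features`.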
Because production LLM APIs (e.g., GPT‑4o) do not expose internal activations, the method uses a frozen encoder‑only model as a proxy to generate the hidden‑state trajectory. Tokenizer mismatches are mitigated by the fact that the descriptors rely on activation geometry rather than exact token alignment. The trajectory is truncated or padded to a maximum of 256 tokens to keep computation bounded.
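The truncate-or-pad step can be sketched in a few lines; the 256-token budget comes from the summary, while the 768-dimensional zero-padding (matching BERT-base hidden size) is an assumption.

```python
def fit_length(traj, max_len=256, dim=768):
    """Truncate or zero-pad a hidden-state trajectory to exactly max_len vectors.

    dim=768 matches BERT-base; padding with zero vectors is an assumed choice.
    """
    traj = traj[:max_len]
    padding = [[0.0] * dim for _ in range(max_len - len(traj))]
    return traj + padding
```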
Experiments span four heterogeneous factuality and hallucination benchmarks: FEVER (fact verification), SciFact (scientific claim verification), WikiBio‑hallucination (biography consistency), and TruthfulQA (truthfulness‑oriented QA). Structural Confidence consistently outperforms baselines in AUROC (0.86–0.93) and AUPR (0.81–0.90). The advantage is most pronounced on domain‑shifted datasets (SciFact, TruthfulQA), where probability‑based scores collapse and embedding‑based methods lose discriminative power. Compared to the sampling‑based SelfCheckGPT, the proposed method reduces FLOPs by a factor of 5–6 and latency by 4–5× (≈45 ms per confidence estimation), making it suitable for real‑time web services.
Ablation studies reveal that each descriptor family contributes uniquely: removing spectral features drops AUROC by 0.02–0.04, and removing local variation yields a similar decline. Swapping the proxy encoder for RoBERTa‑large yields a marginal performance boost (≈0.5 % AUROC) at the cost of 1.8× more compute, confirming the trade‑off between encoder capacity and efficiency.
Limitations include dependence on the chosen proxy encoder (different encoders may produce divergent trajectories) and potential information loss when truncating long outputs. Future work is suggested on (i) jointly modeling multiple layers and scales of the trajectory, (ii) integrating attention‑map dynamics, and (iii) learning structural descriptors in a self‑supervised manner without labeled correctness data.
In summary, Structural Confidence introduces a novel, single‑pass, model‑agnostic confidence signal derived from hidden‑state structural stability. It achieves state‑of‑the‑art accuracy on diverse factuality tasks while dramatically reducing computational overhead, thereby offering a practical solution for trustworthy LLM deployment in resource‑constrained, high‑throughput environments.