Factuality on Demand: Controlling the Factuality-Informativeness Trade-off in Text Generation
Large language models (LLMs) encode knowledge with varying degrees of confidence. When responding to queries, models face an inherent trade-off: they can generate responses that are less informative but highly factual, or more informative but potentially less accurate. Different applications demand different balances between informativeness and factuality. We introduce Factuality-Controlled Generation (FCG), a framework that enables users to specify factuality constraints alongside their queries. We propose to evaluate FCG performance on two dimensions: adherence to factuality constraints and response informativeness. We propose to train models on the FCG task using synthetic data, and show that our synthetic training significantly improves models’ ability to both respect factuality requirements and maintain informativeness in their outputs.
💡 Research Summary
Large language models (LLMs) excel at generating fluent text but often struggle to balance factual correctness with informational richness. In many applications—such as medical advice or legal analysis—high factuality is paramount, whereas creative writing or brainstorming can tolerate occasional inaccuracies in exchange for more detailed content. Existing LLMs provide no built‑in mechanism to let users explicitly control this trade‑off; prompting with “be more factual” yields inconsistent results, and even state‑of‑the‑art models frequently fail to meet moderate factuality targets measured by FactScore.
The paper introduces Factuality‑Controlled Generation (FCG), a new task and framework that lets a user specify a desired factuality level c (e.g., 80% of statements must be correct) together with a question x. The model must then produce an answer that satisfies f(o) ≥ c (where f is the FactScore‑based factuality metric) while maximizing the number of atomic facts it conveys. To train such a controllable model, the authors create a synthetic dataset because no labeled (question, factuality‑target, answer) triples exist.
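The two evaluation dimensions can be sketched as a small scoring routine. This is a minimal illustration, not the paper's evaluation code: it assumes FactScore-style factuality is the fraction of atomic facts judged correct, with the per-fact correctness labels supplied as input.

```python
def factscore(fact_labels):
    """FactScore-style factuality f(o): fraction of atomic facts judged correct."""
    return sum(fact_labels) / len(fact_labels) if fact_labels else 0.0

def fcg_score(fact_labels, c):
    """Score an FCG response along both dimensions:
    - constraint adherence: does f(o) >= c hold?
    - informativeness: how many atomic facts does the answer convey?
    """
    f = factscore(fact_labels)
    return f >= c, f, len(fact_labels)

# A response conveying 5 atomic facts, 4 of them correct, meets a c = 0.8 target:
ok, f, n = fcg_score([True, True, True, True, False], c=0.8)
# ok is True, f is 0.8, n is 5
```

Among answers that satisfy the constraint, the more informative one (more atomic facts) is preferred, which is what makes the trade-off non-trivial: adding facts raises informativeness but risks pushing f(o) below c.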
Synthetic data generation proceeds as follows. For each question, a base LLM (GPT‑4 in the experiments) first generates an unrestricted answer r₀. The answer is segmented into atomic facts using a separate segmenter. Each fact is then queried back to the same model with a “True or False?” prompt, yielding a confidence score h_M(aᵢ) ∈ [0, 1].
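The pipeline above (generate, segment, probe each fact with "True or False?") might be sketched as follows. The LLM calls are replaced with a hypothetical `prob_true_fn` stub, and filtering facts by confidence to hit a target level c is one plausible greedy construction, not necessarily the paper's exact procedure.

```python
def true_false_confidence(fact, prob_true_fn):
    """Probe the model with a "True or False?" prompt and take the
    probability mass on "True" as the confidence score h_M(a_i)."""
    return prob_true_fn(f"{fact} True or False?")

def build_target_answer(atomic_facts, prob_true_fn, c):
    """Greedily keep the highest-confidence facts while the estimated
    factuality (mean confidence of kept facts) stays at or above c.
    A heuristic sketch of how scored facts could become training answers."""
    scored = sorted(
        ((true_false_confidence(a, prob_true_fn), a) for a in atomic_facts),
        reverse=True,
    )
    kept, conf_sum = [], 0.0
    for h, fact in scored:
        # Estimated factuality if this fact were added:
        if (conf_sum + h) / (len(kept) + 1) >= c:
            kept.append(fact)
            conf_sum += h
    return kept

# Toy confidence table standing in for the actual model probe:
confidences = {"Paris is in France.": 0.99, "Paris has 10 moons.": 0.05}
prob = lambda prompt: next(v for k, v in confidences.items() if k in prompt)
answer = build_target_answer(list(confidences), prob, c=0.8)
# keeps only the high-confidence fact
```

In practice the confidence would come from the model's token probabilities on the "True"/"False" continuation; the greedy mean-confidence filter is just one way to assemble answers matching different factuality targets.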