Neurosymbolic Language Reasoning as Satisfiability Modulo Theory

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Natural language understanding requires interleaving textual and logical reasoning, yet large language models often fail to perform such reasoning reliably. Existing neurosymbolic systems combine LLMs with solvers but remain limited to fully formalizable tasks such as math or program synthesis, leaving natural documents with only partial logical structure unaddressed. We introduce Logitext, a neurosymbolic language that represents documents as natural language text constraints (NLTCs), making partial logical structure explicit. We develop an algorithm that integrates LLM-based constraint evaluation with satisfiability modulo theory (SMT) solving, enabling joint textual-logical reasoning. Experiments on a new content moderation benchmark, together with LegalBench and Super-Natural Instructions, show that Logitext improves both accuracy and coverage. This work is the first to treat LLM-based reasoning as an SMT theory, extending neurosymbolic methods beyond fully formalizable domains.


💡 Research Summary

The paper tackles a fundamental limitation of current large language models (LLMs): while they excel at generating fluent text, they often fail to perform reliable logical reasoning, especially when the reasoning must be interleaved with textual interpretation. Existing neurosymbolic approaches address this by coupling LLMs with solvers, but they are confined to fully formalizable domains such as mathematics or program synthesis. Real‑world documents—policies, statutes, user guidelines—contain a mixture of textual concepts that are hard to formalize (e.g., “hateful”, “immediate threat”) and logical structures that can be expressed with Boolean operators. The authors argue that such hybrid documents require an approach that can handle partial logical structure while still leveraging the nuanced understanding of LLMs for the textual parts.

To fill this gap, they introduce Logitext, a neurosymbolic language that represents documents as a collection of Natural Language Text Constraints (NLTCs). An NLTC binds a natural‑language clause to a Boolean variable and records any sub‑clauses that refer to other variables (e.g., the target group in a moderation policy). Logitext programs consist of four constructs: (1) variable declarations (both Boolean and typed strings), (2) textual let‑bindings that embed NLTCs, (3) logical constraint blocks written in a Z3‑compatible syntax, and (4) convenience constructs such as forall and forsome that compactly express repeated patterns. The language makes the logical skeleton of a document explicit while preserving the original textual fragments for LLM evaluation.
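The core idea of an NLTC (a natural-language clause bound to a Boolean variable, with references to other variables) can be sketched in plain Python. This is an illustrative data-structure sketch only: the class name `NLTC`, the field names, and the toy moderation rule are assumptions, not the paper's actual Logitext syntax.

```python
from dataclasses import dataclass, field

@dataclass
class NLTC:
    # Hypothetical sketch of a Natural Language Text Constraint:
    # a natural-language clause bound to a Boolean variable,
    # plus any string variables the clause refers to.
    var: str                                   # Boolean variable bound to the clause
    clause: str                                # the natural-language clause itself
    refs: list = field(default_factory=list)   # referenced string variables

# A toy "Logitext-style" program for one moderation rule:
#   remove  <=>  hateful AND targets_protected_group
nltcs = [
    NLTC("hateful", "The post contains hateful language."),
    NLTC("targets_protected_group",
         "The post targets the group {group}.", refs=["group"]),
]
logical_constraint = "remove == (hateful and targets_protected_group)"
```

The textual clauses stay as strings for later LLM evaluation, while the logical constraint over the bound Boolean variables is kept in solver-readable form, mirroring the split the language design describes.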

The core technical contribution is an algorithm that treats LLM‑based textual evaluation as a new theory within an SMT (Satisfiability Modulo Theory) solver. The outer loop is a conventional SMT solver that proposes Boolean assignments for the logical variables. For each unassigned string variable, an inner NL‑Solver invokes an LLM in two phases: (a) LLMPropose generates a candidate text that satisfies the current NLTCs under the given assignment, and (b) LLMVerify checks whether the candidate indeed fulfills each NLTC. If verification fails, the NL‑Solver refines its proposal, possibly using feedback about which constraints were violated. When no suitable text can be found for a given Boolean assignment, the outer SMT solver backtracks and tries a different logical assignment. The check() function optionally enumerates multiple satisfying assignments (cover=True), enabling both classification (single assignment) and generation (multiple assignments) tasks.
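The described two-level search can be sketched as follows. The outer loop stands in for the SMT solver's enumeration of Boolean assignments; `llm_propose` and `llm_verify` are stand-ins for the paper's LLMPropose and LLMVerify phases, stubbed here with canned behavior since no model is attached. All names and the refinement budget are illustrative assumptions.

```python
from itertools import product

def llm_propose(nltcs, assignment):
    # Stub for LLMPropose: would ask an LLM for a candidate text that
    # satisfies the NLTCs under the current Boolean assignment.
    return "candidate text"

def llm_verify(text, nltc, assignment):
    # Stub for LLMVerify: would ask an LLM whether `text` fulfills the
    # clause. Stubbed to accept, so the first proposal succeeds.
    return True

def solve(bool_vars, logical_ok, nltcs, max_refinements=3):
    """Outer SMT-style search over Boolean assignments with an inner
    NL-Solver propose/verify loop. When no text can be found for an
    assignment, control backtracks to the next assignment."""
    for values in product([False, True], repeat=len(bool_vars)):
        assignment = dict(zip(bool_vars, values))
        if not logical_ok(assignment):          # logical constraints prune first
            continue
        for _ in range(max_refinements):        # inner refinement loop
            text = llm_propose(nltcs, assignment)
            if all(llm_verify(text, c, assignment) for c in nltcs):
                return assignment, text         # satisfying model found
        # verification kept failing for this assignment -> backtrack
    return None
```

With real LLM calls, failed verifications would also feed back which constraints were violated, guiding the next proposal rather than restarting blindly.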

The authors construct a new content moderation benchmark comprising five policies with 6–21 clauses each. They define two types of reasoning gaps: compositional gaps (Δ)—the performance drop when a single LLM must handle both textual and logical reasoning versus a staged approach—and combinatorial gaps (Δ′)—the inability of LLMs to enumerate all satisfying assignments for combinatorial generation tasks. Experiments reveal that even state‑of‑the‑art models (GPT‑5) exhibit large Δ′ (over 99% of satisfying assignments missed), while Δ shrinks with model size but remains non‑trivial. Logitext dramatically reduces both gaps: it matches the exhaustive enumeration of a traditional SMT solver (Z3) for combinatorial tasks and improves classification accuracy by several points across all policies.
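The combinatorial gap Δ′ can be made concrete with a small sketch: exhaustively enumerate the satisfying assignments of a toy policy (the ground truth a solver like Z3 provides) and measure the fraction a model failed to produce. The function names and the toy formula are illustrative assumptions, not the paper's benchmark.

```python
from itertools import product

def all_satisfying(bool_vars, formula):
    """Exhaustively enumerate satisfying assignments over Boolean
    variables, the way a complete SMT enumeration would."""
    return {values
            for values in product([False, True], repeat=len(bool_vars))
            if formula(dict(zip(bool_vars, values)))}

def combinatorial_gap(found, ground_truth):
    """Delta-prime: fraction of satisfying assignments that were missed."""
    return 1.0 - len(found & ground_truth) / len(ground_truth)

# Toy policy: remove iff (hateful or threat); 4 satisfying assignments.
variables = ["hateful", "threat", "remove"]
truth = all_satisfying(
    variables, lambda a: a["remove"] == (a["hateful"] or a["threat"]))
llm_found = {next(iter(truth))}   # a model that emits a single assignment
gap = combinatorial_gap(llm_found, truth)   # misses 3 of 4
```

A model that returns one valid assignment out of four already incurs Δ′ = 0.75; on realistic policies with many clauses the assignment space grows exponentially, which is why prompting-only enumeration misses over 99% of it.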

Beyond moderation, Logitext is evaluated on LegalBench (legal question answering) and Super‑Natural Instructions (complex instruction following). In both settings, the integration of textual constraints with logical solving yields consistent gains in accuracy and coverage compared to baselines that rely solely on prompting or on naïve staged pipelines.

The paper’s contributions can be summarized as follows:

  1. Conceptual Insight – a clear articulation of why interleaved textual‑logical reasoning is essential and why staged pipelines fall short.
  2. Language Design – the definition of Logitext and NLTCs, providing a formal yet flexible interface between natural language and SMT.
  3. Solver Architecture – an algorithm that embeds LLM‑based constraint solving as an SMT‑compatible theory, enabling tight cooperation between logical propagation and textual evaluation.
  4. Empirical Validation – new benchmarks and extensive experiments demonstrating superior performance on compositional and combinatorial tasks across multiple domains.

In conclusion, the work pioneers the treatment of LLM reasoning as an SMT theory, extending neurosymbolic methods beyond fully formalizable domains. By allowing partial logical structure to be expressed directly in natural language and by tightly coupling LLM evaluation with logical solving, Logitext opens a pathway toward reliable, scalable reasoning over real‑world documents that blend ambiguous textual concepts with precise logical rules. Future directions include scaling to larger, more diverse corpora (e.g., medical records, policy analysis) and exploring richer theories (temporal, probabilistic) within the same framework.

