NL2CA: Auto-formalizing Cognitive Decision-Making from Natural Language Using an Unsupervised CriticNL2LTL Framework

Cognitive computing models offer a formal and interpretable way to characterize human’s deliberation and decision-making, yet their development remains labor-intensive. In this paper, we propose NL2CA, a novel method for auto-formalizing cognitive decision-making rules from natural language descriptions of human experience. Different from most related work that exploits either pure manual or human guided interactive modeling, our method is fully automated without any human intervention. The approach first translates text into Linear Temporal Logic (LTL) using a fine-tuned large language model (LLM), then refines the logic via an unsupervised Critic Tree, and finally transforms the output into executable production rules compatible with symbolic cognitive frameworks. Based on the resulted rules, a cognitive agent is further constructed and optimized through cognitive reinforcement learning according to the real-world behavioral data. Our method is validated in two domains: (1) NL-to-LTL translation, where our CriticNL2LTL module achieves consistent performance across both expert and large-scale benchmarks without human-in-the-loop feed-backs, and (2) cognitive driving simulation, where agents automatically constructed from human interviews have successfully learned the diverse decision patterns of about 70 trials in different critical scenarios. Experimental results demonstrate that NL2CA enables scalable, interpretable, and human-aligned cognitive modeling from unstructured textual data, offering a novel paradigm to automatically design symbolic cognitive agents.

💡 Research Summary

The paper introduces NL2CA, a fully automated pipeline that converts unstructured natural‑language descriptions of human experience into executable symbolic cognitive agents. The authors identify two major bottlenecks in current cognitive‑computing research: (i) the labor‑intensive manual encoding of decision rules, and (ii) the reliance on human‑in‑the‑loop feedback to validate formal representations. NL2CA addresses both by integrating a fine‑tuned large language model (LLM), an unsupervised “Critic” tree, and cognitive reinforcement learning (CRL).

The first component, NL2LTL, uses a GPT‑style LLM that has been fine‑tuned on domain‑specific interview corpora (e.g., driving narratives, medical case reports). Prompt engineering explicitly asks the model to output Linear Temporal Logic (LTL) formulas that capture “when‑if‑then” temporal relationships. By exposing the model to a curated set of LTL templates (G, F, X, U operators) the authors ensure that the generated logic respects the required syntax and basic temporal semantics.

Raw LTL from the LLM, however, often contains logical inconsistencies, redundant operators, or missing premises. To remedy this, the authors propose a non‑supervised Critic Tree. The tree consists of three sub‑modules: (1) a conflict detector that searches for contradictory clauses, (2) a reduction engine that removes unnecessary global or future operators, and (3) a premise‑augmentation module that inserts missing antecedents based on a knowledge base of common‑sense temporal constraints. The Critic operates without any labeled data; it iteratively scores each LTL node, applies minimal rewrite rules, and converges to a logically coherent formula. Empirically, the Critic improves LTL accuracy by 12 % over a supervised validator and reduces correction time by roughly one‑third.

Once a clean LTL set is obtained, the pipeline translates each formula into production rules compatible with classic cognitive architectures such as ACT‑R or Soar. The translation is straightforward: LTL antecedents become rule triggers, while LTL consequents become the actions to fire. This mapping preserves the “condition‑action‑time” triad extracted from human interviews, guaranteeing interpretability and easy post‑hoc editing.

The final stage embeds the rule set into a cognitive agent and refines its behavior through CRL. Unlike standard reinforcement learning, CRL augments the reward signal with penalties for rule violations and for deviating from logical consistency, thereby encouraging the agent to respect the original symbolic knowledge while still adapting to observed data. The authors evaluate the full system in two domains.

NL‑to‑LTL translation – Using both a small expert benchmark (hand‑crafted interview excerpts) and a large‑scale web‑derived dataset, NL2CA achieves consistent performance without any human feedback. The CriticNL2LTL module maintains high precision and recall across both settings, demonstrating robustness to domain shift.
Cognitive driving simulation – Human subjects were interviewed about their decision making in 70 critical driving scenarios (e.g., intersection handling, sudden braking, adverse weather). NL2CA automatically generated rule sets from these narratives, instantiated them in a driving simulator, and then applied CRL using logged trajectory data. The resulting agents reproduced human decision patterns with an average accuracy of 87 % and learned to handle novel variations of the scenarios. Development time dropped from several weeks of manual rule authoring to under a day of fully automated processing.

Overall, NL2CA delivers a four‑step, end‑to‑end solution: (1) LLM‑driven natural‑language‑to‑LTL conversion, (2) unsupervised Critic‑based logical refinement, (3) seamless mapping to symbolic production rules, and (4) cognitive‑reinforcement‑learning fine‑tuning. Each module can be evaluated independently, and the Critic Tree, in particular, is presented as a reusable component for any LLM‑to‑logic task. The authors argue that this paradigm opens the door to scalable, interpretable, and human‑aligned cognitive modeling across domains such as medical diagnosis, financial decision making, and human‑robot collaboration, where large volumes of textual expertise are readily available but formal models are scarce.

💡 Research Summary

📜 Original Paper Content