Knowledge Model Prompting Increases LLM Performance on Planning Tasks

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Large language models (LLMs) can struggle with reasoning and planning tasks. Many prompting techniques have been developed to assist LLM reasoning, notably Chain-of-Thought (CoT); however, these techniques, too, have come under scrutiny as LLMs' ability to reason at all has been questioned. Borrowing from cognitive and educational science, this paper investigates whether the Task-Method-Knowledge (TMK) framework can improve LLM reasoning beyond its previously demonstrated success in educational applications. TMK's ability to capture causal, teleological, and hierarchical reasoning structures, combined with its explicit task-decomposition mechanisms, makes it well suited to addressing language-model reasoning deficiencies; unlike other hierarchical frameworks such as HTN and BDI, TMK explicitly represents not just what to do and how to do it, but also why actions are taken. The study evaluates TMK on the PlanBench benchmark, focusing on the Blocksworld domain to test reasoning and planning capabilities, and examines whether TMK-structured prompting helps language models decompose complex planning problems into manageable sub-tasks. The results also highlight a significant performance inversion in reasoning models: TMK prompting enables the reasoning model to reach an accuracy of up to 97.3% on opaque, symbolic tasks (the Random variants of Blocksworld in PlanBench) where it previously failed (31.5%), suggesting the potential to bridge the gap between semantic approximation and symbolic manipulation. The findings suggest that TMK functions not merely as context but as a mechanism that, in the context of these experiments, steers reasoning models away from their default linguistic modes and toward formal, code-execution pathways.


💡 Research Summary

This paper investigates whether the Task‑Method‑Knowledge (TMK) framework, originally devised in cognitive and educational science, can be leveraged as a prompting strategy to improve large language models’ (LLMs) performance on formal planning tasks. The authors focus on the PlanBench benchmark, specifically the Blocksworld domain, and evaluate three variants of the task: Classic (standard English labels), Mystery (semantically unrelated labels), and Random (opaque alphanumeric labels). The Random variant is deliberately designed to strip away any semantic cues, forcing the model to rely on purely symbolic reasoning rather than pattern‑matching.
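To make the variants concrete, the relabeling that produces the Random variant can be sketched as a function that replaces every action and predicate name with an opaque alphanumeric token. This is an illustrative sketch, not the benchmark's actual generation code; the vocabulary list and token format are assumptions.

```python
import random
import string

def make_opaque_labels(names, seed=0, length=8):
    """Map each domain name to a random alphanumeric token,
    stripping any semantic cue from the original label."""
    rng = random.Random(seed)
    return {
        name: "".join(rng.choices(string.ascii_lowercase + string.digits, k=length))
        for name in names
    }

# Classic Blocksworld vocabulary (actions and predicates) -- illustrative subset.
classic = ["pick-up", "put-down", "stack", "unstack",
           "on", "ontable", "clear", "holding"]
opaque = make_opaque_labels(classic)
# A model now sees tokens like opaque["pick-up"] instead of "pick-up",
# so it cannot lean on the English meaning of the action name.
```

With such a mapping applied to the whole problem statement, success requires tracking the symbolic structure of the domain rather than pattern-matching on familiar words.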

The TMK framework decomposes a problem into three explicit components: (1) Task – the goal and its “why”, (2) Method – the procedural mechanisms that achieve the goal, and (3) Knowledge – the domain ontology that defines objects, predicates, and relations. In the experiments the authors encode the entire Blocksworld domain as a JSON‑structured TMK prompt, preserving pre‑conditions, post‑conditions, input/output parameters, and causal links between goals and methods. This prompt replaces the standard PlanBench prompt used in prior work.
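A minimal sketch of what one operator's TMK entry might look like as structured JSON follows. The paper's exact schema is not reproduced in this summary, so every field name below (`task`, `method`, `knowledge`, `purpose`, and so on) is a hypothetical rendering of the three components described above.

```python
import json

# Hypothetical TMK entry for the Blocksworld "stack" operator.
tmk_stack = {
    "task": {
        "name": "stack",
        "goal": "block ?x rests on block ?y",           # the "what"
        "purpose": "build the target tower bottom-up",  # the "why" (teleology)
    },
    "method": {
        "inputs": ["?x", "?y"],
        "preconditions": ["holding(?x)", "clear(?y)"],
        "postconditions": ["on(?x, ?y)", "clear(?x)", "handempty"],
    },
    "knowledge": {
        "objects": ["block"],
        "predicates": ["on", "ontable", "clear", "holding", "handempty"],
    },
}

# Serialized, this becomes one fragment of the TMK prompt.
prompt_fragment = json.dumps(tmk_stack, indent=2)
```

The key point is that each operator carries its purpose and its causal pre/post-condition structure explicitly, rather than leaving the model to infer them from the action's English name.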

Four state‑of‑the‑art models (GPT‑4, GPT‑4‑Turbo, Claude‑3‑Opus, and OpenAI's o1) are evaluated in zero‑shot and one‑shot settings, both with and without the TMK prompt. Performance is measured as the proportion of generated plans that pass a full symbolic validation pipeline (PDDL translation, planner execution, and plan verification). The results are striking: on Random Blocksworld, baseline models achieve only about 31.5% accuracy, whereas TMK‑augmented prompts raise accuracy to 97.3% on the best model (o1), a 65.8‑percentage‑point improvement. Even on the semantically richer Classic variant, TMK yields modest gains, while on Mystery the improvements are mixed, suggesting that TMK's advantage is most pronounced when linguistic priors are unhelpful.
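The plan-verification step of such a pipeline can be illustrated with a simplified stand-in: simulate each action against the current world state, reject the plan on any precondition violation, and accept only if the goal holds at the end. This is a toy sketch of the idea, not the paper's actual PDDL validator; the state encoding (`block -> support`, with `"table"` as a pseudo-support) is an assumption.

```python
def verify_plan(initial, plan, goal):
    """Simulate a Blocksworld plan step by step.
    initial/goal: dicts mapping block -> what it sits on ("table" or a block).
    plan: list of tuples like ("stack", "A", "B")."""
    on = dict(initial)   # current support of each non-held block
    holding = None       # the single block in the gripper, if any

    def clear(b):        # nothing on top of b and b is not being held
        return holding != b and b not in on.values()

    for act, *args in plan:
        if act == "unstack":                     # lift ?x off block ?y
            x, y = args
            if holding is not None or on.get(x) != y or not clear(x):
                return False
            del on[x]; holding = x
        elif act == "pick-up":                   # lift ?x off the table
            (x,) = args
            if holding is not None or on.get(x) != "table" or not clear(x):
                return False
            del on[x]; holding = x
        elif act == "stack":                     # place held ?x on ?y
            x, y = args
            if holding != x or not clear(y):
                return False
            on[x] = y; holding = None
        elif act == "put-down":                  # place held ?x on the table
            (x,) = args
            if holding != x:
                return False
            on[x] = "table"; holding = None
        else:
            return False                         # unknown operator
    return holding is None and all(on.get(b) == s for b, s in goal.items())

# Two blocks on the table; goal: A on B.
plan = [("pick-up", "A"), ("stack", "A", "B")]
print(verify_plan({"A": "table", "B": "table"}, plan, {"A": "B"}))  # True
```

Because validation of this kind checks every precondition symbolically, a plan cannot pass by being merely plausible-sounding, which is why the Random-variant accuracy numbers are a meaningful test of symbolic competence.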

The authors interpret these findings through two lenses. First, by making the “why” explicit, TMK provides a causal, teleological scaffold that guides the model to construct logically coherent sub‑goals rather than relying on surface‑level token correlations. Second, the TMK prompt appears to act as a “reasoning steering mechanism”: it nudges the model’s internal generation process away from its default probabilistic language mode toward a latent symbolic or code‑execution pathway. Evidence for this claim includes the fact that TMK‑generated plans consistently satisfy the strict syntactic and semantic constraints of the planner, something that CoT or ReAct prompts rarely achieve without extensive example‑matching.

The paper also discusses limitations. Constructing a TMK representation requires domain‑expert effort and was limited to a three‑level hierarchy for this study; deeper hierarchies could capture more complex procedural knowledge but would increase prompt length. Moreover, the observed benefits diminish for models that have been aggressively compressed (quantized, pruned, or distilled), indicating that TMK’s effectiveness depends on the model’s representational capacity. The authors suggest future work on automated TMK generation, integration with external symbolic solvers, and testing across a broader set of planning domains.

In conclusion, the study demonstrates that TMK‑structured prompting can dramatically improve LLM planning performance, especially in settings where semantic cues are removed. By explicitly encoding goals, methods, and domain knowledge, TMK not only supplies context but also reorients the model’s reasoning dynamics toward formal symbolic manipulation, offering a promising avenue for bridging the gap between language‑model approximation and true procedural planning.

