Do We Really Need SFT? Prompt-as-Policy over Knowledge Graphs for Cold-start Next POI Recommendation


Next point-of-interest (POI) recommendation is a key component of smart urban services, yet it remains challenging under cold-start conditions with sparse user-POI interactions. Recent LLM-based methods address this issue through either supervised fine-tuning (SFT) or in-context learning (ICL), but SFT is costly and prone to overfitting active users, while static prompts in ICL lack adaptability to diverse user contexts. We argue that the main limitation lies not in LLM reasoning ability, but in how contextual evidence is constructed and presented. Accordingly, we propose Prompt-as-Policy over knowledge graphs (KG), a reinforcement-guided prompting framework that formulates prompt construction as a learnable decision process, while keeping the LLM frozen as a reasoning engine. To enable structured prompt optimization, we organize heterogeneous user-POI signals into a KG and transform mined relational paths into evidence cards, which serve as atomic semantic units for prompt composition. A contextual bandit learner then optimizes a prompt policy that adaptively determines (i) which relational evidences to include, (ii) how many evidences to retain per candidate POI, and (iii) how to organize and order them within the prompt. Experiments on three real-world datasets show that Prompt-as-Policy consistently outperforms state-of-the-art baselines, achieving an average 11.87% relative improvement in Acc@1 for inactive users, while maintaining competitive performance for active users, without any model fine-tuning.


💡 Research Summary

The paper tackles the cold‑start next‑point‑of‑interest (POI) recommendation problem by proposing a novel framework called Prompt‑as‑Policy, which treats prompt construction for a frozen large language model (LLM) as a learnable decision process. Traditional approaches either fine‑tune LLMs (Supervised Fine‑Tuning, SFT) – incurring high computational cost and overfitting to active users – or rely on static in‑context learning (ICL) prompts that cannot adapt to diverse user contexts. The authors argue that the bottleneck lies not in the LLM’s reasoning ability but in how contextual evidence is assembled and presented.

To make prompt optimization tractable, the authors first build a heterogeneous knowledge graph (KG) that encodes users, POIs, categories, spatial grid cells, time slots, intents, and profile anchors, together with typed relations such as visited, prefersIntent, near, and activeInTime. Spatial grids are obtained via k‑means clustering of POI coordinates, and a near relation links POIs within a 10 km radius. Using this KG, a multi‑hop breadth‑first search (BFS) from a user node discovers candidate POIs through diverse relational paths (e.g., user → intent → category → POI, user → POI → grid → POI, user → POI → near → POI). Each discovered path is summarized by the LLM into a concise natural‑language rationale; for each candidate POI, up to M such rationales are selected from the pool of all mined rationales and bundled into an “evidence card”.
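The KG construction and BFS path mining described above can be sketched as follows. This is a minimal, illustrative reconstruction, not the authors' implementation: the edge schema (visited / hasCategory / inGrid and their inverses) and all function names are assumptions based on the summary, and the intent, time-slot, and near relations are omitted for brevity.

```python
from collections import defaultdict

def build_kg(checkins, poi_meta):
    """Adjacency list of typed edges: node -> list of (relation, neighbor).
    checkins: iterable of (user_id, poi_id); poi_meta: poi_id -> (category, grid).
    The relation names are illustrative assumptions, not the paper's schema."""
    G = defaultdict(list)
    for user, poi in checkins:
        G[f"user:{user}"].append(("visited", f"poi:{poi}"))
    for poi, (cat, grid) in poi_meta.items():
        G[f"poi:{poi}"].append(("hasCategory", f"cat:{cat}"))
        G[f"cat:{cat}"].append(("categoryOf", f"poi:{poi}"))
        G[f"poi:{poi}"].append(("inGrid", f"grid:{grid}"))
        G[f"grid:{grid}"].append(("gridOf", f"poi:{poi}"))
    return G

def mine_paths(G, user, max_hops=3):
    """Breadth-first search from the user node; collect relational paths
    that end at POIs the user has NOT visited (candidate discovery)."""
    visited_pois = {v for r, v in G[f"user:{user}"] if r == "visited"}
    paths, frontier = [], [[f"user:{user}"]]
    for _ in range(max_hops):
        nxt = []
        for path in frontier:
            for rel, v in G[path[-1]]:
                if v in path:                      # avoid revisiting nodes
                    continue
                new_path = path + [f"-{rel}->", v]
                if v.startswith("poi:") and v not in visited_pois:
                    paths.append(new_path)         # candidate POI reached
                nxt.append(new_path)
        frontier = nxt
    return paths
```

Each mined path (e.g., user → visited → POI → hasCategory → category → categoryOf → POI) would then be handed to the LLM for summarization into a one-sentence rationale.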

Prompt construction then comprises three parts: (i) a compact user‑context header (time slot, last location, brief profile summary, inferred intents), (ii) a candidate list (ID, category, distance), and (iii) the evidence cards, one per candidate. The key innovation is a contextual bandit learner that decides, for each recommendation round, (a) which evidences to include, (b) how many rationales per candidate (the cap M), and (c) the ordering of selected rationales. The bandit receives as reward the quality of the LLM’s output – a strict JSON ranking limited to the candidate set – penalizing schema violations or out‑of‑candidate IDs, and also incorporating a penalty for overly long prompts to encourage concise yet informative prompts.

Experiments on three real‑world Foursquare datasets (representing different cities) evaluate both inactive users (≤ 5 historical check‑ins) and active users (> 20 check‑ins). Prompt‑as‑Policy achieves an average 11.87% relative improvement in Acc@1 for inactive users compared with state‑of‑the‑art SFT baselines, while maintaining competitive performance for active users. Ablation studies show that removing the KG, using static prompts, or replacing the bandit with random selection each degrades performance substantially, confirming the importance of structured evidence, adaptive selection, and reinforcement‑guided optimization.

In summary, the work demonstrates that by keeping the LLM frozen and focusing on dynamic, policy‑driven prompt construction over a knowledge‑graph‑derived evidence space, one can obtain robust cold‑start recommendation without the expense of fine‑tuning. The approach opens avenues for extending policy‑based prompting to other recommendation domains and for exploring richer policy architectures (e.g., transformer‑based policies) to further boost performance.
