AMEM4Rec: Leveraging Cross-User Similarity for Memory Evolution in Agentic LLM Recommenders
Agentic systems powered by Large Language Models (LLMs) have shown strong potential in recommender systems but remain hindered by several challenges. Fine-tuning LLMs is parameter-inefficient, and prompt-based agentic reasoning is limited by context length and hallucination risk. Moreover, existing agentic recommendation systems predominantly leverage semantic knowledge while neglecting the collaborative filtering (CF) signals essential for implicit preference modeling. To address these limitations, we propose AMEM4Rec, an agentic LLM-based recommender that learns collaborative signals in an end-to-end manner through cross-user memory evolution. AMEM4Rec stores abstract user behavior patterns distilled from user histories in a global memory pool. Within this pool, memories are linked to similar existing ones and iteratively evolved to reinforce shared cross-user patterns, enabling the system to capture CF signals without relying on a pre-trained CF model. Extensive experiments on Amazon and MIND datasets show that AMEM4Rec consistently outperforms state-of-the-art LLM-based recommenders, demonstrating the effectiveness of evolving memory-guided collaborative filtering.
💡 Research Summary
AMEM4Rec tackles three fundamental shortcomings of existing LLM‑driven agentic recommender systems: (1) parameter‑inefficient fine‑tuning, (2) limited context windows and hallucination risk when feeding large interaction histories, and (3) the absence of explicit collaborative‑filtering (CF) signals. While prior work has used LLMs for feature augmentation or direct candidate generation, these approaches rely heavily on semantic knowledge and either ignore implicit user‑item co‑occurrence patterns or depend on external CF models (e.g., LightGCN, SASRec). Such dependencies hinder end‑to‑end learning and reduce adaptability to dynamic data.
AMEM4Rec introduces a novel memory‑evolution framework that embeds CF information directly into a shared textual memory pool, thereby allowing a frozen LLM agent to reason with collaborative cues without any external CF component. The pipeline consists of three training stages followed by a re‑ranking stage at inference.
Memory Creation: For each user, a sliding window of size w traverses the ordered interaction sequence. The window’s (item title, category) pairs are fed to the LLM via a carefully crafted prompt. The LLM returns a structured pattern description comprising two textual fields: a behavior explanation that captures the user’s motivation for the observed actions, and a pattern description that abstracts recurring interaction structures shared across users. These texts are encoded into embeddings and stored as memory entries (pₖ, eₖ) in a global pool M_mem. This process yields “abstract group‑level behavior” fragments rather than raw item IDs.
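The memory-creation stage above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `llm_summarize` is a hypothetical stand-in for the pattern-extraction prompt, `embed` is a toy character-frequency encoder standing in for a real text encoder, and the window size `WINDOW = 3` is an assumed value for `w`.

```python
from dataclasses import dataclass

WINDOW = 3  # sliding-window size w (assumed value for illustration)

@dataclass
class Memory:
    pattern: str        # abstract pattern description p_k
    embedding: list     # text embedding e_k

def llm_summarize(window):
    # Hypothetical stand-in for the LLM prompt that produces a behavior
    # explanation and an abstract pattern description for the window.
    cats = sorted({cat for _, cat in window})
    return f"user browses related items across categories: {', '.join(cats)}"

def embed(text):
    # Toy bag-of-letters embedding standing in for a real sentence encoder.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - 97] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def create_memories(history, pool):
    # Slide a window of size WINDOW over the ordered (title, category)
    # interactions and store one (pattern, embedding) entry per window.
    for i in range(len(history) - WINDOW + 1):
        window = history[i:i + WINDOW]
        pattern = llm_summarize(window)
        pool.append(Memory(pattern, embed(pattern)))
    return pool

pool = create_memories(
    [("Wireless Mouse", "Electronics"),
     ("USB-C Hub", "Electronics"),
     ("Laptop Stand", "Office"),
     ("Desk Lamp", "Office")],
    [],
)
print(len(pool))  # two windows of size 3 over four interactions -> 2
```

Note that the pool stores only abstract pattern text and its embedding, never raw item IDs, which is what lets patterns transfer across users.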
Linking and Dual Validation: When a new memory is created, its embedding is used to retrieve the top‑K most similar memories from M_mem. Two validators then decide whether to link the new entry with each candidate: (a) a similarity validator based on cosine distance, and (b) a semantic validator where the LLM assesses meaning‑level consistency between the new and candidate texts. Only pairs passing both checks are linked, establishing a network of related memories that can share information during evolution.
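The dual-validation gate can be sketched as follows. Both validators here are illustrative assumptions: `semantic_validator` is a word-overlap placeholder for the paper's LLM consistency check, and `top_k=2` and `sim_threshold=0.8` are example hyperparameters, not values from the paper.

```python
def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def semantic_validator(text_a, text_b):
    # Hypothetical stand-in for the LLM's meaning-level consistency
    # check; here: the two pattern texts share at least one word.
    return bool(set(text_a.lower().split()) & set(text_b.lower().split()))

def link_memory(new_text, new_emb, pool, top_k=2, sim_threshold=0.8):
    # Retrieve the top-K nearest memories by embedding similarity,
    # then link only candidates that pass BOTH validators.
    candidates = sorted(pool, key=lambda m: cosine(new_emb, m[1]),
                        reverse=True)[:top_k]
    return [m for m in candidates
            if cosine(new_emb, m[1]) >= sim_threshold
            and semantic_validator(new_text, m[0])]

# Tiny demo with hand-crafted 2-d embeddings.
pool = [
    ("buys budget electronics accessories", [1.0, 0.0]),
    ("collects vinyl records",              [0.0, 1.0]),
]
links = link_memory("prefers budget electronics", [0.9, 0.1], pool)
print(len(links))  # only the electronics memory passes both checks -> 1
```

Requiring both checks is what filters coincidental embedding neighbors: the vinyl memory is retrieved as a top-K candidate but rejected by both the similarity threshold and the semantic check.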
Memory Evolution: Linked memories are jointly processed by the LLM through a synthesis prompt. The model aggregates the multiple pattern descriptions, generates an updated, more comprehensive description, and simultaneously updates the embeddings of all involved memories. This iterative refinement reinforces shared collaborative patterns while filtering out noise, effectively turning the memory pool into a dynamic repository of CF signals that evolve over time.
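A minimal sketch of the joint-refinement step follows. `llm_synthesize` is a hypothetical placeholder for the paper's synthesis prompt (here it simply concatenates the deduplicated pattern texts), and `embed` is a toy encoder; the point is only the data flow: every memory in a linked group receives the merged description and a fresh embedding.

```python
def llm_synthesize(patterns):
    # Hypothetical stand-in for the synthesis prompt that merges the
    # linked pattern descriptions into one more comprehensive text.
    return " ; ".join(sorted(set(patterns)))

def embed(text):
    # Toy letter-count embedding standing in for a real text encoder.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - 97] += 1.0
    return vec

def evolve(linked_memories, embed_fn):
    # Jointly process a group of linked (pattern, embedding) entries:
    # all entries adopt the synthesized description and its embedding,
    # reinforcing the shared cross-user pattern.
    merged = llm_synthesize([p for p, _ in linked_memories])
    new_emb = embed_fn(merged)
    return [(merged, new_emb) for _ in linked_memories]

group = [("buys budget phones", [0.0]), ("buys budget tablets", [0.0])]
evolved = evolve(group, embed)
print(evolved[0][0])  # "buys budget phones ; buys budget tablets"
```

Because every member of the group converges on the same refined text, repeated evolution rounds amplify patterns that many users share while idiosyncratic noise, which rarely gets linked, is left behind.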
Agentic Re‑ranking: At inference, a base recommender supplies a candidate set Cᵤ for a target user u. The system retrieves memories relevant to u’s interaction history hᵤ and injects their textual content into a re‑ranking prompt. The frozen LLM then scores the candidates using both the user’s personal history and the collaborative memories, producing a personalized ranking without any parameter updates.
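The re-ranking stage can be sketched like this. `overlap_score` is a word-overlap stand-in for the frozen LLM's candidate scoring, and the prompt format is an assumption; the structure to note is that the score conditions on both the personal history and the retrieved collaborative memories, with no parameter updates anywhere.

```python
def rerank(candidates, history, memories, score_fn):
    # Build a re-ranking context from the user's history h_u and the
    # retrieved memory texts, then score each candidate in C_u with
    # the frozen LLM (here: score_fn, a stand-in).
    context = ("history: " + ", ".join(history)
               + " | memories: " + "; ".join(memories))
    scored = [(score_fn(context, c), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]

def overlap_score(context, candidate):
    # Hypothetical scorer: count candidate words present in the context.
    ctx = set(context.lower().replace(",", " ").replace(";", " ").split())
    return sum(w in ctx for w in candidate.lower().split())

ranked = rerank(["Cookbook", "Gaming Headset"],
                ["Gaming Mouse"],
                ["buys gaming peripherals"],
                overlap_score)
print(ranked[0])  # "Gaming Headset"
```

In the sketch, "Gaming Headset" is promoted because it overlaps with both the user's own history and a collaborative memory, which is exactly the kind of cross-signal the memory injection is meant to provide.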
Experiments: The authors evaluate AMEM4Rec on four real‑world datasets—Amazon Fashion, Video Games, CDs & Vinyl, and the news‑recommendation MIND dataset. Using standard ranking metrics (HR@10, NDCG@10), AMEM4Rec consistently outperforms state‑of‑the‑art LLM‑based recommenders (e.g., P5, TALLRec) and strong CF baselines (LightGCN, SASRec) by 4–7 absolute percentage points. Gains are especially pronounced for users with sparse interaction histories. Ablation studies confirm the necessity of both similarity and semantic validators as well as the iterative evolution step; removing any component degrades performance.
Contributions and Limitations: The paper’s primary contributions are (i) a text‑based memory pool that directly encodes collaborative patterns, (ii) a dual‑validator linking mechanism that mitigates noise, (iii) an end‑to‑end memory‑evolution process that requires no LLM fine‑tuning, and (iv) empirical validation across diverse domains. Limitations include the computational cost of generating memories via LLM prompts and potential scalability challenges as the memory pool grows, which may necessitate additional indexing or compression techniques. Moreover, the current work focuses on re‑ranking; extending memory‑guided generation to the candidate‑selection stage remains an open research direction.
In summary, AMEM4Rec demonstrates that collaborative filtering can be effectively distilled into evolving textual memories, enabling large language model agents to leverage both semantic understanding and implicit cross‑user patterns while keeping model parameters fixed. This approach bridges the gap between traditional CF and modern LLM‑centric recommendation, offering a scalable and adaptable solution for next‑generation recommender systems.