MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These fixed procedures hard-code human priors about what to store and how to revise memory, making them rigid under diverse interaction patterns and inefficient on long histories. To address this, we present **MemSkill**, which reframes these operations as learnable and evolvable memory skills: structured, reusable routines for extracting, consolidating, and pruning information from interaction traces. Inspired by the design philosophy of agent skills, MemSkill employs a *controller* that learns to select a small set of relevant skills, paired with an LLM-based *executor* that produces skill-guided memories. Beyond learning skill selection, MemSkill introduces a *designer* that periodically reviews hard cases where selected skills yield incorrect or incomplete memories, and evolves the skill set by proposing refinements and new skills. Together, these components form a closed-loop procedure that improves both the skill-selection policy and the skill set itself. Experiments on LoCoMo, LongMemEval, HotpotQA, and ALFWorld demonstrate that MemSkill improves task performance over strong baselines and generalizes well across settings. Further analyses shed light on how skills evolve, offering insights toward more adaptive, self-evolving memory management for LLM agents.
💡 Research Summary
MemSkill tackles the rigidity and inefficiency of current memory management in large language model (LLM) agents by reframing static memory operations as learnable, reusable “memory skills.” The authors observe that most existing systems rely on a handful of hand‑crafted primitives (add, update, delete, skip) that embed strong human priors, struggle with diverse interaction patterns, and scale poorly as conversation histories grow. To overcome these limitations, MemSkill introduces three tightly coupled components: a controller, an executor, and a designer, forming a closed‑loop optimization pipeline.
Memory Skills are structured templates that describe when a skill should be applied, how it should be applied, its purpose, and any constraints. Each skill also contains a detailed content block that serves as a prompt for the executor. The skill bank is initialized with four generic primitives (INSERT, UPDATE, DELETE, SKIP) and is later expanded and refined through the designer’s interventions.
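The skill template described above can be sketched as a small data structure. This is an illustrative reconstruction, not the paper's actual schema: the field names (`when`, `how`, `purpose`, `constraints`, `content`) follow the summary's description, and the primitive contents are placeholders.

```python
from dataclasses import dataclass


# Hypothetical sketch of a memory-skill record, following the template
# fields the summary lists: when / how / purpose / constraints, plus a
# content block that serves as the executor's prompt.
@dataclass
class MemorySkill:
    name: str         # e.g. "INSERT", "UPDATE", "DELETE", "SKIP"
    when: str         # trigger conditions for applying the skill
    how: str          # procedure the executor should follow
    purpose: str      # what applying the skill is meant to achieve
    constraints: str  # restrictions on the skill's output
    content: str      # detailed prompt block fed to the executor LLM


def initial_skill_bank() -> list[MemorySkill]:
    """The four generic primitives the bank starts with (contents elided)."""
    return [
        MemorySkill("INSERT", "a new fact appears", "append a memory item",
                    "store new information", "avoid duplicates", "..."),
        MemorySkill("UPDATE", "a fact contradicts stored memory", "revise the item",
                    "keep memory current", "preserve what is still valid", "..."),
        MemorySkill("DELETE", "a stored fact becomes obsolete", "remove the item",
                    "prune stale memory", "delete only on clear obsolescence", "..."),
        MemorySkill("SKIP", "the span is uninformative", "make no change",
                    "avoid memory noise", "no side effects", "..."),
    ]
```

The designer later mutates this bank, so skills are plain data rather than code: refining a skill means rewriting its text fields, and adding one means appending a new record.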
The controller selects a small, context‑relevant subset of skills for each text span. It encodes the current span and the already‑retrieved memories into a state embedding (hₜ) using a shared encoder f_ctx. Each skill’s description is embedded via f_skill, producing a skill vector (uᵢ). Compatibility with an evolving skill set is achieved by scoring skills through the inner product hₜᵀuᵢ, followed by a softmax over the dynamic skill set. A Gumbel‑Top‑K sampler then picks the top‑K skills without replacement, ensuring that only the most pertinent skills are passed to the executor.
The executor is a fixed LLM that receives a prompt composed of (i) the current text span, (ii) the retrieved memory items, and (iii) the selected skill descriptions and contents. Conditioned on this information, the LLM generates memory updates in a single forward pass, effectively applying multiple skills simultaneously. This contrasts with prior turn‑by‑turn pipelines that interleave handcrafted operations with LLM calls, leading to higher efficiency and the ability to operate on variable‑length spans.
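A minimal sketch of how those three ingredients could be assembled into the executor's single prompt. The section headers and closing instruction are illustrative, not the paper's actual prompt:

```python
def build_executor_prompt(span: str,
                          retrieved: list[str],
                          skills: list[tuple[str, str]]) -> str:
    """Compose one prompt from (i) the current span, (ii) retrieved memory
    items, and (iii) the selected skills as (description, content) pairs."""
    parts = ["## Current span", span, "## Retrieved memories"]
    parts += [f"- {m}" for m in retrieved] or ["(none)"]
    parts.append("## Selected skills")
    for desc, content in skills:
        parts += [f"### {desc}", content]
    parts.append("Apply the skills above and emit the resulting memory updates.")
    return "\n".join(parts)


prompt = build_executor_prompt(
    span="Alice mentioned she moved to Lisbon last month.",
    retrieved=["Alice lives in Porto."],
    skills=[("UPDATE: revise contradicted facts",
             "If the span contradicts a stored memory, rewrite that memory...")],
)
```

Since all selected skills appear in one prompt, the executor can apply several of them in a single forward pass over a whole span, which is where the efficiency gain over turn-by-turn pipelines comes from.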
The designer closes the loop by periodically mining a sliding hard‑case buffer that records failures (low reward, high error count). Representative hard cases are clustered and filtered; the selected cases are fed to another LLM that (a) refines existing skill descriptions/content and (b) proposes entirely new skills to address uncovered failure modes. The updated skill bank is then merged, and the controller resumes training with an increased exploration rate to adopt the new skills.
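The sliding hard-case buffer can be sketched with a bounded deque. The reward-threshold criterion and field names here are assumptions for illustration; the clustering/filtering step and the designer's LLM calls are out of scope:

```python
from collections import deque


class HardCaseBuffer:
    """Sliding buffer of failure cases for the designer to mine.
    A simple low-reward criterion stands in for the paper's failure test."""

    def __init__(self, maxlen: int = 256, reward_threshold: float = 0.5):
        self.cases = deque(maxlen=maxlen)  # oldest cases drop off automatically
        self.reward_threshold = reward_threshold

    def record(self, span: str, skill_ids: list[int], reward: float) -> bool:
        """Buffer the case only if it counts as a failure (low reward)."""
        if reward < self.reward_threshold:
            self.cases.append(
                {"span": span, "skills": skill_ids, "reward": reward})
            return True
        return False

    def drain(self) -> list[dict]:
        """Hand all buffered hard cases to the designer and reset."""
        out = list(self.cases)
        self.cases.clear()
        return out
```

At each designer interval, `drain()` would feed the clustering/filtering step, whose representatives go to the designer LLM for skill refinement and proposal.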
Training proceeds with reinforcement learning (RL) on downstream task rewards. After each span is processed, the constructed memory is evaluated on task‑specific queries (e.g., question answering, planning). The reward signal back‑propagates to the controller’s policy parameters, encouraging skill selections that lead to higher downstream performance. The designer’s updates occur at fixed intervals, alternating between skill‑usage phases and skill‑evolution phases.
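A toy REINFORCE step over the controller's skill logits illustrates the update direction. This simplifies the K Gumbel-Top-K picks as independent softmax draws (so ∇ log π ≈ Σ_{i∈chosen}(e_i − p), scaled by the advantage); the paper's exact objective and baseline are not specified in the summary.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_update(logits: list[float], chosen: list[int],
                     reward: float, baseline: float,
                     lr: float = 0.1) -> list[float]:
    """One policy-gradient step on the skill logits: push up logits of the
    chosen skills and down the rest, weighted by (reward - baseline)."""
    probs = softmax(logits)
    advantage = reward - baseline
    grad = [-len(chosen) * p for p in probs]  # -K * p term for every skill
    for i in chosen:
        grad[i] += 1.0                        # +1 for each chosen skill
    return [l + lr * advantage * g for l, g in zip(logits, grad)]


# With a positive advantage, chosen skills become more likely next time.
new_logits = reinforce_update([0.0, 0.0, 0.0, 0.0], chosen=[0, 1],
                              reward=1.0, baseline=0.0)
```

The downstream reward (e.g. answer correctness on queries against the constructed memory) is the only supervision; no per-step skill labels are needed.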
Experiments span four benchmarks: LoCoMo (multi‑turn dialogues), LongMemEval (long‑term conversational memory), HotpotQA (multi‑document reasoning), and ALFWorld (text‑based interactive simulation). MemSkill consistently outperforms strong baselines such as Memory‑R1, Memα, and static‑skill pipelines, achieving 3–7 percentage‑point gains in task success rates. Notably, on LongMemEval, MemSkill reduces memory footprint by ~40% while improving accuracy, demonstrating the advantage of span‑level, skill‑conditioned processing for long histories. Ablation studies reveal that (i) removing the designer (fixing the skill set) degrades performance, and (ii) feeding all skills rather than a top‑K subset inflates computation without accuracy gains.
Analysis of the evolved skill bank shows emergent, domain‑specific skills such as “Capture Temporal,” “Extract Activity,” and “Refine Details,” which were not present in the initial primitive set. These skills directly address failure patterns observed in the hard‑case analysis, confirming the effectiveness of the designer’s LLM‑driven refinement. The authors also discuss limitations: the designer’s reliance on LLM prompt quality, potential scaling concerns as the skill bank grows, and the need for meta‑learning or compression techniques to keep the controller’s scoring efficient.
In conclusion, MemSkill presents a novel paradigm where memory management is treated as a learnable, self‑evolving skill system. By jointly optimizing skill selection (via RL) and skill evolution (via LLM‑guided hard‑case analysis), the framework achieves adaptable, efficient, and scalable memory handling for LLM agents. Future directions include hierarchical skill banks, multimodal memory skills, and hybrid human‑LLM feedback loops to further enhance robustness and generalization.