CADENT: Gated Hybrid Distillation for Sample-Efficient Transfer in Reinforcement Learning


Transfer learning promises to reduce the high sample complexity of deep reinforcement learning (RL), yet existing methods struggle with domain shift between source and target environments. Policy distillation provides powerful tactical guidance but fails to transfer long-term strategic knowledge, while automaton-based methods capture task structure but lack fine-grained action guidance. This paper introduces Context-Aware Distillation with Experience-gated Transfer (CADENT), a framework that unifies strategic automaton-based knowledge with tactical policy-level knowledge into a coherent guidance signal. CADENT’s key innovation is an experience-gated trust mechanism that dynamically weighs teacher guidance against the student’s own experience at the state-action level, enabling graceful adaptation to target domain specifics. Across challenging environments, from sparse-reward grid worlds to continuous control tasks, CADENT achieves 40-60% better sample efficiency than baselines while maintaining superior asymptotic performance, establishing a robust approach for adaptive knowledge transfer in RL.


💡 Research Summary

The paper tackles a fundamental challenge in reinforcement learning (RL) transfer: how to combine long‑term strategic knowledge with short‑term tactical guidance while adapting to domain shift between source and target tasks. Existing approaches either transfer low‑level value or feature representations, imitate a teacher policy (policy distillation), or embed high‑level task structure using automata or temporal logic. Each of these methods excels at one aspect—tactics or strategy—but fails to provide the other, and they typically treat the teacher’s knowledge as static, leading to negative transfer when the source and target domains differ.

CADENT (Context‑Aware Distillation with Experience‑gated Transfer) is introduced as a hybrid distillation framework that unifies both kinds of knowledge into a single learning signal and dynamically arbitrates between teacher guidance and the student’s own experience. The method consists of three main components:

  1. Hybrid Distillation – The teacher’s strategic knowledge is captured in a deterministic finite automaton (DFA) that encodes the required sequence of sub‑goals. By averaging the teacher’s Q‑values over all state‑action pairs that trigger a particular automaton transition, a transition value Q_AD(q, q′) is obtained. This value is turned into an intrinsic reward r_AD = λ_AD·Q_AD(q, q′) whenever the student makes a meaningful automaton transition, encouraging progress along the high‑level plan. Simultaneously, the teacher’s tactical knowledge is represented as a policy conditioned on the current automaton state, π_teacher(a|q). A policy‑gradient‑style correction term g_PD(s, a) = λ_PD·(π_teacher(a|q) – π_student(a|s)) nudges the student’s policy toward the teacher’s action distribution, providing fine‑grained, context‑aware advice.

  2. Experience‑gated Trust – For each state‑action pair, a volatility tracker V_t(s, a) maintains an exponential moving average of the absolute TD‑error magnitude. High volatility indicates that the student’s value estimate is unstable, while low volatility suggests confidence. A sigmoid‑based trust metric ω(s, a) = σ(−k·(V_t(s, a) – θ)) maps volatility to a weight in [0, 1]: ω approaches 1 when the student’s estimates are stable and 0 when they are volatile, so the mechanism arbitrates between the student’s own experience and the teacher’s guidance at the level of individual state‑action pairs.
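The experience-gated trust mechanism described above can be sketched as follows. This is an illustrative reconstruction from the text, not the paper's released code; the EMA decay `beta`, gate sharpness `k`, and threshold `theta` are assumed values.

```python
import math

# Assumed hyperparameters (not specified in the summary above):
BETA, K, THETA = 0.9, 10.0, 0.5

class TrustGate:
    """Per state-action trust gate: tracks volatility V_t(s, a) as an
    exponential moving average of |TD error|, and maps it through a
    sigmoid to a trust weight omega(s, a) in [0, 1]."""

    def __init__(self):
        self.volatility = {}  # V_t(s, a), keyed by (state, action)

    def update(self, s, a, td_error):
        # V_t <- beta * V_{t-1} + (1 - beta) * |TD error|
        v = self.volatility.get((s, a), 0.0)
        self.volatility[(s, a)] = BETA * v + (1.0 - BETA) * abs(td_error)

    def trust(self, s, a):
        # omega(s, a) = sigmoid(-k * (V_t(s, a) - theta)):
        # near 1 for stable (low-volatility) estimates, near 0 otherwise.
        v = self.volatility.get((s, a), 0.0)
        return 1.0 / (1.0 + math.exp(K * (v - THETA)))
```

A student update would then weight its own TD target by `trust(s, a)` and the teacher's guidance by the complement, letting trust shift toward the student's experience as its estimates stabilize in the target domain.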

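The hybrid distillation terms from component 1 above can be sketched in the same spirit. Again this is a hypothetical illustration: the teacher interface (`teacher_q` as a dict of Q-values, `transition_pairs` as the observed triggers of an automaton transition) and the lambda weights are assumptions, not the paper's implementation.

```python
import numpy as np

# Assumed weights on the strategic and tactical terms:
LAMBDA_AD = 0.1   # automaton-distillation (strategic) reward weight
LAMBDA_PD = 0.5   # policy-distillation (tactical) correction weight

def transition_value(teacher_q, transition_pairs):
    """Q_AD(q, q'): average the teacher's Q-values over all (state, action)
    pairs observed to trigger the automaton transition q -> q'."""
    return np.mean([teacher_q[(s, a)] for (s, a) in transition_pairs])

def intrinsic_reward(q_ad):
    """r_AD = lambda_AD * Q_AD(q, q'), granted when the student makes
    the corresponding automaton transition."""
    return LAMBDA_AD * q_ad

def policy_correction(pi_teacher_aq, pi_student_as):
    """g_PD(s, a) = lambda_PD * (pi_teacher(a|q) - pi_student(a|s)),
    nudging the student toward the teacher's action distribution."""
    return LAMBDA_PD * (pi_teacher_aq - pi_student_as)
```

Note that the correction is conditioned on the automaton state q on the teacher's side but on the raw state s on the student's side, which is what makes the tactical advice context-aware.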
