Hierarchical Successor Representation for Robust Transfer

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

The successor representation (SR) provides a powerful framework for decoupling predictive dynamics from rewards, enabling rapid generalisation across reward configurations. However, the classical SR is limited by its inherent policy dependence: policies change due to ongoing learning, environmental non-stationarities, and changes in task demands, making established predictive representations obsolete. Furthermore, in topologically complex environments, SRs suffer from spectral diffusion, leading to dense and overlapping features that scale poorly. Here we propose the Hierarchical Successor Representation (HSR) to overcome these limitations. By incorporating temporal abstractions into the construction of predictive representations, HSR learns stable state features that are robust to task-induced policy changes. Applying non-negative matrix factorisation (NMF) to the HSR yields a sparse, low-rank state representation that facilitates highly sample-efficient transfer to novel tasks in multi-compartmental environments. Further analysis reveals that HSR-NMF discovers interpretable topological structures, providing a policy-agnostic hierarchical map that effectively bridges model-free optimality and model-based flexibility. Beyond providing a useful basis for task transfer, we show that HSR’s temporally extended predictive structure can also be leveraged to drive efficient exploration, effectively scaling to large, procedurally generated environments.


💡 Research Summary

The paper addresses two fundamental shortcomings of the classic Successor Representation (SR) in reinforcement learning: its strong dependence on the current policy and the diffusion of its spectral components in topologically complex environments, which leads to dense, overlapping features that scale poorly. To overcome these issues, the authors introduce the Hierarchical Successor Representation (HSR), which integrates temporal abstraction through the options framework.

An option ω = ⟨I, π, β⟩ defines a temporally extended, interpretable action segment with its own initiation set, intra‑option policy, and termination condition. By treating each primitive action as a one‑step pseudo‑option, HSR defines a high‑level policy μ over the extended action set (primitive actions ∪ options) and derives a Bellman operator for the hierarchical occupancy matrix M^μ:

 M^μ = B^μ + G^μ M^μ

where B^μ is the average intra‑option SR and G^μ is a continuation kernel that can be computed analytically as G^μ = γ M^ā diag(β^ā). The authors prove that this operator is a contraction for any discount factor γ < 1, guaranteeing convergence of a TD‑style update analogous to classic SR TD learning but operating at the option level.
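Since the operator above is a contraction, its fixed point can also be obtained in closed form as M^μ = (I − G^μ)⁻¹ B^μ when B^μ and G^μ are known. The snippet below is an illustrative sketch, not the paper's code; the `solve_hsr` helper and the toy three‑state chain are assumptions. In the degenerate case where every action is a one‑step pseudo‑option (B^μ = I, G^μ = γP), the fixed point reduces to the classic SR:

```python
import numpy as np

def solve_hsr(B, G):
    """Solve the fixed point M = B + G M  =>  M = (I - G)^{-1} B.

    B : average intra-option SR (n_states x n_states)
    G : continuation kernel, e.g. gamma * M_a @ diag(beta_a)
    """
    n = B.shape[0]
    return np.linalg.solve(np.eye(n) - G, B)

# Toy deterministic 3-state chain, absorbing in the last state.
gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])

B = np.eye(3)      # one-step pseudo-options: intra-option SR is the identity
G = gamma * P      # termination beta = 1 everywhere
M = solve_hsr(B, G)  # recovers the classic SR (I - gamma P)^{-1}
```

In this degenerate case the first row of M is the discounted expected occupancy [1, γ, γ²/(1 − γ)] of the three states when starting from the left end of the chain.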

To further reduce policy dependence, the paper proposes Expected HSR (eHSR), obtained by averaging HSR matrices across a distribution of pre‑training tasks (different reward configurations). eHSR captures the environment’s transition structure rather than any particular policy, making it stable when the reward function changes.
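The averaging step itself is simple; the sketch below (the `expected_hsr` helper and uniform task weights are illustrative assumptions, not the paper's code) averages per-task HSR matrices under a task distribution:

```python
import numpy as np

def expected_hsr(hsr_matrices, task_probs=None):
    """Average per-task HSR matrices; uniform task weights by default."""
    hsr = np.asarray(hsr_matrices, dtype=float)
    if task_probs is None:
        task_probs = np.full(len(hsr), 1.0 / len(hsr))
    # Contract the task axis: sum_k p_k * M_k
    return np.tensordot(task_probs, hsr, axes=1)

# Two stand-in HSR matrices from two pre-training reward configurations.
M_task1 = np.eye(2)
M_task2 = 2.0 * np.eye(2)
M_ehsr = expected_hsr([M_task1, M_task2])  # uniform average: 1.5 * I
```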

For dimensionality reduction, the authors compare Singular Value Decomposition (SVD) and Non‑Negative Matrix Factorisation (NMF). While SVD efficiently compresses globally smooth SR features, it smears the piecewise‑smooth structure of HSR, producing ringing artifacts. In contrast, NMF yields a sparse, parts‑based basis Φ · H that aligns with the natural modularity introduced by options. In multi‑compartment (e.g., four‑room) mazes, NMF applied to eHSR discovers basis vectors that correspond to bottleneck states and room interiors, whereas NMF on the classic SR fails due to a lack of intra‑room variance.
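The parts-based behaviour is easy to illustrate on a toy occupancy matrix. The sketch below uses a minimal Lee and Seung multiplicative-update NMF (the `nmf` helper and the block-diagonal "two-room" matrix are illustrative assumptions, not the paper's pipeline); on a matrix with two disconnected rooms, each recovered basis vector concentrates on one room:

```python
import numpy as np

def nmf(M, k, n_iter=500, seed=0, eps=1e-9):
    """Minimal multiplicative-update NMF: M ~= W @ H, all factors non-negative."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Block-diagonal stand-in for an eHSR over two disconnected 3-state rooms.
M = np.block([[np.ones((3, 3)), np.zeros((3, 3))],
              [np.zeros((3, 3)), np.ones((3, 3))]])

W, H = nmf(M, k=2)
reconstruction = W @ H  # low-rank, parts-based approximation of M
```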

Empirical evaluation focuses on three hypotheses: (a) HSR features are more robust to policy shifts than SR; (b) HSR’s topology is uniquely amenable to NMF, producing interpretable low‑rank bases; (c) HSR can drive efficient intrinsic‑motivation exploration.

  1. Transfer Learning – Agents using linear function approximation with rows of HSR, SR, or one‑hot encodings were trained on a four‑room maze to reach goal G₁, then transferred to a new goal G₂. HSR‑based agents retained performance after the switch, requiring far fewer steps to re‑learn the new task than SR‑based agents, whose features had to be re‑estimated.

  2. Interpretability & Sample Efficiency – In larger multi‑room environments, NMF‑derived HSR bases reduced the dimensionality to a handful of components that each highlighted a specific compartment or corridor. When these bases were used for linear value approximation, agents achieved comparable or better performance with dramatically fewer samples than when using raw SR rows or SVD bases.

  3. Exploration – The authors used the prediction error of HSR as an intrinsic reward. Because HSR predicts occupancy over extended options, its error naturally highlights unexplored bottlenecks and distant rooms. Experiments in procedurally generated mazes showed that HSR‑driven exploration covered the entire state space with far fewer environment interactions than standard count‑based or curiosity‑based methods, demonstrating scalability to large, stochastic environments.
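The exploration bonus in item 3 can be sketched with a one-step SR TD error standing in for HSR's option-level prediction error (the `sr_intrinsic_reward` helper below is an illustrative assumption, not the paper's implementation). A freshly initialised predictor yields a large error for novel transitions, and the bonus shrinks as the representation is learned:

```python
import numpy as np

def sr_intrinsic_reward(M, s, s_next, gamma=0.9):
    """Norm of the SR TD error for the transition s -> s_next.

    TD target for row s: one_hot(s) + gamma * M[s_next].
    """
    one_hot = np.eye(M.shape[0])[s]
    td_error = one_hot + gamma * M[s_next] - M[s]
    return np.linalg.norm(td_error)

# A fresh (all-zero) SR gives a large bonus for an unvisited transition...
M = np.zeros((3, 3))
bonus_novel = sr_intrinsic_reward(M, 0, 1)

# ...and after one exact TD update for that row, the bonus vanishes.
M[0] = np.eye(3)[0] + 0.9 * M[1]
bonus_known = sr_intrinsic_reward(M, 0, 1)
```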

Overall, the Hierarchical Successor Representation preserves the computational advantages of SR (linear value decomposition, fast reward re‑evaluation) while mitigating its policy‑dependence through temporal abstraction. Coupled with NMF, HSR yields sparse, interpretable, and transferable state representations that support rapid adaptation, efficient exploration, and scaling to complex, procedurally generated domains. The work bridges model‑free efficiency and model‑based flexibility, offering a promising foundation for robust transfer learning in both artificial agents and potentially in neuroscientific models of hippocampal‑prefrontal navigation.
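The "fast reward re-evaluation" property mentioned above follows from the linear value decomposition V = M r: when the goal moves, only the reward vector changes, not the learned occupancy matrix. A minimal sketch (the chain transition matrix and goal placements are toy assumptions):

```python
import numpy as np

# SR of a deterministic 3-state chain, absorbing in the last state.
gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
M = np.linalg.inv(np.eye(3) - gamma * P)  # successor matrix under this policy

r_g1 = np.array([0.0, 0.0, 1.0])  # reward at goal G1 (last state)
r_g2 = np.array([0.0, 1.0, 0.0])  # reward at goal G2 (middle state)

V_g1 = M @ r_g1  # values for the first task
V_g2 = M @ r_g2  # instant re-evaluation after the goal switch, M unchanged
```

Note the caveat the paper targets: for the classic SR, M itself is policy-dependent and must be re-estimated once the policy adapts to the new goal, which is exactly the instability that HSR's temporal abstraction mitigates.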

