SHAP-Guided Kernel Actor-Critic for Explainable Reinforcement Learning

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Actor-critic (AC) methods are a cornerstone of reinforcement learning (RL) but offer limited interpretability. Current explainable RL methods seldom use state attributions to assist training; rather, they treat all state features equally, neglecting the heterogeneous impact of individual state dimensions on the reward. We propose RKHS-SHAP-based Advanced Actor-Critic (RSA2C), an attribution-aware, kernelized, two-timescale AC algorithm comprising an Actor, a Value Critic, and an Advantage Critic. The Actor is instantiated in a vector-valued reproducing kernel Hilbert space (RKHS) with a Mahalanobis-weighted operator-valued kernel, while the Value Critic and Advantage Critic reside in scalar RKHSs. These RKHS-based components use sparsified dictionaries: the Value Critic maintains its own, while the Actor and Advantage Critic share one. State attributions, computed from the Value Critic via RKHS-SHAP (kernel mean embeddings for on-manifold expectations and conditional mean embeddings for off-manifold expectations), are converted into Mahalanobis-gated weights that modulate the Actor's gradients and the Advantage Critic's targets. We derive a global, non-asymptotic convergence bound under state perturbations, showing stability through the perturbation-error term and efficiency through the convergence-error term. Empirical results on three continuous-control environments show that RSA2C achieves efficiency, stability, and interpretability.


💡 Research Summary

The paper introduces RSA2C, a novel actor‑critic algorithm that integrates Shapley‑value based state attributions into a kernel‑enhanced reinforcement‑learning framework, thereby achieving explainable, efficient, and stable learning for continuous‑control tasks. Traditional actor‑critic methods suffer from opaque policy updates because they treat all state dimensions uniformly, ignoring the heterogeneous influence of each feature on the return. Existing explainable RL approaches either provide post‑hoc visualizations or constrain the policy class, but they do not feed attribution information back into the learning loop.

RSA2C addresses these gaps in three key ways. First, it computes state‑level Shapley values analytically using RKHS‑SHAP, which leverages kernel mean embeddings (KME) for on‑manifold (observational) expectations and conditional mean embeddings (CME) for off‑manifold (interventional) expectations. This eliminates the exponential cost of coalition sampling and respects the underlying data geometry, yielding low‑variance, robust attributions even in high‑dimensional, correlated state spaces.
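To make the attribution step concrete, here is an illustrative sketch, not the paper's RKHS‑SHAP implementation: it attributes a kernel‑expansion value function across state dimensions by exact coalition enumeration, marginalizing off‑coalition features with a simple background‑sample average standing in for the kernel mean embedding. The names `value_fn` and `shapley_values` are our own, and this brute‑force version is feasible only for small state dimension (2^d coalitions); the paper's analytic kernel formulation avoids that cost.

```python
import itertools
import math
import numpy as np

def rbf(x, y, gamma=1.0):
    # Scalar RBF kernel between two state vectors.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def value_fn(x, dictionary, alphas, gamma=1.0):
    # Kernel expansion V(x) = sum_i alpha_i k(x, d_i): a stand-in Value Critic.
    return sum(a * rbf(x, d, gamma) for a, d in zip(alphas, dictionary))

def shapley_values(x, background, dictionary, alphas, gamma=1.0):
    """Exact Shapley attribution of V(x) over state dimensions.

    Off-coalition features are marginalized by averaging over a background
    sample -- a crude surrogate for the kernel mean embedding used in the
    paper. Exponential in d, so for illustration only.
    """
    d = x.shape[0]

    def v(coalition):
        # Expected value with off-coalition features drawn from background.
        total = 0.0
        for b in background:
            z = b.copy()
            z[list(coalition)] = x[list(coalition)]
            total += value_fn(z, dictionary, alphas, gamma)
        return total / len(background)

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            for S in itertools.combinations(others, r):
                # Standard Shapley weight |S|! (d - |S| - 1)! / d!
                w = math.factorial(len(S)) * math.factorial(d - len(S) - 1) / math.factorial(d)
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi
```

By the efficiency property, the attributions sum to the value at `x` minus the expected value over the background, which gives a quick sanity check on any implementation.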

Second, the obtained Shapley values are transformed into Mahalanobis‑gated weights. By embedding a Mahalanobis distance metric into an operator‑valued kernel (OVK) that defines the actor’s policy space, the algorithm modulates the policy gradient and the advantage‑critic targets proportionally to each feature’s importance. Consequently, learning updates focus on the most influential dimensions, improving sample efficiency and reducing sensitivity to stochastic perturbations.
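One plausible way to realize this gating step, sketched here without claiming it matches the paper's exact transform, is to map Shapley magnitudes to positive per-dimension weights, use them as the diagonal metric of a weighted RBF kernel, and apply them as a gate on the policy gradient. The softmax temperature `tau` and the mean-one normalization are our own illustrative choices.

```python
import numpy as np

def mahalanobis_gate(shap_values, tau=1.0):
    """Turn per-dimension Shapley magnitudes into positive gate weights.

    A temperature-scaled softmax over |phi| is one plausible gating choice;
    the result parameterizes a diagonal Mahalanobis metric M = diag(w).
    Normalizing to mean 1 keeps the overall gradient scale unchanged.
    """
    s = np.abs(shap_values) / tau
    w = np.exp(s - s.max())          # numerically stable softmax numerator
    return w / w.sum() * len(s)

def mahalanobis_kernel(x, y, weights):
    # Weighted RBF kernel: high-importance dimensions contribute more
    # to the distance, sharpening the kernel along those axes.
    diff = x - y
    return float(np.exp(-diff @ (np.diag(weights) @ diff)))

def gated_gradient(grad, weights):
    # Modulate a per-dimension policy gradient by the attribution gates.
    return weights * grad
```

With this shape of gating, a dimension with a large attribution magnitude both dominates the kernel metric and receives a proportionally larger update, matching the intuition described above.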

Third, RSA2C adopts a two‑timescale architecture with kernel‑based function approximation. The actor lives in a vector‑valued RKHS equipped with the Mahalanobis‑weighted OVK, while the value and advantage critics reside in scalar RKHSs. All three components maintain sparsified dictionaries via Approximate Linear Dependence (ALD), keeping computational complexity linear in the dictionary size and enabling online updates. A dictionary grows only when a new sample cannot be approximately represented in the span of its existing entries, preserving expressive power without unbounded memory growth.
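The ALD admission test itself is standard (it goes back to Engel et al.'s kernel recursive least squares): a candidate state enters the dictionary only if the residual of projecting its feature map onto the span of the current dictionary exceeds a threshold ν. A minimal sketch, with `nu` and `gamma` chosen purely for illustration:

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Scalar RBF kernel between two state vectors.
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

class ALDDictionary:
    """Online sparse dictionary via the Approximate Linear Dependence test.

    A new state is admitted only if its feature map cannot be approximated
    (projection residual > nu) within the span of the current dictionary,
    so the dictionary stays compact while remaining expressive.
    """

    def __init__(self, nu=0.1, gamma=1.0):
        self.nu, self.gamma = nu, gamma
        self.items = []

    def try_add(self, x):
        if not self.items:
            self.items.append(np.asarray(x, dtype=float))
            return True
        # Gram matrix of the dictionary and kernel vector of the candidate.
        K = np.array([[rbf(a, b, self.gamma) for b in self.items] for a in self.items])
        k = np.array([rbf(a, x, self.gamma) for a in self.items])
        # Best coefficients for approximating phi(x) in span{phi(d_i)}.
        coeffs = np.linalg.solve(K + 1e-10 * np.eye(len(K)), k)
        residual = rbf(x, x, self.gamma) - k @ coeffs
        if residual > self.nu:
            self.items.append(np.asarray(x, dtype=float))
            return True
        return False
```

A repeated (or nearly dependent) sample yields a residual near zero and is rejected, while a state far from every dictionary entry yields a residual near k(x, x) and is admitted.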

On the theoretical side, the authors prove a global, non‑asymptotic convergence bound under state perturbations. The error decomposition separates a perturbation‑error term (capturing the effect of noisy Shapley estimates and policy learning under disturbed states) from a convergence‑error term (covering tracking error and two‑timescale approximation error). The Mahalanobis‑gated Shapley weights explicitly appear in the perturbation‑error, demonstrating that the attribution mechanism stabilizes learning against noise.
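Schematically, bounds with this two-term decomposition often take the following shape; the notation below is illustrative rather than the paper's exact statement ($C_1, C_2$ are constants, $\sigma$ the perturbation scale, $W$ the Mahalanobis-gated weights, $T$ the iteration count, and $\alpha_t, \beta_t$ the two step-size schedules):

```latex
\mathbb{E}\big[\|\nabla J(\theta_T)\|^2\big]
\;\le\;
\underbrace{C_1\,\varepsilon_{\mathrm{pert}}(\sigma, W)}_{\text{perturbation error}}
\;+\;
\underbrace{C_2\,\varepsilon_{\mathrm{conv}}(T, \alpha_t, \beta_t)}_{\text{convergence error}}.
```

The point made in the summary is that $W$ enters only the first term, so well-chosen attribution gates shrink the perturbation error without affecting the two-timescale convergence rate in the second term.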

Empirically, RSA2C is evaluated on three MuJoCo continuous‑control benchmarks: Hopper‑v4, Walker2d‑v4, and Ant‑v5. Two variants are tested—RSA2C‑KME (using only on‑manifold Shapley values) and RSA2C‑CME (incorporating off‑manifold conditional expectations). Both variants outperform strong baselines such as PPO, SAC, and TD3 in terms of average return (5–12 % improvement) while exhibiting markedly lower variance in learning curves. Additional experiments injecting stochastic state perturbations show that RSA2C‑CME maintains stable performance, confirming robustness. Visualizations of the Shapley attributions reveal intuitive patterns, e.g., joint angles and velocities receive higher importance in locomotion tasks, providing genuine interpretability of the learned policy.

In summary, RSA2C unifies (1) analytically computed, kernel‑based Shapley attributions, (2) Mahalanobis‑weighted policy updates, (3) sparse kernel function approximation, and (4) rigorous non‑asymptotic convergence under noisy states. This combination constitutes the first actor‑critic framework that simultaneously delivers sample‑efficient learning, provable stability, and intrinsic, dimension‑level explainability for high‑dimensional continuous control problems.

