Empowerment for Continuous Agent-Environment Systems
This paper develops generalizations of empowerment to continuous states. Empowerment is a recently introduced information-theoretic quantity motivated by hypotheses about the efficiency of the sensorimotor loop in biological organisms, but also from considerations stemming from curiosity-driven learning. Empowemerment measures, for agent-environment systems with stochastic transitions, how much influence an agent has on its environment, but only that influence that can be sensed by the agent sensors. It is an information-theoretic generalization of joint controllability (influence on environment) and observability (measurement by sensors) of the environment by the agent, both controllability and observability being usually defined in control theory as the dimensionality of the control/observation spaces. Earlier work has shown that empowerment has various interesting and relevant properties, e.g., it allows us to identify salient states using only the dynamics, and it can act as intrinsic reward without requiring an external reward. However, in this previous work empowerment was limited to the case of small-scale and discrete domains and furthermore state transition probabilities were assumed to be known. The goal of this paper is to extend empowerment to the significantly more important and relevant case of continuous vector-valued state spaces and initially unknown state transition probabilities. The continuous state space is addressed by Monte-Carlo approximation; the unknown transitions are addressed by model learning and prediction for which we apply Gaussian processes regression with iterated forecasting. In a number of well-known continuous control tasks we examine the dynamics induced by empowerment and include an application to exploration and online model learning.
💡 Research Summary
This paper extends the concept of empowerment—an information‑theoretic measure of an agent’s controllability and observability—to continuous, vector‑valued state spaces and to settings where the transition dynamics are initially unknown. Empowerment quantifies the mutual information between an agent’s action sequence and the subsequent sensor observations, thereby capturing how much influence the agent can exert on the environment that it can later perceive. Earlier work confined empowerment to small discrete domains with known transition probabilities, limiting its applicability to real‑world control problems.
To overcome these limitations the authors introduce two complementary techniques. First, they approximate empowerment in continuous spaces using Monte‑Carlo sampling. For each candidate action (or short action sequence) the method generates a large number of stochastic roll‑outs from the current state, producing a set of future state samples. A kernel density estimator (KDE) is then applied to these samples to obtain an empirical probability density function. From this density the marginal entropy of the predicted observations and the conditional entropy given the actions are estimated, yielding an empirical mutual information that serves as the empowerment value. By increasing the number of roll‑outs the approximation converges, and the sampling process can be parallelised, making the approach feasible for real‑time applications.
Second, because the transition model is not assumed to be known a priori, the paper employs Gaussian Process Regression (GPR) to learn the dynamics online. GPR provides a non‑parametric Bayesian model of the mapping from (state, action) pairs to next‑state distributions, delivering both a mean prediction and an uncertainty (covariance) estimate. To predict multi‑step ahead outcomes the authors adopt an “iterated forecasting” scheme: the one‑step GP prediction is fed back as input for the next step, and this process is repeated for the desired horizon. The resulting sequence of predictive distributions is used as the basis for the Monte‑Carlo roll‑outs, ensuring that model uncertainty is naturally incorporated into the empowerment calculation.
The combined Monte‑Carlo/KDE empowerment estimator and the GPR‑based dynamics learner are evaluated on several canonical continuous control benchmarks: Cart‑Pole, Mountain‑Car, Acrobot, and a 2‑D planar robotic arm. In each experiment the agent starts with no knowledge of the dynamics and incrementally updates its GP model as it interacts with the environment. Empowerment naturally drives exploration toward regions of high predictive uncertainty, because such regions promise higher potential information gain. Consequently, the agent collects informative data more efficiently than standard exploration strategies such as ε‑greedy, Boltzmann exploration, or random action selection.
Empirical results show that empowerment‑guided policies achieve higher success rates and require fewer interaction steps to solve the tasks. As the GP model becomes more accurate, the empowerment values converge toward the true mutual information between actions and observations, leading the agent to refine its behavior and focus on truly controllable and observable aspects of the environment. The paper demonstrates that empowerment can serve as an intrinsic reward signal that replaces external task‑specific rewards, enabling agents to self‑organize toward salient states solely based on the dynamics.
Key contributions of the work are: (1) a scalable Monte‑Carlo approximation of continuous‑state empowerment using KDE, (2) an online GP‑based transition model with iterated forecasting that supplies the necessary predictive distributions, (3) a demonstration that the synergy of these components yields efficient, curiosity‑driven exploration and rapid model learning in benchmark control problems.
The authors discuss several avenues for future research, including extending the method to high‑dimensional sensory inputs (e.g., images), handling multi‑agent scenarios where agents’ empowerment may interact, and deploying the approach on physical robotic platforms to assess real‑time performance and safety. By bridging the gap between theoretical empowerment and practical continuous control, the paper positions empowerment as a powerful intrinsic motivation mechanism for autonomous agents operating in complex, partially unknown environments.