Disentangling Active Inference from the Free Energy Principle: A Constrained Divergence Minimization Approach in Discrete State Spaces


📝 Abstract

We seek to clarify the concept of active inference by disentangling it from the Free Energy Principle. We show how the optimizations that need to be carried out in order to implement active inference in discrete state spaces can be formulated as constrained divergence minimization problems which can be solved by standard mean field methods that do not appeal to the idea of expected free energy. When it is used to model perception, the perception/action divergence criterion that we propose coincides with variational free energy. When it is used to model action, it differs from an expected free energy functional by an entropy regularizer.

📄 Content

The idea that sense perception can be understood as an unconscious process of inference dates back to Helmholtz. Among his many other scientific contributions is the concept of Helmholtz free energy which quantifies the energy in a system that is available to do physical work such as moving a piston. Hinton and his collaborators realized that approximate Bayesian inference in machine learning could be viewed as a problem of optimizing an abstract mathematical quantity which is formally identical to Helmholtz free energy and which has come to be known as variational free energy. (Thus the Helmholtz machine [1] and the Boltzmann machine [2].) In a long series of papers (notably [3,4,5]), Friston has developed the connections between these ideas and greatly expanded their scope by showing how action as well as sense perception can be modelled as approximate Bayesian inference.

If perceptions are inferences drawn from bodily sensations, then actions can be viewed as fulfilling predictions of future sensations. Whereas perceptual inference is thought to consist of computing approximate posterior probability distributions, active inference is a matter of computing approximate predictive distributions. Both types of inference can be formulated as optimization problems which are amenable to the sort of variational inference algorithms that have been developed in machine learning. In the case of perception, the objective function to be optimized is just variational free energy and this optimization can be achieved by the well known mean field approximation [6]. In the case of action, a new type of objective function known as expected free energy has been proposed and new variational methods have been developed to optimize it [7,8,9,10,11].
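The perceptual case can be made concrete with a small numerical sketch. The following toy example (the probabilities are illustrative, not from the paper) uses the standard definition of variational free energy, F[q] = E_q[log q(s) − log p(o, s)], and checks the textbook decomposition F[q] = KL(q(s) ‖ p(s|o)) − log p(o): the exact posterior attains the minimum, and any other approximate distribution pays a KL penalty.

```python
import numpy as np

# Toy generative model over 3 hidden states and a single observed outcome o.
# These numbers are illustrative, not taken from the paper.
p_s = np.array([0.5, 0.3, 0.2])          # prior p(s)
p_o_given_s = np.array([0.9, 0.1, 0.4])  # likelihood p(o | s) for the observed o

p_joint = p_s * p_o_given_s              # joint p(o, s)
p_o = p_joint.sum()                      # model evidence p(o)
posterior = p_joint / p_o                # exact posterior p(s | o)

def free_energy(q):
    """Variational free energy F[q] = E_q[log q(s) - log p(o, s)]."""
    return np.sum(q * (np.log(q) - np.log(p_joint)))

# F[q] = KL(q || p(s|o)) - log p(o), so the exact posterior attains
# the minimum value F = -log p(o).
assert np.isclose(free_energy(posterior), -np.log(p_o))

# Any other q incurs a strictly larger free energy (a positive KL term).
q_other = np.array([1/3, 1/3, 1/3])
assert free_energy(q_other) > free_energy(posterior)
```

Mean field methods exploit exactly this structure: since F[q] upper-bounds −log p(o), minimizing it over a factorized family of distributions q yields a tractable approximation to the posterior.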

“It is said that the Free Energy Principle is difficult to understand.” Thus the opening sentence of a review article that purports to simplify the Free Energy Principle but brooks no compromise on Friston’s program of grounding variational inference by biological agents in statistical physics [12]. For a physicist studying active inference, it is natural to assume that a biological agent models its world as a random dynamical system governed by a stochastic differential equation and to construe the agent’s striving to maintain homeostasis as a pullback attractor. However, in simulating the behaviour of biological agents or in designing autonomous AI agents capable of multistep hierarchical planning, an engineer would assume instead that an agent models its world as a discrete state space that evolves in discrete time steps using a Hidden Markov Model or Partially Observable Markov Decision Process. The mathematical apparatus that the engineer needs to deploy is much simpler than that required by the physicist.

In this paper we aim to give a self-contained and mathematically rigorous account of active inference in discrete state spaces without appealing to any of the machinery that has been developed in the context of continuous state spaces. One advantage of the discrete set up is that Hidden Markov Models accommodate the path integral formulation of the active inference problem quite readily. Another is that mixed continuous/discrete state models can be avoided since it is natural to model the actions available to an agent in a given situation as a discrete set and the actions themselves as discrete transition probability matrices. This enables the distinction between states and actions to be dissolved by augmenting the definition of a state to include a tag which indicates which action is currently underway. So a sufficiently rich Hidden Markov Model structure obviates the need to refer to actions explicitly. Sequences of actions (and even sequences of sequences of actions) can be handled in the same way as actions since they too can be modelled by transition probability matrices [13]. Thus we can conceive of a type of agent that is endowed with a Hidden Markov Model whose structure is sufficiently rich that it can encode all of the agent’s knowledge of the dynamics of the world it inhabits, including its own actions and policies and their sensory consequences. For such an agent, perceptual and active inference would consist in using this Hidden Markov Model to calculate posterior and predictive probability distributions conditioned on its history up to the present moment, if only these computations were tractable. Given that the agent does not have the resources to calculate these distributions exactly, it needs to resort to methods of calculating approximate posterior and predictive distributions that yield results which are accurate enough to be actionable.
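The action-tag construction described above can be sketched numerically. In this hypothetical illustration (state names, noise levels, and the uniform choice of next action tag are all assumptions for the example, not details from the paper), each hidden state is a (location, action) pair, so a single transition matrix over augmented states encodes action-dependent dynamics, and exact Bayesian filtering over the augmented state space yields the posterior that an unbounded agent would compute.

```python
import numpy as np

# Augmented states: each hidden state carries a tag for the action underway.
locations = ["left", "right"]
actions = ["stay", "move"]
states = [(loc, act) for loc in locations for act in actions]
n = len(states)

# Transition matrix over augmented states: "stay" keeps the location,
# "move" flips it; the next action tag is drawn uniformly for simplicity.
T = np.zeros((n, n))
for i, (loc, act) in enumerate(states):
    nxt_loc = loc if act == "stay" else ("right" if loc == "left" else "left")
    for j, (loc2, act2) in enumerate(states):
        if loc2 == nxt_loc:
            T[i, j] = 1.0 / len(actions)

# Observations depend only on the location, with 10% sensor noise.
O = np.array([[0.9, 0.1] if loc == "left" else [0.1, 0.9]
              for (loc, act) in states])

def filter_step(belief, obs_index):
    """One step of exact Bayesian filtering on the augmented HMM."""
    predicted = T.T @ belief            # one-step predictive distribution
    updated = O[:, obs_index] * predicted
    return updated / updated.sum()      # posterior over augmented states

belief = np.full(n, 1.0 / n)            # uniform prior over augmented states
for obs in [1, 1, 0]:                   # observe: right, right, left
    belief = filter_step(belief, obs)

assert np.allclose(T.sum(axis=1), 1.0)  # each row of T is a distribution
assert np.isclose(belief.sum(), 1.0)    # the posterior normalizes
```

Exact filtering is feasible here only because the augmented state space is tiny; for realistically rich models the same recursion becomes intractable, which is precisely why approximate posterior and predictive distributions are needed.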

If active inference is viewed in this way, there is no a priori requirement to appeal to the Free Energy Principle. So, although active inference and the Free Energy Principle are usually conflated (as in the title of the active inference textbook [14], for example), we will distinguish between the two. As we will see
