We consider a sequence of repeated interactions between an agent and an environment. Uncertainty about the environment is captured by a probability distribution over a space of hypotheses, which includes all computable functions. Given a utility function, we can evaluate the expected utility of any computational policy for interaction with the environment. After making some plausible assumptions (and maybe one not-so-plausible assumption), we show that if the utility function is unbounded, then the expected utility of any policy is undefined.
We will assume that the interaction between the agent and the environment takes place in discrete time-steps, or cycles. In cycle $n$, the agent outputs an action $y_n \in Y$, and the environment returns to the agent a perception $x_n \in X$. Here $Y$ and $X$ are the sets of possible actions and perceptions, respectively, and are considered as subsets of $\mathbb{N}$. Thus the story of all interaction between agent and environment is captured by the two sequences $x = (x_1, x_2, \ldots)$ and $y = (y_1, y_2, \ldots)$.
Let us introduce a notation for substrings. If $s$ is a sequence or string, and $\{a, b\} \subseteq \mathbb{N}$ with $a \le b$, then define $s_a^b = (s_a, s_{a+1}, \ldots, s_b)$. We will denote the function instantiated by the environment as $Q : Y^* \to X$, so that $\forall n \in \mathbb{N}$, $x_n = Q(y_1^n)$. This means that the perception generated by the environment at any given cycle is determined by the agent's actions on that and all previous cycles.
A policy for the agent is a function $p : (Y^* \times X^*) \to Y$, so that an agent implementing $p$ at time $n$ will choose the action $y_n = p(y_1^{n-1}, x_1^{n-1})$.
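To make the interaction protocol concrete, here is a minimal Python sketch (our illustration, not from the paper; `run_interaction`, `toy_env`, and `toy_policy` are names we introduce). The environment is modeled as a function from action strings to perceptions, and the policy maps the interaction history to the next action, exactly as in the definitions above.

```python
from typing import Callable, List, Tuple

# An environment Q maps an action string (y_1, ..., y_n) to a perception x_n.
Env = Callable[[Tuple[int, ...]], int]
# A policy p maps the history (y_1^{n-1}, x_1^{n-1}) to the next action y_n.
Policy = Callable[[Tuple[int, ...], Tuple[int, ...]], int]

def run_interaction(Q: Env, p: Policy, cycles: int) -> Tuple[List[int], List[int]]:
    """Run the loop y_n = p(y_1^{n-1}, x_1^{n-1}), x_n = Q(y_1^n)."""
    ys: List[int] = []
    xs: List[int] = []
    for _ in range(cycles):
        y = p(tuple(ys), tuple(xs))  # agent acts on the whole history so far
        ys.append(y)
        x = Q(tuple(ys))             # perception depends on all actions so far
        xs.append(x)
    return ys, xs

# Toy instances, purely illustrative: the environment reports the sum of all
# actions mod 10, and the policy increments its most recent perception.
def toy_env(actions: Tuple[int, ...]) -> int:
    return sum(actions) % 10

def toy_policy(ys: Tuple[int, ...], xs: Tuple[int, ...]) -> int:
    return (xs[-1] + 1) % 10 if xs else 0

ys, xs = run_interaction(toy_env, toy_policy, cycles=5)
```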
If, at any time, an agent adopts some policy $p$, and continues to follow that policy forever, then $p$ and $Q$ taken together completely determine the future of the sequences $(x_n)$ and $(y_n)$. We are particularly interested in the future sequence of perceptions, so we will define a future function $\Psi(Q, p, y_1^n, x_1^n) = x_{n+1}^\infty$. Because the precise nature of the environment $Q$ is unknown to the agent, we will let $\Omega$ be the set of possible environments. Let $F$ be a $\sigma$-algebra on $\Omega$, and $P : F \to [0, 1]$ be a probability measure on $F$ which represents the agent's prior information about the environment.
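Because $\Psi$ returns an infinite sequence, any computational stand-in for it must truncate. The sketch below (our construction; the `horizon` parameter is our simplification and does not appear in the paper) rolls the interaction loop forward from a given history and returns the next `horizon` perceptions:

```python
def future_perceptions(Q, p, ys, xs, horizon):
    """Truncated stand-in for Psi(Q, p, y_1^n, x_1^n): simulate `horizon`
    further cycles and return the perceptions (x_{n+1}, ..., x_{n+horizon})."""
    ys, xs = list(ys), list(xs)
    future = []
    for _ in range(horizon):
        y = p(tuple(ys), tuple(xs))
        ys.append(y)
        x = Q(tuple(ys))
        xs.append(x)
        future.append(x)
    return future
```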
We will also define a function $\Gamma_q : Y^* \to X^*$ which represents the perception string output by environment $q$ given some action string. Let $\Gamma_q(s) = (q(s_1^1), q(s_1^2), \ldots, q(s))$. The agent will compare the quality of different outcomes using a utility function $U : X^{\mathbb{N}} \to \mathbb{R}$. We can then judge a policy by calculating the expected utility of the outcome given that policy, which can be written as
\[
E\left[\, U\!\left( x_1^n \, \Psi(Q, p, y_1^n, x_1^n) \right) \;\middle|\; \Gamma_Q(y_1^n) = x_1^n \,\right],
\]
where $Q$ is being treated as a random variable. When we write a string next to a sequence, as in $x_1^n \, \Psi(Q, p, y_1^n, x_1^n)$, we mean to concatenate them. Here, $x_1^n$ represents what the agent has seen in the past, and $\Psi(Q, p, y_1^n, x_1^n)$ represents something the agent may see in the future. By concatenating them, we get a complete sequence of perceptions, which is the input required by the utility function $U$.
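For a finite hypothesis set, this conditional expectation can be computed directly: discard the environments inconsistent with the event $\Gamma_Q(y_1^n) = x_1^n$, renormalize the prior over those that remain, and average the utility of each rolled-out future. The following sketch is ours, reusing `future_perceptions` from above; it assumes a horizon-truncated utility `U`, whereas the paper's $U$ takes infinite sequences, so the truncation is our simplification.

```python
def gamma(q, ys):
    """Gamma_q(y_1^n): the perception string environment q outputs for ys."""
    return [q(tuple(ys[:k])) for k in range(1, len(ys) + 1)]

def expected_utility(hypotheses, prior, p, ys, xs, U, horizon):
    """E[U(x_1^n Psi(Q, p, y_1^n, x_1^n)) | Gamma_Q(y_1^n) = x_1^n], where the
    random variable Q ranges over a finite `hypotheses` list with `prior`
    weights. Assumes at least one hypothesis is consistent with the history."""
    consistent = [q for q in hypotheses if gamma(q, list(ys)) == list(xs)]
    z = sum(prior[q] for q in consistent)  # posterior normalizer
    total = 0.0
    for q in consistent:
        future = future_perceptions(q, p, ys, xs, horizon)
        total += (prior[q] / z) * U(list(xs) + future)  # concatenate past + future
    return total
```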
Notice that the expected utility above is a conditional expectation. Except on the very first time-step, the agent will already have some knowledge about the environment. After $n$ cycles, the agent has output the string $y_1^n$, and the environment has output the string $x_1^n$. Thus the agent's knowledge is given by the equation $\Gamma_Q(y_1^n) = x_1^n$. Agents such as AIXI (Hutter, 2007) choose actions by comparing the expected utility of different policies. Thus we will focus, in this paper, on calculating the expected utility of a policy.
Here we make further assumptions about the hypothesis space $(\Omega, F, P)$. While we could succinctly make strong assumptions that would justify our central claim, we will instead try to give somewhat weaker assumptions, even at some cost to brevity.
Let $\Omega_C$ be the set of computable total functions mapping $Y^*$ to $X$. We will assume that $\Omega \supseteq \Omega_C$ and that $(\forall q \in \Omega_C)$: $\{q\} \in F$ and $P(\{q\}) > 0$. Thus we assume that the agent assigns a nonzero probability to every computable environment function.
Let $\Omega_P$ be the set of computable partial functions from $Y^*$ to $X$. Then $\Omega_C \subset \Omega_P$. The computable partial functions $\Omega_P$ can be indexed by the natural numbers, using a surjective computable index function $\varphi : \mathbb{N} \to \Omega_P$. Since the codomain of $\varphi$ is a set of partial functions, it may be unclear what we mean when we say that $\varphi$ is computable. We mean that $(i, s) \mapsto (\varphi(i))(s)$, whose codomain is $X$, is a computable partial function. We will also use the notation $\varphi_i = \varphi(i)$.
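The map $(i, s) \mapsto \varphi_i(s)$ can only be partial, since some indices describe programs that diverge; this is precisely why it is $\Omega_P$, and not $\Omega_C$, that admits such an enumeration. The toy Python sketch below is ours and illustrative only: a genuine $\varphi$ would enumerate all programs of a universal machine, whereas this one indexes a small hand-picked family and models divergence with an explicit non-terminating branch.

```python
def phi(i):
    """Toy index function: map a natural number i to a (possibly partial)
    function from action strings to perceptions. A real phi must be
    surjective onto all computable partial functions; this one is not."""
    def f(s):
        if i % 3 == 0:
            return len(s) % 10        # a total function: string length mod 10
        if i % 3 == 1:
            return (sum(s) + i) % 10  # a total function: shifted sum mod 10
        while True:                   # a partial function: diverges on every
            pass                      # input, so phi(i) is nowhere defined
    return f

# (i, s) |-> phi(i)(s) is computable but partial: phi(2)(s) never returns,
# while phi(3)((4, 1, 5)) halts and evaluates to 3.
x = phi(3)((4, 1, 5))
```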
We'll now assume that there exists a computable total function $\rho : \mathbb{N} \to [0, 1]$ such that, whenever $\varphi_i \in \Omega_C$, we have $0 < \rho(i) \le P(\{\varphi_i\})$. Intuitively, we are saying that $\varphi$ is a way of describing computable functions using some sort of language, and that $\rho$ is a way of specifying lower bounds on probabilities based on these descriptions. Note that we make no assumption about $\rho(i)$ when $\varphi_i \notin \Omega_C$. To see an example of a hypothesis space satisfying all of our assumptions, let $\Omega = \Omega_C$, let $F = 2^{\Omega_C}$, and let $\varphi$ be any programming language.
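One way to complete this example (a sketch of our own, using a standard Solomonoff-style prior rather than anything taken verbatim from the text) is to encode programs as prefix-free binary strings. Writing $\ell(i)$ for the length of the $i$-th program, Kraft's inequality gives $\sum_i 2^{-\ell(i)} \le 1$, so we may define
\[
P(\{q\}) \;=\; \frac{1}{Z} \sum_{i \,:\, \varphi_i = q} 2^{-\ell(i)}
\quad \text{for } q \in \Omega_C,
\qquad \text{where} \quad
Z \;=\; \sum_{i \,:\, \varphi_i \in \Omega_C} 2^{-\ell(i)} \;\le\; 1.
\]
Then, whenever $\varphi_i \in \Omega_C$, we have $P(\{\varphi_i\}) \ge 2^{-\ell(i)} > 0$, so the computable total function $\rho(i) = 2^{-\ell(i)}$ supplies the required lower bounds; and, as permitted above, nothing needs to hold of $\rho(i)$ when $\varphi_i \notin \Omega_C$. Note that $Z$ itself need not be computable; only $\rho$ must be.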