Convergence of Expected Utility for Universal AI
Authors: Peter de Blanc
Department of Mathematics, Temple University
Date: October 23, 2018. Written in association with the Singularity Institute for Artificial Intelligence.

1. Abstract

We consider a sequence of repeated interactions between an agent and an environment. Uncertainty about the environment is captured by a probability distribution over a space of hypotheses, which includes all computable functions. Given a utility function, we can evaluate the expected utility of any computational policy for interaction with the environment. After making some plausible assumptions (and maybe one not-so-plausible assumption), we show that if the utility function is unbounded, then the expected utility of any policy is undefined.

2. AI Formalism

We will assume that the interaction between the agent and the environment takes place in discrete time-steps, or cycles. In cycle n, the agent outputs an action y_n ∈ Y, and the environment inputs to the agent a perception x_n ∈ X. Y and X are the sets of possible actions and perceptions, respectively, and are considered as subsets of ℕ. Thus the story of all interaction between agent and environment is captured by the two sequences x = (x_1, x_2, ...) and y = (y_1, y_2, ...).

Let us introduce a notation for substrings. If s is a sequence or string, and {a, b} ⊆ ℕ with a ≤ b, then define s_a^b = (s_a, s_{a+1}, ..., s_b).

We will denote the function instantiated by the environment as Q : Y* → X, so that ∀n ∈ ℕ, x_n = Q(y_1^n). This means that the perception generated by the environment at any given cycle is determined by the agent's actions on that and all previous cycles.
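As an illustration of the cycle structure (not part of the formalism), the interaction between a policy and an environment Q can be sketched as a loop; `echo_env` and `constant_policy` below are hypothetical toys:

```python
def run_interaction(policy, Q, cycles):
    """Run the agent-environment loop: in cycle n the agent emits an
    action y_n, and the environment replies with x_n = Q(y_1^n)."""
    xs, ys = [], []
    for n in range(cycles):
        y_n = policy(tuple(ys), tuple(xs))   # action from the history so far
        ys.append(y_n)
        x_n = Q(tuple(ys))                   # perception depends on y_1^n
        xs.append(x_n)
    return ys, xs

# Hypothetical toy: the environment echoes the latest action plus one.
echo_env = lambda y_hist: y_hist[-1] + 1
constant_policy = lambda ys, xs: 0
ys, xs = run_interaction(constant_policy, echo_env, 3)
# ys == [0, 0, 0] and xs == [1, 1, 1]
```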
A policy for the agent is a function p : (Y* × X*) → Y, so that an agent implementing p at time n will choose an action y_n = p(y_1^{n-1}, x_1^{n-1}). If, at any time, an agent adopts some policy p, and continues to follow that policy forever, then p and Q taken together completely determine the future of the sequences (x_n) and (y_n). We are particularly interested in the future sequence of perceptions, so we will define a future function Ψ(Q, p, y_1^n, x_1^n) = x_{n+1}^∞.

Because the precise nature of the environment Q is unknown to the agent, we will let Ω be the set of possible environments. Let F be a σ-algebra on Ω, and P : F → [0, 1] be a probability measure on F which represents the agent's prior information about the environment. We will also define a function Γ_q : Y* → X* which represents the perception string output by environment q given some action string. Let Γ_q(s) = (q(s_1^1), q(s_1^2), ..., q(s)).

The agent will compare the quality of different outcomes using a utility function U : X^ℕ → ℝ. We can then judge a policy by calculating the expected utility of the outcome given that policy, which can be written as

(1)    E(U(x_1^n Ψ(Q, p, y_1^n, x_1^n)) | Γ_Q(y_1^n) = x_1^n)

...where Q is being treated as a random variable. When we write a string next to a sequence, as in x_1^n Ψ(Q, p, y_1^n, x_1^n), we mean to concatenate them. Here, x_1^n represents what the agent has seen in the past, and Ψ(Q, p, y_1^n, x_1^n) represents something the agent may see in the future. By concatenating them, we get a complete sequence of perceptions, which is the input required by the utility function U.

Notice that the expected utility above is a conditional expectation. Except on the very first time-step, the agent will already have some knowledge about the environment.
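As a concrete sketch of that knowledge, the map Γ_q and the consistency condition Γ_Q(y_1^n) = x_1^n can be written out directly; `q_sum` here is a hypothetical toy environment, not part of the formalism:

```python
def gamma(q, s):
    """Gamma_q(s) = (q(s_1^1), q(s_1^2), ..., q(s)): the perception
    string produced by environment q on action string s."""
    return tuple(q(s[:k]) for k in range(1, len(s) + 1))

def consistent(q, actions, perceptions):
    """Does environment q reproduce the observed history, i.e. does
    Gamma_q(y_1^n) == x_1^n hold?"""
    return gamma(q, actions) == tuple(perceptions)

# Hypothetical toy environment: perception is the sum of all actions so far.
q_sum = lambda actions: sum(actions)
# gamma(q_sum, (1, 2, 3)) == (1, 3, 6)
```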
After n cycles, the agent has output the string y_1^n, and the environment has output the string x_1^n. Thus the agent's knowledge is given by the equation Γ_Q(y_1^n) = x_1^n.

Agents such as AIXI (Hutter, 2007) choose actions by comparing the expected utility of different policies. Thus we will focus, in this paper, on calculating the expected utility of a policy.

3. Assumptions about the Hypothesis Space

Here we'll make further assumptions about the hypothesis space (Ω, F, P). While we could succinctly make strong assumptions that would justify our central claim, we will instead try to give somewhat weaker assumptions, even at the loss of some brevity.

Let Ω_C be the set of computable total functions mapping Y* to X. We will assume that Ω ⊇ Ω_C and that (∀q ∈ Ω_C) : {q} ∈ F and P({q}) > 0. Thus we assume that the agent assigns a nonzero probability to any computable environment function.

Let Ω_P be the set of computable partial functions from Y* to X. Then Ω_C ⊂ Ω_P. The computable partial functions Ω_P can be indexed by natural numbers, using a surjective computable index function φ : ℕ → Ω_P. Since the codomain of φ is a set of partial functions, it may be unclear what we mean when we say that φ is computable. We mean that (i, s) → (φ(i))(s), whose codomain is X, is a computable partial function. We will also use the notation φ_i = φ(i).

We'll now assume that there exists a computable total function ρ : ℕ → ℚ such that if φ_i ∈ Ω_C, then 0 < ρ(i) ≤ P({φ(i)}). Intuitively, we are saying that φ is a way of describing computable functions using some sort of language, and that ρ is a way of specifying lower bounds on probabilities based on these descriptions. Note that we make no assumption about ρ(i) when φ_i ∉ Ω_C.
To see an example of a hypothesis space satisfying all of our assumptions, let Ω = Ω_C, let F = 2^Ω, let φ be any programming language, and let ρ(i) = 2^(-i). Let

(2)    O = Σ_{i : φ_i ∈ Ω_C} ρ(i)

and for any ω ∈ Ω, let

(3)    P({ω}) = (1/O) Σ_{i : φ_i = ω} ρ(i)

4. Assumptions about the Utility Function

Perhaps the most philosophically questionable assumption in this paper has already been made in defining the domain of the utility function U as X^ℕ, the set of perception-sequences. This is like assuming that a person cares not about his or her family and friends, but about his or her perception of his or her family and friends.

Since the utility function U : X^ℕ → ℝ takes as its argument an infinite sequence, we must discuss what it means for such a function to be computable. Obviously any computation which terminates can only look at a finite number of terms. Therefore we will try to approximate U(x) using prefixes of x. We say that U is computable if there exist computable functions U_L, U_U : X* → ℚ ∪ {−∞, +∞} such that, if x ∈ X^ℕ and x̄ ∈ X* and x̄ ⊑ x, then:

• U_L(x̄) ≤ U(x) ≤ U_U(x̄)
• U_L(x̄) → U(x) and U_U(x̄) → U(x) as x̄ → x.

In any case, we will not assume that U is computable, because we do not need such a strong assumption to prove our claims. Instead we will define two possible conditions.

Definition 1. Let D ⊆ X^ℕ and let U : D → ℝ. Let D_p = {s ∈ X* | (∃d ∈ D) : s ⊑ d}. Then U is computably unbounded from above on D if there exists a computable partial function U_L : D_p → ℤ such that:

• (∀d ∈ D)(∀s ∈ D_p) : if s ⊑ d, and if U_L(s) exists, then U_L(s) ≤ U(d).
• (∀m ∈ ℤ)(∃s ∈ D_p) : U_L(s) > m.

U is computably unbounded from below if −U is computably unbounded from above.
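The normalization in equations (2) and (3) can be sketched numerically. The index set, the labels `env_a`/`env_b`, and the finite truncation below are all hypothetical simplifications; the paper's sums range over all indices of total computable environments:

```python
from fractions import Fraction

# Hypothetical toy index set: five indices, several naming the same
# environment, with rho(i) = 2^-i as in the example above.
phi = {0: "env_a", 1: "env_b", 2: "env_a", 3: "env_b", 4: "env_a"}
rho = lambda i: Fraction(1, 2 ** i)

O = sum(rho(i) for i in phi)                          # equation (2)
P = {w: sum(rho(i) for i in phi if phi[i] == w) / O   # equation (3)
     for w in set(phi.values())}
# P["env_a"] == 21/31, P["env_b"] == 10/31, and the masses sum to 1.
```

Dividing by O makes P a probability measure even though many index strings describe the same environment, which is why ρ only needs to be a lower bound on P({φ(i)}).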
Note in particular that any computable function on X^ℕ which is unbounded from above is computably unbounded from above, and any computable function which is unbounded from below is computably unbounded from below.

The following lemma will help us find environments which generate large amounts of utility. When considering f in the lemma, think of U_L above.

Lemma 1. Suppose C ⊆ X*, and f : C → ℤ is a computable partial function such that (∀m ∈ ℤ)(∃c ∈ C) : f(c) > m. Then there exists a computable total function H : ℤ → C such that (∀m ∈ ℤ) : f ∘ H(m) ≥ m.

In other words, given an unbounded partial function f, there is a computable function H which finds an input on which f will exceed any given bound.

Proof. First we'll index C; let C = {c_1, c_2, ...}. If f were a total function, we could simply let H(m) = c_{min{i ∈ ℕ : f(c_i) > m}}. We would compute this by first computing f(c_1), then f(c_2), etc. Unfortunately we only have that f is a partial function, so we cannot proceed in this way. Instead, we'll note that for any input on which f halts, it must halt in a specific number of steps. The Cantor pairing function π : ℕ × ℕ → ℕ, π(k_1, k_2) = (1/2)(k_1 + k_2)(k_1 + k_2 + 1) + k_2, is a bijection, so we can use π^{-1} to index all pairs of natural numbers. Then we can simulate f on every possible input for every number of steps, which will allow us to evaluate f on every input for which f halts.

    def H(m):
        for n in (1, 2, 3, ...):
            let (t, i) = pi^{-1}(n)
            simulate f(c_i) for t steps
            if it does not finish: do nothing
            if it does finish and f(c_i) >= m: return c_i

Then H is a computable total function and f ∘ H(m) ≥ m.
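The dovetailing in this proof can be made runnable if we model step-bounded simulation explicitly. Here `f_sim(i, t)` is a hypothetical stand-in that reports whether f(c_i) halts within t steps; a real implementation would obtain this from an interpreter running f:

```python
def cantor_unpair(n):
    """Inverse of the Cantor pairing function
    pi(k1, k2) = (k1 + k2)(k1 + k2 + 1)/2 + k2."""
    w = int(((8 * n + 1) ** 0.5 - 1) // 2)
    t = w * (w + 1) // 2
    k2 = n - t
    k1 = w - k2
    return k1, k2

def make_H(f_sim, c):
    """Build H with f(H(m)) >= m by dovetailing over (steps, input) pairs.
    f_sim(i, t) must return (True, f(c_i)) if f(c_i) halts within t steps,
    and (False, None) otherwise -- the lemma's simulation, modeled abstractly."""
    def H(m):
        n = 0
        while True:
            t, i = cantor_unpair(n)
            halted, value = f_sim(i, t)
            if halted and value >= m:
                return c[i]
            n += 1
    return H

# Hypothetical toy: f(c_i) = i when i is even (halting after i steps),
# and f diverges on odd i -- a genuinely partial function.
c = list(range(1000))
f_sim = lambda i, t: (True, i) if i % 2 == 0 and t >= i else (False, None)
H = make_H(f_sim, c)
# H(10) finds c_10 = 10, the first even input with value >= 10.
```

Because every (t, i) pair is eventually enumerated, H never gets stuck on a diverging input, which is exactly the point of the dovetailing construction.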
5. Results

Let R be the set of all computable partial functions mapping ℕ to ℕ, and let θ : ℕ → R be a computable index (analogous to our other index function φ). Let

(4)    B(n) = max_{k ≤ n} θ_k(0)

Lemma 2. Let f ∈ R be a total function. Then B(n) > f(n) infinitely often.

Proof. Suppose not. Then B(n) > f(n) only finitely many times, so there exists some c ∈ ℕ such that (∀n ∈ ℕ) : f(n) + c > B(n). Let C(n, m) = f(n) + c. By a corollary of the Recursion Theorem, there exists m ∈ ℕ such that (∀n ∈ ℕ) : θ_m(n) = C(m, n) = f(m) + c. By definition, B(m) ≥ θ_m(0) = C(m, 0) = f(m) + c > B(m). So B(m) > B(m), which is a contradiction.

Now suppose that at time n + 1, the agent has already taken actions y_1^n and made observations x_1^n, and is considering the expected utility of policy p. Let D = {s ∈ X^ℕ : s_1^n = x_1^n}.

Theorem 1. If U is computably unbounded from above on D, then E(U(x_1^n Ψ(Q, p, y_1^n, x_1^n)) | Γ_Q(y_1^n) = x_1^n) is either undefined or +∞.

Proof. Let U_L : D_p → ℤ be as in Definition 1. Then by Lemma 1, there exists H : ℤ → D_p such that (∀m ∈ ℤ) : U_L(H(m)) ≥ m. H here is intended to be used to construct sequences with high utility. Since H outputs a string rather than a sequence, we will pad it to get a sequence. Let c ∈ X be some arbitrary word in the perception alphabet. Then let H̄ : ℤ → D, where H̄(n) is the sequence beginning with H(n), followed by c, c, c, ....

For brevity, let W_p(q) = x_1^n Ψ(q, p, y_1^n, x_1^n). W_p(q) represents the complete sequence of perceptions received by the agent, assuming that it continues to implement policy p in environment q. We will now break up the expected utility into two terms, depending on whether or not Q ∈ Ω_C.
E(U(W_p(Q)) | Γ_Q(y_1^n) = x_1^n)
    = P(Q ∈ Ω_C) E(U(W_p(Q)) | Γ_Q(y_1^n) = x_1^n, Q ∈ Ω_C)
      + P(Q ∉ Ω_C) E(U(W_p(Q)) | Γ_Q(y_1^n) = x_1^n, Q ∉ Ω_C)
    = Σ_{q ∈ Ω_C} U(W_p(q)) P({q} | Γ_Q(y_1^n) = x_1^n)
      + P(Q ∉ Ω_C) E(U(W_p(Q)) | Γ_Q(y_1^n) = x_1^n, Q ∉ Ω_C)

We will show that the series

    Σ_{q ∈ Ω_C} U(W_p(q)) P({q} | Γ_Q(y_1^n) = x_1^n)

has infinitely many terms ≥ 1. We will do this by finding a sequence of environments whose utilities grow very quickly, more quickly than their probabilities can shrink.

By equation (4), for each j ∈ ℕ there exists u_j ∈ ℕ such that u_j ≤ j and θ_{u_j}(0) = B(j). Now we define a map on function indices G : ℕ → ℕ such that:

    φ_{G(n)}(γ) = H̄(θ_n(0))_{|γ|}

So G takes the θ-index of an ℕ → ℕ function (say, g), and returns the φ-index of an environment which is compatible with all the data so far, and which is guaranteed to produce utility at least g(0). We can assume that G is a computable function.

So our sequence of environments will be (φ_{G(u_j)})_{j=1}^∞. Then U(W_p(φ_{G(u_j)})) ≥ B(j). Now let

(5)    ρ̄(j) = ⌈max_{k ≤ j} 1/ρ(G(k))⌉

Then ρ̄ is a computable, nondecreasing function. Since ρ̄ is computable and total, by Lemma 2, B(j) ≥ ρ̄(j) infinitely often. Since u_j ≤ j, then by definition, ρ̄(j) ≥ 1/ρ(G(u_j)) ≥ 1/P({φ_{G(u_j)}}).

Since P(Γ_Q(y_1^n) = x_1^n | Q = φ_{G(u_j)}) = 1, by Bayes' Rule, P({φ_{G(u_j)}} | Γ_Q(y_1^n) = x_1^n) ≥ P({φ_{G(u_j)}}). Since both sides are positive, we take the reciprocal to get 1/P({φ_{G(u_j)}} | Γ_Q(y_1^n) = x_1^n) ≤ 1/P({φ_{G(u_j)}}). By transitivity, U(W_p(φ_{G(u_j)})) ≥ 1/P({φ_{G(u_j)}} | Γ_Q(y_1^n) = x_1^n) infinitely often, so U(W_p(φ_{G(u_j)})) P({φ_{G(u_j)}} | Γ_Q(y_1^n) = x_1^n) ≥ 1 infinitely often.
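A toy numeric illustration (not the paper's construction) of this divergence argument: take utilities that grow exactly as fast as the conditional probabilities shrink, so every term of the expected-utility series is ≥ 1 and the partial sums never settle:

```python
# Hypothetical toy: environments indexed by j with conditional probability
# p_j = 2^-(j+1) and utility u_j = 2^(j+1), mirroring "utilities grow more
# quickly than probabilities can shrink".
probs = [2.0 ** -(j + 1) for j in range(50)]
utils = [2.0 ** (j + 1) for j in range(50)]
terms = [u * p for u, p in zip(utils, probs)]          # every term equals 1
partial_sums = [sum(terms[:k + 1]) for k in range(50)]
# The partial sums grow without bound: the sum of the first k terms is k.
```

With infinitely many terms of size at least 1, no rearrangement or tail behavior can rescue convergence, which is the situation the theorem establishes for the true series.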
Since the series contains infinitely many terms ≥ 1, its limit is either +∞ or nonexistent.

Corollary 1. If U is computably unbounded from below on D, then E(U(x_1^n Ψ(Q, p, y_1^n, x_1^n)) | Γ_Q(y_1^n) = x_1^n) is either undefined or −∞.

Proof. By definition, −U is computably unbounded from above. Thus, by Theorem 1, E(−U(x_1^n Ψ(Q, p, y_1^n, x_1^n)) | Γ_Q(y_1^n) = x_1^n) is either undefined or +∞. So E(U(x_1^n Ψ(Q, p, y_1^n, x_1^n)) | Γ_Q(y_1^n) = x_1^n) = −E(−U(x_1^n Ψ(Q, p, y_1^n, x_1^n)) | Γ_Q(y_1^n) = x_1^n) is either undefined or −∞.

Corollary 2. If U is computably unbounded from both below and above on D, then E(U(x_1^n Ψ(Q, p, y_1^n, x_1^n)) | Γ_Q(y_1^n) = x_1^n) is undefined.

Proof. By Theorem 1, E(U(x_1^n Ψ(Q, p, y_1^n, x_1^n)) | Γ_Q(y_1^n) = x_1^n) is either undefined or +∞. By Corollary 1, it is either undefined or −∞. Thus it is undefined.

6. Discussion

Our main result implies that if you have an unbounded, perception-determined, computable utility function, and you use a Solomonoff-like prior (Solomonoff, 1964), then you have no way to choose between policies using expected utility. So which of these things should we change?

We could use a non-perception-determined utility function. Then our main result would not apply. In this case, the existence of bounded expected utility will depend on the utility function. It may be possible to generalize our argument to some larger class of utility functions which have a different domain.

We could use an uncomputable utility function. For instance, if the utility of any perception-sequence is defined as equal to its Kolmogorov complexity, then the utility function is unbounded but the expected utility of any policy is finite.
We could use a smaller hypothesis space; perhaps not all computable environments should be considered.

The simplest approach may be to use a bounded utility function. Then convergence is guaranteed.

References

[1] Hutter, M., Universal Algorithmic Intelligence: A mathematical top-down approach, Artificial General Intelligence (2007), Springer.
[2] Wikipedia, Kleene's Recursion Theorem (http://en.wikipedia.org/wiki/Kleene's_recursion_theorem).
[3] Solomonoff, R., A Formal Theory of Inductive Inference, Part I, Information and Control, Vol. 7, No. 1, pp. 1-22, March 1964.