Diverse Consequences of Algorithmic Probability

Eray Özkural
Computer Engineering Department, Bilkent University, Ankara, Turkey

Abstract. We reminisce and discuss applications of algorithmic probability to a wide range of problems in artificial intelligence, philosophy and technological society. We propose that Solomonoff has effectively axiomatized the field of artificial intelligence, therefore establishing it as a rigorous scientific discipline. We also relate to our own work in incremental machine learning and philosophy of complexity.

1 Introduction

Ray Solomonoff was a pioneer in mathematical Artificial Intelligence (AI), whose proposal of Algorithmic Probability (ALP) has led to diverse theoretical consequences and applications, most notably in AI. In this paper, we try to give a sense of the significance of his theoretical contributions, reviewing the essence of his proposal in an accessible way, and recounting a few, seemingly unrelated, diverse consequences which, in our opinion, hint towards a philosophically clear world-view that has rarely been acknowledged by the greater scientific community. That is to say, we try to give the reader a glimpse of what it is like to consider the consequences of ALP, and what ideas might lie behind the theoretical model, as we imagine them.

Let M be a reference machine which corresponds to a universal computer¹ with a prefix-free code. In a prefix-free code, no code is a prefix of another. This is also called a self-delimiting code, as most reasonable computer programming languages are. Solomonoff inquired about the probability that an output string x is generated by M, considering the whole space of possible programs. By giving each program bitstring p an a priori probability of 2^{-|p|}, we can ensure that the space of programs meets the probability axioms (by the extended Kraft inequality [2]). In other words, we imagine that we toss a fair coin to generate each bit of a random program. This probability model of programs entails the following probability mass function (p.m.f.) for strings x ∈ {0,1}*:

  P_M(x) = \sum_{M(p) = x*} 2^{-|p|}    (1)

which is the probability that a random program will output a prefix of x. P_M(x) is called the algorithmic probability of x, for it assumes the definition of program-based probability. We use P when M is clear from the context, to avoid clutter.

¹ Optionally, it can be probabilistic to deal with general induction problems, i.e., it has access to a random number generator [1, Section 4].

2 Solomonoff Induction

Using this probability model of bitstrings, one can make predictions. Intuitively, we can state that it is impossible to imagine intelligence in the absence of any prediction ability: purely random behavior is decisively non-intelligent. Since P is a universal probability model, it can be used as the basis of universal prediction, and thus intelligence. Perhaps Solomonoff's most significant contributions were in the field of AI, as he envisioned a machine that can learn anything from scratch. Reviewing his early papers such as [3,4], we see that he established the theoretical justification for the machine learning and data mining fields. Few researchers could ably make claims about universal intelligence as he did. Unfortunately, not all of his ideas have reached fruition in practice; yet there is little doubt that his approach was the correct basis for a science of intelligence.
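To make equation (1) concrete before turning to his induction proposal, here is a brute-force sketch of ours (not the paper's system): it approximates P_M(x) by enumerating every program up to a length bound on a toy reference machine. The interpreter run, the step budget, and all names are assumptions for illustration; M is assumed to use a prefix-free code, so the weights 2^{-|p|} sum to at most 1.

    ;; A minimal sketch of approximating equation (1) by enumeration.
    ;; Assumed interface: (run program steps) executes a bit-list
    ;; program for at most `steps' steps, returning the output as a
    ;; list of bits, or #f if it fails to halt in time.

    (define (prefix? x y)                 ; is x a prefix of y?
      (cond ((null? x) #t)
            ((null? y) #f)
            ((equal? (car x) (car y)) (prefix? (cdr x) (cdr y)))
            (else #f)))

    (define (bitstrings len)              ; all bit lists of length len
      (if (= len 0)
          '(())
          (let ((shorter (bitstrings (- len 1))))
            (append (map (lambda (p) (cons 0 p)) shorter)
                    (map (lambda (p) (cons 1 p)) shorter)))))

    (define (approx-alp x run max-len steps)
      ;; sum 2^-|p| over halting programs p whose output extends x
      (let loop ((len 1) (total 0))
        (if (> len max-len)
            total
            (loop (+ len 1)
                  (+ total
                     (apply +
                            (map (lambda (p)
                                   (let ((out (run p steps)))
                                     (if (and out (prefix? x out))
                                         (expt 2 (- len))
                                         0)))
                                 (bitstrings len))))))))

Normalizing the truncated sum yields a next-bit predictor, as discussed below: the estimate of P(1 | x) is approx-alp applied to x extended by 1, divided by the sum of the estimates for the two extensions of x.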
His main proposal for machine learning is inductive inference [5,6], circa 1964, for a variety of problems such as sequence prediction, set induction, operator induction and grammar induction [7]. Without much loss of generality, we can discuss sequence prediction on bitstrings. Assume that there is a computable p.m.f. of bitstrings P_1. Given a bitstring x drawn from P_1, we can define the conditional probability of the next bit simply by normalizing (1) [7]. Algorithmically, we would have to approximate (1) by finding short programs that generate x (the shortest of which is the most probable). In more general induction, we run all models in parallel, quantifying fit-to-data, weighed by the algorithmic probability of the model, to find the best models and construct distributions [7]; the common point being determining good models with high a priori probability. Finding the shortest program in general is undecidable; however, Levin search [8] can be used for this purpose.

There are two important results about Solomonoff induction that we shall mention here. First, Solomonoff induction converges very rapidly to the real probability distribution. The convergence theorem shows that the expected total square error is related only to the algorithmic complexity of P_1, which is independent of x. The following bound [9] is discussed at length in [10] with a concise proof:

  E_{P_1}\left[ \sum_{m=1}^{n} \left( P(a_{m+1} = 1 \mid a_1 a_2 \ldots a_m) - P_1(a_{m+1} = 1 \mid a_1 a_2 \ldots a_m) \right)^2 \right] \le -\frac{1}{2} \ln P(P_1)    (2)

This bound characterizes the divergence of the ALP solution from the real probability distribution P_1. P(P_1) is the a priori probability of the p.m.f. P_1 according to our universal distribution P_M. On the right-hand side of (2), -ln P_M(P_1) is roughly k ln 2, where k is the Kolmogorov complexity of P_1 (the length of the shortest program that defines it); thus the total expected error is bounded by a constant, which guarantees that the error decreases very rapidly as example size increases.

Secondly, there is an optimal search algorithm to approximate Solomonoff induction, which adopts Levin's universal search method to solve the problem of universal induction [8,11]. The universal search procedure time-shares all candidate programs according to their a priori probability, with a clever watchdog policy to avoid the practical impact of the undecidability of the halting problem [11]. The search procedure starts with a time limit t = t_0; in each iteration it tries all candidate programs c with a time limit of t·P(c), and while a solution is not found, it doubles the time limit t. The time t(s)/P(s) for a solution program s taking time t(s) is called the Conceptual Jump Size (CJS), and it is easily shown that Levin search terminates in at most 2·CJS time. To obtain alternative solutions, one may keep running after the first solution is found, as there may be more probable solutions that need more time. The optimal solution is computable only in the limit, which turns out to be a desirable property of Solomonoff induction, as it is complete and uncomputable [12, Section 2]. An explanation of Levin's universal search procedure and its application to Solomonoff induction may be found in [8,11,13].
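The doubling loop just described fits in a few lines. The sketch below is our illustration, not Solomonoff's or Levin's code; run, prior and solved? are assumed interfaces, and the candidate set is a finite list. Each phase grants every candidate c a budget of t·P(c) steps and doubles t on failure, so a solution s is found once t reaches roughly t(s)/P(s), which gives the 2·CJS bound.

    ;; A minimal sketch of Levin search over a finite candidate list.
    ;; Assumed interfaces: (run c steps) -> output or #f,
    ;; (prior c) -> the a priori probability of c,
    ;; (solved? out) -> #t when the output solves the problem.

    (define (levin-search candidates run prior solved? t0 max-phases)
      (let phase ((t t0) (n 0))
        (and (<= n max-phases)                  ; give up eventually
             (let try ((cs candidates))
               (if (null? cs)
                   (phase (* 2 t) (+ n 1))      ; no solution: double t
                   (let* ((c (car cs))
                          (budget (inexact->exact
                                   (floor (* t (prior c)))))
                          (out (and (> budget 0) (run c budget))))
                     (if (and out (solved? out))
                         c                      ; first solution found
                         (try (cdr cs)))))))))

As noted above, one may keep the loop running past the first success to collect alternative, possibly more probable, solutions.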
3 The Axiomatization of Artificial Intelligence

We believe, in fact, that Solomonoff's work was seminal in that he single-handedly axiomatized AI, discovering the minimal necessary conditions for any machine to attain general intelligence (based on our interpretation of [1]). Informally, these axioms are:

AI0 AI must have in its possession a universal computer M (Universality).
AI1 AI must be able to learn any solution expressed in M's code (Learning recursive solutions).
AI2 AI must use probabilistic prediction (Bayes' theorem).
AI3 AI must embody in its learning a principle of induction (Occam's razor).

While it may be possible to give a more compact characterization, these are ultimately what is necessary for the kind of general learning that Solomonoff induction achieves. ALP can be seen as a complete formalization of Occam's razor (as well as Epicurus's principle) [14] and thus serves as the foundation of universal induction, capable of solving all AI problems of significance. The axioms are important because they allow us to assess whether a system is capable of general intelligence or not.

Obviously, AI1 entails AI0; therefore AI0 is redundant and could be omitted entirely. We state it separately only for historical reasons, as one of the landmarks of early AI research, in retrospect, was the invention of the universal computer, which goes back to Leibniz's idea of a universal language (characteristica universalis) that can express every statement in science and mathematics, and has found its perfect embodiment in Turing's research [15,16]. A related achievement of early AI was the development of LISP, a universal computer based on lambda calculus (a functional model of computation) that has shaped much of early AI research.

See also a recent survey about inductive inference [17] with a focus on the Minimum Message Length (MML) principle introduced in 1968 [18]. The MML principle is also a formalization of induction, developed within the framework of classical information theory, which establishes a trade-off between model complexity and fit-to-data by finding the minimal message that encodes both the model and the data [19]. This trade-off is quite similar to the earlier forms of induction that Solomonoff developed, but was discovered independently. Dowe points out that Occam's razor means choosing the simplest single theory when data are equally matched, which MML formalizes perfectly (and it remains functional in the case of unequal fits), while Solomonoff induction maintains a mixture of alternative solutions [17, Sections 2.4 & 4]. On the other hand, the diversity of solutions in ALP is seen as desirable by Solomonoff himself [12], and in a recent philosophical paper which illustrates how Solomonoff induction dissolves various philosophical objections to induction [14]. Nevertheless, it is well worth mentioning that Solomonoff induction (formal theory published in 1964 [5,6]), MML (1968), and Minimum Description Length [20] formalizations, as well as Statistical Learning Theory [21] (initially developed in 1960), all provide a principle of induction (AI3). However, it was Solomonoff who first observed the importance of universality for AI (AI0-AI1).
The plurality of probabilistic approaches to induction supports the importance of AI3 (as well as hinting that diversity of solutions may be useful). AI2, however, does not require much explanation. Some objections to Bayesianism are answered using MML in [22]. Please also see an intriguing paper by Wallace and Dowe [23] on the relation between MML and Kolmogorov complexity, which states that Solomonoff induction is tailored to prediction rather than inference, and recommends non-universal models in practical work, therefore becoming incompatible with the AI axioms (AI0-AI1). Ultimately, empirical work will illuminate whether our AI axioms should be adopted, or whether more restrictive models are sufficient for universal intelligence; therefore such alternative viewpoints must be considered. In addition to this, Dowe discusses the relation between inductive inference and intelligence, and the requirements of intelligence, as we do elsewhere [17, Section 7.3]. Also relevant is an adaptive universal intelligence test that aims to measure the intelligence of any AI agent, and discusses various definitions of intelligence [24].

4 Incremental Machine Learning

In solving a problem of induction, the aforementioned search methods suffer from the huge computational complexity of trying to compress the entire input. For instance, if the complexity of the p.m.f. P_1 is about 400 bits, Levin search would take on the order of 2^400 times the running time of the solution program, which is infeasible (quite impossible in the observed universe). Therefore, Solomonoff suggested using an incremental machine learning algorithm, which can re-use information found in previous solutions [13].

The following argument illustrates the situation more clearly. Let P_1 and P_2 be the p.m.f.'s corresponding to a training sequence of two induction problems (any of them, not necessarily sequence prediction, to which others can be reduced easily) with data <d_1, d_2>. Assume that the first problem has been solved (correctly) with universal search. It has taken at most 2·CJS_1 = 2·t(s_1)/P(s_1) time. If the second problem is solved in an incremental fashion, making use of the information from P_1, then the running time of discovering a solution s_2 for d_2 is reduced, depending on the success of information transfer across problems. Here, we quantify how much, in familiar probabilistic terms. In [10], Solomonoff describes an information-theoretic interpretation of ALP, which suggests the following entropy function:

  H^*(x) = -\log_2 P(x)    (3)

This entropy function has perfect sub-additivity of information according to the corresponding conditional entropy definition:

  P(y \mid x) = \frac{P(x,y)}{P(x)}    (4)

  H^*(y \mid x) = -\log_2 P(y \mid x)    (5)

  H^*(x,y) = H^*(x) + H^*(y \mid x)    (6)

This definition of entropy thus does not suffer from the additive constant terms of Chaitin's version. We can instantly define the mutual entropy:

  H^*(x : y) = H^*(x) + H^*(y) - H^*(x,y) = H^*(y) - H^*(y \mid x)    (7)

which trivially follows. A KUSP machine is a universal computer that can store data and methods in additional storage. In 1984, Solomonoff observed that KUSP machines are especially suitable for incremental learning [11].
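Definitions (3)-(7) translate directly into code. The sketch below is illustrative, with p and p-joint standing for an assumed probability model that supplies P(x) and P(x,y); it makes the exact sub-additivity easy to verify numerically, since h*-joint coincides with the sum of h* and h*-conditional with no additive constant.

    ;; The entropy definitions (3)-(7) over an assumed probability
    ;; model: (p x) -> P(x), (p-joint x y) -> P(x,y).

    (define (log2 z) (/ (log z) (log 2)))

    (define (h* p x)                              ; (3)
      (- (log2 (p x))))

    (define (h*-conditional p p-joint x y)        ; (4) and (5)
      (- (log2 (/ (p-joint x y) (p x)))))

    (define (h*-joint p-joint x y)                ; equals (6) exactly
      (- (log2 (p-joint x y))))

    (define (h*-mutual p p-joint x y)             ; (7)
      (- (+ (h* p x) (h* p y))
         (h*-joint p-joint x y)))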
In our work [25], we found that the incremental learning approach was indeed useful (as in the preceding OOPS algorithm [26]). Here is how we interpreted incremental learning. After each induction problem, the p.m.f. P is updated; thus for every new problem a new probability distribution is obtained. Although we are using the same reference machine M for trial programs, we are referring to implicit KUSP machines which store information about the experience of the machine so far, for use in subsequent problems. In our example of two induction problems, let the updated P be called P′; naturally there will be an update procedure which takes time t_u(P, s_1). Just how much time can we expect to save if we use incremental learning instead of independent learning? First, let us write the time bound 2·t(s)/P(s) as t(s)·2^{H*(s)+1}. If s_1 and s_2 are not algorithmically independent, then H*(s_2|s_1) is smaller than H*(s_2). Learning independently, we would spend t(s_1)·2^{H*(s_1)+1} + t(s_2)·2^{H*(s_2)+1}; learning incrementally, we would spend, in the best case, t(s_1)·2^{H*(s_1)+1} + t(s_2)·2^{H*(s_2|s_1)+1} for the search time, assuming that recalling s_1 takes no time in the latter search task (which is an unrealistic assumption). Therefore, in total, the latter search task can accelerate by a factor of 2^{H*(s_1:s_2)}, and we can save t(s_2)·2^{H*(s_2)+1}·(1 - 2^{-H*(s_1:s_2)}) - t_u(P, s_1) total time in the best case (only an upper bound, since we did not account for recall time). Note that the maximum temporal gain is related both to how much mutual information is discovered across solutions (thus the P_i's), and to how much time the update procedure takes. Clearly, if the update time dominates overall, incremental learning is in vain. However, if updates are effective and efficient, there is enormous potential in incremental machine learning. (A worked-number sketch with made-up figures appears after axiom AI4 below.)

During the experimental tests of our Stochastic Context-Free Grammar based search and update algorithms [25], we observed that in practice we can realize fast updates, and still achieve actual code re-use and tremendous speed-up. Using only 0.5 teraflop/sec of computing speed and a reference machine choice of R5RS Scheme [27], we solved 6 simple deterministic operator induction problems in 245.1 seconds. This running time is to be compared with 7150 seconds without any updates. Scaled to the human-level processing speed of 100 teraflop/sec, our system would learn and solve the entire training sequence in 1.25 seconds, which is (arguably) better than most human students. In one particular operator induction problem (fourth power, x^4), we saw actual code re-use: (define (pow4 x) (define (sqr x) (* x x)) (sqr (sqr x))), and an actual speedup of 272. The gains that we saw confirmed the incremental learning proposals of Solomonoff, mentioned in a good number of his publications, but most clearly in [11,13,1]. Based on our work and the huge speedup observed in OOPS for a shorter training sequence [26], we have come to believe that incremental learning has the epistemological status of an additional AI axiom:

AI4 AI must be able to use its previous experience to speed up subsequent prediction tasks (Transfer Learning).

This axiom is justified by observing that many universal induction problems are completely unsolvable by a system that does not have the adequate sort of algorithmic memory, regardless of the search method.
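As promised above, here is the worked-number sketch of the best-case analysis, with made-up figures (not the experimental values of [25]): with H*(s_2) = 25 bits but H*(s_2|s_1) = 10 bits, the 15 bits of mutual information contract the second search by a factor of 2^15.

    ;; Worked numbers for the best-case incremental speedup.
    ;; search-time encodes the bound 2.t(s)/P(s) = t(s).2^(H*(s)+1).

    (define (search-time t h) (* t (expt 2 (+ h 1))))

    (define t1 100) (define h1 20)     ; s1: raw time, entropy in bits
    (define t2 100) (define h2 25)     ; s2 searched from scratch
    (define h2-given-1 10)             ; H*(s2|s1): 15 bits are shared
    (define t-update 1000)             ; update-procedure cost tu(P,s1)

    (define independent
      (+ (search-time t1 h1) (search-time t2 h2)))
    (define incremental
      (+ (search-time t1 h1) (search-time t2 h2-given-1) t-update))
    (define saving (- independent incremental))
    ;; saving = t2.2^(h2+1).(1 - 2^-(h2 - h2-given-1)) - t-update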
The results above may be contrasted with inductive programming approaches, since we predicted deterministic functions. One of the earliest and most successful inductive programming systems is ADATE, which is optimized for a more specific purpose. The ADATE system has yielded impressive results in an ML variant, aided by user-supplied primitives and by constraining candidate programs [28]. Universal representations have been investigated in inductive logic programming as well [29]; however, U-learning unfortunately lacks the extremely accurate generalization of Solomonoff induction. It has been shown that incremental learning is useful in the inductive programming framework [30], which supports our observation of the necessity of incremental machine learning. Another relevant work is a typed higher-order logic knowledge representation scheme based on term representation of individuals and a rich representation language encompassing many abstract data types [31]. A recent survey on inductive programming may be found in [32].

We should also recount our brief correspondence with Solomonoff. We expressed that the prediction algorithms were powerful, but it seemed that memory was not used sufficiently. Solomonoff responded by mentioning the potential stochastic grammar and genetic programming approaches that he was working on at the time. Our present research was motivated by a problem he posed during the discussions of his seminars at Turing Days '06 at Bilgi University, Istanbul: "We can use grammar induction for updating a stochastic context-free grammar, but there is a problem. We already know the grammar of the reference machine." We designed our incremental learning algorithms to address this particular problem.² Solomonoff also guided our research by making a valuable suggestion: that it is more important to show whether incremental learning works over a sequence of simpler problems than to solve a single difficult problem. We have in addition investigated the use of the PPM family of compressors following his proposal, but, as we expected, they were not sufficient for guiding LISP-like programs, and would require too many changes. Therefore, we proceeded directly to the simplest kind of guiding p.m.f. that would work for Scheme, as we preferred not to work on assembly-like languages for which PPM might be appropriate, since, in our opinion, high-level languages embody more technological progress (see also [33], which employs a Scheme subset). Colorfully speaking, inventing a functional form in assembly might be like re-inventing the wheel. However, in general, it would not be trivial for the induction system to invent syntax forms that compare favorably to LISP, especially during preliminary training. Therefore, much intelligence is already present in a high-level universal computer (AI0), which we simply take advantage of.

5 Cognitive Architecture

Another important discussion is whether a cognitive architecture is necessary. The axiomatic approach was seen as counter-productive by some leading researchers in the past.
However, we think that their opinion can be expressed as follows: the minimal program that realizes these axioms is not automatically intelligent, because in practice an intelligent system requires a good deal of algorithmic information to get off the ground. This is not a bad argument, since, obviously, the human brain is well equipped genetically. However, neither can we rule out that a somewhat compact system may achieve human-level general intelligence. The question, therefore, is whether a simply described system like AIXI [34] (an extension of Solomonoff induction to reinforcement learning) is sufficient in practice, or whether there is a need for a modular/extensible cognitive architecture that has been designed in particular ways to promote certain kinds of mental growth and operation. Some proponents of general-purpose AI research think that such a cognitive architecture is necessary, e.g., OpenCog [35]. Schmidhuber has suggested the famous Gödel Machine, which has a mechanical model of machine consciousness [36]. Solomonoff himself proposed, early on in 2002, the design of Alpha, a generic AI architecture which can ultimately solve free-form time-limited optimization problems [13]. Although in his later works Solomonoff did not make much mention of Alpha and instead focused on the particulars of the required basic induction and learning capability, his proposal nonetheless remains one of the most extensible and elegant self-improving AI designs.

² We occasionally corresponded via e-mail. Before the AGI-10 conference, he had reviewed a draft of my paper, and he had commented that the "learning programming idioms" and "frequent subprogram mining" algorithms were interesting, which was all the encouragement I needed. The last e-mail I received from him was on 11/Oct/2009. I regretfully learnt that he passed away a month later. His independent character and true scientific spirit will always be a shining beacon for me.

Therefore, this point is open to debate, though some researchers may want to assume another, entirely optional, axiom:

AI5 AI must be arranged such that self-improvement is feasible in a realistic mode of operation (Cognitive Architecture).

It is doubtful, for instance, whether a combination of incremental learning and AIXI will result in a practical reinforcement learning agent. Neither is it well understood whether autonomous systems with built-in utility/goal functions are suitable for all practical purposes. We anticipate that such questions will be settled by experimenters, as the complexity of interesting experiments will quickly overtake theoretical analysis.

We do not consider human-like behavior, or a robotic body, or an autonomous AI design, such as a goal-driven or reinforcement-learning agent, essential to intelligence; hence we did not propose autonomy or embodiment as an axiom. Solomonoff has commented likewise on the preferred target applications [37]:

  To start, I'd like to define the scope of my interest in A.I. I am not particularly interested in simulating human behavior. I am interested in creating a machine that can work very difficult problems much better and/or faster than humans can – and this machine should be embodied in a technology to which Moore's Law applies.
  I would like it to give a better understanding of the relation of quantum mechanics to general relativity. I would like it to discover cures for cancer and AIDS. I would like it to find some very good high-temperature superconductors. I would not be disappointed if it were unable to pass itself off as a rock star.

6 Philosophical Foundation and Consequences

Solomonoff's AI theory is founded on a wealth of philosophy. Here, we shall briefly revisit the philosophical foundation of ALP and point out some of its philosophical consequences. In his posthumous publication, Solomonoff mentions the inspiration for some of his work: Carnap's idea that the state of the world can be represented by a finite bitstring (and that science predicts future bits with inductive inference), Turing's universal computer (AI0) as communicated by Minsky and McCarthy, and Chomsky's generative grammars [12]. The discovery of ALP is described by Solomonoff in quite a bit of detail in [38], which relates his discovery to the background of many prominent thinkers and contributors. Carnap's empiricism seems to have been a highly influential factor in Solomonoff's research, as he sought to find how science is carried out, rather than particular scientific findings; and ALP is a satisfactory solution to Carnap's program of inductive inference [14].

Let us then recall some philosophically relevant aspects of ALP discussed in the most recent publications of Solomonoff. First, the exact same method is used to solve both mathematical and scientific problems. This means that there is no fundamental epistemological difference between these problems. Our interpretation is that this is well founded only when we observe that mathematical problems themselves are computational or linguistic problems: in practice, mathematical problems can be reduced to particular computational problems, and this is why the same method works for both kinds of problems. Mathematical facts do not preside over or precede physical facts; they themselves are ultimately solutions of physical problems (e.g., does this particular kind of machine halt or not?). And the substance of mathematics, the lucid sort of mathematical language and concepts that we have invented, can be fully explained by Solomonoff induction, as those are the kinds of useful programs which have aided an intellect in its training, and therefore are retained as linguistic and algorithmic information. The subjectivity and diversity aspects of ALP [12, Sections 3 & 4] fully explain why there can be multiple and almost equally productive foundations of mathematics, as those merely point out somewhat equally useful formalisms invented by different mathematicians. There is absolutely nothing special about ZFC theory; it is just a formal theory to explain some useful procedures that we perform in our heads, i.e., it is more like the logical explanation of a set module in a functional programming language than anything else. However, the operations in a mathematician's brain are not visible to their owner, thereby leading to the useless Platonist fantasies of some mathematicians, owing to a dearth of philosophical imagination.
Therefore, it does not matter much whether one prefers this or that formalization of set theory, or category theory as a foundation, unless that choice restricts success in the solution of future scientific problems. Since such a problematic scientific situation does not seem to have emerged yet (forcing us to choose among particular formalizations), the diversity principle of ALP forces us to retain them all. That is to say, subscribing to the ALP viewpoint has the unexpected consequence that we abandon both Platonism and Formalism. There is a meaning in formal language, in the manner which improves future predictions; however, there is not a single a priori fact, in addition to empirical observations, and no such fact is ever needed to conduct empirical work, except a proper realization of axioms AI1-AI3 (and surely no sane scientist would accept that there is a unique and empty set that exists in a hidden order of reality). When we consider these axioms, we need to understand the universality of computation, and the principled manner in which we have to employ it for reliable induction in our scientific inquiries. The only physically relevant assumption is that of the computability of the distributions which generate our empirical problems (regardless of whether the problem is mathematical or scientific), and the choice of a universal computer, which introduces a necessary subjectivity. The computability aspect may be interpreted as information finitism: all the problems that we can work with should have finite entropy. Yet this restriction on disorder is not at all limiting, for it is hardly conceivable how one might wish to solve a problem of actually infinite complexity. Therefore, this is not much of an assumption for scientific inquiry, especially given that both quantum mechanics and general relativity can be described in computable mathematics (see for instance [39] about the applicability of computable mathematics to quantum mechanics). Neither can one hope to find an example of a single scientifically valid problem in any textbook of science that requires the existence of distributions with infinite complexity to solve.

With regards to general epistemology, ALP/AIT may be seen as largely incompatible with non-reductionism. Non-reductionism is quite misleading in the manner it is usually conveyed. Instead, we must seek to understand irreducibility in the sense of AIT, of quantifying algorithmic information, which allows us to reconcile the concept of irreducibility with physicalism (which we think every empiricist should accept) [40]. In particular, we can partially formalize the notion of knowledge by the mutual information between the world and a brain. Our paper proposed a physical solution to the problem of determining the most "objective" universal computer: it is the universe itself. If digital physics were true, this might be, for instance, a particular kind of graph automaton; or, if quantum mechanics were the basis, then a universal quantum computer could be used. However, for many tasks, using such a low-level computer might be extraordinarily difficult. We also argued that extreme non-reductionism leads to arguments from ignorance, such as ontological dualism, and that information theory is much better suited to explaining evolution and the need for abstractions in our language.
It should also be obvious that the ALP solution to AI extends the two main tenets of logical positivism, which are verificationism and unified science, as it gives a finite cognitive procedure with which one can conduct all empirical work, and allows us to develop a private language with which we can describe all of science and mathematics. However, we should also mention that this strengthened positivism does not require a strict analytic-synthetic distinction; a spectrum of analytic-synthetic distinction, as in Quine's philosophy, seems to be acceptable [41]. We have already seen that, according to ALP, mathematical and scientific problems have no real distinction; therefore, like Quine, ALP would allow revising even mathematical logic itself. And we need not remind the reader that the concept of the universal computer itself did not appear out of thin air, but was invented through the laborious mental work of scientists, as they abstracted from the mechanics of performing mathematics; at the bottom, these are all empirical problems [42]. On the other hand, a "web of belief" as in Quine by no means suggests non-reductionism, for that could be true only if indeed there were phenomena that had unscathable (infinite) complexity, such as Turing oracle machines, which were not proposed as physical machines, but only as a hypothetical concept [16]. Quine himself was a physicalist; we do not think that he would support the later vendetta against reductionism, which may be a misunderstanding of his holism. Though it may be argued that his obscure version of Platonism, which does not seem very scientific to us, may be the culprit. Today's Bayesian networks seem to be a good formalization of Quine's web of belief, and his instrumentalism is consistent with the ALP approach of maintaining useful programs. Therefore, on this account, psychology ought to be reducible to neurophysiology, as the concept of life is to molecular biology, because these are all ultimately sets of problems that overlap in the physical world, and the relation between them cannot hold an infinite amount of information; that would require an infinitely complex local environment, which does not seem consistent with our scientific observations. That is to say, the discovery of bridge disciplines is possible, as exemplified by quantum chemistry and molecular biology, and it is not different from any other kind of empirical work. Recently, it has perhaps been better understood in popular culture that creationism and non-reductionism are almost synonymous (regarding the claims of "intelligent design" that the flagella of bacteria are too complex to have evolved). Note that ALP has no qualms with the statistical behavior of quantum systems, as it allows non-determinism. Moreover, the particular kind of irreducibility in AIT corresponds to weak emergentism, and most certainly contradicts strong emergentism, which implies supernatural events. Please see also [17, Section 7] for a discussion of philosophical problems related to algorithmic complexity.
7 Intellectual Property Towards the Infinity Point

Solomonoff proposed the infinity point hypothesis, also known as the singularity, in 1985 [43] (the first paper on the subject): an exponentially accelerating technological progress, caused by human-level AIs that complement the scientific community, which accelerates our progress ad infinitum within a finite, short time (in practice, only a finite but significant factor of improvement could be expected). Solomonoff proposed seven milestones of AI development: A: modern AI phase (1956 Dartmouth conference); B: general theory of problem solving (our interpretation: Solomonoff induction, Levin search); C: self-improving AI (our interpretation: Alpha architecture, 2002); D: AI that can understand English (our interpretation: not realized yet); E: human-level AI; F: an AI at the level of the entire computer science (CS) community; G: an AI many times smarter than the entire CS community.

A weak condition for the infinity point may be obtained by an economic argument, also covered briefly in [43]. The human brain produces roughly 5 teraflops/watt. The current incarnation of NVIDIA's general-purpose GPU architectures, called Fermi, achieves about 6 gigaflops/watt [44]. Assuming 85% improvement in power efficiency per year (as seen in NVIDIA's projections), human-level energy efficiency of computing will be achieved in 12 years. After that date, even if mathematical AI fails due to an unforeseen problem, we will be able to run simulations of our brains faster than us, using less energy than humans, effectively creating a bio-information based AI which meets the basic requirement of the infinity point. For this to occur, whole brain simulation projects must be comprehensive in operation and efficient enough [45]. Otherwise, the human-level AIs that we construct should match the computational efficiency of the human brain. This weaker condition rests on an economic observation: the economic incentive of cheaper intellectual work will drive the proliferation of personal use of brain simulations. According to NVIDIA's projections, then, we can expect the necessary conditions for the infinity point to materialize by 2023, after which point technological progress may accelerate very rapidly. According to a recent paper by Koomey, the energy efficiency of computing is doubling every 1.5 years (about 60% per year) regardless of architecture, which would set the date at 2026 [46].
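The dates quoted above follow from a one-line extrapolation, sketched here with the figures as given in the text and the Fermi-era baseline year (roughly 2011) as an assumption: closing the roughly 833-fold efficiency gap takes about 11-12 years at 85% annual improvement (hence 2023), and about 15 years at Koomey's doubling every 1.5 years (hence 2026).

    ;; Extrapolating the energy-efficiency gap quoted in the text.
    (define brain-flops/watt 5e12)     ; human brain: ~5 teraflops/watt
    (define gpu-flops/watt 6e9)        ; Fermi GPU: ~6 gigaflops/watt
    (define gap (/ brain-flops/watt gpu-flops/watt))   ; about 833

    (define (years-at-annual-rate r)   ; e.g. 0.85 for 85% per year
      (/ (log gap) (log (+ 1 r))))

    (define (years-at-doubling-period years-per-doubling)
      (* years-per-doubling (/ (log gap) (log 2))))

    ;; (years-at-annual-rate 0.85)     => about 10.9, i.e. ~2022-2023
    ;; (years-at-doubling-period 1.5)  => about 14.6, i.e. ~2026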
Assume that we are progressing towards the hypothetical infinity point. Then the entire human civilization may be viewed as a global intelligence working on technological problems. The practical necessity of incremental learning suggests that, when faced with more difficult problems, better information sharing is required. If no information sharing is present between researchers (i.e., different search programs), then they will lose time traversing overlapping program subspaces. This is most clearly seen in the case of simultaneous inventions, when an idea is said to be "up in the air" and is invented by multiple, independent parties on near dates. If intellectual property (IP) laws are too rigid and costly, this would entail that there is minimal information sharing, and after some point the global efficiency of solving non-trivial technological problems would be severely hampered. Therefore, to better utilize the infinity point effects, knowledge sharing must be encouraged in society. Maximum efficiency in this fashion can be provided by free software licenses and a reform of the patent system. Our view is that no single company or organization can (or should) have a monopoly on the knowledge resources to attack problems with truly large algorithmic complexity (monopoly is mostly illegal at present, at any rate). We tend to think that sharing science and technology is the most efficient path towards the infinity point. Naturally, the free software philosophy is not acceptable to much commercial enterprise; thus we suggest that, as technology advances, the overhead of enforcing IP laws be taken into account. If technology starts to advance much more rapidly, the duration of IP protection may be shortened; for instance, after AI milestone F, the bureaucracy and restrictions of IP law may become a serious bottleneck.

8 Conclusion

We have mentioned diverse consequences of ALP in the axiomatization of AI, philosophy, and technological society. We have also related our own research to Solomonoff's proposals. We interpret ALP and AIT as a fundamentally new world-view which allows us to bridge the gap between complex natural phenomena and the positive sciences more closely than ever. This paradigm shift has resulted in various breakthrough applications and is likely to benefit society in the foreseeable future.

Acknowledgements

We thank the anonymous reviewers, David Dowe and Laurent Orseau for their valuable comments, which substantially improved this paper.

References

1. Solomonoff, R.J.: Progress in incremental machine learning. Technical Report IDSIA-16-03, IDSIA, Lugano, Switzerland (2003)
2. Chaitin, G.J.: A theory of program size formally identical to information theory. J. ACM 22 (1975) 329–340
3. Solomonoff, R.J.: An inductive inference machine. Dartmouth Summer Research Project on Artificial Intelligence (1956) A privately circulated report.
4. Solomonoff, R.J.: An inductive inference machine. In: IRE National Convention Record, Section on Information Theory, Part 2, New York, USA (1957) 56–62
5. Solomonoff, R.J.: A formal theory of inductive inference, part I. Information and Control 7(1) (1964) 1–22
6. Solomonoff, R.J.: A formal theory of inductive inference, part II. Information and Control 7(2) (1964) 224–254
7. Solomonoff, R.J.: Three kinds of probabilistic induction: Universal distributions and convergence theorems. The Computer Journal 51(5) (2008) 566–570 Christopher Stewart Wallace (1933-2004) memorial special issue.
8. Levin, L.A.: Universal sequential search problems. Problems of Information Transmission 9(3) (1973) 265–266
9. Solomonoff, R.J.: Algorithmic probability: Theory and applications. In Dehmer, M., Emmert-Streib, F., eds.: Information Theory and Statistical Learning, Springer Science+Business Media, N.Y. (2009) 1–23
10. Solomonoff, R.J.: Complexity-based induction systems: Comparisons and convergence theorems. IEEE Trans. on Information Theory IT-24(4) (1978) 422–432
11. Solomonoff, R.J.: Optimum sequential search. Technical report, Oxbridge Research, Cambridge, Mass., USA (1984)
12. Solomonoff, R.J.: Algorithmic Probability – Its Discovery – Its Properties and Application to Strong AI. In: Randomness Through Computation: Some Answers, More Questions. World Scientific Publishing Company (2011) 149–157
13. Solomonoff, R.J.: A system for incremental learning based on algorithmic probability. In: Proceedings of the Sixth Israeli Conference on Artificial Intelligence, Tel Aviv, Israel (1989) 515–527
14. Rathmanner, S., Hutter, M.: A philosophical treatise of universal induction. Entropy 13(6) (2011) 1076–1136
15. Davis, M.: The Universal Computer: The Road from Leibniz to Turing. W. W. Norton & Company (2000)
16. Turing, A.M.: On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society s2-42(1) (1937) 230–265
17. Dowe, D.L.: MML, hybrid Bayesian network graphical models, statistical consistency, invariance and uniqueness. In: Handbook of the Philosophy of Science (HPS Volume 7) Philosophy of Statistics. Elsevier (2011) 901–982
18. Wallace, C.S., Boulton, D.M.: An information measure for classification. Computer Journal 11(2) (1968) 185–194
19. Wallace, C.S.: Statistical and Inductive Inference by Minimum Message Length. Springer, Berlin, Germany (2005)
20. Barron, A., Rissanen, J., Yu, B.: The minimum description length principle in coding and modeling (invited paper). IEEE Press, Piscataway, NJ, USA (2000) 699–716
21. Vapnik, V.: Statistical Learning Theory. John Wiley and Sons, NY (1998)
22. Dowe, D.L., Gardner, S., Oppy, G.: Bayes not bust! Why simplicity is no problem for Bayesians. The British Journal for the Philosophy of Science 58(4) (2007) 709–754
23. Wallace, C.S., Dowe, D.L.: Minimum message length and Kolmogorov complexity. The Computer Journal 42(4) (1999) 270–283
24. Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence 174(18) (2010) 1508–1539
25. Özkural, E.: Towards heuristic algorithmic memory. In Schmidhuber, J., Thórisson, K.R., Looks, M., eds.: AGI. Volume 6830 of Lecture Notes in Computer Science., Springer (2011) 382–387
26. Schmidhuber, J.: Optimal ordered problem solver. Machine Learning 54 (2004) 211–256
27. Kelsey, R., Clinger, W., Rees, J.: Revised⁵ report on the algorithmic language Scheme. Higher-Order and Symbolic Computation 11(1) (1998)
28. Olsson, J.R.: Inductive functional programming using incremental program transformation. Artificial Intelligence 74 (1995) 55–83
29. Muggleton, S., Page, C.: A learnability model for universal representations. In: Proceedings of the 4th International Workshop on Inductive Logic Programming. Volume 237., Citeseer (1994) 139–160
30. Ferri-Ramírez, C., Hernández-Orallo, J., Ramirez-Quintana, M.: Incremental learning of functional logic programs. FLOPS '01: Proceedings of the 5th International Symposium on Functional and Logic Programming (2001) 233–247
31. Bowers, A., Giraud-Carrier, C., Lloyd, J.: A knowledge representation framework for inductive learning (2001)
32. Kitzelmann, E.: Inductive programming: A survey of program synthesis techniques. Approaches and Applications of Inductive Programming (2010) 50–73
33. Looks, M.: Scalable estimation-of-distribution program evolution. In: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. (2007)
34. Hutter, M.: Universal algorithmic intelligence: A mathematical top→down approach. In Goertzel, B., Pennachin, C., eds.: Artificial General Intelligence. Cognitive Technologies. Springer, Berlin (2007) 227–290
35. Goertzel, B.: OpenCogPrime: A cognitive synergy based architecture for artificial general intelligence. In Baciu, G., Wang, Y., Yao, Y., Kinsner, W., Chan, K., Zadeh, L.A., eds.: IEEE ICCI, IEEE Computer Society (2009) 60–68
36. Schmidhuber, J.: Ultimate cognition à la Gödel. Cognitive Computation 1(2) (2009) 177–193
37. Solomonoff, R.J.: Machine learning - past and future. In: The Dartmouth Artificial Intelligence Conference. (2006) 13–15
38. Solomonoff, R.J.: The discovery of algorithmic probability. Journal of Computer and System Sciences 55(1) (1997) 73–88
39. Bridges, D., Svozil, K.: Constructive mathematics and quantum physics. International Journal of Theoretical Physics 39 (2000) 503–515
40. Özkural, E.: A compromise between reductionism and non-reductionism. In: Worldviews, Science and Us: Philosophy and Complexity. World Scientific Books (2007)
41. Quine, W.: Two dogmas of empiricism. The Philosophical Review 60 (1951) 20–43
42. Chaitin, G.J.: Two philosophical applications of algorithmic information theory. In Calude, C.S., Dinneen, M.J., Vajnovszki, V., eds.: Proceedings DMTCS'03. (2003)
43. Solomonoff, R.J.: The time scale of artificial intelligence: Reflections on social effects. Human Systems Management 5 (1985) 149–153
44. Glaskowsky, P.N.: NVIDIA's Fermi: The first complete GPU computing architecture (2009)
45. Sandberg, A., Bostrom, N.: Whole brain emulation: A roadmap. Technical report, Future of Humanity Institute, Oxford University (2008)
46. Koomey, J.G., Berard, S., Sanchez, M., Wong, H.: Implications of historical trends in the electrical efficiency of computing. IEEE Annals of the History of Computing 33 (2011) 46–54