Research Priorities for Robust and Beneficial Artificial Intelligence∗

Stuart Russell, Daniel Dewey, Max Tegmark

Computer Science Division, University of California, Berkeley, CA 94720; Dept. of Physics & MIT Kavli Institute, Massachusetts Institute of Technology, Cambridge, MA 02139; and Future of Humanity Institute, Oxford University, 16-17 St. Ebbe's str., Oxford OX1 1PT, UK

∗ Published in AI Magazine 36, No 4 (2015): http://tinyurl.com/rbaipaper. This article gives examples of the type of research advocated by the Open Letter at http://futureoflife.org/ai-open-letter

Success in the quest for artificial intelligence has the potential to bring unprecedented benefits to humanity, and it is therefore worthwhile to investigate how to maximize these benefits while avoiding potential pitfalls. This article gives numerous examples (which should by no means be construed as an exhaustive list) of such worthwhile research aimed at ensuring that AI remains robust and beneficial.

Artificial intelligence (AI) research has explored a variety of problems and approaches since its inception, but for the last 20 years or so has been focused on the problems surrounding the construction of intelligent agents – systems that perceive and act in some environment. In this context, the criterion for intelligence is related to statistical and economic notions of rationality – colloquially, the ability to make good decisions, plans, or inferences. The adoption of probabilistic representations and statistical learning methods has led to a large degree of integration and cross-fertilization between AI, machine learning, statistics, control theory, neuroscience, and other fields. The establishment of shared theoretical frameworks, combined with the availability of data and processing power, has yielded remarkable successes in various component tasks such as speech recognition, image classification, autonomous vehicles, machine translation, legged locomotion, and question-answering systems.

As capabilities in these areas and others cross the threshold from laboratory research to economically valuable technologies, a virtuous cycle takes hold whereby even small improvements in performance are worth large sums of money, prompting greater investments in research. There is now a broad consensus that AI research is progressing steadily, and that its impact on society is likely to increase. The potential benefits are huge, since everything that civilization has to offer is a product of human intelligence; we cannot predict what we might achieve when this intelligence is magnified by the tools AI may provide, but the eradication of disease and poverty is not unfathomable. Because of the great potential of AI, it is valuable to investigate how to reap its benefits while avoiding potential pitfalls.

Progress in AI research makes it timely to focus research not only on making AI more capable, but also on maximizing the societal benefit of AI. Such considerations motivated the AAAI 2008–09 Presidential Panel on Long-Term AI Futures [1] and other projects and community efforts on AI's future impacts. These constitute a significant expansion of the field of AI itself, which up to now has focused largely on techniques that are neutral with respect to purpose.
The present document can be viewed as a natural continuation of these efforts, focusing on identifying research directions that can help maximize the societal benefit of AI. This research is by necessity interdisciplinary, because it involves both society and AI. It ranges from economics, law, and philosophy to computer security, formal methods and, of course, various branches of AI itself. The focus is on delivering AI that is beneficial to society and robust in the sense that the benefits are guaranteed: our AI systems must do what we want them to do.

This document was drafted with input from the attendees of the 2015 conference "The Future of AI: Opportunities and Challenges"¹ (see Acknowledgements), and was the basis for an open letter that has collected over 8,000 signatures in support of the research priorities outlined here.

¹ More details about the conference, including many of the talks, are available at http://tinyurl.com/beneficialai.

I. SHORT-TERM RESEARCH PRIORITIES

A. Optimizing AI's Economic Impact

The successes of industrial applications of AI, from manufacturing to information services, demonstrate a growing impact on the economy, although there is disagreement about the exact nature of this impact and on how to distinguish between the effects of AI and those of other information technologies. Many economists and computer scientists agree that there is valuable research to be done on how to maximize the economic benefits of AI while mitigating adverse effects, which could include increased inequality and unemployment [2–8]. Such considerations motivate a range of research directions, spanning areas from economics to psychology. Below are a few examples that should by no means be interpreted as an exhaustive list.

1. Labor market forecasting: When and in what order should we expect various jobs to become automated [4]? How will this affect the wages of less skilled workers, the creative professions, and different kinds of information workers? Some have argued that AI is likely to greatly increase the overall wealth of humanity as a whole [3]. However, increased automation may push income distribution further towards a power law [9], and the resulting disparity may fall disproportionately along lines of race, class, and gender; research anticipating the economic and societal impact of such disparity could be useful.

2. Other market disruptions: Significant parts of the economy, including finance, insurance, actuarial, and many consumer markets, could be susceptible to disruption through the use of AI techniques to learn, model, and predict human and market behaviors. These markets might be identified by a combination of high complexity and high rewards for navigating that complexity [8].

3. Policy for managing adverse effects: What policies could help increasingly automated societies flourish? For example, Brynjolfsson and McAfee [3] explore various policies for incentivizing development of labor-intensive sectors and for using AI-generated wealth to support underemployed populations. What are the pros and cons of interventions such as educational reform, apprenticeship programs, labor-demanding infrastructure projects, and changes to minimum wage law, tax structure, and the social safety net [5]?
History provides many examples of subpopulations not needing to work for economic security, ranging from aristocrats in antiquity to many present-day citizens of Qatar. What societal structures and other factors determine whether such populations flourish? Unemployment is not the same as leisure, and there are deep links between unemployment and unhappiness, self-doubt, and isolation [10, 11]; understanding what policies and norms can break these links could significantly improve the median quality of life. Empirical and theoretical research on topics such as the basic income proposal could clarify our options [12, 13].

4. Economic measures: It is possible that economic measures such as real GDP per capita do not accurately capture the benefits and detriments of heavily AI-and-automation-based economies, making these metrics unsuitable for policy purposes [2]. Research on improved metrics could be useful for decision-making.

B. Law and Ethics Research

The development of systems that embody significant amounts of intelligence and autonomy leads to important legal and ethical questions whose answers impact both producers and consumers of AI technology. These questions span law, public policy, professional ethics, and philosophical ethics, and will require expertise from computer scientists, legal experts, political scientists, and ethicists. For example:

1. Liability and law for autonomous vehicles: If self-driving cars cut the roughly 40,000 annual US traffic fatalities in half, the car makers might get not 20,000 thank-you notes, but 20,000 lawsuits. In what legal framework can the safety benefits of autonomous vehicles such as drone aircraft and self-driving cars best be realized [14]? Should legal questions about AI be handled by existing (software- and internet-focused) "cyberlaw", or should they be treated separately [15]? In both military and commercial applications, governments will need to decide how best to bring the relevant expertise to bear; for example, a panel or committee of professionals and academics could be created, and Calo has proposed the creation of a Federal Robotics Commission [16].

2. Machine ethics: How should an autonomous vehicle trade off, say, a small probability of injury to a human against the near-certainty of a large material cost? How should lawyers, ethicists, and policymakers engage the public on these issues? Should such trade-offs be the subject of national standards?

3. Autonomous weapons: Can lethal autonomous weapons be made to comply with humanitarian law [17]? If, as some organizations have suggested, autonomous weapons should be banned [18], is it possible to develop a precise definition of autonomy for this purpose, and can such a ban practically be enforced? If it is permissible or legal to use lethal autonomous weapons, how should these weapons be integrated into the existing command-and-control structure so that responsibility and liability remain associated with specific human actors? What technical realities and forecasts should inform these questions, and how should "meaningful human control" over weapons be defined [19–21]? Are autonomous weapons likely to reduce political aversion to conflict, or perhaps result in "accidental" battles or wars [22]?
Would such weapons become the tool of choice for oppressors or terrorists? Finally, how can transparency and public discourse best be encouraged on these issues?

4. Privacy: How should the ability of AI systems to interpret the data obtained from surveillance cameras, phone lines, emails, etc., interact with the right to privacy? How will privacy risks interact with cybersecurity and cyberwarfare [23]? Our ability to take full advantage of the synergy between AI and big data will depend in part on our ability to manage and preserve privacy [24, 25].

5. Professional ethics: What role should computer scientists play in the law and ethics of AI development and use? Past and current projects to explore these questions include the AAAI 2008–09 Presidential Panel on Long-Term AI Futures [1], the EPSRC Principles of Robotics [26], and recently announced programs such as Stanford's One-Hundred Year Study of AI and the AAAI Committee on AI Impact and Ethical Issues.

From a public policy perspective, AI (like any powerful new technology) enables both great new benefits and novel pitfalls to be avoided, and appropriate policies can ensure that we can enjoy the benefits while risks are minimized. This raises policy questions such as these:

1. What is the space of policies worth studying, and how might they be enacted?

2. Which criteria should be used to determine the merits of a policy? Candidates include verifiability of compliance, enforceability, ability to reduce risk, ability to avoid stifling desirable technology development, adoptability, and ability to adapt over time to changing circumstances.

C. Computer Science Research for Robust AI

As autonomous systems become more prevalent in society, it becomes increasingly important that they robustly behave as intended. The development of autonomous vehicles, autonomous trading systems, autonomous weapons, etc. has therefore stoked interest in high-assurance systems where strong robustness guarantees can be made; Weld and Etzioni have argued that "society will reject autonomous agents unless we have some credible means of making them safe" [27]. Different ways in which an AI system may fail to perform as desired correspond to different areas of robustness research:

1. Verification: how to prove that a system satisfies certain desired formal properties. ("Did I build the system right?")

2. Validity: how to ensure that a system that meets its formal requirements does not have unwanted behaviors and consequences. ("Did I build the right system?")

3. Security: how to prevent intentional manipulation by unauthorized parties.

4. Control: how to enable meaningful human control over an AI system after it begins to operate. ("OK, I built the system wrong; can I fix it?")

1. Verification

By verification, we mean methods that yield high confidence that a system will satisfy a set of formal constraints. When possible, it is desirable for systems in safety-critical situations, e.g. self-driving cars, to be verifiable.
Formal verification of software has advanced significantly in recent years: examples include the seL4 kernel [28], a complete, general-purpose operating-system kernel that has been mathematically checked against a formal specification to give a strong guarantee against crashes and unsafe operations, and HACMS, DARPA's "clean-slate, formal methods-based approach" to a set of high-assurance software tools [29]. Not only should it be possible to build AI systems on top of verified substrates; it should also be possible to verify the designs of the AI systems themselves, particularly if they follow a "componentized architecture", in which guarantees about individual components can be combined according to their connections to yield properties of the overall system. This mirrors the agent architectures used in Russell and Norvig (2010), which separate an agent into distinct modules (predictive models, state estimates, utility functions, policies, learning elements, etc.), and has analogues in some formal results on control system designs. Research on richer kinds of agents – for example, agents with layered architectures, anytime components, overlapping deliberative and reactive elements, metalevel control, etc. – could contribute to the creation of verifiable agents, but we lack the formal "algebra" to properly define, explore, and rank the space of designs.

Perhaps the most salient difference between verification of traditional software and verification of AI systems is that the correctness of traditional software is defined with respect to a fixed and known machine model, whereas AI systems – especially robots and other embodied systems – operate in environments that are at best partially known by the system designer. In these cases, it may be practical to verify that the system acts correctly given the knowledge that it has, avoiding the problem of modelling the real environment [30]. A lack of design-time knowledge also motivates the use of learning algorithms within the agent software, and verification becomes more difficult: statistical learning theory gives so-called ε-δ (probably approximately correct) bounds, mostly for the somewhat unrealistic settings of supervised learning from i.i.d. data and single-agent reinforcement learning with simple architectures and full observability, but even then requiring prohibitively large sample sizes to obtain meaningful guarantees.

Work in adaptive control theory [31], the theory of so-called cyberphysical systems [32], and verification of hybrid or robotic systems [33, 34] is highly relevant but also faces the same difficulties. And of course all these issues are laid on top of the standard problem of proving that a given software artifact does in fact correctly implement, say, a reinforcement learning algorithm of the intended type. Some work has been done on verifying neural network applications [35–37] and the notion of partial programs [38, 39] allows the designer to impose arbitrary "structural" constraints on behavior, but much remains to be done before it will be possible to have high confidence that a learning agent will learn to satisfy its design criteria in realistic contexts.
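To make the sample-complexity concern concrete, consider the simplest PAC setting: a finite hypothesis class H, i.i.d. examples, and a learner that returns a hypothesis consistent with the training data. A standard textbook bound (our illustration, not a result from this article) states that with probability at least 1 − δ the learned hypothesis has true error at most ε whenever the number of examples m satisfies

    m \ge \frac{1}{\varepsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right).

Even in this idealized setting, strong guarantees are expensive: for ε = δ = 10^-6 and a hypothesis class described by only 100 bits (|H| = 2^100), the bound requires on the order of 10^8 examples, and the agent settings discussed above violate the i.i.d. and full-observability assumptions entirely.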
2. Validity

A verification theorem for an agent design has the form, "If environment satisfies assumptions φ then behavior satisfies requirements ψ." There are two ways in which a verified agent can, nonetheless, fail to be a beneficial agent in actuality: first, the environmental assumption φ is false in the real world, leading to behavior that violates the requirements ψ; second, the system may satisfy the formal requirement ψ but still behave in ways that we find highly undesirable in practice. It may be the case that this undesirability is a consequence of satisfying ψ when φ is violated; i.e., had φ held, the undesirability would not have been manifested; or it may be the case that the requirement ψ is erroneous in itself. Russell and Norvig (2010) provide a simple example: if a robot vacuum cleaner is asked to clean up as much dirt as possible, and has an action to dump the contents of its dirt container, it will repeatedly dump and clean up the same dirt. The requirement should focus not on dirt cleaned up but on cleanliness of the floor. Such specification errors are ubiquitous in software verification, where it is commonly observed that writing correct specifications can be harder than writing correct code. Unfortunately, it is not possible to verify the specification: the notions of "beneficial" and "desirable" are not separately made formal, so one cannot straightforwardly prove that satisfying ψ necessarily leads to desirable behavior and a beneficial agent.

In order to build systems that robustly behave well, we of course need to decide what "good behavior" means in each application domain. This ethical question is tied intimately to questions of what engineering techniques are available, how reliable these techniques are, and what trade-offs can be made – all areas where computer science, machine learning, and broader AI expertise is valuable. For example, Wallach and Allen (2008) argue that a significant consideration is the computational expense of different behavioral standards (or ethical theories): if a standard cannot be applied efficiently enough to guide behavior in safety-critical situations, then cheaper approximations may be needed. Designing simplified rules – for example, to govern a self-driving car's decisions in critical situations – will likely require expertise from both ethicists and computer scientists. Computational models of ethical reasoning may shed light on questions of computational expense and the viability of reliable ethical reasoning methods [40, 41].
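As a toy illustration of such a specification error (our sketch, not the article's; the simulator, reward, and policies are invented for exposition), compare an agent rewarded per unit of dirt collected with one judged by the final state of the floor:

# Toy rendering of the Russell & Norvig vacuum example: a mis-specified
# reward ("dirt collected") makes a dump-and-reclean loop attractive,
# while the intended requirement ("floor cleanliness") does not.

def run(policy, steps=10):
    floor_dirt, container, reward_collected = 5, 0, 0
    for _ in range(steps):
        action = policy(floor_dirt, container)
        if action == "clean" and floor_dirt > 0:
            floor_dirt -= 1
            container += 1
            reward_collected += 1      # reward per unit of dirt picked up
        elif action == "dump":
            floor_dirt += container    # dumped dirt goes back on the floor
            container = 0
    return reward_collected, floor_dirt

# Policy that exploits the mis-specified reward: dump once the floor is clean.
exploit = lambda dirt, held: "dump" if dirt == 0 and held > 0 else "clean"
# Policy the designer intended: just clean.
intended = lambda dirt, held: "clean"

print(run(exploit))   # (9, 1): more "dirt collected", floor still dirty
print(run(intended))  # (5, 0): less collected reward, floor actually clean

The dump-and-reclean policy earns more of the mis-specified reward yet leaves dirt on the floor; a requirement stated in terms of cleanliness ranks the two policies correctly.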
3. Security

Security research can help make AI more robust. As AI systems are used in an increasing number of critical roles, they will take up an increasing proportion of cyber-attack surface area. It is also probable that AI and machine learning techniques will themselves be used in cyber-attacks.

Robustness against exploitation at the low level is closely tied to verifiability and freedom from bugs. For example, the DARPA SAFE program aims to build an integrated hardware-software system with a flexible metadata rule engine, on which can be built memory safety, fault isolation, and other protocols that could improve security by preventing exploitable flaws [42]. Such programs cannot eliminate all security flaws (since verification is only as strong as the assumptions that underlie the specification), but could significantly reduce vulnerabilities of the type exploited by the recent "Heartbleed" and "Bash" bugs. Such systems could be preferentially deployed in safety-critical applications, where the cost of improved security is justified.

At a higher level, research into specific AI and machine learning techniques may become increasingly useful in security. These techniques could be applied to the detection of intrusions [43], analyzing malware [44], or detecting potential exploits in other programs through code analysis [45]. It is not implausible that cyberattack between states and private actors will be a risk factor for harm from near-future AI systems, motivating research on preventing harmful events. As AI systems grow more complex and are networked together, they will have to intelligently manage their trust, motivating research on statistical-behavioral trust establishment [46] and computational reputation models [47].
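To indicate the statistical flavor of such intrusion-detection research (a minimal sketch with invented data and thresholds; real systems such as those surveyed in [43] are far richer), one can fit a simple model of normal behavior and flag low-likelihood events:

import math

def fit_gaussian(samples):
    """Estimate mean and variance of a normal-traffic feature."""
    n = len(samples)
    mean = sum(samples) / n
    var = max(sum((x - mean) ** 2 for x in samples) / n, 1e-9)
    return mean, var

def anomaly_score(x, mean, var):
    """Negative log-likelihood under the fitted Gaussian; higher = stranger."""
    return 0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)

# Feature: login attempts per minute for one account (assumed data).
normal_traffic = [2, 3, 2, 4, 3, 2, 3, 4, 2, 3]
mean, var = fit_gaussian(normal_traffic)
threshold = max(anomaly_score(x, mean, var) for x in normal_traffic) + 1.0

for event in [3, 4, 50]:  # 50 attempts/min: a plausible brute-force attempt
    flagged = anomaly_score(event, mean, var) > threshold
    print(event, "FLAGGED" if flagged else "ok")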
4. Control

For certain types of safety-critical AI systems – especially vehicles and weapons platforms – it may be desirable to retain some form of meaningful human control, whether this means a human in the loop, on the loop [48, 49], or some other protocol. In any of these cases, there will be technical work needed in order to ensure that meaningful human control is maintained [50].

Automated vehicles are a test-bed for effective control-granting techniques. The design of systems and protocols for transition between automated navigation and human control is a promising area for further research. Such issues also motivate broader research on how to optimally allocate tasks within human–computer teams, both for identifying situations where control should be transferred, and for applying human judgment efficiently to the highest-value decisions.

II. LONG-TERM RESEARCH PRIORITIES

A frequently discussed long-term goal of some AI researchers is to develop systems that can learn from experience with human-like breadth and surpass human performance in most cognitive tasks, thereby having a major impact on society. If there is a non-negligible probability that these efforts will succeed in the foreseeable future, then additional current research beyond that mentioned in the previous sections will be motivated as exemplified below, to help ensure that the resulting AI will be robust and beneficial.

Assessments of this success probability vary widely between researchers, but few would argue with great confidence that the probability is negligible, given the track record of such predictions. For example, Ernest Rutherford, arguably the greatest nuclear physicist of his time, said in 1933 – less than 24 hours before Szilard's invention of the nuclear chain reaction – that nuclear energy was "moonshine" [51], and Astronomer Royal Richard Woolley called interplanetary travel "utter bilge" in 1956 [52]. Moreover, to justify a modest investment in this AI robustness research, this probability need not be high, merely non-negligible, just as a modest investment in home insurance is justified by a non-negligible probability of the home burning down.

A. Verification

Reprising the themes of short-term research, research enabling verifiable low-level software and hardware can eliminate large classes of bugs and problems in general AI systems; if such systems become increasingly powerful and safety-critical, verifiable safety properties will become increasingly valuable. If the theory of extending verifiable properties from components to entire systems is well understood, then even very large systems can enjoy certain kinds of safety guarantees, potentially aided by techniques designed explicitly to handle learning agents and high-level properties. Theoretical research, especially if it is done explicitly with very general and capable AI systems in mind, could be particularly useful.

A related verification research topic that is distinctive to long-term concerns is the verifiability of systems that modify, extend, or improve themselves, possibly many times in succession [53, 54]. Attempting to straightforwardly apply formal verification tools to this more general setting presents new difficulties, including the challenge that a formal system that is sufficiently powerful cannot use formal methods in the obvious way to gain assurance about the accuracy of functionally similar formal systems, on pain of inconsistency via Gödel's incompleteness [55, 56]. It is not yet clear whether or how this problem can be overcome, or whether similar problems will arise with other verification methods of similar strength.
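One way to state the obstacle (our gloss on the cited results, not the article's own formalism): for a sufficiently expressive consistent theory T with provability predicate Prov_T, Löb's theorem gives

    T \vdash \mathrm{Prov}_T(\ulcorner P \urcorner) \rightarrow P \quad\Longrightarrow\quad T \vdash P.

Taking P = ⊥ recovers Gödel's second incompleteness theorem: T cannot prove its own consistency, and more generally cannot derive the soundness schema "whatever I prove is true" about itself. A strictly stronger theory can assert T's soundness, but then faces the same question about itself; this regress is what makes naive formal self-verification of successor systems problematic [55, 56].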
Finally, it is often difficult to actually apply formal verification techniques to physical systems, especially systems that have not been designed with verification in mind. This motivates research pursuing a general theory that links functional specification to physical states of affairs. This type of theory would allow use of formal tools to anticipate and control behaviors of systems that approximate rational agents, alternate designs such as satisficing agents, and systems that cannot be easily described in the standard agent formalism (powerful prediction systems, theorem-provers, limited-purpose science or engineering systems, etc.). It may also be that such a theory could allow rigorous demonstrations that systems are constrained from taking certain kinds of actions or performing certain kinds of reasoning.

B. Validity

As in the short-term research priorities, validity is concerned with undesirable behaviors that can arise despite a system's formal correctness. In the long term, AI systems might become more powerful and autonomous, in which case failures of validity could carry correspondingly higher costs.

Strong guarantees for machine learning methods, an area we highlighted for short-term validity research, will also be important for long-term safety. To maximize the long-term value of this work, machine learning research might focus on the types of unexpected generalization that would be most problematic for very general and capable AI systems. In particular, it might aim to understand theoretically and practically how learned representations of high-level human concepts could be expected to generalize (or fail to) in radically new contexts [57]. Additionally, if some concepts could be learned reliably, it might be possible to use them to define tasks and constraints that minimize the chances of unintended consequences even when autonomous AI systems become very general and capable. Little work has been done on this topic, which suggests that both theoretical and experimental research may be useful.

Mathematical tools such as formal logic, probability, and decision theory have yielded significant insight into the foundations of reasoning and decision-making. However, there are still many open problems in the foundations of reasoning and decision. Solutions to these problems may make the behavior of very capable systems much more reliable and predictable. Example research topics in this area include reasoning and decision under bounded computational resources à la Horvitz and Russell [58, 59], how to take into account correlations between AI systems' behaviors and those of their environments or of other agents [60–64], how agents that are embedded in their environments should reason [65, 66], and how to reason about uncertainty over logical consequences of beliefs or other deterministic computations [67]. These topics may benefit from being considered together, since they appear deeply linked [68, 69].

In the long term, it is plausible that we will want to make agents that act autonomously and powerfully across many domains. Explicitly specifying our preferences in broad domains in the style of near-future machine ethics may not be practical, making "aligning" the values of powerful AI systems with our own values and preferences difficult [70, 71]. Consider, for instance, the difficulty of creating a utility function that encompasses an entire body of law; even a literal rendition of the law is far beyond our current capabilities, and would be highly unsatisfactory in practice (since law is written assuming that it will be interpreted and applied in a flexible, case-by-case way by humans who, presumably, already embody the background value systems that artificial agents may lack). Reinforcement learning raises its own problems: when systems become very capable and general, then an effect similar to Goodhart's Law is likely to occur, in which sophisticated agents attempt to manipulate or directly control their reward signals [72]. This motivates research areas that could improve our ability to engineer systems that can learn or acquire values at run-time. For example, inverse reinforcement learning may offer a viable approach, in which a system infers the preferences of another rational or nearly rational actor by observing its behavior [73, 74]. Other approaches could use different assumptions about underlying cognitive models of the actor whose preferences are being learned [75], or could be explicitly inspired by the way humans acquire ethical values. As systems become more capable, more epistemically difficult methods could become viable, suggesting that research on such methods could be useful; for example, Bostrom (2014) reviews preliminary work on a variety of methods for specifying goals indirectly.
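The following minimal sketch conveys the idea behind inverse reinforcement learning (ours, not the method of [73, 74]; the action set, candidate reward functions, and the Boltzmann-rationality model are invented for illustration): a Bayesian observer updates a posterior over reward hypotheses from an actor's observed choices.

import math

ACTIONS = ["recharge", "clean", "dump"]
# Two candidate reward functions over actions (values are assumptions).
HYPOTHESES = {
    "values_cleanliness":    {"recharge": 1.0, "clean": 3.0, "dump": -2.0},
    "values_dirt_collected": {"recharge": 1.0, "clean": 3.0, "dump": 3.0},
}

def choice_prob(action, rewards, beta=1.0):
    """Nearly-rational (Boltzmann) likelihood: P(a) proportional to exp(beta * R(a))."""
    z = sum(math.exp(beta * rewards[a]) for a in ACTIONS)
    return math.exp(beta * rewards[action]) / z

def posterior(observed_actions):
    """Bayesian update over reward hypotheses, starting from a uniform prior."""
    logp = {h: 0.0 for h in HYPOTHESES}
    for a in observed_actions:
        for h, rewards in HYPOTHESES.items():
            logp[h] += math.log(choice_prob(a, rewards))
    z = sum(math.exp(v) for v in logp.values())
    return {h: math.exp(v) / z for h, v in logp.items()}

# An actor that cleans and recharges but never dumps is better explained
# by the cleanliness-valuing hypothesis.
print(posterior(["clean", "clean", "recharge", "clean"]))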
C. Security

It is unclear whether long-term progress in AI will make the overall problem of security easier or harder; on one hand, systems will become increasingly complex in construction and behavior and AI-based cyberattacks may be extremely effective, while on the other hand, the use of AI and machine learning techniques along with significant progress in low-level system reliability may render hardened systems much less vulnerable than today's. From a cryptographic perspective, it appears that this conflict favors defenders over attackers; this may be a reason to pursue effective defense research wholeheartedly.

Although the topics described in the near-term security research section above may become increasingly important in the long term, very general and capable systems will pose distinctive security problems. In particular, if the problems of validity and control are not solved, it may be useful to create "containers" for AI systems that could have undesirable behaviors and consequences in less controlled environments [76]. Both theoretical and practical sides of this question warrant investigation. If the general case of AI containment turns out to be prohibitively difficult, then it may be that designing an AI system and a container in parallel is more successful, allowing the weaknesses and strengths of the design to inform the containment strategy [72]. The design of anomaly detection systems and automated exploit-checkers could be of significant help. Overall, it seems reasonable to expect this additional perspective – defending against attacks from "within" a system as well as from external actors – will raise interesting and profitable questions in the field of computer security.

D. Control

It has been argued that very general and capable AI systems operating autonomously to accomplish some task will often be subject to effects that increase the difficulty of maintaining meaningful human control [6, 72, 77, 78]. Research on systems that are not subject to these effects, minimize their impact, or allow for reliable human control could be valuable in preventing undesired consequences, as could work on reliable and secure test-beds for AI systems at a variety of capability levels.

If an AI system is selecting the actions that best allow it to complete a given task, then avoiding conditions that prevent the system from continuing to pursue the task is a natural subgoal [77, 78] (and conversely, seeking unconstrained situations is sometimes a useful heuristic [79]). This could become problematic, however, if we wish to repurpose the system, to deactivate it, or to significantly alter its decision-making process; such a system would rationally avoid these changes. Systems that do not exhibit these behaviors have been termed corrigible systems [80], and both theoretical and practical work in this area appears tractable and useful. For example, it may be possible to design utility functions or decision processes so that a system will not try to avoid being shut down or repurposed [80], and theoretical frameworks could be developed to better understand the space of potential systems that avoid undesirable behaviors [81–83].
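As a crude sketch of why shutdown-avoidance arises and how a utility adjustment might remove the incentive (our toy model with invented numbers, ignoring the subtleties that make corrigibility [80] genuinely hard):

# An agent maximizing task utility alone prefers to disable its off-switch;
# compensating it for the shutdown branch removes that preference.
TASK_REWARD = 10    # utility if the task is completed
P_SHUTDOWN = 0.5    # chance the overseer presses the off-switch

def expected_utility(disable_switch, indifference_bonus=0.0):
    if disable_switch:
        return TASK_REWARD  # shutdown can never pre-empt the task
    # Otherwise shutdown may occur; the bonus (if any) compensates that branch.
    return (1 - P_SHUTDOWN) * TASK_REWARD + P_SHUTDOWN * indifference_bonus

for bonus in (0.0, TASK_REWARD):
    print(f"bonus={bonus}: EU(disable)={expected_utility(True, bonus)}, "
          f"EU(allow shutdown)={expected_utility(False, bonus)}")
# bonus=0.0:  EU(disable)=10 > EU(allow)=5.0, so resisting shutdown is optimal
# bonus=10.0: both equal 10.0; the incentive to resist shutdown vanishes

With no compensation the agent strictly prefers disabling its off-switch; with a bonus equal to the forgone task reward it is exactly indifferent, so resisting shutdown is no longer instrumentally useful. Making such indifference robust under learning and self-modification is the open problem.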
It has been argued that another natural subgoal for AI systems pursuing a given goal is the acquisition of fungible resources of a variety of kinds: for example, information about the environment, safety from disruption, and improved freedom of action are all instrumentally useful for many tasks [77, 78]. Hammond et al. (1995) gives the label stabilization to the more general set of cases where "due to the action of the agent, the environment comes to be better fitted to the agent as time goes on". This type of subgoal could lead to undesired consequences, and a better understanding of the conditions under which resource acquisition or radical stabilization is an optimal strategy (or likely to be selected by a given system) would be useful in mitigating its effects. Potential research topics in this area include "domestic" goals that are limited in scope in some way [72], the effects of large temporal discount rates on resource acquisition strategies, and experimental investigation of simple systems that display these subgoals.
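One listed topic, the effect of large temporal discount rates on resource acquisition, can be illustrated with a toy calculation (ours, with invented numbers): an agent compares completing its task immediately against first spending time acquiring resources that boost the eventual reward.

def value_direct(gamma, reward=10):
    # Complete the task at t=0; gamma is irrelevant to an immediate reward.
    return reward

def value_acquire_first(gamma, reward=10, boost=3, delay=5):
    # Spend `delay` steps gathering resources, then earn a boosted reward.
    return gamma ** delay * (reward * boost)

for gamma in (0.99, 0.9, 0.6):
    detour = value_acquire_first(gamma) > value_direct(gamma)
    print(f"gamma={gamma}: acquire resources first? {detour}")
# gamma=0.99 or 0.9: the detour pays; gamma=0.6: steep discounting makes
# long resource-acquisition detours suboptimal.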
Finally, research on the possibility of superintelligent machines or rapid, sustained self-improvement ("intelligence explosion") has been highlighted by past and current projects on the future of AI as potentially valuable to the project of maintaining reliable control in the long term. The AAAI 2008–09 Presidential Panel on Long-Term AI Futures' "Subgroup on Pace, Concerns, and Control" stated that

    There was overall skepticism about the prospect of an intelligence explosion... Nevertheless, there was a shared sense that additional research would be valuable on methods for understanding and verifying the range of behaviors of complex computational systems to minimize unexpected outcomes. Some panelists recommended that more research needs to be done to better define "intelligence explosion," and also to better formulate different classes of such accelerating intelligences. Technical work would likely lead to enhanced understanding of the likelihood of such phenomena, and the nature, risks, and overall outcomes associated with different conceived variants [1].

Stanford's One-Hundred Year Study of Artificial Intelligence includes "Loss of Control of AI systems" as an area of study, specifically highlighting concerns over the possibility that

    ...we could one day lose control of AI systems via the rise of superintelligences that do not act in accordance with human wishes – and that such powerful systems would threaten humanity. Are such dystopic outcomes possible? If so, how might these situations arise? ...What kind of investments in research should be made to better understand and to address the possibility of the rise of a dangerous superintelligence or the occurrence of an "intelligence explosion"? [84]

Research in this area could include any of the long-term research priorities listed above, as well as theoretical and forecasting work on intelligence explosion and superintelligence [72, 85], and could extend or critique existing approaches begun by groups such as the Machine Intelligence Research Institute [71].

III. CONCLUSION

In summary, success in the quest for artificial intelligence has the potential to bring unprecedented benefits to humanity, and it is therefore worthwhile to research how to maximize these benefits while avoiding potential pitfalls. The research agenda outlined in this paper, and the concerns that motivate it, have been called "anti-AI", but we vigorously contest this characterization. It seems self-evident that the growing capabilities of AI are leading to an increased potential for impact on human society. It is the duty of AI researchers to ensure that the future impact is beneficial. We believe that this is possible, and hope that this research agenda provides a helpful step in the right direction.

IV. AUTHORS

Stuart Russell is a Professor of Computer Science at UC Berkeley. His research covers many aspects of artificial intelligence and machine learning. He is a fellow of AAAI, ACM, and AAAS and winner of the IJCAI Computers and Thought Award. He held the Chaire Blaise Pascal in Paris from 2012 to 2014. His book Artificial Intelligence: A Modern Approach (with Peter Norvig) is the standard text in the field.

Daniel Dewey is the Alexander Tamas Research Fellow on Machine Superintelligence and the Future of AI at Oxford's Future of Humanity Institute, Oxford Martin School. He was previously at Google, Intel Labs Pittsburgh, and Carnegie Mellon University.

Max Tegmark is a professor of physics at MIT. His current research is at the interface of physics and artificial intelligence, using physics-based techniques to explore connections between information processing in biological and engineered systems. He is the president of the Future of Life Institute, which supports research advancing robust and beneficial artificial intelligence.

V. ACKNOWLEDGEMENTS

The initial version of this document was drafted with major input from Janos Kramar and Richard Mallah, and reflects valuable feedback from Anthony Aguirre, Erik Brynjolfsson, Ryan Calo, Meia Chita-Tegmark, Tom Dietterich, Dileep George, Bill Hibbard, Demis Hassabis, Eric Horvitz, Leslie Pack Kaelbling, James Manyika, Luke Muehlhauser, Michael Osborne, David Parkes, Heather Roff, Francesca Rossi, Bart Selman, Murray Shanahan, and many others. The authors are also grateful to Serkan Cabi and David Stanley for help with manuscript editing and formatting.

[1] E. Horvitz and B. Selman, Interim Report from the Panel Chairs, 2009, AAAI Presidential Panel on Long-Term AI Futures.
[2] J. Mokyr, Secular Stagnation: Facts, Causes and Cures, 83 (2014).
[3] E. Brynjolfsson and A. McAfee, The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies, W.W. Norton & Company, 2014.
[4] C. Frey and M. Osborne, The future of employment: how susceptible are jobs to computerisation?, Technical report, Oxford Martin School, University of Oxford, 2013.
[5] E. L. Glaeser, Secular Stagnation: Facts, Causes and Cures, 69 (2014).
[6] M. Shanahan, The Technological Singularity, MIT Press, 2015, Forthcoming.
[7] N. J. Nilsson, AI Magazine 5, 5 (1984).
[8] J. Manyika, M. Chui, J. Bughin, R. Dobbs, P. Bisson, and A. Marrs, Disruptive Technologies: Advances that will Transform Life, Business, and the Global Economy, McKinsey Global Institute, Washington, D.C., 2013.
[9] E. Brynjolfsson, A. McAfee, and M. Spence, Foreign Aff. 93, 44 (2014).
[10] C. Hetschko, A. Knabe, and R. Schöb, The Economic Journal 124, 149–166 (2014).
[11] A. E. Clark and A. J. Oswald, The Economic Journal, 648–659 (1994).
[12] P. Van Parijs, Arguing for Basic Income: Ethical Foundations for a Radical Reform, Verso, 1992.
[13] K. Widerquist, J. A. Noguera, Y. Vanderborght, and J. De Wispelaere, Basic Income: An Anthology of Contemporary Research, Wiley/Blackwell, 2013.
[14] D. C. Vladeck, Wash. L. Rev. 89, 117 (2014).
[15] R. Calo, Available at SSRN 2402972 (2014).
[16] R. Calo, Available at SSRN 2529151 (2014).
[17] R. R. Churchill and G. Ulfstein, American Journal of International Law 94, 623–659 (2000).
[18] B. L. Docherty, Losing Humanity: The Case Against Killer Robots, Human Rights Watch, New York, 2012.
[19] H. M. Roff, Routledge Handbook of Ethics and War: Just War Theory in the 21st Century, 352 (2013).
[20] H. M. Roff, Journal of Military Ethics 13 (2014).
[21] K. Anderson, D. Reisner, and M. C. Waxman, International Law Studies 90, 386–411 (2014).
[22] P. Asaro, How Just Could a Robot War Be?, in Current Issues in Computing and Philosophy, edited by K. W. Adam Briggle and P. A. E. Brey, p. 50–64, IOS Press, Amsterdam, 2008.
[23] P. W. Singer and A. Friedman, Cybersecurity: What Everyone Needs to Know, Oxford University Press, New York, 2014.
[24] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers, Big Data: The Next Frontier for Innovation, Competition, and Productivity, Report, McKinsey Global Institute, Washington, D.C., 2011.
[25] R. Agrawal and R. Srikant, ACM Sigmod Record 29, 439–450 (2000).
[26] M. Boden, J. Bryson, D. Caldwell, K. Dautenhahn, L. Edwards, S. Kember, P. Newman, V. Parry, G. Pegman, T. Rodden, et al., (2011).
[27] D. Weld and O. Etzioni, AAAI Technical Report SS-94-03, 17–23 (1994).
[28] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood, seL4: Formal verification of an OS kernel, in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, p. 207–220, ACM, 2009.
[29] K. Fisher, HACMS: High Assurance Cyber Military Systems, in Proceedings of the 2012 ACM Conference on High Integrity Language Technology, p. 51–52, ACM, 2012.
[30] L. A. Dennis, M. Fisher, N. K. Lincoln, A. Lisitsa, and S. M. Veres, arXiv preprint (2013).
[31] K. J. Åström and B. Wittenmark, Adaptive Control, Courier Dover Publications, 2013.
[32] A. Platzer, Logical Analysis of Hybrid Systems: Proving Theorems for Complex Dynamics, Springer, 2010.
[33] R. Alur, Formal verification of hybrid systems, in Embedded Software (EMSOFT), 2011 Proceedings of the International Conference on, p. 273–278, IEEE, 2011.
[34] A. F. Winfield, C. Blum, and W. Liu, Towards an Ethical Robot: Internal Models, Consequences and Ethical Action Selection, in Advances in Autonomous Robotics Systems, edited by M. Mistry, A. Leonardis, M. Witkowski, and C. Melhuish, p. 85–96, Springer, 2014.
[35] L. Pulina and A. Tacchella, An abstraction-refinement approach to verification of artificial neural networks, in Computer Aided Verification, p. 243–257, 2010.
[36] B. J. E. Taylor, Methods and Procedures for the Verification and Validation of Artificial Neural Networks, Springer, 2006.
[37] J. M. Schumann and Y. Liu, Applications of Neural Networks in High Assurance Systems, Springer, 2010.
[38] D. Andre and S. J. Russell, State abstraction for programmable reinforcement learning agents, in Eighteenth National Conference on Artificial Intelligence, p. 119–125, American Association for Artificial Intelligence, 2002.
[39] D. F. Spears, Assuring the Behavior of Adaptive Agents, in Agent Technology from a Formal Perspective, p. 227–257, Springer, 2006.
[40] P. M. Asaro, International Review of Information Ethics 6, 9–16 (2006).
[41] J. P. Sullins, Philosophy & Technology 24, 233–238 (2011).
[42] A. DeHon, B. Karel, T. F. Knight Jr, G. Malecha, B. Montagu, R. Morisset, G. Morrisett, B. C. Pierce, R. Pollack, S. Ray, O. Shivers, and J. M. Smith, Preliminary Design of the SAFE Platform, in Proceedings of the 6th Workshop on Programming Languages and Operating Systems, ACM, 2011.
[43] T. D. Lane, Machine learning techniques for the computer security domain of anomaly detection, 2000, Ph.D. Dissertation, Department of Electrical Engineering, Purdue University.
[44] K. Rieck, P. Trinius, C. Willems, and T. Holz, Journal of Computer Security 19, 639–668 (2011).
[45] Y. Brun and M. D. Ernst, Finding Latent Code Errors via Machine Learning over Program Executions, in Proceedings of the 26th International Conference on Software Engineering, p. 480–490, IEEE Computer Society, 2004.
[46] M. J. Probst and S. K. Kasera, Statistical trust establishment in wireless sensor networks, in Parallel and Distributed Systems, 2007 International Conference on, volume 2, p. 1–8, IEEE, 2007.
[47] J. Sabater and C. Sierra, Artificial Intelligence Review 24, 33–60 (2005).
[48] H. Hexmoor, B. McLaughlan, and G. Tuli, Journal of Experimental & Theoretical Artificial Intelligence 21, 59–77 (2009).
[49] R. Parasuraman, T. B. Sheridan, and C. D. Wickens, Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on 30, 286–297 (2000).
[50] UNIDIR, The Weaponization of Increasingly Autonomous Technologies: Implications for Security and Arms Control, UNIDIR, 2014.
[51] A. Press, New York Herald Tribune (1933), September 12, p. 1.
[52] Reuters, The Ottawa Citizen (1956), January 3, p. 1.
[53] I. J. Good, Advances in Computers 6, 31–88 (1965).
[54] V. Vinge, The Coming Technological Singularity, in VISION-21 Symposium, NASA Lewis Research Center and the Ohio Aerospace Institute, 1993, NASA CP-10129.
[55] B. Fallenstein and N. Soares, Vingean Reflection: Reliable Reasoning for Self-Modifying Agents, Technical report, Machine Intelligence Research Institute, Berkeley, 2014.
[56] N. Weaver, Paradoxes of rational agency and formal systems that verify their own soundness, 2013, Preprint.
[57] M. Tegmark, Friendly Artificial Intelligence: the Physics Challenge, in Proceedings of the AAAI-15 Workshop on AI and Ethics, p. 87–89, AAAI, 2015.
[58] E. J. Horvitz, Reasoning About Beliefs and Actions Under Computational Resource Constraints, in Third AAAI Workshop on Uncertainty in Artificial Intelligence, p. 429–444, 1987.
[59] S. J. Russell and D. Subramanian, Journal of Artificial Intelligence Research, 1–36 (1995).
[60] M. Tennenholtz, Games and Economic Behavior 49, 363–373 (2004).
[61] P. LaVictoire, B. Fallenstein, E. Yudkowsky, M. Barasz, P. Christiano, and M. Herreshoff, Program Equilibrium in the Prisoner's Dilemma via Löb's Theorem, in AAAI Multiagent Interaction without Prior Coordination Workshop, 2014.
[62] D. Hintze, Problem Class Dominance in Predictive Dilemmas, 2014, Honors Thesis, Arizona State University.
[63] J. Y. Halpern and R. Pass, arXiv preprint arXiv:1308.3778 (2013).
[64] N. Soares and B. Fallenstein, Toward Idealized Decision Theory, Technical report, Machine Intelligence Research Institute, Berkeley, 2014.
[65] N. Soares, Formalizing Two Problems of Realistic World-Models, Technical report, Machine Intelligence Research Institute, Berkeley, 2014.
[66] L. Orseau and M. Ring, Space-Time Embedded Intelligence, in Proceedings of the 5th International Conference on Artificial General Intelligence, p. 209–218, Springer, Berlin, 2012.
[67] N. Soares and B. Fallenstein, Questions of Reasoning Under Logical Uncertainty, Technical report, Machine Intelligence Research Institute, 2014, url: http://intelligence.org/files/QuestionsLogicalUncertainty.pdf.
[68] J. Y. Halpern and R. Pass, arXiv preprint arXiv:1106.2657 (2011).
[69] J. Y. Halpern, R. Pass, and L. Seeman, Topics in Cognitive Science 6, 245–257 (2014).
[70] N. Soares, The Value Learning Problem, Technical report, Machine Intelligence Research Institute, Berkeley, 2014.
[71] N. Soares and B. Fallenstein, Aligning Superintelligence with Human Interests: A Technical Research Agenda, Technical report, Machine Intelligence Research Institute, Berkeley, California, 2014.
[72] N. Bostrom, Superintelligence: Paths, Dangers, Strategies, Oxford University Press, 2014.
[73] S. Russell, Learning Agents for Uncertain Environments, in Proceedings of the Eleventh Annual Conference on Computational Learning Theory, p. 101–103, 1998.
[74] A. Y. Ng and S. Russell, Algorithms for Inverse Reinforcement Learning, in Proceedings of the 17th International Conference on Machine Learning, p. 663–670, 2000.
[75] W. Chu and Z. Ghahramani, Preference learning with Gaussian processes, in Proceedings of the 22nd International Conference on Machine Learning, p. 137–144, ACM, 2005.
[76] R. Yampolskiy, Journal of Consciousness Studies 19, 1–2 (2012).
[77] S. M. Omohundro, The nature of self-improving artificial intelligence, 2007, Presented at Singularity Summit 2007.
[78] N. Bostrom, Minds and Machines 22, 71–85 (2012).
[79] A. Wissner-Gross and C. Freer, Physical Review Letters 110.16: 168702 (2013).
[80] N. Soares, B. Fallenstein, E. Yudkowsky, and S. Armstrong, Corrigibility, in AAAI-15 Workshop on AI and Ethics, 2015.
[81] B. Hibbard, Avoiding unintended AI behaviors, in Artificial General Intelligence, edited by J. Bach, B. Goertzel, and M. Iklé, p. 107–116, Springer, 2012.
[82] B. Hibbard, Ethical Artificial Intelligence, 2014.
[83] B. Hibbard, Self-Modeling Agents and Reward Generator Corruption, in Proceedings of the AAAI-15 Workshop on AI and Ethics, p. 61–64, AAAI, 2015.
[84] E. Horvitz, One-Hundred Year Study of Artificial Intelligence: Reflections and Framing, White paper, Stanford University, 2014.
[85] D. Chalmers, Journal of Consciousness Studies 17, 7–65 (2010).