GIB: Imperfect Information in a Computationally Challenging Game
This paper investigates the problems arising in the construction of a program to play the game of contract bridge. These problems include both the difficulty of solving the game's perfect information variant, and techniques needed to address the fact…
Authors: Matthew L. Ginsberg (sole author; CIRL, University of Oregon)
Journal of Artificial Intelligence Research 14 (2001) 303-358. Submitted 10/00; published 6/01.

Matthew L. Ginsberg (ginsberg@cirl.uoregon.edu)
CIRL, 1269 University of Oregon, Eugene, OR 97405 USA

Abstract

This paper investigates the problems arising in the construction of a program to play the game of contract bridge. These problems include both the difficulty of solving the game's perfect information variant, and techniques needed to address the fact that bridge is not, in fact, a perfect information game. GIB, the program being described, involves five separate technical advances: partition search, the practical application of Monte Carlo techniques to realistic problems, a focus on achievable sets to solve problems inherent in the Monte Carlo approach, an extension of alpha-beta pruning from total orders to arbitrary distributive lattices, and the use of squeaky wheel optimization to find approximately optimal solutions to cardplay problems. GIB is currently believed to be of approximately expert caliber, and is currently the strongest computer bridge program in the world.

1. Introduction

Of all the classic games of mental skill, only card games and Go have yet to see the appearance of serious computer challengers. In Go, this appears to be because the game is fundamentally one of pattern recognition as opposed to search; the brute-force techniques that have been so successful in the development of chess-playing programs have failed almost utterly to deal with Go's huge branching factor. Indeed, the arguably strongest Go program in the world (Handtalk) was beaten by 1-dan Janice Kim (winner of the 1984 Fuji Women's Championship) in the 1997 AAAI Hall of Champions after Kim had given the program a monumental 25 stone handicap.

Card games appear to be different.
Perhaps because they are games of imperfect information, or perhaps for other reasons, existing poker and bridge programs are extremely weak. World poker champion Howard Lederer (Texas Hold'em, 1996) has said that he would expect to beat any existing poker program after five minutes' play.†¹ Perennial world bridge champion Bob Hamman, seven-time winner of the Bermuda Bowl, summarized the state of bridge programs in 1994 by saying that, "They would have to improve to be hopeless."†

1. Many of the citations here are the results of personal communications. Such communications are indicated simply by the presence of a † in the accompanying text.

©2001 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

In poker, there is reason for optimism: the GALA system (Koller & Pfeffer, 1995), if applicable, promises to produce a computer player of unprecedented strength by reducing the poker "problem" to a large linear optimization problem which is then solved to generate a strategy that is nearly optimal in a game-theoretic sense. Schaeffer, author of the world champion checkers program Chinook (Schaeffer, 1997), is also reporting significant success in the poker domain (Billings, Papp, Schaeffer, & Szafron, 1998).

The situation in bridge has been bleaker. In addition, because the American Contract Bridge League (ACBL) does not rank the bulk of its players in meaningful ways, it is difficult to compare the strengths of competing programs or players.

In general, performance at bridge is measured by playing the same deal twice or more, with the cards held by one pair of players being given to another pair during the replay and the results then being compared.² A "team" in a bridge match thus typically consists of two pairs, with one pair playing the North/South (N/S) cards at one table and the other pair playing the E/W cards at the other table.
The results obtained by the two pairs are added; if the sum is positive, the team wins this particular deal and if negative, they lose it. In general, the numeric sum of the results obtained by the two pairs is converted to International Match Points, or IMPs. The purpose of the conversion is to diminish the impact of single deals on the total, lest an abnormal result on one particular deal have an unduly large impact on the result of an entire match.

Jeff Goldsmith† reports that the standard deviation on a single deal in bridge is about 5.5 IMPs, so that if two roughly equal pairs were to play the deal, it would not be surprising if one team beat the other by about this amount. It also appears that the difference between an average club player and an expert is about 1.5 IMPs (per deal played); the strongest players in the world are approximately 0.5 IMPs/deal better still. Excepting GIB, the strongest bridge playing programs appear to be slightly weaker than average club players.

Progress in computer bridge has been slow. An incorporation of planning techniques into Bridge Baron, for example, appears to have led to a performance increment of approximately 1/3 IMP per deal (Smith, Nau, & Throop, 1996). This modest improvement still leaves Bridge Baron far shy of expert-level (or even good amateur-level) performance.

Prior to 1997, bridge programs generally attempted to duplicate human bridge-playing methodology in that they proceeded by attempting to recognize the class into which any particular deal fell: finesse, end play, squeeze, etc. Smith et al.'s work on the Bridge Baron program uses planning to extend this approach, but the plans continue to be constructed from human bridge techniques. Nygate and Sterling's early work on PYTHON (Sterling & Nygate, 1990) produced an expert system that could recognize squeezes but not prepare for them.
In retrospect, perhaps we should have expected this approach to have limited success; certainly chess-playing programs that have attempted to mimic human methodology, such as PARADISE (Wilkins, 1980), have fared poorly.

GIB, introduced in 1998, works differently. Instead of modeling its play on techniques used by humans, GIB uses brute-force search to analyze the situation in which it finds itself. A variety of techniques are then used to suggest plays based on the results of the brute-force search. This technique has been so successful that all competitive bridge programs have switched from a knowledge-based approach to a search-based approach.

GIB's cardplay based on brute-force techniques was at the expert level (see Section 3) even without some of the extensions that we discuss in Section 5 and subsequently. The weakest part of GIB's game is bidding, where it relies on a large database of rules describing the meanings of various auctions. Quantitative comparisons here are difficult, although the general impression of the stronger players using GIB is that its overall play is comparable to that of a human expert.

2. The rules of bridge are summarized in Appendix A.

This paper describes the various techniques that have been used in the GIB project, as follows:

1. GIB's analysis in both bidding and cardplay rests on an ability to analyze bridge's perfect-information variant, where all of the cards are visible and each side attempts to take as many tricks as possible (this perfect-information variant is generally referred to as double dummy bridge). Double dummy problems are solved using a technique known as partition search, which is discussed in Section 2.

2. Early versions of GIB used Monte Carlo methods exclusively to select an action based on the double dummy analysis.
This technique was originally proposed for cardplay by Levy (Levy, 1989), but was not implemented in a performance program before GIB. Extending Levy's suggestion, GIB uses Monte Carlo simulation for both cardplay (discussed in Section 3) and bidding (discussed in Section 4).

3. Section 5 discusses difficulties with the Monte Carlo approach. Frank et al. have suggested dealing with these problems by searching the space of possible plans for playing a particular bridge deal, but their methods appear to be intractable in both theory and practice (Frank & Basin, 1998; Frank, Basin, & Bundy, 2000). We instead choose to deal with the difficulties by modifying our understanding of the game so that the value of a bridge deal is not an integer (the number of tricks that can be taken) but is instead taken from a distributive lattice.

4. In Section 6, we show that the alpha-beta pruning mechanism can be extended to deal with games of this type. This allows us to find optimal plans for playing bridge end positions involving some 32 cards or fewer. (In contrast, Frank's method is capable only of finding solutions in 16 card endings.)

5. Finally, applying our ideas to the play of full deals (52 cards) requires solving an approximate version of the overall problem. In Section 7, we describe the nature of the approximation used and our application of squeaky wheel optimization (Joslin & Clements, 1999) to solve it.

Concluding remarks are contained in Section 8.

2. Partition search

Computers are effective game players only to the extent that brute-force search can overcome innate stupidity; most of their time spent searching is spent examining moves that a human player would discard as obviously without merit.

As an example, suppose that White has a forced win in a particular chess position, perhaps beginning with an attack on Black's queen.
A human analyzing the position will see that if Black doesn't respond to the attack, he will lose his queen; the analysis considers places to which the queen could move and appropriate responses to each.

A machine considers responses to the queen moves as well, of course. But it must also analyze in detail every other Black move, carefully demonstrating that each of these other moves can be refuted by capturing the Black queen. A six-ply search will have to analyze every one of these moves five further ply, even if the refutations are identical in all cases. Conventional pruning techniques cannot help here; using α-β pruning, for example, the entire "main line" (White's winning choices and all of Black's losing responses) must be analyzed even though there is a great deal of apparent redundancy in this analysis.³

In other search problems, techniques based on the ideas of dependency maintenance (Stallman & Sussman, 1977) can potentially be used to overcome this sort of difficulty. As an example, consider chronological backtracking applied to a map coloring problem. When a dead end is reached and the search backs up, no information is cached and the effect is to eliminate only the specific dead end that was encountered. Recording information giving the reason for the failure can make the search substantially more efficient. In attempting to color a map with only three colors, for example, thirty countries may have been colored while the detected contradiction involves only five. By recording the contradiction for those five countries, dead ends that fail for the same reason can be avoided.

Dependency-based methods have been of limited use in practice because of the overhead involved in constructing and using the collection of accumulated reasons.
This problem has been substantially addressed in the work on dynamic backtracking (Ginsberg, 1993) and its successors such as RELSAT (Bayardo & Miranker, 1996), where polynomial limits are placed on the number of nogoods being maintained.

In game search, however, most algorithms already include significant cached information in the form of a transposition table (Greenblatt, Eastlake, & Crocker, 1967; Marsland, 1986). A transposition table stores a single game position and the backed up value that has been associated with it. The name reflects the fact that many games "transpose" in that identical positions can be reached by swapping the order in which moves are made. The transposition table eliminates the need to recompute values for positions that have already been analyzed.

These collected observations lead naturally to the idea that transposition tables should store not single positions and their values, but sets of positions and their values. Continuing the dependency-maintenance analogy, a transposition table storing sets of positions can prune the subsequent search far more efficiently than a table that stores only singletons.

There are two reasons that this approach works. The first, which we have already mentioned, is that most game-playing programs already maintain transposition tables, thereby incurring the bulk of the computational expense involved in storing such tables in a more general form. The second and more fundamental reason is that when a game ends with one player the winner, the reason for the victory is generally a local one. A chess game can be thought of as ending when one side has its king captured (a completely local phenomenon); a checkers game, when one side runs out of moves.
Even if an internal search node is evaluated before the game ends, the reason for assigning it any specific value is likely to be independent of some global features (e.g., is the Black pawn on a5 or a6?). Partition search exploits both the existence of transposition tables and the locality of evaluation for realistic games.

3. An informal solution to this is Adelson-Velskiy et al.'s method of analogies (Adelson-Velskiy, Arlazarov, & Donskoy, 1975). This approach appears to have been of little use in practice because it is restricted to a specific class of situations arising in chess games.

[Figure 1: A portion of the game tree for tic-tac-toe. The root has O to move; the tree shows O's four possible moves and a winning response for X to each.]

This section explains these ideas via an example and then describes them formally. Experimental results for bridge are also presented.

2.1 An example

Our illustrative examples for partition search will be taken from the game of tic-tac-toe. A portion of the game tree for this game appears in Figure 1, where we are analyzing a position that is a win for X. We show O's four possible moves, and a winning response for X in each case. Although X frequently wins by making a row across the top of the diagram, α-β pruning cannot reduce the size of this tree because O's losing options must all be analyzed separately.

Consider now the position at the lower left in the diagram, where X has won:

    [board (1): X X X across the top row; the remaining squares hold three O's and one X]        (1)

The reason that X has won is local. If we are retaining a list of positions with known outcomes, the entry we can make because of this position is:

    X X X
    ? ? ?
    ? ? ?        (2)

where the ? means that it is irrelevant whether the associated square is marked with an X, an O, or unmarked.
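Entry (2) can be read as a wildcard pattern over boards. The following is a minimal sketch, assuming a board is encoded as a nine-character string read row by row; the encoding, the function names and the sample board are illustrative choices made here, not part of the paper.

```python
from itertools import product

# Hypothetical encoding: a board is a 9-character string read row by row,
# using 'X', 'O', or '_' (empty). A pattern may also use '?', meaning
# "X, O, or empty; it does not matter which".
SQUARE_VALUES = "XO_"


def matches(pattern: str, board: str) -> bool:
    """Return True if the concrete board is an instance of the pattern."""
    return len(pattern) == len(board) and all(
        p == "?" or p == b for p, b in zip(pattern, board)
    )


def instances(pattern: str):
    """Yield every concrete board string covered by the pattern (legal or not)."""
    choices = [SQUARE_VALUES if p == "?" else p for p in pattern]
    for combo in product(*choices):
        yield "".join(combo)


# Entry (2): X owns the top row; the other six squares are irrelevant.
TOP_ROW_WIN = "XXX??????"

# A table of generalized entries: pattern -> value (1 means a win for X).
table = {TOP_ROW_WIN: 1}


def lookup(board: str):
    """Return the cached value of the first pattern covering board, or None."""
    for pattern, value in table.items():
        if matches(pattern, board):
            return value
    return None


if __name__ == "__main__":
    # An illustrative board with X across the top row.
    print(lookup("XXXO_OOX_"))
    # Number of board strings that the single entry stands for.
    print(sum(1 for _ in instances(TOP_ROW_WIN)))
```

A single stored pattern answers the lookup for every board string it covers, which is the saving that the generalized entry is meant to provide. The count produced by `instances` includes strings that are not legal tic-tac-toe positions.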
This table entry corresponds not to a single position, but to approximately $3^6$ because the unassigned squares can contain X's, O's, or be blank. We can reduce the game tree in Figure 1 to:

    [Reduced tree: as Figure 1, with the won position (1) replaced by the generalized entry (2)]

Continuing the analysis, it is clear that the position

    X X _
    ? ? ?
    ? ? ?        (3)

(where _ denotes an empty square) is a win for X if X is on play.⁴ So is

    X ? ?
    ? _ ?
    ? ? X

and the tree can be reduced to:

    [Reduced tree: from the root (O to move), each of O's moves leads to one of the two generalized positions above, from which X moves to a generalized won position (a completed top row or a completed diagonal)]

Finally, consider the position

    [board (4): top row X X _; bottom row ? ? X; the middle row holds one ? and two empty squares, one of them the center]        (4)

where it is O's turn as opposed to X's. If O moves in the second row, we get an instance of

    X X _
    ? ? ?
    ? ? ?

while if O moves to the upper right, we get an instance of

    X ? ?
    ? _ ?
    ? ? X

Thus every one of O's moves leads to a position that is known to be a win for X, and we can conclude that the original position (4) is a win for X as well. The root node in the reduced tree can therefore be replaced with the position of (4).

These positions capture the essence of the algorithm we will propose: If player x can move to a position that is a member of a set known to be a win for x, the given position is a win as well. If every move is to a position that is a loss, the original position is also.

4. We assume that O has not already won the game here, since X would not be "on play" if the game were over.

2.2 Formalizing partition search

In this section, we present a summary of existing methods for evaluating positions in game trees. There is nothing new here; our aim is simply to develop a precise framework in which our new results can be presented.

Definition 2.2.1 An interval-valued game is a quadruple $(G, p_I, s, \mathrm{ev})$, where $G$ is a finite set of legal positions, $p_I \in G$ is the initial position, $s : G \to 2^G$ gives the immediate successors of a given position, and $\mathrm{ev}$ is an evaluation function

    $\mathrm{ev} : G \to \{\max, \min\} \cup [0,1]$

Informally, $p' \in s(p)$ means that position $p'$ can be reached from $p$ in a single move, and the evaluation function $\mathrm{ev}$ labels internal nodes based upon whose turn it is to play ($\max$ or $\min$) and values terminal positions in terms of some element of the unit interval $[0,1]$.

The structures $G$, $p_I$, $s$ and $\mathrm{ev}$ are required to satisfy the following conditions:

1. There is no sequence of positions $p_0, \ldots, p_n$ with $n > 0$, $p_i \in s(p_{i-1})$ for each $i$ and $p_n = p_0$. In other words, there are no "loops" that return to an identical position.

2. $\mathrm{ev}(p) \in [0,1]$ if and only if $s(p) = \emptyset$. In other words, $\mathrm{ev}$ assigns a numerical value to $p$ if and only if the game is over. Informally, $\mathrm{ev}(p) = \max$ means that the maximizer is to play and $\mathrm{ev}(p) = \min$ means that the minimizer is to play.

We use $2^G$ to denote the power set of $G$, the set of subsets of $G$. There are two further things to note about this definition.

First, the requirement that the game have no "loops" is consistent with all modern games. In chess, for example, positions can repeat but there is a concealed counter that draws the game if either a single position repeats three times or a certain number of moves pass without a capture or a pawn move. In fact, dealing with the hidden counter is more natural in a partition search setting than a conventional one, since the evaluation function is in general (although not always) independent of the value of the counter.

Second, the range of $\mathrm{ev}$ includes the entire unit interval $[0,1]$. The value 0 represents a win for the minimizer, and 1 a win for the maximizer.
The intermediate values might correspond to intermediate results (e.g., a draw) or, more importantly, allow us to deal with internal search nodes that are being treated as terminal and assigned approximate values because no time remains for additional search.

The evaluation function $\mathrm{ev}$ can be used to assign numerical values to the entire set $G$ of positions:

Definition 2.2.2 Given an interval-valued game $(G, p_I, s, \mathrm{ev})$, we introduce a function $\mathrm{ev}_c : G \to [0,1]$ defined recursively by

$$\mathrm{ev}_c(p) = \begin{cases} \mathrm{ev}(p), & \text{if } \mathrm{ev}(p) \in [0,1]; \\ \max_{p' \in s(p)} \mathrm{ev}_c(p'), & \text{if } \mathrm{ev}(p) = \max; \\ \min_{p' \in s(p)} \mathrm{ev}_c(p'), & \text{if } \mathrm{ev}(p) = \min. \end{cases}$$

The value of $(G, p_I, s, \mathrm{ev})$ is defined to be $\mathrm{ev}_c(p_I)$.

To evaluate a position in a game, we can use the well-known minimax procedure:

Algorithm 2.2.3 (Minimax) For a game $(G, p_I, s, \mathrm{ev})$ and a position $p \in G$, to compute $\mathrm{ev}_c(p)$:

    if $\mathrm{ev}(p) \in [0,1]$ return $\mathrm{ev}(p)$
    if $\mathrm{ev}(p) = \max$ return $\max_{p' \in s(p)} \mathrm{minimax}(p')$
    if $\mathrm{ev}(p) = \min$ return $\min_{p' \in s(p)} \mathrm{minimax}(p')$

There are two ways in which the above algorithm is typically extended. The first involves the introduction of transposition tables; we will assume that a new entry is added to the transposition table $T$ whenever one is computed. (A modification to cache only selected results is straightforward.) The second involves the introduction of α-β pruning. Incorporating these ideas gives us Algorithm 2.2.5, given below.

Each entry in the transposition table consists of a position $p$, the current cutoffs $[x, y]$, and the computed value $v$. Note the need to include information about the cutoffs in the transposition table itself, since the validity of any particular entry depends on the cutoffs in question.
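Algorithm 2.2.3, together with the first of the two extensions (a transposition table), can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the game is supplied as two callables, `successors` and `ev`, and the seven-position toy game is invented for the example.

```python
from typing import Callable, Dict, Hashable, Iterable, Union

MAX, MIN = "max", "min"
Position = Hashable
Label = Union[str, float]  # "max", "min", or a terminal value in [0, 1]


def minimax(
    p: Position,
    successors: Callable[[Position], Iterable[Position]],
    ev: Callable[[Position], Label],
    table: Dict[Position, float],
) -> float:
    """Compute the backed-up value of p, caching every computed value."""
    if p in table:                      # transposition table hit
        return table[p]
    label = ev(p)
    if label == MAX:
        value = max(minimax(q, successors, ev, table) for q in successors(p))
    elif label == MIN:
        value = min(minimax(q, successors, ev, table) for q in successors(p))
    else:
        value = label                   # terminal position: ev gives its value
    table[p] = value                    # a new entry is added whenever one is computed
    return value


# Hypothetical toy game: a maximizing root over two minimizing nodes.
SUCC = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
EV = {"root": MAX, "a": MIN, "b": MIN,
      "a1": 0.5, "a2": 1.0, "b1": 0.0, "b2": 1.0}

if __name__ == "__main__":
    cache: Dict[Position, float] = {}
    print(minimax("root", lambda p: SUCC.get(p, []), EV.__getitem__, cache))
```

The table here is keyed by position alone, which suffices for plain minimax because every stored value is exact. Once cutoffs are introduced, a stored value is only meaningful relative to the cutoffs in force when it was computed, so the key must include them as well.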
As an example, suppose that the value of some node is in fact 1 (a win for the maximizer) but that when the node is evaluated with cutoffs of $[0, 0.5]$ a value of 0.5 is returned (indicating a draw) because the maximizer has an obviously drawing line. It is clear that this value is only accurate for the given cutoffs; wider cutoffs will lead to a different answer.

In general, the upper cutoff $y$ is the currently smallest value assigned to a minimizing node; the minimizer can do at least this well in that he can force a value of $y$ or lower. Similarly, $x$ is the currently greatest value assigned to a maximizing node. These cutoff values are updated as the algorithm is invoked recursively in the lines responsible for setting $v_{new}$, the value assigned to a child of the current position $p$.

Proposition 2.2.4 Suppose that $v = \phi(p, [x, y])$ for each entry $(p, [x, y], v)$ in $T$. Then if $\mathrm{ev}_c(p) \in [x, y]$, the value returned by Algorithm 2.2.5 is $\mathrm{ev}_c(p)$.

Algorithm 2.2.5 (α-β pruning with transposition tables) Given an interval-valued game $(G, p_I, s, \mathrm{ev})$, a position $p \in G$, cutoffs $[x, y] \subseteq [0,1]$ and a transposition table $T$ consisting of triples $(p, [a, b], v)$ with $p \in G$ and $a \le b$, $v \in [0,1]$, to compute $\phi(p, [x, y])$:

    if there is an entry $(p, [x, y], z)$ in $T$ return $z$
    if $\mathrm{ev}(p) \in [0,1]$ then $v_{ans} = \mathrm{ev}(p)$
    if $\mathrm{ev}(p) = \max$ then
        $v_{ans} := 0$
        for each $p' \in s(p)$ do
            $v_{new} = \phi(p', [\max(v_{ans}, x), y])$
            if $v_{new} \ge y$ then
                $T := T \cup \{(p, [x, y], v_{new})\}$
                return $v_{new}$
            if $v_{new} > v_{ans}$ then $v_{ans} = v_{new}$
    if $\mathrm{ev}(p) = \min$ then
        $v_{ans} := 1$
        for each $p' \in s(p)$ do
            $v_{new} = \phi(p', [x, \min(v_{ans}, y)])$
            if $v_{new} \le x$ then
                $T := T \cup \{(p, [x, y], v_{new})\}$
                return $v_{new}$
            if $v_{new} < v_{ans}$ then $v_{ans} = v_{new}$
    $T := T \cup \{(p, [x, y], v_{ans})\}$
    return $v_{ans}$

2.3 Partitions

We are now in a position to present our new ideas.
We begin by formalizing the idea of a position that can reach a known winning position or one that can reach only known losing ones.

Definition 2.3.1 Given an interval-valued game $(G, p_I, s, \mathrm{ev})$ and a set of positions $S \subseteq G$, we will say that the set of positions that can reach $S$ is the set of all $p$ for which $s(p) \cap S \neq \emptyset$. This set will be denoted $R_0(S)$. The set of positions constrained to reach $S$ is the set of all $p$ for which $s(p) \subseteq S$, and is denoted $C_0(S)$.

These definitions should match our intuition; the set of positions that can reach a set $S$ is indeed the set of positions $p$ for which some element of $S$ is an immediate successor of $p$, so that $s(p) \cap S \neq \emptyset$. Similarly, a position $p$ is constrained to reach $S$ if every immediate successor of $p$ is in $S$, so that $s(p) \subseteq S$.

Unfortunately, it may not be feasible to construct the $R_0$ and $C_0$ operators explicitly; there may be no concise representation of the set of all positions that can reach $S$. In practice, this will be reflected in the fact that the data structures being used to describe the set $S$ may not conveniently describe the set $R_0(S)$ of all situations from which $S$ can be reached.

Now suppose that we are expanding the search tree itself, and we find ourselves analyzing a particular position $p$ that is determined to be a win for the maximizer because the maximizer can move from $p$ to the winning set $S$; in other words, $p$ is a win because it is in $R_0(S)$. We would like to record at this point that the set $R_0(S)$ is a win for the maximizer, but may not be able to construct or represent this set conveniently.
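For a game small enough to enumerate, the two operators of Definition 2.3.1 can be computed directly. The sketch below is illustrative only: the function names and the seven-position toy game are assumptions made for the example, and the explicit loop over every position in $G$ is exactly the step that is unavailable in a realistic game.

```python
from typing import Callable, Hashable, Set

Position = Hashable


def can_reach(
    positions: Set[Position],
    successors: Callable[[Position], Set[Position]],
    S: Set[Position],
) -> Set[Position]:
    """R_0(S): positions with at least one immediate successor in S."""
    return {p for p in positions if successors(p) & S}


def constrained_to_reach(
    positions: Set[Position],
    successors: Callable[[Position], Set[Position]],
    S: Set[Position],
) -> Set[Position]:
    """C_0(S): positions all of whose immediate successors lie in S.

    Read literally, the definition includes every position with no
    successors, since the empty set is a subset of any S.
    """
    return {p for p in positions if successors(p) <= S}


# Hypothetical toy game tree: root -> {a, b}, a -> {a1, a2}, b -> {b1, b2}.
SUCC = {"root": {"a", "b"}, "a": {"a1", "a2"}, "b": {"b1", "b2"}}
G = {"root", "a", "b", "a1", "a2", "b1", "b2"}


def s(p: Position) -> Set[Position]:
    """Immediate successors of p; terminal positions have none."""
    return SUCC.get(p, set())


if __name__ == "__main__":
    print(sorted(can_reach(G, s, {"a1", "b1"})))
    print(sorted(constrained_to_reach(G, s, {"a1", "a2"})))
```

Both functions take time proportional to the size of $G$ and return their result as an explicit set of positions. Neither property survives in a game such as bridge, where the positions cannot be listed and the result needs a compact description.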
We will therefore assume that we have some computationally effective way to approximate the $R_0$ and $C_0$ functions, in that we have (for example) a function $R$ that is a conservative implementation of $R_0$: if $R$ says we can reach $S$, then we indeed can:

    $R(p, S) \subseteq R_0(S)$

$R(p, S)$ is intended to represent a set of positions that are "like $p$ in that they can reach the (winning) set $S$." Note the inclusion of $p$ as an argument to $R(p, S)$, since we certainly want $p \in R(p, S)$. We are about to cache the fact that every element of $R(p, S)$ is a win for the maximizer, and certainly want that information to include the fact that $p$ itself has been shown to be a win. Thus we require $p \in R(p, S)$ as well.

Finally, we need some way to generalize the information returned by the evaluation function; if the evaluation function itself identifies a position $p$ as a win for the maximizer, we want to have some way to generalize this to a wider set of positions that are also wins. We formalize this by assuming that we have some generalization function $P$ that "respects" the evaluation function in the sense that the value returned by $P$ is a set of positions that $\mathrm{ev}$ evaluates identically.

Definition 2.3.2 Let $(G, p_I, s, \mathrm{ev})$ be an interval-valued game. Let $f$ be any function with range $2^G$, so that $f$ selects a set of positions based on its arguments. We will say that $f$ respects the evaluation function $\mathrm{ev}$ if whenever $p, p' \in F$ for any $F$ in the range of $f$, $\mathrm{ev}(p) = \mathrm{ev}(p')$. A partition system for the game is a triple $(P, R, C)$ of functions that respect $\mathrm{ev}$ such that:

1. $P : G \to 2^G$ maps positions into sets of positions such that for any position $p$, $p \in P(p)$.

2. $R : G \times 2^G \to 2^G$ accepts as arguments a position $p$ and a set of positions $S$. If $p \in R_0(S)$, so that $p$ can reach $S$, then $p \in R(p, S) \subseteq R_0(S)$.

3. $C : G \times 2^G \to 2^G$ accepts as arguments a position $p$ and a set of positions $S$.
If $p \in C_0(S)$, so that $p$ is constrained to reach $S$, then $p \in C(p, S) \subseteq C_0(S)$.

As mentioned above, the function $P$ tells us which positions are sufficiently "like" $p$ that they evaluate to the same value. In tic-tac-toe, for example, the position (1) where X has won with a row across the top might be generalized by $P$ to the set of positions

    X X X
    ? ? ?
    ? ? ?        (5)

as in (2).

The functions $R$ and $C$ approximate $R_0$ and $C_0$. Once again turning to our tic-tac-toe example, suppose that we take $S$ to be the set of positions appearing in (5) and that $p$ is given by

    [board p: two X's and an empty square in the top row; three O's and one X among the remaining squares]

so that $S$ can be reached from $p$. $R(p, S)$ might be

    X X _
    ? ? ?
    ? ? ?        (6)

(where _ denotes an empty square) as in (3), although we could also take $R(p, S) = \{p\}$ or $R(p, S)$ to be

    [the union of p itself with two wildcard patterns, each having two X's, one empty square and six ?'s]

although this last union might be awkward to represent. Note again that $R$ and $C$ are functions of $p$ as well as $S$; the set returned must include the given position $p$ but can otherwise be expected to vary as $p$ does.

We will now modify Algorithm 2.2.5 so that the transposition table, instead of caching results for single positions, caches results for sets of positions. As discussed in the introduction to this section, this is an analog to the introduction of truth maintenance techniques into adversary search. The modified algorithm 2.3.3 appears in Figure 2 and returns a pair of values: the value for the given position, and a set of positions that will take the same value.

Proposition 2.3.4 Suppose that $v = \phi(p, [x, y])$ for every $(S, [x, y], v)$ in $T$ and $p \in S$. Then if $\mathrm{ev}_c(p) \in [x, y]$, the value returned by Algorithm 2.3.3 is $\mathrm{ev}_c(p)$.

Proof. We need to show that when the algorithm returns, any position in $S_{ans}$ will have the value $v_{ans}$. This will ensure that the transposition table remains correct.
To see this, suppose that the node being expanded is a maximizing node; the minimizing case is dual.

Suppose first that this node is a loss for the maximizer, having value 0. In showing that the node is a loss, we will have examined successor nodes that are in sets denoted $S_{new}$ in Algorithm 2.3.3; if the maximizer subsequently finds himself in a position from which he has no moves outside of the various $S_{new}$, he will still be in a losing position. Since $S_{all} = \cup S_{new}$, the maximizer will lose in any position from which he is constrained to next move into an element of $S_{all}$. Since every position in $C(p, S_{all})$ has this property, it is safe to take $S_{ans} = C(p, S_{all})$. This is what is done in the first line with a dagger in the algorithm.

Algorithm 2.3.3 (Partition search) Given a game $(G, p_I, s, \mathrm{ev})$ and $(P, R, C)$ a partition system for it, a position $p \in G$, cutoffs $[x, y] \subseteq [0,1]$ and a transposition table $T$ consisting of triples $(S, [a, b], v)$ with $S \subseteq G$ and $a \le b$, $v \in [0,1]$, to compute $\phi(p, [x, y])$:

    if there is an entry $(S, [x, y], z)$ with $p \in S$ return $\langle z, S \rangle$
    if $\mathrm{ev}(p) \in [0,1]$ then $\langle v_{ans}, S_{ans} \rangle = \langle \mathrm{ev}(p), P(p) \rangle$
    if $\mathrm{ev}(p) = \max$ then
        $v_{ans} := 0$
        $S_{all} := \emptyset$
        for each $p' \in s(p)$ do
            $\langle v_{new}, S_{new} \rangle = \phi(p', [\max(v_{ans}, x), y])$
            if $v_{new} \ge y$ then
                $T := T \cup \{(S_{new}, [x, y], v_{new})\}$
                return $\langle v_{new}, S_{new} \rangle$
            if $v_{new} > v_{ans}$ then $\langle v_{ans}, S_{ans} \rangle = \langle v_{new}, S_{new} \rangle$
            $S_{all} := S_{all} \cup S_{new}$
        if $v_{ans} = 0$ then $S_{ans} = C(p, S_{all})$        †
        else $S_{ans} = R(p, S_{ans}) \cap C(p, S_{all})$        † ‡
    if $\mathrm{ev}(p) = \min$ then
        $v_{ans} := 1$
        $S_{all} := \emptyset$
        for each $p' \in s(p)$ do
            $\langle v_{new}, S_{new} \rangle = \phi(p', [x, \min(v_{ans}, y)])$
            if $v_{new} \le x$ then
                $T := T \cup \{(S_{new}, [x, y], v_{new})\}$
                return $\langle v_{new}, S_{new} \rangle$
            if $v_{new} < v_{ans}$ then $\langle v_{ans}, S_{ans} \rangle = \langle v_{new}, S_{new} \rangle$
            $S_{all} := S_{all} \cup S_{new}$
        if $v_{ans} = 1$ then $S_{ans} = C(p, S_{all})$
        else $S_{ans} = R(p, S_{ans}) \cap C(p, S_{all})$        ‡
    $T := T \cup \{(S_{ans}, [x, y], v_{ans})\}$
    return $\langle v_{ans}, S_{ans} \rangle$

Figure 2: The partition search algorithm

The more interesting case is where the eventual value of the node is nonzero; now in order for another node $n$ to demonstrably have the same value, the maximizer must have no new options at $n$, and must still have some move that achieves the value $v_{ans}$ at $n$. The first condition is identical to the earlier case where $v_{ans} = 0$. For the second, note that any time the maximizer finds a new best move, we set $S_{ans}$ to the set of positions that we know recursively achieve the same value. When we complete the maximizer's loop in the algorithm, it follows that $S_{ans}$ will be a set of positions from which the maximizer can indeed achieve the value $v_{ans}$. Thus the maximizer can also achieve that value from any position in $R(p, S_{ans})$. It follows that the overall set of positions known to have the value $v_{ans}$ is given by $R(p, S_{ans}) \cap C(p, S_{all})$, intersecting the two conditions of this paragraph.
This is what is done in the second daggered step in the algorithm.

2.4 Zero-window variations

The effectiveness of partition search depends crucially on the size of the sets maintained in the transposition table. If the sets are large, many positions will be evaluated by lookup. If the sets are small, partition search collapses to conventional α-β pruning.

An examination of Algorithm 2.3.3 suggests that the points in the algorithm at which the sets are reduced the most are those marked with a double dagger in the description, where an intersection is required because we need to ensure both that the player can make a move equivalent to his best one and that there are no other options. The effectiveness of the method would be improved if this possibility were removed.

To see how to do this, suppose for a moment that the evaluation function always returned 0 or 1, as opposed to intermediate values. Now if the maximizer is on play and the value $v_{new} = 1$, a prune will be generated because there can be no better value found for the maximizer. If all of the $v_{new}$ are 0, then $v_{ans} = 0$ and we can avoid the troublesome intersection. The maximizer loses and there is no "best" move that we have to worry about making.

In reality, the restriction to values of 0 or 1 is unrealistic. Some games, such as bridge, allow more than two outcomes, while others cannot be analyzed to termination and need to rely on evaluation functions that return approximate values for internal nodes. We can deal with these situations using a technique known as zero-window search (originally called scout search (Pearl, 1980)). To evaluate a specific position, one first estimates the value to be $e$ and then determines whether the actual value is above or below $e$ by treating any value $v > e$ as a win for the maximizer and any value $v \le e$ as a win for the minimizer.
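The way a sequence of such two-valued tests pins down a many-valued result can be sketched as follows. The sketch assumes integer values (for instance a trick count between 0 and 13) and takes the test itself as a black box: `exceeds` is a hypothetical stand-in for a full win/loss search with estimate $e$, and the bisection strategy is one possible way of choosing successive estimates.

```python
from typing import Callable, List


def value_by_zero_window(exceeds: Callable[[int], bool], lo: int, hi: int) -> int:
    """Locate an integer value v with lo <= v <= hi using only tests 'is v > e?'."""
    while lo < hi:
        e = (lo + hi) // 2      # current estimate
        if exceeds(e):          # v > e: treated as a win for the maximizer
            lo = e + 1
        else:                   # v <= e: treated as a win for the minimizer
            hi = e
    return lo


if __name__ == "__main__":
    true_value = 9              # hypothetical number of tricks available
    asked: List[int] = []       # estimates tried, in order

    def exceeds(e: int) -> bool:
        """Stand-in for a win/loss search that decides whether the value is above e."""
        asked.append(e)
        return true_value > e

    print(value_by_zero_window(exceeds, 0, 13), asked)
```

Each call to `exceeds` corresponds to one complete search in which every node evaluates to a win or a loss, so the cost of the method is the number of such calls times the cost of one two-valued search.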
The results of this calculation can then be used to refine the guess, and the process is repeated. If no initial estimate is available, a binary search can be used to find the value to within any desired tolerance.

Zero-window search is effective because little time is wasted on iterations where the estimate is wildly inaccurate; there will typically be many lines showing that a new estimate is needed. Most of the time is spent on the last iteration or two, developing tight bounds on the position being considered. There is an analog in conventional α-β pruning, where the bounds typically get tight quickly and the bulk of the analysis deals with a situation where the value of the original position is known to lie in a fairly narrow range.

In zero-window search, a node always evaluates to 0 or 1, since either $v > e$ or $v \le e$. This allows a straightforward modification to Algorithm 2.3.3 that avoids the troublesome cases mentioned earlier.

2.5 Experimental results

Partition search was tested by analyzing 1000 randomly generated bridge deals and comparing the number of nodes expanded using partition search and conventional methods.

In addition to our general interest in bridge, there are two reasons why it can be expected that partition search will be useful for this game. First, partition search requires that the functions $R_0$ and $C_0$ support a partition-like analysis; it must be the case that an analysis of one situation will apply equally well to a variety of similar ones. Second, it must be possible to build approximating functions $R$ and $C$ that are reasonably accurate representatives of $R_0$ and $C_0$.

Bridge satisfies both of these properties. Expert discussion of a particular deal often will refer to small cards as x's, indicating that it is indeed the case that the exact ranks of these cards are irrelevant. Second, it is possible to "back up" x's from one position to its predecessors.
If, for example, one player plays a club with no chance of having it impact the rest of the game, and by doing so reaches a position in which subsequent analysis shows him to have two small clubs, then he clearly must have had three small clubs originally. Finally, the fact that cards are simply being replaced by x's means that it is possible to construct data structures for which the time per node expanded is virtually unchanged from that using conventional methods.

Perhaps an example will make this clearer. Consider the following partial bridge deal in which East is to lead and there are no trumps (a hyphen marks a void suit):

    North: ♠ -    ♥ -   ♦ A K   ♣ -
    West:  ♠ 10   ♥ A   ♦ -     ♣ -
    East:  ♠ A Q  ♥ -   ♦ -     ♣ -
    South: ♠ K J  ♥ -   ♦ -     ♣ -

An analysis of this situation shows that in the main line, the only cards that win tricks by virtue of their ranks are the spade Ace, King and Queen. This sanctions the replacement of the above figure by the following more general one:

    North: ♠ -    ♥ -   ♦ x x   ♣ -
    West:  ♠ x    ♥ x   ♦ -     ♣ -
    East:  ♠ A Q  ♥ -   ♦ -     ♣ -
    South: ♠ K x  ♥ -   ♦ -     ♣ -

Note first that this replacement is sound in the sense that every position that is an instance of the second diagram is guaranteed to have the same value as the original. We have not resorted to an informal argument of the form "Jacks and lower tend not to matter," but instead to a precise argument of the form, "In the expansion of the search tree associated with the given deal, Jacks and lower were proven never to matter."

Bridge also appears to be extremely well-suited (no pun intended) to the kind of analysis that we have been describing; a chess analog might involve describing a mating combination and saying that "the position of Black's queen didn't matter."
While this does happen, casual chess conversation is much less likely to include this sort of remark than bridge conversation is likely to refer to a host of small cards as x's, suggesting at least that the partition technique is more easily applied to bridge than to chess (or to other games).

That said, however, the results for bridge are striking, leading to performance improvements of an order of magnitude or more on fairly small search spaces (perhaps $10^6$ nodes). The deals we tested involved between 12 and 48 cards and were analyzed to termination, so that the depth of the search varied from 12 to 48. (The solver without partition search was unable to solve larger problems.) The branching factor for minimax without transposition tables appeared to be approximately 4, and the results appear in Figure 3.

Each point in the graph corresponds to a single deal. The position of the point on the x-axis indicates the number of nodes expanded using α-β pruning and transposition tables, and the position on the y-axis the number expanded using partition search as well. Both axes are plotted logarithmically.

In both the partition and conventional cases, a binary zero-window search was used to determine the exact value to be assigned to the hand, which the rules of bridge constrain to range from 0 to the number of tricks left (one quarter of the number of cards in play). As mentioned previously, hands generated using a full deck of 52 cards were not considered because the conventional method was in general incapable of solving them. The program was run on a Sparc 5 and PowerMac 6100, where it expanded approximately 15K nodes/second. The transposition table shares common structure among different sets and as a result, uses approximately 6 bytes/node.

The dotted line in the figure is $y = x$ and corresponds to the break-even point relative to α-β pruning in isolation.
The solid line is the least-squares best fit to the logarithmic data, and is given by $y = 1.57x^{0.76}$. This suggests that partition search is leading to an effective reduction in branching factor of $b \to b^{0.76}$. This improvement, above and beyond that provided by α-β pruning, can be contrasted with α-β pruning itself, which gives a reduction when compared to pure minimax of $b \to b^{0.75}$ if the moves are ordered randomly (Pearl, 1982) and $b \to b^{0.5}$ if the ordering is optimal.

[Figure 3: Nodes expanded as a function of method. Log-log scatter plot, one point per deal; x-axis: conventional, y-axis: partition, both running from 10 to $10^7$; fitted line $1.57x^{0.76}$.]

The method was also applied to full deals of 52 cards, which can be solved while expanding an average of 18,000 nodes per deal.⁵ This works out to about a second of CPU time.

⁵ The version of GIB that was released in October of 2000 replaced the transposition table with a data structure that uses a fixed amount of memory, and also sorts the moves based on narrowness (suggested by Plaat et al. (Plaat, Schaeffer, Pijls, & de Bruin, 1996) to be rooted in the idea of conspiracy search (McAllester, 1988)) and the killer heuristic. While the memory requirements are reduced, the overall performance is little changed.

3. Monte Carlo cardplay algorithms

One way in which we might use our perfect-information cardplay engine to proceed in a realistic situation would be to deal the unseen cards at random, biasing the deal so that it was consistent both with the bidding and with the cards played thus far. We could then analyze the resulting deal double dummy and decide which of our possible plays was the strongest. Averaging over a large number of such Monte Carlo samples would allow us to deal with the imperfect nature of bridge information. This idea was initially suggested by Levy (Levy, 1989), although he does not appear to have realized (see below) that there are problems with it in practice.

Algorithm 3.0.1 (Monte Carlo card selection) To select a move from a candidate set $M$ of such moves:

1. Construct a set $D$ of deals consistent with both the bidding and play of the deal thus far.
2. For each move $m \in M$ and each deal $d \in D$, evaluate the double dummy result of making the move $m$ in the deal $d$. Denote the score obtained by making this move $s(m, d)$.
3. Return that $m$ for which $\sum_d s(m, d)$ is maximal.

The Monte Carlo approach has drawbacks that have been pointed out by a variety of authors, including Koller† and others (Frank & Basin, 1998). Most obvious among these is that the approach never suggests making an "information gathering play." After all, the perfect-information variant on which the decision is based invariably assumes that the information will be available by the time the next decision must be made! Instead, the tendency is for the approach to simply defer important decisions; in many situations this may lead to information gathering inadvertently, but the amount of information acquired will generally be far less than other approaches might provide.

As an example, suppose that on a particular deal, GIB has four possible lines of play to make its contract:

1. Line A works if West has the Q.
2. Line B works if East has the Q.
3. Line C defers the guess until later.
4. Line D (the clever line) works independent of who has the Q.

Assuming that either player is equally likely to hold the Q, a Monte Carlo analyzer will correctly conclude that line A works half the time, and line B works half the time. Line C, however, will be presumed to work all of the time, since the contract can still be made (double dummy) if the guess is deferred. Line D will also be concluded to work all of the time (correctly, in this case).
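The four-line example can be reproduced with a minimal sketch of Algorithm 3.0.1. Everything here is a toy stand-in rather than GIB's code: a "deal" is reduced to the seat holding the queen, and `double_dummy_score` is a hypothetical scoring function that encodes the assumption that a deferred guess is always resolved correctly once all cards are visible.

```python
def monte_carlo_select(moves, deals, score):
    """Algorithm 3.0.1: sum the double dummy score of each move over all
    sampled deals and report the totals and the top-scoring move(s)."""
    totals = {m: sum(score(m, d) for d in deals) for m in moves}
    best = max(totals.values())
    return totals, [m for m in moves if totals[m] == best]


def double_dummy_score(line, queen_holder):
    """Hypothetical toy scorer: 1 if the contract makes, 0 otherwise."""
    if line == "A":
        return 1 if queen_holder == "W" else 0
    if line == "B":
        return 1 if queen_holder == "E" else 0
    # Line C defers the guess; seeing all four hands, the later guess is
    # always right. Line D genuinely works against either holding.
    return 1


# A balanced sample: the queen is with West in half of the deals.
deals = ["W"] * 25 + ["E"] * 25
totals, best = monte_carlo_select(list("ABCD"), deals, double_dummy_score)
print(totals)  # {'A': 25, 'B': 25, 'C': 50, 'D': 50}
print(best)    # ['C', 'D']
```

Lines C and D tie at the top, so nothing in the summed scores distinguishes the deferred guess from the line that is actually safe.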
As a result, GIB will choose randomly between the last two possibilities above, believing as it does that if it can only defer the guess until later (even the next card), it will make that guess correctly. The correct play, of course, is D.

We will discuss a solution to these difficulties in Sections 5-7; although GIB's defensive cardplay continues to be based on the above ideas, its declarer play now uses stronger techniques. Nevertheless, basing the card play on the algorithm presented leads to extremely strong results, approximately at the level of a human expert. Since GIB's introduction, all other competitive bridge-playing programs have switched their cardplay to similar methods, although GIB's double dummy analysis is substantially faster than most of the other programs and its play is correspondingly stronger.

We will describe three tests of GIB's cardplay algorithms: performance on a commercially available set of benchmarks, performance in a human championship designed to highlight cardplay in isolation, and statistical performance measured over a large set of deals.

For the first test, we evaluated the strength of GIB's cardplay using Bridge Master (BM), a commercial program developed by Canadian internationalist Fred Gitelman. BM contains 180 deals at 5 levels of difficulty. Each of the 36 deals on each level is a problem in declarer play. If you misplay the hand, BM moves the defenders' cards around if necessary to ensure your defeat.

BM was used for the test instead of randomly dealt deals because the signal to noise ratio is far higher; good plays are generally rewarded and bad ones punished. Every deal also contains a lesson of some kind; there are no completely uninteresting deals where the line of play is irrelevant or obvious.
There are drawbacks to testing GIB's performance on non-randomly dealt deals, of course, since the BM deals may in some way not be representative of the problems a bridge player would actually encounter at the table.

The test was run under Microsoft Windows on a 200 MHz Pentium Pro. As a benchmark, Bridge Baron (BB) version 6 was also tested on the same deals using the same hardware.⁶ BB was given 10 seconds to select each play, and GIB was given 90 seconds to play the entire deal with a maximum Monte Carlo sample size of 50.⁷ New deals were generated each time a play decision needed to be made.

These numbers approximately equalized the computational resources used by the two programs; BB could in theory take 260 seconds per deal (ten seconds on each of 26 plays), but in practice took substantially less. GIB was given the auctions as well; there was no facility for doing this in BB. This information was critical on a small number of deals.

Here is how the two systems performed:

    Level             BB      GIB
    1                 16      31
    2                  8      23
    3                  2      12
    4                  1      21
    5                  4      13
    Total             33      100
    Percent of 180    18.3%   55.6%

Each entry is the number of deals that were played successfully by the program in question.

GIB's mistakes are illuminating. While some of them involve failing to gather information, most are problems in combining multiple chances (as in case D above). As BM's deals get more difficult, they more often involve combining a variety of possibly winning options and that is why GIB's performance falls off at levels 2 and 3.

At still higher levels, however, BM typically involves the successful development of complex end positions, and GIB's performance rebounds. This appeared to happen to BB as well, although to a much lesser extent.
It was gratifying to see GIB discover for itself the complex end positions around which the BM deals are designed, and more gratifying still to witness GIB's discovery of a maneuver that had hitherto not been identified in the bridge literature, as described in Appendix B.

⁶ The current version is Bridge Baron 10 and could be expected to perform guardedly better in a test such as this. Bridge Baron 6 does not include the Smith enhancements (Smith et al., 1996).

⁷ GIB's Monte Carlo sample size is fixed at 50 in most cases, which provides a good compromise between speed of play and accuracy of result.

Experiments such as this one are tedious, because there is no text interface to a commercial program such as Bridge Master or Bridge Baron. As a result, information regarding the sensitivity of GIB's performance to various parameters tends to be only anecdotal.

GIB solves an additional 16 problems (bringing its total to 64.4%) given additional resources in the form of extra time (up to 100 seconds per play, although that time was very rarely taken), a larger Monte Carlo sample (100 deals instead of 50) and hand-generated explanations of the opponents' bids and opening leads. Each of the three factors appeared to contribute equally to the improved performance.

Other authors are reporting comparable levels of performance for GIB. Forrester, working with a different but similar benchmark (Blackwood, 1979), reports⁸ that GIB solves 68% of the problems given 20 seconds/play, and 74% of them given 30 seconds/play. Deals where GIB has outplayed human experts are the topic of a series of articles in the Dutch bridge magazine IMP (Eskes, 1997, and sequels).⁹

Based on these results, GIB was invited to participate in an invitational event at the 1998 world bridge championships in France; the event involved deals similar to Bridge Master's but substantially more difficult. GIB joined a field of 34 of the best card players in the world, each player facing twelve such problems over the course of two days. GIB was leading at the halfway mark, but played poorly on the second day (perhaps the pressure was too much for it), and finished twelfth.

The human participants were given 90 minutes to play each deal, although they were penalized slightly for playing slowly. GIB played each deal in about ten minutes, using a Monte Carlo sample size of 500; tests before the event indicated little or no improvement if GIB were allotted more time. Michael Rosenberg, the eventual winner of the contest and the pre-tournament favorite, in fact made one more mistake than did Bart Bramley, the second place finisher. Rosenberg played just quickly enough that Bramley's accumulated time penalties gave Rosenberg the victory. The scoring method thus favors GIB slightly.

Finally, GIB's performance was evaluated directly using records from actual play. These records are available from high levels of human competition (world and national championships, typically), so that it is possible to determine exactly how frequently humans make mistakes at the bridge table. In Figure 4, we show the frequency with which this data indicates that a human declarer, leading to the $n$th trick of a deal, makes a mistake that causes his contract to become unmakeable on a double-dummy basis. The y-axis gives the frequency of the mistakes and is plotted logarithmically; as one would expect, play becomes more accurate later in the deal. We also give similar data for GIB, based on a large sample of deals that GIB played against itself. The error profiles of the two are quite similar.
Before turning to defensive play, let me point out that this method of analysis favors GIB slightly. Failing to make an information gathering play gets reflected in the above figure, since the lack of information will cause GIB to make a double-dummy mistake subsequently. But human declarers often work to give the defenders problems that exploit their relative lack of information, and that tactic is not rewarded in the above analysis.

Similar results for defensive play appear in Figure 5.

⁸ Posting to rec.games.bridge on 14 July 1997.

⁹ http://www.imp-bridge.nl

[Figure 4: GIB's performance as declarer. P(err), on a logarithmic axis from 0.0001 to 0.1, against trick number (0 to 12), for human and GIB.]

[Figure 5: GIB's performance as defender. P(err), on a logarithmic axis from 1e-05 to 0.1, against trick number (0 to 12), for human and GIB.]

There are two important technical remarks that must be made about the Monte Carlo algorithm before proceeding. First, note that we were cavalier in simply saying, "Construct a set $D$ of deals consistent with both the bidding and play of the deal thus far."

To construct deals consistent with the bidding, we first simplify the auction as observed, building constraints describing each of the hands around the table. We then deal hands consistent with the constraints using a deal generator that deals unbiased hands given restrictions on the number of cards held by each player in each suit. This set of deals is then tested to remove elements that do not satisfy the remaining constraints, and each of the remaining deals is passed to the bidding module to identify those for which the observed bids would have been made by the players in question. (This assumes that GIB has a reasonable understanding of the bidding methods used by the opponents.) The overall dealing process typically takes one or two seconds to generate the full set of deals needed by the algorithm.
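The filtering pipeline just described can be illustrated with a short sketch. It is not GIB's deal generator: GIB deals unbiased hands directly under the suit-length restrictions, whereas this sketch substitutes plain rejection sampling, and the names `sample_deals`, `length_limits` and `extra_ok` are hypothetical. The predicate `extra_ok` stands in for both the remaining constraints and the check against the bidding module.

```python
import random

SUITS = "SHDC"
RANKS = "AKQJT98765432"
DECK = [suit + rank for suit in SUITS for rank in RANKS]


def suit_lengths(hand):
    """Count the cards held in each suit."""
    return {suit: sum(card[0] == suit for card in hand) for suit in SUITS}


def lengths_ok(hand, limits):
    """Check (lo, hi) bounds per suit; suits without a bound are free."""
    counts = suit_lengths(hand)
    return all(lo <= counts[suit] <= hi for suit, (lo, hi) in limits.items())


def sample_deals(known, hidden_seats, length_limits, extra_ok, n, rng,
                 max_tries=100_000):
    """Rejection-sample up to n deals of the unseen cards.

    known         -- dict seat -> list of visible cards
    hidden_seats  -- seats whose cards must be dealt at random
    length_limits -- dict seat -> {suit: (lo, hi)} suit-length restrictions
    extra_ok      -- predicate on a full deal (assumed stand-in for the
                     remaining constraints and the bidding-module check)
    """
    seen = {card for hand in known.values() for card in hand}
    unseen = [card for card in DECK if card not in seen]
    per_seat = len(unseen) // len(hidden_seats)
    deals = []
    for _ in range(max_tries):
        if len(deals) == n:
            break
        rng.shuffle(unseen)
        deal = dict(known)
        for i, seat in enumerate(hidden_seats):
            deal[seat] = sorted(unseen[i * per_seat:(i + 1) * per_seat])
        if all(lengths_ok(deal[seat], length_limits.get(seat, {}))
               for seat in hidden_seats) and extra_ok(deal):
            deals.append(deal)
    return deals


# Toy position: North holds all the hearts, South all the spades.
known = {"N": [c for c in DECK if c[0] == "H"],
         "S": [c for c in DECK if c[0] == "S"]}
deals = sample_deals(known, ["W", "E"],
                     {"W": {"D": (5, 13)}},       # West has 5+ diamonds
                     lambda d: "DK" in d["W"],    # West holds the diamond king
                     n=20, rng=random.Random(0))
```

Rejection sampling is simple but wasteful when the constraints are tight, which is one reason a generator that respects the suit-length restrictions directly is preferable in practice.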
Now the card play must be analyzed. Ideally, GIB would do something similar to what it does for the bidding, determining whether each player would have played as indicated on any particular deal. Unfortunately, it is simply impractical to test each hypothetical decision recursively against the cardplay module itself.

Instead, GIB tries to evaluate the probability that West (for example) has the K (for example), and to then use these probabilities to weight the sample itself.

To understand the source of the weighting probabilities, let us consider a specific example. Suppose that in some particular situation, GIB plays the 5. The analysis indicates that 80% of the time that the next player (say West) holds the K, it is a mistake for West not to play it. In other words, West's failure to play the K leads to odds of 4:1 that he hasn't got it. These odds are now used via Bayes' rule to adjust the probability that West holds the K at all.

The probabilities are then modified further to include information revealed by defensive signalling (if any), and the adjusted probabilities are finally used to bias the Monte Carlo sample. The evaluation $\sum_d s(m, d)$ in Algorithm 3.0.1 is replaced with $\sum_d w_d \, s(m, d)$, where $w_d$ is the weight assigned to deal $d$. More heavily weighted deals thus have a larger impact on GIB's eventual decision.

The second technical point regarding the algorithm itself involves the fact that it needs to run quickly and that it may need to be terminated before the analysis is complete. For the former, there are a variety of greedy techniques that can be used to ensure that a move $m$ is not considered if we can show $\sum_d s(m, d) \le \sum_d s(m', d)$ for some $m'$. The algorithm also uses iterative broadening (Ginsberg & Harvey, 1992) to ensure that a low-width answer is available if a high-width search fails to terminate in time.
Results from the low- and high-width searches are combined when time expires.

Also regarding speed, the algorithm requires that for each deal in the Monte Carlo sample and each possible move, we evaluate the resulting position exactly. Knowing simply that move $m_1$ is not as good as move $m_2$ for deal $d$ is not enough; $m_1$ may be better than $m_2$ elsewhere and we need to compare them quantitatively. This approach is aided substantially by the partition search idea, where entries in the transposition table correspond not to single positions and their evaluated values, but to sets of positions and values. In many cases, $m_1$ and $m_2$ may fall into the same entry of the partition table long before they actually transpose into one another exactly.

4. Monte Carlo bidding

The purpose of bidding in bridge is twofold. The primary purpose is to share information about your cards with your partner so that you can cooperatively select an optimal final contract. A secondary purpose is to disrupt the opponents' attempt to do the same.

In order to achieve this purpose, a wide variety of bidding "languages" have been developed. In some, when you suggest clubs as trumps, it means you have a lot of them. In others, the suggestion is only temporary and the information conveyed is quite different. In all of these languages, some meaning is assigned to a wide variety of bids in particular situations; there are also default rules that assign meanings to bids that have no specifically assigned meanings. Any computer bridge player will need similar understandings.

Bidding is interesting because the meanings frequently overlap; there may be one or more bids that are suitable (or nearly so) on any particular set of cards. Existing computer programs have simply matched possible bids against large databases giving their meanings, searching for that bid that best matches the cards that the machines hold.
World champion Chip Martel reports† that human experts take a different approach.¹⁰ ¹¹ Although expert bidding is based on a database such as that used by existing programs, close decisions are made by simulating the results of each candidate action. This involves projecting how the bidding is likely to proceed and evaluating the play in one of a variety of possible final contracts. An expert gets his "judgment" from a Monte Carlo-like simulation of the results of possible bids, often referred to in the bridge-playing community as a Borel simulation (so named after the first player to describe the method). GIB takes a similar tack.

Algorithm 4.0.2 (Borel simulation) To select a bid from a candidate set $B$, given a database $Z$ that suggests bids in various situations:

1. Construct a set $D$ of deals consistent with the bidding thus far.
2. For each bid $b \in B$ and each deal $d \in D$, use the database $Z$ to project how the auction will continue if the bid $b$ is made. (If no bid is suggested by the database, the player in question is assumed to pass.) Compute the double dummy result of the eventual contract, denoting it $s(b, d)$.
3. Return that $b$ for which $\sum_d s(b, d)$ is maximal.

As with the Monte Carlo approach to card play, this approach does not take into account the fact that bridge is not played double dummy. Human experts often choose not to make bids that will convey too much information to the opponents in order to make the defenders' task as difficult as possible. This consideration is missing from the above algorithm.¹²

¹⁰ The 1994 Rosenblum Cup World Team Championship was won by a team that included Martel and Rosenberg.

¹¹ Frank suggests (Frank, 1998) that the existing machine approach is capable of reaching expert levels of performance. While this appears to have been true in the early 1980's (Lindelöf, 1983), modern expert bidding practice has begun to highlight the disruptive aspect of bidding, and machine performance is no longer likely to be competitive.

¹² In theory at least, this issue could be addressed using the single-dummy ideas that we will present in subsequent sections. Computational considerations currently make this impractical, however.

There are more serious problems also, generally centering around the development of the bidding database $Z$. First, the database itself needs to be built and debugged. A large number of rules need to be written, typically in a specialized language and dependent upon the bridge expertise of the author. The rules need to be debugged as actual play reveals oversights or other difficulties.

The nature and sizes of these databases vary enormously, although all of them represent very substantial investments on the part of the authors. The database distributed with Meadowlark Bridge includes some 7300 rules; that with Q-Plus Bridge 2500 rules comprising 40,000 lines of specialized code. GIB's database is built using a derivative of the Meadowlark language, and includes about 3000 rules.

All of these databases doubtless contain errors of one sort or another; one of the nice things about most bidding methods is that they tend to be fairly robust against such problems. Unfortunately, the Borel algorithm described above introduces substantial instability in GIB's overall bidding.

To understand this, suppose that the database $Z$ is somewhat conservative in its actions. The projection in step 2 of Algorithm 4.0.2 now leads each player to assume its partner bids conservatively, and therefore to bid somewhat aggressively to compensate. The partnership as a whole ends up overcompensating.
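The projection step of Algorithm 4.0.2 is easier to follow in code. The sketch below is a toy under stated assumptions, not GIB's bidding module: the database $Z$ is modelled as a hypothetical function `suggest(auction, deal, seat)` returning a call or `None`, doubles and redoubles are ignored, and `score` is a caller-supplied stand-in for the double dummy result of the final contract.

```python
SEATS = "NESW"


def auction_finished(auction):
    """An auction ends after three consecutive passes once every player
    has had a turn (so four opening passes also end it)."""
    return len(auction) >= 4 and all(call == "P" for call in auction[-3:])


def project_auction(auction, dealer, deal, suggest, max_calls=60):
    """Continue the auction using the database; a player for whom the
    database suggests nothing is assumed to pass (step 2)."""
    auction = list(auction)
    while not auction_finished(auction) and len(auction) < max_calls:
        seat = SEATS[(SEATS.index(dealer) + len(auction)) % 4]
        call = suggest(tuple(auction), deal, seat)
        auction.append(call if call is not None else "P")
    return auction


def final_contract(auction):
    """The last non-pass call, or None if the deal was passed out."""
    bids = [call for call in auction if call != "P"]
    return bids[-1] if bids else None


def borel_select(candidates, deals, auction, dealer, suggest, score):
    """Return the candidate bid maximizing the summed score s(b, d)."""
    totals = {
        bid: sum(score(final_contract(
                     project_auction(auction + [bid], dealer, deal, suggest)),
                       deal)
                 for deal in deals)
        for bid in candidates
    }
    return max(candidates, key=lambda bid: totals[bid]), totals


def toy_suggest(auction, deal, seat):
    """Hypothetical one-rule database: partner raises 3S to 4S."""
    return "4S" if auction[-2:] == ("3S", "P") else None


def toy_score(contract, deal):
    """Hypothetical scores: each toy deal maps a contract to its result."""
    return deal.get(contract, 0)


deals = [{"3N": 400, "4S": 420}, {"3N": 400, "4S": -50}, {"3N": -50, "4S": 420}]
best, totals = borel_select(["3S", "3N"], deals, [], "S", toy_suggest, toy_score)
print(best, totals)  # 3S {'3S': 790, '3N': 750}
```

Because every projected call comes from `suggest`, any bias or gap in the database feeds straight into the totals, which is the source of the instability discussed in this section.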
Worse still, suppose that there is an omission of some kind in $Z$; perhaps every time someone bids 7♦, the database suggests a foolish action. Since 7♦ is a rare bid, a bidding system that matches its bids directly to the database will encounter this problem infrequently.

GIB, however, will be much more aggressive, bidding 7♦ often on the grounds that doing so will cause the opponents to make a mistake. In practice, of course, the bug in the database is unlikely to be replicated in the opponents' minds, and GIB's attempts to exploit the gap will be unrewarded or worse.

This is a serious problem, and appears to apply to any attempt to heuristically model an adversary's behavior: It is difficult to distinguish a good choice that is successful because the opponent has no winning options from a bad choice that appears successful because the heuristic fails to identify such options.

There are a variety of ways in which this problem might be addressed, none of them perfect. The most obvious is simply to use GIB's aggressive tendencies to identify the bugs or gaps in the bidding database, and to fix them. Because of the size of the database, this is a slow process.

Another approach is to try to identify the bugs in the database automatically, and to be wary in such situations. If the bidding simulation indicates that the opponents are about to achieve a result much worse than what they might achieve if they saw each other's cards, that is evidence that there may be a gap in the database. Unfortunately, it is also evidence that GIB is simply effectively disrupting its opponents' efforts to bid accurately.

Finally, restrictions could be placed on GIB that require it to make bids that are "close" to the bids suggested by the database, on the grounds that such bids are more likely to reflect improvements in judgment than to highlight gaps in the database.

All of these techniques are used, and all of them are useful.
GIB's bidding is substantially better than that of earlier programs, but not yet of expert caliber.

The bidding was tested as part of the 1998 Baron Barclay/OKBridge World Computer Bridge Championships, and the 2000 Orbis World Computer Bridge Championship. Each program bid deals that had previously been bid and played by experts; a result of 0 on any particular deal meant that the program bid to a contract as good as the average expert result. A positive result was better, and a negative result was worse. There were 20 deals in each contest; although card play was not an issue, the deals were selected to pose challenges in bidding and a standard deviation of 5.5 IMPs/deal is still a reasonable estimate. One standard deviation over the 20 deal set could thus be expected to be about 25 IMPs.

GIB's final score in the 1998 bidding contest was +2 IMPs; in the 2000 contest it was +9 IMPs. In both cases, it narrowly edged out the expert field against which it was compared.¹³ The next best program in 1998, Blue Chip Bridge, finished with a score of -35 IMPs, not dissimilar from the -37 IMPs that had been sufficient to win the bidding contest in 1997. The second place program in 2000 (once again Blue Chip Bridge) had a score of -2 IMPs.

5. The value of information

In previous sections of this paper, we have described Monte Carlo methods for dealing with the fact that bridge is a game of imperfect information, and have also described possible problems with this approach. We now turn to ways to overcome some of these difficulties.

For the moment, let me assume that we replace bridge with a $\{0, 1\}$ game, so that we are interested only in the question of whether declarer makes his contract. Overtricks or extra undertricks are irrelevant. At least as a first approximation, bridge experts often look at hands this way, only subsequently refining the analysis.
If you ask such an expert why he took a particular line on a deal, he will often say something like, "I was playing for each opponent to have three hearts," or "I was playing for West to hold the spade queen." What he is reporting is that set of distributions of the unseen cards for which he was expecting to make the hand.

At some level, the expert is treating the value of the game not as zero or one (which it would be if he could see the unseen cards), but as a function from the set of possible distributions of unseen cards into $\{0, 1\}$. If we denote this set of distributions by $S$, the value of the game is thus a function

$$f : S \to \{0, 1\}$$

We will follow standard mathematical notation and denote the set $\{0, 1\}$ by $2$ and denote the set of functions $f : S \to 2$ by $2^S$.

It is possible to extend max and min from the set $\{0, 1\}$ to $2^S$ in a pointwise fashion, so that, for example

$$\min(f, g)(s) = \min(f(s), g(s)) \tag{7}$$

for functions $f, g \in 2^S$ and a specific situation $s \in S$. The maximizing function is defined similarly.

13. This is in spite of the earlier remark that GIB's bidding is not of expert caliber. GIB was fortunate in the bidding contests in that most of the problems involved situations handled by the database. When faced with a situation that it does not understand, GIB's bidding deteriorates drastically.

As an example, suppose that in a particular situation, there is one line of play $f$ that wins if West has the $\spadesuit$Q. There is another line of play $g$ that wins if East has exactly three hearts. Now $\min(f, g)$ is the line of play that wins just in case both West has the $\spadesuit$Q and East has three hearts, while $\max(f, g)$ is the line of play that wins if either condition obtains.

It is important to realize that the set $2^S$ is not totally ordered by these max and min functions, like the unit interval is.
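The pointwise rule (7) and the lack of a total order can be made concrete with a small sketch. The situation set below (who holds the spade queen, and how many hearts East holds) is a hypothetical toy model chosen for illustration, and the helper names are assumptions rather than anything taken from GIB.

```python
from itertools import product

# Hypothetical situation set: (holder of the spade queen, hearts held by East).
SITUATIONS = list(product(("West", "East"), (2, 3, 4)))


def indicator(predicate):
    """Represent a line of play as a function S -> {0, 1}, stored as a dict."""
    return {s: int(predicate(s)) for s in SITUATIONS}


def pointwise_min(f, g):
    """Equation (7): min(f, g)(s) = min(f(s), g(s))."""
    return {s: min(f[s], g[s]) for s in SITUATIONS}


def pointwise_max(f, g):
    """The maximizing counterpart of equation (7)."""
    return {s: max(f[s], g[s]) for s in SITUATIONS}


def leq(f, g):
    """True when f <= g in the pointwise order."""
    return all(f[s] <= g[s] for s in SITUATIONS)


f = indicator(lambda s: s[0] == "West")  # wins if West has the spade queen
g = indicator(lambda s: s[1] == 3)       # wins if East has exactly three hearts

both = pointwise_min(f, g)    # wins only when both conditions hold
either = pointwise_max(f, g)  # wins when either condition holds
comparable = leq(f, g) or leq(g, f)  # neither line dominates the other
```

Here `f` and `g` are incomparable: each wins in some situation where the other loses, which is exactly why $2^S$ fails to be totally ordered.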
Instead, $2^S$ is an instance of a mathematical structure known as a lattice (Grätzer, 1978, and Section 6). At this point, we note only that we can extend Definition 2.2.1 to any set with maximization and minimization operators:

Definition 5.0.3 A game is a 7-tuple $(G, V, p_I, s, \mathrm{ev}, f_+, f_-)$ such that:

1. $G$ is a finite set of possible positions in the game.
2. $V$ is the set of values for the game.
3. $p_I \in G$ is the initial position of the game.
4. $s : G \to 2^G$ gives the successors of a given position.
5. $\mathrm{ev} : G \to \{\max, \min\} \cup V$ gives the value for terminal positions or indicates which player is to move for nonterminal positions.
6. $f_+ : \mathcal{P}(V) \to V$ and $f_- : \mathcal{P}(V) \to V$ are the combination functions for the maximizer and minimizer respectively.

The structures $G$, $V$, $p_I$, $s$ and $\mathrm{ev}$ are required to satisfy the following conditions (unchanged from Definition 2.2.1):

1. There is no sequence of positions $p_0, \ldots, p_n$ with $n > 0$, $p_i \in s(p_{i-1})$ for each $i$ and $p_n = p_0$. In other words, there are no "loops" that return to an identical position.
2. $\mathrm{ev}(p) \in V$ if and only if $s(p) = \emptyset$.

This definition extends Definition 2.2.1 only in that the value set and combination functions have been generalized. As such, Definition 5.0.3 includes both "conventional" games in which the values are numeric and the combination functions are max/min, and our more general setting where the values are functional and the combination functions combine them as described above.

As usual, we can use the maximization and minimization functions to assign a value to the root of the tree:

Definition 5.0.4 Given a game $(G, V, p_I, s, \mathrm{ev}, f_+, f_-)$, we introduce a function $\mathrm{ev}_c : G \to V$ defined recursively by

$$\mathrm{ev}_c(p) = \begin{cases} \mathrm{ev}(p), & \text{if } \mathrm{ev}(p) \in V; \\ f_+\{\mathrm{ev}_c(p') \mid p' \in s(p)\}, & \text{if } \mathrm{ev}(p) = \max; \\ f_-\{\mathrm{ev}_c(p') \mid p' \in s(p)\}, & \text{if } \mathrm{ev}(p) = \min. \end{cases}$$

The value of $(G, V, p_I, s, \mathrm{ev}, f_+, f_-)$ is defined to be $\mathrm{ev}_c(p_I)$.
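The recursion in Definition 5.0.4 can be sketched directly, with the combination functions passed in as parameters. This is a minimal illustration and not GIB's implementation: the toy tree, the helper `make_game`, and the string tags `"max"` and `"min"` are all assumptions made for the example.

```python
def game_value(p, successors, ev, f_plus, f_minus):
    """Back up a value to position p as in Definition 5.0.4.

    ev(p) returns "max" or "min" at nonterminal positions and a value
    from V at terminal ones; f_plus and f_minus each map a list of
    values to a single value.
    """
    tag = ev(p)
    if tag not in ("max", "min"):
        return tag  # terminal position: the value is given directly
    children = [game_value(q, successors, ev, f_plus, f_minus)
                for q in successors(p)]
    return f_plus(children) if tag == "max" else f_minus(children)


# Hypothetical toy tree: a maximizing root above two minimizing nodes.
TREE = {"root": ("max", ["a", "b"]),
        "a": ("min", ["a1", "a2"]),
        "b": ("min", ["b1", "b2"])}


def make_game(leaves):
    """Build (successors, ev) for TREE with the given terminal values."""
    def successors(p):
        return TREE[p][1] if p in TREE else []

    def ev(p):
        return TREE[p][0] if p in TREE else leaves[p]

    return successors, ev


# Conventional instance: numeric values combined with max and min.
s_num, ev_num = make_game({"a1": 3, "a2": 5, "b1": 2, "b2": 9})
numeric_value = game_value("root", s_num, ev_num, max, min)

# Set-valued instance: values are subsets of {x, y, z}.
s_set, ev_set = make_game({"a1": frozenset("xy"), "a2": frozenset("x"),
                           "b1": frozenset("yz"), "b2": frozenset("xyz")})
set_value = game_value("root", s_set, ev_set,
                       lambda vs: frozenset().union(*vs),
                       lambda vs: frozenset.intersection(*vs))
```

The same recursion serves both instances; only the value set and the two combination functions change, which is the point of the generalized definition.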
The definition is well founded because the game has no loops, and it is straightforward to extend the minimax algorithm 2.2.3 to this more general formalism. We will discuss extensions of $\alpha$-$\beta$ pruning in the next section.

To flesh out our previous informal description, we need to instantiate Definition 5.0.3. We do this by having the value of any particular node correspond to the set of positions where the maximizer can win:

1. The set $G$ of positions is a set of pairs $(p, Z)$ where $p$ is a position with only two of the four bridge hands visible (i.e., a position in the "single dummy" game), and $Z$ is that subset of $S$ (the set of situations) that is consistent both with $p$ and with the cards that were played to reach $p$ from the initial position.
2. The value set $V$ is $2^S$.
3. The initial position $p_I$ is $(p_0, S)$, where $p_0$ is the initial single-dummy position.
4. The successor function is described as follows:
   (a) If the declarer/maximizer is on play in the given position, the successors are obtained by enumerating the maximizer's legal plays and leaving the set $Z$ of situations unchanged.
   (b) If the minimizer is on play in the given position, the successors are obtained by playing any card $c$ that is legal in any element of $Z$ and then restricting $Z$ to that subset for which $c$ is in fact a legal play.
5. Terminal nodes are nodes where all cards have been played, and therefore correspond to single situations $s$, since the locations of all cards have been revealed. For such a terminal position, if the declarer has made his contract, the value is $S$ (the entire set of positions possible at the root). If the declarer has failed to make his contract, the value is $S - \{s\}$.
6. The maximization and minimization functions are computed pointwise, so that

$$f_+(U, V) = U \cup V \quad \text{and} \quad f_-(U, V) = U \cap V$$

Given an initial single-dummy situation $p$ corresponding to a set $S$ of situations, we will call the above game the $(p, S)$ game.
Proposition 5.0.5 Suppose that the set of situations for which the maximizer can make his contract is $T \subseteq S$. Then the value of the $(p, S)$ game is $T$.

It is natural to view $T$ as an element of $2^S$; it is the function mapping points in $T$ to 1 and points outside of $T$ to 0.

Proof. The proof proceeds by induction on the depth of the game tree. If the root node $p$ is also terminal, then $S = \{s\}$ and the value is clearly set correctly (to $\{s\}$ or $\emptyset$) by the definition of the $(p, S)$ game.

If $p$ is nonterminal, suppose first that it is a maximizing node. Now let $s \in S$ be some particular situation. If the maximizer can win in $s$, then there is some successor $(p', S')$ to $(p, S)$ where the maximizer wins, and hence by the inductive hypothesis, the value of $(p', S')$ is a set $U$ with $s \in U$. But since the maximizer moves in $p$, the value assigned to $(p, S)$ is a superset of the value assigned to any subnode, so that $s \in \mathrm{ev}_c(p, S) = T$.

If, on the other hand, the maximizer cannot win in $s$, then he cannot win in any child of $s$. If $(p_i, S_i)$ are the successors of $(p, S)$ in the game tree, then again by the inductive hypothesis, we must have $s \notin \mathrm{ev}_c(p_i, S_i)$ for each $i$. But

$$\mathrm{ev}_c(p, S) = \bigcup_i \mathrm{ev}_c(p_i, S_i)$$

so that $s \notin \mathrm{ev}_c(p, S) = T$.

For the minimizing case, suppose that the maximizer wins in $s$. Then the maximizer must win in every successor of $s$, so that $s \in \mathrm{ev}_c(p_i, S_i)$ for each such successor and therefore $s \in \mathrm{ev}_c(p, S)$. Alternatively, if the minimizer wins in $s$, he must have a legal winning option so that $s \notin \mathrm{ev}_c(p_i, S_i)$ for some $i$ and therefore $s \notin \mathrm{ev}_c(p, S)$.
Unfortunately, Proposition 5.0.5 is in some sense exactly what we wanted not to prove: it says that our modified game computes the set of situations in which it is possible for the maximizer to make his contract if he has perfect information about the opponents' cards, not the set of situations in which it is possible for him to make his contract given his actual state of incomplete information.

Before we go on to deal with this, however, let me look at an example in some detail. The example we will use is similar to that of Section 3 and involves a situation where the maximizer can make his contract if either West has the $\spadesuit$Q or East has three hearts. I will denote by $S$ the set of situations where West has the $\spadesuit$Q, and by $T$ the set where East has three hearts. It's possible to tie in the "defer the guess" example from Section 3 as well, so I will do that also.

Here is the game tree for the game in question:

[Game tree figure. The root is a maximizing node with four moves. The first, second and fourth moves lead to minimizing nodes; the third leads to a maximizing node with two moves, each leading to a minimizing node. Every minimizing node has one branch labelled $S$ and one labelled $T$. Leaf values on the ($S$, $T$) branches: first move (1, 0); second move (0, 1); third move (1, 0) after playing for $S$ and (0, 1) after playing for $T$; fourth move (1, 1).]

At the root node, the maximizer has four choices. If he makes the move on the left (playing for $S$, as it turns out), the minimizer then moves in a situation where the maximizer wins if $S$ holds and loses if $T$ holds. For the second move, where the maximizer is essentially playing for $T$, the reverse is true.

In the third case, the maximizer defers the guess. We suppose that he is on play again immediately, forced to commit between playing for $S$ and playing for $T$. In the last case, he wins independent of whether $T$ or $S$ obtains.

In the Monte Carlo setting, the above tree will actually be split based on the element of the sample in question.
In some cases, $S$ will be true and we will examine only this subtree:

[Game tree figure: the tree restricted to the branches labelled $S$. Leaf values: first move 1; second move 0; third move 1 after playing for $S$ and 0 after playing for $T$; fourth move 1.]

The maximizer can win by making any move other than the second. In the cases where $T$ obtains, we examine:

[Game tree figure: the tree restricted to the branches labelled $T$. Leaf values: first move 0; second move 1; third move 0 after playing for $S$ and 1 after playing for $T$; fourth move 1.]

Here, the maximizer can win by making any move other than the first. In all cases, both of the last two moves win for the maximizer, since this approach cannot recognize the fact that the third move simply defers the guess while the fourth wins outright.

Now let us return to the situation where we include information about the sets that it is possible to play for. Here is the tree again:

[Game tree figure: the full four-move tree, with leaf values 0 and 1 on the $S$ and $T$ branches as before.]

The first thing that we need to do is to realize that the terminal nodes should not be labelled with 1's and 0's but instead with sets where the maximizer can win. This produces:

[Game tree figure: the same tree with leaves labelled by sets. Leaf labels on the ($S$, $T$) branches: first move ($S \cup T$, $S$); second move ($T$, $S \cup T$); third move ($S \cup T$, $S$) after playing for $S$ and ($T$, $S \cup T$) after playing for $T$; fourth move ($S \cup T$, $S \cup T$).]

To understand the labels, consider the two leftmost fringe nodes. The leftmost node gets $T$ "for free" because $T$ is eliminated by the fact that the minimizer chose $S$. Since the maximizer wins in $S$, the maximizer wins in all cases, and the label is $S \cup T$. For the second fringe node, $S$ is included by virtue of the minimizer's moving to $T$; $T$ is not included because the minimizer actually wins on this line. Hence the label of $S$ for the node in question.
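The leaf labels just described can be backed up mechanically with the perfect-information rule (union for the maximizer, intersection for the minimizer). The sketch below is illustrative only: it assumes $S$ and $T$ are disjoint and collapses each to a single atomic situation, and the helper names `leaf` and `min_node` are hypothetical.

```python
# Two atomic situations stand in for the disjoint sets S and T (an assumption).
U = frozenset({"S", "T"})


def leaf(situation, declarer_wins):
    """Terminal value: everything if declarer wins, else everything but this situation."""
    return U if declarer_wins else U - {situation}


def min_node(wins_in_S, wins_in_T):
    """A minimizing node with one branch per situation, combined by intersection."""
    return leaf("S", wins_in_S) & leaf("T", wins_in_T)


play_for_S = min_node(True, False)   # first move
play_for_T = min_node(False, True)   # second move
defer = play_for_S | play_for_T      # third move: maximizer picks again later
outright = min_node(True, True)      # fourth move
root = play_for_S | play_for_T | defer | outright
```

Under this rule `defer` and `outright` receive the same value, so the perfect-information backup cannot tell the deferred guess from the move that wins outright.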
This analysis assumes that $S$ and $T$ are disjoint; if they overlap, the labels become slightly more complex but the overall analysis is little changed.

Backing up the values one step gives us:

[Game tree figure: the same tree after backing up one step. The minimizing nodes are labelled $S$ (first move), $T$ (second move), $S \cup T$ (fourth move), and $S$ and $T$ (the two nodes below the third move).]

The minimizer, playing with perfect information, always does as best he can. The first interior node's label of $S$, for example, means that the maximizer wins only if $S$ actually is the case.

Of course, our definitions thus far imply that the maximizer is playing with perfect information as well, and we can back up the rest of the tree to get:

[Game tree figure: the fully backed-up tree. The third move and the root are both labelled $S \cup T$.]

As before, the maximizer "wins" with either of the last two options.

[Figure 6: Defense vs. declarer play for humans. P(err) plotted against trick, with one curve for declaring and one for defending.]

Before we address the fact that the players do not in fact have perfect information, let me point out that in most bridge analyses, imperfect information is assumed to be an issue for the maximizer only. The defenders are assumed to be operating with complete information for at least the following reasons:

1. In general, there is a premium for declaring as opposed to defending, so that both sides want to declare. Typically, the pair with greater assets in terms of high cards wins the "bidding battle" and succeeds in becoming the declaring side, so that the overall assets available to the defenders in terms of high cards are generally less than those available to the declarer. This means that the defenders will generally be able to predict each other's hands with more accuracy than the declarer can.
2. The defenders can signal, conveying to one another information about the cards they hold. (As an example, play of an unnecessarily high card often indicates an even number of cards in the suit being played.) They are generally assumed to signal only information that is useful to them but not to declarer, once again improving their collective ability to play as if they had perfect information.
3. After the first two or three tricks, defenders' play is typically closer to double dummy than is the declarer's. This is shown in Figure 6, which contrasts the quality of human play as defender with the quality of human play as declarer; we make more mistakes declaring than defending as of trick four. (This figure is analogous to Figures 4 and 5.)

There are some deals where it is important for declarer to exploit uncertainty on the part of the defenders, but these are definitely the exception as opposed to the rule.

This suggests that Proposition 5.0.5 is doing a reasonable job of modeling the defenders' cardplay, but the combination function for the maximizer needs to be modified to reflect the imperfect-information nature of his task.

To understand this, let us return to our putative expert, who suggested at the beginning of this section that he might be playing for West to hold the spade queen. What he might say in a bit more detail is, "I could play for each opponent to hold exactly three hearts, or I could play for West to hold the spade queen. The latter was the better chance."

This suggests that the value assigned to the position by the maximizer is not a single set of situations (those in which he can make the contract), but a set $\mathcal{S}$ of sets of situations. Each set $S \in \mathcal{S}$ corresponds to one set of situations that the maximizer could play for, given his incomplete knowledge of the positions of the opposing cards.
Extending the notation used earlier in this section, we will denote the set of sets of situations by $2^{2^S}$. The maximizer's combination function on $2^{2^S}$ is given by

$$\max(\mathcal{F}, \mathcal{G}) = \mathcal{F} \cup \mathcal{G} \tag{8}$$

where each of $\mathcal{F}$ and $\mathcal{G}$ is a set of sets of situations. This says that if the maximizer is on play in a situation $p$, and he has one move that will allow him to select from a set $\mathcal{F}$ of things to "play for" and another move that will allow him to select from a set $\mathcal{G}$, then his choice at $p$ is to select from any element of $\mathcal{F} \cup \mathcal{G}$.

The minimizer's function is a bit more subtle. Suppose that at a node $p$, the minimizer can move to a successor with value $\mathcal{F} = \{F_i\}$, or to a successor with value $\mathcal{G} = \{G_j\}$. What value should we assign to $p$?

Since the minimizer has perfect information, he will always guarantee that the maximizer achieves the minimum value for the actual situation. Whatever element $F_i \in \mathcal{F}$ or $G_j \in \mathcal{G}$ is eventually selected by the maximizer, the eventual value of $p$ will be the minimum of $F_i$ and $G_j$. In other words

$$\min(\{F_i\}, \{G_j\}) = \{\min(F_i, G_j)\} \tag{9}$$

where the individual minima are computed using the perfect information rule (7).

Definition 5.0.6 Let $G$ be the set of positions in an imperfect information game, a set of pairs $(p, Z)$ where $p$ is a position from the point of view of the maximizing player and $Z$ is the set of perfect information positions consistent with $p$. The imperfect information game for $G$ is the game $(G, V, p_I, s, \mathrm{ev}, f_+, f_-)$ where:

1. The value set $V$ is $2^{2^S}$.
2. The initial position $p_I$ is $(p_0, S)$, where $p_0$ is the initial imperfect information position and $S$ is the set of all perfect information positions consistent with it.
3. The successor function is described as follows:
   (a) If the maximizer is on play in the given position, the successors are obtained by enumerating the maximizer's legal plays and leaving the elements of the set $Z$ of situations unchanged.
   (b) If the minimizer is on play in the given position, the successors are obtained by playing any card $c$ that is legal in any element of $Z$ and then restricting $Z$ to those situations for which $c$ is in fact a legal play.
4. Terminal nodes are nodes where all cards have been played, and therefore correspond to single situations $s$. For such a terminal position, if the declarer has made his contract, the value is $(\{s\}, \{S\})$. If the declarer has failed to make his contract, the value is $(\{s\}, \{S - \{s\}\})$.
5. The maximization and minimization functions are given by (8) and (9) respectively.

Theorem 5.0.7 Suppose that the value of the imperfect information game for $G$ is $\mathcal{T}$. Then a set of positions $T$ is a subset of an element of $\mathcal{T}$ if and only if the maximizer has a strategy that wins in every element of $T$, assuming that the minimizer plays with perfect information.

Proof. Once again, the proof proceeds by induction on the depth of the game tree. And once again, the case where $p$ is a terminal position is handled easily by the definition.

For the inductive case, we consider the maximizer and minimizer separately. For the maximizer, suppose that there is some set $T$ of situations that satisfies the conditions of the theorem, so that the maximizer has a strategy that caters to all of the elements of $T$. Then the first move of that strategy will be some single move to a position $p_i$ that is a successor of $p$ and that caters to the elements of $T$. Thus if the value of the successful child is $\mathcal{F}$, $T$ is a subset of some $F \in \mathcal{F}$ by the inductive hypothesis. Thus if the value of the original game is $\mathcal{G}$, $T$ is a subset of an element of $\mathcal{G}$ by virtue of (8).

Alternatively, if $T$ is a set for which the maximizer has no such strategy, then clearly the maximizer cannot have a strategy after making any of the moves to the successor positions $p_i$.
This means that there is no superset $U \supseteq T$ in any $\mathrm{ev}_c(p_i)$, and thus no superset of $T$ in $\mathrm{ev}_c(p)$ either.

The minimizing case is not really any harder. Suppose first that the maximizer has no strategy for succeeding in every situation in $T$. Then the minimizer (playing with perfect information) must have some move to a position $p_i$ with value $\mathcal{F}_i$ such that $T$ is not a subset of any element of $\mathcal{F}_i$. Now if $\mathcal{F}_i = \{T_i\}$, recall that

$$\min(\{T_i\}, \{U_j\}) = \{T_i \cap U_j\};$$

and $T \not\subseteq T_i$ for each $i$. Thus $T \not\subseteq T_i \cap U_j$ for each $i$ and $j$, and there is no $V \supseteq T$ with

$$V \in \min(\{T_i\}, \{U_j\})$$

For the last case, suppose that the maximizer does have a strategy for succeeding in every situation in $T$. That means that after any move for the minimizer, the maximizer will still have a strategy that succeeds in $T$, so that if $p_i$ are the successors of $p$ and $\mathrm{ev}_c(p_i) = \mathcal{T}_i$, then there is a $T_i \in \mathcal{T}_i$ with $T \subseteq T_i$. Now $T \subseteq \bigcap_i T_i \in \min(\mathcal{T}_i) = \mathrm{ev}_c(p)$. Thus $\mathrm{ev}_c(p)$ contains an element that is a superset of $T$.

Using this result, we can in theory compute exactly the set of things we might play for given a single-dummy bridge problem. Before we turn to the issues involved in doing so in practice, however, let me repeat the example of this section using the imperfect information technique.
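The combination rules (8) and (9) are short enough to prototype. The sketch below applies them to the running example under the same simplifying assumption as before (disjoint $S$ and $T$, each collapsed to one atomic situation); the function names are hypothetical, and the terminal values follow the worked example (a one-element family containing the winning set).

```python
from itertools import product

# Two atomic situations stand in for the disjoint sets S and T (an assumption).
U = frozenset({"S", "T"})


def max_combine(*values):
    """Equation (8): the maximizer may play for anything offered by any move."""
    return frozenset().union(*values)


def min_combine(*values):
    """Equation (9): intersect one choice from each successor, in every possible way."""
    return frozenset(frozenset.intersection(*choice) for choice in product(*values))


def leaf(situation, declarer_wins):
    """Terminal value: a family holding the single set declarer has catered to."""
    return frozenset({U if declarer_wins else U - {situation}})


def min_node(wins_in_S, wins_in_T):
    return min_combine(leaf("S", wins_in_S), leaf("T", wins_in_T))


play_for_S = min_node(True, False)            # {{S}}
play_for_T = min_node(False, True)            # {{T}}
defer = max_combine(play_for_S, play_for_T)   # {{S}, {T}}: a guess remains
outright = min_node(True, True)               # {{S, T}}: caters to both
root = max_combine(play_for_S, play_for_T, defer, outright)
```

Unlike the perfect-information backup, `defer` and `outright` now receive different values, which is the distinction the set-of-sets representation is designed to capture.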
Here is the game tree again with values assigned to the terminal nodes:

[Game tree figure: the four-move tree with leaves labelled by sets of sets. Leaf labels on the ($S$, $T$) branches: first move ($\{S \cup T\}$, $\{S\}$); second move ($\{T\}$, $\{S \cup T\}$); third move ($\{S \cup T\}$, $\{S\}$) after playing for $S$ and ($\{T\}$, $\{S \cup T\}$) after playing for $T$; fourth move ($\{S \cup T\}$, $\{S \cup T\}$).]

Backing up past the minimizer's final move gives us:

[Game tree figure: the same tree with the minimizing nodes labelled $\{S\}$ (first move), $\{T\}$ (second move), $\{S \cup T\}$ (fourth move), and $\{S\}$ and $\{T\}$ (the two nodes below the third move).]

And we can now complete the analysis to finally get:

[Game tree figure: the fully backed-up tree. The third move is labelled $\{S, T\}$ and the root is labelled $\{S, T, S \cup T\}$.]

Note the difference in the values assigned to the maximizer's third and fourth choices at the first ply. The third choice has value $\{S, T\}$, indicating clearly that the maximizer will need to subsequently decide whether to play for $S$ or for $T$. But the fourth choice has value $\{S \cup T\}$, indicating that both possibilities are catered to.

The value assigned to the root contains some redundancy (which we will deal with in Section 7), in that one of the maximizer's choices ($S \cup T$) dominates the others. Nevertheless, this value clearly indicates that the maximizer has an option available at the root that caters to both situations.

[Figure 7: Equivalent games? Left: a minimizing node with successors $m_1$, $m_2$, $m_3$, $m_4$. Right: a minimizing node with successors $m_2$ and a second minimizing node whose successors are $m_1$, $m_3$, $m_4$.]

6. Extending alpha-beta pruning to lattices

The results of the previous section allow us to deal with imperfect information in theory. Unfortunately, computing the value in theory is hardly the same as computing it in practice.
Some ideas, such as transposition tables and partition search, can fairly obviously be applied to games with values taken from sets more general than total orders. But what about $\alpha$-$\beta$ pruning, the linchpin of high-performance adversary search algorithms? The answer here is far more subtle.

6.1 Some necessary definitions

Let us begin by considering the two small game trees in Figure 7, where the minimizer is on play at the nonfringe nodes and none of the $m_i$ is intended to be necessarily terminal. Are these two games always equivalent?

We would argue that they are. In the game on the left, the minimizer needs to select among the four options $m_1, m_2, m_3, m_4$. In the game on the right, he needs to first select whether or not to play $m_2$; if he decides not to, he must select among the remaining options. Since the minimizer has the same possibilities in both cases, we assume that the values assigned to the games are the same.

From a more formal point of view, the value of the game on the left is $f_-(m_1, m_2, m_3, m_4)$, while that of the game on the right is $f_-(m_2, f_-(m_1, m_3, m_4))$, where we have abused notation somewhat, writing $m_i$ for the value of the node $m_i$ as well.

Definition 6.1.1 A game will be called simple if for any $x \in v \subseteq V$,

$$f_+\{x\} = f_-\{x\} = x$$

and also

$$f_+(v) = f_+\{x, f_+(v - x)\} \quad \text{and} \quad f_-(v) = f_-\{x, f_-(v - x)\}$$

We have augmented the condition developed in the discussion of Figure 7 with the assumption that if a player's move in a position $p$ is forced (so that $p$ has a unique successor), then the value before and after the forced move is the same.

Proposition 6.1.2 For any simple game, there are binary functions $\wedge$ and $\vee$ from $V$ to itself that are commutative, associative and idempotent$^{14}$ and such that

$$f_+\{v_0, \ldots, v_m\} = v_0 \vee \cdots \vee v_m \quad \text{and} \quad f_-\{v_0, \ldots, v_m\} = v_0 \wedge \cdots \wedge v_m$$

Proof. Induction on $m$.
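The two conditions in Definition 6.1.1 can be tested on concrete combination functions. The sketch below is only a spot check on one list of values, not a proof; `is_simple_on` and `intersect` are hypothetical helper names, and averaging is included purely as an example of a combination function that is not simple.

```python
from statistics import mean


def is_simple_on(f, values):
    """Check the conditions of Definition 6.1.1 for f on one list of values."""
    values = list(values)
    whole = f(values)
    for i, x in enumerate(values):
        rest = values[:i] + values[i + 1:]
        if f([x]) != x:
            return False  # a forced move must not change the value
        if rest and f([x, f(rest)]) != whole:
            return False  # splitting off x must not change the value
    return True


def intersect(sets):
    """Minimizer's combination function for set-valued games."""
    return frozenset.intersection(*sets)


numeric_ok = is_simple_on(min, [4, 1, 3, 2])
set_ok = is_simple_on(intersect,
                      [frozenset("ab"), frozenset("bc"), frozenset("abc")])
average_ok = is_simple_on(mean, [1, 2, 3, 6])
```

Both `min` and set intersection pass, matching the equivalence of the two trees in Figure 7, while averaging fails because regrouping the options changes the result.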
When referring to a simple game, we will typically replace the functions $f_+$ and $f_-$ by the equivalent binary functions $\vee$ and $\wedge$. We assume throughout the rest of this section that all games are simple.$^{15}$

The binary functions $\vee$ and $\wedge$ now induce a partial order $\le$, where we will say that $x \le y$ if and only if $x \vee y = y$. It is not hard to see that this partial order is reflexive ($x \le x$), antisymmetric ($x \le y$ and $y \le x$ if and only if $x = y$) and transitive. The operators $\wedge$ and $\vee$ behave like greatest lower bound and least upper bound operators, respectively, with regard to the partial order.

We also have the following:

Proposition 6.1.3 Whenever $S \subseteq T$, $f_+(S) \le f_+(T)$ and $f_-(S) \ge f_-(T)$.

In other words, assuming that the minimizer is trying to reach a low value in the partial order and the maximizer is trying to reach a high one, having more options is always good.

6.2 Shallow pruning

We are now able to investigate $\alpha$-$\beta$ pruning in our general framework. Let us begin with shallow pruning, shown in Figure 8.

[Figure 8: $T$ can be pruned (shallowly) if $x \le y$. A minimizing root has successors $x$ and a maximizing node; the maximizing node has successors $y$ and the subtree $T$.]

The idea here is that if the minimizer prefers $x$ to $y$, he will never allow the maximizer even the possibility of selecting between $y$ and the value of the subtree rooted at $T$. After all, the value of the maximizing node in the figure is $y \vee \mathrm{ev}_c(T) \ge y \ge x$, and the minimizer will therefore always prefer $x$.

In order for the usual correctness proof for (shallow) $\alpha$-$\beta$ pruning to hold, we need the following condition to be satisfied:

Definition 6.2.1 (Shallow $\alpha$-$\beta$ pruning) A game $G$ will be said to allow shallow $\alpha$-$\beta$ pruning for the minimizer if

$$x \wedge (y \vee T) = x \tag{10}$$

for all $x, y, T \in V$ with $x \le y$. The game will be said to allow shallow $\alpha$-$\beta$ pruning for the maximizer if

$$x \vee (y \wedge T) = x \tag{11}$$

for all $x, y, T \in V$ with $x \ge y$. We will say that $G$ allows shallow pruning if it allows shallow $\alpha$-$\beta$ pruning for both players.

14. A binary function $f$ is called idempotent if $f(a, a) = a$ for all $a$.

15. We also assume that the games are sufficiently complex that we can find in the game tree a node with any desired functional value, e.g., $a \wedge (b \vee c)$ for specific $a$, $b$ and $c$. Were this not the case, none of our results would follow. As an example, a game in which the initial position is also terminal surely admits pruning of all kinds (since the game tree is empty) but need not satisfy the conclusions of the results in this section.

The definition basically says that the backed up value at the root of the game tree is unchanged by pruning the maximizing subtree in the figure.

As we will see shortly, the expressions (10) and (11) describing shallow pruning are identical to what are more typically known as absorption identities.

Definition 6.2.2 Suppose $V$ is a set and $\wedge$ and $\vee$ are two binary operators on $V$. The triple $(V, \wedge, \vee)$ is called a lattice if $\wedge$ and $\vee$ are idempotent, commutative and associative, and satisfy the absorption identities in that for any $x, y \in V$,

$$x \vee (x \wedge y) = x \tag{12}$$

$$x \wedge (x \vee y) = x \tag{13}$$

We also have the following:

Definition 6.2.3 A lattice $(V, \wedge, \vee)$ is called distributive if $\wedge$ and $\vee$ distribute with respect to one another, so that

$$x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z) \tag{14}$$

$$x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z) \tag{15}$$

Lemma 6.2.4 Each of (12) and (13) implies the other. Each of (14) and (15) implies the other.

Proof. These are well known results from lattice theory (Grätzer, 1978).

Proposition 6.2.5 (Ginsberg & Jaffray, 2001) For a game $G$, the following conditions are equivalent:

1. $G$ allows shallow $\alpha$-$\beta$ pruning for the minimizer.
2. $G$ allows shallow $\alpha$-$\beta$ pruning for the maximizer.
3. $G$ allows shallow pruning.
4. $(V, \wedge, \vee)$ is a lattice.

Proof.$^{16}$ We show that the first and fourth conditions are equivalent; everything else follows easily.

If $G$ allows shallow $\alpha$-$\beta$ pruning for the minimizer, we take $x = a$ and $y = T = a \vee b$ in (10). Clearly $x \le y$ so we get

$$a \wedge (y \vee y) = a \wedge y = a \wedge (a \vee b) = a$$

as in (13). For the converse, if $x \le y$, then $x \wedge y = x$ and

$$x \wedge (y \vee T) = (x \wedge y) \wedge (y \vee T) = x \wedge (y \wedge (y \vee T)) = x \wedge y = x.$$

16. The proofs of this and Proposition 6.3.2 are due to Alan Jaffray.

6.3 Deep pruning

Deep pruning is a bit more subtle. An example appears in Figure 9.

[Figure 9: $T$ can be pruned (deeply) if $x \le y$. The tree alternates maximizing and minimizing nodes; $x$ is a successor of the upper minimizing node, and $y$ and the subtree $T$ are successors of a maximizing node deeper in the tree.]

As before, assume $x \le y$. The argument is as described previously: Given that the minimizer has a guaranteed value of $x$ at the upper minimizing node, there is no way that a choice allowing the maximizer to reach $y$ can be on the main line; if it were, then the maximizer could get a value of at least $y$.

Definition 6.3.1 (Deep $\alpha$-$\beta$ pruning) A game $G$ will be said to allow $\alpha$-$\beta$ pruning for the minimizer if for any $x, y, T, z_1, \ldots, z_{2i} \in V$ with $x \le y$,

$$x \wedge (z_1 \vee (z_2 \wedge \cdots \vee (z_{2i} \wedge (y \vee T))) \cdots) = x \wedge (z_1 \vee (z_2 \wedge \cdots \vee z_{2i}) \cdots).$$

The game will be said to allow $\alpha$-$\beta$ pruning for the maximizer if

$$x \vee (z_1 \wedge (z_2 \vee \cdots \wedge (z_{2i} \vee (y \wedge T))) \cdots) = x \vee (z_1 \wedge (z_2 \vee \cdots \wedge z_{2i}) \cdots).$$

We will say that $G$ allows pruning if it allows $\alpha$-$\beta$ pruning for both players.

As before, the prune allows us to remove the dominated node ($y$ in Figure 9) and all of its siblings. The fact that a game allows shallow $\alpha$-$\beta$ pruning does not mean that it allows pruning in general, as is shown by the following counterexample.

The example involves a game with one card that is known to both players; only the suit of the card matters. The game tree appears in Figure 10.

[Figure 10: The deep pruning counterexample. The maximizing root chooses between the terminal $\clubsuit$ and a minimizing node; that node chooses between the terminal $\diamondsuit$ and a maximizing node; that node chooses between the terminal $\heartsuit$ and a minimizing node whose successors are the terminals $\clubsuit$ and 0.]
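As a complement to the card-game counterexample of Figure 10, the gap between shallow and deep pruning can also be checked by brute force on a small abstract lattice. The sketch below is an illustration under stated assumptions: it uses the standard five-element "diamond" lattice $M_3$ (a textbook non-distributive lattice, not the value set of the card game) alongside the powerset of a two-element set, and it tests only the $i = 1$ case of Definition 6.3.1 for the minimizer. All function names are hypothetical.

```python
from itertools import product


def allows_shallow(elems, meet, join, leq):
    """Condition (10): x ^ (y v T) == x whenever x <= y."""
    return all(meet(x, join(y, t)) == x
               for x, y, t in product(elems, repeat=3) if leq(x, y))


def allows_deep(elems, meet, join, leq):
    """The i = 1 case of Definition 6.3.1 for the minimizer."""
    return all(meet(x, join(z1, meet(z2, join(y, t)))) == meet(x, join(z1, z2))
               for x, y, t, z1, z2 in product(elems, repeat=5) if leq(x, y))


# Distributive example: the powerset of {a, b} under intersection and union.
POWERSET = [frozenset(), frozenset("a"), frozenset("b"), frozenset("ab")]
powerset_ops = (lambda x, y: x & y, lambda x, y: x | y, lambda x, y: x <= y)

# Non-distributive example: M3, with bottom "0", top "1" and atoms a, b, c.
M3 = ["0", "a", "b", "c", "1"]


def m3_leq(x, y):
    return x == y or x == "0" or y == "1"


def m3_meet(x, y):
    if m3_leq(x, y):
        return x
    return y if m3_leq(y, x) else "0"  # distinct atoms meet at the bottom


def m3_join(x, y):
    if m3_leq(x, y):
        return y
    return x if m3_leq(y, x) else "1"  # distinct atoms join at the top


powerset_shallow = allows_shallow(POWERSET, *powerset_ops)
powerset_deep = allows_deep(POWERSET, *powerset_ops)
m3_shallow = allows_shallow(M3, m3_meet, m3_join, m3_leq)
m3_deep = allows_deep(M3, m3_meet, m3_join, m3_leq)
```

Both structures pass the shallow test because both are lattices, but only the distributive one passes the deep test; in $M_3$ the choice $x = y = T = a$, $z_1 = b$, $z_2 = c$ gives 0 on the left and $a$ on the right.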
In this tree, a no de lab elled with a suit sym b ol is terminal and means that the maximizer wins if and only if the suit of the ard mat hes the giv en sym b ol. So at the ro ot of the giv en tree, the maximizer (whose turn it is to pla y) an ho ose to \turn o v er" the ard, winning if and only if it's a lub, or an defer to the minimizer. The minimizer an ho ose to turn the ard ( losing just in ase it's a diamond { the suit sym b ols refer to the maximizer's result), or hand the situation ba k to the maximizer. If the maximizer defers y et again, the minimizer an either turn o v er the ard, losing if it's a lub, or simply delare vitory (presumably his hoie). There is one other wrinkle in this game. A t an y p oin t in the game, the maximizer an hange the ard from either a diamond or a spade to a lub. No w let's onsider the game itself. A t ply 4, the minimizer will ob viously ho ose to win the game. Th us at ply 3, the maximizer will need to ho ose ~ , winning just in ase the ard is a heart. But this means that at ply 2, the minimizer will win the game, sine if the ard is not a diamond he will mo v e to the left (and win at one) while if the ard is not a heart he an win b y mo ving to the righ t. (Remem b er that the minimizer kno ws the suit 340 GIB: Imperfet inf orma tion in a omput a tionall y hallenging game of the ard.) The upshot of this is that the maximizer wins the o v erall game if and only if the ard in question is a lub. A formal analysis pro eeds similarly , lab elling the no des as follo ws: r r r r r 0 | } ~ | ~ = ~ _ 0 0 = | ^ 0 | = | _ 0 0 = } ^ ~ Note, iniden tally , that the maximizer's abilit y to hange the ard do es not help him win the game. No w supp ose that w e apply deep pruning to this game. The ply four no de is one where the minimizer an fore a v alue of at most | , suggesting that the siblings of the b ottom | no de an b e pruned. But doing so pro dues the follo wing tree: r r r r r pruned? 
[Figure: the same tree with the sibling of the bottom $\clubsuit$ node marked "pruned?". Ply 4: $\clubsuit$. Ply 3 (max): $1 = \heartsuit \vee \clubsuit$. Ply 2 (min): $\diamondsuit = \diamondsuit \wedge 1$. Root (max): $\clubsuit \vee \diamondsuit$.]

If the maximizer reaches ply 3, he can win by changing the card to a club if need be. Of course, the minimizer won't let the maximizer reach ply 3; at ply 2, he'll move left so that the maximizer wins only if the card is a diamond. That means that the maximizer wins at the root just in case the card is either a club or a diamond.

A partial graph of the values for this game is as follows:

[Figure: partial graph of the value set, with nodes including 0, 1, $\clubsuit$, $\diamondsuit$ and $\heartsuit$.]

where we have included the crucial fact that $x \wedge y = 0$ if $x \ne y$ (since the minimizer knows the card) and $\heartsuit \vee \clubsuit = 1$ because the maximizer can invoke his special rule. Other least upper bounds are not shown in the diagram. The maximizing function $\vee$ moves up the figure; the minimizing function $\wedge$ moves down.

The deep prune fails because we can't "push" the value $\clubsuit \wedge 0$ past the $\heartsuit$ to get to the $\clubsuit$ near the root. Somewhat more precisely, the problem is that

$$\heartsuit = \heartsuit \vee (\clubsuit \wedge 0) \ne (\heartsuit \wedge \clubsuit) \vee (\heartsuit \wedge 0) = 0.$$

This suggests the following:

Proposition 6.3.2 (Ginsberg & Jaffray, 2001) For a game $G$, the following conditions are equivalent:
1. $G$ allows $\alpha$-$\beta$ pruning for the minimizer.
2. $G$ allows $\alpha$-$\beta$ pruning for the maximizer.
3. $G$ allows pruning.
4. $(V, \wedge, \vee)$ is a distributive lattice.

Proof. As before, we show only that the first and fourth conditions are equivalent. Since pruning implies shallow pruning (take $i = 0$ in the definition), it follows that the first condition implies that $(V, \wedge, \vee)$ is a lattice.

From deep pruning for the minimizer with $i = 1$, we have that if $x \le y$, then for any $z_1, z_2, T$,

$$x \wedge (z_1 \vee (z_2 \wedge (y \vee T))) = x \wedge (z_1 \vee z_2).$$

Now take $y = T = x$ to get

$$x \wedge (z_1 \vee (z_2 \wedge x)) = x \wedge (z_1 \vee z_2). \tag{16}$$

It follows that each top level term in the left hand side of (16) is greater than or equal to the right hand side; specifically

$$z_1 \vee (z_2 \wedge x) \ge x \wedge (z_1 \vee z_2). \tag{17}$$

We claim that this implies that the lattice in question is distributive.
To see this, let $u, v, w \in V$. Now take $z_1 = u \wedge w$, $z_2 = v$ and $x = w$ in (17) to get

$$(u \wedge w) \vee (v \wedge w) \ge w \wedge ((u \wedge w) \vee v). \tag{18}$$

But $v \vee (u \wedge w) \ge w \wedge (v \vee u)$ is an instance of (17), and combining this with (18) gives us

$$(u \wedge w) \vee (v \wedge w) \ge w \wedge ((u \wedge w) \vee v) \ge w \wedge w \wedge (v \vee u) = w \wedge (v \vee u).$$

This is the hard direction; $w \wedge (v \vee u) \ge (u \wedge w) \vee (v \wedge w)$ for any lattice because $w \wedge (v \vee u) \ge u \wedge w$ and $w \wedge (v \vee u) \ge v \wedge w$ individually. Thus $w \wedge (v \vee u) = (u \wedge w) \vee (v \wedge w)$, and deep pruning implies that the lattice is distributive.

For the converse, if the lattice is distributive and $x \le y$, then

$$x \wedge (z_1 \vee (z_2 \wedge (y \vee T))) = (x \wedge z_1) \vee (x \wedge z_2 \wedge (y \vee T)) = (x \wedge z_1) \vee (x \wedge z_2) = x \wedge (z_1 \vee z_2),$$

where the second equality is a consequence of the fact that $x \le (y \vee T)$, so that $x = x \wedge (y \vee T)$. This validates pruning for $i = 1$; deeper cases are similar.

Finally, note that in games where this result applies, we can continue to use Algorithms 2.2.5 or 2.3.3 without modification, since the prunes that they endorse continue to be sound as the game tree is expanded.

6.4 Application to imperfect information

In order to apply these ideas to games of imperfect information treated as in Section 5, we need to show that the value set introduced there is a (hopefully distributive) lattice.

To do this, recall that there is redundant information in an arbitrary element $F$ of $2^{2^S}$, since if $F$ contains both $T$ and $U$ with $T \subset U$ (in other words, the maximizer can play for either $T$ or for $U$ but $U$ is properly better), the set $T$ can be removed from $F$ without affecting the maximizer's options in any interesting way. This suggests the following:

Definition 6.4.1 Let $F \in 2^{2^S}$ for an arbitrary set $S$. We will say that $F$ is reduced if there are no $T, U \in F$ with $T \subset U$. We will say that $F_1$ is a reduction of $F_2$ if $F_1$ is reduced and $F_1 \subseteq F_2$.
Lemma 6.4.2 Every $F \in 2^{2^S}$ has a unique reduction.

Proof. This is immediate; just remove the subsumed elements from $F$.

We will denote the reduction of $F$ by $r(F)$.

Armed with this definition, we can now modify Definition 5.0.6 in the obvious way, replacing the value set $V$ with the set of reduced elements of $V$ and the maximizing and minimizing functions (8) and (9) with the reduced versions thereof, so that

$$\max(F, G) = r(F \cup G) \tag{19}$$

and

$$\min(\{F_i\}, \{G_j\}) = r(\{F_i \cap G_j\}). \tag{20}$$

Remember that we typically write $\vee$ for max and $\wedge$ for min.

Proposition 6.4.3 Given the above definitions, $(V, \vee, \wedge)$ is a distributive lattice.

Proof. We need to show that max and min as defined above are commutative, associative, and idempotent, that they distribute with respect to one another, and that the absorption identity (12) is satisfied.

Since the reduction operator clearly commutes with the initial definitions of max and min, commutativity, associativity and distributivity are obvious, as is the fact that $\vee$ is idempotent. To see that $\wedge$ is idempotent, we have

$$F \wedge F = r(\{\min(F_i, F_j)\}) = r(\{F_i \cap F_j\}),$$

but each element of the set on the right-hand side is a subset of $F_i \cap F_i$, so

$$F \wedge F = r(\{F_i\}) = r(F) = F.$$

For the absorption identity, we need to show that $F \vee (F \wedge G) = F$. But $F \wedge G = r(\{F_i \cap G_j\})$, so

$$F \vee (F \wedge G) = r(F \vee r(\{F_i \cap G_j\})) = r(\{F_i\} \cup \{F_i \cap G_j\}) = r(\{F_i\}) = r(F) = F$$

since, once again, each element of $F \wedge G$ is subsumed by the corresponding $F_i$.

It follows that an implementation designed to compute the value of an imperfect information game as described by Theorem 5.0.7 can indeed use $\alpha$-$\beta$ pruning to speed the computation.

6.5 Bridge implementation

Given this body of theory, we implemented a single-dummy version of GIB's double-dummy search engine.
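As a point of reference for the reduced operations (19) and (20), here is a minimal sketch that stores an element of $2^{2^S}$ as a Python frozenset of frozensets. It is an illustration under stated assumptions, not GIB's implementation: the function names and the toy situations 1, 2, 3 are hypothetical, and this naive representation ignores the efficiency concerns that drive the rest of this section.

```python
def reduce_family(family):
    """Return r(F): drop every member that is a proper subset of another member."""
    members = set(family)
    return frozenset(t for t in members if not any(t < u for u in members))


def lattice_max(f, g):
    """Equation (19): max(F, G) = r(F union G)."""
    return reduce_family(set(f) | set(g))


def lattice_min(f, g):
    """Equation (20): min({F_i}, {G_j}) = r({F_i intersect G_j})."""
    return reduce_family({fi & gj for fi in f for gj in g})


# Hypothetical toy situations 1, 2, 3; each member is a set the maximizer can play for.
F = frozenset({frozenset({1, 2}), frozenset({3})})
G = frozenset({frozenset({2, 3})})

meet = lattice_min(F, G)               # reduction of the pairwise intersections
absorbed = lattice_max(F, meet) == F   # absorption identity: F v (F ^ G) = F
idempotent = lattice_min(F, F) == F    # idempotence: F ^ F = F
```

Both checks mirror steps in the proof of Proposition 6.4.3: every member of `meet` is a subset of some member of `F`, so the reduction discards it again when the two are joined.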
Not surprisingly, the most difficult element of the implementation was building efficient data structures for the manipulation of elements of $2^{2^S}$. To handle this, we represented each element of $S$ as a conjunction. We first identified one of the two hidden hands $H$, and then for each card $c$, would write $c$ if $c$ were held by $H$ and $\neg c$ if $c$ were not held by $H$. An element of $2^S$ was then taken to be a disjunctive combination of these conjunctions, and an element of $2^{2^S}$ was taken to be a list of such disjunctions. The advantage of this representation was that logical inference could be used to construct the reduction of any such list.

In order to make this inference as efficient as possible, the disjunctions themselves were represented as binary decision diagrams, or BDDs (Lind-Nielsen, 2000). There are a variety of public domain implementations of BDDs available, and we used one provided by Lind-Nielsen (Lind-Nielsen, 2000).[17]

The resulting implementation solves small endings (perhaps 16 cards left in total) quickly, but for larger endings the running times come to be dominated by the BDD computations; this is hardly surprising, since the size of individual BDDs can be exponential in the size of $S$ (the number of possible distributions of the unseen cards). We found that we were generally able to solve 32-card endings in about a minute, but that the running times were increasing by two orders of magnitude as each additional card was added.

This is both good news and bad news. Viewed positively, the performance of the system as constructed is far superior to the performance of preceding attempts to deal with the imperfect information arising in bridge. Frank et al., for example, are only capable of solving single suit combinations (13 cards left, give or take) using an algorithm that appears to take several minutes to run (Frank, Basin, & Matsubara, 1998).
They subsequently improve the performance to an average time of 0.6 seconds (Frank et al., 2000), but are still restricted to problems that are too small to be of much use to a program intended to play the complete game.

[17] We tried a variety of non-BDD based implementations as well. The BDD-based implementation was far faster than any of the others.

That's the good news. The bad news is that a program capable only of solving an 8-card ending (eight cards in each hand, 32 cards in total) in a minute is inappropriate for production use. GIB is a production program, expected to play bridge at human speeds. Another approach was therefore needed.

7. Solving single-dummy problems in practice

7.1 Achievable sets

The key to practical application of the ideas in the previous section is the realization that when it comes time to make a play, a single element of $\mathcal{F}$ must be selected: if you can play for West to have the ♠Q or for each player to have three hearts but cannot cater to both possibilities simultaneously, you eventually have to actually make the choice.

Definition 7.1.1 Suppose that the value of the imperfect information game for $G$ is $\mathcal{F}$. Given a specific $A \subseteq S$, we will say that $A$ is achievable if there is some $F \in \mathcal{F}$ for which $A \subseteq F$.

In other words, the set $A$ of situations is achievable if the maximizer has a plan that wins for all elements of $A$.

Definition 7.1.2 Given a set $S$ of situations, a payoff function for $S$ is any function $f : 2^S \to \mathbb{R}$ such that $f(U) \le f(T)$ whenever $U \subseteq T$.

The payoff function evaluates potential achievable sets.

Definition 7.1.3 Let $G$ be a game and $S$ the associated set of situations. If $f$ is a payoff function for $S$, a solution to $G$ under $f$ is any achievable set $A$ for which $f(A)$ is maximal.

In practice, we need not find the actual value of the game; finding a solution to $G$ under an appropriate payoff function suffices.
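Definitions 7.1.1 to 7.1.3 can be made concrete with a small sketch. It assumes, purely for illustration, that the value of the game is already available as an explicit family of frozensets; computing that family is exactly the expensive step this section sets out to avoid. The situation names and probabilities are hypothetical.

```python
def is_achievable(a, value):
    """Definition 7.1.1: A is achievable if some member F of the value contains it."""
    return any(a <= f for f in value)


def solve(value, payoff):
    """Definition 7.1.3: return an achievable set of maximal payoff.

    Every achievable A lies inside some member F of the value, and the
    payoff is monotone (Definition 7.1.2), so comparing the members suffices.
    """
    return max(value, key=payoff, default=None)


# Hypothetical probabilities for three situations s, t and u.
PROB = {"s": 0.2, "t": 0.45, "u": 0.35}


def payoff(a):
    """A payoff function in the sense of Definition 7.1.2: total probability of A."""
    return sum(PROB[x] for x in a)


# The maximizer can cater to {s, u} or to {t}, but not to all three at once.
VALUE = {frozenset({"s", "u"}), frozenset({"t"})}
best = solve(VALUE, payoff)
```

Here `best` is the set {s, u}, whose total probability exceeds that of {t}; the set {s, t} is not achievable because no single member of `VALUE` contains both situations.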
In bridge, the payoff function is presumably the probability that the cards are dealt as in the set $A$; this function clearly increases with increasing set size as required by Definition 7.1.2 and can be evaluated in practice using the Monte Carlo sample of Section 3.

Rather than finding the solution to an imperfect information game, suppose that we have a Monte Carlo sample for the game consisting of a set of situations $S = \{s_i\}$ that is ordered as $i = 0, \ldots, n$. We can now produce an achievable set $A$ as follows:

Algorithm 7.1.4 To construct a maximal achievable set $A$ from a sequence $\langle s_0, \ldots, s_n \rangle$ of situations:
1. Set $A = \emptyset$.
2. For $i = 0, \ldots, n$, if $A \cup \{s_i\}$ is achievable, set $A = A \cup \{s_i\}$.

The algorithm constructs the achievable set in a greedy fashion, gradually adding elements of $S$ to $A$ until no more can be added.

Definition 7.1.5 Given a game $G$ and a sequence $S$ of situations, the achievable set induced by $S$ for $G$ is the set constructed by Algorithm 7.1.4.

From a computational point of view, the expensive step in the algorithm is determining whether or not the set $A \cup \{s_i\}$ is achievable. This is relatively straightforward, however, since the focus on a specific set effectively replaces the game $G$ with a new game with values in $\{0, 1\}$. At any particular node $n$, if expanding $n$ demonstrates that $A \cup \{s_i\}$ is not achievable, the value of the game is zero. If expanding $n$ indicates that $A \cup \{s_i\}$ is achievable once $n$ is reached, then the value of the node $n$ is one. Although the search space is unchanged from that of the original imperfect information game as in Definition 5.0.6, there is no longer any need to manipulate complex values, and the check for achievability is therefore tractable in practice.

Let me illustrate this by returning to our usual example of Section 5.
Here is the fully evaluated tree once again:

[Figure: the fully evaluated game tree from Section 5, with node values drawn from $\{S\}$, $\{T\}$, $\{S, T\}$ and $\{S \cup T\}$.]

Note that we have replaced the value at the root with its reduction.

Now suppose that we view the set of positions as containing only two elements, $s \in S$ and $t \in T$. Presumably West holds the ♠Q in $s$, and East holds three hearts in $t$.

If the ordering chosen is $\langle s, t \rangle$, then we first try to achieve $\{s\}$. In this context, a node $n$ is a win for the maximizer if either the maximizer can indeed win at $n$ or $s$ is no longer possible (in which case the maximizer's ability to achieve $\{s\}$ is undiminished). The game tree becomes:

[Figure: the same tree with max and min levels marked, branches labelled $S$ or $T$, and terminal values of 0 or 1.]

All of the $T$ branches are wins for the maximizer (who is concerned with $s$ only), and the $S$ branches are wins just in case the maximizer does indeed win (as he does if he guesses right at either of the first two plies). Backing up the values gives us:

[Figure: the same tree with the backed-up values of 0 or 1 shown at every node.]

This indicates (correctly) that the maximizer can achieve $s$ provided that he doesn't decide to play for $T$ at the root of the tree. Note that this analysis is a straight minimax, allowing fast algorithms to be applied while avoiding the manipulation of elements of $2^{2^S}$ described in the previous section.

Now we add $t$ to our achievable set, which thus becomes $\{s, t\}$.
The maximizer wins only if he really does win (and not just because he isn't interested in $T$ any more), and the basic tree becomes:

[Figure: the tree with max and min levels marked, branches labelled $S$ or $T$, and terminal values of 0 or 1 for the target set $\{s, t\}$.]

Backing up the values gives:

[Figure: the same tree with the backed-up values of 0 or 1 shown at every node.]

The maximizer can achieve the extended result only by making the rightmost move, as desired.

What if the rightmost branch did not exist, so that the maximizer were unable to combine his chances? Now the value of the root node in the above tree is 0, so that $\{s, t\}$ is not achievable. The maximal achievable set returned by the algorithm would be $S$; had the ordering been $\langle t, s \rangle$, an alternative maximal achievable set of $T$ would have been returned instead. In any event, we have:

Proposition 7.1.6 Given a game $G$ and a sequence $S$ of situations, let $A$ be the achievable set induced by $S$ for $G$. Then no proper superset of $A$ in $S$ is achievable.

Proof. This is straightforward. For any element $s \in S - A$, we know that $U \cup \{s\}$ is not achievable for some $U \subseteq A$. Thus $A \cup \{s\}$ is not achievable as well.

Algorithm 7.1.4 allows us to construct maximal achievable sets relative to our Monte Carlo sample; recall that we are taking our sequence $S$ of situations to be any ordering of the sample itself. In practice, however, it is important not to focus too sharply on the sample itself, lest the eventual achievable set constructed overfit irrelevant probabilistic characteristics of that sample.

This can be accomplished by replacing the simple union in step 2 of the algorithm with some more complicated operation that captures the idea of "situations that are either like $s_i$ or like those already in $A$."
In bridge, for example, $A$ might be all situations where West has two or three hearts, and $s_i$ might be some new situation where West has four hearts. The generalized union would be situations where West has two, three or four hearts. If this more general set is not achievable, another attempt could be made with the simple union. If we denote the "general union" by $\oplus$, Algorithm 7.1.4 becomes:

Algorithm 7.1.7 To construct an achievable set $A$ from a sequence $\langle s_0, \ldots, s_n \rangle$ of situations:
1. Set $A = \emptyset$.
2. For $i = 0, \ldots, n$:
   (a) If $A \oplus \{s_i\}$ is achievable, set $A = A \oplus \{s_i\}$.
   (b) Otherwise, if $A \cup \{s_i\}$ is achievable, set $A = A \cup \{s_i\}$.

This algorithm can be used in practice to find achievable sets that are either maximal or effectively so over the set of all possible instances, not just those appearing in the Monte Carlo sample.

7.2 Maximizing the payoff

It remains to find not just maximal achievable sets, but ones that approximate the solution to the game in question given a particular payoff function. To understand how we do this, let me draw an analogy between the problem we are trying to solve and resource-constrained project scheduling (RCPS). In RCPS, one has a list of tasks to be performed, together with ordering constraints saying that certain tasks need to be performed before others. In addition, each task uses a certain quantity of various resources; there are limitations on the availability of any particular resource at any particular time.

As an example, building an aircraft wing may involve fabricating the top and bottom flight surfaces, building the aileron, and attaching the two. It should be clear that the aileron cannot be attached until both it and the wing have been constructed. Building each section may involve the use of three sheetmetal workers, but only five may be available in general.
The goal in an RCPS problem is typically to minimize the length of the schedule (often called the makespan) without exceeding the resource limits. In building a wing, it is more efficient (and more cost effective) to build it quickly than slowly.

Many production scheduling systems try to minimize makespan by building the schedule from the initial time forward. At each point, they select a task all of whose predecessors have been scheduled, and then schedule that task as early as possible given the previously scheduled tasks and the resource constraints. Scheduling the tasks in this way produces a locally optimal schedule that may be improved by modifying the order in which the tasks are selected for scheduling.

One method for finding an appropriate modification to the selection order is known as squeaky wheel optimization, or SWO (Joslin & Clements, 1999). In SWO, a locally optimal schedule is examined to determine which tasks are scheduled most suboptimally relative to some overall metric; those tasks are deemed to "squeak" and are then advanced in the task list so that they are scheduled earlier when the schedule is reconstructed. This process is repeated, producing a variety of candidate solutions to the scheduling problem at hand; one of these schedules is typically optimal or nearly so.

Applying SWO to our game-playing problem is relatively straightforward.[18] When we use Algorithm 7.1.7 to construct an achievable set, we also construct as a byproduct a list of sample elements to which that achievable set cannot be extended; moving elements of this list forward in the sequence $\langle s_0, \ldots, s_n \rangle$ will cause them to be more likely to be included in the achievable set $A$ if the algorithm is reinvoked. The weights assigned to the failing sequence elements can be constructed by determining how representative each particular element is of the remainder of the sample.
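The loop just described can be sketched as follows. This is a simplified illustration, not GIB's code: it uses the simple union of Algorithm 7.1.4 rather than the generalized union, it moves every rejected element to the front of the ordering rather than only part of the way forward, and the "representativeness" weight (how many other sample elements a rejected element is jointly achievable with) is an assumed stand-in for whatever weighting GIB actually uses. The achievability test and payoff are passed in as functions, and the toy sample at the end is hypothetical.

```python
def greedy_achievable(sequence, is_achievable):
    """Algorithm 7.1.4 (simple union); also returns the elements that could not be added."""
    achieved, rejected = [], []
    for s in sequence:
        if is_achievable(achieved + [s]):
            achieved.append(s)
        else:
            rejected.append(s)
    return achieved, rejected


def squeaky_wheel(sample, is_achievable, payoff, rounds=5):
    """Rebuild the greedy set repeatedly, promoting the elements that 'squeak'."""

    def weight(s):
        # Assumed weighting: number of other sample elements achievable together with s.
        return sum(1 for t in sample if t != s and is_achievable([s, t]))

    order = list(sample)
    best, best_payoff = [], payoff([])
    for _ in range(rounds):
        achieved, rejected = greedy_achievable(order, is_achievable)
        if payoff(achieved) > best_payoff:
            best, best_payoff = achieved, payoff(achieved)
        if not rejected:
            break
        # Squeaking elements go first, most representative first (stable sort).
        order = sorted(rejected, key=weight, reverse=True) + achieved
    return best, best_payoff


# Hypothetical toy sample: situations sharing a first letter can be catered to together.
SAMPLE = ["a1", "b1", "b2", "b3"]


def same_plan(situations):
    return len({s[0] for s in situations}) <= 1


def fraction(situations):
    """Unweighted payoff: the share of the sample that is covered."""
    return len(situations) / len(SAMPLE)


best, best_payoff = squeaky_wheel(SAMPLE, same_plan, fraction)
```

With the initial ordering the greedy pass settles on the single "a" situation; after one reordering it finds the three "b" situations, which cover three quarters of the sample and are kept as the best candidate.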
Returning to our example, suppose that the set $S$ (where West has the ♠Q) has a single representative $s_1$ in the Monte Carlo sample (presumably this means it is unlikely for West to hold the card in question), while $T$ has five such representatives $t_1$, $t_2$, $t_3$, $t_4$ and $t_5$. Suppose also that the initial ordering of the six elements is $\langle s_1, t_4, t_2, t_1, t_5, t_3 \rangle$. Assuming that the maximizer loses his rightmost option (so that he cannot cater to $S$ and $T$ simultaneously), the maximal achievable set corresponding to this ordering is $S$.

An examination now reveals that all of the $t_i$'s could have been achieved but weren't; in SWO terms, these elements of the sample "squeak." At the next iteration, the priorities of the $t_i$'s are increased by moving them forward in the sequence, while the priority of $s_1$ falls. Perhaps the new ordering is $\langle t_4, t_2, s_1, t_1, t_5, t_3 \rangle$.

This ordering can be easily seen to lead to the maximal achievable set $T$; $S \cup T$ is still unachievable. But the payoff assigned to $T$ is likely to be much better than that assigned to $S$ (a probability of 0.8 instead of 0.2, if the Monte Carlo sample itself is unweighted). It is in this way that SWO allows us to find a globally optimal (or nearly so) achievable set.

[18] Squeaky wheel optimization was developed at the University of Oregon; the patent application for the technique has been allowed by the U.S. Patent and Trademark Office. The University's interests in SWO are licensed exclusively to On Time Systems, Inc. for use in scheduling and related applications, and to Just Write, Inc. for use in bridge-playing systems.

7.3 Results

Our implementation of GIB's cardplay when declarer is based on the ideas described above. (As a defender, a direct Monte Carlo approach appears preferable because enough information is typically available about declarer's hand to make the double-dummy assumption reasonably valid.)
The implementation is fast enough to conform to the time requirements placed on a production program (roughly one CPU minute to play each deal).

Evaluating the impact of these ideas on GIB's cardplay is difficult, since declarer play is already the strongest aspect of its game. In extended matches between the two versions of GIB, the approach based on the ideas described here beats the Monte Carlo based version by approximately 0.1 IMPs/deal, but there is a great deal of noise in the data because most of the swings correspond to differences in bidding or defensive play. It is possible to remove some of these differences artificially (requiring the bidding to be identical both times the deal is played, for example), but defensive differences remain. Nevertheless, GIB is currently a strong enough player that the 0.1 IMPs/deal difference is significant.

The situation on problem deals, such as those from the par contests or from the Gitelman sets, is much clearer. In addition, many of the deals that GIB gets "wrong" are in fact deals that GIB plays correctly but that the problem composers play incorrectly (Gitelman or, in the case of the par contests, Swiss bridge expert Pietro Bernasconi). In the following table, we have been generous with all parties, deeming a line to be correct if it is not clearly inferior to another. Let me point out that the designers of the problems are attempting to construct deals where there is a unique solution (the "answer" to the test they are posing the solver), so that a deal with multiple solutions is in fact one that the composer has already misanalyzed.
Source              size   BB   GIB MC   GIB SWO   composer   ambiguous
BM level 1            36   16       31        36         35           0
BM level 2            36    8       23        34         34           1
BM level 3            36    2       12        34         34           2
BM level 4            36    1       21        31         34           4
BM level 5            36    4       13        28         34           5
1998 par contest      12    0        5        11         12           2
1990 par contest      18    0        8        14         17           3

The rows are in order of increasing difficulty; it was universally felt among the human competitors that the deals in the 1990 par contest were far more difficult than those in 1998. The columns are as follows:

Source is the source from which the problems were obtained.
Size is the number of problems available from this particular source.
BB gives the number of problems solved correctly by Bridge Baron 6.
GIB MC gives the number solved correctly by GIB using a Monte Carlo approach.
GIB SWO gives the number solved correctly by GIB using SWO and achievable sets.
composer gives the number solved correctly by the composer (in that the intended solution was the best one available).
ambiguous gives the number misanalyzed by the composer (in that multiple solutions exist).

Note, incidentally, that GIB's performance is still less than perfect on these problems. The reason is that GIB's sample may be skewed in some way, or that SWO may fail to find a global optimum among the set of possible achievable sets.

8. Conclusion

8.1 GIB compared

Other programs

GIB participated in both the 1998 and the 2000 World Computer Bridge Championships. (There was no 1999 event.) Play was organized with each machine playing two hands and the competitors being trusted not to cheat by "peeking" at partner's cards or those of the opponents.[19] Each tournament began with a complete round robin among the programs, with the top four programs continuing to a knockout phase. The matches in the round robin were quite short, and it was expected that bridge's stochastic element would keep any program from being completely dominant.
While this may have been true in theory, in practice GIB dominated both round robins, winning all of its matches in 1998 and all but one in 2000. The round robin results from the 2000 event were as follows:[20]

              GIB   WB   Micro   Buff   Q-Plus   Chip   Baron   M'lark   Total
GIB             -   14      11     16        7     19      16       17     100
WBridge         6    -      19     13       16      7      18       20      99
Micro           9    1       -     18       15     15      13       20      91
Buff            4    7       2      -       12     20       5       20      70
Q-Plus         13    4       5      8        -     11      14       11      66
Blue Chip       1   13       5      0        9      -      11       20      59
Baron           4    2       7     15        6      9       -       14      57
Meadowlark      3    0       0      0        9      0       6        -      18

Each match was converted first to IMPs and then to victory points, or VPs, with the two competing programs sharing the 20 VPs available in each match. The first entry in the above table indicates that GIB beat WBridge by 14 VPs to 6; the fourth that GIB lost to Q-Plus Bridge by 7 VPs to 13. (This is GIB's only loss ever to another program in tournament play.)

[19] Starting with the 2001 event, each computer will handle only one of the four players, although there is still no attempt to prevent the (networked) computers from transmitting illegal information between partners.

[20] There were eight competitors in the event: GIB (www.gibware.com), Hans Leber's Q-Plus (www.q-plus.com), Tomio and Yumiko Uchida's Micro Bridge (www.threeweb.ad.jp/~mbridge), Mike Whittaker and Ian Trackman's Blue Chip Bridge (www.bluechipbridge.co.uk), Rod Ludwig's Meadowlark Bridge (rrnet.com/meadowlark), Bridge Baron (www.bridgebaron.com), and two newcomers: Doug Bannion's Bridge Buff (www.bridgebuff.com) and Yves Costel's WBridge (ourworld.compuserve.com/homepages/yvescostel).

In the 1998 knockout phase, GIB beat Bridge Baron in the semifinals by 84 IMPs over 48 deals. Had the programs been evenly matched, the IMP difference could be expected to be normally distributed, and the observed 84 IMP difference would be a 2.2 standard deviation event.
GIB then beat Q-Plus Bridge in the finals by 63 IMPs over 64 deals (a 1.4 standard deviation event). In 2000, it beat Bridge Buff by 39 IMPs over 48 deals in the semifinals (a 1.0 standard deviation event) and then beat WBridge by 101 IMPs over 58 deals (a 2.6 standard deviation event). The finals had been scheduled to run 64 deals, but WBridge conceded after 58 had been played.

The most publicized deal from the final was this one, an extremely difficult deal that both programs played moderately well. GIB reached a better contract and was aided somewhat by WBridge's misdefense in a moderately complex situation.

                    North
                    ♠ K Q 9
                    ♥ A Q J
                    ♦ 9 6 4 3 2
                    ♣ 8 6
West                                    East
♠ 10 6                                  ♠ 8 7 3 2
♥ 10 9 2                                ♥ 7 5 3
♦ 10                                    ♦ A K Q J 8 5
♣ A J 10 9 5 3 2                        ♣ -
                    South
                    ♠ A J 5 4
                    ♥ K 8 6 4
                    ♦ 7
                    ♣ K Q 7 4

When WBridge played the North-South cards and GIB was East-West, North opened 1♦ and eventually played in three notrump, committing to taking nine tricks. The GIB East started with four rounds of diamonds as South discarded two clubs and ...?

Looking at all four hands, the contract is cold; South can discard another club and East has none to play. There are thus nine tricks: four in each of hearts and spades, and the diamond nine. Give East a club, however, and the contract rates to be down no less than four since the defense will be able to take at least four club tricks. WBridge decided to play safe, keeping the ♣KQ and discarding a heart. There are now only eight tricks and the contract was down one.

The bidding and play were more interesting when GIB was N-S. North opened 1NT, showing 11-14 HCP without four hearts or spades unless exactly three cards were held in every other suit. East overcalled a natural 2♦ and South cue bid 3♦, showing weakness in diamonds and asking North to bid a 4-card heart or spade suit if he had one.

North has no good bid at this point. Bidding 3NT with five small diamonds rates to be wrong and 4♣ is clearly out of the question.
GIB's simulation suggested that 3♠ (ostensibly showing four of them) was the least of evils. South raised to 4♠, and East doubled, ending the auction.

East led a top diamond, and shifted to the ♥3, won by North's ♥Q. GIB now cashed the ♥J and led the ♣6, which East chose (wrongly) to ruff. WBridge now led the ♦K as East, which was ruffed with the ♠J. GIB was now able to cash the ♠AK to produce:

                    North
                    ♠ Q
                    ♥ A
                    ♦ 9 6 2
                    ♣ 8
West                                    East
♠ -                                     ♠ 8
♥ -                                     ♥ 7
♦ -                                     ♦ Q J 8 5
♣ A J 10 9 5 3                          ♣ -
                    South
                    ♠ 5
                    ♥ K 8
                    ♦ -
                    ♣ K Q 4

Knowing the position exactly, GIB needed five more tricks with North to lead. It ruffed a diamond, returned to the ♥A and drew East's trump with the ♠Q. Now a club forced an entry to the South hand, where the ♥K provided the tenth trick.

Humans

GIB played a 14-deal demonstration match against human world champions Zia Mahmood and Michael Rosenberg[21] in the AAAI Hall of Champions in 1998, losing by a total of 6.4 IMPs (a 0.3 standard deviation event). Early versions of GIB also played on OKBridge, an internet bridge club with some 15,000 members.[22] After playing thousands of deals against human opponents of various levels, GIB's ranking was comparable to the OKBridge average.

It is probable that neither of these results is an accurate reflection of GIB's current strength. The Mahmood-Rosenberg match was extremely short and GIB appeared to have the best of the luck. The OKBridge interface has changed and the GIB "OKbots" no longer function. The performance figures there are thus somewhat outdated, predating various recent improvements including all of the ideas in Sections 5 to 7. More interesting information will become available starting in late July of 2001, when GIB, paired with Gitelman and his regular partner Brad Moss, will begin a series of 64-deal matches against human opponents of varying skill levels.
[21] Mahmood and Rosenberg have won, among other titles, the 1995 Cap Volmac World Top Invitational Tournament. As remarked earlier, Rosenberg would also go on after the GIB match to win the Par Competition in which GIB finished 12th.

[22] http://www.okbridge.com

[23] GIB's version of Moscito is called Moscito Byte.

8.2 Current and future work

Recent work on GIB has focused on its weakest areas: defensive cardplay and bidding. The bidding work has been and continues to be primarily a matter of extending the existing bidding database, although GIB's bidding language is also being changed from Standard American (a fairly natural system) to a variant of an artificial system called Moscito developed in Australia.[23] Moscito has very sharply defined meanings, making it ideal for use by a computer program, and is an "action" system, working hard to make the opponents' bidding as difficult as possible.

With regard to defensive cardplay, the key elements of high level defense are to make it hard for partner to make a mistake while making it easy for declarer to do so. Providing GIB with these abilities will involve an extra level of recursion in the cardplay, as each element of the Monte Carlo sample must now be considered from other players' points of view, as they generate and then analyze their own samples. These ideas have been implemented but currently lead to small performance degradations (approximately 0.05 IMPs/deal) because the computational cost of the recursive analyses requires reducing the size of the Monte Carlo sample substantially. As processor speeds increase, it is reasonable to expect these ideas to bear significant fruit.

In 1997, Martel, a computer scientist himself, suggested that he expected GIB to be the best bridge player in the world in approximately 2003. The work appears to be roughly on schedule.
8.3 Other games

I have left essentially untouched the question of to what extent the basic techniques we have discussed could be applied to games of imperfect information other than bridge. The ideas that we have presented are likely to be the most applicable in games where the perfect information variant is tractable but computationally challenging, and the assumption that one's opponents are playing with perfect information is a reasonable one. This suggests that games like hearts and other trick-taking games will be amenable to our techniques, while games like poker (where it is essential to realize and exploit the fact that the opponents also have imperfect information) are likely to need other approaches.

Acknowledgments

A great many people have contributed to the GIB project over the years. In the technical community, I would like to thank Jonathan Schaeffer, Rich Korf, David Etherington, Bart Massey and the other members of CIRL. In the bridge community, I have received invaluable assistance from Chip Martel, Rod Ludwig, Zia Mahmood, Andrew Robson, Alan Jaffray, Hans Kuijf, Fred Gitelman, Bob Hamman, Eric Rodwell, Jeff Goldsmith, Thomas Andrews and the members of the rec.games.bridge community. The work itself has been supported by Just Write, Inc., by DARPA/Rome Labs under contracts F30602-95-1-0023 and F30602-97-1-0294, and by the Boeing Company under contract AHQ569. To everyone who has contributed, whether named above or not, I owe my deepest appreciation.

Appendix A. A summary of the rules of bridge

We give here a very brief summary of the rules of bridge. Readers wanting a more complete description are referred to any of the many excellent texts available (Sheinwold, 1996).

Bridge is a card game for four players, who are split into two pairs. Members of a single pair sit opposite one another, so that North-South form one pair and East-West the other.
The deck is distributed evenly among the players, so that each deal involves giving each player a hand of 13 cards. The game then proceeds through a bidding and a playing phase.

The playing phase consists of 13 tricks, with each player contributing one card to each trick in a clockwise fashion. The player who plays first to any trick is said to lead to that trick. The highest card of the suit led wins the trick (Ace is high and deuce low), unless a trump is played, in which case the highest trump wins the trick. The person who leads to a trick is free to lead any card he wishes; subsequent players must play a card of the suit led if they have one, and can play any card they choose if they don't. The winner of one trick leads to the next; the person who leads to the first trick (the opening leader) is determined during the bidding phase of the game.

The object of the card play phase is always for your partnership to take as many tricks as possible; there is no advantage to one partner's taking a trick over another, and the order in which the tricks are taken is irrelevant.

After the opening leader plays the first card to the first trick, the player to his left places his cards face up on the table so that all of the other players can see them. This player is called the dummy, and when it is dummy's turn to play, dummy's partner (who can see the partnership's combined assets) selects the card to be played. Dummy's partner is called the declarer and the members of the other pair are called the defenders.

The purpose of the bidding phase is to identify trumps and the declarer, and also the contract, which will be described shortly. The opening leader is identified as well, and is the player to the declarer's left. During the bidding phase, various contracts are proposed.
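The trick-winning rule given earlier in this appendix (highest card of the suit led wins, unless a trump is played, in which case the highest trump wins) can be sketched in a few lines. This is only an illustration: the card encoding (rank character followed by suit character, with "T" for the ten) and the function name `trick_winner` are assumptions made here, not anything taken from GIB.

```python
RANKS = "23456789TJQKA"  # deuce low, ace high; "T" stands for the ten


def trick_winner(plays, trump=None):
    """Return the player who wins a trick.

    `plays` is a list of (player, card) pairs in the order played,
    where a card is a two-character string such as "QH" (rank, suit).
    `trump` is a suit character, or None for a no-trump contract.
    """
    led_suit = plays[0][1][1]

    def strength(play):
        rank, suit = play[1][0], play[1][1]
        if suit == trump:
            return (2, RANKS.index(rank))   # any trump beats any non-trump
        if suit == led_suit:
            return (1, RANKS.index(rank))   # otherwise the suit led counts
        return (0, 0)                       # a discard can never win

    return max(plays, key=strength)[0]


trick = [("W", "KD"), ("N", "2D"), ("E", "AD"), ("S", "3S")]
print(trick_winner(trick, trump="S"))  # S: the small trump beats the ace
print(trick_winner(trick))             # E: at no trumps the diamond ace wins
```

The sketch does not check that players followed suit when able; it only decides the winner of four cards that were legally played.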
The dealer has the first opportunity to propose a contract and subsequent opportunities are given to each player in a clockwise direction. Each player has many opportunities to suggest a contract during this phase of the game, which is called the auction. Each partnership is required to explain the meanings of their actions during the auction to the other side, if requested.

Each contract suggests a particular trump suit (or perhaps that there not be a trump suit at all). Each player suggesting a contract is committing his side to winning some particular number of the 13 available tricks. The minimum commitment is 7 tricks, so there are 35 possible contracts (each of 4 possible trumps, or no trumps, and seven possible commitments, from seven to thirteen tricks). These 35 contracts are ordered, which guarantees that the bidding phase will eventually terminate.

After the bidding phase is complete, the side that suggested the final contract is the declaring side. Of the two members of the declaring side, the one who first suggested the eventual trump suit (or no trumps) is the declarer. Play begins with the player to the declarer's left leading to the first trick.

After the hand is complete, there are two possible outcomes. If the declaring side took at least as many tricks as it committed to taking, the declaring side receives a positive score and the defending side an equal but negative score. There are substantial bonuses awarded for committing to taking particular numbers of tricks; in general, the larger the commitment, the larger the bonus. There are small bonuses awarded for winning tricks above and beyond the commitment.

If the declaring side failed to honor its commitment, it receives a negative score and the defenders receive an equal but positive score.
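The count of 35 contracts and the role of their ordering can be made concrete with a short sketch. The appendix does not spell out the ordering itself; the code assumes the standard bridge ranking (by level first, then clubs, diamonds, hearts, spades, no trumps within a level) and the usual convention that a bid at level n commits to n + 6 tricks. All names are hypothetical.

```python
STRAINS = ["C", "D", "H", "S", "NT"]  # ascending within a level (standard ranking)


def all_contracts():
    """All contracts as (level, strain) pairs, lowest first."""
    return [(level, strain) for level in range(1, 8) for strain in STRAINS]


def tricks_required(level):
    """Tricks the declaring side commits to: level 1 means 7, level 7 means 13."""
    return level + 6


def higher_contracts(contract):
    """Contracts that may still be proposed after `contract` has been bid."""
    contracts = all_contracts()
    return contracts[contracts.index(contract) + 1:]


print(len(all_contracts()))                  # 35
print(tricks_required(3))                    # 9
print(len(higher_contracts((1, "C"))))       # 34
print(higher_contracts((7, "NT")))           # []
```

Because every new contract must come from `higher_contracts` of the previous one, the list of available bids strictly shrinks, which is the termination argument the appendix alludes to.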
The overall score in this case (where the declarer "goes down") is generally smaller than the overall score in the case where declarer "makes it" (i.e., honors his commitment).

Appendix B. A new ending discovered by GIB

This deal occurred during a short IMP match between GIB and Bridge Baron.

North: ♠ 9 6   ♥ Q J 8 5   ♦ A Q 3   ♣ K J 10 8
West:  ♠ K Q J 8 7 5   ♥ 9 4 3   ♦ 7   ♣ 6 4 2
East:  ♠ 4 3   ♥ A 7 2   ♦ J 10 6 2   ♣ A Q 7 3
South: ♠ A 10 2   ♥ K 10 6   ♦ K 9 8 5 4   ♣ 9 5

With South (GIB) dealing at unfavorable vulnerability, the auction went P–2♠–X–P–3NT–all pass. (P is pass and X is double.)

The opening lead was the ♠K, ducked by GIB, and Bridge Baron now switched to a small heart. East won the ace and returned to spades, GIB winning. GIB cashed all the hearts, pitching a small club from its hand. It then tested the diamonds, learning of the bad break and winning the third diamond in hand. It then led the ♦9 in the following position:

North: ♠ -   ♥ -   ♦ -   ♣ K J 10 8
West:  ♠ Q   ♥ -   ♦ -   ♣ ? ? ?
East:  ♠ -   ♥ -   ♦ J   ♣ A ? ?
South: ♠ 10   ♥ -   ♦ 9 8   ♣ 9

When GIB pitched the ten of clubs from dummy (it had been aiming for this ending all along), the defenders were helpless to take more than two tricks independent of the location of the club queen. At the other table, Bridge Baron let GIB play in 2♠ making exactly, and GIB picked up 12 IMPs.

References

Adelson-Velskiy, G., Arlazarov, V., & Donskoy, M. (1975). Some methods of controlling the tree search in chess programs. Artificial Intelligence, 6, 361–371.

Bayardo, R. J., & Miranker, D. P. (1996). A complexity analysis of space-bounded learning algorithms for the constraint satisfaction problem. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 298–304.

Billings, D., Papp, D., Schaeffer, J., & Szafron, D. (1998). Opponent modeling in poker. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pp. 493–499.
Blackwood, E. (1979). Play of the Hand with Blackwood. Bobbs-Merrill.

Eskes, O. (1997). GIB: Sensational breakthrough in bridge software. IMP, 8(2).

Frank, I. (1998). Search and Planning under Incomplete Information: A Study Using Bridge Card Play. Springer-Verlag, Berlin.

Frank, I., & Basin, D. (1998). Search in games with incomplete information: A case study using bridge card play. Artificial Intelligence, 100, 87–123.

Frank, I., Basin, D., & Bundy, A. (2000). Combining knowledge and search to solve single-suit bridge. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, pp. 195–200.

Frank, I., Basin, D., & Matsubara, H. (1998). Finding optimal strategies for imperfect information games. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pp. 500–507.

Ginsberg, M. L. (1993). Dynamic backtracking. Journal of Artificial Intelligence Research, 1, 25–46.

Ginsberg, M. L., & Harvey, W. D. (1992). Iterative broadening. Artificial Intelligence, 55, 367–383.

Ginsberg, M. L., & Jaffray, A. (2001). Alpha-beta pruning under partial orders. In Games of No Chance II. To appear.

Grätzer, G. (1978). General Lattice Theory. Birkhäuser Verlag, Basel.

Greenblatt, R., Eastlake, D., & Crocker, S. (1967). The Greenblatt chess program. In Fall Joint Computer Conference 31, pp. 801–810.

Joslin, D. E., & Clements, D. P. (1999). Squeaky wheel optimization. Journal of Artificial Intelligence Research, 10, 353–373.

Koller, D., & Pfeffer, A. (1995). Generating and solving imperfect information games. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pp. 1185–1192.

Levy, D. N. (1989). The million pound bridge program. In Levy, D., & Beal, D. (Eds.), Heuristic Programming in Artificial Intelligence, Asilomar, CA. Ellis Horwood.

Lind-Nielsen, J. (2000). BuDDy: Binary Decision Diagram package. Tech. rep., Department of Information Technology, Technical University of Denmark, DK-2800 Lyngby, Denmark.

Lindelöf, T. (1983). COBRA: The Computer-Designed Bidding System. Gollancz, London.

Marsland, T. A. (1986). A review of game-tree pruning. J. Intl. Computer Chess Assn., 9(1), 3–19.

McAllester, D. A. (1988). Conspiracy numbers for min-max searching. Artificial Intelligence, 35, 287–310.

Pearl, J. (1980). Asymptotic properties of minimax trees and game-searching procedures. Artificial Intelligence, 14(2), 113–138.

Pearl, J. (1982). A solution for the branching factor of the alpha-beta pruning algorithm and its optimality. Comm. ACM, 25(8), 559–564.

Plaat, A., Schaeffer, J., Pijls, W., & de Bruin, A. (1996). Exploiting graph properties of game trees. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 234–239.

Schaeffer, J. (1997). One Jump Ahead: Challenging Human Supremacy in Checkers. Springer-Verlag, New York.

Sheinwold, A. (1996). Five Weeks to Winning Bridge. Pocket Books.

Smith, S. J., Nau, D. S., & Throop, T. (1996). Total-order multi-agent task-network planning for contract bridge. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Stanford, California.

Stallman, R. M., & Sussman, G. J. (1977). Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis. Artificial Intelligence, 9, 135–196.

Sterling, L., & Nygate, Y. (1990). PYTHON: An expert squeezer. J. Logic Programming, 8, 21–40.

Wilkins, D. E. (1980). Using patterns and plans in chess. Artificial Intelligence, 14, 165–203.