GIB: Imperfect Information in a Computationally Challenging Game

Journal of Artificial Intelligence Research 14 (2001) 303-358. Submitted 10/00; published 6/01.

Matthew L. Ginsberg (ginsberg@cirl.uoregon.edu)
CIRL, 1269 University of Oregon, Eugene, OR 97405 USA

Abstract

This paper investigates the problems arising in the construction of a program to play the game of contract bridge. These problems include both the difficulty of solving the game's perfect information variant, and techniques needed to address the fact that bridge is not, in fact, a perfect information game. GIB, the program being described, involves five separate technical advances: partition search, the practical application of Monte Carlo techniques to realistic problems, a focus on achievable sets to solve problems inherent in the Monte Carlo approach, an extension of alpha-beta pruning from total orders to arbitrary distributive lattices, and the use of squeaky wheel optimization to find approximately optimal solutions to cardplay problems. GIB is currently believed to be of approximately expert caliber, and is currently the strongest computer bridge program in the world.

1. Introduction

Of all the classic games of mental skill, only card games and Go have yet to see the appearance of serious computer challengers. In Go, this appears to be because the game is fundamentally one of pattern recognition as opposed to search; the brute-force techniques that have been so successful in the development of chess-playing programs have failed almost utterly to deal with Go's huge branching factor. Indeed, the arguably strongest Go program in the world (Handtalk) was beaten by 1-dan Janice Kim (winner of the 1984 Fuji Women's Championship) in the 1997 AAAI Hall of Champions after Kim had given the program a monumental 25 stone handicap.

Card games appear to be different.
Perhaps because they are games of imperfect information, or perhaps for other reasons, existing poker and bridge programs are extremely weak. World poker champion Howard Lederer (Texas Hold'em, 1996) has said that he would expect to beat any existing poker program after five minutes' play.† [1] Perennial world bridge champion Bob Hamman, seven-time winner of the Bermuda Bowl, summarized the state of bridge programs in 1994 by saying that, "They would have to improve to be hopeless."†

In poker, there is reason for optimism: the GALA system (Koller & Pfeffer, 1995), if applicable, promises to produce a computer player of unprecedented strength by reducing the poker "problem" to a large linear optimization problem which is then solved to generate a strategy that is nearly optimal in a game-theoretic sense. Schaeffer, author of the world champion checkers program Chinook (Schaeffer, 1997), is also reporting significant success in the poker domain (Billings, Papp, Schaeffer, & Szafron, 1998).

The situation in bridge has been bleaker. In addition, because the American Contract Bridge League (ACBL) does not rank the bulk of its players in meaningful ways, it is difficult to compare the strengths of competing programs or players.

In general, performance at bridge is measured by playing the same deal twice or more, with the cards held by one pair of players being given to another pair during the replay and the results then being compared. [2]

Footnote 1: Many of the citations here are the results of personal communications. Such communications are indicated simply by the presence of a † in the accompanying text.

© 2001 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
A "team" in a bridge match thus typically consists of two pairs, with one pair playing the North/South (N/S) cards at one table and the other pair playing the East/West (E/W) cards at the other table. The results obtained by the two pairs are added; if the sum is positive, the team wins this particular deal and if negative, they lose it.

In general, the numeric sum of the results obtained by the two pairs is converted to International Match Points, or IMPs. The purpose of the conversion is to diminish the impact of single deals on the total, lest an abnormal result on one particular deal have an unduly large impact on the result of an entire match.

Jeff Goldsmith† reports that the standard deviation on a single deal in bridge is about 5.5 IMPs, so that if two roughly equal pairs were to play the deal, it would not be surprising if one team beat the other by about this amount. It also appears that the difference between an average club player and an expert is about 1.5 IMPs (per deal played); the strongest players in the world are approximately 0.5 IMPs/deal better still. Excepting GIB, the strongest bridge playing programs appear to be slightly weaker than average club players.

Progress in computer bridge has been slow. An incorporation of planning techniques into Bridge Baron, for example, appears to have led to a performance increment of approximately 1/3 IMP per deal (Smith, Nau, & Throop, 1996). This modest improvement still leaves Bridge Baron far shy of expert-level (or even good amateur-level) performance.

Prior to 1997, bridge programs generally attempted to duplicate human bridge-playing methodology in that they proceeded by attempting to recognize the class into which any particular deal fell: finesse, end play, squeeze, etc.
Smith et al.'s work on the Bridge Baron program uses planning to extend this approach, but the plans continue to be constructed from human bridge techniques. Nygate and Sterling's early work on PYTHON (Sterling & Nygate, 1990) produced an expert system that could recognize squeezes but not prepare for them. In retrospect, perhaps we should have expected this approach to have limited success; certainly chess-playing programs that have attempted to mimic human methodology, such as PARADISE (Wilkins, 1980), have fared poorly.

GIB, introduced in 1998, works differently. Instead of modeling its play on techniques used by humans, GIB uses brute-force search to analyze the situation in which it finds itself. A variety of techniques are then used to suggest plays based on the results of the brute-force search. This technique has been so successful that all competitive bridge programs have switched from a knowledge-based approach to a search-based approach.

GIB's cardplay based on brute-force techniques was at the expert level (see Section 3) even without some of the extensions that we discuss in Section 5 and subsequently. The weakest part of GIB's game is bidding, where it relies on a large database of rules describing the meanings of various auctions. Quantitative comparisons here are difficult, although the general impression of the stronger players using GIB is that its overall play is comparable to that of a human expert.

Footnote 2: The rules of bridge are summarized in Appendix A.

This paper describes the various techniques that have been used in the GIB project, as follows:

1.
GIB's analysis in both bidding and cardplay rests on an ability to analyze bridge's perfect-information variant, where all of the cards are visible and each side attempts to take as many tricks as possible (this perfect-information variant is generally referred to as double dummy bridge). Double dummy problems are solved using a technique known as partition search, which is discussed in Section 2.

2. Early versions of GIB used Monte Carlo methods exclusively to select an action based on the double dummy analysis. This technique was originally proposed for cardplay by Levy (Levy, 1989), but was not implemented in a performance program before GIB. Extending Levy's suggestion, GIB uses Monte Carlo simulation for both cardplay (discussed in Section 3) and bidding (discussed in Section 4).

3. Section 5 discusses difficulties with the Monte Carlo approach. Frank et al. have suggested dealing with these problems by searching the space of possible plans for playing a particular bridge deal, but their methods appear to be intractable in both theory and practice (Frank & Basin, 1998; Frank, Basin, & Bundy, 2000). We instead choose to deal with the difficulties by modifying our understanding of the game so that the value of a bridge deal is not an integer (the number of tricks that can be taken) but is instead taken from a distributive lattice.

4. In Section 6, we show that the alpha-beta pruning mechanism can be extended to deal with games of this type. This allows us to find optimal plans for playing bridge end positions involving some 32 cards or fewer. (In contrast, Frank's method is capable only of finding solutions in 16 card endings.)

5. Finally, applying our ideas to the play of full deals (52 cards) requires solving an approximate version of the overall problem.
In Section 7, we describe the nature of the approximation used and our application of squeaky wheel optimization (Joslin & Clements, 1999) to solve it.

Concluding remarks are contained in Section 8.

2. Partition search

Computers are effective game players only to the extent that brute-force search can overcome innate stupidity; most of their time spent searching is spent examining moves that a human player would discard as obviously without merit.

As an example, suppose that White has a forced win in a particular chess position, perhaps beginning with an attack on Black's queen. A human analyzing the position will see that if Black doesn't respond to the attack, he will lose his queen; the analysis considers places to which the queen could move and appropriate responses to each.

A machine considers responses to the queen moves as well, of course. But it must also analyze in detail every other Black move, carefully demonstrating that each of these other moves can be refuted by capturing the Black queen. A six-ply search will have to analyze every one of these moves five further ply, even if the refutations are identical in all cases. Conventional pruning techniques cannot help here; using $\alpha$-$\beta$ pruning, for example, the entire "main line" (White's winning choices and all of Black's losing responses) must be analyzed even though there is a great deal of apparent redundancy in this analysis. [3]

In other search problems, techniques based on the ideas of dependency maintenance (Stallman & Sussman, 1977) can potentially be used to overcome this sort of difficulty. As an example, consider chronological backtracking applied to a map coloring problem. When a dead end is reached and the search backs up, no information is cached and the effect is to eliminate only the specific dead end that was encountered.
Recording information giving the reason for the failure can make the search substantially more efficient. In attempting to color a map with only three colors, for example, thirty countries may have been colored while the detected contradiction involves only five. By recording the contradiction for those five countries, dead ends that fail for the same reason can be avoided.

Dependency-based methods have been of limited use in practice because of the overhead involved in constructing and using the collection of accumulated reasons. This problem has been substantially addressed in the work on dynamic backtracking (Ginsberg, 1993) and its successors such as RELSAT (Bayardo & Miranker, 1996), where polynomial limits are placed on the number of nogoods being maintained.

In game search, however, most algorithms already include significant cached information in the form of a transposition table (Greenblatt, Eastlake, & Crocker, 1967; Marsland, 1986). A transposition table stores a single game position and the backed up value that has been associated with it. The name reflects the fact that many games "transpose" in that identical positions can be reached by swapping the order in which moves are made. The transposition table eliminates the need to recompute values for positions that have already been analyzed.

These collected observations lead naturally to the idea that transposition tables should store not single positions and their values, but sets of positions and their values. Continuing the dependency-maintenance analogy, a transposition table storing sets of positions can prune the subsequent search far more efficiently than a table that stores only singletons.

There are two reasons that this approach works.
The first, which we have already mentioned, is that most game-playing programs already maintain transposition tables, thereby incurring the bulk of the computational expense involved in storing such tables in a more general form. The second and more fundamental reason is that when a game ends with one player the winner, the reason for the victory is generally a local one. A chess game can be thought of as ending when one side has its king captured (a completely local phenomenon); a checkers game, when one side runs out of moves. Even if an internal search node is evaluated before the game ends, the reason for assigning it any specific value is likely to be independent of some global features (e.g., is the Black pawn on a5 or a6?). Partition search exploits both the existence of transposition tables and the locality of evaluation for realistic games.

Footnote 3: An informal solution to this is Adelson-Velskiy et al.'s method of analogies (Adelson-Velskiy, Arlazarov, & Donskoy, 1975). This approach appears to have been of little use in practice because it is restricted to a specific class of situations arising in chess games.

[Figure 1: A portion of the game tree for tic-tac-toe]

This section explains these ideas via an example and then describes them formally. Experimental results for bridge are also presented.

2.1 An example

Our illustrative examples for partition search will be taken from the game of tic-tac-toe. A portion of the game tree for this game appears in Figure 1, where we are analyzing a position that is a win for X. We show O's four possible moves, and a winning response for X in each case.
Although X frequently wins by making a row across the top of the diagram, $\alpha$-$\beta$ pruning cannot reduce the size of this tree because O's losing options must all be analyzed separately.

Consider now the position at the lower left in the diagram, where X has won (marks are listed in reading order; unmarked squares are not shown):

    X X X O O O X    (1)

The reason that X has won is local. If we are retaining a list of positions with known outcomes, the entry we can make because of this position is:

    X X X
    ? ? ?
    ? ? ?    (2)

where the ? means that it is irrelevant whether the associated square is marked with an X, an O, or unmarked. This table entry corresponds not to a single position, but to approximately $3^6$ because the unassigned squares can contain X's, O's, or be blank. We can reduce the game tree in Figure 1 to:

[Diagram: the game tree of Figure 1 with the won position at the lower left replaced by the generalized entry (2)]

Continuing the analysis, it is clear that the position

    X X ? ? ? ? ? ?    (3)

is a win for X if X is on play. [4] So is

    X ? ? ? ? ? ? X

and the tree can be reduced to:

[Diagram: the reduced game tree, with X's winning responses replaced by the generalized positions above]

Finally, consider the position

    X X ? ? ? X    (4)

where it is O's turn as opposed to X's. If O moves in the second row, we get an instance of

    X X ? ? ? ? ? ?

while if O moves to the upper right, we get an instance of

    X ? ? ? ? ? ? X

Thus every one of O's moves leads to a position that is known to be a win for X, and we can conclude that the original position (4) is a win for X as well. The root node in the reduced tree can therefore be replaced with the position of (4).

Footnote 4: We assume that O has not already won the game here, since X would not be "on play" if the game were over.
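The size of the table entry (2) can be checked with a short sketch. The board encoding used here is an assumption made for illustration and is not taken from the paper: a board is a 9-character row-major string over `X`, `O` and `.` (unmarked), and `?` in a pattern stands for any of the three. The helper name `instances` is likewise hypothetical.

```python
from itertools import product

# Hypothetical encoding: 9 characters in row-major order, each "X", "O",
# or "." (unmarked); "?" in a pattern stands for any of the three.
def instances(pattern: str):
    """Yield every concrete board that is an instance of the pattern."""
    choices = ["XO." if c == "?" else c for c in pattern]
    for cells in product(*choices):
        yield "".join(cells)

TOP_ROW_WIN = "XXX??????"          # table entry (2)
boards = set(instances(TOP_ROW_WIN))
print(len(boards))                 # 729, i.e. 3**6
print("XXXOO.O.X" in boards)       # True
```

The count of 729 includes strings that are not legal tic-tac-toe positions (nine X's, for instance), which is one reason the text says "approximately" $3^6$.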
These positions capture the essence of the algorithm we will propose: If player x can move to a position that is a member of a set known to be a win for x, the given position is a win as well. If every move is to a position that is a loss, the original position is also.

2.2 Formalizing partition search

In this section, we present a summary of existing methods for evaluating positions in game trees. There is nothing new here; our aim is simply to develop a precise framework in which our new results can be presented.

Definition 2.2.1 An interval-valued game is a quadruple $(G, p_I, s, ev)$, where $G$ is a finite set of legal positions, $p_I \in G$ is the initial position, $s : G \to 2^G$ gives the immediate successors of a given position, and $ev$ is an evaluation function

$$ev : G \to \{\max, \min\} \cup [0,1]$$

Informally, $p' \in s(p)$ means that position $p'$ can be reached from $p$ in a single move, and the evaluation function $ev$ labels internal nodes based upon whose turn it is to play (max or min) and values terminal positions in terms of some element of the unit interval $[0,1]$.

The structures $G$, $p_I$, $s$ and $ev$ are required to satisfy the following conditions:

1. There is no sequence of positions $p_0, \ldots, p_n$ with $n > 0$, $p_i \in s(p_{i-1})$ for each $i$ and $p_n = p_0$. In other words, there are no "loops" that return to an identical position.

2. $ev(p) \in [0,1]$ if and only if $s(p) = \emptyset$. In other words, $ev$ assigns a numerical value to $p$ if and only if the game is over. Informally, $ev(p) = \max$ means that the maximizer is to play and $ev(p) = \min$ means that the minimizer is to play.

We use $2^G$ to denote the power set of $G$, the set of subsets of $G$. There are two further things to note about this definition.

First, the requirement that the game have no "loops" is consistent with all modern games.
In chess, for example, positions can repeat but there is a concealed counter that draws the game if either a single position repeats three times or a certain number of moves pass without a capture or a pawn move. In fact, dealing with the hidden counter is more natural in a partition search setting than a conventional one, since the evaluation function is in general (although not always) independent of the value of the counter.

Second, the range of $ev$ includes the entire unit interval $[0,1]$. The value 0 represents a win for the minimizer, and 1 a win for the maximizer. The intermediate values might correspond to intermediate results (e.g., a draw) or, more importantly, allow us to deal with internal search nodes that are being treated as terminal and assigned approximate values because no time remains for additional search.

The evaluation function $ev$ can be used to assign numerical values to the entire set $G$ of positions:

Definition 2.2.2 Given an interval-valued game $(G, p_I, s, ev)$, we introduce a function $ev_c : G \to [0,1]$ defined recursively by

$$ev_c(p) = \begin{cases} ev(p), & \text{if } ev(p) \in [0,1]; \\ \max_{p' \in s(p)} ev_c(p'), & \text{if } ev(p) = \max; \\ \min_{p' \in s(p)} ev_c(p'), & \text{if } ev(p) = \min. \end{cases}$$

The value of $(G, p_I, s, ev)$ is defined to be $ev_c(p_I)$.

To evaluate a position in a game, we can use the well-known minimax procedure:

Algorithm 2.2.3 (Minimax) For a game $(G, p_I, s, ev)$ and a position $p \in G$, to compute $ev_c(p)$:

    if $ev(p) \in [0,1]$ return $ev(p)$
    if $ev(p) = \max$ return $\max_{p' \in s(p)} \mathrm{minimax}(p')$
    if $ev(p) = \min$ return $\min_{p' \in s(p)} \mathrm{minimax}(p')$

There are two ways in which the above algorithm is typically extended. The first involves the introduction of transposition tables; we will assume that a new entry is added to the transposition table $T$ whenever one is computed.
(A modification to cache only selected results is straightforward.) The second involves the introduction of $\alpha$-$\beta$ pruning. Incorporating these ideas gives us Algorithm 2.2.5 below.

Each entry in the transposition table consists of a position $p$, the current cutoffs $[x, y]$, and the computed value $v$. Note the need to include information about the cutoffs in the transposition table itself, since the validity of any particular entry depends on the cutoffs in question.

As an example, suppose that the value of some node is in fact 1 (a win for the maximizer) but that when the node is evaluated with cutoffs of $[0, 0.5]$ a value of 0.5 is returned (indicating a draw) because the maximizer has an obviously drawing line. It is clear that this value is only accurate for the given cutoffs; wider cutoffs will lead to a different answer.

In general, the upper cutoff $y$ is the currently smallest value assigned to a minimizing node; the minimizer can do at least this well in that he can force a value of $y$ or lower. Similarly, $x$ is the currently greatest value assigned to a maximizing node. These cutoff values are updated as the algorithm is invoked recursively in the lines responsible for setting $v_{new}$, the value assigned to a child of the current position $p$.

Proposition 2.2.4 Suppose that $v = \alpha\beta(p, [x, y])$ for each entry $(p, [x, y], v)$ in $T$. Then if $ev_c(p) \in [x, y]$, the value returned by Algorithm 2.2.5 is $ev_c(p)$.
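The cutoff example above can be reproduced with a minimal sketch of $\alpha$-$\beta$ search whose transposition table is keyed on the position together with its cutoffs, in the spirit of Algorithm 2.2.5. The toy game, the dictionary-based representation of $s$ and $ev$, and the function name `alpha_beta` are all assumptions made for illustration; this is not GIB's implementation.

```python
def alpha_beta(p, x, y, succ, ev, table):
    """Value of p under cutoffs [x, y]; table maps (p, x, y) -> value."""
    key = (p, x, y)
    if key in table:
        return table[key]
    label = ev[p]
    if label == "max":
        v_ans = 0.0
        for child in succ[p]:
            v_new = alpha_beta(child, max(v_ans, x), y, succ, ev, table)
            if v_new >= y:          # cutoff: the minimizer will avoid p
                v_ans = v_new
                break
            v_ans = max(v_ans, v_new)
    elif label == "min":
        v_ans = 1.0
        for child in succ[p]:
            v_new = alpha_beta(child, x, min(v_ans, y), succ, ev, table)
            if v_new <= x:          # cutoff: the maximizer will avoid p
                v_ans = v_new
                break
            v_ans = min(v_ans, v_new)
    else:
        v_ans = label               # terminal position: ev(p) is in [0, 1]
    table[key] = v_ans
    return v_ans

# Hypothetical game: the maximizer can take an obvious draw or a win.
succ = {"root": ["draw", "win"], "draw": [], "win": []}
ev = {"root": "max", "draw": 0.5, "win": 1.0}
table = {}

narrow = alpha_beta("root", 0.0, 0.5, succ, ev, table)
wide = alpha_beta("root", 0.0, 1.0, succ, ev, table)
print(narrow, wide)                 # 0.5 1.0
```

The two calls return different values for the same position, and the table ends up with two entries for `"root"`, one per pair of cutoffs. A table keyed on the position alone would have returned the stale 0.5 for the second call.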
Algorithm 2.2.5 ($\alpha$-$\beta$ pruning with transposition tables) Given an interval-valued game $(G, p_I, s, ev)$, a position $p \in G$, cutoffs $[x, y] \subseteq [0,1]$ and a transposition table $T$ consisting of triples $(p, [a, b], v)$ with $p \in G$ and $a \le b$, $v \in [0,1]$, to compute $\alpha\beta(p, [x, y])$:

    if there is an entry $(p, [x,y], z)$ in $T$ return $z$
    if $ev(p) \in [0,1]$ then $v_{ans} := ev(p)$
    if $ev(p) = \max$ then
        $v_{ans} := 0$
        for each $p' \in s(p)$ do
            $v_{new} := \alpha\beta(p', [\max(v_{ans}, x), y])$
            if $v_{new} \ge y$ then
                $T := T \cup (p, [x,y], v_{new})$
                return $v_{new}$
            if $v_{new} > v_{ans}$ then $v_{ans} := v_{new}$
    if $ev(p) = \min$ then
        $v_{ans} := 1$
        for each $p' \in s(p)$ do
            $v_{new} := \alpha\beta(p', [x, \min(v_{ans}, y)])$
            if $v_{new} \le x$ then
                $T := T \cup (p, [x,y], v_{new})$
                return $v_{new}$
            if $v_{new} < v_{ans}$ then $v_{ans} := v_{new}$
    $T := T \cup (p, [x,y], v_{ans})$
    return $v_{ans}$

2.3 Partitions

We are now in a position to present our new ideas. We begin by formalizing the idea of a position that can reach a known winning position or one that can reach only known losing ones.

Definition 2.3.1 Given an interval-valued game $(G, p_I, s, ev)$ and a set of positions $S \subseteq G$, we will say that the set of positions that can reach $S$ is the set of all $p$ for which $s(p) \cap S \ne \emptyset$. This set will be denoted $R_0(S)$. The set of positions constrained to reach $S$ is the set of all $p$ for which $s(p) \subseteq S$, and is denoted $C_0(S)$.

These definitions should match our intuition; the set of positions that can reach a set $S$ is indeed the set of positions $p$ for which some element of $S$ is an immediate successor of $p$, so that $s(p) \cap S \ne \emptyset$. Similarly, a position $p$ is constrained to reach $S$ if every immediate successor of $p$ is in $S$, so that $s(p) \subseteq S$.
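For a game small enough to enumerate, Definition 2.3.1 can be computed directly. The sketch below does this for a hypothetical five-position game; the successor dictionary and the function names `can_reach` and `constrained_to_reach` are illustrative assumptions.

```python
def can_reach(S, succ):
    """R_0(S): positions with at least one immediate successor in S."""
    return {p for p, children in succ.items() if set(children) & S}

def constrained_to_reach(S, succ):
    """C_0(S): positions all of whose immediate successors lie in S."""
    return {p for p, children in succ.items() if set(children) <= S}

# Hypothetical game: a -> {b, c}, b -> {d}, c -> {d, e}; d and e are terminal.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}
S = {"d"}

print(sorted(can_reach(S, succ)))             # ['b', 'c']
print(sorted(constrained_to_reach(S, succ)))  # ['b', 'd', 'e']
```

Read literally, the definition places terminal positions in $C_0(S)$ vacuously, since the empty set is a subset of every $S$; that is why `d` and `e` appear in the second result.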
Unfortunately, it may not be feasible to construct the $R_0$ and $C_0$ operators explicitly; there may be no concise representation of the set of all positions that can reach $S$. In practice, this will be reflected in the fact that the data structures being used to describe the set $S$ may not conveniently describe the set $R_0(S)$ of all situations from which $S$ can be reached.

Now suppose that we are expanding the search tree itself, and we find ourselves analyzing a particular position $p$ that is determined to be a win for the maximizer because the maximizer can move from $p$ to the winning set $S$; in other words, $p$ is a win because it is in $R_0(S)$. We would like to record at this point that the set $R_0(S)$ is a win for the maximizer, but may not be able to construct or represent this set conveniently. We will therefore assume that we have some computationally effective way to approximate the $R_0$ and $C_0$ functions, in that we have (for example) a function $R$ that is a conservative implementation of $R_0$ in that if $R$ says we can reach $S$, then so we can:

$$R(p, S) \subseteq R_0(S)$$

$R(p, S)$ is intended to represent a set of positions that are "like $p$ in that they can reach the (winning) set $S$."

Note the inclusion of $p$ as an argument to $R(p, S)$, since we certainly want $p \in R(p, S)$. We are about to cache the fact that every element of $R(p, S)$ is a win for the maximizer, and certainly want that information to include the fact that $p$ itself has been shown to be a win. Thus we require $p \in R(p, S)$ as well.

Finally, we need some way to generalize the information returned by the evaluation function; if the evaluation function itself identifies a position $p$ as a win for the maximizer, we want to have some way to generalize this to a wider set of positions that are also wins.
We formalize this by assuming that we have some generalization function $P$ that "respects" the evaluation function in the sense that the value returned by $P$ is a set of positions that $ev$ evaluates identically.

Definition 2.3.2 Let $(G, p_I, s, ev)$ be an interval-valued game. Let $f$ be any function with range $2^G$, so that $f$ selects a set of positions based on its arguments. We will say that $f$ respects the evaluation function $ev$ if whenever $p, p' \in F$ for any $F$ in the range of $f$, $ev(p) = ev(p')$.

A partition system for the game is a triple $(P, R, C)$ of functions that respect $ev$ such that:

1. $P : G \to 2^G$ maps positions into sets of positions such that for any position $p$, $p \in P(p)$.

2. $R : G \times 2^G \to 2^G$ accepts as arguments a position $p$ and a set of positions $S$. If $p \in R_0(S)$, so that $p$ can reach $S$, then $p \in R(p, S) \subseteq R_0(S)$.

3. $C : G \times 2^G \to 2^G$ accepts as arguments a position $p$ and a set of positions $S$. If $p \in C_0(S)$, so that $p$ is constrained to reach $S$, then $p \in C(p, S) \subseteq C_0(S)$.

As mentioned above, the function $P$ tells us which positions are sufficiently "like" $p$ that they evaluate to the same value. In tic-tac-toe, for example, the position (1) where X has won with a row across the top might be generalized by $P$ to the set of positions

    X X X
    ? ? ?
    ? ? ?    (5)

as in (2). The functions $R$ and $C$ approximate $R_0$ and $C_0$. Once again turning to our tic-tac-toe example, suppose that we take $S$ to be the set of positions appearing in (5) and that $p$ is given by

    X X O O O X

so that $S$ can be reached from $p$. $R(p, S)$ might be

    X X ? ? ? ? ? ?    (6)

as in (3), although we could also take $R(p, S) = \{p\}$ or $R(p, S)$ to be

    X X O O O X  $\cup$  X X ? ? ? ? ? ?  $\cup$  X X ? ? ? ? ? ?

although this last union might be awkward to represent.
Note again that $R$ and $C$ are functions of $p$ as well as $S$; the set returned must include the given position $p$ but can otherwise be expected to vary as $p$ does.

We will now modify Algorithm 2.2.5 so that the transposition table, instead of caching results for single positions, caches results for sets of positions. As discussed in the introduction to this section, this is an analog to the introduction of truth maintenance techniques into adversary search. The modified algorithm 2.3.3 appears in Figure 2 and returns a pair of values: the value for the given position, and a set of positions that will take the same value.

Proposition 2.3.4 Suppose that $v = \alpha\beta(p, [x, y])$ for every $(S, [x, y], v)$ in $T$ and $p \in S$. Then if $ev_c(p) \in [x, y]$, the value returned by Algorithm 2.3.3 is $ev_c(p)$.

Proof. We need to show that when the algorithm returns, any position in $S_{ans}$ will have the value $v_{ans}$. This will ensure that the transposition table remains correct. To see this, suppose that the node being expanded is a maximizing node; the minimizing case is dual.

Suppose first that this node is a loss for the maximizer, having value 0. In showing that the node is a loss, we will have examined successor nodes that are in sets denoted $S_{new}$ in Algorithm 2.3.3; if the maximizer subsequently finds himself in a position from which he has no moves outside of the various $S_{new}$, he will still be in a losing position. Since $S_{all} = \bigcup S_{new}$, the maximizer will lose in any position from which he is constrained to next move into an element of $S_{all}$. Since every position in $C(p, S_{all})$ has this property, it is safe to take $S_{ans} = C(p, S_{all})$. This is what is done in the first line with a dagger in the algorithm.
The more interesting case is where the eventual value of the node is nonzero; now in order for another node $n$ to demonstrably have the same value, the maximizer must have no new options at $n$, and must still have some move that achieves the value $v_{ans}$ at $n$. The first condition is identical to the earlier case where $v_{ans} = 0$. For the second, note that any time the maximizer finds a new best move, we set $S_{ans}$ to the set of positions that we know recursively achieve the same value.

Algorithm 2.3.3 (Partition search) Given a game $(G, p_I, s, ev)$ and $(P, R, C)$ a partition system for it, a position $p \in G$, cutoffs $[x, y] \subseteq [0,1]$ and a transposition table $T$ consisting of triples $(S, [a, b], v)$ with $S \subseteq G$ and $a \le b$, $v \in [0,1]$, to compute $\alpha\beta(p, [x, y])$:

    if there is an entry $(S, [x,y], z)$ with $p \in S$ return $\langle z, S \rangle$
    if $ev(p) \in [0,1]$ then $\langle v_{ans}, S_{ans} \rangle := \langle ev(p), P(p) \rangle$
    if $ev(p) = \max$ then
        $v_{ans} := 0$
        $S_{all} := \emptyset$
        for each $p' \in s(p)$ do
            $\langle v_{new}, S_{new} \rangle := \alpha\beta(p', [\max(v_{ans}, x), y])$
            if $v_{new} \ge y$ then
                $T := T \cup (S_{new}, [x,y], v_{new})$
                return $\langle v_{new}, S_{new} \rangle$
            if $v_{new} > v_{ans}$ then $\langle v_{ans}, S_{ans} \rangle := \langle v_{new}, S_{new} \rangle$
            $S_{all} := S_{all} \cup S_{new}$
        if $v_{ans} = 0$ then $S_{ans} := C(p, S_{all})$    †
        else $S_{ans} := R(p, S_{ans}) \cap C(p, S_{all})$    † ‡
    if $ev(p) = \min$ then
        $v_{ans} := 1$
        $S_{all} := \emptyset$
        for each $p' \in s(p)$ do
            $\langle v_{new}, S_{new} \rangle := \alpha\beta(p', [x, \min(v_{ans}, y)])$
            if $v_{new} \le x$ then
                $T := T \cup (S_{new}, [x,y], v_{new})$
                return $\langle v_{new}, S_{new} \rangle$
            if $v_{new} < v_{ans}$ then $\langle v_{ans}, S_{ans} \rangle := \langle v_{new}, S_{new} \rangle$
            $S_{all} := S_{all} \cup S_{new}$
        if $v_{ans} = 1$ then $S_{ans} := C(p, S_{all})$
        else $S_{ans} := R(p, S_{ans}) \cap C(p, S_{all})$    ‡
    $T := T \cup (S_{ans}, [x,y], v_{ans})$
    return $\langle v_{ans}, S_{ans} \rangle$

Figure 2: The partition search algorithm
When we complete the maximizer's loop in the algorithm, it follows that $S_{ans}$ will be a set of positions from which the maximizer can indeed achieve the value $v_{ans}$. Thus the maximizer can also achieve that value from any position in $R(p, S_{ans})$. It follows that the overall set of positions known to have the value $v_{ans}$ is given by $R(p, S_{ans}) \cap C(p, S_{all})$, intersecting the two conditions of this paragraph. This is what is done in the second daggered step in the algorithm.

2.4 Zero-window variations

The effectiveness of partition search depends crucially on the size of the sets maintained in the transposition table. If the sets are large, many positions will be evaluated by lookup. If the sets are small, partition search collapses to conventional $\alpha$-$\beta$ pruning.

An examination of Algorithm 2.3.3 suggests that the points in the algorithm at which the sets are reduced the most are those marked with a double dagger in the description, where an intersection is required because we need to ensure both that the player can make a move equivalent to his best one and that there are no other options. The effectiveness of the method would be improved if this possibility were removed.

To see how to do this, suppose for a moment that the evaluation function always returned 0 or 1, as opposed to intermediate values. Now if the maximizer is on play and the value $v_{new} = 1$, a prune will be generated because there can be no better value found for the maximizer. If all of the $v_{new}$ are 0, then $v_{ans} = 0$ and we can avoid the troublesome intersection. The maximizer loses and there is no "best" move that we have to worry about making.

In reality, the restriction to values of 0 or 1 is unrealistic.
Some games, such as bridge, allow more than two outcomes, while others cannot be analyzed to termination and need to rely on evaluation functions that return approximate values for internal nodes. We can deal with these situations using a technique known as zero-window search (originally called scout search (Pearl, 1980)). To evaluate a specific position, one first estimates the value to be $e$ and then determines whether the actual value is above or below $e$ by treating any value $v > e$ as a win for the maximizer and any value $v \le e$ as a win for the minimizer. The results of this calculation can then be used to refine the guess, and the process is repeated. If no initial estimate is available, a binary search can be used to find the value to within any desired tolerance.

Zero-window search is effective because little time is wasted on iterations where the estimate is wildly inaccurate; there will typically be many lines showing that a new estimate is needed. Most of the time is spent on the last iteration or two, developing tight bounds on the position being considered. There is an analog in conventional $\alpha$-$\beta$ pruning, where the bounds typically get tight quickly and the bulk of the analysis deals with a situation where the value of the original position is known to lie in a fairly narrow range.

In zero-window search, a node always evaluates to 0 or 1, since either $v > e$ or $v \le e$. This allows a straightforward modification to Algorithm 2.3.3 that avoids the troublesome cases mentioned earlier.

2.5 Experimental results

Partition search was tested by analyzing 1000 randomly generated bridge deals and comparing the number of nodes expanded using partition search and conventional methods. In addition to our general interest in bridge, there are two reasons why it can be expected that partition search will be useful for this game.
First, partition search requires that the functions $R_0$ and $C_0$ support a partition-like analysis; it must be the case that an analysis of one situation will apply equally well to a variety of similar ones. Second, it must be possible to build approximating functions $R$ and $C$ that are reasonably accurate representatives of $R_0$ and $C_0$.

Bridge satisfies both of these properties. Expert discussion of a particular deal often will refer to small cards as x's, indicating that it is indeed the case that the exact ranks of these cards are irrelevant. Second, it is possible to "back up" x's from one position to its predecessors. If, for example, one player plays a club with no chance of having it impact the rest of the game, and by doing so reaches a position in which subsequent analysis shows him to have two small clubs, then he clearly must have had three small clubs originally. Finally, the fact that cards are simply being replaced by x's means that it is possible to construct data structures for which the time per node expanded is virtually unchanged from that using conventional methods.

Perhaps an example will make this clearer. Consider the following partial bridge deal in which East is to lead and there are no trumps:

                  ♠ -
                  ♥ -
                  ♦ A K
                  ♣ -
      ♠ 10                    ♠ A Q
      ♥ A                     ♥ -
      ♦ -                     ♦ -
      ♣ -                     ♣ -
                  ♠ K J
                  ♥ -
                  ♦ -
                  ♣ -

An analysis of this situation shows that in the main line, the only cards that win tricks by virtue of their ranks are the spade Ace, King and Queen. This sanctions the replacement of the above figure by the following more general one:

                  ♠ -
                  ♥ -
                  ♦ x x
                  ♣ -
      ♠ x                     ♠ A Q
      ♥ x                     ♥ -
      ♦ -                     ♦ -
      ♣ -                     ♣ -
                  ♠ K x
                  ♥ -
                  ♦ -
                  ♣ -

Note first that this replacement is sound in the sense that every position that is an instance of the second diagram is guaranteed to have the same value as the original.
We have not resorted to an informal argument of the form "Jacks and lower tend not to matter," but instead to a precise argument of the form, "In the expansion of the search tree associated with the given deal, Jacks and lower were proven never to matter."

Bridge also appears to be extremely well-suited (no pun intended) to the kind of analysis that we have been describing; a chess analog might involve describing a mating combination and saying that "the position of Black's queen didn't matter." While this does happen, casual chess conversation is much less likely to include this sort of remark than bridge conversation is likely to refer to a host of small cards as x's, suggesting at least that the partition technique is more easily applied to bridge than to chess (or to other games).

That said, however, the results for bridge are striking, leading to performance improvements of an order of magnitude or more on fairly small search spaces (perhaps $10^6$ nodes).

The deals we tested involved between 12 and 48 cards and were analyzed to termination, so that the depth of the search varied from 12 to 48. (The solver without partition search was unable to solve larger problems.) The branching factor for minimax without transposition tables appeared to be approximately 4, and the results appear in Figure 3.

Each point in the graph corresponds to a single deal. The position of the point on the $x$-axis indicates the number of nodes expanded using $\alpha$-$\beta$ pruning and transposition tables, and the position on the $y$-axis the number expanded using partition search as well. Both axes are plotted logarithmically.
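As an illustration of the zero-window idea from Section 2.4, the following Python sketch recovers an exact integer value, such as a trick count between 0 and 13, from yes/no tests alone. It is not GIB's code: the function names are hypothetical, and `beats(e)` merely stands in for one complete zero-window search that reports whether the true value exceeds the estimate `e`.

```python
from typing import Callable, List


def exact_value(beats: Callable[[int], bool], lo: int, hi: int) -> int:
    """Find the exact value v in [lo, hi] using only zero-window tests.

    beats(e) is an assumed stand-in for one zero-window search: it must
    return True exactly when the true value v satisfies v > e.
    """
    while lo < hi:
        e = (lo + hi) // 2      # current estimate
        if beats(e):            # a "win" for the maximizer: v > e
            lo = e + 1
        else:                   # a "win" for the minimizer: v <= e
            hi = e
    return lo


def make_oracle(true_value: int, log: List[int]) -> Callable[[int], bool]:
    """Toy oracle that records every estimate it is asked about."""
    def beats(e: int) -> bool:
        log.append(e)
        return true_value > e
    return beats


if __name__ == "__main__":
    asked: List[int] = []
    print(exact_value(make_oracle(9, asked), 0, 13), asked)
```

With 14 candidate values (0 through 13 tricks), at most four such tests are needed, since each test halves the remaining range.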
In both the partition and conventional cases, a binary zero-window search was used to determine the exact value to be assigned to the hand, which the rules of bridge constrain to range from 0 to the number of tricks left (one quarter of the number of cards in play). As mentioned previously, hands generated using a full deck of 52 cards were not considered because the conventional method was in general incapable of solving them. The program was run on a Sparc 5 and PowerMac 6100, where it expanded approximately 15K nodes/second. The transposition table shares common structure among different sets and as a result, uses approximately 6 bytes/node.

The dotted line in the figure is $y = x$ and corresponds to the breakeven point relative to $\alpha$-$\beta$ pruning in isolation. The solid line is the least-squares best fit to the logarithmic data, and is given by $y = 1.57 x^{0.76}$. This suggests that partition search is leading to an effective reduction in branching factor of $b \to b^{0.76}$.
This improvement, above and beyond that provided by $\alpha$-$\beta$ pruning, can be contrasted with $\alpha$-$\beta$ pruning itself, which gives a reduction when compared to pure minimax of $b \to b^{0.75}$ if the moves are ordered randomly (Pearl, 1982) and $b \to b^{0.5}$ if the ordering is optimal.

[Figure 3: Nodes expanded as a function of method. Log-log scatter plot with one point per deal; horizontal axis "Conventional" and vertical axis "Partition", each running from 10 to $10^7$; the fitted curve is labeled $1.57 x^{0.76}$.]

The method was also applied to full deals of 52 cards, which can be solved while expanding an average of 18,000 nodes per deal.[5] This works out to about a second of CPU time.

[5] The version of GIB that was released in October of 2000 replaced the transposition table with a data structure that uses a fixed amount of memory, and also sorts the moves based on narrowness (suggested by Plaat et al. (Plaat, Schaeffer, Pijls, & de Bruin, 1996) to be rooted in the idea of conspiracy search (McAllester, 1988)) and the killer heuristic. While the memory requirements are reduced, the overall performance is little changed.

3. Monte Carlo cardplay algorithms

One way in which we might use our perfect-information cardplay engine to proceed in a realistic situation would be to deal the unseen cards at random, biasing the deal so that it was consistent both with the bidding and with the cards played thus far. We could then analyze the resulting deal double dummy and decide which of our possible plays was the strongest. Averaging over a large number of such Monte Carlo samples would allow us to deal with the imperfect nature of bridge information. This idea was initially suggested by Levy (Levy, 1989), although he does not appear to have realized (see below) that there are problems with it in practice.

Algorithm 3.0.1 (Monte Carlo card selection) To select a move from a candidate set $M$ of such moves:

1. Construct a set $D$ of deals consistent with both the bidding and play of the deal thus far.
2. For each move $m \in M$ and each deal $d \in D$, evaluate the double dummy result of making the move $m$ in the deal $d$. Denote the score obtained by making this move $s(m, d)$.
3. Return that $m$ for which $\sum_d s(m, d)$ is maximal.

The Monte Carlo approach has drawbacks that have been pointed out by a variety of authors, including Koller† and others (Frank & Basin, 1998). Most obvious among these is that the approach never suggests making an "information gathering play." After all, the perfect-information variant on which the decision is based invariably assumes that the information will be available by the time the next decision must be made! Instead, the tendency is for the approach to simply defer important decisions; in many situations this may lead to information gathering inadvertently, but the amount of information acquired will generally be far less than other approaches might provide.

As an example, suppose that on a particular deal, GIB has four possible lines of play to make its contract:

1. Line A works if West has the ♠Q.
2. Line B works if East has the ♠Q.
3. Line C defers the guess until later.
4. Line D (the clever line) works independent of who has the ♠Q.

Assuming that either player is equally likely to hold the ♠Q, a Monte Carlo analyzer will correctly conclude that line A works half the time, and line B works half the time. Line C, however, will be presumed to work all of the time, since the contract can still be made (double dummy) if the guess is deferred. Line D will also be concluded to work all of the time (correctly, in this case).

As a result, GIB will choose randomly between the last two possibilities above, believing as it does that if it can only defer the guess until later (even the next card), it will make that guess correctly.
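The four-line example can be reproduced with a toy version of Algorithm 3.0.1. The sketch below is an illustration only: the two-deal sample, the `double_dummy` scoring function, and the `at_the_table` helper are all hypothetical stand-ins, not part of GIB. It shows the Monte Carlo sums rating lines C and D equally even though a declarer who cannot see the queen succeeds with C only when the later guess happens to be right.

```python
# Toy sample: the spade queen is equally likely to be with West or East.
DEALS = ["West", "East"]
LINES = ["A", "B", "C", "D"]


def double_dummy(line: str, queen_holder: str) -> int:
    """Return 1 if the contract makes with all four hands visible, else 0."""
    if line == "A":
        return int(queen_holder == "West")
    if line == "B":
        return int(queen_holder == "East")
    # Line C defers the guess, and a double-dummy solver always guesses right.
    # Line D works wherever the queen is.
    return 1


def monte_carlo_scores(lines, deals):
    """Step 3 of Algorithm 3.0.1: sum s(m, d) over the sample for each move m."""
    return {m: sum(double_dummy(m, d) for d in deals) for m in lines}


def at_the_table(line: str, queen_holder: str, guess: str) -> int:
    """Outcome when declarer cannot see the queen and must commit to a guess."""
    if line == "C":
        return int(guess == queen_holder)
    return double_dummy(line, queen_holder)


if __name__ == "__main__":
    print(monte_carlo_scores(LINES, DEALS))
    print(sum(at_the_table("C", d, guess="West") for d in DEALS),
          sum(at_the_table("D", d, guess="West") for d in DEALS))
```

In this toy model the sample totals for C and D tie, while the totals under an actual blind guess do not.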
The orret pla y , of ourse, is D . W e will disuss a solution to these diÆulties in Setions 5{7; although gib 's defensiv e ardpla y on tin ues to b e based on the ab o v e ideas, its delarer pla y no w uses stronger te h- niques. Nev ertheless, basing the ard pla y on the algorithm presen ted leads to extremely strong results, appro ximately at the lev el of a h uman exp ert. Sine gib 's in tro dution, all other omp etitiv e bridge-pla ying programs ha v e swit hed their ardpla y to similar meth- o ds, although gib 's double dumm y analysis is substan tially faster than most of the other programs and its pla y is orresp ondingly stronger. W e will desrib e three tests of GIB's ardpla y algorithms: P erformane on a om- merially a v ailable set of b en hmarks, p erformane in a h uman  hampionship designed to highligh t ardpla y in isolation, and statistial p erformane measured o v er a large set of deals. 319 Ginsber g F or the rst test, w e ev aluated the strength of gib 's ardpla y using Bridge Master (BM), a ommerial program dev elop ed b y Canadian in ternationalist F red Gitelman. BM on tains 180 deals at 5 lev els of diÆult y . Ea h of the 36 deals on ea h lev el is a problem in delarer pla y . If y ou mispla y the hand, BM mo v es the defenders' ards around if neessary to ensure y our defeat. BM w as used for the test instead of randomly dealt deals b eause the signal to noise ra- tio is far higher; go o d pla ys are generally rew arded and bad ones punished. Ev ery deal also on tains a lesson of some kind; there are no ompletely unin teresting deals where the line of pla y is irrelev an t or ob vious. There are dra wba ks to testing gib 's p erformane on non- randomly dealt deals, of ourse, sine the BM deals ma y in some w a y not b e represen tativ e of the problems a bridge pla y er w ould atually enoun ter at the table. The test w as run under Mirosoft Windo ws on a 200 MHz P en tium Pro. 
As a benchmark, Bridge Baron (BB) version 6 was also tested on the same deals using the same hardware.[6] BB was given 10 seconds to select each play, and GIB was given 90 seconds to play the entire deal with a maximum Monte Carlo sample size of 50.[7] New deals were generated each time a play decision needed to be made.

These numbers approximately equalized the computational resources used by the two programs; BB could in theory take 260 seconds per deal (ten seconds on each of 26 plays), but in practice took substantially less. GIB was given the auctions as well; there was no facility for doing this in BB. This information was critical on a small number of deals.

Here is how the two systems performed:

    Level      BB      GIB
    1          16       31
    2           8       23
    3           2       12
    4           1       21
    5           4       13
    Total      33      100
               18.3%    55.6%

Each entry is the number of deals that were played successfully by the program in question.

GIB's mistakes are illuminating. While some of them involve failing to gather information, most are problems in combining multiple chances (as in case D above). As BM's deals get more difficult, they more often involve combining a variety of possibly winning options and that is why GIB's performance falls off at levels 2 and 3.

At still higher levels, however, BM typically involves the successful development of complex end positions, and GIB's performance rebounds. This appeared to happen to BB as well, although to a much lesser extent. It was gratifying to see GIB discover for itself the complex end positions around which the BM deals are designed, and more gratifying still to witness GIB's discovery of a maneuver that had hitherto not been identified in the bridge literature, as described in Appendix B.

[6] The current version is Bridge Baron 10 and could be expected to perform guardedly better in a test such as this. Bridge Baron 6 does not include the Smith enhancements (Smith et al., 1996).

[7] GIB's Monte Carlo sample size is fixed at 50 in most cases, which provides a good compromise between speed of play and accuracy of result.

Experiments such as this one are tedious, because there is no text interface to a commercial program such as Bridge Master or Bridge Baron. As a result, information regarding the sensitivity of GIB's performance to various parameters tends to be only anecdotal.

GIB solves an additional 16 problems (bringing its total to 64.4%) given additional resources in the form of extra time (up to 100 seconds per play, although that time was very rarely taken), a larger Monte Carlo sample (100 deals instead of 50) and hand-generated explanations of the opponents' bids and opening leads. Each of the three factors appeared to contribute equally to the improved performance.

Other authors are reporting comparable levels of performance for GIB. Forrester, working with a different but similar benchmark (Blackwood, 1979), reports[8] that GIB solves 68% of the problems given 20 seconds/play, and 74% of them given 30 seconds/play. Deals where GIB has outplayed human experts are the topic of a series of articles in the Dutch bridge magazine IMP (Eskes, 1997, and sequels).[9]

Based on these results, GIB was invited to participate in an invitational event at the 1998 world bridge championships in France; the event involved deals similar to Bridge Master's but substantially more difficult. GIB joined a field of 34 of the best card players in the world, each player facing twelve such problems over the course of two days. GIB was leading at the halfway mark, but played poorly on the second day (perhaps the pressure was too much for it), and finished twelfth.
The human participants were given 90 minutes to play each deal, although they were penalized slightly for playing slowly. GIB played each deal in about ten minutes, using a Monte Carlo sample size of 500; tests before the event indicated little or no improvement if GIB were allotted more time. Michael Rosenberg, the eventual winner of the contest and the pre-tournament favorite, in fact made one more mistake than did Bart Bramley, the second place finisher. Rosenberg played just quickly enough that Bramley's accumulated time penalties gave Rosenberg the victory. The scoring method thus favors GIB slightly.

Finally, GIB's performance was evaluated directly using records from actual play. These records are available from high levels of human competition (world and national championships, typically), so that it is possible to determine exactly how frequently humans make mistakes at the bridge table. In Figure 4, we show the frequency with which this data indicates that a human declarer, leading to the $n$th trick of a deal, makes a mistake that causes his contract to become unmakeable on a double-dummy basis. The $y$ axis gives the frequency of the mistakes and is plotted logarithmically; as one would expect, play becomes more accurate later in the deal. We also give similar data for GIB, based on a large sample of deals that GIB played against itself. The error profiles of the two are quite similar.

Before turning to defensive play, let me point out that this method of analysis favors GIB slightly. Failing to make an information gathering play gets reflected in the above figure, since the lack of information will cause GIB to make a double-dummy mistake subsequently. But human declarers often work to give the defenders problems that exploit their relative lack of information, and that tactic is not rewarded in the above analysis.
Similar results for defensive play appear in Figure 5.

[8] Posting to rec.games.bridge on 14 July 1997.

[9] http://www.imp-bridge.nl

[Figure 4: GIB's performance as declarer. P(err), plotted logarithmically from 0.0001 to 0.1, against trick number from 0 to 12, with one curve for human play and one for GIB.]

[Figure 5: GIB's performance as defender. P(err), plotted logarithmically from 1e-05 to 0.1, against trick number from 0 to 12, with one curve for human play and one for GIB.]

There are two important technical remarks that must be made about the Monte Carlo algorithm before proceeding. First, note that we were cavalier in simply saying, "Construct a set $D$ of deals consistent with both the bidding and play of the deal thus far."

To construct deals consistent with the bidding, we first simplify the auction as observed, building constraints describing each of the hands around the table. We then deal hands consistent with the constraints using a deal generator that deals unbiased hands given restrictions on the number of cards held by each player in each suit. This set of deals is then tested to remove elements that do not satisfy the remaining constraints, and each of the remaining deals is passed to the bidding module to identify those for which the observed bids would have been made by the players in question. (This assumes that GIB has a reasonable understanding of the bidding methods used by the opponents.) The overall dealing process typically takes one or two seconds to generate the full set of deals needed by the algorithm.

Now the card play must be analyzed. Ideally, GIB would do something similar to what it does for the bidding, determining whether each player would have played as indicated on any particular deal. Unfortunately, it is simply impractical to test each hypothetical decision recursively against the cardplay module itself.
Instead, GIB tries to evaluate the probability that West (for example) has the ♠K (for example), and to then use these probabilities to weight the sample itself.

To understand the source of the weighting probabilities, let us consider a specific example. Suppose that in some particular situation, GIB plays the ♠5. The analysis indicates that 80% of the time that the next player (say West) holds the ♠K, it is a mistake for West not to play it. In other words, West's failure to play the ♠K leads to odds of 4:1 that he hasn't got it. These odds are now used via Bayes' rule to adjust the probability that West holds the ♠K at all.

The probabilities are then modified further to include information revealed by defensive signalling (if any), and the adjusted probabilities are finally used to bias the Monte Carlo sample. The evaluation $\sum_d s(m, d)$ in Algorithm 3.0.1 is replaced with $\sum_d w_d \, s(m, d)$, where $w_d$ is the weight assigned to deal $d$. More heavily weighted deals thus have a larger impact on GIB's eventual decision.

The second technical point regarding the algorithm itself involves the fact that it needs to run quickly and that it may need to be terminated before the analysis is complete. For the former, there are a variety of greedy techniques that can be used to ensure that a move $m$ is not considered if we can show $\sum_d s(m, d) \le \sum_d s(m', d)$ for some $m'$. The algorithm also uses iterative broadening (Ginsberg & Harvey, 1992) to ensure that a low-width answer is available if a high-width search fails to terminate in time. Results from the low- and high-width searches are combined when time expires.

Also regarding speed, the algorithm requires that for each deal in the Monte Carlo sample and each possible move, we evaluate the resulting position exactly.
Knowing simply that move $m_1$ is not as good as move $m_2$ for deal $d$ is not enough; $m_1$ may be better than $m_2$ elsewhere and we need to compare them quantitatively. This approach is aided substantially by the partition search idea, where entries in the transposition table correspond not to single positions and their evaluated values, but to sets of positions and values. In many cases, $m_1$ and $m_2$ may fall into the same entry of the partition table long before they actually transpose into one another exactly.

4. Monte Carlo bidding

The purpose of bidding in bridge is twofold. The primary purpose is to share information about your cards with your partner so that you can cooperatively select an optimal final contract. A secondary purpose is to disrupt the opponents' attempt to do the same.

In order to achieve this purpose, a wide variety of bidding "languages" have been developed. In some, when you suggest clubs as trumps, it means you have a lot of them. In others, the suggestion is only temporary and the information conveyed is quite different. In all of these languages, some meaning is assigned to a wide variety of bids in particular situations; there are also default rules that assign meanings to bids that have no specifically assigned meanings. Any computer bridge player will need similar understandings.

Bidding is interesting because the meanings frequently overlap; there may be one or more bids that are suitable (or nearly so) on any particular set of cards. Existing computer programs have simply matched possible bids against large databases giving their meanings, searching for that bid that best matches the cards that the machines hold. World champion Chip Martel reports† that human experts take a different approach.[10, 11]

[10] The 1994 Rosenblum Cup World Team Championship was won by a team that included Martel and Rosenberg.

[11] Frank suggests (Frank, 1998) that the existing machine approach is capable of reaching expert levels of performance. While this appears to have been true in the early 1980's (Lindelöf, 1983), modern expert bidding practice has begun to highlight the disruptive aspect of bidding, and machine performance is no longer likely to be competitive.

Although expert bidding is based on a database such as that used by existing programs, close decisions are made by simulating the results of each candidate action. This involves projecting how the bidding is likely to proceed and evaluating the play in one of a variety of possible final contracts. An expert gets his "judgment" from a Monte Carlo-like simulation of the results of possible bids, often referred to in the bridge-playing community as a Borel simulation (so named after the first player to describe the method). GIB takes a similar tack.

Algorithm 4.0.2 (Borel simulation) To select a bid from a candidate set $B$, given a database $Z$ that suggests bids in various situations:

1. Construct a set $D$ of deals consistent with the bidding thus far.
2. For each bid $b \in B$ and each deal $d \in D$, use the database $Z$ to project how the auction will continue if the bid $b$ is made. (If no bid is suggested by the database, the player in question is assumed to pass.) Compute the double dummy result of the eventual contract, denoting it $s(b, d)$.
3. Return that $b$ for which $\sum_d s(b, d)$ is maximal.

As with the Monte Carlo approach to card play, this approach does not take into account the fact that bridge is not played double dummy. Human experts often choose not to make bids that will convey too much information to the opponents in order to make the defenders' task as difficult as possible. This consideration is missing from the above algorithm.[12]

[12] In theory at least, this issue could be addressed using the single-dummy ideas that we will present in subsequent sections. Computational considerations currently make this impractical, however.

There are more serious problems also, generally centering around the development of the bidding database $Z$. First, the database itself needs to be built and debugged. A large number of rules need to be written, typically in a specialized language and dependent upon the bridge expertise of the author. The rules need to be debugged as actual play reveals oversights or other difficulties.

The nature and sizes of these databases vary enormously, although all of them represent very substantial investments on the part of the authors. The database distributed with Meadowlark Bridge includes some 7300 rules; that with Q-Plus Bridge 2500 rules comprising 40,000 lines of specialized code. GIB's database is built using a derivative of the Meadowlark language, and includes about 3000 rules.

All of these databases doubtless contain errors of one sort or another; one of the nice things about most bidding methods is that they tend to be fairly robust against such problems. Unfortunately, the Borel algorithm described above introduces substantial instability in GIB's overall bidding.

To understand this, suppose that the database $Z$ is somewhat conservative in its actions. The projection in step 2 of Algorithm 4.0.2 now leads each player to assume its partner bids conservatively, and therefore to bid somewhat aggressively to compensate. The partnership as a whole ends up overcompensating.
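Since the behavior at issue comes from step 2 of Algorithm 4.0.2, it may help to see the algorithm's shape in code. The following is a minimal sketch under stated assumptions, not GIB's implementation: `project_auction` is a hypothetical stand-in for consulting the database $Z$, `double_dummy_score` for the double dummy solver, and the toy deals are reduced to a single number (the tricks available in notrump).

```python
from typing import Callable, Iterable, Sequence


def borel_select(
    candidates: Sequence[str],
    deals: Iterable[int],
    project_auction: Callable[[str, int], str],
    double_dummy_score: Callable[[str, int], int],
) -> str:
    """Return the bid b maximizing the sum over deals d of s(b, d)."""
    deal_list = list(deals)

    def total(bid: str) -> int:
        # Step 2: project the final contract, then score it double dummy.
        return sum(
            double_dummy_score(project_auction(bid, d), d) for d in deal_list
        )

    # Step 3: pick the bid with the largest total.
    return max(candidates, key=total)


# --- Toy stand-ins, purely illustrative ---------------------------------

def toy_projection(bid: str, deal: int) -> str:
    """Assume everyone passes, so the bid becomes the final contract."""
    return bid


def toy_score(contract: str, tricks: int) -> int:
    """Non-vulnerable notrump scores for 2NT and 3NT, capped at down one."""
    needed = {"2NT": 8, "3NT": 9}[contract]
    if tricks < needed:
        return -50
    base = {"2NT": 120, "3NT": 400}[contract]
    return base + 30 * (tricks - needed)


if __name__ == "__main__":
    print(borel_select(["2NT", "3NT"], [8, 9, 10], toy_projection, toy_score))
```

Any bias in the projection function feeds directly into every term of the sum, which is why the quality of $Z$ matters so much to the result.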
Worse still, suppose that there is an omission of some kind in $Z$; perhaps every time someone bids 7♦, the database suggests a foolish action. Since 7♦ is a rare bid, a bidding system that matches its bids directly to the database will encounter this problem infrequently.

GIB, however, will be much more aggressive, bidding 7♦ often on the grounds that doing so will cause the opponents to make a mistake. In practice, of course, the bug in the database is unlikely to be replicated in the opponents' minds, and GIB's attempts to exploit the gap will be unrewarded or worse.

This is a serious problem, and appears to apply to any attempt to heuristically model an adversary's behavior: It is difficult to distinguish a good choice that is successful because the opponent has no winning options from a bad choice that appears successful because the heuristic fails to identify such options.

There are a variety of ways in which this problem might be addressed, none of them perfect. The most obvious is simply to use GIB's aggressive tendencies to identify the bugs or gaps in the bidding database, and to fix them. Because of the size of the database, this is a slow process.

Another approach is to try to identify the bugs in the database automatically, and to be wary in such situations. If the bidding simulation indicates that the opponents are about to achieve a result much worse than what they might achieve if they saw each other's cards, that is evidence that there may be a gap in the database. Unfortunately, it is also evidence that GIB is simply effectively disrupting its opponents' efforts to bid accurately.

Finally, restrictions could be placed on GIB that require it to make bids that are "close" to the bids suggested by the database, on the grounds that such bids are more likely to reflect improvements in judgment than to highlight gaps in the database.
All of these techniques are used, and all of them are useful. GIB's bidding is substantially better than that of earlier programs, but not yet of expert caliber.

The bidding was tested as part of the 1998 Baron Barclay/OKBridge World Computer Bridge Championships, and the 2000 Orbis World Computer Bridge Championship. Each program bid deals that had previously been bid and played by experts; a result of 0 on any particular deal meant that the program bid to a contract as good as the average expert result. A positive result was better, and a negative result was worse. There were 20 deals in each contest; although card play was not an issue, the deals were selected to pose challenges in bidding and a standard deviation of 5.5 IMPs/deal is still a reasonable estimate. One standard deviation over the 20 deal set could thus be expected to be about 25 IMPs.

GIB's final score in the 1998 bidding contest was +2 IMPs; in the 2000 contest it was +9 IMPs. In both cases, it narrowly edged out the expert field against which it was compared.[13] The next best program in 1998, Blue Chip Bridge, finished with a score of -35 IMPs, not dissimilar from the -37 IMPs that had been sufficient to win the bidding contest in 1997. The second place program in 2000 (once again Blue Chip Bridge) had a score of -2 IMPs.

5. The value of information

In previous sections of this paper, we have described Monte Carlo methods for dealing with the fact that bridge is a game of imperfect information, and have also described possible problems with this approach. We now turn to ways to overcome some of these difficulties.

For the moment, let me assume that we replace bridge with a $\{0, 1\}$ game, so that we are interested only in the question of whether declarer makes his contract. Overtricks or extra undertricks are irrelevant.
At least as a first approximation, bridge experts often look at hands this way, only subsequently refining the analysis. If you ask such an expert why he took a particular line on a deal, he will often say something like, "I was playing for each opponent to have three hearts," or "I was playing for West to hold the spade queen." What he is reporting is that set of distributions of the unseen cards for which he was expecting to make the hand.

At some level, the expert is treating the value of the game not as zero or one (which it would be if he could see the unseen cards), but as a function from the set of possible distributions of unseen cards into $\{0,1\}$. If we denote this set of distributions by $S$, the value of the game is thus a function

$$f : S \to \{0,1\}$$

We will follow standard mathematical notation and denote the set $\{0,1\}$ by $2$ and denote the set of functions $f : S \to 2$ by $2^S$. It is possible to extend max and min from the set $\{0,1\}$ to $2^S$ in a pointwise fashion, so that, for example

$$\min(f,g)(s) = \min(f(s), g(s)) \tag{7}$$

for functions $f, g \in 2^S$ and a specific situation $s \in S$. The maximizing function is defined similarly.

13. This is in spite of the earlier remark that GIB's bidding is not of expert caliber. GIB was fortunate in the bidding contests in that most of the problems involved situations handled by the database. When faced with a situation that it does not understand, GIB's bidding deteriorates drastically.

As an example, suppose that in a particular situation, there is one line of play $f$ that wins if West has the ♠Q. There is another line of play $g$ that wins if East has exactly three hearts. Now $\min(f,g)$ is the line of play that wins just in case both West has the ♠Q and East has three hearts, while $\max(f,g)$ is the line of play that wins if either condition obtains.
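The pointwise extension in (7) can be made concrete with a small sketch. Everything named here is an illustrative assumption rather than part of GIB: the toy situation space `SITUATIONS`, the helpers `as_function`, `pointwise_min`, `pointwise_max` and `support`. The sketch only shows that, once a function in $2^S$ is identified with the set of situations it maps to 1, the pointwise min is set intersection and the pointwise max is set union.

```python
from itertools import product

# Hypothetical toy situation space: (holder of the spade queen, East's heart length).
SITUATIONS = frozenset(product(["West", "East"], [2, 3, 4]))


def as_function(winning_set):
    """View a subset of SITUATIONS as its indicator function S -> {0, 1}."""
    return lambda s: 1 if s in winning_set else 0


def pointwise_min(f, g):
    """Equation (7): min(f, g)(s) = min(f(s), g(s))."""
    return lambda s: min(f(s), g(s))


def pointwise_max(f, g):
    """The analogous pointwise maximum."""
    return lambda s: max(f(s), g(s))


def support(f):
    """The set of situations that f maps to 1."""
    return frozenset(s for s in SITUATIONS if f(s) == 1)


# Line f wins when West has the spade queen; line g wins when East has three hearts.
f_set = frozenset(s for s in SITUATIONS if s[0] == "West")
g_set = frozenset(s for s in SITUATIONS if s[1] == 3)
f, g = as_function(f_set), as_function(g_set)

both = support(pointwise_min(f, g))    # wins only if both conditions hold
either = support(pointwise_max(f, g))  # wins if either condition holds
```

Because `both` equals `f_set & g_set` and `either` equals `f_set | g_set`, the two views (functions into $\{0,1\}$ and subsets of $S$) can be used interchangeably.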
It is important to realize that the set $2^S$ is not totally ordered by these max and min functions, like the unit interval is. Instead, $2^S$ is an instance of a mathematical structure known as a lattice (Grätzer, 1978, and Section 6). At this point, we note only that we can extend Definition 2.2.1 to any set with maximization and minimization operators:

Definition 5.0.3 A game is a tuple $(G, V, p_I, s, \mathrm{ev}, f_+, f_-)$ such that:

1. $G$ is a finite set of possible positions in the game.
2. $V$ is the set of values for the game.
3. $p_I \in G$ is the initial position of the game.
4. $s : G \to 2^G$ gives the successors of a given position.
5. $\mathrm{ev} : G \to \{\max, \min\} \cup V$ gives the value for terminal positions or indicates which player is to move for nonterminal positions.
6. $f_+ : \mathcal{P}(V) \to V$ and $f_- : \mathcal{P}(V) \to V$ are the combination functions for the maximizer and minimizer respectively.

The structures $G$, $V$, $p_I$, $s$ and $\mathrm{ev}$ are required to satisfy the following conditions (unchanged from Definition 2.2.1):

1. There is no sequence of positions $p_0, \ldots, p_n$ with $n > 0$, $p_i \in s(p_{i-1})$ for each $i$ and $p_n = p_0$. In other words, there are no "loops" that return to an identical position.
2. $\mathrm{ev}(p) \in V$ if and only if $s(p) = \emptyset$.

This definition extends Definition 2.2.1 only in that the value set and combination functions have been generalized. As such, Definition 5.0.3 includes both "conventional" games in which the values are numeric and the combination functions are max/min, and our more general setting where the values are functional and the combination functions combine them as described above.

As usual, we can use the maximization and minimization functions to assign a value to the root of the tree:

Definition 5.0.4 Given a game $(G, V, p_I, s, \mathrm{ev}, f_+, f_-)$, we introduce a function $\mathrm{ev}_c : G \to V$ defined recursively by

$$\mathrm{ev}_c(p) = \begin{cases} \mathrm{ev}(p), & \text{if } \mathrm{ev}(p) \in V; \\ f_+\{\mathrm{ev}_c(p') \mid p' \in s(p)\}, & \text{if } \mathrm{ev}(p) = \max; \\ f_-\{\mathrm{ev}_c(p') \mid p' \in s(p)\}, & \text{if } \mathrm{ev}(p) = \min. \end{cases}$$

The value of $(G, V, p_I, s, \mathrm{ev}, f_+, f_-)$ is defined to be $\mathrm{ev}_c(p_I)$.

The definition is well founded because the game has no loops, and it is straightforward to extend the minimax algorithm 2.2.3 to this more general formalism. We will discuss extensions of α-β pruning in the next section.

To flesh out our previous informal description, we need to instantiate Definition 5.0.3. We do this by having the value of any particular node correspond to the set of positions where the maximizer can win:

1. The set $G$ of positions is a set of pairs $(p, Z)$ where $p$ is a position with only two of the four bridge hands visible (i.e., a position in the "single dummy" game), and $Z$ is that subset of $S$ (the set of situations) that is consistent both with $p$ and with the cards that were played to reach $p$ from the initial position.
2. The value set $V$ is $2^S$.
3. The initial position $p_I$ is $(p_0, S)$, where $p_0$ is the initial single-dummy position.
4. The successor function is described as follows:
   (a) If the declarer/maximizer is on play in the given position, the successors are obtained by enumerating the maximizer's legal plays and leaving the set $Z$ of situations unchanged.
   (b) If the minimizer is on play in the given position, the successors are obtained by playing any card $c$ that is legal in any element of $Z$ and then restricting $Z$ to that subset for which $c$ is in fact a legal play.
5. Terminal nodes are nodes where all cards have been played, and therefore correspond to single situations $s$, since the locations of all cards have been revealed. For such a terminal position, if the declarer has made his contract, the value is $S$ (the entire set of positions possible at the root). If the declarer has failed to make his contract, the value is $S - \{s\}$.
6. The maximization and minimization functions are computed pointwise, so that $f_+(U, V) = U \cup V$ and $f_-(U, V) = U \cap V$.

Given an initial single-dummy situation $p$ corresponding to a set $S$ of situations, we will call the above game the $(p, S)$ game.

Proposition 5.0.5 Suppose that the set of situations for which the maximizer can make his contract is $T \subseteq S$. Then the value of the $(p, S)$ game is $T$.

It is natural to view $T$ as an element of $2^S$; it is the function mapping points in $T$ to 1 and points outside of $T$ to 0.

Proof. The proof proceeds by induction on the depth of the game tree. If the root node $p$ is also terminal, then $S = \{s\}$ and the value is clearly set correctly (to $\{s\}$ or $\emptyset$) by the definition of the $(p, S)$ game.

If $p$ is nonterminal, suppose first that it is a maximizing node. Now let $s \in S$ be some particular situation. If the maximizer can win in $s$, then there is some successor $(p', S')$ to $(p, S)$ where the maximizer wins, and hence by the inductive hypothesis, the value of $(p', S')$ is a set $U$ with $s \in U$. But since the maximizer moves in $p$, the value assigned to $(p, S)$ is a superset of the value assigned to any subnode, so that $s \in \mathrm{ev}_c(p, S) = T$.

If, on the other hand, the maximizer cannot win in $s$, then he cannot win in any child of $s$. If $(p_i, S_i)$ are the successors of $(p, S)$ in the game tree, then again by the inductive hypothesis, we must have $s \notin \mathrm{ev}_c(p_i, S_i)$ for each $i$. But

$$\mathrm{ev}_c(p, S) = \bigcup_i \mathrm{ev}_c(p_i, S_i)$$

so that $s \notin \mathrm{ev}_c(p, S) = T$.

For the minimizing case, suppose that the maximizer wins in $s$. Then the maximizer must win in every successor of $s$, so that $s \in \mathrm{ev}_c(p_i, S_i)$ for each such successor and therefore $s \in \mathrm{ev}_c(p, S)$.
Alternatively, if the minimizer wins in $s$, he must have a legal winning option so that $s \notin \mathrm{ev}_c(p_i, S_i)$ for some $i$ and therefore $s \notin \mathrm{ev}_c(p, S)$.

Unfortunately, Proposition 5.0.5 is in some sense exactly what we wanted not to prove: it says that our modified game computes the set of situations in which it is possible for the maximizer to make his contract if he has perfect information about the opponents' cards, not the set of situations in which it is possible for him to make his contract given his actual state of incomplete information.

Before we go on to deal with this, however, let me look at an example in some detail. The example we will use is similar to that of Section 3 and involves a situation where the maximizer can make his contract if either West has the ♠Q or East has three hearts. I will denote by $S$ the set of situations where West has the ♠Q, and by $T$ the set where East has three hearts. It's possible to tie in the "defer the guess" example from Section 3 as well, so I will do that also. Here is the game tree for the game in question:

[Game tree. The root is a maximizing node with four moves. Moves 1, 2 and 4 lead to minimizing nodes; move 3 leads to a second maximizing node with two minimizing children. Each minimizing node has two leaves, one on an edge labelled $S$ and one on an edge labelled $T$. Leaf values ($S$ edge, $T$ edge): move 1: (1, 0); move 2: (0, 1); move 3: (1, 0) and (0, 1); move 4: (1, 1).]

At the root node, the maximizer has four choices. If he makes the move on the left (playing for $S$, as it turns out), the minimizer then moves in a situation where the maximizer wins if $S$ holds and loses if $T$ holds. For the second move, where the maximizer is essentially playing for $T$, the reverse is true. In the third case, the maximizer defers the guess. We suppose that he is on play again immediately, forced to commit between playing for $S$ and playing for $T$. In the last case, he wins independent of whether $T$ or $S$ obtains.
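The four-move example can be evaluated one situation at a time, which is what a Monte Carlo sampler effectively does. The sketch below is only an illustration of that idea, not GIB's code: the tuple encoding of the tree and the helpers `leaf_pair`, `value`, `winning_moves` and `monte_carlo_scores` are assumptions made here, and each situation is reduced to the single label `"S"` or `"T"`.

```python
def leaf_pair(win_if_s, win_if_t):
    """A minimizing node whose two leaves are reached when S or T actually holds."""
    return ("min", {"S": win_if_s, "T": win_if_t})


# Hypothetical encoding of the example tree: a maximizing root with four moves.
TREE = ("max", [
    leaf_pair(1, 0),                               # move 1: play for S
    leaf_pair(0, 1),                               # move 2: play for T
    ("max", [leaf_pair(1, 0), leaf_pair(0, 1)]),   # move 3: defer the guess
    leaf_pair(1, 1),                               # move 4: wins in both cases
])


def value(node, situation):
    """Perfect-information minimax value of a node in one sampled situation."""
    kind, body = node
    if kind == "min":
        # Only the branch consistent with the sampled situation can occur.
        return body[situation]
    return max(value(child, situation) for child in body)


def winning_moves(situation):
    """Root moves (numbered from 1) that win when the situation is known."""
    _, moves = TREE
    return [i + 1 for i, move in enumerate(moves) if value(move, situation) == 1]


def monte_carlo_scores(sample):
    """Number of sampled situations in which each root move wins."""
    _, moves = TREE
    return [sum(value(move, w) for w in sample) for move in moves]


scores = monte_carlo_scores(["S", "T", "S", "S"])
```

On any sample, moves 3 and 4 receive identical scores, because in every individual situation the deferred guess can still be resolved correctly with the cards in view. The per-situation analysis therefore cannot prefer the move that wins outright.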
In the Monte Carlo setting, the above tree will actually be split based on the element of the sample in question. In some cases, $S$ will be true and we will examine only this subtree:

[Game tree restricted to $S$: only the edges labelled $S$ remain below the minimizing nodes. Leaf values: move 1: 1; move 2: 0; move 3: 1 and 0; move 4: 1.]

The maximizer can win by making any move other than the second. In the cases where $T$ obtains, we examine:

[Game tree restricted to $T$: only the edges labelled $T$ remain below the minimizing nodes. Leaf values: move 1: 0; move 2: 1; move 3: 0 and 1; move 4: 1.]

Here, the maximizer can win by making any move other than the first. In all cases, both of the last two moves win for the maximizer, since this approach cannot recognize the fact that the third move simply defers the guess while the fourth wins outright.

Now let us return to the situation where we include information about the sets that it is possible to play for. Here is the tree again:

[The full game tree once more, with 0/1 leaf values ($S$ edge, $T$ edge): move 1: (1, 0); move 2: (0, 1); move 3: (1, 0) and (0, 1); move 4: (1, 1).]

The first thing that we need to do is to realize that the terminal nodes should not be labelled with 1's and 0's but instead with sets where the maximizer can win. This produces:

[The same game tree with leaves labelled by sets. Leaf labels ($S$ edge, $T$ edge): move 1: ($S \cup T$, $S$); move 2: ($T$, $S \cup T$); move 3: ($S \cup T$, $S$) and ($T$, $S \cup T$); move 4: ($S \cup T$, $S \cup T$).]

To understand the labels, consider the two leftmost fringe nodes.
The leftmost node gets labelled with $T$ "for free" because $T$ is eliminated by the fact that the minimizer chose $S$. Since the maximizer wins in $S$, the maximizer wins in all cases. For the second fringe node, $S$ is included by virtue of the minimizer's moving to $T$; $T$ is not included because the minimizer actually wins on this line. Hence the label of $S$ for the node in question. This analysis assumes that $S$ and $T$ are disjoint; if they overlap, the labels become slightly more complex but the overall analysis is little changed.

Backing up the values one step gives us:

[Game tree after one backup step. The minimizing nodes take the values: move 1: $S$; move 2: $T$; move 3: $S$ and $T$; move 4: $S \cup T$.]

The minimizer, playing with perfect information, always does the best he can. The first interior node's label of $S$, for example, means that the maximizer wins only if $S$ actually is the case.

Of course, our definitions thus far imply that the maximizer is playing with perfect information as well, and we can back up the rest of the tree to get:

[Fully backed-up game tree. The maximizing node reached by move 3 has value $S \cup T$, and the root has value $S \cup T$.]

As before, the maximizer "wins" with either of the last two options.

[Figure 6: Defense vs. declarer play for humans. Plot of P(err), on a logarithmic scale from 1e-05 to 0.1, against trick number (0 to 12), with one curve for declaring and one for defending.]

Before we address the fact that the players do not in fact have perfect information, let me point out that in most bridge analyses, imperfect information is assumed to be an issue for the maximizer only. The defenders are assumed to be operating with complete information for at least the following reasons:

1. In general, there is a premium for declaring as opposed to defending, so that both sides want to declare. Typically, the pair with greater assets in terms of high cards wins the "bidding battle" and succeeds in becoming the declaring side, so that the overall assets available to the defenders in terms of high cards are generally less than those available to the declarer. This means that the defenders will generally be able to predict each other's hands with more accuracy than the declarer can.
2. The defenders can signal, conveying to one another information about the cards they hold. (As an example, play of an unnecessarily high card often indicates an even number of cards in the suit being played.) They are generally assumed to signal only information that is useful to them but not to declarer, once again improving their collective ability to play as if they had perfect information.
3. After the first two or three tricks, defenders' play is typically closer to double dummy than is the declarer's. This is shown in Figure 6, which contrasts the quality of human play as defender with the quality of human play as declarer; we make more mistakes declaring than defending as of trick four. (This figure is analogous to Figures 4 and 5.)

There are some deals where it is important for declarer to exploit uncertainty on the part of the defenders, but these are definitely the exception as opposed to the rule. This suggests that Proposition 5.0.5 is doing a reasonable job of modeling the defenders' cardplay, but the combination function for the maximizer needs to be modified to reflect the imperfect-information nature of his task.

To understand this, let us return to our putative expert, who suggested at the beginning of this section that he might be playing for West to hold the spade queen.
What he might say in a bit more detail is, "I could play for each opponent to hold exactly three hearts, or I could play for West to hold the spade queen. The latter was the better chance."

This suggests that the value assigned to the position by the maximizer is not a single set of situations (those in which he can make the contract), but a set $\mathcal{S}$ of sets of situations. Each set $S \in \mathcal{S}$ corresponds to one set of situations that the maximizer could play for, given his incomplete knowledge of the positions of the opposing cards. Extending the notation used earlier in this section, we will denote the set of sets of situations by $2^{2^S}$.

The maximizer's combination function on $2^{2^S}$ is given by

$$\max(\mathcal{F}, \mathcal{G}) = \mathcal{F} \cup \mathcal{G} \tag{8}$$

where each of $\mathcal{F}$ and $\mathcal{G}$ is a set of sets of situations. This says that if the maximizer is on play in a situation $p$, and he has one move that will allow him to select from a set $\mathcal{F}$ of things to "play for" and another move that will allow him to select from a set $\mathcal{G}$, then his choice at $p$ is to select from any element of $\mathcal{F} \cup \mathcal{G}$.

The minimizer's function is a bit more subtle. Suppose that at a node $p$, the minimizer can move to a successor with value $\mathcal{F} = \{F_i\}$, or to a successor with value $\mathcal{G} = \{G_j\}$. What value should we assign to $p$?

Since the minimizer has perfect information, he will always guarantee that the maximizer achieves the minimum value for the actual situation. Whatever element $F_i \in \mathcal{F}$ or $G_j \in \mathcal{G}$ is eventually selected by the maximizer, the eventual value of $p$ will be the minimum of $F_i$ and $G_j$. In other words

$$\min(\{F_i\}, \{G_j\}) = \{\min(F_i, G_j)\} \tag{9}$$

where the individual minima are computed using the perfect information rule (7).
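A minimal sketch of the two combination functions (8) and (9), assuming that a value in $2^{2^S}$ is stored as a `frozenset` of `frozenset`s of situation labels. The function names `maximize` and `minimize` and the three situation labels are hypothetical and chosen only for this illustration.

```python
def maximize(fam_f, fam_g):
    """Equation (8): the maximizer may play for anything offered by either move."""
    return fam_f | fam_g


def minimize(fam_f, fam_g):
    """Equation (9): every pair of targets is cut down to its pointwise minimum,
    which for sets of situations is the intersection, as in (7)."""
    return frozenset(f_i & g_j for f_i in fam_f for g_j in fam_g)


# Hypothetical situations: two where West has the spade queen, one where East has three hearts.
S = frozenset({"w1", "w2"})
T = frozenset({"e1"})

play_for_s = minimize(frozenset({S | T}), frozenset({S}))  # minimizer replies under move 1
guess_later = maximize(frozenset({S}), frozenset({T}))     # maximizer must still choose
nothing = minimize(frozenset({S}), frozenset({T}))         # no situation survives both
```

Storing families as frozensets keeps the values hashable, so duplicate targets produced by (9) collapse automatically.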
Definition 5.0.6 Let $G$ be the set of positions in an imperfect information game, a set of pairs $(p, Z)$ where $p$ is a position from the point of view of the maximizing player and $Z$ is the set of perfect information positions consistent with $p$. The imperfect information game for $G$ is the game $(G, V, p_I, s, \mathrm{ev}, f_+, f_-)$ where:

1. The value set $V$ is $2^{2^S}$.
2. The initial position $p_I$ is $(p_0, S)$, where $p_0$ is the initial imperfect information position and $S$ is the set of all perfect information positions consistent with it.
3. The successor function is described as follows:
   (a) If the maximizer is on play in the given position, the successors are obtained by enumerating the maximizer's legal plays and leaving the elements of the set $Z$ of situations unchanged.
   (b) If the minimizer is on play in the given position, the successors are obtained by playing any card $c$ that is legal in any element of $Z$ and then restricting $Z$ to those situations for which $c$ is in fact a legal play.
4. Terminal nodes are nodes where all cards have been played, and therefore correspond to single situations $s$. For such a terminal position, if the declarer has made his contract, the value is $(\{s\}, \{S\})$. If the declarer has failed to make his contract, the value is $(\{s\}, \{S - \{s\}\})$.
5. The maximization and minimization functions are given by (8) and (9) respectively.

Theorem 5.0.7 Suppose that the value of the imperfect information game for $G$ is $\mathcal{T}$. Then a set of positions $T$ is a subset of an element of $\mathcal{T}$ if and only if the maximizer has a strategy that wins in every element of $T$, assuming that the minimizer plays with perfect information.

Proof. Once again, the proof proceeds by induction on the depth of the game tree. And once again, the case where $p$ is a terminal position is handled easily by the definition.
For the inductive case, we consider the maximizer and minimizer separately. For the maximizer, suppose that there is some set $T$ of situations that satisfies the conditions of the theorem, so that the maximizer has a strategy that caters to all of the elements of $T$. Then the first move of that strategy will be some single move to a position $p_i$ that is a successor of $p$ and that caters to the elements of $T$. Thus if the value of the successful child is $\mathcal{F}$, $T$ is a subset of some $F \in \mathcal{F}$ by the inductive hypothesis. Thus if the value of the original game is $\mathcal{G}$, $T$ is a subset of an element of $\mathcal{G}$ by virtue of (8).

Alternatively, if $T$ is a set for which the maximizer has no such strategy, then clearly the maximizer cannot have a strategy after making any of the moves to the successor positions $p_i$. This means that there is no superset $U \supseteq T$ in any $\mathrm{ev}_c(p_i)$, and thus no superset of $T$ in $\mathrm{ev}_c(p)$ either.

The minimizing case is not really any harder. Suppose first that the maximizer has no strategy for succeeding in every situation in $T$. Then the minimizer (playing with perfect information) must have some move to a position $p_i$ with value $\mathcal{F}_i$ such that $T$ is not a subset of any element of $\mathcal{F}_i$. Now if $\mathcal{F}_i = \{T_i\}$, recall that

$$\min(\{T_i\}, \{U_j\}) = \{T_i \cap U_j\},$$

and $T \not\subseteq T_i$ for each $i$. Thus $T \not\subseteq T_i \cap U_j$ for each $i$ and $j$, and there is no $V \supseteq T$ with

$$V \in \min(\{T_i\}, \{U_j\})$$

For the last case, suppose that the maximizer does have a strategy for succeeding in every situation in $T$. That means that after any move for the minimizer, the maximizer will still have a strategy that succeeds in $T$, so that if $p_i$ are the successors of $p$ and $\mathrm{ev}_c(p_i) = \mathcal{T}_i$, then there is a $T_i \in \mathcal{T}_i$ with $T \subseteq T_i$. Now $T \subseteq \bigcap_i T_i \in \min(\mathcal{T}_i) = \mathrm{ev}_c(p)$. Thus $\mathrm{ev}_c(p)$ contains an element that is a superset of $T$.
Using this result, we can in theory compute exactly the set of things we might play for given a single-dummy bridge problem. Before we turn to the issues involved in doing so in practice, however, let me repeat the example of this section using the imperfect information technique. Here is the game tree again with values assigned to the terminal nodes:

[Game tree with terminal values in $2^{2^S}$. Leaf values ($S$ edge, $T$ edge): move 1: ($\{S \cup T\}$, $\{S\}$); move 2: ($\{T\}$, $\{S \cup T\}$); move 3: ($\{S \cup T\}$, $\{S\}$) and ($\{T\}$, $\{S \cup T\}$); move 4: ($\{S \cup T\}$, $\{S \cup T\}$).]

Backing up past the minimizer's final move gives us:

[Game tree after one backup step. The minimizing nodes take the values: move 1: $\{S\}$; move 2: $\{T\}$; move 3: $\{S\}$ and $\{T\}$; move 4: $\{S \cup T\}$.]

And we can now complete the analysis to finally get:

[Fully backed-up game tree. The maximizing node reached by move 3 has value $\{S, T\}$, and the root has value $\{S, T, S \cup T\}$.]

Note the difference in the values assigned to the maximizer's third and fourth choices at the first ply. The third choice has value $\{S, T\}$, indicating clearly that the maximizer will need to subsequently decide whether to play for $S$ or for $T$. But the fourth choice has value $\{S \cup T\}$, indicating that both possibilities are catered to.
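The backed-up values in this example can be reproduced mechanically. The following sketch is an illustration only, written under stated assumptions: the tree encoding, the helpers `leaf`, `min_node` and `evaluate`, and the reduction of $S$ and $T$ to one-element sets are all choices made here, not part of the paper. It applies (8) at maximizing nodes and (9) at minimizing nodes.

```python
from functools import reduce

# Hypothetical one-situation stand-ins for the sets S and T of the example.
S = frozenset({"west_has_spade_queen"})
T = frozenset({"east_has_three_hearts"})
ALL = S | T


def maximize(fam_f, fam_g):
    """Equation (8): union of the two families of targets."""
    return fam_f | fam_g


def minimize(fam_f, fam_g):
    """Equation (9): pairwise intersections of the targets."""
    return frozenset(f_i & g_j for f_i in fam_f for g_j in fam_g)


def leaf(branch, declarer_wins):
    """Terminal value on the minimizer's branch for situations `branch`.
    A win is worth everything; a loss is worth everything except that branch."""
    return frozenset({ALL if declarer_wins else ALL - branch})


def min_node(win_if_s, win_if_t):
    return ("min", [leaf(S, win_if_s), leaf(T, win_if_t)])


TREE = ("max", [
    min_node(True, False),                                  # move 1: play for S
    min_node(False, True),                                  # move 2: play for T
    ("max", [min_node(True, False), min_node(False, True)]),  # move 3: defer
    min_node(True, True),                                   # move 4: wins outright
])


def evaluate(node):
    """Back up values in 2^(2^S) through the tree."""
    if isinstance(node, frozenset):   # terminal value
        return node
    kind, children = node
    combine = maximize if kind == "max" else minimize
    return reduce(combine, [evaluate(child) for child in children])


move_values = [evaluate(move) for move in TREE[1]]
root_value = evaluate(TREE)
```

Here `move_values[2]` is the family containing `S` and `T` separately, while `move_values[3]` contains the single target `S | T`, which is exactly the distinction that the per-situation analysis could not draw.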
The value assigned to the root contains some redundancy (which we will deal with in Section 7), in that one of the maximizer's choices ($S \cup T$) dominates the others. Nevertheless, this value clearly indicates that the maximizer has an option available at the root that caters to both situations.

[Figure 7: Equivalent games? Left: a minimizing node with children $m_1$, $m_2$, $m_3$, $m_4$. Right: a minimizing node with children $m_2$ and a second minimizing node, whose children are $m_1$, $m_3$, $m_4$.]

6. Extending alpha-beta pruning to lattices

The results of the previous section allow us to deal with imperfect information in theory. Unfortunately, computing the value in theory is hardly the same as computing it in practice. Some ideas, such as transposition tables and partition search, can fairly obviously be applied to games with values taken from sets more general than total orders. But what about α-β pruning, the linchpin of high-performance adversary search algorithms? The answer here is far more subtle.

6.1 Some necessary definitions

Let us begin by considering the two small game trees in Figure 7, where the minimizer is on play at the nonfringe nodes and none of the $m_i$ is intended to be necessarily terminal. Are these two games always equivalent?

We would argue that they are. In the game on the left, the minimizer needs to select among the four options $m_1, m_2, m_3, m_4$. In the game on the right, he needs to first select whether or not to play $m_2$; if he decides not to, he must select among the remaining options. Since the minimizer has the same possibilities in both cases, we assume that the values assigned to the games are the same.

From a more formal point of view, the value of the game on the left is $f_-(m_1, m_2, m_3, m_4)$, while that of the game on the right is $f_-(m_2, f_-(m_1, m_3, m_4))$, where we have abused notation somewhat, writing $m_i$ for the value of the node $m_i$ as well.
Definition 6.1.1 A game will be called simple if for any $x \in v \subseteq V$,

$$f_+\{x\} = f_-\{x\} = x$$

and also

$$f_+(v) = f_+\{x, f_+(v - x)\} \quad\text{and}\quad f_-(v) = f_-\{x, f_-(v - x)\}$$

We have augmented the condition developed in the discussion of Figure 7 with the assumption that if a player's move in a position $p$ is forced (so that $p$ has a unique successor), then the value before and after the forced move is the same.

Proposition 6.1.2 For any simple game, there are binary functions $\wedge$ and $\vee$ from $V$ to itself that are commutative, associative and idempotent¹⁴ and such that

$$f_+\{v_0, \ldots, v_m\} = v_0 \vee \cdots \vee v_m \quad\text{and}\quad f_-\{v_0, \ldots, v_m\} = v_0 \wedge \cdots \wedge v_m$$

Proof. Induction on $m$.

When referring to a simple game, we will typically replace the functions $f_+$ and $f_-$ by the equivalent binary functions $\vee$ and $\wedge$. We assume throughout the rest of this section that all games are simple.¹⁵

The binary functions $\vee$ and $\wedge$ now induce a partial order $\le$, where we will say that $x \le y$ if and only if $x \vee y = y$. It is not hard to see that this partial order is reflexive ($x \le x$), antisymmetric ($x \le y$ and $y \le x$ if and only if $x = y$) and transitive. The operators $\vee$ and $\wedge$ behave like least upper bound and greatest lower bound operators, respectively, with regard to the partial order.

We also have the following:

Proposition 6.1.3 Whenever $S \subseteq T$, $f_+(S) \le f_+(T)$ and $f_-(S) \ge f_-(T)$.

In other words, assuming that the minimizer is trying to reach a low value in the partial order and the maximizer is trying to reach a high one, having more options is always good.

6.2 Shallow pruning

We are now able to investigate α-β pruning in our general framework. Let us begin with shallow pruning, shown in Figure 8.
The idea here is that if the minimizer prefers $x$ to $y$, he will never allow the maximizer even the possibility of selecting between $y$ and the value of the subtree rooted at $T$. After all, the value of the maximizing node in the figure is $y \vee \mathrm{ev}_c(T) \ge y \ge x$, and the minimizer will therefore always prefer $x$.

In order for the usual correctness proof for (shallow) α-β pruning to hold, we need the following condition to be satisfied:

Definition 6.2.1 (Shallow α-β pruning) A game $G$ will be said to allow shallow α-β pruning for the minimizer if

$$x \wedge (y \vee T) = x \tag{10}$$

for all $x, y, T \in V$ with $x \le y$. The game will be said to allow shallow α-β pruning for the maximizer if

$$x \vee (y \wedge T) = x \tag{11}$$

for all $x, y, T \in V$ with $x \ge y$. We will say that $G$ allows shallow pruning if it allows shallow α-β pruning for both players.

14. A binary function $f$ is called idempotent if $f(a, a) = a$ for all $a$.

15. We also assume that the games are sufficiently complex that we can find in the game tree a node with any desired functional value, e.g., $a \wedge (b \vee c)$ for specific $a$, $b$ and $c$. Were this not the case, none of our results would follow. As an example, a game in which the initial position is also terminal surely admits pruning of all kinds (since the game tree is empty) but need not satisfy the conclusions of the results in this section.

[Figure 8: $T$ can be pruned (shallowly) if $x \le y$. A minimizing root has two children: the node $x$ and a maximizing node, whose children are $y$ and the subtree $T$.]

The definition basically says that the backed up value at the root of the game tree is unchanged by pruning the maximizing subtree in the figure.

As we will see shortly, the expressions (10) and (11) describing shallow pruning are identical to what are more typically known as absorption identities.

Definition 6.2.2 Suppose $V$ is a set and $\wedge$ and $\vee$ are two binary operators on $V$. The triple $(V, \wedge, \vee)$ is called a lattice if $\wedge$ and $\vee$ are idempotent, commutative and associative, and satisfy the absorption identities in that for any $x, y \in V$,

$$x \vee (x \wedge y) = x \tag{12}$$
$$x \wedge (x \vee y) = x \tag{13}$$

We also have the following:

Definition 6.2.3 A lattice $(V, \wedge, \vee)$ is called distributive if $\wedge$ and $\vee$ distribute with respect to one another, so that

$$x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z) \tag{14}$$
$$x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z) \tag{15}$$

Lemma 6.2.4 Each of (12) and (13) implies the other. Each of (14) and (15) implies the other.

Proof. These are well known results from lattice theory (Grätzer, 1978).

Proposition 6.2.5 (Ginsberg & Jaffray, 2001) For a game $G$, the following conditions are equivalent:

1. $G$ allows shallow α-β pruning for the minimizer.
2. $G$ allows shallow α-β pruning for the maximizer.
3. $G$ allows shallow pruning.
4. $(V, \wedge, \vee)$ is a lattice.

[Figure 9: $T$ can be pruned (deeply) if $x \le y$. A game tree with alternating max and min levels, containing the node $x$, the node $y$ and the subtree $T$.]

Proof.¹⁶ We show that the first and fourth conditions are equivalent; everything else follows easily.

If $G$ allows shallow α-β pruning for the minimizer, we take $x = a$ and $y = T = a \vee b$ in (10). Clearly $x \le y$ so we get

$$a \wedge (y \vee y) = a \wedge y = a \wedge (a \vee b) = a$$

as in (13). For the converse, if $x \le y$, then $x \wedge y = x$ and

$$x \wedge (y \vee T) = (x \wedge y) \wedge (y \vee T) = x \wedge (y \wedge (y \vee T)) = x \wedge y = x.$$

6.3 Deep pruning

Deep pruning is a bit more subtle. An example appears in Figure 9. As before, assume $x \le y$. The argument is as described previously: Given that the minimizer has a guaranteed value of $x$ at the upper minimizing node, there is no way that a choice allowing the maximizer to reach $y$ can be on the main line; if it were, then the maximizer could get a value of at least $y$.

16. The proofs of this and Proposition 6.3.2 are due to Alan Jaffray.

Definition 6.3.1 (Deep α-β pruning) A game $G$ will be said to allow α-β pruning for the minimizer if for any $x, y, T, z_1, \ldots, z_{2i} \in V$ with $x \le y$,

$$x \wedge (z_1 \vee (z_2 \wedge \cdots \vee (z_{2i} \wedge (y \vee T))) \cdots ) = x \wedge (z_1 \vee (z_2 \wedge \cdots \vee z_{2i}) \cdots ).$$

The game will be said to allow α-β pruning for the maximizer if (now with $x \ge y$)

$$x \vee (z_1 \wedge (z_2 \vee \cdots \wedge (z_{2i} \vee (y \wedge T))) \cdots ) = x \vee (z_1 \wedge (z_2 \vee \cdots \wedge z_{2i}) \cdots ).$$

We will say that $G$ allows pruning if it allows α-β pruning for both players.

As before, the prune allows us to remove the dominated node ($y$ in Figure 9) and all of its siblings. The fact that a game allows shallow α-β pruning does not mean that it allows pruning in general, as is shown by the following counterexample.

The example involves a game with one card that is known to both players; only the suit of the card matters. The game tree appears in Figure 10. In this tree, a node labelled with a suit symbol is terminal and means that the maximizer wins if and only if the suit of the card matches the given symbol.

[Figure 10: The deep pruning counterexample. From the root down, the levels alternate max, min, max, min. The root (max, ply 1) chooses between the leaf ♣ and the ply 2 node; ply 2 (min) chooses between the leaf ♦ and the ply 3 node; ply 3 (max) chooses between the leaf ♥ and the ply 4 node; ply 4 (min) chooses between the leaves ♣ and 0.]

So at the root of the given tree, the maximizer (whose turn it is to play) can choose to "turn over" the card, winning if and only if it's a club, or can defer to the minimizer. The minimizer can choose to turn the card (losing just in case it's a diamond; the suit symbols refer to the maximizer's result), or hand the situation back to the maximizer. If the maximizer defers yet again, the minimizer can either turn over the card, losing if it's a club, or simply declare victory (presumably his choice).

There is one other wrinkle in this game. At any point in the game, the maximizer can change the card from either a diamond or a spade to a club.
Now let's consider the game itself. At ply 4, the minimizer will obviously choose to win the game. Thus at ply 3, the maximizer will need to choose ♥, winning just in case the card is a heart. But this means that at ply 2, the minimizer will win the game, since if the card is not a diamond he will move to the left (and win at once) while if the card is not a heart he can win by moving to the right. (Remember that the minimizer knows the suit of the card.) The upshot of this is that the maximizer wins the overall game if and only if the card in question is a club.

A formal analysis proceeds similarly, labelling the nodes as follows:

[Tree with leaves ♣, ♦, ♥, ♣, 0 and internal nodes labelled as follows]
  ply 1 (max): $\clubsuit = \clubsuit \vee 0$
  ply 2 (min): $0 = \diamondsuit \wedge \heartsuit$
  ply 3 (max): $\heartsuit = \heartsuit \vee 0$
  ply 4 (min): $0 = \clubsuit \wedge 0$

Note, incidentally, that the maximizer's ability to change the card does not help him win the game.

Now suppose that we apply deep pruning to this game. The ply four node is one where the minimizer can force a value of at most ♣, suggesting that the siblings of the bottom ♣ node can be pruned. But doing so produces the following tree:

[Tree with the 0 leaf at ply 4 marked "pruned?" and internal nodes labelled as follows]
  ply 1 (max): $\clubsuit \vee \diamondsuit$
  ply 2 (min): $\diamondsuit = \diamondsuit \wedge 1$
  ply 3 (max): $1 = \heartsuit \vee \clubsuit$
  ply 4 (min): $\clubsuit$

If the maximizer reaches ply 3, he can win by changing the card to a club if need be. Of course, the minimizer won't let the maximizer reach ply 3; at ply 2, he'll move left so that the maximizer wins only if the card is a diamond. That means that the maximizer wins at the root just in case the card is either a club or a diamond.

A partial graph of the values for this game is as follows:

[Diagram: partial order on the six values 0, ♣, ♦, ♥, ♠ and 1.]

where we have included the crucial fact that $x \wedge y = 0$ if $x \ne y$ (since the minimizer knows the card) and $\heartsuit \vee \clubsuit = 1$ because the maximizer can invoke his special rule.
Other least upper bounds are not shown in the diagram. The maximizing function $\vee$ moves up the figure; the minimizing function $\wedge$ moves down.

The deep prune fails because we can't "push" the value $\clubsuit \wedge 0$ past the ♥ to get to the ♣ near the root. Somewhat more precisely, the problem is that

$$\heartsuit = \heartsuit \vee (\clubsuit \wedge 0) \ne (\heartsuit \wedge \clubsuit) \vee (\heartsuit \wedge 0) = 0$$

This suggests the following:

Proposition 6.3.2 (Ginsberg & Jaffray, 2001) For a game $G$, the following conditions are equivalent:

1. $G$ allows $\alpha$-$\beta$ pruning for the minimizer.
2. $G$ allows $\alpha$-$\beta$ pruning for the maximizer.
3. $G$ allows pruning.
4. $(V, \wedge, \vee)$ is a distributive lattice.

Proof. As before, we show only that the first and fourth conditions are equivalent. Since pruning implies shallow pruning (take $i = 0$ in the definition), it follows that the first condition implies that $(V, \wedge, \vee)$ is a lattice.

From deep pruning for the minimizer with $i = 1$, we have that if $x \le y$, then for any $z_1, z_2, T$,

$$x \wedge (z_1 \vee (z_2 \wedge (y \vee T))) = x \wedge (z_1 \vee z_2)$$

Now take $y = T = x$ to get

$$x \wedge (z_1 \vee (z_2 \wedge x)) = x \wedge (z_1 \vee z_2) \tag{16}$$

It follows that each top level term in the left hand side of (16) is greater than or equal to the right hand side; specifically

$$z_1 \vee (z_2 \wedge x) \ge x \wedge (z_1 \vee z_2). \tag{17}$$

We claim that this implies that the lattice in question is distributive. To see this, let $u, v, w \in V$. Now take $z_1 = u \wedge w$, $z_2 = v$ and $x = w$ in (17) to get

$$(u \wedge w) \vee (v \wedge w) \ge w \wedge ((u \wedge w) \vee v) \tag{18}$$

But $v \vee (u \wedge w) \ge w \wedge (v \vee u)$ is an instance of (17), and combining this with (18) gives us

$$(u \wedge w) \vee (v \wedge w) \ge w \wedge ((u \wedge w) \vee v) \ge w \wedge w \wedge (v \vee u) = w \wedge (v \vee u)$$

This is the hard direction; $w \wedge (v \vee u) \ge (u \wedge w) \vee (v \wedge w)$ for any lattice because $w \wedge (v \vee u) \ge u \wedge w$ and $w \wedge (v \vee u) \ge v \wedge w$ individually. Thus $w \wedge (v \vee u) = (u \wedge w) \vee (v \wedge w)$, and deep pruning implies that the lattice is distributive.
For the converse, if the lattice is distributive and $x \le y$, then

$$x \wedge (z_1 \vee (z_2 \wedge (y \vee T))) = (x \wedge z_1) \vee (x \wedge z_2 \wedge (y \vee T)) = (x \wedge z_1) \vee (x \wedge z_2) = x \wedge (z_1 \vee z_2)$$

where the second equality is a consequence of the fact that $x \le (y \vee T)$, so that $x = x \wedge (y \vee T)$. This validates pruning for $i = 1$; deeper cases are similar.

Finally, note that in games where this result applies, we can continue to use Algorithms 2.2.5 or 2.3.3 without modification, since the prunes that they endorse continue to be sound as the game tree is expanded.

6.4 Application to imperfect information

In order to apply these ideas to games of imperfect information treated as in Section 5, we need to show that the value set introduced there is a (hopefully distributive) lattice.

To do this, recall that there is redundant information in an arbitrary element $F$ of $2^{2^S}$, since if $F$ contains both $T$ and $U$ with $T \subset U$ (in other words, the maximizer can play for either $T$ or for $U$ but $U$ is properly better), the set $T$ can be removed from $F$ without affecting the maximizer's options in any interesting way. This suggests the following:

Definition 6.4.1 Let $F \in 2^{2^S}$ for an arbitrary set $S$. We will say that $F$ is reduced if there are no $T, U \in F$ with $T \subset U$. We will say that $F_1$ is a reduction of $F_2$ if $F_1$ is reduced and $F_1 \subseteq F_2$.

Lemma 6.4.2 Every $F \in 2^{2^S}$ has a unique reduction.

Proof. This is immediate; just remove the subsumed elements from $F$.

We will denote the reduction of $F$ by $r(F)$.
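As an illustration of the reduction operator, the following is a minimal Python sketch, not GIB's implementation. It represents an element of $2^{2^S}$ as a frozenset of frozensets and computes $r(F)$ by discarding every member that is a proper subset of another member. It also sketches the reduced maximum and minimum that the text goes on to define as (19) and (20), and checks by brute force, for a two-element $S$, that they distribute over one another. All function names (`reduce_family`, `lattice_max`, `lattice_min`, `reduced_families`, `is_distributive`) are hypothetical and chosen only for this example.

```python
from itertools import combinations


def reduce_family(family):
    """Return r(F): drop each set that is a proper subset of another member."""
    members = set(family)
    return frozenset(t for t in members if not any(t < u for u in members))


def lattice_max(f, g):
    """Reduced maximum, as in (19): r(F union G)."""
    return reduce_family(f | g)


def lattice_min(f, g):
    """Reduced minimum, as in (20): r of all pairwise intersections."""
    return reduce_family(a & b for a in f for b in g)


def powerset(items):
    """All subsets of items, each as a frozenset."""
    items = list(items)
    return [frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)]


def reduced_families(situations):
    """Enumerate every reduced element of 2^(2^S); only feasible for tiny S."""
    subsets = powerset(situations)
    return {reduce_family(family) for family in powerset(subsets)}


def is_distributive(values):
    """Brute-force check of identities (14) and (15) over all triples."""
    return all(
        lattice_max(x, lattice_min(y, z))
        == lattice_min(lattice_max(x, y), lattice_max(x, z))
        and lattice_min(x, lattice_max(y, z))
        == lattice_max(lattice_min(x, y), lattice_min(x, z))
        for x in values for y in values for z in values
    )


if __name__ == "__main__":
    values = reduced_families("st")  # S = {s, t}
    print(len(values), is_distributive(values))
```

The brute-force enumeration grows doubly exponentially in the size of $S$, so it is useful only as a sanity check on very small sets of situations.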
Armed with this definition, we can now modify Definition 5.0.6 in the obvious way, replacing the value set $V$ with the set of reduced elements of $V$ and the maximizing and minimizing functions (8) and (9) with the reduced versions thereof, so that

$$\max(F, G) = r(F \cup G) \tag{19}$$

and

$$\min(\{F_i\}, \{G_j\}) = r(\{F_i \cap G_j\}) \tag{20}$$

Remember that we typically write $\vee$ for max and $\wedge$ for min.

Proposition 6.4.3 Given the above definitions, $(V, \vee, \wedge)$ is a distributive lattice.

Proof. We need to show that max and min as defined above are commutative, associative, and idempotent, that they distribute with respect to one another, and that the absorption identity (12) is satisfied.

Since the reduction operator clearly commutes with the initial definitions of max and min, commutativity, associativity and distributivity are obvious, as is the fact that $\vee$ is idempotent. To see that $\wedge$ is idempotent, we have

$$F \wedge F = r(\{\min(F_i, F_j)\}) = r(\{F_i \cap F_j\})$$

but each element of the set on the right hand side is a subset of $F_i \cap F_i$, so

$$F \wedge F = r(\{F_i\}) = r(F) = F.$$

For the absorption identity, we need to show that

$$F \vee (F \wedge G) = F$$

But

$$F \wedge G = r\{F_i \cap G_j\}$$

so

$$F \vee (F \wedge G) = r(F \vee r\{F_i \cap G_j\}) = r(\{F_i\} \cup \{F_i \cap G_j\}) = r(\{F_i\}) = r(F) = F$$

since, once again, each element of $F \wedge G$ is subsumed by the corresponding $F_i$.

It follows that an implementation designed to compute the value of an imperfect information game as described by Theorem 5.0.7 can indeed use $\alpha$-$\beta$ pruning to speed the computation.

6.5 Bridge implementation

Given this body of theory, we implemented a single-dummy version of GIB's double-dummy search engine. Not surprisingly, the most difficult element of the implementation was building efficient data structures for the manipulation of elements of $2^{2^S}$. To handle this, we represented each element of $S$ as a conjunction.
We first identified one of the two hidden hands $H$, and then for each card $c$, would write $c$ if $c$ were held by $H$ and $\neg c$ if $c$ were not held by $H$. An element of $2^S$ was then taken to be a disjunctive combination of these conjunctions, and an element of $2^{2^S}$ was taken to be a list of such disjunctions. The advantage of this representation was that logical inference could be used to construct the reduction of any such list.

In order to make this inference as efficient as possible, the disjunctions themselves were represented as binary decision diagrams, or BDDs (Lind-Nielsen, 2000). There are a variety of public domain implementations of BDDs available, and we used one provided by Lind-Nielsen (Lind-Nielsen, 2000) (see footnote 17).

The resulting implementation solves small endings (perhaps 16 cards left in total) quickly, but for larger endings the running times come to be dominated by the BDD computations; this is hardly surprising, since the size of individual BDDs can be exponential in the size of $S$ (the number of possible distributions of the unseen cards). We found that we were generally able to solve 32-card endings in about a minute, but that the running times were increasing by two orders of magnitude as each additional card was added.

This is both good news and bad news. Viewed positively, the performance of the system as constructed is far superior to the performance of preceding attempts to deal with the imperfect information arising in bridge. Frank et al., for example, are only capable of solving single suit combinations (13 cards left, give or take) using an algorithm that appears to take several minutes to run (Frank, Basin, & Matsubara, 1998). They subsequently improve the performance to an average time of 0.6 seconds (Frank et al., 2000), but are still restricted to problems that are too small to be of much use to a program intended to play the complete game.
17. We tried a variety of non-BDD based implementations as well. The BDD-based implementation was far faster than any of the others.

That's the good news. The bad news is that a program capable only of solving an 8-card ending in a minute is inappropriate for production use. GIB is a production program, expected to play bridge at human speeds. Another approach was therefore needed.

7. Solving single-dummy problems in practice

7.1 Achievable sets

The key to practical application of the ideas in the previous section is the realization that when it comes time to make a play, a single element of $\mathcal{F}$ must be selected: if you can play for West to have the ♠Q or for each player to have three hearts but cannot cater to both possibilities simultaneously, you eventually have to actually make the choice.

Definition 7.1.1 Suppose that the value of the imperfect information game for $G$ is $\mathcal{F}$. Given a specific $A \subseteq S$, we will say that $A$ is achievable if there is some $F \in \mathcal{F}$ for which $A \subseteq F$.

In other words, the set $A$ of situations is achievable if the maximizer has a plan that wins for all elements of $A$.

Definition 7.1.2 Given a set $S$ of situations, a payoff function for $S$ is any function $f : 2^S \to \mathbb{R}$ such that $f(U) \ge f(T)$ whenever $U \supseteq T$.

The payoff function evaluates potential achievable sets.

Definition 7.1.3 Let $G$ be a game and $S$ the associated set of situations. If $f$ is a payoff function for $S$, a solution to $G$ under $f$ is any achievable set $A$ for which $f(A)$ is maximal.

In practice, we need not find the actual value of the game; finding a solution to $G$ under an appropriate payoff function suffices.
In bridge, the payoff function is presumably the probability that the cards are dealt as in the set $A$; this function clearly increases with increasing set size as required by Definition 7.1.2 and can be evaluated in practice using the Monte Carlo sample of Section 3.

Instead of finding the solution to an imperfect information game, suppose that we have a Monte Carlo sample for the game consisting of a set of situations $S = \{s_i\}$ that is ordered as $i = 0, \ldots, n$. We can now produce an achievable set $A$ as follows:

Algorithm 7.1.4 To construct a maximal achievable set $A$ from a sequence $\langle s_0, \ldots, s_n \rangle$ of situations:

1. Set $A = \emptyset$.
2. For $i = 0, \ldots, n$, if $A \cup \{s_i\}$ is achievable, set $A = A \cup \{s_i\}$.

The algorithm constructs the achievable set in a greedy fashion, gradually adding elements of $S$ to $A$ until no more can be added.

Definition 7.1.5 Given a game $G$ and a sequence $S$ of situations, the achievable set induced by $S$ for $G$ is the set constructed by Algorithm 7.1.4.

From a computational point of view, the expensive step in the algorithm is determining whether or not the set $A \cup \{s_i\}$ is achievable. This is relatively straightforward, however, since the focus on a specific set effectively replaces the game $G$ with a new game with values in $\{0, 1\}$. At any particular node $n$, if expanding $n$ demonstrates that $A \cup \{s_i\}$ is not achievable, the value of the game is zero. If expanding $n$ indicates that $A \cup \{s_i\}$ is achievable once $n$ is reached, then the value of the node $n$ is one. Although the search space is unchanged from that of the original imperfect information game as in Definition 5.0.6, there is no longer any need to manipulate complex values, and the check for achievability is therefore tractable in practice.

Let me illustrate this by returning to our usual example of Section 5.
Here is the fully evaluated tree once again:

[Tree diagram: the fully evaluated tree of Section 5. Node values as extracted: {S ∪ T}, {S, T}, {S}, {T}, {S ∪ T}, {S}, {T}, {S ∪ T}, {S}, {T}, {S ∪ T}, {S ∪ T}, {S ∪ T}, {S ∪ T}, {S}, {T}, {S ∪ T}.]

Note that we have replaced the value at the root with its reduction.

Now suppose that we view the set of positions as containing only two elements, $s \in S$ and $t \in T$. Presumably West holds the ♠Q in $s$, and East holds three hearts in $t$. If the ordering chosen is $\langle s, t \rangle$, then we first try to achieve $\{s\}$. In this context, a node $n$ is a win for the maximizer if either the maximizer can indeed win at $n$ or $s$ is no longer possible (in which case the maximizer's ability to achieve $\{s\}$ is undiminished). The game tree becomes:

[Tree diagram: the same tree with levels labelled max, max, min, min, min, min, min and branches labelled S or T. Leaf values as extracted: 1 1 0 1 1 1 1 1 0 1.]

All of the $T$ branches are wins for the maximizer (who is concerned with $s$ only), and the $S$ branches are wins just in case the maximizer does indeed win (as he does if he guesses right at either of the first two plies). Backing up the values gives us:

[Tree diagram: the same tree with backed-up values. Node values as extracted: 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1.]

This indicates (correctly) that the maximizer can achieve $s$ provided that he doesn't decide to play for $T$ at the root of the tree.
Note that this analysis is a straight minimax, allowing fast algorithms to be applied while avoiding the manipulation of elements of $2^{2^S}$ described in the previous section.

Now we add $t$ to our achievable set, which thus becomes $\{s, t\}$. The maximizer wins only if he really does win (and not just because he isn't interested in $T$ any more), and the basic tree becomes:

[Tree diagram: the same tree with levels labelled max, max, min, min, min, min, min and branches labelled S or T. Leaf values as extracted: 1 0 0 1 1 1 1 0 0 1.]

Backing up the values gives:

[Tree diagram: the same tree with backed-up values. Node values as extracted: 1 0 0 0 1 0 0 1 0 0 1 1 1 1 0 0 1.]

The maximizer can achieve the extended result only by making the rightmost move, as desired.

What if the rightmost branch did not exist, so that the maximizer were unable to combine his chances? Now the value of the root node in the above tree is 0, so that $\{s, t\}$ is not achievable. The maximal achievable set returned by the algorithm would be $S$; had the ordering been $\langle t, s \rangle$, an alternative maximal achievable set of $T$ would have been returned instead. In any event, we have:

Proposition 7.1.6 Given a game $G$ and a sequence $S$ of situations, let $A$ be the achievable set induced by $S$ for $G$. Then no proper superset of $A$ in $S$ is achievable.

Proof. This is straightforward. For any element $s \in S - A$, we know that $U \cup \{s\}$ is not achievable for some $U \subseteq A$. Thus $A \cup \{s\}$ is not achievable as well.
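The greedy construction of Algorithm 7.1.4 and the 0/1 achievability check can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not GIB's code: it assumes that each leaf of the game tree is annotated with the set of situations in which the maximizer wins there, so that a target set $A$ is achievable at a leaf exactly when $A$ is contained in that set, at a max node when it is achievable at some child, and at a min node when it is achievable at every child. The `Node` class and all function names are hypothetical, and the small trees in the usage section only mimic the shape of the running example (play for $s$, play for $t$, or a combined line).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Node:
    kind: str                            # "max", "min" or "leaf"
    wins: FrozenSet[str] = frozenset()   # leaf only: situations the maximizer wins
    children: Tuple["Node", ...] = ()


def leaf(*situations: str) -> Node:
    return Node("leaf", wins=frozenset(situations))


def max_node(*children: Node) -> Node:
    return Node("max", children=children)


def min_node(*children: Node) -> Node:
    return Node("min", children=children)


def achievable(node: Node, target: FrozenSet[str]) -> bool:
    """Plain 0/1 minimax: can the maximizer win in every situation in target?"""
    if node.kind == "leaf":
        return target <= node.wins
    results = (achievable(child, target) for child in node.children)
    return any(results) if node.kind == "max" else all(results)


def greedy_achievable_set(root: Node, ordering: Iterable[str]) -> FrozenSet[str]:
    """Algorithm 7.1.4: add each situation in turn if the set stays achievable."""
    achieved: FrozenSet[str] = frozenset()
    for situation in ordering:
        candidate = achieved | {situation}
        if achievable(root, candidate):
            achieved = candidate
    return achieved


if __name__ == "__main__":
    # Hypothetical tree: play for s, play for t, or a line that caters to both.
    with_combined_line = max_node(
        leaf("s"), leaf("t"), min_node(leaf("s", "t"), leaf("s", "t")))
    without_combined_line = max_node(leaf("s"), leaf("t"))
    print(sorted(greedy_achievable_set(with_combined_line, ["s", "t"])))
    print(sorted(greedy_achievable_set(without_combined_line, ["s", "t"])))
    print(sorted(greedy_achievable_set(without_combined_line, ["t", "s"])))
```

Because the check returns only true or false, any standard minimax speedup (including ordinary pruning) could be applied to `achievable`; the sketch omits these for brevity.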
Algorithm 7.1.4 allows us to construct maximal achievable sets relative to our Monte Carlo sample; recall that we are taking our sequence $S$ of situations to be any ordering of the sample itself. In practice, however, it is important not to focus too sharply on the sample itself, lest the eventual achievable set constructed overfit irrelevant probabilistic characteristics of that sample.

This can be accomplished by replacing the simple union in step 2 of the algorithm with some more complicated operation that captures the idea of "situations that are either like $s_i$ or like those already in $A$." In bridge, for example, $A$ might be all situations where West has two or three hearts, and $s_i$ might be some new situation where West has four hearts. The generalized union would be situations where West has two, three or four hearts. If this more general set is not achievable, another attempt could be made with the simple union. If we denote the "general union" by $\oplus$, Algorithm 7.1.4 becomes:

Algorithm 7.1.7 To construct an achievable set $A$ from a sequence $\langle s_0, \ldots, s_n \rangle$ of situations:

1. Set $A = \emptyset$.
2. For $i = 0, \ldots, n$:
   (a) If $A \oplus \{s_i\}$ is achievable, set $A = A \oplus \{s_i\}$.
   (b) Otherwise, if $A \cup \{s_i\}$ is achievable, set $A = A \cup \{s_i\}$.

This algorithm can be used in practice to find achievable sets that are either maximal or effectively so over the set of all possible instances, not just those appearing in the Monte Carlo sample.

7.2 Maximizing the payoff

It remains to find not just maximal achievable sets, but ones that approximate the solution to the game in question given a particular payoff function. To understand how we do this, let me draw an analogy between the problem we are trying to solve and resource-constrained project scheduling (RCPS).
In RCPS, one has a list of tasks to be performed, together with ordering constraints saying that certain tasks need to be performed before others. In addition, each task uses a certain quantity of various resources; there are limitations on the availability of any particular resource at any particular time. As an example, building an aircraft wing may involve fabricating the top and bottom flight surfaces, building the aileron, and attaching the two. It should be clear that the aileron cannot be attached until both it and the wing have been constructed. Building each section may involve the use of three sheetmetal workers, but only five may be available in general.

The goal in an RCPS problem is typically to minimize the length of the schedule (often called the makespan) without exceeding the resource limits. In building a wing, it is more efficient (and more cost effective) to build it quickly than slowly.

Many production scheduling systems try to minimize makespan by building the schedule from the initial time forward. At each point, they select a task all of whose predecessors have been scheduled, and then schedule that task as early as possible given the previously scheduled tasks and the resource constraints.

Scheduling the tasks in this way produces a locally optimal schedule that may be improved by modifying the order in which the tasks are selected for scheduling. One method for finding an appropriate modification to the selection order is known as squeaky wheel optimization, or SWO (Joslin & Clements, 1999).
In SWO, a locally optimal schedule is examined to determine which tasks are scheduled most suboptimally relative to some overall metric; those tasks are deemed to "squeak" and are then advanced in the task list so that they are scheduled earlier when the schedule is reconstructed. This process is repeated, producing a variety of candidate solutions to the scheduling problem at hand; one of these schedules is typically optimal or nearly so.

Applying SWO to our game-playing problem is relatively straightforward (see footnote 18). When we use Algorithm 7.1.7 to construct an achievable set, we also construct as a byproduct a list of sample elements to which that achievable set cannot be extended; moving elements of this list forward in the sequence $\langle s_0, \ldots, s_n \rangle$ will cause them to be more likely to be included in the achievable set $A$ if the algorithm is reinvoked. The weights assigned to the failing sequence elements can be constructed by determining how representative each particular element is of the remainder of the sample.

Returning to our example, suppose that the set $S$ (where West has the ♠Q) has a single representative $s_1$ in the Monte Carlo sample (presumably this means it is unlikely for West to hold the card in question), while $T$ has five such representatives $t_1$, $t_2$, $t_3$, $t_4$ and $t_5$. Suppose also that the initial ordering of the six elements is $\langle s_1, t_4, t_2, t_1, t_5, t_3 \rangle$. Assuming that the maximizer loses his rightmost option (so that he cannot cater to $S$ and $T$ simultaneously), the maximal achievable set corresponding to this ordering is $S$.

An examination now reveals that all of the $t_i$'s could have been achieved but weren't; in SWO terms, these elements of the sample "squeak." At the next iteration, the priorities of the $t_i$'s are increased by moving them forward in the sequence, while the priority of $s_1$ falls.
Perhaps the new ordering is $\langle t_4, t_2, s_1, t_1, t_5, t_3 \rangle$. This ordering can be easily seen to lead to the maximal achievable set $T$; $S \cup T$ is still unachievable. But the payoff assigned to $T$ is likely to be much better than that assigned to $S$ (a probability of roughly 0.8 instead of 0.2, that is, 5/6 against 1/6, if the Monte Carlo sample itself is unweighted). It is in this way that SWO allows us to find a globally optimal (or nearly so) achievable set.

18. Squeaky wheel optimization was developed at the University of Oregon; the patent application for the technique has been allowed by the U.S. Patent and Trademark Office. The University's interests in SWO are licensed exclusively to On Time Systems, Inc. for use in scheduling and related applications, and to Just Write, Inc. for use in bridge-playing systems.

7.3 Results

Our implementation of GIB's cardplay when declarer is based on the ideas described above. (As a defender, a direct Monte Carlo approach appears preferable because enough information is typically available about declarer's hand to make the double-dummy assumption reasonably valid.) The implementation is fast enough to conform to the time requirements placed on a production program (roughly one CPU minute to play each deal).

Evaluating the impact of these ideas on GIB's cardplay is difficult, since declarer play is already the strongest aspect of its game. In extended matches between the two versions of GIB, the approach based on the ideas described here beats the Monte Carlo based version by approximately 0.1 IMPs/deal, but there is a great deal of noise in the data because most of the swings correspond to differences in bidding or defensive play. It is possible to remove some of these differences artificially (requiring the bidding to be identical both times the deal is played, for example), but defensive differences remain.
Nevertheless, GIB is currently a strong enough player that the 0.1 IMPs/deal difference is significant.

The situation on problem deals, such as those from the par contests or from the Gitelman sets, is much clearer. In addition, many of the deals that GIB gets "wrong" are in fact deals that GIB plays correctly but that the problem composers play incorrectly (Gitelman or, in the case of the par contests, Swiss bridge expert Pietro Bernasconi). In the following table, we have been generous with all parties, deeming a line to be correct if it is not clearly inferior to another. Let me point out that the designers of the problems are attempting to construct deals where there is a unique solution (the "answer" to the test they are posing the solver), so that a deal with multiple solutions is in fact one that the composer has already misanalyzed.

Source              size   BB   GIB MC   GIB SWO   composer   ambiguous
BM level 1           36    16     31       36         35          0
BM level 2           36     8     23       34         34          1
BM level 3           36     2     12       34         34          2
BM level 4           36     1     21       31         34          4
BM level 5           36     4     13       28         34          5
1998 par contest     12     0      5       11         12          2
1990 par contest     18     0      8       14         17          3

The rows are in order of increasing difficulty; it was universally felt among the human competitors that the deals in the 1990 par contest were far more difficult than those in 1998. The columns are as follows:

Source is the source from which the problems were obtained.
Size is the number of problems available from this particular source.
BB gives the number of problems solved correctly by Bridge Baron 6.
GIB MC gives the number solved correctly by GIB using a Monte Carlo approach.
GIB SWO gives the number solved correctly by GIB using SWO and achievable sets.
composer gives the number solved correctly by the composer (in that the intended solution was the best one available).
ambiguous gives the number misanalyzed by the composer (in that multiple solutions exist).
350 GIB: Imperfet inf orma tion in a omput a tionall y hallenging game Note, iniden tally , that gib 's p erformane is still less than p erfet on these problems. The reason is that gib 's sample ma y b e sk ew ed in some w a y , or that sw o ma y fail to nd a global optim um among the set of p ossible a hiev able sets. 8. Conlusion 8.1 GIB ompared Other programs Gib partiipated in b oth the 1998 and the 2000 W orld Computer Bridge Championships. (There w as no 1999 ev en t.) Pla y w as organized with ea h ma hine pla ying t w o hands and the omp etitors b eing trusted not to  heat b y \p eeking" at partner's ards or those of the opp onen ts. 19 Ea h tournamen t b egan with a omplete round robin among the programs, with the top four programs on tin uing to a kno  k out phase. The mat hes in the round robin w ere quite short, and it w as exp eted that bridge's sto  hasti elemen t w ould k eep an y program from b eing ompletely dominan t. While this ma y ha v e b een true in theory , in pratie gib dominated b oth round robins, winning all of its mat hes in 1998 and all but one in 2000. The round robin results from the 2000 ev en t w ere as follo ws: 20 Gib WB Mir o Buff Q-Plus Chip Bar on M'lark T otal Gib { 14 11 16 7 19 16 17 100 WBridge 6 { 19 13 16 7 18 20 99 Mir o 9 1 { 18 15 15 13 20 91 Buff 4 7 2 { 12 20 5 20 70 Q-Plus 13 4 5 8 { 11 14 11 66 Blue Chip 1 13 5 0 9 { 11 20 59 Bar on 4 2 7 15 6 9 { 14 57 Meado wlark 3 0 0 0 9 0 6 { 18 Ea h mat h w as on v erted rst to imp s and then to vitory p oints , or VPs, with the t w o omp eting programs sharing the 20 VPs a v ailable in ea h mat h. The rst en try in the ab o v e table indiates that gib b eat wbridge b y 14 VPs to 6; the fourth that gib lost to q-plus bridge b y 7 VPs to 13. (This is gib 's only loss ev er to another program in tournamen t pla y .) In the 1998 kno  k out phase, gib b eat Bridge Baron in the seminals b y 84 imp s o v er 48 deals. 
Had the programs been evenly matched, the IMP difference could be expected to be normally distributed, and the observed 84 IMP difference would be a 2.2 standard deviation event. GIB then beat Q-Plus Bridge in the finals by 63 IMPs over 64 deals (a 1.4 standard deviation event). In 2000, it beat Bridge Buff by 39 IMPs over 48 deals in the semifinals (a 1.0 standard deviation event) and then beat WBridge by 101 IMPs over 58 deals (a 2.6 standard deviation event). The finals had been scheduled to run 64 deals, but WBridge conceded after 58 had been played.

19. Starting with the 2001 event, each computer will handle only one of the four players, although there is still no attempt to prevent the (networked) computers from transmitting illegal information between partners.

20. There were eight competitors in the event: GIB (www.gibware.com), Hans Leber's Q-Plus (www.q-plus.com), Tomio and Yumiko Uchida's Micro Bridge (www.threeweb.ad.jp/~mbridge), Mike Whittaker and Ian Trackman's Blue Chip Bridge (www.bluechipbridge.co.uk), Rod Ludwig's Meadowlark Bridge (rrnet.com/meadowlark), Bridge Baron (www.bridgebaron.com), and two newcomers: Doug Bannion's Bridge Buff (www.bridgebuff.com) and Yves Costel's WBridge (ourworld.compuserve.com/homepages/yvescostel).

The most publicized deal from the final was this one, an extremely difficult deal that both programs played moderately well. GIB reached a better contract and was aided somewhat by WBridge's misdefense in a moderately complex situation.

                      ♠ K Q 9
                      ♥ A Q J
                      ♦ 9 6 4 3 2
                      ♣ 8 6
♠ 10 6                                  ♠ 8 7 3 2
♥ 10 9 2                                ♥ 7 5 3
♦ 10                                    ♦ A K Q J 8 5
♣ A J 10 9 5 3 2                        ♣ –
                      ♠ A J 5 4
                      ♥ K 8 6 4
                      ♦ 7
                      ♣ K Q 7 4

When WBridge played the North-South cards and GIB was East-West, North opened 1♦ and eventually played in three notrump, committing to taking nine tricks.
The GIB East started with four rounds of diamonds as South discarded two clubs and . . . ?

Looking at all four hands, the contract is cold; South can discard another club and East has none to play. There are thus nine tricks: four in each of hearts and spades, and the diamond nine.

Give East a club, however, and the contract rates to be down no less than four since the defense will be able to take at least four club tricks. WBridge decided to play safe, keeping the ♣KQ and discarding a heart. There are now only eight tricks and the contract was down one.

The bidding and play were more interesting when GIB was N-S. North opened 1NT, showing 11–14 HCP without four hearts or spades unless exactly three cards were held in every other suit. East overcalled a natural 2♦ and South cue bid 3♦, showing weakness in diamonds and asking North to bid a 4-card heart or spade suit if he had one.

North has no good bid at this point. Bidding 3NT with five small diamonds rates to be wrong and 4♣ is clearly out of the question. GIB's simulation suggested that 3♠ (ostensibly showing four of them) was the least of evils.

South raised to 4♠, and East doubled, ending the auction. East led a top diamond, and shifted to the ♥3, won by North's ♥Q. GIB now cashed the ♥J and led the ♣6, which East chose (wrongly) to ruff. WBridge now led the ♦K as East, which was ruffed with the ♠J. GIB was now able to cash the ♠AK to produce:

                      ♠ Q
                      ♥ A
                      ♦ 9 6 2
                      ♣ 8
♠ –                                     ♠ 8
♥ –                                     ♥ 7
♦ –                                     ♦ Q J 8 5
♣ A J 10 9 5 3                          ♣ –
                      ♠ 5
                      ♥ K 8
                      ♦ –
                      ♣ K Q 4

Knowing the position exactly, GIB needed five more tricks with North to lead. It ruffed a diamond, returned to the ♥A and drew East's trump with the ♠Q. Now a club forced an entry to the South hand, where the ♥K provided the tenth trick.
Humans

GIB played a 14-deal demonstration match against human world champions Zia Mahmood and Michael Rosenberg[21] in the AAAI Hall of Champions in 1998, losing by a total of 6.4 imps (a 0.3 standard deviation event). Early versions of GIB also played on OKBridge, an internet bridge club with some 15,000 members.[22] After playing thousands of deals against human opponents of various levels, GIB's ranking was comparable to the OKBridge average.

It is probable that neither of these results is an accurate reflection of GIB's current strength. The Mahmood-Rosenberg match was extremely short and GIB appeared to have the best of the luck. The OKBridge interface has changed and the GIB "OKbots" no longer function. The performance figures there are thus somewhat outdated, predating various recent improvements including all of the ideas in Sections 5-7.

More interesting information will become available starting in late July of 2001, when GIB, paired with Gitelman and his regular partner Brad Moss, will begin a series of 64-deal matches against human opponents of varying skill levels.

21. Mahmood and Rosenberg have won, among other titles, the 1995 Cap Volmac World Top Invitational Tournament. As remarked earlier, Rosenberg would also go on after the GIB match to win the Par Competition in which GIB finished 12th.

22. http://www.okbridge.com

8.2 Current and future work

Recent work on GIB has focused on its weakest areas: defensive cardplay and bidding. The bidding work has been and continues to be primarily a matter of extending the existing bidding database, although GIB's bidding language is also being changed from Standard American (a fairly natural system) to a variant of an artificial system called Moscito developed in Australia.[23] Moscito has very sharply defined meanings, making it ideal for use by a computer program, and is an "action" system, working hard to make the opponents' bidding as difficult as possible.

23. GIB's version of Moscito is called Moscito Byte.

With regard to defensive cardplay, the key elements of high level defense are to make it hard for partner to make a mistake while making it easy for declarer to do so. Providing GIB with these abilities will involve an extra level of recursion in the cardplay, as each element of the Monte Carlo sample must now be considered from other players' points of view, as they generate and then analyze their own samples. These ideas have been implemented but currently lead to small performance degradations (approximately 0.05 imps/deal) because the computational cost of the recursive analyses requires reducing the size of the Monte Carlo sample substantially. As processor speeds increase, it is reasonable to expect these ideas to bear significant fruit.

In 1997, Martel, a computer scientist himself, suggested that he expected GIB to be the best bridge player in the world in approximately 2003. The work appears to be roughly on schedule.

8.3 Other games

I have left essentially untouched the question of to what extent the basic techniques we have discussed could be applied to games of imperfect information other than bridge. The ideas that we have presented are likely to be the most applicable in games where the perfect information variant is tractable but computationally challenging, and the assumption that one's opponents are playing with perfect information is a reasonable one. This suggests that games like hearts and other trick-taking games will be amenable to our techniques, while games like poker (where it is essential to realize and exploit the fact that the opponents also have imperfect information) are likely to need other approaches.

Acknowledgments

A great many people have contributed to the GIB project over the years.
In the technical community, I would like to thank Jonathan Schaeffer, Rich Korf, David Etherington, Bart Massey and the other members of CIRL. In the bridge community, I have received invaluable assistance from Chip Martel, Rod Ludwig, Zia Mahmood, Andrew Robson, Alan Jaffray, Hans Kuijf, Fred Gitelman, Bob Hamman, Eric Rodwell, Jeff Goldsmith, Thomas Andrews and the members of the rec.games.bridge community. The work itself has been supported by Just Write, Inc., by DARPA/Rome Labs under contracts F30602-95-1-0023 and F30602-97-1-0294, and by the Boeing Company under contract AHQ569. To everyone who has contributed, whether named above or not, I owe my deepest appreciation.

Appendix A. A summary of the rules of bridge

We give here a very brief summary of the rules of bridge. Readers wanting a more complete description are referred to any of the many excellent texts available (Sheinwold, 1996).

Bridge is a card game for four players, who are split into two pairs. Members of a single pair sit opposite one another, so that North-South form one pair and East-West the other.

The deck is distributed evenly among the players, so that each deal involves giving each player a hand of 13 cards. The game then proceeds through a bidding and a playing phase.

The playing phase consists of 13 tricks, with each player contributing one card to each trick in a clockwise fashion. The player who plays first to any trick is said to lead to that trick. The highest card of the suit led wins the trick (Ace is high and deuce low), unless a trump is played, in which case the highest trump wins the trick. The person who leads to a trick is free to lead any card he wishes; subsequent players must play a card of the suit led if they have one, and can play any card they choose if they don't.
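The trick-taking rule just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this appendix, not code from GIB; the card encoding (a rank character paired with a suit letter) and the function names `trick_winner` and `legal_plays` are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Ranks from low to high: the deuce is low and the ace is high.
RANKS = "23456789TJQKA"

# Hypothetical encoding: (rank, suit), e.g. ("K", "D") for the king of diamonds.
Card = Tuple[str, str]


def trick_winner(cards: List[Card], trump: Optional[str] = None) -> int:
    """Return the position (0 = the leader) of the card that wins the trick."""
    suit_led = cards[0][1]
    trump_played = trump is not None and any(c[1] == trump for c in cards)
    # If a trump was played, the highest trump wins; otherwise the
    # highest card of the suit led wins.
    winning_suit = trump if trump_played else suit_led
    candidates = [i for i, c in enumerate(cards) if c[1] == winning_suit]
    return max(candidates, key=lambda i: RANKS.index(cards[i][0]))


def legal_plays(hand: List[Card], suit_led: Optional[str] = None) -> List[Card]:
    """Cards that may be played: any card when leading, otherwise a card
    of the suit led if the hand holds one, and any card if it does not."""
    if suit_led is None:
        return list(hand)
    following = [c for c in hand if c[1] == suit_led]
    return following if following else list(hand)
```

For instance, with spades as trumps, a diamond king led and ruffed by the spade jack loses to the jack: `trick_winner([("K", "D"), ("J", "S"), ("5", "D"), ("2", "D")], trump="S")` selects position 1.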
The winner of one trick leads to the next; the person who leads to the first trick (the opening leader) is determined during the bidding phase of the game.

The object of the card play phase is always for your partnership to take as many tricks as possible; there is no advantage to one partner's taking a trick over another, and the order in which the tricks are taken is irrelevant.

After the opening leader plays the first card to the first trick, the player to his left places his cards face up on the table so that all of the other players can see them. This player is called the dummy, and when it is dummy's turn to play, dummy's partner (who can see the partnership's combined assets) selects the card to be played. Dummy's partner is called the declarer and the members of the other pair are called the defenders.

The purpose of the bidding phase is to identify trumps and the declarer, and also the contract, which will be described shortly. The opening leader is identified as well, and is the player to the declarer's left.

During the bidding phase, various contracts are proposed. The dealer has the first opportunity to propose a contract and subsequent opportunities are given to each player in a clockwise direction. Each player has many opportunities to suggest a contract during this phase of the game, which is called the auction. Each partnership is required to explain the meanings of their actions during the auction to the other side, if requested.

Each contract suggests a particular trump suit (or perhaps that there not be a trump suit at all). Each player suggesting a contract is committing his side to winning some particular number of the 13 available tricks. The minimum commitment is 7 tricks, so there are 35 possible contracts (each of 4 possible trumps, or no trumps, and seven possible commitments, from seven to thirteen tricks).
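As a small illustration of this count, the sketch below enumerates the contracts in Python. The names `STRAINS`, `CONTRACTS`, `tricks_required` and `outranks` are hypothetical, and the ranking of the strains (clubs lowest, then diamonds, hearts, spades, and no trumps highest) is the standard bridge ordering, which this summary does not spell out.

```python
from itertools import product

# Strains in ascending rank within a level. This ranking is standard
# bridge practice rather than something stated in the summary above.
STRAINS = ["C", "D", "H", "S", "NT"]

# Bid levels 1 through 7 correspond to commitments of 7 through 13 tricks.
LEVELS = range(1, 8)

# All contracts, lowest first: 1C, 1D, 1H, 1S, 1NT, 2C, ..., 7NT.
CONTRACTS = list(product(LEVELS, STRAINS))


def tricks_required(level: int) -> int:
    """Number of tricks the declaring side commits to at a given level."""
    return level + 6


def outranks(a, b) -> bool:
    """True if contract a may legally be bid over contract b."""
    return CONTRACTS.index(a) > CONTRACTS.index(b)
```

Because every new contract must outrank the previous one and the list is finite, a sequence of contract proposals cannot grow indefinitely; `len(CONTRACTS)` is 35, and `tricks_required(3)` gives the nine tricks of a three notrump contract.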
These 35 contracts are ordered, which guarantees that the bidding phase will eventually terminate.

After the bidding phase is complete, the side that suggested the final contract is the declaring side. Of the two members of the declaring side, the one who first suggested the eventual trump suit (or no trumps) is the declarer. Play begins with the player to the declarer's left leading to the first trick.

After the hand is complete, there are two possible outcomes. If the declaring side took at least as many tricks as it committed to taking, the declaring side receives a positive score and the defending side an equal but negative score. There are substantial bonuses awarded for committing to taking particular numbers of tricks; in general, the larger the commitment, the larger the bonus. There are small bonuses awarded for winning tricks above and beyond the commitment.

If the declaring side failed to honor its commitment, it receives a negative score and the defenders receive an equal but positive score. The overall score in this case (where the declarer "goes down") is generally smaller than the overall score in the case where declarer "makes it" (i.e., honors his commitment).

Appendix B. A new ending discovered by GIB

This deal occurred during a short imp match between GIB and Bridge Baron.

North: ♠ 9 6  ♥ Q J 8 5  ♦ A Q 3  ♣ K J 10 8
West:  ♠ K Q J 8 7 5  ♥ 9 4 3  ♦ 7  ♣ 6 4 2
East:  ♠ 4 3  ♥ A 7 2  ♦ J 10 6 2  ♣ A Q 7 3
South: ♠ A 10 2  ♥ K 10 6  ♦ K 9 8 5 4  ♣ 9 5

With South (GIB) dealing at unfavorable vulnerability, the auction went P-2♠-X-P-3NT-all pass. (P is pass and X is double.) The opening lead was the ♠K, ducked by GIB, and Bridge Baron now switched to a small heart. East won the ace and returned to spades, GIB winning.

GIB cashed all the hearts, pitching a small club from its hand. It then tested the diamonds, learning of the bad break and winning the third diamond in hand.
It then led the ♦9 in the following position:

North: ♠ -  ♥ -  ♦ -  ♣ K J 10 8
West:  ♠ Q  ♥ -  ♦ -  ♣ ? ? ?
East:  ♠ -  ♥ -  ♦ J  ♣ A ? ?
South: ♠ 10  ♥ -  ♦ 9 8  ♣ 9

When GIB pitched the ten of clubs from dummy (it had been aiming for this ending all along), the defenders were helpless to take more than two tricks independent of the location of the club queen.

At the other table, Bridge Baron let GIB play in 2♠ making exactly, and GIB picked up 12 imps.

References

Adelson-Velskiy, G., Arlazarov, V., & Donskoy, M. (1975). Some methods of controlling the tree search in chess programs. Artificial Intelligence, 6, 361-371.

Bayardo, R. J., & Miranker, D. P. (1996). A complexity analysis of space-bounded learning algorithms for the constraint satisfaction problem. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 298-304.

Billings, D., Papp, D., Schaeffer, J., & Szafron, D. (1998). Opponent modeling in poker. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pp. 493-499.

Blackwood, E. (1979). Play of the Hand with Blackwood. Bobbs-Merrill.

Eskes, O. (1997). GIB: Sensational breakthrough in bridge software. IMP, 8(2).

Frank, I. (1998). Search and Planning under Incomplete Information: A Study Using Bridge Card Play. Springer-Verlag, Berlin.

Frank, I., & Basin, D. (1998). Search in games with incomplete information: A case study using bridge card play. Artificial Intelligence, 100, 87-123.

Frank, I., Basin, D., & Bundy, A. (2000). Combining knowledge and search to solve single-suit bridge. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, pp. 195-200.

Frank, I., Basin, D., & Matsubara, H. (1998). Finding optimal strategies for imperfect information games. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pp. 500-507.

Ginsberg, M. L. (1993). Dynamic backtracking. Journal of Artificial Intelligence Research, 1, 25-46.

Ginsberg, M. L., & Harvey, W. D. (1992). Iterative broadening. Artificial Intelligence, 55, 367-383.

Ginsberg, M. L., & Jaffray, A. (2001). Alpha-beta pruning under partial orders. In Games of No Chance II. To appear.

Grätzer, G. (1978). General Lattice Theory. Birkhäuser Verlag, Basel.

Greenblatt, R., Eastlake, D., & Crocker, S. (1967). The Greenblatt chess program. In Fall Joint Computer Conference 31, pp. 801-810.

Joslin, D. E., & Clements, D. P. (1999). Squeaky wheel optimization. Journal of Artificial Intelligence Research, 10, 353-373.

Koller, D., & Pfeffer, A. (1995). Generating and solving imperfect information games. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pp. 1185-1192.

Levy, D. N. (1989). The million pound bridge program. In Levy, D., & Beal, D. (Eds.), Heuristic Programming in Artificial Intelligence, Asilomar, CA. Ellis Horwood.

Lind-Nielsen, J. (2000). BuDDy: Binary Decision Diagram package. Tech. rep., Department of Information Technology, Technical University of Denmark, DK-2800 Lyngby, Denmark.

Lindelöf, T. (1983). COBRA: The Computer-Designed Bidding System. Gollancz, London.

Marsland, T. A. (1986). A review of game-tree pruning. J. Intl. Computer Chess Assn., 9(1), 3-19.

McAllester, D. A. (1988). Conspiracy numbers for min-max searching. Artificial Intelligence, 35, 287-310.

Pearl, J. (1980). Asymptotic properties of minimax trees and game-searching procedures. Artificial Intelligence, 14(2), 113-138.

Pearl, J. (1982). A solution for the branching factor of the alpha-beta pruning algorithm and its optimality. Comm. ACM, 25(8), 559-564.

Plaat, A., Schaeffer, J., Pijls, W., & de Bruin, A. (1996). Exploiting graph properties of game trees. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 234-239.

Schaeffer, J. (1997). One Jump Ahead: Challenging Human Supremacy in Checkers. Springer-Verlag, New York.

Sheinwold, A. (1996). Five Weeks to Winning Bridge. Pocket Books.

Smith, S. J., Nau, D. S., & Throop, T. (1996). Total-order multi-agent task-network planning for contract bridge. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Stanford, California.

Stallman, R. M., & Sussman, G. J. (1977). Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis. Artificial Intelligence, 9, 135-196.

Sterling, L., & Nygate, Y. (1990). PYTHON: An expert squeezer. J. Logic Programming, 8, 21-40.

Wilkins, D. E. (1980). Using patterns and plans in chess. Artificial Intelligence, 14, 165-203.
