Generalized Rapid Action Value Estimation in Memory-Constrained Environments
Authors: Aloïs Rautureau (1) [0009-0007-8507-2167], Tristan Cazenave (2) [0000-0003-4669-9374], Éric Piette (1) [0000-0001-8355-636X]
(1) ICTEAM, UCLouvain, Louvain-la-Neuve, Belgium; (2) LAMSADE, Université Paris-Dauphine, Paris, France
Abstract. Generalized Rapid Action Value Estimation (GRAVE) has been shown to be a strong variant within the Monte-Carlo Tree Search (MCTS) family of algorithms for General Game Playing (GGP). However, its reliance on storing additional win/visit statistics at each node makes its use impractical in memory-constrained environments, thereby limiting its applicability in practice. In this paper, we introduce the GRAVE², GRAVER and GRAVER² algorithms, which extend GRAVE through two-level search, node recycling, and a combination of both techniques, respectively. We show that these enhancements enable a drastic reduction in the number of stored nodes while matching the playing strength of GRAVE.

Keywords: Monte-Carlo Tree Search · Memory constraints · General Game Playing

1 Introduction

Monte-Carlo Tree Search (MCTS) [7] is a family of asymmetric partial tree search algorithms that have proven successful in a wide range of decision-making tasks, even in domains where domain-specific knowledge is scarce or difficult to encode symbolically. These methods, however, are typically developed under the implicit assumption that sufficient memory is available to store an ever-growing asymmetric search tree. While this assumption is largely valid on most modern desktop and server-class hardware, it significantly limits the applicability of MCTS-based agents on memory-constrained platforms such as microcontrollers or smartphones.
This limitation is particularly pronounced for Generalized Rapid Action Value Estimation (GRAVE) [6], which improves the playing strength of MCTS-based agents in several games by augmenting each node with additional win/visit statistics for the All-Moves-As-First (AMAF) selection policy [12]. These additional statistics introduce a constant-factor increase in node memory usage, making GRAVE particularly sensitive to memory constraints. As a result, enabling GRAVE-level performance while drastically reducing the number of stored nodes becomes a key challenge for deploying strong artificial agents in environments where memory is a scarce resource.

Beyond practical deployment concerns, this challenge also aligns naturally with questions in cognitive modeling [23]. Best-first tree search algorithms have been proposed as models of human forward planning and short-term memory [9], yet it is unrealistic to assume that human decision makers can retain more than a few dozen decision states simultaneously [3, 18]. From this perspective, limiting the number of stored nodes is not merely an engineering constraint but a desirable modeling property. Developing GRAVE-like algorithms that preserve strong best-first behavior and playing strength while operating under a strict bound on the number of nodes therefore contributes both to the design of efficient agents and to the algorithmic modeling of high-level features of human decision making in games. In this setting, nodes are treated as atomic units of information rather than as byte-encoded data structures, shifting the focus from raw memory usage to principled control over informational complexity.
To address this challenge, we introduce GRAVE² and GRAVER (Generalized Rapid Action Value Estimation with node Recycling), two variants of GRAVE designed for low memory budgets while matching the playing strength of the original algorithm. GRAVE² extends GRAVE using a two-level search tree approach [5], while GRAVER incorporates a node recycling scheme [22]. We further propose GRAVER², which combines the two-level search of GRAVE² with the node recycling mechanism of GRAVER. To our knowledge, this is the first implementation of an MCTS algorithm that integrates both node recycling and two-level search within a single framework.

We evaluate these algorithms against GRAVE and corresponding UCT variants, comparing their relative playing strength under strict memory constraints. In particular, we measure the number of stored nodes required to reach equal playing strength, the number of playouts performed, and the empirically measured memory footprint of our implementation. The results provide insight into the trade-offs between memory usage and performance, and open new perspectives for the development of memory-bounded MCTS methods.

2 Related work

MCTS algorithms typically differ in their selection and playout policies. The standard selection policy, UCT, balances exploration of the search tree with exploitation of nodes with high estimated value using the Upper Confidence Bound (UCB) formula:

    argmax_n ( R(n)/V(n) + C * sqrt( log V(N) / V(n) ) )    (1)

where n is a node, N its parent, V and R denote the number of visits and the total sampled reward of a node, respectively, and C is the exploration parameter. Low values of C favor exploitation, while high values encourage exploration. The value C = sqrt(2)/2 is known to provide good results when sampled rewards lie in the range [0, 1] [13], and is used throughout this paper.
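The UCB selection rule of Eq. (1) can be sketched as follows; the `Node` structure and field names are illustrative assumptions, not the paper's implementation.

```python
import math

class Node:
    """Minimal MCTS node: visit count V(n) and total sampled reward R(n)."""
    def __init__(self):
        self.visits = 0      # V(n)
        self.reward = 0.0    # R(n), cumulative reward over playouts
        self.children = []

def ucb_select(parent, c=math.sqrt(2) / 2):
    """Pick the child maximizing Eq. (1); unvisited children score +infinity."""
    def ucb(child):
        if child.visits == 0:
            return math.inf  # UCT convention: always expand unvisited nodes first
        exploit = child.reward / child.visits
        explore = c * math.sqrt(math.log(parent.visits) / child.visits)
        return exploit + explore
    return max(parent.children, key=ucb)
```

The infinite score for unvisited children is the conventional UCT behavior; as discussed later, GRAVE replaces it with AMAF-based estimates.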
GRAVE [6] is a generalization of the Rapid Action Value Estimation (RAVE) [11] algorithm. It relies on the All-Moves-As-First (AMAF) heuristic [4, 12] as part of its selection policy, and linearly interpolates between the exploitation of the mean sampled reward of a node n and the AMAF value of a move m stored in a reference node ref higher up the tree. The interpolation is controlled by a parameter β_{n,m} defined as:

    β_{n,m} = AMAF_{ref,m} / ( AMAF_{ref,m} + V(n) + bias × AMAF_{ref,m} × V(n) )    (2)

We then select the most promising child node using the formula:

    argmax_n ( (1 − β_{n,m}) × R(n)/V(n) + β_{n,m} × AMAF_{ref,m} )    (3)

The bias parameter controls the extent to which AMAF heuristics are favored over direct exploitation of node values, while a reference threshold specifies the minimum number of visits required for a node's AMAF statistics to be reused deeper in the search.

GRAVE has been shown to outperform traditional UCT and RAVE in a number of games with permutable moves, benefiting from the additional information provided by AMAF heuristics. In domains where strong heuristics are not available, such as General Game Playing (GGP), it often constitutes a stronger alternative to other MCTS algorithms [14, 15, 21, 24].

Like most MCTS variants, GRAVE requires storing the explored search tree in memory to maintain value estimates for each node. This paper focuses on two complementary approaches for reducing the size of the stored tree while minimizing the impact on playing strength. Two-level search algorithms offer one such approach, effectively squaring the number of playouts performed while increasing the number of stored nodes by only a constant factor of two. Similar techniques have been successfully applied to Proof-Number (PN) Search, where they are known as PN² [2, 5].
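Before turning to these techniques, the GRAVE interpolation of Eqs. (2) and (3) can be sketched as follows, assuming AMAF statistics are stored as (sample count, mean value) pairs; the function names are illustrative.

```python
def grave_beta(amaf_count, visits, bias):
    """Eq. (2): weight given to the AMAF value versus the node's own mean."""
    return amaf_count / (amaf_count + visits + bias * amaf_count * visits)

def grave_value(reward, visits, amaf_count, amaf_mean, bias=1e-2):
    """Eq. (3): interpolate the node mean with the AMAF mean stored at `ref`.

    With visits == 0 the formula degenerates to beta == 1, so an unexpanded
    node is scored purely by its AMAF statistics rather than an infinite bonus.
    """
    beta = grave_beta(amaf_count, visits, bias)
    mean = reward / visits if visits > 0 else 0.0  # term vanishes when beta == 1
    return (1.0 - beta) * mean + beta * amaf_mean
```

This degeneration to β = 1 at zero visits is the property Section 3.2 relies on when recycled nodes are re-expanded.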
These algorithms operate by performing a standard search from the root node (top-level search), but replace the usual leaf expansion step with a second search initiated from the newly expanded leaf node (second-level search). The second-level search tree is then discarded, and the root of this secondary search is added as a leaf to the top-level tree, with the collected values propagated upward, typically in a batch. This approach allows the algorithm to gather more accurate estimates before backpropagation, while enabling the reuse of a large fraction of the available node budget.

While two-level search algorithms typically split the node budget evenly between the two levels, they can be generalized by introducing a parameter λ ∈ [0, 1] that determines the fraction of the node budget allocated to the second-level search tree. Let N_sec = λ × N and N_top = N − N_sec denote the number of nodes assigned to the second- and top-level trees, respectively. Since each of the N_top top-level expansions triggers a second-level search of N_sec iterations, a two-level search with total node budget N and parameter λ will therefore perform N_top × N_sec = (λ − λ²) × N² iterations in practice, compared to only N iterations for a single-level search using the same node budget.

Orthogonal to two-level approaches, node recycling was first introduced for Information Set MCTS (ISMCTS) [8, 22]. The core idea is to reuse the memory of nodes that have been least recently accessed in order to allow continued expansion under a fixed memory budget. Such nodes are considered less relevant to the current search, as MCTS algorithms inherently revisit promising nodes more frequently. In this approach, a fixed pool of N nodes is allocated, initially containing empty entries. While empty entries remain, newly expanded nodes are added to the pool in the same manner as in standard MCTS. Once the pool is full, the least recently used node is recycled and replaced by the newly expanded node.
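A minimal sketch of such a fixed pool with least-recently-used recycling is given below, assuming an `OrderedDict`-backed LRU for clarity; the paper's implementation instead uses an intrusive left-child right-sibling structure.

```python
from collections import OrderedDict

class NodePool:
    """Fixed pool of nodes with LRU recycling (illustrative sketch only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lru = OrderedDict()  # node id -> node; front = least recently used

    def touch(self, node_id):
        """Mark a node as recently used, e.g. when visited during selection."""
        self.lru.move_to_end(node_id)

    def add(self, node_id, node):
        """Insert a new node, recycling the LRU entry once the pool is full."""
        if len(self.lru) >= self.capacity:
            self.lru.popitem(last=False)  # evict the least recently used node
        self.lru[node_id] = node
```

In a real search, `touch` would be interleaved with the selection and backpropagation phases so that frequently revisited nodes migrate away from the eviction front.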
Throughout this paper, we write P for the number of playouts performed.

To efficiently identify the least recently accessed node, a Least Recently Used (LRU) cache is maintained during the search. When a node is visited during the selection phase, it is temporarily removed from the LRU cache and reinserted during backpropagation, in the reverse order of removal. This maintains the crucial invariant that only leaf nodes can appear at the front of the LRU cache, ensuring that internal nodes are never recycled, which would otherwise render their children unreachable.

3 Methods

Fig. 1. Relationships between GRAVE, GRAVE², GRAVER and GRAVER². N indicates the total number of nodes stored, P the total number of playouts performed, and N_sec the nodes stored in the second-level tree.

We introduce two variants of the GRAVE algorithm focusing on memory-constrained environments, respectively incorporating two-level search (GRAVE²) and node recycling (GRAVER). These variants are finally combined into GRAVER², which can use node recycling in both the top- and second-level trees to further increase the efficient usage of the fixed memory budget. These variants can be seen as generalizations of one another (Figure 1), providing more parameters to fine-tune the playing strength and memory usage of the original GRAVE algorithm.

3.1 GRAVE²

GRAVE² is a two-level adaptation of the GRAVE algorithm that performs a second-level search when expanding leaf nodes. Two-level search algorithms typically isolate the top-level and second-level search trees, propagating only the values of the second-level root back to the top-level tree. In the case of GRAVE, however, AMAF values stored in the top-level tree can be reused within the second-level search. We refer to this mechanism as forward sharing (see Figure 2).
Forward sharing enables the second-level search to exploit information gathered from previous second-level searches to guide exploration earlier, even though the corresponding trees have been discarded. This property makes GRAVE² particularly well suited to preserving playing strength while further reducing the algorithm's memory footprint.

Fig. 2. Forward node sharing in GRAVE². The selection path in the top-level tree is fixed while the second-level search is running, and the latter may use AMAF values aggregated in the top-level tree to guide its exploration as long as the second-level root has fewer visits than the parameterized reference threshold.

Values obtained from playouts in the second-level tree are backpropagated to the top-level tree after each iteration, rather than in a single batch once the second-level search terminates. If a child of the currently referenced node exceeds the reference threshold, it becomes the new referenced node. Performing a second-level search before backing up AMAF values produces more reliable estimates, thereby reducing the amount of noise propagated up the tree when expanding a top-level node. By adjusting the λ parameter, we can control the trade-off between permanently stored information, holding crucial AMAF statistics for the second-level search, and the quality of information obtained for newly expanded nodes.

In time-bounded scenarios, which represent the most common use case for game-playing agents, two-level search introduces a drawback by limiting the anytime behavior of MCTS algorithms. The search can now only terminate every λ × N playouts rather than after each individual playout. To mitigate this issue and avoid spending computation on unpromising nodes, an early termination mechanism can be employed.
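The λ budget split and the resulting playout count can be checked with a small calculation; the function name is illustrative, not part of the paper's implementation.

```python
def two_level_budget(n_total, lam):
    """Split a node budget N between the two trees and estimate total playouts.

    N_sec = lam * N nodes go to the second-level tree, N_top = N - N_sec to the
    top-level tree; each of the N_top expansions runs a second-level search of
    N_sec iterations, giving roughly (lam - lam^2) * N^2 playouts overall.
    """
    n_sec = int(n_total * lam)
    n_top = n_total - n_sec
    return n_top, n_sec, n_top * n_sec

# An even split of a 240-node pool yields 120 + 120 nodes and 14,400 playouts,
# matching the GRAVE^2 configuration reported in the experiments.
```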
A related idea was applied in the PN² algorithm, where the second-level search is halted when the proof number of the second-level root exceeds twice its disproof number [25].

3.2 GRAVER

The data structure introduced for node recycling, based on a left-child right-sibling representation of the tree combined with an intrusive first-in first-out LRU cache, can be directly applied to the GRAVE algorithm. We refer to the resulting method as GRAVER (GRAVE with node Recycling) throughout the remainder of this paper.

The main difference with UCT lies in the fact that leaf nodes in GRAVE may contain substantial information in the form of AMAF statistics accumulated from previous playouts, which could prove useful later in the search. Recycling such nodes may therefore have a negative impact on the playing strength of GRAVER. On the other hand, GRAVE's selection policy can assign meaningful values to unexpanded or recycled nodes through AMAF heuristics. While UCT conventionally assigns a value UCB(n) = ∞ to unexpanded nodes, ensuring they are always selected for expansion, GRAVE instead relies solely on stored AMAF values for such nodes, since a visit count of zero forces β_{n,m} = 1. This behavior helps prevent pathological cycles in which the algorithm repeatedly expands and recycles the same nodes.

3.3 GRAVER²

Node recycling and two-level search are orthogonal methods, enabling the use of both of them simultaneously. We dub the resulting algorithm GRAVER². Recycling nodes within the top-level tree further improves memory efficiency at the cost of potentially discarding additional information when leaf nodes are recycled. In the second-level search, additional playouts can improve the quality of value estimates while maintaining a fixed node budget.
Empirically, node recycling schemes exhibit a plateau in playing strength as the ratio between performed playouts and stored nodes increases. This allows maximizing the playing strength of the algorithm while bounding both memory usage and the number of playouts in the second-level search.

We use the notation P_top (resp. P_sec) to denote the number of playouts performed in the top-level (resp. second-level) tree. The total number of playouts is now decorrelated from the number of nodes stored and the λ parameter, with P = P_top × P_sec.

4 Experimental results

We evaluate the relative playing strength of the proposed algorithms on the game of Go using a 9×9 board. The GRAVE algorithm was first tuned against a UCT baseline with an exploration constant of 0.7 to determine suitable values for the bias and reference threshold parameters for Go 9×9. These parameters were fixed to 10⁻² and 25, respectively, for all subsequent experiments. All playouts are guided by the Move-Average Sampling Technique (MAST) [10] using an ε-greedy sampling strategy with ε = 0.4. MAST statistics are decayed by a factor of 0.2 between successive turns within the same game.

All algorithms are evaluated against a baseline GRAVE implementation with P = N = 10,000. Each data point is computed from 500 games. Win rates are reported together with confidence intervals calculated using the Agresti–Coull method [1].

4.1 Two-level search

Fig. 3. Win rates of GRAVE² with and without forward sharing and UCT² against GRAVE with P = N = 10,000. The dotted lines indicate the 95% confidence interval of the win rate of GRAVE against itself, representing the region in which compared algorithms can be considered to have playing strength equal to GRAVE.

We first evaluate the performance of GRAVE² relative to GRAVE. The relative win rates for λ = 0.5 are reported as a function of the number of nodes stored by the two-level search algorithms in Figure 3.

GRAVE² without forward sharing achieves playing strength comparable to GRAVE starting from a total node pool size of 240 (N_top = N_sec = 120, corresponding to P = 14,400). Forward sharing does not yield a statistically significant improvement in overall playing strength. However, its effect is sufficient that we cannot reject the hypothesis of equal playing strength with only 200 nodes when forward sharing is enabled (N_top = N_sec = 100, corresponding to P = 10,000). In the remainder of this paper, any reference to GRAVE² assumes that forward sharing is enabled.

    Nodes   λ=0.2   λ=0.4   λ=0.5   λ=0.6   λ=0.8
    160     31.7%   33.7%   40.0%   31.4%   26.0%
    200     41.1%   48.4%   46.6%   47.2%   33.8%
    240     51.8%   55.6%   54.2%   52.2%   39.2%
    280     51.2%   57.7%   58.9%   52.4%   43.0%
    320     59.3%   62.7%   60.3%   61.5%   48.6%
    360     63.5%   64.5%   66.7%   63.7%   53.2%
    400     62.5%   69.4%   66.5%   63.9%   58.4%
    440     67.5%   65.3%   68.1%   66.7%   59.6%

Table 1. Win rate of GRAVE² with forward sharing for varying node pool sizes and λ values against GRAVE running for P = 10,000. Confidence intervals are omitted for clarity, and lie in the range [4.0%, 4.4%].

For comparison, UCT² requires more than 440 nodes to reach a similar level of playing strength, performing more than four times as many playouts (P ≥ 48,400). It is important to note, however, that storing AMAF statistics in each node results in a substantially larger memory footprint for GRAVE-based methods. In Go 9×9, each GRAVE node can store up to 82 AMAF entries (one per intersection plus the pass move), corresponding in our implementation to an additional 82 × 8 = 656 bytes per node compared to UCT.
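The per-node overhead above follows from a small calculation; the 8-byte-per-entry figure is the paper's, while the variable names are illustrative.

```python
BOARD_POINTS = 9 * 9          # intersections on a 9x9 Go board
MOVES = BOARD_POINTS + 1      # one AMAF entry per intersection, plus pass
AMAF_ENTRY_BYTES = 8          # per-entry cost reported in the paper

# Extra bytes per GRAVE node compared to a plain UCT node.
amaf_overhead = MOVES * AMAF_ENTRY_BYTES  # 82 * 8 = 656 bytes
```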
Finally, we assess the impact of the λ parameter, which controls the fraction of the node pool allocated to the second-level search, by varying λ over the range [0.2, 0.8] for 160 ≤ N ≤ 440. The results reported in Table 1 confirm that λ = 0.5 yields the best overall performance. Reducing N_top (λ > 0.5) generally degrades performance, while reducing N_sec (λ < 0.5) does not produce statistically significant improvements and increases the number of playouts performed.

4.2 Node recycling

Node recycling is less effective at reducing the memory footprint of the search tree while maintaining equivalent playing strength. GRAVER matches the performance of GRAVE only from approximately N ≈ 1,536, with both algorithms running for 10,000 playouts. However, node recycling schemes preserve the anytime property of the MCTS algorithm, which may be crucial in certain scenarios. Detailed comparative win rate results are shown in Figure 4.

Fig. 4. Win rates of GRAVER and UCT with node recycling against GRAVE with P = 10,000. The node pool size is presented on a logarithmic scale. The red dashed line indicates the threshold N = 10,000, beyond which all expanded nodes can be stored and node recycling no longer takes effect.

4.3 Node recycling in two-level search

We evaluate the performance of combining node recycling with two-level search through two separate experiments. In both cases, we set λ = 0.5 and vary the number of additional playouts as a ratio of the node pool size. A ratio of 1.0 corresponds to the setting without node recycling.

Fig. 5. Win rate of GRAVER² against GRAVE (P = 10,000), varying the ratio of playouts to stored nodes in the top-level tree (Left) and second-level tree (Right).
We first compare GRAVE and GRAVER² when node recycling is applied only to the top-level tree (left panel of Figure 5). Our first observation is that combining two-level search with node recycling improves playing strength under memory constraints. In particular, we are able to reduce the node pool size to 160 nodes while maintaining performance comparable to baseline GRAVE. This threshold is noteworthy because such a pool is too small to expand all legal moves at the root while still reaching depth ≥ 1. This indicates that the algorithm effectively prunes less promising branches in order to concentrate its limited memory budget on more relevant parts of the search space.

We then examine the effect of applying node recycling within the second-level tree (right panel of Figure 5). The impact on playing strength is not statistically significant, except at very small node pool sizes (N ≤ 160).

Fig. 6. Win rate of GRAVER² against GRAVE (P = 10,000), varying the ratio of playouts to stored nodes in both the top- and second-level trees. The values are given with a confidence interval of ±4.2%.

Finally, we evaluate the performance of GRAVER² by varying the ratio of playouts to stored nodes in both the top- and second-level trees (Figure 6). Increasing P_sec appears to be less effective than increasing P_top, and may even degrade playing strength for larger values of N. One possible explanation is that results backpropagated from simulations guided by MAST exhibit high variance, which can negatively affect estimate quality as the number of playouts increases. In this setting, the top-level tree may not accumulate sufficient information to compensate for the loss of finer-grained statistics in the second-level tree, leading to reduced overall performance.
5 Conclusion

We presented two orthogonal approaches to reduce the memory usage of GRAVE while maintaining playing strength in Go 9×9. Two-level search (GRAVE²) achieves the largest reduction, reaching equivalent performance with as little as 2% of the original node count. Node recycling (GRAVER) also reduces the node pool size, though more moderately (to approximately 15.36% of the original count), while preserving the anytime property of MCTS.

Combining both approaches (GRAVER²) yields an even greater reduction in stored nodes, achieving the performance of GRAVE with N = 160, P_top = 160 and P_sec = 80, corresponding to 12,800 total playouts. This configuration reduces memory usage to approximately 1.6% while maintaining equivalent playing strength. Applying node recycling within the second-level tree does not produce a statistically significant improvement beyond these results.

Within the scope of our experiments, we also observe that a sharing factor of λ = 0.5 in two-level search provides the best overall performance. Deviating from this value either reduces playing strength or increases the number of playouts without yielding meaningful strength gains.

5.1 Future work

Although this work substantially reduces the memory usage of GRAVE and introduces several tunable parameters, further reductions may be possible through the integration of stronger heuristics and heuristic-based pruning strategies [19]. Such approaches could shift the balance toward search-light or partially search-less methods [16, 17], while retaining search as a component of decision making. Further experiments should extend the evaluation to a wider range of games (e.g. on Ludii [20]) and explore additional configurations of playout budgets, node pool sizes, and GRAVE parameters.
An open question is whether the combination of two-level search and node recycling yields comparable or greater improvements in other MCTS variants, such as HRAVE [24] or ISMCTS [8].

References

1. Agresti, A., Coull, B.A.: Approximate is better than "exact" for interval estimation of binomial proportions. The American Statistician 52(2), 119–126 (1998)
2. Allis, L.: Searching for Solutions in Games and Artificial Intelligence (1994)
3. Atkinson, R.C., Shiffrin, R.M.: Human Memory: A Proposed System and its Control Processes. In: Psychology of Learning and Motivation, vol. 2, pp. 89–195. Elsevier (1968)
4. Bouzy, B., Helmstetter, B.: Monte-Carlo Go developments. In: Advances in Computer Games, Many Games, Many Challenges, 10th International Conference, ACG 2003, Graz, Austria. IFIP, vol. 263, pp. 159–174. Kluwer (2003)
5. Breuker, D.: Memory versus Search in Games. Ph.D. thesis (1998)
6. Cazenave, T.: Generalized rapid action value estimation. In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, pp. 754–760 (2015)
7. Coulom, R.: Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In: Computers and Games, 5th International Conference, CG 2006. Lecture Notes in Computer Science, vol. 4630, pp. 72–83. Springer (2006)
8. Cowling, P.I., Powley, E.J., Whitehouse, D.: Information Set Monte Carlo Tree Search. IEEE Trans. Comput. Intell. AI Games 4(2), 120–143 (2012)
9. De Groot, A.D.: Thought and Choice in Chess. Noord-Holl. Uitg. (1946)
10. Finnsson, H., Björnsson, Y.: Learning Simulation Control in General Game-Playing Agents. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, USA, July 11–15, 2010, pp. 954–959. AAAI Press (2010)
11. Gelly, S., Silver, D.: Monte-Carlo tree search and rapid action value estimation in computer Go. Artif. Intell. 175(11), 1856–1875 (2011)
12. Helmbold, D.P., Parker-Wood, A.: All-Moves-As-First Heuristics in Monte-Carlo Go. In: Proceedings of the 2009 International Conference on Artificial Intelligence, ICAI 2009, July 13–16, 2009, 2 Volumes, pp. 605–610. CSREA Press (2009)
13. Kocsis, L., Szepesvári, C.: Bandit Based Monte-Carlo Planning. In: Machine Learning: ECML 2006, 17th European Conference on Machine Learning, Proceedings. Lecture Notes in Computer Science, vol. 4212, pp. 282–293. Springer (2006)
14. Koriche, F., Lagrue, S., Piette, É., Tabary, S.: Constraint-based symmetry detection in general game playing. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 280–287 (2017)
15. Koriche, F., Lagrue, S., Piette, É., Tabary, S.: Woodstock: A generic game player driven by stochastic constraints [in French: Un programme-joueur générique dirigé par les contraintes stochastiques]. Revue d'Intelligence Artificielle (RIA) 31(3), 307–336 (2017)
16. Mandziuk, J.: Towards Cognitively Plausible Game Playing Systems. IEEE Comput. Intell. Mag. 6(2), 38–51 (2011)
17. McIlroy-Young, R., Sen, S., Kleinberg, J.M., Anderson, A.: Aligning Superhuman AI with Human Behavior: Chess as a Model System. In: KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, 2020, pp. 1677–1687. ACM (2020)
18. Miller, G.A.: The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63(2), 81–97 (1956)
19. Moriarty, D.E., Miikkulainen, R.: Evolving Neural Networks to Focus Minimax Search. In: Proceedings of the 12th National Conference on Artificial Intelligence, Volume 2, pp. 1371–1377. AAAI Press / The MIT Press (1994)
20. Piette, É., Soemers, D.J.N.J., Stephenson, M., Sironi, C.F., Winands, M.H.M., Browne, C.: Ludii: the ludemic general game system. In: ECAI 2020. Frontiers in Artificial Intelligence and Applications, vol. 325, pp. 411–418 (2020)
21. Piette, É.: A New Constraint-Driven Approach to General Game Playing [in French: Une Nouvelle Approche au General Game Playing Dirigée par les Contraintes]. Ph.D. thesis, Artois University, Arras, France (2016)
22. Powley, E.J., Cowling, P.I., Whitehouse, D.: Memory Bounded Monte Carlo Tree Search. In: Proceedings of the Thirteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, pp. 94–100. AAAI Press (2017)
23. Rautureau, A., Piette, É.: Cogniplay: a work-in-progress human-like model for general game playing (2025)
24. Sironi, C.F., Winands, M.H.M.: Comparison of rapid action value estimation variants for general game playing. In: IEEE Conference on Computational Intelligence and Games, CIG 2016, pp. 1–8. IEEE (2016)
25. Winands, M., Uiterwijk, J.: PN, PN² and PN* in Lines of Action. In: The CMG Sixth Computer Olympiad Computer-Games Workshop Proceedings. Technical Reports in Computer Science CS 01-04, Universiteit Maastricht (2001)