On the Algorithmic Complexity of the Mastermind Game with Black-Peg Results

In this paper, we study the algorithmic complexity of the Mastermind game, where results are single-color black pegs. This differs from the usual dual-color version of the game, but better corresponds to applications in genetics. We show that it is N…

Authors: Michael T. Goodrich

Michael T. Goodrich
Dept. of Computer Science and Secure Computing and Networking Center
University of California, Irvine
http://www.ics.uci.edu/~goodrich/

Abstract

In this paper, we study the algorithmic complexity of the Mastermind game, where results are single-color black pegs. This differs from the usual dual-color version of the game, but better corresponds to applications in genetics. We show that it is NP-complete to determine if a sequence of single-color Mastermind results has a satisfying vector. We also show how to devise efficient algorithms for discovering a hidden vector through single-color queries. Indeed, our algorithm improves a previous method of Chvátal by almost a factor of 2.

1 Introduction

Mastermind [2, 4] is a game played between two players, a codemaker and a codebreaker, using colored pegs. (See Figure 1.) Viewed mathematically, Mastermind is abstracted as a game where the codemaker selects a plaintext vector, $V$, of length $N$, whose elements are selected from an alphabet of size $K$. For consistency with the board game, the members of this alphabet are often referred to as "colors." The codemaker and codebreaker both know the values of $N$ and $K$, and game play consists of the codebreaker repeatedly making guesses, $V_1, V_2, \ldots$, about the identity of $V$. For each guess $V_i$, the codemaker provides a score on how well $V_i$ matches $V$.

Figure 1: The Mastermind game. The four large pegs in the middle are used for guessing. The four smaller peg locations on the left are used to score each guess, with black-peg and white-peg scores. (Image, Copyright 2009, Michael T. Goodrich. Used with permission.)

In double-count Mastermind, which is the standard version based on the board game, this score consists of a pair of two numbers:
• A black count, $b(V_i)$, which is the number of elements in $V_i$ and $V$ that match in both value and location. That is, $b(V_i) = |\{ j : V_i[j] = V[j] \}|$.
• A white count, $w(V_i)$, which is the number of elements in $V_i$ that appear in $V$ but in different locations than their locations in $V_i$. That is, letting $\pi$ denote an arbitrary permutation, $w(V_i) = \max_{\pi} |\{ j : V_i[\pi(j)] = V[j] \}| - b(V_i)$.

In single-count Mastermind, which has been less studied, the codebreaker is given only the black-peg count, $b(V_i)$, for each guess, $V_i$. (Note that it is impossible to solve the problem given only white-count scores.) The goal is for the codebreaker to discover $V$ using a small number of guesses.

1.1 Previous Related Work

The original Mastermind game was invented in 1970 by Meirowitz, as a board game having holes for sequences of length $N = 4$ and $K = 6$ colored pegs. Knuth [4] subsequently showed that this instance of the Mastermind game can be solved in five guesses or fewer. Chvátal [2] studied the combinatorics of general Mastermind, showing that it can be solved in polynomial time, in the $K \ge N$ case, using $2N \lceil \log K \rceil + 4N$ guesses, and Chen et al. [1] showed how this bound can be improved, in this case, to $2N \lceil \log N \rceil + 2N + \lceil K/N \rceil + 2$ guesses. Stuckman and Zhang [6] showed that it is NP-complete to determine if a sequence of guesses and responses in general double-count Mastermind is satisfiable.

1.2 Our Results

In this paper we study the single-count (black-peg) version of Mastermind. Such a scenario is motivated by genomic data, where a genomic database owner, Dave, can "play" a type of Mastermind game with a genomic query string $Q$: the querier thinks that he is querying Dave in a privacy-preserving manner, but Dave is instead discovering the full identity of $Q$.
That is, $Q$ is iteratively compared with strings provided by Dave (assumed to be from his database, $D$), with each comparison done in a privacy-preserving online manner, so that all that is learned from each comparison is the score measuring the similarity of the two strings, with the (black-peg) score for each string comparison being revealed to the database owner (and possibly also the owner of $Q$) before the next comparison begins.

We begin our discussion by showing that, in fact, the problem of determining whether a sequence of Mastermind responses has a valid solution is NP-complete, even if each response is a single-count response.

In addition to the NP-completeness result, we show that an arbitrary query string, $Q$, of length $N$ from an alphabet of size $K$, can be discovered with $N \lceil \log K \rceil + \lceil (2 - 1/K) N \rceil + K$ guesses, each of which receives only a single-count response. This improves the Chvátal upper bound by almost a factor of 2.

2 Black-Peg Mastermind is NP-Complete

As mentioned above, Stuckman and Zhang [6] show that double-count Mastermind satisfiability is NP-complete. Unfortunately, their proof, which is based on a reduction from the well-known Vertex Cover problem, does not translate into a proof that single-count Mastermind satisfiability is NP-complete. So we provide such a proof in this section. The implication of this fact is that satisfying an arbitrary sequence of Mastermind queries should be considered computationally infeasible.

In the single-count Mastermind satisfiability problem, we are given a sequence of Mastermind queries, $V_1, V_2, \ldots, V_N$, and the responses, $b(V_1), b(V_2), \ldots, b(V_N)$, each of which is said to report the number of indices at which the characters in a $V_i$ and an unknown vector, $V$, match. We are asked to determine if there indeed exists a vector $V$ that satisfies all of these responses.

Theorem 1: Single-count Mastermind satisfiability is NP-complete.
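To make the decision problem concrete, the following is a minimal Python sketch of a brute-force satisfiability check. It is only an illustration under stated assumptions: colors are encoded as the integers `0..num_colors-1`, the function names `black_count` and `find_satisfying_vector` are hypothetical, and the search enumerates all $K^N$ candidate vectors, so it runs in exponential time and is usable only on tiny instances.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Query = Tuple[Sequence[int], int]  # (guess vector, reported black-peg count)


def black_count(guess: Sequence[int], secret: Sequence[int]) -> int:
    """Number of positions where guess and secret hold the same color."""
    return sum(g == s for g, s in zip(guess, secret))


def find_satisfying_vector(
    length: int, num_colors: int, queries: List[Query]
) -> Optional[Tuple[int, ...]]:
    """Return some vector consistent with every response, or None.

    Assumption of this sketch: exhaustive search over all
    num_colors ** length candidates (exponential time).
    """
    for candidate in product(range(num_colors), repeat=length):
        if all(black_count(g, candidate) == r for g, r in queries):
            return candidate
    return None
```

For example, with length 3 and two colors, the responses `([0, 0, 0], 1)`, `([1, 1, 1], 2)`, and `([0, 1, 1], 3)` are satisfied only by `(0, 1, 1)`, whereas `([0, 0, 0], 1)` together with `([1, 1, 1], 1)` has no satisfying vector, since the two counts would have to sum to 3.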
Proof: It is easy to see that single-count Mastermind satisfiability is in NP. For example, we could nondeterministically guess a vector $V$ and then test in polynomial time whether it satisfies all the responses, $b(V_1), b(V_2), \ldots, b(V_N)$.

To prove that single-count Mastermind satisfiability is NP-hard, we provide a reduction from 3-Dimensional Matching (3DM), which is a well-known NP-complete problem (e.g., see [3]). In the 3DM problem, we are given three sets, $X = \{x_1, \ldots, x_n\}$, $Y = \{y_1, \ldots, y_n\}$, and $Z = \{z_1, \ldots, z_n\}$, of $n$ elements each. In addition, we are given a set $T$ of $m$ triples,

$$\{ (x_{i_1}, y_{j_1}, z_{k_1}), \ldots, (x_{i_m}, y_{j_m}, z_{k_m}) \},$$

whose elements are respectively taken from the three sets, $X$, $Y$, and $Z$. The problem is to determine if there is a subset of triples such that each element in $X$, $Y$, and $Z$ appears in exactly one triple in this subset.

Suppose, then, that we are given an instance of the 3DM problem, as described above. We consider the unknown vector, $V$, to consist of the following sequence of variables:

$$(X_1, \ldots, X_n;\ Y_1, \ldots, Y_n;\ Z_1, \ldots, Z_n;\ T_1, \ldots, T_m),$$

where the semicolons are used for the sake of notation to separate the four sections in the unknown vector, $V$. We perform our reduction by constructing a sequence of guess vectors, $V_1, V_2, \ldots, V_N$, together with their responses, $b(V_1), b(V_2), \ldots, b(V_N)$, so that there is a satisfying vector $V$ for these responses if and only if there is a solution to the given instance of the 3DM problem.

Our construction begins by setting the number of colors, $K$, to be $m + 1$. Intuitively, there is a color associated with each triple in $T$, plus a "null" color, $\phi$, which is guaranteed not to appear in our unknown vector, $V$. We begin our sequence of queries with three special "enforcer" queries:

$$V_1 = (\phi, \ldots, \phi;\ \phi, \ldots, \phi;\ \phi, \ldots, \phi;\ \phi, \ldots, \phi),$$

which has response $b(V_1) = 0$,

$$V_2 = (\phi, \ldots, \phi;\ \phi, \ldots, \phi;\ \phi, \ldots, \phi;\ 1, 1, \ldots, 1),$$

which has response $b(V_2) = n$, and

$$V_3 = (\phi, \ldots, \phi;\ \phi, \ldots, \phi;\ \phi, \ldots, \phi;\ 0, 0, \ldots, 0),$$

which has response $b(V_3) = m - n$. Intuitively, $V_1$ enforces the fact that the null color, $\phi$, cannot appear in the unknown vector, $V_2$ enforces a counting rule that exactly $n$ of the $T_i$'s will be set to 1, and $V_3$ enforces a counting rule that the remaining $m - n$ of the $T_i$'s will be set to 0.

For each triple, $T_s = (x_{i_s}, y_{j_s}, z_{k_s})$, we construct three query vectors, as follows.

$$V_{s,1} = (\phi, \ldots, \phi, s, \phi, \ldots, \phi;\ \phi, \ldots, \phi;\ \phi, \ldots, \phi;\ \phi, \ldots, \phi, 0, \phi, \ldots, \phi),$$

where the $s$ is in position $i_s$ in the first group and the 0 is in position $s$ in the fourth group. This vector has response $b(V_{s,1}) = 1$. Next, we construct

$$V_{s,2} = (\phi, \ldots, \phi;\ \phi, \ldots, \phi, s, \phi, \ldots, \phi;\ \phi, \ldots, \phi;\ \phi, \ldots, \phi, 0, \phi, \ldots, \phi),$$

where the $s$ is in position $j_s$ in the second group and the 0 is in position $s$ in the fourth group. This vector has response $b(V_{s,2}) = 1$. Finally, we construct

$$V_{s,3} = (\phi, \ldots, \phi;\ \phi, \ldots, \phi;\ \phi, \ldots, \phi, s, \phi, \ldots, \phi;\ \phi, \ldots, \phi, 0, \phi, \ldots, \phi),$$

where the $s$ is in position $k_s$ in the third group and the 0 is in position $s$ in the fourth group. This vector has response $b(V_{s,3}) = 1$. Intuitively, these three responses collectively form a "chooser" gadget, where we will either have $T_s = 0$ or the three variables $X_{i_s}$, $Y_{j_s}$, and $Z_{k_s}$ will each be set to have color $s$ (and $T_s = 1$).

This reduction can clearly be done in polynomial time. So all that remains is for us to show that it works.

Suppose, then, that there is a possible solution to the given instance of 3DM.
Then, for each chosen triple, $T_s = (x_{i_s}, y_{j_s}, z_{k_s})$, we can assign colors $T_s = 1$, $X_{i_s} = s$, $Y_{j_s} = s$, and $Z_{k_s} = s$, which will satisfy each of the $V_{s,1}$, $V_{s,2}$, and $V_{s,3}$ vector responses for this value of $s$. Likewise, setting $T_s = 0$ will satisfy each of the $V_{s,1}$, $V_{s,2}$, and $V_{s,3}$ vector responses for a triple $T_s$ that is not chosen. Finally, given that there are $n$ chosen triples, we will satisfy the three preliminary vector responses as well.

Suppose, alternatively, that we have a vector $V$ that satisfies all of our vector responses. We know that each $X_i$, $Y_j$, and $Z_k$ must be assigned a color other than $\phi$. Since there are only $m + 1$ colors, this implies each $X_i$, $Y_j$, and $Z_k$ must be assigned a color corresponding to a triple number, $s$. If this $T_s = 1$, then in order to have satisfied the vectors $V_{s,1}$, $V_{s,2}$, and $V_{s,3}$, we must have set $X_{i_s} = s$, $Y_{j_s} = s$, and $Z_{k_s} = s$, which implies we can include the triple $(X_{i_s}, Y_{j_s}, Z_{k_s})$ in our matching. If $T_s = 0$, then we do not include this triple in our matching. By the responses to $V_2$ and $V_3$, we know that the number of triples chosen in this way is exactly $n$. Thus, we have found a valid 3-dimensional matching.

Thus, it is extremely unlikely that we will be able to find a polynomial-time algorithm that can solve arbitrary Mastermind query sequences, even if they are single-count results. Note, however, that this is not the same as a guarantee that discovering a string $Q$ requires a long query sequence. Indeed, we show in the section that follows that such query strings, $Q$, can be discovered fairly efficiently using a single-count Mastermind algorithm.
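The construction in the proof above can be sketched in Python as follows. This is a hedged illustration rather than the paper's own code: element indices are 0-based, triples are numbered $s = 1, \ldots, m$, the flag values 0 and 1 and the triple colors are plain integers, and the null color $\phi$ is encoded as the integer $m + 1$ (an encoding chosen for this sketch). The names `build_queries`, `witness`, and `satisfies` are hypothetical.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, int]   # 0-based indices (i, j, k) into X, Y, Z
Query = Tuple[List[int], int]   # (guess vector, black-peg response)


def black_count(guess: Sequence[int], secret: Sequence[int]) -> int:
    """Number of positions where guess and secret hold the same color."""
    return sum(g == s for g, s in zip(guess, secret))


def build_queries(n: int, triples: Sequence[Triple]) -> List[Query]:
    """Build the enforcer and chooser queries for a 3DM instance."""
    m = len(triples)
    phi = m + 1                       # sketch-specific encoding of the null color
    blank = [phi] * (3 * n + m)       # layout: X block, Y block, Z block, T block

    def with_flags(value: int) -> List[int]:
        v = blank.copy()
        v[3 * n:] = [value] * m       # fill only the T block
        return v

    # Enforcers: V1 (no phi in V), V2 (exactly n flags are 1), V3 (m - n are 0).
    queries: List[Query] = [(blank.copy(), 0), (with_flags(1), n),
                            (with_flags(0), m - n)]
    # Chooser gadget: three queries per triple, each with response 1.
    for s, triple in enumerate(triples, start=1):
        for group, idx in enumerate(triple):
            v = blank.copy()
            v[group * n + idx] = s    # color s at the triple's element position
            v[3 * n + s - 1] = 0      # color 0 at flag position s
            queries.append((v, 1))
    return queries


def witness(n: int, triples: Sequence[Triple], chosen: Sequence[int]) -> List[int]:
    """Vector V induced by a set of chosen triple numbers (1-based)."""
    v = [0] * (3 * n + len(triples))
    for s in chosen:
        for group, idx in enumerate(triples[s - 1]):
            v[group * n + idx] = s
        v[3 * n + s - 1] = 1
    return v


def satisfies(candidate: Sequence[int], queries: Sequence[Query]) -> bool:
    """True if candidate reproduces every reported black-peg count."""
    return all(black_count(g, candidate) == r for g, r in queries)
```

With `n = 2` and `triples = [(0, 0, 0), (1, 1, 1), (0, 1, 0)]`, choosing triples 1 and 2 is a perfect matching and `witness` yields a vector that satisfies all 12 queries, while choosing triples 1 and 3 (which share $x_1$) does not.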
3 A Mastermind Algorithm for Single-Count Match Queries

In this section, we explore an algorithm for the single-count Mastermind game, where the codebreaker, Dave, engages in a series of guesses against the unknown string, $Q$, each of which reveals only the single-count score between the query string $Q$ and strings provided by Dave, in an iterative online fashion. Here, we show that Dave can learn $Q$ with a sequence of $N \lceil \log K \rceil + \lceil (2 - 1/K) N \rceil + K$ guesses, where $N$ is the length of $Q$ and $K$ is the size of the alphabet (whose members we call "colors").

We begin the algorithm for Dave by having him perform $K$ queries, each of which is a vector of elements that are all the same color. This allows us to initially know the cardinality, $c_1, c_2, \ldots, c_K$, of every color in the unknown vector, $Q$. If any $c_i = 0$, then we remove the color $i$ from our alphabet of colors, and update the value of $K$ accordingly.

The remainder of Dave's computation proceeds as a recursive divide-and-conquer algorithm, which is similar in structure to the approach of Chvátal [2], but improves his bound by almost a factor of 2, even though his algorithm was for the general two-color case, by reusing knowledge gained in previous recursive calls.

The generic problem is to determine the values of all the elements in a range $Q[l..r]$, which initially is the entire vector $Q = Q[0..N-1]$, assuming we know the cardinalities, $c_1, c_2, \ldots, c_K$, of every color in $Q[l..r]$, and each $c_i > 0$. If $K \le 1$, we are done; so let us assume without loss of generality that $K \ge 2$. In addition, we assume inductively that we know $d$, the number of instances of color 1 outside of the range $Q[l..r]$. Initially, of course, $d = 0$. Given this initial setup, we split $Q[l..r]$ into $Q[l..m]$ and $Q[m+1..r]$, where $m$ is in the middle of the interval $[l, r]$.
The main challenge, then, is to provide for $Q[l..m]$ and $Q[m+1..r]$ the same setup we had for $Q[l..r]$. This setup can be accomplished by determining the cardinalities, $x_1, x_2, \ldots, x_K$ and $y_1, y_2, \ldots, y_K$, of every color that respectively appears in $Q[l..m]$ and $Q[m+1..r]$. We do this with a series of $K - 1$ additional queries, where we guess that the elements in $Q[l..m]$ are of color $i$, for $i = 2, 3, \ldots, K$, and that the rest of $Q$ is of color 1. Let the values of these queries be denoted as $b_2, b_3, \ldots, b_K$, and note that, at this point, we know the following:

$$x_i + y_i = c_i, \quad \text{for } i = 1, 2, \ldots, K \qquad (1)$$
$$x_i + y_1 = b_i - d, \quad \text{for } i = 2, 3, \ldots, K \qquad (2)$$
$$x_1 + x_2 + \cdots + x_K = m - l + 1. \qquad (3)$$

Thus, we can determine $y_1$, as

$$y_1 = \frac{c_1 + \sum_{i=2}^{K} (b_i - d) - (m - l + 1)}{K},$$

for $y_1$ is counted $K$ times in the sum of $c_1$ and all the $(b_i - d)$'s, and the sum of the $x_i$'s is $m - l + 1$, by Equation (3). Given the value of $y_1$, we can then determine all the $x_i$ values, by using Equation (1) for $x_1$ and Equation (2) for $x_2, x_3, \ldots, x_K$. Moreover, once we have all these $x_i$ values, we can determine the values, $y_2, y_3, \ldots, y_K$, using Equation (1). Finally, we can determine the values $d' = d + y_1$ and $d'' = d + x_1$ and use these respectively for the role of $d$ in $Q[l..m]$ and $Q[m+1..r]$. This gives us all the values necessary to then recursively determine $Q[l..m]$ and $Q[m+1..r]$.

Let us, therefore, analyze the number, $G(N, K)$, of vector guesses performed by this algorithm. Ignoring for the time being the initial set of $K$ guesses, we can bound this parameter using the following recurrence:

$$G(N, K) = 2\,G(N/2, K) + \min\{N, K - 1\}.$$

Thus, adding the initial $K$ queries back in, we get that the total number of guesses is at most $N \lceil \log K \rceil + \lceil (2 - 1/K) N \rceil + K$, which yields Theorem 2.
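The divide-and-conquer procedure described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: colors are the integers `0..k-1`, the codemaker is modeled by a hypothetical callback `oracle(guess)` that returns the black-peg count, and the function name `discover` is illustrative. One bookkeeping choice differs from the text: instead of passing $d'$ and $d''$ down the recursion, each range picks its own filler color (playing the role of "color 1") and recomputes $d$ as that color's global count minus its count inside the range, which gives the same values when the filler is unchanged.

```python
from typing import Callable, Dict, List, Sequence


def black_count(guess: Sequence[int], secret: Sequence[int]) -> int:
    """Number of positions where guess and secret hold the same color."""
    return sum(g == s for g, s in zip(guess, secret))


def discover(n: int, k: int, oracle: Callable[[List[int]], int]) -> List[int]:
    """Recover a hidden length-n vector over colors 0..k-1 from black-peg counts."""
    # Initial K monochromatic guesses: global count of every color.
    total = {c: oracle([c] * n) for c in range(k)}
    total = {c: t for c, t in total.items() if t > 0}  # drop absent colors
    result: List[int] = [-1] * n

    def solve(lo: int, hi: int, counts: Dict[int, int]) -> None:
        # counts maps each color present in result[lo..hi] to its (positive) count.
        colors = sorted(counts)
        if len(colors) == 1:
            result[lo:hi + 1] = [colors[0]] * (hi - lo + 1)
            return
        mid = (lo + hi) // 2
        left_len = mid - lo + 1
        filler = colors[0]                   # plays the role of "color 1"
        d = total[filler] - counts[filler]   # filler occurrences outside [lo..hi]
        b: Dict[int, int] = {}
        for c in colors[1:]:                 # K - 1 queries for this range
            guess = [filler] * n
            guess[lo:mid + 1] = [c] * left_len
            b[c] = oracle(guess)             # equals x_c + y_filler + d
        # y_filler is counted len(colors) times in the numerator (Eqs. 1-3).
        numerator = counts[filler] + sum(v - d for v in b.values()) - left_len
        y_f = numerator // len(colors)
        x = {filler: counts[filler] - y_f}   # Equation (1) for the filler
        for c in colors[1:]:
            x[c] = b[c] - d - y_f            # Equation (2)
        y = {c: counts[c] - x[c] for c in colors}  # Equation (1)
        solve(lo, mid, {c: v for c, v in x.items() if v > 0})
        solve(mid + 1, hi, {c: v for c, v in y.items() if v > 0})

    if n > 0:
        solve(0, n - 1, total)
    return result
```

A possible usage, with a secret held by a simulated codemaker: `secret = [2, 0, 1, 1, 3, 0, 2, 2]` and `discover(8, 4, lambda g: black_count(g, secret))` should return the secret. Because every color kept in a range has a positive count, a range with at least two colors has at least two positions, so both halves of the split are nonempty.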
Theorem 2: Given an unknown length-$N$ string $Q$, defined on an alphabet of size $K$, a Mastermind algorithm can discover $Q$ in polynomial time using $N \lceil \log K \rceil + \lceil (2 - 1/K) N \rceil + K$ tests against $Q$, each of which reveals only the number of positions where $Q$ and the test string match.

4 Conclusion

We have shown that, even though the single-count and sequence-alignment Mastermind satisfiability problems are NP-complete, one can effectively construct single-count Mastermind algorithms on arbitrary character strings just by knowing basic information about the length of the strings and the number of characters in the alphabet used to construct those strings.

Acknowledgments

We would like to thank Pierre Baldi, David Eppstein, Daniel Hirschberg, Stas Jarecki, and Michael Nelson for helpful discussions regarding the topics of this paper. This research was supported in part by the National Science Foundation under grants 0724806, 0713046, and 0847968.

References

[1] Z. Chen, C. Cunha, and S. Homer. Finding a hidden code by asking questions. In COCOON '96: Proceedings of the Second Annual International Conference on Computing and Combinatorics, volume 1090 of LNCS, pages 50–55. Springer, 1996.
[2] V. Chvátal. Mastermind. Combinatorica, 3(3/4):325–329, 1983.
[3] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York, NY, 1979.
[4] D. Knuth. The computer as a master mind. Journal of Recreational Mathematics, 9:1–5, 1977.
[5] A. M. Odlyzko. The rise and fall of knapsack cryptosystems. In C. Pomerance, editor, Cryptology and Computational Number Theory, pages 75–88. Am. Math. Soc., 1990.
[6] J. Stuckman and G.-Q. Zhang. Mastermind is NP-complete, 2005. http://arxiv.org/abs/cs/0512049.
