
Learning Character Strings via Mastermind Queries, with a Case Study Involving mtDNA

February 23, 2026

Reading time: 7 minutes

...

#Computer Science #Data Structures #Information Theory #Cryptography and Security #Mathematics #Learning

📝 Original Info

Title: Learning Character Strings via Mastermind Queries, with a Case Study Involving mtDNA
ArXiv ID: 0904.4458
Date: 2009-04-28
Authors: Michael T. Goodrich

📝 Abstract

We study the degree to which a character string, $Q$, leaks details about itself any time it engages in comparison protocols with strings provided by a querier, Bob, even if those protocols are cryptographically guaranteed to produce no additional information other than the scores that assess the degree to which $Q$ matches strings offered by Bob. We show that such scenarios allow Bob to play variants of the game of Mastermind with $Q$ so as to learn the complete identity of $Q$. We show that there are a number of efficient implementations for Bob to employ in these Mastermind attacks, depending on the knowledge he has about the structure of $Q$, which show how quickly he can determine $Q$. Indeed, we show that Bob can discover $Q$ using a number of rounds of test comparisons that is much smaller than the length of $Q$, under reasonable assumptions regarding the types of scores that are returned by the cryptographic protocols and whether he can use knowledge about the distribution that $Q$ comes from. We also provide the results of a case study we performed on a database of mitochondrial DNA, showing the vulnerability of existing real-world DNA data to the Mastermind attack.

💡 Deep Analysis

Deep Dive into Learning Character Strings via Mastermind Queries, with a Case Study Involving mtDNA.

📄 Full Content

Mastermind [10,25] is a game played between two players, a codemaker and a codebreaker, using colored pegs. (See Figure 1.) Viewed mathematically, Mastermind is abstracted as a game in which the codemaker selects a plaintext string¹, $Q$, of length $N$, whose elements are selected from an alphabet of size $K$. For consistency with the board game, the members of this alphabet are often referred to as "colors." The codemaker and codebreaker both know the values of $N$ and $K$, and play consists of the codebreaker repeatedly making guesses, $V_1, V_2, \ldots$, about the identity of $Q$. For each guess, $V_i$, the codemaker provides a score on how well $V_i$ matches $Q$. In double-count Mastermind, which is the standard version based on the board game, this score consists of a pair of numbers:

• A black count, $b(Q, V_i)$, which is the number of elements in $V_i$ and $Q$ that match in both value and location. That is, $b(Q, V_i) = |\{j : V_i[j] = Q[j]\}|$.

• A white count, $w(Q, V_i)$, which is the number of elements in $V_i$ that appear in $Q$ but in different locations than their locations in $V_i$. That is, letting $\pi$ denote an arbitrary permutation,

$$w(Q, V_i) = \max_{\pi} |\{j : V_i[\pi(j)] = Q[j]\}| - b(Q, V_i).$$
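Both scores are easy to compute directly. The following Python sketch is our own illustration, not code from the paper; it assumes $Q$ and $V_i$ are equal-length Python strings, and the function names are hypothetical. It relies on the standard identity that the maximum over permutations $\pi$ equals the sum, over colors $c$, of the smaller of the number of occurrences of $c$ in $Q$ and in $V_i$, so no search over permutations is needed.

```python
from collections import Counter


def black_count(q: str, v: str) -> int:
    """b(Q, V): number of positions j with V[j] == Q[j]."""
    if len(q) != len(v):
        raise ValueError("Q and V must have the same length N")
    return sum(1 for a, b in zip(q, v) if a == b)


def white_count(q: str, v: str) -> int:
    """w(Q, V): colors of V present in Q, but in the wrong position.

    max over permutations pi of |{j : V[pi(j)] == Q[j]}| equals the sum over
    colors c of min(#c in Q, #c in V), so we compute that sum and subtract
    the black count instead of searching over permutations.
    """
    cq, cv = Counter(q), Counter(v)
    color_matches = sum(min(cq[c], cv[c]) for c in cq)
    return color_matches - black_count(q, v)


def double_count_score(q: str, v: str) -> tuple[int, int]:
    """The (black, white) pair returned in double-count Mastermind."""
    return black_count(q, v), white_count(q, v)


print(double_count_score("RGBY", "RYGG"))  # (1, 2)
```

Here R is in the right place, while Y and one of the two Gs appear in $Q$ at other positions, giving one black and two white.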

In single-count Mastermind, which has been less studied, the codebreaker is given only the black count, $b(Q, V_i)$, for each guess, $V_i$. (Note that it is impossible to solve the problem given only white-count scores.)
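One way to see why white counts alone cannot suffice: if $Q = c^N$ is monochrome, every color match is also a positional match, so $w(Q, V_i) = 0$ for every guess, and all $K$ monochrome secrets look identical to the codebreaker. The sketch below is our own check of this parenthetical claim (function names are hypothetical); it compares two secrets by brute force over all $K^N$ guesses, so it is only meant for small $N$ and $K$.

```python
from collections import Counter
from itertools import product


def white_count(q: str, v: str) -> int:
    """w(Q, V) via the color-count identity (sum of min counts minus black)."""
    black = sum(1 for a, b in zip(q, v) if a == b)
    cq, cv = Counter(q), Counter(v)
    return sum(min(cq[c], cv[c]) for c in cq) - black


def white_scores_identical(q1: str, q2: str, alphabet: str) -> bool:
    """True if every guess in alphabet^N gets the same white count for q1 and q2."""
    return all(
        white_count(q1, "".join(v)) == white_count(q2, "".join(v))
        for v in product(alphabet, repeat=len(q1))
    )


# Monochrome secrets: every guess scores w = 0, so they cannot be told apart.
print(white_scores_identical("AAA", "BBB", "ABC"))  # True
# Non-monochrome secrets can differ, e.g. the guess "AB" separates "AB" from "BA".
print(white_scores_identical("AB", "BA", "AB"))  # False
```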

The goal is for the codebreaker to discover $Q$ using a small number of guesses.
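As a point of reference, here is a deliberately naive single-count codebreaker. It is our own baseline, not an algorithm from the paper: colors are assumed to be the integers $0, \ldots, K-1$, the hypothetical oracle returns only $b(Q, V)$, and $Q$ is recovered one position at a time by perturbing the all-zero guess. Writing color $c$ at position $j$ changes the black count by $[Q[j] = c] - [Q[j] = 0]$, so a change of $+1$ or $-1$ pins down $Q[j]$. It needs at most $1 + N(K-1)$ guesses, roughly $NK$, which is far more than the Mastermind strategies cited in this section require.

```python
from typing import Callable, Sequence

# Hypothetical oracle type: given a guess, return its black count b(Q, V).
Oracle = Callable[[Sequence[int]], int]


def make_oracle(secret: Sequence[int]) -> tuple[Oracle, list[int]]:
    """Hide a secret behind a black-count oracle; calls[0] counts queries."""
    calls = [0]

    def oracle(guess: Sequence[int]) -> int:
        calls[0] += 1
        return sum(1 for a, b in zip(secret, guess) if a == b)

    return oracle, calls


def naive_single_count_break(n: int, k: int, oracle: Oracle) -> list[int]:
    """Recover Q with at most 1 + n*(k-1) black-count queries (naive baseline)."""
    base = [0] * n
    b0 = oracle(base)  # number of positions where Q[j] == 0
    q = []
    for j in range(n):
        color = 0  # only possibility when k == 1
        for c in range(1, k):
            probe = base.copy()
            probe[j] = c
            delta = oracle(probe) - b0  # equals [Q[j] == c] - [Q[j] == 0]
            if delta == 1:  # Q[j] is c
                color = c
                break
            if delta == -1:  # Q[j] is 0
                color = 0
                break
        q.append(color)
    return q


oracle, calls = make_oracle([2, 0, 3, 1, 1, 0])
print(naive_single_count_break(6, 4, oracle), calls[0])  # [2, 0, 3, 1, 1, 0] 10
```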

The original Mastermind game was invented in 1970 by Meirowitz as a board game with rows of $N = 4$ holes and pegs in $K = 6$ colors. Knuth [25] subsequently showed that this instance of the Mastermind game can be solved in five guesses or fewer. Chvátal [10] studied the combinatorics of general Mastermind, showing that, in the $K \ge N$ case, it can be solved in polynomial time using $2N \log K + 4N$ guesses, and Chen et al. [9] showed how, in this case, this bound can be improved to $2N \log N + 2N + K/N + 2$ guesses. Stuckman and Zhang [33] showed that it is NP-complete to determine whether a sequence of guesses and responses in general double-count Mastermind is satisfiable. Goodrich [20] shows that single-count (black-peg) Mastermind satisfiability is also NP-complete and that a specific vector $Q$ can be discovered using a sequence of single-count (black-peg) queries of length $N \log K + (2 - 1/K)N + K$.
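For the small board-game instance, satisfiability can simply be checked by brute force. The sketch below is our own and is neither Knuth's minimax algorithm nor code from any of the cited papers: `consistent` enumerates every code in $\{0, \ldots, K-1\}^N$ that agrees with all recorded double-count responses, and `play_consistent` uses that filter in the simple strategy of always guessing the first still-consistent code. The exhaustive search takes time exponential in $N$, and the NP-completeness results mean no polynomial-time method is expected in general.

```python
from collections import Counter
from itertools import product


def score(q, v):
    """Double-count score (black, white) of guess v against secret q."""
    black = sum(1 for a, b in zip(q, v) if a == b)
    cq, cv = Counter(q), Counter(v)
    return black, sum(min(cq[c], cv[c]) for c in cq) - black


def consistent(n, k, history):
    """All codes in {0..k-1}^n that agree with every (guess, response) pair.

    Exhaustive: k**n candidates, i.e. 1296 for the 4-peg, 6-color game.
    """
    return [q for q in product(range(k), repeat=n)
            if all(score(q, g) == r for g, r in history)]


def play_consistent(secret, k):
    """Guess the first still-consistent code until the secret is hit.

    A simple strategy, not Knuth's minimax; returns the guesses made.
    Terminates because each wrong guess is ruled out by its own response.
    """
    n = len(secret)
    history, guesses = [], []
    while True:
        guess = consistent(n, k, history)[0]
        guesses.append(guess)
        response = score(secret, guess)
        if response[0] == n:  # all pegs black: solved
            return guesses
        history.append((guess, response))


print(play_consistent((3, 1, 4, 1), 6)[-1])  # (3, 1, 4, 1)
```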

Several researchers have explored privacy-preserving data querying methods that can be applied to character strings (e.g., see [2,15,16]). In particular, Atallah et al. [2] and Atallah and Li [3] studied privacy-preserving protocols for edit-distance string comparisons, such as in the longest common subsequence (LCS) problem [21,22,36], where each party learns the score for the comparison but neither learns the contents of the other party's string. Such comparisons are common in DNA sequence alignment, for example. Troncoso-Pastoriza et al. [35] described a privacy-preserving protocol for searching for a given regular-expression pattern in a DNA sequence. At last year's Oakland conference, Jha et al. [23] gave privacy-preserving protocols for computing edit-distance similarity scores between two genomic sequences, improving the privacy-preserving edit-distance algorithm of Szajda et al. [34]. Single-count matching results between two strings can be computed in a privacy-preserving manner as well, using privacy-preserving set intersection, e.g., the method of Freedman et al. [16], Vaidya and Clifton [37], or Sang and Shen [31,32]. String matching can also be performed using privacy-preserving dot-product computations [1] or even general multi-party computation protocols (e.g., see [12,18,39]) or systems [6]. Jiang et al. [24] study a secure multiparty method for comparing a genomic sequence against every sequence in a genomic database, providing a score that indicates the match strength between the query sequence and each sequence in the database.
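The reductions from single-count scores to set intersection and to dot products rest on simple encodings, sketched here in plain Python with no cryptography. In the protocols cited above, the intersection or dot product would instead be evaluated inside a privacy-preserving protocol; the encodings and function names below are our own illustrative assumptions, not taken from the cited works. Encoding a string as the set of (position, character) pairs makes the intersection size equal to the black count, and concatenating one $K$-entry indicator vector per position makes the dot product equal to it as well.

```python
def as_position_set(s):
    """Encode a string as the set of (position, character) pairs."""
    return {(j, ch) for j, ch in enumerate(s)}


def black_via_intersection(q, v):
    # A pair (j, c) is in both sets exactly when Q[j] == V[j] == c.
    return len(as_position_set(q) & as_position_set(v))


def one_hot(s, alphabet):
    """One indicator vector of length K per position, concatenated (N*K entries)."""
    return [1 if ch == a else 0 for ch in s for a in alphabet]


def black_via_dot_product(q, v, alphabet):
    # Each position contributes 1 to the dot product iff the characters agree.
    return sum(x * y for x, y in zip(one_hot(q, alphabet), one_hot(v, alphabet)))


print(black_via_intersection("ACGTAC", "ACCTGA"))         # 3
print(black_via_dot_product("ACGTAC", "ACCTGA", "ACGT"))  # 3
```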

In terms of the framework of this paper, the closest previous work is that of Du and Atallah [14], who studied a privacy-preserving protocol for querying a string $Q$ against a database of strings, $D$, where comparisons are based on approximate matching (but not sequence alignment). Their protocols assume that the parties are honest-but-curious, however, so that, for instance, the database owner cannot introduce fake strings into his database whose intent is to discover the identity of the query string, $Q$. The attack model we explore in this paper, on the other hand, allows for "cheating" in the comparison protocol, so that the owner of $D$ can introduce strings whose sole purpose is to help him discover the identity of $Q$.

In this paper we study the Mastermind attack on string data, a way for a genomic querier, Bob, to "play" a type of Mastermind game with an unknown string, $Q$, whose owner, Alice, believes she is comparing it with Bob's strings in a privacy-preserving manner, while in fact Bob is discovering the full identity of $Q$.
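To make the cheating-database threat concrete, the following sketch shows a hypothetical one-round version of the attack on DNA strings over the alphabet A, C, G, T. It is our illustration, not the paper's algorithm: Bob plants $3N + 1$ probe strings in his database, Alice's query is scored against every database string (a plain function, `alice_query`, stands in for the privacy-preserving protocol and reveals only black counts), and Bob decodes $Q$ from those scores alone. This crude version needs more planted strings than the length of $Q$, so it only illustrates the threat model; the more economical attacks are the subject of the paper.

```python
DNA = "ACGT"


def plant_probes(n, alphabet=DNA):
    """Hypothetical strings Bob inserts into his database D.

    probes[0] is the all-base string; then, for each position j and each
    non-base symbol c (in alphabet order), the base string with c at j.
    """
    base = alphabet[0]
    probes = [base * n]
    for j in range(n):
        for c in alphabet[1:]:
            probes.append(base * j + c + base * (n - j - 1))
    return probes


def alice_query(q, database):
    """Stand-in for the protocol: only one black count per database string leaks."""
    return [sum(1 for a, b in zip(q, d) if a == b) for d in database]


def bob_decode(n, scores, alphabet=DNA):
    """Recover Q from the planted probes' scores.

    Relative to the all-base probe, the probe with c at j scores
    [Q[j] == c] - [Q[j] == base] higher, so a +1 identifies Q[j] == c.
    """
    base, others = alphabet[0], alphabet[1:]
    b0 = scores[0]
    q = []
    for j in range(n):
        symbol = base  # every probe at j scored -1: Q[j] is the base symbol
        for i, c in enumerate(others):
            if scores[1 + j * len(others) + i] - b0 == 1:
                symbol = c
        q.append(symbol)
    return "".join(q)


secret = "GATTACACGT"
print(bob_decode(len(secret), alice_query(secret, plant_probes(len(secret)))))
```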

The attack scenario is that Alice repeatedly participates

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.

Related Posts

Information, learning and falsification

$t$-Covering Arrays Generated by a Tiling Probability Model

A Channel Coding Perspective of Collaborative Filtering
