Quantum Pattern Matching in Generalised Degenerate Strings

A degenerate string is a sequence of sets of characters. A generalized degenerate (GD) string extends this notion to the sequence of sets of strings, where strings of the same set are of equal length. Finding an exact match for a pattern string insid…

Authors: Massimo Equi, Md Rabiul Islam Khan, Veli Mäkinen

Massimo Equi (Aalto University, Finland), Md Rabiul Islam Khan (University of Helsinki, Finland), and Veli Mäkinen (University of Helsinki, Finland)

Abstract

A degenerate string is a sequence of sets of characters. A generalized degenerate (GD) string extends this notion to the sequence of sets of strings, where strings of the same set are of equal length. Finding an exact match for a pattern string inside a GD string can be done in $O(mn + N)$ time (Ascone et al., WABI 2024), where $m$ is the pattern length, $n$ is the number of strings and $N$ the total length of strings constituting the GD string. We modify this algorithm to work under a quantum model of computation, achieving running time $\tilde{O}(\sqrt{mnN})$.

1 Introduction

The exact string matching problem is to decide if a pattern string $P$ of length $m$ appears as a substring of a text string $T$ of length $n$. This problem can be solved in $O(m + n)$ time [19] under the classical models of computation. The first quantum algorithm for a string matching problem was given by Ramesh and Vinay [22], whose solution could find an exact match in time $\tilde{O}(\sqrt{n} + \sqrt{m})$, where $\tilde{O}$ hides logarithmic factors. Since then, other quantum algorithms have been proposed that improve the classical computational complexity of many different string problems such as longest common substring [11, 1], longest palindrome substring [11], minimal string rotation [24, 1], longest square substring [1], longest common subsequence [17], edit distance [5, 12], and multiple string matching [18].

However, not much has been explored in regimes where the input is a more general structure than just a string. For matching a string on labeled graphs, two different quantum approaches have been developed [8, 9]. The former is tailored to non-sparse graphs, where it gives a better bound than the best algorithm in the classical setting.
The latter is restricted to level-DAGs, where the input graph is assumed to be a sequence of sets of nodes $V$, with edges $E$ only defined between consecutive sets, and nodes have character labels. In this setting, the proposed quantum algorithm achieves $O(|E|\sqrt{m})$ running time, improving over the best classical quadratic algorithm and thus overcoming the classical quadratic conditional lower bound. Both of these approaches only work in special cases, and it is currently open how to cover the general case, and how to prove matching quantum lower bounds.

Figure 1: A GD string $T[1..5]$ with $T[1] = \{\texttt{ACG}, \texttt{TAA}, \texttt{CGT}, \texttt{GTA}\}$, $T[2] = \{\texttt{GATC}, \texttt{CGGT}\}$, $T[3] = \{\texttt{AC}, \texttt{GT}, \texttt{CA}\}$, $T[4] = \{\texttt{TAAGT}, \texttt{ATGCA}\}$, and $T[5] = \{\texttt{ACG}, \texttt{TTA}\}$. Underlined characters illustrate a match for pattern $\texttt{GTGTTAA}$.

Between matching a string into another string and matching a string into a graph, there is a middle ground of well-studied structures. This is the case for generalized degenerate strings. A generalized degenerate (GD) string is a sequence of sets of strings called segments, where strings of the same segment are of equal length, as Figure 1 illustrates. This definition can be made more restrictive by imposing constraints on the length of the segments (degenerate strings if all segments have length 1), or more loose by allowing strings in a segment to have different lengths (elastic degenerate strings). There are lines of research exploring each one of these variants and more [16, 13, 15, 10, 23, 2, 3, 20]. An extensive summary of these approaches and novel techniques can be found in [4]. Despite this popularity in the string matching community, no quantum approach has been proposed so far for any of these structures.
In this paper, we focus on solving string matching in generalized degenerate strings defined as follows:

Problem 1 (String Matching in Generalised Degenerate Strings (SMGD)).
input: A GD string $T$ and a pattern string $P$, both over an alphabet $\Sigma$.
output: True if and only if there is at least one occurrence of $P$ in $T$.

1.1 Our Results

We expand the scope of quantum techniques to variants of string matching by proposing the first quantum algorithm for Problem 1. Notice that this is an existence problem, that is, we have to report whether a string occurs in a generalized degenerate string or not, without reporting the occurrence itself.

To see the need for a custom approach to generalized degenerate strings, we remark that a reduction from existing approaches would not provide any benefit. Indeed, one can represent a GD string as a level-DAG: add edges between all pairs of strings from consecutive sets and split strings into paths of character-labeled nodes. The quantum algorithm of Equi et al. [9] then applies to solve this GD string pattern matching problem, but the reduction is not efficient, as a GD string consisting of $N$ characters in total could create $|E| = \Omega(N^2)$ edges, yielding an $O(N^2\sqrt{m})$-time quantum algorithm, where $m$ is the length of the pattern. However, the problem can be solved faster in the classical setting in $O(nm + N)$ time [4, Theorem 3] for a generalized degenerate string of $n$ segments and $N$ total characters. Inspired by this algorithm, we revisit its strategy and adapt it to enable parallel speed-up. We exploit this property to recast the algorithm in the quantum setting, achieving the following.

Theorem 1. There exists a quantum algorithm that solves SMGD on a pattern string $P$ of length $m$ and a generalized degenerate string $T$ of $n$ segments and $N$ total characters in time $\tilde{O}(\sqrt{mnN})$, with high probability.
As is customary in quantum computing, here "high probability" means that our algorithm has a constant error probability, which can then be reduced to an arbitrarily low constant by running the algorithm a constant number of times. We recall that the $\tilde{O}$ notation hides factors of complexity $O(n^{o(1)})$, which in our case means any (poly)logarithmic factor.

1.2 Technical Overview

We first describe a classical parallel approach in Section 3, and then show how this translates to a quantum algorithm in Section 4. To understand our approach, consider the following naive classical solution to SMGD, illustrated in Figure 2. Given pattern $P$ of length $m$ and GD string $T$ of $n$ segments and $N$ total characters, start from the leftmost position, called column (formally defined in Section 2), in $T$ and try to match $P$. Consume characters from $P$ and compare them to $T$ column-by-column (using tries, see Section 3), checking whether a character-match can be found or not. Even if there is no character-match, do not stop, but continue consuming one character in $P$ for each column in $T$ until there are no more characters in $P$. At this point, start again from the beginning of $P$, and keep consuming its characters in the same manner until either a full pattern match is found or we reach the end of $T$. If no pattern match is found at this point, restart the whole procedure, this time starting from the second column of $T$. Keep restarting the procedure, each time shifting the starting position one column to the right, until a match is found or $m$ instances of the procedure have been run. If no match has been found by this point, we can safely stop and report that there is no match, as the next instance of our procedure would start from a column already checked by the very first procedure instance.
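The covering argument above can be illustrated with a short sequential sketch. It is only an illustration under stated assumptions: the list-of-lists encoding of the GD string, the function names `matches_at` and `smgd_by_shifts`, and the use of direct slice comparisons in place of tries are choices made for this example and are not taken from the paper. Each of the $m$ shifts is tried in turn, and every start column reachable from that shift in steps of $m$ is tested.

```python
def matches_at(T, P, col):
    """Return True if P occurs in GD string T starting at 1-based column col.

    T is a list of segments; each segment is a list of equal-length strings.
    (Encoding and function names are assumptions of this sketch.)
    """
    m = len(P)
    start = col - 1            # 0-based first column covered by P
    end = start + m            # 0-based column just past the occurrence
    seg_begin = 0              # 0-based first column of the current segment
    for segment in T:
        k = len(segment[0])
        seg_end = seg_begin + k
        lo, hi = max(start, seg_begin), min(end, seg_end)
        if lo < hi:            # P overlaps this segment on columns [lo, hi)
            piece = P[lo - start:hi - start]
            # Strings of consecutive segments combine freely, so it is enough
            # that some string of this segment spells the overlapping piece.
            if not any(S[lo - seg_begin:hi - seg_begin] == piece for S in segment):
                return False
        seg_begin = seg_end
    return end <= seg_begin    # the occurrence must fit inside T


def smgd_by_shifts(T, P):
    """Try the m shifts one after another; shift h tests columns h+1, h+1+m, ..."""
    m = len(P)
    width = sum(len(segment[0]) for segment in T)
    for h in range(m):                       # one procedure instance per shift
        for col in range(h + 1, width - m + 2, m):
            if matches_at(T, P, col):
                return True
    return False
```

Because the shifts $h = 0, \ldots, m-1$ together reach every column, the outer loop can stop after $m$ instances, which is the observation the parallel and quantum versions build on.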
The classical parallel algorithm follows exactly this logic, with the difference that it launches $m$ threads to run the $m$ instances of the above procedure in parallel. In a sense, this is similar to the approach of Ascone et al. [4, Theorem 3], but there the authors use a bit-vector to keep track of the active prefixes between two segments, while here we parallelize over all possible shifts of the pattern against the GD string.

The quantum algorithm replaces the threads with a superposition, and instead of the tries it finds matches of substrings of the pattern in a segment employing two nested Grover's searches. Using tries in the quantum algorithm can lead to comparable performance when scanning the GD strings, but using Grover's searches avoids any preprocessing, saving an additive $O(N)$ term in the final complexity.

We remark that the techniques in this paper generalize the approach of Equi et al. [9], which uses quantum parallelism to simulate a bit-parallel classical algorithm. This is clearly a special case of a multi-threaded algorithm, and in the GD-string setting fully exploiting quantum speed-ups seems infeasible relying only on a pure bit-parallel abstraction. Moreover, we acknowledge a difficulty in finding matching quantum lower bounds for our solution. This is somewhat expected, as current lower bound strategies [7, 8] struggle to provide lower bounds for the quantum complexity of finding a match for a string in a graph, and that is a clearly harder problem than SMGD. Finally, we find that, as our algorithm is based on the property of covering the GD string with shifts of the pattern, our techniques could be of independent, even non-quantum, interest.

2 Preliminaries

2.1 Generalised Degenerate Strings

An alphabet $\Sigma$ is a set of characters. A sequence $P \in \Sigma^m$ is called a string and its length is denoted $m = |P|$. We denote integers $i, i+1, \ldots, j$ as interval $[i..j]$ and represent a string $P$ as an array $P[1..m]$, where $P[i] \in \Sigma$ for $1 \le i \le m$. String $P[i..j]$ is called a substring, string $P[1..i]$ a prefix, and string $P[i..m]$ a suffix of $P[1..m]$.

A generalised degenerate (GD) string $T[1..n]$ is a sequence of non-empty sets $T[1]T[2]\cdots T[n]$ of fixed-length strings, that is, each $T[i] \subseteq \Sigma^{k_i}$, where $k_i$ is a positive integer and $|T[i]| > 0$ for all $i$. We denote the cardinality of $T$ as $n = \sum_{i=1}^{n} |T[i]|$ and the size as $N = \sum_{i=1}^{n} \sum_{S \in T[i]} |S|$, and use notation $T[i][j]$ to refer to the $j$-th string of $T[i]$ when accessing it in memory, and $T[i][j][k]$ for the $k$-th character of that string. Moreover, $W = \sum_{i=1}^{n} k_i$ is the width of $T$. The language of $T$ is the set of strings $\{S_1 S_2 \cdots S_n \mid S_1 \in T[1], S_2 \in T[2], \ldots, S_n \in T[n]\}$.

In this work, we study the problem of string matching in generalised degenerate strings, which consists in finding a match for a pattern string $P[1..m]$ in a generalised degenerate string $T[1..n]$, where $P$ has a match in $T$ if

i) $P$ is a substring of some $S \in T[i]$ for some $i$, or
ii) there is a sequence of strings $S_i, S_{i+1}, \ldots, S_{j-1}, S_j$ such that $S_i \in T[i], S_{i+1} \in T[i+1], \ldots, S_{j-1} \in T[j-1], S_j \in T[j]$ and $P = S_i[a..k_i]\, S_{i+1} \cdots S_{j-1}\, S_j[1..b]$ for some integers $a, b > 0$.

We can then say that a match starts at column $c_1$ and ends at column $c_2$, where $c_1 = a + \sum_{x=1}^{i-1} k_x$ and $c_2 = b + \sum_{x=1}^{j-1} k_x$. For example, in Figure 1, $S_2 = \texttt{CGGT}$, $S_3 = \texttt{GT}$, $S_4 = \texttt{TAAGT}$, and $P = \texttt{GTGTTAA} = S_2[3..4]\, S_3\, S_4[1..3]$, starting at column 6 and ending at column 12.

Our algorithm uses the forward trie and the backward trie for each $T[i]$.
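As a concrete check of the definitions above, the GD string of Figure 1 can be written down directly. The sketch below is illustrative only: the list-of-lists encoding and the helper names are assumptions made for this example. Because the text uses $n$ both for the number of segments and for the cardinality, the two counts are kept in separate variables. Since every segment is non-empty, a match in the sense above is the same as a substring occurrence in some string of the language, which allows a brute-force check that is exponential in general and meant only for tiny inputs.

```python
from itertools import product

# GD string of Figure 1; each inner list is one segment T[i].
T = [
    ["ACG", "TAA", "CGT", "GTA"],
    ["GATC", "CGGT"],
    ["AC", "GT", "CA"],
    ["TAAGT", "ATGCA"],
    ["ACG", "TTA"],
]

def check_gd(T):
    """Check that every segment is non-empty and holds equal-length strings."""
    return all(len(seg) > 0 and len({len(S) for S in seg}) == 1 for seg in T)

num_segments = len(T)                              # length of the sequence T[1..n]
cardinality = sum(len(seg) for seg in T)           # total number of strings
size = sum(len(S) for seg in T for S in seg)       # N, total number of characters
width = sum(len(seg[0]) for seg in T)              # W, sum of the k_i

def language(T):
    """Enumerate the language of T: one string chosen from each segment."""
    return ("".join(choice) for choice in product(*T))

def start_columns(T, P):
    """1-based start columns of P over all strings of the language (brute force)."""
    cols = set()
    for text in language(T):
        pos = text.find(P)
        while pos != -1:
            cols.add(pos + 1)
            pos = text.find(P, pos + 1)
    return cols
```

For Figure 1 this gives 5 segments, 13 strings, $N = 42$ and $W = 17$, and `start_columns(T, "GTGTTAA")` contains column 6, in agreement with the example.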
The forward trie is a tree on strings in $T[i]$ such that each string $S$ of $T[i]$ corresponds to a distinct leaf $v_S$ and one can follow character-labeled edges from the root to the leaf $v_S$ to spell $S$. The backward trie is the forward trie for the set of strings formed by reversing the strings in $T[i]$. The reverse of string $S$ is $S^r = S[k]S[k-1]\cdots S[1]$, where $k = |S|$.

2.2 Quantum computation

In what follows, we introduce our quantum notation, but we assume the reader is familiar with the basic notions of quantum computing as covered in textbooks [21]. In quantum computation, the state of a qubit is described by a vector $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, that is, a so-called superposition of vectors $|0\rangle = (1, 0)$ and $|1\rangle = (0, 1)$, where $\alpha, \beta \in \mathbb{C}$ are amplitudes satisfying $|\alpha|^2 + |\beta|^2 = 1$. Vectors $|0\rangle$ and $|1\rangle$ form the so-called computational basis, which spans a two-dimensional Hilbert space $\mathcal{H}$ of single-qubit states. The tensor product of two quantum states $|\psi\rangle$ and $|\phi\rangle$ can be written as $|\psi\rangle \otimes |\phi\rangle$ or $|\psi\phi\rangle$ and it is itself a quantum state. In particular, we can take the tensor product of multiple computational basis vectors and obtain the state $|i\rangle = \bigotimes_{k=1}^{n} |b_k\rangle$, where $b_k$ is the $k$-th bit of the $n$-bit binary representation of $i$. For example, $|5\rangle = |101\rangle = |1\rangle \otimes |0\rangle \otimes |1\rangle$. Thus, the set of vectors $\{|i\rangle \mid 0 \le i \le 2^n - 1\}$ forms the computational basis for a $2^n$-dimensional Hilbert space $\mathcal{H}^{\otimes n}$. The quantum state of multiple qubits can then be represented as a vector in this Hilbert space, and can be written as $|\psi\rangle = \sum_{i=0}^{2^n-1} \alpha_i |i\rangle$, where the amplitudes are normalized as $\sum_{i=0}^{2^n-1} |\alpha_i|^2 = 1$. When measuring state $|\psi\rangle$ in the computational basis, with probability $|\alpha_i|^2$ the result is $i$ and the state collapses to $|i\rangle$.
A state $|\psi\rangle \in \mathcal{H}^{\otimes n}$ is called separable with respect to the partition $\mathcal{H}^{\otimes n_1} \otimes \mathcal{H}^{\otimes n_2}$ if it can be written as the tensor product $|\psi\rangle = |\psi_1\rangle |\psi_2\rangle$, where $|\psi_1\rangle \in \mathcal{H}^{\otimes n_1}$, $|\psi_2\rangle \in \mathcal{H}^{\otimes n_2}$ and $n = n_1 + n_2$. A state $|\psi\rangle \in \mathcal{H}^{\otimes n}$ that is not separable under any partition is called entangled. Any unitary transformation $U \in \mathbb{C}^{2^n \times 2^n}$ acting on $\mathcal{H}^{\otimes n}$ maps a quantum state $|\psi\rangle \in \mathcal{H}^{\otimes n}$ to a new quantum state $U|\psi\rangle \in \mathcal{H}^{\otimes n}$. Some unitary transformations that operate on one or two qubits are called gates, and the application of multiple gates is called a quantum circuit.

We adopt the quantum model of computation based on quantum RAM (QRAM) with quantum registers of size $O(\log n)$ (Word-QRAM), as we are interested in optimizing the gate complexity of our algorithms. We will use the notation $|\psi\rangle_R$ to say that the qubits of quantum register $R$ are in state $|\psi\rangle$. In the Word-QRAM model, we can apply the transformation

$$\sum_{i=1}^{n} \alpha_i |i\rangle_I |x\rangle_D \;\to\; \sum_{i=1}^{n} \alpha_i |i\rangle_I |x + A[i]\rangle_D$$

with up to logarithmic overhead, where $I$ (index) and $D$ (data) are quantum registers of $O(\log n)$ qubits and $A[i]$ refers to the $i$-th element of array $A$. When $x = 0$, this can be phrased as accessing the elements of $A$ in superposition. We assume that we can also apply the Pauli gates, the Hadamard gate, the controlled-not gate, the Toffoli gate, their generalizations to $O(\log n)$ qubits, and quantum circuits realizing arithmetic and logic operations on up to $O(\log n)$ qubits with logarithmic overhead. We will disregard these logarithmic factors by adopting the $\tilde{O}$ notation.

The final key ingredient of our algorithms is Grover's search [14] and its generalization in the framework of amplitude amplification [6].

Theorem 2 ([6, 14]).
Let $\mathcal{A}$ be a quantum algorithm with no measurement, such that $\mathcal{A}|0\rangle = \sqrt{1-a}\,|\psi_0\rangle + \sqrt{a}\,|\psi_1\rangle$, where $|\psi_1\rangle$ denotes a superposition of target states, $|\psi_0\rangle$ is a superposition of the non-target states and $a$ is the success probability ($0 < a \le 1$). There exists a quantum algorithm that finds a target state with probability at least $\max(1-a, a)$ using $O(1/\sqrt{a})$ applications of $\mathcal{A}$ and $\mathcal{A}^{-1}$.

If $\mathcal{A}$ is a classical function, this corresponds to standard Grover's search. Since $\mathcal{A}$ can also be a Grover's search itself, this result allows us to nest Grover's searches one into the other to achieve better speed-ups. In our context, the success probability $a$ is the ratio between the number of solutions and the number of possibilities. For example, if we compare two strings both of length $n$ and we look for a single-character mismatch, then $a = \frac{m}{n}$, where $m$ is the number of single-character mismatches. Then, the complexity of finding one mismatch is $O\left(\frac{1}{\sqrt{a}}\right) = O\left(\sqrt{\frac{n}{m}}\right)$. This goes to $O(\sqrt{n})$ in the worst case, that is when $m = 1$.

3 Classical Parallel Algorithm

We first give a high-level idea of a classical parallel algorithm finding a match for a pattern string $P$ in a GD string $T$. This helps us set up the intuition for the quantum algorithm. The strategy is to use $m = |P|$ threads $t_1, \ldots, t_m$ to compute information about the prefixes of pattern $P$ while we scan GD string $T$, using the forward and backward tries of the segments of $T$. The purpose of this algorithm is not to be efficient, but rather to provide a framework to better interpret the quantum algorithm. Moreover, for the sake of exposition we assume that $k_i < m$ for every $1 \le i \le n$, both here and in the quantum algorithm. This implies that $P$ does not occur as a substring of any string in $T$.
We will drop this assumption in Section 6 by designing a custom quantum algorithm for finding such occurrences, and showing that the correctness of our main algorithm is not affected.

Algorithm idea

The parallel classical algorithm proceeds as follows. After instantiating the $m$ threads, we start scanning $T$ from left to right, segment by segment, in a for-loop of $n$ iterations. At first, let us focus on what thread $t_1$ does. Starting from the first segment $T[1]$ at iteration 1, thread $t_1$ checks whether $P[1..k_1] \in T[1]$ using the forward trie of $T[1]$. At iteration 2, $t_1$ checks whether $P[k_1+1..k_1+k_2] \in T[2]$, assuming $k_1 + k_2 < m$. At some later iteration $i$, $t_1$ will reach the end of $P$, and it will try to match a suffix of $P$ as a prefix of a string in $T[i]$, ending at a certain column $c$. From column $c + 1$, $t_1$ tries to match a prefix of $P$ as a suffix of a string in $T[i]$ using the backward trie of $T[i]$, and it will continue to match $P$ in $T$ in this manner until reaching the end of $T$. This means that, after scanning $T$, $t_1$ can tell whether there is a match for $P$ in $T$ starting from some column $m \cdot r$, for some integer $r$ such that $m \cdot r < W$. Any other generic thread $t_h$ does the same as $t_1$, but it starts with a shift of $h$ columns. In other words, a generic thread $t_h$ can tell whether there is a match for $P$ in $T$ starting from some column $h + m \cdot r$. As depicted in Figure 2, it is now straightforward to see that, if there is a match for $P$ in $T$, there will be a thread able to find it, because $h$ ranges from 1 to $m$, thus covering all columns.

Figure 2: Abstract representation of different threads trying to match pattern $P$ in GD string $T$ starting from different positions. Each dash symbol represents a single character, thus $P$ has $m = 5$ characters and $T$ has $N = 52$. Each thread $t_h$ tries to match $P$ column by column, with a shift of $h - 1$ positions w.r.t. thread $t_1$, namely $t_1$ is shifted by 0 positions and $t_5$ is shifted by 2 positions. The characters highlighted in green show that thread $t_2$ finds a match at position $h + r \cdot m = 2 + 1 \cdot 5 = 7$. The grayed-out characters represent comparisons that will be tested but that cannot become full matches. Variable $i$ counts the iterations of the main for-loop.

Oracle functions for the quantum algorithm

In order to convert this parallel algorithm into an efficient quantum algorithm, we will replace the $m$ threads with a superposition of $m$ states, and we will need to nest three Grover's searches $G_1$, $G_2$, and $G_3$, one into the other. The goal is to solve the subproblems described by the following Boolean functions:

1. $f_1(h) = 1$ if and only if $P$ has a match starting from some column $h + r \cdot m$ in $T$, for some integer $r$ such that $m \cdot r < W$;
2. $f_2(T[i], P[j_{i,h}..j_{i,h} + k_i]) = 1$ if and only if $T[i][s] = P[j_{i,h}..j_{i,h} + k_i]$, for some $s \in [1, |T[i]|]$;
3. $f_3(T[i][s][c], P[j_{i,h} + c]) = \left(T[i][s][c] \ne P[j_{i,h} + c]\right)$, where $c \in [1, k_i]$.

These will be the oracle functions used by three nested Grover's searches $G_1$, $G_2$, and $G_3$, respectively, going from the outer level to the inner one.

4 Quantum Algorithm

The quantum algorithm simulates the parallel one by replacing the threads with a superposition. We recall that here we assume that $k_i < m$ for every $1 \le i \le n$, and we show how to drop this assumption in Section 6.
In our algorithm, we will use quantum registers $\mathrm{ID}$, $\mathrm{active}_i$, $\mathrm{match}_i$, $K_i$, $\mathrm{ext}_i$, $\mathrm{suffm}_i$, $\mathrm{prefm}_i$, where subscript $i$ refers to a generic $i$-th instance among the $n$ copies of a certain register. We use these registers according to the following logic:

• $|h\rangle_{\mathrm{ID}}$ is such that $0 \le h \le m - 1$, and serves as a substate identifier;
• $|j_{i,h}\rangle_{\mathrm{prefix}}$ is such that $1 \le j_{i,h} \le m$, identifying which prefix of $P$ is managed by substate $h$ in the current iteration;
• $|k_i\rangle_{K_i}$ stores the width $k_i$ of segment $T[i]$;
• $|a_{i,h}\rangle_{\mathrm{active}_i} = |1\rangle$ if $P[1..j_{i,h}]$ matches a suffix of a string in $T[1] \cdots T[i-1]$, $a_{i,h} = 0$ otherwise;
• $|m_{i,h}\rangle_{\mathrm{match}_i} = |1\rangle$ if at least one full match for $P$ was found in GD string $T[1]T[2] \cdots T[i-1]$, $m_{i,h} = 0$ otherwise;
• $|ext(j_{i,h}, i, k_i)\rangle_{\mathrm{ext}_i} = |1\rangle$ if substring $P[j_{i,h}+1..j_{i,h}+k_i] \in T[i]$ (extension), $ext(j_{i,h}, i, k_i) = 0$ otherwise;
• $|sm(j_{i,h}, i, k_i)\rangle_{\mathrm{suffm}_i} = |1\rangle$ if suffix $P[j_{i,h}..m]$ is a prefix of a string in $T[i]$ (suffix match), $sm(j_{i,h}, i, k_i) = 0$ otherwise;
• $|pm(j_{i,h}, i, k_i)\rangle_{\mathrm{prefm}_i} = |1\rangle$ if prefix $P[1..j_{i,h}+k_i-m]$ is a suffix of a string in $T[i]$ (prefix match), $pm(j_{i,h}, i, k_i) = 0$ otherwise.

In addition to these registers, we also assume the use of all auxiliary qubits necessary to compute functions that can be implemented with a classical circuit. At the beginning, we apply Hadamard gates on register $\mathrm{ID}$ to set up the balanced superposition

$$\frac{1}{\sqrt{m}} \sum_{h=0}^{m-1} |h\rangle_{\mathrm{ID}} |0 \cdots 0\rangle.$$

Then, we perform a Grover's search with input register $|h\rangle_{\mathrm{ID}}$, using the other registers to compute the oracle function, which in turn is implemented by two more levels of nested Grover's searches. Let $G_1$, $G_2$, and $G_3$ be the three unitary operators implementing these Grover's searches. We now explain how to construct each one of them, starting from the most nested $G_3$ to the least nested $G_1$.
Matching a string in a segment

Operator $G_3$ is a Grover's search that checks whether two given strings $A$ and $B$ of the same length $l$ are equal by looking for a single-character mismatch. The oracle is the Boolean function $f_3(c, A, B) = \left(A[c+1] \ne B[c+1]\right)$, $c \in [0, l-1]$, which can be implemented with a classical circuit. Thus, $G_3$ performs the transformation

$$G_3 |\psi_{\mathrm{string}}\rangle = G_3 \left( \frac{1}{\sqrt{l}} \sum_{c=0}^{l-1} |c\rangle |A\rangle |B\rangle \right) = \sum_{c=0}^{l-1} \alpha_c |c\rangle |A\rangle |B\rangle = |\psi'_{\mathrm{string}}\rangle.$$

Theorem 2 guarantees that, if $|c_{f_3}\rangle = \sum_{0 \le c \le l-1,\ f_3(c)=1} |c\rangle$ are those states for which $f_3$ evaluates to 1, then $|\langle c_{f_3} | \psi'_{\mathrm{string}} \rangle|^2 > 2/3$. Notice that $|A\rangle |B\rangle$ is information present in memory that does not depend on $c$, and accessing the single characters of $A$ and $B$ can be done employing QRAM queries.

Operator $G_2$ is a Grover's search that allows us to find a match for a string $A$ in a segment $T[i]$. The oracle is the Boolean function $f_2(s, A)$, which returns 1 if string $A$ equals the $s$-th string of $T[i]$, and which we implement through $G_3$. Notice that, as done in previous works [22], $G_3$ detects mismatches, so we will set $f_2(s, A) = 0$ whenever $f_3(c, A, B) = 1$ for some $c$, otherwise we set $f_2(s, A) = 1$. We hence have that operator $G_2$ performs the transformation

$$G_2 |\psi_{\mathrm{segment}}\rangle = G_2 \left( \frac{1}{\sqrt{|T[i]|}} \sum_{s=0}^{|T[i]|-1} |s\rangle |A\rangle \right) = \sum_{s=0}^{|T[i]|-1} \alpha_s |s\rangle |A\rangle = |\psi'_{\mathrm{segment}}\rangle.$$

Again, Theorem 2 guarantees that, if $|s_{f_2}\rangle = \sum_{0 \le s \le |T[i]|-1,\ f_2(s)=1} |s\rangle$ are those states for which $f_2$ evaluates to 1, then $|\langle s_{f_2} | \psi'_{\mathrm{segment}} \rangle|^2 > 2/3$. As before, the characters of $|A\rangle$ can be retrieved with QRAM queries.

For-loop computing $f_1$

The outermost Grover's search, realized by operator $G_1$, solves the problem of finding a match for pattern string $P$ in GD string $T$.
The oracle is the Boolean function $f_1(h, P, T)$, which returns 1 if and only if $P$ has a match starting from some column $h + r \cdot m$ in $T$, for some integer $r$ such that $m \cdot r < W$. Implementing this oracle function encapsulates the main logic of our algorithm. Notice that $f_1(0, P, T) \vee \cdots \vee f_1(m-1, P, T) = 1$ if and only if there is at least one match for $P$ in $T$.

Let us now see how to compute $f_1$. Starting from a balanced superposition of $m$ "quantum threads", the first step is to initialize register $\mathrm{prefix}$ by copying the values from register $\mathrm{ID}$ with CNOT gates:

$$|\psi_0\rangle = \frac{1}{\sqrt{m}} \sum_{h=0}^{m-1} |h\rangle_{\mathrm{ID}} |0\rangle_{\mathrm{prefix}} |0 \cdots 0\rangle |P\rangle |T\rangle \;\to\; \frac{1}{\sqrt{m}} \sum_{h=0}^{m-1} |h\rangle_{\mathrm{ID}} |h\rangle_{\mathrm{prefix}} |0 \cdots 0\rangle |P\rangle |T\rangle.$$

Then, we scan GD string $T$ from left to right, segment by segment, in a for-loop of $n$ iterations. To see how to perform one generic iteration $i$, assume that right before that iteration the system is in state

$$|\psi_i\rangle = \frac{1}{\sqrt{m}} \sum_{h=0}^{m-1} |h\rangle_{\mathrm{ID}} |\phi_h\rangle,$$

where

$$|\phi_h\rangle = |j_{i,h}\rangle_{\mathrm{prefix}} |a_{i,h}\rangle_{\mathrm{active}_i} |m_{i,h}\rangle_{\mathrm{match}_i} |k_i\rangle_{K_i} |0\rangle_{\mathrm{ext}_i} |0\rangle_{\mathrm{suffm}_i} |0\rangle_{\mathrm{prefm}_i}$$

such that the following holds: index $j_{i,h} \in [1, m]$ is a position in $P$, $a_{i,h} = 1$ if and only if $P[1..j_{i,h}]$ matches a suffix of a string in $T[1] \cdots T[i-1]$, and $m_{i,h} = 1$ if and only if $P$ has a match in $T[1] \cdots T[i-1]$ starting at column $h + m \cdot r$ for some integer $r$ such that $m \cdot r < W$.

We now compute values $ext(j_{i,h}, i, k_i)$, $sm(j_{i,h}, i, k_i)$ and $pm(j_{i,h}, i, k_i)$ using nested Grover's searches $G_2$ and $G_3$. We give the details of only the computation of $ext(j_{i,h}, i, k_i)$, and sketch the computation of $sm(j_{i,h}, i, k_i)$ and $pm(j_{i,h}, i, k_i)$, as they are very similar. Value $ext(j_{i,h}, i, k_i) = 1$ if and only if $P[j_{i,h}..j_{i,h}+k_i] \in T[i]$. To compute it, we run $G_2$ over string $P[j_{i,h}..j_{i,h}+k_i]$ and segment $T[i]$.
Notice that this is done in superposition, and every "quantum thread" checks a different substring of $P$ against the same $T[i]$. After computing $ext(j_{i,h}, i, k_i)$, values $sm(j_{i,h}, i, k_i)$ and $pm(j_{i,h}, i, k_i)$ can be computed using the same strategy. The main difference is that, for $sm(j_{i,h}, i, k_i)$, Grover's search $G_2$ will have to span the prefixes of the strings in $T[i]$ of length $m - j_{i,h} + 1$ and compare them against $P[j_{i,h}..m]$, while for $pm(j_{i,h}, i, k_i)$ operator $G_2$ has to act on suffixes of length $j_{i,h}$ and compare them against $P[1..j_{i,h}]$.

At this point, the system is in state

$$\frac{1}{\sqrt{m}} \sum_{h=0}^{m-1} \Big( |h\rangle_{\mathrm{ID}} |j_{i,h}\rangle_{\mathrm{prefix}} |a_{i,h}\rangle_{\mathrm{active}_i} |m_{i,h}\rangle_{\mathrm{match}_i} |k_i\rangle_{K_i} \otimes |ext(j_{i,h}, i, k_i)\rangle_{\mathrm{ext}_i} |sm(j_{i,h}, i, k_i)\rangle_{\mathrm{suffm}_i} |pm(j_{i,h}, i, k_i)\rangle_{\mathrm{prefm}_i} \Big)$$

and we have to update registers $\mathrm{prefix}_{i+1,h}$, $a_{i+1,h}$ and $m_{i+1,h}$. These are computed by classical computation performed in superposition. We set

$$\mathrm{prefix}_{i+1,h} = \mathrm{prefix}_{i,h} + k_i \bmod m.$$

The updates for $\mathrm{active}_{i+1,h}$ and $m_{i+1,h}$ are formally explained in Subroutine 1, but intuitively they go as follows. First we check whether $j_{i,h} + k_i < m$. If yes, then the pattern spans the entire segment $T[i]$, and we have to check if we can extend a match, in the case where we had an active prefix ($a_{i,h} = 1$). If not, then the pattern ends in segment $T[i]$, and thus we have to check if we got a full match (updating $\mathrm{match}_{i+1,h} = 1$), and if a new match can start in this segment (updating $a_{i+1,h} = 1$). Then, we make the update $j_{i,h} \leftarrow j_{i,h} + k_i$, so that register $\mathrm{prefix}$ points to the next segment to process. We point out that using new registers at each iteration is needed since quantum computing must be reversible.
Right after iteration $i$, we get the following state, which is the starting point for the next iteration:

$$|\psi_{i+1}\rangle = \frac{1}{\sqrt{m}} \sum_{h=0}^{m-1} \Big( |h\rangle_{\mathrm{ID}} |j_{i,h} + k_i\rangle_{\mathrm{prefix}} |a_{i+1,h}\rangle_{\mathrm{active}_{i+1}} |m_{i+1,h}\rangle_{\mathrm{match}_{i+1}} \otimes |k_{i+1}\rangle_{K_{i+1}} |0\rangle_{\mathrm{ext}_{i+1}} |0\rangle_{\mathrm{suffm}_{i+1}} |0\rangle_{\mathrm{prefm}_{i+1}} \Big)$$

    if $j_{i,h} + k_i < m$ then
        $a_{i+1,h} \leftarrow a_{i,h} \wedge ext(j_{i,h}, i, k_i)$;
    else  // $j_{i,h} + k_i \ge m$
        $a_{i+1,h} \leftarrow pm(j_{i,h}, i, k_i)$;
        $m_{i+1,h} \leftarrow m_{i,h} \vee (sm(j_{i,h}, i, k_i) \wedge a_{i,h})$;
    end

Subroutine 1: Register updates for one iteration of the computation of oracle function $f_1$.

After the last iteration of the for-loop, we have register $\mathrm{match}$ storing superposition $\sum_{h=0}^{m-1} |m_{n,h}\rangle_{\mathrm{match}_n}$, where $m_{n,h} = 1$ if at any iteration quantum thread $h$ found a match. Thus, applying a $Z$ gate on this register will flip the phase and mark the elements for which we want to amplify the amplitude. This concludes the computation of $f_1$.

Finding a match for $P$ in $T$

Now that we know how to compute $f_1$, we can use it as the oracle function. Thus, operator $G_1$ is a Grover's search that allows us to find a match for a string $P$ in a GD string $T$. The oracle is the Boolean function $f_1(h, P, T)$, and thus $G_1$ applies the transformation

$$G_1 |\psi_{\mathrm{thread}}\rangle = G_1 \left( \frac{1}{\sqrt{m}} \sum_{h=0}^{m-1} |h\rangle_{\mathrm{ID}} |P\rangle |T\rangle \right) = \sum_{h=0}^{m-1} \alpha_h |h\rangle_{\mathrm{ID}} |P\rangle |T\rangle = |\psi'_{\mathrm{thread}}\rangle,$$

where, if $|h_{f_1}\rangle = \sum_{0 \le h \le m-1,\ f_1(h,P,T)=1} |h\rangle$ are those states for which $f_1$ evaluates to 1, then $|\langle h_{f_1} | \psi'_{\mathrm{thread}} \rangle|^2 > 2/3$. Therefore, measuring register $\mathrm{ID}$ returns the index of a thread that detected a match with probability greater than $2/3$, or a random thread if there was no match. This probability can be boosted arbitrarily, and one more evaluation of oracle function $f_1$ can tell whether the result indicates the existence of a match or not.
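The for-loop and the updates of Subroutine 1 can be mirrored classically, one shift at a time. The sketch below is a sequential illustration, not the quantum algorithm: $ext$, $sm$ and $pm$ are computed by direct string comparison instead of the nested Grover's searches, the function names are hypothetical, and a single index convention is fixed for the whole sketch (here `j` counts the pattern characters already consumed before the current segment, so the next character to match is $P[j+1]$, and a thread with `j = 0` is treated as active because a new occurrence may start at the segment boundary). As in the text, it assumes $k_i < m$ for every segment.

```python
def f1_classical(T, P, h):
    """Classical stand-in for oracle f1 on shift h, following Subroutine 1.

    Convention of this sketch (an assumption, not the paper's exact indexing):
    j = number of characters of P already consumed before the current segment.
    """
    m = len(P)
    j = h
    active = (j == 0)      # an occurrence may start right at the first column
    match = False
    for segment in T:
        k = len(segment[0])
        assert k < m, "sketch assumes every segment is narrower than the pattern"
        if j + k < m:
            # The pattern spans the whole segment: try to extend (ext).
            ext = P[j:j + k] in segment
            active = active and ext
        else:
            # The occurrence ends in this segment (sm), a new one may start (pm).
            tail = P[j:]                     # suffix of P still to be matched
            head = P[:j + k - m]             # prefix of the next occurrence
            sm = any(S.startswith(tail) for S in segment)
            pm = any(S.endswith(head) for S in segment)
            match = match or (sm and active)
            active = pm
        j = (j + k) % m
    return match


def smgd_classical(T, P):
    """OR of f1 over all m shifts; G1 searches for a shift h with f1 = 1."""
    return any(f1_classical(T, P, h) for h in range(len(P)))
```

On the GD string of Figure 1 with $P = \texttt{GTGTTAA}$, the shift `h = 2` under this convention becomes active in $T[2]$ (via $\texttt{CGGT}$), extends through $T[3]$ and completes the match in $T[4]$.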
5 Analysis

Let $f_1$, $f_2$ and $f_3$ be respectively the oracle functions of the three nested Grover's searches, from outermost to innermost. We recall that

• $f_1(h) = 1$ if and only if $P$ has a match starting from some column $h + r \cdot m$ in $T$, for some integer $r$;
• $f_2(h, i, s) = 1$ if and only if $T[i][s+1] = P[j_{i,h}+1..j_{i,h}+k_i]$, for $s \in [0, |T[i]|-1]$;
• $f_3(h, i, s, c) = 1$ if and only if $T[i][s+1][c+1] \ne P[j_{i,h}+c+1]$.

Given that function $f_3$ can be computed in constant time with a classical circuit, a standard application of Grover's search on input $|\psi_{\mathrm{string}}\rangle$ implies the following lemma.

Let $j_{i,h}$, $a_{i,h}$ and $m_{i,h}$ be the values stored in superposition in registers $\mathrm{prefix}$, $\mathrm{active}$ and $\mathrm{match}$ for state $\sum_{h,s,c} \alpha_{h,s,c} |\phi_h\rangle |s\rangle |c\rangle$, where the summation also goes over $s$ and $c$ because $G_2$ and $G_3$ are not perfect oracles, and we can have spurious states with small amplitudes. We now consider only those states $\alpha_{h,\bar{s},\bar{c}} |\phi_h\rangle |\bar{s}\rangle |\bar{c}\rangle$ of maximal amplitude, namely such that $|\alpha_{h,\bar{s},\bar{c}}| = \max_{s,c} |\alpha_{h,s,c}|$.

Lemma 3. The for-loop of our quantum algorithm maintains the following invariant right before every iteration $i$:

Invariant: for every $h \in [0, m-1]$ it holds that:

• $j_{i,h} = \sum_{t=1}^{i-1} k_t$;
• $j_{i,h+1} = j_{i,h} + 1$, where addition is performed mod $m$;
• $a_{i,h} = 1$ if and only if $P[1..j_{i,h}-1]$ matches a suffix of $T[1] \cdots T[i-1]$;
• $m_{i,h} = 1$ if and only if $P$ has a match in $T[1] \cdots T[i-1]$.

Proof. To prove the correctness of the updates, we proceed by induction on the iteration number $i$, showing that the invariant holds right before iteration $i$.

Base case, $i = 1$. When the system is initialized with all threads active we have $j_{1,h} = h$, $a_{1,h} = 0$, $m_{1,h} = 0$ for every $h$, and the invariant holds trivially as no text has been processed and no partial matches exist beyond the empty prefix.
Inductive case, $i > 1$. We assume the invariant holds before iteration $i$, and we prove that it still holds after iteration $i$, right before iteration $i+1$. We show that the values for any thread $h$ are updated correctly. Since we compute the new values of active and match according to Subroutine 1, we have two cases: (i) $j_{i,h} + k_i < m$, (ii) $j_{i,h} + k_i \ge m$.

In case (i), if $a_{i,h} = 0$ then by the inductive hypothesis prefix $P[1..j_{i,h}-1]$ is not matching, and we correctly set $a_{i+1,h} = 0$. Otherwise, $a_{i,h} = 1$ and by the inductive hypothesis the prefix $P[1..j_{i,h}-1]$ has an active match. If that occurrence extends into $T[i]$, there must exist a string $s \in T[i]$ matching $P[j_{i,h}..j_{i,h}+k_i]$. Whether such a string exists or not is determined by $G_2$, whose result is the value $\mathrm{ext}(j_{i,h}, i, k_i)$, and in this case we properly set $a_{i+1,h} = \mathrm{ext}(j_{i,h}, i, k_i)$.

In case (ii), an occurrence of $P$ might end in $T[i]$, a new one could start, and the record of an old one could carry over. Whether a string $s \in T[i]$ has a prefix matching $P[j_{i,h}..m]$ or not is found with another application of $G_2$, which sets the value $\mathrm{sm}(j_{i,h}, i, k_i)$. Then, $m_{i+1,h}$ is set to 1 if both $\mathrm{sm}(j_{i,h}, i, k_i) = 1$ and $a_{i,h} = 1$, that is, if by the inductive hypothesis there is an active match for $P[1..j_{i,h}-1]$. To set $a_{i+1,h}$, it suffices to detect whether $P[1..k_i - m + j_{i,h}]$ matches a suffix of a string in $T[i]$, which is done with the third application of $G_2$, yielding $\mathrm{pm}(j_{i,h}, i, k_i)$. If $m_{i,h} = 1$, by the inductive hypothesis we already found a match at a previous iteration, thus we correctly set $m_{i+1,h} = 1$.

Lastly, we always increment $j_{i,h}$ by $k_i$ at the end of the iteration, and since by the inductive hypothesis $j_{i,h} = \sum_{t=1}^{i-1} k_t$, we have $j_{i+1,h} = k_i + \sum_{t=1}^{i-1} k_t = \sum_{t=1}^{i} k_t$. From this it also follows that $j_{i+1,h+1} = j_{i+1,h} + 1$.

Lemma 4.
Let GD string $T$ have $n$ segments $T[i]$, each containing strings of length $k_i$. The for-loop of our quantum algorithm correctly computes oracle function $f_1$ in time $\tilde{O}\bigl(\sum_{i=1}^{n} \sqrt{|T[i]| \cdot k_i}\bigr)$ with constant probability of success $p_1 > 2/3$.

Proof. Using Lemma 3, we know that after iteration $n$ (technically right before a virtual iteration $n+1$), we have $m_{n,h} = 1$ if and only if thread $h$ found at least one match for $P$ in $T[1] \cdots T[n] = T$, which guarantees the correctness of the computation of $f_1$. Thus, the correctness of the entire algorithm and the success probability of $p_1 > 2/3$ follow from Lemma 3, Theorem 2 applied to $G_2$ and $G_3$, and the following observation. At any iteration $i$ of the for-loop, for every column in $T[i]$ there exists a thread $h$ that tries to start a match from that column. To see this, recall that $j_{i,h}$ is a position in $P$ and that a thread tries to match prefix $P[1..j_{i,h} - m + k_i]$ as a suffix of a string in $T[i]$ when $j_{i,h} + k_i \ge m$. Thus, $j_{i,h} + k_i$ assumes all values from $1 + k_i$ to $m + k_i$ as a function of $h$, which means there is a thread for every column.

At iteration $i$, $G_3$ is applied to strings that are not longer than $k_i$, thus its complexity is $\tilde{O}(\sqrt{k_i})$. The most expensive computation of $G_2$ is the one for $\mathrm{ext}(j_{i,h}, i, k_i)$, as it involves the longest strings. This takes time $\tilde{O}\bigl(\sqrt{|T[i]| \cdot k_i}\bigr)$, as there are $|T[i]|$ strings in a segment and $G_3$ is called as the oracle. This is then the complexity of processing one segment. Since we process $n$ segments sequentially, we have $\tilde{O}\bigl(\sum_{i=1}^{n} \sqrt{|T[i]| \cdot k_i}\bigr)$.

Our main result, Theorem 1, states that SMGD on a pattern string $P$ of length $m$ and a GD string of $n$ segments and size $N$ can be solved in time $\tilde{O}(\sqrt{mnN})$ by a quantum algorithm with high probability. We are now ready to formally prove this result.

Proof of Theorem 1.
From Lemma 4, if an occurrence of $P$ exists in $T$, there exists at least one thread $h$ such that $m_{n,h} = 1$. The algorithm applies Grover's search $G_1$ to the equally balanced superposition $\frac{1}{\sqrt{m}} \sum_{h=0}^{m-1} |h\rangle |0\rangle \cdots |0\rangle$, and from the properties of Grover's search [14] it follows that measuring the system after the appropriate number of iterations yields a state $|h\rangle$ such that $f_1(h) = 1$ with probability $p > 2/3$ if at least one match exists. If no match exists, the oracle never marks any state and the measurement returns a random thread index. We can verify whether $f_1(h) = 1$ or $f_1(h) = 0$ by running one more evaluation of the oracle function. The computation of $G_1$ requires $O(\sqrt{m})$ queries to oracle function $f_1$. Combining this with Lemma 4, we obtain a total complexity of

$$\tilde{O}\left( \sqrt{m} \sum_{i=1}^{n} \sqrt{|T[i]| \cdot k_i} \right).$$

Using the Cauchy–Schwarz inequality (details in Appendix A), this can be shown to be bounded by $\tilde{O}(\sqrt{mnN})$.

As for space complexity, we need $O(\log m)$ qubits for the register representing thread IDs, and $O(\log N)$ qubits for the registers introduced at each iteration, as the values $k_i$ could be of order $N$ in the worst case. Since we need to introduce $n$ of these registers, one per iteration, the total space complexity is $O(\log m + n \log N)$ qubits.

6 Dropping assumptions

So far, we assumed that $k_i < m$ for every $1 \le i \le n$. Let us now drop this assumption. We can still find a match for $P$ in $T$ without worsening the complexity. To this end, let us first design a quantum algorithm that checks whether $P$ has a match as a substring of a string in $T$. We achieve this with two nested Grover's searches, the outer one ranging over all $N$ characters in $T$, and the inner one ranging over all $m$ pattern positions in $P$.
The inner Grover's search uses an oracle function that detects two things: single-character mismatches, and whether the character being checked is the last one in a segment string when we are not at the last position of the pattern. The outer Grover's search uses the inner one as its oracle function. Thus, this takes $\tilde{O}(\sqrt{mN})$ time in total, which is less than the overall complexity $\tilde{O}(\sqrt{mnN})$.

If the above preprocessing procedure found no matches, we can continue with our main algorithm assuming that $P$ does not match as a substring of a string in $T$. However, it can still be the case that $k_i > m$ for some $i$, requiring a small modification to the algorithm. In the main for loop, whenever we encounter a segment $T[i]$ such that $k_i > m$, we do not compute $\mathrm{ext}(j_{i,h}, i, k_i)$, but only $\mathrm{sm}(j_{i,h}, i, k_i)$ and $\mathrm{pm}(j_{i,h}, i, k_i)$. The observation from the proof of Lemma 4 no longer holds: not every column has a quantum thread that tries to start a match from it. However, it is easy to see that the columns we do not cover were already checked by the preprocessing procedure explained above.

References

[1] Shyan Akmal and Ce Jin. Near-optimal quantum algorithms for string problems. Algorithmica, 85(8):2260–2317, 2023.
[2] Mai Alzamel, Lorraine A. K. Ayad, Giulia Bernardini, Roberto Grossi, Costas S. Iliopoulos, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Degenerate string comparison and applications. In 18th International Workshop on Algorithms in Bioinformatics (WABI), volume 113 of LIPIcs, pages 21:1–21:14, 2018.
[3] Mai Alzamel, Lorraine A. K. Ayad, Giulia Bernardini, Roberto Grossi, Costas S. Iliopoulos, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Comparing degenerate strings. Fundam. Informaticae, 175(1-4):41–58, 2020.
[4] Rocco Ascone, Giulia Bernardini, Alessio Conte, Massimo Equi, Estéban Gabory, Roberto Grossi, and Nadia Pisanti. A unifying taxonomy of pattern matching in degenerate strings and founder graphs. In Solon P. Pissis and Wing-Kin Sung, editors, 24th International Workshop on Algorithms in Bioinformatics, WABI 2024, Royal Holloway, London, United Kingdom, September 2-4, 2024, volume 312 of LIPIcs, pages 14:1–14:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2024.
[5] Mahdi Boroujeni, Soheil Ehsani, Mohammad Ghodsi, MohammadTaghi Hajiaghayi, and Saeed Seddighin. Approximating edit distance in truly subquadratic time: Quantum and mapreduce. J. ACM, 68(3):19:1–19:41, 2021.
[6] Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. Quantum amplitude amplification and estimation. In Quantum Computation and Information, volume 305 of Contemporary Mathematics, pages 53–74. American Mathematical Society, Providence, RI, 2002.
[7] Harry Buhrman, Subhasree Patro, and Florian Speelman. A framework of quantum strong exponential-time hypotheses. In Markus Bläser and Benjamin Monmege, editors, 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, Saarbrücken, Germany (Virtual Conference), March 16-19, 2021, volume 187 of LIPIcs, pages 19:1–19:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
[8] Parisa Darbari, Daniel Gibney, and Sharma V. Thankachan. Quantum time complexity and algorithms for pattern matching on labeled graphs. In String Processing and Information Retrieval - 29th International Symposium, SPIRE 2022, Concepción, Chile, November 8-10, 2022, Proceedings, volume 13617 of Lecture Notes in Computer Science, pages 303–314. Springer, 2022.
[9] Massimo Equi, Arianne Meijer-van de Griend, and Veli Mäkinen. From bit-parallelism to quantum string matching for labelled graphs.
In Laurent Bulteau and Zsuzsanna Lipták, editors, 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023, Marne-la-Vallée, France, June 26-28, 2023, volume 259 of LIPIcs, pages 9:1–9:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
[10] Massimo Equi, Tuukka Norri, Jarno Alanko, Bastien Cazaux, Alexandru I. Tomescu, and Veli Mäkinen. Algorithms and complexity on indexing elastic founder graphs. In Hee-Kap Ahn and Kunihiko Sadakane, editors, 32nd International Symposium on Algorithms and Computation, ISAAC 2021, Fukuoka, Japan, December 6-8, 2021, volume 212 of LIPIcs, pages 20:1–20:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
[11] François Le Gall and Saeed Seddighin. Quantum meets fine-grained complexity: Sublinear time quantum algorithms for string problems. Algorithmica, 85(5):1251–1286, 2023.
[12] Daniel Gibney, Ce Jin, Tomasz Kociumaka, and Sharma V. Thankachan. Near-optimal quantum algorithms for bounded edit distance and Lempel-Ziv factorization. In David P. Woodruff, editor, Proceedings of the 2024 ACM-SIAM Symposium on Discrete Algorithms, SODA 2024, Alexandria, VA, USA, January 7-10, 2024, pages 3302–3332. SIAM, 2024.
[13] Roberto Grossi, Costas S. Iliopoulos, Chang Liu, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, Giovanna Rosone, Fatima Vayani, and Luca Versari. On-line pattern matching on similar texts. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 78 of LIPIcs, pages 9:1–9:14, 2017.
[14] Lov K. Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, USA, May 22-24, 1996, pages 212–219. ACM, 1996.
[15] Costas S. Iliopoulos, Ritu Kundu, and Solon P. Pissis. Efficient pattern matching in elastic-degenerate strings. Information and Computation, 279:104616, 2021.
[16] Costas S. Iliopoulos, Laurent Mouchard, and Mohammad Sohel Rahman. A new approach to pattern matching in degenerate DNA/RNA sequences and distributed pattern matching. Math. Comput. Sci., 1(4):557–569, 2008.
[17] Ce Jin and Jakob Nogler. Quantum speed-ups for string synchronizing sets, longest common substring, and k-mismatch matching. ACM Trans. Algorithms, 20(4):32:1–32:36, 2024.
[18] Kamil Khadiev and Danil Serov. Quantum algorithm for the multiple string matching problem. In SOFSEM 2025: Theory and Practice of Computer Science: 50th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2025, Bratislava, Slovak Republic, January 20–23, 2025, Proceedings, Part II, pages 58–69, Berlin, Heidelberg, 2025. Springer-Verlag.
[19] Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323–350, 1977.
[20] Veli Mäkinen, Bastien Cazaux, Massimo Equi, Tuukka Norri, and Alexandru I. Tomescu. Linear time construction of indexable founder block graphs. In 20th International Workshop on Algorithms in Bioinformatics (WABI), volume 172 of LIPIcs, pages 7:1–7:18, 2020.
[21] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information: 10th Anniversary Edition. Cambridge University Press, 2010.
[22] Hariharan Ramesh and V Vinay. String matching in $O(\sqrt{n} + \sqrt{m})$ quantum time. Journal of Discrete Algorithms, 1(1):103–110, 2003. Combinatorial Algorithms.
[23] Nicola Rizzo, Massimo Equi, Tuukka Norri, and Veli Mäkinen. Elastic founder graphs improved and enhanced. Theor. Comput. Sci., 982:114269, 2024.
[24] Qisheng Wang and Mingsheng Ying. Quantum algorithm for lexicographically minimal string rotation. Theory Comput. Syst., 68(1):29–74, 2024.

A Complexity

Lemma 5.
Let $N = \sum_{i=1}^{n} |T[i]|\, k_i$ be the total length of all strings constituting the GD string $T$. The following inequality holds:

$$\sum_{i=1}^{n} \sqrt{|T[i]|\, k_i} \le \sqrt{nN}.$$

Proof. We apply the Cauchy–Schwarz inequality, which states that for any sequences of real numbers $(a_1, \ldots, a_n)$ and $(b_1, \ldots, b_n)$,

$$\left( \sum_{i=1}^{n} a_i b_i \right)^2 \le \left( \sum_{i=1}^{n} a_i^2 \right) \left( \sum_{i=1}^{n} b_i^2 \right).$$

• Let $a_i = 1$ for all $i = 1, \ldots, n$.
• Let $b_i = \sqrt{|T[i]|\, k_i}$.

Substituting these into the inequality, we have

$$\left( \sum_{i=1}^{n} 1 \cdot \sqrt{|T[i]|\, k_i} \right)^2 \le \left( \sum_{i=1}^{n} 1^2 \right) \left( \sum_{i=1}^{n} \left( \sqrt{|T[i]|\, k_i} \right)^2 \right),$$

where

$$\sum_{i=1}^{n} 1^2 = \sum_{i=1}^{n} 1 = n \quad \text{and} \quad \sum_{i=1}^{n} \left( \sqrt{|T[i]|\, k_i} \right)^2 = \sum_{i=1}^{n} |T[i]|\, k_i.$$

By definition, $\sum_{i=1}^{n} |T[i]|\, k_i$ is exactly $N$, the total size of the GD string. Therefore,

$$\left( \sum_{i=1}^{n} \sqrt{|T[i]|\, k_i} \right)^2 \le n \cdot N.$$

Taking the square root of both sides, we obtain

$$\sum_{i=1}^{n} \sqrt{|T[i]|\, k_i} \le \sqrt{nN}.$$
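As a quick numerical sanity check of Lemma 5, the following sketch evaluates both sides of the inequality for given segment sizes. The function names and the sample sizes are hypothetical and chosen only for illustration; each segment is described by the pair (number of strings $|T[i]|$, string length $k_i$).

```python
import math


def segment_cost_sum(sizes):
    """Left-hand side of Lemma 5: sum over segments of sqrt(|T[i]| * k_i)."""
    return sum(math.sqrt(count * k) for count, k in sizes)


def cauchy_schwarz_bound(sizes):
    """Right-hand side of Lemma 5: sqrt(n * N), with N = sum of |T[i]| * k_i."""
    n = len(sizes)
    N = sum(count * k for count, k in sizes)
    return math.sqrt(n * N)


if __name__ == "__main__":
    # (number of strings, string length) per segment; illustrative values only
    uneven = [(2, 2), (2, 1), (2, 2)]
    balanced = [(4, 1), (2, 2), (1, 4)]
    for sizes in (uneven, balanced):
        print(segment_cost_sum(sizes), "<=", cauchy_schwarz_bound(sizes))
```

The bound is tight when all products $|T[i]|\, k_i$ are equal, as in the second sample, which is the standard equality case of the Cauchy–Schwarz inequality.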
