Stringological sequence prediction I: efficient algorithms for predicting highly repetitive sequences

Vanessa Kosoy¹,²
¹ Faculty of Mathematics, Technion – Israel Institute of Technology
² Computational Rational Agents Laboratory
vanessa@alter.org

Abstract. We propose novel algorithms for sequence prediction based on ideas from stringology. These algorithms are time and space efficient and satisfy mistake bounds related to particular stringological complexity measures of the sequence. In this work (the first in a series) we focus on two such measures: (i) the size of the smallest straight-line program that produces the sequence, and (ii) the number of states in the minimal automaton that can compute any symbol in the sequence when given its position in base $b$ as input. These measures are interesting because multiple rich classes of sequences studied in combinatorics of words (automatic sequences, morphic sequences, Sturmian words) have low complexity and hence high predictability in this sense.

1 Introduction

Sequence (or "time series") prediction is a classical problem in artificial intelligence, machine learning and statistics ([1], [2], [3]). However, there is a dearth of examples of prediction algorithms which simultaneously
• are computationally efficient,
• satisfy strong provable mistake bounds, and
• guarantee asymptotically near-perfect prediction for natural rich classes of deterministic sequences.

Arguably, assuming no domain-specific knowledge, the "gold standard" for sequence prediction is Solomonoff induction [1]. On any sequence, it asymptotically performs as well as any computable predictor. It is also conceptually justifiable as a formalization of Occam's razor. However, Solomonoff induction is uncomputable, making it completely unsuitable for any practical implementation. Much work in statistics has focused on series of continuous variables [3].
This led to algorithms that require assumptions such as a particular form of probability distribution (e.g. normal). Most of these are inapplicable to the categorical (discrete) setting which is our own focus, and they don't yield interesting classes of deterministic sequences. In practice, methods based on deep learning are incredibly successful at next-token prediction [4]. However, the theoretical understanding of generalization bounds for deep learning is in its infancy [5]. In particular, we don't have many examples where strong rigorous mistake bounds can be proved. There are computationally efficient algorithms with strong provable generalization bounds for some interesting classes of stochastic processes, for example context-tree weighting methods [6]. However, such classes often degenerate in the deterministic case (e.g. admitting only periodic sequences). One notable known example which does come close to our desiderata is using the perceptron algorithm for online linear classification [2] applied to features computed from the sequence. However, the corresponding class of predictable deterministic sequences is still quite limited. Hence, finding conceptually different approaches appears to be a valuable goal.

Formally, we work in the following setting. There is a fixed finite alphabet $\Sigma$, and we are interested in predictors of the form $\pi : \Sigma^* \to \Sigma$ (here, $\pi(x)$ is the symbol predicted to follow the prefix $x$). The assumptions about the data are expressed as a complexity measure $C$. We are then interested in bounding the number of mistakes $\pi$ makes on a sequence $x$ in terms of $C(x)$ and $|x|$. At the same time, we require that $\pi$ is computable in polynomial time. Moreover, we wish to simultaneously bound the size of the internal state of the predictor in terms of $C(x)$ and $\log|x|$ (this can be interpreted as the predictor compressing the sequence).
The latter is interesting because it leads to predictors that are space efficient and in some cases run in quasilinear¹ time. We study multiple natural complexity measures with strong mistake and compression bounds. (Two such measures are addressed in this work; further examples will be studied in sequels.) Conceptually, such complexity measures can be viewed as candidate tractable analogues of Kolmogorov complexity. Our algorithms simultaneously obtain a mistake bound and a compression bound in terms of the relevant complexity measure. In particular, when operating on an infinite sequence for which the complexity grows polylogarithmically, such a predictor runs in quasilinear time and polylog space while making only polylog mistakes (see Definition 2.4).

Any prediction algorithm that assumes no domain-specific knowledge is forced to rely on properties of data that are ubiquitous across many domains in the real world. One candidate for such a property is hierarchical structure [7], [8]. A natural formalism for describing sequences with hierarchical structure is straight-line programs (SLPs) [9]. The size of the smallest SLP producing a sequence is therefore one natural complexity measure for us to consider. Luckily, algorithms on sequences with small SLPs were widely studied in stringology, where the size of the smallest SLP is one of several standard measures of "compressibility" that are equivalent up to factors of $O(\log n)$, where $n$ is the length of the sequence [10]. Indeed, one of these measures is the size of the LZ77 compression, which was originally proposed as a computationally tractable analogue of Kolmogorov complexity [11]. It is therefore somewhat remarkable that there is, as far as we know, no prior work that uses LZ77 for sequence prediction.
As a "warmup", we start from a closely related but easier complexity measure, namely the size of the smallest automaton that can compute any symbol in the sequence when given the time index in base $b$ as input, for some fixed integer $b \ge 2$, a concept known as "$b$-automaticity". Notably, infinite sequences for which this measure is finite were widely studied in combinatorics on words: they are called automatic sequences and have their own rich theory [12]. Hence, even this simple complexity measure captures a rich family of interesting examples. More precisely, this defines different complexity measures if the automaton is assumed to read digits in left-to-right (LTR) order vs. right-to-left (RTL) order. For LTR, we find an algorithm with mistake and compression bounds as stated in Theorem 3.1. We leave the treatment of RTL for a sequel.

We call the size of the smallest SLP the straight-line complexity (SLC). It's straightforward to show that SLC is dominated by LTR $b$-automaticity for all values of $b$ (see Proposition 4.3). Hence, a prediction algorithm that's effective w.r.t. SLC is automatically effective w.r.t. LTR $b$-automaticity for all $b$. Moreover, SLC is closely related to another well-studied class of sequences, namely the morphic sequences [12].² We successfully find a prediction algorithm for SLC with mistake and compression bounds as stated in Theorem 4.1. This algorithm, which directly uses LZ77 compression, is based on the compressed suffix-array index of [13].

¹ I.e. linear up to logarithmic factors.

1.1 Related Work

Porat and Feldman [14] devised an algorithm for learning an automaton from its behavior on inputs given in strict lexicographic order. They did not acknowledge the connection, but this setting is equivalent to predicting an automatic sequence. However, they only showed that their algorithm produces a minimal automaton and don't have an explicit mistake bound. More generally, the inference of automata and related classes has received much attention (e.g.
[15], [16]); however, most existing work is inapplicable to our setting. Morphic sequences are closely related to L-systems (see e.g. [17]). An L-system is defined by a starting word $w$ and a homomorphism $h$, in which case the language it produces is the set of words $\{h^n(w) : n \ge 0\}$. Inference of L-systems has received some attention in the literature; see [18] for a survey. However, the problem of inferring an L-system from examples of its language is quite different from the problem of morphic sequence prediction. Moreover, many of the techniques are heuristic ([19], [20], [21]) and the exact algorithms have exponential complexity (see e.g. [22]). In fact, Duffy et al. [23] recently proved NP-hardness results about the problem. Remarkably, the sequence they construct in the proof is periodic, which makes it trivially easy in the prediction setting we study.

Given that our LZP algorithm is based on LZ77 compression, it is relevant to point out that the connection of compression to prediction and learning was the subject of considerable study; see e.g. [24], [25]. However, much of that literature isn't concerned with computational efficiency. Also, some prediction algorithms are known that are based on LZ78 compression³, e.g. [27], [28], [29]. However, none of that work explores a connection to automatic or morphic sequences. Notably, LZ78 compression is less suited to our purposes since it can never compress a word of length $n$ to a word of length less than the order of $\sqrt{n}$. By contrast, LZ77 compression can compress e.g. $a^n$ to a length of $O(\log n)$.

Finally, there was ample research on other (not prediction) algorithmic problems involving automatic and morphic sequences [30].

² Specifically, "typical" morphic sequences have slowly growing SLC, although for some it grows considerably faster; see Appendix G.
³ LZ78 compression was introduced in [26].

2 Setting

In this section, we formalize the framework of stringological sequence prediction.
We define the online prediction protocol in terms of state-based algorithms, introduce the notion of stringological complexity measures, and establish rigorous criteria for statistical and computational efficiency. Finally, we detail the counting criterion, which relates the learnability of a sequence class to the growth rate of the number of low-complexity words.

2.1 Preliminaries and Notation

Let $\Sigma$ be a fixed finite⁴ alphabet. We denote the set of finite words over $\Sigma$ by $\Sigma^*$ and the set of right-infinite sequences by $\Sigma^\omega$. For a word $x \in \Sigma^*$, we denote its length by $|x|$. The $i$-th symbol in a word $x$ is denoted $x_i$. We use 0-based indexing, so that $x = x_0 x_1 \cdots x_{|x|-1}$. The notation $x_{i:j}$ refers to the factor (subword) $x_i x_{i+1} \cdots x_{j-1}$. The notation $x_{:j}$ is the same as $x_{0:j}$. A word $y$ is a prefix of $x$, denoted $y \sqsubseteq x$, if $y = x_{:j}$ for some $j$. Logarithms are taken to base 2 unless otherwise specified.

2.2 The Prediction Protocol

We operate in the standard deterministic online prediction setting. To discuss memory and time constraints rigorously, we model the predictor not merely as a function of the history, but as a state-based machine.

Definition 2.1: A predictor is a tuple $\pi = (S, s_0, u, p)$, where:
• $S$ is the set of possible internal states (represented as binary strings).
• $s_0 \in S$ is the initial state.
• $u : S \times \Sigma \to S$ is the state-update function. It takes the current state and the most recent observation to produce the next state.
• $p : S \to \Sigma$ is the state-prediction function. It maps the current state to a predicted next symbol.

The prediction process proceeds in rounds for a target sequence $x$. At step $t$:
1. The predictor outputs a hypothesis $\hat{x}_t = p(s_t)$.
2. The true symbol $x_t$ is revealed.
3. The predictor updates its internal state: $s_{t+1} = u(s_t, x_t)$.

The total number of mistakes made by $\pi$ on a finite word $x$ is denoted by $M_\pi(x) = |\{t < |x| : \hat{x}_t \ne x_t\}|$. We also introduce a notation for the maximal size of the state during the processing of a word: $S_\pi(x) = \max_{t \le |x|} |s_t|$.

⁴ We assume $\Sigma$ is finite purely for ease of presentation.
We could instead assume that $\Sigma = \mathbb{N}$, in which case the factors of $|\Sigma|$ in the bounds would be replaced by $m$, where $m$ is the highest number that has actually appeared in the sequence so far.

2.3 Efficiency Criteria

We evaluate performance against the inherent structural complexity of the individual sequence, rather than a probabilistic prior.

Definition 2.2: A word complexity measure is a function $C : \Sigma^* \to \mathbb{N}$ that satisfies the following conditions:
• (Polynomial bound.) There exists a polynomial $P$ s.t. for all $x \in \Sigma^*$: $C(x) \le P(|x|)$. (Equation 4)
• (Approximate monotonicity.) There exists a polynomial $Q$ s.t. for all $x \in \Sigma^*$ and $y \sqsubseteq x$: $C(y) \le Q(C(x), \log|x|)$. (Equation 5)

We seek predictors that are statistically efficient, space efficient and time efficient. We bound these resources in terms of the sequence's complexity $C(x)$ and its logarithmic length $\log|x|$.

Definition 2.3 (Statistical Efficiency): The predictor $\pi$ is statistically efficient with respect to $C$ if the number of mistakes is quasilinear in the complexity. That is, for all $x \in \Sigma^*$, the number of mistakes is at most $C(x)$ times a factor polylogarithmic in $|x|$.

Definition 2.4 (Computational Efficiency): The predictor $\pi$ is computationally efficient with respect to $C$ if
• the time to compute the state-update function is bounded by a polynomial in the size of its input,
• the time to compute the state-prediction function is bounded by a polynomial in the size of its input, and
• for all $x \in \Sigma^*$, the maximal state size is at most $C(x)$ times a factor polylogarithmic in $|x|$. (Equation 7)

We can thus think of the state as a compressed representation of the history, and we refer to inequalities of the form Equation 7 as "compression bounds". (In many examples, this compression is lossless, but it doesn't have to be.) In particular, this bound implies that if the sequence complexity grows polylogarithmically, then the space complexity and the per-round processing time are also polylogarithmic⁵.

⁵ Since per-round processing time is polynomial in the state size, which is polylogarithmic in $|x|$ due to inequality Equation 7.

2.4 The Counting Criterion

To characterize which complexity measures admit statistically efficient predictors, we utilize a counting argument. This connects the "volume" of the concept class (words of low complexity) to the hardness of learning.
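Before turning to the counting criterion, the protocol of Definition 2.1 can be made concrete with a short sketch. The following Python fragment is not part of the formal development: the names `Predictor` and `run` are illustrative, and the "repeat the last observed symbol" predictor is a toy stand-in, not one of the algorithms of this paper. It simulates the rounds and counts mistakes:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Predictor:
    state: object                             # current internal state s
    update: Callable[[object, str], object]   # u: (state, symbol) -> next state
    predict: Callable[[object], object]       # p: state -> predicted symbol


def run(pred: Predictor, x: str) -> int:
    """Simulate the rounds of Definition 2.1 on the word x, counting mistakes."""
    s = pred.state
    mistakes = 0
    for sym in x:
        if pred.predict(s) != sym:  # step 1: output hypothesis; step 2: compare
            mistakes += 1
        s = pred.update(s, sym)     # step 3: update state on the true symbol
    return mistakes


# Toy predictor (illustrative only): always repeat the last observed symbol.
last = Predictor(state=None, update=lambda s, a: a, predict=lambda s: s)
m = run(last, "aaabaaab")  # 4 mistakes: one at the start, one per change point
```

The state here is a single symbol, so the compression bound of Definition 2.4 is trivially satisfied; the interesting predictors of Sections 3 and 4 maintain far richer compressed states.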
Definition 2.5: Given a word complexity measure $C$, the counting complexity $N_C(n, k)$ is defined as the logarithm of the number of words of length $n$ with complexity at most $k$: $N_C(n, k) = \log |\{x \in \Sigma^n : C(x) \le k\}|$.

The following fact serves as the fundamental condition for learnability in this framework.

Theorem 2.1: Let $C$ be a word complexity measure. Then, there exists a statistically efficient predictor for $C$ if and only if the counting complexity $N_C(n, k)$ is quasilinear in $k$, up to polylogarithmic factors in $n$.

See Appendix A for the (fairly straightforward) proof.

3 Automaticity

In this section, we investigate our simplest example of a word complexity measure: automaticity. It is defined as the minimal number of states in an automaton that computes any symbol in the word from a base-$b$ representation of its position. Details follow.

3.1 Definitions

Fix an integer base $b \ge 2$. For any length $\ell$ and integer $i < b^\ell$, let $[i]_b^\ell$ denote the standard base-$b$ representation of $i$, padded with leading zeros to length $\ell$. We model the structure of a sequence via deterministic finite automata (DFAs) that compute symbols from this padded representation.

Definition 3.1: Let $x \in \Sigma^*$ be a finite word. The (LTR) $b$-automaticity of $x$, denoted $\mathcal{A}_b(x)$, is the minimal number of states in a DFA (with input alphabet $\{0, \ldots, b-1\}$) such that there exists an integer $\ell$ satisfying $b^\ell \ge |x|$ and, for every $i < |x|$, the automaton outputs $x_i$ on input $[i]_b^\ell$.

Note that we are processing the time index in left-to-right (most significant digit first) order. It is also possible to define RTL automaticity, where the order of processing is least significant digit first. For our purposes the resulting complexity measures are very different and require different algorithms. In this work, we will focus solely on LTR automaticity, leaving RTL automaticity for a sequel.

For any $y \sqsubseteq x$ it's obvious that $\mathcal{A}_b(y) \le \mathcal{A}_b(x)$, since the same automaton witnesses both. Hence, Equation 5 is satisfied for this measure. It is also easy to see that $\mathcal{A}_b(x)$ is polynomially bounded in $|x|$, and therefore $\mathcal{A}_b$ is a word complexity measure⁶.

Example 3.1: Let $\Sigma = \{0, 1\}$. We define the Thue-Morse sequence recursively by a family of prefixes $T_k$ of length $2^k$: $T_0 = 0$ and $T_{k+1} = T_k \overline{T_k}$, where $\overline{y}$ denotes the bitwise negation of $y$.
Note that $T_k \sqsubseteq T_{k+1}$ for all $k$, and hence there is a unique $T \in \Sigma^\omega$ s.t. $T_k \sqsubseteq T$ for all $k$. The beginning of this sequence is 0110100110010110… It's possible to show that the $n$-th symbol $t_n$ satisfies, for all $n$, $t_{2n} = t_n$ and $t_{2n+1} = 1 - t_n$. From this it is easy to see that $t_n$ can be computed by an automaton with two states for any reading direction: the automaton tracks the parity of the number of 1 digits read so far. Hence, the 2-automaticity of every prefix $T_k$ is at most 2.

The automaticity of a sequence is highly sensitive to the choice of base $b$. A sequence that is simple in one base may be complex in another.

Example 3.2: There are families of words whose automaticity is bounded in one base but grows quickly in another. See Appendix E for the precise construction and the proof of this example.

This is a serious issue for automaticity-based predictors as candidate algorithms for practical applications, because in most cases there is no way to single out a preferred value of $b$. In Section 4 we will see how we can overcome the dependence on $b$ by using the word complexity measure SLC.

⁶ This is demonstrated by an automaton with a distinct state for every relevant input prefix, which detects the exact index $i$ and outputs $x_i$ accordingly.

3.2 An Efficient Predictor for Automaticity

We propose the Hierarchical Dictionary Plurality (HDP) algorithm. This algorithm predicts by maintaining a hierarchy of dictionaries for word blocks of geometrically growing sizes that approximate the transition structure of the underlying automaton (see Appendix C).

Theorem 3.1: There exists a predictor (HDP) which is statistically and computationally efficient with respect to $\mathcal{A}_b$. Specifically, for any $b$ and $x \in \Sigma^*$, letting $n = |x|$ and $k = \mathcal{A}_b(x)$, we have:
1. Statistical Efficiency: The number of mistakes is bounded quasilinearly in $k$, up to polylogarithmic factors in $n$.
2. Computational Efficiency: The predictor satisfies a compression bound of the same form.

Here, and in the theorem below, the predictor receives $b$ as a parameter, and is polynomial time in this input as well. See Appendix C for the proof and the exact bounds.

4 Straight-Line Complexity

We now turn to a more general complexity measure: straight-line complexity (SLC).
While $b$-automaticity characterizes sequences generated by finite state machines reading the time index, SLC characterizes sequences generated by straight-line programs: a type of context-free grammar whose language consists of a single word.

4.1 Definitions

We start from the definition of a straight-line program.

Definition 4.1: A straight-line program (SLP) over $\Sigma$ is a tuple $G = (V, \rho, v_\star)$, where $V$ is a finite set of variables disjoint from $\Sigma$, $\rho : V \to (V \cup \Sigma)^*$ assigns a production to each variable, and $v_\star \in V$. We require that
• for all $v \in V$, $\rho(v)$ is non-empty,
• $G$ is acyclic, i.e. there is no cycle of variables each of which appears in the production of the previous one, and
• $v_\star$ is the unique element of $V$ that doesn't appear in $\rho(v)$ for any $v \in V$.

We define the expansion $\mathrm{exp} : V \cup \Sigma \to \Sigma^*$ recursively by:
• For $a \in \Sigma$, $\mathrm{exp}(a) = a$.
• For $v \in V$, let $\rho(v) = \alpha_1 \alpha_2 \cdots \alpha_m$. Then, $\mathrm{exp}(v) = \mathrm{exp}(\alpha_1)\,\mathrm{exp}(\alpha_2) \cdots \mathrm{exp}(\alpha_m)$.

We define the value of $G$ as $\mathrm{val}(G) = \mathrm{exp}(v_\star)$.

Thus, an SLP is essentially an acyclic directed graph with vertices $V \cup \Sigma$ and an edge $(v, \alpha)$ whenever $\alpha$ appears in $\rho(v)$. The outgoing edges of every vertex are ordered (by the position inside $\rho(v)$), $v_\star$ is the unique source, and the elements of $\Sigma$ are sinks. The size of an SLP is defined to be the number of edges, i.e. $|G| = \sum_{v \in V} |\rho(v)|$.

It's easy to see that we can reduce any SLP to a "binary" SLP with only a mild increase in size:

Proposition 4.1: For any SLP $G$, there exists an SLP $G'$ s.t.
• $\mathrm{val}(G') = \mathrm{val}(G)$,
• $|G'| = O(|G|)$, and
• for all $v \in V'$, $|\rho'(v)| \le 2$.

See Appendix F for the proof.

We can now define our next word complexity measure of interest.

Definition 4.2: The straight-line complexity of a word $x$, denoted $\mathrm{SLC}(x)$, is the minimum size of an SLP $G$ s.t. $\mathrm{val}(G) = x$.

To see that $\mathrm{SLC}$ satisfies Equation 5, we observe the following.

Proposition 4.2: For any $y \sqsubseteq x$, it holds that $\mathrm{SLC}(y)$ is bounded quasilinearly in $\mathrm{SLC}(x)$, up to factors logarithmic in $|x|$.

See Appendix F for the proof. $\mathrm{SLC}$ also satisfies Equation 4, because $\mathrm{SLC}(x) \le |x|$: consider the SLP with $V = \{v_\star\}$ and $\rho(v_\star) = x$.

The SLC measure is closely related to the Lempel-Ziv factorization, a foundational concept in lossless data compression. To explore this relationship, we formally define the LZ77 parsing size.

Definition 4.3: The LZ77 factorization of a word $x$ is the unique decomposition $x = f_1 f_2 \cdots f_m$ with the following property. For every $j \le m$, denote by $p_j = |f_1 \cdots f_{j-1}|$ the starting position of $f_j$. Then, either $|f_j| = 1$ and $f_j$ is the first occurrence of that symbol in $x$, or the following two conditions hold:
1. There exists $q < p_j$ s.t. $x_{q : q+|f_j|} = f_j$.
2.
Either $f_j$ ends the word, or extending $f_j$ by one more symbol would yield a word with no occurrence starting before position $p_j$.

That is, $f_j$ is the longest non-empty prefix of the remaining suffix that has occurred previously in $x$ (and the previous occurrence is allowed to overlap with $f_j$). The LZ77 complexity, denoted $\mathrm{LZ}(x)$, is defined as the number of factors in this decomposition. See Appendix B for a recap of elementary properties and examples of LZ77 factorizations.

$\mathrm{LZ}$ satisfies Equation 4 since $\mathrm{LZ}(x) \le |x|$, and Equation 5 since $y \sqsubseteq x$ implies $\mathrm{LZ}(y) \le \mathrm{LZ}(x)$. A well-known result in stringology [31] establishes that $\mathrm{LZ}$ and $\mathrm{SLC}$ are equivalent up to logarithmic factors: $\mathrm{LZ}(x) \le \mathrm{SLC}(x) = O(\mathrm{LZ}(x) \cdot \log|x|)$. Hence, any predictor that is statistically (resp. computationally) efficient w.r.t. $\mathrm{LZ}$ is statistically (resp. computationally) efficient w.r.t. $\mathrm{SLC}$ and vice versa.

4.2 Comparison with Automaticity

Straight-line complexity is a strictly more powerful word complexity measure than LTR $b$-automaticity (i.e. lower up to log factors). Intuitively, the recursive $b$-section of the time interval inherent in processing the most significant digit first can be directly mapped to the hierarchical concatenation structure of an SLP.

Proposition 4.3: For any base $b$ and word $x$, the straight-line complexity $\mathrm{SLC}(x)$ is bounded by the LTR $b$-automaticity $\mathcal{A}_b(x)$ up to logarithmic factors. See Appendix F for the precise statement and proof.

As the following example shows, $\mathrm{SLC}$ also captures structures that don't seem to be captured by $\mathcal{A}_b$ for any value of $b$.

Example 4.1: The Fibonacci word is defined as the limit of the recursively defined sequence of words $w_1 = 0$, $w_2 = 01$, $w_{n+1} = w_n w_{n-1}$. The beginning of the Fibonacci word is 010010100100101001010… The recurrence relation defining $w_n$ naturally describes an SLP of size $O(n)$. Since the length of $w_n$ is a Fibonacci number, and hence exponential in $n$, Proposition 4.2 implies that prefixes of the Fibonacci word have straight-line complexity polylogarithmic in their length. More generally, $\mathrm{SLC}$ has mild growth for several well-studied classes of sequences (see Appendix G for details).

4.3 Efficient Prediction

The Lempel-Ziv Plurality (LZP, see Appendix D) algorithm maintains the LZ77 factorization of the sequence so far. It is based on a plurality vote between different occurrences of the last factor.
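To make the parsing underlying LZP concrete, here is a minimal sketch of the greedy factorization of Definition 4.3, assuming the self-referential (overlap-allowed) variant described above. It runs in quadratic time for clarity; the actual algorithms rely on compressed index structures instead:

```python
def lz77_factorize(x: str) -> list[str]:
    """Greedy LZ77 factorization: each factor is the longest prefix of the
    remaining suffix that occurs starting at an earlier position (the source
    may overlap the factor); a never-seen symbol becomes a literal factor."""
    factors = []
    p, n = 0, len(x)
    while p < n:
        best = 0
        for q in range(p):
            # Match length starting at source q, compared symbol by symbol,
            # so the source region may overlap the factor being produced.
            l = 0
            while p + l < n and x[q + l] == x[p + l]:
                l += 1
            best = max(best, l)
        if best == 0:
            factors.append(x[p])  # literal: first occurrence of this symbol
            p += 1
        else:
            factors.append(x[p:p + best])
            p += best
    return factors
```

For instance, `lz77_factorize("abababab")` yields the three factors `a`, `b`, `ababab`, where the last factor overlaps its own source, illustrating why highly repetitive words have small $\mathrm{LZ}$.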
Theorem 4.1: There exists a predictor (LZP) which is statistically and computationally efficient w.r.t. $\mathrm{LZ}$. Specifically, for any $x \in \Sigma^*$, letting $n = |x|$ and $k = \mathrm{LZ}(x)$:
1. Statistical Efficiency: The number of mistakes is bounded quasilinearly in $k$, up to polylogarithmic factors in $n$.
2. Computational Efficiency: The predictor satisfies a compression bound of the same form.

See Appendix D for the proof and the exact bounds.

Acknowledgements

This work was supported by the Advanced Research and Invention Agency (ARIA) of the United Kingdom, the AI Security Institute (AISI) of the United Kingdom, Survival and Flourishing Corp, and Coefficient Giving in San Francisco, California. The author wishes to thank Alexander Appel, Matthias Georg Mayer, her spouse Marcus Ogren, and Vinayak Pathak for reviewing drafts, locating errors and providing useful suggestions.

Bibliography

[1] M. Hutter, D. Quarel, and E. Catt, An Introduction to Universal Artificial Intelligence. Chapman and Hall/CRC, 2024.
[2] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006. doi: 10.1017/CBO9780511546921.
[3] P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting. Springer, 2002.
[4] T. B. Brown et al., "Language Models are Few-Shot Learners," in Advances in Neural Information Processing Systems 33 (NeurIPS 2020), H. Larochelle, M. Ranzato, R. Hadsell, M.-F. Balcan, and H.-T. Lin, Eds., 2020.
[5] J. Fan, C. Ma, and Y. Zhong, "A selective overview of deep learning," Statistical Science, vol. 36, no. 2, p. 264, 2020.
[6] I. Kontoyiannis, L. Mertzanis, A. Panotopoulou, I. Papageorgiou, and M. Skoularidou, "Bayesian Context Trees: Modelling and exact inference for discrete time series," CoRR, 2020.
[7] H. A.
Simon, "The Architecture of Complexity," Proceedings of the American Philosophical Society, vol. 106, no. 6, pp. 467–482, 1962.
[8] C. G. Nevill-Manning and I. H. Witten, "Identifying Hierarchical Structure in Sequences: A linear-time algorithm," J. Artif. Intell. Res., vol. 7, pp. 67–82, 1997. doi: 10.1613/JAIR.374.
[9] P. Bürgisser, M. Clausen, and M. A. Shokrollahi, Algebraic Complexity Theory, Grundlehren der mathematischen Wissenschaften, vol. 315. Springer, 1997.
[10] T. Kociumaka, G. Navarro, and N. Prezza, "Toward a definitive compressibility measure for repetitive sequences," IEEE Transactions on Information Theory, vol. 69, no. 4, pp. 2074–2092, 2022.
[11] A. Lempel and J. Ziv, "On the Complexity of Finite Sequences," IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 75–81, 1976. doi: 10.1109/TIT.1976.1055501.
[12] J.-P. Allouche and J. O. Shallit, Automatic Sequences: Theory, Applications, Generalizations. Cambridge University Press, 2003.
[13] D. Kempa and T. Kociumaka, "Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space," in 64th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2023), IEEE, 2023, pp. 1877–1886. doi: 10.1109/FOCS57990.2023.00114.
[14] S. Porat and J. A. Feldman, "Learning Automata from Ordered Examples," Mach. Learn., vol. 7, pp. 109–138, 1991. doi: 10.1007/BF00114841.
[15] C. de la Higuera, Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, 2010.
[16] I. Attias, L. Reyzin, N. Srebro, and G. Vardi, "On the Hardness of Learning Regular Expressions," in 37th International Conference on Algorithmic Learning Theory, 2026. [Online].
Available: https://openreview.net/forum?id=hVPfu5BqYx
[17] G. Rozenberg and A. Salomaa, Lindenmayer Systems: Impacts on Theoretical Computer Science, Computer Graphics, and Developmental Biology. Berlin, Heidelberg: Springer-Verlag, 2001.
[18] F. Ben-Naoum, "A survey on L-system inference," INFOCOMP Journal of Computer Science, vol. 8, no. 3, pp. 29–39, 2009.
[19] J. Guo et al., "Inverse Procedural Modeling of Branching Structures by Inferring L-Systems," ACM Trans. Graph., vol. 39, no. 5, pp. 155:1–155:13, 2020. doi: 10.1145/3394105.
[20] J. Bernard and I. McQuillan, "Techniques for inferring context-free Lindenmayer systems with genetic algorithm," Swarm Evol. Comput., vol. 64, p. 100893, 2021. doi: 10.1016/J.SWEVO.2021.100893.
[21] J. J. Lee, B. Li, and B. Benes, "Latent L-systems: Transformer-based Tree Generator," ACM Trans. Graph., vol. 43, no. 1, pp. 7:1–7:16, 2024. doi: 10.1145/3627101.
[22] I. McQuillan, J. Bernard, and P. Prusinkiewicz, "Algorithms for Inferring Context-Sensitive L-Systems," in Unconventional Computation and Natural Computation (UCNC 2018), Lecture Notes in Computer Science, vol. 10867. Springer, 2018, pp. 117–130. doi: 10.1007/978-3-319-92435-9_9.
[23] C. Duffy, S. Hillis, U. Khan, I. McQuillan, and S. L. Shan, "Inductive inference of Lindenmayer systems: algorithms and computational complexity," Natural Computing, pp. 1–11, 2025.
[24] B. Ryabko, J. Astola, and M. Malyutov, Compression-Based Methods of Statistical Analysis and Prediction of Time Series. Springer, 2016. doi: 10.1007/978-3-319-32253-7.
[25] O. David, S. Moran, and A. Yehudayoff, "On statistical learning via the lens of compression," in Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS'16).
Barcelona, Spain: Curran Associates Inc., 2016, pp. 2792–2800.
[26] J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inf. Theory, vol. 24, no. 5, pp. 530–536, 1978. doi: 10.1109/TIT.1978.1055934.
[27] M. Feder, N. Merhav, and M. Gutman, "Universal prediction of individual sequences," IEEE Trans. Inf. Theory, vol. 38, no. 4, pp. 1258–1270, 1992. doi: 10.1109/18.144706.
[28] K. Gopalratnam and D. J. Cook, "Active LeZi: An Incremental Parsing Algorithm for Sequential Prediction," Int. J. Artif. Intell. Tools, vol. 13, no. 4, pp. 917–930, 2004. doi: 10.1142/S0218213004001892.
[29] N. Sagan and T. Weissman, "A Family of LZ78-based Universal Sequential Probability Assignments," CoRR, 2024. doi: 10.48550/ARXIV.2410.06589.
[30] J. Shallit, The Logical Approach to Automatic Sequences: Exploring Combinatorics on Words with Walnut, London Mathematical Society Lecture Note Series. Cambridge University Press, 2022.
[31] W. Rytter, "Application of Lempel-Ziv factorization to the approximation of grammar-based compression," Theor. Comput. Sci., vol. 302, no. 1–3, pp. 211–222, 2003. doi: 10.1016/S0304-3975(02)00777-6.
[32] J. A. Storer and T. G. Szymanski, "Data compression via textual substitution," Journal of the ACM, vol. 29, no. 4, pp. 928–951, 1982.
[33] S. Constantinescu and L. Ilie, "The Lempel-Ziv Complexity of Fixed Points of Morphisms," in Mathematical Foundations of Computer Science 2006, Berlin, Heidelberg: Springer, 2006, pp. 280–291.

A Proof of the Counting Criterion

In this appendix, we provide the detailed proof of Theorem 2.1. The theorem establishes a necessary and sufficient condition for the existence of a statistically efficient predictor for a complexity measure $C$.
Specifically, it states that such a predictor exists if and only if the counting complexity is quasilinear in the complexity bound, up to polylogarithmic factors in the length.

A.1 Sufficiency

We first prove that if the counting complexity satisfies this condition, then there exists a statistically efficient predictor $\pi$. We construct $\pi$ using the Plurality Algorithm, a standard approach in online learning [2]⁷. We adapt it to the anytime setting where the true complexity and length of the target sequence are unknown. To handle these unknown parameters, we employ a "doubling trick" strategy that operates in phases.

A.1.1 The Predictor

The predictor operates by maintaining dynamic estimates for the complexity and length of the target sequence. We employ a "doubling trick" strategy to adapt to these unknown parameters. Let $k_r$ and $n_r$ denote the complexity bound and length bound during phase $r$, respectively. We initialize the predictor with $k_0 = n_0 = 1$.

The core mechanism is a plurality vote over a restricted version space. At any time step $t$ within phase $r$, let $h$ denote the history observed so far. We define the version space $V_t$ as the set of all candidate finite words that are consistent with the history, have complexity at most $k_r$, and length at most $n_r$: $V_t = \{y \in \Sigma^* : h \sqsubseteq y,\ C(y) \le k_r,\ |y| \le n_r\}$.

⁷ Common names for this are "majority algorithm" and "halving algorithm". We prefer the word "plurality" because we work in the multiclass setting (i.e. $|\Sigma|$ can be greater than 2), in which "plurality" is more accurate.

The predictor estimates the "likelihood" of the next symbol based on the cardinality of valid extensions in the version space. Specifically, for each symbol $a \in \Sigma$, we count the number of words in $V_t$ that have $a$ as the next symbol at position $t$: $N_t(a) = |\{y \in V_t : |y| > t,\ y_t = a\}|$. The algorithm predicts the symbol that maximizes this count: $\hat{x}_t = \arg\max_a N_t(a)$. This procedure is formalized in Algorithm 1. If the observed symbol results in a sequence that violates the current bounds (i.e., if $C(h) > k_r$ or $|h| > n_r$), the algorithm terminates the current phase.
It then updates the bounds by doubling the violated parameter (setting $k_{r+1} = 2k_r$ or $n_{r+1} = 2n_r$) and proceeds to phase $r+1$. This ensures that the resource bounds grow efficiently to accommodate the true parameters of the target sequence.

Variables: history $h$, complexity bound $k$, length bound $n$.
1: for $t = 0, 1, 2, \ldots$:
2:   compute the version space $V_t$ from $h$, $k$, $n$
3:   for each $a \in \Sigma$: compute the count $N_t(a)$
4:   predict $\hat{x}_t = \arg\max_a N_t(a)$
5:   receive $x_t$ and append it to $h$
6:   if $C(h) > k$ then $k \leftarrow 2k$
7:   if $|h| > n$ then $n \leftarrow 2n$
Algorithm 1: Plurality Algorithm

A.1.2 Mistake Analysis

We explicitly bound the number of mistakes made by this predictor.

Analysis of a Single Phase: Consider a single phase with fixed bounds $k$ and $n$. Whenever the predictor makes a mistake at step $t$ (i.e., $\hat{x}_t \ne x_t$), it implies that the true symbol was not the plurality choice. Consequently, the set of candidates consistent with the true symbol, $\{y \in V_t : y_t = x_t\}$, must be at most half the size of the total version space at step $t$. Since the new version space at step $t+1$ becomes exactly this subset, the size of the version space is reduced by a factor of at least 2 for every mistake committed. The initial size of the version space is bounded by the number of words of length at most $n$ with complexity at most $k$, whose logarithm is controlled by the counting complexity. Since the version space must contain at least the true target sequence (until the bounds are violated), its size is at least 1. Therefore, the number of mistakes in the phase is at most $\log|V_0|$, where $V_0$ is the version space at the start of the phase. Here and everywhere, $\log$ stands for logarithm to base 2.

Total Mistake Bound: Let the target sequence have length $N$ and complexity $K$. Due to the monotonicity condition Equation 5, the complexity of any prefix of the sequence is bounded by $Q(K, \log N)$, where $Q$ is the polynomial specified in the definition of the complexity measure. The algorithm proceeds through a sequence of phases $r = 0, 1, \ldots, R$. The final phase terminates when the entire sequence is processed. Due to the doubling strategy:
1. The final length bound satisfies $n_R \le 2N$.
2. The final complexity bound satisfies $k_R \le 2\,Q(K, \log N)$.
3.
The total n umber of phases is bounded b y the n umber of doublings of plus the n umber of doublings of : The total n umber of mistakes is the sum of mistak es in each phase: Although the coun ting complexity is not necessarily monotonic in (as the n um b er of lo w-complexity words of a sp ecic length ma y uctuate), the hypothesis of the theorem provides an upp er b ound that is monotonic. Sp ecically , we ha v e where . Since and for all phases , and since ma y be assumed to be non-decreasing in b oth argumen ts, w e can upp er b ound eac h term in the sum b y . This yields: W e no w verify statistical eciency . Substituting the explicit form of the b ound , we get Recalling that , we conclude This satises the criterion for statistical eciency . A.2 Necessit y W e now prov e the conv erse: if there exists a statistically ecien t predictor , then the counting complexit y must satisfy the gro wth condition. Assume is a statistically ecient predictor. By denition, there exists a p olynomial suc h that for an y sequence , the num b er of mistakes is b ounded b y: W e rely on a compression argumen t (information theoretic lo wer b ound). W e show that if the predictor is ecient, we can construct a compressed representation for any w ord of lo w complexit y . Let . W e wan t to b ound . An y w ord can b e uniquely reconstructed if w e know: 1. The deterministic prediction algorithm . 2. The sp ecic time steps where made a mistak e. 3. The correct sym b ol at those sp ecic time steps. F or a w ord of length with at most mistak es, w e can enco de the locations of the mistak es using bits: for the n umber of mistak es and for the lo cation of eac h. The correct sym b ols at these steps require bits. Thus, the description length in bits is: Since this encoding must distinguish ev ery w ord in , the n um b er of suc h words cannot exceed . 
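Under symbol names that were not preserved in this copy (write $n$ for the word length, $M$ for the mistake bound, and $\Sigma$ for the alphabet; these names are our assumption), the counting step can be summarized as:

$$
\log_2 |W| \;\le\; \underbrace{\log_2 n}_{\text{\# of mistakes}} \;+\; \underbrace{M \log_2 n}_{\text{mistake locations}} \;+\; \underbrace{M \log_2 |\Sigma|}_{\text{correct symbols}} \;+\; O(1),
$$

since every word in $W$ is determined by the (fixed) predictor together with the at most $M$ mistake locations and the correct symbols at those locations.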
Taking the logarithm gives the counting complexity:

Substituting the mistake bound :

Thus, the existence of an efficient predictor implies the required bound on the counting complexity.

B Properties of Lempel–Ziv 77

In this appendix, we provide background on LZ77-type factorizations, both in the standard setting and in a -aligned variant used in the analysis of the HDP algorithm (Appendix C). We first define general (non-greedy) factorizations, then show that the greedy variants used in the main text are uniquely determined special cases that minimize the number of factors.

B.1 Standard LZ77-type Factorizations

We begin with a general notion of LZ77-type factorization, which relaxes the maximality (greediness) requirement of the standard LZ77 factorization given in the main text.

Definition B.1: An LZ77-type factorization of a word is a decomposition into non-empty factors with the following property. For every , denote . Then, either

• and the symbol does not appear in , or
• there exists s.t. .

In the first case, we call a literal factor. In the second case, we call a copy factor with source position . The number of factors is called the size of the factorization.

Note that a copy factor is allowed to overlap with its source: it is possible that . Intuitively, this corresponds to a byte-by-byte copy in which previously copied symbols become available as source material for the remainder of the factor. Also note that when and the symbol does appear in , the factor falls under the second case as a copy factor of length 1.

Example B.1: Let and (length 8). The following are all valid LZ77-type factorizations of :

1. (5 factors). Here, , and are each a copy of .
2. (4 factors). Here, is a copy from source position ; the source and the destination are adjacent but do not overlap.
3. (3 factors). Here, is a copy from source position , with : the source overlaps with the destination , illustrating the overlap phenomenon.

The greedy LZ77 factorization (Definition 4.3) is the special case in which each factor is chosen to be as long as possible. We restate the definition here for clarity.

Definition B.2: The (greedy) LZ77 factorization of a word is the unique LZ77-type factorization that additionally satisfies the following maximality condition: for every , there is no s.t.

That is, each non-final factor is as long as possible: if the factor is a copy, it is the longest prefix of the remaining suffix that has an earlier occurrence in (and if the next symbol is new, the factor necessarily has length 1). The LZ77 complexity is the number of factors in this factorization.

The greedy LZ77 factorization is uniquely determined: starting from position , each factor is fixed by the requirement that it be the longest possible match (or a literal if the symbol is new). Since , the entire factorization is determined inductively.

Example B.2: Continuing Example B.1, the greedy LZ77 factorization of is , with 3 factors (factorization 3 from Example B.1). This is the unique factorization satisfying the maximality condition. For instance, factorization 1 from Example B.1 is not greedy because could be extended: , violating maximality.

A fundamental property of the greedy LZ77 factorization is that it minimizes the number of factors among all LZ77-type factorizations (see Theorem 10 in [32]).

Theorem B.1 (Storer–Szymanski): For any word , the greedy LZ77 factorization of has the minimum number of factors among all LZ77-type factorizations of .

B.2 -aligned LZ77-type Factorizations

For the analysis of the HDP algorithm, we require a variant of LZ77-type factorization in which factors are constrained to be aligned with blocks of size . This ensures compatibility with the hierarchical dictionary structure maintained by HDP.

Definition B.3: Fix . A -LZ77-type factorization of a word is a decomposition into non-empty factors with the following properties. For every , denote . Then, there exist such that:

• (the factor starts at an aligned position).
• Either or ( and ). Moreover, either and does not appear in (a literal factor), or there exists s.t. (a copy factor at level , copying a previously seen aligned block).

We call the level of factor , and the number of factors the size of the factorization.

In other words, each factor occupies a complete aligned block of size (except possibly the last factor, which may be shorter), and either introduces a new symbol at level 0 or matches a prefix of an aligned block of the same size that appeared earlier in . Note that, unlike the standard case, overlapping copies cannot occur here: since both the source and destination blocks are aligned to multiples of and implies , the source block ends at or before the start of .

Any -LZ77-type factorization is in particular an LZ77-type factorization in the sense of Definition B.1. Also note that the choice of level is constrained by the starting position : we must have . For example, a factor starting at a position that is an odd multiple of cannot have level greater than 1.

Example B.3: Let , , and (length 8). The aligned blocks are:

• Level 0 (size 1):
• Level 1 (size 2):
• Level 2 (size 4):
• Level 3 (size 8):

The following are both valid -LZ77-type factorizations:

1. (7 factors), all at level 0 except at level 1.
2. (4 factors), with levels , , .

In factorization 2, at level 1 copies the aligned block , and at level 2 copies the aligned block .

The greedy -LZ77 factorization is the variant that always promotes factors to the highest possible level.

Definition B.4: Fix . The (greedy) -LZ77 factorization of a word is the unique -LZ77-type factorization that additionally satisfies the following maximality condition: for every , if is a multiple of , then there is no s.t.

The maximality condition states that no factor can be "promoted" to the next level: if the current factor and its neighbors could be merged into a larger aligned block that has appeared previously, the greedy factorization would already have used that larger block. This determines the factorization uniquely, by the following inductive argument. At each position , the greedy algorithm considers all levels such that and (plus possibly with for the last factor), in decreasing order. It selects the highest level at which the aligned block has a previous occurrence (or falls back to a level-0 literal if the symbol is new). Hence, the factor is uniquely determined, and determines the start of the next factor.

Example B.4: Continuing Example B.3, the greedy -LZ77 factorization of is , with 4 factors (factorization 2 from Example B.3). To see that factorization 1 from Example B.3 is not greedy, observe that at position , the level-0 factor can be promoted: is a multiple of , and the level-1 block matches the earlier block , violating the maximality condition.

C Hierarchical Dictionary Plurality

In this appendix, we provide the detailed description and analysis of the Hierarchical Dictionary Plurality (HDP) algorithm referred to in Section 3. This algorithm is designed to efficiently predict sequences with low -automaticity by maintaining a dynamic hierarchy of dictionaries that approximate the transition structure of the underlying automaton.

C.1 Algorithm Description

The HDP algorithm maintains a compressed representation of the sequence history using a hierarchy of dictionaries, denoted . The dictionary at level , , stores the unique blocks of length encountered so far. To ensure space efficiency, these blocks are not stored as raw strings but as tuples of indices pointing to entries in the dictionary of the level below, .

C.1.1 State

The internal state of the predictor at time consists of:

1. A list of dictionaries .
   1. is fixed as the alphabet (conceptually mapping indices to symbols).
   2. For , is a dynamic list of -tuples. Each entry is a tuple , where is an index into . This represents a word of length formed by concatenating the words at indices from level .
2. A list of active buffers .
   1. stores the sequence of indices from level observed so far in the current, incomplete block of level .

Initially, for , and is the empty word for all .

C.1.2 Update Rule

The update function processes a new symbol by propagating it up the hierarchy. At level 0, the symbol is appended to the buffer . When a buffer at level reaches size , it forms a complete block. The algorithm checks whether this block (represented as a tuple of indices) exists in ; if not, it is added. The index of this block is then passed as input to the buffer at level . This process continues recursively until a buffer does not overflow (see Algorithm 2).

Algorithm 2: HDP Update (input: base , state , new symbol ).

C.1.3 Prediction Rule

To predict the next symbol, HDP employs a hierarchical plurality vote. It identifies a set of "valid versions" . A version is a pair , where represents a candidate index for the next block at some level , and is the implied next symbol at level 0. The algorithm initializes with all possible symbols. It then iteratively filters by checking the active buffers for consistency with the known transitions in the dictionaries . Specifically, if the current buffer combined with a candidate next index forms a tuple present in , the version survives.
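The update rule just described (Algorithm 2) can be sketched as follows; the data-structure choices (hash maps from index tuples to integer indices, lists for the buffers) and all names are our assumptions:

```python
def hdp_update(base, dictionaries, buffers, symbol):
    """Propagate a new symbol up the hierarchy of dictionaries.

    dictionaries[0] maps each alphabet symbol to its index; for k >= 1,
    dictionaries[k] maps a `base`-tuple of level-(k-1) indices to its index.
    buffers[k] collects the indices of the current, incomplete level-k block."""
    idx = dictionaries[0][symbol]
    k = 1
    while True:
        if k >= len(buffers):            # grow the hierarchy on demand
            buffers.append([])
            dictionaries.append({})
        buffers[k].append(idx)
        if len(buffers[k]) < base:       # the buffer did not overflow: stop
            return
        block = tuple(buffers[k])        # a complete block of `base` indices
        buffers[k] = []
        if block not in dictionaries[k]:
            dictionaries[k][block] = len(dictionaries[k])
        idx = dictionaries[k][block]     # pass the block's index one level up
        k += 1
```

For instance, feeding the symbols of "abab" with base 2 fills level 1 with the single distinct block ("a","b") and level 2 with the single distinct block formed by its two occurrences.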
If filtering at level eliminates all candidates, the algorithm halts and outputs the plurality vote (the most frequent ) among the survivors from level . Otherwise, it proceeds to . This logic can be regarded as running the computation of the underlying automaton in reverse (see Algorithm 3).

Algorithm 3: HDP Predict (input: base , state ).

C.2 Analysis

We now prove Theorem 3.1, establishing the compression and mistake bounds for HDP with respect to the LTR-automaticity measure .

First, some notation. We regard a DFA with input alphabet and output alphabet as a tuple , where:

• is the set of states.
• is the initial state.
• is the transition function.
• is the output function.

We use the notation to denote the standard recursive extension of the transition function to input words.

Now, let us connect the content of the dictionaries to the states of the minimal DFA.

Lemma C.1: Consider any s.t. , and a DFA with input alphabet and output alphabet s.t. . Consider also some and s.t. and . Assume that

Then,

Intuitively, this lemma says that the content of an aligned block in is completely determined by the DFA state reached after reading the "address" of that block. Recall that each position in is computed by feeding the base- digits of into the automaton , most significant digit first. We can split these digits into two parts: a high-order prefix of length (which selects which aligned block of size we are in) and a low-order suffix of length (which selects the position within that block). The high-order prefix steers the automaton from to some intermediate state ; the low-order suffix then determines, starting from , the actual output symbol. If two different block indices and happen to steer the automaton to the same intermediate state, then every position inside those two blocks sees the same low-order computation and hence produces the same symbol. In other words, the block's content depends only on which state the automaton is in when it "enters" the block, not on how it got there.

Proof: Let denote the state reached by the automaton after processing the prefixes corresponding to and . By the hypothesis:

We examine the subwords of starting at indices and . Let be an integer offset representing the relative position within these blocks, such that . The absolute positions in the sequence are and . Since , the base- representation of the full time index is the concatenation of the representation of (padded to length ) and the representation of (padded to length ). That is:

The symbol generated at is given by . Using the recursive property of the transition function , we have:

Similarly for :

Since the right-hand sides are identical, for all valid offsets . Consequently, the subwords defined by the range of are identical. In particular, this shows that , since effectively stores the distinct aligned blocks of size .

C.2.1 Compression Bound

Theorem 3.1 (Part 2): For any , let and . The state size of HDP is bounded by:

Proof: Let be the minimal DFA with states that generates in the sense of Definition 3.1; that is, there exists some s.t. for any position , . The HDP algorithm builds dictionaries whose entries in correspond to blocks of length encountered in at indices aligned to multiples of . Specifically, any entry added to corresponds to a subword for some . By Lemma C.1, the content of such a block is entirely determined by the state of the DFA after processing the prefix of the index corresponding to the most significant digits. Since has only states, there are at most distinct types of blocks of length that can generate. Consequently, the size of the dictionary at any level is bounded by . The maximum level reached is bounded by . The storage cost of one entry in is the size of a -tuple of indices pointing to . For , , and hence an index requires bits. Thus, each entry takes bits. Summing over all levels :

We also have , and hence for the cost is . Adding this, we obtain the bound .

C.2.2 Mistake Bound

To prove the mistake bound, we use the -LZ77 factorization (Definition B.4). The number of factors in the -LZ77 factorization can be bounded in terms of automaticity.

Lemma C.2: For any , the number of factors in the -LZ77 factorization of satisfies

Proof: Let . We analyze the -LZ77 factorization by mapping the factors to nodes in a -ary tree representing all aligned blocks of . A node at height represents an aligned block of length . We classify each aligned block appearing in as New if it is the first occurrence of its content in (ordered by start index), and Old otherwise. By Lemma C.1, the content of an aligned block of length is determined by the state of the generating DFA after reading the index prefix. Since the DFA has states, there are at most distinct block contents for any fixed length. Therefore, there are at most "New" blocks at any level .

Consider a factor in the -LZ77 factorization with length . By the maximality condition in the definition of -LZ77 (specifically condition 2), the aligned block of size that contains (the parent node in the tree) must not have appeared previously in . If it had appeared previously, then the larger block would have been the factor. Thus, every factor is a child of a "New" block. The total number of "New" blocks across all levels (from to ) is:

Since every factor is a child of a "New" block, and each "New" block has at most children (the sub-blocks of size ), the total number of factors is bounded by:

We can now complete the proof.

Theorem 3.1 (Part 1): The number of mistakes is bounded by:

Proof: Let and . We consider the -LZ77 factorization of , denoted . By the previous lemma, the number of factors is bounded by . We analyze the mistakes committed by the HDP predictor during the processing of a single factor . Let the length of this factor be . By the definition of the factorization, unless is a single symbol appearing for the first time, it corresponds to a block that has occurred previously in at an aligned position. Since HDP processes the sequence in order, the previous occurrence of this block was fully processed before the start of . Consequently, the dictionary entry representing this block must already exist in . This ensures that at least one "correct" version (a pointer to the correct block in ) exists in the version space (i.e., the dictionary elements that might contribute to ) at the start of the factor.

We now argue that the version space is stable. The HDP algorithm updates a dictionary only when a buffer is completely filled. For level , the factor length matches the block size, so is updated only at the end of the block. For levels , the block size implies that no level- block can complete strictly within the duration of . Therefore, for all prediction steps within , the set of available candidates in the dictionaries is fixed to those present at the start of the factor. Since no new versions are created, the set of valid versions is monotonically non-increasing.

The predictor makes a mistake only when the plurality of valid versions predicts the wrong symbol. When a mistake occurs, all versions contributing to the incorrect plurality prediction are eliminated from the valid set. This reduces the number of valid versions by at least a factor of 2. Since for all , the number of mistakes possible at any specific level is bounded by . Summing across all active levels (up to ), the total number of mistakes for factor is bounded by:

The total number of mistakes is the sum over all factors:

Substituting the bound on from Lemma C.2:

D Lempel–Ziv Plurality

In this appendix, we define the Lempel–Ziv Plurality (LZP) algorithm and prove Theorem 4.1.

D.1 Algorithm Description

D.1.1 State

The state of LZP is a representation of the past sequence via its LZ77 factorization. Let the LZ77 factorization be . The representation is a list with an entry for every . Let . Then:

• If , we just store the symbol .
• If and , we store the pair , where is any s.t.

Initially, is the empty list.

D.1.2 The -SA Compressed Index

To efficiently implement the algorithm while satisfying the space and time constraints defined in Section 2, we utilize the -SA data structure proposed by Kempa and Kociumaka [13]. This index is a compressed representation of the so-called suffix array and its inverse, designed specifically for highly repetitive sequences. The -SA admits a deterministic construction algorithm that runs in polynomial time given the LZ77 factorization of , represented as above. This property is crucial, as it allows us to build the index directly from the compressed state, without ever expanding the sequence to its full length.

To rigorously define the queries supported by the index, we first define the lexicographical rank. We assume a fixed total order on the alphabet . The standard lexicographical order on , denoted , is defined as follows: for any two distinct strings , we say if and only if either is a proper prefix of (i.e., ), or there exists an index such that and . Fix some . We will use the shorthand notation for suffixes. The lexicographical rank of a suffix , denoted , is its position among all suffixes of sorted by . Formally:

Thus, the ranks form a permutation of .
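For concreteness, the suffix array, its inverse, and longest-common-extension queries admit the following naive quadratic-time definitions (0-based positions and ranks are our convention here; the index of [13] supports equivalent queries in compressed space with the same semantics):

```python
def suffix_array(w):
    # positions of suffixes, sorted by lexicographic order of the suffixes
    return sorted(range(len(w)), key=lambda i: w[i:])

def inverse_suffix_array(w):
    # isa[i] = lexicographic rank of the suffix starting at position i
    sa = suffix_array(w)
    isa = [0] * len(w)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa

def lce(w, i, j):
    # length of the longest common prefix of the suffixes starting at i and j
    l = 0
    while i + l < len(w) and j + l < len(w) and w[i + l] == w[j + l]:
        l += 1
    return l
```

For example, for the word "banana" the suffix array is [5, 3, 1, 0, 4, 2], and the suffixes at positions 1 and 3 ("anana" and "ana") share the common prefix "ana" of length 3.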
Once constructed, the -SA (together with the variants described in [13]) supports the following stringological queries:

• Random Access (RA): Given , the query returns .
• Inverse Suffix Array (ISA): Given a position , the query returns the lexicographical rank of the suffix starting at ; that is, . This allows the predictor to determine the lexicographical order of different histories efficiently.
• Suffix Array (SA): Given a rank , the query returns the starting position of the suffix whose lexicographical rank is . Formally, . This is the inverse operation of the ISA query.
• Longest Common Extension (LCE): Given two positions , the query returns the length of the longest common prefix of the suffixes and . Formally:

All these queries are supported in polynomial time. For our purposes, their utility lies in enabling the following two queries.

Definition D.1 (Internal Pattern Count): For a fixed , given any and , we define

Definition D.2 (Internal Pattern Match): For a fixed , given any and , we define to be any s.t. , or if there is no such .

Lemma D.1: and can be computed in time and oracle queries, given oracle access to RA, ISA, SA and LCE.

Proof: To compute the count ( ) or the location ( ) of the pattern , we identify the contiguous range of suffixes in the suffix array that begin with this pattern. The pattern consists of the suffix followed by the symbol . Since is itself a suffix of , its position in the suffix array is given exactly by . Crucially, because is a proper prefix of any other suffix that matches the pattern (i.e., ), is lexicographically strictly smaller than any such . Consequently, the index marks the inclusive lower bound of the range of suffixes starting with . The algorithm proceeds in two stages:

1. We determine the upper bound of the range of suffixes starting with by performing a binary search to the right of , using the LCE oracle to verify whether a candidate suffix extends .
2. Within this identified range, we perform two additional binary searches to isolate the subrange where the character at offset (the position immediately following the prefix ) matches .

The range returned by this procedure corresponds to all occurrences of . For , we return the size of this range. For , we return (or if the range is empty). The core logic is detailed in Algorithm 4, and the wrapper queries in Algorithm 5. It is easy to see that there are oracle queries, and that the time complexity modulo oracle calls is .

Algorithm 4: Pattern Range Search via Suffix Array Oracle (input: index , symbol ).

Algorithm 5: IPC and IPM Queries.

D.1.3 Update Rule

The Lempel–Ziv Plurality predictor maintains the LZ77 factorization of the sequence processed so far. When a new symbol is received, the algorithm must update this factorization efficiently. Recall that the state is represented by the list . Let the current sequence be with factorization . The new symbol effectively asks whether the last factor can be extended to include while satisfying the LZ77 constraints, or whether must begin a new factor. Using the methods of the previous section, specifically the internal pattern match query IPM, we can resolve this decision efficiently: extending the last factor by is valid if and only if the concatenated string appears previously in . The update procedure is formally described in Algorithm 6.

Algorithm 6: LZP Update (input: LZP state list , new symbol ).

D.1.4 Prediction Rule

The prediction phase of the Lempel–Ziv Plurality algorithm relies on a "plurality vote" based on the history of the last LZ77 factor. Let be the sequence processed so far, and let be the most recent factor in its LZ77 factorization. To predict the next symbol, the algorithm considers all previous occurrences of in and examines the symbol immediately following each occurrence. The predicted symbol is the one that appears most frequently as a continuation of . Formally, let and . We consider the set of indices at which the pattern appeared previously:

For each symbol , we count the "votes" from these occurrences:

The algorithm outputs .

Efficiently computing these counts is possible using the internal pattern count query defined in Algorithm 5. Observe that querying , where "pos" is the starting index of the last factor , returns exactly the number of times the suffix (which is ) occurs in . Since the current sequence ends precisely after , any occurrence of must effectively start at some , thereby strictly preceding the current factor. Thus, is equivalent to . The prediction procedure is detailed in Algorithm 7.

Algorithm 7: LZP Predict (input: LZP state list , alphabet ).

D.2 Analysis

We now prove Theorem 4.1, establishing the compression and mistake bounds for LZP with respect to the LZ77 complexity . First, we clarify the relationship between the LZP state size and the complexity measures. Recall that the LZ77 factorization of is denoted , where .
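Before the analysis, here is a naive sketch of the prediction rule just described, scanning the history directly rather than through the compressed index (the function and parameter names are our assumptions):

```python
from collections import Counter

def lzp_predict(history, last_factor_start, alphabet):
    """Plurality vote over the continuations of earlier occurrences of the last factor.

    `history` is the full sequence processed so far and `last_factor_start`
    the start index of the most recent LZ77 factor; the real algorithm
    answers the same question with IPC queries on the compressed index."""
    pattern = history[last_factor_start:]
    votes = Counter()
    # every earlier occurrence of the pattern votes for the symbol that followed it
    for i in range(last_factor_start):
        if history[i:i + len(pattern)] == pattern and i + len(pattern) < len(history):
            votes[history[i + len(pattern)]] += 1
    if not votes:
        return alphabet[0]  # no information yet: output an arbitrary symbol
    return votes.most_common(1)[0][0]
```

For instance, with history "ababab" and the last factor starting at position 4 (the final "ab"), the two earlier occurrences of "ab" are both followed by "a", so the vote predicts "a".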
D.2.1 Compression Bound Theorem 4.1 (P art 2) : The predictor satises the compression b ound: Pr o of : The internal state of the LZP predictor at time (where has length ) consists solely of the list representing the LZ77 factorization of the history . Note that while the algorithm constructs the -SA index to p erform queries eciently , this index is rebuilt transiently at ev ery step (which is feasible in polynomial time) and is not part of the p ersisten t state . Let . The list con tains exactly entries. Each en try is either a literal sym b ol or a pair represen ting a bac kward reference. • A literal requires bits. • A reference pair requires storing a p osition and a length . This requires bits. Therefore, the bit size of the state is b ounded b y: This conrms the compression b ound. D.2.2 Mistak e Bound Theorem 4.1 (P art 1) : The num b er of mistakes is bounded by: Pr o of : Let and let its LZ77 factorization b e . W e analyze the mistak es committed b y the LZP predictor during the pro cessing of a single factor . Case 1: Literal F actor. If is a single symbol that has not app eared previously in , it con tributes at most mistake. Case 2: Copied F actor. Supp ose is a factor of length that copies a previous substring. Let . By denition, there exists a position in the history ( ) such that . W e dene a “V ersion Space” represen ting the set of v alid history indices consistent with the prex of observ ed so far. Let b e the prex of length . Ob viously , . Because is a v alid LZ77 cop y , the “true” history index is guaran teed to satisfy for all . This ensures the v ersion space is never empt y: . The LZP algorithm predicts the next sym b ol by taking a plurality vote among the extensions of indices in . Let be the correct next sym b ol. The algorithm calculates the coun ts for each sym b ol : The prediction is . 
The set of surviving v ersions consists exactly of those indices in that correctly predict the observ ed symbol : Th us, the size of the new version space is exactly the coun t of the correct sym b ol: . A mistake o ccurs if . If a mistake o ccurs, it implies that did not win the pluralit y v ote. In particular, could not ha ve held a strict ma jorit y of the votes in . Therefore, whenev er a mistake is made, the n umber of surviving versions m ust satisfy: Let b e the n umber of mistak es made during the pro cessing of factor . Since eac h mistake reduces the v ersion space cardinality b y a factor of at least , we ha ve: (Here, is the set of v ersions at the end of the factor.) Since the true index remains in the set throughout, we m ust hav e . Therefore: T aking logarithms: There migh t b e one extra mistak e in the b eginning, when the factor do esn’t exist y et. T otal Mistak es: The total n umber of mistakes is the sum of mistak es ov er all factors. Since eac h factor contributes at most , we get E BaseSensitivit y of A utomaticit y In this app endix, we prov e Example 3.2. Recall the setup: let , and for any , let and W e claim that and . Note that . E.1 Upp er Bound: Pr o of : W e construct a DF A o ver input alphabet with states that computes from . The sym b ol dep ends only on : In the binary representation , the residue is determined by the last bits , and if and only if . Hence, Dene the DF A with ( states), initial state , and transitions: • F or and : . • and . • F or : and . The output function is and (the v alues of on the remaining states are irrelev an t). On input , the automaton passes through aer reading . It then reads and transitions to (if ) or (if ). The remaining bits lea ve the state unchanged. Therefore, as required, and . E.2 Lo wer Bound: Pr o of : Let be an y DF A with input alphabet and output alphab et that computes in the sense of Denition 3.1, i.e., there exists with such that for all . W e sho w that . Let , so that . 
In particular, . Consider the aligned blo cks of size in , i.e., the subw ords for By Lemma C.1, if tw o blo c k indices and lead the DF A to the same state aer reading their base- address (i.e., where ), then the corresp onding blo cks ha v e iden tical conten t. Contrapositively , It remains to sho w that the right-hand side is . Structure of 𝑢 𝑘 . The word has p erio d . The transitions from to (whic h w e call -b oundaries ) o ccur at p ositions for o dd with . Since , eac h aligned blo c k of size con tains at most one b oundary . Coun ting distinct blo cks via b oundary osets. F or an -b oundary at p osition (with o dd), the oset of the b oundary within its enclosing aligned blo c k is If , the blo ck con taining this b oundary has the form . T wo suc h blo cks with dieren t osets are manifestly distinct. W e no w sho w that all nonzero osets are achiev ed. Since and is a p ow er of , w e hav e , so m ultiplication b y is a bection on . Moreo ver, since is odd, the o dd integers represen t all distinct residue classes mo dulo : indeed, if and are odd with and , then is divisible by and even; since and is o dd, the only p ossibilit y is . Hence, the set is a p ermutation of . Since there are o dd v alues of in (there are exactly of them), and the rst already exhaust all residues mo dulo , every is realized. This yields blo c ks of the form with pairwise distinct con tent. Therefore, F Prop erties of Straigh tLine Programs In this app endix, w e pro vide the pro ofs of Prop osition 4.1, Proposition 4.2 and Proposition 4.3. F.1 Pro of of Prop osition 4.1 Pr o of : Let b e an SLP ov er . W e construct a binary SLP with and . W e pro cess each nonterminal in top ological order (from sinks tow ard the source ). If , the rule is already binary and w e keep it unc hanged. If , w e replace b y a binary tree of fresh nonterminals. Concretely , let where . 
We define a recursive binarization procedure that returns a nonterminal whose value is the concatenation of a given segment of the rule:

• If the segment is a single symbol, create a fresh nonterminal deriving that symbol and return it.
• Otherwise, split the segment in two. Recursively construct nonterminals for the two halves. Create a fresh nonterminal concatenating them and return it.

We set the nonterminal's replacement to be the result of this procedure applied to its full rule, and similarly for every nonterminal. It remains to bound the size. The binarization of a single rule of length k produces a binary tree with k leaves. Such a tree has exactly k − 1 internal nodes, each contributing 2 edges. Hence, the number of edges introduced is 2(k − 1). Meanwhile, the original rule contributed k edges. Therefore, binarizing one rule at most doubles its edge count.

F.2 Proof of Proposition 4.2

Proof: Let the target word be a prefix of the value of an SLP, i.e., the value extends it. Let the SLP be optimal for the full word. We construct an SLP for the prefix, with size at most twice as large, by "truncating" the original. We define the truncation recursively. For each nonterminal and a target length, we add a nonterminal which evaluates to the corresponding prefix of its value:

• If the target length equals the value's length, no truncation is needed.
• Otherwise, find the unique index in the rule at which the target length is reached. Truncate there.
  ‣ If the cut falls exactly at a rule boundary, create a fresh nonterminal keeping the initial entries and discarding the rest.
  ‣ If the cut falls inside an entry, create a fresh nonterminal keeping the initial entries followed by the recursively truncated entry, discarding the rest.

The root of the new SLP is the truncated root, and its nonterminals are those reachable from it. We now bound the size. The recursion visits a sequence of nonterminals following a path from the root toward the leaves. At each visited nonterminal, we create at most one fresh nonterminal. Since the path visits each original nonterminal at most once, the number of fresh nonterminals is at most the number of original ones. The out-degree of each fresh nonterminal is at most the out-degree of the original nonterminal it was derived from, since we only removed entries from the right end of the rule. The edges of the new SLP consist of the edges from original rules (for nonterminals reachable from the new root that were not truncated) and the edges from the fresh rules. The fresh rules contribute at most the original edge count in total.
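The binarization in the proof of Proposition 4.1 can be sketched in Python. This is a minimal illustration using a balanced split (the proof only requires some binary tree over the rule's symbols); the data representation and symbol names are ours.

```python
import itertools

def binarize(rhs, rules, counter):
    """Replace a rule's right-hand side (a list of symbols) by a binary
    tree of fresh nonterminals; returns the root symbol of the tree."""
    if len(rhs) == 1:
        return rhs[0]
    mid = len(rhs) // 2
    left = binarize(rhs[:mid], rules, counter)
    right = binarize(rhs[mid:], rules, counter)
    node = f"N{next(counter)}"
    rules[node] = [left, right]       # fresh binary rule (2 edges)
    return node

def expand(sym, rules):
    """Evaluate an SLP symbol to the string it derives."""
    return "".join(expand(s, rules) for s in rules[sym]) if sym in rules else sym

# Binarizing a length-k rule introduces k-1 internal nodes of 2 edges each,
# i.e. 2(k-1) < 2k edges -- the edge count at most doubles.
rules, counter = {}, itertools.count()
root = binarize(list("abcde"), rules, counter)
assert expand(root, rules) == "abcde"
assert sum(len(r) for r in rules.values()) == 2 * (5 - 1)
```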
The original (unmodified) rules that remain contribute at most the original edge count. Hence, summing the two contributions yields the claimed bound. Therefore, the prefix admits an SLP of at most twice the size.

F.3 Proof of Proposition 4.3

Proof: By definition, there exist an integer input length and a DFA with the given number of states, input alphabet and output alphabet, such that the automaton computes the sequence at every position. Since every relevant position is below the length bound, its representation begins with leading zeros. Therefore, the sequence is equally well generated by the automaton starting from the state reached after the leading zeros, with the reduced input length.

We now construct an SLP of logarithmic depth. For any state and level, define the word obtained by running the automaton from that state on all inputs of that many digits, in order; the full word corresponds to the start state at the top level. Each word decomposes by the first input digit: reading a digit transitions to the successor state, and the remaining digits produce the successor's word one level down. Therefore, the word at a given level is the concatenation, over the input digits, of the successor words at the previous level.

We build the SLP bottom-up, introducing a nonterminal for each (state, level) pair, deriving the corresponding word:

• Base case (level zero): the word is a single symbol, so the rule is just the terminal. No edges are needed.
• Recursive case: we set the rule to the concatenation above. This rule has exactly one edge per input digit.

The root of the SLP is the nonterminal for the start state at the top level. The number of nonterminals at each level is at most the number of states, each contributing one edge per digit, so the total size is bounded as claimed. Since the target prefix is a prefix of the generated word, Proposition 4.2 gives the claimed bound.

G Some Classes of Sequences with Low SLC

In this appendix, we survey several well-studied classes of sequences from combinatorics on words and establish that they have slowly growing straight-line complexity. For each class, we bound the straight-line complexity (or equivalently, up to a logarithmic factor, the Lempel–Ziv complexity) as a function of the prefix length, showing that the LZP algorithm achieves strong mistake bounds on these sequences.

G.1 Automatic Sequences

Automatic sequences are among the most extensively studied objects in combinatorics on words (see [12] for a comprehensive treatment).
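The bottom-up construction in the proof of Proposition 4.3 can be sketched in Python. This is an illustrative sketch, not the paper's code: the memoized helper plays the role of one nonterminal per (state, level) pair, and the transition table below (the parity-of-ones automaton, which yields the Thue–Morse sequence) is supplied for the example.

```python
from functools import lru_cache

def automaton_word(delta, out, q0, base, level):
    """Word whose j-th symbol is the DFA's output after reading the `level`
    base-`base` digits of j, most significant digit first. The memoized
    helper mirrors the SLP: one nonterminal per (state, level) pair."""
    @lru_cache(maxsize=None)
    def w(q, l):
        if l == 0:
            return out[q]
        # decompose by the first input digit, as in the proof
        return "".join(w(delta[q][d], l - 1) for d in range(base))
    return w(q0, level)

# Illustration: parity-of-ones automaton => Thue-Morse sequence.
delta = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
out = {0: "0", 1: "1"}
assert automaton_word(delta, out, 0, 2, 4) == "0110100110010110"
```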
A sequence is called -automatic (for a fixed integer base) if there exists a DFA with the corresponding input alphabet and output alphabet such that, for every position, the automaton outputs the correct symbol when fed the base- representation of the position. By Definition 3.1, a -automatic sequence has constant automatic complexity: the number of states in the minimal automaton is a constant (depending on the sequence but not on the prefix length). Proposition 4.3 then immediately gives a logarithmic bound on the straight-line complexity. Hence, by the equivalence of straight-line and Lempel–Ziv complexity up to logarithmic factors (equation (19)), the Lempel–Ziv complexity is polylogarithmic as well.

G.2 Morphic Sequences

Morphic sequences form one of the most important families of infinite words in combinatorics on words (see [12], Chapter 7). We begin by recalling the necessary definitions, then state the Lempel–Ziv complexity classification due to Constantinescu and Ilie [33].

G.2.1 Background and Definitions

Recall that a homomorphism of free monoids is a map satisfying the multiplicative property on all pairs of words. Such a map is completely determined by its values on the individual symbols of the alphabet. We say that a homomorphism is non-erasing if no symbol is mapped to the empty word. A coding is a homomorphism that maps every symbol to a single letter; it is equivalently just a function applied symbol-by-symbol. We say that a homomorphism is prolongable on a symbol if that symbol is a proper prefix of its image, that is, the image is the symbol followed by a non-empty word. In this case, the iterates form an increasing chain of words (each is a prefix of the next), and their limit defines a unique infinite sequence. Concretely, the n-th symbol of the limit is the n-th symbol of any sufficiently deep iterate.

Definition G.1: A sequence is morphic if there exist a finite alphabet, a non-erasing homomorphism prolongable on some symbol, and a coding, such that the sequence is the coding applied to the limit of the iterates. If the coding is the identity, the sequence is called pure morphic.

The class of morphic sequences strictly contains the -automatic sequences: a sequence is -automatic if and only if it can be obtained from the above construction with the homomorphism being -uniform, meaning every symbol's image has the same length (see [12], Theorem 6.3.2).
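The uniform-morphism characterization can be made concrete: the Thue–Morse sequence arises from the 2-uniform morphism 0 → 01, 1 → 10 with the identity coding. A minimal sketch (the helper names are ours):

```python
def morphic_word(h, coding, seed, length):
    """Prefix of the morphic sequence: iterate the non-erasing morphism h
    (assumed prolongable on `seed`, so each iterate is a prefix of the
    next) and apply the coding symbol-by-symbol."""
    w = seed
    while len(w) < length:
        w = "".join(h[c] for c in w)
    return "".join(coding[c] for c in w[:length])

# Thue-Morse as the fixed point of the 2-uniform morphism 0 -> 01, 1 -> 10.
h = {"0": "01", "1": "10"}
identity = {"0": "0", "1": "1"}
assert morphic_word(h, identity, "0", 16) == "0110100110010110"
```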
G.2.2 The Growth Function

The asymptotic behavior of the complexity measures for a morphic sequence is governed by the growth function of the underlying morphism. For a letter and a non-erasing homomorphism, the growth function measures the length of the n-th iterate on that letter. A classical result (see Lemma 12 in [33] and citations therein) characterizes the possible asymptotic behaviors: there exist an integer and an algebraic real such that the growth function has the corresponding polynomial-times-exponential form. When the base exceeds 1, the growth is exponential (this includes all -uniform morphisms). When the base equals 1, the growth is polynomial of the corresponding degree.

G.2.3 The Constantinescu–Ilie Classification

Constantinescu and Ilie [33] gave a complete characterization of the Lempel–Ziv complexity classes for fixed points of non-erasing morphisms. We state their main result, translated into our notation.

Theorem G.1 (Constantinescu–Ilie [33]): Let the sequence be the fixed point of a non-erasing morphism prolongable on a symbol, with the associated growth function.

1. If the sequence is ultimately periodic⁸, the Lempel–Ziv complexity is bounded.
2. If the sequence is not ultimately periodic and the growth is exponential, the Lempel–Ziv complexity is logarithmic.
3. If the sequence is not ultimately periodic and the growth is polynomial of degree at least 2, the Lempel–Ziv complexity is of the corresponding polynomial-root order.

Note that the case of polynomial growth of degree 1 (linear growth) is absent from case 3: Constantinescu and Ilie show that linear growth always forces the fixed point to be ultimately periodic, reducing it to case 1.

⁸ A sequence is ultimately periodic if, beyond some point, it repeats with some period.

G.3 Characteristic Words

For an irrational slope, the characteristic word with that slope is the binary sequence defined by the corresponding floor-difference formula. Characteristic words are a fundamental object in combinatorics on words (see [12], Chapter 9). They are the prototypical example of sequences that are not morphic in general (a characteristic word is morphic if and only if its slope is a quadratic irrational), yet still have logarithmic straight-line complexity.

Proposition G.1: For any slope and the corresponding characteristic word, the stated polylogarithmic bound holds.

Notice that this bound is uniform w.r.t.
 the slope, and hence any efficient predictor for SLC has a uniform polylog mistake bound over this entire class of sequences. The proof relies on the theory of continued fractions and standard words, which we briefly recall.

G.3.1 Characteristic Blocks

Every irrational slope has a unique continued fraction expansion with partial quotients. The characteristic blocks are defined by the two initial blocks and the recurrence: each block is the previous block repeated as many times as the current partial quotient, followed by the block before it. We denote the block lengths accordingly. Obviously, the lengths satisfy the matching recurrence. The key property we need is that each block is a prefix of the characteristic word, for all indices beyond the first (see [12], Theorem 9.1.10). We also have the obvious lower bound on the block lengths.

G.3.2 Proof of Proposition G.1

Proof: If the slope is rational then the word is periodic and the proposition is obviously true. From now on, we assume the slope is irrational. We construct a sequence of SLPs, the k-th producing the k-th characteristic block, where each SLP extends the previous one by adding fresh nonterminals. Because of this cumulative construction, all nonterminals of the earlier SLPs are already present inside the later ones and can be referenced freely when building them.

We use the fact that for any nonterminal producing a word and any positive integer power, an SLP for that power can be built by repeated squaring, introducing at most a logarithmic number of fresh edges. Indeed, writing the exponent in binary, the binary method computes the successive squares using one squaring rule per bit, then accumulates the product over the remaining set bits using at most one concatenation rule per set bit. Each rule is binary (2 edges), giving at most the claimed number of fresh edges in total. For exponent 1, no fresh edges are needed.

Construction and size bound. The initial SLP is trivial, with the first block as value and constant size. For the second block: if the first partial quotient is 1, it coincides with a terminal; otherwise, we build the required power by repeated squaring and concatenate the terminal, giving the stated size. For later blocks, given the previous SLP (which already contains the one before it), we build the next block as follows. We raise the root nonterminal of the previous SLP to the power given by the current partial quotient by repeated squaring (logarithmically many fresh edges), then concatenate with the root of the SLP before it via one fresh binary rule (2 edges).
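The repeated-squaring step described above can be sketched in Python. A minimal illustration with simplified bookkeeping; the rule representation and symbol names are ours.

```python
import itertools

def power_rules(base_sym, p, rules, counter):
    """Add binary SLP rules deriving the p-th power of base_sym by the
    binary method: a squaring chain plus one concatenation per extra
    set bit of p. Returns the root symbol."""
    squares = [base_sym]
    for _ in range(p.bit_length() - 1):
        s = f"S{next(counter)}"
        rules[s] = (squares[-1], squares[-1])   # squaring rule
        squares.append(s)
    acc = None
    for i in range(p.bit_length()):
        if (p >> i) & 1:                        # accumulate set bits
            if acc is None:
                acc = squares[i]
            else:
                t = f"S{next(counter)}"
                rules[t] = (acc, squares[i])
                acc = t
    return acc

def expand(sym, rules):
    return "".join(expand(s, rules) for s in rules[sym]) if sym in rules else sym

rules, counter = {}, itertools.count()
root = power_rules("a", 13, rules, counter)
assert expand(root, rules) == "a" * 13
# 13 = 0b1101: 3 squarings + 2 concatenations = 5 <= 2 * floor(log2 13)
assert len(rules) == 5
```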
Since the nonterminals of the earlier SLP already reside inside the current one, the only new edges are from the squaring chain and the final concatenation, giving the stated recurrence. It follows that the size of the k-th SLP is bounded by the corresponding sum over the partial quotients.

We bound each term. By Equation 104, the logarithmic terms telescope. For the linear term, the bound follows from the length recurrence. Iterating, the bound holds for all indices. Hence, substituting Equation 108 and Equation 109 into Equation 107 yields the size bound.

Handling arbitrary prefixes. Given a prefix length, let k be the largest index whose block length does not exceed it. We first observe that the k-th block is a prefix of the next one: indeed, the recurrence begins by repeating it. It follows that the k-th block is a prefix of the characteristic word. Since the block lengths at least double every two steps, the index k is logarithmic in the prefix length. Now take the smallest power of the k-th block that covers the prefix; the required exponent is bounded in terms of the next partial quotient. The SLP for this power extends the k-th SLP by at most the repeated-squaring chain's fresh edges, so its size is at most the stated bound. By Equation 110 and the bounds on the index and the exponent, the covering word has an SLP of polylogarithmic size. Finally, Proposition 4.2 gives the desired bound on the prefix.
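The characteristic-block recurrence can be checked numerically. A sketch, assuming the standard convention (as in [12], Chapter 9): the blocks satisfy s_0 = 0, s_1 = 0^(a_1 - 1) 1, s_k = s_{k-1}^{a_k} s_{k-2}, and the characteristic word of slope alpha is c(n) = floor((n+1) alpha) - floor(n alpha) for n >= 1; the paper's exact indexing may differ.

```python
import math

def characteristic_word(alpha, length):
    """c(n) = floor((n+1)*alpha) - floor(n*alpha), n = 1..length."""
    return "".join(str(math.floor((n + 1) * alpha) - math.floor(n * alpha))
                   for n in range(1, length + 1))

def characteristic_blocks(a, k):
    """s_0 = 0, s_1 = 0^(a_1 - 1) 1, s_k = s_{k-1}^{a_k} s_{k-2}."""
    s = ["0", "0" * (a[0] - 1) + "1"]
    for i in range(2, k + 1):
        s.append(s[-1] * a[i - 1] + s[-2])
    return s

# Slope (3 - sqrt 5)/2 = [0; 2, 1, 1, 1, ...] gives the Fibonacci word.
alpha = (3 - math.sqrt(5)) / 2
blocks = characteristic_blocks([2, 1, 1, 1, 1], 5)
word = characteristic_word(alpha, len(blocks[-1]))
for prev, nxt in zip(blocks[1:], blocks[2:]):
    assert nxt.startswith(prev)      # each block is a prefix of the next
assert blocks[-1] == word == "0100101001001"
```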