Stringological sequence prediction I: efficient algorithms for predicting highly repetitive sequences

Vanessa Kosoy¹,²
¹ Faculty of Mathematics, Technion – Israel Institute of Technology
² Computational Rational Agents Laboratory
vanessa@alter.org

Abstract. We propose novel algorithms for sequence prediction based on ideas from stringology. These algorithms are time and space efficient and satisfy mistake bounds related to particular stringological complexity measures of the sequence. In this work (the first in a series) we focus on two such measures: (i) the size of the smallest straight-line program that produces the sequence, and (ii) the number of states in the minimal automaton that can compute any symbol in the sequence when given its position in base $b$ as input. These measures are interesting because multiple rich classes of sequences studied in combinatorics of words (automatic sequences, morphic sequences, Sturmian words) have low complexity and hence high predictability in this sense.

1 Introduction

Sequence (or "time series") prediction is a classical problem in artificial intelligence, machine learning and statistics ([1], [2], [3]). However, there is a dearth of examples of prediction algorithms which simultaneously
• are computationally efficient,
• satisfy strong provable mistake bounds, and
• guarantee asymptotically near-perfect prediction for natural rich classes of deterministic sequences.

Arguably, assuming no domain-specific knowledge, the "gold standard" for sequence prediction is Solomonoff induction [1]. On any sequence, it asymptotically performs as well as any computable predictor. It is also conceptually justifiable as a formalization of Occam's razor. However, Solomonoff induction is uncomputable, making it completely unsuitable for any practical implementation. Much work in statistics has focused on series of continuous variables [3].
This led to algorithms that require assumptions such as a particular form of probability distribution (e.g. normal). Most of these are inapplicable to the categorical (discrete) setting which is our own focus, and they don't yield interesting classes of deterministic sequences. In practice, methods based on deep learning are incredibly successful at next-token prediction [4]. However, the theoretical understanding of generalization bounds for deep learning is in its infancy [5]. In particular, we don't have many examples where strong rigorous mistake bounds can be proved. There are computationally efficient algorithms with strong provable generalization bounds for some interesting classes of stochastic processes, for example context-tree weighting methods [6]. However, such classes often degenerate in the deterministic case (e.g. admitting only periodic sequences). One notable known example which does come close to our desiderata is using the perceptron algorithm for online linear classification [2] applied to features computed from the sequence. However, the corresponding class of predictable deterministic sequences is still quite limited. Hence, finding conceptually different approaches appears to be a valuable goal.

Formally, we work in the following setting. There is a fixed finite alphabet $\Sigma$, and we are interested in predictors of the form $\pi : \Sigma^* \to \Sigma$ (here, $\pi(x)$ is the symbol predicted to follow the prefix $x$). The assumptions about the data are expressed as a complexity measure $C$. We are then interested in bounding the number of mistakes $\pi$ makes on a sequence $x$ in terms of $C(x)$ and $|x|$. At the same time, we require that $\pi$ is computable in polynomial time. Moreover, we wish to simultaneously bound the size of the internal state of the predictor in terms of $C(x)$ and $\log|x|$ (this can be interpreted as the predictor compressing the sequence).
The latter is interesting because it leads to predictors that are space efficient and in some cases run in quasilinear¹ time. We study multiple natural complexity measures with strong mistake and compression bounds. (Two such measures are addressed in this work; further examples will be studied in sequels.) Conceptually, such complexity measures can be viewed as candidate tractable analogues of Kolmogorov complexity. Our algorithms simultaneously obtain a mistake bound and a compression bound in terms of the relevant complexity measure. In particular, when operating on an infinite sequence for which the complexity grows polylogarithmically, such a predictor runs in quasilinear time and polylog space while making only polylog mistakes (see Definition 2.4).

Any prediction algorithm that assumes no domain-specific knowledge is forced to rely on properties of data that are ubiquitous across many domains in the real world. One candidate for such a property is hierarchical structure [7], [8]. A natural formalism for describing sequences with hierarchical structure is straight-line programs (SLPs) [9]. The size of the smallest SLP producing a sequence is therefore one natural complexity measure for us to consider. Luckily, algorithms on sequences with small SLPs were widely studied in stringology, where the size of the smallest SLP is one of several standard measures of "compressibility" that are equivalent up to factors of $O(\log n)$, where $n$ is the length of the sequence [10]. Indeed, one of these measures is the size of the LZ77 compression, which was originally proposed as a computationally tractable analogue of Kolmogorov complexity [11]. It is therefore somewhat remarkable that there is, as far as we know, no prior work that uses LZ77 for sequence prediction.
As a "warmup", we start from a closely related but easier complexity measure, namely the size of the smallest automaton that can compute any symbol in the sequence when given the time index in base $b$ as input, for some fixed integer $b \ge 2$, a concept known as "$b$-automaticity". Notably, infinite sequences for which this measure is finite were widely studied in combinatorics on words: they are called automatic sequences and have their own rich theory [12]. Hence, even this simple complexity measure captures a rich family of interesting examples. More precisely, this defines different complexity measures if the automaton is assumed to read digits in left-to-right (LTR) order vs. right-to-left (RTL) order. For LTR, we find an algorithm with mistake and compression bounds as stated in Theorem 3.1. We leave the treatment of RTL for a sequel.

We call the size of the smallest SLP the straight-line complexity (SLC). It's straightforward to show that SLC is dominated by LTR $b$-automaticity for all values of $b$ (see Proposition 4.3). Hence, a prediction algorithm that's effective w.r.t. SLC is automatically effective w.r.t. LTR $b$-automaticity for all $b$. Moreover, SLC is closely related to another well-studied class of sequences, namely the morphic sequences [12].² We successfully find a prediction algorithm for SLC with mistake and compression bounds as stated in Theorem 4.1. This algorithm, which directly uses LZ77 compression, is based on the compressed suffix-array index of [13].

¹ I.e. linear up to logarithmic factors.

1.1 Related Work

Porat and Feldman [14] devised an algorithm for learning an automaton from its behavior on inputs given in strict lexicographic order. They did not acknowledge the connection, but this setting is equivalent to predicting an automatic sequence. However, they only showed that their algorithm produces a minimal automaton and don't have an explicit mistake bound. More generally, the inference of automata and related classes has received much attention (e.g.
[15], [16]); however, most existing work is inapplicable to our setting. Morphic sequences are closely related to L-systems (see e.g. [17]). An L-system is defined by a starting word $w$ and a homomorphism $h$, in which case the language it produces is the set of words $\{h^n(w) : n \ge 0\}$. Inference of L-systems has received some attention in the literature; see [18] for a survey. However, the problem of inferring an L-system from examples of its language is quite different from the problem of morphic sequence prediction. Moreover, many of the techniques are heuristic ([19], [20], [21]) and the exact algorithms have exponential complexity (see e.g. [22]). In fact, Duffy et al. [23] recently proved NP-hardness results about the problem. Remarkably, the sequence they construct in the proof is periodic, which makes it trivially easy in the prediction setting we study.

Given that our LZP algorithm is based on LZ77 compression, it is relevant to point out that the connection of compression to prediction and learning was the subject of considerable study; see e.g. [24], [25]. However, much of that literature isn't concerned with computational efficiency. Also, some prediction algorithms are known that are based on LZ78 compression³, e.g. [27], [28], [29]. However, none of that work explores a connection to automatic or morphic sequences. Notably, LZ78 compression is less suited to our purposes since it can never compress a word of length $n$ to a word of length less than the order of $\sqrt{n}$. By contrast, LZ77 compression can compress e.g. $a^n$ to a length of $O(\log n)$.

Finally, there was ample research on other (not prediction) algorithmic problems involving automatic and morphic sequences [30].

² Specifically, "typical" morphic sequences have slowly growing SLC, although for some it grows considerably faster; see Appendix G.
³ LZ78 compression was introduced in [26].

2 Setting

In this section, we formalize the framework of stringological sequence prediction.
We define the online prediction protocol in terms of state-based algorithms, introduce the notion of stringological complexity measures, and establish rigorous criteria for statistical and computational efficiency. Finally, we detail the counting criterion, which relates the learnability of a sequence class to the growth rate of the number of low-complexity words.

2.1 Preliminaries and Notation

Let $\Sigma$ be a fixed finite⁴ alphabet. We denote the set of finite words over $\Sigma$ by $\Sigma^*$ and the set of right-infinite sequences by $\Sigma^\omega$. For a word $x \in \Sigma^*$, we denote its length by $|x|$. The $i$-th symbol in a word $x$ is denoted $x_i$. We use 0-based indexing, so that $x = x_0 x_1 \cdots x_{|x|-1}$. The notation $x_{i:j}$ refers to the factor (subword) $x_i x_{i+1} \cdots x_{j-1}$. The notation $x_{:j}$ is the same as $x_{0:j}$. A word $y$ is a prefix of $x$, denoted $y \sqsubseteq x$, if $y = x_{:j}$ for some $j$. Logarithms are taken to base 2 unless otherwise specified.

2.2 The Prediction Protocol

We operate in the standard deterministic online prediction setting. To discuss memory and time constraints rigorously, we model the predictor not merely as a function of the history, but as a state-based machine.

Definition 2.1: A predictor is a tuple $\pi = (S, s_0, u, p)$, where:
• $S$ is the set of possible internal states (represented as binary strings).
• $s_0 \in S$ is the initial state.
• $u : S \times \Sigma \to S$ is the state-update function. It takes the current state and the most recent observation to produce the next state.
• $p : S \to \Sigma$ is the state-prediction function. It maps the current state to a predicted next symbol.

The prediction process proceeds in rounds for a target sequence $x$. At step $t$:
1. The predictor outputs a hypothesis $\hat{x}_t = p(s_t)$.
2. The true symbol $x_t$ is revealed.
3. The predictor updates its internal state: $s_{t+1} = u(s_t, x_t)$.

The total number of mistakes made by $\pi$ on a finite word $x$ is denoted by $M_\pi(x) = |\{t < |x| : \hat{x}_t \ne x_t\}|$. We also introduce a notation for the maximal size of the state during the processing of a word: $S_\pi(x) = \max_{t \le |x|} |s_t|$.

⁴ We assume $\Sigma$ is finite purely for ease of presentation.
We could instead assume that $\Sigma = \mathbb{N}$, in which case the factors of $|\Sigma|$ in the bounds would be replaced by $m$, where $m$ is the highest number that has actually appeared in the sequence so far.

2.3 Efficiency Criteria

We evaluate performance against the inherent structural complexity of the individual sequence, rather than a probabilistic prior.

Definition 2.2: A word complexity measure is a function $C : \Sigma^* \to \mathbb{N}$ that satisfies the following conditions:
• (Polynomial bound.) There exists a polynomial $P$ s.t. for all $x \in \Sigma^*$: $C(x) \le P(|x|)$. (Equation 4)
• (Approximate monotonicity.) There exists a polynomial $Q$ s.t. for all $x \in \Sigma^*$ and $y \sqsubseteq x$: $C(y) \le Q(C(x), \log|x|)$. (Equation 5)

We seek predictors that are statistically efficient, space efficient and time efficient. We bound these resources in terms of the sequence's complexity $C(x)$ and its logarithmic length $\log|x|$.

Definition 2.3 (Statistical Efficiency): The predictor $\pi$ is statistically efficient with respect to $C$ if the number of mistakes is quasilinear in the complexity. That is, for all $x \in \Sigma^*$, the number of mistakes is at most $C(x)$ times a factor polylogarithmic in $|x|$.

Definition 2.4 (Computational Efficiency): The predictor $\pi$ is computationally efficient with respect to $C$ if
• the time to compute the state-update function is bounded by a polynomial in the size of its input,
• the time to compute the state-prediction function is bounded by a polynomial in the size of its input, and
• for all $x \in \Sigma^*$, the maximal state size is at most $C(x)$ times a factor polylogarithmic in $|x|$. (Equation 7)

We can thus think of the state as a compressed representation of the history, and we refer to inequalities of the form Equation 7 as "compression bounds". (In many examples, this compression is lossless, but it doesn't have to be.) In particular, this bound implies that if the sequence complexity grows polylogarithmically, then the space complexity and the per-round processing time are also polylogarithmic⁵.

⁵ Since per-round processing time is polynomial in the state size, which is polylogarithmic in $|x|$ due to inequality Equation 7.

2.4 The Counting Criterion

To characterize which complexity measures admit statistically efficient predictors, we utilize a counting argument. This connects the "volume" of the concept class (words of low complexity) to the hardness of learning.
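Before turning to the counting criterion, the protocol of Definition 2.1 can be made concrete with a short sketch. The following Python fragment is not part of the formal development: the names `Predictor` and `run` are illustrative, and the "repeat the last observed symbol" predictor is a toy stand-in, not one of the algorithms of this paper. It simulates the rounds and counts mistakes:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Predictor:
    state: object                             # current internal state s
    update: Callable[[object, str], object]   # u: (state, symbol) -> next state
    predict: Callable[[object], object]       # p: state -> predicted symbol


def run(pred: Predictor, x: str) -> int:
    """Simulate the rounds of Definition 2.1 on the word x, counting mistakes."""
    s = pred.state
    mistakes = 0
    for sym in x:
        if pred.predict(s) != sym:  # step 1: output hypothesis; step 2: compare
            mistakes += 1
        s = pred.update(s, sym)     # step 3: update state on the true symbol
    return mistakes


# Toy predictor (illustrative only): always repeat the last observed symbol.
last = Predictor(state=None, update=lambda s, a: a, predict=lambda s: s)
m = run(last, "aaabaaab")  # 4 mistakes: one at the start, one per change point
```

The state here is a single symbol, so the compression bound of Definition 2.4 is trivially satisfied; the interesting predictors of Sections 3 and 4 maintain far richer compressed states.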
Definition 2.5: Given a word complexity measure $C$, the counting complexity $N_C(n, k)$ is defined as the logarithm of the number of words of length $n$ with complexity at most $k$: $N_C(n, k) = \log |\{x \in \Sigma^n : C(x) \le k\}|$.

The following fact serves as the fundamental condition for learnability in this framework.

Theorem 2.1: Let $C$ be a word complexity measure. Then, there exists a statistically efficient predictor for $C$ if and only if the counting complexity $N_C(n, k)$ is quasilinear in $k$, up to polylogarithmic factors in $n$.

See Appendix A for the (fairly straightforward) proof.

3 Automaticity

In this section, we investigate our simplest example of a word complexity measure: automaticity. It is defined as the minimal number of states in an automaton that computes any symbol in the word from a base-$b$ representation of its position. Details follow.

3.1 Definitions

Fix an integer base $b \ge 2$. For any length $\ell$ and integer $i < b^\ell$, let $[i]_b^\ell$ denote the standard base-$b$ representation of $i$, padded with leading zeros to length $\ell$. We model the structure of a sequence via deterministic finite automata (DFAs) that compute symbols from this padded representation.

Definition 3.1: Let $x \in \Sigma^*$ be a finite word. The (LTR) $b$-automaticity of $x$, denoted $\mathcal{A}_b(x)$, is the minimal number of states in a DFA (with input alphabet $\{0, \ldots, b-1\}$) such that there exists an integer $\ell$ satisfying $b^\ell \ge |x|$ and, for every $i < |x|$, the automaton outputs $x_i$ on input $[i]_b^\ell$.

Note that we are processing the time index in left-to-right (most significant digit first) order. It is also possible to define RTL automaticity, where the order of processing is least significant digit first. For our purposes the resulting complexity measures are very different and require different algorithms. In this work, we will focus solely on LTR automaticity, leaving RTL automaticity for a sequel.

For any $y \sqsubseteq x$ it's obvious that $\mathcal{A}_b(y) \le \mathcal{A}_b(x)$, since the same automaton witnesses both. Hence, Equation 5 is satisfied for this measure. It is also easy to see that $\mathcal{A}_b(x)$ is polynomially bounded in $|x|$, and therefore $\mathcal{A}_b$ is a word complexity measure⁶.

Example 3.1: Let $\Sigma = \{0, 1\}$. We define the Thue-Morse sequence recursively by a family of prefixes $T_k$ of length $2^k$: $T_0 = 0$ and $T_{k+1} = T_k \overline{T_k}$, where $\overline{y}$ denotes the bitwise negation of $y$.
Note that $T_k \sqsubseteq T_{k+1}$ for all $k$, and hence there is a unique $T \in \Sigma^\omega$ s.t. $T_k \sqsubseteq T$ for all $k$. The beginning of this sequence is 0110100110010110… It's possible to show that the $n$-th symbol $t_n$ satisfies, for all $n$, $t_{2n} = t_n$ and $t_{2n+1} = 1 - t_n$. From this it is easy to see that $t_n$ can be computed by an automaton with two states for any reading direction: the automaton tracks the parity of the number of 1 digits read so far. Hence, the 2-automaticity of every prefix $T_k$ is at most 2.

The automaticity of a sequence is highly sensitive to the choice of base $b$. A sequence that is simple in one base may be complex in another.

Example 3.2: There are families of words whose automaticity is bounded in one base but grows quickly in another. See Appendix E for the precise construction and the proof of this example.

This is a serious issue for automaticity-based predictors as candidate algorithms for practical applications, because in most cases there is no way to single out a preferred value of $b$. In Section 4 we will see how we can overcome the dependence on $b$ by using the word complexity measure SLC.

⁶ This is demonstrated by an automaton with a distinct state for every relevant input prefix, which detects the exact index $i$ and outputs $x_i$ accordingly.

3.2 An Efficient Predictor for Automaticity

We propose the Hierarchical Dictionary Plurality (HDP) algorithm. This algorithm predicts by maintaining a hierarchy of dictionaries for word blocks of geometrically growing sizes that approximate the transition structure of the underlying automaton (see Appendix C).

Theorem 3.1: There exists a predictor (HDP) which is statistically and computationally efficient with respect to $\mathcal{A}_b$. Specifically, for any $b$ and $x \in \Sigma^*$, letting $n = |x|$ and $k = \mathcal{A}_b(x)$, we have:
1. Statistical Efficiency: The number of mistakes is bounded quasilinearly in $k$, up to polylogarithmic factors in $n$.
2. Computational Efficiency: The predictor satisfies a compression bound of the same form.

Here, and in the theorem below, the predictor receives $b$ as a parameter, and is polynomial time in this input as well. See Appendix C for the proof and the exact bounds.

4 Straight-Line Complexity

We now turn to a more general complexity measure: straight-line complexity (SLC).
While $b$-automaticity characterizes sequences generated by finite state machines reading the time index, SLC characterizes sequences generated by straight-line programs: a type of context-free grammar whose language consists of a single word.

4.1 Definitions

We start from the definition of a straight-line program.

Definition 4.1: A straight-line program (SLP) over $\Sigma$ is a tuple $G = (V, \rho, v_\star)$, where $V$ is a finite set of variables disjoint from $\Sigma$, $\rho : V \to (V \cup \Sigma)^*$ assigns a production to each variable, and $v_\star \in V$. We require that
• for all $v \in V$, $\rho(v)$ is non-empty,
• $G$ is acyclic, i.e. there is no cycle of variables each of which appears in the production of the previous one, and
• $v_\star$ is the unique element of $V$ that doesn't appear in $\rho(v)$ for any $v \in V$.

We define the expansion $\mathrm{exp} : V \cup \Sigma \to \Sigma^*$ recursively by:
• For $a \in \Sigma$, $\mathrm{exp}(a) = a$.
• For $v \in V$, let $\rho(v) = \alpha_1 \alpha_2 \cdots \alpha_m$. Then, $\mathrm{exp}(v) = \mathrm{exp}(\alpha_1)\,\mathrm{exp}(\alpha_2) \cdots \mathrm{exp}(\alpha_m)$.

We define the value of $G$ as $\mathrm{val}(G) = \mathrm{exp}(v_\star)$.

Thus, an SLP is essentially an acyclic directed graph with vertices $V \cup \Sigma$ and an edge $(v, \alpha)$ whenever $\alpha$ appears in $\rho(v)$. The outgoing edges of every vertex are ordered (by the position inside $\rho(v)$), $v_\star$ is the unique source, and the elements of $\Sigma$ are sinks. The size of an SLP is defined to be the number of edges, i.e. $|G| = \sum_{v \in V} |\rho(v)|$.

It's easy to see that we can reduce any SLP to a "binary" SLP with only a mild increase in size:

Proposition 4.1: For any SLP $G$, there exists an SLP $G'$ s.t.
• $\mathrm{val}(G') = \mathrm{val}(G)$,
• $|G'| = O(|G|)$, and
• for all $v \in V'$, $|\rho'(v)| \le 2$.

See Appendix F for the proof.

We can now define our next word complexity measure of interest.

Definition 4.2: The straight-line complexity of a word $x$, denoted $\mathrm{SLC}(x)$, is the minimum size of an SLP $G$ s.t. $\mathrm{val}(G) = x$.

To see that $\mathrm{SLC}$ satisfies Equation 5, we observe the following.

Proposition 4.2: For any $y \sqsubseteq x$, it holds that $\mathrm{SLC}(y)$ is bounded quasilinearly in $\mathrm{SLC}(x)$, up to factors logarithmic in $|x|$.

See Appendix F for the proof. $\mathrm{SLC}$ also satisfies Equation 4, because $\mathrm{SLC}(x) \le |x|$: consider the SLP with $V = \{v_\star\}$ and $\rho(v_\star) = x$.

The SLC measure is closely related to the Lempel-Ziv factorization, a foundational concept in lossless data compression. To explore this relationship, we formally define the LZ77 parsing size.

Definition 4.3: The LZ77 factorization of a word $x$ is the unique decomposition $x = f_1 f_2 \cdots f_m$ with the following property. For every $j \le m$, denote by $p_j = |f_1 \cdots f_{j-1}|$ the starting position of $f_j$. Then, either $|f_j| = 1$ and $f_j$ is the first occurrence of that symbol in $x$, or the following two conditions hold:
1. There exists $q < p_j$ s.t. $x_{q : q+|f_j|} = f_j$.
2.
Either $f_j$ ends the word, or extending $f_j$ by one more symbol would yield a word with no occurrence starting before position $p_j$.

That is, $f_j$ is the longest non-empty prefix of the remaining suffix that has occurred previously in $x$ (and the previous occurrence is allowed to overlap with $f_j$). The LZ77 complexity, denoted $\mathrm{LZ}(x)$, is defined as the number of factors in this decomposition. See Appendix B for a recap of elementary properties and examples of LZ77 factorizations.

$\mathrm{LZ}$ satisfies Equation 4 since $\mathrm{LZ}(x) \le |x|$, and Equation 5 since $y \sqsubseteq x$ implies $\mathrm{LZ}(y) \le \mathrm{LZ}(x)$. A well-known result in stringology [31] establishes that $\mathrm{LZ}$ and $\mathrm{SLC}$ are equivalent up to logarithmic factors: $\mathrm{LZ}(x) \le \mathrm{SLC}(x) = O(\mathrm{LZ}(x) \cdot \log|x|)$. Hence, any predictor that is statistically (resp. computationally) efficient w.r.t. $\mathrm{LZ}$ is statistically (resp. computationally) efficient w.r.t. $\mathrm{SLC}$ and vice versa.

4.2 Comparison with Automaticity

Straight-line complexity is a strictly more powerful word complexity measure than LTR $b$-automaticity (i.e. lower up to log factors). Intuitively, the recursive $b$-section of the time interval inherent in processing the most significant digit first can be directly mapped to the hierarchical concatenation structure of an SLP.

Proposition 4.3: For any base $b$ and word $x$, the straight-line complexity $\mathrm{SLC}(x)$ is bounded by the LTR $b$-automaticity $\mathcal{A}_b(x)$ up to logarithmic factors. See Appendix F for the precise statement and proof.

As the following example shows, $\mathrm{SLC}$ also captures structures that don't seem to be captured by $\mathcal{A}_b$ for any value of $b$.

Example 4.1: The Fibonacci word is defined as the limit of the recursively defined sequence of words $w_1 = 0$, $w_2 = 01$, $w_{n+1} = w_n w_{n-1}$. The beginning of the Fibonacci word is 010010100100101001010… The recurrence relation defining $w_n$ naturally describes an SLP of size $O(n)$. Since the length of $w_n$ is a Fibonacci number, and hence exponential in $n$, Proposition 4.2 implies that prefixes of the Fibonacci word have straight-line complexity polylogarithmic in their length. More generally, $\mathrm{SLC}$ has mild growth for several well-studied classes of sequences (see Appendix G for details).

4.3 Efficient Prediction

The Lempel-Ziv Plurality (LZP, see Appendix D) algorithm maintains the LZ77 factorization of the sequence so far. It is based on a plurality vote between different occurrences of the last factor.
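To make the parsing underlying LZP concrete, here is a minimal sketch of the greedy factorization of Definition 4.3, assuming the self-referential (overlap-allowed) variant described above. It runs in quadratic time for clarity; the actual algorithms rely on compressed index structures instead:

```python
def lz77_factorize(x: str) -> list[str]:
    """Greedy LZ77 factorization: each factor is the longest prefix of the
    remaining suffix that occurs starting at an earlier position (the source
    may overlap the factor); a never-seen symbol becomes a literal factor."""
    factors = []
    p, n = 0, len(x)
    while p < n:
        best = 0
        for q in range(p):
            # Match length starting at source q, compared symbol by symbol,
            # so the source region may overlap the factor being produced.
            l = 0
            while p + l < n and x[q + l] == x[p + l]:
                l += 1
            best = max(best, l)
        if best == 0:
            factors.append(x[p])  # literal: first occurrence of this symbol
            p += 1
        else:
            factors.append(x[p:p + best])
            p += best
    return factors
```

For instance, `lz77_factorize("abababab")` yields the three factors `a`, `b`, `ababab`, where the last factor overlaps its own source, illustrating why highly repetitive words have small $\mathrm{LZ}$.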
Theorem 4.1: There exists a predictor (LZP) which is statistically and computationally efficient w.r.t. $\mathrm{LZ}$. Specifically, for any $x \in \Sigma^*$, letting $n = |x|$ and $k = \mathrm{LZ}(x)$:
1. Statistical Efficiency: The number of mistakes is bounded quasilinearly in $k$, up to polylogarithmic factors in $n$.
2. Computational Efficiency: The predictor satisfies a compression bound of the same form.

See Appendix D for the proof and the exact bounds.

Acknowledgements

This work was supported by the Advanced Research and Invention Agency (ARIA) of the United Kingdom, the AI Security Institute (AISI) of the United Kingdom, Survival and Flourishing Corp, and Coefficient Giving in San Francisco, California. The author wishes to thank Alexander Appel, Matthias Georg Mayer, her spouse Marcus Ogren, and Vinayak Pathak for reviewing drafts, locating errors and providing useful suggestions.

Bibliography

[1] M. Hutter, D. Quarel, and E. Catt, An Introduction to Universal Artificial Intelligence. Chapman and Hall/CRC, 2024.
[2] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006. doi: 10.1017/CBO9780511546921.
[3] P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting. Springer, 2002.
[4] T. B. Brown et al., "Language Models are Few-Shot Learners," in Advances in Neural Information Processing Systems 33 (NeurIPS 2020), H. Larochelle, M. Ranzato, R. Hadsell, M.-F. Balcan, and H.-T. Lin, Eds., 2020.
[5] J. Fan, C. Ma, and Y. Zhong, "A selective overview of deep learning," Statistical Science, vol. 36, no. 2, p. 264, 2020.
[6] I. Kontoyiannis, L. Mertzanis, A. Panotopoulou, I. Papageorgiou, and M. Skoularidou, "Bayesian Context Trees: Modelling and exact inference for discrete time series," CoRR, 2020.
[7] H. A.
Simon, "The Architecture of Complexity," Proceedings of the American Philosophical Society, vol. 106, no. 6, pp. 467–482, 1962.
[8] C. G. Nevill-Manning and I. H. Witten, "Identifying Hierarchical Structure in Sequences: A linear-time algorithm," J. Artif. Intell. Res., vol. 7, pp. 67–82, 1997. doi: 10.1613/JAIR.374.
[9] P. Bürgisser, M. Clausen, and M. A. Shokrollahi, Algebraic Complexity Theory, Grundlehren der mathematischen Wissenschaften, vol. 315. Springer, 1997.
[10] T. Kociumaka, G. Navarro, and N. Prezza, "Toward a definitive compressibility measure for repetitive sequences," IEEE Transactions on Information Theory, vol. 69, no. 4, pp. 2074–2092, 2022.
[11] A. Lempel and J. Ziv, "On the Complexity of Finite Sequences," IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 75–81, 1976. doi: 10.1109/TIT.1976.1055501.
[12] J.-P. Allouche and J. O. Shallit, Automatic Sequences: Theory, Applications, Generalizations. Cambridge University Press, 2003.
[13] D. Kempa and T. Kociumaka, "Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space," in 64th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2023), IEEE, 2023, pp. 1877–1886. doi: 10.1109/FOCS57990.2023.00114.
[14] S. Porat and J. A. Feldman, "Learning Automata from Ordered Examples," Mach. Learn., vol. 7, pp. 109–138, 1991. doi: 10.1007/BF00114841.
[15] C. de la Higuera, Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, 2010.
[16] I. Attias, L. Reyzin, N. Srebro, and G. Vardi, "On the Hardness of Learning Regular Expressions," in 37th International Conference on Algorithmic Learning Theory, 2026. [Online].
Available: https://openreview.net/forum?id=hVPfu5BqYx
[17] G. Rozenberg and A. Salomaa, Lindenmayer Systems: Impacts on Theoretical Computer Science, Computer Graphics, and Developmental Biology. Berlin, Heidelberg: Springer-Verlag, 2001.
[18] F. Ben-Naoum, "A survey on L-system inference," INFOCOMP Journal of Computer Science, vol. 8, no. 3, pp. 29–39, 2009.
[19] J. Guo et al., "Inverse Procedural Modeling of Branching Structures by Inferring L-Systems," ACM Trans. Graph., vol. 39, no. 5, pp. 155:1–155:13, 2020. doi: 10.1145/3394105.
[20] J. Bernard and I. McQuillan, "Techniques for inferring context-free Lindenmayer systems with genetic algorithm," Swarm Evol. Comput., vol. 64, p. 100893, 2021. doi: 10.1016/J.SWEVO.2021.100893.
[21] J. J. Lee, B. Li, and B. Benes, "Latent L-systems: Transformer-based Tree Generator," ACM Trans. Graph., vol. 43, no. 1, pp. 7:1–7:16, 2024. doi: 10.1145/3627101.
[22] I. McQuillan, J. Bernard, and P. Prusinkiewicz, "Algorithms for Inferring Context-Sensitive L-Systems," in Unconventional Computation and Natural Computation (UCNC 2018), Lecture Notes in Computer Science, vol. 10867. Springer, 2018, pp. 117–130. doi: 10.1007/978-3-319-92435-9_9.
[23] C. Duffy, S. Hillis, U. Khan, I. McQuillan, and S. L. Shan, "Inductive inference of Lindenmayer systems: algorithms and computational complexity," Natural Computing, pp. 1–11, 2025.
[24] B. Ryabko, J. Astola, and M. Malyutov, Compression-Based Methods of Statistical Analysis and Prediction of Time Series. Springer, 2016. doi: 10.1007/978-3-319-32253-7.
[25] O. David, S. Moran, and A. Yehudayoff, "On statistical learning via the lens of compression," in Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS'16).
Barcelona, Spain: Curran Associates Inc., 2016, pp. 2792–2800.
[26] J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inf. Theory, vol. 24, no. 5, pp. 530–536, 1978. doi: 10.1109/TIT.1978.1055934.
[27] M. Feder, N. Merhav, and M. Gutman, "Universal prediction of individual sequences," IEEE Trans. Inf. Theory, vol. 38, no. 4, pp. 1258–1270, 1992. doi: 10.1109/18.144706.
[28] K. Gopalratnam and D. J. Cook, "Active LeZi: An Incremental Parsing Algorithm for Sequential Prediction," Int. J. Artif. Intell. Tools, vol. 13, no. 4, pp. 917–930, 2004. doi: 10.1142/S0218213004001892.
[29] N. Sagan and T. Weissman, "A Family of LZ78-based Universal Sequential Probability Assignments," CoRR, 2024. doi: 10.48550/ARXIV.2410.06589.
[30] J. Shallit, The Logical Approach to Automatic Sequences: Exploring Combinatorics on Words with Walnut, London Mathematical Society Lecture Note Series. Cambridge University Press, 2022.
[31] W. Rytter, "Application of Lempel-Ziv factorization to the approximation of grammar-based compression," Theor. Comput. Sci., vol. 302, no. 1–3, pp. 211–222, 2003. doi: 10.1016/S0304-3975(02)00777-6.
[32] J. A. Storer and T. G. Szymanski, "Data compression via textual substitution," Journal of the ACM, vol. 29, no. 4, pp. 928–951, 1982.
[33] S. Constantinescu and L. Ilie, "The Lempel-Ziv Complexity of Fixed Points of Morphisms," in Mathematical Foundations of Computer Science 2006, Berlin, Heidelberg: Springer, 2006, pp. 280–291.

A Proof of the Counting Criterion

In this appendix, we provide the detailed proof of Theorem 2.1. The theorem establishes a necessary and sufficient condition for the existence of a statistically efficient predictor for a complexity measure $C$.
Specifically, it states that such a predictor exists if and only if the counting complexity is quasilinear in the complexity bound, up to polylogarithmic factors in the length.

A.1 Sufficiency

We first prove that if the counting complexity satisfies this condition, then there exists a statistically efficient predictor $\pi$. We construct $\pi$ using the Plurality Algorithm, a standard approach in online learning [2]⁷. We adapt it to the anytime setting where the true complexity and length of the target sequence are unknown. To handle these unknown parameters, we employ a "doubling trick" strategy that operates in phases.

A.1.1 The Predictor

The predictor operates by maintaining dynamic estimates for the complexity and length of the target sequence. We employ a "doubling trick" strategy to adapt to these unknown parameters. Let $k_r$ and $n_r$ denote the complexity bound and length bound during phase $r$, respectively. We initialize the predictor with $k_0 = n_0 = 1$.

The core mechanism is a plurality vote over a restricted version space. At any time step $t$ within phase $r$, let $h$ denote the history observed so far. We define the version space $V_t$ as the set of all candidate finite words that are consistent with the history, have complexity at most $k_r$, and length at most $n_r$: $V_t = \{y \in \Sigma^* : h \sqsubseteq y,\ C(y) \le k_r,\ |y| \le n_r\}$.

⁷ Common names for this are "majority algorithm" and "halving algorithm". We prefer the word "plurality" because we work in the multiclass setting (i.e. $|\Sigma|$ can be greater than 2), in which "plurality" is more accurate.

The predictor estimates the "likelihood" of the next symbol based on the cardinality of valid extensions in the version space. Specifically, for each symbol $a \in \Sigma$, we count the number of words in $V_t$ that have $a$ as the next symbol at position $t$: $N_t(a) = |\{y \in V_t : |y| > t,\ y_t = a\}|$. The algorithm predicts the symbol that maximizes this count: $\hat{x}_t = \arg\max_a N_t(a)$. This procedure is formalized in Algorithm 1. If the observed symbol results in a sequence that violates the current bounds (i.e., if $C(h) > k_r$ or $|h| > n_r$), the algorithm terminates the current phase.
It then updates the bounds by doubling the violated parameter (setting $k_{r+1} = 2k_r$ or $n_{r+1} = 2n_r$) and proceeds to phase $r+1$. This ensures that the resource bounds grow efficiently to accommodate the true parameters of the target sequence.

Variables: history $h$, complexity bound $k$, length bound $n$.
1: for $t = 0, 1, 2, \ldots$:
2:   compute the version space $V_t$ from $h$, $k$, $n$
3:   for each $a \in \Sigma$: compute the count $N_t(a)$
4:   predict $\hat{x}_t = \arg\max_a N_t(a)$
5:   receive $x_t$ and append it to $h$
6:   if $C(h) > k$ then $k \leftarrow 2k$
7:   if $|h| > n$ then $n \leftarrow 2n$
Algorithm 1: Plurality Algorithm

A.1.2 Mistake Analysis

We explicitly bound the number of mistakes made by this predictor.

Analysis of a Single Phase: Consider a single phase with fixed bounds $k$ and $n$. Whenever the predictor makes a mistake at step $t$ (i.e., $\hat{x}_t \ne x_t$), it implies that the true symbol was not the plurality choice. Consequently, the set of candidates consistent with the true symbol, $\{y \in V_t : y_t = x_t\}$, must be at most half the size of the total version space at step $t$. Since the new version space at step $t+1$ becomes exactly this subset, the size of the version space is reduced by a factor of at least 2 for every mistake committed. The initial size of the version space is bounded by the number of words of length at most $n$ with complexity at most $k$, whose logarithm is controlled by the counting complexity. Since the version space must contain at least the true target sequence (until the bounds are violated), its size is at least 1. Therefore, the number of mistakes in the phase is at most $\log|V_0|$, where $V_0$ is the version space at the start of the phase. Here and everywhere, $\log$ stands for logarithm to base 2.

Total Mistake Bound: Let the target sequence have length $N$ and complexity $K$. Due to the monotonicity condition Equation 5, the complexity of any prefix of the sequence is bounded by $Q(K, \log N)$, where $Q$ is the polynomial specified in the definition of the complexity measure. The algorithm proceeds through a sequence of phases $r = 0, 1, \ldots, R$. The final phase terminates when the entire sequence is processed. Due to the doubling strategy:
1. The final length bound satisfies $n_R \le 2N$.
2. The final complexity bound satisfies $k_R \le 2\,Q(K, \log N)$.
3.
The total n umber of phases is bounded b y the n umber of doublings of plus the n umber of doublings of : The total n umber of mistakes is the sum of mistak es in each phase: Although the coun ting complexity is not necessarily monotonic in (as the n um b er of lo w-complexity words of a sp ecic length ma y uctuate), the hypothesis of the theorem provides an upp er b ound that is monotonic. Sp ecically , we ha v e where . Since and for all phases , and since ma y be assumed to be non-decreasing in b oth argumen ts, w e can upp er b ound eac h term in the sum b y . This yields: W e no w verify statistical eciency . Substituting the explicit form of the b ound , we get Recalling that , we conclude This satises the criterion for statistical eciency . A.2 Necessit y W e now prov e the conv erse: if there exists a statistically ecien t predictor , then the counting complexit y must satisfy the gro wth condition. Assume is a statistically ecient predictor. By denition, there exists a p olynomial suc h that for an y sequence , the num b er of mistakes is b ounded b y: W e rely on a compression argumen t (information theoretic lo wer b ound). W e show that if the predictor is ecient, we can construct a compressed representation for any w ord of lo w complexit y . Let . W e wan t to b ound . An y w ord can b e uniquely reconstructed if w e know: 1. The deterministic prediction algorithm . 2. The sp ecic time steps where made a mistak e. 3. The correct sym b ol at those sp ecic time steps. F or a w ord of length with at most mistak es, w e can enco de the locations of the mistak es using bits: for the n umber of mistak es and for the lo cation of eac h. The correct sym b ols at these steps require bits. Thus, the description length in bits is: Since this encoding must distinguish ev ery w ord in , the n um b er of suc h words cannot exceed . 
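Under symbol names that were not preserved in this copy (write $n$ for the word length, $M$ for the mistake bound, and $\Sigma$ for the alphabet; these names are our assumption), the counting step can be summarized as:

$$
\log_2 |W| \;\le\; \underbrace{\log_2 n}_{\text{\# of mistakes}} \;+\; \underbrace{M \log_2 n}_{\text{mistake locations}} \;+\; \underbrace{M \log_2 |\Sigma|}_{\text{correct symbols}} \;+\; O(1),
$$

since every word in $W$ is determined by the (fixed) predictor together with the at most $M$ mistake locations and the correct symbols at those locations.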
Taking the logarithm gives the counting complexity:

Substituting the mistake bound :

Thus, the existence of an efficient predictor implies the required bound on the counting complexity.

B Properties of Lempel–Ziv 77

In this appendix, we provide background on LZ77-type factorizations, both in the standard setting and in a -aligned variant used in the analysis of the HDP algorithm (Appendix C). We first define general (non-greedy) factorizations, then show that the greedy variants used in the main text are uniquely determined special cases that minimize the number of factors.

B.1 Standard LZ77-type Factorizations

We begin with a general notion of LZ77-type factorization, which relaxes the maximality (greediness) requirement of the standard LZ77 factorization given in the main text.

Definition B.1: An LZ77-type factorization of a word is a decomposition into non-empty factors with the following property. For every , denote . Then, either

• and the symbol does not appear in , or
• there exists s.t. .

In the first case, we call a literal factor. In the second case, we call a copy factor with source position . The number of factors is called the size of the factorization.

Note that a copy factor is allowed to overlap with its source: it is possible that . Intuitively, this corresponds to a byte-by-byte copy in which previously copied symbols become available as source material for the remainder of the factor. Also note that when and the symbol does appear in , the factor falls under the second case as a copy factor of length 1.

Example B.1: Let and (length 8). The following are all valid LZ77-type factorizations of :

1. (5 factors). Here, , and are each a copy of .
2. (4 factors). Here, is a copy from source position ; the source and the destination are adjacent but do not overlap.
3. (3 factors). Here, is a copy from source position , with : the source overlaps with the destination , illustrating the overlap phenomenon.

The greedy LZ77 factorization (Definition 4.3) is the special case in which each factor is chosen to be as long as possible. We restate the definition here for clarity.

Definition B.2: The (greedy) LZ77 factorization of a word is the unique LZ77-type factorization that additionally satisfies the following maximality condition: for every , there is no s.t.

That is, each non-final factor is as long as possible: if the factor is a copy, it is the longest prefix of the remaining suffix that has an earlier occurrence in (and if the next symbol is new, the factor necessarily has length 1). The LZ77 complexity is the number of factors in this factorization.

The greedy LZ77 factorization is uniquely determined: starting from position , each factor is fixed by the requirement that it be the longest possible match (or a literal if the symbol is new). Since , the entire factorization is determined inductively.

Example B.2: Continuing Example B.1, the greedy LZ77 factorization of is , with 3 factors (factorization 3 from Example B.1). This is the unique factorization satisfying the maximality condition. For instance, factorization 1 from Example B.1 is not greedy because could be extended: , violating maximality.

A fundamental property of the greedy LZ77 factorization is that it minimizes the number of factors among all LZ77-type factorizations (see Theorem 10 in [32]).

Theorem B.1 (Storer–Szymanski): For any word , the greedy LZ77 factorization of has the minimum number of factors among all LZ77-type factorizations of .

B.2 -aligned LZ77-type Factorizations

For the analysis of the HDP algorithm, we require a variant of LZ77-type factorization in which factors are constrained to be aligned with blocks of size . This ensures compatibility with the hierarchical dictionary structure maintained by HDP.

Definition B.3: Fix . A -LZ77-type factorization of a word is a decomposition into non-empty factors with the following properties. For every , denote . Then, there exist such that:

• (the factor starts at an aligned position).
• Either or ( and ). Moreover, either and does not appear in (a literal factor), or there exists s.t. (a copy factor at level , copying a previously seen aligned block).

We call the level of factor , and the number of factors the size of the factorization.

In other words, each factor occupies a complete aligned block of size (except possibly the last factor, which may be shorter), and either introduces a new symbol at level 0 or matches a prefix of an aligned block of the same size that appeared earlier in . Note that, unlike the standard case, overlapping copies cannot occur here: since both the source and destination blocks are aligned to multiples of and implies , the source block ends at or before the start of .

Any -LZ77-type factorization is in particular an LZ77-type factorization in the sense of Definition B.1. Also note that the choice of level is constrained by the starting position : we must have . For example, a factor starting at a position that is an odd multiple of cannot have level greater than 1.

Example B.3: Let , , and (length 8). The aligned blocks are:

• Level 0 (size 1):
• Level 1 (size 2):
• Level 2 (size 4):
• Level 3 (size 8):

The following are both valid -LZ77-type factorizations:

1. (7 factors), all at level 0 except at level 1.
2. (4 factors), with levels , , .

In factorization 2, at level 1 copies the aligned block , and at level 2 copies the aligned block .

The greedy -LZ77 factorization is the variant that always promotes factors to the highest possible level.

Definition B.4: Fix . The (greedy) -LZ77 factorization of a word is the unique -LZ77-type factorization that additionally satisfies the following maximality condition: for every , if is a multiple of , then there is no s.t.

The maximality condition states that no factor can be "promoted" to the next level: if the current factor and its neighbors could be merged into a larger aligned block that has appeared previously, the greedy factorization would already have used that larger block. This determines the factorization uniquely, by the following inductive argument. At each position , the greedy algorithm considers all levels such that and (plus possibly with for the last factor), in decreasing order. It selects the highest level at which the aligned block has a previous occurrence (or falls back to a level-0 literal if the symbol is new). Hence, the factor is uniquely determined, and determines the start of the next factor.

Example B.4: Continuing Example B.3, the greedy -LZ77 factorization of is , with 4 factors (factorization 2 from Example B.3). To see that factorization 1 from Example B.3 is not greedy, observe that at position , the level-0 factor can be promoted: is a multiple of , and the level-1 block matches the earlier block , violating the maximality condition.

C Hierarchical Dictionary Plurality

In this appendix, we provide the detailed description and analysis of the Hierarchical Dictionary Plurality (HDP) algorithm referred to in Section 3. This algorithm is designed to efficiently predict sequences with low -automaticity by maintaining a dynamic hierarchy of dictionaries that approximate the transition structure of the underlying automaton.

C.1 Algorithm Description

The HDP algorithm maintains a compressed representation of the sequence history using a hierarchy of dictionaries, denoted . The dictionary at level , , stores the unique blocks of length encountered so far. To ensure space efficiency, these blocks are not stored as raw strings but as tuples of indices pointing to entries in the dictionary of the level below, .

C.1.1 State

The internal state of the predictor at time consists of:

1. A list of dictionaries .
   1. is fixed as the alphabet (conceptually mapping indices to symbols).
   2. For , is a dynamic list of -tuples. Each entry is a tuple , where is an index into . This represents a word of length formed by concatenating the words at indices from level .
2. A list of active buffers .
   1. stores the sequence of indices from level observed so far in the current, incomplete block of level .

Initially, for , and is the empty word for all .

C.1.2 Update Rule

The update function processes a new symbol by propagating it up the hierarchy. At level 0, the symbol is appended to the buffer . When a buffer at level reaches size , it forms a complete block. The algorithm checks whether this block (represented as a tuple of indices) exists in ; if not, it is added. The index of this block is then passed as input to the buffer at level . This process continues recursively until a buffer does not overflow (see Algorithm 2).

Algorithm 2: HDP Update (input: base , state , new symbol ).

C.1.3 Prediction Rule

To predict the next symbol, HDP employs a hierarchical plurality vote. It identifies a set of "valid versions" . A version is a pair , where represents a candidate index for the next block at some level , and is the implied next symbol at level 0. The algorithm initializes with all possible symbols. It then iteratively filters by checking the active buffers for consistency with the known transitions in the dictionaries . Specifically, if the current buffer combined with a candidate next index forms a tuple present in , the version survives.
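The update rule just described (Algorithm 2) can be sketched as follows; the data-structure choices (hash maps from index tuples to integer indices, lists for the buffers) and all names are our assumptions:

```python
def hdp_update(base, dictionaries, buffers, symbol):
    """Propagate a new symbol up the hierarchy of dictionaries.

    dictionaries[0] maps each alphabet symbol to its index; for k >= 1,
    dictionaries[k] maps a `base`-tuple of level-(k-1) indices to its index.
    buffers[k] collects the indices of the current, incomplete level-k block."""
    idx = dictionaries[0][symbol]
    k = 1
    while True:
        if k >= len(buffers):            # grow the hierarchy on demand
            buffers.append([])
            dictionaries.append({})
        buffers[k].append(idx)
        if len(buffers[k]) < base:       # the buffer did not overflow: stop
            return
        block = tuple(buffers[k])        # a complete block of `base` indices
        buffers[k] = []
        if block not in dictionaries[k]:
            dictionaries[k][block] = len(dictionaries[k])
        idx = dictionaries[k][block]     # pass the block's index one level up
        k += 1
```

For instance, feeding the symbols of "abab" with base 2 fills level 1 with the single distinct block ("a","b") and level 2 with the single distinct block formed by its two occurrences.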
If filtering at level eliminates all candidates, the algorithm halts and outputs the plurality vote (the most frequent ) among the survivors from level . Otherwise, it proceeds to . This logic can be regarded as running the computation of the underlying automaton in reverse (see Algorithm 3).

Algorithm 3: HDP Predict (input: base , state ).

C.2 Analysis

We now prove Theorem 3.1, establishing the compression and mistake bounds for HDP with respect to the LTR-automaticity measure .

First, some notation. We regard a DFA with input alphabet and output alphabet as a tuple , where:

• is the set of states.
• is the initial state.
• is the transition function.
• is the output function.

We use the notation to denote the standard recursive extension of the transition function to input words.

Now, let us connect the content of the dictionaries to the states of the minimal DFA.

Lemma C.1: Consider any s.t. , and a DFA with input alphabet and output alphabet s.t. . Consider also some and s.t. and . Assume that

Then,

Intuitively, this lemma says that the content of an aligned block in is completely determined by the DFA state reached after reading the "address" of that block. Recall that each position in is computed by feeding the base- digits of into the automaton , most significant digit first. We can split these digits into two parts: a high-order prefix of length (which selects which aligned block of size we are in) and a low-order suffix of length (which selects the position within that block). The high-order prefix steers the automaton from to some intermediate state ; the low-order suffix then determines, starting from , the actual output symbol. If two different block indices and happen to steer the automaton to the same intermediate state, then every position inside those two blocks sees the same low-order computation and hence produces the same symbol. In other words, the block's content depends only on which state the automaton is in when it "enters" the block, not on how it got there.

Proof: Let denote the state reached by the automaton after processing the prefixes corresponding to and . By the hypothesis:

We examine the subwords of starting at indices and . Let be an integer offset representing the relative position within these blocks, such that . The absolute positions in the sequence are and . Since , the base- representation of the full time index is the concatenation of the representation of (padded to length ) and the representation of (padded to length ). That is:

The symbol generated at is given by . Using the recursive property of the transition function , we have:

Similarly for :

Since the right-hand sides are identical, for all valid offsets . Consequently, the subwords defined by the range of are identical. In particular, this shows that , since effectively stores the distinct aligned blocks of size .

C.2.1 Compression Bound

Theorem 3.1 (Part 2): For any , let and . The state size of HDP is bounded by:

Proof: Let be the minimal DFA with states that generates in the sense of Definition 3.1; that is, there exists some s.t. for any position , . The HDP algorithm builds dictionaries whose entries in correspond to blocks of length encountered in at indices aligned to multiples of . Specifically, any entry added to corresponds to a subword for some . By Lemma C.1, the content of such a block is entirely determined by the state of the DFA after processing the prefix of the index corresponding to the most significant digits. Since has only states, there are at most distinct types of blocks of length that can generate. Consequently, the size of the dictionary at any level is bounded by . The maximum level reached is bounded by . The storage cost of one entry in is the size of a -tuple of indices pointing to . For , , and hence an index requires bits. Thus, each entry takes bits. Summing over all levels :

We also have , and hence for the cost is . Adding this, we obtain the bound .

C.2.2 Mistake Bound

To prove the mistake bound, we use the -LZ77 factorization (Definition B.4). The number of factors in the -LZ77 factorization can be bounded in terms of automaticity.

Lemma C.2: For any , the number of factors in the -LZ77 factorization of satisfies

Proof: Let . We analyze the -LZ77 factorization by mapping the factors to nodes in a -ary tree representing all aligned blocks of . A node at height represents an aligned block of length . We classify each aligned block appearing in as New if it is the first occurrence of its content in (ordered by start index), and Old otherwise. By Lemma C.1, the content of an aligned block of length is determined by the state of the generating DFA after reading the index prefix. Since the DFA has states, there are at most distinct block contents for any fixed length. Therefore, there are at most "New" blocks at any level .

Consider a factor in the -LZ77 factorization with length . By the maximality condition in the definition of -LZ77 (specifically condition 2), the aligned block of size that contains (the parent node in the tree) must not have appeared previously in . If it had appeared previously, then the larger block would have been the factor. Thus, every factor is a child of a "New" block. The total number of "New" blocks across all levels (from to ) is:

Since every factor is a child of a "New" block, and each "New" block has at most children (the sub-blocks of size ), the total number of factors is bounded by:

We can now complete the proof.

Theorem 3.1 (Part 1): The number of mistakes is bounded by:

Proof: Let and . We consider the -LZ77 factorization of , denoted . By the previous lemma, the number of factors is bounded by . We analyze the mistakes committed by the HDP predictor during the processing of a single factor . Let the length of this factor be . By the definition of the factorization, unless is a single symbol appearing for the first time, it corresponds to a block that has occurred previously in at an aligned position. Since HDP processes the sequence in order, the previous occurrence of this block was fully processed before the start of . Consequently, the dictionary entry representing this block must already exist in . This ensures that at least one "correct" version (a pointer to the correct block in ) exists in the version space (i.e., the dictionary elements that might contribute to ) at the start of the factor.

We now argue that the version space is stable. The HDP algorithm updates a dictionary only when a buffer is completely filled. For level , the factor length matches the block size, so is updated only at the end of the block. For levels , the block size implies that no level- block can complete strictly within the duration of . Therefore, for all prediction steps within , the set of available candidates in the dictionaries is fixed to those present at the start of the factor. Since no new versions are created, the set of valid versions is monotonically non-increasing.

The predictor makes a mistake only when the plurality of valid versions predicts the wrong symbol. When a mistake occurs, all versions contributing to the incorrect plurality prediction are eliminated from the valid set. This reduces the number of valid versions by at least a factor of 2. Since for all , the number of mistakes possible at any specific level is bounded by . Summing across all active levels (up to ), the total number of mistakes for factor is bounded by:

The total number of mistakes is the sum over all factors:

Substituting the bound on from Lemma C.2:

D Lempel–Ziv Plurality

In this appendix, we define the Lempel–Ziv Plurality (LZP) algorithm and prove Theorem 4.1.

D.1 Algorithm Description

D.1.1 State

The state of LZP is a representation of the past sequence via its LZ77 factorization. Let the LZ77 factorization be . The representation is a list with an entry for every . Let . Then:

• If , we just store the symbol .
• If and , we store the pair , where is any s.t.

Initially, is the empty list.

D.1.2 The -SA Compressed Index

To efficiently implement the algorithm while satisfying the space and time constraints defined in Section 2, we utilize the -SA data structure proposed by Kempa and Kociumaka [13]. This index is a compressed representation of the so-called suffix array and its inverse, designed specifically for highly repetitive sequences. The -SA admits a deterministic construction algorithm that runs in polynomial time given the LZ77 factorization of , represented as above. This property is crucial, as it allows us to build the index directly from the compressed state, without ever expanding the sequence to its full length.

To rigorously define the queries supported by the index, we first define the lexicographical rank. We assume a fixed total order on the alphabet . The standard lexicographical order on , denoted , is defined as follows: for any two distinct strings , we say if and only if either is a proper prefix of (i.e., ), or there exists an index such that and . Fix some . We will use the shorthand notation for suffixes. The lexicographical rank of a suffix , denoted , is its position among all suffixes of sorted by . Formally:

Thus, the ranks form a permutation of .
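For concreteness, the suffix array, its inverse, and longest-common-extension queries admit the following naive quadratic-time definitions (0-based positions and ranks are our convention here; the index of [13] supports equivalent queries in compressed space with the same semantics):

```python
def suffix_array(w):
    # positions of suffixes, sorted by lexicographic order of the suffixes
    return sorted(range(len(w)), key=lambda i: w[i:])

def inverse_suffix_array(w):
    # isa[i] = lexicographic rank of the suffix starting at position i
    sa = suffix_array(w)
    isa = [0] * len(w)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa

def lce(w, i, j):
    # length of the longest common prefix of the suffixes starting at i and j
    l = 0
    while i + l < len(w) and j + l < len(w) and w[i + l] == w[j + l]:
        l += 1
    return l
```

For example, for the word "banana" the suffix array is [5, 3, 1, 0, 4, 2], and the suffixes at positions 1 and 3 ("anana" and "ana") share the common prefix "ana" of length 3.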
Once constructed, the -SA (together with the variants described in [13]) supports the following stringological queries:

• Random Access (RA): Given , the query returns .
• Inverse Suffix Array (ISA): Given a position , the query returns the lexicographical rank of the suffix starting at ; that is, . This allows the predictor to determine the lexicographical order of different histories efficiently.
• Suffix Array (SA): Given a rank , the query returns the starting position of the suffix whose lexicographical rank is . Formally, . This is the inverse operation of the ISA query.
• Longest Common Extension (LCE): Given two positions , the query returns the length of the longest common prefix of the suffixes and . Formally:

All these queries are supported in polynomial time. For our purposes, their utility lies in enabling the following two queries.

Definition D.1 (Internal Pattern Count): For a fixed , given any and , we define

Definition D.2 (Internal Pattern Match): For a fixed , given any and , we define to be any s.t. , or if there is no such .

Lemma D.1: and can be computed in time and oracle queries, given oracle access to RA, ISA, SA and LCE.

Proof: To compute the count ( ) or the location ( ) of the pattern , we identify the contiguous range of suffixes in the suffix array that begin with this pattern. The pattern consists of the suffix followed by the symbol . Since is itself a suffix of , its position in the suffix array is given exactly by . Crucially, because is a proper prefix of any other suffix that matches the pattern (i.e., ), is lexicographically strictly smaller than any such . Consequently, the index marks the inclusive lower bound of the range of suffixes starting with . The algorithm proceeds in two stages:

1. We determine the upper bound of the range of suffixes starting with by performing a binary search to the right of , using the LCE oracle to verify whether a candidate suffix extends .
2. Within this identified range, we perform two additional binary searches to isolate the subrange where the character at offset (the position immediately following the prefix ) matches .

The range returned by this procedure corresponds to all occurrences of . For , we return the size of this range. For , we return (or if the range is empty). The core logic is detailed in Algorithm 4, and the wrapper queries in Algorithm 5. It is easy to see that there are oracle queries, and that the time complexity modulo oracle calls is .

Algorithm 4: Pattern Range Search via Suffix Array Oracle (input: index , symbol ).

Algorithm 5: IPC and IPM Queries.

D.1.3 Update Rule

The Lempel–Ziv Plurality predictor maintains the LZ77 factorization of the sequence processed so far. When a new symbol is received, the algorithm must update this factorization efficiently. Recall that the state is represented by the list . Let the current sequence be with factorization . The new symbol effectively asks whether the last factor can be extended to include while satisfying the LZ77 constraints, or whether must begin a new factor. Using the methods of the previous section, specifically the internal pattern match query IPM, we can resolve this decision efficiently: extending the last factor by is valid if and only if the concatenated string appears previously in . The update procedure is formally described in Algorithm 6.

Algorithm 6: LZP Update (input: LZP state list , new symbol ).

D.1.4 Prediction Rule

The prediction phase of the Lempel–Ziv Plurality algorithm relies on a "plurality vote" based on the history of the last LZ77 factor. Let be the sequence processed so far, and let be the most recent factor in its LZ77 factorization. To predict the next symbol, the algorithm considers all previous occurrences of in and examines the symbol immediately following each occurrence. The predicted symbol is the one that appears most frequently as a continuation of . Formally, let and . We consider the set of indices at which the pattern appeared previously:

For each symbol , we count the "votes" from these occurrences:

The algorithm outputs .

Efficiently computing these counts is possible using the internal pattern count query defined in Algorithm 5. Observe that querying , where "pos" is the starting index of the last factor , returns exactly the number of times the suffix (which is ) occurs in . Since the current sequence ends precisely after , any occurrence of must effectively start at some , thereby strictly preceding the current factor. Thus, is equivalent to . The prediction procedure is detailed in Algorithm 7.

Algorithm 7: LZP Predict (input: LZP state list , alphabet ).

D.2 Analysis

We now prove Theorem 4.1, establishing the compression and mistake bounds for LZP with respect to the LZ77 complexity . First, we clarify the relationship between the LZP state size and the complexity measures. Recall that the LZ77 factorization of is denoted , where .
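Before the analysis, here is a naive sketch of the prediction rule just described, scanning the history directly rather than through the compressed index (the function and parameter names are our assumptions):

```python
from collections import Counter

def lzp_predict(history, last_factor_start, alphabet):
    """Plurality vote over the continuations of earlier occurrences of the last factor.

    `history` is the full sequence processed so far and `last_factor_start`
    the start index of the most recent LZ77 factor; the real algorithm
    answers the same question with IPC queries on the compressed index."""
    pattern = history[last_factor_start:]
    votes = Counter()
    # every earlier occurrence of the pattern votes for the symbol that followed it
    for i in range(last_factor_start):
        if history[i:i + len(pattern)] == pattern and i + len(pattern) < len(history):
            votes[history[i + len(pattern)]] += 1
    if not votes:
        return alphabet[0]  # no information yet: output an arbitrary symbol
    return votes.most_common(1)[0][0]
```

For instance, with history "ababab" and the last factor starting at position 4 (the final "ab"), the two earlier occurrences of "ab" are both followed by "a", so the vote predicts "a".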
D.2.1 Compression Bound Theorem 4.1 (P art 2) : The predictor satises the compression b ound: Pr o of : The internal state of the LZP predictor at time (where has length ) consists solely of the list representing the LZ77 factorization of the history . Note that while the algorithm constructs the -SA index to p erform queries eciently , this index is rebuilt transiently at ev ery step (which is feasible in polynomial time) and is not part of the p ersisten t state . Let . The list con tains exactly entries. Each en try is either a literal sym b ol or a pair represen ting a bac kward reference. • A literal requires bits. • A reference pair requires storing a p osition and a length . This requires bits. Therefore, the bit size of the state is b ounded b y: This conrms the compression b ound. D.2.2 Mistak e Bound Theorem 4.1 (P art 1) : The num b er of mistakes is bounded by: Pr o of : Let and let its LZ77 factorization b e . W e analyze the mistak es committed b y the LZP predictor during the pro cessing of a single factor . Case 1: Literal F actor. If is a single symbol that has not app eared previously in , it con tributes at most mistake. Case 2: Copied F actor. Supp ose is a factor of length that copies a previous substring. Let . By denition, there exists a position in the history ( ) such that . W e dene a “V ersion Space” represen ting the set of v alid history indices consistent with the prex of observ ed so far. Let b e the prex of length . Ob viously , . Because is a v alid LZ77 cop y , the “true” history index is guaran teed to satisfy for all . This ensures the v ersion space is never empt y: . The LZP algorithm predicts the next sym b ol by taking a plurality vote among the extensions of indices in . Let be the correct next sym b ol. The algorithm calculates the coun ts for each sym b ol : The prediction is . 
The set of surviving v ersions consists exactly of those indices in that correctly predict the observ ed symbol : Th us, the size of the new version space is exactly the coun t of the correct sym b ol: . A mistake o ccurs if . If a mistake o ccurs, it implies that did not win the pluralit y v ote. In particular, could not ha ve held a strict ma jorit y of the votes in . Therefore, whenev er a mistake is made, the n umber of surviving versions m ust satisfy: Let b e the n umber of mistak es made during the pro cessing of factor . Since eac h mistake reduces the v ersion space cardinality b y a factor of at least , we ha ve: (Here, is the set of v ersions at the end of the factor.) Since the true index remains in the set throughout, we m ust hav e . Therefore: T aking logarithms: There migh t b e one extra mistak e in the b eginning, when the factor do esn’t exist y et. T otal Mistak es: The total n umber of mistakes is the sum of mistak es ov er all factors. Since eac h factor contributes at most , we get E BaseSensitivit y of A utomaticit y In this app endix, we prov e Example 3.2. Recall the setup: let , and for any , let and W e claim that and . Note that . E.1 Upp er Bound: Pr o of : W e construct a DF A o ver input alphabet with states that computes from . The sym b ol dep ends only on : In the binary representation , the residue is determined by the last bits , and if and only if . Hence, Dene the DF A with ( states), initial state , and transitions: • F or and : . • and . • F or : and . The output function is and (the v alues of on the remaining states are irrelev an t). On input , the automaton passes through aer reading . It then reads and transitions to (if ) or (if ). The remaining bits lea ve the state unchanged. Therefore, as required, and . E.2 Lo wer Bound: Pr o of : Let be an y DF A with input alphabet and output alphab et that computes in the sense of Denition 3.1, i.e., there exists with such that for all . W e sho w that . Let , so that . 
In particular, . Consider the aligned blo cks of size in , i.e., the subw ords for By Lemma C.1, if tw o blo c k indices and lead the DF A to the same state aer reading their base- address (i.e., where ), then the corresp onding blo cks ha v e iden tical conten t. Contrapositively , It remains to sho w that the right-hand side is . Structure of 𝑢 𝑘 . The word has p erio d . The transitions from to (whic h w e call -b oundaries ) o ccur at p ositions for o dd with . Since , eac h aligned blo c k of size con tains at most one b oundary . Coun ting distinct blo cks via b oundary osets. F or an -b oundary at p osition (with o dd), the oset of the b oundary within its enclosing aligned blo c k is If , the blo ck con taining this b oundary has the form . T wo suc h blo cks with dieren t osets are manifestly distinct. W e no w sho w that all nonzero osets are achiev ed. Since and is a p ow er of , w e hav e , so m ultiplication b y is a bection on . Moreo ver, since is odd, the o dd integers represen t all distinct residue classes mo dulo : indeed, if and are odd with and , then is divisible by and even; since and is o dd, the only p ossibilit y is . Hence, the set is a p ermutation of . Since there are o dd v alues of in (there are exactly of them), and the rst already exhaust all residues mo dulo , every is realized. This yields blo c ks of the form with pairwise distinct con tent. Therefore, F Prop erties of Straigh tLine Programs In this app endix, w e pro vide the pro ofs of Prop osition 4.1, Proposition 4.2 and Proposition 4.3. F.1 Pro of of Prop osition 4.1 Pr o of : Let b e an SLP ov er . W e construct a binary SLP with and . W e pro cess each nonterminal in top ological order (from sinks tow ard the source ). If , the rule is already binary and w e keep it unc hanged. If , w e replace b y a binary tree of fresh nonterminals. Concretely , let where . 
We define a recursive binarization procedure that returns a nonterminal whose value is the concatenation of a given segment of the rule:

• If the segment is a single symbol, create a fresh nonterminal deriving that symbol and return it.
• Otherwise, split the segment in two. Recursively construct nonterminals for the two halves. Create a fresh nonterminal concatenating them and return it.

We set the nonterminal's replacement to be the result of this procedure applied to its full rule, and similarly for every nonterminal. It remains to bound the size. The binarization of a single rule of length k produces a binary tree with k leaves. Such a tree has exactly k − 1 internal nodes, each contributing 2 edges. Hence, the number of edges introduced is 2(k − 1). Meanwhile, the original rule contributed k edges. Therefore, binarizing one rule at most doubles its edge count.

F.2 Proof of Proposition 4.2

Proof: Let the target word be a prefix of the value of an SLP, i.e., the value extends it. Let the SLP be optimal for the full word. We construct an SLP for the prefix, with size at most twice as large, by "truncating" the original. We define the truncation recursively. For each nonterminal and a target length, we add a nonterminal which evaluates to the corresponding prefix of its value:

• If the target length equals the value's length, no truncation is needed.
• Otherwise, find the unique index in the rule at which the target length is reached. Truncate there.
  ‣ If the cut falls exactly at a rule boundary, create a fresh nonterminal keeping the initial entries and discarding the rest.
  ‣ If the cut falls inside an entry, create a fresh nonterminal keeping the initial entries followed by the recursively truncated entry, discarding the rest.

The root of the new SLP is the truncated root, and its nonterminals are those reachable from it. We now bound the size. The recursion visits a sequence of nonterminals following a path from the root toward the leaves. At each visited nonterminal, we create at most one fresh nonterminal. Since the path visits each original nonterminal at most once, the number of fresh nonterminals is at most the number of original ones. The out-degree of each fresh nonterminal is at most the out-degree of the original nonterminal it was derived from, since we only removed entries from the right end of the rule. The edges of the new SLP consist of the edges from original rules (for nonterminals reachable from the new root that were not truncated) and the edges from the fresh rules. The fresh rules contribute at most the original edge count in total.
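The binarization in the proof of Proposition 4.1 can be sketched in Python. This is a minimal illustration using a balanced split (the proof only requires some binary tree over the rule's symbols); the data representation and symbol names are ours.

```python
import itertools

def binarize(rhs, rules, counter):
    """Replace a rule's right-hand side (a list of symbols) by a binary
    tree of fresh nonterminals; returns the root symbol of the tree."""
    if len(rhs) == 1:
        return rhs[0]
    mid = len(rhs) // 2
    left = binarize(rhs[:mid], rules, counter)
    right = binarize(rhs[mid:], rules, counter)
    node = f"N{next(counter)}"
    rules[node] = [left, right]       # fresh binary rule (2 edges)
    return node

def expand(sym, rules):
    """Evaluate an SLP symbol to the string it derives."""
    return "".join(expand(s, rules) for s in rules[sym]) if sym in rules else sym

# Binarizing a length-k rule introduces k-1 internal nodes of 2 edges each,
# i.e. 2(k-1) < 2k edges -- the edge count at most doubles.
rules, counter = {}, itertools.count()
root = binarize(list("abcde"), rules, counter)
assert expand(root, rules) == "abcde"
assert sum(len(r) for r in rules.values()) == 2 * (5 - 1)
```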
The original (unmodified) rules that remain contribute at most the original edge count. Hence, summing the two contributions yields the claimed bound. Therefore, the prefix admits an SLP of at most twice the size.

F.3 Proof of Proposition 4.3

Proof: By definition, there exist an integer input length and a DFA with the given number of states, input alphabet and output alphabet, such that the automaton computes the sequence at every position. Since every relevant position is below the length bound, its representation begins with leading zeros. Therefore, the sequence is equally well generated by the automaton starting from the state reached after the leading zeros, with the reduced input length.

We now construct an SLP of logarithmic depth. For any state and level, define the word obtained by running the automaton from that state on all inputs of that many digits, in order; the full word corresponds to the start state at the top level. Each word decomposes by the first input digit: reading a digit transitions to the successor state, and the remaining digits produce the successor's word one level down. Therefore, the word at a given level is the concatenation, over the input digits, of the successor words at the previous level.

We build the SLP bottom-up, introducing a nonterminal for each (state, level) pair, deriving the corresponding word:

• Base case (level zero): the word is a single symbol, so the rule is just the terminal. No edges are needed.
• Recursive case: we set the rule to the concatenation above. This rule has exactly one edge per input digit.

The root of the SLP is the nonterminal for the start state at the top level. The number of nonterminals at each level is at most the number of states, each contributing one edge per digit, so the total size is bounded as claimed. Since the target prefix is a prefix of the generated word, Proposition 4.2 gives the claimed bound.

G Some Classes of Sequences with Low SLC

In this appendix, we survey several well-studied classes of sequences from combinatorics on words and establish that they have slowly growing straight-line complexity. For each class, we bound the straight-line complexity (or equivalently, up to a logarithmic factor, the Lempel–Ziv complexity) as a function of the prefix length, showing that the LZP algorithm achieves strong mistake bounds on these sequences.

G.1 Automatic Sequences

Automatic sequences are among the most extensively studied objects in combinatorics on words (see [12] for a comprehensive treatment).
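The bottom-up construction in the proof of Proposition 4.3 can be sketched in Python. This is an illustrative sketch, not the paper's code: the memoized helper plays the role of one nonterminal per (state, level) pair, and the transition table below (the parity-of-ones automaton, which yields the Thue–Morse sequence) is supplied for the example.

```python
from functools import lru_cache

def automaton_word(delta, out, q0, base, level):
    """Word whose j-th symbol is the DFA's output after reading the `level`
    base-`base` digits of j, most significant digit first. The memoized
    helper mirrors the SLP: one nonterminal per (state, level) pair."""
    @lru_cache(maxsize=None)
    def w(q, l):
        if l == 0:
            return out[q]
        # decompose by the first input digit, as in the proof
        return "".join(w(delta[q][d], l - 1) for d in range(base))
    return w(q0, level)

# Illustration: parity-of-ones automaton => Thue-Morse sequence.
delta = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
out = {0: "0", 1: "1"}
assert automaton_word(delta, out, 0, 2, 4) == "0110100110010110"
```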
A sequence is called -automatic (for a fixed integer base) if there exists a DFA with the corresponding input alphabet and output alphabet such that, for every position, the automaton outputs the correct symbol when fed the base- representation of the position. By Definition 3.1, a -automatic sequence has constant automatic complexity: the number of states in the minimal automaton is a constant (depending on the sequence but not on the prefix length). Proposition 4.3 then immediately gives a logarithmic bound on the straight-line complexity. Hence, by the equivalence of straight-line and Lempel–Ziv complexity up to logarithmic factors (equation (19)), the Lempel–Ziv complexity is polylogarithmic as well.

G.2 Morphic Sequences

Morphic sequences form one of the most important families of infinite words in combinatorics on words (see [12], Chapter 7). We begin by recalling the necessary definitions, then state the Lempel–Ziv complexity classification due to Constantinescu and Ilie [33].

G.2.1 Background and Definitions

Recall that a homomorphism of free monoids is a map satisfying the multiplicative property on all pairs of words. Such a map is completely determined by its values on the individual symbols of the alphabet. We say that a homomorphism is non-erasing if no symbol is mapped to the empty word. A coding is a homomorphism that maps every symbol to a single letter; it is equivalently just a function applied symbol-by-symbol. We say that a homomorphism is prolongable on a symbol if that symbol is a proper prefix of its image, that is, the image is the symbol followed by a non-empty word. In this case, the iterates form an increasing chain of words (each is a prefix of the next), and their limit defines a unique infinite sequence. Concretely, the n-th symbol of the limit is the n-th symbol of any sufficiently deep iterate.

Definition G.1: A sequence is morphic if there exist a finite alphabet, a non-erasing homomorphism prolongable on some symbol, and a coding, such that the sequence is the coding applied to the limit of the iterates. If the coding is the identity, the sequence is called pure morphic.

The class of morphic sequences strictly contains the -automatic sequences: a sequence is -automatic if and only if it can be obtained from the above construction with the homomorphism being -uniform, meaning every symbol's image has the same length (see [12], Theorem 6.3.2).
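The uniform-morphism characterization can be made concrete: the Thue–Morse sequence arises from the 2-uniform morphism 0 → 01, 1 → 10 with the identity coding. A minimal sketch (the helper names are ours):

```python
def morphic_word(h, coding, seed, length):
    """Prefix of the morphic sequence: iterate the non-erasing morphism h
    (assumed prolongable on `seed`, so each iterate is a prefix of the
    next) and apply the coding symbol-by-symbol."""
    w = seed
    while len(w) < length:
        w = "".join(h[c] for c in w)
    return "".join(coding[c] for c in w[:length])

# Thue-Morse as the fixed point of the 2-uniform morphism 0 -> 01, 1 -> 10.
h = {"0": "01", "1": "10"}
identity = {"0": "0", "1": "1"}
assert morphic_word(h, identity, "0", 16) == "0110100110010110"
```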
G.2.2 The Growth Function

The asymptotic behavior of the complexity measures for a morphic sequence is governed by the growth function of the underlying morphism. For a letter and a non-erasing homomorphism, the growth function measures the length of the n-th iterate on that letter. A classical result (see Lemma 12 in [33] and citations therein) characterizes the possible asymptotic behaviors: there exist an integer and an algebraic real such that the growth function has the corresponding polynomial-times-exponential form. When the base exceeds 1, the growth is exponential (this includes all -uniform morphisms). When the base equals 1, the growth is polynomial of the corresponding degree.

G.2.3 The Constantinescu–Ilie Classification

Constantinescu and Ilie [33] gave a complete characterization of the Lempel–Ziv complexity classes for fixed points of non-erasing morphisms. We state their main result, translated into our notation.

Theorem G.1 (Constantinescu–Ilie [33]): Let the sequence be the fixed point of a non-erasing morphism prolongable on a symbol, with the associated growth function.

1. If the sequence is ultimately periodic⁸, the Lempel–Ziv complexity is bounded.
2. If the sequence is not ultimately periodic and the growth is exponential, the Lempel–Ziv complexity is logarithmic.
3. If the sequence is not ultimately periodic and the growth is polynomial of degree at least 2, the Lempel–Ziv complexity is of the corresponding polynomial-root order.

Note that the case of polynomial growth of degree 1 (linear growth) is absent from case 3: Constantinescu and Ilie show that linear growth always forces the fixed point to be ultimately periodic, reducing it to case 1.

⁸ A sequence is ultimately periodic if, beyond some point, it repeats with some period.

G.3 Characteristic Words

For an irrational slope, the characteristic word with that slope is the binary sequence defined by the corresponding floor-difference formula. Characteristic words are a fundamental object in combinatorics on words (see [12], Chapter 9). They are the prototypical example of sequences that are not morphic in general (a characteristic word is morphic if and only if its slope is a quadratic irrational), yet still have logarithmic straight-line complexity.

Proposition G.1: For any slope and the corresponding characteristic word, the stated polylogarithmic bound holds.

Notice that this bound is uniform w.r.t.
 the slope, and hence any efficient predictor for SLC has a uniform polylog mistake bound over this entire class of sequences. The proof relies on the theory of continued fractions and standard words, which we briefly recall.

G.3.1 Characteristic Blocks

Every irrational slope has a unique continued fraction expansion with partial quotients. The characteristic blocks are defined by the two initial blocks and the recurrence: each block is the previous block repeated as many times as the current partial quotient, followed by the block before it. We denote the block lengths accordingly. Obviously, the lengths satisfy the matching recurrence. The key property we need is that each block is a prefix of the characteristic word, for all indices beyond the first (see [12], Theorem 9.1.10). We also have the obvious lower bound on the block lengths.

G.3.2 Proof of Proposition G.1

Proof: If the slope is rational then the word is periodic and the proposition is obviously true. From now on, we assume the slope is irrational. We construct a sequence of SLPs, the k-th producing the k-th characteristic block, where each SLP extends the previous one by adding fresh nonterminals. Because of this cumulative construction, all nonterminals of the earlier SLPs are already present inside the later ones and can be referenced freely when building them.

We use the fact that for any nonterminal producing a word and any positive integer power, an SLP for that power can be built by repeated squaring, introducing at most a logarithmic number of fresh edges. Indeed, writing the exponent in binary, the binary method computes the successive squares using one squaring rule per bit, then accumulates the product over the remaining set bits using at most one concatenation rule per set bit. Each rule is binary (2 edges), giving at most the claimed number of fresh edges in total. For exponent 1, no fresh edges are needed.

Construction and size bound. The initial SLP is trivial, with the first block as value and constant size. For the second block: if the first partial quotient is 1, it coincides with a terminal; otherwise, we build the required power by repeated squaring and concatenate the terminal, giving the stated size. For later blocks, given the previous SLP (which already contains the one before it), we build the next block as follows. We raise the root nonterminal of the previous SLP to the power given by the current partial quotient by repeated squaring (logarithmically many fresh edges), then concatenate with the root of the SLP before it via one fresh binary rule (2 edges).
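The repeated-squaring step described above can be sketched in Python. A minimal illustration with simplified bookkeeping; the rule representation and symbol names are ours.

```python
import itertools

def power_rules(base_sym, p, rules, counter):
    """Add binary SLP rules deriving the p-th power of base_sym by the
    binary method: a squaring chain plus one concatenation per extra
    set bit of p. Returns the root symbol."""
    squares = [base_sym]
    for _ in range(p.bit_length() - 1):
        s = f"S{next(counter)}"
        rules[s] = (squares[-1], squares[-1])   # squaring rule
        squares.append(s)
    acc = None
    for i in range(p.bit_length()):
        if (p >> i) & 1:                        # accumulate set bits
            if acc is None:
                acc = squares[i]
            else:
                t = f"S{next(counter)}"
                rules[t] = (acc, squares[i])
                acc = t
    return acc

def expand(sym, rules):
    return "".join(expand(s, rules) for s in rules[sym]) if sym in rules else sym

rules, counter = {}, itertools.count()
root = power_rules("a", 13, rules, counter)
assert expand(root, rules) == "a" * 13
# 13 = 0b1101: 3 squarings + 2 concatenations = 5 <= 2 * floor(log2 13)
assert len(rules) == 5
```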
Since the nonterminals of the earlier SLP already reside inside the current one, the only new edges are from the squaring chain and the final concatenation, giving the stated recurrence. It follows that the size of the k-th SLP is bounded by the corresponding sum over the partial quotients.

We bound each term. By Equation 104, the logarithmic terms telescope. For the linear term, the bound follows from the length recurrence. Iterating, the bound holds for all indices. Hence, substituting Equation 108 and Equation 109 into Equation 107 yields the size bound.

Handling arbitrary prefixes. Given a prefix length, let k be the largest index whose block length does not exceed it. We first observe that the k-th block is a prefix of the next one: indeed, the recurrence begins by repeating it. It follows that the k-th block is a prefix of the characteristic word. Since the block lengths at least double every two steps, the index k is logarithmic in the prefix length. Now take the smallest power of the k-th block that covers the prefix; the required exponent is bounded in terms of the next partial quotient. The SLP for this power extends the k-th SLP by at most the repeated-squaring chain's fresh edges, so its size is at most the stated bound. By Equation 110 and the bounds on the index and the exponent, the covering word has an SLP of polylogarithmic size. Finally, Proposition 4.2 gives the desired bound on the prefix.
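The characteristic-block recurrence can be checked numerically. A sketch, assuming the standard convention (as in [12], Chapter 9): the blocks satisfy s_0 = 0, s_1 = 0^(a_1 - 1) 1, s_k = s_{k-1}^{a_k} s_{k-2}, and the characteristic word of slope alpha is c(n) = floor((n+1) alpha) - floor(n alpha) for n >= 1; the paper's exact indexing may differ.

```python
import math

def characteristic_word(alpha, length):
    """c(n) = floor((n+1)*alpha) - floor(n*alpha), n = 1..length."""
    return "".join(str(math.floor((n + 1) * alpha) - math.floor(n * alpha))
                   for n in range(1, length + 1))

def characteristic_blocks(a, k):
    """s_0 = 0, s_1 = 0^(a_1 - 1) 1, s_k = s_{k-1}^{a_k} s_{k-2}."""
    s = ["0", "0" * (a[0] - 1) + "1"]
    for i in range(2, k + 1):
        s.append(s[-1] * a[i - 1] + s[-2])
    return s

# Slope (3 - sqrt 5)/2 = [0; 2, 1, 1, 1, ...] gives the Fibonacci word.
alpha = (3 - math.sqrt(5)) / 2
blocks = characteristic_blocks([2, 1, 1, 1, 1], 5)
word = characteristic_word(alpha, len(blocks[-1]))
for prev, nxt in zip(blocks[1:], blocks[2:]):
    assert nxt.startswith(prev)      # each block is a prefix of the next
assert blocks[-1] == word == "0100101001001"
```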