A New Algorithm for Building Alphabetic Minimax Trees
We show how to build an alphabetic minimax tree for a sequence $W = w_1, \ldots, w_n$ of real weights in $O(n d \log \log n)$ time, where $d$ is the number of distinct integers $\lceil w_i \rceil$. We apply this algorithm to building an alphabetic pre…
Authors: Travis Gagie
Fundamenta Informaticae (in preparation), 1–8, IOS Press

Travis Gagie, Department of Computer Science, University of Eastern Piedmont, 15100 Alessandria (AL), Italy, travis@mfn.unipmn.it

Address for correspondence: Dipartimento di Informatica, via Bellini 25g, 15100 Alessandria (AL), Italy. Supported by Italy-Israel FIRB grant "Pattern Discovery Algorithms in Discrete Structures, with Applications to Bioinformatics".

Abstract. We show how to build an alphabetic minimax tree for a sequence $W = w_1, \ldots, w_n$ of real weights in $O(nd \log \log n)$ time, where $d$ is the number of distinct integers $\lceil w_i \rceil$. We apply this algorithm to building an alphabetic prefix code given a sample.

Keywords: data structures, alphabetic minimax trees

1. Introduction

For the alphabetic minimax tree problem, we are given a sequence $W = w_1, \ldots, w_n$ of weights and an integer $t \geq 2$ and asked to find an ordered $t$-ary tree on $n$ leaves such that, if the depths of the leaves from left to right are $\ell_1, \ldots, \ell_n$, then $\max_{1 \leq i \leq n} \{w_i + \ell_i\}$ is minimized. Such a tree is called a $t$-ary alphabetic minimax tree for $W$ and the minimum maximum sum, $\alpha(W)$, is called the $t$-ary alphabetic minimax cost of $W$.

Hu, Kleitman and Tamaki [7] gave an $O(n \log n)$-time algorithm for this problem when $t$ is 2 or 3. Under the assumption that the tree must be strictly $t$-ary, Kirkpatrick and Klawe [8] gave $O(n)$-time and $O(n \log n)$-time algorithms for integer and real weights, respectively, which they applied to bounding circuit fan-out. Coppersmith, Klawe and Pippenger [3] modified Kirkpatrick and Klawe's algorithms to work without the assumption, and again applied them to bounding circuit fan-out. Kirkpatrick and Przytycka [9] gave an $O(\log n)$-time, $O(n / \log n)$-processor algorithm for integer weights in the CREW PRAM model. Finally, Evans and Kirkpatrick [5] gave an $O(n)$-time algorithm for the problem with integer weights in which we want to find a binary tree that minimizes the maximum over $i$ of the sum of the $i$th weight and the $i$th node's (rather than leaf's) depth, and applied it to restructuring ordered binary trees.

In this paper, we give an $O(nd \log \log n)$-time algorithm for the original problem with real weights, where $d$ is the number of distinct integers $\lceil w_i \rceil$. Our algorithm can be adapted to work for any $t$ but, to simplify the presentation, we assume $t = 2$ and write $\log$ to mean $\log_2$.

2. Motivation

Our interest in alphabetic minimax trees stems from a problem concerning alphabetic prefix codes, i.e., prefix codes in which the lexicographic order of the codewords is the same as that of the characters. Suppose we want to build an alphabetic prefix code with which to compress a file (or, equivalently, a leaf-oriented binary search tree with which to sort it), but we are given only a sample of its characters. Let $P = p_1, \ldots, p_n$ be the normalized distribution of characters in the file, let $Q = q_1, \ldots, q_n$ be the normalized distribution of characters in the sample and suppose our codewords are $C = c_1, \ldots, c_n$. An ideal code for $Q$ assigns the $i$th character a codeword of length $\log(1/q_i)$ (which may not be an integer), and the average codeword's length using such a code is $H(P) + D(P \| Q)$, where $H(P) = \sum_i p_i \log(1/p_i)$ is the entropy of $P$ and $D(P \| Q) = \sum_i p_i \log(p_i/q_i)$ is the relative entropy between $P$ and $Q$. Consider the best worst-case bound we can achieve on how much the average codeword's length exceeds $H(P) + D(P \| Q)$.
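As a quick numerical check of the claim that the ideal code for $Q$ has average length $H(P) + D(P \| Q)$ under $P$, the following sketch evaluates both sides for one pair of distributions. The function names and the example distributions are assumptions made for illustration only; they do not come from the paper.

```python
import math


def entropy(p):
    """H(P) = sum_i p_i * log2(1 / p_i), skipping zero-probability characters."""
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)


def relative_entropy(p, q):
    """D(P || Q) = sum_i p_i * log2(p_i / q_i); needs q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ideal_average_length(p, q):
    """Average length under P when character i gets the ideal length log2(1 / q_i)."""
    return sum(pi * math.log2(1 / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical file distribution P and sample distribution Q (made up for this check).
P = [0.5, 0.25, 0.125, 0.125]
Q = [0.25, 0.25, 0.25, 0.25]

lhs = ideal_average_length(P, Q)
rhs = entropy(P) + relative_entropy(P, Q)
print(lhs, rhs)
```

Here $H(P) = 1.75$ and $D(P \| Q) = 0.25$, so both sides come to 2 bits, the length of every ideal codeword for the uniform sample distribution.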
As long as $q_i > 0$ whenever $p_i > 0$, the average codeword's length is

$$\sum_i p_i |c_i| = \sum_i p_i \bigl(\log(1/p_i) + \log(p_i/q_i) + \log q_i + |c_i|\bigr) = H(P) + D(P \| Q) + \sum_i p_i (\log q_i + |c_i|)$$

(if $q_i = 0$ but $p_i > 0$ for some $i$, then our formula is undefined). Notice each $|c_i|$ is the length of the $i$th branch in the trie for $C$. Therefore, the best bound we can achieve is

$$\min_C \max_P \Bigl\{ \sum_i p_i (\log q_i + |c_i|) \Bigr\} = \min_C \max_i \{\log q_i + |c_i|\} = \alpha(\log q_1, \ldots, \log q_n),$$

and we achieve it when the trie for $C$ is an alphabetic minimax tree for $\log q_1, \ldots, \log q_n$.

In several reasonable special cases, we can build the alphabetic minimax tree for $\log q_1, \ldots, \log q_n$ in $o(n \log n)$ time. For example, if each pair $q_i$ and $q_j$ differ by at most a multiplicative constant (a case Klawe and Mumey [10] considered when building optimal alphabetic prefix codes), then each pair $\log q_i$ and $\log q_j$ differ by at most an additive constant, so the number of distinct integers $\lceil \log q_i \rceil$ is constant and our algorithm runs in $O(n \log \log n)$ time.

3. Algorithm

Let $B = b_1, \ldots, b_n$ be the values $w_1 - \lfloor w_1 \rfloor, \ldots, w_n - \lfloor w_n \rfloor$ sorted into nondecreasing order. Kirkpatrick and Klawe showed that, if $i$ is the smallest index such that

$$\alpha(\lceil w_1 - b_i \rceil, \ldots, \lceil w_n - b_i \rceil) = \alpha(\lceil w_1 - b_n \rceil, \ldots, \lceil w_n - b_n \rceil),$$

then $\alpha(W) = \alpha(\lceil w_1 - b_i \rceil, \ldots, \lceil w_n - b_i \rceil) + b_i$ and any alphabetic minimax tree for $\lceil w_1 - b_i \rceil, \ldots, \lceil w_n - b_i \rceil$ is an alphabetic minimax tree for $W$. Their $O(n \log n)$-time algorithm for real weights is a simple combination of this fact, binary search and their $O(n)$-time algorithm for integer weights: they compute and sort $w_1 - \lfloor w_1 \rfloor, \ldots, w_n - \lfloor w_n \rfloor$ to obtain $B$, compute an alphabetic minimax tree for the sequence $\lceil w_1 - b_n \rceil, \ldots, \lceil w_n - b_n \rceil$ of integer weights, and use binary search to find $b_i$; for each step of the binary search, if the candidate value to be tested is $b_j$, then they build an alphabetic minimax tree for the sequence $\lceil w_1 - b_j \rceil, \ldots, \lceil w_n - b_j \rceil$ of integer weights and compare $\alpha(\lceil w_1 - b_j \rceil, \ldots, \lceil w_n - b_j \rceil)$ to $\alpha(\lceil w_1 - b_n \rceil, \ldots, \lceil w_n - b_n \rceil)$.

Our idea is to avoid sorting $w_1 - \lfloor w_1 \rfloor, \ldots, w_n - \lfloor w_n \rfloor$ and then building an alphabetic minimax tree from scratch for each step of the binary search. To avoid sorting, we use a technique similar to the one Klawe and Mumey described for generalized selection; to avoid building the trees from scratch, we use a data structure based on Kirkpatrick and Przytycka's level tree data structure for $W$. Our data structure, which we describe in Section 4, stores $W$ and $X = x_1, \ldots, x_n = 0, \ldots, 0$ and performs any sequence of $O(n)$ of the following operations in $O(nd \log \log n)$ time:

set($i$): set $x_i$ to 1;
undo: undo the last set operation;
cost: return $\alpha(\lceil w_1 \rceil - x_1, \ldots, \lceil w_n \rceil - x_n)$.

We first find $b_n = \max_i \{w_i - \lfloor w_i \rfloor\}$ and then, using Kirkpatrick and Klawe's $O(n)$-time algorithm, $\alpha(\lceil w_1 - b_n \rceil, \ldots, \lceil w_n - b_n \rceil)$. We build the multiset $S_0 = \{\langle w_i - \lfloor w_i \rfloor, i \rangle\}$ and use binary search to find the smallest value $w_i - \lfloor w_i \rfloor$ such that

$$\alpha(\lceil w_1 - (w_i - \lfloor w_i \rfloor) \rceil, \ldots, \lceil w_n - (w_i - \lfloor w_i \rfloor) \rceil) = \alpha(\lceil w_1 - b_n \rceil, \ldots, \lceil w_n - b_n \rceil).$$

Once we have $w_i - \lfloor w_i \rfloor$, we use Kirkpatrick and Klawe's $O(n)$-time algorithm again to build an alphabetic minimax tree for the sequence $\lceil w_1 - (w_i - \lfloor w_i \rfloor) \rceil, \ldots, \lceil w_n - (w_i - \lfloor w_i \rfloor) \rceil$ of integer weights.

For the $k$th step of the binary search, we use Blum et al.'s algorithm [2] to find the median $m_k$ of the first components in $S_k$; we divide $S_k$ into

$$S'_k = \{\langle w_i - \lfloor w_i \rfloor, i \rangle : w_i - \lfloor w_i \rfloor < m_k\},$$
$$S''_k = \{\langle w_i - \lfloor w_i \rfloor, i \rangle : w_i - \lfloor w_i \rfloor = m_k\},$$
$$S'''_k = \{\langle w_i - \lfloor w_i \rfloor, i \rangle : w_i - \lfloor w_i \rfloor > m_k\};$$

for each second component $j$ in $S'_k$ or $S''_k$ with $w_j$ not an integer, we set $x_j$ to 1; we compare $\alpha(\lceil w_1 \rceil - x_1, \ldots, \lceil w_n \rceil - x_n)$ to $\alpha(\lceil w_1 - b_n \rceil, \ldots, \lceil w_n - b_n \rceil)$; if it is equal, then $m_k$ is still a candidate, so we undo all the set operations we performed in this step and recurse on $S'_k$; if it is greater, then $m_k$ is too small, so we leave all the set operations and recurse on $S'''_k$. The last candidate considered during the search is the value $w_i - \lfloor w_i \rfloor$ we want.

For the $k$th step of the search, we spend $O(n / 2^k)$ time finding the median $m_k$ and dividing $S_k$ into $S'_k$, $S''_k$ and $S'''_k$, and perform $O(n / 2^k)$ operations on the data structure. Summing over the steps, we use $O(n)$ time to find all the medians and divide all the sets and $O(nd \log \log n)$ time to perform all the operations on the data structure.

Lemma 3.1. Given a data structure that performs any sequence of $O(n)$ set, undo and cost operations in $O(nd \log \log n)$ time, we can build an alphabetic minimax tree for $W$ in $O(nd \log \log n)$ time.

4. Data structure

If we define the weight of the $i$th leaf of an alphabetic minimax tree for $W$ to be $w_i$, and the weight of each internal node to be the maximum of its children's weights plus 1, then the weight of the root is $\alpha(W)$. We would like to use this property to recompute $\alpha(\lceil w_1 \rceil - x_1, \ldots, \lceil w_n \rceil - x_n)$ efficiently after updating $X$, but even small changes can greatly affect the shape of the alphabetic minimax tree: e.g., suppose $n = 2^k + 1$, each $w_i = k - 1/2$ and each $x_i = 0$; if we set $x_1$ and $x_2$ to 1 then, in the unique alphabetic minimax tree for $\lceil w_1 \rceil - x_1, \ldots, \lceil w_n \rceil - x_n = k - 1, k - 1, k, \ldots, k$, every even-numbered leaf except the second is a left-child; but if we instead set $x_{n-1}$ and $x_n$ to 1 then, in the unique alphabetic minimax tree for $\lceil w_1 \rceil - x_1, \ldots, \lceil w_n \rceil - x_n = k, \ldots, k, k - 1, k - 1$, every even-numbered leaf except the $(n - 1)$st is a right-child.

Fortunately for us, Kirkpatrick and Przytycka defined a data structure, called a level tree, that represents an alphabetic minimax tree but whose shape is less volatile. Let $Y = y_1, \ldots, y_n = \lceil w_1 \rceil - x_1, \ldots, \lceil w_n \rceil - x_n$, and consider their definition of the level tree for $Y$ (we have changed their notation slightly to match our own):

“We start our description of the level tree with the following geometric construction (see Figure 1): Represent the sequence of weights $Y$ by a polygonal line; for every $i = 1, \ldots, n$ draw on the plane the point $(i, y_i)$, and for every $i = 1, \ldots, n - 1$ connect the points $(i, y_i)$ and $(i + 1, y_{i+1})$; for every $i$ such that $y_i > y_{i+1}$ (resp., $y_i > y_{i-1}$) draw a horizontal line going from $(i, y_i)$ to its right (resp., left) until it hits the polygonal line. The intervals defined in such a way are called the level intervals. We also consider the interval $[(0, \infty), (n + 1, \infty)]$ and the degenerate intervals $[(i, y_i), (i, y_i)]$ as level intervals. Let $e$ be a level interval. Note that at least one of $e$'s endpoints is equal to $(i, y_i)$ for some index $i$. . . . We define the level of a level interval to be equal to [the second component of points belonging to that interval].
Note that an alphabetic minimax tree can be embedded in the plane in such a way that the root of the tree belongs to the level interval $[(0, \infty), (n + 1, \infty)]$ and that internal nodes whose weights are equal to the weight of one of the leaves belong to the horizontal line through this leaf. Furthermore, if there is a tree edge cutting a level interval then adding a node subdividing this edge to the alphabetic minimax tree does not increase the weight of the root. By this observation we can consider alphabetic minimax trees which can be embedded in the plane in such a way that all edges intersect level intervals only at endpoints (see Figure 2).

The level tree for $Y$ is the ordered tree whose nodes are in one-to-one correspondence with the level intervals defined above. The parent of a node $v$ is the internal node which corresponds to the closest level interval which lies above the level interval corresponding to $v$. The left-to-right order of the children of an internal node corresponds to the left-to-right order of the corresponding level intervals on the plane (see Figure 3).

For every node $u$ of a level tree we define load($u$) to be equal to the number of nodes of the constructed alphabetic minimax tree which belong to the level interval corresponding to $u$ (assuming the above embedding). If $u$ is a leaf then load($u$) $= 1$. Assume that $u$ is an internal node and let $u_1, \ldots, u_k$ be the children of $u$. Let $\Delta_u$ denote the minimum of the value $\lceil \log n \rceil$ and the difference between the level of the level interval corresponding to node $u$ and the level of the intervals corresponding to its children. It is easy to confirm that

$$\mathrm{load}(u) = \left\lceil \frac{\mathrm{load}(u_1) + \cdots + \mathrm{load}(u_k)}{2^{\Delta_u}} \right\rceil.$$”

Notice that, if $u$ is the root of the level tree and $u_1, \ldots, u_k$ are its children, then Kirkpatrick and Przytycka embed load($u_1$) $+ \cdots +$ load($u_k$) nodes of the alphabetic minimax tree into the intervals corresponding to $u_1, \ldots, u_k$. It follows that $\alpha(Y)$ is the level of the intervals corresponding to $u_1, \ldots, u_k$ plus $\lceil \log(\mathrm{load}(u_1) + \cdots + \mathrm{load}(u_k)) \rceil$; the ceiling is needed because $\alpha(Y)$ is an integer when the weights in $Y$ are integers.

It is straightforward to build the level tree for $Y$ in $O(n)$ time, by first building an alphabetic minimax tree for it. Moreover, if we set a bit $x_i$ to 1 and thus decrement $y_i$, then the shape of the level tree for $Y$ and the loads change only in the vicinity of the $i$th leaf and along the path from it to the root. The number of levels is the number of distinct weights in $Y$ plus one, so the length of that path is $O(d)$ (recall $d$ is the number of distinct integers $\lceil w_i \rceil$). Unfortunately, the level tree can have very high degree, so we may not be able, e.g., to navigate very quickly from the root to a leaf.

We store a pointer to the root of the level tree and an array of pointers to its leaves, and pointers from each node to its parent. At each internal node, we store its children in a doubly-linked list (so each child points to the siblings immediately to its left and right). It is not hard to verify that, with these pointers, we can implement a cost operation in $O(1)$ time and reach all the nodes that need to be updated for a set operation in $O(d)$ time.
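The pointer layout just described can be sketched as follows, assuming Python object references in place of raw pointers. The class `LevelNode`, its field names and all helper functions are hypothetical names introduced for illustration. `internal_load` transcribes the quoted load recurrence, read here with a ceiling and with all children of a node at one common level, both of which are assumptions. The sketch does not construct a level tree from a weight sequence, and the toy tree at the end is chosen only to exercise the pointers.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(eq=False, repr=False)
class LevelNode:
    """One level-tree node; every field name here is an illustrative assumption."""
    level: float                               # level of the node's level interval
    load: int = 1                              # leaves have load 1
    parent: Optional["LevelNode"] = None
    left: Optional["LevelNode"] = None         # sibling immediately to the left
    right: Optional["LevelNode"] = None        # sibling immediately to the right
    first_child: Optional["LevelNode"] = None
    last_child: Optional["LevelNode"] = None


def append_child(parent: LevelNode, child: LevelNode) -> None:
    """Attach child as the rightmost child of parent in O(1) time."""
    child.parent, child.left, child.right = parent, parent.last_child, None
    if parent.last_child is None:
        parent.first_child = child
    else:
        parent.last_child.right = child
    parent.last_child = child


def detach(child: LevelNode) -> None:
    """Splice a non-root child out of its parent's child list in O(1) time."""
    parent = child.parent
    if child.left is None:
        parent.first_child = child.right
    else:
        child.left.right = child.right
    if child.right is None:
        parent.last_child = child.left
    else:
        child.right.left = child.left
    child.parent = child.left = child.right = None


def children(node: LevelNode) -> Iterator[LevelNode]:
    """Iterate over a node's children from left to right."""
    child = node.first_child
    while child is not None:
        yield child
        child = child.right


def path_to_root(leaf: LevelNode) -> List[LevelNode]:
    """Nodes from a leaf up to the root, following parent pointers only."""
    path = [leaf]
    while path[-1].parent is not None:
        path.append(path[-1].parent)
    return path


def internal_load(child_loads: List[int], level_gap: int, n: int) -> int:
    """ceil(sum of child loads / 2**Delta_u), with Delta_u = min(ceil(log2 n), level_gap)."""
    delta = min((n - 1).bit_length(), level_gap)   # (n - 1).bit_length() == ceil(log2 n)
    return -(-sum(child_loads) // 2 ** delta)      # integer ceiling division


# Toy tree (not derived from a real weight sequence): root -> u -> three leaves.
root = LevelNode(level=float("inf"))
u = LevelNode(level=2)
append_child(root, u)
leaves = [LevelNode(level=1) for _ in range(3)]    # the array of leaf pointers
for leaf in leaves:
    append_child(u, leaf)
u.load = internal_load([c.load for c in children(u)], level_gap=1, n=3)
```

Keeping both `left` and `right` sibling references is what makes `detach` a constant-time splice, and the leaf array plus parent references give the leaf-to-root walk that a set operation needs without ever descending from the root through a high-degree node.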
We cannot implement set operations in $O(d)$ worst-case time, however, because of the following case (see Figure 4): suppose the siblings $u_1$ and $u_2$ immediately to the left and right of the $i$th leaf $v$ are internal nodes whose children belong to level intervals with level $y_i - 1$; if we set $x_i$ to 1 and thus decrement $y_i$ and $v$'s level, then $u_1$'s former children, $v$ and $u_2$'s former children will all have the same parent (either a new node $u$ if $v$ had siblings other than $u_1$ and $u_2$, as shown in Figure 4, or their former parent if it did not). To deal with this case, we store all the internal nodes of the level tree in a union-find data structure, due to Mannila and Ukkonen [12], that supports a deunion operation. Rather than adjusting all of $u_1$'s and $u_2$'s former children to point to their new parent, we simply perform a union operation on $u_1$ and $u_2$.

Figure 1. The level intervals for 4, 5, 2, 2, 2, 1, 2, 3, 6, 4.
Figure 2. An alphabetic minimax tree for 4, 5, 2, 2, 2, 1, 2, 3, 6, 4.
Figure 3. The level tree for 4, 5, 2, 2, 2, 1, 2, 3, 6, 4, with internal nodes' loads shown.
Figure 4. Decrementing a node $v$'s level can force us to combine its adjacent siblings $u_1$ and $u_2$ into a new node $u$.

Whenever we follow a pointer to an internal node, we perform a find operation on it and, if necessary, update the pointer. Each cost operation on the level tree takes one find operation on the union-find data structure and $O(1)$ extra time, and each set operation takes at most one union operation, $O(d)$ find operations and $O(d)$ extra time. Whenever we make a modification to the level tree other than an operation on the union-find data structure, we push it onto a stack.
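To make the union, find and deunion operations concrete, here is a minimal sketch of a union-find structure whose unions can be reversed in last-in, first-out order. It is a simplified stand-in and not Mannila and Ukkonen's structure: it uses union by size without path compression, so a find can take $O(\log n)$ time in the worst case, which is weaker than the bound the analysis in the text relies on. The class and method names are assumptions.

```python
class RollbackUnionFind:
    """Union by size with an undo log (a stand-in, not Mannila and Ukkonen's structure)."""

    def __init__(self, n):
        self.parent = list(range(n))   # parent[x] == x means x is a root
        self.size = [1] * n
        self.log = []                  # one entry per union, newest last

    def find(self, x):
        # No path compression, so each union changes one pointer and is easy to reverse.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            self.log.append(None)      # record a no-op so unions and deunions stay paired
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra            # hang the smaller tree below the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.log.append((ra, rb))
        return ra

    def deunion(self):
        """Undo the most recent union that has not been undone yet."""
        entry = self.log.pop()
        if entry is not None:
            ra, rb = entry
            self.parent[rb] = rb
            self.size[ra] -= self.size[rb]


uf = RollbackUnionFind(4)
uf.union(0, 1)
uf.union(2, 3)
uf.deunion()                           # reverses union(2, 3) only
print(uf.find(0) == uf.find(1), uf.find(2) == uf.find(3))
```

In the setting of the text, the elements would be the internal nodes of the level tree, a set operation that merges $u_1$ and $u_2$ would call `union`, and the matching undo would call `deunion`.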
To perform an undo operation on the level tree, we pop and reverse all the modifications we made since starting the last set operation and, if necessary, perform a deunion operation. Any sequence of $O(n)$ operations on the level tree takes $O(nd)$ operations on the union-find data structure, which Mannila and Ukkonen showed take a total of $O(nd \log \log n)$ time.

Lemma 4.1. In $O(n)$ time we can build a data structure that performs any sequence of $O(n)$ set, undo and cost operations in $O(nd \log \log n)$ time.

5. Conclusion

Combining Lemmas 3.1 and 4.1, we have the following theorem:

Theorem 5.1. We can build an alphabetic minimax tree for $W$ in $O(nd \log \log n)$ time.

Since $d$ could be as small as 1 or as large as $n$, our theorem is incomparable to previous results. We can build the tree in $O(n \min(d \log \log n, \log n))$ time, of course, by first finding $d$ in $O(n)$ time and then, depending on whether $d \log \log n < \log n$, using either our algorithm or one of the $O(n \log n)$-time algorithms mentioned in Section 1.

In closing, we note there has recently been interesting work involving unordered minimax trees. Baer [1] observed that the problem of building a prefix code with minimum maximum pointwise redundancy (originally posed and solved by Drmota and Szpankowski [4]) can also be solved with a Huffman-like algorithm, due to Golumbic [6], for building unordered minimax trees. Given a probability distribution over $n$ characters, Drmota and Szpankowski's algorithm takes $O(n \log n)$ time, or $O(n)$ time if the probabilities are sorted by the fractional parts of their logarithms; we conjecture that, by using Blum et al.'s algorithm as we did in this paper, it can be made to run in $O(n)$ time even when the probabilities are unsorted.
Like Huffman's algorithm (see [11]), Golumbic's algorithm takes $O(n \log n)$ time, or $O(n)$ time if the probabilities are sorted by their values.

References

[1] Baer, M. B.: Tight bounds on minimum maximum pointwise redundancy, Proceedings of the IEEE International Symposium on Information Theory, 2008.
[2] Blum, M., Floyd, R. W., Pratt, V. R., Rivest, R. L., Tarjan, R. E.: Time bounds for selection, Journal of Computer and System Sciences, 7(4), 1973, 448–461.
[3] Coppersmith, D., Klawe, M. M., Pippenger, N.: Alphabetic minimax trees of degree at most t, SIAM Journal on Computing, 15(1), 1986, 189–192.
[4] Drmota, M., Szpankowski, W.: Precise minimax redundancy and regret, IEEE Transactions on Information Theory, 50(11), 2004, 2686–2707.
[5] Evans, W. S., Kirkpatrick, D. G.: Restructuring ordered binary trees, Journal of Algorithms, 50(2), 2004, 168–193.
[6] Golumbic, M. C.: Combinatorial merging, IEEE Transactions on Computers, 25(11), 1976, 1164–1167.
[7] Hu, T. C., Kleitman, D. J., Tamaki, J.: Binary trees optimum under various criteria, SIAM Journal on Applied Mathematics, 37(2), 1979, 246–256.
[8] Kirkpatrick, D. G., Klawe, M. M.: Alphabetic minimax trees, SIAM Journal on Computing, 14(3), 1985, 514–526.
[9] Kirkpatrick, D. G., Przytycka, T. M.: An optimal parallel minimax tree algorithm, Proceedings of the 2nd Symposium on Parallel and Distributed Processing, 1990.
[10] Klawe, M. M., Mumey, B.: Upper and lower bounds on constructing alphabetic binary trees, SIAM Journal on Discrete Mathematics, 8(4), 1995, 638–651.
[11] van Leeuwen, J.: On the construction of Huffman trees, Proceedings of the 3rd International Colloquium on Automata, Languages and Programming, 1976.
[12] Mannila, H., Ukkonen, E.: The set union problem with backtracking, Proceedings of the 13th International Colloquium on Automata, Languages and Programming, 1986.