Minimax Trees in Linear Time


Authors: Pawel Gawrychowski, Travis Gagie

Pawel Gawrychowski¹ and Travis Gagie²,⋆

¹ Institute of Computer Science, University of Wroclaw, Poland
gawry1@gmail.com
² Research Group in Genome Informatics, University of Bielefeld, Germany
travis.gagie@gmail.com

Abstract. A minimax tree is similar to a Huffman tree except that, instead of minimizing the weighted average of the leaves' depths, it minimizes the maximum of any leaf's weight plus its depth. Golumbic (1976) introduced minimax trees and gave a Huffman-like, O(n log n)-time algorithm for building them. Drmota and Szpankowski (2002) gave another O(n log n)-time algorithm, which checks the Kraft Inequality in each step of a binary search. In this paper we show how Drmota and Szpankowski's algorithm can be made to run in linear time on a word RAM with Ω(log n)-bit words. We also discuss how our solution applies to problems in data compression, group testing and circuit design.

1 Introduction

In a minimax tree for a multiset W = {w_1, ..., w_n} of weights, each leaf has a weight w_i, each internal node has weight equal to the maximum of its children's weights plus 1, and the weight of the root is as small as possible. In other words, if ℓ_i is the depth of the leaf with weight w_i, then max_i {w_i + ℓ_i} is minimized. The weight of the root is called the minimax cost of W, denoted M(W). Golumbic [17] showed that if we modify Huffman's algorithm [20] to repeatedly replace the two nodes with smallest weights w_i and w_j by a node with weight max(w_i, w_j) + 1, instead of w_i + w_j, then it builds a minimax tree instead of a Huffman tree. Like Huffman's algorithm, it takes O(n log n) time and can build trees of any degree.
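Golumbic's merging rule is easy to state in code. The following sketch (ours, not the authors'; the function name `minimax_cost` is our own) computes the minimax cost with a binary heap in O(n log n) time:

```python
import heapq

def minimax_cost(weights):
    """Golumbic-style algorithm: repeatedly merge the two smallest
    weights w_i, w_j into a single weight max(w_i, w_j) + 1.
    Returns the minimax cost M(W), i.e. the weight of the root."""
    heap = list(weights)
    heapq.heapify(heap)
    while len(heap) > 1:
        wi = heapq.heappop(heap)
        wj = heapq.heappop(heap)
        heapq.heappush(heap, max(wi, wj) + 1)  # instead of wi + wj
    return heap[0]
```

Replacing `max(wi, wj) + 1` with `wi + wj` would recover Huffman's algorithm; keeping pointers to the merged nodes, instead of bare weights, would yield the tree itself rather than only its cost.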
Our results in this paper also generalize to higher degrees and larger code alphabets but, for the sake of simplicity, we henceforth consider only binary trees and alphabets. Golumbic, Parker [29] and Hoover, Klawe and Pippenger [18] showed how to use Golumbic's algorithm to restrict circuits' fan-in and fan-out without greatly increasing their sizes or depths. Drmota and Szpankowski [9, 10] pointed out that, if P = p_1, ..., p_n is a probability distribution and each w_i = log(1/p_i), then a minimax tree for W is the code-tree for a prefix code with minimum maximum pointwise redundancy with respect to P. (As we are considering only binary trees in this paper, by log we always mean log_2.) They gave another O(n log n)-time algorithm for building minimax trees and, by analyzing it, proved bounds on the redundancy of arithmetic coding, which Baer [3] recently improved by analyzing Golumbic's algorithm. Drmota and Szpankowski start with a Shannon code [30] for P, in which the codeword for the i-th character has length ⌈log(1/p_i)⌉, for each i; they sort the logarithms by their fractional parts, i.e., log(1/p_1) − ⌊log(1/p_1)⌋, ..., log(1/p_n) − ⌊log(1/p_n)⌋; and they use binary search to find the largest value x such that ⌈log(1/p_1) − x⌉, ..., ⌈log(1/p_n) − x⌉ obey the Kraft Inequality [27].

⋆ This paper was written while the second author was at the University of Eastern Piedmont, Italy, supported by Italy-Israel FIRB Project "Pattern Discovery Algorithms in Discrete Structures, with Applications to Bioinformatics".
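Drmota and Szpankowski's approach can be illustrated with a short sketch (ours; `ds_code_lengths` is a hypothetical name, and for simplicity it scans the candidate shifts linearly rather than binary-searching, and ignores floating-point roundoff):

```python
import math

def ds_code_lengths(p):
    """Shorten the Shannon codeword lengths ceil(log2(1/p_i)) by the
    largest shift x (zero or a fractional part of some log2(1/p_i))
    such that the lengths ceil(log2(1/p_i) - x) still satisfy the
    Kraft Inequality."""
    logs = [math.log2(1.0 / pi) for pi in p]
    # candidate shifts: 0 and the fractional parts of the logarithms
    candidates = sorted({0.0} | {l - math.floor(l) for l in logs})
    best = [math.ceil(l) for l in logs]  # plain Shannon code lengths
    for x in candidates:  # feasible shifts form a prefix, so keep the last
        lengths = [math.ceil(l - x) for l in logs]
        if sum(2.0 ** -e for e in lengths) <= 1.0:  # Kraft Inequality
            best = lengths
    return best
```

Since increasing x can only shorten lengths, and shorter lengths can only increase the Kraft sum, the feasible shifts form a prefix of the sorted candidates; the scan therefore ends holding the lengths for the largest feasible x.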
In a previous paper [14] (see also [15, 21]) we noted that minimax trees built with Golumbic's algorithm have the same Sibling Property [11, 16] as Huffman trees, and turned the Faller-Gallager-Knuth algorithm [26] for dynamic Huffman coding into an algorithm for dynamic Shannon coding. Intriguingly, although static Huffman coding is optimal and static Shannon coding is not, dynamic Shannon coding has a better worst-case bound than dynamic Huffman coding does.

Hu, Kleitman and Tamaki [19] gave an O(n log n)-time algorithm for building alphabetic minimax trees, in which the leaves' weights, from left to right, must be in the given order. Kirkpatrick and Klawe [23] and Coppersmith, Klawe and Pippenger [6] gave an algorithm (or, more precisely, two algorithms that are equivalent when trees are binary) that builds an alphabetic minimax tree for integer weights in O(n) time, and showed how to use it to restrict circuits' fan-in and fan-out without greatly increasing their sizes or depths and without changing the numbers of edge crossings (and, thus, preserving planarity). Kirkpatrick and Klawe also showed how to combine their algorithm with binary search in order to build alphabetic minimax trees for real weights in O(n log n) time. We note that, if their algorithm for integer weights is viewed as an alphabetic analogue of the Kraft Inequality (as it was by Yeung [32] and Nakatsu [28], who independently rediscovered it), then their algorithm for real weights is the alphabetic analogue of Drmota and Szpankowski's. Kirkpatrick and Przytycka [24] gave an O(log n)-time, O(n/log n)-processor algorithm for integer weights in the CREW PRAM model.
In another previous paper [13] we used a data structure due to Kirkpatrick and Przytycka and a technique for generalized selection due to Klawe and Mumey [25] to make Kirkpatrick and Klawe's algorithm for real weights run in O(n min(log n, d log log n)) time, where d is the number of distinct values ⌈w_i⌉. In this paper we prove a conjecture we made then, that a similar modification can make Drmota and Szpankowski's algorithm run in O(n) time.

2 Applications

In the full version of this paper, we will consider all of the following problems:

A. build a prefix code with minimum maximum pointwise redundancy;
B. given a good estimate of the distribution over an alphabet, build a good prefix code;
C. given a good estimate of the distribution over a set, design a good group test to find the unique target;
D. build a minimax tree for a multiset of real weights;
E. build a Shannon code;
F. build a tree whose leaves have at most given depths;
G. restrict a circuit to have bounded fan-in or fan-out;
H. build a minimax tree for a multiset of integer weights.

The authors cited in the introduction have already shown, however, that Problem A takes O(n) more time than D, E than F, and F and G than H. Therefore, in the current version of this paper, we consider only Problems B, C, D and H. In the remainder of this section we define what we mean by "good" in Problems B and C, and show they take O(n) more time than D. Problems B and C are, in fact, equivalent to each other and to A, and analogous to a problem we considered in our paper [13] on building alphabetic minimax trees. In Section 3 we give two O(n)-time algorithms for Problem H. Finally, in Section 4 we show how to use either of those algorithms to obtain an algorithm for Problem D that takes O(n) time on a word RAM with Ω(log n)-bit words.
It follows that all the problems listed above take O(n) time.

Suppose we want to build a good prefix code with which to compress a file, but we are given only a sample of its characters. Let P = p_1, ..., p_n be the normalized distribution of characters in the file, let Q = q_1, ..., q_n be the normalized distribution of characters in the sample, and suppose our codewords are C = c_1, ..., c_n. An ideal code for Q assigns the i-th character a codeword of length log(1/q_i) (which may not be an integer), and the average codeword's length using such a code is H(P) + D(P‖Q), where H(P) = Σ_i p_i log(1/p_i) is the entropy of P and D(P‖Q) = Σ_i p_i log(p_i/q_i) is the relative entropy between P and Q. The entropy measures our expected surprise at a character drawn uniformly at random from the file, given P; the relative entropy (also known as the informational divergence or Kullback-Leibler pseudo-distance) measures the increase in our expected surprise when we estimate P by Q, and is often used to quantify how well Q approximates P (see, e.g., [8]).

Consider the best worst-case bound we can achieve, given only Q, on how much the average codeword's length exceeds H(P) + D(P‖Q). A result by Katona and Nemetz [22] implies we do not generally achieve a constant bound on the difference when C is a Huffman code for Q. (Given P, of course, the best bound we could achieve on how much the average codeword's length exceeds H(P) would be the redundancy of a Huffman code for P.) For example, if q_1, ..., q_n are proportional to F_n, ..., F_1, where F_i denotes the i-th Fibonacci number (i.e., F_1 = F_2 = 1 and F_i = F_{i−1} + F_{i−2} for i ≥ 3), then the codewords' lengths are 1, ..., n−2, n−1, n−1 in any Huffman code for Q. If p_n is sufficiently close to 1, then H(P) + D(P‖Q) ≈ log(1/q_n)
= log Σ_{i=1}^{n} F_i = n log φ + O(1), but the average codeword's length Σ_i p_i |c_i| ≈ n − 1, so for large n the difference is about (1/log φ − 1) n log φ = (1 − log φ) n ≈ 0.31 n, where φ ≈ 1.62 is the golden ratio.

As long as q_i > 0 whenever p_i > 0, the average codeword's length

Σ_i p_i |c_i| = Σ_i p_i ( log(1/p_i) + log(p_i/q_i) + log q_i + |c_i| )
             = H(P) + D(P‖Q) + Σ_i p_i ( log q_i + |c_i| )

(if q_i = 0 but p_i > 0 for some i, then D(P‖Q) is infinite). Notice each |c_i| is the length of a branch in the code-tree for C. Therefore, the best bound we can achieve is

min_C max_P { Σ_i p_i ( log q_i + |c_i| ) } = min_C max_i { log q_i + |c_i| } = M(log q_1, ..., log q_n),

which is less than 1, by inspection of Drmota and Szpankowski's algorithm. (Recall that M(log q_1, ..., log q_n) denotes the minimax cost of {log q_1, ..., log q_n}, i.e., the weight of the root of a minimax tree for {log q_1, ..., log q_n}.) Moreover, we achieve this bound when the code-tree for C has the same shape as a minimax tree for {log q_1, ..., log q_n}. In other words, Problem B takes O(n) more time than D.

Now suppose we want to design a good group test (see, e.g., [1, 2]) to find the unique target in a set, given only an estimate Q (presumably gained from past experience or experimentation) of the probability distribution P according to which the target is chosen. A group test allows us to choose, repeatedly, a subset of the elements and check whether the target is among them. We can represent a group test as a decision tree in which each leaf is labelled with an element and each internal node is labelled with the concatenation of its children's labels.
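The Fibonacci example can be checked numerically. The sketch below is our own illustration (the name `fibonacci_example` is not from the paper); it computes the gap between the Huffman codeword length n − 1 and the ideal length log(1/q_n):

```python
import math

def fibonacci_example(n):
    """For q_i proportional to F_n, ..., F_1, a Huffman code gives the
    last character a codeword of length n - 1, while its ideal length
    log2(1/q_n) is only about n * log2(phi).  Returns the gap, which
    grows like (1 - log2(phi)) * n."""
    F = [1, 1]
    while len(F) < n:
        F.append(F[-1] + F[-2])
    total = sum(F)                # sum_{i=1}^{n} F_i, so q_n = 1/total
    ideal = math.log2(total)      # log2(1/q_n)
    huffman = n - 1               # length of c_n in any Huffman code
    return huffman - ideal
```

For n = 50 the gap is already about 14 bits, i.e., roughly (1 − log_2 φ) ≈ 0.31 bits per symbol.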
Because such a decision tree can be viewed as the code-tree for a prefix code, and vice versa, the expected number of checks we make exceeds H(P) + D(P‖Q) by as little as possible when the decision tree for our group test has the same shape as a minimax tree for {log q_1, ..., log q_n}. In other words, Problem C is equivalent to B and, therefore, also takes O(n) more time than D.

We are currently studying whether either Drmota and Szpankowski's solution to Problem A or our solution to B can give us an intuitive explanation of why dynamic Shannon coding has a better worst-case bound than dynamic Huffman coding does. On the one hand, worst-case bounds (especially for online algorithms; see, e.g., [5]) are often proven by considering a game between the algorithm and an omniscient adversary, and minimizing the maximum pointwise redundancy at each step seems somehow related (more than just by name) to the minimax strategy for the algorithm. On the other hand, dynamic prefix coding can be viewed as a procedure in which we repeatedly build a prefix code based on a sample, i.e., the characters already encoded.

3 Minimax Trees for Integer Weights

In this section we give two O(n)-time algorithms for building a minimax tree for a multiset of integer weights, both based on the following lemma (which we note applies to any weights, not only integers) and corollary:

Lemma 1. If W = {w_1, ..., w_n} is a multiset of weights and

W′ = { max( w_1, max_i{w_i} − n + 1 ), ..., max( w_n, max_i{w_i} − n + 1 ) },

then M(W′) = M(W). Moreover, any minimax tree for W′ becomes a minimax tree for W when we replace the leaves' weights equal to max_i{w_i} − n + 1 by the weights in W less than or equal to max_i{w_i} − n + 1, in any order.

Proof. Consider a minimax tree T for W.
Without loss of generality, we can assume T is strictly binary, i.e., that every internal node has exactly two children, and, therefore, that it has height at most n − 1. (Recall that, for simplicity, we consider only binary trees.) If n = 1, then w_1 = max_i{w_i} − n + 1, so W′ = W. Otherwise, all the leaves have depth at least 1, so M(W) ≥ max_i{w_i} + 1. Consider any leaf (if one exists) with weight less than max_i{w_i} − n + 1 and depth ℓ. Since max_i{w_i} − n + 1 + ℓ ≤ max_i{w_i} < M(W), increasing that leaf's weight to max_i{w_i} − n + 1 and updating its ancestors' weights does not change the weight M(W) of the root. It follows that M(W′) = M(W).

Now consider a minimax tree T′ for W′. If we replace the leaves' weights equal to max_i{w_i} − n + 1 by the weights in W less than or equal to max_i{w_i} − n + 1 and update all the nodes' weights, then the weight M(W′) of the root cannot increase nor, by definition, decrease to less than M(W). Since M(W′) = M(W), it follows that the re-weighted tree is a minimax tree for W. ⊓⊔

Corollary 1. When all the weights in W are integers, we can sort W′ in O(n) time.

Proof. When all the weights in W at least max_i{w_i} − n + 1 are integers, all the weights in W′ are integers in the interval [max_i{w_i} − n + 1, max_i{w_i}]. Since this interval has length n − 1, we can sort W′ in O(n) time using either direct addressing, which takes O(n) extra space, or radix sort, which takes no extra space [12]. ⊓⊔

For our first algorithm, we build and sort W′; build a minimax tree for W′ using an implementation of Golumbic's algorithm that takes O(n) time when the weights are already sorted; and replace the leaves' weights equal to max_i{w_i} − n + 1 by the weights in W less than or equal to max_i{w_i} − n + 1.
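For integer weights, Lemma 1 and Corollary 1 together give an O(n)-time clamp-and-sort step, sketched here (our own code; the name `clamp_and_sort` is hypothetical):

```python
def clamp_and_sort(weights):
    """Clamp every integer weight up to max - n + 1 (forming W' from
    Lemma 1), then sort in O(n) time by direct addressing, since all
    clamped weights lie in an interval of length n - 1."""
    n = len(weights)
    hi = max(weights)
    lo = hi - n + 1
    clamped = [max(w, lo) for w in weights]   # W'
    counts = [0] * n                          # one slot per value in [lo, hi]
    for w in clamped:
        counts[w - lo] += 1
    out = []
    for v, c in enumerate(counts):
        out.extend([lo + v] * c)
    return out                                # W' in nondecreasing order
```

Direct addressing works precisely because the clamped weights occupy an interval of n integers; radix sort, as the corollary notes, would avoid the O(n) extra space.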
We note that van Leeuwen [31] showed how to implement Huffman's algorithm to take O(n) time when the weights are already sorted. We could implement Golumbic's algorithm analogously, but we think the implementation below is simpler.

Lemma 2. Golumbic's algorithm can be implemented to take O(n) time when the weights are already sorted.

Proof. We start with the weights stored in a linked list in nondecreasing order, and set a pointer to the head of the list. We then repeat the following procedure until there is only one node left in the list, which is the root of a minimax tree for the given weights: we move the pointer along the list to the last weight less than or equal to the maximum of the first two weights plus 1; remove the first two nodes from the list; make those nodes the children of a new node with weight equal to the maximum of their weights plus one; and insert the new node immediately to the right of the pointer. Notice we remove two nodes for each one we insert, so the total number of nodes is 2n − 1. Therefore, since the pointer passes over each node once, this implementation takes O(n) time. ⊓⊔

Building and sorting W′ takes O(n) time, by Corollary 1; building a minimax tree for W′ takes O(n) time, by Lemma 2; replacing the leaves' weights equal to max_i{w_i} − n + 1 by the weights in W less than or equal to max_i{w_i} − n + 1 takes O(n) time, because it can be done in any order. By Lemma 1, the resulting tree is a minimax tree for W.

Theorem 1. Given a multiset W of n integer weights, we can build a minimax tree for W in O(n) time.
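The proof of Lemma 2 translates directly into code. Below is our own rendering (the class and function names are ours); given weights in nondecreasing order, it returns the root of a minimax tree:

```python
class Node:
    def __init__(self, weight, left=None, right=None):
        self.weight = weight
        self.left, self.right = left, right
        self.next = None                  # successor in the linked list

def golumbic_sorted(weights):
    """Lemma 2: Golumbic's algorithm in O(n) time on a sorted linked
    list.  The pointer only ever moves forward, passing each of the
    at most 2n - 1 nodes once."""
    head = None
    for w in reversed(weights):           # build the sorted linked list
        node = Node(w)
        node.next = head
        head = node
    ptr = head                            # the pointer from the proof
    while head.next is not None:          # until one node remains
        a, b = head, head.next
        w = max(a.weight, b.weight) + 1
        # advance the pointer to the last node with weight <= w
        while ptr.next is not None and ptr.next.weight <= w:
            ptr = ptr.next
        parent = Node(w, a, b)            # merge the two smallest nodes
        parent.next = ptr.next            # insert right of the pointer
        ptr.next = parent
        head = b.next                     # remove the first two nodes
    return head
```

Because the new node's weight w exceeds both merged weights, the pointer always advances past the two removed nodes before the next insertion, so the list stays sorted and each node is scanned only once.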
Our second algorithm differs in its second step: instead of using Golumbic's algorithm to build a minimax tree for W′, we use Kirkpatrick and Klawe's O(n)-time algorithm for integer weights to build an alphabetic minimax tree for the sequence V consisting of the weights in W′ in non-increasing order. The algorithm's correctness follows from the Kraft Inequality:

Theorem 2 (Kraft, 1949). If there exists a binary tree whose leaves have depths ℓ_1, ..., ℓ_n, then Σ_i 1/2^{ℓ_i} ≤ 1. Conversely, if Σ_i 1/2^{ℓ_i} ≤ 1 and ℓ_1 ≤ ··· ≤ ℓ_n, then there exists an ordered binary tree whose leaves, from left to right, have depths ℓ_1, ..., ℓ_n.

By the latter part of Theorem 2 and a standard exchange argument (i.e., if a minimax tree contains two leaves such that the deeper one has a higher weight than the shallower one, then we can swap their weights), there exists a minimax tree for W′ in which the leaves' weights are non-increasing from left to right. Therefore, by definition, any alphabetic minimax tree for V is a minimax tree for W′.

4 Minimax Trees for Real Weights

Strictly speaking, Drmota and Szpankowski's algorithm works only when given a multiset of weights equal to {log p_1, ..., log p_n} for some probability distribution P = p_1, ..., p_n. For any value c, however, if W = {w_1, ..., w_n} and W′ = {w_1 + c, ..., w_n + c} then, by definition, M(W′) = M(W) + c and any minimax tree for W′ becomes a minimax tree for W when we subtract c from each leaf's weight. In particular, if c = −log(Σ_i 2^{w_i}) then Σ_i 2^{w_i + c} = 2^c Σ_i 2^{w_i} = 1; therefore, W′ = {log p_1, ..., log p_n} for some probability distribution P = p_1, ..., p_n and we can use Drmota and Szpankowski's algorithm to build minimax trees for W′ and, thus, for W.
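The converse half of Theorem 2 can be made constructive: given sorted depths satisfying the Kraft Inequality, assign codewords greedily from left to right (this is the usual canonical-code construction). A sketch, in our own code with a hypothetical name:

```python
def codewords_from_depths(depths):
    """Converse of Theorem 2, made constructive: given nondecreasing
    depths that satisfy the Kraft Inequality, assign each leaf, from
    left to right, the next available codeword of its depth."""
    assert all(a <= b for a, b in zip(depths, depths[1:]))
    assert sum(2.0 ** -d for d in depths) <= 1.0   # Kraft Inequality
    code = []
    value, prev = 0, None
    for d in depths:
        if prev is not None:
            value = (value + 1) << (d - prev)      # next slot at depth d
        code.append(format(value, '0%db' % d))
        prev = d
    return code
```

The Kraft Inequality guarantees the shifted counter never overflows d bits, so the resulting codewords are prefix-free and correspond to leaves, from left to right, of an ordered binary tree.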
Without loss of generality, we henceforth assume the given multiset W of weights is equal to {log p_1, ..., log p_n} for some probability distribution P (so each w_i ≤ 0).

Theorem 3 (Drmota and Szpankowski, 2002). If W = {w_1, ..., w_n} is a multiset of weights, X = {x_1, ..., x_n} = {|w_1| − ⌊|w_1|⌋, ..., |w_n| − ⌊|w_n|⌋} and x_i is the largest element in X ∪ {0} such that

Σ_{x_j ≤ x_i} 1/2^⌊|w_j|⌋ + Σ_{x_j > x_i} 1/2^⌈|w_j|⌉ ≤ 1,

then any minimax tree for {−⌊|w_j|⌋ : x_j ≤ x_i} ∪ {−⌈|w_j|⌉ : x_j > x_i} becomes a minimax tree for W when we replace each leaf's weight −⌊|w_j|⌋ or −⌈|w_j|⌉ by w_j.

If x_1 ≤ ··· ≤ x_n and x_i > 0 then, by Theorem 3, i is the largest index such that {⌊|w_j|⌋ : x_j ≤ x_i} ∪ {⌈|w_j|⌉ : x_j > x_i} satisfies the Kraft Inequality. To build a minimax tree for W with Drmota and Szpankowski's algorithm, we compute and sort X; use binary search to find i, in each round testing whether the Kraft Inequality holds; build a minimax tree for {−⌊|w_1|⌋, ..., −⌊|w_i|⌋, −⌈|w_{i+1}|⌉, ..., −⌈|w_n|⌉}; and replace each leaf's weight −⌊|w_j|⌋ or −⌈|w_j|⌉ by w_j. Our version differs in three ways: we use generalized selection instead of sorting and binary search; we use a new data structure to test the Kraft Inequality; and we use either of our algorithms from Section 3 to build the minimax tree for {−⌊|w_1|⌋, ..., −⌊|w_i|⌋, −⌈|w_{i+1}|⌉, ..., −⌈|w_n|⌉}. In the remainder of this section we first show how to use generalized selection to find i in O(n) time, excluding the time needed to test the Kraft Inequality; we then show how to perform all the necessary tests in a total of O(n) time on a word RAM with Ω(log n)-bit words, using our new data structure. Since each of our algorithms from Section 3 takes O(n) time, it follows that we can build a minimax tree for W in O(n) time.
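The reduction in Theorem 3 can be sketched directly. This is our own code (`integer_weights` is a hypothetical name) with sorting and a linear scan in place of the paper's generalized selection; it assumes the weights are logarithms of a probability distribution, so the shift x = 0 is always feasible:

```python
import math

def integer_weights(w):
    """Theorem 3: given weights w_i <= 0 with sum_i 2**w_i <= 1, find
    the threshold x_i and return the integer weights -floor(|w_j|) or
    -ceil(|w_j|) whose minimax tree, re-weighted, is a minimax tree
    for W.  O(n log n) here; the paper removes the log factor."""
    absw = [abs(v) for v in w]
    X = sorted({a - math.floor(a) for a in absw} | {0.0})
    best = None
    for x in X:   # feasible thresholds form a prefix; keep the last
        s = sum(2.0 ** -math.floor(a) if (a - math.floor(a)) <= x
                else 2.0 ** -math.ceil(a) for a in absw)
        if s <= 1.0:   # the Kraft Inequality of Theorem 3
            best = x
    return [-math.floor(a) if (a - math.floor(a)) <= best
            else -math.ceil(a) for a in absw]
```

Raising the threshold only moves terms from 1/2^⌈|w_j|⌉ to the larger 1/2^⌊|w_j|⌋, so the Kraft sum is nondecreasing in x and the scan ends at the largest feasible threshold, matching the x_i of Theorem 3.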
To find x_i in O(n) time with generalized selection, we start with the multiset X_1 = X ∪ {0} and repeat the following procedure until we reach the empty set: in the r-th round, we use the linear-time selection algorithm due to Blum et al. [4] to find the current multiset X_r's median x_m, then test whether

Σ_{x_j ≤ x_m} 1/2^⌊|w_j|⌋ + Σ_{x_j > x_m} 1/2^⌈|w_j|⌉ ≤ 1;

if so, we remove those elements of X_r that are less than or equal to x_m and recurse on the resulting multiset; if not, we remove those elements of X_r that are greater than or equal to x_m and recurse. The element x_i is the largest median we consider for which the test is positive. Since the size of the multisets decreases by a factor of at least 2 in each round, we use O(log n) rounds and we find all the medians in a total of O(n) time.

By the same arguments we used to prove Lemma 1, we can assume, without loss of generality, that ⌈|w_j|⌉ ≤ n − 1 for each j. To test the Kraft Inequality, we use a data structure consisting of two n-bit binary fractions, S_1 and S_2, each broken into (log n)-bit blocks and initially set to 0. For 1 ≤ k ≤ n − 1, adding 1/2^k to either fraction takes O(1) amortized time, for the same reason that incrementing a binary counter takes O(1) amortized time (see, e.g., [7, Section 17.3]). On a word RAM with Ω(log n)-bit words, nondestructively testing whether S_1 + S_2 ≤ 1 takes O(n/log n) time, because adding each corresponding pair of blocks takes O(1) time and, by induction, the number carried from each pair to the next is at most 1; resetting either fraction to 0 takes O(1) time for each block, i.e., O(n/log n) time in total. Before starting to search for x_i, we set S_1 = Σ_j 1/2^⌈|w_j|⌉ in O(n) time.
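The two-fraction data structure can be sketched as follows. This is our own simplification: a single Python big integer scaled by 2^n stands in for the array of (log n)-bit blocks, so it illustrates the arithmetic, not the O(n/log n) block-wise time bounds:

```python
class KraftFraction:
    """An n-bit binary fraction for testing the Kraft Inequality: the
    value is stored as an integer scaled by 2**n, so adding 1/2**k
    becomes adding 2**(n - k).  Python's big integers stand in for
    the paper's array of (log n)-bit blocks."""
    def __init__(self, n):
        self.n = n
        self.value = 0
    def add_pow(self, k):                 # add 1/2**k, 1 <= k <= n - 1
        self.value += 1 << (self.n - k)
    def reset(self):                      # set the fraction back to 0
        self.value = 0
    def plus_le_one(self, other):         # nondestructive: S1 + S2 <= 1 ?
        return self.value + other.value <= (1 << self.n)

# e.g. testing whether depths 1, 2, 2 obey the Kraft Inequality:
S1, S2 = KraftFraction(8), KraftFraction(8)
S1.add_pow(1)
S1.add_pow(2)
S2.add_pow(2)
ok = S1.plus_le_one(S2)                   # 1/2 + 1/4 + 1/4 <= 1
```

Keeping S_2 separate from S_1 is what makes each Kraft test nondestructive: a negative test simply resets S_2, while a positive test commits it by adding it into S_1.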
Throughout our generalized selection, we maintain the invariant that, at the beginning of the r-th round,

S_1 = Σ_j 1/2^⌈|w_j|⌉ + Σ_{0 < x_j ≤ x} ( 1/2^⌊|w_j|⌋ − 1/2^⌈|w_j|⌉ ),

where x is the largest element tested so far for which the test was positive (or 0, if no test has yet been positive). To test the current median x_m, we reset S_2 to 0 and set

S_2 = Σ_{x < x_j ≤ x_m} ( 1/2^⌊|w_j|⌋ − 1/2^⌈|w_j|⌉ ).

Since S_1 + S_2 = Σ_{x_j ≤ x_m} 1/2^⌊|w_j|⌋ + Σ_{x_j > x_m} 1/2^⌈|w_j|⌉, we can test the Kraft Inequality in O(n/log n) time by checking whether S_1 + S_2 ≤ 1. If the test is positive, then we add S_2 to S_1 in O(n/log n) time; if the test is negative, then we do not change S_1. In either case, straightforward calculation shows that, afterwards, S_1 = Σ_j 1/2^⌈|w_j|⌉ + Σ_{0 < x_j ≤ x} ( 1/2^⌊|w_j|⌋ − 1/2^⌈|w_j|⌉ ) for the updated value of x.
