Optimum Binary Search Trees on the Hierarchical Memory Model
Authors: Shripad Thite
OPTIMUM BINARY SEARCH TREES ON THE HIERARCHICAL MEMORY MODEL

BY

SHRIPAD THITE

B.E., University of Poona, 1997

THESIS

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2001

Urbana, Illinois

Copyright by Shripad Thite, 2001

ABSTRACT

The Hierarchical Memory Model (HMM) of computation is similar to the standard Random Access Machine (RAM) model except that the HMM has a non-uniform memory organized in a hierarchy of levels numbered 1 through h. The cost of accessing a memory location increases with the level number, and accesses to memory locations belonging to the same level cost the same. Formally, the cost of a single access to the memory location at address a is given by μ(a), where μ : N → N is the memory cost function, and the h distinct values of μ model the different levels of the memory hierarchy.

We study the problem of constructing and storing a binary search tree (BST) of minimum cost, over a set of keys with probabilities for successful and unsuccessful searches, on the HMM with an arbitrary number of memory levels, and for the special case h = 2. While the problem of constructing optimum binary search trees has been well studied for the standard RAM model, the additional parameter μ for the HMM increases the combinatorial complexity of the problem. We present two dynamic programming algorithms to construct optimum BSTs bottom-up. These algorithms run efficiently under some natural assumptions about the memory hierarchy. We also give an efficient algorithm to construct a BST that is close to optimum, by modifying a well-known linear-time approximation algorithm for the RAM model.
We conjecture that the problem of constructing an optimum BST for the HMM with an arbitrary memory cost function μ is NP-complete.

To my father

"Results? Why, man, I have gotten lots of results! If I find 10,000 ways something won't work, I haven't failed." — Thomas Alva Edison (www.thomasedison.com)

ACKNOWLEDGMENTS

First and foremost, I would like to thank my advisor, Michael Loui. This thesis would have been of much poorer quality if not for the copious amounts of time and red ink devoted by him. Prof. Loui has been a wonderful and understanding guide and mentor, and I feel privileged to have had him as an advisor.

Thanks to Jeff Erickson and Sariel Har-Peled for taking the time to read and suffer early drafts, and for numerous helpful discussions. Special thanks to Jeff Erickson for letting me spend an inordinate amount of time on this project while I was supposed to be working on something else. I am extremely grateful to Mitch Harris for being there on so many occasions to listen to my ramblings, to bounce ideas off of, and often just for being there.

I would also like to thank Prof. Ed Reingold; it was during his CS 473 class in fall 1998 that the topic of optimum binary search trees (on the RAM model) came up for discussion. I would like to thank my mentor at the Los Alamos National Laboratory, Madhav Marathe, for providing support and an environment in which to explore the general subject of hierarchical memory models during my internship there in summer 1998.

TABLE OF CONTENTS

1 Introduction
  1.1 What is a binary search tree?
    1.1.1 Searching in a BST
    1.1.2 Weighted binary search trees
  1.2 Why study binary search trees?
  1.3 Overview
2 Background and Related Work
  2.1 Binary search trees and related problems
    2.1.1 Constructing optimum binary search trees on the RAM
      2.1.1.1 Dynamic programming algorithms
      2.1.1.2 Speed-up in dynamic programming
    2.1.2 Alphabetic trees
    2.1.3 Huffman trees
    2.1.4 Nearly optimum search trees
    2.1.5 Optimal binary decision trees
  2.2 Models of computation
    2.2.1 The need for an alternative to the RAM model
      2.2.1.1 Modern computer organization
      2.2.1.2 Locality of reference
      2.2.1.3 Memory effects
      2.2.1.4 Complexity of communication
    2.2.2 External memory algorithms
    2.2.3 Non-uniform memory architecture
    2.2.4 Models for non-uniform memory
3 Algorithms for Constructing Optimum and Nearly Optimum Binary Search Trees
  3.1 The HMM model
  3.2 The HMM2 model
  3.3 Optimum BSTs on the HMM model
    3.3.1 Storing a tree in memory optimally
    3.3.2 Constructing an optimum tree when the memory assignment is fixed
    3.3.3 Naive algorithm
    3.3.4 A dynamic programming algorithm: algorithm Parts
    3.3.5 Another dynamic programming algorithm: algorithm Trunks
    3.3.6 A top-down algorithm: algorithm Split
  3.4 Optimum BSTs on the HMM2 model
    3.4.1 A dynamic programming algorithm
      3.4.1.1 algorithm TwoLevel
      3.4.1.2 Procedure TL-phase-I
      3.4.1.3 Procedure TL-phase-II
      3.4.1.4 Correctness of algorithm TwoLevel
      3.4.1.5 Running time of algorithm TwoLevel
    3.4.2 Constructing a nearly optimum BST
      3.4.2.1 An approximation algorithm
      3.4.2.2 Analysis of the running time
      3.4.2.3 Quality of approximation
      3.4.2.4 Lower bounds
      3.4.2.5 Approximation bound
4 Conclusions and Open Problems
  4.1 Conclusions
  4.2 Open problems
    4.2.1 Efficient heuristics
    4.2.2 NP-hardness
    4.2.3 An algorithm efficient on the HMM
    4.2.4 BSTs optimum on both the RAM and the HMM
    4.2.5 A monotonicity principle
    4.2.6 Dependence on the parameter h
References

LIST OF FIGURES

1.1 A binary search tree over the set {1, 2, 3, 5, 8, 13, 21}
2.1 algorithm K1
2.2 algorithm K2
3.1 algorithm Parts
3.2 procedure Partition-Memory
3.3 algorithm Trunks
3.4 algorithm TwoLevel
3.5 procedure TL-phase-I
3.6 procedure TL-phase-II
3.7 algorithm Approx-BST
3.8 algorithm Approx-BST (cont'd.)
4.1 Summary of results
4.2 An optimum BST on the unit-cost RAM model.
4.3 An optimum BST on the HMM model.
4.4 The cost of an optimum BST is not a unimodal function.

CHAPTER 1

Introduction

1.1 What is a binary search tree?

For a set of n distinct keys x_1, x_2, ..., x_n from a totally ordered universe (x_1 ≺ x_2 ≺ ... ≺ x_n), a binary search tree (BST) T is an ordered, rooted binary tree with n internal nodes. The internal nodes of the tree correspond to the keys x_1 through x_n such that an inorder traversal of the nodes visits the keys in order of precedence, i.e., in the order x_1, x_2, ..., x_n. The external nodes correspond to intervals between the keys, i.e., the j-th external node represents the set of elements between x_{j-1} and x_j. Without ambiguity, we identify the nodes of the tree by the corresponding keys.

For instance, a binary search tree on the set of integers {1, 2, 3, 5, 8, 13, 21} with the natural ordering of integers could look like the tree in figure 1.1. The internal nodes of the tree are labeled {1, 2, 3, 5, 8, 13, 21} and the external nodes (leaves) are labeled A through H in order.

Let T_{i,j} for 1 ≤ i ≤ j ≤ n denote a BST on the subset of keys from x_i through x_j. We define T_{i+1,i} to be the unique BST over the empty subset of keys from x_{i+1} through x_i, which consists of a single external node with probability of access q_i. We will use T to denote T_{1,n}.
A binary search tree with n internal nodes is stored in n locations in memory: each memory location contains a key x_i and two pointers to the memory locations containing the left and right children of x_i. If the left (resp. right) subtree is empty, then the left (resp. right) pointer is Nil.

In this section, we will restrict our attention to the standard RAM model of computation.

Figure 1.1 A binary search tree over the set {1, 2, 3, 5, 8, 13, 21}. (Figure not reproduced; its tree has root 13, with left child 5 and right child 21; 5's children are 1 and 8; 1's right child is 3; 3's left child is 2; the external nodes A through H hang off these internal nodes in left-to-right order.)

1.1.1 Searching in a BST

A search in T_{i,j} proceeds recursively as follows. The search argument y is compared with the root x_k (i ≤ k ≤ j). If y = x_k, then the search terminates successfully. Otherwise, if y ≺ x_k (resp. y ≻ x_k), then the search proceeds recursively in the left subtree T_{i,k-1} (resp. the right subtree T_{k+1,j}); if the left subtree (resp. right subtree) of x_k is an external node, i.e., a leaf, then the search fails without visiting any other nodes because x_{k-1} ≺ y ≺ x_k (resp. x_k ≺ y ≺ x_{k+1}). (We adopt the convention that x_0 ≺ y ≺ x_1 means y ≺ x_1, and x_n ≺ y ≺ x_{n+1} means y ≻ x_n.)

The depth of an internal or external node v, denoted δ_T(v), or simply δ(v) when the tree T is implicit, is the number of nodes on the path from the root to the node. Hence, for instance, the depth of the root is 1.

The cost of a successful or unsuccessful search is the number of comparisons needed to determine the outcome. Therefore, the cost of a successful search that terminates at some internal node x_i is equal to the depth of x_i, i.e., δ(x_i). The cost of an unsuccessful search that would have terminated at the external node z_j is one less than the depth of z_j, i.e., δ(z_j) − 1.

So, for instance, the depth of the internal node labeled 8 in the tree of figure 1.1 is 3.
A search for the key 8 would perform three comparisons, with the nodes labeled 13, 5, and 8, before terminating successfully. Therefore, the cost of a successful search that terminates at the node labeled 8 is the same as the path length of the node, i.e., 3. On the other hand, a search for the value 4 would perform comparisons with the nodes labeled 13, 5, 1, and 3, in that order, and then would terminate with failure, for a total of four comparisons. This unsuccessful search would have visited the external node labeled D; therefore, the cost of a search that terminates at D is one less than the depth of D, i.e., 5 − 1 = 4.

Even though the external nodes are conceptually present, they are not necessary for implementing the BST data structure. If any subtree of an internal node is empty, then the pointer to that subtree is assumed to be Nil; it is not necessary to "visit" this empty subtree.

1.1.2 Weighted binary search trees

In the weighted case, we are also given the probability that the search argument y is equal to some key x_i for 1 ≤ i ≤ n and the probability that y lies between x_j and x_{j+1} for 0 ≤ j ≤ n. Let p_i, for i = 1, 2, ..., n, denote the probability that y = x_i. Let q_j, for j = 0, 1, ..., n, denote the probability that x_j ≺ y ≺ x_{j+1}. We have

    Σ_{i=1}^{n} p_i + Σ_{j=0}^{n} q_j = 1.

Define w_{i,j} as

    w_{i,j} = Σ_{k=i}^{j} p_k + Σ_{k=i-1}^{j} q_k.    (1.1)

Therefore, w_{1,n} = 1, and w_{i+1,i} = q_i. (Note that this definition differs from the function w(i, j) referred to by Knuth [Knu73]. Under definition (1.1), w_{i,j} is the sum of the probabilities associated with the subtree over the keys x_i through x_j. Under Knuth's definition, w(i, j) = w_{i+1,j} is the sum of the probabilities associated with the keys x_{i+1} through x_j.)
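The comparison counts traced in this example can be checked with a short sketch (an illustrative sketch, not part of the thesis; the Node layout mirrors the storage scheme described above, and the tree built here is the one of figure 1.1):

```python
class Node:
    """One memory location: a key and pointers to the left and right
    children (None plays the role of Nil for an empty subtree)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search_cost(root, y):
    """Number of comparisons made when searching for y, successful or not."""
    comparisons, node = 0, root
    while node is not None:
        comparisons += 1
        if y == node.key:
            break                                    # successful search
        node = node.left if y < node.key else node.right
    return comparisons

# The tree of figure 1.1: root 13; 5's children are 1 and 8; 1's right
# child is 3; 3's left child is 2; 13's right child is 21.
tree = Node(13,
            Node(5, Node(1, None, Node(3, Node(2), None)), Node(8)),
            Node(21))
```

Here search_cost(tree, 8) is 3 and search_cost(tree, 4) is 4, matching the successful and unsuccessful searches traced above.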
Recall that the cost of a successful search that terminates at the internal node x_i is δ(x_i), and the cost of an unsuccessful search that terminates at the external node z_j is δ(z_j) − 1. We define the cost of T to be the expected cost of a search:

    cost(T) = Σ_{i=1}^{n} p_i · δ_T(x_i) + Σ_{j=0}^{n} q_j · (δ_T(z_j) − 1).    (1.2)

In other words, the cost of T is the weighted sum of the depths of the internal and external nodes of T. An optimum binary search tree T* is one with minimum cost. Let T*_{i,j} denote the optimum BST over the subset of keys from x_i through x_j for all i, j such that 1 ≤ i ≤ j ≤ n; T*_{i+1,i} denotes the unique optimum BST consisting of an external node with probability of access q_i.

1.2 Why study binary search trees?

The binary search tree is a fundamental data structure that supports the operations of inserting and deleting keys, as well as searching for a key. The straightforward implementation of a BST is adequate and efficient for the static case, when the probabilities of accessing keys are known a priori or can at least be estimated. More complicated implementations, such as red-black trees [CLR90], AVL trees [AVL62, Knu73], and splay trees [ST85], guarantee that a sequence of operations, including insertions and deletions, can be executed efficiently.

In addition, the binary search tree also serves as a model for studying the performance of algorithms like Quicksort [Knu73, CLR90]. The recursive execution of Quicksort corresponds to a binary tree where each node represents a partition of the elements to be sorted into left and right parts, consisting of elements that are respectively less than and greater than the pivot element. The running time of Quicksort is the sum of the work done by the algorithm corresponding to each node of this recursion tree.
A binary search tree also arises implicitly in the context of binary search. The BST corresponding to binary search achieves the theoretical minimum number of comparisons that are necessary to search using only key comparisons. When an explicit BST is used as a data structure, we want to construct one with minimum cost. When studying the performance of Quicksort, we want to prove lower bounds on the cost and hence on the running time. Therefore, the problem of constructing optimum BSTs is of considerable interest.

1.3 Overview

In chapter 2, we survey background work on binary search trees and computational models for non-uniform memory computers. In chapter 3, we give algorithms for constructing optimum binary search trees. In section 3.3, we consider the most general variant of the HMM model, with an arbitrary number of memory levels. We present two dynamic programming algorithms and a top-down algorithm to construct optimum BSTs on the HMM. In section 3.4, we consider the special case of the HMM model with only two memory levels. For this model, we present a dynamic programming algorithm to construct optimum BSTs in section 3.4.1, and in section 3.4.2, a linear-time heuristic to construct a BST close to the optimum. Finally, we conclude with a summary of our results and a discussion of open problems in chapter 4.

CHAPTER 2

Background and Related Work

In this chapter, we survey related work on the problem of constructing optimum binary search trees, and on computational models for hierarchical memory. In section 2.1, we discuss the optimum binary search tree problem and related problems. In section 2.2, we discuss memory effects in modern computers and present arguments for better theoretical models.
In section 2.2.2, we survey related work on designing data structures and algorithms, and in section 2.2.4, we discuss proposed models of computation for hierarchical-memory computers.

2.1 Binary search trees and related problems

The binary search tree has been studied extensively in different contexts. In sections 2.1.1 through 2.1.5, we summarize previous work on the following related problems that have been studied on the RAM model of computation:

• constructing a binary search tree such that the expected cost of a search is minimized;
• constructing an alphabetic tree such that the sum of the weighted path lengths of the external nodes is minimized;
• constructing a prefix-free code tree, with no restriction on the lexicographic order of the nodes, such that the sum of the weighted path lengths of all nodes is minimized;
• constructing a binary search tree close to optimum by an efficient heuristic;
• constructing an optimal binary decision tree.

2.1.1 Constructing optimum binary search trees on the RAM

2.1.1.1 Dynamic programming algorithms

Theorem 1 (Knuth [Knu71], [Knu73]). An optimum BST can be constructed by a dynamic programming algorithm that runs in O(n²) time and O(n²) space.

Proof: By the principle of optimality, a binary search tree T* is optimum if and only if each subtree of T* is optimum. The standard dynamic programming algorithm proceeds as follows. Recall that cost(T*_{i,j}) denotes the cost of an optimum BST T*_{i,j} over the keys x_i, x_{i+1}, ..., x_j and the corresponding probabilities p_i, p_{i+1}, ..., p_j and q_{i-1}, q_i, ..., q_j.
By the principle of optimality and the definition of the cost function in equation (1.2),

    cost(T*_{i,j})   = w_{i,j} + min_{i ≤ k ≤ j} ( cost(T*_{i,k-1}) + cost(T*_{k+1,j}) )   for i ≤ j,
    cost(T*_{i+1,i}) = w_{i+1,i} = q_i.                                                    (2.1)

Recurrence (2.1) suggests a dynamic programming algorithm, algorithm K1 in figure 2.1, that constructs optimum subtrees bottom-up. algorithm K1 is the standard dynamic programming algorithm. For each d from 0 through n − 1, and for each i, j such that j − i = d, the algorithm evaluates the cost of a subtree with x_k as the root, for every possible choice of k between i and j, and selects the one for which this cost is minimized.

algorithm K1 constructs arrays c and r, such that c[i, j] is the cost of an optimum BST T*_{i,j} over the subset of keys from x_i through x_j and r[i, j] is the index of the root of such an optimum BST. The structure of the tree can be retrieved in O(n) time from the array r at the end of the algorithm as follows. Let T[i, j] denote the optimum subtree constructed by algorithm K1 and represented implicitly using the array r. The index of the root of this subtree is given by the array entry r[i, j]. Recursively, the left and right subtrees of the root are T[i, r[i, j] − 1] and T[r[i, j] + 1, j], respectively.

For each fixed d and i, the algorithm takes O(d) time to evaluate the choice of x_k as the root for all k such that i ≤ k ≤ j = i + d, and hence Σ_{d=0}^{n-1} Σ_{i=1}^{n-d} O(d) = O(n³) time overall.

algorithm K1([p_1..p_n], [q_0..q_n]):
    ( Initialization phase. )
    ( An optimum BST over the empty subset of keys from x_{i+1} through x_i )
    ( consists of just the external node with probability q_i. )
    ( The root of this subtree is undefined. )
    for i := 0 to n
        c[i + 1, i] ← w_{i+1,i} = q_i
        r[i + 1, i] ← Nil
    for d := 0 to n − 1
        for i := 1 to n − d
            j ← i + d
            ( Initially, the optimum subtree T*_{i,j} is unknown. )
            c[i, j] ← ∞
            for k := i to j
                Let T′ be the tree with x_k at the root, and T*_{i,k-1} and
                T*_{k+1,j} as the left and right subtrees, respectively.
                Let c′ be the cost of T′:
                    c′ ← w_{i,j} + c[i, k − 1] + c[k + 1, j]
                ( Is T′ better than the minimum-cost tree so far? )
                if c′ < c[i, j]
                    r[i, j] ← k
                    c[i, j] ← c′

Figure 2.1 algorithm K1

Knuth [Knu71] showed that the following monotonicity principle can be used to reduce the time complexity to O(n²): for all i, j, 1 ≤ i ≤ j ≤ n, let R(i, j) denote the index of the root of an optimum BST over the keys x_i, x_{i+1}, ..., x_j (if more than one root is optimum, let R(i, j) be the smallest such index); then

    R(i, j − 1) ≤ R(i, j) ≤ R(i + 1, j).    (2.2)

Therefore, the innermost loop in algorithm K1 can be modified to produce algorithm K2 (figure 2.2) with improved running time.

algorithm K2([p_1..p_n], [q_0..q_n]):
    ( Initialization phase. )
    for i := 0 to n
        c[i + 1, i] ← w_{i+1,i} = q_i
        r[i + 1, i] ← Nil
    for d := 0 to n − 1
        for i := 1 to n − d
            j ← i + d
            c[i, j] ← ∞
            for k := r[i, j − 1] to r[i + 1, j]
                Let T′ be the tree with x_k at the root, and T*_{i,k-1} and
                T*_{k+1,j} as the left and right subtrees, respectively.
                c′ ← w_{i,j} + c[i, k − 1] + c[k + 1, j]
                if c′ < c[i, j]
                    r[i, j] ← k
                    c[i, j] ← c′

Figure 2.2 algorithm K2

Since (j − 1) − i = j − (i + 1) = d − 1 whenever j − i = d, the values of r[i, j − 1] and r[i + 1, j] are available during the iteration when j − i = d. The number of times that the body of the innermost loop in algorithm K2 is executed when j − i = d is r[i + 1, j] − r[i, j − 1] + 1.
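A direct Python rendering of algorithm K2 might look like the following (an illustrative sketch, not the thesis's code; algorithm K1 is recovered by replacing the loop bounds lo, hi with i, j). Following the text, c[i+1, i] is initialized to q_i:

```python
def optimum_bst(p, q):
    """Algorithm K2: p[1..n] are success probabilities (p[0] is a
    placeholder for 1-based indexing), q[0..n] are failure probabilities.
    Returns (c, r): c[i][j] is the cost of an optimum subtree over keys
    x_i..x_j per recurrence (2.1), and r[i][j] is the index of its root."""
    n = len(p) - 1
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    c = [[0.0] * (n + 1) for _ in range(n + 2)]
    r = [[0] * (n + 1) for _ in range(n + 2)]
    for i in range(n + 1):
        w[i + 1][i] = q[i]          # w_{i+1,i} = q_i
        c[i + 1][i] = q[i]          # base case of recurrence (2.1)
    for d in range(n):              # d = j - i, by diagonals
        for i in range(1, n - d + 1):
            j = i + d
            w[i][j] = w[i][j - 1] + p[j] + q[j]   # definition (1.1), incrementally
            c[i][j] = float('inf')
            # Knuth's bounds (2.2); r[i][j-1] is Nil when d = 0,
            # so fall back to the full range i..j in that case.
            lo = r[i][j - 1] if d > 0 else i
            hi = r[i + 1][j] if d > 0 else j
            for k in range(lo, hi + 1):
                cost = w[i][j] + c[i][k - 1] + c[k + 1][j]
                if cost < c[i][j]:
                    c[i][j], r[i][j] = cost, k
    return c, r

# A toy 2-key instance: p_1 = 0.3, p_2 = 0.2, q_0 = 0.15, q_1 = 0.15, q_2 = 0.2.
c, r = optimum_bst([0.0, 0.3, 0.2], [0.15, 0.15, 0.2])
```

On this instance the optimum root is x_1 (r[1][2] == 1). Note that with the base case c[i+1, i] = q_i, every reported cost exceeds expression (1.2) by the constant Σ_j q_j, which does not affect which tree is optimum.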
Therefore, the running time of algorithm K2 is proportional to

    Σ_{d=0}^{n-1} Σ_{i=1}^{n-d} ( r[i + 1, j] − r[i, j − 1] + 1 )    where j = i + d
      = Σ_{d=0}^{n-1} ( r[n − d + 1, n + 1] − r[1, d] + n − d )
      ≤ Σ_{d=0}^{n-1} ( 2n − d )    since r[n − d + 1, n + 1] − r[1, d] ≤ (n + 1) − 1
      = O(n²).

The use of the monotonicity principle above is in fact an application of the general technique due to Yao [Yao82] to speed up dynamic programming under some special conditions. (See subsection 2.1.1.2 below.) The space required by both algorithms for the tables r and c is O(n²).

2.1.1.2 Speed-up in dynamic programming

For the sake of completeness, we reproduce below results due to Yao [Yao82]. Consider computing the value c(1, n) for the function c() defined by the following recurrence:

    c(i, j)     = w(i, j) + min_{i ≤ k ≤ j} ( c(i, k − 1) + c(k + 1, j) )   for 1 ≤ i ≤ j ≤ n,
    c(i + 1, i) = q_i,                                                      (2.3)

where w() is some function and q_i is a constant for 1 ≤ i ≤ n. The form of the recurrence suggests a simple dynamic programming algorithm that computes c(i, j) from c(i, k − 1) and c(k + 1, j) for all k from i through j. This algorithm spends O(j − i) time computing the optimum value of c(i, j) for every pair i, j such that 1 ≤ i ≤ j ≤ n, for a total running time of Σ_{i=1}^{n} Σ_{j=i}^{n} O(j − i) = O(n³).

The function w(i, j) satisfies the concave quadrangle inequality (QI) if

    w(i, j) + w(i′, j′) ≤ w(i′, j) + w(i, j′)    (2.4)

for all i, i′, j, j′ such that i ≤ i′ ≤ j ≤ j′. In addition, w(i, j) is monotone with respect to set inclusion of intervals if w(i, j) ≤ w(k, l) whenever [i, j] ⊆ [k, l], i.e., k ≤ i ≤ j ≤ l.

Let c_k(i, j) denote w(i, j) + c(i, k − 1) + c(k + 1, j) for each k, i ≤ k ≤ j.
Let K(i, j) denote the maximum k for which the optimum value of c(i, j) is achieved in recurrence (2.3), i.e., for i ≤ j,

    K(i, j) = max { k | c_k(i, j) = c(i, j) }.

Hence, K(i, i) = i.

Lemma 2 (Yao [Yao82]). If w(i, j) is monotone and satisfies the concave quadrangle inequality (2.4), then the function c(i, j) defined by recurrence (2.3) also satisfies the concave QI, i.e.,

    c(i, j) + c(i′, j′) ≤ c(i′, j) + c(i, j′)

for all i, i′, j, j′ such that i ≤ i′ ≤ j ≤ j′.

Proof (Mehlhorn [Meh84]): Consider i, i′, j, j′ such that 1 ≤ i ≤ i′ ≤ j ≤ j′ ≤ n. The proof of the lemma is by induction on l = j′ − i.

Base cases: The case l = 0 is trivial. If l = 1, then either i = i′ or j = j′, so the inequality c(i, j) + c(i′, j′) ≤ c(i′, j) + c(i, j′) is trivially true.

Inductive step: Consider the two cases i′ = j and i′ < j.

Case 1: i′ = j. In this case, the concave QI reduces to the inequality

    c(i, j) + c(j, j′) ≤ c(i, j′) + w(j, j).

Let k = K(i, j′). Clearly, i ≤ k ≤ j′.

Case 1a: k + 1 ≤ j.

    c(i, j) + c(j, j′)
      ≤ w(i, j) + c(i, k − 1) + c(k + 1, j) + c(j, j′)     by the definition of c(i, j)
      ≤ w(i, j′) + c(i, k − 1) + c(k + 1, j) + c(j, j′)    by the monotonicity of w().

Now, since k + 1 ≤ j, the induction hypothesis gives c(k + 1, j) + c(j, j′) ≤ c(k + 1, j′) + w(j, j). Therefore,

    c(i, j) + c(j, j′) ≤ w(i, j′) + c(i, k − 1) + c(k + 1, j′) + w(j, j) = c(i, j′) + w(j, j),

because k = K(i, j′) and by the definition of c(i, j′).

Case 1b: k ≥ j.
    c(i, j) + c(j, j′)
      ≤ c(i, j) + w(j, j′) + c(j, k − 1) + c(k + 1, j′)    by the definition of c(j, j′)
      ≤ c(i, j) + w(i, j′) + c(j, k − 1) + c(k + 1, j′)    by the monotonicity of w().

Now, since k ≥ j, the induction hypothesis gives c(i, j) + c(j, k − 1) ≤ c(i, k − 1) + w(j, j). Therefore,

    c(i, j) + c(j, j′) ≤ w(i, j′) + c(i, k − 1) + w(j, j) + c(k + 1, j′) = c(i, j′) + w(j, j)

by the definition of c(i, j′).

Case 2: i′ < j. Let y = K(i′, j) and z = K(i, j′).

Case 2a: z ≤ y. Note that i ≤ z ≤ y ≤ j. Since c(i, j) ≤ c_z(i, j) and c(i′, j′) ≤ c_y(i′, j′),

    c(i′, j′) + c(i, j)
      ≤ c_y(i′, j′) + c_z(i, j)
      = ( w(i′, j′) + c(i′, y − 1) + c(y + 1, j′) ) + ( w(i, j) + c(i, z − 1) + c(z + 1, j) )
      ≤ ( w(i, j′) + w(i′, j) ) + ( c(i′, y − 1) + c(i, z − 1) + c(z + 1, j) + c(y + 1, j′) )
            from the concave QI for w
      ≤ ( w(i, j′) + w(i′, j) ) + ( c(i′, y − 1) + c(i, z − 1) + c(y + 1, j) + c(z + 1, j′) )
            from the induction hypothesis, i.e., the concave QI applied to z + 1 ≤ y + 1 ≤ j ≤ j′
      = c(i, j′) + c(i′, j)

by the definition of c(i, j′) and c(i′, j), since z = K(i, j′) and y = K(i′, j).

Case 2b: y ≤ z. This case is symmetric to case 2a above.

Theorem 3 (Yao [Yao82]). If the function w(i, j) is monotone and satisfies the concave quadrangle inequality, then

    K(i, j − 1) ≤ K(i, j) ≤ K(i + 1, j).

Proof (Mehlhorn [Meh84]): The theorem is trivially true when j = i + 1 because i ≤ K(i, j) ≤ j. We will prove K(i, j − 1) ≤ K(i, j) for the case i < j − 1, by induction on j − i. Recall that K(i, j − 1) is the largest index k that achieves the minimum value of c(i, j − 1) = w(i, j − 1) + c(i, k − 1) + c(k + 1, j − 1) (cf. equation (2.3)). Therefore, it suffices to show that

    c_{k′}(i, j − 1) ≤ c_k(i, j − 1)  implies  c_{k′}(i, j) ≤ c_k(i, j)

for all i ≤ k ≤ k′ ≤ j.
We prove the stronger inequality

    c_k(i, j − 1) − c_{k′}(i, j − 1) ≤ c_k(i, j) − c_{k′}(i, j),

which is equivalent to

    c_k(i, j − 1) + c_{k′}(i, j) ≤ c_{k′}(i, j − 1) + c_k(i, j).

The last inequality above expands to

    c(i, k − 1) + c(k + 1, j − 1) + c(i, k′ − 1) + c(k′ + 1, j)
      ≤ c(i, k′ − 1) + c(k′ + 1, j − 1) + c(i, k − 1) + c(k + 1, j),

or

    c(k + 1, j − 1) + c(k′ + 1, j) ≤ c(k′ + 1, j − 1) + c(k + 1, j).

But this is simply the concave quadrangle inequality for the function c(i, j), applied to k + 1 ≤ k′ + 1 ≤ j − 1 ≤ j, which is true by the induction hypothesis.

As a consequence of theorem 3, if we compute c(i, j) by diagonals, in order of increasing values of j − i, then we can limit our search for the optimum value of k to the range from K(i, j − 1) through K(i + 1, j). The cost of computing all entries on one diagonal, where j = i + d, is

    Σ_{i=1}^{n-d} ( K(i + 1, j) − K(i, j − 1) + 1 ) = K(n − d + 1, n + 1) − K(1, d) + n − d
                                                   ≤ (n + 1) − 1 + (n − d)
                                                   < 2n.

The speed-up technique in this section is used to improve the running time of the standard dynamic programming algorithm to compute optimum BSTs. It is easy to see that the parameters of the optimum BST problem satisfy the conditions required by Theorem 3.

2.1.2 Alphabetic trees

The special case of the problem of constructing an optimum BST when p_1 = p_2 = ... = p_n = 0 is known as the alphabetic tree problem. This problem arises in the context of constructing optimum binary code trees. A binary codeword is a string of 0's and 1's. A prefix-free binary code is a sequence of binary codewords such that no codeword is a prefix of another. Corresponding to a prefix-free code with n + 1 codewords, there is a rooted binary tree with n internal nodes and n + 1 external nodes, where the codewords correspond to the external nodes of the tree.
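The correspondence between code trees and prefix-free codes can be sketched as follows (an illustrative sketch; the tuple representation and the example tree are invented for this note):

```python
def codewords(tree, prefix=""):
    """List, left to right, the codewords at the external nodes of a binary
    code tree.  An internal node is a (left, right) pair; an external node
    is a symbol label.  A left branch appends '0', a right branch '1'."""
    if not isinstance(tree, tuple):       # external node: emit its codeword
        return [(tree, prefix)]
    left, right = tree
    return codewords(left, prefix + "0") + codewords(right, prefix + "1")

# A code tree with 3 internal and 4 external nodes.
code = codewords(("a", ("b", ("c", "d"))))
# code == [("a", "0"), ("b", "10"), ("c", "110"), ("d", "111")]
```

In the output, no codeword is a prefix of another, and the left-to-right order of the external nodes is exactly the lexicographic order of the codewords, as required in the alphabetic tree problem described next.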
In the alphabetic tree problem, we require that the codewords at the external nodes appear in order from left to right. Taking the left branch of the tree stands for a 0 bit and taking the right branch stands for a 1 bit in the codeword; thus, a path in the tree from the root to the j-th external node represents the bits in the j-th codeword. This method of coding preserves the lexicographic order of messages. The probability q_j of the j-th codeword is the likelihood that the symbol corresponding to that codeword will appear in any message. Thus, in this problem, p_1 = p_2 = · · · = p_n = 0 and Σ_{j=0}^{n} q_j = 1.

Hu and Tucker [HT71] developed a two-phase algorithm that constructs an optimum alphabetic tree. In the first phase, starting with a sequence of n+1 nodes, pairs of nodes are recursively combined into a single tree to obtain an assignment of level numbers to the nodes. The tree constructed in the first phase does not necessarily have its leaves in order. In the second phase, the nodes are recombined into a tree in which the nodes are in lexicographic order and the depth of each node equals the level number assigned to it in the first phase. It is non-trivial to prove that there exists an optimum alphabetic tree with the external nodes at the same depths as the level numbers computed in the first phase. The algorithm uses a priority queue with at most n+1 elements on which it performs O(n) operations. With an appropriate implementation, such as a leftist tree [Knu73] or a Fibonacci heap [CLR90], the algorithm requires O(n log n) time and O(n) space.

2.1.3 Huffman trees

If we relax the condition in the alphabetic tree problem that the codewords be in lexicographic order, then the problem of constructing an optimum prefix-free code is the Huffman tree problem.
Huffman's classic result [Huf52] is that a simple greedy algorithm, running in O(n log n) time, suffices to construct a minimum-cost code tree.

2.1.4 Nearly optimum search trees

The best known algorithm to construct an optimum search tree, algorithm K2 due to Knuth [Knu71], requires O(n²) time and space (Theorem 1). If we are willing to sacrifice optimality for efficiency, then we can use a simple linear-time heuristic due to Mehlhorn [Meh84] to construct a tree T that is not too far from optimum. In fact, if T∗ is a tree with minimum cost, then

cost(T) − cost(T∗) ≤ lg(cost(T∗)) ≈ lg H,

where H = Σ_{i=1}^{n} p_i lg(1/p_i) + Σ_{j=0}^{n} q_j lg(1/q_j) is the entropy of the probability distribution.

2.1.5 Optimal binary decision trees

We remark that the related problem of constructing an optimal binary decision tree is known to be NP-hard. Hyafil and Rivest [HR76] proved that the following problem is NP-hard:

Problem 4. Let S = {s_1, s_2, ..., s_n} be a finite set of objects and let T = {t_1, t_2, ..., t_m} be a finite set of tests. For each test t_i and object s_j, 1 ≤ i ≤ m and 1 ≤ j ≤ n, we have either t_i(s_j) = True or t_i(s_j) = False. Construct an identification procedure for the objects in S such that the expected number of tests required to completely identify an element of S is minimal. In other words, construct a binary decision tree with the tests at the internal nodes and the objects in S at the external nodes, such that the sum of the path lengths of the external nodes is minimized.

The authors showed, via a reduction from Exact Cover by 3-Sets (X3C) [GJ79], that the optimal binary decision tree problem remains NP-hard even when the tests are all subsets of S of size 3 and t_i(s_j) = True if and only if s_j is an element of set t_i.
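The greedy construction behind Huffman's result in Section 2.1.3 can be sketched in a few lines. The code below is our own illustration, not code from the thesis; it builds the code tree by repeatedly merging the two least-likely subtrees, then reads the codewords off root-to-leaf paths using the left-is-0, right-is-1 convention of Section 2.1.2:

```python
import heapq

def huffman_codewords(weights):
    """weights: dict mapping symbol -> probability (or frequency).
    Returns a dict mapping each symbol to its codeword string."""
    # Each heap entry is (weight, tiebreak, tree); a tree is either a
    # symbol (external node) or a (left, right) pair (internal node).
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(weights.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)      # two least-likely subtrees
        w2, _, t2 = heapq.heappop(heap)
        count += 1
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node
            walk(tree[0], prefix + "0")      # left branch = 0 bit
            walk(tree[1], prefix + "1")      # right branch = 1 bit
        else:                                # external node: a symbol
            code[tree] = prefix or "0"       # a lone symbol still gets a bit
    walk(heap[0][2], "")
    return code
```

With weights {a: 0.5, b: 0.25, c: 0.25}, for example, the most likely symbol receives a one-bit codeword and the other two receive two-bit codewords, and the resulting code is prefix-free.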
For more details on the optimum binary search tree problem and related problems, we refer the reader to the excellent survey article by S. V. Nagaraj [Nag97].

2.2 Models of computation

The Random Access Machine (RAM) [Pap95, BC94] is the model used most often in the design and analysis of algorithms.

2.2.1 The need for an alternative to the RAM model

The RAM is a sequential model of computation. It consists of a single processor with a predetermined set of instructions. Different variants of the RAM model assume different instruction sets; for instance, the real RAM [PS85] can perform exact arithmetic on real numbers. See also Louis Mak's Ph.D. thesis [Mak95].

In the RAM model, memory is organized as a potentially unbounded array of locations, numbered 1, 2, 3, ..., each of which can store an arbitrarily large integer value. The memory organization of the RAM is uniform; i.e., it takes the same amount of time to access any location in memory.

While the RAM model approximates a real computer fairly well, in some cases it has been observed empirically that algorithms (and data structures) behave much worse than predicted on the RAM model: their running times are substantially larger than what even a careful analysis on the RAM model would predict, because of memory effects such as paging and caching.

In the following subsections, we review the hierarchical memory organization of modern computers and how it leads to memory effects, so that the cost of accessing memory becomes a significant part of the total running time of an algorithm. We survey empirical observations of these memory effects, and the study of data structures and algorithms that attempt to overcome bottlenecks due to slow memory.

2.2.1.1 Modern computer organization

Modern computers have a hierarchical memory organization [HP96].
Memory is organized into levels such as the processor's registers, the cache (primary and secondary), main memory, secondary storage, and even distributed memory. The first few levels of the memory hierarchy, comprising the CPU registers, cache, and main memory, are realized in silicon components, i.e., hardware devices such as integrated circuits. This type of fast memory is called "internal" storage, while the slower magnetic disks, CD-ROMs, and tapes used to realize secondary and tertiary storage comprise the "external" storage.

Registers have the smallest access time, and magnetic disks and tapes are the slowest. Typically, the memory in one level is an order of magnitude faster than in the next level. For instance, access times for registers and cache memory are a few nanoseconds, while accessing main memory takes tens of nanoseconds. The sizes (numbers of memory locations) of the levels also increase by an order of magnitude from one level to the next. For instance, typical cache sizes are measured in kilobytes, while main memory sizes are of the order of megabytes and larger. The reason for these differences is that faster memory is more expensive to manufacture and is therefore available in smaller quantities.

Most multi-programmed systems allow the simultaneous execution of programs in a time-sharing fashion even when the sum of the memory requirements of the programs exceeds the amount of physical main memory available. Such systems implement virtual memory: not all data items referenced by a program need reside in main memory. The virtual address space, which is much larger than the real address space, is usually partitioned into pages. Pages can reside either in main memory or on disk. When the processor references an address belonging to a page not currently in main memory, the page must be loaded from disk into main memory.
Therefore, the time to access a memory location also depends on whether the corresponding page of virtual memory is currently in main memory. Consequently, the memory organization is highly non-uniform, and the assumption of uniform memory cost in the RAM model is unrealistic.

2.2.1.2 Locality of reference

Many algorithms exhibit the phenomenon of spatial and temporal locality [Smi82]. Data items are accessed in regular patterns, so that the next item to be accessed is very likely to be one that is stored close to the last few items accessed. This phenomenon is called spatial locality. It occurs because data items that are logically "close" to each other also tend to be stored close together in memory. For instance, an array is a typical data structure used to represent a list of related items of the same type, and consecutive array elements are stored in adjacent memory locations. (See, however, Chatterjee et al. [CJLM99] for a study of the advantages of a nonlinear layout of arrays in memory. Also, architectures with interleaved memory store consecutive array elements on different memory devices to facilitate parallel or pipelined access to a block of addresses.)

A data item that is accessed at any time is likely to be accessed again in the near future. For example, the index variable of a loop is probably also used in the body of the loop; therefore, during the execution of the loop, the variable is accessed several times in quick succession. This is the phenomenon of temporal locality.

In addition, the hardware architecture mandates that the processor can operate only on data present in its registers. Therefore, executing an operation requires extra time to move the operands into registers and to store the result back to free up the registers for the next operation.
Typically, data can be moved only between adjacent levels in the memory hierarchy, such as between the registers and the primary cache, the cache and main memory, and the main memory and secondary storage, but not directly between the registers and secondary storage. Therefore, an algorithm designer must make efficient use of available memory, so that data is available in the fastest possible memory location whenever it is required. Of course, moving data around involves extra overhead. The memory allocation problem is further complicated by the dynamic nature of many algorithms.

2.2.1.3 Memory effects

The effects of caches on the performance of algorithms have been observed in a number of contexts. Smith [Smi82] presented a large number of empirical results obtained by simulating the data access patterns of real programs on different cache architectures.

LaMarca and Ladner [LL99] investigated the effect of caches on the performance of sorting algorithms, both experimentally and analytically. The authors showed how to restructure MergeSort, QuickSort, and HeapSort to improve the utilization of the cache and reduce the execution time of these algorithms. Their theoretical prediction of the cache misses incurred closely matches the empirically observed performance. LaMarca and Ladner [LL96] also investigated empirically the performance of heap implementations on different architectures. They presented optimizations to reduce the cache misses incurred by heaps and gave empirical data about how their optimizations affected overall performance on a number of different architectures.

The performance of several algorithms, such as matrix transposition and the FFT, on the virtual memory model was studied by Aggarwal and Chandra [AC88]. The authors modeled virtual memory as a large flat address space that is partitioned into blocks.
Each block of virtual memory is mapped into a block of real memory. A block of memory must be loaded into real memory before it can be accessed. The authors showed that some algorithms must still run slowly even if they were able to predict memory accesses in advance.

2.2.1.4 Complexity of communication

Algorithms that operate on large data sets spend a substantial amount of time accessing data (reading from and writing to memory). Consequently, memory access time (also referred to in the literature as I/O- or communication-time) frequently dominates the computation time. Therefore, the RAM model, which does not account for memory effects, is inadequate for accurately predicting the performance of such algorithms.

Depending on the machine organization, either the time to compute results or the time to read/write data may dominate the running time of an algorithm. A computation graph represents the dependency relationships between data items: there is a directed edge from vertex u to vertex v if the operation that computes the value at v requires that the value at u be already available. For computation on a collection of values whose dependencies form a grid graph, the tradeoff between computation time and memory access time was quantified by Papadimitriou and Ullman [PU87].

The I/O-complexity of an algorithm is the cost of inputs and outputs between faster internal memory and slower secondary memory. Aggarwal and Vitter [AV88] proved tight upper and lower bounds on the I/O-complexity of sorting, computing the FFT, permuting, and matrix transposition. Hong and Kung [HK81] introduced an abstract model of pebbling a computation graph to analyze the I/O-complexity of algorithms. The vertices of the graph that hold pebbles represent data items that are loaded into main memory.
With a limited number of pebbles available, the number of moves needed to transfer all the pebbles from the input vertices to the output vertices of the computation graph is the number of I/O operations between main memory and external memory.

Interprocessor communication is a significant bottleneck in multiprocessor architectures, and it becomes more severe as the number of processors increases. In fact, depending on the degree of parallelism of the problem itself, the communication time between processors frequently limits the execution speed. Aggarwal et al. [ACS90] proposed the LPRAM model for parallel random access machines, which incorporates both the computational power and the communication delay of parallel architectures. For this model, they proved upper bounds on both computation time and communication steps using p processors for a number of algorithms, including matrix multiplication, sorting, and computing an n-point FFT.

2.2.2 External memory algorithms

Vitter [Vit] surveyed the state of the art in the design and analysis of data structures and algorithms that operate on data sets too large to fit in main memory. These algorithms try to reduce the performance bottleneck of accesses to slower external memory.

There has been considerable interest in the area of I/O-efficient algorithms for a long time. Knuth [Knu73] investigated sorting algorithms that work on files too large to fit in fast internal memory. For example, when the file to be sorted is stored on a sequential tape, the process of loading blocks of records into internal memory, sorting them there, and using the tape to merge the sorted blocks turns out, quite naturally, to be more efficient than running a sorting algorithm on the entire file.
Grossman and Silverman [GS73] considered the very general problem of storing records on a secondary storage device to minimize expected retrieval time, when the probability of accessing any record is known in advance. The authors model the pattern of accesses by means of a parameter that characterizes the degree to which the accesses are sequential in nature.

There has been interest in the numerical computing field in improving the performance of algorithms that operate on large matrices [CS]. A successful strategy is to partition the matrix into rectangular blocks, each block small enough to fit entirely in main memory or cache, and to operate on the blocks independently. The same blocking strategy has been employed for graph algorithms [ABCP98, CGG+95, NGV96]. The idea is to cover an input graph with subgraphs; each subgraph is a small-diameter neighborhood of vertices just big enough to fit in main memory. A computation on the entire graph can be performed by loading each neighborhood subgraph into main memory in turn, computing the final results for all vertices in the subgraph, and storing back the results.

Gil and Itai [GI99] studied the problem of storing a binary tree in a virtual memory system to minimize the number of page faults. They considered the problem of allocating the nodes of a given binary tree (not necessarily a search tree) to virtual memory pages, called a packing, to optimize the cache performance for some pattern of accesses to the tree nodes. The authors investigated the specific model of tree accesses in which a node is accessed only via the path from the root to that node. They presented a dynamic programming algorithm to find a packing that minimizes the number of page faults incurred and the number of different pages visited while accessing a node.
In addition, the authors proved that the problem of finding an optimal packing that also uses the minimum number of pages is NP-complete, but they presented an efficient approximation algorithm.

2.2.3 Non-uniform memory architecture

In a non-uniform memory architecture (NUMA), each processor contains a portion of the shared memory, so access times to different parts of the shared address space can vary, sometimes significantly.

NUMA architectures have been proposed for large-scale multiprocessor computers. For instance, Wilson [Wil87] proposed an architecture with hierarchies of shared buses and caches. The author proposed extensions of cache coherency protocols to maintain cache coherency in this model and presented simulations to demonstrate that a 128-processor computer could be constructed using this architecture that would achieve a substantial fraction of its peak performance.

A related architecture proposed by Hagersten et al. [HLH92], called the Cache-Only Memory Architecture (COMA), is similar to a NUMA in the sense that each processor holds a portion of the shared address space. In the COMA, however, the allocation of the shared address space among the processors can be dynamic. All of the distributed memory is organized like large caches. The cache belonging to each processor serves two purposes: it caches recently accessed data for the processor itself, and it also contains a portion of the shared memory. A coherence protocol is used to manage the caches.

2.2.4 Models for non-uniform memory

One motivation for a better model of computation is the desire to model real computers more accurately. We want to be able to design and analyze algorithms, predict their performance, and characterize the hardness of problems. Consequently, we want a simple, elegant model that provides a faithful abstraction of an actual computer.
Below, we survey the theoretical models of computation that have been proposed to model memory effects in actual computers.

The seminal paper by Aggarwal et al. [AACS87] introduced the Hierarchical Memory Model (HMM) of computation with logarithmic memory access cost, i.e., access to the memory location at address a takes time Θ(log a). The HMM model is realistic enough to model a computer with multiple levels in the memory hierarchy; it conforms to our intuition that successive levels of memory become slower but bigger. Standard polynomial-time RAM algorithms can run on this HMM model with an extra factor of at most O(log n) in the running time. The authors showed that some algorithms can be rewritten to reduce this factor by taking advantage of locality of reference, while other algorithms cannot be improved asymptotically.

Aggarwal et al. [ACS87] proposed the Hierarchical Memory model with Block Transfer (HMBT) as a better model that incorporates the cost of data transfer between levels in the memory hierarchy. The HMBT model allows data to be transferred between levels in blocks in a pipelined manner, so that a transfer takes only constant time per unit of memory after the initial item in the block. The authors considered variants of the model with different memory access costs: f(a) = log a, f(a) = a^β for 0 < β < 1, and f(a) = a.

Aggarwal and Chandra [AC88] proposed a model VM_f for a computer with virtual memory. The virtual memory in the VM_f model consists of a hierarchical partitioning of memory into contiguous intervals or blocks. Some subset of the blocks at any level is stored in faster (real) memory at any time. The blocks and sub-blocks of virtual memory are used to model disk blocks, pages of real memory, cache lines, etc.
The authors' model for the real memory is the HMBT model BT_f, in which blocks of real memory can be transferred between memory levels at unit time per location after the initial access, i.e., in a pipelined manner. The VM_f is considered a higher-level abstraction on which to analyze application programs, while the running time is determined by the time taken by the underlying block transfers. In both models, the VM_f and the BT_f, the parameter f is a memory cost function representing the cost of accessing a location in real or virtual memory.

The Uniform Memory Hierarchy (UMH) model of computation proposed by Alpern et al. [ACFS94] incorporates a number of parameters that model the hierarchical nature of computer memory. Like the HMBT, the UMH model allows data transfers between successive memory levels via a bus. The transfer cost along a bus is parameterized by the bandwidth of the bus. Other parameters include the size of a block and the number of blocks in each level of memory.

Regan [Reg96] introduced the Block Move (BM) model of computation, which extended the ideas of the HMBT model proposed by Aggarwal et al. [ACS87]. The BM model allows more complex operations, such as shuffling and reversing of blocks of memory, as well as the ability to apply finite transductions other than "copy" to a block of memory. The memory-access cost of a block transfer, similar to that in the HMBT model, is unit cost per location after the initial access. Regan proved that different variants of the model are equivalent up to constant factors in the memory-access cost. He studied complexity classes for the BM model and compared them with standard complexity classes defined for the RAM and the Turing machine.
Two extensions of the HMBT model, the Parallel HMBT (P-HMBT) and the pipelined P-HMBT (PP-HMBT), were investigated by Juurlink and Wijshoff [JW94]. In these models, data transfers between memory levels may proceed concurrently. The authors proved tight bounds on the total running time of several problems on the P-HMBT model with access cost function f(a) = ⌊log a⌋. The P-HMBT model is identical to the HMBT model except that block transfers of data are allowed to proceed in parallel between memory levels, and a transfer can take place only between successive levels. In the PP-HMBT model, different block transfers involving the same memory location can be pipelined. The authors showed that the P-HMBT and HMBT models are incomparable in strength, in the sense that there are problems that can be solved faster on one model than on the other; however, the PP-HMBT model is strictly more powerful than both the HMBT and the P-HMBT models.

A number of models have also been proposed for parallel computers with hierarchical memory. Valiant [Val89] proposed the Bulk-Synchronous Parallel (BSP) model as an abstract model for designing and analyzing parallel programs. The BSP model consists of components that perform computation and memory access tasks, and a router that delivers messages point-to-point between the components. There is a facility to synchronize all or a subset of the components at the end of each superstep. The model emphasizes the separation of the task of computation from the task of communication between components. The purpose of the router is to implement access by the components to shared memory in parallel.
In [Val90], Valiant argues that the BSP model can be implemented efficiently in hardware; therefore, it serves both as an abstract model for designing, analyzing, and implementing algorithms and as a realistic architecture realizable in hardware.

Culler et al. [CKP+96] proposed the LogP model of a distributed-memory multiprocessor machine in which processors communicate by point-to-point messages. The performance characteristics of the interconnection network are modeled by four parameters L, o, g, and P: L is the latency incurred in transmitting a message over the network, o is the overhead during which the processor is busy transmitting or receiving a message, g is the minimum gap (time interval) between consecutive message transmissions or receptions by a processor, and P is the number of processors or memory modules. The LogP model does not model local architectural features, such as caches and pipelines, at each processor.

For a comprehensive discussion of computational models, including models for hierarchical memory, we refer the reader to the book by Savage [Sav98].

For the rest of this thesis, we focus on a generalization of the HMM model of Aggarwal et al. [AACS87] in which the memory cost function can be an arbitrary nondecreasing function, not just logarithmic. Now that we have a more realistic model of computation, our next goal is to re-analyze existing algorithms and data structures and either prove that they are still efficient in this new model or design better ones. In the cases where we observe worse performance on the new model, we would also like to prove nontrivial lower bounds. This leads to the primary interest of this thesis: the problem of constructing minimum-cost binary search trees on a hierarchical memory model of computation.
CHAPTER 3

Algorithms for Constructing Optimum and Nearly Optimum Binary Search Trees

3.1 The HMM model

Our version of the HMM model of computation consists of a single processor with a potentially unbounded number of memory locations with addresses 1, 2, 3, .... We identify a memory location by its address. A location in memory can store a finite but arbitrarily large integer value.

The processor can execute any instruction in constant time, not counting the time spent reading from or writing to memory. Some instructions read operands from memory or write results into memory. Such instructions can address any memory location directly by its address; this is called "random access" to memory, as opposed to sequential access. At most one memory location can be accessed at a time. The time taken to read a memory location and the time taken to write it are the same.

The HMM is controlled by a program consisting of a finite sequence of instructions. The state of the HMM is defined by the sequence number of the current instruction and the contents of memory. In the initial state, the processor is just about to execute the first instruction in its program. If the length of the binary representation of the input is n, then memory locations 1 through n contain the input, and all memory locations at higher addresses contain zeros. The program is not stored in memory but encoded in the processor's finite control.

The memory organization of the HMM model is dramatically different from that of the RAM. On the HMM, accessing different memory locations may take different amounts of time. Memory is organized in a hierarchy, from fastest to slowest. Within each level of the hierarchy, the cost of accessing a memory location is the same. More precisely, the memory of the HMM is organized into a hierarchy M_1, M_2, ..., M_h with h distinct levels, where M_l denotes the set of memory locations in level l for 1 ≤ l ≤ h. Let m_l = |M_l| be the number of memory locations in M_l. The time to access every location in M_l is the same; let c_l be the time taken to access a single memory location in M_l. Without loss of generality, the levels in the memory hierarchy are organized from fastest to slowest, so that c_1 < c_2 < ... < c_h. We will refer to the memory locations with the lowest cost of access, c_1, as the "cheapest" memory locations.

For an HMM, we define a memory cost function µ : N → N that gives the cost µ(a) of a single access to the memory location at address a. The function µ is the following increasing step function:

µ(a) = c_1 for 0 < a ≤ m_1,
µ(a) = c_2 for m_1 < a ≤ m_1 + m_2,
µ(a) = c_3 for m_1 + m_2 < a ≤ m_1 + m_2 + m_3,
...
µ(a) = c_h for Σ_{l=1}^{h−1} m_l < a ≤ Σ_{l=1}^{h} m_l.

We do not make any assumptions about the relative sizes of the levels in the hierarchy, although we expect that m_1 < m_2 < ... < m_h in an actual computer.

A memory configuration with s locations is a sequence C_s = ⟨n_l | 1 ≤ l ≤ h⟩, where each n_l is the number of memory locations from level l of the memory hierarchy and Σ_{l=1}^{h} n_l = s.

The running time of a program on the HMM model consists of the time taken by the processor to execute the instructions of the program plus the time taken to access memory. Clearly, if even the fastest memory on the HMM is slower than the uniform-cost memory of the RAM, then the same program will take longer on the HMM than on the RAM. Assume that the RAM memory is unit cost per access, and that 1 ≤ c_1 < c_2 < ... < c_h. Then the running time of an algorithm on the HMM will be at most c_h times that on the RAM.
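The step function µ defined above is straightforward to realize; the sketch below is our own illustration (the function names are not from the thesis), using a binary search over the cumulative level sizes:

```python
import bisect
import itertools

def make_mu(sizes, costs):
    """sizes: [m_1, ..., m_h]; costs: [c_1, ..., c_h] with c_1 < ... < c_h.
    Returns a function mu(a) giving the cost of one access to address a."""
    assert len(sizes) == len(costs)
    # boundaries[l] = m_1 + ... + m_{l+1}: the largest address in level l+1
    boundaries = list(itertools.accumulate(sizes))
    def mu(a):
        if not 0 < a <= boundaries[-1]:
            raise ValueError("address outside of memory")
        # index of the first level whose cumulative size reaches address a
        return costs[bisect.bisect_left(boundaries, a)]
    return mu
```

For example, with three levels of sizes (4, 12, 48) and costs (1, 10, 100), addresses 1 through 4 cost 1, addresses 5 through 16 cost 10, and addresses 17 through 64 cost 100, matching the step function above.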
An interesting question is whether an algorithm can be redesigned to take advantage of locality of reference, so that its running time on the HMM is less than c_h times its running time on the RAM.

3.2 The HMM2 model

The Hierarchical Memory Model with two memory levels (HMM2) is the special case of the general HMM model with h = 2. In the HMM2, memory is organized in a hierarchy consisting of only two levels, denoted by M_1 and M_2. There are m_1 locations in M_1 and m_2 locations in M_2; the total number of memory locations is m_1 + m_2 = n. A single access to any location in M_1 takes time c_1, and an access to any location in M_2 takes time c_2, with c_1 < c_2. We will refer to the memory locations in M_1 as the "cheaper" or "less expensive" locations.

3.3 Optimum BSTs on the HMM model

We study the following problem for the HMM model with n memory locations and an arbitrary memory cost function µ : {1, 2, ..., n} → N.

Problem 5. [Constructing an optimum BST on the HMM] Suppose we are given a set of n keys x_1, x_2, ..., x_n in order, the probabilities p_i for 1 ≤ i ≤ n that a search argument y equals x_i, and the probabilities q_j for 0 ≤ j ≤ n that x_j ≺ y ≺ x_{j+1}. The problem is to construct a binary search tree T over the set of keys and to compute a memory assignment function φ : V(T) → {1, 2, ..., n} that assigns the (internal) nodes of T to memory locations, such that the expected cost of a search is minimized.

Let ⟨T, φ⟩ denote a potential solution to the above problem: T is the combinatorial structure of the tree, and the memory assignment function φ maps the internal nodes of T to memory locations. If v is an internal node of T, then φ(v) is the address of the memory location where v is stored, and µ(φ(v)) is the cost of a single access to v.
If v stores the key x_i, then we will sometimes write φ(x_i) for φ(v). On the other hand, if v is an external node of T, then such a node does not actually exist in the tree; however, it does contribute to the probability that its parent node is accessed. Therefore, for an external node v, we use φ(v) to denote the memory location where the parent of v is stored.

Let T_v denote the subtree of T rooted at v. Now T_v is a binary search tree over some subset, say {x_i, x_{i+1}, ..., x_j}, of keys; let w(T_v) denote the sum of the corresponding probabilities:

    w(T_v) = w_{i,j} = Σ_{k=i}^{j} p_k + Σ_{k=i−1}^{j} q_k.

(If v is the external node z_j, we use the convention that v is a subtree over the empty set of keys from x_{j+1} through x_j, and w(T_v) = w_{j+1,j} = q_j.) Therefore, w(T_v) is the probability that the search for a key in T proceeds anywhere in the subtree T_v.

On the HMM model, making a single comparison of the search argument y with the key x_i incurs, in addition to the constant computation time, a cost of µ(φ(x_i)) for accessing the memory location where the corresponding node of T is stored. By the cost of ⟨T, φ⟩, we mean the expected cost of a search:

    cost(⟨T, φ⟩) = Σ_{i=1}^{n} w(T_{x_i}) · µ(φ(x_i)) + Σ_{j=0}^{n} w(T_{z_j}) · µ(φ(z_j))    (3.1)

where the first summation is over all n internal nodes x_i of T and the second summation is over the n + 1 external nodes z_j.

Here is another way to derive the above formula: the search algorithm accesses the node v whenever the search proceeds anywhere in the subtree rooted at v, and the probability of this event is precisely w(T_v) = w_{i,j}. The contribution of the node v to the total cost is the probability w(T_v) of accessing v times the cost µ(φ(v)) of a single access to the memory location containing v.
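As a concrete check of equation (3.1), the following sketch evaluates cost(⟨T, φ⟩) for a tree given as nested tuples. The representation and the function name search_cost are ours, not the thesis's; external nodes are charged at their parent's location, exactly as the convention above prescribes.

```python
def search_cost(tree, p, q, phi, mu):
    """Expected search cost of <T, phi> per equation (3.1).  An internal node
    is a tuple (k, left, right) storing key x_k; an external node z_j is the
    bare integer j.  p[1..n] and q[0..n] are the probabilities (p[0] unused);
    phi[k] is the address of the node storing x_k; mu(a) is the access cost.
    External node z_j contributes q_j times the cost of its parent's location."""
    def walk(node, parent_addr):
        if isinstance(node, int):                # external node z_j
            w = q[node]
            return w, w * mu(parent_addr)        # charged at the parent
        k, left, right = node
        addr = phi[k]
        wl, cl = walk(left, addr)
        wr, cr = walk(right, addr)
        w = wl + wr + p[k]                       # w(T_v) for this subtree
        return w, cl + cr + w * mu(addr)
    return walk(tree, None)[1]
```

With the uniform cost mu ≡ 1, this reduces to the familiar RAM search cost; a non-uniform mu weights each subtree access by the cost of the location storing its root.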
The pair ⟨T*, φ*⟩ is an optimum solution to an instance of problem 5 if cost(⟨T*, φ*⟩) is minimum over all binary search trees T and functions φ assigning the nodes of T to memory locations. We show below in Lemma 7 that for a given tree T there is a unique function φ that optimally assigns nodes of T to memory locations.

It is easy to see that on the standard RAM model, where every memory access takes unit time, equation (3.1) is equivalent to equation (1.2). Each node v contributes once to the sum on the right side of (3.1) for each of its ancestors in T.

3.3.1 Storing a tree in memory optimally

The following lemmas show that the problem of constructing optimum BSTs specifically on the HMM model is interesting because of the interplay between the two parameters, the combinatorial structure of the tree and the memory assignment; restricted versions of the general problem have simple solutions. Consider the following restriction of problem 5 with the combinatorial structure of the BST T fixed.

Problem 6. Given a binary search tree T over the set of keys x_1 through x_n, compute an optimum memory assignment function φ : V(T) → {1, 2, ..., n} that assigns the nodes of T to memory locations such that the expected cost of a search is minimized.

Let π(v) denote the parent of the node v in T; if v is the root, then let π(v) = v. Let φ* denote an optimum memory assignment function that assigns the nodes of T to locations in memory.

Lemma 7. With T fixed, for every node v of T, µ(φ*(π(v))) ≤ µ(φ*(v)).

In other words, for a fixed BST T, there exists an optimal memory assignment function that assigns every node of T to a memory location that is no more expensive than the memory locations assigned to its children.

Proof: Assume to the contrary that for a particular node v, we have µ(φ*(π(v))) > µ(φ*(v)).
The contribution of v and π(v) to the total cost of the tree in the summation (3.1) is

    w(T_{π(v)}) · µ(φ*(π(v))) + w(T_v) · µ(φ*(v)).

The node π(v) is accessed whenever the search proceeds anywhere in the subtree rooted at π(v), and likewise with v. Since each p_i, q_j ≥ 0, π(v) is accessed at least as often as v, i.e., w(T_{π(v)}) ≥ w(T_v). Therefore, since µ(φ*(v)) < µ(φ*(π(v))) by our assumption,

    w(T_{π(v)}) · µ(φ*(v)) + w(T_v) · µ(φ*(π(v))) ≤ w(T_{π(v)}) · µ(φ*(π(v))) + w(T_v) · µ(φ*(v)),

so we can swap the memory locations where v and its parent π(v) are stored without increasing the cost of the solution. As a consequence, the root of any subtree is stored in the cheapest memory location among all nodes in that subtree.

Lemma 8. For fixed T, the optimum memory assignment function φ* can be determined by a greedy algorithm. The running time of this greedy algorithm is O(n log n) on the RAM.

Proof: It follows from Lemma 7 that under some optimum memory assignment, the root of the tree must be assigned the cheapest available memory location. Again from the same lemma, the next cheapest available location can be assigned only to one of the children of the root, and so on. The following algorithm implements this greedy strategy. By the weight of a node v in the tree, we mean the sum of the probabilities of all nodes in the subtree rooted at v, i.e., w(T_v). The value w(T_v) can be computed for every subtree T_v in linear time and stored at v. We maintain the set of candidates for the next cheapest location in a heap ordered by their weights. Among all candidates, the optimum choice is to assign the cheapest location to the heaviest vertex.
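The greedy strategy of Lemma 8 might be sketched as follows, assuming the subtree weights w(T_v) have already been computed; the names greedy_assignment, children, and addresses are our conventions for this sketch.

```python
import heapq

def greedy_assignment(root, children, w, addresses):
    """Lemma 8's greedy rule: repeatedly assign the next-cheapest address to
    the heaviest candidate, where the candidates are children of nodes that
    have already been placed.  children[v] = (left, right), either child
    possibly None; w[v] = w(T_v); addresses lists one address per node,
    cheapest first.  Returns the assignment phi as a dict node -> address."""
    heap = [(-w[root], root)]            # max-heap via negated weights
    phi = {}
    for addr in addresses:               # addresses in nondecreasing cost
        _, v = heapq.heappop(heap)       # heaviest candidate
        phi[v] = addr
        for child in children[v]:
            if child is not None:
                heapq.heappush(heap, (-w[child], child))
    return phi
```

Each node is inserted into and removed from the heap once, giving the O(n log n) bound of the lemma.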
We extract this heaviest vertex, say u, from the top of the heap, store it in the next available memory location, and insert the two children of u into the heap. Initially, the heap contains just the root of the entire tree, and the algorithm continues until the heap is empty. This algorithm performs n insertions and n deletions on a heap containing at most n elements; therefore, its running time on the uniform-cost RAM model is O(n log n).

3.3.2 Constructing an optimum tree when the memory assignment is fixed

Consider the following restriction of problem 5 where the memory assignment function φ is given.

Problem 9. Suppose each of the keys x_i, for 1 ≤ i ≤ n, is assigned a priori a fixed location φ(x_i) in memory. Compute the structure of a binary search tree of minimum cost where every node v_i of the tree corresponding to key x_i is stored in memory location φ(x_i).

Lemma 10. Given a fixed assignment of keys to memory locations, i.e., a function φ from the set of keys (equivalently, the set of nodes of any BST T) to the set of memory locations, the BST T* of minimum cost can be constructed by a dynamic programming algorithm. The running time of this algorithm is O(n^3) on the RAM.

Proof: The principle of optimality clearly applies here, so a BST is optimum if and only if each subtree is optimum. The standard dynamic programming algorithm proceeds as follows. Let cost(T*_{i,j}) denote the cost of an optimum BST over the keys x_i, x_{i+1}, ..., x_j and the corresponding probabilities p_i, p_{i+1}, ..., p_j and q_{i−1}, q_i, ..., q_j, given the fixed memory assignment φ. By the principle of optimality,

    cost(T*_{i,j}) = min_{i ≤ k ≤ j} [ w_{i,j} · µ(φ(x_k)) + cost(T*_{i,k−1}) + cost(T*_{k+1,j}) ]   for i ≤ j
    cost(T*_{i+1,i}) = w_{i+1,i} = q_i.   (3.2)
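Recurrence (3.2) translates directly into an O(n^3) table-filling sketch. The function name opt_bst_fixed_phi and the argument mu_of_key (the precomputed values µ(φ(x_k))) are our conventions, not the thesis's.

```python
def opt_bst_fixed_phi(p, q, mu_of_key):
    """Cost of an optimum BST when key x_k is pinned to a location of access
    cost mu_of_key[k] (Lemma 10).  p[1..n] and q[0..n] are the probabilities
    (p[0] unused).  Implements recurrence (3.2), filling the tables by
    diagonals d = j - i, with base case cost(T*_{i+1,i}) = q_i."""
    n = len(p) - 1
    w, c = {}, {}
    for i in range(1, n + 2):
        w[(i, i - 1)] = c[(i, i - 1)] = q[i - 1]
    for d in range(n):                       # d = j - i
        for i in range(1, n - d + 1):
            j = i + d
            w[(i, j)] = w[(i, j - 1)] + p[j] + q[j]
            c[(i, j)] = min(w[(i, j)] * mu_of_key[k]
                            + c[(i, k - 1)] + c[(k + 1, j)]
                            for k in range(i, j + 1))
    return c[(1, n)]
```

The three nested ranges (d, i, k) give the O(n^3) bound of the lemma.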
Recall that w_{i,j} is the probability that the root of this subtree is accessed, and µ(φ(x_k)) is the cost of a single access to the memory location φ(x_k) where x_k is stored. Notice that this expression is equivalent to equation (2.1) except for the multiplicative factor µ(φ(x_k)). Therefore, algorithm K1 from section 2.1.1.1 can be used to construct the optimum binary search tree efficiently, given an assignment of keys to memory locations. In general, it does not seem possible to use a monotonicity principle to reduce the running time to O(n^2), as in algorithm K2 of section 2.1.1.1.

3.3.3 Naive algorithm

A naive algorithm for problem 5 is to try every possible mapping of keys to memory locations. Lemma 10 guarantees that we can then use dynamic programming to construct an optimum binary search tree for that memory assignment. We select the minimum-cost tree over all possible memory assignment functions. There are

    ( n choose m_1, m_2, ..., m_h )

such mappings from n keys to n memory locations with m_1 of the first type, m_2 of the second type, and so on. The multinomial coefficient is maximized when m_1 = m_2 = ... = m_{h−1} = ⌊n/h⌋. The dynamic programming algorithm takes O(n^3) time to compute the optimum BST for each fixed memory assignment. Hence, the running time of the naive algorithm is

    O( n! / ((n/h)!)^h · n^3 )
      = O( √(2πn) (n/e)^n / ( √(2π(n/h)) ((n/h)/e)^{n/h} )^h · n^3 )   (using Stirling's approximation)
      = O( √(2πn) / ( √(2π(n/h)) )^h · h^n · n^3 )
      = O( h^{h/2} / (2πn)^{(h−1)/2} · h^n · n^3 )
      = O( h^{n+h/2} · n^{3−(h−1)/2} / (2π)^{(h−1)/2} )
      = O( h^n · n^3 ).   (3.3)

Unfortunately, this algorithm is inefficient, and therefore infeasible even for small values of n, because its running time is exponential in n. We develop much more efficient algorithms in the following sections.
3.3.4 A dynamic programming algorithm: algorithm Parts

A better algorithm uses dynamic programming to construct optimum subtrees bottom-up, like algorithm K1 from section 2.1.1.1. Our new algorithm, algorithm Parts, constructs an optimum subtree T*_{i,j} for each i, j such that 1 ≤ i ≤ j ≤ n and for every memory configuration ⟨n_1, n_2, ..., n_h⟩ consisting of the j − i + 1 memory locations available at this stage in the computation. For each possible choice x_k for the root of the subtree T_{i,j}, there are at most j − i + 2 ≤ n + 1 different ways to partition the number of available locations in each of h − 1 levels of the memory hierarchy between the left and right subtrees of x_k. (Since the number of memory locations assigned to any subtree equals the number of nodes in the subtree, we have the freedom to choose only the number of locations from any h − 1 levels, because the number of locations from the remaining level is then determined.)

We modify algorithm K1 from section 2.1.1.1 as follows. Algorithm K1 builds larger and larger optimum subtrees T*_{i,j} for all i, j such that 1 ≤ i ≤ j ≤ n. For every choice of i and j, the algorithm iterates through the j − i + 1 choices for the root of the subtree from among {x_i, x_{i+1}, ..., x_j}. The left subtree of T*_{i,j} with x_k at the root is a BST, say T^(L), over the keys x_i through x_{k−1}, and the right subtree is a BST, say T^(R), over the keys x_{k+1} through x_j. The subtree T_{i,j} has j − i + 1 nodes. Suppose the number of memory locations available for the subtree T_{i,j} from each of the memory levels is n_l for 1 ≤ l ≤ h, where Σ_{l=1}^{h} n_l = j − i + 1. There are

    ( (j − i + 1) + h − 1 choose h − 1 ) = ( j − i + h choose h − 1 ) = O( (n + h)^{h−1} / (h − 1)! ) = O( 2^{h−1} / (h − 1)! · n^{h−1} )   (since h ≤ n)

different ways to partition j − i + 1 objects into h parts without restriction, and therefore at most as many different memory configurations with j − i + 1 memory locations. (There are likely to be far fewer different memory configurations, because there are at most m_1 memory locations from the first level, at most m_2 from the second, and so on, in any configuration.)

Let λ be the smallest integer such that n_λ > 0; in other words, the cheapest available memory location is from memory level λ. For every choice of i, j, and k, there are at most min{k − i + 1, n_λ} ≤ n different choices for the number of memory locations from level λ to be assigned to the left subtree, T^(L). This is because the left subtree, with k − i nodes, can be assigned any number from zero to min{k − i, n_λ − 1} locations from the first available memory level, M_λ. (Only at most n_λ − 1 locations from M_λ are available after the root x_k is stored in the cheapest available location.) The remaining locations from M_λ available to the entire subtree are assigned to the right subtree, T^(R). Likewise, there are at most min{k − i + 1, n_{λ+1} + 1} ≤ n + 1 different choices for the number of ways to partition the available memory locations from the next memory level M_{λ+1} between the left and right subtrees, and so on. In general, the number of memory locations from memory level l assigned to the left subtree, n^(L)_l, ranges from 0 to at most n_l; correspondingly, the number of memory locations from level l assigned to the right subtree is n^(R)_l = n_l − n^(L)_l. We modify algorithm K1 by inserting h − λ ≤ h − 1 more nested loops that iterate through every such way to partition the available memory locations from M_λ through M_{h−1} between the left and right subtrees of T_{i,j} for a fixed choice of x_k as the root.
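The memory configurations that algorithm Parts iterates over can be enumerated recursively; this sketch (our names, not the thesis's) also illustrates why the level capacities m_l make the true count smaller than the unrestricted binomial bound.

```python
def configurations(s, capacities):
    """All memory configurations <n_1, ..., n_h> with n_1 + ... + n_h = s and
    0 <= n_l <= capacities[l-1], i.e. respecting the level sizes m_l.
    A plain recursive enumeration: each level takes some share, and the
    last level absorbs whatever remains if it fits."""
    h = len(capacities)
    def rec(level, remaining):
        if level == h - 1:                    # last level takes the remainder
            if remaining <= capacities[level]:
                yield (remaining,)
            return
        for n in range(min(remaining, capacities[level]) + 1):
            for rest in rec(level + 1, remaining - n):
                yield (n,) + rest
    return list(rec(0, s))
```

With no binding capacities, configurations(2, [2, 2]) has C(3, 1) = 3 members; capping the first level at 1 removes ⟨2, 0⟩, exactly the kind of restriction the parenthetical remark above refers to.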
algorithm Parts:

    ( Initialization )
    for i := 0 to n
        Let C_0 be the empty memory configuration ⟨0, 0, ..., 0⟩
        C[i + 1, i, C_0] ← q_i ; R[i + 1, i, C_0] ← Nil
    for d := 0 to n − 1
        ( Construct optimum subtrees with d + 1 nodes. )
        for each memory configuration C of size d + 1
            for i := 1 to n − d
                j ← i + d
                C[i, j, C] ← ∞
                R[i, j, C] ← Nil
                for k := i to j
                    ( Number of nodes in the left and right subtrees. )
                    l ← k − i   ( number of nodes in the left subtree )
                    r ← j − k   ( number of nodes in the right subtree )
                    Call procedure Partition-Memory (figure 3.2) to compute the
                    optimum way to partition the available memory locations.

Figure 3.1 algorithm Parts

procedure Partition-Memory:

    Let C ≡ ⟨n_1, n_2, ..., n_h⟩.
    Let λ be the smallest integer such that n_λ > 0.
    ( One location of cost c_λ is used for the root, so only n_λ − 1
      locations from level λ remain to be partitioned. )
    for n^(L)_λ := 0 to n_λ − 1
        for n^(L)_{λ+1} := 0 to n_{λ+1}
            ...
            for n^(L)_{h−1} := 0 to n_{h−1}
                n^(L)_h ← l − Σ_{i=1}^{h−1} n^(L)_i
                n^(R)_λ ← (n_λ − 1) − n^(L)_λ
                n^(R)_{λ+1} ← n_{λ+1} − n^(L)_{λ+1}
                ...
                n^(R)_{h−1} ← n_{h−1} − n^(L)_{h−1}
                n^(R)_h ← r − Σ_{i=1}^{h−1} n^(R)_i
                Let C_L = ⟨0, ..., 0, n^(L)_λ, n^(L)_{λ+1}, ..., n^(L)_h⟩.
                Let C_R = ⟨0, ..., 0, n^(R)_λ, n^(R)_{λ+1}, ..., n^(R)_h⟩.
                Let T′ be the tree with x_k at the root whose left and right
                subtrees are given by R[i, k − 1, C_L] and R[k + 1, j, C_R]
                respectively, i.e., T′ is the tree

                        x_k
                    T[i, k − 1, C_L]   T[k + 1, j, C_R]

                ( Let c′ be the cost of T′.  The root of T′ is stored in a
                  location of cost c_λ. )
                c′ ← c_λ · w_{i,j} + C[i, k − 1, C_L] + C[k + 1, j, C_R]
                if c′ < C[i, j, C]
                    R[i, j, C] ← ⟨k, C_L⟩
                    C[i, j, C] ← c′

Figure 3.2 procedure Partition-Memory

Just like algorithm K1, algorithm Parts of figure 3.1 constructs arrays R and C, each indexed by a pair i, j such that 1 ≤ i ≤ j ≤ n and by the memory configuration C specifying the numbers of memory locations from each of the h levels available to the subtree T_{i,j}. Let C = ⟨n_1, n_2, ..., n_h⟩. The array entry R[i, j, C] stores the pair ⟨k, C_L⟩, where k is the index of the root of the optimum subtree T*_{i,j} for memory configuration C, and C_L is the optimum memory configuration for the left subtree. In other words, C_L specifies for each l the number of memory locations n^(L)_l, out of the total n_l locations from level l available to the subtree T_{i,j}, that are assigned to the left subtree. The memory configuration C_R of the right subtree is automatically determined: the number of memory locations n^(R)_l from level l assigned to the right subtree is n_l − n^(L)_l, except that one location from the cheapest available memory level is consumed by the root. The structure of the optimum BST and the optimum memory assignment function are stored implicitly in the array R. Let T[i, j, C] denote the implicit representation of the optimum BST over the subset of keys from x_i through x_j for memory configuration C. If R[1, n, C] = ⟨k, C′⟩, then the root of the entire tree is x_k and it is stored in the cheapest available memory location, of cost c_λ. The left subtree is over the subset of keys x_1 through x_{k−1}, and the memory configuration for the left subtree is C′ = ⟨0, ..., 0, n′_λ, n′_{λ+1}, ..., n′_h⟩. The right subtree is over the subset of keys x_{k+1} through x_n, and the memory configuration for the right subtree is ⟨0, ..., 0, (n_λ − 1) − n′_λ, n_{λ+1} − n′_{λ+1}, ..., n_h − n′_h⟩.

In algorithm Parts, there are 3 + (h − 1) = h + 2 nested loops, each of which iterates at most n times, in addition to the loop that iterates over all possible memory configurations of size d + 1 for 0 ≤ d ≤ n − 1. Hence, the running time of the algorithm is

    O( 2^{h−1} / (h − 1)! · n^{h−1} · n^{h+2} ) = O( 2^{h−1} / (h − 1)! · n^{2h+1} ).   (3.4)

3.3.5 Another dynamic programming algorithm: algorithm Trunks

In this subsection, we develop another algorithm that iteratively constructs optimum subtrees T*_{i,j} over larger and larger subsets of keys. Fix an i and j with 1 ≤ i ≤ j ≤ n and j − i = d, and a memory configuration C_{s+1} = ⟨n_1, n_2, ..., n_{h−1}, n_h⟩ consisting of s + 1 memory locations from the first h − 1 levels of the memory hierarchy and none from the last level, i.e., n_1 + n_2 + ... + n_{h−1} = s + 1 and n_h = 0. At iteration s + 1, we require an optimum subtree, over the subset of keys from x_i through x_j, with s of its nodes assigned to memory locations from the first h − 1 levels of the memory hierarchy and the remaining (j − i + 1) − s nodes stored in the most expensive locations. Call the subtree induced by the nodes stored in the first h − 1 memory levels the trunk (short for "truncated") of the tree. (Lemma 7 guarantees that the trunk is itself a tree, and that the root of the entire tree is also the root of the trunk. So, in fact, a trunk with s + 1 nodes is obtained by pruning the tree down to s + 1 nodes by recursively deleting leaves.) We require the optimum subtree T*_{1,n} with Σ_{r=1}^{h−1} m_r = n − m_h nodes in the trunk, all of which are assigned to the n − m_h locations in the cheapest h − 1 memory levels. Recall that m_l is the number of memory locations in memory level l for 1 ≤ l ≤ h.
Algorithm Trunks in figure 3.3 constructs a table indexed by i, j, and C_{s+1}. There are n^2 different choices of i and j such that 1 ≤ i ≤ j ≤ n. Also, there are

    ( (s + 1) + (h − 1) − 1 choose h − 2 ) = ( s + h − 1 choose h − 2 )

different ways to partition s + 1 objects into h − 1 parts without restriction, and therefore at most as many different memory configurations with s + 1 memory locations from the first h − 1 memory levels. (As mentioned earlier, there are likely to be far fewer different memory configurations, because there are restrictions on the number of memory locations from each level in any configuration.)

For every value of k from i to j and every t from 0 to s, we construct a subtree with x_k at the root, t nodes in the trunk of the left subtree (the left trunk), and s − t nodes in the trunk of the right subtree (the right trunk). By Lemma 7, the root x_k of the subtree is always stored in the cheapest available memory location. There are at most ( s choose t ) ways to select t out of the remaining s memory locations to assign to the left trunk. (In fact, since the s memory locations are not necessarily all distinct, there are likely to be far fewer ways to do this.) As t iterates from 0 through s, the total number of ways to partition the available s memory locations and assign them to the left and right trunks is at most

    Σ_{t=0}^{s} ( s choose t ) = 2^s.

When all the nodes of the subtree are stored in memory locations in level h (the base case, s = 0), an optimum subtree T*_{i,j} is one constructed by algorithm K2 from section 2.1.1.1. Therefore, in an initial phase, we execute algorithm K2 to construct, in O(n^2) time, all optimum subtrees T*_{i,j} that fit entirely within one memory level, in particular, the last and most expensive memory level.

algorithm Trunks:

    Initially, the optimum subtree T*_{i,j} is unknown for all i, j, except
    when the subtree fits entirely in memory level M_h, in which case the
    optimum subtree is the one computed by algorithm K2 during the
    initialization phase.
    for d := 0 to n − 1
        for i := 1 to n − d
            j ← i + d
            ( Construct an optimum BST over the subset of keys from x_i through x_j. )
            for k := i to j
                ( Choose x_k to be the root of this subtree. )
                for s := 1 to n − m_h − 1
                    ( Construct a BST with s nodes in its trunk. )
                    for every memory configuration C_s of size s
                        for t := 0 to s
                            ( The left trunk has t nodes. )
                            for every choice of t out of the s memory locations
                            in C_s to assign to the left subtree
                                Let T′ be the BST over the subset of keys from
                                x_i through x_j with x_k at the root, t nodes in
                                the trunk of the left subtree, and s − t nodes in
                                the trunk of the right subtree.  The left subtree
                                of T′ is the previously computed optimum subtree
                                over the keys x_i through x_{k−1} with t nodes in
                                its trunk, and the right subtree of T′ is the
                                previously computed optimum subtree over the keys
                                x_{k+1} through x_j with s − t nodes in its trunk.
                                If the cost of T′ is less than that of the
                                minimum-cost subtree found so far, then record T′
                                as the new optimum subtree.

Figure 3.3 algorithm Trunks

The total running time of the dynamic programming algorithm is therefore

    O( n^2 + Σ_{d=0}^{n−1} Σ_{i=1}^{n−d} Σ_{k=i}^{i+d} Σ_{s=0}^{n−m_h−1} ( s + h − 1 choose h − 2 ) · 2^s ).

Let

    f(n) = Σ_{s=0}^{n−m_h−1} ( s + h − 1 choose h − 2 ) · 2^s.

By definition,

    f(n) ≤ Σ_{s=0}^{n−m_h−1} (s + h − 1)^{h−2} / (h − 2)! · 2^s = 1/(h − 2)! · Σ_{s=0}^{n−m_h−1} (s + h − 1)^{h−2} · 2^s.

Thus, f(n) is bounded above by the sum of a geometric series whose ratio is at most 2(n − m_h + h − 2). Hence, we have

    f(n) ≤ 1/(h − 2)! · 2^{n−m_h} (n − m_h + h − 2)^{n−m_h} / ( 2(n − m_h + h − 2) − 1 )
         = O( 2^{n−m_h} · (n − m_h + h)^{n−m_h} / (h − 2)! ).
Therefore, the running time of the algorithm is

    O( Σ_{d=0}^{n−1} Σ_{i=1}^{n−d} Σ_{k=i}^{i+d} 2^{n−m_h} · (n − m_h + h)^{n−m_h} / (h − 2)! )
      = O( 2^{n−m_h} · (n − m_h + h)^{n−m_h} / (h − 2)! · Σ_{d=0}^{n−1} Σ_{i=1}^{n−d} (d + 1) )
      = O( 2^{n−m_h} · (n − m_h + h)^{n−m_h} · n^3 / (h − 2)! ).   (3.5)

Algorithm Trunks is efficient when n − m_h and h are both small. For instance, consider a memory organization in which the memory cost function grows as the tower function, defined by

    tower(0) = 1
    tower(i + 1) = 2^{tower(i)}   for all i ≥ 0,

so that tower(i + 1) is a tower of i + 1 twos. If µ(a) = tower(a) is the memory cost function, then Σ_{r=1}^{h−1} m_r = n − m_h < lg Σ_{r=1}^{h} m_r = lg n, and h = log* n. For all practical purposes, log* n is a small constant; therefore, the running time bound of equation (3.5) is almost a polynomial in n.

3.3.6 A top-down algorithm: algorithm Split

Suppose there are n distinct memory costs, or n levels in the memory hierarchy with one location in each level. A top-down recursive algorithm to construct an optimum BST has to decide at each step in the recursion how to partition the available memory locations between the left and right subtrees. Note that the number of memory locations assigned to the left subtree determines the number of keys in the left subtree, and therefore identifies the root. So, for example, if k of the available n memory locations are assigned to the left subtree, then there are k keys in the left subtree, and hence the root of the tree is x_{k+1}. At the top level, the root is assigned the cheapest memory location. Each of the remaining n − 1 memory locations can be assigned to either the left or the right subtree, so that k of the n − 1 locations are assigned to the left subtree and n − 1 − k locations to the right subtree, for every k such that 0 ≤ k ≤ n − 1.
Thus, there are 2^{n−1} different ways to partition the available n − 1 memory locations between the two subtrees of the root. The algorithm proceeds recursively to compute the left and right subtrees. The asymptotic running time of the above algorithm is given by the recurrence

    T(n) = 2^{n−1} + max_{0 ≤ k ≤ n−1} { T(k) + T(n − 1 − k) }.

Now, T(n) is at least 2^{n−1}, which is a convex function, and T(n) is a monotonically increasing function of n. Therefore, a simple inductive argument shows that T(n) itself is convex, so that the maximum is achieved at either k = 0 or k = n − 1. At k = 0,

    T(n) = 2^{n−1} + T(0) + T(n − 1),

which is the same value as at k = n − 1. Therefore,

    T(n) ≤ 2^{n−1} + T(0) + T(n − 1) = Σ_{i=0}^{n−1} 2^i = 2^n − 1 = O(2^n).   (3.6)

3.4 Optimum BSTs on the HMM2 model

In this section, we consider the problem of constructing and storing an optimum BST on the HMM2 model. Recall that the HMM2 model consists of m_1 locations in memory level M_1, each of cost c_1, and m_2 locations in memory level M_2, each of cost c_2, with c_1 < c_2.

3.4.1 A dynamic programming algorithm

In this section, we develop a hybrid dynamic programming algorithm to construct an optimum BST. Recall that algorithm K2 of section 2.1.1 constructs an optimum BST for the uniform-cost RAM model in O(n^2) time. It is an easy observation that the structure of an optimum subtree that fits entirely in one memory level is the same as that of the optimum subtree on the uniform-cost RAM model. Therefore, in an initial phase of our hybrid algorithm, we construct optimum subtrees, with at most max{m_1, m_2} nodes, that fit in the largest memory level. In phase II, we construct larger subtrees.
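The closed form T(n) = 2^n − 1 claimed in (3.6) can be confirmed by evaluating the recurrence directly; this quick numerical check is ours, not part of the thesis.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def split_time(n):
    """T(n) = 2^{n-1} + max_{0 <= k <= n-1} (T(k) + T(n-1-k)), with T(0) = 0:
    the running-time recurrence for the top-down algorithm Split."""
    if n == 0:
        return 0
    return 2 ** (n - 1) + max(split_time(k) + split_time(n - 1 - k)
                              for k in range(n))
```

Evaluating the max over all k (rather than assuming it sits at the endpoints) confirms the convexity argument numerically for small n.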
Recall from equation (2.1) that on the uniform-cost RAM model the cost c(i, j) of an optimum BST over the subset of keys from x_i through x_j is given by the recurrence

    c(i + 1, i) = w_{i+1,i} = q_i
    c(i, j) = w_{i,j} + min_{i ≤ k ≤ j} ( c(i, k − 1) + c(k + 1, j) )   for i ≤ j.

On the HMM2 model, the cost of an optimum BST T*_{i,j} over the same subset of keys, given n_1 cheap and n_2 expensive available memory locations, is

    c(i + 1, i, n_1, n_2) = q_i
    c(i, j, n_1, n_2) = min_{i ≤ k ≤ j, 0 ≤ n^(L)_1 ≤ n_1} [ µ(φ(x_k)) · w_{i,j} + c(i, k − 1, n^(L)_1, n^(L)_2) + c(k + 1, j, n^(R)_1, n^(R)_2) ]   for i ≤ j,

where:

• if n_1 > 0, then x_k is stored in a location of cost c_1, and n^(L)_1 + n^(R)_1 = n_1 − 1 and n^(L)_2 + n^(R)_2 = n_2;

• otherwise, n_1 = 0 and n_2 = j − i + 1, so x_k is stored in a location of cost c_2 and the entire subtree is stored in the second memory level; the optimum subtree T*_{i,j} is then the same as the optimum one on the RAM model, constructed during phase I.

The first phase of the algorithm, procedure TL-phase-I, constructs arrays C and R, where C[i, j] is the cost of an optimum BST (on the uniform-cost model) over the subset of keys from x_i through x_j, and R[i, j] is the index of the root of such an optimum BST. The second phase, procedure TL-phase-II, constructs arrays c and r such that c[i, j, n_1, n_2] is the cost of an optimum BST over the subset of keys from x_i through x_j with n_1 and n_2 available memory locations of cost c_1 and c_2 respectively, where n_1 + n_2 = j − i + 1, and r[i, j, n_1, n_2] is the index of the root of such an optimum BST. The structure of the tree can be retrieved in O(n) time from the array r at the end of the execution of algorithm TwoLevel.

3.4.1.1 algorithm TwoLevel

Algorithm TwoLevel first calls procedure TL-phase-I. Recall that procedure TL-phase-I constructs all subtrees T_{i,j} that contain few enough nodes to fit entirely in any one level in the memory hierarchy, specifically the largest level.
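Before turning to the phased implementation, the HMM2 recurrence above can be evaluated by direct memoization. This brute-force sketch (our names; independent of algorithm TwoLevel) is useful mainly as a cross-check on small inputs.

```python
from functools import lru_cache

def hmm2_opt_cost(p, q, c1, c2, m1, m2):
    """Optimum HMM2 search cost c(1, n, m1, m2) by memoizing the recurrence:
    the root of a subtree takes a cheap location (cost c1) whenever one is
    available; otherwise the whole subtree lives in the expensive level.
    p[1..n], q[0..n] are the probabilities (p[0] unused), m1 + m2 = n."""
    n = len(p) - 1
    assert m1 + m2 == n
    w = {}
    for i in range(1, n + 2):
        w[(i, i - 1)] = q[i - 1]
        for j in range(i, n + 1):
            w[(i, j)] = w[(i, j - 1)] + p[j] + q[j]

    @lru_cache(maxsize=None)
    def c(i, j, n1, n2):
        if i > j:
            return q[i - 1]                  # base case: c(i+1, i, ., .) = q_i
        best = float('inf')
        for k in range(i, j + 1):
            L, R = k - i, j - k              # nodes in left and right subtrees
            if n1 > 0:                       # root goes to a cheap location
                for nL1 in range(max(0, (n1 - 1) - R), min(L, n1 - 1) + 1):
                    nR1 = (n1 - 1) - nL1
                    best = min(best, c1 * w[(i, j)]
                               + c(i, k - 1, nL1, L - nL1)
                               + c(k + 1, j, nR1, R - nR1))
            else:                            # whole subtree in the second level
                best = min(best, c2 * w[(i, j)]
                           + c(i, k - 1, 0, L) + c(k + 1, j, 0, R))
        return best

    return c(1, n, m1, m2)
```

The inner range over nL1 enforces that neither subtree receives more cheap locations than it has nodes, mirroring the constraints in the recurrence.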
Entries in the table R[i, j] are filled by procedure TL-phase-I. Procedure TL-phase-II computes optimum subtrees where n_1 and n_2 are both greater than zero. Therefore, prior to invoking procedure TL-phase-II, algorithm TwoLevel initializes the entries in the table r[i, j, n_1, n_2] for which n_1 = 0 or n_2 = 0 from the entries in the table R[i, j].

3.4.1.2 Procedure TL-phase-I

Procedure TL-phase-I is identical to algorithm K2 from section 2.1.1.1 except that the outermost loop involving d iterates only max{m_1, m_2} times in procedure TL-phase-I. Procedure TL-phase-I computes optimum subtrees in a bottom-up fashion. It fills entries in the tables C[i, j] and R[i, j] by diagonals, i.e., in order of increasing d = j − i. The size of the largest subtree that fits entirely in one memory level is max{m_1, m_2}, corresponding to d = max{m_1, m_2} − 1.

algorithm TwoLevel:

    Call procedure TL-phase-I (figure 3.5)
    If either m_1 = 0 or m_2 = 0, then we are done.
    Otherwise,
        Initialize, for all i, j such that 1 ≤ i ≤ j ≤ n:
            r[i, j, 0, j − i + 1] ← R[i, j]
            r[i, j, j − i + 1, 0] ← R[i, j]
            c[i, j, 0, j − i + 1] ← c_2 · C[i, j]
            c[i, j, j − i + 1, 0] ← c_1 · C[i, j]
        Call procedure TL-phase-II (figure 3.6)

Figure 3.4 algorithm TwoLevel

For every i, j with j − i = d, TL-phase-I computes the cost of a subtree T′ with x_k at the root for all k such that R[i, j − 1] ≤ k ≤ R[i + 1, j]. Note that (j − 1) − i = j − (i + 1) = d − 1; therefore, entries R[i, j − 1] and R[i + 1, j] are already available during this iteration of the outermost loop. The optimum choice for the root of this subtree is the value of k for which the cost of the subtree is minimized.

3.4.1.3 Procedure TL-phase-II

Procedure TL-phase-II is an implementation of algorithm Parts in section 3.3.4 for the special case h = 2.
pr ocedure T L-phase -I I also constructs in- creasingly larger optim um subtrees in an iterative fa shion. The additio nal complexit y in this algo rithm arises from the fact that for each p ossible choice of ro ot x k of the subtree T i,j , there are also a n um b er of different wa ys to partition the a v aila ble c heap lo cations b et w een the left and righ t subtrees of x k . There are m 1 c heap locatio ns and m 2 exp ensiv e lo cations a v ailable to store the sub- tree T i,j . If m 1 ≥ 1, then the ro ot x k is stored in a c heap lo catio n. The r emaining c heap lo cations are part itioned in to t w o , with n ( L ) 1 lo cations assigned to the left subtree and n ( R ) 1 lo cations assigned to the right subtree. n ( L ) 2 and n ( R ) 2 denote the n um b er o f exp ensiv e lo cations av a ilable to the left and righ t subtrees resp ectiv ely . Since the al- gorithm constructs optimu m subtrees in increasing order of j − i , the tw o table en tries r [ i, k − 1 , n ( L ) 1 , n ( L ) 2 ] and r [ k + 1 , j, n ( R ) 1 , n ( R ) 2 ] are already a v ailable during the iteration when j − i = d b ecause ( k − 1) − i < d and j − ( k + 1) < d . 44 procedure TL-phase -I : ( Initialization phase. ) for i := 0 to n C [ i + 1 , i ] ← w i +1 ,i = q i R [ i + 1 , i ] ← Nil for d := 0 to max { m 1 , m 2 } − 1 for i := 1 to n − d j ← i + d ( Num b er of no des in this subtree: j − i + 1 = d + 1 . ) C [ i, j ] ← ∞ R [ i, j ] ← Nil for k := R [ i, j − 1] to R [ i + 1 , j ] ( ⋆ ) T ′ is the tree x k T [ i, k − 1] T [ k + 1 , j ] C ′ ← w i,j + C [ i, k − 1] + C [ k + 1 , j ] if C ′ < C [ i, j ] R [ i, j ] ← k C [ i, j ] ← C ′ Figure 3.5 p rocedure TL-phase- I 45 procedure TL-p hase-I I : for d := min { m 1 , m 2 } t o n − 1 for n 1 := 0 to min { m 1 , d + 1 } n 2 ← ( d + 1) − n 1 for i := 1 to n − d j ← i + d c [ i, j, n 1 , n 2 ] ← ∞ r [ i, j, n 1 , n 2 ] ← Nil for k := i to j ( Num b er of no des in the left and right subtrees. 
                    l ← k − i;  r ← j − k
                    if n_1 ≥ 1
                        Use one cheap location for the root;
                        ( Now, only n_1 − 1 cheap locations are available. )
                        for n_1^(L) := max{0, (n_1 − 1) − r} to min{l, n_1 − 1}
                            n_2^(L) ← l − n_1^(L)
                            n_1^(R) ← (n_1 − 1) − n_1^(L)
                            n_2^(R) ← r − n_1^(R)
                            (⋆) T′ is the tree with x_k at the root and subtrees
                                T[i, k−1, n_1^(L), n_2^(L)] and T[k+1, j, n_1^(R), n_2^(R)]
                            c′ ← c_1 · w_{i,j} + c[i, k−1, n_1^(L), n_2^(L)] + c[k+1, j, n_1^(R), n_2^(R)]
                            if c′ < c[i, j, n_1, n_2]
                                r[i, j, n_1, n_2] ← k
                                c[i, j, n_1, n_2] ← c′

Figure 3.6 procedure TL-phase-II

3.4.1.4 Correctness of algorithm TwoLevel

Algorithm TwoLevel calls procedure TL-phase-I and procedure TL-phase-II, which implement dynamic programming to build larger and larger subtrees of minimum cost. The principle of optimality clearly applies to the problem of constructing an optimum tree: every subtree of an optimal tree must also be optimal, given the same number of memory locations of each kind. Therefore, algorithm TwoLevel correctly computes an optimum BST over the entire set of keys.

3.4.1.5 Running time of algorithm TwoLevel

The running time of algorithm TwoLevel is proportional to the number of times overall that the lines marked with a star (⋆) in TL-phase-I and TL-phase-II are executed. Let m = min{m_1, m_2} be the size of the smaller of the two memory levels. The number of times that the starred line (⋆) in procedure TL-phase-I is executed is

    Σ_{d=0}^{n−m} Σ_{i=1}^{n−d} ( R[i+1, j] − R[i, j−1] + 1 )
        = Σ_{d=0}^{n−m} ( R[n−d+1, n+1] − R[1, d−1] + n − d )
        ≤ Σ_{d=0}^{n−m} 2n = 2n(n − m + 1) = O(n(n − m)).

The number of times that the starred line (⋆) in procedure TL-phase-II is executed is at most

    Σ_{d=m}^{n−1} Σ_{n_1=0}^{min{m_1, d+1}} Σ_{i=1}^{n−d} Σ_{k=i}^{i+d} m.
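The innermost factor of m in this sum comes from enumerating the feasible splits of the cheap locations. A small helper makes the loop bounds of TL-phase-II concrete; this is a sketch of mine, not code from the thesis, with n1L standing for n_1^(L) and l, r denoting the numbers of keys in the left and right subtrees.

```python
def cheap_splits(n1, l, r):
    """Enumerate feasible partitions of cheap locations between the two
    subtrees of a root stored in a cheap location, as in TL-phase-II.

    n1: cheap locations available for the whole subtree (n1 >= 1);
    l, r: numbers of keys in the left and right subtrees.
    Yields (n1L, n2L, n1R, n2R): cheap/expensive counts for each side.
    """
    rem = n1 - 1                        # one cheap location holds the root
    for n1L in range(max(0, rem - r), min(l, rem) + 1):
        n2L = l - n1L                   # expensive locations fill the rest
        n1R = rem - n1L
        n2R = r - n1R
        yield n1L, n2L, n1R, n2R
```

The lower bound max{0, rem − r} ensures the right subtree is never handed more cheap locations than it has keys, and the upper bound min{l, rem} does the same for the left subtree, so every yielded split stores exactly l keys on the left and r on the right.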
A simple calculation shows that the two summations involving d and i iterate O(n − m) times each, the summation over n_1 iterates O(n) times, and the innermost summation has O(n) terms, so the number of times that the starred line is executed is O(m n^2 (n − m)^2). Therefore, the total running time of algorithm TwoLevel is

    T(n, m) = O(n(n − m) + m n^2 (n − m)^2) = O(m n^2 (n − m)^2).    (3.8)

In general, T(n, m) = O(n^5), but T(n, m) = o(n^5) if m = o(n), and T(n, m) = O(n^4) if m = O(1), i.e., if the smaller memory level has only a constant number of locations. This case arises in architectures in which the faster memory, such as the primary cache, is limited in size by practical considerations such as monetary cost and the cost of cache-coherence protocols.

3.4.2 Constructing a nearly optimum BST

In this section, we consider the problem of constructing a BST on the HMM_2 model that is close to optimum.

3.4.2.1 An approximation algorithm

The following top-down recursive algorithm, algorithm Approx-BST of figures 3.7 and 3.8, is due to Mehlhorn [Meh84]. Its analysis is adapted from the same source. The intuition behind algorithm Approx-BST is to choose the root x_k of the subtree T_{i,j} so that the weights w_{i,k−1} and w_{k+1,j} of the left and right subtrees are as close to equal as possible. In other words, we choose the key x_k to be the root such that |w_{i,k−1} − w_{k+1,j}| is as small as possible. Then we recursively construct the left and right subtrees. Once the tree T̃ has been constructed by this heuristic, we optimally assign the nodes of T̃ to memory locations using Lemma 8, in O(n log n) additional time. Algorithm Approx-BST implements the above heuristic.
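In terms of the prefix sums s_i defined below in equation (3.9), the root choice reduces to finding the index k where the sequence s crosses the midpoint of the current range. The following sketch (mine, for illustration) uses a naive linear scan for that index and checks exactly the three conditions the algorithm states; it assumes frequencies normalized to sum to 1, with p indexed from 1 (p[0] is a placeholder).

```python
def split_points(p, q):
    """Prefix sums of equation (3.9): s_0 = q_0/2 and
    s_i = s_{i-1} + q_{i-1}/2 + p_i + q_i/2."""
    n = len(p) - 1
    s = [q[0] / 2.0]
    for i in range(1, n + 1):
        s.append(s[-1] + q[i - 1] / 2.0 + p[i] + q[i] / 2.0)
    return s

def choose_root(s, i, j, low, high):
    """Pick k in [i, j] satisfying conditions (i)-(iii) of Approx-BST:
    either k = i or s_{k-1} <= mid, and either k = j or s_k >= mid."""
    mid = (low + high) / 2.0
    for k in range(i, j + 1):
        ok_left = (k == i) or s[k - 1] <= mid
        ok_right = (k == j) or s[k] >= mid
        if ok_left and ok_right:
            return k
    return j  # unreachable when low <= s_{i-1} <= s_j <= high (Lemma 11)
```

The linear scan is only for clarity; the analysis below replaces it first with a binary search and then with an exponential-plus-binary search to reach overall linear time.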
The parameter l represents the depth of the recursion; initially l = 0, and l is incremented by one whenever the algorithm calls itself recursively. The parameters low_l and high_l represent lower and upper bounds on the range of the probability distribution spanned by the keys x_i through x_j. Initially, low_l = 0 and high_l = 1, because the keys x_1 through x_n span the entire range [0, 1]. Whenever the root x_k is chosen, according to the above heuristic, to lie at the middle of this range, i.e., at med_l = (low_l + high_l)/2, the span of the keys in the left subtree is bounded by [low_l, med_l] and the span of the keys in the right subtree is bounded by [med_l, high_l]. These are the ranges passed as parameters to the two recursive calls of the algorithm.

Define

    s_0 = q_0/2
    s_i = s_{i−1} + q_{i−1}/2 + p_i + q_i/2    for 1 ≤ i ≤ n    (3.9)

By definition,

    s_i = q_0/2 + Σ_{k=1}^{i} p_k + Σ_{k=1}^{i−1} q_k + q_i/2 = w_{1,i} − q_0/2 − q_i/2    (3.10)

Therefore,

    s_j − s_{i−1} = w_{1,j} − w_{1,i−1} + q_{i−1}/2 − q_j/2
                  = w_{i,j} − q_{i−1}/2 − q_j/2    by definition 1.1    (3.11)

In Lemma 13 below, we show that at each level of the recursion, the input parameters to Approx-BST() satisfy low_l ≤ s_{i−1} ≤ s_j ≤ high_l.

3.4.2.2 Analysis of the running time

We prove that the running time of algorithm Approx-BST is O(n). Clearly, the space complexity is also linear. The running time t(n) of algorithm Approx-BST can be expressed by the recurrence

    t(n) = s(n) + max_{1 ≤ k ≤ n} [ t(k−1) + t(n−k) ]    (3.12)

where s(n) is the time to compute the index k satisfying conditions (i), (ii), and (iii) given in the algorithm, and t(k−1) and t(n−k) are the times for the two recursive calls. We can implement the search for k as a binary search. Initially, choose r = ⌊(i+j)/2⌋. If s_r ≥ med_l, then k ≤ r; otherwise k ≥ r; and we proceed recursively.
Since this binary search takes O(log(j − i)) = O(log n) time, the overall running time of algorithm Approx-BST is

    t(n) = O(log n) + max_{1 ≤ k ≤ n} [ t(k−1) + t(n−k) ]
         ≤ O(log n) + t(0) + t(n−1)
         = O(n log n).

Approx-BST(i, j, l, low_l, high_l):
    med_l ← (low_l + high_l)/2
    Case 1 (the base case): if i = j
        Return the tree with three nodes, consisting of x_i at the root and the
        external nodes z_{i−1} and z_i as the left and right subtrees, respectively.
    Otherwise, if i ≠ j, then find k satisfying all three of the following conditions:
        (i) i ≤ k ≤ j
        (ii) either k = i, or k > i and s_{k−1} ≤ med_l
        (iii) either k = j, or k < j and s_k ≥ med_l
    (Lemma 11 guarantees that such a k always exists.)
    (Continued in figure 3.8)

Figure 3.7 algorithm Approx-BST

(Continued from figure 3.7)
    Case 2a: if k = i
        Return the tree with x_i at the root, the external node z_{i−1} as the left
        subtree, and the recursively constructed subtree T_{i+1,j} as the right subtree:
            Approx-BST(i+1, j, l+1, med_l, high_l)
    Case 2b: if k = j
        Return the tree with x_j at the root, the external node z_j as the right
        subtree, and the recursively constructed subtree T_{i,j−1} as the left subtree:
            Approx-BST(i, j−1, l+1, low_l, med_l)
    Case 2c: if i < k < j
        Return the tree with x_k at the root, and recursively construct the left and
        right subtrees, T_{i,k−1} and T_{k+1,j} respectively:
            call Approx-BST(i, k−1, l+1, low_l, med_l) to construct the left subtree;
            call Approx-BST(k+1, j, l+1, med_l, high_l) to construct the right subtree.

Figure 3.8 algorithm Approx-BST (cont'd.)

However, if we use an exponential search followed by a binary search to determine the value of k, then the overall running time can be reduced to O(n), as follows.
Intuitively, an exponential search followed by a binary search finds the correct value of k in O(log(k − i)) time instead of O(log(j − i)) time. Initially, choose r = ⌊(i+j)/2⌋. Now, if s_r ≥ med_l, we know k ≤ r; otherwise k > r.

Consider the case when k ∈ {i, i+1, i+2, ..., r = ⌊(i+j)/2⌋}. An exponential search for k in this interval proceeds by trying the values i, i + 2^0, i + 2^1, i + 2^2, and so on, up to i + 2^⌈lg(r−i)⌉ ≥ r. Let g be the smallest integer such that s_{i+2^g} ≥ med_l, i.e., i + 2^{g−1} < k ≤ i + 2^g, or 2^g ≥ k − i > 2^{g−1}. Hence lg(k − i) > g − 1, so the number of comparisons made by this exponential search is g < 1 + lg(k − i). Now, we determine the exact value of k by a binary search on the interval i + 2^{g−1} + 1 through i + 2^g, which takes lg(2^g − 2^{g−1}) + 1 < g + 1 < lg(k − i) + 2 comparisons. Likewise, when k ∈ {r+1, r+2, ..., j}, a search for k in this interval using exponential and then binary search takes lg(j − k) + 2 comparisons. Therefore, the time s(n) taken to determine the value of k is at most d(2 + lg(min{k − i, j − k})), where d is a constant.

Hence, the running time of algorithm Approx-BST is proportional to

    t(n) = max_{1 ≤ k ≤ n} ( t(k−1) + t(n−k) + d(2 + lg min{k, n−k}) + f )

where f is a constant. By the symmetry of the expression t(k−1) + t(n−k), we have

    t(n) ≤ max_{1 ≤ k ≤ (n+1)/2} ( t(k−1) + t(n−k) + d(2 + lg k) + f ).    (3.13)

We prove that t(n) ≤ (3d + f)n − d lg(n + 1) by induction on n. This is clearly true for n = 0.
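The exponential-plus-binary search just described can be sketched generically over any nondecreasing array; this version (mine, with Python's bisect doing the final binary search) returns the leftmost index whose value reaches the target, which is exactly how k is located relative to med_l.

```python
import bisect

def gallop(s, lo, hi, target):
    """Exponential search followed by binary search: return the smallest
    index t in [lo, hi] with s[t] >= target (s nondecreasing on [lo, hi]),
    or hi + 1 if there is none, using O(log(t - lo)) comparisons."""
    if s[lo] >= target:
        return lo
    step = 1
    while lo + step <= hi and s[lo + step] < target:
        step *= 2                    # probes at lo + 2^0, lo + 2^1, ...
    left = lo + step // 2 + 1        # s[lo + step//2] < target
    right = min(lo + step, hi)       # first probe at or beyond the answer
    return bisect.bisect_left(s, target, left, right + 1)
```

Because the doubling stops at the first probe that reaches the target, both the doubling phase and the final binary search cost O(log(t − lo)) comparisons, which is the O(log(k − i)) bound used in the analysis above.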
Applying the induction hypothesis to the recurrence in equation (3.13), we have

    t(n) ≤ max_{1 ≤ k ≤ (n+1)/2} [ (3d+f)(k−1) − d lg k + (3d+f)(n−k) − d lg(n−k+1) + d(2 + lg k) + f ]
         = (3d+f)(n−1) + max_{1 ≤ k ≤ (n+1)/2} ( −d lg(n−k+1) + 2d + f )
         = (3d+f)n + max_{1 ≤ k ≤ (n+1)/2} ( −d lg(n−k+1) − d ).

The expression −d(1 + lg(n−k+1)) is always negative, and over the range 1 ≤ k ≤ (n+1)/2 it attains its maximum at k = (n+1)/2. Therefore,

    t(n) ≤ (3d+f)n − d(1 + lg((n+1)/2)) = (3d+f)n − d lg(n+1).

Hence, the running time of algorithm Approx-BST is O(t(n)) = O(n). Of course, if we choose to construct an optimal memory assignment for T̃, then the total running time is O(n + n log n) = O(n log n).

3.4.2.3 Quality of approximation

Let T̃ denote the binary search tree constructed by algorithm Approx-BST. In the rest of this section, we prove an upper bound on how much worse the cost of T̃ can be than the cost of an optimum BST. The following analysis applies whether we choose to construct an optimal memory assignment or to use the heuristic of algorithm Approx-BST.

We now derive an upper bound on the cost of the tree T̃ constructed by algorithm Approx-BST. Let δ(x_i) denote the depth of the internal node x_i, 1 ≤ i ≤ n, and let δ(z_j) denote the depth of the external node z_j, 0 ≤ j ≤ n, in T̃. (Recall that the depth of a node is the number of nodes on the path from the root to that node; the depth of the root is 1.)

Lemma 11. If the parameters i, j, low_l, and high_l to Approx-BST() satisfy low_l ≤ s_{i−1} ≤ s_j ≤ high_l, then a k satisfying conditions (i), (ii), and (iii) stated in the algorithm always exists.

Proof: If s_i ≥ med_l, then choosing k = i satisfies conditions (i), (ii), and (iii). Likewise, if s_{j−1} ≤ med_l, then k = j satisfies all the conditions.
Otherwise, if s_i < med_l < s_{j−1}, then, since s_i ≤ s_{i+1} ≤ ··· ≤ s_{j−1} ≤ s_j, consider the first k with k > i such that s_{k−1} ≤ med_l and s_k ≥ med_l. Then k < j and s_k ≥ med_l, and this value of k satisfies all three conditions.

Lemma 12. The parameters of a call to Approx-BST satisfy high_l = low_l + 2^{−l}.

Proof: The proof is by induction on l. The initial call to Approx-BST, with l = 0, has low_l = 0 and high_l = 1. Whenever the algorithm recursively constructs the left subtree T_{i,k−1}, in cases 2b and 2c, we have low_{l+1} = low_l and high_{l+1} = med_l = (low_l + high_l)/2 = (2 low_l + 2^{−l})/2 = low_l + 2^{−(l+1)} = low_{l+1} + 2^{−(l+1)}. On the other hand, whenever the algorithm recursively constructs the right subtree T_{k+1,j}, in cases 2a and 2c, we have high_{l+1} = high_l and low_{l+1} = med_l = high_{l+1} − 2^{−(l+1)}.

Lemma 13. The parameters of a call Approx-BST(i, j, l, low_l, high_l) satisfy low_l ≤ s_{i−1} ≤ s_j ≤ high_l.

Proof: The initial call is Approx-BST(1, n, 0, 0, 1). Therefore, s_{i−1} = s_0 = q_0/2 ≥ 0 and s_j = s_n = 1 − q_0/2 − q_n/2 ≤ 1. Thus, the parameters to the initial call of Approx-BST() satisfy the given condition. The rest of the proof follows by induction on l. In case 2a, the algorithm chooses k = i because s_i ≥ med_l, and recursively constructs the right subtree over the subset of keys from x_{i+1} through x_j. Therefore, we have low_{l+1} = med_l ≤ s_i ≤ s_j ≤ high_l = high_{l+1}. In case 2b, the algorithm chooses k = j because s_{j−1} ≤ med_l, and then recursively constructs the left subtree over the subset of keys from x_i through x_{j−1}. Therefore, we have low_{l+1} = low_l ≤ s_{i−1} ≤ s_{j−1} ≤ med_l = high_{l+1}. In case 2c, algorithm Approx-BST chooses k such that s_{k−1} ≤ med_l ≤ s_k and i < k < j.
Therefore, during the recursive call to construct the left subtree over the subset of keys from x_i through x_{k−1}, we have low_{l+1} = low_l ≤ s_{i−1} ≤ s_{k−1} ≤ med_l = high_{l+1}. During the recursive call to construct the right subtree over the subset of keys from x_{k+1} through x_j, we have low_{l+1} = med_l ≤ s_k ≤ s_j ≤ high_l = high_{l+1}.

Lemma 14. During a call to Approx-BST with parameter l, if an internal node x_k is created, then δ(x_k) = l + 1, and if an external node z_k is created, then δ(z_k) = l + 2.

Proof: The proof is by a simple induction on l. The root, at depth 1, is created when l = 0. The recursive calls to construct the left and right subtrees are made with the parameter l incremented by 1. The depth of the external node created in cases 2a and 2b is one more than the depth of its parent, and therefore equal to l + 2.

Lemma 15. For every internal node x_k such that 1 ≤ k ≤ n, p_k ≤ 2^{−δ(x_k)+1}, and for every external node z_k such that 0 ≤ k ≤ n, q_k ≤ 2^{−δ(z_k)+2}.

Proof: Let the internal node x_k be created during a call to Approx-BST(i, j, l, low_l, high_l). Then

    s_j − s_{i−1} ≤ high_l − low_l    by Lemma 13
                 = 2^{−l}             by Lemma 12,

and, by definition (3.9),

    s_j − s_{i−1} = Σ_{r=i}^{j} p_r + q_{i−1}/2 + Σ_{r=i}^{j−1} q_r + q_j/2 ≥ p_k    because i ≤ k ≤ j.

Therefore, by Lemmas 13 and 12, for the internal node x_k (i ≤ k ≤ j) with probability p_k, we have

    p_k ≤ s_j − s_{i−1} ≤ 2^{−l} = 2^{−δ(x_k)+1}    by Lemma 14.

Likewise, for the external node z_k (i−1 ≤ k ≤ j) with corresponding probability of access q_k, every term q_r with i−1 ≤ r ≤ j appears in s_j − s_{i−1} with coefficient at least 1/2, so we have

    q_k ≤ 2(s_j − s_{i−1}) ≤ 2(high_l − low_l)    by Lemma 13
        = 2^{−l+1}    by Lemma 12
        = 2^{−δ(z_k)+2}    by Lemma 14.

Lemma 16.
For every internal node x_k such that 1 ≤ k ≤ n,

    δ(x_k) ≤ ⌊lg(1/p_k)⌋ + 1,

and for every external node z_k such that 0 ≤ k ≤ n,

    δ(z_k) ≤ ⌊lg(1/q_k)⌋ + 2.

Proof: Lemma 15 shows that p_k ≤ 2^{−δ(x_k)+1}. Taking logarithms of both sides to the base 2, we have lg p_k ≤ −δ(x_k) + 1; therefore, δ(x_k) ≤ −lg p_k + 1 = lg(1/p_k) + 1. Since the depth of x_k is an integer, we conclude that δ(x_k) ≤ ⌊lg(1/p_k)⌋ + 1. Likewise, for the external node z_k, δ(z_k) ≤ ⌊lg(1/q_k)⌋ + 2.

Now we derive an upper bound on cost(T̃). Let H denote the entropy of the probability distribution q_0, p_1, q_1, ..., p_n, q_n [CT91], i.e.,

    H = Σ_{i=1}^{n} p_i lg(1/p_i) + Σ_{j=0}^{n} q_j lg(1/q_j).    (3.14)

If all the internal nodes of T̃ were stored in the expensive locations, then the cost of T̃ would be at most

    Σ_{i=1}^{n} c_2 p_i δ(x_i) + Σ_{j=0}^{n} c_2 q_j (δ(z_j) − 1)
        ≤ c_2 ( Σ_{i=1}^{n} p_i (lg(1/p_i) + 1) + Σ_{j=0}^{n} q_j (lg(1/q_j) + 1) )    by Lemma 16
        = c_2 ( ( Σ_{i=1}^{n} p_i lg(1/p_i) + Σ_{j=0}^{n} q_j lg(1/q_j) ) + ( Σ_{i=1}^{n} p_i + Σ_{j=0}^{n} q_j ) )
        = c_2 (H + 1)    by definition 3.14 and because Σ_{i=1}^{n} p_i + Σ_{j=0}^{n} q_j = 1.    (3.15)

3.4.2.4 Lower bounds

The following lower bounds are known for the cost of an optimum binary search tree T* on the standard uniform-cost RAM model.

Theorem 17 (Mehlhorn [Meh75]). cost(T*) ≥ H / lg 3.

Theorem 18 (De Prisco, De Santis [dPdS96]). cost(T*) ≥ H − (1 − Σ_{i=1}^{n} p_i)(lg lg(n+1) − 1).

Theorem 19 (De Prisco, De Santis [dPdS96]). cost(T*) ≥ H + H lg H − (H+1) lg(H+1).

The lower bounds of Theorems 17 and 19 are expressed only in terms of H, the entropy of the probability distribution. The smaller the entropy, the tighter the bound of Theorem 17. Theorem 19 improves on Mehlhorn's lower bound for H ≳ 15. Theorem 18 assumes knowledge of n, and proves a lower bound better than that of Theorem 17 for large enough values of H.
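For concreteness, the entropy H of equation (3.14) and the upper bound c_2(H + 1) of equation (3.15) are easy to compute. The normalization step in this sketch is my addition (the analysis assumes the probabilities already sum to 1, while the examples in chapter 4 use raw frequencies); zero-frequency terms contribute nothing, by the usual 0 lg(1/0) = 0 convention.

```python
import math

def entropy(p, q):
    """H of equation (3.14) for the distribution q_0, p_1, q_1, ..., p_n, q_n.
    p[0] is unused padding; frequencies are normalized before use."""
    freqs = [f for f in list(p[1:]) + list(q) if f > 0]
    total = float(sum(freqs))
    return sum((f / total) * math.log2(total / f) for f in freqs)

def approx_cost_upper_bound(p, q, c2):
    """Equation (3.15): storing every node of the approximate tree in an
    expensive location costs at most c2 * (H + 1)."""
    return c2 * (entropy(p, q) + 1)
```

For the uniform distribution over the 2n + 1 outcomes, this reproduces H = lg(2n + 1), the maximum possible entropy, which is where the c_2(H + 1) bound is weakest in absolute terms.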
3.4.2.5 Approximation bound

Corollary 20. Algorithm Approx-BST constructs the tree T̃ such that

    cost(T̃) − cost(T*) ≤ (c_2 − c_1)H + c_1((H+1) lg(H+1) − H lg H) + c_2.

Proof: Theorem 19 immediately implies a lower bound of c_1(H + H lg H − (H+1) lg(H+1)) on the cost of T*. The result then follows from equation (3.15).

For large enough values of H, H + 1 ≈ H, so that lg(H+1) ≈ lg H; hence, (H+1) lg(H+1) − H lg H ≈ lg H. Thus, we have

    cost(T̃) − cost(T*) ≲ (c_2 − c_1)H + c_1 lg H.    (3.16)

When c_1 = c_2 = 1, as in the uniform-cost RAM model, equation (3.16) is the same as the approximation bound obtained by Mehlhorn [Meh84].

CHAPTER 4

Conclusions and Open Problems

4.1 Conclusions

The table of figure 4.1 summarizes our results for the problem of constructing an optimum binary search tree over a set of n keys and the corresponding probabilities of access, on the general HMM model with an arbitrary number of levels in the memory hierarchy and on the two-level HMM_2 model. Recall that h is the number of memory levels, and m_l is the number of memory locations in level l, for 1 ≤ l ≤ h.

We see from table 4.1 that algorithm Parts is efficient when h is a small constant. The running time of algorithm Parts is independent of the sizes of the different memory levels. On the other hand, the running time of algorithm Trunks is polynomial in n precisely when n − m_h = Σ_{l=1}^{h−1} m_l is a constant, even if h is large. Therefore, for instance, algorithm Parts would be appropriate for a three-level memory hierarchy, where the binary search tree has to be stored in cache, in main memory, and on disk. Algorithm Trunks would be more efficient when the memory hierarchy consists of many levels and the last memory level is extremely large.
This is because algorithm Trunks uses the speed-up technique due to Knuth [Knu71, Knu73] and Yao [Yao82] to take advantage of the fact that large subtrees of the BST will in fact be stored entirely in the last memory level. When h is large and n − m_h is not a constant, the relatively simple top-down algorithm, algorithm Split, is the most efficient; in particular, when h = Ω(n / log n), it is faster than algorithm Parts.

For the HMM_2 model, we have the hybrid algorithm, algorithm TwoLevel, with running time O(n(n−m) + m n^2 (n−m)^2), where m = min{m_1, m_2} is the size of the smaller of the two memory levels (m ≤ n/2).

    Model    Algorithm             Section    Running time
    HMM      algorithm Parts       3.3.4      O((2^{h−1}/(h−1)!) · n^{2h+1})
    HMM      algorithm Trunks      3.3.5      O(2^{n−m_h} · (n − m_h + h)^{n−m_h} · n^3 / (h−2)!)
    HMM      algorithm Split       3.3.6      O(2^n)
    HMM_2    algorithm TwoLevel    3.4.1      O(m n^2 (n−m)^2)

Figure 4.1 Summary of results

Procedure TL-phase-II of algorithm TwoLevel is an implementation of algorithm Parts for a special case. The running time of algorithm TwoLevel is O(n^5) in the worst case, the same as the worst-case running time of algorithm Parts for h = 2. However, if m = o(n), then algorithm TwoLevel outperforms algorithm Parts; in particular, if m = Θ(1), then the running time of algorithm TwoLevel is O(n^4).

None of our algorithms depends on the actual costs of accessing a memory location in the different levels. We state as an open problem below whether it is possible to take advantage of knowledge of the relative costs of memory accesses to design a more efficient algorithm for constructing optimum BSTs.
For the problem of approximating an optimum BST on the HMM_2 model, we have a linear-time algorithm, algorithm Approx-BST of section 3.4.2, that constructs the tree T̃ such that

    cost(T̃) − cost(T*) ≤ (c_2 − c_1)H + c_1((H+1) lg(H+1) − H lg H) + c_2,

where cost(T*) is the cost of an optimum BST.

4.2 Open problems

4.2.1 Efficient heuristics

We noted above that our algorithms do not assume any relationship between the costs c_l of accessing a memory location in level l, 1 ≤ l ≤ h. It should be possible to design an algorithm, more efficient than any of the algorithms in this thesis, that takes advantage of knowledge of the memory costs to construct an optimum binary search tree. The memory cost function µ(a) = Θ(log a) would be especially interesting in this context.

4.2.2 NP-hardness

Conjecture 21. The problem of constructing a BST of minimum cost on the HMM with h = Ω(n) levels in the memory hierarchy is NP-hard.

The dynamic programming algorithm, algorithm Parts, of section 3.3.4 runs in time O(n^{h+2}), which is efficient only if h = Θ(1). We conjecture that when h = Ω(n), the extra complexity of the number of different ways to store the keys in memory, in addition to computing the structure of an optimum BST, makes the problem hard.

4.2.3 An algorithm efficient on the HMM

Although we are interested in the problem of constructing a BST and storing it in memory so that the cost on the HMM is minimized, we analyze the running times of our algorithms on the RAM model. It would be interesting to analyze the pattern of memory accesses made by the algorithms that compute an optimum BST, and to optimize the running time of each algorithm when run on the HMM model itself.

4.2.4 BSTs optimum on both the RAM and the HMM

When is the structure of the optimum BST the same on the HMM as on the RAM model?
In other words, is it possible to characterize when the minimum-cost tree is the one that is optimum when the memory configuration is uniform?

The following small example demonstrates that, in general, the structure of an optimum tree on the uniform-cost RAM model can be very different from the structure of an optimum tree on the HMM. To discover this example, we used a computer program to perform an exhaustive search. Consider an instance of the problem of constructing an optimum BST on the HMM_2 model, with n = 3 keys. The number of times p_i that the i-th key x_i is accessed, for 1 ≤ i ≤ 3, and the number of times q_j that the search argument lies between x_j and x_{j+1}, for 0 ≤ j ≤ 3, are:

    p_i = ⟨98, 72, 95⟩
    q_j = ⟨49, 20, 22, 84⟩

Figure 4.2 An optimum BST on the unit-cost RAM model (root x_2 = 72, with children x_1 = 98 and x_3 = 95, and external nodes 49, 20, 22, 84).

The p_i's and q_j's are frequencies of access. They are not normalized to add up to 1, but such a transformation could easily be made without changing the optimum solution. In this instance of the HMM model, there is one memory location each of cost 4, 12, 14, 44, 66, 76, and 82.

The optimum BST on the RAM model is shown in figure 4.2. Its cost on the RAM model, with each location of unit cost, is 983, while the cost of the same tree on this instance of the HMM model is 16,752. On the other hand, the BST over the same set of keys and frequencies that is optimum on this instance of the HMM model is shown in figure 4.3. Its cost on the unit-cost RAM model is 990, and on the above instance of the HMM model it is 16,730. In figure 4.3, the nodes of the tree are labeled with the frequency of the corresponding key and, in square brackets, with the cost of the memory location where the node is stored.
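The quoted costs can be checked mechanically. A search that reaches a node touches the node and all its ancestors, so the cost of a fixed tree is the sum, over all internal and external nodes, of the node's location cost times its weight (the total frequency of searches passing through it); on the unit-cost RAM every location costs 1. The sketch below is mine: the two trees are hard-coded as read off figures 4.2 and 4.3, and the optimal memory assignment for a fixed tree is taken to be the greedy pairing of heavy nodes with cheap locations, which is optimal by the rearrangement inequality.

```python
def subtree_weights(t, acc):
    """t = (freq, left, right); left/right are None below external nodes.
    Returns the total access frequency in t, appending each node's weight
    (the frequency of all searches passing through it) to acc."""
    if t is None:
        return 0
    f, left, right = t
    w = f + subtree_weights(left, acc) + subtree_weights(right, acc)
    acc.append(w)
    return w

def ram_cost(tree):
    """Unit-cost RAM: every location costs 1, so the cost of the tree is
    simply the sum of all node weights."""
    acc = []
    subtree_weights(tree, acc)
    return sum(acc)

def hmm_cost(tree, mem_costs):
    """HMM cost of a fixed tree under the best memory assignment: weights
    sorted descending are paired with location costs sorted ascending."""
    acc = []
    subtree_weights(tree, acc)
    acc.sort(reverse=True)
    return sum(w * c for w, c in zip(acc, sorted(mem_costs)))

ext = lambda f: (f, None, None)  # external node with frequency f
# Figure 4.2: root x_2 = 72, with children x_1 = 98 and x_3 = 95.
fig42 = (72, (98, ext(49), ext(20)), (95, ext(22), ext(84)))
# Figure 4.3: root x_3 = 95; left child x_1 = 98; x_1's right child x_2 = 72.
fig43 = (95, (98, ext(49), (72, ext(20), ext(22))), ext(84))
mem = [4, 12, 14, 44, 66, 76, 82]

print(ram_cost(fig42), hmm_cost(fig42, mem))   # figure 4.2 on RAM and HMM
print(ram_cost(fig43), hmm_cost(fig43, mem))   # figure 4.3 on RAM and HMM
```

Running this reproduces the four numbers in the text: 983 and 16,752 for the tree of figure 4.2, against 990 and 16,730 for the tree of figure 4.3.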
4.2.5 A monotonicity principle

The dynamic programming algorithms, algorithm Parts of section 3.3.4 and algorithm TwoLevel of section 3.4.1, iterate through a large number of possible ways of partitioning the available memory locations between the left and right subtrees.

Figure 4.3 An optimum BST on the HMM model (root x_3 = 95 in the location of cost 4; its left child x_1 = 98 in cost 12; x_1's right child x_2 = 72 in cost 14; external nodes 49, 20, 22, 84 in the locations of cost 66, 82, 76, 44, respectively).

It would be interesting to discover a monotonicity principle, similar to the concave quadrangle inequality, which would reduce the number of different options tried by the algorithms. For the problem of constructing an optimum BST on the HMM_2 model, with only two different memory costs, we were able to disprove the following conjectures by giving counter-examples.

Conjecture 22 (Disproved). If x_k is the root of an optimum subtree over the subset of keys x_i through x_j in which m cheap locations are assigned to the left subtree, then the root of an optimum subtree over the same subset of keys in which m + 1 cheap locations are assigned to the left subtree must have index no smaller than k.

Counter-example: Consider an instance of the problem of constructing an optimum BST on the HMM_2 model, with n = 7 keys. In this instance, there are m_1 = 5 cheap memory locations, such that a single access to a cheap location costs c_1 = 5, and m_2 = 10 expensive locations, such that a single access to an expensive location costs c_2 = 15. The number of times p_i that the i-th key x_i is accessed, for 1 ≤ i ≤ 7, and the number of times q_j that the search argument lies between x_j and x_{j+1}, for 0 ≤ j ≤ 7, are:

    p_i = ⟨2, 2, 2, 10, 4, 9, 5⟩
    q_j = ⟨6, 6, 7, 4, 1, 1, 9, 6⟩

The p_i's and q_j's are frequencies of access; they could easily be normalized to add up to 1.
An exhaustive search shows that the optimum BST with n_1^(L) = 0 cheap locations assigned to the left subtree (and, therefore, 4 cheap locations assigned to the right subtree), with total cost 1,890, has x_3 at the root. The optimum BST with n_1^(L) = 1 cheap location assigned to the left subtree (and 3 cheap locations assigned to the right subtree), with total cost 1,770, has x_2 at the root. This example disproves conjecture 22.

Conjecture 23 (Disproved). If x_k is the root of an optimum subtree over the subset of keys x_i through x_j in which m cheap locations are assigned to the left subtree, then in the optimum subtree over the same subset of keys but with x_{k+1} at the root, the left subtree must be assigned no fewer than m cheap locations.

Counter-example: Consider an instance of the problem, again with n = 7 keys. In this instance, there are m_1 = 5 cheap memory locations, such that a single access to a cheap location costs c_1 = 9, and m_2 = 10 expensive locations, such that a single access to an expensive location costs c_2 = 27. The number of times p_i that the i-th key x_i is accessed, for 1 ≤ i ≤ 7, and the number of times q_j that the search argument lies between x_j and x_{j+1}, for 0 ≤ j ≤ 7, are:

    p_i = ⟨7, 3, 9, 3, 3, 6, 3⟩
    q_j = ⟨4, 9, 4, 5, 5, 7, 5, 9⟩

As a result of an exhaustive search, we see that the optimum BST with x_4 at the root, with total cost 3,969, has 3 cheap locations assigned to the left subtree and 1 cheap location assigned to the right subtree. However, the optimum BST with x_5 at the root, with total cost 4,068, has only 2 cheap locations assigned to the left subtree and 2 cheap locations assigned to the right subtree. This example disproves conjecture 23.

Conjecture 24 (Disproved).
[Conjecture of unimodality] The cost of an optimum BST with a fixed root x_k is a unimodal function of the number of cheap locations assigned to the left subtree.

Conjecture 24 would imply that we could substantially improve the running time of algorithm Parts of section 3.3.4. The h − 1 innermost loops of algorithm Parts each perform a linear search for the optimum way to partition the available memory locations from each level between the left and right subtrees. If the conjecture were true, we could perform a discrete unimodal search instead and reduce the overall running time to O((log n)^{h−1} · n^3).

Counter-example: A counter-example to conjecture 24 is the binary search tree over n = 15 keys, where the frequencies of access are:

    p_i = ⟨2, 2, 9, 2, 1, 4, 10, 9, 9, 7, 5, 6, 9, 8, 10⟩
    q_j = ⟨1, 8, 8, 1, 3, 4, 6, 6, 6, 3, 3, 10, 8, 3, 4, 3⟩

The instance of the HMM model has m_1 = 7 cheap memory locations of cost c_1 = 7 and m_2 = 24 expensive locations of cost c_2 = 16. Through an exhaustive search, we determined that the cost of an optimum binary search tree with x_8 at the root exhibits the behavior shown in the graph of figure 4.4 as the number n_1^(L) of cheap locations assigned to the left subtree varies from 0 through 6. (As the root, x_8 is always assigned to a cheap location.) The graph of figure 4.4 plots the costs of the optimum left and right subtrees of the root, and their sum, as the number of cheap locations assigned to the left subtree increases, or, equivalently, as the number of cheap locations assigned to the right subtree decreases. (Note that the total cost of the BST is only a constant more than the sum of the costs of the left and right subtrees, since the root is fixed.)
We see from the graph that the cost of an optimum BST with n_1^(L) = 4 is greater than that for n_1^(L) = 3 and n_1^(L) = 5; thus, the cost is not a unimodal function of n_1^(L).

4.2.6 Dependence on the parameter h

Downey and Fellows [DF99] define a class of parameterized problems, called fixed-parameter tractable (FPT).

Definition 25 (Downey, Fellows [DF99]). A parameterized problem L ⊆ Σ* × Σ* is fixed-parameter tractable if there is an algorithm that correctly decides, for input (x, y) ∈ Σ* × Σ*, whether (x, y) ∈ L in time f(k) n^α, where n is the size of the main part of the input x, |x| = n; k is the integer parameter, which is the length of y, |y| = k; α is a constant independent of k; and f is an arbitrary function.

Figure 4.4 The cost of an optimum BST is not a unimodal function. (The graph plots three series against n_1^(L): the cost of the optimum left subtree, 3376, 2773, 2440, 2251, 2062, 1963, 1864, 1792; the cost of the optimum right subtree, 2425, 2570, 2705, 2902, 3082, 3425, 3874, 4720; and their sum, 5801, 5343, 5145, 5153, 5144, 5388, 5738, 6512.)

The best algorithm we have for the general problem, i.e., for arbitrary h, is algorithm Parts of section 3.3.4, which runs in time O(n^{h+2}). Consider the case where all h levels of the memory hierarchy have roughly the same number of locations, i.e., m_1 = m_2 = ... = m_{h−1} = ⌊n/h⌋ and m_h = ⌈n/h⌉. If the number of levels h is a parameter to the problem, it remains open whether this problem is (strongly uniformly) fixed-parameter tractable: is there an algorithm to construct an optimum BST that runs in time O(f(h) n^α), where α is a constant independent of both h and n? For instance, is there an algorithm with running time O(2^h n^α)? Recall that we have a top-down algorithm (algorithm Split of section 3.3.6) that runs in time O(2^n) for the case h = n.
A positive answer to this question would imply that it is feasible to construct optimum BSTs over a large set of keys for a larger range of values of h, in particular, even when h = O(log n).

References

[AACS87] A. Aggarwal, B. Alpern, A. K. Chandra, and M. Snir. A model for hierarchical memory. In Proceedings of the 19th ACM Symposium on the Theory of Computing, pages 305–314, 1987.

[ABCP98] B. Awerbuch, B. Berger, L. Cowen, and D. Peleg. Near-linear time construction of sparse neighborhood covers. SIAM Journal on Computing, 28(1):263–277, 1998.

[AC88] A. Aggarwal and A. K. Chandra. Virtual memory algorithms. In Proceedings of the 20th ACM Symposium on the Theory of Computing, pages 173–185, 1988. Preliminary version.

[ACFS94] B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierarchy model of computation. Algorithmica, 12:72–109, 1994.

[ACS87] A. Aggarwal, A. K. Chandra, and M. Snir. Hierarchical memory with block transfer. In Proceedings of the 28th IEEE Symposium on Foundations of Computer Science, pages 204–216, 1987.

[ACS90] A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71:3–28, 1990.

[AV88] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116–1127, September 1988.

[AVL62] G. M. Adel'son-Vel'skii and E. M. Landis. An algorithm for the organization of information. Soviet Mathematics Doklady, 3:1259–1263, 1962.

[BC94] D. P. Bovet and P. Crescenzi. Introduction to the Theory of Complexity. Prentice Hall, 1994.

[CGG+95] Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. External-memory graph algorithms.
In Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (San Francisco, CA, 1995), pages 139–149, 1995.

[CJLM99] S. Chatterjee, V. V. Jain, A. R. Lebeck, and S. Mundhra. Nonlinear array layouts for hierarchical memory systems. In Proceedings of the ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.

[CKP+96] D. E. Culler, R. M. Karp, D. Patterson, A. Sahay, E. E. Santos, K. E. Schauser, R. Subramonian, and T. von Eicken. LogP: A practical model of parallel computation. Communications of the ACM, 39(11):78–85, 1996.

[CLR90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 1990.

[CS] S. Chatterjee and S. Sen. Cache-efficient matrix transposition. [Online] ftp://ftp.cs.unc.edu/pub/users/sc/papers/hpca00.pdf [September 17, 2000].

[CT91] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.

[DF99] R. G. Downey and M. R. Fellows. Parameterized Complexity. Monographs in Computer Science. Springer, 1999.

[dPdS96] R. de Prisco and A. de Santis. New lower bounds on the cost of binary search trees. Theoretical Computer Science, 156(1–2):315–325, 1996.

[GI99] J. Gil and A. Itai. How to pack trees. Journal of Algorithms, 32(2):108–132, 1999.

[GJ79] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., 1979.

[GS73] D. D. Grossman and H. F. Silverman. Placement of records on a secondary storage device to minimize access time. Journal of the ACM, 20(3):429–438, July 1973.

[HK81] J. Hong and H. Kung. I/O-complexity: The red blue pebble game. In Proceedings of the ACM Symposium on Theory of Computing, 1981.

[HLH92] E. Hagersten, A. Landin, and S. Haridi. DDM—a cache-only memory architecture. IEEE Computer, pages 44–54, September 1992.

[HP96] J. L. Hennessy and D. A.
Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2nd edition, 1996.

[HR76] L. Hyafil and R. L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17, May 1976.

[HT71] T. C. Hu and A. C. Tucker. Optimal computer search trees and variable-length alphabetical codes. SIAM Journal on Applied Mathematics, 21(4):514–532, December 1971.

[Huf52] D. A. Huffman. A method for the construction of minimum redundancy codes. Proceedings of the Institute of Radio Engineers, 40(9):1098–1101, September 1952.

[JW94] B. H. H. Juurlink and H. A. G. Wijshoff. The parallel hierarchical memory model. In Algorithm Theory — SWAT, number 824 in Lecture Notes in Computer Science, pages 240–251. Springer-Verlag, 1994.

[Knu71] D. E. Knuth. Optimum binary search trees. Acta Informatica, 1:14–25, 1971.

[Knu73] D. E. Knuth. The Art of Computer Programming, vol. 3: Sorting and Searching. Addison-Wesley, 1973.

[LL96] A. LaMarca and R. E. Ladner. The influence of caches on the performance of heaps. Journal of Experimental Algorithmics, 1(4), 1996. [Online] http://www.jea.acm.org/1996/LaMarcaInfluence/ [September 17, 2000].

[LL99] A. LaMarca and R. E. Ladner. The influence of caches on the performance of sorting. Journal of Algorithms, 31(1):66–104, 1999.

[Mak95] L. Mak. The Power of Parallel Time. PhD thesis, University of Illinois at Urbana-Champaign, May 1995.

[Meh75] K. Mehlhorn. Nearly optimal binary search trees. Acta Informatica, 5:287–295, 1975.

[Meh84] K. Mehlhorn. Data Structures and Algorithms 1: Sorting and Searching. EATCS Monographs on Theoretical Computer Science. Springer-Verlag, 1984.

[Nag97] S. V. Nagaraj. Optimal binary search trees. Theoretical Computer Science, 188:1–44, 1997.

[NGV96] M. H. Nodine, M. T. Goodrich, and J. S. Vitter.
Blocking for external graph searching. Algorithmica, 16(2):181–214, August 1996.

[Pap95] C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1995.

[PS85] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Texts and Monographs in Computer Science. Springer-Verlag, 1985.

[PU87] C. H. Papadimitriou and J. D. Ullman. A communication-time tradeoff. SIAM Journal on Computing, 16(4):639–646, August 1987.

[Reg96] K. W. Regan. Linear time and memory-efficient computation. SIAM Journal on Computing, 25(1):133–168, February 1996.

[Sav98] J. E. Savage. Models of Computation: Exploring the Power of Computing. Addison-Wesley, 1998.

[Smi82] A. J. Smith. Cache memories. ACM Computing Surveys, 14(3):473–530, September 1982.

[ST85] D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. Journal of the Association for Computing Machinery, 32(3):652–686, July 1985.

[Val89] L. G. Valiant. Bulk synchronous parallel computers. In M. Reeve and S. E. Zenith, editors, Parallel Processing and Artificial Intelligence. Wiley, 1989. ISBN 0-471-92497-0.

[Val90] L. G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103–111, August 1990.

[Vit] J. S. Vitter. External memory algorithms and data structures: Dealing with massive data. To appear in ACM Computing Surveys.

[Wil87] A. W. Wilson Jr. Hierarchical cache/bus architecture for shared memory multiprocessors. In Proceedings of the Fourteenth International Symposium on Computer Architecture, pages 244–252, June 1987.

[Yao82] F. F. Yao. Speed-up in dynamic programming. SIAM Journal on Algebraic Discrete Methods, 3(4):532–540, 1982.