Priority Queue Based on Multilevel Prefix Tree
Tree structures are very often used data structures. Among ordered types of trees there are many variants whose basic operations such as insert, delete, search, delete-min are characterized by logarithmic time complexity. In the article I am going to…
Authors: David S. Planeta
David S. Planeta (dplaneta@gmail.com)

August 7, 2021

Abstract

Tree structures are among the most frequently used data structures. Among ordered trees there are many variants whose basic operations, such as insert, delete, search, and delete-min, have logarithmic time complexity. In this article I present a structure whose time complexity for each of the above operations is $O(\frac{M}{K} + K)$, where $M$ is the size of the data type and $K$ is a constant properly matched to the size of the data type. A properly matched $K$ makes the structure function as a very effective priority queue. The size of the structure depends linearly on the number and size of the elements. PTrie is a combination of the idea of the prefix tree (Trie), a structure of logarithmic time complexity for insert and remove operations, a doubly linked list, and queues.

1 Introduction

Priority Trie (PTrie [5]) uses several structures, including a Trie of degree $2^K$ [1], which is the core of the structure. Recording data in PTrie consists of breaking the word into parts that serve as indexes into the successive layers of the structure (table look-up). The last layers contain the addresses of the nodes of a doubly linked list. Each list node stores a queue [3], into which the elements are inserted. Moreover, each layer contains a structure with logarithmic time complexity for insert and remove operations, which helps to determine the destination of data in the doubly linked list [3]. This structure can be any of the various ordered trees [3] or a skip list [4].

1.1 Terminology

A bit pattern is a set of $K$ bits. $K$ (the length of the bit pattern) defines the number of bits that are cut off the binary word. $M$ defines the number (length) of bits in a binary word.

$$\text{value of word} = \overbrace{\underbrace{101\ldots1}_{K}\,00101\ldots}^{M}$$

$N$ is the number of all values in PTrie. $2^K$ is the number of variations of length $K$ over the binary set $\{0, 1\}$.
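The terminology above can be illustrated with a minimal sketch. It assumes that $M$ is a multiple of $K$ and that patterns are read from the most significant bits first; the function name `bit_patterns` is hypothetical and does not come from the PTrie sources.

```python
def bit_patterns(word, m, k):
    """Split an m-bit word into m // k indexes of k bits each.

    Hypothetical helper: patterns are taken from the most significant
    bits downwards, and m is assumed to be a multiple of k.
    """
    if k <= 0 or m % k != 0:
        raise ValueError("this sketch assumes m is a positive multiple of k")
    mask = (1 << k) - 1
    # shift runs m-k, m-2k, ..., 0, so the leading pattern comes first
    return [(word >> shift) & mask for shift in range(m - k, -1, -k)]


# An 8-bit word cut into two 4-bit patterns: 1011 and 0010.
print(bit_patterns(0b10110010, 8, 4))  # [11, 2]
```

Each returned index would select one entry of a layer. With $K = 4$ every index is one of $P = 2^K = 16$ possible patterns.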
The value $P = 2^K$ determines the number of groups (number of Layers [Figure 1]) into which the bit pattern may be divided during one step (one level). The set of values is decomposed into groups by the first $K$ bits (the version of the algorithm described in this paper was implemented on a little-endian machine). The path is defined starting from the most significant bits of the variable. The value of the pattern of length $K$ (the index) determines the layer we move to [Figure 2]. The lowest layers determine the nodes of the list that store the queues for inserted values. $L$ denotes the level the layer is on.

[Figure 1: Layer. A table of $P = 2^K$ entries (patterns 00...00 through 11...11, groups $G_1, G_2, \ldots, G_P$) with MIN and MAX pointers, together with the structure of logarithmic time complexity for insert and remove operations, whose cost is $\log_2 P = \log_2 2^K = K$.]

The probability that exactly $G$ keys correspond to one particular pattern, where for each of the $P^L$ sequences of leading bits there is a node that corresponds to at least two keys, equals

$$\binom{N}{G} P^{-GL} \left(1 - P^{-L}\right)^{N-G}$$

For a random PTrie the average number of layers on level $L$, for $L = 0, 1, 2, \ldots$, is

$$P^L \left(1 - \left(1 - P^{-L}\right)^N\right) - N \left(1 - P^{-L}\right)^{N-1}$$

Let $A_N$ be the average number of layers in a random PTrie of degree $P = 2^K$ containing $N$ keys. Then $A_0 = A_1 = 0$, and for $N \ge 2$ we get [2]:

$$A_N = 1 + \sum_{G_1 + \ldots + G_P = N} \frac{N!}{G_1! \cdots G_P!} P^{-N} \left(A_{G_1} + \ldots + A_{G_P}\right) = 1 + P^{1-N} \sum_{G_1 + \ldots + G_P = N} \frac{N!}{G_1! \cdots G_P!} A_{G_1}$$

$$= 1 + P^{1-N} \sum_{G} \binom{N}{G} (P - 1)^{N-G} A_G = 1 + 2^{K(1-N)} \sum_{G} \binom{N}{G} \left(2^K - 1\right)^{N-G} A_G$$

[Figure 2: PTrie. Layers on levels $L_1, L_2, \ldots, L_{\log_{2^K} N}$, each with a table of $P = 2^K$ entries, MIN and MAX pointers, and an ordered structure of cost $\Theta(\log_2 2^K) = O(K)$. The depth is $\Theta(\log_{2^K} N) = \Theta(\frac{\lg N}{\lg 2^K}) = O(\frac{M}{K})$. The lowest layers point to the nodes of a doubly linked list with Head and Tail; each node holds a queue.]

2 Implementation

| Operation | Description | Bound |
|---|---|---|
| create | Creates the object. | $O(1)$ |
| insert(data) | Adds an element to the structure. | $O(\frac{M}{K} + K)$ |
| boolean remove(data) | Removes a value from the tree. If the operation failed because there was no such value in the tree, it returns FALSE(0); otherwise it returns TRUE(1). | $O(\frac{M}{K} + K)$ |
| boolean search(data) | Looks for the word in the tree. If found, returns TRUE(1); otherwise FALSE(0). | $O(\frac{M}{K})$ |
| *minimum() | Returns the address of the lowest value in the tree, or an empty address if the operation failed because the tree was empty. | $O(1)$ |
| *maximum() | Returns the address of the highest value in the tree, or an empty address if the operation failed because the tree was empty. | $O(1)$ |
| next | Returns the address of the next node in the tree, or an empty address if the value passed as the parameter was the greatest. The order of moving to successive elements is fixed: from the smallest to the largest, and from "the youngest to the oldest" (stable) in case of identical words. | $O(1)$ |
| back | Similar to 'next', but returns the address of the preceding node in the tree. | $O(1)$ |

Basic operations can be combined. For example, the heap operation delete-min() can be replaced by remove(minimum()).

2.1 Insert

Determine the index (pointer) to the next layer by projecting the pattern of length $K$ onto the word. If the link determined by the index is not empty and points to a list node, try to insert the value into the queue of that node. If the elements in the queue are the same as the inserted value, insert the value into the queue.
Otherwise, if the elements in the queue differ from the inserted value, the node is "pushed" to a lower level and its former place is filled with a new layer. Next, try again to insert the element, this time into the newly created layer.

Else, if the link determined by the index is empty, insert the index value into the ordered binary tree of the current layer [Figure 3]. The father of the newly created node in the ordered binary tree of the current layer determines the place for leaves. If the newly created node is on the right side of its father (added index > father index), the value added to the list is inserted after the node determined by the father index and the path of the highest indexes of the lower-level layers (using the 'max' pointer of the layers, at time cost $O(1)$). If the newly created node is on the left side of its father (added index < father index), the value added to the list is inserted before the node determined by the father index and the path of the smallest indexes of the lower-level layers (using the 'min' pointer of the layers, at time cost $O(1)$).

[Figure 3: Insert value of index into the ordered binary tree from the layer. The layer table (patterns 00...00 through 11...11 with MIN and MAX pointers) and the insert(index) path through the ordered binary tree.]

One can wonder why we use a queue and not a stack or a value counter. A value counter cannot be used because complex elements can be inserted into the PTrie structure, and they are distinguishable in the tree only by some of their words. A stack is not a good idea either, because the queue makes the structure stable, which is a very useful characteristic. I used a "plain" Binary Search Tree as the structure of logarithmic time complexity. For a small number of tree nodes it is a very good solution, because for $K = 4$, $2^K = 16$, so the tree contains at most 16 (different) elements.
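The role of the father node can be sketched in a few lines. This is only an illustration under assumptions, not the code from the PTrie sources: the names `IndexNode` and `bst_insert` are hypothetical, and the sketch covers just the per-layer binary search tree, not the layers or the list. It relies on a standard property of an unbalanced binary search tree: a newly attached leaf is the immediate in-order neighbor of its father, so the father tells us whether the new list node goes directly after or directly before the existing ones.

```python
class IndexNode:
    """One node of a layer's plain binary search tree (hypothetical name)."""

    def __init__(self, index):
        self.index = index
        self.left = None
        self.right = None


def bst_insert(root, index):
    """Insert index into the tree and return (root, father, side).

    side is 'right' when the new node hangs to the right of its father
    (the father is its in-order predecessor), 'left' when it hangs to the
    left (the father is its in-order successor), and None when the tree
    was empty or the index was already present.
    """
    if root is None:
        return IndexNode(index), None, None
    node = root
    while True:
        if index == node.index:
            return root, None, None  # index already stored, nothing added
        side = "right" if index > node.index else "left"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, IndexNode(index))
            return root, node, side
        node = child


root = None
for idx in (8, 3, 12):
    root, _, _ = bst_insert(root, idx)

root, father, side = bst_insert(root, 5)
print(father.index, side)  # 3 right: the new list node goes after index 3

root, father, side = bst_insert(root, 10)
print(father.index, side)  # 12 left: the new list node goes before index 12
```

In both calls the father is the closest stored index on the corresponding side, which is what lets the insert locate the neighboring list node without scanning the list.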
For such a small number of (different) values, the other ordered trees will probably turn out to be at most as effective as the unusually simple Binary Search Tree.

2.1.1 Analysis

For random data it takes $\Theta(\frac{\lg N}{\lg 2^K}) = \Theta(\log_{2^K} N) = O(\frac{M}{K})$ passes through layers to find the place in the core of the heap, the Trie tree. On at least one layer of the PTrie structure we insert into the ordered binary tree, in which the maximum number of nodes is $2^K$. While inserting a new value I need to know where exactly it will be located in the list. There are two possibilities: the nearest index already represented in the list lies either to the left or to the right of the index of the inserted word. It may happen that the structure already contains exactly the same word as the inserted one. In that case the index is not inserted into any layer of the PTrie, because it is not necessary to add a new node to the list; the value is inserted into the queue of the already existing node. To sum up, while moving through the layers of PTrie we can stop at some level because of an empty index. Then a node is added to the list in the place determined by the binary search tree and the remaining part of the path. This is why the bound of the operation that inserts a new value into PTrie equals $\Theta(\log_{2^K} N + \log_2 2^K) = \Theta(\log_{2^K} N + K) = O(\frac{M}{K} + K)$.

2.2 Find

The find method, as in plain Trie trees, goes through successive layers following the path determined by the binary representation of the searched value. It can be said that it uses the numeric key as a guide while moving down the core of PTrie, the prefix tree. When searching the tree, three things can happen:

• We do not reach a node of the list because the index we determine is empty on one of the layers: search failure.
• We reach the node but the values in the queue are different from the searched value: search failure.
• We reach the node and the values in the queue are exactly the ones we seek: search success.

2.2.1 Analysis

Searching in a prefix tree is very fast because it finds words using the word key as indexes. In case of search failure, the longest match of the searched word is found. Note that during the 'search' operation we use only the properties of the prefix tree. This is why the number of items examined during a random search is $\Theta(\log_{2^K} N) = O(\frac{M}{K})$.

2.3 Remove

The remove method, just like the find method, "moves down" the PTrie structure to look for the element to be deleted. If it does not reach a node of the list, or it does but the searched value differs from the value in the node's queue, it does not delete any element of PTrie because the element is not there. However, if it reaches a node of the list and the searched value turns out to be the value in the queue, it removes the value from the queue. If the queue remains empty after the removal, the node is removed from the list and the method returns to the "upper" layers of the prefix tree to delete any remaining empty layers.

2.3.1 Analysis

Since it is possible not only to go down the tree but also to come back upwards (when deleting a lower layer or a node of the list), the total length of the path traversed is bounded by $\Theta(2 \log_{2^K} N)$. If we delete a layer, it means there was only one way down from that layer, which implies that the ordered binary tree of that layer contained only one node (index). The layer is removed if it remains empty after the removal of the node from the ordered binary tree. So the number of operations necessary to remove a layer containing one element equals $\Theta(1)$.
In case of removal of layer $L_i$, if the ordered binary tree of the higher-level layer $L_{i-1}$ does not remain empty after removing the node that points to the empty layer we came from, it means that there could be at most $2^K$ nodes in that ordered binary tree. Deleting a value from the ordered binary tree costs $\Theta(\log_2 2^K) = \Theta(K)$. There is no point in "climbing" further up the upper layers, since the layer we came from would not be empty. At this stage the remove method ends. To sum up, the worst-case time complexity of the remove operation is $\Theta(2 \log_{2^K} N + K) = O(\frac{M}{K} + K)$.

2.4 Extract minimum and maximum

If the list is not empty, 'minimum' reads the value pointed to by the head of the list and 'maximum' reads the value pointed to by the tail of the list.

2.4.1 Analysis

The time complexity of these operations is $\Theta(1)$.

2.5 Iterators

The nodes of the list are linked. If we know the position of one of the nodes, we have direct access to its neighbors. The 'next' operation reads the successor of the currently pointed node. The 'prev' operation reads the predecessor of the currently pointed node.

2.5.1 Analysis

Moving from a node to its neighbor requires only reading the contents of the pointer 'next' or 'prev'. The time complexity of such operations equals $\Theta(1)$.

3 Correctness

PTrie has been designed so as not to assume that keys have to be positive numbers or only integers; they can even be strings (however, in most cases the weight of arcs is represented by numbers). To insert negative and positive integers into PTrie I use not one PTrie but two. One of the structures is destined exclusively for storing positive integers and the other for storing only negative integers. In the latter structure the integers are stored in reverse order on the list (for a machine of little-endian type).
Therefore, for the second PTrie structure (responsible only for negative integers) I used the standard PTrie operation PTrie2.maximum to extract the smallest value. Real numbers (for example in the ANSI/IEEE 754-1985 standard) can also be used to describe the weight of arcs, on condition that two interrelated PTrie structures are used to store the exponent and the mantissa. This is possible because the implementation of PTrie described here uses a queue, which makes it stable. One of the PTrie structures serves as storage for the exponent, and each of the nodes of its list contains an additional PTrie structure to store the mantissa.

4 Conclusions

The efficiency of PTrie (source code [5]) depends considerably on the pattern length $K$. $K$ is a chosen value, a power of two in the range [1, min($M$)]. The total size of the necessary memory is proportional to $\Theta(\log_{2^K} N \cdot 2^{K+1})$, because the number of layers required to store $N$ random elements in a PTrie of degree $2^K$ equals $\frac{\lg N}{\lg P} \cdot P$. Moreover, each layer has a tree of at most $2^K$ nodes and a table of $P$ elements, so the necessary memory bound equals $\Theta(\log_{2^K} N \cdot 2P) = \Theta(\frac{M}{K} \cdot 2^{K+1})$. For data types of constant size the maximum height of the Trie tree equals $\frac{M}{K}$, so the pessimistic time complexity of an operation is $O(\frac{M}{K} + K)$. For example, for four-byte numbers it is most effective to choose a pattern $K = 4$ bits long. Then the pessimistic number of steps necessary for an operation on PTrie equals $\frac{M}{K} + K = \frac{32}{4} + 4 = 12$. Increasing $K$ to $K = 8$ does not increase the efficiency of the structure, because $\frac{M}{K} + K = \frac{32}{8} + 8 = 12$. What is more, it unnecessarily increases the memory demand. A single layer consisting of $P = 2^K$ groups contains tables $P = 2^8 = 256$ links long for $K = 8$, whereas for $K = 4$ it contains only $P = 2^4 = 16$ links.
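The arithmetic in this comparison is easy to reproduce. The following sketch simply evaluates $\frac{M}{K} + K$ and $P = 2^K$ for several pattern lengths; the function name `ptrie_cost` is hypothetical, and the step count is the pessimistic bound from the text, not a measurement of any implementation.

```python
def ptrie_cost(m, k):
    """Return (pessimistic steps, links per layer table) for pattern length k.

    Hypothetical helper: steps = m / k + k and table size = 2 ** k,
    assuming m is a multiple of k.
    """
    if k <= 0 or m % k != 0:
        raise ValueError("this sketch assumes m is a positive multiple of k")
    return m // k + k, 2 ** k


for k in (1, 2, 4, 8, 16, 32):
    steps, table = ptrie_cost(32, k)
    print(f"K={k:2d}  steps={steps:2d}  table size={table}")
```

For $M = 32$ both $K = 4$ and $K = 8$ give 12 steps, but the table grows from 16 to 256 links, which is the trade-off described above.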
For variable-size data the time complexity equals $\Theta(\log_{2^K} N + K)$. Moreover, the pattern length $K$ must be carefully matched. For example, for strings $K$ should not be longer than 8 bits, because we could accidentally read contents from beyond the end of the string, which normally consists of one-byte characters. It is possible to record data of variable size in the structure provided each of the analyzed words ends with an identical key. There are no obstacles for strings, because they normally finish with an "end of line" sign. Owing to the reading of word keys and going through indexes (table look-up), the primary, partial operations of the PTrie method are very fast. If we carefully match $K$ with the data type, PTrie will certainly serve as a really effective Priority Queue [6].

References

[1] René de la Briandais, File Searching Using Variable Length Keys, Proceedings of the Western Joint Computer Conference, 295-298, 1959.
[2] Donald E. Knuth, The Art of Computer Programming Vol. 3, Addison Wesley Longman, Inc. 1998.
[3] Donald E. Knuth, The Art of Computer Programming Vol. 1, Addison Wesley Longman, Inc. 1998.
[4] William Pugh, Skip lists are a data structure that can be used in place of balanced trees, Communications of the ACM, 33(6) 668-676, June 1990.
[5] David S. Planeta, PTrie: Priority Queue based on multilevel prefix tree, Cornell University Computing and Information Science Technical Reports, TR2006-2023, 2006. [Online]. Available: http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cis/TR2006-2023 Available [source]: http://ptrie.sourceforge.net
[6] David S. Planeta, Linear Time Algorithms Based on Multilevel Prefix Tree for Finding Shortest Path with Positive Weights and Minimum Spanning Tree in a Networks, Cornell University Computing and Information Science Technical Reports, TR2006-2043, 2006. [Online]. Available: http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cis/TR2006-2043