Binar Shuffle Algorithm: Shuffling Bit by Bit

by William F. Gilreath, August 2008 (will@williamgilreath.com)

Abstract

Frequently, randomly organized data is needed to avoid anomalous operation of other algorithms and computational processes. An analogy is a deck of cards: the deck is ordered within the pack, but before a game of poker or solitaire it is shuffled to create a random permutation. Shuffling is used to ensure that the data elements of a sequence S are randomly arranged, avoiding an ordered or partially ordered permutation.

Shuffling is the process of arranging data elements into a random permutation. For a sequence S of N data elements, there are N! possible permutations. Among these, two are sorted placements of the data elements: the ascending and the descending permutation. Shuffling must avoid inadvertently creating either one.

Shuffling is frequently coupled to another algorithmic function, pseudo-random number generation. The efficiency and quality of the shuffle are then directly dependent upon the random number generation algorithm used. A more effective and efficient method of shuffling is to use parameterization to configure the shuffle, and to shuffle into sub-arrays by using the encoding of the data elements. The binar shuffle algorithm uses the encoding of the data elements and parameterization to avoid any direct coupling to a random number generation algorithm, while remaining a linear O(N) shuffle algorithm.

Keywords: permutation, randomize, shuffle, sort, unsort

Introduction

1. Concept of Shuffling

Shuffling is a process of re-ordering the data elements of a sequence from an initial permutation into a random arrangement, an arbitrary permutation. A shuffle algorithm scrambles data elements into a random placement without any apparent organizing key.

Shuffling is not as prominent as sorting in the computer science domain of algorithms. However, shuffling is the inverse of sorting: a shuffle is an unsort function, the logical complement of a sorting algorithm. Sorting places the data elements of a sequence into a very specific permutation, whereas shuffling puts the data elements into any of the possible permutations except an ordered one.

The correspondence between sorting and shuffling suggests a possible formalization of shuffling based upon sorting. If a shuffle algorithm is an unsort algorithm, the definition of a sorting algorithm can be used to formulate a definition for a shuffle algorithm.

2. Theoretical Definition of Shuffling

Shuffling is unworkable to define in the form of an unsort algorithm, that is, as the converse of the formal definition of sorting using mathematical relations. A more tangible definition of shuffling is a probabilistic one:

Given a sequence S of N records $R_0, R_1, \ldots, R_{N-2}, R_{N-1}$ arranged in a permutation $p(0)\,p(1) \ldots p(N-2)\,p(N-1)$, the sequence S is considered shuffled if, for the k-th possible selection of any two records $R_i$ and $R_j$ where $i \ne j$, the probability of $R_i \le R_j$ is equal to the probability of $R_i > R_j$, for $0 < k \le N!$.

For the k-th possible selection, the probability $P_k$ satisfies:

$P_k(R_i \le R_j) = P_k(R_i > R_j)$ where $i \ne j$ and $0 < k \le N!$
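One rough way to read this definition empirically is to count, over all pairs of positions i < j in a given permutation, how often the earlier value is less than or equal to the later one. The sketch below does that; the function name `ordered_pair_fraction` and the choice of counting position pairs are assumptions made for illustration, not part of the paper's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of position pairs (i < j) whose values satisfy a[i] <= a[j].
 * Hypothetical helper, not from the paper: 1.0 means fully ascending,
 * 0.0 fully descending (for distinct values), and a value near 0.5 is
 * what the probabilistic definition asks of a shuffled sequence. */
double ordered_pair_fraction(const int *a, size_t n)
{
    assert(a != NULL && n >= 2);
    size_t ordered = 0, total = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (a[i] <= a[j])
                ordered++;
            total++;
        }
    }
    return (double)ordered / (double)total;
}
```

This is a property of one permutation rather than a probability over selections, so it is only a proxy: a result far from 0.5 suggests a permutation that leans toward ascending or descending order.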
Thus, over one or many selections of any two distinct records, the lesser and greater relations are equally likely. No matter how many times two distinct records are selected, the overall probability of lesser or greater remains equal; there is no bias.

In contrast, a sorted sequence is defined by the following probability for the k-th possible selection:

$P_k(R_i \le R_j) = 1.0$ where $i < j$ and $0 < k \le N!$

For any two different records in a sorted sequence, taken at increasing positions, the records always maintain the lesser-than relation.

It is possible to define shuffling with finite probabilities for each relation:

$P_k(R_i \le R_j) = 0.5 \land P_k(R_i > R_j) = 0.5$ where $i \ne j$ and $0 < k \le N!$

Lesser and greater are equally likely for any k-th selection from the sequence.

Shuffling is a random permutation of the records in a sequence, so for any selection of records the two relations are equally likely, whether for one, two, three, or many selections. Conversely, sorting is an ordered permutation, so for two records at increasing or decreasing positions within the sequence, the relation of lesser than or greater than, respectively, is always probabilistically certain. For a sorted sequence the probabilistic definition of shuffling is invalid, and vice versa. Hence the probabilistic definition of shuffling is the converse of sorting.

3. Approaches to Shuffling

One approach to shuffling is to use the sequence generated by a single pseudo-random number generator to shuffle a sequence of data elements. One proposed approach [MacLaren and Marsaglia 1965] uses the sequences of two pseudo-random number generators to improve the properties of the random number generator: in effect, shuffling to more fully randomize the stochastic property of a sequence generated by a random number generator.
But such an approach need not be restricted to improving the quality of a random number generator. Using randomly generated values for a sequence that is then sorted is an obvious and intuitive approach to shuffling.

An improvement to the approach of MacLaren and Marsaglia, using only a single random sequence, was devised [Bays and Durham 1976] and is known as the Bays-Durham shuffle, or sometimes Bays' shuffle. While an improvement, it still couples the random number generation and the specific algorithm used into the shuffle. In both algorithmic approaches, the purpose of the shuffle was to improve an already existing random number generator, hence the random number generation was already a part of the algorithm.

For a shuffle that randomizes or scrambles data elements into a stochastic permutation, coupling to the performance complexity of the random number generator is a flawed approach. It seems almost inevitable that a random number generator algorithm is coupled to, and becomes part of the composition of, shuffling. The shuffle involves organizing the data elements, but this algorithm is then used as a composite with the random number generation algorithm to create the shuffle algorithm.

Knuth gives a shuffling algorithm, Algorithm M [Knuth 1998]. In Algorithm M, there is a step:

M2. [Extract j.] Set $j \leftarrow \lfloor kY/m \rfloor$, where m is the modulus used in the sequence $\langle Y_n \rangle$; that is, j is a random value, $0 \le j < k$, determined by Y.

This step in the algorithm couples the shuffle algorithm to the random number generator. The shuffle algorithm formulated by Bays and Durham [Bays and Durham 1976] is an optimization of Algorithm M, but has the same algorithmic step as Algorithm M.

Later, Knuth describes the Fisher-Yates algorithm, which he calls Algorithm P. There is a step:

P2. [Generate U.] Generate a random number U, uniformly distributed between zero and one.
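To make the coupling in step P2 concrete, here is a minimal sketch of a Fisher-Yates shuffle in C in which the random number generator is an explicit parameter. The names `fisher_yates`, `lcg_next`, and `lcg_state` are illustrative assumptions, and the small linear congruential generator merely stands in for whatever generator a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny linear congruential generator, a stand-in for "the random number
 * generator"; the multiplier and increment are the widely used
 * Numerical Recipes constants. */
static uint32_t lcg_state = 12345u;

uint32_t lcg_next(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return lcg_state;
}

/* Fisher-Yates (Knuth's Algorithm P): walk from the last slot down and
 * swap each slot with a randomly chosen slot at or below it. */
void fisher_yates(int *a, size_t n, uint32_t (*next_random)(void))
{
    assert(a != NULL && next_random != NULL);
    for (size_t i = n; i > 1; i--) {
        /* The coupling: every step needs a fresh random value.
         * Reducing modulo i is slightly biased, acceptable for a sketch. */
        size_t j = (size_t)((next_random() >> 16) % i);
        int tmp = a[i - 1];
        a[i - 1] = a[j];
        a[j] = tmp;
    }
}
```

The shuffle cannot run without `next_random`, so its cost and quality are tied to that generator.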
Once again, the shuffle algorithm is coupled to the random number generator algorithm. An improvement, and an implicit constraint, is that a shuffle algorithm is independent of the permutation of data elements. More simply, the shuffle algorithm is not directly coupled to a particular random number generation algorithm.

4. Summary

Shuffling is conceptually like an unsort algorithm; however, it is difficult to define shuffling strictly as the logical inverse of a sorting algorithm. A more workable approach to formalizing the concept of shuffling is to use a probabilistic definition. The distinction in formalization highlights the primary difference between shuffling and sorting: sorting has a mathematical relation between each data element in the sequence, whereas shuffling has a probabilistic relation among all the data elements.

Algorithm Synopsis

The binar shuffle algorithm uses the encoding of the data elements, more specifically the bits, to partition the data elements into a random organization. A data element is encoded as some arrangement of bits in a particular structure. For each bit, the binar shuffle places the element in a lower sub-array for a 0-bit, or in an upper sub-array for a 1-bit, partitioning the array of data elements.

Consider an ordinal byte, an encoding of eight bits from a most significant bit (MSB) to a least significant bit (LSB). Following the bits from the most significant to the least significant, and moving the data elements into lower and upper sub-arrays respectively, would create an ordered arrangement of the data elements: the binar sort. The sequence of bits for the encoding is the encoding order. To shuffle, however, the bits are used to place the data elements in a random arrangement by accessing bits 0 to 7 of the 8 bits in a random order, but not in the encoding order or the reverse encoding order.
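The constraint that bits are not visited in encoding order or reverse encoding order can be checked mechanically. The following sketch assumes the order of bit access is given as an array of bit positions with bit 0 as the least significant bit; the function name `is_encoding_or_reverse_order` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the bit positions are visited strictly MSB-to-LSB
 * (each position one less than the previous) or strictly LSB-to-MSB
 * (each position one more than the previous).
 * Bit 0 is taken to be the least significant bit (an assumption). */
bool is_encoding_or_reverse_order(const int *positions, size_t count)
{
    assert(positions != NULL);
    if (count < 2)
        return true;               /* a single bit is trivially in order */
    bool descending = true, ascending = true;
    for (size_t k = 1; k < count; k++) {
        if (positions[k] != positions[k - 1] - 1)
            descending = false;
        if (positions[k] != positions[k - 1] + 1)
            ascending = false;
    }
    return descending || ascending;
}
```

An order such as 7, 6, 5, ..., 0 is rejected as the encoding order, while a mixed order such as 3, 0, 6, 1, 7, 2, 5, 4 passes.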
The 8 bits are put into a bit schedule that determines how each bit is used to place the element into one of the sub-arrays. The M bits of a data element are scheduled by the index position of the bits and the bit value itself. This bit schedule is used to shuffle the data elements, for a given bit within the data element and a bit value. The bit schedule is passed to the binar shuffle as two arrays, one for the bit value at an index and the other for the bit position at an index.

For an array of N data elements, partitioning of an array into two sub-arrays continues for each bit in the encoding of a data element. Thus for an array of 8-bit byte data elements, a given bit schedule makes 8 passes over the N elements, or 8·N element visits, to shuffle the array.

Operation of Algorithm

The binar shuffle is exactly the same in process as the binar sort, so the discussion of that algorithm's operation applies here. The difference between the binar shuffle and the binar sort is that each bit is used to re-arrange the data elements into a non-ordered placement. Each bit is not used as a key to map a data element into a sub-array, but is compared against the bit schedule for placement to form a random permutation.

The binar shuffle algorithm operates both iteratively and recursively, in place, on the original array initially passed as a parameter. The algorithm operates in four discrete steps similar to the binar sort:

1. Evaluate for recursive base case.
2. Initialize starting array bounds.
3. Partition array into one or two sub-arrays.
4. Determine recursive call on sub-arrays.

Evaluate for Recursive Base Case

The first step is to evaluate the passed parameters for termination of the algorithm, to determine if a recursive base case has been reached. When one of the two possible base cases of the binar shuffle is reached, the recursion terminates, and the call returns without any further operation on the passed parameters.
The two criteria for the base case of the recursion are:

1. Reach the end of the bits in an element.
2. The size of a sub-array is one element.

Both cases are simple enough. The first case is reaching the end of the bits for a given data element; in effect, there are no more bits to extract for partitioning. In the second case, the bounds of the array parameter are evaluated to see if there are any elements to partition into a sub-array. For an array of one element there is no point in partitioning, as the element is in its final position. In either case the operation of the binar shuffle terminates, or the recursive method call returns.

Initialize Starting Array Bounds

If the binar shuffle algorithm does not terminate, the operation proceeds to the initialization step. The passed array bounds are used to initialize the bounds for the original array. The passed array bounds are retained for use later in the operation of the algorithm. The bounds of the original array are copied into variables that track the changing boundaries of the original array during partitioning. After initializing the original array boundaries, the operation proceeds to partition.

Partition Array into Sub-arrays

The heart of the binar shuffle algorithm, and its key operation, is the partitioning of the original array. The partition step of the algorithm divides the elements of the original array into lower and upper sub-arrays. The partition operation has three distinct steps:

1. Extract the nth bit from the data element in the selected position.
2. Using the bit value, place the data element in the sub-array.
3. Adjust the array boundary so the used sub-array is extended to encompass the element.

It is important to note that the selected position used in the operation is the lower position of the array.
The selected position acts as a point of focus for the operation of partitioning the array: the data element at the lower position is the working element, the element that is being placed into a sub-array by partitioning.

Bit Extraction

Before any partitioning is possible on the selected data element, the bit at the particular position in the bit schedule must be extracted to determine the sub-array in which to place the element. The process of bit extraction uses a shift operation to the left by n bits, and a bit mask to extract the bit as an integer zero or non-zero (not necessarily the integer value 1, but the integer value of the bit mask literal). The bit mask is a literal that depends on the data element word size. The bitwise logical AND operation masks the bits to either zero or the value of the literal bit mask.

Placement of Element

The placement of the data element depends upon the bit value from the bit extraction. Depending on the bit value and the bit in the bit schedule, the data element is placed in one of two ways. The selected data element is in the lower sub-array. The two possibilities are:

1. The data element is already placed in position in the lower sub-array.
2. The data element is in the wrong position and is exchanged with the element in the upper sub-array.

For a bit value that equals the schedule bit, the data element is in position in the lower sub-array, so nothing is done. For a bit value that is unequal to the schedule bit, the data element is exchanged, or swapped, with the data element in the upper sub-array. In either case, the array bounds are adjusted once the data element is placed.

Adjust Array Bounds

Once the data element is placed in the correct sub-array, the array boundaries are adjusted to encompass the element. For the lower sub-array, the lower array bound is incremented, and for the upper sub-array, the upper array bound is decremented.
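The three partition steps (extract, place, adjust) can be sketched as a single pass in C. This is one possible reading, not the author's implementation: the names `extract_bit` and `partition_pass` are hypothetical, bit position 0 is assumed to be the least significant bit, and the extraction here uses a right shift with the mask literal 0x00000001 rather than the left shift described in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the bit at `position` (0 = least significant, an assumption)
 * as 0 or 1, using a right shift and the mask literal 0x00000001. */
static uint32_t extract_bit(uint32_t value, int position)
{
    return (value >> position) & 0x00000001u;
}

/* One partition pass over data[lo..hi] (inclusive bounds).
 * Elements whose extracted bit equals schedule_bit stay in the lower
 * sub-array; the others are exchanged into the upper sub-array.
 * Returns the first index of the upper sub-array. */
int partition_pass(uint32_t *data, int lo, int hi,
                   int position, uint32_t schedule_bit)
{
    assert(data != NULL && position >= 0 && position < 32);
    while (lo <= hi) {
        if (extract_bit(data[lo], position) == schedule_bit) {
            lo++;                      /* already in the lower sub-array */
        } else {
            uint32_t tmp = data[lo];   /* exchange with the upper end */
            data[lo] = data[hi];
            data[hi] = tmp;
            hi--;                      /* upper sub-array grows downward */
        }
    }
    return lo;
}
```

After an exchange the lower bound is left where it is, so the element brought down from the upper end is examined next. The loop ends when the two bounds cross.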
In the operation of the algorithm, the lower and upper array bounds approach one another as each data element is placed in the correct sub-array.

Partition Repetition

The partitioning process continues iteratively until each data element is correctly placed in a sub-array. The iteration continues until the array bounds cross over or overlap, at which point all the elements are partitioned into the sub-arrays. The last step is to determine the recursive method invocation that continues the operation of the algorithm on the sub-arrays.

Determine Recursive Call on Sub-arrays

The last primary step of the operation of the binar shuffle is to continue the algorithm recursively on the sub-arrays. Depending upon the process of partitioning, it is possible that one or two sub-arrays were created. With one sub-array, no partitioning occurred; effectively the original array is undivided. The condition of not partitioning the original array into two sub-arrays is termed pass-through. With two sub-arrays, the partitioning process successfully divided the data elements of the original array into two sub-arrays.

The algorithm must determine which case is the result of partitioning: pass-through with one sub-array, or two sub-arrays. The original passed array bounds are evaluated against the array bound variables that changed during partitioning. From the evaluation, the recursion is either a single recursive call or two recursive calls. The parameters passed involve the original array bounds, the array bound variables modified during partitioning, and the original array. For either recursive call, the bit position advances from the current position i in the data element to the next position i+1.
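Putting the four steps together, a minimal recursive sketch in C might look as follows. It is an illustration under stated assumptions rather than the author's code: the parameter order mirrors the call form shuffle(lo, hi, index, data, indx, bits, nbits) used in this paper, the name `binar_shuffle` is hypothetical, bit position 0 is assumed to be the least significant bit, and extraction uses a right shift with mask 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recursive binar-style shuffle over data[lo..hi] (inclusive).
 * indx[k] is the bit position used at depth k (0 = LSB, an assumption)
 * and bits[k] is the schedule bit value compared against it. */
void binar_shuffle(int lo, int hi, int k, uint32_t *data,
                   const int *indx, const int *bits, int nbits)
{
    assert(data != NULL && indx != NULL && bits != NULL);

    /* Step 1: base cases, no bits left or at most one element. */
    if (k >= nbits || lo >= hi)
        return;

    /* Step 2: working bounds; lo and hi are kept for the recursion. */
    int lower = lo, upper = hi;

    /* Step 3: partition on the scheduled bit. */
    while (lower <= upper) {
        uint32_t bit = (data[lower] >> indx[k]) & 1u;
        if (bit == (uint32_t)bits[k]) {
            lower++;                       /* stays in lower sub-array */
        } else {
            uint32_t tmp = data[lower];    /* exchange into upper sub-array */
            data[lower] = data[upper];
            data[upper] = tmp;
            upper--;
        }
    }

    /* Step 4: pass-through (one sub-array) or two sub-arrays. */
    if (upper < lo || lower > hi) {
        binar_shuffle(lo, hi, k + 1, data, indx, bits, nbits);
    } else {
        binar_shuffle(lo, upper, k + 1, data, indx, bits, nbits);
        binar_shuffle(lower, hi, k + 1, data, indx, bits, nbits);
    }
}
```

For example, the values 0, 1, 2, 3 with bit positions {0, 1} and schedule bits {0, 1} end up as 2, 0, 3, 1. With positions running from most to least significant and all schedule bits 0, the same routine orders the values ascending, consistent with the description of the binar sort.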
Illustration of the Algorithm

The binar shuffle algorithm works on different data types (such as character, integer, ordinal, float, double, string) of different data sizes. The illustration of the binar shuffle uses example values moved into the lower or upper sub-arrays for a shuffle.

Consider an initial array of sixteen 32-bit word values [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15] to shuffle into a random permutation. The bit mask for the bit extraction is hexadecimal 0x00000001, or integer 1. A total of 8 bits are used from the 32-bit integer word, from bit positions 31 to 24. The bit values are an alternating pattern of one and zero, the series 1,0,1,0,1,0,1,0 respectively.

The initial array is partitioned by the binar shuffle algorithm into a new random permutation, [12 11 6 7 8 9 10 0 15 14 13 2 3 4 5 1], for the final shuffled array.

The initial call is of the form:

    shuffle(0, data.length-1, 0, data, indx, bits, 8);

The call to the method shuffle passes the sub-array bounds from 0 to length-1 of the data array, an initial index position of 0, the data sub-array (the initial array passed), the index of bit positions, the bit schedule, and the number of bits to use in shuffling the array.

There are several recursive invocations, but only the initial first pass that partitions the array into sub-arrays is used for illustration. Initially the two resulting sub-arrays are empty, and the starting array contains all 16 data elements. The array and sub-arrays have the data elements at the start:

Array = [][0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15][]

The first value selected for partitioning is the data element with integer value 0. The extracted bit is a 0-bit and the bit value in the bit schedule is a 0-bit, so the two are equal. Thus the integer value 0 is in the correct lower sub-array.
In this case, the sub-array boundaries are incremented to include the data element, as there is no data element exchange. The resulting arrays are:

Array = [0][1 2 3 4 5 6 7 8 9 10 11 12 13 14 15][]

The second value selected for partitioning is the data element with integer value 1. The extracted bit is a 1-bit, and the bit value in the bit schedule is a 0-bit. The extracted bit and the scheduled bit are not equal, so the data element is exchanged with the data element in the upper sub-array. The resulting arrays are:

Array = [0][15 2 3 4 5 6 7 8 9 10 11 12 13 14][1]

The next data element partitioned is the integer value 15, and so forth. This process of partitioning into sub-arrays of shuffled data elements using the extracted bit value continues recursively. The final permutation of the array is:

Array = [12 11 6 7 8 9 10 0 15 14 13 2 3 4 5 1]

The initial array of sorted elements is partitioned into a final permutation that is randomly arranged.

One important point in illustrating the operation of the binar shuffle is that there are several parameters to this particular configuration of the algorithm: the initial permutation of the data d', the data element bit size w, the number of bits used n, the bit schedule s, and the bit indices i. Another unique final permutation from the binar shuffle is possible with a different parameterization B(n,s,i), where the three parameters are the number of bits used n, the bit schedule s, and the indices i.

Analysis of Algorithm

The binar shuffle is linearly proportional in performance in both time and space, or Big-Oh of O(n).

a. Space

The space performance, or memory used by the binar shuffle algorithm, is the simplest analysis. The binar shuffle uses the data, the indices and the bit schedule, along with the number of bits to use and the starting index position.
The indices, bit schedule, and number of bits to use are read-only constants that are never updated. The index position is updated, as are the sub-array boundaries. The starting array of data elements and the following sub-arrays are the same array structure, passed recursively. Hence the size of the data elements is N, and the bit schedule and indices are of size N. The other parameters passed and used within the binar shuffle are a constant number c.

The space complexity is the sum of two quantities, the arrays and the variables. The three arrays are of size N, and thus in Big-Oh notation are $c \cdot N$. The variables are a constant number c, so the sum that gives the space complexity is the arithmetic expression $c \cdot N + c$. Using Big-Oh notation, the space performance is $O(c \cdot N + c)$, or more simply O(N), which is linear space complexity.

b. Time

The time performance, or runtime performance, of the binar shuffle algorithm is more involved. The recursive partitioning into sub-arrays has the potential for complex mathematical analysis. Before delving deeper into the analysis of the time performance of the binar shuffle, there are several specific points to consider. The three points that relate to time performance are:

1. Each bit at an index is used only once to shuffle.
2. Each data element for a bit is accessed only once.
3. For a given data element bit size of w, only s bits are used, where $0 < s \le w$.

For an array of N elements, there is a constant number c of bits to use, the given size s of bits. Each bit extracted is used once in the shuffle, and a data element has its bit accessed once. Thus the time performance is a product of the number of bit accesses a, the number of bits used c, and the number of data elements N.
The time complexity is expressed in the form of the arithmetic expression:

$T = a \cdot c \cdot N$

Since each bit is accessed once for each pass to shuffle the data elements, the time performance expression simplifies to:

$T = 1 \cdot c \cdot N = c \cdot N$

Thus the time performance is linear, $O(c \cdot N)$, or more simply O(N).

A different way to examine the time performance is to view the array of N elements, shuffled using s bits, as a matrix of N rows and s columns. The size of N can vary, but the number of bits s is a constant c. The algorithm accesses each of the $N \times c$ cells of the matrix once. Thus accessing all the cells in the matrix once has a time complexity of $O(N \cdot c) = O(c \cdot N) = O(N)$.

Performance of Algorithm

A. Test of Performance

The binar shuffle algorithm was implemented in the C programming language and compiled with the GNU C Compiler (gcc) for the PowerPC platform.

The performance test varies the data set size from an initial size up to ten times that size. A shuffle algorithm randomizes data, so the elements of the test set need only be in a sorted or ordered organization. The test data set uses unique, non-repeated data elements to avoid any potentially anomalous data elements, and to test the performance of shuffling on each data element in a data set only once.

The test program creates a data set of ordered 32-bit integers, varying from an initial size of 200,000 elements to 2,000,000 elements in increments of 200,000. The test data set was ordered in both ascending and descending order, and the integers were non-repeated. The test program was executed several times to avoid any spurious inconsistencies.

B. Performance Results

The results of the performance tests on the generated data sets were consistent with the theoretical analysis.

[Figure: Graph of Size to Performance of Binar Shuffle (series: Binar Shuffle; horizontal axis: Size, 1K Elements)]

One unexpected result was that as the data set size increased linearly, the running time also increased linearly but with a slope less than one. This is still a Big-Oh complexity of O(N), with a constant c < 1. The constant varied depending on the size of the test data set, but remained within $0.417 \le c \le 0.5$, or more generally $0 < c < 1$.

The binar shuffle algorithm performance is consistent with the theoretical analysis as a Big-Oh O(N) linear algorithm. The constant fluctuates with the test data set size, but stays within a consistent range for linear performance.

Future Work

The binar shuffle algorithm is by no means finished; there is much further work to do and there are many possibilities with the algorithm. Future work has the goals of improving the performance of the binar shuffle, optimizing its implementation, and potentially generalizing it for many data types and encodings.

All three avenues for future work must avoid creating a non-linear algorithm, thus preserving performance in both time and space. Throughout these optimizations, the binar shuffle must avoid the anathema of a shuffle algorithm: creating an ordered or partially ordered permutation, inadvertently sorting the array of data elements. These constraints form the bounds for future work with the binar shuffle algorithm.

There are several potential areas for future work with the binar shuffle algorithm:

1. Variations of the algorithm
2. Bit scheduling
3. Parallelization
4. Dynamically adjust for re-shuffle

These four areas are not the only potential prospects, but are apparent avenues for future work.

A. Variations

Variations of the binar shuffle algorithm fall mainly into two variants:

1. Translation of the binar shuffle to other programming languages and paradigms.
2. Iterative implementation of the binar shuffle algorithm.

a. Translation to Other Programming Languages

One area of future work is to port the binar shuffle to other programming language paradigms. Often an algorithm is easily implemented in one programming language and paradigm, but is much more difficult to translate and port to another. However (famous last words before eating them), it is not impossible, as every programming language is equivalent in Turing completeness. Hence in theory the binar shuffle algorithm can be translated and ported to any other programming language, although in practice it might be more or less difficult to do so.

One programming paradigm of interest is that of functional programming languages. The interesting point of consideration is using the functional paradigm to implement the binar shuffle, and how the binar shuffle algorithm operates under a functional approach to its implementation. It would be equally interesting to port the binar shuffle to other programming languages that take a different approach to implementation, and have unique and novel features.

The question of how the binar shuffle algorithm changes in comparison to other implementations is answered by translation.
The answer requires actual translation to evaluate the new version of the binar shuffle, but from different implementations in different paradigms, much better insights into the binar shuffle can be gained.

b. Iterative Implementation of the Algorithm

The binar shuffle algorithm is primarily a recursive algorithm: after each shuffle on a bit at an index, the created sub-arrays are recursively shuffled for the next bit index at the successive position. However, recursion can be costly, as it creates stack frames on the runtime stack.

One potential improvement is to use an iterative rather than a recursive approach, which raises the question of whether it is possible to convert the recursive binar shuffle to an iterative binar shuffle. If it is possible, the question remains of how it alters the implementation of the binar shuffle algorithm.

Another optimization is to use iteration partially, if a full conversion to complete iteration is not viable. In the case that a shuffle creates only a single sub-array, rather than continuing recursively, the next shuffle for a bit at an index is handled iteratively. This is a partial optimization that avoids the overhead of recursion when there are not two sub-arrays to partition with a shuffle.

B. Bit Scheduling

Future work with bit scheduling involves two potential improvements to the way the binar shuffle uses the bits to shuffle the data elements. The two possible improvements are:

1. Reverse bit schedule from the encoding
2. Ubiquitous bit schedule for any encoding

a. Reverse Bit Schedule

For a given data set of elements, the elements have a particular encoding. That is to say, the encoding follows a specific bit index, an order of the bits within a data element from the most significant bit (MSB) to the least significant bit (LSB). When the bit schedule follows the bit order, an ordered permutation is effectively generated, the same as the binar sort.
Thus following the reverse bit schedule, from the least significant bit to the most significant bit for a given encoding, is a potential generalization of the binar shuffle. This optimization would eliminate the need to pass a bit index as part of the bit schedule for a given data element type. The bit index would be coded as part of the binar shuffle algorithm for the specific type of data element passed for randomization.

The trade-off for this improvement is that a specific type in a specific encoding would require a binar shuffle implementation for that type and encoding.

b. Ubiquitous Bit Schedule

Improving and optimizing the bit schedule for a shuffle improves the efficacy of the binar shuffle algorithm by ensuring that, for any possible permutation, the algorithm will never by design create an ordered or partially ordered permutation of the data elements.

C. Parallelization

Parallelization of the binar shuffle involves creating a parallel variant of the binar shuffle. The two possible (and again, by no means the only viable) approaches to parallelization are:

1. Use the created sub-arrays.
2. Utilize each bit at an index.

a. Use the Created Sub-arrays

Parallelize the operation of the binar shuffle so that, at some point, each created sub-array is handled by a single processor. The sub-arrays are mutually exclusive of one another, and thus can be shuffled recursively, independent of the other sub-arrays. Once all the processors finish shuffling the sub-arrays recursively, each is gathered into the original array.

b. Utilize Each Bit at an Index

Each bit for a shuffle is mapped to a specific processor in the parallel system, thus parallelizing the binar shuffle over each of the bits at an index used. Each shuffle uses a specific bit that is independent of the other bits in the data elements.
At some point, when each processor finishes shuffling using its bit, the shuffled arrays are gathered into a complete array. The gathering of the shuffled arrays on each processor is more involved, as each bit index creates a unique permutation on each processor. The last serial process of the parallel binar shuffle is the integration of each data element from each permutation on each processor into a final permutation of the array.

Both approaches to parallelization retain some serial algorithmic process in the binar shuffle algorithm. An open research question for future work is a parallelization that minimizes or avoids any serial step in the binar shuffle algorithm.

D. Dynamic Adjusting for Re-shuffle

One possible area of further work is dynamic parameterization, in which the binar shuffle algorithm dynamically re-adjusts the parameters of the shuffle. Hence, for a given final permutation, the binar shuffle will re-shuffle should the shuffle not be optimal for the parameters originally passed. Such a self-adjusting binar shuffle would re-configure the number of bits used and the bit schedule of indices and values. The open question is what criteria are used to determine whether the final permutation needs to be re-shuffled under different parameters.

Two possible metrics for re-shuffling the array are:

1. Compare the positions of the data elements in the final permutation with their original positions in the array.
2. Analyze the final permutation to determine if any triple of elements is ordered, or sorted, such as a ≤ b ≤ c.

The shuffle is repeated either on the original array of data elements or on the final permutation of the array. An automatic re-shuffling would avoid any partially ordered, incompletely shuffled, or ordered permutations of the data elements.
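The two metrics listed above can be sketched as simple counting functions. This is a minimal illustration under assumptions that the text does not fix: the class and method names are hypothetical, "triple" is read as three adjacent elements, positions are compared by element value (which over-counts when the data contains duplicates), and the thresholds passed to `needsReshuffle` are placeholders for whatever criteria future work settles on.

```java
// Illustrative sketch only; names, the adjacent-triple reading, and thresholds are assumptions.
final class ReshuffleMetrics {

    // Metric 1: how many positions hold the same value before and after the shuffle.
    static int fixedPoints(int[] original, int[] shuffled) {
        if (original.length != shuffled.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int count = 0;
        for (int i = 0; i < original.length; i++) {
            if (original[i] == shuffled[i]) {
                count++;
            }
        }
        return count;
    }

    // Metric 2: how many adjacent triples satisfy a <= b <= c.
    static int orderedTriples(int[] a) {
        int count = 0;
        for (int i = 0; i + 2 < a.length; i++) {
            if (a[i] <= a[i + 1] && a[i + 1] <= a[i + 2]) {
                count++;
            }
        }
        return count;
    }

    // Re-shuffle when either metric exceeds its (hypothetical) threshold.
    static boolean needsReshuffle(int[] original, int[] shuffled, int maxFixed, int maxTriples) {
        return fixedPoints(original, shuffled) > maxFixed
                || orderedTriples(shuffled) > maxTriples;
    }
}
```

A caller would run the shuffle, evaluate `needsReshuffle`, and, if it returns true, repeat the shuffle with a different number of bits or a different bit schedule.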
Other future work on improvements and optimizations is possible in the context of applications and libraries that utilize the binar shuffle.

Conclusion

The binar shuffle is an O(N) linear, universal, recursive shuffling algorithm. The binar shuffle utilizes the universal property of the encoding of the data elements for shuffling. The bits of the encoding of the data elements are used to take the array from an initial starting permutation into a randomly arranged permutation. The extraction, exchange, and partitioning of the array into sub-arrays is the shuffle process. The binar shuffle is universal, utilizing the encoding of data elements instead of any other a priori property of the data.

The binar shuffle algorithm is a configurable or parameterized shuffle algorithm. The specific operation of the algorithm is configured by a 3-tuple of the number of bits from the encoding to use, the bit index, and the bit values. The bit index and bit values form the bit schedule used for the shuffle.

The use of parameterization separates the binar shuffle from bias to a pseudo-random number generator. Instead, the binar shuffle algorithm uses a bit schedule, an index of bit positions and bit values, to organize an array of data elements into a random permutation. The random aspect of the shuffle is external to the algorithmic operation.
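The 3-tuple that configures the shuffle can be pictured as a small immutable value. The sketch below is only an illustration: `BitSchedule`, its field names, the `reversed` helper, and the assumption of a 32-bit encoding with bit indices in the range 0 to 31 are not taken from the paper.

```java
// Illustrative sketch only; the type, its fields, and the 32-bit index range are assumptions.
final class BitSchedule {
    final int count;      // number of bits from the encoding to use
    final int[] index;    // bit index for each step of the shuffle
    final int[] values;   // bit value (0 or 1) matched at each step

    BitSchedule(int count, int[] index, int[] values) {
        if (count < 0 || count > index.length || count > values.length) {
            throw new IllegalArgumentException("count does not fit the schedule arrays");
        }
        for (int k = 0; k < count; k++) {
            if (index[k] < 0 || index[k] > 31) {
                throw new IllegalArgumentException("bit index out of range: " + index[k]);
            }
            if (values[k] != 0 && values[k] != 1) {
                throw new IllegalArgumentException("bit value must be 0 or 1: " + values[k]);
            }
        }
        this.count = count;
        this.index = java.util.Arrays.copyOf(index, count);
        this.values = java.util.Arrays.copyOf(values, count);
    }

    // Same schedule entries applied in the opposite order.
    BitSchedule reversed() {
        int[] idx = new int[count];
        int[] val = new int[count];
        for (int k = 0; k < count; k++) {
            idx[k] = index[count - 1 - k];
            val[k] = values[count - 1 - k];
        }
        return new BitSchedule(count, idx, val);
    }
}
```

Keeping the schedule in a value of its own reflects the point made above: whatever randomness is wanted is decided when the schedule is built, outside the shuffle itself.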
Appendix A - Java Source Code

public final class BinarShuffleTest
{
    private BinarShuffleTest(){}

    public final static void shuffle(final int lo_bound, final int hi_bound, final int pos,
                                     final int[] array, final int index[], final int[] bits,
                                     final int size)
    {
        System.out.print(">>> lo = "+lo_bound+"; hi = "+hi_bound);
        System.out.print("Index: "+pos+"; Position: "+index[pos]);
        System.out.println(" <<<");
        System.out.println();

        if(pos > size || lo_bound >= hi_bound) return;

        int lo = lo_bound;
        int hi = hi_bound;

        while(lo < hi+1)
        {
            // bit of array[lo] at the 1-based position index[pos]
            // (the listing as extracted had the two shift operands swapped)
            final int bit = (array[lo] >> index[pos]-1) & 0x00000001;

            System.out.print(">>> lo = "+lo_bound+"; hi = "+hi_bound);
            System.out.print("Index: "+pos+"; Position: "+index[pos]);
            System.out.println(" <<<");
            System.out.println();
            printArray(array);

            if(bit == bits[pos])
            {
                lo++;
            }
            else
            {
                int temp = array[hi];
                array[hi] = array[lo];
                array[lo] = temp;
                hi--;
            }//end if
        }//end while

        if(lo == hi_bound + 1)
        {
            // only one sub-array was created
            shuffle(lo_bound, hi_bound, pos+1, array, index, bits, size);
        }
        else
        {
            shuffle(lo_bound, lo-1, pos+1, array, index, bits, size);
            shuffle(lo, hi_bound, pos+1, array, index, bits, size);
        }//end if
    }//end shuffle

    public final static void printArray(final int[] array)
    {
        System.out.print("Array = [");
        for(int x=0;x
        ...

...

#include <stdio.h>
#include <stdlib.h>

#define WIN32 1
#define TRACE 0

#ifdef WIN32
#include <time.h>
#else
#include <sys/time.h>
#endif

#ifdef WIN32
clock_t start, stop;
#else
struct timeval start, stop;
#endif

#ifdef WIN32
double duration(const clock_t first, const clock_t last)
{
    return (double)(last - first) / CLOCKS_PER_SEC;
}
#else
double duration(struct timeval first, struct timeval last)
{
    return (double) (1000000 * (last.tv_sec - first.tv_sec) + (last.tv_usec - first.tv_usec));
}
#endif

void printArray(const int array[], const int len)
{
    int x = 0;
    printf("Array = [");
    for(x=0;x
    ...

    ... >> lo = %d; hi = %d; Index: %d; Position: %d <<< \n\r", lo_bound,hi_bound,pos,index[pos]);

    if(pos > size || lo_bound >= hi_bound) return;

    //uint = 32-bits+1 = 33
    lo = lo_bound;
    hi = hi_bound;

    while(lo < hi+1)
    {
        /* bit at index[pos], counted from the MSB; normalized to 0 or 1 so it can match bits[pos] */
        const int bit = (((unsigned int)array[lo] << index[pos]) & 0x80000000u) != 0;

        if(TRACE) printf("lo = %d; hi = %d ; Index: %d ; Position: %d; \n\r", lo,hi,pos,index[pos]);
        if(TRACE) printArray(array,hi_bound-lo_bound+1);

        if(bit == bits[pos])
        {
            lo++;
        }
        else
        {
            int temp = array[hi];
            array[hi] = array[lo];
            array[lo] = temp;
            hi--;
        }//end if
    }//end while

    if(lo == hi_bound + 1)
    {
        binar_shuffle(lo_bound, hi_bound, pos+1, array, index, bits, size);
    }
    else
    {
        binar_shuffle(lo_bound, lo-1, pos+1, array, index, bits, size);
        binar_shuffle(lo, hi_bound, pos+1, array, index, bits, size);
    }//end if
}//end binar_shuffle

void test_shuffle_ascend(const int size)
{
    double time = 0.0;
    int x = -1;
    int indx[] = { 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16,
                   15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
    int bits[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
                   0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0 };
    int* data = malloc(sizeof(int)*size);

    for(x=0;x
    ...

    ... =0;x--)
    {
        data[x] = x;
    }//end for

    start = clock();
    binar_shuffle(0, size-1, 1, data, indx, bits, 4);
    stop = clock();

    time = duration(start,stop);
    printf("Time to shuffle size %d is %f seconds \n\r",size,time);
    printf("%d,%f \n\r",size,time);
}//end test_shuffle_dscend

#define DELTA 200000
#define LIMIT 10

int main()
{
    int x = -1;
    printf("\n\r");
    for(x=0;x
    ...
