Explanation and exact formula of Zipf's law evaluated from rank-share combinatorics

This work proves that ranks and shares are statistically dependent on one another, based on simple combinatorics. It presents a formula for the rank-share distribution and illustrates that Zipf's law descends from the expected values of the various ranks in …

Authors: A Shyklo

ABSTRACT

This work proves that ranks and shares are statistically dependent on one another, based on simple combinatorics. It presents a formula for the rank-share distribution and illustrates that Zipf’s law descends from the expected values of the various ranks in the new distribution. All conclusions, formulas and charts presented here were tested against publicly available statistical data in different areas. The correlation coefficient between the calculated values and the statistical numbers provided by the Bureau of Labor Statistics was 0.99899. Monte-Carlo simulations were performed as additional evidence.

Introduction

The mysterious Zipf’s law has astonished researchers for over 100 years. It was initially presented by Jean-Baptiste Estoup [1] in 1908. He observed a strange proportional dependency between the frequencies of word usage in texts. Later it was observed in many languages that the frequency of the most common words is proportional to 1/rank. For example, the word “the” is the most commonly used word in the English language. The second most common, “of”, is used about half as much as the first. The third, “and”, is used about a third as much as the first, and so on. This dependency was popularized by and named after George Kingsley Zipf [2], a linguist from Harvard University. It was used in 1913 by the German physicist Felix Auerbach in the “Law of Population Concentration” to describe the size distribution of cities. In 1991, Wentian Li demonstrated [3] that randomly generated texts follow the same frequency distribution as real languages.

The pattern is distinct in a great deal of research and forms a recognizable empirical distribution. It can be found in the use of words, in city populations, last names, the distribution of wealth, the frequency of natural disasters, market behavior etc.
It is distinct in the 80/20 rule.

There were multiple attempts to explain it, and this was partially done in many publications [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]; however, the exact math behind it remained unknown, even after a century of research.

This work started as a practical attempt to apply the latest statistical formulas to real-life data. Working with large datasets, we were surprised to find inaccuracies in the existing equations. Closer observation of the various data samples revealed a statistical dependency between rank, share and number of participants. Further analysis led to a solid understanding of the combinatorics driving the ranking process and to an exact formula for the rank-share distribution, which provides the key to understanding Zipf’s law and the Pareto principle.

Results

To demonstrate the dependency between rank and share, let’s assume that we have a combined volume T shared between N participants. Using combinatorics principles we can calculate that there are

$$\frac{(T+N-1)!}{T!\,(N-1)!}$$

ways to split the volume. If we sort and rank each case and count how many times the share of some rank equals a certain number (S), we can calculate the probability of this event. If the outcome (rank k has the share S) appears x times, the probability of this outcome can be calculated as:

$$P(T, N, k, S) = x \cdot \frac{T!\,(N-1)!}{(T+N-1)!}$$

To illustrate it, let’s look at this simplified example. Let’s assume that we had 3 companies, which sold 10 items combined. There are 66 possible combinations of how they can split the market volume (T=10). If we sort each combination and count how many times rank one has each value from 0 to 10, we can create the following chart (Figure 1).

From this chart we can see, for example, that the probability of Rank 1 having the share = 7 of 10 (or 70%) is 12/66 or 18.1818%.
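The counting in this example can be reproduced by brute force. The following is a minimal sketch, not the author's code: it enumerates every ordered split of T=10 items among N=3 participants, sorts each split, and tallies the share held by each rank. The names `rank_share_counts`, `total` and `participants` are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import comb


def rank_share_counts(total, participants):
    """Count, for every rank, how often each share occurs over all ordered splits."""
    counts = [Counter() for _ in range(participants)]
    n_splits = 0
    # Choose the first N-1 shares freely; the last share is whatever remains.
    for head in product(range(total + 1), repeat=participants - 1):
        rest = total - sum(head)
        if rest < 0:
            continue  # the chosen shares already exceed the total
        ranked = sorted(head + (rest,), reverse=True)  # rank 1 first
        n_splits += 1
        for rank_index, share in enumerate(ranked):
            counts[rank_index][share] += 1
    return n_splits, counts


n_splits, counts = rank_share_counts(total=10, participants=3)
formula_count = comb(10 + 3 - 1, 3 - 1)  # (T+N-1)! / (T! (N-1)!)
print(n_splits, formula_count)
print(counts[0][7], counts[0][7] / n_splits)  # Rank 1 holding 7 of 10
```

Ties and zero shares are counted as they come in this sketch; other ranking conventions are possible.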
So we can calculate the probability of every share value for Rank 1.

Figure 1: Number of combinations vs shares for Rank 1 of 3 (T=10)

| Share | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of combinations (Rank 1) | 0 | 0 | 0 | 0 | 6 | 15 | 15 | 12 | 9 | 6 | 3 |

Similar charts can be created for Rank 2 and Rank 3.

Figure 2: Number of combinations vs shares for Ranks 2 and 3 (T=10, N=3)

| Share | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of combinations (Rank 2) | 3 | 9 | 15 | 21 | 15 | 3 | 0 | 0 | 0 | 0 | 0 |
| Number of combinations (Rank 3) | 30 | 21 | 12 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Using this logic we can calculate Probability Density Functions (PDFs) for various ranks, N and T.

Figure 3: Probability Density Functions for all ranks (N from 2 to 5). Panels: Share PDFs of 2, 3, 4 and 5 participants; horizontal axis: share (%); vertical axis: probability.

We were able to derive the universal formula for the PDFs of the rank-share distribution as:

$$P(N, k, S) = \frac{(N-1)\,N!}{(k-1)!\,(N-k)!} \sum_{i=k}^{\min\{N,\lfloor 1/S \rfloor\}} \left( (-1)^{i-k} \cdot \frac{(N-k)!}{(i-k)!\,(N-i)!} \cdot (1 - iS)^{N-2} \right)$$

To prove this formula, we tested it against statistical data and performed Monte-Carlo simulations. Figure 4 represents the same distributions for N = 4 calculated based on 500000 random outcomes.

Figure 4: Monte-Carlo Simulation of N = 4

The expected values of share for each rank can be calculated as:

$$S = \frac{1}{N} \sum_{i=k}^{N} \frac{1}{i}$$

where S represents the share for rank k, and N the number of participants.

Figure 5: Expected values of the rank-share distribution.
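The PDF and expected-value formulas above can be cross-checked numerically. The following is a minimal sketch under stated assumptions: the factorial ratios are written as binomial coefficients (`comb`), the share S is a fraction of the total (T = 1), and the helper names `rank_share_pdf`, `expected_share` and `integrate` are illustrative, not taken from the paper. A simple midpoint rule stands in for exact integration.

```python
from math import comb, floor


def rank_share_pdf(n, k, s):
    """Density of the share s (0 < s < 1) held by rank k among n participants."""
    d = min(n, floor(1 / s))  # last term of the sum that still contributes
    prefactor = n * (n - 1) * comb(n - 1, k - 1)  # (N-1) N! / ((k-1)! (N-k)!)
    total = 0.0
    for i in range(k, d + 1):
        total += (-1) ** (i - k) * comb(n - k, i - k) * (1 - i * s) ** (n - 2)
    return prefactor * total


def expected_share(n, k):
    """Expected share of rank k: (1/N) * sum over i = k..N of 1/i."""
    return sum(1 / i for i in range(k, n + 1)) / n


def integrate(f, steps=20000):
    """Midpoint rule on the interval (0, 1)."""
    h = 1 / steps
    return sum(f((j + 0.5) * h) for j in range(steps)) * h


for n in range(2, 6):
    for k in range(1, n + 1):
        area = integrate(lambda s: rank_share_pdf(n, k, s))
        mean = integrate(lambda s: s * rank_share_pdf(n, k, s))
        print(n, k, round(area, 4), round(mean, 4), round(expected_share(n, k), 4))
```

Each row should show an area close to 1 and a numerical mean close to the closed-form expected share, which is the consistency the text relies on.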
Figure 6: Expected values for Zipf’s law in double log scale calculated for N from 1 to 100

The expected values of the rank-share distribution give us the dependency between rank and frequency known as Zipf’s Law. To prove it we tested it on publicly available datasets with known N. For example, we know that Canada has 13 provinces and territories, Brazil has 27 federative units and the US has 50 states. We can get statistics of the area distribution for each country from these sources [18], [19], [20].

Figure 7: Shares of the area of the states in the US, Canada and Brazil on double log scale (real vs calculated). Series: Canada [13], Brazil [27], US [50] (real); N 13, N 27, N 50 (calculated).

Another example is the distribution of letters among European languages, published here [21]. We know exactly how many letters are in each language.

Figure 8: Frequencies of letter usage in languages on double log scale (real vs expected values).

Table 1: Correlation coefficients between real and calculated distributions of letter usage.

| Language | Correlation | Language | Correlation |
|---|---|---|---|
| Czech | 0.972696909 | Icelandic | 0.984039287 |
| Danish | 0.983906839 | Italian | 0.974405202 |
| Dutch | 0.972386224 | Polish | 0.99029575 |
| Esperanto | 0.979436537 | Portuguese | 0.984913919 |
| Finnish | 0.977592954 | Spanish | 0.985479455 |
| French | 0.971366449 | Swedish | 0.969368024 |
| German | 0.98786782 | Turkish | 0.991741606 |

To achieve a more accurate verification, we bundled together numbers provided by the Bureau of Labor Statistics of the US Department of Labor [22]. We analyzed Occupational Employment and Wages distributions between 22 categories in more than 50 US cities. We ranked the categories for each city, then we calculated the average share for each rank from 1 to 22, and compared the results to the calculated values. The observed correlation coefficient was 0.99899712.
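The comparison described above can be sketched in a few lines. This is an illustration only: the `observed` list holds the five-city averages printed in Table 5 of this paper (not the full set of more than 50 cities), so it is not expected to reproduce the reported coefficient exactly. The function names are assumptions made for this sketch.

```python
from math import sqrt

# Average share per rank (percent) over five cities, copied from Table 5 of this paper.
observed = [16.2, 11.08, 9.36, 7.76, 7.06, 6.08, 5.32, 4.9, 4.64, 4.18, 3.94,
            3.04, 2.84, 2.68, 2.42, 2.2, 1.72, 1.46, 1.28, 1.0, 0.66, 0.18]


def expected_shares(n):
    """Expected share (percent) of every rank 1..n: (1/N) * sum over i = k..N of 1/i."""
    return [100 * sum(1 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


calculated = expected_shares(len(observed))
r = pearson(observed, calculated)
print(f"correlation between observed and calculated shares: {r:.5f}")
```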
Conclusion

We demonstrated that rank and share are statistically related and derived an exact formula for the rank-share distribution. This is a universal law, which can be applied to any area. That’s why we can find it in completely different places (words, population, markets). The expected values of the new distribution give us the dependency between rank and frequency known as Zipf’s Law. We can rank the distribution of words, the frequency of natural disasters, the population of cities or the spread of income, and observe the same pattern caused by simple rank-share combinatorics.

Discussion

The PDF formula of the rank-share distribution contains binomial coefficients and probably could be simplified using binomial equations. The rank-share distribution possibly belongs to the binomial series. It still needs to be classified. We spent significant time trying to derive it from known distributions. We were able to partially derive a dependency between the rank-share and Negative-Binomial distributions for some ranks. However, a universal dependency still has to be established.

For this work we concentrated on continuous solutions, assuming that T (the number of shared items) is large enough to be considered as ∞, but it would be interesting to derive an exact formula for discrete solutions including T as a parameter.

Methods

We started by analyzing big sets of statistical data and observed strong recognizable patterns between ranks, shares and the number of participants. We also noticed that the possible share values for each rank were located within a certain range and worked out the logic for the ranges: the maximum share of Rank k is always 1/k. The minimum share is 0 for all ranks except Rank 1 (where the minimum is 1/N). To explain the logic behind it, imagine the case when we have just two participants.
They split the shares in some proportion (50/50, 60/40 and so on). The participant ranked #1 cannot have a share of less than 50%. In the same way, participant #2 cannot have a share of more than 50%. So #1 is distributed between 1/2 and 1, while #2 is between 0 and 1/2. In the case of 3 players, the lowest possible value for #1 is 1/3 (the case of equal distribution between #1, #2 and #3). The highest possible value for #2 is 1/2 (the case 50/50/0). As a result we have #1 between 1/3 and 1, #2 between 0 and 1/2, and #3 between 0 and 1/3. We tested this logic for many N and it fit the real data perfectly.

Figure 9: Expected values for N=22, calculated vs. average statistical values. Series: Average Stat, Calculated.

At some point we realized that combinatorics may also govern the distributions of shares for each rank. To test it we created a simple Python algorithm. We used N nested loops with N variables and counted only the cases where the sum of all variables equals T. Then we could rank each case, indexing all the shares used for each rank.

We must acknowledge that there can be variations among the methodologies of the ranking process. For example: what should we do in cases when the shares of two participants are equal? Should we consider participants with the share = 0? We did a relatively complicated impact analysis of these factors and observed that when T is big enough, all scenarios lead to the same dependency.

The outcomes of our calculation correlate with real statistical data, so we continued testing it thoroughly. Tests were conducted for various ranks up to N=70. To achieve this we completed significant work improving the efficiency of the algorithms, using various programming environments, and came up with a very efficient recursive C# algorithm scalable for map reduce, which let us process cases up to N=100 and T=5000.
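The range logic and the enumeration described above can be illustrated with a small recursive generator. This is a hedged sketch in Python, not the author's nested-loop script or the C# implementation; `splits` and `share_ranges` are hypothetical names. T = 12 is chosen only because it is divisible by 1, 2, 3 and 4, so the bounds T/k come out as whole numbers.

```python
def splits(total, participants):
    """Yield every ordered split of `total` whole items among `participants`."""
    if participants == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in splits(total - first, participants - 1):
            yield (first,) + rest


def share_ranges(total, participants):
    """Return the (min, max) share observed for each rank, rank 1 first."""
    lows = [total] * participants
    highs = [0] * participants
    for split in splits(total, participants):
        ranked = sorted(split, reverse=True)
        for idx, share in enumerate(ranked):
            lows[idx] = min(lows[idx], share)
            highs[idx] = max(highs[idx], share)
    return list(zip(lows, highs))


total = 12
for n in (2, 3, 4):
    # Rank k should top out at total/k; only rank 1 has a non-zero minimum (total/n).
    print(n, share_ranges(total, n))
```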
We combined all pre-calculated cases in a database and tried to fit the data to one of the existing PDFs. Unfortunately, we were not able to find a proper distribution or to derive our calculated numbers from known distributions. Thus we started working on a universal formula for the rank-share PDF. We tested our approaches against the pre-calculated database.

Continuous Formula for the Last Rank

Assume we have 5 participants (N = 5). How can we count the number of combinations for Rank 5?

For a given value of S_5, the minimum value of S_4 is S_5 and the maximum is (T-S_5)/(N-1).
For each S_4, the minimum value of S_3 is S_4 and the maximum is (T-S_5-S_4)/(N-2).
For each S_3, the minimum value of S_2 is S_3 and the maximum is (T-S_5-S_4-S_3)/(N-3).
For each S_2, we have just one value of S_1 = T-S_2-S_3-S_4-S_5.

Figure 10: Combinations for last rank of 5

Based on this logic we can calculate the quantity (number of combinations) for the Last Rank as:

$$Q(S_N) = \int_{S_N}^{\frac{T-S_N}{N-1}} \int_{S_{N-1}}^{\frac{T-S_N-S_{N-1}}{N-2}} \cdots \int_{S_3}^{\frac{T-\sum_{i=3}^{N} S_i}{2}} 1 \; dS_2 \cdots dS_{N-2}\, dS_{N-1}$$

We used the SymPy Python package to calculate the results of these integrations and found the pattern:

$$Q(T, N, S_N) = \frac{(T - N S_N)^{N-2}}{(N-1)!\,(N-2)!}$$

When we normalize it, we can calculate the probability of the last rank:

$$P(T, N, S_N) = \frac{N (N-1)}{T^{N-1}} \, (T - N S_N)^{N-2}$$

The expected value of the last rank, expressed as a share of T, can be calculated as

$$\frac{1}{N^2}$$

Continuous Formula for N-1 Rank

In our example, for the second lowest rank (N-1), we should start from S_4. The right part of the diagram remains the same. In the left part, the minimum of S_5 is 0, but the maximum can be either S_4 or (T-S_4*4), depending on which is less.

Figure 11: Combinations for rank 4 of 5

There are two different equations depending on whether S_4 > (T-S_4*4). We can calculate the results for both integral equations.
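The nested integrals described above can be evaluated symbolically. The sketch below uses SymPy for N = 5 and T = 1; it follows the integration limits given in the text, but the variable names and the order of the steps are assumptions of this illustration rather than the author's script. Constant factors depend on how the integral is set up, so the results are best compared up to normalization.

```python
import sympy as sp

s2, s3, s4, s5 = sp.symbols("s2 s3 s4 s5", nonnegative=True)

# Inner integrals shared by both ranks (T = 1, N = 5):
# S2 runs from S3 to (1 - S5 - S4 - S3)/2, then S3 from S4 to (1 - S5 - S4)/3.
over_s2 = sp.integrate(sp.S.One, (s2, s3, (1 - s5 - s4 - s3) / 2))
over_s3 = sp.integrate(over_s2, (s3, s4, (1 - s5 - s4) / 3))

# Last rank: S4 runs from S5 to (1 - S5)/4.
q5 = sp.factor(sp.integrate(over_s3, (s4, s5, (1 - s5) / 4)))

# Rank 4: S5 runs from 0 to S4 (when S4 < 1/5) or to 1 - 4*S4 (when S4 >= 1/5).
q4_low = sp.factor(sp.integrate(over_s3, (s5, 0, s4)))
q4_high = sp.factor(sp.integrate(over_s3, (s5, 0, 1 - 4 * s4)))

# Normalising Q5 over 0 <= S5 <= 1/5 gives the last-rank density.
p5 = sp.factor(q5 / sp.integrate(q5, (s5, 0, sp.Rational(1, 5))))

print("Q5      =", q5)
print("Q4 low  =", q4_low)
print("Q4 high =", q4_high)
print("P5      =", p5)
```

The two rank-4 results differ only by the extra (1 - 5*S4)^3 term, which is why the PDF is piecewise polynomial with a breakpoint at 1/5.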
For continuous solutions (with T = 1), we can represent the quantities for S_4 of N=5 as two polynomials:

$$\frac{1}{3}\left[(1 - 4S_4)^3 - (1 - 5S_4)^3\right]$$

$$\frac{1}{3}(1 - 4S_4)^3 = -\frac{64}{3} S_4^3 + (4S_4)^2 - 4S_4 + \frac{1}{3}$$

We can also analyze the condition and see that the first polynomial works for S_4 < 1/5 and the second one for S_4 >= 1/5. For this example, the continuous solution can be presented as the following graph:

Figure 12. Rank 4 of 5 as a combination of two polynomials

The universal formula for the first polynomial, applied on the interval between 0 and 1/N, is:

$$\frac{(1 - (N-1) S_{N-1})^{N-2} - (1 - N S_{N-1})^{N-2}}{N-2}$$

The universal formula for the second polynomial, applied on the interval between 1/N and 1/(N-1), is:

$$\frac{(1 - (N-1) S_{N-1})^{N-2}}{N-2}$$

To get exact PDF formulas, we should also normalize the equations. We can also calculate the expected value for rank N-1, which exceeds the last-rank value of 1/N^2 by

$$\frac{1}{N(N-1)}$$

Middle Ranks Formulas

We can follow this path to see that the higher the rank, the more integrals we need for the continuous solution. The solution of these integrals is a set of polynomials. There are different functions on the intervals 1 to 1/2, 1/2 to 1/3, 1/3 to 1/4, 1/4 to 1/5 …. In general, the PDF for N-tier ranks can be represented as sets of polynomial functions of degree (N-2). For example, assuming T=1, we can calculate the non-normalized polynomials for N = 3 to 5 as:

Table 2: Polynomials for share PDFs calculated for N from 3 to 5

| S (for N=3) | R1 | R2 | R3 |
|---|---|---|---|
| 0 - 1/3 | | 2S | -(3S-1) |
| 1/3 - 1/2 | (3S-1) | -2(2S-1) | |
| 1/2 - 1 | (1-S) | | |

| S (for N=4) | R1 | R2 | R3 | R4 |
|---|---|---|---|---|
| 0 - 1/4 | | 6S^2 | -3S(7S-2) | (4S-1)^2 |
| 1/4 - 1/3 | (4S-1)^2 | -42S^2+24S-3 | 3(3S-1)^2 | |
| 1/3 - 1/2 | -11S^2+10S-2 | 3(2S-1)^2 | | |
| 1/2 - 1 | (1-S)^2 | | | |

| S (for N=5) | R1 | R2 | R3 | R4 | R5 |
|---|---|---|---|---|---|
| 0 - 1/5 | | 24S^3 | 36S^2(1-4S) | 4S(61S^2-27S+3) | (1-5S)^3 |
| 1/5 - 1/4 | (5S-1)^3 | -476S^3+300S^2-60S+4 | 6((1-3S)^3 - 2(1-4S)^3) | 4(1-4S)^3 | |
| 1/4 - 1/3 | -131S^3+117S^2-33S+3 | 4((1-2S)^3 - 3(1-3S)^3) | 6(1-3S)^3 | | |
| 1/3 - 1/2 | (1-S)^3 - 4(1-2S)^3 | 4(1-2S)^3 | | | |
| 1/2 - 1 | (1-S)^3 | | | | |

Here are some more graphical representations:

Figure 12: Graphical representations of polynomials for PDFs for ranks 2 and 3 of 5. Panels: N5 R3 polynomials, N5 R2 polynomials.

Universal formula for PDF

When we analyzed the polynomials for ranks N and N-1, we realized that they could be presented as a sum of terms like $(1 - iS)^{N-2}$ multiplied by coefficients (a_1 a_2 a_3 a_4 a_5), where i is between 1 and N.

$$P(N, k, S) \sim \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & \dots \end{bmatrix} \cdot \begin{bmatrix} (1 - 1S)^{N-2} \\ (1 - 2S)^{N-2} \\ (1 - 3S)^{N-2} \\ (1 - 4S)^{N-2} \\ (1 - 5S)^{N-2} \\ \dots \end{bmatrix}$$

To calculate the coefficients for all polynomial equations we created a Python package, which parsed through all combinations of coefficients and returned a corresponding matrix for the given polynomial.
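A coefficient search of this kind can be sketched as a plain brute-force loop. The code below is an illustration under assumptions, not the package mentioned above: it tries every integer coefficient vector within a hypothetical `bound` and keeps those whose combination of (1 - iS)^(N-2) terms reproduces a target polynomial. Polynomials are given as coefficient lists, lowest power first, and only the terms i = k..d are searched.

```python
from itertools import product
from math import comb


def basis_poly(i, degree):
    """Coefficients of (1 - i*S)**degree, lowest power first."""
    return [comb(degree, p) * (-i) ** p for p in range(degree + 1)]


def find_coefficients(target, n, k, d, bound=12):
    """Search integers a_k..a_d with sum a_i * (1 - i*S)**(n-2) equal to target."""
    degree = n - 2
    bases = [basis_poly(i, degree) for i in range(k, d + 1)]
    matches = []
    for combo in product(range(-bound, bound + 1), repeat=len(bases)):
        poly = [sum(a * b[p] for a, b in zip(combo, bases)) for p in range(degree + 1)]
        if poly == target:
            matches.append(combo)
    return matches


# N=4, Rank 2 on 0 - 1/4: target 6S^2, searched over terms i = 2, 3, 4.
print(find_coefficients([0, 0, 6], n=4, k=2, d=4))
# N=5, Rank 3 on 0 - 1/5: target 36S^2 - 144S^3, searched over terms i = 3, 4, 5.
print(find_coefficients([0, 0, 36, -144], n=5, k=3, d=5))
```

Restricting the search to i from k to d keeps the answer unique here; with all N terms allowed, the basis polynomials are linearly dependent and several coefficient vectors would match.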
These are the results calculated for all polynomials with N from 3 to 5:

Table 3: Coefficients for polynomial equations calculated for N from 3 to 5

| S (for N=3) | R1 | R2 | R3 |
|---|---|---|---|
| 0 - 1/3 | | [0 2 -2] | [0 0 1] |
| 1/3 - 1/2 | [1 -2 0] | [0 2 0] | |
| 1/2 - 1 | [1 0 0] | | |

| S (for N=4) | R1 | R2 | R3 | R4 |
|---|---|---|---|---|
| 0 - 1/4 | | [0 3 -6 3] | [0 0 3 -3] | [0 0 0 1] |
| 1/4 - 1/3 | [1 -3 3 0] | [0 3 -6 0] | [0 0 3 0] | |
| 1/3 - 1/2 | [1 -3 0 0] | [0 3 0 0] | | |
| 1/2 - 1 | [1 0 0 0] | | | |

| S (for N=5) | R1 | R2 | R3 | R4 | R5 |
|---|---|---|---|---|---|
| 0 - 1/5 | | [0 4 -12 12 -4] | [0 0 6 -12 6] | [0 0 0 4 -4] | [0 0 0 0 1] |
| 1/5 - 1/4 | [1 -4 6 -4 0] | [0 4 -12 12 0] | [0 0 6 -12 0] | [0 0 0 4 0] | |
| 1/4 - 1/3 | [1 -4 6 0 0] | [0 4 -12 0 0] | [0 0 6 0 0] | | |
| 1/3 - 1/2 | [1 -4 0 0 0] | [0 4 0 0 0] | | | |
| 1/2 - 1 | [1 0 0 0 0] | | | | |

It’s obvious that the coefficients follow a binomial pattern. When we normalize the dependency, we obtain the following formula for the PDFs of the rank-share distribution:

$$P(N, k, d, S) = \frac{(N-1)\,N!}{(k-1)!\,(N-k)!} \sum_{i=k}^{d} \left( (-1)^{i-k} \cdot \frac{(N-k)!}{(i-k)!\,(N-i)!} \cdot (1 - iS)^{N-2} \right)$$

where S represents the share for rank k, N the number of participants, and d the range (d=1 for [1/2, 1], d=2 for [1/3, 1/2], d=3 for [1/4, 1/3], … up to N). d is related to S and cannot be more than N, so it can be calculated as min{N, ⌊1/S⌋}.

Verification

To verify the equations we tested them against publicly available datasets from various sources [18], [19], [20], [21], [22] with a known number of categories. We ranked and normalized each dataset to fit shares between 0 and 100%. For example, we used data extracts from the Bureau of Labor Statistics of the US Department of Labor.

Table 4: Example of occupational employment and wages dataset for various US towns.
Values are percent of total employment.

| Major occupational group | Birmingham | Montgomery | Anchorage | Fairbanks | Flagstaff |
|---|---|---|---|---|---|
| Total, all occupations | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| Management | 4.2* | 3.8* | 5.7* | 5.9* | 5.5 |
| Business and financial operations | 4.4* | 4.5* | 5.1 | 3.6* | 3.3* |
| Computer and mathematical | 2.6* | 2.4* | 1.9* | 1.7* | 1.3* |
| Architecture and engineering | 1.4* | 1.6* | 3.0* | 2.2 | 1.4 |
| Life, physical, and social science | 0.5* | 0.7* | 1.5* | 3.1* | 2.7* |
| Community and social services | 0.8* | 1.2* | 2.0* | 1.7* | 1.8* |
| Legal | 0.8 | 0.8 | 0.8 | 0.6* | 0.7 |
| Education, training, and library | 5.2* | 6 | 5.2* | 8.1* | 6.9 |
| Arts, design, entertainment, sports, and media | 1.1* | 1.1* | 1.3 | 0.9* | 1.2 |
| Healthcare practitioner and technical | 7.9* | 5.3 | 5.7 | 5.7 | 6.3* |
| Healthcare support | 2.7 | 2.4* | 2.9 | 2.1* | 1.8* |
| Protective service | 2.7* | 3.1* | 2.3* | 2.1 | 2.9* |
| Food preparation and serving related | 8.1* | 8.8 | 9.3 | 8.9 | 15.0* |
| Building and grounds cleaning and maintenance | 2.6* | 3.5 | 3 | 4.3 | 4.2* |
| Personal care and service | 2.6* | 2.8* | 4.5* | 1.8* | 4.0* |
| Sales and related | 12.5* | 10.6 | 9.3* | 8.3* | 10.7 |
| Office and administrative support | 16.9* | 15.6 | 17.2* | 16.3 | 14.1* |
| Farming, fishing, and forestry | 0.1* | 0.4 | 0.1* | 0.2* | 0.1* |
| Construction and extraction | 4.1 | 3.0* | 5.5* | 8.1* | 3.2* |
| Installation, maintenance, and repair | 4.7* | 4 | 4.6* | 5.5* | 4.7* |
| Production | 6.6 | 10.4* | 2.1* | 2.3* | 4.0* |
| Transportation and material moving | 7.3 | 7.9* | 7.1 | 6.6 | 4.4* |

The data was transformed to calculate expected values for all ranks from 1 to 22 like this:

Table 5: Example of transformed and ranked data to evaluate expected values for shares (N=22)

| Rank | Birmingham | Montgomery | Anchorage | Fairbanks | Flagstaff | Exp. Value |
|---|---|---|---|---|---|---|
| 1 | 16.9 | 15.6 | 17.2 | 16.3 | 15 | 16.2 |
| 2 | 12.5 | 10.6 | 9.3 | 8.9 | 14.1 | 11.08 |
| 3 | 8.1 | 10.4 | 9.3 | 8.3 | 10.7 | 9.36 |
| 4 | 7.9 | 8.8 | 7.1 | 8.1 | 6.9 | 7.76 |
| 5 | 7.3 | 7.9 | 5.7 | 8.1 | 6.3 | 7.06 |
| 6 | 6.6 | 6 | 5.7 | 6.6 | 5.5 | 6.08 |
| 7 | 5.2 | 5.3 | 5.5 | 5.9 | 4.7 | 5.32 |
| 8 | 4.7 | 4.5 | 5.2 | 5.7 | 4.4 | 4.9 |
| 9 | 4.4 | 4 | 5.1 | 5.5 | 4.2 | 4.64 |
| 10 | 4.2 | 3.8 | 4.6 | 4.3 | 4 | 4.18 |
| 11 | 4.1 | 3.5 | 4.5 | 3.6 | 4 | 3.94 |
| 12 | 2.7 | 3.1 | 3 | 3.1 | 3.3 | 3.04 |
| 13 | 2.7 | 3 | 3 | 2.3 | 3.2 | 2.84 |
| 14 | 2.6 | 2.8 | 2.9 | 2.2 | 2.9 | 2.68 |
| 15 | 2.6 | 2.4 | 2.3 | 2.1 | 2.7 | 2.42 |
| 16 | 2.6 | 2.4 | 2.1 | 2.1 | 1.8 | 2.2 |
| 17 | 1.4 | 1.6 | 2 | 1.8 | 1.8 | 1.72 |
| 18 | 1.1 | 1.2 | 1.9 | 1.7 | 1.4 | 1.46 |
| 19 | 0.8 | 1.1 | 1.5 | 1.7 | 1.3 | 1.28 |
| 20 | 0.8 | 0.8 | 1.3 | 0.9 | 1.2 | 1 |
| 21 | 0.5 | 0.7 | 0.8 | 0.6 | 0.7 | 0.66 |
| 22 | 0.1 | 0.4 | 0.1 | 0.2 | 0.1 | 0.18 |

For verification, we combined data from more than 50 US towns.

Monte-Carlo Simulation

We used Wolfram Mathematica to perform Monte Carlo simulations for PDFs (for N from 2 to 6). The following code was used:

    (* 500000 trials, each with 3 random integer cut points between 0 and 100 *)
    m = RandomInteger[100, {500000, 3}];
    m2 = Sort /@ m;
    (* gaps between the sorted cut points give 4 shares that sum to 100 *)
    m3 = Transpose[{m2[[All, 1]], m2[[All, 2]] - m2[[All, 1]], m2[[All, 3]] - m2[[All, 2]], 100 - m2[[All, 3]]}];
    (* sort the shares of each trial; column 4 then holds the largest share (Rank 1) *)
    m4 = Sort /@ m3;
    (* plot how often each share value occurs for Rank 1 *)
    ListPlot[Values[KeySort[Counts[m4[[All, 4]]]]]]

Figure 13: Monte-Carlo Simulation for N3 and N5

References

1. J. Estoup. Gammes sténographiques. Institut Stenographique de France, 1916.
2. Zipf, G.K. Selected Studies of the Principle of Relative Frequency in Language. Cambridge, MA: Harvard University Press. ISBN 9780674434929 (1932). http://www.hup.harvard.edu/catalog.php?isbn=9780674434929
3. W Li. Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Transactions on Information Theory (1992). https://www.santafe.edu/research/results/working-papers/random-texts-exhibit-zipfs-law-like-word-frequency
4. Seung Ki Baek, Sebastian Bernhardsson, Petter Minnhagen. Zipf’s law unzipped. arXiv.org. (2011). https://arxiv.org/abs/1104.1789
5. Fernando Buendía. Market Shares Are Not Zipf-Distributed. Complex Systems. (2013). http://www.complex-systems.com/pdf/22-3-2.pdf
6. Ruokuang Lin, Chunhua Bian, Qianli D. Y. Ma. Scaling laws in human speech, decreasing emergence of new words and a generalized model. arXiv.org. (2015). http://arxiv.org/pdf/1412.4846.pdf
7. Nikolay K. Vitanov, Marcel Ausloos. Test of two hypotheses explaining the size of populations in a system of cities. arXiv.org. (2015). http://arxiv.org/pdf/1506.08535.pdf
8. Vladimir V. Bochkarev, Eduard Yu. Lerner. Zipf and non-Zipf Laws for Homogeneous Markov Chain. arXiv.org. (2012). http://arxiv.org/abs/1207.1872
9. Hila Riemer, Suman Mallik, Devanathan Sudharshan. Market Shares Follow the Zipf Distribution. College of Business at Illinois. (2002). http://www.business.illinois.edu/Working_Papers/papers/02-0125.pdf
10. Steven T. Piantadosi. Zipf’s word frequency law in natural language: a critical review and future directions. The University of Rochester. (2015). https://colala.bcs.rochester.edu/papers/piantadosi2014zipfs.pdf
11. D. Yu. Manin. Mandelbrot's Model for Zipf's Law: Can Mandelbrot's Model Explain Zipf's Law for Language? Journal of Quantitative Linguistics 16(3): 274-285 (2009).
12. Aaron Clauset, Cosma Rohilla Shalizi, M. E. J. Newman. Power-law distributions in empirical data. arXiv.org. (2009). http://arxiv.org/pdf/0706.1062v2.pdf
13. Elvis Oltean. An econophysical approach of polynomial distribution applied to income and expenditure. arXiv.org. (2009). http://arxiv.org/ftp/arxiv/papers/1410/1410.3860.pdf
14. K. E. Kechedzhy, O. V. Usatenko, V. A. Yampol’skii, A. Ya. Usikov. Rank distributions of words in additive many-step Markov chains and the Zipf law. arXiv.org. (2004). http://arxiv.org/pdf/physics/0406099.pdf
15. Bernat Corominas-Murtra, Luís F Seoane, Ricard V. Sole. Zipf's law, unbounded complexity and open-ended evolution. arXiv.org. (2016). http://arxiv.org/abs/1612.01605
16. Oscar Fontanelli, Pedro Miramontes, Yaning Yang, Germinal Cocho, Wentian Li. Beyond Zipf’s Law: The Lavalette Rank Function and Its Properties. arXiv.org. (2016). https://arxiv.org/pdf/1606.01959.pdf
17. Bohdan B. Khomtchouk, Claes Wahlestedt. Zipf’s law emerges asymptotically during phase transitions in communicative systems. arXiv.org. (2016). https://arxiv.org/pdf/1603.03153.pdf
18. List of U.S. states and territories by area. Wikipedia. (2017). https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_area
19. List of Brazilian states by area. Wikipedia. (2016). https://en.wikipedia.org/wiki/List_of_Brazilian_states_by_area
20. List of Canadian provinces and territories by area. Wikipedia. (2017). https://en.wikipedia.org/wiki/List_of_Canadian_provinces_and_territories_by_area
21. Letter frequency. Wikipedia. (2017). https://en.wikipedia.org/wiki/Letter_frequency
22. Bureau of Labor Statistics of US Department of Labor. (2016). https://www.bls.gov/regions/
