Zipf's Law and the Frequency of Characters or Words of Oracles

The article discusses the frequency of characters of Oracle, concluding that the frequency and the rank of a word or character fit the Zipf-Mandelbrot law (Zipf's law with three parameters), and figuring out the parameters based on the frequency, and…

Authors: Xiuli Wang

Zipf's Law and the Frequency of Characters or Words of Oracles
Wang Xiuli
Anhui University, Hefei, Anhui 230039, China
Email: wangxiuli@ahu.edu.cn

Abstract: The article discusses the frequency of characters of Oracle. It concludes that the relation between the frequency and the rank of a word or character fits the Zipf-Mandelbrot law (Zipf's law with three parameters), estimates the parameters from the frequency data, and points out that what some researchers of Oracle call "assembling on the two ends" is just an impressionistic description of the Oracle data.

I. INTRODUCTION

It is well known that the American linguist George Kingsley Zipf proposed a law relating the frequency of a word to its rank by frequency. Zipf's law has since been studied intensively in different domains by researchers such as Levy and Mandelbrot, and has been modified to fit data more precisely, so there are now several varieties of it. One variety of the law is

$$ f = \frac{c}{r^{\alpha}} $$

where $f$ is the frequency, $r$ is the rank, $\alpha$ is a parameter, and $c$ is a constant. Another variety is

$$ f = \frac{c}{(r + a)^{\alpha}} $$

where $a$ is another parameter.

George Kingsley Zipf argues that the law is valid because of the principle of least effort in human behavior [2]. Recent research shows that simultaneous minimization of the effort of both hearer and speaker, based on game-theoretic evolution, leads to Zipf's law in the transition between referentially useless systems and indexical reference systems [1].
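The two varieties of the law given above can be evaluated with a few lines of Python. This is only a minimal sketch: the function names `zipf` and `zipf_mandelbrot` and the parameter values are illustrative assumptions, not taken from the paper.

```python
def zipf(rank, c=1.0, alpha=1.0):
    """Frequency predicted by f = c / r**alpha for a given rank (rank >= 1)."""
    return c / rank ** alpha


def zipf_mandelbrot(rank, c=1.0, alpha=1.0, a=0.0):
    """Frequency predicted by f = c / (r + a)**alpha for a given rank."""
    return c / (rank + a) ** alpha


# Illustrative values only (hypothetical, not fitted to any corpus).
ranks = [1, 2, 3, 4, 5]
plain = [zipf(r) for r in ranks]
shifted = [zipf_mandelbrot(r, a=1.0) for r in ranks]

# With alpha = 1 and a = 1, rank r under the shifted law receives the
# frequency that the plain law assigns to rank r + 1.
for r in ranks:
    print(r, plain[r - 1], shifted[r - 1])
```

With `a = 0` the second function reduces to the first, which is why the shifted form can be read as a generalization of the original law.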
Chinese characters of Oracles are archaic Chinese characters that differ from modern Chinese characters and were inscribed on animal bones and tortoise shells. Archaic Chinese is a largely monosyllabic language, in which one syllable corresponds to at least one word and one character usually represents one word. Hence we can obtain the frequency of words just by counting the frequency of characters. Scholars of Oracle have made some ambiguous or even wrong claims about the distribution of the characters of Oracle, such as "assembling on two ends".

In this paper, we first list some varieties of Zipf's law, and then give the distribution of the words of the oracles based on data collected from transliterations of the oracles. We show that the distribution follows Zipf's law with three parameters, give the values of the parameters, and explain why these parameters take such values.

II. ZIPF'S LAW

Zipf's law is an empirical law, and there are several varieties of it. Here we discuss the varieties that are relevant to our study.

A. Zipf's law with $\alpha = 1$

One variety of Zipf's law, the original one, is the following:

$$ f = \frac{c}{r^{\alpha}} $$

Example 1 lists the frequency and rank of words of Mandarin; see Fig. 2 at the end of the paper.

B. Zipf's law with $\alpha \neq 1$

If the series

$$ F(\alpha) = \sum_{i=1}^{\infty} f_i = \sum_{i=1}^{\infty} \frac{c}{r_i^{\alpha}} $$

is defined on the complex plane, that is, $\alpha \in \mathbb{C}$, then (with $r_i = i$ and up to the constant $c$) it is what mathematicians call the Riemann $\zeta$ function, for which many results are known that are applicable to areas relevant to Zipf's law and the like.

There is another variety, called the Zipf-Mandelbrot formula:

$$ f_i = \frac{c}{(a + r)^{\alpha}} $$

where $0 \le a < 1$. These can be regarded as terms of the generalized harmonic series:

$$ H(\alpha) = \sum_{i=1}^{\infty} f_i = \sum_{i=1}^{\infty} \frac{c}{(a + r)^{\alpha}} $$

III. THE WORD FREQUENCY OF ARCHAIC CHINESE AND ZIPF'S LAW

A. Word frequency of archaic Chinese of the classical Chinese documents and Zipf's law

There are some corpora that are close in time to the oracles. We have counted the frequency of words in a work called ShiJi, whose title means that it is the record of the history before its completion. It was written in about B.C., and among the inherited documents its time of writing is relatively near the Shang dynasty.

In order to make a comparison with the word frequency of the oracles, we list the frequency and the rank of the words of ShiJi in Example 2 (word frequency of part of ShiJi); see Figs. 3-5 at the end of the paper.

Obviously, Zipf's law is valid for ShiJi.

B. The frequency of characters or words of Oracles and Zipf's law

The language used in the oracles is archaic Chinese. The oracles that scholars have found and collected span several hundred years of the Shang dynasty (from 1600 BC to 1046 BC), if we do not consider the oracle bones of the Zhou dynasty. They can be regarded as a corpus of the archaic Chinese of that period, so we expect the corpus to fit Zipf's law.

The following is the data on the frequency of characters of Oracle from The Center for Studying and Application of Chinese Character of Huadong Normal University:

Based on the data above, we fit them to the Zipf-Mandelbrot formula and, by calculation, obtain the following:

$$ f_i = \frac{c}{(a + r)^{\alpha}}, \qquad \alpha \approx 1, \quad a \approx 1 $$

We can conclude from the Zipf-Mandelbrot formula fitted to the data above that the corpus of Oracle, or its language, is different from Mandarin or ShiJi. The parameters $\alpha \approx 1$, $a \approx 1$ reveal that there should be a word with a higher frequency than the most frequent word in the corpus: with $a \approx 1$, the word of rank $r$ has the frequency that the original law with $\alpha \approx 1$ would assign to rank $r + 1$, which leaves the first rank unoccupied. It seems that this word should be a comma, but we know that the varieties of Zipf's law are controversial as to the explanation of their physical meaning.

IV. CONCLUSION

The word frequency or character frequency of Oracle fits the Zipf-Mandelbrot formula as follows:

$$ f_i = \frac{c}{(a + r)^{\alpha}} $$

where $\alpha \approx 1$, $a \approx 1$. The conclusion of some researchers that "the distribution of characters or character frequency of oracle assembles on the two ends" is just an impressionistic description.

REFERENCES

[1] Ramon Ferrer i Cancho and Ricard V. Solé. Least effort and the origins of scaling in human language. Proceedings of the National Academy of Sciences, 100(3):788–791, 2003.
[2] G. K. Zipf. Human behavior and the principle of least effort: an introduction to human ecology. Addison-Wesley, Cambridge, Massachusetts, 1949.

APPENDIX

Fig. 1. The hierarchy of the rational, algebraic, and transcendental numbers.
Fig. 2. Example One.
Fig. 3. Example Two.
Fig. 4. Example Two.
Fig. 5. Example Two.

Received date: Jun. 2014
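The paper does not state how the Zipf-Mandelbrot parameters were computed from the rank-frequency data. One possible procedure, sketched below under that caveat, is a grid search over the shift $a$ combined with ordinary least squares in log-log space, since $\log f = \log c - \alpha \log(r + a)$ is linear for a fixed $a$. The function name `fit_zipf_mandelbrot`, the grid of candidate shifts, and the synthetic frequencies are all assumptions made for illustration; no Oracle data is used.

```python
import math


def fit_zipf_mandelbrot(freqs, a_grid):
    """Fit f = c / (r + a)**alpha to frequencies sorted in descending order.

    For each candidate shift a, c and alpha come from a least-squares line
    through the points (log(r + a), log f). The candidate with the smallest
    squared error is returned as (c, a, alpha).
    """
    n = len(freqs)
    log_f = [math.log(f) for f in freqs]
    mean_y = sum(log_f) / n
    best = None
    for a in a_grid:
        x = [math.log(r + a) for r in range(1, n + 1)]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, log_f))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        sse = sum((yi - (intercept + slope * xi)) ** 2
                  for xi, yi in zip(x, log_f))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), a, -slope)
    return best[1], best[2], best[3]


# Synthetic check (hypothetical values, not Oracle data): generate
# frequencies from a known law and see whether the fit recovers it.
true_c, true_a, true_alpha = 1000.0, 1.0, 1.0
synthetic = [true_c / (r + true_a) ** true_alpha for r in range(1, 201)]
a_grid = [i / 10 for i in range(0, 31)]  # candidate shifts 0.0, 0.1, ..., 3.0
c_hat, a_hat, alpha_hat = fit_zipf_mandelbrot(synthetic, a_grid)
print(c_hat, a_hat, alpha_hat)
```

Fitting in log-log space weights all ranks equally, so the low-frequency tail has as much influence as the top ranks; a fit on the raw frequencies would emphasize the most frequent characters instead, and the two choices can give different parameter estimates on real data.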
