Comment on "Language Trees and Zipping" arXiv:cond-mat/0108530
Every encoding has a priori information if the encoding represents any semantic information of the universe or object. Encoding means mapping from the universe to a string or strings of digits. The semantics here is used in the model-theoretic sense or …
Authors: Xiuli Wang
Comment on "Language Trees and Zipping"

Xiu-Li Wang∗
Department of Chinese Literature and Language, Anhui University, Hefei, Anhui 230039, China
(Dated: October 24, 2018)

Every encoding carries a priori information if the encoding represents any semantic information about the universe or an object. Encoding means mapping from the universe to a string or strings of digits. The semantics here is used in the model-theoretic sense, that is, the denotation of the object. If an encoding, or string of symbols, is an adequate and true mapping of the model or object, and the mapping is recursive or computable, then the distance between two strings (texts) maps the distance between the models. We are then able to measure the distance between the models by computing the distance between the two strings. Otherwise, we may take a misleading course. A "language tree" may not be a family tree in the sense of historical linguistics. Rather, it just means similarity.

COMMENT ON "LANGUAGE TREES AND ZIPPING"

Several statements that Benedetto et al. make in their Letter [1, 2] are not necessarily true. First, we claim that a statement Benedetto et al. make in their Letter and their reply [1, 2] confuses strings of symbols with the objects or models that the strings denote. In other words, strings of symbols are different from the object or model they denote, except when the strings denote only themselves. Moreover, a statement in the comment on the Letter by Dmitry V. Khmelev et al. is inaccurate [3]. That is, "Notice that the language tree (LT) diagram [1] does not include the Russian language (Slavic family of Indo-European family of languages: 288 × 10^6 speakers). Our computations show that once Russian is included, it does not cluster with the other members of the Slavic group.
Obviously, certain Cyrillic alphabet based languages were left out of the study, which improves results significantly and shows that a priori information about the alphabet is being taken advantage of to achieve the results outlined in their Letter."

A symbol or a string of symbols may refer to itself or to another object. When it refers to or denotes another object, we say that the object is the model of the string of symbols, or the meaning (semantics) of the string of symbols [4, 5]. The string of symbols represents the object or the model. Obviously, when it refers to or denotes itself, the meaning or model and the symbol or string of symbols are the same. The alphabet and the text (a string of symbols) are not the language. They are symbols or strings of symbols that merely record the language.

Clearly, every encoding carries a priori information if the encoding represents any semantic information about the universe or an object. Encoding means mapping from the universe to a string or strings of digits. The semantics here is used in the model-theoretic sense, that is, the denotation of the object. By choosing a string or code that maps the entities, relations and functions in the universe to symbols and to relations and functions of the symbols, we encode our knowledge about the model or object too. If we encode the object by randomly assigning it to a string, no person or machine can recognize or get any information about the universe or the object without the assignment. For instance, an isomorphism maps a group to a group that maintains all the information of the former one, such as its relations and functions. If the group is mapped to another structure randomly, we cannot get any information about the former from the latter without the mapping, even when we know that there exists a mapping from the group to the structure.
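The group example can be made concrete with a minimal sketch. It assumes the cyclic group Z_6 under addition modulo 6 as the "model" (an illustrative choice, not taken from the Letter or this comment), and it uses a fixed arbitrary relabeling as a stand-in for a random assignment. The names `preserves_addition`, `ISOMORPHIC_CODE` and `ARBITRARY_CODE` are hypothetical.

```python
from itertools import product

N = 6  # assumed model: the cyclic group Z_6 under addition modulo 6


def add(a, b):
    """Group operation of Z_6."""
    return (a + b) % N


def preserves_addition(code):
    """Return True if code[a + b] == code[a] + code[b] for all a, b in Z_6.

    When this holds, the codes alone reproduce the addition structure of
    the original elements, so the structure can be read off the codes.
    """
    return all(
        code[add(a, b)] == add(code[a], code[b])
        for a, b in product(range(N), repeat=2)
    )


# x -> 5x (mod 6) is an automorphism of Z_6, i.e. a structure-preserving code.
ISOMORPHIC_CODE = [(5 * x) % N for x in range(N)]

# A fixed arbitrary relabeling (swap 0 and 1), standing in for a random assignment.
ARBITRARY_CODE = [1, 0, 2, 3, 4, 5]

if __name__ == "__main__":
    print("isomorphic code preserves addition:", preserves_addition(ISOMORPHIC_CODE))
    print("arbitrary code preserves addition:", preserves_addition(ARBITRARY_CODE))
```

In this sketch the structure of the model survives only in the first code. With the second, the original addition table can be recovered only by someone who also holds the relabeling itself.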
We may consider a logical sentence as the code of its model. A more concrete example is the binary code of an integer. If the mapping from integers to binary codes is random, we cannot recover the integer from its binary code without the mapping. Even if the mapping is not random, that is, the mapping is recursive or computable, we still have to make an effort to get the information, and we can do so only if we know that a recursive mapping exists; otherwise we are unable to get any information about the integer. After all, the mapping and the model to which a string corresponds are a priori information that human beings provide.

Therefore, it is true that every encoding carries a priori information, which is the symbolization (mapping to symbols) of part or all of human knowledge about the model. Even when Benedetto et al. state that "As for the objection concerning the coding chosen for our texts, one has to remember that a zipper reads the sequences of characters which one inputs to it, nothing more than this. The idea of comparing languages written with different alphabets cannot forget this simple statement. In order to compare languages written with different alphabets one should, for instance, consider texts written with the phonetic alphabet. This is the reason for not having included in our preliminary analysis of the language tree languages such as Chinese, Greek, Russian, etc.", the phonetic alphabet with which the texts are written still encodes human knowledge about the language.

Hence, if the distance that Benedetto et al. define is capable of measuring the similarity of compressed texts, it at most measures the similarity between the two texts compared. If the alphabet computationally represents some information about the language, the distance resulting from the comparison is a measure of the similarity of that information about the languages. Otherwise it is just a measure of the similarity of the texts.
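The dependence of a zipper-based distance on the chosen alphabet can be illustrated with a small sketch. It is not the estimator of the Letter: it assumes `zlib` as the zipper and uses the normalized compression distance, (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), as a stand-in. The sample text is made up, and ROT13 plays the role of a hypothetical second alphabet that records exactly the same language.

```python
import codecs
import zlib

# Made-up sample text (an assumption for illustration only).
SAMPLE = (
    "A zipper reads the sequence of characters given to it and nothing more. "
    "Two texts in the same language share many words and word fragments, "
    "so a compressor that has seen one of them encodes the other cheaply. "
    "If the same words are written with different symbols, the shared "
    "fragments disappear from the character stream, although the language "
    "itself has not changed at all."
)


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (zlib stands in for the zipper)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def relabel(text: str) -> str:
    """Rewrite the text in a hypothetical second 'alphabet' (ROT13 on the letters)."""
    return codecs.encode(text, "rot_13")


if __name__ == "__main__":
    original = SAMPLE.encode("ascii")
    relabeled = relabel(SAMPLE).encode("ascii")
    print("distance(text, same text):      %.3f" % ncd(original, original))
    print("distance(text, relabeled text): %.3f" % ncd(original, relabeled))
```

Under these assumptions the relabeled copy should score much farther from the original than the original does from itself, although both strings record the same sentences. The distance responds to the symbols, and reflects the language only to the extent that the symbols map it.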
When the compression technique is applied to DNA sequences to cluster DNA, the distance is just a measure of their similarity. Only under the presupposition that DNA is a mapping of the features of a creature can we get some information about the creature, such as evolutionary relations or a family tree.

Secondly, the language tree may not be a family tree. The Indo-European family of languages is not a concept that describes a family composed of descendants and their ancestor [6]. Many languages are descendants of the same archaic one. They are very similar in spelling, syntax, and even meaning or semantics when they inherit or use the same alphabet. Historical linguists compare languages in spelling (phonetics), syntax and meaning to reconstruct their ancestor. Unfortunately, these efforts and results have proved not to be solid or reliable in many cases without data such as historical text records. Rather, we know that similarity may arise from the type of the languages, which happen to be similar in some aspect; from interaction between languages, which is called a linguistic union; or from descent from the same ancient ancestor. In a linguistic union there is no genetic relationship between the languages, but they still share features, and they are spoken in the same region. The languages of the Balkan linguistic union, or Sprachbund, such as Albanian, Greek, Bulgarian and Romanian, are all IE languages. However, they are not closely related. Classification of languages may be genetic, typological or areal (linguistic union) [6]. So, what does the term "language tree" mean? It may not be a family tree in the sense of historical linguistics. Rather, it just means similarity [6]. By this technique, Benedetto et al. just show the similarity between the texts, or the similarity between the languages, which may not be similarity among members of a family, and even that only if the similarity between the texts (strings of symbols) maps the similarity between the languages adequately and truly. The language tree cannot be considered a family tree in the sense of historical linguistics.

Thirdly, the distance Benedetto et al. define in their Letter is similar to the NID definition by Li Ming [7]. As we discussed above regarding the relation between encoding and model, if an encoding or string of symbols is an adequate and true mapping of the model or object, and the mapping is recursive or computable, the distance between two strings (texts) maps the distance between the models. We are then able to measure the distance by computing the distance between the two strings. Otherwise, we may take a misleading course.

There is an intention (presupposition) in pure mathematical research that the mapping from model to string is not considered a key question. But application to practical problems may cause trouble or error. In fact, one must first decide whether the mapping from model to string or strings contains the information of the model, although we often perform mappings that are heuristic and valid. As everyone knows, the theories of physics are the "strings", and the experiments of physics test or check whether the mapping is valid. Empirical science may be considered as searching for and testing mappings.

I thank Ming-Hui Zhang, a faculty member of the Physics Department of Anhui University, for helpful discussion.

∗ wangxiuli@ahu.edu.cn

[1] D. Benedetto, E. Caglioti, and V. Loreto, Phys. Rev. Lett. 88, 048702 (2002).
[2] D. Benedetto, E. Caglioti, and V. Loreto, Phys. Rev. Lett. 90, 089804 (2003).
[3] D. V. Khmelev and W. J. Teahan, Phys. Rev. Lett. 90, 089803 (2003).
[4] S. G. Simpson, Model Theory (1998), URL http://www.math.psu.edu/simpson/courses/math563.
[5] M. Otto, Algorithmic Model Theory for Specific Semantic Domains (2002), URL http://www-compsci.swan.ac.uk/~csmartin/amt.html.
[6] R. H. Robins, Current Trends in Linguistics 11, 3 (1973).
[7] M. Li and P. M. B. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications (Springer-Verlag, Berlin, 1997), 2nd ed.