Punctuation effects in English and Esperanto texts

Reading time: 6 minutes
...

📝 Original Info

  • Title: Punctuation effects in English and Esperanto texts
  • ArXiv ID: 1004.4848
  • Date: 2012-09-04
  • Authors: M. Ausloos

📝 Abstract

A statistical physics study of punctuation effects on sentence lengths is presented for two written texts: Alice in Wonderland and Through the Looking-Glass. The translation of the first text into Esperanto is also considered, as a test of the role of punctuation in defining a style and as a way of contrasting natural and artificial, but written, languages. Several log-log plots of the sentence length-rank relationship are presented for the major punctuation marks. Different power laws are observed, with characteristic exponents. The exponent can take a value much less than unity (ca. 0.50 or 0.30) depending on how a sentence is defined. The texts are also mapped into time series based on word frequencies. The quantitative differences between the original and translated texts are very minute at the exponent level. It is argued that sentences seem to be more reliable than word distributions in discussing an author's style.
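A minimal sketch of the length-rank analysis described above, under stated assumptions: sentence lengths are sorted in decreasing order so that the longest sentence has rank r = 1, and an exponent α is estimated from L(r) ≈ A·r^(−α) by an ordinary least-squares fit of log L against log r over all ranks. The function name `rank_length_exponent` is hypothetical, and the paper's own fitting range and method may differ.

```python
import math


def rank_length_exponent(lengths):
    """Estimate alpha in L(r) ~ A * r**(-alpha) from sentence lengths.

    Assumption: ordinary least squares on (log r, log L) over all ranks.
    Zero lengths are dropped because log(0) is undefined.
    Returns (alpha, A).
    """
    # Rank 1 is the longest sentence, rank 2 the next longest, and so on.
    ranked = sorted((n for n in lengths if n > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two non-empty sentences")
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(n) for n in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # On a log-log plot the fitted slope is -alpha and the intercept is log A.
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic check: an exact power law with alpha = 0.5 and A = 100.
    lengths = [100 * r ** -0.5 for r in range(1, 51)]
    alpha, amplitude = rank_length_exponent(lengths)
    print(f"alpha = {alpha:.3f}, A = {amplitude:.1f}")
```

Fitting over all ranks is the simplest choice; in practice one may restrict the fit to the range where the log-log plot is visibly straight, which is a judgment the sketch does not make.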

💡 Deep Analysis

Deep Dive into Punctuation effects in English and Esperanto texts.


📄 Full Content

Punctuation effects in English and Esperanto texts

M. AUSLOOS
Previously at: GRAPES@SUPRATECS, Université de Liège, Sart-Tilman, B-4000 Liège, Euroland
Nowadays at: 7 rue des Chartreux, B-4122 Plainevaux, Belgium
Email address: marcel.ausloos@ulg.ac.be (M. AUSLOOS)
Preprint submitted to Elsevier, 2 November 2018; arXiv:1004.4848v1 [cs.CL], 27 Apr 2010

Key words: texts, sentence statistics, Zipf, ranking, translation, Esperanto

1 Introduction

Since [1], a notable set of studies has examined the structure of written texts through techniques based on statistical physics ideas and methods, usually measuring the word length and/or word frequency distribution. Without claiming to be exhaustive, let us mention recent studies, well after 2000, on German [2], Polish [3], English and Irish [3–8], Chinese [7–10], Japanese [11], Greek [12–14], Turkish [15], Hungarian [16], Welsh [17], Baltic and Slavic [18], but also on less natural languages like Fortran [19], artificial ones [20], or Esperanto [21].
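The word-frequency measurements mentioned above, and the mapping of a text into a time series based on word frequencies mentioned in the abstract, can be sketched as follows. This assumes one simple mapping, in which the word at position t is replaced by the number of times that word occurs in the whole text; the tokenizer and the names `tokenize` and `frequency_time_series` are hypothetical, not the paper's procedure.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of Unicode letters, so Esperanto
# letters such as ĉ, ĝ, ŝ, ŭ are kept; an ASCII apostrophe inside a word
# (e.g. "Alice's") is kept as part of the word.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def tokenize(text):
    return WORD.findall(text.lower())


def frequency_time_series(text):
    """Map the word at each position t to its number of occurrences in the text."""
    words = tokenize(text)
    counts = Counter(words)
    return [counts[w] for w in words]


if __name__ == "__main__":
    sample = "The cat saw the dog. The dog ran!"
    print(frequency_time_series(sample))
```

Replacing `counts[w]` by the word's frequency rank would give a rank-based series instead; which variant the paper uses is not stated in this excerpt.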
Of course, these studies are partly a revival of an enormous flurry of work in linguistics that started as early as 1930 and later included work by Zipf and many others [22–24]. Debates exist on whether a few texts are sufficiently representative of a language and how big a lexicon must be before it becomes significant. With this caveat, it seems fair to say that several specific features of written texts have not been studied in detail. The role of punctuation in the structure of texts is one of these. According to Wikipedia, the first inscription with a punctuation mark is the Mesha Stele (9th century BC); see http://en.wikipedia.org/wiki/Mesha-Stele. Long ago, the Greeks and Romans adopted a few punctuation marks (essentially the dot and its combinations) to mark pauses in texts meant to be performed. Other historical details on the creation, dissemination, use and types of punctuation in various languages can be found at http://en.wikipedia.org/wiki/Punctuation and http://grammar.ccc.commnet.edu/grammar/marks/marks.htm. From these e-references, one learns that punctuation marks are symbols that indicate the structure and organization of a written text in a specific language, for readability as much as for suggesting intonation and pauses when reading aloud. In written English, punctuation is vital to disambiguate the meaning of sentences, though this is not without problems [25,26]. Notice that some modern writers have attempted to go, in some sense, backward. As early as 1895, Crane published The Black Riders and Other Lines [27] in capital letters, the poems appearing without punctuation: an unusual typographical presentation for the time, and a style that critics dismissed as garbage. In another language, French, Apollinaire [28] published one of his major works, Alcools, without punctuation.
Similarly, the French surrealists and dadaists scorned punctuation; Aragon [29], for instance, avoided it in most of his poems and prose for and about Elsa Triolet. This followed from the para-psychological theory put forward by Breton [30] in The Manifesto, which contains new, practical recipes for enhancing the Magic Surrealist Art, such as: "...Punctuation of course necessarily hinders the stream of absolute continuity which preoccupies us ...". This was recently reformulated "poetically" by Hahn [31] in the poem The Pity of Punctuation. A "maximum" was likely reached by Joyce [32]. In Ulysses, which symbolically preserves the structure of Homer's The Odyssey, a text in which there is no punctuation, Joyce omits punctuation entirely in the last chapter of the novel, which consists of eight long paragraphs, in order to mimic the uninterrupted flow of naked thoughts. Thus punctuation could be avoided. Indeed, there is some redundancy, since a capital letter can signal a new sentence to the reader. One major difficulty nevertheless arises in text analysis: it is easier to detect a punctuation sign in a text than a capital letter. However, fundamentally, in lite
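Because punctuation marks are easier to detect than capital letters, a text can be segmented into "sentences" at a chosen set of marks, and the segment lengths can then be ranked. A minimal sketch, assuming lengths are counted in words and that any character in `marks` ends a sentence; the function name `sentence_lengths` and the default mark set are hypothetical. Widening the set (e.g. adding `;` or `,`) shortens the sentences, which illustrates why the measured exponent depends on how a sentence is defined.

```python
import re

# Hypothetical word pattern: runs of Unicode letters, with internal
# ASCII apostrophes allowed.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def sentence_lengths(text, marks=".!?"):
    """Split `text` at every character in `marks`; return words per segment.

    Empty segments (e.g. after a final mark or inside "...") are dropped.
    """
    if not marks:
        raise ValueError("marks must contain at least one punctuation character")
    # Build a character class such as [\.!\?]; re.escape protects special characters.
    boundary = "[" + re.escape(marks) + "]"
    lengths = []
    for segment in re.split(boundary, text):
        n = len(WORD.findall(segment))
        if n > 0:
            lengths.append(n)
    return lengths


if __name__ == "__main__":
    text = "Alice was tired; she sat down. Then, a rabbit ran by!"
    for marks in (".!?", ".!?;", ".!?;,"):
        print(repr(marks), sentence_lengths(text, marks))
```

Each mark set yields its own list of lengths, so each can be ranked and fitted separately, one exponent per definition of a sentence.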

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
