Generalized Hurst exponent and multifractal function of original and translated texts mapped into frequency and length time series

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

A nonlinear dynamics approach can be used to quantify complexity in written texts. As a first step, a one-dimensional system is examined: two written texts by one author (Lewis Carroll), together with one translation into an artificial language (Esperanto), are mapped into time series. Their corresponding shuffled versions are used to obtain a “baseline”. Two different one-dimensional time series are used here: (i) one based on word lengths (LTS), (ii) the other on word frequencies (FTS). It is shown that the generalized Hurst exponent $h(q)$ and the derived $f(\alpha)$ curves of the original and translated texts show marked differences. The original texts are far from giving a parabolic $f(\alpha)$ function, in contrast to the shuffled texts. Moreover, the Esperanto text has more extreme values. This suggests cascade-model-like, multiscale, time-asymmetric features in finally written texts. A discussion of the difference and complementarity of mapping into an LTS or an FTS is presented. The FTS $f(\alpha)$ curves are more open than the LTS ones.


💡 Research Summary

The paper applies tools from nonlinear dynamics to quantify the complexity of written texts. Using two English texts by Lewis Carroll (Alice’s Adventures in Wonderland and Through the Looking‑Glass) and one Esperanto translation, the authors convert each text into two one‑dimensional time series. The first, the Length Time Series (LTS), records the number of characters in each successive word. The second, the Frequency Time Series (FTS), is based on word frequencies: each successive word is represented by a value derived from how often that word occurs in the whole text. Treating word position as a time index lets the authors analyze a text as a signal, using tools originally developed for physical systems.
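As a rough illustration of the two mappings, the sketch below turns a string into a length series and a frequency-based series. It rests on assumptions that are not taken from the paper: the tokenizer (lowercased runs of letters), the helper names `tokenize`, `length_series`, and `frequency_rank_series`, and in particular the choice to represent each word by its frequency rank (1 = most frequent word). The authors' exact FTS construction may differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (runs of Unicode letters).

    Assumption: punctuation, digits, and apostrophes act as separators.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def length_series(words):
    """LTS sketch: number of characters of each successive word."""
    return [len(w) for w in words]


def frequency_rank_series(words):
    """FTS sketch (assumed variant): frequency rank of each successive word.

    Rank 1 is the most frequent word in the whole text. Ties are broken
    by order of first appearance, because Counter.most_common() sorts stably.
    """
    counts = Counter(words)
    rank = {w: r for r, (w, _) in enumerate(counts.most_common(), start=1)}
    return [rank[w] for w in words]


words = tokenize("Alice saw the rabbit, and the rabbit saw Alice.")
lts = length_series(words)
fts = frequency_rank_series(words)
```

Both series have one value per word of the text, so word position plays the role of time.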

The core of the analysis is the generalized Hurst exponent h(q) and the multifractal spectrum f(α) derived from it. For a given moment order q, h(q) measures how the q‑th order structure function scales with the window size. Positive q weights the largest fluctuations of the series most heavily, while negative q weights the smallest ones. A dependence of h on q signals multifractality, i.e., the presence of multiple scaling exponents. Through a Legendre transform, h(q) is converted into the singularity strength α and the fractal dimension f(α) of the set of points sharing that α. For a strictly monofractal signal h(q) is constant and f(α) collapses toward a single point; in finite samples this typically appears as a narrow, roughly parabolic curve. A broad or asymmetric f(α) reveals a hierarchy of scaling exponents.
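The following is a minimal sketch of how h(q) and f(α) could be estimated numerically; it is not the authors' implementation. It assumes a structure-function estimator applied to the profile (cumulative sum of the mean-subtracted series), restricted to q > 0 because negative moments diverge when an increment is exactly zero, and it assumes the common convention τ(q) = q·h(q) − 1, α = dτ/dq, f(α) = q·α − τ(q). The function names are hypothetical.

```python
import numpy as np


def generalized_hurst(x, qs, taus):
    """Estimate h(q) for q > 0 from structure functions of the profile.

    Assumes K_q(tau) = <|y(t + tau) - y(t)|^q> ~ tau^(q * h(q)),
    where y is the cumulative sum of the mean-subtracted series x.
    """
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())          # profile of the series
    log_tau = np.log(taus)
    h = []
    for q in qs:
        log_k = [np.log(np.mean(np.abs(y[tau:] - y[:-tau]) ** q))
                 for tau in taus]
        slope = np.polyfit(log_tau, log_k, 1)[0]   # slope = q * h(q)
        h.append(slope / q)
    return np.array(h)


def legendre_spectrum(qs, h):
    """Legendre transform: tau(q) = q*h(q) - 1, alpha = dtau/dq,
    f(alpha) = q*alpha - tau(q). The derivative is taken numerically."""
    qs = np.asarray(qs, dtype=float)
    tau_q = qs * np.asarray(h, dtype=float) - 1.0
    alpha = np.gradient(tau_q, qs)
    f_alpha = qs * alpha - tau_q
    return alpha, f_alpha
```

For uncorrelated noise the estimate should stay close to h(q) = 0.5 for all q, so that α and f(α) cluster near 0.5 and 1. A q-dependent h(q) spreads the points out into a curve.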

The empirical results are striking. Both the original English texts and the Esperanto translation exhibit a clearly nonlinear h(q) and an f(α) curve that is far from parabolic. The Esperanto version displays the widest α‑range, meaning its multifractal spectrum is more extreme than that of the English originals. This suggests that the translation process introduces different scaling rules—perhaps because Esperanto’s lexical and syntactic constraints differ from English’s. In contrast, shuffled versions of each text (where word order is randomized) produce an almost flat h(q) and a nearly perfect parabolic f(α), confirming that the observed multifractality originates from the genuine sequential organization of words rather than from the marginal distributions of length or frequency alone.
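The shuffling baseline can be sketched as follows. The idea is that a random permutation keeps exactly the same set of values (hence the same distribution of lengths or frequencies) while destroying their order. The helper names and the use of the lag-1 autocorrelation as a quick check are assumptions made for illustration; the paper compares h(q) and f(α) instead.

```python
import numpy as np


def shuffled_surrogate(series, seed=0):
    """Return a random permutation of the series.

    The multiset of values is unchanged; only the ordering is randomized.
    """
    rng = np.random.default_rng(seed)
    return rng.permutation(np.asarray(series))


def lag1_autocorr(series):
    """Lag-1 autocorrelation, a crude indicator of sequential structure."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


ordered = np.arange(1000)                 # toy series with strong ordering
surrogate = shuffled_surrogate(ordered, seed=1)
```

In practice one would apply the same multifractal analysis to the original series and to several shuffled surrogates, and attribute the difference to word order.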

A comparison between the two mapping strategies reveals complementary information. The FTS‑based spectra are consistently broader than the LTS‑based ones, indicating that frequency information is more sensitive to the hierarchical structure of the text. This is intuitive: word frequencies obey Zipf’s law, a scale‑free distribution, whereas word lengths are confined to a relatively narrow range. Consequently, FTS captures long‑range correlations across many orders of magnitude, while LTS reflects more localized structural constraints.

The authors interpret these findings through the lens of cascade models. They argue that the writing process can be viewed as a multiscale cascade in which information is transferred asymmetrically across scales (from letters to words to phrases). The original texts retain this cascade, producing multifractal signatures; the shuffled texts destroy the cascade, leaving only a monofractal background. This perspective aligns with earlier work on turbulence and financial time series, where multifractality is linked to hierarchical energy transfer.
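To make the cascade picture concrete, here is a sketch of the textbook binomial multiplicative cascade (often called the p-model). It is a generic illustration of how an asymmetric splitting rule repeated across scales produces a multifractal measure, not the model fitted in the paper. The function names and the value p = 0.7 are assumptions.

```python
import numpy as np


def binomial_cascade(levels, p=0.7):
    """Deterministic binomial cascade on 2**levels cells.

    At each level every cell splits in two, passing a fraction p of its
    mass to the left child and 1 - p to the right child.
    """
    measure = np.array([1.0])
    for _ in range(levels):
        measure = np.column_stack((measure * p, measure * (1.0 - p))).ravel()
    return measure


def cascade_tau(q, p=0.7):
    """Exact scaling function of the binomial cascade:
    tau(q) = -log2(p**q + (1 - p)**q)."""
    q = np.asarray(q, dtype=float)
    return -np.log2(p ** q + (1.0 - p) ** q)


measure = binomial_cascade(10, p=0.7)
```

With p = 0.5 the mass spreads uniformly and τ(q) is linear in q (monofractal); any p ≠ 0.5 makes τ(q) nonlinear, which corresponds to a q-dependent h(q) and a broad f(α).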

Beyond the methodological contribution, the study suggests several practical applications. Generalized Hurst exponents and multifractal spectra could serve as quantitative fingerprints for author identification, translation quality assessment, or cross‑linguistic stylistic comparison. Because LTS and FTS highlight different aspects of textual organization, a combined analysis could provide a richer, more nuanced characterization of literary style than traditional lexical statistics.

In conclusion, the paper demonstrates that mapping texts into simple numeric series and applying multifractal analysis uncovers scale‑dependent structures that persist, in altered form, after translation and are destroyed by randomization. The approach opens a promising avenue for interdisciplinary research at the intersection of physics, linguistics, and literary studies, and invites future work to explore larger corpora, additional languages, and alternative mappings such as syntactic trees or semantic networks.

