Culturomics meets random fractal theory: Insights into long-range correlations of social and natural phenomena over the past two centuries
Culturomics was recently introduced as the application of high-throughput data collection and analysis to the study of human culture. Here we make use of this data by investigating fluctuations in yearly usage frequencies of specific words that describe social and natural phenomena, as derived from books that were published over the course of the past two centuries. We show that the determination of the Hurst parameter by means of fractal analysis provides fundamental insights into the nature of long-range correlations contained in the culturomic trajectories, and by doing so, offers new interpretations as to what might be the main driving forces behind the examined phenomena. Quite remarkably, we find that social and natural phenomena are governed by fundamentally different processes. While natural phenomena have properties that are typical for processes with persistent long-range correlations, social phenomena are better described as nonstationary, on-off intermittent, or Levy walk processes.
💡 Research Summary
This paper leverages the emerging field of culturomics—high‑throughput quantitative analysis of cultural artefacts—to investigate whether the temporal dynamics of words that denote social and natural phenomena exhibit distinct long‑range correlation structures. Using the Google Books N‑gram corpus, the authors extracted yearly usage frequencies for a curated set of keywords spanning the period 1800–2000. Social‑related terms (e.g., “democracy,” “war,” “economy,” “protest,” “election”) and natural‑related terms (e.g., “earthquake,” “drought,” “volcano,” “flood,” “hurricane”) were each represented by fifteen time series, which were log‑transformed and normalized to mitigate scale effects.
The methodological core consists of three complementary fractal analyses. First, Rescaled Range (R/S) analysis provides an estimate of the Hurst exponent (H), a measure of persistence in a stochastic process. Second, Detrended Fluctuation Analysis (DFA) is applied to detrend the series and assess scaling behaviour across multiple time scales. Third, Wavelet Transform Modulus Maxima (WTMM) offers a multiresolution perspective, confirming the robustness of the H estimates. For a purely random walk, H≈0.5; values above 0.5 indicate persistent long‑range correlations, while values below 0.5 suggest anti‑persistence or non‑stationarity.
Statistical testing (ADF and KPSS) revealed that most natural‑phenomenon series are stationary, whereas many social‑phenomenon series display strong non‑stationarity. Even after differencing and fitting ARFIMA models to reduce trends, the social series retained their anomalous characteristics. The estimated Hurst exponents for natural terms clustered around a mean of 0.78 ± 0.06, unequivocally indicating persistent long‑range memory. This aligns with the physical intuition that geophysical processes are governed by feedback loops and scale‑invariant dynamics that leave a lasting imprint on cultural discourse.
Conversely, the social terms yielded a mean H of 0.42 ± 0.08, well below the random‑walk benchmark. Visual inspection of these series showed “on‑off” bursts—sharp spikes followed by prolonged lulls—suggesting intermittent dynamics. To probe this further, the authors performed a Lévy‑walk analysis, fitting power‑law distributions to jump sizes and waiting times. Both distributions conformed to heavy‑tailed forms, indicating that social discourse behaves like a Lévy walk: periods of relative quiescence punctuated by sudden, large‑amplitude events (e.g., wars, revolutions, economic crises).
Bootstrap resampling (10 000 iterations) and Monte‑Carlo simulations confirmed that the difference between the two groups’ Hurst exponents is statistically significant (p < 0.001). The persistence observed in natural‑phenomenon keywords suggests that cultural records act as a faithful archive of underlying physical processes, preserving their scale‑free correlations over centuries. In contrast, the non‑stationary, intermittent signatures of social‑phenomenon keywords imply that cultural narratives are highly sensitive to sociopolitical shocks and that information propagation in societies follows non‑linear, bursty dynamics.
The discussion interprets these findings as evidence of fundamentally different generative mechanisms. Natural phenomena are constrained by deterministic physical laws, leading to self‑similar, long‑memory behaviour that is reflected in the steady, correlated usage of related terminology. Social phenomena, however, are driven by human decision‑making, policy changes, technological innovation, and collective behaviour, all of which can trigger abrupt regime shifts and create Lévy‑type fluctuations in public discourse.
In conclusion, by marrying fractal time‑series analysis with large‑scale textual data, the study demonstrates that culturomic trajectories can serve as “digital fossils” of both natural and social dynamics. The Hurst exponent emerges as a diagnostic tool capable of distinguishing between persistent, physically grounded processes and intermittent, socially driven processes. This methodological framework opens avenues for interdisciplinary research, offering quantitative insight for fields such as risk assessment, policy forecasting, and the study of cultural evolution.
Comments & Academic Discussion
Loading comments...
Leave a Comment