Fitting and goodness-of-fit test of non-truncated and truncated power-law distributions
Power-law distributions contain precious information about a large variety of processes in geoscience and elsewhere. Although there are sound theoretical grounds for these distributions, the empirical evidence in favor of power laws has traditionally been weak. Recently, Clauset et al. have proposed a systematic method to find over which range (if any) a certain distribution behaves as a power law. However, their method has been found to fail, in the sense that true (simulated) power-law tails are not recognized as such in some instances, and the power-law hypothesis is then rejected. Moreover, the method does not work well when extended to power-law distributions with an upper truncation. We explain in detail a similar but alternative procedure, valid for truncated as well as non-truncated power-law distributions, based on maximum-likelihood estimation, the Kolmogorov-Smirnov goodness-of-fit test, and Monte Carlo simulations. An overview of the main concepts, as well as a recipe for their practical implementation, is provided. The performance of our method is put to the test on several empirical datasets that were previously analyzed with less systematic approaches. The databases presented here include the half-lives of radionuclides, the seismic moment of earthquakes in the whole world and in Southern California, a proxy for the energy dissipated by tropical cyclones elsewhere, the area burned by forest fires in Italy, and the waiting times calculated over different spatial subdivisions of Southern California. We find the functioning of the method very satisfactory.
💡 Research Summary
The paper addresses a long‑standing problem in the empirical analysis of power‑law (Pareto‑type) distributions: how to reliably determine the range over which a data set follows a power law, and how to test the goodness‑of‑fit when the distribution may be truncated at an upper bound. While the method introduced by Clauset, Shalizi and Newman (2009) – consisting of maximum‑likelihood estimation (MLE) of the exponent, a Kolmogorov‑Smirnov (KS) statistic, and a Monte Carlo-based p‑value – has become a de facto standard, it fails in several important situations. In particular, it often does not recognise true power‑law tails in simulated data, and it cannot be straightforwardly extended to truncated power‑law models.
The authors propose an alternative, unified procedure that works for both non‑truncated and truncated power‑law distributions. The core steps are:
- Maximum‑likelihood estimation of the parameters. For the non‑truncated case the parameters are the exponent α and the lower cutoff x_min; for the truncated case an additional upper cutoff x_max is estimated. The log‑likelihood is maximised either analytically (when possible) or numerically using robust optimisation algorithms, ensuring that the global maximum is found.
- Kolmogorov‑Smirnov goodness‑of‑fit test. With the estimated parameters the theoretical cumulative distribution function (CDF) is constructed and compared to the empirical CDF of the data. The KS distance D = max|F_empirical – F_theoretical| quantifies the discrepancy.
- Monte Carlo simulation for the p‑value. Because the KS statistic does not have a closed‑form distribution when parameters are estimated from the data, the authors generate a large number (typically 10⁴–10⁵) of synthetic data sets from the fitted model, re‑estimate parameters for each synthetic set, compute the KS distance, and build the empirical distribution of D. The p‑value is the fraction of synthetic D’s that exceed the observed D. If p > 0.05 the power‑law hypothesis is not rejected.
- Automatic selection of the fitting range. Candidate values of x_min (and, when relevant, x_max) are scanned over a grid. For each candidate range the MLE‑KS‑p‑value pipeline is executed, and the range that yields the highest p‑value (subject to a minimum number of data points) is selected as the optimal interval. This eliminates the need for subjective visual inspection. (A Python sketch of this pipeline follows the list.)
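For concreteness, here is a minimal Python sketch of the MLE‑KS‑Monte Carlo pipeline for a power law truncated to a fixed interval [a, b]. All function names (fit_alpha, ks_distance, simulate, pvalue, scan_xmin), the optimisation bounds, and the default Monte Carlo counts are illustrative assumptions, not the authors' code. The non‑truncated case admits the closed‑form estimator α̂ = 1 + n/Σ ln(x_i/x_min); the truncated case has no closed form and is handled numerically below.

```python
# Minimal sketch (illustrative, not the authors' code) of the pipeline:
# MLE of the exponent, KS distance, and a Monte Carlo p-value, for a
# power law truncated to [a, b] with fixed cutoffs.
import numpy as np
from scipy.optimize import minimize_scalar

def fit_alpha(x, a, b):
    """MLE of alpha for f(x) = C x^(-alpha) on [a, b], with
    C = (alpha - 1) / (a^(1-alpha) - b^(1-alpha))."""
    x = x[(x >= a) & (x <= b)]
    n, s = len(x), np.sum(np.log(x))
    def neg_loglik(alpha):
        norm = a**(1.0 - alpha) - b**(1.0 - alpha)
        return -(n * np.log((alpha - 1.0) / norm) - alpha * s)
    res = minimize_scalar(neg_loglik, bounds=(1.0001, 5.0), method="bounded")
    return res.x, x

def ks_distance(x, alpha, a, b):
    """KS distance between the empirical CDF and the fitted truncated CDF."""
    x = np.sort(x)
    n = len(x)
    cdf = (a**(1 - alpha) - x**(1 - alpha)) / (a**(1 - alpha) - b**(1 - alpha))
    upper = np.arange(1, n + 1) / n   # empirical CDF just above each point
    lower = np.arange(0, n) / n       # ... and just below each point
    return max(np.max(np.abs(upper - cdf)), np.max(np.abs(lower - cdf)))

def simulate(n, alpha, a, b, rng):
    """Inverse-transform sampling from the truncated power law."""
    u = rng.random(n)
    return (a**(1 - alpha) - u * (a**(1 - alpha) - b**(1 - alpha)))**(1.0 / (1.0 - alpha))

def pvalue(x, a, b, n_mc=1000, seed=0):
    """Monte Carlo p-value; parameters are re-estimated on every synthetic sample."""
    rng = np.random.default_rng(seed)
    alpha, x = fit_alpha(np.asarray(x, dtype=float), a, b)
    d_obs = ks_distance(x, alpha, a, b)
    exceed = 0
    for _ in range(n_mc):
        xs = simulate(len(x), alpha, a, b, rng)
        alpha_s, xs = fit_alpha(xs, a, b)
        exceed += ks_distance(xs, alpha_s, a, b) >= d_obs
    return alpha, d_obs, exceed / n_mc

def scan_xmin(x, b, candidates, n_min=100, n_mc=200):
    """Scan candidate lower cutoffs; keep the one with the highest p-value,
    subject to a minimum number of data points in the fitting range."""
    x = np.asarray(x, dtype=float)
    valid = [a for a in candidates if np.sum((x >= a) & (x <= b)) >= n_min]
    return max(valid, key=lambda a: pvalue(x, a, b, n_mc)[2])
```

Note that re‑estimating the parameters on each synthetic sample (rather than reusing the original fit) is essential: otherwise the simulated KS distances are systematically too large and the p‑value is biased.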
The authors first validate the method on synthetic data drawn from known power‑law, truncated power‑law, and alternative distributions (exponential, log‑normal). The new procedure recovers the correct exponent and cut‑offs with higher success rates than the Clauset method, especially in the presence of an upper truncation or when the sample size is modest.
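For illustration, the sketch above can be exercised on a synthetic truncated power‑law sample in the same spirit (the numbers here are arbitrary illustration values, not those of the paper):

```python
# Self-test on synthetic data: the fitted exponent should be close to the
# true value, and the p-value should be high for a correctly specified model.
rng = np.random.default_rng(42)
x_synth = simulate(5000, alpha=1.65, a=1.0, b=1e4, rng=rng)
alpha_hat, d_obs, p = pvalue(x_synth, a=1.0, b=1e4, n_mc=200)
print(f"alpha_hat = {alpha_hat:.3f}, KS distance = {d_obs:.4f}, p = {p:.2f}")
```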
Next, the methodology is applied to a diverse set of real‑world geophysical data that have previously been examined with ad hoc or less systematic techniques:
- Radionuclide half‑lives – a classic example of a heavy‑tailed distribution. The analysis reveals a clear upper truncation around 10⁸ years, consistent with physical constraints on observable isotopes.
- Global seismic moment – using the Harvard CMT catalog, the optimal lower cutoff is found near 10¹⁹ N m, with an exponent close to 1.66 and no statistically significant upper truncation, confirming the Gutenberg‑Richter scaling at the moment level.
- Southern California seismic moment – when the catalog is restricted to the region, the best fit includes an upper truncation, suggesting a finite fault‑size effect that limits the largest possible earthquakes locally.
- Tropical cyclone energy proxy – the distribution of dissipated kinetic energy follows a non‑truncated power law over roughly two orders of magnitude, with an exponent around 1.1, supporting earlier claims of scale invariance in cyclone intensity.
- Italian forest‑fire burned area – the data are best described by a truncated power law with an upper bound near 10⁴ ha, reflecting the limited amount of available fuel and the effect of fire‑suppression capabilities.
- Waiting times between earthquakes – the authors subdivide Southern California into spatial cells of varying size. Small cells exhibit truncated power‑law waiting‑time distributions, whereas larger cells are compatible with a pure power law, indicating a scale‑dependent transition in the temporal clustering of seismicity.
In every case the KS‑based p‑values exceed the chosen significance threshold, confirming that the fitted models are statistically acceptable. The authors also provide confidence intervals for the estimated parameters obtained via bootstrap resampling (a brief sketch follows).
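A plausible sketch of such a bootstrap interval, reusing fit_alpha from the pipeline above; the percentile scheme shown is a standard nonparametric choice, assumed here rather than taken from the paper:

```python
# Percentile bootstrap CI for the exponent (a standard scheme, assumed here,
# not necessarily the authors' exact resampling recipe).
def bootstrap_ci(x, a, b, n_boot=1000, level=0.95, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    alphas = np.empty(n_boot)
    for i in range(n_boot):
        resample = rng.choice(x, size=len(x), replace=True)  # resample with replacement
        alphas[i], _ = fit_alpha(resample, a, b)
    return np.quantile(alphas, [(1 - level) / 2, (1 + level) / 2])
```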
The paper concludes that the proposed MLE‑KS‑Monte‑Carlo framework offers a rigorous, reproducible, and versatile tool for power‑law analysis. It overcomes the limitations of earlier approaches, works seamlessly with truncated models, and can be automated for large‑scale data mining. The authors suggest future extensions such as Bayesian inference for power‑law parameters, multivariate power‑law modeling, and the incorporation of temporal evolution of the exponent. Overall, the work constitutes a significant methodological advance for researchers dealing with heavy‑tailed phenomena across geoscience, physics, biology, and beyond.