Power-law distributions in empirical data
Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution – the part of the distribution representing large but rare events – and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
💡 Research Summary
The paper addresses the widespread but often mis‑applied claim that many empirical phenomena follow power‑law distributions. It points out two fundamental difficulties: (1) the heavy‑tailed region contains few observations, leading to large statistical fluctuations, and (2) the range over which the power law holds (the lower cutoff x_min) is rarely known a priori. Traditional approaches—most commonly log‑log plotting followed by ordinary least‑squares (OLS) fitting—are shown to be unreliable because OLS is biased by the noisy tail and provides no quantitative test of the hypothesis.
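The log-log least-squares procedure criticized above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's own code: the helper names `sample_power_law` and `loglog_ols_slope`, the linear bin width, the sample size, and the seed are all hypothetical choices. Synthetic data are drawn from a known continuous power law so that the slope recovered from the histogram can be compared with the true exponent.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1], so no division by zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def loglog_ols_slope(data, x_min, bin_width=1.0):
    """Least-squares slope of a log-log histogram built with linear bins."""
    counts = {}
    for x in data:
        k = int((x - x_min) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    # Empty bins have no logarithm and are dropped, which is one source of distortion.
    bins = sorted(counts)
    xs = [math.log(x_min + (k + 0.5) * bin_width) for k in bins]
    ys = [math.log(counts[k] / (len(data) * bin_width)) for k in bins]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


rng = random.Random(1)  # fixed seed, chosen arbitrarily for repeatability
data = sample_power_law(5000, alpha=2.5, x_min=1.0, rng=rng)
slope = loglog_ols_slope(data, x_min=1.0)
alpha_ols = -slope  # the exponent implied by the fitted line
print(f"true alpha = 2.5, log-log OLS estimate = {alpha_ols:.3f}")
```

In the sparse tail most occupied bins hold a single observation, so they sit at the same height on the log-log plot regardless of x. A fitted line can therefore depart noticeably from the true exponent, and the fit by itself gives no indication of whether a power law is a plausible model at all.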
To overcome these problems the authors propose a rigorous statistical framework that consists of five steps. First, they define the continuous and discrete power‑law probability density functions, P(x) ∝ x^‑α for x ≥ x_min. Second, they estimate the scaling exponent α by maximum‑likelihood estimation (MLE) for any given x_min; for the continuous case the estimator reduces to α̂ = 1 + n
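For reference, the continuous-case maximum-likelihood estimator in its standard form is α̂ = 1 + n / Σ ln(x_i / x_min), where the sum runs over the n observations with x_i ≥ x_min, and its leading-order standard error is (α̂ − 1)/√n. The sketch below applies that estimator to synthetic data and computes the Kolmogorov-Smirnov distance mentioned in the abstract. It is a minimal illustration, not the authors' implementation: the function names, sample size, and seed are hypothetical, and x_min is assumed to be known rather than estimated.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def mle_alpha(data, x_min):
    """Continuous-case MLE of alpha from the observations with x >= x_min."""
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum == 0.0:
        raise ValueError("need observations strictly above x_min")
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)  # leading-order standard error
    return alpha_hat, std_err, n


def ks_distance(data, x_min, alpha):
    """Maximum gap between the empirical CDF of the tail and the fitted power-law CDF."""
    tail = sorted(x for x in data if x >= x_min)
    n = len(tail)
    distance = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        distance = max(distance, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return distance


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
data = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat, std_err, n_tail = mle_alpha(data, x_min=1.0)
distance = ks_distance(data, x_min=1.0, alpha=alpha_hat)
print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f} (n = {n_tail}), KS = {distance:.4f}")
```

With real data x_min is usually unknown. One common approach is to repeat the fit over candidate values of x_min and keep the one that minimizes the KS distance. The sketch skips that step.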