On fitting power laws to ecological data
Heavy-tailed or power-law distributions are becoming increasingly common in biological literature. A wide range of biological data has been fitted to distributions with heavy tails. Many of these studies use simple fitting methods to find the parameters in the distribution, which can give highly misleading results. The potential pitfalls that can occur when using these methods are pointed out, and a step-by-step guide to fitting power-law distributions and assessing their goodness-of-fit is offered.
💡 Research Summary
The paper “On fitting power laws to ecological data” addresses a growing trend in ecology and related biological fields: the frequent claim that various empirical measurements follow a power‑law (heavy‑tailed) distribution. While such claims are attractive because power‑law models capture the prevalence of extreme events and rare species, the authors demonstrate that many published studies rely on overly simplistic fitting procedures that can produce severely biased parameter estimates and misleading scientific conclusions.
The authors begin by defining the canonical continuous power‑law probability density function \(p(x) = C x^{-\alpha}\) for \(x \ge x_{\min}\), where \(\alpha\) is the scaling exponent and \(C\) is a normalising constant. They note that the exponent’s value determines whether moments such as the mean or variance are finite, a property that has profound ecological implications. The paper then critiques the dominant “log‑log plot plus linear regression” approach. Two fundamental problems are highlighted: (1) the logarithmic transformation inflates the influence of small observations, introducing systematic bias; (2) linear regression is typically performed on a visually selected tail region (often the top 10–20 % of the data), which is an arbitrary choice that ignores the statistical definition of the tail. Consequently, the estimated exponent \(\alpha\) can be substantially over‑ or under‑estimated, and the goodness‑of‑fit is never formally tested.
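The summary does not spell out the normalising constant or the moment conditions it alludes to. As a worked illustration, the standard textbook derivation for the continuous density defined above is sketched here; the moment order \(m\) is a symbol introduced for this sketch and does not come from the summary.

```latex
\begin{align}
1 &= \int_{x_{\min}}^{\infty} C\,x^{-\alpha}\,dx
   = \frac{C\,x_{\min}^{1-\alpha}}{\alpha-1}
   \quad\Longrightarrow\quad
   C = (\alpha-1)\,x_{\min}^{\alpha-1}, \qquad \alpha > 1, \\
\langle x^{m} \rangle &= \int_{x_{\min}}^{\infty} x^{m}\,p(x)\,dx
   = \frac{\alpha-1}{\alpha-1-m}\,x_{\min}^{m}, \qquad \alpha > m+1 .
\end{align}
```

Under these assumptions the mean (\(m = 1\)) is finite only for \(\alpha > 2\), and the variance, which needs a finite second moment, only for \(\alpha > 3\).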
To remedy these issues, the authors adopt the maximum‑likelihood estimation (MLE) framework introduced by Clauset, Shalizi, and Newman (2009). The procedure consists of three tightly coupled steps. First, a candidate set of lower cut‑off values \(x_{\min}\) is generated; for each candidate, the Kolmogorov–Smirnov (KS) distance between the empirical cumulative distribution and the theoretical power‑law CDF is computed. The \(x_{\min}\) that minimises the KS distance is selected as the optimal threshold, thereby providing an objective definition of the tail region. Second, given this threshold, the scaling exponent is estimated by the closed‑form MLE formula \(\alpha = 1 + n \big
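The first two steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code. The formula in the text is cut off, so the sketch assumes the standard continuous-data estimator from Clauset, Shalizi, and Newman (2009), \(\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})\). It also assumes strictly positive, continuous observations. The names `sample_power_law`, `fit_alpha`, `ks_distance`, `fit_power_law`, and the `min_tail` guard are hypothetical and chosen for this example.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values with density proportional to x**(-alpha) on [x_min, inf)."""
    # Inverse-transform sampling: invert the CDF 1 - (x / x_min)**(1 - alpha).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(tail, x_min):
    """Continuous-data MLE of the exponent (assumed form, see lead-in)."""
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one observation above x_min")
    return 1.0 + len(tail) / log_sum


def ks_distance(sorted_tail, x_min, alpha):
    """Maximum gap between the empirical CDF and the fitted power-law CDF."""
    n = len(sorted_tail)
    worst = 0.0
    for i, x in enumerate(sorted_tail):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF steps from i/n to (i+1)/n at x, so check both sides.
        worst = max(worst, abs(model - i / n), abs((i + 1) / n - model))
    return worst


def fit_power_law(data, min_tail=50):
    """Try each distinct value as x_min; keep the fit with the smallest KS distance.

    Returns None when fewer than min_tail observations are available.
    """
    xs = sorted(data)
    best = None
    for i, x_min in enumerate(xs):
        if len(xs) - i < min_tail:
            break  # too few points left for a stable estimate (hypothetical guard)
        if i > 0 and x_min == xs[i - 1]:
            continue  # duplicate of the previous candidate
        tail = xs[i:]
        if tail[-1] == x_min:
            break  # every remaining value is identical; the MLE is undefined
        alpha = fit_alpha(tail, x_min)
        d = ks_distance(tail, x_min, alpha)
        if best is None or d < best["ks"]:
            best = {"x_min": x_min, "alpha": alpha, "ks": d, "n_tail": len(tail)}
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(1000, alpha=2.5, x_min=1.0, rng=rng)
    print(fit_power_law(data))
```

The scan refits the exponent for every candidate threshold, so it costs roughly quadratic time in the number of observations. That is acceptable for a sketch but slow for large data sets. A small KS distance alone is not a goodness-of-fit test, and this sketch does not attempt one.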