A simple derivation and classification of common probability distributions based on information symmetry and measurement scale
Commonly observed patterns typically follow a few distinct families of probability distributions. Over one hundred years ago, Karl Pearson provided a systematic derivation and classification of the common continuous distributions. His approach was phenomenological: a differential equation that generated common distributions without any underlying conceptual basis for why common distributions have particular forms and what explains the familial relations. Pearson’s system and its descendants remain the most popular systematic classification of probability distributions. Here, we unify the disparate forms of common distributions into a single system based on two meaningful and justifiable propositions. First, distributions follow maximum entropy subject to constraints, where maximum entropy is equivalent to minimum information. Second, different problems associate magnitude to information in different ways, an association we describe in terms of the relation between information invariance and measurement scale. Our framework relates the different continuous probability distributions through the variations in measurement scale that change each family of maximum entropy distributions into a distinct family.
💡 Research Summary
The paper revisits the long‑standing problem of why a relatively small set of probability distributions repeatedly appears in empirical data across the sciences. Over a century ago Karl Pearson introduced a phenomenological system of continuous distributions derived from a differential equation, but his approach offered no deeper explanation of the underlying principles that generate these families or of the relationships among them. The authors propose a unifying framework built on two clear, theoretically justified propositions.
First, they adopt the maximum‑entropy principle, interpreting maximum entropy as minimum information. In this view, a probability density function is obtained by maximizing Shannon entropy subject to a set of constraints that encode the information we possess about the system (e.g., moments, logarithmic moments, or other functional averages). The resulting density has the exponential‑family form
\[
p(x) \propto \exp\Big(-\sum_i \lambda_i f_i(x)\Big),
\]
where the \(f_i(x)\) are the constraint functions and the \(\lambda_i\) are Lagrange multipliers that acquire physical or statistical meaning (temperature, chemical potential, etc.).
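The variational recipe above can be checked numerically. The sketch below (all values hypothetical) discretizes the support, imposes a single mean constraint \(E[x]=2\), solves for the Lagrange multiplier by bisection, and confirms that the resulting exponential-form density has higher entropy than another density with the same mean:

```python
import numpy as np

# Discretize the support; a toy example assuming one constraint, E[x] = 2.0.
x = np.linspace(0.0, 30.0, 3001)
dx = x[1] - x[0]
target_mean = 2.0

def maxent_density(lam):
    """The maximum-entropy solution p(x) ∝ exp(-lam * x), normalized on the grid."""
    w = np.exp(-lam * x)
    return w / (w.sum() * dx)

def mean_of(lam):
    p = maxent_density(lam)
    return (p * x).sum() * dx

# Bisection on the Lagrange multiplier so the mean constraint holds
# (the mean decreases monotonically as lam grows).
lo, hi = 1e-6, 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_of(mid) > target_mean:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)

def entropy(p):
    m = p > 0
    return -(p[m] * np.log(p[m])).sum() * dx

p_star = maxent_density(lam)          # exponential form, lam close to 1/mean = 0.5
q = np.where(x <= 4.0, 0.25, 0.0)     # uniform on [0, 4] also has mean 2,
                                      # but lower entropy than p_star
```

Any other density satisfying the same constraint (here, the uniform `q`) necessarily has lower entropy, which is the sense in which the exponential-family form is the minimum-information solution.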
Second, they argue that the way information is quantified depends on the measurement scale chosen for the problem. They formalize this dependence through a "measurement-scale transformation" \(\phi(x)\). Different choices of \(\phi\) embody different invariance properties: additive invariance (\(\phi(x)=x\)), multiplicative invariance (\(\phi(x)=\log x\)), power-law invariance (\(\phi(x)=x^k\)), or more complex composite forms (\(\phi(x)=\log(1+\beta x)\)). Each transformation reshapes the constraint functions \(f_i\) and consequently changes the maximum-entropy distribution that solves the optimization problem.
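The effect of the scale transformation can be made concrete: with a single mean-type constraint expressed on the scale \(\phi\), the maximum-entropy density takes the form \(p(x)\propto\exp(-\lambda\,\phi(x))\). A minimal sketch (the specific \(\phi\) and \(\lambda\) values are illustrative, not from the paper):

```python
import numpy as np

# Sketch: the same maximum-entropy recipe, p(x) ∝ exp(-lam * phi(x)),
# evaluated under different measurement scales phi.
def maxent_form(phi, lam, x):
    return np.exp(-lam * phi(x))

x = np.array([1.0, 2.0, 4.0, 8.0])
lam = 1.5

linear = maxent_form(lambda t: t, lam, x)           # phi(x) = x: exponential decay
log_scale = maxent_form(np.log, lam, x)             # phi(x) = log x: power law x**(-lam)
stretched = maxent_form(lambda t: t ** 0.5, lam, x) # phi(x) = x**k: stretched exponential
```

Under the logarithmic scale the density is exactly a power law, so each doubling of `x` multiplies the density by the same constant factor — the scale invariance that the transformation encodes.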
By systematically exploring the space of admissible \(\phi\) functions, the authors demonstrate that the classic Pearson families (normal, gamma, beta, log-normal, etc.) emerge as special cases corresponding to particular simple choices of \(\phi\). For example:
- Additive invariance with constraints on the mean and variance yields the Gaussian distribution, reproducing the familiar central-limit result.
- Multiplicative invariance with constraints on the logarithmic mean and variance produces the log-normal distribution, appropriate for growth processes and scale-free phenomena.
- Power-law invariance with a constraint on \(x^k\) leads to densities of the form \(p(x)\propto\exp(-\lambda x^k)\): the Weibull (stretched-exponential) family for general \(k>0\), reducing to the exponential distribution at \(k=1\); combined with a logarithmic constraint, the same scale also generates the gamma family.
- Composite transformations such as \(\phi(x)=\log(1+\beta x)\) generate heavy-tailed families like the Pareto and log-Pareto distributions, which capture extreme-value behavior.
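The multiplicative-invariance case above can be illustrated by simulation. In this sketch (the factor distribution is hypothetical), a quantity built from many small multiplicative factors is symmetric on the \(\log x\) scale but strongly right-skewed on the raw scale — the log-normal pattern:

```python
import numpy as np

rng = np.random.default_rng(0)

# A growth process: x is the product of many multiplicative factors
# (hypothetical factors, uniform on [0.9, 1.2]). On the multiplicative
# scale phi(x) = log x the factors add, so log x tends to the normal
# and x itself to the log-normal.
factors = rng.uniform(0.9, 1.2, size=(20000, 100))
x = factors.prod(axis=1)
log_x = np.log(x)

def skewness(v):
    c = v - v.mean()
    return (c ** 3).mean() / v.std() ** 3
```

On the log scale the skewness is close to zero (approximate normality), while the raw values are heavily right-skewed, matching the claim that the choice of scale determines which family appears.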
Thus, Pearson’s differential equation is re‑interpreted as a particular instance of a broader variational problem where the measurement scale is fixed. The authors’ “scale‑transformation graph” visualizes how moving continuously from one \(\phi\) to another changes the Lagrange multipliers and therefore morphs one distribution family into another. This graph makes explicit the familial relationships that Pearson identified empirically but could not explain theoretically.
The paper validates the framework with several empirical data sets: body‑mass measurements in biology, city‑population sizes, and financial return series. For each case the appropriate \(\phi\) is identified, maximum‑entropy parameters are estimated, and the resulting fits are compared with traditional Pearson‑family fits. The new approach consistently achieves comparable or superior goodness‑of‑fit while using fewer ad‑hoc parameters, and it provides a clear interpretation of those parameters in terms of the underlying information constraints.
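The model-comparison step can be sketched with closed-form maximum-likelihood fits. This is not the authors' fitting procedure, just a minimal illustration on synthetic data: because the normal and log-normal each have two fitted parameters, comparing their maximized log-likelihoods is equivalent to comparing AIC.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical positive-valued data, generated log-normally for illustration.
data = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

def normal_loglik(v):
    # Maximized Gaussian log-likelihood: -n/2 * (1 + log(2*pi*sigma_hat^2)).
    sd = v.std()
    return -0.5 * len(v) * (1.0 + np.log(2 * np.pi * sd ** 2))

def lognormal_loglik(v):
    # Fit the normal on the log scale, then change variables: subtract sum(log v).
    return normal_loglik(np.log(v)) - np.log(v).sum()
```

For this data the log-normal fit wins decisively, i.e. the scale \(\phi(x)=\log x\) captures the information in the sample better than the additive scale.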
Beyond statistical modeling, the authors argue that the framework bridges to other disciplines. In physics, the measurement‑scale transformation mirrors the choice of thermodynamic variables; in ecology, it explains why power‑law scaling appears in species‑abundance distributions; in information theory, it clarifies how data compression schemes implicitly assume a particular \(\phi\). The central message is that the selection of a measurement scale is tantamount to selecting the information constraints that define a probability model. Consequently, the “family” of a distribution is not an arbitrary classification but a manifestation of the invariance properties of the information that the observer chooses to preserve.
In summary, the paper provides a conceptually transparent derivation of the common continuous probability distributions, unifies them under a maximum‑entropy variational principle, and explains their interrelationships through the lens of information invariance and measurement scale. This synthesis not only clarifies the historical Pearson system but also offers a practical, theoretically grounded toolkit for choosing appropriate probability models across scientific domains.