Gibrat's law for cities: uniformly most powerful unbiased test of the Pareto against the lognormal

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We address the general problem of testing a power law distribution versus a log-normal distribution in statistical data. This general problem is illustrated on the distribution of the 2000 US census of city sizes. We provide definitive results to close the debate between Eeckhout (2004, 2009) and Levy (2009) on the validity of Zipf’s law, which is the special Pareto law with tail exponent 1, to describe the tail of the distribution of U.S. city sizes. Because the origin of the disagreement between Eeckhout and Levy stems from the limited power of their tests, we perform the uniformly most powerful unbiased test for the null hypothesis of the Pareto distribution against the lognormal. The $p$-value and Hill’s estimator as a function of city size lower threshold confirm indubitably that the size distribution of the 1000 largest cities or so, which include more than half of the total U.S. population, is Pareto, but we rule out that the tail exponent, estimated to be $1.4 \pm 0.1$, is equal to 1. For larger ranks, the $p$-value becomes very small and Hill’s estimator decays systematically with decreasing ranks, qualifying the lognormal distribution as the better model for the set of smaller cities. These two results reconcile the opposite views of Eeckhout (2004, 2009) and Levy (2009). We explain how Gibrat’s law of proportional growth underpins both the Pareto and lognormal distributions and stress the key ingredient at the origin of their difference in standard stochastic growth models of cities (Gabaix 1999; Eeckhout 2004).

💡 Research Summary

The paper tackles a fundamental statistical question: how to decide whether a set of empirical observations follows a Pareto (power‑law) distribution or a log‑normal distribution. The authors illustrate the problem with the distribution of U.S. city sizes from the 2000 Census, a data set at the center of the debate between Eeckhout (2004, 2009), who argued for a log‑normal description, and Levy (2009), who argued that the upper tail follows a Pareto law. The crux of the disagreement, the authors argue, lies not in the data themselves but in the limited power of the tests previously employed. To resolve this, they apply the uniformly most powerful unbiased (UMPU) test of the null hypothesis that the data are Pareto‑distributed against the log‑normal alternative.

Methodologically, the authors first extract all incorporated places with a population of at least 5,000, yielding roughly 3,000 observations. They then impose a series of lower thresholds $x_0$ on city size, creating nested subsamples that range from the very largest cities down to the smallest towns in the sample. For each threshold they compute two key statistics: (i) Hill’s estimator $\hat{\alpha}(x_0)$, which provides a maximum‑likelihood estimate of the Pareto tail exponent, and (ii) a $p$‑value derived from the UMPU test, obtained via a bootstrap procedure that respects the composite nature of the null hypothesis.
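The threshold scan for Hill's estimator can be illustrated with a minimal sketch. It assumes the standard form $\hat{\alpha}(x_0) = \left[\frac{1}{n}\sum_i \ln(x_i/x_0)\right]^{-1}$ over the $n$ observations above $x_0$, with the usual asymptotic standard error $\hat{\alpha}/\sqrt{n}$. The function name `hill_estimator`, the thresholds, and the synthetic sample are illustrative assumptions; the data below are simulated, not the census data.

```python
import numpy as np

def hill_estimator(sizes, x0):
    """Hill (maximum-likelihood) estimate of the Pareto tail exponent above x0.

    Returns (alpha_hat, asymptotic_standard_error, n_tail).
    """
    sizes = np.asarray(sizes, dtype=float)
    tail = sizes[sizes >= x0]
    n = tail.size
    if n < 2:
        raise ValueError("need at least two observations above the threshold")
    # Under a Pareto tail, log(x / x0) is exponential with rate alpha.
    alpha_hat = 1.0 / np.mean(np.log(tail / x0))
    return alpha_hat, alpha_hat / np.sqrt(n), n

# Synthetic stand-in for city sizes (NOT census data): exact Pareto, exponent 1.4.
rng = np.random.default_rng(0)
true_alpha, x_min = 1.4, 1_000.0
sizes = x_min * (1.0 - rng.random(20_000)) ** (-1.0 / true_alpha)

# Scan a few hypothetical lower thresholds, as in the nested-subsample procedure.
for x0 in (1_000.0, 5_000.0, 20_000.0):
    alpha_hat, se, n = hill_estimator(sizes, x0)
    print(f"x0={x0:>8.0f}  n={n:>6d}  alpha_hat={alpha_hat:.3f} +/- {se:.3f}")
```

For an exact Pareto sample the estimate should stay roughly flat as the threshold moves. A systematic drift of $\hat{\alpha}(x_0)$ with $x_0$ is the kind of signal the summary describes for the smaller cities.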

The results are strikingly clear. When the threshold is set so that roughly the 1,000 largest cities remain, which together contain more than half of the U.S. population, the $p$‑value stays above the conventional 0.05 level, indicating that the Pareto hypothesis cannot be rejected. In this regime Hill’s estimator consistently yields $\hat{\alpha} = 1.4 \pm 0.1$. This value is significantly larger than the Zipf exponent of 1, which rules out the specific claim that city sizes follow Zipf’s law. As the threshold is lowered and smaller cities are added, the $p$‑value drops precipitously, often to near zero, while Hill’s estimate declines systematically. This pattern signals that the log‑normal distribution is the better model for the set of smaller cities.
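The contrast between a high $p$‑value for a Pareto tail and a near‑zero $p$‑value for log‑normal data can be reproduced on synthetic samples. The sketch below is not the paper's procedure. It assumes a test statistic that the summary does not specify, namely the coefficient of variation of the log‑excesses $\ln(x/x_0)$, and it obtains the null distribution by plain Monte Carlo simulation. The idea is that $\ln(x/x_0)$ is exponential under the Pareto null (coefficient of variation near 1) and truncated normal under the log‑normal alternative (coefficient of variation below 1). The function name `cv_test_pvalue` and all parameter values are hypothetical.

```python
import numpy as np

def cv_test_pvalue(sizes, x0, n_sim=1000, seed=0):
    """One-sided Monte Carlo p-value for Pareto (null) against lognormal.

    Assumed statistic: coefficient of variation (CV) of log(x / x0).
    Small CV values count as evidence against the Pareto null.
    """
    sizes = np.asarray(sizes, dtype=float)
    y = np.log(sizes[sizes > x0] / x0)
    n = y.size
    if n < 3:
        raise ValueError("too few observations above the threshold")
    cv_obs = y.std(ddof=1) / y.mean()
    # The CV is scale-free, so unit-rate exponential samples give the null
    # distribution whatever the unknown tail exponent is.
    rng = np.random.default_rng(seed)
    null = rng.exponential(size=(n_sim, n))
    cv_null = null.std(axis=1, ddof=1) / null.mean(axis=1)
    return float(np.mean(cv_null <= cv_obs))

# Synthetic samples (NOT census data).
rng = np.random.default_rng(1)
pareto_sample = (1.0 - rng.random(2000)) ** (-1.0 / 1.4)
lognormal_sample = rng.lognormal(mean=0.0, sigma=1.0, size=2000)

p_pareto = cv_test_pvalue(pareto_sample, x0=1.0)
p_lognormal = cv_test_pvalue(lognormal_sample, x0=1.0)
print(f"Pareto sample:    p = {p_pareto:.3f}")
print(f"lognormal sample: p = {p_lognormal:.3f}")
```

Because the statistic does not depend on the scale of the log‑excesses, the tail exponent never has to be estimated to calibrate the test. This is one simple way to handle a composite null hypothesis.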

Beyond the empirical findings, the paper offers a theoretical synthesis. Both the Pareto and log‑normal distributions can be derived from Gibrat’s law of proportional growth, and which one emerges depends on what is added to the pure multiplicative process. Under unconstrained proportional growth, the logarithm of city size performs a random walk, so the size distribution is log‑normal, as in Eeckhout (2004). When the process is supplemented by a mechanism that keeps cities from shrinking indefinitely, such as the minimum city size acting as a reflecting lower barrier in Gabaix (1999), it admits a stationary distribution with a Pareto tail whose exponent is set by the balance between drift and diffusion. The authors argue that the U.S. city data show both regimes: the upper tail is Pareto with a moderate exponent, while the set of smaller cities is better described by a log‑normal.
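A toy simulation can make the role of a lower barrier concrete. The sketch below is not the model of the paper, of Gabaix, or of Eeckhout. It is a discrete-time random walk in log‑size with Gaussian growth shocks, where the optional barrier is implemented by clamping log‑size at a floor. The function names and all parameter values (`mu`, `sigma`, `n_steps`, the thresholds) are hypothetical choices for illustration.

```python
import numpy as np

def simulate_log_sizes(n_cities=10_000, n_steps=1_500, mu=-0.01, sigma=0.1,
                       barrier=None, seed=0):
    """Proportional (Gibrat) growth: log-size receives i.i.d. Gaussian shocks.

    If `barrier` is given, log-size is clamped at that floor after each step,
    a crude stand-in for a reflecting lower barrier on city size.
    """
    rng = np.random.default_rng(seed)
    log_size = np.zeros(n_cities)
    for _ in range(n_steps):
        log_size += rng.normal(mu, sigma, n_cities)
        if barrier is not None:
            log_size = np.maximum(log_size, barrier)
    return log_size

def tail_exponent(log_size, log_threshold):
    """Hill estimate of the Pareto exponent of size = exp(log_size)."""
    excess = log_size[log_size > log_threshold] - log_threshold
    return 1.0 / excess.mean()

free = simulate_log_sizes(barrier=None)    # pure Gibrat growth
bounded = simulate_log_sizes(barrier=0.0)  # Gibrat growth plus a lower barrier

# Without a barrier, log-size is Gaussian with standard deviation
# sigma * sqrt(n_steps), so sizes are lognormal and keep spreading over time.
print("free:    std of log-size =", round(float(free.std()), 3))
# With a barrier and negative drift, a stationary Pareto tail develops.
print("bounded: tail exponent   =", round(float(tail_exponent(bounded, 1.0)), 3))
```

With these values the bounded case is expected to give a tail exponent near $2|\mu|/\sigma^2 = 2$, which is the drift-to-diffusion balance in this toy setting. The unbounded case has no stationary distribution at all.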

By applying a statistically optimal test, the authors reconcile the apparently contradictory conclusions of Eeckhout (2004, 2009) and Levy (2009). They show that both statements are correct, but each applies to a different segment of the city‑size spectrum. The paper thus establishes a new benchmark for distributional testing in urban economics and, more broadly, for any field where power‑law versus log‑normal alternatives are plausible. It also underscores the importance of considering growth‑process mechanisms—particularly the role of variance and lower bounds—in shaping the observed distribution of sizes. Future work can extend this framework to other countries, to longitudinal data, or to other phenomena (e.g., firm sizes, wealth distributions) where the same methodological and theoretical issues arise.
