Zipf’s Law Leads to Heaps’ Law: Analyzing Their Relation in Finite-Size Systems

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Background: Zipf’s law and Heaps’ law are observed in disparate complex systems. Of particular interest, these two laws often appear together. Many theoretical models and analyses have been proposed to explain their co-occurrence in real systems, but a clear picture of their relation is still lacking.

Methodology/Principal Findings: We show that Heaps’ law can be considered a derivative phenomenon if the system obeys Zipf’s law. Furthermore, we refine the known approximate solution for the Heaps’ exponent given the Zipf’s exponent. We show that the approximate solution is in fact an asymptotic solution for infinite systems, while in a finite-size system the Heaps’ exponent is sensitive to the system size. Extensive empirical analysis of tens of disparate systems demonstrates that our refined results better capture the relation between the Zipf’s and Heaps’ exponents.

Conclusions/Significance: The present analysis provides a clear picture of the relation between Zipf’s law and Heaps’ law without the help of any specific stochastic model: Heaps’ law is a derivative phenomenon of Zipf’s law. The presented numerical method gives a considerably better estimate of the Heaps’ exponent given the Zipf’s exponent and the system size. Our analysis also offers insights into real complex systems; for example, one can naturally obtain a better explanation of the accelerated growth of scale-free networks.


💡 Research Summary

The paper tackles a long‑standing puzzle in complex‑system science: why Zipf’s law (a rank‑frequency power law) and Heaps’ law (sub‑linear growth in the number of distinct items) so often appear together. Rather than invoking a specific stochastic generative model, the authors start from the assumption that the system’s frequencies follow a pure Zipf distribution \(f(r)=C\,r^{-\alpha}\). By integrating this distribution they derive the cumulative token count at which a word of rank \(r\) first appears, \(N_r\approx C\,r^{1-\alpha}/(1-\alpha)\). Inverting this relation yields an explicit expression for the number of distinct items as a function of the total token count.
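As a rough numerical illustration of how Heaps‑like growth can follow from Zipf‑distributed frequencies, and of why the apparent Heaps’ exponent depends on system size, the sketch below computes the expected number of distinct items after \(t\) draws from a finite Zipf distribution. This is not the authors’ derivation: it assumes independent draws with replacement (an assumption made here for simplicity), and the function names, the vocabulary size, and the value of \(\alpha\) are all illustrative choices.

```python
import math


def zipf_probabilities(vocab_size, alpha):
    """Return normalized probabilities p_r proportional to r**(-alpha), r = 1..vocab_size."""
    weights = [r ** (-alpha) for r in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_distinct(probs, t):
    """Expected number of distinct items after t draws.

    Assumption (not from the paper): draws are independent, so item r
    is still unseen after t draws with probability (1 - p_r)**t.
    """
    total = 0.0
    for p in probs:
        if p >= 1.0:
            total += 1.0 if t > 0 else 0.0
        else:
            # 1 - (1 - p)**t, written with expm1/log1p to stay accurate for tiny p
            total += -math.expm1(t * math.log1p(-p))
    return total


def local_heaps_exponent(probs, t, factor=2):
    """Log-log slope of the expected distinct count between t and factor * t."""
    n_low = expected_distinct(probs, t)
    n_high = expected_distinct(probs, factor * t)
    return math.log(n_high / n_low) / math.log(factor)


if __name__ == "__main__":
    # Illustrative parameters only: 10,000 possible items, Zipf exponent 1.5.
    probs = zipf_probabilities(vocab_size=10_000, alpha=1.5)
    for t in (10**2, 10**4, 10**6, 10**8):
        distinct = expected_distinct(probs, t)
        slope = local_heaps_exponent(probs, t)
        print(f"t={t:>9}  distinct={distinct:10.1f}  local exponent={slope:.3f}")
```

Under these assumptions the local slope is sub‑linear at small \(t\) and falls toward zero once most of the finite vocabulary has been seen, so a single fitted exponent depends on the range of \(t\) and on the vocabulary size. For \(\alpha>1\) the asymptotic approximation is often quoted as \(1/\alpha\); the sketch can be used to see how far a finite system sits from that value.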

