Statistical analysis of the Hirsch Index

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

The Hirsch index (commonly referred to as the h-index) is a bibliometric indicator that is widely recognized as effective for measuring the scientific production of a scholar, since it summarizes both the size and the impact of the research output. In a formal setting, the h-index is an empirical functional of the distribution of the citation counts received by the scholar. Under this approach, the asymptotic theory for the empirical h-index has recently been developed for the case in which the citation counts follow a continuous distribution and, in particular, variance estimation has been considered for the Pareto-type and the Weibull-type distribution families. However, in bibliometric applications, citation counts follow a distribution supported on the integers. Thus, we provide general properties of the empirical h-index in both the small- and large-sample settings. In addition, we introduce consistent nonparametric variance estimation, which allows for the implementation of large-sample set estimation for the theoretical h-index.

💡 Research Summary

The paper addresses a fundamental gap in the statistical treatment of the Hirsch index (h‑index) by focusing on the discrete nature of citation counts, which are inherently integer‑valued. While previous work, notably Beirlant and Einmahl (2010), derived asymptotic normality and variance formulas for the empirical h‑index under the assumption of a continuous citation distribution, this assumption is unrealistic for bibliometric data. The authors therefore develop a comprehensive theory that accommodates integer‑valued citation variables.

First, the authors define the theoretical h‑index hₙ as the supremum of x such that n S⁻(x) ≥ x, where S(x) = P(X > x) is the survival function of a positive random variable X representing citation counts and S⁻(x) = P(X ≥ x) is its left‑limit. When X is integer‑valued, this reduces to a maximization over natural numbers: hₙ = max{j : n S(j‑1) ≥ j}. They then introduce the empirical counterpart Ĥₙ using the empirical survival function Ŝₙ⁻(x), and show that the commonly used integer‑valued version ĥₙ (the original Hirsch definition) coincides with ⌊Ĥₙ⌋ when X is integer‑valued.
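The two integer-valued definitions above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `empirical_h_index` and `theoretical_h_index`, the sample citation list, and the geometric survival function used in the usage lines are all assumptions chosen for illustration. The sketch takes S(x) = P(X > x), so that S(j‑1) = P(X ≥ j) for integer‑valued X.

```python
from typing import Callable, Sequence


def empirical_h_index(citations: Sequence[int]) -> int:
    """Original Hirsch definition: the largest j such that at least
    j papers have received at least j citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger rank qualifies
    return h


def theoretical_h_index(n: int, survival: Callable[[int], float]) -> int:
    """Integer-valued theoretical h-index: max{j : n * S(j - 1) >= j},
    where survival(x) is assumed to return S(x) = P(X > x)."""
    h = 0
    # S <= 1 implies the maximizer cannot exceed n.
    for j in range(1, n + 1):
        if n * survival(j - 1) >= j:
            h = j
        else:
            break  # n * S(j - 1) is nonincreasing in j while j grows
    return h


# Hypothetical citation record of six papers.
print(empirical_h_index([10, 8, 5, 4, 3, 0]))        # 4

# Hypothetical model: geometric counts on {1, 2, ...} with P(X > x) = 0.5 ** x.
print(theoretical_h_index(10, lambda x: 0.5 ** x))   # 2
```

In the first usage line, four papers have at least four citations but only three have at least five, so the empirical index is 4. In the second, n S(1) = 5 ≥ 2 while n S(2) = 2.5 < 3, so the theoretical index for n = 10 is 2.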

For small samples, the authors exploit the fact that the indicator variables Y_{j,n}=I
