Citation entropy and research impact estimation
A new indicator, a real-valued $s$-index, is suggested to characterize the quality and impact of a scientific research output. It is expected to be at least as useful as the notorious $h$-index, while avoiding some of its obvious drawbacks. However, surprisingly, the $h$-index turns out to be quite a good indicator for the majority of real-life citation data, whose alleged Zipfian behaviour keeps these drawbacks from showing up. The style of the paper was deliberately chosen to be somewhat frivolous, to indicate that any attempt to characterize the scientific output of a researcher by just one number always has an element of a grotesque game in it and should not be taken too seriously. I hope this frivolous style will be perceived only as a funny decoration.
💡 Research Summary
The paper introduces a novel continuous metric, the s‑index, designed to capture both the quality and impact of a researcher’s citation record. The s‑index is constructed by first normalising each paper’s citation count into a probability distribution p_i = c_i / ∑c_j, then computing the Shannon entropy H = −∑p_i log p_i. This entropy reflects the diversity of the citation profile: a high H means citations are spread relatively evenly across many papers, whereas a low H indicates concentration in a few works. The s‑index combines this entropy with the average citation count per paper, typically as s = exp(H) × (∑c_i / N), yielding a real‑valued score that simultaneously accounts for breadth (entropy) and depth (average citations).
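A minimal sketch of this computation in Python, assuming the summary's form s = exp(H) × (∑c_i / N) and natural logarithms (the logarithm base is an assumption here; any fixed base merely rescales H):

```python
import math

def s_index(citations):
    """Entropy-based s-index: exp(H) times the mean citation count.

    Follows the summary's form s = exp(H) * (sum(c_i) / N). Zero-cited
    papers still count toward N but contribute nothing to the entropy.
    """
    total = sum(citations)
    if total == 0:
        return 0.0
    # Normalise citation counts into a probability distribution p_i.
    probs = [c / total for c in citations if c > 0]
    # Shannon entropy H = -sum(p_i * log(p_i)).
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy) * (total / len(citations))
```

For a perfectly even profile of N papers, exp(H) equals N, so the s-index reduces to the total citation count; concentrating citations in fewer papers pushes exp(H) down toward 1.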
The author contrasts the s‑index with the widely used h‑index, which merely counts the largest number h such that h papers have at least h citations each. The h‑index is discrete, insensitive to citation concentration, and ignores low‑cited papers entirely. By contrast, the s‑index is sensitive to the entire distribution, offering finer granularity and the ability to distinguish researchers whose impact is driven by a few blockbuster papers from those with a more uniform citation record.
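The h-index definition quoted above is straightforward to compute from a sorted citation list; a short sketch:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c < i:
            break  # ranked counts only decrease, so no larger h exists
        h = i
    return h
```

The insensitivity mentioned above is easy to see: `h_index([9, 9, 9])` and `h_index([900, 900, 900])` both return 3, since citations beyond the h-core are ignored entirely.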
Empirical tests are performed on citation datasets from physics, biology, and social sciences. These datasets typically follow a Zipfian (power‑law) pattern, where the citation count decays roughly as rank^−α. When α ≈ 1, which is common, the h‑index and s‑index are highly correlated; the h‑index already captures the dominant features of the distribution, and the added information from entropy is minimal. However, when the exponent deviates significantly from 1, or when a researcher has an unusually skewed profile (e.g., a single seminal paper receiving the bulk of citations), the s‑index diverges from the h‑index, producing a score that, unlike the h‑index, still reflects the concentrated impact.
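This divergence can be illustrated on synthetic data (the two profiles below are invented for illustration, not drawn from the paper's datasets), again assuming the summary's s = exp(H) × mean formula:

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return max((i for i, c in enumerate(ranked, 1) if c >= i), default=0)

def s_index(citations):
    """s = exp(Shannon entropy of citation shares) * mean citations."""
    total = sum(citations)
    if total == 0:
        return 0.0
    probs = [c / total for c in citations if c > 0]
    return math.exp(-sum(p * math.log(p) for p in probs)) * total / len(citations)

# Zipf-like profile with alpha ~ 1: citations decay as 1/rank over 50 papers.
zipfian = [int(100 / r) for r in range(1, 51)]

# Skewed profile with the same total: one blockbuster plus 49 barely cited papers.
skewed = [sum(zipfian) - 49] + [1] * 49

print("zipfian:", h_index(zipfian), round(s_index(zipfian), 1))
print("skewed: ", h_index(skewed), round(s_index(skewed), 1))
```

On the skewed profile the h-index collapses to 1, while the s-index still registers the blockbuster's citations; on the Zipf-like profile both indices move together, echoing the paper's observation that the h-index performs well for typical Zipfian data.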
The paper also discusses practical considerations. The s‑index’s continuous nature allows for more nuanced ranking, but its computation requires logarithms and exponentiation, making it less transparent to non‑technical evaluators. Moreover, interpreting entropy in a bibliometric context is not straightforward, which may hinder adoption in institutional assessment frameworks.
Beyond the technical analysis, the author adopts a deliberately light‑hearted tone, warning that any attempt to reduce a scientist’s career to a single number is inherently “grotesque” and should be treated with caution. The conclusion is balanced: while the s‑index theoretically addresses certain shortcomings of the h‑index, in most real‑world citation data the h‑index remains a robust and sufficiently informative metric. Effective research evaluation should therefore combine multiple quantitative indicators with qualitative peer review, rather than rely on any single scalar measure.