Fractional counting of authorship to quantify scientific research output


We investigate the problem of counting co-authorship in order to quantify the impact and relevance of scientific research output through normalized h-index and g-index. We use the papers whose authors belong to a subset of full professors of the Italian Settore Scientifico Disciplinare (SSD) FIS01 - Experimental Physics. Within this SSD, two populations can be roughly distinguished by the number of co-authors per paper. The total number of citations of each individual, as well as their h-index and g-index, depends strongly on the average number of co-authors. We show that, in order to remove the dependence of the various indices on the two populations, the best way to define a fractional counting of authorship is to divide the number of citations received by each paper by the square root of the number of co-authors. This yields information that can be used to better understand the scientific knowledge produced through the process of writing and publishing papers.


💡 Research Summary

The paper addresses a well‑known bias in bibliometric indicators: the h‑index and g‑index, which are widely used to quantify an individual’s scientific output, are strongly affected by the average number of co‑authors per paper. To investigate this issue, the author selects a homogeneous yet internally diverse sample of 60 full professors in experimental physics (SSD FIS01) from Italian universities, representing roughly a quarter of all professors in that discipline. Using the Web of Science database, all publications and citation counts for these scholars are retrieved.

Initial analysis confirms the classic relationships reported by Hirsch: total citations C_tot scale linearly with the square of the h‑index (C_tot = α H², α≈4.45) and with the square of the g‑index (C_tot = β G², β≈1.68). However, plotting H and G against the average number of co‑authors M_j for each professor reveals two distinct sub‑populations: some researchers publish many papers with large author lists (high‑energy experiments), while others work in smaller groups. Consequently, both H and G increase with M_j, making direct comparisons across individuals unfair.
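The proportionality C_tot = α H² can be illustrated with a least-squares fit through the origin. The sketch below is only an assumption about how such a coefficient might be estimated; the summary does not say which fitting procedure the paper uses, and the function name `fit_through_origin` and the data are made up for illustration.

```python
def fit_through_origin(index_values, total_citations):
    """Least-squares slope a for the model C_tot = a * index**2 (no intercept)."""
    x = [v ** 2 for v in index_values]          # regressor is the squared index
    num = sum(xi * ci for xi, ci in zip(x, total_citations))
    den = sum(xi * xi for xi in x)
    return num / den


# Hypothetical data for three scholars (NOT taken from the paper):
# the citation totals are constructed as exactly 4.5 * h**2.
h_values = [10, 20, 30]
c_totals = [450, 1800, 4050]

alpha = fit_through_origin(h_values, c_totals)
print(alpha)
```

The same function applies to the g-index relation by passing g values instead of h values.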

To correct for this, the author proposes a fractional counting scheme. For each paper i with m_i authors and C_i citations, a weighted citation χ_i(μ)=C_i / m_i^μ is defined, where μ∈(0,1) is a tunable exponent. Using these weighted citations, a fractional h‑index h_μ and a fractional g‑index g_μ are constructed analogously to the original definitions: h_μ is the largest integer such that the h_μ‑th ranked weighted citation is at least h_μ, and g_μ is the largest integer for which the sum of the top g_μ weighted citations reaches at least g_μ².
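The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names and the sample papers are hypothetical, and the g-index is assumed to be capped at the number of papers.

```python
def weighted_citations(papers, mu):
    """papers: list of (citations, n_authors) pairs.
    Returns chi_i = C_i / m_i**mu, sorted in descending order."""
    return sorted((c / m ** mu for c, m in papers), reverse=True)


def fractional_h(papers, mu):
    """Largest h such that the h-th ranked weighted citation is >= h."""
    h = 0
    for rank, value in enumerate(weighted_citations(papers, mu), start=1):
        if value < rank:
            break
        h = rank
    return h


def fractional_g(papers, mu):
    """Largest g such that the top g weighted citations sum to >= g**2
    (assumption: g cannot exceed the number of papers)."""
    g, running = 0, 0.0
    for rank, value in enumerate(weighted_citations(papers, mu), start=1):
        running += value
        if running >= rank ** 2:
            g = rank
    return g


# Hypothetical publication list: (citations, number of authors)
papers = [(100, 4), (36, 9), (16, 1), (9, 9), (4, 16)]

print(fractional_h(papers, 0.0))  # mu = 0 recovers the ordinary h-index
print(fractional_h(papers, 0.5))  # square-root weighting
print(fractional_g(papers, 0.5))
```

Setting μ = 0 leaves the citations unweighted, so the same functions return the standard h-index and g-index.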

The key methodological step is to find the value μ* that minimizes the dependence of h_μ and g_μ on M_j. The author formulates an objective function f_p(N,μ) that quantifies the residual correlation for both indices (p = h,g) and searches for the μ that minimizes the sum f_h+f_g across the dataset of N = 60 scholars. The optimization yields μ* ≈ 0.53 ± 0.01, essentially ½. This result implies that dividing each paper’s citations by the square root of its author count (√m) best neutralizes the co‑authorship effect.
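The search for μ* can be pictured as a one-dimensional grid search. The summary does not give the form of f_p(N,μ), so the sketch below substitutes an assumed stand-in: the squared Pearson correlation between each index and M_j, summed over p = h, g. All names and the sample data are hypothetical, and the code illustrates the idea rather than reproducing the paper's objective function.

```python
def chi(papers, mu):
    """Weighted citations C_i / m_i**mu, sorted in descending order."""
    return sorted((c / m ** mu for c, m in papers), reverse=True)


def h_index(values):
    # For descending values, counting ranks with value >= rank gives h.
    return sum(1 for rank, v in enumerate(values, start=1) if v >= rank)


def g_index(values):
    total, g = 0.0, 0
    for rank, v in enumerate(values, start=1):
        total += v
        if total >= rank * rank:
            g = rank
    return g


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # convention chosen here: no variance means no correlation
    return sxy / (sxx * syy) ** 0.5


def objective(scholars, mu):
    """ASSUMED stand-in for f_h + f_g: summed squared correlations with M_j."""
    m_avg = [sum(m for _, m in p) / len(p) for p in scholars]
    hs = [h_index(chi(p, mu)) for p in scholars]
    gs = [g_index(chi(p, mu)) for p in scholars]
    return pearson(m_avg, hs) ** 2 + pearson(m_avg, gs) ** 2


def best_mu(scholars, grid):
    return min(grid, key=lambda mu: objective(scholars, mu))


# Hypothetical scholars, each a list of (citations, number of authors).
scholars = [
    [(40, 2), (25, 3), (10, 2), (4, 1)],
    [(300, 100), (200, 120), (150, 90), (80, 100), (60, 110)],
    [(30, 4), (20, 5), (12, 3)],
    [(500, 400), (350, 380), (120, 420), (90, 400)],
]
grid = [i / 20 for i in range(21)]  # mu from 0.0 to 1.0 in steps of 0.05
mu_star = best_mu(scholars, grid)
print(mu_star, objective(scholars, mu_star))
```

With only four made-up scholars the minimizer carries no meaning; the value μ* ≈ 0.53 quoted above comes from the paper's own objective and its 60-scholar dataset.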

When μ≈0.5, the normalized indices h_μ and g_μ become virtually independent of M_j, as demonstrated by scatter plots and correlation analyses. Moreover, the total weighted citations C(μ)_tot still obey linear relations C(μ)_tot = α_μ h_μ² = β_μ g_μ², with α_μ≈5.45 and β_μ≈2.04, slightly larger than the original α and β because the weighting reduces the contribution of highly co‑authored papers.

The distribution of the normalized indices across the 60 scholars is examined by constructing histograms of h_μ and g_μ (with μ = 0.53). Both histograms are well fitted by Cauchy‑Lorentz functions L_p(x)=L₀+2A/

