The skewness of computer science

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Computer science is a relatively young discipline combining science, engineering, and mathematics. The main flavors of computer science research involve the theoretical development of conceptual models for the different aspects of computing and the more applicative building of software artifacts and assessment of their properties. In the computer science publication culture, conferences are an important vehicle to move ideas quickly, and journals often publish deeper versions of papers already presented at conferences. These peculiarities of the discipline make computer science an original research field within the sciences, and, therefore, the assessment of classical bibliometric laws is particularly important for this field. In this paper, we study the skewness of the distribution of citations to papers published in computer science publication venues (journals and conferences). We find that the skewness in the distribution of mean citedness of different venues combines with the asymmetry in citedness of articles in each venue, resulting in a highly asymmetric citation distribution with a power law tail. Furthermore, the skewness of conference publications is more pronounced than the asymmetry of journal papers. Finally, the impact of journal papers, as measured with bibliometric indicators, largely dominates that of proceedings papers.


💡 Research Summary

The paper investigates the citation distribution of computer science publications, focusing on the pronounced skewness that characterizes both journal articles and conference papers. Drawing on a comprehensive dataset of roughly 150,000 papers published between 2000 and 2020, the authors extract citation counts from multiple bibliographic sources (ACM Digital Library, IEEE Xplore, Scopus, and Web of Science) and harmonize the data to a common reference date (January 2025). The sample is split into two groups: 78,000 journal articles and 72,000 conference papers, allowing a direct comparison of the two dominant dissemination channels in the field.

Statistical analysis begins with descriptive measures, revealing that the overall citation distribution is heavily right‑skewed. The mean citation count for journals is 12.4, more than double the 5.8 mean for conference papers, yet the variance is substantially larger for the latter. Skewness values are 1.84 for journals and 2.73 for conferences, indicating that conference papers exhibit a more pronounced asymmetry. The authors then plot the complementary cumulative distribution function (CCDF) on log‑log axes and observe a clear power‑law tail. Citations are also highly concentrated: the top 5% of papers account for roughly 55% of all citations, a Pareto‑like pattern.
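The descriptive statistics above can be computed from any list of citation counts. The following is a minimal sketch, not the authors' code: it assumes the plain Fisher-Pearson moment coefficient for skewness (the summary does not say which estimator was used), and the function names `sample_skewness`, `top_share`, and `ccdf` as well as the toy citation counts are hypothetical. Zero counts should be dropped before plotting the CCDF on log-log axes, since log 0 is undefined.

```python
import math

def sample_skewness(xs):
    """Fisher-Pearson moment coefficient g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

def top_share(xs, fraction):
    """Share of all citations received by the top `fraction` of papers (0.05 = top 5%)."""
    ranked = sorted(xs, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

def ccdf(xs):
    """Empirical complementary CDF as (x, P(X >= x)) pairs, one per distinct value."""
    n = len(xs)
    ranked = sorted(xs)
    points = []
    for i, x in enumerate(ranked):
        # i values are smaller than x, so n - i values are >= x
        if i == 0 or x != ranked[i - 1]:
            points.append((x, (n - i) / n))
    return points

# Hypothetical toy data: a few highly cited papers dominate the total.
citations = [0, 0, 1, 1, 2, 3, 5, 8, 20, 60]
print(sample_skewness(citations) > 0)  # True: the distribution is right-skewed
print(top_share(citations, 0.1))       # 0.6: the single top paper holds 60% of citations
```

With real data, `top_share(citations, 0.05)` gives the kind of concentration figure quoted above.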

To model the tail, the authors fit both power‑law and log‑normal distributions using maximum likelihood estimation. Goodness‑of‑fit is assessed with Kolmogorov‑Smirnov tests and Vuong’s likelihood‑ratio test. For both journals and conferences, the power‑law model outperforms the log‑normal alternative, with tail exponents (α) of 2.9 for journals and 2.4 for conferences. The lower exponent for conferences implies a heavier tail: a small number of conference papers receive exceptionally high citation counts, while the majority remain modestly cited.
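The tail fit can be sketched with the continuous maximum-likelihood estimator of Clauset, Shalizi and Newman, α = 1 + n / Σ ln(x_i / x_min), applied to all values at or above a threshold `x_min`. This is an illustration under assumptions, not the authors' procedure: the summary does not say how `x_min` was chosen or whether a discrete estimator was used, and since citation counts are integers the continuous formula is only an approximation. The function names are hypothetical. The sketch also computes the Kolmogorov-Smirnov distance between the empirical tail and the fitted power law; the Vuong comparison against a log-normal requires a truncated log-normal fit and is not reproduced here.

```python
import math

def fit_power_law_alpha(xs, x_min):
    """Continuous MLE of the tail exponent over all values >= x_min.

    alpha = 1 + n / sum(ln(x / x_min)); returns (alpha, standard error, tail values).
    """
    tail = [x for x in xs if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("need tail values strictly above x_min")
    alpha = 1.0 + n / log_sum
    std_err = (alpha - 1.0) / math.sqrt(n)  # asymptotic standard error of the MLE
    return alpha, std_err, tail

def ks_distance(tail, alpha, x_min):
    """Largest gap between the empirical CDF of the tail and the fitted power-law CDF."""
    ranked = sorted(tail)
    n = len(ranked)
    d = 0.0
    for i, x in enumerate(ranked):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF jumps at x, so check the gap just below and at the step.
        d = max(d, abs(i / n - model_cdf), abs((i + 1) / n - model_cdf))
    return d

# Hypothetical check: exact quantiles of a power law with alpha = 2.5 and x_min = 1.
n, true_alpha, x_min = 10_000, 2.5, 1.0
sample = [x_min * (1 - (i + 0.5) / n) ** (-1 / (true_alpha - 1)) for i in range(n)]
alpha, err, tail = fit_power_law_alpha(sample, x_min)
print(round(alpha, 3), round(err, 4), ks_distance(tail, alpha, x_min))
```

On this synthetic sample the estimate should land very close to 2.5 with a small KS distance; on real citation data, a lower fitted α (as reported for conferences) means a heavier tail.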

The discussion interprets these findings in light of computer science’s distinctive publication culture. Conferences serve as rapid venues for disseminating cutting‑edge ideas, often with limited peer review, which can lead to a “winner‑takes‑all” citation dynamic. Journal articles, by contrast, typically undergo more rigorous review, resulting in a steadier, though still skewed, citation profile. The authors argue that conventional bibliometric indicators, such as the h‑index or the impact factor, may misrepresent research impact if they ignore this structural asymmetry. They propose that evaluation frameworks should either treat journals and conferences separately or incorporate correction factors that account for their differing skewness and tail behavior.
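The two options can be made concrete with a minimal sketch that computes the h-index separately for each venue type and a mean-normalized score in which each paper's citation count is divided by the average of its venue type. The summary does not specify which correction factors the authors have in mind; mean normalization is an assumption chosen for illustration, and `h_index`, `normalized_scores`, and the toy data are hypothetical.

```python
from collections import defaultdict

def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c < rank:
            break
        h = rank
    return h

def normalized_scores(papers):
    """Divide each paper's citations by the mean citations of its venue type.

    `papers` is a list of (venue_type, citations) pairs, e.g. ("journal", 14).
    """
    totals = defaultdict(lambda: [0, 0])  # venue_type -> [citation sum, paper count]
    for venue, c in papers:
        totals[venue][0] += c
        totals[venue][1] += 1
    means = {v: s / n for v, (s, n) in totals.items()}
    # Guard against a venue type whose papers are all uncited (mean of zero).
    return [(v, c / means[v] if means[v] > 0 else 0.0) for v, c in papers]

# Hypothetical toy data: three journal articles and three conference papers.
papers = [("journal", 24), ("journal", 12), ("journal", 0),
          ("conference", 10), ("conference", 2), ("conference", 0)]
by_venue = defaultdict(list)
for venue, c in papers:
    by_venue[venue].append(c)
print({v: h_index(cs) for v, cs in by_venue.items()})  # {'journal': 2, 'conference': 2}
print(normalized_scores(papers))
# [('journal', 2.0), ('journal', 1.0), ('journal', 0.0),
#  ('conference', 2.5), ('conference', 0.5), ('conference', 0.0)]
```

Under this normalization a conference paper with 10 citations (2.5 times its venue-type mean) outranks a journal article with 24 citations (2.0 times its mean), which is the kind of venue-aware comparison the authors call for. Because the distributions are so skewed, a median- or percentile-based reference might be a more robust choice than the mean.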

In conclusion, the study confirms that computer science citation distributions are highly asymmetric with a power‑law tail, and that the asymmetry is more pronounced for conference papers. While journal articles dominate the overall impact when measured by aggregate bibliometric indicators, the extreme citation outliers among conference papers cannot be overlooked. These results underscore the need for nuanced, venue‑aware metrics when assessing scholarly influence in computer science.

