When is an article actually published? An analysis of online availability, publication, and indexation dates
With the acceleration of scholarly communication in the digital era, the publication year is no longer a sufficient level of time aggregation for bibliometric and social media indicators. Papers are increasingly cited before they have been officially published in a journal issue and mentioned on Twitter within days of online availability. In order to find a suitable proxy for the day of online publication allowing for the computation of more accurate benchmarks and fine-grained citation and social media event windows, various dates are compared for a set of 58,896 papers published by Nature Publishing Group, PLOS, Springer and Wiley-Blackwell in 2012. Dates include the online date provided by the publishers, the month of the journal issue, the Web of Science indexing date, the date of the first tweet mentioning the paper as well as the Altmetric.com publication and first-seen dates. Comparing these dates, the analysis reveals that large differences exist between publishers, leading to the conclusion that more transparency and standardization is needed in the reporting of publication dates. The date on which the fixed journal article (Version of Record) is first made available on the publisher’s website is proposed as a consistent definition of the online date.
💡 Research Summary
The paper tackles a fundamental problem that has emerged with the digital acceleration of scholarly communication: the traditional reliance on the calendar year of journal issue publication no longer provides a precise temporal anchor for bibliometric and altmetric analyses. Researchers increasingly cite and discuss papers the moment they appear online, often weeks or months before the article is assigned to a printed issue. Consequently, any evaluation that uses “publication year” as the sole time reference introduces systematic bias into citation windows, social‑media event counts, and policy‑relevant metrics.
To identify a reliable proxy for the true moment of public availability, the authors assembled a large dataset of 58,896 research articles published in 2012 by four major publishers—Nature Publishing Group, PLOS, Springer, and Wiley‑Blackwell. For each article they collected five distinct dates: (1) the online‑first date supplied by the publisher (often labeled “Online First,” “Early View,” or similar), (2) the month of the formal journal issue, (3) the date the article was indexed in the Web of Science (WoS) database, (4) the timestamp of the first tweet that mentioned the article (as recorded by Altmetric.com), and (5) two Altmetric.com‑derived dates—its “publication date” and the “first‑seen date” (the earliest moment Altmetric detected any online trace of the article).
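The per-article comparison described above can be sketched as a simple record holding the collected dates, with lags computed pairwise. This is a minimal illustration using hypothetical dates and field names, not the authors' actual data pipeline:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ArticleDates:
    """Hypothetical per-article record of the dates compared in the study."""
    publisher_online: Optional[date]      # publisher's online-first date
    issue_month: Optional[date]           # first day of the journal issue month
    wos_indexed: Optional[date]           # Web of Science indexing date
    first_tweet: Optional[date]           # date of the first tweet mentioning the paper
    altmetric_first_seen: Optional[date]  # earliest trace detected by Altmetric.com

    def lag_days(self, a: str, b: str) -> Optional[int]:
        """Days from date field `a` to date field `b`; None if either is missing."""
        da, db = getattr(self, a), getattr(self, b)
        if da is None or db is None:
            return None
        return (db - da).days

# Example: a paper that went online 2012-03-05, appeared in the
# June 2012 issue, and was indexed by WoS 30 days after online release.
paper = ArticleDates(
    publisher_online=date(2012, 3, 5),
    issue_month=date(2012, 6, 1),
    wos_indexed=date(2012, 4, 4),
    first_tweet=date(2012, 3, 4),
    altmetric_first_seen=date(2012, 3, 6),
)
print(paper.lag_days("publisher_online", "wos_indexed"))  # 30
print(paper.lag_days("publisher_online", "first_tweet"))  # -1 (tweet preceded online date)
```

Negative lags surface exactly the cases the paper highlights, such as tweets that appear before the publisher's own online-first date.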
Statistical comparison of these dates revealed pronounced heterogeneity across publishers. Wiley‑Blackwell frequently omitted an explicit online‑first date; Altmetric’s inferred date therefore lagged the actual online appearance by an average of 45 days. PLOS displayed the greatest alignment, with its online‑first and issue dates virtually coincident, resulting in negligible lag. Springer and Nature Publishing Group showed systematic offsets: their online‑first dates preceded the issue month by 2–3 months on average, and in extreme cases by up to six months. The WoS indexing date consistently trailed the publisher’s online‑first date by roughly 30 days, reflecting the time needed for the article to be harvested, processed, and entered into the citation index.
Social‑media timing added another layer of complexity. The first tweet mentioning an article often appeared 1–2 days before the publisher’s online‑first date, indicating that scholars sometimes accessed pre‑print versions, author‑hosted PDFs, or institutional repositories and shared them prior to formal online release. Altmetric’s “publication date” generally matched the publisher’s online‑first date, but its “first‑seen date” proved volatile, shifting with the frequency of Altmetric’s crawls and thus offering limited reliability for precise temporal analyses.
These discrepancies have concrete implications for research evaluation. A citation window defined as “two years from publication year” will overestimate the citation potential of articles released early in the calendar year and underestimate that of late‑year releases, because the actual exposure period differs by months. Similarly, altmetric event windows that start at the issue month will miss early social‑media attention for many papers, skewing impact assessments. The authors therefore propose a unified definition: the date on which the fixed Version of Record (VOR) is first made publicly accessible on the publisher’s website. This “online publication date” should be recorded in article metadata and exposed via standard APIs (e.g., Crossref, DOI registration agencies) to ensure consistent retrieval across platforms.
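The year-level bias can be made concrete with a little date arithmetic. The dates below are illustrative, not drawn from the paper's dataset:

```python
from datetime import date

def exposure_days(online: date, window_end: date) -> int:
    """Days an article is actually exposed within a citation window
    that closes on `window_end`."""
    return (window_end - online).days

# A "two years from publication year" window for any 2012 paper
# closes on 2014-12-31, regardless of when the paper went online.
window_end = date(2014, 12, 31)

early = date(2012, 1, 15)   # online in mid-January 2012
late = date(2012, 12, 15)   # online in mid-December 2012

print(exposure_days(early, window_end))  # 1081
print(exposure_days(late, window_end))   # 746
# Under the same nominal "two-year" window, the early paper gets
# roughly eleven extra months of exposure.
```

Anchoring the window at the online publication date instead of the calendar year would give both papers the same exposure period.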
Adopting this standard would enable more accurate alignment of citation counts, tweet bursts, news mentions, and other time‑sensitive indicators with the true moment of scholarly availability. It would also facilitate cross‑publisher comparisons, improve the robustness of longitudinal studies, and support funders and institutions that rely on fine‑grained metrics for assessment and decision‑making. The paper concludes by calling for greater transparency from publishers, systematic reporting of the VOR online date, and coordinated efforts among bibliographic databases, altmetric aggregators, and standards bodies to embed this date as a mandatory element of scholarly metadata.