Enforcing public data archiving policies in academic publishing: A study of ecology journals

📝 Original Info

  • Title: Enforcing public data archiving policies in academic publishing: A study of ecology journals
  • ArXiv ID: 1810.13040
  • Date: 2023-06-15
  • Authors: John Doe, Jane Smith, Richard Roe

📝 Abstract

To improve the quality and efficiency of research, groups within the scientific community seek to exploit the value of data sharing. Funders, institutions, and specialist organizations are developing and implementing strategies to encourage or mandate data sharing within and across disciplines, with varying degrees of success. Academic journals in ecology and evolution have adopted several types of public data archiving policies requiring authors to make data underlying scholarly manuscripts freely available. Yet anecdotes from the community and studies evaluating data availability suggest that these policies have not obtained the desired effects, both in terms of quantity and quality of available datasets. We conducted a qualitative, interview-based study with journal editorial staff and other stakeholders in the academic publishing process to examine how journals enforce data archiving policies. We specifically sought to establish who editors and other stakeholders perceive as responsible for ensuring data completeness and quality in the peer review process. Our analysis revealed little consensus with regard to how data archiving policies should be enforced and who should hold authors accountable for dataset submissions. Themes in interviewee responses included hopefulness that reviewers would take the initiative to review datasets and trust in authors to ensure the completeness and quality of their datasets. We highlight problematic aspects of these thematic responses and offer potential starting points for improvement of the public data archiving process.

📄 Full Content

The value of open data in the scientific discovery process is well documented (Bowker, 2001; Hilgartner, 2013; Leonelli, 2013; GEO, 2015). As Michael Nielsen (2011) wrote in Reinventing discovery: The new era of networked science (p. 108), "Scientists in many fields are collaborating online to create enormous databases that map out the structure of the universe, the world's climate, the world's oceans, human languages, and even all the species of life." Sharing data, as Nielsen and others (e.g., The Royal Society, 2012) note, increases the speed and enhances the quality of scientific discovery. In some cases, creating the "enormous databases" to facilitate improved science is a direct result of answering scientific questions: No one astronomer, for example, can build and deploy the tools necessary to survey distant galaxies without direct coordination and collaboration. In other words, sometimes shared databases emerge out of necessity. Other cases require coordination of small-scale projects that could, in theory, exist as standalone pursuits without sharing data; instead, researchers recognize some value (whether scientific, legal, moral, or other) in sharing datasets.

Coordination of data sharing efforts in the latter cases relies on a number of stakeholders. Funding agencies, for example, might seek to streamline their efforts by requiring data to be shared and preventing costly re-collection. Funders have both incentives and enforcement mechanisms readily available (i.e., “carrots and sticks”) (Couture et al., 2018; Diekema et al., 2014). Other stakeholders, including scientific journals, manage a delicate balance between incentives and enforcement. These journals are increasingly requiring researchers to make datasets associated with manuscripts available, often by establishing public data archiving (PDA) policies (Roche et al., 2015). PDA policies illustrate journals’ recognition of data archiving as an essential step in the research process (Whitlock et al., 2010; Vines et al., 2013), yet the appropriate mechanisms for managing the data archiving process are, to date, undefined.

Journals have several motivations for instituting PDA policies and developing appropriate strategies for incentivizing and enforcing compliance. In principle, policies requiring authors to publish the datasets underpinning analyses in their manuscripts facilitate scrutiny, reproduction, and replication of studies (Bloom et al., 2014; Goecks et al., 2010). The resulting transparency can increase public trust in science (Beardsley, 2010; Duke and Porter, 2013; South and Duke, 2010) and, by extension, enhance the reputation of the journal. Journals may also view PDA as a way to increase citations, to provide other researchers interested in the same or similar phenomena with resources, and to provide valuable objects of collaboration (Borgman, 2007; Edwards et al., 2011). Furthermore, PDA policies aid in ensuring the sustainability and quality of scientific data. Without adequate PDA, the short- and long-term sustainability of research data diminishes (Kaye and Hawkins, 2014; Vines et al., 2014). Incorporating review of datasets into the publishing process can help to avert some of what Leonelli (2014: 1) refers to as “difficulties caused by the lack of adequate curation for the vast majority of data in the life sciences.”

Publishers have implemented PDA policies in various journals across scientific disciplines (see Appendix A for a list of examples in the biological sciences) and many different types of policies have emerged. In general, PDA policies fall on a spectrum from those only requiring authors to make data “available upon request” (e.g., via email from an interested party) to requiring authors to deposit datasets in specific repositories housing specific types of data (e.g., GenBank, the universal choice for genome sequencing data). Each approach has benefits and drawbacks that require editorial staff to balance incentives and enforcement. As we discuss in the next section, many journals have moved beyond “available upon request” policies and instead fall somewhere between voluntary dataset contribution and mandated PDA. For example, some journals require authors to write data availability statements (brief attestations to where the data are located) and allow authors to choose from a variety of repositories to house their data.

Moving beyond “available upon request” policies served as a consensus step toward realizing the potential of open data in science. As Michener describes, factors such as the availability of technical infrastructure for data sharing and funder policies requiring sharing drove these changes in norms (Michener, 2015). However, the effectiveness of PDA policies for enabling reproduction, replication, and data reuse remains questionable. For example, Roche et al. (2015: 1) found that 56% of published datasets related to manuscripts in top ecology journals were incomplete, and 64% “were archived in a w

Reference

This content is AI-processed based on open access ArXiv data.
