Digital archives are the preferred means for open access to research data. They play essential roles in knowledge infrastructures - robust networks of people, artifacts, and institutions - but little is known about how they mediate information exchange between stakeholders. We open the "black box" of data archives by studying DANS, the Data Archiving and Networked Services institute of The Netherlands, which manages 50+ years of data from the social sciences, humanities, and other domains. Our interviews, weblogs, ethnography, and document analyses reveal that a few large contributors provide a steady flow of content, but most are academic researchers who submit datasets infrequently and often restrict access to their files. Consumers are a diverse group that overlaps minimally with contributors. Archivists devote about half their time to aiding contributors with curation processes and half to assisting consumers. Given the diversity and infrequency of usage, human assistance in curation and search remains essential. DANS' knowledge infrastructure encompasses public and private stakeholders who contribute, consume, harvest, and serve their data - many of whom did not exist at the time the DANS collections originated - reinforcing the need for continuous investment in digital data archives as their communities, technologies, and services evolve.
Open access to data, or any other profound shift in scholarly practice, does not occur by mandate alone. Rather, change occurs incrementally, as knowledge infrastructures adapt to these new practices. Knowledge infrastructures are "robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds" (Edwards, 2010, p. 17). They are living systems influenced by complex sociotechnical factors (Borgman, Darch, et al., 2015; Edwards et al., 2013; Karasti & Blomberg, 2017).
Many stakeholders are involved in the knowledge infrastructures associated with research data. These include the scholars and teams who produce those data, funding agencies that provide the resources to conduct research, universities and other research institutions where investigations are based or conducted, research policymakers in public and private organizations, current and prospective users of those data, and the libraries and archives that may acquire and steward those data. These stakeholders are bound together by community relationships, contracts, and myriad information technologies. Making data “open” occurs in a knowledge infrastructure that mediates exchanges between creators and consumers, both enabling and constraining the uses that can be made of those data.
While research data can be exchanged publicly or privately, data archives are the mechanism preferred by most journals and funding agencies (Borgman, 2015; Pasquetto, 2018; Wallis, Rolando, & Borgman, 2013). Digital data archives play central roles in knowledge infrastructures as entities that facilitate the flow of data between parties, often over long periods of time. Despite the growth in research about data practices, sharing, and reuse, and advances in standards and practices through organizations such as the Research Data Alliance and Force11, few have studied the role of data archives in knowledge infrastructures. All too often, the data archive is a “black box” to which data are contributed and from which data are retrieved (Borgman, Darch, et al., 2015; Borgman, Darch, Sands, Wallis, & Traweek, 2014; Force11, 2018; Mayernik, Wallis, & Borgman, 2013; Mayernik, Wallis, Pepe, & Borgman, 2008; Pasquetto, Randles, & Borgman, 2017; Pasquetto, Sands, & Borgman, 2015; Research Data Alliance, 2018a; Wallis et al., 2013).
The study reported here opens that black box to examine the roles and relationships of data contributors, data consumers, and data curators. Of specific concern are the characteristics and capabilities of knowledge infrastructures supporting data exchange and the mediating roles played by archives as institutions and by archivists as partners with contributors and consumers.
Digital data archives are not monolithic entities; they take many forms and have many homes. Some collect only data of certain types and formats, such as genome sequences for biological research or survey data for the social and economic sciences. Others are more generic, collecting textual documents, static and moving images, audio, and other data types. Data archives range widely in mission, from providing immediate access to replication datasets to long-term preservation. Accordingly, they vary in the degree of investment in data curation. Some institutions devote days or weeks of professional labor to curating each dataset before deposit; others rely on “self-curation,” accepting data in whatever form submitted, with minimal review. The longevity of collections also varies from short-term grant funding to long-term commitments by universities, governments, or other agencies (Borgman, 2015; “Directory of Open Access Repositories - SHERPA Services,” 2018; International Council of Scientific Unions, 2018; National Science Board (U.S.), 2005). Business models may be based on memberships, grant funding, institutional support, contributions, corporate for-profit entities, or a combination (Shankar, Eschenfelder, & Downey, 2016).
This paper reports on a case study, conducted over a period of three years, of a significant exemplar of digital research data archives: the Dutch Data Archiving and Networked Services institute. DANS was chosen to represent several trends in knowledge infrastructures associated with research data. It serves multiple communities with a diverse array of material spanning the social sciences and humanities, plus some physical and life sciences content. It is a government-funded entity responsible for collecting certain categories of data, thus providing an opportunity to assess the influence of policy mandates. DANS is a node in multiple networks of data repositories and digital libraries, nationally and internationally, and is thus part of intersecting knowledge infrastructures. Lastly, DANS provides “self-archiving” services, holding contributors largely responsible for curation activities prior to deposit. However, it also employs a staff of archivists, providing opportunities to observe the distribut