A Checklist to Publish Collections as Data in GLAM Institutions
Purpose The purpose of this study is to offer a checklist for both creating and evaluating digital collections suitable for computational use. Within the collections as data movement, such collections are sometimes also referred to as data sets.
Design/methodology/approach The checklist was built by synthesising and analysing relevant research literature, articles and studies, together with the issues and needs identified in an observational study. The checklist was then tested and applied in two ways: as a tool for assessing a selection of digital collections made available by galleries, libraries, archives and museums (GLAM) institutions, as a proof of concept, and as a supporting tool for creating collections as data.
Findings Over the past few years, there has been a growing interest in making digital collections published by GLAM organisations available for computational use. Based on previous work, the authors defined a methodology to build a checklist for the publication of collections as data. The authors’ evaluation showed several examples of applications that can be useful to encourage other institutions to publish their digital collections for computational use.
Originality/value Some work exists on making digital collections suitable for computational use, with particular attention to data quality, planning and experimentation. To the best of the authors’ knowledge, however, none of the work to date provides an easy-to-follow and robust checklist to publish collection data sets in GLAM institutions. This checklist intends to encourage small- and medium-sized institutions to adopt the collections as data principles in daily workflows following best practices and guidelines.
💡 Research Summary
The paper presents a practical checklist designed to help Galleries, Libraries, Archives, and Museums (GLAM) institutions transform and publish their digital collections as research‑ready datasets. Recognizing a growing interest in making cultural heritage collections computationally accessible, the authors set out to fill a gap: while prior work has addressed data quality, planning, and experimentation, no comprehensive, easy‑to‑follow checklist existed for GLAM organizations, especially smaller ones with limited resources.
Methodologically, the study proceeds in four stages. First, a systematic literature review classifies relevant sources into five categories: best practices, data quality, checklist definition, strategy & data planning, and examples & experiments. This review identifies key elements such as metadata standards and formats (MARC XML, Dublin Core, JSON), open licensing (CC0 or similar), API provision, data cleaning/enrichment, and long-term preservation strategies.
Second, an observational online survey conducted between 10 and 30 October 2022 gathered 43 responses from GLAM institutions and researchers in the United States and Europe, along with a few participants from Asia. The survey reveals that most respondents have low to moderate experience with “Collections as Data,” feel insufficiently informed at the start of implementation, and cite a need for clearer guidance on strategy, budgeting, legal issues, and technical tooling.
Third, the authors synthesize the literature insights and survey findings to draft the checklist. The methodology for building it comprises four procedural steps: (1) identify relevant topics and conduct a literature review, (2) assess information needs and issues through stakeholder input, (3) synthesize findings into concrete checklist items, and (4) evaluate and refine the checklist through real-world application. The resulting checklist details about 30 specific items, covering data format selection (structured formats such as JSON-LD, CSV, RDF), metadata quality control (automated validation plus manual review), licensing declaration, access mechanisms (well-documented APIs, sample queries), version control, documentation, security and privacy safeguards, and a sustainability plan for ongoing maintenance.
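The automated half of the metadata quality control item can be illustrated with a minimal sketch. It assumes a CSV export with one row per record. The required field names (loosely modelled on Dublin Core element names), the allow-list of open licence identifiers, and the sample rows are all hypothetical and are not taken from the paper's checklist.

```python
import csv
import io

# Hypothetical required fields, loosely modelled on Dublin Core element names.
REQUIRED_FIELDS = ("identifier", "title", "date", "rights")

# Hypothetical allow-list of open licence identifiers.
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0"}


def validate_record(record):
    """Return a list of problems found in one metadata record (a dict)."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat absent, None, and whitespace-only values as missing.
        if not (record.get(field) or "").strip():
            problems.append(f"missing {field}")
    rights = (record.get("rights") or "").strip()
    if rights and rights not in OPEN_LICENCES:
        problems.append(f"licence not on open list: {rights}")
    return problems


def validate_csv(text):
    """Validate every row of a CSV export.

    Returns a dict mapping the record identifier (or a row label when the
    identifier itself is missing) to the list of problems for that row.
    Rows without problems are left out.
    """
    report = {}
    rows = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(rows, start=1):
        problems = validate_record(row)
        if problems:
            key = (row.get("identifier") or "").strip() or f"row {row_number}"
            report[key] = problems
    return report


# Made-up sample data, for illustration only.
SAMPLE = """identifier,title,date,rights
bk-001,Sample title A,1771,CC0-1.0
bk-002,,1850,CC-BY-4.0
bk-003,Sample title C,,All rights reserved
"""

if __name__ == "__main__":
    for key, problems in validate_csv(SAMPLE).items():
        print(key, "->", "; ".join(problems))
```

Records flagged by such a script would still go to manual review, the second half of the same checklist item; the script only narrows down where a curator needs to look.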
Fourth, the checklist is applied as a proof of concept to collections from five GLAM institutions, including the British Library, National Library of Scotland, Bibliothèque nationale du Luxembourg, and two other European libraries. The application uncovers concrete improvement actions: unifying data formats, enriching and cleaning metadata, revisiting licensing to ensure openness, and producing API documentation with usage examples. Notably, for small- and medium-sized institutions the “strategy & data plan” phase, which defines long-term governance, budgeting, and staffing, emerges as a critical success factor.
Key contributions of the paper are: (a) the creation of a robust, domain‑specific checklist that operationalizes the “Collections as Data” principle; (b) empirical validation through both a broad survey and targeted case studies; (c) emphasis on scalability for institutions with limited technical expertise or financial resources; and (d) alignment with emerging European initiatives such as the European Data Space for Cultural Heritage and the European Cultural Heritage Cloud, positioning the checklist as a bridge between local practice and international data‑sharing infrastructures.
The authors conclude that the checklist can substantially lower barriers for GLAM institutions to make their holdings computationally reusable, thereby fostering new research, creative applications, and cross‑institutional collaborations. Future work is suggested in three areas: developing automated tooling that can generate checklist reports from repository metadata, establishing a continuous feedback loop with data users to keep the checklist up‑to‑date, and mapping the checklist items to emerging standards for semantic interoperability (e.g., schema.org, CIDOC‑CRM) to further ease integration into global cultural‑heritage data ecosystems.
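As a rough illustration of the first future-work direction, the sketch below generates a checklist report from a dataset-level descriptor. It is not the authors' tooling: the five checklist items, the descriptor keys (`license`, `formats`, `documentation_url`, `api_url`, `download_url`, `version`), and the example descriptor are hypothetical stand-ins for whatever a real repository exposes.

```python
import json

# Structured formats accepted by the hypothetical "format" item below.
STRUCTURED_FORMATS = {"csv", "json", "json-ld", "rdf"}

# Hypothetical checklist items: (label, predicate over a descriptor dict).
CHECKLIST = [
    ("Open licence declared", lambda d: bool(d.get("license"))),
    (
        "Structured format offered",
        lambda d: bool(STRUCTURED_FORMATS & {f.lower() for f in d.get("formats", [])}),
    ),
    ("Documentation available", lambda d: bool(d.get("documentation_url"))),
    ("API or download access", lambda d: bool(d.get("api_url") or d.get("download_url"))),
    ("Version stated", lambda d: bool(d.get("version"))),
]


def checklist_report(descriptor):
    """Return a plain-text report with one line per item and a final tally."""
    results = [(label, check(descriptor)) for label, check in CHECKLIST]
    passed = sum(ok for _, ok in results)
    lines = [f"[{'x' if ok else ' '}] {label}" for label, ok in results]
    lines.append(f"{passed}/{len(results)} items satisfied")
    return "\n".join(lines)


# Made-up descriptor, for illustration only.
DESCRIPTOR = json.loads("""
{
  "title": "Example digitised newspapers",
  "license": "CC0-1.0",
  "formats": ["CSV", "PDF"],
  "download_url": "https://example.org/data.zip"
}
""")

if __name__ == "__main__":
    print(checklist_report(DESCRIPTOR))
```

Keeping each item as a label paired with a predicate means items can be added or reworded without touching the reporting code, which would suit a checklist that is expected to evolve through user feedback.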