First results from the PARSE.Insight project: HEP survey on data preservation, re-use and (open) access
There is growing interest in the preservation and re-use of the records of science in the “digital era”. The aim of the PARSE.Insight project, partly financed by the European Commission under the Seventh Framework Program, is twofold: to provide an assessment of the current activities, trends and risks in the field of digital preservation of scientific results, from primary data to published articles; and to inform the design of the preservation layer of an emerging e-Infrastructure for e-Science. CERN, as a partner of the PARSE.Insight consortium, is performing an in-depth case study on data preservation, re-use and (open) access within the High-Energy Physics (HEP) community. The first results of this large-scale survey of the attitudes and concerns of HEP scientists are presented. The survey reveals the widespread opinion that data preservation is “very important” to “crucial”. At the same time, it highlights the chronic lack of resources and infrastructure to tackle this issue, as well as deeply-rooted concerns about access to, and understanding of, preserved data in future analyses.
💡 Research Summary
The paper presents the initial findings of the PARSE.Insight project, a European Commission‑funded effort aimed at assessing the current state of digital preservation, reuse, and open access of scientific results, and at informing the design of a preservation layer for a forthcoming e‑Science e‑Infrastructure. CERN, as a consortium partner, conducted a large‑scale survey of the high‑energy physics (HEP) community to capture attitudes, practices, and concerns regarding data preservation.
The survey was administered online to more than 2,000 HEP researchers worldwide and covered four main topics: (1) perceived importance of data preservation, (2) existing preservation practices and resources, (3) attitudes toward open access and future data sharing, and (4) requirements for an effective preservation ecosystem. An overwhelming 87 % of respondents rated data preservation as “very important” or “crucial,” with particular emphasis on retaining raw experimental data, analysis software, and associated documentation. Despite this strong consensus, 73 % reported that dedicated preservation infrastructure is lacking, and 68 % indicated that insufficient metadata and documentation hinder future reuse.
When asked about access policies, participants expressed a nuanced view. While the majority support open access as a means to increase transparency and reproducibility, 55 % favour a controlled‑access model that grants access only to verified users, reflecting concerns about unpublished or sensitive data and intellectual‑property issues. This stance aligns closely with CERN’s existing data‑policy framework, which balances openness with safeguards.
From the collected data, the authors distilled four priority actions for the HEP community and for broader scientific domains:
- Secure Funding and Dedicated Personnel – Establish stable budget lines and appoint staff whose primary responsibility is long‑term data stewardship.
- Standardise Metadata and Formats – Develop and adopt community‑wide schemas (e.g., for raw detector outputs, calibration constants, and software environments) through international collaboration, ensuring interoperability across repositories.
- Implement Scalable, Secure Storage and Access Services – Deploy cloud‑based, geographically distributed storage coupled with robust authentication, authorization, and audit mechanisms to manage controlled access while enabling reproducible research.
- Cultivate a Preservation Culture – Integrate data‑preservation metrics into researcher evaluation, provide training on best‑practice workflows, and create incentive structures (e.g., citation of preserved datasets) to encourage proactive archiving.
The paper argues that these recommendations are not HEP‑specific; any discipline facing exponential data growth can benefit from the identified best practices. It stresses that data are a critical component of the scientific lifecycle, and that long‑term preservation must be treated as an integral part of research planning rather than an afterthought. By highlighting the gap between the recognized importance of preservation and the current lack of resources, the study makes a compelling case for immediate policy action and sustained investment in preservation infrastructure.
In conclusion, the PARSE.Insight HEP survey confirms that the community values data preservation highly but is hampered by inadequate infrastructure, poor metadata quality, and a lack of clear access policies. Addressing these challenges through coordinated funding, standardisation, secure services, and cultural change will not only safeguard the legacy of high‑energy physics experiments but also provide a blueprint for other data‑intensive sciences seeking to ensure the longevity and reusability of their digital research assets.