Background: Biomedical research projects face data management requirements from multiple sources, such as funding agencies' guidelines, publisher policies, discipline-specific best practices, and their own users' needs. We describe functional and quality requirements based on many years of experience implementing data management for the CRC 1002 and CRC 1190. Fully equipped data management software should improve the documentation of experiments and materials and enable data storage and sharing according to the FAIR Guiding Principles, while maximizing usability, information security, software sustainability, and reusability.

Results: We introduce the modular web portal software menoci for data collection, experiment documentation, data publication, sharing, and preservation in biomedical research projects. Menoci modules are based on the Drupal content management system, which enables lightweight deployment and setup and makes it possible to combine research data management with a customisable project home page or collaboration platform.

Conclusions: Management of research data and digital research artefacts is transforming from individual researchers' or groups' best practices towards project- or organisation-wide service infrastructures. To enable and support this structural transformation process, a vital ecosystem of open source software tools is needed. Menoci is a contribution to this ecosystem of research data management tools that is specifically designed to support biomedical research projects.
Emerging data-driven research methods and the push for open and reproducible science amplify the need for strategic approaches to the management of research data across disciplines [1]. Data science and "big data" analytic applications require streamlined data collection and metadata annotation by the initial producers of source data, i.e. researchers and experimentalists in the laboratory in the biomedical context. Research data management (RDM) is often described as a life-cycle-spanning task including the planning of a data-generating experiment, collection of primary data, processing and analysis, publishing and sharing, preservation, and re-use of data [2]. Awareness of the challenges in long-term data management has reached a level where structural measures are put into practice, e.g. funders requiring detailed data management plans in grant applications [3]. In several scientific communities, including the life sciences, further political and technological efforts to permanently establish high-quality data management in the scientific process are currently supported by commitments to the FAIR Guiding Principles [4] (Findable, Accessible, Interoperable, and Reusable data). Unique, persistent, and resolvable identifiers (PIDs) [5] and descriptive metadata compatible with semantic web technologies [6] are widely applied examples of enabling tools for FAIR data sharing. Their efficient integration into routine workflows and information systems in biomedical research should be a central goal for infrastructural software development.
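To illustrate how PIDs and semantic-web-compatible descriptive metadata work together, the following minimal sketch builds a dataset metadata record using the schema.org vocabulary serialized as JSON-LD. This is a generic illustration, not part of menoci itself; the DOI and all descriptive values are hypothetical placeholders.

```python
import json

def build_dataset_metadata(doi: str, name: str, description: str,
                           creators: list, license_url: str) -> str:
    """Return a schema.org Dataset record as a JSON-LD string."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # The DOI, expressed as an HTTPS URL, serves as a unique,
        # persistent, and resolvable identifier (PID) for the dataset.
        "@id": f"https://doi.org/{doi}",
        "identifier": doi,
        "name": name,
        "description": description,
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "license": license_url,
    }
    return json.dumps(record, indent=2)

# Hypothetical example record:
metadata = build_dataset_metadata(
    doi="10.0000/example.12345",  # placeholder DOI, not registered
    name="Example echocardiography raw data",
    description="Raw measurements from a hypothetical experiment.",
    creators=["Jane Doe"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(metadata)
```

Because the record uses a shared vocabulary (schema.org) and a resolvable identifier, generic harvesters and search engines can discover and interpret it without knowledge of the producing system, which is the practical substance of the "Findable" and "Interoperable" FAIR principles.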
Several studies suggest that, alongside the scientific and ethical aspects of RDM, the negative economic consequences of insufficient experimental reproducibility are substantial. Research results based on irreproducible studies can waste large amounts of money and staff time, and can delay drug development or lead to the termination of clinical trials (e.g. [7,8]). About one third of preclinical irreproducibility results from biological reagents and reference materials [7]. Li et al. report that descriptions of experimental materials are key to research results, although they are often insufficiently reported [9]. This is especially true for antibodies [10], cell lines [11,12], and animal models [13].
Diverse stakeholders demand that research groups and projects fulfil a large variety of requirements regarding RDM. Many of these requirements have to be addressed by the biomedical researchers themselves. Amongst these stakeholders are funding agencies (good scientific practice), biomedical journals (author guidelines), universities (data policies), core facilities (terms of use), and research group leaders (traceability).
Recommendations published by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) in their guideline for Good Scientific Practice (GSP) [14] include keeping records of (published) research data for at least ten years. Apart from the technological challenges posed by, e.g., changing file formats and storage devices over such a long time, this implies requirements on metadata enrichment to preserve data integrity and intelligibility. The latest revision of the GSP guideline (published in September 2019) [14] explicitly states that all materials, methods, and data belonging to scientific publications should comply with the FAIR Guiding Principles. This also encompasses the upload of annotated research data to public repositories, as required by some journals. Collections of such public repositories can be found, e.g., at the Registry of Research Data Repositories [15] or the FAIRsharing registry [16].
Biomedical experiment workflows tend to be increasingly complex with regard to the application of highly sophisticated and expensive measurement devices and techniques, and are therefore increasingly supported by specialized service facilities [17]. Examples include technical approaches for nucleotide sequencing, advanced light microscopy and nanoscopy (ALMN) [18], or echocardiography of research animals [19]. Professional RDM support for these facilities includes, for example, the documentation of necessary planning steps, the generation and storage of (sometimes very large) raw data files, complex analytical pipelines, special software products, as well as proprietary or non-proprietary raw data and metadata formats.
Given the relatively small time budget of an experimental scientist, balancing the effort of documentation against completeness, i.e. collecting and recording sufficient information about experimental materials and workflows, is challenging. Experimental documentation must be noted down in the (usually paper-based) lab notebooks of the researchers performing the experiments. Moreover, if, for example, an antibody is used by several members of a research group, this information should be present in every single notebook. In some research groups, the switch to documentation based on Electronic Laboratory Notebooks (ELNs) is intended to simplify and enhance some aspects of GSP and RDM [20].