Integrating Research Data Management into Geographical Information Systems
Ocean modelling requires the production of high-fidelity computational meshes upon which to solve the equations of motion. The production of such meshes by hand is often infeasible, considering the complexity of the bathymetry and coastlines. The use of Geographical Information Systems (GIS) is therefore key to discretising the region of interest and producing a mesh appropriate to resolve the dynamics. However, all data associated with the production of a mesh must be provided in order to contribute to the overall recomputability of the subsequent simulation. This work presents the integration of research data management in QMesh, a tool for generating meshes using GIS. The tool uses the PyRDM library to provide a quick and easy way for scientists to publish meshes, and all data required to regenerate them, to persistent online repositories. These repositories are assigned unique identifiers to enable proper citation of the meshes in journal articles.
💡 Research Summary
The paper addresses a critical bottleneck in ocean modelling: the creation of high‑fidelity, unstructured computational meshes that accurately represent complex bathymetry and coastlines. Manual mesh generation is impractical for large‑scale, high‑resolution simulations, so the authors rely on Geographical Information Systems (GIS), specifically QGIS, to define the domain geometry. QMesh, a Python‑based tool developed at Imperial College London, reads QGIS project files, extracts the geometry layers (shapefiles, NetCDF bathymetry and resolution fields), and converts them into a format that the mesh generator Gmsh can consume. Gmsh then produces the final mesh, which can be used in computational fluid dynamics (CFD) solvers for ocean dynamics.
A major problem identified is the lack of reproducibility: while papers often describe the simulation setup, the actual mesh and the exact software versions used are rarely shared as supplementary material. To solve this, the authors integrate the PyRDM (Python Research Data Management) library into QMesh, creating a “publish‑with‑one‑click” workflow. PyRDM automatically parses the <datasource> tags in the QGIS XML project to locate every dependent file, bundles them together with the QMesh source code, and uploads the package to persistent online repositories such as Figshare, Zenodo, or DSpace via their REST APIs. Upon successful upload, a Digital Object Identifier (DOI) is minted and returned to the user for citation. For source code, PyRDM queries the Git repository to obtain the exact commit hash, checks whether that version has already been published, and either re‑uses an existing DOI or creates a new repository entry.
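The dependency-discovery step described above can be illustrated with a minimal sketch. It assumes only that a QGIS project file is XML and that each layer records its source in a `<datasource>` element, as the summary states. The function name `list_datasources`, the sample project, and the handling of a trailing `|layerid=0` option are illustrative assumptions, not QMesh's or PyRDM's actual code.

```python
import xml.etree.ElementTree as ET


def list_datasources(project_xml: str) -> list[str]:
    """Return the unique file paths named in <datasource> elements, in document order.

    Hypothetical helper for illustration; not part of QMesh or PyRDM.
    """
    root = ET.fromstring(project_xml)
    paths: list[str] = []
    for element in root.iter("datasource"):
        text = (element.text or "").strip()
        if not text:
            continue  # skip empty <datasource/> entries
        # Assumption: provider options may follow a '|' (e.g. "file.shp|layerid=0");
        # only the part before the first '|' is treated as the file path.
        path = text.split("|", 1)[0]
        if path not in paths:
            paths.append(path)
    return paths


# A tiny stand-in for a QGIS project file (real projects contain far more metadata).
SAMPLE_PROJECT = """<qgis projectname="demo">
  <projectlayers>
    <maplayer><datasource>./coastline.shp|layerid=0</datasource></maplayer>
    <maplayer><datasource>./bathymetry.nc</datasource></maplayer>
    <maplayer><datasource>./coastline.shp|layerid=0</datasource></maplayer>
  </projectlayers>
</qgis>"""

if __name__ == "__main__":
    for path in list_datasources(SAMPLE_PROJECT):
        print(path)
```

A real publishing tool would also need to resolve these paths relative to the project file and collect companion files (for example the `.shx` and `.dbf` files that accompany a shapefile); that is left out here.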
The tool is available both as a command‑line interface (CLI) and as a graphical plugin within QGIS. In the GUI, the user simply selects the target repository (default Figshare), decides whether the data should be public or private, and clicks a button; the CLI requires the path to the QGIS project file and optional flags. The process is fully automated: after the user confirms the action, PyRDM creates the repository, uploads all files, and displays the DOI. The authors demonstrate the workflow with a realistic case study: generating a mesh for the Orkney and Shetland Isles region. After constructing the QGIS layers (coastlines, bathymetry, resolution), they run QMesh to produce the mesh, then use the publishing tool to deposit both the input GIS files and the resulting mesh on Figshare. The resulting Figshare page lists all files, assigns a title and tags derived from the QGIS project name, and provides a DOI that can be cited in journal articles.
In the discussion, the authors highlight several practical challenges. First, repository APIs lack standardisation: Figshare requires each author’s Figshare ID in an AUTHORS file, whereas a more universal solution would be ORCID authentication, which Figshare is beginning to support. At the time of writing, Zenodo’s API did not allow searching for existing repositories, which limits the tool's ability to check for duplicates. Second, storage limits on free accounts (e.g., Figshare’s 1 GB total, 250 MB per file) are insufficient for large ocean‑scale meshes, prompting a recommendation to use institutional Figshare installations or other cloud‑based private repositories. Third, reproducibility depends not only on QMesh but also on the exact versions of its dependencies, especially Gmsh; the authors suggest that future work should capture and publish these version details as well.
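To make the storage constraint concrete, the sketch below checks a set of file sizes against the free-account figures quoted above (250 MB per file, 1 GB in total). The function name `check_upload`, the example file names and sizes, and the use of decimal units (1 MB = 10^6 bytes) are assumptions made for illustration; this is not part of PyRDM or any repository API.

```python
MB = 1000 ** 2  # assumption: decimal megabytes
GB = 1000 ** 3  # assumption: decimal gigabytes

PER_FILE_LIMIT = 250 * MB  # per-file limit quoted in the summary
TOTAL_LIMIT = 1 * GB       # total limit quoted in the summary


def check_upload(sizes: dict[str, int],
                 per_file_limit: int = PER_FILE_LIMIT,
                 total_limit: int = TOTAL_LIMIT) -> dict:
    """Report which files exceed the per-file limit and whether the whole set fits.

    `sizes` maps a file name to its size in bytes. Hypothetical helper for illustration.
    """
    oversized = sorted(name for name, size in sizes.items() if size > per_file_limit)
    total = sum(sizes.values())
    return {
        "oversized": oversized,
        "total_bytes": total,
        "fits": not oversized and total <= total_limit,
    }


if __name__ == "__main__":
    # Invented sizes: a 300 MB mesh alone already breaks the per-file limit.
    report = check_upload({"mesh.msh": 300 * MB, "coastline.shp": 5 * MB})
    print(report)
```

Running a check like this before upload would let a user find out early that a large mesh needs an institutional or private repository, rather than discovering it when the upload fails.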
Overall, the integration of a research‑data‑management publishing tool into a GIS environment dramatically lowers the effort required for scientists to share both their meshes and the software that generated them. By automating metadata extraction, repository creation, and DOI assignment, the system encourages open science practices, improves citation of data products, and enhances the reproducibility of ocean‑modelling studies. The paper concludes with a call for the scientific community to provide incentives and cultural support for such open data practices, thereby overcoming the motivational barrier that currently hampers widespread adoption.