A Serverless Tool for Platform Agnostic Computational Experiment Management
Neuroscience has been carried into the domain of big data and high performance computing (HPC) on the backs of data collection initiatives and increasingly compute-intensive tools. While managing HPC experiments requires considerable technical acumen, platforms and standards have been developed to ease this burden on scientists. Web portals make resources widely accessible, and data organizations such as the Brain Imaging Data Structure and tool description languages such as Boutiques provide researchers with a foothold to tackle these problems using their own datasets, pipelines, and environments. Although these standards lower the barrier to adoption of HPC and cloud systems for neuroscience applications, they still require the consolidation of disparate domain-specific knowledge. We present Clowdr, a lightweight tool to launch experiments on HPC systems and clouds, capture rich execution records, and enable the accessible sharing of experimental summaries and results. Clowdr uniquely sits between web platforms and bare-metal applications for experiment management by preserving the flexibility of do-it-yourself solutions while providing a low barrier for developing, deploying and disseminating neuroscientific analysis.
💡 Research Summary
The paper introduces Clowdr, a lightweight, server‑less tool designed to bridge the gap between fully managed web portals and low‑level, do‑it‑yourself (DIY) scripts for running neuroimaging experiments on high‑performance computing (HPC) clusters and cloud platforms. Modern neuroscience increasingly relies on massive datasets and compute‑intensive pipelines, yet many researchers lack the expertise required to configure job schedulers, manage container images, and collect reproducibility‑critical metadata. Existing solutions fall into two camps: (1) web‑based portals such as OpenNeuro or CBRAIN, which are user‑friendly but often lock users into a specific platform and limit pipeline customization; and (2) DIY approaches that give full flexibility but demand deep knowledge of cluster administration, container orchestration, and provenance tracking. Clowdr aims to combine the best of both worlds by leveraging two community standards—BIDS (Brain Imaging Data Structure) for data organization and Boutiques for tool description—while providing an automated, low‑overhead execution environment.
Technical Architecture
Clowdr is implemented as a Python 3.9 command‑line interface (CLI) coupled with a background worker process. Users invoke the tool with a single command (clowdr launch) supplying a BIDS‑formatted dataset path, a Boutiques JSON descriptor for the analysis tool, and optional execution parameters (e.g., SLURM partition, AWS Batch queue). The CLI validates inputs using bids-validator and parses the Boutiques schema to generate a concrete execution script. The worker abstracts over multiple back‑ends: it can submit jobs to traditional HPC schedulers (SLURM, PBS, SGE) by generating appropriate batch scripts, or to cloud batch services (AWS Batch, Google Cloud Life Sciences) via their respective SDKs. For container execution, Clowdr supports both Docker and Singularity, automatically pulling the image defined in the Boutiques descriptor and recording its SHA‑256 digest.
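To make the descriptor-to-script step concrete, the following is a minimal sketch of how a Boutiques-style command-line template could be filled in and wrapped in a SLURM batch script. It is not Clowdr's actual code and does not use the Boutiques library: the descriptor is a heavily trimmed, hypothetical example, the function names render_command and render_slurm_script are invented for illustration, and container wrapping is left out.

```python
import shlex

# Hypothetical, heavily trimmed descriptor written in the style of a Boutiques
# JSON file. The tool name, inputs, and flag are made up for illustration.
descriptor = {
    "name": "example_tool",
    "command-line": "example_tool [INPUT] [OUTPUT] [SMOOTHING]",
    "inputs": [
        {"id": "input_dir", "value-key": "[INPUT]"},
        {"id": "output_dir", "value-key": "[OUTPUT]"},
        {"id": "smoothing", "value-key": "[SMOOTHING]", "command-line-flag": "--fwhm"},
    ],
}


def render_command(descriptor, invocation):
    """Substitute each value-key in the command-line template with its value."""
    command = descriptor["command-line"]
    for spec in descriptor["inputs"]:
        value = invocation.get(spec["id"])
        if value is None:
            # Inputs that were not supplied are simply dropped from the command.
            replacement = ""
        else:
            flag = spec.get("command-line-flag", "")
            # Quote the value so that paths with spaces survive the shell.
            replacement = f"{flag} {shlex.quote(str(value))}".strip()
        command = command.replace(spec["value-key"], replacement)
    return command.strip()


def render_slurm_script(descriptor, invocation, partition, hours):
    """Wrap the rendered command in a minimal SLURM batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={descriptor['name']}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={hours:02d}:00:00",
        "",
        render_command(descriptor, invocation),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    invocation = {"input_dir": "/data/bids", "output_dir": "/data/out", "smoothing": 6}
    print(render_slurm_script(descriptor, invocation, partition="general", hours=4))
```

Keeping command rendering separate from script rendering means the same rendered command could, in principle, be handed to a different back-end template (PBS, SGE, or a cloud batch job definition) without touching the descriptor handling.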
Execution Recording and Provenance
A central innovation is the automatic creation of a detailed “execution record” in JSON format once a job finishes. This record contains:
- Job identifiers, start/end timestamps, and wall‑clock duration.
- Resource usage (CPU cores, GPU devices, memory consumption).
- SHA‑256 hashes of all input and output files.
- The exact container image hash.
- The full set of Boutiques parameters actually passed to the tool.
- User‑defined metadata (project ID, researcher name, etc.).
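As a rough illustration of the fields listed above, a record of this kind could be assembled with the Python standard library alone. The key names and the build_record function below are assumptions made for this sketch, not Clowdr's actual schema, and resource usage is omitted because collecting it depends on the scheduler or cloud service.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large imaging files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(job_id, started, finished, inputs, outputs,
                 image_digest, parameters, metadata):
    """Assemble an execution record; every key name here is illustrative."""
    return {
        "job_id": job_id,
        "started": started.isoformat(),
        "finished": finished.isoformat(),
        "duration_seconds": (finished - started).total_seconds(),
        "inputs": {str(path): sha256_of(path) for path in inputs},
        "outputs": {str(path): sha256_of(path) for path in outputs},
        "container_image_digest": image_digest,
        "parameters": parameters,
        "metadata": metadata,
    }


if __name__ == "__main__":
    started = datetime.now(timezone.utc)
    record = build_record(
        job_id="example-job",            # placeholder identifier
        started=started,
        finished=started + timedelta(seconds=90),
        inputs=[__file__],               # hash this script as a stand-in input
        outputs=[],
        image_digest="sha256:<digest reported by the container runtime>",
        parameters={"smoothing": 6},
        metadata={"project": "demo"},
    )
    print(json.dumps(record, indent=2))
```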
These records enable rigorous reproducibility checks, facilitate error diagnosis, and serve as machine‑readable inputs for downstream meta‑analyses. The tool also produces a static HTML report that visualizes the execution record, provides download links for logs and outputs, and renders preview images (e.g., NIfTI slices) for quick inspection. Because the report is static, it can be hosted on any web server, GitHub Pages, or cloud storage without additional backend services.
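The appeal of a static report is that a single self-contained HTML file is enough. A minimal sketch, assuming the execution record is an ordinary JSON-serializable dictionary, might look like the following; the render_report function and the page layout are hypothetical and much simpler than a report with log links and image previews.

```python
import html
import json


def render_report(record):
    """Render a record as a standalone HTML page with one table row per field."""
    rows = "\n".join(
        f"<tr><th>{html.escape(str(key))}</th>"
        f"<td><pre>{html.escape(json.dumps(value, indent=2))}</pre></td></tr>"
        for key, value in record.items()
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Execution report</title></head>\n'
        f"<body><h1>Execution report</h1>\n<table>\n{rows}\n</table>\n</body></html>\n"
    )


if __name__ == "__main__":
    page = render_report({"job_id": "example-job", "parameters": {"smoothing": 6}})
    with open("report.html", "w", encoding="utf-8") as handle:
        handle.write(page)
```

Because the page has no server-side component, the resulting file can be copied as-is to any static host. Escaping every value with html.escape keeps user-supplied metadata from being interpreted as markup.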
Security and Access Control
Clowdr follows the principle of least privilege. In cloud environments it creates temporary IAM roles scoped only to the required S3 buckets and automatically revokes them after job completion. On HPC systems it operates under the user’s existing account, avoiding the need for extra authentication mechanisms.
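To illustrate what scoping to only the required buckets can mean in practice, here is a sketch that builds an AWS IAM policy document limited to one bucket prefix. The policy grammar (Version, Statement, Effect, Action, Resource, Condition) is standard IAM JSON; the function name, bucket, and prefix are hypothetical, and how Clowdr itself creates and revokes roles is not shown here.

```python
import json


def scoped_s3_policy(bucket, prefix):
    """Build an IAM policy document restricted to a single bucket prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only for keys under the given prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading inputs and writing outputs under the same prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix names.
    print(json.dumps(scoped_s3_policy("example-study-bucket", "sub-01"), indent=2))
```

A document like this would be attached to a short-lived role and removed once the job finishes, so that a leaked credential exposes at most one dataset prefix.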
Empirical Evaluation
Two case studies demonstrate Clowdr’s practicality. The first involved a large‑scale functional MRI preprocessing pipeline (using FSL and AFNI) run on a SLURM cluster for 200 participants; the entire workflow completed within 48 hours with minimal manual intervention. The second case migrated a combined structural and diffusion MRI pipeline to AWS Batch, achieving a cost reduction of roughly 30% compared with an on‑premises cluster while preserving full provenance via execution records. In both scenarios, the generated JSON records were directly reusable for reproducibility audits and for feeding into larger consortium‑level analyses.
Limitations and Future Work
Current support is limited to a handful of schedulers (SLURM, PBS, SGE) and cloud batch services (AWS Batch, Google Cloud Life Sciences). Multi‑node MPI workloads receive only basic resource allocation, and there is no built‑in network‑cost optimization for moving terabytes of imaging data. The execution‑record schema, while rich, is not yet fully aligned with broader community standards such as RO‑Crate or the Neuroimaging Data Model, necessitating conversion steps for interoperability. The authors propose a plugin architecture to add Kubernetes/Argo support, RO‑Crate export, and integration with cloud‑native file systems (e.g., Amazon FSx for Lustre) to address these gaps.
Conclusion
Clowdr offers a pragmatic middle ground: it preserves the flexibility of DIY scripting while dramatically lowering the barrier to entry for HPC and cloud‑based neuroimaging analyses. By automatically handling BIDS‑compliant data, Boutiques‑described tools, container execution, and comprehensive provenance capture, Clowdr enables researchers to focus on scientific questions rather than infrastructure details. The tool’s lightweight, server‑less design, coupled with its emphasis on reproducibility and easy result sharing, makes it a valuable addition to the neuroinformatics ecosystem.