In 2008, melamine in infant formula forced laboratories across three continents to verify a compound they had never monitored. Non-targeted analysis (NTA) using liquid or gas chromatography coupled to high-resolution mass spectrometry (LC/GC-HRMS) is designed for such cases. But when findings trigger regulatory action, reproducibility becomes an operational requirement: can an independent laboratory repeat the analysis and reach the same conclusion?
We assessed 103 tools (2004-2025) against six pillars drawn from FAIR and BP4NTA principles: laboratory validation (C1), data availability (C2), code availability (C3), standardised formats (C4), knowledge integration (C5), and portable implementation (C6). Health contributed 51 tools, Pharma 31, and Chemistry 21.
Nearly nine in ten tools shared data (C2, 90/103, 87%). Fewer than four in ten supported portable implementation (C6, 40/103, 39%). Validation and portability rarely appeared together (C1+C6, 18/103, 17%). Over twenty-one years, openness climbed from 56% to 86% while operability fell from 55% to 43%. No tool addressed food safety.
Journal data-sharing policies increased what authors share but not what reviewers can run. Tools became easier to find but harder to execute. Strengthening C1, C4 and C6 would turn documented artefacts into workflows that external laboratories can replay.
In 2008, melamine contamination in Chinese infant formula forced laboratories across three continents to detect a compound they had never monitored [1][2][3]. Similar incidents have recurred. Between 1990 and 2009, diethylene glycol in paediatric paracetamol caused outbreaks of kidney failure [4]. Industrial dyes such as Sudan Red appeared in spices [5,6]. Nitrosamine impurities emerged in widely prescribed pharmaceuticals [7,8]. Per- and polyfluoroalkyl substances (PFAS) reached drinking water supplies [9,10]. In each case, authorities had to act on compounds that were never on any predefined target list [11,12]. The analytical identification became the basis for product recalls, import detentions and public-health advisories (Figure 1), and it had to withstand regulatory audit and legal challenge [13].
These incidents share one problem: laboratories must identify compounds they never expected to find, often within days, and their findings must hold up across borders. When findings trigger regulatory action, reproducibility shifts from scientific ideal to operational requirement. The question becomes whether an independent laboratory can verify the claim without privileged access to the originating site. Non-targeted analysis (NTA) platforms must therefore deliver architectural guarantees, not just analytical performance, because regulatory compliance and public health depend on them. Reproducibility must be built into the infrastructure, not retrofitted after findings emerge.
Understanding why requires distinguishing targeted from non-targeted approaches. Classical targeted analysis works when the contaminant is known and available as a reference standard [14]. It fails when the species is novel, intentionally disguised or absent from surveillance lists. This gap has driven the rise of NTA. LC/GC-HRMS acquires full-scan data across broad mass ranges, prioritises unexpected features and allows stored data to be interrogated retrospectively for compounds not anticipated at acquisition [15,16]. LC-HRMS is suited to polar and thermolabile compounds and GC-HRMS to volatile and semi-volatile species; both share the same non-targeted acquisition logic. LC/GC-HRMS NTA platforms now inform decisions in pharmaceutical quality control, clinical diagnostics and environmental monitoring precisely because action must be taken on emerging or previously unrecognised chemical signals [17,18].
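Retrospective interrogation can be illustrated by screening a stored feature list against a suspect list compiled after acquisition. The sketch below is a minimal illustration, assuming features have already been extracted as (m/z, retention time, intensity) values, that only [M+H]+ ions are considered and that a 5 ppm tolerance is acceptable. The helper names and feature values are hypothetical and are not taken from any tool assessed in this study; melamine (C3H6N6) is used because its monoisotopic mass is about 126.0654 Da.

```python
"""Minimal sketch of retrospective suspect screening over stored full-scan features.

Assumptions (hypothetical, for illustration only): features were extracted earlier
from archived full-scan data, only [M+H]+ adducts are considered, and a 5 ppm
mass tolerance is acceptable.
"""
from dataclasses import dataclass

PROTON_MASS = 1.007276  # mass of a proton in Da


@dataclass(frozen=True)
class Feature:
    mz: float
    rt_min: float
    intensity: float


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def screen_suspects(features, suspects, tol_ppm=5.0):
    """Return (name, feature, ppm error) for every stored feature matching a suspect's [M+H]+ ion."""
    hits = []
    for name, monoisotopic_mass in suspects.items():
        target_mz = monoisotopic_mass + PROTON_MASS
        for f in features:
            err = ppm_error(f.mz, target_mz)
            if abs(err) <= tol_ppm:
                hits.append((name, f, round(err, 2)))
    return hits


# Toy features "acquired" before anyone thought to look for melamine.
archived = [
    Feature(mz=127.0729, rt_min=1.8, intensity=4.2e5),
    Feature(mz=195.0877, rt_min=3.1, intensity=1.1e6),
]
# Suspect list compiled after acquisition (monoisotopic masses in Da).
suspects = {"melamine": 126.0654}

for name, feat, err in screen_suspects(archived, suspects):
    print(name, feat.mz, err)
```

Because the raw full-scan data were archived, adding a compound to `suspects` later is enough to re-examine old acquisitions without re-running the instrument.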
When NTA findings trigger regulatory action, depositing data and scripts in repositories is necessary but not sufficient [19,20]. Reproducibility must instead mean that an independent group, potentially in a different jurisdiction and running on a different compute stack, can begin from the declared inputs, execute the declared workflow and reach concordant conclusions within agreed tolerances [21,22]. The question shifts from “did you find something?” to “can someone else verify what you found?” [23]. Reproducibility thus becomes a property of the analytical architecture [24,25].
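Reaching concordant conclusions within agreed tolerances can be sketched as a comparison of two laboratories' identification lists. This is a minimal sketch, assuming each laboratory reports identifications keyed by InChIKey together with the measured m/z of the annotated ion, and that a 5 ppm inter-laboratory tolerance has been agreed. The data structure, function name and threshold are hypothetical; real acceptance criteria would come from a validation protocol.

```python
"""Sketch: do two laboratories reach concordant conclusions within agreed tolerances?

Assumptions (hypothetical): calls are keyed by InChIKey with the measured m/z of the
annotated ion, and the agreed inter-laboratory mass tolerance is 5 ppm.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    inchikey: str
    mz: float


def compare_calls(origin, replicate, tol_ppm=5.0):
    a = {c.inchikey: c.mz for c in origin}
    b = {c.inchikey: c.mz for c in replicate}
    shared = a.keys() & b.keys()
    # Shared identities whose m/z values differ by more than the agreed tolerance.
    out_of_tolerance = sorted(
        k for k in shared if abs(a[k] - b[k]) / a[k] * 1e6 > tol_ppm
    )
    report = {
        "agreed": sorted(shared - set(out_of_tolerance)),
        "only_origin": sorted(a.keys() - b.keys()),
        "only_replicate": sorted(b.keys() - a.keys()),
        "out_of_tolerance": out_of_tolerance,
    }
    # Concordant only if every call is reproduced and all shared calls agree.
    report["concordant"] = not (
        report["only_origin"] or report["only_replicate"] or out_of_tolerance
    )
    return report


MELAMINE = "JDSHMPZPIAZGSV-UHFFFAOYSA-N"
origin_lab = [Call(MELAMINE, 127.0729)]
replicate_lab = [Call(MELAMINE, 127.0725)]  # about 3 ppm apart
print(compare_calls(origin_lab, replicate_lab))
```

The report separates missing calls from out-of-tolerance calls, so an assessor can see whether disagreement comes from identification or from measurement.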
That property is determined by design decisions that either enable portability or trap findings at the site where they were produced [26][27][28]. Open file formats reduce vendor lock-in: standards such as mzML and mzTab allow different tools to read the same data, while stable identifiers such as InChIKey ensure consistent molecular referencing [29].
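The value of a stable identifier is that two tools can compare molecules by string rather than by vendor-specific record. A standard InChIKey has 27 characters: a 14-character block hashing the molecular skeleton, a hyphen, an 8-character block hashing further layers (such as stereochemistry and isotopes) followed by a standard/non-standard flag and a version letter, a hyphen, and a protonation character. The sketch below checks that format and compares keys either exactly or at skeleton level; the helper names are hypothetical and are not part of any InChI library.

```python
"""Sketch: using InChIKey structure to check whether two tools refer to the same molecule.

Helper names are hypothetical; only the documented 14-10-1 block layout is relied on.
"""
import re

# 14 skeleton letters, 8 letters + standard/non-standard flag + version, protonation flag.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{8}[SN][A-Z]-[A-Z]")


def is_inchikey(key: str) -> bool:
    return INCHIKEY_RE.fullmatch(key) is not None


def same_molecule(key_a: str, key_b: str, skeleton_only: bool = False) -> bool:
    if not (is_inchikey(key_a) and is_inchikey(key_b)):
        raise ValueError("malformed InChIKey")
    if skeleton_only:
        # First block only: ignores the layers hashed in the second and third blocks.
        return key_a[:14] == key_b[:14]
    return key_a == key_b


melamine = "JDSHMPZPIAZGSV-UHFFFAOYSA-N"
variant = "JDSHMPZPIAZGSV-UHFFFAOYSA-O"  # same key with a different protonation character
print(same_molecule(melamine, variant), same_molecule(melamine, variant, skeleton_only=True))
```

Exact comparison is the safe default for reporting; skeleton-level comparison is useful when tools disagree only on charge state or stereochemistry.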
Provenance frameworks record analytical lineage. The W3C PROV ontology and packaging conventions such as Research Object Crate (RO-Crate) document how data, parameters and software environments produce a reported call [30,31]. Workflow engines such as Nextflow and Snakemake decouple pipeline logic from the execution environment; Galaxy provides a web-based layer with similar portability guarantees [32][33][34][35].
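To show what packaged provenance can look like, the sketch below builds a minimal RO-Crate 1.1 metadata document for one NTA run. The `@context`, the metadata descriptor and the root dataset follow the RO-Crate 1.1 layout, and a schema.org `CreateAction` links inputs (`object`), outputs (`result`) and software (`instrument`), playing a role comparable to a PROV activity. File names, the software entry and the run identifier are hypothetical, and a publishable crate would carry further required metadata (for example licence and publication date).

```python
"""Sketch: a minimal RO-Crate 1.1 metadata document recording one NTA run.

File names, software name/version and identifiers are hypothetical placeholders.
"""
import json


def build_crate(raw_file, params_file, report_file, software_name, software_version):
    graph = [
        {  # Metadata descriptor: says this JSON describes the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {  # Root dataset listing every packaged file.
            "@id": "./",
            "@type": "Dataset",
            "name": "NTA screening run (illustrative)",
            "hasPart": [{"@id": raw_file}, {"@id": params_file}, {"@id": report_file}],
        },
        {"@id": raw_file, "@type": "File"},
        {"@id": params_file, "@type": "File"},
        {"@id": report_file, "@type": "File"},
        {
            "@id": "#software",
            "@type": "SoftwareApplication",
            "name": software_name,
            "version": software_version,
        },
        {  # The run itself: which inputs and software produced which outputs.
            "@id": "#run-1",
            "@type": "CreateAction",
            "instrument": {"@id": "#software"},
            "object": [{"@id": raw_file}, {"@id": params_file}],
            "result": [{"@id": report_file}],
        },
    ]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = build_crate("sample01.mzML", "params.yaml", "suspect_hits.csv",
                    "example-nta-pipeline", "1.0.0")
print(json.dumps(crate, indent=2))
```

Writing this JSON as `ro-crate-metadata.json` alongside the listed files lets an assessor trace a reported call back to its raw input and software version without contacting the originating site.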
These systems treat workflows as governed artefacts [36]. Each design choice determines whether an external assessor can reconstruct the path from raw signal to reported call without privileged access [37,38].
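One way an external assessor can confirm they are replaying the declared path is to recompute a content fingerprint of the run. The scheme below is a hypothetical illustration, not a standard: it hashes each raw input with SHA-256, combines those hashes with the parameter set and software versions in a canonical JSON serialisation, and hashes the result. Matching fingerprints show that the same declared inputs and configuration were used; they say nothing about whether the conclusions agree.

```python
"""Sketch: a run fingerprint that an external assessor can recompute.

Hypothetical scheme: SHA-256 over canonical JSON of input hashes, parameters and
software versions. Input bytes below are stand-ins, not real mzML content.
"""
import hashlib
import json


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_fingerprint(inputs: dict, params: dict, software: dict) -> str:
    manifest = {
        # Hash each input's bytes; keys are logical file names.
        "inputs": {name: sha256_bytes(blob) for name, blob in inputs.items()},
        "params": params,
        "software": software,
    }
    # sort_keys and compact separators give one canonical serialisation to hash.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return sha256_bytes(canonical.encode("utf-8"))


raw = {"sample01.mzML": b"<mzML>stand-in bytes</mzML>"}
declared = run_fingerprint(raw, {"tol_ppm": 5.0, "adduct": "[M+H]+"}, {"pipeline": "1.0.0"})
# The assessor rebuilds the manifest independently; key order does not matter.
replayed = run_fingerprint(raw, {"adduct": "[M+H]+", "tol_ppm": 5.0}, {"pipeline": "1.0.0"})
print(declared == replayed)
```

Any change to a raw file, a parameter or a software version changes the fingerprint, which makes silent deviations from the declared workflow visible.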
Auditing current practice reveals what is already hardened and what remains aspirational. This study treats LC/GC-HRMS NTA platforms as cheminformatics infrastructure and asks which architectural guarantees are already in place. We assess six pillars (validation, data availability, algorithmic transparency, standardised formats, knowledge integration and portable implementation) across the Health, Pharma and Chemistry domains [32][33][34][35][39][40][41][42][43][44][45][46][47]. Together, these pillars express regulatory-grade reproducibility: reproducibility that can withstand independent audit, reuse and legal scrutiny across organisational boundaries.
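The pillar percentages reported for this kind of audit reduce to simple counting. The sketch below computes per-pillar coverage and joint coverage (for example C1+C6) when each tool is represented by the set of pillars it satisfies. The tool records are toy placeholders, not the study's data; applied to real counts, the same rounding gives 90/103 as 87% and 18/103 as 17%.

```python
"""Sketch: per-pillar and joint coverage across assessed tools.

Tool records are toy placeholders, not the study's data.
"""
PILLARS = ["C1", "C2", "C3", "C4", "C5", "C6"]


def coverage(tools):
    """Percentage of tools satisfying each pillar, rounded to whole percent."""
    n = len(tools)
    return {p: round(100 * sum(p in t for t in tools) / n) for p in PILLARS}


def joint_coverage(tools, *pillars):
    """Count and percentage of tools satisfying all of the given pillars at once."""
    n = len(tools)
    hits = sum(all(p in t for p in pillars) for t in tools)
    return hits, round(100 * hits / n)


toy_tools = [
    {"C1", "C2", "C3", "C6"},
    {"C2", "C3"},
    {"C2", "C4", "C6"},
    {"C2"},
]
print(coverage(toy_tools))
print(joint_coverage(toy_tools, "C1", "C6"))
```

Joint coverage is reported separately because high coverage of each pillar on its own does not imply that the same tools satisfy both.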
Food safety represents a critical yet underserved application area. Incidents such as melamine in infant formula and Sudan dyes in spices required rapid cross-border verification, exactly the scenario that demands portable, validated NTA workflows. Whether current tools address this need, or whether the field’s architectural trajectory actively precludes food safety applications, remains unexamined. This study provides that examination.
These