PREFER: An Ontology for the PREcision FERmentation Community
Precision fermentation relies on microbial cell factories to produce sustainable food, pharmaceuticals, chemicals, and biofuels. Specialized laboratories such as biofoundries are advancing these processes using high-throughput bioreactor platforms, which generate vast datasets. However, the lack of community standards limits data accessibility and interoperability, preventing integration across platforms. To address this, we introduce PREFER, an open-source ontology designed to establish a unified standard for bioprocess data. Built in alignment with the widely adopted Basic Formal Ontology (BFO) and connecting with several other community ontologies, PREFER ensures consistency and cross-domain compatibility and covers the whole precision fermentation process. Integrating PREFER into high-throughput bioprocess development workflows enables structured metadata that supports automated cross-platform execution and high-fidelity data capture. Furthermore, PREFER’s standardization has the potential to bridge disparate data silos, generating machine-actionable datasets critical for training predictive, robust machine learning models in synthetic biology. This work provides the foundation for scalable, interoperable bioprocess systems and supports the transition toward more data-driven bioproduction.
💡 Research Summary
The paper addresses a critical bottleneck in the emerging field of precision fermentation: the lack of standardized, interoperable data formats for the massive datasets generated by high‑throughput biofoundries. While these facilities can run thousands of parallel bioreactor experiments, the resulting metadata (covering strain genetics, media composition, process parameters, and product analytics) are typically stored in heterogeneous spreadsheets, proprietary lab‑information systems, or ad‑hoc JSON files. This fragmentation hampers data reuse and reproducibility, and it prevents the seamless integration of datasets across laboratories and platforms, which is essential for building robust predictive models in synthetic biology.
To solve this problem, the authors introduce PREFER (Precision Fermentation Ontology), an open‑source, community‑driven ontology that provides a unified semantic framework for the entire precision fermentation workflow. PREFER is built on the Basic Formal Ontology (BFO), a widely adopted upper‑level ontology that distinguishes between continuants (entities that persist through time, such as strains, media, reactors) and occurrents (processes that unfold over time, such as inoculation, cultivation, harvesting). By anchoring all domain concepts to BFO, PREFER ensures logical consistency and facilitates cross‑domain reasoning.
PREFER does not reinvent the wheel; instead, it reuses and links to existing community ontologies. For example, chemical entities in media are drawn from CHEBI, experimental protocols from OBI, environmental contexts from ENVO, and provenance information from PROV‑O. This modular linking strategy eliminates redundancy, enables automatic alignment with external databases, and supports semantic queries that span multiple domains.
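As a rough illustration of what linking to external ontologies means in practice, the sketch below expands compact identifiers (CURIEs) into full IRIs. The OBO Foundry and PROV‑O namespace bases are the published ones; the `prefer` base IRI and the helper name `expand_curie` are assumptions made for this example and are not taken from the paper.

```python
# Namespace bases for the ontologies named in the text.
# The "prefer" entry is a placeholder assumption, not PREFER's real namespace.
PREFIXES = {
    "BFO": "http://purl.obolibrary.org/obo/BFO_",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "OBI": "http://purl.obolibrary.org/obo/OBI_",
    "ENVO": "http://purl.obolibrary.org/obo/ENVO_",
    "prov": "http://www.w3.org/ns/prov#",
    "prefer": "https://example.org/prefer#",  # hypothetical
}


def expand_curie(curie: str) -> str:
    """Turn a compact identifier such as 'CHEBI:17234' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    for curie in ("CHEBI:17234", "prov:Activity", "prefer:FedBatchFeeding"):
        print(curie, "->", expand_curie(curie))
```

Because every reused term resolves to an IRI owned by its source ontology, a media component annotated this way can be matched against any other dataset that uses the same ChEBI identifier.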
The ontology is organized into four interrelated modules:
- Entity Module – defines material entities such as microbial chassis, plasmid constructs, media components, and reactor hardware.
- Process Module – captures the temporal sequence of bioprocess operations (e.g., inoculation, fed‑batch feeding, downstream purification) as BFO occurrents.
- Attribute & Measurement Module – models quantitative qualities (temperature, pH, dissolved oxygen, titer, impurity profile) as ‘qualities’ linked to measurement datums.
- Provenance Module – records experiment design, execution logs, and data source information using PROV‑O, thereby guaranteeing traceability and reproducibility.
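The four modules above can be pictured as subtrees hanging off upper-level classes. The following is a minimal sketch under stated assumptions: the BFO identifiers for material entity, process, and quality are the standard ones, but the choice of `prov:Activity` as the provenance root, the term names, and the `https://example.org/prefer#` namespace are hypothetical and only serve the illustration.

```python
BFO = "http://purl.obolibrary.org/obo/BFO_"
PROV = "http://www.w3.org/ns/prov#"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Upper-level parent assumed for each module.
MODULE_ROOTS = {
    "entity": BFO + "0000040",        # BFO material entity (a continuant)
    "process": BFO + "0000015",       # BFO process (an occurrent)
    "attribute": BFO + "0000019",     # BFO quality
    "provenance": PROV + "Activity",  # assumption: provenance rooted in PROV-O
}

# Hypothetical term names, one per module, for illustration only.
TERMS = {
    "MicrobialChassis": "entity",
    "FedBatchFeeding": "process",
    "DissolvedOxygen": "attribute",
    "ExperimentRun": "provenance",
}


def subclass_axioms(terms, base="https://example.org/prefer#"):
    """Yield one N-Triples rdfs:subClassOf statement per term."""
    for name, module in terms.items():
        yield f"<{base}{name}> <{RDFS_SUBCLASS_OF}> <{MODULE_ROOTS[module]}> ."


if __name__ == "__main__":
    for line in subclass_axioms(TERMS):
        print(line)
```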
To demonstrate utility, the authors integrated PREFER into a real biofoundry pipeline. Raw CSV metadata from 2,400 bioreactor runs were transformed into RDF triples using a lightweight conversion script. Researchers then executed SPARQL queries to retrieve all experiments that matched a specific genotype‑media‑temperature combination, instantly obtaining associated yield and quality metrics. This semantic layer eliminated manual data wrangling and enabled rapid hypothesis testing.
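The summary does not show the conversion script or the queries, so the following is only a sketch of the general idea using the Python standard library. The column names, property names, sample rows, and namespace are all invented for illustration; the SPARQL string shows the shape such a genotype‑media‑temperature query might take and is not executed here.

```python
import csv
import io

BASE = "https://example.org/prefer#"  # hypothetical namespace
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Invented sample metadata; real biofoundry exports will look different.
SAMPLE_CSV = """run_id,strain,medium,temperature_c,titer_g_per_l
run001,strainX,mediumZ,30,4.2
run002,strainX,mediumZ,37,3.1
run003,strainY,mediumZ,30,2.8
"""

# Illustrative query text only (property names are assumptions).
EXAMPLE_SPARQL = """
PREFIX prefer: <https://example.org/prefer#>
SELECT ?run ?titer WHERE {
  ?run prefer:usesStrain prefer:strainX ;
       prefer:usesMedium prefer:mediumZ ;
       prefer:temperatureCelsius ?t ;
       prefer:titerGramsPerLitre ?titer .
  FILTER(?t = 30)
}
"""


def row_to_triples(row):
    """Map one CSV row to (subject, predicate, object) N-Triples terms."""
    run = f"<{BASE}{row['run_id']}>"
    temp = row["temperature_c"]
    titer = row["titer_g_per_l"]
    return [
        (run, f"<{BASE}usesStrain>", f"<{BASE}{row['strain']}>"),
        (run, f"<{BASE}usesMedium>", f"<{BASE}{row['medium']}>"),
        (run, f"<{BASE}temperatureCelsius>", f'"{temp}"^^<{XSD_DECIMAL}>'),
        (run, f"<{BASE}titerGramsPerLitre>", f'"{titer}"^^<{XSD_DECIMAL}>'),
    ]


def csv_to_triples(text):
    """Convert CSV text into a flat list of triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        triples.extend(row_to_triples(row))
    return triples


def matching_runs(triples, strain, medium):
    """Return the runs that use both the given strain and medium."""
    wanted = {
        (f"<{BASE}usesStrain>", f"<{BASE}{strain}>"),
        (f"<{BASE}usesMedium>", f"<{BASE}{medium}>"),
    }
    by_run = {}
    for s, p, o in triples:
        by_run.setdefault(s, set()).add((p, o))
    return sorted(run for run, facts in by_run.items() if wanted <= facts)


if __name__ == "__main__":
    triples = csv_to_triples(SAMPLE_CSV)
    print("\n".join(f"{s} {p} {o} ." for s, p, o in triples))
    print(matching_runs(triples, "strainX", "mediumZ"))
```

In a real deployment the triples would be loaded into a triple store and queried with SPARQL; the in-memory `matching_runs` function merely stands in for that step so the example stays dependency-free.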
Beyond data retrieval, the authors evaluated the impact of ontology‑structured metadata on machine‑learning (ML) model performance. They trained two predictive models for product titer: (a) a baseline model using traditional flat CSV features, and (b) a graph‑based model that ingested the RDF graph generated by PREFER. The graph‑based approach, which could directly exploit hierarchical relationships (e.g., “strain X carries promoter Y” and “media Z contains carbon source A”), achieved a 12% higher R² and an 8% lower mean absolute error than the baseline. This result underscores how a well‑designed ontology can enrich feature representation and improve the robustness of data‑driven bioprocess optimization.
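For readers less familiar with the two metrics, the sketch below shows how R² and mean absolute error are conventionally computed from true and predicted titers. It uses toy numbers, not the paper's data, and it does not reproduce either model. Note that the summary does not say whether the reported 12% gain in R² is relative or absolute.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy titers (g/L), invented for illustration.
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.1, 1.9, 3.2, 3.7]
    print("R^2:", r_squared(y_true, y_pred))
    print("MAE:", mean_absolute_error(y_true, y_pred))
```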
The discussion acknowledges several challenges. First, adoption requires a cultural shift; scientists and engineers must become comfortable with RDF/OWL tools and SPARQL, which may entail training and support. Second, while PREFER covers the generic precision‑fermentation lifecycle, niche applications (e.g., high‑pressure gas fermentation, large‑scale biofuel production) may need specialized sub‑ontologies. Third, ontology maintenance—versioning, community contributions, and alignment with evolving standards—must be sustained to avoid obsolescence. The authors propose a community‑governance model hosted on GitHub, regular workshops for onboarding, and a roadmap for extending PREFER with domain‑specific extensions.
In conclusion, PREFER offers a comprehensive, BFO‑aligned semantic infrastructure that standardizes metadata across the full spectrum of precision fermentation. By enabling automated, cross‑platform data capture, facilitating reproducible provenance tracking, and providing machine‑actionable knowledge graphs, PREFER lays the groundwork for scalable, data‑driven bioproduction. Future work will focus on extending the ontology to other synthetic‑biology domains (e.g., DNA synthesis, protein design), integrating with international standards bodies, and fostering a global ecosystem of interoperable bioprocess datasets that can accelerate the transition to sustainable, bio‑based manufacturing.