Quantifying the implicit process flow abstraction in SBGN-PD diagrams with Bio-PEPA
For a long time, biologists have used visual representations of biochemical networks to gain a quick overview of important structural properties. Recently SBGN, the Systems Biology Graphical Notation, has been developed to standardise the way in which such graphical maps are drawn in order to facilitate the exchange of information. Its qualitative Process Diagrams (SBGN-PD) are based on an implicit Process Flow Abstraction (PFA) that can also be used to construct quantitative representations suitable for automated analysis of the system. Here we explicitly describe the PFA that underpins SBGN-PD and define attributes for SBGN-PD glyphs that make it possible to capture the quantitative details of a biochemical reaction network. We implemented SBGNtext2BioPEPA, a tool that demonstrates how such quantitative details can be used to automatically generate working Bio-PEPA code from a textual representation of SBGN-PD that we developed. Bio-PEPA is a process algebra that was designed for implementing quantitative models of concurrent biochemical reaction systems. We use this approach to compute the expected delay between input and output using deterministic and stochastic simulations of the MAPK signal transduction cascade. The scheme developed here is general and can be easily adapted to other output formalisms.
Research Summary
The paper addresses a long‑standing gap between the visual representation of biochemical networks in Systems Biology Graphical Notation Process Diagrams (SBGN‑PD) and the quantitative models required for automated analysis. While SBGN‑PD provides a standardized set of glyphs for drawing pathways, it implicitly relies on a “Process Flow Abstraction” (PFA) that treats reactions as transformations of quantitative flows (tokens) through processes. The authors first make this abstraction explicit by formally defining the PFA and by extending each SBGN‑PD glyph with a set of quantitative attributes—rate expressions, initial amounts, compartment identifiers, and units. These attributes are captured in a lightweight textual representation called SBGNtext, which mirrors the graphical diagram while embedding the necessary numerical information.
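The idea of attaching quantitative attributes to glyphs can be illustrated with a small data model. The following is a minimal sketch, not the authors' schema: the class names `EntityPool` and `Process`, their field names, and the helper `dangling_arcs` are all hypothetical, chosen only to mirror the attributes listed above (rate expression, initial amount, compartment, unit).

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class EntityPool:
    """A pool of indistinguishable entities (hypothetical stand-in for an entity glyph)."""
    name: str
    initial_amount: float
    compartment: str = "cell"      # assumed default, not from the paper
    unit: str = "molecules"        # assumed default, not from the paper


@dataclass
class Process:
    """A process that moves entities from consumed pools to produced pools."""
    name: str
    consumes: list[str]
    produces: list[str]
    modulators: list[str] = field(default_factory=list)
    rate_law: str = ""             # kinetic law kept as plain text in this sketch


def dangling_arcs(pools, processes):
    """Return (process, pool) pairs where an arc points at an undeclared pool."""
    known = {p.name for p in pools}
    return sorted(
        {(proc.name, ref)
         for proc in processes
         for ref in proc.consumes + proc.produces + proc.modulators
         if ref not in known}
    )


# A single enzyme-catalysed conversion S -> P, modulated by E (illustrative values).
pools = [EntityPool("S", 100), EntityPool("E", 10), EntityPool("P", 0)]
processes = [
    Process("phosphorylate", ["S"], ["P"], ["E"], "kcat * E * S / (Km + S)")
]
```

Under this view a diagram is quantitatively complete only when every arc connects a declared pool to a process that carries a rate law; a check such as `dangling_arcs(pools, processes)` is one simple way to catch missing declarations before any translation is attempted.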
To bridge the gap between SBGN‑PD and a formal quantitative language, the authors develop SBGNtext2BioPEPA, a conversion tool that parses SBGNtext and automatically generates a complete Bio‑PEPA model. Bio‑PEPA is a process‑algebraic formalism designed for concurrent biochemical systems; it models each species as a process (a "species component") and each reaction as an action type with an explicit kinetic law. The conversion pipeline proceeds in three stages: (1) parsing the SBGNtext to extract entities, processes, and arcs; (2) mapping the extracted elements and their quantitative attributes onto Bio‑PEPA syntax (species declarations, reaction definitions, kinetic laws, and compartment specifications); and (3) emitting a syntactically correct Bio‑PEPA file ready for simulation. Because all kinetic parameters are taken directly from the SBGN‑PD annotations, the user does not need to transcribe values by hand, which removes a major source of error and effort.
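The three stages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SBGNtext2BioPEPA implementation: the input line format (`pool ...` / `process ...`) is invented for this example and is not real SBGNtext, every process is assumed to follow mass-action kinetics, and the emitted text only approximates Bio‑PEPA concrete syntax (`kineticLawOf`, `fMA`, `<<`, `>>`, `(+)`, `<*>`), so it should be checked against the Bio‑PEPA tool documentation before use.

```python
def parse(text):
    """Stage 1: read a hypothetical line format (NOT real SBGNtext).

    pool NAME INITIAL_COUNT
    process NAME RATE_CONSTANT : REACTANTS -> PRODUCTS
    """
    pools, processes = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        kind, rest = line.split(None, 1)
        if kind == "pool":
            name, amount = rest.split()
            pools[name] = int(amount)
        elif kind == "process":
            head, arcs = rest.split(":")
            name, k = head.split()
            lhs, rhs = arcs.split("->")
            processes.append((name, float(k), lhs.split(), rhs.split()))
        else:
            raise ValueError(f"unknown declaration: {line!r}")
    return pools, processes


def map_roles(pools, processes):
    """Stage 2: work out each pool's role in each process."""
    roles = {name: [] for name in pools}
    for pool in pools:
        for name, _k, reactants, products in processes:
            if pool in reactants and pool in products:
                roles[pool].append(f"{name} (+)")   # unchanged: acts as a modifier
            elif pool in reactants:
                roles[pool].append(f"{name} <<")    # consumed
            elif pool in products:
                roles[pool].append(f"{name} >>")    # produced
    return roles


def emit(pools, processes, roles):
    """Stage 3: write Bio-PEPA-like text (approximate syntax, mass action assumed)."""
    lines = []
    for name, k, _reactants, _products in processes:
        lines.append(f"k_{name} = {k};")
        lines.append(f"kineticLawOf {name} : fMA(k_{name});")
    for pool, pool_roles in roles.items():
        if not pool_roles:
            raise ValueError(f"pool {pool} takes part in no process")
        lines.append(f"{pool} = {' + '.join(pool_roles)};")
    lines.append(" <*> ".join(f"{p}[{n}]" for p, n in pools.items()))
    return "\n".join(lines)


source = """
pool S 100
pool P 0
process r1 0.5 : S -> P
"""
pools, processes = parse(source)
model = emit(pools, processes, map_roles(pools, processes))
print(model)
```

Keeping the stages as separate functions means only `emit` would need to change to target a different output formalism, which matches the paper's claim that the scheme is adaptable.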
The methodology is validated on the mitogen‑activated protein kinase (MAPK) cascade, a canonical signaling pathway characterized by multi‑step phosphorylation, signal amplification, and time‑delay phenomena. The authors first construct an SBGN‑PD of the MAPK cascade, annotate it with kinetic constants and initial concentrations, and then translate it to Bio‑PEPA using their tool. Two complementary simulation experiments are performed: a deterministic analysis using ordinary differential equations (ODEs) derived from the Bio‑PEPA model, and a stochastic analysis employing the Gillespie algorithm as implemented in the Bio‑PEPA stochastic simulator. Both simulations reproduce the expected activation profile of MAPK, but the stochastic runs also reveal cell‑to‑cell variability in the timing of activation. By measuring the elapsed time between an upstream stimulus (e.g., growth factor binding) and the appearance of active MAPK, the authors demonstrate that the PFA‑based model can quantitatively predict signal propagation delays.
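The input-to-output delay measurement can be illustrated on a toy system. The sketch below is not the MAPK model and does not use the Bio‑PEPA tools: it assumes a hypothetical two-step cascade A -> B -> C with made-up mass-action constants, integrates the ODEs with a simple Euler scheme, and runs a hand-written Gillespie direct-method simulation. The delay is defined here, as an assumption, as the time at which the output C first reaches half of the initial input.

```python
import random

# Toy cascade A -> B -> C; all values are illustrative assumptions.
K1, K2 = 1.0, 1.0        # mass-action rate constants
N0 = 200                 # initial molecules of A (the "input")
THRESHOLD = N0 // 2      # output counts as "seen" once C reaches this level


def deterministic_delay(dt=1e-3, t_max=50.0):
    """Euler integration of the ODEs; returns the first time C >= THRESHOLD."""
    a, b, c, t = float(N0), 0.0, 0.0, 0.0
    while t < t_max:
        if c >= THRESHOLD:
            return t
        f1, f2 = K1 * a, K2 * b          # fluxes of the two steps
        a, b, c = a - f1 * dt, b + (f1 - f2) * dt, c + f2 * dt
        t += dt
    return None


def stochastic_delay(rng, t_max=50.0):
    """Gillespie direct method; returns the first time C >= THRESHOLD."""
    a, b, c, t = N0, 0, 0, 0.0
    while c < THRESHOLD:
        p1, p2 = K1 * a, K2 * b          # reaction propensities
        total = p1 + p2
        if total == 0 or t > t_max:
            return None
        t += rng.expovariate(total)      # waiting time to the next event
        if rng.random() * total < p1:
            a, b = a - 1, b + 1          # A -> B fires
        else:
            b, c = b - 1, c + 1          # B -> C fires
    return t


rng = random.Random(0)                   # seeded for repeatability
det = deterministic_delay()
runs = [stochastic_delay(rng) for _ in range(50)]
mean_delay = sum(runs) / len(runs)
print(f"deterministic delay: {det:.3f}")
print(f"stochastic delay: mean {mean_delay:.3f}, "
      f"min {min(runs):.3f}, max {max(runs):.3f}")
```

The deterministic run gives a single delay, whereas the spread between the minimum and maximum stochastic delays shows the run-to-run variability that only the stochastic treatment exposes.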
Beyond the case study, the generated Bio‑PEPA code is shown to be portable to other analysis environments such as PRISM (for probabilistic model checking) and MATLAB (for custom numerical experiments), highlighting the flexibility of the approach. The authors argue that their framework transforms SBGN‑PD from a purely illustrative tool into a “model‑first” interface that can feed directly into a variety of quantitative pipelines.
Key contributions of the work are:
- A formal specification of the implicit Process Flow Abstraction underlying SBGN‑PD, together with a metadata schema that enriches each glyph with quantitative information.
- The SBGNtext2BioPEPA conversion tool, which automates the generation of syntactically correct Bio‑PEPA models from annotated SBGN‑PD diagrams.
- An empirical validation on the MAPK cascade, showing that deterministic and stochastic simulations derived from the automatically generated models faithfully reproduce known biological dynamics and enable the calculation of signal‑delay metrics.
The authors conclude that the proposed scheme is generic and can be adapted to other formal representations (e.g., Kappa, BioNetGen) or integrated with machine‑learning‑based parameter estimation workflows. By providing a seamless bridge from visual pathway maps to executable quantitative models, the work paves the way for large‑scale, automated, and reproducible systems‑biology analyses.