A Data-Enhanced Agent-Based Model for Simulating 3D Cancer Spheroid Growth: Integrating Metabolism and Mechanics
📝 Original Info
- Title: A Data-Enhanced Agent-Based Model for Simulating 3D Cancer Spheroid Growth: Integrating Metabolism and Mechanics
- ArXiv ID: 2512.15361
- Date: 2025-12-17
- Authors: Pedro Garcia-Gomez, Paula Guerrero-Lopez, Silvia Hervas-Raluy, Jose Manuel Garcia-Aznar
📝 Abstract
Cancer research has shifted from a purely gene-centric view to a more holistic understanding that recognizes the critical role of the tumour microenvironment, where mechanics and metabolism are key drivers of disease progression. However, the intricate interplay between these multifactorial mechanisms remains poorly understood. To address this gap, we present an agent-based computational model (ABM) that integrates tumour metabolism and mechanics to study 3D cancer spheroid growth. Our approach unifies the metabolism and mechanical aspects of tumour development within an integral model for cancer spheroid formation and growth. In addition to that, we performed a computational calibration of the parameters and tested the model versatility to reproduce different cellular behaviours. Our model reproduced qualitatively and quantitatively the experimental results of spheroid growth obtained in the lab and also allowed to discern different dynamics that cancer cells can present under the same conditions, providing insight into the potential factors contributing to the variability in the size of spheroids. Furthermore, it also showed its adaptability to reproduce diferent cell lines and behaviours by tuning its parameters. This study highlights the significant potential and versatility of integrative modelling approaches in the field of cancer research, not only as a tool to complement in vitro studies, but also as independent tools to derive conclusions from the physical reality.📄 Full Content
Cells obtain the energy needed for their physiological functions through a series chemical reactions encompassed in what is known as energy metabolism. These reactions produce adenosine tri-phospate (ATP), the main energy source of the cell. The mode of ATP production varies depending on the cell type and the micro-environmental conditions. Under oxygen-rich conditions, cells undergo aerobic metabolism and generate substantial amounts of ATP via oxidative phosphorylation (OXPHOS). In contrast, when oxygen availability is limited, oxidative phosphorylation cannot proceed, and ATP is produced predominantly through glycolysis in an anaerobic regime. Although less efficient in terms of energy yield, anaerobic metabolism is faster than its aerobic counterpart. Regular healthy cells tend to rely on the oxydative phosporilation as their main way to produce energy, as it is the most efficient set of reactions. However, cancer cells have been observed to switch to the anaerobic regime even when there is oxygen available in the microenvironment. This phenomena is know as the Walburg effect [8] [9] . The precise reasons underlying this metabolic reprogramming remain uncertain. Initially, it was hypothesized that this effect occurs due to a permanent malfunctioning of the oxidative phosphorylation [10]. However, more recent theories suggest that this metabolic shift may confer a selective advantage to cancer cells [11]. For instance, the anaerobic pathway produce lactate which acidifies the surrounding microenvironment. While normal cells are susceptible to damage and death under acidic conditions, cancer cells appear resilient, potentially giving them an advantage for proliferation [12].
The metabolism is far from being the only aspect apart from genetics that can affect cancer cells. The microenvironment in which cells reside also plays a pivotal role in regulating cancer development, both directly and indirectly. Among its various components, the physical and mechanical properties of the extracellular matrix (ECM) are particularly influential. Features such as interstitial fluid pressure and tissue architecture have been shown to critically affect tumour growth, cellular behaviour, and metastatic potential [4]. These mechanical cues modulate not only the accessibility of nutrients and signaling molecules but also the ability of cancer cells to migrate, proliferate, and adapt to their surroundings, underscoring the complex interplay between the microenvironment and tumour progression [13]. One critical mechanical property of the ECM is its density, which can modulate various aspects of cancer cell physiology. High-density ECMs are associated with reduced cell motility, thereby impeding metastatic dissemination, which depends on the ability of cancer cells to migrate through the surrounding tissue [14,15]. Moreover, ECM density affects the morphology and compactness of tumour spheroids [16]. Denser matrices tend to promote more compact spheroid formation, which in turn limits the diffusion of essential nutrients and oxygen into the spheroid core, potentially resulting in hypoxic conditions and altered cellular behaviour [17,18]. These biophysical constraints underscore the integral role of the ECM in shaping cancer cell dynamics beyond purely genetic factors.
In recent years, computational models have emerged as a valuable tool for investigating the biological and physical mechanisms underlying various biological processes. In the context of cancer computational modelling, there are different approaches that normally can be used, distinguishing continuous models [19,20,21] and cell-based or agent-based models [22,23]. Continuous models tend to be faster and more computationally efficient, as they do not simulate individual cells but the homogenised properties of the tissue formed by the cells. Agent-based models, meanwhile, are usually more computationally expensive but they offer a more direct translation to biological observations. In these types of models, single cells are being simulated and their behaviour can be programmed and controlled individually. This allows single-cell differentiation and more spatial heterogeneity than continuous models.
In silico models offer notable advantages in terms of speed and flexibility, allowing for rapid and systematic variation of experimental conditions that may be difficult or time-consuming to reproduce using in vitro methods. However, computational tools are not yet sufficient as stand-alone research methods. To enhance their reliability and relevance, they are most effective when used in combination with in vitro or in vivo experiment data, which help to feed the model parameters and validate simulation outcomes.
Computational studies focused exclusively on a single regulatory mechanism of cancer are inherently limited, as they fail to account for the collective and interconnected nature of the factors governing tumour growth. In this work, we present an agent-based computational model that integrates both tumour metabolism and mechanics, through the consideration of the regulatory mechanisms due to ECM mechanics, nutrients and oxygen availability. Incorporating these two effects within a unified framework presents several challenges, most notably the presence of numerous unknown parameters that require careful calibration with reliable experimental data for validating model predictions. To address this, we performed in vitro experiments using 3D microfluidic devices to obtain tumour spheroids growth, providing quantitative data for calibrating the model. To estimate the unknown model parameters from experimental data, we employed a Bayesian inference strategy [19,24]. This non-deterministic approach systematically explores the parameter space and compares model predictions with experimental observations to infer the most probable parameter values. Given the high computational cost associated with simulating the agent-based model, we utilized a surrogate modelling approach to enable efficient exploration of the parameter landscape. Specifically, we implemented a Gaussian process model as the surrogate model, allowing for an accurate approximation of the model output with significantly reduced computational demand.
The integration of in vitro and in silico approaches offers a powerful framework for studying complex biological systems such as tumour progression. In silico models enable hypotheses testing and exploration of different conditions and meanwhile in vitro experiments provide physiologically relevant data. By combining both strategies, it is possible to not only improve model accuracy through data-enhanced calibration, but also to gain deeper insights and underlying mechanisms present in the in vitro experiments.
The presentation of this manuscript is organized as follows. In Section 2, we first present the experimental methodology carried out to asses spheroid growth patterns and provide the data necessary for the model calibration (Section 2.1). We then provide a detailed description of the mechanical (Section 2.3.1) and metabolic (Section 2.3.2) components of the model, as well as the statistical framework employed for calibration, including model order reduction, sensitivity analysis and Bayesian infernece techniques (Section 2.4). The results and the predictive capabilities of the model, as well as the exploration of new scenarios, are presented and discussed in Section 3 . Finally, in Section 4, we summarize the key conclussions of this study and propose potential directions for future research.
We conducted a series of in vitro experiments using the A549 cell line, which is derived from a human lung carcinoma. Individual A549 cells were cultured in 3D microfluidic devices embedded in collagen gels of varying densities to investigate their growth behaviour and spheroid formation. Images of the microfluidic devices were captured daily for seven days following cell seeding. These images were subsequently analysed to quantify the size of the resulting spheroids.
The microfluidic devices used in this study are fabricated from polydimehtyl-siloxane (PDMS), a biocompatible and transparent material, following the protocol described by Shin et al [25]. The design includes a central chamber, where a collagen hydrogel mimicking the tissue matrix and containing the cells is confined, and two lateral channels through which the nutrients are introduced, as shown in Figure 1 a). The geometry is patterned onto a silicon wafer using an SU-8 master mould and then replicated in PDMS. The PDMS was prepared by mixing the base and curing agent at a 10:1 weight ratio, followed by curing at 80°C overnight. Once cured, the devices were trimmed and punched to create access ports for the channels. Finally, the PDMS structures were bonded to the glass bottoms of 35 mm Petri dishes via plasma treatment, and subsequently coated with poly-D-lysine (PDL) to enhance collagen matrix adhesion within the device.
The lung adenocarcinoma cell line A549 was cultured with Dulbecco’s modified Eagle’s medium at 4.5 g/L glucose and supplemented with 10% fetal bovine serum and 1% of penicillin/streptomicin. Cells were incubated at 37 °C with 5% CO 2 until 80% confluence was reached for use in the experiment. When optimal confluence was reached, cells were trypsinised, centrifuged (1200 rpm, 5 minutes) and passed through a 40 µm cell strainer in order to ensure removal of cell aggregates. Subsequently, cells were counted using a Neubauer chamber and added to the collagen mix to leave a final concentration of 0.2 x 106 cell/mL. A type I collagen-based matrix was used for the 3D cell culture, using the protocol by Shin et al. [25] The hydrogel consisted of a mixture at 4 ºC of 10X DPBS, collagen at a final concentration of 6 mg/mL, 0.5M NaOH to adjust pH to 7.5 and the cells. This mix was introduced into the central chamber, as shown in Figure 1 a), and left to polymerize at 37 ºC in humid boxes, turning the device every 5 minutes for at least 20 minutes. Finally, the lateral channels were hydrated with culture medium. These devices where conserved in a CO 2 incubator at 37 ºC for 9 days and the media of the two lateral chambers was changed every 2 days.
Figure 1: a) Geometry and design of the microfluidic devices described in the present work, with a central chamber containing the gel with the 3D cells embedded and two side channels to feed the nutrients. [13] b) Size comparison between a microfluidic chip and a 1 euro coin. c) A scheme of the 2D area projection of the spheroid measured in silico and in vitro.
The spheroid growth was monitored with a Leica DM IL Led microscope. Daily pictures were taken of the central chamber at 4X magnification in brightfield. Later, these images were processed with Im-ageJ [26] and the spheroids 2D projected area was segmented with the in-house semi-automatic Seg-mentation3D App developed by C. Borau using Matlab (Mathworks, Natick, CA, US) as described by Alamán-Díez et al [27]. The data obtained were processed and represented using GraphPad Prism 8. A scheme of this measure is presented in Figure 1 c).
The computational model is divided in two coupled models: A mechanic and a metabolic model. The mechanical model used in this study builds upon the framework proposed by Gonc ¸alves et al. [28] which extends the built-in agent-based mechanical model of PhysiCell to incorporate the cell-matrix interactions. The ECM plays a key role in regulating multiple cellular processes. In the context of tumour spheroid development and growth, the density of the ECM can influence both the size and growth rate of the spheroid. Additionally, the ECM can also affect how far individual cells migrate [13], thereby affecting spheroid compactness, as increased migratory capacity often results in less aggregated spheroid structures.
In the computational model, each cell is represented as a fixed-size agent whose position evolves over time based on the net balance of mechanical forces acting upon it at each time step.These forces arise from two types of interactions: intercellular forces and interactions between the cells and the ECM (Figure 2.a). The interaction between cells is governed by a balance of two opposing forces: a cell-to-cell adhesion force and a cell-to-cell repulsion force. These forces are derived from corresponding potential functions, as illustrated in Figure 2.b. The interaction between cells and the ECM is modeled by other two main forces: a random locomotive force, which drives cell movement, and a drag force, which resists motion (Figure 2). The mechanical properties of the ECM are incorporated into the model via the drag force, which is defined as proportional to the collagen viscosity [29]. In this framework, the ECM is treated as a continuous viscous medium that produces mechanical resistance, without explicitly modelling individual collagen fibers. [28] Figure 2: a) Scheme of the forces that act in the cell as part of the mechanic model. b) Graphic representation of the attraction and repulsion potential between two cells where the x-axis represents r = x j -x i , being x i and x j the position of the particle i and j . R an R A are the characteristic distances of the two potentials. These two distances define the range of action of both potential functions.
The model considered in this study is non-inertial, as the inertial term can be neglected. This assumption is justified by the fact that the timescale over which forces reach equilibrium is significantly shorter than other characteristic cellular timescales, such as those associated with cell division or cell death [30]. Under this assumption, the equilibrium equation for each can be written as:
being N the total number of cells, i and j indexes that go through all the cells i, j ∈ [1, N ]. F cca and F ccr are the adhesion and repulsive cell-to-cell forces respectively, F loc is the random locomotive force and F drag is the drag force of the extracellular matrix. The drag and locomotive forces act in the whole domain of the simulation. However, the two cell-cell interaction forces only act when the cells are at certain distances from each other. These two forces are given by the following expressions. For the adhesion interaction we formulate:
and for the repulsion interaction:
where C cca and C ccr are the adhesion and repulsion coefficients that control the magnitude of the two forces and x i and x j are the positions of the particles i and j. Φ and Ψ are two potentials that depend on the distance between the cells and are given by the following expressions:
otherwise.
(4)
otherwise.
(
Both potentials are described in the right panel of the Figure 2. R d is defined as the sum of the radius of the two particles that are interacting (R d = r i + r j ), while R A is defined as R plus the maximum adhesion distance (δ) that is established depending on the potential range of action (R A = R + δ). The values of the constants here are detailed in Table 1.
A drag force has been implemented, to model the interaction between the cells and the ECM, as the density of collagen of the ECM reduces the velocity of movement of the cells. The drag force is described as the Stokes’ law:
where v i is the velocity of the i cell, R is the radius of the cells and η represents the dynamic viscosity of the collagen gel, which depends on the density of the collagen fibers of the gel and was characterised by Valero et al. in [29].
The last force that needs to be considered is the locomotive force (F loc ), which has been characterised using the experiments done by Plou et al. [13]. These experiments correspond to single cell migration tests where the cells are separated sufficiently for not interacting with each other. In this particular context, equation 1 can be reduced as the interaction terms can be neglected:
From this, and using the equation 6 we can obtain the expression of the velocity for a single cell only interaction with the matrix:
The F i loc was considered a polinomical function and its parameters were calibrated by [28] using the experiments from [13], as was cited previously. The polinomical function obtained was:
where p is a random numbre from 0 to 1.
Finally, starting from equation 1, and applying Stokes’ law along with the computation of all forces described in this section, the velocity of each cell can be determined at every simulation step according to the following expression:
Symbol Parameter Value Units Reference
Dynamic viscosity [7.96, 18.42, 39.15]
Table 1: List of physical parameters used in the model.
An ATP-driven model.
ATP, the primary energy source for most cellular processes, also functions as a regulator of key activities such as cell division, metabolism, movement, and apoptosis [31], [32], [33]. Given ATP’s central role in cellular metabolism, our metabolic model is structured around this molecule. At each time step, ATP levels are quantified for every individual cell and compared against two predefined thresholds. These levels determine whether the cells are apoptotic, quiescent or proliferating. Therefore, interactive mechanical forces can vary depending on the cell state as it is indicated in Figure 3. The molecular and chemical pathways through which cells produce ATP can vary depending on cell type and the specific phase of the cell cycle. However, most cells derive their ATP from glucose via two major processes: glycolysis and oxidative phosphorylation. Beyond these primary pathways, a complex network of additional reactions provides essential substrates for ATP synthesis. Therefore, we propose a simplified model, based on the one introduced by Jacquet et al. [34], which focuses on the main reactions responsible for ATP production.
The energy metabolism of a human cell can be summarized in five main processes, each one involving multiple coupled reactions. The abbreviation used in the chemical equations that summarized these processes are shown in • Glycolisis: in this process, glucose is converted to pyruvate [35]. During this sequence of reactions, 2 molecules of ATP are produced. This anaerobic process occurs in the cytoplasm and can be summarized as follows:
• Lactic acid fermentation: This anaerobic reaction is described by the following equation:
Although this reaction does not generate ATP directly, it regenerates N AD + , a critical substrate for glycolysis [36].
• Pyruvate oxidation: The pyruvate oxidation transform the pyruvate produced in the glycolisis into acetyl-coenzime-A:
• Citric acid cycle: also known as the Krebs cycle (equation 15), takes place in the mitocondria and transform the Acetil-coenzime-A produced during pyruvate oxidation into the necessary metabolites for oxidative phosphorilation.
• Oxydative phosphorilation: The oxidative phosphorilation takes place in the electron transport chain of the walls of the mitocondria and it uses the metabolites produced at the Krebs cycle (equation 15) at the ATP syntase to produce ATP molecules from ADP:
In order to save computational resources, the last three reaction (14, 15 and 16) have been summarised into a single aerobic reaction of production of energy. This was done because we are just interested in the production of ATP, not the intermediate metabolites, and these three metabolic reactions take place in a chain where only the last one produces ATP molecules. The equation that sums up these three metabolic processes is:
To simulate computationally the consumption of energy done by the cells due to their different functions (migration, proliferation…), we will implement one last equation. The cellular consumption of ATP equation:
After all, the computational model here presented has 4 final equations (see Table 3). These equations were implemented in COPASI as ordinary differential equations (ODEs).
The two models explained in the subsections 2.3.1 and 2.3.2 operate independently and are invoked at different parts of the code. However, they are coupled in order to introduce dependencies between them and thereby provide a more accurate representation of reality. The coupling strategy is represented in Figure 3. The metabolic model, through the equations of consumption and production of metabolites explained in 2.3.2 determines at each time step of the simulation the intracellular ATP levels of each individual cell. ATP levels in each cell are then compared against the thresholds described in Figure 3 to determine whether the cell state is dead, quiescent, or proliferating. As it can be seen in Figure 3, depending on the ATP level, different cell functions are activated and correspondingly different forces are considered:
• ATP levels below AT P death threeshold: The cells below this ATP level are in the apoptotic state. This state do not have any force associated, as dead cells do not interact with their environment.
• ATP levels between AT P death and AT P prolif threeshold: Cells in this ATP range are quiescent.
The quiescent state has forces associated only with the interaction with other cells (cell-cell adhesion force (F cca ) and cell-cell repulsion force (F ccr )), as quiescent cells do not migrate.
• ATP levels above AT P prolif threeeshold: The cells which internal ATP is above this ATP level are proliferating. These cells have forces related with the interaction between cells and forces derived from the movement and the relation with the extracellular matrix (cell-cell adhesion force (F cca ), cell-cell repulsion force (F ccr ), lococotive force (F loc ) and drag force (F drag ).
The computational model described in this section contains numerous parameters whose exact values are not well established. This uncertainty arises from the difficulty of accurately measuring these parameters, their variability across different cell lines, and the limited availability of data in the literature. These parameters can be categorized into two main groups: metabolic and non-metabolic parameters (see Table 4). A brief description of these parameters is presented here:
• Metabolic parameters. These include the rates at which the metabolic processes outlined in Section 2.3.2 occur within the cell: k glu for glycolysis (Eq. 12), k aer for the Krebs cycle (Eq. 15), and k ana for lactic acid fermentation (Eq.13), as well as the rate of cellular energy consumption, k ene (Eq. 18). These parameters are not directly measurable, and the estimations available in the literature lack the accuracy required to confidently assign specific values.
• Non-metabolic parameters. These include the ATP thresholds associated with cell death (AT P death ) and proliferation (AT P prolif ), as illustrated in Figure 3, as well as the cellular proliferation rate (k prolif ). The ATP thresholds are intrinsic to the model and cannot be directly measured. While estimations for the proliferation rate are available in the literature [37,38], this parameter is subject to variability due to experimental conditions and different measure methods, introducing a degree of uncertainty. Therefore, these parameters also need to be estimated.
Accurate estimation of these parameters is crucial, as they can significantly influence the model’s predictions. Given the absence of definitive values from the literature, we employ a Bayesian estimation framework, informed by experimental data, to derive probabilistic estimates for these parameters. In order to implement this Bayesian inference process more efficiently, we quantified the influence of the parameters in our model to identify those that could be fixed for the subsequent analysis . We also replace the high-fidelity model with a Gaussian process to reduce the computational cost of uncertainty quantification analyses.
To assess the influence of each model parameter on the quantity of interest (the projected area of the spheroid), we employ a two-step sensitivity analysis. First, we perform a marginalization analysis in which each parameter is varied independently. This approach allows to observe the isolated effect of each parameter on model behaviour. Subsequently, we compute the Sobol sensitivity indices, which provide a quantitative measure of both the individual (first-order) and interactive (higher-order) contributions of each parameter to the output of the computational model. Together, these analyses enable us to identify the parameters that have the greatest impact on the selected quantity of interest and to disregard those with negligible influence, thereby refining the model and focusing subsequent analyses on the most relevant parameters.
Marginalization over a specific parameter entails varying this parameter in the range of possible values considered and check all the possible combinations of the rest of the N -1 parameters for the specific value of the studied parameter. Then, the mean of all these combinations of outcomes of this combination of parameters should be calculated to check the marginalized output of the model:
M denotes the number of random combinations for the marginalization process, N corresponds to the total number of parameters and GP is the surrogate model explained in the following subsection.
Gaussian Processes (GPs) are powerful surrogate models that enable rapid estimation of a model’s output [39], eliminating the need to execute the full model each time a new set of parameters is evaluated. The primary goal of a Gaussian Process is, given a set of training data
where {t i } J i=1 represents the input parameters, y i the corresponding output, N the number of training samples, and J the number of input parameters, to infer the probability distribution of the output at a new, unseen point x predict [40].
A GP is defined by its mean function and a covariance matrix (Γ). The covariance matrix is constructed using the training data and has a square shape structure with dimensions N × N , corresponding to the number of training points. This matrix is generated through a covariance function Γ(t i , t j ) = k(t i , t j ), commonly referred as the kernel function.The choice of kernel function is crucial, as it determines how correlations between data points are modeled and directly influences the behaviour of the GP. The kernell function can take various forms, depending on the characteristics of the data we are trying to reproduce. In this work, the squared exponential covariance function (RBF: Radial Basis Function) was employed.
where σ 2 denotes the variance and l i represents the characteristic length scale of each parameter. These quantities are the hyperparameters of the Gaussian Process. During training, the hyperparameters are optimized to maximize the likelihood of the model fitting the training data.
To assess the predictive performance of the GP metamodel, we use the Nash-Sutcliffe efficiency coefficient [41], defined as:
being f :
where N T the number of testing samples. The Nash-Sutcliffe has a range from -∞ to 1, where values closer to 1 indicate high predictive accuracy, while negative values suggest poor model performance [42].
Bayesian estimation was selected over other deterministic parameter estimation methods because it provides not only point estimates of the parameters but also their full probability distributions. Access to the full distribution is particularly important in biological systems, which are inherently stochastic. Reporting a single point estimate for a biological parameter could lead to overconfident and potentially misleading conclusions. The Bayesian estimation framework is based on Bayes’ theorem, which can be stated as: P (A|B) = P (B|A)P (A) P (B) = P (B|A)P (A)
A P (B|A)P (A)
In the context of Bayesian estimation of parameters, the Bayes theorem can be reformulated as:
Here, y obs represents the experimental data used for calibration and θ are the parameters of our model. p(θ|y obs ) is known as the posterior distribution and is the main objective of the Bayesian calibration process: it describes the probability distribution of the parameters conditioned on the observed data. p(θ) represents the prior distribution, which encodes the existing knowledge or assumptions about the parameters before observing the new data. The form of this prior distribution depends on the amount and quality of prior information available.
The term p(y obs |θ) is the likelihood function [43], which quantifies the probability of observing the data given a set of parameter values. It links the experimental observations (y obs ) to the computational model output. In this study, it is assumed that the experimental data follow a normal distribution centered around the model predictions f (θ):
As each experimental data is assumed to be independently drawn from this normal distribution, the likelyhood can be written as the multiplication of the probability distribution of the individual observations as follows:
To characterize the posterior distribution of the model parameters, multiple simulations are required to adequately sample the parameter space. The model here presented has a high computational cost. Due to this, a surrogate model (GP) (Section 2.4.2) is employed to approximate the model output f (θ) in the likelihood function (26).
For the implementation of the model, we have selected PhysiCell [30] as our main computational tool. PhysiCell is an open-source agent-based cell simulation program. This computational tool has already implemented many mechanical and phenotipic functionalities easily adaptable to the characteristics of the cancer cells we are simulating. Furthermore, PhysiCell offers seamless integration with other computational tools, which is crucial when developing coupled computational models like the one proposed in this work. PhysiCell has already demonstrated its value as a reliable and effective platform for cancer research, not only as a complement to experimental studies conducted in the lab, but also as a standalone modelling tool capable of delivering powerful insights. [44] [45].
The mechanical part of the model has been entirely implemented in PhysiCell, however, for the metabolic part, an external tool was integrated in the PhysiCell mechanical model. This external tool is Copasi [46], a software application built for the construction and simulation of biochemical networks and equations using SBML System Biology Markup Language), an open data format for the simulation of computational systems biology. A squematic representation of the integration, the model and the computational tools used is portrayed in Figure 4. For the implementation of the sensitivity analysis and the calibration process described in Section 2, we employed the PyMC probabilistic programming library [47]. The integration of the different models and statistical modules was carried out using PhysiCool [48]. An overview of the overall structure of our workflow-including the computational model, the sensitivity analysis, and the calibration process-is provided in Figure 5.
Figure 5: General scheme of the calibration process. First, a Gaussian process is trained using outputs from multiple simulations of the computational model with different parameter combinations (green boxes). This surrogate model is then employed to perform a sensitivity analysis, identifying influential and non-influential parameters (yellow boxes). Finally, Bayesian calibration is carried out using only the GP with the experimental data available to obtain the calibrated parameter set (red boxes).
In this section, we first present the results of the microfluidic experiments conducted to study tumour spheroid growth within 3D microfluidic devices. We then proceed with a sensitivity analysis of the mathematical model to assess the relative influence of each parameter on the computational model’s output. Subsequently, we describe the parameter estimation procedure used to calibrate the model against the experimental data. Finally, we evaluate the model’s ability to reproduce the experimental observations using the estimated parameter values.
Brightfield images of the tumour-on-chip device were collected at days 1, 4, 7, and 9 to track the temporal evolution of spheroid formation and progression (Figure 7a). Four independent replicas of the same experiment were performed to ensure reproducibility. Cells were seeded in the microfluidic devices using diffusors to favor single-cell deposition rather than pre-aggregated clusters. Once seeded, the devices were incubated at 37 °C with 5 % CO 2 for 9 days, and cells were supplied daily with glucose-rich medium (6 mg/ml) to support proliferation. Images were acquired every 24 hours with a Leica DM IL Led microscope.
Over the days of the experiment, cells progressively proliferated, forming compact spheroids by day 4. These spheroids continued to grow until day 9, generating a heterogeneous population with a broad distribution of sizes. Image segmentation and processing (as described in Section 2.2) provided quantitative measurements of spheroid area (Figure 6). From these data, a growth curve was obtained showing a steady increase in spheroid size from day 1 to day 8, with growth appearing to stabilize by day 9.
The temporal analysis also revealed that spheroid sizes were relatively homogeneous in the early days, whereas variability increased markedly from day 4 onwards, leading to a broad distribution. By day 9, the dispersion decreased again, producing a more compact size range. Consistently, the mean spheroid area was larger than the median at each time point, indicating the presence of numerous small spheroids alongside a subset of much larger ones, which acted as outliers. This indicates that spheroids within the same chip can display very different growth behaviours. Due to this, in the subsequent computational calibration, we divided our experimental data into three different groups based on the quantiles of the spheroid size distribution at day 7. This time point was chosen because it already showed clear differences in spheroid behaviour. Specifically, the first (≈ 2500 µm 2 ) and the third (≈ 7500 µm 2 ) quantiles were used to establish thresholds: spheroids below the first quantile were considered small, those between the first and third quantiles medium, and those above the third quantile large (Figure 7c). This approach provided a data-enhanced means of classifying spheroids while accounting for the inherent variability in their growth.
A scatter plot of the marginalization process was perform to visually check the impact of the different paramets on the quantity of interest. Figure 7 illustrates the marginalization results for each parameter of the model (described in Section 2.4). The blue dots of the graph are part of the training data of the GP (D training ). The red line represents the result of the marginalization process explained in equation 19.
Figure 7 indicates that three parameters significantly influence spheroid area: energy consumption rate, proliferation rate, and proliferation ATP level. The glycolysis reaction rate exhibits a weaker, though still perceptible, influence on spheroid area, more clearly visualized in Figure 7.b). It can be seen that among the non-metabolic parameters, the proliferation rate and the proliferation ATP threshold have the most substantial influence on spheroid area. This suggests that once spheroid cells transition to a quiescent state, they are unable to revert to proliferation, ultimately leading to cell death. This hypothesis is supported by the negligible effect of the death ATP level on spheroid area. Among metabolic parameters, the energy consumption rate is the dominant factor. A minor effect of the glycolysis reaction rate is also observed (Figure 7.h). The lack of dependency on aerobic and anaerobic reaction rates suggests that any of these pathways are preferentially activated under the experimental conditions used for training the GP model. To extend the sensitivity analysis, we computed the Sobol indices of the model parameters, as indicated in Section 2.4.1. In Figure 8, first and total order Sobol indexes are presented. This figure supports the conclusions drawn from Figure 7, while also offering a more detailed understanding of the relative influence of each parameter and their interactions. Figure 8 clearly show that the energy consumption rate (K ene ) is the most influential parameter, with a substantial first-order Sobol index. When considering total-order indexes, additional dependencies emerge. Second-order Sobol indexes, which quantify pairwise parameter interactions can be found in Appendix A. These results indicate that k glu , k prolif , and AT P prolif contribute meaningfully to the model output when combined with K ene . Conversely, the other three parameters (K aer , K ana , and AT P death ) exhibit negligible influence and will henceforth be fixed at the midpoint of their respective range of values (Table 6).
To enable efficient simulation for Bayesian parameter calibration, the efficienty of the GP has been analyised. The model was trained on 1024 measurements, and achieved a Nash-Sutcliffe efficiency coefficient (Eq. 21) of Q 2 = 0.98, confirming its high predictive accuracy.
As described in subsection 2.4.3, the Bayesian optimization process provides a probability distribution for the model parameters. The Bayesian optimization algorithm explores the parameter space defined by the prior distributions and, after comparing the GP predictions with the experimental data, yields posterior distributions of the parameters of the model.
Because the experiments in subsection 3.1 revealed markedly different dynamics and regimes, the spheroids were classified into three groups, each displaying distinct behaviours. As outlined in Section 2.1, parameter estimation was performed independently for each spheroid group. Uniform priors were assigned to the four parameters identified as most influential by the sensitivity analysis (Figure 8). The prior ranges, reported in Table 5, were determined either from preliminary model simulations or from values reported in the literature. In addition to the full posterior distributions, the maximum a posteriori (MAP) estimate was computed for each experimental group. This estimate corresponds to the parameter set that best reproduces the experimental measurements. Figure 9 shows the posterior distributions of the four parameters, together with the marginal MAP values (estimated individually for each parameter) and the joint MAP values (estimated simultaneously for all parameters). The figure highlights clear differences both among the posterior distributions of the four parameters and across the three spheroid groups. In particular, the posterior distributions of K ene and K prolif are more sharply defined than those of K glu and AT P prolif . Furthermore, differences between groups can be observed both in the posterior distributions and in the MAP estimates. Specifically, larger spheroids exhibit higher K prolif values and lower K ene values compared with smaller spheroids. Once the joint MAP values were obtained, they were introduced into the computational model to evaluate whether the simulations could reproduce the experimental observations (Table 6). Three simulations were performed for each group (large, medium, and small spheroids). The resulting spheroid areas were measured, averaged across replicates, and compared with the experimental data (Figure 10).
Overall, the computational model successfully reproduced the experimental dynamics across the three spheroid groups. However, during the initial days, the simulated spheroid areas tended to be slightly smaller than those measured in vitro. This effect might be due to the assumption that all spheroids start being an individual cell that proliferate through the days, updating its size. However experimental measurements clearly indicate that spheroids start their growth with a size different than that of individual cells. This might be due to an error in the experiments, probably because the diffu- sor used did not favor single-cell deposition in the microfluidic devices correctly and small aggregates of cells formed in the microfluidic devices at the beginning of the experiments. Despite this, Figure 10, shows that our computational model is able to overcome this initial difference between the experimental reality and the computational approach and reproduce the general time-dependent behaviour of the experimental data, specially for the last days of the experiments.
Collagen concentration regulates the size of the tumor spheroid.
Once the parameters of our model were calibrated and the simulations successfully reproduced the behaviours observed in the experimental setup (Figure 10), we analysed the model response across different collagen densities.
Using the parameters listed in Table 6, we ran 10 simulations for the three sets of parameters and for three different ECM collagen concentrations: a high-density matrix (6 mg/ml), a medium-density matrix (4 mg/ml), and a low-density matrix (2.5 mg/ml). Several measurements were obtained from the spheroids formed in the simulations. Figure 11 shows the resulting spheroid areas. As shown, low- density collagen produces the smallest spheroids. This reduction is not due to changes in proliferation, as cell proliferation does not depend on the collagen density of the ECM (see Appendix 1). Instead, the smaller spheroids observed in the 2.5 mg/ml condition result from fragmentation of the initial spheroid into multiple smaller aggregates. This hypothesis is supported by the greater number of spheroids formed in low-density matrices (Appendix 1).
Figure 11 also shows that medium-density ECMs tend to yield larger spheroids than high-density ECMs. Again, this difference cannot be attributed to variations in proliferation, but rather to changes in spheroid compaction. Spheroids formed in high-density matrices are more compressed and therefore occupy a smaller area. To verify this behaviour, we quantified the mean volume occupied per cell within each spheroid. The results (Figure 12) support this interpretation: cells in high-density matrices have less available space, resulting in more compact spheroids. These findings confirm that differences in spheroid compactness drive the size differences between medium-and high-density matrices, where the number of spheroids formed is similar (Appendix 1).
According to Hernández-Hatibi et al. [16], mesenchymal cell lines such as Panc-1 and MIA PaCa-2 tend to exhibit behaviours consistent with our simulations: spheroids grown in 4 mg/ml collagen form larger structures than those grown in 6 mg/ml matrices. In contrast, more epithelial cell lines (BxPC-3 and Capan-2) do not show this behaviour. These lines show either no significant differences in spheroid size across ECM densities or exhibit larger spheroids in denser matrices rather than in the 4 mg/ml conditions. This behaviour could be explained by the greater motility and metastatic potential of mesenchymal cell lines. In motility-driven spheroid expansion, as it is in this case where all the conditions present the same proliferation (See Appendix 1), increased ECM density restricts cell movement, thus limiting spheroid growth. Conversely, epithelial cells exhibit significantly lower motility; therefore, their spheroid expansion is governed primarily by proliferation rather than movement. As a result, epithelial spheroids are less sensitive to increased ECM stiffness and the associated reduction in cell motility. To test this hipothesis in our model, we simulated epithelial-like behaviour by reducing the locomotive force (Equation 8) of computational cells by a factor of 0.1, and we quantified spheroid area under this reduced-velocity condition. We then performed an statistical analysis (See Appendix C) on the measurements derived from the computational spheroids to asses wheter they align with the findings of Hernandez-Hatibi et al. Figure 13 compares the results and statistical analysis obtained for the spheroid area of the simulations in ephithelial and mesenchymal cell lines. As observed in Figure 13, reducing cell velocity results in the disappearance of size differences between spheroids grown in high-and medium-density matrices. This difference can be appreciated not only visually but also through the statistical analysis presented in the figure. Therefore, these simulations reproduce the behaviour reported by Hernández-Hatibi et al. [16] for epithelial cell lines, confirming our hypothesis, reduced movility reproduce epithelial behaviours.
Computational models are powerful tools to complement and improve the understanding of in vivo and in vitro experiments. Nevertheless, current models still face mayor challenges in order to become fully realistic and broadly applicable. In particular, they must capture biological behaviour in its entirety, rather than focusing on isolated aspects, and reproduce experimental results not only qualitatively but also quantitatively. In this study, we take a step toward addressing both of these challenges.
Figure 11: Spheroid areas obtained from computational simulations across three ECM collagen concentrations (2.5, 4, and 6 mg/ml). Ten replicates were performed per condition. Despite identical proliferation rates across matrices (see Appendix 1, Figure 16), clear differences in spheroid area emerge. Low-density collagen (2.5 mg/ml) produces smaller spheroids due to fragmentation of the initial aggregate. Medium-density collagen (4 mg/ml) yields the largest spheroids, whereas highdensity collagen (6 mg/ml) results in more compact, smaller aggregates. We present a computational model that integrates metabolic and mechanical effects, thereby bridging two approaches that are often studied separately. This model enables the evaluation of diverse biological quantities related to metabolism and mechanics, such as metabolites consumtion and secretion or the velocities of the cells. Simultaneously, it provides multicellular structural descriptors, including the number and size of spheroids formed, thereby offering insight into the organization and development of tumour-like structures. Building on this modelling framework, the sensitivity analysis and Bayesian parameter calibration performed in this work enables to infer not only qualitative but also quantitative results from our model. From Figures 7 and8, we infer that within the explored parameter space and experimental conditions, three parameters (k aer , k ana , and AT P death ) present negligible influence on the model output. This suggests that under the specific conditions tested, for the A549 cells both the aerobic and anaerobic metabolic pathways are balanced and cells do not show a clear preference for either pathway. Nevertheless, a broader sensitivity analysis would be required to confirm that these parameters remain negligible across wider parameter ranges and experimental contexts. The posterior distributions obtained through Bayesian calibration (Figure 9) indicate that larger spheroids are characterised by higher proliferation rates (k prolif ) and lower energy consumption (k ene ). This suggests that spheroids tend to grow larger when they achieve greater energy efficiency, meaning they can proliferate more rapidly while expending less energy. As shown Figure 13: a), b) and c) Comparation between the spheroid areas from mesenchymal and epithelial cell lines. A statistical analysis was conducted to evaluate the significance of the differences in size observed between the groups. The detailed methodology is provided in Appendix C. d) Experimental results from Hernandez-Hatibi et al. [16] for different mesenquimal (Panc-1 and MIA PaCa-2) and epithelial (BxPC-3 and Capan-2) cell lines. Under reduced motility, spheroid sizes in medium-and high-density matrices converge, as it can be seen in the statistical analysis. This reproduces qualitativelythe behaviour reported experimentally for epithelial cell lines(d).
in Figure 10, combining our computational model with the Bayesian calibration pipeline enabled the quantitative reproduction of experimental results obtained in vitro using microfluidic devices. This agreement between simulations and experiments reinforces the model’s capacity to capture key biophysical mechanisms underlying spheroid development.
Moreover, our model is not only able to reproduce experimental data, but also to generate mechanistic insights into spheroid growth under different microenvironmental conditions. As illustrated in Figure 11, simulations reveal that spheroids reach their largest sizes at intermediate collagen densities (4 mg/mL). This behaviour arises because low-density matrices fail to physically confine cells, leading to the formation of multiple smaller spheroids, whereas high-density matrices impose strong mechanical constraints that compact the structure and reduce the available space per cell. These findings highlight the model’s capacity to connect extracellular matrix properties with emergent multicellular organisation. Additionally, the model proved to be flexible enough to capture qualitative differences between cell lines and phenotypes. By modifying the parameters that determine cell motility, we were able to reproduce behaviours typically associated with epithelial and mesenchymal states (Section 3.5). This demonstrates that the computational framework can be tuned to represent diverse biological scenarios and suggests its potential for studying transitions between phenotypic states or comparing different tumour cell populations.
Despite the model’s success in capturing experimental data (Figure 10), certain limitations remainand open important avenues for future research. First, our metabolic model focus on the glycolysis-OXPHOS axis. However, recent studies have demonstrated the importance of the role of other metabolites for the production of energy. Guerrero-Lopez et al demonstrated that glutamine metabolism is crucial in 3D cultures, especially under glucose restriction [50]. The future incorporation of glutamine pathways will increase the model’s biological realism. The ECM is currently simulated as a homogeneous continuum medium that provides mechanical resistance. However, the ECM consists of heterogeneous fibre networks that directly influence tumour cell migration, and tumour cells can, in turn, remodel these fibres during growth and invasion [51,52]. While our continuum approximation is computationally feasible and sufficient to reproduce certain experimental observations [28], it does not capture ECM heterogeneity or its dynamic interactions with tumour cells. Despite these limitations, the proposed model has proven to be a robust tool for reproducing in vitro experiments and for generating biologically meaningful insights into the processes governing cancer cell behaviour and tumour spheroid formation. In addition to that, the model here proposed fills the current gap in the literature of a single agent-based cancer model that couples the metabolic and mechanic aspects of tumour spheroid formation. This work thus contributes to the development of integrated computational frameworks that can bridge metabolism and mechanics, supporting both experimental research and the advancement of predictive oncology.
A Appendix 1. First, second and total order Sobol indexes.
In this appendix, we present the results of the Sobol sensitivity analysis performed for our computational model. Table 7 reports the first-order and total-order Sobol indices, which quantify the individual contribution of each parameter and their combined (global) influence on model output, respectively. Table 8 shows the second-order Sobol indices, providing insight into pairwise parameter interactions within the model. B Appendix 2. Additional measurements and statitical analysis for different collagen density.
In this appendix we present some extra measurements performed for the parameters obtained through the Bayesian optimization process.
To evaluate whether the differences observed across simulation scenarios were statistically significant, we applied either an ANOVA or a Kruskal-Wallis test, selecting the appropriate method based on the normality and homogeneity of variance of the data.
Normality was assessed using the Shapiro-Wilk test [53], which is well suited for small sample sizes and therefore appropriate for our dataset, composed of fewer than 50 simulations per condition. The outcomes of the normality analyses are provided in Tables 9 and10.
Homogeneity of variances across groups was subsequently evaluated using Levene’s test, with results reported in Tables 11 and12.
Based on the combined results of the normality and variance-homogeneity assessments, each dataset was assigned to either a parametric ANOVA followed by a Tukey-Kramer post-hoc or a non-parametric Kruskal-Wallis test with post hoc Dunn’s test for further statistical comparison. The corresponding decisions for each case are summarized in Tables 13 and14.
Glu + 2 NAD + ←→ 2 Pyr + 2 NADH + 2 AT P Aerobic ATP production Pyr + NADH + 3 O 2 ←→ 17 AT P + NAD + Lactic acid fermentation Pyr + NADH ←→ Lac + NAD + Cellular ATP consumption AT P -→ ∅