
📝 Original Info

  • Title:
  • ArXiv ID: 2512.17979
  • Date:
  • Authors: Unknown

📝 Abstract

Industrial symbiosis fosters circularity by enabling firms to repurpose residual resources, yet its emergence is constrained by socio-spatial frictions that shape costs, matching opportunities, and market efficiency. Existing models often overlook the interaction between spatial structure, market design, and adaptive firm behavior, limiting our understanding of where and how symbiosis arises. We develop an agent-based model where heterogeneous firms trade byproducts through a spatially embedded double-auction market, with prices and quantities emerging endogenously from local interactions. Leveraging reinforcement learning, firms adapt their bidding strategies to maximize profit while accounting for transport costs, disposal penalties, and resource scarcity. Simulation experiments reveal the economic and spatial conditions under which decentralized exchanges converge toward stable and efficient outcomes. Counterfactual regret analysis shows that sellers' strategies approach a near Nash equilibrium, while sensitivity analysis highlights how spatial structures and market parameters jointly govern circularity. Our model provides a basis for exploring policy interventions that seek to align firm incentives with sustainability goals, and more broadly demonstrates how decentralized coordination can emerge from adaptive agents in spatially constrained markets.

📄 Full Content

Industrial Symbiosis (IS) refers to the exchange of byproducts between firms, turning one company's waste into valuable inputs for another one. By reusing materials such as heat, rubble, or chemical residues, IS reduces environmental impact while generating economic benefits, embodying the circular economy principle of closing resource loops. For instance, the RETDA eco-industrial park in China reported a 33% increase in input productivity, a 3650% rise in water use efficiency, and a 30.91% reduction in emissions [12].

Despite its potential, IS rarely emerges spontaneously. Exchanges depend on local supply-demand compatibilities, economic incentives, and logistical constraints [29]. Poor coordination often results in wasted resources and missed opportunities, highlighting the gap between individual strategies and collective circularity goals. Identifying the conditions under which decentralized firms can coordinate effectively is therefore a key research challenge.

Analytical solutions are often intractable due to the combinatorial complexity of interactions and the emergent nature of prices and traded quantities. Simulation-based approaches, by contrast, are particularly suited to studying such nonlinear, decentralized, and heterogeneous systems [28]. By systematically varying system configurations, simulations can reveal market behaviors and highlight factors that support efficient and resilient local circularity. Agent-based models, in particular, can capture heterogeneity among actors and test how different market designs, decision rules, and learning mechanisms affect the emergence of exchanges under realistic economic constraints [19]. Their ability to explicitly represent geosituated agents makes them especially suitable for studying spatially dependent interactions, such as proximity-based collaborations, transportation costs, and localized resource flows. Unlike theoretical game-theoretic models, which often rely on a small number of representative players to ensure tractability, agent-based models can accommodate many heterogeneous actors and interactions.

However, their effectiveness is often constrained by the simplifications made in representing agent behaviors. In many existing symbiosis models [8,21,23,26,43,61], firms follow fixed behavioral rules or static pricing strategies, limiting their ability to capture the adaptive and uncertain nature of real-world exchanges. In practice, firms continuously adjust decisions in response to shifting demand, fluctuating costs, and evolving environmental regulations. This highlights the need to enhance decision-making mechanisms within agent-based models, enabling agents to learn, adapt, and explore strategies in a more dynamic way.

In this paper, we present a spatially explicit, decentralized multiagent model in which buyers and sellers of byproducts interact through multilateral spatial double auctions using co-evolving strategies. Sellers adaptively update their pricing strategies via reinforcement learning in a partially observable environment, while buyers select offers that maximize utility subject to spatial and economic constraints. A key feature of the model is that both transaction prices and traded quantities emerge endogenously from these interactions, allowing us to investigate how local decisions give rise to systemic patterns. Our contributions are threefold. First, we develop a reinforcement learning-based market model for decentralized byproduct exchanges, based on a novel auction mechanism adapted to the spatial and economic constraints of territorial circularity. Second, we validate the learned strategies via counterfactual regret analysis, showing convergence toward near-Nash equilibria. Finally, we provide a simulation-based analysis of the conditions under which adaptive pricing and agent interactions lead to stable, efficient, and circular outcomes, providing insights for policymakers.

The paper first reviews related work (Section 2), then presents the decentralized market model (Section 3) and simulation setup (Section 4). Results on price formation and local circularity are analyzed in Section 5, followed by discussion (Section 6) and conclusions (Section 7).

Industrial symbiosis in Eco-Industrial Parks (EIPs) has traditionally been studied using system dynamics to capture network-level behavior and feedback loops [7,35,42,53,65]. Complementary approaches have treated EIPs as complex adaptive systems, highlighting dynamic business interactions and emergent symbiotic relationships [14,20,32,47,63]. Other studies focused on material and energy flows, byproduct exchanges, and social network analysis [13,18,20,48,62]. Recent reviews [17,44] emphasize the potential of agent-based modeling to capture local, adaptive, and self-organizing behaviors. Emerging agent-based studies have integrated input-output models [61], explored spatially networked exchanges [26,38,43], and examined sector-specific synergies [8,21,23]. Yet, despite these advances, the explicit representation of economic incentives and market coordination mechanisms remains underexplored. In addition, these works do not explicitly model markets where prices, trade volumes, and resource allocations emerge endogenously from decentralized interactions.

Auction and negotiation mechanisms in Multi-Agent Systems (MAS) have become canonical tools for decentralized coordination. From early negotiation platforms [10] to double auctions (market mechanisms in which multiple buyers and sellers simultaneously submit bids and offers) used to study bidding strategies and price discovery [50,56], research has demonstrated how decentralized negotiations can generate robust outcomes. Extensions have examined repeated competition [4,34], peer-to-peer trading [41], social influence [59], and multilateral bargaining across multiple resources [1,11]. Spatial aspects are frequently acknowledged, for instance through costly trade links [3], heterogeneous bilateral transport costs [15], or land markets where localized interactions induce differentiated prices [22]. Yet these contributions remain largely theoretical, typically involving few participants and limited attention to adaptive mechanisms that capture the evolving dynamics of real markets. In particular, the integration of spatial frictions with learning-based bidding strategies has received limited attention, despite its relevance for understanding decentralized coordination in resource-constrained environments.

Traditional learning agents often rely on fixed rules [16] or supervised training [33], limiting their ability to adapt to dynamic market conditions and strategic interactions. Reinforcement Learning (RL) overcomes these limitations by allowing agents to learn optimal policies through trial and error, balancing exploration and exploitation in complex, evolving environments. In the context of double auctions, RL agents are particularly well suited due to the complex interactions between buyers and sellers and the continuous feedback provided by order book dynamics. Agents can be designed to adopt specific roles, such as deciding when to buy, sell, or set prices, enabling cooperative and strategic trading behaviors [5,37]. Multi-Agent Reinforcement Learning (MARL) extends this approach to settings with multiple interacting agents, allowing the simulation of bottom-up market dynamics [24,31,36,52]. MARL models have been shown to reproduce emergent behaviors in order books and centralized market microstructures by capturing interactions among heterogeneous agents [31]. This enables detailed analysis of price formation, liquidity, and strategic adaptation. However, existing approaches largely ignore spatial constraints, as these are typically absent in the stock market. Yet, such constraints can be critical in markets where transportation or delivery costs influence trading decisions [46]. RL with spatial considerations has been primarily explored in logistics, where carriers and shippers act as bidders in dynamic auctions mediated by centralized brokers [2,40,60,64]. Even when shippers themselves are modeled as learning agents [58], exchanges remain intermediary-mediated.

In contrast, many real-world IS transactions are inherently local and broker-free. To the best of our knowledge, RL has not yet been applied to decentralized, spatially constrained markets where transport costs and policy incentives influence agents’ strategies.

Taken together, three main research gaps emerge:

(1) Industrial symbiosis models rarely provide an in-depth mechanism for endogenous market-based price formation. (2) Auction and negotiation models seldom incorporate adaptive learning by agents. (3) Reinforcement learning in markets has not been applied to decentralized settings where spatial topology influences agents’ behavior. Our contribution is to bridge these strands by introducing a decentralized auction model for industrial symbiosis where (i) sellers adapt prices through reinforcement learning, (ii) spatial constraints are explicitly internalized, and (iii) both prices and traded quantities emerge from multilateral interactions. This framework connects industrial symbiosis modeling with mechanism design and multi-agent RL, allowing modelers to explore how local adaptive dynamics shape circularity outcomes. Our objective is to reproduce the systemic complexity of inter-firm interactions within industrial symbiosis as a subset of the broader circular economy, providing a foundational basis for modeling circular economic dynamics.

To concentrate on the essential dynamics of the model, we restrict our analysis to the exchange of a single byproduct. This simplification enables us to isolate and assess the effects of multilateral interactions and sellers’ strategies, without additional complexity introduced by multiple interdependent byproducts. We model a decentralized market for one byproduct, composed of buyers $\mathcal{B} = \{B_1, \dots, B_{N_B}\}$ and sellers $\mathcal{S} = \{S_1, \dots, S_{N_S}\}$. Each agent has a spatial location, and the distance between a buyer $B_i$ and a seller $S_j$ is denoted $d_{ij}$. Buyers have heterogeneous initial demands $q_i$ drawn from a scaled uniform distribution and evaluate incoming offers based on their price sensitivity $\beta_i$. Specifically, buyer $B_i$ accepts an offer $p_{ij}$ from seller $S_j$ only if $p_{ij} \le \beta_i p_m$, where $p_m$ is the reference market price. This mechanism ensures that buyers only engage in transactions that meet their cost expectations relative to the external market.

Sellers are endowed with quantities $q_j$ and propose personalized prices to buyers, defined as

$$p_{ij} = \phi_j \, p_m + c_{ij}.$$

The term $c_{ij}$ is originally defined as $c_{ij} = d_{ij} \, c_t$, where $c_t$ represents the transportation cost per kilometer (expressed as a percentage of the market price). This term can be further extended to capture the total logistics cost required for the byproduct provided by the seller to become usable by the buyer, including handling, treatment, or adaptation costs. Finally, $\phi_j$ is an adaptive pricing parameter that evolves during the learning phase.
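The pricing rule and the buyers' acceptance test can be illustrated with a minimal sketch. It uses the bid formula stated in Algorithm 1 (price equals the pricing parameter times the market price, plus distance times per-kilometer cost). The function names and the numeric values are illustrative assumptions, and the per-kilometer cost is treated here as being in the same units as the price.

```python
def offer_price(phi_j, p_m, d_ij, c_t):
    """Personalized price p_ij = phi_j * p_m + d_ij * c_t."""
    return phi_j * p_m + d_ij * c_t


def buyer_accepts(p_ij, beta_i, p_m):
    """A buyer only considers offers with p_ij <= beta_i * p_m."""
    return p_ij <= beta_i * p_m


# Illustrative values (assumptions, not taken from the paper's experiments).
p_m, c_t = 100.0, 0.1
near = offer_price(phi_j=0.8, p_m=p_m, d_ij=50.0, c_t=c_t)
far = offer_price(phi_j=0.8, p_m=p_m, d_ij=150.0, c_t=c_t)
near_ok = buyer_accepts(near, beta_i=0.9, p_m=p_m)
far_ok = buyer_accepts(far, beta_i=0.9, p_m=p_m)
```

With the same pricing parameter, the more distant buyer receives a higher quote and may fall outside its price tolerance, which is how distance differentiates otherwise identical sellers.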

In addition to market and transportation costs, sellers face a penalty $c_d$ for unsold quantities (also expressed as a percentage of the market price), representing the cost of sending residual flows to landfill and reflecting both economic and environmental impacts. If a seller $S_j$ ends a timestep with $q_j^{\text{unsold}}$ units remaining, its reward is reduced by $q_j^{\text{unsold}} \, c_d$, imposing a direct financial penalty that encourages sellers to adjust pricing strategies to minimize waste while still aiming to maximize profit.

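To characterize market conditions, we define the scarcity level of the byproduct as

$$s = \frac{\sum_{B_i \in \mathcal{B}} q_i}{\sum_{S_j \in \mathcal{S}} q_j},$$

that is, the ratio of total demand to total supply, so that $s > 1$ corresponds to excess demand.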

This measure allows us to systematically investigate the effects of supply-demand imbalances on agents’ behavior in the decentralized market.

Each seller $S_j$ aims to maximize its cumulative expected profit at each simulation step. Let $\mathbf{q}_j^t = (q_{ij}^t)_{i \in \mathcal{B}}$ denote the vector of quantities bought by each $B_i$ from $S_j$ at step $t$, and $\mathbf{p}_j^t = (p_{ij}^t)_{i \in \mathcal{B}}$ the corresponding price vector. Defining $\mathbf{c}_j = (c_{ij})_{i \in \mathcal{B}}$ as the transport cost vector, the instantaneous profit can be written compactly as

$$\pi_j^t = \langle \mathbf{q}_j^t, \; \mathbf{p}_j^t - \mathbf{c}_j \rangle - c_d \, q_j^{\text{unsold}},$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product. The first term corresponds to the net revenue from executed contracts, while the second term penalizes unsold inventory through the landfill cost $c_d$. This spatially defined market closely resembles a spatial Bertrand oligopoly [30], where multiple sellers compete by setting prices and buyers prefer lower-cost options. Unlike the classical Bertrand model, where identical products drive prices toward marginal cost, the inclusion of transport costs differentiates sellers spatially, allowing local price variation and creating room for adaptive strategies to emerge.
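The two-term structure of the profit (net revenue from executed contracts minus a landfill penalty on unsold stock) can be sketched as follows. This is a minimal illustration assuming that net revenue is the dot product of sold quantities with price minus transport cost, as the description of the first term suggests; the function name and numbers are hypothetical.

```python
def instantaneous_profit(quantities, prices, transport_costs, q_unsold, c_d):
    """Net revenue over all buyers minus the penalty on unsold units.

    quantities, prices and transport_costs are per-buyer sequences of
    equal length; q_unsold is the seller's remaining stock.
    """
    net_revenue = sum(
        q * (p - c) for q, p, c in zip(quantities, prices, transport_costs)
    )
    return net_revenue - q_unsold * c_d


# Illustrative example: three buyers, the second one bought nothing.
profit = instantaneous_profit(
    quantities=[10, 0, 5],
    prices=[85.0, 90.0, 95.0],
    transport_costs=[5.0, 10.0, 15.0],
    q_unsold=5,
    c_d=20.0,
)
```

Because the penalty grows with unsold stock, a seller that prices too high can end a step with negative profit even without selling at a loss.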

We model the interactions between buyers and sellers using a decentralized multilateral auction described in Algorithm 1. Each seller submits a bid to each buyer, where each bid specifies a price and quantity for a given product $p$, computed on the basis of the logistic costs required to deliver the product to the buyer’s location and the external market price. Buyers evaluate the incoming bids and select the one that maximizes their utility, subject to their price sensitivity $\beta_i$ and demand $q_i$. Selected bids are then sent back as proposals to sellers. Each seller chooses a subset of the received proposals to accept in order to maximize expected profit, taking into account transport costs $c_{ij}$ and remaining inventory $q_j$. Contracts are executed, updating both buyers’ needs and sellers’ available quantities. This process iterates until no further mutually acceptable bids can be matched. Sellers adapt their pricing strategy $\phi_j$ over time using a strategy based on historical rewards, described in Section 3.3.

Algorithm 1
Require: Buyers $\mathcal{B}$, Sellers $\mathcal{S}$, product $p$
1: while contracts have been made at the previous timestep do
2:   Seller bids: each $S_j \in \mathcal{S}$ submits a bid $(q_j, p_{ij})$ to each $B_i \in \mathcal{B}$, where $q_j$ is the maximum quantity available for $S_j$, and $p_{ij} = \phi_j \, p_m + d_{ij} \, c_t$
3:   Buyer choice: each $B_i$ evaluates the received bids, selects the most attractive one within price tolerance $\beta_i \, p_m$, and sends an acceptance proposal to the corresponding seller
4:   Seller allocation: each $S_j$ sorts accepted bids by expected profit per sold unit, confirms contracts until inventory is exhausted, and updates sold quantity and profit
5:   Remove satisfied buyers and empty sellers from $\mathcal{B}$ and $\mathcal{S}$
6: end while
7: return executed contracts
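The auction loop can be sketched in a few lines of Python. This is an illustrative reading of Algorithm 1, not the authors' implementation: it assumes that a buyer's "most attractive" bid is the cheapest acceptable one, that ties between proposals are broken by buyer identifier, and that removing satisfied buyers and empty sellers is equivalent to skipping agents with zero need or stock. The data layout (`buyers`, `sellers`, `dist`) is hypothetical.

```python
def run_auction(buyers, sellers, dist, p_m, c_t):
    """buyers: {i: {"need": q_i, "beta": beta_i}}
    sellers: {j: {"stock": q_j, "phi": phi_j}}
    dist: {(i, j): d_ij}
    Returns executed contracts as (buyer, seller, quantity, price)."""
    need = {i: b["need"] for i, b in buyers.items()}
    stock = {j: s["stock"] for j, s in sellers.items()}
    contracts = []
    traded = True
    while traded:  # stop once an iteration produces no contract
        traded = False
        proposals = {}  # seller -> [(unit margin, buyer, price)]
        for i, q_i in need.items():
            if q_i <= 0:  # satisfied buyers no longer take part
                continue
            offers = []
            for j, q_j in stock.items():
                if q_j <= 0:  # empty sellers no longer bid
                    continue
                transport = dist[(i, j)] * c_t
                price = sellers[j]["phi"] * p_m + transport
                if price <= buyers[i]["beta"] * p_m:
                    offers.append((price, j, transport))
            if offers:
                # Assumption: utility is maximized by the lowest price.
                price, j, transport = min(offers)
                proposals.setdefault(j, []).append((price - transport, i, price))
        for j, props in proposals.items():
            # Highest unit margin first; ties broken by buyer id.
            props.sort(key=lambda x: (-x[0], x[1]))
            for _margin, i, price in props:
                qty = min(stock[j], need[i])
                if qty <= 0:  # inventory exhausted
                    break
                stock[j] -= qty
                need[i] -= qty
                contracts.append((i, j, qty, price))
                traded = True
    return contracts


# Illustrative run: two buyers compete for one seller's 15 units.
example = run_auction(
    buyers={0: {"need": 10, "beta": 1.0}, 1: {"need": 10, "beta": 1.0}},
    sellers={0: {"stock": 15, "phi": 0.8}},
    dist={(0, 0): 10.0, (1, 0): 20.0},
    p_m=100.0,
    c_t=0.1,
)
```

Every executed contract either satisfies a buyer or empties a seller, so the loop terminates after a finite number of iterations.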

It has been shown by [4] that Bertrand competition problems can be effectively learned using bandit algorithms. Accordingly, each seller’s sequential pricing decision is modeled as a contextual multi-armed bandit. The action space $\Phi$ is discretized into $K$ intervals, $\forall \, 0 < k \le K$:

where $\phi_{\min} = -\frac{c_d}{p_m}$. At every timestep $t$, seller $S_j$ selects a discrete pricing action $\phi_j^t \in \Phi$ based on the current market context and receives a reward $r_j^t$. Each interval is associated with a weight $w_k$ representing the expected reward. When an action is chosen and the associated reward is obtained, the weights are updated via an exponential moving average (EMA):

$$w_k \leftarrow (1 - \alpha) \, w_k + \alpha \, r_j^t.$$

Here, $\alpha \in [0, 1]$ is the smoothing factor, which balances the influence of past and recent observations. By prioritizing recent data, EMA allows rapid adaptation to changing conditions while remaining computationally efficient, requiring only constant-time updates per action. Although it does not offer formal convergence guarantees, EMA provides a practical compromise between adaptivity and scalability, and maintains high learning efficiency [6].
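A constant-time EMA update of the weight attached to the chosen interval can be sketched as below. The sketch assumes the common convention in which the smoothing factor multiplies the newest reward; the function name and values are illustrative.

```python
def ema_update(weights, k, reward, alpha):
    """Update in place the weight of the chosen interval k.

    Assumed convention: alpha weights the new reward,
    (1 - alpha) weights the previous estimate.
    """
    weights[k] = (1.0 - alpha) * weights[k] + alpha * reward
    return weights


# Illustrative use: the same action is rewarded twice with 10.
w = [0.0, 0.0, 0.0]
ema_update(w, k=1, reward=10.0, alpha=0.2)  # estimate moves to 2.0
ema_update(w, k=1, reward=10.0, alpha=0.2)  # estimate moves to 3.6
```

Only the chosen interval changes, so the estimates of rarely played actions stay stale until exploration revisits them.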

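Sellers select their next action using Boltzmann (softmax) sampling [55] with a declining temperature $\tau_t$:

$$P_k = \frac{\exp(w_k / \tau_t)}{\sum_{l=1}^{K} \exp(w_l / \tau_t)}, \qquad \tau_t = \max\left(\tau_{\min}, \; \tau_0 \cdot \text{decay}^{\,t}\right),$$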

where $\tau_0$ sets the initial exploration level, decay controls the annealing rate, and $\tau_{\min}$ prevents premature convergence. This ensures exploration dominates early in learning, while the policy gradually becomes greedier over time. The subsequent action $\phi_j^{(t+1)}$ is drawn at random from the set of actions according to probabilities $P_k$. This sampling strategy balances exploration and exploitation in a multi-agent, stochastic environment. In our simulations, Boltzmann exploration consistently converged faster and with less oscillation than $\epsilon$-greedy or UCB, which often struggled with noise from multiagent interactions [39]. By combining a discretized action space, adaptive reward estimates, and Boltzmann sampling, the model captures the decentralized emergence of price formation and allocation dynamics in a spatially distributed market for byproducts. While Boltzmann does not always guarantee optimal regret bounds in multi-agent stochastic settings, its smooth adaptation makes it practical and effective for this scenario.
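Boltzmann sampling with an annealed temperature can be sketched as follows. The schedule used here (multiplicative decay with a floor) is an assumption consistent with the roles described for the initial temperature, the decay rate, and the minimum temperature; the decay value 0.996 is the one reported in the experimental setup, while the other numbers are hypothetical.

```python
import math
import random


def temperature(t, tau0, decay, tau_min):
    """Assumed schedule: geometric decay from tau0, floored at tau_min."""
    return max(tau_min, tau0 * decay ** t)


def boltzmann_probs(weights, tau):
    """Softmax over weights at temperature tau (max-shifted for stability)."""
    m = max(weights)
    exps = [math.exp((w - m) / tau) for w in weights]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(weights, tau, rng):
    """Draw an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(weights, tau)
    return rng.choices(range(len(weights)), weights=probs, k=1)[0]


# Illustrative use with hypothetical weights and temperature bounds.
rng = random.Random(0)
weights = [1.0, 2.0, 3.0]
tau_early = temperature(0, tau0=1.0, decay=0.996, tau_min=0.05)
tau_late = temperature(2000, tau0=1.0, decay=0.996, tau_min=0.05)
action = sample_action(weights, tau_late, rng)
```

At high temperature the probabilities are close to uniform; as the temperature falls, mass concentrates on the interval with the largest weight, which is the gradual shift from exploration to exploitation described above.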

We finally introduce a symbiosis index (SI) to quantify the efficiency of local byproduct exchanges. It is defined as a ratio in which $q_{\text{bought}}$ denotes the quantity actually purchased, $q_{\text{toSell}}$ the quantity offered by the seller, and $q_{\text{needed}}$ the quantity required by the buyer. The denominator normalizes the indicator so that it cannot exceed 1, even when the sold quantity is greater than the buyer’s need. This ensures that the symbiosis index reflects the proportion of demand satisfied rather than rewarding oversupply. In contrast to existing technical indicators [25,27,57,65], which often incorporate numerous parameters (e.g., material compatibility, lifecycle impacts, or multi-criteria environmental scores), our simplified formulation deliberately focuses only on exchanged quantities. This avoids unnecessary complexity while remaining sufficient for capturing efficiency in the present modeling framework.
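One possible reading of the index is sketched below. The exact formula is not recoverable from the text here, so the choice of the smaller of offered and needed quantity as the denominator is an assumption, picked because it keeps the ratio at or below 1 as the description requires. The function name is hypothetical.

```python
def symbiosis_index(q_bought, q_to_sell, q_needed):
    """Share of the feasible exchange that actually took place.

    Assumption: the denominator is min(q_to_sell, q_needed), the largest
    quantity that could have been exchanged.
    """
    feasible = min(q_to_sell, q_needed)
    if feasible <= 0:
        return 0.0  # nothing could be exchanged
    return q_bought / feasible


# Illustrative values: 100 units offered, 40 needed, 30 exchanged.
si = symbiosis_index(q_bought=30, q_to_sell=100, q_needed=40)
```

Under this reading, oversupply does not inflate the index: offering 100 units when only 40 are needed yields the same value as offering exactly 40.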

The model is implemented in a simulation framework that supports decentralized markets with heterogeneous agents. Buyers and sellers are spatially distributed across a territory, which can be represented either with an abstract 2D environment or with a real geographic area with road network constraints. Distances, travel times, and transport costs are computed based on the underlying spatial representation, allowing realistic modeling of logistical constraints. The businesses interact through the auction mechanism described earlier, with offers, acceptances, and executed contracts dynamically updating inventories and demands.

The framework tracks a variety of systemic metrics over time, including:

• Local prices and their evolution;

• Proportion of demand satisfied locally (symbiosis index);

• Network connectivity and trading patterns;

• Distance travelled by the sold byproducts.

These indicators form the backbone of our analysis, as they connect local agent decisions to measurable systemic dynamics. This flexibility enables us to investigate a wide range of experimental settings, from dense and highly interconnected territories to sparse environments where exchanges are scarce. Visualization tools provide spatial and temporal representations of market dynamics, price formation, and circularity indicators. The code implementation is available on GitLab.

For reasons similar to those behind our decision to focus on a single product, we refrain from using a real-world territory and instead construct virtual environments. This approach provides precise control over key parameters such as firm density, territorial extent, and spatial clustering, thereby enabling a systematic exploration of their influence on circularity indicators and overall market performance. One of the key factors shaping the potential for cooperation is the spatial dispersion of firms. We therefore define the cluster spread parameter ($cs$) as the standard deviation of companies’ positions around cluster centers, normalized by the environment width: small values lead to compact, well-separated clusters, while setting it to 1 produces a configuration comparable to a fully uniform distribution across the environment, effectively removing any clustering effect. Figure 1 shows the difference between a low (1a) and high (1b) spread value. In our experiments, businesses were allocated into four spatial clusters. By default, the cluster spread parameter was set to 1, indicating a broad dispersion and minimal clustering. The results reported here come from simulations conducted with a population of 40 firms. The market price was fixed at 100, and the transportation cost was set to 0.1 per kilometer. Sellers adjusted their strategies through Boltzmann exploration, with a temperature decay parameter set to 0.996 and the action space discretized into 30 bins. The relatively small scale of the simulated market was chosen to balance computational tractability with the ability to capture complex multi-agent interactions. This population size allows us to conduct thousands of simulation runs for systematic sensitivity analyses and counterfactual experiments, which would be computationally prohibitive at larger scales.
Furthermore, each simulation runs for a large number of timesteps (1000 steps) to ensure that seller strategies have sufficient time to adapt and converge, and that emergent patterns of price formation and allocation stabilize. By prioritizing repeated experimentation and long-horizon dynamics over absolute market size, we focus on understanding the fundamental mechanisms driving spatial effects, adaptive bidding behavior, and the emergence of circularity. Our implementation is optimized for performance: on a standard laptop processor (Intel(R) Core(TM) Ultra 9 185H processor with 21 cores), it achieves approximately 100 simulations of 1000 timesteps per minute.
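A virtual territory with clustered firms can be generated along the following lines. This is a sketch under stated assumptions: positions are drawn from a Gaussian around each cluster center with standard deviation equal to the cluster spread times the environment width, firms are assigned to clusters in round-robin order, and coordinates are clipped to the environment. None of these choices is confirmed by the text; only the population size (40) and the number of clusters (4) come from the setup described above.

```python
import random


def generate_positions(n_firms, n_clusters, cluster_spread, width, rng):
    """Return a list of (x, y) firm positions in a width x width square."""
    centers = [
        (rng.uniform(0.0, width), rng.uniform(0.0, width))
        for _ in range(n_clusters)
    ]
    sigma = cluster_spread * width  # spread is normalized by the width
    positions = []
    for f in range(n_firms):
        cx, cy = centers[f % n_clusters]  # assumed round-robin assignment
        x = min(max(rng.gauss(cx, sigma), 0.0), width)  # assumed clipping
        y = min(max(rng.gauss(cy, sigma), 0.0), width)
        positions.append((x, y))
    return positions


# Illustrative use: compact clusters versus broad dispersion.
rng = random.Random(0)
compact = generate_positions(40, 4, cluster_spread=0.02, width=100.0, rng=rng)
dispersed = generate_positions(40, 4, cluster_spread=1.0, width=100.0, rng=rng)
```

Varying only the spread while keeping the number of firms and the width fixed isolates the effect of spatial compactness from that of density.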

In this section, we examine the core dynamics of the decentralized, spatially explicit market, focusing on conditions for price convergence and the efficiency of adaptive strategies using counterfactual regret analysis. We then explore how spatial organization, resource scarcity, disposal costs, and density interact to shape market outcomes and local circularity. Finally, a global sensitivity analysis quantifies the influence of individual parameters and their interactions on emergent symbiosis patterns.

We first explore empirically the evolution of transaction prices as sellers adapt their strategies under different scarcity and disposal cost conditions, with density fixed at 0.001. Figures 2a and 2b show the mean price trajectories over time compared to the external market price and the disposal cost. As anticipated given the decaying temperature, the curves gradually stabilize, approaching an equilibrium after sufficient simulation steps. In the high scarcity and low disposal cost setting (Figure 2a), prices converge toward an equilibrium close to the external market price. By contrast, in the low scarcity and medium disposal cost scenario (Figure 2b), prices decline steadily and stabilize near the disposal cost, reflecting buyers’ stronger bargaining power and the reduced competitive pressure on resource demand. To better understand the efficiency and stability of the sellers’ adaptive strategies, we next examine the dynamics at the level of individual firms using counterfactual regret analysis [66], which quantifies how much a seller could have improved its payoff by choosing alternative actions.

Let $\phi_j^{(t)} \in \Phi_j$ denote the action used by seller $S_j$ at timestep $t$, and $r_j^{(t)}(\phi_j^{(t)})$ denote the reward received at that step. The optimal reward at time $t$ is

$$r_j^{*(t)} = \max_{\phi \in \Phi_j} r_j^{(t)}(\phi),$$

where $r_j^{(t)}(\phi)$ is the reward seller $S_j$ would have obtained at step $t$ if it had chosen action $\phi$ while all other sellers’ actions remained unchanged. The regret of seller $S_j$ at time $t$ is then

$$R_j^{(t)} = \max_{\phi \in \Phi_j} r_j^{(t)}(\phi) - r_j^{(t)}(\phi_j^{(t)}).$$

A positive regret indicates that the seller could have achieved a higher payoff at that step by selecting a different action, whereas a regret of zero indicates that the chosen action was optimal given the choices of the other sellers at that moment. Empirically, we observe that per-step total regret, the aggregate deviation from a joint optimum, declines over time and converges toward zero in both scenarios (Figure 3). Since per-step regret measures the immediate payoff loss relative to the best fixed action in hindsight at that step, a vanishing regret implies that sellers’ strategies are nearly optimal given the current actions of others. Consequently, the joint strategy profile of all sellers appears to converge toward an approximate Nash equilibrium, providing justification for the stability of the emergent market dynamics under our MARL learning scheme.
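The per-step regret computation can be sketched as follows. The counterfactual reward function is assumed to re-evaluate the market for one seller's alternative action while every other seller's action stays fixed; here it is replaced by a hypothetical lookup table so that the example is self-contained.

```python
def per_step_regret(reward_fn, actions, chosen):
    """Best achievable reward at this step minus the reward actually obtained.

    reward_fn(action) must return the reward the seller would have received
    with that action, all other sellers' actions being held fixed.
    """
    best = max(reward_fn(a) for a in actions)
    return best - reward_fn(chosen)


def total_regret(regrets):
    """Aggregate per-step regret over all sellers."""
    return sum(regrets)


# Hypothetical counterfactual rewards for one seller at one step.
rewards = {0.2: 160, 0.4: 240, 0.6: 230, 0.8: 150}
actions = list(rewards)
regret_high_price = per_step_regret(rewards.get, actions, chosen=0.8)
regret_best_price = per_step_regret(rewards.get, actions, chosen=0.4)
```

A regret of zero for every seller at a given step means that no seller could gain by deviating unilaterally at that step, which is the condition the Nash interpretation above relies on.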

In this section, we examine how firm density, resource scarcity, and disposal costs shape the dynamics of transaction prices and circularity. We focus on these output metrics because they directly quantify the level of local circularity achieved in the market and the average exchange price, making them relevant proxies for assessing the effectiveness of different policy levers. By systematically varying firm density, resource scarcity, and disposal costs, we analyze how structural and economic factors influence sellers’ adaptive strategies, shaping both the emergence of local symbiosis and overall market outcomes (Figures 4 and 5). On the price side (Figure 4), we observe that, as expected, higher disposal costs induce sellers to lower their prices in order to avoid the landfill penalty. This effect is strongest in low scarcity environments ($s = 0.5$), where intense competition can even drive equilibrium prices below zero in dense territories. At intermediate scarcity ($s = 1$), the downward trend remains but is attenuated, as sellers retain partial market power. Under high scarcity ($s = 2$), prices remain largely insensitive to $c_d$: the excess demand ensures that buyers continue to pay close to the outside option $p_m$, leaving sellers with little incentive to adjust their strategies. Interestingly, at the demand equilibrium ($s = 1$), we observe higher variance, reflecting the instability and potential oscillations in companies’ strategies as they adapt to competitive pressures.

Symbiosis levels (Figure 5) exhibit a complementary pattern: the share of demand met through local exchanges increases with $c_d$, as the disposal penalty strengthens the incentive to form local contracts. Here, density plays a critical role: in dense environments, local exchanges dominate rapidly and symbiosis saturates close to one; in sparse environments, by contrast, symbiosis remains low even under high $c_d$, reflecting structural limitations in firms’ connectivity. In contrast to its effect on prices, scarcity has only a marginal effect on symbiosis: higher scarcity slightly enhances local exchanges, but the dominant factors remain disposal cost and firm density.

The previous analysis showed that market outcomes are shaped by interacting mechanisms, but it did not allow us to fully disentangle their relative influence. To address this limitation and account for the influence of additional variables, we conducted a variance-based global sensitivity analysis [49]. We evaluate the influence of the simulation parameters on the symbiosis index and converged mean price measured at the end of each simulation run, after 1000 timesteps. By analysing which parameters exert the greatest influence on the price and symbiosis index, we can identify the factors that policymakers should prioritize when designing interventions to foster circularity while keeping prices at a low level. We simulate for varying values of the following five variables: firm density, scarcity $s$, disposal cost $c_d$, per-kilometer transport cost $c_t$, and cluster spread $cs$. To efficiently explore the parameter space, we employed surrogate models, which act as fast emulators of the full simulation while capturing the key input-output relationships [51]. Specifically, we implemented a sparse Polynomial Chaos Expansion (PCE), enabling the analytical derivation of Sobol’ indices from its coefficients [54]. The surrogate models demonstrated strong predictive performance, achieving an $R^2$ of 95% for the symbiosis and 92% for the price on a 20% test sample. Parameter sampling was performed using Latin hypercube sampling to train surrogate models, with $s$ sampled on a $\log_2$ scale and density on a $\log_{10}$ scale. For each parameter set, two independent simulations were executed to account for stochastic variability in agent decisions, resulting in a total of 20,000 independent simulations.
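Latin hypercube sampling with logarithmic scaling for selected parameters can be sketched with the standard library alone. The parameter ranges below are hypothetical placeholders (the sampled ranges are not given in this section); only the choice of a base-2 scale for scarcity and a base-10 scale for density follows the text.

```python
import math
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Points in the unit cube with exactly one point per stratum
    along every dimension."""
    columns = []
    for _ in range(n_dims):
        col = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(col)  # decouple the strata across dimensions
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def scale_linear(u, low, high):
    """Map u in [0, 1] linearly onto [low, high]."""
    return low + u * (high - low)


def scale_log(u, low, high, base):
    """Map u in [0, 1] onto [low, high], uniformly in log space."""
    lo, hi = math.log(low, base), math.log(high, base)
    return base ** (lo + u * (hi - lo))


# Hypothetical ranges, for illustration only.
rng = random.Random(42)
unit_points = latin_hypercube(n_samples=10, n_dims=3, rng=rng)
designs = [
    {
        "scarcity": scale_log(u[0], 0.5, 4.0, 2),
        "density": scale_log(u[1], 1e-4, 1e-2, 10),
        "disposal_cost": scale_linear(u[2], 0.0, 100.0),
    }
    for u in unit_points
]
```

Sampling on a log scale spends as many design points between 0.5 and 1 as between 2 and 4, which matters when the response changes by ratios rather than by absolute differences.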

Sobol’ sensitivity analysis decomposes the variance of the model output into orthogonal contributions attributable to individual input parameters and their interactions. We computed the full set of Sobol’ indices: first-order indices quantify the direct effect of each parameter on output variance, while higher-order indices capture interaction effects amongst parameters [9]. Taken together, Sobol’ indices provide a rigorous and comprehensive assessment of the relative importance of independent inputs and their interactions, with their sum accounting for 100% of the variance.
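For reference, the standard variance decomposition behind these indices can be written out for a model output $Y = f(X_1, \dots, X_d)$ with independent inputs. The notation ($V_u$, $S_u$) is introduced here for illustration and is not taken from the paper.

```latex
\begin{align}
  \operatorname{Var}(Y)
    &= \sum_{i} V_i + \sum_{i<j} V_{ij} + \dots + V_{1 2 \dots d},
  \\
  V_i &= \operatorname{Var}\!\big(\mathbb{E}[\,Y \mid X_i\,]\big),
  \qquad
  V_{ij} = \operatorname{Var}\!\big(\mathbb{E}[\,Y \mid X_i, X_j\,]\big) - V_i - V_j,
  \\
  S_u &= \frac{V_u}{\operatorname{Var}(Y)}
  \quad \text{for every non-empty } u \subseteq \{1, \dots, d\},
  \qquad
  \sum_{u} S_u = 1 .
\end{align}
```

There is one index per non-empty subset of inputs, hence $2^d - 1$ indices in total, which gives $2^5 - 1 = 31$ for five input variables.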

Table 1 reports only the 12 Sobol’ indices, out of 31, that exceed 1% of the total variance for either the symbiosis or the price output. The Sobol’ analysis confirms the strong influence of firm density: higher density reduces the average distance between agents and intensifies competition, explaining its impact on both model outcomes. As expected, price is also strongly affected by scarcity, and both outputs are influenced by transport costs. Interestingly, higher-order interactions contribute substantially to the variance, emphasizing the importance of interplay effects between parameters. Although density is highly influential for both our symbiosis and price objectives, in practice it is constrained by the considered territory and is not directly actionable. Therefore, in the subsequent analysis, we focus on the 4 other input variables under different fixed density scenarios, since their interaction effects with density are all significant even at the third order. To better capture the non-linear effects, we replace the sparse PCE with a Multi-Layer Perceptron neural network surrogate model ($R^2$ of 97% for the symbiosis and 95% for the price), and use it to predict both the Partial Dependence Plots (PDP) and Individual Conditional Expectations (ICE) [45], as illustrated in Figure 6. In these plots, each colored line corresponds to an ICE, showing how the outcome of a single simulation responds when varying one parameter while holding others fixed; the black line is the PDP, representing the average effect across all simulations, and the color of each ICE line encodes the underlying firm density.
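The relationship between ICE curves and the PDP can be made concrete with a short sketch: an ICE curve re-evaluates the surrogate for one sample while sweeping a single feature, and the PDP is the pointwise average of all ICE curves. The `model` callable stands in for any fitted surrogate; the toy linear model and data are hypothetical.

```python
def ice_curve(model, x, feature, grid):
    """Predictions for one sample as `feature` sweeps over `grid`,
    all other features held at their observed values."""
    curve = []
    for value in grid:
        x_mod = list(x)
        x_mod[feature] = value
        curve.append(model(x_mod))
    return curve


def partial_dependence(model, samples, feature, grid):
    """Pointwise average of the ICE curves of all samples."""
    curves = [ice_curve(model, x, feature, grid) for x in samples]
    n = len(curves)
    return [sum(c[g] for c in curves) / n for g in range(len(grid))]


def toy_model(x):
    """Hypothetical surrogate: a simple linear response."""
    return x[0] + 2 * x[1]


samples = [[0, 1], [0, 3]]
grid = [0, 1, 2]
ice = [ice_curve(toy_model, x, 0, grid) for x in samples]
pdp = partial_dependence(toy_model, samples, 0, grid)
```

When ICE curves are parallel, as in this additive toy model, the swept feature does not interact with the others; curves that fan out or cross indicate interactions such as the density dependence discussed above.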

First, for the price in Figure 6a, the PDP and ICE plots are consistent with the Sobol’ analysis in Table 1. In the blue low-density cases, the exchange price remains close to the market price and is only weakly influenced by other variables. As density increases, however, the price tends to decrease. The most influential variable is Scarcity, which exhibits a threshold effect: at low levels, scarcity has little impact, but between 1.5 and 2.5 it strongly drives the price upward. Beyond 2.5, the effect persists but grows at a slower rate.

We also observe that increasing the disposal cost provides a strong and linear incentive for price reduction; the higher the density, the stronger the effect. Finally, cluster spread and kilometer cost have more limited influence, suggesting that they are less effective levers for intervention on the price outcome.

Second, for circularity in Figure 6b, we again observe a pronounced density effect: higher density leads to greater symbiosis, largely independent of other variables. At low density levels, cluster spread emerges as an important factor, but only when it takes on small values. As spread increases, its influence fades rapidly, suggesting that spatial compactness can partially compensate for low density but only within narrow limits. Reducing travel costs appears more consistently actionable: at low density, for example, a per-kilometer cost below 2 is required to reach any desirable symbiosis value. Under the same low-density conditions, increasing the disposal cost also provides a clear incentive to exchange byproducts, whereas at high density its effect is only marginal. Finally, at low-to-mid density, scarcity has a notable effect only above the critical threshold of 2.5.

In summary, firm density is a primary determinant of both symbiosis and price, though it is mostly exogenous and difficult to influence directly through policy. Amongst the controllable parameters, transport cost (c_t) and disposal penalty (c_d) emerge as effective levers. The sensitivity results indicate that reducing per-kilometer transport costs, through measures such as shared logistics or targeted subsidies, can substantially enhance exchange opportunities in low-density areas. Similarly, higher disposal penalties consistently incentivize firms to seek symbiotic exchanges instead of wasteful disposal.

Our simulations highlight several insights into the dynamics of decentralized byproduct markets. First, the emergent pricing behavior shows that the RL process enables sellers to adapt efficiently to local conditions. The counterfactual regret analysis further demonstrates that the learned strategies converge toward a near Nash equilibrium, suggesting that decentralized adaptive learning can yield stable and rational outcomes without requiring a centralized market clearing mechanism. This finding underscores the potential of bottom-up coordination mechanisms in fostering circular exchanges.
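As an illustration of what low counterfactual regret means for a single seller, the sketch below compares the payoff of the ask actually chosen with the best payoff any alternative ask would have earned against the same fixed bid. The payoff rule (seller bears the transport cost, unsold units incur the disposal cost), the parameter values, and all function names are assumptions made for this example. The paper's exact regret definition is not reproduced here.

```python
import numpy as np

def seller_payoff(ask, best_bid, transport_cost, disposal_cost):
    """Toy one-unit payoff: trade happens if the ask does not exceed the best bid."""
    if ask <= best_bid:
        return ask - transport_cost  # assumption: seller pays transport
    return -disposal_cost  # unsold byproduct must be disposed of

def counterfactual_regret(chosen_ask, candidate_asks, **market):
    """Gap between the best alternative payoff in hindsight and the realized payoff."""
    realized = seller_payoff(chosen_ask, **market)
    best = max(seller_payoff(a, **market) for a in candidate_asks)
    return max(0.0, best - realized)

market = dict(best_bid=10.0, transport_cost=2.0, disposal_cost=3.0)
candidates = np.linspace(0.0, 15.0, 31)  # hypothetical grid of alternative asks
regret_low = counterfactual_regret(9.5, candidates, **market)
regret_high = counterfactual_regret(12.0, candidates, **market)
```

An ask just under the best bid leaves little to gain from deviating, while an ask above it forfeits the trade and pays for disposal. A population of sellers whose regrets are all close to zero is, in this sense, close to a Nash equilibrium.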

The symbiosis index provides a tractable measure of local circularity and illustrates how territorial factors, scarcity, kilometer and disposal costs shape resource exchanges. Our results show that transportation costs and disposal penalties represent actionable levers, especially where density is not naturally high. Moreover, interactions among variables, including up to third-order effects, highlight that circularity outcomes result from the combined influence of spatial configuration and economic incentives rather than any single factor. These findings emphasize the importance of integrated, multivariate approaches when designing circular economy interventions.

Beyond methodological insights, the model offers a structured framework for evaluating policy instruments. The parameters of the simulation naturally correspond to concrete levers available to regulators. Increasing the disposal cost c_d can be interpreted as a landfill

While these policy-relevant insights highlight the practical applicability of our model, it is important to recognize several limitations that temper the generalizability of the results. The present model focuses on a single byproduct, which allows us to isolate core mechanisms of decentralized exchange but leaves aside interdependencies across multiple byproduct markets. Real-world industrial symbiosis often involves multi-product complementarities, cascading exchanges, and competition between alternative uses of resources. Likewise, our experiments relied on stylized virtual geographies rather than empirically calibrated territories. While this approach ensured control and replicability, it abstracts away from infrastructural constraints such as road networks or administrative boundaries that can strongly affect exchange feasibility. Another simplification concerns the temporal dimension: in our simulations, companies participate in auctions independently of their production cycles, whereas in practice production schedules and storage constraints strongly influence availability and willingness to trade. Furthermore, our agents trade without any memory of past interactions, while in reality firms build relationships over time, developing trust, reliability, and long-term contracting practices. Nonetheless, by abstracting from these complexities, the present work provides an indispensable first step toward systematically modeling spatially constrained markets for industrial symbiosis, upon which richer and more realistic frameworks can be built.

These limitations also point to promising directions for future work. Extending the model to multi-product environments would allow capturing substitution effects and synergies between resource flows. Introducing temporal dynamics and agent memory would bring the model closer to real-world conditions, enabling the study of production rhythms, storage capacities, and relational contracts. Grounding the simulations in empirical case studies of specific territories would make it possible to validate predictions and directly inform policy design. Ultimately, developing a panel of decision parameters that systematically map simulation outcomes to actionable policies could provide decision makers with a robust tool to design interventions that align individual incentives with collective circular economy objectives.

We introduced a decentralized multi-agent model for local byproduct exchanges, capturing how sellers adapt prices through RL under spatial and economic constraints. Simulations show that adaptive decentralized strategies can produce stable, efficient, and circular market outcomes, as evidenced by low counterfactual regret and high symbiosis indices. Our results highlight two complementary contributions.

First, from a policy design perspective, the simulator is the first step toward a decision support tool. By linking agent-level parameters to macro-level outcomes such as the symbiosis index, the model allows policymakers to identify which levers most strongly affect circularity. This provides a controlled environment where different regulatory scenarios can be tested before real-world implementation. The emergent insights from our simulations help policymakers prioritize interventions that effectively promote circular practices across heterogeneous firms and contexts.

Second, from a MAS perspective, the work provides a new methodological understanding of decentralized exchange mechanisms under spatial and economic constraints. We show how RL agents, interacting in a double-auction market with heterogeneous resources and transport costs, can converge to stable trading patterns. Beyond the application domain, our contribution lies in revealing how market-based coordination and learning dynamics shape emergent equilibria.

Proc. of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), C. Amato, L. Dennis, V. Mascardi, J. Thangarajah (eds.), May 25-29, 2026, Paphos, Cyprus. © 2026 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). This work is licenced under the Creative Commons Attribution 4.0 International (CC-BY 4.0) licence.

https://github.com/AAMAS746/aamas2026-746
