Learning long term climate-resilient transport adaptation pathways under direct and indirect flood impacts using reinforcement learning

Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the original arXiv source.

Climate change is expected to intensify rainfall and other hazards, increasing disruptions in urban transportation systems. Designing effective adaptation strategies is challenging due to the long-term, sequential nature of infrastructure investments, deep uncertainty, and complex cross-sector interactions. We propose a generic decision-support framework that couples an integrated assessment model (IAM) with reinforcement learning (RL) to learn adaptive, multi-decade investment pathways under uncertainty. The framework combines long-term climate projections (e.g., IPCC scenario pathways) with models that map projected extreme-weather drivers (e.g., rainfall) into hazard likelihoods (e.g., flooding), propagate hazards into urban infrastructure impacts (e.g., transport disruption), and value direct and indirect consequences for service performance and societal costs. Embedded in a reinforcement-learning loop, it learns adaptive climate adaptation policies that trade off investment and maintenance expenditures against avoided impacts. In collaboration with Copenhagen Municipality, we demonstrate the approach on pluvial flooding in the inner city for the horizon of 2024 to 2100. The learned strategies yield coordinated spatial-temporal pathways and improved robustness relative to baseline strategies, namely inaction and random action, illustrating the framework’s transferability to other hazards and cities.


💡 Research Summary

The paper presents a novel decision‑support framework that integrates long‑term climate projections, flood hazard modeling, urban transport simulation, and economic impact assessment within a reinforcement‑learning (RL) loop to generate adaptive, multi‑decadal adaptation pathways for urban transportation systems. The authors instantiate the framework for Copenhagen’s inner city, covering the period 2024‑2100, and evaluate it under three Representative Concentration Pathways (RCP2.6, RCP4.5, RCP8.5).

Integrated Assessment Model (IAM)
The IAM consists of four modular components: (1) a rainfall projection model that samples annual extreme‑rainfall events from scenario‑conditioned cumulative distribution functions; (2) a flood model (SCALGO Live) that converts sampled rainfall into spatially distributed water depths across the study area; (3) a transport simulation that extracts road, cycling, and pedestrian networks from OpenStreetMap, defines 29 traffic‑assignment zones (TAZs), and generates 84 000 origin‑destination trips using the Danish National Travel Survey. Travel times are computed on the shortest‑time routes, and depth‑disruption functions map water depth on each network segment to speed reductions, route changes, or trip cancellations. (4) An impact valuation module quantifies (i) direct infrastructure damage, (ii) travel delays, and (iii) trip cancellations, converting each into monetary losses using Danish construction cost estimates, value‑of‑time parameters, and established cancellation cost methods.
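The depth-disruption mapping can be sketched as a simple piecewise function; the thresholds and linear fall-off below are illustrative assumptions, not the paper's calibrated depth-disruption curves:

```python
def disruption(depth_m, slow_depth=0.05, cancel_depth=0.30):
    """Map water depth on a network segment to a relative speed factor.

    Hypothetical piecewise form: below `slow_depth` traffic is
    unaffected; above `cancel_depth` the segment is impassable, so
    trips using it are rerouted or cancelled; in between, speed
    falls off linearly.  Thresholds are illustrative only.
    """
    if depth_m <= slow_depth:
        return 1.0   # free-flow speed
    if depth_m >= cancel_depth:
        return 0.0   # segment closed
    # linear interpolation between the two thresholds
    return 1.0 - (depth_m - slow_depth) / (cancel_depth - slow_depth)
```

Applying such a factor to each flooded segment's free-flow speed is what lets the transport simulation turn a spatial water-depth map into longer travel times on shortest-time routes.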

Reinforcement‑Learning Formulation
The sequential decision problem is modeled as a Markov Decision Process. The state at each yearly step includes, for every TAZ, the current damage, delay, cancellation metrics, and a vector representing the residual effectiveness of already‑implemented interventions (which decay over time). The action space is zone‑specific and discrete, offering eight possible low‑impact, green‑infrastructure measures (e.g., bioretention planters, soakaways, storage tanks, porous asphalt, etc.). Once deployed, an intervention remains active for its predefined lifetime, and a masking mechanism prevents re‑selection of already‑active measures.
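The masking mechanism described above can be sketched as follows; the dict-based bookkeeping of expiry years is an illustrative choice, not the authors' implementation:

```python
def feasible_actions(active, year, n_measures=8):
    """Boolean action mask for one TAZ.

    `active` maps measure index -> expiry year of an already-deployed
    intervention.  A measure becomes re-selectable only once its
    predefined lifetime has elapsed, as described in the paper.
    """
    return [m not in active or active[m] <= year for m in range(n_measures)]

# deploy measure 2 in 2030 with a (hypothetical) 20-year lifetime
active = {2: 2030 + 20}
mask_2035 = feasible_actions(active, 2035)  # measure 2 still blocked
mask_2055 = feasible_actions(active, 2055)  # measure 2 available again
```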

The reward function is a monetized objective that penalizes the flood impacts actually incurred together with the costs of new investments and ongoing maintenance:

\( r_t = -\sum_i \big( I_{i,t} + D_{i,t} + C_{i,t} + A_{i,t} + M_{i,t} \big), \)

where, for zone \(i\) in year \(t\), \(I\) is direct infrastructure damage, \(D\) the cost of travel delays, \(C\) the cost of trip cancellations, \(A\) the cost of new investments, and \(M\) maintenance expenditure. The RL agent thus seeks to minimize the total expected discounted cost over the horizon.
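A minimal sketch of this yearly reward, assuming the five terms are pre-computed per zone in a common monetary unit:

```python
def reward(impacts):
    """Negative total monetized cost for one year.

    `impacts` lists per-zone tuples (I, D, C, A, M): infrastructure
    damage, delay cost, cancellation cost, new-investment cost, and
    maintenance cost.  Sketch of the stated objective, not the
    authors' implementation.
    """
    return -sum(I + D + C + A + M for I, D, C, A, M in impacts)
```

Because the reward is negative cost, maximizing the expected discounted return is equivalent to minimizing the expected discounted total cost.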

Policy representation uses a message‑passing Graph Neural Network (GNN) that ingests node features, performs L rounds of neighbor aggregation, and outputs per‑node logits. After applying the feasibility mask, a softmax yields a probability distribution over admissible actions for each zone. This architecture is permutation‑invariant, shares parameters across zones, and can be transferred to graphs of different sizes.
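A minimal NumPy sketch of the two key ingredients, shared-weight mean-aggregation message passing and a feasibility-masked softmax; dimensions, aggregation, and weight shapes are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def gnn_policy_logits(x, adj, w_self, w_nbr, rounds=2):
    """Mean-aggregation message passing over the zone graph.
    x: (n, d) node features; adj: (n, n) binary adjacency.
    The same weights are applied at every node, which is what makes
    the policy permutation-invariant and transferable to graphs of
    other sizes.  (Logit dim equals hidden dim here for brevity.)"""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    h = x
    for _ in range(rounds):
        h = np.tanh(h @ w_self + (adj @ h) / deg @ w_nbr)
    return h

def masked_softmax(logits, mask):
    """Zero out infeasible actions before normalizing."""
    z = np.where(mask, logits, -np.inf)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# toy example: 3 zones in a line, 4 features, untrained random weights
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
w = rng.normal(size=(4, 4)) * 0.1
probs = masked_softmax(gnn_policy_logits(x, adj, w, w),
                       np.ones((3, 4), dtype=bool))
```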

Training and Experiments
The environment is implemented as a Gymnasium wrapper in Python, and policies are trained with Proximal Policy Optimization (PPO) via Stable‑Baselines3. Training runs with 10 parallel environments for up to 4.5 million steps, using batch size 64, 1 024 environment steps per update, 10 epochs, entropy coefficient 0.01, and a PPO clipping parameter of 0.2 (PPO clips the probability ratio, not the KL divergence). Early stopping occurs when the average return plateaus. Results are reported over ten random seeds (mean ± standard deviation).
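The exact plateau criterion is not specified in the summary; one simple illustrative version compares rolling-mean returns over two consecutive windows (window size and tolerance are assumptions):

```python
def plateaued(returns, window=10, tol=0.01):
    """Stop training when the rolling-mean return improves by less
    than `tol` (relative) between consecutive windows.  Illustrative
    criterion only; the paper just states that training stops when
    the average return plateaus."""
    if len(returns) < 2 * window:
        return False
    prev = sum(returns[-2 * window:-window]) / window
    last = sum(returns[-window:]) / window
    return abs(last - prev) <= tol * max(abs(prev), 1e-8)
```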

Two baselines are considered under the RCP4.5 scenario: (1) No Control (NC), which never implements any measure, and (2) Random Control (RND), which selects actions uniformly at random each year.

Results
The learned RL policy outperforms both baselines. Compared with NC, the RL policy reduces the average annual total cost by roughly 15 % and achieves a 30 % higher avoided‑damage ratio under high‑intensity rainfall events. The policy exhibits a coherent spatial‑temporal investment pattern: early years favor low‑cost, low‑capacity measures (bioretention, soakaways) deployed broadly, while later years concentrate higher‑capacity interventions (storage tanks, permeable pavements) in strategically vulnerable zones. This emergent sequencing reflects the algorithm’s ability to balance immediate cost savings against long‑term risk mitigation, a trade‑off that static optimization approaches typically miss.

Discussion and Limitations
The flood model assumes uniform rainfall over the entire catchment, neglecting temporal distribution and storm‑track variability, which may under‑represent peak discharge dynamics. Cost and effectiveness parameters are taken from the literature rather than calibrated to local conditions, which limits the reliability of sensitivity analyses. The current objective focuses solely on monetary cost; other societal objectives (e.g., equity, carbon emissions) are not incorporated.

Future Work
The authors propose extending the framework with high‑resolution stochastic rainfall generators, Bayesian uncertainty propagation, and multi‑objective RL to capture additional sustainability criteria. They also suggest applying the methodology to other hazards (e.g., heatwaves, sea‑level rise) and to different urban contexts to test transferability.

In summary, the paper demonstrates that coupling an integrated assessment model with reinforcement learning can automatically discover robust, cost‑effective, long‑term adaptation pathways for urban transport under deep climate uncertainty, offering a scalable tool for city planners seeking resilient infrastructure strategies.

