Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning
Developing lifelong learning agents is crucial for artificial general intelligence (AGI). However, deep reinforcement learning (RL) systems often suffer from plasticity loss, where neural networks gradually lose their ability to adapt during training. Despite its significance, this field lacks unified benchmarks and evaluation protocols. We introduce Plasticine, the first open-source framework for benchmarking plasticity optimization in deep RL. Plasticine provides single-file implementations of over 13 mitigation methods, 6 evaluation metrics, and learning scenarios with increasing non-stationarity levels from standard to continually varying environments. This framework enables researchers to systematically quantify plasticity loss, evaluate mitigation strategies, and analyze plasticity dynamics across different contexts. Our documentation, examples, and source code are available at https://github.com/RLE-Foundation/Plasticine.
💡 Research Summary
Plasticity loss, the gradual decline in a deep reinforcement-learning (RL) agent's ability to incorporate new information, has emerged as a critical obstacle to lifelong learning and, by extension, to the development of artificial general intelligence. While a growing body of work proposes various mitigation strategies (reset-based interventions, normalization tricks, regularization schemes, alternative activation functions, and specialized optimizers), the field has lacked a unified benchmark, standardized metrics, and reproducible evaluation pipelines. To fill this gap, the authors present Plasticine, the first open-source framework dedicated to benchmarking plasticity optimization in deep RL.
The framework is organized around three pillars: Methods, Metrics, and Environments. Under Methods, more than 13 representative mitigation techniques are implemented in a “single‑file but modular” fashion, building on the CleanRL PPO baseline. The techniques are grouped into five categories:
- Reset‑based interventions (e.g., Shrink‑and‑Perturb, Plasticity Injection, ReDo, Layer‑wise resetting) that periodically re‑initialize weights or neurons to revive dormant units.
- Normalization (LayerNorm, Normalize‑and‑Project) that stabilizes pre‑activation statistics and periodically rescales weights to their initial norms, thereby decoupling parameter growth from effective learning rates.
- Regularization (L2 weight decay, regenerative regularization, Parseval regularization) that penalizes excessive norm growth or enforces orthogonal weight structures to prevent representation collapse.
- Activation functions (CReLU, Deep Fourier Features) that ensure non‑zero gradients for all units and enrich the feature space with sinusoidal components.
- Optimizers (TRAC, Kron) that act as meta-optimizers or preconditioners, dynamically adjusting update magnitudes based on gradient history to keep learning responsive under distribution shift.
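To make the reset-based category concrete, here is a minimal NumPy sketch of Shrink-and-Perturb, the first technique listed above: weights are interpolated toward zero and fresh Gaussian noise is injected, partially re-initializing the layer. The function name and hyperparameter values are illustrative, not Plasticine's actual API.

```python
import numpy as np

def shrink_and_perturb(weights, shrink=0.8, noise_scale=0.01, rng=None):
    """Shrink-and-Perturb: scale existing weights toward zero and add
    fresh Gaussian noise, partially re-initializing the layer.
    `shrink` and `noise_scale` are illustrative hyperparameters."""
    rng = np.random.default_rng() if rng is None else rng
    return shrink * weights + noise_scale * rng.standard_normal(weights.shape)

# Applied periodically (e.g. every N gradient updates) to each layer:
w = np.ones((4, 4))
w_reset = shrink_and_perturb(w, shrink=0.5, noise_scale=0.01)
```

In practice the interval between resets matters: applying the shrink too often inflates noise relative to learned signal, which connects to the weight-norm caveat noted in the experimental results below.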
Plasticine’s Metrics module provides six quantitative indicators that together capture different facets of plasticity: the ratio of dormant to active neurons, Stable Rank and Effective Rank of internal representations, weight and gradient norms, and auxiliary statistics such as parameter change rates. By logging these metrics at every training step, researchers can pinpoint whether a method primarily preserves activation diversity, controls weight explosion, or maintains representation dimensionality.
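Three of the indicators above can be sketched in a few lines of NumPy. The formulas are the standard definitions from the plasticity literature (dormant-neuron score, stable rank as squared Frobenius norm over squared spectral norm, effective rank as the exponentiated entropy of normalized singular values); function names and the threshold value are illustrative, not Plasticine's exact implementation.

```python
import numpy as np

def dormant_ratio(activations, tau=0.025):
    """Fraction of units whose mean absolute activation, normalized by
    the layer-wide average, falls below tau (the 'dormant' score)."""
    scores = np.abs(activations).mean(axis=0)
    scores = scores / (scores.mean() + 1e-8)
    return float((scores <= tau).mean())

def stable_rank(W):
    """||W||_F^2 / ||W||_2^2: how many singular directions the
    representation effectively uses."""
    s = np.linalg.svd(W, compute_uv=False)
    return float((s ** 2).sum() / (s[0] ** 2 + 1e-8))

def effective_rank(W):
    """exp(entropy of the normalized singular-value distribution)."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / (s.sum() + 1e-8)
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))
```

For a full-rank identity-like matrix both rank measures approach the matrix dimension; as training collapses the representation, they fall well below it, which is the signature the logged metrics are meant to expose.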
The evaluation suite spans three progressively non-stationary scenarios. Standard online RL uses the Arcade Learning Environment (ALE) to assess plasticity under natural policy-driven distribution shifts. Continual RL introduces explicit task changes via Procgen and DeepMind Control (DMC) benchmarks, with two sub-scenarios: (a) visual feature drift within a single task and (b) full task switches across disparate domains. Finally, a progressive non-stationarity ladder systematically increases the magnitude and frequency of distribution shifts, allowing stress-testing of each mitigation strategy.
Experimental results (detailed in the appendix) reveal several key insights. Normalization-based methods, especially NaP combined with LayerNorm, consistently preserve Stable Rank and curb weight-norm growth across all environments, making them the most robust overall. Reset-based approaches yield rapid performance rebounds after abrupt task switches but can introduce weight-norm inflation if applied too frequently. Activation-function modifications markedly improve the dormant-to-active ratio, particularly in visually rich Procgen settings where CReLU and Fourier features keep neurons alive. Meta-optimizers like TRAC and Kron excel in environments with volatile reward signals, stabilizing gradient norms and reducing catastrophic forgetting.
Beyond the empirical findings, Plasticine’s design philosophy emphasizes ease of use and extensibility. All methods are encapsulated in a single Python file that can be dropped into any CleanRL workflow, while the modular internal API permits researchers to toggle individual components without rewriting the core training loop. Comprehensive documentation, example scripts, and a clear directory structure (agents, env wrappers, metrics, scripts) lower the barrier for newcomers and facilitate reproducible research.
In conclusion, Plasticine provides a much-needed, standardized platform for studying and mitigating plasticity loss in deep reinforcement learning. By unifying implementations, metrics, and evaluation scenarios, it enables rapid prototyping, fair comparison, and deeper diagnostic analysis of how neural networks adapt, or fail to adapt, under continual non-stationarity. The authors intend to expand the repository with additional methods, environments, and community contributions, positioning Plasticine as the de facto benchmark for future lifelong-learning RL research.