AltNet: A Twin-Network-Based Reset Technique for Restoring Plasticity
📝 Abstract
Neural networks have shown remarkable success in supervised learning when trained on a single task using a fixed dataset. However, when neural networks are trained on a reinforcement learning task, their ability to continue learning from new experiences declines over time. This decline in learning ability is known as plasticity loss. To restore plasticity, prior work has explored periodically resetting the parameters of the learning network, a strategy that often improves overall performance. However, such resets come at the cost of a temporary drop in performance, which can be dangerous in real-world settings. To overcome this instability, we introduce AltNet, a reset-based approach that restores plasticity without performance degradation by leveraging twin networks. The use of twin networks anchors performance during resets through a mechanism that allows networks to periodically alternate roles: one network learns as it acts in the environment, while the other learns off-policy from the active network’s interactions and a replay buffer. At fixed intervals, the active network is reset and the passive network, having learned from prior experiences, becomes the new active network. AltNet restores plasticity, improving sample efficiency and achieving higher performance, while avoiding performance drops that pose risks in safety-critical settings. We demonstrate these advantages in several high-dimensional control tasks from the DeepMind Control Suite, where AltNet outperforms various relevant baseline methods, as well as state-of-the-art reset-based techniques.
📄 Content
Deep learning systems are often designed to learn and converge on a single task. In non-stationary environments, however, the objective being optimized by the model evolves over time. Success in such settings requires continual adaptation rather than the ability to identify a single solution. This need motivates the field of continual learning, or lifelong learning, in which an agent updates, accumulates, and exploits knowledge throughout its lifetime [5]. A central obstacle in continual learning is plasticity loss, the progressive decline in an agent’s ability to learn from new data over time [6,15,16,22]. We say that a network has lost plasticity if it can no longer optimize its objective as effectively as a freshly initialized counterpart [17]. Plasticity loss has been observed in non-stationary settings. For instance, Achille et al. [2] showed that pre-training on blurred CIFAR images impaired subsequent learning of the original dataset. Similarly, Ash and Adams [3] found that pre-training on half of a dataset, then using the resulting model as the starting point for the full supervised learning task, reduced accuracy compared to training on the full dataset from scratch. More broadly, Dohare et al. [7] demonstrated that when neural networks are trained sequentially on multiple tasks, their ability to learn new tasks declines with each additional task.
Reinforcement learning (RL) compounds the difficulty of maintaining plasticity over time because, even when the task itself is stationary, RL agents face inherent sources of non-stationarity. First, agents collect their own data; as policies evolve, the distribution of encountered states and actions shifts, producing input non-stationarity. Second, many RL algorithms such as DQN, A2C, PPO, and SAC [10,19,20,24] rely on bootstrapping, where predictions of future rewards serve as learning targets. As these predictions evolve, the targets themselves change, creating target non-stationarity. Together, these factors require agents to continually adapt to shifting data distributions even when tackling a single task, thereby amplifying plasticity loss.
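Target non-stationarity from bootstrapping can be made concrete with a toy example. The sketch below uses a hypothetical linear value function and a single TD(0) update (all values are illustrative, not from the paper): because the learning target is built from the network's own prediction, updating the network moves the target it was chasing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear value function V(s) = w . s (hypothetical setup for illustration).
w = rng.normal(size=4)
s = rng.normal(size=4)       # current state
s_next = rng.normal(size=4)  # next state
r, gamma, lr = 1.0, 0.99, 0.1

# Bootstrapped TD(0) target: built from the network's own prediction of V(s').
target_before = r + gamma * (w @ s_next)

# One TD update toward that target moves the parameters ...
td_error = target_before - (w @ s)
w += lr * td_error * s

# ... which in turn moves the target itself: target non-stationarity.
target_after = r + gamma * (w @ s_next)
```

Even with a stationary environment, `target_after` differs from `target_before`, so the regression problem the network solves keeps shifting as it learns.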
To mitigate plasticity loss, various approaches have been proposed (Section 2). Among these, a particularly promising family of methods is based on periodically resetting network parameters [6,14,22,25]. Resets are effective because they restore the network to a well-conditioned, highly plastic initialization that is gradually lost during training. As networks adapt to specific tasks or data distributions, they accumulate pathologies, such as dormant neurons, growing weight magnitudes, and reduced rank, that impair their ability to learn from new data [6]. Resetting the parameters removes these accumulated effects and reinitializes the network to conditions resembling its original, plastic initialization (see supporting analysis in Appendix E). Nikishin et al. [22] empirically demonstrated that resetting a network can substantially improve performance by renewing its ability to learn and exploit data. Although effective, full network resets come at a cost: they erase all information embedded in the network and cause immediate performance collapses (see Figure 2, orange curve). This makes Standard Resets [22] impractical for real-world deployment. The central challenge we address in this paper is how to retain the benefits of full network resets in restoring plasticity while avoiding the performance instability they induce.
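The Standard Resets recipe described above amounts to discarding the network's parameters at a fixed interval while keeping the replay buffer. A minimal sketch, with a toy parameter dictionary standing in for the network and illustrative interval and scale values not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def fresh_params():
    """A fresh, well-conditioned initialization (toy 2-layer MLP weights)."""
    return {
        "W1": rng.normal(scale=0.1, size=(64, 8)),
        "W2": rng.normal(scale=0.1, size=(4, 64)),
    }

params = fresh_params()
RESET_INTERVAL = 100  # steps between full resets (illustrative value)

for step in range(1, 301):
    # ... gradient updates on replayed data would mutate `params` here ...
    params["W1"] *= 1.001  # stand-in for weight-magnitude growth during training
    if step % RESET_INTERVAL == 0:
        # Full reset: every learned weight is discarded, restoring plasticity
        # but causing the immediate performance collapse discussed above.
        params = fresh_params()
```

The reset restores a plastic initialization, but because the acting network and the reset network are one and the same, performance drops sharply until the agent relearns from the buffer.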
To address the plasticity-stability dilemma, we introduce AltNet, a reset-based alternating network approach that preserves plasticity without inducing recurring performance drops. AltNet maintains two networks that periodically switch roles. At any given time, the active network interacts with the environment, while the passive network learns off-policy from the active agent’s experience and a shared replay buffer. At fixed intervals, the active network is reset and the passive network, having learned from prior experiences, becomes the new active network. This alternating structure anchors performance across resets and prevents performance collapse. Importantly, AltNet leverages resets without any performance instability even at a low replay ratio of 1, a setting in which Standard Resets [22] fail (see Figure 2, orange curve) and more sophisticated methods such as Resets with Deep Ensembles (RDE) [14] still exhibit sharp post-reset performance drops (see Figure 2, blue curve). To understand which factors contribute to AltNet’s superior performance, we systematically evaluate aspects such as model capacity, number of networks, replay ratio, buffer size, and reset duration (Section 4.2). Finally, we show that AltNet also improves performance in on-policy settings, as demonstrated by comparisons with the on-policy baseline, PPO [24] (Section 4.3).
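The alternating mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the networks are stand-in parameter arrays, the switch interval is an assumed value, and the gradient updates are elided as comments.

```python
import numpy as np

rng = np.random.default_rng(0)

def fresh_net():
    """Freshly initialized parameters for one twin (toy linear network)."""
    return rng.normal(scale=0.1, size=(4, 8))

# Two twins: `active` acts in the environment; `passive` learns off-policy
# from the shared replay buffer filled by the active twin's interactions.
active, passive = fresh_net(), fresh_net()
replay_buffer = []

SWITCH_INTERVAL = 100  # steps between role switches (illustrative value)

for step in range(1, 301):
    obs = rng.normal(size=8)      # stand-in for an environment interaction
    replay_buffer.append(obs)     # the active twin's experience is shared
    # ... gradient updates for both twins would go here:
    #     `active` learns from its own interactions,
    #     `passive` learns off-policy from `replay_buffer`.
    if step % SWITCH_INTERVAL == 0:
        # The passive twin, already trained on prior experience, takes over;
        # the old active twin is replaced by a fresh initialization. Plasticity
        # is restored without the post-reset performance collapse of a single
        # shared network.
        active, passive = passive, fresh_net()
```

The key design choice is that the network being reset is never the one acting in the environment, so each reset restores plasticity while the already-trained twin anchors performance.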
Plasticity. Prior work uses the term plasticity to refer to the degree to which a network generalizes to unseen data [4] or to ref
This content is AI-processed based on ArXiv data.