A fundamental challenge in Continual Learning (CL) is catastrophic forgetting, where adapting to new tasks degrades performance on previous ones. While the field has evolved rapidly, the surge in diverse methodologies has produced a fragmented research landscape. The lack of a unified framework, including inconsistent implementations, conflicting dependencies, and varying evaluation protocols, makes fair comparison and reproducible research increasingly difficult. To address this challenge, we propose LibContinual, a comprehensive and reproducible library designed to serve as a foundational platform for realistic CL. Built upon a high-cohesion, low-coupling modular architecture, LibContinual integrates 19 representative algorithms across five major methodological categories, providing a standardized execution environment. Leveraging this unified framework, we systematically identify and investigate three implicit assumptions prevalent in mainstream evaluation: (1) offline data accessibility, (2) unregulated memory resources, and (3) intra-task semantic homogeneity. We argue that these assumptions often overestimate the real-world applicability of CL methods. Through a comprehensive analysis using strict online CL settings, a novel unified memory budget protocol, and a proposed category-randomized setting, we reveal significant performance drops in many representative CL methods when subjected to these real-world constraints. Our study underscores the necessity of resource-aware and semantically robust CL strategies, and offers LibContinual as a foundational toolkit for future research in realistic continual learning. The source code is available at \href{https://github.com/RL-VIG/LibContinual}{https://github.com/RL-VIG/LibContinual}.
ENDOWING machines with the human-like capability to continuously acquire new knowledge and adapt to evolving environments represents one of the ultimate milestones toward achieving artificial general intelligence. Continual Learning (CL), also known as lifelong or incremental learning, is the research paradigm dedicated to this goal. It requires a model to learn sequentially from a non-stationary stream of data, acquiring knowledge from a series of tasks without compromising its performance on previously learned tasks. However, this ideal learning process is hindered by a fundamental challenge: catastrophic forgetting [1], [2]. When a model adjusts its parameters to accommodate a new task's data distribution, it often overwrites the knowledge critical for past tasks, causing significant performance degradation on prior tasks. Navigating the trade-off between plasticity (the ability to learn new knowledge) and stability (the preservation of old knowledge), known as the stability-plasticity dilemma [3], constitutes the central challenge in the field.
To address this challenge, the research community has explored various technical avenues [4], including regularization-based methods [5], [6] that protect prior knowledge, replay-based methods [7], [8], [9] that rehearse past data, optimization-based methods [10], [11] that constrain parameter updates, and architecture-based methods [12], [13] that adapt the model structure. More recently, the advent of Pre-Trained Models (PTM) has triggered a profound paradigm shift in CL, giving rise to representation-based methods [14], [15], [16] that focus on efficiently adapting powerful pre-trained features. The research focus is gradually moving from “learning from scratch” [7] to “efficiently and robustly fine-tuning and adapting powerful pre-trained knowledge” [14], [4].
Meanwhile, this rapid surge in diverse methodologies has culminated in a fragmented research landscape. Methods are often implemented using different deep learning frameworks, conflicting dependency versions, and inconsistent data processing pipelines. Such fragmentation makes it difficult to determine whether performance gains stem from algorithmic innovation or merely from differences in implementation details and hyperparameter tuning. Consequently, there is a critical lack of a unified framework capable of providing a standardized implementation and fair comparison across the diverse methods. While several libraries have been developed to aid reproducible research, such as Avalanche [17], Continuum [18], and PyCIL [19], they often exhibit limitations. As will be discussed in Section II-B, some existing frameworks suffer from rigid component coupling, which complicates the extension or customization of internal modules. Others lack native support for modern Vision-Language Models (VLM), restricting the comparison between traditional training-from-scratch methods and PTM-based strategies. This absence of a comprehensive, modular, and up-to-date toolkit creates a barrier to rigorous empirical analysis and further advances.
To address this challenge, we propose LibContinual, a comprehensive and reproducible library designed to serve as a foundational platform for realistic continual learning. LibContinual is built upon a high-cohesion, low-coupling architectural
design (detailed in Section III). It decouples the experimental workflow into modular components, including Trainer, Model, Buffer, and DataModule, driven by a unified configuration system. This design allows researchers to seamlessly mix and match diverse backbones, classifiers, and buffer strategies within a standardized execution environment. Leveraging this architecture, we integrate 19 representative algorithms spanning all five major categories: regularization-, replay-, optimization-, representation-, and architecture-based methods. By providing a unified interface for both classical and modern PTM-based methods, LibContinual enables the community to conduct fair, transparent, and scalable benchmarking.
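To make the decoupled design concrete, the following minimal Python sketch illustrates how a Trainer can drive injected Model, Buffer, and DataModule components from a single configuration object. All class, field, and method names here (e.g., observe, after_task, task_loader) are illustrative assumptions for exposition only and do not reproduce LibContinual's actual API.
\begin{verbatim}
# Illustrative sketch only: the class and method names below are
# hypothetical and do not reproduce LibContinual's actual API.
from dataclasses import dataclass


@dataclass
class Config:
    """Unified configuration driving every component."""
    backbone: str = "resnet18"
    classifier: str = "linear"
    buffer_policy: str = "reservoir"
    buffer_size: int = 2000
    num_tasks: int = 10
    epochs_per_task: int = 1  # a single pass approximates an online stream


class Buffer:
    """Rehearsal storage; the concrete policy (e.g., reservoir) is pluggable."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.samples = []


class Trainer:
    """Owns the task loop; Model, Buffer, and DataModule are injected, so
    backbones, classifiers, and buffer strategies can be mixed and matched."""
    def __init__(self, cfg: Config, model, buffer: Buffer, datamodule):
        self.cfg, self.model = cfg, model
        self.buffer, self.datamodule = buffer, datamodule

    def fit(self):
        for task_id in range(self.cfg.num_tasks):
            loader = self.datamodule.task_loader(task_id)
            for _ in range(self.cfg.epochs_per_task):
                for batch in loader:
                    # method-specific update (regularization, replay, ...)
                    self.model.observe(batch, self.buffer)
            # e.g., consolidate weights or refresh the rehearsal buffer
            self.model.after_task(task_id, self.buffer)
\end{verbatim}
Because the Trainer only relies on these narrow interfaces, swapping one component (say, a different buffer policy) does not require touching the others, which is the practical meaning of the high-cohesion, low-coupling design described above.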
Equally important, during the development of LibContinual, the standardization of protocols allowed us to identify several implicit assumptions deeply embedded in mainstream evaluation paradigms. These assumptions, often accepted as convention, may overestimate the real-world applicability of CL methods. Specifically, we identify three implicit assumptions: (1) The Assumption of Offline Data Accessibility, which presumes multi-epoch training on task data, ignoring the single-pass nature of real-world streams; (2) The Assumption of Unregulated Memory Resources, where inconsistent accounting of storage (e.g., raw images vs. abstract features) obscures true algorithmic efficiency; and (3) The Assumption of Intra-Task Semantic Homogeneity, under which benchmarks typically use tasks with high intra-task semantic homogeneity. To relax this last assumption, we introduce a more challenging category-randomized setting, preventing models from relying on task-level shortcuts and thus testing for more robust representations.
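As a concrete illustration of the category-randomized setting, the sketch below shows one way such a split could be constructed: all classes are shuffled globally before being partitioned into tasks, so no task remains semantically homogeneous. The function name, its signature, and the use of Python's random module are our own illustrative assumptions, not LibContinual's interface.
\begin{verbatim}
# Minimal sketch of a category-randomized task split, assuming classes are
# identified by name; the function name and signature are hypothetical.
import random


def category_randomized_split(class_names, num_tasks, seed=0):
    """Globally shuffle all classes before slicing them into tasks, so each
    task mixes semantically unrelated categories instead of homogeneous ones
    (e.g., a task no longer contains only animals or only vehicles)."""
    rng = random.Random(seed)
    shuffled = list(class_names)
    rng.shuffle(shuffled)
    per_task = len(shuffled) // num_tasks
    return [shuffled[i * per_task:(i + 1) * per_task]
            for i in range(num_tasks)]


# Example: 100 classes split into 10 ten-class tasks.
tasks = category_randomized_split(
    [f"class_{i}" for i in range(100)], num_tasks=10)
\end{verbatim}
Fixing the shuffle seed keeps the randomized split reproducible across methods, so comparisons under this setting remain fair.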