C3Box: A CLIP-based Class-Incremental Learning Toolbox
Traditional machine learning systems are typically designed for static data distributions and suffer catastrophic forgetting when learning from evolving data streams. Class-Incremental Learning (CIL) addresses this challenge by enabling learning systems to continuously learn new classes while preserving prior knowledge. With the rise of pre-trained models (PTMs) such as CLIP, leveraging their strong generalization and semantic alignment capabilities has become a promising direction in CIL. However, existing CLIP-based CIL methods are often scattered across disparate codebases and rely on inconsistent configurations, hindering fair comparison, reproducibility, and practical adoption. We therefore propose C3Box (CLIP-based Class-inCremental learning toolBOX), a modular and comprehensive Python toolbox. C3Box integrates representative traditional CIL methods, ViT-based CIL methods, and state-of-the-art CLIP-based CIL methods into a unified CLIP-based framework. Inheriting the streamlined design of PyCIL, C3Box provides JSON-based configuration and a standardized execution pipeline. This design enables reproducible experimentation with low engineering overhead and makes C3Box a reliable benchmark platform for continual learning research. Designed to be user-friendly, C3Box relies only on widely used open-source libraries and supports major operating systems. The code is available at https://github.com/LAMDA-CL/C3Box.
💡 Research Summary
The paper introduces C3Box, a comprehensive Python toolbox designed to standardize and simplify research on class‑incremental learning (CIL) that leverages the vision‑language pre‑trained model CLIP. Traditional machine‑learning pipelines assume static data distributions; when data streams evolve, models quickly suffer catastrophic forgetting. CIL mitigates this by allowing a system to acquire new classes over time while preserving previously learned knowledge. Recent advances in pre‑trained models, especially CLIP, have shown that the alignment of visual and textual embeddings can dramatically improve generalization and reduce forgetting in incremental scenarios. However, existing CLIP‑based CIL methods are scattered across many code repositories, each with its own configuration style, making fair comparison, reproducibility, and practical adoption difficult.
C3Box addresses these challenges by providing a unified, modular framework that integrates three families of CIL approaches: (1) traditional exemplar‑based methods (e.g., FOSTER, MEMO), (2) ViT‑based prompt‑tuning methods (e.g., L2P, DualPrompt, CODA‑Prompt, EASE, SimpleCIL, APER, TUNA), and (3) state‑of‑the‑art CLIP‑centric methods (e.g., RAPF, CLG‑CBM, MG‑CLIP, PROOF, ENGINE, BOFA). In total, 17 algorithms are implemented within a single CLIP‑based backbone (ViT‑B/16) using the OpenCLIP library, which supports two widely used pre‑trained weight sets: LAION‑400M and the original OpenAI CLIP weights.
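All three families ultimately rest on the same CLIP-style matching rule: an image is assigned to the class whose text embedding is most cosine-similar to its image embedding. The following is a minimal, self-contained sketch of that rule; it uses random NumPy vectors as stand-ins for the real CLIP encoders, so the function and variable names are illustrative, not C3Box's API:

```python
import numpy as np

def normalize(x):
    # L2-normalize embeddings along the last axis
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_style_classify(image_emb, class_text_embs):
    """Assign each image to the class whose text embedding is most
    cosine-similar to it -- the matching scheme that CLIP-based
    zero-shot and incremental classifiers build on."""
    sims = normalize(image_emb) @ normalize(class_text_embs).T
    return sims.argmax(axis=-1)

# Toy example: 2 classes, 3 images, 4-d embeddings (stand-ins for CLIP features)
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(2, 4))
images = np.vstack([text_embs[0] + 0.01 * rng.normal(size=4),
                    text_embs[1] + 0.01 * rng.normal(size=4),
                    text_embs[0] + 0.01 * rng.normal(size=4)])
print(clip_style_classify(images, text_embs))  # -> [0 1 0]
```

In an incremental setting, adding a new class only appends a row to `class_text_embs`, which is one reason CLIP's text branch is attractive for CIL.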
A key design decision is the adoption of a JSON‑based configuration file that encapsulates every experimental detail: dataset choice, initial class count (m), incremental class count per stage (n), memory budget per class, random seed, training hyper‑parameters (learning rate, batch size, optimizer, number of epochs, etc.), and method‑specific options. This eliminates the need to modify source code for each experiment; users simply edit the JSON and launch the run with a single command (`python main.py --config=./exps/`).
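To make this concrete, a configuration in this spirit might look like the sketch below. The field names are illustrative, loosely modeled on PyCIL-style configs, and the values are placeholders rather than C3Box's exact schema:

```json
{
  "dataset": "cifar100",
  "init_cls": 50,
  "increment": 10,
  "memory_per_class": 20,
  "seed": 1993,
  "backbone": "ViT-B-16",
  "pretrained": "laion400m",
  "model_name": "proof",
  "lr": 0.001,
  "batch_size": 128,
  "optimizer": "sgd",
  "epochs": 10
}
```

Keeping all of these knobs in one file means a full experiment is reproducible from the config plus the seed alone.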