When Classes Evolve: A Benchmark and Framework for Stage-Aware Class-Incremental Learning
Class-Incremental Learning (CIL) aims to sequentially learn new classes while mitigating catastrophic forgetting of previously learned knowledge. Conventional CIL approaches implicitly assume that classes are morphologically static, focusing primarily on preserving previously learned representations as new classes are introduced. However, this assumption neglects intra-class evolution: a phenomenon wherein instances of the same semantic class undergo significant morphological transformations, such as a larva turning into a butterfly. Consequently, a model must both discriminate between classes and adapt to evolving appearances within a single class. To systematically address this challenge, we formalize Stage-Aware CIL (Stage-CIL), a paradigm in which each class is learned progressively through distinct morphological stages. To facilitate rigorous evaluation within this paradigm, we introduce Stage-Bench, a 10-domain, two-stage dataset and protocol that jointly measure inter- and intra-class forgetting. We further propose STAGE, a novel method that explicitly learns abstract and transferable evolution patterns within a fixed-size memory pool. By decoupling semantic identity from transformation dynamics, STAGE enables accurate prediction of future morphologies based on earlier representations. Extensive empirical evaluation demonstrates that STAGE consistently and substantially outperforms existing state-of-the-art approaches, highlighting its effectiveness in simultaneously addressing inter-class discrimination and intra-class morphological adaptation.
💡 Research Summary
The paper introduces a new problem setting for continual learning called Stage‑Aware Class‑Incremental Learning (Stage‑CIL). Traditional class‑incremental learning (CIL) assumes that each semantic class remains morphologically static over time, focusing solely on preserving representations of previously learned classes while adding new ones. In many real‑world scenarios, however, a single class can undergo substantial, structured transformations (e.g., a larva turning into a butterfly). This intra‑class evolution creates a form of forgetting that standard CIL benchmarks and metrics cannot capture.
To formalize this challenge, the authors define Stage‑CIL as a learning scenario where each training example is a triplet (image x, class y, stage s). The stage index s belongs to a fixed set of M ordered stages and is provided as side information. During training, the data for any given class appear in non‑decreasing stage order, so the model must learn both (i) inter‑class discrimination (the classic CIL goal) and (ii) intra‑class coherence (maintaining a consistent identity across stages).
To evaluate methods under this setting, the paper proposes Stage‑Bench, a benchmark comprising ten diverse visual domains (fish, flowers, food, vegfru, fungi, insects, birds, pets, plant leaves, objects). Each domain contains 20 classes, and each class is annotated with two ordered stages (Stage‑0 = initial morphology, Stage‑1 = evolved morphology), yielding a total of 18,895 images. The benchmark follows a (B‑m, Inc‑n) × S₂ protocol: an initial session introduces m classes, each incremental session adds n new classes, and for every class the two stages appear sequentially. Two new metrics are introduced: Inter‑F, the average drop from each class’s peak accuracy to its final accuracy (identical to conventional CIL forgetting), and Intra‑F, a normalized accuracy drop between a class’s Stage‑0 and Stage‑1 test performance, directly measuring intra‑class forgetting.
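The two metrics above can be sketched in a few lines. This is a minimal illustration, not the paper's official evaluation code: the exact normalization used for Intra-F is not spelled out here, so the version below (dividing the Stage-0 to Stage-1 accuracy drop by the Stage-0 accuracy) is one plausible reading.

```python
import numpy as np

def inter_f(acc_history):
    """Inter-F: mean drop from each class's peak accuracy to its final
    accuracy over the session stream (classic CIL forgetting).
    acc_history: dict mapping class name -> list of per-session accuracies."""
    drops = [max(accs) - accs[-1] for accs in acc_history.values()]
    return float(np.mean(drops))

def intra_f(acc_stage0, acc_stage1):
    """Intra-F: normalized drop between each class's Stage-0 and Stage-1
    test accuracy. Normalizing by Stage-0 accuracy is an assumption of
    this sketch; the benchmark's exact definition may differ."""
    drops = [(a0 - a1) / a0
             for a0, a1 in zip(acc_stage0, acc_stage1) if a0 > 0]
    return float(np.mean(drops))
```

A class whose accuracy peaks at 0.9 and ends at 0.7 contributes 0.2 to Inter-F; a class scoring 0.8 on Stage-0 but 0.6 on Stage-1 contributes 0.25 to Intra-F under this normalization.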
The authors also present a novel method called STAGE (Stage‑aware Evolution). STAGE operates in two phases. Phase 0 (Anchor Distillation) processes only the first‑stage samples of each class. A frozen vision‑language backbone (image encoder g_img and text encoder g_text) is used, while class‑specific projection layers are trained on Stage‑0 data. A stable “anchor” prototype p₀ᶜ is constructed by averaging visual features of Stage‑0 images and fusing them with the class’s textual embedding via cross‑modal attention. This anchor serves as a fixed identity reference for the class.
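The anchor construction can be illustrated with a toy sketch. Everything here is an assumption for illustration: the fusion below uses a single-head cross-attention in which the text embedding queries the Stage-0 visual features, then averages the attended summary with the plain visual mean; the paper's actual projection layers and attention design may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def build_anchor(stage0_feats, text_emb, temp=1.0):
    """Build a class anchor p0 from Stage-0 visual features (n, d) and a
    class text embedding (d,). Hypothetical fusion: mean visual prototype
    averaged with a text-queried attention summary of the same features."""
    v_mean = stage0_feats.mean(axis=0)          # (d,) averaged visual prototype
    scores = stage0_feats @ text_emb / temp     # (n,) text-to-image attention logits
    attn = softmax(scores)                      # (n,) attention weights
    v_attn = attn @ stage0_feats                # (d,) text-guided visual summary
    return 0.5 * (v_mean + v_attn)              # fused anchor p0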
Phase 1 (Predictive Evolution) predicts the representation of a later stage without re‑training the classifier on mixed‑stage data. An Evolution‑aware Memory Pool P = {u₁, …, u_K} stores a set of learnable basis transformation patterns (e.g., "color darkens", "shape elongates"). For a given class anchor p₀ᶜ, the top‑k most similar patterns are retrieved using cosine similarity, and an attention mechanism aggregates them into a transformation context cᶜ,ᵢ. A residual evolution network E then produces the predicted next‑stage feature x̂ᶜ,ᵢₛ₊₁ = p₀ᶜ + E(p₀ᶜ + cᶜ,ᵢ). The predicted feature is fed to the classifier to output the class label. This predict‑then‑classify pipeline allows the model to anticipate future morphologies while keeping the anchor untouched, thereby reducing intra‑class forgetting.
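The retrieve-aggregate-predict step can be sketched as follows. This is a toy illustration under stated assumptions: the pool entries and the two weight matrices standing in for the evolution network E are random here (in the method they are learned), and the attention over retrieved patterns is a plain softmax over cosine similarities.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, k = 16, 8, 3                       # feature dim, pool size, patterns retrieved

pool = rng.normal(size=(K, d))           # Evolution-aware Memory Pool P = {u_1..u_K}
W1 = rng.normal(size=(d, d)) * 0.1       # toy 2-layer MLP standing in for E
W2 = rng.normal(size=(d, d)) * 0.1

def cosine(a, B):
    """Cosine similarity between vector a (d,) and each row of B (K, d)."""
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-8)

def predict_next_stage(p0):
    """Predict the next-stage feature from the Stage-0 anchor p0:
    retrieve top-k pool patterns, aggregate them with attention into a
    context c, then add the residual E(p0 + c) to the untouched anchor."""
    sims = cosine(p0, pool)
    idx = np.argsort(sims)[-k:]          # indices of the top-k most similar patterns
    w = np.exp(sims[idx])
    w /= w.sum()                         # attention weights over retrieved patterns
    c = w @ pool[idx]                    # transformation context
    residual = np.tanh((p0 + c) @ W1) @ W2   # E(p0 + c)
    return p0 + residual                 # predicted next-stage feature
```

The predicted feature would then be passed to the classifier in place of an observed later-stage feature; note the anchor p0 itself is never modified.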
Extensive experiments compare STAGE against a wide range of state‑of‑the‑art CIL approaches, including distillation‑based, rehearsal‑based, prompt‑based, and adapter‑based methods, all evaluated under the Stage‑Bench protocol. Results show that STAGE consistently achieves lower Inter‑F and dramatically lower Intra‑F scores; in particular, Intra‑F improves by more than 30% relative to the best baseline. The method remains robust as the memory‑pool size varies, indicating that a modest number of transformation patterns suffices. Ablation studies confirm that both the anchor distillation and the memory‑pool‑driven evolution prediction are essential for the observed gains.
The paper’s contributions are threefold: (1) introducing the Stage‑CIL paradigm and a rigorous evaluation protocol that explicitly measures intra‑class evolution; (2) releasing the Stage‑Bench benchmark with official splits, mapping files, and data loaders to facilitate reproducible research; and (3) proposing the STAGE framework, which demonstrates that learning abstract, transferable evolution patterns is an effective strategy for mitigating intra‑class forgetting.
Limitations are acknowledged. The current benchmark only handles two stages (M = 2), leaving multi‑stage evolution (M > 2) unexplored. The linear combination of memory‑pool patterns may struggle with highly non‑linear morphological changes. The memory pool is fixed in size and updated in a simple manner, which could lead to pattern degradation over very long continual streams. Finally, reliance on textual prompts for cross‑modal fusion may be less effective in domains where textual semantics are weak or unavailable (e.g., medical imaging). Future work could extend the framework to multi‑stage temporal modeling, incorporate non‑linear transformation modules, devise dynamic memory management strategies, and explore pure‑vision approaches for evolution prediction. Overall, the work opens a new research direction at the intersection of continual learning and morphological dynamics, providing both a solid benchmark and a promising baseline.