Class Incremental Learning with Task-Specific Batch Normalization and Out-of-Distribution Detection
This study focuses on incremental learning for image classification, exploring how to mitigate catastrophic forgetting of previously learned knowledge when access to old data is restricted. The challenge lies in balancing plasticity (learning new knowledge) and stability (retaining old knowledge). Based on whether the task identifier (task-ID) is available during testing, incremental learning is divided into task incremental learning (TIL) and class incremental learning (CIL). The TIL paradigm often uses multiple classifier heads, selecting the corresponding head based on the task-ID. Since the CIL paradigm cannot access the task-ID, methods originally developed for TIL require explicit task-ID prediction to bridge this gap and enable their adaptation to the CIL paradigm. In this study, a novel continual learning framework extends a TIL method to CIL by introducing out-of-distribution detection for task-ID prediction. Our framework uses task-specific Batch Normalization (BN) and task-specific classification heads to effectively adjust feature-map distributions for each task, enhancing plasticity. Because BN has far fewer parameters than convolutional kernels, task-specific BN minimizes parameter growth, preserving stability. Building on the multiple task-specific classification heads, we introduce an "unknown" class for each head. During training, data from other tasks are mapped to this unknown class. During inference, the task-ID is predicted by selecting the classification head that assigns the lowest probability to the unknown class. Our method achieves state-of-the-art performance on two medical image datasets and two natural image datasets. The source code is available at https://github.com/z1968357787/mbn_ood_git_main.
💡 Research Summary
This paper tackles the challenging problem of class incremental learning (CIL) where the task identifier (task‑ID) is unavailable at test time. While task incremental learning (TIL) can rely on multiple classifier heads selected by the known task‑ID, CIL requires a mechanism to infer the correct head without that information. The authors propose a novel framework that extends TIL methods to CIL by (1) introducing task‑specific Batch Normalization (BN) layers and (2) employing out‑of‑distribution (OOD) detection for task‑ID prediction.
Each incoming task receives its own BN layer (γ, β, running mean, and variance) and a dedicated classification head. Because BN parameters are orders of magnitude fewer than convolutional weights, adding a BN per task incurs negligible parameter growth, preserving model stability. To enable task‑ID inference, an “unknown” class is appended to every head. During training, samples belonging to other tasks are labeled as “unknown” for the current head, forcing the head to learn to discriminate in‑distribution samples from OOD ones. At inference, the model evaluates all heads on a test sample and selects the head whose “unknown” class probability is lowest; this head is assumed to correspond to the correct task, and its task‑specific BN and classifier are then used for the final prediction.
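The inference rule described above — score every head on the test sample, pick the head least convinced the sample is "unknown", then classify with that head — can be sketched in a few lines. This is an illustrative numpy sketch of the selection logic only, not the authors' implementation; the function names and the convention of placing the unknown logit last are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_with_task_id(head_logits):
    """Infer the task-ID and class for one test sample.

    head_logits: one 1-D logit array per task-specific head; each array
    holds logits for that head's in-distribution classes followed by a
    final 'unknown' logit.

    The head whose softmax probability on the 'unknown' class is lowest
    is taken as the correct task; the final class prediction then comes
    from that head's in-distribution logits only.
    """
    unknown_probs = [softmax(l)[-1] for l in head_logits]
    task_id = int(np.argmin(unknown_probs))               # most in-distribution head
    class_id = int(np.argmax(head_logits[task_id][:-1]))  # drop the unknown logit
    return task_id, class_id
```

In a full model, each candidate head would be evaluated with its own task-specific BN statistics before this selection step; the sketch starts from the resulting per-head logits.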
The training pipeline proceeds as follows: when a new task arrives, previously learned BN and heads are frozen; the new task’s BN and head (including the unknown class) are trained jointly with a cross‑entropy loss that combines the standard class loss and the unknown‑class loss for replayed or synthetic samples from earlier tasks. The authors provide a theoretical analysis showing that task‑specific BN isolates distributional shifts between tasks, limiting interference with previously learned representations, and that the unknown‑class OOD signal yields a reliable task‑ID estimate under reasonable assumptions.
Extensive experiments were conducted on four datasets: two medical imaging benchmarks (ChestX‑Ray14 and ISIC 2018) and two natural image benchmarks (CIFAR‑100 and a subset of ImageNet). The method was compared against state‑of‑the‑art replay‑based (iCaRL, DER++), distillation‑based (PODNet, UCIR), expansion‑based (Dytox), and pretrained‑model‑based (L2P, PRoOF) approaches. Across all settings, the proposed framework achieved higher average accuracy while using far fewer additional parameters (typically less than 0.5 % of the base network). Ablation studies confirmed that (i) removing task‑specific BN dramatically degrades performance, (ii) omitting the unknown class reduces task‑ID prediction accuracy, and (iii) alternative OOD scores (e.g., Mahalanobis distance) do not outperform the simple softmax‑based unknown probability.
The paper also discusses limitations: the linear growth of BN and heads with the number of tasks may become burdensome for very long task sequences, and misclassification of OOD samples as in‑distribution could lead to erroneous task‑ID selection. Future directions include sharing or meta‑learning BN parameters, more sophisticated OOD calibration, and extending the approach to transformer backbones.
In summary, by coupling lightweight task‑specific normalization with an implicit OOD‑driven task‑ID predictor, the authors present a practical and effective solution for CIL that balances plasticity, stability, and parameter efficiency, setting a new performance baseline on both medical and natural image incremental learning benchmarks.