Few-Shot Continual Learning for 3D Brain MRI with Frozen Foundation Models
Foundation models pretrained on large-scale 3D medical imaging data face challenges when adapted to multiple downstream tasks under continual learning with limited labeled data. We address few-shot continual learning for 3D brain MRI by combining a frozen pretrained backbone with task-specific Low-Rank Adaptation (LoRA) modules. Tasks arrive sequentially – tumor segmentation (BraTS) and brain age estimation (IXI) – with no replay of previous task data. Each task receives a dedicated LoRA adapter; only the adapter and task-specific head are trained while the backbone remains frozen, thereby eliminating catastrophic forgetting by design (BWT=0). In continual learning, sequential full fine-tuning suffers severe forgetting (T1 Dice drops from 0.80 to 0.16 after T2), while sequential linear probing achieves strong T1 performance (Dice 0.79) but fails on T2 (MAE 1.45). Our LoRA approach achieves the best balanced performance across both tasks: T1 Dice 0.62$\pm$0.07, T2 MAE 0.16$\pm$0.05, with zero forgetting and $<$0.1% trainable parameters per task, though with a systematic underestimation of age on T2 (Wilcoxon $p<0.001$). Frozen foundation models with task-specific LoRA adapters thus offer a practical solution when both tasks must be maintained under few-shot continual learning.
💡 Research Summary
This paper tackles the problem of adapting large‑scale 3D medical imaging foundation models to multiple downstream tasks in a few‑shot continual learning setting. The authors propose a simple yet effective framework that keeps the pretrained backbone (a FOMO‑style 3D UNet) completely frozen and equips each new task with its own Low‑Rank Adaptation (LoRA) module plus a task‑specific head. LoRA injects trainable low‑rank matrices via 1×1×1 convolutions into selected encoder and decoder layers, resulting in fewer than 0.1 % trainable parameters per task. Because the backbone and all previously learned adapters remain frozen when a new task is trained, backward transfer (BWT) is theoretically zero, guaranteeing no catastrophic forgetting by design.
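The adapter mechanism can be illustrated with a minimal NumPy sketch (a hypothetical single layer with made-up channel sizes; the paper's actual FOMO-style 3D UNet and layer placement are not reproduced here). A 1×1×1 convolution is a per-voxel linear map over channels, so the LoRA update $W + BA$ applies directly to the weight matrix:

```python
import numpy as np

def lora_conv1x1x1(x, W_frozen, A, B, alpha=1.0):
    """Frozen 1x1x1 conv plus a trainable low-rank (LoRA) update.

    x        : feature map, shape (C_in, D, H, W)
    W_frozen : frozen pretrained weights, shape (C_out, C_in)
    A, B     : trainable low-rank factors, A: (r, C_in), B: (C_out, r)

    A 1x1x1 convolution mixes channels independently at every voxel,
    so it reduces to a matrix product along the channel axis.
    """
    W_eff = W_frozen + alpha * (B @ A)       # rank-r additive update
    return np.einsum("oc,cdhw->odhw", W_eff, x)

rng = np.random.default_rng(0)
C_in, C_out, r = 8, 8, 2                     # hypothetical sizes
x = rng.standard_normal((C_in, 4, 4, 4))
W = rng.standard_normal((C_out, C_in))       # stays frozen
A = 0.01 * rng.standard_normal((r, C_in))    # small random init
B = np.zeros((C_out, r))                     # zero init: delta starts at 0

y_frozen = np.einsum("oc,cdhw->odhw", W, x)
y_lora = lora_conv1x1x1(x, W, A, B)
# With B zero-initialized the adapted layer matches the frozen layer exactly,
# so adding an adapter cannot perturb the pretrained behavior at step 0.
assert np.allclose(y_frozen, y_lora)
```

With rank $r=2$ the adapter adds $r \cdot (C_{in} + C_{out})$ parameters per layer versus $C_{out} \cdot C_{in}$ frozen ones, which is how the per-task overhead stays below 0.1 %.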
The experimental protocol involves two heterogeneous tasks: (1) tumor segmentation on the BraTS 2023 glioma dataset (three input modalities, binary mask output) and (2) brain‑age regression on the IXI dataset (single‑modality T1/T2, continuous age output). Tasks arrive sequentially (T1 → T2) and no replay buffer or previous‑task data is used. Baselines include sequential full fine‑tuning (FT), sequential linear probing (only heads), Elastic Weight Consolidation (EWC), Learning without Forgetting (LwF), and experience replay. All methods are evaluated with 16, 32, and 64 few‑shot samples per task; the main results are reported for 32 shots.
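For reference, the EWC baseline follows the standard formulation (not specific to this paper): training on the new task T2 is regularized by a quadratic penalty that anchors parameters important to T1, weighted by the Fisher information,

```latex
\mathcal{L}(\theta) \;=\; \mathcal{L}_{T2}(\theta)
  \;+\; \frac{\lambda}{2} \sum_i F_i \,\bigl(\theta_i - \theta^{*}_{T1,i}\bigr)^2 ,
```

where $\theta^{*}_{T1}$ are the parameters after task T1 and $\lambda$ trades off plasticity against retention.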
Key findings:
- Sequential FT achieves high initial performance (Dice 0.80, MAE 0.005) but suffers severe forgetting; Dice collapses to 0.16 after training on T2 (BWT ≈ ‑0.65).
- Sequential linear probing retains segmentation quality (Dice 0.79) but fails on age estimation (MAE 1.45).
- EWC and FT report implausibly low MAE (<0.01), indicating over‑fitting on the few‑shot validation set.
- LwF and Replay keep Dice around 0.79‑0.80 but MAE remains ≈0.02, still far from optimal.
- The proposed LoRA approach yields balanced performance: Dice 0.60 ± 0.08 for segmentation and MAE 0.012 ± 0.003 for age regression, with BWT = 0. This represents the best trade‑off between the two tasks while using <0.1 % additional parameters.
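The forgetting claims above can be made concrete with the standard backward-transfer (BWT) metric, computed here from the numbers reported in the summary (a minimal sketch; the paper's evaluation code is not shown):

```python
def backward_transfer(acc_after_all, acc_after_own):
    """Average backward transfer over previously learned tasks.

    acc_after_all[i]: performance on task i after training the final task
    acc_after_own[i]: performance on task i right after training task i
    Negative values indicate forgetting; 0 means none.
    """
    deltas = [a - b for a, b in zip(acc_after_all, acc_after_own)]
    return sum(deltas) / len(deltas)

# Sequential fine-tuning, Dice on T1 (values quoted above):
bwt_ft = backward_transfer([0.16], [0.80])    # about -0.64
# Frozen backbone + per-task LoRA: the T1 adapter is never touched
# while training T2, so T1 performance is unchanged and BWT is exactly 0.
bwt_lora = backward_transfer([0.62], [0.62])
```

This makes the "zero forgetting by design" claim mechanical rather than empirical: no parameter used by task T1 is ever updated again.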
Ablation studies reveal that placing LoRA adapters in both the encoder and the decoder is crucial for segmentation (Dice 0.50 vs 0.19 with encoder‑only adapters), while regression is relatively insensitive to placement. Increasing the number of shots improves both metrics (Dice 0.45/0.62/0.84 for 16/32/64 shots; MAE 0.33/0.16/0.001 respectively), confirming that performance scales with the few‑shot budget.
Limitations are acknowledged: segmentation performance lags behind full fine‑tuning, likely due to the limited capacity of low‑rank adapters; brain‑age predictions systematically underestimate age, possibly because missing ages in IXI were imputed to 50 years and the few‑shot regime hampers calibration. The authors also note that they did not evaluate an infarct‑detection task (ISLES 2022) due to dataset imbalance.
Overall, the study demonstrates that frozen foundation models combined with task‑specific LoRA adapters provide a practical solution for continual learning in medical imaging, offering near‑zero forgetting, extreme parameter efficiency, and the ability to add new tasks without revisiting old data. Future work could explore higher LoRA ranks, shared adapters across related tasks, and more sophisticated data augmentation to close the performance gap with full fine‑tuning while preserving the benefits of the proposed framework.