Catastrophic Forgetting in Kolmogorov-Arnold Networks

Reading time: 6 minutes

📝 Abstract

Catastrophic forgetting is a longstanding challenge in continual learning, where models lose knowledge from earlier tasks when learning new ones. While various mitigation strategies have been proposed for Multi-Layer Perceptrons (MLPs), recent architectural advances like Kolmogorov-Arnold Networks (KANs) have been suggested to offer intrinsic resistance to forgetting by leveraging localized spline-based activations. However, the practical behavior of KANs under continual learning remains unclear, and their limitations are not well understood. To address this, we present a comprehensive study of catastrophic forgetting in KANs and develop a theoretical framework that links forgetting to activation support overlap and intrinsic data dimension. We validate these analyses through systematic experiments on synthetic and vision tasks, measuring forgetting dynamics under varying model configurations and data complexity. Further, we introduce KAN-LoRA, a novel adapter design for parameter-efficient continual fine-tuning of language models, and evaluate its effectiveness in knowledge editing tasks. Our findings reveal that while KANs exhibit promising retention in low-dimensional algorithmic settings, they remain vulnerable to forgetting in high-dimensional domains such as image classification and language modeling. These results advance the understanding of KANs’ strengths and limitations, offering practical insights for continual learning system design.

📄 Content

Catastrophic forgetting, also known as catastrophic interference (McCloskey and Cohen 1989), is a fundamental challenge in machine learning: a neural network loses previously acquired information while learning from new data. The phenomenon is central to continual learning, where models are trained incrementally on nonstationary data distributions (Ven, Soures, and Kudithipudi 2024; Kemker et al. 2017). It is also prevalent in a wide range of research fields, such as meta-learning (Spigler 2020), domain adaptation (Xu et al. 2020), foundation models (Luo et al. 2025), and reinforcement learning (Zhang et al. 2023), where the retention of prior knowledge is critical for generalization and stability.

Multi-Layer Perceptrons (MLPs) are inherently prone to catastrophic forgetting (Zenke, Poole, and Ganguli 2017), and several techniques have been proposed to overcome it (Wang et al. 2025; De Lange et al. 2022). Regularization-based techniques (Kirkpatrick et al. 2017; Kong et al. 2024) restrict the network’s weight updates, reducing interference with previously acquired knowledge. Architecture-based methods (Yoon et al. 2018; Mirzadeh et al. 2022) mitigate forgetting by modifying the network’s architecture to accommodate new information. Rehearsal-based methods (Buzzega et al. 2020; Riemer et al. 2019) preserve prior information by replaying data samples from earlier learning sessions during the current one. Although catastrophic forgetting has been extensively studied in MLPs, it remains relatively underexplored in emerging neural architectures such as Kolmogorov-Arnold Networks (KANs) (Liu et al. 2025).
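As a concrete illustration of the regularization family, the sketch below shows an EWC-style quadratic penalty in the spirit of Kirkpatrick et al. (2017); the function name `ewc_penalty` and all numeric values are illustrative, not taken from any of the cited works:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    # EWC-style quadratic penalty: parameters that were important for the
    # previous task (large Fisher value) are anchored more strongly to
    # their post-task values theta_star.
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

theta_star = np.array([1.0, -0.5])  # parameters after task A
fisher = np.array([10.0, 0.1])      # per-parameter importance on task A
theta = np.array([1.2, 0.5])        # parameters drifting during task B
penalty = ewc_penalty(theta, theta_star, fisher)  # added to task B's loss
```

The first parameter has moved only 0.2 but carries a large importance weight, so it dominates the penalty; the second has moved 1.0 yet contributes little, which is exactly the asymmetry that protects task-A knowledge.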

KANs, inspired by the Kolmogorov-Arnold representation theorem (Kolmogorov 1961), have emerged as a promising alternative to traditional MLPs. They were introduced to address several fundamental limitations of MLPs: unlike MLPs, which rely on fixed activation functions, KANs place learnable one-dimensional activation functions (splines) on the edges of the network. Splines can be adjusted locally and are accurate for low-dimensional functions, giving KANs the potential to avoid forgetting: because spline bases are local, a data sample affects only a few related spline coefficients, leaving the others unaltered. This architecture enables KANs to learn non-linear relations effectively and to be more robust against catastrophic forgetting in continual learning scenarios (Lee et al. 2025). KANs have been successfully applied in various domains (Yang and Wang 2025; Abd Elaziz, Ahmed Fares, and Aseeri 2024), yet studies of their effectiveness in mitigating catastrophic forgetting in continual learning remain limited.
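The locality argument can be sketched numerically. Below, a single KAN edge is modeled as a linear combination of degree-1 B-spline (hat) basis functions, and one gradient step on a single sample moves only the coefficients whose basis support contains that sample; the grid size, sample, and learning rate are all illustrative:

```python
import numpy as np

def hat_basis(x, knots):
    # Degree-1 B-spline (hat) basis: each function is nonzero only within
    # one grid spacing of its knot, so its support is local.
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x - knots) / h)

knots = np.linspace(-1.0, 1.0, 11)  # uniform grid on [-1, 1]
coef = np.zeros_like(knots)         # spline coefficients of one KAN edge

# One squared-error gradient step on a single sample (x, y): only bases
# whose support contains x are active, so only those coefficients change.
x, y, lr = 0.03, 1.0, 0.5
phi = hat_basis(x, knots)
coef += lr * (y - coef @ phi) * phi

touched = np.nonzero(coef)[0]       # indices of updated coefficients
```

Only the two basis functions straddling x = 0.03 are updated; the other nine coefficients, which may encode an earlier task, stay exactly zero. A global basis (as in an MLP neuron) would instead spread the update across all parameters.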

Only a few pioneering works have studied catastrophic forgetting in KANs under continual learning settings. Lee et al. recently proposed a simple heuristic strategy, WiseKAN, which allocates distinct parameter subspaces to different tasks to mitigate catastrophic forgetting in KANs. Liu et al. demonstrated the robustness of KANs against catastrophic forgetting using synthetic data on regression tasks. Furthermore, some studies have proposed modified KANs to achieve robust retention in specific domains, such as classification (Hu et al. 2025) and face forgery detection (Zhang et al. 2025). Despite these initial efforts, a comprehensive understanding of forgetting in KANs remains elusive, particularly in terms of theoretical characterization and empirical evaluation on practical real-world tasks.

To bridge this gap, we first develop a theoretical framework for understanding catastrophic forgetting in KANs by formalizing several key factors, such as activation support overlap and intrinsic data dimension. Our analysis reveals that forgetting in KANs scales linearly with activation support overlap and grows exponentially with the intrinsic dimensionality of the task manifold, offering a principled explanation for KANs’ robustness on simple tasks and vulnerability in complex domains. Building on these insights, we conduct extensive empirical experiments comparing KANs with MLPs across a spectrum of tasks, from low-dimensional synthetic addition to high-dimensional image classification. Furthermore, we design a novel KAN-based LoRA (Hu et al. 2022) adapter, termed KAN-LoRA, to enable continual fine-tuning of language models (LMs) for sequential knowledge editing. Across all experimental settings, our results consistently corroborate the theoretical analysis: while KANs achieve strong retention on structured, low-dimensional tasks, they remain susceptible to forgetting in high-dimensional domains, highlighting both the strengths and limitations of KANs in practical continual learning scenarios. Our main contributions are summarized below:

• We develop a theoretical
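The role of activation support overlap in the theoretical analysis can be illustrated with a toy proxy: the fraction of spline basis functions whose local support receives samples from both tasks. The helper name `support_overlap` and the task distributions below are illustrative constructions, not the paper's actual metric:

```python
import numpy as np

def support_overlap(xs_a, xs_b, knots):
    # Fraction of basis functions hit by samples from both tasks -- a toy
    # proxy for activation support overlap on a 1-D spline grid.
    h = knots[1] - knots[0]
    hit_a = np.any(np.abs(xs_a[:, None] - knots[None, :]) < h, axis=0)
    hit_b = np.any(np.abs(xs_b[:, None] - knots[None, :]) < h, axis=0)
    return np.mean(hit_a & hit_b)

knots = np.linspace(-1.0, 1.0, 11)
task_a = np.linspace(-1.0, -0.5, 50)  # task A inputs on the left
task_b = np.linspace(0.5, 1.0, 50)    # task B inputs on the right
task_c = np.linspace(-1.0, 1.0, 50)   # task C spans the whole range

disjoint = support_overlap(task_a, task_b, knots)  # no shared support
shared = support_overlap(task_a, task_c, knots)    # left half is shared
```

When task supports are disjoint the proxy is zero and, per the linear-scaling result, spline updates for the new task cannot interfere with the old one; as the supports increasingly coincide (as they tend to in high-dimensional inputs projected onto each edge), the overlap grows and so does the predicted forgetting.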

This content is AI-processed based on ArXiv data.
