A hybrid Kolmogorov-Arnold network for medical image segmentation
Medical image segmentation plays a vital role in diagnosis and treatment planning, but remains challenging due to the inherent complexity and variability of medical images, especially in capturing non-linear relationships within the data. We propose U-KABS, a novel hybrid framework that integrates the expressive power of Kolmogorov-Arnold Networks (KANs) with a U-shaped encoder-decoder architecture to enhance segmentation performance. The U-KABS model combines two stages: a convolutional squeeze-and-excitation stage, which enhances channel-wise feature representations, and a KAN Bernstein Spline (KABS) stage, which employs learnable activation functions based on Bernstein polynomials and B-splines. This hybrid design leverages the global smoothness of Bernstein polynomials and the local adaptability of B-splines, enabling the model to effectively capture both broad contextual trends and fine-grained patterns critical for delineating complex structures in medical images. Skip connections between encoder and decoder layers support effective multi-scale feature fusion and preserve spatial details. Evaluated across diverse medical imaging benchmark datasets, U-KABS demonstrates superior performance compared to strong baselines, particularly in segmenting complex anatomical structures.
💡 Research Summary
The paper introduces U‑KABS, a novel hybrid architecture for medical image segmentation that merges the expressive power of Kolmogorov‑Arnold Networks (KANs) with a classic U‑shaped encoder‑decoder design. The authors begin by reviewing the limitations of existing approaches: fully convolutional networks such as U‑Net and its many variants excel at capturing local spatial patterns but struggle with global context; transformer‑based models (e.g., MedT, Swin‑Unet) can model long‑range dependencies but are computationally heavy and require large annotated datasets; MLP‑based lightweight models are efficient but lack sufficient non‑linearity for complex boundaries; and recent KAN‑based methods (U‑KAN, ResU‑KAN) introduce learnable activation functions but rely solely on B‑splines, limiting global smoothness.
U‑KABS addresses these gaps by introducing a two‑stage processing pipeline within each encoder and decoder block. The first stage, called ConvSE, consists of a convolution, batch normalization, ReLU, and a squeeze‑and‑excitation (SE) module, which enhances channel‑wise feature importance and captures fine‑grained edges. The second stage, the KAN Bernstein‑Spline (KABS) block, incorporates two distinct KAN layers: a KAB layer that uses Bernstein polynomial‑based activation functions and a KAS layer that employs locally supported B‑splines. The Bernstein‑based activations provide globally smooth approximations, ensuring continuity and stability across the entire input domain, while the B‑spline activations enable rapid, localized adjustments for subtle tissue boundaries.
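The contrast between the two activation families can be made concrete with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the coefficient values, the uniform knot vector, and the assumption that inputs have already been mapped into the basis domain are all illustrative choices. Degree 4 is used for the Bernstein basis because the summary reports a fixed order of four.

```python
from math import comb


def bernstein_basis(x, degree):
    """Values of the degree+1 Bernstein basis polynomials at x in [0, 1]."""
    return [comb(degree, k) * x**k * (1 - x) ** (degree - k)
            for k in range(degree + 1)]


def bspline_basis(x, knots, degree):
    """Values of all B-spline basis functions at x (Cox-de Boor recursion)."""
    # Degree 0: indicator functions of the half-open knot intervals.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            nxt.append(left + right)
        basis = nxt
    return basis


def evaluate(coeffs, basis_values):
    """A learnable activation is a weighted sum of basis values."""
    return sum(c * b for c, b in zip(coeffs, basis_values))


# Hypothetical coefficients; in a KAN layer these would be trained.
kab_coeffs = [0.0, 0.5, -0.2, 0.8, 1.0]           # 5 coefficients for degree 4
kas_coeffs = [0.1, -0.3, 0.7, 0.2, 0.0, 0.4, -0.1]  # 7 cubic bases on 11 knots
knots = list(range(11))                            # assumed uniform knot vector

b_vals = bernstein_basis(0.3, 4)
s_vals = bspline_basis(3.5, knots, 3)

print("Bernstein activation:", evaluate(kab_coeffs, b_vals))
print("B-spline activation:", evaluate(kas_coeffs, s_vals))
print("non-zero Bernstein terms:", sum(v > 0 for v in b_vals))  # all 5
print("non-zero B-spline terms:", sum(v > 0 for v in s_vals))   # 4 of 7
```

Every Bernstein term is non-zero at an interior point, so changing one coefficient reshapes the activation over the whole domain, while only degree + 1 B-spline terms are active at any input, so a coefficient change stays local. That is the global versus local trade-off the KAB and KAS layers are described as combining.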
The KABS block also includes tokenization (splitting feature maps into flattened patches and projecting them into an embedding space), a depth‑wise convolution for spatial pattern extraction, a residual connection to preserve original features, and layer normalization for stable training. The encoder stacks three ConvSE blocks followed by two KABS blocks, progressively downsampling the image and doubling channel depth at each step. The decoder mirrors this structure, using bilinear up‑sampling and skip connections to fuse high‑resolution encoder features with decoder representations, and produces the final pixel‑wise segmentation mask via a 1×1 convolution.
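The patch-splitting half of the tokenization step can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the `tokenize` helper, the nested-list feature map, and the patch size of 2 are hypothetical, and the learned projection into the embedding space is deliberately left out.

```python
def tokenize(feature_map, patch):
    """Split an H x W x C feature map (nested lists) into flattened patch tokens."""
    height, width = len(feature_map), len(feature_map[0])
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = []
            for row in range(top, top + patch):
                for col in range(left, left + patch):
                    token.extend(feature_map[row][col])  # all C channel values
            tokens.append(token)
    return tokens


# 4 x 4 map with 2 channels; each pixel stores [row, col] so tokens are readable.
fmap = [[[r, c] for c in range(4)] for r in range(4)]
tokens = tokenize(fmap, patch=2)

print(len(tokens), len(tokens[0]))  # 4 tokens, each of length 2 * 2 * 2 = 8
print(tokens[0])                    # [0, 0, 0, 1, 1, 0, 1, 1]
```

Each token mixes all pixels of its patch into one vector, which is why a larger patch size shortens the token sequence at the cost of within-patch spatial detail.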
Experimental validation was performed on four diverse benchmark datasets: ISIC 2018 skin lesions, LiTS liver CT, BraTS 2021 brain MRI, and DRIVE retinal vessels. Across all datasets, U‑KABS achieved higher Dice scores (an improvement of 2–4 percentage points) and lower Hausdorff distances (a reduction of ≈15%) than strong baselines such as U‑Net, Attention U‑Net, Swin‑Unet, U‑KAN, and ResU‑KAN. The model has roughly 12 million parameters and requires 8.5 GFLOPs, making it about 30% smaller and faster than comparable transformer‑based methods; inference time on an RTX 3090 is approximately 45 ms per image, supporting near‑real‑time clinical use.
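For readers unfamiliar with the two reported metrics, the sketch below computes them on tiny binary masks. It uses the textbook definitions and makes two assumptions that the summary does not confirm: the Hausdorff distance is taken over all foreground pixels rather than boundary pixels only, and it is the maximum distance rather than a percentile variant such as HD95.

```python
from math import dist


def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested 0/1 lists."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    intersection = sum(a * b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def foreground(mask):
    """Coordinates (row, col) of all non-zero pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between the foreground pixel sets."""
    a, b = foreground(pred), foreground(target)
    if not a or not b:
        raise ValueError("both masks need at least one foreground pixel")

    def directed(src, dst):
        # Worst case over src of the distance to the nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]

print(round(dice_score(pred, target), 4))  # 0.6667 (2 * 2 / (3 + 3))
print(hausdorff_distance(pred, target))    # 1.0
```

Dice rewards overlap in area, while the Hausdorff distance penalizes the single worst boundary error, so the two metrics together cover both bulk agreement and outlier mistakes.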
The authors acknowledge certain limitations: the Bernstein polynomial order is fixed at four, which may restrict representation of highly abrupt intensity changes; tokenization may cause loss of fine spatial detail depending on patch size; and the current implementation focuses on 2‑D slices, leaving 3‑D volumetric extension as future work.
In summary, U‑KABS demonstrates that integrating learnable KAN activation functions, which combine globally smooth Bernstein polynomials with locally adaptive B‑splines, within a U‑Net‑style architecture yields a powerful, computationally efficient solution for medical image segmentation. The approach balances global context modeling and local detail preservation, outperforms the CNN, transformer, and KAN‑based baselines it was compared against, and opens avenues for further research into multi‑scale tokenization, 3‑D extensions, and deployment in resource‑constrained clinical environments.