DAUNet: A Lightweight UNet Variant with Deformable Convolutions and Parameter-Free Attention for Medical Image Segmentation


📝 Original Info

  • Title: DAUNet: A Lightweight UNet Variant with Deformable Convolutions and Parameter-Free Attention for Medical Image Segmentation
  • ArXiv ID: 2512.07051
  • Date: 2025-12-07
  • Authors: Adnan Munir, Muhammad Shahid Jabbar, Shujaat Khan

📝 Abstract

Medical image segmentation plays a pivotal role in automated diagnostic and treatment planning systems. In this work, we present DAUNet, a novel lightweight UNet variant that integrates Deformable V2 Convolutions and Parameter-Free Attention (SimAM) to improve spatial adaptability and context-aware feature fusion without increasing model complexity. DAUNet's bottleneck employs dynamic deformable kernels to handle geometric variations, while the decoder and skip pathways are enhanced using SimAM attention modules for saliency-aware refinement. Extensive evaluations on two challenging datasets, FH-PS-AoP (fetal head and pubic symphysis ultrasound) and FUMPE (CT-based pulmonary embolism detection), demonstrate that DAUNet outperforms state-of-the-art models in Dice score, HD95, and ASD, while maintaining superior parameter efficiency. Ablation studies highlight the individual contributions of deformable convolutions and SimAM attention. DAUNet's robustness to missing context and low-contrast regions establishes its suitability for deployment in real-time and resource-constrained clinical environments.

💡 Deep Analysis

Figure 1

📄 Full Content

Medical image segmentation is a foundational task in computer-assisted diagnosis, enabling the precise localization and delineation of anatomical structures that are critical for clinical interpretation, surgical planning, and disease monitoring. Accurate and automated segmentation reduces manual effort and inter-observer variability, particularly in high-throughput clinical settings. Despite significant advances achieved through convolutional neural networks (CNNs), especially the widely adopted UNet architecture [1], key challenges persist, most notably in achieving robustness, handling anatomical variability, and maintaining computational efficiency.

Although effective in many scenarios, the classical UNet architecture presents several limitations. Its use of fixed-grid convolutions restricts adaptability to variable-sized features and irregular organ boundaries. Recent deformable-convolution segmentation networks have also emphasized boundary-aware modeling to better align predictions with anatomical contours [2]. Additionally, UNet often struggles in low-contrast or noisy environments, common in modalities such as ultrasound [3], [4] and CT angiography, where anatomical boundaries are not clearly visible [5], [6]. Moreover, UNet lacks mechanisms to capture long-range dependencies, which are crucial for modeling global context in complex medical images.
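To make the fixed-grid limitation concrete: a standard 3x3 convolution always samples the same nine integer positions around each output location, whereas a deformable tap adds a learned fractional offset per position and reads the input via bilinear interpolation. The following is a minimal NumPy sketch of that sampling idea only, not the paper's implementation; the function names, the single-channel shapes, and the hand-fed offsets are illustrative assumptions (deformable convolution v2 also predicts a per-tap modulation scalar, included here as a plain multiplier).

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample a 2-D array at a fractional (y, x) location, zero outside."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < h and 0 <= xx < w:
                # bilinear weights shrink with distance from (y, x)
                val += img[yy, xx] * (1 - abs(y - yy)) * (1 - abs(x - xx))
    return val

def deformable_3x3(img, weights, offsets, modulation, cy, cx):
    """One deformable-v2-style 3x3 response at (cy, cx): each of the nine
    taps is shifted by a learned (dy, dx) offset and scaled by a learned
    modulation scalar before the usual weighted sum."""
    out, k = 0.0, 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            out += weights[k] * modulation[k] * bilinear_sample(
                img, cy + ky + dy, cx + kx + dx)
            k += 1
    return out

# Sanity check: with zero offsets and unit modulation this reduces to a
# plain 3x3 correlation with all-ones weights.
img = np.arange(25, dtype=float).reshape(5, 5)
resp = deformable_3x3(img, np.ones(9), np.zeros((9, 2)), np.ones(9), 2, 2)
# equals img[1:4, 1:4].sum() = 108.0
```

In a trained network the offsets and modulation come from a small side branch conditioned on the input, which is what lets the receptive field bend toward irregular anatomical boundaries.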

To overcome these shortcomings, recent works have explored enhancements to UNet via transformer-based modules and attention mechanisms [7], [8]. For instance, Masoudi et al. [7] proposed FAT-Net, which augments a UNet-style backbone with transformer branches to capture long-range interactions and feature adaptation modules to suppress background noise. Similarly, Zhang et al. [9] introduced TransAttUNet, incorporating a Self-Aware Attention (SAA) module that integrates Transformer Self-Attention (TSA) and Global Spatial Attention (GSA) to improve multi-scale feature fusion. Other methods such as DSEUNet [10] and MISSFormer [11] attempt to bridge CNN and transformer paradigms. DSEUNet deepens the UNet backbone while introducing Squeeze-and-Excitation (SE) blocks [12] and hierarchical supervision. MISSFormer, on the other hand, employs enhanced transformer blocks and multi-scale fusion to balance local and global feature representation.

General-purpose models such as MedSAM [13], adapted from the Segment Anything Model (SAM), offer prompt-based segmentation across various modalities. Trained on over 1.5 million image-mask pairs, MedSAM shows strong performance on CT, MRI, and endoscopy images. However, limitations remain due to the underrepresentation of certain modalities (e.g., mammography) and imprecise vessel boundary segmentation when using bounding-box prompts.

While the aforementioned models demonstrate commendable segmentation performance, they often suffer from high computational complexity and slower inference, limiting their suitability for real-time or resource-constrained environments.

Hybrid models like H2Former [14] and SCUNet++ [15] further aim to unify the strengths of CNNs and transformers. H2Former leverages hierarchical token-wise and channel-wise attention to model both local and global dependencies. SCUNet++ integrates CNN bottlenecks and dense skip connections to improve pulmonary embolism (PE) segmentation. Although SCUNet++ achieves high Dice scores on PE datasets, it tends to produce blocky segmentation outputs on large lesions and has a substantial parameter burden. Other methods, such as CE-Net [16], augment UNet with Dense Atrous Convolution (DAC) and Residual Multi-kernel Pooling (RMP) blocks to improve feature representation, but their multi-branch architectures increase memory requirements and limit scalability.

Motivated by the need for efficient, adaptable, and robust segmentation models suitable for real-world clinical deployment, we propose DAUNet, a lightweight and effective UNet-based architecture featuring two key innovations:

  • Improved Bottleneck: A lightweight deformable convolution-based bottleneck module [17], [18] that introduces dynamic, spatially adaptive receptive fields. This design enables the model to better capture geometric deformations and irregular anatomical boundaries.
  • Improved Decoder: A parameter-free attention mechanism (SimAM) [19] is integrated into the decoder and skip connections to enhance spatial feature representation and facilitate efficient feature fusion, without increasing model complexity.

To demonstrate its effectiveness, we evaluate DAUNet on two challenging medical image segmentation tasks: (1) fetal head and pubic symphysis segmentation from transperineal ultrasound using the FH-PS-AoP dataset [20], and (2) pulmonary embolism detection in CT angiography using the FUMPE dataset [21]. Both tasks are characterized by substantial anatomical variability, low-contrast regions, and limited contextual information, factors that commonly impair the performance of conventional models.
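The "parameter-free" claim for SimAM can be seen directly in its closed form: each neuron's attention weight comes from an energy function built only from the per-channel mean and variance of the feature map, so no learnable weights are added. Below is a minimal NumPy sketch of the published SimAM formulation (the `(C, H, W)` shape and the default regularizer `lam = 1e-4` are illustrative assumptions, not values taken from this paper):

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention on a feature map x of shape (C, H, W).

    Each neuron t gets an inverse-energy score measuring how much it stands
    out from the other neurons in its channel; the score is passed through a
    sigmoid and used to gate the input. No learnable parameters involved.
    """
    c, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)        # per-channel mean
    d = (x - mu) ** 2                              # squared deviation per neuron
    v = d.sum(axis=(1, 2), keepdims=True) / n      # per-channel variance
    e_inv = d / (4.0 * (v + lam)) + 0.5            # inverse energy, >= 0.5
    return x * (1.0 / (1.0 + np.exp(-e_inv)))      # sigmoid gating

feat = np.random.default_rng(0).normal(size=(4, 8, 8))
out = simam(feat)   # same shape as the input, salient neurons damped least
```

Because the gate is a sigmoid in (0, 1), SimAM can only rescale features, never amplify them; distinctive neurons (large deviation from the channel mean) receive gates closest to 1, which is the saliency-aware refinement the decoder and skip pathways rely on.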


Reference

This content is AI-processed based on open access ArXiv data.
