Nodule-DETR: A Novel DETR Architecture with Frequency-Channel Attention for Ultrasound Thyroid Nodule Detection

Thyroid cancer is the most common endocrine malignancy, and its incidence is rising globally. While ultrasound is the preferred imaging modality for detecting thyroid nodules, its diagnostic accuracy is often limited by low image contrast and blurred nodule boundaries. To address these issues, we propose Nodule-DETR, a novel detection transformer (DETR) architecture designed for robust thyroid nodule detection in ultrasound images. Nodule-DETR introduces three key innovations: a Multi-Spectral Frequency-domain Channel Attention (MSFCA) module that leverages frequency analysis to enhance features of low-contrast nodules; a Hierarchical Feature Fusion (HFF) module for efficient multi-scale integration; and Multi-Scale Deformable Attention (MSDA) to flexibly capture small and irregularly shaped nodules. We conducted extensive experiments on a clinical dataset of real-world thyroid ultrasound images. The results demonstrate that Nodule-DETR achieves state-of-the-art performance, outperforming the baseline model by 0.149 in mAP@0.5:0.95. This accuracy highlights the potential of Nodule-DETR for clinical application as an effective tool in computer-aided thyroid diagnosis. The code for this work is available at https://github.com/wjj1wjj/Nodule-DETR.


💡 Research Summary

Thyroid cancer is the most prevalent endocrine malignancy worldwide, and high‑resolution ultrasound remains the first‑line imaging modality for its detection. However, ultrasound images suffer from low contrast, blurred nodule boundaries, and speckle noise, which limit both radiologists’ visual assessment and the performance of conventional computer‑aided detection (CAD) systems. In response to these challenges, the authors propose Nodule‑DETR, a novel detection‑transformer architecture specifically engineered for robust thyroid nodule detection in clinical ultrasound scans.
The core of Nodule‑DETR consists of three complementary modules.
First, the Multi‑Spectral Frequency‑domain Channel Attention (MSFCA) module transforms the input image into the frequency domain using a 2‑D Fourier transform, separates low‑ and high‑frequency components, and learns channel‑wise attention weights for each spectral band. By emphasizing frequency bands that highlight subtle texture and edge cues, MSFCA amplifies the representation of low‑contrast nodules while suppressing background speckle.
Second, the Hierarchical Feature Fusion (HFF) module aggregates multi‑level features extracted from a hybrid CNN‑ViT backbone. Features from four scales are progressively up‑sampled or down‑sampled, aligned in channel dimension, and combined through scale‑aware attention weighting. This hierarchical fusion preserves fine details of small nodules and contextual information of larger lesions without incurring excessive computational cost.
Third, the Multi‑Scale Deformable Attention (MSDA) module extends the deformable attention mechanism of Deformable DETR by introducing a learnable scale factor for each attention head. The scale factor dynamically adjusts the density and spatial extent of sampling points, enabling the model to focus on both tiny, irregularly shaped nodules and larger, well‑defined ones. Offsets predicted by a lightweight network are multiplied by the scale factor, producing multi‑scale deformable attention maps that are robust to the typical geometric distortions of ultrasound imaging.
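The frequency-domain channel attention described for MSFCA can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the radial `cutoff` that splits the spectrum into a low and a high band, the two-layer gating network, the random placeholder weights, and the averaging of the two band-wise attention vectors are all assumptions made for illustration.

```python
import numpy as np

def band_descriptors(x, cutoff=0.25):
    """Per-channel mean spectral magnitude in a low and a high frequency band.

    x: feature map of shape (C, H, W).
    cutoff: radial frequency (cycles/sample) separating the bands (assumed value).
    """
    _, h, w = x.shape
    # 2-D FFT per channel, shifted so the zero frequency sits at the center.
    spec = np.abs(np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1)))
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    low = np.sqrt(fy ** 2 + fx ** 2) <= cutoff   # boolean mask of shape (H, W)
    # Boolean indexing flattens the spatial axes: result is (C, n_selected).
    return spec[:, low].mean(axis=1), spec[:, ~low].mean(axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_gate(desc, w1, w2):
    """Squeeze-and-excitation style gate: linear, ReLU, linear, sigmoid."""
    hidden = np.maximum(desc @ w1, 0.0)
    return sigmoid(hidden @ w2)

def frequency_channel_attention(x, params, cutoff=0.25):
    """Reweight channels using attention computed separately per spectral band."""
    low_desc, high_desc = band_descriptors(x, cutoff)
    a_low = channel_gate(low_desc, *params["low"])
    a_high = channel_gate(high_desc, *params["high"])
    weights = 0.5 * (a_low + a_high)  # assumed way of combining the two bands
    return x * weights[:, None, None], weights

# Toy usage with random features and random (untrained) gate weights.
rng = np.random.default_rng(0)
C, H, W, R = 8, 16, 16, 4  # R is a hypothetical channel-reduction ratio
x = rng.standard_normal((C, H, W))
params = {
    band: (0.1 * rng.standard_normal((C, C // R)),
           0.1 * rng.standard_normal((C // R, C)))
    for band in ("low", "high")
}
out, weights = frequency_channel_attention(x, params)
```

In a trained model the gate weights would be learned end to end, so channels whose spectral content helps separate nodules from speckle would receive weights closer to 1.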
Training follows the standard DETR pipeline with Hungarian bipartite matching between predicted boxes and ground‑truth annotations. To prevent the frequency attention from dominating the loss, an L2 regularization term on the MSFCA weights is added. Data augmentation includes rotation, scaling, Gaussian noise injection, and intensity jitter to simulate the wide variability of clinical acquisition settings.
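The bipartite matching step can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the cost combines a class score, an L1 box distance, and plain IoU with illustrative weights (`w_cls`, `w_l1`, `w_iou` are hypothetical names and values), and a brute-force search over permutations stands in for the Hungarian algorithm, which finds the same minimum-cost assignment far more efficiently.

```python
from itertools import permutations

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_cost(pred_box, pred_score, gt_box, w_cls=1.0, w_l1=5.0, w_iou=2.0):
    """Lower is better: confident, close, well-overlapping predictions are cheap."""
    l1 = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    return -w_cls * pred_score + w_l1 * l1 - w_iou * box_iou(pred_box, gt_box)

def bipartite_match(pred_boxes, pred_scores, gt_boxes):
    """Return sorted (pred_idx, gt_idx) pairs minimizing the total cost.

    Assumes there are at least as many predictions as ground-truth boxes.
    Exhaustive search: only suitable for tiny examples.
    """
    n_pred, n_gt = len(pred_boxes), len(gt_boxes)
    cost = [[match_cost(pred_boxes[i], pred_scores[i], gt_boxes[j])
             for j in range(n_gt)] for i in range(n_pred)]
    best, best_total = (), float("inf")
    for perm in permutations(range(n_pred), n_gt):
        # perm[j] is the prediction assigned to ground-truth box j.
        total = sum(cost[i][j] for j, i in enumerate(perm))
        if total < best_total:
            best, best_total = perm, total
    return sorted((i, j) for j, i in enumerate(best))

# Toy example: three predictions, two annotated nodules (normalized coordinates).
preds = [(0.10, 0.10, 0.30, 0.30), (0.50, 0.50, 0.90, 0.90), (0.00, 0.00, 0.05, 0.05)]
scores = [0.9, 0.8, 0.1]
gts = [(0.52, 0.48, 0.88, 0.92), (0.12, 0.10, 0.32, 0.28)]
matches = bipartite_match(preds, scores, gts)
print(matches)  # [(0, 1), (1, 0)]
```

Predictions left unmatched (index 2 here) are treated as "no object" when the loss is computed. For realistic numbers of queries, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` is the usual choice.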
The authors evaluated Nodule‑DETR on a curated dataset of 2,500 real‑world thyroid ultrasound images collected from multiple hospitals, encompassing a broad range of scanner models, imaging angles, and patient demographics. Performance was measured using COCO‑style metrics: mAP@

