Dual-Stream Spectral Decoupling Distillation
for Remote Sensing Object Detection
Xiangyi Gao, Danpei Zhao*, Member, IEEE, Bo Yuan, Wentao Li
Abstract—Knowledge distillation is an effective and
hardware-friendly method that plays a key role in making remote
sensing object detectors lightweight. However, existing distillation
methods often encounter mixed features in remote sensing images
(RSIs) and neglect the discrepancies caused by subtle feature
variations, which leads to confusion from entangled knowledge. To
address these challenges, we propose an architecture-agnostic
distillation method named Dual-Stream Spectral Decoupling
Distillation (DS2D2) for universal remote sensing object detection
tasks. Specifically, DS2D2 integrates explicit and implicit
distillation grounded in spectral decomposition. First, a first-order
wavelet transform is applied for spectral decomposition to preserve
the critical spatial characteristics of RSIs. Leveraging this
spatial preservation, a Density-Independent Scale Weight (DISW)
is designed to address the challenges of dense and small object
detection common in RSIs. Second, we show that implicit knowledge
is hidden in subtle student-teacher feature discrepancies, which
significantly influence predictions when activated by detection
heads. This implicit knowledge is extracted via full-frequency
and high-frequency amplifiers, which map feature differences
to prediction deviations. Extensive experiments on the DIOR and
DOTA datasets validate the effectiveness of the proposed method.
Specifically, on the DIOR dataset, DS2D2 achieves improvements of
4.2% in AP50 for RetinaNet and 3.8% in AP50 for Faster R-CNN,
outperforming existing distillation approaches. The source code
will be available at https://github.com/PolarAid/DS2D2.
Index Terms—Knowledge distillation, object detection, remote
sensing images, spectral decomposition.
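To make the spectral decomposition step concrete, the following minimal sketch applies a single-level 2D wavelet transform to one feature channel and compares teacher and student features per subband. It is an illustration under stated assumptions, not the DS2D2 implementation: the Haar basis, the function names `haar_dwt2` and `subband_mse`, and the plain per-subband mean squared error are all hypothetical choices, since this section specifies neither the wavelet basis nor the loss. The sketch does show the property the abstract relies on: each subband is a half-resolution map that keeps the spatial layout of the input.

```python
def haar_dwt2(x):
    """Single-level 2D Haar transform of a 2D list (even height and width).

    Returns (low, detail_w, detail_h, detail_d). Each subband has shape
    (H/2, W/2) and position (i, j) summarizes the 2x2 input block at
    (2i, 2j), so the spatial layout is preserved.
    Assumption: Haar is used only for illustration.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    low, detail_w, detail_h, detail_d = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # local average (low frequency)
            rows[1].append((a - b + c - d) / 2)  # difference along the width
            rows[2].append((a + b - c - d) / 2)  # difference along the height
            rows[3].append((a - b - c + d) / 2)  # diagonal difference
        for band, row in zip((low, detail_w, detail_h, detail_d), rows):
            band.append(row)
    return low, detail_w, detail_h, detail_d


def subband_mse(teacher, student):
    """Hypothetical per-subband feature gap: MSE between teacher and student."""
    names = ("low", "detail_w", "detail_h", "detail_d")
    losses = {}
    for name, t, s in zip(names, haar_dwt2(teacher), haar_dwt2(student)):
        sq = [(tv - sv) ** 2 for tr, sr in zip(t, s) for tv, sv in zip(tr, sr)]
        losses[name] = sum(sq) / len(sq)
    return losses


if __name__ == "__main__":
    teacher = [[1, 2], [3, 4]]
    student = [[1, 2], [3, 5]]
    print(haar_dwt2(teacher))  # ([[5.0]], [[-1.0]], [[-2.0]], [[0.0]])
    print(subband_mse(teacher, student))
```

Keeping the subbands separate lets low-frequency and high-frequency gaps be weighted independently, which is the general idea behind decoupled distillation. In practice an array library would replace the nested lists used here for self-containment.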
I. INTRODUCTION
The rapid development of object detection algorithms has
significantly enhanced information extraction capabilities
in remote sensing images (RSIs). Existing advanced methods
with complex structural designs [1], [2] suffer from slow
inference speeds and face deployment challenges on
hardware-constrained platforms. As shown in Figure 1, RSIs typically
cover diverse and complex scenes, where small objects are
often obscured. This imposes additional challenges on detection
methods, requiring more sophisticated discrimination and
processing techniques to accurately extract critical features,
thereby increasing the difficulty of lightweight optimization.
To address the conflict between the massive streams of remote
sensing data and the demand for rapid interpretation, numerous
lightweight methods have been proposed. They primarily
include efficient architecture design [3], [4], [5], pruning [6],
[7], quantization [8], [9], and knowledge distillation [10], [11].
Among these, knowledge distillation has emerged as a dominant
lightweight paradigm widely adopted in remote sensing
tasks due to its deployment efficiency, robust performance, and
hardware adaptability.

Manuscript created Mar 20, 2025; revised July 22, 2025; accepted
August 14, 2025. This work was supported by the National Natural Science
Foundation of China under Grant 62271018 and in part by the Academic
Excellence Foundation of BUAA for PhD Students. (Corresponding author:
Danpei Zhao.)
Xiangyi Gao, Danpei Zhao, Bo Yuan, and Wentao Li are with the
Department of Aerospace Intelligent Science and Technology, School of
Astronautics, Beihang University, Beijing 102206, China, and also with
Key Laboratory of Spacecraft Design Optimization and Dynamic Simulation
Technology, Ministry of Education (e-mail: gaoxiangyi23@buaa.edu.cn,
zhaodanpei@buaa.edu.cn, yuanbobuaa@buaa.edu.cn, canoe@buaa.edu.cn).

[Figure 1, two panels. (a) Conventional Feature Distillation. Labels:
Teacher, Student, Feats, Small and Dense, Diverse and Complex,
Confused Feature. (b) Dual-Stream Spectral Decoupling Distillation
(DS2D2). Labels: Teacher, Student, SD-Feats (Spectral Decoupled
Features), Wavelet, Amplifier, Low, High, Precise Localization,
Concealed Candidates, Key Region Recognition, Global Modeling,
Small and Dense, Diverse and Complex.]
Fig. 1. An overview of conventional feature distillation versus our DS2D2.
Conventional methods struggle with semantic confusion and neglect
implicit knowledge. We employ wavelet transforms for spectral decomposition.
In addition, combining explicit and implicit distillation enables
comprehensive learning.
Knowledge distillation was first proposed by Hinton et al. in 2015
[12] and has since been extensively studied for remote sensing
applications. As a lightweight method, knowledge distillation
does not reduce the computational cost of the model itself.
Instead, it improves the student model's accuracy while keeping
its computational cost unchanged. The accuracy obtained per unit
of computation therefore increases, which achieves overall
lightweighting. Related methods [10], [11], [13], [14] devise various
feature-weighting