AUDRON: An Audio-based Drone Recognition Network
📝 Abstract
Unmanned aerial vehicles (UAVs), commonly known as drones, are increasingly used across diverse domains, including logistics, agriculture, surveillance, and defense. While these systems provide numerous benefits, their misuse raises safety and security concerns, making effective detection mechanisms essential. Acoustic sensing offers a low-cost and non-intrusive alternative to vision- or radar-based detection, as drone propellers generate distinctive sound patterns. This study introduces AUDRON (AUdio-based Drone Recognition Network), a hybrid deep learning framework for drone sound detection that combines Mel-Frequency Cepstral Coefficients (MFCC), Short-Time Fourier Transform (STFT) spectrograms processed with convolutional neural networks (CNNs), recurrent layers for temporal modeling, and autoencoder-based representations. Feature-level fusion integrates this complementary information before classification. Experimental evaluation demonstrates that AUDRON effectively differentiates drone acoustic signatures from background noise, achieving high accuracy while maintaining generalizability across varying conditions. AUDRON achieves 98.51% and 97.11% accuracy in binary and multiclass classification, respectively. The results highlight the advantage of combining multiple feature representations with deep learning for reliable acoustic drone detection, suggesting the framework’s potential for deployment in security and surveillance applications where visual or radar sensing may be limited.
📄 Content
The rapid advancement of drone technology has transformed multiple industries, enabling innovative applications in logistics, precision agriculture, infrastructure inspection, environmental monitoring, and security [1]. Modern UAVs are increasingly compact, agile, and affordable, which allows for flexible deployment but also increases the risk of misuse [2]. Unauthorized drones can intrude into restricted airspaces, perform covert surveillance, or disrupt public events, creating significant safety, privacy, and regulatory challenges. Efficient detection and identification of such UAVs in real time is therefore crucial for operational safety, security enforcement, and public protection.
Traditional detection techniques [3], including radar, LiDAR, and vision-based systems, offer high accuracy under controlled conditions but often face limitations in practical scenarios. Small or fast-moving drones can evade radar, while poor lighting, occlusions, or cluttered environments reduce the effectiveness of camera-based systems. Moreover, these approaches can be costly and infrastructure-intensive, making them less feasible for widespread deployment. Acoustic sensing emerges as a low-cost and non-intrusive alternative, leveraging the distinct sound signatures produced by drone propellers. However, environmental noise, overlapping sound sources, and variations in drone models pose challenges to reliable acoustic detection, necessitating robust feature extraction and intelligent modeling techniques.
(Affiliation note: AmygdalaAI-India Lab is an international volunteer-run research group that advocates for AI for a better tomorrow, https://amygdalaaiindia.github.io/ .)
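Acoustic approaches of this kind typically begin with a time-frequency representation of the microphone signal. As a rough, self-contained illustration (the function name, sample rate, and FFT parameters below are illustrative choices, not values reported in this paper), a log-magnitude STFT spectrogram of a synthetic propeller-like tone can be computed with NumPy alone:

```python
import numpy as np

def log_stft_spectrogram(signal, sr=16000, n_fft=512, hop=256):
    """Log-magnitude STFT spectrogram with a Hann window.

    The hyperparameters (sr, n_fft, hop) are illustrative defaults,
    not the settings used by AUDRON.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # One-sided FFT: n_fft // 2 + 1 frequency bins per frame
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(spec).T  # shape: (freq_bins, time_frames)

# A synthetic "propeller" tone: 200 Hz fundamental plus weaker harmonics,
# mimicking the harmonic structure of rotor noise
sr = 16000
t = np.arange(sr) / sr  # one second of audio
tone = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in (1, 2, 3))
S = log_stft_spectrogram(tone, sr=sr)
print(S.shape)  # (257, 61)
```

The log compression reduces the dynamic range so that the weaker harmonics remain visible alongside the fundamental; in the resulting array, rotor harmonics show up as bright horizontal bands that a CNN can learn to recognize.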
This study introduces AUDRON, which stands for AUdiobased Drone Recognition Network, as a hybrid deep learning framework to address these challenges. AUDRON integrates multiple feature representations, including MFCC, STFT spectrograms processed with CNN, recurrent layers for temporal sequence modeling, and autoencoder-based embeddings for capturing latent audio characteristics. Feature-level fusion combines these complementary representations, enhancing the system’s capability to discriminate drone acoustic signatures from diverse background noises.
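The feature-level fusion described above can be sketched as follows. The embedding sizes, random feature vectors, and two-class linear head are hypothetical stand-ins for AUDRON's actual branches, shown only to make the concatenate-then-classify pattern concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-branch embedding sizes (not AUDRON's actual dims):
# a CNN over the STFT spectrogram, a recurrent network over MFCC frames,
# and an autoencoder bottleneck each summarize the clip as a fixed vector.
cnn_emb = rng.standard_normal(128)  # spectrogram (CNN) branch
rnn_emb = rng.standard_normal(64)   # temporal (MFCC-sequence) branch
ae_emb = rng.standard_normal(32)    # autoencoder latent code

# Feature-level fusion: concatenate the branch outputs before the classifier
fused = np.concatenate([cnn_emb, rnn_emb, ae_emb])  # shape (224,)

# A linear head over the fused vector (binary case: drone vs. background noise)
W = rng.standard_normal((2, fused.size)) * 0.01
b = np.zeros(2)
logits = W @ fused + b
probs = np.exp(logits) / np.exp(logits).sum()  # softmax
print(fused.shape, probs.sum())
```

The design choice here is that fusion happens on fixed-length embeddings rather than raw features, so each branch can use the representation best suited to it (2-D convolutions, recurrence, or reconstruction-based encoding) while the classifier sees a single joint vector.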
Key contributions include:
• Proposal of a unified framework (AUDRON) that effectively integrates multiple complementary feature representations for drone acoustic detection.
• Demonstration of model robustness using diverse drone and environmental sounds, incorporating a noise class to minimize false positives in real-world scenarios.

The paper is organized as follows. Section II reviews related work on drone classification. Section III details the proposed AUDRON model. Section IV presents the experimental setup, results, and discussion. Section V concludes with future directions.
Several studies have explored methods for capturing and analyzing drone sounds from real-world environments. While numerous studies focus on object detection using drones and drone imagery [4], comparatively fewer works address drone detection specifically from the distinctive acoustic signatures drones generate. Al-Emadi et al. [5] proposed a deep learning approach using acoustic fingerprints to detect and identify drones from recorded audio. Alla et al. [6] proposed an audiovisual fusion approach combining CRNN and YOLOv5 for drone detection, achieving high accuracy, though reliance on both modalities may limit performance in scenarios where either audio or IR data is severely degraded. Zhang et al. [7] developed an audio-assisted camera array that fused visual and acoustic features for drone detection, though its dependence on bulky multi-camera setups limits portability and scalability. Iqbal et al. [8] proposed a sound-based amateur drone detection framework using MFCC [9] and Linear Predictive Cepstral Coefficients (LPCC) features [10] with SVM classifiers, achieving high accuracy but showing sensitivity to noisy environments. Dong et al. [11] proposed a deep learning-based drone sound detection system using fused acoustic features. Akbal et al. [12] proposed a sound-based amateur drone detection model using Skinny Pattern and Iterative Neighborhood Component Analysis (INCA) feature selection, achieving high accuracy on a small multi-class environmental sound dataset. The work in [13] developed a deep learning-based UAV audio detection and identification system using STFT spectrograms, evaluating CNN, RNN, and CRNN models on diverse drone and environmental sounds for robust real-world performance. The present study uses a dataset that includes both drone sounds and diverse environmental acoustic sounds as a no-drone class, aligning with environmental sound classification [14] approaches to better handle real-world false positives.
arXiv:2512.20407v2 [cs.SD] 30 Dec 2025
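For readers unfamiliar with the MFCC features that several of these works (e.g., [8]) build on, a minimal from-scratch sketch follows. All parameters (frame size, filter count, coefficient count) are illustrative defaults rather than values from the cited papers:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """From-scratch MFCC sketch: frame -> power spectrum -> mel filterbank
    -> log -> DCT-II. Parameters are illustrative defaults."""
    # Framing with a Hann window, then the power spectrum per frame
    win = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular mel filterbank spanning 0 Hz to sr/2
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    log_mel = np.log(power @ fbank.T + 1e-10)

    # DCT-II decorrelates the log-mel energies; keep the first n_ceps coeffs
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T  # shape: (n_frames, n_ceps)

sig = np.sin(2 * np.pi * 300 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (61, 13)
```

Each frame is thus reduced to a short vector of cepstral coefficients; classifiers such as the SVMs in [8] operate on these vectors (or statistics of them), while sequence models consume the frame-by-frame matrix directly.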
These studies highlight the growing role of acoustic cues in drone detection, yet also underscore the need for more diverse datasets and robust methods to address real-world variability.