This article focuses on signal classification for deep-sea acoustic neutrino detection. In the deep sea, the background of transient signals is very diverse. Approaches like matched filtering are not sufficient to distinguish neutrino-like signals from other transient signals with a similar signature, which form the acoustic background for neutrino detection in the deep-sea environment. A classification system based on machine learning algorithms is analysed with the goal of finding a robust and effective way to perform this task. For a well-trained model, a testing error at the level of one percent is achieved with strong classifiers like Random Forest and Boosting Trees, using the extracted features of the signal as input and utilising dense clusters of sensors instead of single sensors.
Deep Dive into Signal Classification for Acoustic Neutrino Detection
A good understanding of the background of transient acoustic signals in the deep sea, and the ability to suppress these signals or identify them as background, is essential for the feasibility of acoustic neutrino detection. These transient signals are very diverse and originate from anthropogenic and biological sources as well as from weather-correlated sources. The aim of the AMADEUS project [1] is to investigate the method of acoustic neutrino detection. AMADEUS is integrated into the ANTARES neutrino telescope [2], located in the Mediterranean Sea; its acoustic set-up consists of six clusters of six acoustic sensors each. Within a cluster, the sensors are spaced about 1 m apart, while the clusters are separated by up to 350 m. In the experiment, transient signals with bipolar (i.e. neutrino-like) content are selected using on-line filtering techniques. As the variety of recorded transient signals is still high, an effective classification scheme to discriminate between background and neutrino-like signals has been developed and is presented here. The analysis chain comprises a simulation of transient signals, a filter analogous to the one used on-line in the experiment, feature extraction algorithms and signal classification based on machine learning algorithms.
The goal of this research is to find a robust and well-performing system to distinguish between neutrino-like signals and other transient signals occurring in the deep sea, such as those from man-made and biological sources. This section explains the methods used for training and testing the classification system.
A special-purpose simulation was designed for testing the feature extraction and classification system, which is also trained with simulated data. The simulation generates typical deep-sea signals, i.e. waveforms present at the ANTARES site such as bipolar and multipolar pulses, echoes of the ANTARES acoustic positioning system, or random signals. The different signal types are generated following a uniform frequency distribution, i.e. each type occurs equally often. Starting from random source positions within a given volume around the detector, the signals are propagated to the sensors, and characteristic ambient noise for different sea states is added. The output, a continuous data stream, is passed to the filter and from there either to the feature extraction or directly to the classification system.
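The simulation step described above can be illustrated with a minimal Python sketch for a single neutrino-like (bipolar) event seen by one cluster. It is not the authors' simulation: the Gaussian-derivative pulse shape, the constant speed of sound of 1500 m/s, the 200 kHz sampling rate, the cubic source volume, the white Gaussian noise (in place of sea-state-dependent ambient noise), the pure 1/r spreading without absorption, the sensor positions and the names `bipolar_pulse` and `simulate_cluster_event` are all assumptions.

```python
import math
import random

SOUND_SPEED = 1500.0  # m/s, assumed constant speed of sound
FS = 200_000          # Hz, assumed sampling rate

def bipolar_pulse(n_samples, fs=FS, width=2e-5, amplitude=1.0):
    """Bipolar pulse modelled as the (negated) derivative of a Gaussian,
    normalised so that its largest excursion equals `amplitude`."""
    t0 = n_samples / (2 * fs)  # centre the pulse in the window
    raw = []
    for i in range(n_samples):
        x = (i / fs - t0) / width
        raw.append(-x * math.exp(-0.5 * x * x))
    peak = max(abs(v) for v in raw)
    return [amplitude * v / peak for v in raw]

def simulate_cluster_event(sensors, box=1000.0, noise_rms=0.05,
                           n_samples=2048, rng=None):
    """Place a source uniformly in a cube of side `box` (m) centred on the
    origin, propagate a bipolar pulse to each sensor and add white noise."""
    rng = rng or random.Random()
    source = tuple(rng.uniform(-box / 2, box / 2) for _ in range(3))
    pulse = bipolar_pulse(64)
    dists = [max(math.dist(source, s), 1.0) for s in sensors]  # avoid r -> 0
    d_min = min(dists)
    streams = []
    for d in dists:
        delay = round((d - d_min) / SOUND_SPEED * FS)  # extra travel time, samples
        gain = d_min / d  # 1/r spreading relative to the nearest sensor
        stream = [rng.gauss(0.0, noise_rms) for _ in range(n_samples)]
        start = 100 + delay  # arbitrary pre-trigger offset for the nearest sensor
        for k, v in enumerate(pulse):
            if start + k < n_samples:
                stream[start + k] += gain * v
        streams.append(stream)
    return source, streams

# Usage: one cluster of six sensors roughly 1 m apart (positions hypothetical)
CLUSTER = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
           (0.0, 0.0, 1.0), (1.0, 1.0, 0.0), (1.0, 0.0, 1.0)]
source, streams = simulate_cluster_event(CLUSTER, rng=random.Random(42))
```

Other signal types (multipolar pulses, positioning-system echoes) could be added by swapping in a different pulse generator; concatenating many such events could then form a continuous stream for the filter stage.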
As a first step, the incoming continuous data stream passes through a filter system equivalent to the one used in the experiment, where it reduces the amount of data stored for off-line classification and reconstruction. The filter set-up consists of an amplitude threshold for strong transient signals, which adjusts itself to the changing ambient noise conditions, and a matched filter for bipolar signals [3]. The reference signal for the matched filter is the bipolar pulse produced by a 10²⁰ eV shower at a distance of 300 m perpendicular to the shower axis [4].
In the next step, the characteristics of the filtered signals are extracted. The resulting feature vector contains the time-domain and frequency-domain characteristics of the signal as well as the output of a matched filter bank tuned for neutrino-like signals. The bank consists of six reference signals corresponding to angles of 90° to 96° to the shower axis, in one-degree steps, for a 10²⁰ eV shower at a distance of 300 m. In the time domain, the number of peaks as well as the peak-to-peak amplitude, asymmetry and duration of the largest peak are extracted. In the frequency domain, the main frequency component and the excess over the noise background are used as features. Of the matched filter bank outputs, only the best match is considered: from it, the number of peaks and the amplitude, width and integral of the largest peak are stored in the feature vector. Alternatively, the filtered waveform itself can be used as an independent input to the classification algorithms.
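The filtering and feature-extraction steps above can be illustrated with a minimal, dependency-free Python sketch. It is not the experiment's implementation: the running-RMS trigger (`window`, `k`), the use of normalised cross-correlation as the matched filter, the peak thresholds and all function names are assumptions chosen for illustration, and only a subset of the listed features is computed (no duration, noise excess, or peak width and integral).

```python
import cmath
import math

def adaptive_threshold_triggers(stream, window=256, k=5.0):
    """Flag samples whose |amplitude| exceeds k times the RMS of the preceding
    `window` samples (a hypothetical self-adjusting threshold)."""
    triggers = []
    acc = sum(x * x for x in stream[:window])  # running sum of squares
    for i in range(window, len(stream)):
        rms = math.sqrt(max(acc, 0.0) / window)
        if abs(stream[i]) > k * rms:
            triggers.append(i)
        # Slide the noise-estimation window forward by one sample
        acc += stream[i] ** 2 - stream[i - window] ** 2
    return triggers

def matched_filter(signal, reference):
    """Normalised cross-correlation of `signal` with `reference`, one value per lag."""
    m = len(reference)
    ref_norm = math.sqrt(sum(r * r for r in reference))
    out = []
    for lag in range(len(signal) - m + 1):
        seg = signal[lag:lag + m]
        seg_norm = math.sqrt(sum(s * s for s in seg))
        dot = sum(s * r for s, r in zip(seg, reference))
        out.append(dot / (seg_norm * ref_norm) if seg_norm > 0 else 0.0)
    return out

def count_peaks(x, threshold):
    """Count local maxima of |x| that exceed `threshold`."""
    a = [abs(v) for v in x]
    return sum(1 for i in range(1, len(a) - 1)
               if a[i] > threshold and a[i] >= a[i - 1] and a[i] > a[i + 1])

def main_frequency(x, fs):
    """Frequency (Hz) of the strongest non-DC bin of a naive O(N^2) DFT."""
    n = len(x)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        c = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(c) > best_mag:
            best_k, best_mag = k, abs(c)
    return best_k * fs / n

def extract_features(waveform, fs, filter_bank, noise_rms):
    """Build a small feature vector from a filtered waveform.

    Simplifications (assumptions, not the paper's definitions): peak-to-peak and
    asymmetry are taken over the whole waveform, and asymmetry is the ratio of
    the positive to the negative extreme.
    """
    hi, lo = max(waveform), min(waveform)
    outputs = [matched_filter(waveform, ref) for ref in filter_bank]
    best = max(outputs, key=max)  # keep only the best-matching reference
    return {
        "n_peaks": count_peaks(waveform, 3.0 * noise_rms),
        "peak_to_peak": hi - lo,
        "asymmetry": hi / abs(lo) if lo < 0 else float("inf"),
        "main_frequency_hz": main_frequency(waveform, fs),
        "mf_best_max": max(best),
        "mf_n_peaks": count_peaks(best, 0.5),
    }
```

In this sketch the filter bank is simply a list of reference waveforms; in the setting described above it would hold the six simulated shower pulses. The quadratic DFT keeps the example self-contained, whereas a real analysis would use an FFT.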
The classification system is based on machine learning algorithms [6], trained and tested with data from the simulation. As input, either the extracted feature vector or the filtered waveform is used; as output, either binary class labels (bipolar or not) or multiple class labels (one for each signal type in the simulation data) are predicted. The following algorithms [7] have been investigated for both individual sensors and clusters of sensors:
• Naïve Bayes: This simple classification model applies Bayes' theorem under the assumption that the features are conditionally independent of one another given the class. For a given feature vector, the class with the highest posterior probability, estimated from the training data, is selected.
• Decision Tree: This classification model consists of a tree-structured set of rules. Starting at the root, the tree splits at each node on the input variable with the highest information gain. The path from the root of the tree to one of
…(Full text truncated)…
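The two classifiers described in the list above can be sketched in a few lines of dependency-free Python. This is an illustration under assumptions, not the implementation used in the study [7]: the Naïve Bayes model assumes Gaussian per-feature likelihoods, the decision-tree part shows only how the information gain of a single threshold split is computed, and the toy feature values and the class names `bipolar`/`other` are hypothetical.

```python
import math
from collections import Counter

class GaussianNaiveBayes:
    """Naive Bayes with a per-class, per-feature Gaussian likelihood (assumed model)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors, self.stats = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.priors[c] = len(rows) / len(y)
            stats = []
            for j in range(len(X[0])):
                col = [r[j] for r in rows]
                mu = sum(col) / len(col)
                # Small variance floor avoids division by zero for constant features
                var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
                stats.append((mu, var))
            self.stats[c] = stats
        return self

    def log_posterior(self, x, c):
        # log P(c) + sum_j log N(x_j | mu_cj, var_cj): features treated as independent
        lp = math.log(self.priors[c])
        for v, (mu, var) in zip(x, self.stats[c]):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return lp

    def predict(self, x):
        # Pick the class with the highest (unnormalised) log posterior
        return max(self.classes, key=lambda c: self.log_posterior(x, c))

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())

def information_gain(X, y, feature, threshold):
    """Entropy reduction from splitting on X[i][feature] <= threshold."""
    left = [label for x, label in zip(X, y) if x[feature] <= threshold]
    right = [label for x, label in zip(X, y) if x[feature] > threshold]
    if not left or not right:
        return 0.0
    n = len(y)
    return entropy(y) - len(left) / n * entropy(left) - len(right) / n * entropy(right)

# Usage with hypothetical two-feature vectors (e.g. asymmetry, main frequency in kHz)
X = [(1.0, 10.0), (1.1, 11.0), (0.9, 9.5), (3.0, 40.0), (2.8, 35.0), (3.2, 45.0)]
y = ["bipolar"] * 3 + ["other"] * 3
model = GaussianNaiveBayes().fit(X, y)
```

A decision tree would evaluate `information_gain` for candidate features and thresholds at each node and split on the best one; here a threshold of 2.0 on the first feature separates the two toy classes perfectly, giving a gain of one bit.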
This content is AI-processed based on ArXiv data.