Intelligent Road Condition Monitoring using 3D In-Air SONAR Sensing


Authors: Amber Cassimon, Robin Kerstens, Walter Daems, Jan Steckel

Amber Cassimon, Robin Kerstens, Walter Daems, Jan Steckel*

All authors are with the Department of Electronics and ICT Engineering Technology, Cosys-Lab Research Group, University of Antwerp, Antwerp, Belgium, and with the Flanders Make Strategic Research Centre, Lommel, Belgium. *Corresponding author: jan.steckel@uantwerpen.be

Abstract—In this paper, we investigate the capabilities of in-air 3D SONAR sensors for the monitoring of road surface conditions. Concretely, we consider two applications: road material classification, and road damage detection and classification. While such tasks can be performed with other sensor modalities, such as camera sensors and LiDAR sensors, these modalities tend to fail in harsh sensing conditions, such as heavy rain, smoke or fog. By using a sensing modality that is robust to such interference, we enable the creation of opportunistic sensing applications, where vehicles performing other tasks (garbage collection, mail delivery, etc.) can also be used to monitor the condition of the road. For both tasks, we use a single dataset, in which different types of damages are annotated, with labels that include the material of the road surface. In the material classification task, we differentiate between three road materials: asphalt, concrete and element roads. In the damage detection and classification task, we determine whether there is damage, and what type of damage (independent of material type), without localizing the damage. We are successful in determining the road surface type from SONAR sensor data, with F1 scores approaching 90% on the test set, but find that performance lags for the detection of damages, with F1 scores around 75%. From this, we conclude that SONAR sensing is a promising modality to include in opportunistic sensing-based pavement management systems, but that further research is needed to reach the desired accuracy.

Index Terms—In-Air SONAR Sensing, 3D SONAR, Road Condition Monitoring, Pavement Management System, Machine Learning, Deep Learning

I. INTRODUCTION

Pavement management systems (PMS) allow municipalities to plan and execute road maintenance in an intelligent fashion [1]. However, such PMS can only fulfill their task if sufficient high-quality data is available to accurately assess the state of the road surface. Traditionally, data on road surface conditions are gathered manually, i.e., the municipality sends out a worker to a specific location to assess the state of a road. This process is time-consuming for the worker, and expensive for the municipality, while yielding only very limited coverage of a municipality's road network. This process has been partially automated through the use of dedicated mapping vehicles. These mapping vehicles are equipped with a state-of-the-art sensor suite, and continuously drive around, collecting information while driving. This improves the coverage of the data being sent to the PMS over the manual approach, but still has some drawbacks. In municipalities with large road networks, individual roads are often still revisited at a relatively low frequency, since such mapping vehicles need to cover the entire municipality's road network. Additionally, for smaller municipalities with smaller income streams, the acquisition, operation and maintenance of dedicated mapping vehicles may be prohibitively expensive [1]. An alternative to dedicated mapping vehicles that resolves these issues is the use of opportunistic sensing [2]. In opportunistic sensing, a set of sensors is mounted on vehicles which drive around for other purposes, such as mail delivery, garbage collection, etc.
These vehicles, by virtue of their purpose, already cover most of a municipality's road network at relatively frequent intervals. This makes them excellent platforms for the collection of road damage data to feed into a PMS. Given the size of the fleets often used for these tasks, it is imperative that the sensors mounted on such vehicles have a low cost, so they can be installed on as many vehicles as possible, to ensure sufficient coverage. It is this setting that forms the context for this paper. We propose a road surface condition monitoring system based around the use of 3D in-air SONAR (Sound Navigation and Ranging) sensors. We consider the task of identifying the material of the road surface, as well as the task of detecting the presence and type of road surface defects. Our results show that we can successfully detect the road surface type with 3D in-air SONAR sensors, and that damage detection is viable, though its performance lags that of road surface type detection. Finally, we propose several possible avenues for future improvements.

The remainder of this paper is structured as follows: Section II reviews related work for both road material identification and road damage detection. Next, in Section III we explain our methodology for both tasks. Section IV explains the exact experimental set-up we use for our experiments, and Section V presents the results of these experiments. Finally, we provide some closing thoughts on the work done in Section VI, along with some possible future avenues for improvement.

II. RELATED WORK

In this section, we give a brief overview of some relevant works, aiming to showcase the breadth of the existing body of literature without exhaustively enumerating all the work in this area. We consider both the problem of detecting and identifying damages in road surfaces, and that of identifying the material of the road surface.

A. Road Damage Detection

In their work, Trinh et al. [3] make use of an RGB (Red, Green, Blue) camera to classify the quality of a road surface into one of three categories: bad, regular and good. Specifically, their research focuses on the use of a segmentation model to isolate the road from other parts of the camera image. They show that the addition of a masking operation, with a mask generated by a segmentation model, improves classification performance in all classes. Shim et al. [4] make use of a lightweight auto-encoder network to identify road surface damage based on camera images. While they are capable of detecting the presence of road surface damages, they do not consider the identification of individual damages, nor do they try to make an assessment of the overall quality of the road. While both Shim et al. and Trinh et al. made use of camera images for their assessment of road damages, other sensing modalities have also been used. Li et al. [5] make use of tire noise to identify pavement damages; they consider pavement to be in normal condition, to exhibit crack damage, or to exhibit pothole damage. They achieve an overall accuracy of 88.4% using a random forest combined with gradient boosting. The accelerometer sensors included in smartphones, combined with GPS (Global Positioning System) data, were used by Dong et al. [6] to identify road surface distortions, patching, potholes and rutting. Data collection happened on a moving vehicle. Using k-means clustering, they were able to achieve an average accuracy of 84%. Ultrasound sensors have also been used to identify damages in concrete structures, such as in the work of Kim et al. [7]. They studied concrete specimens in a laboratory environment to identify various characteristics caused by freeze-thaw cycles experienced by the concrete specimens.
While this work focused on damage detection in concrete, unlike the others, it does not consider a vehicle-borne sensing case.

B. Road Material Identification

Previous work has been done on the use of visual-tactile sensor fusion to identify different road materials and surfaces [8]. In their work, Shi et al. perform sensor fusion between a camera and tactile information derived from a PVDF (polyvinylidene fluoride) sensor. They fuse these sensors using transformer neural networks, and obtain strong predictive performance across all four considered classes. Ultrasound sensing has also been used to identify road surface materials before. Kim et al. [9] mount ultrasonic sensors in front of the front wheels of a vehicle, and use a short-time Fourier transform coupled with a deep neural network to determine the type of road surface currently being driven on. Across 8 road surface types, they achieve accuracies upwards of 95%. Sattar et al. [10] test the use of both RADAR (Radio Detection and Ranging) and SONAR to classify five different road surface types. They test both 24 GHz RADAR and 40 kHz SONAR separately, as well as combined, and also consider the use of 150 GHz RADAR. They show accuracies upwards of 80% using both RADAR and SONAR, with the combined results going up to 92% accuracy. While our work only considers on-road surfaces, this work also included "grass" as one of the road surface types.

Fig. 1. An image showing the mounting of the eRTIS SONAR sensor (bottom left of the sensor box) on a vehicle. The sensor box is mounted on the back of the vehicle, looking backwards.

III. METHODOLOGY

We consider both the surface material classification and the damage detection tasks as classification problems, omitting the added complexities of locating the damages, or assessing their severity.
We always view this through the lens of a multilabel classification problem, since it is possible that multiple damages are captured in one sample, as well as multiple materials, when a road changes from one material type to another. We consider a variety of models, including a linear model (logistic regression), a support vector machine, a random forest, a gradient-boosted tree, a decision tree, and a multi-layer perceptron. Additionally, given its additional complexity, we also consider a CNN (Convolutional Neural Network)-based classifier for the damage detection problem, but not for the material identification problem. This gives us a set of models covering a wide range of complexities, allowing us to select the simplest model that fits the data well.

The SONAR data for this paper was collected using an eRTIS (Embedded Real-Time Imaging) SONAR array sensor [11], [12], [13]. The eRTIS is a fully embedded 32-channel 3D in-air SONAR sensor, capable of gathering data under harsh sensing conditions, such as in the presence of fog or dust, where visual sensing methods may struggle. Figure 1 shows the mounting of the sensor on the car. Note that the sensor is mounted on its side. This is further clarified in Figure 2, showing a side view and a top view of the mounting of the SONAR sensor in a diagram. The range of elevation angles is indicated in red, while the range of azimuth angles is indicated in blue.

Fig. 2. A diagram showing how the SONAR sensor was mounted to the vehicle. The range of elevation angles is indicated in red, and the range of azimuth angles in blue.

Due to the way the sensor is mounted, these angles actually differ from what would intuitively be expected, with the elevation angle sweeping from
left to right from the driver's point of view, and the azimuth angle sweeping up and down.

A. Dataset

We used a newly collected dataset. This dataset contains camera images, raw PDM (Pulse Density Modulation) datastreams recorded by the SONAR sensor, and labels for each camera image. Other datapoints were also collected, but excluded from this work. Camera images were only used for labeling, and were not used in the actual detection or classification of damages. First, both data types (SONAR recordings and labels) are time-synchronized. Since these are sampled at different sample rates, we pick one data type (the labels) as our reference point, and then select the temporally closest samples from the other sensors. Note that this may include a sensor sample that was recorded before the camera image used to label the damage. Next, we assess the synchronization of each of the samples. If the time difference between the labels and the other modalities is too high (above 150 ms in our case), the sample is discarded. Given that our SONAR sensor operates at a sampling frequency of around 10 Hz, this mainly occurs in cases where one sensor was disabled or failed. Following synchronization, the material types present in a sample are determined, and this annotation is added to the dataset. Note that, during the annotation process, labels were assigned which include both the material and the damage type. For instance, "Asphalt - Alligator Crack" indicates that an alligator crack was found in an asphalt surface. The different material types considered are shown in Figure 5, while the different types of damages are shown in Figure 4. Following this, similar damages in different material types are combined into one damage type, shared among the different materials.
For instance, the labels "Asphalt - Alligator Crack" and "Concrete - Alligator Crack" are combined into one label, "Alligator Crack", discarding the material type, which was determined and separately annotated earlier. After this, we perform a minimal filtering of the classes considered, eliminating classes for which we do not have enough data. Any class with fewer than 100 samples is eliminated, to reduce dataset imbalance and prevent classes with very few samples from affecting the overall accuracy.

Fig. 3. The size of each dataset, for each fold in the dataset. (Per fold: test set 513 samples, training set 4386 or 4387 samples, validation set 487 or 488 samples.)

We keep this threshold as low as possible, since the utility of a damage classification system is reduced significantly as the number of detectable damages is reduced. Of course, this needs to be weighed against overall classifier performance, given that the collection and labelling of additional samples is costly. Additionally, it must be noted that the data was annotated based on camera images. This means that the distribution of labels will likely be skewed towards damages that are easily visually detectable, while classes which are harder to detect visually may be underrepresented due to the labelling process. Finally, we split the dataset. We first isolate 10% of the dataset as a held-out test set. The remaining 90% of the dataset is split into 10 folds, each with a 90%-10% split between training and validation data, respectively. Splitting the data into separate folds is done in a stratified fashion, to ensure each fold maximally represents the dataset as a whole.
Stratification is done based on a "fake" feature that represents a sample's material or damage type, depending on the problem at hand. Since we treat the classification as a multilabel classification problem, each combination of damages or materials is treated as a separate value for this feature. The feature is built by taking the one-hot encoded vector of the present labels, and multiplying each bit with increasing powers of two, essentially interpreting the one-hot encoded vector as a binary-encoded integer. This integer value is used for stratification. The size of the different datasets (train, test and validation) across the cross-validation folds is shown in Figure 3. The distribution of labels in the dataset is shown in Figure 4 for the damage types, and in Figure 5 for the materials. Note that labels aren't mutually exclusive in either figure, i.e., one sample can be given more than one label, so the number of samples per class need not sum to the total number of samples.

Fig. 4. The distribution among different damages in the dataset (Loose Stones, Open Transversal Joint, Subsidence, Open Longitudinal Joint, Fraying, Longitudinal Crack, Missing Material, Alligator Crack, Transversal Crack). Note the logarithmic X-axis.

Fig. 5. The distribution among different materials in the dataset (Element, Concrete, Asphalt). Note the logarithmic X-axis.

B. Preprocessing

To prepare the data for consumption by the machine learning models, we employ two different preprocessing pipelines. One is used for the CNN model, while the other is used for all other models.
We make use of a separate pipeline for the CNN model, since it is capable of making use of the spatial relation between closely grouped data points, while remaining independent of the overall position of each data point in the energyscape. The pipeline used for the CNN model produces an image, and we will thus refer to it as the image preprocessing pipeline, while the other preprocessing pipeline produces a vector, which we will refer to as the vector preprocessing pipeline. We graphically show the different steps in both pipelines in Figure 6.

1) Vector preprocessing pipeline: When preprocessing the data, we make use of the usual signal processing pipeline that is used in conjunction with the eRTIS SONAR sensor [13]. This pipeline consists of a PDM decoder, a matched filter, a delay-and-sum beamformer, an envelope detector and a clean-up stage. The clean-up stage works similarly to the constant false-alarm rate detection algorithm. This signal processing pipeline generates a matrix representing the intensity of the return received from a particular angle (azimuth and elevation) and range. This matrix is referred to as an energyscape. Next, we subtract the mean energyscape across all training samples from each of the energyscapes, to emphasize the difference between individual energyscapes. We apply a max-pooling filter to the energyscape along the range direction to reduce the dimensionality of the samples. Finally, we flatten the energyscape into a vector by concatenating each of the rows, and use Principal Component Analysis (PCA) to reduce its dimensionality. It is this set of principal components of the flattened energyscape that is presented as input data to the models.

2) Image preprocessing pipeline: For the CNN model, we employ a different preprocessing pipeline, to allow the CNN to exploit the data locality inherent in the image-based representation of an energyscape.
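As a concrete illustration, the post-beamforming steps of the vector pipeline (mean-energyscape subtraction, range-wise max-pooling, flattening, PCA) can be sketched as follows. This is a minimal sketch on synthetic data: the array shapes, pooling size and component count are illustrative placeholders, not the exact values used in our experiments.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a batch of energyscapes:
# (n_samples, n_directions, n_range_bins) -- shapes are placeholders.
rng = np.random.default_rng(0)
energyscapes = rng.random((200, 91, 300))

# 1) Subtract the mean energyscape computed over the training samples.
mean_scape = energyscapes.mean(axis=0)
centered = energyscapes - mean_scape

# 2) Max-pool along the range axis to reduce dimensionality.
pool = 5
n_bins = centered.shape[2] // pool
pooled = centered[:, :, : n_bins * pool].reshape(200, 91, n_bins, pool).max(axis=3)

# 3) Flatten each energyscape into a vector by concatenating its rows.
flat = pooled.reshape(len(pooled), -1)

# 4) Whiten and project onto the leading principal components.
pca = PCA(n_components=64, whiten=True)
features = pca.fit_transform(flat)
print(features.shape)  # (200, 64)
```

In practice, the PCA model would be fitted on the training fold only and then applied unchanged to the validation and test folds.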
First, we split the signal processing pipeline for the raw SONAR data into two separate pipelines. The first includes PDM decoding and matched filtering, while the second includes beamforming, envelope detection and clean-up. We start, as we did for the other models, by performing PDM decoding and matched filtering. However, before we pass the filtered signals to the beamformer, we include a data augmentation to improve the range-invariance of the model. We achieve this by shifting the filtered signals forward or backward by a random number of samples. Samples that are shifted out are replaced by zeros. To ensure that we don't lose any important information, this time-shift is applied to all 32 microphone signals equally. After this augmentation, we perform beamforming, envelope detection and energyscape clean-up. After signal processing, similar to the other models, we subtract the mean energyscape, computed across all energyscapes. We also compute the mean and standard deviation of an individual energyscape, and use these to normalize the individual energyscape by subtracting the mean and dividing by the standard deviation. After normalization, we randomly flip the energyscape along both the horizontal and vertical directions, before passing the (possibly flipped) energyscape to the CNN model as input.

C. Models

Most of the models we use were taken from the Scikit-learn Python library [14], with the exception of the gradient-boosted trees, which were taken from the XGBoost Python library [15]. The CNN was implemented using PyTorch [16].

1) CNN: The CNN architecture is based on the overall skeleton laid out in NAS-Bench-101 [17]. Data is first fed into a convolutional stem, which increases the number of channels. Following this, the architecture consists of a sequence of residual blocks, with downsampling modules interspersed at regular intervals.
Following this, global average pooling is used to reduce the feature maps to a single vector, which is then fed through a linear layer to compute the logits of our classifier network. All convolutions are executed in Convolution-BatchNorm-ReLU-Dropout order.

Fig. 6. The vector preprocessing pipeline (top, blue) and the image preprocessing pipeline (below, green). Both pipelines are fed using time-domain, PDM-encoded SONAR recordings for all channels, and output either a feature vector or an energyscape, depending on the pipeline.

Fig. 7. A schematic representation of the used neural network architecture. The stem convolution is marked in green in the top-left, the schematic shows two residual modules marked in red, and one downsampling module with a blue background in the top-right. Finally, the global average pooling and linear layers are included at the bottom-right.

Figure 7 shows a schematic representation of the CNN architecture we used. The number of layers in the architecture is determined based on the uniform scaling principle introduced by EfficientNet [18], with some slight modifications. Following [18], the size of our architecture is determined by four hyperparameters: α, β, γ and φ. From these, we compute three ratios: the depth ratio (α^φ), the width ratio (β^φ), and the resolution ratio (γ^φ).
We then take a set of baseline values, manually selected for the φ = 0 case, and multiply each of the ratios with these baseline values to obtain the number of layers (from the depth ratio, d = d_{φ=0} · α^φ), the initial number of channels (from the width ratio, w = w_{φ=0} · β^φ) and the interval between downsampling layers (from the resolution ratio, r = r_{φ=0} · γ^φ). The number of residual blocks is determined by the number of layers, minus one, to account for the stem convolution. After every "downsampling interval" residual blocks, we insert a downsampling module. A downsampling module consists of a Conv-BatchNorm-ReLU-Dropout module followed by a pooling module. The convolution operation doubles the number of channels, while the pooling module halves the spatial resolution. Note that the downsampling interval affects both the number of channels and the spatial resolution of the feature maps. Since the number of channels can be controlled individually using the width ratio, the downsampling interval primarily serves to control the spatial resolution of the feature maps.

IV. EXPERIMENTS

The hyperparameters we used were kept constant across all experiments. We use PyTorch Lightning 2.6.0, PyTorch 2.10.0, NumPy 2.4.1, Scikit-learn 1.8.0 and XGBoost 3.1.3.

A. Data collection parameters

Some parameters of the dataset were set during data collection. Every SONAR sample in the dataset consists of a recording with 163840 samples sampled at a rate of 4.5 MHz from all 32 microphones, encoded using PDM. The signal emitted by the sensor is a hyperbolic chirp with a frequency ranging from 20 kHz to 50 kHz. This chirp is generated at a sample rate of 450 kHz, and has a duration of 2.5 ms.

B. Preprocessing parameters

Since the preprocessing pipelines for the vector-based models differ from that of the CNN model, we discuss each pipeline separately here. The preprocessing of the SONAR data is done on the SONAR sensor using the on-board compute unit.
This limits the possible computational complexity of the preprocessing pipeline, particularly affecting operations like beamforming, necessitating a trade-off between preprocessing complexity and sample rate. For both pipelines, beamforming is done in 91 directions, with both azimuth and elevation ranging from -90 to +90 degrees, essentially covering the full frontal hemisphere.

1) Vector preprocessing pipeline: Max-filtering the energyscapes to reduce their range-wise dimension is done with a kernel size of 5. The PCA analysis is configured to whiten the data first, before projecting down to 256 components.

2) Image preprocessing pipeline: For the CNN preprocessing pipeline, we time-shift the signals following PDM decoding and matched filtering by a random number of samples, with a maximum of 45 samples. At this point in the pipeline, the sample rate of the signals is 450 kHz; thus, a delay of 45 samples corresponds to a time delay of 100 µs. Following beamforming, envelope detection and clean-up, we flip the received energyscapes horizontally (along the azimuth/elevation dimension) and vertically (along the range dimension), with each flip occurring with a probability of 50%.

C. Model Parameters

With the exception of the CNN model, all models were trained in a one-vs-rest setting to enable multi-label classification. Except when mentioned otherwise, we leave model parameters at their default values.

1) Logistic Regression: As mentioned at the start of Section III, for each experiment, we make use of the following models: a logistic regression model, a support vector machine, a decision tree, a random forest, a multi-layer perceptron, a gradient-boosted tree and, for the damage detection problem, a CNN. For the logistic regression model, we set C (the inverse of the regularization strength) to 0.01, and used an L2 penalty for regularization.
We fitted both slope and intercept. To mitigate issues caused by dataset imbalance, we utilized the "class_weight" option in Scikit-learn, setting it to "balanced". This has the effect of weighting the loss associated with each sample based on Equation 1, which weighs samples inversely to the number of occurrences of the class they belong to.

w_i = S / (C · Σ_j Y[j, i])    (1)

In this equation, w_i is the weight of the i-th class. The feature matrix X has a dimension of S × F, where S is the number of samples and F the number of features. Y is the label matrix, with a dimension of S × C, where C is the number of classes. We performed each experiment twice, with the seed for random number generation set to 0 or 1, depending on the experiment, to account for variations resulting from different initializations of the optimization algorithms.

2) Support Vector Machine: For the Support Vector Machine (SVM)-based models, we set C (inversely related to the regularization strength) to 1, an L2 penalty is used for regularization, and we use the same class weighting we used for the logistic regression, shown in Equation 1. Our SVM uses a radial basis function (RBF) kernel. We set the γ parameter of the RBF kernel to the value shown in Equation 2; this corresponds to the "scale" setting in Scikit-learn.

γ = 1 / (F · Var[X])    (2)

The variables in Equation 2 correspond to those in Equation 1. Similar to the logistic regression setting, each experiment was performed twice. The SVMs were also trained in the same one-vs-rest setting as the logistic regression models.

3) Decision Tree: The parameters for the decision tree model were largely left at their default values in Scikit-learn, with the exception of the "class_weight" parameter, which was set to "balanced", similar to the logistic regression model. Like the other models, decision trees were also trained in a one-vs-rest setting.
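The shared one-vs-rest, balanced-class-weight setup used by the models above can be sketched in Scikit-learn as follows. The data here is synthetic and the shapes and label counts are placeholders; the logistic regression settings mirror those described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic multilabel problem (placeholder shapes).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                 # stand-in for PCA features
Y = (rng.random((300, 3)) < 0.3).astype(int)   # binary label-indicator matrix

# One binary classifier per label; class_weight="balanced" reweights each
# label's positive and negative samples inversely to their frequency,
# corresponding to the per-class weighting of Equation 1.
clf = OneVsRestClassifier(
    LogisticRegression(C=0.01, penalty="l2", class_weight="balanced", max_iter=1000)
)
clf.fit(X, Y)
probs = clf.predict_proba(X)   # per-label probabilities
print(probs.shape)  # (300, 3)
```

Swapping the base estimator for an SVC, decision tree or random forest reproduces the corresponding one-vs-rest configurations.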
4) Random Forest: Similar to before, the random forest model's parameters were largely left at their default settings, with the exception of the "class_weight" parameter, which was similarly set to "balanced".

5) Gradient-Boosted Trees: As we did with the random forest model, we left the parameters for the gradient-boosted tree models at their default settings.

6) Multi-Layer Perceptron: The multi-layer perceptron consists of an input layer, three hidden layers with 256 neurons each, and an output layer performing the classification. L2 regularization was applied with a weight of 1 × 10^-5, and the network was trained in minibatches of 256 samples over 250 iterations with a fixed learning rate of 0.01 using the Adam optimizer [19].

7) CNN: Our CNN models were trained with a batch size of 1024, for 1000 epochs, with a learning rate of 0.1. After 100 and 800 epochs of training, the learning rate was reduced to 0.01 and 0.001, respectively. Gradients were clipped to a norm with a magnitude of 20, and accumulated over 4 batches before being applied. The neural network is regularized using an L2 weight decay multiplied by a factor of 1 × 10^-7. We used LeakyReLU activations everywhere instead of standard ReLU (Rectified Linear Unit) activations. Across the network, batch normalization was applied with a momentum of 0.1. The model scaling parameters were set to α = 1.2, β = 1.1, γ = 1.15 and φ = 2, with baseline values d_{φ=0} = 3, w_{φ=0} = 16, and r_{φ=0} = 1. The stem convolution had a kernel size of 7 × 11 and was applied with a stride of 3 by 5. Dropout was enabled in the stem convolution, with a probability of 0.1. Both convolution operations in a residual block were applied with a kernel size of 9 × 5. In the downsampling module, the convolution operation had a kernel size of 1 × 1, a stride of 1 in both dimensions, and uses no padding.
Dropout was disabled in the downsampling module, while the max-pooling operation had a kernel size of 3 × 3, and was applied with a stride of 2, with zero padding in both dimensions. All experiments were carried out on a virtual machine with 48 (virtual) CPU cores of an AMD EPYC 9654 CPU and an NVIDIA L40S GPU with 48 GB of VRAM.

V. RESULTS

In this section, we present the results obtained in the two different problem domains, by each of the models. We evaluated each of the models on two metrics: the obtained F1 score, and the obtained Cohen's kappa coefficient, denoted as κ. The F1 and Cohen's kappa scores were averaged across classes, weighted by the number of true samples of each class. An in-depth discussion of these results can be found in Section VI.

A. Material Classification

In Table I, we showcase the performance of all different models on the material classification task. Each model was trained twice with a different random initialization, and was trained using 10-fold cross-validation. Performance numbers were averaged across all folds and random initializations.

B. Damage Detection

Similar to the previous section, we report the F1 and κ scores for all models, averaged across random initializations and dataset folds, in Table II. As noted before, in the damage detection problem, we additionally considered a CNN (ResNet) model.

VI. DISCUSSION

We used the same set of machine learning models in both the surface material classification and damage detection settings. Comparing the results in Table I and Table II, we can see some similar trends appearing. First, we notice that the logistic regression model does significantly worse than the other models on both problems. This is to be expected, given its linear nature, compared to the other non-linear models.
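The support-weighted averaging of F1 and κ described in Section V can be computed as in the sketch below, on synthetic labels. Scikit-learn's f1_score supports weighted multilabel averaging directly; cohen_kappa_score is per-label only, so the per-label scores are weighted manually.

```python
import numpy as np
from sklearn.metrics import f1_score, cohen_kappa_score

# Synthetic multilabel ground truth and predictions (placeholder shapes).
rng = np.random.default_rng(0)
y_true = (rng.random((100, 3)) < 0.4).astype(int)
y_pred = (rng.random((100, 3)) < 0.4).astype(int)

# Support-weighted F1 across labels.
f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)

# Cohen's kappa per label, weighted by the number of true samples per label.
support = y_true.sum(axis=0)
kappas = [cohen_kappa_score(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])]
kappa = float(np.average(kappas, weights=support))
print(f1, kappa)
```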
Next, we also note that for most models, the standard deviations of both the κ and F1 scores tend to be quite low, ranging from 0.5% up to 2%, with the exception of the ResNet used for damage detection. The higher standard deviation observed in the ResNet results is likely a consequence of its overall worse performance, as well as its more complicated training procedure.

We also observe that while training and validation performance tend to be quite close together, there is somewhat of a gap with the performance on the test set. The split between the test and training+validation sets was made in a non-stratified fashion, using a uniform distribution across samples. This may indicate a bias in the sampling strategy, leading to test and training+validation sets with different label distributions, and thus to a performance gap between them. An alternative explanation is that the overall machine learning pipeline, and the hyperparameters of the models, were overfitted to the training+validation set. This seems unlikely for a number of reasons. First of all, the difference exists consistently across different models. Given the limited amount of hyperparameter tuning that was done for the different models, it seems unlikely that all models would have overfit to the same degree. Additionally, the preprocessing pipeline used for the SONAR data does not contain any steps that may benefit one type of damage over another. The difference is also present in both the CNN model and the other models, which use different preprocessing pipelines, making a bias in the preprocessing pipeline a less likely cause.

Across models and problems, we can see that the models tend to perform slightly better on the validation set than on the training set. This is to be expected, as the optimal classification threshold was determined using Youden's J-statistic on the validation set, rather than the training set, to encourage generalization.
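Youden's J-statistic selects the decision threshold that maximizes J = TPR − FPR over the validation set. A minimal sketch of this selection step, using hypothetical scores and labels rather than the paper's data:

```python
def youden_threshold(scores: list[float], labels: list[int]) -> float:
    """Pick the threshold that maximizes Youden's J = TPR - FPR.

    Candidate thresholds are the observed scores themselves; a sample
    is predicted positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = 0.0, float("-inf")
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold

# Hypothetical validation scores: the positives cluster above 0.6,
# so the threshold achieving perfect separation (J = 1) is 0.6.
threshold = youden_threshold([0.1, 0.2, 0.4, 0.6, 0.7, 0.9], [0, 0, 0, 1, 1, 1])
```

The threshold chosen this way on the validation set is then frozen and reused on the other splits, as described in the text.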
This classification threshold was then used for evaluation on all datasets (training, validation and test).

A. Material Classification

In this section, we discuss the results on the material classification problem. Looking at the statistics in Table I, we can see that most models obtained F1 scores on the test set approaching 90%, with κ scores approaching 80%. This represents strong classification performance, and shows that determining the road surface type using ultrasound sensing is possible, in line with existing literature, such as [9].

From the poor performance of the logistic regression model on this dataset, we can also conclude that a linear model lacks the complexity needed to fully capture the non-linear relationship between the PCA components derived from energyscapes and the type of road surface present in a measurement.

We note that there is a large gap between the κ score obtained on the test set and the κ scores obtained on the validation and training sets. We surmise that this is likely because the optimal classification threshold for each model was determined on the validation dataset, yielding a classifier that generalizes well from the training set to the validation set, but likely does not perform as well on the test set. We also note that the test set is notably smaller than the training set (513 samples versus 4386 samples). Since some classes are quite rare, it is likely they only occur once or twice in the test set. Thus, a single missed classification in one of these rarer classes can heavily affect the κ score for that class, and consequently the overall κ score on the test set; this effect is much less pronounced on the training set, since a single sample represents a much smaller part of it. This further highlights the importance of gathering a sufficiently large dataset, both for training and for evaluation, to ensure the system can be both built and evaluated well.
Figure 8 shows a confusion matrix for the gradient boosted trees model trained for material classification.

B. Damage Classification

Looking at the damage classification results, we immediately see that the models performed significantly worse than on the material classification task. This is not unexpected, given the more subtle and complex nature of detecting and identifying road surface damages, compared to identifying the surface type. Similar to before, we observe that the linear model does significantly worse than the others, clearly showing the need for a non-linear model.

Counter to expectations, the CNN (ResNet) model did significantly worse than every other model, with the exception of the F1 score on the training set, where it barely managed to match the linear model in performance. We attribute this poor performance to the formatting of the image data fed to the CNN model. This is showcased in Figure 9 of [20]. That figure shows interleaved areas of intensity for single reflections, due to the interleaved nature of the azimuth/elevation directions. Because azimuth and elevation are collapsed into a single matrix dimension, a discontinuity occurs whenever the minor index reaches its end value. This discontinuity violates an underlying assumption of the convolution operation: that adjacent samples are close in some sense of locality (temporal, spatial, etc.). It is likely this violated assumption that is responsible for the poor performance of the CNN model.

TABLE I: The results of the different models on the material classification problem, averaged across random initializations and folds. All numbers are given as μ ± σ; the best result in each column is obtained by the gradient-boosted trees model.

| Model | Test κ | Validation κ | Training κ | Test F1 | Validation F1 | Training F1 |
| Logistic Regression | 40.34% ± 1.22% | 48.00% ± 3.18% | 45.18% ± 0.68% | 71.77% ± 0.90% | 75.73% ± 1.73% | 74.66% ± 0.77% |
| Decision Tree | 76.82% ± 1.90% | 92.03% ± 1.23% | 89.49% ± 0.48% | 89.03% ± 0.63% | 95.93% ± 0.47% | 94.88% ± 0.16% |
| Random Forest | 76.84% ± 0.69% | 93.25% ± 1.46% | 91.16% ± 0.42% | 89.44% ± 0.26% | 96.64% ± 0.54% | 95.82% ± 0.19% |
| Support Vector Machine | 74.67% ± 1.66% | 89.97% ± 2.59% | 88.03% ± 2.02% | 88.18% ± 1.71% | 95.02% ± 1.73% | 94.28% ± 1.74% |
| Multi-Layer Perceptron | 77.52% ± 2.25% | 91.95% ± 2.21% | 89.42% ± 1.34% | 89.12% ± 1.18% | 95.95% ± 1.08% | 95.03% ± 0.76% |
| Gradient-Boosted Trees | 78.10% ± 1.40% | 93.58% ± 1.83% | 91.37% ± 1.07% | 89.55% ± 0.60% | 96.79% ± 1.01% | 95.95% ± 0.51% |

TABLE II: The results of the different models on the damage detection problem, averaged across random initializations and folds. All numbers are given as μ ± σ; the best result in each column is obtained by the gradient-boosted trees model.

| Model | Test κ | Validation κ | Training κ | Test F1 | Validation F1 | Training F1 |
| Logistic Regression | 25.88% ± 1.03% | 30.14% ± 2.35% | 27.88% ± 1.25% | 49.16% ± 0.58% | 52.69% ± 1.17% | 51.20% ± 0.31% |
| Decision Tree | 54.98% ± 1.51% | 65.23% ± 2.36% | 61.34% ± 0.86% | 68.42% ± 0.78% | 75.76% ± 1.60% | 73.18% ± 0.31% |
| Random Forest | 57.87% ± 1.46% | 65.96% ± 2.03% | 62.11% ± 1.01% | 70.69% ± 0.69% | 76.40% ± 1.39% | 73.85% ± 0.49% |
| Support Vector Machine | 52.64% ± 1.25% | 59.90% ± 1.63% | 56.56% ± 0.81% | 68.33% ± 1.01% | 73.15% ± 1.40% | 70.94% ± 0.91% |
| Multi-Layer Perceptron | 54.15% ± 1.82% | 63.54% ± 2.15% | 59.44% ± 1.41% | 69.25% ± 0.95% | 75.83% ± 1.25% | 73.22% ± 0.62% |
| Gradient-Boosted Trees | 58.00% ± 2.38% | 66.66% ± 2.66% | 62.52% ± 2.10% | 71.74% ± 1.15% | 77.75% ± 1.54% | 75.06% ± 0.90% |
| ResNet | 2.57% ± 1.67% | 5.50% ± 2.18% | 19.70% ± 4.16% | 28.68% ± 8.03% | 31.59% ± 8.35% | 50.30% ± 2.08% |

Fig. 8. A confusion matrix of the test set for the gradient boosted trees classifier, trained on the first fold of the dataset for material classification, using a random seed of 0. Per-class test-set scores: Asphalt (359/513): F1 93.55%, κ 77.74%; Concrete (131/513): F1 78.18%, κ 72.50%; Element (38/513): F1 75.41%, κ 73.95%.

Figure 9 shows a confusion matrix for the gradient boosted trees model trained for damage classification.

VII. FUTURE WORK

When assembling the dataset used in this paper, location information was not used to inform the splitting of samples between the test, validation and training sets. Given the limited size of the dataset, using it would likely create further class imbalance, and make it even more difficult to obtain satisfactory classification performance. We should also note that, under imperfect data collection conditions, where different sensors can fail, the inclusion of a second (GPS) sensor can reduce the available number of samples, since the usable samples are then the subset of all samples where both the SONAR and GPS sensors are operational. Despite this, the current method of splitting the dataset can result in data leakage, through samples taken close together in physical space being split between the training, validation and test sets.
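One way to avoid this kind of spatial leakage is to split at the level of whole roads rather than individual samples, so that all measurements from one road land in the same set. A minimal sketch of such a grouped split; the road identifiers are hypothetical, not part of the published dataset:

```python
from collections import defaultdict
from itertools import cycle

def split_by_road(road_ids: list[str], n_folds: int) -> dict[int, list[int]]:
    """Assign whole roads to folds, so that samples recorded on the
    same road never end up in different train/validation/test sets."""
    road_to_fold: dict[str, int] = {}
    next_fold = cycle(range(n_folds))  # round-robin over new roads
    folds: dict[int, list[int]] = defaultdict(list)
    for sample_idx, road in enumerate(road_ids):
        if road not in road_to_fold:
            road_to_fold[road] = next(next_fold)
        folds[road_to_fold[road]].append(sample_idx)
    return dict(folds)

# Hypothetical per-sample road labels: samples 0-1 share road "A", etc.
folds = split_by_road(["A", "A", "B", "B", "C", "C"], n_folds=2)
# folds -> {0: [0, 1, 4, 5], 1: [2, 3]}: no road is split across folds.
```

In practice, a grouped splitter such as scikit-learn's GroupKFold implements the same idea, optionally balancing fold sizes.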
A better approach would be to cluster samples based on the road they were taken from, and then split the roads between the different datasets, ensuring that all data from a single road ends up in the same set. This would improve the generalizability of the models, by ensuring that they also function appropriately on streets not seen before.

In this work, we made use of a simple, linear delay-and-sum beamforming algorithm. While this works, and has been used successfully in numerous applications in the past [21], [22], [23], more advanced beamforming algorithms with better capabilities exist. Examples include the Minimum Variance Distortionless Response (MVDR) beamformer [24] and the Delay-Multiply-and-Sum (DMAS) beamforming algorithm [25]. Such algorithms can produce cleaner data following the beamforming process, but also increase the computational load on the on-board compute units.

To improve the performance of the convolutional neural networks, two possible solutions present themselves. The first solution involves re-organizing the energyscapes into 3-dimensional tensors, and using 3-dimensional convolutions for the recognition of damages. While this would resolve the issues presented, it would also significantly increase the computational complexity of the neural network. Taking into account that the CNN is already significantly slower to train than the other models, this trade-off may not be worthwhile: the CNN already takes several days to train, compared to several minutes for the other models. Alternatively, the signal processing pipeline could be altered to support 2-dimensional beamforming along the elevation direction. While this would resolve the issue, it has the disadvantage that some information about the size and location of the damages is lost, which may be an informative feature for the detection system.

Fig. 9. A confusion matrix of the test set for the gradient boosted trees classifier, trained on the first fold of the dataset for damage classification, using a random seed of 0. Per-class test-set scores: Alligator Crack (130/513): F1 74.05%, κ 64.01%; Fraying (62/513): F1 74.81%, κ 71.13%; Longitudinal Crack (97/513): F1 64.84%, κ 55.46%; Loose Stones (10/513): F1 80.00%, κ 79.52%; Missing Material (109/513): F1 67.94%, κ 57.36%; Open Longitudinal Joint (31/513): F1 54.55%, κ 52.01%; Open Transversal Joint (26/513): F1 58.82%, κ 56.67%; Subsidence (31/513): F1 68.66%, κ 66.48%; Transversal Crack (242/513): F1 78.01%, κ 52.34%.

VIII.
CONCLUSION

In this paper, we showcased the viability of 3D in-air SONAR for the monitoring of road surfaces as part of a pavement management system. We show that 3D in-air SONAR sensing is a viable approach for detecting the material used for the road surface, with our results being in line with pre-existing results in the scientific literature. For damage detection, we show promising results, exceeding our own expectations, but falling short of what is required for an industrially viable solution. Based on these results, we argue that 3D in-air SONAR sensing is a worthwhile modality for inclusion in an opportunistic sensing system as part of a PMS, though further research is needed to improve damage detection performance. We highlight several avenues for further research, including the inclusion of additional sensor data, improved signal processing pipelines, and improved machine learning pipelines.

FUNDING

This work was realized in the imec.ICON Hybrid AI for predictive road maintenance (HAIROAD) project, with the financial support of Flanders Innovation & Entrepreneurship (VLAIO, project no. HBC.2023.0170).

REFERENCES

[1] H. Shon, C.-S. Cho, Y.-J. Byon, and J. Lee, "Autonomous condition monitoring-based pavement management system," Automation in Construction, vol. 138, p. 104222, 2022. DOI: 10.1016/j.autcon.2022.104222
[2] W. Hu, S. Winter, and K. Khoshelham, "Forecasting fine-grained sensing coverage in opportunistic vehicular sensing," Computers, Environment and Urban Systems, vol. 100, p. 101939, 2023. DOI: 10.1016/j.compenvurbsys.2023.101939
[3] L. Trinh, A. Anwar, and S. Mercelis, "Improving classification of road surface conditions via road area extraction and contrastive learning," in IECON 2024 - 50th Annual Conference of the IEEE Industrial Electronics Society, 2024, pp. 1–7. DOI: 10.1109/IECON55916.2024.10905278
[4] S. Shim, J. Kim, S.-W. Lee, and G.-C. Cho, "Road surface damage detection based on hierarchical architecture using lightweight auto-encoder network," Automation in Construction, vol. 130, p. 103833, 2021. DOI: 10.1016/j.autcon.2021.103833
[5] H. Li, R. Chen, N. Ritha, and J. Wang, "Research on intelligent detection system and method for road surface damage utilizing tire noise," Measurement, vol. 253, p. 117650, 2025. DOI: 10.1016/j.measurement.2025.117650
[6] D. Dong and Z. Li, "Smartphone sensing of road surface condition and defect detection," Sensors, vol. 21, no. 16, 2021. DOI: 10.3390/s21165433
[7] D. Kim, R. Kim, J. Min, and H. Choi, "Initial freeze–thaw damage detection in concrete using two-dimensional non-contact ultrasonic sensors," Construction and Building Materials, vol. 364, p. 129854, 2023. DOI: 10.1016/j.conbuildmat.2022.129854
[8] R. Shi et al., "CNN-transformer for visual-tactile fusion applied in road recognition of autonomous vehicles," Pattern Recognition Letters, vol. 166, pp. 200–208, 2023. DOI: 10.1016/j.patrec.2022.11.023
[9] M.-H. Kim, J. Park, and S. Choi, "Road type identification ahead of the tire using D-CNN and reflected ultrasonic signals," International Journal of Automotive Technology, vol. 22, no. 1, pp. 47–54, 2021. DOI: 10.1007/s12239-021-0006-6
[10] S. Sattar, S. Li, and M. Chapman, "Developing a near real-time road surface anomaly detection approach for road surface monitoring," Measurement, vol. 185, p. 109990, 2021. DOI: 10.1016/j.measurement.2021.109990
[11] D. Laurijssen, W. Jansen, A. Aerts, W. Daems, and J. Steckel, "Ruggedized ultrasound sensing in harsh conditions: ERTIS in the wild," 2025. arXiv: 2509.10029 [eess.SY]. Available: https://arxiv.org/abs/2509.10029
[12] R. Kerstens, D. Laurijssen, and J. Steckel, "ERTIS: A fully embedded real time 3D imaging sonar sensor for robotic applications," in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 1438–1443. DOI: 10.1109/ICRA.2019.8794419
[13] J. Steckel, A. Boen, and H. Peremans, "A sonar system using a sparse broadband 3D array for robotic applications," in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 3223–3228. DOI: 10.1109/IROS.2012.6385584
[14] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[15] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '16, San Francisco, California, USA: ACM, 2016, pp. 785–794. DOI: 10.1145/2939672.2939785
[16] A. Paszke et al., "Automatic differentiation in PyTorch," in NIPS-W, 2017.
[17] C. Ying, A. Klein, E. Christiansen, E. Real, K. Murphy, and F. Hutter, "NAS-Bench-101: Towards reproducible neural architecture search," in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 97, PMLR, 2019, pp. 7105–7114. Available: https://proceedings.mlr.press/v97/ying19a.html
[18] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 97, PMLR, 2019, pp. 6105–6114. Available: https://proceedings.mlr.press/v97/tan19a.html
[19] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2017. arXiv: 1412.6980 [cs.LG]. Available: https://arxiv.org/abs/1412.6980
[20] J. Steckel, A. Boen, and H. Peremans, "Broadband 3-D sonar system using a sparse array for indoor navigation," IEEE Transactions on Robotics, vol. 29, no. 1, pp. 161–171, 2013. DOI: 10.1109/TRO.2012.2221313
[21] J. Steckel, A. Aerts, E. Verreycken, D. Laurijssen, and W. Daems, "Tool wear prediction in CNC turning operations using ultrasonic microphone arrays and CNNs," 2024. arXiv: 2406.08957 [eess.AS]. Available: https://arxiv.org/abs/2406.08957
[22] W. Jansen and J. Steckel, "Semantic landmark detection and classification using neural networks for 3D in-air sonar," in 2024 IEEE SENSORS, 2024, pp. 1–4. DOI: 10.1109/SENSORS60989.2024.10785063
[23] A. Schenck, W. Daems, and J. Steckel, "Toward automated air leak localization: A machine learning-enhanced ultrasonic and LiDAR-SLAM framework for industrial environments," IEEE Access, vol. 13, pp. 66492–66504, 2025. DOI: 10.1109/ACCESS.2025.3558361
[24] S. S. Balasem, S. K. Tiong, and S. P. Koh, "Beamforming algorithms technique by using MVDR and LCMV," World Applied Programming, vol. 2, no. 5, pp. 315–324, 2012.
[25] W. Jansen, W. Daems, and J. Steckel, "Delay-multiply-and-sum beamforming for real-time in-air acoustic imaging," 2025. arXiv: 2511.09165 [eess.SP]. Available: https://arxiv.org/abs/2511.09165
