Automatic Identification of Animal Breeds and Species Using Bioacoustics and Artificial Neural Networks
In this research endeavor, it was hypothesized that the sounds animals produce during vocalization can be used as identifiers of breed or species even when they sound identical to the unaided human ear. To test this hypothesis, three artificial neural networks (ANNs) were developed using bioacoustic properties as inputs for the automatic identification of 13 bird species, eight dog breeds, and 11 frog species, respectively. Recorded vocalizations of these animals were collected and processed using several established signal processing techniques to convert the sounds into computable bioacoustic values. These values, together with the breed or species labels, were used to train the ANNs following a ten-fold cross-validation scheme. Tests show that the respective ANNs correctly identify 71.43% of the birds, 94.44% of the dogs, and 90.91% of the frogs. These results show that bioacoustics and ANNs can be used to automatically determine animal breeds and species, and together could serve as a promising automated tool for animal identification, biodiversity assessment, animal conservation, and other animal welfare efforts.
💡 Research Summary
The paper investigates whether the vocalizations of animals can serve as reliable identifiers of breed or species when processed through bioacoustic feature extraction and classified by artificial neural networks (ANNs). Three separate ANN models were built to recognize 13 bird species, 8 dog breeds, and 11 frog species. Audio recordings were gathered from publicly available databases (Michigan State University’s Avian Vocalizations Center for birds, AmphibiaWeb for frogs) and from field recordings of dogs in pet shops and prior studies. Each bird species was represented by 25 clips and each dog breed and frog species by 10 clips, yielding 325 bird, 110 frog, and 90 dog samples. To teach the networks to discriminate animal sounds from non‑animal noises, a “pseudo‑species” class of negative examples was added to each dataset, expanding the classification problem to 14 bird classes, 9 dog classes, and 12 frog classes.
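The pseudo‑species idea simply amounts to adding one extra class of non‑animal noise alongside the real breed/species classes. A minimal sketch of such a dataset layout is shown below; the directory structure and file naming are hypothetical, since the paper does not describe how its data were organized on disk.

```python
# Illustrative sketch: assemble (clip, label) pairs where one folder of
# non-animal noise forms an extra "pseudo_species" class.
# The directory layout is an assumption, not taken from the paper.
from pathlib import Path

def build_dataset(root):
    """Map each audio clip under root/<class_name>/ to its class label."""
    samples = []
    for class_dir in Path(root).iterdir():
        if not class_dir.is_dir():
            continue
        label = class_dir.name  # e.g. a species/breed name, or "pseudo_species"
        for clip in sorted(class_dir.glob("*.wav")):
            samples.append((clip, label))
    return samples
```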
All recordings were processed with the jAudio toolkit, extracting 28 spectral descriptors commonly used in bioacoustics: Mel‑Frequency Cepstral Coefficients (MFCC), Zero‑Crossing Rate (ZCR), Root‑Mean‑Square (RMS), Fraction of Low‑Energy Window Frames (FLWEF), Spectral Flux, Spectral Roll‑off, Compactness, Method of Moments (area, mean, power‑spectrum density, skew, kurtosis), Linear Predictive Coding (LPC), Spectral Centroid, Beat Sum, Strongest Beat, Strength of Strongest Beat, Spectral Variability, among others. For each descriptor the mean and standard deviation across the entire clip were computed, forming a 56‑dimensional feature vector per sample.
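To make the feature-extraction step concrete, the sketch below computes clip-level means and standard deviations of a few of the descriptors named above. It is only an approximation of the paper's pipeline: the study used the jAudio toolkit, whereas this sketch uses librosa as a stand-in, and it covers an illustrative subset rather than all 28 descriptors.

```python
# Hedged sketch of per-clip feature extraction (librosa as a stand-in for jAudio).
import numpy as np
import librosa

def extract_features(path):
    """Return clip-level mean and std of a few spectral descriptors."""
    y, sr = librosa.load(path, sr=None)

    framewise = np.vstack([
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),    # MFCC
        librosa.feature.zero_crossing_rate(y),          # ZCR
        librosa.feature.rms(y=y),                       # RMS
        librosa.feature.spectral_centroid(y=y, sr=sr),  # Spectral Centroid
        librosa.feature.spectral_rolloff(y=y, sr=sr),   # Spectral Roll-off
    ])  # shape: (n_descriptors, n_frames)

    # Collapse each frame-wise descriptor to its clip-level mean and standard
    # deviation, mirroring how the paper's 56-dimensional vectors were formed
    # (28 descriptors x 2 statistics).
    return np.concatenate([framewise.mean(axis=1), framewise.std(axis=1)])
```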
The ANN architecture employed a multilayer perceptron (MLP) trained with back‑propagation and cross‑entropy loss. To avoid overfitting, a 10‑fold cross‑validation scheme was used, with the data partitioned into 70% for training, 10% for testing, and 20% for final evaluation. Experiments compared two input strategies: (1) using the full set of 28 descriptors, and (2) selecting a reduced subset based on preliminary performance. For birds, the full 28‑descriptor set yielded the best result: 71.43% classification accuracy. For dogs, a reduced set of four descriptors (including MFCC, ZCR, Spectral Centroid, and LPC) achieved 94.44% accuracy, demonstrating that a small, well‑chosen feature subset can be highly discriminative. For frogs, the full descriptor set produced 90.91% accuracy.
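The classification step can be sketched as below with scikit-learn, whose MLPClassifier optimizes a cross-entropy (log-loss) objective via back-propagation, matching the setup described above. The hidden-layer size and other hyperparameters are assumptions (the paper does not report them), and for brevity the sketch shows plain 10-fold cross-validation rather than the paper's additional 70/10/20 partitioning.

```python
# Hedged sketch of MLP training with 10-fold cross-validation.
# Hyperparameters are assumed, not taken from the paper.
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate(X, y):
    """X: (n_samples, 56) feature matrix; y: breed/species labels."""
    model = make_pipeline(
        StandardScaler(),                        # scale the 56-dim feature vectors
        MLPClassifier(hidden_layer_sizes=(64,),  # assumed size; not from the paper
                      max_iter=2000,
                      random_state=0),
    )
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    return scores.mean()
```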
The authors discuss why bird identification performed less well: the relatively small and heterogeneous dataset, subtle inter‑species acoustic differences, and higher susceptibility to background noise. In contrast, dog barks and frog calls exhibit more distinct spectral patterns, leading to higher accuracies. The paper highlights the practical implications of these findings: a smartphone‑based crowdsourcing platform could allow citizen scientists to record animal sounds in the field, automatically upload them, and receive instant species identification. Such a system would be valuable for biodiversity monitoring, especially in environments where visual observation is difficult (dense canopy, nocturnal or underwater habitats).
Limitations noted include the modest sample sizes, especially for birds, the reliance on handcrafted spectral features rather than end‑to‑end deep learning approaches, and the absence of a comparison with more modern architectures such as convolutional or recurrent neural networks. Future work is suggested to expand the dataset across more habitats and taxa, incorporate advanced deep learning models, test robustness under varying environmental noise conditions, and develop real‑time mobile applications.
In conclusion, the study provides empirical evidence that bioacoustic feature extraction combined with ANN classification can automatically differentiate animal breeds and species with high accuracy, particularly for dogs and frogs. This validates the feasibility of acoustic‑based automated identification as a tool for wildlife conservation, biodiversity assessment, and broader animal‑welfare initiatives.