The objective of this work is to integrate high performance computing (HPC) technologies with bioacoustics data-mining capabilities by offering a MATLAB-based toolbox called Raven-X. Raven-X will provide a hardware-independent solution for processing large acoustic datasets, and the toolkit will be available to the community at no cost. This goal will be achieved by leveraging prior work that successfully deployed MATLAB-based HPC tools within Cornell University's Bioacoustics Research Program (BRP). These tools enabled commonly available multi-core computers to process data at accelerated rates to detect and classify whale sounds in large multi-channel sound archives. Through this collaboration, we will expand on this effort, which was featured in MathWorks research and industry forums, incorporate new cutting-edge detectors and classifiers, and disseminate Raven-X to the broader bioacoustics community.
The project is broken down into two phases of equal duration. In Phase-I, we aim to integrate existing HPC models [1,2,8] and provide a stable software tool through an online repository. The toolkit will be deployed to run on standard computers as well as on HPC platforms (e.g., cloud computers). An HPC model [1] will be added to provide accelerated processing. Popular algorithms will be integrated into the toolkit, including GPL [9], silbido [10], and ERMA [11]. The team will work through various architectural issues, and the resulting changes will be incorporated into a fault-tolerant application. Each algorithm will be tested and benchmarked under various conditions, such as sample rate, archive size, channel configuration, and sound file type.
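The accelerated, block-parallel processing that an HPC model of this kind relies on can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Raven-X (MATLAB) implementation: `make_blocks`, `energy_detector`, `merge_detections`, and `detect_parallel` are hypothetical names, the toy threshold detector stands in for algorithms such as GPL, and a thread pool stands in for the multi-core workers. The key assumption is that each sound is shorter than the block overlap, so a sound cut at one block boundary appears whole in the neighboring block and the duplicate can be merged by absolute time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

import numpy as np

Detection = Tuple[float, float]  # (begin, end) in seconds


def make_blocks(n_samples: int, block_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Split [0, n_samples) into overlapping (start, stop) sample ranges."""
    if not 0 <= overlap < block_len:
        raise ValueError("need 0 <= overlap < block_len")
    step = block_len - overlap
    blocks, start = [], 0
    while True:
        stop = min(start + block_len, n_samples)
        blocks.append((start, stop))
        if stop == n_samples:
            return blocks
        start += step


def energy_detector(block: np.ndarray, fs: float, threshold: float) -> List[Detection]:
    """Toy stand-in detector: runs of |x| > threshold, in block-relative seconds."""
    above = np.abs(block) > threshold
    # Pad with False so every run has both a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], above, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)  # exclusive end index
    return [(s / fs, e / fs) for s, e in zip(starts, stops)]


def merge_detections(dets: List[Detection], tol: float) -> List[Detection]:
    """Merge detections that overlap or touch (within tol); this removes the
    duplicates created where neighbouring blocks overlap."""
    merged: List[Detection] = []
    for begin, end in sorted(dets):
        if merged and begin <= merged[-1][1] + tol:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


def detect_parallel(signal: np.ndarray, fs: float,
                    detector: Callable[[np.ndarray, float], List[Detection]],
                    block_len: int, overlap: int, max_workers: int = 4) -> List[Detection]:
    """Run `detector` on overlapping blocks concurrently; return absolute times."""

    def run(block: Tuple[int, int]) -> List[Detection]:
        start, stop = block
        offset = start / fs  # shift block-relative times to archive time
        return [(b + offset, e + offset) for b, e in detector(signal[start:stop], fs)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_block = list(pool.map(run, make_blocks(len(signal), block_len, overlap)))
    return merge_detections([d for dets in per_block for d in dets], tol=1.0 / fs)
```

For example, with `fs = 100.0`, a 1000-sample signal containing a sound at samples 95 to 104, `block_len=100`, and `overlap=20`, the sound straddles the first block boundary but is reported once. For CPU-bound pure-Python detectors, a `ProcessPoolExecutor` could be swapped in for real multi-core speedup.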
Phase-II will focus on creating a package that can be deployed to the broader community. We aim to create new user interfaces and modify existing ones, and documentation will be added where necessary. Interfaces will be open, allowing other software experts from the community to explore adding custom algorithms. For deployed algorithms, Raven-X will enable users to export detector/classifier results in a format that can be used by existing software tools (e.g., Raven Pro and Tethys). The objective is to build a stable, robust package capable of running a series of new algorithms for advanced marine mammal detection and classification. At the end of the project, Raven-X will be made available to the bioacoustics community at no cost through an online repository.
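Exporting detections for Raven Pro might look like the sketch below. It assumes the commonly used Raven selection-table layout (a tab-separated text file, one row per detection, with `Selection`, `View`, `Channel`, begin/end time, and low/high frequency columns); the exact columns a given Raven Pro version expects should be checked against its documentation. `Detection`, `RAVEN_COLUMNS`, and `write_raven_table` are hypothetical names, and Tethys export, which uses its own XML schema, is not covered.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, TextIO


@dataclass
class Detection:
    channel: int    # 1-based channel number
    begin_s: float  # begin time in seconds from file start
    end_s: float    # end time in seconds
    low_hz: float   # lower frequency bound in Hz
    high_hz: float  # upper frequency bound in Hz


# Assumed Raven selection-table header (verify against your Raven Pro version).
RAVEN_COLUMNS = ["Selection", "View", "Channel", "Begin Time (s)",
                 "End Time (s)", "Low Freq (Hz)", "High Freq (Hz)"]


def write_raven_table(dets: Iterable[Detection], out: TextIO,
                      view: str = "Spectrogram 1") -> None:
    """Write detections as a tab-separated selection table, numbered in time order."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(RAVEN_COLUMNS)
    ordered = sorted(dets, key=lambda d: (d.begin_s, d.channel))
    for i, d in enumerate(ordered, start=1):
        writer.writerow([i, view, d.channel, f"{d.begin_s:.6f}", f"{d.end_s:.6f}",
                         f"{d.low_hz:.1f}", f"{d.high_hz:.1f}"])
```

In practice `out` would be a file opened with `open(path, "w", newline="")`; an `io.StringIO` buffer works the same way for inspection.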
Investigating Hardware for PMRF
Tyler Helble investigated several hardware configurations for improving computing performance when processing PMRF and SCORE data at SSC-PAC. Tyler consulted with the Cornell team and SSC-PAC's High Performance Computing team, and attended the 2016 passive acoustic data archiving meeting at NOAA in Boulder, CO. It was determined that a lack of high-speed online storage is a main bottleneck for processing range data. A 270 TB network attached storage (NAS) device was purchased with end-of-year PACFLEET funding. The unit is a NEXSAN BEAST storage system; installation in the SSC-PAC lab is scheduled for the end of October. The storage device is expandable and will allow the lab to host nearly all of its unclassified acoustic data on a single accessible device, rather than on hundreds of unconnected SATA drives. This will eliminate the need for an operator to physically swap drives and will allow more automated processing of PMRF data with minimal human intervention. The acquisition of the NAS, combined with improvements in algorithm speed from this project, should allow SSC-PAC to process each year's worth of data significantly faster than traditional methods.¹ The NAS will be hosted in the SSC-PAC lab and connected to an existing 16-core machine. The NAS device is also compatible with SSC-PAC's 144-core, 40 TB RAM system located in the HPC center, and may eventually be migrated to that system. However, connectivity to the HPC center is relatively slow (1 Gigabit Ethernet), so the lab may need to be physically relocated to an area with faster network speeds before the data can be hosted there. Additionally, any computer connected to the HPC must be on the SSC-PAC network, which means that several of our custom data-processing software packages would need to be added to DATUMS before they are cleared for use on that network. Therefore, we expect the migration to the HPC to be at least several years away.
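A back-of-the-envelope estimate makes the network bottleneck concrete. If the full 270 TB capacity were filled, moving it over 1 Gigabit Ethernet at ideal line rate would take 270 × 10^12 bytes × 8 bits/byte ÷ 10^9 bits/s = 2.16 × 10^6 s, or 25 days; a 10 Gb/s link (a hypothetical comparison, not a configuration named in this report) would take 2.5 days. The helper below is a hypothetical illustration; its `efficiency` factor is an assumption standing in for protocol and disk overheads, since real throughput is usually below line rate.

```python
def transfer_days(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_tb terabytes (10**12 bytes) over a link of
    link_gbps gigabits per second, running at `efficiency` of line rate."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8                         # bytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)   # effective bits per second
    return seconds / 86400                            # seconds -> days


print(f"1 GbE:  {transfer_days(270, 1):.1f} days")
print(f"10 GbE: {transfer_days(270, 10):.1f} days")
```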
Raven-X Software
HPC runtime models for parallel-distributed processing were updated and integrated into the Raven-X toolkit. Older user interfaces have been redesigned and implemented using the new object-oriented (OO) structure released in MATLAB R2016a. The goal is to create a complete OO toolkit, maximizing the ability to use Raven-X components as building blocks for other HPC applications. The new Raven-X app is shown in Figure 1.
Four main algorithms internal to Cornell BRP were integrated and tested in the new OO framework: three right whale algorithms and one algorithm designed for minke whale detection. Preliminary integration of the GPL algorithm was accomplished using the OO framework. GPL comes equipped with various parameter files (called parm-files); each parm-file contains a series of settings used inside the GPL code. The OO structure provided a wrapper that allowed the original GPL package to remain untouched, as illustrated in Figure 2. The wrapper keeps GPL and Raven-X independent of each other, so future changes can be incorporated more easily. Integration of silbido and ERMA will continue after GPL integration is complete.
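The wrapper idea can be sketched as an adapter. This is a hypothetical Python illustration, not the Raven-X MATLAB classes: `Detector`, `LegacyDetectorWrapper`, `parse_parm_text`, the `name = value` parm-file syntax, and `legacy_threshold_detector` are all invented for the example, and GPL's actual parm-files are not reproduced here. The framework only calls the common `detect` interface, while the adapter reads the parameter file and calls the legacy routine without modifying it.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence, Tuple

Detection = Tuple[float, float]  # (begin, end) in seconds


class Detector(ABC):
    """Common interface the framework calls for every algorithm."""

    @abstractmethod
    def detect(self, samples: Sequence[float], fs: float) -> List[Detection]:
        ...


def parse_parm_text(text: str) -> Dict[str, float]:
    """Parse a hypothetical 'name = value' parameter file; '%' starts a comment."""
    params: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.split("%", 1)[0].strip()
        if not line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        params[name] = float(value)
    return params


class LegacyDetectorWrapper(Detector):
    """Adapter: translates framework calls into calls on an unmodified legacy
    function, so the legacy package itself never needs editing."""

    def __init__(self, legacy_fn: Callable[..., List[Tuple[int, int]]], parm_text: str):
        self._legacy_fn = legacy_fn
        self._params = parse_parm_text(parm_text)

    def detect(self, samples: Sequence[float], fs: float) -> List[Detection]:
        raw = self._legacy_fn(samples, fs, **self._params)
        # Convert legacy output (sample indices) to the framework's time pairs.
        return [(start / fs, stop / fs) for start, stop in raw]


def legacy_threshold_detector(x, fs, threshold, min_len):
    """Stand-in for an untouched legacy routine with its own calling convention:
    returns (start, stop) sample indices of runs above threshold."""
    hits, start = [], None
    for i, v in enumerate(list(x) + [0.0]):  # trailing sentinel closes a final run
        if abs(v) > threshold and start is None:
            start = i
        elif abs(v) <= threshold and start is not None:
            if i - start >= min_len:
                hits.append((start, i))
            start = None
    return hits
```

Swapping in another algorithm then means writing another small adapter rather than editing either the framework or the algorithm's own code.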
The GPL detector was successfully integrated into the Raven-X framework. Runtime performance was measured by running the GPL blue whale and GPL fin whale detectors across 24 hours of single-channel, 2 kHz data. The commercial off-the-shelf (COTS) computer selected for the test was equipped with 12 CPUs. The test consisted of two me