Recording Device
For data collection, we created custom multi-channel acoustic recording devices called SoundSHROOMs (Sensor-equipped Habitat Recorders for Outdoor Omnidirectional Monitoring) (Fig. 1). These devices were designed to synchronously capture audio from multiple microphones for beamforming and localization applications, and to serve as a testbed for experimenting with mechanical designs that reduce wind noise and enhance environmental robustness. The device samples synchronously from ten microphones, eight of which are MEMS-type (VM3011), specifically designed to be dust-resistant, shockproof, and weather-tight, making them well suited to long-term outdoor applications. The VM3011 microphones are omnidirectional with a sensitivity of −26 dBFS (1 kHz, 94 dB SPL) and a signal-to-noise ratio of 64 dBA (94 dB SPL at 1 kHz, 20 Hz–20 kHz, A-weighted noise). These microphones output pulse-density modulated (PDM) signals to a single 8-channel converter (ADAU7118), which translates them into one time-division multiplexed (TDM) signal for direct interfacing with the ARM Cortex-M33 processor (STM32U575RIT6). No additional gain is applied to the MEMS microphone channels.
The system also features two omnidirectional analog electret microphones (AOM-5024P-HD-MB-R) with a high signal-to-noise ratio (80 dB) and a relatively uniform frequency response up to 15 kHz, with a rated response extending to 20 kHz. These microphones serve as a sound-quality reference for the digital MEMS channels and provide additional flexibility. The electrets are interfaced with a 24-bit analog-to-digital converter (ADC) with configurable digital gain (ADAU1979); for our application, we sample only the 16 most significant bits after applying a gain of 25.5 dB. The ADC shares the TDM-16 bus used by the MEMS microphones, synchronizing the electret channels with the other channels.

For our application, we sample at 32 kHz with 16-bit resolution across all channels and save the data directly to an uncompressed 10-channel WAV file. We chose this configuration to balance file size against the ability to capture the entire audible spectrum without aliasing the acoustic signal. The MEMS microphones were purchased from the same production batch to minimize manufacturing non-uniformities. Their positions were selected to allow experimentation with beamforming over a wide range of frequencies (Table 1). Microphones should ideally be spaced no more than half the shortest wavelength of interest apart, to avoid phase-matching (spatial aliasing) ambiguities, but not so close that phase differences cannot be resolved between microphones. Seven of the MEMS microphones lie in the same plane for greater azimuth resolution, while the eighth microphone and the electrets are offset to add resolution when beamforming across elevations.
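The half-wavelength spacing rule above can be made concrete with a short calculation. The sketch below is illustrative only: the 343 m/s speed of sound and the example frequencies and spacing are assumptions, not values taken from the device documentation or Table 1.

```python
# Half-wavelength design rule for microphone spacing (illustrative sketch;
# the speed of sound and example frequencies are assumed values).

C = 343.0  # speed of sound in air at ~20 °C, m/s (assumed)

def max_spacing(freq_hz: float) -> float:
    """Largest spacing (m) keeping a microphone pair within half a
    wavelength at freq_hz, avoiding spatial-aliasing ambiguity."""
    return C / (2.0 * freq_hz)

def max_unambiguous_freq(spacing_m: float) -> float:
    """Highest frequency (Hz) a pair spaced spacing_m apart can
    beamform without phase-matching ambiguity."""
    return C / (2.0 * spacing_m)

for f in (1_000.0, 8_000.0, 16_000.0):
    print(f"{f/1000:5.1f} kHz -> spacing <= {max_spacing(f)*100:.1f} cm")
```

For example, a 5 cm spacing is unambiguous up to roughly 3.4 kHz, which is why arrays aimed at higher frequencies need progressively tighter microphone placement.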
We demonstrated SoundSHROOM’s efficacy in beamforming applications through synthetic and real-world experiments. We first defined the locations of our eight MEMS microphones using Acoular17, an open-source Python microphone-array package. Using this definition, we then placed three virtual sound sources around the array at varying azimuths and elevations. Using the CLEAN-SC (CLEAN based on Source Coherence) algorithm, we distinguished a cluster of points along the three calculated beams, which can be used to predict sound source locations (Fig. 2)18.
A simulated beamforming example using the Acoular17 Python package with the CLEAN-SC algorithm. Three sound sources were defined at locations (5, −3, 4), (3, 4, 0), and (−5, 0, 2). The SoundSHROOM was positioned at the origin, with the eight MEMS microphones simulated at their real locations.
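CLEAN-SC itself is provided by Acoular; as a self-contained illustration of the underlying idea (steering an array toward candidate directions and picking the direction of maximum coherent power), the sketch below implements a far simpler frequency-domain delay-and-sum beamformer in NumPy. The circular 8-microphone layout, source azimuth, and signal length are all hypothetical, not the SoundSHROOM geometry from Table 1.

```python
import numpy as np

fs = 32_000   # sample rate (Hz), matching the field configuration
c = 343.0     # speed of sound (m/s), assumed
rng = np.random.default_rng(0)

# Hypothetical planar 8-microphone ring of 5 cm radius (NOT the real layout).
mics = np.array([[0.05 * np.cos(a), 0.05 * np.sin(a), 0.0]
                 for a in np.linspace(0, 2 * np.pi, 8, endpoint=False)])

def unit(az):
    """Unit direction vector for a far-field source at azimuth az (rad)."""
    return np.array([np.cos(az), np.sin(az), 0.0])

def simulate(az, n=4096):
    """White-noise plane wave from azimuth az, delayed per microphone."""
    src = rng.standard_normal(n)
    S = np.fft.rfft(src)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    delays = mics @ unit(az) / c  # per-mic delay in seconds
    return np.array([np.fft.irfft(S * np.exp(-2j * np.pi * freqs * d), n)
                     for d in delays])  # shape (8, n)

def das_power(x, az):
    """Delay-and-sum output power when steering toward azimuth az."""
    n = x.shape[1]
    X = np.fft.rfft(x, axis=1)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    delays = mics @ unit(az) / c
    aligned = X * np.exp(2j * np.pi * freqs * delays[:, None])  # undo delays
    y = np.fft.irfft(aligned.sum(axis=0), n)
    return float(np.mean(y ** 2))

# Scan a 5-degree azimuth grid and report the direction of maximum power.
grid = np.linspace(0, 2 * np.pi, 72, endpoint=False)
x = simulate(np.deg2rad(60))
est = np.rad2deg(grid[np.argmax([das_power(x, a) for a in grid])])
print(f"estimated azimuth: {est:.0f} deg")  # expect ~60 deg
```

Delay-and-sum is the baseline that deconvolution methods such as CLEAN-SC refine: CLEAN-SC additionally removes the array's point-spread function from the resulting map by exploiting coherence between sources.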
In addition, we collected real-world acoustic recordings by placing one of the SoundSHROOMs in the center of a large auditorium and dynamically positioning a sound source around the device (Fig. 3). Using Acoular and the collected data, we automatically calculated beams toward sound-source locations from 1-second, 8-channel excerpts. We conducted this experiment with a moving sound source that traveled along a circular path with a radius of 7.5 meters. A video of the complete results, superimposed with the sound data, can be found in the beamforming_example folder of the repository, along with the scripts used.
A laboratory beamforming example using the Acoular17 Python package is presented. A speaker was positioned 7.5 meters from a SoundSHROOM and moved around the device in a circle. The beams were calculated using Acoular on 1-second excerpts, utilizing SoundSHROOM’s eight MEMS channels as input.
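Preparing the 1-second, 8-channel excerpts from a 10-channel recording is a simple slicing step. The sketch below assumes, hypothetically, that the eight MEMS channels occupy the first eight channel slots of the WAV file; the actual channel ordering is defined by the device firmware and repository scripts.

```python
import numpy as np

FS = 32_000                  # Hz, sample rate used in the deployments
MEMS_CHANNELS = slice(0, 8)  # assumed ordering: 8 MEMS, then 2 electrets

def one_second_excerpts(audio: np.ndarray, fs: int = FS) -> np.ndarray:
    """Split a (n_samples, n_channels) recording into non-overlapping
    1-second MEMS excerpts of shape (n_excerpts, fs, 8)."""
    mems = audio[:, MEMS_CHANNELS]
    n_excerpts = mems.shape[0] // fs
    return mems[: n_excerpts * fs].reshape(n_excerpts, fs, 8)

# Example with 3.5 s of synthetic 10-channel, 16-bit data:
demo = np.zeros((int(3.5 * FS), 10), dtype=np.int16)
print(one_second_excerpts(demo).shape)  # (3, 32000, 8)
```

Each (fs, 8) excerpt can then be handed to the beamformer as one analysis frame, as done with Acoular in the auditorium experiment.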
Our system stands out in several key aspects when compared to prior commercial systems or those designed specifically for research purposes. It features a dual-microphone-type setup: two highly sensitive electret microphones provide high-fidelity stereo recordings, while the additional eight MEMS microphones are designated for localization and beamforming applications. The microphones are strategically separated across all three axes, providing maximum flexibility for beamforming in both azimuth and elevation.
In contrast, one affordable commercial alternative is a microphone array priced at approximately US$59 (ReSpeaker, Seeed Technology Co., Ltd., Shenzhen, China); however, it offers only four channels of synchronized audio and restricts the sample rate to 16 kHz. Other 4-channel arrays used in academic settings are often not available for purchase and tend to be customized for specific applications19,20,21. Unlike options equipped with onboard GPS, the current version of SoundSHROOM cannot automatically synchronize across individual units, necessitating synchronization in post-processing when multiple units are used for localization20. Microphone arrays with more than four channels of synchronized audio are available, but they are typically either expensive or custom-made for particular applications and not readily accessible. In addition, because storing more than four channels of audio is bandwidth-constrained, most of these systems record at a maximum of 16 kHz, which can be prohibitive for some applications22; we have successfully tested SoundSHROOM with all ten microphones sampled at 48 kHz. A more common approach is to use multiple 1- to 4-channel systems that are spaced apart and synchronized through wireless connections, cables, or post-processing.
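The bandwidth constraint mentioned above follows from simple arithmetic on uncompressed PCM: channels × sample rate × bytes per sample. The figures below are computed from the configurations stated in the text, not measured values from the paper.

```python
# Back-of-the-envelope storage cost of uncompressed multichannel WAV audio
# (simple arithmetic from the stated configurations, not measured figures).

def wav_bytes_per_hour(channels: int, rate_hz: int, bits: int) -> int:
    """Uncompressed PCM data volume for one hour of recording, in bytes."""
    return channels * rate_hz * (bits // 8) * 3600

svalbard = wav_bytes_per_hour(10, 32_000, 16) / 1e9  # field configuration
bench    = wav_bytes_per_hour(10, 48_000, 16) / 1e9  # 48 kHz bench test
print(f"{svalbard:.2f} GB/h at 32 kHz, {bench:.2f} GB/h at 48 kHz")
# → 2.30 GB/h at 32 kHz, 3.46 GB/h at 48 kHz
```

At roughly 2–3.5 GB per hour, ten synchronized channels quickly exhaust the write bandwidth and capacity of typical embedded storage, which is why many multichannel recorders cap the sample rate at 16 kHz.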
Data Collection
In July 2023, we deployed three SoundSHROOMs at various locations near Longyearbyen, Svalbard, Norway (Fig. 4, Table 2). The first objective was to collect qualitative data on the deployment process in the Arctic during the summer months, including assessing the environmental conditions to which the instruments might be subjected and determining whether the SoundSHROOMs could survive in their current configuration. The second objective was to test various types of microphone windshields to evaluate their performance under realistic conditions with variable wind loading. These windshields were applied to both the MEMS microphone array and the electret stereo pair. Through both objectives, we recorded multi-channel synchronized microphone data, which will be made publicly available for anyone interested in wildlife recording on Svalbard. The overall study consisted of six deployment areas chosen based on observed biodiversity, environmental characteristics (e.g., proximity to a stream, open plain, wet terrain, mud), and proximity to human activity. Deployment 5 was split into two sub-deployments (5.1 and 5.2) located near each other. Deployments 4 and 6 were the furthest from human activity (i.e., roads and housing), while the other deployments were near the main road in Longyearbyen, so occasional car traffic can be heard. Deployments 2 and 3 were near the dog yards where locals house their sled dogs, so dog activity can be heard in the acoustic recordings. Additionally, flowing water can be heard in the data from deployments 1 and 5.1 due to nearby streams.
Deployment locations in Svalbard as seen in Google Earth28.
We configured the SoundSHROOMs in four different setups (Fig. 5) to experiment with various mechanical windshields and assess their effectiveness in reducing wind-related noise in the acoustic recordings. To characterize the performance of the small windshields, we used a wind tunnel with a fixed airflow rate of 10 m/s. Both MEMS and electret microphones, sampled at 48 kHz, showed significant noise reduction at frequencies at or below 1 kHz when using the windshield, with negligible differences observed at higher frequencies (Figs. 6 and 7).
The windshields were attached to the plastic surface of the SoundSHROOMs, completely covering the microphone ports. The sampling rate (32 kHz) and bit resolution (16-bit) remained consistent across all devices during data collection in Svalbard.
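Quantifying a windshield's effect typically means comparing band-limited spectral power with and without the shield. The sketch below shows one way to compute such a low-frequency reduction figure with a periodogram in NumPy; the "bare" and "shielded" signals are synthetic stand-ins (white noise with its low-frequency content attenuated to mimic a shield), not the wind-tunnel recordings.

```python
import numpy as np

FS = 48_000  # Hz, sample rate used in the wind-tunnel test

def band_power_db(x: np.ndarray, fs: int, f_lo: float, f_hi: float) -> float:
    """Mean power (dB) of x between f_lo and f_hi via a windowed rFFT."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    band = (freqs >= f_lo) & (freqs < f_hi)
    return 10 * np.log10(spec[band].mean())

# Hypothetical stand-ins: 1 s of white noise, and the same noise with a
# crude low-cut (subtracting most of a ~1 kHz moving average) as "shielded".
rng = np.random.default_rng(1)
bare = rng.standard_normal(FS)
shielded = bare - 0.9 * np.convolve(bare, np.ones(48) / 48, mode="same")

reduction = (band_power_db(bare, FS, 20, 1_000)
             - band_power_db(shielded, FS, 20, 1_000))
print(f"low-frequency reduction: {reduction:.1f} dB")
```

The same band-power comparison, applied per channel to recordings made with and without the windshield at a fixed flow rate, yields the kind of below-1-kHz reduction reported in Figs. 6 and 7.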