Research Ideas and Outcomes
Case Study
Corresponding author: Ulrike Sturm (ulrike.sturm@mfn-berlin.de)
Received: 17 Jan 2020 | Published: 24 Feb 2020
© 2020 Marcel Stehle, Mario Lasseck, Omid Khorramshahi, Ulrike Sturm
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Stehle M, Lasseck M, Khorramshahi O, Sturm U (2020) Evaluation of acoustic pattern recognition of nightingale (Luscinia megarhynchos) recordings by citizens. Research Ideas and Outcomes 6: e50233. https://doi.org/10.3897/rio.6.e50233
Acoustic pattern recognition methods introduce new perspectives for species identification, biodiversity monitoring and data validation in citizen science but are rarely evaluated in real-world scenarios. In this case study we analysed the performance of a machine learning algorithm for automated bird identification in reliably identifying common nightingales (Luscinia megarhynchos) in field recordings taken by users of the smartphone app Naturblick. We found that the performance of the automated identification tool was overall robust on our selected recordings. Although most of the recordings had a relatively low confidence score, a large proportion were identified correctly.
Keywords: pattern recognition, sound recognition, species identification, mobile app, citizen science
Acoustic pattern recognition methods provide new perspectives for species identification and biodiversity monitoring (
However, to be able to assess the performance of such automated classifiers sufficient testing in the field is indispensable (
The smartphone app Naturblick (nature view) was developed with an initial focus on encouraging species identification among young adults (age 18 to 30) and combines several tools that allow users to identify animals and plants. In a second step the app was developed further as an integrative tool for environmental education and biodiversity monitoring in citizen science (
We analysed the performance of our classifier in reliably identifying nightingales (Luscinia megarhynchos) in field recordings taken by Naturblick users. First, the correctness of the classification was checked by manually validating the recordings. Second, the robustness of the classification was analysed more closely by subclassifying the validated nightingale sounds into two communicative signals: song and call.
The smartphone app Naturblick was released at an early stage in June 2016 and has been improved continuously based on user feedback (see
The applied machine learning algorithm for automated bird identification is mainly based on template matching of spectrogram segments and a random forest ensemble learning method. First, individual bird song and call elements are extracted from the training data. For this, the grayscale spectrogram of each audio file is treated as an image and sound elements are extracted by applying median clipping for noise reduction and various morphological operations for segmentation (
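The segmentation step described above can be sketched as follows. This is a minimal illustration of median clipping followed by morphological clean-up; the function name, the clipping factors, the 3x3 structuring element and the minimum-area threshold are our assumptions, not details taken from the study.

```python
import numpy as np
from scipy import ndimage

def segment_spectrogram(spec, row_factor=3.0, col_factor=3.0, min_area=50):
    """Extract candidate sound elements from a grayscale spectrogram.

    Median clipping: keep only pixels that exceed a multiple of both
    their row (frequency band) median and their column (time frame)
    median, then clean the binary mask with morphological operations
    and return bounding boxes of the remaining connected regions.
    """
    row_med = np.median(spec, axis=1, keepdims=True)
    col_med = np.median(spec, axis=0, keepdims=True)
    mask = (spec > row_factor * row_med) & (spec > col_factor * col_med)

    # Closing joins fragmented elements; opening removes speckle noise.
    mask = ndimage.binary_closing(mask, structure=np.ones((3, 3)))
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))

    # Label connected components and keep sufficiently large segments.
    labels, _ = ndimage.label(mask)
    return [s for s in ndimage.find_objects(labels)
            if (s[0].stop - s[0].start) * (s[1].stop - s[1].start) >= min_area]
```

The returned (frequency slice, time slice) boxes would then serve as templates for the matching stage that feeds the random forest.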
The common nightingale (Luscinia megarhynchos), a common migratory bird in Berlin, was chosen as the object of investigation. In Europe Luscinia megarhynchos sings after arrival around mid-April until late June (
The seasonality of L. megarhynchos was deliberately not included as a selection criterion, in order to detect more false positive results. Based on their confidence scores (ConfS), these recordings were divided into two groups: <50% and >50%. Recordings with a ConfS > 50% were validated in their entirety. Recordings with a ConfS < 50% were divided into four groups (10-20%, 20-30%, 30-40%, 40-50%) and samples of ten recordings per group were randomly selected and validated. The sampled recordings were validated manually and labelled as song, song and call, or call. One person listened to all recordings and compared them with verified recordings of songs and calls from the Animal Sound Archive of the Museum für Naturkunde Berlin. Additionally, a spectrogram analysis using Raven 1.4 was conducted for recordings with high similarity to other species. For each such recording a spectrogram (a visual representation of the audio recording) was produced and compared to verified spectrograms from the Animal Sound Archive. Variables measured included maximum frequency, minimum frequency and delta frequency. Audio recordings that were particularly difficult to distinguish were cross-checked by a second researcher.
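The sampling scheme above can be sketched as a short script. The data layout (id, confidence) and the fixed random seed are illustrative assumptions for reproducibility, not part of the study's protocol.

```python
import random

# Low-confidence bins used for stratified sampling (in percent).
BINS = [(10, 20), (20, 30), (30, 40), (40, 50)]

def sample_for_validation(recordings, per_bin=10, seed=42):
    """Select recordings for manual validation.

    recordings: list of (recording_id, confidence_percent) pairs.
    Returns all recordings with ConfS > 50% plus a random sample of
    `per_bin` recordings from each low-confidence bin.
    """
    rng = random.Random(seed)
    selected = [r for r in recordings if r[1] > 50]  # validate in full
    for low, high in BINS:
        group = [r for r in recordings if low <= r[1] < high]
        selected += rng.sample(group, min(per_bin, len(group)))
    return selected
```

With the study's numbers (ten recordings per bin, four bins) this yields the 40 low-confidence samples plus every high-confidence recording.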
In total, 468 field recordings met the defined criteria (Table
Recordings meeting the defined criteria, by month:
Month | Total number of recordings | Number of recordings with ConfS <50% | Number of recordings with ConfS >50% |
---|---|---|---|
June 2016 | 0 | 0 | 0 |
July 2016 | 0 | 0 | 0 |
August 2016 | 0 | 0 | 0 |
September 2016 | 2 | 2 | 0 |
October 2016 | 0 | 0 | 0 |
November 2016 | 0 | 0 | 0 |
December 2016 | 1 | 1 | 0 |
January 2017 | 5 | 5 | 0 |
February 2017 | 3 | 0 | 0 |
March 2017 | 17 | 16 | 1 |
April 2017 | 53 | 49 | 4 |
May 2017 | 208 | 199 | 9 |
June 2017 | 125 | 101 | 24 |
July 2017 | 36 | 28 | 8 |
August 2017 | 11 | 11 | 0 |
September 2017 | 5 | 5 | 0 |
October 2017 | 2 | 2 | 0 |
Of the 40 sampled recordings with a ConfS < 50%, 33 were validated as correctly classified (Fig.
Twelve of the correctly identified recordings were found to be audio playbacks of files, CDs or similar sources (Table
Number of verified L. megarhynchos recordings per indicator for audio playbacks of audio files or similar sources. Recordings with more than one indicator are marked with an asterisk (*).
Indicator | Description | Number of recordings (ConfS) |
Place and time | Species-specific plausibility of timestamp in combination with geographical coordinates | 5 (22.8%*, 36.7%*, 37.4%*, 46.6%*, 47.4%*) |
High recording level | A high recording level indicates that a user held his/her smartphone to a speaker playing back a recording. | 10 (17.4%*, 22.8%*, 28%*, 36.7%*, 37.4%*, 46.6%*, 46.9%, 47.4%*, 51.5%, 52%*) |
Unusual noise | Absence of natural background noise, or noise that does not fit an outdoor recording, e.g. mouse and keyboard sounds | 11 (17.4%*, 18%, 22.8%*, 28%*, 36.7%*, 37.4%*, 42.9%, 46.6%*, 46.9%*, 47.4%*, 52%*) |
The majority of the verified recordings with a ConfS <50% were identified as nightingale song (Fig.
Our case study highlights the usefulness of acoustic pattern recognition for identifying animal sounds recorded with smartphones. Regarding the classification of nightingales, the performance of the automated identification tool was overall robust on our selected recordings. A large proportion of the verified recordings were classified correctly. This is remarkable, taking into account the origin of the source audio material used in this study. We expected large variations in audio quality due to different microphones, various mobile phone vendors, general problems with urban field recordings because of background noise, and operator errors.
Interestingly, nightingale calls received high confidence scores more frequently than song recordings. One reason for this could be that nightingale calls are less variable than songs and are therefore recognized more reliably.
We discovered that the audio recognition tool of the app Naturblick was commonly tested by playing back audio files or similar sources. We assume that users were curious to test the effectiveness of the audio pattern recognition. This interaction was not anticipated and the classifier was not trained to distinguish noisy field recordings from high-quality audio material. However, none of these playback recordings received confidence scores above 60%.
We believe that, beyond its general value as a species identification tool, automated pattern recognition should be recognized more as a mechanism to assess the data quality of citizen science audio recordings made with smartphones. Data quality may even be improved quite simply by applying indicators to identify outliers, e.g. audio playbacks of audio files or similar sources (
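Such outlier indicators could be operationalized as a simple rule-based check. The sketch below combines the three indicators from the table; the record fields, the month range and the level threshold are illustrative assumptions, not values used in the study.

```python
def flag_possible_playback(rec):
    """Return the playback indicators that apply to one recording.

    `rec` is a hypothetical dict with keys: 'month' (1-12),
    'peak_level' (normalized 0-1) and 'background_noise' (True if
    natural outdoor noise is audible). Thresholds are illustrative.
    """
    indicators = []
    # Place and time: nightingale song outside ~April-June is implausible.
    if rec['month'] not in (4, 5, 6):
        indicators.append('place and time')
    # High recording level suggests a smartphone held to a speaker.
    if rec['peak_level'] > 0.95:
        indicators.append('high recording level')
    # Missing or unusual background noise for an outdoor recording.
    if not rec['background_noise']:
        indicators.append('unusual noise')
    return indicators
```

Recordings flagged by several indicators at once, like those marked with an asterisk in the table, would be the strongest candidates for manual review.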
We only evaluated the precision of the nightingale classification and not its recall (sensitivity). Therefore, it would be interesting to examine the nightingale recordings that were excluded from our selection. Furthermore, the study was limited by its small sample size of validated recordings. To address this, we plan to evaluate the identification performance of the recognition tool with an increased sample size of nightingale recordings.
We thank Silke Voigt-Heucke for her valuable suggestions and for cross-checking individual recordings. We thank Dr. Karl-Heinz Frommolt for his support.
This publication was written as part of the project ‘Stadtnatur entdecken’, funded by the Federal Ministry of the Environment, Nature Conservation and Nuclear Safety (BMU).