Research Ideas and Outcomes : Case Study
Evaluation of acoustic pattern recognition of nightingale (Luscinia megarhynchos) recordings by citizens
Marcel Stehle‡,§, Mario Lasseck, Omid Khorramshahi, Ulrike Sturm
‡ Museum für Naturkunde Berlin Leibniz Institute for Evolution and Biodiversity Science, Berlin, Germany
§ Eberswalde University for Sustainable Development, Eberswalde, Germany
Open Access

Abstract

Acoustic pattern recognition methods introduce new perspectives for species identification, biodiversity monitoring and data validation in citizen science, but they are rarely evaluated in real-world scenarios. In this case study we analysed how reliably a machine learning algorithm for automated bird identification identifies common nightingales (Luscinia megarhynchos) in field recordings taken by users of the smartphone app Naturblick. We found that the performance of the automated identification tool was robust overall in our selected recordings. Although most of the recordings had a relatively low confidence score, a large proportion of the recordings were identified correctly.

Keywords

pattern recognition, sound recognition, species identification, mobile app, citizen science

Background

Acoustic pattern recognition methods provide new perspectives for species identification and biodiversity monitoring (Frommolt et al. 2008, Briggs et al. 2012, Bardeli et al. 2010, Potamitis 2014, Stowell and Plumbley 2014, Frommolt 2017). In addition, Wiggins et al. (2011) outlined the potential of automatic recognition techniques and filtering of outliers as a mechanism for data validation in citizen science. In the last decade, bird species identification based on acoustic signals has reached correct classification rates of up to 99.7% (Lopes et al. 2011). However, Lopes et al. (2011) also showed that the correct classification rate decreases from 95.1% to 78.2% as the number of bird species increases. In the LifeCLEF bird classification challenge 2018, a mean average precision of 82.6% was obtained for identifying foreground species in a data set of 1500 species (Goëau et al. 2018).

However, sufficient testing in the field is indispensable for assessing the performance of such automated classifiers (Russo and Voigt 2016, Rydell et al. 2017). Moreover, implementing an acoustic pattern recognition algorithm in a smartphone app introduces additional challenges such as background traffic noise, low-quality smartphone microphones and operator errors (Priyadarshani et al. 2018). Therefore, accuracies reported for relatively clean datasets do not carry over to field recordings made in practical use (Priyadarshani et al. 2018).

The smartphone app Naturblick (nature view) was initially developed to encourage young adults (aged 18 to 30) to identify species, and it combines several tools that allow users to identify animals and plants. In a second step, the app was developed further as an integrative tool for environmental education and biodiversity monitoring in citizen science (Sturm and Tscholl 2019). Naturblick enables users to record bird sounds, identify these recordings with an acoustic pattern recognition algorithm (Lasseck 2016) and share their recordings with citizen science projects, such as Forschungsfall Nachtigall (nightingale research case).

Objectives

Priyadarshani et al. (2018) conducted a review of automated birdsong recognition. They showed that two thirds of the studies are based on small datasets and are limited to carefully selected recordings. To our knowledge, acoustic pattern recognition methods are rarely evaluated in real-world scenarios; exceptions include Aide et al. (2013) and Jahn et al. (2017). In this case study we provide a first insight into the practical application of an acoustic pattern recognition tool to identify nightingale sounds in field recordings taken by citizens.

We analysed how reliably our classifier identifies nightingales (Luscinia megarhynchos) in field recordings taken by Naturblick users. First, the correctness of the classification was checked by manually validating the recordings. Second, the robustness of the classification was analysed more closely by subclassifying the validated nightingale sounds into two communicative signals: song and call.

Methodology

The smartphone app Naturblick was released at an early stage in June 2016 and has been improved continuously based on user feedback (see Sturm and Tscholl 2019). The first project phase focused on the identification of animals and plants, so there were no specific citizen science activities in the period from June 2016 to March 2018. Users recorded bird sounds individually with the aim of identifying them. These recordings were collected under the licence CC BY-SA 4.0 to serve as training material and as a general source for biodiversity research. During the investigation period from June 2016 to October 2017, Naturblick was downloaded 45477 times and a total of 55904 sound files were recorded.

The applied machine learning algorithm for automated bird identification is mainly based on template matching of spectrogram segments and a random forest ensemble learning method. First, individual bird song and call elements are extracted from the training data. For this, the grayscale spectrogram of each audio file is treated as an image, and sound elements are extracted by applying median clipping for noise reduction and various morphological operations for segmentation (Lasseck 2013). Features are created for each audio file by determining the cross-correlation of all extracted elements via template matching. During training, features are weighted and reduced step by step with a random forest classifier to find the call or song elements that best represent and identify a species (Lasseck 2014, Lasseck 2015). The algorithm was trained as a regression task using the extra-trees regressor (Geurts et al. 2006) of the scikit-learn machine learning library (Pedregosa et al. 2011). A value between zero (species not detected) and one (species detected) indicates the probability of a species being present in an audio file. This probability value can be interpreted as a confidence score (ConfS) ranging from zero to one hundred percent. Audio material of 83 bird species from the Animal Sound Archive of the Museum für Naturkunde Berlin and the collaborative online database Xeno-Canto was used for feature engineering and classifier training. A single classification run produces a ConfS for each of the 83 species. The algorithm yielded the best results in the NIPS4B Multi-label Bird Species Classification Challenge (Lasseck 2013) and in previous LifeCLEF evaluation campaigns (Lasseck 2014, Lasseck 2015).
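
The two image-based steps described above, median clipping and template matching by cross-correlation, can be illustrated with a minimal sketch. It is not the authors' implementation: the clipping factor of 3, the function names and the toy spectrogram are assumptions made for illustration, the morphological segmentation is omitted, and the resulting scores would in practice be passed as features to an extra-trees regressor rather than used directly.

```python
from math import sqrt
from statistics import median


def median_clip(spec, factor=3.0):
    """Binary mask: 1 where a cell exceeds `factor` times both its row
    median and its column median, else 0. The factor is an assumption."""
    row_med = [median(row) for row in spec]
    col_med = [median(col) for col in zip(*spec)]
    return [
        [1 if v > factor * row_med[i] and v > factor * col_med[j] else 0
         for j, v in enumerate(row)]
        for i, row in enumerate(spec)
    ]


def _pearson(a, b):
    """Normalised cross-correlation of two equally long value lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(da, db)) / denom


def max_norm_xcorr(spec, template):
    """Slide `template` over `spec` and return the best correlation found
    (-1.0 if the template does not fit into the spectrogram)."""
    th, tw = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    best = -1.0
    for i in range(len(spec) - th + 1):
        for j in range(len(spec[0]) - tw + 1):
            patch = [v for row in spec[i:i + th] for v in row[j:j + tw]]
            best = max(best, _pearson(patch, flat_t))
    return best


# Toy spectrogram: flat background with one loud 2x2 "sound element".
spec = [[1.0] * 6 for _ in range(6)]
for i in (2, 3):
    for j in (2, 3):
        spec[i][j] = 9.0

mask = median_clip(spec)                        # marks only the loud cells
template = [row[1:4] for row in spec[1:4]]      # element cut from the same file
score = max_norm_xcorr(spec, template)          # one feature value per template
```

In this sketch each extracted element yields one feature per audio file, namely its best match score, which mirrors the idea of describing a recording by how well it matches a library of song and call elements.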

The common nightingale (Luscinia megarhynchos), a common migratory bird in Berlin, was chosen as the object of investigation. In Europe, Luscinia megarhynchos sings after its arrival from around mid-April until late June (Kipper et al. 2016). Recordings that met both of the following criteria were treated as identified as Luscinia megarhynchos by the classifier and were validated manually:

  1. Luscinia megarhynchos was listed as the most probable species,
  2. the ConfS was higher than 10%.
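
The two criteria can be written as a small filter over the per-species scores of one classification run. This is a sketch under assumptions: the function name, the dictionary layout and the example scores are hypothetical, and a tie for the highest score is counted as "most probable", which the source does not specify.

```python
TARGET = "Luscinia megarhynchos"


def meets_criteria(scores, target=TARGET, threshold=10.0):
    """`scores` maps species name to ConfS in percent for one recording.

    Criterion 1: the target has the highest ConfS (ties count, by assumption).
    Criterion 2: the target's ConfS is higher than the threshold.
    """
    target_score = scores.get(target, 0.0)
    return target_score >= max(scores.values()) and target_score > threshold


# Hypothetical example scores for three recordings.
selected = meets_criteria({TARGET: 35.2, "Turdus merula": 12.0})
too_weak = meets_criteria({TARGET: 9.0, "Turdus merula": 2.0})
outranked = meets_criteria({TARGET: 30.0, "Fringilla coelebs": 41.0})
```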

The seasonality of L. megarhynchos was not included as a criterion in order to detect more false positive results. Based on their confidence scores, these recordings were divided into two groups: ConfS < 50% and ConfS > 50%. All recordings with a ConfS > 50% were validated. Recordings with a ConfS < 50% were divided into four groups (10-20%, 20-30%, 30-40%, 40-50%), and a sample of ten recordings per group was randomly selected and validated. The sample recordings were validated manually and labelled as song, song and call, or call. One person listened to all recordings and compared them with verified recordings of songs and calls from the Animal Sound Archive of the Museum für Naturkunde Berlin. Additionally, a spectrogram analysis using Raven 1.4 was conducted for recordings with high similarity to other species. For each of these recordings a spectrogram (a visual representation of the audio recording) was produced and compared to verified spectrograms from the Animal Sound Archive of the Museum für Naturkunde Berlin. The variables measured included maximum and minimum frequencies, and delta frequency. Audio recordings that were particularly difficult to distinguish were cross-checked by a second researcher.
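
The sampling scheme above, full validation above 50% and ten random recordings per 10-point group below it, can be sketched as follows. The function name, the record layout, the fixed random seed and the handling of group boundaries (lower bound exclusive, upper bound inclusive) are assumptions; the source does not state how boundary values were assigned.

```python
import random

# ConfS groups below 50% from which samples are drawn (percent).
BINS = [(10, 20), (20, 30), (30, 40), (40, 50)]


def validation_plan(confs_by_id, per_bin=10, seed=0):
    """Return (ids validated in full, {bin: sampled ids}).

    `confs_by_id` maps a recording id to its nightingale ConfS in percent.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    full = sorted(r for r, c in confs_by_id.items() if c > 50)
    sampled = {}
    for lo, hi in BINS:
        members = sorted(r for r, c in confs_by_id.items() if lo < c <= hi)
        sampled[(lo, hi)] = rng.sample(members, min(per_bin, len(members)))
    return full, sampled


# Hypothetical recordings with ConfS values 10.5, 11.0, ..., 90.0.
confs = {f"rec{i:03d}": 10.5 + 0.5 * i for i in range(160)}
full, sampled = validation_plan(confs)
```

Sampling a fixed number per group keeps the validation effort small, but it means that the groups are represented at different rates, which matters when pooling results across groups.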

Results

In total, 468 field recordings met the defined criteria (Table 1). All of these recordings were made between September 2016 and October 2017. Their duration varied from 1.9 seconds to 40.7 seconds. Most samples contained metadata on geographic coordinates, date and time of the recording. Only about 10% of the recordings (46 recordings) had a ConfS higher than 50%. The largest group, almost half of all recordings, had a confidence score between 10% and 20% (N=224).

Table 1. Number of recordings per month that met both conditions (N=468):

  1. L. megarhynchos listed as the most probable species;
  2. confidence score > 10%.

Month | Total number of recordings | Number of recordings with ConfS <50% | Number of recordings with ConfS >50%
June 2016 | 0 | 0 | 0
July 2016 | 0 | 0 | 0
August 2016 | 0 | 0 | 0
September 2016 | 2 | 2 | 0
October 2016 | 0 | 0 | 0
November 2016 | 0 | 0 | 0
December 2016 | 1 | 1 | 0
January 2017 | 5 | 5 | 0
February 2017 | 3 | 3 | 0
March 2017 | 17 | 16 | 1
April 2017 | 53 | 49 | 4
May 2017 | 208 | 199 | 9
June 2017 | 125 | 101 | 24
July 2017 | 36 | 28 | 8
August 2017 | 11 | 11 | 0
September 2017 | 5 | 5 | 0
October 2017 | 2 | 2 | 0

Of the 40 sampled recordings with a ConfS < 50%, 33 were validated as correctly classified (Fig. 1). One of the falsely identified recordings had the same ConfS (24.5%) for both the common nightingale and the common chaffinch (Fringilla coelebs) and was verified as a common chaffinch call. Nearly all of the recordings with a ConfS > 50% were also confirmed as correctly classified: only one of these 46 recordings was misidentified. Two of the falsely identified recordings were human imitations (ConfS 15% and 18.3%).
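
The counts above translate into precision values per group with simple arithmetic. The sketch below only restates the reported numbers; the function and variable names are illustrative.

```python
def precision(correct, validated):
    """Share of validated recordings that were correctly classified."""
    return correct / validated


low = precision(33, 40)             # sampled recordings with ConfS < 50%
high = precision(46 - 1, 46)        # all recordings with ConfS > 50%
pooled = precision(33 + 45, 40 + 46)

print(f"ConfS < 50%: {low:.1%}")    # 82.5%
print(f"ConfS > 50%: {high:.1%}")   # 97.8%
print(f"validated:   {pooled:.1%}") # 90.7%
```

The pooled value describes only the 86 validated recordings. Because the group with ConfS < 50% was sampled rather than validated in full, it should not be read as an estimate for all 468 selected recordings.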

Figure 1. Verified L. megarhynchos recordings in groups based on the ConfS (10-20%, N=10; 20-30%, N=10; 30-40%, N=10; 40-50%, N=10; 50-60%, N=21; 60-70%, N=11; 70-80%, N=11; 80-90%, N=3).

Twelve of the correctly identified recordings were found to be audio playbacks of files, CDs or similar sources (Table 2).

Table 2. Number of verified L. megarhynchos recordings per indicator for playbacks of audio files or similar sources. Recordings with more than one indicator are marked with an asterisk (*).

Indicator | Description | Number of recordings (ConfS)
Place and time | Species-specific plausibility of the timestamp in combination with the geographical coordinates | 5 (22.8%*, 36.7%*, 37.4%*, 46.6%*, 47.4%*)
High recording level | A high recording level indicates that a user has been holding their smartphone to a speaker that plays back a recording | 10 (17.4%*, 22.8%*, 28%*, 36.7%*, 37.4%*, 46.6%*, 46.9%*, 47.4%*, 51.5%, 52%*)
Unusual noise | Absence of natural background noise, or noise that does not fit an outdoor recording, e.g. mouse and keyboard sounds | 11 (17.4%*, 18%, 22.8%*, 28%*, 36.7%*, 37.4%*, 42.9%, 46.6%*, 46.9%*, 47.4%*, 52%*)
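
The overlap between the indicators can be checked by counting how often each recording appears in the table. The sketch assumes that each ConfS value identifies exactly one recording, which holds for the values listed but is not stated in the source; the dictionary keys are illustrative names for the three indicators.

```python
from collections import Counter

# ConfS values (percent) listed per indicator in Table 2.
indicators = {
    "place_and_time": [22.8, 36.7, 37.4, 46.6, 47.4],
    "high_recording_level": [17.4, 22.8, 28.0, 36.7, 37.4,
                             46.6, 46.9, 47.4, 51.5, 52.0],
    "unusual_noise": [17.4, 18.0, 22.8, 28.0, 36.7, 37.4,
                      42.9, 46.6, 46.9, 47.4, 52.0],
}

# Number of indicators that flagged each recording.
hits = Counter(conf for values in indicators.values() for conf in values)

n_playbacks = len(hits)                                  # distinct recordings
multiple = sorted(c for c, n in hits.items() if n > 1)   # flagged more than once
single = sorted(c for c, n in hits.items() if n == 1)    # flagged exactly once
all_three = sorted(c for c, n in hits.items() if n == 3)
```

Counting this way gives twelve distinct recordings, matching the number of playbacks reported in the text, and shows that most of them were flagged by at least two indicators.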

The majority of the verified recordings with a ConfS < 50% were identified as nightingale song (Fig. 2). Most of the recordings with a ConfS > 50% were identified as nightingale calls. All recordings labelled as song and call were found to be playbacks of audio files, CDs or similar sources.

Figure 2. Vocalization types in verified nightingale recordings, N=78.

Discussion and relevance to ongoing research

Our case study highlights the usefulness of acoustic pattern recognition for identifying animal sounds recorded with smartphones. Regarding the classification of nightingales, the performance of the automated identification tool was robust overall in our selected recordings. A large proportion of the validated recordings had been classified correctly. This is remarkable, taking into account the origin of the source audio material used in this study. We expected large variations in audio quality due to different microphones, various mobile phone vendors, general problems with urban field recordings because of background noise, and operator errors.

Interestingly, nightingale calls received high confidence scores more often than song recordings. One reason for this could be that nightingale calls are less variable than songs and are therefore recognized better. Kipper et al. (2015) found only three distinct call types and measured 27 different acoustic variables per call in their study on male song responses to either male or female call playbacks in nightingales. In contrast, male nightingale song comprises a large song type repertoire (e.g. Kipper et al. 2004). The underlying training material for the classifier could also explain much of the observed bias towards calls. Hence, further investigation is necessary.

We discovered that the audio recognition tool of the app Naturblick was commonly tested by playing back audio files or similar sources. We assume that users were curious to test the effectiveness of the audio pattern recognition. This interaction was not anticipated, and the classifier was not trained to distinguish noisy field recordings from high-quality audio material. However, none of these playback recordings received a confidence score above 60%.

We believe that, besides its general value as a species identification tool, automated pattern recognition should be recognized more widely as a mechanism for assessing the data quality of citizen science audio recordings made with smartphones. Data quality may even be improved quite simply by applying indicators to identify outliers, e.g. playbacks of audio files or similar sources (Wiggins et al. 2011).

We evaluated only the precision of the nightingale classification and not its recall rate or sensitivity. Therefore, it would be interesting to examine the nightingale recordings that were excluded from our selection. Furthermore, the study was limited by its small sample of validated recordings. To address this, we plan to evaluate the identification performance of the recognition tool with a larger sample of nightingale recordings.

Acknowledgements

We thank Silke Voigt-Heucke for her valuable suggestions and for cross-checking individual recordings. We thank Dr. Karl-Heinz Frommolt for his support.

Funding program

This publication was written as part of the project ‘Stadtnatur entdecken’, funded by the Federal Ministry for the Environment, Nature Conservation and Nuclear Safety (BMU).

References