Here, we present HyenaSET, a large (~1640 hours) bioacoustic dataset derived from collar-mounted audio recorders deployed on 19 spotted hyenas in the Maasai Mara National Reserve, Kenya. Within this dataset, 243 hours have been strongly labeled: the onset, offset, and type of every vocalization were annotated, and the labels were validated by experts on hyena vocalizations and behavior. Within the strongly labeled data, hyenas were vocalizing for a total of 9.5 hours (3.9%). Furthermore, each vocalization has been manually identified as "focal" (emitted by the hyena wearing the collar) or "non-focal" (emitted by a nearby conspecific), using information from collar-mounted accelerometers that picked up vibrations of the animal's throat when it vocalized. In addition to the labeled data, we provide a large corpus of unlabeled data from the same recordings, which can be used for unsupervised or self-supervised machine learning tasks. To support reproducible use of this dataset as a benchmark in machine learning studies, we present it alongside five stratified cross-validation train/test splits to enable accurate comparisons, and we also provide a train/test split in which specific individuals are left out of the training set to assess generalizability across individuals. Finally, as a performance benchmark, we present baseline results for this dataset using animal2vec, a recently developed transformer-based model optimized for bioacoustic data.
Woerner, J. M., Angonin, C., Gersick, A. S., Holekamp, K. E., Jensen, F. H., Johnson, M. P., Onsare, M. H. M., Pioon, M. O., Schäfer-Zimmermann, J., Strandburg-Peshkin, A., Strauss, E. D.
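The individual-held-out split mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' released split or code. The record layout, the individual IDs, the label names, and the function `leave_individuals_out` are all hypothetical, chosen only to show the idea that every clip from a held-out animal goes to the test set.

```python
def leave_individuals_out(clips, held_out_ids):
    """Split clips so that no held-out individual appears in the training set."""
    held_out = set(held_out_ids)
    train = [c for c in clips if c["individual"] not in held_out]
    test = [c for c in clips if c["individual"] in held_out]
    return train, test


# Hypothetical records: IDs and labels are placeholders, not values from HyenaSET.
clips = [
    {"individual": "hyena_A", "label": "type_1"},
    {"individual": "hyena_A", "label": "type_2"},
    {"individual": "hyena_B", "label": "type_1"},
    {"individual": "hyena_C", "label": "type_2"},
    {"individual": "hyena_C", "label": "type_1"},
]

train, test = leave_individuals_out(clips, ["hyena_C"])

# The two sets share no individuals, so test performance reflects
# generalization to animals the model has never heard during training.
train_ids = {c["individual"] for c in train}
test_ids = {c["individual"] for c in test}
```

Splitting by individual rather than by clip avoids leaking an animal's voice characteristics from training into evaluation, which is the concern this kind of split addresses.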
