InfantMarmosetsVox


Description

InfantMarmosetsVox is a dataset for multi-class call-type and caller identification. It contains audio recordings of ten individual marmosets together with their call-types. The dataset comprises a total of 350 precisely labelled 10-minute audio recordings spanning all caller classes. The audio was recorded from five pairs of infant marmoset twins, each marmoset recorded individually in one of two separate sound-proofed recording rooms at a sampling rate of 44.1 kHz. The start time, end time, call-type, and marmoset identity of each vocalization are provided, labelled by an experienced researcher.
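
As a rough illustration of how the labels relate to the audio, the following Python sketch cuts one labelled vocalization out of a recording. It is only an example: the annotation file name, its column names (file, start, end, call_type, caller_id), and the WAV file layout are assumptions, not the dataset's actual structure.

import pandas as pd
import soundfile as sf

# Hypothetical annotation table with one row per vocalization
# (assumed columns: file, start, end, call_type, caller_id; times in seconds).
labels = pd.read_csv("infantmarmosetsvox_labels.csv")

row = labels.iloc[0]
audio, sr = sf.read(row["file"])  # one 10-minute recording at 44.1 kHz
segment = audio[int(row["start"] * sr):int(row["end"] * sr)]
print(row["call_type"], row["caller_id"], segment.shape)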

 

References

This dataset was collected and partially used for the paper "Automatic detection and classification of marmoset vocalizations using deep and recurrent neural networks" by Zhang et al.

It is also used for the experiments in the paper "Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?" by E. Sarkar and M. Magimai-Doss.

The source code of a PyTorch DataLoader that reads this data is available at https://github.com/idiap/ssl-caller-detection.
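
For orientation, a caller-identification dataset class might look roughly like the sketch below. This is a generic PyTorch example, not the interface of the linked repository; the segment list format is an assumption.

import soundfile as sf
import torch
from torch.utils.data import Dataset

class MarmosetSegments(Dataset):
    # segments: list of (wav_path, start_sec, end_sec, caller_id) tuples (assumed format).
    def __init__(self, segments):
        self.segments = segments

    def __len__(self):
        return len(self.segments)

    def __getitem__(self, idx):
        path, start, end, caller = self.segments[idx]
        sr = sf.info(path).samplerate
        audio, _ = sf.read(path, start=int(start * sr), stop=int(end * sr), dtype="float32")
        return torch.from_numpy(audio), caller

# In practice the segment list would be built from the provided annotations, and a
# torch DataLoader with a padding collate_fn would handle the variable-length clips.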

 

Citation

Any publication (e.g., conference paper, journal article, technical report, book chapter, etc.) resulting from the use of InfantMarmosetsVox must cite the following publication:

Sarkar, E., Magimai.-Doss, M. (2023) Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers? Proc. INTERSPEECH 2023, 1189-1193, doi: 10.21437/Interspeech.2023-1968

Bibtex:

@inproceedings{sarkar23_interspeech,
  author={Eklavya Sarkar and Mathew Magimai.-Doss},
  title={{Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={1189--1193},
  doi={10.21437/Interspeech.2023-1968}
}