VoxCeleb Dataset

Dataset Description

VoxCeleb is a collection of voice recording of celebrities extracted from various Youtube videos. It contains:

Identities

Sample count

train

1211

148642

dev / eval

references

40

4874

probes

37720

The dev and eval sets are a copy of each other for this protocol. The following results will then only show the development set.

GMM

To run the baseline, use the following command:

$ bob bio pipeline simple -d voxceleb gmm-mobio -l sge-demanding -o results/gmm_voxceleb -n 512

Then, to generate the scores, use:

$ bob bio metrics -e ./results/gmm_voxceleb/scores-dev.csv
Table 13 [Min. criterion: EER ] Threshold on Development set: 1.062216e-01

Development

Failure to Acquire

0.0%

False Match Rate

18.8% (3538/18860)

False Non Match Rate

18.8% (3538/18860)

False Accept Rate

18.8%

False Reject Rate

18.8%

Half Total Error Rate

18.8%

On 1281 CPU nodes on the SGE Grid: Ran in 10 hours.

ISV

TODO

Speechbrain ECAPA-TDNN

This baseline reproduces the speaker verification experiment with a pretrained ECAPA-TDNN model using the SpeechBrain library. The original paper’s reference is the following:

@inproceedings{spear,
  author = {Brecht Desplanques, Jenthe Thienpondt and Kris Demuynck},
  title = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation in {TDNN} Based Speaker Verification},
  booktitle = {Interspeech 2020},
  year = {2020},
  url = {https://www.isca-speech.org/archive_v0/Interspeech_2020/pdfs/2650.pdf},
}

To run the baseline, use the following command:

$ bob bio pipeline simple -vvv -d voxceleb -p speechbrain-ecapa-voxceleb -g dev -o ./results/speechbrain_voxceleb

Then, to generate the scores, use:

$ bob bio metrics -e ./results/speechbrain_voxceleb/scores-dev.csv
Table 14 [Min. criterion: EER] Threshold on Development set: -6.159925e-01

Development

Failure to Acquire

0.0%

False Match Rate

1.0% (189/18860)

False Non Match Rate

1.0% (189/18860)

False Accept Rate

1.0%

False Reject Rate

1.0%

Half Total Error Rate

1.0%

On 1281 CPU nodes on the SGE Grid: Ran in 9 minutes (no training).

Note

ECAPA-TDNN gives a reference result of 0.8% EER on VoxCeleb. However, they were using a customized version of the dataset (VoxCeleb (cleaned)) which ignores 109 probe files (presumably containing wrong data) from our own dataset.

Footnotes

1(1,2)

The number of nodes is a requested maximum amount and can vary depending on the number of jobs currently running on the grid as well as the scheduler’s load estimation. The execution time can then also vary.