VoxCeleb Dataset

Dataset Description

VoxCeleb is a collection of voice recordings of celebrities extracted from various YouTube videos. It contains:

Set	Subset	Identities	Sample count
train		1211	148642
dev / eval	references	40	4874
	probes		37720

The dev and eval sets are copies of each other in this protocol, so the results below are reported on the development set only.

GMM

To run the baseline, use the following command:

$ bob bio pipeline simple -d voxceleb gmm-mobio -l sge-demanding -o results/gmm_voxceleb -n 512

Then, to compute the metrics from the generated scores, use:

$ bob bio metrics -e ./results/gmm_voxceleb/scores-dev.csv

Table 13 [Min. criterion: EER] Threshold on Development set: 1.062216e-01
	Development
Failure to Acquire	0.0%
False Match Rate	18.8% (3538/18860)
False Non Match Rate	18.8% (3538/18860)
False Accept Rate	18.8%
False Reject Rate	18.8%
Half Total Error Rate	18.8%
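
The percentages in the table follow directly from the counts in parentheses. The snippet below is a minimal sketch of that arithmetic, not the code `bob bio metrics` runs; the variable and function names are illustrative. It takes the Half Total Error Rate to be the mean of the two error rates, which is consistent with the values reported here.

```python
# Counts copied from the GMM table (development set).
false_matches, impostor_comparisons = 3538, 18860
false_non_matches, genuine_comparisons = 3538, 18860


def rate(errors: int, total: int) -> float:
    """Return an error rate as a fraction of the comparisons made."""
    return errors / total


fmr = rate(false_matches, impostor_comparisons)        # False Match Rate
fnmr = rate(false_non_matches, genuine_comparisons)    # False Non Match Rate
hter = (fmr + fnmr) / 2                                # Half Total Error Rate

print(f"FMR  = {fmr:.1%}")
print(f"FNMR = {fnmr:.1%}")
print(f"HTER = {hter:.1%}")
```

The two denominators add up to 37720, the number of probe samples listed in the dataset description.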

On 128 [1] CPU nodes on the SGE Grid: ran in 10 hours.

ISV

TODO

SpeechBrain ECAPA-TDNN

This baseline reproduces the speaker verification experiment with a pretrained ECAPA-TDNN model using the SpeechBrain library. The original paper’s reference is the following:

@inproceedings{spear,
  author = {Brecht Desplanques and Jenthe Thienpondt and Kris Demuynck},
  title = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation in {TDNN} Based Speaker Verification},
  booktitle = {Interspeech 2020},
  year = {2020},
  url = {https://www.isca-speech.org/archive_v0/Interspeech_2020/pdfs/2650.pdf},
}

To run the baseline, use the following command:

$ bob bio pipeline simple -vvv -d voxceleb -p speechbrain-ecapa-voxceleb -g dev -o ./results/speechbrain_voxceleb

Then, to compute the metrics from the generated scores, use:

$ bob bio metrics -e ./results/speechbrain_voxceleb/scores-dev.csv

Table 14 [Min. criterion: EER] Threshold on Development set: -6.159925e-01
	Development
Failure to Acquire	0.0%
False Match Rate	1.0% (189/18860)
False Non Match Rate	1.0% (189/18860)
False Accept Rate	1.0%
False Reject Rate	1.0%
Half Total Error Rate	1.0%
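
The "Min. criterion: EER" in the table captions means the decision threshold is placed where the False Match Rate and the False Non Match Rate are as close to equal as possible. The following is a minimal sketch of that idea on made-up scores, not Bob's implementation. The function names, the synthetic scores, and the convention that a score at or above the threshold counts as a match are all assumptions made for illustration.

```python
def error_rates(impostor, genuine, threshold):
    """Return (FMR, FNMR), assuming scores >= threshold are accepted."""
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr


def eer_threshold(impostor, genuine):
    """Return the observed score that minimises |FMR - FNMR|."""
    def gap(threshold):
        fmr, fnmr = error_rates(impostor, genuine, threshold)
        return abs(fmr - fnmr)

    # Only the observed scores are tried as candidate thresholds.
    candidates = sorted(set(impostor) | set(genuine))
    return min(candidates, key=gap)


# Synthetic scores, for illustration only (not VoxCeleb results).
impostor_scores = [-2.0, -1.5, -1.0, -0.5, 0.2]
genuine_scores = [-0.8, 0.1, 0.5, 1.0, 1.5]

threshold = eer_threshold(impostor_scores, genuine_scores)
fmr, fnmr = error_rates(impostor_scores, genuine_scores, threshold)
print(f"threshold={threshold}, FMR={fmr:.1%}, FNMR={fnmr:.1%}")
```

On real score files the two rates rarely match exactly, so the chosen threshold is the one with the smallest gap between them.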

On 128 [1] CPU nodes on the SGE Grid: ran in 9 minutes (no training).

Note

The reference result for ECAPA-TDNN is 0.8% EER on VoxCeleb. However, that result was obtained on a customized version of the dataset (VoxCeleb (cleaned)), which omits 109 probe files (presumably containing wrong data) that are present in the dataset used here.

Footnotes

[1] The number of nodes is a requested maximum and can vary depending on the number of jobs currently running on the grid and on the scheduler’s load estimation. The execution time can therefore vary as well.