Databases used to train models¶

Several face recognition databases are released by year by the community interested in this research. Among them, some are claimed to be large scale, with huge amounts of identities and samples. Such databases are very suitable to train “robust” models, mostly neural networks.

Here we focus in 3 databases.

Casia Webface¶

This database contains 494,414 face images of 10,575 identities.

Todo

Provide some statistics about this dataset (e.g. number of samples per identity)

MSCeleba¶

The MS-Celeb 1M has around 10M images of 100K identities. This dataset has several issues with mislabeling. We implemented a prunning algorithm using the DBScan clustering.

Todo

Provide some statistics about this dataset (e.g. number of samples per identity)

MegaFace 2¶

The MegaFace 2 has arond 4M samples of 672K identities.

Todo

Provide some statistics about this dataset (e.g. number of samples per identity)