FairFaceGen

FairFaceGen is a dataset of synthetic faces created to study how synthetic data generation affects the performance and bias of face recognition (FR) models.

Description

The dataset contains about 22K identities (11K + 11K). The identities are built with prompt-based generation, balanced across age, race, and gender attributes, using the Flux.1-dev and Stable Diffusion v3.5 generators. Per-identity variations are then generated with Arc2Face and with IP-Adapter variants (SD15 and SDXL backbones, with FaceID/CLIP embeddings).
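As a rough illustration of what prompt-based balanced generation can look like, the sketch below enumerates every combination of age, race, and gender and emits the same number of prompts for each combination. The attribute values, the prompt template, and the `balanced_prompts` helper are all hypothetical and chosen only for illustration; the actual categories and prompts used to build FairFaceGen may differ.

```python
from itertools import product

# Hypothetical attribute values; the dataset's real categories may differ.
AGES = ["young", "middle-aged", "elderly"]
RACES = ["Asian", "Black", "Indian", "White"]
GENDERS = ["man", "woman"]


def balanced_prompts(per_group: int) -> list[str]:
    """Return `per_group` prompts for every (age, race, gender) combination."""
    prompts = []
    for age, race, gender in product(AGES, RACES, GENDERS):
        # Hypothetical prompt template, not the one used by the authors.
        prompt = f"headshot photo, {age} {race} {gender}"
        prompts.extend([prompt] * per_group)
    return prompts


prompts = balanced_prompts(per_group=2)
print(len(prompts))  # 3 ages * 4 races * 2 genders * 2 prompts per group
```

Because each attribute combination receives the same number of prompts, a text-to-image generator driven by this list is asked for an equal number of identities per demographic group, which is the sense of "balanced" assumed here.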

Reference

@INPROCEEDINGS{fairfacegen_ijcb2025,
    title        = {Investigation of accuracy and bias in face recognition trained with synthetic data},
    author       = {Korshunov, Pavel and Kotwal, Ketan and Ecabert, Christophe and Vidit, Vidit and Mohammadi, Amir and Marcel, Sebastien},
    booktitle    = {2025 IEEE International Joint Conference on Biometrics (IJCB)},
    pages        = {1--10},
    year         = {2025},
    organization = {IEEE}
}

Link: Investigation of accuracy and bias in face recognition trained with synthetic data