Why Synthetic Data for Face Recognition?
Recent advancements in state-of-the-art face recognition models are driven in part by the availability of large-scale datasets and deep learning models. However, large-scale face recognition datasets, such as MS-Celeb and WebFace, were collected by crawling images from the Internet, which raises legal, ethical, and privacy concerns. To address these concerns, several recent studies have proposed generating synthetic face datasets and using synthetic face images to train face recognition models. Generating synthetic face datasets with sufficient inter-class and intra-class variation nevertheless remains an active area of research.
The Synthetic Data for Face Recognition (SDFR) Competition at the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024) invites teams to propose clever ways to use synthetic face recognition datasets (either existing or new synthetic face datasets) to train face recognition models. The competition is split into two tasks. The first task uses a predefined face recognition backbone and a limit on the dataset size, in order to focus on the quality of the synthesized face datasets. The second task gives almost complete freedom over the model backbone, the dataset, and the training. The top-performing teams in each task will be invited to contribute as co-authors of the competition paper, which will appear in the proceedings of FG 2024.
We hope this competition will accelerate research in synthetic data generation to bridge the gap between real and synthetic face datasets.
- Nov 28, 2023: The leaderboard is now available and will be updated during the competition.
- Nov 24, 2023: Submission is now open, and the submission instructions are available in Competition Details.
- Nov 16, 2023: You can join the SDFR @ FG 2024 forum now.
- Nov 16, 2023: Teams can have up to 10 members (previously 5).
- Nov 7, 2023: Registration is open.
The competition has two tasks, and each team can use existing synthetic datasets or a newly generated dataset to participate in either or both tasks:
- Task 1 [Constrained]: In this task, the generated synthetic dataset can have up to one million synthesized images (for example, 10,000 identities and 100 images per identity). The backbone is also fixed to iResNet-50.
- Task 2 [Unconstrained]: In this task, the participants can use synthetic data with no limit on the number of synthesized images. Participants are also allowed to use any network architecture and train their best model with state-of-the-art techniques, but only using synthetic data.
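As a rough illustration of the Task 1 size limit, the sketch below summarizes a dataset from a mapping of identity to image count and checks it against the one-million-image cap. The function name, the dictionary keys, and the in-memory mapping are assumptions made for this example; they are not part of the dev kit or the official rules.

```python
# Minimal sketch (hypothetical helper, not part of the official dev kit).
TASK1_MAX_IMAGES = 1_000_000  # cap stated in the Task 1 description


def summarize_dataset(images_per_identity):
    """Summarize a dataset given {identity_name: number_of_images}."""
    counts = list(images_per_identity.values())
    num_images = sum(counts)
    return {
        "num_identities": len(counts),
        "num_images": num_images,
        "min_images_per_identity": min(counts, default=0),
        "max_images_per_identity": max(counts, default=0),
        # True when the dataset fits within the Task 1 image budget.
        "within_task1_limit": num_images <= TASK1_MAX_IMAGES,
    }


# Example from the rule text: 10,000 identities with 100 images each.
example = {f"id_{i:05d}": 100 for i in range(10_000)}
summary = summarize_dataset(example)
print(summary)
```

The same counts (number of images, number of identities, images per identity) are also requested as meta-data in the reproducibility materials, so a summary of this kind can serve both purposes.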
2.1. General Rules:
- For both tasks, participants can generate new synthetic face recognition datasets and/or use (a subset, the complete set, or an extension of) existing ones as long as the datasets meet the definition of the corresponding task and are not against any other rules of the competition.
- Participants cannot use a dataset of real images with identity labels (such as WebFace260M, CASIA-WebFace, etc.) for any part of their method and training process. However, participants are allowed to use real datasets without identity labels (such as FFHQ) to train the face generator model and generate synthetic datasets. This can be extended to real datasets with identity labels if the identity labels are not used in training the face generator model.
- For the main face recognition model, which is expected to be trained exclusively on synthetic data, participants cannot use pretrained checkpoints (as initial weights) obtained from training on datasets that do not follow the rules.
- Synthetic datasets can be generated using an existing or a new face generator model (such as GAN-based or diffusion-based models) but cannot be directly generated from real face images.
- Participants can use any face generator network which is trained on any dataset without identity labels. Examples of such face generator networks are StyleGAN, EG3D, LDM, etc.
- Participants are allowed to use a pretrained face recognition model (e.g., ArcFace, AdaFace, etc.) for controlling and generating synthetic datasets. However, they are not allowed to directly learn embeddings of a pretrained face recognition model.
- Each submission should include only one face recognition model. Combining several trained models (such as an ensemble of multiple models or a fusion of features extracted by multiple models) is not allowed.
- The final submission needs to be reproducible and include source code and clear instructions for reproducibility.
- The trained face recognition model should be submitted in ONNX format. More information about submission will be available in the submission instructions.
- Submissions that do not include all items described in the submission instructions by the "Submission Deadline for Reproducibility Materials" are disqualified and will not be considered for the final evaluation.
2.2. Specific Rules to Task 1:
- Dynamic synthetic data generation during training is not allowed in task 1, and participants are required to use an already generated dataset (new and/or existing datasets) for training. This rule does not prevent data augmentation techniques during training and is only concerned with generated samples prior to data augmentation.
2.3. Specific Rules to Task 2:
- Dynamic synthetic data generation during training is allowed in task 2, and participants can control data generation during training.
- The face recognition model for task 2 should be able to calculate features for a single image on a GPU with 24 GB of graphics memory during inference. There is no limitation on the speed or complexity of the model.
- Using any new or existing network structure is allowed for task 2 as long as it meets the computation requirement for the inference in the previous rule. Participants are also allowed to use any normalization technique in the structure of their networks to improve training with synthetic data as long as they do not use real images with identity labels in training.
3. Dev Kit
You can download the dev kit here.
All the submitted models will be evaluated on several benchmarking datasets of real images (LFW, CA-LFW, CP-LFW, AgeDB, CFP-FP, IJB-B, IJB-C) and will be ranked by Borda Count of performance over these datasets. The models should be in the format defined in the submission instructions to extract features for each given image. The competition organizers will run pairwise comparisons to calculate the performance metrics for evaluation.
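To make the ranking step concrete, here is a minimal sketch of a Borda count over per-dataset results. The exact variant used by the organizers (point scheme, tie handling, metric per dataset) is not specified above, so this example assumes that a model earns one point on each dataset for every model it strictly outperforms, and that higher metric values are better. The function name, team names, and numbers are made up for illustration.

```python
from collections import defaultdict


def borda_ranking(scores):
    """Rank models by Borda count.

    scores: {dataset_name: {model_name: metric}}, where higher is better.
    Returns a list of (model_name, points), best first.
    """
    points = defaultdict(int)
    for per_model in scores.values():
        for model, value in per_model.items():
            # One point for each model strictly beaten on this dataset
            # (assumed scheme; tied models receive the same points).
            points[model] += sum(1 for other in per_model.values() if other < value)
    # Sort by points (descending), then by name for a stable order.
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))


# Illustrative, made-up accuracies on three of the benchmark datasets.
example_scores = {
    "LFW": {"team_a": 0.95, "team_b": 0.97, "team_c": 0.90},
    "AgeDB": {"team_a": 0.85, "team_b": 0.83, "team_c": 0.82},
    "CFP-FP": {"team_a": 0.88, "team_b": 0.91, "team_c": 0.91},
}
ranking = borda_ranking(example_scores)
print(ranking)  # [('team_b', 4), ('team_a', 3), ('team_c', 1)]
```

Because the ranking aggregates positions rather than raw metric values, a model that is consistently strong across all datasets can outrank one that excels on a single benchmark.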
5. Submission Instruction
5.1. Submission Platform
After completing the registration for the competition, participants can use the submission platform to make a submission. Instructions to sign up and submit models on the submission platform can be found here. The submitted models will be evaluated, and the results will appear on the leaderboard.
5.2. Submission Format
Participants are required to use the submission platform (as described above) for their submission. A complete submission should include:
- Trained face recognition model: Participants are required to submit the trained model in ONNX format. In addition, for each submission, participants are required to submit a score file for the images provided in the dev kit. For more information about the ONNX model and the score file, please check the dev kit. You can also find here a sample submission for Trained Face Recognition Models for task 1. Note: Each team can submit multiple models during the competition and see their performance on the leaderboard. However, a new submission replaces the previous submission, and the last submission will be considered the final submission.
- Reproducibility materials: For each task in which participants submit a trained face recognition model, they are also required to provide the following items for evaluation.
Note: Reproducibility materials do not affect the ranking and are used for organizational purposes only. However, submissions without reproducibility materials are considered incomplete.
Note: Participants are advised to submit the reproducibility materials for their last submitted face recognition model. Reproducibility materials are not required for other model submissions.
Note that there are also two deadlines in the competition timeline for submitting the trained face recognition models and reproducibility materials.
- Source code: Submissions need source code for data generation and training face recognition models. If participants trained a new face generator network, they need to include the source code of training their face generator model too. Source code should be commented to help reproducibility. Moreover, participants are required to provide a readme file describing instructions to use the source code.
- Face generator models: If participants used an existing face generator model, they need to identify the model and provide a link to the checkpoint. If participants trained a new face generator network, they need to provide the trained model. The trained face generator models can be in ONNX format or any other format.
- Configuration and packages: Participants are required to provide information about the configuration and packages used to train their models in their submission. This can include an environment file for submissions based on Python.
- Meta-data: Participants are required to provide information about the generated dataset used for submission, including the number of images, number of identities, and number of images per identity. In addition, they need to report the hardware used for training and a rough estimate of the training time on that hardware.
- Description of the method (PDF): Participants are required to provide a PDF file describing their method for generating synthetic face images, with citations for any existing datasets used in their submission (if any). In addition, they need to describe their training method, such as the loss function, network structure, etc. There is no limit on the number of pages, but we recommend that participants use a maximum of 2 pages.
If you wish to participate in the SDFR competition, you are required to register your team (with a maximum of 10 members) using the registration form.
While registration remains open until the final submission deadline, we recommend that interested teams register early to be notified of any updates about the competition.
- Nov 7, 2023: Website Release, Registration Open, and Call for Participation
- Nov 24, 2023: Submission Instruction Release and Submission Open
- Feb 24, 2024 (23:59 GMT): Submission Deadline for Trained Face Recognition Models
- Mar 1, 2024 (23:59 GMT): Submission Deadline for Reproducibility Materials
- Apr 20, 2024: Submission of the Competition Paper to FG 2024