Knowledge Distillation for Face Recognition using Synthetic Data with Dynamic Latent Sampling

1Idiap Research Institute, 2EPFL, 3UNIL

🏆 Winner of CVPR 2024 FRCSyn Challenge 🏆

Summary

State-of-the-art face recognition models are computationally expensive for mobile applications. Training lightweight face recognition models also requires large identity-labeled datasets, raising privacy and ethical concerns. Generating synthetic datasets for training is likewise challenging, and there is a significant performance gap between models trained on real and synthetic face datasets. We propose a new framework (called SynthDistill) to train lightweight face recognition models by distilling the knowledge of a pretrained teacher model using synthetic data. We generate synthetic face images without identity labels, mitigating the problem of generating intra-class variations in synthetic datasets, and dynamically sample from the intermediate latent space of a face generator network to create new variations of challenging images while still exploring new faces. Results on several real face recognition benchmark datasets demonstrate the superiority of SynthDistill over training on previous synthetic datasets, achieving a verification accuracy of 99.52% on the LFW dataset with a lightweight network. The results also show that SynthDistill significantly narrows the gap between training on real and synthetic data. The source code of our experiments is publicly available to facilitate the reproducibility of our work.

SynthDistill

One strategy to train lightweight and efficient face recognition networks is to train them directly on large-scale face recognition datasets. However, large-scale face recognition datasets, such as MS-Celeb-1M, were collected by crawling images from the Internet, raising legal, ethical, and privacy concerns. To address these concerns, several recent works proposed generating synthetic face datasets and using the synthetic images to train face recognition models. However, generating synthetic face datasets with sufficient inter-class and intra-class variation remains a challenging problem. Another strategy for training a lightweight face recognition model is to transfer the knowledge of a model trained on a large dataset to a lightweight network through knowledge distillation. Nevertheless, knowledge distillation from a teacher model typically requires access to the original or another large-scale real dataset, which raises the same challenges. In this work, we propose a new framework, named SynthDistill, to distill the knowledge of a pretrained teacher using synthetic face images without identity labels, thus eliminating the need for real identity-labeled data during distillation. We propose dynamic sampling from the intermediate latent space of a StyleGAN to generate new images and enhance training. Hence, our proposed knowledge distillation framework does not require any real face images during training.
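Because the synthetic images carry no identity labels, the distillation objective operates directly on embeddings: the student is pulled toward the frozen teacher's output for each generated face. As a minimal sketch (the exact loss used in the paper may differ), a common embedding-level choice is one minus the cosine similarity of L2-normalised embeddings:

```python
import numpy as np

def distillation_loss(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    """Embedding-level distillation loss: push the student's face embeddings
    toward the frozen teacher's. This illustrative version averages
    (1 - cosine similarity) over the batch of L2-normalised embeddings."""
    s = student_emb / np.linalg.norm(student_emb, axis=-1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))
```

The loss is zero when the student reproduces the teacher's embeddings exactly (up to scale) and approaches two when they point in opposite directions, so no identity labels are needed anywhere in the objective.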

Block diagram of the SynthDistill framework

Block diagram of our proposed knowledge distillation framework (SynthDistill): In step 1, Z space of the StyleGAN is sampled to generate face images. In step 2, the W space is re-sampled based on the teacher-student agreement to generate more challenging samples. The student model is updated based on the distillation loss, and all the other network blocks remain frozen.

We generate synthetic images online and train the lightweight network alongside the image generation in a loop, within a knowledge-distillation-based framework. We use StyleGAN as a pretrained face generator network and deploy a dynamic sampling approach that generates synthetic face images through a feedback mechanism during training. The generated face images are used to train the lightweight network as a student within our knowledge distillation framework. Based on the teacher-student agreement, we dynamically re-sample from the intermediate latent space of StyleGAN: for samples with low similarity between the teacher and student embeddings, we re-sample similar latent codes (to help learn difficult samples), whereas for samples with high similarity we re-sample different latent codes (to aid generalisation), enabling more robust training. Compared to previous works that train face recognition models on synthetic datasets, our proposed knowledge distillation framework does not require identity labels during training, simplifying the process of generating synthetic face images.
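The re-sampling rule above can be sketched as follows. This is an illustrative implementation, not the paper's exact code: `mapping` stands in for StyleGAN's mapping network (Z to W), and the similarity threshold and perturbation scale are assumed values chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample_latents(w, similarity, mapping, threshold=0.5, sigma=0.1):
    """Re-sample StyleGAN intermediate latents based on teacher-student agreement.

    Hard samples (cosine similarity below `threshold`) get a small Gaussian
    perturbation of their current W latent, producing similar but still
    challenging faces; easy samples get a fresh Z latent mapped to W,
    exploring new face images. `mapping` is a stand-in for StyleGAN's
    mapping network; `threshold` and `sigma` are illustrative values.
    """
    hard = similarity < threshold                        # (batch,) boolean mask
    w_similar = w + sigma * rng.standard_normal(w.shape)  # nearby latents for hard samples
    z_fresh = rng.standard_normal(w.shape)                # fresh Z draws
    w_fresh = mapping(z_fresh)                            # map Z -> W for easy samples
    return np.where(hard[:, None], w_similar, w_fresh)
```

In each training iteration, the re-sampled latents are fed back into the frozen generator to synthesize the next batch, closing the feedback loop between student performance and data generation.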

Resampling in SynthDistill

Schematic showing the re-sampling strategy in the proposed approach. When the teacher-student agreement is high, the re-sampling method generates diverse images. Conversely, when the similarity is low, i.e., when the given sample is challenging, re-sampling generates similar (challenging) samples, facilitating learning.


🏆 Winner of CVPR 2024 FRCSyn Challenge

Given the effectiveness of our method, SynthDistill achieved first rank in sub-task 2.2 (training face recognition with unlimited synthetic data) of the FRCSyn challenge at CVPR 2024:

CVPR 2024 FRCSyn Challenge Certificate

CVPR 2024 FRCSyn Challenge Certificate for SynthDistill as the first (winner) solution.

Reproducibility: Source Code

The source code and pretrained models are available in the following GitLab repository.

BibTeX


@article{access2024synthdistill,
  title={Knowledge Distillation for Face Recognition using Synthetic Data with Dynamic Latent Sampling},
  author={Shahreza, Hatef Otroshi and George, Anjith and Marcel, S{\'e}bastien},
  journal={IEEE Access},
  year={2024},
  publisher={IEEE}
}

@inproceedings{ijcb2023synthdistill,
  title={SynthDistill: Face recognition with knowledge distillation from synthetic data},
  author={Shahreza, Hatef Otroshi and George, Anjith and Marcel, S{\'e}bastien},
  booktitle={2023 IEEE International Joint Conference on Biometrics (IJCB)},
  pages={1--10},
  year={2023},
  organization={IEEE}
}