GaFaR: Geometry-aware Face Reconstruction

1Idiap Research Institute, 2EPFL, 3UNIL
sample reconstructed face image

Sample face images from the FFHQ dataset (first row) together with their corresponding 3D (third row) and frontal 2D reconstructions (second row) from facial templates in a whitebox template inversion attack against ArcFace. Values show the cosine similarity between the templates of the original and the frontal reconstructed face images.

Summary

Face recognition systems are increasingly being used in different applications. In such systems, features (also known as embeddings or templates) are extracted from each face image. The extracted templates are stored in the system's database during the enrollment stage and are later used for recognition. In this project, we comprehensively evaluate the vulnerability of state-of-the-art face recognition systems to template inversion attacks using 3D face reconstruction. We propose a new method (called GaFaR) to reconstruct 3D faces from facial templates using a pretrained geometry-aware face generation network, and train a mapping from facial templates to the intermediate latent space of the face generator network. We train our mapping with a semi-supervised approach using real and synthetic face images. For real face images, we use a generative adversarial network (GAN)-based framework to learn the distribution of the generator's intermediate latent space. For synthetic face images, we directly learn the mapping from facial templates to the corresponding intermediate latent codes. Furthermore, to improve the attack success rate, we use two optimization methods on the camera parameters of the GNeRF model. We evaluate our method in whitebox and blackbox attacks against face recognition systems and compare the transferability of our attack with that of state-of-the-art methods across other face recognition systems on the MOBIO and LFW datasets. We also perform practical presentation attacks on face recognition systems using digital screen replay and printed photographs, and evaluate the vulnerability of face recognition systems to different template inversion attacks. To our knowledge, this is the first work on 3D face reconstruction from facial templates.
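
The semi-supervised training described above can be sketched roughly as follows. This is a minimal illustration, not the released implementation: the module and function names (MappingNetwork, LatentDiscriminator, training_step) are placeholders, and the losses are simplified (a WGAN-style adversarial term stands in for the full GAN objective on real-image templates, and a plain regression loss for the supervised branch on synthetic-image templates).

import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Maps a facial template to the intermediate latent space W of the generator."""
    def __init__(self, template_dim=512, w_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(template_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, w_dim),
        )

    def forward(self, template):
        return self.net(template)

class LatentDiscriminator(nn.Module):
    """Distinguishes mapped latent codes from codes sampled in the generator's own W space."""
    def __init__(self, w_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(w_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, w):
        return self.net(w)

def training_step(mapping, discriminator, real_templates, synth_templates, synth_w):
    """One simplified training step of the mapping network.

    Unsupervised branch: latents mapped from real-image templates are pushed,
    via an adversarial loss, toward the distribution of the generator's W space.
    Supervised branch: synthetic images come with known W codes, so the mapping
    is also trained by direct regression.
    """
    # Unsupervised (GAN-based) branch on real-image templates.
    w_fake = mapping(real_templates)
    adv_loss = -discriminator(w_fake).mean()          # WGAN-style generator loss (illustrative)

    # Supervised branch on synthetic-image templates with known W codes.
    w_pred = mapping(synth_templates)
    sup_loss = nn.functional.mse_loss(w_pred, synth_w)

    return adv_loss + sup_loss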
General block diagram

General block diagram of the proposed method: we train a mapping network from facial templates (input) to the intermediate latent space W of the GNeRF model. The mapped latent codes, along with camera parameters, are fed to the frozen GNeRF generator and renderer network to generate a face image from the desired view.
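
The inference path of this block diagram can be illustrated with the minimal sketch below. The gnerf.render(w, camera) interface is an assumed placeholder for the pretrained geometry-aware generator and renderer; only the mapping network is the trained component.

import torch

@torch.no_grad()
def reconstruct_face(template, mapping, gnerf, camera_params):
    """Map a leaked facial template to a W code, then render a face image
    from the requested viewpoint with the frozen GNeRF generator/renderer."""
    w = mapping(template)                     # template -> intermediate latent code in W
    image = gnerf.render(w, camera_params)    # frozen generator + neural renderer
    return image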

Proposed Face Reconstruction Method (GaFaR)

Block diagram of our proposed template inversion attack: during training, a semi-supervised approach is used to learn our mapping (illustrated as a green block) from facial templates to the intermediate latent space of the GNeRF model. We use real and synthetic training data simultaneously for unsupervised and supervised learning, respectively. In the inference stage, the leaked template is fed into our mapping network to find the corresponding vector in the intermediate latent space of the GNeRF model. Then, camera parameters along with the generated latent code are given to the generator and renderer of GNeRF to produce a reconstructed face image. To enhance the attack, we propose an optimization (grid search or continuous optimization; see the sketch after the figure below) over two of the camera parameters to find the pose that minimizes the distance between the template of the reconstructed face image and the leaked template.

Block diagram of GaFaR
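
The grid-search variant of the camera-parameter optimization can be sketched as follows. The gnerf.render and face_recognizer interfaces, as well as the yaw/pitch ranges, are illustrative assumptions rather than the exact released settings; the continuous-optimization variant would instead update the two pose parameters by gradient descent on the same similarity objective.

import itertools
import torch

@torch.no_grad()
def grid_search_pose(leaked_template, w, gnerf, face_recognizer,
                     yaws=torch.linspace(-0.3, 0.3, 7),
                     pitches=torch.linspace(-0.2, 0.2, 5)):
    """Render the reconstructed face over a grid of yaw/pitch angles and keep the
    pose whose template is most similar to the leaked template."""
    best_img, best_score = None, -float("inf")
    for yaw, pitch in itertools.product(yaws, pitches):
        img = gnerf.render(w, camera={"yaw": yaw.item(), "pitch": pitch.item()})
        t = face_recognizer(img)
        # Cosine similarity between the leaked template and the rendered image's template.
        score = torch.nn.functional.cosine_similarity(t, leaked_template, dim=-1).item()
        if score > best_score:
            best_img, best_score = img, score
    return best_img, best_score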

Evaluation

For evaluation, we use the reconstructed face images and inject them into the target face recognition system. In our IEEE TPAMI paper, we also perform practical presentation attacks using the reconstructed face images. The block diagram of our evaluation scenario is depicted in the following figure:

Evaluation block diagram
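
As an illustration of this injection scenario, the hedged sketch below counts an attack as successful when the similarity between the template of a reconstructed image and the enrolled template of the target identity exceeds the target system's decision threshold. The function names and interfaces are hypothetical, not the released evaluation code.

import torch

@torch.no_grad()
def attack_success_rate(reconstructed_images, enrolled_templates,
                        target_face_recognizer, threshold):
    """Fraction of reconstructed face images accepted by the target system."""
    hits = 0
    for img, enrolled in zip(reconstructed_images, enrolled_templates):
        probe = target_face_recognizer(img)   # template of the injected image
        score = torch.nn.functional.cosine_similarity(probe, enrolled, dim=-1)
        hits += int(score.item() >= threshold)
    return hits / len(reconstructed_images)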

Presentation Attack using Reconstructed Face Images

We performed practical presentation attacks using the reconstructed face images. The following figure shows our evaluation setup for performing different types of presentations and capturing them with mobile devices: (a) replay attack using an Apple iPad Pro, and (b) presentation attack using a printed photograph.

PA setup

We considered three different mobile devices, namely an Apple iPhone 12, a Xiaomi Redmi 9A, and a Samsung Galaxy S9, as the camera of the target face recognition system and captured images of the presentations. The captured images from our presentation attacks are publicly available.


Reproducibility: Source Code and Data

BibTeX


  @article{tpami2023ti3d,
    author    = {Hatef Otroshi Shahreza and S{\'e}bastien Marcel},
    title     = {Comprehensive Vulnerability Evaluation of Face Recognition Systems to Template Inversion Attacks Via 3D Face Reconstruction},
    journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
    year      = {2023},
    volume    = {45},
    number    = {12},
    pages     = {14248--14265},
    doi       = {10.1109/TPAMI.2023.3312123}
  }

  @inproceedings{iccv2023ti3d,
    author    = {Hatef Otroshi Shahreza and S{\'e}bastien Marcel},
    title     = {Template Inversion Attack against Face Recognition Systems using 3D Face Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    pages     = {19662--19672},
    month     = {October},
    year      = {2023}
  }