VFPAD

in-Vehicle Face Presentation Attack Detection

Description

The in-Vehicle Face Presentation Attack Detection (VFPAD) dataset consists of bona-fide and 2D/3D attack presentations acquired for a subject (real or fake) in the driver's seat of a car. These presentations have been captured using an NIR camera (940 nm) placed on the steering wheel of the car, while NIR illuminators have been fixed on both front pillars (adjacent to the windshield). The bona-fide videos represent 24 male and 16 female subjects of various ethnicities. The PAI species used to construct this dataset include photo prints, digital displays (for replay attacks), rigid 3D masks, and flexible 3D masks made of silicone.

 

Data Collection

The videos comprising this dataset represent bona-fide and attack presentations under a range of variations:

  • Environmental variations: presentations have been recorded in four sessions, each under different environmental conditions (outdoor sunny, outdoor cloudy, indoor dimly lit, and indoor brightly lit).
  • Different scenarios: bona-fide presentations for each subject have been captured with a variety of appearances: with/without glasses, with/without hat, etc.
  • Illumination variations: two illumination conditions have been used: ‘uniform’ (both NIR illuminators switched on) and ‘non-uniform’ (only the left NIR illuminator switched on).
  • Pose variations: two poses (‘angles’) have been used: ‘front’, where the subject looks ahead at the road, and ‘below’, where the subject looks straight into the camera.

As Figure 1 shows, the camera is placed on the steering column, looking up at the subject’s face.

 

 

Structure of the Dataset

Each presentation is recorded in a separate file in HDF5 format. The HDF5 files have the following internal structure:

/stream_0
/stream_0/recording_0
/stream_0/recording_1

The subdirectory recording_0 contains several frames that may be used for illumination calibration. These frames represent a video, approximately 2 seconds long, captured without the 940 nm NIR illumination; they therefore capture only the ambient light.

The subdirectory recording_1 contains frames of a 10-second-long video, captured with the appropriate NIR illuminators switched on. These are the frames used for PAD experiments.

 

 

Overall Statistics

Presentation type      Number of videos
bona-fide              4046
PA                     1790
Total                  5836

 

The dataset is divided into two folders: bf and pa. Each folder consists of sub-folders, one per client (real subject or PAI), and all recordings for a given client are stored in the corresponding sub-folder. Each presentation is stored as an HDF5 file whose filename encodes information about the type of presentation recorded. The filename has the following format:

<presentation-type>_<session-id>_<angle-id>_<illumination-id>_<client-id>_<presenter-id>_<type-id>_<sub-category-id>_<pai-id>_<trial-id>.hdf5

 

The description for each field is provided below:

1. presentation-type (2 chars): bf or pa, indicating whether the sample is bona-fide or a PA.
2. session-id (2 digits): 01, 02, 03, or 04, indicating the session (S1, S2, S3, or S4, respectively) in which the data was captured.
3. angle-id (1 digit): 1 or 2, indicating the angle between camera and face (below: 1; front: 2).
4. illumination-id (1 digit): 1 or 2, indicating the light distribution over the face (non-uniform: 1; uniform: 2).
5. client-id (4 digits): the identity assigned to the bona-fide subject or to the PAI in front of the camera. For bona-fide subjects, arbitrary numerical identities from 0001 to 0040 have been used; for PAIs, arbitrary strings have been used to create an identity for each PAI.
6. presenter-id (4 digits): redundant in the present version of the dataset. Indicates who is presenting the face (real or fake) to the camera; this is 0000 for every bf file and 0001 for every pa file.
7. type-id (2 digits): 00, 01, 02, 03, or 04, indicating the main category of presentation: bona-fide, 2D print attack, 2D replay attack, 3D silicone mask, or 3D rigid mask, respectively.
8. sub-category-id (2 digits): the sub-category of the main category indicated by type-id. See Table 2 for explanations of sub-category-id for the various type-id values.
9. pai-id (3 digits): a unique number given to each presentation attack instrument; for bona-fide presentations this number is always 000.
10. trial-id (8 digits): an arbitrary numeric string that distinguishes separate captures of the same presentation under exactly the same recording scenario.
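As an illustration of the naming convention above, the ten underscore-separated fields can be recovered with a few lines of Python. This helper (and the example filename it parses) is a sketch for convenience, not part of the dataset's own tooling:

```python
import os

# Field names in the order they appear in the filename (see the field list above).
FIELDS = [
    "presentation-type", "session-id", "angle-id", "illumination-id",
    "client-id", "presenter-id", "type-id", "sub-category-id",
    "pai-id", "trial-id",
]

def parse_vfpad_filename(path):
    """Split a VFPAD filename into its ten underscore-separated fields."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))

# Hypothetical example filename following the documented convention:
info = parse_vfpad_filename("bf_01_2_2_0007_0000_00_03_000_00000001.hdf5")
print(info["presentation-type"])  # bf
print(info["client-id"])          # 0007
```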

 

The details of sub-category-id are provided in the table below:

Type ID                  Sub-Category ID   Description
00 (bona-fide)           00                Natural (no glasses or hat)
                         01                Medical glasses (wherever applicable)
                         02                Clear glasses
                         03                Sunglasses
                         04                Hat (no glasses)
                         05                Hat + clear glasses
                         06                Hat + sunglasses
01 (Print)               01                Matte on laser printer
                         02                Glossy on laser printer
                         03                Matte on inkjet printer
                         04                Glossy on inkjet printer
02 (Replay attack)       00                (single sub-category)
03 (3D silicone masks)   00                Generic flexible mask (G-Flex-3D-Mask)
                         01                Custom flexible mask (C-Flex-3D-Mask)
04 (3D rigid masks)      00                Custom rigid mask 1
                         02                Custom rigid mask 2
                         03                Custom rigid mask 3
                         04                Custom rigid mask 4
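The type-id and sub-category-id tables can be transcribed into a small lookup, which may be handy when labeling parsed filenames. This is a sketch, not an official mapping; note that type 02 has a single sub-category with no separate description in the source, so the category name is reused there:

```python
# Main presentation categories keyed by type-id (from the dataset description).
TYPE_ID = {
    "00": "bona-fide",
    "01": "2D print attack",
    "02": "2D replay attack",
    "03": "3D silicone mask",
    "04": "3D rigid mask",
}

# Sub-categories per (type-id, sub-category-id), transcribed from the table above.
SUB_CATEGORY = {
    ("00", "00"): "Natural (no glasses or hat)",
    ("00", "01"): "Medical glasses (wherever applicable)",
    ("00", "02"): "Clear glasses",
    ("00", "03"): "Sunglasses",
    ("00", "04"): "Hat (no glasses)",
    ("00", "05"): "Hat + clear glasses",
    ("00", "06"): "Hat + sunglasses",
    ("01", "01"): "Matte on laser printer",
    ("01", "02"): "Glossy on laser printer",
    ("01", "03"): "Matte on inkjet printer",
    ("01", "04"): "Glossy on inkjet printer",
    ("02", "00"): "2D replay attack",  # no separate description in the source
    ("03", "00"): "Generic flexible mask (G-Flex-3D-Mask)",
    ("03", "01"): "Custom flexible mask (C-Flex-3D-Mask)",
    ("04", "00"): "Custom rigid mask 1",
    ("04", "02"): "Custom rigid mask 2",
    ("04", "03"): "Custom rigid mask 3",
    ("04", "04"): "Custom rigid mask 4",
}

def describe(type_id, sub_id):
    """Human-readable description for a (type-id, sub-category-id) pair."""
    return TYPE_ID[type_id], SUB_CATEGORY.get((type_id, sub_id), "unknown")

print(describe("03", "01"))  # ('3D silicone mask', 'Custom flexible mask (C-Flex-3D-Mask)')
```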

 

Experimental Protocol

The reference publication considers the experimental protocol named grandtest. For frame-level evaluation, 20 frames from each video have been used, except for print attacks: since the VFPAD dataset contains relatively few print-attack videos, the grandtest protocol uses 80 frames per print-attack video to provide a fair representation of print attacks during experimentation. For the grandtest protocol, the videos were divided into fixed, disjoint groups: train, dev, and eval. Each group consists of a unique subset of subjects (subjects of one group are not present in the other two).

Details of the grandtest protocol are summarized below:

Partition          #Videos   Split ratio (%)
train bona-fide    1503      37.15
train PA            595      33.24
dev bona-fide      1247      30.82
dev PA              666      37.20
eval bona-fide     1296      32.03
eval PA             529      29.56
Total              5836
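The split ratios in the table are each partition's share of its class total (4046 bona-fide and 1790 PA videos). A quick arithmetic check:

```python
# Video counts per partition, taken from the grandtest protocol table.
bona_fide = {"train": 1503, "dev": 1247, "eval": 1296}
pa = {"train": 595, "dev": 666, "eval": 529}

def ratios(counts):
    """Each partition's percentage of the class total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 2) for k, v in counts.items()}

print(ratios(bona_fide))  # {'train': 37.15, 'dev': 30.82, 'eval': 32.03}
print(ratios(pa))         # table values, up to rounding in the last digit
```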

 

 

Citation

If you use the dataset, please cite the following publication:

@article{IEEE_TBIOM_2021,
  author    = {Kotwal, Ketan and Bhattacharjee, Sushil and Abbet, Philip and Mostaani, Zohreh and Wei, Huang and Wenkang, Xu and Yaxi, Zhao and Marcel, S\'{e}bastien},
  title     = {Domain-Specific Adaptation of CNN for Detecting Face Presentation Attacks in NIR},
  journal   = {IEEE Transactions on Biometrics, Behavior, and Identity Science},
  publisher = {{IEEE}},
  year      = {2022},
  volume    = {4},
  number    = {1},
  pages     = {135--147},
  doi       = {10.1109/TBIOM.2022.3143569}
}