
voicePA dataset

Introduction

The 'voicePA' database contains a set of genuine voice samples from 44 speakers and 24 different types of speech presentation attacks. The attacks were created using the genuine data recorded for the 'AVspoof' database. The presentation attacks originally provided with the 'AVspoof' database are also available in the 'voicePA' database.

Genuine data

The genuine (non-attack) data is taken directly from the 'AVspoof' database and can be used by both automatic speaker verification (ASV) and presentation attack detection (PAD) systems (the 'genuine' folder contains this data). The genuine data acquisition lasted approximately two months and involved 44 subjects, each participating in four sessions with different environmental setups. During each recording session, subjects were asked to produce prepared (read) speech, pass-phrases, and free speech, recorded with three devices: a laptop with a high-quality microphone and two mobile phones (an iPhone 3GS and a Samsung S3).
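As a minimal sketch of how one might enumerate these recordings (assuming the database has been downloaded and that the samples are stored as WAV files under the 'genuine' folder; the path and file layout below are illustrative, not part of the official documentation):

    from pathlib import Path

    import soundfile as sf  # any WAV reader will do; soundfile is only an example

    # Hypothetical location of the extracted voicePA data; adjust to your setup.
    DATA_ROOT = Path("/path/to/voicePA")

    # The genuine (non-attack) recordings live under the 'genuine' folder.
    for wav_path in sorted((DATA_ROOT / "genuine").rglob("*.wav")):
        audio, sample_rate = sf.read(wav_path)
        print(wav_path.name, f"{len(audio) / sample_rate:.1f} s")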

Attack data

Based on the genuine data, 24 types of presentation attacks were generated. Attacks were recorded in 3 different environments (two typical offices and a large conference room), using 5 different playback devices, including built-in laptop speakers, high-quality speakers, and three phones (iPhone 3GS, iPhone 6S, and Samsung S3), and assuming an ASV system running on either the laptop, the iPhone 3GS, or the Samsung S3. In addition to replay attacks (in which speech is recorded and replayed to the microphone of an ASV system), two types of synthetic speech were also replayed: speech synthesis and voice conversion (for details on these algorithms, please refer to the BTAS 2015 paper below, which describes the 'AVspoof' database).

Protocols

The data in the 'voicePA' database is split into three non-overlapping subsets: training (genuine and attack samples from 4 female and 10 male subjects), development or 'Dev' (genuine and attack samples from 4 female and 10 male subjects), and evaluation or 'Eval' (genuine and attack samples from 5 female and 11 male subjects).
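As a minimal sketch, the subset sizes above can be summarized as follows (subject counts are taken from the description above; the actual subject-to-subset assignment is defined by the protocol files shipped with the database):

    # Subject counts per subset, as described in the 'Protocols' section.
    SUBSETS = {
        "train": {"female": 4, "male": 10},
        "dev":   {"female": 4, "male": 10},
        "eval":  {"female": 5, "male": 11},
    }

    # The three non-overlapping subsets cover all 44 speakers.
    total_subjects = sum(sum(counts.values()) for counts in SUBSETS.values())
    assert total_subjects == 44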

Details about the attacks copied from the 'AVspoof' database, as well as about the speech synthesis and voice conversion algorithms, can be found in these publications:

 * Serife Kucur Ergunay, Elie Khoury, Alexandros Lazaridis, Sébastien Marcel. "On the Vulnerability of Speaker Verification to Realistic Voice Spoofing", BTAS 2015.

 * Pavel Korshunov, André R. Gonçalves, Ricardo P. V. Violato, Flávio O. Simões, Sébastien Marcel. "On the Use of Convolutional Neural Networks for Speech Presentation Attack Detection", International Conference on Identity, Security and Behavior Analysis, 2018.