DATASETS (52)

TITLE DESCRIPTION
3DMAD The 3D Mask Attack Database (3DMAD) is a biometric (face) spoofing database. It currently contains 76500 frames of 17 persons, recorded using Kinect for both real-access and spoofing attacks. Each frame consists of:
AMI AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings.
AREX AMI Requests for Explanations and Relevance Judgments for their Answers
AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking The AV16.3 corpus is an audio-visual corpus of real indoor multispeaker data, designed to test algorithms for audio-only, video-only and audio-visual speaker localization and tracking.
AVspoof The AVspoof database is intended to provide stable, non-biased spoofing attacks so that researchers can test both their ASV systems and anti-spoofing algorithms. The attacks are created based on newly acquired audio recordings. The data acquisition process lasted approximately two months with 44 persons, each participating in several sessions configured in different environmental conditions and setups. After the data were collected, replay, voice conversion, and speech synthesis attacks were generated.
Biometric resources Find useful protocols, annotations, etc. that are provided to help encourage reproducible research.
bioscote: BIOmetric SCOres Thesis Elshafey 2014 This dataset contains raw scores, in plain text format, of several biometric (face and speaker) recognition systems applied to several databases.
CCC - Cursive Character Challenge Cursive Character Challenge (C-Cube) is a benchmark for machine learning and pattern recognition algorithms. The database contains 57293 cursive characters manually extracted from cursive words, including both upper and lower case versions of each letter.
COHFACE The COHFACE dataset contains RGB video sequences of faces, synchronized with heart-rate and breathing-rate of the recorded subjects
CSMAD Custom Silicone Mask Attack Dataset
DeepfakeTIMIT DeepfakeTIMIT is a database of videos where faces are swapped using the open source GAN-based approach, which, in turn, was developed from the original autoencoder-based Deepfake algorithm.
DIH - Depth Images with Humans A dataset of depth images of people for the tasks of body pose estimation and body landmark detection in depth images.
Disco-Annotation Disco-Annotation is a collection of training and test sets with manually annotated discourse relations for 8 English discourse connectives in Europarl texts.
DW-Dubbing The DW-Dubbing dataset was annotated to evaluate algorithms detecting dubbing scenes in broadcast media. The face tracks with audio are collected from 15 videos of Deutsche-Welle broadcast programs.
ELEA The corpus was gathered with the aim of analyzing emergent leadership as a social phenomenon that occurs in newly formed groups.
Europarl-direct These files provide statement pair extractions from the Europarl corpus, in which statements from the same known source language are directly translated to the target languages.
EYEDIAP The EYEDIAP dataset was designed to train and evaluate gaze estimation algorithms from RGB and RGB-D data. It contains a diversity of participants, head poses, gaze targets and sensing conditions.
ERPA This is a small dataset representing face-image data from 5 subjects (‘subject1’ – ‘subject5’). For each subject, images have been captured with two cameras – the Intel Realsense SR300, and the Xenics Gobi thermal (LWIR) camera.
fvspoofingattack: The Spoofing-Attack Finger vein Database The Spoofing-Attack Database for finger vein spoofing consists of 440 real and fake index-finger images from attempts to 110 clients.
Hand Posture and Gesture Datasets This webpage provides several benchmark databases for hand posture and hand gesture recognition.
HATDOC Human Attention in Document Classification
Head Pose Database The objective was to construct a video database for the quantitative evaluation of algorithms that extract information related to people's head pose, such as head tracking and pose estimation algorithms, or focus of attention analysis.
idiap-poster-data The Idiap Poster Data consists of images extracted from 6 hours of videos shot during a poster session.
InteractPlay Dataset InteractPlay Dataset is a hand gesture database made of 3D hand trajectories. It contains 16 hand gestures from 22 persons and provides 5 sessions and 10 recordings per session.
maya-codex The Maya Codex Dataset contains high-quality representations of ancient Maya hieroglyph data, along with statistical glyph co-occurrence information that we extracted from the Thompson catalog [1].
MDC: Mobile Data Challenge MDC consists of large quantities of continuous data pertaining to the behaviour of individuals and social networks, recorded via mobile phones from 2009 to 2011 in the Lausanne/Geneva area. About 200 persons participated in the data collection campaign.
Mediaparl Mediaparl is a Swiss accented bilingual database containing recordings in both French and German as they are spoken in Switzerland
Mobio The MOBIO database currently consists of 152 people (audio and video samples) with 12 sessions each.
msspoof: Multispectral-Spoof Database Multispectral-Spoof contains face images and printed spoofing attacks recorded in Visible (VIS) and Near-Infrared (NIR) spectra for 22 identities.
PRINT-ATTACK The Print-Attack Database consists of video samples of spoofing attacks using printed photos to 50 identities under different lighting conditions.
Replay-Mobile The Replay-Mobile Database for face spoofing consists of 1190 video clips of photo and video attack attempts to 40 clients, under different lighting conditions.
SSLR Sound Source Localization for Robots (SSLR) Dataset
Swiss-French SpeechDat(M) FDB-1000 The Swiss-French SpeechDat(M) project comprises 1000 Swiss-French speakers (575 female and 425 male speakers) recorded directly over the Swiss fixed telephone network using an ISDN interface. The corpus contains phonetically rich sentences and application-oriented utterances such as keywords, digits, etc. Speech samples are stored as sequences of 8-bit 8 kHz A-law speech samples (before compression). Each prompted utterance is stored in a separate file.
Swiss-German SpeechDat(II) FDB-2000 The Swiss-German SpeechDat(II) FDB-2000 comprises 2000 Swiss-German speakers (992 males, 1008 females) recorded over the Swiss fixed telephone network. This database is partitioned into 6 CDs
Swiss-French SpeechDat(II) FDB-3000 The Swiss-French SpeechDat(II) FDB-3000 comprises 3000 Swiss-French speakers (1500 males, 1500 females) recorded over the Swiss fixed telephone network. This database is partitioned into 6 CDs, each of which comprises 500 speaker sessions. The speech databases made within the SpeechDat(II) project were validated by SPEX, the Netherlands, to assess their compliance with the SpeechDat format and content specifications. Speech samples are stored as sequences of 8-bit 8 kHz A-law samples. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
swiss-french-polyvar PolyVar is a speaker verification database comprising native and non-native speakers of French, mainly from Switzerland but also from other European countries. It consists of read and spontaneous speech recorded by 143 speakers (85 male and 58 female) amounting to 160 hours of speech. Each speaker recorded from 1 to 229 sessions, giving a total of 3,600 recorded sessions. The data are provided with orthographic annotation.
Speechdat - VERIF1SF This subset of PolyVar (cf. ELRA-S0046) consists of 20 speakers who recorded 50 sessions. The format in use is SAM (A-law).
Swiss-French Polyphone Database 1000 speakers Like the Dutch and German Polyphone corpora, this is a Polyphone-like database recorded in Switzerland to cover French as spoken in Romandy (French-speaking Switzerland). The database consists of 5,000 speakers who answered several questions (around 10), eliciting spontaneous speech, and read about 28 items from a form supplied by IDIAP. This form contains several speech sequences, including sentences from different sources (local newspapers, existing corpora, law articles, etc.) to ensure good phonetic coverage, application words from a defined list of command words, currency amounts, quantities, credit card numbers, spelled words (mainly names), etc. The database is divided into two subsets: the first comprises 1,000 speakers and the second 4,000 speakers (1,000 speakers are not available). Each subset is further divided into two parts: the phonetically rich sentences and the application-oriented data.
Swiss-French Polyphone Database 4000 speakers Like the Dutch and German Polyphone corpora, this is a Polyphone-like database recorded in Switzerland to cover French as spoken in Romandy (French-speaking Switzerland). The database consists of 5,000 speakers who answered several questions (around 10), eliciting spontaneous speech, and read about 28 items from a form supplied by IDIAP. This form contains several speech sequences, including sentences from different sources (local newspapers, existing corpora, law articles, etc.) to ensure good phonetic coverage, application words from a defined list of command words, currency amounts, quantities, credit card numbers, spelled words (mainly names), etc. The database is divided into two subsets: the first comprises 1,000 speakers and the second 4,000 speakers (1,000 speakers are not available). Each subset is further divided into two parts: the phonetically rich sentences and the application-oriented data.
TA2 The TA2 database consists of high-definition, simultaneous A/V recordings and annotations from two separate rooms, where the participants play games and communicate with each other over a video-conferencing system.
TED A dataset for recommendations collected from ted.com which contains metadata fields for TED talks and user profiles with rating and commenting transactions.
Tense-Annotation This dataset provides parallel texts in English/French from Europarl, along with an alignment of the verbs in the sentences with information on their position, tense and voice.
The Replay-Attack Database The Replay-Attack Database for face spoofing consists of 1300 video clips of photo and video attack attempts to 50 clients, under different lighting conditions. This Database was produced at the Idiap Research Institute, in Switzerland.
Two-Handed Datasets This database consists of different two-handed gestures (rotations in all 6 directions and a "push" gesture).
UBIPose The UBIPose dataset is a subset of the UBImpressed dataset. It is intended for the evaluation of head pose estimation algorithms in natural and challenging scenarios. This dataset provides the annotation of the positions of 6 facial landmarks (two corners of two eyes, nasal root and nose tip) in 14.4 K frames and 3D head poses (roll, pitch, yaw) in 10.4 K frames.
Unicity UNICITY consists of 58k images collected from 65 recorded sequences with one or two people performing different behaviors, including attacks and trickeries such as tailgating (when a person walks very close to another to get into a restricted area). It also provides full annotation of people, such as the location of head and shoulders. As a result, UNICITY is well suited for training and adapting machine learning algorithms for video surveillance applications.
VERA Fingervein The VERA Fingervein Database for fingervein recognition consists of 440 images from 110 clients.
VERA Palmvein The VERA Palmvein Database for palmvein recognition consists of 2200 images from 110 clients. This Database was produced at the Idiap Research Institute in Martigny and at Haute Ecole Spécialisée de Suisse Occidentale in Sion, in Switzerland.
VERA Spoofing Fingervein The VERA Spoofing Fingervein Database for direct attacks on fingervein recognition consists of 200 images of attack attempts against the first 50 clients of the Idiap Research Institute VERA Fingervein Database. This Database was produced at the Idiap Research Institute in Martigny, in Switzerland.
VERA Spoofing Palmvein The VERA Spoofing Palmvein Database for direct attacks on palmvein recognition consists of 1000 images of attack attempts against the first 50 clients of the Idiap Research Institute VERA Palmvein Database. This Database was produced at the Idiap Research Institute in Martigny, in Switzerland.
voicePA A database of speech data from 44 speakers and 28 presentation attacks, including synthetic and replay attacks, recorded in different environments using different speakers and microphones (mobile phones and laptop).
walliserdeutsch News bulletins in the Upper Valais German dialect, broadcast by RRO (Radio Rottu Oberwallis), taken from their website and annotated at Idiap.
wolf corpus The wolf corpus is an audio-visual data set containing around 81 hours of conversational data among groups of 8-12 people playing a role-playing game.
youtube-personality The YouTube personality dataset consists of a collection of behavioral features, speech transcriptions, and personality impression scores for a set of 404 YouTube vloggers who explicitly show themselves in front of a webcam talking about a variety of topics including personal issues, politics, movies, books, etc. There is no content-related restriction and the language used in the videos is natural and diverse.