Idiap datasets listing



Name Short description Size Number of files License (1 = yes, 0 = no) Distribution type
3DMAD The 3D Mask Attack Database (3DMAD) currently contains 76500 frames of 17 persons, recorded using Kinect 39G 5 1 WEB Full description
AMI AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings. 875G 174 1 WEB Full description
AREX AMI Requests for Explanations and Relevance Judgments for their Answers 1M 1 0 WEB Full description
AV16-3 Audio-Visual Corpus for Speaker Localization and Tracking 7G 6 1 WEB Full description
avspoof Database including 10 types of voice recognition attacks 29G 3 1 WEB Full description
bioscote This dataset contains raw scores, in plain text format, of several biometric (face and speaker) recognition systems applied to several databases. 80GB 9 0 WEB Full description
CCC Cursive Character Challenge 215M 4 0 WEB Full description
COHFACE The COHFACE dataset contains RGB video sequences of faces, synchronized with heart-rate and breathing-rate of the recorded subjects. 310M 1 1 WEB Full description
Disco-Annotation Disco-Annotation is a collection of training and test sets with manually annotated discourse relations for 8 English discourse connectives in Europarl texts. 204K 1 0 WEB Full description
ELEA The corpus was gathered with the aim of analyzing emergent leadership as a social phenomenon that occurs in newly formed groups. 4.1G 1 1 WEB Full description
ERPA This is a small dataset representing face-image data from 5 subjects (‘subject1’ – ‘subject5’). For each subject, images have been captured with two cameras – the Intel Realsense SR300, and the Xenics Gobi thermal (LWIR) camera. For each subje 2.4G 1 1 WEB Full description
Europarl-direct Europarl-direct These files provide statement pair extractions from the Europarl corpus of the same known source language directly translated to the target languages 149M 1 1 WEB Full description
eyediap The EYEDIAP dataset was designed to train and evaluate gaze estimation algorithms from RGB and RGB-D data. It contains a diversity of participants, head poses, gaze targets and sensing conditions. 54G 17 1 WEB Full description
fvspoofingattack The Spoofing-Attack Database for finger vein spoofing consists of 440 real and fake index-finger image attempts for 110 clients. 54M 1 1 WEB Full description
HeadPose The objective was to construct a video database for the quantitative evaluation of algorithms that extract information related to people's head pose, such as head tracking and pose estimation algorithms, or focus-of-attention analysis. 2.6GB 1 1 WEB Full description
idiap-poster-data The Idiap Poster Data consists of images extracted from 6 hours of videos shot during a poster session. 43 GB 6 1 WEB Full description
maya-codex The Maya Codex Dataset contains high-quality representations of ancient Maya hieroglyphs, along with statistical glyph co-occurrence information that we extracted from the Thompson catalog [1]. 61M 1 1 WEB Full description
MDC MDC consists of large quantities of continuous data pertaining to the behaviour of individuals and social networks, recorded via mobile phones from 2009 to 2011 in the Lausanne/Geneva area. About 200 persons participated in the data collection campaign. 50 GB 1 1 HDD Full description
Mediaparl Mediaparl is a Swiss-accented bilingual database containing recordings in both French and German as they are spoken in Switzerland. 4.8GB 1 1 WEB Full description
MOBIO The MOBIO database currently consists of 152 people (audio and video samples) with 12 sessions each ~135GB 18 1 WEB Full description
msspoof Multispectral-Spoof contains face images and printed spoofing attacks recorded in Visible (VIS) and Near-Infrared (NIR) spectra for 22 identities. 1.9G 1 1 WEB Full description
PrintAttack The Print-Attack Replay Database consists of 200 video clips of printed-photo attack attempts on 50 clients, under different lighting conditions. It also contains 200 real-access attempt videos from the same clients. 1.1Gb 7 1 WEB Full description
Replay-Mobile The Replay-Mobile database for face anti-spoofing on mobile devices consists of 1190 videos of 40 subjects, including real-access videos and attack videos. The database was produced at Idiap, Switzerland, in collaboration with Gradiant, Spain. 15G 2 1 WEB Full description
ReplayAttack The Replay-Attack Database for face spoofing consists of 1300 video clips of photo and video attack attempts to 50 clients, under different lighting conditions. This Database was produced at the Idiap Research Institute, in Switzerland. ~3 Gb (compressed) 7 1 WEB Full description
TA2 The TA2 database consists of high-definition, simultaneous A/V recordings and annotations from two separate rooms, where the participants play games and communicate with each other over a video-conferencing system. ~50GB 2 1 WEB Full description
TED A dataset for recommendations collected from ted.com which contains metadata fields for TED talks and user profiles with rating and commenting transactions. 100.77 MB 1 0 WEB Full description
Tense-Annotation This dataset provides parallel texts in English/French from Europarl, along with an alignment of the verbs in the sentences with information on their position, tense and voice. 300M 2 0 WEB Full description
vera-fingervein The VERA Fingervein Database for fingervein recognition consists of 440 images from 110 clients. 33M 2 1 WEB Full description
vera-palmvein The VERA Palmvein Database for palmvein recognition consists of 2200 images from 110 clients. This Database was produced at the Idiap Research Institute in Martigny and at Haute Ecole Spécialisée de Suisse Occidentale in Sion, in Switzerland. 209M 1 1 WEB Full description
vera-spoofingfingervein The VERA Spoofing Fingervein Database for direct attacks on fingervein recognition consists of 200 image attempts for the first 50 clients of the Idiap Research Institute VERA Fingervein Database. This Database was produced at the Idiap Research Institute 15M 1 1 WEB Full description
vera-spoofingpalmvein The VERA Spoofing Palmvein Database for direct attacks on palmvein recognition consists of 1000 image attempts for the first 50 clients of the Idiap Research Institute VERA Palmvein Database. This Database was produced at the Idiap Research Institute in Ma 218M 1 1 WEB Full description
walliserdeutsch News bulletins in the Upper Valaisan German dialect, broadcast by RRO (Radio Rottu Oberwallis), taken from their website and annotated at Idiap. ~4G 1 1 WEB Full description
WOLF The WOLF corpus is an audio-visual data set containing around 81 hours of conversational data among groups of 8-12 people playing a role-playing game. ~100GB 15 1 WEB Full description
youtube-personality The YouTube personality dataset consists of a collection of behavioral features, speech transcriptions, and personality impression scores for a set of 404 YouTube vloggers that explicitly show themselves in front of a webcam talking about a variety of 496KB 1 0 WEB Full description