Data Cryptographer Bundle The Data Cryptographer Bundle is a PHP/Symfony bundle that provides a cryptographer resource/service for common cryptographic operations.
APT The APT software is a reference-based metric to evaluate the accuracy of pronoun translation (APT).
acoustic-simulator Implementation of audio degradation processes
ACT ACT (Accuracy of Connective Translation) is a reference-based metric to measure the accuracy of discourse connective translation, mainly for statistical machine translation systems.
asrt A Python library that facilitates the extraction of text sentences from multilingual PDF documents
Attentive Residual Connections NMT Implementation and output data of "Global-Context Neural Machine Translation through Target-Side Attentive Residual Connections"
BEAT platform The BEAT platform is a European computing e-infrastructure for Open Science proposing a solution for open access, scientific information sharing and re-use including data and source code while protecting privacy and confidentiality. It allows easy online access to experimentation and testing in computational science.
BOB Bob is a free signal-processing and machine learning toolbox developed by the Biometrics group at Idiap Research Institute, Switzerland. The toolbox is written in a mix of Python and C++ and is designed both to be efficient and to reduce development time. Vein biometrics recognition baselines
Bob's library of image-quality feature-extractors This package is part of the signal-processing and machine learning toolbox Bob. It provides functions for extracting image-quality features proposed for PAD experiments by different research groups. Image quality measures proposed by Galbally et al. (IEEE TIP 2014) and by Wen et al. (IEEE TIFS 2015) are implemented in this package.
bob.paper.eusipco2018 Source code for reproducing the speaker inconsistency detection experiments of the paper "Speaker Inconsistency Detection in Tampered Video", presented at the EUSIPCO 2018 conference
CNN_QbE_STD Implementation of the work presented in "CNN based Query by Example Spoken Term Detection"
CNN-voice-PAD The purpose of this software is to train Convolutional Neural Networks on raw speech signals in order to detect voice presentation attacks.
Content-Based Recommendation Generator (CBRec v1.0) A Python library which generates content-based recommendations for a set of items described by textual metadata using four possible vector space methods, namely TF-IDF, LSI, RP and LDA.
DiscoConn-Classifier Classifier models and feature extractors for discourse relations
DocRec - Keyword Extraction and Document Recommendation in Conversations The package contains several pieces of Matlab code. Taken together, they extract keywords from a conversation, then use them to build implicit queries, and then consolidate the sets of retrieved documents to recommend to the conversation participants.
eakmeans Implementation of fast exact k-means algorithms
Eigenposterior Eigenposterior (Senone Class Principal Components) based approach for purifying DNN posterior estimates
Emotion-Based Recommendation Generator (EMORec v1.0) A Python library which performs emotion-based analysis and recommendation using a multiple-instance regression algorithm for a set of multimedia items described by transcripts
ERPA This is a small dataset representing face-image data from 5 subjects (‘subject1’ – ‘subject5’). For each subject, images have been captured with two cameras – the Intel Realsense SR300, and the Xenics Gobi thermal (LWIR) camera. For each subject, the dataset contains images with the face of the subject visible, as well as with the face covered by a mask. Two kinds of masks have been used in this dataset – rigid (resin-coated) masks, and flexible (silicone) masks.
Exact Acceleration of Linear Object Detectors We describe a general and exact method to considerably speed up linear object detection systems operating in a sliding, multi-scale window fashion, such as the individual part detectors of part-based models.
Face Color Model This page contains the source code and data needed to train and use a model for skin, hair, clothing and background color modelling and segmentation.
facereclib - The Face Recognition Library This library is designed to perform a fair comparison of face recognition algorithms. It contains scripts to execute various kinds of face recognition experiments on a variety of facial image databases
FingerveinRecLib The Finger vein Recognition Library based on Bob is a library designed to perform a fair comparison of finger vein recognition algorithms.
GC.MI The gc_MI.cpp file includes C++ code implementing the GC.MI algorithm presented in the paper:
HAN_NMT Document-Level Neural Machine Translation with Hierarchical Attention Networks
HEAT Image Retrieval System HEAT is an image retrieval web-application that is intended for large unstructured collections of images without semantic annotations. The system implements a novel searching paradigm that does not require any explicit query. At each iteration, the system displays a small set of images and the user chooses the image that best matches what she is looking for. After a few iterations, the sets of displayed images are gradually concentrated on images that satisfy the user.
HG3D - A module for 3D head pose and gaze tracking from RGB-D sensors This software contains the implementation of algorithms related to 3D head pose and gaze tracking tasks based on RGB-D cameras (standard vision and depth).
HOOSC Histogram of Orientation Shape Context
hpca hpca is a C++ toolkit providing an efficient implementation of the Hellinger PCA for computing word embeddings
human-detection Background subtraction and human detection
HTS-VTLN This software is a patch to the HMM-based statistical parametric speech synthesis toolkit (HTS 2.2).
Importance Sampling This Python package provides a library that accelerates the training of arbitrary neural networks created with Keras using importance sampling.
inv-tn Inverse Text Normalization using NMT models
ISS The Idiap Speech Scripts (ISS) are a collection of scripts for speech databases and dictionaries, and for training and testing models for ASR. The scripts in turn rely on many other packages, including HTK/HTS, Juicer and the ICSI speech tools.
Juicer Juicer is a Weighted Finite State Transducer (WFST) based decoder for Automatic Speech Recognition (ASR).
kaldi-ivector The code is an implementation of the standard i-vector extraction algorithm for the Kaldi toolkit.
KiSC K.I.S.S. Cluster (KiSC) - with K.I.S.S. as in "Keep It Stupid Simple" - is a utility that aims to simplify the life of administrators managing resources across a cluster of hosts
libssp Library for speech signal processing
MASH Framework Back-end of the MASH computation farm
mash-simulator mash-simulator is a 3D simulator for Linux and MacOS where a robot must complete a certain number of tasks in different randomized environments.
mash-web Front-end of the MASH computation farm
ML3 ML3 is an open source implementation of the Multiclass Latent Locally Linear Support Vector Machine algorithm, a multi-class local classifier based on a latent SVM formulation.
mhan Multilingual hierarchical attention networks toolkit
MSER Linear-time Maximally Stable Extremal Regions (MSER) implementation as described in D. Nistér and H. Stewénius, "Linear Time Maximally Stable Extremal Regions"
Multi Camera Calibration Suite This toolset provides the basics for calibrating a multi-camera scene. It contains six utilities for different purposes. In this README I will walk the user through the calibration of a multi-camera scene using this toolset.
PalmveinRecLib The Palm vein Recognition Library based on Bob is a library designed to perform a fair comparison of palm vein recognition algorithms. It contains scripts to execute various kinds of palm vein recognition experiments on a variety of palm vein image databases.
pbdlib-matlab PbDlib is a set of tools combining statistical learning, dynamical systems and optimal control approaches for programming-by-demonstration applications
phonvoc: Phonetic and phonological vocoding platform Phonvoc is a cascaded deep neural network composed of a speech analyser and a speech synthesizer that use a shared phonological speech representation.
Probabilistic Models: temporal topic models and more Topic models such as Latent Dirichlet Allocation (LDA) have been used successfully in many domains for data mining. Originally designed for text documents, these methods find some hidden “topics” considering that each document is a weighted mixture of topics. Each topic expresses itself in a document by generating some specific words with more probability than others.
Remote heart rate measurement from face video sequences This package provides three baseline algorithms to perform remote photoplethysmography (rPPG), which consists in measuring the heart rate from a face video sequence. The software package implements three different algorithms to retrieve the pulse signal from skin color variations: an approach based on colorspace transformation, another approach solely based on signal processing, and a more recent approach, which analyzes the subspace spanned by skin-colored pixels in the RGB colorspace.
RGBD: A Python-based RGB-D data processing module This Python module implements the streaming, calibration and visualization of RGB-D data, that is, combined color and depth images.
semiblindpsfdeconv Code for "Semi-Blind Spatially-Variant Deconvolution in Optical Microscopy with Local Point Spread Function Estimation By Use Of Convolutional Neural Networks" ICIP 2018
Simple Imager Simple Imager (Linux Imaging and Deployment Made Easy) is a set of tools that lets an imaging server retrieve a copy of Linux reference hosts (sources) and deploy those images to other target hosts by means of rsync or BitTorrent file downloads.
SLOG - Similarity Learning on Graph SLOG contains implementations of similarity learning methods over relational data, where the relations between data points are given explicitly
Speaker Diarization Toolkit The toolkit is intended to facilitate research in multistream speaker diarization, providing a platform for research in novel audio, video or location features. It is based on the Information Bottleneck principle and is explicitly designed to make use of several heterogeneous feature streams.
SSP SSP stands for Speech Signal Processing. It is a fairly small package written in Python. Its functionality is similar to tracter, with some overlap and some additional capabilities. In particular, SSP contains a parametric vocoder, a pitch extractor and feature extraction for ASR.
symfony-bundle-datajukebox The Data Jukebox Bundle is a PHP/Symfony bundle which aims to provide - for common CRUD (Create-Read-Update-Delete) operations - the same level of abstraction that Symfony does for forms.
Tasting Families of Features for Image Classification Please find below the code necessary to reproduce the experiments of the paper "Tasting Families of Features for Image Classification" under the GPL v2 license.
The Multi-Tracked Paths This is an implementation of the variant of KSP for tracking presented in (Berclaz et al. 2011). You can get more information and the reference implementation from the CVLab's web page about multi-camera tracking.
Torch Statistical machine learning library containing most of the state-of-the-art algorithms. Written in Lua and C, the library is distributed under a BSD license.
Torch3vision Common software library for computer vision with machine learning algorithms. Written in simple C++, this library is based on Torch and distributed under a BSD license.
Tracter Tracter is a data flow framework.
trimed The trimed algorithm for obtaining the medoid of a set
warca WARCA is a simple and fast algorithm for metric learning.
Webvalidation This software is a multi-user, multi-project web annotation tool that helps organize the process of validating automatically generated transcriptions.
xbob.spkrec Speaker recognition library including feature extraction, background training, client enrolment, and score computation
xbob.thesis.elshafey2014 This package contains scripts to reproduce the experiments of Laurent El Shafey's Ph.D. thesis at Ecole Polytechnique Fédérale de Lausanne (EPFL).
zentas Software for doing k-medoids using an accelerated CLARANS algorithm



3DMAD The 3D Mask Attack Database (3DMAD) is a biometric (face) spoofing database. It currently contains 76500 frames of 17 persons, recorded using Kinect for both real-access and spoofing attacks. Each frame consists of:
AMI AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings.
AREX AMI Requests for Explanations and Relevance Judgments for their Answers
AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking The AV16.3 corpus is an audio-visual corpus of real indoor multispeaker data, designed to test algorithms for audio-only, video-only and audio-visual speaker localization and tracking.
AVspoof The AVspoof database is intended to provide stable, non-biased spoofing attacks in order for researchers to test both their ASV systems and anti-spoofing algorithms. The attacks are created based on newly acquired audio recordings. The data acquisition process lasted approximately two months with 44 persons, each participating in several sessions configured in different environmental conditions and setups. After the collection of the data, the attacks, more precisely, replay, voice conversion and speech synthesis attacks were generated.
Biometric resources Find useful protocols, annotations, etc. that are provided to help encourage reproducible research.
bioscote: BIOmetric SCOres Thesis Elshafey 2014 This dataset contains raw scores in plain text format of several biometric (face and speaker) recognition systems applied on several databases.
CCC - Cursive Character Challenge This is the home page of Cursive Character Challenge (C-Cube), the new benchmark for machine learning and pattern recognition algorithms. The database contains 57293 cursive characters manually extracted from cursive words, including both upper and lower case versions of each letter.
COHFACE The COHFACE dataset contains RGB video sequences of faces, synchronized with heart-rate and breathing-rate of the recorded subjects
Disco-Annotation Disco-Annotation is a collection of training and test sets with manually annotated discourse relations for 8 English discourse connectives in Europarl texts.
DW-Dubbing The DW-Dubbing dataset was annotated to evaluate algorithms detecting dubbing scenes in broadcast media. The face tracks with audio are collected from 15 videos of Deutsche-Welle broadcast programs.
ELEA The corpus was gathered with the aim of analyzing emergent leadership as a social phenomenon that occurs in newly formed groups.
Europarl-direct These files provide statement pair extractions from the Europarl corpus of the same known source language directly translated to the target languages
EYEDIAP The EYEDIAP dataset was designed to train and evaluate gaze estimation algorithms from RGB and RGB-D data. It contains a diversity of participants, head poses, gaze targets and sensing conditions.
fvspoofingattack: The Spoofing-Attack Finger vein Database The Spoofing-Attack Database for finger vein spoofing consists of 440 real and fake index-finger image attempts from 110 clients.
Hand Posture and Gesture Datasets This webpage provides several benchmark databases for hand posture and hand gesture recognition.
HATDOC Human Attention in Document Classification
Head Pose Database The objective was to construct a video database that allows quantitative evaluation of algorithms extracting information related to the head pose of people, such as head tracking and pose estimation algorithms, or focus of attention analysis.
idiap-poster-data The Idiap Poster Data consists of images extracted from 6 hours of videos shot during a poster session.
InteractPlay Dataset InteractPlay Dataset is a hand gesture database made of 3D hand trajectories. It contains 16 hand gestures from 22 persons and provides 5 sessions and 10 recordings per session
maya-codex The Maya Codex Dataset contains high-quality representations of ancient Maya hieroglyph data, along with statistical glyph co-occurrence information that we extracted from the Thompson catalog [1].
MDC: Mobile Data Challenge MDC consists of large quantities of continuous data pertaining to the behaviour of individuals and social networks, recorded via mobile phones from 2009 to 2011 in the Lausanne/Geneva area. About 200 persons participated in the data collection campaign.
Mediaparl Mediaparl is a Swiss accented bilingual database containing recordings in both French and German as they are spoken in Switzerland
Mobio The MOBIO database currently consists of 152 people (audio and video samples) with 12 sessions each.
msspoof: Multispectral-Spoof Database Multispectral-Spoof contains face images and printed spoofing attacks recorded in Visible (VIS) and Near-Infrared (NIR) spectra for 22 identities.
PRINT-ATTACK The Print-Attack Database consists of video samples of spoofing attacks using printed photos to 50 identities under different lighting conditions.
Replay-Mobile The Replay-Mobile Database for face spoofing consists of 1190 video clips of photo and video attack attempts to 40 clients, under different lighting conditions.
SSLR Sound Source Localization for Robots (SSLR) Dataset
Speechdat - FIXED1SF This database comprises telephone recordings from 1000 speakers recorded directly over the fixed PSTN using an ISDN interface.
Speechdat - FIXED1SZ 2000 Swiss-German speakers recorded over the SwissNet. They follow a protocol made up of 41 items (digits, words, numbers, sentences, ...). An orthographic and phonemic transcription is available.
Speechdat - VERIF1SF 20 Swiss-French people recorded 50 times over the Swiss telephone network, following a protocol made up of 54 items (digits, words, numbers, sentences, ...). An orthographic and phonemic transcription is available.
swiss-french-polyphone 4500 Swiss-French speakers recorded over the SwissNet. They follow a protocol made up of 38 items (digits, id number, natural numbers, money amount, names, words, sentences, ...). There is an orthographic transcription for all the calls.
swiss-french-polyvar Telephone recordings from about 143 Swiss-French speakers. Each speaker recorded between 1 and 225 sessions. Each recording is made up of 55 items (words, sentences, id numbers, credit card numbers, single, digits, date, address, query, comments). All files are annotated and transcribed.
TA2 The TA2 database consists of high-definition, simultaneous A/V recordings and annotations from two separate rooms, where the participants play games and communicate with each other over a video-conferencing system.
TED A dataset for recommendations collected from which contains metadata fields for TED talks and user profiles with rating and commenting transactions.
Tense-Annotation This dataset provides parallel texts in English/French from Europarl, along with an alignment of the verbs in the sentences with information on their position, tense and voice.
The Replay-Attack Database The Replay-Attack Database for face spoofing consists of 1300 video clips of photo and video attack attempts to 50 clients, under different lighting conditions. This Database was produced at the Idiap Research Institute, in Switzerland.
Two-Handed Datasets This database consists of different two-handed gestures (rotations in all the 6 directions and a "push" gesture).
Unicity UNICITY consists of 58k images collected from 65 recorded sequences with one or two people performing different behaviors including attacks and trickeries, such as tailgating (when a person walks very close to another to get into a restricted area). It also provides full annotation of people, such as the location of head and shoulders. As a result, UNICITY is well suited for training and adapting machine learning algorithms for video surveillance applications.
VERA Fingervein The VERA Fingervein Database for fingervein recognition consists of 440 images from 110 clients.
VERA Palmvein The VERA Palmvein Database for palmvein recognition consists of 2200 images from 110 clients. This Database was produced at the Idiap Research Institute in Martigny and at Haute Ecole Spécialisée de Suisse Occidentale in Sion, in Switzerland.
VERA Spoofing Fingervein The VERA Spoofing Fingervein Database for direct attacks on fingervein recognition consists of 200 image attempts to the first 50 clients of the Idiap Research Institute VERA Fingervein Database. This Database was produced at the Idiap Research Institute in Martigny, in Switzerland.
VERA Spoofing Palmvein The VERA Spoofing Palmvein Database for direct attacks on palmvein recognition consists of 1000 image attempts to the first 50 clients of the Idiap Research Institute VERA Palmvein Database. This Database was produced at the Idiap Research Institute in Martigny, in Switzerland.
walliserdeutsch News bulletins in the Upper Valaisan German dialect, broadcast by RRO (Radio Rottu Oberwallis), taken from their website and annotated at Idiap.
wolf corpus The wolf corpus is an audio-visual data set containing around 81 hours of conversational data among groups of 8-12 people playing a role-playing game.
youtube-personality The YouTube personality dataset consists of a collection of behavioral features, speech transcriptions, and personality impression scores for a set of 404 YouTube vloggers who explicitly show themselves in front of a webcam talking about a variety of topics including personal issues, politics, movies, books, etc. There is no content-related restriction and the language used in the videos is natural and diverse.