The Idiap Research Institute, together with the Swiss Center for Electronics and Microtechnology (CSEM), seeks a qualified candidate for a postdoctoral position on joint modeling of speech and physiological signals. The research and development will take place in the context of the CSEM-Idiap collaboration project AUDIO: Reinforced Audio Processing via Physiological Signals. Briefly, this project combines Idiap's expertise in speech processing and CSEM's expertise in physiological signal acquisition to develop a platform that acquires speech synchronously with physiological signals and body sounds, and models them jointly for human conversation analysis.
Speech and Audio Processing
Video Presentation
Current Group Members

BOURLARD, Hervé
(Director, EPFL Full Professor)
- website

GARNER, Philip
(Senior Researcher)
- website

MOTLICEK, Petr
(Senior Researcher)
- website

MAGIMAI DOSS, Mathew
(Senior Researcher)
- website

IMSENG, David
(Research Associate)
- website

RAZAVI, Marzieh
(Postdoctoral Researcher)
- website

VLASENKO, Bogdan
(Postdoctoral Researcher)
- website

MADIKERI, Srikanth
(Postdoctoral Researcher)
- website

AICHINGER, Ida
(Postdoctoral Researcher)
- website

ABROL, Vinayak
(Postdoctoral Researcher)
- website

KODRASI, Ina
(Postdoctoral Researcher)
- website

KHONGLAH, Banriskhem (Kayang)
(Postdoctoral Researcher)
- website

SHAKAS, Alexis
(Postdoctoral Researcher)
- website

DIGHE, Pranay
(Research Assistant)
- website

RAM, Dhananjay
(Research Assistant)
- website

TONG, Sibo
(Research Assistant)
- website

HE, Weipeng
(Research Assistant)
- website

DUBAGUNTA, Pavankumar (Subrahmanya)
(Research Assistant)
- website

MUCKENHIRN, Hannah
(Research Assistant)
- website

JANBAKHSHI, Parvaneh
(Research Assistant)
- website

DEY, Subhadeep
(Research Assistant)
- website

TORNAY, Sandrine
(Research Assistant)
- website

SEBASTIAN, Jilt
(Research Assistant)
- website

SCHNELL, Bastian
(Research Assistant)
- website

STERPU, George
(Trainee)
- website

CANDY, Romain
(Trainee)
- website

MARELLI, François
(Trainee)
- website

LOUPI, Dimitra
(Trainee)
- website

KABIL, Selen
(Trainee)
- website
Alumni
- AJMERA, Jitendra
- ARADILLA ZAPATA, Guillermo
- ATHINEOS, Marios
- BAHAADINI, Sara
- BARBER, David
- BENZEGHIBA, Mohamed (Faouzi)
- CEREKOVIC, Aleksandra
- CEVHER, Volkan
- CHAVARRIAGA, Ricardo
- COLLADO, Thierry
- CRITTIN, Frank
- DINES, John
- DRYGAJLO, Andrzej
- DUFFNER, Stefan
- GALAN MOLES, Ferran
- GRANDVALET, Yves
- GRANGIER, David
- HAGEN, Astrid
- HERMANSKY, Hynek
- HONNET, Pierre-Edouard
- IKBAL, Shajith
- IVANOVA, Maria
- KETABDAR, Hamed
- KRSTULOVIC, Sacha
- LATHOUD, Guillaume
- LAZARIDIS, Alexandros
- LI, Weifeng
- MARIÉTHOZ, Johnny
- MARTINS, Renato
- MASSON, Olivier
- MCCOWAN, Iain
- MILLÁN, José del R.
- MOORE, Darren
- MORRIS, Andrew
- MOSTAANI, Zohreh
- MOULIN, François
- NATUREL, Xavier
- PARTHASARATHI, Sree Hari Krishnan
- PINTO, Francisco
- POTARD, Blaise
- SHANKAR, Ravi
- STEPHENSON, Todd
- SZASZAK, György
- TYAGI, Vivek
- ULLMANN, Raphael
- VALENTE, Fabio
- WELLNER, Pierre
Current Projects
- AUDIO - Reinforced audio processing via physiological signals
- COBALT - Content Based Call Filtering
- DEEPCHARISMA - Deep Learning Charisma
- FLOSS - Flexible Linguistically-guided Objective Speech aSsessment
- MASS - Multilingual Affective Speech Synthesis
- MEGANEPRO - Myo-Electricity, Gaze and Artificial Intelligence for Neurocognitive Examination and Prosthetics
- MOSPEEDI - Motor Speech Disorders: characterizing phonetic speech planning and motor speech programming/execution and their impairments
- MPM - Multimodal People Monitoring
- MUMMER - MultiModal Mall Entertainment Robot
- PHASER-QUAD - Parsimonious Hierarchical Automatic Speech Recognition and Query Detection
- SARAL - Summarization and domain-Adaptive Retrieval of Information Across Languages
- SHISSM - Sparse and hierarchical Structures for Speech Modeling
- SIIP - Speaker Identification Integrated Project
- SMILE - Scalable Multimodal sign language Technology for sIgn language Learning and assessmEnt
- SUMMA - Scalable Understanding of Multilingual Media
- TAPAS - Training Network on Automatic Processing of PAthological Speech
- UNITS - Unified Speech Processing Framework for Trustworthy Speaker Recognition
Recent Projects
- MALORCA - Machine Learning of Speech Recognition Models for Controller Assistance
- MULTIVEO - High Accuracy Speaker-Independent Multilingual Automatic Speech Recognition System
- ELEARNING-VALAIS_3.0 - eLearning-Valais 3.0
- ESGEM - Enhanced Swiss German mEdia Monitoring
- NMTBENCHMARK - Training and Benchmarking Neural MT and ASR Systems for Swiss Languages
- ADDG2SU_EXT - Flexible Acoustic Data-Driven Grapheme to Subword Unit Conversion
- OMSI_EXT - Objective Measurement of Speech Intelligibility
- RECAPP - Making speech technology accessible to Swiss people
- SIWIS - Spoken Interaction with Interpretation in Switzerland
- A-MUSE - Adaptive Multilingual Speech Processing
- SP2 - SCOPES Project on Speech Prosody
- PHASER - PHASER: Parsimonious Hierarchical Automatic Speech Recognition
- ADDG2SU - Flexible Acoustic Data-Driven Grapheme to Subword Unit Conversion
- L-PASS - Linguistic-Paralinguistic Speech Synthesis
- SCOREL2 - Automatic scoring and adaptive pedagogy for oral language learning
- DEEPSTD-EXT - Universal Spoken Term Detection with Deep Learning (extension)
- ROCKIT - Roadmap for Conversational Interaction Technologies
- DBOX - D-Box: A generic dialog box for multilingual conversational applications
- OMSI_ARMASUISSE - Objective Measurement of Speech Intelligibility
- BIOWATCH - Biowatch
- RECOD2014 - low bit-rate speech coding
- AAMASSE - Acoustic Model Adaptation toward Spontaneous Speech and Environment
- INEVENT - Accessing Dynamic Networked Multimedia Events
- FLEXASR - Flexible Grapheme-Based Automatic Speech Recognition
- RECOD2013 - low bit-rate speech coding
- PANDA - Perceptual Background Noise Analysis for the Newest Generation of Telecommunication Systems
- IM2-3 - Interactive Multimodal Information Management Phase 3
- DAUM2012 - Domain Adaptation Using Sub-Space Models
- RECOD2012 - Very Low bit-rate speech coding
- SCALE - Speech Communication with Adaptive Learning
- V-FAST - Vocal-tract based Fast Adaptation for Speech Technology
- DAUM - Domain Adaptation Using Sub-Space Models
- MULTI08EXT - Multimodal Interaction and Multimedia Data Mining
- ICS-2010 - Interactive Cognitive Systems
- TAO-CSR - Task Adaptation and Optimisation for Conversational Speech Recognition
- AMSP - Auditory-motivated signal processing and applications to robust speech enhancement and recognition
- TA2-EEU - Together Anywhere, Together Anytime - Enlarged European Union
- TA2 - Together Anywhere, Together Anytime
- RECOD - Low bit-rate speech coding
- DM3 - Distributed MultiModal Media server, a low cost large capacity high throughput data storage system
- MULTI08 - Multimodal Interaction and Multimedia Data Mining
- AMIDA - Augmented Multi-party Interaction with Distance Access
Group News
Following a rigorous evaluation process, comprising a nomination by Idiap's management and formal approval by the Scientific College, Idiap is pleased to announce the promotion of three of its researchers to the position of Senior Researcher:
Multimodal people monitoring using sound (and vision). The Idiap Research Institute, together with the Swiss Center for Electronics and Microtechnology (CSEM), invites applications for a post-doctoral position in research and development for multimodal people monitoring.
The Idiap Research Institute seeks qualified candidates for two PhD positions in the area of pathological speech processing. The research and development will take place in the context of the EU-funded Marie Skłodowska-Curie Actions Innovative Training Network (European Training Network) TAPAS - Training Network on Automatic Processing of Pathological Speech.
LYON, France – In its final field test, the Speaker Identification Integrated Project (SiiP) successfully demonstrated the system's innovative capabilities as a language-independent voice recognition system.
Many audio events, such as those that happen in the vocal tract when speaking, can be characterised as having a start time and duration. The duration can be several samples or frames. However, this is at odds with current audio synthesis methods, which tend to use fixed-duration frame-based models. It follows that more natural audio synthesis may arise from more natural models.
The work described in this thesis takes place in the context of data-driven integration of linguistic knowledge and acoustic information for pronunciation lexicon development.
The Idiap Research Institute invites applications for three post-doctoral positions in the general areas of multilingual speech recognition, low-resourced speech recognition, domain adaptation and cross-lingual indexing. The positions are currently funded by EU H2020 and IARPA projects, ranging from one to five years in duration.
The Idiap Research Institute seeks qualified candidates for one PhD student position in the field of speech processing and classification of motor speech disorders from the speech signal (pathological speech analysis and synthesis for perception and production).
When speech processing systems are designed for use in multilingual environments, additional complexity is introduced. Identifying when language switching has occurred, predicting how cross-lingual terms will be pronounced, obtaining sufficient speech data from diverse language backgrounds: such factors all complicate the development of practical speech-oriented systems. In this talk, I will discuss our research group's experience in building speech recognition systems for the South African environment, one in which 11 official languages are recognised. I will also show how this relates to our participation in the BABEL project, a recent 5-year international collaborative project aimed at solving the spoken term detection task in under-resourced languages.
Prof. Marelie Davel will give a talk entitled: Multilingual speech recognition in under-resourced environments
12th of April, 2017 in Prague (ANS CR)
Winner of the IEEE Ganesh N. Ramaswamy Memorial Student Grant:
On March 3rd, 2017, Pierre-Edouard Honnet publicly defended his PhD thesis entitled "Intonation Modelling for Speech Synthesis and Emphasis Preservation".
The Idiap Research Institute seeks a qualified candidate for a postdoctoral position in automatic speaker recognition.
The Idiap Research Institute seeks a qualified candidate for a postdoctoral position in automatic spam call detection.
In October 2016, the National Institute of Standards and Technology (NIST), USA, organized the Speaker Recognition Evaluation (SRE), one of an ongoing series of speaker recognition system evaluations conducted by NIST since 1996.
The Idiap Research Institute invites applications for a post-doctoral position in automatic speech recognition. The position is funded by a new industrial project with a leading credit card company in Switzerland. The research and development project will focus on combining speech recognition with speaker verification technologies. The research will be carried out in collaboration with other projects (i.e., European H2020 projects) already running at the Idiap Research Institute.
How is it possible for people with bad intentions to get access to data from our smartphone or our GPS?
The Idiap Research Institute invites applications for one internship position in the domain of speech processing. The position will be aligned with a European project on speaker identification.
In the context of a Swiss NSF grant, we seek a PhD student to work on multilingual and affective speech synthesis.
With regard to devices using voice control, the Swiss German population has so far been left out in the cold. At best, smartphones, smart TVs and other tools of this kind understand High German, but stand no chance when it comes to the Swiss German dialect. This will change soon.
Marzieh Razavi, Idiap/EPFL Ph.D. student, was granted the LTC 2015 Best Student Paper Award for the paper entitled "Pronunciation Lexicon Development for Under-Resourced Languages Using Automatically Derived Subword Units: A Case Study on Scottish Gaelic".
Raphael Ullmann, Idiap/EPFL Ph.D. student, was awarded an ISCA 2015 Best Student Paper Award for the paper entitled "Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification".
On October 1st, 2014, Ramya Rasipuram publicly defended her PhD thesis entitled "Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling". She received the EPFL PhD thesis diploma from her thesis director, Hervé Bourlard.
On September 4th, 2013, Afsaneh Asaei publicly defended her PhD thesis entitled "Model-based Sparse Component Analysis for Multiparty Distant Speech Recognition". She received the EPFL diploma from her doctoral advisor, Hervé Bourlard.
Prof. Hervé Bourlard, Idiap Director and EPFL Full Professor, has been named as a Fellow by the International Speech Communication Association (ISCA).
The surge of technological tools is accelerating communication and transforming our relationship with the world. Based in Martigny, Idiap, which conducts fundamental research projects at the highest level, works on improving human-machine interaction and optimizing human communication. This prestigious institute is committed to scientific progress in the service of people. Interview with its director, Hervé Bourlard, a world expert in speech processing and also a professor at the Ecole polytechnique fédérale de Lausanne (EPFL).
In its issue of April 17, 2013, the Swiss economic journal Bilan released its annual ranking of the top 300 "most influential personalities" in Switzerland.
Credit: RTS.ch, La Première, CQFD programme, 5 October 2012
A new book on multimodal signal processing for the analysis of human communication was published by Cambridge University Press on June 7, 2012. The book was edited by Hervé Bourlard and Andrei Popescu-Belis together with colleagues from the University of Edinburgh, and five other Idiap researchers contributed chapters to it.
In December 2011, Idiap signed a Memorandum of Understanding (MoU) with the Indian Institute of Technology Guwahati (IITG) and the International Institute of Information Technology (IIIT) Hyderabad, India.
Afsaneh Asaei's paper was selected as the winner of the IEEE Spoken Language Processing Student Travel Grant.
Multimodal Signal Processing: Methods and Techniques to Build Multimodal Interactive Systems, by Jean-Philippe Thiran, Hervé Bourlard, and Ferran Marques, Academic Press (23 November 2009), 448 pages, ISBN-10: 0123748259.
Special Issue on Mobile Media Search
by T. Dutoit, L. Couvreur, H. Bourlard