“To understand others implies not only to get their point, but also to understand their feelings and emotions.” — Daniel Goleman, Emotional Intelligence.
Speech & Audio Processing
The group's expertise encompasses statistical automatic speech recognition (based on hidden Markov models or hybrid systems exploiting connectionist approaches), text-to-speech synthesis, and generic audio processing, covering sound source localization, microphone arrays, speaker diarization, audio indexing, very low bit-rate speech coding, and perceptual background noise analysis for telecommunication systems.
Group News
PARIDA, Shantipriya, Idiap postdoctoral researcher, has been invited to speak at the Indo-German SPARC Symposium.
3rd ROXANNE Newsletter
Alterations in the respiratory and speech production systems result in changes in speech. Therefore, the speech signal, which can be acquired non-invasively, could be used to predict breathing patterns. There is growing interest in this direction, which has gained further momentum with the COVID-19 pandemic.
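As a very rough illustration of the idea (not the group's actual method), low-energy stretches in a speech signal often coincide with pauses where inhalation can occur, so a crude first step toward relating speech to breathing is frame-wise energy analysis. The sketch below, using only NumPy and entirely hypothetical parameter choices, flags low-energy frames in a toy signal:

```python
import numpy as np

def short_time_energy(signal, frame_len=400, hop=160):
    """Frame-wise energy (e.g. 25 ms frames with a 10 ms hop at 16 kHz)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([np.sum(signal[i * hop : i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def pause_mask(energy, rel_threshold=0.1):
    """Flag frames whose energy falls below a fraction of the median frame energy."""
    return energy < rel_threshold * np.median(energy)

# Toy example: two "voiced" noise bursts separated by a silent gap
# standing in for a breath pause.
rng = np.random.default_rng(0)
sig = np.concatenate([rng.standard_normal(4000),  # voiced segment
                      np.zeros(2000),             # pause (possible inhalation)
                      rng.standard_normal(4000)]) # voiced segment
energy = short_time_energy(sig)
mask = pause_mask(energy)
print(f"{mask.sum()} of {len(mask)} frames flagged as pauses")
```

Actual breathing-pattern prediction would of course go well beyond this, e.g. learning a regression from richer spectral features to a measured respiration signal.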
Highly Automated Air Traffic Controller Workstations with Artificial Intelligence Integration
Group Job Openings
- Speech recognition and natural language processing for digital interviews — by admin — last modified Mar 09, 2021
- In the context of a Swiss NSF grant, we seek a PhD student to work on speech recognition and natural language processing for digital interviews. The project is a collaboration between the universities of Neuchâtel and Lausanne, and Idiap Research Institute.
- Several Openings for Cross-Disciplinary Senior Researcher positions — by admin — last modified Mar 01, 2021
- With the growth of the institute, in addition to increased Federal funding to further support its activities, Idiap is opening several additional permanent senior research scientist positions.
- PhD position in Neural Architectures for Speech Technology — by admin — last modified Dec 21, 2020
- In the context of a Swiss NSF grant, we seek a PhD student to work in the general area of neural architectures for speech technology.
- Internship position in domain of automatic speech recognition — by admin — last modified Jul 10, 2020
- The position will be aligned with a European project on automatic speech recognition of air-traffic communication through SESAR Joint Undertaking exploratory research.
- Postdoc positions in automatic speech recognition — by admin — last modified Jul 10, 2020
- The Idiap Research Institute seeks qualified candidates for several postdoctoral positions in deep learning applied to acoustic and/or language modeling in Automatic Speech Recognition (ASR).
Current Group Members

BOURLARD, Hervé
(Director, EPFL Full Professor)
- website

MAGIMAI DOSS, Mathew
(Senior Researcher)
- website

MOTLICEK, Petr
(Senior Researcher)
- website

GARNER, Philip
(Senior Researcher)
- website

MADIKERI, Srikanth
(Research Associate)
- website

BRAUN, Rudolf (Arseni)
(Research Associate)
- website

KHOSRAVANI, Abbas
(Postdoctoral Researcher)
- website

ANTONELLO, Niccolò
(Postdoctoral Researcher)
- website

SARFJOO, Saeed (Seyyed)
(Postdoctoral Researcher)
- website

PARIDA, Shantipriya
(Postdoctoral Researcher)
- website

TORNAY, Sandrine
(Postdoctoral Researcher)
- website

PRASAD, Ravi (Shankar)
(Postdoctoral Researcher)
- website

VLASENKO, Bogdan
(Postdoctoral Researcher)
- website

HE, Weipeng
(Postdoctoral Researcher)
- website

HERMANN, Enno
(Research Assistant)
- website

FABIEN, Maël
(Research Assistant)
- website

BITTAR, Alexandre
(Research Assistant)
- website

FRITSCH, Julian (David)
(Research Assistant)
- website

COPPIETERS DE GIBSON, Louise
(Research Assistant)
- website

VYAS, Apoorv
(Research Assistant)
- website

JANBAKHSHI, Parvaneh
(Research Assistant)
- website

MOSTAANI, Zohreh
(Research Assistant)
- website

ZULUAGA GOMEZ, Juan Pablo
(Research Assistant)
- website

MARELLI, François
(Research Assistant)
- website

PRASAD, Amrutha
(Research Assistant)
- website

KABIL, Selen
(Research Assistant)
- website

NIGMATULINA, Iuliia
(Research Assistant)
- website

SARKAR, Eklavya
(Research Assistant)
- website

TARIGOPULA, Neha
(Research Assistant)
- website

SCHNELL, Bastian
(Research Assistant)
- website

DUBAGUNTA, Pavankumar (Subrahmanya)
(Research Assistant)
- website

SALAMIN, Chloé
(Trainee)
- website

EL HAJAL, Karl
(Master Student)

VASQUEZ-CORREA, Juan Camilo
(Trainee)

VILLATORO TELLO, Esaú
(Sabbatical Academic Visitor)
- website
Alumni
- ABROL, Vinayak
- AICHINGER, Ida
- AJMERA, Jitendra
- ARADILLA ZAPATA, Guillermo
- ATHINEOS, Marios
- BABY, Deepak
- BAHAADINI, Sara
- BARBER, David
- BENZEGHIBA, Mohamed (Faouzi)
- CANDY, Romain
- CEREKOVIC, Aleksandra
- CEVHER, Volkan
- CHAVARRIAGA, Ricardo
- COLLADO, Thierry
- CRITTIN, Frank
- DEY, Subhadeep
- DIGHE, Pranay
- DINES, John
- DRYGAJLO, Andrzej
- DUFFNER, Stefan
- GALAN MOLES, Ferran
- GOMEZ ALANIS, Alejandro
- GRANDVALET, Yves
- GRANGIER, David
- HAGEN, Astrid
- HAJIBABAEI, Mahdi
- HALPERN, Bence
- HERMANSKY, Hynek
- HONNET, Pierre-Edouard
- IKBAL, Shajith
- IMSENG, David
- IVANOVA, Maria
- JAIMES, Alejandro (Alex)
- JEANNINGROS, Loïc
- KETABDAR, Hamed
- KHONGLAH, Banriskhem (Kayang)
- KODRASI, Ina
- KRSTULOVIC, Sacha
- LATHOUD, Guillaume
- LAZARIDIS, Alexandros
- LI, Weifeng
- LOUPI, Dimitra
- MARIÉTHOZ, Johnny
- MARTINS, Renato
- MASSON, Olivier
- MBANGA NDJOCK, Pierre (Armel)
- MCCOWAN, Iain
- MENDOZA, Viviana
- MILLÁN, José del R.
- MILLIUS, Loris
- MOORE, Darren
- MORRIS, Andrew
- MOULIN, François
- MUCKENHIRN, Hannah
- NALLANTHIGHAL, Venkata Srikanth
- NATUREL, Xavier
- PARTHASARATHI, Sree Hari Krishnan
- PINTO, Francisco
- POTARD, Blaise
- RAZAVI, Marzieh
- SAMUI, Suman
- SEBASTIAN, Jilt
- SHAHNAWAZUDDIN, Syed
- SHAKAS, Alexis
- SHANKAR, Ravi
- SHARMA, Shivam
- SRINIVASAMURTHY, Ajay
- STEPHENSON, Todd
- STERPU, George
- SZASZAK, György
- TONG, Sibo
- TYAGI, Vivek
- ULLMANN, Raphael
- VALENTE, Fabio
- VITEK, Radovan
- WANG, Lei
- WELLNER, Pierre
- ZHAN, Qingran
Active Research Grants
- ADEL - Automatic Detection of Leadership from Voice and Body
- AI4EU - A European AI On Demand Platform and Ecosystem
- ATCO2 - Automatic collection and processing of voice data from air-traffic communications
- CMM - Conversation Member Match
- DAHL - Domain Adaptation via Hierarchical Lexicons
- EVOLANG - Evolving Language
- HAAWAII - Highly Automated Air Traffic Controller Workstations with Artificial Intelligence Integration
- MOSPEEDI - MoSpeeDi: Motor Speech Disorders: characterizing phonetic speech planning and motor speech programming/execution and their impairments
- NATAI - The Nature of Artificial Intelligence
- ROXANNE - Real time network, text, and speaker analytics for combating organized crime
- SARAL - Summarization and domain-Adaptive Retrieval of Information Across Languages
- SHISSM - Sparse and hierarchical Structures for Speech Modeling
- SMILE-II - Scalable Multimodal sign language technology for sIgn language Learning and assessmEnt, Phase II
- STARFISH - Safety and Speech Recognition with Artificial Intelligence in the Use of Air Traffic Control
- STEADI - Storytelling Algorithm for Digital Interviews
- TAPAS - Training Network on Automatic Processing of PAthological Speech
- TIPS - Towards Integrated processing of Physiological and Speech signals
- WAVE2-96 - H2020-SESAR-PJ.10-W2-Solution 96
Past Research Grants
- A-MUSE - Adaptive Multilingual Speech Processing
- AAMASSE - Acoustic Model Adaptation toward Spontaneous Speech and Environment
- ADDG2SU - Flexible Acoustic Data-Driven Grapheme to Subword Unit Conversion
- ADDG2SU_EXT - Flexible Acoustic Data-Driven Grapheme to Subword Unit Conversion
- AMIDA - Augmented Multi-party Interaction with Distance Access
- AMSP - Auditory-motivated signal processing and applications to robust speech enhancement and recognition
- BIOWATCH - Biowatch
- COBALT - Content Based Call Filtering
- DAUM - Domain Adaptation Using Sub-Space Models
- DAUM2012 - Domain Adaptation Using Sub-Space Models
- DBOX - D-Box: A generic dialog box for multilingual conversational applications
- DEEPCHARISMA - Deep Learning Charisma
- DEEPSTD-EXT - Universal Spoken Term Detection with Deep Learning (extension)
- DEVEL-IA - Training programme "Developers Specialized in Artificial Intelligence" based on the dual postgraduate continuing-education model
- DM3 - Distributed MultiModal Media server, a low cost large capacity high throughput data storage system
- ELEARNING-VALAIS_3.0 - eLearning-Valais 3.0
- ESGEM - Enhanced Swiss German mEdia Monitoring
- FLEXASR - Flexible Grapheme-Based Automatic Speech Recognition
- FLOSS - Flexible Linguistically-guided Objective Speech aSsessment
- ICS-2010 - Interactive Cognitive Systems
- IM2-3 - Interactive Multimodal Information Management Phase 3
- INEVENT - Accessing Dynamic Networked Multimedia Events
- L-PASS - Linguistic-Paralinguistic Speech Synthesis
- MALORCA - Machine Learning of Speech Recognition Models for Controller Assistance
- MASS - Multilingual Affective Speech Synthesis
- MEGANEPRO - Myo-Electricity, Gaze and Artificial Intelligence for Neurocognitive Examination and Prosthetics
- MPM - Multimodal People Monitoring
- MULTI08 - Multimodal Interaction and Multimedia Data Mining
- MULTI08EXT - Multimodal Interaction and Multimedia Data Mining
- MULTIVEO - High Accuracy Speaker-Independent Multilingual Automatic Speech Recognition System
- MUMMER - MultiModal Mall Entertainment Robot
- NMTBENCHMARK - Training and Benchmarking Neural MT and ASR Systems for Swiss Languages
- OMSI_ARMASUISSE - Objective Measurement of Speech Intelligibility
- OMSI_EXT - Objective Measurement of Speech Intelligibility
- PANDA - Perceptual Background Noise Analysis for the Newest Generation of Telecommunication Systems
- PHASER - Parsimonious Hierarchical Automatic Speech Recognition
- PHASER-QUAD - Parsimonious Hierarchical Automatic Speech Recognition and Query Detection
- REAPPS - Reinforced audio processing via physiological signals
- RECOD - Low bit-rate speech coding
- RECOD2012 - Very low bit-rate speech coding
- RECOD2013 - Low bit-rate speech coding
- RECOD2014 - Low bit-rate speech coding
- ROCKIT - Roadmap for Conversational Interaction Technologies
- SCALE - Speech Communication with Adaptive Learning
- SCOREL2 - Automatic scoring and adaptive pedagogy for oral language learning
- SHAPED - Speech Hybrid Analytics Platform for consumer and Enterprise Devices
- SIIP - Speaker Identification Integrated Project
- SIWIS - Spoken Interaction with Interpretation in Switzerland
- SM2 - Extracting Semantic Meaning from Spoken Material
- SMILE - Scalable Multimodal sign language Technology for sIgn language Learning and assessmEnt
- SP2 - SCOPES Project on Speech Prosody
- SUMMA - Scalable Understanding of Multilingual Media
- TA2 - Together Anywhere, Together Anytime
- TA2-EEU - Together Anywhere, Together Anytime - Enlarged European Union
- TAO-CSR - Task Adaptation and Optimisation for Conversational Speech Recognition
- UNITS - Unified Speech Processing Framework for Trustworthy Speaker Recognition
- V-FAST - Vocal-tract based Fast Adaptation for Speech Technology