In his research, Idiap student Bastian Schnell argues that affective TTS can be enabled by models that generalise better to the variability of speech, thanks to components that are interpretable by humans.
Speech & Audio Processing
The group's expertise encompasses statistical automatic speech recognition (based on hidden Markov models or hybrid systems exploiting connectionist approaches), text-to-speech synthesis, and generic audio processing, covering sound source localization, microphone arrays, speaker diarization, audio indexing, very low bit-rate speech coding, and perceptual background noise analysis for telecommunication systems.
Group News
Arriving in 2019 for a sabbatical year from the University of Mexico, Esaú Villatoro has now been working at Idiap for more than two years. Between publishing his work and adapting to Swiss life, he looks back on his experience at the institute.
Idiap Research Institute and the School of Engineering at EPFL invite applications for the directorship of Idiap. The successful candidate will also hold a faculty position as full professor at EPFL School of Engineering.
Every year, the Institute nominates two students for its internal awards. In 2021, the Best Paper Award went to Suhan Shetty and the Best Student Award went to Parvaneh Janbakhshi. Congratulations!
Access to information remains a challenge for disabled people, even as communication channels multiply. An international consortium of researchers and private and public partners, led by the University of Zurich and including Idiap and Icare on the French-speaking side of Switzerland, was granted 6 million Swiss francs by Innosuisse, matched by 6 million from private partners, to take up this challenge.
Group Job Openings
Our group regularly posts job openings ranging from internships to researcher positions. To check the opportunities currently available or to submit a speculative application, use the link below.
- Speech & Audio Processing
- Postdoc positions in automatic speech recognition F/H
- Internship position in the domain of automatic speech recognition F/H
- Speculative Application for the Speech and Audio Processing group F/H
- Idiap Fellowship for Female Researchers F/H
- Idiap Academic Visitor Program F/H
- Valais-Wallis Ambition initiative for PhDs and Postdocs F/H
Current Group Members
The group is led by Hervé Bourlard.
Alumni
- ABROL, Vinayak
- AICHINGER, Ida
- AJMERA, Jitendra
- ANTONELLO, Niccolò
- ARADILLA ZAPATA, Guillermo
- ATHINEOS, Marios
- BABY, Deepak
- BAHAADINI, Sara
- BARBER, David
- BENZEGHIBA, Mohamed (Faouzi)
- CANDY, Romain
- CEREKOVIC, Aleksandra
- CEVHER, Volkan
- CHAVARRIAGA, Ricardo
- COLLADO, Thierry
- CRITTIN, Frank
- DEY, Subhadeep
- DIGHE, Pranay
- DINES, John
- DRYGAJLO, Andrzej
- DUFFNER, Stefan
- FABIEN, Maël
- GALAN MOLES, Ferran
- GOMEZ ALANIS, Alejandro
- GRANDVALET, Yves
- GRANGIER, David
- HAGEN, Astrid
- HAJIBABAEI, Mahdi
- HALPERN, Bence
- HE, Weipeng
- HERMANSKY, Hynek
- HONNET, Pierre-Edouard
- IKBAL, Shajith
- IMSENG, David
- IVANOVA, Maria
- JAIMES, Alejandro (Alex)
- JEANNINGROS, Loïc
- KETABDAR, Hamed
- KHODABAKHSHANDEH, Hamid
- KHONGLAH, Banriskhem (Kayang)
- KHOSRAVANI, Abbas
- KODRASI, Ina
- KRSTULOVIC, Sacha
- LATHOUD, Guillaume
- LAZARIDIS, Alexandros
- LI, Weifeng
- LINKE, Julian
- LOUPI, Dimitra
- MARIÉTHOZ, Johnny
- MARTINS, Renato
- MASSON, Olivier
- MBANGA NDJOCK, Pierre (Armel)
- MCCOWAN, Iain
- MENDOZA, Viviana
- MILLÁN, José del R.
- MILLIUS, Loris
- MOORE, Darren
- MORRIS, Andrew
- MOULIN, François
- MUCKENHIRN, Hannah
- NALLANTHIGHAL, Venkata Srikanth
- NATUREL, Xavier
- PARIDA, Shantipriya
- PARTHASARATHI, Sree Hari Krishnan
- PINTO, Francisco
- POTARD, Blaise
- RAZAVI, Marzieh
- SAMUI, Suman
- SEBASTIAN, Jilt
- SHAHNAWAZUDDIN, Syed
- SHAKAS, Alexis
- SHANKAR, Ravi
- SHARMA, Shivam
- SRINIVASAMURTHY, Ajay
- STEPHENSON, Todd
- STERPU, George
- SZASZAK, György
- TONG, Sibo
- TYAGI, Vivek
- ULLMANN, Raphael
- VALENTE, Fabio
- VASQUEZ-CORREA, Juan Camilo
- VITEK, Radovan
- WANG, Lei
- WELLNER, Pierre
- ZHAN, Qingran
Active Research Grants
- CMM - Conversation Member Match
- CRITERIA - Comprehensive data-driven Risk and Threat Assessment Methods for the Early and Reliable Identification, Validation and Analysis of migration-related risks
- EMIL - Emotion in the loop – a step towards a comprehensive closed-loop deep brain stimulation in Parkinson’s disease
- EVOLANG - Evolving Language
- HAAWAII - Highly Automated Air Traffic Controller Workstations with Artificial Intelligence Integration
- IICT - Inclusive Information and Communication Technologies
- NAST - Neural Architectures for Speech Technology
- NATAI - The Nature of Artificial Intelligence
- ROXANNE - Real time network, text, and speaker analytics for combating organized crime
- SHISSM - Sparse and hierarchical Structures for Speech Modeling
- SMILE-II - Scalable Multimodal sign language technology for sIgn language Learning and assessmEnt Phase-II
- STARFISH - Safety and Speech Recognition with Artificial Intelligence in the Use of Air Traffic Control
- STEADI - Storytelling and first impressions in face-to-face and algorithm-powered digital interviews
- TAPAS - Training Network on Automatic Processing of PAthological Speech
- TIPS - Towards Integrated processing of Physiological and Speech signals
- WAVE2-96 - H2020-SESAR-PJ.10-W2-Solution 96
Past Research Grants
- A-MUSE - Adaptive Multilingual Speech Processing
- AAMASSE - Acoustic Model Adaptation toward Spontaneous Speech and Environment
- ADDG2SU - Flexible Acoustic Data-Driven Grapheme to Subword Unit Conversion
- ADDG2SU_EXT. - Flexible Acoustic data-driven Grapheme to Subword Unit Conversion
- ADEL - Automatic Detection of Leadership from Voice and Body
- AI4EU - A European AI On Demand Platform and Ecosystem
- AMIDA - Augmented Multi-party Interaction with Distance Access
- AMSP - Auditory-motivated signal processing and applications to robust speech enhancement and recognition
- ATCO2 - Automatic collection and processing of voice data from air-traffic communications
- BIOWATCH - Biowatch
- COBALT - Content Based Call Filtering
- DAHL - Domain Adaptation via Hierarchical Lexicons
- DAUM - Domain Adaptation Using Sub-Space Models
- DAUM2012 - Domain Adaptation Using Sub-Space Models
- DBOX - D-Box: A generic dialog box for multilingual conversational applications
- DEEPCHARISMA - Deep Learning Charisma
- DEEPSTD-EXT - Universal Spoken Term Detection with Deep Learning (extension)
- DEVEL-IA - Training programme "Developers specialized in Artificial Intelligence" based on the dual postgraduate continuing-education model
- DM3 - Distributed MultiModal Media server, a low cost large capacity high throughput data storage system
- ELEARNING-VALAIS_3.0 - eLearning-Valais 3.0
- ESGEM - Enhanced Swiss German mEdia Monitoring
- FLEXASR - Flexible Grapheme-Based Automatic Speech Recognition
- FLOSS - Flexible Linguistically-guided Objective Speech aSsessment
- ICS-2010 - Interactive Cognitive Systems
- IM2-3 - Interactive Multimodal Information Management Phase 3
- INEVENT - Accessing Dynamic Networked Multimedia Events
- L-PASS - Linguistic-Paralinguistic Speech Synthesis
- MALORCA - Machine Learning of Speech Recognition Models for Controller Assistance
- MASS - Multilingual Affective Speech Synthesis
- MEGANEPRO - Myo-Electricity, Gaze and Artificial Intelligence for Neurocognitive Examination and Prosthetics
- MOSPEEDI - Motor Speech Disorders: characterizing phonetic speech planning and motor speech programming/execution and their impairments
- MPM - Multimodal People Monitoring
- MULTI08 - Multimodal Interaction and Multimedia Data Mining
- MULTI08EXT - Multimodal Interaction and Multimedia Data Mining
- MULTIVEO - High Accuracy Speaker-Independent Multilingual Automatic Speech Recognition System
- MUMMER - MultiModal Mall Entertainment Robot
- NMTBENCHMARK - Training and Benchmarking Neural MT and ASR Systems for Swiss Languages
- PANDA - Perceptual Background Noise Analysis for the Newest Generation of Telecommunication Systems
- PHASER - Parsimonious Hierarchical Automatic Speech Recognition
- PHASER-QUAD - Parsimonious Hierarchical Automatic Speech Recognition and Query Detection
- REAPPS - Reinforced audio processing via physiological signals
- ROCKIT - Roadmap for Conversational Interaction Technologies
- SCALE - Speech Communication with Adaptive Learning
- SCOREL2 - Automatic scoring and adaptive pedagogy for oral language learning
- SHAPED - Speech Hybrid Analytics Platform for consumer and Enterprise Devices
- SIIP - Speaker Identification Integrated Project
- SIWIS - Spoken Interaction with Interpretation in Switzerland
- SM2 - Extracting Semantic Meaning from Spoken Material
- SMILE - Scalable Multimodal sign language Technology for sIgn language Learning and assessmEnt
- SP2 - SCOPES Project on Speech Prosody
- SUMMA - Scalable Understanding of Multilingual Media
- TA2 - Together Anywhere, Together Anytime
- TA2-EEU - Together Anywhere, Together Anytime - Enlarged European Union
- TAO-CSR - Task Adaptation and Optimisation for Conversational Speech Recognition
- UNITS - Unified Speech Processing Framework for Trustworthy Speaker Recognition
- V-FAST - Vocal-tract based Fast Adaptation for Speech Technology