Speech & Audio Processing
The expertise of the group encompasses statistical automatic speech recognition (based on hidden Markov models or on hybrid systems exploiting connectionist approaches), text-to-speech, and generic audio processing, covering sound source localization, microphone arrays, speaker diarization, audio indexing, very low bit-rate speech coding, and perceptual background noise analysis for telecommunication systems.
Group News
As speech processing expands into more diverse tasks, designing novel neural network models capable of learning meaningful speech representations gains importance.
This week, Idiap's speech and machine learning group presented their joint work on Fast Transformers with Clustered Attention.
Prof. Esaú Villatoro Tello, visiting professor at Idiap since September 2019, joins the SARAL project to work on the design of Cross-lingual Information Retrieval tools.
Lei joined Idiap in October 2019 for an internship in the framework of the China Scholarship Council visiting scholar program. During her internship, she worked on neural network-based mappings for single-channel dereverberation and noise reduction.
Group Job Openings
- PhD position in Neural Architectures for Speech Technology (last modified Dec 21, 2020)
- In the context of a Swiss NSF grant, we seek a PhD student to work in the general area of neural architectures for speech technology.
- Internship position in the domain of automatic speech recognition (last modified Jul 10, 2020)
- The position will be aligned with a European project on automatic speech recognition of air-traffic communication, funded through SESAR Joint Undertaking exploratory research.
- Postdoc positions in automatic speech recognition (last modified Jul 10, 2020)
- The Idiap Research Institute seeks qualified candidates for several postdoctoral positions in the field of Deep Learning applied to acoustic and/or language modeling in Automatic Speech Recognition (ASR).
Current Group Members

- BOURLARD, Hervé (Director, EPFL Full Professor)
- MAGIMAI DOSS, Mathew (Senior Researcher)
- MOTLICEK, Petr (Senior Researcher)
- GARNER, Philip (Senior Researcher)
- IMSENG, David (Research Associate)
- MADIKERI, Srikanth (Research Associate)
- PARIDA, Shantipriya (Postdoctoral Researcher)
- PRASAD, Ravi (Shankar) (Postdoctoral Researcher)
- SARFJOO, Saeed (Seyyed) (Postdoctoral Researcher)
- KHOSRAVANI, Abbas (Postdoctoral Researcher)
- ANTONELLO, Niccolò (Postdoctoral Researcher)
- HE, Weipeng (Postdoctoral Researcher)
- MARELLI, François (Research Assistant)
- HERMANN, Enno (Research Assistant)
- FABIEN, Maël (Research Assistant)
- SCHNELL, Bastian (Research Assistant)
- BITTAR, Alexandre (Research Assistant)
- DUBAGUNTA, Pavankumar (Subrahmanya) (Research Assistant)
- TORNAY, Sandrine (Research Assistant)
- MOSTAANI, Zohreh (Research Assistant)
- KABIL, Selen (Research Assistant)
- FRITSCH, Julian (David) (Research Assistant)
- COPPIETERS DE GIBSON, Louise (Research Assistant)
- JANBAKHSHI, Parvaneh (Research Assistant)
- ZULUAGA GOMEZ, Juan Pablo (Research Assistant)
- PRASAD, Amrutha (Research Assistant)
- NIGMATULINA, Iuliia (Research Assistant)
- VYAS, Apoorv (Research Assistant)
- BRAUN, Rudolf (Arseni) (Junior Developer)
- VILLATORO TELLO, Esaú (Sabbatical Academic Visitor)
Alumni
- ABROL, Vinayak
- AICHINGER, Ida
- AJMERA, Jitendra
- ARADILLA ZAPATA, Guillermo
- ATHINEOS, Marios
- BABY, Deepak
- BAHAADINI, Sara
- BARBER, David
- BENZEGHIBA, Mohamed (Faouzi)
- CANDY, Romain
- CEREKOVIC, Aleksandra
- CEVHER, Volkan
- CHAVARRIAGA, Ricardo
- COLLADO, Thierry
- CRITTIN, Frank
- DEY, Subhadeep
- DIGHE, Pranay
- DINES, John
- DRYGAJLO, Andrzej
- DUFFNER, Stefan
- GALAN MOLES, Ferran
- GOMEZ ALANIS, Alejandro
- GRANDVALET, Yves
- GRANGIER, David
- HAGEN, Astrid
- HAJIBABAEI, Mahdi
- HALPERN, Bence
- HERMANSKY, Hynek
- HONNET, Pierre-Edouard
- IKBAL, Shajith
- IVANOVA, Maria
- JEANNINGROS, Loïc
- KETABDAR, Hamed
- KHONGLAH, Banriskhem (Kayang)
- KODRASI, Ina
- KRSTULOVIC, Sacha
- LATHOUD, Guillaume
- LAZARIDIS, Alexandros
- LI, Weifeng
- LOUPI, Dimitra
- MARIÉTHOZ, Johnny
- MARTINS, Renato
- MASSON, Olivier
- MBANGA NDJOCK, Pierre (Armel)
- MCCOWAN, Iain
- MENDOZA, Viviana
- MILLÁN, José del R.
- MILLIUS, Loris
- MOORE, Darren
- MORRIS, Andrew
- MOULIN, François
- MUCKENHIRN, Hannah
- NALLANTHIGHAL, Venkata Srikanth
- NATUREL, Xavier
- PARTHASARATHI, Sree Hari Krishnan
- PINTO, Francisco
- POTARD, Blaise
- RAZAVI, Marzieh
- SAMUI, Suman
- SEBASTIAN, Jilt
- SHAHNAWAZUDDIN, Syed
- SHAKAS, Alexis
- SHANKAR, Ravi
- SHARMA, Shivam
- SRINIVASAMURTHY, Ajay
- STEPHENSON, Todd
- STERPU, George
- SZASZAK, György
- TONG, Sibo
- TYAGI, Vivek
- ULLMANN, Raphael
- VALENTE, Fabio
- VITEK, Radovan
- VLASENKO, Bogdan
- WANG, Lei
- WELLNER, Pierre
- ZHAN, Qingran
Active Research Grants
- ADEL - Automatic Detection of Leadership from Voice and Body
- AI4EU - A European AI On Demand Platform and Ecosystem
- ATCO2 - Automatic collection and processing of voice data from air-traffic communications
- DAHL - Domain Adaptation via Hierarchical Lexicons
- EVOLANG - Evolving Language
- HAAWAII - Highly Automated Air Traffic Controller Workstations with Artificial Intelligence Integration
- MOSPEEDI - MoSpeeDi. Motor Speech Disorders: characterizing phonetic speech planning and motor speech programming/execution and their impairments
- NATAI - The Nature of Artificial Intelligence
- ROXANNE - Real time network, text, and speaker analytics for combating organized crime
- SARAL - Summarization and domain-Adaptive Retrieval of Information Across Languages
- SHISSM - Sparse and hierarchical Structures for Speech Modeling
- TAPAS - Training Network on Automatic Processing of PAthological Speech
- TIPS - Towards Integrated processing of Physiological and Speech signals
Past Research Grants
- SM2 - Extracting Semantic Meaning from Spoken Material
- MASS - Multilingual Affective Speech Synthesis
- SMILE - Scalable Multimodal sign language Technology for sIgn language Learning and assessmEnt
- SHAPED - Speech Hybrid Analytics Platform for consumer and Enterprise Devices
- FLOSS - Flexible Linguistically-guided Objective Speech aSsessment
- MUMMER - MultiModal Mall Entertainment Robot
- MPM - Multimodal People Monitoring
- REAPPS - Reinforced audio processing via physiological signals
- PHASER-QUAD - Parsimonious Hierarchical Automatic Speech Recognition and Query Detection
- DEVEL-IA - Formation « Développeurs spécialisés en Intelligence Artificielle » selon le modèle de formation continue duale postgrade
- UNITS - Unified Speech Processing Framework for Trustworthy Speaker Recognition
- SUMMA - Scalable Understanding of Multilingual Media
- MEGANEPRO - Myo-Electricity, Gaze and Artificial Intelligence for Neurocognitive Examination and Prosthetics
- DEEPCHARISMA - Deep Learning Charisma
- COBALT - Content Based Call Filtering
- SIIP - Speaker Identification Integrated Project
- MALORCA - Machine Learning of Speech Recognition Models for Controller Assistance
- MULTIVEO - High Accuracy Speaker-Independent Multilingual Automatic Speech Recognition System
- ELEARNING-VALAIS_3.0 - eLearning-Valais 3.0
- ESGEM - Enhanced Swiss German mEdia Monitoring
- NMTBENCHMARK - Training and Benchmarking Neural MT and ASR Systems for Swiss Languages
- ADDG2SU_EXT. - Flexible Acoustic Data-Driven Grapheme to Subword Unit Conversion
- OMSI_EXT - Objective Measurement of Speech Intelligibility
- RECAPP - Making speech technology accessible to Swiss people
- SIWIS - Spoken Interaction with Interpretation in Switzerland
- A-MUSE - Adaptive Multilingual Speech Processing
- SP2 - SCOPES Project on Speech Prosody
- PHASER - Parsimonious Hierarchical Automatic Speech Recognition
- ADDG2SU - Flexible Acoustic Data-Driven Grapheme to Subword Unit Conversion
- L-PASS - Linguistic-Paralinguistic Speech Synthesis
- SCOREL2 - Automatic scoring and adaptive pedagogy for oral language learning
- DEEPSTD-EXT - Universal Spoken Term Detection with Deep Learning (extension)
- ROCKIT - Roadmap for Conversational Interaction Technologies
- DBOX - D-Box: A generic dialog box for multilingual conversational applications
- OMSI_ARMASUISSE - Objective Measurement of Speech Intelligibility
- BIOWATCH - Biowatch
- RECOD2014 - Low bit-rate speech coding
- AAMASSE - Acoustic Model Adaptation toward Spontaneous Speech and Environment
- INEVENT - Accessing Dynamic Networked Multimedia Events
- FLEXASR - Flexible Grapheme-Based Automatic Speech Recognition
- RECOD2013 - Low bit-rate speech coding
- PANDA - Perceptual Background Noise Analysis for the Newest Generation of Telecommunication Systems
- IM2-3 - Interactive Multimodal Information Management Phase 3
- DAUM2012 - Domain Adaptation Using Sub-Space Models
- RECOD2012 - Very Low bit-rate speech coding
- SCALE - Speech Communication with Adaptive Learning
- V-FAST - Vocal-tract based Fast Adaptation for Speech Technology
- DAUM - Domain Adaptation Using Sub-Space Models
- MULTI08EXT - Multimodal Interaction and Multimedia Data Mining
- ICS-2010 - Interactive Cognitive Systems
- TAO-CSR - Task Adaptation and Optimisation for Conversational Speech Recognition
- AMSP - Auditory-motivated signal processing and applications to robust speech enhancement and recognition
- TA2-EEU - Together Anywhere, Together Anytime - Enlarged European Union
- TA2 - Together Anywhere, Together Anytime
- RECOD - Low bit-rate speech coding
- DM3 - Distributed MultiModal Media server, a low cost large capacity high throughput data storage system
- MULTI08 - Multimodal Interaction and Multimedia Data Mining
- AMIDA - Augmented Multi-party Interaction with Distance Access