Speech and Audio Processing

Speech processing has been one of the mainstays of Idiap’s research portfolio for many years. Today the speech group is still the largest within the institute, and Idiap continues to be recognised as a leader in the field. The expertise of the group encompasses statistical automatic speech recognition (based on hidden Markov models, or hybrid systems exploiting connectionist approaches), text-to-speech, and generic audio processing (covering sound source localization, microphone arrays, speaker diarization, audio indexing, very low bit-rate speech coding, and perceptual background noise analysis for telecommunication systems).

Current Group Members

(Director, EPFL Full Professor)

(Senior Researcher)

GARNER, Philip
(Senior Researcher)

(Senior Researcher)

MADIKERI, Srikanth
(Research Associate)

(Research Associate)

(Research Associate)

BABY, Deepak
(Postdoctoral Researcher)

(Postdoctoral Researcher)

PRASAD, Ravi (Shankar)
(Postdoctoral Researcher)

(Postdoctoral Researcher)

PARIDA, Shantipriya
(Postdoctoral Researcher)

(Research Assistant)

BITTAR, Alexandre
(Research Assistant)

TONG, Sibo
(Research Assistant)

TORNAY, Sandrine
(Research Assistant)

(Research Assistant)

HE, Weipeng
(Research Assistant)

FRITSCH, Julian (David)
(Research Assistant)

ZHAN, Qingran
(Research Assistant)

(Research Assistant)

(Research Assistant)

KABIL, Selen
(Research Assistant)

SCHNELL, Bastian
(Research Assistant)

VYAS, Apoorv
(Research Assistant)

MARELLI, François
(Research Assistant)

BRAUN, Rudolf (Arseni)
(Research Intern)

(Sabbatical Academic Visitor)


  • AJMERA, Jitendra
  • ARADILLA ZAPATA, Guillermo
  • ATHINEOS, Marios
  • BARBER, David
  • BENZEGHIBA, Mohamed (Faouzi)
  • CEREKOVIC, Aleksandra
  • CEVHER, Volkan
  • CHAVARRIAGA, Ricardo
  • COLLADO, Thierry
  • CRITTIN, Frank
  • DINES, John
  • DRYGAJLO, Andrzej
  • DUFFNER, Stefan
  • GALAN MOLES, Ferran
  • GRANGIER, David
  • HAGEN, Astrid
  • HERMANSKY, Hynek
  • HONNET, Pierre-Edouard
  • IKBAL, Shajith
  • IVANOVA, Maria
  • KETABDAR, Hamed
  • LATHOUD, Guillaume
  • LAZARIDIS, Alexandros
  • LI, Weifeng
  • MARIÉTHOZ, Johnny
  • MARTINS, Renato
  • MASSON, Olivier
  • MCCOWAN, Iain
  • MILLÁN, José del R.
  • MOORE, Darren
  • MORRIS, Andrew
  • MOSTAANI, Zohreh
  • MOULIN, François
  • NATUREL, Xavier
  • PARTHASARATHI, Sree Hari Krishnan
  • PINTO, Francisco
  • POTARD, Blaise
  • SHANKAR, Ravi
  • SZASZAK, György
  • TYAGI, Vivek
  • ULLMANN, Raphael
  • VALENTE, Fabio
  • WELLNER, Pierre

Group News

Job opening: Senior Researcher positions at the Idiap Research Institute
research — Feb 28, 2020

The Idiap Research Institute is an internationally renowned independent research institute. Since its inception in 1991 it has been affiliated with EPFL and the University of Geneva, and has been funded by the Federal Government, the State of Valais and the City of Martigny.

Spiking neural architectures for speech prosody
education — Apr 11, 2019

In the context of a new Swiss NSF grant, we seek a PhD student to work on neural architectures for speech technology. At the outset, we expect the work to involve spiking neural networks, and to be applied to synthesis of speech prosody.

R&D Engineer in Speech Processing
education — Feb 28, 2019

To further advance our research and development along with technology transfer activities, the Idiap Research Institute seeks one or more R&D engineers.

Domain adaptation for speech and language processing
education — Feb 08, 2019

The Idiap Research Institute, in partnership with Swisscom, invites applications for a post-doctoral (or similarly qualified) position in automatic speech recognition (ASR) and natural language processing (NLP). The position is funded by Swisscom, with a view to a long-term collaboration.

Idiap has a new opening for a post-doctoral position on joint modeling of speech and physiological signals
education — Mar 16, 2018

The Idiap Research Institute, together with the Swiss Center for Electronics and Microtechnology (CSEM), seeks a qualified candidate for a postdoctoral position on joint modeling of speech and physiological signals. The research and development will take place in the context of the CSEM-Idiap collaboration project AUDIO: Reinforced Audio Processing via Physiological Signals. Briefly, this project combines Idiap's expertise in speech processing and CSEM's expertise in physiological signal acquisition to develop a platform that acquires speech signals synchronously with physiological signals and body sounds, and models them jointly for human conversation analysis.

3 promotions to Senior Researcher at Idiap
institute — Jan 30, 2018

Following a very strict evaluation process, comprising nomination by Idiap's management and formal approval by the Scientific College, Idiap is pleased to announce the promotion of three of its researchers to Senior Researcher:

Idiap has a new opening for two PhD positions in Pathological Speech Processing
education — Dec 15, 2017

The Idiap Research Institute seeks qualified candidates for two PhD positions in the area of pathological speech processing. The research and development will take place in the context of the EU-funded Marie Skłodowska-Curie Actions Innovative Training Network (European Training Network) TAPAS: Training Network on Automatic Processing of Pathological Speech.

Idiap has a new opening for an Internship position in Spiking Networks for Prosody Synthesis
education — Oct 18, 2017

Many audio events, such as those that happen in the vocal tract when speaking, can be characterised as having a start time and duration. The duration can be several samples or frames. However, this is at odds with current audio synthesis methods, which tend to use fixed-duration frame-based models. It follows that more natural audio synthesis may arise from more natural models.

Idiap Speaker Series: 'Multilingual speech recognition in under-resourced environments' (webcast now available)
research — Jun 19, 2017

When speech processing systems are designed for use in multilingual environments, additional complexity is introduced. Identifying when language switching has occurred, predicting how cross-lingual terms will be pronounced, obtaining sufficient speech data from diverse language backgrounds: such factors all complicate the development of practical speech-oriented systems. In this talk, I will discuss our research group's experience in building speech recognition systems for the South African environment, one in which 11 official languages are recognised. I will also show how this relates to our participation in the BABEL project, a recent 5-year international collaborative project aimed at solving the spoken term detection task in under-resourced languages.

Idiap has a new opening for a Post-doctoral position in automatic speech recognition
education — Nov 08, 2016

The Idiap Research Institute invites applications for a post-doctoral position in automatic speech recognition. The position is funded by a new industrial project with a leading credit card company in Switzerland. The research and development project will focus on combining speech recognition and speaker verification technologies. The research will be carried out in collaboration with other projects (notably European H2020 projects) already running at the Idiap Research Institute.

Machines learn to speak Swiss-German
research — Feb 29, 2016

When it comes to voice-controlled devices, the Swiss German population has so far been left out in the cold. At best, smartphones, smart TVs and similar tools understand High German, but they stand no chance with Swiss German dialect. This will change soon.

Research is evolving very rapidly.
institute — Jul 23, 2013

The surge of technological tools is accelerating communication and transforming our relationship with the world. Based in Martigny, Idiap, which conducts fundamental research projects at the highest level, works on improving human-machine interaction and optimising human communication. This prestigious institute is committed to scientific progress in the service of humankind. Interview with its director, Hervé Bourlard, a world expert in speech processing and also a professor at the Ecole polytechnique fédérale de Lausanne (EPFL).

Multimodal Signal Processing: Human Interactions in Meetings
research — Jun 14, 2012

A new book on multimodal signal processing for the analysis of human communication was published by Cambridge University Press on June 7, 2012. The book was edited by Hervé Bourlard and Andrei Popescu-Belis, with colleagues from the University of Edinburgh, and five other Idiap researchers contributed chapters to it.

New Book on Multimodal signal processing
research — Oct 29, 2009

Multimodal Signal Processing: Methods and Techniques to Build Multimodal Interactive Systems, by Jean-Philippe Thiran, Hervé Bourlard and Ferran Marques. Academic Press Inc (23 November 2009), 448 pages, ISBN-10: 0123748259.