As part of the Horizon EC project CRiTERIA, our colleagues Dairazalia Sanchez-Cortes, Sergio Burdisso and Petr Motlicek are working on fact assessment.
Speech & Audio Processing
The expertise of the group encompasses statistical automatic speech recognition (based on hidden Markov models or on hybrid systems exploiting connectionist approaches), text-to-speech, and generic audio processing, covering sound source localization, microphone arrays, speaker diarization, audio indexing, very low bit-rate speech coding, and perceptual background noise analysis for telecommunication systems.
Group News
Idiap researchers of the Speech & Audio Processing group have been participating in this year's six-week JSALT 2023 Jelinek Summer Workshop on Speech and Language Technology, held in Le Mans.
Every year, the Institute nominates two students for its internal awards. In 2022, the Paper Award goes to Alexandre Bittar, and the Student Award goes to Teguh Lembono. Congratulations!
Esaú Villatoro, research associate in the Speech & Audio Processing group at Idiap, and his colleagues from the Mathematics Research Center (CIMAT) in Mexico have won first place in two Natural Language Processing competitions. The objective of these competitions is to improve important aspects of Mexican society, such as tourism and communication.
Idiap researchers published a paper describing an approach to speech processing inspired by the properties of the human brain. Their method performed as well as the current standard approach while retaining the advantage of energy efficiency. Moreover, their work is reproducible thanks to openly available software, paving the way for future applications.
Group Job Openings
Our group regularly posts job openings ranging from internships to researcher positions. To check the opportunities currently available or to submit a speculative application, use the link below.
Current Group Members

MAGIMAI DOSS, Mathew
(Senior Research Scientist)
- website

MOTLICEK, Petr
(Senior Research Scientist)
- website

GARNER, Philip
(Senior Research Scientist)
- website

MADIKERI, Srikanth
(Research Associate)
- website

MURALIDHAR, Skanda
(Research Associate)
- website

VILLATORO TELLO, Esaú
(Research Associate)
- website

SANCHEZ-CORTES, Dairazalia
(Postdoctoral Researcher)
- website

TORNAY, Sandrine
(Postdoctoral Researcher)
- website

VLASENKO, Bogdan
(Postdoctoral Researcher)
- website

HERMANN, Enno
(Postdoctoral Researcher)
- website

MARELLI, François
(Postdoctoral Researcher)
- website

RANGAPPA, Pradeep
(Postdoctoral Researcher)
- website

BHATTACHARJEE, Mrinmoy
(Postdoctoral Researcher)
- website

HOVSEPYAN, Sevada
(Postdoctoral Researcher)
- website

MOSTAANI, Zohreh
(PhD Student / Research Assistant)
- website

PRASAD, Amrutha
(PhD Student / Research Assistant)
- website

ZULUAGA GOMEZ, Juan Pablo
(PhD Student / Research Assistant)
- website

SARKAR, Eklavya
(PhD Student / Research Assistant)
- website

COPPIETERS DE GIBSON, Louise
(PhD Student / Research Assistant)
- website

TARIGOPULA, Neha
(PhD Student / Research Assistant)
- website

PUROHIT, Tilak
(PhD Student / Research Assistant)
- website

CHEN, Haolin
(PhD Student / Research Assistant)
- website

HE, Mutian
(PhD Student / Research Assistant)
- website

BURDISSO, Sergio (Gastón)
(R&D / Research Assistant)

EL HAJAL, Karl
(PhD Student / Research Assistant)
- website

BITTAR, Alexandre
(PhD Student / Research Assistant)
- website

VIDAL, Maxime
(Research Assistant)
- website

THORBECKE (NIGMATULINA), Iuliia
(PhD Student / Research Assistant)
- website

KUMAR, Shashi
(PhD Student / Research Assistant)
- website

NADERI, Maryam
(AI Master Student)
- website

RUFAI, Amina
(Research Intern)
- website

RUVOLO, Barbara
(Research Intern)

KHALIL, Driss
(Research Intern)
- website

SANTOS REVILLA, Andrea Elena
(Student)
Alumni
- ABROL, Vinayak
- AICHINGER, Ida
- AJMERA, Jitendra
- ANTONELLO, Niccolò
- ARADILLA ZAPATA, Guillermo
- ATHINEOS, Marios
- BABY, Deepak
- BAHAADINI, Sara
- BARBER, David
- BENZEGHIBA, Mohamed (Faouzi)
- BORNET, Annie
- BOURLARD, Hervé
- CANDY, Romain
- CAROFILIS VASCO, Roberto Andrés
- CEREKOVIC, Aleksandra
- CEVHER, Volkan
- CHAVARRIAGA, Ricardo
- CHU, Dong
- COLLADO, Thierry
- CRITTIN, Frank
- DELEZE, Maxime
- DEY, Subhadeep
- DIGHE, Pranay
- DINES, John
- DRYGAJLO, Andrzej
- DUFFNER, Stefan
- ELBANNA, Gasser
- ESPUÑA FONTCUBERTA, Aleix
- FABIEN, Maël
- FAJČÍK, Martin
- FRITSCH, Julian (David)
- GALAN MOLES, Ferran
- GOMEZ ALANIS, Alejandro
- GRANDVALET, Yves
- GRANGIER, David
- HAGEN, Astrid
- HAJIBABAEI, Mahdi
- HALPERN, Bence
- HE, Weipeng
- HERMANSKY, Hynek
- HONNET, Pierre-Edouard
- IKBAL, Shajith
- IMSENG, David
- IVANOVA, Maria
- JAIMES, Alejandro (Alex)
- JEANNINGROS, Loïc
- KETABDAR, Hamed
- KHODABAKHSHANDEH, Hamid
- KHONGLAH, Banriskhem (Kayang)
- KHOSRAVANI, Abbas
- KODRASI, Ina
- KRSTULOVIC, Sacha
- LATHOUD, Guillaume
- LAZARIDIS, Alexandros
- LI, Weifeng
- LINKE, Julian
- LOUPI, Dimitra
- MARIÉTHOZ, Johnny
- MARTINS, Renato
- MASSON, Olivier
- MAYORAZ, André
- MBANGA NDJOCK, Pierre (Armel)
- MCCOWAN, Iain
- MEIER, Corentin
- MENDOZA, Viviana
- MILLÁN, José del R.
- MOORE, Darren
- MORRIS, Andrew
- MOULIN, François
- MUCKENHIRN, Hannah
- NALLANTHIGHAL, Venkata Srikanth
- NATUREL, Xavier
- PARIDA, Shantipriya
- PARTHASARATHI, Sree Hari Krishnan
- PINTO, Francisco
- PITON, Timothy
- POCARD, Valentin
- POTARD, Blaise
- RAZAVI, Marzieh
- SAHA, Atreyee
- SALAMIN, Chloé
- SAMUI, Suman
- SARFJOO, Saeed (Seyyed)
- SEBASTIAN, Jilt
- SHAHNAWAZUDDIN, Syed
- SHAKAS, Alexis
- SHANKAR, Ravi
- SHARMA, Shivam
- SINGH, Muskaan
- SRINIVASAMURTHY, Ajay
- STEPHENSON, Todd
- STERPU, George
- SZASZAK, György
- TKACZUK, Jakub
- TONG, Sibo
- TRUSCELLO, Léonard
- TYAGI, Vivek
- ULLMANN, Raphael
- VALENTE, Fabio
- VASQUEZ-CORREA, Juan Camilo
Active Research Grants
- CRITERIA - Comprehensive data-driven Risk and Threat Assessment Methods for the Early and Reliable Identification, Validation and Analysis of migration-related risks
- EMIL - Emotion in the loop – a step towards a comprehensive closed-loop deep brain stimulation in Parkinson’s disease
- EPOC - A personalized speech recognition framework for audio messaging on the edge
- EUROCONTROL - Integrate the Automatic Speech Recognition system with eDEP, ESCAPE and audiolan
- EVOLANG - Evolving Language
- IICT - Inclusive Information and Communication Technologies
- NAST - Neural Architectures for Speech Technology
- NATAI - The Nature of Artificial Intelligence
- SMILE-II - SMILE-II Scalable Multimodal sign language technology for sIgn language Learning and assessmEnt Phase-II
- STEADI - Storytelling and first impressions in face-to-face and algorithm-powered digital interviews
- TIPS - Towards Integrated processing of Physiological and Speech signals
- TRACY - A big-data analyTics from base-stations Registrations And Cdrs e-evidence sYstem
Past Research Grants
- AAMASSE - Acoustic Model Adaptation toward Spontaneous Speech and Environment
- ADDG2SU - Flexible Acoustic Data-Driven Grapheme to Subword Unit Conversion
- ADDG2SU_EXT - Flexible Acoustic data-driven Grapheme to Subword Unit Conversion
- AI4EU - A European AI On Demand Platform and Ecosystem
- AMIDA - Augmented Multi-party Interaction with Distance Access
- AMSP - Auditory-motivated signal processing and applications to robust speech enhancement and recognition
- ATCO2 - Automatic collection and processing of voice data from air-traffic communications
- BIOWATCH - Biowatch
- CLAS3 - Cross-Lingual Adaptation for Text to Speech Synthesis (CLAS3)
- CMM - Conversation Member Match
- COBALT - Content Based Call Filtering
- DAHL - DAHL: Domain Adaptation via Hierarchical Lexicons
- DAUM - Domain Adaptation Using Sub-Space Models
- DAUM2012 - Domain Adaptation Using Sub-Space Models
- DBOX - D-Box: A generic dialog box for multilingual conversational applications
- DEEPCHARISMA - Deep Learning Charisma
- DEEPSTD-EXT - Universal Spoken Term Detection with Deep Learning (extension)
- DEVEL-IA - Training programme "Specialized Developers in Artificial Intelligence" based on the dual postgraduate continuing-education model
- DIMHA - Diarizing Massive Amounts of Heterogeneous Audio
- DM3 - Distributed MultiModal Media server, a low cost large capacity high throughput data storage system
- ELEARNING-VALAIS_3.0 - eLearning-Valais 3.0
- EMIME - Effective Multilingual Interaction in Mobile Environments
- ESGEM - Enhanced Swiss German mEdia Monitoring
- FLEXASR - Flexible Grapheme-Based Automatic Speech Recognition
- FLOSS - Flexible Linguistically-guided Objective Speech aSessment
- GENEEMO - Geneemo: An Expressive Audio Content Generation Tool
- HAAWAII - Highly Automated Air Traffic Controller Workstations with Artificial Intelligence Integration
- ICS-2010 - Interactive Cognitive Systems
- IM2-3 - Interactive Multimodal Information Management Phase 3
- INEVENT - Accessing Dynamic Networked Multimedia Events
- L-PASS - Linguistic-Paralinguistic Speech Synthesis
- MALORCA - Machine Learning of Speech Recognition Models for Controller Assistance
- MASS - Multilingual Affective Speech Synthesis
- MEGANEPRO - Myo-Electricity, Gaze and Artificial Intelligence for Neurocognitive Examination and Prosthetics
- MOSPEEDI - MoSpeeDi. Motor Speech Disorders: characterizing phonetic speech planning and motor speech programming/execution and their impairments
- MPM - Multimodal People Monitoring
- MULTI08 - Multimodal Interaction and Multimedia Data Mining
- MULTI08EXT - Multimodal Interaction and Multimedia Data Mining
- MULTIVEO - High Accuracy Speaker-Independent Multilingual Automatic Speech Recognition System
- MUMMER - MultiModal Mall Entertainment Robot
- PANDA - Perceptual Background Noise Analysis for the Newest Generation of Telecommunication Systems
- PHASER - PHASER: Parsimonious Hierarchical Automatic Speech Recognition
- PHASER-QUAD - Parsimonious Hierarchical Automatic Speech Recognition and Query Detection
- REAPPS - Reinforced audio processing via physiological signals
- RECAPP - Making speech technology accessible to Swiss people
- ROCKIT - Roadmap for Conversational Interaction Technologies
- RODI - Role based speaker diarization
- ROXANNE - Real time network, text, and speaker analytics for combating organized crime
- SARAL - Summarization and domain-Adaptive Retrieval of Information Across Languages
- SCALE - Speech Communication with Adaptive Learning
- SCOREL2 - Automatic scoring and adaptive pedagogy for oral language learning
- SESAME - SEarching Swiss Audio MEmories
- SHAPED - SHAPED: Speech Hybrid Analytics Platform for consumer and Enterprise Devices
- SHISSM - Sparse and hierarchical Structures for Speech Modeling
- SIIP - Speaker Identification Integrated Project
- SIWIS - Spoken Interaction with Interpretation in Switzerland
- SMILE - Scalable Multimodal sign language Technology for sIgn language Learning and assessmEnt
- SP2 - SCOPES Project on Speech Prosody
- STARFISH - STARFISH: Safety and Speech Recognition with Artificial Intelligence in the Use of Air Traffic Control
- SUMMA - Scalable Understanding of Multilingual Media
- TA2 - Together Anywhere, Together Anytime
- TA2-EEU - Together Anywhere, Together Anytime - Enlarged European Union
- TAO-CSR - Task Adaptation and Optimisation for Conversational Speech Recognition
- TAPAS - Training Network on Automatic Processing of PAthological Speech
- UNITS - Unified Speech Processing Framework for Trustworthy Speaker Recognition
- V-FAST - Vocal-tract based Fast Adaptation for Speech Technology
- VEOVOX - VeoVox: Voice-Controlled Order-Taking System for Restaurants
- WAVE2-96 - H2020-SESAR-PJ.10-W2-Solution 96