Selected Publications

Recent Publications


Recent & Upcoming Talks

Recent Posts

October 2020: Lecture given at the FDP seminar on automatic speech recognition:
- FDP workshop details
- News announcement

Current Projects

Past projects are listed below

HAAWAII - Highly Automated Air Traffic Controller Workstations with Artificial Intelligence Integration

This project has received funding from the SESAR Joint Undertaking under Grant Agreement No. 884287, within the European Union’s Horizon 2020 Research and Innovation programme.

ATCO2 - Automatic collection and processing of voice data from air-traffic communications

ATCO2 is an H2020 EC project. It has received funding from the Clean Sky 2 Joint Undertaking (JU) under grant agreement No 864702.

ROXANNE - Real time netwOrk, teXt and speaker ANalytics for combating orgaNized crimE

ROXANNE (Real time network, text, and speaker analytics for combating organized crime) is an EU-funded collaborative research and innovation project aiming to unmask criminal networks and their members, as well as to reveal the true identity of perpetrators, by combining the capabilities of speech/language technologies and visual analysis with network analysis. ROXANNE collaborates with Law Enforcement Agencies (LEAs), industry and researchers to develop new tools that speed up investigative processes and support LEA decision-making. The end product will be an advanced technical platform that uses these new tools to uncover and track organized criminal networks, underpinned by a strong legal framework. The project consortium comprises 24 European organisations from 16 countries, of which 11 are LEAs from 10 different countries.
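
To make the network-analysis idea concrete, the toy sketch below links speakers identified in calls into a graph and ranks them by degree centrality. The call records, speaker labels and the use of degree centrality are purely illustrative assumptions and do not describe the actual ROXANNE pipeline.

```python
# Hypothetical illustration of the network-analysis idea: speakers identified
# in calls become nodes, shared calls become edges, and centrality highlights
# potentially pivotal members. Not the project's actual pipeline.
import networkx as nx

# Illustrative call records: (call_id, [speaker labels from a speaker-ID system])
calls = [
    ("call_001", ["spk_A", "spk_B"]),
    ("call_002", ["spk_B", "spk_C"]),
    ("call_003", ["spk_B", "spk_D"]),
    ("call_004", ["spk_C", "spk_D"]),
]

graph = nx.Graph()
for call_id, speakers in calls:
    # Connect every pair of speakers that co-occur in the same call.
    for i, s1 in enumerate(speakers):
        for s2 in speakers[i + 1:]:
            graph.add_edge(s1, s2, call=call_id)

# Degree centrality as a simple proxy for how connected each speaker is.
for speaker, score in sorted(nx.degree_centrality(graph).items(),
                             key=lambda kv: -kv[1]):
    print(f"{speaker}: {score:.2f}")
```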

MDM - multimodal people monitoring using sound and vision

The MDM (Multimodal people monitoring) project is a collaboration between the Idiap Research Institute and the Swiss Center for Electronics and Microtechnology (CSEM).

MuMMER - MultiModal Mall Entertainment Robot

The goal of MuMMER is to develop a humanoid robot (based on SoftBank’s Pepper platform) that can interact autonomously and naturally in the dynamic environment of a public shopping mall, providing an engaging and entertaining experience to the general public.

SARAL - Summarization and domain‐Adaptive Retrieval of Information Across Languages

SARAL is a U.S. IARPA project coordinated by the USC Viterbi School of Engineering, California.

SHAPED: Speech Hybrid Analytics Platform for Consumer and Enterprise Devices

The objective of the SHAPED project is to define a software architecture and set of algorithms enabling the most effective processing of speech between the embedded device and the cloud, balancing user experience and operation costs across the range of Logitech voice-enabled interface devices.
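
As a rough illustration of this device/cloud trade-off, the sketch below routes an utterance either to an embedded recogniser or to the cloud based on its duration and the on-device confidence. The thresholds and data fields are illustrative assumptions, not the actual SHAPED architecture or Logitech's design.

```python
# Toy illustration of a device/cloud routing policy: short, confidently
# recognised commands stay on the embedded device, while longer or
# low-confidence utterances are sent to the cloud. All values are made up.
from dataclasses import dataclass

@dataclass
class Utterance:
    duration_s: float         # length of the captured audio
    device_confidence: float  # confidence of the embedded recogniser (0..1)

def route(utt: Utterance,
          max_device_duration_s: float = 3.0,
          min_device_confidence: float = 0.85) -> str:
    """Return 'device' or 'cloud' based on a simple cost/quality trade-off."""
    if (utt.duration_s <= max_device_duration_s
            and utt.device_confidence >= min_device_confidence):
        return "device"   # cheap, low-latency, no network cost
    return "cloud"        # typically more accurate, but adds latency and cost

print(route(Utterance(duration_s=1.2, device_confidence=0.93)))  # -> device
print(route(Utterance(duration_s=8.0, device_confidence=0.60)))  # -> cloud
```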

SM2 - extracting Semantic Meaning from Spoken Material

The SM2 project aims to develop customisable technology for “semantic keyword and concept detection”, allowing banking institutions to meet MiFID requirements. The solution allows searching all kinds of electronic documents (speech/video/text) and analysing them according to predefined semantic categories.
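
The minimal sketch below conveys the flavour of keyword and concept detection over transcripts. The categories and keywords are invented for illustration; the real system has to cope with ASR errors, synonyms and multiple languages rather than relying on plain string matching.

```python
# Minimal sketch of tagging transcripts with predefined semantic categories.
# Categories and keywords are illustrative only.
CATEGORIES = {
    "risk_disclosure": {"risk", "volatile", "loss"},
    "product_offer": {"fund", "bond", "portfolio"},
}

def tag_transcript(transcript: str) -> dict:
    """Return, per semantic category, the keywords found in the transcript."""
    tokens = set(transcript.lower().split())
    return {cat: sorted(tokens & kws) for cat, kws in CATEGORIES.items()
            if tokens & kws}

print(tag_transcript("This fund is volatile and may lead to a loss"))
# -> {'risk_disclosure': ['loss', 'volatile'], 'product_offer': ['fund']}
```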

Past Projects

SIIP: Speaker identification integrated project [May 2014 - April 2018]

  • Funding: FP7 EC

  • Web: http://www.siip.eu

  • Summary: Funded by the European Commission, the SIIP research project developed a breakthrough suspect-identification solution based on a novel Speaker Identification (SID) engine and a Global Info Sharing Mechanism (GISM), which identify unknown speakers captured in lawfully intercepted calls, in recordings from crime or terror scenes, in social media and in other types of speech sources (a schematic sketch of the generic scoring step follows below).
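
The sketch below only illustrates the scoring step shared by most speaker-identification systems: comparing a speaker embedding extracted from an unknown recording against enrolled speakers with cosine similarity. The embeddings here are random stand-ins; nothing in this example reflects the actual SIIP SID engine.

```python
# Schematic speaker-identification scoring: cosine similarity between an
# unknown-speaker embedding and enrolled speaker embeddings (random stand-ins).
import numpy as np

rng = np.random.default_rng(0)
enrolled = {name: rng.normal(size=256) for name in ("suspect_1", "suspect_2")}
unknown = enrolled["suspect_2"] + 0.1 * rng.normal(size=256)  # noisy copy

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = {name: cosine(unknown, emb) for name, emb in enrolled.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # expected to point at suspect_2
```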


MALORCA: Machine Learning of Speech Recognition Models for Controller Assistance [April 2016 - March 2018]

  • Funding: H2020 EC SESAR Joint Undertaking project

  • Web: http://www.malorca-project.de/

  • Summary: The MALORCA project proposes a general, cheap and effective solution to automate the re-learning, adaptation and customisation of automatic speech recognition models for the air-traffic control domain. Both radar data and speech recordings (of ATCOs) are used as input (a toy sketch of this re-learning idea follows below).
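
The toy sketch below illustrates the semi-supervised selection idea: keep only ASR hypotheses that are both confident and consistent with radar-derived context (e.g. callsigns currently visible on radar) before feeding them back into model adaptation. The data, thresholds and matching rule are illustrative assumptions, not the MALORCA implementation.

```python
# Toy semi-supervised data selection: confident ASR hypotheses that mention a
# callsign seen on radar are added to the training pool for model adaptation.
# Hypothetical ASR outputs: (hypothesis text, confidence)
hypotheses = [
    ("lufthansa one two three descend flight level eight zero", 0.92),
    ("speedbird four five climb flight level one two zero", 0.55),
    ("austrian niner eight turn left heading two one zero", 0.88),
]

# Callsigns derived from radar data at the time of each utterance.
radar_callsigns = {"lufthansa one two three", "austrian niner eight"}

def select_for_retraining(hyps, callsigns, min_confidence=0.8):
    """Keep hypotheses that are confident and mention a callsign seen on radar."""
    selected = []
    for text, conf in hyps:
        if conf >= min_confidence and any(cs in text for cs in callsigns):
            selected.append(text)
    return selected

training_pool = select_for_retraining(hypotheses, radar_callsigns)
print(training_pool)  # automatic transcripts that would feed model adaptation
```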


DBOX: A generic dialog box for multilingual conversational applications [2012 - 2015]

  • Funding: EC Eurostars program

  • Web: http://www.idiap.ch/project/d-box/front-page

  • Summary: From a research point of view, the DBOX project aims at building a multilingual conversational agent that seamlessly interacts with multiple users speaking different languages and driven by a common goal defined by the game. This involves the development and integration of multilingual speech recognition, multilingual speech synthesis, multilingual dialog modeling, and cross-domain adaptation resources. From an integration and evaluation point of view, the project’s key innovative idea is that the overall framework will be application-agnostic.

  • The project was selected as one of the Eurostars success stories: https://www.eurostars-eureka.eu/speech-recognition-brings-gaming-next-level


TA2: Together Anywhere, Together Anytime [2008 - 2012]

  • Funding: FP7 EC

  • Web: http://www.ta2-project.eu/

  • Summary: TA2 aims at defining end-to-end systems for the development and delivery of new, creative forms of interactive, immersive, high-quality media experiences for groups of users such as households and families. The overall vision of TA2 can be summarised as “making communications and engagement easier among groups of people separated in space and time”.


Samsung (South Korea) - Spontaneous speech recognition exploiting natural interfaces (2011-2014)
CTI (Idiap/Koemei) - Task Adaptation and Optimisation for Conversational Speech Recognition (2011-2012)
Armasuisse (Switzerland) - Low bit-rate speech coding (2011-2012)
TA2 (EC FP7) - Together Anywhere, Together Anytime (2008-2012)
DIRAC (EC, FP6) - Detection and Identification of Rare Audio-visual Cues (2007-2010)
Qualcomm (USA) - Speech and audio coding (2005-2007)
Qualcomm (USA) - Aurora: Advanced DSR Front-end, USA (2000-2001)
BARRANDE (France) - Language-independent very low bit-rate speech coding (1999-2000)

Teaching

Two courses during winter semesters at EPFL:

Digital Speech and Audio Coding: The goal of this course is to introduce engineering students to state-of-the-art speech and audio coding techniques, with an emphasis on integrating knowledge about sound production and auditory perception through signal processing techniques (EDEE PhD course, doctoral programme in electrical engineering): https://edu.epfl.ch/coursebook/en/digital-speech-and-audio-coding-EE-719?cb_cycle=edoc&cb_section=edee

Automatic Speech Processing: The goal of this course is to provide students with the main formalisms, models and algorithms required for the implementation of advanced speech processing applications (involving, among others, speech coding, speech analysis/synthesis, and speech recognition) (lab assistant; Electrical and Electronics Engineering, Master's level): https://edu.epfl.ch/coursebook/en/automatic-speech-processing-EE-554

Contact

Current PhD students:

Weipeng He
Qingran Zhan
Mael Fabien
Juan Zuluaga
Amrutha Prasad

Current Postdocs:

Shantipriya Parida
Saeed Sarfjoo

Current Interns:

Past students:
Subhadeep Dey
Ajay Srinivasamurthy
Ivan Himawan (Queensland University of Technology)
Gwenole Lecovre