Selected Publications

Recent Publications

. Interspeech 2020. 2020.

Recent & Upcoming Talks

Recent Posts

February 2021: Talk given on automatic speech recognition challenges at Swisscom seminar

October 2020: Lecture given at FDP seminar on automatic speech recognition (KIIT School Of Computer Applications, India, Faculty Development Programme (FDP))
- News announced
- Video

HAAWAII project kick-off in June 2020
- Website

ATCO2 project kick-off in November 2019
- Website

Post date 01/Oct/2019: The MALORCA project has been showcased among selected SESAR JU projects at the European R&I Days: https://www.sesarju.eu/news/sesar-ju-showcased-projects-results-european-ri-days

Post date 01/Sept/2019: ROXANNE project has been launched. The kick off meeting was held in Martigny, with more than 40 participants: http://roxanne-euproject.org/ Consortium photo

** machine translation workshop

2019 https://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2019/index.html (Parida as one of organisers) Our Task: WAT2019 Multi-Modal Translation Task Description: In 2019, the Workshop on Asian Translation 2019 (WAT2019) included the task of multimodal English-to-Hindi translation for the first time in its history. The task relies on our “Hindi Visual Genome”, a multimodal dataset consisting of text and images suitable for English-Hindi multimodal machine translation task and multimodal research. Link: https://ufal.mff.cuni.cz/hindi-visual-genome/wat-2019-multimodal-task

2020 https://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2020/index.html (Parida as one of organisers) Task: WAT2019 Multi-Modal Translation Task Description: In 2019, the Workshop on Asian Translation 2019 (WAT2019) included the task of multimodal English-to-Hindi translation for the first time in its history. The task relies on our “Hindi Visual Genome”, a multimodal dataset consisting of text and images suitable for English-Hindi multimodal machine translation task and multimodal research. Link: https://ufal.mff.cuni.cz/hindi-visual-genome/wat-2019-multimodal-task

** fake news detection task in June 2020, may be of interest for dissemination through Idiap: https://sites.google.com/view/mex-a3t/results?authuser=0

Twitter: https://twitter.com/LabTL_INAOE/status/1266529360477110272 https://twitter.com/EsauVT/status/1266483731516338177

In another challenge, Idiap placed second: Congratulations on the GermEval 2020 OMT shared task results (second place). Hope to see the official results of all participants on codalab soon.

** new phd Mahdi ** ** Adobe research gift **

** Interspeech 2019 presented**

** TSD 2019 presented **

** ATM Vienna conference - paper presentation with DLR**

** CSEM project started in November 2018 **

** SARAL results **

** Shantipriya started 2018 **

** EUROCONTROL ** presentation in 2018

** Dey finished his PhD **

** MPM project started in June 2018**

** Logitech project started in April 2018 **

** paper of Weipeng - on YouTube ** ** NIST evals participated **

**3rd field-test SIIP **

** MALORCA workshop **

** MALORCA has finished in March 2018**

** SIIP has finished in April 2018 **

** MuMMER project started in XXX

** ICASSP 2017 - best paper awards for Dey**

Post date 29/Nov/2018: The DBOX project has been selected as one of the Eurostars success-story projects, and its results were presented on the Eureka website: https://www.eurostars-eureka.eu/speech-recognition-brings-gaming-next-level


Current projects

Past projects can be found here

HAAWAII - Highly Automated Air Traffic Controller Workstations with Artificial Intelligence Integration

This project has received funding from the SESAR Joint Undertaking under Grant Agreement No. 884287, under the European Union’s Horizon 2020 Research and Innovation programme.

ATCO2 - Automatic collection and processing of voice data from air-traffic communications

ATCO2 is an H2020 EC project. It has received funding from the Clean Sky 2 Joint Undertaking (JU) under grant agreement No 864702.

ROXANNE Real time netwOrk, teXt and speaker ANalytics for combating orgaNized crimE

ROXANNE (Real time network, text, and speaker analytics for combating organized crime) is an EU-funded collaborative research and innovation project. It aims to unmask criminal networks and their members, and to reveal the true identity of perpetrators, by combining speech/language technologies and visual analysis with network analysis. ROXANNE collaborates with Law Enforcement Agencies (LEAs), industry and researchers to develop new tools that speed up investigative processes and support LEA decision-making. The end product will be an advanced technical platform that uses these tools to uncover and track organized criminal networks, underpinned by a strong legal framework. The project consortium comprises 24 European organisations from 16 countries, of which 11 are LEAs from 10 different countries.

MDM - multimodal people monitoring using sound and vision

The MDM multimodal people monitoring project is a collaboration between the Idiap Research Institute and the Swiss Center for Electronics and Microtechnology (CSEM).

MuMMER - MultiModal Mall Entertainment Robot

The goal of MuMMER is to develop a humanoid robot (based on Softbank’s Pepper platform) that can interact autonomously and naturally in the dynamic environments of a public shopping mall, providing an engaging and entertaining experience to the general public.

SARAL - Summarization and domain‐Adaptive Retrieval of Information Across Languages

SARAL is a U.S. IARPA project coordinated by the USC Viterbi School of Engineering, California.

SHAPED: Speech Hybrid Analytics Platform for Consumer and Enterprise Devices

The objective of the SHAPED project is to define a software architecture and set of algorithms enabling the most effective processing of speech between the embedded device and the cloud, balancing user experience and operation costs across the range of Logitech voice-enabled interface devices.

SM2 - extracting Semantic Meaning from Spoken Material

The SM2 project aims to develop a customisable technology for “semantic keyword and concept detection” that allows banking institutions to meet MiFID requirements. The solution makes it possible to search all kinds of electronic documents (speech/video/text) and analyse them according to predefined semantic categories.

Past Projects

SIIP: Speaker identification integrated project [May 2014 - April 2018]

  • Funding: FP7 EC

  • Web: http://www.siip.eu

  • Summary: Funded by the European Commission, the SIIP research project developed a breakthrough suspect identification solution based on a novel Speaker Identification (SID) engine and a Global Info Sharing Mechanism (GISM), which identify unknown speakers captured in lawfully intercepted calls, in recorded crime or terror arenas, in social media, and in any other type of speech source.


MALORCA: Machine Learning of Speech Recognition Models for Controller Assistance [April 2016 - March 2018]

  • Funding: H2020 EC SESAR Joint Undertaking project

  • Web: http://www.malorca-project.de/

  • Summary: The MALORCA project proposes a general, cheap and effective solution to automate the re-learning, adaptation and customisation of automatic speech recognition models for the air-traffic control domain. Both radar and speech recordings (of ATCOs) are used as input data.


DBOX: A generic dialog box for multilingual conversational applications [2012 - 2015]

  • Funding: EC Eurostars program

  • Web: http://www.idiap.ch/project/d-box/front-page

  • Summary: From a research point of view, the DBOX project aims to build a multilingual conversational agent that interacts seamlessly with multiple users who speak different languages and are driven by a common goal defined by the game. This involves the development and integration of multilingual speech recognition systems, multilingual speech synthesis, multilingual dialog modeling, and cross-domain adaptation resources. From an integration and evaluation point of view, the project’s key innovative idea is that the overall anticipated framework will be application-agnostic.

  • The project was selected as one of the Eurostars success-story projects: https://www.eurostars-eureka.eu/speech-recognition-brings-gaming-next-level


TA2: Together Anywhere, Together Anytime [2008 - 2012]

  • Funding: FP7 EC

  • Web: http://www.ta2-project.eu/

  • Summary: TA2 aims at defining end-to-end systems for the development and delivery of new, creative forms of interactive, immersive, high-quality media experiences for groups of users such as households and families. The overall vision of TA2 can be summarised as “making communications and engagement easier among groups of people separated in space and time.”


Samsung (South Korea) - Spontaneous speech recognition exploiting natural interfaces (2011-2014)
CTI (Idiap/Koemei) - Task Adaptation and Optimisation for Conversational Speech Recognition (2011-2012)
Armasuisse (Switzerland) - Low bit-rate speech coding (2011-2012)
TA2 (EC FP7) - Together Anywhere, Together Anytime (2008-2012)
DIRAC (EC, FP6) - Detection and Identification of Rare Audio-visual Cues (2007-2010)
Qualcomm (USA) - Speech and audio coding (2005-2007)
Qualcomm (USA) - Aurora: Advanced DSR Front-end, USA (2000-2001)
BARRANDE (France) - Codage de la parole à très bas débit indépendant de la langue (1999-2000)

Teaching

Two courses during winter semesters at EPFL:

Digital Speech and Audio Coding: The goal of this course is to introduce engineering students to state-of-the-art speech and audio coding techniques, with an emphasis on integrating knowledge about sound production and auditory perception through signal processing techniques (EDEE PhD course, doctoral course of electrical engineering): https://edu.epfl.ch/coursebook/en/digital-speech-and-audio-coding-EE-719?cb_cycle=edoc&cb_section=edee

Automatic speech processing: The goal of this course is to provide students with the main formalisms, models and algorithms required for the implementation of advanced speech processing applications (involving, among others, speech coding, speech analysis/synthesis, and speech recognition) (assistant at labs, Electrical and electronics engineering, masters): https://edu.epfl.ch/coursebook/en/automatic-speech-processing-EE-554

Contact

Current PhD students:

Weipeng He
Qingran Zhan
Mael Fabien
Juan Zuluaga
Amrutha Prasad

Current Postdocs:

Shantipriya Parida
Saeed Sarfjoo

Current Interns:

Past students:
Subhadeep Dey
Ajay Srinivasamurthy
Ivan Himawan (Queensland University of Technology)
Gwénolé Lecorvé