I am a senior researcher at Idiap Research Institute in Martigny, Switzerland, and also an external teacher at Brno University of Technology (BUT). My research interests include audio and speech processing (especially speech and speaker recognition) and its applications to various tasks (e.g. embedded platforms, robotics, forensic voice comparison).
Part of my scientific work was also done at Oregon Graduate Institute, USA, and at Ecole Superieure d’Ingenieurs en Electrotechnique et Electronique, France.
PhD in Computer Science, 2003
Faculty of Information Technology, Brno University of Technology
MEng in Electrical Engineering, 1999
Faculty of Electrical Engineering, Brno University of Technology
February 2021: Talk given on automatic speech recognition challenges at Swisscom seminar
HAAWAII project kick-off in June 2020
ATCO2 project kick-off in November 2019
Post date 01/Oct/2019: The MALORCA project was showcased among selected SESAR JU projects at the European R&I Days: https://www.sesarju.eu/news/sesar-ju-showcased-projects-results-european-ri-days
Post date 01/Sept/2019: The ROXANNE project has been launched. The kick-off meeting was held in Martigny, with more than 40 participants: http://roxanne-euproject.org/
** machine translation workshop
2019: https://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2019/index.html (Parida as one of the organisers). Our task: WAT2019 Multi-Modal Translation Task. Description: In 2019, the Workshop on Asian Translation (WAT2019) included the task of multimodal English-to-Hindi translation for the first time in its history. The task relies on our “Hindi Visual Genome”, a multimodal dataset consisting of text and images suitable for English-Hindi multimodal machine translation and multimodal research. Link: https://ufal.mff.cuni.cz/hindi-visual-genome/wat-2019-multimodal-task
2020: https://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2020/index.html (Parida as one of the organisers). Task: WAT2019 Multi-Modal Translation Task. Description: In 2019, the Workshop on Asian Translation (WAT2019) included the task of multimodal English-to-Hindi translation for the first time in its history. The task relies on our “Hindi Visual Genome”, a multimodal dataset consisting of text and images suitable for English-Hindi multimodal machine translation and multimodal research. Link: https://ufal.mff.cuni.cz/hindi-visual-genome/wat-2019-multimodal-task
** fake news detection task in June 2020 may be of interest for dissemination through Idiap: https://sites.google.com/view/mex-a3t/results?authuser=0
In another challenge, Idiap placed second: Congratulations on the GermEval 2020 OMT shared task results (second place). Hope to see the official results of all participants on codalab soon.
** new phd Mahdi ** ** Adobe research gift **
** Interspeech 2019 presented**
** TSD 2019 presented **
** ATM Vienna conference - paper presentation with DLR**
** CSEM project started in November 2018 **
** SARAL results **
** Shantipriya started 2018 **
** EUROCONTROL ** presentation in 2018
** Dey finished his PhD **
** MPM project started in June 2018**
** Logitech project started in April 2018 **
** paper of Weipeng - on YouTube ** ** NIST evals participated **
**3rd field-test SIIP **
** MALORCA workshop **
** MALORCA has finished in March 2018**
** SIIP has finished in April 2018 **
** MuMMER project started in XXX
** ICASSP 2017 - best paper awards for Dey **
Post date 29/Nov/2018: The DBOX project was selected as one of the success story projects, and its results were presented on the Eureka website: https://www.eurostars-eureka.eu/speech-recognition-brings-gaming-next-level
Past projects can be found here
This project has received funding from the SESAR Joint Undertaking under Grant Agreement No. 884287, under the European Union’s Horizon 2020 Research and Innovation programme.
ATCO2 is an H2020 EC project. It has received funding from the Clean Sky 2 Joint Undertaking (JU) under grant agreement No 864702.
ROXANNE (Real time network, text, and speaker analytics for combating organized crime) is an EU-funded collaborative research and innovation project that aims to unmask criminal networks and their members, and to reveal the true identity of perpetrators, by combining the capabilities of speech/language technologies and visual analysis with network analysis. ROXANNE collaborates with Law Enforcement Agencies (LEAs), industry and researchers to develop new tools to speed up investigative processes and support LEA decision-making. The end product will be an advanced technical platform which uses new tools to uncover and track organized criminal networks, underpinned by a strong legal framework. The project consortium comprises 24 European organisations from 16 countries, of which 11 are LEAs from 10 different countries.
The goal of MuMMER is to develop a humanoid robot (based on Softbank’s Pepper platform) that can interact autonomously and naturally in the dynamic environments of a public shopping mall, providing an engaging and entertaining experience to the general public.
SARAL is a U.S. IARPA project coordinated by the USC Viterbi School of Engineering, California.
The SM2 project aims to develop a customisable technology for “semantic keyword and concept detection” that allows banking institutions to meet MIFID requirements. The solution allows searching all kinds of electronic documents (speech/video/text) and analysing them according to predefined semantic categories.
SIIP: Speaker identification integrated project [May 2014 - April 2018]
Funding: FP7 EC
Summary: Funded by the European Commission, the SIIP research project has developed a breakthrough Suspect Identification solution based on a novel Speaker Identification (SID) engine and a Global Info Sharing Mechanism (GISM), which identify unknown speakers captured in lawfully intercepted calls, in recorded crime or terror arenas, in social media and in any other type of speech source.
MALORCA: Machine Learning of Speech Recognition Models for Controller Assistance [April 2016 - March 2018]
Funding: H2020 EC SESAR Joint Undertaking project
Summary: The MALORCA project proposes a general, cheap and effective solution to automate the re-learning, adaptation and customisation of automatic speech recognition models applied to the air-traffic control domain. Both the radar and speech recordings (of ATCOs) are used as input data.
DBOX: A generic dialog box for multilingual conversational applications [2012 - 2015]
Funding: EC Eurostars program
Summary: From a research point of view, the DBOX project aims at building a multilingual conversational agent which will seamlessly interact with multiple users who speak different languages and are driven by a common goal defined by the game. This involves the development and integration of multilingual speech recognition systems, multilingual speech synthesis, multilingual dialog modeling, and cross-domain adaptation resources. From an integration and evaluation point of view, the project’s key innovative idea is that the overall anticipated framework will be application-agnostic.
The project was selected as one of the Eurostars success story projects: https://www.eurostars-eureka.eu/speech-recognition-brings-gaming-next-level
TA2: Together Anywhere, Together Anytime [2008 - 2012]
Funding: FP7 EC
Summary: TA2 aims at defining end-to-end systems for the development and delivery of new, creative forms of interactive, immersive, high quality media experiences for groups of users such as households and families. The overall vision of TA2 can be summarised as “making communications and engagement easier among groups of people separated in space and time”.
Samsung (South Korea) - Spontaneous speech recognition exploiting natural interfaces (2011-2014)
CTI (Idiap/Koemei) - Task Adaptation and Optimisation for Conversational Speech Recognition (2011-2012)
Armasuisse (Switzerland) - Low bit-rate speech coding (2011-2012)
TA2 (EC FP7) - Together Anywhere, Together Anytime (2008-2012)
DIRAC (EC, FP6) - Detection and Identification of Rare Audio-visual Cues (2007-2010)
Qualcomm (USA) - Speech and audio coding (2005-2007)
Qualcomm (USA) - Aurora: Advanced DSR Front-end, USA (2000-2001)
BARRANDE (France) - Codage de la parole a tres bas debit independant de la langue [language-independent very low bit-rate speech coding] (1999-2000)
Two courses during winter semesters at EPFL:
Digital Speech and Audio Coding: The goal of this course is to introduce engineering students to state-of-the-art speech and audio coding techniques, with an emphasis on integrating knowledge about sound production and auditory perception through signal processing techniques (EDEE PhD course, doctoral course of electrical engineering): https://edu.epfl.ch/coursebook/en/digital-speech-and-audio-coding-EE-719?cb_cycle=edoc&cb_section=edee
Automatic speech processing: the goal of this course is to provide the students with the main formalisms, models and algorithms required for the implementation of advanced speech processing applications (involving, among others, speech coding, speech analysis/synthesis, and speech recognition), (assistant at labs, Electrical and electronics engineering, masters): https://edu.epfl.ch/coursebook/en/automatic-speech-processing-EE-554