I am a senior researcher at Idiap Research Institute in Martigny, Switzerland, and an external teacher at Brno University of Technology (BUT). My research interests include audio and speech processing (especially speech and speaker recognition) and its applications to various tasks (e.g. embedded platforms, robotics, forensic voice comparison).
Part of my scientific work was also done at the Oregon Graduate Institute, USA, and at École Supérieure d'Ingénieurs en Électrotechnique et Électronique, France.
PhD in Computer Science, 2003
Faculty of Information Technology, Brno University of Technology
MEng in Electrical Engineering, 1999
Faculty of Electrical Engineering, Brno University of Technology
October 2020: Gave a lecture at the FDP seminar on automatic speech recognition:
- FDP workshop details
- News announced
Past projects can be found here
This project has received funding from the SESAR Joint Undertaking under Grant Agreement No. 884287, under the European Union's Horizon 2020 Research and Innovation programme.
ATCO2 is an H2020 EC project. It has received funding from the Clean Sky 2 Joint Undertaking (JU) under Grant Agreement No. 864702.
ROXANNE (Real-time network, text, and speaker analytics for combating organized crime) is an EU-funded collaborative research and innovation project that aims to unmask criminal networks and their members, and to reveal the true identity of perpetrators, by combining the capabilities of speech/language technologies and visual analysis with network analysis. ROXANNE collaborates with Law Enforcement Agencies (LEAs), industry and researchers to develop new tools that speed up investigative processes and support LEA decision-making. The end product will be an advanced technical platform that uses new tools to uncover and track organized criminal networks, underpinned by a strong legal framework. The project consortium comprises 24 European organisations from 16 countries, 11 of which are LEAs from 10 different countries.
The goal of MuMMER is to develop a humanoid robot (based on SoftBank's Pepper platform) that can interact autonomously and naturally in the dynamic environment of a public shopping mall, providing an engaging and entertaining experience to the general public.
SARAL is a U.S. IARPA project coordinated by the USC Viterbi School of Engineering, California.
The SM2 project aims to develop customisable technology for "semantic keyword and concept detection", allowing banking institutions to meet MiFID requirements. The solution allows searching all kinds of electronic documents (speech/video/text) and analysing them according to predefined semantic categories.
SIIP: Speaker identification integrated project [May 2014 - April 2018]
Funding: FP7 EC
Web: http://www.siip.eu
Summary: Funded by the European Commission, the SIIP research project has developed a breakthrough suspect-identification solution based on a novel Speaker Identification (SID) engine and a Global Info Sharing Mechanism (GISM), which identify unknown speakers captured in lawfully intercepted calls, in recordings from crime or terror arenas, in social media, and in other types of speech sources.
MALORCA: Machine Learning of Speech Recognition Models for Controller Assistance [April 2016 - March 2018]
Funding: H2020 EC SESAR Joint Undertaking project
Summary: The MALORCA project proposes a general, cheap and effective solution to automate the re-learning, adaptation and customisation of automatic speech recognition models for the air-traffic control domain. Both radar data and speech recordings (of ATCOs) are used as input.
DBOX: A generic dialog box for multilingual conversational applications [2012 - 2015]
Funding: EC Eurostars program
Summary: From a research point of view, the DBOX project aims at building a multilingual conversational agent that seamlessly interacts with multiple users speaking different languages, driven by a common goal defined by the game. This involves the development and integration of multilingual speech recognition systems, multilingual speech synthesis, multilingual dialog modeling, and cross-domain adaptation resources. From an integration and evaluation point of view, the project's key innovative idea is that the overall framework will be application-agnostic.
The project was selected as one of the Eurostars success stories:
https://www.eurostars-eureka.eu/speech-recognition-brings-gaming-next-level
TA2: Together Anywhere, Together Anytime [2008 - 2012]
Funding: FP7 EC
Summary: TA2 aims at defining end-to-end systems for the development and delivery of new, creative forms of interactive, immersive, high-quality media experiences for groups of users such as households and families. The overall vision of TA2 can be summarised as "making communications and engagement easier among groups of people separated in space and time."
Samsung (South Korea) - Spontaneous speech recognition exploiting natural interfaces (2011-2014)
CTI (Idiap/Koemei) - Task Adaptation and Optimisation for Conversational Speech Recognition (2011-2012)
Armasuisse (Switzerland) - Low bit-rate speech coding (2011-2012)
TA2 (EC FP7) - Together Anywhere, Together Anytime (2008-2012)
DIRAC (EC, FP6) - Detection and Identification of Rare Audio-visual Cues (2007-2010)
Qualcomm (USA) - Speech and audio coding (2005-2007)
Qualcomm (USA) - Aurora: Advanced DSR Front-end, USA (2000-2001)
BARRANDE (France) - Language-independent very low bit-rate speech coding (1999-2000)
Two courses during winter semesters at EPFL:
Digital Speech and Audio Coding: The goal of this course is to introduce engineering students to state-of-the-art speech and audio coding techniques, with an emphasis on integrating knowledge about sound production and auditory perception through signal processing techniques (EDEE PhD course, doctoral course in electrical engineering): https://edu.epfl.ch/coursebook/en/digital-speech-and-audio-coding-EE-719?cb_cycle=edoc&cb_section=edee
Automatic speech processing: The goal of this course is to provide students with the main formalisms, models and algorithms required for the implementation of advanced speech processing applications (involving, among others, speech coding, speech analysis/synthesis, and speech recognition) (lab assistant, Electrical and Electronics Engineering, master's level): https://edu.epfl.ch/coursebook/en/automatic-speech-processing-EE-554
Current students:
Weipeng He
Qingran Zhan
Mael Fabien
Juan Zuluaga
Amrutha Prasad
Shantipriya Parida
Saeed Sarfjoo
Past students:
Subhadeep Dey
Ajay Srinivasamurthy
Ivan Himawan (Queensland University of Technology)
Gwenole Lecorve