Unified Speech Processing Framework for Trustworthy Speaker Recognition

The goal of the automatic speaker recognition task is to recognize people by their voice. Automatic speaker verification is a subtask of speaker recognition in which the goal is to verify, that is, authenticate, the identity a person claims. State-of-the-art speaker verification systems typically model short-term spectrum-based features, such as mel-frequency cepstral coefficients (MFCCs), with a generative model such as a Gaussian mixture model (GMM). They then apply a series of compensation methods to achieve low error rates. This approach has two main limitations. First, it requires sufficient training data for each speaker to build robust models, and sufficient test data to apply the compensation techniques when verifying a speaker. Second, such systems are vulnerable to malicious (spoofing) attacks, for example attacks that use voice conversion (VC) or text-to-speech (TTS) systems. The main reason is that the front-end features and back-end models of speaker verification systems, namely MFCCs and GMMs, are similar to those used in VC and TTS systems. The proposed project aims to address these limitations by developing novel approaches to trustworthy speaker verification. To achieve this, through a collaboration between researchers from the Speech and Audio Processing group and the Biometrics group at Idiap, the project pursues two lines of work:
1. In the ongoing DeepSTD project, funded by the Hasler Foundation, it was shown in the context of speech recognition that recognition systems can be built by directly modeling raw speech signals with artificial neural networks. The proposed project aims to build on this approach to develop a generic approach that can be used for both speaker verification and speaker diarization.
2. In a collaborative study with researchers from the University of Eastern Finland and Nanyang Technological University (Singapore), Idiap has developed a countermeasure approach for state-of-the-art speaker verification systems. The proposed project aims to extend this approach and to develop novel anti-spoofing countermeasures using binary features and text-dependent speaker verification.
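To make the conventional GMM-based pipeline concrete, a verification system scores a test utterance by how much better its feature frames fit the claimed speaker's model than a competing model. The sketch below illustrates this idea in Python/NumPy and is not the project's method. It rests on several assumptions that do not come from this abstract. The GMMs use diagonal covariances with hand-set toy parameters instead of trained ones. The competing hypothesis is a single background model, which is a common choice but is not stated here. Random 2-D vectors stand in for real MFCC frames, and the decision threshold is 0. The names `gmm_loglik` and `verification_score` are hypothetical, and the sketch omits the compensation methods.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames:    (T, D) feature vectors (stand-ins for MFCC frames)
    weights:   (K,) mixture weights summing to 1
    means:     (K, D) component means
    variances: (K, D) diagonal covariances, all entries > 0
    """
    frames = np.atleast_2d(frames)
    d = frames.shape[1]
    # Gaussian log-density of every frame under every component: shape (T, K).
    diff = frames[:, None, :] - means[None, :, :]
    log_det = np.sum(np.log(variances), axis=1)                  # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)    # (T, K)
    log_comp = -0.5 * (d * np.log(2.0 * np.pi) + log_det[None, :] + mahal)
    # Weighted log-sum-exp over components, shifted by the max for stability.
    a = log_comp + np.log(weights)[None, :]
    m = a.max(axis=1, keepdims=True)
    per_frame = m[:, 0] + np.log(np.sum(np.exp(a - m), axis=1))
    return float(per_frame.mean())

def verification_score(frames, speaker_gmm, background_gmm):
    """Log-likelihood ratio; positive values favour the claimed speaker."""
    return gmm_loglik(frames, *speaker_gmm) - gmm_loglik(frames, *background_gmm)

# Toy models (hypothetical parameters): (weights, means, variances).
D = 2
speaker = (np.array([0.5, 0.5]),
           np.array([[1.0, 1.0], [1.5, 1.5]]),
           np.full((2, D), 0.2))
background = (np.array([1.0]), np.zeros((1, D)), np.ones((1, D)))

rng = np.random.default_rng(0)
genuine = rng.normal(1.2, 0.3, size=(200, D))    # frames near the speaker model
impostor = rng.normal(-1.0, 0.3, size=(200, D))  # frames far from it

threshold = 0.0
for name, x in [("genuine", genuine), ("impostor", impostor)]:
    s = verification_score(x, speaker, background)
    print(name, round(s, 2), "accept" if s > threshold else "reject")
```

Averaging over frames keeps the score comparable across utterances of different lengths. However, with only a few test frames the average becomes noisy, which relates to the data-sufficiency limitation described above. In practice, the threshold would be tuned on development data.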

Information Interfaces and Presentation
Swiss National Science Foundation
Jul 01, 2015
Jun 30, 2018