Role-based speaker diarization

Speaker diarization is the task of inferring "who spoke when" in an audio stream. It is an essential step for searching and indexing audio archives, enriching automatic transcriptions, and extracting high-level information about human conversations. Most recent efforts in the domain have addressed the problem with machine learning and signal processing techniques. However, current approaches completely neglect the fact that the data are instances of human conversations, which exhibit predictable patterns induced by the role each participant has in the discussion.

In recent years, many studies have shown that turn-taking patterns extracted from speaker diarization output can be statistically modeled and used to classify the role each speaker has in the conversation. Roles can be coded according to a number of schemes, including formal/informal, social, and functional roles. Conversely, we propose to integrate into the diarization system the statistics of speaker interactions induced by their roles. The goal of this proposal is to enhance speaker diarization of meeting and broadcast data by combining traditional audio processing techniques with information on the conversation structure that comes from the participants' roles.
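As an illustration of what role-conditioned interaction statistics could look like, the sketch below estimates how likely a turn by one role is to be followed by a turn from another role. It is only a minimal example under stated assumptions: the first-order (bigram) model, the add-one smoothing, the function name `role_transition_probs`, and the role labels are all hypothetical choices made for this illustration, not the model used in the project.

```python
from collections import Counter, defaultdict


def role_transition_probs(turn_roles, smoothing=1.0):
    """Estimate P(next role | previous role) from a sequence of speaker turns.

    turn_roles: list of role labels, one per speaker turn, in temporal order.
    smoothing:  additive (Laplace) smoothing constant; an assumption of this
                sketch so that unseen transitions keep a non-zero probability.
    """
    roles = sorted(set(turn_roles))

    # Count how often each role is immediately followed by each other role.
    counts = defaultdict(Counter)
    for prev_role, next_role in zip(turn_roles, turn_roles[1:]):
        counts[prev_role][next_role] += 1

    # Normalise each row of counts into a smoothed probability distribution.
    probs = {}
    for prev_role in roles:
        total = sum(counts[prev_role].values()) + smoothing * len(roles)
        probs[prev_role] = {
            next_role: (counts[prev_role][next_role] + smoothing) / total
            for next_role in roles
        }
    return probs


# Hypothetical broadcast-style turn sequence (labels invented for the example).
turns = ["anchor", "guest", "anchor", "reporter", "anchor", "guest"]
probs = role_transition_probs(turns)
```

In this toy sequence a guest turn is always followed by an anchor turn, so the estimate for `probs["guest"]["anchor"]` is higher than for the other transitions out of `"guest"`. Regularities of this kind are what the proposal aims to exploit as prior information during diarization.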

The project is organized in two research tracks: (1) statistical representation and estimation of speaker interactions conditioned on their roles; (2) integration of this information into the speaker diarization system. Development and evaluation will be carried out on meeting recordings and broadcast audio data collected in the framework of the Rich Transcription evaluations. Progress will be measured in terms of Diarization Error Rate, the official metric proposed by NIST for benchmarking this task. The research proposed in RODI will try to bridge the gap between two closely related fields: automatic speaker segmentation and the analysis of human conversations.
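For reference, the Diarization Error Rate is commonly defined as the sum of missed speech time, false alarm speech time, and speaker confusion time, divided by the total reference speech time. The sketch below computes it from aggregate durations. It assumes those durations have already been obtained by aligning the system output with the reference and mapping hypothesised speakers to reference speakers; the function name and the numbers in the usage example are hypothetical, and this is not a substitute for the official scoring tool.

```python
def diarization_error_rate(missed, false_alarm, speaker_error, total_speech):
    """Compute DER from aggregate durations, all given in seconds.

    missed:        reference speech that the system labelled as non-speech.
    false_alarm:   non-speech that the system labelled as speech.
    speaker_error: speech attributed to the wrong speaker after mapping.
    total_speech:  total duration of speech in the reference annotation.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + speaker_error) / total_speech


# Hypothetical values for a 10-minute recording (invented for illustration).
der = diarization_error_rate(
    missed=30.0, false_alarm=12.0, speaker_error=48.0, total_speech=600.0
)
print(f"DER = {der:.1%}")
```

Because false alarm time is counted in the numerator but not in the denominator, DER can in principle exceed 100%.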

Perceptive and Cognitive Systems
Swiss National Science Foundation
Nov 01, 2011
Oct 31, 2014