
ICPR10

Fusing Audio-Visual Nonverbal Cues to Detect Dominant People in Conversations

Oya Aran and Daniel Gatica-Perez

In the 20th Int'l Conf. on Pattern Recognition (ICPR), Istanbul, Turkey, 2010


Download pdf file

This work addresses the problem of fusing audio and visual nonverbal cues to estimate the most dominant person in small-group interactions, and shows that higher accuracies can be achieved by combining multimodal cues. In addition to audio and visual nonverbal features, we present a new set of audio-visual features that combines audio and visual information at the feature-extraction level. We also apply score-level and rank-level fusion techniques for multimodal fusion. For dominance estimation, we use simple rule-based estimators, which do not require labeled training data.

On two dominance estimation tasks performed on the DOME dataset, i.e., estimating the most dominant (MD) person and estimating the least dominant (LD) person, our experiments show that visual information is necessary and that the highest accuracies can only be achieved by combining audio and visual cues.
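
As a rough illustration of the score-level and rank-level fusion mentioned above (a minimal sketch, not the paper's exact formulation), the snippet below combines per-cue dominance scores for a small group: score-level fusion takes a weighted sum of normalized cue scores, while rank-level fusion sums per-cue ranks (Borda-style). The cue values, weights, and function names are illustrative assumptions.

```python
import numpy as np

def score_level_fusion(cue_scores, weights=None):
    """Weighted sum of per-cue scores; cue_scores has shape (n_cues, n_people)."""
    cue_scores = np.asarray(cue_scores, dtype=float)
    if weights is None:
        weights = np.ones(cue_scores.shape[0])
    # Min-max normalize each cue so scores are comparable before combining.
    norm = (cue_scores - cue_scores.min(axis=1, keepdims=True)) / \
           (np.ptp(cue_scores, axis=1, keepdims=True) + 1e-9)
    return np.average(norm, axis=0, weights=weights)

def rank_level_fusion(cue_scores):
    """Borda-style fusion: sum of per-cue ranks (higher rank = more dominant)."""
    cue_scores = np.asarray(cue_scores, dtype=float)
    ranks = cue_scores.argsort(axis=1).argsort(axis=1)  # 0 = lowest score
    return ranks.sum(axis=0)

# Illustrative example: 2 cues (one audio-based, one visual-based) for 4 people.
cues = [[0.70, 0.20, 0.05, 0.05],   # e.g., a speaking-activity score
        [0.40, 0.35, 0.15, 0.10]]   # e.g., a visual-activity score
print("MD (score fusion):", int(np.argmax(score_level_fusion(cues))))
print("MD (rank fusion): ", int(np.argmax(rank_level_fusion(cues))))
print("LD (score fusion):", int(np.argmin(score_level_fusion(cues))))
```

Both estimators are simple rule-based selectors in the spirit described above: they pick the participant with the highest (MD) or lowest (LD) fused score, with no labeled training data involved.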