Robust face tracking, feature extraction and multimodal fusion for audio-visual speech recognition and visual attention modeling in complex environment
Abstract
Human communication combines speech and non-verbal behavior, and a significant part of the non-verbal information is carried by facial movements and expressions. A major step in the automatic analysis of human communication is therefore locating and tracking human faces. In this project, we will first tackle the problem of robust face tracking, that is, the continuous estimation of head pose and facial animations in video sequences. Building on this first development, two subsequent workpackages will address important building blocks for the automatic analysis of natural scenes: automatic audio-visual speech recognition and Visual Focus of Attention (VFOA) analysis. Both rely heavily on robust face tracking and will therefore directly exploit the results of the first workpackage.
Partners
EPFL - Signal Processing Laboratory 5 (LTS5)
Idiap Research Institute

