AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking
The AV16.3 corpus is an audio-visual corpus of real indoor multispeaker data, designed to test algorithms for audio-only, video-only and audio-visual speaker localization and tracking.
Real human speakers were used. The variety of recordings was chosen to test algorithms to their limits and to cover a wide range of application scenarios (meetings, surveillance). The emphasis is on overlapped speech and multiple moving speakers. Most recordings are dynamic scenarios with single or multiple moving speakers. A few meeting scenarios, with mostly seated speakers, are also included.
The full database description and the download are available here: AV16.3