Wall Street Journal (WSJ)
The Multi-Channel Wall Street Journal Audio-Visual (MC-WSJ-AV) corpus offers an intermediate task between simple digit
recognition and large-vocabulary conversational speech recognition. The
corpus consists of read Wall Street Journal sentences taken from the
test set of the WSJCAM0 database, recorded in the instrumented meeting
rooms constructed for the recording of the AMI Meeting Corpus. The
sentences are read by a range of speakers (some 45 in total) with
varying accents (including a number of non-native English speakers).
Sentences are read in several scenarios, including a single
stationary speaker, a single moving speaker, and multiple concurrent
speakers. During recordings, all speakers wear lapel and headset
microphones, and audio from two eight-element microphone arrays is also
captured. The rooms also provide synchronised video recordings including
close-up views of the speakers' faces, as well as wide-angle views of
the entire room. The data is suitable for a wide variety of research
tasks, including:
- development of microphone array ASR front-end processing systems
- audio-visual ASR
- audio-visual person tracking
- integration of audio-visual person tracking with microphone array ASR processing
- recognition of accented and non-native English speech
- recognition of overlapped speech
Related publication: The Multi-Channel Wall Street Journal Audio Visual Corpus (MC-WSJ-AV): Specification and Initial Experiments
