Multi-channel Overlapping Numbers Corpus


As part of our early microphone array research at IDIAP, we made a multi-channel recording of the Numbers speech recognition corpus. This multi-channel corpus is publicly available to support microphone array speech recognition research. Full details of the corpus are given in the report "MONC: The Multichannel Overlapping Numbers Corpus". The corpus is distributed by the Center for Spoken Language Understanding at OGI.

The database and initial experiments were described in the following paper:

D. Moore and I. McCowan. Microphone Array Speech Recognition: Experiments on Overlapping Speech in Meetings. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, April 2003.


Multi-Channel Wall Street Journal Audio-Visual (MC-WSJ-AV) Corpus


The MC-WSJ-AV corpus offers a task of intermediate difficulty between simple digit recognition and large vocabulary conversational speech recognition. The corpus consists of read Wall Street Journal sentences taken from the test set of the WSJCAM0 database, recorded in the instrumented meeting rooms built for the recording of the AMI Meeting Corpus. The sentences are read by about 45 speakers with varying accents, including a number of non-native English speakers. Sentences are read under several scenarios: a single stationary speaker, a single moving speaker, and multiple concurrent speakers. During recordings, all speakers wear lapel and headset microphones, and audio from two eight-element microphone arrays is also captured. The rooms also provide synchronised video recordings, including close-up views of the speakers' faces and wide-angle views of the entire room.

The data is suitable for a wide variety of research tasks including: development of microphone array ASR front-end processing systems, audio-visual ASR, audio-visual person tracking, integration of audio-visual person tracking with microphone array ASR processing, recognition of accented and non-native English speech, and recognition of overlapped speech.

The corpus is publicly available from the IDIAP MultiModal Media Fileserver. The database and initial experiments were described in the following paper:

M. Lincoln, I. McCowan, J. Vepa, and H. Krishna Maganti. The Multi-Channel Wall Street Journal Audio-Visual Corpus (MC-WSJ-AV): Specification and Initial Experiments. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2005.



The AMI Meeting Corpus


The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings. The recordings comprise a range of signals synchronized to a common timeline. These include close-talking and array microphones, individual and room-view video cameras, and output from a slide projector and an electronic whiteboard. During the meetings, the participants also had unsynchronized digital pens that recorded what they wrote. The meetings were recorded in English in three rooms with different acoustic properties, and the participants are mostly non-native speakers.

The corpus is publicly available from the AMI Corpus web site, which also provides a more detailed overview of the corpus. The corpus design and recording setup were described in the following paper:

I. McCowan, J. Carletta, W. Kraaij, S. Ashby, S. Bourban, M. Flynn, M. Guillemot, T. Hain, J. Kadlec, V. Karaiskos, M. Kronenthal, G. Lathoud, M. Lincoln, A. Lisowska, W. Post, D. Reidsma, and P. Wellner. The AMI Meeting Corpus. In Proceedings of the 5th International Conference on Methods and Techniques in Behavioral Research, September 2005.


Other Data


Some of the other corpora listed on the IDIAP MultiModal Media Fileserver contain microphone array recordings.