ICSI meeting corpus - Home
The ICSI Meeting Corpus is a collection of 75 meetings -- including
simultaneous multi-channel audio recordings, word-level orthographic
transcriptions, and supporting documentation -- collected at the
International Computer Science Institute in Berkeley during the years 2000-2002. The
meetings included are "natural" meetings in the sense that they would
have occurred anyway: they are generally regular weekly meetings of
various ICSI working teams, including the team working on the ICSI
Meeting Project. In recording meetings of this type, we hoped to
capture meeting dynamics and speaking styles that are as natural as
possible given that speakers are wearing close-talking microphones and
are fully cognizant of the recording process. The meetings included
here range in length from 17 to 103 minutes, but generally run just
under an hour each. The collection includes a total of approximately 72
hours of Meeting Room speech.
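As a quick consistency check on the figures above, the average meeting
length can be worked out from the totals. This is only an illustrative
calculation: the variable names are made up for the example, and the
72-hour total is itself approximate.

```python
# Figures quoted in the overview above.
NUM_MEETINGS = 75
TOTAL_HOURS = 72  # approximate total of recorded speech

# Average meeting length in minutes.
average_minutes = TOTAL_HOURS * 60 / NUM_MEETINGS
print(f"average meeting length: about {average_minutes:.1f} minutes")
```

The result, roughly 57.6 minutes, agrees with the statement that meetings
generally run just under an hour.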
This document contains an overview
of the Meeting Project, including the collection, transcription, and
data preparation process. Further details are provided in the other
documentation in this directory.
As part of this release, we provide:
* audio -- for each of the 75 meetings, a directory containing
simultaneous recordings of up to 16 channels: close-talking channels
for each participant, plus 6 table-top mics.
* transcripts -- for each meeting, a word-level orthographic transcription, plus
annotations of speech and nonspeech events and general meeting
information, available in the form of an "MRT" file, an XML format
designed for this corpus.
* doc -- in addition to this overview, files describing the transcription
conventions, the MRT
specification, a table of enrolled speakers, and other useful
information.
The Recording Set-up
The meetings were simultaneously recorded using close-talking microphones
for each speaker (generally head-mounted, but early meetings contain
some lapel mics), as well as six table-top microphones: 4 high-quality
omnidirectional PZM microphones arrayed down the center of the
conference table, and 2 inexpensive microphone elements mounted on a
mock PDA. See the "naming.txt" or "naming.html" and "seatingchart.txt"
files in the doc directory for further details.
The data were collected at a 48 kHz sample rate, downsampled on the fly to
16 kHz. Audio files for each meeting are provided as separate time-synchronous
recordings for each channel, encoded as 16-bit linear (big-endian)
wavefiles, shorten-compressed in NIST SPHERE format. (Consult the "Known
Problems, Useful Facts" section below for an important note on the
synchronicity of the recordings.)
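To make the sample format concrete, the following is a minimal sketch of
how 16-bit big-endian linear samples might be decoded once a channel has
been decompressed and its SPHERE header removed by an external tool. It
does not read SPHERE or shorten data itself. The function names are
hypothetical, and the samples are assumed to be signed, as is conventional
for 16-bit linear PCM.

```python
import struct

SAMPLE_RATE_HZ = 16_000   # delivered sample rate stated above
BYTES_PER_SAMPLE = 2      # 16-bit linear samples


def decode_pcm16_be(raw: bytes) -> list:
    """Decode raw big-endian signed 16-bit samples into Python ints.

    Assumes `raw` holds only sample data (no header, already decompressed).
    A trailing odd byte, if any, is ignored.
    """
    count = len(raw) // BYTES_PER_SAMPLE
    return list(struct.unpack(f">{count}h", raw[: count * BYTES_PER_SAMPLE]))


def duration_seconds(num_samples: int) -> float:
    """Duration of one channel, given its sample count at 16 kHz."""
    return num_samples / SAMPLE_RATE_HZ


samples = decode_pcm16_be(b"\x00\x01\xff\xff\x80\x00")
print(samples, duration_seconds(len(samples)))
```

On a little-endian machine the ">" in the format string is what matters:
reading the same bytes with native byte order would give wrong sample
values.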
All meetings were recorded in the same instrumented meeting room (roughly
13 x 25 feet). The room contains a central conference table almost
completely filling the room, and can seat up to about 15 people (though we
were only equipped to record up to 10). Although we did not introduce the
convention until
part way through our collection process, later meetings identify the
seat number of each participant in order to support speaker
localization research and provide adjacency information. A diagram of
the set-up may be found in the "seatingchart.txt" document.
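The limit of 10 recorded participants, taken together with the six
table-top microphones, appears to account for the figure of up to 16
channels per meeting. The short sketch below spells out that arithmetic;
it is a reading of the numbers given in this overview, not a description
of the recording hardware, and the names are illustrative.

```python
# Counts taken from this overview.
MAX_CLOSE_TALKING = 10              # participants that could be recorded
TABLE_TOP = {"PZM": 4, "PDA": 2}    # table-top microphones by type

# One channel per close-talking mic plus one per table-top mic.
max_channels = MAX_CLOSE_TALKING + sum(TABLE_TOP.values())
print(f"maximum simultaneous channels: {max_channels}")
```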
The meeting room contains whiteboards along three walls and is equipped
with projection equipment; people writing on whiteboards or projecting
slides can occasionally be heard during these recordings.
However, no video is available to supplement the audio recordings. The low-level
hum of the meeting room lights and fan is also audible, particularly on
the far-field mics. Noise from the nearby elevators and hallway
conversation is also occasionally heard.
