Mediaparl

MediaParl is a Swiss-accented bilingual database containing recordings in both French and German as they are spoken in Switzerland. The data were recorded at the Valais Parliament. Valais is a bilingual Swiss canton with many local accents and dialects. The database therefore contains highly variable data and is suitable for studying multilingual, accented and non-native speech recognition, as well as language identification and language switch detection.

The corpus is partitioned into training, development and test sets. Since we focus on bilingual (accented, non-native) speech, the test set (MediaParl-TST) contains all the speakers who speak in both languages. The remaining speakers (non-bilingual) have been randomly assigned to the training (MediaParl-TRN) and development sets (MediaParl-DEV) in a proportion of 9 to 1.
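The partitioning rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to build the corpus: the function name `partition_speakers`, the `speaker_languages` mapping, the fixed random seed, and the assumption that the 9-to-1 proportion applies to speaker counts (rather than to the amount of audio) are all hypothetical.

```python
import random


def partition_speakers(speaker_languages, dev_fraction=0.1, seed=0):
    """Split speakers into TRN/DEV/TST sets.

    speaker_languages: dict mapping a speaker ID to the set of languages
    that speaker uses in the recordings (hypothetical input format).
    """
    # Speakers who use more than one language all go to the test set.
    test = sorted(s for s, langs in speaker_languages.items() if len(langs) > 1)

    # The remaining (non-bilingual) speakers are shuffled and split at random.
    rest = sorted(s for s, langs in speaker_languages.items() if len(langs) <= 1)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(rest)

    # Assumption: the 9-to-1 proportion is applied to the number of speakers.
    n_dev = round(len(rest) * dev_fraction)
    dev = sorted(rest[:n_dev])
    train = sorted(rest[n_dev:])

    return {"MediaParl-TRN": train, "MediaParl-DEV": dev, "MediaParl-TST": test}


# Toy example with made-up speaker IDs: 20 monolingual and 3 bilingual speakers.
toy = {f"fr{i:02d}": {"fr"} for i in range(10)}
toy.update({f"de{i:02d}": {"de"} for i in range(10)})
toy.update({f"bi{i:02d}": {"fr", "de"} for i in range(3)})
sets = partition_speakers(toy)
```

Splitting by speaker rather than by sentence keeps every speaker in exactly one set, so the development and test sets contain only voices that were not seen in training.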

MediaParl-TRN contains 11,425 sentences (5,471 in French and 5,955 in German) spoken by 180 different speakers. MediaParl-DEV contains 1,525 sentences (646 in French and 879 in German) from 17 different speakers. MediaParl-TST contains 2,617 sentences (925 in French and 1,692 in German) from 7 different speakers. Each test speaker uses both languages, but we assume that each speaker naturally speaks more often in their mother tongue. Four of the test speakers are native German speakers and three are native French speakers.

Acknowledgements

All publications that report on research that uses the corpus will acknowledge the MediaParl database as follows: "(Portions of) the research in this paper used the MediaParl Corpus made available by the Idiap Research Institute, Martigny, Switzerland and owned by the State of Valais, Switzerland." They will also refer to the following publication: "MediaParl: Bilingual mixed language accented speech database", David Imseng, Hervé Bourlard, Holger Caesar, Philip N. Garner, Gwénolé Lecorvé and Alexandre Nanchen, in: Proceedings of the 2012 IEEE Workshop on Spoken Language Technology, 2012.

Download:

Mediaparl
Resource Information
Resource type: database
URL: https://www.idiap.ch/dataset/mediaparl
Date: Feb 01, 2013
Size: 4.8GB
Ownership: State of Valais
Distribution: Web
License: Research use

Contact: Alexandre NANCHEN
+41 277 217 791