Cross-Lingual Adaptation for Text to Speech Synthesis (CLAS3)

Recent advances in statistical text-to-speech (TTS) synthesis have enabled voice personalization through adaptation techniques normally associated with automatic speech recognition (ASR). Such techniques allow a synthetic voice to be adapted to match a target speaker using only a short recording of that speaker.

Cross-language adaptation has the potential to enable personalized translation services. However, while adaptation works well within a single language, how to perform it across languages remains an open research problem. Research at Idiap under the FP7 EMIME project has demonstrated that cross-lingual adaptation is feasible. One outstanding difficulty is separating speaker-specific characteristics from language-specific characteristics. In this project, we propose to use bilingual speakers to separate language and speaker characteristics.

Perceptive and Cognitive Systems
Hasler Stiftung (Hasler Foundation)
Nov 01, 2011
Aug 31, 2012