Spoken Interaction with Interpretation in Switzerland

Speech-to-speech translation (S2ST) is the translation of spoken sentences from one language to another. S2ST is not yet a mature technology but an active topic in the research community. Commonly, an S2ST system consists of three components: speech-to-text conversion, machine translation, and text-to-speech conversion. Such a system translates only the words and neglects the personality associated with them. The translated speech therefore reflects the original only incompletely.
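The three-component chain described above can be sketched as follows. This is a minimal illustration only, not the SIWIS implementation: the class name CascadedS2ST and the toy stand-in functions are hypothetical, and real recognition, translation and synthesis engines would replace them. It shows why such a cascade loses speaker identity and prosody: only plain text is passed from one stage to the next.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadedS2ST:
    """Hypothetical three-stage cascade: recognise -> translate -> synthesise."""
    recognise: Callable[[bytes], str]    # speech-to-text
    translate: Callable[[str], str]      # machine translation
    synthesise: Callable[[str], bytes]   # text-to-speech

    def __call__(self, source_audio: bytes) -> bytes:
        # Only text crosses the stage boundaries, so voice identity,
        # emphasis and other prosodic cues are discarded here.
        source_text = self.recognise(source_audio)
        target_text = self.translate(source_text)
        return self.synthesise(target_text)


# Toy stand-ins (assumptions for illustration): "audio" is UTF-8 encoded text.
def toy_recognise(audio: bytes) -> str:
    return audio.decode("utf-8")


TOY_LEXICON = {"guten": "good", "tag": "day"}


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in text.split())


def toy_synthesise(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedS2ST(toy_recognise, toy_translate, toy_synthesise)
result = pipeline(b"Guten Tag")
```

Because each stage is an ordinary callable, a component that also carries speaker or prosodic information would require widening these interfaces beyond plain text.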

The SIWIS project aims for a better S2ST system that also translates important cues in speech such as the identity of the speaker, focus, contrast or emphasis, and therefore conveys the user's intention more naturally and completely. Furthermore, the S2ST system should be adaptive in two respects: the speech-to-text component should adapt to the user's voice in order to optimise the speech recognition rate, and the text-to-speech component should adapt so that the user can define the sound of the generated speech by supplying a few speech samples.

Switzerland is an ideal place for S2ST research, because the country works day to day in five different languages simultaneously. Four of these are national languages; the fifth, English, is as important as any of them for international communication. The situation is even more complicated if one also considers local accents and dialects. This language mix leads to obvious difficulties, with many people working and even living in a non-native language. More positively, however, this multi-lingual setting makes Switzerland a natural place for multi-lingual research, not only as a geographic location to conduct research, but also as the country most likely to benefit from the results of such research.

The SIWIS project will begin from a baseline defined by the union of the output of the EU FP7 EMIME project and the complementary expertise of four partner institutions. In a series of core tasks, the partners will pool resources to place the Swiss language research community at the state of the art in speech-to-speech translation in the major languages of Switzerland and Swiss commerce. In a series of group tasks, the partners will advance the state of the art in the field, capitalising on the unique location, language mix and expertise of Switzerland and the partners.

All tasks will focus on one or more of the following common themes:

Swiss languages. Whilst a focus on Swiss languages is not a research issue in itself, the Swiss locality puts SIWIS in a position to focus data collection and enable research that can only take place effectively in a multi-lingual environment.

Translation. We have at our disposal a capable translation framework, hence an end-to-end recognition, translation and synthesis chain.

Prosody. Prosody will be a significant research focus of SIWIS. That is, we will translate not only spoken words, but also important prosodic cues associated with them. The concept of prosody transfer across distinct languages and speakers is a largely untouched research area. It is something that can only be attacked given the pooled recognition, translation and synthesis resources of a consortium.

Cross-lingual adaptation. Adaptation is the process that allows speech synthesis to mimic the voice of a speaker in another language. This will be driven by the unique availability of bilingual speakers in Switzerland.

The research will result in a unique speech-to-speech translation capability, with the synthesis in the target language mimicking both the spectral and prosodic characteristics of the speaker in the source language.

Application Area - Health and bioengineering, Application Area - Human Machine Interaction, Machine Learning
Eidgenoessische Technische Hochschule Zuerich, University of Edinburgh, University of Geneva
Swiss National Science Foundation
N/A
Dec 01, 2012
Nov 30, 2016