Swisscom TV Box learns to transcribe Swiss German dialects

The Speech and Audio Processing group at Idiap, in collaboration with Swisscom, is improving Swiss German speech recognition for the voice assistant of the Swisscom TV Box through a multi-dialect approach.

In Switzerland, Standard German, as spoken in Germany, is in a so-called diglossic relationship with the Swiss German dialects spoken in the German-speaking Swiss cantons. Standard German is used in specific social and political contexts. Standard Swiss German, also called "Swiss German" or Schweizer Schriftdeutsch, is a mix of Standard German and the written form of German officially used in Switzerland. It is used in books, newspapers and all official publications. However, this Standard Swiss German is not spoken. Written Swiss German appeared only recently (especially in texting and chats), so no spelling conventions exist for the Swiss German varieties.

The official form of Swiss German differs from Standard German at every level of linguistic analysis: vocabulary, pronunciation, spelling and even syntax. These differences are known as Helvetisms. Every Swiss German dialect has its own characteristics. Fortunately, Swiss German is the best-studied dialect area in Central Europe. A phonetic transcription system, called Dieth, is used in most scientific research to transcribe Swiss German dialects. This system uses Standard German spelling as a starting point, but deviates from it where that spelling is inconsistent or lacks the detail needed to describe the various Swiss dialects. Because of dialect variation, the same word is pronounced and written differently from region to region. To establish the identity of a term across all its written variants, the variants must be standardized into a single form.
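
The standardization step described above can be pictured as a lookup from variant spellings to one canonical form. The following is only a minimal sketch under stated assumptions: the table `VARIANTS`, the function `normalize`, the example spellings, and the choice of Standard German as the canonical form are all hypothetical illustrations, not the method or data used by Idiap and Swisscom.

```python
# Hypothetical variant table: each dialect spelling maps to one canonical form.
# The entries are illustrative only and are not taken from the project data.
# Using the Standard German word as the canonical form is an assumption.
VARIANTS = {
    "nöd": "nicht", "nid": "nicht", "nit": "nicht", "ned": "nicht",
    "öppis": "etwas", "eppis": "etwas",
    "gsi": "gewesen", "gsy": "gewesen",
}


def normalize(tokens):
    """Map known variant spellings to their canonical form.

    Returns a pair (normalized_tokens, unknown_tokens). A token is reported
    as unknown when it is neither a listed variant nor a canonical form.
    """
    canonical_forms = set(VARIANTS.values())
    normalized, unknown = [], []
    for token in tokens:
        key = token.lower()
        if key in VARIANTS:
            # Known dialect spelling: replace it with the single canonical form.
            normalized.append(VARIANTS[key])
        else:
            # Keep the token as-is; flag it if it is not a canonical form either.
            normalized.append(key)
            if key not in canonical_forms:
                unknown.append(key)
    return normalized, unknown


if __name__ == "__main__":
    print(normalize(["Nid", "öppis", "gsy", "fernseh"]))
```

A plain table like this cannot decide whether an unlisted token is an unseen dialect variant or a genuinely new word; it can only flag the token for further handling.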

Automatic speech recognition (ASR) for Swiss German is a considerable challenge because of the lack of available datasets and the strong regional variation. Idiap researchers, in collaboration with the Swisscom AI group, have designed a multi-dialect approach to word generation for Swiss German that deals with these variations. If a model comes across a term it doesn't recognize, it cannot by itself tell whether the expression is a variant from a given dialect or a new word. Thanks to a database containing the different variants, the researchers' model automatically learns to distinguish between a dialect term and a new word. The model is also able to transcribe sentences into Standard Swiss German. This work, carried out at Idiap on Swiss German multi-dialect data extracted from the voice assistant of the Swisscom TV Box, showed a significant improvement in ASR performance, in particular for dialects with little or no linguistic information.


More information

•    Speech & Audio Processing group