New opening for a postdoctoral position in the area of automatic speech recognition.

The Idiap Research Institute seeks qualified candidates for one post-doctoral research position in the area of automatic speech recognition in the Swiss language environment.

Spoken communication in Switzerland depends strongly on region, in terms of language as well as dialect and accent.

Experiments have shown that ASR trained for a given dialect or accent does not perform well on a related, but distinct, dialect or accent. For instance, a system trained for the Valaisan accent will perform worse on speech with a Vaud accent. This means that in a heterogeneous environment, ASR cannot currently work well. Practically, it is undesirable to train separate systems for each target group; intuitively, such systems should benefit from data sharing or combination.

Our current exemplar is the Valaisan parliament, in which the difficulty is further compounded in that native speakers of one language (typically French or German) are often required to speak another language. Non-native accents have also been shown by experiment to cause difficulty for ASR.

Finally, one would expect to be able to use, say, a French system trained on general French data to recognise Swiss French. Experiments have shown that this is also difficult: a combination of general French with Swiss French does not improve upon a purely Swiss French system. The situation for German is expected to be worse.

We hence seek a post-doctoral researcher to address the above problem. The task may be described as “Hierarchical adaptation to dialect and accent”. The successful candidate will investigate the following adaptation hierarchy:

1. Given a well-trained French or German system, how can it be adapted to work for Swiss-accented French or (standard “high”) German?

2. Given a well-trained Swiss French or German system, how can it be adapted to work for a given speaker or accent?

3. Given well-trained Swiss French and German systems, how can they be adapted (either independently or together) to work for a non-native speaker?

We expect the solution to the above to lie in the vast literature on acoustic model and pronunciation adaptation, in particular the concepts of maximum a posteriori (MAP) and maximum likelihood linear regression (MLLR) adaptation, accent-adaptive adaptation and pronunciation modelling. Initial work suggests that the Subspace Gaussian Mixture Model (SGMM) approach is promising, as it includes such adaptation in the model design.

The work will require some knowledge of:

  • Adaptation techniques in ASR.
  • The linguistics of accent and dialect differences in Switzerland. To this end, the post would suit a native speaker of French or, ideally, German.

We also envisage a significant amount of database and software integration work.

In parallel with the above, the work will need to be brought together into a demonstrable form. This will require assisting development engineers with the integration of, for instance, Idiap’s Juicer and Tracter ASR and signal processing frameworks with other solutions such as Kaldi.

We stress that the candidate will not work alone; the team will comprise developers as well as senior research scientists, and the work is closely related to other projects within the same group.

The applicant should have a PhD (or equivalent experience) in a subject related to the description above; strong computer skills are also essential. Preference will be given to a speaker of French or German. The applicant should also demonstrate good communication and writing skills in English. The position is funded initially for 18 months, with the hope that it will be extended. The post will remain open until filled.

More information about the position can be found on the Idiap online recruitment system.