Flexible Grapheme-Based Automatic Speech Recognition
There has long been interest in using the grapheme (orthographic) transcription of words directly, without explicit phonetic modelling. However, while this limits the variability at the word representation level, it weakens the link to the acoustic waveform (to a degree that depends on the language), since standard acoustic features characterize phonemes. Most recent attempts have either mapped the orthography of words onto HMM states using phonetic information, or extended conventional HMM-based ASR systems by improving context-dependent modelling of grapheme units.
The goal of the present project is to exploit new statistical models recently developed at Idiap that are potentially better suited to the grapheme representation of lexicon words, and to exploit both the grapheme representation and phoneme information in a principled way. This will be done by extending a novel acoustic modelling approach referred to as KL-HMM (Kullback-Leibler divergence based HMM), which has recently been shown to be much simpler and more flexible while yielding state-of-the-art performance (on phoneme-based ASR systems) and opening up multiple opportunities for further development and research. In a KL-HMM system, acoustic features are replaced by posterior probability distributions over elementary units (e.g. phonemes), and HMM states are modelled by multinomial distributions in that posterior space. We believe this can be generalized to grapheme-based systems. Moreover, when working in posterior probability spaces, it is much easier to combine evidence from multiple sources of information. The present project proposal is thus particularly well suited as a PhD project, since it will allow:
- Building upon a strong PhD thesis¹ and extending a new and very promising approach towards flexible speech recognition systems.
- Further investigating its generalization properties towards new types of models based on grapheme word representations.
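
To make the KL-HMM idea described above more concrete, the following minimal Python sketch shows how a local score could be computed when each HMM state is a multinomial distribution and each frame is a posterior distribution over elementary units. It is an illustration under stated assumptions, not the project's implementation: the function names `kl_divergence` and `best_state`, the toy three-unit distributions, the `eps` floor, and the choice of KL direction (state distribution as reference) are all hypothetical, and the text above does not specify which direction is used.

```python
import math


def kl_divergence(state_dist, posterior, eps=1e-12):
    """KL(state_dist || posterior) between two categorical distributions.

    Assumption: the state distribution is taken as the reference; other
    directions or a symmetric variant are equally plausible choices.
    """
    if len(state_dist) != len(posterior):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for y, z in zip(state_dist, posterior):
        if y > 0.0:
            # eps floors the posterior to avoid division by zero.
            total += y * math.log(y / max(z, eps))
    return total


def best_state(state_dists, posterior):
    """Return (index of the lowest-divergence state, all local scores)."""
    scores = [kl_divergence(y, posterior) for y in state_dists]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy example: two states over three hypothetical units.
states = [
    [0.8, 0.1, 0.1],  # state 0 favours the first unit
    [0.1, 0.8, 0.1],  # state 1 favours the second unit
]
frame_posterior = [0.7, 0.2, 0.1]  # posterior estimated for one frame

index, scores = best_state(states, frame_posterior)
print(index, scores)
```

In a complete system these per-frame scores would replace the usual emission likelihoods during decoding. Because the state distributions live in the posterior space rather than the acoustic feature space, the same scoring applies whether the states stand for phoneme or grapheme units.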

