Ramya Rasipuram awarded the EPFL PhD degree for her work on "Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling"

On October 1, 2014, Ramya Rasipuram publicly defended her PhD thesis, entitled "Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling". She received the EPFL PhD diploma from her thesis director, Hervé Bourlard.

One of the key challenges in building statistical automatic speech recognition (ASR) systems is modeling the relationship between subword units and acoustic features. Modeling this relationship requires two main resources: transcribed speech data, i.e., speech with word-level transcriptions, and a pronunciation dictionary in which each word is transcribed in terms of the basic sound units of the language, i.e., phones or phonemes. Creating these two resources for any language is expensive and time-consuming. The development of ASR systems for resource-rich languages (such as English or French) is less constrained by this issue. For under-resourced languages that lack such resources, however, it is a major bottleneck.

In this thesis, we introduced the framework of probabilistic lexical modeling, in which the relationship between subword units and acoustic features is factored through a latent variable into two models: an acoustic model and a lexical model. The acoustic model captures the relationship between latent variables and acoustic features, while the lexical model captures a probabilistic relationship between latent variables and subword units. We showed that, in the proposed framework: (1) the subword units can be graphemes, the units of written language, which makes pronunciation dictionary development easy; (2) the acoustic model can be trained on domain-independent or language-independent resources; and (3) the lexical model can be trained on a relatively small amount of transcribed speech data from the target domain or language for which the ASR system is to be built. The proposed approach facilitates the sharing of resources and models from resource-rich languages and requires few or even no conventional resources from the target language.
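The factorization described above can be illustrated with a small sketch. It is not taken from the thesis: it assumes the latent variable is discrete and is summed out, so that the score of a subword unit is the sum over latent variables of the acoustic model term times the lexical model term. The function name `subword_likelihood`, the two grapheme units, and all numbers are hypothetical and chosen only for illustration.

```python
# Toy illustration (assumed form, not the thesis implementation):
#   p(x | l) = sum_d p(x | a_d) * P(a_d | l)
# where x is an acoustic feature frame, a_d a latent variable, l a subword unit.

def subword_likelihood(acoustic_likelihoods, lexical_probs):
    """Combine acoustic-model and lexical-model terms by summing over latent variables."""
    if len(acoustic_likelihoods) != len(lexical_probs):
        raise ValueError("both inputs must cover the same latent variables")
    return sum(p_x * p_a for p_x, p_a in zip(acoustic_likelihoods, lexical_probs))

# Acoustic model: p(x | a_d) for three latent variables (made-up values).
acoustic = [0.5, 0.25, 0.25]

# Lexical model: P(a_d | l) for two hypothetical grapheme units; each row sums to 1.
lexical = {
    "c": [0.5, 0.5, 0.0],  # probabilistic: the grapheme maps to two latent variables
    "a": [0.0, 0.0, 1.0],  # deterministic: one-to-one mapping as a special case
}

scores = {unit: subword_likelihood(acoustic, probs) for unit, probs in lexical.items()}
print(scores)
```

In this sketch a deterministic lexical model is simply a row with a single entry equal to 1, which is why the probabilistic form can be seen as covering both cases.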

The potential and efficacy of the proposed approach are demonstrated through experiments and comparisons with other standard approaches on ASR for resource-rich languages, non-native and accented speech, under-resourced languages, and minority languages. The studies revealed that the proposed framework is particularly suitable when the task is challenged by the lack of both a pronunciation dictionary and sufficient transcribed speech data. Furthermore, the investigations showed that standard ASR approaches, in which the lexical model is deterministic, are better suited to a phone-based pronunciation dictionary than to a grapheme-based one, while the probabilistic lexical model based ASR approach proposed in the thesis is suitable for both.

Congratulations to her.

To download Ramya's thesis, click on the following link: Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling


Keywords: Automatic speech recognition; Kullback-Leibler divergence based hidden Markov model; lexicon; grapheme subword units; phoneme subword units; probabilistic lexical modeling; grapheme-based automatic speech recognition; grapheme-to-phoneme conversion; under-resourced speech recognition.