This software is a patch to the HMM-based statistical parametric speech synthesis toolkit (HTS 2.2). Vocal tract length normalization (VTLN) is a rapid adaptation technique that warps the spectral characteristics of speech to match the vocal tract length (and hence, for example, the gender) of the target speaker. The code estimates bilinear-transform-based warping factors for mel-generalized cepstral (MGCEP) features. VTLN adaptation can be performed either as a single global warping of the spectrum using base classes, or with multiple warping parameters for different phoneme classes using regression trees (similar to CMLLR adaptation). Please check the README file for more details on using the code. Please download the patch
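To illustrate what a bilinear-transform warping factor does to the frequency axis, here is a minimal Python sketch. It is not taken from the patch: the function names `bilinear_warp` and `warp_hz` are hypothetical, and the sign convention for the warping factor `alpha` is an assumption that may differ from the one used in the code. It uses the standard phase response of a first-order all-pass filter, where a frequency ω is mapped to ω + 2·atan(α sin ω / (1 − α cos ω)).

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency (radians, 0..pi).

    Uses the phase response of a first-order all-pass (bilinear) transform.
    alpha is the warping factor, assumed to satisfy |alpha| < 1.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_hz(freq_hz, sample_rate_hz, alpha):
    """Convenience wrapper (hypothetical helper): warp a frequency given in Hz."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    return bilinear_warp(omega, alpha) * sample_rate_hz / (2.0 * math.pi)


if __name__ == "__main__":
    # Illustrative values only: 16 kHz speech and a small warping factor.
    for f in (500.0, 1000.0, 2000.0, 4000.0):
        print(f, "->", round(warp_hz(f, 16000.0, 0.05), 1))
```

In this convention, `alpha = 0` leaves the spectrum unchanged, a positive `alpha` shifts frequencies between 0 and the Nyquist frequency upward, and a negative `alpha` shifts them downward; the endpoints 0 and Nyquist always map to themselves. A global VTLN transform would use one such `alpha` for all models, whereas the regression-tree variant would use a separate `alpha` per class.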

The following demos illustrate VTLN adaptation using a single adaptation sentence:

Example 1: WSJ (ASR) database (16 kHz speech)
Speech from the average voice model
Natural speech from Target female speaker
Speech synthesized using VTLN for the target female speaker

Example 2: Blizzard Challenge 2010 (RJS) database (48 kHz speech)
Speech from the speaker-dependent RJS model
Natural speech from target speaker (Roger)
Speech synthesized using VTLN for the target speaker (Roger)

For more details on the technique, please refer to:
L. Saheer, J. Dines, P. N. Garner, and H. Liang, “Implementation of VTLN for statistical speech synthesis”, in Proceedings of the 7th ISCA Speech Synthesis Workshop, Kyoto, Japan, September 2010, pp. 224–229.

