Publications

Thesis Publications

J. Dines. "Model based trainable speech synthesis and its applications", Ph.D. Thesis. Queensland University of Technology, Brisbane, Australia, 2003.

J. Dines. "Active Noise Control for Agricultural Machinery", Honours Thesis. University of Southern Queensland, Toowoomba, Australia, 1998.

Book Chapters

[7] Weifeng Li, Kenichi Kumatani, John Dines, Mathew Magimai Doss, Herve Bourlard, "A Neural Network Based Regression Approach for Recognising Simultaneous Speech" in: Andrei Popescu-Belis, Rainer Stiefelhagen, eds., Machine Learning for Multimodal Interaction, LNCS 5237, Springer-Verlag, Berlin/Heidelberg, 2008, p110-118.

[6] John Dines, Mathew Magimai Doss, "A Study of Phoneme and Grapheme Based Context Dependent ASR Systems" in: Andrei Popescu-Belis, Steve Renals, Herve Bourlard, eds., Machine Learning for Multimodal Interaction, LNCS 4892, Springer-Verlag, Berlin/Heidelberg, 2008, p215-226.

[5] Thomas Hain, Lukas Burget, John Dines, Giulia Garau, Martin Karafiat, David van Leeuwen, Mike Lincoln, Vincent Wan "The 2007 AMI(DA) System for Meeting Transcription" in: Rainer Stiefelhagen, Rachel Bowers, Jonathon Fiscus, eds., Multimodal Technologies for Perception of Humans, LNCS 4625, Springer-Verlag, Berlin/Heidelberg, 2008, p414-428.

[4] Darren Moore, John Dines, Mathew Magimai Doss, Jithendra Vepa, Octavian Cheng, Thomas Hain "Juicer: A Weighted Finite-State Transducer Speech Decoder" in: Steve Renals, Samy Bengio, Jonathon G. Fiscus, eds., Machine Learning for Multimodal Interaction, LNCS 4299, Springer-Verlag, Berlin/Heidelberg, 2006, p285-296.

[3] Thomas Hain, Lukas Burget, John Dines, Giulia Garau, Martin Karafiat, Mike Lincoln, Jithendra Vepa, Vincent Wan "The AMI Meeting Transcription System: Progress and Performance" in: Steve Renals, Samy Bengio, Jonathon G. Fiscus, eds., Machine Learning for Multimodal Interaction, LNCS 4299, Springer-Verlag, Berlin/Heidelberg, 2006, p419-431.

[2] Thomas Hain, Lukas Burget, John Dines, Giulia Garau, Martin Karafiat, Mike Lincoln, Iain McCowan, Darren Moore, Vincent Wan, Roeland Ordelman, Steve Renals "The 2005 AMI System for the Transcription of Speech in Meetings" in: Steve Renals, Samy Bengio, eds., Machine Learning for Multimodal Interaction, LNCS 3869, Springer-Verlag, Berlin/Heidelberg, 2006, p450-462.

[1] Thomas Hain, Lukas Burget, John Dines, Giulia Garau, Martin Karafiat, Mike Lincoln, Iain McCowan, Darren Moore, Vincent Wan, Roeland Ordelman, Steve Renals "The Development of the AMI System for the Transcription of Speech in Meetings" in: Steve Renals, Samy Bengio, eds., Machine Learning for Multimodal Interaction, LNCS 3869, Springer-Verlag, Berlin/Heidelberg, 2006, p344-356.

Journal Publications

[2] John Dines, Junichi Yamagishi, Simon King "Measuring the Gap between HMM-based ASR and TTS" in: IEEE Journal of Selected Topics in Signal Processing (accepted for publication).

[1] Junichi Yamagishi, Bela Usabaev, Simon King, Oliver Watts, John Dines, Jilei Tian, Yong Guan, Rile Hu, Keiichiro Oura, Yi-Jian Wu, Keiichi Tokuda, Reima Karhila, Mikko Kurimo "Thousands of Voices for HMM-based Speech Synthesis - Analysis and Applications of TTS Systems Built on Various ASR Corpora" in: IEEE Transactions on Audio, Speech and Language Processing (accepted for publication).

Conference Publications

[32] Lakshmi Saheer, Philip N. Garner, John Dines, Hui Liang, "VTLN adaptation for statistical speech synthesis" accepted: ICASSP 2010 (Dallas, USA).

[31] Hui Liang, John Dines, Lakshmi Saheer, "A Comparison of Supervised and Unsupervised Cross-Lingual Speaker Adaptation Approaches for HMM-Based Speech Synthesis" accepted: ICASSP 2010 (Dallas, USA).

[30] Danil Korchagin, Philip N. Garner, John Dines, "Automatic Temporal Alignment of AV Data with Confidence Estimation" accepted: ICASSP 2010 (Dallas, USA).

[29] Junichi Yamagishi, Mike Lincoln, Simon King, John Dines, Mathew Gibson, Jilei Tian, Yong Guan, "Analysis of Unsupervised and Noise-Robust Speaker Adaptive HMM-based Speech Synthesis Systems toward a Unified ASR and TTS Framework" in: Proceedings of Blizzard Challenge Workshop, (Edinburgh, U.K.), 2009.

[28] Junichi Yamagishi, Bela Usabaev, Simon King, Oliver Watts, John Dines, Jilei Tian, Rile Hu, Yong Guan, Keiichiro Oura, Keiichi Tokuda, Reima Karhila, Mikko Kurimo, "Thousands of Voices for HMM-based Speech Synthesis" in: Proceedings of Interspeech, (Brighton, U.K.), 2009.

[27] Philip N. Garner, John Dines, Thomas Hain, Asmaa El Hannani, Martin Karafiat, Danil Korchagin, Mike Lincoln, Vincent Wan and Le Zhang, "Real-Time ASR from Meetings" in: Proceedings of Interspeech, (Brighton, U.K.), 2009.

[26] John Dines, Lakshmi Saheer and Hui Liang, "Speech recognition with speech synthesis models by marginalising over decision tree leaves" in: Proceedings of Interspeech, (Brighton, U.K.), 2009.

[25] John Dines, Junichi Yamagishi and Simon King, "Measuring the gap between HMM-based ASR and TTS" to appear in: Proceedings of Interspeech, (Brighton, U.K.), 2009.

[24] Weifeng Li, John Dines, Mathew Magimai.-Doss and Herve Bourlard, "Non-linear mapping for multi-channel speech separation and robust overlapping speech recognition", in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009.

[23] Vincent Wan, John Dines, Asmaa El Hannani, Thomas Hain, "Bob: A lexicon and pronunciation dictionary generator", in: Proceedings of Workshop on Spoken Language Technology (SLT), (Goa, India), 2008.

[22] Sarah Favre, Hugues Salamin, John Dines, Alessandro Vinciarelli, "Role Recognition in Multiparty Recordings using Social Affiliation Networks and Discrete Distributions", in Proceedings of Special Session on Social Signal Processing at ICMI, (Chania, Greece), 2008.

[21] Kenichi Kumatani, John McDonough, Barbara Rauch, Philip N. Garner, Weifeng Li, and John Dines, "Maximum Kurtosis Beamforming with the Generalized Sidelobe Canceller", in Proceedings of Interspeech-2008, (Brisbane, Australia), 2008.

[20] Weifeng Li, M. Magimai.-Doss, J. Dines, and H. Bourlard, "MLP-based Log Spectral Energy Mapping for Robust Overlapping Speech Recognition", in Proceedings of EUSIPCO, (Lausanne, Switzerland), 2008.

[19] Weifeng Li, J. Dines, M. Magimai.-Doss, and H. Bourlard, "Neural network based regression for robust overlapping speech recognition using microphone arrays", in Proceedings of Interspeech-2008, (Brisbane, Australia), 2008.

[18] Weifeng Li, K. Kumatani, J. Dines, M. Magimai.-Doss, and H. Bourlard, "A Neural Network based Regression Approach for Recognizing Simultaneous Speech", in Proceedings of Joint Workshop on Machine Learning and Multimodal Interaction, (Utrecht, Netherlands), September, 2008.

[17] Thomas Hain, Lukas Burget, Martin Karafiat, John Dines, David van Leeuwen, Giulia Garau, Mike Lincoln and Vincent Wan, "AMI/DA STT and SASTT", in Proceedings of RT07 Workshop, (Baltimore, USA), 10 May 2007.

[16] John Dines and Jithendra Vepa, "Direct optimisation of a multilayer perceptron for the estimation of cepstral mean and variance statistics", in Proceedings of Interspeech 2007 Eurospeech, (Antwerp, Belgium), 2007.

[15] John Dines and Mathew Magimai Doss, "A study of phoneme and grapheme based context-dependent ASR systems", in Proceedings of MLMI-07, (Brno, Czech Republic), 2007.

[14] Octavian Cheng, John Dines and Mathew Magimai Doss, "A Generalized dynamic composition algorithm of weighted finite state transducers for large vocabulary speech recognition", in Proceedings of ICASSP, (Honolulu, Hawaii), 2007.

[13] Thomas Hain, Lukas Burget, John Dines, Giulia Garau, Vincent Wan, Martin Karafiat, Jithendra Vepa and Mike Lincoln, "The AMI system for the transcription of speech in meetings", in Proceedings of ICASSP, (Honolulu, Hawaii), 2007.

[12] Thomas Hain, Lukas Burget, John Dines, Giulia Garau, Martin Karafiat, Mike Lincoln, Jithendra Vepa and Vincent Wan, "The AMI meeting transcription system: Progress and performance", in Proceedings of NIST RT'06 Workshop, (Washington, D.C.), 2006.

[11] John Dines, Jithendra Vepa, and Thomas Hain. "The segmentation of multi-channel meeting recordings for automatic speech recognition", in Interspeech 2006 ICSLP, (Pittsburgh), 2006.

[10] Darren Moore, John Dines, Mathew Magimai Doss, Jithendra Vepa, Octavian Cheng, Thomas Hain. "Juicer: A Weighted Finite State Transducer speech decoder", in MLMI-06, (Washington DC), 2006.

[9] Thomas Hain, Lukas Burget, John Dines, Giulia Garau, Martin Karafiat, Mike Lincoln, Iain McCowan, Darren Moore, Vincent Wan, Roeland Ordelman and Steve Renals. "The 2005 AMI System for the Transcription of Speech in Meetings", in NIST Spring 2005 Rich Transcription Workshop, (Edinburgh, Scotland), 2005.

[8] Thomas Hain, Lukas Burget, John Dines, Iain McCowan, Martin Karafiat, Mike Lincoln, Darren Moore, Giulia Garau, Vincent Wan, Roeland Ordelman, and Steve Renals. "The Development of the AMI System for the Transcription of Speech in Meetings", in MLMI, (Edinburgh, UK), 2005.

[7] Thomas Hain, John Dines, Giulia Garau, Martin Karafiat, Darren Moore, Vincent Wan, Roeland Ordelman and Steve Renals. "Transcription of Conference Room Meetings: an Investigation", in Eurospeech, (Lisbon, Portugal), 2005.

[6] G. Aradilla, J. Dines and S. Sivadas. "Using RASTA in task independent TANDEM feature extraction", in ICSLP, (Korea), 2004.

[5] J. Dines, S. Sridharan and M. Moody. "Speech segmentation with HMM", in Proceedings of the International Australian Speech Science and Technology Conference (SST-2002), (Melbourne, Australia), 2002.

[4] J. Dines, S. Sridharan and M. Moody. "Application of the trended hidden Markov model to speech synthesis", in Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), (Aalborg, Denmark), 2001.

[3] J. Dines and S. Sridharan. "Trainable speech synthesis with trended hidden Markov models", in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), (Salt Lake City, USA), 2001.

[2] J. Dines, S. Sridharan and M. Moody. "Compression of speech for mass storage using speech recognition and text-to-speech synthesis", in Proceedings of the International Australian Speech Science and Technology Conference (SST-2000), (Canberra, Australia), 2000.

[1] J. Dines and S. Sridharan. "A speaker independent phonetic vocoder for the English language", in Proceedings of the International Symposium on Signal Processing and Communications Systems (ISPACS), (Honolulu, USA), 2000.

Other

M. Magimai-Doss, J. Dines, H. Bourlard and H. Hermansky. "Phoneme vs grapheme based automatic speech recognition", IDIAP research report 04-48, 2004.

I. McCowan, D. Moore, J. Dines, D. Gatica-Perez, M. Flynn, P. Wellner, and H. Bourlard. "On the Use of Information Retrieval Measures for Speech Recognition Evaluation", IDIAP research report 04-73, 2004.