Srikanth Madikeri

Short Biography

Srikanth Madikeri received his Ph.D. in Computer Science and Engineering from the Indian Institute of Technology Madras in 2013. During his Ph.D., he worked on automatic speaker recognition and spoken keyword spotting. He is currently a Research Associate in the Speech Processing group at Idiap. His current research interests include automatic speech recognition for low-resource languages, automatic speaker recognition, and speaker diarization.

Education

  • Ph.D. in Computer Science and Engineering, IIT Madras (2008-2013)
  • Bachelor of Engineering in Computer Science and Engineering, Anna University, Chennai (2004-2008)

Professional Experience

  • Research Associate at Idiap Research Institute (2018-present)
  • Postdoctoral researcher at Idiap Research Institute (2013-2018)
  • 3 years as Research Associate at IIT Madras (2010-2013)
  • 2 years as Project Associate at IIT Madras (2008-2010)

Publications

Journals and Book Chapters (Updated 23-Aug-21)

  • N. Dawalatabad, S. Madikeri, C. C. Sekhar, and H. A. Murthy, "Novel architectures for unsupervised information bottleneck based speaker diarization of meetings", in IEEE Trans. on Audio, Speech, and Language Processing 2021.
  • I. Himawan, S. Madikeri, P. Motlicek, M. Cernak, S. Sridharan, and C. Fookes, "Voice Presentation Attack Detection Using Convolutional Neural Networks", Handbook of Biometric Anti-Spoofing, pp. 391-415. (Code on Github)
  • S. Dey, P. Motlicek, S. Madikeri, and M. Ferras, "Template-matching for text-dependent speaker verification", Speech Communication, Vol. 88, pp. 96-105.
  • M. Ferras, S. Madikeri, P. Motlicek, S. Dey and H. Bourlard, "A large-scale open-source acoustic simulator for speaker recognition", IEEE Signal Processing Letters, Vol. 23 (4), pp. 527-531. (Code on Github)
  • S. Madikeri, "A fast and scalable hybrid FA/PPCA-based framework for speaker recognition", in Digital Signal Processing, Vol. 32, pp. 137-145, September 2014. (Code hosted at IIT-M)
  • S. Madikeri, A. Talambedu, and H. A. Murthy, "Modified group delay feature based total variability space modelling for speaker recognition", International Journal of Speech Technology, Vol. 18(1), pp. 17-23.

Conferences (selected) (Updated 23-Aug-21)

  • S. Madikeri, P. Motlicek, H. Bourlard, "Multitask adaptation with Lattice-Free MMI for multi-genre speech recognition of low resource languages", in Proc. of Interspeech 2021
  • A. Vyas, S. Madikeri, H. Bourlard, "Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model", in Proc. of Interspeech 2021
  • S. Sarfjoo, S. Madikeri, P. Motlicek, "Speech Activity Detection Based on Multilingual Speech Recognition System", in Proc. of Interspeech 2021
  • R. Braun, S. Madikeri, P. Motlicek, "A Comparison of Methods for OOV-Word Recognition on a New Public Dataset", in Proc. of ICASSP 2021 (Code on Github)
  • A. Vyas, S. Madikeri, H. Bourlard, "Lattice-free MMI adaptation of self-supervised pretrained acoustic models", in Proc. of ICASSP 2021 (Code on Github)
  • S. Madikeri, B. Khonglah, S. Tong, P. Motlicek, H. Bourlard and D. Povey, "Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition System", in Proc. of Interspeech 2020. (Kaldi recipe)
  • S. Sarfjoo, S. Madikeri, P. Motlicek, S. Marcel, "Supervised domain adaptation for text-independent speaker verification using limited data", in Proc. of Interspeech 2020
  • B. Khonglah, et al., "Incremental Semi-supervised Learning for Multi-Genre Speech Recognition", in Proc. of IEEE ICASSP 2020. (pdf)
  • E. Boschee, et al., "SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage", in Proc. of the 57th Conference of the Association for Computational Linguistics: System Demonstrations, pp. 19-24. (link)
  • S. Madikeri, S. Dey, P. Motlicek, "A Bayesian Approach to Inter-task fusion for speaker recognition", in Proc. of ICASSP 2019, pp. 5786-5790.
  • S. Dey, S. Madikeri, and P. Motlicek, "End-to-end text-dependent speaker verification using novel distance measures", in Proc. of Interspeech 2018, pp. 3598-3602.
  • S. Madikeri, S. Dey, and P. Motlicek, "Analysis of Language Dependent Front-End for Speaker Recognition", in Proc. of Interspeech 2018, pp. 1101-1105.
  • S. Dey, S. Madikeri, and P. Motlicek, "Information theoretic clustering for unsupervised domain-adaptation", in Proc. of ICASSP 2016, pp. 5580-5584.
  • M. Ferras, S. Madikeri, P. Motlicek, and H. Bourlard, "System fusion and speaker linking for longitudinal diarization of TV shows", in Proc. of ICASSP 2016, pp. 5495-5499.
  • S. Dey, S. Madikeri, M. Ferras, and P. Motlicek, "Deep neural network based posteriors for text-dependent speaker verification", in Proc. of ICASSP 2016, pp. 5050-5054.
  • N. Dawalatabad, S. Madikeri, C. C. Sekhar, and H. A. Murthy, "Two-Pass IB Based Speaker Diarization System Using Meeting-Specific ANN Based Features", in Proc. of Interspeech 2016, pp. 2199-2203.
  • M. Ferras, S. Madikeri, S. Dey, P. Motlicek, and H. Bourlard, "Inter-Task System Fusion for Speaker Recognition", in Proc. of Interspeech 2016, pp. 1810-1814.
  • S. Madikeri and H. Bourlard, "KL-HMM based speaker diarization system for meetings", in Proc. of ICASSP 2015, Brisbane, Australia, pp. 4435-4439.
  • P. Motlicek, S. Dey, S. Madikeri, and L. Burget, "Employment of Subspace Gaussian Mixture Models in speaker recognition", in Proc. of ICASSP 2015, Brisbane, Australia, pp. 4445-4449.
  • I. Himawan, P. Motlicek, M. Ferras, S. Madikeri, "Towards utterance-based neural network adaptation in acoustic modeling", in Proc. of IEEE ASRU 2015.
  • S. Madikeri and H. Bourlard, "Filterbank slope based features for speaker diarization", in Proc. of ICASSP 2014, Florence, Italy, pp. 111-115.
  • S. Madikeri, "A Hybrid Factor Analysis and Probabilistic PCA-based system for Dictionary Learning and Encoding for Robust Speaker Recognition", in Proc. of Odyssey 2012: The Speaker and Language Recognition Workshop [pdf].
  • S. Madikeri and H. A. Murthy, "Mel Filter Bank energy-based Slope feature and its application to speaker recognition", in Proc. of the 2011 National Conference on Communications (NCC), pp. 1-4, 28-30 Jan. 2011, doi: 10.1109/NCC.2011.5734713
  • S. Madikeri and H. A. Murthy, "Discriminative training of Gaussian mixture speaker models: A new approach", in Proc. of the 2010 National Conference on Communications (NCC), pp. 1-5, 29-31 Jan. 2010, doi: 10.1109/NCC.2010.5430204

The full list of publications can be found here

Code/Toolkits

  • Pkwrap: a PyTorch wrapper for LF-MMI training in Kaldi [github-link]
  • Multilingual LF-MMI training: sample recipe is here
  • Standard i-vector implementation for Kaldi [github-link]
  • IB diarization toolkit (in C++) [Toolkit page]

Professional Activities and Awards

  • Winner of the International Create Challenge 2017
  • Best paper award at NCC 2011 for the paper titled "Discriminative training of Gaussian mixture speaker models: A new approach" in the Signal Processing Track

Current Projects

  • SUMMARIZATION AND DOMAIN-ADAPTIVE RETRIEVAL OF INFORMATION ACROSS LANGUAGES (SARAL): See here for the project description. Our work on this project involves building automatic speech recognition systems for low-resource languages (Tagalog, Swahili, Somali, Lithuanian and Bulgarian, so far) using techniques such as multilingual training and semi-supervised learning.
  • REAL TIME NETWORK, TEXT, AND SPEAKER ANALYTICS FOR COMBATING ORGANIZED CRIME (ROXANNE): See here for a brief description.

Past Projects

  • Speaker Identification Integrated Project: EU project with 17 partners, including LEAs (Law Enforcement Agencies). Our work focused on developing speaker identification engines and fusion modules that use metadata from gender identification and accent identification engines.
  • DimHA: We worked on developing fast speaker diarization systems using the Information Bottleneck (IB) framework.

Contact

E-mail: firstname dot lastname at idiap dot ch

Other connections

Google Scholar Github

Tel: +41 27 721 7743
Office: 304-4