Srikanth Madikeri

Google Scholar | GitHub

About me

I received my Ph.D. in Computer Science and Engineering from the Indian Institute of Technology Madras in 2013. During my Ph.D. I worked on automatic speaker recognition and spoken keyword spotting. I am currently a Research Associate in the Speech Processing group at Idiap. My research interests include Automatic Speech Recognition for low-resource languages with a focus on information extraction, Automatic Speaker Recognition, and Speaker Diarization.

Contact

E-mail: firstname dot lastname at idiap dot ch

Education

  • Ph.D. in Computer Science and Engineering at IIT Madras (2008-2013)
  • Bachelor of Engineering in Computer Science and Engineering, Anna University, Chennai (2004-2008)

Experience

  • Research Associate at Idiap Research Institute (2018-present)
  • Postdoctoral Researcher at Idiap Research Institute (2013-2018)
  • Research Associate at IIT Madras (2010-2013)
  • Project Associate at IIT Madras (2008-2010)

Publications

Full list of publications

Journals and Book Chapters

  • N. Dawalatabad, S. Madikeri, C. C. Sekhar, and H. A. Murthy, "Novel architectures for unsupervised information bottleneck based speaker diarization of meetings", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021.
  • I. Himawan, S. Madikeri, P. Motlicek, M. Cernak, S. Sridharan, and C. Fookes, "Voice Presentation Attack Detection Using Convolutional Neural Networks", Handbook of Biometric Anti-Spoofing, pp. 391-415. (Code)
  • S. Dey, P. Motlicek, S. Madikeri, M. Ferras, "Template-matching for text-dependent speaker verification", Speech Communication, Vol. 88, pp. 96-105.
  • M. Ferras, S. Madikeri, H. Bourlard, "Speaker Diarization and Linking of Meeting Data", IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24(11), pp. 1935-1945.
  • M. Ferras, S. Madikeri, P. Motlicek, S. Dey and H. Bourlard, "A large-scale open-source acoustic simulator for speaker recognition", IEEE Signal Processing Letters, Vol. 23(4), pp. 527-531. (Code)
  • S. Madikeri, "A fast and scalable hybrid FA/PPCA-based framework for speaker recognition", in Digital Signal Processing, Vol. 32, pp. 137-145, September 2014. (Code hosted at IIT-M)
  • S. Madikeri, A. Talambedu, and H. A. Murthy, "Modified group delay feature based total variability space modelling for speaker recognition", International Journal of Speech Technology, Vol. 18(1), pp. 17-23.

Conferences (selected)

  • E. Villatoro, S. Madikeri, P. Motlicek, A. Ganapathiraju, A. Ivanov, "Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings", in Proc. of SIGIR 2022, pp. 2669-2674.
  • S. Madikeri, P. Motlicek, H. Bourlard, "Multitask adaptation with Lattice-Free MMI for multi-genre speech recognition of low resource languages", in Proc. of Interspeech 2021, pp. 4329-4333.
  • A. Vyas, S. Madikeri, H. Bourlard, "Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model", in Proc. of Interspeech 2021, pp. 2861-2865.
  • S. Sarfjoo, S. Madikeri, P. Motlicek, "Speech Activity Detection Based on Multilingual Speech Recognition System", in Proc. of Interspeech 2021.
  • R. Braun, S. Madikeri, P. Motlicek, "A Comparison of Methods for OOV-Word Recognition on a New Public Dataset", in Proc. of ICASSP 2021.
  • A. Vyas, S. Madikeri, H. Bourlard, "Lattice-free MMI adaptation of self-supervised pretrained acoustic models", in Proc. of ICASSP 2021.
  • S. Madikeri, B. Khonglah, S. Tong, P. Motlicek, H. Bourlard and D. Povey, "Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition System", in Proc. of Interspeech 2020. (Kaldi recipe)
  • S. Sarfjoo, S. Madikeri, P. Motlicek, S. Marcel, "Supervised domain adaptation for text-independent speaker verification using limited data", in Proc. of Interspeech 2020.
  • B. Khonglah, et al., "Incremental Semi-supervised Learning for Multi-Genre Speech Recognition", in Proc. of ICASSP 2020.
  • E. Boschee, et al., "SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage", in Proc. of the 57th Conference of the Association for Computational Linguistics: System Demonstrations, pp. 19-24. (link)
  • S. Madikeri, S. Dey, P. Motlicek, "A Bayesian Approach to Inter-task fusion for speaker recognition", in Proc. of ICASSP 2019, pp. 5786-5790.
  • S. Dey, S. Madikeri, and P. Motlicek, "End-to-end text-dependent speaker verification using novel distance measures", in Proc. of Interspeech 2018, pp. 3598-3602.
  • S. Madikeri, S. Dey, and P. Motlicek, "Analysis of Language Dependent Front-End for Speaker Recognition", in Proc. of Interspeech 2018, pp. 1101-1105.
  • S. Dey, P. Motlicek, S. Madikeri, and M. Ferras, "Exploiting sequence information for text-dependent speaker verification", in Proc. of ICASSP 2017, pp. 5370-5374.
  • S. Dey, S. Madikeri, and P. Motlicek, "Information theoretic clustering for unsupervised domain-adaptation", in Proc. of ICASSP 2016, pp. 5580-5584.
  • M. Ferras, S. Madikeri, P. Motlicek, and H. Bourlard, "System fusion and speaker linking for longitudinal diarization of tv shows", in Proc. of ICASSP 2016, pp. 5495-5499.
  • S. Dey, S. Madikeri, M. Ferras, and P. Motlicek, "Deep neural network based posteriors for text-dependent speaker verification", in Proc. of ICASSP 2016, pp. 5050-5054.
  • N. Dawalatabad, S. Madikeri, C. C. Sekhar, and H. A. Murthy, "Two-Pass IB Based Speaker Diarization System Using Meeting-Specific ANN Based Features", in Proc. of Interspeech 2016, pp. 2199-2203.
  • M. Ferras, S. Madikeri, S. Dey, P. Motlicek, and H. Bourlard, "Inter-Task System Fusion for Speaker Recognition", in Proc. of Interspeech 2016, pp. 1810-1814.
  • S. Madikeri, and H. Bourlard, "KL-HMM based speaker diarization system for meetings", in Proc. of ICASSP 2015, Brisbane, Australia, pp. 4435-4439.
  • P. Motlicek, S. Dey, S. Madikeri, and L. Burget, "Employment of Subspace Gaussian Mixture Models in speaker recognition", in Proc. of ICASSP 2015, Brisbane, Australia, pp. 4445-4449.
  • I. Himawan, P. Motlicek, M. Ferras, S. Madikeri, "Towards utterance-based neural network adaptation in acoustic modeling", in Proc. of IEEE ASRU 2015.
  • S. Madikeri, and H. Bourlard, "Filterbank slope based features for speaker diarization", in Proc. of ICASSP 2014, Florence, Italy, pp. 111-115.
  • S. Madikeri, "A Hybrid Factor Analysis and Probabilistic PCA-based system for Dictionary Learning and Encoding for Robust Speaker Recognition", in Proc. of Odyssey 2012 - The Speaker and Language Recognition Workshop [pdf].
  • S. Madikeri and H. A. Murthy, "Mel Filter Bank energy-based Slope feature and its application to speaker recognition", in Proc. of the 2011 National Conference on Communications (NCC), pp. 1-4, Jan. 2011. doi: 10.1109/NCC.2011.5734713
  • S. Madikeri, and H. A. Murthy, "Discriminative training of Gaussian mixture speaker models: A new approach", in Proc. of the 2010 National Conference on Communications (NCC), pp. 1-5, Jan. 2010. doi: 10.1109/NCC.2010.5430204 (Best Paper Award in Signal Processing Track)

Code/Toolkits

  • Pkwrap: a PyTorch wrapper for LF-MMI training in Kaldi (arXiv)
  • Multilingual LF-MMI training: sample recipe is available here
  • Standard i-vector implementation for Kaldi
  • IB diarization toolkit (in C++)

Professional Activities and Awards

  • Winner of the International Create Challenge 2017
  • Best paper award at NCC 2010 for the paper titled "Discriminative training of Gaussian mixture speaker models: A new approach" in the Signal Processing Track

Current Projects

  • REAL TIME NETWORK, TEXT, AND SPEAKER ANALYTICS FOR COMBATING ORGANIZED CRIME (ROXANNE): See here for a brief description.

Past Projects

  • SUMMARIZATION AND DOMAIN-ADAPTIVE RETRIEVAL OF INFORMATION ACROSS LANGUAGES (SARAL): See here for project description. Our work for this project involves building Automatic Speech Recognition systems for low-resource languages (Tagalog, Swahili, Somali, Lithuanian and Bulgarian so far) using techniques such as multilingual training and semi-supervised learning.
  • Speaker Identification Integrated Project: EU project with 17 partners including LEAs (Law Enforcement Agencies). Our work focused on developing speaker identification engines and fusion modules that exploit metadata from gender identification and accent identification engines.
  • DimHA: We worked on developing fast speaker diarization systems using the Information Bottleneck (IB) framework.


Tel: +41 27 721 7743
Office: 304-4