Next week at ICRA (the International Conference on Robotics and Automation), Idiap will present a novel approach that allows robots to detect multiple simultaneous speakers.

The paper "Deep Neural Networks for Multiple Speaker Detection and Localization" by Weipeng He, Petr Motlicek and Jean-Marc Odobez shows that the deep learning-based method achieves 90% precision and recall for localizing multiple speakers in real robot recordings.

The authors propose neural networks for the simultaneous detection and localization of multiple sound sources in human-robot interaction. In contrast to conventional signal-processing techniques, neural network-based sound source localization methods require fewer strong assumptions about the environment. The likelihood-based encoding of the network output proposed by the authors naturally allows the detection of an arbitrary number of sources. Experiments on real data recorded from a robot show that the proposed methods significantly outperform the popular spatial spectrum-based approaches.
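To give a rough idea of what a likelihood-based output encoding can look like, here is a minimal, purely illustrative Python sketch. It is not the paper's implementation: the azimuth grid, Gaussian width and detection threshold are assumptions chosen for the example. The key point it shows is that, because the network outputs one likelihood per candidate direction, sources are recovered by thresholded peak picking, so any number of speakers (including none) can be detected.

```python
import numpy as np

# Candidate source directions (assumed 5-degree azimuth grid).
AZIMUTHS = np.arange(0, 360, 5)

def encode_likelihood(source_azimuths, sigma=8.0):
    """Target vector: a Gaussian-shaped peak around each true source direction."""
    target = np.zeros(len(AZIMUTHS))
    for az in source_azimuths:
        # Circular angular distance between each grid point and the source.
        diff = np.minimum(np.abs(AZIMUTHS - az), 360 - np.abs(AZIMUTHS - az))
        target = np.maximum(target, np.exp(-diff**2 / (2 * sigma**2)))
    return target

def decode_sources(likelihood, threshold=0.5):
    """Detect sources as local maxima above a confidence threshold."""
    detected = []
    n = len(likelihood)
    for i, value in enumerate(likelihood):
        if value >= threshold and value >= likelihood[i - 1] and value >= likelihood[(i + 1) % n]:
            detected.append(AZIMUTHS[i])
    return detected

# Example: two simultaneous speakers at 40 and 200 degrees.
target = encode_likelihood([40, 200])
print(decode_sources(target))  # -> [40, 200]
```

In a real system the likelihood vector would be predicted by the trained network from the microphone-array signals; the decoding step stays the same, which is what makes the number of detected sources flexible.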

This research will be presented at the IEEE International Conference on Robotics and Automation (ICRA) 2018 in Brisbane, Australia, in May 2018. Take a look at the spotlight video, which describes the approach in detail and demonstrates the results.