Swiss algorithms to help customers in a Finnish shopping mall

Human-robot interactions often lack fluidity, especially outside the lab. Today, researchers from Idiap are publishing, in open access, the algorithms that allowed a robot to operate in real conditions in a shopping mall in Finland as part of the European project Mummer.

Answering customers’ questions in a shopping mall is a particularly mundane task. But for a robot, it is quite a challenge. Of course, to perform this task, the robot must be able to understand the questions. Beyond that, to converse without confusion, the robot must be able to detect the people around it, identify those who are interested in it, distinguish a discussion between two people from speech addressed to the robot, and check whether people are paying attention. “The aim is to develop algorithms that analyze the audio and video signals captured by the robot to extract non-verbal communication elements,” explains Jean-Marc Odobez, head of the Perception and Activity Understanding group. Having a robot interact in everyday locations is an additional challenge. Researchers from Valais and across Europe took it up in a shopping mall a two-hour drive from Helsinki, where they tested and refined their technologies. Thanks to Idiap’s work, the experiment showed that speakers can be recognized and identified rapidly in the video captured by the robot under real-life lighting conditions.

Continuous speaker recognition and an open-source approach

“Confusing one person with another is the main technical obstacle to using robots in a public area, where speakers are numerous. A single error is enough for the robot to lose track of the conversation and to restart the dialogue from the beginning or continue with the wrong speaker,” emphasizes Odobez. To evaluate and improve the robot’s performance, researchers recorded many interactions in real-life conditions. Then, they created a database containing a description of each interaction: where the speaker’s gaze is directed, who the speaker is addressing, etc. The last step was to run the algorithm on these interactions, compare its results with the descriptions, and refine the algorithm. Unique in the human-robot interaction field, this database and the algorithms were shared with the scientific community so that future improvements can be evaluated. They were published and presented at the specialized IEEE RO-MAN 2020 conference.
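To make the evaluation step concrete, here is a minimal sketch of how an algorithm's output could be scored against annotated interactions. The data layout (one addressee label per video frame) and the names `addressee_accuracy`, `annotations` and `predictions` are assumptions made for illustration; they are not taken from the published database or from Idiap's code.

```python
def addressee_accuracy(annotations, predictions):
    """Fraction of annotated frames where the predicted addressee matches.

    Both arguments map a frame index to a label such as "robot" or
    "other_person" (hypothetical labels). Frames without a prediction
    count as errors.
    """
    if not annotations:
        raise ValueError("no annotated frames to evaluate")
    correct = sum(
        1 for frame, label in annotations.items()
        if predictions.get(frame) == label
    )
    return correct / len(annotations)


# Hypothetical example: four annotated frames, one wrong prediction.
annotations = {0: "robot", 1: "robot", 2: "other_person", 3: "robot"}
predictions = {0: "robot", 1: "other_person", 2: "other_person", 3: "robot"}
score = addressee_accuracy(annotations, predictions)
```

A single number like this makes it possible to compare two versions of an algorithm on the same recordings, which is the kind of comparison a shared database allows.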


Multi-modal algorithms able to learn from less data

The robot must be able to identify a human voice in a noisy environment and to localize it. It can then, if necessary, turn its head towards the speaker to measure the person’s gaze direction and determine who the person is talking to. Oral communication is essentially multi-modal, involving more than vocal content alone, and multi-modal analysis is one of Idiap’s specialties. “By combining visual and audio signals, interactions become much more continuous and much less error-prone,” explains the research engineer who was in charge of the remote piloting of the robot during the three months of the real-life test in Finland.
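The idea of combining the two signals can be illustrated with a small sketch: given the direction a voice comes from and the directions of the faces seen by the camera, choose the face that best matches the sound. This is a simplified nearest-angle rule written for illustration, not Idiap's algorithm; the names `Face`, `pick_speaker` and the 15-degree tolerance are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Face:
    track_id: int        # identifier of a tracked person (hypothetical)
    azimuth_deg: float   # horizontal angle of the face as seen from the robot


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def pick_speaker(sound_azimuth_deg: float,
                 faces: Sequence[Face],
                 tolerance_deg: float = 15.0) -> Optional[Face]:
    """Return the face closest to the sound direction.

    Returns None when no face is seen or when the closest face is
    further than tolerance_deg from the sound direction.
    """
    best = min(
        faces,
        key=lambda f: angular_distance(f.azimuth_deg, sound_azimuth_deg),
        default=None,
    )
    if best is None:
        return None
    if angular_distance(best.azimuth_deg, sound_azimuth_deg) > tolerance_deg:
        return None
    return best


# Hypothetical scene: two people in view, a voice heard at 12 degrees.
faces = [Face(track_id=1, azimuth_deg=-30.0), Face(track_id=2, azimuth_deg=10.0)]
speaker = pick_speaker(12.0, faces)
```

When no face matches the sound direction, the function returns nothing, which is the situation in which a robot would turn its head towards the voice to look for the speaker.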

The high success rate in recognizing a speaker and maintaining an interaction with them is one of the feats of this trial. To achieve this, Idiap researchers developed learning techniques that require little labelled data. “Usually, to use machine learning, we have to provide algorithms with a lot of audio or video data including a lot of associated information. But automatically collecting this information is difficult, and labeling is a lengthy and expensive process,” specifies Odobez. Reducing the cost of labeling is therefore crucial. For example, learning to localize a sound source usually requires recording one or more sounds and labeling where each source is. With the novel algorithm, it is enough to state the number of sound sources, with no need to localize them, which is more convenient.
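The difference between the two kinds of labels can be sketched as two training objectives. In this illustration, a model is assumed to output one score between 0 and 1 per direction around the robot; the function names and the squared-error form are assumptions chosen for simplicity and do not reproduce the objective used in the published algorithm.

```python
def full_label_loss(direction_scores, target_scores):
    """Needs a label for every direction: where each source actually is."""
    return sum((s - t) ** 2 for s, t in zip(direction_scores, target_scores))


def count_only_loss(direction_scores, num_sources):
    """Needs only the number of sources, not their positions.

    Penalizes the gap between the total predicted activity and the count.
    """
    predicted_count = sum(direction_scores)
    return (predicted_count - num_sources) ** 2


# Hypothetical output over four directions, with two active sources.
scores = [0.0, 1.0, 0.0, 1.0]
weak = count_only_loss(scores, num_sources=2)
full = full_label_loss(scores, target_scores=[0.0, 1.0, 0.0, 1.0])
```

The count-only objective is much cheaper to annotate, but taken alone it cannot tell which directions are active, so a practical method has to combine it with other information.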

More information

-    Perception and Activity Understanding group
-    H2020 Mummer project
-    IEEE RO-MAN 2020 Conference
-    Mummer Results in Brief article on CORDIS website