Interactive Cognitive Systems
Cognitive systems have always been a strong theme of the computing
research community in general, and of Idiap in particular. Until the last
decade, such research tended to be placed under headings such as speech
recognition or image processing. The scenarios were typically unimodal,
with tight constraints on how a human could behave with respect to the
computer. As our systems achieve steadily better results on evaluation
databases, however, we are both able and obliged to move the goalposts.
For example, speech recognition must now cope with spontaneous speech,
background noise, adaptation to the environment and task, and
multilingual aspects (too often underestimated, with the main emphasis
placed on English alone). In robotic vision, also covered by the present
project, computers must be able to adapt to changing environments and to
extract relevant semantic information.
This project thus encompasses fundamental research aimed at developing
advanced techniques for Interactive Cognitive Systems (computers and
robots) that process and interpret cognitive audio and visual scenes.
While oriented toward fundamental research, its core objective is the
study of methods applied to the domains of activity of the Idiap Research
Institute.
In the present proposal, we briefly describe four sub-projects that embody some of the challenges described above:
ICS-1: Robust privacy-sensitive audio features for interaction modeling.
On the one hand, advances in cognitive systems raise growing privacy
concerns. On the other hand, it is interesting to see how much
information about human-computer and human-human interaction can be
extracted using audio features that fully preserve the privacy of the
users (typically by avoiding the extraction of lexical and identity
information). This sub-project therefore investigates how to detect and
model interaction, and how interaction relates to other aspects of
natural human behaviour, using privacy-preserving features only.
ICS-2: Multilingual speech recognition. The goal of this sub-project is
to investigate extensively how to extend Idiap’s leading-edge work in
(English) speech recognition to multiple languages, including at least
the Swiss national languages. In this context, we are looking for
principled approaches to the definition and training of shared
multilingual phone sets, the fast adaptation of monolingual systems, or
the composition of multiple monolingual systems.
ICS-3: Learning semantic spatial concepts for mobile robots. In this
sub-project, we investigate how a robot can adapt itself to a possibly
changing environment. Rather than restricting ourselves to static outdoor
environments, we focus on indoor home or office environments, where
furniture and people move around. Although we initially focus on the
computer vision modality, the work has the potential to extend to
audio-based cognition.
ICS-4: Conversation analysis based on speaker diarization. Idiap has
always been at the leading edge of speaker diarization (“Who spoke
when?”). ICS-4 proposes a novel speaker diarization approach that
is adaptive to its context, taking cues not only from the speakers
themselves, but also from the higher semantic context available from
dialogue and turn-taking.
The above sub-projects span the traditional cognitive spectrum of audio
and video, but also include the emerging field of social cognition, and
they should offer strong potential for interaction with one another. This
interaction will be encouraged through the use of common tasks, common
databases, and common software.

