Acoustic Model Adaptation toward Spontaneous Speech and Environment

In previous years, our automatic speech recognition (ASR) research with Samsung focused on accents and on multilinguality. This year, we propose to focus on “natural” user interfaces. By natural, we mean that the interface should function in such a way that the user does not have to behave any differently from how he or she would when interacting with another person. Of course, naturalness has many facets; two, however, are particularly pertinent here: conversational (spontaneous) speech, and recognition that exploits natural sensors.

Speech user interfaces typically rely on placing a microphone close to the user’s mouth. This maximizes the volume and clarity of the speech signal, whilst minimizing the effect of other noise in the vicinity. Such an interface is natural for, say, a telephone. However, many applications do not lend themselves to this type of interface. Examples include most home electronics, where the user is typically somewhere in the middle of a room while the device sits near a wall. In the case of televisions, the remote control offers a useful intermediate device. Nevertheless, it is still inconvenient to hold a remote control like a telephone in order to talk to it.
Idiap Research Institute
Samsung Electronics Co., Ltd, Media Solution Center
May 01, 2014
Dec 31, 2014