D-Box: A generic dialog box for multilingual conversational applications

From a research point of view, the project requires building a multilingual conversational agent that interacts seamlessly with multiple users who speak different languages and are driven by a common goal defined by the game. This involves the development and integration of multilingual speech recognition systems, multilingual speech synthesis, multilingual dialog modeling, and cross-domain adaptation resources. From an integration and evaluation point of view, the project's key innovative idea is that the overall framework will be application-agnostic. This will be achieved by defining an interface layer that mediates between the dialog engine, which interacts with the users, and the game engine, which runs the game specification. In technical terms, the agent will be realized as middleware between the project's dialog engine and an application engine. This design benefits the application by abstracting away the conversational interaction, which is modeled independently. It also benefits the user, since a new application can reuse middleware that has already been adapted to the user and to the environment. D-Box thus addresses scalability, a common problem of today's interactive conversational systems, by developing a conversational agent that is cheaply and easily portable across languages and adaptable to different domains and environments. To validate portability, the framework developed for the game scenario will be evaluated and tested in parallel in a standard commercial voice-based dialog system.
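The middleware idea described above can be illustrated with a minimal sketch. It is not the D-Box implementation: every name in it (DialogAct, ApplicationEngine, QuizGame, Middleware, understand, TEMPLATES) is hypothetical, and the "language understanding" step is a toy stand-in that simply takes the last word of the utterance. The sketch only shows the separation of concerns: the application engine sees language-independent dialog acts, while language-specific handling stays on the dialog side.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class DialogAct:
    """Language-independent representation of one turn (hypothetical)."""
    user_id: str
    intent: str
    slots: dict = field(default_factory=dict)


class ApplicationEngine(Protocol):
    """Anything that consumes and produces dialog acts can be plugged in."""
    def handle(self, act: DialogAct) -> DialogAct: ...


class QuizGame:
    """Toy application engine standing in for a game (illustrative only)."""
    def handle(self, act: DialogAct) -> DialogAct:
        correct = act.intent == "answer" and act.slots.get("value") == "paris"
        return DialogAct(act.user_id, "feedback", {"correct": correct})


# Per-language surface forms; the application never sees these strings.
TEMPLATES = {
    "en": {True: "Correct!", False: "Wrong."},
    "de": {True: "Richtig!", False: "Falsch."},
    "fr": {True: "Correct !", False: "Faux."},
}


def understand(user_id: str, text: str) -> DialogAct:
    """Toy understanding step: treat the last word as the answer value."""
    words = text.strip(" .!?").split()
    value = words[-1].lower() if words else ""
    return DialogAct(user_id, "answer", {"value": value})


class Middleware:
    """Interface layer between the dialog side and an application engine."""
    def __init__(self, app: ApplicationEngine) -> None:
        self.app = app

    def turn(self, user_id: str, language: str, text: str) -> str:
        act = understand(user_id, text)      # user utterance -> dialog act
        reply = self.app.handle(act)         # application logic, no language
        return TEMPLATES[language][reply.slots["correct"]]  # act -> utterance
```

Under these assumptions, replacing QuizGame with another class that implements handle would leave the dialog side untouched, and adding a language would only require a new entry in TEMPLATES.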

The D-Box project will target three EU languages: English, German, and French. This choice was based on the availability and maturity of language processing components among the consortium partners and, more importantly, on commercial considerations and the anticipated target markets. However, the dialog engine will be language-independent. Finally, unlike the R&D components, the middleware architecture will be designed so that it can ultimately support other games of the same type as well as additional languages.

Although the goals of this project are challenging, they are also exciting and offer considerable application potential. Furthermore, the project has been designed to maximize its chances of success by directly leveraging the consortium's leading-edge, multidisciplinary expertise in: networked/collaborative gaming platforms (MIPUMI), multilingual dialog management and modeling (UDS, IDIAP), multilingual speech recognition (IDIAP, UDS, Koemei) and synthesis (Acapela, IDIAP), multilingual (spoken and written) language understanding (UDS, IDIAP), and commercial voice solutions (Sikom). All partners have broad experience in collaborating on international multidisciplinary projects and bring a strong commitment to the project.

Project Funding Party

EUREKA Pan-European research and development funding and coordination organization

Project Leader

MIPUMI Mi'pu'mi Games GmbH

Project Partners

ACAPELA Acapela
IDIAP Idiap Research Institute
KOEMEI Koemei SA
SIKOM SIKOM Software GmbH
UDS Universität des Saarlandes

Keywords: Human Language Technologies, Applications software

Contact: MOTLICEK, Petr