Improving the coherence of machine translation output by modeling intersentential relations
Machine translation (MT) has made significant progress in the past
decade, but its focus has remained on the translation of sentences
considered individually. However, in order to ensure overall coherence
throughout a translated text, an MT system must also consider and
render correctly the items that depend on intersentential relations.
The perceived coherence of a translated text, and therefore its
overall quality, is mainly influenced by the following markers:
pronouns, verb tense/mode/aspect, discourse connectives, and
politeness/style/register. None of these markers can be reliably
translated on a purely sentence-by-sentence basis.
This project aims to extend the current statistical MT approach by
modeling these intersentential dependencies (ISDs), along the
following five themes: linguistic analysis; corpus data, annotation
and test suites; automatic identification of intersentential
dependencies; statistical machine translation for ISD-labeled texts;
and evaluation methods for MT coherence and their application. The
project involves researchers in human language technology, machine
learning, linguistics, and system evaluation, coming from three
different groups with extensive contributions to the relevant fields.
Their collaboration is grounded in several previous joint
achievements, and will lead to the design of a robust, operational
system. The project will significantly strengthen Swiss research in MT
and will help position it more firmly within the European and
international community.
Partners
Idiap (Dr. Andrei Popescu-Belis), UniGe/Department of Linguistics (Prof. Jacques Moeschler, Dr. Paola Merlo, Dr. Sandrine Zufferey), UniGe/Department of Computer Science (Dr. James Henderson)

