Importance of context in machine translation

A text cannot be fully understood without context. The same goes for a computer: it cannot produce a good translation without understanding what a text really means. At Idiap, Lesly Miculicich conducted research to make this possible in her thesis, which she successfully defended last February.

Have you ever noticed obvious mistakes when translating a text automatically online? Until now, machine translation has worked only through a sentence-based model, in which each sentence is translated independently. But this method is not ideal for translating a document coherently. Linguistic connections link the sentences together, and they are necessary to understand the meaning of a text. A sentence-based model does not capture these connections, which leads to an inconsistent translation with many mistakes.

As part of her thesis at Idiap, Lesly Miculicich set out to change this approach and significantly improve the automatic translation of a text, notably by incorporating context and the notion of coreference, where several expressions refer to the same entity. “A language can be thought of as a sequence of words, but internally there is also a structure with connections between words that are not in the same sequence. This is what we define by the syntax and semantics of a text,” explains Miculicich.

The researcher's approach first focuses on mentions of so-called entities, i.e. nouns and pronouns, and evaluates how effective it is to include the notion of coreference between these mentions. She then proposes to infer long-range connections by incorporating a ‘self-attention’ mechanism, which focuses on deducing the links between the content of a sentence and the rest of the text. Miculicich also uses hierarchical representations, in which words, phrases and sentences do not carry the same weight and which help to summarize context. This new model, based on contextual information and the links between sentences, allows machine translation to stop splitting a text into a series of sentences and instead to treat it as a whole, where all the information is considered and can be linked together.
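
To make the idea of ‘self-attention’ more concrete, here is a minimal Python sketch of scaled dot-product attention, the common textbook formulation. It is not taken from the thesis: the function names `softmax` and `attend` and the toy three-dimensional vectors are assumptions invented for illustration. The sketch shows how a representation of the current sentence can be scored against representations of other sentences, so that the most related context receives the highest weight.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, context):
    """Scaled dot-product attention of one query vector over context vectors.

    Returns the attention weights and the weighted summary of the context.
    """
    scale = math.sqrt(len(query))
    # Similarity between the query and each context vector.
    scores = [sum(q * c for q, c in zip(query, vec)) / scale for vec in context]
    weights = softmax(scores)
    # Context summary: weighted average of the context vectors.
    summary = [
        sum(w * vec[i] for w, vec in zip(weights, context))
        for i in range(len(query))
    ]
    return weights, summary

# Hypothetical sentence representations (toy values, not real embeddings).
context = [
    [1.0, 0.0, 0.0],  # earlier sentence on the same topic
    [0.0, 1.0, 0.0],  # unrelated sentence
    [0.9, 0.1, 0.0],  # sentence that partly overlaps with the topic
]
query = [1.0, 0.0, 0.0]  # sentence currently being translated

weights, summary = attend(query, context)
print([round(w, 3) for w in weights])
```

In this toy setting the first context sentence gets the largest weight and the unrelated one the smallest. A real document-level translation system would learn the vectors and apply attention at several levels (words, then sentences), which this sketch does not attempt.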

Lesly Miculicich's work can enable computers to understand the context of a document through its syntax. Computers could even learn concepts that human beings acquire naturally, such as semantics and common sense, which we use daily and unconsciously in our language. “In the future, we might even imagine that this method would allow the content of an entire book to be translated automatically and perfectly,” concludes the researcher.

