A novel approach to conversation transcript retrieval, boosted by an R&D partnership

In collaboration with a private company, Idiap researchers have presented a novel approach for retrieving information from conversation transcripts. Their method combines automatic speech recognition and natural language processing technologies.

Retrieving information from the transcript of an informal meeting can be challenging. The challenge is even greater when the transcript is produced automatically by a computer. Why? Because automatic speech recognition systems often make mistakes caused by noisy environments, words with multiple meanings, overlapping speech, or speakers’ accents. With too many errors, the meaning of a transcribed text can diverge significantly from the original conversation, and searching such a flawed document for information can quickly become impossible. At the latest ACM SIGIR Conference on Research and Development in Information Retrieval, researchers from Idiap’s Speech & Audio Processing group presented a novel approach for retrieving information from poor-quality transcripts.

A novel approach

The standard approach to this information retrieval problem is to improve the quality of the transcript: the more accurate the automatically generated transcript, the easier it is to search. Current state-of-the-art techniques achieve a word error rate of about 10%. However, this comes at a high cost, as such automatic speech recognition systems are usually trained for a specific domain: a particular language, a given topic, or a standardized type of input such as news broadcasts. “This approach is not only costly and time-consuming, but it can’t be easily replicated in other domains,” explains Esaú Villatoro, research associate at Idiap’s Speech & Audio Processing group and first author of the paper. More generic systems, on the other hand, can have word error rates above 40%, which can completely alter the meaning of the resulting transcript.
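Word error rate, the metric behind the 10% and 40% figures above, is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a recognizer's output and a reference transcript, divided by the number of words in the reference. The short Python sketch below illustrates that standard definition; the function name and the example sentences are invented for illustration and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "let us schedule the budget meeting for next monday morning"
    print(word_error_rate(reference, "let us schedule the budget meeting for next sunday morning"))  # 0.1
    print(word_error_rate(reference, "let us schedule the budget eating four next sunday mourning"))  # 0.4
```

The second hypothesis shows how a 40% error rate can garble a sentence: four wrong words out of ten are enough to turn a budget meeting into something quite different.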

To produce a transcript, speech recognition algorithms pick the best transcription hypothesis from a set of candidate variations. When the error rate is very high, identifying this best hypothesis is especially difficult, and the terms a user searches for may be absent from the best hypothesis while appearing elsewhere in the set of variations. The researchers therefore opted for a novel approach that searches the whole set of hypotheses. “To do so, we had to come up with a re-ranking algorithm that focuses on semantics rather than on the best transcription. We were able to achieve this in an unusual way using natural language processing (NLP) techniques. It’s something of a new trend in the automatic speech recognition field,” Villatoro explains.
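Why does searching beyond the best hypothesis help? The Python sketch below uses a deliberately simple, swapped-in technique, posterior-weighted word counts over a list of the N best hypotheses, to show a search term that is missing from the single best transcript but still found among the alternatives. It is not the authors' method: according to the paper's title, their system relies on expanded lattice embeddings and semantic re-ranking. The hypotheses, scores, and function names are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical N-best entry for one spoken segment: (hypothesis text, log-score).
# Scores are invented for illustration; a real recognizer would supply them.
Hypothesis = tuple[str, float]


def posteriors(nbest: list[Hypothesis]) -> list[float]:
    """Softmax over log-scores: turns scores into weights that sum to 1."""
    top = max(score for _, score in nbest)
    exps = [math.exp(score - top) for _, score in nbest]
    total = sum(exps)
    return [e / total for e in exps]


def one_best_terms(nbest: list[Hypothesis]) -> set[str]:
    """Words of the single highest-scoring hypothesis (the usual transcript)."""
    best_text, _ = max(nbest, key=lambda h: h[1])
    return set(best_text.split())


def expected_term_counts(nbest: list[Hypothesis]) -> dict[str, float]:
    """Posterior-weighted word counts over ALL hypotheses, not just the best one."""
    counts: dict[str, float] = defaultdict(float)
    for (text, _), weight in zip(nbest, posteriors(nbest)):
        for word in text.split():
            counts[word] += weight
    return dict(counts)


def score_segment(query: str, nbest: list[Hypothesis]) -> float:
    """Sum of expected counts of the query words; 0.0 means no hypothesis contains them."""
    counts = expected_term_counts(nbest)
    return sum(counts.get(word, 0.0) for word in query.split())


if __name__ == "__main__":
    segment = [
        ("the budget eating is on sunday", -1.0),   # best hypothesis, but wrong
        ("the budget meeting is on sunday", -1.2),
        ("the budget meeting is on monday", -1.5),
    ]
    print("meeting" in one_best_terms(segment))  # False: the best transcript misses the term
    print(score_segment("meeting", segment) > 0)  # True: the alternatives still contain it
```

Weighting each hypothesis by its posterior means that a term appearing only in less likely variations still contributes to the segment's score, instead of being discarded as it would be if only the best transcript were indexed.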

Boosted by a private partnership

Searching for a specific piece of information within a conversation is a challenging task. “When chatting, people often jump from one topic to another. In general, informal talks are less structured, making it even more difficult for automatic speech recognition systems to accurately process this type of data. That’s why our approach is an attractive solution, as it is specifically designed to tackle some of these challenges,” Villatoro emphasizes. This work began as a collaborative research project with the Information Sciences Institute of the University of Southern California. The research was later boosted by a collaboration with a private company that shares an interest in the topic. Seeing a potential product in this work, the company validates the algorithms proposed by Idiap researchers on its own data, allowing for quicker improvements and a more robust system.
As the institute’s strategy is to strengthen collaborations with industry, this example illustrates well how such partnerships benefit both research and the economy. This trend is particularly strong in the information retrieval domain, where private companies are keen to participate in top conferences and to be associated with scientific papers.

More information

- Speech & Audio Processing research group
- “Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings” by Esau Villatoro-Tello, Srikanth Madikeri, Petr Motlicek, Aravind Ganapathiraju and Alexei V. Ivanov
- ACM SIGIR conference