
DocRec - Keyword Extraction and Document Recommendation in Conversations


The package contains several pieces of Matlab code that together extract keywords from a conversation, use them to build implicit queries, and consolidate the sets of retrieved documents into recommendations for the conversation participants.

First, a list of keywords is extracted from the conversation transcript. The keywords are then clustered by topic into several topically independent subsets. Each subset forms an implicit query, which is submitted to the Lucene search engine (available from lucene.apache.org) to retrieve documents from Wikipedia (using a dump of the pages available from dumps.wikimedia.org) or from any other local repository indexed by Lucene.
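To illustrate how a keyword subset can become an implicit query, here is a minimal sketch in Python (the package itself is written in Matlab). It assumes the clusters are already computed and simply joins each cluster's keywords into a query string in Lucene's classic query syntax, quoting multi-word keywords as phrases. The function name `build_implicit_queries` and the sample clusters are hypothetical and are not taken from the DocRec code; the package may weight or combine keywords differently.

```python
def build_implicit_queries(keyword_clusters):
    """Turn each keyword cluster into one Lucene-style query string.

    Hypothetical helper for illustration; not part of the DocRec package.
    """
    queries = []
    for cluster in keyword_clusters:
        terms = []
        for keyword in cluster:
            keyword = keyword.strip()
            if not keyword:
                continue  # skip empty keywords
            # Quote multi-word keywords so they are matched as phrases.
            terms.append(f'"{keyword}"' if " " in keyword else keyword)
        if terms:
            # OR the terms: a document matching any keyword is a candidate.
            queries.append(" OR ".join(terms))
    return queries


# Made-up clusters, standing in for the output of the clustering step.
clusters = [["solar panel", "energy"], ["budget", "funding"]]
queries = build_implicit_queries(clusters)
for query in queries:
    print(query)
```

Each resulting string would then be submitted as a separate query, so that every topic in the conversation gets its own result list.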

Finally, the result lists from the separate queries are merged using a method that favors topical diversity among the recommended documents.
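The description above does not specify the merging method, so the following Python sketch uses a plain round-robin interleaving as a stand-in: it takes the top result of each query in turn, then the second result of each, and so on, skipping duplicates. This is one simple way to keep every topic represented near the top of the list; it is an assumption for illustration and not the algorithm implemented in the package. The name `merge_round_robin` and the document identifiers are hypothetical.

```python
def merge_round_robin(result_lists, limit=10):
    """Interleave ranked result lists, one rank at a time, without duplicates.

    Hypothetical stand-in for a diversity-favoring merge; not DocRec's method.
    """
    if limit <= 0:
        return []
    merged = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank >= len(results):
                continue  # this query has no result at this rank
            doc = results[rank]
            if doc in seen:
                continue  # already recommended via another query
            seen.add(doc)
            merged.append(doc)
            if len(merged) == limit:
                return merged
    return merged


# Made-up ranked lists, one per implicit query.
ranked = [["A", "B", "C"], ["B", "D"], ["E"]]
recommended = merge_round_robin(ranked, limit=4)
print(recommended)
```

Compared with concatenating the lists, interleaving prevents a single query with many results from crowding out the other topics.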

Resource Information
Resource type: software
URL: https://github.com/idiap/DocRec
Date: Nov 02, 2015
Nature: Software
Lifespan: Not limited
Size: 16,469 KB
Audience: Researchers in Natural Language Processing and Information Retrieval
Access: open source
Ownership: Idiap Research Institute
Distribution: via Idiap's GitHub at https://github.com/idiap/DocRec
License:
Contact: Andrei POPESCU-BELIS
+41 277 217 729