DocRec - Keyword Extraction and Document Recommendation in Conversations
The package contains several pieces of Matlab code. Taken together, they extract keywords from a conversation, use them to build implicit queries, and consolidate the sets of retrieved documents into recommendations for the conversation participants.
First, a list of keywords is extracted from the conversation transcript. The keywords are then clustered into several topically independent subsets. Each subset represents an implicit query, which is submitted to the Lucene search engine (available from lucene.apache.org) to retrieve documents from Wikipedia (using a dump of the pages available from dumps.wikimedia.org) or from any other local repository indexed by Lucene.
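As a rough illustration of how keyword subsets could be turned into implicit queries, the following Python sketch groups keywords by topic and builds one query string per topic. It is not the package's Matlab code. The function name build_implicit_queries, the (keyword, topic_id) input format, and the sample keywords are all hypothetical, and the topic labels are assumed to come from an earlier clustering step that is not shown. The query strings rely only on the OR operator and quoted phrases of Lucene's classic query parser syntax.

```python
from collections import defaultdict


def build_implicit_queries(labeled_keywords):
    """Group (keyword, topic_id) pairs by topic; return one query string per topic.

    Hypothetical helper: the input format is an assumption, not the package's API.
    """
    clusters = defaultdict(list)
    for keyword, topic_id in labeled_keywords:
        clusters[topic_id].append(keyword)

    queries = []
    for topic_id in sorted(clusters):
        terms = []
        for keyword in clusters[topic_id]:
            # Drop stray double quotes, then quote multi-word keywords so that
            # the query parser treats them as phrases rather than separate terms.
            cleaned = keyword.replace('"', "")
            terms.append(f'"{cleaned}"' if " " in cleaned else cleaned)
        # Any keyword of the topic may match, so the terms are joined with OR.
        queries.append(" OR ".join(terms))
    return queries


# Made-up keywords with made-up topic labels, for illustration only.
labeled = [
    ("guitar", 0),
    ("chord progression", 0),
    ("mortgage", 1),
    ("interest rate", 1),
]
queries = build_implicit_queries(labeled)
for query in queries:
    print(query)
```

Each resulting string would then be submitted to the search engine as a separate query, so that every topic of the conversation gets its own result list.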
Finally, the result lists of the separate queries are merged using a method that favors diversity of topics among the recommended documents.
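The merging method used by the package is not detailed here. As a stand-in, the Python sketch below shows one simple way to favor topical diversity: a round-robin interleave that takes the best remaining document from each query in turn and skips duplicates. The function name merge_round_robin, the limit parameter, and the document titles are assumptions made for illustration only.

```python
def merge_round_robin(result_lists, limit):
    """Interleave ranked result lists, one rank at a time, skipping duplicates.

    Hypothetical helper: a simple stand-in, not the package's merging method.
    """
    if limit <= 0:
        return []

    merged = []
    seen = set()
    depth = max((len(results) for results in result_lists), default=0)
    for rank in range(depth):
        # Visit every query at the current rank before moving to the next rank,
        # so that no single topic fills the top of the merged list.
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
                if len(merged) == limit:
                    return merged
    return merged


# Made-up ranked results for two implicit queries, for illustration only.
lists = [
    ["Guitar", "Chord", "Music theory"],
    ["Mortgage loan", "Interest rate", "Guitar"],
]
recommended = merge_round_robin(lists, limit=4)
print(recommended)
```

With a plain concatenation, the first query would occupy the first three positions. The interleave instead alternates between the two topics, which is the kind of behavior a diversity-favoring merge aims for.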