wmil-sgd: A weighted multiple-instance learning algorithm based on stochastic gradient descent
This is a Python implementation of the weighted multiple-instance learning (WMIL) algorithm based on stochastic gradient descent described in Pappas and Popescu-Belis (2017), extending our earlier proposal (Pappas and Popescu-Belis, 2014).
The WMIL-SGD algorithm is a weakly supervised learning model that jointly learns a classifier for the target categories and learns to focus on the relevant parts of a document according to the context. The model takes as input a document (bag) consisting of multiple input vectors (instances), which may come from a neural network. It learns to compute a weighted average of these vectors by estimating the weights for each document and target category. We have applied WMIL-SGD to multi-aspect sentiment analysis, segmentation, and summarization. On a dataset of audiobook reviews (Pappas and Popescu-Belis, 2016), we have shown that the weights predicted by WMIL-SGD match human estimates of the importance of each sentence.
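The weighted-average idea can be illustrated with a minimal NumPy sketch. This is not the code or API of this package: the function names (`softmax`, `bag_representation`, `predict`), the parameter names, and the choice of a softmax over linear scores to obtain the weights are all assumptions made for illustration only.

```python
import numpy as np


def softmax(scores):
    """Normalize raw scores into non-negative weights that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def bag_representation(instances, weight_params):
    """Return per-instance weights and the weighted average of a bag.

    instances:     array of shape (n_instances, n_features), e.g. sentence vectors
    weight_params: array of shape (n_features,), hypothetical weight-model parameters
    """
    scores = instances @ weight_params   # one relevance score per instance
    weights = softmax(scores)            # assumed normalization (illustrative)
    bag_vector = weights @ instances     # weighted average of the instance vectors
    return weights, bag_vector


def predict(instances, weight_params, classifier_params):
    """Score a bag for one target category with a linear model (assumed)."""
    weights, bag_vector = bag_representation(instances, weight_params)
    return weights, float(bag_vector @ classifier_params)


# Toy bag: 4 instances (e.g. sentences) with 3 features each.
rng = np.random.default_rng(0)
bag = rng.normal(size=(4, 3))
weights, score = predict(bag, rng.normal(size=3), rng.normal(size=3))
```

In this sketch, `weights` holds one value per instance, which is the quantity one would inspect to see which parts of the document the model treats as relevant. Training both parameter vectors jointly with stochastic gradient descent is left out here.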
Pappas N. & Popescu-Belis A. (2017) - Explicit Document Modeling through Weighted Multiple-Instance Learning. Journal of Artificial Intelligence Research (JAIR).
Pappas N. & Popescu-Belis A. (2014) - Explaining the Stars: Weighted Multiple-Instance Learning for Aspect-Based Sentiment Analysis. Proceedings of EMNLP 2014 (Conference on Empirical Methods in Natural Language Processing), Doha, Qatar, p. 455-466.
Pappas N. & Popescu-Belis A. (2016) - Human versus Machine Attention in Document Classification: A Dataset with Crowdsourced Annotations. Proceedings of the EMNLP 2016 SocialNLP workshop (4th International Workshop on Natural Language Processing for Social Media), Austin, TX, p. 94-100.