Object Detection with Active Sample Harvesting

Practical deployment of machine learning techniques relies on large training data sets that exhibit all the difficulties met in practice and are labeled by human experts. Although critical, the production of training data remains a poorly studied subject. Efficient labeling has been tackled with active learning, which concentrates human effort on the examples that truly influence learning. The production of unlabeled data, however, has not been studied in its own right. The objective of this project is to address both tasks and to develop novel, efficient, and mathematically sound procedures for producing very large quantities of labeled data to train part-based object detectors.

The first part of this proposal focuses on extending active learning to the particular case of object detection with part-based models. We will define multiple levels of information, ranging from the mere presence of an object in an image to the locations of its individual parts. Instead of simply identifying subsets of samples whose labels are likely to be informative, the procedure we envision will select pairs of samples and labeling levels so as to maximize the ratio of information gained to labeling cost.

The second part will address the production of unlabeled data. We introduce the idea of "data harvesters": web-crawling daemons built upon goal-planning algorithms. These harvesters will model the relation between the information attached to the web source of an image (web site, textual context, date and time, GPS coordinates, camera type, etc.) and its usefulness for training. Based on this model, harvesters will implement goal-planning strategies that balance exploration and exploitation, both locating good new sources of training data and downloading data from the sources already identified.
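The abstract does not say how the information carried by a label is measured or what each labeling level costs. As a minimal sketch, assuming the information of a query is approximated by the entropy of the current detector's predictive distribution at that level (plain uncertainty sampling, swapped in here for illustration), and using hypothetical level names and per-level costs, selecting the (sample, level) pair with the best information-to-cost ratio could look like this:

```python
import math
from dataclasses import dataclass

# Hypothetical labeling levels and annotation costs (e.g. seconds of annotator time).
# These names and values are placeholders, not figures from the project.
LEVEL_COST = {"presence": 1.0, "bounding_box": 7.0, "parts": 30.0}


@dataclass
class Candidate:
    sample_id: str       # identifier of an unlabeled image (hypothetical)
    level: str           # labeling level being considered for this image
    probs: list[float]   # detector's predictive distribution over answers at this level


def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits, used here only as a proxy for expected information."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_query(candidates: list[Candidate], level_cost: dict[str, float] = LEVEL_COST) -> Candidate:
    """Return the (sample, level) pair that maximizes information / labeling cost."""
    return max(candidates, key=lambda c: entropy(c.probs) / level_cost[c.level])


if __name__ == "__main__":
    pool = [
        Candidate("img1", "presence", [0.5, 0.5]),   # 1 bit / cost 1
        Candidate("img1", "parts", [0.25] * 4),      # 2 bits / cost 30
        Candidate("img2", "presence", [0.9, 0.1]),   # ~0.47 bits / cost 1
    ]
    best = select_query(pool)
    print(best.sample_id, best.level)
```

In this toy pool the cheap presence query on the most uncertain image wins, even though the part-level query carries more bits, because its cost is far higher. A real system would replace the entropy proxy with whatever information measure the learning procedure defines.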
Machine Learning
Idiap Research Institute
Swiss National Science Foundation
Oct 01, 2012
Sep 30, 2016