My research is about statistical learning, from techniques using hand-designed stochastic models to learning problems such as feature selection, active learning, or efficient Boosting. I strongly believe in looking jointly at statistical and algorithmic efficiency, as most algorithms trade one for the other.
My current work is applied mainly to computer vision, both with my group at the Idiap Research Institute and in collaboration with the Computer Vision Lab at EPFL.
Joint work with Charles Dubout.
In the context of the MASH project (Fleuret et al. 2011), we want to build classifiers from a large number of families of features. We are currently handling ~30 such families, and we target learning systems able to cope with at least one order of magnitude more.
We have proposed two variants of AdaBoost to cope with such a large number of feature families. The first one, dubbed Tasting (Dubout & Fleuret 2011), consists of sampling a few features from each family before the learning starts, and using these features to estimate at every Boosting step the most promising feature family, so that we can bias the sampling accordingly. We distribute the source code for that work under a mix of GPLv2 and BSD licenses.
The second one, which we named Adaptive Maximum Sampling (Dubout & Fleuret 2011b), models the loss reduction as a function of the number of features looked at and the number of samples used to estimate edges. This model allows us to optimize the trade-off between the two.
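The Tasting idea described above can be illustrated with a minimal sketch. It is not the distributed implementation: the function name `tasting_boost`, its parameters, the use of pre-computed +/-1 feature responses, and the choice of scoring a family by the best absolute edge among its tasted features are all assumptions made for illustration.

```python
import numpy as np

def edge(h, y, w):
    """Weighted edge of a +/-1 response vector h against labels y (weights sum to 1)."""
    return float(np.sum(w * y * h))

def tasting_boost(families, y, n_taste=5, n_sample=20, n_rounds=10, seed=0):
    """Hypothetical sketch of Tasting-style family selection inside AdaBoost.

    families: list of arrays of shape (n_features_in_family, n_samples), entries +/-1.
    y: array of +/-1 labels.
    """
    rng = np.random.default_rng(seed)
    # Tasting phase: keep a few features per family, sampled once before learning.
    tasted = [rng.choice(F.shape[0], size=min(n_taste, F.shape[0]), replace=False)
              for F in families]
    scores = np.zeros(y.shape[0])
    chosen = []
    for _ in range(n_rounds):
        # Standard AdaBoost sample weights for the current strong classifier.
        w = np.exp(-y * scores)
        w /= w.sum()
        # Estimate how promising each family is from its tasted features only.
        family_score = [max(abs(edge(F[i], y, w)) for i in idx)
                        for F, idx in zip(families, tasted)]
        f = int(np.argmax(family_score))
        # Sample candidate features from the most promising family and keep the best.
        F = families[f]
        cand = rng.choice(F.shape[0], size=min(n_sample, F.shape[0]), replace=False)
        edges = np.array([edge(F[i], y, w) for i in cand])
        best = int(np.argmax(np.abs(edges)))
        e = float(np.clip(edges[best], -1 + 1e-12, 1 - 1e-12))
        alpha = 0.5 * np.log((1 + e) / (1 - e))  # AdaBoost weak-learner weight
        scores += alpha * F[cand[best]]
        chosen.append((f, int(cand[best]), float(alpha)))
    return chosen, scores
```

In this sketch the cost of choosing a family at each step depends only on the number of tasted features, not on the size of the families themselves.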
Joint work with Nicolae Suditu.
We are adapting a state-of-the-art interactive image retrieval system to very large image data sets. The algorithm we started from relies on the estimate of a probability of relevance of any image, given the interaction the user had with the system, under a sound statistical model.
We propose to use a hierarchical partitioning of the image collection computed off-line, and modulate during the interactive search the resolution in different parts of the collection (Suditu & Fleuret 2011). This strategy maintains an accurate approximation of individual probabilities of relevance of images, while fixing an upper bound on the required computation.
Joint work with Karim Ali and David Hasler.
To limit the need for labelled data, we have developed a new method to exploit motion consistency in videos (Ali et al. 2011). We start by labeling the targets in a few frames of the video, and from there we alternate between training an appearance-based classifier and estimating trajectories that are physically possible and consistent with the appearance-based detections.
We managed to minimize the same loss in both steps, and leverage the flow-based multi-target tracking we have developed for multi-camera tracking.
The resulting procedure is amazingly stable, and in certain cases we obtain better results with only a fraction of the frames labeled than with the full sequence labeled. This is probably due to the ability of the system to label the sequence more consistently, and to choose target locations that are friendlier to the appearance-based detector.
Joint work with Leonidas Lefakis.
One of the most standard approaches to object detection consists of training a sequence of two-class classifiers, each one designed to catch all the positive examples and filter out as many as possible of the negative samples not caught by its predecessors.
We have developed a Boosting variant which trains all these classifiers simultaneously, by adding weak learners to each of them sequentially (Lefakis & Fleuret 2010). This procedure relies on a stochastic interpretation of the classifier responses: each one is interpreted as the (log ratio of the) probability that the classifier's binary response is positive, and the overall response of the cascade is the (log ratio of the) probability that they all respond positively, under an assumption of independence.
The resulting procedure pushes all the classifiers to respond properly on the positive samples, and pushes the classifiers that are "already good" to get even better on each negative sample.
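The stochastic interpretation above can be written down in a few lines. The sketch below is only an illustration of that interpretation, not the training procedure itself: the function name `joint_cascade_response` and the array layout (one row per cascade stage, one column per sample) are assumptions.

```python
import numpy as np

def joint_cascade_response(stage_responses):
    """Combine per-stage responses into an overall cascade response.

    stage_responses: array-like of shape (n_stages, n_samples). Each entry f is
    read as the log ratio log(p / (1 - p)) of the probability p that the stage
    responds positively. Under independence, the probability that all stages
    respond positively is the product of the p's; we return its log ratio.
    """
    f = np.asarray(stage_responses, dtype=float)
    # log p_k = log sigmoid(f_k) = -log(1 + exp(-f_k)), computed stably.
    log_p = -np.logaddexp(0.0, -f)
    # log P(all stages positive) under the independence assumption.
    log_all = log_p.sum(axis=0)
    # log(1 - P) = log(-expm1(log P)), accurate when P is close to 1.
    log_not_all = np.log(-np.expm1(log_all))
    return log_all - log_not_all
```

With a single stage the overall response equals the stage response, and with several stages it is never larger than the weakest stage response, which is consistent with the behavior of a cascade.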
Joint work with Horesh Ben Shitrit, Jérôme Berclaz and Pascal Fua.
We are investigating the detection of individuals in video streams and the estimation of their locations on the ground plane (Fleuret et al. 2008). The first part of our algorithm is the Probabilistic Occupancy Map, which consists of estimating, in each time frame independently, an approximation of the marginal probabilities of presence at every location with a naive mean-field type procedure. I gave a talk at Microsoft Research about it.
The second part filters these detections with a time-based regularization. We introduce a graph whose vertices are locations in time/space, and edges are motions which are physically possible (Berclaz et al. 2011). Finding the trajectories of an a priori unknown number of targets boils down to finding a flow minimizing a linear cost in this graph, which is a convex problem that can be solved very efficiently. Recently, we have added an appearance-based model to identify the two teams in a basketball game (Ben Shitrit et al. 2011).
Also, we have investigated how to model more sophisticated behaviors by introducing "behavioral maps" (Berclaz et al. 2008). The motion model is estimated through a generalized EM procedure from raw frame-by-frame detection results. This new model improves the quality of the tracking and allows for the automatic detection of atypical behaviors.
There are plenty of results and videos on the CVLab page.
You can download the source code of the Probabilistic Occupancy Map, distributed under the terms of version 3 of the GNU General Public License.
Joint work with Donald Geman.
We developed a novel algorithm for the detection and the estimation of the pose of complex objects in cluttered scenes (Fleuret & Geman 2008). We propose the notion of pose-indexed features, which should ideally have a response distribution that does not depend on the pose, given that a target is present. This novel idea allows us to train a single classifier common to many different poses, and fixes the main weakness of the coarse-to-fine strategy for object detection (Fleuret & Geman 2001). We demonstrate the performance of that approach on cat detection. You can watch a talk given at Google Zurich in October 2008 on the topic.
You can download the complete data set and the source code. The latter is distributed under the terms of version 3 of the GNU General Public License.
Joint work with Karim Ali, David Hasler and Pascal Fua.
The pose-indexed features we introduced for cat detection require a dense exploration of a geometrical pose space during detection. Controlling the cost of this search is a key issue, and we have usually used methods combining coarse-to-fine representations and lazy evaluations.
In this project, we try another avenue to address the same issue. We propose to learn jointly both the features and estimators of the latent pose parameters. These estimators are closed-form rules able to compute a parameter directly from the signal.
We tested the efficiency of this new idea in the context of hand detection for industrial applications (Ali et al. 2009).
Boosting can be seen as a gradient descent: at every step, the learning procedure adds a weak learner corresponding to the direction of maximum local reduction of a loss. Hence, there is a natural generalization to a multi-layer structure similar to a multi-layer perceptron (MLP). The derivative of the loss with respect to the intermediate responses is propagated through the predictor, and weak learners are added in the inner functionals in the same fashion as in classical boosting.
As expected, such a structure with a hidden layer performs substantially better for pattern recognition than the combination of classical boosting with a local feature extraction trained in an unsupervised manner (Fleuret 2009).
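The gradient-descent view of boosting mentioned above can be sketched for the classical single-layer case. This is only an illustration of that view, not the multi-layer method: the logistic loss, the function names, the use of pre-computed +/-1 weak-learner responses, and the conservative step size are all assumptions.

```python
import numpy as np

def logistic_loss(F, y):
    """Sum over samples of log(1 + exp(-y * F)), computed stably."""
    return float(np.sum(np.logaddexp(0.0, -y * F)))

def boost_logistic(H, y, n_rounds=20):
    """Hypothetical sketch: boosting as gradient descent in function space.

    H: array of shape (n_learners, n_samples) with +/-1 weak-learner responses.
    y: array of +/-1 labels.
    """
    n = y.shape[0]
    F = np.zeros(n)  # current strong-classifier response on each sample
    model, losses = [], [logistic_loss(F, y)]
    for _ in range(n_rounds):
        # Negative gradient of the loss with respect to the responses F.
        g = y / (1.0 + np.exp(y * F))
        # Weak learner best aligned (up to sign) with the descent direction.
        align = H @ g
        k = int(np.argmax(np.abs(align)))
        # Conservative step: along a +/-1 learner the curvature is at most n / 4,
        # so a step of (slope / curvature bound) cannot increase the loss.
        alpha = 4.0 * align[k] / n
        F = F + alpha * H[k]
        model.append((k, float(alpha)))
        losses.append(logistic_loss(F, y))
    return model, F, losses
```

In a multi-layer variant, the same gradient would be propagated through the predictor to the intermediate responses before selecting weak learners there; that part is not shown here.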
Joint work with Germán González Serrano and Pascal Fua.
This work tackles the problem of automatic network delineation, with an emphasis on dendritic trees in neural tissues (Gonzalez et al. 2008). We proposed a novel approach mixing machine learning for the local characterization of filament-like locations with Bayesian modeling and stochastic optimization of the global tree network. We have investigated the use of steerable filters to create a filament detector which can be tuned on the fly to an arbitrary orientation with minimal computational overhead, in both 2D (Gonzalez et al. 2009) and 3D (Gonzalez et al. 2009b).
You can see a short video (6Mb, divx) illustrating a simpler 2D algorithm we developed during a pre-study (Fleuret & Fua 2006) for dendrite reconstruction.
Joint work with Gilles Blanchard.
The main weakness of learning techniques is the necessity to have large training sets. We worked on the learning of high-level invariance from a large set of images, to be able to learn the appearance of a new object from a single image of it. We called our approach Chopping (Fleuret & Blanchard 2005) since it relies on a very large number of binary splits of the image space. You can download the database of images of 150 LaTeX symbols we used, in MNIST format.
Feature selection based on conditional mutual information (Fleuret 2004) gives good statistical performance and is computationally efficient. This algorithm can select tens of features from a family of tens of thousands in less than a second on a standard PC. Experiments demonstrate that on tasks such as drug screening or image classification, a naive Bayesian classifier combining features selected with this technique achieves error rates similar to those obtained with sophisticated techniques such as SVM or boosting.
You can download the source code of the CMIM feature selection algorithm, distributed under the terms of version 3 of the GNU General Public License.
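As an illustration of selection by conditional mutual information, here is a minimal sketch for binary features. It is not the distributed source code: the function names, the 0/1 encoding, and the selection rule as written here (pick the feature maximizing the minimum, over already selected features, of its conditional mutual information with the class) are assumptions, and this straightforward version makes no attempt at the speed quoted above.

```python
import numpy as np

def entropy(*cols):
    """Empirical joint entropy (in bits) of one or more binary 0/1 columns."""
    codes = np.zeros(cols[0].shape[0], dtype=np.int64)
    for c in cols:
        codes = 2 * codes + np.asarray(c, dtype=np.int64)
    counts = np.bincount(codes)
    p = counts[counts > 0] / codes.shape[0]
    return float(-np.sum(p * np.log2(p)))

def mutual_info(y, x):
    """I(Y; X) = H(Y) + H(X) - H(X, Y)."""
    return entropy(y) + entropy(x) - entropy(x, y)

def cond_mutual_info(y, x, z):
    """I(Y; X | Z) = H(Y, Z) + H(X, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(y, z) + entropy(x, z) - entropy(x, y, z) - entropy(z)

def cmim(X, y, k):
    """Hypothetical sketch: select k columns of the 0/1 matrix X (k <= n_features)."""
    n_features = X.shape[1]
    # Initial score of each feature: its mutual information with the class.
    score = np.array([mutual_info(y, X[:, j]) for j in range(n_features)])
    selected = []
    for _ in range(k):
        j = int(np.argmax(score))
        selected.append(j)
        # A feature's score is the minimum information it brings about the
        # class given any single already-selected feature.
        for n in range(n_features):
            score[n] = min(score[n], cond_mutual_info(y, X[:, n], X[:, j]))
        score[j] = -np.inf  # never pick the same feature twice
    return selected
```

The point of this criterion is that an exact copy of an already selected feature gets a score of zero, however informative it is on its own, so the selection favors features that are both informative and not redundant.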
Besides these main themes, I also worked on multiple testing (Blanchard & Fleuret 2007), cancerous cell detection, and key-point characterization (Özuysal et al. 2006).
My PhD and part of my post-doc were about object recognition and face detection. We developed an original coarse-to-fine algorithm that is very efficient in both speed and error rates (Fleuret 2000, Fleuret & Geman 2001, Fleuret & Geman 2002).
I also worked on kernel design (Fleuret & Sahbi 2003, Boughorbel et al. 2005, Fleuret & Gerstner 2005), functional neural networks (Rossi, Conan-Guez & Fleuret 2002a, Rossi, Conan-Guez & Fleuret 2002b), goal-planning (Fleuret & Brunet 2000), texture segmentation (Shahrokni et al. 2009), content-based image retrieval (Boujemaa et al. 2001, Boujemaa et al. 2004) and object recognition (Jedynak & Fleuret 1996, unfortunately in French).