# Programme 2012

# MLWS2012 Programme (Monday November 19th, EPFL)

- 9h15 – Welcome
- 9h30 – Scalable and accurate learning of sparse Gaussian Markov random fields
- 10h00 – Large Scale Variational Bayesian Inference for Structured Scale Mixture Models
- 10h30 – Structured Sparse Coding for Machine Listening
- 11h00 – Smart Feature Sampling for Boosting in Large Feature Space
- 11h30 – Learning Binary Local Feature Descriptors
- 12h00 – Learning to Learn by Exploiting Prior Knowledge
- 12h30 – Lunch
- 13h30 – Contextual Conditional Models for Smartphone-based Human Mobility Prediction
- 14h00 – Realtime Face Tracking
- 14h30 – A generative model for person-independent gaze estimation from RGB-D cameras
- 15h00 – A Scalable Formulation of Probabilistic Linear Discriminant Analysis: Applied to Face Recognition
- 15h30 – Structured prediction models and their applications to image segmentation
- 16h00 – Kernel-based automatic change detection in remote sensing optical images

# Abstracts

## Scalable and accurate learning of sparse Gaussian Markov random fields

*Anastasios Kyrillidis, Volkan Cevher*

We describe new optimization methods for learning sparse Gaussian Markov random fields. In this setting, the convex ℓ1-regularized log-det divergence criterion has been shown to produce theoretically consistent learning. This objective function is challenging to optimize since the ℓ1 term is non-smooth, the log-det term does not have a Lipschitz-continuous gradient, and the learning problem is high-dimensional. To this end, we first describe an accelerated first-order method which leverages the self-concordance of the log-det objective for step size selection. The resulting algorithm has linear convergence and exhibits superior empirical results compared to state-of-the-art first-order methods. We then describe a second-order method which, in addition to our self-concordant step size selection rule, exploits further structure in the second-order approximation of the log-det function to efficiently obtain Newton directions. The resulting algorithm has superlinear convergence, and typically requires one-third of the number of matrix inverses that the state-of-the-art QUIC algorithm needs to reach the same level of accuracy.
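
As a rough illustration of the objective discussed above, the following sketch evaluates the ℓ1-regularized log-det criterion in its common graphical-lasso form, -log det(Θ) + tr(SΘ) + λ‖Θ‖₁. The function and argument names, and the choice to penalize every entry including the diagonal, are assumptions made for illustration; the abstract does not specify them, and this is not the authors' solver.

```python
import numpy as np


def sparse_gmrf_objective(theta, sample_cov, lam):
    """Evaluate -log det(theta) + tr(sample_cov @ theta) + lam * ||theta||_1.

    theta      : candidate precision (inverse covariance) matrix, symmetric
    sample_cov : empirical covariance matrix S
    lam        : non-negative weight of the l1 penalty (hypothetical name)
    """
    try:
        # Cholesky succeeds only for positive-definite matrices,
        # which is exactly the domain of the log-det term.
        chol = np.linalg.cholesky(theta)
    except np.linalg.LinAlgError:
        return np.inf
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    smooth_part = -log_det + np.trace(sample_cov @ theta)
    # Assumption: the penalty covers all entries, diagonal included.
    penalty = lam * np.abs(theta).sum()
    return smooth_part + penalty
```

The smooth part is the term whose gradient is not Lipschitz-continuous near the boundary of the positive-definite cone, which is why step size selection needs care.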

## Large Scale Variational Bayesian Inference for Structured Scale Mixture Models

*Young Jun Ko, Matthias Seeger*

Natural image statistics exhibit hierarchical dependencies across multiple scales. Representing such prior knowledge in non-factorial latent tree models can boost performance of image denoising, inpainting, deconvolution or reconstruction substantially, beyond standard factorial “sparse” methodology. We derive a large scale approximate Bayesian inference algorithm for linear models with non-factorial (latent tree-structured) scale mixture priors. Experimental results on a range of denoising and inpainting problems demonstrate substantially improved performance compared to MAP estimation or to inference with factorial priors.

## Structured Sparse Coding for Machine Listening

*Afsaneh Asaei, Hervé Bourlard*

In this talk, we take a new perspective on the analysis of multi-sensor observations and propose a structured sparse coding framework to extract the spatio-spectral information embedded in the acoustic scene. Inspired by studies on sparse coding of sensory information in biological systems, our goal is to investigate how the machine listening paradigm could exploit sparsity models to process this information.

We characterize the multi-channel microphone recordings as compressive sensing of the acoustic field data. We derive a spatio-spectral representation of the concurrent sound sources and formulate a unified theory to identify the location of the sensors and sources and their spectral components in the presence of acoustic multipath. The proposed framework incorporates the model underlying the spectrographic speech representation as well as the acoustic channel for extraction of the information-bearing components.

The presented theory is evaluated on data collected at the Idiap smart meeting room. The results provide compelling evidence of the effectiveness of model-based sparse recovery formulations for multi-channel audio processing and open up avenues of research on a unified modeling and processing framework for future generation technologies.

## Smart Feature Sampling for Boosting in Large Feature Space

*Charles Dubout, François Fleuret*

Classical Boosting algorithms, such as AdaBoost, build a strong classifier without concern about the computational cost during training.

Some applications, in particular in computer vision, may involve up to millions of training examples and features. In such contexts, the training time may become prohibitive. Several methods exist to accelerate training, typically either by sampling the features, or the examples, used to train the weak learners. Even if those methods can precisely quantify the speed improvement they deliver, they offer no guarantee of being more efficient than any other, given the same amount of time.

In this presentation, we will show several strategies we have developed to optimize the choice of features during training. We evaluate the performance of these methods with dozens of families of features for image classification. We show that our approaches systematically outperform variants of uniform sampling and state-of-the-art methods based on bandit strategies.
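
For context, the uniform feature sampling baseline mentioned in this abstract can be sketched as follows: at each boosting round, a random subset of features is drawn and the best decision stump is searched only among them. This is a minimal sketch of that baseline under our own assumptions (decision stumps as weak learners, labels in {-1, +1}, hypothetical function names), not the strategies presented in the talk.

```python
import numpy as np


def best_stump(x, y, w, feature_ids):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = (np.inf, None, None, None)
    for j in feature_ids:
        for thr in np.unique(x[:, j]):
            pred = np.where(x[:, j] >= thr, 1, -1)
            err = w[pred != y].sum()
            # Flipping the stump's sign turns error e into 1 - e.
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if e < best[0]:
                    best = (e, j, thr, polarity)
    return best


def adaboost_sampled(x, y, rounds, n_sampled, seed=0):
    """AdaBoost where each round only examines n_sampled random features."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = np.full(n, 1.0 / n)  # example weights, kept normalized
    ensemble = []
    for _ in range(rounds):
        feature_ids = rng.choice(d, size=min(n_sampled, d), replace=False)
        err, j, thr, pol = best_stump(x, y, w, feature_ids)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = pol * np.where(x[:, j] >= thr, 1, -1)
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * p * np.where(x[:, j] >= t, 1, -1) for a, j, t, p in ensemble)
    return np.sign(score)
```

The cost of each round scales with `n_sampled` rather than with the full number of features, which is the speed-up such sampling schemes trade against the quality of the selected weak learner.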

## Learning Binary Local Feature Descriptors

*Tomasz Trzciński, Pascal Fua*

Binary descriptors of image patches are increasingly popular given that they require less storage and enable faster processing. This, however, often comes at the price of lower recognition performance. In this talk, I will propose two efficient methods to learn a set of binary embeddings. The first allows us to boost performance by projecting the image patches to a more discriminative subspace and thresholding their coordinates to build our binary descriptor. Applying complex projections to the patches is slow, which negates some of the advantages of binary descriptors. Hence, our key idea is to learn the discriminative projections so that they can be decomposed into a small number of simple filters whose responses can be computed quickly.
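
The project-then-threshold construction described above can be sketched as follows. The random projection matrix is only a stand-in for the learned discriminative projections, and the function names, patch size, and zero thresholds are assumptions for illustration.

```python
import numpy as np


def binary_descriptor(patch, projections, thresholds):
    """Project a flattened patch and keep one bit per projected coordinate."""
    responses = projections @ patch.ravel()
    return responses > thresholds


def hamming_distance(a, b):
    """Number of differing bits between two binary descriptors."""
    return int(np.count_nonzero(a != b))


rng = np.random.default_rng(0)
# Stand-in for learned projections: 64 random directions over 32x32 patches.
projections = rng.standard_normal((64, 32 * 32))
thresholds = np.zeros(64)

patch = rng.random((32, 32))
noisy_patch = patch + 0.01 * rng.standard_normal((32, 32))

d1 = binary_descriptor(patch, projections, thresholds)
d2 = binary_descriptor(noisy_patch, projections, thresholds)
print(hamming_distance(d1, d2))  # small when the two patches are similar
```

Matching then reduces to Hamming distance between bit vectors, which is where the storage and speed advantages of binary descriptors come from.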

The second method that we propose allows us to improve the quality of the binary descriptors even further by leveraging the boosting-trick and powerful gradient-based weak classifiers. Contrary to previous approaches, we present a sequential learning strategy which enables simultaneous optimization of both the descriptor weighting and shape to find a discriminative and compact binary embedding. The resulting descriptor is more complex than the first one, yet it significantly outperforms state-of-the-art binary and floating-point descriptors (such as SIFT) with as few as 64 bits.

## Learning to Learn by Exploiting Prior Knowledge

*Tatiana Tommasi, Barbara Caputo*

One of the ultimate goals of open-ended learning systems is to take advantage of experience to gain a future benefit. We can identify two levels in learning. One builds directly on the data: it captures the patterns and regularities which allow for reliable predictions on new samples. The other starts from the source knowledge obtained in this way and focuses on how to generalize it to new target concepts: this is also known as learning to learn. Most existing machine learning methods stop at the first level and are capable of reliable future decisions only if a large amount of training samples is available. This talk focuses on the second level of learning and addresses how to transfer information from prior knowledge, exploiting it on a new learning problem with possibly scarce labeled data. We propose several algorithmic solutions that leverage prior models or features. All the proposed approaches automatically evaluate the relevance of prior knowledge and decide from where and how much to transfer without any need for external supervision or heuristically hand-tuned parameters. A thorough experimental analysis shows the effectiveness of the defined methods both for inter-class transfer and for adaptation across different domains.

## Contextual Conditional Models for Smartphone-based Human Mobility Prediction

*Trinh-Minh-Tri Do, Daniel Gatica-Perez*

Human behavior is often complex and context-dependent. This paper presents a general technique to exploit multidimensional contextual variables for human mobility prediction. We use an ensemble method, in which we extract different mobility patterns with multiple models and then combine these models under a probabilistic framework. The key idea lies in the assumption that human mobility can be explained by several mobility patterns, each of which depends on a subset of the contextual variables and can be learned by a simple model. We show how this idea can be applied to two specific online prediction tasks: what is the next place a user will visit, and how long will the user stay at the current place? Using smartphone data collected from 153 users during 17 months, we show the potential of our method in predicting human mobility in real life.

## Realtime Face Tracking

*Sofien Bouaziz, Mark Pauly*

In this talk I will present some of the technical challenges in realtime face tracking using consumer-level RGB-D devices, e.g. Kinect.

In our current system the user is recorded in a natural environment using a non-intrusive, commercially available 3D sensor. The simplicity of this acquisition device comes at the cost of high noise levels in the acquired data. To effectively map low-quality 2D images and 3D depth maps to realistic facial expressions, incorporating prior knowledge about the facial expression space into the system is essential.

I will present some of the machine learning techniques that can be used to effectively build such a prior and to improve the tracking result.

In a live demo, I will demonstrate that compelling 3D facial dynamics can be reconstructed in realtime without the use of face markers, intrusive lighting, or complex scanning hardware.

## A generative model for person-independent gaze estimation from RGB-D cameras

*Kenneth Funes, Jean-Marc Odobez*

In this presentation we will show our on-going work on remote gaze estimation from RGB-D cameras. In particular we address the challenges of inter-person eye appearance variability and person-specific gaze model adaptation. With that aim, we developed a generative model tailored for this task that relies on the known shape of the human eye to encode a strong geometrical prior. The model has the advantage of handling prior values and uncertainties for the desired variables. In order to do inference, we developed a variational framework, with approximate techniques to handle the non-linear and non-conjugate relations imposed by the geometric nature of the model. Initial results will be presented.

## A Scalable Formulation of Probabilistic Linear Discriminant Analysis: Applied to Face Recognition

*Laurent El-Shafey, Sebastien Marcel*

In machine learning, classification problems are challenging, because they commonly have to deal with classes which have large intra-class and low inter-class variabilities. Linear Discriminant Analysis (LDA) is a popular technique that aims at explicitly modeling these variabilities. This technique relies on the projection of the feature data to a space where the ratio of between-class variation to within-class variation is maximised, and makes use of a distance measure in this projected space.
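
The ratio criterion described above can be illustrated for the two-class case, where the maximizing direction has the closed form w ∝ S_w⁻¹(m₁ − m₀). This is a minimal sketch with hypothetical function names and toy data, not the formulation used in the talk.

```python
import numpy as np


def fisher_direction(x0, x1):
    """Unit direction maximizing between-class over within-class variation."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    sw = (x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)
    w = np.linalg.solve(sw, m1 - m0)
    return w / np.linalg.norm(w)


def fisher_ratio(w, x0, x1):
    """Between-class to within-class variation of the data projected onto w."""
    p0, p1 = x0 @ w, x1 @ w
    between = (p1.mean() - p0.mean()) ** 2
    within = ((p0 - p0.mean()) ** 2).sum() + ((p1 - p1.mean()) ** 2).sum()
    return between / within


# Toy data: two unit squares, the second shifted along the first axis.
class0 = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
class1 = class0 + np.array([4.0, 0.0])

w = fisher_direction(class0, class1)
print(w, fisher_ratio(w, class0, class1))
```

On this toy data the recovered direction is the axis along which the classes are shifted, and any direction orthogonal to it gives a ratio of zero.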

Recently, generative probabilistic approaches have yielded significant progress in the field of object recognition. Probabilistic Linear Discriminant Analysis (PLDA) is such a model, and incorporates components that describe both between-class and within-class variations. However, rather than basing classification on distance comparisons, the PLDA approach calculates the likelihood that two objects are from the same class. This algorithm has been shown to provide state-of-the-art performance for both face and speaker recognition.

In this talk, we firstly introduce the PLDA framework and describe the training and recognition procedures, as originally given in [1]. Next, we propose a novel scalable derivation for both the training and recognition steps, which significantly reduces the complexity with respect to the number of samples per class. Finally, we demonstrate the efficacy of our approach by conducting face verification experiments.

[1] "Probabilistic models for inference about identity", P. Li, Y. Fu, U. Mohammed, J. Elder and S. Prince, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, issue 1, pp. 144–157, 2012.

## Structured prediction models and their applications to image segmentation

*Aurelien Lucchi, Pascal Fua*

Structured prediction has become an increasingly prominent field in machine learning, with a wide range of applications such as bio-informatics, natural language processing, and computer vision.

While graphical models, such as Markov random fields and conditional random fields, are very attractive for these tasks due to their ability to represent the inter-dependency between variables, efficient learning of such models remains a major challenge, especially at large scales. This talk will be a tutorial on the maximum-margin framework for learning MRFs and CRFs. We will present an application of this framework for the problem of image segmentation and show how the learning procedure can scale to fairly large 3D medical datasets.

## Kernel-based automatic change detection in remote sensing optical images

*Michele Volpi, Frank de Morsier, Giona Matasci, Mikhail Kanevski, Jean Philippe Thiran, Devis Tuia*

Due to recent technical developments, very high resolution imagery is nowadays publicly available. Moreover, next generation hyperspectral sensors will make it possible to remotely sense hundreds of spectral bands and will thus offer unprecedented possibilities for the detection of objects at the surface of the Earth. Given these advances, satellite image processing has become an active research field where machine learning and signal processing interact constantly in a challenging scenario: high dimensionality, few labeled examples and noisy environments. In this talk, we will present some recent advances in the field of automatic change detection. Solutions based on kernel methods will be discussed and applied to real-world problems.