# Swiss Machine Learning Day 2015

This is the fourth iteration of the event. It took place at EPFL on Tuesday November 10th, 2015.

# Program

- 9:00 – 9:30
- Fast and Accurate Inference of Plackettâ€“Luce Models (Lucas Maystre, EPFL)
- 9:30 – 10:00
- Robust Principal Component Analysis on Graphs (Nauman Shahid, EPFL)
- 10:00 – 10:30
- Time-Varying Gaussian Process Bandit Optimization (Ilija Bogunovic, EPFL)
- 10:30 – 10:45
**Coffee break**- 10:45 – 11:15
- An invariant regulariser for learning with dynamical systems (Pablo Ramon Strasser, UNIGE)
- 11:15 – 11:45
- Learning POMDP policies from human demonstrations in haptic search tasks (Guillaume de Chambrier, EPFL)
- 11:45 – 12:15
- Active learning for reconstruction of curvilinear structures (Agata Justyna Mosinska, EPFL)
- 12:15 – 14:00
**Lunch break**- 14:00 – 14:30
- Weighted Multiple Instance Learning for Text Regression Tasks (Nikos Pappas, IDIAP)
- 14:30 – 15:00
- An HMM-based formulation for automatic derivation of phone-like subword units and pronunciation lexicon development (Marzieh Razavi, IDIAP)
- 15:00 – 15:30
- Data-driven surface normal learning for accurate multi-view stereo reconstruction (Silvano Galliani, ETHZ)
- 15:30 – 15:45
**Coffee break**- 15:45 – 16:15
- Learning Robot Manipulation Tasks with Tied Gaussian Mixture Models (Ajay Tanwani, IDIAP)
- 16:15 – 16:45
- Fast computation of adversarial examples in convolution networks (Seyed-Mohsen Moosavi-Dezfooli, EPFL)
- 16:45 – 17:15
- Scalable Metric Learning via Weighted Approximate Rank Component Analysis (Cijo Jose, IDIAP)

# Abstracts

## Fast and Accurate Inference of Plackettâ€“Luce Models (Lucas Maystre, EPFL)

We are interested in inferring a global ranking over items based on data in the form of (noisy) comparisons and / or partial rankings. Our approach is to postulate a generative model of comparisons / rankings, and infer model parameters. In particular, we focus on the well-known and widely-used Plackett-Luce model. In this work, we show that the maximum-likelihood (ML) estimate of this model can be expressed as the stationary distribution of a Markov chain. This (a) conveys insight into several recently proposed spectral inference algorithms and (b) enables us to formulate a new spectral algorithm that is more accurate and more general than prior work. With a simple adaptation, this algorithm can be used iteratively, producing a sequence of estimates that converges to the ML estimate. Our algorithms are easy to implement, making them relevant for practitioners at large.

## Robust Principal Component Analysis on Graphs (Nauman Shahid, EPFL)

Principal Component Analysis (PCA) is the most widely used tool for linear dimensionality reduction and clustering. Still it is highly sensitive to outliers and does not scale well with respect to the number of data samples. Robust PCA solves the first issue with a sparse penalty term. The second issue can be handled with the matrix factorization model, which is however non-convex. Besides, PCA based clustering can also be enhanced by using a graph of data similarity. In this article, we introduce a new model called `Robust PCA on Graphs' which incorporates spectral graph regularization into the Robust PCA framework. Our proposed model benefits from 1) the robustness of principal components to occlusions and missing values, 2) enhanced low-rank recovery, 3) improved clustering property due to the graph smoothness assumption on the low-rank matrix, and 4) convexity of the resulting optimization problem. Extensive experiments on 8 benchmark, 3 video and 2 artificial datasets with corruptions clearly reveal that our model outperforms 10 other state-of-the-art models in its clustering and low-rank recovery tasks.

## Time-Varying Gaussian Process Bandit Optimization (Ilija Bogunovic, EPFL)

We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. We derive regret bounds for both algorithms, expressed in terms of the kernel function, problem dimension, time horizon, and rate at which the function varies. We run these algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover, both algorithms significantly outperform classical GP-UCB, which is unsuitable for time-varying scenarios since it treats stale and fresh data equally.

Joint work with Jonathan Scarlett and Volkan Cevher

## An invariant regulariser for learning with dynamical systems (Pablo Ramon Strasser, UNIGE)

Sample variability is an inherent characteristic of many learning problems. In this paper we consider problems in which samples/trajectories are generated by some underlying dynamical systems. We model for the intrinsic sample variability by defining an invariant distance measure which captures the mechanism differences of the physical processes that generated the data. The new distance measure is more robust to sample variations. Under mild conditions the new distance measure can be approximated by the sum of trajectory-derivatives differences. We use the new measure as a regulariser. The experimental results show that its use brings significant performance improvements over commonly used regularizers in dictionary learning and predictive learning settings.

## Learning POMDP policies from human demonstrations in haptic search tasks (Guillaume de Chambrier, EPFL)

Decision making and planning for which the state information is only partially available is a problem faced by all forms of intelligent entities they being either virtual, synthetic or biological. The standard approach to mathematically solve such a decisional problem is to formulate it as a partially observable markov decision process. It's solution (a control policy) is tractable for a narrow set of problems but is intractable for instance high-dimensional continuous haptic search tasks.To address this problem, we take a programming by demonstration approach in which we model the decision making process followed by humans in blind haptic search tasks.

We model the humans position belief by a point mass filter and learn a mapping from belief space to a control policy in an Actor-Critic Fitted Q-Iteration framework in which the Actors is a Gaussian mixture model and the Critic is a non-parametric regression function. Through value and policy improvement we achieve an optimal policy.

We compare the Actor-Critic framework with our previous work (only considering a statistically learned policy) on the subject and demonstrate the necessity of including a reinforcement learning step. We further contrast our approach with belief space planning heuristics and demonstrate our methodology to be the most suitable representation.

## Active learning for reconstruction of curvilinear structures (Agata Justyna Mosinska, EPFL)

Recent years have seen significant advances in imaging technologies and allowed scientists to easily obtain large quantities of high quality data presenting curvilinear structures. To analyse it, great demands have been made for automated and intelligent solutions. Many of these methods are based on supervised machine learning and require large amount of ground truth data.

In this talk, we propose a method that has a potential to considerably accelerate the process of ground truth acquisition. Active Learning reduces training set size by incrementally learning from the user input and selecting the most informative subset of data. Our approach, unlike conventional techniques, exploits not only labelled data, but also unlabelled instances and the geometric relations between them by propagating the information between neighbouring edges. This way, we can identify uncertain image regions and query the expert about them.

We apply our method to reconstruction of the networks in 2D aerial and 3D microscopic images, where we decreased the annotation effort by up to 80%. We also demonstrate that the informative regions selected by our algorithm often coincide with the ambiguous parts from the human point of view and querying them improves the quality of the final reconstruction.

## Weighted Multiple Instance Learning for Text Regression Tasks (Nikos Pappas, IDIAP)

We present a model of multiple-instance learning applied to the prediction of ratings from user-contributed texts such as aspect ratings from product reviews. Each data point (text) is described by several independent feature vectors (one word vector per sentence). For learning from texts with known ratings, the model performs multiple-instance regression and assigns importance weights to each of the sentences of a text, uncovering their contribution to the text ratings. Then, the model is used to predict ratings in previously unseen texts, demonstrating interpretability and explanatory power for its predictions. We evaluate the model on three text regression tasks, namely aspect, sentiment and emotion rating prediction over seven publicly available data sets, improving over state-of-the-art linear models. The capacity of the model to explain its predictions is also exemplified.

## An HMM-based formulation for automatic derivation of phone-like subword units and pronunciation lexicon development (Marzieh Razavi, IDIAP)

State-of-the-art automatic speech recognition (ASR) and text-to-speech synthesis (TTS) systems use phonemes/phones as subword units. This necessitates availability of a phonetic lexicon that maps each word to a sequence of phonemes. Development of phonetic lexicons requires language-specific linguistic expertise and resources, which may not be available for all languages. This talk will present an HMM-based formulation, recently developed at Idiap, for automatic derivation of "phone-like" subword units and development of phonetic lexicon using acoustic resources of the target language. Briefly, in this formulation, first the subword units are automatically extracted by clustering context-dependent grapheme units using acoustic data. Then, the associated pronunciations are inferred using a recently proposed acoustic data-driven G2P approach (originally developed at Idiap). We will demonstrate the potential application of this approach on Scottish Gaelic, a genuinely under-resourced and endangered European language.

## Data-driven surface normal learning for accurate multi-view stereo reconstruction (Silvano Galliani, ETHZ)

Automatic reconstruction of 3D depth from multiple images of a scene has been a main goal of photogrammetry and computer vision over the last 30 years, and an active research topic. By finding corresponding parts of a scene captured by different cameras it is possible to triangulate points and obtain 3D information about an observed scene. In the literature most approaches try to match corresponding points of an image in a purely geometric fashion, while trying to be invariant to illumination effects.

The implicit assumption is that shading variations are a perturbation of the ideal conditions for image acquisition. Therefore, shading is usually excluded, as it is considered noise. Nevertheless, in the same way that humans perceive depth also from local shading, brightness variations due to shading effects provide information about the local shape and can serve as an additional cue for 3D geometry reconstruction.

In this talk we will show how it is possible to improve the 3D depth reconstruction of a scene from multiple stereo images by learning the relation of the surface normal to its shading information.

## Learning Robot Manipulation Tasks with Tied Gaussian Mixture Models (Ajay Tanwani, IDIAP)

Gaussian mixture models tend to overfit high-dimensional data when only few datapoints are available. In this work, we leverage upon the spatial and temporal correlations characterizing robot manipulation movements to robustly encode the data with a fewer number of model parameters. We exploit a variant of Gaussian mixture model that ties the covariance matrices in the mixture with common basis vectors/synergistic directions (recurrent coordination patterns in the movement), instead of estimating separately full covariance matrices for each cluster in the mixture. This allows the reuse of the discovered synergies in different parts of the movement having similar coordination patterns. I will show how task-parameterization and hidden semi-Markov model encoding of the approach allows the robot to adapt to changing manipulation scenarios in an autonomous manner. Experiments with whole body motion capture data, and valve opening task with the Baxter robot will be presented. The experiments reveal similar performance to that of a standard Gaussian mixture model with full covariance matrices, albeit having much less parameters and better generalization capability.

## Fast computation of adversarial examples in convolution networks (Seyed-Mohsen Moosavi-Dezfooli, EPFL)

The problem of robustness of convolution networks has attracted a lot of attention in the last couple of years. While such networks are robust to random noise, they exhibit an intriguing behavior to what is called the "adversarial examples". These examples basically represent very small perturbations of data samples, that are sufficient to change the decision of the classifier. In this talk, we introduce a fast method to compute the adversarial examples, which can then be used to study the robustness of neural networks. We further discuss the properties of classifiers in light of these adversarial examples.

## Scalable Metric Learning via Weighted Approximate Rank Component Analysis (Cijo Jose, IDIAP)

Our goal is to learn a Mahalanobis distance by minimizing a loss defined on a weighted sum of precisions at different ranks. Our core motivation is that such a criterion is natural when the rank of the first correct match is what matters, for instance for person re-identification.

For this purpose, we propose a novel metric learning formulation called WARCA (Weighted Approximate Rank Component Analysis), and derive a scalable stochastic gradient descent algorithm for the resulting learning problem. We also derive an efficient kernelized version of WARCA which decouples the training and prediction costs from the problem dimension, and enables us to plug in distance measures which are more natural for the features at hand.

We validate this new method on nine standard person re-identification datasets including two large scale Market-1501 and CUHK03 datasets and show that we improve upon the current state of the art methods on all of them.

# Previous years

The abstracts and slides from the previous years are available: