Swiss Machine Learning Day 2014

The Swiss Machine Learning Day is a one-day workshop organized every year since 2012, which aims at bringing together Swiss researchers working on topics related to machine learning.

The workshop took place at EPFL on Friday, Oct 24th, 2014.

Program

10:00–10:25: “Adding structure to VAR models for forecasting multiple time series.” (Magda Gregorova – UNIGE, slides)
10:25–10:50: “Resource Optimized Speech Recognition using Kullback-Leibler Divergence based HMM” (Ramya Rasipuram – IDIAP, slides)
10:50–11:15: Coffee break
11:15–11:40: “Leveraging from the NIST i-vector machine learning challenge” (Elie Khoury – IDIAP, slides)
11:40–12:05: “Discovering Primitive Motions from Unstructured Heterogeneous Demonstrations” (Nadia Barbara Figueroa Fernandez – EPFL)
12:05–12:30: “Mining Democracy” (Julien Herzen – EPFL, slides)
12:30–13:45: Lunch break
13:45–14:10: “Active Learning for Biomedical Image Segmentation” (Ksenia Konyushkova – EPFL, slides)
14:10–14:35: “Learning Visual Representations for Maya Glyphs” (Gulcan Can – IDIAP)
14:35–15:00: “Dictionary learning for fast classification based on soft-thresholding” (Alhussein Fawzi – EPFL, slides)
15:00–15:25: “Efficient mining of hard examples for object detection” (Olivier Canévet – IDIAP, slides)
15:25–15:45: Coffee break
15:45–16:10: “Incremental Learning of NCM Forests for Large-Scale Image Classification” (Marko Ristin – ETHZ, slides)
16:10–16:35: “High-Dimensional Inference” (Dezeure Ruben – ETHZ, slides)
16:35–17:00: “Matrix completion on graphs” (Vassilis Kalofolias – EPFL, slides)

Contact François Fleuret and Pierre Vandergheynst with any questions.

Abstracts

Adding structure to VAR models for forecasting multiple time series

(Magda Gregorova – UNIGE)

Standard vector autoregressive (VAR) models for large multivariate time series notoriously suffer from the high-dimensionality, small-sample-size problem. Shrinkage and thresholding methods along the lines of ridge or lasso have recently been explored to allow for modelling of bigger time series systems. We present new methods that emphasize and extract specific structures in the models, which arise naturally from grouping the covariates in the VARs by the time series from which they originate. We apply parameter shrinkage and thresholding on groups defined over both tasks and variables. Building on multi-task learning methods, we further enforce model similarity between the tasks, while differentiating between a series' own history and the history of the other series in the system when discovering common features. By doing so, we aim to improve the forecasting performance and interpretability of the models.
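
As a rough illustration of the grouping idea in this abstract (not the authors' method), the sketch below builds the lagged covariates of a VAR(p), tags each covariate with the series it originates from, and applies a generic group soft-thresholding step of the kind used by group-lasso solvers. The names `var_design` and `group_shrink` and the threshold `lam` are hypothetical.

```python
import math

def var_design(series, p):
    """Build the regression form of a VAR(p).

    series: list of equally long time series (lists of floats).
    Returns (X, Y, groups): X[t] holds p lags of every series, Y[t] holds
    the current values, and groups[j] is the index of the series that
    covariate j comes from.
    """
    T = len(series[0])
    X, Y = [], []
    for t in range(p, T):
        X.append([s[t - lag] for s in series for lag in range(1, p + 1)])
        Y.append([s[t] for s in series])
    groups = [k for k in range(len(series)) for _ in range(p)]
    return X, Y, groups

def group_shrink(coeffs, lam):
    """Shrink a whole coefficient group towards zero (assumed group-lasso
    style step): the group is zeroed when its Euclidean norm is below lam."""
    norm = math.sqrt(sum(c * c for c in coeffs))
    if norm <= lam:
        return [0.0 for _ in coeffs]
    scale = 1.0 - lam / norm
    return [scale * c for c in coeffs]
```

With two series and p = 2, `groups` is `[0, 0, 1, 1]`, so all lags of one series can be kept or discarded together.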

Resource Optimized Speech Recognition using Kullback-Leibler Divergence based HMM

(Ramya Rasipuram – IDIAP)

One of the key challenges involved in building statistical automatic speech recognition (ASR) systems is modeling the relationship between subword units and acoustic feature observations. To model this relationship, two main resources are required: speech data with word-level transcriptions and a pronunciation dictionary where each word is transcribed in terms of the basic sound units of a language, i.e., phones or phonemes. The creation of these two resources for any language is expensive and time-consuming. The development of ASR systems for resource-rich languages (such as English or French) is less constrained by this issue. However, for under-resourced languages that lack proper resources, this issue is the major bottleneck.

In this talk, I will introduce the Kullback-Leibler divergence based hidden Markov model (KL-HMM) approach developed recently at the Idiap Research Institute. In the KL-HMM approach, phoneme class conditional probabilities estimated by an artificial neural network are used directly as feature observations to train an HMM, and the HMM states are parametrized by trained categorical distributions. The two main advantages of the KL-HMM approach are: (1) it facilitates sharing of resources and models from resource-rich languages, and (2) it requires fewer or even zero conventional resources from the under-resourced language. Experimental studies on various ASR tasks show that the KL-HMM approach is capable of addressing the challenges related to building ASR systems for under-resourced languages, as the demand for resources from the language can be considerably reduced.
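
To make the role of the categorical state distributions concrete, here is a minimal sketch of a KL-divergence local score between one HMM state and one posterior feature vector. It assumes the score is KL(state || posterior); the direction of the divergence, the three-class example and the names `kl_divergence` and `best_state` are illustrative assumptions, not details taken from the talk.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions given as lists.
    eps guards against log(0) and is an implementation convenience."""
    return sum(pk * math.log((pk + eps) / (qk + eps))
               for pk, qk in zip(p, q) if pk > 0.0)

def best_state(states, posterior):
    """Return the name of the state whose categorical distribution is
    closest (lowest KL score) to the observed posterior vector."""
    return min(states, key=lambda name: kl_divergence(states[name], posterior))

# Hypothetical states over three phoneme classes.
states = {"s_a": [0.8, 0.1, 0.1], "s_b": [0.1, 0.8, 0.1]}
posterior = [0.7, 0.2, 0.1]  # hypothetical neural network output for one frame
closest = best_state(states, posterior)
```

In a full decoder this local score would replace the usual emission log-likelihood inside Viterbi decoding; only the per-frame comparison is shown here.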

Leveraging from the NIST i-vector machine learning challenge

(Elie Khoury – IDIAP)

Modern speaker recognition systems typically rely on several handcrafted preprocessing or feature extraction techniques, and involve speech corpus engineering. This required knowledge often prevents researchers outside the audio processing community from being involved in speaker recognition evaluations (SREs). To foster the interest of a wider range of researchers and, e.g., the application of recent advances in machine learning, NIST has organized a novel benchmark, the NIST i-vector Machine Learning Challenge 2014. In contrast to previous NIST SREs, this challenge relies on the i-vector paradigm, which is widely used by state-of-the-art speaker recognition systems. By providing such i-vectors directly instead of audio data, this benchmark is accessible to participants outside the audio processing community.

In this talk we will present the i-vector paradigm, together with the speaker detection task defined for the NIST i-vector machine learning challenge. We will then describe our most successful findings including the use of Probabilistic Linear Discriminant Analysis to cluster unlabeled training data.

Discovering Primitive Motions from Unstructured Heterogeneous Demonstrations

(Nadia Barbara Figueroa Fernandez – EPFL)

We consider the problem of automatically segmenting and grouping primitive motions extracted from multiple unsegmented or possibly incomplete tasks (unstructured) under different frames of reference or from different sources (heterogeneous). Our framework discovers primitive motions by modeling the rigid body motion of a robot or human end-effector from continuous task demonstrations, as sequences of switching time-independent multivariate Gaussian distributions of a unique invariant structure. We use a Bayesian nonparametric approach, the Beta Process Hidden Markov Model (BP-HMM), to jointly segment unstructured heterogeneous demonstrations of complex tasks and extract a weakly grouped set of primitive motions. This set is weak as the BP-HMM is unable to group similar motions under variable constraints (i.e., rotation, translation and scaling). We extend the functionality of the BP-HMM by introducing the Spectral Polytope Covariance Matrix (SPCM) similarity function to group structurally similar motions subject to these variable constraints. Our similarity function is based on the assumption that two covariance matrices are similar if there exists a unified homothetic ratio between their spectral polytopes. We validate our framework on a toy dataset of primitive motions and on real human motion data of a complex task.

Mining Democracy

(Julien Herzen – EPFL)

Switzerland has a long tradition of direct democracy, which makes it an ideal laboratory for research on real-world politics. Similar to recent open government initiatives launched worldwide, the Swiss government regularly releases datasets related to state affairs and politics. In this talk, we propose an exploratory, data-driven study of the political landscape of Switzerland, in which we use opinions expressed by candidates and citizens on a web platform during the recent Swiss parliamentary elections, together with fine-grained vote results and parliament votes.

Following this purely data-driven approach, we show that it is possible to uncover interesting patterns that would otherwise require both tedious manual analysis and domain knowledge. In particular, we show that traditional cultural and/or ideological idiosyncrasies can be highlighted and quantified by looking at vote results and pre-election opinions. We propose a technique for comparing the candidates' opinions expressed before the elections with their actual votes cast in the parliament after the elections. This technique spots politicians that do not vote consistently with the opinions that they expressed during the campaign. We also observe that it is possible to predict surprisingly precisely the outcome of nationwide votes, by looking at the outcome in a single, carefully selected municipality. Our work points to some of the avenues created by user-generated data emerging from open government initiatives.

Active Learning for Biomedical Image Segmentation

(Ksenia Konyushkova – EPFL)

In today's world of "Big Data", using huge datasets to solve problems with Machine Learning seems natural, and many recent algorithms require tons of data to be trained properly. However, in many applications, even though it is easy to find sources of unlabelled data, annotations are still hard and expensive to obtain. Some common annotation tasks can be solved with the help of crowdsourcing, but there are areas where annotating requires special training and education. For example, in biomedical tasks only an expert can distinguish organelles in electron microscopy images.

We are interested in the problem of biomedical image segmentation, applying statistical learning methods to the task of classification. The general problem of Active Learning can be formulated as follows. A sufficient amount of unlabelled data from the domain of interest is available, but only a few instances are labelled. Although domain experts are at our disposal, their labour is expensive and we would like to avoid querying them whenever possible. Our task is to use our resources as efficiently as possible in a learning task.

Learning Visual Representations for Maya Glyphs

(Gulcan Can – IDIAP)

The ancient Maya writing system is highly visual and complex. Several challenges posed by Maya glyphs, such as erosion, occlusions, and the inherent visual richness of the glyphs themselves, make automatic recognition tasks hard. Maya experts can indicate the diagnostic or varying parts of the deciphered glyphs. However, implementing a glyph recognition system based on expert knowledge is tedious and does not scale to large sets of glyphs. To address these problems, we propose to learn the visual representations automatically with a sparse auto-encoder framework. We compare the performance of the learnt representations with the histogram of orientations shape context descriptor in a glyph classification task.

Dictionary learning for fast classification based on soft-thresholding

(Alhussein Fawzi – EPFL)

Classifiers based on sparse representations have recently been shown to provide excellent results in many visual recognition and classification tasks. However, the high cost of computing sparse representations at test time is a major obstacle that limits the applicability of these methods in large-scale problems, or in scenarios where computational power is restricted.

We consider a simple yet efficient alternative to sparse coding for feature extraction. We study a classification scheme that applies the soft-thresholding nonlinear mapping in a dictionary, followed by a linear classifier. A novel supervised dictionary learning algorithm tailored for this low complexity classification architecture is proposed. The dictionary learning problem, which jointly learns the dictionary and linear classifier, is cast as a difference of convex (DC) program and solved efficiently with an iterative DC solver. We conduct experiments on several datasets, and show that our learning algorithm that leverages the structure of the classification problem outperforms generic learning procedures. Our simple classifier based on soft-thresholding also competes with the recent sparse coding classifiers, when the dictionary is learned appropriately. The adopted classification scheme further requires less computational time at the testing stage, compared to other classifiers. The proposed scheme shows the potential of the adequately trained soft-thresholding mapping for classification and paves the way towards the development of very efficient classification methods for vision problems.
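
The test-time pipeline described above (dictionary responses, soft-thresholding, linear classifier) can be sketched in a few lines. This is only an illustration under assumptions: it uses the standard two-sided soft-thresholding operator, a hand-picked dictionary `D`, weights `w` and threshold `alpha`, and it does not attempt the DC-programming learning step.

```python
def soft_threshold(z, alpha):
    """Two-sided soft-thresholding: shrink z towards zero by alpha."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0

def features(D, x, alpha):
    """Apply soft-thresholding to the inner product of x with each atom.
    D is a list of atoms, each a list with the same length as x."""
    return [soft_threshold(sum(d * v for d, v in zip(atom, x)), alpha)
            for atom in D]

def classify(D, w, x, alpha):
    """Binary decision: sign of a linear classifier on the features."""
    score = sum(wj * fj for wj, fj in zip(w, features(D, x, alpha)))
    return 1 if score >= 0 else -1

# Hypothetical two-atom dictionary and classifier weights.
D = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, -1.0]
label_a = classify(D, w, [2.0, 0.2], alpha=0.5)
label_b = classify(D, w, [0.2, 2.0], alpha=0.5)
```

The cost at test time is one matrix-vector product plus an elementwise operation, which is the source of the speed advantage over solving a sparse coding problem per sample.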

Efficient mining of hard examples for object detection

(Olivier Canévet – IDIAP)

We investigate techniques for bootstrapping hard samples efficiently in the context of object detection. The approaches we have developed follow two main axes. The first consists of simply using the response of the available predictor to discard a large area around collected false positives, so that we do not harvest samples which are redundant for the training. Our second approach leverages a hierarchical procedure to distill large sample sets into sets of small size which are as informative as possible regarding the training of the predictor. We demonstrate the performance of both approaches experimentally for face and pedestrian detection.

Incremental Learning of NCM Forests for Large-Scale Image Classification

(Marko Ristin – ETHZ)

In recent years, large image data sets such as "ImageNet", "TinyImages" or ever-growing social networks like "Flickr" have emerged, posing new challenges to image classification that were not apparent in smaller image sets. In particular, the efficient handling of dynamically growing data sets, where not only the number of training images but also the number of classes increases over time, is a relatively unexplored problem. To remedy this, we introduce Nearest Class Mean Forests (NCMF), a variant of Random Forests where the decision nodes are based on nearest class mean (NCM) classification. NCMFs not only outperform conventional random forests, but are also well suited for integrating new classes. To this end, we propose and compare several approaches to incorporate data from new classes, so as to seamlessly extend the previously trained forest instead of retraining it from scratch. In our experiments, we show that NCMFs trained on small data sets with 10 classes can be extended to large data sets with 1000 classes without significant loss of accuracy compared to training from scratch on the full data.
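
The building block named in this abstract, nearest class mean classification, is simple enough to sketch. The class below is a plain NCM classifier, not an NCM forest; it is meant only to show why adding a class is cheap: a new class needs a new running mean and nothing else is retrained. The class name and the toy labels are hypothetical.

```python
class NearestClassMean:
    """Plain nearest class mean classifier with running per-class sums."""

    def __init__(self):
        self.sums = {}    # label -> coordinate-wise sum of samples
        self.counts = {}  # label -> number of samples seen

    def add(self, x, label):
        """Add one sample; unseen labels create a new class on the fly."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def mean(self, label):
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def predict(self, x):
        """Return the label whose class mean is closest in Euclidean distance."""
        def sqdist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.mean(label)))
        return min(self.sums, key=sqdist)

ncm = NearestClassMean()
ncm.add([0.0, 0.0], "cat")
ncm.add([0.0, 2.0], "cat")
ncm.add([10.0, 10.0], "dog")
before = ncm.predict([1.0, 1.0])
ncm.add([5.0, -5.0], "bird")  # a new class arrives later
after = ncm.predict([5.0, -4.0])
```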

High-Dimensional Inference

(Dezeure Ruben – ETHZ)

Sometimes we are not so much interested in prediction or point estimation but rather in assigning uncertainty and significance. For high-dimensional linear models this has been an open problem for quite a while. Over the last couple of years, there have been many proposals that claim nice theoretical properties and good empirical performance.

We conducted what is, to our knowledge, the first broad empirical comparison of some of the most viable alternatives. In addition, we compiled the implemented code in an easy-to-use R package which allows researchers to reproduce our results as well as apply the methods to their own datasets.

Matrix Completion on Graphs

(Vassilis Kalofolias – EPFL)

The problem of finding the missing values of a matrix given a few of its entries, called matrix completion, has gathered a lot of attention in recent years. Although the problem is NP-hard, Candes and Recht showed that it can be exactly relaxed if the matrix is low-rank and the number of observed entries is sufficiently large. In this work, we introduce a novel matrix completion model that makes use of proximity information about rows and columns by assuming they form communities. This assumption makes sense in several real-world problems such as recommender systems, where there are communities of people sharing preferences, while products form clusters that receive similar ratings. Our main goal is thus to find a low-rank solution that is structured by the proximities of rows and columns encoded by graphs. We borrow ideas from manifold learning to constrain our solution to be smooth on these graphs, in order to implicitly enforce row and column proximities. Our matrix recovery model is formulated as a convex non-smooth optimization problem, for which a well-posed iterative scheme is provided. We study and evaluate the proposed matrix completion on synthetic and real data, showing that the proposed structured low-rank recovery model outperforms the standard matrix completion model in many situations.
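
One common way to express "smooth on a graph" is the Dirichlet energy, the weighted sum of squared differences between rows joined by an edge. The sketch below computes it for a small ratings matrix, assuming a row graph that links users in the same community. It illustrates the smoothness notion only; it is not the recovery model from the talk, and the function name, edge list and toy matrices are hypothetical.

```python
def dirichlet_energy(rows, edges):
    """Sum of w * ||rows[i] - rows[j]||^2 over weighted edges (i, j, w).
    Low values mean connected rows are similar, i.e. smooth on the graph."""
    return sum(w * sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
               for i, j, w in edges)

# Hypothetical user graph: users 0-1 and users 2-3 form two communities.
edges = [(0, 1, 1.0), (2, 3, 1.0)]

# Ratings that agree within each community.
smooth = [[5, 1], [5, 1], [1, 5], [1, 5]]
# The same ratings assigned so that neighbours disagree.
rough = [[5, 1], [1, 5], [5, 1], [1, 5]]

e_smooth = dirichlet_energy(smooth, edges)
e_rough = dirichlet_energy(rough, edges)
```

A completion method that penalizes this energy on both the row graph and the column graph would prefer the first matrix over the second when the observed entries fit both equally well.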