# Swiss Machine Learning Day 2013

The workshop took place at EPFL, room BC 420, on Wednesday Nov 13th, 2013.

- 9:30-10:00: Hypothesis Transfer Learning (Ilja Kuzborskij, slides)
- 10:00-10:30: Support vector algorithms for gradient observations (Ashwini Shukla, slides)
- 10:30-11:00: Non-linear Sparse Subspace Clustering (Hua Gao)
- 11:00-11:15: Coffee break
- 11:15-11:45: Between Online and Offline Ensemble Learning (Leonidas Lefakis, slides)
- 11:45-12:15: Non-Linear Domain Adaptation with Boosting (Carlos Becker, slides)
- 12:15-14:00: Lunch break
- 14:00-14:30: Scalable Bayesian Inference for Preference and Discrete Choice Models (Young-Jun Ko, slides)
- 14:30-15:00: Learning Aspects and Ratings with Multiple-Instance Regression from Text (Nikolaos Pappas, slides)
- 15:00-15:30: Toward End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks (Dimitri Palaz, slides)
- 15:30-15:45: Coffee break
- 15:45-16:15: Launch Hard or Go Home! Predicting the Success of Kickstarter Campaigns (Vincent Etter, slides)
- 16:15-16:45: Bob: A Free Library for Reproducible Machine Learning (André Anjos and Laurent El-Shafey, slides)

## Abstracts

### Hypothesis Transfer Learning

Speaker: Ilja Kuzborskij

A standard assumption in supervised machine learning is that training and testing samples are drawn from the same probability density. When one believes that this assumption has been violated, one might resort to Domain Adaptation methods, designed to find a hypothesis that performs well on the target domain (testing) but is induced from the source domain (training). However, given a large or expanding source domain, most Domain Adaptation algorithms become computationally impractical.

We approach this problem through the supervised transfer learning scenario, where the learner does not have access to the source domain directly, but rather operates on the basis of the hypotheses induced from it: the Hypothesis Transfer Learning (HTL) problem. We study generalization guarantees of Regularized Least-Squares under such a scenario, and present theory that explains its remarkable success, which has been observed empirically in the past. As a motivating example, we also propose a class-incremental multiclass HTL algorithm that successfully leverages the source hypotheses in the task of visual object categorization.
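As a rough illustration of how Regularized Least-Squares can use a source hypothesis without touching the source data, the sketch below regularizes the target weights towards a given source weight vector. This biased-regularization form is one common way to set up HTL and is an assumption here; the abstract does not state the exact objective used in the talk. The names `biased_rls`, `w_src` and `lam` are hypothetical.

```python
import numpy as np


def biased_rls(X, y, w_src, lam):
    """Solve min_w ||X w - y||^2 + lam * ||w - w_src||^2 in closed form.

    Assumed formulation: the source hypothesis enters only through w_src.
    """
    d = X.shape[1]
    # Substituting v = w - w_src turns the problem into ordinary ridge
    # regression on the residuals left by the source hypothesis.
    residual = y - X @ w_src
    v = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ residual)
    return w_src + v


# Toy data: a small target sample and a source hypothesis close to the truth.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
w_src = w_true + 0.1  # hypothetical source hypothesis
X = rng.normal(size=(5, 3))
y = X @ w_true

w_small = biased_rls(X, y, w_src, lam=1e-8)  # trusts the target data
w_large = biased_rls(X, y, w_src, lam=1e8)   # stays at the source hypothesis
```

With a very small `lam` the solution follows the target data, and with a very large `lam` it stays at the source hypothesis; intermediate values trade off the two.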

### Support vector algorithms for gradient observations

Speaker: Ashwini Shukla

Support Vector Machines (SVM) have been used extensively for function approximation when function values (labels in classification, scalars in regression) at specific inputs are available for training. In this work, we re-formulate the classical SVM with gradient constraints using the KKT machinery and Lagrange duality and show that the resulting optimization retains many desirable properties of the classical SVM. We apply this learning scheme to a synthetic example to highlight the ability of our model to leverage the gradient information to statistically improve the testing error. We also present two real-world applications: (a) implicit surface reconstruction, where the gradient information appears in the form of surface normals, and (b) modeling of multiple-attractor dynamical systems, where the gradient information appears in the form of desired velocities. These examples demonstrate the range of applications that can be addressed within our framework.

### Bob: A Free Library for Reproducible Machine Learning

Speakers: André Anjos and Laurent El-Shafey

Bob (http://www.idiap.ch/software/bob) is a free machine learning and signal processing library.

It is a collaborative, easy-to-use and extensible toolbox that provides both efficient implementations of several machine learning algorithms (Multi-Layer Perceptrons, Support Vector Machines, K-means, Gaussian Mixture Modeling, Joint Factor Analysis, I-Vectors, Probabilistic Linear Discriminant Analysis, ...) and a framework that helps researchers publish reproducible research, thanks to its concept of satellite packages.

### Launch Hard or Go Home! Predicting the Success of Kickstarter Campaigns

Speaker: Vincent Etter

Crowdfunding websites such as Kickstarter are becoming increasingly popular, allowing project creators to raise hundreds of millions of dollars every year. However, only one out of two Kickstarter campaigns reaches its funding goal and is successful. It is therefore of prime importance, both for project creators and backers, to be able to know which campaigns are likely to succeed.

We propose a method for predicting the success of Kickstarter campaigns by using both direct information and social features. We introduce a first set of predictors that uses the time series of money pledges to classify campaigns as probable success or failure and a second set that uses information gathered from tweets and Kickstarter's projects/backers graph. We show that even though the predictors that are based solely on the amount of money pledged reach a high accuracy, combining them with predictors using social features enables us to improve the performance significantly. In particular, only 4 hours after the launch of a campaign, the combined predictor reaches an accuracy of more than 76% (a relative improvement of 4%).

### Between Online and Offline Ensemble Learning

Speaker: Leonidas Lefakis

We propose to train an ensemble with the help of a reservoir in which the learning algorithm can store a limited number of samples. This novel approach can be seen as lying between offline and online ensemble approaches, either as a restriction of the former or as an enhancement of the latter. We first identify some basic strategies that can be used to populate this reservoir. Our main contribution is a more sophisticated method, dubbed Greedy Edge Expectation Maximization (GEEM), that efficiently maintains the reservoir content by viewing the samples through their projections into the weak classifier response space. We propose an efficient algorithmic implementation which makes it tractable in practice, and demonstrate its efficiency experimentally on several computer-vision datasets, on which it outperforms existing state-of-the-art methods.
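To make the reservoir setting concrete, the sketch below fills a fixed-capacity reservoir from a stream using classical uniform reservoir sampling. This is only an assumed example of what a basic population strategy could look like; the abstract does not list the basic strategies, and this is not GEEM. The function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, capacity, seed=None):
    """Keep a uniform random subset of at most `capacity` samples from a stream."""
    rng = random.Random(seed)
    reservoir = []
    for n, sample in enumerate(stream, start=1):
        if len(reservoir) < capacity:
            # The reservoir is not full yet: store every incoming sample.
            reservoir.append(sample)
        else:
            # Keep the n-th sample with probability capacity / n by drawing
            # a slot in [0, n) and overwriting it only if it is in range.
            j = rng.randrange(n)
            if j < capacity:
                reservoir[j] = sample
    return reservoir


# Example: keep 10 samples out of a stream of 1000.
picked = reservoir_sample(range(1000), capacity=10, seed=0)
```

An ensemble learner in this setting would train its weak classifiers on the reservoir content only, which bounds memory use regardless of the stream length.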

### Learning Aspects and Ratings with Multiple-Instance Regression from Text

Speaker: Nikolaos Pappas

A great number of online user-generated texts (e.g. reviews, comments) are accompanied by numerical ratings: either a single overall score, a set of aspect scores, or both. Understanding the aspects and the attitudes of the users towards them (in the form of ratings) may help to better explain and model their preferences. We cast the problem of learning the aspects and ratings from text as a multiple-instance regression (MIR) problem. MIR is a variant of multiple regression in which each data point (bag of instances) may be described by more than one vector of values (instances) for the independent variables. The MIR formulation is a natural fit for the class of NLP problems where the labels given by annotators are usually 'soft', because texts can be decomposed into finer representations (documents into paragraphs, paragraphs into sentences). So far, this perspective has not yet been explored.

We solve the learning problem under the hypothesis that not all instances are equally responsible for the prediction of the bag labels. We propose an alternating projections method which is able to predict the instance-level labels, while also being able to predict the labels of unlabelled bags. The method has potential applications to opinion summarization, text segmentation, sentiment analysis and influence detection.

### Toward End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks

Speaker: Dimitri Palaz

Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers fed with highly tuned features, such as MFCC or PLP features. Recent advances in "deep learning" approaches have called such systems into question, but while some attempts have been made with simpler features such as spectrograms, state-of-the-art systems still rely on MFCCs. This might be viewed as a kind of failure of deep learning approaches, which are often claimed to be able to train on raw signals, alleviating the need for hand-crafted features. In this paper, we investigate a convolutional neural network approach for speech signals. While convolutional architectures have had tremendous success in computer vision and text processing, they seem to have been neglected in recent years in the speech processing field. We show that it is possible to learn an end-to-end phoneme classifier system directly from the raw signal, with performance on the TIMIT and WSJ datasets similar to that of existing MFCC-based systems, questioning the need for complex features on large datasets. In addition, convolutional architectures are shown to improve the performance of MFCC-based systems.

### Non-Linear Domain Adaptation with Boosting

Speaker: Carlos Becker

A common assumption in machine vision is that the training and test samples are drawn from the same distribution. However, there are many problems where this assumption is grossly violated, as in bio-medical applications where different acquisitions can generate drastic variations in the appearance of the data due to changing experimental conditions. This problem is accentuated with 3D data, for which annotation is very time-consuming, limiting the amount of data that can be labeled in new acquisitions for training. In this talk we present a multi-task learning algorithm for domain adaptation based on boosting. Unlike previous approaches that learn task-specific decision boundaries, our method learns a single decision boundary in a shared feature space, common to all tasks. We use the boosting-trick to learn a non-linear mapping of the observations in each task, with no need for specific a priori knowledge of its global analytical form. This yields a more parameter-free domain adaptation approach that successfully leverages learning on new tasks where labeled data is scarce. We evaluate our approach on two challenging bio-medical datasets and achieve a significant improvement over the state of the art.

### Scalable Bayesian Inference for Preference and Discrete Choice Models

Speaker: Young Jun Ko

An increasing number of datasets in economics, e-commerce, collaborative filtering or e-learning are collected in order to elicit people's preferences and to understand how they make choices. A general approach is to model each user's utility across items of choice. A challenge in this context is how to transfer information between different users, which is essential in order to cope with the high degree of sparsity in real-world choice datasets. Borrowing from matrix factorization and Gaussian process preference learning, we propose a novel utility model which combines latent factors with nonparametric processes. While expressive, our model admits an efficient inference algorithm, whose expectation-maximization structure allows for parallel computation in the map-reduce framework.

### Non-linear Sparse Subspace Clustering

Speaker: Hua Gao

In many real-world problems, we have to deal with high-dimensional data which lie on low-dimensional subspaces corresponding to different classes in the data. The clustering of a union of linear subspaces has received great attention in recent years and many algorithms have been proposed. However, the assumption of a union of linear subspaces is very restrictive and often violated in real cases, where data live on non-linear manifolds: faces under various poses, various expressions, etc.

We propose a non-linear subspace clustering algorithm that builds on the framework of Sparse Subspace Clustering (SSC). The algorithm performs the sparse optimization in a Hilbert space to recover the coefficients, which then enable spectral clustering. Like SSC, it handles noise, outliers and missing data. Experiments on different real-world datasets show the effectiveness of the proposed algorithm.
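For orientation, the linear SSC framework referred to above is usually written as a sparse self-representation problem followed by spectral clustering on the resulting affinity. The first two lines below are the standard noise-free linear formulation. The last line is only an assumed sketch of how the data-fit term could look in a Hilbert space with feature map $\Phi$ and kernel matrix $K$; the abstract does not give the exact objective, and the symbols $\Phi$, $K$ and $\lambda$ are introduced here for illustration.

```latex
% Linear SSC: express each column of X as a sparse combination of the others.
\min_{C} \; \|C\|_1 \quad \text{s.t.} \quad X = XC, \;\; \operatorname{diag}(C) = 0
% Affinity used for spectral clustering.
W = |C| + |C|^{\top}
% Assumed kernelized variant, with K = \Phi(X)^{\top}\Phi(X):
\min_{C} \; \|C\|_1 + \frac{\lambda}{2}\,
  \operatorname{tr}\!\left(K - 2KC + C^{\top} K C\right)
  \quad \text{s.t.} \quad \operatorname{diag}(C) = 0
```

The trace term equals $\|\Phi(X) - \Phi(X)C\|_F^2$, so the optimization only needs the kernel matrix and never the explicit feature map.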

Please contact François Fleuret and Matthias Seeger for more information.