
Machine Learning

Research in machine learning aims at developing computer programs able to learn from examples. Instead of relying on careful parameter tuning by human experts, machine learning techniques use statistical methods to estimate the optimal setting directly, which can therefore reach a complexity beyond what human experts could achieve by hand.



Torch is a machine learning library that aims to include state-of-the-art algorithms. Torch5 is the latest version of Torch. It provides a Matlab-like environment for state-of-the-art machine learning algorithms. It is easy to use and efficient, thanks to a simple and fast scripting language (Lua) and an underlying C implementation. It is distributed under a BSD license. Torch3, the previous official version, was written entirely in C++.
Contact: Ronan Collobert


Statistical ML

Learning techniques can be organized into two broad families. The first comprises generative models based on classical statistical modeling. They allow experts to easily combine different modalities together with additional sources of information, such as known temporal or spatial consistency. The second comprises discriminative methods, which make predictions directly without requiring a full model of the data at hand. Such strategies are based on artificial neural networks, kernel techniques, and ensemble methods that jointly train and combine multiple predictors.

Contact: François Fleuret, Ronan Collobert


Computational efficiency, targeting real-time applications

Many applications of machine learning, such as biometrics, spam filtering or speech recognition, require predictions to be made under strict computational constraints. Such constraints can be met by using lazy approaches, which restrict the computation adaptively and invest more resources in difficult data, or by simplifying prediction methods through feature selection and approximations.
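
As an illustration of the lazy idea, the sketch below chains a cheap scorer and a costlier one and stops as soon as a stage rejects the input, so easy examples consume little computation. The scorers, costs and thresholds are hypothetical values chosen for the example, not a description of any particular Idiap system.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (scorer, cost, rejection threshold). All three are
# hypothetical placeholders chosen for this illustration.
Stage = Tuple[Callable[[Sequence[float]], float], int, float]


def cascade_predict(x: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Run the stages in order and stop at the first rejection.

    Returns (accepted, total cost spent on this example).
    """
    spent = 0
    for scorer, cost, threshold in stages:
        spent += cost
        if scorer(x) < threshold:
            return False, spent  # early exit: later stages are never run
    return True, spent


def cheap_score(x: Sequence[float]) -> float:
    # Looks at a single feature only.
    return x[0]


def costly_score(x: Sequence[float]) -> float:
    # Looks at every feature.
    return sum(x) / len(x)


STAGES: List[Stage] = [(cheap_score, 1, 0.2), (costly_score, 10, 0.5)]

easy_reject = cascade_predict([0.1, 0.9, 0.9], STAGES)   # rejected by the cheap stage
hard_reject = cascade_predict([0.3, 0.1, 0.2], STAGES)   # needs the costly stage
accepted = cascade_predict([0.9, 0.8, 0.7], STAGES)      # passes both stages
```

Ordering the stages from cheapest to most expensive is what makes the average cost low when most inputs are easy to reject.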

Contact: François Fleuret


Very large datasets

The sustained increase in storage and computation capabilities has driven the development of new learning techniques able to cope with immense datasets. While it has been demonstrated both experimentally and theoretically that larger training sets lead to better performance, specific techniques have to be developed both to cope with the limited memory of a computer relative to the overall dataset, and to exploit the potential statistical complexity of a very large number of examples.
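
One common way to train without holding the dataset in memory is stochastic gradient descent over a stream of examples. The sketch below is a minimal illustration under stated assumptions: the generator `stream_examples` is a hypothetical stand-in for reading records from disk, and the target is a noiseless linear function invented for the example.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Example = Tuple[List[float], float]


def stream_examples(n: int, seed: int = 0) -> Iterator[Example]:
    """Yield synthetic (x, y) pairs one at a time.

    Hypothetical stand-in for a reader that pulls records from disk;
    only one example is ever held in memory.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = 2.0 * x[0] - 3.0 * x[1] + 0.5  # assumed noiseless target
        yield x, y


def sgd_linear(examples: Iterable[Example], dim: int, lr: float = 0.05):
    """Fit y ~ w.x + b with one SGD step per example (squared loss)."""
    w = [0.0] * dim
    b = 0.0
    for x, y in examples:
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err
    return w, b


w, b = sgd_linear(stream_examples(20000), dim=2)
```

Memory use depends only on the number of parameters, not on the number of examples, which is the property that matters when the dataset exceeds available memory.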

Contact: Ronan Collobert, François Fleuret


Leveraging Unlabeled Data

Hand-labeling data remains an expensive task in many cases. This motivates research into leveraging the cheap and virtually unlimited supply of unlabeled speech, text and images available in the digital world. Classical semi-supervised learning and transduction are classification techniques able to handle both labeled and unlabeled data; they assume that each unlabeled example belongs to one of the labeled classes under consideration. Finding ways to adapt and scale these methods to real large-scale problems is a challenge we are interested in at Idiap. We are also investigating other ways to leverage unlabeled data, such as transfer learning (a fully unsupervised task can learn representations that are useful for a supervised task).
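
To make the semi-supervised setting concrete, here is a minimal sketch of self-training with a nearest-centroid classifier on one-dimensional data. Self-training is used here only as a simple, well-known example of the family; the data and function names are invented for the illustration. It relies on the assumption stated above, namely that every unlabeled point belongs to one of the labeled classes.

```python
from typing import Dict, List


def centroid(points: List[float]) -> float:
    return sum(points) / len(points)


def self_train(labeled: Dict[str, List[float]],
               unlabeled: List[float],
               rounds: int = 5) -> Dict[str, float]:
    """Refine class centroids using pseudo-labels on unlabeled points."""
    # Start from the centroids of the few labeled points.
    centroids = {label: centroid(pts) for label, pts in labeled.items()}
    for _ in range(rounds):
        # Pseudo-label every unlabeled point with its nearest centroid.
        pseudo: Dict[str, List[float]] = {label: [] for label in labeled}
        for x in unlabeled:
            nearest = min(centroids, key=lambda lab: abs(x - centroids[lab]))
            pseudo[nearest].append(x)
        # Re-estimate centroids from labeled plus pseudo-labeled points.
        centroids = {lab: centroid(labeled[lab] + pseudo[lab]) for lab in labeled}
    return centroids


def predict(centroids: Dict[str, float], x: float) -> str:
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))


# Toy data (hypothetical): one labeled point per class, six unlabeled points.
labeled = {"a": [0.0], "b": [10.0]}
unlabeled = [1.0, 2.0, 3.0, 8.0, 9.0, 11.0]
centroids = self_train(labeled, unlabeled)
```

With a single labeled point per class, the unlabeled points shift each centroid toward the bulk of its cluster, which is the kind of benefit unlabeled data is expected to bring.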

Contact: Ronan Collobert

Deep Learning

Complex real-world tasks require complex learning models. A wide range of approaches can be considered between two extremes: (i) use complex features and a simple learning algorithm, or (ii) use simple features and a complex learning algorithm. Deep architectures implement approach (ii) by stacking several layers of data representations with increasing levels of abstraction. Training these representations is extremely challenging, as it involves training highly non-linear and non-convex models. We are interested in applications in natural language, image and speech processing.

Contact: Ronan Collobert

Information Organization and Retrieval

With fast-growing internet resources, automatic information extraction and information organization from documents is a crucial concern. Our research aims at marrying natural language processing and information retrieval in this context. It requires not only new, fast natural language processing algorithms able to scale to billions of documents, but also new techniques for incorporating semantic knowledge into document-query distances.

Contact: Ronan Collobert


Online learning

Online learning is the process by which a cognitive system learns continuously from experience, updating and enriching its internal models of the environment. This learning mechanism is the main reason why cognitive systems can react to novel stimuli in a robust yet flexible way. Realistic domains are highly dynamic, and any autonomous system interacting with them, such as a robot, must be able to adapt to a number of changing parameters. For instance, indoor visual place recognition for robot localization suffers from the natural variability of environments over time (varying illumination conditions, objects moved around because of daily use, rooms redecorated); grasping by imitation requires the ability to recognize a human subject's "style" and to adapt to different objects' shapes and affordances. Here at Idiap we work on developing online learning algorithms able to adapt to novel incoming data while achieving optimal performance and keeping memory and computational complexity under control.
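
The classic perceptron is perhaps the simplest example of an online learner: it sees one example at a time, updates only on mistakes, and stores nothing but a weight vector. The sketch below is a textbook illustration with invented toy data, not one of the algorithms developed at Idiap.

```python
from typing import List, Sequence


class OnlinePerceptron:
    """Textbook perceptron with labels in {-1, +1}.

    Memory is constant in the number of examples seen: only the
    weights, the bias and a mistake counter are kept.
    """

    def __init__(self, dim: int) -> None:
        self.w: List[float] = [0.0] * dim
        self.b: float = 0.0
        self.mistakes: int = 0

    def predict(self, x: Sequence[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def update(self, x: Sequence[float], y: int) -> None:
        # Learn from one incoming example; change the model only on a mistake.
        if self.predict(x) != y:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.b += y
            self.mistakes += 1


# Toy stream (hypothetical): four linearly separable points, seen three times.
points = [([2.0, 0.0], 1), ([-2.0, 0.0], -1), ([1.0, 1.0], 1), ([-1.0, -1.0], -1)]
model = OnlinePerceptron(dim=2)
for _ in range(3):
    for x, y in points:
        model.update(x, y)

predictions = [model.predict(x) for x, _ in points]
```

Because each update touches only the current example, the same loop works unchanged on an unbounded stream of incoming data.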

Contact: Barbara Caputo
