Research in machine learning aims to develop computer programs that learn from examples. Instead of relying on careful tuning of parameters by human experts, machine learning techniques use statistical methods to estimate the optimal settings directly from data, and can therefore reach a complexity beyond what human experts could tune by hand.
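A minimal sketch of this idea, in Python: instead of hand-tuning the slope and intercept of a line, they are estimated directly from example pairs by least squares. The data and function names here are purely illustrative and are not part of Torch.

```python
def fit_line(xs, ys):
    """Closed-form least-squares estimate of a and b in y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var           # slope estimated from the examples
    b = mean_y - a * mean_x  # intercept follows from the means
    return a, b

# Examples generated by y = 2x + 1; the "expert knowledge" (a=2, b=1)
# is recovered from the data rather than set by hand.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
```

The same principle scales up: more parameters and more data, but still an estimate fitted to examples rather than a hand-crafted setting.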
Torch is a machine learning library that aims to include state-of-the-art algorithms. Torch5, the latest version, provides a Matlab-like environment for state-of-the-art machine learning algorithms. It is easy to use and very efficient, thanks to a simple and fast scripting language (Lua) backed by an underlying C implementation. It is distributed under a BSD license. Torch3, the previous official version, was written entirely in C++.
Computational efficiency, targeting real-time applications
Very large datasets
Leveraging Unlabeled Data
Hand-labeling data remains an expensive task in many cases. This motivates research into leveraging the cheap and practically unlimited supply of unlabeled speech, text, and images available in the digital world. Classical semi-supervised learning and transduction are machine learning classification techniques that handle both labeled and unlabeled data, under the assumption that each unlabeled example belongs to one of the labeled classes under consideration. Adapting and scaling these methods to real large-scale problems is a challenge we are interested in at IDIAP. We are also investigating other ways to leverage unlabeled data, such as transfer learning, where a fully unsupervised task can learn representations that are useful for a supervised task.
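A hedged sketch of one simple semi-supervised scheme, self-training: a classifier fitted on the labeled set assigns pseudo-labels to unlabeled points, which are then folded back into the training set. The nearest-centroid classifier and the 1-D toy data are invented for illustration only.

```python
def self_train(labeled, unlabeled):
    """labeled: dict class -> list of 1-D points; unlabeled: list of points.
    Returns class centroids refitted on labeled + pseudo-labeled data."""
    labeled = {c: list(pts) for c, pts in labeled.items()}
    # Fit on the labeled data only: one centroid per class.
    centroids = {c: sum(p) / len(p) for c, p in labeled.items()}
    # Pseudo-label each unlabeled point with its nearest class, then
    # treat the pseudo-label as if it were a true label.
    for x in unlabeled:
        c = min(centroids, key=lambda k: abs(x - centroids[k]))
        labeled[c].append(x)
    # Refit using both labeled and pseudo-labeled points.
    return {c: sum(p) / len(p) for c, p in labeled.items()}

centroids = self_train(
    labeled={"low": [0.0, 1.0], "high": [9.0, 10.0]},
    unlabeled=[2.0, 8.0],
)
```

The unlabeled points shift both centroids, which is exactly the benefit, and the risk, of the assumption that every unlabeled example belongs to one of the known classes.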
Real, complex tasks require complex learning models. A wide range of approaches lies between two extremes: (i) use complex features with a simple learning algorithm, or (ii) use simple features with a complex learning algorithm. Deep architectures implement approach (ii) by stacking several layers of data representations with an increasing level of abstraction. Training these representations is extremely challenging, as it involves highly non-linear and non-convex models. We are interested in applications in natural language, image, and speech processing.
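The layer-stacking structure can be sketched in a few lines of Python. The weights below are fixed by hand purely for illustration; in practice they are learned, which is the hard non-convex optimization problem mentioned above.

```python
import math

def layer(inputs, weights, biases):
    """One fully connected layer followed by a tanh non-linearity."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def deep_forward(x, layers):
    """A deep architecture as a stack: each layer's output is the
    next layer's input, giving increasingly abstract representations."""
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# Toy 2 -> 2 -> 1 network with hand-picked weights (illustrative only).
hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1])
output = ([[1.0, 1.0]], [0.0])
y = deep_forward([0.5, -0.5], [hidden, output])
```

Each tanh layer composes a non-linearity on top of the previous one, which is why the overall training objective becomes non-convex as depth grows.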
With fast-growing internet resources, automatically extracting and organizing information from documents is a crucial concern. Our research aims at marrying natural language processing and information retrieval in this context. This requires not only new, fast natural language processing algorithms able to scale to billions of documents, but also new techniques for incorporating semantic knowledge into document-query distances.
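For concreteness, a sketch of a standard document-query distance from information retrieval: TF-IDF weighting combined with cosine similarity. The toy corpus is invented; semantic extensions of the kind discussed above would go beyond this exact term matching.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one TF-IDF weight dict per tokenized document, plus idf."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))       # document frequency
    idf = {t: math.log(n / df[t]) for t in df}          # rare terms weigh more
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs], idf

def cosine(u, v):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["machine", "learning"],
        ["information", "retrieval"],
        ["machine", "translation"]]
vecs, idf = tfidf_vectors(docs)
query = {t: idf.get(t, 0.0) for t in ["machine", "learning"]}
scores = [cosine(query, v) for v in vecs]
```

Pure term matching gives the second document a score of zero even if it were topically related; replacing or supplementing exact matching with semantic similarity is precisely the research direction described above.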