Code

This page lists software projects that I have worked on, mostly research or machine learning related. The order is arbitrary, with a hint of date-based sorting.

For my latest deep learning research I use Keras because I like choice: Keras can be used both as an abstraction layer over Theano, TensorFlow and CNTK and as a symbolic graph library for building arbitrarily complex neural networks.

Importance Sampling

PyPI - Github - Docs

Deep learning models spend countless GPU/CPU cycles on trivial, correctly classified examples that do not individually affect the parameters; keras-importance-sampling is a Python library that uses importance sampling to accelerate the training of arbitrary neural networks created with Keras.

The code was developed to support our 2018 ICML publication Not All Samples Are Created Equal: Deep Learning with Importance Sampling.
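
As a rough illustration of the idea, and not of the library's actual API, the sketch below draws a mini-batch with probability proportional to a per-sample score and attaches the weights that keep the weighted batch average unbiased. The function name `importance_sample` and the use of the per-sample loss as the score are assumptions made for this example only.

```python
import random


def importance_sample(scores, batch_size, rng):
    """Sample indices proportionally to `scores` and return (indices, weights).

    Hypothetical helper for illustration; `scores` stands in for any
    non-negative per-sample importance (here: the per-sample loss).
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    n = len(scores)
    probs = [s / total for s in scores]
    # Samples with a high score are drawn more often.
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # The weight 1 / (N * p_i) undoes the sampling bias, so the weighted
    # batch mean is an unbiased estimate of the mean over all samples.
    weights = [1.0 / (n * probs[i]) for i in indices]
    return indices, weights


# Usage: three easy samples with a small loss and two hard ones.
losses = [0.01, 0.02, 2.0, 0.05, 1.5]
indices, weights = importance_sample(losses, batch_size=4, rng=random.Random(0))
estimate = sum(w * losses[i] for i, w in zip(indices, weights)) / len(indices)
```

In this toy case the scores are exactly proportional to the quantity being averaged, so every weighted term equals the true mean loss and the estimate has no variance. In real training the scores are only an approximation of each sample's importance, and the weights multiply the per-sample loss before the gradient step.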

Transparent Keras

PyPI - Github

Transparent Keras aims to provide a very simple way to look under the hood during training of Keras models by defining an extra set of outputs that will be returned by train_on_batch or test_on_batch.

Local Feature Aggregation

PyPI - Github

A library that implements methods to aggregate local features (mainly for multimedia) into a single global feature that can be used easily with any classifier.

The library provides scikit-learn BaseEstimators for BOW, VLAD and Fisher Vectors and was used in our 2017 publication Learning Local Feature Aggregation Functions with Backpropagation.
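
To make the aggregation idea concrete, here is a minimal bag-of-words sketch in plain Python. It is not the library's implementation: the function name `bag_of_words` is hypothetical, and the codebook is fixed by hand, whereas in practice it would be learned from data (for instance with k-means).

```python
def bag_of_words(local_features, codebook):
    """Aggregate any number of local features into one fixed-length histogram.

    Hypothetical helper for illustration: each local feature votes for its
    nearest codeword, and the votes are normalized to sum to one.
    """
    histogram = [0.0] * len(codebook)
    for feature in local_features:
        # Index of the nearest codeword by squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(feature, codebook[k])),
        )
        histogram[nearest] += 1.0
    total = sum(histogram)
    return [h / total for h in histogram] if total else histogram


# Usage: three 2-D local features and a hand-picked codebook of two words.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, 0.0], [0.9, 1.0], [1.2, 0.8]]
global_feature = bag_of_words(features, codebook)
```

Whatever the number of local features, the output has one entry per codeword, which is what lets the result be passed to any classifier that expects fixed-length input.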

LDA++

Homepage - Research page

LDA++ is a C++ library and a set of accompanying console applications that enable the inference of various Latent Dirichlet Allocation models.

It was used in the 2016 ACM-MM publication of Fast Supervised LDA, and on the research page we provide data and instructions to reproduce our results.

NlpTools (PHP)

Homepage - packagist

NlpTools is a fairly old NLP library written in PHP. It implements a lot of general-purpose machine learning components, such as classifiers and clustering algorithms, in a way that optimizes for readability and extensibility.

Although it is probably not the best choice for deployment on a heavily used website, it is widely used, with more than 65K downloads on Packagist and more than 400 stars on GitHub.