Computational Reduction for Training and Inference

This project is a follow-up to the ISUL project. It will fund the fourth year of two ongoing PhD theses and open a new sub-project on a very promising topic that emerged from the research we have conducted but is too rich to be tackled within the two theses already under way. The ISUL project aimed at developing novel machine-learning algorithms to address two fundamental issues with modern techniques: their need for very large data corpora and their need for heavy computation. We have developed a series of methods that transfer structures from an existing network to facilitate the training of a new one, on a different task, for which few data examples are available. Our approaches rely on mimicking the behavior of the existing network not only point-wise, but also in terms of local changes. In parallel, we have developed techniques that reduce the computational cost of training and inference by relying heavily on sampling to approximate dense weighted averaging. We structure this new proposal in three sub-projects.

The first sub-project will continue our work on transfer learning, first by improving the optimization itself, as we observed that the complexity of the underlying optimization problem is key. Additionally, we will consider using deep generative models to produce synthetic data capturing the joint distribution of the signal components. Their use can be seen as a Monte Carlo generalization, to arbitrary order, of our approaches based on first-order derivatives.

The second sub-project will extend our line of research on sampling for gradient descent and inference. We have recently investigated the use of sampling during inference, and shown that end-to-end gradient-based learning can be generalized to such a context. Our current algorithm relies on sampling an image at a fixed scale to reject poorly informative parts, and does not take into account that different scales may lead to different statistics. Accounting for multiple scales is our next step. From there, we envision a generalization to sampling the model itself, considering parts of the model and parts of the signal together and sampling along both axes jointly. This can be seen as a data-driven adaptive dropout, which modulates the computation required for a given level of accuracy.

Finally, the third sub-project will initiate a new line of work whose objective is to combine model selection and training into a unified forward generation of a model, avoiding both the costly back-propagation of the gradient and a grid search over meta-parameters. The key motivation behind this new direction is the view of a deep model as a progressive refinement of an internal representation, combined with methods based on information theory that provide criteria to assess whether the change occurring at a certain level of an architecture is beneficial to the overall task at hand. Our objective is to leverage these tools and explicitly reformulate the training of a model as the progressive design of a topological deformation of the feature space in low dimension, to avoid back-propagation and gradient descent.
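The idea of using sampling to approximate a dense weighted average can be illustrated with a minimal sketch. This is a generic Monte Carlo estimator, not the algorithm developed in the project; the function names, the synthetic data, and the sample count are assumptions chosen for illustration only.

```python
import random


def dense_weighted_average(values, weights):
    """Exact weighted average: every element is visited."""
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def sampled_weighted_average(values, weights, num_samples, rng):
    """Monte Carlo estimate of the same quantity.

    Indices are drawn with probability proportional to their weight,
    so the plain mean of the sampled values is an unbiased estimate
    of the weighted average. Only num_samples values are read.
    """
    picks = rng.choices(range(len(values)), weights=weights, k=num_samples)
    return sum(values[i] for i in picks) / num_samples


# Hypothetical data: 10,000 values with highly non-uniform weights.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
weights = [rng.random() ** 4 for _ in range(10_000)]

exact = dense_weighted_average(values, weights)
estimate = sampled_weighted_average(values, weights, 500, rng)
print(f"exact={exact:.4f} estimate={estimate:.4f}")
```

In this toy setting the values are cheap to read, so nothing is actually saved. The estimator becomes worthwhile when each value is itself expensive to compute, for instance when it requires running part of a model on part of a signal, since only the sampled entries then need to be evaluated.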
Swiss National Science Foundation
Mar 01, 2020
Feb 28, 2022