Large neural networks, and more generally so-called "deep models," are currently the most effective technical solution for processing high-dimensional natural signals such as images and videos.
They have been put to use in particular for the general task of "scene understanding," which aims at automatically extracting a semantic description of an image or a video as an arrangement of identified and localized components. This can now be done at a level of performance that seemed unreachable five years ago, opening the way to automating many complex tasks, such as content-based image and video retrieval, event detection, and autonomous driving or flying.
This performance comes with two heavy requirements: first, training the models requires large-scale annotated datasets, and second, both training and inference are computationally extremely demanding, often requiring several million floating-point operations per pixel.
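To make the scale of this cost concrete, the following sketch estimates the per-pixel floating-point operations of a few convolutional layers; the layer shapes are illustrative assumptions, not a description of any particular model used in this project.

```python
# Back-of-the-envelope estimate of per-pixel compute for a convolutional
# network. The layer shapes below are hypothetical, chosen only to
# illustrate how the cost accumulates with depth and channel count.

def conv_flops_per_pixel(kernel, c_in, c_out):
    """FLOPs to compute one output pixel of a conv layer
    (one multiply and one add per weight)."""
    return 2 * kernel * kernel * c_in * c_out

# A small VGG-like stack operating at full input resolution.
layers = [
    (3, 3, 64),     # 3x3 conv, RGB -> 64 channels
    (3, 64, 64),    # 3x3 conv, 64 -> 64
    (3, 64, 128),   # 3x3 conv, 64 -> 128
    (3, 128, 128),  # 3x3 conv, 128 -> 128
]

total = sum(conv_flops_per_pixel(k, ci, co) for k, ci, co in layers)
print(f"~{total / 1e6:.1f} MFLOPs per pixel")  # ~0.5 MFLOPs for only 4 layers
```

Even this four-layer fragment already costs about half a million operations per pixel; a full network with tens of such layers readily reaches the several-million figure cited above.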
The objective of this project is to address both issues in order to improve the performance of autonomous flying and wheeled drones in their context of use.