Tracking in the Wild

People tracking is central to many applications, ranging from surveillance in complex urban environments to behavioral analysis in cluttered workspaces. However, in spite of years of sustained research, existing approaches can still only operate successfully in constrained environments, such as a sports arena, or for restricted subsets of human activities, such as walking along a city street. The goal of this project is to dramatically broaden the scope of current methods so that the resulting algorithms can be "taken into the wild," that is, applied in far more unconstrained and generic settings.

To this end, we will build on the multi-camera multi-target approach we have developed jointly over many years (Fleuret et al., 2008; Berclaz et al., 2011). Recently, we have focused on tracking basketball and soccer players (Ben Shitrit et al., 2011, 2012) and outperformed state-of-the-art approaches. However, this is only true in controlled environments, and our current system cannot operate in cluttered real-world public spaces for the following three reasons. First, it requires multiple cameras, each carefully calibrated, and assumes a planar area of interest in which the only moving objects are the people. Second, it relies on background subtraction, which is sensitive to global illumination changes, and on appearance models that must be learned before the system is used. Third, it cannot leverage sophisticated motion models, either at the individual target level or at the group level.
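To make the second weakness concrete, the following toy sketch (pure Python, with invented pixel values, not the project's actual pipeline) shows why threshold-based background subtraction fails under a global illumination change:

```python
# Toy background subtraction on a "frame" of 8 pixel intensities.
# The background is modelled as an exponential running average.

def update_background(bg, frame, alpha=0.05):
    """Blend the new frame into the background model."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=20):
    """Flag pixels that differ from the background by more than `thresh`."""
    return [abs(f - b) > thresh for f, b in zip(frame, bg)]

bg = [100.0] * 8
bg = update_background(bg, [100] * 8)          # static scene, model is stable

# A person covering two pixels is detected correctly.
frame_person = [100, 100, 180, 180, 100, 100, 100, 100]
print(sum(foreground_mask(bg, frame_person)))  # 2 pixels flagged

# A global illumination change (+40 everywhere) flags every pixel,
# even though nothing in the scene actually moved.
frame_bright = [140] * 8
print(sum(foreground_mask(bg, frame_bright)))  # 8 pixels flagged
```

The second case is exactly the fragility described above: a cloud passing over the sun looks, to this detector, like motion everywhere in the image.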

The objective of WildTrack is to eliminate these weaknesses, which will require the joint expertise of all three partners. The research will be organized around a joint benchmarking platform and will be decomposed into the following three sub-projects.

Sub-project 1 - Environment Modeling and Camera Calibration:

Our existing tracking system relies on camera calibration, a planar ground, and entrances and exits restricted to the edges of the area of interest. It also ignores potential occluders, such as pillars that limit the fields of view of some cameras. This sub-project will rely on Structure-from-Motion (SfM) techniques, combined with object class detections and tracking results, to produce a more refined 3D model of the environment. This includes relaxing the planarity restrictions on the ground geometry and obtaining knowledge about typical trajectories, as well as about probable sources and sinks for the moving objects. We will consider both cases where traditional uncalibrated SfM can be applied, that is, sufficiently many cameras with enough field-of-view overlap, and cases where it cannot, in which information about object classes and probable trajectories becomes all the more important to compensate for the failure of standard SfM.
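The geometric core of such multi-camera reconstruction is triangulation. As a toy illustration (a sketch with invented coordinates, not the sub-project's actual SfM pipeline), the midpoint method recovers a 3D point from two calibrated viewing rays:

```python
# Two-view triangulation by the midpoint method: find the point halfway
# between the closest approach of two viewing rays p + s*d.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate(p1, d1, p2, d2):
    """Midpoint of closest approach between rays p1 + s*d1 and p2 + t*d2."""
    w0 = [x - y for x, y in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b              # zero iff the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [x + s * u for x, u in zip(p1, d1)]
    q2 = [x + t * v for x, v in zip(p2, d2)]
    return [(x + y) / 2 for x, y in zip(q1, q2)]

# Two cameras one metre apart, both seeing a point at (0.5, 0, 2).
print(triangulate([0, 0, 0], [0.5, 0, 2], [1, 0, 0], [-0.5, 0, 2]))
# -> [0.5, 0.0, 2.0]
```

With noisy rays the two closest points no longer coincide, and the midpoint is a simple compromise; a full SfM system would instead optimize reprojection error over many views.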

Sub-project 2 - Large-Scale Learning for Detection and Recognition:

Our current implementation relies on background subtraction to detect humans and on crude color-based appearance models to disambiguate difficult tracking situations, neither of which is robust to changes in imaging conditions. This sub-project will tackle this weakness by learning from large training sets. We will first collect videos with multiple cameras and keep the trajectories for which we have high prediction confidence. Data gathered along these trajectories will be used to train predictors to detect moving objects visible in a limited number of views and potentially corrupted by noise. It will also allow the use of transfer learning, making it possible to learn someone's appearance from a handful of exemplar images.
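A minimal version of learning appearance from a handful of exemplars can be sketched as a nearest-centroid model over color histograms (all values here are invented; the sub-project's actual models would be far richer):

```python
# Toy appearance model: average the intensity histograms of a few exemplar
# crops of one person, then score new detections by histogram distance.

def histogram(pixels, bins=4, top=256):
    """Coarse intensity histogram, normalised to sum to 1."""
    h = [0] * bins
    for p in pixels:
        h[min(p * bins // top, bins - 1)] += 1
    total = sum(h)
    return [c / total for c in h]

def mean_histogram(exemplars):
    hists = [histogram(e) for e in exemplars]
    return [sum(col) / len(hists) for col in zip(*hists)]

def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Three exemplar crops of the same person (dark clothing).
exemplars = [[20, 30, 25, 40], [25, 35, 30, 45], [30, 20, 35, 50]]
model = mean_histogram(exemplars)

# New detections: a dark crop matches the model better than a bright one.
dark = [22, 33, 28, 41]
bright = [200, 210, 220, 230]
print(l1_distance(model, histogram(dark))
      < l1_distance(model, histogram(bright)))  # True
```

The point of the transfer-learning work is precisely to do better than such hand-crafted features: a representation pre-trained on large datasets would let a few exemplars specialize it to one individual.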

Sub-project 3 - Convex High-Dimensional Tracking:

At present, we characterize people solely by their 2D ground positions, thus ignoring their 3D poses and their interactions with each other and with inanimate objects. This is sufficient when tracking pedestrians whose range of motion is small, but it is limiting when dealing with more complex behaviors, such as those of people sitting, standing, or reclining in the course of their daily lives. Removing these limitations will require performing our multi-target tracking in much higher-dimensional state spaces than the ones we have worked with so far, and will be the focus of this sub-project.

These objectives require in-depth expertise in many core areas of Computer Vision and Machine Learning, which the three research groups involved in this project possess collectively. As a result, the project will improve the state of the art and yield a people-tracking approach that can truly be deployed in the wild.
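To illustrate what tracking over a discretized state space means in its simplest form, the following sketch optimizes one trajectory over a 1D grid of ground cells by dynamic programming. All scores are invented; the actual formulations cited above (Berclaz et al., 2011) solve a convex linear-programming relaxation jointly for many targets on a 2D grid, which is what must be scaled to much higher-dimensional states here.

```python
# Toy global trajectory optimisation on a discretised 1-D ground line.

def best_track(scores, move_cost=0.5):
    """scores[t][x]: detector evidence for a target in cell x at time t.
    Returns the trajectory maximising total evidence minus motion cost,
    with moves limited to adjacent cells per time step."""
    T, X = len(scores), len(scores[0])
    val = [scores[0][:]]                      # best score ending at (t, x)
    back = []
    for t in range(1, T):
        row, brow = [], []
        for x in range(X):
            prev = [(val[t - 1][p] - move_cost * abs(x - p), p)
                    for p in range(X) if abs(x - p) <= 1]
            b, p = max(prev)
            row.append(b + scores[t][x])
            brow.append(p)
        val.append(row)
        back.append(brow)
    # Backtrack from the best final cell.
    x = max(range(X), key=lambda i: val[-1][i])
    track = [x]
    for t in range(T - 2, -1, -1):
        x = back[t][x]
        track.append(x)
    return track[::-1]

# Evidence for a target drifting right, with one noisy frame at t = 2.
scores = [
    [5, 1, 0, 0],
    [1, 5, 0, 0],
    [0, 0, 1, 5],   # spurious strong response far from the true target
    [0, 1, 5, 1],
]
print(best_track(scores))  # -> [0, 1, 2, 2]
```

The motion-cost term makes the optimizer ignore the spurious detection at t = 2, which is the benefit of solving for whole trajectories globally rather than frame by frame.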

Discipline: Machine Learning
Institution: Eidgenössische Technische Hochschule Zürich
Funding: Swiss National Science Foundation
Start date: Jan 01, 2014
End date: Dec 31, 2017