The following MUCATAR-related demos are available for public viewing:


1. HEAD POSE TRACKING

Head pose tracking is a challenging task.  Our methodology jointly performs head tracking and pose estimation
using a mixed-state particle filter framework.  The demos show sample results of the algorithm presented in [4].  In
the demos, the green box represents the estimated head location and the green arrow gives the estimated direction
in which the head is pointing.  A minimal sketch of the mixed-state idea follows the demo files below.

AgnesTextCol.avi
OlivierTextCol.avi
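
To picture the mixed-state idea, the Python sketch below keeps a continuous head
location per particle together with a discrete pose label, and predicts and reweights
both jointly.  This is a minimal illustration, not the implementation from [4]; the
dynamics, pose set, and likelihood interface are all assumptions.

    import numpy as np

    N = 200                                    # number of particles
    POSES = np.arange(8)                       # hypothetical discrete pose labels

    def predict(particles):
        # Continuous part: Gaussian random walk on (x, y, scale).
        particles["xys"] += np.random.normal(0.0, 2.0, particles["xys"].shape)
        # Discrete part: switch the pose label with small probability.
        switch = np.random.rand(N) < 0.1
        particles["pose"][switch] = np.random.choice(POSES, switch.sum())
        return particles

    def update(particles, likelihood):
        # Weight each particle by an appearance likelihood that depends on
        # both the hypothesized location and the hypothesized pose label.
        w = np.array([likelihood(xy, p) for xy, p
                      in zip(particles["xys"], particles["pose"])])
        w /= w.sum()
        keep = np.random.choice(N, N, p=w)     # multinomial resampling
        return {"xys": particles["xys"][keep], "pose": particles["pose"][keep]}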



2. SPEECH ACQUISITION IN MEETINGS WITH AN AUDIO-VISUAL SENSOR ARRAY

Sample results of our approach to speech acquisition in meetings with an audio-visual sensor array are available at:
http://www.idiap.ch/~gatica/icme05.html

3. AUDIO-VISUAL PROBABILISTIC TRACKING OF MULTIPLE SPEAKERS IN MEETINGS

Tracking speakers in multiparty conversations constitutes a fundamental task for automatic meeting analysis. In this work, we present a novel probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state-space, which includes the explicit definition of a proximity-based interaction model. Approximate inference in our model, needed given its complexity, is performed with a Markov chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. Our framework integrates audio-visual (AV) data through a novel observation model. Audio observations are derived from a source localization algorithm. Visual observations are based on models of the shape and spatial structure of human heads. We present results (based on an objective evaluation procedure) that show that our framework (1) is capable of locating and tracking the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy; (2) can deal with cases of visual clutter and occlusion; and (3) significantly outperforms a traditional sampling-based approach.

http://www.idiap.ch/~gatica/av-tracking-multiperson.html
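
As a rough illustration of the MCMC particle filtering idea described above, the
Python sketch below performs one Metropolis-Hastings sweep over a joint multi-person
state, moving one person at a time so the chain samples the joint configuration
efficiently.  The state layout and the log_post function are hypothetical interfaces;
this is not the full model of the paper, which also tracks speaking activity.

    import copy, math, random

    def mcmc_sweep(state, log_post, n_steps=500, sigma=3.0):
        # state: dict person_id -> (x, y).  log_post: unnormalized log
        # posterior over the joint state, including the proximity-based
        # interaction prior over all persons.
        lp = log_post(state)
        for _ in range(n_steps):
            pid = random.choice(list(state))          # move one person at a time
            prop = copy.deepcopy(state)
            x, y = prop[pid]
            prop[pid] = (x + random.gauss(0, sigma),  # Gaussian random-walk move
                         y + random.gauss(0, sigma))
            lp_prop = log_post(prop)
            if math.log(random.random()) < lp_prop - lp:   # MH accept test
                state, lp = prop, lp_prop
        return state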


4. TRACKING A VARIABLE NUMBER OF OBJECTS USING TRANS-DIMENSIONAL MCMC SAMPLING

The following demonstrations show a multi-target tracking system built in a Bayesian
framework and capable of tracking a varying number of objects.  This framework uses a joint
multi-object state-space formulation and a trans-dimensional Markov chain Monte Carlo
(MCMC) particle filter to recursively estimate the multi-object configuration.  Novel color
and binary measurements capable of discriminating between different numbers of targets
are employed.  These demos include work from [5,6].  A minimal sketch of the
trans-dimensional moves follows the demo files below.

Tracking 4 occluding objects (with evaluation): 4objrun1.avi
Tracking 2 occluding objects (with evaluation): 2objrun7.avi
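
To make the trans-dimensional idea concrete, here is a minimal Python sketch of a
single reversible-jump style step with birth, death, and update moves over a
variable-size set of targets.  All interfaces are hypothetical, and the acceptance
test is simplified (a full RJ-MCMC acceptance ratio also includes proposal and
dimension-matching terms); this is an illustration, not the system of [5,6].

    import copy, math, random

    def rjmcmc_step(targets, log_post, frame_size=(320, 240)):
        # targets: dict id -> (x, y).  One of three moves: birth, death, update.
        prop = copy.deepcopy(targets)
        move = random.choice(["birth", "death", "update"])
        if move == "birth":
            # Propose a new target at a uniformly drawn image location.
            prop[max(prop, default=0) + 1] = (random.uniform(0, frame_size[0]),
                                              random.uniform(0, frame_size[1]))
        elif move == "death" and prop:
            prop.pop(random.choice(list(prop)))       # remove a random target
        elif move == "update" and prop:
            tid = random.choice(list(prop))
            x, y = prop[tid]
            prop[tid] = (x + random.gauss(0, 3), y + random.gauss(0, 3))
        # Simplified accept/reject on the posterior ratio only.
        if math.log(random.random()) < log_post(prop) - log_post(targets):
            return prop
        return targets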

5. TRACKING USING MOTION LIKELIHOOD AND MOTION PROPOSAL MODELING

http://www.idiap.ch/~odobez/IPpaper/EmbeddingMotion.html



6. SAMPLING METHODS

A comparison of multi-object tracking with a standard particle filter (PF), a particle
filter using partitioned sampling (PS), and a particle filter using distributed
partitioned sampling (DPS).  The following MPG files show video sequences from the
first 5 of 50 runs of tracking multiple objects with each of the above techniques.
The first five sequences track with a standard PF, the second five with PS, and the
final five with DPS.  Each sequence is separated by a blank yellow frame.  These
videos demonstrate work from [9]; a short sketch contrasting the sampling strategies
follows the sequence descriptions below.  View them here:

Synthetic Sequence - Seven objects are tracked in this synthetic sequence.  A distracting
blue object appears over the true blue object to fool the tracker.  This sequence is designed
to measure the tracker's ability to recover from distraction.

Real Sequence 1 - Three people are tracked.  As one passes behind another, he is occluded.
This sequence is used to measure the tracker's ability to recover from occlusion.

Real Sequence 2 - Three people are tracked.  As one passes behind another, he is occluded
(for a longer duration than in the first sequence).  This sequence is used to measure the
tracker's ability to recover from occlusion.
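
The difference between the sampling strategies above can be summarized in a few
lines.  In the Python sketch below (hypothetical interfaces), the standard filter
weights whole joint particles at once, while partitioned sampling resamples after
each object's dimensions are processed, so later partitions inherit particles that
already fit the earlier objects.  DPS, roughly speaking, additionally varies the
partition ordering across the particle set, which is not shown here.

    import numpy as np

    def joint_pf_step(parts, lik):
        # parts: (N, M, 2) array of N particles over M objects' (x, y).
        parts = parts + np.random.normal(0.0, 2.0, parts.shape)     # diffuse all
        w = np.prod([lik(parts[:, m]) for m in range(parts.shape[1])], axis=0)
        w /= w.sum()
        return parts[np.random.choice(len(parts), len(parts), p=w)] # resample once

    def partitioned_step(parts, lik):
        # Resample after each object (partition): later partitions start
        # from particles that already fit the earlier objects well.
        parts = parts.copy()
        for m in range(parts.shape[1]):
            parts[:, m] += np.random.normal(0.0, 2.0, parts[:, m].shape)
            w = lik(parts[:, m])                 # per-partition weights
            w /= w.sum()
            parts = parts[np.random.choice(len(parts), len(parts), p=w)]
        return parts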
 

7. HEAD TRACKING AND POSE ESTIMATION

These demos show video sequence results of head tracking and pose estimation
with a mixed-state particle filter (PF).  Each head is represented by a spatial
configuration and an exemplar-based head model.  These videos demonstrate work
presented in [8]; a short sketch of the exemplar-based pose model follows the
sequence descriptions below.  View them here:

Head Tracking Sequence 1 - The head of a person is tracked and his head pose is
estimated.  Two clocks at the side of the image indicate the pan angle (1st clock) and
the tilt angle (2nd clock).
 

Head Tracking Sequence 2 - The head of a person is tracked and his head pose is
estimated.  Two clocks at the side of the image indicate the pan angle (1st clock) and
the tilt angle (2nd clock).
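
The exemplar-based head model mentioned above can be pictured as a library of
template images, one per discrete pose; a candidate head crop is scored against
each exemplar and the distances are turned into a distribution over poses.  The
Python sketch below is a minimal illustration under assumed inputs (the exemplar
set, the distance measure, and the beta temperature are all hypothetical), not the
model of [8].

    import numpy as np

    def pose_likelihood(patch, exemplars, beta=0.05):
        # patch: candidate head crop (H, W).  exemplars: dict pose -> (H, W).
        # Returns a likelihood per discrete pose from an exponentiated distance.
        scores = {}
        for pose, ex in exemplars.items():
            d = np.mean((patch.astype(float) - ex.astype(float)) ** 2)
            scores[pose] = np.exp(-beta * d)   # smaller distance, higher score
        z = sum(scores.values())
        return {pose: s / z for pose, s in scores.items()}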
 

8. AV TRACKING

For results on AV tracking (from [12,15]), refer to the following web sites:

http://www.idiap.ch/~gatica/av-tracking.html
http://www.idiap.ch/~gatica/av-tracking-multicam.html