Probabilistic Models: temporal topic models and more

Data Mining in Temporal Documents


Overall, the goal is to find temporal patterns that recur within one or several temporal documents. Because of their temporal component, these patterns are called “motifs”. Different motifs can be mixed arbitrarily in the documents, and they do not need to be synchronized with each other.
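To make the idea of unsynchronized, overlapping motifs concrete, here is a minimal sketch that builds a synthetic temporal document by adding motifs at arbitrary start times. It is a simplified illustration, not the actual model: the function name `mix_motifs`, the `(motif_index, start_time, weight)` occurrence format, and the toy motifs are all assumptions made for this example.

```python
import numpy as np

def mix_motifs(motifs, occurrences, vocab_size, doc_length):
    """Build a (vocab_size x doc_length) document by summing shifted motifs.

    motifs:      list of arrays of shape (vocab_size, motif_length); each one is
                 a probability table over (word, relative time) that sums to 1.
    occurrences: list of (motif_index, start_time, weight) tuples (hypothetical
                 format); start times are free, so motifs may overlap.
    """
    doc = np.zeros((vocab_size, doc_length))
    for k, start, weight in occurrences:
        motif = motifs[k]
        # Truncate a motif that would run past the end of the document.
        end = min(start + motif.shape[1], doc_length)
        doc[:, start:end] += weight * motif[:, :end - start]
    return doc

# Two toy motifs over a 3-word vocabulary: an "upward" and a "downward" ramp.
ramp_up = np.eye(3) / 3.0
ramp_down = np.fliplr(np.eye(3)) / 3.0

# The second motif starts while the first one is still unfolding.
doc = mix_motifs([ramp_up, ramp_down],
                 [(0, 0, 3.0), (1, 1, 3.0), (0, 4, 3.0)],
                 vocab_size=3, doc_length=6)
print(doc)
```

Temporal topic mining addresses the inverse problem: given only `doc`, recover both the motif tables and their occurrences.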

The words can correspond to any features of interest that are meaningful for the specific domain: plain words (in any language), sound azimuth, localized quantized motion in an image plane, unlocalized quantized interest point descriptors, etc. To use temporal topic models on any data type, a vocabulary (set of features) needs to be defined, as well as a temporal axis and resolution. On this page, we briefly introduce the application of temporal topic modeling methods to different kinds of data:

  • video data from a fixed camera, using position and motion as features,
  • audio data using Time Difference of Arrival as features.

Activity Mining in Video Data

Temporal topic mining can be applied to videos in different ways. In the case of videos recorded from a static camera (e.g., in a traffic scenario), the position within the image is meaningful and can be used together with motion features (optical flow). We build a low-level vocabulary where each word encodes a position in the image and a motion direction, for example, “upward motion at position 10,15”. Because such a vocabulary grows very quickly (tens of thousands of words), we simplify it with a soft clustering method based on a simpler topic model, PLSA. After this dimensionality reduction, we obtain a high-level vocabulary of size close to 100. Each word in this vocabulary can be seen as a motion blob in the image plane.
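As an illustration of what a low-level word might look like, the sketch below maps a pixel position and an optical-flow vector to a single word index. The image size, grid cell size, and number of quantized directions are hypothetical values chosen for the example, not the settings used in the original work, and the PLSA reduction step is not shown.

```python
import math

# Hypothetical settings for illustration only.
IMAGE_W, IMAGE_H = 320, 240   # image size in pixels
CELL = 10                     # side of a square grid cell, in pixels
N_DIRS = 4                    # quantized motion directions
DIR_NAMES = ["right", "up", "left", "down"]

GRID_W = IMAGE_W // CELL
GRID_H = IMAGE_H // CELL
VOCAB_SIZE = GRID_W * GRID_H * N_DIRS

def direction_bin(dx, dy):
    """Quantize a flow vector into N_DIRS bins (the image y axis points down)."""
    angle = math.atan2(-dy, dx) % (2 * math.pi)   # 0 = right, pi/2 = up
    width = 2 * math.pi / N_DIRS
    # Shift by half a bin so that each bin is centered on its direction.
    return int(((angle + width / 2) % (2 * math.pi)) // width) % N_DIRS

def word_id(x, y, dx, dy):
    """Encode (position, motion direction) as one integer word index."""
    cx, cy = int(x) // CELL, int(y) // CELL
    return (cy * GRID_W + cx) * N_DIRS + direction_bin(dx, dy)

def describe(word):
    """Decode a word index back into a human-readable description."""
    cell, d = divmod(word, N_DIRS)
    cy, cx = divmod(cell, GRID_W)
    return f"{DIR_NAMES[d]} motion at cell ({cx},{cy})"

w = word_id(105, 152, 0.0, -3.0)   # flow pointing towards the top of the image
print(w, describe(w))
```

Even this coarse grid yields a few thousand words; finer cells or more directions quickly reach the tens of thousands mentioned above, which is what motivates the reduction step.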

The temporal axis is discretized using a sliding window scheme without overlap and with a window length of 1 second. In the end, for 1 hour of input video, we obtain a temporal document which is 100×3600 (vocabulary size of 100 and 3600 seconds long). Here is an example temporal document obtained for 5 minutes of video (from the MIT dataset):
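The windowing described above can be sketched as a simple counting step. This is a minimal version that assumes each observation is a (word index, timestamp) pair and that the document stores raw counts; the function name and input format are hypothetical, and the soft assignments produced by a PLSA-based reduction would give fractional weights instead.

```python
import numpy as np

def build_temporal_document(word_ids, timestamps_s, vocab_size,
                            duration_s, window_s=1.0):
    """Count word occurrences in consecutive, non-overlapping time windows.

    Returns an array of shape (vocab_size, number_of_windows).
    """
    n_steps = int(np.ceil(duration_s / window_s))
    doc = np.zeros((vocab_size, n_steps), dtype=np.int64)
    words = np.asarray(word_ids)
    steps = np.floor(np.asarray(timestamps_s) / window_s).astype(int)
    # Ignore observations that fall outside the requested duration.
    keep = (steps >= 0) & (steps < n_steps)
    # np.add.at accumulates correctly when the same cell appears several times.
    np.add.at(doc, (words[keep], steps[keep]), 1)
    return doc

# One hour of video, vocabulary of 100 words, 1-second windows.
doc = build_temporal_document(word_ids=[3, 3, 7],
                              timestamps_s=[0.2, 0.9, 1.5],
                              vocab_size=100, duration_s=3600)
print(doc.shape)
```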


In the case of real data such as video data, the vocabulary has strong semantics (localized motion blobs in this case) and thus the recurrent motifs recovered from temporal topic models can be interpreted. Temporal topic mining recovers motifs, each in the form of a probability table over the vocabulary and time. We can advantageously reinterpret each probability table as explained below.

Click the next button to go through the animated explanation:

Pattern Mining in Audio Data (TDoA)

By setting up two microphones on the side of a two-way road, we can compute a generalized cross-correlation to obtain the energy coming from any azimuth. Discretizing both time and azimuth, we obtain a temporal document for each audio recording. Here are two examples of such temporal documents, with time on the horizontal axis and azimuth on the vertical axis:

[Figure: two example temporal documents]
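For readers unfamiliar with the generalized cross-correlation mentioned above, here is a minimal sketch of one common variant, GCC with PHAT weighting, applied to a pair of microphone signals. The original page does not say which weighting was used, so PHAT is an assumption. The conversion from delay to azimuth also assumes a far-field source, a hypothetical microphone spacing, and a speed of sound of 343 m/s.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Return (estimated delay in s, lags in s, correlation curve).

    A positive delay means that `sig` lags behind `ref`.
    """
    n = len(sig) + len(ref)                 # zero-pad to limit circular wrap-around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Reorder so that lags run from -max_shift to +max_shift samples.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lags = np.arange(-max_shift, max_shift + 1) / fs
    return lags[np.argmax(cc)], lags, cc

def tdoa_to_azimuth_deg(tau_s, mic_distance_m, speed_of_sound=343.0):
    """Far-field approximation: tau = d * sin(azimuth) / c."""
    s = np.clip(speed_of_sound * tau_s / mic_distance_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

# Synthetic check: the second channel is the first one delayed by 12 samples.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
delay = 12
sig = np.concatenate((np.zeros(delay), ref[:-delay]))

tau, lags, cc = gcc_phat(sig, ref, fs)
print(round(tau * fs), tdoa_to_azimuth_deg(tau, mic_distance_m=0.5))
```

The whole curve `cc`, resampled along azimuth, gives an energy value per direction, which is the kind of quantity stored in each column of the temporal document.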

The temporal discretization gives around 80 ms to each horizontal bin, while the vocabulary is obtained by regular quantization of the [-90°, +90°] azimuth range into 25 bins. In this setup, a typical recording is 20 seconds long (many such recordings are used), which corresponds to 244 time instants per temporal document.
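The numbers above can be checked with a short sketch: 20 seconds split into 244 time bins gives about 82 ms per bin, and the azimuth range is cut into 25 equal bins of 7.2° each. The function names and the (time, azimuth, energy) input format are assumptions made for this example.

```python
import numpy as np

N_AZIMUTH_BINS = 25
RECORDING_S = 20.0
N_TIME_BINS = 244

time_bin_s = RECORDING_S / N_TIME_BINS      # about 0.082 s per time bin

def azimuth_bin(azimuth_deg, n_bins=N_AZIMUTH_BINS):
    """Regular quantization of [-90, +90] degrees into n_bins bins."""
    edges = np.linspace(-90.0, 90.0, n_bins + 1)
    idx = np.digitize(azimuth_deg, edges) - 1
    return np.clip(idx, 0, n_bins - 1)       # +90 falls into the last bin

def audio_temporal_document(times_s, azimuths_deg, energies):
    """Accumulate energy into a (N_AZIMUTH_BINS x N_TIME_BINS) document."""
    doc = np.zeros((N_AZIMUTH_BINS, N_TIME_BINS))
    t_idx = np.floor(np.asarray(times_s) / time_bin_s).astype(int)
    t_idx = np.clip(t_idx, 0, N_TIME_BINS - 1)
    a_idx = azimuth_bin(np.asarray(azimuths_deg))
    np.add.at(doc, (a_idx, t_idx), np.asarray(energies, dtype=float))
    return doc

audio_doc = audio_temporal_document(times_s=[0.0, 19.99],
                                    azimuths_deg=[-90.0, 90.0],
                                    energies=[1.0, 2.0])
print(round(1000 * time_bin_s, 1), audio_doc.shape)
```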

In these documents, we clearly see two dominant patterns: a downward ramp corresponding to cars going from right to left, and an upward ramp for the other direction. In such a setup, temporal topic modeling is able to use temporal co-occurrence to properly extract the recurrent motifs, without any precise information about the semantics of the features. These automatically learned motifs have been shown to be well suited to car counting, providing equal or better performance compared to dedicated methods.