Probabilistic Models: temporal topic models and more

PLSM: introduction

PLSM stands for Probabilistic Latent Sequential Motif. It can be seen as a time-sensitive evolution of PLSA (Probabilistic Latent Semantic Analysis), which is the original probabilistic topic model. Like PLSA, PLSM is defined by a probabilistic generative model, and its parameters can be learned with an EM (Expectation-Maximization) algorithm.

PLSM: understanding the model

PLSM can be represented as a graphical model, in which nodes represent random variables and the absence of a link between two nodes represents conditional independence. Here, we provide three equivalent views of the PLSM model.

The PLSM model explains how the set of all observations is supposed to be generated. Each observation is a triple (d,w,ta), meaning that a word w occurred once at time ta in the document d. PLSM supposes that there exists a set of K motifs named φ (represented only in the last of the three views). The generative process of each observation goes as follows:

  • draw the document d from a distribution p(d),
  • draw a pair (z,ts), made of a motif index and a starting time, from the per-document starting distribution p(z,ts|d),
  • given this z, draw a pair (w,tr), made of a word and a relative time, from the corresponding motif, defined as a distribution p(w,tr|z) (or φz(w,tr)),
  • set the absolute time of the observation as the sum of the motif starting time and the drawn relative time: ta = ts + tr.
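
The four steps above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the code of any actual PLSM implementation: the function name sample_observations, the array layouts, and the choice of discrete integer times starting at 0 are all assumptions made for the example.

```python
import numpy as np

def sample_observations(p_d, p_zts_d, phi, n_obs, rng=None):
    """Draw n_obs triples (d, w, ta) following the generative process.

    Assumed (hypothetical) array layouts:
      p_d      shape (D,)        p(d)
      p_zts_d  shape (D, K, Ts)  p(z, ts | d); each p_zts_d[d] sums to 1
      phi      shape (K, W, Tr)  p(w, tr | z); each phi[z] sums to 1
    """
    rng = np.random.default_rng() if rng is None else rng
    D, K, Ts = p_zts_d.shape
    _, W, Tr = phi.shape
    observations = []
    for _ in range(n_obs):
        # 1. draw the document d from p(d)
        d = rng.choice(D, p=p_d)
        # 2. draw the pair (z, ts) jointly from p(z, ts | d)
        z, ts = divmod(rng.choice(K * Ts, p=p_zts_d[d].ravel()), Ts)
        # 3. draw the pair (w, tr) jointly from the motif p(w, tr | z)
        w, tr = divmod(rng.choice(W * Tr, p=phi[z].ravel()), Tr)
        # 4. the absolute time is the start time plus the relative time
        observations.append((int(d), int(w), int(ts + tr)))
    return observations
```

Note that only (d,w,ta) is kept in each observation: the motif index z, the starting time ts and the relative time tr are latent, which is why recovering the parameters requires an inference procedure.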

Given a set of observations, an Expectation-Maximization algorithm can be used to estimate the most likely parameters (in practice, EM converges to a local maximum of the likelihood). The set of parameters is made of the p(z,ts|d) distributions and the p(w,tr|z) distributions (φ in the third representation).
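
To make the estimation step more concrete, here is a minimal sketch of a plain maximum-likelihood EM for this model, written with NumPy. It is derived directly from the generative process described above and is not necessarily the exact procedure used by the PLSM authors. The function name plsm_em, the count tensor layout counts[d, w, ta], the random initialization and the small constant eps are assumptions of this example. The E-step posterior p(z,ts|d,w,ta), which is proportional to p(z,ts|d) p(w,ta-ts|z), is not stored explicitly; it is folded into the expected counts.

```python
import numpy as np

def plsm_em(counts, K, Tr, n_iter=50, seed=0):
    """Plain EM sketch. counts[d, w, ta] = occurrences of word w at time ta in doc d.

    Returns (p_zts_d, phi, log_likelihoods) with
      p_zts_d  shape (D, K, Ts)  estimate of p(z, ts | d)
      phi      shape (K, W, Tr)  estimate of p(w, tr | z)
    """
    rng = np.random.default_rng(seed)
    D, W, Ta = counts.shape
    Ts = Ta - Tr + 1          # start times such that ts + tr stays below Ta
    eps = 1e-12               # guards against division by zero (assumption)

    # Random initialization, normalized into proper distributions.
    p_zts_d = rng.random((D, K, Ts))
    p_zts_d /= p_zts_d.sum(axis=(1, 2), keepdims=True)
    phi = rng.random((K, W, Tr))
    phi /= phi.sum(axis=(1, 2), keepdims=True)

    log_likelihoods = []
    for _ in range(n_iter):
        # p(w, ta | d) = sum over (z, ts) of p(z, ts | d) * phi_z(w, ta - ts)
        mix = np.zeros((D, W, Ta))
        for ts in range(Ts):
            mix[:, :, ts:ts + Tr] += np.einsum('dk,kwr->dwr', p_zts_d[:, :, ts], phi)
        log_likelihoods.append(float((counts * np.log(mix + eps)).sum()))

        # E-step and M-step combined: accumulate expected counts
        # n(d,w,ta) * p(z,ts | d,w,ta) for each parameter.
        ratio = counts / (mix + eps)
        new_start = np.zeros_like(p_zts_d)
        new_phi = np.zeros_like(phi)
        for ts in range(Ts):
            r = ratio[:, :, ts:ts + Tr]                      # shape (D, W, Tr)
            new_start[:, :, ts] = p_zts_d[:, :, ts] * np.einsum('dwr,kwr->dk', r, phi)
            new_phi += phi * np.einsum('dk,dwr->kwr', p_zts_d[:, :, ts], r)

        # Renormalize the expected counts into distributions.
        p_zts_d = new_start / (new_start.sum(axis=(1, 2), keepdims=True) + eps)
        phi = new_phi / (new_phi.sum(axis=(1, 2), keepdims=True) + eps)
    return p_zts_d, phi, log_likelihoods
```

The sketch leaves out p(d), which under this model can be estimated separately as the fraction of observations falling in each document. It also assumes every document contains at least one observation; an empty document would yield an all-zero starting distribution.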