Licensing:
=========

This binary program is provided as is, to help reproduce research experiments and to make it possible to try the model in other situations.
If you use this program, you must properly cite the related papers; see the website for links:
 http://probamod.idiap.ch/related-publications.html

Examples of usage:
=================
  # interactive run with a motif duration of 50 time steps
  java -jar EmonetCvpr2011.jar -in sample.tdoc -interactive 1 -duration 50

  # tuning a parameter (-gamma) for this unusual sample dataset
  java -jar EmonetCvpr2011.jar -in sample.tdoc -interactive 1 -duration 50 -gamma .3

  # tuning what is displayed, and how often
  java -jar EmonetCvpr2011.jar -in sample.tdoc -interactive 1 -duration 50 -gamma .3 -iPeriod 5 -iPeriodReconstruction 500

  # writing some output files
  java -jar EmonetCvpr2011.jar -in sample.tdoc -duration 50 -out result -iter 1000 -writeModulo 10
  java -jar EmonetCvpr2011.jar -in sample.tdoc -duration 50 -out result


  # listing parameters
  java -jar EmonetCvpr2011.jar --help


Input file:
==========
To use other datasets, format them like the 'sample.tdoc' file.
Each line corresponds to a time instant and contains a set of 'W:N' entries where W is the word index and N is a count.
A line can be empty (no observations at that time instant).
The .tdoc file can also contain non-integer counts, e.g. '123:42.7'.
When loading the file, the program first multiplies each count by a parameter (-inScale on the command line) and then rounds it down to the nearest integer.
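The parsing and scaling rules above can be sketched in Python as follows. This is a minimal illustration, not the program's own loader: it assumes that entries are separated by whitespace and that each word index appears at most once per line, and the function names and the `in_scale` argument (standing in for -inScale) are hypothetical.

```python
import math


def parse_tdoc_line(line: str, in_scale: float = 1.0) -> dict[int, int]:
    """Parse one .tdoc line (one time instant) into {word_index: count}.

    Hypothetical helper: each 'W:N' count is multiplied by in_scale (playing
    the role of -inScale) and rounded down to the nearest integer.
    Assumes whitespace-separated entries and at most one entry per word.
    """
    counts: dict[int, int] = {}
    for entry in line.split():  # an empty line gives no entries
        word, value = entry.split(":")
        counts[int(word)] = math.floor(float(value) * in_scale)
    return counts


def parse_tdoc(path: str, in_scale: float = 1.0) -> list[dict[int, int]]:
    """Return one {word_index: count} dict per line, i.e. per time instant."""
    with open(path) as f:
        return [parse_tdoc_line(line, in_scale) for line in f]
```

For example, `parse_tdoc_line("123:42.7 5:3")` returns `{123: 42, 5: 3}`, and `parse_tdoc_line("7:1.5", 2.0)` returns `{7: 3}`.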


Semantics of output files:
=========================
The output files come in pairs: the .pwz and .ptrwz files describe the motifs, and the .pzd and .ptszd files describe their occurrences.
This format was chosen for compatibility with the output of the PLSM algorithm.

  Motifs:
  ------
  .pwz:   normalized p(w|z) probability of a word given a motif
          * one column per motif
          * one line per word
          * each column sums to 1 (or to 0 for an empty motif)
  .ptrwz: normalized p(tr|w,z) probability of a relative time given a word and a motif
          * one column per word
          * one line per relative time ('duration' lines per motif), with the motifs stacked on top of each other
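As a sketch of how the motif files can be read back, the Python below assumes (this is not stated above) that each file is a plain-text matrix of whitespace-separated numbers, and that the motif duration used for the run (the -duration value) is known. The helper names are hypothetical. It splits the stacked .ptrwz lines into one block per motif and computes column sums, which can be used to check the normalization: each .pwz column should sum to 1 (or 0), and since p(tr|w,z) is normalized over tr, each column of a .ptrwz motif block should presumably sum to 1 as well.

```python
def load_matrix(path: str) -> list[list[float]]:
    """Read a matrix stored as whitespace-separated numbers (assumed layout)."""
    with open(path) as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]


def split_motifs(ptrwz: list[list[float]], duration: int) -> list[list[list[float]]]:
    """Split stacked .ptrwz lines into one (duration x words) block per motif."""
    if duration <= 0 or len(ptrwz) % duration != 0:
        raise ValueError("line count is not a multiple of the motif duration")
    return [ptrwz[i:i + duration] for i in range(0, len(ptrwz), duration)]


def column_sums(matrix: list[list[float]]) -> list[float]:
    """Sum each column, e.g. to check that a .pwz column sums to 1 (or 0)."""
    return [sum(column) for column in zip(*matrix)]
```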

  Occurrences:
  -----------
  .pzd:   unnormalized p(z|d) probability of a motif given an input document
          * one column per document, one line per motif
          * the values are the number of observations associated with each motif (not probabilities)
          * NB: some p(z|d) values might be 0
  .ptszd: normalized p(ts|z,d) probability of a starting time given a motif and a document
          * one column per motif
          * one line per time instant, with documents stacked on top of each other
          * NB: if p(z|d) = 0, then p(ts|z,d) is 0 for all ts for this z,d, so it is not a proper probability distribution in that case
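Since .pzd stores observation counts rather than probabilities, multiplying a .pzd entry by the matching column of .ptszd gives, under the usual PLSM interpretation, the expected number of observations of motif z that start at each time instant of document d. The sketch below assumes both files have already been read into nested Python lists, and takes the number of .ptszd lines belonging to each document as a hypothetical `doc_lengths` argument, because that count is not specified here.

```python
def expected_starts(
    pzd: list[list[float]],
    ptszd: list[list[float]],
    doc_lengths: list[int],
    d: int,
    z: int,
) -> list[float]:
    """Expected number of observations of motif z starting at each time of document d.

    pzd:         one line per motif, one column per document (observation counts)
    ptszd:       one line per time instant (documents stacked), one column per motif
    doc_lengths: hypothetical input giving the number of .ptszd lines of each document
    """
    start = sum(doc_lengths[:d])  # first .ptszd line belonging to document d
    rows = ptszd[start:start + doc_lengths[d]]
    count = pzd[z][d]  # observations associated with motif z in document d
    # If count is 0, p(ts|z,d) is also all zeros, and so is the result.
    return [count * row[z] for row in rows]
```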

