.. vim: set fileencoding=utf-8 :
.. Pavel Korshunov
.. Thu 12 Jul 13:43:22 2018

.. _bob.paper.eusipco2018:

=====================================================================================
Documentation for EUSIPCO paper "Speaker Inconsistency Detection in Tampered Video"
=====================================================================================

Creating databases
------------------

So far, the algorithms in this package can be run on two sets of data, generated
from the AMI_ and VidTIMIT_ databases. Both databases need to be downloaded first.
We used the original databases to create corresponding sets of genuine and tampered
videos, as well as evaluation protocols.

VidTIMIT_ database
~~~~~~~~~~~~~~~~~~

From the images in VidTIMIT_, we generate video files for the genuine subset and we
down-sample the audio files to 16 bits. We also generate our own tampered videos
(so far, for each video, we replace the speech with that of 5 other random
speakers). Here are the steps to generate the datasets:

* Provided you have the VidTIMIT_ database downloaded to `/path/to/vidtimit`,
  generate the genuine audio and video:

  .. code:: sh

     $ bin/python bob/paper/eusipco2018/scripts_vidtimit/generate_non-tampered-audio.py -d /path/to/vidtimit/audio -o /output/dir/where/genuine/files/will/be
     $ bin/python bob/paper/eusipco2018/scripts_vidtimit/generate_non-tampered-video.py -d /path/to/vidtimit/video -o /output/dir/where/genuine/files/will/be

* Generate tampered videos (5 tampered for each genuine):

  .. code:: sh

     $ bin/python bob/paper/eusipco2018/scripts_vidtimit/generate_tampered.py -d /dir/where/genuine/files/are/ -o /dir/where/tampered/files/will/be/ -t 5

  For each genuine video, this script randomly takes audio from 5 other people and
  creates an audio file with the same name, thus producing 5 audio-video pairs
  where the lip movements do not match the speech (a quick sanity check for this
  is sketched at the end of this section).

* Run face and landmark detection to preprocess the videos (this step is specific
  to the SGE grid at Idiap_):

  .. code:: sh

     $ cd bob/paper/eusipco2018/job
     $ bash submit_cpm_detection.sh $(find /dir/where/genuine/files/are -name '*.avi')
     $ bash watch_jobs.sh /dir/where/genuine/files/are

* Move the resulting detections into the genuine and tampered directories:

  .. code:: sh

     $ rsync --chmod=0777 -avm --include='*.hdf5' -f 'hide,! */' /dir/where/genuine/files/are/ /dir/where/genuine/files/are/
     $ bin/python bob/paper/eusipco2018/scripts_vidtimit/reallocate_annotation_files.py -a /dir/where/genuine/files/are -o /dir/where/tampered/files/are

AMI_ database
~~~~~~~~~~~~~

Since AMI_ contains many different types of videos that are not well suited for
lip-sync detection, we need to extract a suitable set of videos (a single person
in the video frame, speaking). Using the annotation files provided in the
`bob/paper/eusipco2018/data/ami_annotations/` folder, we cut 15-40 second videos
from the single-speaker shots and use the audio recorded with the lapel
microphone. To generate training and development data from AMI_, follow these
steps:

* Provided you have the AMI_ database downloaded to `/path/to/ami`, you can
  generate genuine videos by running the following script:

  .. code:: sh

     $ bin/python bob/paper/eusipco2018/scripts_amicorpus/generate_non-tampered.py -d /path/to/ami -a bob/paper/eusipco2018/data/ami_annotations/p1.trn.mdtm -o /output/dir/where/genuine/files/will/be

* Generate the tampered video set (5 tampered for each genuine) by running the
  following:

  .. code:: sh

     $ bin/python bob/paper/eusipco2018/scripts_amicorpus/generate_tampered.py -d /path/to/ami/genuine/videos -o /output/dir/where/tampered/files/will/be -t 5

  For each genuine video, this script randomly takes audio from 5 other people and
  merges it with the video, thus creating 5 tampered videos where the lip
  movements do not match the speech.

* Split the video and audio into separate files (run for both the genuine and the
  tampered directories):

  .. code:: sh

     $ bin/python bob/paper/eusipco2018/scripts_amicorpus/bin/extract_audio_from_video.py -d /path/to/ami/videos -o /path/to/ami/videos -p /path/to/ami/videos/

* The rest of the processing is the same as for VidTIMIT_.
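Before moving on to feature extraction, it may be worth checking that the
tampering step completed: with `-t 5`, the tampered set should contain five files
for every genuine video. The snippet below is only a minimal sketch for the
VidTIMIT set, not part of the original scripts; it assumes (as in the detection
step above) that genuine videos are `*.avi` files and that the tampered audio is
stored as `*.wav` — adjust the paths and extensions to your layout:

.. code:: sh

   $ # Hypothetical sanity check: count genuine videos and tampered audio files.
   $ n_genuine=$(find /dir/where/genuine/files/are -name '*.avi' | wc -l)
   $ n_tampered=$(find /dir/where/tampered/files/are -name '*.wav' | wc -l)
   $ # With -t 5, the tampered count should be five times the genuine count.
   $ echo "genuine: ${n_genuine}, tampered: ${n_tampered} (expected: $((5 * n_genuine)))"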
Step-by-step instructions for reproducing the experiments
----------------------------------------------------------

For face and landmark detection, please refer to this README_ (note that although
most of the steps can be replicated on a local machine, the README is written with
the SGE grid and the supporting infrastructure available at Idiap_ in mind).

Before training models, video and audio features need to be preprocessed and
extracted. First, preprocess the video:

.. code:: sh

   $ bin/train_gmm.py bob/paper/eusipco2018/config/video_extraction_pipeline.py -P oneset-licit -s mfcc20mouthdeltas
   $ bin/train_gmm.py bob/paper/eusipco2018/config/video_extraction_pipeline.py -P oneset-spoof -s mfcc20mouthdeltas

Then, use the audio pipeline to extract audio features (the video features should
be ready by then) and train the models (here, we use GMMs as an example of the
classifiers):

.. code:: sh

   $ bin/train_gmm.py bob/paper/eusipco2018/config/audio_extraction_pipeline.py -P oneset-licit -s mfcc20mouthdeltas --projector-file Projector_gmm_mfcc20_mouthdeltas_licit.hdf5
   $ bin/train_gmm.py bob/paper/eusipco2018/config/audio_extraction_pipeline.py -P oneset-spoof -s mfcc20mouthdeltas --projector-file Projector_gmm_mfcc20_mouthdeltas_spoof.hdf5

Test the models and compute scores:

.. code:: sh

   $ bin/spoof.py bob/paper/eusipco2018/config/audio_extraction_pipeline.py -P train_dev -a gmm -s mfcc20mouthdeltas --projector-file Projector_gmm_mfcc20_mouthdeltas_spoof.hdf5

A combined script chaining these commands is sketched at the end of this page.

===========
Users Guide
===========

.. toctree::
   :maxdepth: 2

   guide

Contact
-------

For questions or to report issues with this software package, contact Pavel
Korshunov (pavel.korshunov@idiap.ch).

.. Place your references here:
.. _bob: https://www.idiap.ch/software/bob
.. _installation: https://www.idiap.ch/software/bob/install
.. _mailing list: https://www.idiap.ch/software/bob/discuss
.. _algorithm: https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation
.. _dlib: http://dlib.net/
.. _AMI: http://groups.inf.ed.ac.uk/ami/download/
.. _README: https://gitlab.mediforprogram.com/savi/lip.sync/tree/master/bob/paper/eusipco2018/job
.. _Idiap: https://www.idiap.ch
.. _VidTIMIT: http://conradsanderson.id.au/vidtimit
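Putting it together
-------------------

For convenience, the feature-extraction, training, and scoring commands from the
step-by-step instructions can be chained into a single script. The sketch below
simply repeats those commands verbatim; the configuration paths, protocol names,
`-s` label, and projector file names are the ones documented above and may need
adjusting to your setup:

.. code:: sh

   #!/usr/bin/env bash
   # Convenience wrapper around the commands documented above
   # (a sketch, not part of the original package).
   set -e  # stop at the first failing step

   CFG=bob/paper/eusipco2018/config
   SUB=mfcc20mouthdeltas

   # 1. Preprocess video and extract video features.
   bin/train_gmm.py $CFG/video_extraction_pipeline.py -P oneset-licit -s $SUB
   bin/train_gmm.py $CFG/video_extraction_pipeline.py -P oneset-spoof -s $SUB

   # 2. Extract audio features and train the GMM models.
   bin/train_gmm.py $CFG/audio_extraction_pipeline.py -P oneset-licit -s $SUB \
       --projector-file Projector_gmm_mfcc20_mouthdeltas_licit.hdf5
   bin/train_gmm.py $CFG/audio_extraction_pipeline.py -P oneset-spoof -s $SUB \
       --projector-file Projector_gmm_mfcc20_mouthdeltas_spoof.hdf5

   # 3. Test the models and compute the scores.
   bin/spoof.py $CFG/audio_extraction_pipeline.py -P train_dev -a gmm -s $SUB \
       --projector-file Projector_gmm_mfcc20_mouthdeltas_spoof.hdf5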