.. vim: set fileencoding=utf-8 : .. date: Thu Sep 20 11:58:57 CEST 2012 .. _bob.bio.vein.baselines: =============================== Executing Baseline Algorithms =============================== The first thing you might want to do is to execute one of the vein recognition algorithms that are implemented in ``bob.bio.vein``. Running Baseline Experiments ---------------------------- To run the baseline experiments, you can use the ``verify.py`` script by just going to the console and typing: .. code-block:: sh $ verify.py This script is explained in more detail in :ref:`bob.bio.base.experiments`. The ``verify.py --help`` option shows you, which other options you can set. Usually it is a good idea to have at least verbose level 2 (i.e., calling ``verify.py --verbose --verbose``, or the short version ``verify.py -vv``). .. note:: **Running in Parallel** To run the experiments in parallel, you can define an SGE grid or local host (multi-processing) configurations as explained in :ref:`running_in_parallel`. In short, to run in the Idiap SGE grid, you can simply add the ``--grid`` command line option, without parameters. To run experiments in parallel on the local machine, simply add a ``--parallel `` option, where ```` specifies the number of parallel jobs you want to execute. Database setups and baselines are encoded using :ref:`bob.bio.base.configuration-files`, all stored inside the package root, in the directory ``bob/bio/vein/configurations``. Documentation for each resource is available on the section :ref:`bob.bio.vein.resources`. .. warning:: You **cannot** run experiments just by executing the command line instructions described in this guide. You **need first** to procure yourself the raw data files that correspond to *each* database used here in order to correctly run experiments with those data. Biometric data is considered private data and, under EU regulations, cannot be distributed without a consent or license. You may consult our :ref:`bob.bio.vein.resources.databases` resources section for checking currently supported databases and accessing download links for the raw data files. Once the raw data files have been downloaded, particular attention should be given to the directory locations of those. Unpack the databases carefully and annotate the root directory where they have been unpacked. Then, carefully read the *Databases* section of :ref:`bob.bio.base.installation` on how to correctly setup the ``~/.bob_bio_databases.txt`` file. Use the following keywords on the left side of the assignment (see :ref:`bob.bio.vein.resources.databases`): .. code-block:: text [YOUR_VERAFINGER_DIRECTORY] = /complete/path/to/verafinger [YOUR_UTFVP_DIRECTORY] = /complete/path/to/utfvp [YOUR_FV3D_DIRECTORY] = /complete/path/to/fv3d Notice it is rather important to use the strings as described above, otherwise ``bob.bio.base`` will not be able to correctly load your images. Once this step is done, you can proceed with the instructions below. In the remainder of this section we introduce baseline experiments you can readily run with this tool without further configuration. Baselines examplified in this guide were published in [TVM14]_. Repeated Line-Tracking with Miura Matching ========================================== Detailed description at :ref:`bob.bio.vein.resources.recognition.rlt`. To run the baseline on the `VERA fingervein`_ database, using the ``Nom`` protocol, do the following: .. code-block:: sh $ verify.py verafinger rlt -vv .. tip:: If you have more processing cores on your local machine and don't want to submit your job for SGE execution, you can run it in parallel (using 4 parallel tasks) by adding the options ``--parallel=4 --nice=10``. **Before** doing so, make sure the package gridtk_ is properly installed. Optionally, you may use the ``parallel`` resource configuration which already sets the number of parallel jobs to the number of hardware cores you have installed on your machine (as with :py:func:`multiprocessing.cpu_count`) and sets ``nice=10``. For example: .. code-block:: sh $ verify.py verafinger rlt parallel -vv To run on the Idiap SGE grid using our stock io-big-48-slots-4G-memory-enabled (see :py:mod:`bob.bio.vein.configurations.gridio4g48`) configuration, use: .. code-block:: sh $ verify.py verafinger rlt grid -vv You may also, optionally, use the configuration resource ``gridio4g48``, which is just an alias of ``grid`` in this package. This command line selects and runs the following implementations for the toolchain: * :ref:`bob.bio.vein.resources.database.verafinger` * :ref:`bob.bio.vein.resources.recognition.rlt` As the tool runs, you'll see printouts that show how it advances through preprocessing, feature extraction and matching. In a 4-core machine and using 4 parallel tasks, it takes around 4 hours to process this baseline with the current code implementation. To complete the evaluation, run the command bellow, that will output the equal error rate (EER) and plot the detector error trade-off (DET) curve with the performance: .. code-block:: sh $ bob bio metrics /verafinger/rlt/Nom/nonorm/scores-dev --no-evaluation [Min. criterion: EER ] Threshold on Development set `scores-dev`: 0.31835292 ====== ======================== None Development scores-dev ====== ======================== FtA 0.0% FMR 23.6% (11388/48180) FNMR 23.6% (52/220) FAR 23.6% FRR 23.6% HTER 23.6% ====== ======================== Maximum Curvature with Miura Matching ===================================== Detailed description at :ref:`bob.bio.vein.resources.recognition.mc`. To run the baseline on the `VERA fingervein`_ database, using the ``Nom`` protocol like above, do the following: .. code-block:: sh $ verify.py verafinger mc -vv This command line selects and runs the following implementations for the toolchain: * :ref:`bob.bio.vein.resources.database.verafinger` * :ref:`bob.bio.vein.resources.recognition.mc` In a 4-core machine and using 4 parallel tasks, it takes around 1 hour and 40 minutes to process this baseline with the current code implementation. Results we obtained: .. code-block:: sh $ bob bio metrics /verafinger/mc/Nom/nonorm/scores-dev --no-evaluation [Min. criterion: EER ] Threshold on Development set `scores-dev`: 7.372830e-02 ====== ======================== None Development scores-dev ====== ======================== FtA 0.0% FMR 4.4% (2116/48180) FNMR 4.5% (10/220) FAR 4.4% FRR 4.5% HTER 4.5% ====== ======================== Wide Line Detector with Miura Matching ====================================== You can find the description of this method on the paper from Huang *et al.* [HDLTL10]_. To run the baseline on the `VERA fingervein`_ database, using the ``Nom`` protocol like above, do the following: .. code-block:: sh $ verify.py verafinger wld -vv This command line selects and runs the following implementations for the toolchain: * :ref:`bob.bio.vein.resources.database.verafinger` * :ref:`bob.bio.vein.resources.recognition.wld` In a 4-core machine and using 4 parallel tasks, it takes only around 5 minutes minutes to process this baseline with the current code implementation.Results we obtained: .. code-block:: sh $ bob bio metrics /verafinger/wld/Nom/nonorm/scores-dev --no-evaluation [Min. criterion: EER ] Threshold on Development set `scores-dev`: 2.402707e-01 ====== ======================== None Development scores-dev ====== ======================== FtA 0.0% FMR 9.8% (4726/48180) FNMR 10.0% (22/220) FAR 9.8% FRR 10.0% HTER 9.9% Results for other Baselines =========================== This package may generate results for other combinations of protocols and databases. Here is a summary table for some variants (results expressed correspond to the the equal-error rate on the development set, in percentage): ======================== ====== ====== ====== ====== ====== Toolchain Vera Finger UTFVP ------------------------ -------------------- ------------- Feature Extractor Full B Nom 1vsall nom ======================== ====== ====== ====== ====== ====== Repeated Line Tracking 14.6 13.4 23.6 3.4 1.4 Wide Line Detector 5.8 5.6 9.9 2.8 1.9 Maximum Curvature 2.5 1.4 4.5 0.9 0.4 ======================== ====== ====== ====== ====== ====== In a machine with 48 cores, running these baselines took the following time (hh:mm): ======================== ====== ====== ====== ====== ====== Toolchain Vera Finger UTFVP ------------------------ -------------------- ------------- Feature Extractor Full B Nom 1vsall nom ======================== ====== ====== ====== ====== ====== Repeated Line Tracking 01:16 00:23 00:23 12:44 00:35 Wide Line Detector 00:07 00:01 00:01 02:25 00:05 Maximum Curvature 03:28 00:54 00:59 58:34 01:48 ======================== ====== ====== ====== ====== ====== Modifying Baseline Experiments ------------------------------ It is fairly easy to modify baseline experiments available in this package. To do so, you must copy the configuration files for the given baseline you want to modify, edit them to make the desired changes and run the experiment again. For example, suppose you'd like to change the protocol on the Vera Fingervein database and use the protocol ``full`` instead of the default protocol ``nom``. First, you identify where the configuration file sits: .. code-block:: sh $ resources.py -tc -p bob.bio.vein - bob.bio.vein X.Y.Z @ /path/to/bob.bio.vein: + mc --> bob.bio.vein.configurations.maximum_curvature + parallel --> bob.bio.vein.configurations.parallel + rlt --> bob.bio.vein.configurations.repeated_line_tracking + utfvp --> bob.bio.vein.configurations.utfvp + verafinger --> bob.bio.vein.configurations.verafinger + wld --> bob.bio.vein.configurations.wide_line_detector The listing above tells the ``verafinger`` configuration file sits on the file ``/path/to/bob.bio.vein/bob/bio/vein/configurations/verafinger.py``. In order to modify it, make a local copy. For example: .. code-block:: sh $ cp /path/to/bob.bio.vein/bob/bio/vein/configurations/verafinger.py verafinger_full.py $ # edit verafinger_full.py, change the value of "protocol" to "full" Also, don't forget to change all relative module imports (such as ``from ..database.verafinger import Database``) to absolute imports (e.g. ``from bob.bio.vein.database.verafinger import Database``). This will make the configuration file work irrespectively of its location w.r.t. ``bob.bio.vein``. The final version of the modified file could look like this: .. code-block:: python from bob.bio.vein.database.verafinger import Database database = Database(original_directory='/where/you/have/the/raw/files', original_extension='.png', #don't change this ) protocol = 'full' Now, re-run the experiment using your modified database descriptor: .. code-block:: sh $ verify.py ./verafinger_full.py wld -vv Notice we replace the use of the registered configuration file named ``verafinger`` by the local file ``verafinger_full.py``. This makes the program ``verify.py`` take that into consideration instead of the original file. Other Resources --------------- This package contains other resources that can be used to evaluate different bits of the vein processing toolchain. Training the Watershed Finger region detector ============================================= The correct detection of the finger boundaries is an important step of many algorithms for the recognition of finger veins. It allows to compensate for eventual rotation and scaling issues one might find when comparing models and probes. In this package, we propose a novel finger boundary detector based on the `Watershedding Morphological Algorithm `. Watershedding works in three steps: 1. Determine markers on the original image indicating the types of areas one would like to detect (e.g. "finger" or "background") 2. Determine a 2D (gray-scale) surface representing the original image in which darker spots (representing valleys) are more likely to be filled by surrounding markers. This is normally achieved by filtering the image with a high-pass filter like Sobel or using an edge detector such as Canny. 3. Run the watershed algorithm In order to determine markers for step 1, we train a neural network which outputs the likelihood of a point being part of a finger, given its coordinates and values of surrounding pixels. When used to run an experiment, :py:class:`bob.bio.vein.preprocessor.WatershedMask` requires you provide a *pre-trained* neural network model that presets the markers before watershedding takes place. In order to create one, you can run the program `bob_bio_vein_markdet.py`: .. code-block:: sh $ bob_bio_vein_markdet.py --hidden=20 --samples=500 fv3d central dev You input, as arguments to this application, the database, protocol and subset name you wish to use for training the network. The data is loaded observing a total maximum number of samples from the dataset (passed with ``--samples=N``), the network is trained and recorded into an HDF5 file (by default, the file is called ``model.hdf5``, but the name can be changed with the option ``--model=``). Once you have a model, you can use the preprocessor mask by constructing an object and attaching it to the :py:class:`bob.bio.vein.preprocessor.Preprocessor` entry on your configuration. Region of Interest Goodness of Fit ================================== Automatic region of interest (RoI) finding and cropping can be evaluated using a couple of scripts available in this package. The program ``bob_bio_vein_compare_rois.py`` compares two sets of ``preprocessed`` images and masks, generated by *different* preprocessors (see :py:class:`bob.bio.base.preprocessor.Preprocessor`) and calculates a few metrics to help you determine how both techniques compare. Normally, the program is used to compare the result of automatic RoI to manually annoted regions on the same images. To use it, just point it to the outputs of two experiments representing the manually annotated regions and automatically extracted ones. E.g.: .. code-block:: sh $ bob_bio_vein_compare_rois.py ~/verafinger/mc_annot/preprocessed ~/verafinger/mc/preprocessed Jaccard index: 9.60e-01 +- 5.98e-02 Intersection ratio (m1): 9.79e-01 +- 5.81e-02 Intersection ratio of complement (m2): 1.96e-02 +- 1.53e-02 Values printed by the script correspond to the `Jaccard index`_ (:py:func:`bob.bio.vein.preprocessor.utils.jaccard_index`), as well as the intersection ratio between the manual and automatically generated masks (:py:func:`bob.bio.vein.preprocessor.utils.intersect_ratio`) and the ratio to the complement of the intersection with respect to the automatically generated mask (:py:func:`bob.bio.vein.preprocessor.utils.intersect_ratio_of_complement`). You can use the option ``-n 5`` to print the 5 worst cases according to each of the metrics. Pipeline Display ================ You can use the program ``bob_bio_vein_view_sample.py`` to display the images after full processing using: .. code-block:: sh $ bob_bio_vein_view_sample.py --save=output-dir verafinger /path/to/processed/directory 030-M/030_L_1 $ # open output-dir And you should be able to view images like these (example taken from the Vera fingervein database, using the automatic annotator and Maximum Curvature feature extractor): .. figure:: img/preprocessed.* :scale: 50% Example RoI overlayed on finger vein image of the Vera fingervein database, as produced by the script ``bob_bio_vein_view_sample.py``. .. figure:: img/binarized.* :scale: 50% Example of fingervein image from the Vera fingervein database, binarized by using Maximum Curvature, after pre-processing. .. include:: links.rst