Source code for bob.bio.vein.database.utfvp

#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
# Victor <vbros@idiap.ch>

"""
  UTFVP database implementation
"""

from sklearn.pipeline import make_pipeline

import bob.io.base

from bob.bio.base.database import CSVDataset, CSVToSampleLoaderBiometrics
from bob.bio.vein.database.roi_annotation import ROIAnnotation
from bob.extension import rc
from bob.extension.download import get_file


class UtfvpDatabase(CSVDataset):
    """
    The University of Twente Finger Vascular Pattern dataset

    .. warning::

      To use this dataset protocol, you need to have the original files of the UTFVP dataset.
      Once you have downloaded them, run the following command to set the path for Bob:

        .. code-block:: sh

            bob config set bob.bio.vein.utfvp.directory [DATABASE PATH]

    The fingervein image database consists of 1440 images taken in 2 distinct
    sessions on two days (May 9th, 2012 and May 23rd, 2012) using a custom-built
    fingervein sensor. In each session, each of the 60 subjects in the dataset was
    asked to present 6 fingers to the sensor twice, making up separate tries. The
    six fingers are the left and right ring, middle and index fingers. Therefore,
    the database contains 60x6 = 360 unique fingers.

    Files in the database have a strict naming convention and are organized in
    directories following their subject identifier like so:
    ``0003/0003_5_2_120509-141536``. The fields can be interpreted as
    ``<subject-id>/<subject-id>_<finger-name>_<trial>_<date>-<hour>``. The subject
    identifier is written as a 4-digit number with leading zeroes, varying from 1
    to 60. The finger name is one of the following:

      * **1**: Left ring
      * **2**: Left middle
      * **3**: Left index
      * **4**: Right index
      * **5**: Right middle
      * **6**: Right ring

    The trial identifier varies between 1 and 4. The first two tries were
    captured during the first session and the last two during the second session.
    Given the difference in the images between trials on the same day, we assume
    users were asked to remove the finger from the device and re-position it
    afterwards.

    **Annotations**

    We provide region-of-interest (RoI) **hand-made** annotations for all images in
    this dataset. The annotations mark the place where the finger is on the image,
    excluding the background. The annotation file is a text file with one
    annotation per line in the format ``(y, x)``, respecting Bob's image encoding
    convention. The interconnection of these points in a polygon forms the RoI.

    .. warning::

      To use the annotations, you need to provide the RoI files.
      Once you have downloaded them, run the following command to set the path for Bob:

        .. code-block:: sh

            bob config set bob.bio.vein.utfvp.roi [ANNOTATION PATH]


    **Protocols**

    There are 15 protocols implemented in this package:

     * 1vsall
     * nom
     * nomLeftRing
     * nomLeftMiddle
     * nomLeftIndex
     * nomRightIndex
     * nomRightMiddle
     * nomRightRing
     * full
     * fullLeftRing
     * fullLeftMiddle
     * fullLeftIndex
     * fullRightIndex
     * fullRightMiddle
     * fullRightRing

    **"nom" Protocols**

    "nom" means "normal operation mode". In this set of protocols, images from
    different clients are separated into different sets that can be used for system
    training, validation and evaluation:

    * Fingers from clients in the range [1, 10] are used in the training set
    * Fingers from clients in the range [11, 28] are used in the development (or validation) set
    * Fingers from clients in the range [29, 60] are used in the evaluation (or test) set

    Data from the first session (both trials) can be used for enrolling the finger,
    while data from the last session (both trials) should be used exclusively for
    probing the finger. As set up by this database interface, each sample is
    returned as a separate enrollment model. If a single score per
    finger is required, the user must manipulate the final score listings and fuse
    the results themselves.

    Matching happens exhaustively between all probes and models. The variants named
    "nomLeftRing", for example, contain the data filtered by finger name as per the
    listings above. For example, "Left Ring" means all files named
    ``*/*_1_*_*-*.png``. Therefore, the equivalent protocol contains only 1/6 of
    the files of its complete ``nom`` version.


    **"full" Protocols**


    "full" protocols are meant to match current practices in fingervein reporting
    in which most published material doesn't use a separate evaluation set. All data
    is placed in the development (or validation) set. In these protocols, all
    images are used both for enrolling and for probing fingers. This is, of course,
    a biased setup. Matching happens exhaustively between all samples in the
    development set.


    The variants named "fullLeftRing", for example, contain the data filtered by
    finger name as per the listings above. For example, "Left Ring" means all files
    named ``*/*_1_*_*-*.png``. Therefore, the equivalent protocol contains only 1/6
    of the files of its complete ``full`` version.


    **"1vsall" Protocol**

    The "1vsall" protocol is meant as a cross-validation protocol. All data from
    the dataset is split into training and development (or validation). No samples
    are allocated for a separate evaluation (or test) set. The training set is
    composed of all samples of fingers ``0001_1`` (left ring finger of subject 1),
    ``0002_2`` (left middle finger of subject 2), ``0003_3`` (left index finger of
    subject 3), ``0004_4`` (right index finger of subject 4), ``0005_5`` (right
    middle finger of subject 5), ``0006_6`` (right ring finger of subject 6),
    ``0007_1`` (left ring finger of subject 7), ``0008_2`` (left middle finger of
    subject 8) and so on, until subject 35 (inclusive). There are 140 images in
    total in this set (35 fingers x 4 images each).

    All other 1300 samples from the dataset are used as a development (or
    validation) set. Each sample generates a single model and is used as a probe
    for all other models. Matching happens exhaustively, except that a model is
    never matched against the image that generated it. So, there are 1299 probes
    per model.

    """

    def __init__(self, protocol):
        # Download the protocol definition files if they are not available locally
        urls = UtfvpDatabase.urls()
        filename = get_file(
            "utfvp.tar.gz",
            urls,
            file_hash="526045842fcee46eec3415bfc8ac34d3",
        )

        super().__init__(
            name="utfvp",
            dataset_protocol_path=filename,
            protocol=protocol,
            csv_to_sample_loader=make_pipeline(
                CSVToSampleLoaderBiometrics(
                    data_loader=bob.io.base.load,
                    dataset_original_directory=rc.get(
                        "bob.bio.vein.utfvp.directory", ""
                    ),
                    extension="",
                    reference_id_equal_subject_id=False,
                ),
                ROIAnnotation(roi_path=rc.get("bob.bio.vein.utfvp.roi", "")),
            ),
            score_all_vs_all=True,
        )

    @staticmethod
    def protocols():
        # TODO: Until we have (if we have) a function that dumps the protocols, let's use this one.
        return [
            "nom",
            "full",
            "1vsall",
            "nomLeftRing",
            "nomRightRing",
            "nomLeftMiddle",
            "nomRightMiddle",
            "nomLeftIndex",
            "nomRightIndex",
            "fullLeftRing",
            "fullRightRing",
            "fullLeftMiddle",
            "fullRightMiddle",
            "fullLeftIndex",
            "fullRightIndex",
        ]

    @staticmethod
    def urls():
        return [
            "https://www.idiap.ch/software/bob/databases/latest/utfvp-557bfdd2.tar.gz",
            "http://www.idiap.ch/software/bob/databases/latest/utfvp-557bfdd2.tar.gz",
        ]
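
The naming convention and the subject-based splits described in the class docstring can be sketched in plain Python. This is not part of `bob.bio.vein`: `parse_stem`, `nom_group`, `in_1vsall_training` and the group labels `"train"`, `"dev"` and `"eval"` are hypothetical names chosen for illustration. Two points are assumptions rather than documented facts: the `<date>-<hour>` fields are read as `yymmdd-HHMMSS`, and the "1vsall" training rule is inferred from the listed examples (`0001_1`, `0002_2`, ..., `0007_1`, ...).

```python
import re
from datetime import datetime
from typing import NamedTuple

# Finger codes as listed in the class docstring.
FINGER_NAMES = {
    1: "Left ring",
    2: "Left middle",
    3: "Left index",
    4: "Right index",
    5: "Right middle",
    6: "Right ring",
}

# <subject-id>/<subject-id>_<finger-name>_<trial>_<date>-<hour>
_STEM = re.compile(
    r"^(?P<folder>\d{4})/(?P<subject>\d{4})_(?P<finger>[1-6])"
    r"_(?P<trial>[1-4])_(?P<date>\d{6})-(?P<hour>\d{6})$"
)


class UtfvpStem(NamedTuple):
    subject: int
    finger: int
    trial: int
    session: int
    captured: datetime


def parse_stem(stem: str) -> UtfvpStem:
    """Split a file stem such as ``0003/0003_5_2_120509-141536`` into fields."""
    match = _STEM.match(stem)
    if match is None or match["folder"] != match["subject"]:
        raise ValueError(f"not a UTFVP file stem: {stem!r}")
    subject = int(match["subject"])
    if not 1 <= subject <= 60:
        raise ValueError(f"subject out of range [1, 60]: {subject}")
    trial = int(match["trial"])
    # Assumption: the timestamp is encoded as yymmdd-HHMMSS.
    captured = datetime.strptime(match["date"] + match["hour"], "%y%m%d%H%M%S")
    # Trials 1 and 2 belong to the first session, trials 3 and 4 to the second.
    session = 1 if trial <= 2 else 2
    return UtfvpStem(subject, int(match["finger"]), trial, session, captured)


def nom_group(subject: int) -> str:
    """Return the "nom" protocol group (hypothetical labels) for a subject."""
    if 1 <= subject <= 10:
        return "train"
    if 11 <= subject <= 28:
        return "dev"
    if 29 <= subject <= 60:
        return "eval"
    raise ValueError(f"subject out of range [1, 60]: {subject}")


def in_1vsall_training(subject: int, finger: int) -> bool:
    """Inferred rule: subjects 1..35 contribute one finger, cycling 1..6."""
    return 1 <= subject <= 35 and finger == (subject - 1) % 6 + 1
```

For the example stem from the docstring, `parse_stem("0003/0003_5_2_120509-141536")` yields subject 3, finger 5 (`FINGER_NAMES[5]` is "Right middle"), trial 2 and session 1. Counting the fingers selected by `in_1vsall_training` over all subjects and multiplying by the 4 images per finger reproduces the 140 training images stated in the docstring, leaving 1300 for the development set.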