.. vim: set fileencoding=utf-8 : .. @author: Olegs Nikisins .. @date: May 2017 ============================================ High Level Database Interface How-To Guide ============================================ The *high level database interface* (HLDI) is needed to run biometric experiments using non-filelist databases (e.g. if one wants to use SQL-based database package). This tutorial explains how to create a *high level* database interface, using as an example ``bob.pad.*`` framework (e.g. ``bob.pad.face``). The process is similar for ``bob.bio`` frameworks, e.g. ``bob.bio.face``, ``bob.bio.vein``). High level database interface is a link between low level database interface/package (e.g. ``bob.db.replay``) and a corresponding framework used to run biometric experiments (e.g. ``bob.pad.face``). Generally speaking, the low level interface has lot's of querying options, which are not always used in the corresponding biometric framework. High level interface only contains the functionality, which is needed to run biometric experiments. This, must have functionality, is defined in the corresponding base classes and is discussed next. First thing you need to do is to create a ``*.py`` file containing your high level implementation, for example: ``bob/pad/face/database/replay.py`` for the Replay database. This file must be placed into corresponding biometric framework, which in this case is ``bob.pad.face`` package. The file **must** contain the implementation of two classes: - ``File`` - ``Database`` For example, the names of the above classes for the *Replay* database used in the ``bob.pad.face`` framework are: ``ReplayPadFile`` and ``ReplayPadDatabase``. Implementation of the ``*File`` class ------------------------------------- First of all, the ``*File`` class must inherit from the **base file class** of the corresponding biometric framework. An example: - ``*File`` class for the Replay database used in PAD (Presentation Attack Detection) experiments: ``class ReplayPadFile(PadFile):`` - ``*File`` class for the Biowave V1 database used in verification experiments: ``class BiowaveV1BioFile(BioFile):`` Base class defines the elements, which must be implemented in the derived class. For example, the implementation of ``ReplayPadFile`` class must set the following elements of the base class: ``client_id``, ``path``, ``attack_type`` and ``file_id``. The corresponding high level implementation of the ``ReplayPadFile`` class might look as follows: .. code:: python import bob.bio.video from bob.pad.base.database import PadFile class ReplayPadFile(PadFile): def __init__(self, f): self.__f = f # here ``f`` is an instance of the File class defined in the low level database interface if f.is_real(): attack_type = None else: attack_type = 'attack' super(ReplayPadFile, self).__init__(client_id=f.client, path=f.path, attack_type=attack_type, file_id=f.id) def load(self, directory=None, extension='.mov'): path = self.f.make_path(directory=directory, extension=extension) frame_selector = bob.bio.video.FrameSelector(selection_style = 'all') video_data = frame_selector(path) bbx_data = one_file.bbx(directory=directory) return_dictionary = {} return_dictionary["data"] = video_data return_dictionary["annotations"] = bbx_data return return_dictionary Please, note, that in our case the ``ReplayPadFile`` also has a ``load()`` method. *Note: the load() method of the high level ``*File`` class is used by the preprocessor (a very first block in every biometric pipeline) to read the data from the database.* Not all high level database interfaces require this method, but let's try to understand why ``ReplayPadFile`` class has it. The necessity to have this method comes from the fact, that Replay database contains **video** files, not images. To understand why ``load()`` method is needed in the case of video-based database we need to take a look at the inheritance structure of the class. For the ``ReplayPadFile`` class it looks as follows: - ``ReplayPadFile`` -> ``bob.pad.base.database.PadFile`` -> ``bob.bio.base.database.BioFile`` -> ``bob.db.base.File`` Here the notation ``A`` -> ``B`` means ``A`` inherits from ``B``. Well, the inheritance is pretty deep, but no need to worry about this. The class of interest for us is ``bob.db.base.File`` containing the default file managing methods, which might be overridden if necessary. One of methods is ``load()`` **not** supporting video files by default. Since a different behavior is desired, we need to override it in the high level implementation of the ``*File`` class, ``ReplayPadFile`` in this case. In this example the ``load()`` method returns the dictionary, which contains the video frames, and annotations defining the face bounding box in each frame. The preprocessor has to be "ready to deal" with that type of input. With this, we are done configuring the high level implementation of the ``*File`` class. Implementation of the ``*Database`` class ----------------------------------------- The second unit to be implemented in HLDI is the ``*Database`` class. First of all the ``*Database`` class must inherit from the **base database class** of the corresponding biometric framework. An example: - ``*Database`` class for the Replay database used in PAD (Presentation Attack Detection) experiments: ``class ReplayPadDatabase(PadDatabase):`` - ``*Database`` class for the Biowave V1 database used in verification experiments: ``class BiowaveV1BioDatabase(BioDatabase):`` Let's consider an example of the ``ReplayPadDatabase`` class. The implementation might look as follows, but don't dive into the code yet: .. code:: python from bob.pad.base.database import PadDatabase class ReplayPadDatabase(PadDatabase): def __init__( self, all_files_options={}, check_original_files_for_existence=False, original_directory=None, original_extension=None, # here I have said grandtest because this is the name of the default # protocol for this database protocol='grandtest', **kwargs): self.db = LowLevelDatabase() # Since the high level API expects different group names than what the low # level API offers, you need to convert them when necessary self.low_level_group_names = ('train', 'devel', 'test') # group names in the low-level database interface self.high_level_group_names = ('train', 'dev', 'eval') # names are expected to be like that in objects() function super(ReplayPadDatabase, self).__init__( 'replay', all_files_options, check_original_files_for_existence, original_directory, original_extension, protocol, **kwargs) def objects(self, groups=None, protocol=None, purposes=None, model_ids=None, **kwargs): # Convert group names to low-level group names here. groups = self.convert_names_to_lowlevel(groups, self.low_level_group_names, self.high_level_group_names) files = self.db.objects(protocol=protocol, groups=groups, cls=purposes, **kwargs) files = [ReplayPadFile(f) for f in files] return files def annotations(self, file): """ Do nothing. In this particular implementation the annotations are returned in the *File class above. """ return None Instead, let's try to understand why the implementation looks like this. Again, the methods to be implemented are defined by the corresponding base class of our ``*Database`` class. In the case of PAD ``*Database`` the inheritance structure is as follows: - ``ReplayPadDatabase`` -> ``bob.pad.base.database.PadDatabase`` -> ``bob.bio.base.database.BioDatabase`` -> ``bob.db.base.Database`` For the verification database the inheritance would be: - ``bob.pad.base.database.PadDatabase`` -> ``bob.bio.base.database.BioDatabase`` -> ``bob.db.base.Database`` For other biometric experiments it might look differently. In the given example the behavior of the ``ReplayPadDatabase`` class is defined by the ``bob.pad.base.database.PadDatabase`` base class, which sates that two methods must be implemented in the high level database implementation: ``objects()`` and ``annotations()``. The ``objects()`` method returns a list of instances of ``ReplayPadFile`` class. The ``annotations()`` method is empty, since the developer of the code decided to return the annotations in the ``*File`` class. Note: you are not obliged to do it that way, it's just a matter of taste. At this point, having all necessary classes in place, we are done with implementation of the high level database interface! Just a few small things have to be done to register our high level interface in the corresponding biometric framework. - First, import your package in the ``__init__.py`` file located in the folder containing the implementation of HLDI: ``from .replay import ReplayPadDatabase`` - Next, create an instance of the ``*Database`` class with default configuration. For example, for the ``ReplayPadDatabase`` class used in ``bob.pad.face`` framework, the default configuration file ``/bob/pad/face/config/database/replay.py`` is as follows: .. code:: python # The original_directory is taken from the .bob_bio_databases.txt file located in your home directory original_directory = "[YOUR_REPLAY_ATTACK_DIRECTORY]" original_extension = ".mov" # extension of the data files database = ReplayPadDatabase( protocol='grandtest', original_directory=original_directory, original_extension=original_extension, training_depends_on_protocol=True, ) - Finally, in the ``setup.py`` file of the corresponding biometric framework, add the entry pointing to your default configuration. In the case of observed PAD example the code is: .. code:: python entry_points = { 'bob.pad.database': [ 'replay = bob.pad.face.config.database.replay:database', ], }, That's it! Now we are ready to use our database in the corresponding biometric framework.