The best way to create a new database for Bob is by first looking at
examples of working databases that are similar to the dataset you want to
create an API for. The name of database interface packages for Bob
usually start with
bob.db.. Take a look at our available packages list
to find available Bob db packages.
In this development guide, we provide a list of guidelines on how to structure your database and solve common development issues. We’ll use the example on the Using Bob Databases of the Samples database (see: Example Database) inside this guide.
This development guide is not a replacement for our official Bob Package Development guide, which explains how to structure packages. It only contains additional guidelines you should consider when developing db packages.
All database packages should be named
<dataset-slug> corresponds to a short name for the dataset you’re trying to
bind. As a general recommendation, use small and catchy names. Avoid long names
which are confusing and difficult to type and remember.
Examples of good, catchy names:
atnt: for the AT&T faces database
iris: for Fisher’s Iris flower dataset
mobio: for the Mobio biometric database
Examples of more inappropriate names (avoid these):
database_cut1_whatnot(avoid very long package names)
test-mine(avoid dashes - they don’t play well with python)
set_v2(avoid version names)
Most people find it beneficial to share a common module organization so that it is easy for everyone to maintain different databases. This is our typical directory organization, which you should try to adhere to:
+-- __init__.py #namespace init for "bob"
+-- __init__.py #namespace init for "bob.db"
models.py include all classes and functions that represent the
task described by the database. If, for example, the database is composed of
image samples and associated tags, possible classes in
Tag. See an example at
query.py normally only contains the definition of the
class. Finally, the file
driver.py contains the connector allowing the
bob_dbmanage.py to interface with your python API for
It is conventional that all elements inside
imported into the package’s
__init__.py to avoid obliging the user to understand
driver.py should declare a database with a name matching
<dataset-slug> and implement at least the following methods from
bob.db.base.driver.Interface.version(): Here, insert the db package version. This is typically done by using
pkg_resources. See an example in the source code for
bob.db.base.driver.Interface.type(): Returns the type of the backend implementation. The return value of this function on your driver implementation will allow
bob_dbmanage.pyto provide specialized actions for SQL-backend db package implementations. The value returned should be either
test.py should contain basic test units for all functionality
shipped with the database. This should contain, at least:
Tests for reading out samples,
Tests for sub-selection of samples using parameters of the
Documentation should accompany the package and indicate how to use the db package Python API and its command-line interface, with examples and appropriate doctests.
If the raw dataset in question is composed of files, it may be beneficial to re-use (through inheritance) some classes available in this package. Read the documentation of:
In this case, make sure your
Sample objects inherit from
File, as is,
for example, the case for
Equivalently, you may also benefit from some database constructions if your
Database inherits from
SQL-Backend for File Databases¶
So far, we have exemplified the implementation and organization of a simple db package, for which the dataset contained only a few raw image samples and a single evaluation protocol. For very complex problems, in which datasets contain many hundreds of raw samples and multiple evaluation protocols, a more complex modelling of the internals of the Bob db package may be required. In such cases, we recommend prospective developers to consider using alternative techniques (as opposed to simple file lists) for implementing the internals of their db packages. In this guide, we introduce how to handle database backends using SQLite through SQLAlchemy, for which support is built into this package.
It is important to note that using a simple or complex backend implementation for storing and retrieving iterables from the database must be completely transparent to the db package user. Users of your Bob db package are primarily interested in iterating over (sub-selected) samples and executing their pipelines. The choice of the backend to use for a given db package must be, therefore, totally opaque to them.
Understanding SQL, database structuring, normalization and how to deploy an ORM (such as SQLAlchemy) is beyond the scope of this guide. Search for guides and tutorials on the net to familiarize yourself with these subjects before trying to make a backend based with SQLAlchemy.
Module Organization for SQLite¶
The module organization of typical SQL-backend db packages is very similar to simple file-based ones:
+-- __init__.py #namespace init for "bob"
+-- __init__.py #namespace init for "bob.db"
models.py will contain the definition of the SQL tables for every
component in the database. One possible table in the SQL database will be that
Sample’s. If you’re designing a db package for a dataset with a
one-file-per-sample storage model, ensure your
Sample class also inherits
bob.db.base.File to provide a uniform experience to users
already used to bob.db interfaces. Other tables and relationships are optional
and should map your problem alongside its contraints to a proper database
query.py will contain the definition of the
Database class. We
recommend you consider, in this case, inheriting from
bob.db.base.SQLiteDatabase, which provides a number of utilities to
handle file-based datasets with an SQLite backend. You’ll find examples among
different Bob db packages for this. Typically, elements returned by the
objects() in this case are ORM objects from SQLAlchemy,
representing a row in a table of your internal database. However, these objects
behave exactly the same as non-SQL
Sample objects and allow the user to
transparently load file contents and meta data using simple API calls.
driver.py will be very similar to other databases, with a few
bob.db.base.driver.Interface.files()implementation will return at least the path to the
db.sql3file, which will contain the db package backend information. This will allow
bob_dbmanage.pyto download this file in installations it misses or upload updated versions of it to our central server.
SQL-backend db packages normally install a
createcommand at the driver
bob.db.base.driver.Interfaceallowing developers to create the
db.sql3file from scratch. It is important to have a create command so that the database can be re-created in case of changes.
create.py typically contains routines for the creation of
db.sql3 from scratch and is not required per se. Functions and modules
create.py are imported into
driver.py for the
instantiation of the
create command. See examples in db packages for
mobio or verafinger.
Finally, the file
test.py should contain the usual set of tests, as for
The package documentation should contain all information regarding the Python API, command-line interface and, if possible, the SQL-backend database design (showing tables, column types and constraints where adequate).
Metafiles Not Shipped with the Database¶
Very often, Bob db packages require the use of support files which should exist inside the package structure, but are not kept under version control. Reasons for this may be that these meta-files are too large or can be recreated programmatically. Such files may be of different natures and each developer should be able to recognize those easily when the situation occurs. Here is a non-exhaustive list of possible use cases for such metafiles:
File lists (for example, defining evaluation protocols or such)
Auxiliary database files (for example, Sqlite database files)
Samples (in case you want to ship them with your database)
In order to mitigate issues related to management, this package provides a set
of utilies to handle such meta-files more easily. To prepare your package for
handling metafiles, you must first make sure that the
Interface class returns a non-empty list as a result of the
Each entry in the list returned by
files() should represent the full
path of the file, considering the current installation location. The method
bob.db.base.driver.Interface should return sqlite.
A typical implementation for SQL-backend db packages is like this:
from pkg_resources import resource_filename
raw_files = ('db.sql3',)
return [resource_filename(__name__, k) for k in raw_files]
Metafiles Shipped with the Python Package¶
If you’d like the file
db.sql3 to be shipped to PyPI when you publish
your package, make sure to include
db.sql3 in the package’s
file. Otherwise, you are not required to add this file to the package manifest.
Download Missing Files for Large Databases¶
If you declared extra metafiles with your driver’s
it is possible to both store and retrieve metafiles from a central file server
running at Idiap (see http://www.idiap.ch/software/bob/databases/latest). All
metafiles of a package are wrapped into a single tar-ball and copied to the
server upon uploading. The reverse process takes place when downloading.
This mechanism allows third-parties to download sources from the version control repository and retrieve the metafiles.
To download and install metafiles for a package, do:
$ bob_dbmanage.py <database-name> download
For example, you can use the special database name
all, together with the
--missing to download the missing metafiles of all installed databases
$ bob_dbmanage.py all download --missing
Low and High-Level Interfaces¶
Bob database interfaces come in two flavours:
Low-level interfaces allow developers to create programmatic APIs to access samples and metadata available with databases as they are distributed by their controllers. Examples of this are the Samples database in this package or APIs provided in any other db packages. The main objective of a low-level database interface is to provide access to all information provided with the database, without direct regards to the specific task it was originally conceived for. The reasoning behind this design choice lies in the fact that databases very often find second lives in different tasks than originally intended. By providing access to all information available from the raw dataset, a developer potentialises such (re-)use cases.
High-level interfaces allow developers to create programmatic APIs to bind low-level interfaces to frameworks that perform a specific function. Because each low-level databases should be created to export all available information, in some cases it is possible to re-use an existing db package as input to a different task than it was originally conceived for. Here are some examples:
Re-use a database for emotion recognition to perform remote photo-plethysmographic (see
Re-use a face recognition database to train a face detector
Re-use a speaker recognition database to do speech recognition
High-level database interfaces are, therefore, very task specific and normally sit together with frameworks doing high-level experimental research. Examples of such frameworks are bob.bio.base (biometric recognition) and bob.pad.base (presentation attack vulnerability and detection). Checkout their user guides for more information on specific high-level implementations required by those tasks.