Python API

This section is the reference guide for the Python API of bob.db.base.

The db package contains simplified APIs to access data for various databases that can be used in Biometry, Machine Learning or Pattern Classification.

class bob.db.base.Database(original_directory=None, original_extension=None, **kwargs)

Bases: bob.db.base.FileDatabase

This class is deprecated. New databases should use the bob.db.base.FileDatabase class instead, if required.

class bob.db.base.File(path, file_id=None, **kwargs)

Bases: object

Abstract class that defines the basic properties of File objects.

Your file instance should have at least the self.id and self.path properties.

load(directory=None, extension='.hdf5')[source]

Loads the data at the specified location and using the given extension. Override it if you need to load differently.

Parameters
  • directory (str, optional) – If not empty or None, this directory is prefixed to the final file destination

  • extension (str, optional) – If not empty or None, this extension is suffixed to the final file destination

Returns

The loaded data (normally numpy.ndarray).

Return type

object

make_path(directory=None, extension=None)[source]

Wraps the current path so that a complete path is formed.

Parameters
  • directory (str, optional) – An optional directory name that will be prefixed to the returned result.

  • extension (str, optional) – An optional extension that will be suffixed to the returned filename. The extension normally includes the leading . character as in .jpg or .hdf5.

Returns

A string containing the newly generated file path.

Return type

str
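As a rough illustration of the prefix and suffix behavior described for make_path(), the following standalone sketch composes a path in the same spirit. The function name make_path_sketch and the sample paths are hypothetical, and the real method may differ in detail.

```python
import os


def make_path_sketch(path, directory=None, extension=None):
    """Prefixes `directory` and suffixes `extension` to a file stem.

    Hypothetical stand-in for File.make_path(); empty or None values
    are simply skipped.
    """
    return os.path.join(directory or "", path + (extension or ""))


# The stem is stored without directory or extension, both are added on demand.
print(make_path_sketch("subject01/session1", "/data/raw", ".png"))
```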

save(data, directory=None, extension='.hdf5', create_directories=True)[source]

Saves the input data at the specified location and using the given extension. Override it if you need to save differently.

Parameters
  • data (object) – The data blob to be saved (normally a numpy.ndarray).

  • directory (str, optional) – If not empty or None, this directory is prefixed to the final file destination

  • extension (str, optional) – The extension of the filename - this will control the type of output and the codec for saving the input blob.

  • create_directories (bool, optional) – Whether to create the required directories to save the data.

class bob.db.base.FileDatabase(original_directory, original_extension, **kwargs)

Bases: object

Low-level File-based Database API to be used within Bob.

Not all Databases in Bob need to inherit from this class. Use this class only if in your database one sample correlates to one actual file.

original_directory

The directory where the raw files are located.

Type

str

original_extension

The extension of raw data files, e.g. .png.

Type

str

original_file_name(file)[source]

This function returns the original file name for the given File object.

Parameters

file (bob.db.base.File or a derivative) – The File object for which the file name should be retrieved.

Returns

The original file name for the given bob.db.base.File object.

Return type

str

Raises

ValueError – if the file is not found.

original_file_names(files)[source]

Returns the full path of the original data of the given File objects.

Parameters

files (list of bob.db.base.File) – The list of file objects to retrieve the original data file names for.

Returns

The paths extracted for the files, in the same order.

Return type

list of str

class bob.db.base.SQLiteBaseDatabase(sqlite_file, file_class, **kwargs)

Bases: object

This class can be used for handling SQL databases.

It opens an SQL database in a read-only mode and keeps it opened during the whole session.

Parameters
  • sqlite_file (str) – The file name (including full path) of the SQLite file to read or generate.

  • file_class (bob.db.base.File) – The File class, which needs to be derived from bob.db.base.File. This is required to be able to query() the databases later on.

m_file_class

The file_class parameter is kept in this attribute.

Type

bob.db.base.File

m_session

The SQL session object.

Type

object

m_sqlite_file

The sqlite_file parameter is kept in this attribute.

Type

str

all_files(**kwargs)[source]

Returns the list of all File objects that satisfy your query.

For possible keyword arguments, please check the implementation’s objects() method.

assert_validity()[source]

Raise a RuntimeError if the database back-end is not available.

files(ids, preserve_order=True)[source]

Returns a list of File objects with the given file ids.

Parameters
  • ids (list or tuple) – The ids of the object in the database table “file”. This object should be a python iterable (such as a tuple or list).

  • preserve_order (bool) – If True (the default) the order of elements is preserved, but the execution time increases.

Returns

a list (that may be empty) of File objects.

Return type

list
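A plausible reason why preserve_order costs extra time is that query results come back in an arbitrary order and have to be rearranged to match the requested ids. The sketch below shows that rearrangement on plain Python objects; Row and restore_order are hypothetical names, and this is not the library's actual implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for a File row returned by a query.
Row = namedtuple("Row", ["id", "path"])


def restore_order(ids, rows):
    """Returns `rows` rearranged to follow the order of `ids`.

    Ids that have no matching row are skipped.
    """
    by_id = {row.id: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]


# Rows as a database might return them, in no particular order.
unordered = [Row(1, "a"), Row(2, "b"), Row(3, "c")]
ordered = restore_order([3, 1, 2], unordered)
```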

is_valid()[source]

Returns whether a valid session has been opened for reading the database.

paths(ids, prefix=None, suffix=None, preserve_order=True)[source]

Returns the full file paths for the given file ids.

Parameters
  • ids (list or tuple) – The ids of the object in the database table “file”. This object should be a python iterable (such as a tuple or list).

  • prefix (str, optional) – The bit of path to be prepended to the filename stem

  • suffix (str, optional) – The extension determines the suffix that will be appended to the filename stem.

  • preserve_order (bool) – If True (the default) the order of elements is preserved, but the execution time increases.

Returns

A list (that may be empty) of the fully constructed paths given the file ids.

Return type

list

query(*args)[source]

Creates a query to the database using the given arguments.

reverse(paths, preserve_order=True)[source]

Reverses the lookup: given certain paths, returns a list of bob.db.base.File objects.

Parameters
  • paths (list) – The filename stems (list of str) to query for. This object should be a python iterable (such as a tuple or list)

  • preserve_order (bool, optional) – If True (the default) the order of elements is preserved, but the execution time increases.

Returns

A list (that may be empty) of bob.db.base.File objects.

Return type

list

uniquify(file_list)[source]

Sorts the given list of File objects and removes duplicates from it.

Parameters

file_list ([bob.db.base.File]) – A list of File objects to be handled. Also other objects can be handled, as long as they are sortable.

Returns

A sorted copy of the given file_list with the duplicates removed.

Return type

list
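The behavior described for uniquify() can be pictured with the short sketch below, which assumes the items are both hashable and sortable. The name uniquify_sketch is hypothetical, and the original list is left untouched.

```python
def uniquify_sketch(items):
    """Returns a sorted copy of `items` with duplicates removed.

    Assumes the items are hashable (for set) and sortable (for sorted).
    """
    return sorted(set(items))


original = [3, 1, 3, 2, 1]
cleaned = uniquify_sketch(original)
```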

class bob.db.base.SQLiteDatabase(sqlite_file, file_class, original_directory, original_extension, **kwargs)

Bases: bob.db.base.SQLiteBaseDatabase, bob.db.base.FileDatabase

This class can be used for handling SQL File based databases.

It inherits from bob.db.base.SQLiteBaseDatabase and bob.db.base.FileDatabase.

bob.db.base.get_config()[source]

Returns a string containing the configuration information.

bob.db.base.read_annotation_file(file_name, annotation_type)

This function provides default functionality to read annotation files.

Parameters
  • file_name (str) – The full path of the annotation file to read. The path can also be like base_path:relative_path where the base_path can be both a directory or a tarball. This allows you to read annotations from inside a tarball.

  • annotation_type (str) –

    The type of the annotation file that should be read. The following annotation_types are supported:

    • eyecenter: The file contains a single row with four entries: re_x re_y le_x le_y

    • named: The file contains named annotations, one per line, e.g.: reye re_x re_y or pose 25.7

    • idiap: The file contains enumerated annotations, one per line, e.g.: 1 key1_x key1_y, and maybe some additional annotations like gender, age, …

    • json: The file contains annotations of any format, dumped in a text json file.

Returns

A python dictionary with the keypoint name as key and the position (y,x) as value, and maybe some additional annotations.

Return type

dict

Raises
  • IOError – If the annotation file is not found.

  • ValueError – If the annotation type is not known.
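To make the eyecenter and named formats more concrete, here is a small parsing sketch that works on text instead of files. It is only an illustration of the formats listed above: the function names are hypothetical, and the reye and leye keys used for the eyecenter format are an assumption, not confirmed by this page.

```python
def parse_eyecenter(text):
    """Parses one row 're_x re_y le_x le_y' into (y, x) positions.

    The 'reye' and 'leye' key names are assumptions for illustration.
    """
    values = [float(v) for v in text.split()]
    if len(values) != 4:
        raise ValueError("expected four entries: re_x re_y le_x le_y")
    re_x, re_y, le_x, le_y = values
    return {"reye": (re_y, re_x), "leye": (le_y, le_x)}


def parse_named(text):
    """Parses lines such as 'reye re_x re_y' or 'pose 25.7'."""
    annotations = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        name, values = parts[0], [float(v) for v in parts[1:]]
        if len(values) == 2:
            # Positions are stored as (y, x), the file gives x first.
            annotations[name] = (values[1], values[0])
        elif len(values) == 1:
            annotations[name] = values[0]
        else:
            raise ValueError("cannot interpret line: %r" % line)
    return annotations
```

Note that both helpers swap the coordinate order, since the files list x before y while the returned dictionary uses (y, x).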

Database Handling Utilities

Some utilities shared by many of the databases.

class bob.db.base.utils.null[source]

Bases: object

A look-alike stream that discards the input

write(s)[source]

Writes contents of string s on this stream

flush()[source]

Flushes the stream
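A stream look-alike of this kind only needs write() and flush() methods. The sketch below defines a hypothetical NullStream class with the same two methods and uses it to silence print() output; it is a stand-in, not the library class itself.

```python
import contextlib


class NullStream:
    """Hypothetical stand-in for a stream that discards everything."""

    def write(self, s):
        # Accept the text and drop it.
        pass

    def flush(self):
        # Nothing is buffered, so there is nothing to flush.
        pass


# Anything printed inside this block is silently discarded.
with contextlib.redirect_stdout(NullStream()):
    print("this text never reaches the terminal")
```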

bob.db.base.utils.apsw_is_available()[source]

Checks lock-ability for SQLite on the current file system

class bob.db.base.utils.SQLiteConnector(filename, readonly=False, lock=None)[source]

Bases: object

An object that handles the connection to SQLite databases.

Parameters
  • filename (str) – The name of the file containing the SQLite database

  • readonly (bool) – Should I try and open the database in read-only mode?

  • lock (str) – Any vfs name as output by apsw.vfsnames()

static filesystem_is_lockable(database)[source]

Checks if the filesystem is lockable

create_engine(echo=False)[source]

Returns an SQLAlchemy engine

session(echo=False)[source]

Returns an SQLAlchemy session

bob.db.base.utils.session(dbtype, dbfile, echo=False)[source]

Creates a session to an SQLite database

bob.db.base.utils.session_try_readonly(dbtype, dbfile, echo=False)[source]

Creates a read-only session to an SQLite database.

If read-only sessions are not supported by the underlying sqlite3 python DB driver, then a normal session is returned. A warning is emitted in case the underlying filesystem does not support locking properly.

Raises

NotImplementedError – if the dbtype is not supported.
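The read-only-with-fallback idea can be illustrated with the standard sqlite3 module alone. This sketch does not use SQLAlchemy and is not how bob.db.base builds its sessions; connect_try_readonly is a hypothetical helper that relies on SQLite's URI syntax (mode=ro).

```python
import pathlib
import sqlite3


def connect_try_readonly(dbfile):
    """Opens `dbfile` read-only if possible, otherwise read-write.

    Hypothetical helper: falls back to a normal connection when the
    read-only URI cannot be opened.
    """
    uri = pathlib.Path(dbfile).resolve().as_uri() + "?mode=ro"
    try:
        return sqlite3.connect(uri, uri=True)
    except sqlite3.Error:
        return sqlite3.connect(dbfile)
```

With a connection opened this way, SELECT statements work as usual while statements that modify the database fail with sqlite3.OperationalError.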

bob.db.base.utils.create_engine_try_nolock(dbtype, dbfile, echo=False)[source]

Creates an engine connected to an SQLite database with no locks.

If engines without locks are not supported by the underlying sqlite3 python DB driver, then a normal engine is returned. A warning is emitted if the underlying filesystem does not support locking properly in this case.

Raises

NotImplementedError – if the dbtype is not supported.

bob.db.base.utils.session_try_nolock(dbtype, dbfile, echo=False)[source]

Creates a session to an SQLite database with no locks.

If sessions without locks are not supported by the underlying sqlite3 python DB driver, then a normal session is returned. A warning is emitted if the underlying filesystem does not support locking properly in this case.

Raises

NotImplementedError – if the dbtype is not supported.

bob.db.base.utils.connection_string(dbtype, dbfile, opts={})[source]

Returns a connection string for supported platforms

Parameters
  • dbtype (str) – The type of database (only sqlite is supported for the time being)

  • dbfile (str) – The location of the file to be used

  • opts (dict, optional) – This is ignored.

Returns

The url.

Return type

object

bob.db.base.utils.safe_tarmembers(archive)[source]

Gets a list of safe members to extract from a tar archive

This list excludes:
  • Full paths outside the destination sandbox

  • Symbolic or hard links to outside the destination sandbox

Notes

Code came from a StackOverflow answer http://stackoverflow.com/questions/10060069

Example

Deploy it like this:

    import tarfile
    from bob.db.base.utils import safe_tarmembers

    ar = tarfile.open("foo.tar")
    ar.extractall(path="./sandbox", members=safe_tarmembers(ar))
    ar.close()

Parameters

archive (tarfile.TarFile) – An opened tar file for reading

Yields

list – A list of tarfile.TarInfo objects that satisfy the security criteria imposed by this function, as denoted above.
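The two exclusion rules listed above can be sketched with the standard library as follows. This is an independent illustration written for POSIX-style paths, not the library's code: safe_members_sketch, its sandbox parameter, and the _inside helper are all hypothetical.

```python
import os
import tarfile


def _inside(base, target):
    """True if `target`, taken relative to `base`, stays inside `base`."""
    base = os.path.realpath(base)
    target = os.path.realpath(os.path.join(base, target))
    return os.path.commonpath([base, target]) == base


def safe_members_sketch(archive, sandbox="."):
    """Yields members of `archive` that stay inside `sandbox`."""
    for info in archive.getmembers():
        if not _inside(sandbox, info.name):
            continue  # member path escapes the sandbox
        if info.issym():
            # Symbolic link targets are relative to the link's directory.
            link_dir = os.path.dirname(info.name)
            if not _inside(sandbox, os.path.join(link_dir, info.linkname)):
                continue
        elif info.islnk():
            # Hard link targets are member names relative to the root.
            if not _inside(sandbox, info.linkname):
                continue
        yield info
```

The generator would be passed to extractall() through its members argument, with sandbox set to the same directory as the extraction path.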

bob.db.base.utils.check_parameters_for_validity(parameters, parameter_description, valid_parameters, default_parameters=None)[source]

Checks the given parameters for validity.

Checks that the given parameters are in the set of valid parameters. It also ensures that the parameters form a tuple or a list. If parameters is ‘None’ or empty, the default_parameters will be returned (if default_parameters is omitted, all valid_parameters are returned).

This function will return a tuple or list of parameters, or raise a ValueError.

Parameters
  • parameters (str or list of str or None) – The parameters to be checked. Might be a string, a list/tuple of strings, or None.

  • parameter_description (str) – A short description of the parameter. This will be used to raise an exception in case the parameter is not valid.

  • valid_parameters (list of str) – A list/tuple of valid values for the parameters.

  • default_parameters (list of str or None) – The list/tuple of default parameters that will be returned in case parameters is None or empty. If omitted, all valid_parameters are used.

Returns

A list or tuple containing the valid parameters.

Return type

tuple

Raises

ValueError – If some of the parameters are not valid.
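The rules described for this function can be sketched as below. This is a simplified stand-in with the hypothetical name check_parameters_sketch; it always returns a tuple, whereas the real function may return a list.

```python
def check_parameters_sketch(parameters, description, valid, default=None):
    """Validates `parameters` against `valid` and returns them as a tuple."""
    if not parameters:
        # None or empty: fall back to the defaults, or to all valid values.
        parameters = default if default is not None else valid
    if isinstance(parameters, str):
        # A single string is treated as a one-element collection.
        parameters = (parameters,)
    for parameter in parameters:
        if parameter not in valid:
            raise ValueError(
                "Invalid %s %r. Valid values are %r."
                % (description, parameter, tuple(valid))
            )
    return tuple(parameters)


groups = check_parameters_sketch("dev", "group", ("world", "dev", "eval"))
```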

bob.db.base.utils.check_parameter_for_validity(parameter, parameter_description, valid_parameters, default_parameter=None)[source]

Checks the given parameter for validity

Ensures a given parameter is in the set of valid parameters. If the parameter is None or empty, the value in default_parameter will be returned, in case it is specified, otherwise a ValueError will be raised.

This function will return the parameter after the check, or raise a ValueError.

Parameters
  • parameter (str or None) – The single parameter to be checked. Might be a string or None.

  • parameter_description (str) – A short description of the parameter. This will be used to raise an exception in case the parameter is not valid.

  • valid_parameters (list of str) – A list/tuple of valid values for the parameters.

  • default_parameter (str, optional) – The default parameter that will be returned in case parameter is None or empty. If omitted and parameter is empty, a ValueError is raised.

Returns

The validated parameter.

Return type

str

Raises

ValueError – If the specified parameter is invalid.
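The single-parameter variant differs mainly in how it treats a missing value: without a default, an empty parameter is an error. A minimal sketch, using the hypothetical name check_parameter_sketch:

```python
def check_parameter_sketch(parameter, description, valid, default=None):
    """Validates one `parameter` against `valid` and returns it."""
    if not parameter:
        if default is None:
            # No value and no default to fall back on.
            raise ValueError("The %s may not be empty." % description)
        parameter = default
    if parameter not in valid:
        raise ValueError(
            "Invalid %s %r. Valid values are %r."
            % (description, parameter, tuple(valid))
        )
    return parameter


protocol = check_parameter_sketch(None, "protocol", ("A", "B"), default="A")
```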

bob.db.base.utils.convert_names_to_highlevel(names, low_level_names, high_level_names)[source]

Converts group names from a low level to high level API

This is useful for example when you want to return db.groups() for the bob.bio.base. Your instance of the database should already have low_level_names and high_level_names initialized.

bob.db.base.utils.convert_names_to_lowlevel(names, low_level_names, high_level_names)[source]

Same as convert_names_to_highlevel(), but in reverse.
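The two conversions amount to a name-by-name lookup between two parallel sequences. The sketch below illustrates the idea with hypothetical helper names and hypothetical group names (world, dev, eval versus train, dev, eval); the handling of None and of single strings is an assumption.

```python
def to_highlevel(names, low_level_names, high_level_names):
    """Maps low-level group names to their high-level counterparts."""
    if names is None:
        return None
    mapping = dict(zip(low_level_names, high_level_names))
    if isinstance(names, str):
        # A single name maps to a single name.
        return mapping.get(names)
    return [mapping[name] for name in names]


def to_lowlevel(names, low_level_names, high_level_names):
    """Same lookup in the opposite direction."""
    return to_highlevel(names, high_level_names, low_level_names)


LOW = ("world", "dev", "eval")    # hypothetical low-level group names
HIGH = ("train", "dev", "eval")   # hypothetical high-level group names
converted = to_highlevel(["world", "eval"], LOW, HIGH)
```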

bob.db.base.utils.file_names(files, directory, extension) → paths[source]

Returns the full path of the given File objects.

Parameters
  • files (list of bob.db.base.File) – The list of file objects to retrieve the file names for.

  • directory (str) – The base directory, where the files can be found.

  • extension (str) – The file name extension to add to all files.

Returns

paths – The paths extracted for the files, in the same order.

Return type

list of str

bob.db.base.utils.sort_files(files)[source]

Returns a sorted version of the given list of File objects (or other structures that define an ‘id’ data member). The files will be sorted according to their id, and duplicate entries will be removed.

Parameters

files (list of bob.db.base.File) – The list of files to be uniquified and sorted.

Returns

sorted – The sorted list of files, with entries that have duplicate ids removed.

Return type

list of bob.db.base.File
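Sorting by id while dropping duplicates can be sketched as follows. FakeFile and sort_files_sketch are hypothetical names, and keeping the first entry seen for each id is an assumption of this sketch rather than documented behavior.

```python
from collections import namedtuple

# Hypothetical stand-in for any object that defines an `id` member.
FakeFile = namedtuple("FakeFile", ["id", "path"])


def sort_files_sketch(files):
    """Returns `files` sorted by id, keeping one entry per id."""
    unique = {}
    for f in files:
        # Keep the first object seen for each id.
        unique.setdefault(f.id, f)
    return [unique[i] for i in sorted(unique)]


files = [FakeFile(2, "b"), FakeFile(1, "a"), FakeFile(2, "b-again")]
result = sort_files_sketch(files)
```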

Driver API

This module defines, among other less important constructions, a management interface that can be used by Bob to display information about the database and manage installed files.

class bob.db.base.driver.Interface[source]

Bases: object

Base manager for Bob databases

You should derive and implement an Interface object on every bob.db package you create.

abstract name()[source]

The name of this database

Returns

a Python-conforming name for this database. This must match the package name. If the package is named bob.db.foo, then this function must return foo.

Return type

str

abstract files()[source]

List of meta-data files for the package to be downloaded/uploaded

This function should normally return an empty list, except in case the database being implemented requires download/upload of metadata files that are not kept in its (git) repository.

Returns

A python iterable with all metadata files needed. The paths listed by this method should correspond to full paths (not relative ones) w.r.t. the database package implementing it. This is normally achieved by using pkg_resources.resource_filename().

Return type

list

abstract version()[source]

The version of this package

Returns

The current version number defined in setup.py

Return type

str

abstract type()[source]

The type of auxiliary files you have for this database

Returns

A string defining the type of database implemented. You can return only two values on this function, either sqlite or text. If you return sqlite, then we append special actions such as dbshell on bob_dbmanage automatically for you. Otherwise, we don’t.

Return type

str

setup_parser(parser, short_description, long_description)[source]

Sets up the base parser for this database.

Parameters
  • short_description (str) – A short description (one-liner) for this database

  • long_description (str) – A more involved explanation of this database

Returns

A subparser, ready for you to add commands to.

Return type

argparse.ArgumentParser

abstract add_commands(parser)[source]

Adds commands to a given argparse.ArgumentParser

This method effectively allows you to define special commands that your database will be able to perform when called from the common driver, for example create or checkfiles.

You are not obliged to overwrite this method. If you do, you will have the chance to establish your own commands. You don’t have to worry about stock commands such as files() or version(). They will be automatically hooked-in depending on the values you return for type() and files().

Parameters

parser (argparse.ArgumentParser) – An instance of a parser that you can customize, i.e., call argparse.ArgumentParser.add_argument() on.
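As a minimal sketch of what customizing the parser could look like, assuming the parser argument is a plain argparse.ArgumentParser as stated above: the options --directory and --verbose are hypothetical and only serve as an illustration.

```python
import argparse


def add_commands(parser):
    """Adds illustrative options to the given parser.

    The option names below are hypothetical and not part of bob.db.base.
    """
    parser.add_argument(
        "--directory",
        default=".",
        help="root directory of the raw database files",
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="print more information while running",
    )


# Stand-alone usage with a freshly created parser.
demo_parser = argparse.ArgumentParser(prog="demo")
add_commands(demo_parser)
demo_args = demo_parser.parse_args(["--directory", "/tmp/db"])
```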

Sample Database

Minimal database implementation

class bob.db.base.tests.sample.Database

Bases: object

Sample database implementation. Documentation is very important.

You should use sphinx-napoleon (numpy) to document parameters and methods

objects(group=None)[source]

Provides an iterable over samples given the selector information

This method returns an iterable (it may be a list, an iterator or a generator) allowing the user to iterate over the samples, given the selection criteria.

The selection criteria are database dependent. Given the simple nature of our database, our selector only allows subselecting samples for a particular group given the design protocol.

Parameters

group (str) – A string that defines the subset within the database to return the iterable for. It may take the value test or train.

Returns

A list of Sample objects you can use to create processing pipelines.

Return type

list

Raises

ValueError – in case the supplied group value is not valid.

class bob.db.base.tests.sample.Sample(data_dir, path)

Bases: bob.db.base.File

Defines a sample (image + tags) pair available in the database

For file-based databases, you may inherit from bob.db.base.File, which provides stock file loading/saving routines.

Internally, a sample is composed of a root directory, pointing to where the database is installed, together with the file stem, indicating the common part of the name shared between the image and the tag annotation file.

Parameters
  • data_dir (str) – The base directory where the root of the database is located on the user filesystem

  • path (str) – The relative path (minus the extension) of the sample

data_dir

The base directory where the root of the database is located on the user filesystem

Type

str

property dominant_color

The dominant color of the object

load(directory=None, extension=None)[source]

Default loading routine - see bob.db.base.File.load()

make_path(directory=None, extension=None)[source]

Path construction routine - see bob.db.base.File.make_path()

property tags

A list of strings containing the tags for the image

Sample Database Driver

Interface definition for Bob’s database driver of this database.

Building a driver goes through 2 steps:

  1. Define a command-line interface inheriting from bob.db.base.driver.Interface

  2. Create an entry point on the package’s setup.py with type bob.db, containing a pointer to this interface

Once the two steps are in place, then the command-line utility will show your database and allow you to interact with it via the command line.
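The second step can be sketched as an entry-point mapping. The package name bob.db.foo and the module path bob.db.foo.driver are hypothetical; only the group name bob.db comes from the text above.

```python
# Hypothetical package "bob.db.foo" whose Interface subclass lives in
# bob/db/foo/driver.py. The left-hand side of each entry is the name
# under which the database is registered.
ENTRY_POINTS = {
    "bob.db": [
        "foo = bob.db.foo.driver:Interface",
    ],
}

# In setup.py this mapping would be handed to setuptools, for example:
#     setup(name="bob.db.foo", ..., entry_points=ENTRY_POINTS)
```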

class bob.db.base.tests.sample.driver.Interface[source]

Bases: bob.db.base.driver.Interface

Bob Manager interface for the Samples Database

name()[source]

Returns a simple name for this database, without special characters or spaces.

files()[source]

Returns a python iterable with all auxiliary files needed.

The values should be taken relative to where the Python file that declares the database is located. Use this method to return names of files that are not kept with the database and can be stored on a remote server.

version()[source]

Returns the current version number from Bob’s build

type()[source]

Returns the type of auxiliary files you have for this database

If you return ‘sqlite’, then we append special actions such as ‘dbshell’ on ‘bob_dbmanage.py’ automatically for you. Otherwise, we don’t.

If you use auxiliary text files, just return ‘text’. We may provide special services for those types in the future.

Use the special name ‘builtin’ if this database is an integral part of Bob.

add_commands(parser)[source]

A few commands this database can respond to.