Python API for bob.pad.base

Generic functions

Tools to run PAD experiments

Command line generation

bob.pad.base.tools.command_line_parser([…])

Creates an argparse.ArgumentParser object that includes the minimum set of command line options (which is not so few).

bob.pad.base.tools.initialize(parsers[, …])

Parses the command line and arranges the arguments accordingly.

bob.pad.base.tools.command_line(cmdline)

Converts the given options to a string that can be executed in a terminal.

bob.pad.base.tools.write_info(args, …)

Writes information about the current experimental setup into a file specified on command line.

bob.pad.base.tools.FileSelector

This class provides shortcuts for selecting different files for different stages of the anti-spoofing process.

Algorithm

bob.pad.base.tools.train_projector(…[, …])

Trains the feature projector using extracted features of the 'train' group, if the algorithm requires projector training.

bob.pad.base.tools.project(algorithm, extractor)

Projects the features for all files of the database.

bob.pad.base.algorithm

Scoring

bob.pad.base.tools.compute_scores(algorithm, …)

Computes the scores for the given groups.

Details

bob.pad.base.padfile_to_label(padfile)[source]

Returns an integer representing the label of the current sample.

Parameters

padfile (bob.pad.base.database.PadFile) – A pad file.

Returns

True (1) if it is a bona-fide sample, False (0) otherwise.

Return type

bool
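
A minimal doctest-style sketch of the expected behavior. The PadFile arguments here are hypothetical; an attack_type of None marks a bona-fide sample:

>>> from bob.pad.base import padfile_to_label
>>> from bob.pad.base.database import PadFile
>>> padfile_to_label(PadFile(client_id=1, path='real/sample_1', attack_type=None))
True
>>> padfile_to_label(PadFile(client_id=1, path='attack/sample_1', attack_type='print'))
False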

bob.pad.base.get_config()[source]

Returns a string containing the configuration information.

bob.pad.base.combinations(input_dict)[source]

Obtain all possible key-value combinations in the input dictionary containing list values.

Parameters:

input_dict : dict

Input dictionary with list values.

Returns:

combinations : [dict]

A list of dictionaries containing the combinations.
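
For illustration, a small doctest-style sketch; the exact ordering of the returned combinations is an assumption and may differ:

>>> from bob.pad.base import combinations
>>> grid = {'C': [1, 10], 'gamma': [0.1, 0.01]}
>>> combos = combinations(grid)
>>> len(combos)  # 2 values of C times 2 values of gamma
4
>>> sorted(combos[0].keys())
['C', 'gamma']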

bob.pad.base.convert_and_prepare_features(features, dtype='float64')[source]

This function converts a list or a frame container of features into a 2D array of features. If the input is a list of frame containers, features from different frame containers (individuals) are concatenated into the same list. This list is then converted to an array. The rows are samples, the columns are features.

Parameters:

features : [2D numpy.ndarray] or [FrameContainer]

A list of 2D feature arrays or a list of Frame Containers, see bob.bio.video.utils.FrameContainer. Each Frame Container contains feature vectors for a particular individual/person.

Returns:

features_array : 2D numpy.ndarray

An array containing features for all samples and frames.
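
A short doctest-style sketch using plain 2D arrays instead of Frame Containers, assuming row-wise stacking as described above:

>>> import numpy
>>> from bob.pad.base import convert_and_prepare_features
>>> features = [numpy.ones((2, 3)), numpy.zeros((4, 3))]
>>> convert_and_prepare_features(features).shape
(6, 3)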

bob.pad.base.convert_array_to_list_of_frame_cont(data)[source]

Convert an input 2D array to a list of FrameContainers.

Parameters:

data : 2D numpy.ndarray

Input data array of dimensionality (N_samples x N_features).

Returns:

frame_container_list : [FrameContainer]

A list of FrameContainers, see bob.bio.video.utils.FrameContainer for further details. Each frame container contains one feature vector.

bob.pad.base.convert_frame_cont_to_array(frame_container)[source]

This function converts a single Frame Container into an array of features. The rows are samples, the columns are features.

Parameters:

frame_container : object

A Frame Container containing the features of an individual, see bob.bio.video.utils.FrameContainer.

Returns:

features_array : 2D numpy.ndarray

An array containing features for all frames. The rows are samples, the columns are features.

bob.pad.base.convert_list_of_frame_cont_to_array(frame_containers)[source]

This function converts a list of Frame containers into an array of features. Features from different frame containers (individuals) are concatenated into the same list. This list is then converted to an array. The rows are samples, the columns are features.

Parameters:

frame_containers : [FrameContainer]

A list of Frame Containers, see bob.bio.video.utils.FrameContainer. Each Frame Container contains feature vectors for a particular individual/person.

Returns:

features_array : 2D numpy.ndarray

An array containing features for all frames of all individuals.

bob.pad.base.mean_std_normalize(features, features_mean=None, features_std=None, copy=True)[source]

The features in the input 2D array are mean-std normalized. The rows are samples, the columns are features. If features_mean and features_std are provided, these vectors are used for normalization. Otherwise, the mean and std of the features are computed on the fly.

Parameters:

features : 2D numpy.ndarray

Array of features to be normalized.

features_mean : 1D numpy.ndarray

Mean of the features. Default: None.

features_std : 1D numpy.ndarray

Standard deviation of the features. Default: None.

Returns:

features_norm : 2D numpy.ndarray

Normalized array of features.

features_mean : 1D numpy.ndarray

Mean of the features.

features_std : 1D numpy.ndarray

Standard deviation of the features.
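
A doctest-style sketch of typical usage; the second call reuses the training statistics to normalize unseen data:

>>> import numpy
>>> from bob.pad.base import mean_std_normalize
>>> train = numpy.array([[1., 2.], [3., 4.], [5., 6.]])
>>> train_norm, mean, std = mean_std_normalize(train)
>>> numpy.allclose(train_norm.mean(axis=0), 0.0)
True
>>> test = numpy.array([[2., 3.]])
>>> test_norm, _, _ = mean_std_normalize(test, mean, std)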

bob.pad.base.norm_train_cv_data(real_train, real_cv, attack_train, attack_cv, one_class_flag=False)[source]

Mean-std normalization of train and cross-validation data arrays.

Parameters:

real_train : 2D numpy.ndarray

Subset of train features for the real class.

real_cv : 2D numpy.ndarray

Subset of cross-validation features for the real class.

attack_train : 2D numpy.ndarray

Subset of train features for the attack class.

attack_cv : 2D numpy.ndarray

Subset of cross-validation features for the attack class.

one_class_flag : bool

If set to True, only positive/real samples will be used to compute the mean and std normalization vectors. Set to True if using one-class SVM. Default: False.

Returns:

real_train_norm : 2D numpy.ndarray

Normalized subset of train features for the real class.

real_cv_norm : 2D numpy.ndarray

Normalized subset of cross-validation features for the real class.

attack_train_norm : 2D numpy.ndarray

Normalized subset of train features for the attack class.

attack_cv_norm : 2D numpy.ndarray

Normalized subset of cross-validation features for the attack class.

bob.pad.base.norm_train_data(real, attack)[source]

Mean-std normalization of input data arrays. The mean and std normalizers are computed using real class only.

Parameters:

real : 2D numpy.ndarray

Training features for the real class.

attack : 2D numpy.ndarray

Training features for the attack class.

Returns:

real_norm : 2D numpy.ndarray

Mean-std normalized training features for the real class.

attack_norm : 2D numpy.ndarray

Mean-std normalized training features for the attack class, or an empty list if one_class_flag = True.

features_mean : 1D numpy.ndarray

Mean of the features.

features_std : 1D numpy.ndarray

Standard deviation of the features.

bob.pad.base.prepare_data_for_hyper_param_grid_search(training_features, n_samples)[source]

This function converts a list of all training features returned by the read_features method of the extractor into subsampled train and cross-validation arrays for both the real and attack classes.

Parameters:

training_features : [[FrameContainer], [FrameContainer]]

A list containing two elements: [0] - a list of Frame Containers with feature vectors for the real class; [1] - a list of Frame Containers with feature vectors for the attack class.

n_samples : int

Number of uniformly selected feature vectors per class.

Returns:

real_train : 2D numpy.ndarray

Selected subset of train features for the real class. The number of samples in this set is n_samples/2, as defined by the split_data_to_train_cv function.

real_cv : 2D numpy.ndarray

Selected subset of cross-validation features for the real class. The number of samples in this set is n_samples/2, as defined by the split_data_to_train_cv function.

attack_train : 2D numpy.ndarray

Selected subset of train features for the attack class. The number of samples in this set is n_samples/2, as defined by the split_data_to_train_cv function.

attack_cv : 2D numpy.ndarray

Selected subset of cross-validation features for the attack class. The number of samples in this set is n_samples/2, as defined by the split_data_to_train_cv function.

bob.pad.base.select_quasi_uniform_data_subset(features, n_samples)[source]

Quasi-uniformly select N samples/feature vectors from the input array of samples. The rows in the input array are samples. The columns are features. Use this function if n_samples is close to the total number of samples.

Parameters:

features : 2D numpy.ndarray

Input array with feature vectors. The rows are samples, columns are features.

n_samples : int

The number of samples to be selected uniformly from the input array of features.

Returns:

features_subset : 2D numpy.ndarray

Selected subset of features.

bob.pad.base.select_uniform_data_subset(features, n_samples)[source]

Uniformly select N samples/feature vectors from the input array of samples. The rows in the input array are samples. The columns are features.

Parameters:

features : 2D numpy.ndarray

Input array with feature vectors. The rows are samples, columns are features.

n_samples : int

The number of samples to be selected uniformly from the input array of features.

Returns:

features_subset : 2D numpy.ndarray

Selected subset of features.
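
A quick doctest-style sketch, assuming the function returns exactly n_samples rows:

>>> import numpy
>>> from bob.pad.base import select_uniform_data_subset
>>> features = numpy.arange(20).reshape(10, 2)
>>> select_uniform_data_subset(features, 5).shape
(5, 2)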

bob.pad.base.split_data_to_train_cv(features)[source]

This function splits the input array of features into two subsets, namely train and cross-validation. These subsets can be used to tune the hyper-parameters of the SVM. The split is 50/50: the first half of the samples in the input is selected as the train set, and the second half as the cross-validation set.

Parameters:

features : 2D numpy.ndarray

Input array with feature vectors. The rows are samples, columns are features.

Returns:

features_train : 2D numpy.ndarray

Selected subset of train features.

features_cv : 2D numpy.ndarray

Selected subset of cross-validation features.
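
For example, with ten input samples the 50/50 split yields five train and five cross-validation rows:

>>> import numpy
>>> from bob.pad.base import split_data_to_train_cv
>>> features = numpy.arange(20).reshape(10, 2)
>>> train, cv = split_data_to_train_cv(features)
>>> train.shape, cv.shape
((5, 2), (5, 2))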

bob.pad.base.vstack_features(reader, paths, same_size=False)[source]

Stacks all features in a memory efficient way.

Parameters
  • reader (collections.Callable) – The function to load the features. The function should only take one argument path and return loaded features. Use functools.partial to accommodate your reader to this format. The features returned by reader are expected to have the same numpy.dtype and the same shape except for their first dimension. First dimension should correspond to the number of samples.

  • paths (collections.Iterable) – An iterable of paths to iterate on. Whatever is inside path is given to reader so they do not need to be necessarily paths to actual files. If same_size is True, len(paths) must be valid.

  • same_size (bool, optional) – If True, it assumes that arrays inside all the paths are the same shape. If you know the features are the same size in all paths, set this to True to improve the performance.

Returns

The read features with the shape (n_samples, *features_shape[1:]).

Return type

numpy.ndarray

Examples

In its simplest form, this function is equivalent to calling numpy.vstack(reader(p) for p in paths).

>>> import numpy
>>> from bob.pad.base import vstack_features
>>> def reader(path):
...     # in each file, there are 5 samples and features are 2 dimensional.
...     return numpy.arange(10).reshape(5,2)
>>> paths = ['path1', 'path2']
>>> all_features = vstack_features(reader, paths)
>>> numpy.allclose(all_features, numpy.array(
...     [[0, 1],
...      [2, 3],
...      [4, 5],
...      [6, 7],
...      [8, 9],
...      [0, 1],
...      [2, 3],
...      [4, 5],
...      [6, 7],
...      [8, 9]]))
True
>>> all_features_with_more_memory = numpy.vstack(reader(p) for p in paths)
>>> numpy.allclose(all_features, all_features_with_more_memory)
True

You can allocate the array at once to improve the performance if you know that all features in paths have the same shape and you know the total number of the paths:

>>> all_features = vstack_features(reader, paths, same_size=True)
>>> numpy.allclose(all_features, numpy.array(
...     [[0, 1],
...      [2, 3],
...      [4, 5],
...      [6, 7],
...      [8, 9],
...      [0, 1],
...      [2, 3],
...      [4, 5],
...      [6, 7],
...      [8, 9]]))
True

Note

This function runs very slowly. Only use it when RAM is precious.

class bob.pad.base.tools.FileSelector(decorated)[source]

This class provides shortcuts for selecting different files for different stages of the anti-spoofing process.

It communicates with the database and provides lists of file names for all steps of the tool chain.

Parameters:

database : bob.pad.base.database.PadDatabase or derived

The database object that provides the list of files.

preprocessed_directory : str

The directory, where preprocessed data should be written to.

extractor_file : str

The filename, where the extractor should be written to (if any).

extracted_directory : str

The directory, where extracted features should be written to.

projector_file : str

The filename, where the projector should be written to (if any).

projected_directory : str

The directory, where projected features should be written to (if required).

score_directories : (str, str)

The directories, where score files for no-norm should be written to.

default_extension : str

The default extension of all intermediate files.

compressed_extension : str

The extension for writing compressed score files. By default, no compression is performed.

class bob.pad.base.tools.PadDatabase(name, protocol='Default', original_directory=None, original_extension=None, **kwargs)

Bases: bob.bio.base.database.BioDatabase

This class represents the basic API for database access. Please use this class as a base class for your database access classes. Do not forget to call the constructor of this base class in your derived class.

Parameters:

name : str

A unique name for the database.

protocol : str or None

The name of the protocol that defines the default experimental setup for this database.

original_directory : str

The directory where the original data of the database are stored.

original_extension : str

The file name extension of the original data.

kwargs : key=value pairs

The arguments of the bob.bio.base.database.BioDatabase base class constructor.

all_files(groups=('train', 'dev', 'eval'), flat=False)[source]

Returns all files of the database, respecting the current protocol. The files can be limited using the all_files_options in the constructor.

Parameters
  • groups (str or tuple or None) – The groups to get the data for. It should be some of ('train', 'dev', 'eval') or None.

  • flat (bool) – If True, it will merge the real and attack files into one list.

Returns

files – The sorted and unique list of all files of the database.

Return type

[bob.pad.base.database.PadFile]

abstract annotations(file)[source]

Returns the annotations for the given File object, if available. You need to override this method in your high-level implementation. If your database does not have annotations, it should return None.

Parameters:

file : bob.pad.base.database.PadFile

The file for which annotations should be returned.

Returns:

annots : dict or None

The annotations for the file, if available.

model_ids_with_protocol(groups = None, protocol = None, **kwargs) → ids[source]

Client-based PAD is not implemented.

abstract objects(groups=None, protocol=None, purposes=None, model_ids=None, **kwargs)[source]

This function returns lists of File objects, which fulfill the given restrictions.

Keyword parameters:

groups : str or [str]

The groups of which the clients should be returned. Usually, groups are one or more elements of ('train', 'dev', 'eval').

protocol

The protocol for which the clients should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.

purposes : str or [str]

The purposes for which File objects should be retrieved. Usually it is either 'real' or 'attack'.

model_ids : [various type]

This parameter is not supported in PAD databases yet.

original_file_names(files) → paths[source]

Returns the full paths of the real and attack data of the given PadFile objects.

Parameters:

files : [[bob.pad.base.database.PadFile], [bob.pad.base.database.PadFile]]

The list of lists ([real, attack]) of file objects to retrieve the original data file names for.

Returns:

paths : [str] or [[str]]

The paths extracted for the concatenated real+attack files, in the preserved order.

training_files(step = None, arrange_by_client = False) → files[source]

Returns all training File objects. This function needs to be implemented in derived class implementations.

Parameters:

The parameters are not applicable in this version of anti-spoofing experiments.

Returns:

files : [bob.pad.base.database.PadFile] or [[bob.pad.base.database.PadFile]]

The (arranged) list of files used for the training.
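
A minimal sketch of a derived database implementing the abstract methods documented above, with a hard-coded file list. The database name, paths and attack type below are hypothetical; the PadFile arguments follow bob.pad.base.database.PadFile:

from bob.pad.base.database import PadDatabase, PadFile

class ToyPadDatabase(PadDatabase):
    """A hypothetical database with a hard-coded file list."""

    def __init__(self):
        super(ToyPadDatabase, self).__init__(name='toy-db', protocol='Default')

    def objects(self, groups=None, protocol=None, purposes=None, model_ids=None, **kwargs):
        # attack_type=None marks a bona-fide sample; a complete
        # implementation would also filter by the requested groups.
        files = [
            PadFile(client_id=1, path='real/sample_1', attack_type=None),
            PadFile(client_id=1, path='attack/sample_1', attack_type='print'),
        ]
        if purposes == 'real':
            files = [f for f in files if f.attack_type is None]
        elif purposes == 'attack':
            files = [f for f in files if f.attack_type is not None]
        return files

    def annotations(self, file):
        # this toy database provides no annotations
        return None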

bob.pad.base.tools.command_line(cmdline) → str[source]

Converts the given options to a string that can be executed in a terminal. Parameters are enclosed into '...' quotes so that the command line can interpret them (e.g., if they contain spaces or special characters).

Parameters:

cmdline : [str]

A list of command line options to be converted into a string.

Returns:

str : str

The command line string that can be copy-pasted into the terminal.
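
A short doctest-style sketch (the options are hypothetical); the resulting string can be logged or pasted back into a terminal:

>>> from bob.pad.base.tools import command_line
>>> cmd = command_line(['./bin/spoof.py', '-d', 'my database'])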

bob.pad.base.tools.command_line_parser(description=__doc__, exclude_resources_from=[]) → parsers[source]

Creates an argparse.ArgumentParser object that includes the minimum set of command line options (which is not so few). The description can be overwritten, but has a (small) default.

Included in the parser, several groups are defined. Each group specifies a set of command line options. For the configurations, registered resources are listed, which can be limited by the exclude_resources_from list of extensions.

It returns a dictionary, containing the parser object itself (in the 'main' keyword), and a list of command line groups.

Parameters:

description : str

The documentation of the script.

exclude_resources_from : [str]

A list of extension packages, for which resources should not be listed.

Returns:

parsers : dict

A dictionary of parser groups, with the main parser under the ‘main’ key. Feel free to add more options to any of the parser groups.

bob.pad.base.tools.compute_scores(algorithm, extractor, force=False, groups=['dev', 'eval'], allow_missing_files=False, write_compressed=False)[source]

Computes the scores for the given groups.

This function computes all scores for the experiment and writes them to score files. By default, scores are computed for both groups 'dev' and 'eval'.

Parameters:

algorithm : bob.bio.base.algorithm.Algorithm or derived

The algorithm, used for computing the scores and writing them to file.

extractor : bob.bio.base.extractor.Extractor or derived

The extractor, used for reading the extracted features.

force : bool

If given, files are regenerated, even if they already exist.

groups : some of ('dev', 'eval')

The list of groups, for which scores should be computed.

write_compressed : bool

If enabled, score files are compressed as .tar.bz2 files.

bob.pad.base.tools.extract(extractor, preprocessor, groups=None, indices=None, allow_missing_files=False, force=False)[source]

Extracts features from the preprocessed data using the given extractor.

The given extractor is used to extract all features required for the current experiment. It writes the extracted data into the directory specified by the bob.pad.base.tools.FileSelector. By default, if target files already exist, they are not re-created.

The preprocessor is only used to load the data in a coherent way.

Parameters:

extractor : bob.bio.base.extractor.Extractor or derived

The extractor, used for extracting and writing the features.

preprocessor : bob.bio.base.preprocessor.Preprocessor or derived

The preprocessor, used for reading the preprocessed data.

groups : some of ('train', 'dev', 'eval') or None

The list of groups, for which the data should be extracted.

indices : (int, int) or None

If specified, only the features for the given index range range(begin, end) should be extracted. This is usually given, when parallel threads are executed.

allow_missing_files : bool

If set to True, preprocessed data files that are not found are silently ignored.

force : bool

If given, files are regenerated, even if they already exist.

bob.pad.base.tools.groups(args) → groups[source]

Returns the groups, for which the files must be preprocessed, and features must be extracted and projected. This function should be used in order to eliminate the training files (the 'train' group), when no training is required in this experiment.

Parameters:

args : namespace

The interpreted command line arguments as returned by the initialize() function.

Returns:

groups : [str]

A list of groups, for which data needs to be treated.

bob.pad.base.tools.initialize(parsers, command_line_parameters = None, skips = []) → args[source]

Parses the command line and arranges the arguments accordingly. Afterward, it loads the resources for the database, preprocessor, extractor, algorithm and grid (if specified), and stores the results into the returned args.

This function also initializes the FileSelector instance by arranging the directories and files according to the command line parameters.

If skips are given, an '--execute-only' parameter is added to the parser, and the according skips are selected.

Parameters:

parsers : dict

The dictionary of command line parsers, as returned from command_line_parser(). Additional arguments might have been added.

command_line_parameters : [str] or None

The command line parameters that should be interpreted. By default, the parameters specified by the user on command line are considered.

skips : [str]

A list of possible --skip-... options to be added and evaluated automatically.

Returns:

args : namespace

A namespace of arguments as read from the command line.

Note

The database, preprocessor, extractor, algorithm and grid (if specified) are actual instances of the according classes.
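
A hedged sketch of how a script typically wires these functions together; the script name and description are hypothetical:

from bob.pad.base import tools

# create the default parser groups; custom options could be added to parsers['main']
parsers = tools.command_line_parser(description="My PAD experiment")

# parse the command line, load the resources, and set up the FileSelector
args = tools.initialize(parsers)

# record the experimental setup, as requested on the command line
tools.write_info(args, None, './bin/my_pad_script.py')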

bob.pad.base.tools.is_idiap()[source]
bob.pad.base.tools.preprocess(preprocessor, groups=None, indices=None, allow_missing_files=False, force=False)[source]

Preprocesses the original data of the database with the given preprocessor.

The given preprocessor is used to preprocess all data required for the current experiment. It writes the preprocessed data into the directory specified by the bob.pad.base.tools.FileSelector. By default, if target files already exist, they are not re-created.

Parameters:

preprocessor : bob.bio.base.preprocessor.Preprocessor or derived

The preprocessor, which should be applied to all data.

groups : some of ('train', 'dev', 'eval') or None

The list of groups, for which the data should be preprocessed.

indices : (int, int) or None

If specified, only the data for the given index range range(begin, end) should be preprocessed. This is usually given, when parallel threads are executed.

allow_missing_files : bool

If set to True, files for which the preprocessor returns None are silently ignored.

force : bool

If given, files are regenerated, even if they already exist.

bob.pad.base.tools.project(algorithm, extractor, groups=None, indices=None, allow_missing_files=False, force=False)[source]

Projects the features for all files of the database.

The given algorithm is used to project all features required for the current experiment. It writes the projected data into the directory specified by the bob.pad.base.tools.FileSelector. By default, if target files already exist, they are not re-created.

The extractor is only used to load the data in a coherent way.

Parameters:

algorithm : bob.pad.base.algorithm.Algorithm or derived

The algorithm, used for projecting features and writing them to file.

extractor : bob.bio.base.extractor.Extractor or derived

The extractor, used for reading the extracted features, which should be projected.

groups : some of ('train', 'dev', 'eval') or None

The list of groups, for which the data should be projected.

indices : (int, int) or None

If specified, only the features for the given index range range(begin, end) should be projected. This is usually given, when parallel threads are executed.

force : bool

If given, files are regenerated, even if they already exist.

bob.pad.base.tools.read_features(file_names, extractor, split_by_client = False) → extracted[source]

Reads the extracted features from file_names using the given extractor. If split_by_client is set to True, it is assumed that the file_names are already sorted by client.

Parameters:

file_names : [str] or [[str]]

A list of names of files to be read. If split_by_client = True, file names are supposed to be split into groups.

extractor : bob.bio.base.extractor.Extractor or derived

The extractor, used for reading the extracted features.

split_by_client : bool

Indicates if the given file_names are split into groups.

allow_missing_files : bool

If set to True, extracted files that are not found are silently ignored.

Returns:

extracted : [object] or [[object]]

The list of extracted features, in the same order as in the file_names.

bob.pad.base.tools.read_preprocessed_data(file_names, preprocessor, split_by_client = False) → preprocessed[source]

Reads the preprocessed data from file_names using the given preprocessor. If split_by_client is set to True, it is assumed that the file_names are already sorted by client.

Parameters:

file_names : [str] or [[str]]

A list of names of files to be read. If split_by_client = True, file names are supposed to be split into groups.

preprocessor : bob.bio.base.preprocessor.Preprocessor or derived

The preprocessor, which can read the preprocessed data.

split_by_client : bool

Indicates if the given file_names are split into groups.

allow_missing_files : bool

If set to True, preprocessed data files that are not found are silently ignored.

Returns:

preprocessed : [object] or [[object]]

The list of preprocessed data, in the same order as in the file_names.

bob.pad.base.tools.train_extractor(extractor, preprocessor, allow_missing_files=False, force=False)[source]

Trains the feature extractor using preprocessed data of the 'train' group, if the feature extractor requires training.

This function should only be called, when the extractor actually requires training. The given extractor is trained using preprocessed data. It writes the extractor to the file specified by the bob.pad.base.tools.FileSelector. By default, if the target file already exist, it is not re-created.

Parameters:

extractor : bob.bio.base.extractor.Extractor or derived

The extractor to be trained.

preprocessor : bob.bio.base.preprocessor.Preprocessor or derived

The preprocessor, used for reading the preprocessed data.

allow_missing_files : bool

If set to True, preprocessed data files that are not found are silently ignored during training.

force : bool

If given, the extractor file is regenerated, even if it already exists.

bob.pad.base.tools.train_projector(algorithm, extractor, allow_missing_files=False, force=False)[source]

Trains the feature projector using extracted features of the 'train' group, if the algorithm requires projector training.

This function should only be called, when the algorithm actually requires projector training. The projector of the given algorithm is trained using extracted features. It writes the projector to the file specified by the bob.pad.base.tools.FileSelector. By default, if the target file already exist, it is not re-created.

Parameters:

algorithm : bob.pad.base.algorithm.Algorithm or derived

The algorithm, in which the projector should be trained.

extractor : bob.bio.base.extractor.Extractor or derived

The extractor, used for reading the training data.

force : bool

If given, the projector file is regenerated, even if it already exists.

bob.pad.base.tools.write_info(args, command_line_parameters, executable)[source]

Writes information about the current experimental setup into a file specified on command line.

Parameters:

args : namespace

The interpreted command line arguments as returned by the initialize() function.

command_line_parameters : [str] or None

The command line parameters that have been interpreted. If None, the parameters specified by the user on command line are considered.

executable : str

The name of the executable (such as './bin/spoof.py') that is used to run the experiments.