Python API for bob.bio.base

Pipelines

Database

bob.bio.base.pipelines.vanilla_biometrics.Database(...)

Base class for the Vanilla Biometrics pipeline

bob.bio.base.pipelines.vanilla_biometrics.Database.background_model_samples()

Returns bob.pipelines.Sample objects to train a background model

bob.bio.base.pipelines.vanilla_biometrics.Database.references([group])

Returns the reference samples used to enroll biometric references

bob.bio.base.pipelines.vanilla_biometrics.Database.probes(group)

Returns probes to score biometric references

Database implementations

Biometric Algorithm

bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm([...])

Describes a base biometric comparator for the Vanilla Biometrics pipeline.

bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm.score(...)

It handles the score computation for one sample

bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm.enroll(data)

It handles the creation of ONE biometric reference for the vanilla pipeline

Writing Scores

bob.bio.base.pipelines.vanilla_biometrics.ScoreWriter(path)

Defines base methods to read, write, and concatenate scores for bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm

bob.bio.base.pipelines.vanilla_biometrics.FourColumnsScoreWriter(path)

Read and write scores using the four-column format bob.bio.base.score.load.four_column()

bob.bio.base.pipelines.vanilla_biometrics.CSVScoreWriter(path)

Read and write scores in CSV format, shipping all metadata with the scores

Assembling the pipeline

bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline(...)

Vanilla Biometrics Pipeline

Building Pipelines from Legacy constructs

Creating Database interfaces from legacy

bob.bio.base.pipelines.vanilla_biometrics.DatabaseConnector(...)

Wraps a legacy bob.bio.base database and generates conforming samples

Creating Transformers from legacy constructs

bob.bio.base.transformers.PreprocessorTransformer(...)

Scikit learn transformer for bob.bio.base.preprocessor.Preprocessor.

bob.bio.base.transformers.ExtractorTransformer(...)

Scikit learn transformer for bob.bio.base.extractor.Extractor.

bob.bio.base.transformers.AlgorithmTransformer(...)

Class that wraps bob.bio.base.algorithm.Algorithm
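For illustration, a legacy extractor could be plugged into a scikit-learn pipeline through these wrappers as in the minimal sketch below (Linearize is used purely as an example):

>>> from sklearn.pipeline import make_pipeline
>>> from bob.bio.base.transformers import ExtractorTransformer
>>> from bob.bio.base.extractor import Linearize
>>> transformer = make_pipeline(ExtractorTransformer(Linearize()))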

Creating BioAlgorithms from legacy Algorithm

bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithmLegacy(...)

Biometric Algorithm that handles bob.bio.base.algorithm.Algorithm

Legacy Constructs

Base classes

bob.bio.base.preprocessor.Preprocessor([...])

This is the base class for all preprocessors.

bob.bio.base.extractor.Extractor([...])

This is the base class for all feature extractors.

bob.bio.base.algorithm.Algorithm([...])

This is the base class for all biometric recognition algorithms.

Implementations

bob.bio.base.preprocessor.Filename()

This preprocessor simply passes the file name through, so that it can be used with an extractor that loads the data from file.

bob.bio.base.extractor.Linearize([dtype])

Extracts features by simply concatenating all elements of the data into one long vector.

bob.bio.base.algorithm.Distance([...])

This class defines a simple distance measure between two features.

bob.bio.base.algorithm.PCA(subspace_dimension)

Performs a principal component analysis (PCA) on the given data.

bob.bio.base.algorithm.LDA([...])

Computes a linear discriminant analysis (LDA) on the given data, possibly after computing a principal component analysis (PCA).

bob.bio.base.algorithm.PLDA(...[, ...])

Tool chain for computing PLDA (over PCA-dimensionality reduced) features

Generic functions

Functions dealing with resources

bob.bio.base.utils.resources.load_resource(...)

Loads the given resource that is registered with the given keyword.

bob.bio.base.utils.resources.read_config_file(...)

Use this function to read the given configuration file.

bob.bio.base.utils.resources.resource_keys(keyword)

Reads and returns all resources that are registered with the given keyword.

bob.bio.base.utils.resources.extensions([...])

Returns a list of packages that define extensions using the given keywords.

bob.bio.base.utils.resources.valid_keywords

Keywords for which resources are defined.

Miscellaneous functions

bob.bio.base.get_config()

Returns a string containing the configuration information.

bob.bio.base.score_fusion_strategy([...])

Returns a function to compute a fusion strategy between different scores.

bob.bio.base.selected_elements(list_of_elements)

Returns a list of elements that are sub-selected from the given list (or the list itself, if its length is smaller).

bob.bio.base.selected_indices(...[, ...])

Returns a list of indices that will contain exactly the number of desired indices (or the number of total items in the list, if this is smaller).

Loading data

bob.bio.base.score.load.open_file(filename)

Opens the given score file for reading.

bob.bio.base.score.load.scores(filename[, ...])

Loads the scores from the given score file and yields its lines.

bob.bio.base.score.load.split(filename[, ...])

Loads the scores from the given score file and splits them into positives and negatives.

bob.bio.base.score.load.cmc(filename[, ncolumns])

Loads scores to compute CMC curves.

bob.bio.base.score.load.four_column(filename)

Loads a score set from a single file and yields its lines

bob.bio.base.score.load.split_four_column(...)

Loads a score set from a single file and splits the scores

bob.bio.base.score.load.cmc_four_column(filename)

Loads scores to compute CMC curves from a file in four column format.

bob.bio.base.score.load.five_column(filename)

Loads a score set from a single file and yields its lines

bob.bio.base.score.load.split_five_column(...)

Loads a score set from a single file and splits the scores

bob.bio.base.score.load.cmc_five_column(filename)

Loads scores to compute CMC curves from a file in five column format.
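For illustration, a typical way to split a four-column score file into negatives and positives (the file name below is hypothetical):

>>> from bob.bio.base.score.load import split_four_column
>>> negatives, positives = split_four_column("scores-dev")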

Plotting

bob.bio.base.script.figure.Cmc(ctx, scores, ...)

Handles the plotting of Cmc

bob.bio.base.script.figure.Det(ctx, scores, ...)

Handles the plotting of DET curves

bob.bio.base.script.figure.Dir(ctx, scores, ...)

Handles the plotting of DIR curve

bob.bio.base.script.figure.Hist(ctx, scores, ...)

Histograms for biometric scores

bob.bio.base.script.figure.Roc(ctx, scores, ...)

Handles the plotting of ROC curves

Details

bob.bio.base.check_file(filename, force, expected_file_size=1)[source]

Checks if the file with the given filename exists and has size greater than or equal to expected_file_size. If the file is too small, or if the force option is set to True, the file is removed. This function returns True if the file exists (and has not been removed), otherwise False

bob.bio.base.close_compressed(filename, hdf5_file, compression_type='bz2', create_link=False)[source]

Closes the compressed hdf5_file that was opened with open_compressed. When the file was opened for writing (using the ‘w’ flag in open_compressed), the created HDF5 file is compressed into the given file name. To be able to read the data using the real tools, a link with the correct extension may be created, when create_link is set to True.

bob.bio.base.database_directories(strip=['dummy'], replacements=None, package_prefix='bob.bio.')[source]

Returns a dictionary of original directories for all registered databases.

bob.bio.base.extensions(keywords=valid_keywords, package_prefix='bob.bio.') → extensions[source]

Returns a list of packages that define extensions using the given keywords.

Parameters:

keywords : [str]

A list of keywords to load entry points for. Defaults to all bob.bio.base.utils.resources.valid_keywords.

package_prefix : str

Package namespace, in which we search for entry points, e.g., bob.bio.

bob.bio.base.filter_missing_files(file_names, split_by_client=False, allow_missing_files=True)[source]

This function filters out files that do not exist, but only if allow_missing_files is set to True, otherwise the list of file_names is returned unaltered.

bob.bio.base.filter_none(data, split_by_client=False)[source]

This function filters out None values from the given list (or list of lists, when split_by_client is enabled).

bob.bio.base.get_config()[source]

Returns a string containing the configuration information.

bob.bio.base.get_resource_filename(resource_name, group)[source]

Get the file name of a resource.

Parameters
  • resource_name (str) – Name of the resource to be searched

  • group (str) – Entry point group

Returns

filename – The entrypoint file name

Return type

str

bob.bio.base.is_argument_available(argument, method)[source]

Check if an argument (or keyword argument) is available in a method

Parameters

argument : str

The name of the argument (or keyword argument).

method

Pointer to the method
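A minimal usage sketch (the function f below is hypothetical):

>>> from bob.bio.base import is_argument_available
>>> def f(image, annotations=None):
...     pass
>>> is_argument_available("annotations", f)
True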

bob.bio.base.list_resources(keyword, strip=['dummy'], package_prefix='bob.bio.', verbose=False, packages=None)[source]

Returns a string containing a detailed list of resources that are registered with the given keyword.

bob.bio.base.load(file)[source]

Loads data from file. The given file might be an HDF5 file open for reading or a string.

bob.bio.base.load_compressed(filename, compression_type='bz2')[source]

Extracts the data to a temporary HDF5 file and reads its contents. Note that, though the file name is .hdf5, it contains compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’

bob.bio.base.load_resource(resource, keyword, imports=['bob.bio.base'], package_prefix='bob.bio.', preferred_package=None)[source]

Loads the given resource that is registered with the given keyword. The resource can be:

  1. a resource as defined in the setup.py

  2. a configuration file

  3. a string defining the construction of an object. If imports are required for the construction of this object, they can be given as list of strings.

Parameters:

resource : str

Any string interpretable as a resource (see above).

keyword : str

A valid resource keyword, can be one of bob.bio.base.utils.resources.valid_keywords.

imports : [str]

A list of strings defining which modules to import, when constructing new objects (option 3).

package_prefix : str

Package namespace, in which we search for entry points, e.g., bob.bio.

preferred_package : str or None

When several resources with the same name are found in different packages (e.g., in different bob.bio or other packages), this specifies the preferred package to load the resource from. If not specified, the extension that is not from bob.bio is selected.

Returns:

resource : object

The resulting resource object is returned, either read from file or resource, or created newly.
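For illustration, loading a registered resource and constructing an object from a string (option 3) might look as follows (the resource name is hypothetical):

>>> from bob.bio.base import load_resource
>>> extractor = load_resource("linearize", "extractor")
>>> extractor = load_resource("bob.bio.base.extractor.Linearize()", "extractor", imports=["bob.bio.base"])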

bob.bio.base.open_compressed(filename, open_flag='r', compression_type='bz2')[source]

Opens a compressed HDF5File with the given opening flags. For the ‘r’ flag, the given compressed file will be extracted to a local space. For ‘w’, an empty HDF5File is created. In any case, the opened HDF5File is returned, which needs to be closed using the close_compressed() function.

bob.bio.base.pretty_print(obj, kwargs)[source]

Returns a pretty-print of the parameters to the constructor of a class, which you should be able to copy-paste on the command line to create the object (with few exceptions).

bob.bio.base.read_config_file(filenames, keyword=None)[source]

Use this function to read the given configuration file. If a keyword is specified, only the configuration according to this keyword is returned. Otherwise a dictionary of the configurations read from the configuration file is returned.

Parameters:

filenames : [str]

A list (potentially empty) of configuration files or resources to read running options from

keyword : str or None

If specified, only the contents of the variable with the given name are returned. If None, the whole configuration is returned (a local namespace)

Returns:

config : object or namespace

If keyword is specified, the object inside the configuration with the given name is returned. Otherwise, the whole configuration is returned (as a local namespace).
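For illustration (the configuration file name is hypothetical):

>>> from bob.bio.base import read_config_file
>>> config = read_config_file(["experiment_config.py"])
>>> database = read_config_file(["experiment_config.py"], keyword="database")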

bob.bio.base.read_original_data(biofile, directory, extension)[source]

This function reads the original data using the given biofile instance. It simply calls load(directory, extension) on the given bob.bio.base.database.BioFile instance (or one of its derivatives).

Parameters
  • biofile (bob.bio.base.database.BioFile or one of its derivatives) – The file to read the original data.

  • directory (str) – The base directory of the database.

  • extension (str or None) – The extension of the original data. Might be None if the biofile itself has the extension stored.

Returns

Whatever biofile.load returns; usually a numpy.ndarray

Return type

object

bob.bio.base.resource_keys(keyword, exclude_packages=[], package_prefix='bob.bio.', strip=['dummy'])[source]

Reads and returns all resources that are registered with the given keyword. Entry points from the given exclude_packages are ignored.

bob.bio.base.save(data, file, compression=0)[source]

Saves the data to file using HDF5. The given file might be an HDF5 file open for writing, or a string. If the given data contains a save method, this method is called with the given HDF5 file. Otherwise the data is written to the HDF5 file using the given compression.

bob.bio.base.save_compressed(data, filename, compression_type='bz2', create_link=False)[source]

Saves the data to a temporary file using HDF5. Afterwards, the file is compressed using the given compression method and saved using the given file name. Note that, though the file name will be .hdf5, it will contain compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’
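For illustration, a round trip through the plain and compressed writers might look like this (the file names are hypothetical):

>>> import numpy
>>> from bob.bio.base import save, load, save_compressed, load_compressed
>>> data = numpy.arange(10.)
>>> save(data, "features.hdf5")
>>> data = load("features.hdf5")
>>> save_compressed(data, "compressed.hdf5", compression_type="bz2")
>>> data = load_compressed("compressed.hdf5", compression_type="bz2")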

bob.bio.base.score_fusion_strategy(strategy_name='average')[source]

Returns a function to compute a fusion strategy between different scores.

Different strategies are employed:

  • 'average' : The averaged score is computed using the numpy.average() function.

  • 'min' : The minimum score is computed using the min() function.

  • 'max' : The maximum score is computed using the max() function.

  • 'median' : The median score is computed using the numpy.median() function.

  • None is also accepted, in which case None is returned.
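For illustration:

>>> from bob.bio.base import score_fusion_strategy
>>> fusion = score_fusion_strategy("average")
>>> fusion([0.5, 1.0, 1.5])
1.0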

bob.bio.base.selected_elements(list_of_elements, desired_number_of_elements=None)[source]

Returns a list of elements that are sub-selected from the given list (or the list itself, if its length is smaller). These elements are selected such that they are evenly spread over the whole list.

bob.bio.base.selected_indices(total_number_of_indices, desired_number_of_indices=None)[source]

Returns a list of indices that will contain exactly the number of desired indices (or the number of total items in the list, if this is smaller). These indices are selected such that they are evenly spread over the whole sequence.
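For illustration, assuming the even-spread selection described above (the exact spacing shown in the comments is indicative):

>>> from bob.bio.base import selected_indices, selected_elements
>>> indices = selected_indices(10, 5)                   # evenly spread, e.g. [0, 2, 4, 6, 8]
>>> elements = selected_elements(list("abcdefghij"), 5)  # e.g. ['a', 'c', 'e', 'g', 'i']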

class bob.bio.base.annotator.Annotator

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Annotator class for all annotators. This class is meant to be used in conjunction with the bob bio annotate script or to be used in pipelines.

transform(samples, **kwargs)[source]

Annotates a sample and returns annotations in a dictionary.

Parameters
  • samples (numpy.ndarray) – The samples that are being annotated.

  • **kwargs – The extra arguments that may be passed.

Returns

A dictionary containing the annotations of the biometric sample. If the program fails to annotate the sample, it should return an empty dictionary.

Return type

dict

class bob.bio.base.annotator.Callable(callable, **kwargs)

Bases: bob.bio.base.annotator.Annotator

A class that wraps a callable object that annotates a sample into a bob.bio.base.annotator.Annotator object.

callable

A callable with the following signature: annotations = callable(sample, **kwargs) that takes a numpy array and returns annotations in dictionary format for that biometric sample. Please see Annotator for more information.

transform(sample, **kwargs)[source]

Annotates a sample and returns annotations in a dictionary.

Parameters
  • samples (numpy.ndarray) – The samples that are being annotated.

  • **kwargs – The extra arguments that may be passed.

Returns

A dictionary containing the annotations of the biometric sample. If the program fails to annotate the sample, it should return an empty dictionary.

Return type

dict
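For illustration, a plain function with the documented signature can be wrapped as follows (center_annotator is hypothetical):

>>> from bob.bio.base.annotator import Callable
>>> def center_annotator(sample, **kwargs):
...     # hypothetical annotator: uses the image center as the only annotation
...     return {"center": (sample.shape[0] // 2, sample.shape[1] // 2)}
>>> annotator = Callable(center_annotator)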

class bob.bio.base.annotator.FailSafe(annotators, required_keys, only_required_keys=False, **kwargs)

Bases: bob.bio.base.annotator.Annotator

A fail-safe annotator. This annotator takes a list of annotators and tries them one by one until the required annotations are obtained. The annotations of the previous annotator are passed to the next one.

annotators

A list of annotators to try

Type

list

required_keys

A list of keys that should be available in annotations to stop trying different annotators.

Type

list

only_required_keys

If True, the annotations will only contain the required_keys.

Type

bool

annotate(sample, **kwargs)[source]
transform(samples, **kwargs)[source]

Takes a batch of data and tries annotating them until successful.

Tries each annotator given at the creation of FailSafe when the previous one fails.

Each kwargs value is a list of parameters, with each element of those lists corresponding to each element of samples_batch (for example: with [s1, s2, ...] as samples_batch, kwargs['annotations'] should contain [{<s1_annotations>}, {<s2_annotations>}, ...]).
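For illustration (both annotators below are hypothetical placeholders):

>>> from bob.bio.base.annotator import FailSafe
>>> annotator = FailSafe(
...     [fast_annotator, robust_annotator],  # tried in order until the required keys are found
...     required_keys=["leye", "reye"],
... )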

class bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm(score_reduction_operation=<function average_scores>, **kwargs)

Bases: object

Describes a base biometric comparator for the Vanilla Biometrics pipeline.

It defines biometric model enrollment, via enroll(), and scoring, via score().

Parameters

score_reduction_operation (collections.callable) – Callable containing the score reduction function to be applied to the samples in a sampleset

clear_caches()[source]

Clean all cached objects from BioAlgorithm

abstract enroll(data)[source]

It handles the creation of ONE biometric reference for the vanilla pipeline

Parameters

data – Data used for the creation of ONE BIOMETRIC REFERENCE

enroll_samples(biometric_references)[source]

This method should implement the enrollment sub-pipeline of the Vanilla Biometrics Pipeline.

It handles the creation of biometric references

Parameters

biometric_references (list) – A list of bob.pipelines.SampleSet objects to be used for creating biometric references. The sets must be identified with a unique id and a path, for eventual checkpointing.

abstract score(biometric_reference, data)[source]

It handles the score computation for one sample

Parameters
  • biometric_reference (list) – Biometric reference to be compared

  • data (list) – Data to be compared

Returns

scores – For each sample in a probe, returns as many scores as there are samples in the probe, together with the probe’s and the relevant reference’s subject identifiers.

Return type

list

score_multiple_biometric_references(biometric_references, data)[source]

Score one probe against multiple biometric references (models). This method is called if allow_scoring_multiple_references is set to true. You may want to override this method to improve the performance of computations.

Parameters
  • biometric_references (list) – List of biometric references (models) to be scored

  • data – Data used for the creation of ONE biometric probe.

Returns

A list of scores for the comparison of the probe against multiple models.

Return type

list

score_samples(probe_features, biometric_references, allow_scoring_with_all_biometric_references=True)[source]

Scores a new sample against multiple (potential) references

Parameters
  • probes (list) – A list of bob.pipelines.SampleSet objects to be used for scoring the input references

  • biometric_references (list) – A list of bob.pipelines.Sample objects to be used for scoring the input probes, must have an id attribute that will be used to cross-reference which probes need to be scored.

  • allow_scoring_with_all_biometric_references (bool) – If true will call self.score_multiple_biometric_references, at scoring time, to compute scores in one shot with multiple probes. This optimization is useful when all probes need to be compared with all biometric references AND your scoring function allows this broadcast computation.

Returns

scores – For each sample in a probe, returns as many scores as there are samples in the probe, together with the probe’s and the relevant reference’s subject identifiers.

Return type

list
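For illustration, a minimal custom comparator could implement the two abstract methods as sketched below (MeanDistance is hypothetical; scores follow the larger-is-more-similar convention):

import numpy

from bob.bio.base.pipelines.vanilla_biometrics import BioAlgorithm

class MeanDistance(BioAlgorithm):

    def enroll(self, data):
        # data: list of feature arrays belonging to ONE biometric reference
        return numpy.mean(data, axis=0)

    def score(self, biometric_reference, data):
        # one score for the given probe sample
        return [-numpy.linalg.norm(biometric_reference - data)]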

class bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithmCheckpointWrapper(biometric_algorithm, base_dir, group=None, force=False, hash_fn=None, **kwargs)

Bases: bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm

Wrapper used to checkpoint enrolled and scored samples.

Parameters
  • biometric_algorithm (bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm) – An implemented bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm

  • base_dir (str) – Path to store biometric references and scores

  • extension (str) – File extension

  • force (bool) – If True, will recompute scores and biometric references no matter if a file exists

  • hash_fn – Pointer to a hash function. This hash function maps sample.key to a hash code and this hash code corresponds to a relative directory where a single sample will be checkpointed. This is useful when it is desirable to have file directories with less than a certain number of files.

Examples

>>> from bob.bio.base.pipelines.vanilla_biometrics import BioAlgorithmCheckpointWrapper, Distance
>>> biometric_algorithm = BioAlgorithmCheckpointWrapper(Distance(), base_dir="./")
>>> biometric_algorithm.enroll(sample) 
clear_caches()[source]

Clean all cached objects from BioAlgorithm

enroll(enroll_features)[source]

It handles the creation of ONE biometric reference for the vanilla pipeline

Parameters

data – Data used for the creation of ONE BIOMETRIC REFERENCE

score(biometric_reference, data)[source]

It handles the score computation for one sample

Parameters
  • biometric_reference (list) – Biometric reference to be compared

  • data (list) – Data to be compared

Returns

scores – For each sample in a probe, returns as many scores as there are samples in the probe, together with the probe’s and the relevant reference’s subject identifiers.

Return type

list

score_multiple_biometric_references(biometric_references, data)[source]

Score one probe against multiple biometric references (models). This method is called if allow_scoring_multiple_references is set to true. You may want to override this method to improve the performance of computations.

Parameters
  • biometric_references (list) – List of biometric references (models) to be scored

  • data – Data used for the creation of ONE biometric probe.

Returns

A list of scores for the comparison of the probe against multiple models.

Return type

list

set_score_references_path(group)[source]
write_biometric_reference(sample, path)[source]
write_scores(samples, path)[source]
class bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithmDaskWrapper(biometric_algorithm, **kwargs)

Bases: bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm

Wrap bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm to work with DASK

clear_caches()[source]

Clean all cached objects from BioAlgorithm

enroll(data)[source]

It handles the creation of ONE biometric reference for the vanilla pipeline

Parameters

data – Data used for the creation of ONE BIOMETRIC REFERENCE

enroll_samples(biometric_reference_features)[source]

This method should implement the enrollment sub-pipeline of the Vanilla Biometrics Pipeline.

It handles the creation of biometric references

Parameters

biometric_references (list) – A list of bob.pipelines.SampleSet objects to be used for creating biometric references. The sets must be identified with a unique id and a path, for eventual checkpointing.

score(biometric_reference, data)[source]

It handles the score computation for one sample

Parameters
  • biometric_reference (list) – Biometric reference to be compared

  • data (list) – Data to be compared

Returns

scores – For each sample in a probe, returns as many scores as there are samples in the probe, together with the probe’s and the relevant reference’s subject identifiers.

Return type

list

score_multiple_biometric_references(biometric_references, data)[source]

Score one probe against multiple biometric references (models). This method is called if allow_scoring_multiple_references is set to true. You may want to override this method to improve the performance of computations.

Parameters
  • biometric_references (list) – List of biometric references (models) to be scored

  • data – Data used for the creation of ONE biometric probe.

Returns

A list of scores for the comparison of the probe against multiple models.

Return type

list

score_samples(probe_features, biometric_references, allow_scoring_with_all_biometric_references=False)[source]

Scores a new sample against multiple (potential) references

Parameters
  • probes (list) – A list of bob.pipelines.SampleSet objects to be used for scoring the input references

  • biometric_references (list) – A list of bob.pipelines.Sample objects to be used for scoring the input probes, must have an id attribute that will be used to cross-reference which probes need to be scored.

  • allow_scoring_with_all_biometric_references (bool) – If true will call self.score_multiple_biometric_references, at scoring time, to compute scores in one shot with multiple probes. This optimization is useful when all probes need to be compared with all biometric references AND your scoring function allows this broadcast computation.

Returns

scores – For each sample in a probe, returns as many scores as there are samples in the probe, together with the probe’s and the relevant reference’s subject identifiers.

Return type

list

set_score_references_path(group)[source]
class bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithmLegacy(instance, base_dir, force=False, projector_file=None, **kwargs)

Bases: bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm

Biometric Algorithm that handles bob.bio.base.algorithm.Algorithm

In this design, BioAlgorithm.enroll maps to bob.bio.base.algorithm.Algorithm.enroll() and BioAlgorithm.score maps to bob.bio.base.algorithm.Algorithm.score()

Note

Legacy algorithms are always checkpointable

Parameters

instance (object) – An instance of bob.bio.base.algorithm.Algorithm

Example

>>> from bob.bio.base.pipelines.vanilla_biometrics import BioAlgorithmLegacy
>>> from bob.bio.base.algorithm import PCA
>>> biometric_algorithm = BioAlgorithmLegacy(PCA(subspace_dimension=0.99), base_dir="./", projector_file="Projector.hdf5")
property base_dir
enroll(enroll_features, **kwargs)[source]

It handles the creation of ONE biometric reference for the vanilla pipeline

Parameters

data – Data used for the creation of ONE BIOMETRIC REFERENCE

load_legacy_background_model()[source]
score(biometric_reference, data, **kwargs)[source]

It handles the score computation for one sample

Parameters
  • biometric_reference (list) – Biometric reference to be compared

  • data (list) – Data to be compared

Returns

scores – For each sample in a probe, returns as many scores as there are samples in the probe, together with the probe’s and the relevant reference’s subject identifiers.

Return type

list

score_multiple_biometric_references(biometric_references, data, **kwargs)[source]

Score one probe against multiple biometric references (models). This method is called if allow_scoring_multiple_references is set to true. You may want to override this method to improve the performance of computations.

Parameters
  • biometric_references (list) – List of biometric references (models) to be scored

  • data – Data used for the creation of ONE biometric probe.

Returns

A list of scores for the comparison of the probe against multiple models.

Return type

list

write_biometric_reference(sample, path)[source]
write_scores(samples, path)[source]
class bob.bio.base.pipelines.vanilla_biometrics.CSVScoreWriter(path, exclude_list=('data', 'samples', 'key', 'references', 'annotations'))

Bases: bob.bio.base.pipelines.vanilla_biometrics.ScoreWriter

Read and write scores in CSV format, shipping all metadata with the scores

Parameters
  • path (str) – Directory to save the scores

  • exclude_list (list) – List of metadata to exclude from the CSV file

post_process(score_paths, path)[source]

Removes the header of all files but the first

write(probe_sampleset)[source]

Writes scores and returns a bob.pipelines.DelayedSample containing the instructions to open the score file
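For illustration, a writer is typically passed to the pipeline at construction time (transformer and biometric_algorithm are assumed to be already defined):

>>> from bob.bio.base.pipelines.vanilla_biometrics import CSVScoreWriter, VanillaBiometricsPipeline
>>> score_writer = CSVScoreWriter(path="./scores")
>>> pipeline = VanillaBiometricsPipeline(transformer, biometric_algorithm, score_writer=score_writer)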

class bob.bio.base.pipelines.vanilla_biometrics.Database(name, protocol, allow_scoring_with_all_biometric_references=False, annotation_type=None, fixed_positions=None, memory_demanding=False, **kwargs)

Bases: object

Base class for the Vanilla Biometrics pipeline

abstract all_samples(groups=None)[source]

Returns all the samples of the dataset

Parameters

groups (list or None) – List of groups to consider (like ‘dev’ or ‘eval’). If None, will return samples from all the groups.

Returns

samples – List of all the samples of the dataset.

Return type

list

abstract background_model_samples()[source]

Returns bob.pipelines.Sample objects to train a background model

Returns

samples – List of samples for background model training.

Return type

list

abstract groups()[source]
abstract probes(group)[source]

Returns probes to score biometric references

Parameters

group (str) – Limits samples to this group

Returns

probes – List of samples for the creation of biometric probes.

Return type

list

abstract protocols()[source]
reference_ids(group)[source]
abstract references(group='dev')[source]

Returns the reference samples used to enroll biometric references

Parameters

group (str, optional) – Limits samples to this group

Returns

references – List of samples for the creation of biometric references.

Return type

list
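For illustration, a new database interface only needs to fill in the abstract methods; a skeleton (MyDatabase is hypothetical) could look like this:

from bob.bio.base.pipelines.vanilla_biometrics import Database

class MyDatabase(Database):

    def background_model_samples(self):
        ...  # list of bob.pipelines.Sample for training the transformer

    def references(self, group="dev"):
        ...  # list of bob.pipelines.SampleSet, one per biometric reference

    def probes(self, group):
        ...  # list of bob.pipelines.SampleSet to be scored

    def all_samples(self, groups=None):
        ...

    def groups(self):
        ...

    def protocols(self):
        ...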

class bob.bio.base.pipelines.vanilla_biometrics.DatabaseConnector(database, allow_scoring_with_all_biometric_references=True, annotation_type='eyes-center', fixed_positions=None, memory_demanding=False, append_purpose=False, **kwargs)

Bases: bob.bio.base.pipelines.vanilla_biometrics.Database

Wraps a legacy bob.bio.base database and generates conforming samples

This connector allows wrapping generic bob.bio.base datasets and generate samples that conform to the specifications of biometric pipelines defined in this package.

Parameters
  • database (object) – An instantiated version of a bob.bio.base.Database object

  • protocol (str) – The name of the protocol to generate samples from. To be plugged at bob.db.base.Database.objects.

  • allow_scoring_with_all_biometric_references (bool) – If True will allow the scoring function to be performed in one shot with multiple probes. This optimization is useful when all probes need to be compared with all biometric references AND your scoring function allows this broadcast computation.

  • annotation_type (str) – Type of the annotations that the database provides. Allowed types are: eyes-center and bounding-box

  • fixed_positions (dict) – In case the database contains one single annotation for all samples. This is useful for registered databases.

  • memory_demanding (bool) – Signals that the database has some memory-demanding components. This might be useful for further processing

  • append_purpose (bool) – If True, sample.key will be appended with the purpose of the sample (world, probe, or bio-ref).

all_samples(groups=None)[source]

Returns all the legacy database files in Sample format

Parameters

groups (list or None) – List of groups to consider (‘train’, ‘dev’, and/or ‘eval’). If None is given, returns samples from all the groups.

Returns

samples – List of all the samples of a database in bob.pipelines.Sample objects.

Return type

list

background_model_samples()[source]

Returns bob.pipelines.Sample objects to train a background model (group world).

Returns

samples – List of samples conforming to the pipeline API for background model training.

Return type

list

groups()[source]
probes(group='dev')[source]

Returns probes to score biometric references

Parameters

group (str) – A group to be plugged at database.objects

Returns

probes – List of samples conforming to the pipeline API for the creation of biometric probes.

Return type

list

protocols()[source]
references(group='dev')[source]

Returns the reference samples used to enroll biometric references

Parameters

group (str, optional) – A group to be plugged at database.objects

Returns

references – List of samples conforming to the pipeline API for the creation of biometric references.

Return type

list
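For illustration (legacy_database stands for any instantiated legacy bob.bio.base database):

>>> from bob.bio.base.pipelines.vanilla_biometrics import DatabaseConnector
>>> database = DatabaseConnector(legacy_database)
>>> training = database.background_model_samples()
>>> enroll_sets = database.references(group="dev")
>>> probe_sets = database.probes(group="dev")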

class bob.bio.base.pipelines.vanilla_biometrics.Distance(distance_function=<function cosine>, factor=-1, **kwargs)

Bases: bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm

enroll(enroll_features) → model[source]

Enrolls the model by storing all given input vectors.

Parameters

enroll_features ([numpy.ndarray]) – The list of projected features to enroll the model from.

Returns

model – The enrolled model.

Return type

2D numpy.ndarray

score(model, probe) → float[source]

Computes the distance of the model to the probe using the distance function specified in the constructor.

Parameters
  • model (2D numpy.ndarray) – The model storing the enrolled feature vectors

  • probe (numpy.ndarray) – The probe feature vector to compare to the model

Returns

score – A similarity value between model and probe

Return type

float

score_multiple_biometric_references(biometric_references, data)[source]

Score one probe against multiple biometric references (models). This method is called if allow_scoring_multiple_references is set to true. You may want to override this method to improve the performance of computations.

Parameters
  • biometric_references (list) – List of biometric references (models) to be scored

  • data – Data used for the creation of ONE biometric probe.

Returns

A list of scores for the comparison of the probe against multiple models.

Return type

list
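A minimal usage sketch (the feature shapes are arbitrary):

>>> import numpy
>>> from bob.bio.base.pipelines.vanilla_biometrics import Distance
>>> algorithm = Distance()  # cosine distance scaled by factor=-1 by default
>>> model = algorithm.enroll([numpy.ones(5), numpy.full(5, 2.0)])
>>> score = algorithm.score(model, numpy.ones(5))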

class bob.bio.base.pipelines.vanilla_biometrics.FourColumnsScoreWriter(path, extension='.txt')

Bases: bob.bio.base.pipelines.vanilla_biometrics.ScoreWriter

Read and write scores using the four-column format bob.bio.base.score.load.four_column()

write(probe_sampleset)[source]

Writes scores and returns a bob.pipelines.DelayedSample containing the instructions to open the score file

class bob.bio.base.pipelines.vanilla_biometrics.ScoreWriter(path, extension='.txt')

Bases: object

Defines base methods to read, write, and concatenate scores for bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm

post_process(score_paths, filename)[source]
abstract write(sampleset, path)[source]
class bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline(transformer, biometric_algorithm, score_writer=None)

Bases: object

Vanilla Biometrics Pipeline

This is the backbone of most biometric recognition systems. It implements the following three sub-pipelines:

  • VanillaBiometrics.train_background_model: Initializes or trains your transformer.

    It will run sklearn.base.BaseEstimator.fit()

  • VanillaBiometrics.create_biometric_reference: Creates biometric references

    It will run sklearn.base.BaseEstimator.transform() followed by a sequence of bob.bio.base.pipelines.vanilla_biometrics.abstract_classes.BioAlgorithm.enroll()

  • VanillaBiometrics.compute_scores: Computes scores

    It will run sklearn.base.BaseEstimator.transform() followed by a sequence of bob.bio.base.pipelines.vanilla_biometrics.abstract_classes.BioAlgorithm.score()

Example

>>> from bob.pipelines.transformers import Linearize
>>> from sklearn.pipeline import make_pipeline
>>> from bob.bio.base.pipelines.vanilla_biometrics import Distance, VanillaBiometricsPipeline
>>> estimator_1 = Linearize()
>>> transformer = make_pipeline(estimator_1)
>>> biometric_algorithm = Distance()
>>> pipeline = VanillaBiometricsPipeline(transformer, biometric_algorithm)
>>> pipeline(samples_for_training_back_ground_model, samplesets_for_enroll, samplesets_for_scoring)  

To run this pipeline using Dask, use the function dask_vanilla_biometrics().

Example

>>> from bob.bio.base.pipelines.vanilla_biometrics import dask_vanilla_biometrics
>>> pipeline = VanillaBiometricsPipeline(transformer, biometric_algorithm)
>>> pipeline = dask_vanilla_biometrics(pipeline)
>>> pipeline(samples_for_training_back_ground_model, samplesets_for_enroll, samplesets_for_scoring).compute()  
Parameters
  • transformer (sklearn.pipeline.Pipeline) – Transformer that will preprocess your data

  • biometric_algorithm (bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm) – Biometric algorithm that implements the enroll and score methods

  • score_writer (bob.bio.base.pipelines.vanilla_biometrics.ScoreWriter) – Score writer used to persist the computed scores
compute_scores(probe_samples, biometric_references, allow_scoring_with_all_biometric_references=True)[source]
create_biometric_reference(biometric_reference_samples)[source]
post_process(score_paths, filename)[source]
train_background_model(background_model_samples)[source]
write_scores(scores)[source]
class bob.bio.base.pipelines.vanilla_biometrics.ZTNorm(adaptive_score_fraction, adaptive_score_descending_sort)

Bases: object

Computes Z, T and ZT score normalization of a bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm

Reference bibliography from: A Generative Model for Score Normalization in Speaker Recognition https://arxiv.org/pdf/1709.09868.pdf

Parameters
  • adaptive_score_fraction (float) – Proportion of the impostor scores used to compute \(\mu\) and \(\sigma\) for the T normalization (see adaptive T-Norm / Top-Norm under ZTNormPipeline)

  • adaptive_score_descending_sort (bool) – If True, sorts the scores in descending order during the Top-Norm statistics computation
compute_snorm_scores(znormed_scores, tnormed_scores)[source]
compute_tnorm_scores(probe_scores, sampleset_for_tnorm, t_biometric_references, allow_scoring_with_all_biometric_references=False)[source]

Base T-normalization function

compute_znorm_scores(probe_scores, sampleset_for_znorm, biometric_references)[source]

Base Z-normalization function

compute_ztnorm_score(t_scores, zt_scores, t_biometric_references, z_normed_scores)[source]
class bob.bio.base.pipelines.vanilla_biometrics.ZTNormCheckpointWrapper(ztnorm, base_dir, force=False)

Bases: object

Wraps bob.bio.base.pipelines.vanilla_biometrics.ZTNorm to checkpoint its score normalization steps

Parameters

ztnorm (bob.bio.base.pipelines.vanilla_biometrics.ZTNorm) – ZTNorm Pipeline

compute_snorm_scores(znormed_scores, tnormed_scores)[source]
compute_tnorm_scores(probe_scores, sampleset_for_tnorm, t_biometric_references, for_zt=False)[source]
compute_znorm_scores(probe_scores, sampleset_for_znorm, biometric_references, for_zt=False)[source]
compute_ztnorm_score(t_scores, zt_scores, t_biometric_references, z_normed_scores)[source]
write_scores(samples, path)[source]
class bob.bio.base.pipelines.vanilla_biometrics.ZTNormDaskWrapper(ztnorm)

Bases: object

Wraps bob.bio.base.pipelines.vanilla_biometrics.ZTNorm to work with DASK

Parameters

ztnorm (bob.bio.base.pipelines.vanilla_biometrics.ZTNormPipeline) – ZTNorm Pipeline

compute_snorm_scores(znormed_scores, tnormed_scores)[source]
compute_tnorm_scores(probe_scores, sampleset_for_tnorm, t_biometric_references, for_zt=False)[source]
compute_znorm_scores(probe_scores, sampleset_for_znorm, biometric_references, for_zt=False)[source]
compute_ztnorm_score(t_scores, zt_scores, t_biometric_references, z_normed_scores)[source]
class bob.bio.base.pipelines.vanilla_biometrics.ZTNormPipeline(vanilla_biometrics_pipeline, z_norm=True, t_norm=True, score_writer=<bob.bio.base.pipelines.vanilla_biometrics.FourColumnsScoreWriter object>, adaptive_score_fraction=1.0, adaptive_score_descending_sort=True)

Bases: object

Applies Z, T or ZT score normalization on top of the Vanilla Biometrics pipeline

Reference bibliography from: A Generative Model for Score Normalization in Speaker Recognition https://arxiv.org/pdf/1709.09868.pdf

Example

>>> from bob.pipelines.transformers import Linearize
>>> from sklearn.pipeline import make_pipeline
>>> from bob.bio.base.pipelines.vanilla_biometrics import Distance, VanillaBiometricsPipeline, ZTNormPipeline
>>> estimator_1 = Linearize()
>>> transformer = make_pipeline(estimator_1)
>>> biometric_algorithm = Distance()
>>> vanilla_biometrics_pipeline = VanillaBiometricsPipeline(transformer, biometric_algorithm)
>>> zt_pipeline = ZTNormPipeline(vanilla_biometrics_pipeline)
>>> zt_pipeline(...) 
Parameters
  • vanilla_biometrics_pipeline (VanillaBiometricsPipeline) – An instance of VanillaBiometricsPipeline to be wrapped with score normalization

  • z_norm (bool) – If True, applies ZScore normalization on top of raw scores.

  • t_norm (bool) – If True, applies TScore normalization on top of raw scores. If both z_norm and t_norm are True, ZT score normalization is applied

  • score_writer

  • adaptive_score_fraction (float) – Sets the proportion of the impostor scores used to compute \(\mu\) and \(\sigma\) for the T normalization. This is also called adaptive T-Norm (https://ieeexplore.ieee.org/document/1415220) or Top-Norm (https://ieeexplore.ieee.org/document/4013533)

  • adaptive_score_descending_sort (bool) – If True, sorts the scores in descending order during the Top-Norm statistics computation

compute_scores(probe_samples, biometric_references, allow_scoring_with_all_biometric_references=False)[source]
compute_snorm_scores(znormed_scores, tnormed_scores)[source]
compute_tnorm_scores(t_biometric_reference_samples, probe_features, probe_scores, allow_scoring_with_all_biometric_references=False)[source]
compute_znorm_scores(zprobe_samples, probe_scores, biometric_references, allow_scoring_with_all_biometric_references=False)[source]
compute_ztnorm_scores(z_probe_features, t_biometric_references, z_normed_scores, t_scores, allow_scoring_with_all_biometric_references=False)[source]
create_biometric_reference(biometric_reference_samples)[source]
post_process(score_paths, filename)[source]
train_background_model(background_model_samples)[source]
write_scores(scores)[source]
bob.bio.base.pipelines.vanilla_biometrics.checkpoint_vanilla_biometrics(pipeline, base_dir, biometric_algorithm_dir=None, hash_fn=None)

Given a bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline, wraps bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline and bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm to be checkpointed

Parameters
  • pipeline (bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline) – Vanilla Biometrics based pipeline to be checkpointed

  • base_dir (str) – Path to store transformed input data and possibly biometric references and scores

  • biometric_algorithm_dir (str) – If set, it will checkpoint the biometric references and scores to this path. If not, base_dir will be used. This is useful when it is desirable to keep the transformed data, and the biometric references and scores, in different paths.

  • hash_fn – Pointer to a hash function. This hash function will map sample.key to a hash code and this hash code will be the relative directory where a single sample will be checkpointed. This is useful when it is desirable to have file directories with fewer than a certain number of files.
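For illustration (pipeline stands for an already constructed VanillaBiometricsPipeline):

>>> from bob.bio.base.pipelines.vanilla_biometrics import checkpoint_vanilla_biometrics
>>> pipeline = checkpoint_vanilla_biometrics(pipeline, base_dir="./checkpoints")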

bob.bio.base.pipelines.vanilla_biometrics.dask_vanilla_biometrics(pipeline, npartitions=None, partition_size=None)

Given a bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline, wraps bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline and bob.bio.base.pipelines.vanilla_biometrics.BioAlgorithm to be executed with dask

Parameters
  • pipeline (bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline) – Vanilla Biometrics based pipeline to be executed with dask

  • npartitions (int) – Number of partitions for the initial dask.bag

  • partition_size (int) – Size of each dask.bag partition
bob.bio.base.pipelines.vanilla_biometrics.execute_vanilla_biometrics(pipeline, database, dask_client, groups, output, write_metadata_scores, checkpoint, dask_partition_size, dask_n_workers, **kwargs)

Function that executes the Vanilla Biometrics pipeline.

This is called when using the bob bio pipelines vanilla-biometrics command.

This is also callable from a script without fear of interrupting the running Dask instance, allowing the chaining of multiple experiments while keeping the workers alive.

Parameters
  • pipeline (Instance of bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline) – A constructed vanilla-biometrics pipeline.

  • database (Instance of bob.bio.base.pipelines.vanilla_biometrics.abstract_class.Database) – A database interface instance

  • dask_client (instance of dask.distributed.Client or None) – A Dask client instance used to run the experiment in parallel on multiple machines, or locally. Basic configs can be found in bob.pipelines.config.distributed.

  • groups (list of str) – Groups of the dataset that will be requested from the database interface.

  • output (str) – Path where the results and checkpoints will be saved to.

  • write_metadata_scores (bool) – Use the CSVScoreWriter instead of the FourColumnsScoreWriter when True.

  • checkpoint (bool) – Whether checkpoint files will be created for every step of the pipelines.

bob.bio.base.pipelines.vanilla_biometrics.execute_vanilla_biometrics_ztnorm(pipeline, database, dask_client, groups, output, consider_genuines, write_metadata_scores, ztnorm_cohort_proportion, checkpoint, dask_partition_size, dask_n_workers, **kwargs)[source]

Function that executes the Vanilla Biometrics pipeline with ZTNorm.

This is called when using the bob bio pipelines vanilla-biometrics-ztnorm command.

This is also callable from a script without fear of interrupting the running Dask instance, allowing the chaining of multiple experiments while keeping the workers alive.

Parameters
  • pipeline (Instance of bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline) – A constructed vanilla-biometrics pipeline.

  • database (Instance of bob.bio.base.pipelines.vanilla_biometrics.abstract_class.Database) – A database interface instance

  • dask_client (instance of dask.distributed.Client or None) – A Dask client instance used to run the experiment in parallel on multiple machines, or locally. Basic configs can be found in bob.pipelines.config.distributed.

  • groups (list of str) – Groups of the dataset that will be requested from the database interface.

  • output (str) – Path where the results and checkpoints will be saved to.

  • write_metadata_scores (bool) – Use the CSVScoreWriter instead of the FourColumnsScoreWriter when True.

  • checkpoint (bool) – Whether checkpoint files will be created for every step of the pipelines.

  • dask_partition_size (int) – If using Dask, this option defines the size of each dask.bag.partition. Use this option if the current heuristic that sets this value doesn’t suit your experiment. (https://docs.dask.org/en/latest/bag-api.html?highlight=partition_size#dask.bag.from_sequence).

  • dask_n_workers (int) – If using Dask, this option defines the number of workers to start your experiment. Dask automatically scales up/down the number of workers due to the current load of tasks to be solved. Use this option if the current amount of workers set to start an experiment doesn’t suit you.

  • ztnorm_cohort_proportion (float) – Sets the percentage of samples used for t-norm and z-norm. Sometimes you don’t want to use all the t/z samples for normalization

  • consider_genuines (bool) – If set, will consider genuine scores in the ZT score normalization

bob.bio.base.pipelines.vanilla_biometrics.is_checkpointed(pipeline)

Check if bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline is checkpointed

Parameters

pipeline (bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline) – Vanilla Biometrics based pipeline to be checked

bob.bio.base.pipelines.vanilla_biometrics.pickle_compress(path, obj, attempts=5)[source]

Pickles an object, compresses it, and saves it

Parameters
  • path (str) – Path where to save the object

  • obj – Object to be saved

  • attempts (int) – Number of serialization attempts

bob.bio.base.pipelines.vanilla_biometrics.uncompress_unpickle(path)[source]
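For illustration (the path and object below are hypothetical):

>>> from bob.bio.base.pipelines.vanilla_biometrics import pickle_compress, uncompress_unpickle
>>> pickle_compress("reference.pickle.gz", {"scores": [0.1, 0.2]})
>>> obj = uncompress_unpickle("reference.pickle.gz")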
class bob.bio.base.database.BioDatabase(name, all_files_options={}, extractor_training_options={}, projector_training_options={}, enroller_training_options={}, check_original_files_for_existence=False, original_directory=None, original_extension=None, annotation_directory=None, annotation_extension=None, annotation_type=None, protocol='Default', training_depends_on_protocol=False, models_depend_on_protocol=False, **kwargs)

Bases: bob.db.base.FileDatabase

This class represents the basic API for database access. Please use this class as a base class for your database access classes. Do not forget to call the constructor of this base class in your derived class.

Parameters:

name : str A unique name for the database.

all_files_options : dict Dictionary of options passed to the bob.bio.base.database.BioDatabase.objects() database query when retrieving all data.

extractor_training_options : dict Dictionary of options passed to the bob.bio.base.database.BioDatabase.objects() database query used to retrieve the files for the extractor training.

projector_training_options : dict Dictionary of options passed to the bob.bio.base.database.BioDatabase.objects() database query used to retrieve the files for the projector training.

enroller_training_options : dict Dictionary of options passed to the bob.bio.base.database.BioDatabase.objects() database query used to retrieve the files for the enroller training.

check_original_files_for_existence : bool Enables to test for the original data files when querying the database.

original_directory : str The directory where the original data of the database are stored.

original_extension : str The file name extension of the original data.

annotation_directory : str The directory where the image annotations of the database are stored, if any.

annotation_extension : str The file name extension of the annotation files.

annotation_type : str The type of the annotation file to read, see bob.db.base.read_annotation_file for accepted formats.

protocol : str or None The name of the protocol that defines the default experimental setup for this database.

training_depends_on_protocol : bool Specifies, if the training set used for training the extractor and the projector depend on the protocol. This flag is used to avoid re-computation of data when running on the different protocols of the same database.

models_depend_on_protocol : bool Specifies, if the models depend on the protocol. This flag is used to avoid re-computation of models when running on the different protocols of the same database.

kwargs : key=value pairs The arguments of the Database base class constructor.
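For illustration, a derived database only has to implement the two abstract methods listed further below; a skeleton (MyBioDatabase is hypothetical) could look like this:

from bob.bio.base.database import BioDatabase

class MyBioDatabase(BioDatabase):

    def __init__(self, **kwargs):
        super(MyBioDatabase, self).__init__(name="my-db", **kwargs)

    def model_ids_with_protocol(self, groups=None, protocol=None, **kwargs):
        ...  # return the (unique) model ids for the given groups and protocol

    def objects(self, groups=None, protocol=None, purposes=None, model_ids=None, **kwargs):
        ...  # return a list of bob.bio.base.database.BioFile fulfilling the restrictions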

all_files(groups=None) → files[source]

Returns all files of the database, respecting the current protocol. The files can be limited using the all_files_options in the constructor.

Parameters:

groups : some of ('world', 'dev', 'eval') or None

The groups to get the data for. If None, data for all groups is returned.

kwargs: ignored

Returns:

files : [bob.bio.base.database.BioFile]

The sorted and unique list of all files of the database.

annotations(file)[source]

Returns the annotations for the given File object, if available. You need to override this method in your high-level implementation. If your database does not have annotations, it should return None.

Parameters:

file : bob.bio.base.database.BioFile

The file for which annotations should be returned.

Returns:

annots : dict or None

The annotations for the file, if available.

arrange_by_client(files) → files_by_client[source]

Arranges the given list of files by client id. This function returns a list of lists of File’s.

Parameters:

files : [bob.bio.base.database.BioFile]

A list of files that should be split up by BioFile.client_id.

Returns:

files_by_client : [[bob.bio.base.database.BioFile]]

The list of lists of files, where each sub-list groups the files with the same BioFile.client_id

client_id_from_model_id(model_id, group='dev')[source]

Return the client id associated with the given model id. In this base class implementation, it is assumed that only one model is enrolled for each client and, thus, client id and model id are identical. All keyword arguments are ignored. Please override this function in derived class implementations to change this behavior.

enroll_files(model_id, group='dev') → files[source]

Returns a list of File objects that should be used to enroll the model with the given model id from the given group, respecting the current protocol. If the model_id is None (the default), enrollment files for all models are returned.

Parameters:

model_id : int or str

A unique ID that identifies the model.

group : one of ('dev', 'eval')

The group to get the enrollment files for.

Returns:

files : [bob.bio.base.database.BioFile]

The list of files used to enroll the model with the given model id.

file_names(files, directory, extension) → paths[source]

Returns the full path of the given File objects.

Parameters:

files : [bob.bio.base.database.BioFile]

The list of file objects to retrieve the file names for.

directory : str

The base directory, where the files can be found.

extension : str

The file name extension to add to all files.

Returns:

paths : [str] or [[str]]

The paths extracted for the files, in the same order. If this database provides file sets, a list of lists of file names is returned, one sub-list for each file set.

groups(protocol=None)[source]

Returns the names of all registered groups in the database

Keyword parameters:

protocol: str

The protocol for which the groups should be retrieved. If you do not have protocols defined, just ignore this field.

model_ids(group='dev') → ids[source]

Returns a list of model ids for the given group, respecting the current protocol.

Parameters:

group : one of ('dev', 'eval')

The group to get the model ids for.

Returns:

ids : [int] or [str]

The list of (unique) model ids for models of the given group.

abstract model_ids_with_protocol(groups=None, protocol=None, **kwargs) → ids[source]

Returns a list of model ids for the given groups and given protocol.

Parameters:

groups : one or more of ('world', 'dev', 'eval')

The groups to get the model ids for.

protocol: a protocol name

Returns:

ids : [int] or [str]

The list of (unique) model ids for the given groups.

object_sets(groups=None, protocol=None, purposes=None, model_ids=None, **kwargs)[source]

This function returns lists of FileSet objects, which fulfill the given restrictions.

Keyword parameters:

groups : str or [str]

The groups of which the clients should be returned. Usually, groups are one or more elements of (‘world’, ‘dev’, ‘eval’)

protocol

The protocol for which the clients should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.

purposes : str or [str]

The purposes for which File objects should be retrieved. Usually, purposes are one of (‘enroll’, ‘probe’).

model_ids : [various type]

The model ids for which the File objects should be retrieved. What defines a ‘model id’ is dependent on the database. In cases, where there is only one model per client, model ids and client ids are identical. In cases, where there is one model per file, model ids and file ids are identical. But, there might also be other cases.

abstract objects(groups=None, protocol=None, purposes=None, model_ids=None, **kwargs)[source]

This function returns a list of bob.bio.base.database.BioFile objects or the list of objects which inherit from this class. Returned files fulfill the given restrictions.

Keyword parameters:

groups : str or [str]

The groups of which the clients should be returned. Usually, groups are one or more elements of (‘world’, ‘dev’, ‘eval’)

protocol

The protocol for which the clients should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.

purposes : str or [str]

The purposes for which File objects should be retrieved. Usually, purposes are one of (‘enroll’, ‘probe’).

model_ids : [various type]

The model ids for which the File objects should be retrieved. What defines a ‘model id’ is dependent on the database. In cases, where there is only one model per client, model ids and client ids are identical. In cases, where there is one model per file, model ids and file ids are identical. But, there might also be other cases.

probe_file_sets(model_id=None, group='dev') → files[source]

Returns a list of probe FileSet objects, respecting the current protocol. If a model_id is specified, only the probe files that should be compared with the given model id are returned (for most databases, these are all probe files of the given group). Otherwise, all probe files of the given group are returned.

Parameters:

model_id : int or str or None

A unique ID that identifies the model.

group : one of ('dev', 'eval')

The group to get the probe files for.

Returns:

files : [bob.bio.base.database.BioFileSet] or something similar

The list of file sets used to probe the model with the given model id.

probe_files(model_id=None, group='dev') → files[source]

Returns a list of probe File objects, respecting the current protocol. If a model_id is specified, only the probe files that should be compared with the given model id are returned (for most databases, these are all probe files of the given group). Otherwise, all probe files of the given group are returned.

Parameters:

model_idint or str or None

A unique ID that identifies the model.

groupone of ('dev', 'eval')

The group to get the enrollment files for.

Returns:

files[bob.bio.base.database.BioFile]

The list of files used for to probe the model with the given model id.

replace_directories(replacements=None)[source]

This helper function replaces the original_directory and the annotation_directory of the database with the directories read from the given replacement file.

This function is provided for convenience, so that the database configuration files do not need to be modified. Instead, this function uses the given dictionary of replacements to change the original directory and the original extension (if given).

The given replacements can be of type dict, including all replacements, or a file name (as a str), in which case the file is read. The structure of the file should be:

# Comments starting with # and empty lines are ignored

[YOUR_..._DATA_DIRECTORY] = /path/to/your/data
[YOUR_..._ANNOTATION_DIRECTORY] = /path/to/your/annotations

If no annotation files are available (e.g. when they are stored inside the database), the annotation directory can be left out.

Parameters:

replacements : dict or str

A dictionary with replacements, or a name of a file to read the dictionary from. If the file name does not exist, no directories are replaced.
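
As an illustrative sketch (not part of the API reference), both calling conventions look like this; the db instance, the placeholder keys, and the paths are assumptions of this example:

# Assuming `db` is an already-constructed database instance (a BioDatabase
# subclass); keys and paths are placeholders for your local setup.
db.replace_directories(
    replacements={
        "[YOUR_MY_DATASET_DATA_DIRECTORY]": "/path/to/your/data",
        "[YOUR_MY_DATASET_ANNOTATION_DIRECTORY]": "/path/to/your/annotations",
    }
)

# Or read the replacements from a file with the structure shown above:
db.replace_directories(replacements="/path/to/replacements.txt")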

test_files(groups=['dev']) → files[source]

Returns all test files (i.e., files used for enrollment and probing) for the given groups, respecting the current protocol. The files for the steps can be limited using the all_files_options defined in the constructor.

Parameters:

groups : some of ('dev', 'eval')

The groups to get the data for.

Returns:

files : [bob.bio.base.database.BioFile]

The sorted and unique list of test files of the database.

training_files(step=None, arrange_by_client=False) → files[source]

Returns all training files for the given step, and arranges them by client, if desired, respecting the current protocol. The files for the steps can be limited using the ..._training_options defined in the constructor.

Parameters:

step : one of ('train_extractor', 'train_projector', 'train_enroller') or None

The step for which the training data should be returned.

arrange_by_client : bool

Should the training files be arranged by client? If set to True, training files will be returned in [[bob.bio.base.database.BioFile]], where each sub-list contains the files of a single client. Otherwise, all files will be stored in a simple [bob.bio.base.database.BioFile].

Returns:

files : [bob.bio.base.database.BioFile] or [[bob.bio.base.database.BioFile]]

The (arranged) list of files used for the training of the given step.

uses_probe_file_sets(protocol=None)[source]

Defines whether, for the current protocol, the database uses several probe files to generate a score. Returns True if the given protocol specifies file sets for probes, instead of a single probe file. This default implementation always returns False. If you need different behavior, please overload this function in your derived class.

class bob.bio.base.database.BioFile(client_id, path, file_id=None, original_directory=None, original_extension=None, annotation_directory=None, annotation_extension=None, annotation_type=None, **kwargs)

Bases: bob.db.base.File, bob.pipelines.sample._ReprMixin

A simple base class that defines basic properties of a File object for use in verification experiments

client_id

The id of the client this file belongs to. Its type depends on your implementation. If you use an SQL database, this should be an SQL type like Integer or String.

Type

str or int

path

see bob.db.base.File constructor

Type

object

file_id

see bob.db.base.File constructor

Type

object

original_directory

The path to the original directory of the file

Type

str or None

original_extension

The extension of the original files. This attribute is deprecated. Please try to include the extension in the path attribute

Type

str or None

annotation_directory

The path to the directory of the annotations

Type

str or None

annotation_extension

The extension of annotation files. Default is .json

Type

str or None

annotation_type

The type of the annotation file, see bob.db.base.annotations.read_annotation_file(). Default is json.

Type

str or None

property annotations
load(original_directory=None, original_extension=None)[source]

Loads the data at the specified location and using the given extension. Override it if you need to load differently.

Parameters
  • original_directory (str (optional)) – The path to the root of the dataset structure. If None, will try to use self.original_directory.

  • original_extension (str (optional)) – The filename extension of every file in the dataset. If None, will try to use self.original_extension.

Returns

The loaded data (normally numpy.ndarray).

Return type

object

class bob.bio.base.database.BioFileSet(file_set_id, files, path=None, **kwargs)

Bases: bob.bio.base.database.BioFile

This class defines the minimum interface of a set of database files that needs to be exported. Use this class, whenever the database provides several files that belong to the same probe. Each file set has an id, and a list of associated files, which are of type bob.bio.base.database.BioFile of the same client. The file set id can be anything hashable, but needs to be unique all over the database.

Parameters
  • file_set_id (str or int) – A unique ID that identifies the file set.

  • files ([bob.bio.base.database.BioFile]) – A non-empty list of BioFile objects that should be stored inside this file set. All files of that list need to have the same client ID.
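
A minimal usage sketch; the client and file identifiers below are hypothetical:

from bob.bio.base.database import BioFile, BioFileSet

# Two probe files belonging to the same client; all files in a set must
# share the same client ID.
files = [
    BioFile(client_id=1, path="probes/client1_shot1", file_id=101),
    BioFile(client_id=1, path="probes/client1_shot2", file_id=102),
]

# The file set id must be unique over the whole database.
probe_set = BioFileSet(file_set_id="client1_probeset", files=files)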

class bob.bio.base.database.CSVDataset(*, name, protocol, dataset_protocol_path, csv_to_sample_loader=None, is_sparse=False, allow_scoring_with_all_biometric_references=False, group_probes_by_reference_id=False, **kwargs)

Bases: bob.bio.base.pipelines.vanilla_biometrics.Database

Generic filelist dataset for the bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline pipeline. Check Vanilla Biometrics: Advanced features for more details about the Vanilla Biometrics Dataset interface.

To create a new dataset, you need to provide a directory structure similar to the one below:

my_dataset/
my_dataset/my_protocol/norm/train_world.csv
my_dataset/my_protocol/dev/for_models.csv
my_dataset/my_protocol/dev/for_probes.csv
my_dataset/my_protocol/eval/for_models.csv
my_dataset/my_protocol/eval/for_probes.csv
...

In the directory structure above, my_dataset should contain one directory for each evaluation protocol this dataset might have. Each my_protocol directory should contain at least two csv files:

  • for_models.csv

  • for_probes.csv

Those csv files should contain, in each row, i) the path to the raw data and ii) the reference_id label used for enrollment (bob.bio.base.pipelines.vanilla_biometrics.Database.references) and probing (bob.bio.base.pipelines.vanilla_biometrics.Database.probes). The structure of each CSV file should be as below:

PATH,reference_id
path_1,reference_id_1
path_2,reference_id_2
path_i,reference_id_j
...

You might want to ship metadata within your Samples (e.g. gender, age, annotations, …). Doing so is simple; just arrange your CSV file as below:

PATH,reference_id,METADATA_1,METADATA_2,METADATA_k
path_1,reference_id_1,A,B,C
path_2,reference_id_2,A,B,1
path_i,reference_id_j,2,3,4
...

The files my_dataset/my_protocol/eval_enroll.csv and my_dataset/my_protocol/eval_probe.csv are optional and are only used in case a protocol contains data for evaluation.

Finally, the content of the file my_dataset/my_protocol/train.csv is used in case a protocol contains data for training (bob.bio.base.pipelines.vanilla_biometrics.Database.background_model_samples)

Parameters
  • dataset_protocol_path (str) – Absolute path or a tarball of the dataset protocol description.

  • protocol_name (str) – The name of the protocol

  • csv_to_sample_loader (bob.pipelines.sample_loaders.CSVToSampleLoader) – Base class whose objective is to generate bob.pipelines.Sample and/or bob.pipelines.SampleSet from csv rows
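
A minimal construction sketch, assuming the directory layout shown above exists under /path/to/my_dataset; all names and paths are placeholders:

from bob.bio.base.database import CSVDataset

database = CSVDataset(
    name="my_dataset",
    protocol="my_protocol",
    dataset_protocol_path="/path/to/my_dataset",
)

train = database.background_model_samples()    # from norm/train_world.csv
references = database.references(group="dev")  # from dev/for_models.csv
probes = database.probes(group="dev")          # from dev/for_probes.csv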

all_samples(groups=None)[source]

Reads and returns all the samples in groups.

Parameters

groups (list or None) – Groups to consider (‘train’, ‘dev’, and/or ‘eval’). If None is given, returns the samples from all groups.

Returns

samples – List of bob.pipelines.Sample objects.

Return type

list

background_model_samples()[source]

Returns bob.pipelines.Sample’s to train a background model

Returns

samples – List of samples for background model training.

Return type

list

groups()[source]

This function returns the list of groups for this database.

Returns

A list of groups

Return type

[str]

probes(group='dev')[source]

Returns probes to score biometric references

Parameters

group (str) – Limits samples to this group

Returns

probes – List of samples for the creation of biometric probes.

Return type

list

protocols()[source]
references(group='dev')[source]

Returns references to enroll biometric references

Parameters

group (str, optional) – Limits samples to this group

Returns

references – List of samples for the creation of biometric references.

Return type

list

class bob.bio.base.database.CSVDatasetCrossValidation(*, name, protocol='Default', csv_file_name='metadata.csv', random_state=0, test_size=0.8, samples_for_enrollment=1, csv_to_sample_loader=None, allow_scoring_with_all_biometric_references=True, group_probes_by_reference_id=False, **kwargs)

Bases: bob.bio.base.pipelines.vanilla_biometrics.Database

Generic filelist dataset for bob.bio.base.pipelines.vanilla_biometrics.VanillaBiometricsPipeline pipeline that handles CROSS VALIDATION.

Check Vanilla Biometrics: Advanced features for more details about the Vanilla Biometrics Dataset interface.

This interface will take one csv_file as input and split it into i) data for training and ii) data for testing. The data for testing will be further split into data for enrollment and data for probing. The input CSV file should have the following format:

PATH,reference_id
path_1,reference_id_1
path_2,reference_id_2
path_i,reference_id_j
...
Parameters
  • csv_file_name (str) – CSV file containing all the samples from your database

  • random_state (int) – Pseudo-random number generator seed

  • test_size (float) – Percentage of the reference_ids used for testing

  • samples_for_enrollment (float) – Number of samples used for enrollment

  • csv_to_sample_loader (bob.pipelines.sample_loaders.CSVToSampleLoader) – Base class whose objective is to generate bob.pipelines.Sample and/or bob.pipelines.SampleSet from csv rows
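
A minimal construction sketch; the CSV path and the split parameters are placeholders:

from bob.bio.base.database import CSVDatasetCrossValidation

database = CSVDatasetCrossValidation(
    name="my_dataset",
    csv_file_name="/path/to/metadata.csv",
    random_state=0,            # seed for a reproducible split
    test_size=0.8,             # fraction of reference_ids held out for testing
    samples_for_enrollment=1,  # samples per reference_id used for enrollment
)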

all_samples(groups=None)[source]

Reads and returns all the samples in groups.

Parameters

groups (list or None) – Groups to consider (‘train’ and/or ‘dev’). If None is given, returns the samples from all groups.

Returns

samples – List of bob.pipelines.Sample objects.

Return type

list

background_model_samples()[source]

Returns bob.pipelines.Sample’s to train a background model

Returns

samples – List of samples for background model training.

Return type

list

groups()[source]
probes(group='dev')[source]

Returns probes to score biometric references

Parameters

group (str) – Limits samples to this group

Returns

probes – List of samples for the creation of biometric probes.

Return type

list

protocols()[source]
references(group='dev')[source]

Returns references to enroll biometric references

Parameters

group (str, optional) – Limits samples to this group

Returns

references – List of samples for the creation of biometric references.

Return type

list

class bob.bio.base.database.CSVDatasetZTNorm(**kwargs)[source]

Bases: bob.bio.base.database.CSVDataset

Generic filelist dataset for bob.bio.base.pipelines.vanilla_biometrics.ZTNormPipeline pipelines. Check Vanilla Biometrics: Advanced features for more details about the Vanilla Biometrics Dataset interface.

This dataset interface takes a CSVDataset as input and adds two extra methods: CSVDatasetZTNorm.zprobes and CSVDatasetZTNorm.treferences.

To create a new dataset, you need to provide a directory structure similar to the one below:

my_dataset/
my_dataset/my_protocol/norm/train_world.csv
my_dataset/my_protocol/norm/for_znorm.csv
my_dataset/my_protocol/norm/for_tnorm.csv
my_dataset/my_protocol/dev/for_models.csv
my_dataset/my_protocol/dev/for_probes.csv
my_dataset/my_protocol/eval/for_models.csv
my_dataset/my_protocol/eval/for_probes.csv
Parameters

database (CSVDataset) – CSVDataset to be aggregated

zprobes(group='dev', proportion=1.0)[source]
treferences(covariate='sex', proportion=1.0)[source]
class bob.bio.base.database.CSVToSampleLoaderBiometrics(data_loader, dataset_original_directory='', extension='', reference_id_equal_subject_id=True)

Bases: bob.pipelines.sample_loaders.CSVToSampleLoader

Base class that converts the lines of a CSV file, like the one below, to bob.pipelines.DelayedSample or bob.pipelines.SampleSet

PATH,REFERENCE_ID
path_1,reference_id_1
path_2,reference_id_2
path_i,reference_id_j
...
Parameters
  • data_loader – A Python function called to load the sample in question from whatever medium

  • dataset_original_directory (str) – Path of where data is stored

  • extension (str) – Default file extension
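
A minimal sketch, assuming the raw samples are stored as NumPy .npy files; the loader function and paths are assumptions of this example:

import numpy
from bob.bio.base.database import CSVToSampleLoaderBiometrics

loader = CSVToSampleLoaderBiometrics(
    data_loader=numpy.load,                     # used to read each sample
    dataset_original_directory="/path/to/data",
    extension=".npy",
)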

convert_row_to_sample(row, header)[source]
class bob.bio.base.database.FileListBioDatabase(filelists_directory, name, protocol=None, bio_file_class=<class 'bob.bio.base.database.BioFile'>, original_directory=None, original_extension=None, annotation_directory=None, annotation_extension='.pos', annotation_type='eyecenter', dev_sub_directory=None, eval_sub_directory=None, world_filename=None, optional_world_1_filename=None, optional_world_2_filename=None, models_filename=None, probes_filename=None, scores_filename=None, tnorm_filename=None, znorm_filename=None, use_dense_probe_file_list=None, keep_read_lists_in_memory=True, **kwargs)

Bases: bob.bio.base.database.ZTBioDatabase

This class provides a user-friendly interface to databases that are given as file lists.

Parameters
  • filelists_directory (str) – The directory that contains the filelists defining the protocol(s). If you use the protocol attribute when querying the database, it will be appended to the base directory, such that several protocols are supported by the same class instance of bob.bio.base.

  • name (str) – The name of the database

  • protocol (str) – The protocol of the database. This should be a folder inside filelists_directory.

  • bio_file_class (class) – The class that should be used to return the files. This can be bob.bio.base.database.BioFile, bob.bio.spear.database.AudioBioFile, bob.bio.face.database.FaceBioFile, or anything similar.

  • original_directory (str or None) – The directory, where the original data can be found.

  • original_extension (str or [str] or None) – The filename extension of the original data, or multiple extensions.

  • annotation_directory (str or None) – The directory, where additional annotation files can be found.

  • annotation_extension (str or None) – The filename extension of the annotation files.

  • annotation_type (str or None) – The type of annotation that can be read. Currently, options are 'eyecenter', 'named', 'idiap'. See bob.db.base.read_annotation_file() for details.

  • dev_sub_directory (str or None) – Specify a custom subdirectory for the filelists of the development set (default is 'dev')

  • eval_sub_directory (str or None) – Specify a custom subdirectory for the filelists of the evaluation set (default is 'eval')

  • world_filename (str or None) – Specify a custom filename for the training filelist (default is 'norm/train_world.lst')

  • optional_world_1_filename (str or None) – Specify a custom filename for the (first optional) training filelist (default is 'norm/train_optional_world_1.lst')

  • optional_world_2_filename (str or None) – Specify a custom filename for the (second optional) training filelist (default is 'norm/train_optional_world_2.lst')

  • models_filename (str or None) – Specify a custom filename for the model filelists (default is 'for_models.lst')

  • probes_filename (str or None) – Specify a custom filename for the probes filelists (default is 'for_probes.lst')

  • scores_filename (str or None) – Specify a custom filename for the scores filelists (default is 'for_scores.lst')

  • tnorm_filename (str or None) – Specify a custom filename for the T-norm scores filelists (default is 'for_tnorm.lst')

  • znorm_filename (str or None) – Specify a custom filename for the Z-norm scores filelists (default is 'for_znorm.lst')

  • use_dense_probe_file_list (bool or None) – Specify which list to use among probes_filename (dense) or scores_filename. If None, it is estimated based on the given parameters.

  • keep_read_lists_in_memory (bool) – If set to True (the default), the lists are read only once and stored in memory. Otherwise the lists will be re-read for every query (not recommended).
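
A minimal construction sketch; the directories, database name, and protocol below are placeholders:

from bob.bio.base.database import FileListBioDatabase

database = FileListBioDatabase(
    filelists_directory="/path/to/filelists",  # contains my_protocol/... lists
    name="my_dataset",
    protocol="my_protocol",
    original_directory="/path/to/data",
    original_extension=".png",
)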

all_files(groups=['dev'], add_zt_files=True)[source]

Returns all files for the given group. The internally stored protocol is used, throughout.

Parameters
  • groups ([str]) – A list of groups to retrieve the files for.

  • add_zt_files (bool) – If selected, also files for ZT-norm scoring will be added. Please select this option only if this dataset provides ZT-norm files, see implements_zt().

Returns

A list of all files that fulfill your query.

Return type

[BioFile]

annotations(file)[source]

Reads the annotations for the given file id from file and returns them in a dictionary.

Parameters

file (BioFile) – The BioFile object for which the annotations should be read.

Returns

The annotations as a dictionary, e.g.: {'reye':(re_y,re_x), 'leye':(le_y,le_x)}

Return type

dict

client_id_from_model_id(model_id, group='dev')[source]

Returns the client id that is connected to the given model id.

Parameters
  • model_id (str or None) – The model id for which the client id should be returned.

  • groups (str or [str] or None) – (optional) The groups the client belongs to. Might be one or more of ('dev', 'eval', 'world', 'optional_world_1', 'optional_world_2'). If groups are given, only these groups are considered.

  • protocol (str or None) – The protocol to consider.

Returns

The client id for the given model id, if found.

Return type

str

client_id_from_t_model_id(t_model_id, group='dev')[source]

Returns the client id that is connected to the given T-Norm model id.

Parameters
  • model_id (str or None) – The model id for which the client id should be returned.

  • groups (str or [str] or None) – (optional) The groups the client belongs to. Might be one or more of ('dev', 'eval'). If groups are given, only these groups are considered.

Returns

The client id for the given model id of a T-Norm model, if found.

Return type

str

client_ids(protocol=None, groups=None)[source]

Returns a list of client ids for the specific query by the user.

Parameters
  • protocol (str or None) – The protocol to consider

  • groups (str or [str] or None) – The groups to which the clients belong ('dev', 'eval', 'world', 'optional_world_1', 'optional_world_2').

Returns

A list containing all the client ids which have the given properties.

Return type

[str]

get_base_directory()[source]

Returns the base directory where the filelists defining the database are located.

groups(protocol=None, add_world=True, add_subworld=True)[source]

This function returns the list of groups for this database.

Parameters
  • protocol (str or None) – The protocol for which the groups should be retrieved. If None, the internally stored protocol is used.

  • add_world (bool) – Add the world groups?

  • add_subworld (bool) – Add the sub-world groups? Only valid, when add_world=True

Returns

A list of groups

Return type

[str]

implements_zt(protocol=None, groups=None)[source]

Checks if the file lists for the ZT score normalization are available.

Parameters
  • protocol (str or None) – The protocol for which the groups should be retrieved.

  • groups (str or [str] or None) – The groups for which the ZT score normalization file lists should be checked ('dev', 'eval').

Returns

True if the all file lists for ZT score normalization exist, otherwise False.

Return type

bool

model_ids_with_protocol(groups=None, protocol=None, **kwargs)[source]

Returns a list of model ids for the specific query by the user.

Parameters
  • protocol (str or None) – The protocol to consider

  • groups (str or [str] or None) – The groups to which the models belong ('dev', 'eval', 'world', 'optional_world_1', 'optional_world_2').

Returns

A list containing all the model ids which have the given properties.

Return type

[str]

objects(groups=None, protocol=None, purposes=None, model_ids=None, classes=None, **kwargs)[source]

Returns a set of bob.bio.base.database.BioFile objects for the specific query by the user.

Parameters
  • protocol (str or None) – The protocol to consider

  • purposes (str or [str] or None) – The purposes required to be retrieved ('enroll', 'probe') or a tuple with several of them. If None is given (this is the default), it is considered the same as a tuple with all possible values. This field is ignored for the data from the 'world', 'optional_world_1', 'optional_world_2' groups.

  • model_ids (str or [str] or None) – Only retrieves the files for the provided list of model ids (claimed client id). If None is given (this is the default), no filter over the model_ids is performed.

  • groups (str or [str] or None) – One of the groups ('dev', 'eval', 'world', 'optional_world_1', 'optional_world_2') or a tuple with several of them. If None is given (this is the default), it is considered to be the existing subset of ('world', 'dev', 'eval').

  • classes (str or [str] or None) –

    The classes (types of accesses) to be retrieved ('client', 'impostor') or a tuple with several of them. If None is given (this is the default), it is considered the same as a tuple with all possible values.

    Note

    Classes are not allowed to be specified when ‘probes_filename’ is used in the constructor.

Returns

A list of BioFile objects considering all the filtering criteria.

Return type

[BioFile]

original_file_name(file, check_existence=True)[source]

Returns the original file name of the given file.

This interface supports several original extensions, so that file lists can contain images of different data types.

When multiple original extensions are specified, this function will check the existence of any of these file names, and return the first one that actually exists. In this case, the check_existence flag is ignored.

Parameters
  • file (BioFile) – The BioFile object for which the file name should be returned.

  • check_existence (bool) – Should the existence of the original file be checked? (Ignored when multiple original extensions were specified in the constructor.)

Returns

The full path of the original data file.

Return type

str

set_base_directory(filelists_directory)[source]

Resets the base directory where the filelists defining the database are located.

tclient_ids(protocol=None, groups=None)[source]

Returns a list of T-Norm client ids for the specific query by the user.

Parameters
  • protocol (str or None) – The protocol to consider

  • groups (str or [str] or None) – The groups to which the clients belong (“dev”, “eval”).

Returns

A list containing all the T-Norm client ids which have the given properties.

Return type

[str]

tmodel_ids_with_protocol(protocol=None, groups=None, **kwargs)[source]

Returns a list of T-Norm model ids for the specific query by the user.

Parameters
  • protocol (str or None) – The protocol to consider

  • groups (str or [str] or None) – The groups to which the models belong ('dev', 'eval').

Returns

A list containing all the T-Norm model ids belonging to the given group.

Return type

[str]

tobjects(groups=None, protocol=None, model_ids=None, **kwargs)[source]

Returns a list of bob.bio.base.database.BioFile objects for enrolling T-norm models for score normalization.

Parameters
  • protocol (str or None) – The protocol to consider

  • model_ids (str or [str] or None) – Only retrieves the files for the provided list of model ids (claimed client id). If None is given (this is the default), no filter over the model_ids is performed.

  • groups (str or [str] or None) – The groups to which the models belong ('dev', 'eval').

Returns

A list of BioFile objects considering all the filtering criteria.

Return type

[BioFile]

uses_dense_probe_file(protocol)[source]

Determines if a dense probe file list is used based on the existence of parameters.

zclient_ids(protocol=None, groups=None)[source]

Returns a list of Z-Norm client ids for the specific query by the user.

Parameters
  • protocol (str or None) – The protocol to consider

  • groups (str or [str] or None) – The groups to which the clients belong (“dev”, “eval”).

Returns

A list containing all the Z-Norm client ids which have the given properties.

Return type

[str]

zobjects(groups=None, protocol=None, **kwargs)[source]

Returns a list of BioFile objects to perform Z-norm score normalization.

Parameters
  • protocol (str or None) – The protocol to consider

  • groups (str or [str] or None) – The groups to which the clients belong ('dev', 'eval').

Returns

A list of File objects considering all the filtering criteria.

Return type

[BioFile]

class bob.bio.base.database.LSTToSampleLoader(data_loader, dataset_original_directory='', extension='')[source]

Bases: bob.pipelines.sample_loaders.CSVToSampleLoader

Simple mechanism that converts the lines of a LST file to bob.pipelines.DelayedSample or bob.pipelines.SampleSet

transform(X)[source]

Transforms one CSV line to ONE bob.pipelines.DelayedSample

Parameters

X – CSV File Object (open file)

convert_row_to_sample(row, header=None)[source]
class bob.bio.base.database.ZTBioDatabase(name, z_probe_options={}, **kwargs)

Bases: bob.bio.base.database.BioDatabase

This class defines another set of abstract functions that need to be implemented if your database provides the interface for computing scores used for ZT-normalization.

all_files(groups=None) → files[source]

Returns all files of the database, including those for ZT norm, respecting the current protocol. The files can be limited using the all_files_options and the z_probe_options in the constructor.

Parameters:

groups : some of ('world', 'dev', 'eval') or None

The groups to get the data for. If None, data for all groups is returned.

add_zt_files : bool

If set (the default), files for ZT score normalization are added.

Returns:

files : [bob.bio.base.database.BioFile]

The sorted and unique list of all files of the database.

client_id_from_t_model_id(t_model_id, group='dev') → client_id[source]

Returns the client id for the given T-Norm model id. In this base class implementation, we just use the BioDatabase.client_id_from_model_id() function. Overload this function if you need another behavior.

Parameters:

t_model_id : int or str

A unique ID that identifies the T-Norm model.

group : one of ('dev', 'eval')

The group to get the client ids for.

Returns:

client_id : [int] or [str]

A unique ID that identifies the client, to which the T-Norm model belongs.

t_enroll_files(t_model_id, group='dev') → files[source]

Returns a list of File objects that should be used to enroll the T-Norm model with the given model id from the given group, respecting the current protocol.

Parameters:

t_model_id : int or str

A unique ID that identifies the model.

group : one of ('dev', 'eval')

The group to get the enrollment files for.

Returns:

files : [bob.bio.base.database.BioFile]

The sorted list of files used to enroll the model with the given model id.

t_model_ids(group='dev') → ids[source]

Returns a list of model ids of T-Norm models for the given group, respecting the current protocol.

Parameters:

group : one of ('dev', 'eval')

The group to get the model ids for.

Returns:

ids : [int] or [str]

The list of (unique) model ids for T-Norm models of the given group.

abstract tmodel_ids_with_protocol(protocol=None, groups=None, **kwargs)[source]

This function returns the ids of the T-Norm models of the given groups for the given protocol.

Keyword parameters:

groups : str or [str]

The groups of which the model ids should be returned. Usually, groups are one or more elements of (‘dev’, ‘eval’)

protocol : str

The protocol for which the model ids should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.

abstract tobjects(groups=None, protocol=None, model_ids=None, **kwargs)[source]

This function returns the File objects of the T-Norm models of the given groups for the given protocol and the given model ids.

Keyword parameters:

groups : str or [str]

The groups of which the model ids should be returned. Usually, groups are one or more elements of (‘dev’, ‘eval’)

protocol : str

The protocol for which the model ids should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.

model_ids : [various type]

The model ids for which the File objects should be retrieved. What defines a ‘model id’ is dependent on the database. In cases where there is only one model per client, model ids and client ids are identical. In cases where there is one model per file, model ids and file ids are identical. But there might also be other cases.

z_probe_file_sets(group='dev') → files[source]

Returns a list of probe FileSet objects used to compute the Z-Norm. This function needs to be implemented in derived class implementations.

Parameters:

group : one of ('dev', 'eval')

The group to get the Z-norm probe files for.

Returns:

files : [bob.bio.base.database.BioFileSet]

The unique list of file sets used to compute the Z-norm.

z_probe_files(group='dev') → files[source]

Returns a list of probe files used to compute the Z-Norm, respecting the current protocol. The Z-probe files can be limited using the z_probe_options in the query to bob.bio.base.database.ZTBioDatabase.z_probe_files()

Parameters:

group : one of ('dev', 'eval')

The group to get the Z-norm probe files for.

Returns:

files : [bob.bio.base.database.BioFile]

The unique list of files used to compute the Z-norm.

abstract zobjects(groups=None, protocol=None, **kwargs)[source]

This function returns the File objects of the Z-Norm impostor files of the given groups for the given protocol.

Keyword parameters:

groups : str or [str]

The groups of which the model ids should be returned. Usually, groups are one or more elements of (‘dev’, ‘eval’)

protocol : str

The protocol for which the model ids should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.

class bob.bio.base.preprocessor.Filename

Bases: bob.bio.base.preprocessor.Preprocessor

This preprocessor is simply passing over the file name, in order to be used in an extractor that loads the data from file.

The file name that will be returned by the read_data() function will contain the path of the bob.bio.base.database.BioFile, but it might contain more paths (such as the --preprocessed-directory passed on command line).

read_data(data_file) → data[source]

Returns the name of the data file without its filename extension.

Parameters:

data_file : str

The name of the preprocessed data file.

Returns:

data : str

The preprocessed data read from file.

write_data(data, data_file)[source]

Does not write any data.

data : any

ignored.

data_file : any

ignored.

class bob.bio.base.preprocessor.Preprocessor(writes_data=True, read_original_data=None, min_preprocessed_file_size=1000, **kwargs)

Bases: object

This is the base class for all preprocessors. It defines the minimum requirements for all derived preprocessor classes.

Parameters:

writes_data : bool

Select whether the preprocessor actually writes preprocessed images, or if it simply returns values.

read_original_data: callable or None

This function is used to read the original data from file. It takes three inputs: A bob.bio.base.database.BioFile (or one of its derivatives), the original directory (as str) and the original extension (as str). If None, the default function bob.bio.base.read_original_data() is used.

min_preprocessed_file_size : int

The minimum file size of saved preprocessed data in bytes. If the saved preprocessed data file size is smaller than this, it is assumed to be a corrupt file and the data will be processed again.

kwargs : key=value pairs

A list of keyword arguments to be written in the __str__ function.
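
For illustration, a minimal derived preprocessor, assuming the conventional __call__(data, annotations) entry point; the class itself is hypothetical:

import numpy
from bob.bio.base.preprocessor import Preprocessor

class MeanSubtraction(Preprocessor):
    # Hypothetical preprocessor that removes the mean of each sample.

    def __call__(self, data, annotations=None):
        # `data` is whatever read_original_data() returned; annotations unused.
        data = numpy.asarray(data, dtype="float64")
        return data - data.mean()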

read_data(data_file) → data[source]

Reads the preprocessed data from file. In this base class implementation, it uses bob.bio.base.load() to do that. If you have a different format, please overwrite this function.

Parameters:

data_file : str or bob.io.base.HDF5File

The file open for reading or the name of the file to read from.

Returns:

data : object (usually numpy.ndarray)

The preprocessed data read from file.

write_data(data, data_file)[source]

Writes the given preprocessed data to a file with the given name. In this base class implementation, we simply use bob.bio.base.save() for that. If you have a different format (e.g. not images), please overwrite this function.

Parameters:

data : object

The preprocessed data, i.e., what is returned from __call__.

data_file : str or bob.io.base.HDF5File

The file open for writing, or the name of the file to write.

class bob.bio.base.extractor.Extractor(requires_training=False, split_training_data_by_client=False, min_extractor_file_size=1000, min_feature_file_size=1000, **kwargs)

Bases: object

This is the base class for all feature extractors. It defines the minimum requirements that a derived feature extractor class needs to implement.

If your derived class requires training, please register this here.

Parameters

requires_training : bool

Set this flag to True if your feature extractor needs to be trained. In that case, please override the train() and load() methods

split_training_data_by_client : bool

Set this flag to True if your feature extractor requires the training data to be split by clients. Ignored, if requires_training is False

min_extractor_file_size : int

The minimum file size of a saved extractor file for extractors that require training in bytes. If the saved file size is smaller than this, it is assumed to be a corrupt file and the extractor will be trained again.

min_feature_file_size : int

The minimum file size of extracted features in bytes. If the saved file size is smaller than this, it is assumed to be a corrupt file and the features will be extracted again.

kwargs : key=value pairs

A list of keyword arguments to be written in the __str__ function.
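
For illustration, a minimal derived extractor that needs no training; the histogram feature is a hypothetical choice:

import numpy
from bob.bio.base.extractor import Extractor

class HistogramExtractor(Extractor):
    def __init__(self, bins=32):
        # No training required, so the default flags are kept; `bins` is
        # forwarded so it shows up in __str__.
        super(HistogramExtractor, self).__init__(bins=bins)
        self.bins = bins

    def __call__(self, data):
        # Fixed-size intensity histogram as the feature vector.
        hist, _ = numpy.histogram(data, bins=self.bins, range=(0.0, 1.0))
        return hist.astype("float64")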

load(extractor_file)[source]

Loads the parameters required for feature extraction from the extractor file. This function usually is only useful in combination with the train() function. In this base class implementation, it does nothing.

Parameters:

extractor_file : str

The file to read the extractor from.

read_feature(feature_file)[source]

Reads the extracted feature from file. In this base class implementation, it uses bob.bio.base.load() to do that. If you have a different format, please overwrite this function.

Parameters:

feature_file : str or bob.io.base.HDF5File

The file open for reading or the name of the file to read from.

Returns:

feature : object (usually numpy.ndarray)

The feature read from file.

train(training_data, extractor_file)[source]

This function can be overwritten to train the feature extractor. If you do this, please also register the function by calling this base class constructor and enabling the training by requires_training = True.

Parameters:

training_data : [object] or [[object]]

A list of preprocessed data that can be used for training the extractor. Data will be provided in a single list, if split_training_features_by_client = False was specified in the constructor, otherwise the data will be split into lists, each of which contains the data of a single (training-)client.

extractor_file : str

The file to write. This file should be readable with the load() function.

write_feature(feature, feature_file)[source]

Writes the given extracted feature to a file with the given name. In this base class implementation, we simply use bob.bio.base.save() for that. If you have a different format, please overwrite this function.

Parameters:

feature : object

The extracted feature, i.e., what is returned from __call__.

feature_file : str or bob.io.base.HDF5File

The file open for writing, or the name of the file to write.

class bob.bio.base.extractor.Linearize(dtype=None)

Bases: bob.bio.base.extractor.Extractor

Extracts features by simply concatenating all elements of the data into one long vector.

If a dtype is specified in the constructor, it is assured that the resulting feature vector has that data type.
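
A short usage sketch:

import numpy
from bob.bio.base.extractor import Linearize

extractor = Linearize(dtype="float64")
image = numpy.arange(12).reshape(3, 4)  # any multi-dimensional array
feature = extractor(image)              # 1D vector of length 12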

load(**kwargs)[source]

Loads the parameters required for feature extraction from the extractor file. This function usually is only useful in combination with the train() function. In this base class implementation, it does nothing.

Parameters:

extractor_file : str

The file to read the extractor from.

train(**kwargs)[source]

This function can be overwritten to train the feature extractor. If you do this, please also register the function by calling this base class constructor and enabling the training by requires_training = True.

Parameters:

training_data : [object] or [[object]]

A list of preprocessed data that can be used for training the extractor. Data will be provided in a single list, if split_training_features_by_client = False was specified in the constructor, otherwise the data will be split into lists, each of which contains the data of a single (training-)client.

extractor_file : str

The file to write. This file should be readable with the load() function.

class bob.bio.base.transformers.AlgorithmTransformer(instance, projector_file=None, **kwargs)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Class that wraps bob.bio.base.algorithm.Algorithm

AlgorithmTransformer.fit maps to bob.bio.base.algorithm.Algorithm.train_projector()

AlgorithmTransformer.transform maps to bob.bio.base.algorithm.Algorithm.project()

Example

Wrapping the LDA algorithm:

>>> from bob.bio.base.pipelines.vanilla_biometrics import AlgorithmTransformer
>>> from bob.bio.base.algorithm import LDA
>>> transformer = AlgorithmTransformer(LDA(use_pinv=True, pca_subspace_dimension=0.90))

Parameters

instance (object) – An instance of bob.bio.base.algorithm.Algorithm

fit(X, y=None)[source]
transform(X, metadata=None)[source]
class bob.bio.base.transformers.ExtractorTransformer(instance, model_path=None, **kwargs)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Scikit learn transformer for bob.bio.base.extractor.Extractor.

Parameters

instance (object) – An instance of bob.bio.base.extractor.Extractor
fit(X, y=None)[source]
transform(X, metadata=None)[source]
class bob.bio.base.transformers.PreprocessorTransformer(instance, **kwargs)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Scikit learn transformer for bob.bio.base.preprocessor.Preprocessor.

Parameters

instance (object) – An instance of bob.bio.base.preprocessor.Preprocessor

fit(X, y=None)[source]
transform(X, annotations=None)[source]
class bob.bio.base.transformers.defaultdict

Bases: dict

defaultdict(default_factory[, …]) –> dict with default factory

The default factory is called without arguments to produce a new value when a key is not present, in __getitem__ only. A defaultdict compares equal to a dict with the same items. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.

copy() → a shallow copy of D.
default_factory

Factory for default value called by __missing__().

bob.bio.base.transformers.split_X_by_y(X, y)[source]
class bob.bio.base.algorithm.Algorithm(performs_projection=False, requires_projector_training=True, split_training_features_by_client=False, use_projected_features_for_enrollment=True, requires_enroller_training=False, multiple_model_scoring='average', multiple_probe_scoring='average', min_projector_file_size=1000, min_projected_file_size=1000, min_enroller_file_size=1000, min_model_file_size=1000, min_t_model_file_size=1000, **kwargs)

Bases: object

This is the base class for all biometric recognition algorithms. It defines the minimum requirements for all derived algorithm classes.

Call the constructor in derived class implementations. If your derived algorithm performs feature projection, please register this here. If it needs training for the projector or the enroller, please set this here, too.

Parameters:

performs_projection : bool

Set to True if your derived algorithm performs a projection. Also implement the project() function, and the load_projector() if necessary.

requires_projector_training : bool

Only valid, when performs_projection = True. Set this flag to False, when the projection is applied, but the projector does not need to be trained.

split_training_features_by_client : bool

Only valid, when performs_projection = True and requires_projector_training = True. If set to True, the train_projector() function will receive a double list (a list of lists) of data (sorted by identity). Otherwise, the train_projector() function will receive data in a single list.

use_projected_features_for_enrollment : bool

Only valid, when performs_projection = True. If set to False, the enrollment is performed using the original features, otherwise the features projected using the project() function are used for model enrollment.

requires_enroller_training : bool

Set this flag to True, when the enroller requires specialized training. Which kind of features are used for training depends on the use_projected_features_for_enrollment flag.

multiple_model_scoring : str or None

The way scores are fused when multiple features are stored in one model. See bob.bio.base.score_fusion_strategy() for possible values.

multiple_probe_scoring : str or None

The way scores are fused when multiple probes are available. See bob.bio.base.score_fusion_strategy() for possible values.

min_projector_file_size : int

The minimum file size of projector_file in bytes. If the saved file is smaller than this, it is assumed to be corrupt and it will be generated again.

min_projected_file_size : int

The minimum file size of projected_file in bytes. If the saved file is smaller than this, it is assumed to be corrupt and it will be generated again.

min_enroller_file_size : int

The minimum file size of enroller_file in bytes. If the saved file is smaller than this, it is assumed to be corrupt and it will be generated again.

min_model_file_size : int

The minimum file size of model_file in bytes. If the saved file is smaller than this, it is assumed to be corrupt and it will be generated again.

kwargs : key=value pairs

A list of keyword arguments to be written in the __str__ function.
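
For illustration, a minimal derived algorithm without projection; the averaging template and the negated Euclidean distance are hypothetical choices:

import numpy
from bob.bio.base.algorithm import Algorithm

class MeanTemplate(Algorithm):
    def __init__(self):
        # No projection and no projector training in this sketch.
        super(MeanTemplate, self).__init__(
            performs_projection=False, requires_projector_training=False
        )

    def enroll(self, enroll_features):
        # Average all enrollment features into a single template.
        return numpy.mean(enroll_features, axis=0)

    def score(self, model, probe):
        # Negative distance, so that higher values mean higher similarity.
        return -numpy.linalg.norm(model - probe)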

enroll(enroll_features) → model[source]

This function will enroll and return the model from the given list of features. It must be overwritten by derived classes.

Parameters:

enroll_features : [object]

A list of features used for the enrollment of one model.

Returns:

model : object

The model enrolled from the enroll_features. Must be writable with the write_model() function and readable with the read_model() function.

load_enroller(enroller_file)[source]

Loads the parameters required for model enrollment from file. This function usually is only useful in combination with the train_enroller() function. This function is always called after calling load_projector(). In this base class implementation, it does nothing.

Parameters:

enroller_file : str

The file to read the enroller from.

load_projector(projector_file)[source]

Loads the parameters required for feature projection from file. This function usually is useful in combination with the train_projector() function. In this base class implementation, it does nothing.

Please register performs_projection = True in the constructor to enable this function.

Parameters:

projector_file : str

The file to read the projector from.

project(feature) → projected[source]

This function will project the given feature. It must be overwritten by derived classes, as soon as performs_projection = True was set in the constructor. It is assured that the load_projector() was called once before the project function is executed.

Parameters:

feature : object

The feature to be projected.

Returns:

projected : object

The projected features. Must be writable with the write_feature() function and readable with the read_feature() function.

read_feature(feature_file) → feature[source]

Reads the projected feature from file. In this base class implementation, it uses bob.io.base.load() to do that. If you have a different format, please overwrite this function.

Please register performs_projection = True in the constructor to enable this function.

Parameters:

feature_file : str or bob.io.base.HDF5File

The file open for reading, or the file name to read from.

Returns:

feature : object

The feature that was read from file.

read_model(model_file) → model[source]

Loads the enrolled model from file. In this base class implementation, it uses bob.io.base.load() to do that.

If you have a different format, please overwrite this function.

Parameters:

model_file : str or bob.io.base.HDF5File

The file open for reading, or the file name to read from.

Returns:

model : object

The model that was read from file.

score(model, probe) → score[source]

This function will compute the score between the given model and probe. It must be overwritten by derived classes.

Parameters:

model : object

The model to compare the probe with. The model was read using the read_model() function.

probe : object

The probe object to compare the model with. The probe was read using the read_feature() function (or the bob.bio.base.extractor.Extractor.read_feature() function, if this algorithm does not perform projection).

Returns:

score : float

A similarity between model and probe. Higher values define higher similarities.

score_for_multiple_models(models, probe) → score[source]

This function computes the score between the given model list and the given probe. In this base class implementation, it computes the scores for each model using the score method, and fuses the scores using the fusion method specified in the constructor of this class. Usually this function is called from derived class score functions.

Parameters:

models : [object]

A list of model objects.

probe : object

The probe object to compare the models with.

Returns:

score : float

The fused similarity between the given models and the probe.

score_for_multiple_probes(model, probes) → score[source]

This function computes the score between the given model and the given probe files. In this base class implementation, it computes the scores for each probe file using the score method, and fuses the scores using the fusion method specified in the constructor of this class.

Parameters:

model : object

A model object to compare the probes with.

probes : [object]

The list of probe objects to compare the model with.

Returns:

score : float

The fused similarity between the given model and the probes.

train_enroller(training_features, enroller_file)[source]

This function can be overwritten to train the model enroller. If you do this, please also register the function by calling this base class constructor and enabling the training by require_enroller_training = True.

Parameters:

training_features : [object] or [[object]]

A list of extracted features that can be used for training the projector. Features will be split into lists, each of which contains the features of a single (training-)client.

enroller_file : str

The file to write. This file should be readable with the load_enroller() function.

train_projector(training_features, projector_file)[source]

This function can be overwritten to train the feature projector. If you do this, please also register the function by calling this base class constructor and enabling the training by requires_projector_training = True.

Parameters:

training_features : [object] or [[object]]

A list of extracted features that can be used for training the projector. Features will be provided in a single list, if split_training_features_by_client = False was specified in the constructor, otherwise the features will be split into lists, each of which contains the features of a single (training-)client.

projector_file : str

The file to write. This file should be readable with the load_projector() function.

write_feature(feature, feature_file)[source]

Saves the given projected feature to a file with the given name. In this base class implementation:

  • If the given feature has a save attribute, it calls feature.save(bob.io.base.HDF5File(feature_file), 'w'). In this case, the given feature_file might be either a file name or a bob.io.base.HDF5File.

  • Otherwise, it uses bob.io.base.save() to do that.

If you have a different format, please overwrite this function.

Please register ‘performs_projection = True’ in the constructor to enable this function.

Parameters:

feature : object

A feature as returned by the project() function, which should be written.

feature_file : str or bob.io.base.HDF5File

The file open for writing, or the file name to write to.

write_model(model, model_file)[source]

Writes the enrolled model to the given file. In this base class implementation:

  • If the given model has a ‘save’ attribute, it calls model.save(bob.io.base.HDF5File(model_file), 'w'). In this case, the given model_file might be either a file name or a bob.io.base.HDF5File.

  • Otherwise, it uses bob.io.base.save() to do that.

If you have a different format, please overwrite this function.

Parameters:

model : object

A model as returned by the enroll function, which should be written.

model_file : str or bob.io.base.HDF5File

The file open for writing, or the file name to write to.

class bob.bio.base.algorithm.Distance(distance_function=<function euclidean>, is_distance_function=True, **kwargs)

Bases: bob.bio.base.algorithm.Algorithm

This class defines a simple distance measure between two features. Independent of the actual shape, each feature vector is treated as a one-dimensional vector, and the specified distance function is used to compute the distance between the two features. If the given distance_function actually computes a distance, we simply return its negative value (as all Algorithms are supposed to return similarity values). If the distance_function computes similarities, the similarity value is returned unaltered.

Parameters:

distance_function : callable

A function taking two 1D arrays and returning a float

is_distance_function : bool

Set this flag to False if the given distance_function computes a similarity value (i.e., higher values are better)

kwargs : key=value pairs

A list of keyword arguments directly passed to the Algorithm base class constructor.
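
For example, a cosine-distance comparator can be built like this (scipy's cosine is a distance, so the default is_distance_function=True negates it into a similarity):

import scipy.spatial.distance
from bob.bio.base.algorithm import Distance

algorithm = Distance(distance_function=scipy.spatial.distance.cosine)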

enroll(enroll_features) → model[source]

Enrolls the model by storing all given input vectors.

Parameters:

enroll_features : [numpy.ndarray]

The list of projected features to enroll the model from.

Returns:

model : 2D numpy.ndarray

The enrolled model.

load_enroller(**kwargs)[source]

Loads the parameters required for model enrollment from file. This function usually is only useful in combination with the train_enroller() function. This function is always called after calling load_projector(). In this base class implementation, it does nothing.

Parameters:

enroller_file : str

The file to read the enroller from.

load_projector(**kwargs)[source]

Loads the parameters required for feature projection from file. This function usually is useful in combination with the train_projector() function. In this base class implementation, it does nothing.

Please register performs_projection = True in the constructor to enable this function.

Parameters:

projector_file : str

The file to read the projector from.

project(feature) → projected[source]

This function will project the given feature. It must be overwritten by derived classes, as soon as performs_projection = True was set in the constructor. It is assured that the load_projector() was called once before the project function is executed.

Parameters:

feature : object

The feature to be projected.

Returns:

projected : object

The projected features. Must be writable with the write_feature() function and readable with the read_feature() function.

read_feature(feature_file) → feature[source]

Reads the projected feature from file. In this base class implementation, it uses bob.io.base.load() to do that. If you have a different format, please overwrite this function.

Please register performs_projection = True in the constructor to enable this function.

Parameters:

feature_file : str or bob.io.base.HDF5File

The file open for reading, or the file name to read from.

Returns:

feature : object

The feature that was read from file.

score(model, probe) → float[source]

Computes the distance of the model to the probe using the distance function specified in the constructor.

Parameters:

model : 2D numpy.ndarray

The model storing all enrollment features

probe : numpy.ndarray

The probe feature vector

Returns:

score : float

A similarity value between model and probe

train_enroller(**kwargs)[source]

This function can be overwritten to train the model enroller. If you do this, please also register the function by calling this base class constructor and enabling the training by require_enroller_training = True.

Parameters:

training_features : [object] or [[object]]

A list of extracted features that can be used for training the projector. Features will be split into lists, each of which contains the features of a single (training-)client.

enroller_file : str

The file to write. This file should be readable with the load_enroller() function.

train_projector(**kwargs)[source]

This function can be overwritten to train the feature projector. If you do this, please also register the function by calling this base class constructor and enabling the training by requires_projector_training = True.

Parameters:

training_features : [object] or [[object]]

A list of extracted features that can be used for training the projector. Features will be provided in a single list, if split_training_features_by_client = False was specified in the constructor, otherwise the features will be split into lists, each of which contains the features of a single (training-)client.

projector_file : str

The file to write. This file should be readable with the load_projector() function.

write_feature(**kwargs)[source]

Saves the given projected feature to a file with the given name. In this base class implementation:

  • If the given feature has a save attribute, it calls feature.save(bob.io.base.HDF5File(feature_file), 'w'). In this case, the given feature_file might be either a file name or a bob.io.base.HDF5File.

  • Otherwise, it uses bob.io.base.save() to do that.

If you have a different format, please overwrite this function.

Please register ‘performs_projection = True’ in the constructor to enable this function.

Parameters:

feature : object

A feature as returned by the project() function, which should be written.

feature_file : str or bob.io.base.HDF5File

The file open for writing, or the file name to write to.

class bob.bio.base.algorithm.LDA(lda_subspace_dimension=None, pca_subspace_dimension=None, use_pinv=False, distance_function=<function euclidean>, is_distance_function=True, uses_variances=False, **kwargs)

Bases: bob.bio.base.algorithm.Algorithm

Computes a linear discriminant analysis (LDA) on the given data, possibly after computing a principal component analysis (PCA).

This algorithm computes an LDA projection (bob.learn.linear.FisherLDATrainer) on the given training features, projects the features to Fisher space and computes the distance of two projected features in Fisher space. For example, the Fisher faces algorithm as proposed by [ZKC+98] can be run with this class.

Additionally, a PCA projection matrix can be computed beforehand, to reduce the dimensionality of the input vectors. In that case, the finally stored projection matrix is the combination of the PCA and LDA projection.

Parameters:

lda_subspace_dimension : int or None

If specified, the LDA subspace will be truncated to the given number of dimensions. By default (None) it is limited to the number of classes in the training set - 1.

pca_subspace_dimension : int or float or None

If specified, a combined PCA + LDA projection matrix will be computed. If specified as int, defines the number of eigenvectors used in the PCA projection matrix. If specified as float (between 0 and 1), the number of eigenvectors is calculated such that the given percentage of variance is kept.

use_pinv : bool

Use the pseudo-inverse to compute the LDA projection matrix? Sometimes, the training fails because it is impossible to invert the covariance matrix. In these cases, you might want to set use_pinv to True, which solves this process, but slows down the processing noticeably.

distance_function : function

A function taking two parameters and returning a float. If uses_variances is set to True, the function is provided with a third parameter, which is the vector of variances (aka. eigenvalues).

is_distance_function : bool

Set this flag to False if the given distance_function computes a similarity value (i.e., higher values are better)

uses_variances : bool

If set to True, the distance_function is provided with a third argument, which is the vector of variances (aka. eigenvalues).

kwargs : key=value pairs

A list of keyword arguments directly passed to the Algorithm base class constructor.
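For illustration, a minimal construction sketch; the subspace sizes are arbitrary values chosen for this sketch, and scipy.spatial.distance.cosine is assumed to be available as a drop-in distance function:

    import scipy.spatial.distance
    from bob.bio.base.algorithm import LDA

    # Reduce to 50 PCA components first, then truncate the LDA
    # subspace to 10 dimensions (arbitrary values for this sketch).
    algorithm = LDA(
        lda_subspace_dimension=10,
        pca_subspace_dimension=50,
        distance_function=scipy.spatial.distance.cosine,
        is_distance_function=True,  # cosine() returns a distance, not a similarity
    )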

enroll(enroll_features) → model[source]

Enrolls the model by storing all given input vectors.

Parameters:

enroll_features : [1D numpy.ndarray]

The list of projected features to enroll the model from.

Returns:

model : 2D numpy.ndarray

The enrolled model.

load_enroller(**kwargs)[source]

Loads the parameters required for model enrollment from file. This function usually is only useful in combination with the train_enroller() function. This function is always called after calling load_projector(). In this base class implementation, it does nothing.

Parameters:

enroller_file : str

The file to read the enroller from.

load_projector(projector_file)[source]

Reads the projection matrix and the eigenvalues from file.

Parameters:

projector_file : str

An existing file, from which the PCA or PCA+LDA projection matrix and the eigenvalues are read.

project(feature) → projected[source]

Projects the given feature into Fisher space.

Parameters:

feature : 1D numpy.ndarray

The 1D feature to be projected.

Returns:

projected : 1D numpy.ndarray

The feature projected into Fisher space.

score(model, probe) → float[source]

Computes the distance of the model to the probe using the distance function specified in the constructor.

Parameters:

model : 2D numpy.ndarray

The model storing all enrollment features.

probe : 1D numpy.ndarray

The probe feature vector in Fisher space.

Returns:

score : float

A similarity value between model and probe

train_enroller(**kwargs)[source]

This function can be overwritten to train the model enroller. If you do this, please also register the function by calling this base class constructor and enabling the training by requires_enroller_training = True.

Parameters:

training_features : [object] or [[object]]

A list of extracted features that can be used for training the enroller. The features will be split into lists, each of which contains the features of a single (training-)client.

enroller_file : str

The file to write. This file should be readable with the load_enroller() function.

train_projector(training_features, projector_file)[source]

Generates the LDA or PCA+LDA projection matrix from the given features (that are sorted by identity).

Parameters:

training_features : [[1D numpy.ndarray]]

A list of lists of 1D training arrays (vectors) to train the LDA projection matrix with. Each sub-list contains the features of one client.

projector_file : str

A writable file, into which the LDA or PCA+LDA projection matrix (as a bob.learn.linear.Machine) and the eigenvalues will be written.

class bob.bio.base.algorithm.PCA(subspace_dimension, distance_function=<function euclidean>, is_distance_function=True, uses_variances=False, **kwargs)

Bases: bob.bio.base.algorithm.Algorithm

Performs a principal component analysis (PCA) on the given data.

This algorithm computes a PCA projection (bob.learn.linear.PCATrainer) on the given training features, projects the features to eigenspace and computes the distance of two projected features in eigenspace. For example, the eigenface algorithm as proposed by [TP91] can be run with this class.

Parameters:

subspace_dimension : int or float

If specified as int, defines the number of eigenvectors used in the PCA projection matrix. If specified as float (between 0 and 1), the number of eigenvectors is calculated such that the given percentage of variance is kept.

distance_function : function

A function that takes two parameters and returns a float. If uses_variances is set to True, the function is provided with a third parameter, which is the vector of variances (aka. eigenvalues).

is_distance_function : bool

Set this flag to False if the given distance_function computes a similarity value (i.e., higher values are better).

uses_variances : bool

If set to True, the distance_function is provided with a third argument, which is the vector of variances (aka. eigenvalues).

kwargs : key=value pairs

A list of keyword arguments directly passed to the Algorithm base class constructor.
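A minimal construction sketch; the subspace values are arbitrary choices for illustration:

    from bob.bio.base.algorithm import PCA

    # Keep enough eigenvectors to retain 95% of the variance
    # (a float between 0 and 1 selects by kept variance) ...
    pca = PCA(subspace_dimension=0.95)

    # ... or keep a fixed number of eigenvectors instead.
    pca_fixed = PCA(subspace_dimension=32)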

enroll(enroll_features) → model[source]

Enrolls the model by storing all given input vectors.

Parameters:

enroll_features : [1D numpy.ndarray]

The list of projected features to enroll the model from.

Returns:

model : 2D numpy.ndarray

The enrolled model.

load_enroller(**kwargs)[source]

Loads the parameters required for model enrollment from file. This function usually is only useful in combination with the train_enroller() function. This function is always called after calling load_projector(). In this base class implementation, it does nothing.

Parameters:

enroller_file : str

The file to read the enroller from.

load_projector(projector_file)[source]

Reads the PCA projection matrix and the eigenvalues from file.

Parameters:

projector_file : str

An existing file, from which the PCA projection matrix and the eigenvalues are read.

project(feature) → projected[source]

Projects the given feature into eigenspace.

Parameters:

feature : 1D numpy.ndarray

The 1D feature to be projected.

Returns:

projected : 1D numpy.ndarray

The feature projected into eigenspace.

score(model, probe) → float[source]

Computes the distance of the model to the probe using the distance function specified in the constructor.

Parameters:

model : 2D numpy.ndarray

The model storing all enrollment features.

probe : 1D numpy.ndarray

The probe feature vector in eigenspace.

Returns:

score : float

A similarity value between model and probe

train_enroller(**kwargs)[source]

This function can be overwritten to train the model enroller. If you do this, please also register the function by calling this base class constructor and enabling the training by requires_enroller_training = True.

Parameters:

training_features : [object] or [[object]]

A list of extracted features that can be used for training the enroller. The features will be split into lists, each of which contains the features of a single (training-)client.

enroller_file : str

The file to write. This file should be readable with the load_enroller() function.

train_projector(training_features, projector_file)[source]

Trains the PCA projection matrix from the given features and writes it, together with the eigenvalues, into the given projector_file.

Parameters:

training_features : [1D numpy.ndarray]

A list of 1D training arrays (vectors) to train the PCA projection matrix with.

projector_file : str

A writable file, into which the PCA projection matrix (as a bob.learn.linear.Machine) and the eigenvalues will be written.

class bob.bio.base.algorithm.PLDA(subspace_dimension_of_f, subspace_dimension_of_g, subspace_dimension_pca=None, plda_training_iterations=200, INIT_SEED=5489, INIT_F_METHOD='BETWEEN_SCATTER', INIT_G_METHOD='WITHIN_SCATTER', INIT_S_METHOD='VARIANCE_DATA', multiple_probe_scoring='joint_likelihood')

Bases: bob.bio.base.algorithm.Algorithm

Tool chain for computing PLDA (over PCA-dimensionality reduced) features

Todo

Add more documentation for the PLDA constructor, i.e., by explaining the parameters

enroll(enroll_features)[source]

Enrolls the model by computing an average of the given input vectors

load_enroller(projector_file)[source]

Reads the PCA projection matrix and the PLDA model from file

load_projector(**kwargs)[source]

Loads the parameters required for feature projection from file. This function is usually useful in combination with the train_projector() function. In this base class implementation, it does nothing.

Please register performs_projection = True in the constructor to enable this function.

Parameters:

projector_file : str

The file to read the projector from.

project(feature) → projected[source]

This function will project the given feature. It must be overwritten by derived classes, as soon as performs_projection = True was set in the constructor. It is assured that load_projector() was called once before the project function is executed.

Parameters:

feature : object

The feature to be projected.

Returns:

projected : object

The projected features. Must be writable with the write_feature() function and readable with the read_feature() function.

read_feature(feature_file) → feature[source]

Reads the projected feature from file. In this base class implementation, it uses bob.io.base.load() to do that. If you have a different format, please overwrite this function.

Please register performs_projection = True in the constructor to enable this function.

Parameters:

feature_file : str or bob.io.base.HDF5File

The file open for reading, or the file name to read from.

Returns:

feature : object

The feature that was read from file.

read_model(model_file)[source]

Reads the model, which in this case is a PLDA-Machine

score(model, probe)[source]

Computes the PLDA score for the given model and probe

score_for_multiple_probes(model, probes)[source]

This function computes the score between the given model and several given probe files. In this base class implementation, it computes the scores for each probe file using the ‘score’ method, and fuses the scores using the fusion method specified in the constructor of this class.

train_enroller(training_features, projector_file)[source]

Generates the PLDA base model from a list of arrays (one per identity), and a set of training parameters. If PCA is requested, it is trained on the same data. Both the trained PLDABase and the PCA machine are written.

train_projector(**kwargs)[source]

This function can be overwritten to train the feature projector. If you do this, please also register the function by calling this base class constructor and enabling the training by requires_projector_training = True.

Parameters:

training_features : [object] or [[object]]

A list of extracted features that can be used for training the projector. The features will be provided in a single list if split_training_features_by_client = False was specified in the constructor; otherwise they will be split into lists, each of which contains the features of a single (training-)client.

projector_file : str

The file to write. This file should be readable with the load_projector() function.

write_feature(**kwargs)[source]

Saves the given projected feature to a file with the given name. In this base class implementation:

  • If the given feature has a save attribute, it calls feature.save(bob.io.base.HDF5File(feature_file), 'w'). In this case, the given feature_file might be either a file name or a bob.io.base.HDF5File.

  • Otherwise, it uses bob.io.base.save() to do that.

If you have a different format, please overwrite this function.

Please register performs_projection = True in the constructor to enable this function.

Parameters:

feature : object

A feature as returned by the project() function, which should be written.

feature_file : str or bob.io.base.HDF5File

The file open for writing, or the file name to write to.

A set of utilities to load score files with different formats.

bob.bio.base.score.load.iscsv(filename)[source]
bob.bio.base.score.load.open_file(filename, mode='rt')[source]

Opens the given score file for reading.

Score files might be raw text files, or a tar-file including a single score file inside.

Parameters

filename (str, file-like) – The name of the score file to open, or a file-like object open for reading. If a file name is given, the corresponding file might be a raw text file or a (compressed) tar file containing a raw text file.

Returns

A read-only file-like object as it would be returned by open().

Return type

file-like

bob.bio.base.score.load.four_column(filename)[source]

Loads a score set from a single file and yields its lines

Loads a score set from a single file and yields its lines one by one (to avoid loading the whole score file into memory at once). This function verifies that all fields are correctly placed and contain valid values. The score file must contain the following information in each line:

claimed_id real_id test_label score
Parameters

filename (str, file-like) – The file object that will be opened with open_file() containing the scores.

Yields

str – The claimed identity – the client name of the model that was used in the comparison

str: The real identity – the client name of the probe that was used in the comparison

str: A label of the probe – usually the probe file name, or the probe id

float: The result of the comparison of the model and the probe
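For illustration, a short sketch of iterating over such a file; the file name is hypothetical:

    from bob.bio.base.score.load import four_column

    # Print the genuine (claimed_id == real_id) comparisons only.
    for claimed_id, real_id, test_label, score in four_column("scores-dev.txt"):
        if claimed_id == real_id:
            print(test_label, score)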

bob.bio.base.score.load.split_four_column(filename)[source]

Loads a score set from a single file and splits the scores

Loads a score set from a single file and splits the scores between negatives and positives. The score file has to respect the 4 column format as defined in the method four_column().

This method avoids loading and allocating memory for the strings present in the file. We only keep the scores.

Parameters

filename (str, file-like) – The file object that will be opened with open_file() containing the scores.

Returns

  • negatives (1D numpy.ndarray of type float) – The scores for which the claimed_id and the real_id are different (see four_column())

  • positives (1D numpy.ndarray of type float) – The scores for which the claimed_id and the real_id are identical (see four_column())
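A usage sketch with a hypothetical file name:

    from bob.bio.base.score.load import split_four_column

    # Split a 4-column score file into impostor (negative)
    # and genuine (positive) scores.
    negatives, positives = split_four_column("scores-dev.txt")
    print(negatives.mean(), positives.mean())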

bob.bio.base.score.load.get_split_dataframe(filename)[source]

Loads a score set that was written with bob.bio.base.pipelines.vanilla_biometrics.CSVScoreWriter

Returns two dataframes, split between positives and negatives.

Parameters

filename (str, file-like) – The file object that will be opened with open_file() containing the scores.

Returns

  • negatives (dataframe) – Contains the scores (and metadata) for which the fields of the bio_ref_subject_id and probe_subject_id columns are different. (see Vanilla Biometrics: Advanced features)

  • positives (dataframe) – Contains the scores (and metadata) for which the fields of the bio_ref_subject_id and probe_subject_id columns are identical. (see Vanilla Biometrics: Advanced features)
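A usage sketch, assuming a CSV score file written by the CSVScoreWriter; the file name and the score column name are assumptions of this sketch:

    from bob.bio.base.score.load import get_split_dataframe

    # Two pandas dataframes: impostor and genuine scores, with metadata.
    negatives_df, positives_df = get_split_dataframe("scores-dev.csv")
    print(positives_df["score"].describe())  # "score" column name is an assumption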

bob.bio.base.score.load.split_csv_scores(filename)[source]

Loads a score set that was written with bob.bio.base.pipelines.vanilla_biometrics.CSVScoreWriter

Parameters

filename (str, file-like) – The file object that will be opened with open_file() containing the scores.

Returns

  • negatives (array) – 1D float array containing the scores for which the fields of the bio_ref_subject_id and probe_subject_id columns are different. (see Vanilla Biometrics: Advanced features)

  • positives (array) – 1D float array containing the scores for which the fields of the bio_ref_subject_id and probe_subject_id columns are identical. (see Vanilla Biometrics: Advanced features)

bob.bio.base.score.load.cmc_four_column(filename)[source]

Loads scores to compute CMC curves from a file in four column format.

The four column file needs to be in the same format as described in four_column(), and the test_label (column 3) has to contain the test/probe file name or a probe id.

This function returns a list of tuples. For each probe file, the tuple consists of a list of negative scores and a list of positive scores. Usually, the list of positive scores should contain only one element, but more are allowed. The result of this function can directly be passed to, e.g., the bob.measure.cmc() function.

Parameters

filename (str, file-like) – The file object that will be opened with open_file() containing the scores.

Returns

A list of tuples, where each tuple contains the negative and positive scores for one probe of the database. Both negatives and positives can be either a 1D numpy.ndarray of type float, or None.

Return type

list

bob.bio.base.score.load.five_column(filename)[source]

Loads a score set from a single file and yields its lines

Loads a score set from a single file and yields its lines one by one (to avoid loading the whole score file into memory at once). This function verifies that all fields are correctly placed and contain valid values. The score file must contain the following information in each line:

claimed_id model_label real_id test_label score
Parameters

filename (str, file-like) – The file object that will be opened with open_file() containing the scores.

Yields

str – The claimed identity – the client name of the model that was used in the comparison

str: A label for the model – usually the model file name, or the model id

str: The real identity – the client name of the probe that was used in the comparison

str: A label of the probe – usually the probe file name, or the probe id

float: The result of the comparison of the model and the probe

bob.bio.base.score.load.split_five_column(filename)[source]

Loads a score set from a single file and splits the scores

Loads a score set from a single file in five column format and splits the scores between negatives and positives. The score file has to respect the 5 column format as defined in the method five_column().

This method avoids loading and allocating memory for the strings present in the file. We only keep the scores.

Parameters

filename (str, file-like) – The file object that will be opened with open_file() containing the scores.

Returns

  • negatives (1D numpy.ndarray of type float) – The scores for which the claimed_id and the real_id are different (see five_column())

  • positives (1D numpy.ndarray of type float) – The scores for which the claimed_id and the real_id are identical (see five_column())

bob.bio.base.score.load.cmc_five_column(filename)[source]

Loads scores to compute CMC curves from a file in five column format.

The five column file needs to be in the same format as described in five_column(), and the test_label (column 4) has to contain the test/probe file name or a probe id.

This function returns a list of tuples. For each probe file, the tuple consists of a list of negative scores and a list of positive scores. Usually, the list of positive scores should contain only one element, but more are allowed. The result of this function can directly be passed to, e.g., the bob.measure.cmc() function.

Parameters

filename (str, file-like) – The file object that will be opened with open_file() containing the scores.

Returns

A list of tuples, where each tuple contains the negative and positive scores for one probe of the database.

Return type

list

bob.bio.base.score.load.scores(filename, ncolumns=None)[source]

Loads the scores from the given score file and yield its lines. Depending on the score file format, four or five elements are yielded, see bob.bio.base.score.load.four_column() and bob.bio.base.score.load.five_column() for details.

Parameters:

filename: str, file-like:

The file object that will be opened with open_file() containing the scores.

ncolumns: any

ignored

Yields:

tuple:

see bob.bio.base.score.load.four_column() or bob.bio.base.score.load.five_column()

bob.bio.base.score.load.split(filename, ncolumns=None, sort=False)[source]

Loads the scores from the given score file and splits them into positives and negatives. Depending on the score file format, it calls bob.bio.base.score.load.split_four_column() or bob.bio.base.score.load.split_five_column(); see those functions for details.

Parameters
  • filename (str) – The path to the score file.

  • ncolumns (int or None) – If specified to be 4 or 5, the score file will be assumed to be in the given format. If not specified, the score file format will be estimated automatically

  • sort (bool, optional) – If True, will return sorted negatives and positives

Returns

  • negatives (1D numpy.ndarray of type float) – This array contains the list of scores, for which the claimed_id and the real_id are different (see four_column())

  • positives (1D numpy.ndarray of type float) – This array contains the list of scores, for which the claimed_id and the real_id are identical (see four_column())
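A usage sketch with a hypothetical file name:

    from bob.bio.base.score.load import split

    # The 4- vs 5-column format is detected automatically when
    # ncolumns is None; sort=True returns sorted arrays.
    negatives, positives = split("scores-dev.txt", sort=True)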

bob.bio.base.score.load.cmc(filename, ncolumns=None) → list[source]

Loads scores to compute CMC curves.

Depending on the score file format, it calls bob.bio.base.score.load.cmc_four_column() or bob.bio.base.score.load.cmc_five_column(); see those functions for details.

Parameters
  • filename (str or file-like) – The file object that will be opened with open_file() containing the scores.

  • ncolumns (int, optional) – If specified to be 4 or 5, the score file will be assumed to be in the given format. If not specified, the score file format will be estimated automatically

Returns

A list of tuples [(neg, pos)], where each tuple contains the negative and positive scores for one probe of the database.

Return type

list
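A sketch of feeding the result to bob.measure, as suggested above; the file name is hypothetical:

    import bob.measure
    from bob.bio.base.score.load import cmc

    # Per-probe (negatives, positives) tuples, directly consumable
    # by bob.measure.cmc().
    cmc_scores = cmc("scores-dev.txt")
    curve = bob.measure.cmc(cmc_scores)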

bob.bio.base.score.load.load_score(filename, ncolumns=None, minimal=False, **kwargs)[source]

Load scores using numpy.loadtxt and return the data as a numpy array.

Parameters
  • filename (str, file-like) – The file object that will be opened with open_file() containing the scores.

  • ncolumns (int, optional) – 4, 5 or None (the default), specifying the number of columns in the score file. If None is provided, the number of columns will be guessed.

  • minimal (bool, optional) – If True, only loads claimed_id, real_id, and scores.

  • **kwargs – Keyword arguments passed to numpy.genfromtxt()

Returns

An array which contains not only the actual scores but also the claimed_id, real_id, test_label and, for five-column files, the model_label

Return type

array
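A usage sketch; the structured field names follow the columns listed above, and the file name is hypothetical:

    from bob.bio.base.score.load import load_score

    score_lines = load_score("scores-dev.txt")
    # Structured array: access columns by field name.
    print(score_lines["claimed_id"][:5], score_lines["score"][:5])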

bob.bio.base.score.load.load_files(filenames, func_load)[source]

Load a list of score files and return a list of tuples of (neg, pos)

Parameters
  • filenames (list) – list of file paths

  • func_load – function that can read files in the list

Returns

list – [(neg, pos)] A list of tuples, where each tuple contains the negative and positive scores for each system/probe.

bob.bio.base.score.load.get_negatives_positives(score_lines)[source]

Take the output of load_score and return negatives and positives. This function aims to replace split_four_column and split_five_column, but takes a different input. It is up to you which one to use.

bob.bio.base.score.load.get_negatives_positives_from_file(filename, **kwargs)[source]

Loads the scores first efficiently and then calls get_negatives_positives

bob.bio.base.score.load.get_negatives_positives_all(score_lines_list)[source]

Take a list of outputs of load_score and return stacked negatives and positives.

bob.bio.base.score.load.get_all_scores(score_lines_list)[source]

Take a list of outputs of load_score and return stacked scores

bob.bio.base.score.load.dump_score(filename, score_lines)[source]

Dump scores that were loaded using load_score(). The number of columns is detected automatically.

bob.bio.base.score.load.split_csv_vuln(filename)[source]

Loads vulnerability scores from a CSV score file.

Returns the scores split between positive and negative as well as licit and presentation attack (spoof).

The CSV must contain a probe_attack_type column with each field either containing a str defining the attack type (spoof), or empty (licit).

Parameters

filename (str) – The path to a CSV file containing all the scores

Returns

split_scores – The licit negative and positive, and spoof scores for probes.

Return type

dict of str: numpy.ndarray

Plots and measures for bob.bio.base

class bob.bio.base.script.figure.Roc(ctx, scores, evaluation, func_load)[source]

Bases: bob.measure.script.figure.Roc

class bob.bio.base.script.figure.Det(ctx, scores, evaluation, func_load)[source]

Bases: bob.measure.script.figure.Det

class bob.bio.base.script.figure.Cmc(ctx, scores, evaluation, func_load)[source]

Bases: bob.measure.script.figure.PlotBase

Handles the plotting of CMC curves

compute(idx, input_scores, input_names)[source]

Plot CMC for dev and eval data using bob.measure.plot.cmc()

class bob.bio.base.script.figure.Dir(ctx, scores, evaluation, func_load)[source]

Bases: bob.measure.script.figure.PlotBase

Handles the plotting of DIR curve

compute(idx, input_scores, input_names)[source]

Plot DIR for dev and eval data using bob.measure.plot.detection_identification_curve()

class bob.bio.base.script.figure.Metrics(ctx, scores, evaluation, func_load, names=('Failure to Acquire', 'False Match Rate', 'False Non Match Rate', 'False Accept Rate', 'False Reject Rate', 'Half Total Error Rate'))[source]

Bases: bob.measure.script.figure.Metrics

Compute metrics from score files

init_process()[source]

Called in MeasureBase().run before iterating through the different systems. Should be reimplemented in derived classes.

compute(idx, input_scores, input_names)[source]

Compute metrics for the given criteria

class bob.bio.base.script.figure.MultiMetrics(ctx, scores, evaluation, func_load)[source]

Bases: bob.measure.script.figure.MultiMetrics

Compute metrics from score files

class bob.bio.base.script.figure.Hist(ctx, scores, evaluation, func_load, nhist_per_system=2)[source]

Bases: bob.measure.script.figure.Hist

Histograms for biometric scores

Click commands for bob.bio.base

bob.bio.base.script.commands.rank_option(**kwargs)[source]

Get option for rank parameter

Generate random scores.

bob.bio.base.script.gen.gen_score_distr(mean_neg, mean_pos, sigma_neg=10, sigma_pos=10, n_neg=5000, n_pos=5000, seed=0)[source]

Generate scores from normal distributions

Parameters
  • mean_neg (float) – Mean for negative scores

  • mean_pos (float) – Mean for positive scores

  • sigma_neg (float) – STDev for negative scores

  • sigma_pos (float) – STDev for positive scores

  • n_pos (int) – The number of positive scores generated

  • n_neg (int) – The number of negative scores generated

  • seed (int) – A value to initialize the Random Number generator. Giving the same value (or not specifying ‘seed’) on two different calls will generate the same lists of scores.

Returns

  • neg_scores (list) – Negative scores

  • pos_scores (list) – Positive scores
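A usage sketch:

    from bob.bio.base.script.gen import gen_score_distr

    # 5000 negative scores around -5 and 5000 positive scores
    # around +5, reproducible through the fixed seed.
    neg, pos = gen_score_distr(mean_neg=-5, mean_pos=5, sigma_neg=1, sigma_pos=1, seed=0)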

bob.bio.base.script.gen.write_scores_to_file(neg, pos, filename, n_subjects=5, n_probes_per_subject=5, n_unknown_subjects=0, neg_unknown=None, to_csv=True, five_col=False, metadata={'meta0': 'data0', 'meta1': 'data1'})[source]

Writes score distributions

Parameters
  • neg (numpy.ndarray) – Scores for negative samples.

  • pos (numpy.ndarray) – Scores for positive samples.

  • filename (str) – The path to write the score to.

  • n_subjects (int) – Number of different subjects

  • n_probes_per_subject (int) – Number of different samples used as probe for each subject

  • n_unknown_subjects (int) – The number of unknown (no registered model) subjects

  • neg_unknown (None or list) – The scores of the unknown subjects

  • to_csv (bool) – Use the CSV format, else the legacy 4 or 5 columns format.

  • five_col (bool) – If True, use the 5-column format; otherwise the 4-column format

bob.bio.base.utils.score_fusion_strategy(strategy_name='average')[source]

Returns a function to compute a fusion strategy between different scores.

Different strategies are employed:

  • 'average' : The averaged score is computed using the numpy.average() function.

  • 'min' : The minimum score is computed using the min() function.

  • 'max' : The maximum score is computed using the max() function.

  • 'median' : The median score is computed using the numpy.median() function.

  • None is also accepted, in which case None is returned.
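A usage sketch; the returned callable is applied to a list of scores:

    from bob.bio.base.utils import score_fusion_strategy

    fuse = score_fusion_strategy("average")
    fused = fuse([0.2, 0.4, 0.9])  # numpy.average -> 0.5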

bob.bio.base.utils.selected_indices(total_number_of_indices, desired_number_of_indices=None)[source]

Returns a list of indices that will contain exactly the number of desired indices (or the number of total items in the list, if this is smaller). These indices are selected such that they are evenly spread over the whole sequence.

bob.bio.base.utils.selected_elements(list_of_elements, desired_number_of_elements=None)[source]

Returns a list of elements that are sub-selected from the given list (or the list itself, if its length is smaller). These elements are selected such that they are evenly spread over the whole list.
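A short sketch of the two helpers:

    from bob.bio.base.utils import selected_elements, selected_indices

    # Pick 3 evenly spread indices out of 6 ...
    indices = selected_indices(6, 3)
    # ... or sub-select the elements directly.
    subset = selected_elements([10, 20, 30, 40, 50, 60], 3)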

bob.bio.base.utils.pretty_print(obj, kwargs)[source]

Returns a pretty-print of the parameters to the constructor of a class, which you should be able to copy-paste on the command line to re-create the object (with few exceptions).

bob.bio.base.utils.is_argument_available(argument, method)[source]

Check if an argument (or keyword argument) is available in a method

bob.bio.base.utils.argument

The name of the argument (or keyword argument).

Type

str

bob.bio.base.utils.method

Pointer to the method

bob.bio.base.utils.resources.valid_keywords = ('database', 'preprocessor', 'extractor', 'algorithm', 'grid', 'config', 'annotator', 'pipeline')

Keywords for which resources are defined.

bob.bio.base.utils.resources.read_config_file(filenames, keyword=None)[source]

Use this function to read the given configuration file. If a keyword is specified, only the configuration according to this keyword is returned. Otherwise a dictionary of the configurations read from the configuration file is returned.

Parameters:

filenames : [str]

A list (potentially empty) of configuration files or resources to read running options from

keyword : str or None

If specified, only the contents of the variable with the given name are returned. If None, the whole configuration is returned (a local namespace)

Returns:

config : object or namespace

If keyword is specified, the object inside the configuration with the given name is returned. Otherwise, the whole configuration is returned (as a local namespace).

bob.bio.base.utils.resources.load_resource(resource, keyword, imports=['bob.bio.base'], package_prefix='bob.bio.', preferred_package=None)[source]

Loads the given resource that is registered with the given keyword. The resource can be:

  1. a resource as defined in the setup.py

  2. a configuration file

  3. a string defining the construction of an object. If imports are required for the construction of this object, they can be given as list of strings.

Parameters:

resource : str

Any string interpretable as a resource (see above).

keyword : str

A valid resource keyword, can be one of bob.bio.base.utils.resources.valid_keywords.

imports : [str]

A list of strings defining which modules to import, when constructing new objects (option 3).

package_prefix : str

Package namespace, in which we search for entry points, e.g., bob.bio.

preferred_package : str or None

When several resources with the same name are found in different packages (e.g., in different bob.bio or other packages), this specifies the preferred package to load the resource from. If not specified, the extension that is not from bob.bio is selected.

Returns:

resource : object

The resulting resource object is returned, either read from file or resource, or created newly.
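A usage sketch; the resource name is hypothetical and must match a registered entry point:

    from bob.bio.base.utils.resources import load_resource

    # Load the algorithm registered under a (hypothetical)
    # entry-point name in the bob.bio namespace.
    algorithm = load_resource("distance-cosine", "algorithm")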

bob.bio.base.utils.resources.extensions(keywords=valid_keywords, package_prefix='bob.bio.') → extensions[source]

Returns a list of packages that define extensions using the given keywords.

Parameters:

keywords : [str]

A list of keywords to load entry points for. Defaults to all bob.bio.base.utils.resources.valid_keywords.

package_prefix : str

Package namespace, in which we search for entry points, e.g., bob.bio.

bob.bio.base.utils.resources.resource_keys(keyword, exclude_packages=[], package_prefix='bob.bio.', strip=['dummy'])[source]

Reads and returns all resources that are registered with the given keyword. Entry points from the given exclude_packages are ignored.

bob.bio.base.utils.resources.list_resources(keyword, strip=['dummy'], package_prefix='bob.bio.', verbose=False, packages=None)[source]

Returns a string containing a detailed list of resources that are registered with the given keyword.

bob.bio.base.utils.resources.database_directories(strip=['dummy'], replacements=None, package_prefix='bob.bio.')[source]

Returns a dictionary of original directories for all registered databases.

bob.bio.base.utils.resources.get_resource_filename(resource_name, group)[source]

Get the file name of a resource.

Parameters
  • resource_name (str) – Name of the resource to be searched

  • group (str) – Entry point group

Returns

filename – The entrypoint file name

Return type

str

bob.bio.base.utils.io.filter_missing_files(file_names, split_by_client=False, allow_missing_files=True)[source]

This function filters out files that do not exist, but only if allow_missing_files is set to True; otherwise the list of file_names is returned unaltered.

bob.bio.base.utils.io.filter_none(data, split_by_client=False)[source]

This function filters out None values from the given list (or list of lists, when split_by_client is enabled).

bob.bio.base.utils.io.check_file(filename, force, expected_file_size=1)[source]

Checks if the file with the given filename exists and has size greater than or equal to expected_file_size. If the file is too small, or if the force option is set to True, the file is removed. This function returns True if the file exists (and has not been removed), otherwise False.

bob.bio.base.utils.io.read_original_data(biofile, directory, extension)[source]

This function reads the original data using the given biofile instance. It simply calls load(directory, extension) from bob.bio.base.database.BioFile or one of its derivatives.

Parameters
  • biofile (bob.bio.base.database.BioFile or one of its derivatives) – The file to read the original data.

  • directory (str) – The base directory of the database.

  • extension (str or None) – The extension of the original data. Might be None if the biofile itself has the extension stored.

Returns

Whatever biofile.load returns; usually a numpy.ndarray

Return type

object

bob.bio.base.utils.io.load(file)[source]

Loads data from file. The given file might be an HDF5 file open for reading or a string.

bob.bio.base.utils.io.save(data, file, compression=0)[source]

Saves the data to file using HDF5. The given file might be an HDF5 file open for writing, or a string. If the given data contains a save method, this method is called with the given HDF5 file. Otherwise the data is written to the HDF5 file using the given compression.

bob.bio.base.utils.io.open_compressed(filename, open_flag='r', compression_type='bz2')[source]

Opens a compressed HDF5File with the given opening flags. For the ‘r’ flag, the given compressed file will be extracted to a local space. For ‘w’, an empty HDF5File is created. In any case, the opened HDF5File is returned, which needs to be closed using the close_compressed() function.

bob.bio.base.utils.io.close_compressed(filename, hdf5_file, compression_type='bz2', create_link=False)[source]

Closes the compressed hdf5_file that was opened with open_compressed. When the file was opened for writing (using the ‘w’ flag in open_compressed), the created HDF5 file is compressed into the given file name. To be able to read the data using the real tools, a link with the correct extension can be created by setting create_link to True.

bob.bio.base.utils.io.load_compressed(filename, compression_type='bz2')[source]

Extracts the data to a temporary HDF5 file using HDF5 and reads its contents. Note that, though the file name is .hdf5, it contains compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’

bob.bio.base.utils.io.save_compressed(data, filename, compression_type='bz2', create_link=False)[source]

Saves the data to a temporary file using HDF5. Afterwards, the file is compressed using the given compression method and saved using the given file name. Note that, though the file name will be .hdf5, it will contain compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’
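A round-trip sketch using the compressed I/O helpers; the file name is hypothetical:

    import numpy
    from bob.bio.base.utils.io import load_compressed, save_compressed

    data = numpy.arange(10.0)
    # Write the array to an HDF5 file and bz2-compress it ...
    save_compressed(data, "feature.hdf5", compression_type="bz2")
    # ... then extract and read it back.
    restored = load_compressed("feature.hdf5", compression_type="bz2")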