Python API for bob.bio.base¶
Generic functions¶
Functions dealing with resources¶
|
Loads the given resource that is registered with the given keyword. |
|
Use this function to read the given configuration file. |
|
Reads and returns all resources that are registered with the given keyword. |
|
Returns a list of packages that define extensions using the given keywords. |
Built-in immutable sequence. |
Miscellaneous functions¶
Returns a string containing the configuration information. |
|
Returns a function to compute a fusion strategy between different scores. |
|
|
Returns a list of elements that are sub-selected from the given list (or the list itself, if its length is smaller). |
|
Returns a list of indices that will contain exactly the number of desired indices (or the number of total items in the list, if this is smaller). |
Tools to run recognition experiments¶
Command line generation¶
Creates an |
|
|
Parses the command line and arranges the arguments accordingly. |
|
Converts the given options to a string that can be executed in a terminal. |
|
Writes information about the current experimental setup into a file specified on command line. |
This class provides shortcuts for selecting different files for different stages of the verification process. |
Controlling of elements¶
Returns the groups, for which the files must be preprocessed, and features must be extracted and projected. |
|
|
This function returns the first and last index for the files for the current job ID. |
Preprocessing¶
|
Preprocesses the original data of the database with the given preprocessor. |
Reads the preprocessed data from |
Feature Extraction¶
Trains the feature extractor using preprocessed data of the |
|
|
Extracts features from the preprocessed data using the given extractor. |
|
Reads the extracted features from |
Algorithm¶
Trains the feature projector using extracted features of the |
|
|
Projects the features for all files of the database. |
|
Trains the model enroller using the extracted or projected features, depending on your setup of the algorithm. |
|
Enrolls the models for the given groups, optionally for both models and T-Norm-models. |
Scoring¶
|
Computes the scores for the given groups. |
|
Concatenates all results into one (or two) score files per group. |
|
Calibrates the score files by learning a linear calibration from the dev files (first element of the groups) and applying it to all groups. |
Compute metrics from score files |
Loading data¶
|
Opens the given score file for reading. |
|
Loads the scores from the given score file and yield its lines. |
|
Loads the scores from the given score file and splits them into positives and negatives. |
|
Loads scores to compute CMC curves. |
|
Loads a score set from a single file and yield its lines |
Loads a score set from a single file and splits the scores |
|
|
Loads scores to compute CMC curves from a file in four column format. |
|
Loads a score set from a single file and yield its lines |
Loads a score set from a single file and splits the scores |
|
|
Loads scores to compute CMC curves from a file in five column format. |
Plotting¶
|
Handles the plotting of CMC
|
Handles the plotting of DIR curve |
|
Histograms for biometric scores |
OpenBR conversions¶
Writes the OpenBR matrix and mask files (version 2), given a score file. |
|
Writes the Bob score file in the desired format from OpenBR files. |
Details¶
-
bob.bio.base.
valid_keywords
¶ Valid keywords, for which resources are defined, are
('database', 'preprocessor', 'extractor', 'algorithm', 'grid')
-
class
bob.bio.base.
Singleton
(decorated)[source]¶ Bases:
object
A non-thread-safe helper class to ease implementing singletons. This should be used as a decorator – not a metaclass – to the class that should be a singleton.
The decorated class can define one __init__ function that takes an arbitrary list of parameters.
To get the singleton instance, use the instance() method. Trying to use __call__ will result in a TypeError being raised.
Limitations:
- The decorated class cannot be inherited from.
- The documentation of the decorated class is replaced with the documentation of this class.
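The decorator pattern described above can be sketched as follows. This is a minimal illustration and not the library's code: the `create` method and the `Settings` class are hypothetical names introduced only for this example.

```python
class Singleton(object):
    """Hypothetical sketch of a decorator-based, non-thread-safe singleton."""

    def __init__(self, decorated):
        self._decorated = decorated
        self._instance = None

    def create(self, *args, **kwargs):
        # Assumption: the instance is built explicitly, forwarding the
        # arguments to the __init__ of the decorated class.
        self._instance = self._decorated(*args, **kwargs)
        return self._instance

    def instance(self):
        # Returns the single instance; it must have been created before.
        if self._instance is None:
            raise RuntimeError("the singleton has not been created yet")
        return self._instance

    def __call__(self, *args, **kwargs):
        # Direct construction is refused, as described in the text above.
        raise TypeError("singletons must be accessed through instance()")


@Singleton
class Settings(object):
    def __init__(self, name):
        self.name = name


Settings.create("demo")
first = Settings.instance()
```

Because the decorator replaces the class with a wrapper object, the name `Settings` no longer refers to a class, which is why the decorated class cannot be inherited from.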
-
bob.bio.base.
check_file
(filename, force, expected_file_size=1)[source]¶ Checks if the file with the given filename exists and has a size greater than or equal to expected_file_size. If the file is too small, or if the force option is set to True, the file is removed. This function returns True if the file exists (and has not been removed), otherwise False.
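The documented behavior can be approximated with the standard library alone. The function below is a sketch under that assumption; `check_file_sketch` is a hypothetical name and not the library's implementation.

```python
import os


def check_file_sketch(filename, force, expected_file_size=1):
    """Returns True if `filename` exists and is kept, False otherwise."""
    if not os.path.exists(filename):
        return False
    too_small = os.path.getsize(filename) < expected_file_size
    if force or too_small:
        # Remove stale or truncated files so that they get regenerated.
        os.remove(filename)
        return False
    return True
```

A processing step would typically skip its work when the check returns True and regenerate the file otherwise.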
-
bob.bio.base.
close_compressed
(filename, hdf5_file, compression_type='bz2', create_link=False)[source]¶ Closes the compressed hdf5_file that was opened with open_compressed. When the file was opened for writing (using the ‘w’ flag in open_compressed), the created HDF5 file is compressed into the given file name. To be able to read the data using the real tools, a link with the correct extension is created when create_link is set to True.
-
bob.bio.base.
database_directories
(strip=['dummy'], replacements=None, package_prefix='bob.bio.')[source]¶ Returns a dictionary of original directories for all registered databases.
-
bob.bio.base.
extensions
(keywords=valid_keywords, package_prefix='bob.bio.') → extensions[source]¶ Returns a list of packages that define extensions using the given keywords.
Parameters:
- keywords ([str])
A list of keywords to load entry points for. Defaults to all valid_keywords.
- package_prefix (str)
Package namespace, in which we search for entry points, e.g., bob.bio.
-
bob.bio.base.
filter_missing_files
(file_names, split_by_client=False, allow_missing_files=True)[source]¶ This function filters out files that do not exist, but only if allow_missing_files is set to True; otherwise, the list of file_names is returned unaltered.
-
bob.bio.base.
filter_none
(data, split_by_client=False)[source]¶ This function filters out None values from the given list (or list of lists, when split_by_client is enabled).
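The effect of split_by_client can be illustrated with a small sketch. It assumes that clients whose entries are all None are kept as empty lists, which the text above does not specify; `filter_none_sketch` is a hypothetical name.

```python
def filter_none_sketch(data, split_by_client=False):
    """Drops None entries from a list, or from each per-client sub-list."""
    if split_by_client:
        # Assumption: empty client lists are kept, not dropped.
        return [[d for d in client if d is not None] for client in data]
    return [d for d in data if d is not None]


flat = filter_none_sketch([1, None, 0])
nested = filter_none_sketch([[1, None], [None]], split_by_client=True)
```

The comparison uses `is not None` so that falsy values such as 0 or empty arrays are not filtered by accident.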
-
bob.bio.base.
is_argument_available
(argument, method)[source]¶ Check if an argument (or keyword argument) is available in a method
-
bob.bio.base.
method
¶ Pointer to the method
-
-
bob.bio.base.
list_resources
(keyword, strip=['dummy'], package_prefix='bob.bio.', verbose=False, packages=None)[source]¶ Returns a string containing a detailed list of resources that are registered with the given keyword.
-
bob.bio.base.
load
(file)[source]¶ Loads data from file. The given file might be an HDF5 file open for reading or a string.
-
bob.bio.base.
load_compressed
(filename, compression_type='bz2')[source]¶ Extracts the data to a temporary HDF5 file using HDF5 and reads its contents. Note that, though the file name is .hdf5, it contains compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’
-
bob.bio.base.
load_resource
(resource, keyword, imports = ['bob.bio.base'], package_prefix='bob.bio.', preferred_package = None) → resource[source]¶ Loads the given resource that is registered with the given keyword. The resource can be:
1. a resource as defined in the setup.py
2. a configuration file
3. a string defining the construction of an object. If imports are required for the construction of this object, they can be given as a list of strings.
Parameters:
- resource (str)
Any string interpretable as a resource (see above).
- keyword (str)
A valid resource keyword, can be one of valid_keywords.
- imports ([str])
A list of strings defining which modules to import when constructing new objects (option 3).
- package_prefix (str)
Package namespace, in which we search for entry points, e.g., bob.bio.
- preferred_package (str or None)
When several resources with the same name are found in different packages (e.g., in different bob.bio or other packages), this specifies the preferred package to load the resource from. If not specified, the extension that is not from bob.bio is selected.
Returns:
- resource (object)
The resulting resource object is returned, either read from file or resource, or newly created.
-
bob.bio.base.
open_compressed
(filename, open_flag='r', compression_type='bz2')[source]¶ Opens a compressed HDF5File with the given opening flags. For the ‘r’ flag, the given compressed file will be extracted to a local space. For ‘w’, an empty HDF5File is created. In any case, the opened HDF5File is returned, which needs to be closed using the close_compressed() function.
-
bob.bio.base.
pretty_print
(obj, kwargs)[source]¶ Returns a pretty-print of the parameters to the constructor of a class, which can be copy-pasted on the command line to create the object (with few exceptions).
-
bob.bio.base.
read_config_file
(filenames, keyword = None) → config[source]¶ Use this function to read the given configuration file. If a keyword is specified, only the configuration according to this keyword is returned. Otherwise a dictionary of the configurations read from the configuration file is returned.
Parameters:
- filenames ([str])
A list (potentially empty) of configuration files or resources to read running options from.
- keyword (str or None)
If specified, only the contents of the variable with the given name are returned. If None, the whole configuration is returned (a local namespace).
Returns:
- config (object or namespace)
If keyword is specified, the object inside the configuration with the given name is returned. Otherwise, the whole configuration is returned (as a local namespace).
-
bob.bio.base.
read_original_data
(biofile, directory, extension)[source]¶ This function reads the original data using the given biofile instance. It simply calls load(directory, extension) from bob.bio.base.database.BioFile or one of its derivatives.
- Parameters
biofile (bob.bio.base.database.BioFile or one of its derivatives) – The file to read the original data.
directory (str) – The base directory of the database.
extension (str or None) – The extension of the original data. Might be None if the biofile itself has the extension stored.
- Returns
Whatever biofile.load returns; usually a numpy.ndarray
- Return type
-
bob.bio.base.
resource_keys
(keyword, exclude_packages=[], package_prefix='bob.bio.', strip=['dummy'])[source]¶ Reads and returns all resources that are registered with the given keyword. Entry points from the given
exclude_packages
are ignored.
-
bob.bio.base.
save
(data, file, compression=0)[source]¶ Saves the data to file using HDF5. The given file might be an HDF5 file open for writing, or a string. If the given data contains a
save
method, this method is called with the given HDF5 file. Otherwise the data is written to the HDF5 file using the given compression.
-
bob.bio.base.
save_compressed
(data, filename, compression_type='bz2', create_link=False)[source]¶ Saves the data to a temporary file using HDF5. Afterwards, the file is compressed using the given compression method and saved using the given file name. Note that, though the file name will be .hdf5, it will contain compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’
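The write-to-temporary-file-then-compress round trip can be sketched with the standard library. This sketch makes two assumptions that are not stated above: the compressed container is a tar archive, and raw bytes stand in for the HDF5 content. Both function names are hypothetical.

```python
import os
import tarfile
import tempfile


def save_compressed_sketch(payload, filename, compression_type="bz2"):
    """Writes bytes to a temporary file, then packs it into `filename`."""
    if compression_type not in ("gz", "bz2", ""):
        raise ValueError("unsupported compression: %r" % compression_type)
    handle, temp_name = tempfile.mkstemp(suffix=".hdf5")
    try:
        with os.fdopen(handle, "wb") as temp_file:
            temp_file.write(payload)
        # "w:" (empty compression type) writes an uncompressed archive.
        with tarfile.open(filename, "w:" + compression_type) as archive:
            archive.add(temp_name, arcname=os.path.basename(filename))
    finally:
        os.remove(temp_name)


def load_compressed_sketch(filename, compression_type="bz2"):
    """Reads back the bytes stored by save_compressed_sketch."""
    with tarfile.open(filename, "r:" + compression_type) as archive:
        member = archive.getmembers()[0]
        return archive.extractfile(member).read()
```

As the text notes, the resulting file keeps its .hdf5 name although it holds compressed data, so ordinary HDF5 tools cannot open it directly.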
-
bob.bio.base.
score_fusion_strategy
(strategy_name='average')[source]¶ Returns a function to compute a fusion strategy between different scores.
Different strategies are employed:
- 'average': The averaged score is computed using the numpy.average() function.
- 'min': The minimum score is computed using the min() function.
- 'max': The maximum score is computed using the max() function.
- 'median': The median score is computed using the numpy.median() function.
- None is also accepted, in which case None is returned.
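The name-to-function lookup can be sketched as a plain dictionary. To stay within the standard library, this sketch substitutes statistics.mean and statistics.median for the NumPy functions named above; the function name and the ValueError for unknown names are assumptions.

```python
import statistics


def score_fusion_strategy_sketch(strategy_name="average"):
    """Maps a strategy name to a callable that fuses a list of scores."""
    if strategy_name is None:
        return None
    strategies = {
        "average": statistics.mean,   # stand-in for numpy.average
        "min": min,
        "max": max,
        "median": statistics.median,  # stand-in for numpy.median
    }
    if strategy_name not in strategies:
        # Assumption: unknown names are rejected with a ValueError.
        raise ValueError("unknown fusion strategy: %r" % strategy_name)
    return strategies[strategy_name]


fuse = score_fusion_strategy_sketch("median")
fused = fuse([0.1, 0.9, 0.4])
```

The returned callable is then applied to the scores of one probe against several models or several probe files.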
-
bob.bio.base.
selected_elements
(list_of_elements, desired_number_of_elements=None)[source]¶ Returns a list of elements that are sub-selected from the given list (or the list itself, if its length is smaller). These elements are selected such that they are evenly spread over the whole list.
-
bob.bio.base.
selected_indices
(total_number_of_indices, desired_number_of_indices=None)[source]¶ Returns a list of indices that will contain exactly the number of desired indices (or the number of total items in the list, if this is smaller). These indices are selected such that they are evenly spread over the whole sequence.
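One way to obtain evenly spread indices is to place each index at the center of an equally sized bin. The sketch below uses that scheme as an assumption about how the spacing might be computed; both function names are hypothetical.

```python
def selected_indices_sketch(total, desired=None):
    """Returns `desired` indices spread evenly over range(total)."""
    if desired is None or desired < 0 or desired >= total:
        # Fewer items than requested (or no limit): take all of them.
        return list(range(total))
    if desired == 0:
        return []
    step = float(total) / float(desired)
    # Pick the index at the center of each of the `desired` bins.
    return [int((i + 0.5) * step) for i in range(desired)]


def selected_elements_sketch(elements, desired=None):
    """Sub-selects elements using the evenly spread indices."""
    indices = selected_indices_sketch(len(elements), desired)
    return [elements[i] for i in indices]


picked = selected_indices_sketch(10, 5)
```

Sub-selecting in this way keeps the sample representative of the whole list, which matters when the list is sorted, for example by client.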
-
bob.bio.base.
vstack_features
(reader, paths, same_size=False, allow_missing_files=False)[source]¶ Stacks all features in a memory efficient way.
- Parameters
reader (collections.Callable) – The function to load the features. The function should only take one argument, the path to the features. Use functools.partial to adapt your reader to this format. The features returned by reader are expected to have the same numpy.dtype and the same shape except for their first dimension. The first dimension should correspond to the number of samples.
paths (collections.Iterable) – An iterable of paths to iterate on. Whatever is inside paths is given to reader, so they do not necessarily need to be paths to actual files. If same_size is True, len(paths) must be valid.
same_size (bool, optional) – If True, it assumes that the arrays inside all the paths have the same shape. If you know the features are the same size in all paths, set this to True to improve the performance.
allow_missing_files (bool, optional) – If True, it assumes that the items inside paths are actual files and ignores the ones that do not exist.
- Returns
The read features with the shape (n_samples, *features_shape[1:]).
- Return type
- Raises
ValueError – If both same_size and allow_missing_files are True.
Examples
In its simplest form, this function is equivalent to calling numpy.vstack([reader(p) for p in paths]).
>>> import numpy
>>> from bob.bio.base import vstack_features
>>> def reader(path):
...     # in each file, there are 5 samples and features are 2 dimensional.
...     return numpy.arange(10).reshape(5,2)
>>> paths = ['path1', 'path2']
>>> all_features = vstack_features(reader, paths)
>>> all_features
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9],
       [0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> all_features_with_more_memory = numpy.vstack([reader(p) for p in paths])
>>> numpy.allclose(all_features, all_features_with_more_memory)
True
You can allocate the array at once to improve the performance if you know that all features in paths have the same shape and you know the total number of the paths:
>>> vstack_features(reader, paths, same_size=True)
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9],
       [0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
-
class
bob.bio.base.tools.
FileSelector
(decorated)[source]¶ This class provides shortcuts for selecting different files for different stages of the verification process.
It communicates with the database and provides lists of file names for all steps of the tool chain.
Todo
Find a way that this class’ methods get correctly documented, instead of the bob.bio.base.Singleton wrapper class.
Parameters:
- database (bob.bio.base.database.BioDatabase or derived)
The database object that provides the list of files.
- preprocessed_directory (str)
The directory, where preprocessed data should be written to.
- extractor_file (str)
The filename, where the extractor should be written to (if any).
- extracted_directory (str)
The directory, where extracted features should be written to.
- projector_file (str)
The filename, where the projector should be written to (if any).
- projected_directory (str)
The directory, where projected features should be written to (if required).
- enroller_file (str)
The filename, where the enroller should be written to (if required).
- model_directories ((str, str))
The directories, where models and t-norm models should be written to.
- score_directories ((str, str))
The directories, where score files for no-norm and ZT-norm should be written to.
- zt_score_directories ((str, str, str, str, str) or None)
If given, specify the directories, where intermediate score files required to compute the ZT-norm should be written. The 5 directories are for 1: normal scores; 2: Z-scores; 3: T-scores; 4: ZT-scores; 5: ZT-samevalue scores.
- default_extension (str)
The default extension of all intermediate files.
- compressed_extension (str)
The extension for writing compressed score files. By default, no compression is performed.
-
class
bob.bio.base.tools.
GridSubmission
(args, command_line_parameters, executable='verify.py', first_fake_job_id=0)[source]¶ Bases:
object
-
bob.bio.base.tools.
calibrate
(compute_zt_norm, groups=['dev', 'eval'], prior=0.5, write_compressed=False)[source]¶ Calibrates the score files by learning a linear calibration from the dev files (first element of the groups) and applying it to all groups.
This function is intended to compute the calibration parameters on the scores of the development set using the bob.learn.linear.CGLogRegTrainer. Afterward, both the scores of the development and evaluation sets are calibrated and written to file. For ZT-norm scores, the calibration is performed independently, if enabled. The names of the calibrated score files that should be written are obtained from the bob.bio.base.tools.FileSelector.
Note
All NaN scores in the development set are silently ignored. This might raise an error, if all scores are NaN.
Parameters:
- compute_zt_norm (bool)
If set to True, also score files for ZT-norm are calibrated.
- groups (some of ('dev', 'eval'))
The list of groups, for which score files should be calibrated. The first of the given groups is used to train the logistic regression parameters, while the calibration is performed for all given groups.
- prior (float)
Whatever bob.learn.linear.CGLogRegTrainer takes as a prior.
- write_compressed (bool)
If enabled, calibrated score files are compressed as .tar.bz2 files.
-
bob.bio.base.tools.
command_line
(cmdline) → str[source]¶ Converts the given options to a string that can be executed in a terminal. Parameters are enclosed into '...' quotes so that the command line can interpret them (e.g., if they contain spaces or special characters).
Parameters:
- cmdline ([str])
A list of command line options to be converted into a string.
Returns:
- str (str)
The command line string that can be copy-pasted into the terminal.
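A comparable conversion can be sketched with shlex.quote from the standard library. Note that this is a substitute technique: shlex.quote only adds quotes where the shell needs them, so the output can differ from what the function above produces. `to_command_line` is a hypothetical name.

```python
import shlex


def to_command_line(cmdline):
    """Joins options into one shell-safe string."""
    # shlex.quote leaves simple tokens untouched and wraps the rest in '...'.
    return " ".join(shlex.quote(option) for option in cmdline)


printable = to_command_line(["verify.py", "--sub-directory", "my experiment"])
```

Here only the last option contains a space, so only it ends up in quotes.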
-
bob.bio.base.tools.
command_line_config_group
(parser, package_prefix='bob.bio.', exclude_resources_from=[])[source]¶ Generic configuration command lines that can be used by different toolchains, e.g., in bob.bio or bob.pad.
Parameters:
- parser
Parser to which this argument group should be added.
- package_prefix
Prefix of a package, in which these arguments should be used, e.g., in bob.bio. or bob.pad.
- exclude_resources_from
Resources that should be excluded from the command line.
Returns:
New config argument group added to the parser.
-
bob.bio.base.tools.
command_line_parser
(description=__doc__, exclude_resources_from=[]) → parsers[source]¶ Creates an argparse.ArgumentParser object that includes the minimum set of command line options (which is not so few). The description can be overwritten, but has a (small) default.
Included in the parser, several groups are defined. Each group specifies a set of command line options. For the configurations, registered resources are listed, which can be limited by the exclude_resources_from list of extensions.
It returns a dictionary, containing the parser object itself (in the 'main' keyword), and a list of command line groups.
Parameters:
- description (str)
The documentation of the script.
- exclude_resources_from ([str])
A list of extension packages, for which resources should not be listed.
Returns:
- parsers (dict)
A dictionary of parser groups, with the main parser under the ‘main’ key. Feel free to add more options to any of the parser groups.
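The returned structure, a dictionary with the parser under 'main' plus its argument groups, can be sketched with argparse. The group keys ('config', 'flag') and all options in this sketch are hypothetical placeholders, not the options the library defines.

```python
import argparse


def command_line_parser_sketch(description="Runs a recognition experiment."):
    """Builds a parser and returns it together with its argument groups."""
    parser = argparse.ArgumentParser(description=description)
    # Hypothetical groups and options, for illustration only.
    config_group = parser.add_argument_group("configuration")
    config_group.add_argument("--database", help="Database resource to use.")
    flag_group = parser.add_argument_group("flags")
    flag_group.add_argument("--force", action="store_true",
                            help="Regenerate files that already exist.")
    return {"main": parser, "config": config_group, "flag": flag_group}


parsers = command_line_parser_sketch()
# Callers can extend any group before the command line is parsed.
parsers["flag"].add_argument("--dry-run", action="store_true",
                             help="Only print what would be done.")
args = parsers["main"].parse_args(["--database", "dummy", "--dry-run"])
```

Returning the groups alongside the parser lets a derived script add its own options next to related ones, so that they appear in the matching section of the help output.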
-
bob.bio.base.tools.
compute_scores
(algorithm, extractor, compute_zt_norm, indices=None, groups=['dev', 'eval'], types=['A', 'B', 'C', 'D'], write_compressed=False, allow_missing_files=False, force=False)[source]¶ Computes the scores for the given groups.
This function computes all scores for the experiment, and writes them to files, one per model. When compute_zt_norm is enabled, scores are computed for all four matrices, i.e. A: normal scores; B: Z-norm scores; C: T-norm scores; D: ZT-norm scores and ZT-samevalue scores. By default, scores are computed for both groups 'dev' and 'eval'.
Parameters:
- algorithm (bob.bio.base.algorithm.Algorithm or derived)
The algorithm, used for enrolling model and writing them to file.
- extractor (bob.bio.base.extractor.Extractor or derived)
The extractor, used for extracting the features. The extractor is only used to read features, if the algorithm does not perform projection.
- compute_zt_norm (bool)
If set to True, also ZT-norm scores are computed.
- indices ((int, int) or None)
If specified, scores are computed only for the models in the given index range range(begin, end). This is usually given, when parallel threads are executed.
Note
The probe files are not limited by the indices.
- groups (some of ('dev', 'eval'))
The list of groups, for which scores should be computed.
- types (some of ['A', 'B', 'C', 'D'])
A list of score types to be computed. If compute_zt_norm = False, only the 'A' scores are computed.
- write_compressed (bool)
If enabled, score files are compressed as .tar.bz2 files.
- allow_missing_files (bool)
If set to True, model and probe files that are not found will produce NaN scores.
- force (bool)
If given, score files are regenerated, even if they already exist.
-
bob.bio.base.tools.
concatenate
(compute_zt_norm, groups=['dev', 'eval'], write_compressed=False, add_model_id=False)[source]¶ Concatenates all results into one (or two) score files per group.
Score files, which were generated per model, are concatenated into a single score file, which can be interpreted by bob.bio.base.score.load.split_four_column(). The score files are always re-computed, regardless of whether they exist or not.
Parameters:
- compute_zt_norm (bool)
If set to True, also score files for ZT-norm are concatenated.
- groups (some of ('dev', 'eval'))
The list of groups, for which score files should be concatenated.
- write_compressed (bool)
If enabled, concatenated score files are compressed as .tar.bz2 files.
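The concatenation step itself amounts to appending per-model text files. The sketch below shows that idea only; the function name is hypothetical, the sorted processing order is an assumption, and compression and ZT-norm handling are left out.

```python
def concatenate_score_files(model_files, output_file):
    """Appends the lines of every per-model score file to one output file."""
    # The output is opened with "w", so an existing file is overwritten.
    with open(output_file, "w") as out:
        # Assumption: files are processed in sorted order for reproducibility.
        for model_file in sorted(model_files):
            with open(model_file) as handle:
                for line in handle:
                    # Guard against a missing newline at the end of a file.
                    out.write(line if line.endswith("\n") else line + "\n")
```

Opening the output in write mode mirrors the statement above that the concatenated file is always re-computed.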
-
bob.bio.base.tools.
create_configuration_file
(parsers, args)[source]¶ This function writes an empty configuration file with all possible options.
-
bob.bio.base.tools.
enroll
(algorithm, extractor, compute_zt_norm, indices=None, groups=['dev', 'eval'], types=['N', 'T'], allow_missing_files=False, force=False)[source]¶ Enrolls the models for the given groups, optionally for both models and T-Norm-models.
This function uses the extracted or projected features to compute the models, depending on your setup of the given algorithm.
The given algorithm is used to enroll all models required for the current experiment. It writes the models into the directories specified by the bob.bio.base.tools.FileSelector. By default, if target files already exist, they are not re-created.
The extractor is only used to load features in a coherent way.
Parameters:
- algorithm (bob.bio.base.algorithm.Algorithm or derived)
The algorithm, used for enrolling models and writing them to file.
- extractor (bob.bio.base.extractor.Extractor or derived)
The extractor, used for reading the extracted features, if the algorithm enrolls models from unprojected data.
- compute_zt_norm (bool)
If set to True and 'T' is part of the types, also T-norm models are enrolled.
- indices ((int, int) or None)
If specified, only the models for the given index range range(begin, end) should be enrolled. This is usually given, when parallel threads are executed.
- groups (some of ('dev', 'eval'))
The list of groups, for which models should be enrolled.
- allow_missing_files (bool)
If set to True, extracted or projected files that are not found are silently ignored. If none of the enroll files are found, no model file will be written.
- force (bool)
If given, files are regenerated, even if they already exist.
-
bob.bio.base.tools.
extract
(extractor, preprocessor, groups=None, indices=None, allow_missing_files=False, force=False)[source]¶ Extracts features from the preprocessed data using the given extractor.
The given extractor is used to extract all features required for the current experiment. It writes the extracted data into the directory specified by the bob.bio.base.tools.FileSelector. By default, if target files already exist, they are not re-created.
The preprocessor is only used to load the data in a coherent way.
Parameters:
- extractor (bob.bio.base.extractor.Extractor or derived)
The extractor, used for extracting and writing the features.
- preprocessor (bob.bio.base.preprocessor.Preprocessor or derived)
The preprocessor, used for reading the preprocessed data.
- groups (some of ('world', 'dev', 'eval') or None)
The list of groups, for which the data should be extracted.
- indices ((int, int) or None)
If specified, only the features for the given index range range(begin, end) should be extracted. This is usually given, when parallel threads are executed.
- allow_missing_files (bool)
If set to True, preprocessed data files that are not found are silently ignored.
- force (bool)
If given, files are regenerated, even if they already exist.
-
bob.bio.base.tools.
groups
(args) → groups[source]¶ Returns the groups, for which the files must be preprocessed, and features must be extracted and projected. This function should be used in order to eliminate the training files (the 'world' group), when no training is required in this experiment.
Parameters:
- args (namespace)
The interpreted command line arguments as returned by the initialize() function.
Returns:
- groups ([str])
A list of groups, for which data needs to be treated.
-
bob.bio.base.tools.
indices
(list_to_split, number_of_parallel_jobs, task_id=None)[source]¶ This function returns the first and last index for the files for the current job ID. If no job id is set (e.g., because a sub-job is executed locally), it simply returns all indices.
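The splitting of a file list across parallel jobs can be sketched as follows. The sketch assumes that job IDs start at 1 and that the returned pair is used as range(first, last), neither of which is stated above; `indices_sketch` is a hypothetical name.

```python
import math


def indices_sketch(list_to_split, number_of_parallel_jobs, task_id=None):
    """Returns (first, last) such that range(first, last) is this job's share."""
    total = len(list_to_split)
    if task_id is None:
        # No job ID (e.g., local execution): process everything.
        return 0, total
    per_job = int(math.ceil(total / float(number_of_parallel_jobs)))
    # Assumption: task IDs are 1-based; clamp both ends to the list length.
    first = min((task_id - 1) * per_job, total)
    last = min(task_id * per_job, total)
    return first, last


files = ["file%d" % i for i in range(10)]
chunks = [indices_sketch(files, 3, task_id) for task_id in (1, 2, 3)]
```

Rounding the chunk size up means the last job may receive fewer files, but no file is left unassigned.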
-
bob.bio.base.tools.
initialize
(parsers, command_line_parameters = None, skips = []) → args[source]¶ Parses the command line and arranges the arguments accordingly. Afterward, it loads the resources for the database, preprocessor, extractor, algorithm and grid (if specified), and stores the results into the returned args.
This function also initializes the FileSelector instance by arranging the directories and files according to the command line parameters.
If the skips are given, an '--execute-only' parameter is added to the parser, and the corresponding skips are selected.
Parameters:
- parsers (dict)
The dictionary of command line parsers, as returned from command_line_parser(). Additional arguments might have been added.
- command_line_parameters ([str] or None)
The command line parameters that should be interpreted. By default, the parameters specified by the user on command line are considered.
- skips ([str])
A list of possible --skip-... options to be added and evaluated automatically.
Returns:
- args (namespace)
A namespace of arguments as read from the command line.
Note
The database, preprocessor, extractor, algorithm and grid (if specified) are actual instances of the corresponding classes.
-
bob.bio.base.tools.
preprocess
(preprocessor, groups=None, indices=None, allow_missing_files=False, force=False)[source]¶ Preprocesses the original data of the database with the given preprocessor.
The given preprocessor is used to preprocess all data required for the current experiment. It writes the preprocessed data into the directory specified by the bob.bio.base.tools.FileSelector. By default, if target files already exist, they are not re-created.
Parameters:
- preprocessor (bob.bio.base.preprocessor.Preprocessor or derived)
The preprocessor, which should be applied to all data.
- groups (some of ('world', 'dev', 'eval') or None)
The list of groups, for which the data should be preprocessed.
- indices ((int, int) or None)
If specified, only the data for the given index range range(begin, end) should be preprocessed. This is usually given, when parallel threads are executed.
- allow_missing_files (bool)
If set to True, files for which the preprocessor returns None are silently ignored.
- force (bool)
If given, files are regenerated, even if they already exist.
-
bob.bio.base.tools.
project
(algorithm, extractor, groups=None, indices=None, allow_missing_files=False, force=False)[source]¶ Projects the features for all files of the database.
The given algorithm is used to project all features required for the current experiment. It writes the projected data into the directory specified by the bob.bio.base.tools.FileSelector. By default, if target files already exist, they are not re-created.
The extractor is only used to load the data in a coherent way.
Parameters:
- algorithm (bob.bio.base.algorithm.Algorithm or derived)
The algorithm, used for projecting features and writing them to file.
- extractor (bob.bio.base.extractor.Extractor or derived)
The extractor, used for reading the extracted features, which should be projected.
- groups (some of ('world', 'dev', 'eval') or None)
The list of groups, for which the data should be projected.
- indices ((int, int) or None)
If specified, only the features for the given index range range(begin, end) should be projected. This is usually given, when parallel threads are executed.
- allow_missing_files (bool)
If set to True, extracted files that are not found are silently ignored.
- force (bool)
If given, files are regenerated, even if they already exist.
-
bob.bio.base.tools.
read_features
(file_names, extractor, split_by_client = False) → extracted[source]¶ Reads the extracted features from file_names using the given extractor. If split_by_client is set to True, it is assumed that the file_names are already sorted by client.
Parameters:
- file_names ([str] or [[str]])
A list of names of files to be read. If split_by_client = True, file names are supposed to be split into groups.
- extractor (bob.bio.base.extractor.Extractor or derived)
The extractor, used for reading the extracted features.
- split_by_client (bool)
Indicates if the given file_names are split into groups.
- allow_missing_files (bool)
If set to True, extracted files that are not found are silently ignored.
Returns:
- extracted ([object] or [[object]])
The list of extracted features, in the same order as in the file_names.
-
bob.bio.base.tools.
read_preprocessed_data
(file_names, preprocessor, split_by_client = False) → preprocessed[source]¶ Reads the preprocessed data from file_names using the given preprocessor. If split_by_client is set to True, it is assumed that the file_names are already sorted by client.
Parameters:
- file_names ([str] or [[str]])
A list of names of files to be read. If split_by_client = True, file names are supposed to be split into groups.
- preprocessor (bob.bio.base.preprocessor.Preprocessor or derived)
The preprocessor, which can read the preprocessed data.
- split_by_client (bool)
Indicates if the given file_names are split into groups.
- allow_missing_files (bool)
If set to True, preprocessed data files that are not found are silently ignored.
Returns:
- preprocessed ([object] or [[object]])
The list of preprocessed data, in the same order as in the file_names.
-
bob.bio.base.tools.
set_required_common_optional_arguments
(required=[], common=[], optional=[])[source]¶
-
bob.bio.base.tools.
take_from_config_or_command_line
(args, config, keyword, default, required=True, is_resource=True)[source]¶
-
bob.bio.base.tools.
train_enroller
(algorithm, extractor, allow_missing_files=False, force=False)[source]¶ Trains the model enroller using the extracted or projected features, depending on your setup of the algorithm.
This function should only be called, when the algorithm actually requires enroller training. The enroller of the given algorithm is trained using extracted or projected features. It writes the enroller to the file specified by the bob.bio.base.tools.FileSelector. By default, if the target file already exists, it is not re-created.
Parameters:
- algorithm (bob.bio.base.algorithm.Algorithm or derived)
The algorithm, in which the enroller should be trained. It is assured that the projector file is read (if required) before the enroller training is started.
- extractor (bob.bio.base.extractor.Extractor or derived)
The extractor, used for reading the training data, if unprojected features are used for enroller training.
- allow_missing_files (bool)
If set to True, extracted files that are not found are silently ignored during training.
- force (bool)
If given, the enroller file is regenerated, even if it already exists.
-
bob.bio.base.tools.
train_extractor
(extractor, preprocessor, allow_missing_files=False, force=False)[source]¶ Trains the feature extractor using preprocessed data of the 'world' group, if the feature extractor requires training.
This function should only be called when the extractor actually requires training. The given extractor is trained using preprocessed data. It writes the extractor to the file specified by the bob.bio.base.tools.FileSelector. By default, if the target file already exists, it is not re-created.
Parameters:
- extractor: bob.bio.base.extractor.Extractor or derived
The extractor to be trained.
- preprocessor: bob.bio.base.preprocessor.Preprocessor or derived
The preprocessor, used for reading the preprocessed data.
- allow_missing_files: bool
If set to True, preprocessed data files that are not found are silently ignored during training.
- force: bool
If given, the extractor file is regenerated, even if it already exists.
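The "skip unless forced" behaviour described for the training functions can be sketched with the standard library alone. The helper train_if_needed and the file name Extractor.txt are hypothetical and only illustrate the pattern; the real functions take the target file from the FileSelector.

```python
import os
import tempfile


def train_if_needed(target_file, train_fn, force=False):
    """Run train_fn and write target_file unless the file already exists."""
    if os.path.exists(target_file) and not force:
        return False  # the existing file is kept as-is
    with open(target_file, "w") as f:
        f.write(train_fn())
    return True


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "Extractor.txt")

first = train_if_needed(target, lambda: "model-v1")               # trained
second = train_if_needed(target, lambda: "model-v2")              # skipped
third = train_if_needed(target, lambda: "model-v3", force=True)   # regenerated

with open(target) as f:
    final_content = f.read()
```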
-
bob.bio.base.tools.
train_projector
(algorithm, extractor, allow_missing_files=False, force=False)[source]¶ Trains the feature projector using extracted features of the 'world' group, if the algorithm requires projector training.
This function should only be called when the algorithm actually requires projector training. The projector of the given algorithm is trained using extracted features. It writes the projector to the file specified by the bob.bio.base.tools.FileSelector. By default, if the target file already exists, it is not re-created.
Parameters:
- algorithm: bob.bio.base.algorithm.Algorithm or derived
The algorithm, in which the projector should be trained.
- extractor: bob.bio.base.extractor.Extractor or derived
The extractor, used for reading the training data.
- allow_missing_files: bool
If set to True, extracted files that are not found are silently ignored during training.
- force: bool
If given, the projector file is regenerated, even if it already exists.
-
bob.bio.base.tools.
write_info
(args, command_line_parameters, executable)[source]¶ Writes information about the current experimental setup into a file specified on command line.
Parameters:
- args: namespace
The interpreted command line arguments as returned by the initialize() function.
- command_line_parameters: [str] or None
The command line parameters that have been interpreted. If None, the parameters specified by the user on command line are considered.
- executable: str
The name of the executable (such as './bin/verify.py') that is used to run the experiments.
-
bob.bio.base.tools.
zt_norm
(groups=['dev', 'eval'], write_compressed=False, allow_missing_files=False)[source]¶ Computes ZT-Norm using the previously generated A, B, C, D and D-samevalue matrix files.
This function computes the ZT-norm scores for all model ids for all desired groups and writes them into files defined by the bob.bio.base.tools.FileSelector. It loads the A, B, C, D and D-samevalue matrix files that need to be computed beforehand.
Parameters:
- groups: some of ('dev', 'eval')
The list of groups, for which ZT-norm should be applied.
- write_compressed: bool
If enabled, score files are compressed as .tar.bz2 files.
- allow_missing_files: bool
Currently, this option is only provided for completeness. NaN scores are not yet handled correctly.
A set of utilities to load score files with different formats.
-
bob.bio.base.score.load.
open_file
(filename, mode='rt')[source]¶ Opens the given score file for reading.
Score files might be raw text files, or a tar-file including a single score file inside.
- Parameters
filename (str, file-like) – The name of the score file to open, or a file-like object open for reading. If a file name is given, the corresponding file might be a raw text file or a (compressed) tar file containing a raw text file.
- Returns
A read-only file-like object as it would be returned by open().
- Return type
file-like
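As a rough illustration of the "raw text or tar file with a single score file" behaviour, the following sketch uses only the standard library. The helper open_score_file is hypothetical and is not the library's open_file(); it merely shows one way such a dispatch could work.

```python
import io
import os
import tarfile
import tempfile


def open_score_file(path):
    """Return a text-mode file object for a raw or tar-packed score file."""
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as tar:  # compression is detected automatically
            members = [m for m in tar.getmembers() if m.isfile()]
            if len(members) != 1:
                raise IOError("expected exactly one score file in %s" % path)
            data = tar.extractfile(members[0]).read()
        return io.StringIO(data.decode("utf-8"))
    return open(path, "rt")


# Write a raw score file and pack a copy of it into a compressed tar file.
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "scores-dev")
with open(raw_path, "w") as f:
    f.write("alice alice probe1 0.9\n")

tar_path = os.path.join(workdir, "scores-dev.tar.gz")
with tarfile.open(tar_path, "w:gz") as tar:
    tar.add(raw_path, arcname="scores-dev")

with open_score_file(raw_path) as f:
    raw_lines = f.readlines()
with open_score_file(tar_path) as f:
    tar_lines = f.readlines()
```

Both calls return the same lines, so code that consumes the scores does not need to know how the file was stored.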
-
bob.bio.base.score.load.
four_column
(filename)[source]¶ Loads a score set from a single file and yields its lines.
Loads a score set from a single file and yields its lines (to avoid loading the score file at once into memory). This function verifies that all fields are correctly placed and contain valid values. The score file must contain the following information in each line:
claimed_id real_id test_label score
- Parameters
filename (str, file-like) – The file object that will be opened with open_file() containing the scores.
- Yields
str – The claimed identity – the client name of the model that was used in the comparison
str – The real identity – the client name of the probe that was used in the comparison
str – A label of the probe – usually the probe file name, or the probe id
float – The result of the comparison of the model and the probe
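The four column layout can be illustrated with a small generator. This is a sketch of the file format only, assuming whitespace-separated fields; parse_four_column is a hypothetical stand-in and not the library's four_column().

```python
def parse_four_column(lines):
    """Yield (claimed_id, real_id, test_label, score) for each non-empty line."""
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # this sketch skips blank lines
        if len(fields) != 4:
            raise ValueError("line %d: expected 4 fields, got %d"
                             % (number, len(fields)))
        claimed_id, real_id, test_label, score = fields
        # float() raises ValueError if the last field is not a number
        yield claimed_id, real_id, test_label, float(score)


example_lines = [
    "alice alice probe_a1 0.93",
    "alice bob   probe_b1 0.12",
]
parsed = list(parse_four_column(example_lines))
```

Because the function is a generator, lines are converted one at a time and the whole file never has to be held in memory.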
-
bob.bio.base.score.load.
split_four_column
(filename)[source]¶ Loads a score set from a single file and splits the scores.
Loads a score set from a single file and splits the scores between negatives and positives. The score file has to respect the 4 column format as defined in the method four_column().
This method avoids loading and allocating memory for the strings present in the file. We only keep the scores.
- Parameters
filename (str, file-like) – The file object that will be opened with open_file() containing the scores.
- Returns
- array: negatives, 1D float array containing the list of scores, for which the claimed_id and the real_id are different (see four_column())
- array: positives, 1D float array containing the list of scores, for which the claimed_id and the real_id are identical (see four_column())
- Return type
array
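The split rule itself is simple: a score is a positive when claimed_id equals real_id, and a negative otherwise. A minimal sketch, using plain Python lists instead of the float arrays the function returns (split_lines is a hypothetical helper):

```python
def split_lines(lines):
    """Split four column score lines into (negatives, positives)."""
    negatives, positives = [], []
    for line in lines:
        claimed_id, real_id, _test_label, score = line.split()
        # Only the score is kept; the string fields are discarded.
        if claimed_id == real_id:
            positives.append(float(score))
        else:
            negatives.append(float(score))
    return negatives, positives


negatives, positives = split_lines([
    "alice alice probe_a1 0.9",
    "alice bob probe_b1 0.2",
    "bob bob probe_b1 0.7",
])
```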
-
bob.bio.base.score.load.
cmc_four_column
(filename)[source]¶ Loads scores to compute CMC curves from a file in four column format.
The four column file needs to be in the same format as described in four_column(), and the test_label (column 3) has to contain the test/probe file name or a probe id.
This function returns a list of tuples. For each probe file, the tuple consists of a list of negative scores and a list of positive scores. Usually, the list of positive scores should contain only one element, but more are allowed. The result of this function can directly be passed to, e.g., the bob.measure.cmc() function.
- Parameters
filename (str, file-like) – The file object that will be opened with open_file() containing the scores.
- Returns
A list of tuples, where each tuple contains the negative and positive scores for one probe of the database. Both negatives and positives can be either a 1D numpy.ndarray of type float, or None.
- Return type
list
-
bob.bio.base.score.load.
five_column
(filename)[source]¶ Loads a score set from a single file and yields its lines.
Loads a score set from a single file and yields its lines (to avoid loading the score file at once into memory). This function verifies that all fields are correctly placed and contain valid values. The score file must contain the following information in each line:
claimed_id model_label real_id test_label score
- Parameters
filename (str, file-like) – The file object that will be opened with open_file() containing the scores.
- Yields
str – The claimed identity – the client name of the model that was used in the comparison
str – A label for the model – usually the model file name, or the model id
str – The real identity – the client name of the probe that was used in the comparison
str – A label of the probe – usually the probe file name, or the probe id
float – The result of the comparison of the model and the probe
-
bob.bio.base.score.load.
split_five_column
(filename)[source]¶ Loads a score set from a single file and splits the scores.
Loads a score set from a single file in five column format and splits the scores between negatives and positives. The score file has to respect the 5 column format as defined in the method five_column().
This method avoids loading and allocating memory for the strings present in the file. We only keep the scores.
- Parameters
filename (str, file-like) – The file object that will be opened with open_file() containing the scores.
- Returns
- array: negatives, 1D float array containing the list of scores, for which the claimed_id and the real_id are different (see four_column())
- array: positives, 1D float array containing the list of scores, for which the claimed_id and the real_id are identical (see four_column())
- Return type
array
-
bob.bio.base.score.load.
cmc_five_column
(filename)[source]¶ Loads scores to compute CMC curves from a file in five column format.
The five column file needs to be in the same format as described in five_column(), and the test_label (column 4) has to contain the test/probe file name or a probe id.
This function returns a list of tuples. For each probe file, the tuple consists of a list of negative scores and a list of positive scores. Usually, the list of positive scores should contain only one element, but more are allowed. The result of this function can directly be passed to, e.g., the bob.measure.cmc() function.
- Parameters
filename (str, file-like) – The file object that will be opened with open_file() containing the scores.
- Returns
A list of tuples, where each tuple contains the negative and positive scores for one probe of the database.
- Return type
list
-
bob.bio.base.score.load.
scores
(filename, ncolumns=None) → tuple[source]¶ Loads the scores from the given score file and yields its lines. Depending on the score file format, four or five elements are yielded, see bob.bio.base.score.load.four_column() and bob.bio.base.score.load.five_column() for details.
Parameters:
- filename: str, file-like
The file object that will be opened with open_file() containing the scores.
- ncolumns: any
ignored
Yields:
- tuple: four or five elements per line, as described for four_column() and five_column()
-
bob.bio.base.score.load.
split
(filename, ncolumns=None, sort=False)[source]¶ Loads the scores from the given score file and splits them into positives and negatives. Depending on the score file format, it calls bob.bio.base.score.load.split_four_column() or bob.bio.base.score.load.split_five_column(); see those functions for details.
- Parameters
filename (str) – The path to the score file.
ncolumns (int or None) – If specified to be 4 or 5, the score file will be assumed to be in the given format. If not specified, the score file format will be estimated automatically.
sort (bool, optional) – If True, will return sorted negatives and positives
- Returns
negatives (1D numpy.ndarray of type float) – This array contains the list of scores, for which the claimed_id and the real_id are different (see four_column())
positives (1D numpy.ndarray of type float) – This array contains the list of scores, for which the claimed_id and the real_id are identical (see four_column())
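How the column count, the real_id position and the sort option interact can be sketched as below. The sketch assumes that the format is estimated by counting the fields of the first line, which may differ from what the library actually does; split_scores_auto is a hypothetical name.

```python
def split_scores_auto(lines, ncolumns=None, sort=False):
    """Split four or five column score lines into (negatives, positives)."""
    rows = [line.split() for line in lines if line.strip()]
    if ncolumns is None:
        ncolumns = len(rows[0])  # assumption: guess from the first line
    if ncolumns not in (4, 5):
        raise ValueError("expected 4 or 5 columns, got %r" % (ncolumns,))
    # real_id is column 2 in the four column and column 3 in the five column format
    real_id_index = 1 if ncolumns == 4 else 2
    negatives, positives = [], []
    for fields in rows:
        bucket = positives if fields[0] == fields[real_id_index] else negatives
        bucket.append(float(fields[-1]))
    if sort:
        negatives.sort()
        positives.sort()
    return negatives, positives


five_column_lines = [
    "alice model_a alice probe_a1 0.8",
    "alice model_a bob probe_b1 0.3",
    "bob model_b alice probe_a1 0.1",
]
neg5, pos5 = split_scores_auto(five_column_lines, sort=True)
```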
-
bob.bio.base.score.load.
cmc
(filename, ncolumns=None) → list[source]¶ Loads scores to compute CMC curves.
Depending on the score file format, it calls bob.bio.base.score.load.cmc_four_column() or bob.bio.base.score.load.cmc_five_column(); see those functions for details.
- Parameters
filename (str or file-like) – The file object that will be opened with open_file() containing the scores.
ncolumns (int, optional) – If specified to be 4 or 5, the score file will be assumed to be in the given format. If not specified, the score file format will be estimated automatically.
- Returns
list: [(neg,pos)] A list of tuples, where each tuple contains the negative and positive scores for one probe of the database.
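The per-probe grouping that CMC computation needs can be sketched as follows for the four column format. The helper cmc_groups is hypothetical and returns plain lists, whereas the library functions return arrays (or None) per probe.

```python
from collections import OrderedDict


def cmc_groups(lines):
    """Group four column scores by probe label into (negatives, positives)."""
    per_probe = OrderedDict()
    for line in lines:
        claimed_id, real_id, test_label, score = line.split()
        # One (negatives, positives) pair per distinct probe label (column 3)
        negatives, positives = per_probe.setdefault(test_label, ([], []))
        if claimed_id == real_id:
            positives.append(float(score))
        else:
            negatives.append(float(score))
    return list(per_probe.values())


cmc_scores = cmc_groups([
    "alice alice p1 0.9",
    "bob alice p1 0.2",
    "alice bob p2 0.3",
    "bob bob p2 0.8",
])
```

Each tuple holds all scores of one probe against the gallery, which is the input shape described for the CMC functions above.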
-
bob.bio.base.score.load.
load_score
(filename, ncolumns=None, minimal=False, **kwargs)[source]¶ Load scores using numpy.genfromtxt and return the data as a numpy array.
- Parameters
filename (str, file-like) – The file object that will be opened with open_file() containing the scores.
ncolumns (int, optional) – 4, 5 or None (the default), specifying the number of columns in the score file. If None is provided, the number of columns will be guessed.
minimal (bool, optional) – If True, only loads claimed_id, real_id, and scores.
**kwargs – Keyword arguments passed to numpy.genfromtxt()
- Returns
An array which contains not only the actual scores but also the claimed_id, real_id, test_label and ['model_label']
- Return type
array
-
bob.bio.base.score.load.
load_files
(filenames, func_load)[source]¶ Load a list of score files and return a list of tuples of (neg, pos)
- Parameters
filenames (list) – list of file paths
func_load – function that can read files in the list
- Returns
list: [(neg,pos)] A list of tuples, where each tuple contains the negative and positive scores for each system/probe.
-
bob.bio.base.score.load.
get_negatives_positives
(score_lines)[source]¶ Take the output of load_score and return negatives and positives. This function aims to replace split_four_column and split_five_column but takes a different input. It’s up to you to use which one.
-
bob.bio.base.score.load.
get_negatives_positives_from_file
(filename, **kwargs)[source]¶ Loads the scores first efficiently and then calls get_negatives_positives
-
bob.bio.base.score.load.
get_negatives_positives_all
(score_lines_list)[source]¶ Take a list of outputs of load_score and return stacked negatives and positives.
-
bob.bio.base.score.load.
get_all_scores
(score_lines_list)[source]¶ Take a list of outputs of load_score and return stacked scores
-
bob.bio.base.score.load.
dump_score
(filename, score_lines)[source]¶ Dump scores that were loaded using load_score(). The number of columns is automatically detected.
This file includes functionality to convert between Bob’s four column or five column score files and the Matrix files used in OpenBR.
-
bob.bio.base.score.openbr.
write_matrix
(score_file, matrix_file, mask_file, model_names=None, probe_names=None, score_file_format='4column', gallery_file_name='unknown-gallery.lst', probe_file_name='unknown-probe.lst', search=None)[source]¶ Writes the OpenBR matrix and mask files (version 2), given a score file.
If gallery and probe names are provided, the matrices in both files will be sorted by gallery and probe names. Otherwise, the order will be the same as given in the score file.
If search is given (as an integer), the resulting matrix files will be in the search format, keeping the given number of gallery scores with the highest values for each probe.
Warning
When provided with a 4-column score file, this function will work only if there is only a single model id for each client.
- Parameters
score_file (str) – The 4 or 5 column style score file written by bob.
matrix_file (str) – The OpenBR matrix file that should be written. Usually, the file name extension is .mtx
mask_file (str) – The OpenBR mask file that should be written. The mask file defines which values are positives, negatives or to be ignored. Usually, the file name extension is .mask
model_names (str, optional) – If given, the matrix will be written in the same order as the given model names. The model names must be identical with the second column in the 5-column score_file.
Note
If the score file is in four column format, the model_names must be the client ids stored in the first column. In this case, there might be only a single model per client
Only the scores of the given models will be considered.
probe_names (list, optional) – A list of strings. If given, the matrix will be written in the same order as the given probe names (the path of the probe). The probe names are identical to the third column of the 4-column (or the fourth column of the 5-column) score_file. Only the scores of the given probe names will be considered in this case.
score_file_format (str, optional) – One of ('4column', '5column'). The format, in which the score_file is; defaults to '4column'
gallery_file_name (str, optional) – The name of the gallery file that will be written in the header of the OpenBR files.
probe_file_name (str, optional) – The name of the probe file that will be written in the header of the OpenBR files.
search (int, optional) – If given, the scores will be sorted per probe, keeping the specified number of highest scores. If the given number is higher than the number of models, NaN values will be added, and the mask will contain 0x00 values.
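The search option can be pictured as a per-probe "top N with NaN padding" step. The sketch below only illustrates that selection on plain lists; it does not write OpenBR files or masks, and top_n_per_probe is a hypothetical helper.

```python
import math


def top_n_per_probe(scores_per_probe, search):
    """Keep the `search` highest scores per probe, padding with NaN."""
    result = []
    for scores in scores_per_probe:
        kept = sorted(scores, reverse=True)[:search]
        # Fewer gallery scores than requested: fill the remainder with NaN
        kept += [math.nan] * (search - len(kept))
        result.append(kept)
    return result


rows = top_n_per_probe([[0.1, 0.9, 0.5], [0.4]], search=2)
```

The first probe keeps its two best scores; the second has only one gallery score, so its row is padded with a NaN entry.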
-
bob.bio.base.score.openbr.
write_score_file
(matrix_file, mask_file, score_file, models_ids=None, probes_ids=None, model_names=None, probe_names=None, score_file_format='4column', replace_nan=None)[source]¶ Writes the Bob score file in the desired format from OpenBR files.
Writes a Bob score file in the desired format (four or five column), given the OpenBR matrix and mask files.
In principle, the score file can be written based on the matrix and mask files alone, and this format suffices to compute CMC curves. However, the contents of the score files can be adapted. If given, the models_ids and probes_ids define the client ids of model and probe, and they have to be in the same order as used to compute the OpenBR matrix. The model_names and probe_names define the paths of model and probe, and they should be in the same order as the ids.
In rare cases, the OpenBR matrix contains NaN values, which Bob’s score files cannot handle. You can use the replace_nan parameter to decide what to do with these values. By default (None), these values are ignored, i.e., not written into the score file. This is what OpenBR is doing as well. However, you can also set replace_nan to any value, which will be written instead of the NaN values.
- Parameters
matrix_file (str) – The OpenBR matrix file that should be read. Usually, the file name extension is .mtx
mask_file (str) – The OpenBR mask file that should be read. Usually, the file name extension is .mask
score_file (str) – Path to the 4 or 5 column style score file that should be written.
models_ids (list, optional) – A list of strings with the client ids of the models that will be written in the first column of the score file. If given, the size must be identical to the number of models (gallery templates) in the OpenBR files. If not given, client ids of the model will be identical to the gallery index in the matrix file.
probes_ids (list, optional) – A list of strings with the client ids of the probes that will be written in the second/third column of the four/five column score file. If given, the size must be identical to the number of probe templates in the OpenBR files. It will be checked that the OpenBR mask fits to the model/probe client ids. If not given, the probe ids will be estimated automatically, i.e., to fit the OpenBR matrix.
model_names (list, optional) – A list of strings with the model path written in the second column of the five column score file. If not given, the model index in the OpenBR file will be used.
Note
This entry is ignored in the four column score file format.
probe_names (list, optional) – A list of probe paths to be written in the third/fourth column in the four/five column score file. If given, the size must be identical to the number of probe templates in the OpenBR files. If not given, the probe index in the OpenBR file will be used.
score_file_format (str, optional) – One of ('4column', '5column'). The format, in which the score_file is; defaults to '4column'
replace_nan (float, optional) – If NaN values are encountered in the OpenBR matrix (which are not ignored due to the mask being non-NULL), this value will be written instead. If None, the values will not be written in the score file at all.
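The replace_nan semantics can be sketched on an in-memory matrix. The sketch assumes one row per probe and one column per model, and produces four column lines as strings; matrix_to_lines is a hypothetical helper and does not read OpenBR files.

```python
import math


def matrix_to_lines(matrix, model_ids, probe_ids, probe_names, replace_nan=None):
    """Turn a probes x models score matrix into four column score lines."""
    lines = []
    for p, row in enumerate(matrix):
        for m, score in enumerate(row):
            if math.isnan(score):
                if replace_nan is None:
                    continue  # default: NaN entries are not written at all
                score = replace_nan
            lines.append("%s %s %s %s"
                         % (model_ids[m], probe_ids[p], probe_names[p], score))
    return lines


matrix = [[0.7, math.nan]]  # one probe compared against two models
skipped = matrix_to_lines(matrix, ["alice", "bob"], ["alice"], ["p1"])
replaced = matrix_to_lines(matrix, ["alice", "bob"], ["alice"], ["p1"],
                           replace_nan=-1.0)
```

With the default, the NaN comparison simply disappears from the score file; with replace_nan set, every comparison in the matrix produces a line.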
Plots and measures for bob.bio.base
-
class
bob.bio.base.script.figure.
Cmc
(ctx, scores, evaluation, func_load)[source]¶ Bases:
bob.measure.script.figure.PlotBase
Handles the plotting of Cmc
-
compute
(idx, input_scores, input_names)[source]¶ Plot CMC for dev and eval data using
bob.measure.plot.cmc()
-
-
class
bob.bio.base.script.figure.
Dir
(ctx, scores, evaluation, func_load)[source]¶ Bases:
bob.measure.script.figure.PlotBase
Handles the plotting of DIR curve
-
compute
(idx, input_scores, input_names)[source]¶ Plot DIR for dev and eval data using
bob.measure.plot.detection_identification_curve()
-
-
class
bob.bio.base.script.figure.
Metrics
(ctx, scores, evaluation, func_load, names=('Failure to Acquire', 'False Match Rate', 'False Non Match Rate', 'False Accept Rate', 'False Reject Rate', 'Half Total Error Rate'))[source]¶ Bases:
bob.measure.script.figure.Metrics
Compute metrics from score files
-
class
bob.bio.base.script.figure.
MultiMetrics
(ctx, scores, evaluation, func_load)[source]¶ Bases:
bob.measure.script.figure.MultiMetrics
Compute metrics from score files
-
class
bob.bio.base.script.figure.
Hist
(ctx, scores, evaluation, func_load, nhist_per_system=2)[source]¶ Bases:
bob.measure.script.figure.Hist
Histograms for biometric scores
Click commands for bob.bio.base
Generate random scores.
-
bob.bio.base.script.gen.
gen_score_distr
(mean_neg, mean_pos, sigma_neg=10, sigma_pos=10, n_neg=5000, n_pos=5000, seed=0)[source]¶ Generate scores from normal distributions
- Parameters
mean_neg (float) – Mean for negative scores
mean_pos (float) – Mean for positive scores
sigma_neg (float) – STDev for negative scores
sigma_pos (float) – STDev for positive scores
n_pos (int) – The number of positive scores generated
n_neg (int) – The number of negative scores generated
seed (int) – A value to initialize the Random Number generator. Giving the same value (or not specifying ‘seed’) on two different calls will generate the same lists of scores.
- Returns
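The role of the seed can be illustrated with a small stand-in that draws from two normal distributions. This sketch uses the standard library's random module and a hypothetical function name, gen_scores; the values it produces are not claimed to match those of gen_score_distr.

```python
import random


def gen_scores(mean_neg, mean_pos, sigma_neg=10, sigma_pos=10,
               n_neg=5000, n_pos=5000, seed=0):
    """Draw negative and positive scores from two normal distributions."""
    rng = random.Random(seed)  # a private generator makes calls reproducible
    neg = [rng.gauss(mean_neg, sigma_neg) for _ in range(n_neg)]
    pos = [rng.gauss(mean_pos, sigma_pos) for _ in range(n_pos)]
    return neg, pos


# Two calls with the same (default) seed give identical score lists.
neg_a, pos_a = gen_scores(-10, 10, n_neg=100, n_pos=50)
neg_b, pos_b = gen_scores(-10, 10, n_neg=100, n_pos=50)
# A different seed gives a different draw.
neg_c, pos_c = gen_scores(-10, 10, n_neg=100, n_pos=50, seed=1)
```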
-
bob.bio.base.script.gen.
write_scores_to_file
(neg, pos, filename, n_subjects=5, n_probes_per_subject=5, n_unknown_subjects=0, neg_unknown=None, five_col=False)[source]¶ Writes score distributions
- Parameters
neg (numpy.ndarray) – Scores for negative samples.
pos (numpy.ndarray) – Scores for positive samples.
filename (str) – The path to write the score to.
n_subjects (int) – Number of different subjects
n_probes_per_subject (int) – Number of different samples used as probe for each subject
n_unknown_subjects (int) – The number of unknown (no registered model) subjects
five_col (bool) – If True, the 5-column format is written, else the 4-column format