Python API

This section includes information for using the pure Python API of bob.io.base.

Classes

bob.io.base.File Use this object to read and write data into files
bob.io.base.HDF5File Reads and writes data to HDF5 files

Functions

bob.io.base.load(inputs) Loads the contents of a file, an iterable of files, or an iterable of bob.io.base.File objects into a numpy.ndarray.
bob.io.base.merge(filenames) Converts an iterable of filenames into an iterable over read-only bob.io.base.File objects.
bob.io.base.save(array, filename[, ...]) Saves the contents of an array-like object to file.
bob.io.base.append(array, filename) Appends the contents of an array-like object to file.
bob.io.base.peek(filename) Returns the type of array (frame or sample) saved in the given file.
bob.io.base.peek_all(filename) Returns the type of array (for full readouts) saved in the given file.
bob.io.base.create_directories_safe(directory) Creates a directory if it does not exist, with concurrent access support.
bob.io.base.get_config() Returns a string containing the configuration information.

Test Utilities

These functions might be useful when you are writing your nose tests. Please note that this module is not part of the default bob.io.base API; to use it, you must import bob.io.base.test_utils separately.

bob.io.base.test_utils.datafile(f[, module, ...]) Returns the test file from the “data” subdirectory of the current module.
bob.io.base.test_utils.temporary_filename([...]) Generates a temporary filename to be used in tests
bob.io.base.test_utils.extension_available(...) Decorator to check if an extension is available before enabling a test

Details

bob.io.base.create_directories_safe(directory, dryrun=False)[source]

Creates a directory if it does not exist, with concurrent access support. This function will also create any parent directories that might be required. If the dryrun option is selected, it does not actually create the directory, but just writes the (Linux) command that would have been executed.

Parameters:

directory
The directory that you want to create.
dryrun
Only write the command, but do not execute it.
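
For example (a minimal sketch; the target path is hypothetical):

  import bob.io.base

  # Creates '/tmp/experiment/scores' and any missing parent
  # directories; safe to call concurrently from several jobs.
  bob.io.base.create_directories_safe('/tmp/experiment/scores')

  # With dryrun=True nothing is created; the equivalent shell
  # command is written out instead.
  bob.io.base.create_directories_safe('/tmp/experiment/scores', dryrun=True)
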
bob.io.base.load(inputs)[source]

Loads the contents of a file, an iterable of files, or an iterable of bob.io.base.File objects into a numpy.ndarray.

Parameters:

inputs

This might represent several different entities:

  1. The name of a file (full path) from where to load the data. In this case, this function assumes that the file contains an array and returns it as a loaded numpy.ndarray.
  2. An iterable of filenames to be loaded in memory. In this case, each file is assumed to contain a single 1D sample or a set of 1D samples, which are loaded in memory and concatenated into a single 2D numpy.ndarray, which is returned.
  3. An iterable of bob.io.base.File objects. In this case, each bob.io.base.File is assumed to contain a single 1D sample or a set of 1D samples, which are loaded in memory if required and concatenated into a single 2D numpy.ndarray, which is returned.
  4. An iterable mixing filenames and bob.io.base.File objects. In this case, a 2D numpy.ndarray is returned, as described by points 2 and 3 above.
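
For example (a minimal sketch; the file names are hypothetical):

  import bob.io.base

  # Case 1: a single file name returns the stored array as-is.
  data = bob.io.base.load('data.hdf5')

  # Cases 2-4: an iterable of file names and/or bob.io.base.File
  # objects is loaded and concatenated into a single 2D array.
  stacked = bob.io.base.load(['sample1.hdf5', 'sample2.hdf5'])
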
bob.io.base.merge(filenames)[source]

Converts an iterable of filenames into an iterable over read-only bob.io.base.File objects.

Parameters:

filenames

This might represent:

  1. A single filename. In this case, an iterable with a single bob.io.base.File is returned.
  2. An iterable of filenames to be converted into an iterable of bob.io.base.File objects.
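
For example (a minimal sketch; the file names are hypothetical):

  import bob.io.base

  # Wrap file names into read-only File objects without reading
  # their contents yet.
  for f in bob.io.base.merge(['sample1.hdf5', 'sample2.hdf5']):
      print(f.filename, f.describe())
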
bob.io.base.save(array, filename, create_directories=False)[source]

Saves the contents of an array-like object to file.

Effectively, this is the same as creating a bob.io.base.File object with the mode flag set to w (write with truncation) and calling bob.io.base.File.write() passing array as parameter.

Parameters:

array
The array-like object to be saved on the file
filename
The name of the file where the contents should be saved
create_directories
Automatically generate the directories if required
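
For example (a minimal sketch; the target path is hypothetical):

  import numpy
  import bob.io.base

  array = numpy.arange(12, dtype='float64').reshape(3, 4)

  # With create_directories=True, missing parent directories of
  # the target path are created before writing.
  bob.io.base.save(array, '/tmp/experiment/data.hdf5', create_directories=True)
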
bob.io.base.write(array, filename, create_directories=False)

Saves the contents of an array-like object to file.

Effectively, this is the same as creating a bob.io.base.File object with the mode flag set to w (write with truncation) and calling bob.io.base.File.write() passing array as parameter.

Parameters:

array
The array-like object to be saved on the file
filename
The name of the file where the contents should be saved
create_directories
Automatically generate the directories if required
bob.io.base.read(inputs)

Loads the contents of a file, an iterable of files, or an iterable of bob.io.base.File‘s into a numpy.ndarray.

Parameters:

inputs

This might represent several different entities:

  1. The name of a file (full path) from where to load the data. In this case, this function assumes that the file contains an array and returns it as a loaded numpy.ndarray.
  2. An iterable of filenames to be loaded in memory. In this case, each file is assumed to contain a single 1D sample or a set of 1D samples, which are loaded in memory and concatenated into a single 2D numpy.ndarray, which is returned.
  3. An iterable of bob.io.base.File objects. In this case, each bob.io.base.File is assumed to contain a single 1D sample or a set of 1D samples, which are loaded in memory if required and concatenated into a single 2D numpy.ndarray, which is returned.
  4. An iterable mixing filenames and bob.io.base.File objects. In this case, a 2D numpy.ndarray is returned, as described by points 2 and 3 above.
bob.io.base.append(array, filename)[source]

Appends the contents of an array-like object to file.

Effectively, this is the same as creating a bob.io.base.File object with the mode flag set to a (append) and calling bob.io.base.File.append() passing array as parameter.

Parameters:

array
The array-like object to be saved on the file
filename
The name of the file where the contents should be saved
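
For example (a minimal sketch; the target path is hypothetical):

  import numpy
  import bob.io.base

  # Each call appends one more object; later arrays must respect
  # the structure set by the first one.
  for i in range(3):
      bob.io.base.append(numpy.full((4,), i, dtype='float64'),
                         '/tmp/experiment/stream.hdf5')
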
bob.io.base.peek(filename)[source]

Returns the type of array (frame or sample) saved in the given file.

Effectively, this is the same as creating a bob.io.base.File object with the mode flag set to r (read-only) and returning bob.io.base.File.describe().

Parameters:

filename
The name of the file to peek information from
bob.io.base.peek_all(filename)[source]

Returns the type of array (for full readouts) saved in the given file.

Effectively, this is the same as creating a bob.io.base.File object with the mode flag set to r (read-only) and returning bob.io.base.File.describe(all=True).

Parameters:

filename
The name of the file to peek information from
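
For example (a minimal sketch; the file name is hypothetical):

  import bob.io.base

  # Inspect the stored type information without loading the data.
  per_object = bob.io.base.peek('/tmp/experiment/stream.hdf5')
  whole_file = bob.io.base.peek_all('/tmp/experiment/stream.hdf5')
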
bob.io.base.open

alias of File

bob.io.base.get_config()[source]

Returns a string containing the configuration information.

bob.io.base.get_include_directories()[source]

Returns a list of include directories for dependent libraries, such as HDF5.

class bob.io.base.File

Bases: object

Use this object to read and write data into files

Constructor Documentation:

File (filename, [mode], [pretend_extension])

Opens a file for reading or writing

Normally, the file is read by matching its extension to one of the available codecs installed with the present release of Bob. If you set the pretend_extension parameter though, the file will be read as if it had the given extension. The value should start with a '.'; for example, pass '.hdf5' to make the file be treated like an HDF5 file.

Parameters:

filename : str

The file path to the file you want to open

mode : str, one of (‘r’, ‘w’, ‘a’)

[Default: 'r'] A single character indicating if you'd like to 'r'ead, 'w'rite or 'a'ppend into the file; if you choose 'w' and the file already exists, it will be truncated

pretend_extension : str

[optional] An extension to use; see bob.io.base.extensions() for a list of (currently) supported extensions
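
For example (a minimal sketch; the file names are hypothetical):

  import bob.io.base

  # The codec is normally selected from the file extension.
  f = bob.io.base.File('data.hdf5', 'r')

  # Force the HDF5 codec for a file with an unusual extension by
  # passing the pretend_extension argument.
  g = bob.io.base.File('data.bin', 'r', '.hdf5')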

Class Members:

append(data) → position

Adds the contents of an object to the file

This method appends data to the file. If the file does not exist, a new file is created; otherwise, the method makes sure that the inserted array respects the previously set file structure.

Parameters:

data : array_like

The array to be written into the file; it can be a numpy.array, a bob.blitz.array or any other object which can be converted to either of them

Returns:

position : int

The current position of the newly written data
codec_name

str <– Name of the File class implementation

This variable is available for compatibility reasons with the previous versions of this library.

describe([all]) → dtype, shape, stride

Returns a description (dtype, shape, stride) of data at the file

Todo

The return value(s) ‘stride’ are used, but not documented.

Parameters:

all : bool

[Default: False] If set to True, returns the shape and strides for reading the whole file contents in one shot.

Returns:

dtype : numpy.dtype

The data type of the object

shape : tuple

The shape of the object
filename

str <– The path to the file being read/written

read([index]) → data

Reads a specific object in the file, or the whole file

This method reads data from the file. If you specify an index, it reads just the object indicated by the index, as you would do using the [] operator. If the index is not specified, it reads the whole contents of the file into a numpy.ndarray.

Parameters:

index : int

[optional] The index to the object one wishes to retrieve from the file; negative indexing is supported; if not given, implies retrieval of the whole file contents.

Returns:

data : numpy.ndarray

The contents of the file, as array
write(data) → None

Writes the contents of an object to the file

This method writes data to the file. It acts like the given array is the only piece of data that will ever be written to such a file. No more data appending may happen after a call to this method.

Parameters:

data : array_like

The array to be written into the file; it can be a numpy.array, a bob.blitz.array or any other object which can be converted to either of them
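
Putting the pieces together, a minimal sketch (the path is hypothetical):

  import numpy
  import bob.io.base

  # Append two 1D samples, then read them back.
  f = bob.io.base.File('/tmp/samples.hdf5', 'a')
  f.append(numpy.array([1., 2.]))
  f.append(numpy.array([3., 4.]))

  r = bob.io.base.File('/tmp/samples.hdf5', 'r')
  first = r.read(0)   # a single object, by index
  both = r.read()     # the whole file contents as one array
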
class bob.io.base.HDF5File

Bases: object

Reads and writes data to HDF5 files.

HDF5 stands for Hierarchical Data Format version 5. It is a flexible, binary file format that allows one to store and read data efficiently into or from files. It is a cross-platform, cross-architecture format.

Objects of this class allow users to read and write data from and to files in HDF5 format. For an introduction to HDF5, visit the HDF5 Website.

Constructor Documentation:

  • HDF5File (filename, [mode])
  • HDF5File (hdf5)

Opens an HDF5 file for reading, writing or appending.

For the open mode, use 'r' for read-only, 'a' for read/write/append, 'w' for read/write/truncate or 'x' for read/write/exclusive. When another HDF5File object is given, a shallow copy is created, pointing to the same file.

Parameters:

filename : str

The file path to the file you want to open for reading or writing

mode : one of (‘r’, ‘w’, ‘a’, ‘x’)

[Default: 'r'] The opening mode

hdf5 : HDF5File

An HDF5 file to copy-construct
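
For example (a minimal sketch; the path is hypothetical):

  import bob.io.base

  # Open (or create) a file for read/write/append.
  h5 = bob.io.base.HDF5File('/tmp/results.hdf5', 'a')

  # A shallow copy points to the same underlying file.
  alias = bob.io.base.HDF5File(h5)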

Class Members:

append(path, data[, compression]) → None

Appends a scalar or an array to a dataset

The object must be compatible with the typing information on the dataset, or an exception will be raised. You can also, optionally, set this to an iterable of scalars or arrays. This will cause this method to iterate over the elements and add each individually.

The compression parameter is effective when appending arrays. Set this to a number between 0 (default) and 9 (maximum) to compress the contents of this dataset. This setting is only effective if the dataset does not yet exist, otherwise, the previous setting is respected.

Parameters:

path : str

The path to the dataset to append data at; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

data : numpy.ndarray or scalar

Object to append to the dataset

compression : int

A compression value between 0 and 9
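
For example (a minimal sketch; the path is hypothetical):

  import numpy
  import bob.io.base

  h5 = bob.io.base.HDF5File('/tmp/results.hdf5', 'a')

  # Compression only takes effect when the dataset is created.
  h5.append('/features', numpy.zeros((8,)), compression=4)
  h5.append('/features', numpy.ones((8,)))
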
cd(path) → None

Changes the current prefix path

When this object is created the prefix path is empty, which means all following paths to data objects should be given using the full path. If you set the path to a different value, it will be used as a prefix to any subsequent operation until you reset it. If path starts with '/', it is treated as an absolute path. If the value is relative, it is added to the current path; '..' and '.' are supported. If it is absolute, it causes the prefix to be reset.

Note

All operations taking a relative path, following a cd(), will be considered relative to the value defined by the cwd property of this object.

Parameters:

path : str

The path to change directories to
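
For example (a minimal sketch; the path is hypothetical):

  import bob.io.base

  h5 = bob.io.base.HDF5File('/tmp/results.hdf5', 'a')
  h5.create_group('/experiment1')
  h5.cd('/experiment1')    # absolute path: resets the prefix
  h5.set('scores', 0.75)   # stored at /experiment1/scores
  h5.cd('..')              # relative path: back to the root
  print(h5.cwd)
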
close() → None

Closes this file

This function closes the HDF5File after flushing all its contents to disk. After the HDF5File is closed, any operation on it will result in an exception.

copy(hdf5) → None

Copies all accessible content to another HDF5 file

Unlinked contents of this file will not be copied. This can be used as a method to trim unwanted content in a file.

Parameters:

hdf5 : HDF5File

The HDF5 file (already opened for writing), to copy the contents to
create_group(path) → None

Creates a new path (group) inside the file

A relative path is taken w.r.t. the current directory. If the directory already exists (check it with has_group()), an exception will be raised.

Parameters:

path : str

The path to create.
cwd

str <– The current working directory set on the file

del_attribute(name[, path]) → None

Removes a given attribute at the named resource

Parameters:

name : str

The name of the attribute to delete; if the attribute is not available, a RuntimeError is raised

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to delete an attribute from; if the path does not exist, a RuntimeError is raised
del_attributes([attributes][, path]) → None

Removes attributes in a given (existing) path

If the attributes are not given or set to None, then remove all attributes at the named resource.

Parameters:

attributes : [str] or None

[Default: None] An iterable containing the names of the attributes to be removed, or None

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to delete attributes from; if the path does not exist, a RuntimeError is raised
describe(key) → shape, size, expandable

Describes a dataset type/shape, if it exists inside a file

If a given key to an HDF5 dataset exists inside the file, returns a type description of objects recorded in such a dataset, otherwise, raises an exception. The returned value type is a tuple of tuples (HDF5Type, number-of-objects, expandable) describing the capabilities if the file is read using these formats.

Todo

Check and correct the returned values

Todo

The return value(s) ‘size’ are used, but not documented.

Parameters:

key : str

The dataset path to describe

Returns:

shape : tuple

The shape of the returned array

expandable : bool

Defines if this object can be resized.
filename

str <– The name (and path) of the underlying file on hard disk

flush() → None

Flushes the content of the HDF5 file to disk

When the HDF5File is open for writing, this function synchronizes the contents on disk with the ones in memory. When the file is open for reading, nothing happens.

get(key) → data

Reads whole datasets from the file

This function reads full data sets from this file. The data type is dependent on the stored data, but is generally a numpy.ndarray.

Note

The functions read() and get() are synonyms.

Parameters:

key : str

The path to the dataset to read data from; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

Returns:

data : numpy.ndarray or other

The data read from this file at the given key
get_attribute(name[, path]) → attribute

Retrieve a given attribute from the named resource

This method returns a single value corresponding to what is stored inside the attribute container for the given resource. If you would like to retrieve all attributes at once, use get_attributes() instead.

Parameters:

name : str

The name of the attribute to retrieve; if the attribute is not available, a RuntimeError is raised

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to get an attribute from; if the path does not exist, a RuntimeError is raised

Returns:

attribute : numpy.ndarray or scalar

The read attribute
get_attributes([path]) → attributes

Reads all attributes of the given path

Attributes are returned in a dictionary in which each key corresponds to the attribute name and each value corresponds to the value stored inside the HDF5 file. To retrieve only a specific attribute, use get_attribute().

Parameters:

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to get all attributes from; if the path does not exist, a RuntimeError is raised.

Returns:

attributes : {str:value}

The attributes organized in dictionary, where value might be a numpy.ndarray or a scalar
has_attribute(name[, path]) → existence

Checks existence of a given attribute at the named resource

Parameters:

name : str

The name of the attribute to check

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) where you would like to check for the attribute; if the path does not exist, a RuntimeError is raised

Returns:

existence : bool

True, if the attribute name exists, otherwise False
has_dataset(key) → existence

Checks if a dataset exists inside a file

Checks if a dataset exists inside a file, on the specified path. If the given path is relative, it is taken w.r.t. the current working directory.

Note

The functions has_dataset() and has_key() are synonyms.

Parameters:

key : str

The dataset path to check
has_group(path) → existence

Checks if a path (group) exists inside a file

This method does not work for datasets, only for directories. If the given path is relative, it is taken w.r.t. the current working directory.

Parameters:

path : str

The path to check
has_key(key) → existence

Checks if a dataset exists inside a file

Checks if a dataset exists inside a file, on the specified path. If the given path is relative, it is taken w.r.t. the current working directory.

Note

The functions has_dataset() and has_key() are synonyms.

Parameters:

key : str

The dataset path to check
keys([relative]) → paths

Lists datasets available inside this file

Returns all paths to datasets available inside this file, stored under the current working directory. If relative is set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

Note

The functions keys() and paths() are synonyms.

Parameters:

relative : bool

[Default: False] If set to True, the returned paths are relative to the current working directory, otherwise they are absolute

Returns:

paths : [str]

A list of paths inside this file
lread(key[, pos]) → data

Reads some contents of the dataset

This method reads contents from a dataset, treating the N-dimensional dataset like a container for multiple objects with N-1 dimensions. It returns a single numpy.ndarray in case pos is set to a value >= 0, or a list of arrays otherwise.

Parameters:

key : str

The path to the dataset to read data from, can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

pos : int

If given and >= 0 returns the data object with the given index, otherwise returns a list by reading all objects in sequence

Returns:

data : numpy.ndarray or [numpy.ndarray]

The data read from this file
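
For example, a minimal sketch reusing the hypothetical '/features' dataset from above:

  import bob.io.base

  h5 = bob.io.base.HDF5File('/tmp/results.hdf5', 'r')
  first = h5.lread('/features', 0)  # one (N-1)-dimensional object
  every = h5.lread('/features')     # a list with all objects
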
paths([relative]) → paths

Lists datasets available inside this file

Returns all paths to datasets available inside this file, stored under the current working directory. If relative is set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

Note

The functions keys() and paths() are synonyms.

Parameters:

relative : bool

[Default: False] If set to True, the returned paths are relative to the current working directory, otherwise they are absolute

Returns:

paths : [str]

A list of paths inside this file
read(key) → data

Reads whole datasets from the file

This function reads full data sets from this file. The data type is dependent on the stored data, but is generally a numpy.ndarray.

Note

The functions read() and get() are synonyms.

Parameters:

key : str

The path to the dataset to read data from; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

Returns:

data : numpy.ndarray or other

The data read from this file at the given key
rename(from, to) → None

Renames datasets in a file

Parameters:

from : str

The path to the data to be renamed

to : str

The new name of the dataset
replace(path, pos, data) → None

Modifies the value of a scalar/array in a dataset.

Parameters:

path : str

The path to the dataset in which the object will be replaced; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

pos : int

Position, within the dataset, of the object to be replaced; the object position on the dataset must exist, or an exception is raised

data : numpy.ndarray or scalar

Object to replace the value with; this value must be compatible with the typing information on the dataset, or an exception will be raised
set(path, data[, compression]) → None

Sets the scalar or array at position 0 to the given value

This method is equivalent to checking if the scalar or array at position 0 exists and then replacing it. If the path does not exist, we append the new scalar or array.

The data must be compatible with the typing information on the dataset, or an exception will be raised. You can also, optionally, set this to an iterable of scalars or arrays. This will cause this method to iterate over the elements and add each individually.

The compression parameter is effective when writing arrays. Set this to a number between 0 (default) and 9 (maximum) to compress the contents of this dataset. This setting is only effective if the dataset does not yet exist, otherwise, the previous setting is respected.

Note

The functions set() and write() are synonyms.

Parameters:

path : str

The path to the dataset to write data to; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

data : numpy.ndarray or scalar

Object to write to the dataset

compression : int

A compression value between 0 and 9
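
For example (a minimal sketch; the path is hypothetical):

  import numpy
  import bob.io.base

  h5 = bob.io.base.HDF5File('/tmp/results.hdf5', 'a')

  # Creates the dataset if needed, otherwise replaces position 0.
  h5.set('/threshold', 0.5)
  h5.set('/model', numpy.zeros((4, 4)), compression=9)
  print(h5.get('/threshold'))
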
set_attribute(name, value[, path]) → None

Sets a given attribute at the named resource

Only simple scalars (booleans, integers, floats and complex numbers) and arrays of those are supported at the time being. You can use numpy scalars to set values with arbitrary precision (e.g. numpy.uint8).

Warning

Attributes in HDF5 files are supposed to be small containers or simple scalars that provide extra information about the data stored on the main resource (dataset or group|directory). Attributes cannot be retrieved in chunks, contrary to data in datasets. Currently, no limitations for the size of values stored on attributes are imposed.

Parameters:

name : str

The name of the attribute to set

value : numpy.ndarray or scalar

A simple scalar to set for the given attribute on the named resources path

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to set an attribute at
set_attributes(attributes[, path]) → None

Sets several attributes at the named resource using a dictionary

Each value in the dictionary should be a simple scalar (boolean, integer, float or complex number) or an array of those; only these types are supported at the time being. You can use numpy scalars to set values with arbitrary precision (e.g. numpy.uint8).

Warning

Attributes in HDF5 files are supposed to be small containers or simple scalars that provide extra information about the data stored on the main resource (dataset or group|directory). Attributes cannot be retrieved in chunks, contrary to data in datasets. Currently, no limitations for the size of values stored on attributes are imposed.

Parameters:

attributes : {str: value}

A python dictionary containing pairs of strings and values, which can be a numpy.ndarray or a scalar

path : str

[Default: '.'] The path leading to the resource (dataset or group|directory) you would like to set attributes at
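
For example (a minimal sketch; the path and attribute names are hypothetical):

  import bob.io.base

  h5 = bob.io.base.HDF5File('/tmp/results.hdf5', 'a')
  h5.set_attribute('seed', 42, '/model')
  h5.set_attributes({'version': 2, 'rate': 0.01}, '/model')
  print(h5.get_attributes('/model'))
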
sub_groups([relative][, recursive]) → groups

Lists groups (directories) in the current file

Parameters:

relative : bool

[Default: False] If set to True, the returned sub-groups are relative to the current working directory, otherwise they are absolute

recursive : bool

[Default: True] If set to False, the returned sub-groups are only the ones in the current directory, otherwise recurses down the directory structure

Returns:

groups : [str]

The list of directories (groups) inside this file

unlink(key) → None

Unlinks datasets inside the file, making them invisible

If a given path to an HDF5 dataset exists inside the file, unlinks it. Please note this will not remove the data from the file, just make it inaccessible. If you wish to clean up, save the reachable objects from this file to another HDF5File object using copy(), for example.

Parameters:

key : str

The dataset path to unlink
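
For example, a minimal sketch of trimming a file (paths are hypothetical):

  import bob.io.base

  # Unlinking hides a dataset but does not reclaim its space;
  # copy the remaining content into a fresh file to trim it.
  src = bob.io.base.HDF5File('/tmp/results.hdf5', 'a')
  src.unlink('/features')

  dst = bob.io.base.HDF5File('/tmp/trimmed.hdf5', 'w')
  src.copy(dst)
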
writable

bool <– Has this file been opened in writable mode?

write(path, data[, compression]) → None

Sets the scalar or array at position 0 to the given value

This method is equivalent to checking if the scalar or array at position 0 exists and then replacing it. If the path does not exist, we append the new scalar or array.

The data must be compatible with the typing information on the dataset, or an exception will be raised. You can also, optionally, set this to an iterable of scalars or arrays. This will cause this method to iterate over the elements and add each individually.

The compression parameter is effective when writing arrays. Set this to a number between 0 (default) and 9 (maximum) to compress the contents of this dataset. This setting is only effective if the dataset does not yet exist, otherwise, the previous setting is respected.

Note

The functions set() and write() are synonyms.

Parameters:

path : str

The path to the dataset to write data to; can be an absolute value (starting with a leading '/') or relative to the current working directory cwd

data : numpy.ndarray or scalar

Object to write to the dataset

compression : int

A compression value between 0 and 9
bob.io.base.extensions() → extensions

Returns a dictionary containing all extensions and descriptions currently stored on the global codec registry

The extensions are returned as a dictionary from the filename extension to a description of the data format.

Returns:

extensions : {str : str}

A dictionary of supported extensions
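
For example, a minimal sketch listing the installed codecs:

  import bob.io.base

  for ext, description in sorted(bob.io.base.extensions().items()):
      print('%-10s %s' % (ext, description))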

Re-usable decorators and utilities for bob test code

bob.io.base.test_utils.datafile(f, module=None, path='data')[source]

Returns the test file from the “data” subdirectory of the current module.

Keyword parameters:

f: str
This is the filename of the file you want to retrieve. Something like 'movie.avi'.
module: str, optional
This is the python-style package name of the module you want to retrieve the data from. This should be something like bob.io.test, but you normally refer to it using the __name__ property of the module you want to find the path relative to.
path: str, optional
This is the subdirectory where the datafile will be taken from inside the module. Normally (the default) this is 'data'. It can be set to None if it should be taken from the module path root (where the __init__.py file sits).

Returns the full path of the file.

bob.io.base.test_utils.temporary_filename(prefix='bobtest_', suffix='.hdf5')[source]

Generates a temporary filename to be used in tests

bob.io.base.test_utils.extension_available(extension)[source]

Decorator to check if an extension is available before enabling a test
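
For example, a minimal sketch of a nose test using these helpers (the test body is hypothetical):

  import numpy
  import bob.io.base
  import bob.io.base.test_utils

  # Skip the test unless a codec for '.hdf5' files is installed.
  @bob.io.base.test_utils.extension_available('.hdf5')
  def test_roundtrip():
      tmp = bob.io.base.test_utils.temporary_filename()
      data = numpy.arange(4, dtype='float64')
      bob.io.base.save(data, tmp)
      assert numpy.allclose(bob.io.base.load(tmp), data)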