Python API

Package Documentation

This is the bob.io.stream package

This package provides a way to define efficient processing pipelines, based on the concept of “streams”, to load and process video data stored in hdf5 files. The interface with the hdf5 files is implemented in StreamFile. Users can define loading and processing pipelines through the Stream class.

The stream implementation is designed to allow the extension of the class by implementing filters using the StreamFilter class and decorating them with stream_filter(). The decorator adds the filter to the Stream, so it can be used as a stream’s member.

class bob.io.stream.Stream(name=None, parent=None)

Bases: object

Base class implementing methods to load/write, process and use data from an hdf5 file with a “numpy-like” API.

This class is designed to provide the following functionalities:

  • Easily define chains of processing and loading. When data is accessed through a stream, the stream recursively calls its parent’s load() function before its own. For instance, if a stream’s parent is a StreamFile, loading data through the stream will first load the data (from the dataset specified by the stream’s name) from the hdf5 file through the StreamFile before applying its own processing.

  • Provide an easy syntax to implement this chain processing. This is achieved through the stream_filter() decorator, which adds filters to the Stream class as members, allowing them to be used in the following fashion:

    example_stream = Stream("cam1").normalize().stack(Stream("cam2").normalize())
    

    Loading data through example_stream will thus load data from “cam1” and normalize it, then load data from “cam2” and normalize it, and finally stack the two together.

  • In a similar fashion to the chain processing, this class allows processing to be applied in the reverse order to write data to an hdf5 file. This is implemented in the put() method and uses the child attribute.
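
The recursive loading described in the list above can be illustrated with a small, dependency-free sketch. The classes ToySource, ToyStream and ToyFilter are hypothetical stand-ins written for this illustration only; they are not the bob.io.stream classes, and plain Python lists replace numpy arrays.

```python
class ToySource:
    """Hypothetical stand-in for a StreamFile: maps dataset names to frames."""

    def __init__(self, datasets):
        self.datasets = datasets

    def load(self, name, indices):
        return [self.datasets[name][i] for i in indices]


class ToyStream:
    """Hypothetical stand-in for a Stream whose parent is the data source."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent

    def load(self, indices):
        # The stream's name selects the dataset to read from the source.
        return self.parent.load(self.name, indices)

    def scale(self, factor):
        # Chaining: return a new filter whose parent is this stream.
        return ToyFilter("scale", self, lambda frame: frame * factor)


class ToyFilter(ToyStream):
    """Hypothetical filter: loads through its parent, then processes."""

    def __init__(self, name, parent, func):
        super().__init__(name, parent)
        self.func = func

    def load(self, indices):
        # Call the parent's load first, then apply this filter's processing.
        return [self.func(frame) for frame in self.parent.load(indices)]


source = ToySource({"cam1": [1, 2, 3, 4]})
pipeline = ToyStream("cam1", source).scale(10).scale(2)
result = pipeline.load([0, 2])
print(result)
```

Each call to load() walks up the chain to the source before any processing happens, so the filters run in the order in which they were chained.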

The API is designed to be similar to numpy arrays:

  • Data access (processing and loading) is done using [].

  • Taking a slice in a stream returns a new stream with the sliced data. (This is implemented with the StreamView filter).

To reduce disk access, the result of loading or processing is buffered.
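
One possible buffering policy is sketched below: the loader remembers the indices of the last request and returns the stored result when the same indices are asked for again. This is an assumption made for illustration; the class BufferedLoader and its reads counter are hypothetical, and the policy actually used by Stream may differ.

```python
class BufferedLoader:
    """Hypothetical loader that remembers the last indices it loaded."""

    def __init__(self, read_frames):
        self.read_frames = read_frames  # callable doing the expensive read
        self._loaded = None             # indices currently buffered
        self._data = None               # buffered frames
        self.reads = 0                  # number of real reads, for illustration

    def load(self, indices):
        indices = list(indices)
        if self._loaded != indices:
            # Buffer miss: perform the expensive read and remember the result.
            self._data = self.read_frames(indices)
            self._loaded = indices
            self.reads += 1
        return self._data

    def reset(self):
        # Drop the buffer so that the next load reads again.
        self._loaded = None
        self._data = None


frames = ["f0", "f1", "f2", "f3"]
loader = BufferedLoader(lambda idx: [frames[i] for i in idx])
first = loader.load([1, 2])
second = loader.load([1, 2])   # served from the buffer
loader.reset()
third = loader.load([1, 2])    # read again after the reset
```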

The class was initially designed to work with video streams, therefore StreamArray members are available to provide an easy way to use bounding boxes or landmarks for each frame in the stream. Additionally, the timestamps member holds the timestamps of each frame in the stream.

name

Name of the stream. If parent is a StreamFile, it will be used to know from which dataset in the hdf5 file the data should be taken. Otherwise it is an identifier of the Stream (or StreamFilter) functionality (eg “adjust”, “normalize”, …).

Type

str

parent

The element before this instance in the chain of processing for loading data. The parent’s “load” function will recursively be used before this instance’s one.

Type

Stream or StreamFile

child

The element after this instance in the chain of processing for writing data. When put() is called, it will perform its function then recursively call its child’s.

Type

Stream or StreamFile

_loaded

Indices of the data that is currently buffered.

Type

list of int

_data

Buffered data.

Type

numpy.ndarray

_shape

Shape of the stream’s data. This member is mostly used when writing data, while when reading the shape property is used.

Type

tuple of int

adjust(*args, **kwargs)
astype(*args, **kwargs)
property bounding_box

Bounding box at each frame in the stream.

A StreamArray member is provided to let the user easily store their bounding boxes with the stream’s data.

Returns

Bounding boxes.

Return type

StreamArray

clean(*args, **kwargs)
colormap(*args, **kwargs)
property config

Configuration dictionary to access the data in the hdf5 file.

Returns

Config.

Return type

dict

filter(*args, **kwargs)
filters = ['filter', 'view', 'save', 'astype', 'adjust', 'select', 'colormap', 'normalize', 'clean', 'stack', 'subtract']
get_available_filters()[source]

Get a list of the available filters to use with Stream class.

Note: Stream.filters is filled in with the name of the filters by the stream_filter() decorator, each time a class is decorated.

Returns

List of available filters in the Stream class. The filters can be used as “stream.filter()”

Return type

list of str

get_parent()[source]

Return this stream’s parent (None if parent is not set)

Returns

This stream’s parent.

Return type

Stream

property image_points

Landmarks at each frame in the stream.

A StreamArray member is provided to let the user easily store their landmark points with the stream’s data.

Returns

Landmarks.

Return type

StreamArray

load(index=None)[source]

Load data directly.

Unlike accessing stream data through brackets [], this method always returns the data, not a Stream. This method is overloaded in StreamFilter, in order to call parent load method first and apply processing on the result.

The loaded data is buffered to reduce disk access.

Parameters

index (int or list) – Indices of the frames to load, by default None.

Returns

Data at index.

Return type

numpy.ndarray

property ndim

Number of dimensions in the stream’s data.

Returns

Number of dimensions.

Return type

int

normalize(*args, **kwargs)
put(data, timestamp=None)[source]

Recursively pass data down to child to write in the HDF5 file.

StreamFilter overloads this method to process data with the filter function before passing down to child.

Parameters
  • data (numpy.ndarray) – data to write to file.

  • timestamp (int or float) – Timestamp of data, by default None.

Raises

ValueError – If data’s shape does not match the previous frames’ shape or the stream’s shape.

reset()[source]

Deletes buffered data and meta-data.

save(*args, **kwargs)
select(*args, **kwargs)
set_source(src)[source]

Recursively set source of self and parent.

Parameters

src (StreamFile) – The file containing the raw data of this stream and parents.

property shape

Shape of the stream’s data.

When reading data, the shape of the stream is typically defined by the shape of the data in source, therefore the shape is recursively set to parent as well. However, when writing data, the shape is defined by the user, and the stream’s parent might not be set. In this case, we store the shape in _shape.

Raises
  • Exception – If trying to set the shape when it is already defined (by a parent StreamFile).

  • ValueError – If setting the shape with an invalid type.

Returns

Shape.

Return type

tuple of int

property source

Source file of the Stream’s data.

While parent points to the previous stream in the chain of processing, source points directly to the data file.

Returns

File containing the stream’s data, before processing.

Return type

StreamFile

stack(*args, **kwargs)
subtract(*args, **kwargs)
property timestamps

Timestamp of each frame in the stream’s data.

Returns

Timestamps.

Return type

numpy.ndarray

view(*args, **kwargs)
class bob.io.stream.StreamAdjust(adjust_to, name, parent)[source]

Bases: bob.io.stream.StreamFilter

Filter that allows two streams with different timestamps to be used together seamlessly by taking the closest time neighbors.

Stream frames are not necessarily simultaneous: some streams may be delayed, some might have fewer frames… However, the timestamps of each frame are available. Given the timestamps of the parent stream, this filter implements a nearest neighbor search in the timestamps of the adjust_to stream to load the closest frame.

This stream emulates the adjust_to number of frames and timestamps to facilitate operations on streams.
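
The nearest-neighbor matching of timestamps can be sketched with the standard library alone. The sketch assumes, following the description of load() for this filter, that each timestamp of adjust_to is matched to the index of the closest timestamp in the parent stream. The function nearest_indices and the timestamp values are hypothetical, and the library may use a different search structure.

```python
from bisect import bisect_left


def nearest_indices(reference_timestamps, candidate_timestamps):
    """For each reference timestamp, return the index of the closest candidate.

    candidate_timestamps must be sorted in increasing order.
    """
    result = []
    for t in reference_timestamps:
        pos = bisect_left(candidate_timestamps, t)
        # Only the neighbours on either side of the insertion point can be closest.
        neighbours = [i for i in (pos - 1, pos) if 0 <= i < len(candidate_timestamps)]
        result.append(min(neighbours, key=lambda i: abs(candidate_timestamps[i] - t)))
    return result


# Hypothetical timestamps: the parent stream has one frame fewer than the
# stream it is adjusted to, and starts slightly later.
adjust_to_timestamps = [0, 33, 66, 100]
parent_timestamps = [5, 40, 90]
matched = nearest_indices(adjust_to_timestamps, parent_timestamps)
print(matched)
```

The result has one entry per frame of adjust_to, which is why the adjusted stream can report the same number of frames as adjust_to even when the parent has fewer.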

adjust_to

Stream relative to which the timestamps will be adjusted.

Type

Stream or StreamFilter

set_source(src)[source]

Set self and adjust_to sources to src.

Parameters

src (Stream or StreamFile) – Source Stream or StreamFile.

property shape

Stream’s data shape. The number of frames is equal to adjust_to.

Returns

Shape of the Stream’s data.

Return type

tuple of int

property timestamps

Stream’s timestamps, equal to adjust_to after adjustment.

Returns

Timestamps of the frames in the stream.

Return type

numpy.ndarray

load(index)[source]

Load frame(s) at index.

index is the index of a frame in adjust_to. The closest frame in self is found using nearest neighbor search, then the data is loaded.

Parameters

index (int or list of int or slice) – Indices of the frames to load.

Returns

Stream’s data at index.

Return type

numpy.ndarray

class bob.io.stream.StreamArray(stream)[source]

Bases: object

Class to associate data to a Stream, for instance bounding boxes to a video stream.

This class allows setting the value of the data array (e.g. the bounding box at some or each frame of a stream) without having to care about the shape of the stream. If the data is not initialized, it will return None.

class bob.io.stream.StreamAsType(name, parent, dtype)[source]

Bases: bob.io.stream.StreamFilter

Filter to cast the data to a different numpy dtype.

dtype

The dtype to which to cast the data.

Type

numpy.dtype

process(data, indices)[source]

Cast data to dtype.

Parameters
  • data (numpy.ndarray) – Data to cast.

  • indices (int or list of int) – Not used. Present for compatibility with other filters.

Returns

data cast to dtype.

Return type

numpy.ndarray

class bob.io.stream.StreamClean(name, parent)[source]

Bases: bob.io.stream.StreamFilter

Filter to fill in dead pixels through inpainting, then blurring.

process_frame(data, data_index, stream_index)[source]

Fill in dead pixels in data.

Parameters
  • data (numpy.ndarray) – Parent stream’s data to clean.

  • data_index (int or list of int) – Not used. Present for compatibility with other filters.

  • stream_index (int or list of int) – Not used. Present for compatibility with other filters.

Returns

Cleaned data.

Return type

numpy.ndarray

class bob.io.stream.StreamColorMap(name, parent, colormap='gray')[source]

Bases: bob.io.stream.StreamFilter

Filter to map 1-channel images to RGB images, useful for visualization, e.g. of depth maps.

colormap

The colormap used to represent the data. Can be “gray” for grayscale, or an OpenCV colormap.

Type

str

property shape

Shape of the stream’s data. The stream parent must have 1 channel, and this stream has mapped it to 3 (RGB).

Returns

Shape of the stream’s data.

Return type

tuple of int

process_frame(data, data_index, stream_index)[source]

Maps a 1-channel frame to an RGB frame using the filter’s colormap.

Parameters
  • data (numpy.ndarray) – Parent stream’s data. Must have only 1 channel.

  • data_index (int) – Not used. Present for compatibility with other streams.

  • stream_index (int) – Not used. Present for compatibility with other filters.

Returns

Stream’s data, mapped to RGB using the filter’s colormap.

Return type

numpy.ndarray

Raises

ValueError – If the parent’s stream does not have only 1 channel: this stream maps 1 channel images to RGB.

class bob.io.stream.StreamFile(hdf5_file=None, data_format_config_file_path=None, mode='r')

Bases: object

File class to read and write from HDF5 files.

Exposes methods to read a stream’s data and meta-data. The format of the data in the hdf5 file is defined through a configuration dictionary.

The class can also be used to write an HDF5 file, through the put_frame() method. This operates by appending data to a file, one frame at a time.

hdf5_file

HDF5 file containing the streams data.

Type

bob.io.base.HDF5File

data_format_config

Path to the JSON configuration file with the streams’ meta-data (names, shapes, etc…)

Type

str

get_available_streams()[source]

list of str: Get the names of the streams in the HDF5 File.

get_stream_config(stream_name)[source]

Get the stream_name configuration: stream name, data format, etc…

Parameters

stream_name (str) – Name of the stream in the HDF5 File whose meta-data is requested.

Returns

Stream meta-data. If the configuration is not available, return a default config containing only the stream name.

Return type

dict

get_stream_shape(stream_name)[source]

Get the shape of the data in stream_name.

Parameters

stream_name (str) – Name of the stream whose shape is requested.

Returns

Shape of the stream_name’s data.

Return type

tuple of int

get_stream_timestamps(stream_name)[source]

Return the timestamps of each frame in stream_name.

Parameters

stream_name (str) – Name of the stream whose timestamps are requested.

Returns

Timestamps of each frame in stream_name

Return type

numpy.ndarray

load_stream_data(stream_name, index)[source]

Load the index frame(s) of data from stream_name.

Loads only the requested indices from the file. If the stream’s data configuration requests it, some axes in the loaded data are flipped.

Parameters
  • stream_name (str) – Name of the stream whose data should be loaded.

  • index (int or list of int) – Index of the frame(s) to load.

Returns

Stream’s data at frames index.

Return type

numpy.ndarray

Raises

ValueError – If index does not have a valid type.

put_frame(name, data, timestamp=None)[source]

Appends data (a frame of a stream) to the hdf5 file.

Parameters
  • name (str) – Path to the dataset to append to.

  • data (numpy.ndarray) – Data frame to append.

set_source(hdf5_file=None, data_format_config_file_path=None, mode='r')[source]

Open the HDF5 file and load data config.

Parameters
  • hdf5_file (bob.io.base.HDF5File or str or None) – File handle or path to the streams HDF5 File, by default None.

  • data_format_config_file_path (str or None) – Path to the data config file, by default None.

  • mode (str) – File opening mode, by default “r”.

class bob.io.stream.StreamFilter(name, parent, process_frame=None)

Bases: bob.io.stream.Stream

Base filter class: overloads the bob.io.stream.Stream.load() and bob.io.stream.Stream.put() methods to insert the filter processing.

This class implements the process() and bob.io.stream.StreamFilter.process_frame() methods, which define the processing performed by the filter. A “process_frame” function can be passed as an argument, in which case it will be applied to each frame of data in process_frame(). If it is not provided, this filter does not perform any processing; it only provides the definition of the processing methods, which can be overloaded in inheriting classes. See for example the StreamView filter.

The bob.io.stream.Stream.load() method is overloaded to first perform the filter’s parent processing (or loading if parent is not a filter). The bob.io.stream.Stream.put() method is overloaded to first perform the processing of the filter, then pass the data down to child for further processing or writing to disk.
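
The two ways of defining a filter’s processing (passing a per-frame function, or overriding process_frame() in a subclass) can be sketched as follows. ToyFilter and ToyNegate are hypothetical classes written for this illustration; in particular, the assumption that the function passed at construction receives only the frame is not confirmed by the documentation above. Lists stand in for numpy arrays.

```python
class ToyFilter:
    """Hypothetical filter base: identity unless a per-frame function is given."""

    def __init__(self, name, process_frame=None):
        self.filter_name = name
        self._process_frame = process_frame

    def process(self, data, indices):
        # Apply the per-frame processing to every frame and collect the results.
        if not isinstance(indices, list):
            raise ValueError("indices must be a list")
        return [self.process_frame(frame, i, 0) for i, frame in enumerate(data)]

    def process_frame(self, data, data_index, stream_index):
        # Use the function passed at construction if there is one.
        if self._process_frame is not None:
            return self._process_frame(data)
        return data


class ToyNegate(ToyFilter):
    """Hypothetical subclass overriding the per-frame processing."""

    def process_frame(self, data, data_index, stream_index):
        return [-value for value in data]


frames = [[1, 2], [3, 4]]
identity = ToyFilter("identity").process(frames, [0, 1])
doubled = ToyFilter("double", lambda f: [2 * v for v in f]).process(frames, [0, 1])
negated = ToyNegate("negate").process(frames, [0, 1])
```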

filter_name

The name of this filter. name (from class bob.io.stream.Stream) is kept separate because it is used to know from which dataset to load data in the hdf5.

Type

str

load(index=None)[source]

Overload bob.io.stream.Stream.load() to apply the filter’s processing to what the parent loaded.

Parameters

index (int or list of int) – Indices of the frames to load, by default None.

Returns

The processed data.

Return type

numpy.ndarray

process(data, indices)[source]

Apply the filter on each frame of data, and stack the results back in one array.

Parameters
  • data (numpy.ndarray) – Data to process.

  • indices (list of int) – Indices of data in the stream. Unused here, but useful for instance for filters that combine two streams together.

Returns

Processed data.

Return type

numpy.ndarray

Raises

ValueError – If indices is not a list.

process_frame(data, data_index, stream_index)[source]

Apply self.__process_frame if possible, otherwise simply return data.

Parameters
  • data (numpy.ndarray) – Data (one frame) to process.

  • data_index (int) – Not used. Index of data in the stream.

  • stream_index (int) – Not used. Index of the stream from which data comes, to be used by filters that combine several streams.

Returns

Processed frame.

Return type

numpy.ndarray

put(data, timestamp=None)[source]

Apply the filter’s processing, then pass the data down to child for further processing or saving to disk.

Parameters
  • data (numpy.ndarray) – Data (one frame) to process.

  • timestamp (int or float) – Timestamp of data in the stream, by default None.

class bob.io.stream.StreamNormalize(name, parent, tmin=None, tmax=None, dtype='uint8')[source]

Bases: bob.io.stream.StreamFilter

Filter to normalize the data range of images.

tmin

Minimum threshold: values below tmin will be clipped to 0.

Type

numpy.generic

tmax

Maximum threshold: values over tmax will be clipped to the maximum value allowed by the dtype.

Type

numpy.generic

dtype

Data type of the images.

Type

str or numpy.dtype

process(data, indices)[source]

Normalize data.

Parameters
  • data (numpy.ndarray) – The parent stream’s data, to be normalized.

  • indices (int or list of int) – Not used. Present for compatibility with other filters. The indices of data in the stream.

Returns

The normalized data.

Return type

numpy.ndarray
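
The effect of the tmin and tmax thresholds can be sketched with numpy. The function normalize below is a hypothetical illustration, not the library’s implementation: it assumes a linear rescaling between the two thresholds, an integer target dtype, and that missing thresholds default to the extrema of the data.

```python
import numpy as np


def normalize(data, tmin=None, tmax=None, dtype="uint8"):
    """Linearly rescale data from [tmin, tmax] to the full range of an integer dtype."""
    data = np.asarray(data, dtype=np.float64)
    # Assumption: missing thresholds default to the data's own extrema.
    tmin = data.min() if tmin is None else tmin
    tmax = data.max() if tmax is None else tmax
    top = np.iinfo(np.dtype(dtype)).max
    if tmax == tmin:
        # Assumption: a constant image maps to 0 rather than dividing by zero.
        return np.zeros(data.shape, dtype=dtype)
    scaled = (data - tmin) / (tmax - tmin)
    # Values below tmin end up at 0, values above tmax at the dtype maximum.
    return (np.clip(scaled, 0.0, 1.0) * top).astype(dtype)


depth = np.array([[0, 500], [750, 4000]])
out = normalize(depth, tmin=500, tmax=1000)
print(out)
```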

class bob.io.stream.StreamSave(file, name, parent)[source]

Bases: bob.io.stream.StreamFilter

Filter to save frames of data to a StreamFile.

Saving is performed by appending to the streamfile.

file

StreamFile into which the data will be appended.

Type

StreamFile

put(data, timestamp=None)[source]

Pass data and timestamp to the StreamFile to write to disk.

Parameters
  • data (numpy.ndarray) – data to write to file.

  • timestamp (int or float) – Timestamp of data, by default None.

class bob.io.stream.StreamSelect(name, parent, channel)[source]

Bases: bob.io.stream.StreamFilter

Filter to select a channel in a color stream (in bob’s format).

This could also be performed by slicing the channel in the parent.

channel

Index of the channel to keep.

Type

int

property shape

Shape of the stream’s data.

Because 1 channel is selected, the dimension is 1 on the channel axis.

Returns

Shape of the stream’s data.

Return type

tuple of int

process(data, indices)[source]

Select the required channel in data.

Parameters
  • data (numpy.ndarray) – Color data, from which a channel is selected.

  • indices (int) – Not used. Present for compatibility with other filters.

Returns

Selected channel in data.

Return type

numpy.ndarray

class bob.io.stream.StreamStacked(stack_stream, name, parent)[source]

Bases: bob.io.stream.StreamFilter

Filter to stack streams along the channel dimension.

The stream stacks its parent Stream with its stack_stream.

stack_stream

The stream to stack with parent.

Type

Stream or StreamFilter

set_source(src)[source]

Set self and stack_stream source to src.

Parameters

src (Stream or StreamFile) – Source Stream or StreamFile.

property shape

Shape of the stream’s data. The number of channels is the sum of the parent’s and the stacked stream.

Returns

Shape of the stream’s data.

Return type

tuple of int

process(data, indices)[source]

Stacks data from stack_stream with data (which comes from parent).

data comes from parent with shape (n, c1, …), this method loads the data of stack_stream at the same indices, which has shape (n, c2, …), then stacks them to output an array of shape (n, c1 + c2, …)

parent and stack_stream must have the same dimensions, except in the channel axis.

Parameters
  • data (numpy.ndarray) – Parent stream’s data at indices

  • indices (int or list of int) – Indices of data

Returns

data from parent stacked with data at indices from stack_stream along the channel dimension.

Return type

numpy.ndarray
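
The shape arithmetic of the stacking, (n, c1, …) with (n, c2, …) giving (n, c1 + c2, …), corresponds to a concatenation along axis 1. The sketch below uses numpy.concatenate on made-up arrays; the array names and sizes are hypothetical, and the library’s own process() may be implemented differently.

```python
import numpy as np

parent_data = np.zeros((4, 3, 8, 8), dtype=np.uint8)  # (n, c1, h, w), e.g. color frames
stack_data = np.ones((4, 1, 8, 8), dtype=np.uint8)    # (n, c2, h, w), e.g. depth frames

# Stack along the channel axis, which is axis 1 in the (n, c, ...) layout.
stacked = np.concatenate((parent_data, stack_data), axis=1)
print(stacked.shape)
```

All axes other than the channel axis must have equal sizes, otherwise numpy.concatenate raises a ValueError.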

process_frame(data, data_index, stream_index)[source]

Concatenate frame from parent and stack_stream along channel axis.

Parameters
  • data (numpy.ndarray) – parent frame at data_index.

  • data_index (int) – Index of the frames to stack in the streams.

  • stream_index (int) – Not used. Present for compatibility with other filters.

Returns

Concatenated frames from parent and stack_stream streams.

Return type

numpy.ndarray

class bob.io.stream.StreamSubtract(subtrahend, name, parent)[source]

Bases: bob.io.stream.StreamFilter

Filter to subtract subtrahend from parent, clipping the resulting values to be positive or zero.

subtrahend

The stream whose data will be subtracted.

Type

Stream or StreamFilter

set_source(src)[source]

Set self and subtrahend sources to src.

Parameters

src (Stream or StreamFile) – Source stream or stream file.

process(data, indices)[source]

Subtract subtrahend’s data from data.

Parameters
  • data (numpy.ndarray) – parent data at indices.

  • indices (int) – Indices of data.

Returns

data minus subtrahend’s data.

Return type

numpy.ndarray
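
Clipping the difference at zero matters for unsigned image data, where a plain subtraction would wrap around. The sketch below shows one way to obtain the clipped result with numpy; subtract_clipped and the sample arrays are hypothetical and do not reproduce the library’s code.

```python
import numpy as np


def subtract_clipped(data, subtrahend):
    """Subtract two arrays and clip negative results to zero."""
    # Work in a signed type so that unsigned inputs cannot wrap around.
    diff = data.astype(np.int64) - subtrahend.astype(np.int64)
    return np.clip(diff, 0, None).astype(data.dtype)


lit = np.array([[10, 200], [50, 0]], dtype=np.uint8)
ambient = np.array([[20, 100], [50, 5]], dtype=np.uint8)
result = subtract_clipped(lit, ambient)
print(result)
```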

class bob.io.stream.StreamView(name, parent, view_indices=None)[source]

Bases: bob.io.stream.StreamFilter

Filter to implement “slicing” functionality for the bob.io.stream.Stream class.

Similarly to numpy’s “view”, this filter allows taking a slice in a stream without creating a copy of the data.

frame_view

Slice value in the first dimension of the stream (along the frame’s axis). None means no slicing: take the whole array.

Type

slice or None

bulk_view

Slice value along the other axes in the stream.

Type

tuple of int or slice or None

property shape

Shape of the stream’s data.

The shape is computed with respect to the parent’s shape, because source might not be set, so the shape of the data cannot be known. If the requested slice has an integer index along one axis, this dimension is dropped. However, taking an integer along the first axis is not allowed (Exception raised in __init__).

Returns

Shape of the stream’s data.

Return type

tuple of int

property ndim

Number of dimensions of the stream’s data.

If the requested slice has an integer along an axis, this dimension is collapsed, otherwise the number of dimensions is the same as the parent’s.

Returns

Number of dimensions.

Return type

int
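
The shape and ndim rules described above (a slice shrinks an axis, an integer index drops it) can be computed from the parent’s shape alone, without loading data. The function view_shape is a hypothetical sketch; only the names frame_view and bulk_view are taken from the attributes documented above.

```python
def view_shape(parent_shape, frame_view=None, bulk_view=None):
    """Shape obtained by slicing parent_shape; integer indices drop their axis."""
    n_frames = parent_shape[0]
    if frame_view is not None:
        # range() applies the slice semantics without touching any data.
        n_frames = len(range(n_frames)[frame_view])
    shape = [n_frames]
    bulk_view = bulk_view or ()
    for axis, size in enumerate(parent_shape[1:]):
        view = bulk_view[axis] if axis < len(bulk_view) else None
        if view is None:
            shape.append(size)                    # axis kept whole
        elif isinstance(view, slice):
            shape.append(len(range(size)[view]))  # axis sliced
        # An integer index selects one element, so the axis is dropped.
    return tuple(shape)


# First 10 frames, channel 0 only, every second row.
shape = view_shape((100, 3, 480, 640), slice(0, 10), (0, slice(None, None, 2)))
print(shape)
```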

load(index=None)[source]

Load stream’s data at the corresponding indices.

Maps index to indices in parent and delegates loading.

Parameters

index (int or list of int or slice) – Indices of the data to load.

Returns

Data at index in the stream.

Return type

numpy.ndarray

process(data, indices)[source]

Apply slicing on each frame of data.

The slicing of the frame’s axis is performed in bob.io.stream.StreamView.load(), so that data only contains frames that are requested. It remains to apply the slicing along the other axes in data, which is delegated to process_frame() (by slicing into the numpy arrays). Here we only store the requested slice in full format (value along all axes).

Parameters
  • data (numpy.ndarray) – Data to slice. Slicing on the first axis is already performed.

  • indices (int or list of int) – Indices of data in the stream.

Returns

Sliced data.

Return type

numpy.ndarray

process_frame(data, data_index, stream_index)[source]

Apply the slicing on a frame of data.

Apply the frame slicing computed in process() on a frame.

Parameters
  • data (numpy.ndarray) – Frame of data.

  • data_index (int) – Not used. Present for compatibility with other filters.

  • stream_index (int) – Not used. Present for compatibility with other filters.

Returns

Sliced data.

Return type

numpy.ndarray

bob.io.stream.get_config()[source]

Returns a string containing the configuration information.

bob.io.stream.stream_filter(name)

Adds the filter with name to the Stream class.

This decorator function is meant to be used on a filter class that inherits the Stream class. It adds this filter to the Stream class so it can be used directly as a member. It also adds it to the filters list.

For example, see the StreamView filter.

Parameters

name (str) – Name of the filter
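
The registration mechanism can be illustrated with a minimal, self-contained decorator. ToyStream, toy_stream_filter and ToyScale are hypothetical names written for this sketch; the way arguments are forwarded to the filter’s constructor is an assumption and may differ from what stream_filter() does.

```python
class ToyStream:
    """Hypothetical stream class that filters get attached to."""

    filters = []

    def __init__(self, name=None, parent=None):
        self.name = name
        self.parent = parent


def toy_stream_filter(name):
    """Hypothetical decorator: expose the decorated class as ToyStream.<name>()."""

    def wrapper(filter_class):
        def method(self, *args, **kwargs):
            # Calling stream.<name>(...) builds the filter with this stream as parent.
            return filter_class(*args, name=self.name, parent=self, **kwargs)

        setattr(ToyStream, name, method)
        ToyStream.filters.append(name)
        return filter_class

    return wrapper


@toy_stream_filter("scale")
class ToyScale(ToyStream):
    def __init__(self, factor, name, parent):
        super().__init__(name=name, parent=parent)
        self.factor = factor


stream = ToyStream("cam1").scale(2).scale(3)
print(ToyStream.filters)
```

Because the method is attached to the base class, it is also available on every filter, which is what makes chained calls such as .scale(2).scale(3) possible.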