API¶
This section describes how to use the Python API of beat.backend.python.
loader¶
This module implements a simple loader for Python code as well as a safe executor. Safe in this context means that if the executed method raises an exception, the executor catches it and returns it in a form suitable to the caller.
- beat.backend.python.loader.load_module(name, path, uses)[source]¶
Loads the Python file as a module and returns a proper Python module
- Parameters
name (str) – The name of the Python module to create. Must be a valid Python symbol name
path (str) – The full path of the Python file to load the module contents from
uses (dict) – A mapping which indicates the name of the library to load (as a module for the current library) and the full-path and use mappings of such modules.
- Returns
A valid Python module you can use in an Algorithm or Library.
- Return type
- beat.backend.python.loader.run(obj, method, exc=None, *args, **kwargs)[source]¶
Runs a method on the object and protects its execution
In case an exception is raised, it is caught and transformed into the exception class the user passed.
- Parameters
obj (object) – The Python object on which to execute the method
method (str) – The method name to execute on the object
exc (class, Optional) – The class to use as base exception when translating the exception from the user code. If you set it to None, the user-raised exception is simply re-thrown.
*args – Arguments to the object method, passed unchanged
**kwargs – Arguments to the object method, passed unchanged
- Returns
whatever obj.method() is bound to return.
- Return type
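The exception-translation behaviour of loader.run() can be pictured with a short, self-contained sketch (run_protected, UserCodeError and Algo are illustrative names, not part of the library):

```python
class UserCodeError(Exception):
    """Illustrative base class for errors translated from user code."""


def run_protected(obj, method, exc=None, *args, **kwargs):
    """Run obj.<method>(*args, **kwargs), translating any exception
    into ``exc`` when one is given (mirrors the documented behaviour)."""
    try:
        return getattr(obj, method)(*args, **kwargs)
    except Exception as e:
        if exc is not None:
            raise exc(str(e)) from e
        raise  # exc=None: re-throw the user-raised exception unchanged


class Algo:
    """Toy object standing in for user code."""

    def process(self, x):
        if x < 0:
            raise ValueError("negative input")
        return x * 2
```

Here, run_protected(Algo(), "process", UserCodeError, 21) returns 42, while a negative input surfaces as a UserCodeError instead of the raw ValueError.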
hash¶
Various functions for hashing platform contributions and others
- beat.backend.python.hash.toPath(hash, suffix='.data')[source]¶
Returns the path on disk which corresponds to the hash given.
- beat.backend.python.hash.hash(dictionary_or_string)[source]¶
Generates a hash for the given parameter
- beat.backend.python.hash.hashJSON(contents, description)[source]¶
Hashes the pre-loaded JSON object using hashlib.hash.hexdigest(). Changes to the description field are excluded from the hash.
- Returns
hash
- Return type
- beat.backend.python.hash.hashJSONFile(path, description)[source]¶
Hashes the JSON file contents using hashlib.hash.hexdigest(). Changes to the description field are excluded from the hash.
- Returns
hash
- Return type
- beat.backend.python.hash.hashFileContents(path)[source]¶
Hashes the file contents using hashlib.hash.hexdigest().
- Returns
hash
- Return type
- beat.backend.python.hash.hashDataset(database_name, protocol_name, set_name)[source]¶
Hashes a Dataset
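The general idea behind these helpers can be sketched with the standard library alone (stable_hash and to_path are hypothetical stand-ins; the exact digest algorithm and path layout used by beat.backend.python.hash may differ):

```python
import hashlib
import json


def stable_hash(dictionary_or_string):
    """Hash a dict or string deterministically (sha256 here is an
    assumption; the platform may use a different digest)."""
    if isinstance(dictionary_or_string, dict):
        dictionary_or_string = json.dumps(dictionary_or_string, sort_keys=True)
    return hashlib.sha256(dictionary_or_string.encode("utf-8")).hexdigest()


def to_path(h, suffix=".data"):
    """Nest the hex digest into two directory levels, as toPath() does
    conceptually, to avoid very large flat directories."""
    return "/".join([h[:2], h[2:4], h[4:]]) + suffix
```

Sorting the keys before serializing makes the hash independent of dictionary insertion order.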
baseformat¶
Base type for all data formats
- beat.backend.python.baseformat.setup_scalar(formatname, attrname, dtype, value, casting, add_defaults)[source]¶
Casts the value to the scalar type defined by dtype
- Parameters
formatname (str) – The name of this dataformat (e.g. user/format/1). This value is only used for informational purposes
attrname (str) – The name of this attribute (e.g. value). This value is only used for informational purposes
dtype (numpy.dtype) – The datatype of every element on the array
value (file object, Optional) – A representation of the value. This object will be cast into a scalar with the dtype defined by the dtype parameter.
casting (str) – See numpy.can_cast() for a description of possible values for this field.
add_defaults (bool) – If we should use defaults for missing attributes. In case this value is set to True, missing attributes are set with defaults; otherwise, a TypeError is raised if a missing attribute is found.
- Returns
the scalar or its default representation, if no value is set.
- Return type
- beat.backend.python.baseformat.setup_array(formatname, attrname, shape, dtype, value, casting, add_defaults)[source]¶
Casts the value to the array type defined by (shape, dtype)
- Parameters
formatname (str) – The name of this dataformat (e.g. user/format/1). This value is only used for informational purposes
attrname (str) – The name of this attribute (e.g. value). This value is only used for informational purposes
shape (tuple) – The shape of the array
dtype (numpy.dtype) – The datatype of every element on the array
value (file object, Optional) – A representation of the value. This object will be cast into a numpy array with the dtype defined by the dtype parameter.
casting (str) – See numpy.can_cast() for a description of possible values for this field.
add_defaults (bool) – If we should use defaults for missing attributes. In case this value is set to True, missing attributes are set with defaults; otherwise, a TypeError is raised if a missing attribute is found.
- Returns
An array with the adequate dimensions. If a value is set, validates that value and returns it as a new numpy.ndarray.
- Return type
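The casting step both setup functions rely on can be sketched as follows (cast_scalar is a hypothetical illustration built on numpy.can_cast(), not the library's actual code):

```python
import numpy as np


def cast_scalar(dtype, value, casting="safe"):
    """Check that the cast is allowed under the requested casting rule,
    then perform it; raise TypeError otherwise."""
    if not np.can_cast(np.asarray(value).dtype, dtype, casting=casting):
        raise TypeError(
            "cannot cast %r to %s under %r casting"
            % (value, np.dtype(dtype).name, casting)
        )
    return np.dtype(dtype).type(value)
```

Under 'safe' casting, an integer widens to float64 without complaint, while a value that cannot be represented losslessly (e.g. 300 into int8) is rejected.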
- beat.backend.python.baseformat.pack_array(dtype, value, fd)[source]¶
Binary-encodes the array at value into the file descriptor fd
- Parameters
dtype (numpy.dtype) – The datatype of the array (taken from the format descriptor)
value (file object, Optional) – The numpy.ndarray representing the value to be encoded
fd (file object) – The file where to encode the input
- beat.backend.python.baseformat.pack_scalar(dtype, value, fd)[source]¶
Binary-encodes the scalar at value into the file descriptor fd
- Parameters
dtype (numpy.dtype) – The datatype of the scalar (taken from the format descriptor)
value (object, Optional) – An object representing the value to be encoded
fd (file object) – The file where to encode the input
- beat.backend.python.baseformat.read_some(format, fd)[source]¶
Reads some of the data from the file descriptor fd
- beat.backend.python.baseformat.read_string(fd)[source]¶
Reads the next string from the file descriptor fd
- beat.backend.python.baseformat.unpack_array(shape, dtype, fd)[source]¶
Unpacks the following data array.
Returns the unpacked array as a numpy.ndarray object. No checks are performed by this function, as we assume the binary stream matches the data type exactly.
- Parameters
shape (tuple) – The shape of the array
dtype (numpy.dtype) – The datatype of every element on the array
fd (file object) – The file to read the data from
- Returns
the unpacked array. Advances the readout of fd.
- Return type
- beat.backend.python.baseformat.unpack_scalar(dtype, fd)[source]¶
Unpacks the following scalar.
Returns the unpacked scalar. No checks are performed by this function, as we assume the binary stream matches the data type exactly.
- Parameters
dtype (numpy.dtype) – The datatype of the scalar
fd (file object) – The file to read the data from
- Returns
the unpacked scalar which, among other options, can be a numpy scalar (int8, float32, bool_, etc.) or a string (str). Advances readout of fd.
- Return type
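A minimal sketch of the binary scalar round-trip, using the standard struct module (the actual on-disk layout used by pack_scalar()/unpack_scalar() may differ):

```python
import io
import struct


def pack_scalar(fmt, value, fd):
    """Write one scalar in binary form; ``fmt`` is a struct code such as
    '<d' (little-endian float64) or '<i' (little-endian int32)."""
    fd.write(struct.pack(fmt, value))


def unpack_scalar(fmt, fd):
    """Read one scalar back; the stream is trusted to match the format,
    just as in the documented functions."""
    return struct.unpack(fmt, fd.read(struct.calcsize(fmt)))[0]
```

A round trip through an io.BytesIO buffer recovers the original values in write order.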
- class beat.backend.python.baseformat.baseformat(**kwargs)[source]¶
Bases:
object
All dataformats are represented, in Python, by a derived class of this one
Construction defaults to an unsafe data type conversion. For a safe converter, use baseformat.from_dict(), where you can optionally set the casting style (see numpy.can_cast() for details on the values this parameter can assume).
Parameters that are part of the declared type but are not set are filled in with defaults. As with the casting parameter, use baseformat.from_dict() to adjust this behaviour.
- from_dict(data, casting='safe', add_defaults=False)[source]¶
Same as initializing the object, but with a less strict type casting
Unlike the constructor, which uses an 'unsafe' conversion, this method defaults to 'safe' casting (see numpy.can_cast() for details on the values the casting parameter can assume).
- Parameters
data (dict, Optional) – A dictionary representing the data input, matching the keywords defined at the resolved format. A value of None, if passed, effectively results in the same as passing an empty dictionary {}.
casting (str) – See numpy.can_cast() for a description of possible values for this field. By default, it is set to 'safe'. Use the constructor to get the default 'unsafe' behaviour.
add_defaults (bool) – If we should use defaults for missing attributes. In case this value is set to True, missing attributes are set with defaults; otherwise, a TypeError is raised if a missing attribute is found.
- pack_into(fd)[source]¶
Creates a binary representation of this object into a file.
This method will make the object pickle itself on the file descriptor fd. If you'd like to write the contents into a string instead, wrap a six.BytesIO around fd.
- pack()[source]¶
Creates a binary representation of this object as a string. It uses baseformat.pack_into() to encode the string.
- unpack_from(fd)[source]¶
Loads a binary representation of this object
We don’t run any extra checks as an unpack operation is only supposed to be carried out once the type compatibility has been established.
- unpack(s)[source]¶
Loads a binary representation of this object from a string
Effectively, this method just calls baseformat.unpack_from() with a six.BytesIO wrapped around the input string.
- isclose(other, *args, **kwargs)[source]¶
Tests for closeness in the numerical sense.
Values such as integers, booleans and strings are checked for an exact match. Parameters with floating-point components such as 32-bit floats and complex values should be close enough given the input parameterization.
Parameters for floating-point checks are those of numpy.isclose(). Check its help page for more details.
- Returns
indicates if the other object is close enough to this one.
- Return type
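The closeness rule described above can be sketched with the standard library (formats_close is illustrative; the real method forwards its parameters to numpy.isclose()):

```python
import math


def formats_close(a, b, rel_tol=1e-09, abs_tol=0.0):
    """Exact comparison for integers, booleans and strings; numeric
    tolerance for floating-point values."""
    if isinstance(a, float) or isinstance(b, float):
        return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    return a == b
```

This makes 0.1 + 0.2 compare as close to 0.3 despite the rounding error, while integers and strings must match exactly.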
dataformat¶
Validation and parsing for dataformats
- class beat.backend.python.dataformat.Storage(prefix, name)[source]¶
Bases:
Storage
Resolves paths for dataformats
- Parameters
- asset_type = 'dataformat'¶
- asset_folder = 'dataformats'¶
- class beat.backend.python.dataformat.DataFormat(prefix, data, parent=None, dataformat_cache=None)[source]¶
Bases:
object
Data formats define the chunks of data that circulate between blocks.
- Parameters
prefix (str) – Establishes the prefix of your installation.
data (str, dict) – The fully qualified dataformat name (e.g. user/format/1) or a dictionary representing the data format (for analyzer results).
parent (tuple, Optional) – The parent DataFormat for this format. If set to None, this dataformat is the first one in the hierarchy tree. If set to a tuple, the contents are (format-instance, field-name), which indicates the originating object that is this object's parent and the name of the field on that object that points to this one.
dataformat_cache (dict, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed up data format loading times, as dataformats that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying dataformats change.
- parent¶
A pointer to the dataformat of which the current format is part. It is useful for internal error reporting.
- property name¶
Name of this object, either from the filename or composed from the hierarchy it belongs to.
- property schema_version¶
Returns the schema version
- property extends¶
If this dataformat extends another one, this is it, otherwise
None
- property type¶
Returns a new type that can create instances of this dataformat.
The new returned type provides a basis to construct new objects which represent the dataformat. It provides a simple JSON serializer and a for-screen representation.
Example
To create an object respecting the data format from a JSON descriptor, use the following technique:
ftype = dataformat(...).type
json = simplejson.loads(...)
newobj = ftype(**json)  # instantiates the new object, checks format
To dump the object into JSON, use the following technique:
simplejson.dumps(newobj.as_dict(), indent=4)
A string representation of the object uses the technique above to pretty-print the object contents to the screen.
- property valid¶
A boolean that indicates if this dataformat is valid or not
- property description¶
Short description string, loaded from the JSON file if one was set
- property documentation¶
The full-length description for this object
- validate(data)[source]¶
Validates a piece of data provided by the user
In order to validate, the data object must be complete and safe-castable to this dataformat. For any other validation operation that would require special settings, use instead the type() method to generate a valid type and use either from_dict, unpack or unpack_from depending on your use-case.
- Parameters
data (dict, str, file object) – This parameter represents the data to be validated. It may be a dictionary with the JSON representation of a data blob or, else, a binary blob (represented by either a string or a file descriptor object) from which the data will be read. If problems occur, an exception is raised.
- Returns
None; raises an exception if an error occurs.
- Return type
None
- isparent(other)[source]¶
Tells if the other object extends self (directly or indirectly).
- Parameters
other (DataFormat) – another object to check
- Returns
True if other is a parent of self, False otherwise.
- Return type
- write(storage=None)[source]¶
Writes contents to prefix location
- Parameters
storage (Storage, Optional) – If you pass a new storage, then this object will be written to that storage point rather than its default.
- export(prefix)[source]¶
Recursively exports itself into another prefix
Other required dataformats are also copied.
- Parameters
prefix (str) – Establishes the prefix of your installation.
- Returns
None
- Raises
RuntimeError – If prefix and self.prefix point to the same directory.
algorithm¶
Validation for algorithms
- class beat.backend.python.algorithm.Storage(prefix, name, language=None)[source]¶
Bases:
CodeStorage
Resolves paths for algorithms
- Parameters
- asset_type = 'algorithm'¶
- asset_folder = 'algorithms'¶
- class beat.backend.python.algorithm.Runner(module, obj_name, algorithm, exc=None)[source]¶
Bases:
object
A special loader class for algorithms, with specialized methods
- Parameters
module (module) – The preloaded module containing the algorithm as returned by loader.load_module().
obj_name (str) – The name of the object within the module you're interested in
algorithm (object) – The algorithm instance that is used for parameter checking.
exc (class) – The class to use as base exception when translating the exception from the user code. Read the documentation of loader.run() for more details.
- class beat.backend.python.algorithm.Algorithm(prefix, name, dataformat_cache=None, library_cache=None)[source]¶
Bases:
object
Algorithms represent runnable components within the platform.
This class can only parse the meta-parameters of the algorithm (i.e., input and output declarations, grouping, synchronization details, parameters and splittability). The actual algorithm code is not directly treated by this class. It can, however, provide you with a loader for actually running the algorithmic code (see Algorithm.runner()).
- Parameters
prefix (str) – Establishes the prefix of your installation.
name (str) – The fully qualified algorithm name (e.g. user/algo/1)
dataformat_cache (dict, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed up algorithm loading times, as dataformats that are already loaded may be re-used.
library_cache (dict, Optional) – A dictionary mapping library names to loaded libraries. This parameter is optional and, if passed, may greatly speed up library loading times, as libraries that are already loaded may be re-used.
- dataformats¶
A dictionary containing all pre-loaded dataformats used by this algorithm. Data format objects will be of type dataformat.DataFormat.
- Type
- libraries¶
A mapping object defining other libraries this algorithm needs to load so it can work properly.
- Type
- input_map¶
A dictionary where the key is the input name and the value its type. All input names (potentially from different groups) are included in this dictionary.
- Type
- output_map¶
A dictionary where the key is the output name and the value its type. All output names (potentially from different groups) are included in this dictionary.
- Type
- groups¶
A list containing dictionaries with inputs and outputs belonging to the same synchronization group.
- Type
- LEGACY = 'legacy'¶
- SEQUENTIAL = 'sequential'¶
- AUTONOMOUS = 'autonomous'¶
- SEQUENTIAL_LOOP_EVALUATOR = 'sequential_loop_evaluator'¶
- AUTONOMOUS_LOOP_EVALUATOR = 'autonomous_loop_evaluator'¶
- SEQUENTIAL_LOOP_PROCESSOR = 'sequential_loop_processor'¶
- AUTONOMOUS_LOOP_PROCESSOR = 'autonomous_loop_processor'¶
- dataformat_klass¶
alias of
DataFormat
- property name¶
The name of this object
- property schema_version¶
Returns the schema version
- property api_version¶
Returns the API version
- property type¶
Returns the type of algorithm
- property is_autonomous¶
Returns whether the algorithm is in the autonomous category
- property is_sequential¶
Returns whether the algorithm is in the sequential category
- property is_loop¶
- property language¶
Returns the current language set for the executable code
- clean_parameter(parameter, value)[source]¶
Checks a given value against a declared parameter
This method checks if the provided user value can be safe-cast to the parameter type as defined on its specification and that it conforms to any parameter-imposed restrictions.
- Parameters
- Returns
The converted value, with an appropriate numpy type.
- Raises
KeyError – If the parameter cannot be found on this algorithm’s declaration.
ValueError – If the parameter cannot be safely cast to the algorithm's type. Alternatively, a ValueError may also be raised if a range or choice was specified and the value does not obey the settings stipulated for the parameter.
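A hypothetical sketch of this safe-cast-and-restrict logic (the declaration layout and helper name are invented for illustration; they do not mirror beat's actual parameter schema):

```python
import numpy as np


def clean_parameter(declaration, value):
    """Safe-cast ``value`` to the declared type and enforce an optional
    choice list; raise ValueError when either check fails."""
    dtype = np.dtype(declaration["type"])
    if not np.can_cast(np.asarray(value).dtype, dtype, casting="safe"):
        raise ValueError("cannot safely cast %r to %s" % (value, dtype.name))
    cast = dtype.type(value)
    choices = declaration.get("choice")
    if choices is not None and cast not in choices:
        raise ValueError("%r is not a declared choice" % (cast,))
    return cast
```

The returned value always carries the declared numpy type, so downstream code never sees the user's raw Python value.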
- property valid¶
A boolean that indicates if this algorithm is valid or not
- property uses¶
Mapping object defining the required library import name (keys) and the full-names (values)
- property isAnalyzer¶
Returns whether this algorithm is an analyzer
- property results¶
The results of this algorithm (analyzer) as a dictionary
If this algorithm is actually an analyzer (i.e., there are no formal outputs, but results that must be saved by the platform), then this dictionary contains the names and data types of those elements.
- property parameters¶
Dictionary containing all pre-defined parameters that this algorithm accepts
- property splittable¶
Whether this algorithm can be split between several processes
- property description¶
The short description for this object
- property documentation¶
The full-length description for this object
- runner(klass='Algorithm', exc=None)[source]¶
Returns a runnable algorithm object.
- Parameters
- Returns
An instance of the algorithm, which will be constructed but not set up. You must set it up before using the process method.
- Return type
- write(storage=None)[source]¶
Writes contents to prefix location
- Parameters
storage (Storage, Optional) – If you pass a new storage, then this object will be written to that storage point rather than its default.
- export(prefix)[source]¶
Recursively exports itself into another prefix
Dataformats and associated libraries are also copied.
- Parameters
prefix (str) – A path to a prefix that must be different from this object's own.
- Returns
None
- Raises
RuntimeError – If prefix and self.prefix point to the same directory.
database¶
Validation of databases
- class beat.backend.python.database.Storage(prefix, name)[source]¶
Bases:
CodeStorage
Resolves paths for databases
- Parameters
- asset_type = 'database'¶
- asset_folder = 'databases'¶
- class beat.backend.python.database.Runner(module, definition, prefix, root_folder, exc=None)[source]¶
Bases:
object
A special loader class for database views, with specialized methods
- Parameters
db_name (str) – The full name of the database object for this view
module (module) – The preloaded module containing the database views as returned by loader.load_module().
prefix (str) – Establishes the prefix of your installation.
root_folder (str) – The path pointing to the root folder of this database
exc (class) – The class to use as base exception when translating the exception from the user code. Read the documentation of loader.run() for more details.
*args – Constructor parameters for the database view. Normally, none.
**kwargs – Constructor parameters for the database view. Normally, none.
- class beat.backend.python.database.Database(prefix, name, dataformat_cache=None)[source]¶
Bases:
object
Databases define the start point of the dataflow in an experiment.
- Parameters
prefix (str) – Establishes the prefix of your installation.
name (str) – The fully qualified database name (e.g. db/1)
dataformat_cache (dict, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed up database loading times, as dataformats that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying dataformats change.
- property name¶
The full (valid) name of this database
- property description¶
The short description for this object
- property documentation¶
The full-length description for this object
- property schema_version¶
Returns the schema version
- property valid¶
A boolean that indicates if this database is valid or not
- property environment¶
Returns the run environment if any has been set
- property protocols¶
The declaration of all the protocols of the database
- property protocol_names¶
Names of protocols declared for this database
- view(protocol, name, exc=None, root_folder=None)[source]¶
Returns the database view, given the protocol and the set name
- Parameters
- Returns
The database view, which will be constructed but not set up. You must set it up before using the done or next methods.
- write(storage=None)[source]¶
Writes contents to prefix location
- Parameters
storage (Storage, Optional) – If you pass a new storage, then this object will be written to that storage point rather than its default.
- export(prefix)[source]¶
Recursively exports itself into another prefix
Associated dataformats are also exported recursively.
- Parameters
prefix (str) – A path to a prefix that must be different from this object's own.
- Returns
None
- Raises
RuntimeError – If prefix and self.prefix point to the same directory.
- class beat.backend.python.database.View[source]¶
Bases:
object
- index(root_folder, parameters)[source]¶
Returns a list of (named) tuples describing the data provided by the view.
The ordering of values inside the tuples is free, but it is expected that the list is ordered in a consistent manner (i.e., all train images of person A, then all train images of person B, …).
For instance, assuming a view providing that kind of data:
-----------   -----------   -----------   -----------   -----------   -----------
|  image  |   |  image  |   |  image  |   |  image  |   |  image  |   |  image  |
-----------   -----------   -----------   -----------   -----------   -----------
-----------   -----------   -----------   -----------   -----------   -----------
| file_id |   | file_id |   | file_id |   | file_id |   | file_id |   | file_id |
-----------   -----------   -----------   -----------   -----------   -----------
-----------------------------------------   -----------------------------------------
|               client_id               |   |               client_id               |
-----------------------------------------   -----------------------------------------
a list like the following should be generated:
[
    (client_id=1, file_id=1, image=filename1),
    (client_id=1, file_id=2, image=filename2),
    (client_id=1, file_id=3, image=filename3),
    (client_id=2, file_id=4, image=filename4),
    (client_id=2, file_id=5, image=filename5),
    (client_id=2, file_id=6, image=filename6),
    ...
]
Warning
DO NOT store images, sound files or data loadable from a file in the list! Store the path of the file to load instead.
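A minimal sketch of an index() implementation honouring these rules (field names and paths are invented for the example):

```python
from collections import namedtuple

Entry = namedtuple("Entry", ["client_id", "file_id", "image"])


def index(root_folder, parameters):
    """Emit one named tuple per item, grouped consistently by client,
    storing file paths rather than the file contents."""
    entries = []
    file_id = 0
    for client_id in (1, 2):
        for _ in range(3):
            file_id += 1
            entries.append(
                Entry(client_id, file_id,
                      "%s/img_%03d.png" % (root_folder, file_id))
            )
    return entries
```

Note that only the path of each image is stored, in line with the warning above.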
- class beat.backend.python.database.DatabaseTester(name, view_class, outputs_declaration, parameters, irregular_outputs=[], all_combinations=True)[source]¶
Bases:
object
Used while developing a new database view, to test its behavior
This class tests that, for each combination of connected/not connected outputs:
Data indices seem consistent
All the connected outputs produce data
None of the unconnected outputs produce data
It also reports some statistics, and can generate a text file detailing the data generated by each output.
By default, outputs are assumed to produce data at constant intervals. Those that don't follow this pattern must be declared as 'irregular'.
Note that no particular check is done about the database declaration or the correctness of the generated data with their data formats. This class is mainly used to check that the outputs are correctly synchronized.
data¶
Data I/O classes and functions
- beat.backend.python.data.mixDataIndices(list_of_data_indices)[source]¶
Given a collection of lists of data indices (belonging to separate but synchronized files/inputs), returns the most granular list of indices that span all the data
For example, the mix of [(0, 2), (3, 4)] and [(0, 4)] is:
[(0, 2), (3, 4)]
The mix of [(0, 2), (3, 4)] and [(0, 1), (2, 3), (4, 4)] is:
[(0, 1), (2, 2), (3, 3), (4, 4)]
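The mixing rule illustrated above amounts to cutting the full span at the union of all start boundaries; a hypothetical re-implementation (not beat's actual code):

```python
def mix_data_indices(list_of_data_indices):
    """Cut the full span at every start boundary appearing in any input
    list, producing the most granular set of inclusive ranges."""
    starts = sorted({s for ranges in list_of_data_indices for (s, _) in ranges})
    last = max(e for ranges in list_of_data_indices for (_, e) in ranges)
    bounds = starts + [last + 1]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(len(starts))]
```

Applied to the two examples above, it reproduces the documented results.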
- beat.backend.python.data.getAllFilenames(filename, start_index=None, end_index=None)[source]¶
Returns the names of all the files related to the given data file, taking the provided start and end indices into account.
- Parameters
- Returns
(data_filenames, indices_filenames, data_checksum_filenames, indices_checksum_filenames)
- class beat.backend.python.data.DataSource[source]¶
Bases:
object
Base class to load data from some source
- class beat.backend.python.data.FileInfos(file_index, start_index, end_index, offset, size)¶
Bases:
tuple
- end_index¶
Alias for field number 2
- file_index¶
Alias for field number 0
- offset¶
Alias for field number 3
- size¶
Alias for field number 4
- start_index¶
Alias for field number 1
- class beat.backend.python.data.CachedDataSource[source]¶
Bases:
DataSource
Utility class to load data from a file in the cache
- setup(filename, prefix, start_index=None, end_index=None, unpack=True)[source]¶
Configures the data source
- Parameters
filename (str) – Name of the file to read the data from
prefix (str) – Establishes the prefix of your installation.
start_index (int) – The starting index (if not set or set to None, the default, reads data from the beginning of the file)
end_index (int) – The end index (if not set or set to None, the default, reads the data until the end)
unpack (bool) – Indicates if the data must be unpacked or not
- Returns
True if successful, False otherwise.
- class beat.backend.python.data.DatabaseOutputDataSource[source]¶
Bases:
DataSource
Utility class to load data from an output of a database view
- setup(view, output_name, dataformat_name, prefix, start_index=None, end_index=None, pack=False)[source]¶
Configures the data source
- Parameters
prefix (str) – Establishes the prefix of your installation.
start_index (int) – The starting index (if not set or set to None, the default, reads data from the beginning of the file)
end_index (int) – The end index (if not set or set to None, the default, reads the data until the end)
unpack (bool) – Indicates if the data must be unpacked or not
- Returns
True if successful, False otherwise.
- class beat.backend.python.data.RemoteDataSource[source]¶
Bases:
DataSource
Utility class to load data from a data source accessible via a socket
- setup(socket, input_name, dataformat_name, prefix, unpack=True)[source]¶
Configures the data source
- Parameters
socket (zmq.Socket) – The socket to use to access the data.
input_name (str) – Name of the input corresponding to the data source.
dataformat_name (str) – Name of the data format.
prefix (str) – Establishes the prefix of your installation.
unpack (bool) – Indicates if the data must be unpacked or not
- Returns
True if successful, False otherwise.
- class beat.backend.python.data.DataSink[source]¶
Bases:
object
Interface of all the Data Sinks
Data Sinks are used by the outputs of an algorithm to write/transmit data.
- abstract write(data, start_data_index, end_data_index)[source]¶
Writes a block of data
- Parameters
data (baseformat.baseformat) – The block of data to write
start_data_index (int) – Start index of the written data
end_data_index (int) – End index of the written data
- class beat.backend.python.data.StdoutDataSink[source]¶
Bases:
DataSink
Data Sink that prints information about the written data on stdout
Note: The written data is lost! Use this class for debugging purposes only.
- write(data, start_data_index, end_data_index)[source]¶
Writes a block of data
- Parameters
data (baseformat.baseformat) –
start_data_index (int) – Start index of the written data
end_data_index (int) – End index of the written data
- class beat.backend.python.data.CachedDataSink[source]¶
Bases:
DataSink
Data Sink that saves data in the cache
The default behavior is to save the data in a binary format.
- setup(filename, dataformat, start_index, end_index, encoding='binary')[source]¶
Configures the data sink
- Parameters
filename (str) – Name of the file to generate
dataformat (dataformat.DataFormat) – The dataformat to be used inside this file. All objects stored inside this file will respect that format.
encoding (str) – String defining the encoding to be used for the data. Only a few options are supported: binary (the default) or json (for debugging purposes).
- write(data, start_data_index, end_data_index)[source]¶
Writes a block of data to the filesystem
- Parameters
data (baseformat.baseformat) – The block of data to write
start_data_index (int) – Start index of the written data
end_data_index (int) – End index of the written data
- beat.backend.python.data.load_data_index(cache_root, hash_path)[source]¶
Loads a cached-data index if it exists. Returns empty otherwise.
- Parameters
cache_root (str) – The path to the root of the cache directory
hash_path (str) – The hashed path of the input you wish to load the indexes for, as returned by the utility function hash.toPath().
- Returns
A list, which will be empty if the index file is not present. Note that, given the current design, an empty list means an error condition.
- beat.backend.python.data.foundSplitRanges(lst, n_split)[source]¶
Splits a list of lists of indices into n splits for parallelization purposes.
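As a rough illustration of splitting work for parallelization (this simplified sketch just chunks one flat list into near-equal parts; the real foundSplitRanges() operates on lists of index lists):

```python
def split_ranges(indices, n_split):
    """Chunk ``indices`` into ``n_split`` contiguous, near-equal parts."""
    size, extra = divmod(len(indices), n_split)
    out, pos = [], 0
    for i in range(n_split):
        step = size + (1 if i < extra else 0)
        if step:
            out.append(indices[pos:pos + step])
        pos += step
    return out
```

Earlier chunks absorb the remainder, so chunk sizes differ by at most one.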
data_loaders¶
This module implements all the data communication related classes
- class beat.backend.python.data_loaders.DataView(data_loader, data_indices)[source]¶
Bases:
object
Provides access to a subset of data from a group of inputs synchronized together
Data views are created from a data loader (see DataLoader), which is provided to the algorithms of types 'sequential' and 'autonomous' (see DataLoaderList).
Example
view = data_loader.view('input1', 0)
for i in range(view.count()):
    (data, start_index, end_index) = view[i]
- Parameters
data_loader (DataLoader) – The data loader this view is created from
data_indices (list) – Data indices to consider, as a list of tuples
- data_index_start¶
Lower data index across all inputs (see the section Inputs synchronization of the User’s Guide)
- Type
- data_index_end¶
Biggest data index across all inputs (see the section Inputs synchronization of the User's Guide)
- Type
- class beat.backend.python.data_loaders.DataLoader(channel)[source]¶
Bases:
object
Provides access to data from a group of inputs synchronized together
Data loaders are provided to the algorithms of types 'sequential' and 'autonomous' (see DataLoaderList).
Example
# Iterate through all the data
for i in range(data_loader.count()):
    (data, start_index, end_index) = data_loader[i]
    print(data['input1'].data)

# Restrict to a subset of the data
view = data_loader.view('input1', 0)
for i in range(view.count()):
    (data, start_index, end_index) = view[i]
- Parameters
channel (str) – Name of the data channel of the group of inputs
- data_index_start¶
Lowest data index across all inputs (see the section Inputs synchronization of the User’s Guide)
- Type
- data_index_end¶
Highest data index across all inputs (see the section Inputs synchronization of the User’s Guide)
- Type
- class beat.backend.python.data_loaders.DataLoaderList[source]¶
Bases:
object
Represents a list of data loaders
Inputs are organized by groups. The inputs inside a group are all synchronized together (see the section Inputs synchronization of the User’s Guide). A data loader provides access to data from a group of inputs.
A list implementing this interface is provided to the algorithms of types ‘sequential’ and ‘autonomous’.
One group of inputs is always considered as the main one, and is used to drive the algorithm. The usage of the other groups is left to the algorithm.
See
DataLoader

Example

data_loaders = DataLoaderList()
...

# Retrieve a data loader by name
data_loader = data_loaders['labels']

# Retrieve a data loader by index
for index in range(0, len(data_loaders)):
    data_loader = data_loaders[index]

# Iteration over all data loaders
for data_loader in data_loaders:
    ...

# Retrieve the data loader an input belongs to, by input name
data_loader = data_loaders.loaderOf('label')
- main_loader¶
Main data loader
- Type
- add(data_loader)[source]¶
Add a data loader to the list
- Parameters
data_loader (DataLoader) – The data loader to add
Database execution¶
Execution utilities
- class beat.backend.python.execution.database.DBExecutor(message_handler, prefix, cache_root, data, dataformat_cache=None, database_cache=None)[source]¶
Bases:
object
Executor specialised in database views
- Parameters
prefix (str) – Establishes the prefix of your installation.
data (dict, str) – The piece of data representing the block to be executed. It must validate against the schema defined for execution blocks. If a string is passed, it is supposed to be a fully qualified absolute path to a JSON file containing the block execution information.
dataformat_cache (
dict
, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed-up database loading times as dataformats that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying dataformats change.

database_cache (
dict
, Optional) – A dictionary mapping database names to loaded databases. This parameter is optional and, if passed, may greatly speed-up database loading times as databases that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying databases change.
- databases¶
A dictionary in which keys are strings with database names and values are
database.Database
, representing the databases required for running this block. The dictionary may be empty in case all inputs are taken from the file cache.- Type
- views¶
A dictionary in which the keys are tuples pointing to the
(<database-name>, <protocol>, <set>)
and the value is a setup view for that particular combination of details. The dictionary may be empty in case all inputs are taken from the file cache.
- Type
- input_list¶
A list of inputs that will be served to the algorithm.
- Type
- property address¶
Address of the message handler
- property valid¶
A boolean that indicates if this executor is valid or not
Algorithm executor¶
A class that can setup and execute algorithm blocks on the backend
- class beat.backend.python.execution.algorithm.AlgorithmExecutor(socket, directory, dataformat_cache=None, database_cache=None, library_cache=None, cache_root='/cache', db_socket=None, loop_socket=None)[source]¶
Bases:
object
Executor that runs the code described by an execution block's information
- Parameters
socket (zmq.Socket) – A pre-connected socket to send and receive messages from.
directory (str) – The path to a directory containing all the information required to run the user experiment.
dataformat_cache (
dict
, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed-up database loading times as dataformats that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying dataformats change.

database_cache (
dict
, Optional) – A dictionary mapping database names to loaded databases. This parameter is optional and, if passed, may greatly speed-up database loading times as databases that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying databases change.

library_cache (
dict
, Optional) – A dictionary mapping library names to loaded libraries. This parameter is optional and, if passed, may greatly speed-up library loading times as libraries that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying libraries change.
- property runner¶
Returns the algorithm runner
This property allows for lazy loading of the runner
- property schema_version¶
Returns the schema version
- property analysis¶
A boolean that indicates if the current block is an analysis block
executor¶
A class that can setup and execute loop algorithm blocks on the backend
- class beat.backend.python.execution.loop.LoopChannel(socket)[source]¶
Bases:
object
The LoopChannel class is a direct communication link between a loop-using algorithm and the loop itself
- setup(algorithm, prefix)[source]¶
Setup the channel internals
- Parameters
algorithm (
algorithm.Algorithm
) – Algorithm for which the communication channel is set up.

prefix (str) – Folder where the prefix is located.
- class beat.backend.python.execution.loop.LoopExecutor(message_handler, directory, dataformat_cache=None, database_cache=None, library_cache=None, cache_root='/cache', db_socket=None)[source]¶
Bases:
object
Executor that runs the code described by an execution block's information
- Parameters
socket (zmq.Socket) – A pre-connected socket to send and receive messages from.
directory (str) – The path to a directory containing all the information required to run the user experiment.
dataformat_cache (
dict
, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed-up database loading times as dataformats that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying dataformats change.

database_cache (
dict
, Optional) – A dictionary mapping database names to loaded databases. This parameter is optional and, if passed, may greatly speed-up database loading times as databases that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying databases change.

library_cache (
dict
, Optional) – A dictionary mapping library names to loaded libraries. This parameter is optional and, if passed, may greatly speed-up library loading times as libraries that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying libraries change.
- property runner¶
Returns the algorithm runner
This property allows for lazy loading of the runner
- property address¶
Address of the message handler
- property valid¶
A boolean that indicates if this executor is valid or not
Message handlers¶
This module implements a message handler that is in charge of ZeroMQ communication.
- class beat.backend.python.execution.messagehandlers.MessageHandler(host_address, data_sources=None, kill_callback=None, context=None)[source]¶
Bases:
Thread
A 0MQ message handler for our communication with other processes
- run()[source]¶
Method representing the thread’s activity.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
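MessageHandler overrides run() with its message loop; the Thread mechanics it relies on are the standard ones described above. A self-contained illustration of the override pattern (unrelated to BEAT itself):

```python
import threading

class Worker(threading.Thread):
    # Minimal illustration of the pattern MessageHandler follows:
    # subclass Thread and override run() with the thread's activity.
    def __init__(self):
        super().__init__()
        self.results = []

    def run(self):
        # This body executes in the new thread once start() is called
        self.results.append("done")

worker = Worker()
worker.start()   # spawns the thread, which invokes run()
worker.join()    # wait for run() to finish
```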
- class beat.backend.python.execution.messagehandlers.LoopMessageHandler(host_address, data_sources=None, kill_callback=None, context=None)[source]¶
Bases:
MessageHandler
Custom message handler that handles validation requests from a loop-using algorithm
- setup(algorithm, prefix)[source]¶
Setup the loop internals
- Parameters
algorithm (
algorithm.Algorithm
) – Algorithm for which the communication channel is set up.

prefix (str) – Folder where the prefix is located.
- set_executor(executor)[source]¶
Set the executor for validation
- Parameters
executor (
loop.LoopExecutor
) – Loop executor
- validate(result)[source]¶
Validates the received result and sends back a boolean answer about its validity, as well as additional data for the loop-using algorithm to process
Syntax: val
- Parameters
result (
beat.backend.python.dataformat.DataFormat
) – Result to be validated.
helpers¶
This module implements various helper methods and classes
- class beat.backend.python.helpers.AccessMode[source]¶
Bases:
object
Possible access modes
- NONE = 0¶
- LOCAL = 1¶
- REMOTE = 2¶
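These constants behave like a plain integer enumeration. The sketch below mirrors the declared values; the usage comments are assumptions based on the create_inputs_from_configuration signature, not taken from the source:

```python
class AccessMode:
    # Mirrors the constants declared above; plain integers rather
    # than an enum.Enum
    NONE = 0
    LOCAL = 1
    REMOTE = 2

# Assumed usage: passed as the cache_access/db_access arguments of
# create_inputs_from_configuration() (both default to 0, i.e. NONE)
cache_access = AccessMode.LOCAL
db_access = AccessMode.NONE
```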
- beat.backend.python.helpers.create_inputs_from_configuration(config, algorithm, prefix, cache_root, cache_access=0, db_access=0, unpack=True, socket=None, databases=None, no_synchronisation_listeners=False)[source]¶
- beat.backend.python.helpers.create_outputs_from_configuration(config, algorithm, prefix, cache_root, input_list=None, data_loaders=None, loop_socket=None)[source]¶
inputs¶
This module implements input related classes
- beat.backend.python.inputs.first(iterable, default=None)[source]¶
Get the first item of an iterable, or the default value if the iterable is empty
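Its behaviour is equivalent to the standard next() idiom, sketched here for clarity (an equivalent, not necessarily the actual implementation):

```python
def first(iterable, default=None):
    # Equivalent sketch: return the first item of `iterable`,
    # or `default` when the iterable is empty
    return next(iter(iterable), default)

print(first([4, 5, 6]))        # 4
print(first([], default=-1))   # -1
```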
- class beat.backend.python.inputs.Input(name, data_format, data_source)[source]¶
Bases:
object
Represents an input of a processing block that receives data from a (legacy) data source
A list of those inputs must be provided to the algorithms (see
InputList
)

- group¶
Group containing this input
- Type
- data¶
The last block of data received on the input
- data_index¶
Index of the last block of data received on the input (see the section Inputs synchronization of the User’s Guide)
- Type
- data_index_end¶
End index of the last block of data received on the input (see the section Inputs synchronization of the User’s Guide)
- Type
- data_same_as_previous¶
Indicates whether the last block of data received is the same as the previous one (see the section Inputs synchronization of the User’s Guide)
- Type
- class beat.backend.python.inputs.InputGroup(channel, synchronization_listener=None, restricted_access=True)[source]¶
Bases:
object
Represents a group of inputs synchronized together
A group implementing this interface is provided to the algorithms (see
InputList
).

See
Input

Example

inputs = InputList()

print(inputs['labels'].data_format)

for index in range(0, len(inputs)):
    print(inputs[index].data_format)

for input in inputs:
    print(input.data_format)

for input in inputs[0:2]:
    print(input.data_format)
- Parameters
channel (str) – Name of the data channel of the group
synchronization_listener (outputs.SynchronizationListener) – Synchronization listener to use
restricted_access (bool) – Indicates if the algorithm can freely use the inputs
- data_index¶
Index of the last block of data received on the inputs (see the section Inputs synchronization of the User’s Guide)
- Type
- data_index_end¶
End index of the last block of data received on the inputs (see the section Inputs synchronization of the User’s Guide)
- Type
- synchronization_listener¶
Synchronization listener used
- class beat.backend.python.inputs.InputList[source]¶
Bases:
object
Represents the list of inputs of a processing block
Inputs are organized by groups. The inputs inside a group are all synchronized together (see the section Inputs synchronization of the User’s Guide).
A list implementing this interface is provided to the algorithms
One group of inputs is always considered as the main one, and is used to drive the algorithm. The usage of the other groups is left to the algorithm.
See
Input

See
InputGroup

Example

inputs = InputList()
...

# Retrieve an input by name
input = inputs['labels']

# Retrieve an input by index
for index in range(0, len(inputs)):
    input = inputs[index]

# Iteration over all inputs
for input in inputs:
    ...

# Iteration over some inputs
for input in inputs[0:2]:
    ...

# Retrieve the group an input belongs to, by input name
group = inputs.groupOf('label')

# Retrieve the group an input belongs to
input = inputs['labels']
group = input.group
- main_group¶
Main group (for data-driven algorithms)
- Type
- add(group)[source]¶
Add a group to the list
- Parameters
group (InputGroup) – The group to add
library¶
Validation for libraries
- class beat.backend.python.library.Storage(prefix, name, language=None)[source]¶
Bases:
CodeStorage
Resolves paths for libraries
- Parameters
- asset_type = 'library'¶
- asset_folder = 'libraries'¶
- class beat.backend.python.library.Library(prefix, name, library_cache=None)[source]¶
Bases:
object
Libraries represent independent algorithm components within the platform.
This class can only parse the meta-parameters of the library. The actual library is not directly treated by this class - only by the associated algorithms.
- Parameters
prefix (str) – Establishes the prefix of your installation.
name (str) – The fully qualified library name (e.g.
user/algo/1
)

library_cache (
dict
, Optional) – A dictionary mapping library names to loaded libraries. This parameter is optional and, if passed, may greatly speed-up library loading times as libraries that are already loaded may be re-used.
- libraries¶
A mapping object defining other libraries this library needs to load so it can work properly.
- Type
- load()[source]¶
Loads the Python module for this library resolving all references
Returns the loaded Python module.
- property name¶
The name of this object
- property schema_version¶
Returns the schema version
- property language¶
Returns the current language set for the library code
- property valid¶
A boolean that indicates if this library is valid or not
- property uses¶
Mapping object defining the required library import name (keys) and the full-names (values)
- property description¶
The short description for this object
- property documentation¶
The full-length description for this object
- write(storage=None)[source]¶
Writes contents to prefix location.
- Parameters
storage (
Storage
, Optional) – If you pass a new storage, then this object will be written to that storage point rather than its default.
- export(prefix)[source]¶
Recursively exports itself into another prefix
Other required libraries are also copied.
- Parameters
prefix (str) – Establishes the prefix of your installation.
- Returns
None
- Raises
RuntimeError – If prefix and self.prefix point to the same directory.
outputs¶
This module implements output related classes
- class beat.backend.python.outputs.SynchronizationListener[source]¶
Bases:
object
A callback mechanism to keep Inputs and Outputs in groups and lists synchronized together.
- class beat.backend.python.outputs.Output(name, data_sink, synchronization_listener=None, force_start_index=0)[source]¶
Bases:
object
Represents one output of a processing block
A list of outputs implementing this interface is provided to the algorithms (see
OutputList
).- Parameters
name (str) – Name of the output
data_sink (data.DataSink) – Sink of data to be used by the output, pre-configured with the correct data format.
- data_sink¶
Sink of data used by the output
- Type
- write(data, end_data_index=None)[source]¶
Write a block of data on the output
- Parameters
data (baseformat.baseformat) – The block of data to write, or None (if the algorithm doesn’t want to write any data)
end_data_index (int) – Last index of the written data (see the section Inputs synchronization of the User’s Guide). If not specified, the current end data index of the Inputs List is used
- class beat.backend.python.outputs.RemotelySyncedOutput(name, data_sink, socket, synchronization_listener=None, force_start_index=0)[source]¶
Bases:
Output
- write(data, end_data_index=None)[source]¶
Write a block of data on the output
- Parameters
data (baseformat.baseformat) – The block of data to write, or None (if the algorithm doesn’t want to write any data)
end_data_index (int) – Last index of the written data (see the section Inputs synchronization of the User’s Guide). If not specified, the current end data index of the Inputs List is used
- class beat.backend.python.outputs.OutputList[source]¶
Bases:
object
Represents the list of outputs of a processing block
A list implementing this interface is provided to the algorithms
See
Output
.Example
outputs = OutputList()
...

print(outputs['result'].data_format)

for index in six.moves.range(0, len(outputs)):
    outputs[index].write(...)

for output in outputs:
    output.write(...)

for output in outputs[0:2]:
    output.write(...)
stats¶
This module implements statistical related helper functions.
- beat.backend.python.stats.io_statistics(configuration, input_list=None, output_list=None)[source]¶
Summarize current I/O statistics looking at data sources and sinks, inputs and outputs
- Parameters
configuration (dict) – Executor configuration
input_list (inputs.InputList) – List of input to gather statistics from
output_list (outputs.OutputList) – List of outputs to gather statistics from
- Returns
A dictionary summarizing current I/O statistics
- Return type
- beat.backend.python.stats.update(statistics, additional_statistics)[source]¶
Updates the content of the statistics parameter with additional data. No new entries are created; only values already present in statistics are updated.
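A plausible sketch of that contract, assuming the statistics dictionaries nest and hold numeric counters (the structure shown is illustrative, not taken from the source):

```python
def update(statistics, additional_statistics):
    # Sketch of the described contract (assumed, not the actual
    # implementation): accumulate values only for keys that already
    # exist in `statistics`; never create new entries
    for key, value in additional_statistics.items():
        if key not in statistics:
            continue  # no new entries are created
        if isinstance(value, dict):
            update(statistics[key], value)
        else:
            statistics[key] += value

# Hypothetical statistics layout, for illustration only
stats = {"data": {"bytes_read": 10}, "cpu": 1.5}
update(stats, {"data": {"bytes_read": 5, "files": 2}, "cpu": 0.5})
# "files" is ignored because it was not already present in stats
```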
utils¶
This module implements helper classes and functions.
- beat.backend.python.utils.hashed_or_simple(prefix, what, path, suffix='.json')[source]¶
Returns a hashed path or simple path depending on where the resource is
- beat.backend.python.utils.safe_rmdir(f)[source]¶
Safely removes the directory containing a given file from the disk
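The docstring suggests the following behaviour, sketched here under that assumption (not the verified implementation):

```python
import os
import shutil

def safe_rmdir(f):
    # Assumed behaviour, sketched from the docstring: remove the
    # directory that contains the file `f`, doing nothing when that
    # directory does not exist
    d = os.path.dirname(f)
    if os.path.isdir(d):
        shutil.rmtree(d)
```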
- beat.backend.python.utils.extension_for_language(language)[source]¶
Returns the preferred extension for a given programming language
The set of languages supported must match those declared in our
common.json
schema.
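Such a lookup typically boils down to a small mapping; the sketch below is illustrative only, since the authoritative list of languages and extensions is the one declared in the common.json schema:

```python
def extension_for_language(language):
    # Illustrative mapping only -- the authoritative set of supported
    # languages lives in the common.json schema
    extensions = {
        "python": ".py",
    }
    try:
        return extensions[language]
    except KeyError:
        raise RuntimeError("unsupported language: %s" % language)
```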
- class beat.backend.python.utils.File(path, binary=False)[source]¶
Bases:
object
User helper to read and write file objects
- class beat.backend.python.utils.AbstractStorage(path)[source]¶
Bases:
object
- asset_type = None¶
- asset_folder = None¶
- class beat.backend.python.utils.Storage(path)[source]¶
Bases:
AbstractStorage
Resolves paths for objects that provide only a description
- class beat.backend.python.utils.CodeStorage(path, language=None)[source]¶
Bases:
AbstractStorage
Resolves paths for objects that provide a description and code
- Parameters
language (str) – One of the valid programming languages
- property language¶
- class beat.backend.python.utils.NumpyJSONEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=False, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None, use_decimal=True, namedtuple_as_object=True, tuple_as_array=True, bigint_as_string=False, item_sort_key=None, for_json=False, ignore_nan=False, int_as_string_bitcount=None, iterable_as_array=False)[source]¶
Bases:
JSONEncoder
Encodes numpy arrays and scalars
See also
- default(obj)[source]¶
Implement this method in a subclass such that it returns a serializable object for
o
, or calls the base implementation (to raise a
TypeError
).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    return JSONEncoder.default(self, o)