API

This section includes information for using the Python API of beat.backend.python.
loader

This module implements a simple loader for Python code, as well as a safe executor. Safe, in this context, means that if the method raises an exception, it is caught and returned in a suitable form to the caller.

beat.backend.python.loader.load_module(name, path, uses)

Loads the Python file as a module and returns a proper Python module.

Parameters:
- name (str) – The name of the Python module to create. Must be a valid Python symbol name.
- path (str) – The full path of the Python file to load the module contents from.
- uses (dict) – A mapping which indicates the name of each library to load (as a module for the current library) and the full-path and use mappings of such modules.

Returns: A valid Python module you can use in an Algorithm or Library.
beat.backend.python.loader.run(obj, method, exc=None, *args, **kwargs)

Runs a method on the object and protects its execution. In case an exception is raised, it is caught and transformed into the exception class the user passed.

Parameters:
- obj (object) – The Python object on which to execute the method.
- method (str) – The name of the method to execute on the object.
- exc (class, Optional) – The class to use as the base exception when translating the exception from the user code. If you set it to None, the user-raised exception is simply re-raised.
- *args – Arguments to the object method, passed unchanged.
- **kwargs – Arguments to the object method, passed unchanged.

Returns: Whatever obj.method() is bound to return.
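The exception-translation contract above can be sketched as follows. This is an illustrative reimplementation of the described behaviour, not the actual loader.run code; the UserCode and PlatformError names are invented for the example.

```python
def run(obj, method, exc=None, *args, **kwargs):
    # Execute obj.method(*args, **kwargs), translating any exception into
    # the caller-supplied class (or re-raising it unchanged when exc is None).
    try:
        return getattr(obj, method)(*args, **kwargs)
    except Exception as e:
        if exc is None:
            raise
        raise exc(str(e))

class UserCode:
    def work(self, x):
        if x < 0:
            raise ValueError("negative input")
        return x * 2

class PlatformError(Exception):
    pass
```

With these definitions, run(UserCode(), "work", PlatformError, -1) surfaces as a PlatformError instead of the user's ValueError.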
hash

Various functions for hashing platform contributions and other objects.

beat.backend.python.hash.toPath(hash, suffix='.data')

Returns the path on disk which corresponds to the given hash.

Returns: Path to the file based on the hash.
Return type: str
beat.backend.python.hash.toUserPath(username)

Returns the path to the user-specific folder.

Parameters: username (str) – User name to get the path from.
Returns: Path on the file system for the user.
Return type: str
beat.backend.python.hash.hash(dictionary_or_string)

Generates a hash for the given parameter.

Parameters: dictionary_or_string (str or dict) – Input to hash.
Returns: Hash from the input.
Return type: str
beat.backend.python.hash.hashJSON(contents, description)

Hashes the pre-loaded JSON object using hashlib.hash.hexdigest(), excluding changes to the description.

Returns: hash
Return type: str
beat.backend.python.hash.hashJSONFile(path, description)

Hashes the JSON file contents using hashlib.hash.hexdigest(), excluding changes to the description.

Returns: hash
Return type: str
beat.backend.python.hash.hashFileContents(path)

Hashes the file contents using hashlib.hash.hexdigest().

Returns: hash
Return type: str
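As a rough illustration of how such content hashing typically works, the sketch below hashes a string or dictionary deterministically. The choice of SHA-256 and of sorted-key JSON serialization here is an assumption made for demonstration, not necessarily the library's actual scheme.

```python
import hashlib
import json

def hash_contents(dictionary_or_string):
    # Serialize dictionaries with sorted keys so that key order does not
    # change the resulting digest; strings are hashed as-is.
    if isinstance(dictionary_or_string, dict):
        payload = json.dumps(dictionary_or_string, sort_keys=True)
    else:
        payload = dictionary_or_string
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The sorted-key serialization is what makes two dictionaries with the same content but different insertion orders hash identically.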
baseformat

Base type for all data formats.

beat.backend.python.baseformat.setup_scalar(formatname, attrname, dtype, value, casting, add_defaults)

Casts the value to the scalar type defined by dtype.

Parameters:
- formatname (str) – The name of this dataformat (e.g. user/format/1). This value is only used for informational purposes.
- attrname (str) – The name of this attribute (e.g. value). This value is only used for informational purposes.
- dtype (numpy.dtype) – The data type of the scalar.
- value (file object, Optional) – A representation of the value. This object will be cast into a scalar with the dtype defined by the dtype parameter.
- casting (str) – See numpy.can_cast() for a description of possible values for this field.
- add_defaults (bool) – Whether to use defaults for missing attributes. If set to True, missing attributes are set with defaults; otherwise, a TypeError is raised if a missing attribute is found.

Returns: The scalar or its default representation, if no value is set.
beat.backend.python.baseformat.setup_array(formatname, attrname, shape, dtype, value, casting, add_defaults)

Casts the value to the array type defined by (shape, dtype).

Parameters:
- formatname (str) – The name of this dataformat (e.g. user/format/1). This value is only used for informational purposes.
- attrname (str) – The name of this attribute (e.g. value). This value is only used for informational purposes.
- shape (tuple) – The shape of the array.
- dtype (numpy.dtype) – The data type of every element in the array.
- value (file object, Optional) – A representation of the value. This object will be cast into a numpy array with the dtype defined by the dtype parameter.
- casting (str) – See numpy.can_cast() for a description of possible values for this field.
- add_defaults (bool) – Whether to use defaults for missing attributes. If set to True, missing attributes are set with defaults; otherwise, a TypeError is raised if a missing attribute is found.

Returns: A numpy.ndarray with the adequate dimensions. If a value is set, validates that value and returns it as a new numpy.ndarray.
beat.backend.python.baseformat.pack_array(dtype, value, fd)

Binary-encodes the array at value into the file descriptor fd.

Parameters:
- dtype (numpy.dtype) – The data type of the array (taken from the format descriptor).
- value (numpy.ndarray, Optional) – The array representing the value to be encoded.
- fd (file object) – The file into which the input is encoded.
beat.backend.python.baseformat.pack_scalar(dtype, value, fd)

Binary-encodes the scalar at value into the file descriptor fd.

Parameters:
- dtype (numpy.dtype) – The data type of the scalar (taken from the format descriptor).
- value (object, Optional) – An object representing the value to be encoded.
- fd (file object) – The file into which the input is encoded.
beat.backend.python.baseformat.read_some(format, fd)

Reads some of the data from the file descriptor fd.

beat.backend.python.baseformat.read_string(fd)

Reads the next string from the file descriptor fd.
beat.backend.python.baseformat.unpack_array(shape, dtype, fd)

Unpacks the following data array. Returns the unpacked array as a numpy.ndarray object. No checks are performed by this function, as we assume the binary stream matches the data type perfectly.

Parameters:
- shape (tuple) – The shape of the array.
- dtype (numpy.dtype) – The data type of every element in the array.
- fd (file object) – The file from which the input is read.

Returns: The unpacked numpy.ndarray; advances the readout of fd.
beat.backend.python.baseformat.unpack_scalar(dtype, fd)

Unpacks the following scalar. Returns the unpacked scalar. No checks are performed by this function, as we assume the binary stream matches the data type perfectly.

Parameters:
- dtype (numpy.dtype) – The data type of the scalar.
- fd (file object) – The file from which the input is read.

Returns: The unpacked value which, among other options, can be a numpy scalar (int8, float32, bool_, etc.) or a string (str). Advances the readout of fd.
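The pack/unpack pairs above are symmetric: whatever the pack function writes, the corresponding unpack function reads back. The sketch below demonstrates the round-trip idea for a single float32 using the standard struct module; the little-endian layout chosen here is an assumption for illustration, not the platform's actual binary format.

```python
import io
import struct

def pack_scalar_f32(value, fd):
    # Encode a 32-bit float in little-endian byte order into fd.
    fd.write(struct.pack("<f", value))

def unpack_scalar_f32(fd):
    # Read exactly 4 bytes and decode them; this advances the readout of fd.
    return struct.unpack("<f", fd.read(4))[0]
```

A round trip through an in-memory buffer recovers the original value, which is the property the real pack/unpack functions rely on.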
class beat.backend.python.baseformat.baseformat(**kwargs)

Bases: object

All dataformats are represented, in Python, by a derived class of this one.

Construction, by default, uses an unsafe data type conversion. For a 'safe' converter, use baseformat.from_dict(), where you can, optionally, set the casting style (see numpy.can_cast() for details on the values this parameter can assume).

Parameters of the declared type which are not set are filled in with defaults. Similarly to the casting parameter, use baseformat.from_dict() to be able to adjust this behaviour.
from_dict(data, casting='safe', add_defaults=False)

Same as initializing the object, but with a less strict type casting. Construction, by default, uses an unsafe data type conversion (see numpy.can_cast() for details on the values this parameter can assume).

Parameters:
- data (dict, Optional) – A dictionary representing the data input, matching the keywords defined by the resolved format. A value of None, if passed, effectively results in the same as passing an empty dictionary {}.
- casting (str) – See numpy.can_cast() for a description of possible values for this field. By default, it is set to 'safe'. Use the constructor to get the default 'unsafe' behaviour.
- add_defaults (bool) – Whether to use defaults for missing attributes. If set to True, missing attributes are set with defaults; otherwise, a TypeError is raised if a missing attribute is found.
pack_into(fd)

Creates a binary representation of this object in a file. This method makes the object pickle itself into the file descriptor fd. If you'd like to write the contents of this object into a string instead, use six.BytesIO.
pack()

Creates a binary representation of this object as a string. It uses baseformat.pack_into() to encode the string.
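The relationship between pack() and pack_into() can be sketched like this: pack() simply funnels pack_into() through an in-memory buffer. The Packable class and its payload below are invented for illustration.

```python
import io

class Packable:
    def pack_into(self, fd):
        # A real dataformat would binary-encode its fields here.
        fd.write(b"example-bytes")

    def pack(self):
        # Reuse pack_into() on an in-memory buffer to produce the bytes
        # representation of the object.
        buf = io.BytesIO()
        self.pack_into(buf)
        return buf.getvalue()
```

This pattern keeps the binary layout defined in exactly one place (pack_into), with pack() as a thin convenience wrapper.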
unpack_from(fd)

Loads a binary representation of this object. We don't run any extra checks, as an unpack operation is only supposed to be carried out once type compatibility has been established.

unpack(s)

Loads a binary representation of this object from a string. Effectively, this method just calls baseformat.unpack_from() with a six.BytesIO wrapped around the input string.
isclose(other, *args, **kwargs)

Tests for closeness in the numerical sense. Values such as integers, booleans and strings are checked for an exact match. Parameters with floating-point components, such as 32-bit floats and complex values, should be close enough given the input parameterization.

Parameters for floating-point checks are those of numpy.isclose(). Check its help page for more details.

Returns: Indicates if the other object is close enough to this one.
Return type: bool
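The comparison policy described above, exact matching for discrete values and tolerance-based matching for floating point, can be sketched for single values as follows. The tolerances mirror math.isclose defaults and are an assumption for illustration; the real method delegates to numpy.isclose() over whole attributes.

```python
import math

def close_enough(a, b, rel_tol=1e-09, abs_tol=0.0):
    # Floats (and values mixed with floats) get a tolerance-based check;
    # integers, booleans and strings must match exactly.
    if isinstance(a, float) or isinstance(b, float):
        return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    return a == b
```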
dataformat

Validation and parsing for dataformats.

class beat.backend.python.dataformat.Storage(prefix, name)

Bases: beat.backend.python.utils.Storage

Resolves paths for dataformats.
class beat.backend.python.dataformat.DataFormat(prefix, data, parent=None, dataformat_cache=None)

Bases: object

Data formats define the chunks of data that circulate between blocks.

Parameters:
- prefix (str) – Establishes the prefix of your installation.
- data (str, dict) – The fully qualified data format name (e.g. user/format/1) or a dictionary representing the data format (for analyzer results).
- parent (tuple, Optional) – The parent DataFormat of this format. If set to None, this dataformat is the first one in the hierarchy tree. If set to a tuple, the contents are (format-instance, field-name), which indicates the originating object that is this object's parent and the name of the field on that object that points to this one.
- dataformat_cache (dict, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed up data format loading times, as dataformats that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying dataformats change.

errors

A list containing errors found while loading this dataformat.

Type: list of str

parent

The pointer to the dataformat of which the current format is part. It is useful for internal error reporting.

Type: dataformat.DataFormat
name

Returns the name of this object, either from the filename or composed from the hierarchy it belongs to.

schema_version

Returns the schema version.

extends

If this dataformat extends another one, this is it; otherwise None.

type

Returns a new type that can create instances of this dataformat.

The new returned type provides a basis to construct new objects which represent the dataformat. It provides a simple JSON serializer and a for-screen representation.

Example

To create an object respecting the data format from a JSON descriptor, use the following technique:

  ftype = dataformat(...).type
  json = simplejson.loads(...)
  newobj = ftype(**json)  # instantiates the new object, checks format

To dump the object into JSON, use the following technique:

  simplejson.dumps(newobj.as_dict(), indent=4)

A string representation of the object uses the technique above to pretty-print the object contents to the screen.
valid

A boolean that indicates if this dataformat is valid or not.

description

The short description for this object.

documentation

The full-length description for this object.

validate(data)

Validates a piece of data provided by the user.

In order to validate, the data object must be complete and safely castable to this dataformat. For any other validation operation that would require special settings, use instead the type() method to generate a valid type and use either from_dict, unpack or unpack_from, depending on your use case.

Parameters: data (dict, str, file object) – The data to be validated. It may be a dictionary with the JSON representation of a data blob or a binary blob (represented by either a string or a file descriptor object) from which the data will be read. If problems occur, an exception is raised.

Returns: None; raises if an error occurs.
isparent(other)

Tells if the other object extends self (directly or indirectly).

Parameters: other (DataFormat) – Another object to check.
Returns: True if other is a parent of self, False otherwise.
Return type: bool
json_dumps(indent=4)

Dumps the JSON declaration of this object into a string.

Parameters: indent (int) – The number of indentation spaces at every indentation level.
Returns: The JSON representation for this object.
Return type: str

write(storage=None)

Writes contents to the prefix location.

Parameters: storage (Storage, Optional) – If you pass a new storage, then this object will be written to that storage point rather than its default.
export(prefix)

Recursively exports itself into another prefix. Other required dataformats are also copied.

Parameters: prefix (str) – Establishes the prefix of your installation.
Returns: None
Raises: RuntimeError – If prefix and self.prefix point to the same directory.
algorithm

Validation for algorithms.

class beat.backend.python.algorithm.Storage(prefix, name, language=None)

Bases: beat.backend.python.utils.CodeStorage

Resolves paths for algorithms.
class beat.backend.python.algorithm.Runner(module, obj_name, algorithm, exc=None)

Bases: object

A special loader class for algorithms, with specialized methods.

Parameters:
- module (module) – The preloaded module containing the algorithm, as returned by loader.load_module().
- obj_name (str) – The name of the object within the module you're interested in.
- algorithm (object) – The algorithm instance that is used for parameter checking.
- exc (class) – The class to use as the base exception when translating the exception from the user code. Read the documentation of loader.run() for more details.
class beat.backend.python.algorithm.Algorithm(prefix, name, dataformat_cache=None, library_cache=None)

Bases: object

Algorithms represent runnable components within the platform.

This class can only parse the meta-parameters of the algorithm (i.e., input and output declaration, grouping, synchronization details, parameters and splittability). The actual algorithm is not directly treated by this class. It can, however, provide you with a loader for actually running the algorithmic code (see Algorithm.runner()).

Parameters:
- prefix (str) – Establishes the prefix of your installation.
- name (str) – The fully qualified algorithm name (e.g. user/algo/1).
- dataformat_cache (dict, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed up algorithm loading times, as dataformats that are already loaded may be re-used.
- library_cache (dict, Optional) – A dictionary mapping library names to loaded libraries. This parameter is optional and, if passed, may greatly speed up library loading times, as libraries that are already loaded may be re-used.
dataformats

A dictionary containing all pre-loaded dataformats used by this algorithm. Data format objects will be of type dataformat.DataFormat.

Type: dict

libraries

A mapping object defining other libraries this algorithm needs to load so it can work properly.

Type: dict

uses

A mapping object defining the required library import names (keys) and the full names (values).

Type: dict

parameters

A dictionary containing all pre-defined parameters that this algorithm accepts.

Type: dict

splittable

A boolean value that indicates if this algorithm is automatically parallelizable by our backend.

Type: bool

input_map

A dictionary where the key is the input name and the value its type. All input names (potentially from different groups) are comprised in this dictionary.

Type: dict

output_map

A dictionary where the key is the output name and the value its type. All output names (potentially from different groups) are comprised in this dictionary.

Type: dict

results

If this algorithm is actually an analyzer (i.e., there are no formal outputs, but results that must be saved by the platform), then this dictionary contains the names and data types of those elements.

Type: dict

groups

A list containing dictionaries with inputs and outputs belonging to the same synchronization group.

Type: list of dict
LEGACY = 'legacy'

SEQUENTIAL = 'sequential'

AUTONOMOUS = 'autonomous'
name

Returns the name of this object.

schema_version

Returns the schema version.

api_version

Returns the API version.

type

Returns the type of the algorithm.

language

Returns the current language set for the executable code.
clean_parameter(parameter, value)

Checks a given value against a declared parameter.

This method checks if the provided user value can be safely cast to the parameter type as defined in its specification, and that it conforms to any parameter-imposed restrictions.

Returns: The converted value, with an appropriate numpy type.

Raises:
- KeyError – If the parameter cannot be found in this algorithm's declaration.
- ValueError – If the parameter cannot be safely cast to the algorithm's type. Alternatively, a ValueError may also be raised if a range or choice was specified and the value does not obey the settings stipulated for the parameter.
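The checking logic described above can be sketched with numpy's safe casting. The declaration layout used here ({'type': ..., 'range': ...}) is a simplified assumption for illustration, not the platform's actual parameter schema.

```python
import numpy

def clean_parameter(declaration, value):
    # Safe-cast the user value to the declared type; numpy raises a
    # TypeError if the cast would violate the 'safe' casting rule.
    dtype = numpy.dtype(declaration["type"])
    converted = numpy.array(value).astype(dtype, casting="safe")[()]
    # Enforce an optional range restriction from the declaration.
    lo, hi = declaration.get("range", (None, None))
    if lo is not None and not (lo <= converted <= hi):
        raise ValueError("value %r outside range [%r, %r]" % (value, lo, hi))
    return converted
```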
valid

A boolean that indicates if this algorithm is valid or not.

uses

A mapping object defining the required library import names (keys) and the full names (values).

isAnalyzer

Returns whether this algorithm is an analyzer.

results

The results of this algorithm.

parameters

The parameters of this algorithm.

splittable

Whether this algorithm can be split between several processes.

description

The short description for this object.

documentation

The full-length description for this object.
runner(klass='Algorithm', exc=None)

Returns a runnable algorithm object.

Returns: An instance of the algorithm, which will be constructed, but not set up. You must set it up before using the process method.
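The "constructed, but not set up" contract can be illustrated with a minimal algorithm-like class; the setup()/process() shape below follows the text, but the concrete class and its behaviour are invented for this sketch.

```python
class ExampleAlgorithm:
    # Hypothetical stand-in for a user algorithm class: runner() returns
    # an instance that must be set up before it can process data.
    def __init__(self):
        self.ready = False

    def setup(self, parameters):
        self.gain = parameters.get("gain", 1)
        self.ready = True
        return True

    def process(self, value):
        assert self.ready, "setup() must be called before process()"
        return value * self.gain
```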
json_dumps(indent=4)

Dumps the JSON declaration of this object into a string.

Parameters: indent (int) – The number of indentation spaces at every indentation level.
Returns: The JSON representation for this object.
Return type: str

write(storage=None)

Writes contents to the prefix location.

Parameters: storage (Storage, Optional) – If you pass a new storage, then this object will be written to that storage point rather than its default.
export(prefix)

Recursively exports itself into another prefix. Dataformats and associated libraries are also copied.

Parameters: prefix (str) – A path to a prefix that must be different from my own.
Returns: None
Raises: RuntimeError – If prefix and self.prefix point to the same directory.
database

Validation of databases.

class beat.backend.python.database.Storage(prefix, name)

Bases: beat.backend.python.utils.CodeStorage

Resolves paths for databases.
class beat.backend.python.database.Runner(module, definition, prefix, root_folder, exc=None)

Bases: object

A special loader class for database views, with specialized methods.

Parameters:
- db_name (str) – The full name of the database object for this view.
- module (module) – The preloaded module containing the database views, as returned by loader.load_module().
- prefix (str) – Establishes the prefix of your installation.
- root_folder (str) – The path pointing to the root folder of this database.
- exc (class) – The class to use as the base exception when translating the exception from the user code. Read the documentation of loader.run() for more details.
- *args – Constructor parameters for the database view (normally none).
- **kwargs – Constructor parameters for the database view (normally none).
class beat.backend.python.database.Database(prefix, name, dataformat_cache=None)

Bases: object

Databases define the start point of the dataflow in an experiment.

Parameters:
- prefix (str) – Establishes the prefix of your installation.
- name (str) – The fully qualified database name (e.g. db/1).
- dataformat_cache (dict, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed up database loading times, as dataformats that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying dataformats change.
name

Returns the name of this object.

description

The short description for this object.

documentation

The full-length description for this object.

schema_version

Returns the schema version.

valid

A boolean that indicates if this database is valid or not.

protocols

The declaration of all the protocols of the database.

protocol_names

Names of the protocols declared for this database.
view(protocol, name, exc=None, root_folder=None)

Returns the database view, given the protocol and the set name.

Returns: The database view, which will be constructed, but not set up. You must set it up before using the done or next methods.
json_dumps(indent=4)

Dumps the JSON declaration of this object into a string.

Parameters: indent (int) – The number of indentation spaces at every indentation level.
Returns: The JSON representation for this object.
Return type: str

write(storage=None)

Writes contents to the prefix location.

Parameters: storage (Storage, Optional) – If you pass a new storage, then this object will be written to that storage point rather than its default.
export(prefix)

Recursively exports itself into another prefix. Associated dataformats are also exported recursively.

Parameters: prefix (str) – A path to a prefix that must be different from my own.
Returns: None
Raises: RuntimeError – If prefix and self.prefix point to the same directory.
class beat.backend.python.database.View

Bases: object

index(root_folder, parameters)

Returns a list of (named) tuples describing the data provided by the view.

The ordering of values inside the tuples is free, but it is expected that the list is ordered in a consistent manner (i.e. all train images of person A, then all train images of person B, ...).

For instance, assuming a view providing that kind of data:

  -----------  -----------  -----------  -----------  -----------  -----------
  |  image  |  |  image  |  |  image  |  |  image  |  |  image  |  |  image  |
  -----------  -----------  -----------  -----------  -----------  -----------
  -----------  -----------  -----------  -----------  -----------  -----------
  | file_id |  | file_id |  | file_id |  | file_id |  | file_id |  | file_id |
  -----------  -----------  -----------  -----------  -----------  -----------
  -------------------------------------  -------------------------------------
  |             client_id             |  |             client_id             |
  -------------------------------------  -------------------------------------

a list like the following should be generated:

  [
    (client_id=1, file_id=1, image=filename1),
    (client_id=1, file_id=2, image=filename2),
    (client_id=1, file_id=3, image=filename3),
    (client_id=2, file_id=4, image=filename4),
    (client_id=2, file_id=5, image=filename5),
    (client_id=2, file_id=6, image=filename6),
    ...
  ]

Warning: DO NOT store images, sound files or data loadable from a file in the list! Store the path of the file to load instead.
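Following the pattern above, a view's index() typically builds its list with collections.namedtuple. The entry fields follow the text's example, but the concrete paths below are invented for illustration.

```python
from collections import namedtuple

Entry = namedtuple("Entry", ["client_id", "file_id", "image"])

def index(root_folder, parameters):
    # Consistent ordering: all entries of client 1, then client 2.
    # Only file paths are stored, never the file contents themselves.
    return [
        Entry(client_id=1, file_id=1, image="path/to/image1.png"),
        Entry(client_id=1, file_id=2, image="path/to/image2.png"),
        Entry(client_id=2, file_id=3, image="path/to/image3.png"),
    ]
```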
class beat.backend.python.database.DatabaseTester(name, view_class, outputs_declaration, parameters, irregular_outputs=[], all_combinations=True)

Bases: object

Used while developing a new database view, to test its behavior.

This class tests that, for each combination of connected/not-connected outputs:

- data indices seem consistent;
- all the connected outputs produce data;
- all the not-connected outputs don't produce data.

It also reports some stats, and can generate a text file detailing the data generated by each output.

By default, outputs are assumed to produce data at constant intervals. Those that don't follow this pattern must be declared as 'irregular'.

Note that no particular check is done about the database declaration or the correctness of the generated data with respect to their data formats. This class is mainly used to check that the outputs are correctly synchronized.
data

Data I/O classes and functions.

exception beat.backend.python.data.RemoteException(kind, message)

Bases: Exception

An exception that happened at a remote location.
beat.backend.python.data.mixDataIndices(list_of_data_indices)

Given a collection of lists of data indices (belonging to separate but synchronized files/inputs), returns the most granular list of indices that spans all the data.

For example, the mix of [(0, 2), (3, 4)] and [(0, 4)] is:

  [(0, 2), (3, 4)]

The mix of [(0, 2), (3, 4)] and [(0, 1), (2, 3), (4, 4)] is:

  [(0, 1), (2, 2), (3, 3), (4, 4)]
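The mixing behaviour shown in the examples can be reproduced by splitting at every range start across all inputs. This is an illustrative sketch consistent with the examples above, not the library's actual implementation.

```python
def mix_data_indices(list_of_data_indices):
    # Every (start, end) range start, across all synchronized inputs,
    # forces a split point in the most granular, mixed list of ranges.
    starts = sorted({s for ranges in list_of_data_indices for (s, _) in ranges})
    last = max(e for ranges in list_of_data_indices for (_, e) in ranges)
    return [
        (s, starts[i + 1] - 1 if i + 1 < len(starts) else last)
        for i, s in enumerate(starts)
    ]
```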
beat.backend.python.data.getAllFilenames(filename, start_index=None, end_index=None)

Returns the names of all the files related to the given data file, taking the provided start and end indices into account.

Returns: (data_filenames, indices_filenames, data_checksum_filenames, indices_checksum_filenames)
class beat.backend.python.data.DataSource

Bases: object

Base class to load data from some source.
class beat.backend.python.data.CachedDataSource

Bases: beat.backend.python.data.DataSource

Utility class to load data from a file in the cache.

setup(filename, prefix, start_index=None, end_index=None, unpack=True)

Configures the data source.

Parameters:
- filename (str) – Name of the file to read the data from.
- prefix (str) – Establishes the prefix of your installation.
- start_index (int) – The starting index (if not set or set to None, the default, reads data from the beginning of the file).
- end_index (int) – The end index (if not set or set to None, the default, reads the data until the end).
- unpack (bool) – Indicates if the data must be unpacked or not.

Returns: True if successful, False otherwise.
class beat.backend.python.data.DatabaseOutputDataSource

Bases: beat.backend.python.data.DataSource

Utility class to load data from an output of a database view.

setup(view, output_name, dataformat_name, prefix, start_index=None, end_index=None, pack=False)

Configures the data source.

Parameters:
- prefix (str) – Establishes the prefix of your installation.
- start_index (int) – The starting index (if not set or set to None, the default, reads data from the beginning of the file).
- end_index (int) – The end index (if not set or set to None, the default, reads the data until the end).
- unpack (bool) – Indicates if the data must be unpacked or not.

Returns: True if successful, False otherwise.
class beat.backend.python.data.RemoteDataSource

Bases: beat.backend.python.data.DataSource

Utility class to load data from a data source accessible via a socket.

setup(socket, input_name, dataformat_name, prefix, unpack=True)

Configures the data source.

Parameters:
- socket (zmq.Socket) – The socket to use to access the data.
- input_name (str) – Name of the input corresponding to the data source.
- dataformat_name (str) – Name of the data format.
- prefix (str) – Establishes the prefix of your installation.
- unpack (bool) – Indicates if the data must be unpacked or not.

Returns: True if successful, False otherwise.
class beat.backend.python.data.DataSink

Bases: object

Interface for all Data Sinks.

Data Sinks are used by the outputs of an algorithm to write/transmit data.

write(data, start_data_index, end_data_index)

Writes a block of data.

Parameters:
- data (baseformat.baseformat) – The block of data to write.
- start_data_index (int) – Start index of the written data.
- end_data_index (int) – End index of the written data.
class beat.backend.python.data.StdoutDataSink

Bases: beat.backend.python.data.DataSink

Data Sink that prints information about the written data on stdout.

Note: the written data is lost! Use this class for debugging purposes.

write(data, start_data_index, end_data_index)

Writes a block of data.

Parameters:
- data (baseformat.baseformat) – The block of data to write.
- start_data_index (int) – Start index of the written data.
- end_data_index (int) – End index of the written data.
class beat.backend.python.data.CachedDataSink

Bases: beat.backend.python.data.DataSink

Data Sink that saves data in the cache.

The default behavior is to save the data in a binary format.

setup(filename, dataformat, start_index, end_index, encoding='binary')

Configures the data sink.

Parameters:
- filename (str) – Name of the file to generate.
- dataformat (dataformat.DataFormat) – The dataformat to be used inside this file. All objects stored inside this file will respect that format.
- encoding (str) – String defining the encoding to be used for encoding the data. Only a few options are supported: binary (the default) or json (for debugging purposes).

write(data, start_data_index, end_data_index)

Writes a block of data to the filesystem.

Parameters:
- data (baseformat.baseformat) – The block of data to write.
- start_data_index (int) – Start index of the written data.
- end_data_index (int) – End index of the written data.
beat.backend.python.data.load_data_index(cache_root, hash_path)

Loads a cached-data index if it exists; returns an empty list otherwise.

Parameters:
- cache_root (str) – The path to the root of the cache directory.
- hash_path (str) – The hashed path of the input you wish to load the indexes for, as returned by the utility function hash.toPath().

Returns: A list, which will be empty if the index file is not present. Note that, given the current design, an empty list means an error condition.
data_loaders¶
This module implements all the data communication related classes
-
class
beat.backend.python.data_loaders.
DataView
(data_loader, data_indices)[source]¶ Bases:
object
Provides access to a subset of data from a group of inputs synchronized together
Data views are created from a data loader (see
DataLoader
), which are provided to the algorithms of types ‘sequential’ and ‘autonomous’ (seeDataLoaderList
).Example
view = data_loader.view('input1', 0) for i in range(view.count()) (data, start_index, end_index) = view[i]
Parameters: - data_loader (
DataLoader
) – Name of the data channel of the group of inputs - data_indices (
list
) – Data indices to consider as a list of tuples
-
data_index_start
¶ Lower data index across all inputs (see the section Inputs synchronization of the User’s Guide)
Type: int
-
data_index_end
¶ Highest data index across all inputs (see the section Inputs synchronization of the User’s Guide)
Type: int
-
count
(input_name=None)[source]¶ Returns the number of available data indexes for the given input name, or the number of available data units if no input name is given.
Parameters: input_name (str) – Name of the input for which the count is requested
Returns: Number of data indexes for the given input, or the number of data units
Return type: int
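The count()/indexing idiom of a view can be mirrored with a minimal stand-in. MiniView is hypothetical; a real DataView is only obtained from DataLoader.view():

```python
class MiniView:
    """Hypothetical stand-in mimicking DataView's count()/__getitem__."""
    def __init__(self, blocks):
        self._blocks = blocks  # list of (data, start_index, end_index)

    def count(self):
        return len(self._blocks)

    def __getitem__(self, i):
        return self._blocks[i]

view = MiniView([({'label': 'a'}, 0, 0), ({'label': 'b'}, 1, 2)])
for i in range(view.count()):
    (data, start_index, end_index) = view[i]
    print(start_index, end_index, data['label'])
```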
-
class
beat.backend.python.data_loaders.
DataLoader
(channel)[source]¶ Bases:
object
Provides access to data from a group of inputs synchronized together
Data loaders are provided to the algorithms of types ‘sequential’ and ‘autonomous’ (see
DataLoaderList
).Example
# Iterate through all the data
for i in range(data_loader.count()):
    (data, start_index, end_index) = data_loader[i]
    print(data['input1'].data)

# Restrict to a subset of the data
view = data_loader.view('input1', 0)
for i in range(view.count()):
    (data, start_index, end_index) = view[i]
Parameters: channel (str) – Name of the data channel of the group of inputs -
data_index_start
¶ Lowest data index across all inputs (see the section Inputs synchronization of the User’s Guide)
Type: int
-
data_index_end
¶ Highest data index across all inputs (see the section Inputs synchronization of the User’s Guide)
Type: int
-
count
(input_name=None)[source]¶ Returns the number of available data indexes for the given input name, or the number of available data units if no input name is given.
Parameters: input_name (str) – Name of the input for which the count is requested
Returns: Number of data indexes for the given input, or the number of data units
Return type: int
-
-
class
beat.backend.python.data_loaders.
DataLoaderList
[source]¶ Bases:
object
Represents a list of data loaders
Inputs are organized by groups. The inputs inside a group are all synchronized together (see the section Inputs synchronization of the User’s Guide). A data loader provides access to data from a group of inputs.
A list implementing this interface is provided to the algorithms of types ‘sequential’ and ‘autonomous’.
One group of inputs is always considered as the main one, and is used to drive the algorithm. The usage of the other groups is left to the algorithm.
See
DataLoader
Example
data_loaders = DataLoaderList()
...

# Retrieve a data loader by name
data_loader = data_loaders['labels']

# Retrieve a data loader by index
for index in range(0, len(data_loaders)):
    data_loader = data_loaders[index]

# Iteration over all data loaders
for data_loader in data_loaders:
    ...

# Retrieve the data loader an input belongs to, by input name
data_loader = data_loaders.loaderOf('label')
-
main_loader
¶ Main data loader
Type: DataLoader
-
add
(data_loader)[source]¶ Add a data loader to the list
Parameters: data_loader (DataLoader) – The data loader to add
-
dbexecution¶
Execution utilities
-
class
beat.backend.python.dbexecution.
DBExecutor
(message_handler, prefix, cache_root, data, dataformat_cache=None, database_cache=None)[source]¶ Bases:
object
Executor specialised in database views
Parameters: - prefix (str) – Establishes the prefix of your installation.
- data (dict, str) – The piece of data representing the block to be executed. It must validate against the schema defined for execution blocks. If a string is passed, it is supposed to be a fully qualified absolute path to a JSON file containing the block execution information.
- dataformat_cache (
dict
, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed-up database loading times as dataformats that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying dataformats change. - database_cache (
dict
, Optional) – A dictionary mapping database names to loaded databases. This parameter is optional and, if passed, may greatly speed-up database loading times as databases that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying databases change.
-
databases
¶ A dictionary in which keys are strings with database names and values are
database.Database
, representing the databases required for running this block. The dictionary may be empty in case all inputs are taken from the file cache.Type: dict
-
views
¶ A dictionary in which the keys are tuples pointing to the
(<database-name>, <protocol>, <set>)
and the value is a setup view for that particular combination of details. The dictionary may be empty in case all inputs are taken from the file cache.Type: dict
-
input_list
¶ A list of inputs that will be served to the algorithm.
Type: inputs.InputList
-
address
¶ Address of the message handler
-
valid
¶ A boolean that indicates if this executor is valid or not
executor¶
A class that can set up and execute algorithm blocks on the backend
-
class
beat.backend.python.executor.
Executor
(socket, directory, dataformat_cache=None, database_cache=None, library_cache=None, cache_root='/cache', db_socket=None)[source]¶ Bases:
object
An executor runs the code described by an execution block's information
Parameters: - socket (zmq.Socket) – A pre-connected socket to send and receive messages from.
- directory (str) – The path to a directory containing all the information required to run the user experiment.
- dataformat_cache (
dict
, Optional) – A dictionary mapping dataformat names to loaded dataformats. This parameter is optional and, if passed, may greatly speed-up database loading times as dataformats that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying dataformats change. - database_cache (
dict
, Optional) – A dictionary mapping database names to loaded databases. This parameter is optional and, if passed, may greatly speed-up database loading times as databases that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying databases change. - library_cache (
dict
, Optional) – A dictionary mapping library names to loaded libraries. This parameter is optional and, if passed, may greatly speed-up library loading times as libraries that are already loaded may be re-used. If you use this parameter, you must guarantee that the cache is refreshed as appropriate in case the underlying libraries change.
-
runner
¶ Returns the algorithm runner
This property allows for lazy loading of the runner
-
schema_version
¶ Returns the schema version
-
analysis
¶ A boolean that indicates if the current block is an analysis block
helpers¶
This module implements various helper methods and classes
-
class
beat.backend.python.helpers.
AccessMode
[source]¶ Bases:
object
Possible access modes
-
NONE
= 0¶
-
LOCAL
= 1¶
-
REMOTE
= 2¶
-
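AccessMode is a plain namespace of integer constants. The class below mirrors the documented values; the describe() helper is hypothetical, added only to show a typical dispatch on the mode:

```python
class AccessMode:
    """Mirror of the constants documented above."""
    NONE = 0
    LOCAL = 1
    REMOTE = 2

def describe(mode):
    # Hypothetical helper illustrating dispatch on an access mode
    return {AccessMode.NONE: 'no access',
            AccessMode.LOCAL: 'local file access',
            AccessMode.REMOTE: 'remote (networked) access'}[mode]

print(describe(AccessMode.LOCAL))  # -> local file access
```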
inputs¶
This module implements input related classes
-
class
beat.backend.python.inputs.
Input
(name, data_format, data_source)[source]¶ Bases:
object
Represents an input of a processing block that receives data from a (legacy) data source
A list of those inputs must be provided to the algorithms (see
InputList
)Parameters: -
group
¶ Group containing this input
Type: InputGroup
-
data
¶ The last block of data received on the input
Type: baseformat.baseformat
-
data_index
¶ Index of the last block of data received on the input (see the section Inputs synchronization of the User’s Guide)
Type: int
-
data_index_end
¶ End index of the last block of data received on the input (see the section Inputs synchronization of the User’s Guide)
Type: int
-
data_same_as_previous
¶ Indicates if the last block of data received was changed (see the section Inputs synchronization of the User’s Guide)
Type: bool
-
-
class
beat.backend.python.inputs.
InputGroup
(channel, synchronization_listener=None, restricted_access=True)[source]¶ Bases:
object
Represents a group of inputs synchronized together
A group implementing this interface is provided to the algorithms (see
InputList
).See
Input
Example
inputs = InputList()
print(inputs['labels'].data_format)

for index in range(0, len(inputs)):
    print(inputs[index].data_format)

for input in inputs:
    print(input.data_format)

for input in inputs[0:2]:
    print(input.data_format)
Parameters: - channel (str) – Name of the data channel of the group
- synchronization_listener (outputs.SynchronizationListener) – Synchronization listener to use
- restricted_access (bool) – Indicates if the algorithm can freely use the inputs
-
data_index
¶ Index of the last block of data received on the inputs (see the section Inputs synchronization of the User’s Guide)
Type: int
-
data_index_end
¶ End index of the last block of data received on the inputs (see the section Inputs synchronization of the User’s Guide)
Type: int
-
synchronization_listener
¶ Synchronization listener used
Type: outputs.SynchronizationListener
-
class
beat.backend.python.inputs.
InputList
[source]¶ Bases:
object
Represents the list of inputs of a processing block
Inputs are organized by groups. The inputs inside a group are all synchronized together (see the section Inputs synchronization of the User’s Guide).
A list implementing this interface is provided to the algorithms
One group of inputs is always considered as the main one, and is used to drive the algorithm. The usage of the other groups is left to the algorithm.
See
Input
SeeInputGroup
Example
inputs = InputList()
...

# Retrieve an input by name
input = inputs['labels']

# Retrieve an input by index
for index in range(0, len(inputs)):
    input = inputs[index]

# Iteration over all inputs
for input in inputs:
    ...

# Iteration over some inputs
for input in inputs[0:2]:
    ...

# Retrieve the group an input belongs to, by input name
group = inputs.groupOf('label')

# Retrieve the group an input belongs to
input = inputs['labels']
group = input.group
-
main_group
¶ Main group (for data-driven algorithms)
Type: InputGroup
-
add
(group)[source]¶ Add a group to the list
Parameters: group (InputGroup) – The group to add
-
library¶
Validation for libraries
-
class
beat.backend.python.library.
Storage
(prefix, name, language=None)[source]¶ Bases:
beat.backend.python.utils.CodeStorage
Resolves paths for libraries
Parameters: - prefix (str) – Establishes the prefix of your installation.
- name (str) – The fully qualified library name
- language (str) – The programming language of the library
-
class
beat.backend.python.library.
Library
(prefix, name, library_cache=None)[source]¶ Bases:
object
Libraries represent independent algorithm components within the platform.
This class can only parse the meta-parameters of the library. The actual library is not directly treated by this class - only by the associated algorithms.
Parameters: - prefix (str) – Establishes the prefix of your installation.
- name (str) – The fully qualified library name (e.g.
user/algo/1
) - library_cache (
dict
, Optional) – A dictionary mapping library names to loaded libraries. This parameter is optional and, if passed, may greatly speed-up library loading times as libraries that are already loaded may be re-used.
-
libraries
¶ A mapping object defining other libraries this library needs to load so it can work properly.
Type: dict
-
uses
¶ A mapping object defining the required library import name (keys) and the full-names (values).
Type: dict
-
load
()[source]¶ Loads the Python module for this library resolving all references
Returns the loaded Python module.
-
name
Returns the name of this object
-
schema_version
¶ Returns the schema version
-
language
¶ Returns the current language set for the library code
-
valid
¶ A boolean that indicates if this library is valid or not
-
uses
-
description
The short description for this object
-
documentation
The full-length description for this object
-
json_dumps
(indent=4)[source]¶ Dumps the JSON declaration of this object in a string
Parameters: indent (int) – The number of indentation spaces at every indentation level Returns: The JSON representation for this object Return type: str
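The effect of the indent parameter matches the standard library json module; a minimal stand-in (the declaration content below is illustrative, not a real library declaration):

```python
import json

# Each nesting level is indented by `indent` spaces in the dumped string
declaration = {'language': 'python'}
dumped = json.dumps(declaration, indent=4)
print(dumped)
```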
-
write
(storage=None)[source]¶ Writes contents to prefix location.
Parameters: storage ( Storage
, Optional) – If you pass a new storage, then this object will be written to that storage point rather than its default.
-
export
(prefix)[source]¶ Recursively exports itself into another prefix
Other required libraries are also copied.
Parameters: prefix (str) – Establishes the prefix of your installation. Returns: None Raises: RuntimeError
– If prefix and self.prefix point to the same directory.
message_handler¶
This module implements a message handler that is in charge of ZeroMQ communication.
-
class
beat.backend.python.message_handler.
MessageHandler
(host_address, data_sources=None, kill_callback=None, context=None)[source]¶ Bases:
threading.Thread
A 0MQ message handler for our communication with other processes
-
run
()[source]¶ Method representing the thread’s activity.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
-
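MessageHandler follows the standard threading.Thread subclassing pattern described above. The sketch below is a generic illustration of that pattern, not the actual handler (EchoHandler and its message list are hypothetical):

```python
import threading

class EchoHandler(threading.Thread):
    """Hypothetical handler: overrides run() to process a fixed batch."""
    def __init__(self, messages):
        super(EchoHandler, self).__init__()
        self.messages = messages
        self.handled = []

    def run(self):
        # run() is invoked by start() and executes in the new thread
        for msg in self.messages:
            self.handled.append('ack:' + msg)

handler = EchoHandler(['ping', 'status'])
handler.start()
handler.join()  # wait for the thread's activity to finish
print(handler.handled)  # -> ['ack:ping', 'ack:status']
```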
outputs¶
This module implements output related classes
-
class
beat.backend.python.outputs.
SynchronizationListener
[source]¶ Bases:
object
A callback mechanism to keep Inputs and Outputs in groups and lists synchronized together.
-
class
beat.backend.python.outputs.
Output
(name, data_sink, synchronization_listener=None, force_start_index=0)[source]¶ Bases:
object
Represents one output of a processing block
A list of outputs implementing this interface is provided to the algorithms (see
OutputList
).Parameters: - name (str) – Name of the output
- data_sink (data.DataSink) – Sink of data to be used by the output, pre-configured with the correct data format.
-
data_sink
¶ Sink of data used by the output
Type: data.DataSink
-
write
(data, end_data_index=None)[source]¶ Write a block of data on the output
Parameters: - data (baseformat.baseformat) – The block of data to write, or None (if the algorithm doesn’t want to write any data)
- end_data_index (int) – Last index of the written data (see the section Inputs synchronization of the User’s Guide). If not specified, the current end data index of the Inputs List is used
-
class
beat.backend.python.outputs.
OutputList
[source]¶ Bases:
object
Represents the list of outputs of a processing block
A list implementing this interface is provided to the algorithms
See
Output
.Example
outputs = OutputList()
...

print(outputs['result'].data_format)

for index in six.moves.range(0, len(outputs)):
    outputs[index].write(...)

for output in outputs:
    output.write(...)

for output in outputs[0:2]:
    output.write(...)
stats¶
This module implements statistics-related helper functions.
-
beat.backend.python.stats.
io_statistics
(configuration, input_list=None, output_list=None)[source]¶ Summarizes current I/O statistics, looking at data sources and sinks, inputs and outputs
Parameters: - configuration (dict) – Executor configuration
- input_list (inputs.InputList) – List of inputs to gather statistics from
- output_list (outputs.OutputList) – List of outputs to gather statistics from
Returns: A dictionary summarizing current I/O statistics
Return type: dict
utils¶
This module implements helper classes and functions.
-
beat.backend.python.utils.
hashed_or_simple
(prefix, what, path, suffix='.json')[source]¶ Returns a hashed path or simple path depending on where the resource is
-
beat.backend.python.utils.
safe_rmdir
(f)[source]¶ Safely removes the directory containing a given file from the disk
-
beat.backend.python.utils.
extension_for_language
(language)[source]¶ Returns the preferred extension for a given programming language
The set of languages supported must match those declared in our
common.json
schema.
Parameters: language (str) – The programming language to get the extension for
Returns: The extension for the given language, including a leading . (dot)
Return type: str
Raises: KeyError – If the language is not defined in our internal dictionary.
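A sketch of the documented behavior. The mapping below is an assumption for illustration only; the real set of languages and extensions comes from the common.json schema:

```python
# Illustrative subset -- NOT the real mapping from common.json
_EXTENSIONS = {'python': '.py'}

def extension_for_language_sketch(language):
    # Like the real function, unknown languages raise KeyError
    return _EXTENSIONS[language]

print(extension_for_language_sketch('python'))  # -> .py
```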
-
class
beat.backend.python.utils.
File
(path, binary=False)[source]¶ Bases:
object
User helper to read and write file objects
-
class
beat.backend.python.utils.
Storage
(path)[source]¶ Bases:
object
Resolves paths for objects that provide only a description
-
class
beat.backend.python.utils.
CodeStorage
(path, language=None)[source]¶ Bases:
object
Resolves paths for objects that provide a description and code
Parameters: language (str) – One of the valid programming languages -
language
¶
-
-
class
beat.backend.python.utils.
NumpyJSONEncoder
(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None, use_decimal=True, namedtuple_as_object=True, tuple_as_array=True, bigint_as_string=False, item_sort_key=None, for_json=False, ignore_nan=False, int_as_string_bitcount=None, iterable_as_array=False)[source]¶ Bases:
simplejson.encoder.JSONEncoder
Encodes numpy arrays and scalars
See also
-
default
(obj)[source]¶ Implement this method in a subclass such that it returns a serializable object for
o
, or calls the base implementation (to raise aTypeError
).For example, to support arbitrary iterators, you could implement default like this:
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    return JSONEncoder.default(self, o)
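The override shown above plugs into (simple)json via the cls argument of dumps(). Here the same iterator-handling default() is exercised with the standard library json module (IterEncoder is a stand-in name, not part of the package):

```python
import json

class IterEncoder(json.JSONEncoder):
    """Applies the default() override from the example above."""
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        return json.JSONEncoder.default(self, o)

# Sets are not JSON-serializable by default, but they are iterable,
# so default() converts them to lists before encoding
print(json.dumps({'values': {1}}, cls=IterEncoder))  # -> {"values": [1]}
```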
-