.. vim: set fileencoding=utf-8 :

.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/          ..
.. Contact: beat.support@idiap.ch                                             ..
..                                                                            ..
.. This file is part of the beat.docs module of the BEAT platform.            ..
..                                                                            ..
.. Commercial License Usage                                                   ..
.. Licensees holding valid commercial BEAT licenses may use this file in      ..
.. accordance with the terms contained in a written agreement between you     ..
.. and Idiap. For further information contact tto@idiap.ch                    ..
..                                                                            ..
.. Alternatively, this file may be used under the terms of the GNU Affero     ..
.. Public License version 3 as published by the Free Software and appearing   ..
.. in the file LICENSE.AGPL included in the packaging of this file.           ..
.. The BEAT platform is distributed in the hope that it will be useful, but   ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE.                                       ..
..                                                                            ..
.. You should have received a copy of the GNU Affero Public License along     ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/.          ..


.. _beat-system-algorithms:

============
 Algorithms
============

Algorithms are user-defined pieces of software that run within the blocks of a
toolchain. An algorithm can read data on the input(s) of the block and write
processed data on its output(s). We refer to the inputs and outputs
collectively as *endpoints*. Algorithms are, hence, key components of
scientific experiments, since they formally describe how to transform raw data
into higher-level concepts such as classes.

An algorithm lies at the core of each processing block and may be subject to
parametrization. Inputs and outputs of an algorithm have well-defined data
formats. The format of the data on each input and output of the block is
defined at a higher level in the BEAT framework, and the implementation of the
algorithm is expected to respect the format declared for each endpoint.

:numref:`beat-core-overview-block` displays the relationship between a
processing block and its algorithm.

.. _beat-core-overview-block:

.. figure:: ./img/block.*

   Relationship between a processing block and its algorithm

Typically, an algorithm processes data units received at the input endpoints
and pushes the relevant results to the output endpoint. Apart from analyzers
(see below), each algorithm must have at least one input and at least one
output. The links in a toolchain connect the output of one block to the input
of another, effectively connecting algorithms together and thus determining
the information flow through the toolchain. Blocks at the beginning of the
toolchain are typically connected to datasets, and blocks at the end of a
toolchain are connected to analyzers (special algorithms with no output). BEAT
is responsible for delivering inputs from the desired datasets into the
toolchain and through your algorithms. This drives the synchronization of the
information flow through the toolchain: flow synchronization is determined by
the data units produced by a dataset and injected into the toolchain.

.. note:: **Naming Convention**

   Algorithms are named using three values joined by a ``/`` (slash) operator:

   * **username**: indicates the author of the algorithm
   * **name**: indicates the name of the algorithm
   * **version**: indicates the version (integer starting from ``1``) of the
     algorithm

   Each tuple of these three components defines a *unique* algorithm name
   inside the BEAT ecosystem.

.. _beat-system-algorithms-types:

Algorithm types
===============

The current version of the BEAT framework has two algorithm types, which
differ in the way they handle data samples:

- Sequential
- Autonomous

In previous versions of BEAT, only one type of algorithm (referred to as the v1
algorithm) was implemented. The sequential algorithm type is the direct
successor of the v1 algorithm. For migration information, see
:ref:`beat-system-algorithms-api-migration`.

The platform also provides the concept of a *soft loop*, which allows
supervised processing to be implemented within a macro block.

Sequential
----------

The sequential algorithm is **data-driven**: the algorithm is typically
provided one data sample at a time and must immediately produce some output
data.

Autonomous
----------

As its name suggests, the autonomous algorithm is responsible for loading the
data samples it needs in order to do its work. It is also responsible for
writing the appropriate amount of data on its outputs. Furthermore, the way
the algorithm handles the data is highly configurable and covers a wide range
of possible scenarios.

Loop
----

A loop is composed of three elements:

- A processor algorithm
- An evaluator algorithm
- A LoopChannel

The two algorithms work as a pair, using the LoopChannel to communicate. The
processor algorithm is responsible for applying some transformation or
analysis to a set of data and then sending the result to the evaluator for
validation. The role of the evaluator is to provide feedback to the processor,
which will then either continue processing the same block of data or move on
to the next one, until all data is exhausted.

The output writing of the evaluator is synchronized with the output writing of
the processor. In the sequential versions, reading is synchronized as well, so
that the evaluator reads data at the same pace as the processor.

Both algorithms are available in sequential and autonomous form. However, only
three combinations are valid:

========== ==========
Processor  Evaluator
========== ==========
Autonomous Autonomous
Sequential Sequential
Sequential Autonomous
========== ==========

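The control flow of such a loop can be sketched in plain Python, outside of
the BEAT runtime. In the sketch below, ``Evaluator``, ``ToyLoopChannel`` and
``Processor`` are hypothetical stand-ins rather than BEAT classes: the channel
simply forwards ``validate()`` calls to the evaluator, and the task (moving an
estimate towards a target value until the evaluator accepts it) is invented
for illustration. It only shows how the feedback of the evaluator decides
whether the processor keeps working on the same data or moves on.

.. code-block:: python

   class Evaluator:
       """Hypothetical evaluator: accepts an estimate close to a target."""

       def __init__(self, target, tolerance):
           self.target = target
           self.tolerance = tolerance

       def validate(self, hypothesis):
           # Return a (boolean, feedback) tuple: whether the hypothesis is
           # accepted, and how far it still is from the target.
           delta = self.target - hypothesis["value"]
           return abs(delta) <= self.tolerance, {"value": delta}


   class ToyLoopChannel:
       """Hypothetical stand-in for a LoopChannel: forwards hypotheses."""

       def __init__(self, evaluator):
           self.evaluator = evaluator

       def validate(self, hypothesis):
           return self.evaluator.validate(hypothesis)


   class Processor:
       """Hypothetical processor: refines each sample until it is accepted."""

       def process(self, samples, loop_channel, max_iterations=100):
           results = []
           for sample in samples:
               estimate = float(sample)
               for _ in range(max_iterations):
                   is_valid, feedback = loop_channel.validate({"value": estimate})
                   if is_valid:
                       break  # accepted: move on to the next sample
                   # rejected: keep working on the same sample using the feedback
                   estimate += feedback["value"] / 2
               results.append(estimate)
           return results


   channel = ToyLoopChannel(Evaluator(target=10.0, tolerance=0.01))
   refined = Processor().process([0.0, 4.0, 12.0], channel)
   print(refined)

In the actual framework, the LoopChannel and the synchronization of reads and
writes between the two algorithms are managed by BEAT; the direct method call
used here is a simplification.
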
.. _beat-system-algorithms-definition:

Definition of an Algorithm
==========================

An algorithm is defined by two distinct components:

* a `JSON`_ object with several fields, specifying the inputs, the outputs,
  the parameters and additional information such as the language in which it
  is implemented.
* source code (and/or [later] binary code) describing how to transform the
  input data.

.. _beat-system-algorithms-definition-json:

JSON Declaration
----------------

A `JSON`_ declaration of an algorithm consists of several fields. For example,
the following declaration is that of an algorithm implementing principal
component analysis (PCA):

.. code-block:: javascript

   {
       "schema_version": 2,
       "language": "python",
       "api_version": 2,
       "type": "sequential",
       "splittable": false,
       "groups": [
           {
               "inputs": {
                   "image": {
                       "type": "system/array_2d_uint8/1"
                   }
               },
               "outputs": {
                   "subspace": {
                       "type": "tutorial/linear_machine/1"
                   }
               }
           }
       ],
       "parameters": {
           "number-of-components": {
               "default": 5,
               "type": "uint32"
           }
       },
       "description": "Principal Component Analysis (PCA)"
   }

Here is a description of each of the fields in the example above:

* **schema_version:** specifies which schema version must be used to validate
  the file content.
* **api_version:** specifies the version of the API implemented by the
  algorithm.
* **type:** specifies the type of the algorithm. The execution model depends
  on it.
* **language:** specifies the language in which the algorithm is implemented.
* **splittable:** indicates whether or not the algorithm can be parallelized
  into chunks.
* **parameters:** lists the parameters of the algorithm, describing both their
  default values and their types.
* **groups:** gives information about the inputs and outputs of the algorithm.
  They are provided as a list of dictionaries, each element of this list being
  associated with a database *channel*. The group that contains the outputs is
  the *synchronization channel*. By default, the BEAT framework automatically
  loops over the synchronization channel, and user code must not loop over
  this group. In contrast, it is the responsibility of the user to load data
  from the other groups. This is described in more detail in the following
  subsections.
* **description:** is optional and gives a short description of the algorithm.

.. note::

   The graphical interface of BEAT provides user-friendly editors to configure
   the main components of the system (for example: algorithms, data formats,
   etc.), which simplifies the definition of their `JSON`_ declarations. One
   only needs to write the declaration of an algorithm by hand, following the
   specifications above, when not using this graphical interface.

.. _beat-system-algorithms-definition-analyzer:

Analyzer
........

At the end of the processing workflow of an experiment, there is a special
kind of algorithm, which does not yield any *output*, but instead produces
*results*. These algorithms are called **analyzers**.

*Results* of an experiment are reported back to the user. Data privacy is very
important in the BEAT framework, and therefore only a limited number of data
formats can be employed as results in an analyzer, such as booleans, integers,
floating-point values, strings (of limited size), as well as plots (such as
scatter or bar plots).

For example, the following declaration is that of a simple analyzer, which
generates an ROC curve as well as a few other metrics.

.. code-block:: javascript

   {
       "language": "python",
       "groups": [
           {
               "inputs": {
                   "scores": {
                       "type": "tutorial/probe_scores/1"
                   }
               }
           }
       ],
       "results": {
           "far": {
               "type": "float32",
               "display": true
           },
           "roc": {
               "type": "plot/scatter/1",
               "display": false
           },
           "number_of_positives": {
               "type": "int32",
               "display": false
           },
           "frr": {
               "type": "float32",
               "display": true
           },
           "eer": {
               "type": "float32",
               "display": true
           },
           "threshold": {
               "type": "float32",
               "display": false
           },
           "number_of_negatives": {
               "type": "int32",
               "display": false
           }
       }
   }

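Because a declaration is plain JSON, the rules described above can be checked
with Python's standard ``json`` module. The helpers below are a hypothetical
sketch, not part of BEAT: they locate the synchronization channel (assuming,
as stated above, that it is the single group declaring outputs), tell an
analyzer apart from a regular algorithm, and collect the parameter defaults.
The declarations are trimmed-down versions of the two examples above.

.. code-block:: python

   import json


   def synchronization_group(declaration):
       """Return the index of the group declaring outputs, or None."""
       candidates = [index for index, group in enumerate(declaration["groups"])
                     if group.get("outputs")]
       # Assumption: at most one group may declare outputs.
       if len(candidates) > 1:
           raise ValueError("more than one group declares outputs")
       return candidates[0] if candidates else None


   def is_analyzer(declaration):
       """An analyzer declares results and has no outputs in any group."""
       has_outputs = any(group.get("outputs") for group in declaration["groups"])
       return "results" in declaration and not has_outputs


   def default_parameters(declaration):
       """Map each declared parameter to its default value."""
       return {name: spec["default"]
               for name, spec in declaration.get("parameters", {}).items()}


   pca = json.loads("""
   {
       "language": "python",
       "type": "sequential",
       "groups": [
           {
               "inputs": {"image": {"type": "system/array_2d_uint8/1"}},
               "outputs": {"subspace": {"type": "tutorial/linear_machine/1"}}
           }
       ],
       "parameters": {"number-of-components": {"default": 5, "type": "uint32"}}
   }
   """)

   roc_analyzer = json.loads("""
   {
       "language": "python",
       "groups": [{"inputs": {"scores": {"type": "tutorial/probe_scores/1"}}}],
       "results": {"eer": {"type": "float32", "display": true}}
   }
   """)

   print(synchronization_group(pca), is_analyzer(pca), default_parameters(pca))
   print(synchronization_group(roc_analyzer), is_analyzer(roc_analyzer))

A check like this does not replace the validation against the schema selected
by ``schema_version``; it only makes the conventions above explicit.
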
.. _beat-system-algorithms-definition-code:

Source code
-----------

The BEAT framework has been designed to support algorithms written in
different programming languages. However, for each language, a corresponding
back-end needs to be implemented, which is in charge of connecting the inputs
and outputs to the algorithm and running its code as expected. In this
section, we describe the implementation of algorithms in the Python and C++
programming languages.

|project| treats algorithms as objects. In Python, an algorithm is a class
named ``Algorithm``; in C++, it should be derived from ``IAlgorithmLagacy``,
``IAlgorithmSequential``, or ``IAlgorithmAutonomous``, depending on the
algorithm type. To define a new algorithm, at least one method must be
implemented:

* ``process()``: the method that actually processes the inputs and produces
  the outputs.

The code example below illustrates the implementation of an algorithm (in
Python):

.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, data_loaders, outputs):
           # here, you read inputs, process and write results to outputs
           return True

Here is the equivalent example for a sequential algorithm in C++:

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithmSequential
   {
   public:
       bool process(const InputList& inputs, const DataloaderList& data_load_list,
                    const OutputList& outputs) override
       {
           // here, you read inputs, process and write results to outputs
           return true;
       }
   };

.. _beat-system-algorithms-examples:

Examples
........

To implement a new algorithm, one must write a class following a few
conventions. Examples of such classes are provided below.

.. _beat-system-algorithms-examples-simple-sequential:

Simple sequential algorithm (no parametrization)
................................................

At the very minimum, an algorithm class must look like this:

.. code-block:: python

   class Algorithm:

       def process(self, inputs, data_loaders, outputs):
           # Read data from inputs, compute something, and write the result
           # of the computation on outputs
           ...
           return True

The class must be called ``Algorithm`` and must have a method called
``process()`` that takes as parameters a list of inputs (see section
:ref:`beat-system-algorithms-input-inputlist`), a list of data loaders (see
section :ref:`beat-system-algorithms-dataloaders-dataloaderlist`) and a list
of outputs (see section :ref:`beat-system-algorithms-output-outputlist`). This
method must return ``True`` if everything went correctly, and ``False`` if an
error occurred.

The platform will call this method once per block of data available on the
*synchronized* inputs of the block.

.. _beat-system-algorithms-examples-simple-autonomous:

Simple autonomous algorithm (no parametrization)
................................................

At the very minimum, an algorithm class must look like this:

.. code-block:: python

   class Algorithm:

       def process(self, data_loaders, outputs):
           # Read data from data_loaders, compute something, and write the
           # result of the computation on outputs
           ...
           return True

The class must be called ``Algorithm`` and must have a method called
``process()`` that takes as parameters a list of data loaders (see section
:ref:`beat-system-algorithms-dataloaders`) and a list of outputs (see section
:ref:`beat-system-algorithms-output-outputlist`). This method must return
``True`` if everything went correctly, and ``False`` if an error occurred.

The platform will call this method only once, as it is the responsibility of
the algorithm to load the appropriate amount of data and process it.

.. _beat-system-algorithms-examples-simple-processor:

Simple autonomous processor algorithm (no parametrization)
..........................................................

At the very minimum, a processor algorithm class must look like this:

.. code-block:: python

   import numpy as np

   class Algorithm:

       def process(self, data_loaders, outputs, loop_channel):
           # Read data from data_loaders and compute a hypothesis (some_value)
           ...
           # Ask the evaluator to validate the hypothesis
           is_valid, feedback = loop_channel.validate({"value": np.float64(some_value)})
           # Check is_valid and continue appropriately, then write the result
           # of the computation on outputs
           ...
           return True

The class must be called ``Algorithm`` and must have a method called
``process()`` that takes as parameters a list of data loaders (see section
:ref:`beat-system-algorithms-dataloaders-dataloaderlist`), a list of outputs
(see section :ref:`beat-system-algorithms-output-outputlist`) and a loop
channel (see section :ref:`beat-system-algorithms-loop-channel`). This method
must return ``True`` if everything went correctly, and ``False`` if an error
occurred.

As for any autonomous algorithm, the platform will call this method only once.

.. _beat-system-algorithms-examples-simple-evaluator:

Simple autonomous evaluator algorithm (no parametrization)
..........................................................

At the very minimum, an evaluator algorithm class must look like this:

.. code-block:: python

   import numpy as np

   class Algorithm:

       def validate(self, hypothesis):
           # Check whether the hypothesis makes sense, then return a tuple
           # made of a boolean value (result) and some feedback (delta)
           ...
           return (result, {"value": np.float32(delta)})

       def write(self, outputs, processor_output_name, end_data_index):
           # Write something on the output; this method is called in sync
           # with the output writing of the processor algorithm
           outputs["out"].write({"value": np.int32(self.output)}, end_data_index)

The class must be called ``Algorithm`` and must have a method called
``validate()`` that takes as parameter a data format containing the
hypothesis that needs validation.
The method must return a tuple made of a boolean value and a feedback value,
which the processor uses to determine whether it should continue processing
the current data or move on.

.. _beat-system-algorithms-examples-parameterizable:

Parameterizable algorithm
.........................

The following is valid for all types of algorithms.

To implement a parameterizable algorithm, two things must be added to the
class: (1) a field in the JSON declaration of the algorithm containing the
default values as well as the types of the parameters, and (2) a method called
``setup()`` that takes one argument, a map containing the parameters of the
algorithm.

.. code-block:: javascript

   {
       ...
       "parameters": {
           "threshold": {
               "default": 0.5,
               "type": "float32"
           }
       },
       ...
   }

.. code-block:: python

   class Algorithm:

       def setup(self, parameters):
           # Retrieve the value of the parameters, falling back to the
           # default declared in the JSON file when no value was provided
           try:
               self.threshold = parameters['threshold']
           except KeyError:
               self.threshold = 0.5
           return True

       def process(self, inputs, data_loaders, outputs):
           # Read data from inputs, compute something, and write the result
           # of the computation on outputs
           ...
           return True

When retrieving the value of the parameters, one must not assume that a value
was provided for each parameter. This is why a *try: ... except: ...*
construct may be used in the ``setup()`` method.

.. _beat-system-algorithms-preparation:

Preparation of an algorithm
...........................

The following is valid for all types of algorithms.

Algorithms often need to compute some values or retrieve some data before
applying their mathematical logic. This is possible using the ``prepare()``
method.

.. code-block:: python

   class Algorithm:

       def prepare(self, data_loaders):
           # Retrieve and prepare some data
           data_loader = data_loaders.loaderOf('in2')
           (data, _, _) = data_loader[0]
           self.offset = data['in2'].value
           return True

       def process(self, inputs, data_loaders, outputs):
           # Read data from inputs, compute something, and write the result
           # of the computation on outputs
           ...
           return True

Data Synchronization in Sequential Algorithms
=============================================

One particularity of the |project| framework is how the data flow through a
given toolchain is synchronized. The framework is responsible for extracting
data units (images, speech segments, videos, etc.) from the database and
presenting them to the input endpoints of certain blocks, as specified in the
toolchain. Each time a new data unit is presented to the input of a block can
be thought of as an individual time unit. The algorithm implemented in a block
is responsible for the synchronization between its inputs and its output. In
other words, every time a data unit is produced by a dataset in an experiment,
the ``process()`` method of your algorithm is called to act upon it.

An algorithm may have one of two kinds of synchronicity: one-to-one and
many-to-one. These are discussed in detail in the sections below.

One-to-one synchronization
--------------------------

Here, the algorithm generates one output for every input entity (e.g., image,
video, speech file). For example, an image-based feature-extraction algorithm
would typically output one set of features every time it is called with a new
input image. A schematic diagram of one-to-one synchronization for an
algorithm is shown in the figure below:

.. image:: img/case-study-1.*

In the configuration shown in this figure, the algorithm block has two
endpoints: one input and one output. The inputs, the outputs and the block are
synchronized together (notice the color information). Each red box represents
one input unit (e.g., an image or a video) that is fed to the input interface
of the block. For each input received, the block produces one output unit,
shown as a blue box in the figure.

The code below shows how to implement an algorithm in this configuration:

.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, data_loaders, outputs):
           # to read the field "value" on the "in" input, use "data"
           # a possible declaration of "user/format/1" would be:
           # {
           #   "value": ...
           # }
           value = inputs['in'].data.value

           # do your processing and create the "output" value
           output = magical_processing(value)

           # to write "output" into the relevant endpoint use "write"
           # a possible declaration of "user/other/1" would be:
           # {
           #   "value": ...
           # }
           outputs['out'].write({'value': output})

           # No error occurred, so return True
           return True

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithmSequential
   {
   public:
       bool process(const InputList& inputs, const DataloaderList& data_load_list,
                    const OutputList& outputs) override
       {
           // to read the field "value" on the "in" input, use "data"
           // a possible declaration of "user/format/1" would be:
           // {
           //   "value": ...
           // }
           auto value = inputs["in"]->data()->value;

           // do your processing and create the "output" value
           auto output = magical_processing(value);

           // to write "output" into the relevant endpoint use "write"
           // a possible declaration of "user/other/1" would be:
           // {
           //   "value": ...
           // }
           user::other_1 result;
           result.value = output;
           outputs["out"]->write(&result);

           // No error occurred, so return true
           return true;
       }
   };

In this example, the platform will call the user algorithm every time a new
input block with the format ``user/format/1`` is available at the input.
Notice that no ``for`` loops are necessary in the user code: the platform
controls the looping for you.

A more complex case of one-to-one synchronization is shown in the following
figure:

.. image:: img/case-study-2.*

In such a configuration, the platform will ensure that each input unit at the
input endpoint ``in`` is associated with the correct input unit at the input
endpoint ``in2``.
For example, referring to the figure above, the items at the input ``in``
could be images and the items at the input ``in2`` could be labels; the
configuration depicted indicates that the first two input images have the same
label, say, ``l1``, whereas the next two input images have the same label,
say, ``l2``. The algorithm produces one output item at the endpoint ``out``
for each input object presented at the endpoint ``in``.

Example code implementing an algorithm that processes data in this scenario is
shown below:

.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, data_loaders, outputs):

           i1 = inputs['in'].data.value
           i2 = inputs['in2'].data.value

           out = magical_processing(i1, i2)

           outputs['out'].write({'value': out})

           return True

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithmSequential
   {
   public:
       bool process(const InputList& inputs, const DataloaderList& data_load_list,
                    const OutputList& outputs) override
       {
           auto i1 = inputs["in"]->data()->value;
           auto i2 = inputs["in2"]->data()->value;

           auto out = magical_processing(i1, i2);

           user::other_1 result;
           result.value = out;
           outputs["out"]->write(&result);

           return true;
       }
   };

Notice that we still don't require any sort of ``for`` loop! BEAT
*synchronizes* the inputs ``in`` and ``in2`` so that they are available to
your program as the dataset implementor defined them.

Many-to-one synchronization
---------------------------

Here, the algorithm produces a single output after processing a batch of
inputs. For example, the algorithm may produce a model for a *dog* after
processing all input images of the *dog* class. A block diagram illustrating
many-to-one synchronization is shown below:

.. image:: img/case-study-3.*

Here the synchronization is driven by the endpoint ``in2``. For each data unit
received at the input ``in2``, the algorithm generates one output unit.
Note that, here, multiple units received at the input ``in`` are accumulated
and associated with a single unit received at ``in2``. The user does not have
to handle the internal indexing: producing output data at the right moment is
enough for BEAT to understand that the output is synchronized with ``in2``.

The example below illustrates how such an algorithm could be implemented:

.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.objs = []

       def process(self, inputs, data_loaders, outputs):
           self.objs.append(inputs['in'].data.value)  # accumulates

           if not inputs['in2'].hasMoreData():
               out = magical_processing(self.objs)
               outputs['out'].write({'value': out})
               self.objs = []  # reset accumulator for next label

           return True

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithmSequential
   {
   public:
       bool process(const InputList& inputs, const DataloaderList& data_load_list,
                    const OutputList& outputs) override
       {
           objs.push_back(inputs["in"]->data()->value);  // accumulates

           if (!inputs["in2"]->hasMoreData())
           {
               auto out = magical_processing(objs);

               user::other_1 result;
               result.value = out;
               outputs["out"]->write(&result);

               objs.clear();  // reset accumulator for next label
           }

           return true;
       }

   public:
       // value_type: placeholder for the type of the "value" field of "in"
       std::vector<value_type> objs;
   };

Here, the units received at the endpoint ``in`` are accumulated as long as the
``hasMoreData()`` method attached to the input ``in2`` returns ``True``. When
``hasMoreData()`` returns ``False``, the corresponding label is read from
``in2``, and a result is produced at the endpoint ``out``. After an output
unit has been produced, the internal accumulator for ``in`` is cleared, and
the algorithm starts accumulating a new set of objects for the next label.

Unsynchronized Operation
------------------------

Not all inputs for a block need to be synchronized together.
In the diagram shown below, the block is synchronized with the inputs ``in``
and ``in2`` (as indicated by the green circle, which matches the colour of the
input lines connecting ``in`` and ``in2``). The output ``out`` is synchronized
with the block (and, as one can notice looking at the code below, an output is
written for every unit on ``in``). The input ``in3`` is synchronized neither
with the endpoints ``in`` and ``in2`` nor with the block. A processing block
that receives a previously calculated model and must score test samples is a
good example of this situation. In this case, the user is responsible for
reading out the contents of ``in3`` explicitly.

.. image:: img/case-study-4.*

In this case, the algorithm will include an explicit loop to read the
unsynchronized input (``in3``).

.. code-block:: python
   :linenos:

   class Algorithm:

       def __init__(self):
           self.models = []

       def prepare(self, data_loaders):
           # Loads the "model" data at the beginning
           loader = data_loaders.loaderOf('in3')
           for i in range(loader.count()):
               view = loader.view('in3', i)
               data, _, _ = view[0]
               self.models.append(data['in3'].value)
           return True

       def process(self, inputs, data_loaders, outputs):
           # N.B.: this will be called for every unit in `in'

           # Processes the current input in `in' and `in2', apply the
           # model/models
           out = magical_processing(inputs['in'].data.value,
                                    inputs['in2'].data.value,
                                    self.models)

           # Writes the output
           outputs['out'].write({'value': out})

           return True

.. code-block:: c++
   :linenos:

   class Algorithm: public IAlgorithmSequential
   {
   public:
       bool prepare(const beat::backend::cxx::DataLoaderList& data_load_list) override
       {
           auto loader = data_load_list["in3"];
           for (int i = 0 ; i < loader->count() ; ++i) {
               auto view = loader->view("in3", i);
               auto entry = (*view)[0];  // (data, start index, end index)
               auto data = std::get<0>(entry);
               // model_type: placeholder for the type of the "in3" data
               auto model = static_cast<model_type*>(data["in3"]);
               models.push_back(*model);
           }
           return true;
       }

       bool process(const InputList& inputs, const DataloaderList& data_load_list,
                    const OutputList& outputs) override
       {
           // N.B.: this will be called for every unit in `in'

           // Processes the current input in `in' and `in2', apply the model/models
           auto out = magical_processing(inputs["in"]->data()->value,
                                         inputs["in2"]->data()->value,
                                         models);

           // Writes the output
           user::other_1 result;
           result.value = out;
           outputs["out"]->write(&result);

           return true;
       }

   public:
       std::vector<model_type> models;
   };

In the example above, you have several inputs that are synchronized together,
but unsynchronized with the block you are writing your algorithm for. It may
also happen that you have even more unsynchronized data inputs. In this case,
using *groups* for the different sets of inputs makes the code easier to read.

.. it is safer to treat inputs using their *group*. For example:
.. .. code-block python
..    :linenos:
..    class Algorithm:
..        def __init__(self):
..            self.models = None
..        def prepare(self, data_loaders):
..            #??? Is the concept of groups any use when we have dataloaders assuming this scenario???
..            # Loads the "model" data at the beginning
..            loader = data_loaders.loaderOf('in3')
..            for i in range(loader.count()):
..                view = loader.view('in3', i)
..                data, _, _ = view[0]
..                self.models.append(data['in3'].value)
..        def process(self, inputs, data_loaders, outputs):
..            # N.B.: this will be called for every unit in `in'
..            # Loads the "model" data at the beginning, once
..            if self.models is None:
..                self.models = []
..                group = inputs.groupOf('in3')
..                while group.hasMoreData():
..                    group.next() #synchronously advances the data
..                    self.models.append(group['in3'].data.value)
..            # Processes the current input in `in' and `in2', apply the model/models
..            out = magical_processing(inputs['in'].data.value,
..                                     inputs['in2'].data.value,
..                                     self.models)
..            # Writes the output
..            outputs.write({'value': out})
..            return True
.. .. code-block c++
..    :linenos:
..    class Algorithm: public IAlgorithmSequential
..    {
..    public:
..        bool prepare(const beat::backend::cxx::DataLoaderList& data_load_list) override
..        {
..            auto loader = data_load_list["in3"];
..            for (int i = 0 ; i < loader->count() ; ++i) {
..                auto view = loader->view("in3", i);
..                std::map data;
..                std::tie(data, std::ignore, std::ignore) = (*view)[0];
..                auto model = static_cast(data["in3"]);
..                models.append(*model);
..            }
..            return true;
..        }
..        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
..        {
..            // N.B.: this will be called for every unit in `in'
..            // Processes the current input in `in' and `in2', apply the model/models
..            auto out = magical_processing(inputs["in"]->data()->value,
..                                          inputs["in2"]->data()->value,
..                                          models);
..            // Writes the output
..            user::other_1 result;
..            result.value = out;
..            outputs["out"]->write(&result);
..            return true;
..        }
..    public:
..        std::vector models;
..    };
.. In practice, encoding your algorithms using *groups* instead of looping over
.. individual inputs makes the code more robust to changes.

.. _beat-system-algorithms-input:

Handling input data
-------------------

.. _beat-system-algorithms-input-inputlist:

Input list
..........

An algorithm is given access to the **list of the inputs** of the processing
block. This list can be used to access each input individually, either by name
(see section :ref:`beat-system-algorithms-input-name`), by index, or by
iterating over the list:

.. code-block:: python

   # 'inputs' is the list of inputs of the processing block

   print(inputs['labels'].data_format)

   for index in range(0, inputs.length):
       print(inputs[index].data_format)

   for input in inputs:
       print(input.data_format)

   for input in inputs[0:2]:
       print(input.data_format)

Additionally, the following method is usable on a **list of inputs**:

.. py:method:: InputList.hasMoreData()

   Indicates whether there is (at least) another block of data to process on
   some of the inputs

.. _beat-system-algorithms-input-input:

Input
.....

Each input provides the following information:

.. py:attribute:: Input.name

   *(string)* Name of the input

.. py:attribute:: Input.data_format

   *(string)* Data format accepted by the input

.. py:attribute:: Input.data_index

   *(integer)* Index of the last block of data received on the input (see
   section :ref:`beat-system-algorithms-input-synchronization`)

.. py:attribute:: Input.data

   *(object)* The last block of data received on the input

The structure of the ``data`` object depends on the data format assigned to
the input. Note that ``data`` can be *None*.

.. _beat-system-algorithms-input-name:

Input naming
............

Each algorithm assigns a name of its choice to each input (and output, see
section :ref:`beat-system-algorithms-output-name`). This mechanism ensures
that algorithms are easily shareable between users.

For instance, in :numref:`beat-system-algorithms-input-naming`, two different
users (Joe and Bill) are using two different toolchains. Both toolchains have
one block with two inputs and one output, with a similar set of data formats
(*image/rgb* and *label* on the inputs, *array/float* on the output), although
not in the same order. The two blocks use different algorithms, which both
refer to their inputs and outputs using names of their choice.

Nevertheless, Joe can choose to use Bill's algorithm instead of his own.
When the algorithm is changed, BEAT will attempt to match each input with the
names (and types) declared by the algorithm. In case of ambiguity, the user
will be asked to resolve it manually.

In other words: the way the block is connected in the toolchain doesn't force
a naming scheme or a specific order of inputs on the algorithms used in that
block. As long as the set of data types (on the inputs and outputs) is
compatible for both the block and the algorithm, the algorithm can be used in
the block.

.. _beat-system-algorithms-input-naming:

.. figure:: ./img/inputs-naming.*

   Different toolchains, but interchangeable algorithms

The names of the inputs are assigned in the JSON declaration of the algorithm,
such as:

.. code-block:: javascript

   {
       ...
       "groups": [
           {
               "inputs": {
                   "name1": {
                       "type": "data_format_1"
                   },
                   "name2": {
                       "type": "data_format_2"
                   }
               }
           }
       ],
       ...
   }

.. _beat-system-algorithms-input-synchronization:

Inputs synchronization
......................

The data available on the different inputs from the synchronized channels are
(of course) synchronized. Let's consider the example toolchain in
:numref:`beat-system-algorithms-input-synchronization-example`, where:

* The image database provides two kinds of data: some *images* and their
  associated *labels*
* The *block A* receives both kinds of data via its inputs
* The *block B* only receives the *labels*
* Both algorithms are *data-driven*

The system will ask *block A* to process 6 images, one by one. On the second
input, the algorithm will find the correct label for the current image.
*Block B* will only be asked to process 2 labels.

The algorithm can retrieve the index of the current block of data of each of
its inputs by looking at their ``data_index`` attribute. For simplicity, the
list of inputs has two attributes (``current_data_index`` and
``current_end_data_index``) that indicate the data indexes currently used by
the synchronization mechanism of the platform.

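The pairing described above can be pictured with a toy model in which every
data unit covers a range of indexes, in the spirit of the ``data_index`` and
``current_end_data_index`` attributes. This is a hypothetical illustration,
not the platform's implementation: the ``Unit`` class, the ``label_for``
helper and the assumption that each label covers three consecutive images are
all invented for the example. In BEAT, this matching is done by the platform
itself.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class Unit:
       """Hypothetical data unit covering the indexes start..end (inclusive)."""
       value: str
       start: int
       end: int


   # Six images, one per index, and two labels covering three images each.
   images = [Unit(f"image{i}", i, i) for i in range(6)]
   labels = [Unit("l1", 0, 2), Unit("l2", 3, 5)]


   def label_for(index, label_units):
       """Return the label unit whose index range covers `index`."""
       for unit in label_units:
           if unit.start <= index <= unit.end:
               return unit
       raise LookupError(f"no label covers index {index}")


   # Block A is called once per image and sees the matching label alongside it.
   block_a_calls = [(image.value, label_for(image.start, labels).value)
                    for image in images]

   # Block B only receives the labels, so it is called once per label.
   block_b_calls = [label.value for label in labels]

   for call in block_a_calls:
       print("block A:", call)
   print("block B:", block_b_calls)
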

.. _beat-system-algorithms-input-synchronization-example:

.. figure:: ./img/inputs-synchronization.*
   :width: 80%

   Synchronization example


.. _beat-system-algorithms-input-unsynchronized:

Additional input methods for unsynchronized channels
....................................................

Unsynchronized input channels of algorithms can be accessed at will, and algorithms can use them any way they want. To be able to perform their job, they have access to additional methods.

The following method is usable on a **list of inputs**:

.. py:method:: InputList.next()

   Retrieves the next block of data on all the inputs **in a synchronized manner**

Let's come back to the example toolchain on :numref:`beat-system-algorithms-input-synchronization-example`, and assume that *block A* uses an autonomous algorithm. To iterate over all the data on its inputs, the algorithm would do:

.. code-block:: python

   class Algorithm:

       def process(self, inputs, data_loaders, outputs):
           # Iterate over all the unsynchronized data
           while inputs.hasMoreData():
               inputs.next()

               # Do something with inputs['images'].data and inputs['labels'].data
               ...

           # At this point, there is no more data available on inputs['images'] and
           # inputs['labels']

           return True

The following methods are usable on an ``input``, in cases where the algorithm doesn't care about the synchronization of some of its inputs:

.. py:method:: Input.hasMoreData()

   Indicates whether there is (at least) one more block of data available on the input

.. py:method:: Input.next()

   Retrieves the next block of data

   .. warning::

      Once this method has been called by an algorithm, the input is no longer automatically synchronized with the other inputs of the block.

In the following example, the algorithm desynchronizes one of its inputs but keeps the others synchronized and iterates over all their data:

.. code-block:: javascript

   {
       ...
       "groups": [
           {
               "inputs": {
                   "images": {
                       "type": "image/rgb"
                   },
                   "labels": {
                       "type": "label"
                   },
                   "desynchronized": {
                       "type": "number"
                   }
               }
           }
       ],
       ...
   }

.. code-block:: python

   class Algorithm:

       def process(self, inputs, data_loaders, outputs):
           # Desynchronize the third input. From now on, inputs['desynchronized'].data
           # and inputs['desynchronized'].data_index won't change
           inputs['desynchronized'].next()

           # Iterate over all the data on the inputs still synchronized
           while inputs.hasMoreData():
               inputs.next()

               # Do something with inputs['images'].data and inputs['labels'].data
               ...

           # At this point, there is no more data available on inputs['images'] and
           # inputs['labels'], but there might be more on inputs['desynchronized']

           return True


.. _beat-system-algorithms-input-feedbackloop:

Feedback inputs
...............

:numref:`beat-system-algorithms-input-feedbackloop-example` shows a toolchain containing a feedback loop. A special kind of input is needed in this scenario: a *feedback input*, which isn't synchronized with the other inputs and can be freely used by the algorithm.

Feedback inputs aren't yet implemented in the prototype of the platform. This will be addressed in a later version.

.. _beat-system-algorithms-input-feedbackloop-example:

.. figure:: ./img/feedback-loop.*

   Feedback loop


.. _beat-system-algorithms-dataloaders:

Data loaders
------------

.. _beat-system-algorithms-dataloaders-dataloaderlist:

DataLoader list
...............

An algorithm is given access to the **list of data loaders** of the processing block. This list can be used to access each data loader individually, either by its channel name (see :ref:`beat-system-algorithms-input-name`), its index, or by iterating over the list:

.. code-block:: python

   # 'data_loaders' is the list of data loaders of the processing block

   # Retrieve a data loader by name
   data_loader = data_loaders['labels']

   # Retrieve a data loader by index
   for index in range(0, len(data_loaders)):
       data_loader = data_loaders[index]

   # Iterate over all data loaders
   for data_loader in data_loaders:
       ...

   # Retrieve the data loader an input belongs to, by input name
   data_loader = data_loaders.loaderOf('labels')


.. _beat-system-algorithms-dataloaders-dataloader:

DataLoader
..........

Provides access to data from a group of inputs synchronized together. See :py:class:`DataLoader`.


.. _beat-system-algorithms-output:

Handling output data
--------------------

.. _beat-system-algorithms-output-outputlist:

Output list
...........

An algorithm is given access to the **list of the outputs** of the processing block. This list can be used to access each output individually, either by its name (see section :ref:`beat-system-algorithms-output-name`), its index, or by iterating over the list:

.. code-block:: python

   # 'outputs' is the list of outputs of the processing block

   print(outputs['features'].data_format)

   for index in range(0, outputs.length):
       outputs[index].write(...)

   for output in outputs:
       output.write(...)

   for output in outputs[0:2]:
       output.write(...)


.. _beat-system-algorithms-output-output:

Output
......

Each output provides the following information:

.. py:attribute:: Output.name

   *(string)* Name of the output

.. py:attribute:: Output.data_format

   *(string)* Format of the data written on the output

And the following method:

.. py:method:: Output.write(data, end_data_index=None)

   Writes a block of data on the output

We'll look at the usage of this method through some examples in the following sections.


.. _beat-system-algorithms-output-name:

Output naming
.............

As with inputs, each algorithm assigns a name of its choice to each output (see section :ref:`beat-system-algorithms-input-name` for more details) by including them in the JSON declaration of the algorithm.

.. code-block:: javascript

   {
       ...
       "groups": [
           {
               "inputs": {
                   ...
               },
               "outputs": {
                   "name1": {
                       "type": "data_format1"
                   },
                   "name2": {
                       "type": "data_format2"
                   }
               }
           }
       ],
       ...
   }


.. _beat-system-algorithms-output-example1:

Example 1: Write one block of data for each received block of data
..................................................................

.. _beat-system-algorithms-output-example1-figure:

.. figure:: ./img/outputs-example1.*

   Example 1: 6 images as input, 6 blocks of data produced

Consider the example toolchain on :numref:`beat-system-algorithms-output-example1-figure`. We will implement a *data-driven* algorithm that writes one block of data on the output of the block for each image received on its inputs. This is the simplest case.

.. code-block:: javascript

   {
       ...
       "groups": [
           {
               "inputs": {
                   "images": {
                       "type": "image/rgb"
                   },
                   "labels": {
                       "type": "label"
                   }
               },
               "outputs": {
                   "features": {
                       "type": "array/float"
                   }
               }
           }
       ],
       ...
   }

.. code-block:: python

   class Algorithm:

       def process(self, inputs, data_loaders, outputs):
           # Compute something from inputs['images'].data and inputs['labels'].data
           # and store the result in 'data'
           data = ...

           # Write our data block on the output
           outputs['features'].write(data)

           return True

The structure of the ``data`` object depends on the data format assigned to the output.


.. _beat-system-algorithms-output-example2:

Example 2: Skip some blocks of data
...................................

.. _beat-system-algorithms-output-example2-figure:

.. figure:: ./img/outputs-example2.*

   Example 2: 6 images as input, 4 blocks of data produced, 2 blocks of data skipped

Consider the example toolchain on :numref:`beat-system-algorithms-output-example2-figure`.
This time, our algorithm will use a criterion to decide whether it can perform its computation on an image, and tell the platform that, for a particular data index, no data is available.

.. code-block:: javascript

   {
       ...
       "groups": [
           {
               "inputs": {
                   "images": {
                       "type": "image/rgb"
                   },
                   "labels": {
                       "type": "label"
                   }
               },
               "outputs": {
                   "features": {
                       "type": "array/float"
                   }
               }
           }
       ],
       ...
   }

.. code-block:: python

   class Algorithm:

       def process(self, inputs, data_loaders, outputs):
           # Use a criterion on the image to determine if we can perform our
           # computation on it or not
           if self.can_compute(inputs['images'].data):
               # Compute something from inputs['images'].data and inputs['labels'].data
               # and store the result in 'data'
               data = ...

               # Write our data block on the output
               outputs['features'].write(data)
           else:
               # Tell the platform that no data is available for this image
               outputs['features'].write(None)

           return True

       def can_compute(self, image):
           # Implementation of our criterion
           ...
           return True  # or False


.. _beat-system-algorithms-output-example3:

Example 3: Write one block of data related to several received blocks of data
.............................................................................

.. _beat-system-algorithms-output-example3-figure:

.. figure:: ./img/outputs-example3.*

   Example 3: 6 images as input, 2 blocks of data produced

Consider the example toolchain on :numref:`beat-system-algorithms-output-example3-figure`. This time, our algorithm will compute something using all the images with the same label (all the dogs, all the cats) and write only one block of data related to all those images.

The key here is the correct usage of the **current end data index** of the input list to specify the indexes of the blocks of data we write on the output. This ensures that the data will be synchronized everywhere in the toolchain: the platform can then tell, for each of our data blocks, which images and which label it relates to (see section :ref:`beat-system-algorithms-input-synchronization`).
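
To make the target indexes concrete, the plain-Python sketch below computes them offline. It is not BEAT code: the ``output_blocks()`` helper and the dog/cat layout are hypothetical. For each run of consecutive images sharing a label, it returns the index range covered by the single output block; the last index of each run is the value that must be passed as ``end_data_index`` to ``write()``.

.. code-block:: python

   from itertools import groupby
   from typing import List, Tuple


   def output_blocks(labels: List[str]) -> List[Tuple[str, int, int]]:
       """Return (label, first_index, end_data_index) for each run of equal labels.

       Hypothetical helper: it sees the whole label sequence at once, unlike a
       data-driven algorithm, which receives one image per call.
       """
       blocks = []
       index = 0
       for label, run in groupby(labels):
           length = sum(1 for _ in run)  # number of consecutive images with this label
           blocks.append((label, index, index + length - 1))
           index += length
       return blocks


   # Assumed layout: three dog images followed by three cat images
   labels = ["dog", "dog", "dog", "cat", "cat", "cat"]
   print(output_blocks(labels))

For the assumed layout the result is ``[('dog', 0, 2), ('cat', 3, 5)]``, i.e. two output blocks, as in the figure.
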

Additionally, since we can't know in advance whether the image currently processed is the last one with the current label, we need to memorize the current data index of the input list so that we can assign it later, when we actually write the data block on the output.

.. code-block:: javascript

   {
       ...
       "groups": [
           {
               "inputs": {
                   "images": {
                       "type": "image/rgb"
                   },
                   "labels": {
                       "type": "label"
                   }
               },
               "outputs": {
                   "features": {
                       "type": "array/float"
                   }
               }
           }
       ],
       ...
   }

.. code-block:: python

   class Algorithm:

       def __init__(self):
           self.data = None                 # Block of data updated each time we
                                            # receive a new image
           self.current_label = None        # Label of the images currently processed
           self.previous_data_index = None  # Data index of the input list during the
                                            # processing of the previous image

       def process(self, inputs, data_loaders, outputs):
           # Determine if we already processed some image(s)
           if self.data is not None:
               # Determine if the label has changed since the last image we processed
               if inputs['labels'].data.name != self.current_label:
                   # Write the block of data on the output
                   outputs['features'].write(self.data, self.previous_data_index)
                   self.data = None

           # Memorize the current data index of the input list
           self.previous_data_index = inputs.current_end_data_index

           # Create a new block of data if necessary
           if self.data is None:
               self.data = ...

               # Remember the label we are currently processing
               self.current_label = inputs['labels'].data.name

           # Compute something from inputs['images'].data and inputs['labels'].data
           # and update the content of 'self.data'
           ...

           # Determine if this was the last block of data or not
           if not inputs.hasMoreData():
               # Write the block of data on the output
               outputs['features'].write(self.data, inputs.current_end_data_index)

           return True


.. _beat-system-algorithms-loop-channel:

Soft loop communication
-----------------------

The processor and evaluator algorithm components of the soft loop macro block communicate with each other using a ``LoopChannel`` object.
This object defines the two data formats used for the request and for the answer that transit through the loop channel. This class is only meant to be used by the algorithm implementer.


.. _beat-system-algorithms-api-migration:

Migrating from API v1 to API v2
-------------------------------

Algorithms written with BEAT's algorithm API v1 can still be run under the v2 execution model. They are now considered legacy algorithms and should be ported to API v2 as soon as possible.

API v2 provides two different types of algorithms:

- Sequential
- Autonomous

The Sequential type follows the same code execution model as the v1 API, meaning that the ``process`` method is called once for each input item.

The Autonomous type lets the developer load the input data at will; therefore, the ``process`` method is only called once. This allows, for example, optimizing the loading of data into GPU memory for faster execution.

The most straightforward migration path from v1 to v2 is to turn the algorithm into a Sequential one, which requires only a few code changes: in the example below, the ``process`` method simply gains a ``data_loaders`` argument.

API V1:

.. code-block:: python

   class Algorithm:

       def setup(self, parameters):
           self.sync = parameters['sync']
           return True

       def process(self, inputs, outputs):
           if inputs[self.sync].isDataUnitDone():
               outputs['out'].write({
                   'value': inputs['in1'].data.value + inputs['in2'].data.value,
               })

           return True

API V2 sequential:

.. code-block:: python

   class Algorithm:

       def setup(self, parameters):
           self.sync = parameters['sync']
           return True

       def process(self, inputs, data_loaders, outputs):
           if inputs[self.sync].isDataUnitDone():
               outputs['out'].write({
                   'value': inputs['in1'].data.value + inputs['in2'].data.value,
               })

           return True

API V2 autonomous:

.. code-block:: python

   class Algorithm:

       def setup(self, parameters):
           self.sync = parameters['sync']
           return True

       def process(self, data_loaders, outputs):
           data_loader = data_loaders.loaderOf('in1')

           # Iterate over the data units of the input named by self.sync
           for i in range(data_loader.count(self.sync)):
               view = data_loader.view(self.sync, i)

               # Last entry of the view: the point where the versions above
               # see isDataUnitDone()
               (data, start, end) = view[view.count() - 1]

               outputs['out'].write({
                   'value': data['in1'].value + data['in2'].value,
               }, end)

           return True

.. include:: links.rst