.. vim: set fileencoding=utf-8 :

.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/          ..
.. Contact: beat.support@idiap.ch                                             ..
..                                                                            ..
.. This file is part of the beat.docs module of the BEAT platform.            ..
..                                                                            ..
.. Commercial License Usage                                                   ..
.. Licensees holding valid commercial BEAT licenses may use this file in      ..
.. accordance with the terms contained in a written agreement between you     ..
.. and Idiap. For further information contact tto@idiap.ch                    ..
..                                                                            ..
.. Alternatively, this file may be used under the terms of the GNU Affero     ..
.. Public License version 3 as published by the Free Software and appearing   ..
.. in the file LICENSE.AGPL included in the packaging of this file.           ..
.. The BEAT platform is distributed in the hope that it will be useful, but   ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE.                                       ..
..                                                                            ..
.. You should have received a copy of the GNU Affero Public License along     ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/.          ..


.. _beat-system-algorithms:

============
 Algorithms
============

Algorithms are user-defined pieces of software that run within the blocks of a
toolchain. An algorithm can read data on the input(s) of the block and write
processed data on its output(s). We refer to the inputs and outputs
collectively as *endpoints*. Algorithms are, hence, key components of
scientific experiments, since they formally describe how to transform raw
data into higher-level concepts such as classes.


An algorithm lies at the core of each processing block and may be subject to
parametrization. Inputs and outputs of an algorithm have well-defined data
formats. The format of the data on each input and output of the block is
defined at a higher level in the BEAT framework. The implementation of the
algorithm is expected to respect the format declared for each endpoint.

:numref:`beat-core-overview-block` displays the relationship between a
processing block and its algorithm.

.. _beat-core-overview-block:
.. figure:: ./img/block.*

   Relationship between a processing block and its algorithm

Typically, an algorithm will process data units received at the input
endpoints, and push the relevant results to the output endpoint. Each algorithm
must have at least one input and, except for analyzers (described below), at
least one output. The links in a toolchain connect the output of one block to
the input of another, effectively connecting algorithms together and thus
determining the information flow through the toolchain.

Blocks at the beginning of the toolchain are typically connected to datasets,
and blocks at the end of a toolchain are connected to analyzers (special
algorithms with no output). BEAT is responsible for
delivering inputs from the desired datasets into the toolchain and through your
algorithms. This drives the synchronization of information-flow through the
toolchain. Flow synchronization is determined by data units produced from a
dataset and injected into the toolchain.

.. note:: **Naming Convention**

   Algorithms are named using three values joined by a ``/`` (slash) operator:

     * **username**: indicates the author of the algorithm
     * **name**: indicates the name of the algorithm
     * **version**: indicates the version (integer starting from ``1``) of the
       algorithm

   Each tuple of these three components defines a *unique* algorithm name
   inside the BEAT ecosystem.
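
As an illustration of this convention, a full algorithm name can be split back
into its three components. The sketch below is plain Python and is not part of
the BEAT API: the helper ``parse_algorithm_name()`` and the name
``tutorial/pca/1`` are assumptions made for this example only.

.. code-block:: python

    from typing import NamedTuple


    class AlgorithmName(NamedTuple):
        username: str
        name: str
        version: int


    def parse_algorithm_name(fullname: str) -> AlgorithmName:
        """Splits ``username/name/version`` into its three components."""
        parts = fullname.split("/")
        if len(parts) != 3 or not all(parts):
            raise ValueError("expected 'username/name/version', got %r" % fullname)
        username, name, version = parts
        # The version is an integer starting from 1
        if not version.isdigit() or int(version) < 1:
            raise ValueError("version must be an integer starting from 1")
        return AlgorithmName(username, name, int(version))


    parsed = parse_algorithm_name("tutorial/pca/1")
    print(parsed.username, parsed.name, parsed.version)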


.. _beat-system-algorithms-types:

Algorithm types
===============

The current version of the BEAT framework has two algorithm types, which differ
in the way they handle data samples:

- Sequential
- Autonomous

Previous versions of BEAT implemented only one type of algorithm (referred to
as the v1 algorithm). The sequential algorithm type is the direct successor of
the v1 algorithm. For migration information, see
:ref:`beat-system-algorithms-api-migration`.

The platform now also provides the concept of a soft loop. The soft loop allows
the implementation of supervised processing within a macro block.

Sequential
----------

The sequential algorithm is **data-driven**: the algorithm is typically
provided one data sample at a time and must immediately produce some output
data.

Autonomous
----------

The autonomous algorithm, as its name suggests, is responsible for loading the
data samples it needs in order to do its work. It is also responsible for
writing the appropriate amount of data on its outputs.

Furthermore, the way the algorithm handles the data is highly configurable and
covers a wide range of possible scenarios.

Loop
----

A loop is composed of three elements:

- A processor algorithm
- An evaluator algorithm
- A LoopChannel

The two algorithms work as a pair, using the LoopChannel to communicate. The
processor algorithm is responsible for applying some transformation or analysis
to a set of data and then sending the result to the evaluator for validation.
The role of the evaluator is to provide feedback to the processor, which will
either continue processing the same block of data or move on to the next one
until all data is exhausted. The output writing of the evaluator is
synchronized with the output writing of the processor.

In the sequential versions, reading is also synchronized, so that the
evaluator can read data at the same pace as the processor.

The two algorithms are available in both sequential and autonomous form.
However, there are only three valid combinations:

========== ==========
 Processor Evaluator
========== ==========
Autonomous Autonomous
Sequential Sequential
Sequential Autonomous
========== ==========


.. _beat-system-algorithms-definition:

Definition of an Algorithm
==========================

An algorithm is defined by two distinct components:

* a `JSON`_ object with several fields, specifying the inputs, the outputs,
  the parameters and additional information such as the language in which it
  is implemented.
* source code (and/or [later] binary code) describing how to transform the input
  data.


.. _beat-system-algorithms-definition-json:

JSON Declaration
----------------

A `JSON`_ declaration of an algorithm consists of several fields. For example,
the following declaration is the one of an algorithm implementing
principal component analysis (PCA):

.. code-block:: javascript

    {
        "schema_version": 2,
        "language": "python",
        "api_version": 2,
        "type": "sequential",
        "splittable": false,
        "groups": [
            {
                "inputs": {
                    "image": {
                        "type": "system/array_2d_uint8/1"
                    }
                },
                "outputs": {
                    "subspace": {
                        "type": "tutorial/linear_machine/1"
                    }
                }
            }
        ],
        "parameters": {
            "number-of-components": {
                "default": 5,
                "type": "uint32"
            }
        },
        "description": "Principal Component Analysis (PCA)"
    }

Here is a description of each field in the example above:

*   **schema_version:** specifies which schema version must be used to validate the file content.

*   **api_version:** specifies the version of the API implemented by the algorithm.

*   **type:** specifies the type of the algorithm. Depending on that, the execution model will change.

*   **language:** specifies the language in which the algorithm is implemented.

*   **splittable:** indicates whether the algorithm can be parallelized into chunks or not.

*   **parameters:** lists the parameters of the algorithm, describing both default values and their types.

*   **groups:** gives information about the inputs and outputs of the algorithm. They are provided as a list of dictionaries, each element of this list being associated with a database *channel*. The group that contains the outputs is the *synchronization channel*. By default, the BEAT framework automatically loops over the synchronization channel, and user code must not loop on this group. In contrast, it is the responsibility of the user to load data from the other groups. This is described in more detail in the following subsections.

*   **description:** is optional and gives a short description of the algorithm.
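
To make the role of these fields more concrete, the sketch below loads a
declaration like the one above with Python's standard ``json`` module and
inspects it. The helpers ``synchronization_group()`` and
``default_parameters()`` are hypothetical names introduced for this
illustration and are not part of the BEAT API.

.. code-block:: python

    import json

    declaration = json.loads("""
    {
        "schema_version": 2,
        "language": "python",
        "api_version": 2,
        "type": "sequential",
        "splittable": false,
        "groups": [
            {
                "inputs": {"image": {"type": "system/array_2d_uint8/1"}},
                "outputs": {"subspace": {"type": "tutorial/linear_machine/1"}}
            }
        ],
        "parameters": {
            "number-of-components": {"default": 5, "type": "uint32"}
        }
    }
    """)


    def synchronization_group(declaration):
        """Returns the group that declares outputs (the synchronization channel)."""
        for group in declaration["groups"]:
            if group.get("outputs"):
                return group
        return None  # no group declares outputs


    def default_parameters(declaration):
        """Maps each parameter name to its declared default value."""
        return {
            name: spec.get("default")
            for name, spec in declaration.get("parameters", {}).items()
        }


    sync = synchronization_group(declaration)
    print(sorted(sync["inputs"]), sorted(sync["outputs"]))
    print(default_parameters(declaration))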

.. note::

   The graphical interface of BEAT provides user-friendly editors to configure
   the main components of the system (for example: algorithms, data formats,
   etc.), which simplifies their `JSON`_ declaration definition. One needs
   only to declare an algorithm using the described specifications when not
   using this graphical interface.


.. _beat-system-algorithms-definition-analyzer:

Analyzer
........

At the end of the processing workflow of an experiment, there is a special
kind of algorithm, which does not yield any *output* but instead produces
*results*. These algorithms are called **analyzers**.

*Results* of an experiment are reported back to the user. Data privacy is very
important in the BEAT framework and therefore only a limited number of data
formats can be employed as results in an analyzer, such as boolean, integers,
floating point values, strings (of limited size), as well as plots (such as
scatter or bar plots).

For example, the following declaration is the one of a simple analyzer, which
generates an ROC curve as well as a few other metrics.

.. code-block:: javascript

    {
      "language": "python",
      "groups": [
        {
          "inputs": {
            "scores": {
              "type": "tutorial/probe_scores/1"
            }
          }
        }
      ],
      "results": {
        "far": {
          "type": "float32",
          "display": true
        },
        "roc": {
          "type": "plot/scatter/1",
          "display": false
        },
        "number_of_positives": {
          "type": "int32",
          "display": false
        },
        "frr": {
          "type": "float32",
          "display": true
        },
        "eer": {
          "type": "float32",
          "display": true
        },
        "threshold": {
          "type": "float32",
          "display": false
        },
        "number_of_negatives": {
          "type": "int32",
          "display": false
        }
      }
    }


.. _beat-system-algorithms-definition-code:

Source code
-----------

The BEAT framework has been designed to support algorithms written in different
programming languages. However, for each language, a corresponding back-end
needs to be implemented, which is in charge of connecting the inputs and
outputs to the algorithm and running its code as expected. In this section,
we describe the implementation of algorithms in the Python and C++ programming
languages.


|project| treats algorithms as objects that are derived from the class
``Algorithm`` when using Python. In C++, they should be derived from
``IAlgorithmLegacy``, ``IAlgorithmSequential``, or ``IAlgorithmAutonomous``,
depending on the algorithm type. To define a new algorithm,
at least one method must be implemented:

  * ``process()``: the method that actually processes input and produces
    outputs.

The code example below illustrates the implementation of an algorithm (in
Python):

.. code-block:: python
   :linenos:

   class Algorithm:

       def process(self, inputs, data_loaders, outputs):
           # here, you read inputs, process and write results to outputs
           ...
           return True


Here is the equivalent example for a sequential algorithm in C++:

.. code-block:: c++
   :linenos:

    class Algorithm: public IAlgorithmSequential
    {
    public:
        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
        {
            // here, you read inputs, process and write results to outputs
            return true;
        }
    };

.. _beat-system-algorithms-examples:

Examples
........

To implement a new algorithm, one must write a class following a few
conventions. In the following, examples of such classes are provided.

.. _beat-system-algorithms-examples-simple-sequential:

Simple sequential algorithm (no parametrization)
................................................

At the very minimum, an algorithm class must look like this:

.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):
            # Read data from inputs, compute something, and write the result
            # of the computation on outputs
            ...
            return True

The class must be called ``Algorithm`` and must have a method called
``process()``, that takes as parameters a list of inputs (see section
:ref:`beat-system-algorithms-input-inputlist`), a list of data loaders (see section
:ref:`beat-system-algorithms-dataloaders-dataloaderlist`) and a list of outputs
(see section :ref:`beat-system-algorithms-output-outputlist`). This method must
return ``True`` if everything went correctly, and ``False`` if an error
occurred.

The platform will call this method once per block of data available on the
`synchronized` inputs of the block.

.. _beat-system-algorithms-examples-simple-autonomous:

Simple autonomous algorithm (no parametrization)
................................................

At the very minimum, an algorithm class must look like this:

.. code-block:: python

    class Algorithm:

        def process(self, data_loaders, outputs):
            # Read data from data_loaders, compute something, and write the
            # result of the computation on outputs
            ...
            return True

The class must be called ``Algorithm`` and must have a method called
``process()``, that takes as parameters a list of data loaders (see section
:ref:`beat-system-algorithms-dataloaders`) and a list of outputs (see
section :ref:`beat-system-algorithms-output-outputlist`). This method must
return ``True`` if everything went correctly, and ``False`` if an error
occurred.

The platform will call this method only once, as it is the responsibility of
the algorithm to load the appropriate amount of data and process it.


.. _beat-system-algorithms-examples-simple-processor:

Simple autonomous processor algorithm (no parametrization)
..........................................................

At the very minimum, a processor algorithm class must look like this:

.. code-block:: python

    import numpy as np


    class Algorithm:

        def process(self, data_loaders, outputs, loop_channel):
            # Read data from data_loaders, compute something, and submit the
            # resulting hypothesis to the evaluator for validation
            ...
            is_valid, feedback = loop_channel.validate({"value": np.float64(some_value)})
            # Check is_valid, continue accordingly, and write the result
            # of the computation on outputs
            ...
            return True


The class must be called ``Algorithm`` and must have a method called
``process()``, that takes as parameters a list of data loaders (see section
:ref:`beat-system-algorithms-dataloaders-dataloaderlist`), a list of outputs
(see section :ref:`beat-system-algorithms-output-outputlist`) and a loop channel
(see section :ref:`beat-system-algorithms-loop-channel`). This method must
return ``True`` if everything went correctly, and ``False`` if an error
occurred.

Since this is an autonomous algorithm, the platform will call this method only
once; it is the responsibility of the algorithm to load the appropriate amount
of data and process it.


.. _beat-system-algorithms-examples-simple-evaluator:

Simple autonomous evaluator algorithm (no parametrization)
..........................................................

At the very minimum, an evaluator algorithm class must look like this:

.. code-block:: python

    import numpy as np


    class Algorithm:

        def validate(self, hypothesis):
            # Decide whether the hypothesis makes sense and return a tuple
            # made of a boolean value and some feedback
            ...
            return (result, {"value": np.float32(delta)})

        def write(self, outputs, processor_output_name, end_data_index):
            # Write something on the evaluator's output; this is called in
            # sync with the write of the processor algorithm
            outputs["out"].write({"value": np.int32(self.output)}, end_data_index)


The class must be called ``Algorithm`` and must have a method called
``validate()``, that takes as parameter a dataformat that will contain the
hypothesis that needs validation. The method must return a tuple made of a
boolean value and a feedback value that will be used by the processor to
determine whether it should continue processing the current data or move
further. As shown above, the class also provides a ``write()`` method, which
is called in sync with the write of the processor algorithm.
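
The exchange between a processor and an evaluator can be simulated in plain
Python, without the platform. The sketch below is only an illustration of the
validate/feedback cycle: ``FakeLoopChannel``, ``Evaluator`` and ``refine()``
are hypothetical stand-ins written for this example, and the numeric rule
(accept a value once it is close enough to a target) is an arbitrary
assumption.

.. code-block:: python

    class Evaluator:
        """Accepts a hypothesis once it is close enough to a target value."""

        def __init__(self, target, tolerance):
            self.target = target
            self.tolerance = tolerance

        def validate(self, hypothesis):
            delta = self.target - hypothesis["value"]
            return abs(delta) <= self.tolerance, {"value": delta}


    class FakeLoopChannel:
        """Hypothetical stand-in for the LoopChannel: forwards to the evaluator."""

        def __init__(self, evaluator):
            self.evaluator = evaluator

        def validate(self, hypothesis):
            return self.evaluator.validate(hypothesis)


    def refine(start, loop_channel, max_rounds=50):
        """Processor side: refines one estimate until the evaluator accepts it."""
        value = start
        for rounds in range(1, max_rounds + 1):
            is_valid, feedback = loop_channel.validate({"value": value})
            if is_valid:
                return value, rounds
            value += 0.5 * feedback["value"]  # move halfway towards the target
        raise RuntimeError("no valid hypothesis found")


    channel = FakeLoopChannel(Evaluator(target=10.0, tolerance=0.1))
    value, rounds = refine(0.0, channel)
    print(value, rounds)

In this toy setting the processor keeps working on the same piece of data
until the feedback tells it that the hypothesis is acceptable, which mirrors
the "continue or move further" decision described above.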


.. _beat-system-algorithms-examples-parameterizable:

Parameterizable algorithm
.........................

The following is valid for all types of algorithms.

To implement a parameterizable algorithm, two things must be added:
(1) a field in the JSON declaration of the algorithm containing the default
values as well as the types of the parameters, and (2) a method called
``setup()``, that takes one argument, a map containing the parameters of the
algorithm.

.. code-block:: javascript

    {
        ...
        "parameters": {
            "threshold": {
                "default": 0.5,
                "type": "float32"
            }
        },
        ...
    }

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            # Retrieve the value of the parameters
            self.threshold = parameters['threshold']
            return True

        def process(self, inputs, data_loaders, outputs):
            # Read data from inputs, compute something, and write the result
            # of the computation on outputs
            ...
            return True

When retrieving the value of the parameters, one must not assume that a value
was provided for each parameter. This is why we may use a *try: ... except: ...*
construct in the ``setup()`` method.
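
A minimal sketch of such a defensive ``setup()`` method is given below. It
reuses the ``threshold`` parameter declared above and repeats its default value
in the code. Treating ``parameters`` as a dictionary-like object that raises
``KeyError`` for a missing entry is an assumption of this example.

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            # Fall back to the declared default when no value was provided
            try:
                self.threshold = float(parameters["threshold"])
            except KeyError:
                self.threshold = 0.5  # same default as in the JSON declaration
            return True


    algorithm = Algorithm()
    algorithm.setup({})  # no value provided: the default is used
    print(algorithm.threshold)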

.. _beat-system-algorithms-preparation:

Preparation of an algorithm
...........................

The following is valid for all types of algorithms.

Often, algorithms need to compute some values or retrieve some data prior to
applying their mathematical logic.

This is possible using the ``prepare()`` method.

.. code-block:: python

    class Algorithm:

        def prepare(self, data_loaders):
            # Retrieve and prepare some data.
            data_loader = data_loaders.loaderOf('in2')
            (data, _, _) = data_loader[0]
            self.offset = data['in2'].value
            return True

        def process(self, inputs, data_loaders, outputs):
            # Read data from inputs, compute something, and write the result
            # of the computation on outputs
            ...
            return True


Data Synchronization in Sequential Algorithms
=============================================

One particularity of the |project| framework is how the data-flow through a
given toolchain is synchronized. The framework is responsible for extracting
data units (images, speech-segments, videos, etc.) from the database and
presenting them to the input endpoints of certain blocks, as specified in the
toolchain. Each presentation of a new data unit to the input of a block can
be thought of as an individual time-unit. The algorithm implemented in a block
is responsible for the synchronization between its inputs and its output. In
other words, every time a data unit is produced by a dataset in an experiment,
the ``process()`` method of your algorithm is called to act upon it.

An algorithm may have one of two kinds of synchronization: one-to-one and
many-to-one. These are discussed in detail in separate sections below.


One-to-one synchronization
--------------------------

Here, the algorithm generates one output for every input entity (e.g., image,
video, speech-file).  For example, an image-based feature-extraction algorithm
would typically output one set of features every time it is called with a new
input image. A schematic diagram of one-to-one synchronization for an algorithm
is shown in the figure below:

.. image:: img/case-study-1.*

In the configuration shown in this figure, the algorithm block has two
endpoints: one input and one output. The input, the output and the block are
synchronized together (notice the color information). Each red box represents
one input unit (e.g., an image, or a video), that is fed to the input interface
of the block.  Corresponding to each input received, the block produces one
output unit, shown as a blue box in the figure.

An example code showing how to implement an algorithm in this configuration is shown below:

.. code-block:: python
   :linenos:

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            # to read the field "value" on the "in" input, use "data"
            # a possible declaration of "user/format/1" would be:
            # {
            #   "value": ...
            # }
            value = inputs['in'].data.value

            # do your processing and create the "output" value
            output = magical_processing(value)

            # to write "output" into the relevant endpoint use "write"
            # a possible declaration of "user/other/1" would be:
            # {
            #   "value": ...
            # }
            outputs['out'].write({'value': output})

            # No error occurred, so return True
            return True


.. code-block:: c++
   :linenos:

    class Algorithm: public IAlgorithmSequential
    {
    public:
        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
        {
            // to read the field "value" on the "in" input, use "data"
            // a possible declaration of "user/format/1" would be:
            // {
            //   "value": ...
            // }
            auto value = inputs["in"]->data<user::format_1>()->value;

            // do your processing and create the "output" value
            auto output = magical_processing(value);

            // to write "output" into the relevant endpoint use "write"
            // a possible declaration of "user/other/1" would be:
            // {
            //   "value": ...
            // }
            user::other_1 result;
            result.value = output;

            outputs["out"]->write(&result);

            // No error occurred, so return true
            return true;
        }
    };


In this example, the platform will call the user algorithm every time a new
input block with the format ``user/format/1`` is available at the input. Notice
that no ``for`` loops are necessary in the user code. The platform controls the
looping for you.
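
To visualize what "the platform controls the looping" means, the sketch below
plays the role of the platform with a small driver function. Everything except
the ``process()`` signature is a hypothetical stand-in written for this
illustration: ``FakeOutput``, ``run_block()`` and the doubling operation are
not part of BEAT.

.. code-block:: python

    from types import SimpleNamespace


    class FakeOutput:
        """Hypothetical stand-in for an output endpoint: records what is written."""

        def __init__(self):
            self.written = []

        def write(self, data):
            self.written.append(data)


    class Algorithm:

        def process(self, inputs, data_loaders, outputs):
            # No loop here: one call handles exactly one data unit
            value = inputs["in"].data.value
            outputs["out"].write({"value": 2 * value})
            return True


    def run_block(algorithm, units):
        """Plays the role of the platform: one process() call per data unit."""
        outputs = {"out": FakeOutput()}
        for unit in units:
            inputs = {"in": SimpleNamespace(data=SimpleNamespace(value=unit))}
            if not algorithm.process(inputs, None, outputs):
                raise RuntimeError("process() reported an error")
        return outputs["out"].written


    results = run_block(Algorithm(), [1, 2, 3])
    print(results)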


A more complex case of one-to-one synchronization is shown in the following figure:

.. image:: img/case-study-2.*

In such a configuration, the platform will ensure that each input unit at the
input-endpoint ``in`` is associated with the correct input unit at the
input-endpoint ``in2``. For example, referring to the figure above, the items
at the input ``in`` could be images, and the items at the input ``in2`` could be
labels, and the configuration depicted indicates that the first two input
images have the same label, say, ``l1``, whereas the next two input images have
the same label, say, ``l2``. The algorithm produces one output item at the
endpoint ``out``, for each input object presented at endpoint ``in``.

Example code implementing an algorithm processing data in this scenario is
shown below:

.. code-block:: python
   :linenos:

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            i1 = inputs['in'].data.value
            i2 = inputs['in2'].data.value

            out = magical_processing(i1, i2)

            outputs['out'].write({'value': out})

            return True


.. code-block:: c++
   :linenos:

    class Algorithm: public IAlgorithmSequential
    {
    public:
        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
        {
            auto i1 = inputs["in"]->data<user::format_1>()->value;
            auto i2 = inputs["in2"]->data<user::format_1>()->value;

            auto out = magical_processing(i1, i2);

            user::other_1 result;
            result.value = out;

            outputs["out"]->write(&result);

            return true;
        }
    };


You should notice that we still don't require any sort of ``for`` loops! BEAT
*synchronizes* the inputs ``in`` and ``in2`` so they are available to
your program as the dataset implementor defined.


Many-to-one synchronization
---------------------------

Here, the algorithm produces a single output after processing a batch of
inputs.  For example, the algorithm may produce a model for a *dog* after
processing all input images for the *dog* class. A block diagram illustrating
many-to-one synchronization is shown below:

.. image:: img/case-study-3.*


Here the synchronization is driven by the endpoint ``in2``. For each data unit
received at the input ``in2``, the algorithm generates one output unit. Note
that, here, multiple units received at the input ``in`` are accumulated and
associated with a single unit received at ``in2``. The user does not have to
handle the internal indexing. Producing output data at the right moment is
enough for BEAT to understand the output is synchronized with ``in2``.

The example below illustrates how such an algorithm could be implemented:

.. code-block:: python
   :linenos:

    class Algorithm:

        def __init__(self):
            self.objs = []

        def process(self, inputs, data_loaders, outputs):
            self.objs.append(inputs['in'].data.value) # accumulates

            if not inputs['in2'].hasMoreData():
                out = magical_processing(self.objs)
                outputs['out'].write({'value': out})
                self.objs = []  # reset accumulator for next label

            return True


.. code-block:: c++
   :linenos:

    class Algorithm: public IAlgorithmSequential
    {
    public:
        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
        {
            objs.push_back(inputs["in"]->data<user::format_1>()->value); // accumulates

            if (!inputs["in2"]->hasMoreData())
            {
                auto out = magical_processing(objs);

                user::other_1 result;
                result.value = out;

                outputs["out"]->write(&result);

                objs.clear();   // reset accumulator for next label
            }

            return true;
        }

    public:
        std::vector<float> objs;
    };


Here, the units received at the endpoint ``in`` are accumulated as long as the
``hasMoreData()`` method attached to the input ``in2`` returns ``True``.
When ``hasMoreData()`` returns ``False``, the corresponding label is read
from ``in2``, and a result is produced at the endpoint ``out``. After an output
unit has been produced, the internal accumulator for ``in`` is cleared, and the
algorithm starts accumulating a new set of objects for the next label.


Unsynchronized Operation
------------------------

Not all inputs for a block need to be synchronized together. In the diagram
shown below, the block is synchronized with the inputs ``in`` and ``in2`` (as
indicated by the green circle, which matches the colour of the input lines
connecting ``in`` and ``in2``). The output ``out`` is synchronized with the
block (and, as one can notice by looking at the code below, an output is
produced after every ``in`` input). The input ``in3`` is not synchronized with
the endpoints ``in`` and ``in2``, nor with the block. A processing block
which receives a previously calculated model and must score test samples is a
good example of this condition. In this case, the user is responsible for
reading out the contents of ``in3`` explicitly.

.. image:: img/case-study-4.*


In this case the algorithm will include an explicit loop to read the
unsynchronized input (``in3``).

.. code-block:: python
   :linenos:

    class Algorithm:

        def __init__(self):
            self.models = []

        def prepare(self, data_loaders):

            # Loads the "model" data at the beginning
            loader = data_loaders.loaderOf('in3')
            for i in range(loader.count()):
                view = loader.view('in3', i)
                data, _, _ = view[0]
                self.models.append(data['in3'].value)
            return True


        def process(self, inputs, data_loaders, outputs):
            # N.B.: this will be called for every unit in `in'

            # Processes the current input in `in' and `in2', apply the
            # model/models
            out = magical_processing(inputs['in'].data.value,
                                     inputs['in2'].data.value,
                                     self.models)

            # Writes the output
            outputs['out'].write({'value': out})

            return True


.. code-block:: c++
   :linenos:

    class Algorithm: public IAlgorithmSequential
    {
    public:
        bool prepare(const beat::backend::cxx::DataLoaderList& data_load_list) override
        {
            auto loader = data_load_list["in3"];
            for (int i = 0 ; i < loader->count() ; ++i) {
                auto view = loader->view("in3", i);
                std::map<std::string, beat::backend::cxx::Data *> data;
                std::tie(data, std::ignore, std::ignore) = (*view)[0];
                auto model = static_cast<user::model_1*>(data["in3"]);
                models.push_back(*model);
            }

            return true;
        }

        bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
        {
            // N.B.: this will be called for every unit in `in'

            // Processes the current input in `in' and `in2', apply the model/models
            auto out = magical_processing(inputs["in"]->data<user::format_1>()->value,
                                          inputs["in2"]->data<user::format_1>()->value,
                                          models);

            // Writes the output
            user::other_1 result;
            result.value = out;

            outputs["out"]->write(&result);

            return true;
        }

    public:
        std::vector<user::model_1> models;
    };


In the example above, the inputs ``in`` and ``in2`` are synchronized with each
other and with the block, whereas ``in3`` is not. It may also happen that you
have even more data inputs that are unsynchronized. In this case, using a
*group* for each set of inputs makes the code easier to read.

.. it is safer to treat inputs using their *group*. For example:

.. .. code-block python
..    :linenos:

..     class Algorithm:

..         def __init__(self):
..             self.models = None

..         def prepare(self, data_loaders):

..             #??? Is the concept of groups any use when we have dataloaders assuming this scenario???

..             # Loads the "model" data at the beginning
..             loader = data_loaders.loaderOf('in3')
..             for i in range(loader.count()):
..                 view = loader.view('in3', i)
..                 data, _, _ = view[0]
..                 self.models.append(data['in3'].value)

..         def process(self, inputs, data_loaders, outputs):
..             # N.B.: this will be called for every unit in `in'

..             # Loads the "model" data at the beginning, once
..             if self.models is None:
..                 self.models = []
..                 group = inputs.groupOf('in3')
..                 while group.hasMoreData():
..                     group.next() #synchronously advances the data
..                     self.models.append(group['in3'].data.value)

..             # Processes the current input in `in' and `in2', apply the model/models
..             out = magical_processing(inputs['in'].data.value,
..                                      inputs['in2'].data.value,
..                                      self.models)

..             # Writes the output
..             outputs.write({'value': out})

..             return True


.. code-block c++
..    :linenos:

..     class Algorithm: public IAlgorithmSequential
..     {
..     public:
        bool prepare(const beat::backend::cxx::DataLoaderList& data_load_list) override
..         {
..             auto loader = data_load_list["in3"];
..             for (int i = 0 ; i < loader->count() ; ++i) {
..                 auto view = loader->view("in3", i);
                std::map<std::string, beat::backend::cxx::Data *> data;
                std::tie(data, std::ignore, std::ignore) = (*view)[0];
                auto model = static_cast<user::model*>(data["in3"]);
..                 models.append(*model);
..             }

..             return true;
..         }

..         bool process(const InputList& inputs, const DataloaderList& data_load_list, const OutputList& outputs) override
..         {
..             // N.B.: this will be called for every unit in `in'

..             // Processes the current input in `in' and `in2', apply the model/models
            auto out = magical_processing(inputs["in"]->data<user::format_1>()->value,
                                          inputs["in2"]->data<user::format_1>()->value,
..                                           models);

..             // Writes the output
            user::other_1 result;
..             result.value = out;

..             outputs["out"]->write(&result);

..             return true;
..         }

..     public:
        std::vector<user::model_1> models;
..     };


.. In practice, encoding your algorithms using *groups* instead of looping over
.. individual inputs makes the code more robust to changes.


.. _beat-system-algorithms-input:

Handling input data
-------------------

.. _beat-system-algorithms-input-inputlist:

Input list
..........

An algorithm is given access to the **list of the inputs** of the processing
block. This list can be used to access each input individually, either by
their name (see section :ref:`beat-system-algorithms-input-name`), their index
or by iterating over the list:

.. code-block:: python

    # 'inputs' is the list of inputs of the processing block

    print(inputs['labels'].data_format)

    for index in range(0, inputs.length):
        print(inputs[index].data_format)

    for input in inputs:
        print(input.data_format)

    for input in inputs[0:2]:
        print(input.data_format)

Additionally, the following method is usable on a **list of inputs**:

.. py:method:: InputList.hasMoreData()

    Indicates whether there is at least one more block of data to process on
    some of the inputs


.. _beat-system-algorithms-input-input:

Input
.....

Each input provides the following information:

.. py:attribute:: Input.name

    *(string)* Name of the input

.. py:attribute:: Input.data_format

    *(string)* Data format accepted by the input

.. py:attribute:: Input.data_index

    *(integer)* Index of the last block of data received on the input (See section
    :ref:`beat-system-algorithms-input-synchronization`)

.. py:attribute:: Input.data

    *(object)* The last block of data received on the input

The structure of the ``data`` object depends on the data format assigned to
the input. Note that ``data`` can be *None*.
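
Because ``data`` can be *None*, code that inspects an input should guard
against it. The following is a minimal sketch, not the platform's
implementation: the ``Input`` namedtuple and the ``describe`` helper are
hypothetical stand-ins that only mimic the four attributes listed above.

.. code-block:: python

    from collections import namedtuple

    # Hypothetical stand-in exposing the four attributes documented above
    Input = namedtuple("Input", ["name", "data_format", "data_index", "data"])


    def describe(input_):
        """Summarize an input, guarding against a missing block of data."""
        if input_.data is None:
            status = "no data"
        else:
            status = "block #%d available" % input_.data_index
        return "%s [%s]: %s" % (input_.name, input_.data_format, status)


    summary = [describe(Input("images", "image/rgb", 3, object())),
               describe(Input("labels", "label", 0, None))]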

.. _beat-system-algorithms-input-name:

Input naming
............

Each algorithm assigns a name of its choice to each input (and output, see
section :ref:`beat-system-algorithms-output-name`). This mechanism ensures that
algorithms are easily shareable between users.

For instance, in :numref:`beat-system-algorithms-input-naming`, two different users
(Joe and Bill) are using two different toolchains. Both toolchains have one
block with two inputs and one output, with a similar set of data formats
(*image/rgb* and *label* on the inputs, *array/float* on the output), although
not in the same order. The two blocks use different algorithms, which both
refer to their inputs and outputs using names of their choice.

Nevertheless, Joe can choose to use Bill's algorithm instead of his own. When
the algorithm is changed, BEAT will attempt to match each input with the names
(and types) declared by the algorithm. In case of ambiguity, the user will be
asked to resolve it manually.

In other words: the way the block is connected in the toolchain doesn't force a
naming scheme or a specific order of inputs to the algorithms used in that
block. As long as the set of data types (on the inputs and outputs) is
compatible for both the block and the algorithm, the algorithm can be used in
the block.

.. _beat-system-algorithms-input-naming:
.. figure:: ./img/inputs-naming.*

   Different toolchains, but interchangeable algorithms
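
The matching idea can be illustrated with a short sketch. This is not BEAT's
actual matching logic, which is not shown in this document: the
``match_inputs`` function, its arguments and the input names ``picture`` and
``label`` are hypothetical. It only pairs each block connection with the
single algorithm input of the same data format and reports the formats that
would need a manual decision.

.. code-block:: python

    from collections import defaultdict


    def match_inputs(block_formats, algorithm_inputs):
        """Pair each block connection with an algorithm input of the same type.

        block_formats: data format names, in the order the block is connected
        algorithm_inputs: dict mapping input name -> {"type": data format}
        Returns (matches, unresolved): matches maps a connection position to
        an input name, unresolved lists formats with zero or several candidates.
        """
        by_type = defaultdict(list)
        for name, spec in algorithm_inputs.items():
            by_type[spec["type"]].append(name)

        matches, unresolved = {}, []
        for position, data_format in enumerate(block_formats):
            candidates = by_type.get(data_format, [])
            if len(candidates) == 1:
                matches[position] = candidates[0]
            else:
                unresolved.append(data_format)
        return matches, unresolved


    # The algorithm declares its inputs in a different order than the block
    declared = {"label": {"type": "label"}, "picture": {"type": "image/rgb"}}
    matches, unresolved = match_inputs(["image/rgb", "label"], declared)

Two inputs declared with the same data format would end up in ``unresolved``,
which corresponds to the case where the user is asked to choose.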

The names of the inputs are assigned in the JSON declaration of the algorithm,
such as:

.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    "name1": {
                        "type": "data_format_1"
                    },
                    "name2": {
                        "type": "data_format_2"
                    }
                }
            }
        ],
        ...
    }


.. _beat-system-algorithms-input-synchronization:

Inputs synchronization
......................

The data available on the different inputs from the synchronized channels
are (of course) synchronized. Let's consider the example toolchain on
:numref:`beat-system-algorithms-input-synchronization-example`, where:

* The image database provides two kinds of data: some *images* and their
  associated *labels*
* The *block A* receives both data via its inputs
* The *block B* only receives the *labels*
* Both algorithms are *data-driven*

The system will ask the *block A* to process 6 images, one by one. On the
second input, the algorithm will find the correct label for the current image.
The *block B* will only be asked to process 2 labels.

The algorithm can retrieve the index of the current block of data of each of
its inputs by looking at their ``data_index`` attribute. For simplicity, the
list of inputs has two attributes (``current_data_index`` and
``current_end_data_index``) that indicate the data indexes currently used by
the synchronization mechanism of the platform.

.. _beat-system-algorithms-input-synchronization-example:
.. figure:: ./img/inputs-synchronization.*
   :width: 80%

   Synchronization example
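
The call pattern described above can be simulated in a few lines. This is
only a sketch under stated assumptions: each block of data is modelled as a
hypothetical ``(start, end, value)`` tuple, and the split of the 6 images
into two labels of 3 images each is an assumption made for the illustration.

.. code-block:: python

    # Hypothetical contents of the database: each block of data covers a
    # range of indices (start, end, value). The 3 + 3 split is an assumption.
    images = [(i, i, "image_%d" % i) for i in range(6)]
    labels = [(0, 2, "dog"), (3, 5, "cat")]


    def block_at(blocks, index):
        """Return the block of data covering the given index."""
        for start, end, value in blocks:
            if start <= index <= end:
                return (start, end, value)
        raise IndexError(index)


    # Block A is driven by the images: 6 calls, each with the matching label
    block_a_calls = [(value, block_at(labels, start)[2])
                     for start, end, value in images]

    # Block B only receives the labels: 2 calls
    block_b_calls = [value for start, end, value in labels]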


.. _beat-system-algorithms-input-unsynchronized:

Additional input methods for unsynchronized channels
....................................................

Unsynchronized input channels of algorithms can be accessed at will, and
algorithms can use them any way they want. To be able to perform their job, they
have access to additional methods.

The following method is usable on a **list of inputs**:

.. py:method:: InputList.next()

    Retrieve the next block of data on all the inputs **in a synchronized
    manner**


Let's come back to the example toolchain on
:numref:`beat-system-algorithms-input-synchronization-example`, and assume
that *block A* uses an autonomous algorithm. To iterate over all the data on
its inputs, the algorithm would do:

.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            # Iterate over all the unsynchronized data
            while inputs.hasMoreData():
                inputs.next()

                # Do something with inputs['images'].data and inputs['labels'].data
                ...

            # At this point, there is no more data available on inputs['images'] and
            # inputs['labels']

            return True


The following methods are usable on an ``input``, in cases where the algorithm
doesn't care about the synchronization of some of its inputs:

.. py:method:: Input.hasMoreData()

    Indicates if there is (at least) another block of data available on the input

.. py:method:: Input.next()

    Retrieve the next block of data

    .. warning::

       Once this method has been called by an algorithm, the input is no longer
       automatically synchronized with the other inputs of the block.

In the following example, the algorithm desynchronizes one of its inputs but
keeps the others synchronized and iterates over all their data:

.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    "images": {
                        "type": "image/rgb"
                    },
                    "labels": {
                        "type": "label"
                    },
                    "desynchronized": {
                        "type": "number"
                    }
                }
            }
        ],
        ...
    }


.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            # Desynchronize the third input. From now on, inputs['desynchronized'].data
            # and inputs['desynchronized'].data_index won't change
            inputs['desynchronized'].next()

            # Iterate over all the data on the inputs still synchronized
            while inputs.hasMoreData():
                inputs.next()

                # Do something with inputs['images'].data and inputs['labels'].data
                ...

            # At this point, there is no more data available on inputs['images'] and
            # inputs['labels'], but there might be more on inputs['desynchronized']

            return True


.. _beat-system-algorithms-input-feedbackloop:

Feedback inputs
...............

:numref:`beat-system-algorithms-input-feedbackloop-example` shows a toolchain
containing a feedback loop. A special kind of input is needed in this scenario:
a *feedback input*, that isn't synchronized with the other inputs, and can be
freely used by the algorithm.

Those feedback inputs aren't yet implemented in the prototype of the platform.
This will be addressed in a later version.

.. _beat-system-algorithms-input-feedbackloop-example:
.. figure:: ./img/feedback-loop.*

    Feedback loop


.. _beat-system-algorithms-dataloaders:

Data loaders
------------

.. _beat-system-algorithms-dataloaders-dataloaderlist:

DataLoader list
...............

An algorithm is given access to the **list of data loaders** of the processing
block. This list can be used to access each data loader individually, either by
their channel name (see :ref:`beat-system-algorithms-input-name`), their
index or by iterating over the list:


.. code-block:: python

    # 'data_loaders' is the list of data loaders of the processing block

    # Retrieve a data loader by name
    data_loader = data_loaders['labels']

    # Retrieve a data loader by index
    for index in range(0, len(data_loaders)):
        data_loader = data_loaders[index]

    # Iteration over all data loaders
    for data_loader in data_loaders:
        ...

    # Retrieve the data loader an input belongs to, by input name
    data_loader = data_loaders.loaderOf('label')


.. _beat-system-algorithms-dataloaders-dataloader:

DataLoader
..........

Provides access to data from a group of inputs synchronized together.
See :py:class:`DataLoader`.
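
The sketch below illustrates the access pattern used with data loaders
elsewhere in this document (``count()``, ``view()`` and indexing a view to
get a ``(data, start, end)`` tuple). ``StandInDataLoader`` and
``StandInView`` are hypothetical stand-ins written for this illustration, not
the platform's classes, and their behaviour is an assumption inferred from
those examples. For brevity the stand-in stores plain values, whereas the
examples in this document read ``data['name'].value``.

.. code-block:: python

    class StandInView:
        """Hypothetical stand-in for the object returned by ``view()``."""

        def __init__(self, rows):
            self.rows = rows  # list of (data, start_index, end_index)

        def count(self):
            return len(self.rows)

        def __getitem__(self, index):
            return self.rows[index]


    class StandInDataLoader:
        """Hypothetical stand-in mimicking the calls used in this document."""

        def __init__(self, rows):
            self.rows = rows

        def _runs(self, input_name):
            # Group consecutive rows sharing the same value on `input_name`
            runs = []
            for row in self.rows:
                if runs and runs[-1][-1][0][input_name] == row[0][input_name]:
                    runs[-1].append(row)
                else:
                    runs.append([row])
            return runs

        def count(self, input_name=None):
            if input_name is None:
                return len(self.rows)
            return len(self._runs(input_name))

        def view(self, input_name, index):
            return StandInView(self._runs(input_name)[index])


    rows = [({"image": "image_%d" % i, "label": "dog" if i < 3 else "cat"}, i, i)
            for i in range(6)]
    loader = StandInDataLoader(rows)

    # One view per distinct label; each view holds the images for that label
    images_per_label = {}
    for i in range(loader.count("label")):
        view = loader.view("label", i)
        data, start, end = view[0]
        images_per_label[data["label"]] = [view[j][0]["image"]
                                           for j in range(view.count())]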

.. _beat-system-algorithms-output:

Handling output data
--------------------

.. _beat-system-algorithms-output-outputlist:

Output list
...........

An algorithm is given access to the **list of the outputs** of the processing
block.  This list can be used to access each output individually, either by
their name (see section :ref:`beat-system-algorithms-output-name`), their index
or by iterating over the list:

.. code-block:: python

    # 'outputs' is the list of outputs of the processing block

    print(outputs['features'].data_format)

    for index in range(0, outputs.length):
        outputs[index].write(...)

    for output in outputs:
        output.write(...)

    for output in outputs[0:2]:
        output.write(...)


.. _beat-system-algorithms-output-output:

Output
......

Each output provides the following information:

.. py:attribute:: Output.name

    *(string)* Name of the output

.. py:attribute:: Output.data_format

    *(string)* Format of the data written on the output


And the following method:

.. py:method:: Output.write(data, end_data_index=None)

    Write a block of data on the output


We'll look at the usage of this method through some examples in the following
sections.


.. _beat-system-algorithms-output-name:

Output naming
.............

Like for its inputs, each algorithm assigns a name of its choice to each output
(see section :ref:`beat-system-algorithms-input-name` for more details) by
including them in the JSON declaration of the algorithm.


.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    ...
                },
                "outputs": {
                    "name1": {
                        "type": "data_format1"
                    },
                    "name2": {
                        "type": "data_format2"
                    }
                }
            }
        ],
        ...
    }


.. _beat-system-algorithms-output-example1:

Example 1: Write one block of data for each received block of data
..................................................................

.. _beat-system-algorithms-output-example1-figure:
.. figure:: ./img/outputs-example1.*

   Example 1: 6 images as input, 6 blocks of data produced

Consider the example toolchain on
:numref:`beat-system-algorithms-output-example1-figure`. We will implement a
*data-driven* algorithm that will write one block of data on the output of the
block for each image received on its inputs. This is the simplest case.

.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    "images": {
                        "type": "image/rgb"
                    },
                    "labels": {
                        "type": "label"
                    }
                },
                "outputs": {
                    "features": {
                        "type": "array/float"
                    }
                }
            }
        ],
        ...
    }


.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            # Compute something from inputs['images'].data and inputs['labels'].data
            # and store the result in 'data'
            data = ...

            # Write our data block on the output
            outputs['features'].write(data)

            return True


The structure of the ``data`` object depends on the data format assigned
to the output.


.. _beat-system-algorithms-output-example2:

Example 2: Skip some blocks of data
...................................

.. _beat-system-algorithms-output-example2-figure:
.. figure:: ./img/outputs-example2.*

   Example 2: 6 images as input, 4 blocks of data produced, 2 blocks of data
   skipped

Consider the example toolchain on
:numref:`beat-system-algorithms-output-example2-figure`. This time, our algorithm
will use a criterion to decide if it can perform its computation on an image or
not, and tell the platform that, for a particular data index, no data is
available.

.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    "images": {
                        "type": "image/rgb"
                    },
                    "labels": {
                        "type": "label"
                    }
                },
                "outputs": {
                    "features": {
                        "type": "array/float"
                    }
                }
            }
        ],
        ...
    }

.. code-block:: python

    class Algorithm:

        def process(self, inputs, data_loaders, outputs):

            # Use a criterion on the image to determine if we can perform our
            # computation on it or not
            if self.can_compute(inputs['images'].data):
                # Compute something from inputs['images'].data and inputs['labels'].data
                # and store the result in 'data'
                data = ...

                # Write our data block on the output
                outputs['features'].write(data)
            else:
                # Tell the platform that no data is available for this image
                outputs['features'].write(None)

            return True

        def can_compute(self, image):
            # Implementation of our criterion
            ...
            return True # or False


.. _beat-system-algorithms-output-example3:

Example 3: Write one block of data related to several received blocks of data
.............................................................................

.. _beat-system-algorithms-output-example3-figure:
.. figure:: ./img/outputs-example3.*

   Example 3: 6 images as input, 2 blocks of data produced

Consider the example toolchain on
:numref:`beat-system-algorithms-output-example3-figure`. This time, our algorithm
will compute something using all the images with the same label (all the dogs,
all the cats) and write only one block of data related to all those images.

The key here is the correct usage of the **current end data index** of the
input list to specify the indexes of the blocks of data we write on the output.
This ensures that the data will be synchronized everywhere in the toolchain: the
platform can now tell, for each of our data block, which image and label it
relates to (See section :ref:`beat-system-algorithms-input-synchronization`).

Additionally, since we can't know in advance if the image currently processed
is the last one with the current label, we need to memorize the current data
index of the input list to correctly assign it later when we effectively write
the data block on the output.

.. code-block:: javascript

    {
        ...
        "groups": [
            {
                "inputs": {
                    "images": {
                        "type": "image/rgb"
                    },
                    "labels": {
                        "type": "label"
                    }
                },
                "outputs": {
                    "features": {
                        "type": "array/float"
                    }
                }
            }
        ],
        ...
    }

.. code-block:: python

    class Algorithm:

        def __init__(self):
            self.data = None                # Block of data updated each time we
                                            # receive a new image
            self.current_label = None       # Label of the images currently processed
            self.previous_data_index = None # Data index of the input list during the
                                            # processing of the previous image

        def process(self, inputs, data_loaders, outputs):
            # Determine if we already processed some image(s)
            if self.data is not None:
                # Determine if the label has changed since the last image we processed
                if inputs['labels'].data.name != self.current_label:
                    # Write the block of data on the output
                    outputs['features'].write(self.data, self.previous_data_index)
                    self.data = None

            # Memorize the current data index of the input list
            self.previous_data_index = inputs.current_end_data_index

            # Create a new block of data if necessary
            if self.data is None:
                self.data = ...

                # Remember the label we are currently processing
                self.current_label = inputs['labels'].data.name

            # Compute something from inputs['images'].data and inputs['labels'].data
            # and update the content of 'self.data'
            ...

            # Determine if this was the last block of data or not
            if not inputs.hasMoreData():
                # Write the block of data on the output
                outputs['features'].write(self.data, inputs.current_end_data_index)

            return True
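
To see which end data indexes reach the output, the algorithm above can be
driven by a small simulation. Everything named ``Fake...`` below is a
hypothetical stand-in for the platform's objects, the ``...`` placeholders of
the example are filled in with a simple list accumulation, and the 3 + 3
split of the labels is an assumption made for the illustration.

.. code-block:: python

    class FakeData:
        def __init__(self, name):
            self.name = name


    class FakeInput:
        data = None


    class FakeInputs(dict):
        """Stand-in for the input list: named inputs plus index bookkeeping."""
        current_end_data_index = -1
        remaining = 0

        def hasMoreData(self):
            return self.remaining > 0


    class FakeOutput:
        def __init__(self):
            self.written = []

        def write(self, data, end_data_index=None):
            self.written.append((data, end_data_index))


    class Algorithm:
        def __init__(self):
            self.data = None
            self.current_label = None
            self.previous_data_index = None

        def process(self, inputs, data_loaders, outputs):
            label = inputs['labels'].data.name
            if self.data is not None and label != self.current_label:
                # The label changed: flush the block for the previous label
                outputs['features'].write(self.data, self.previous_data_index)
                self.data = None

            self.previous_data_index = inputs.current_end_data_index

            if self.data is None:
                self.data = []
                self.current_label = label

            # Assumed computation: accumulate the images sharing a label
            self.data.append(inputs['images'].data)

            if not inputs.hasMoreData():
                outputs['features'].write(self.data,
                                          inputs.current_end_data_index)
            return True


    labels = ['dog', 'dog', 'dog', 'cat', 'cat', 'cat']
    inputs = FakeInputs(images=FakeInput(), labels=FakeInput())
    outputs = {'features': FakeOutput()}
    algorithm = Algorithm()

    for index, label in enumerate(labels):
        inputs['images'].data = 'image_%d' % index
        inputs['labels'].data = FakeData(label)
        inputs.current_end_data_index = index
        inputs.remaining = len(labels) - index - 1
        algorithm.process(inputs, None, outputs)

    written = outputs['features'].written

In this simulation two blocks are written, one per label, and each carries
the index of the last image it relates to.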


.. _beat-system-algorithms-loop-channel:

Soft loop communication
-----------------------

The processor and evaluator algorithm components of the soft loop macro block
communicate with each other using a LoopChannel object. This object defines the
two data formats used for the request and the answer that transit through the
loop channel. This class is only meant to be used by the algorithm implementer.

.. _beat-system-algorithms-api-migration:

Migrating from API v1 to API v2
-------------------------------

Algorithms written against BEAT's algorithm API v1 can still be run under the
v2 execution model. They are now considered legacy algorithms and should be
ported to API v2 quickly.

API v2 provides two different types of algorithms:

- Sequential
- Autonomous

The Sequential type follows the same code execution model as the v1 API, meaning
that the process function is called once for each input item.

The Autonomous type allows the developer to load the input data at will;
therefore, the process method is called only once. This makes it possible, for
example, to optimize the loading of data into GPU memory for faster execution.
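
The difference in call pattern can be sketched as follows. This is an
illustration only: plain dictionaries and lists stand in for the input list,
the data loaders and the outputs, and the two driver loops are hypothetical,
not the platform's execution code.

.. code-block:: python

    class SequentialSum:
        """Sequential style: ``process`` sees one item per call."""

        def __init__(self):
            self.calls = 0

        def process(self, inputs, data_loaders, outputs):
            self.calls += 1
            outputs.append(inputs["in1"] + inputs["in2"])
            return True


    class AutonomousSum:
        """Autonomous style: ``process`` is called once and loops by itself."""

        def __init__(self):
            self.calls = 0

        def process(self, data_loaders, outputs):
            self.calls += 1
            for item in data_loaders["main"]:
                outputs.append(item["in1"] + item["in2"])
            return True


    items = [{"in1": 1, "in2": 10}, {"in1": 2, "in2": 20}, {"in1": 3, "in2": 30}]

    # Hypothetical driver for the Sequential type: one call per input item
    sequential, sequential_out = SequentialSum(), []
    for item in items:
        sequential.process(item, {"main": items}, sequential_out)

    # Hypothetical driver for the Autonomous type: a single call
    autonomous, autonomous_out = AutonomousSum(), []
    autonomous.process({"main": items}, autonomous_out)

Both variants produce the same results; only the number of calls to
``process`` differs.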

The straightforward migration path from v1 to v2 is to make a Sequential
algorithm, which requires only a few changes to the code.

API V1:

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            self.sync = parameters['sync']
            return True


        def process(self, inputs, outputs):
            if inputs[self.sync].isDataUnitDone():
                outputs['out'].write({
                    'value': inputs['in1'].data.value + inputs['in2'].data.value,
                })

            return True


API V2 sequential:

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            self.sync = parameters['sync']
            return True


        def process(self, inputs, data_loaders, outputs):
            if inputs[self.sync].isDataUnitDone():
                outputs['out'].write({
                    'value': inputs['in1'].data.value + inputs['in2'].data.value,
                })

            return True


API V2 autonomous:

.. code-block:: python

    class Algorithm:

        def setup(self, parameters):
            self.sync = parameters['sync']
            return True


        def process(self, data_loaders, outputs):
            data_loader = data_loaders.loaderOf('in1')

            for i in range(data_loader.count(self.sync)):
                view = data_loader.view(self.sync, i)

                (data, start, end) = view[view.count() - 1]

                outputs['out'].write({
                        'value': data['in1'].value + data['in2'].value,
                    },
                    end
                )

            return True

.. include:: links.rst