.. vim: set fileencoding=utf-8 :

.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/          ..
.. Contact: beat.support@idiap.ch                                             ..
..                                                                            ..
.. This file is part of the beat.web module of the BEAT platform.             ..
..                                                                            ..
.. Commercial License Usage                                                   ..
.. Licensees holding valid commercial BEAT licenses may use this file in      ..
.. accordance with the terms contained in a written agreement between you     ..
.. and Idiap. For further information contact tto@idiap.ch                    ..
..                                                                            ..
.. Alternatively, this file may be used under the terms of the GNU Affero     ..
.. Public License version 3 as published by the Free Software and appearing   ..
.. in the file LICENSE.AGPL included in the packaging of this file.           ..
.. The BEAT platform is distributed in the hope that it will be useful, but   ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE.                                       ..
..                                                                            ..
.. You should have received a copy of the GNU Affero Public License along     ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/.          ..


.. _backend:

=========================
 Queues and Environments
=========================

To take full advantage of existing hardware and software resources in your
experiments, it is useful to understand how the |project| platform backend
executes them. A |project| backend is composed of a central *scheduler* and
associated *worker* nodes, where the user algorithms are actually executed.
When you click the ``Go!`` button on the experiment configuration page, the
declaration of this experiment is transmitted to the scheduler, which must then
run the experiment until it finishes, you press the ``stop`` button, or an
error occurs.

As described in the "Toolchains" section of "Getting Started with BEAT" in the
`BEAT documentation`_, the scheduler first breaks the toolchain into a sequence
of executable blocks with dependencies. For example: block ``B`` must be run
after block ``A``. Each block is then scheduled for execution depending on
current resource availability. If no more resources are available, then the
experiment is put on hold until resources are freed for you. To prevent a
single user from draining all available resources, there is a limit on the
amount of resources each user can consume on the backend at any given time.
This value is configurable by the system administrator and can be tightened or
relaxed on demand.
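
The dependency-driven ordering of blocks can be sketched with the ``graphlib``
module of the Python standard library (Python 3.9 or later). This is only an
illustration and not the scheduler's actual implementation: the block names,
the ``dependencies`` mapping and the ``execution_batches`` function are
hypothetical.

.. code-block:: python

   from graphlib import TopologicalSorter

   # Hypothetical toolchain: each block maps to the blocks it depends on.
   dependencies = {
       "A": set(),
       "B": {"A"},
       "C": {"A"},
       "D": {"B", "C"},
   }


   def execution_batches(deps):
       """Group blocks into batches. Blocks in the same batch do not depend
       on each other, so they could run concurrently if slots are free."""
       sorter = TopologicalSorter(deps)
       sorter.prepare()
       batches = []
       while sorter.is_active():
           # Blocks whose dependencies have all been marked as done.
           ready = sorted(sorter.get_ready())
           batches.append(ready)
           sorter.done(*ready)
       return batches


   print(execution_batches(dependencies))
   # prints [['A'], ['B', 'C'], ['D']]

In this sketch, blocks ``B`` and ``C`` only wait for ``A``, so they become
ready together, while ``D`` has to wait for both of them.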


Hardware resources
------------------

Resources in |project| are organized in what we call *slots*. When the
scheduler wants to execute the algorithm for a particular block of your
experiment, it checks whether any *slot* on the farm matching your requested
characteristics is free. If so, the algorithm is executed on that slot.
Otherwise, it waits until a slot of that type is available.

A *slot* represents, essentially:

  * A number of computing cores (e.g. 2)
  * An amount of RAM (e.g. 4 GB)
  * On a machine with a particular operating system installed (e.g. Debian
    Linux, version 8.0)
  * For a given amount of time (e.g. 3 hours)

When the user algorithm occupies a slot on the backend, the platform will:

  1. Create an operating-system-level process to run the user algorithm on the
     machine that hosts the slot
  2. Ensure the algorithm does not consume more resources than prescribed. In
     the example above, that would mean: occupy 2 physical processing cores
     and consume at most 4 GB of RAM, all for at most 3 hours.
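
As a rough illustration of the second step, the sketch below caps the memory
and the wall-clock time of a child process using only the Python standard
library on a POSIX system. It is not the platform's actual mechanism, which
this section does not describe: the ``run_limited`` function and its
parameters are hypothetical, and pinning the process to specific cores is left
out.

.. code-block:: python

   import resource
   import subprocess
   import sys


   def run_limited(command, max_memory_bytes, max_seconds):
       """Run ``command`` with a cap on address space and wall-clock time."""

       def apply_memory_limit():
           # Runs in the child just before the command starts. Keep the
           # existing hard limit and never ask for more than it allows.
           _, hard = resource.getrlimit(resource.RLIMIT_AS)
           soft = max_memory_bytes
           if hard != resource.RLIM_INFINITY:
               soft = min(soft, hard)
           resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

       # subprocess.run kills the child and raises TimeoutExpired once
       # ``max_seconds`` have elapsed.
       return subprocess.run(
           command,
           preexec_fn=apply_memory_limit,
           timeout=max_seconds,
           capture_output=True,
           text=True,
       )


   # Example: at most 4 GiB of address space, for at most 3 hours.
   result = run_limited(
       [sys.executable, "-c", "print('done')"],
       max_memory_bytes=4 * 1024**3,
       max_seconds=3 * 60 * 60,
   )
   print(result.stdout.strip())

A process that runs past ``max_seconds`` is terminated and the caller receives
a ``subprocess.TimeoutExpired`` exception, which a scheduler could report as an
error condition for the block.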

Each slot in the platform is associated with at least one *queue*. A *queue* is
just a set of slots which share the **same** properties. Queues also have a
name, to allow users and administrators to distinguish them. Because each slot
in a queue has the same properties, the scheduler does not make any distinction
between them. The scheduler may handle any number of queues, which lets the
|project| platform handle different combinations of computing resources and
operating systems.
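
The relationship between queues and slots can be modelled in a few lines of
Python. This is a minimal sketch for illustration only: the ``Slot`` and
``Queue`` classes, their field names and the host names are assumptions, not
the platform's data model.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import List, Optional


   @dataclass
   class Slot:
       """One unit of execution capacity (illustrative fields only)."""
       host: str
       busy: bool = False


   @dataclass
   class Queue:
       """A named set of slots that share the same properties."""
       name: str
       cores: int
       memory_gb: int
       max_hours: int
       operating_system: str
       slots: List[Slot] = field(default_factory=list)

       def acquire(self) -> Optional[Slot]:
           """Return the first free slot and mark it busy, or None when the
           caller has to wait. Slots in a queue are interchangeable."""
           for slot in self.slots:
               if not slot.busy:
                   slot.busy = True
                   return slot
           return None

       def release(self, slot: Slot) -> None:
           """Mark a slot as free again once its block has finished."""
           slot.busy = False


   queue = Queue("default", cores=2, memory_gb=4, max_hours=3,
                 operating_system="Debian Linux 8.0",
                 slots=[Slot("node01"), Slot("node02")])

   first = queue.acquire()
   second = queue.acquire()
   third = queue.acquire()  # no free slot left: the block would have to wait
   print(first.host, second.host, third)
   # prints node01 node02 None

Because the properties live on the queue and not on the individual slot, any
free slot is as good as any other, which is why ``acquire`` simply returns the
first one it finds.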

When you create an experiment, you **must** select a default queue that will be
used to execute all blocks in the experiment unless specified otherwise.
Optionally, you may use the pull-down button in the block (enabled when you
select an algorithm for a block) to override the default queue and execute the
algorithm on that block in a different one. **No built-in limitations exist**.
Block ``A`` can be executed on a queue based on Debian Linux while, in the same
experiment, block ``B`` is executed on a queue based on Microsoft Windows. This
is also useful, for example, if your experiment uses a computing-intensive
algorithm. You can then run that block on a queue whose slots allow longer run
times.
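
The fallback from a per-block override to the experiment-wide default can be
summarized as a simple lookup. The sketch below is illustrative: the queue
names, the block names and the ``queue_for_block`` function are hypothetical.

.. code-block:: python

   def queue_for_block(block_name, default_queue, overrides):
       """Return the queue chosen for a block: its own override if one was
       set, otherwise the default queue of the experiment."""
       return overrides.get(block_name, default_queue)


   default_queue = "short-linux"
   overrides = {"training": "long-linux"}  # only this block overrides

   for block in ["preprocessing", "training", "scoring"]:
       print(block, "->", queue_for_block(block, default_queue, overrides))
   # prints:
   # preprocessing -> short-linux
   # training -> long-linux
   # scoring -> short-linux

Only the computing-intensive ``training`` block is moved to another queue; all
other blocks keep the default.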

.. tip::

   Typically, systems are organized so there are more slots on queues that
   consume fewer resources and fewer slots on queues that consume more
   resources. This technique allows for optimal resource usage while still
   providing a way to run long processing jobs.


Software resources
------------------

When the user process executes on the backend, effectively running the user
algorithm, it is isolated from the backend via a special process we call an
*I/O daemon*. In reality, the user process works as a co-process to the I/O
daemon, which is responsible for controlling it, reading data from and writing
data to datasets and/or the disk cache, and collecting standard output and
error logs generated by user code. In this way, the user process has only
minimal access to system resources and can be properly monitored. The following
figure illustrates this relationship.

.. image:: img/sandbox.*


When the I/O daemon launches the user process, it executes it using a
predefined *environment*. An environment is nothing more than a simple wrapper
script that launches the user code with access to a directory on the worker
where useful modules are installed. For example, an environment based on the
Python interpreter may have the NumPy_ package installed. Another one may have
OpenCV bindings, scikit-learn, or other packages. Each environment is isolated
from the others and can contain any combination of packages, as desired by the
platform administrator. You can browse all `available environments`_ on the
|project| platform by selecting ``Environments`` on the ``System Resources``
tab. Each environment is accompanied by documentation explaining what is
installed in it.
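
To make the idea of an environment more concrete, the sketch below launches
user code in a child Python interpreter that can import modules from a given
directory. It only illustrates the principle under stated assumptions: the
``run_in_environment`` function, the temporary directory standing in for an
environment, and the ``fake_env_package`` module are all hypothetical, and the
real wrapper scripts may work differently.

.. code-block:: python

   import os
   import subprocess
   import sys
   import tempfile


   def run_in_environment(env_dir, user_code):
       """Run ``user_code`` in a child interpreter that can import modules
       installed under ``env_dir`` (a hypothetical environment directory)."""
       child_env = dict(os.environ, PYTHONPATH=env_dir)
       return subprocess.run(
           [sys.executable, "-c", user_code],
           env=child_env,
           capture_output=True,
           text=True,
           check=True,
       )


   with tempfile.TemporaryDirectory() as env_dir:
       # Stand-in for a package an administrator installed in the environment.
       module_path = os.path.join(env_dir, "fake_env_package.py")
       with open(module_path, "w") as handle:
           handle.write("VERSION = '1.0'\n")

       result = run_in_environment(
           env_dir, "import fake_env_package; print(fake_env_package.VERSION)"
       )
       print(result.stdout.strip())
       # prints 1.0

Pointing the same user code at a different directory would give it a different
set of packages, which is the effect of selecting another environment for a
block.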

When you create an experiment, you **must** select a default environment that
will be used to execute all blocks in the experiment unless specified
otherwise. Optionally, you may use the pull-down button in the block (enabled
when you select an algorithm for a block) to override the default environment
and execute the algorithm on that block in a different one. **No built-in
limitations exist**. Block ``A`` can be executed on an environment based on
Python while, in the same experiment, block ``B`` is executed on an environment
based on Matlab. This is also useful, for example, if your experiment uses old
algorithms that cannot work with recent versions of base software packages such
as NumPy_. You can use environments with previous versions of these packages
for that purpose.


.. include:: ../links.rst