10. Queues and Environments
To take full advantage of existing hardware and software resources in your
experiments, it is useful to understand how the BEAT platform backend
executes them. A BEAT backend is composed of a central scheduler and
associated worker nodes, where the user algorithms are actually executed.
When you click the Go! button on the experiment configuration page, the
declaration of this experiment is transmitted to the scheduler, which must
then run the experiment until it finishes, you press the stop button, or an
error condition occurs.
As described in the “Toolchains” section of “Getting Started with BEAT” in the
BEAT documentation, the scheduler first breaks the toolchain into a sequence
of executable blocks with dependencies. For example, block B must be run after
block A. Each block is then scheduled for execution depending on current
resource availability. If no more resources are available, the experiment is
halted until further resources are freed for you. To prevent a single user
from draining all available resources, there is a limit on the amount of
resources each user can consume on the backend at any one time. This value is
configurable by the system administrator and can be tightened or relaxed on
demand.
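The dependency-driven ordering described above can be illustrated with a topological sort. This is only a minimal sketch, not the BEAT scheduler's actual implementation: the block names and the `dependencies` mapping are hypothetical, and Python's standard `graphlib` module (Python 3.9 or later) stands in for whatever the scheduler uses internally.

```python
from graphlib import TopologicalSorter

# Hypothetical toolchain: each key is a block, each value is the set of
# blocks that must finish before that block can start.
dependencies = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


def execution_waves(deps):
    """Group blocks into waves; blocks in the same wave are independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Blocks whose dependencies have all been marked as done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
print(waves)
```

Blocks that end up in the same wave could, resources permitting, be dispatched at the same time; a block in a later wave has to wait for its predecessors.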
10.1. Hardware resources
Resources in BEAT are organized in what we call slots. When the scheduler wants to execute the algorithm for a particular block of your experiment, it checks whether any slot on the farm matching your requested characteristics is free. If so, the algorithm is executed on that slot. Otherwise, it waits until a slot of that type becomes available.
A slot represents, essentially:
A number of computing cores (e.g. 2)
An amount of RAM (e.g. 4 GB)
On a machine with a particular operating system installed (e.g. Debian Linux, version 8.0)
For a given amount of time (e.g. 3 hours)
When the user algorithm occupies a slot on the backend, the platform will:
Create an operating-system level process, on the machine where the slot is located, to run the user algorithm
Ensure the algorithm does not consume more resources than prescribed. In the example above, that would mean occupying 2 physical processing cores and consuming at most 4 GB of RAM, all for at most 3 hours.
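To make the idea of capping a process concrete, here is a minimal sketch that limits the memory and wall-clock time of a child process on a Unix-like system. It is an illustration under stated assumptions, not a description of how the BEAT backend enforces limits: the helper `run_limited` is hypothetical, it relies on the standard `resource` and `subprocess` modules, and it leaves out core pinning entirely.

```python
import resource
import subprocess
import sys


def run_limited(argv, max_ram_bytes, max_seconds):
    """Run argv with a cap on address space and on wall-clock time (Unix only)."""

    def apply_limits():
        # Runs in the child just before exec: cap the address space.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = max_ram_bytes
        if hard != resource.RLIM_INFINITY:
            limit = min(limit, hard)  # a soft limit cannot exceed the hard one
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    # subprocess.run kills the child and raises TimeoutExpired on timeout.
    return subprocess.run(
        argv,
        preexec_fn=apply_limits,
        timeout=max_seconds,
        capture_output=True,
        text=True,
    )


# Illustrative values only: 4 GiB of address space, 60 seconds.
result = run_limited([sys.executable, "-c", "print('ok')"], 4 * 1024**3, 60)
print(result.returncode, result.stdout.strip())
```

A production system would more likely rely on operating-system facilities such as containers or control groups, which can also restrict the number of cores.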
Each slot in the platform is associated with at least one queue. A queue is just a set of slots that share the same properties. Queues also have a name, so that users and administrators can distinguish them. Because every slot in a queue has the same properties, the scheduler makes no distinction between them. The scheduler may handle any number of queues, which allows the BEAT platform to handle different combinations of computing resources and operating systems.
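The relationship between queues and slots can be sketched with a small data model. This is a hypothetical illustration: the class names `Slot` and `Queue`, their fields, and the `acquire`/`release` methods are assumptions made for this example and do not reflect the platform's internal classes.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    busy: bool = False


@dataclass
class Queue:
    # Properties shared by every slot in the queue.
    name: str
    cores: int
    ram_gb: int
    os_name: str
    max_hours: int
    slots: list = field(default_factory=list)

    def acquire(self):
        """Return a free slot and mark it busy, or None if none is free."""
        for slot in self.slots:
            if not slot.busy:
                slot.busy = True
                return slot
        return None

    def release(self, slot):
        """Mark a slot as free again once the block has finished."""
        slot.busy = False


queue = Queue("default", cores=2, ram_gb=4, os_name="Debian Linux 8.0",
              max_hours=3, slots=[Slot(), Slot()])
first = queue.acquire()
second = queue.acquire()
third = queue.acquire()   # no free slot: the block would have to wait
queue.release(first)
fourth = queue.acquire()  # a slot is available again after the release
```

Because all slots in a queue are interchangeable, `acquire` simply returns the first free one.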
When you create an experiment, you must select a default queue that will be
used to execute all blocks in the experiment unless otherwise specified.
Optionally, you may use the pull-down button in the block (enabled when you
select an algorithm for a block) to override the default queue and execute the
algorithm on that block in a different one. No built-in limitations exist.
Block A can be executed on a queue based on Debian Linux while, in the same
experiment, Block B is executed on a queue based on Microsoft Windows. This
is also useful, for example, if your experiment uses a computing-intensive
algorithm. You can then use long-waiting queues for that purpose.
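The default-plus-override rule described above amounts to a simple lookup. The sketch below is illustrative only: the queue names, the `experiment` dictionary layout, and the `resolve_queue` helper are hypothetical and do not correspond to the actual BEAT experiment declaration format.

```python
def resolve_queue(block_name, default_queue, overrides):
    """Return the block's own queue if one was set, else the default queue."""
    return overrides.get(block_name, default_queue)


# Hypothetical experiment: block B overrides the default queue.
experiment = {
    "default_queue": "debian-short",
    "blocks": ["A", "B", "C"],
    "overrides": {"B": "windows-long"},
}

assignment = {
    block: resolve_queue(block, experiment["default_queue"], experiment["overrides"])
    for block in experiment["blocks"]
}
print(assignment)
```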
Typically, systems are organized so that there are more slots on queues that consume fewer resources and fewer slots on queues that consume more resources. This arrangement allows for optimal resource usage while still providing a way to run long processing jobs.
10.2. Software resources
When the user process executes on the backend, effectively running the user algorithm, it is isolated from the backend via a special process we call an I/O daemon. In reality, the user process works as a co-process to the I/O daemon, which is responsible for controlling it, reading and writing data from datasets and/or the disk cache, and collecting the standard output and error logs generated by user code. In this way, the user process has only minimal access to system resources and can be properly monitored. The following figure illustrates this relationship.
When the I/O daemon launches the user process, it executes it using a
predefined environment. An environment is nothing more than a simple wrapper
script that launches the user code with access to a directory on the worker
where useful modules are installed. For example, an environment based on the
Python interpreter may have the NumPy package installed. Another one may have
OpenCV bindings, Scikit Learn, or other packages. Each environment is isolated
from the others and can contain any combination of packages, as desired by the
platform administrator. You can browse all available environments on the BEAT
platform by selecting Environments on the System Resources tab. Each
environment is accompanied by documentation explaining what is installed on
it.
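The two ideas above, a supervising process that collects the logs of user code and an environment that exposes a directory of installed modules, can be sketched together. This is a hypothetical illustration rather than the platform's actual mechanism: the helper `run_in_environment` and the module `beat_env_demo` are invented for this example, and the `PYTHONPATH` variable is used as one simple way of giving a child interpreter access to a module directory.

```python
import os
import subprocess
import sys
import tempfile


def run_in_environment(module_dir, user_code):
    """Run user_code in a child interpreter that can import from module_dir."""
    env = dict(os.environ)
    env["PYTHONPATH"] = module_dir  # expose the environment's modules
    done = subprocess.run(
        [sys.executable, "-c", user_code],
        env=env,
        capture_output=True,  # collect standard output and error logs
        text=True,
    )
    return done.returncode, done.stdout, done.stderr


with tempfile.TemporaryDirectory() as module_dir:
    # Stand-in for a package the administrator installed in this environment.
    with open(os.path.join(module_dir, "beat_env_demo.py"), "w") as handle:
        handle.write("VERSION = '1.0'\n")

    user_code = (
        "import sys, beat_env_demo; "
        "print(beat_env_demo.VERSION); "
        "print('done', file=sys.stderr)"
    )
    code, out, err = run_in_environment(module_dir, user_code)

print(code, out.strip(), err.strip())
```

Pointing the same user code at a different module directory is, in this sketch, all it takes to run it against a different set of packages.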

When you create an experiment, you must select a default environment that
will be used to execute all blocks in the experiment unless otherwise
specified. Optionally, you may use the pull-down button in the block
(enabled when you select an algorithm for a block) to override the default
environment and execute the algorithm on that block in a different one. No
built-in limitations exist. Block A can be executed on an environment
based on Python while, in the same experiment, Block B is executed on an
environment based on Matlab. This is also useful, for example, if your
experiment uses old algorithms that cannot work with recent versions of
base software packages such as NumPy. You can use environments with previous
versions of these packages for that purpose.