10. Queues and Environments

To take full advantage of the existing hardware and software resources in your experiments, it helps to understand how the BEAT platform backend executes them. A BEAT backend is composed of a central scheduler and associated worker nodes, where the user algorithms are actually executed. When you click the Go! button on the experiment configuration page, the declaration of the experiment is transmitted to the scheduler, which must then run the experiment until it finishes, you press the stop button, or an error condition occurs.

As described in the “Toolchains” section of “Getting Started with BEAT” in the BEAT documentation, the scheduler first breaks the toolchain into a sequence of executable blocks with dependencies (for example, block B must run after block A). Each block is then scheduled for execution depending on current resource availability. If no resources are available, the experiment is put on hold until further resources become available to you. To prevent a single user from draining all available resources, there is a limit on the amount of resources each user can consume simultaneously on the backend. This value is configurable by the system administrator and can be tightened or relaxed on demand.
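The dependency-driven scheduling described above can be sketched with Python's standard-library `graphlib` module (Python 3.9+). This is a simplified illustration, not BEAT's actual scheduler: the function `schedule`, the block names, and modelling the per-user limit as a maximum number of blocks running at once are all assumptions made for the example. Each returned "wave" is a group of blocks that could run concurrently.

```python
from graphlib import TopologicalSorter


def schedule(dependencies: dict[str, set[str]], user_limit: int) -> list[list[str]]:
    """Group blocks into waves that respect dependencies and a per-user cap.

    `dependencies` maps each block to the set of blocks it must wait for.
    At most `user_limit` blocks run in any single wave (a hypothetical
    model of the per-user resource limit).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the toolchain has a cycle
    waves: list[list[str]] = []
    ready: list[str] = []
    while sorter.is_active():
        ready.extend(sorter.get_ready())  # blocks whose inputs are now available
        # Blocks beyond the user's limit stay on hold for a later wave.
        wave, ready = ready[:user_limit], ready[user_limit:]
        waves.append(wave)
        sorter.done(*wave)  # mark this wave as finished
    return waves


if __name__ == "__main__":
    toolchain = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    print(schedule(toolchain, user_limit=2))
```

With `user_limit=1`, blocks B and C, which could otherwise run together, are forced into separate waves.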

10.1. Hardware resources

Resources in BEAT are organized in what we call slots. When the scheduler wants to execute the algorithm for a particular block of your experiment, it checks whether any slot on the farm matching your requested characteristics is free. If so, the algorithm is executed on that slot. Otherwise, the scheduler waits until a slot of that type becomes available.

A slot represents, essentially:

  • A number of computing cores (e.g. 2)

  • An amount of RAM (e.g. 4 GB)

  • On a machine with a particular operating system installed (e.g. Debian Linux, version 8.0)

  • For a given amount of time (e.g. 3 hours)
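The slot description above, together with the matching step in which the scheduler looks for a free slot, can be sketched as a small data structure and a first-fit search. The class `Slot`, the function `find_free_slot`, the first-fit policy, and the rule that a slot "matches" when it offers at least the requested cores, RAM, and time on the same operating system are all hypothetical; the text only says that slots must match the requested characteristics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """Hypothetical model of one slot: cores, RAM, OS and time budget."""
    cores: int
    ram_gb: int
    os_name: str
    max_hours: int
    busy: bool = False


def find_free_slot(slots: list[Slot], cores: int, ram_gb: int,
                   os_name: str, hours: int) -> Optional[Slot]:
    """Return the first free slot satisfying the request, or None.

    Returning None models the scheduler waiting until a slot is free.
    """
    for slot in slots:
        if (not slot.busy
                and slot.cores >= cores
                and slot.ram_gb >= ram_gb
                and slot.os_name == os_name
                and slot.max_hours >= hours):
            return slot
    return None


if __name__ == "__main__":
    farm = [Slot(2, 4, "Debian 8.0", 3), Slot(8, 32, "Debian 8.0", 24)]
    slot = find_free_slot(farm, cores=2, ram_gb=4, os_name="Debian 8.0", hours=3)
    if slot is not None:
        slot.busy = True  # the algorithm now occupies this slot
```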

When the user algorithm occupies a slot on the backend, the platform will:

  1. Create an operating-system-level process, on the machine hosting the slot, that runs the user algorithm

  2. Ensure the algorithm does not consume more resources than prescribed. In the example above, that means: use at most 2 physical processing cores and consume at most 4 GB of RAM, all for at most 3 hours.
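On Linux, the kind of enforcement described in step 2 can be approximated with standard operating-system mechanisms. The sketch below is an assumption about how such limits might be applied, not a description of BEAT's actual implementation: the hypothetical `run_in_slot` pins the child process to a number of CPU cores with `os.sched_setaffinity`, caps its address space (an approximation of RAM usage) with `resource.setrlimit(resource.RLIMIT_AS, ...)`, and enforces the wall-clock limit through the `timeout` argument of `subprocess.run`. It is Linux-only.

```python
import os
import resource
import subprocess
import sys


def _clamped(limit: int, kind: int) -> tuple[int, int]:
    """Never request more than the current hard limit (raising it needs root)."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)
    return (limit, limit)


def run_in_slot(cmd: list[str], cores: int = 2, ram_gb: int = 4,
                max_hours: float = 3) -> subprocess.CompletedProcess:
    """Run `cmd` restricted to `cores` CPUs, `ram_gb` GB of address space,
    and `max_hours` of wall-clock time (Linux only)."""
    allowed_cpus = sorted(os.sched_getaffinity(0))[:cores]
    memory_limit = _clamped(ram_gb * 1024 ** 3, resource.RLIMIT_AS)

    def apply_limits() -> None:
        # Executed in the child process just before the command starts.
        os.sched_setaffinity(0, allowed_cpus)
        resource.setrlimit(resource.RLIMIT_AS, memory_limit)

    # On timeout, subprocess.run kills the child and raises TimeoutExpired.
    return subprocess.run(cmd, preexec_fn=apply_limits, capture_output=True,
                          text=True, timeout=max_hours * 3600)


if __name__ == "__main__":
    probe = "import os; print(len(os.sched_getaffinity(0)))"
    result = run_in_slot([sys.executable, "-c", probe], cores=1)
    print(result.stdout.strip())
```

The probe prints how many cores the child is allowed to use, which should match the `cores` argument when the machine has at least that many.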

Each slot in the platform is associated with at least one queue. A queue is simply a set of slots that share the same properties. Each queue also has a name, so that users and administrators can tell queues apart. Because all slots in a queue have the same properties, the scheduler treats them as interchangeable. The scheduler may handle any number of queues, which lets the BEAT platform support different combinations of computing resources and operating systems.
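Because every slot in a queue is identical, a queue can be modelled as a name, one set of shared slot properties, and a count of busy slots: the scheduler never needs to know *which* slot it gets. The class `Queue` and its methods below are hypothetical and only illustrate that idea.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical model of a named queue of identical slots."""
    name: str
    cores: int          # per-slot properties shared by every slot in the queue
    ram_gb: int
    os_name: str
    max_hours: int
    total_slots: int
    busy_slots: int = 0

    def try_acquire(self) -> bool:
        """Occupy any free slot; slots are interchangeable, so a counter suffices."""
        if self.busy_slots < self.total_slots:
            self.busy_slots += 1
            return True
        return False

    def release(self) -> None:
        """Free a slot when a block finishes."""
        if self.busy_slots == 0:
            raise RuntimeError(f"queue {self.name!r} has no busy slots")
        self.busy_slots -= 1
```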

When you create an experiment, you must select a default queue that is used to execute all blocks in the experiment unless you specify otherwise. Optionally, you may use the pull-down button in a block (enabled once you select an algorithm for that block) to override the default queue and execute that block's algorithm on a different one. There are no built-in limitations: block A can be executed on a queue based on Debian Linux while, in the same experiment, block B is executed on a queue based on Microsoft Windows. Overriding is also useful, for example, if your experiment uses a computationally intensive algorithm: you can run that block on a queue that allows longer execution times.
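The "default queue plus optional per-block override" rule can be expressed as a simple lookup. The function `resolve_queues`, its validation checks, and the queue names in the example are hypothetical; they only illustrate how each block ends up on exactly one queue.

```python
def resolve_queues(blocks: list[str], default_queue: str,
                   overrides: dict[str, str],
                   known_queues: set[str]) -> dict[str, str]:
    """Map every block to its queue: the per-block override if one was chosen,
    otherwise the experiment's default queue."""
    unknown_blocks = set(overrides) - set(blocks)
    if unknown_blocks:
        raise ValueError(f"overrides for unknown blocks: {sorted(unknown_blocks)}")
    assignment = {block: overrides.get(block, default_queue) for block in blocks}
    unknown_queues = {q for q in assignment.values() if q not in known_queues}
    if unknown_queues:
        raise ValueError(f"unknown queues: {sorted(unknown_queues)}")
    return assignment


if __name__ == "__main__":
    print(resolve_queues(["A", "B"], "debian-short", {"B": "windows-long"},
                         {"debian-short", "windows-long"}))
```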

Tip

Typically, systems are organized so that there are more slots on queues that consume fewer resources and fewer slots on queues that consume more resources. This allows for efficient resource usage while still providing a way to run long processing jobs.

10.2. Software resources

When the user process executes on the backend, effectively running the user algorithm, it is isolated from the backend by a special process we call the I/O daemon. The user process runs as a co-process of the I/O daemon, which is responsible for controlling it, reading and writing data from datasets and/or the disk cache, and collecting the standard output and error logs generated by user code. In this way, the user process has only minimal access to system resources and can be properly monitored. The following figure illustrates this relationship.

[Figure: sandbox.svg, the user process running as a co-process of the I/O daemon]
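The co-process arrangement described above can be illustrated with a generic parent/child pipe setup: the parent (standing in for the I/O daemon) feeds input to the child's standard input, reads its results from standard output, and collects its standard error as logs. This is only a sketch of the general pattern under that assumption; BEAT's real I/O daemon protocol is not described here, and `USER_CODE` and `run_user_process` are hypothetical.

```python
import subprocess
import sys

# Hypothetical user algorithm: doubles each integer it receives and logs to stderr.
USER_CODE = r'''
import sys
for line in sys.stdin:
    value = int(line)
    print(value * 2)
    print(f"processed {value}", file=sys.stderr)
'''


def run_user_process(inputs: list[int]) -> tuple[list[int], list[str], int]:
    """Act as a minimal 'daemon': send inputs, gather outputs and logs."""
    proc = subprocess.Popen(
        [sys.executable, "-c", USER_CODE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() writes all input, closes stdin, and reads both pipes fully.
    out, err = proc.communicate("".join(f"{v}\n" for v in inputs), timeout=60)
    results = [int(line) for line in out.splitlines()]
    return results, err.splitlines(), proc.returncode


if __name__ == "__main__":
    print(run_user_process([1, 2, 3]))
```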

When the I/O daemon launches the user process, it executes it using a predefined environment. An environment is nothing more than a simple wrapper script that launches the user code with access to a directory on the worker where useful modules are installed. For example, an environment based on the Python interpreter may have the NumPy package installed; another one may have OpenCV bindings, scikit-learn, or other packages. Each environment is isolated from the others and can contain any combination of packages chosen by the platform administrator. You can browse all environments available on the BEAT platform by selecting Environments on the System Resources tab. Each environment is accompanied by documentation describing what is installed in it.
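A wrapper that "launches the user code with access to a directory of installed modules" can be approximated for Python by prepending that directory to `PYTHONPATH` before starting the interpreter. This is an assumption about one possible mechanism, not BEAT's actual wrapper script; `run_in_environment` and the module `envpkg` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_environment(env_dir: str, script: str) -> str:
    """Run Python `script` with `env_dir` prepended to the module search path,
    mimicking a wrapper that exposes an environment's installed packages."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = env_dir if not existing else env_dir + os.pathsep + existing
    completed = subprocess.run([sys.executable, "-c", script], env=env,
                               capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as env_dir:
        # A stand-in for a package installed in the environment directory.
        Path(env_dir, "envpkg.py").write_text("GREETING = 'hello from env'\n")
        print(run_in_environment(env_dir, "import envpkg; print(envpkg.GREETING)"))
```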

When you create an experiment, you must select a default environment that is used to execute all blocks in the experiment unless you specify otherwise. Optionally, you may use the pull-down button in a block (enabled once you select an algorithm for that block) to override the default environment and execute that block's algorithm in a different one. There are no built-in limitations: block A can be executed in an environment based on Python while, in the same experiment, block B is executed in an environment based on Matlab. Overriding is also useful, for example, if your experiment uses older algorithms that do not work with recent versions of base software packages such as NumPy: you can use environments with previous versions of these packages for that purpose.
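Choosing an environment for an older algorithm amounts to finding one whose installed package version is below the version that breaks the algorithm. The sketch below assumes environments can be described as a mapping from package name to version string; the environment names, the function `pick_environment`, and the purely numeric version parsing are hypothetical simplifications (real version strings such as `1.9.0rc1` would need a proper parser).

```python
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as '1.8.2' (simplified)."""
    return tuple(int(part) for part in text.split("."))


def pick_environment(environments: dict[str, dict[str, str]], package: str,
                     below: str) -> Optional[str]:
    """Return the environment with the newest `package` version strictly
    below `below`, or None if no environment qualifies."""
    limit = parse_version(below)
    candidates = [
        (parse_version(pkgs[package]), name)
        for name, pkgs in environments.items()
        if package in pkgs and parse_version(pkgs[package]) < limit
    ]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    envs = {"python-legacy": {"numpy": "1.8.2"},
            "python-recent": {"numpy": "1.26.4"}}
    print(pick_environment(envs, "numpy", below="1.9"))
```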