.. vim: set fileencoding=utf-8 :

.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/ ..
.. Contact: beat.support@idiap.ch ..
.. ..
.. This file is part of the beat.web module of the BEAT platform. ..
.. ..
.. Commercial License Usage ..
.. Licensees holding valid commercial BEAT licenses may use this file in ..
.. accordance with the terms contained in a written agreement between you ..
.. and Idiap. For further information contact tto@idiap.ch ..
.. ..
.. Alternatively, this file may be used under the terms of the GNU Affero ..
.. Public License version 3 as published by the Free Software and appearing ..
.. in the file LICENSE.AGPL included in the packaging of this file. ..
.. The BEAT platform is distributed in the hope that it will be useful, but ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE. ..
.. ..
.. You should have received a copy of the GNU Affero Public License along ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/. ..


.. _backend:

=========================
 Queues and Environments
=========================

To take full advantage of the existing hardware and software resources in
your experiments, it is useful to understand how the |project| platform
backend executes them. A |project| backend is composed of a central
*scheduler* and associated *worker* nodes, where the user algorithms are
actually executed.

When you click the ``Go!`` button on the experiment configuration page, the
declaration of the experiment is transmitted to the scheduler, which must then
run the experiment until it finishes, until you press the ``stop`` button, or
until an error occurs.

As described in the "Toolchains" section of "Getting Started with BEAT" in the
`BEAT documentation`_, the scheduler first breaks the toolchain into a
sequence of executable blocks with dependencies. For example, block ``B`` must
run after block ``A``.
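
The dependency ordering can be illustrated with a minimal sketch, assuming the
toolchain is described as a mapping from each block to the blocks whose
outputs it consumes. It uses the ``graphlib`` module from the Python standard
library (Python 3.9+); it is not the |project| scheduler's actual code, and the
block names are hypothetical.

.. code-block:: python

   from graphlib import TopologicalSorter

   # Hypothetical toolchain: each block maps to the set of blocks it depends on.
   # Block "B" consumes the output of "A", so it must run after "A".
   dependencies = {
       "A": set(),
       "B": {"A"},
       "C": {"A"},
       "D": {"B", "C"},
   }


   def execution_stages(deps):
       """Group blocks into stages: every block in a stage has all of its
       dependencies satisfied by earlier stages."""
       sorter = TopologicalSorter(deps)
       sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
       stages = []
       while sorter.is_active():
           ready = sorted(sorter.get_ready())
           stages.append(ready)
           sorter.done(*ready)  # pretend these blocks finished successfully
       return stages


   print(execution_stages(dependencies))  # [['A'], ['B', 'C'], ['D']]

Blocks in the same stage, such as ``B`` and ``C`` here, do not depend on one
another, so whether they actually run at the same time depends only on the
resources available when they become ready.
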
Each block is then scheduled for execution according to current resource
availability. If no resources are available, the experiment is paused until
further resources are released for you. To prevent a single user from
draining all available resources, there is a limit on the amount of resources
each user can consume on the backend at any given time. The system
administrator configures this limit and can tighten or relax it on demand.


Hardware resources
------------------

Resources in |project| are organized into what we call *slots*. When the
scheduler wants to execute the algorithm for a particular block of your
experiment, it checks whether any *slot* on the farm matching your requested
characteristics is free. If so, the algorithm is executed on that slot.
Otherwise, the scheduler waits until a slot of that type becomes available.

A *slot* represents, essentially:

* A number of computing cores (e.g. 2)
* An amount of RAM (e.g. 4 GB)
* On a machine with a particular operating system installed (e.g. Debian
  Linux, version 8.0)
* For a given amount of time (e.g. 3 hours)

When the user algorithm occupies a slot on the backend, the platform will:

1. Create an operating-system-level process on the machine hosting the slot
   to run the user algorithm.
2. Ensure the algorithm does not consume more resources than prescribed. In
   the example above, that means occupying 2 physical processing cores and
   consuming at most 4 GB of RAM, all for at most 3 hours.

Each slot in the platform is associated with at least one *queue*. A *queue*
is simply a set of slots that share the **same** properties. Queues also have
a name, which allows users and administrators to distinguish them. Because
all slots in a queue have the same properties, the scheduler makes no
distinction between them. The scheduler may handle any number of queues,
which allows the |project| platform to support different combinations of
computing resources and operating systems.
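
The interplay between queues, free slots and the per-user limit can be
sketched with a small in-memory model. This is a hypothetical illustration,
not the |project| scheduler: the ``SlotSpec``, ``Queue`` and ``Scheduler``
classes and the queue name are invented for this example, and expressing the
per-user limit as a number of slots (rather than cores or memory) is an
assumption.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import Dict


   @dataclass(frozen=True)
   class SlotSpec:
       """Properties shared by every slot of a queue (hypothetical model)."""
       cores: int
       memory_gb: int
       operating_system: str
       max_hours: float


   @dataclass
   class Queue:
       """A named set of identical slots; the scheduler does not tell them apart."""
       name: str
       spec: SlotSpec
       total_slots: int
       busy_slots: int = 0

       def has_free_slot(self) -> bool:
           return self.busy_slots < self.total_slots


   @dataclass
   class Scheduler:
       queues: Dict[str, Queue]
       per_user_limit: int  # assumption: limit counted in slots
       usage: Dict[str, int] = field(default_factory=dict)

       def try_assign(self, user: str, queue_name: str) -> bool:
           """Occupy one slot of the queue for the user, or return False so
           that the block keeps waiting."""
           queue = self.queues[queue_name]
           if self.usage.get(user, 0) >= self.per_user_limit:
               return False  # the user is already at the per-user cap
           if not queue.has_free_slot():
               return False  # every slot of this queue is busy
           queue.busy_slots += 1
           self.usage[user] = self.usage.get(user, 0) + 1
           return True

       def release(self, user: str, queue_name: str) -> None:
           """Free a slot once the block finishes or is stopped."""
           self.queues[queue_name].busy_slots -= 1
           self.usage[user] -= 1


   spec = SlotSpec(cores=2, memory_gb=4,
                   operating_system="Debian Linux 8.0", max_hours=3.0)
   queue = Queue("debian-2c-4g", spec, total_slots=2)
   scheduler = Scheduler({queue.name: queue}, per_user_limit=1)

   print(scheduler.try_assign("alice", "debian-2c-4g"))  # True
   print(scheduler.try_assign("alice", "debian-2c-4g"))  # False: per-user limit
   print(scheduler.try_assign("bob", "debian-2c-4g"))    # True
   print(scheduler.try_assign("carol", "debian-2c-4g"))  # False: queue is full

Because all slots of a queue are interchangeable, the model only needs a busy
counter per queue rather than tracking individual slots.
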
When you create an experiment, you **must** select a default queue that will
be used to execute all blocks in the experiment, unless otherwise specified.
Optionally, you may use the pull-down button in a block (enabled once you
select an algorithm for that block) to override the default queue and execute
the algorithm of that block on a different one. **No built-in limitations exist**:
block ``A`` can be executed on a queue based on Debian Linux while, in the
same experiment, block ``B`` is executed on a queue based on Microsoft
Windows. This is also useful, for example, if your experiment uses a
computing-intensive algorithm: you can run that block on a queue that allows
longer execution times.

.. tip::

   Typically, systems are organized so that there are more slots on queues
   that consume fewer resources and fewer slots on queues that consume more
   resources. This allows for optimal resource usage while still providing a
   way to run long processing jobs.


Software resources
------------------

When the user process executes on the backend, effectively running the user
algorithm, it is isolated from the backend by a special process we call an
*I/O daemon*. In practice, the user process works as a co-process of the I/O
daemon, which is responsible for controlling it, reading and writing data
from datasets and/or the disk cache, and collecting the standard output and
error logs generated by the user code. In this way, the user process has only
minimal access to system resources and can be properly monitored. The
following figure illustrates this relationship.

.. image:: img/sandbox.*

When the I/O daemon launches the user process, it executes it using a
predefined *environment*. An environment is nothing more than a simple
wrapper script that launches the user code with access to a directory on the
worker where useful modules are installed. For example, an environment based
on the Python interpreter may have the NumPy_ package installed.
Another may have OpenCV bindings, scikit-learn, or other packages. Each
environment is isolated from the others and can contain any combination of
packages, as chosen by the platform administrator. You can browse all
`available environments`_ on the |project| platform by selecting
``Environments`` on the ``System Resources`` tab. Each environment comes with
documentation describing what is installed in it.

When you create an experiment, you **must** select a default environment that
will be used to execute all blocks in the experiment, unless otherwise
specified. Optionally, you may use the pull-down button in a block (enabled
once you select an algorithm for that block) to override the default
environment and execute the algorithm of that block in a different one.
**No built-in limitations exist**: block ``A`` can be executed in an
environment based on Python while, in the same experiment, block ``B`` is
executed in an environment based on Matlab. This is also useful, for example,
if your experiment uses older algorithms that do not work with recent
versions of base software packages such as NumPy_: you can use environments
with earlier versions of these packages for that purpose.

.. include:: ../links.rst
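
The rule for picking the queue and environment of each block (use the
experiment default unless the block overrides it) can be sketched as follows.
The classes, algorithm names, queue names and environment names are
hypothetical and do not reflect the |project| experiment declaration format.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import Dict, Optional, Tuple


   @dataclass
   class BlockConfig:
       """Hypothetical per-block settings; None means "use the default"."""
       algorithm: str
       queue: Optional[str] = None
       environment: Optional[str] = None


   @dataclass
   class ExperimentConfig:
       """Hypothetical experiment-wide defaults plus per-block settings."""
       default_queue: str
       default_environment: str
       blocks: Dict[str, BlockConfig] = field(default_factory=dict)

       def resolve(self, block_name: str) -> Tuple[str, str]:
           """Return the (queue, environment) pair the block will run with."""
           block = self.blocks[block_name]
           queue = block.queue if block.queue is not None else self.default_queue
           environment = (
               block.environment
               if block.environment is not None
               else self.default_environment
           )
           return queue, environment


   experiment = ExperimentConfig(
       default_queue="linux-short",
       default_environment="python-recent",
       blocks={
           # Uses both defaults.
           "preprocess": BlockConfig("user/preprocess/1"),
           # Computing-intensive: overrides the queue only.
           "train": BlockConfig("user/train/1", queue="linux-long"),
           # Older algorithm needing an older NumPy: overrides the environment.
           "legacy": BlockConfig("user/legacy/1", environment="python-old-numpy"),
       },
   )

   for name in experiment.blocks:
       print(name, experiment.resolve(name))

Queue and environment overrides are independent of each other, so a block can
change one, both, or neither.
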