.. vim: set fileencoding=utf-8 :
.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/ ..
.. Contact: beat.support@idiap.ch ..
.. ..
.. This file is part of the beat.web module of the BEAT platform. ..
.. ..
.. Commercial License Usage ..
.. Licensees holding valid commercial BEAT licenses may use this file in ..
.. accordance with the terms contained in a written agreement between you ..
.. and Idiap. For further information contact tto@idiap.ch ..
.. ..
.. Alternatively, this file may be used under the terms of the GNU Affero ..
.. Public License version 3 as published by the Free Software and appearing ..
.. in the file LICENSE.AGPL included in the packaging of this file. ..
.. The BEAT platform is distributed in the hope that it will be useful, but ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE. ..
.. ..
.. You should have received a copy of the GNU Affero Public License along ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/. ..
.. _administratorguide-idiap_platform:
Example: The Platform Deployed at Idiap Research Institute
==========================================================
This section gives some insight into the BEAT platform deployed at Idiap
Research Institute, which is now publicly available.
.. _administratorguide-idiap_platform-strategy:
Deployment Strategy
-------------------
BEAT has been designed so that a platform can easily be deployed in a
distributed manner. At Idiap, we have opted for a deployment strategy that
lies somewhere in between
:ref:`administratorguide-hardware_guidelines-distributed-nodes` and
:ref:`administratorguide-hardware_guidelines-distributed-architecture`.
First, a server called `beatweb` hosts both the web server and a
dedicated PostgreSQL database server. Second, a server called `beatsched`
hosts the scheduler, which is in charge of splitting jobs across several
worker nodes (named according to the pattern `beatproc*`). Finally, for
administration purposes, a dedicated server called `beatadm` is employed.
Cache data are stored on an NFS infrastructure.
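
The roles described above can be summarized as a small lookup table. The
following is only an illustrative sketch: the host names come from the text,
while the ``Role`` class and the ``services_for`` helper are hypothetical names
introduced here and are not part of BEAT.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Role:
    """One entry of the (hypothetical) deployment inventory."""
    pattern: str     # host name or glob pattern, e.g. "beatproc*"
    services: tuple  # services hosted by matching machines


# Roles as described in the text; the structure itself is an assumption.
ROLES = (
    Role("beatweb", ("web server", "PostgreSQL database")),
    Role("beatsched", ("scheduler",)),
    Role("beatproc*", ("worker",)),
    Role("beatadm", ("administration",)),
)


def services_for(hostname):
    """Return the services of the first role whose pattern matches hostname."""
    for role in ROLES:
        if fnmatchcase(hostname, role.pattern):
            return role.services
    return ()
```

For instance, ``services_for("beatproc03")`` resolves to the worker role,
because the name matches the `beatproc*` pattern.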
.. _administratorguide-idiap_platform-hardware:
Hardware Specifications
-----------------------
Computing Hardware
..................
Following a thorough comparison and evaluation of all aspects - features,
operations, maintenance, warranty, cost, etc. - of several IT solutions,
Idiap chose in 2011 to rely on IBM BladeCenter (H) solutions as its
processing resources. In retrospect, experience has shown that although such
solutions do have drawbacks - IBM hardware undoubtedly requires greater
knowledge (and patience) to reach configuration objectives - they ultimately
lower the overall operational burden (and cost) and provide the means to
increase overall system performance significantly and easily.
Based on that experience, Idiap chose in early 2014 to extend its processing
hardware base with IBM FlexSystem solutions for the BEAT platform. Overall,
there are TODO nodes, each consisting of two Intel Xeon E5-2690v2 processors
(20 cores in total) with 256 GB of DDR3 RAM.
Storage
.......
Historically, Idiap has relied on NetApp filers as its main storage resource.
Even though competitors' alternatives have been analyzed whenever major new
investments were considered, Idiap has stuck to this original choice for
the BEAT platform, and chose a NetApp 3220 dual-head network filer with
20 TB of *mirrored* storage (10 TB usable capacity).
Summary
.......
.. _administratorguide-idiapplatform-hardware-physical:
.. figure:: img/physical-platform.*
   :width: 80%

   Physical hardware of the platform deployed at Idiap
The resulting hardware infrastructure is summarized in
:numref:`administratorguide-idiapplatform-hardware-physical`. Each machine
communicates with the storage through a 10 Gbit/s HP ProCurve E8212zl
switch.
.. _administratorguide-idiap_platform-virtualization:
Virtualization
--------------
Virtualizing resources - servers, storage, networks - is now part of every IT
department's life. It consists of creating a virtual (rather than actual)
version of something. For instance, a virtual machine (VM) is an abstraction
of the computer hardware that allows a single machine to behave as if it were
many machines.
While virtualization is not a strict requirement for the deployment of a BEAT
platform, it provides significant benefits such as:
* Dynamic load balancing, by moving virtual machines to underutilized servers
and/or by reallocating and instantiating resources whenever required.
* Improving flexibility by allowing several applications requiring different
environments to run on the same physical machine.
* Enabling a virtual image on a machine to be instantly moved to another
  server, e.g. if a machine failure occurs.
* Improving system reliability and availability, since virtualization may
prevent system crashes due to memory corruption caused by software like
device drivers, and it helps to avoid service interruption in case of
physical maintenance.
For the platform deployed at Idiap, virtualization is a versatile tool to
perform dynamic load balancing. All the previously described servers
(`beatweb`, `beatsched` and `beatadm`) as well as the workers are indeed
virtual machines. This allows the creation of several single-core workers with
different computing environments from a single powerful multi-core machine,
and makes it possible to adapt the worker specifications without much effort
whenever required. Similarly, the capacity (RAM or number of cores) of a
server such as `beatweb` can be increased as the platform becomes more
mature and website traffic grows.
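
To make the idea of carving workers out of one physical node concrete, the
sketch below estimates how many virtual workers a node could host and how much
RAM each would get. The function ``plan_workers`` and its parameters are
hypothetical. It assumes RAM is split evenly between workers and ignores
hypervisor overhead, which a real deployment would have to account for.

```python
def plan_workers(node_cores, node_ram_gb, cores_per_worker=1, reserved_cores=0):
    """Return (worker_count, ram_per_worker_gb) for one physical node.

    Assumption: RAM is divided evenly among workers; hypervisor overhead
    is not modeled.
    """
    if cores_per_worker <= 0:
        raise ValueError("cores_per_worker must be positive")
    usable_cores = node_cores - reserved_cores
    if usable_cores < cores_per_worker:
        raise ValueError("not enough cores for a single worker")
    worker_count = usable_cores // cores_per_worker
    return worker_count, node_ram_gb / worker_count


# A node as described in the hardware section: 20 cores, 256 GB of RAM.
count, ram_gb = plan_workers(20, 256)
print(f"{count} single-core workers with {ram_gb} GB of RAM each")
```

Reserving a few cores for the host (for example ``reserved_cores=4``) reduces
the worker count and leaves more RAM per worker.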
.. _administratorguide-idiap_platform-software:
Software Specifications
-----------------------
Though its Unix history had it venture on the soil of various Unix-like
operating systems, Idiap nowadays relies solely on the Debian Linux (64-bit)
distribution to power its server infrastructure. Favoring stability and
security over the leading edge of open source software - as far as servers are
concerned - Idiap relies in particular on the Debian/Stable branch, also known
as *Debian/Wheezy*.
Since 2011, Idiap has been virtualizing its server resources using the open
source virtualization and high-availability software described below, all
readily available as (appropriately bundled and pre-configured) Debian
packages:
* KVM (hardware-accelerated virtualization)
* QEMU (x86 hardware emulation/virtualization)
* libvirt (virtualization abstraction API)
* Corosync (group (cluster) communication system)
* Pacemaker (high-availability resource manager)
.. _administratorguide-idiap_platform-storage:
Storage Organization
--------------------
An NFS infrastructure is employed to store data in a distributed manner. In
particular, data are organized into several partitions as follows:
* **/remote/dataset** contains the raw data from the scientific databases on
which experiments are conducted.
* **/remote/cache** contains the data generated by the scientific experiments,
  including the outputs of all intermediate blocks.
* **/remote/sw** contains the full stack of BEAT software. Installing the
  software centrally rather than locally on each server/machine reduces
  maintenance effort, since a software update is performed once centrally
  rather than on each node.
* **/remote/environment** contains environments to run scientific experiments.
* **/remote/prefix** contains the definition of user-defined objects such as
algorithms, toolchains and experiments.
The BEAT machines have different access levels to each of these partitions.
This is useful for security purposes, since each BEAT server only has the file
permissions it needs to do its work. The administrative server `beatadm` has
read and write access to all partitions. In contrast, the other BEAT
servers have limited access to these partitions.
`beatsched` has read and write access to a single partition, `/remote/cache/`,
so that cache files can be moved to their final destinations when a worker
successfully completes a job. In contrast, the partitions `/remote/dataset`,
`/remote/environment`, `/remote/prefix` and `/remote/sw` are only accessible
in read mode.
Similarly, the workers `beatproc*` have read-only access to the partitions
`/remote/dataset`, `/remote/environment`, `/remote/prefix` and `/remote/sw`.
In addition, they have full read and write access to `/remote/cache/` so that
they can read the inputs and write the outputs of the jobs they are
assigned.
Finally, `beatweb` has read and write access to `/remote/prefix`, allowing
users to add and remove content through the BEAT website. It has read-only
access to `/remote/cache`, `/remote/sw` and `/remote/environment`, and no
access to `/remote/dataset`, which it does not need.
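
The access levels described in this section can be collected into a single
matrix, which is convenient when auditing NFS exports. The sketch below is an
illustration only: the permissions are transcribed from the text, whereas the
``ACCESS`` table, ``role_of`` and ``is_allowed`` are hypothetical names and do
not correspond to any BEAT or NFS API.

```python
RO, RW = "ro", "rw"

# Access matrix transcribed from the text; a missing key means "no access".
ACCESS = {
    "beatadm": {"dataset": RW, "cache": RW, "sw": RW,
                "environment": RW, "prefix": RW},
    "beatsched": {"dataset": RO, "cache": RW, "sw": RO,
                  "environment": RO, "prefix": RO},
    "beatproc": {"dataset": RO, "cache": RW, "sw": RO,
                 "environment": RO, "prefix": RO},
    "beatweb": {"cache": RO, "sw": RO, "environment": RO, "prefix": RW},
}


def role_of(hostname):
    """Map any worker name following the beatproc* pattern to one role."""
    return "beatproc" if hostname.startswith("beatproc") else hostname


def is_allowed(hostname, partition, write=False):
    """Return True if hostname may read (or write) /remote/<partition>."""
    mode = ACCESS.get(role_of(hostname), {}).get(partition)
    if mode is None:
        return False  # unknown host or partition not mounted
    return mode == RW or not write


print(is_allowed("beatweb", "dataset"))                # no access at all
print(is_allowed("beatproc07", "cache", write=True))   # workers write outputs
```

Encoding the absence of access as a missing key keeps the table close to the
prose: `beatweb` simply has no entry for `/remote/dataset`.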