2. Example: The Platform Deployed at Idiap Research Institute

This section gives some insight into the BEAT platform deployed at Idiap Research Institute, which is now publicly available.

2.1. Deployment Strategy

BEAT has been carefully designed so that a platform can easily be deployed in a distributed manner. At Idiap, we have opted for a deployment strategy that lies somewhere in between Distributed processing nodes and Load-balanced distributed architecture.

First, a server called beatweb hosts both the web server and a dedicated PostgreSQL database server. Second, a server called beatsched hosts the scheduler, which is in charge of splitting jobs across several worker nodes (named according to the pattern beatproc*). Finally, a dedicated server called beatadm is used for administration purposes. Cache data are stored on an NFS infrastructure.
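
The layout described above can be written down as a small data structure. The sketch below is only an illustration: the dictionary layout and the `roles_of` helper are hypothetical and are not part of BEAT, and `beatproc01` is an invented worker name used to exercise the `beatproc*` naming pattern.

```python
# Illustrative inventory of the deployment described above.
# The structure and the helper are assumptions, not BEAT configuration.
from fnmatch import fnmatch

DEPLOYMENT = {
    "beatweb": ["web server", "PostgreSQL database"],
    "beatsched": ["scheduler"],
    "beatproc*": ["worker"],  # one pattern entry stands for all worker nodes
    "beatadm": ["administration"],
}


def roles_of(hostname):
    """Return the roles of a host; worker names are matched against the pattern."""
    for pattern, roles in DEPLOYMENT.items():
        if fnmatch(hostname, pattern):
            return roles
    return []


if __name__ == "__main__":
    print(roles_of("beatweb"))
    print(roles_of("beatproc01"))  # hypothetical worker name
```

Keeping the workers as a single pattern entry mirrors the text, which names them only by pattern and does not state how many there are.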

2.2. Hardware Specifications

2.2.1. Computing Hardware

Following a thorough comparison and evaluation of several IT solutions across all aspects (features, operations, maintenance, warranty, cost, etc.), Idiap chose in 2011 to rely on IBM BladeCenter (H) solutions as its processing resources. In retrospect, experience has shown that although such solutions have some caveats (IBM hardware undoubtedly requires greater knowledge, and patience, to reach configuration objectives), they ultimately lower the overall operational burden (and cost) and provide the means to increase the overall system performance significantly and easily. Based on that experience, Idiap chose in early 2014 to extend its processing hardware base with IBM FlexSystem solutions for the BEAT platform. Overall, there are TODO nodes, each node consisting of two Intel Xeon E5-2690v2 processors (20 cores per node) with 256 GB of DDR3 RAM.

2.2.2. Storage

Historically, Idiap has relied on NetApp filers as its main storage resource. Even though competitors' alternatives were analyzed whenever major new investments were considered, Idiap has stuck to this original choice for the BEAT platform and chose a NetApp 3220 dual-head network filer with 20 TB of mirrored storage (10 TB usable capacity).

2.2.3. Summary

Fig. 2.1 Physical hardware of the platform deployed at Idiap

The resulting hardware infrastructure is summarized in Fig. 2.1. Each machine communicates with the storage through an HP ProCurve E8212zl 10 Gbit/s switch.

2.3. Virtualization

Virtualizing resources (servers, storage, networks) is now part of every IT department's life. It consists of creating a virtual (rather than actual) version of something. For instance, a virtual machine (VM) is an abstraction of the computer hardware that allows a single machine to behave as if it were many machines.

While virtualization is not a strict requirement for the deployment of a BEAT platform, it provides significant benefits such as:

  • Dynamic load balancing, by moving virtual machines to underutilized servers and/or by reallocating and instantiating resources whenever required.

  • Improving flexibility by allowing several applications requiring different environments to run on the same physical machine.

  • Enabling a virtual image on a machine to be instantly moved to another server, e.g. if a machine failure occurs.

  • Improving system reliability and availability, since virtualization may prevent system crashes due to memory corruption caused by software like device drivers, and it helps to avoid service interruption in case of physical maintenance.

For the platform deployed at Idiap, virtualization is a versatile tool to perform dynamic load balancing. All the previously described servers (beatweb, beatsched and beatadm) as well as the workers are virtual machines. This makes it possible to create several single-core workers with different computing environments from a single powerful multi-core machine, and to adapt the worker specifications with little effort whenever required. Similarly, the capacity (RAM or number of cores) of a server such as beatweb can be increased as the platform matures and website traffic grows.
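
To make the idea of carving single-core workers out of one multi-core machine more concrete, the sketch below computes how many identical worker VMs fit on a node. The node figures (20 cores, 256 GB of RAM) come from the hardware section; the per-worker RAM and the resources reserved for the host are assumptions chosen only for illustration, and `max_workers` is a hypothetical helper, not part of BEAT.

```python
def max_workers(node_cores, node_ram_gb, worker_ram_gb, worker_cores=1,
                reserved_cores=0, reserved_ram_gb=0):
    """Number of identical worker VMs that fit on one physical node.

    The count is limited by whichever resource (cores or RAM) runs out first.
    """
    if worker_cores <= 0 or worker_ram_gb <= 0:
        raise ValueError("worker size must be positive")
    by_cores = (node_cores - reserved_cores) // worker_cores
    by_ram = (node_ram_gb - reserved_ram_gb) // worker_ram_gb
    return max(0, min(by_cores, by_ram))


if __name__ == "__main__":
    # Assumed worker size: 1 core and 8 GB of RAM each.
    print(max_workers(node_cores=20, node_ram_gb=256, worker_ram_gb=8))
    # Assumed larger workers: 1 core and 16 GB of RAM each.
    print(max_workers(node_cores=20, node_ram_gb=256, worker_ram_gb=16))
```

With small workers the core count is the limiting factor, whereas with memory-hungry workers the RAM becomes the bottleneck; being able to change this trade-off without touching the hardware is the flexibility the paragraph above refers to.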

2.4. Software Specifications

Although its Unix history led it to venture onto the soil of various Unix-like operating systems, Idiap nowadays relies solely on the Debian Linux (64-bit) distribution to power its server infrastructure. Favoring stability and security over the leading edge of open source software, as far as servers are concerned, Idiap relies in particular on the Debian/Stable branch, also known as Debian/Wheezy.

Since 2011, Idiap has been virtualizing its server resources using the open source virtualization and high-availability software listed below, all readily available as (appropriately bundled and pre-configured) Debian packages:

  • KVM (hardware-accelerated virtualization)

  • QEMU (x86 hardware emulation/virtualization)

  • libvirt (virtualization (abstraction) API)

  • Corosync (group (cluster) communication system)

  • Pacemaker (high-availability resource manager)

2.5. Storage Organization

An NFS infrastructure is employed to store data in a distributed manner. In particular, data are organized into several partitions as follows:

  • /remote/dataset contains the raw data from the scientific databases on which experiments are conducted.

  • /remote/cache contains the data generated by the scientific experiments, outputs of all intermediate blocks included.

  • /remote/sw contains the full stack of BEAT software. Installing the software centrally rather than locally on each server/machine reduces maintenance effort, since a software update is performed once centrally rather than on several nodes.

  • /remote/environment contains environments to run scientific experiments.

  • /remote/prefix contains the definition of user-defined objects such as algorithms, toolchains and experiments.

The BEAT machines have different access levels to each of these partitions. This is useful for security purposes, since each BEAT server only has the file permissions it needs to do its work. The administrative server beatadm has read and write access to all partitions, whereas the other BEAT servers have limited access.

beatsched has read and write access to a single partition, /remote/cache/, so that cache files can be moved to their final destinations when a worker successfully completes a job. The partitions /remote/dataset, /remote/environment, /remote/prefix and /remote/sw are accessible to it in read mode only.

Similarly, the workers beatproc* have read-only access to the partitions /remote/dataset, /remote/environment, /remote/prefix and /remote/sw. In addition, they have full read and write access to /remote/cache/ so that they can read the inputs and write the outputs of the jobs assigned to them.

Finally, beatweb has read and write access to /remote/prefix, allowing users to add and remove content through the BEAT website. It has read-only access to /remote/cache, /remote/sw and /remote/environment, and no access to /remote/dataset, which it does not need.
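
The access levels described in this section can be collected into a single matrix. The following is a minimal sketch that simply transcribes the text; the `ACCESS` table and the `can_read` / `can_write` helpers are hypothetical names introduced for illustration, and the sketch says nothing about how the permissions are actually enforced on the NFS infrastructure.

```python
# Access levels per host and partition, transcribed from the text above.
# "rw" = read and write, "ro" = read only, None = no access.
PARTITIONS = [
    "/remote/dataset",
    "/remote/cache",
    "/remote/sw",
    "/remote/environment",
    "/remote/prefix",
]

ACCESS = {
    # The administrative server can read and write everywhere.
    "beatadm": {p: "rw" for p in PARTITIONS},
    # Scheduler and workers may only write to the cache.
    "beatsched": {
        "/remote/dataset": "ro",
        "/remote/cache": "rw",
        "/remote/sw": "ro",
        "/remote/environment": "ro",
        "/remote/prefix": "ro",
    },
    "beatproc*": {
        "/remote/dataset": "ro",
        "/remote/cache": "rw",
        "/remote/sw": "ro",
        "/remote/environment": "ro",
        "/remote/prefix": "ro",
    },
    # The web server may only write user-defined objects and never sees raw data.
    "beatweb": {
        "/remote/dataset": None,
        "/remote/cache": "ro",
        "/remote/sw": "ro",
        "/remote/environment": "ro",
        "/remote/prefix": "rw",
    },
}


def can_read(host, partition):
    """True if the host has read-only or read-write access to the partition."""
    return ACCESS[host].get(partition) in ("ro", "rw")


def can_write(host, partition):
    """True if the host has read-write access to the partition."""
    return ACCESS[host].get(partition) == "rw"


if __name__ == "__main__":
    for host in ACCESS:
        writable = [p for p in PARTITIONS if can_write(host, p)]
        print(host, "can write to", writable)
```

Laid out this way, the least-privilege design is easy to check: apart from beatadm, every machine can write to exactly one partition.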