5. Installation

In this section, we provide basic instructions and fundamental ideas required to deploy the BEAT platform. Depending on the deployment strategy (single machine or distributed across several machines), the installation instructions will of course differ. Nevertheless, configuring and installing a simple platform instance remains reasonably easy.

5.1. Installing beat.web

The BEAT platform is written as a set of Python packages. This package (beat.web), in particular, constitutes the central deployment pillar of a BEAT platform instance. It is built on top of Django, a web framework used as its base development library. If you are unfamiliar with this framework but wish to deploy or develop the BEAT platform, it is recommended you familiarize yourself with it.

To deploy a platform on a single machine, it is therefore sufficient to install beat.web to get the full BEAT software stack. The recipe is as follows:

$ # after downloading and extracting the beat.web package
$ python bootstrap-buildout.py
$ ./bin/buildout

These two commands should download and install all missing dependencies and generate a fully operational test and development environment.

Note

cpulimit has been superseded by the use of Docker

Tip

If you’d like to speed up the installation, it is strongly advised you prepare a preset virtual environment (see the virtualenv package) with all required dependencies, so that ./bin/buildout does not download and install all of them every time you clean up. This technique allows you to quickly clean up and restart your working environment, which is useful during development.

In order to fetch currently needed dependencies, run:

$ ./bin/buildout #to setup once
$ ./bin/pip freeze > requirements.txt

Examine the file requirements.txt and remove packages you are either developing locally (e.g., all that are under src) or that you think you don’t need. The command pip freeze reports all installed packages, not only those needed by your project. If the Python interpreter you used for bootstrapping already had a good set of packages installed, you may see them there.
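
If your locally developed packages show up as editable entries (lines starting with -e) in the output of pip freeze, a quick way to drop them might be (a sketch, not a required step):

$ # keep everything except editable (locally developed) packages
$ grep -v '^-e ' requirements.txt > requirements.clean.txt
$ mv requirements.clean.txt requirements.txt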

Once you have a satisfying requirements.txt file, you may proceed to recreate a virtualenv you’ll use for your development. Just call:

$ virtualenv ~/work/beat-env #--system-site-packages

This command creates the virtual environment. By default, it does not include system packages; you may override that by specifying --system-site-packages as suggested above. Then, install the required packages into your new virtual environment:

$ ~/work/beat-env/bin/pip install -r requirements.txt

After that step is done, your virtual environment is ready for deployment. You may now start from scratch to develop beat.web, using the Python interpreter of your virtualenv as the base:

$ cd beat.web
$ git clean -fdx #full clean-up
$ ~/work/beat-env/bin/python bootstrap-buildout.py
$ ./bin/buildout

You’ll notice the buildout step now takes considerably less time, and you may repeat this last step as often as needed. pip is a very flexible tool and you may use it to manage the virtualenv, installing and removing packages as needed.
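
For instance, to add or remove a package in that environment (using Sphinx purely as an illustration):

$ ~/work/beat-env/bin/pip install sphinx
$ ~/work/beat-env/bin/pip uninstall sphinx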

5.2. Documentation

The documentation project is divided into three parts. The user guide is the only one built automatically as part of the buildout procedure. The API and administrator guides need to be compiled manually if required.

To build the API documentation, just do:

$ ./bin/sphinx-apidoc --separate -d 2 --output=doc/api/api beat beat/web/*/migrations beat/web/*/tests
$ ./bin/sphinx-build doc/api html/api

To build the administrator guide, just do:

$ ./bin/sphinx-build doc/admin html/admin

The above commands build the stated guides, in HTML format, and dump the results into the local directory html. You may then navigate to that directory and open the file index.html with your preferred web browser to browse the available documentation.

The basic user guide, which includes information for users of the platform, is built automatically upon buildout. If you wish to build it and place it alongside the other guides, you may do so like this:

$ ./bin/sphinx-build doc/user html/user

5.3. Unit Testing

After installation, it is possible to run a suite of unit tests to check the sanity of the installation. To do so, use:

$ ./bin/django test --settings=beat.web.settings.test -v 1

You may pass filtering criteria to just launch tests for a particular set of beat.web applications. For example, to run tests only concerning beat.web.toolchains, run:

$ ./bin/django test --settings=beat.web.settings.test -v 1 beat.web.toolchains.tests

To measure test coverage, run the test suite through the coverage utility:

$ ./bin/coverage run --source='./beat/web' ./bin/django test --settings=beat.web.settings.test
$ ./bin/coverage report

Or, to generate an HTML report (written to the htmlcov directory by default):

$ ./bin/coverage html

Tip

You may significantly speed up your testing by re-using the same test database from run to run. In order to do this, just specify the flag --keepdb when you run your tests:

$ ./bin/django test --settings=beat.web.settings.test -v 1 --keepdb

In this case, Django will create and keep a test database called test.sql3 in your current directory. You may delete it when you’re done.

5.4. End-to-End Testing

Protractor is an e2e (end-to-end) testing tool for web apps. Protractor runs tests through Selenium using a real browser, and as such needs a headed environment and a compatible browser installed.

Warning

Protractor will open a new browser window in the foreground when it is started.

5.4.1. Setup

There are two system dependencies to run Selenium:

  • Java 8 must be available in your PATH

  • If you want to run the testing in a GNOME environment, you need GConf

Download/update Protractor’s dependencies into the local repository (Selenium & more):

./bin/webdriver-manager update

5.4.2. Running tests with the provided script

The protractor.sh script is a one-liner to run Protractor tests. It handles database creation/saving/restoring and manages the required local server processes. However, it assumes several things:

  • It is being run in the top directory of the beat.web repository

  • ./bin/buildout has already been run successfully in the repository, with the default development configuration

  • Protractor’s .conf file is ./protractor-conf.js

  • No additional arguments need to be passed to webdriver-manager or Django runserver

  • Django uses ./django.sql3 as the database

  • If ./template.django.sql3 does not exist, the default database generated by ./bin/django install is sufficient for the basic tests. However, some tests will fail, so it is suggested to provide a database with experiments that have been run successfully.
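
If all of these assumptions hold, running the end-to-end tests should then amount to invoking the script from the repository root (a sketch; check the script itself for any options it accepts):

$ ./protractor.sh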

5.4.3. Manual test running

If the protractor.sh script does not work for your setup, you can run the tests manually.

The webdriver-manager must be running while testing. To run tests using a local BEAT web server, you must have the BEAT web server up as well.

5.4.3.1. Starting the webdriver server

  • Start the webdriver server in a separate shell (or append & to run it as a background process in the current shell)

    ./bin/webdriver-manager start
    

    Important

    You may only have one webdriver-manager instance running at a time.

  • After the webdriver finishes initialization, you can run tests

    ./bin/protractor protractor-conf.js
    
  • If you started your webdriver server as a background process, you can kill all webdriver processes

    pkill -f webdriver-manager
    

5.4.4. Understanding the output of Protractor

By default Protractor prints to STDOUT. If a test passes, nothing is printed about that particular test. If a test fails, Protractor will print more information about the failure, including the specific test, type of failure that occurred, and a stack trace. At the end of testing, Protractor will print a summary of the test run.

5.4.4.1. Saving test results

Beyond simply piping Protractor’s output to a file, you may enable detailed logging via a specified JSON file. Just uncomment the relevant line in protractor-conf.js and optionally change the output file location:

//resultJsonOutputFile: './protractor-test-results.json'

5.4.5. Adding your test to Protractor

The configuration file detailing the test files is protractor-conf.js. The specs field is a comma-separated list of test files - just add your new test file to the list and run protractor again.

For example, to add the test file example-spec.js:

  • Before

    specs: [
           './beat/web/reports/static/reports/test/test-spec.js'
    ],
    
  • After

    specs: [
           './beat/web/reports/static/reports/test/test-spec.js',
           'example-spec.js'
    ],
    

5.4.6. Overriding Protractor’s browser choices

In protractor-conf.js, add a multiCapabilities option in the following format:

multiCapabilities: [
    {
            browserName: '<browser name 1>'
    },
    {
            browserName: '<browser name 2>'
    },
    ...
]

Note

You may need to download your browsers’ WebDrivers separately - see the official Selenium docs.

5.4.7. Writing Protractor tests

Protractor uses and expects tests to use the Jasmine BDD testing framework. For a tutorial on writing Protractor tests, see the official Protractor tutorial. Protractor also has documentation on their website.

5.4.7.1. BEAT platform & Protractor’s Angular support

By default, Protractor assumes that the tested website uses Angular in a particular fashion, which lets it detect more intelligently when a page has finished rendering. However, the BEAT platform does not use Angular this way, and Protractor will hang forever waiting. To tell Protractor not to assume this behaviour, add the following line at the top of each top-level describe block in your test files:

browser.ignoreSynchronization = true;

5.5. Instantiating and Starting a Development System

For a simple (development) system, the default settings in beat/web/settings/settings.py should work out of the box. These settings:

  • Instantiate the web service on the local host under port 8000 (the address will be http://127.0.0.1:8000)

  • Use an SQLite3 database named django.sql3, located in the current working directory

  • Run with full debug output

  • Set the working BEAT prefix to ./web_dynamic_data

  • Set up a single user, called user, with administrative powers

If you need to tweak these settings, just edit the file beat/web/settings/settings.py. You may also consult the Django documentation for detailed information on other settings.

Once the Django settings are tweaked to your liking, you can run a single command to fully populate the development webserver with test databases, toolchains, algorithms and experiments:

$ ./bin/django install -v1

Note

For the databases installed by this command, we only tell the platform how to access their data. The command does not download the raw data, which you must procure yourself through the relevant web sites (check out the database pages on the Idiap instance of the BEAT platform for details).

Note

If you need to specify your own path to the directories containing the databases, you could just create a simple JSON file as follows:

{
  "atnt/1": "/remote/databases/atnt",
  "banca/2": "/remote/databases/banca"
}

Then just use the previous script with the option --database-root-file:

$ ./bin/django install -v1 --database-root-file=MYFILE.json

By default, paths to the root of all databases are set to match the Idiap Research Institute filesystem organisation.

Note

For every installed database, you’ll need to generate its data indices, which allow the platform to correctly parallelize algorithms. To do so, run the following command for every combination of database and version you wish to support:

$ ./bin/beat -p prefix db index <name>/<version>

Replace <name> with the name of the database you wish to dump the indices for, and <version> with its version. For example, to dump the indices for the AT&T database, version 1, run:

$ ./bin/beat -p prefix db index atnt/1
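
If you support several databases, a small shell loop can index them all in one go. A sketch, using the example database names from the JSON file above and the same prefix placeholder:

$ for d in atnt/1 banca/2; do ./bin/beat -p prefix db index $d; done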

Once the contributions and users are in place, you’re ready to start the test server:

$ ./bin/django runserver

At this point, the platform can be accessed by typing the URL http://127.0.0.1:8000 in a web browser on the machine where the server is running.

Note

To use a dedicated database server such as PostgreSQL, it is sufficient to configure its Django-like settings in beat/web/settings/settings.py, assuming the database server is operational.

5.5.1. All-in-one Platform

The BEAT platform is composed of 3 application types that run in synchrony to create, store and process your experiments: the web server, the scheduler and one or more workers. You use the web server to create and launch experiments. The scheduler assigns experiment blocks (actually beat.web.backend.JobSplit’s) to run on one of the available workers, respecting user quotas and worker limitations. The worker runs the user algorithms installed on each block upon scheduling, notifying the web server when it’s done.

The base software framework and models that allow the 3 applications to run cooperatively are described in a single place: the Django models and the central database of this package. Effectively, this means this package contains all the information required to run the 3 types of applications. The applications “communicate” with each other through the shared Django database, reading and modifying objects as experiments are assigned and processed. Several deployment scenarios are therefore possible and you must use the one most suited to your requirements.

In order to start the system, just run:

$ ./bin/django runserver

Once the Django development web server is up and running, open a browser and navigate to http://127.0.0.1:8000. Log in with an account with administrative rights and click on the scheduler icon in the omni-bar at the top of any page. Use the available “Helper panel” to launch one-off or repetitive scheduling and/or worker activities. In this case, both the scheduling and worker activities run in the context of the web server process.

5.5.2. Discrete Platform using Localhost

It is also possible to run each of the applications as a separate process. Here is how to do it.

  1. Start the web service normally:

    $ ./bin/django runserver
    
  2. Start the full scheduling setup:

    $ ./bin/django full_scheduling
    

This will start all elements of the scheduling/working process. Docker can be used for the worker node by passing the --docker option.
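
For example, a sketch of a Docker-based run, assuming the flag is passed directly to the full_scheduling command:

$ ./bin/django full_scheduling --docker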

Each element composing the scheduling can also be started separately:

  1. Start the broker node:

    $ ./bin/django broker -v 2
    
  2. Start a single scheduling node:

    $ ./bin/django scheduler -v 2
    
  3. Start a worker for your current node:

    $ ./bin/django worker -v 2
    

By default, the applications are configured to figure out paths and configuration options by themselves. You can override some defaults via the command line; just check the output of each of those commands by running them with the --help flag.
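
For instance, to list the options understood by the worker application:

$ ./bin/django worker --help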

5.5.3. Mixing and matching

You can mix and match any of the above techniques to run a 4-node system (all-in-one or discrete) and build a test system suited to your needs. For example, it is possible to launch the scheduling activities using the web server and the page-reload trick, while launching the worker process separately as described above.

5.5.4. Going full scale

In order to transform the development system into a full-scale platform, you will have to create your own maintenance scripts allowing you to automatically start/stop, update and secure the BEAT platform applications across your BEAT web nodes. It is beyond the scope of this documentation to go into details concerning these. We provide only some tips which we consider important:

  • Don’t use the SQLite backend on a production system; it does not cope well with the concurrency you may generate. Prefer a PostgreSQL database.

  • The “cache” directory (see the variable CACHE_ROOT in the Django settings file) is shared amongst all applications in the cluster. It is advisable to use a proper networked filesystem with good synchronisation primitives to avoid issues concerning the production and consumption of data caches between workers living on different nodes.

  • Don’t rely on your memory: script all deployment instructions so that you can do them routinely whenever newer versions come up or you have an issue.

  • Security: You’ll be running code uploaded by users on your computer. Make sure you properly isolate each of the processes and the backend farm to avoid unpleasant surprises. Some helpers:

    • Disk access: two main directories are shared across the applications. The cache directory stores intermediary block results. The prefix directory stores user contributions on disk. You may tune the file system access on a distributed BEAT platform to increase its security:

      • The web server only needs read access to the cache directories. It needs read and write access to the prefix directory in order to store user contributions.

      • The scheduler needs read/write access to the cache directory. It does not use the prefix directory and does not read or process user contributions. The scheduler also needs access to the Django database.

      • The workers need read/write access to the cache directory and read access to the prefix directory. The workers also need access to the Django database.

      • The processes launched by the worker need permissions similar to those of their worker. The user executable, though, should have demoted permissions to increase security: it has no need to access the Django database (or the settings file), the prefix or the cache, since all of that is done via the parent process. The easiest way to implement this is to make sure the worker process is run by an unprivileged user and a group with the right access permissions, allowing it to access the Django database (and the Django settings file), the prefix and the cache. These permissions will be inherited by the processes launched by the worker, which serve data to the processes wrapping the user code. To demote the user process, just set the group id of the environment executable to an unprivileged group (a sketch follows after this bullet). This way, the following security chain is achieved (pseudo users/groups):

           worker        ->      process      -> environment exec(user code)
        [nobody:beat]         [nobody:beat]           [nobody:nogroup]
        

        The BEAT platform requires that this process chain belongs to the same user, so that signals can be sent to stop or kill the applications in the chain if necessary.

        If you don’t do anything, then the user code will be run in a process with the same privileges as the worker application.
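
        A minimal sketch of such a demotion, assuming a hypothetical path to the environment executable and the pseudo groups above (the setgid bit makes the executable run with the group owning the file):

          $ chgrp nogroup /path/to/environment/executable  # hypothetical path
          $ chmod g+s /path/to/environment/executable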

    • E-mail privileges: e-mailing may be configured as part of the Django standard logging facilities or used to report experiment completion and other platform activity. While, by default, all node types have access to the Django configuration and can potentially send e-mails, it is wiser to use a Django extension such as Post-office to centralize e-mail sending on one node, avoiding potential spam.

    • User processes: user code is run in isolated processes launched by the children of worker processes. Because the user code process does not require disk access to either the prefix or the cache, it should run without access to those resources in order to improve the platform security. This may be achieved by running user processes in chroot’ed environments or making sure user code is launched with a user identity which has far fewer access permissions than the worker process itself. Have a look at the --help output of the worker application for more information and examples.

You may contact our support in case you need advice concerning this topic.

5.6. Development Notes

5.6.1. Backup and Restore

The BEAT platform can be backed up and restored easily. These commands allow not only for safe record keeping, but also for copying the state of a given deployment over to a local development server, where more thorough tests can be performed while tracking a bug or improving performance.

It is easy to quickly set up a local system for development, using the current state of a production system as a base. Here are some instructions:

  1. Before starting, make sure you have gone through the instructions above at least once. They explain the very basic setup required for a complete development environment.

  2. Dump and back-up your current production BEAT database:

    [production]$ ./bin/django backup
    
  3. [Optional] If you have made important modifications between the contents available at your production server and your currently checked-out source, you’ll need to run Django migrations on the data imported from the production server. If you need to do this, make sure your local development package has no pending changes and reset it to the production tag:

    [development]$ git checkout <production-tag>
    

    Note

    You can figure out the production tag by looking at the footer of the BEAT website. The corresponding tag name is found by prefixing a v to the version number. For example, the tag for version 0.8.2 of the platform is v0.8.2.

    Also make sure to revert all dependent packages, so as to recreate the state of the database schema as on the production site.

  4. Remove the current local development database so that the restore operation can start from scratch:

    [development]$ rm -rf django.sql3 web_dynamic_data
    
  5. Copy the backup tarball from the production server and restore it locally:

    [development]$ scp root@<beatproductionmachine>:backups/<backup-filename>.tar.bz2 .
    [development]$ ./bin/django restore <backup-filename>.tar.bz2
    

    At this point, you have recreated a copy of your production system locally, on your SQLite3 database.

  6. Reset queue configuration to allow for local running.

    You may, optionally, reset the queue configuration of your installation so that the environment you have is compatible with your development machine and you can immediately run experiments locally. To do so, use the qsetup Django command:

    [development]$ ./bin/django qsetup --reset
    
  7. Re-checkout the tip:

    $ git checkout master #or any other branch
    
  8. Apply migrations:

    $ ./bin/django migrate
    

At this point, you should have a complete development setup with all elements available on the production system installed locally. This system is fully capable of running experiments locally using your machine.

5.6.2. Testing Django Migrations

Django migrations, introduced in Django 1.7, are a useful feature for automatically migrating your database to new model schemas, if you get them right. Here is a recipe to make sure your migrations will work on your production system, allowing for quick and repetitive test/fix cycles.

The key idea is to follow the local snapshot setup from the Backup and Restore instructions above and then back up the database and prefix locally, so that we can quickly reproduce the migration test loop.

  1. Make sure you go through the Backup and Restore instructions above (up to step 6 only).

  2. Make a copy of the SQLite3 database:

    $ cp -a django.sql3 django.sql3.backup
    

    This backup will allow you to quickly test the migrations without having to check out the production version anymore.

    Also, create a temporary git repository of web_dynamic_data, so you can cross-check changes and reset it in case of problems:

    $ cd web_dynamic_data
    $ git init .
    $ git add .
    $ git commit -m "Initial commit"
    $ cd ..
    
  3. Go back to the HEAD or branch you were developing on before:

    $ git checkout HEAD
    
  4. Here is how to test/fix your migrations:

    1. Run “django migrate”:

      $ ./bin/django migrate
      
    2. Check your database by visually inspecting it on the django web admin or by manually dumping it.

    3. If a problem is detected, fix it and revert the state:

      $ cp -af django.sql3.backup django.sql3
      $ cd web_dynamic_data && git reset --hard HEAD && git clean -fdx . \
        && cd ..
      

      Tip

      Write the above lines in a shell script so it is easy to repeat.
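
      A minimal sketch of such a helper script (the name is illustrative), to be run from the repository root:

      #!/bin/sh
      # reset-migration-test.sh: restore the pristine database and prefix
      cp -af django.sql3.backup django.sql3
      cd web_dynamic_data && git reset --hard HEAD && git clean -fdx .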

      Go back to step 1 and restart.

5.6.3. Javascript Management with Node.js/Bower

We manage external JavaScript packages with the help of Bower. If you’d like to include more packages that will be statically served with the Django web app, please consider including them in the appropriate section of buildout.cfg.

5.7. Issues

If you find problems concerning this package, please post a message to our group mailing list. Currently open issues can be tracked at our gitlab page.