5. Installation¶
In this section, we provide basic instructions and fundamental ideas required to deploy the BEAT platform. Depending on the deployment strategy (single machine or distributed across several machines), the installation instructions will of course differ. Nevertheless, configuring and installing a simple platform instance remains reasonably easy.
5.1. Installing beat.web¶
The BEAT platform is written as a set of Python packages. This package (beat.web), in particular, constitutes the central deployment pillar of a BEAT platform instance. It is built on a web framework called Django. If you are unfamiliar with this framework but wish to deploy or develop the BEAT platform, it is recommended that you familiarize yourself with it.
To deploy a platform on a single machine, it is hence sufficient to install beat.web to get the full BEAT software stack installed. The recipe is as follows:
$ # after downloading and extracting the beat.web package
$ python bootstrap-buildout.py
$ ./bin/buildout
These two commands should download and install all non-installed dependencies and generate a fully operational test and development environment.
Note
cpulimit has been superseded by the use of Docker
Tip
If you’d like to speed up the installation, it is strongly advised that you prepare a preset virtual environment (see the virtualenv package) with all required dependencies, so that ./bin/buildout does not download and install all of them every time you clean up. This technique should allow you to quickly clean up and restart your working environment, which is useful during development.
In order to fetch currently needed dependencies, run:
$ ./bin/buildout #to setup once
$ ./bin/pip freeze > requirements.txt
Examine the file requirements.txt and remove packages you are either developing locally (e.g., all that are under src) or that you think you don’t need. The command pip freeze reports all installed packages, not only those needed by your project. If the Python interpreter you used for bootstrapping already had a good set of packages installed, you may see them there.
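The clean-up of requirements.txt can be partly automated. The following is a minimal sketch, not part of beat.web: filter_requirements is a hypothetical helper that assumes entries use the name==version form and that locally developed checkouts show up as editable (-e) entries in the pip freeze listing.

```shell
#!/bin/sh
# Sketch: drop unwanted entries from a `pip freeze` listing.
# filter_requirements is a hypothetical helper, not part of beat.web.
# Usage: filter_requirements <requirements-file> [package-name ...]
filter_requirements() {
    input="$1"
    shift
    awk -v drop="$*" '
        BEGIN {
            # Build a lookup table of lower-cased package names to drop.
            n = split(drop, names, " ")
            for (i = 1; i <= n; i++) skip[tolower(names[i])] = 1
        }
        /^-e / { next }    # skip editable entries
        {
            # The package name is whatever precedes "==".
            split($0, parts, "==")
            if (tolower(parts[1]) in skip) next
            print
        }
    ' "$input"
}
```

For instance, filter_requirements requirements.txt beat.core > trimmed.txt would write a copy of the listing without editable entries and without beat.core. Review the result by hand before installing from it.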
Once you have a satisfactory requirements.txt file, you may proceed to recreate a virtualenv you’ll use for your development. Just call:
$ virtualenv ~/work/beat-env #--system-site-packages
This creates the virtual environment. The new environment does not contain system packages by default. You may override that by specifying --system-site-packages as suggested above. Then, install the required packages in your new virtual environment:
$ ~/work/beat-env/bin/pip install -r requirements.txt
After that step is done, your virtual environment is ready for deployment. You may now start from scratch to develop beat.web, taking as base the Python interpreter in your virtualenv:
$ cd beat.web
$ git clean -fdx #full clean-up
$ ~/work/beat-env/bin/python bootstrap-buildout.py
$ ./bin/buildout
You’ll notice the buildout step now takes considerably less time, and you may repeat this last step as often as needed. pip is a very flexible tool, and you may use it to manage the virtualenv, installing and removing packages as needed.
5.2. Documentation¶
The documentation project is divided into 3 parts. The user guide is the only one which is automatically built as part of the buildout procedure. The API and administrator guides need to be manually compiled if required.
To build the API documentation, just do:
$ ./bin/sphinx-apidoc --separate -d 2 --output=doc/api/api beat beat/web/*/migrations beat/web/*/tests
$ ./bin/sphinx-build doc/api html/api
To build the administrator guide, just do:
$ ./bin/sphinx-build doc/admin html/admin
The above commands will build the stated guides in HTML format and place the results in your local html directory. You may then navigate to that directory and, with your preferred web browser, open the file index.html to browse the available documentation.
The basic user guide, which includes information for users of the platform, is built automatically upon buildout. If you wish to build it and place it alongside the other guides, you may do so like this:
$ ./bin/sphinx-build doc/user html/user
5.3. Unit Testing¶
After installation, it is possible to run a suite of unit tests to check the sanity of the installation. To do so, use:
$ ./bin/django test --settings=beat.web.settings.test -v 1
You may pass filtering criteria to launch tests only for a particular set of beat.web applications. For example, to run only the tests concerning beat.web.toolchains, run:
$ ./bin/django test --settings=beat.web.settings.test -v 1 beat.web.toolchains.tests
To measure coverage, run the test suite under coverage:
$ ./bin/coverage run --source='./beat/web' ./bin/django test --settings=beat.web.settings.test
$ ./bin/coverage report
Or, to generate an HTML report:
$ ./bin/coverage html
Tip
You may significantly speed up your testing by re-using the same test database from run to run. In order to do this, just specify the flag --keepdb when you run your tests:
$ ./bin/django test --settings=beat.web.settings.test -v 1 --keepdb
In this case, Django will create and keep a test database called test.sql3 in your current directory. You may delete it when you’re done.
5.4. End-to-End Testing¶
Protractor is an e2e (end-to-end) testing tool for web apps. Protractor runs tests through Selenium using a real browser, and as such needs a headed environment and a compatible browser installed.
Warning
Protractor will open a new browser window in the foreground when it is started.
5.4.1. Setup¶
There are two system dependencies to run Selenium:
Java 8 must be available in your PATH
If you want to run the testing in a GNOME environment, you need GConf
Download/update Protractor’s dependencies into the local repository (Selenium & more):
./bin/webdriver-manager update
5.4.2. Running tests with the provided script¶
The protractor.sh script is a one-liner to run Protractor tests. It handles database creation/saving/restoring and manages the required local server processes. However, it assumes several things:
It is being run in the top directory of the beat.web repository
The repository has already run ./bin/buildout successfully and with the default development configuration
Protractor’s .conf file is ./protractor-conf.js
No additional arguments need to be passed to webdriver-manager or Django runserver
Django uses ./django.sql3 as the database
If ./template.django.sql3 does not exist, the default database generated by ./bin/django install is sufficient for running the basic tests. However, some tests will fail, and it is suggested to provide a database with experiments that have been run successfully.
5.4.3. Manual test running¶
If the protractor.sh script won’t work, one can test manually.
The webdriver-manager must be running while testing. To run tests using a local BEAT web server, you must have the BEAT web server up as well.
5.4.3.1. Starting the webdriver server¶
Start the webdriver server in a separate shell (or append & to run it as a background process in the current shell):
./bin/webdriver-manager start
Important
You may only have 1 webdriver manager running at once.
After the webdriver finishes initialization, you can run tests
./bin/protractor protractor-conf.js
If you started your webdriver server as a background process, you can kill all webdriver processes
pkill -f webdriver-manager
5.4.4. Understanding the output of Protractor¶
By default, Protractor prints to STDOUT. If a test passes, nothing is printed about that particular test. If a test fails, Protractor will print more information about the failure, including the specific test, the type of failure that occurred, and a stack trace. At the end of testing, Protractor will print a summary of the test run.
5.4.4.1. Saving test results¶
Beyond simply piping Protractor’s output to a file, you may enable detailed logging via a specified JSON file. Just uncomment the relevant line in protractor-conf.js and optionally change the output file location:
//resultJsonOutputFile: './protractor-test-results.json'
5.4.5. Adding your test to Protractor¶
The configuration file listing the test files is protractor-conf.js. The specs field is an array of test file paths: just add your new test file to the array and run Protractor again.
For example, to add the test file example-spec.js:
Before
specs: [ './beat/web/reports/static/reports/test/test-spec.js' ],
After
specs: [ './beat/web/reports/static/reports/test/test-spec.js', 'example-spec.js' ],
5.4.6. Overriding Protractor’s browser choices¶
In protractor-conf.js, add a multiCapabilities option in the following format:
multiCapabilities: [
{
browserName: '<browser name 1>'
},
{
browserName: '<browser name 2>'
},
...
]
Note
You may need to download your browsers’ WebDrivers separately - see the official Selenium docs.
5.4.7. Writing Protractor tests¶
Protractor uses and expects tests to use the Jasmine BDD testing framework. For a tutorial on writing Protractor tests, see the official Protractor tutorial. Protractor also has documentation on their website.
5.4.7.1. BEAT platform & Protractor’s Angular support¶
By default, Protractor assumes that the tested website will use Angular in a particular fashion to more intelligently detect a page that has finished rendering. However, the BEAT platform does not use Angular this way, and Protractor will hang forever. To tell Protractor not to assume this compatibility, add the following line at the top of each top-level describe block in your test files:
browser.ignoreSynchronization = true;
5.5. Instantiating and Starting a Development System¶
For a simple (development) system, the default settings in beat/web/settings/settings.py should work out of the box. These settings:
Instantiate the web service on the local host under port 8000 (the address will be http://127.0.0.1:8000)
Use an SQLite3 database named django.sql3 located in the current working directory
Run with full debug output
Set the working BEAT prefix to ./web_dynamic_data
A single user, called user, will be set up in the system. This user will have administrative powers.
If you need to tweak these settings, just edit the file beat/web/settings/settings.py. You may also consult the Django documentation for detailed information on other settings.
Once the Django settings are tweaked to your liking, you can run a single command to fully populate the development webserver with test databases, toolchains, algorithms and experiments:
$ ./bin/django install -v1
Note
Concerning databases installed by this command, it only tells the platform how to access their data. It does not download the raw data for the databases, which you must procure yourself through the relevant web sites (check out the database pages on the Idiap instance of the BEAT platform for details).
Note
If you need to specify your own path to the directories containing the databases, you could just create a simple JSON file as follows:
{
"atnt/1": "/remote/databases/atnt",
"banca/2": "/remote/databases/banca"
}
Then just use the previous command with the option --database-root-file:
$ ./bin/django install -v1 --database-root-file=MYFILE.json
By default, paths to the root of all databases are set to match the Idiap Research Institute filesystem organisation.
Note
For every installed database, you’ll need to generate its data indices, which allow the platform to correctly parallelize algorithms. To do so, for every combination of database and version you wish to support, run the following command:
$ ./bin/beat -p prefix db index <name>/<version>
Replace <name> with the name of the database you wish to dump the indices for, and <version> with its version. For example, to dump the indices for the AT&T database, version 1, do the following:
$ ./bin/beat -p prefix db index atnt/1
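When several databases need indexing, the command above can be wrapped in a loop. The following is a minimal sketch: index_databases and the BEAT variable are hypothetical names introduced here, and the function simply repeats the documented command for each name/version pair, stopping at the first failure.

```shell
#!/bin/sh
# Sketch: index several databases in one go.
# index_databases and BEAT are hypothetical names; BEAT defaults to ./bin/beat.
BEAT="${BEAT:-./bin/beat}"

# Usage: index_databases <name>/<version> [<name>/<version> ...]
index_databases() {
    for db in "$@"; do
        echo "Indexing $db" >&2
        # Same command as documented above, once per database.
        "$BEAT" -p prefix db index "$db" || return 1
    done
}
```

For instance, index_databases atnt/1 banca/2 would index both databases from the JSON example above, assuming they are installed.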
Once the contributions and users are in place, you’re ready to start the test server:
$ ./bin/django runserver
At this point, the platform can be accessed by typing the URL http://127.0.0.1:8000 in a web browser on the machine where the server is running.
Note
To use a dedicated database server such as PostgreSQL, it is sufficient to configure its Django-like settings in beat/web/settings/settings.py, assuming the database server is operational.
5.5.1. All-in-one Platform¶
The BEAT platform is composed of 3 application types that run in synchrony to create, store and process your experiments: the web server, the scheduler and one or more workers. You use the web server to create and launch experiments. The scheduler assigns experiment blocks (actually beat.web.backend.JobSplit objects) to run on one of the available workers, respecting user quotas and worker limitations. The worker runs the user algorithms installed on each block upon scheduling, notifying the web server when it’s done.
The base software framework and models that allow the 3 applications to run cooperatively are described in one single place: the Django models and the central database of this package. Effectively, it means this package contains all information that is required to run the 3 types of applications. The applications “communicate” between each other using the shared Django database, reading and modifying objects as experiments are assigned and treated. Several deployment scenarios are therefore possible and you must use the one most suited for your requirements.
In order to start the system, just run:
$ ./bin/django runserver
Once the Django development web server is up and running, open a browser and navigate to http://127.0.0.1:8000. Login with an account with administrative rights and click on the scheduler icon, using the omni-bar, on the top of any page. Use the “Helper panel” available to launch one-off or repetitive scheduling and/or worker activities. In this case, both the scheduling and worker activities run in the context of the web server process.
5.5.2. Discrete Platform using Localhost¶
It is also possible to run each of the applications as separate processes. Here is how to do it.
Start the web service normally:
$ ./bin/django runserver
Start the full scheduling setup:
$ ./bin/django full_scheduling
This will start all elements of the scheduling/working process. Docker can be used for the worker node by passing the --docker option.
Each element composing the scheduling setup can also be started separately:
Start the broker node:
$ ./bin/django broker -v 2
Start a single scheduling node:
$ ./bin/django scheduler -v 2
Start a worker for your current node:
$ ./bin/django worker -v 2
By default, the applications are configured to figure out paths and configuration options by themselves. You can override some defaults via the command line. Just check the output of each of those commands when run with the --help flag.
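The separately started broker, scheduler and worker commands above can be launched from a single shell session. The following is a minimal sketch under stated assumptions: start_nodes, stop_nodes, DJANGO and NODE_PIDS are hypothetical names, and the sketch assumes the three processes tolerate being started one right after another without waiting for each other.

```shell
#!/bin/sh
# Sketch: run the broker, scheduler and worker as background processes.
# start_nodes, stop_nodes, DJANGO and NODE_PIDS are hypothetical names.
DJANGO="${DJANGO:-./bin/django}"
NODE_PIDS=""

start_nodes() {
    for node in broker scheduler worker; do
        "$DJANGO" "$node" -v 2 &
        # $! holds the PID of the process just sent to the background.
        NODE_PIDS="$NODE_PIDS $!"
    done
}

stop_nodes() {
    for pid in $NODE_PIDS; do
        # Ignore processes that have already exited.
        kill "$pid" 2>/dev/null || true
    done
    NODE_PIDS=""
}
```

A wrapper script could call start_nodes, register stop_nodes with trap for the EXIT signal, and then wait, so that interrupting the script also stops the three processes.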
5.5.3. Mixing and matching¶
You can mix and match any of the above techniques to run a 4-node system (all-in-one or discrete) and build a test system that suits your needs. For example, it is possible to launch the scheduling activities using the web server and the page reload trick while launching the worker process separately as per above.
5.5.4. Going full scale¶
In order to transform the development system into a full-scale platform, you will have to create your own maintenance scripts allowing you to automatically start/stop, update and secure the BEAT platform applications across your BEAT web nodes. It is beyond the scope of this documentation to go into details concerning these. We provide only some tips which we consider important:
Don’t use the SQLite backend on a production system: it does not work well with the concurrency you may generate. Prefer a PostgreSQL database.
The “cache” directory (see the variable CACHE_ROOT in the Django settings file) is shared amongst all applications in the cluster. It is advisable to use a proper networked filesystem with good synchronisation primitives to avoid issues concerning the production and consumption of data caches between workers living on different nodes.
Don’t rely on your memory: script all deployment instructions so that you can do them routinely whenever newer versions come up or you have an issue.
Security: You’ll be running code uploaded by users on your computer. Make sure you properly isolate each of the processes and the backend farm to avoid unpleasant surprises. Some helpers:
Disk access: two main directories are shared across the applications. The cache directory stores intermediary block results. The prefix directory stores user contributions on disk. You may tune the file system access on a distributed BEAT platform to increase its security:
The web server only needs read access to the cache directories. It needs read and write access to the prefix directory in order to store user contributions.
The scheduler needs read/write access to the cache directory. It does not use the prefix directory and does not read or treat user contributions. The scheduler also needs access to the Django database.
The workers need read/write access to the cache directory and read access to the prefix directory. The workers also need access to the Django database.
The processes launched by the worker need to have similar permissions to their worker. The user executable, though, should have demoted permissions to increase security. For example, it has no need to access the Django database (or the settings file), the prefix or the cache: all of that is done via the parent process. To implement this, the easiest way is to make sure the worker process is run by an unprivileged user and a group with the right access permissions, allowing it to access the Django database (and the Django settings file), the prefix and the cache. These permissions will be inherited by the processes launched by the worker, which will serve data to the processes wrapping the user code. To demote the user process, just set the group id of the environment executable to an unprivileged group. This way, the following security chain is achieved (pseudo user/groups):
worker        -> process       -> environment exec(user code)
[nobody:beat]    [nobody:beat]    [nobody:nogroup]
It is a requirement of the BEAT platform that this process chain belongs to the same user, so that signals for stopping or killing the applications in the chain can be sent if necessary.
If you don’t do anything, then the user code will be run in a process with the same privileges as the worker application.
E-mail privileges: e-mailing may be configured as part of the Django standard logging facilities or used to report experiment completion and other platform activity. While, by default, all node types have access to the Django configuration and can potentially send e-mails, it is wiser to use a Django extension such as Post-office to centralize e-mail sending to one node, avoiding potential spam.
User processes: user code is run in isolated processes launched by the children of worker processes. Because the user code process does not require disk access to either the prefix or the cache, it should run without access to those resources in order to improve the platform security. This may be achieved by running user processes in chroot’ed environments or making sure user code is launched with a user identity which has far fewer access permissions than the worker process itself. Have a look at the --help output of the worker application for more information and examples.
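The disk access rules listed above can be spot-checked from the account that runs each application. The following is a minimal sketch: check_access is a hypothetical helper that tests what the current account can do with a directory, using the shell’s own -r and -w tests. Note that these tests always succeed for the root account, so the check is only meaningful when run as the unprivileged service user.

```shell
#!/bin/sh
# Sketch: verify the access the current account has to a directory.
# check_access is a hypothetical helper; modes are "ro", "rw" or "none".
# Usage: check_access <directory> <mode>
check_access() {
    dir="$1"
    mode="$2"
    case "$mode" in
        ro)   [ -r "$dir" ] && [ ! -w "$dir" ] ;;
        rw)   [ -r "$dir" ] && [ -w "$dir" ] ;;
        none) [ ! -r "$dir" ] && [ ! -w "$dir" ] ;;
        *)    echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}
```

For instance, running check_access ./web_dynamic_data rw as the web server account would confirm that it can store user contributions in the prefix, while the same account should pass an ro check on the cache directory.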
You may contact our support in case you need advice concerning this topic.
5.6. Development Notes¶
5.6.1. Backup and Restore¶
The BEAT platform can be backed up and restored easily. These commands allow for safe information keeping, but also make it possible to copy the state of a given deployment to a local development server, where more thorough tests can be performed while tracking a bug or improving performance.
It is easy to quickly setup a local system for development, taking as base the current state of a production system. Here are some instructions:
Before starting, make sure you have gone through the instructions above at least once. They explain the very basic setup required for a complete development environment.
Dump and back-up your current production BEAT database:
[production]$ ./bin/django backup
[Optional] If you have made important modifications between the contents available at your production server and your currently checked-out source, you’ll need to run Django migrations on data imported from the production server. If you need to do this, make sure you don’t have unapplied commits to your local development package and reset it to the production tag:
[development]$ git checkout <production-tag>
Note
You can figure out the production tag by looking at the footer of the BEAT website. The corresponding tag name is found by prefixing a v before the version number. For example, the tag for version 0.8.2 of the platform is v0.8.2.
Also make sure to revert all dependent packages, so as to recreate the state of the database schema as on the production site.
Remove the current local development database so that the restore operation can start from scratch:
[development]$ rm -rf django.sql3 web_dynamic_data
Copy the backup tarball from the production server and restore it locally:
[development]$ scp root@<beatproductionmachine>:backups/<backup-filename>.tar.bz2 .
[development]$ ./bin/django restore <backup-filename>.tar.bz2
At this point, you have recreated a copy of your production system locally, on your SQLite3 database.
Reset queue configuration to allow for local running.
You may, optionally, reset the queue configuration of your installation so that the environment you have is compatible with your development machine and you can immediately run experiments locally. To do so, use the qsetup Django command:
[development]$ ./bin/django qsetup --reset
Re-checkout the tip:
$ git checkout master #or any other branch
Apply migrations:
$ ./bin/django migrate
At this point, you should have a complete development setup with all elements available on the production system installed locally. This system is fully capable of running experiments locally using your machine.
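The development-side steps above lend themselves to scripting. The following is a minimal sketch: restore_snapshot, DJANGO and GIT are hypothetical names, the function assumes it is run from the top of the beat.web checkout with the backup tarball already copied over, and it does not revert dependent packages, which still has to be done by hand.

```shell
#!/bin/sh
# Sketch: replay the development-side restore steps in order.
# restore_snapshot, DJANGO and GIT are hypothetical names.
DJANGO="${DJANGO:-./bin/django}"
GIT="${GIT:-git}"

# Usage: restore_snapshot <backup-tarball> <production-tag> <branch>
restore_snapshot() {
    backup="$1"
    tag="$2"
    branch="$3"
    "$GIT" checkout "$tag" || return 1        # match the production schema
    rm -rf django.sql3 web_dynamic_data       # start from scratch
    "$DJANGO" restore "$backup" || return 1
    "$DJANGO" qsetup --reset || return 1      # make queues fit this machine
    "$GIT" checkout "$branch" || return 1     # back to the development tip
    "$DJANGO" migrate
}
```

For instance, restore_snapshot <backup-filename>.tar.bz2 v0.8.2 master would follow the same sequence as the manual instructions. Because the function deletes the local database and prefix, it should only be run on a development checkout.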
5.6.2. Testing Django Migrations¶
Django migrations, introduced in version 1.7, are a useful feature for automatically migrating your database to new model schemas, if you get them right. Here is a recipe to make sure your migrations will work on your production system, allowing for quick and repetitive test/fix cycles.
The key idea is that we follow the setup in the Backup and Restore section above and then locally back up our database and prefix so that we can quickly reproduce the migration test loop.
Make sure you go through the Backup and Restore instructions above (up to step 6 only).
Make a copy of the SQLite3 database:
$ cp -a django.sql3 django.sql3.backup
This backup will allow you to quickly test the migrations w/o having to checkout the production version anymore.
Also, create a temporary git repository of web_dynamic_data, so you can cross-check changes and reset it in case of problems:
$ cd web_dynamic_data
$ git init .
$ git add .
$ git commit -m "Initial commit"
$ cd ..
Go back to the HEAD or branch you were developing before:
$ git checkout HEAD
Here is how to test/fix your migrations:
Run “django migrate”:
$ ./bin/django migrate
Check your database by visually inspecting it on the django web admin or by manually dumping it.
If a problem is detected, fix it and revert the state:
$ cp -af django.sql3.backup django.sql3
$ cd web_dynamic_data && git reset --hard HEAD && git clean -fdx . && cd ..
Note
Tip: Write the above lines in a shell script so it is easy to repeat.
Go back to a. and restart.
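The revert step of the loop above can be kept in a small script, as the tip suggests. The following is a minimal sketch: reset_migration_state and GIT are hypothetical names, and the function assumes it is run from the directory holding django.sql3.backup and the web_dynamic_data repository created earlier.

```shell
#!/bin/sh
# Sketch: revert the database and prefix to their pre-migration state.
# reset_migration_state and GIT are hypothetical names.
GIT="${GIT:-git}"

reset_migration_state() {
    # Put the saved copy of the database back in place.
    cp -af django.sql3.backup django.sql3 || return 1
    # The subshell keeps the caller's working directory unchanged.
    (
        cd web_dynamic_data || exit 1
        "$GIT" reset --hard HEAD && "$GIT" clean -fdx .
    )
}
```

After a failed attempt, calling reset_migration_state followed by ./bin/django migrate repeats one iteration of the test/fix cycle.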
5.7. Issues¶
If you find problems concerning this package, please post a message to our group mailing list. Currently open issues can be tracked at our gitlab page.