8. Extending the platform

A running platform can be extended in several ways, depending on the specific needs of the particular instance. This section is organized in the style of a troubleshooting guide. Counters built into the BEAT platform software help diagnose and understand issues like the ones described here.

8.1. Not enough processing power

Situation: A lot of experiments are performed at the same time, and jobs are frequently put in a waiting queue while all the processing nodes are used.

Consequence: The delay between the scheduling of an experiment by a user and the availability of the results is too long.

Actions: Add more processing nodes (dedicated computers), and tell the Scheduler about them (i.e. how to establish a connection to them).

8.2. Not enough room in the Data Cache

Situation: A lot of different experiments are performed, and data is frequently removed from the Cache to make room for another experiment’s data, only to be regenerated a short while later due to an experiment similar to the one that generated it in the first place.

Consequence: The delay between the scheduling of an experiment by a user and the availability of the results is too long.

Actions: Increase the size of the Data Cache, so that more data can be kept for longer.
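
The effect of the cache size on regeneration can be illustrated with a small simulation. This is only a sketch: it assumes a least-recently-used eviction policy, which the text above does not specify, and the class name DataCache, the function count_regenerations, the experiment names and the sizes are all hypothetical.

```python
from collections import OrderedDict


class DataCache:
    """Size-bounded cache that evicts the least recently used entry first.

    The LRU policy is an assumption made for this illustration.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()  # key -> size, least recently used first
        self.regenerations = 0

    def request(self, key, size):
        """Return True on a cache hit, False if the data had to be generated."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        self.regenerations += 1
        # Evict old entries until the new data fits.
        while self.entries and self.used + size > self.capacity:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size
        if size <= self.capacity:
            self.entries[key] = size
            self.used += size
        return False


def count_regenerations(capacity, workload):
    """Replay a workload of (key, size) requests and count cache misses."""
    cache = DataCache(capacity)
    for key, size in workload:
        cache.request(key, size)
    return cache.regenerations


# Three hypothetical experiments of 40 units each, requested in rotation.
workload = [("exp_a", 40), ("exp_b", 40), ("exp_c", 40)] * 4
small = count_regenerations(100, workload)  # only two data sets fit at once
large = count_regenerations(120, workload)  # all three data sets fit
```

With these numbers, the smaller cache regenerates data on every one of the 12 requests, while the larger one only generates each data set once.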

8.3. The Web Server can’t process all the requests fast enough

Situation: A lot of users are using the website at the same time, flooding the Web Server with too many requests.

Consequence: The website is slow and unresponsive, and users receive timeout notifications.

Actions: Several different solutions (of increasing complexity) can be implemented:

  • Add a web server dedicated to static data (images, CSS, JavaScript libraries). This will remove a fair chunk of the traffic from the main web server, but might not suffice if the problem is mainly caused by requests for dynamic data that takes time to generate.

  • Add a caching mechanism to the website (for example: a memcached server, see http://memcached.org/) so that the work behind frequent requests (database queries, template rendering, business logic) is not redone every time when it is not necessary.

  • Add a load-balancing “Round Robin DNS” mechanism (setting up multiple DNS records for the same hosting domain, each pointing to a different web server). This is very simple to set up and usually does not require any special server configuration. However, there is no clean way to move users to another server if one server is overloaded or down, and changes to the DNS can take days to propagate across DNS caches.

  • Use a front-end server or dedicated hardware device to do the load-balancing over multiple web servers. Most load balancer solutions include tools for monitoring server load, sending requests to the least-busy servers, and performing proper failover. However, the set-up is tricky, and load-balancer hardware is often expensive.
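
The caching option in the list above usually follows a "look in the cache first, compute and store on a miss" pattern. The sketch below illustrates that pattern only. It uses a small in-process TTLCache class as a stand-in for an external cache such as memcached, and the names TTLCache, cached_page and render_experiment_list are hypothetical, not part of the platform.

```python
import time


class TTLCache:
    """In-process stand-in for an external cache such as memcached."""

    def __init__(self, clock=time.monotonic):
        self._store = {}  # key -> (expiry time, value)
        self._clock = clock

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._store[key]  # entry is stale, drop it
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (self._clock() + ttl_seconds, value)


def cached_page(cache, key, render, ttl_seconds=60):
    """Return the cached page if present, otherwise render and store it."""
    page = cache.get(key)
    if page is None:
        page = render()
        cache.set(key, page, ttl_seconds)
    return page


render_calls = []


def render_experiment_list():
    # Hypothetical expensive view: database queries plus template rendering.
    render_calls.append(1)
    return "<html>experiment list</html>"


cache = TTLCache()
first = cached_page(cache, "experiment-list", render_experiment_list)
second = cached_page(cache, "experiment-list", render_experiment_list)
# The second call is served from the cache: render_calls has one entry.
```

The time-to-live bounds how long users may see outdated content, so it is a trade-off between freshness and load on the web and database servers.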

8.4. The Database Server can’t process all the requests fast enough

Situation: Too many requests are sent to the Database Server by the website.

Consequence: The website is slow and unresponsive.

Actions: Several different solutions (of increasing complexity) can be implemented:

  • Add a caching mechanism to the website (for example: a memcached server, see http://memcached.org/) so that the work behind frequent requests (database queries, template rendering, business logic) is not redone every time when it is not necessary.

  • Use database replication: one Database Server acts as the master, the others as slaves. Write operations are performed on the master, and read operations are spread across the slaves. This requires some coding to adapt the system to the specific installation (for example, to handle the delay between the writing of some data on the master and its copy reaching all the slaves).

  • Use database sharding: each host keeps a separate, unique chunk of each database table. The website uses a deterministic function to decide where any given record should reside, so the load distribution is effectively stateless and can scale up to huge numbers of database servers. This requires some code changes.
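
As an illustration of the deterministic function mentioned in the last option above, the sketch below maps a record key to a database host with a stable hash. It is one possible scheme, not the one used by the platform: the host names, the key format and the function name shard_for are hypothetical.

```python
import hashlib

# Hypothetical database hosts; a real installation would list its own.
SHARD_HOSTS = ["db0.example.org", "db1.example.org", "db2.example.org"]


def shard_for(record_key, hosts=SHARD_HOSTS):
    """Map a record key to a host using a stable hash of the key.

    hashlib is used instead of the built-in hash(), whose result for
    strings changes between interpreter runs.
    """
    digest = hashlib.sha256(str(record_key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(hosts)
    return hosts[index]


# Any web server computes the same host for the same key, without
# consulting a shared lookup table.
host = shard_for("experiment:1234")
```

With this simple modulo scheme, changing the number of hosts moves most records to a different host. Schemes such as consistent hashing reduce that movement, at the cost of extra complexity.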