Introduction

The BEAT platform is a web-based system for certifying the results of software-based, data-driven workflows that can be functionally subdivided into processing blocks. The platform takes the burden of hosting data and software away from users by providing a computing farm that handles both. Data is kept sequestered inside the platform. The user provides the description of data formats, algorithms, data flows (also known as toolchains) and experimental details (parameters), which the platform combines to produce results that are easily exportable as graphics or tables for scientific reports.

It is intended as a fundamental building block in Reproducible Research, allowing academic and industrial parties to prescribe system behavior and have it remain reproducible across generations of software, hardware and staff. Here are some known applications:

Challenges and competitions on defined data, protocols and workflow components;
Study group exercises and exams;
Support to publication submission;
System and algorithm performance optimization;
Reproduction of experiments across communities;
Support for industry-academia relationships.

This package, in particular, defines a set of core components useful for the whole platform: the building blocks used by all other packages in the BEAT software suite. These are:

Data formats: the specification of data which is transmitted between blocks of a toolchain;
Libraries: routines (source code or binaries) that can be incorporated into other libraries or into the user code of algorithms;
Algorithms: the program (source code or binaries) that defines the user algorithm to be run within the blocks of a toolchain;
Databases and Datasets: means to read raw data from disk and feed it into a toolchain, respecting a certain usage protocol;
Toolchain: the definition of the data flow in an experiment;
Experiment: the combination of algorithms, datasets, a toolchain and parameters that allows the platform to schedule and run the prescribed recipe to produce displayable results.

A Simple Example

The next figure shows a representation of a very simple toolchain, composed of only a few color-coded components:

To the left, the reader can identify two datasets, named set and set2 respectively. They emit data (of, at this point, an unspecified type) into the following processing blocks;
Following the datasets, two processing blocks named echo1 and echo2 each receive input from one of the datasets and emit data into a third block, named echo3;
The final component, called analysis, receives the data emitted by echo3. Because this block has no output, it is considered a final block, from which the BEAT platform expects to collect experiment results (which, at this point, are also unspecified).

$digraph "user/triangle/1" { graph [compound=true rankdir=LR splines=polyline] subgraph dataset_cluster { graph [label=datasets rank=same] set [label=<<table border="0" cellspacing="0" bgcolor="#7878ff"><tr><td><font color="#000000"><b><u>set</u></b></font></td><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="output_out" bgcolor="#0000FF" border="1"><font color="#ffffff">out</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] set2 [label=<<table border="0" cellspacing="0" bgcolor="#e2ffc7"><tr><td><font color="#000000"><b><u>set2</u></b></font></td><td><table border="0" cellspacing="5" bgcolor="#e2ffc7"><tr><td port="output_out" bgcolor="#6AA84F" border="1"><font color="#000000">out</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] } echo1 [label=<<table border="0" cellspacing="0" bgcolor="#7878ff"><tr><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="input_in" bgcolor="#0000FF" border="1"><font color="#ffffff">in</font></td></tr></table></td><td><font color="#000000"><b><u>echo1</u></b></font></td><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="output_out" bgcolor="#0000FF" border="1"><font color="#ffffff">out</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] set:output_out -> echo1:input_in [label="" color="#0000FF" fontcolor="#0000FF" fontname=Helvetica fontsize=12] echo2 [label=<<table border="0" cellspacing="0" bgcolor="#e2ffc7"><tr><td><table border="0" cellspacing="5" bgcolor="#e2ffc7"><tr><td port="input_in" bgcolor="#6AA84F" border="1"><font color="#000000">in</font></td></tr></table></td><td><font color="#000000"><b><u>echo2</u></b></font></td><td><table border="0" cellspacing="5" bgcolor="#e2ffc7"><tr><td port="output_out" bgcolor="#6AA84F" border="1"><font color="#000000">out</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] set2:output_out -> 
echo2:input_in [label="" color="#6AA84F" fontcolor="#6AA84F" fontname=Helvetica fontsize=12] echo3 [label=<<table border="0" cellspacing="0" bgcolor="#7878ff"><tr><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="input_in" bgcolor="#0000FF" border="1"><font color="#ffffff">in</font></td></tr><tr><td port="input_in2" bgcolor="#0000FF" border="1"><font color="#ffffff">in2</font></td></tr></table></td><td><font color="#000000"><b><u>echo3</u></b></font></td><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="output_out" bgcolor="#0000FF" border="1"><font color="#ffffff">out</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] echo1:output_out -> echo3:input_in [label="" color="#0000FF" fontcolor="#0000FF" fontname=Helvetica fontsize=12] echo2:output_out -> echo3:input_in2 [label="" color="#0000FF" fontcolor="#0000FF" fontname=Helvetica fontsize=12] analysis [label=<<table border="0" cellspacing="0" bgcolor="#7878ff"><tr><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="input_in" bgcolor="#0000FF" border="1"><font color="#ffffff">in</font></td></tr></table></td><td><font color="#000000"><b><u>analysis</u></b></font></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] echo3:output_out -> analysis:input_in [label="" color="#0000FF" fontcolor="#0000FF" fontname=Helvetica fontsize=12] subgraph analyzer_cluster { graph [label=analyzers rank=same] } }$

The toolchain only defines the basic data flow and connections that must be respected by experiments. It does not define the type of data produced or consumed by any of the blocks, nor the algorithms, databases and protocols to use. From the toolchain description, it is possible to derive a valid execution order by taking the imposed data flow into consideration. In this simple example, the datasets called set and set2 may yield data in parallel, allowing the execution of blocks echo1 and echo2. Block echo3 must come next, followed by the analysis block, which comes last.
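The ordering argument above can be sketched as a topological sort of the data flow. This is only an illustration: the dictionary below is a hypothetical hand-written description of the example toolchain, not the platform's own representation, and `execution_stages` is a name made up for this sketch. It relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical description of the example toolchain: each key is a block (or
# dataset) and its value is the set of blocks it receives data from.
dependencies = {
    "set": set(),
    "set2": set(),
    "echo1": {"set"},
    "echo2": {"set2"},
    "echo3": {"echo1", "echo2"},
    "analysis": {"echo3"},
}


def execution_stages(deps):
    """Group blocks into stages; blocks within one stage may run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the data flow has a loop
    stages = []
    while sorter.is_active():
        # Everything returned here has all of its inputs already available.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(dependencies)
for number, blocks in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(blocks)}")
```

Each stage contains only blocks whose inputs were produced by earlier stages, which matches the order described in the text: the two datasets, then echo1 and echo2, then echo3, then analysis.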

In typical problems that can be implemented in the BEAT platform, datasets are composed of multiple instances of raw data. For example, these could be images for an object recognition problem, speech sequences for a speech recognition task or model data for biometric recognition tasks. Computing blocks must process these data by looping over these atomic data samples. The color-coding in the figure conveys this extra data-flow information: each block shares the color of the dataset whose atomic samples it loops over. For the proposed toolchain, we can observe that blocks echo1, echo3 and analysis loop over the “raw” data samples from set, while echo2 loops over the samples from set2.
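The color-coding can be read as a mapping from each block to the dataset it loops over. The following minimal sketch writes that mapping down by hand for the example toolchain and groups the blocks per dataset. The names `synchronized_with` and `blocks_by_dataset` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

# Assumed mapping, read off the figure: block name -> dataset whose samples
# the block loops over (that is, the color of the block).
synchronized_with = {
    "echo1": "set",
    "echo2": "set2",
    "echo3": "set",
    "analysis": "set",
}


def blocks_by_dataset(mapping):
    """Invert the mapping: dataset name -> list of blocks looping over it."""
    groups = defaultdict(list)
    for block, dataset in mapping.items():
        groups[dataset].append(block)
    return dict(groups)


groups = blocks_by_dataset(synchronized_with)
for dataset, blocks in groups.items():
    print(f"{dataset}: {', '.join(blocks)}")
```

Note that echo3 appears under set even though it also receives data that originates from set2. It is the block's loop, not its set of inputs, that the color describes.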

The next figure shows a complete experimental setup for the above toolchain. The input blocks use a given database, called simple/1 (the name is simple and the version is 1), using one of its protocols called protocol. Each input block is bound to a specific dataset inside the database/protocol combination. Both datasets on this database/protocol yield objects of type beat/integer/1 (a format called integer from user beat, version 1), which are consumed by algorithms running on the next blocks. The block echo1 uses the algorithm user/integers_echo/1 (an algorithm called integers_echo from user user, version 1) and also yields beat/integer/1 objects. The same holds for the algorithm running on block echo2.

The algorithm for block echo3 cannot be the same: it must deal with 2 inputs, generated by blocks looping over different raw data. The conceptual differences involved in writing algorithms that are not synchronized with all of their inputs are covered later in this guide. For this introduction, it suffices to understand that the choice of algorithm for a block is constrained by the requirements of its neighboring blocks as well as by the input and output data flows determined for that block.

Block echo3 yields elements to the algorithm on the analysis block, called user/integers_echo_analyzer/1, which produces a single result named out_data of type int32 (that is, a signed 32-bit integer). Algorithms that do not send data to other algorithms are typically called analyzers. They are set up at the end of experiments to produce quantifiable results you can use to measure the performance of your experimental setup.

$digraph "user/triangle/1" { graph [compound=true rankdir=LR splines=polyline] subgraph dataset_cluster { graph [label=datasets rank=same] set [label=<<table border="0" cellspacing="0" bgcolor="#7878ff"><tr><td><font color="#000000"><b><u>set</u></b><br/>simple/1<br/><i>protocol:set</i></font></td><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="output_out" bgcolor="#0000FF" border="1"><font color="#ffffff">out</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] set2 [label=<<table border="0" cellspacing="0" bgcolor="#e2ffc7"><tr><td><font color="#000000"><b><u>set2</u></b><br/>simple/1<br/><i>protocol:set2</i></font></td><td><table border="0" cellspacing="5" bgcolor="#e2ffc7"><tr><td port="output_out" bgcolor="#6AA84F" border="1"><font color="#000000">out</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] } echo1 [label=<<table border="0" cellspacing="0" bgcolor="#7878ff"><tr><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="input_in" bgcolor="#0000FF" border="1"><font color="#ffffff">in</font></td></tr></table></td><td><font color="#000000"><b><u>echo1</u></b><br/>user/integers_echo/1<br/><i>@environment(1) x 1</i></font></td><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="output_out" bgcolor="#0000FF" border="1"><font color="#ffffff">out</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] set:output_out -> echo1:input_in [label="beat/integer/1" color="#0000FF" fontcolor="#0000FF" fontname=Helvetica fontsize=12] echo2 [label=<<table border="0" cellspacing="0" bgcolor="#e2ffc7"><tr><td><table border="0" cellspacing="5" bgcolor="#e2ffc7"><tr><td port="input_in" bgcolor="#6AA84F" border="1"><font color="#000000">in</font></td></tr></table></td><td><font color="#000000"><b><u>echo2</u></b><br/>user/integers_echo/1<br/><i>@environment(1) x 1</i></font></td><td><table border="0" cellspacing="5" 
bgcolor="#e2ffc7"><tr><td port="output_out" bgcolor="#6AA84F" border="1"><font color="#000000">out</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] set2:output_out -> echo2:input_in [label="beat/integer/1" color="#6AA84F" fontcolor="#6AA84F" fontname=Helvetica fontsize=12] echo3 [label=<<table border="0" cellspacing="0" bgcolor="#7878ff"><tr><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="input_in" bgcolor="#0000FF" border="1"><font color="#ffffff">in</font></td></tr><tr><td port="input_in2" bgcolor="#0000FF" border="1"><font color="#ffffff">in2</font></td></tr></table></td><td><font color="#000000"><b><u>echo3</u></b><br/>user/integers_echo_ignore/1<br/><i>@environment(1) x 1</i></font></td><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="output_out" bgcolor="#0000FF" border="1"><font color="#ffffff">out</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] echo1:output_out -> echo3:input_in [label="beat/integer/1" color="#0000FF" fontcolor="#0000FF" fontname=Helvetica fontsize=12] echo2:output_out -> echo3:input_in2 [label="beat/integer/1" color="#0000FF" fontcolor="#0000FF" fontname=Helvetica fontsize=12] analysis [label=<<table border="0" cellspacing="0" bgcolor="#7878ff"><tr><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td port="input_in" bgcolor="#0000FF" border="1"><font color="#ffffff">in</font></td></tr></table></td><td><font color="#000000"><b><u>analysis</u></b><br/>user/integers_echo_analyzer/1<br/><i>@environment(1) x 1</i></font></td><td><table border="0" cellspacing="5" bgcolor="#7878ff"><tr><td bgcolor="#0000FF" border="1"><font color="#ffffff">out_data<br/>(int32)</font></td></tr></table></td></tr></table>> fontname=Helvetica fontsize=12 shape=none] echo3:output_out -> analysis:input_in [label="beat/integer/1" color="#0000FF" fontcolor="#0000FF" fontname=Helvetica fontsize=12] subgraph analyzer_cluster { graph 
[label=analyzers rank=same] } }$
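To make the figure easier to follow, the same experiment can be summarized as plain Python data. This is a sketch under stated assumptions: the dictionary layout, the field names and the helper `unknown_endpoints` are hypothetical and are not the platform's storage format. Only the object names are taken from the figure.

```python
# Hypothetical in-memory summary of the experiment shown in the figure.
datasets = {
    "set": {"database": "simple/1", "protocol": "protocol", "set": "set"},
    "set2": {"database": "simple/1", "protocol": "protocol", "set": "set2"},
}

# Block name -> algorithm running on that block.
blocks = {
    "echo1": "user/integers_echo/1",
    "echo2": "user/integers_echo/1",
    "echo3": "user/integers_echo_ignore/1",
    "analysis": "user/integers_echo_analyzer/1",
}

# (source endpoint, destination endpoint, data format carried by the link)
connections = [
    ("set.out", "echo1.in", "beat/integer/1"),
    ("set2.out", "echo2.in", "beat/integer/1"),
    ("echo1.out", "echo3.in", "beat/integer/1"),
    ("echo2.out", "echo3.in2", "beat/integer/1"),
    ("echo3.out", "analysis.in", "beat/integer/1"),
]

# Results produced by the analyzer: result name -> type.
results = {"out_data": "int32"}


def unknown_endpoints(links, known_names):
    """Return block names used in connections that are not declared."""
    missing = []
    for source, destination, _data_format in links:
        for endpoint in (source, destination):
            name = endpoint.split(".", 1)[0]
            if name not in known_names and name not in missing:
                missing.append(name)
    return missing


known = set(datasets) | set(blocks)
formats = {data_format for _, _, data_format in connections}
print("undeclared blocks:", unknown_endpoints(connections, known))
print("formats in use:", sorted(formats))
```

A check like this only verifies that every connection refers to a declared dataset or block. Whether the algorithm on a block actually accepts the format on its inputs is a separate question that this sketch does not address.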

Design

The next figure shows a UML representation of the main BEAT components, showing some of their interactions and interdependencies. Experiments use algorithms, datasets and a toolchain in order to define a complete runnable setup. Datasets are grouped into protocols which are, in turn, grouped into databases. Algorithms use data formats to define input and output patterns. Most objects are subject to versioning, possess a name and belong to a specific user. By concatenating these attributes, it is possible to define unique identifiers for all objects in the platform. The example above contains several such identifiers, for instance beat/integer/1 and user/integers_echo/1.

$digraph hierarchy { graph [fontname="helvetica", compound=true, splines=polyline] node [fontname="helvetica", shape=record, style=filled, fillcolor=gray95] edge [fontname="helvetica"] subgraph "algorithm_cluster" { 1[label = "{Dataformat|...|+user\n+name\n+version}"] 2[label = "{Algorithm|...|+user\n+name\n+version\n+code\n+language}"] 6[label = "{Library|...|+user\n+name\n+version\n+code\n+language}"] } subgraph "database_cluster" { graph [label=datasets] 3[label = "{Database|...|+name\n+version}"] 4[label = "{Protocol|...|+template}"] 5[label = "Set"] } subgraph "experiment_cluster" { graph [label=experiments] 7[label = "{Toolchain|+execution_order()|+user\n+name\n+version}"] 8[label = "{Experiment|...|+user\n+label}"] } 1->1 [label = "0..*", arrowhead=empty] 2->1 [label = "1..*", arrowhead=empty] 2->6 [label = "0..*", arrowhead=empty] 6->6 [label = "0..*", arrowhead=empty] 4->3 [label = "1..*", arrowhead=odiamond] 5->4 [label = "1..*", arrowhead=odiamond] 5->1 [label = "1..*", arrowhead=empty] 8->7 [label = "1..1", arrowhead=empty] 8->2 [label = "1..*", arrowhead=empty] 8->5 [label = "1..*", arrowhead=empty] }$
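The identifiers seen so far, such as beat/integer/1 or simple/1, can be split back into their attributes. The sketch below is a hypothetical helper, not part of the BEAT packages. It assumes that three-part identifiers are user/name/version, that two-part identifiers (as used for databases) are name/version, and that versions are integers.

```python
from typing import NamedTuple, Optional


class Identifier(NamedTuple):
    """Attributes recovered from an identifier string (illustrative only)."""

    user: Optional[str]  # None for objects without an owner, e.g. databases
    name: str
    version: int


def parse_identifier(text):
    """Split 'user/name/version' or 'name/version' into its attributes."""
    parts = text.split("/")
    if len(parts) == 3:
        user, name, version = parts
    elif len(parts) == 2:
        user = None
        name, version = parts
    else:
        raise ValueError(f"unexpected identifier: {text!r}")
    # Assumption: versions are plain integers; int() raises ValueError if not.
    return Identifier(user, name, int(version))


for text in ("beat/integer/1", "user/integers_echo/1", "simple/1"):
    print(text, "->", parse_identifier(text))
```

For example, `parse_identifier("user/integers_echo/1")` yields the user `user`, the name `integers_echo` and the version `1`, matching the reading given in the text.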

The BEAT platform provides a graphical user interface so that you can program data formats, algorithms and toolchains, and define experiments, rather intuitively. This package provides the core building blocks of the BEAT platform. For expert users, we provide a command-line interface to the platform, allowing them to create, modify and delete such objects using their own editors. For developers and programmers, the rest of this guide details each of those building blocks, their relationships and how to use the command-line interface to interact with the platform efficiently.