.. vim: set fileencoding=utf-8 : .. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/ .. .. Contact: beat.support@idiap.ch .. .. .. .. This file is part of the beat.docs module of the BEAT platform. .. .. .. .. Commercial License Usage .. .. Licensees holding valid commercial BEAT licenses may use this file in .. .. accordance with the terms contained in a written agreement between you .. .. and Idiap. For further information contact tto@idiap.ch .. .. .. .. Alternatively, this file may be used under the terms of the GNU Affero .. .. Public License version 3 as published by the Free Software and appearing .. .. in the file LICENSE.AGPL included in the packaging of this file. .. .. The BEAT platform is distributed in the hope that it will be useful, but .. .. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY .. .. or FITNESS FOR A PARTICULAR PURPOSE. .. .. .. .. You should have received a copy of the GNU Affero Public License along .. .. with the BEAT platform. If not, see http://www.gnu.org/licenses/. .. .. _beat-system: ============== Introduction ============== The BEAT framework describes experiments through fundamental building blocks (object types): * **Data formats**: the specification of data which is transmitted between blocks of a toolchain; * **Algorithms**: the program (source-code or binaries) that defines the user algorithm to be run within the blocks of a toolchain; * **Libraries**: routines (source-code or binaries) that can be incorporated into other libraries or user code on algorithms; * **Databases** and **Datasets**: means to read raw-data from a disk and feed into a toolchain, respecting a certain usage protocol; * **Toolchains**: the definition of the data flow in an experiment; * **Experiments**: the reunion of algorithms, datasets, a toolchain and parameters that allow the system to run the prescribed recipe to produce displayable results. Instances of these building blocks are represented in JSON_ using a specific schema. Multiple objects can be stored on disk as files following a directory structure. .. _beat-system-example: A Simple Example ================ The next figure shows a representation of a very simple toolchain, composed of only a few color-coded components: * To the left, the reader can identify two datasets, named ``set`` and ``set2`` respectively. They emit data (of, at this point, an unspecified type) into the following processing blocks; * Following the datasets, two processing blocks named ``echo1`` and ``echo2`` receive the input from the dataset and emit data into a third block, named ``echo3``; * The final component receives the inputs emitted from ``echo3`` and it is called ``analysis``. Because this block has no output, it is considered a final block, from which BEAT expects to collect experiment results (that, at this point, are also unspecified). .. Simple toolchain representation for the BEAT platform .. image:: img/toolchain-triangle.png The toolchain only defines the very basic data flow and connections that must be respected by experiments. It does not define what is the type of data that is produced or consumed by any of the existing blocks, the algorithms or databases and protocols to use. From the toolchain description, it is possible to devise a possible execution order, by taking into consideration the imposed data flow. In this simple example, the datasets called ``set`` and ``set2`` may yield data in parallel, allowing the execution of blocks ``echo1`` and ``echo2``. Block ``echo3`` must come next, before the ``analysis`` block, which comes by last. In typical problems that can be implemented in the BEAT, datasets are composed of multiple instances of raw data. For example, these could be images for an object recognition problem, speech sequences for a speech recognition task or model data for biometric recognition tasks. Computing blocks must process these data by looping on these atomic data samples. The color-coding in the figure indicates this extra data-flow information: for each dataset in the drawing, it indicates how blocks loop on their atomic data. For the proposed, toolchain, we can observe that blocks ``echo1``, ``echo3`` and ``analysis`` loop over the "raw" data samples from ``set``, while ``echo2`` loop over the samples from ``set2``. The next figure shows a complete experimental setup for the above toolchain. The input blocks use a given database, called ``simple/1`` (the name is ``simple`` and the version is ``1``), using one of its protocols called ``protocol``. Each block is set to a specific data set inside the database/protocol combination. Both datasets on this database/protocol yield objects of type ``beat/integer/1`` (a format called ``integer`` from user ``beat``, version ``1``), which are consumed by algorithms running on the next blocks. The block ``echo1`` uses the algorithm ``user/integers_echo/1`` (an algorithm called ``integers_echo`` from user ``user``, version ``1``) and also yields ``beat/integer/1`` objects. The same is valid for the algorithm running on block ``echo2``. The algorithm for block ``echo3`` cannot possibly be the same - it must deal with 2 inputs, generated by blocks looping on different raw data. We'll be more detailed about conceptual differences while writing algorithms which are not synchronized with all of their inputs next. For this introduction, it suffices you understand the organization of algorithms in an experiment is constrained by its neighboring block requirements as well as the input and output data flows determined for a given block. Block ``echo3`` yields elements to the algorithm on the ``analysis`` block, called ``user/integers_echo_analyzer/1``, which produces a single result named ``out_data``, which is of type ``int32`` (that is, a signed integer with 32 bits). Algorithms that do not communicate with other algorithms are typically called ``analyzers``. They are set-up on the end of experiments so as to produce quantifiable results you can use to measure the performance of your experimental setup. .. Simple experiment representation for the BEAT platform .. image:: img/experiment-triangle.png .. _beat-system-design: Design ====== The next figure shows a UML representation of main BEAT components, showing some of their interaction and interdependence. Experiments use algorithms, data sets and a toolchain in order to define a complete runnable setup. Data sets are grouped into protocols which are, in turn, grouped into databases. Algorithms use data formats to defined input and output patterns. Most objects are subject to versioning, possess a name and belong to a specific user. By contracting those markers, it is possible to define unique identifiers for all objects in the platform. In the example above, you can identify some examples. .. High-level component interaction in the BEAT platform core .. graphviz:: digraph hierarchy { graph [compound=true, splines=polyline] node [shape=record, style=filled, fillcolor=gray95] edge [] subgraph "algorithm_cluster" { 1[label = "{Dataformat|...|+user\n+name\n+version}"] 2[label = "{Algorithm|...|+user\n+name\n+version\n+code\n+language}"] 6[label = "{Library|...|+user\n+name\n+version\n+code\n+language}"] } subgraph "database_cluster" { graph [label=datasets] 3[label = "{Database|...|+name\n+version}"] 4[label = "{Protocol|...|+template}"] 5[label = "Set"] } subgraph "experiment_cluster" { graph [label=experiments] 7[label = "{Toolchain|+execution_order()|+user\n+name\n+version}"] 8[label = "{Experiment|...|+user\n+label}"] } 1->1 [label = "0..*", arrowhead=empty] 2->1 [label = "1..*", arrowhead=empty] 2->6 [label = "0..*", arrowhead=empty] 6->6 [label = "0..*", arrowhead=empty] 4->3 [label = "1..*", arrowhead=odiamond] 5->4 [label = "1..*", arrowhead=odiamond] 5->1 [label = "1..*", arrowhead=empty] 8->7 [label = "1..1", arrowhead=empty] 8->2 [label = "1..*", arrowhead=empty] 8->5 [label = "1..*", arrowhead=empty] } The BEAT framework provides a graphical user interface so that users can program data formats, algorithms, toolchains and define experiments rather intuitively. It also provides a command-line interface which is used in parallel with the GUI for running experiments, creating, modifying, or deleting objects and interacting with the BEAT platform. .. include:: links.rst