Experiments

An experiment is the combination of algorithms, datasets, a toolchain and parameters that allows the system to schedule and run the prescribed recipe and produce displayable results. Defining a BEAT experiment amounts to configuring the processing blocks of a toolchain: selecting which database, algorithms and algorithm parameters to use.

The graphical interface of BEAT provides user-friendly editors for the main components of the system (experiments, data formats, etc.), which hides the details of their JSON declarations. You only need to write an experiment declaration by hand, following the specifications below, when you are not using this graphical interface.

Note

Naming Convention

Experiments are named using five values joined by a / (slash) operator:

  • username: indicates the author of the experiment

  • toolchain username: indicates the author of the toolchain used for that experiment

  • toolchain name: indicates the name of the toolchain used for that experiment

  • toolchain version: indicates the version (integer starting from 1) of the toolchain used for the experiment

  • name: an identifier for the object

Each tuple of these five components defines a unique experiment name.
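For example, assuming a user jdoe runs an experiment called test-1 on version 1 of a toolchain named eigenfaces that jdoe also authored (all names here are hypothetical), the full experiment name would be:

jdoe/jdoe/eigenfaces/1/test-1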

Declaration of an experiment

An experiment is declared in a JSON file, and must contain at least the following fields:

{
    "datasets": [
    ],
    "blocks": [
    ],
    "analyzers": [
    ],
    "globals": [
    ]
}

Declaration of the dataset(s)

The dataset inputs are defined by the toolchain. However, the toolchain does not describe which data to plug in each dataset input.

This is the role of the datasets field of an experiment. For each dataset, the experiment must specify three attributes, as follows:

{
    "datasets": [
        "templates": {
            "set": "templates",
            "protocol": "idiap",
            "database": "atnt/1"
        },
        ...
    },
    ...
}

The key of an experiment dataset must correspond to the desired dataset name from the toolchain. Then, three fields must be given:

  • database: the database name and version

  • protocol: the protocol name

  • set: the name of the set, within the chosen protocol of this database, to associate with this toolchain dataset

Declaration of the block(s)

The blocks are defined by the toolchain. However, the toolchain does not describe which algorithm to run in each processing block, nor how each of these algorithms is parametrized.

This is the role of the blocks field of an experiment. For each block, the experiment must specify four attributes, as follows:

{
    "blocks": {
        "linear_machine_training": {
            "inputs": {
                "image": "image"
            },
            "parameters": {},
            "algorithm": "tutorial/pca/1",
            "outputs": {
                "subspace": "subspace"
            }
        },
        ...
    },
    ...
}

The key of an experiment block must correspond to the desired block from the toolchain. Then, four fields must be given:

  • algorithm: the algorithm to use (author_name/algorithm_name/version)

  • inputs: the list of inputs. The key is the algorithm input, while the value is the corresponding toolchain input.

  • outputs: the list of outputs. The key is the algorithm output, while the value is the corresponding toolchain output.

  • parameters: the algorithm parameters to use for this processing block
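As an illustrative sketch, and assuming that tutorial/pca/1 indeed exposes a parameter named number-of-components (the same parameter shown in the globals example further below), a block with local parameters could be declared as follows:

{
    "blocks": {
        "linear_machine_training": {
            "inputs": {
                "image": "image"
            },
            "parameters": {
                "number-of-components": "5"
            },
            "algorithm": "tutorial/pca/1",
            "outputs": {
                "subspace": "subspace"
            }
        },
        ...
    },
    ...
}

Parameters that several blocks share can instead be set once in the globals field, described at the end of this section.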

Note

Algorithms, Datasets and Blocks

While configuring the experiment, your objective is to fill in all containers defined by the toolchain with valid datasets, algorithms or analyzers. BEAT checks that connected datasets, algorithms and analyzers produce or consume data in the right format, and only presents options that are compatible with adjacent blocks.

For example, if you choose a dataset A for the block train of your experiment, and that dataset outputs objects in the format user/format/1, then the algorithm running on the block following train must consume user/format/1 on its input. Therefore, the choice of algorithms that can run after train becomes limited the moment you choose dataset A. The configuration system dynamically updates to take these constraints into account every time you make a selection, progressively narrowing what remains valid for the experiment.

Declaration of the analyzer(s)

Analyzers are similar to algorithms, except that they run on toolchain endpoints. Their configuration is very similar to that of regular blocks, except that they have no outputs:

{
    "analyzers": {
        "analysis": {
            "inputs": {
                "scores": "scores"
            },
            "algorithm": "tutorial/postperf/1"
        }
    },
    ...
}

Global parameters

Each block and analyzer may rely on its own local parameters. However, several blocks may rely on the exact same parameters. In this case, it is more convenient to define those globally.

For an experiment, this is achieved using the globals field in its JSON declaration. For instance:

{
    "globals": {
        "queue": "Default",
        "environment": {
            "version": "1.1.0",
            "name": "Scientific Python 2.7"
        },
        "tutorial/pca/1": {
            "number-of-components": "5"
        }
    },
    ...
}
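
For reference, assembling the fragments shown throughout this section gives the following sketch of a complete declaration. It is only an illustration: a real toolchain would typically contain additional blocks connecting the training output to the analyzer input, and the queue and environment values depend on what is available on your platform.

{
    "datasets": {
        "templates": {
            "set": "templates",
            "protocol": "idiap",
            "database": "atnt/1"
        }
    },
    "blocks": {
        "linear_machine_training": {
            "inputs": {
                "image": "image"
            },
            "parameters": {},
            "algorithm": "tutorial/pca/1",
            "outputs": {
                "subspace": "subspace"
            }
        }
    },
    "analyzers": {
        "analysis": {
            "inputs": {
                "scores": "scores"
            },
            "algorithm": "tutorial/postperf/1"
        }
    },
    "globals": {
        "queue": "Default",
        "environment": {
            "version": "1.1.0",
            "name": "Scientific Python 2.7"
        },
        "tutorial/pca/1": {
            "number-of-components": "5"
        }
    }
}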