.. vim: set fileencoding=utf-8 :

.. _gridtk.generate:

=====================================
 Script Generation for Grid Searches
=====================================

Scientific discovery sometimes requires running many experiments before you
can reach a reasonable conclusion. These experiments require minor variations
in their configuration and submission, possibly to an SGE-enabled facility for
processing.

This guide explains how to use the script ``jgen``, which helps you generate
multiple experiment configurations for your grid searches. The system assumes
that a single experiment is defined in a single file, and that multiple
experiments can be run by executing sequences of these individual
configuration files.

The script ``jgen`` takes, in its simplest form, 3 parameters:

* A YAML_ file with the "combinations" of variables that one needs to scan for
  a search
* A Jinja2_ template file that explains the setup of each experiment
* An output template that explains how to mix the parameters in your YAML_
  file with the template and generate a set of experiment configurations to
  run

Let's go through each of these inputs.


YAML Input
----------

The YAML_ input file describes all possible combinations of parameters you
want to scan. All root keys that hold lists are combined in all possible ways,
and each combination produces a "configuration set".

A configuration set corresponds to settings for **all** variables in the input
template that need replacing. For example, if your template mentions the
variables ``name`` and ``version``, then each configuration set should yield
values for both ``name`` and ``version``.

For example:

.. code-block:: yaml

   name: [john, lisa]
   version: [v1, v2]

This yields the following configuration sets:

.. code-block:: python

   [
     {'name': 'john', 'version': 'v1'},
     {'name': 'john', 'version': 'v2'},
     {'name': 'lisa', 'version': 'v1'},
     {'name': 'lisa', 'version': 'v2'},
   ]

Each key in the input file should correspond to either an object or a YAML
list. If the object is a list, then ``jgen`` iterates over it for every
possible combination of elements in the lists. If the object is not a list,
then it is considered unique and is repeated in each generated configuration
set. Example:

.. code-block:: yaml

   name: [john, lisa]
   version: [v1, v2]
   text: >
      hello, world!

This yields the following configuration sets:

.. code-block:: python

   [
     {'name': 'john', 'version': 'v1', 'text': 'hello, world!'},
     {'name': 'john', 'version': 'v2', 'text': 'hello, world!'},
     {'name': 'lisa', 'version': 'v1', 'text': 'hello, world!'},
     {'name': 'lisa', 'version': 'v2', 'text': 'hello, world!'},
   ]

Keys starting with one ``_`` (underscore) are treated as "unique" objects as
well, even if they hold a list. Example:

.. code-block:: yaml

   name: [john, lisa]
   version: [v1, v2]
   _unique: [i1, i2]

This yields the following configuration sets:

.. code-block:: python

   [
     {'name': 'john', 'version': 'v1', '_unique': ['i1', 'i2']},
     {'name': 'john', 'version': 'v2', '_unique': ['i1', 'i2']},
     {'name': 'lisa', 'version': 'v1', '_unique': ['i1', 'i2']},
     {'name': 'lisa', 'version': 'v2', '_unique': ['i1', 'i2']},
   ]


Jinja2 Template
---------------

This is a file in which variables are replaced for each of the configuration
sets generated from your YAML_ file. For example, suppose your template is a
Python file that uses the variables this way:

.. code-block:: text

   #!/usr/bin/env python

   print('My name is {{ name }}')
   print('This is {{ version }}')

Then ``jgen`` generates 4 output files, one for each combination of ``name``
and ``version`` as explained above.
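
The expansion rules from the YAML Input section can be sketched in a few lines
of plain Python. This is only an illustration under stated assumptions: the
helper ``expand`` is hypothetical and is not part of ``jgen``, and the
variables are written as a Python dictionary rather than loaded from a YAML_
file, so that the example has no dependencies beyond the standard library.

.. code-block:: python

   from itertools import product


   def expand(variables):
       """Expand a mapping of variables into a list of configuration sets.

       Hypothetical helper, not jgen's actual implementation: list values
       are combined in all possible ways, while non-list values and keys
       starting with an underscore are copied unchanged into every set.
       """
       # Keys whose values are iterated over
       scanned = [
           key for key, value in variables.items()
           if isinstance(value, list) and not key.startswith("_")
       ]
       # Everything else is "unique" and repeated as-is
       unique = {k: v for k, v in variables.items() if k not in scanned}

       cfgsets = []
       for values in product(*(variables[key] for key in scanned)):
           cfgset = dict(zip(scanned, values))
           cfgset.update(unique)
           cfgsets.append(cfgset)
       return cfgsets


   variables = {
       "name": ["john", "lisa"],
       "version": ["v1", "v2"],
       "_unique": ["i1", "i2"],
   }

   for cfgset in expand(variables):
       print(cfgset)

Running this prints four dictionaries, one per combination of ``name`` and
``version``, each carrying the full ``_unique`` list. That count matches the
number of files generated from the template.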


Output filename template
------------------------

This follows the same build rules as the Jinja2_ template, but it is just a
string. It describes the path in which each configuration, once combined with
the template, is saved. Considering our example above, it may be something
like this:

.. code-block:: text

   output-dir/{{ name }}-{{ version }}.py

With all those inputs, the ``jgen`` command looks like this:

.. code-block:: sh

   $ jgen variables.yaml template.py 'output-dir/{{ name }}-{{ version }}.py'


Generating Aggregations
-----------------------

When you generate many files to run, it is sometimes practical to also
generate a script that makes running all configurations easy. For example, one
could think of a bash script that runs all of the Python scripts generated
above. We call those "aggregations". When aggregating, you iterate over a
specific variable called ``cfgset``, which contains the dictionaries for each
configuration set. For example, an aggregation would look like this:

.. code-block:: sh

   #!/usr/bin/env bash

   {% for k in cfgset %}
   python output-dir/{{ k.name }}-{{ k.version }}.py
   {% endfor %}

This would then generate:

.. code-block:: sh

   #!/usr/bin/env bash

   python output-dir/john-v1.py
   python output-dir/john-v2.py
   python output-dir/lisa-v1.py
   python output-dir/lisa-v2.py

With this generated bash script, you can run all configuration sets from a
single command line.

The final command line for ``jgen``, including the generation of the
individual configuration files and the aggregation, looks like the following:

.. code-block:: sh

   $ jgen variables.yaml template.py 'output-dir/{{ name }}-{{ version }}.py' run.sh 'output-dir/run.sh'


Automatic injection of variables
--------------------------------

Sometimes you want to use variables that are user-specific in your Jinja2_
templates, for example a temporary directory that differs from one user to
another.
To allow this, ``jgen`` automatically injects ``bob.extension.rc`` (see
:ref:`bob.extension.rc`) into your variables. You can then access its values
in your Jinja2_ templates with something like ``rc.variable_name``.


.. Place your references here:
.. _yaml: https://en.wikipedia.org/wiki/YAML
.. _jinja2: http://jinja.pocoo.org/docs/