.. vim: set fileencoding=utf-8 :


.. _gridtk.generate:

=====================================
 Script Generation for Grid Searches
=====================================

Scientific discovery sometimes requires a large number of experiments before a
reasonable conclusion can be reached. These experiments usually differ only
slightly in their configuration and submission, and may need to be sent to an
SGE-enabled facility for processing.

This guide explains how to use the script ``jgen``, which helps you generate
multiple experiment configurations for your grid searches. The system assumes
that a single experiment is defined in a single file, while multiple
experiments are run by executing a sequence of these individual configuration
files.

In its simplest form, the script ``jgen`` takes three parameters:

* A YAML_ file with the "combinations" of variables that need to be scanned
  in the search
* A Jinja2_ template file that describes the setup of each experiment
* An output filename template that describes where each experiment
  configuration, generated by combining the parameters in your YAML_ file
  with the template, is saved

Let's go through each of these inputs.


YAML Input
----------

The YAML_ input file describes all combinations of parameters you want to scan.
All root keys whose values are lists are combined in every possible way, and
each combination produces a "configuration set".

A configuration set assigns a value to **all** variables in the input template
that need replacing. For example, if your template mentions the variables
``name`` and ``version``, then each configuration set should provide values
for both ``name`` and ``version``.

For example:

.. code-block:: yaml

   name: [john, lisa]
   version: [v1, v2]


This yields the following configuration sets:

.. code-block:: python

   [
     {'name': 'john', 'version': 'v1'},
     {'name': 'john', 'version': 'v2'},
     {'name': 'lisa', 'version': 'v1'},
     {'name': 'lisa', 'version': 'v2'},
   ]
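
The expansion above is a Cartesian product over the list values. The following
is a minimal Python sketch of that idea using ``itertools.product``; it is not
``jgen``'s actual implementation, it assumes the YAML file has already been
parsed into a dictionary (for example with PyYAML), and it only handles the
case where every value is a list. ``expand`` is a hypothetical helper name.

.. code-block:: python

   import itertools


   def expand(variables):
       """Return one dict per combination of the list values in ``variables``."""
       keys = list(variables)
       # itertools.product varies the last key fastest, so the order matches
       # the listing above
       return [dict(zip(keys, values))
               for values in itertools.product(*(variables[k] for k in keys))]


   # The YAML example above, already parsed into a Python dictionary
   variables = {'name': ['john', 'lisa'], 'version': ['v1', 'v2']}
   for cfg in expand(variables):
       print(cfg)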


Each root key in the input file should map either to a YAML list or to any
other object. If the value is a list, it is iterated over together with all
other lists, producing every possible combination of their elements. If the
value is not a list, it is considered unique and repeated, unchanged, in each
generated configuration set. For example:

.. code-block:: yaml

   name: [john, lisa]
   version: [v1, v2]
   text: >-
      hello,
      world!

This yields the following configuration sets:

.. code-block:: python

   [
     {'name': 'john', 'version': 'v1', 'text': 'hello, world!'},
     {'name': 'john', 'version': 'v2', 'text': 'hello, world!'},
     {'name': 'lisa', 'version': 'v1', 'text': 'hello, world!'},
     {'name': 'lisa', 'version': 'v2', 'text': 'hello, world!'},
   ]
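
Because non-list values do not multiply the number of combinations, the number
of generated configuration sets is the product of the lengths of the
list-valued keys. A short Python sketch of that arithmetic follows, again
assuming an already-parsed dictionary; ``count_configuration_sets`` is a
hypothetical, illustrative helper and not part of ``jgen``.

.. code-block:: python

   import math


   def count_configuration_sets(variables):
       """Number of configuration sets produced from parsed YAML ``variables``."""
       # Only list values are iterated over; everything else is repeated as-is.
       # Keys starting with an underscore are also kept whole (see below).
       lengths = [len(value) for key, value in variables.items()
                  if isinstance(value, list) and not key.startswith('_')]
       # math.prod of an empty list is 1, i.e. a single configuration set
       return math.prod(lengths)


   variables = {'name': ['john', 'lisa'], 'version': ['v1', 'v2'],
                'text': 'hello, world!'}
   print(count_configuration_sets(variables))  # 2 * 2 = 4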

Keys starting with an underscore (``_``) are also treated as "unique" objects,
even when their value is a list. For example:

.. code-block:: yaml

   name: [john, lisa]
   version: [v1, v2]
   _unique: [i1, i2]

This yields the following configuration sets:

.. code-block:: python

   [
     {'name': 'john', 'version': 'v1', '_unique': ['i1', 'i2']},
     {'name': 'john', 'version': 'v2', '_unique': ['i1', 'i2']},
     {'name': 'lisa', 'version': 'v1', '_unique': ['i1', 'i2']},
     {'name': 'lisa', 'version': 'v2', '_unique': ['i1', 'i2']},
   ]
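
Putting the rules together (list values are combined, other values are
repeated, and keys starting with an underscore are always repeated whole), the
expansion could be sketched in Python as follows. This illustrates the
documented behaviour only; it is not ``jgen``'s own code, it assumes an
already-parsed dictionary, and ``expand`` is a hypothetical helper name.

.. code-block:: python

   import itertools


   def expand(variables):
       """Expand parsed YAML ``variables`` into a list of configuration sets."""
       # Only lists under keys without a leading underscore are iterated over
       varying = [key for key, value in variables.items()
                  if isinstance(value, list) and not key.startswith('_')]
       result = []
       for values in itertools.product(*(variables[key] for key in varying)):
           chosen = dict(zip(varying, values))
           # All other values are repeated unchanged, keeping the input key order
           result.append({key: chosen.get(key, value)
                          for key, value in variables.items()})
       return result


   variables = {'name': ['john', 'lisa'], 'version': ['v1', 'v2'],
                '_unique': ['i1', 'i2']}
   for cfg in expand(variables):
       print(cfg)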


Jinja2 Template
---------------

The template is a file whose variables are replaced for each of the
configuration sets generated from your YAML_ file. For example, your template
could be a Python file that uses the variables this way:

.. code-block:: text

   #!/usr/bin/env python

   print('My name is {{ name }}')
   print('This is {{ version }}')


Then ``jgen`` will generate four output files, one for each combination of
``name`` and ``version`` shown above.
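
Conceptually, the template is rendered once per configuration set. The sketch
below illustrates this with the template above. To stay dependency-free it
does **not** use Jinja2: it swaps in a deliberately tiny regular-expression
substitution that only understands plain ``{{ variable }}`` placeholders, so
filters, loops and every other Jinja2 feature are unsupported. With Jinja2
installed, ``jinja2.Template(text).render(**cfg)`` plays the same role.

.. code-block:: python

   import re

   TEMPLATE = """#!/usr/bin/env python

   print('My name is {{ name }}')
   print('This is {{ version }}')
   """

   # Matches "{{ identifier }}" with optional spaces inside the braces
   PLACEHOLDER = re.compile(r'\{\{\s*(\w+)\s*\}\}')


   def render(text, cfg):
       """Replace every {{ key }} in ``text`` with str(cfg[key])."""
       # A variable missing from ``cfg`` raises KeyError in this sketch
       return PLACEHOLDER.sub(lambda m: str(cfg[m.group(1)]), text)


   configuration_sets = [
       {'name': 'john', 'version': 'v1'},
       {'name': 'john', 'version': 'v2'},
       {'name': 'lisa', 'version': 'v1'},
       {'name': 'lisa', 'version': 'v2'},
   ]
   for cfg in configuration_sets:
       print(render(TEMPLATE, cfg))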


Output filename template
------------------------

The output filename template follows the same Jinja2_ rules as the template
file, but it is just a string describing the path where each rendered
configuration will be saved. Considering our example above, it may look like
this:

.. code-block:: text

   output-dir/{{ name }}-{{ version }}.py


With all those inputs, the ``jgen`` command will look like this:

.. code-block:: sh

   $ jgen variables.yaml template.py 'output-dir/{{ name }}-{{ version }}.py'
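
Roughly, for each configuration set, the command renders the output filename
template to obtain a path, renders the template to obtain the file contents,
and writes the result. The sketch below illustrates that flow under several
assumptions: the YAML file is already parsed into a dictionary whose values
are all lists, a tiny regular-expression substitution stands in for Jinja2
(only plain ``{{ variable }}`` placeholders), missing parent directories are
created, and files go into a temporary directory. ``render`` and ``generate``
are hypothetical names, not ``jgen``'s API.

.. code-block:: python

   import itertools
   import pathlib
   import re
   import tempfile

   # Simplified stand-in for Jinja2: only "{{ name }}"-style placeholders
   PLACEHOLDER = re.compile(r'\{\{\s*(\w+)\s*\}\}')


   def render(text, cfg):
       """Replace every {{ key }} in ``text`` with str(cfg[key])."""
       return PLACEHOLDER.sub(lambda m: str(cfg[m.group(1)]), text)


   def generate(variables, template, output_template, base_dir):
       """Write one rendered file per configuration set and return their paths."""
       keys = list(variables)
       written = []
       for values in itertools.product(*(variables[k] for k in keys)):
           cfg = dict(zip(keys, values))
           path = pathlib.Path(base_dir) / render(output_template, cfg)
           # This sketch creates missing directories such as output-dir/
           path.parent.mkdir(parents=True, exist_ok=True)
           path.write_text(render(template, cfg))
           written.append(path)
       return written


   variables = {'name': ['john', 'lisa'], 'version': ['v1', 'v2']}
   template = "print('My name is {{ name }}')\nprint('This is {{ version }}')\n"

   # Write into a throw-away directory instead of the current one
   with tempfile.TemporaryDirectory() as tmp:
       paths = generate(variables, template,
                        'output-dir/{{ name }}-{{ version }}.py', tmp)
       # Read the results back before the directory is deleted
       generated = {p.relative_to(tmp).as_posix(): p.read_text() for p in paths}

   for name in generated:
       print(name)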


Generating Aggregations
-----------------------

When you generate many files that need to be run, it is often practical to
also generate an "aggregation" script that makes running all configurations
easy. For example, one could write a bash script that runs all of the Python
scripts generated above. We call those "aggregations". When aggregating, you
iterate over a special variable called ``cfgset``, which contains the
dictionaries of all configuration sets. For example, an aggregation template
could look like this:

.. code-block:: text

   #!/usr/bin/env bash

   {% for k in cfgset %}
   python output-dir/{{ k.name }}-{{ k.version }}.py
   {% endfor %}


This would then generate:

.. code-block:: sh

   #!/usr/bin/env bash

   python output-dir/john-v1.py
   python output-dir/john-v2.py
   python output-dir/lisa-v1.py
   python output-dir/lisa-v2.py


With this generated bash script, you could run all configuration sets from a
single command line.
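
As a rough sketch of what the ``{% for k in cfgset %}`` loop in the
aggregation template computes, here is the same logic written directly in
Python. ``build_run_script`` is a hypothetical helper, not part of ``jgen``,
and ``cfgset`` is assumed to be the list of configuration-set dictionaries
from the examples above.

.. code-block:: python

   def build_run_script(cfgset):
       """Return the text of a bash script that runs every generated file."""
       lines = ['#!/usr/bin/env bash', '']
       for k in cfgset:  # same loop variable name as in the template
           lines.append(f"python output-dir/{k['name']}-{k['version']}.py")
       return '\n'.join(lines) + '\n'


   cfgset = [
       {'name': 'john', 'version': 'v1'},
       {'name': 'john', 'version': 'v2'},
       {'name': 'lisa', 'version': 'v1'},
       {'name': 'lisa', 'version': 'v2'},
   ]
   print(build_run_script(cfgset), end='')

The printed text matches the generated script shown above, which could then be
run with ``bash output-dir/run.sh``.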

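The final ``jgen`` command line, which generates both the individual
configuration files and the aggregation, looks like the following. The fourth
argument is the aggregation template and the fifth is the path where the
rendered aggregation is saved: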

.. code-block:: sh

   $ jgen variables.yaml template.py 'output-dir/{{ name }}-{{ version }}.py' run.sh 'output-dir/run.sh'


Automatic injection of variables
--------------------------------

Sometimes you want to use user-specific variables in your Jinja2 templates, for
example a temporary directory that differs from user to user. To allow this,
``jgen`` automatically injects ``bob.extension.rc`` (see :ref:`bob.extension.rc`)
into your variables. You can then access its entries in your Jinja2 templates
with something like ``rc.variable_name``.
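
A minimal sketch of what this injection amounts to: the configuration set and
an ``rc`` mapping are merged into one set of template variables. Here a plain
dictionary with a hypothetical ``temp_dir`` entry stands in for
``bob.extension.rc``, and a simplified regular-expression substitution
replaces Jinja2; it only resolves ``{{ a }}`` and dotted ``{{ a.b }}``
placeholders through dictionaries. In Jinja2 itself, ``rc.variable_name``
first tries attribute lookup and then falls back to item lookup.

.. code-block:: python

   import re

   # Resolves "{{ a }}" and dotted "{{ a.b }}" placeholders through nested dicts
   PLACEHOLDER = re.compile(r'\{\{\s*([\w.]+)\s*\}\}')


   def render(text, variables):
       def lookup(match):
           value = variables
           for part in match.group(1).split('.'):
               value = value[part]
           return str(value)
       return PLACEHOLDER.sub(lookup, text)


   # Hypothetical stand-in for bob.extension.rc with a user-specific setting
   rc = {'temp_dir': '/tmp/john'}
   cfg = {'name': 'john', 'version': 'v1'}

   # The configuration set plus the injected "rc" entry
   variables = dict(cfg, rc=rc)
   print(render('{{ rc.temp_dir }}/{{ name }}-{{ version }}.log', variables))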


.. Place your references here:
.. _yaml: https://en.wikipedia.org/wiki/YAML
.. _jinja2: http://jinja.pocoo.org/docs/