Script Generation for Grid Searches

Scientific discovery often requires running many experiments before you can reach a sound conclusion. These experiments typically differ only by minor variations in their configuration and submission, possibly to an SGE-enabled facility for processing.

This guide explains how to use the jgen script, which helps you generate multiple experiment configurations for your grid searches. It assumes that a single experiment is defined in a single file, and that multiple experiments are run by executing a sequence of these individual configuration files.

In its simplest form, jgen takes three parameters:

  • A YAML file listing the “combinations” of variables to scan in the search

  • A Jinja2 template file that describes the setup of each experiment

  • An output filename template that describes how to combine the parameters in your YAML file with the template to generate the set of experiment configurations to run

Let’s go through each of these inputs.

YAML Input

The YAML input file describes all possible combinations of parameters you want to scan. All root keys whose values are lists are combined in all possible ways, and each combination produces a “configuration set”.

A configuration set provides values for all variables in the input template that need replacing. For example, if your template mentions the variables name and version, then each configuration set should yield values for both name and version.

For example:

name: [john, lisa]
version: [v1, v2]

This yields the following configuration sets:

[
  {'name': 'john', 'version': 'v1'},
  {'name': 'john', 'version': 'v2'},
  {'name': 'lisa', 'version': 'v1'},
  {'name': 'lisa', 'version': 'v2'},
]

Each key in the input file should map to either a YAML list or any other object. If the value is a list, jgen iterates over it, producing every possible combination with the elements of the other lists. If the value is not a list, it is considered unique and is repeated in every generated configuration set. Example:

name: [john, lisa]
version: [v1, v2]
text: >-
   hello,
   world!

This yields the following configuration sets:

[
  {'name': 'john', 'version': 'v1', 'text': 'hello, world!'},
  {'name': 'john', 'version': 'v2', 'text': 'hello, world!'},
  {'name': 'lisa', 'version': 'v1', 'text': 'hello, world!'},
  {'name': 'lisa', 'version': 'v2', 'text': 'hello, world!'},
]

Keys starting with an underscore (_) are also treated as “unique” objects: their values are copied as-is into every configuration set, even when they are lists. Example:

name: [john, lisa]
version: [v1, v2]
_unique: [i1, i2]

This yields the following configuration sets:

[
  {'name': 'john', 'version': 'v1', '_unique': ['i1', 'i2']},
  {'name': 'john', 'version': 'v2', '_unique': ['i1', 'i2']},
  {'name': 'lisa', 'version': 'v1', '_unique': ['i1', 'i2']},
  {'name': 'lisa', 'version': 'v2', '_unique': ['i1', 'i2']},
]
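
The expansion rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not jgen's actual implementation: the function name expand is hypothetical, and the variables are given as a plain dictionary instead of being loaded from a YAML file.

```python
from itertools import product


def expand(variables):
    """Expand a mapping of variables into a list of configuration sets.

    Values that are lists are combined in all possible ways; other values
    and keys starting with an underscore are copied unchanged into every set.
    """
    # Keys whose list values take part in the Cartesian product
    scanned = {
        key: value
        for key, value in variables.items()
        if isinstance(value, list) and not key.startswith("_")
    }
    # Everything else is repeated as-is in each configuration set
    fixed = {k: v for k, v in variables.items() if k not in scanned}

    return [
        {**dict(zip(scanned, combo)), **fixed}
        for combo in product(*scanned.values())
    ]


# Same content as the YAML examples above, written as a dictionary
variables = {
    "name": ["john", "lisa"],
    "version": ["v1", "v2"],
    "text": "hello, world!",
    "_unique": ["i1", "i2"],
}

for cfg in expand(variables):
    print(cfg)
```

The sketch produces four configuration sets, in the same order as the listings above, each carrying the same text and _unique values.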

Jinja2 Template

This is a file in which variables are replaced by the values of each configuration set generated from your YAML file. For example, suppose your template is a Python file that uses the variables this way:

#!/usr/bin/env python

print('My name is {{ name }}')
print('This is {{ version }}')

jgen then generates 4 output files, one for each combination of name and version listed above.

Output Filename Template

This follows the same rules as the Jinja2 template, but it is just a string. It describes the path where each configuration, once rendered through the template, is saved. For our example above, it may look like this:

output-dir/{{ name }}-{{ version }}.py

With all those inputs, the jgen command will look like this:

$ jgen variables.yaml template.py 'output-dir/{{ name }}-{{ version }}.py'
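
To make the roles of the two templates concrete, here is a minimal sketch using the jinja2 package directly. It only mimics what the command above is described as doing: the configuration sets are written out by hand, and the results are kept in a dictionary instead of being written to disk. How jgen sets up Jinja2 internally is not covered here.

```python
from jinja2 import Template

# Content of the experiment template (template.py in the example above)
content_template = Template(
    "print('My name is {{ name }}')\n"
    "print('This is {{ version }}')\n"
)
# Output filename template, rendered with the same variables
path_template = Template("output-dir/{{ name }}-{{ version }}.py")

# Configuration sets, as they would result from the YAML example
cfgsets = [
    {"name": "john", "version": "v1"},
    {"name": "john", "version": "v2"},
    {"name": "lisa", "version": "v1"},
    {"name": "lisa", "version": "v2"},
]

# Map each rendered output path to its rendered file content
rendered = {
    path_template.render(**cfg): content_template.render(**cfg)
    for cfg in cfgsets
}

for path in rendered:
    print(path)
```

Each configuration set is used twice: once to compute where the file goes, and once to fill in its content.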

Generating Aggregations

When you generate many files to run, it is often practical to also generate an “aggregation” script that makes running all configurations easy. For example, one could think of a bash script that runs all of the Python scripts generated above. We call those “aggregations”. When aggregating, you iterate over a special variable called cfgset, which contains the dictionaries for every configuration set. For example, an aggregation template could look like this:

#!/usr/bin/env bash

{% for k in cfgset %}
python output-dir/{{ k.name }}-{{ k.version }}.py
{% endfor %}

This would then generate:

#!/usr/bin/env bash

python output-dir/john-v1.py
python output-dir/john-v2.py
python output-dir/lisa-v1.py
python output-dir/lisa-v2.py

With this generated bash script, you could run all configuration sets from a single command line.
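
The aggregation step can be reproduced with the jinja2 package as a rough sketch. With Jinja2's default settings, the newline following each block tag is kept, so the loop would leave blank lines between the commands; the sketch enables trim_blocks to obtain the compact output shown above. Whether jgen configures Jinja2 this way is an assumption.

```python
from jinja2 import Environment

# trim_blocks removes the newline after each {% ... %} tag so the loop
# does not leave blank lines in the output (an assumption about the
# desired formatting, not necessarily what jgen configures)
env = Environment(trim_blocks=True)

aggregation = env.from_string(
    "#!/usr/bin/env bash\n"
    "\n"
    "{% for k in cfgset %}\n"
    "python output-dir/{{ k.name }}-{{ k.version }}.py\n"
    "{% endfor %}\n"
)

# Configuration sets, as they would result from the YAML example
cfgset = [
    {"name": "john", "version": "v1"},
    {"name": "john", "version": "v2"},
    {"name": "lisa", "version": "v1"},
    {"name": "lisa", "version": "v2"},
]

script = aggregation.render(cfgset=cfgset)
print(script)
```

Note that k.name works even though each configuration set is a dictionary: Jinja2 falls back to item lookup when attribute lookup fails.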

The final command line for jgen, including both the generation of the individual configuration files and the aggregation, would look like this:

$ jgen variables.yaml template.py 'output-dir/{{ name }}-{{ version }}.py' run.sh 'output-dir/run.sh'

Automatic Injection of Variables

Sometimes you want to use user-specific variables in your Jinja2 templates, for example a temporary directory that differs from one user to another. To allow this, jgen automatically injects bob.extension.rc (see Global Configuration System) into your variables. You can then access its values in your templates as rc.variable_name.
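
As an illustration of how such an injected variable can be used, the sketch below renders a template line with the jinja2 package. The rc object here is a plain dictionary standing in for bob.extension.rc, and the key temp_dir and its value are made up for the example; the assumption is only that rc behaves like a mapping.

```python
from jinja2 import Template

# Hypothetical stand-in for bob.extension.rc; the key name and path
# are invented for this example
rc = {"temp_dir": "/scratch/alice/tmp"}

template = Template(
    "output = '{{ rc.temp_dir }}/{{ name }}-{{ version }}.hdf5'"
)

# rc is passed next to the variables of one configuration set
line = template.render(rc=rc, name="john", version="v1")
print(line)
```

Because the user-specific path comes from rc, the same template and YAML file can be shared between users without modification.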