Inference and Evaluation

This guide explains how to run inference or a complete evaluation using command-line tools. Inference produces probability maps for input images, while evaluation analyzes such output against existing annotations and produces performance figures.


You may use one of your trained models (or one of ours) to run inference on existing datasets or your own dataset. In inference (or prediction) mode, we take input data and a trained model, and output HDF5 files containing the predictions for every input image. Each HDF5 file contains a single object with a 2-dimensional matrix of floating point numbers indicating the vessel probability ([0.0,1.0]) for each pixel in the input image.
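As a rough illustration of how such a file might be inspected, the sketch below uses h5py and NumPy to load the single object from one prediction file. Both libraries are an assumption, since this guide does not prescribe a reader. The helper name load_probability_map is hypothetical, and the name of the object inside the file is discovered at runtime rather than assumed.

```python
import h5py
import numpy as np


def load_probability_map(path):
    """Return the single 2D array stored in an HDF5 prediction file.

    Hypothetical helper: the object name is read from the file, which is
    expected to hold exactly one dataset.
    """
    with h5py.File(path, "r") as f:
        names = list(f.keys())
        if len(names) != 1:
            raise ValueError(f"expected one object, found {len(names)}")
        obj = f[names[0]]
        if not isinstance(obj, h5py.Dataset):
            raise ValueError(f"object {names[0]!r} is not a dataset")
        data = obj[()]  # read the whole dataset into memory
    data = np.asarray(data, dtype=float)
    if data.ndim != 2:
        raise ValueError(f"expected a 2D matrix, got shape {data.shape}")
    return data
```

Given a map loaded this way, an expression such as pmap >= 0.5 would yield a boolean vessel mask; the 0.5 threshold is an arbitrary choice for illustration.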

Inference on an existing dataset

Use the sub-command predict to run prediction on an existing dataset:

$ bob binseg predict -vv <model> -w <path/to/model.pth> <dataset>

Replace <model> and <dataset> with the appropriate configuration files. Replace <path/to/model.pth> with a path leading to the pre-trained model, or a URL pointing to a pre-trained model (e.g. one of ours).

Inference on a custom dataset

If you would like to test your own data against one of the pre-trained models, you need to instantiate a CSV-based configuration. Read the appropriate module documentation for details.

$ bob binseg config copy csv-dataset-example
# edit to your liking
$ bob binseg predict -vv <model> -w <path/to/model.pth> ./

Inference typically consumes fewer resources than training, but you may speed things up using --device='cuda:0' if you have a GPU.


In evaluation, we input an annotated dataset and predictions to generate performance summaries that help in analyzing a trained model. Evaluation is done using the evaluate command, followed by the model and the annotated dataset configuration, and the path to the pretrained weights via the --weight argument.

Use bob binseg evaluate --help for more information.
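To give a sense of the kind of performance figures meant here, the following sketch thresholds a probability map and compares it with a binary annotation to obtain precision, recall and F1-score. It is only an illustration in plain Python: the function name, the 0.5 threshold and the choice of metrics are assumptions, not the implementation behind the evaluate command.

```python
def binary_metrics(probabilities, annotation, threshold=0.5):
    """Precision, recall and F1 for one image (illustrative only).

    probabilities: 2D list of floats in [0.0, 1.0]
    annotation: 2D list of 0/1 ground-truth labels, same shape
    """
    tp = fp = fn = 0
    for prob_row, annot_row in zip(probabilities, annotation):
        for prob, label in zip(prob_row, annot_row):
            predicted = prob >= threshold
            if predicted and label:
                tp += 1  # vessel pixel correctly detected
            elif predicted and not label:
                fp += 1  # background pixel marked as vessel
            elif not predicted and label:
                fn += 1  # vessel pixel missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy 2x2 example with made-up values
probs = [[0.9, 0.2], [0.6, 0.4]]
truth = [[1, 0], [0, 1]]
print(binary_metrics(probs, truth))
```

Sweeping the threshold over several values and recomputing these numbers would show how the operating point trades precision against recall.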

For example, to run evaluation on predictions from the DRIVE test set, do the following:

# Point to the predictions via -p and to the output folder via -o:
$ bob binseg evaluate -vv drive-test -p /predictions/folder -o /eval/results/folder

If available, you may use the option --second-annotator to

Comparing Systems

To compare multiple systems together and generate combined plots and tables, use the compare command. Use --help for a quick guide.

$ bob binseg compare -vv A A/metrics.csv B B/metrics.csv