Command-line Interface#

This section contains an overview of command-line applications shipped with this package.

binseg#

Binary Segmentation Benchmark.

binseg [OPTIONS] COMMAND [ARGS]...

analyze#

Runs a complete evaluation from prediction to comparison.

This script is just a wrapper around the individual scripts for running prediction and evaluating FCN models. It organises the output in a preset way:


└─ <output-folder>/
    ├── predictions/  #the prediction outputs for the train/test set
    ├── overlayed/  #the overlayed outputs for the train/test set
    │   ├── predictions/  #predictions overlayed on the input images
    │   ├── analysis/  #predictions overlayed on the input images,
    │   │              #including analysis of false positives, negatives
    │   │              #and true positives
    │   └── second-annotator/  #if set, store overlayed images for the
    │                          #second annotator here
    └── analysis/  #the outputs of the analysis of both train/test sets;
                   #includes second-annotator "measures" as well, if
                   #configured

N.B.: The tool is designed to prevent analysis bias and allows one to provide separate subsets for training and evaluation. Instead of using simple datasets, datasets for full experiment running should be dictionaries with specific subset names:

  • __train__: dataset used for training, prioritarily. It is typically the dataset containing data augmentation pipelines.

  • train (optional): a copy of the __train__ dataset, without data augmentation, that will be evaluated alongside other sets available

  • *: any other name, not starting with an underscore character (_), will be considered a test set for evaluation.

N.B.2: The threshold used for calculating the F1-score on the test set, or overlay analysis (false positives, negatives and true positives overprinted on the original image) also follows the logic above.

It is possible to pass one or several Python files (or names of deepdraw.config entry points or module names) as CONFIG arguments to the command line which contain the parameters listed below as Python variables. The options through the command-line (see below) will override the values of configuration files. You can run this command with <COMMAND> -H example_config.py to create a template config file.
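
For illustration, a minimal CONFIG file for this command might look like the sketch below. The variable names mirror the command-line options listed further down; the toy model and datasets are placeholders only (a real configuration would use the stock model/dataset resources and must produce samples in the format the package expects):

# my_analyze_config.py -- illustrative sketch only
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset

# any torch.nn.Module can be assigned here; real setups use a stock model config
model = nn.Sequential(nn.Conv2d(3, 1, kernel_size=1), nn.Sigmoid())

def _toy_set(n):
    # pairs of (image, binary ground-truth) tensors, for illustration only
    return TensorDataset(torch.rand(n, 3, 64, 64),
                         (torch.rand(n, 1, 64, 64) > 0.5).float())

# keys follow the naming convention described above
dataset = {
    "__train__": _toy_set(8),  # used for training (normally with augmentation)
    "train": _toy_set(8),      # un-augmented copy, evaluated with the other sets
    "test": _toy_set(4),       # any key not starting with "_" is a test set
}

weight = "model.pth"  # path to the pre-trained model file to analyze
device = "cpu"
batch_size = 1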

binseg analyze [OPTIONS] [CONFIG]...

Options

-o, --output-folder <output_folder>#

Required Path where to store experiment outputs (created if it does not exist)

-m, --model <model>#

Required A torch.nn.Module instance implementing the network to be trained, and then evaluated

-d, --dataset <dataset>#

Required A dictionary mapping string keys to deepdraw.data.utils.SampleList2TorchDataset instances. At least one key named 'train' must be available. This dataset will be used for training the network model. All other datasets will be used for prediction and evaluation. Dataset descriptions include all required pre-processing, including eventual data augmentation, which may be excluded for prediction and evaluation purposes

-S, --second-annotator <second_annotator>#

A dataset or dictionary, like in --dataset, with the same sample keys, but with annotations from a different annotator that is going to be compared to the one in --dataset

-b, --batch-size <batch_size>#

Required Number of samples in every batch (this parameter affects memory requirements for the network). If the number of samples in the batch is larger than the total number of samples available for training, this value is truncated. If this number is smaller, then batches of the specified size are created and fed to the network until there are no more new samples to feed (epoch is finished). If the total number of training samples is not a multiple of the batch-size, the last batch will be smaller than the first.

Default:

1

-d, --device <device>#

Required A string indicating the device to use (e.g. “cpu” or “cuda:0”)

Default:

cpu

-O, --overlayed, --no-overlayed#

Creates overlayed representations of the output probability maps, similar to --overlayed in prediction-mode, except it includes distinctive colours for true and false positives and false negatives. If not set, or empty, then do NOT output overlayed images.

Default:

False

-w, --weight <weight>#

Required Path or URL to pretrained model file (.pth extension)

-S, --steps <steps>#

Required This number is used to define the number of threshold steps to consider when evaluating the highest possible F1-score on test data.

Default:

1000

-P, --parallel <parallel>#

Required Use multiprocessing for data processing: if set to -1 (default), disables multiprocessing. Set to 0 to enable as many data loading instances as there are processing cores available in the system. Set to >= 1 to enable that many multiprocessing instances for data processing.

Default:

-1

-L, --plot-limits <plot_limits>#

If set, this option affects the performance comparison plots. It must be a 4-tuple containing the bounds of the plot for the x and y axis respectively (format: [x_low, x_high, y_low, y_high]). If not set, use normal bounds ([0, 1, 0, 1]) for the performance curve.

Default:

0.0, 1.0, 0.0, 1.0

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

-H, --dump-config <dump_config>#

Name of the config file to be generated

Arguments

CONFIG#

Optional argument(s)

Examples:

1. Re-evaluates a pre-trained M2U-Net model with DRIVE (vessel
segmentation), on the CPU, by running inference and evaluation on results
from its test set:
$ deepdraw analyze -vv m2unet drive --weight=model.pth

compare#

binseg compare [OPTIONS] [LABEL_PATH]...

Options

-f, --output-figure <output_figure>#

Path where to write the output figure (any extension supported by matplotlib is possible). If not provided, does not produce a figure.

-T, --table-format <table_format>#

Required The format to use for the comparison table

Default:

rst

Options:

asciidoc | double_grid | double_outline | fancy_grid | fancy_outline | github | grid | heavy_grid | heavy_outline | html | jira | latex | latex_booktabs | latex_longtable | latex_raw | mediawiki | mixed_grid | mixed_outline | moinmoin | orgtbl | outline | pipe | plain | presto | pretty | psql | rounded_grid | rounded_outline | rst | simple | simple_grid | simple_outline | textile | tsv | unsafehtml | youtrack

-u, --output-table <output_table>#

Path where to write the output table. If not provided, the table is not written to a file, only printed to stdout.

-t, --threshold <threshold>#

This number is used to select which F1-score to use for representing a system performance. If not set, we report the maximum F1-score in the set, which is equivalent to threshold selection a posteriori (biased estimator), unless the performance file being considered was already pre-tuned and contains a 'threshold_a_priori' column, which we then use to pick a threshold for the dataset. You can override this behaviour by either setting this value to a floating-point number in the range [0.0, 1.0], or to a string naming one of the systems, which will be used to calculate the threshold leading to the maximum F1-score that is then applied to all other sets.
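
As a sketch of the logic above (with made-up F1-score curves, not values produced by this package), choosing the threshold on one system and applying it to the other avoids the a posteriori bias:

# a priori vs. a posteriori threshold choice (hypothetical numbers)
import numpy as np

thresholds = np.linspace(0.0, 1.0, 11)
f1_a = np.array([0.10, 0.30, 0.50, 0.62, 0.70, 0.74, 0.71, 0.60, 0.40, 0.20, 0.00])
f1_b = np.array([0.20, 0.40, 0.55, 0.66, 0.72, 0.70, 0.65, 0.50, 0.30, 0.10, 0.00])

# a posteriori (biased): each system reports its own maximum F1-score
print(f1_a.max(), f1_b.max())

# a priori: tune the threshold on system "a", then apply it to system "b"
t = thresholds[f1_a.argmax()]
print(t, f1_b[np.argmin(np.abs(thresholds - t))])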

-L, --plot-limits <plot_limits>#

If set, must be a 4-tuple containing the bounds of the plot for the x and y axis respectively (format: [x_low, x_high, y_low, y_high]). If not set, use normal bounds ([0, 1, 0, 1]) for the performance curve.

Default:

0.0, 1.0, 0.0, 1.0

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

Arguments

LABEL_PATH#

Optional argument(s)

Examples:

1. Compares system A and B, with their own pre-computed measure files:
$ deepdraw compare -vv A path/to/A/train.csv B path/to/B/test.csv

config#

Commands for listing, describing and copying configuration resources.

binseg config [OPTIONS] COMMAND [ARGS]...

copy#

Copy a specific configuration resource so it can be modified locally.

binseg config copy [OPTIONS] SOURCE DESTINATION

Options

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

Arguments

SOURCE#

Required argument

DESTINATION#

Required argument

Examples:

1. Makes a copy of one of the stock configuration files locally, so it can be
adapted:
$ deepdraw config copy montgomery -vvv newdataset.py

describe#

Describes a specific configuration file.

binseg config describe [OPTIONS] NAME...

Options

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

Arguments

NAME#

Required argument(s)

Examples:

1. Describes the Montgomery dataset configuration:
deepdraw config describe montgomery
2. Describes the Montgomery dataset configuration and lists its
contents:
deepdraw config describe montgomery -v

list#

Lists configuration files installed.

binseg config list [OPTIONS]

Options

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

Examples:

1. Lists all configuration resources (type: deepdraw.config) installed:
deepdraw config list
2. Lists all configuration resources and their descriptions (notice this may
be slow as it needs to load all modules once):
deepdraw config list -v

dataset#

Commands for listing and verifying datasets.

binseg dataset [OPTIONS] COMMAND [ARGS]...

check#

Checks file access on one or more datasets.

binseg dataset check [OPTIONS] [DATASET]...

Options

-l, --limit <limit>#

Required Limit the check to the first N samples in each dataset, making the check considerably faster. Set it to zero to check everything.

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

Arguments

DATASET#

Optional argument(s)

Examples:

1. Check if all files of the Montgomery dataset can be loaded:
deepdraw dataset check -vv montgomery
2. Check if all files of multiple installed datasets can be loaded:
deepdraw dataset check -vv montgomery shenzhen
3. Check if all files of all installed datasets can be loaded:
deepdraw dataset check

list#

Lists all supported and configured datasets.

binseg dataset list [OPTIONS]

Options

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

Examples:

1. To install a dataset, set up its data directory (“datadir”). For
example, to set up access to Montgomery files you downloaded locally at
the directory “/path/to/montgomery/files”, edit the RC file (typically
$HOME/.config/deepdraw.toml), and add a line like the following:
[datadir]
montgomery = "/path/to/montgomery/files"

Note

This setting is case-sensitive.

2. List all raw datasets supported (and configured):
$ deepdraw dataset list

evaluate#

It is possible to pass one or several Python files (or names of deepdraw.config entry points or module names) as CONFIG arguments to the command line which contain the parameters listed below as Python variables. The options through the command-line (see below) will override the values of configuration files. You can run this command with <COMMAND> -H example_config.py to create a template config file.

binseg evaluate [OPTIONS] [CONFIG]...

Options

-o, --output-folder <output_folder>#

Required Path where to store the analysis result (created if it does not exist)

-p, --predictions-folder <predictions_folder>#

Required Path where predictions are currently stored

-d, --dataset <dataset>#

Required A torch.utils.data.dataset.Dataset instance implementing a dataset to be used for evaluation purposes, possibly including all pre-processing pipelines required or, optionally, a dictionary mapping string keys to torch.utils.data.dataset.Dataset instances. All keys that do not start with an underscore (_) will be processed.

-S, --second-annotator <second_annotator>#

A dataset or dictionary, like in --dataset, with the same sample keys, but with annotations from a different annotator that is going to be compared to the one in --dataset. The same rules regarding dataset naming conventions apply

-O, --overlayed <overlayed>#

Creates overlayed representations of the output probability maps, similar to --overlayed in prediction-mode, except it includes distinctive colours for true and false positives and false negatives. If not set, or empty, then do NOT output overlayed images. Otherwise, the parameter represents the name of a folder in which to store them

-t, --threshold <threshold>#

This number is used to define positives and negatives from probability maps, and report F1-scores (a priori). It should either come from the training set or a separate validation set to avoid biasing the analysis. Optionally, if you provide a multi-set dataset as input, this may also be the name of an existing set from which the threshold will be estimated (highest F1-score) and then applied to the subsequent sets. This number is also used to print the test set F1-score a priori performance

-S, --steps <steps>#

Required This number is used to define the number of threshold steps to consider when evaluating the highest possible F1-score on test data.

Default:

1000

-P, --parallel <parallel>#

Required Use multiprocessing for data processing: if set to -1 (default), disables multiprocessing. Set to 0 to enable as many data loading instances as there are processing cores available in the system. Set to >= 1 to enable that many multiprocessing instances for data processing.

Default:

-1

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

-H, --dump-config <dump_config>#

Name of the config file to be generated

Arguments

CONFIG#

Optional argument(s)

Examples:

1. Runs evaluation on an existing dataset configuration:
$ deepdraw evaluate -vv drive --predictions-folder=path/to/predictions --output-folder=path/to/results
2. To run evaluation on a folder with your own images and annotations, you
must first specify resizing, cropping, etc, so that the image can be
correctly input to the model. Failing to do so will likely result in
poor performance. To figure out such specifications, you must consult
the dataset configuration used for training the provided model.
Once you figured this out, do the following:
$ deepdraw config copy csv-dataset-example mydataset.py
# modify "mydataset.py" to your liking
$ deepdraw evaluate -vv mydataset.py --predictions-folder=path/to/predictions --output-folder=path/to/results

experiment#

Runs a complete experiment, from training, to prediction and evaluation.

This script is just a wrapper around the individual scripts for training, running prediction, evaluating and comparing FCN model performance. It organises the output in a preset way:

 
└─ <output-folder>/
   ├── model/  #the generated model will be here
   ├── predictions/  #the prediction outputs for the train/test set
   ├── overlayed/  #the overlayed outputs for the train/test set
   │   ├── predictions/  #predictions overlayed on the input images
   │   ├── analysis/  #predictions overlayed on the input images,
   │   │              #including analysis of false positives, negatives
   │   │              #and true positives
   │   └── second-annotator/  #if set, store overlayed images for the
   │                          #second annotator here
   └── analysis/  #the outputs of the analysis of both train/test sets;
                  #includes second-annotator "measures" as well, if
                  #configured

Training is performed for a configurable number of epochs, and generates at least a final_model.pth. It may also generate a number of intermediate checkpoints. Checkpoints are model files (.pth files) that are stored during training and are useful for resuming the procedure in case it stops abruptly.

N.B.: The tool is designed to prevent analysis bias and allows one to provide (potentially multiple) separate subsets for training, validation, and evaluation. Instead of using simple datasets, datasets for full experiment running should be dictionaries with specific subset names:

  • __train__: dataset used for training, prioritarily. It is typically the dataset containing data augmentation pipelines.

  • __valid__: dataset used for validation. It is typically disjoint from the training and test sets. In such a case, we checkpoint the model with the lowest loss on the validation set as well, throughout all the training, besides the model at the end of training.

  • train (optional): a copy of the __train__ dataset, without data augmentation, that will be evaluated alongside other sets available

  • __valid_extra__: a list of datasets that are tracked during validation, but do not affect checkpointing. If present, an extra column with an array containing the loss of each set is kept on the training log.

  • *: any other name, not starting with an underscore character (_), will be considered a test set for evaluation.

N.B.2: The threshold used for calculating the F1-score on the test set, or overlay analysis (false positives, negatives and true positives overprinted on the original image) also follows the logic above.

It is possible to pass one or several Python files (or names of deepdraw.config entry points or module names) as CONFIG arguments to the command line which contain the parameters listed below as Python variables. The options through the command-line (see below) will override the values of configuration files. You can run this command with <COMMAND> -H example_config.py to create a template config file.
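
For illustration, a CONFIG file for a full experiment could also define the training objects required below. The sketch uses plain PyTorch components and toy datasets; variable names mirror the options listed further down, and a real configuration would rely on the stock resources instead:

# my_experiment_config.py -- illustrative sketch only
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset

model = nn.Sequential(nn.Conv2d(3, 1, kernel_size=1), nn.Sigmoid())

def _toy_set(n):
    return TensorDataset(torch.rand(n, 3, 64, 64),
                         (torch.rand(n, 1, 64, 64) > 0.5).float())

dataset = {
    "__train__": _toy_set(16),  # training data (normally with augmentation)
    "__valid__": _toy_set(4),   # drives check-pointing of the lowest-loss model
    "train": _toy_set(16),      # evaluated alongside the test sets
    "test": _toy_set(4),        # any key not starting with "_" is a test set
}

# plain PyTorch training objects, as required by the options below
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCELoss()
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

epochs = 2
batch_size = 2
device = "cpu"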

binseg experiment [OPTIONS] [CONFIG]...

Options

-o, --output-folder <output_folder>#

Required Path where to store experiment outputs (created if it does not exist)

-m, --model <model>#

Required A torch.nn.Module instance implementing the network to be trained, and then evaluated

-d, --dataset <dataset>#

Required A dictionary mapping string keys to torch.utils.data.dataset.Dataset instances implementing the datasets to be used for training and validating the model, including all required pre-processing pipelines and eventual data augmentation. At least one key named train must be available; this dataset will be used for training the network model. If a dataset named __train__ is available, it is used for training instead of train. If a dataset named __valid__ is available, it is used for model validation (and automatic check-pointing) at each epoch. If a dataset list named __valid_extra__ is available, it is tracked during validation and its loss is also output to the training log, as an array occupying a single column. All other keys are considered test datasets and are only used during analysis, to report the final system performance

-S, --second-annotator <second_annotator>#

A dataset or dictionary, like in --dataset, with the same sample keys, but with annotations from a different annotator that is going to be compared to the one in --dataset

--optimizer <optimizer>#

Required A torch.optim.Optimizer that will be used to train the network

--criterion <criterion>#

Required A loss function to compute the FCN error for every sample respecting the PyTorch API for loss functions (see torch.nn.modules.loss)

--scheduler <scheduler>#

Required A learning rate scheduler that drives changes in the learning rate depending on the FCN state (see torch.optim.lr_scheduler)

-b, --batch-size <batch_size>#

Required Number of samples in every batch (this parameter affects memory requirements for the network). If the number of samples in the batch is larger than the total number of samples available for training, this value is truncated. If this number is smaller, then batches of the specified size are created and fed to the network until there are no more new samples to feed (epoch is finished). If the total number of training samples is not a multiple of the batch-size, the last batch will be smaller than the first, unless --drop-incomplete-batch is set, in which case this batch is not used.

Default:

2

-c, --batch-chunk-count <batch_chunk_count>#

Required Number of chunks in every batch (this parameter affects memory requirements for the network). The number of samples loaded for every iteration will be batch-size/batch-chunk-count. batch-size needs to be divisible by batch-chunk-count, otherwise an error will be raised. This parameter is used to reduce the number of samples loaded in each iteration, in order to reduce the memory usage in exchange for processing time (more iterations). This is especially interesting when one is running on GPUs with limited RAM. The default of 1 forces the whole batch to be processed at once. Otherwise the batch is broken into batch-chunk-count pieces, and gradients are accumulated to complete each batch.

Default:

1
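
In other words, every batch of batch-size samples is assembled from batch-chunk-count smaller forward/backward passes whose gradients are accumulated before a single optimizer step. A generic PyTorch sketch of the technique (not this package's internal code) follows:

# gradient accumulation over batch chunks (generic sketch)
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

batch_size, batch_chunk_count = 8, 4
chunk_size = batch_size // batch_chunk_count  # samples loaded per iteration

x, y = torch.rand(batch_size, 10), torch.rand(batch_size, 1)

optimizer.zero_grad()
for c in range(batch_chunk_count):
    xc = x[c * chunk_size:(c + 1) * chunk_size]
    yc = y[c * chunk_size:(c + 1) * chunk_size]
    loss = criterion(model(xc), yc) / batch_chunk_count  # keep the batch average comparable
    loss.backward()  # gradients accumulate across chunks
optimizer.step()  # one parameter update per complete batch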

-D, --drop-incomplete-batch, --no-drop-incomplete-batch#

Required If set, the last batch in an epoch may be dropped in case it is incomplete. If you set this option, you should also consider increasing the total number of epochs of training, as the total number of training steps may be reduced

Default:

False

-e, --epochs <epochs>#

Required Number of epochs (complete training set passes) to train for. If continuing from a saved checkpoint, make sure to provide a number of epochs greater than the one saved in the checkpoint to be loaded.

Default:

1000

-p, --checkpoint-period <checkpoint_period>#

Required Number of epochs after which a checkpoint is saved. A value of zero will disable check-pointing. If checkpointing is enabled and training stops, it is automatically resumed from the last saved checkpoint if training is restarted with the same configuration.

Default:

0

-d, --device <device>#

Required A string indicating the device to use (e.g. “cpu” or “cuda:0”)

Default:

cpu

-s, --seed <seed>#

Seed to use for the random number generator

Default:

42

-P, --parallel <parallel>#

Required Use multiprocessing for data loading and processing: if set to -1 (default), disables multiprocessing altogether. Set to 0 to enable as many data loading instances as there are processing cores available in the system. Set to >= 1 to enable that many multiprocessing instances for data processing.

Default:

-1

-I, --monitoring-interval <monitoring_interval>#

Required Time between checks for the use of resources during each training epoch. An interval of 5 seconds, for example, will lead to CPU and GPU resources being probed every 5 seconds during each training epoch. Values registered in the training logs correspond to averages (or maxima) observed through possibly many probes in each epoch. Notice that setting a very small value may cause the probing process to become extremely busy, potentially biasing the overall perception of resource usage.

Default:

5.0

-O, --overlayed, --no-overlayed#

Creates overlayed representations of the output probability maps, similar to --overlayed in prediction-mode, except it includes distinctive colours for true and false positives and false negatives. If not set, or empty, then do NOT output overlayed images.

Default:

False

-S, --steps <steps>#

Required This number is used to define the number of threshold steps to consider when evaluating the highest possible F1-score on test data.

Default:

1000

-L, --plot-limits <plot_limits>#

If set, this option affects the performance comparison plots. It must be a 4-tuple containing the bounds of the plot for the x and y axis respectively (format: [x_low, x_high, y_low, y_high]). If not set, use normal bounds ([0, 1, 0, 1]) for the performance curve.

Default:

0.0, 1.0, 0.0, 1.0

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

-H, --dump-config <dump_config>#

Name of the config file to be generated

Arguments

CONFIG#

Optional argument(s)

Examples:

1. Trains an M2U-Net model (VGG-16 backbone) with DRIVE (vessel
segmentation), on the CPU, for only two epochs, then runs inference and
evaluation on stock datasets, report performance as a table and a figure:
$ deepdraw experiment -vv m2unet drive --epochs=2

mkmask#

Commands for generating masks for images in a dataset.

It is possible to pass one or several Python files (or module names) as CONFIG arguments to the command line which contain the parameters listed below as Python variables. The options through the command-line (see below) will override the values of configuration files. You can run this command with <COMMAND> -H example_config.py to create a template config file.

binseg mkmask [OPTIONS] [CONFIG]...

Options

-o, --output-folder <output_folder>#

Required Path where to store the generated masks (created if it does not exist)

-d, --dataset <dataset>#

Required The base path to the dataset for which we want to generate the masks. In case you have already configured the path for the datasets supported by deepdraw, you can just use the name of the dataset as written in the config.

-g, --globs <globs>#

Required The glob expression(s) selecting, inside the dataset path, the images for which masks should be generated (e.g. --globs="images/*.jpg"). This option can be passed multiple times.

-t, --threshold <threshold>#

Required The threshold that must be fixed in order to binarize the image when generating the mask

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

-H, --dump-config <dump_config>#

Name of the config file to be generated

Arguments

CONFIG#

Optional argument(s)

Examples:

1. Generates masks for a dataset supported by deepdraw (e.g. refuge):
$ deepdraw mkmask --dataset="refuge" --globs="Training400/*Glaucoma/*.jpg" --globs="Training400/*AMD/*.jpg" --threshold=5

Or you can generate the same results with this command

$ deepdraw mkmask -d "refuge" -g "Training400/*Glaucoma/*.jpg" -g "Training400/*AMD/*.jpg" -t 5
2. Generates masks for a dataset not supported by deepdraw:
$ deepdraw mkmask -d "Path/to/dataset" -g "glob1" -g "glob2" -g glob3  -t 4
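
Conceptually, the threshold binarizes each image: pixels above the value become foreground in the mask. A small sketch of that idea, using Pillow and numpy on a hypothetical input file (not the exact processing the tool performs), is:

# conceptual thresholding of one image into a binary mask
import numpy as np
from PIL import Image

image = np.asarray(Image.open("some_image.jpg").convert("L"))  # grayscale
threshold = 5
mask = (image > threshold).astype(np.uint8) * 255  # binary mask
Image.fromarray(mask).save("some_image_mask.png")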

predict#

Predicts vessel map (probabilities) on input images.

It is possible to pass one or several Python files (or names of deepdraw.config entry points or module names) as CONFIG arguments to the command line which contain the parameters listed below as Python variables. The options through the command-line (see below) will override the values of configuration files. You can run this command with <COMMAND> -H example_config.py to create a template config file.

binseg predict [OPTIONS] [CONFIG]...

Options

-o, --output-folder <output_folder>#

Required Path where to store the predictions (created if it does not exist)

-m, --model <model>#

Required A torch.nn.Module instance implementing the network to be evaluated

-d, --dataset <dataset>#

Required A torch.utils.data.dataset.Dataset instance implementing a dataset to be used for running prediction, possibly including all pre-processing pipelines required or, optionally, a dictionary mapping string keys to torch.utils.data.dataset.Dataset instances. All keys that do not start with an underscore (_) will be processed.

-b, --batch-size <batch_size>#

Required Number of samples in every batch (this parameter affects memory requirements for the network)

Default:

1

-d, --device <device>#

Required A string indicating the device to use (e.g. “cpu” or “cuda:0”)

Default:

cpu

-w, --weight <weight>#

Required Path or URL to pretrained model file (.pth extension)

-O, --overlayed <overlayed>#

Creates overlayed representations of the output probability maps on top of input images (results are stored as PNG files). If not set, or empty, then do NOT output overlayed images. Otherwise, the parameter represents the name of a folder in which to store them

-P, --parallel <parallel>#

Required Use multiprocessing for data loading: if set to -1 (default), disables multiprocessing data loading. Set to 0 to enable as many data loading instances as there are processing cores available in the system. Set to >= 1 to enable that many multiprocessing instances for data loading.

Default:

-1

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

-H, --dump-config <dump_config>#

Name of the config file to be generated

Arguments

CONFIG#

Optional argument(s)

Examples:

1. Runs prediction on an existing dataset configuration:
$ deepdraw predict -vv m2unet drive --weight=path/to/model_final_epoch.pth --output-folder=path/to/predictions
2. To run prediction on a folder with your own images, you must first
specify resizing, cropping, etc, so that the image can be correctly
input to the model. Failing to do so will likely result in poor
performance. To figure out such specifications, you must consult the
dataset configuration used for training the provided model. Once
you figured this out, do the following:
$ deepdraw config copy csv-dataset-example mydataset.py
# modify "mydataset.py" to include the base path and required transforms
$ deepdraw predict -vv m2unet mydataset.py --weight=path/to/model_final_epoch.pth --output-folder=path/to/predictions

significance#

Evaluates how significantly different two models are on the same dataset.

This application calculates the significance of results of two models operating on the same dataset, and subject to a priori threshold tuning.

It is possible to pass one or several Python files (or names of deepdraw.config entry points or module names) as CONFIG arguments to the command line which contain the parameters listed below as Python variables. The options through the command-line (see below) will override the values of configuration files. You can run this command with <COMMAND> -H example_config.py to create a template config file.

binseg significance [OPTIONS] [CONFIG]...

Options

-n, --names <names>#

Required Names of the two systems to compare

-p, --predictions <predictions>#

Required Paths where predictions of systems 1 and 2 are currently stored. You may also input predictions from a second-annotator. This application will adequately handle it.

-d, --dataset <dataset>#

Required A dictionary mapping string keys to torch.utils.data.dataset.Dataset instances

-t, --threshold <threshold>#

Required This number is used to define positives and negatives from probability maps, and report F1-scores (a priori). By default, we expect a set named 'validation' to be available at the input data. If that is not the case, we use 'train', if available. You may provide the name of another dataset to be used for threshold tuning otherwise. If not set, or a string is input, threshold tuning is done per system, individually. Optionally, you may also provide a floating-point number between [0.0, 1.0] as the threshold to use for both systems.

Default:

validation

-e, --evaluate <evaluate>#

Required Name of the dataset to evaluate

Default:

test

-S, --steps <steps>#

Required This number is used to define the number of threshold steps to consider when evaluating the highest possible F1-score on train/test data.

Default:

1000

-s, --size <size>#

Required This is a tuple with two values indicating the size of windows to be used for sliding window analysis. The values represent height and width respectively.

Default:

128, 128

-t, --stride <stride>#

Required This is a tuple with two values indicating the stride of windows to be used for sliding window analysis. The values represent height and width respectively.

Default:

32, 32

-f, --figure <figure>#

Required The name of a performance figure (e.g. f1_score, or jaccard) to use when comparing performances

Default:

accuracy

-o, --output-folder <output_folder>#

Path where to store visualizations

-R, --remove-outliers, --no-remove-outliers#

Required If set, removes outliers from both score distributions before running statistical analysis. Outlier removal follows a 1.5 IQR range check from the difference in figures between both systems and assumes most of the distribution is contained within that range (like in a normal distribution)

Default:

False
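
The 1.5 IQR check can be sketched as follows on hypothetical per-window scores (not this package's internal code): differences between the two systems that fall outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are discarded before the statistical analysis.

# 1.5 IQR outlier removal on per-window score differences (sketch)
import numpy as np

rng = np.random.default_rng(0)
scores_1 = rng.random(1000)  # hypothetical sliding-window figures, system 1
scores_2 = rng.random(1000)  # hypothetical sliding-window figures, system 2
diff = scores_1 - scores_2

q1, q3 = np.percentile(diff, [25, 75])
iqr = q3 - q1
keep = (diff >= q1 - 1.5 * iqr) & (diff <= q3 + 1.5 * iqr)

scores_1, scores_2 = scores_1[keep], scores_2[keep]  # outliers removed from both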

-R, --remove-zeros, --no-remove-zeros#

Required If set, removes instances from the statistical analysis in which both systems had a performance equal to zero.

Default:

False

-x, --parallel <parallel>#

Required Set the number of parallel processes to use when running using multiprocessing. A value of zero uses all reported cores.

Default:

1

-k, --checkpoint-folder <checkpoint_folder>#

Path where to store checkpointed versions of sliding window performances

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

-H, --dump-config <dump_config>#

Name of the config file to be generated

Arguments

CONFIG#

Optional argument(s)

Examples:

1. Runs a significance test using as base the calculated predictions of two
different systems, on the same dataset:
$ deepdraw significance -vv drive --names system1 system2 --predictions=path/to/predictions/system-1 path/to/predictions/system-2
2. By default, we use a “validation” dataset if it is available, to infer
the a priori threshold for the comparison of two systems. Otherwise,
you may need to specify the name of a set to be used as validation set
for choosing a threshold. The same goes for the set to be used for
testing the hypothesis - by default we use the “test” dataset if it is
available, otherwise, specify.
$ deepdraw significance -vv drive --names system1 system2 --predictions=path/to/predictions/system-1 path/to/predictions/system-2 --threshold=train --evaluate=alternate-test

train#

Trains an FCN to perform binary segmentation.

Training is performed for a configurable number of epochs, and generates at least a final_model.pth. It may also generate a number of intermediate checkpoints. Checkpoints are model files (.pth files) that are stored during training and are useful for resuming the procedure in case it stops abruptly.

Tip: In case the model has been trained over a number of epochs, it is possible to continue training, by simply relaunching the same command, and changing the number of epochs to a number greater than the number where the original training session stopped (or the last checkpoint was saved).

It is possible to pass one or several Python files (or names of deepdraw.config entry points or module names) as CONFIG arguments to the command line which contain the parameters listed below as Python variables. The options through the command-line (see below) will override the values of configuration files. You can run this command with <COMMAND> -H example_config.py to create a template config file.

binseg train [OPTIONS] [CONFIG]...

Options

-o, --output-folder <output_folder>#

Required Path where to store the generated model (created if it does not exist)

-m, --model <model>#

Required A torch.nn.Module instance implementing the network to be trained

-d, --dataset <dataset>#

Required A dictionary mapping string keys to torch.utils.data.dataset.Dataset instances implementing the datasets to be used for training and validating the model, including all required pre-processing pipelines and eventual data augmentation. At least one key named train must be available; this dataset will be used for training the network model. If a dataset named __train__ is available, it is used for training instead of train. If a dataset named __valid__ is available, it is used for model validation (and automatic check-pointing) at each epoch. If a dataset list named __extra_valid__ is available, it is tracked during validation and its loss is also output to the training log, as an array occupying a single column. All other keys are considered test datasets and are ignored during training

--optimizer <optimizer>#

Required A torch.optim.Optimizer that will be used to train the network

--criterion <criterion>#

Required A loss function to compute the FCN error for every sample respecting the PyTorch API for loss functions (see torch.nn.modules.loss)

--scheduler <scheduler>#

Required A learning rate scheduler that drives changes in the learning rate depending on the FCN state (see torch.optim.lr_scheduler)

-b, --batch-size <batch_size>#

Required Number of samples in every batch (this parameter affects memory requirements for the network). If the number of samples in the batch is larger than the total number of samples available for training, this value is truncated. If this number is smaller, then batches of the specified size are created and fed to the network until there are no more new samples to feed (epoch is finished). If the total number of training samples is not a multiple of the batch-size, the last batch will be smaller than the first, unless --drop-incomplete-batch is set, in which case this batch is not used.

Default:

2

-c, --batch-chunk-count <batch_chunk_count>#

Required Number of chunks in every batch (this parameter affects memory requirements for the network). The number of samples loaded for every iteration will be batch-size/batch-chunk-count. batch-size needs to be divisible by batch-chunk-count, otherwise an error will be raised. This parameter is used to reduce the number of samples loaded in each iteration, in order to reduce the memory usage in exchange for processing time (more iterations). This is especially interesting when one is running on GPUs with limited RAM. The default of 1 forces the whole batch to be processed at once. Otherwise the batch is broken into batch-chunk-count pieces, and gradients are accumulated to complete each batch.

Default:

1

-D, --drop-incomplete-batch, --no-drop-incomplete-batch#

Required If set, the last batch in an epoch may be dropped in case it is incomplete. If you set this option, you should also consider increasing the total number of epochs of training, as the total number of training steps may be reduced

Default:

False

-e, --epochs <epochs>#

Required Number of epochs (complete training set passes) to train for. If continuing from a saved checkpoint, make sure to provide a number of epochs greater than the one saved in the checkpoint to be loaded.

Default:

1000

-p, --checkpoint-period <checkpoint_period>#

Required Number of epochs after which a checkpoint is saved. A value of zero will disable check-pointing. If checkpointing is enabled and training stops, it is automatically resumed from the last saved checkpoint if training is restarted with the same configuration.

Default:

0

-d, --device <device>#

Required A string indicating the device to use (e.g. “cpu” or “cuda:0”)

Default:

cpu

-s, --seed <seed>#

Seed to use for the random number generator

Default:

42

-P, --parallel <parallel>#

Required Use multiprocessing for data loading: if set to -1 (default), disables multiprocessing data loading. Set to 0 to enable as many data loading instances as there are processing cores available in the system. Set to >= 1 to enable that many multiprocessing instances for data loading.

Default:

-1

-I, --monitoring-interval <monitoring_interval>#

Required Time between checks for the use of resources during each training epoch. An interval of 5 seconds, for example, will lead to CPU and GPU resources being probed every 5 seconds during each training epoch. Values registered in the training logs correspond to averages (or maxima) observed through possibly many probes in each epoch. Notice that setting a very small value may cause the probing process to become extremely busy, potentially biasing the overall perception of resource usage.

Default:

5.0

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

-H, --dump-config <dump_config>#

Name of the config file to be generated

Arguments

CONFIG#

Optional argument(s)

Examples:

1. Trains a U-Net model (VGG-16 backbone) with DRIVE (vessel segmentation),
on a GPU (cuda:0):
$ deepdraw train -vv unet drive --batch-size=4 --device="cuda:0"
2. Trains a HED model with HRF on a GPU (cuda:0):
$ deepdraw train -vv hed hrf --batch-size=8 --device="cuda:0"
3. Trains a M2U-Net model on the COVD-DRIVE dataset on the CPU:
$ deepdraw train -vv m2unet covd-drive --batch-size=8

train-analysis#

Analyze the training logs for loss evolution and resource utilisation.

It is possible to pass one or several Python files (or names of deepdraw.config entry points or module names) as CONFIG arguments to the command line which contain the parameters listed below as Python variables. The options through the command-line (see below) will override the values of configuration files. You can run this command with <COMMAND> -H example_config.py to create a template config file.

binseg train-analysis [OPTIONS] LOG CONSTANTS [CONFIG]...

Options

-o, --output-pdf <output_pdf>#

Required Name of the output file to dump

Default:

trainlog.pdf

-v, --verbose#

Increase the verbosity level from 0 (only error and critical messages will be displayed) to 1 (like 0, but adds warnings), 2 (like 1, but adds info messages), and 3 (like 2, but also adds debugging messages) by adding the --verbose option as often as desired (e.g. '-vvv' for debug).

Default:

0

-H, --dump-config <dump_config>#

Name of the config file to be generated

Arguments

LOG#

Required argument

CONSTANTS#

Required argument

CONFIG#

Optional argument(s)

Examples:

1. Analyzes a training log and produces various plots:
$ deepdraw train-analysis -vv log.csv constants.csv