Command-Line Interface (CLI)¶
This package provides a single entry point for all of its applications using Bob’s unified CLI mechanism. A list of available applications can be retrieved using:
$ bob binseg --help
Usage: bob binseg [OPTIONS] COMMAND [ARGS]...
Binary 2D Image Segmentation Benchmark commands.
Options:
-h, -?, --help Show this message and exit.
Commands:
analyze Runs a complete evaluation from prediction to comparison
compare Compare multiple systems together.
config Command for listing, describing and copying...
dataset Commands for listing and verifying datasets
evaluate Evaluate an FCN on a binary segmentation task.
experiment Runs a complete experiment, from training, to...
mkmask Commands for generating masks for images in a dataset.
predict Predicts vessel map (probabilities) on input images.
significance Evaluates how significantly different are two models on...
train Trains an FCN to perform binary segmentation.
train-analysis Analyze the training logs for loss evolution and...
Setup¶
A CLI application to list and check installed (raw) datasets.
$ bob binseg dataset --help
Usage: bob binseg dataset [OPTIONS] COMMAND [ARGS]...
Commands for listing and verifying datasets
Options:
-h, -?, --help Show this message and exit.
Commands:
check Checks file access on one or more datasets
list Lists all supported and configured datasets
List available datasets¶
Lists supported and configured raw datasets.
$ bob binseg dataset list --help
Usage: bob binseg dataset list [OPTIONS]
Lists all supported and configured datasets
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-h, -?, --help Show this message and exit.
Examples:
1. To install a dataset, set up its data directory ("datadir"). For
example, to setup access to DRIVE files you downloaded locally at
the directory "/path/to/drive/files", do the following:
$ bob config set "bob.ip.binseg.drive.datadir" "/path/to/drive/files"
Notice this setting **is** case-sensitive.
2. List all raw datasets supported (and configured):
$ bob binseg dataset list
Check available datasets¶
Checks if we can load all files listed for a given dataset (all subsets in all protocols).
$ bob binseg dataset check --help
Usage: bob binseg dataset check [OPTIONS] [DATASET]...
Checks file access on one or more datasets
Options:
-l, --limit INTEGER RANGE Limit check to the first N samples in each
dataset, making the check sensibly faster. Set
it to zero to check everything. [x>=0; required]
-v, --verbose Increase the verbosity level from 0 (only error
messages) to 1 (warnings), 2 (log messages), 3
(debug information) by adding the --verbose
option as often as desired (e.g. '-vvv' for
debug).
-h, -?, --help Show this message and exit.
Examples:
1. Check if all files of the DRIVE dataset can be loaded:
$ bob binseg dataset check -vv drive
2. Check if all files of multiple installed datasets can be loaded:
$ bob binseg dataset check -vv drive stare
3. Check if all files of all installed datasets can be loaded:
$ bob binseg dataset check
Preset Configuration Resources¶
This CLI application allows one to list, inspect, and copy the configuration resources exported by this package.
$ bob binseg config --help
Usage: bob binseg config [OPTIONS] COMMAND [ARGS]...
Command for listing, describing and copying configuration resources.
Options:
-?, -h, --help Show this message and exit.
Commands:
copy Copy a specific configuration resource so it can be modified...
describe Describe a specific configuration file.
list List configuration files installed.
Listing Resources¶
$ bob binseg config list --help
Usage: bob binseg config list [OPTIONS]
List configuration files installed.
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-h, -?, --help Show this message and exit.
Examples:
1. Lists all configuration resources (type: bob.ip.binseg.config) installed:
$ bob binseg config list
2. Lists all configuration resources and their descriptions (notice this may
be slow as it needs to load all modules once):
$ bob binseg config list -v
Available Resources¶
Here is a list of all resources currently exported.
$ bob binseg config list -v
module: bob.ip.binseg.configs.datasets
chasedb1 CHASE-DB1 dataset for Vessel Segmentation (first-anno...
chasedb1-1024 CHASE-DB1 dataset for Vessel Segmentation
chasedb1-2nd CHASE-DB1 dataset for Vessel Segmentation (second-ann...
chasedb1-768 CHASE-DB1 dataset for Vessel Segmentation
chasedb1-covd COVD-CHASEDB1 for Vessel Segmentation
chasedb1-mtest CHASE-DB1 cross-evaluation dataset with matched resol...
chasedb1-xtest CHASE-DB1 cross-evaluation dataset
combined-cup Combining all optic cup dataset together with the sam...
combined-disc Combining all optic disc dataset together with the sa...
combined-vessels Combining all vessel dataset together with the same r...
csv-dataset-example Example CSV-based custom filelist dataset
cxr8 CXR8 Dataset (default protocol)
cxr8-idiap CXR8 Dataset ("idiap" protocol - just like "default",...
cxr8-idiap-xtest CXR8 cross-evaluation dataset with Idiap directory st...
cxr8-xtest CXR8 cross-evaluation dataset
drhagis DRHAGIS dataset for Vessel Segmentation (default prot...
drionsdb DRIONS-DB for Optic Disc Segmentation (expert #1 anno...
drionsdb-2nd DRIONS-DB for Optic Disc Segmentation (expert #2 anno...
drionsdb-2nd-512 DRIONS-DB for Optic Disc Segmentation (expert #2 anno...
drionsdb-512 DRIONS-DB for Optic Disc Segmentation (expert #1 anno...
drionsdb-768 DRIONS-DB for Optic Disc Segmentation (expert #1 anno...
drishtigs1-cup DRISHTI-GS1 dataset for Cup Segmentation (agreed by a...
drishtigs1-cup-512 DRISHTI-GS1 dataset for Cup Segmentation (agreed by a...
drishtigs1-cup-768 DRISHTI-GS1 dataset for Cup Segmentation (agreed by a...
drishtigs1-cup-any DRISHTI-GS1 dataset for Cup Segmentation (agreed by a...
drishtigs1-disc DRISHTI-GS1 dataset for Optic Disc Segmentation (agre...
drishtigs1-disc-512 DRISHTI-GS1 dataset for Optic Disc Segmentation (agre...
drishtigs1-disc-768 DRISHTI-GS1 dataset for Optic Disc Segmentation (agre...
drishtigs1-disc-any DRISHTI-GS1 dataset for Optic Disc Segmentation (agre...
drive DRIVE dataset for Vessel Segmentation (default protoc...
drive-1024 DRIVE dataset for Vessel Segmentation (Resolution use...
drive-2nd DRIVE dataset for Vessel Segmentation (second annotat...
drive-768 DRIVE dataset for Vessel Segmentation (Resolution use...
drive-covd COVD-DRIVE for Vessel Segmentation
drive-mtest DRIVE cross-evaluation dataset with matched resolutio...
drive-xtest DRIVE cross-evaluation dataset
hrf HRF dataset for Vessel Segmentation (default protocol...
hrf-1024 HRF dataset for Vessel Segmentation
hrf-768 HRF dataset for Vessel Segmentation
hrf-covd COVD-HRF for Vessel Segmentation
hrf-highres HRF dataset for Vessel Segmentation (default protocol...
hrf-mtest HRF cross-evaluation dataset with matched resolution
hrf-xtest HRF cross-evaluation dataset
iostar-disc IOSTAR dataset for Optic Disc Segmentation (default p...
iostar-disc-512 IOSTAR dataset for Optic Disc Segmentation
iostar-disc-768 IOSTAR dataset for Optic Disc Segmentation
iostar-vessel IOSTAR dataset for Vessel Segmentation (default proto...
iostar-vessel-768 IOSTAR dataset for Vessel Segmentation (default proto...
iostar-vessel-covd COVD-IOSTAR for Vessel Segmentation
iostar-vessel-mtest IOSTAR vessel cross-evaluation dataset with matched r...
iostar-vessel-xtest IOSTAR vessel cross-evaluation dataset
jsrt Japanese Society of Radiological Technology dataset f...
jsrt-xtest JSRT CXR cross-evaluation dataset
montgomery Montgomery County dataset for Lung Segmentation (defa...
montgomery-xtest Montgomery County cross-evaluation dataset
refuge-cup REFUGE dataset for Optic Cup Segmentation (default pr...
refuge-cup-512 REFUGE dataset for Optic Cup Segmentation
refuge-cup-768 REFUGE dataset for Optic Cup Segmentation
refuge-disc REFUGE dataset for Optic Disc Segmentation (default p...
refuge-disc-512 REFUGE dataset for Optic Disc Segmentation
refuge-disc-768 REFUGE dataset for Optic Disc Segmentation
rimoner3-cup RIM-ONE r3 for Optic Cup Segmentation (expert #1 anno...
rimoner3-cup-2nd RIM-ONE r3 for Optic Cup Segmentation (expert #2 anno...
rimoner3-cup-512 RIM-ONE r3 for Optic Cup Segmentation (expert #1 anno...
rimoner3-cup-768 RIM-ONE r3 for Optic Cup Segmentation (expert #1 anno...
rimoner3-disc RIM-ONE r3 for Optic Disc Segmentation (expert #1 ann...
rimoner3-disc-2nd RIM-ONE r3 for Optic Disc Segmentation (expert #2 ann...
rimoner3-disc-512 RIM-ONE r3 for Optic Disc Segmentation (expert #1 ann...
rimoner3-disc-768 RIM-ONE r3 for Optic Disc Segmentation (expert #1 ann...
shenzhen Shenzhen dataset for Lung Segmentation (default proto...
shenzhen-small Shenzhen dataset for Lung Segmentation (default proto...
shenzhen-xtest Shenzhen cross-evaluation dataset
stare STARE dataset for Vessel Segmentation (annotator AH)
stare-1024 STARE dataset for Vessel Segmentation (annotator AH)
stare-2nd STARE dataset for Vessel Segmentation (annotator VK)
stare-768 STARE dataset for Vessel Segmentation (annotator AH)
stare-covd COVD-STARE for Vessel Segmentation
stare-mtest STARE cross-evaluation dataset with matched resolutio...
stare-xtest STARE cross-evaluation dataset
module: bob.ip.binseg.configs.models
driu DRIU Network for Vessel Segmentation
driu-bn DRIU Network for Vessel Segmentation with Batch Normalization
driu-od DRIU Network for Optic Disc Segmentation
hed HED Network for image segmentation
lwnet Little W-Net for image segmentation
m2unet MobileNetV2 U-Net model for image segmentation
resunet Residual U-Net for image segmentation
unet U-Net for image segmentation
Describing a Resource¶
$ bob binseg config describe --help
Usage: bob binseg config describe [OPTIONS] NAME...
Describe a specific configuration file.
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-?, -h, --help Show this message and exit.
Examples:
1. Describes the DRIVE (training) dataset configuration:
$ bob binseg config describe drive
2. Describes the DRIVE (training) dataset configuration and lists its
contents:
$ bob binseg config describe drive -v
Copying a Resource¶
You may use this command to locally copy a resource file so you can change it.
$ bob binseg config copy --help
Usage: bob binseg config copy [OPTIONS] SOURCE DESTINATION
Copy a specific configuration resource so it can be modified locally.
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-?, -h, --help Show this message and exit.
Examples:
1. Makes a copy of one of the stock configuration files locally, so it can be
adapted:
$ bob binseg config copy drive -vvv newdataset.py
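A copied resource is just a Python module whose module-level variables are picked up by the CLI options. As a hypothetical sketch of the general *shape* of such a file (the stock ``drive`` resource you copied will look different; see also the stock ``csv-dataset-example`` resource for the supported CSV filelist approach), the file names and the ``FilelistDataset`` helper below are illustrative only:

```python
# newdataset.py -- hypothetical sketch of a dataset configuration file.
# Module-level variables here are what the CLI options pick up.
import csv

import torch.utils.data


class FilelistDataset(torch.utils.data.Dataset):
    """Yields (image-path, ground-truth-path) pairs from a CSV file.

    A real configuration would load and pre-process the images here;
    this stub only demonstrates the expected dictionary-of-Datasets
    layout.
    """

    def __init__(self, csv_path):
        with open(csv_path) as f:
            # each row: <path-to-image>,<path-to-ground-truth>
            self._pairs = [tuple(row) for row in csv.reader(f)]

    def __len__(self):
        return len(self._pairs)

    def __getitem__(self, index):
        return self._pairs[index]


# the variable name "dataset" matches the -d/--dataset option of the
# train/experiment commands; keys follow the subset-name conventions
# documented by "bob binseg train --help"
dataset = {
    "train": FilelistDataset("/path/to/train.csv"),
    "test": FilelistDataset("/path/to/test.csv"),
}
```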
Running and Analyzing Experiments¶
These applications run a combined set of steps in one go. They work well with our preset configuration resources.
Running a Full Experiment Cycle¶
This command can run training, prediction, evaluation and comparison from a single, multi-step application.
$ bob binseg experiment --help
Usage: bob binseg experiment [OPTIONS] [CONFIG]...
Runs a complete experiment, from training, to prediction and evaluation
This script is just a wrapper around the individual scripts for
training, running prediction, evaluating and comparing FCN
model performance. It organises the output in a preset
way::
└─ <output-folder>/
   ├── model/            # the generated model will be here
   ├── predictions/      # the prediction outputs for the train/test set
   ├── overlayed/        # the overlayed outputs for the train/test set
   │   ├── predictions/  # predictions overlayed on the input images
   │   ├── analysis/     # predictions overlayed on the input images,
   │   │                 # including analysis of false positives,
   │   │                 # negatives and true positives
   │   └── second-annotator/  # if set, store overlayed images for the
   │                          # second annotator here
   └── analysis/         # the outputs of the analysis of both
                         # train/test sets; includes second-annotator
                         # "measures" as well, if configured
Training is performed for a configurable number of epochs, and
generates at least a final_model.pth. It may also generate
a number of intermediate checkpoints. Checkpoints are model
files (.pth files) that are stored during the training and
useful to resume the procedure in case it stops abruptly.
N.B.: The tool is designed to prevent analysis bias and allows one
to provide (potentially multiple) separate subsets for
training, validation, and evaluation. Instead of using
simple datasets, datasets for full experiment running should
be dictionaries with specific subset names:
* ``__train__``: dataset used for training, prioritarily. It is
  typically the dataset containing data augmentation pipelines.
* ``__valid__``: dataset used for validation. It is typically
  disjoint from the training and test sets. In such a case, we
  checkpoint the model with the lowest loss on the validation set
  as well, throughout all the training, besides the model at the
  end of training.
* ``train`` (optional): a copy of the ``__train__`` dataset,
  without data augmentation, that will be evaluated alongside
  other sets available
* ``__extra_valid__``: a list of datasets that are tracked during
  validation, but do not affect checkpointing. If present, an
  extra column with an array containing the loss of each set is
  kept on the training log.
* ``*``: any other name, not starting with an underscore
  character (``_``), will be considered a test set for evaluation.
N.B.2: The threshold used for calculating the F1-score on the test
set, or overlay analysis (false positives, negatives and
true positives overprinted on the original image) also
follows the logic above.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names i.e. import paths) as
CONFIG arguments to this command line which contain the parameters listed
below as Python variables. Available entry points are:
**bob.ip.binseg** entry points are: chasedb1, chasedb1-1024, chasedb1-2nd,
chasedb1-768, chasedb1-covd, chasedb1-mtest, chasedb1-xtest, combined-cup,
combined-disc, combined-vessels, csv-dataset-example, cxr8, cxr8-idiap,
cxr8-idiap-xtest, cxr8-xtest, drhagis, drionsdb, drionsdb-2nd,
drionsdb-2nd-512, drionsdb-512, drionsdb-768, drishtigs1-cup,
drishtigs1-cup-512, drishtigs1-cup-768, drishtigs1-cup-any, drishtigs1-disc,
drishtigs1-disc-512, drishtigs1-disc-768, drishtigs1-disc-any, driu, driu-
bn, driu-od, drive, drive-1024, drive-2nd, drive-768, drive-covd, drive-
mtest, drive-xtest, hed, hrf, hrf-1024, hrf-768, hrf-covd, hrf-highres, hrf-
mtest, hrf-xtest, iostar-disc, iostar-disc-512, iostar-disc-768, iostar-
vessel, iostar-vessel-768, iostar-vessel-covd, iostar-vessel-mtest, iostar-
vessel-xtest, jsrt, jsrt-xtest, lwnet, m2unet, montgomery, montgomery-xtest,
refuge-cup, refuge-cup-512, refuge-cup-768, refuge-disc, refuge-disc-512,
refuge-disc-768, resunet, rimoner3-cup, rimoner3-cup-2nd, rimoner3-cup-512,
rimoner3-cup-768, rimoner3-disc, rimoner3-disc-2nd, rimoner3-disc-512,
rimoner3-disc-768, shenzhen, shenzhen-small, shenzhen-xtest, stare,
stare-1024, stare-2nd, stare-768, stare-covd, stare-mtest, stare-xtest, unet
The command-line options (see below) override the values of the
argument-provided configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store experiment outputs
(created if does not exist) [required]
-m, --model CUSTOM A torch.nn.Module instance implementing the
network to be trained, and then evaluated
[required]
-d, --dataset CUSTOM A dictionary mapping string keys to
torch.utils.data.dataset.Dataset instances
implementing datasets to be used for
training and validating the model, possibly
including all pre-processing pipelines
required or, optionally, a dictionary
mapping string keys to
torch.utils.data.dataset.Dataset instances.
At least one key named ``train`` must be
available. This dataset will be used for
training the network model. The dataset
description must include all required pre-
processing, including eventual data
augmentation. If a dataset named
``__train__`` is available, it is used
prioritarily for training instead of
``train``. If a dataset named ``__valid__``
is available, it is used for model
validation (and automatic check-pointing) at
each epoch. If a dataset list named
``__extra_valid__`` is available, then it
will be tracked during the validation
process and its loss output at the training
log as well, in the format of an array
occupying a single column. All other keys
are considered test datasets and only used
during analysis, to report the final system
performance [required]
-S, --second-annotator CUSTOM A dataset or dictionary, like in --dataset,
with the same sample keys, but with
annotations from a different annotator that
is going to be compared to the one in
--dataset
--optimizer CUSTOM A torch.optim.Optimizer that will be used to
train the network [required]
--criterion CUSTOM A loss function to compute the FCN error for
every sample respecting the PyTorch API for
loss functions (see torch.nn.modules.loss)
[required]
--scheduler CUSTOM A learning rate scheduler that drives
changes in the learning rate depending on
the FCN state (see torch.optim.lr_scheduler)
[required]
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network). If the number of samples in
the batch is larger than the total number of
samples available for training, this value
is truncated. If this number is smaller,
then batches of the specified size are
created and fed to the network until there
are no more new samples to feed (epoch is
finished). If the total number of training
samples is not a multiple of the batch-size,
the last batch will be smaller than the
first, unless --drop-incomplete-batch is
set, in which case this batch is not used.
[default: 2; x>=1; required]
-c, --batch-chunk-count INTEGER RANGE
Number of chunks in every batch (this
parameter affects memory requirements for
the network). The number of samples loaded
for every iteration will be batch-
size/batch-chunk-count. batch-size needs to
be divisible by batch-chunk-count, otherwise
an error will be raised. This parameter is
used to reduce number of samples loaded in
each iteration, in order to reduce the
memory usage in exchange for processing time
(more iterations). This is especially
interesting when one is running with GPUs
with limited RAM. The default of 1 forces
the whole batch to be processed at once.
Otherwise the batch is broken into batch-
chunk-count pieces, and gradients are
accumulated to complete each batch.
[default: 1; x>=1; required]
-D, --drop-incomplete-batch / --no-drop-incomplete-batch
If set, then may drop the last batch in an
epoch, in case it is incomplete. If you set
this option, you should also consider
increasing the total number of epochs of
training, as the total number of training
steps may be reduced [default: no-drop-
incomplete-batch; required]
-e, --epochs INTEGER RANGE Number of epochs (complete training set
passes) to train for. If continuing from a
saved checkpoint, ensure to provide a
greater number of epochs than that saved on
the checkpoint to be loaded. [default:
1000; x>=1; required]
-p, --checkpoint-period INTEGER RANGE
Number of epochs after which a checkpoint is
saved. A value of zero will disable check-
pointing. If checkpointing is enabled and
training stops, it is automatically resumed
from the last saved checkpoint if training
is restarted with the same configuration.
[default: 0; x>=0; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-s, --seed INTEGER RANGE Seed to use for the random number generator
[default: 42; x>=0]
-P, --parallel INTEGER RANGE Use multiprocessing for data loading and
processing: if set to -1 (default), disables
multiprocessing altogether. Set to 0 to
enable as many data loading instances as
processing cores as available in the system.
Set to >= 1 to enable that many
multiprocessing instances for data
processing. [default: -1; x>=-1; required]
-I, --monitoring-interval FLOAT RANGE
Time between checks for the use of resources
during each training epoch. An interval of
5 seconds, for example, will lead to CPU and
GPU resources being probed every 5 seconds
during each training epoch. Values
registered in the training logs correspond
to averages (or maxima) observed through
possibly many probes in each epoch. Notice
that setting a very small value may cause
the probing process to become extremely
busy, potentially biasing the overall
perception of resource usage. [default:
5.0; x>=0.1; required]
-O, --overlayed / --no-overlayed
Creates overlayed representations of the
output probability maps, similar to
--overlayed in prediction-mode, except it
includes distinctive colours for true and
false positives and false negatives. If not
set, or empty then do **NOT** output
overlayed images. [default: no-overlayed]
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on test data.
[default: 1000; required]
-L, --plot-limits FLOAT... If set, this option affects the performance
comparison plots. It must be a 4-tuple
containing the bounds of the plot for the x
and y axis respectively (format: x_low,
x_high, y_low, y_high). If not set, use
normal bounds ([0, 1, 0, 1]) for the
performance curve. [default: 0.0, 1.0, 0.0,
1.0]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-?, -h, --help Show this message and exit.
Examples:
1. Trains an M2U-Net model (VGG-16 backbone) with DRIVE (vessel
segmentation), on the CPU, for only two epochs, then runs inference and
evaluation on stock datasets, report performance as a table and a figure:
$ bob binseg experiment -vv m2unet drive --epochs=2
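The CONFIG arguments are Python modules whose module-level variables supply the parameters listed in the options above. A minimal, hypothetical sketch follows; the stand-in network and hyper-parameter values are illustrative only (real experiments would combine stock resources such as ``m2unet`` and ``drive``):

```python
# experiment_config.py -- illustrative sketch (not a stock resource) of
# a configuration passed as a CONFIG argument to "bob binseg experiment"
import torch.nn
import torch.optim
import torch.optim.lr_scheduler

# hypothetical stand-in network; real experiments would use one of the
# exported model resources (driu, hed, lwnet, m2unet, resunet, unet)
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(8, 1, kernel_size=3, padding=1),
)

# --optimizer, --criterion and --scheduler as module-level variables
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.BCEWithLogitsLoss()
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100)

# the dataset dictionary follows the subset-name conventions above:
# "__train__" (augmented training data), "__valid__" (drives
# checkpointing), and any name without a leading underscore is a test
# set, e.g.:
# dataset = {"__train__": ..., "__valid__": ..., "test": ...}

batch_size = 4  # overrides the -b/--batch-size default
epochs = 10     # overrides the -e/--epochs default
```

Command-line options still take precedence over anything defined in such a file.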
Running Complete Experiment Analysis¶
This command can run prediction, evaluation and comparison from a single, multi-step application.
$ bob binseg analyze --help
Usage: bob binseg analyze [OPTIONS] [CONFIG]...
Runs a complete evaluation from prediction to comparison
This script is just a wrapper around the individual scripts for
running prediction and evaluating FCN models. It organises
the output in a preset way::
└─ <output-folder>/
   ├── predictions/      # the prediction outputs for the train/test set
   ├── overlayed/        # the overlayed outputs for the train/test set
   │   ├── predictions/  # predictions overlayed on the input images
   │   ├── analysis/     # predictions overlayed on the input images,
   │   │                 # including analysis of false positives,
   │   │                 # negatives and true positives
   │   └── second-annotator/  # if set, store overlayed images for the
   │                          # second annotator here
   └── analysis/         # the outputs of the analysis of both
                         # train/test sets; includes second-annotator
                         # "measures" as well, if configured
N.B.: The tool is designed to prevent analysis bias and allows one
to provide separate subsets for training and evaluation.
Instead of using simple datasets, datasets for full
experiment running should be dictionaries with specific
subset names:
* ``__train__``: dataset used for training, prioritarily. It is
  typically the dataset containing data augmentation pipelines.
* ``train`` (optional): a copy of the ``__train__`` dataset,
  without data augmentation, that will be evaluated alongside
  other sets available
* ``*``: any other name, not starting with an underscore
  character (``_``), will be considered a test set for evaluation.
N.B.2: The threshold used for calculating the F1-score on the test
set, or overlay analysis (false positives, negatives and
true positives overprinted on the original image) also
follows the logic above.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names i.e. import paths) as
CONFIG arguments to this command line which contain the parameters listed
below as Python variables. Available entry points are:
**bob.ip.binseg** entry points are: chasedb1, chasedb1-1024, chasedb1-2nd,
chasedb1-768, chasedb1-covd, chasedb1-mtest, chasedb1-xtest, combined-cup,
combined-disc, combined-vessels, csv-dataset-example, cxr8, cxr8-idiap,
cxr8-idiap-xtest, cxr8-xtest, drhagis, drionsdb, drionsdb-2nd,
drionsdb-2nd-512, drionsdb-512, drionsdb-768, drishtigs1-cup,
drishtigs1-cup-512, drishtigs1-cup-768, drishtigs1-cup-any, drishtigs1-disc,
drishtigs1-disc-512, drishtigs1-disc-768, drishtigs1-disc-any, driu, driu-
bn, driu-od, drive, drive-1024, drive-2nd, drive-768, drive-covd, drive-
mtest, drive-xtest, hed, hrf, hrf-1024, hrf-768, hrf-covd, hrf-highres, hrf-
mtest, hrf-xtest, iostar-disc, iostar-disc-512, iostar-disc-768, iostar-
vessel, iostar-vessel-768, iostar-vessel-covd, iostar-vessel-mtest, iostar-
vessel-xtest, jsrt, jsrt-xtest, lwnet, m2unet, montgomery, montgomery-xtest,
refuge-cup, refuge-cup-512, refuge-cup-768, refuge-disc, refuge-disc-512,
refuge-disc-768, resunet, rimoner3-cup, rimoner3-cup-2nd, rimoner3-cup-512,
rimoner3-cup-768, rimoner3-disc, rimoner3-disc-2nd, rimoner3-disc-512,
rimoner3-disc-768, shenzhen, shenzhen-small, shenzhen-xtest, stare,
stare-1024, stare-2nd, stare-768, stare-covd, stare-mtest, stare-xtest, unet
The command-line options (see below) override the values of the
argument-provided configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store experiment outputs
(created if does not exist) [required]
-m, --model CUSTOM A torch.nn.Module instance implementing the
network to be trained, and then evaluated
[required]
-d, --dataset CUSTOM A dictionary mapping string keys to
bob.ip.common.data.utils.SampleList2TorchDataset
instances.
At least one key named 'train' must be
available. This dataset will be used for
training the network model. All other
datasets will be used for prediction and
evaluation. Dataset descriptions include all
required pre-processing, including eventual
data augmentation, which may be eventually
excluded for prediction and evaluation
purposes [required]
-S, --second-annotator CUSTOM A dataset or dictionary, like in --dataset,
with the same sample keys, but with
annotations from a different annotator that
is going to be compared to the one in
--dataset
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network). If the number of samples in
the batch is larger than the total number of
samples available for training, this value
is truncated. If this number is smaller,
then batches of the specified size are
created and fed to the network until there
are no more new samples to feed (epoch is
finished). If the total number of training
samples is not a multiple of the batch-size,
the last batch will be smaller than the
first. [default: 1; x>=1; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-O, --overlayed / --no-overlayed
Creates overlayed representations of the
output probability maps, similar to
--overlayed in prediction-mode, except it
includes distinctive colours for true and
false positives and false negatives. If not
set, or empty then do **NOT** output
overlayed images. [default: no-overlayed]
-w, --weight CUSTOM Path or URL to pretrained model file (.pth
extension) [required]
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on test data.
[default: 1000; required]
-P, --parallel INTEGER RANGE Use multiprocessing for data processing: if
set to -1 (default), disables
multiprocessing. Set to 0 to enable as many
data loading instances as processing cores
as available in the system. Set to >= 1 to
enable that many multiprocessing instances
for data processing. [default: -1; x>=-1;
required]
-L, --plot-limits FLOAT... If set, this option affects the performance
comparison plots. It must be a 4-tuple
containing the bounds of the plot for the x
and y axis respectively (format: x_low,
x_high, y_low, y_high). If not set, use
normal bounds ([0, 1, 0, 1]) for the
performance curve. [default: 0.0, 1.0, 0.0,
1.0]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Re-evaluates a pre-trained M2U-Net model with DRIVE (vessel
segmentation), on the CPU, by running inference and evaluation on results
from its test set:
$ bob binseg analyze -vv m2unet drive --weight=model.pth
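The ``-S/--steps`` option above controls how many candidate thresholds are scanned when locating the highest possible F1-score on test data. The following is only an illustration of that idea in plain Python, not the package's actual implementation:

```python
# Illustrative sketch (not the package's code) of what the -S/--steps
# threshold sweep does: scan equally-spaced thresholds over predicted
# probabilities and keep the one maximizing the F1-score.

def best_f1_threshold(probabilities, ground_truth, steps=1000):
    """Return (best_f1, best_threshold) over a linear threshold grid."""
    best_f1, best_threshold = 0.0, 0.0
    for i in range(steps + 1):
        threshold = i / steps
        tp = fp = fn = 0  # per-pixel confusion counts at this threshold
        for p, gt in zip(probabilities, ground_truth):
            predicted = p >= threshold
            if predicted and gt:
                tp += 1
            elif predicted and not gt:
                fp += 1
            elif not predicted and gt:
                fn += 1
        if tp:  # F1 is undefined without true positives
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            f1 = 2 * precision * recall / (precision + recall)
            if f1 > best_f1:
                best_f1, best_threshold = f1, threshold
    return best_f1, best_threshold
```

More steps give a finer grid, and hence a tighter estimate of the best achievable F1-score, at the cost of evaluation time.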
Single-Step Applications¶
These applications allow finer control over each step of the experiment cycle. They also work well with our preset configuration resources, and additionally accept customized input datasets.
Training FCNs¶
Training creates a new PyTorch model. This model can be used for evaluation tests or for inference.
$ bob binseg train --help
Usage: bob binseg train [OPTIONS] [CONFIG]...
Trains an FCN to perform binary segmentation.
Training is performed for a configurable number of epochs, and generates
at least a final_model.pth. It may also generate a number of
intermediate checkpoints. Checkpoints are model files (.pth files)
that are stored during the training and useful to resume the
procedure in case it stops abruptly.
Tip: In case the model has been trained over a number of epochs, it is
possible to continue training, by simply relaunching the same command,
and changing the number of epochs to a number greater than the
number where the original training session stopped (or the last
checkpoint was saved).
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names i.e. import paths) as
CONFIG arguments to this command line which contain the parameters listed
below as Python variables. Available entry points are:
**bob.ip.binseg** entry points are: chasedb1, chasedb1-1024, chasedb1-2nd,
chasedb1-768, chasedb1-covd, chasedb1-mtest, chasedb1-xtest, combined-cup,
combined-disc, combined-vessels, csv-dataset-example, cxr8, cxr8-idiap,
cxr8-idiap-xtest, cxr8-xtest, drhagis, drionsdb, drionsdb-2nd,
drionsdb-2nd-512, drionsdb-512, drionsdb-768, drishtigs1-cup,
drishtigs1-cup-512, drishtigs1-cup-768, drishtigs1-cup-any, drishtigs1-disc,
drishtigs1-disc-512, drishtigs1-disc-768, drishtigs1-disc-any, driu, driu-
bn, driu-od, drive, drive-1024, drive-2nd, drive-768, drive-covd, drive-
mtest, drive-xtest, hed, hrf, hrf-1024, hrf-768, hrf-covd, hrf-highres, hrf-
mtest, hrf-xtest, iostar-disc, iostar-disc-512, iostar-disc-768, iostar-
vessel, iostar-vessel-768, iostar-vessel-covd, iostar-vessel-mtest, iostar-
vessel-xtest, jsrt, jsrt-xtest, lwnet, m2unet, montgomery, montgomery-xtest,
refuge-cup, refuge-cup-512, refuge-cup-768, refuge-disc, refuge-disc-512,
refuge-disc-768, resunet, rimoner3-cup, rimoner3-cup-2nd, rimoner3-cup-512,
rimoner3-cup-768, rimoner3-disc, rimoner3-disc-2nd, rimoner3-disc-512,
rimoner3-disc-768, shenzhen, shenzhen-small, shenzhen-xtest, stare,
stare-1024, stare-2nd, stare-768, stare-covd, stare-mtest, stare-xtest, unet
The options passed through the command line (see below) will override the
values in the provided configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store the generated model
(created if does not exist) [required]
-m, --model CUSTOM A torch.nn.Module instance implementing the
network to be trained [required]
-d, --dataset CUSTOM A dictionary mapping string keys to
torch.utils.data.dataset.Dataset instances
implementing datasets to be used for
training and validating the model, possibly
including all pre-processing pipelines
required or, optionally, a dictionary
mapping string keys to
torch.utils.data.dataset.Dataset instances.
At least one key named ``train`` must be
available. This dataset will be used for
training the network model. The dataset
description must include all required pre-
processing, including any data
augmentation. If a dataset named
``__train__`` is available, it is used
preferentially for training instead of
``train``. If a dataset named ``__valid__``
is available, it is used for model
validation (and automatic check-pointing) at
each epoch. If a dataset list named
``__extra_valid__`` is available, then it
will be tracked during the validation
process, and its loss will also be output to
the training log, as an array occupying a
single column. All other keys
are considered test datasets and are ignored
during training [required]
--optimizer CUSTOM A torch.optim.Optimizer that will be used to
train the network [required]
--criterion CUSTOM A loss function to compute the FCN error for
every sample respecting the PyTorch API for
loss functions (see torch.nn.modules.loss)
[required]
--scheduler CUSTOM A learning rate scheduler that drives
changes in the learning rate depending on
the FCN state (see torch.optim.lr_scheduler)
[required]
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network). If the number of samples in
the batch is larger than the total number of
samples available for training, this value
is truncated. If this number is smaller,
then batches of the specified size are
created and fed to the network until there
are no more new samples to feed (epoch is
finished). If the total number of training
samples is not a multiple of the batch-size,
the last batch will be smaller than the
first, unless --drop-incomplete-batch is
set, in which case this batch is not used.
[default: 2; x>=1; required]
-c, --batch-chunk-count INTEGER RANGE
Number of chunks in every batch (this
parameter affects memory requirements for
the network). The number of samples loaded
for every iteration will be batch-
size/batch-chunk-count. batch-size needs to
be divisible by batch-chunk-count, otherwise
an error will be raised. This parameter is
used to reduce number of samples loaded in
each iteration, in order to reduce the
memory usage in exchange for processing time
(more iterations). This is especially
interesting when one is running on GPUs
with limited RAM. The default of 1 forces
the whole batch to be processed at once.
Otherwise the batch is broken into batch-
chunk-count pieces, and gradients are
accumulated to complete each batch.
[default: 1; x>=1; required]
-D, --drop-incomplete-batch / --no-drop-incomplete-batch
If set, the last batch in an epoch may be
dropped, in case it is incomplete. If you set
this option, you should also consider
increasing the total number of epochs of
training, as the total number of training
steps may be reduced [default: no-drop-
incomplete-batch; required]
-e, --epochs INTEGER RANGE Number of epochs (complete training set
passes) to train for. If continuing from a
saved checkpoint, ensure to provide a
greater number of epochs than that saved on
the checkpoint to be loaded. [default:
1000; x>=1; required]
-p, --checkpoint-period INTEGER RANGE
Number of epochs after which a checkpoint is
saved. A value of zero will disable check-
pointing. If checkpointing is enabled and
training stops, it is automatically resumed
from the last saved checkpoint if training
is restarted with the same configuration.
[default: 0; x>=0; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-s, --seed INTEGER RANGE Seed to use for the random number generator
[default: 42; x>=0]
-P, --parallel INTEGER RANGE Use multiprocessing for data loading: if set
to -1 (default), disables multiprocessing
data loading. Set to 0 to enable as many
data loading instances as there are
processing cores available in the system. Set to >= 1 to
enable that many multiprocessing instances
for data loading. [default: -1; x>=-1;
required]
-I, --monitoring-interval FLOAT RANGE
Time between checks for the use of resources
during each training epoch. An interval of
5 seconds, for example, will lead to CPU and
GPU resources being probed every 5 seconds
during each training epoch. Values
registered in the training logs correspond
to averages (or maxima) observed through
possibly many probes in each epoch. Notice
that setting a very small value may cause
the probing process to become extremely
busy, potentially biasing the overall
perception of resource usage. [default:
5.0; x>=0.1; required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Trains a U-Net model (VGG-16 backbone) with DRIVE (vessel segmentation),
on a GPU (``cuda:0``):
$ bob binseg train -vv unet drive --batch-size=4 --device="cuda:0"
2. Trains a HED model with HRF on a GPU (``cuda:0``):
$ bob binseg train -vv hed hrf --batch-size=8 --device="cuda:0"
3. Trains a M2U-Net model on the COVD-DRIVE dataset on the CPU:
$ bob binseg train -vv m2unet covd-drive --batch-size=8
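As noted above, training parameters may also be set as Python variables in a configuration file passed as a CONFIG argument (a template can be generated with ``-H``). The sketch below is a hypothetical minimal example: the variable names mirror the command-line options, and the required ``model``, ``dataset``, ``optimizer``, ``criterion`` and ``scheduler`` objects (which need imports from the package) are omitted.

```python
# Hypothetical "my_config.py": variable names mirror the command-line
# options above (e.g. --batch-size -> batch_size).  The required model,
# dataset, optimizer, criterion and scheduler objects are omitted here.
batch_size = 8
batch_chunk_count = 4  # must divide batch_size evenly
epochs = 100
device = "cuda:0"
seed = 42

# With these settings, each iteration loads batch_size // batch_chunk_count
# samples, and gradients are accumulated over 4 chunks to complete a batch.
samples_per_iteration = batch_size // batch_chunk_count
assert samples_per_iteration == 2
```

Such a file would then be passed on the command line together with the other configuration resources, e.g. ``bob binseg train -vv unet drive my_config.py``.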
Prediction with FCNs¶
Inference takes a PyTorch model and input images, and generates output probabilities as HDF5 files. The probability map has the same size as the input image and indicates, as a floating-point number between 0.0 (less probable) and 1.0 (more probable), the probability that each pixel belongs to a vessel.
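To illustrate what these probability maps encode, the snippet below thresholds a toy map into a binary vessel mask (the pixel values and the 0.5 cut-off are invented for illustration; actual predictions are stored in HDF5 files):

```python
# Toy probability map: one float per pixel in [0.0, 1.0], with the same
# shape as the input image (values invented for illustration).
prob_map = [
    [0.10, 0.92],
    [0.73, 0.20],
]

# Applying a threshold turns probabilities into a binary vessel mask
threshold = 0.5
binary_mask = [[p >= threshold for p in row] for row in prob_map]
assert binary_mask == [[False, True], [True, False]]
```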
$ bob binseg predict --help
Usage: bob binseg predict [OPTIONS] [CONFIG]...
Predicts vessel map (probabilities) on input images.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names i.e. import paths) as
CONFIG arguments to this command line which contain the parameters listed
below as Python variables. Available entry points are:
**bob.ip.binseg** entry points are: chasedb1, chasedb1-1024, chasedb1-2nd,
chasedb1-768, chasedb1-covd, chasedb1-mtest, chasedb1-xtest, combined-cup,
combined-disc, combined-vessels, csv-dataset-example, cxr8, cxr8-idiap,
cxr8-idiap-xtest, cxr8-xtest, drhagis, drionsdb, drionsdb-2nd,
drionsdb-2nd-512, drionsdb-512, drionsdb-768, drishtigs1-cup,
drishtigs1-cup-512, drishtigs1-cup-768, drishtigs1-cup-any, drishtigs1-disc,
drishtigs1-disc-512, drishtigs1-disc-768, drishtigs1-disc-any, driu, driu-
bn, driu-od, drive, drive-1024, drive-2nd, drive-768, drive-covd, drive-
mtest, drive-xtest, hed, hrf, hrf-1024, hrf-768, hrf-covd, hrf-highres, hrf-
mtest, hrf-xtest, iostar-disc, iostar-disc-512, iostar-disc-768, iostar-
vessel, iostar-vessel-768, iostar-vessel-covd, iostar-vessel-mtest, iostar-
vessel-xtest, jsrt, jsrt-xtest, lwnet, m2unet, montgomery, montgomery-xtest,
refuge-cup, refuge-cup-512, refuge-cup-768, refuge-disc, refuge-disc-512,
refuge-disc-768, resunet, rimoner3-cup, rimoner3-cup-2nd, rimoner3-cup-512,
rimoner3-cup-768, rimoner3-disc, rimoner3-disc-2nd, rimoner3-disc-512,
rimoner3-disc-768, shenzhen, shenzhen-small, shenzhen-xtest, stare,
stare-1024, stare-2nd, stare-768, stare-covd, stare-mtest, stare-xtest, unet
The options passed through the command line (see below) will override the
values in the provided configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store the predictions (created
if does not exist) [required]
-m, --model CUSTOM A torch.nn.Module instance implementing the
network to be evaluated [required]
-d, --dataset CUSTOM A torch.utils.data.dataset.Dataset instance
implementing a dataset to be used for
running prediction, possibly including all
pre-processing pipelines required or,
optionally, a dictionary mapping string keys
to torch.utils.data.dataset.Dataset
instances. All keys that do not start with
an underscore (_) will be processed.
[required]
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network) [default: 1; x>=1; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-w, --weight CUSTOM Path or URL to pretrained model file (.pth
extension) [required]
-O, --overlayed CUSTOM Creates overlayed representations of the
output probability maps on top of input
images (store results as PNG files). If
not set, or empty then do **NOT** output
overlayed images. Otherwise, the parameter
represents the name of a folder where to
store those
-P, --parallel INTEGER RANGE Use multiprocessing for data loading: if set
to -1 (default), disables multiprocessing
data loading. Set to 0 to enable as many
data loading instances as there are
processing cores available in the system. Set to >= 1 to
enable that many multiprocessing instances
for data loading. [default: -1; x>=-1;
required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-?, -h, --help Show this message and exit.
Examples:
1. Runs prediction on an existing dataset configuration:
$ bob binseg predict -vv m2unet drive --weight=path/to/model_final_epoch.pth --output-folder=path/to/predictions
2. To run prediction on a folder with your own images, you must first
specify resizing, cropping, etc, so that the image can be correctly
input to the model. Failing to do so will likely result in poor
performance. To figure out such specifications, you must consult the
dataset configuration used for **training** the provided model. Once
you have figured this out, do the following:
$ bob binseg config copy csv-dataset-example mydataset.py
# modify "mydataset.py" to include the base path and required transforms
$ bob binseg predict -vv m2unet mydataset.py --weight=path/to/model_final_epoch.pth --output-folder=path/to/predictions
FCN Performance Evaluation¶
Evaluation takes inference results and compares them to the ground truth, generating a series of analysis figures that are useful for understanding model performance.
$ bob binseg evaluate --help
Usage: bob binseg evaluate [OPTIONS] [CONFIG]...
Evaluate an FCN on a binary segmentation task.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names i.e. import paths) as
CONFIG arguments to this command line which contain the parameters listed
below as Python variables. Available entry points are:
**bob.ip.binseg** entry points are: chasedb1, chasedb1-1024, chasedb1-2nd,
chasedb1-768, chasedb1-covd, chasedb1-mtest, chasedb1-xtest, combined-cup,
combined-disc, combined-vessels, csv-dataset-example, cxr8, cxr8-idiap,
cxr8-idiap-xtest, cxr8-xtest, drhagis, drionsdb, drionsdb-2nd,
drionsdb-2nd-512, drionsdb-512, drionsdb-768, drishtigs1-cup,
drishtigs1-cup-512, drishtigs1-cup-768, drishtigs1-cup-any, drishtigs1-disc,
drishtigs1-disc-512, drishtigs1-disc-768, drishtigs1-disc-any, driu, driu-
bn, driu-od, drive, drive-1024, drive-2nd, drive-768, drive-covd, drive-
mtest, drive-xtest, hed, hrf, hrf-1024, hrf-768, hrf-covd, hrf-highres, hrf-
mtest, hrf-xtest, iostar-disc, iostar-disc-512, iostar-disc-768, iostar-
vessel, iostar-vessel-768, iostar-vessel-covd, iostar-vessel-mtest, iostar-
vessel-xtest, jsrt, jsrt-xtest, lwnet, m2unet, montgomery, montgomery-xtest,
refuge-cup, refuge-cup-512, refuge-cup-768, refuge-disc, refuge-disc-512,
refuge-disc-768, resunet, rimoner3-cup, rimoner3-cup-2nd, rimoner3-cup-512,
rimoner3-cup-768, rimoner3-disc, rimoner3-disc-2nd, rimoner3-disc-512,
rimoner3-disc-768, shenzhen, shenzhen-small, shenzhen-xtest, stare,
stare-1024, stare-2nd, stare-768, stare-covd, stare-mtest, stare-xtest, unet
The options passed through the command line (see below) will override the
values in the provided configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store the analysis result
(created if does not exist) [required]
-p, --predictions-folder DIRECTORY
Path where predictions are currently stored
[required]
-d, --dataset CUSTOM A torch.utils.data.dataset.Dataset instance
implementing a dataset to be used for
evaluation purposes, possibly including all
pre-processing pipelines required or,
optionally, a dictionary mapping string keys
to torch.utils.data.dataset.Dataset
instances. All keys that do not start with
an underscore (_) will be processed.
[required]
-S, --second-annotator CUSTOM A dataset or dictionary, like in --dataset,
with the same sample keys, but with
annotations from a different annotator that
is going to be compared to the one in
--dataset. The same rules regarding dataset
naming conventions apply
-O, --overlayed CUSTOM Creates overlayed representations of the
output probability maps, similar to
--overlayed in prediction-mode, except it
includes distinctive colours for true and
false positives and false negatives. If not
set, or empty, overlayed images are **NOT**
output. Otherwise, the parameter
represents the name of a folder where to
store them
-t, --threshold CUSTOM This number is used to define positives and
negatives from probability maps, and report
F1-scores (a priori). It should either come
from the training set or a separate
validation set to avoid biasing the
analysis. Optionally, if you provide a
multi-set dataset as input, this may also be
the name of an existing set from which the
threshold will be estimated (highest
F1-score) and then applied to the subsequent
sets. This number is also used to print the
a priori F1-score performance on the test set
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on test data.
[default: 1000; required]
-P, --parallel INTEGER RANGE Use multiprocessing for data processing: if
set to -1 (default), disables
multiprocessing. Set to 0 to enable as many
data loading instances as there are
processing cores available in the system. Set to >= 1 to
enable that many multiprocessing instances
for data processing. [default: -1; x>=-1;
required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Runs evaluation on an existing dataset configuration:
$ bob binseg evaluate -vv drive --predictions-folder=path/to/predictions --output-folder=path/to/results
2. To run evaluation on a folder with your own images and annotations, you
must first specify resizing, cropping, etc, so that the image can be
correctly input to the model. Failing to do so will likely result in
poor performance. To figure out such specifications, you must consult
the dataset configuration used for **training** the provided model.
Once you have figured this out, do the following:
$ bob binseg config copy csv-dataset-example mydataset.py
# modify "mydataset.py" to your liking
$ bob binseg evaluate -vv mydataset.py --predictions-folder=path/to/predictions --output-folder=path/to/results
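The a posteriori threshold selection described for ``--threshold`` and ``--steps`` can be sketched as follows (``f1_at`` is a hypothetical stand-in for the package's actual F1-score computation):

```python
def best_threshold(f1_at, steps=1000):
    # Scan `steps` evenly spaced candidate thresholds in [0, 1] and keep
    # the one yielding the highest F1-score (a sketch of a-posteriori
    # threshold selection; f1_at is a hypothetical scoring function).
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=f1_at)

# Toy F1 curve peaking at a threshold of 0.5
t = best_threshold(lambda t: 1.0 - abs(t - 0.5), steps=10)
assert abs(t - 0.5) < 1e-9
```

To avoid the bias of tuning on test data, the returned threshold would be estimated on a training or validation set and then applied unchanged to the test set, as the help text above recommends.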
Performance Comparison¶
Performance comparison takes the performance evaluation results and generates combined figures and tables that compare the results of multiple systems.
$ bob binseg compare --help
Usage: bob binseg compare [OPTIONS] [LABEL_PATH]...
Compare multiple systems together.
Options:
-f, --output-figure FILE Path where to write the output figure (any
extension supported by matplotlib is
possible). If not provided, no figure is
produced.
-T, --table-format [asciidoc|double_grid|double_outline|fancy_grid|fancy_outline|github|grid|heavy_grid|heavy_outline|html|jira|latex|latex_booktabs|latex_longtable|latex_raw|mediawiki|mixed_grid|mixed_outline|moinmoin|orgtbl|outline|pipe|plain|presto|pretty|psql|rounded_grid|rounded_outline|rst|simple|simple_grid|simple_outline|textile|tsv|unsafehtml|youtrack]
The format to use for the comparison table
[default: rst; required]
-u, --output-table FILE Path where to write the output table. If not
provided, does not write a table to
file, only to stdout.
-t, --threshold TEXT This number is used to select which F1-score
to use for representing a system
performance. If not set, we report the
maximum F1-score in the set, which is
equivalent to threshold selection a
posteriori (biased estimator), unless the
performance file being considered was
already pre-tuned, and contains a
'threshold_a_priori' column which we then
use to pick a threshold for the dataset. You
can override this behaviour by either
setting this value to a floating-point
number in the range [0.0, 1.0], or to a
string, naming one of the systems which will
be used to calculate the threshold leading
to the maximum F1-score and then applied to
all other sets.
-L, --plot-limits FLOAT... If set, must be a 4-tuple containing the
bounds of the plot for the x and y axis
respectively (format: x_low, x_high, y_low,
y_high). If not set, the normal bounds
([0, 1, 0, 1]) are used for the performance curve.
[default: 0.0, 1.0, 0.0, 1.0]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-?, -h, --help Show this message and exit.
Examples:
1. Compares system A and B, with their own pre-computed measure files:
$ bob binseg compare -vv A path/to/A/train.csv B path/to/B/test.csv
Performance Difference Significance¶
Calculates the statistical significance of the difference between the results obtained by two systems on the same dataset.
$ bob binseg significance --help
Usage: bob binseg significance [OPTIONS] [CONFIG]...
Evaluates how significantly different are two models on the same dataset
This application calculates the significance of results of two models
operating on the same dataset, and subject to a priori threshold
tuning.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names i.e. import paths) as
CONFIG arguments to this command line which contain the parameters listed
below as Python variables. Available entry points are:
**bob.ip.binseg** entry points are: chasedb1, chasedb1-1024, chasedb1-2nd,
chasedb1-768, chasedb1-covd, chasedb1-mtest, chasedb1-xtest, combined-cup,
combined-disc, combined-vessels, csv-dataset-example, cxr8, cxr8-idiap,
cxr8-idiap-xtest, cxr8-xtest, drhagis, drionsdb, drionsdb-2nd,
drionsdb-2nd-512, drionsdb-512, drionsdb-768, drishtigs1-cup,
drishtigs1-cup-512, drishtigs1-cup-768, drishtigs1-cup-any, drishtigs1-disc,
drishtigs1-disc-512, drishtigs1-disc-768, drishtigs1-disc-any, driu, driu-
bn, driu-od, drive, drive-1024, drive-2nd, drive-768, drive-covd, drive-
mtest, drive-xtest, hed, hrf, hrf-1024, hrf-768, hrf-covd, hrf-highres, hrf-
mtest, hrf-xtest, iostar-disc, iostar-disc-512, iostar-disc-768, iostar-
vessel, iostar-vessel-768, iostar-vessel-covd, iostar-vessel-mtest, iostar-
vessel-xtest, jsrt, jsrt-xtest, lwnet, m2unet, montgomery, montgomery-xtest,
refuge-cup, refuge-cup-512, refuge-cup-768, refuge-disc, refuge-disc-512,
refuge-disc-768, resunet, rimoner3-cup, rimoner3-cup-2nd, rimoner3-cup-512,
rimoner3-cup-768, rimoner3-disc, rimoner3-disc-2nd, rimoner3-disc-512,
rimoner3-disc-768, shenzhen, shenzhen-small, shenzhen-xtest, stare,
stare-1024, stare-2nd, stare-768, stare-covd, stare-mtest, stare-xtest, unet
The options passed through the command line (see below) will override the
values in the provided configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-n, --names TEXT... Names of the two systems to compare
[required]
-p, --predictions DIRECTORY... Paths where the predictions of the two
systems are currently stored. You may also
input predictions from a second annotator;
this application will handle them adequately.
[required]
-d, --dataset CUSTOM A dictionary mapping string keys to
torch.utils.data.dataset.Dataset instances
[required]
-t, --threshold TEXT This number is used to define positives and
negatives from probability maps, and report
F1-scores (a priori). By default, we expect
a set named 'validation' to be available at
the input data. If that is not the case, we
use 'train', if available. You may provide
the name of another dataset to be used for
threshold tuning otherwise. If not set, or
a string is input, threshold tuning is done
per system, individually. Optionally, you
may also provide a floating-point number
between [0.0, 1.0] as the threshold to use
for both systems. [default: validation;
required]
-e, --evaluate TEXT Name of the dataset to evaluate [default:
test; required]
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on train/test
data. [default: 1000; required]
-s, --size INTEGER... This is a tuple with two values indicating
the size of windows to be used for sliding
window analysis. The values represent
height and width respectively. [default:
128, 128; required]
-t, --stride INTEGER... This is a tuple with two values indicating
the stride of windows to be used for sliding
window analysis. The values represent
height and width respectively. [default:
32, 32; required]
-f, --figure TEXT The name of a performance figure (e.g.
f1_score, or jaccard) to use when comparing
performances [default: accuracy; required]
-o, --output-folder PATH Path where to store visualizations
-R, --remove-outliers / --no-remove-outliers
If set, removes outliers from both score
distributions before running statistical
analysis. Outlier removal follows a 1.5 IQR
range check from the difference in figures
between both systems and assumes most of the
distribution is contained within that range
(like in a normal distribution) [default:
no-remove-outliers; required]
-R, --remove-zeros / --no-remove-zeros
If set, removes instances from the
statistical analysis in which both systems
had a performance equal to zero. [default:
no-remove-zeros; required]
-x, --parallel INTEGER Set the number of parallel processes to use
when running using multiprocessing. A value
of zero uses all reported cores. [default:
1; required]
-k, --checkpoint-folder PATH Path where to store checkpointed versions of
sliding window performances
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-?, -h, --help Show this message and exit.
Examples:
1. Runs a significance test using as base the calculated predictions of two
different systems, on the **same** dataset:
$ bob binseg significance -vv drive --names system1 system2 --predictions=path/to/predictions/system-1 path/to/predictions/system-2
2. By default, we use a "validation" dataset if it is available, to infer
the a priori threshold for the comparison of two systems. Otherwise,
you may need to specify the name of a set to be used as validation set
for choosing a threshold. The same goes for the set to be used for
testing the hypothesis: by default, we use the "test" dataset if it is
available; otherwise, specify one explicitly.
$ bob binseg significance -vv drive --names system1 system2 --predictions=path/to/predictions/system-1 path/to/predictions/system-2 --threshold=train --evaluate=alternate-test
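The sliding-window analysis controlled by ``--size`` and ``--stride`` can be sketched as follows (a simplified illustration, not the package's actual implementation):

```python
def sliding_windows(height, width, size=(128, 128), stride=(32, 32)):
    # Yield (y, x) top-left corners of all analysis windows that fit
    # entirely inside a height x width image, following the --size and
    # --stride conventions (height first, then width).
    for y in range(0, height - size[0] + 1, stride[0]):
        for x in range(0, width - size[1] + 1, stride[1]):
            yield (y, x)

# A 160x160 image with 128x128 windows and a stride of 32 yields a
# 2x2 grid of 4 windows; the chosen performance figure (--figure) is
# computed per window, and the two systems' score distributions are
# then compared statistically.
windows = list(sliding_windows(160, 160))
assert len(windows) == 4
```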