bob.ip.binseg.engine.evaluator

Defines functionality for the evaluation of predictions

Functions

compare_annotators(baseline, other, name, ...)

Compares annotations on the same dataset

run(dataset, name, predictions_folder[, ...])

Runs inference and calculates measures

sample_measures_for_threshold(pred, gt, ...)

Calculates counts on a single sample, for a specific threshold

bob.ip.binseg.engine.evaluator.sample_measures_for_threshold(pred, gt, mask, threshold)[source]

Calculates counts on a single sample, for a specific threshold

Parameters
  • pred (torch.Tensor) – pixel-wise predictions

  • gt (torch.Tensor) – ground-truth (annotations)

  • mask (torch.Tensor) – region mask (used only if available). May be set to None.

  • threshold (float) – a particular threshold at which to calculate the performance measures

Returns

  • tp (int) – number of true positives

  • fp (int) – number of false positives

  • tn (int) – number of true negatives

  • fn (int) – number of false negatives
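
Example (an illustrative sketch, not the library's implementation): assuming pred, gt and mask are torch.Tensor objects as described above, per-threshold counts could be derived as shown below. The helper name _counts_at_threshold and the placeholder tensors are hypothetical:

    import torch

    def _counts_at_threshold(pred, gt, mask, threshold):
        # hypothetical helper mirroring the semantics described above
        binarized = pred >= threshold   # pixel-wise decision at this threshold
        positives = gt > 0.5            # ground-truth foreground
        if mask is not None:
            valid = mask > 0.5          # restrict counting to the region mask
            binarized = binarized[valid]
            positives = positives[valid]
        tp = int((binarized & positives).sum())
        fp = int((binarized & ~positives).sum())
        tn = int((~binarized & ~positives).sum())
        fn = int((~binarized & positives).sum())
        return tp, fp, tn, fn

    # placeholder tensors, for illustration only
    pred = torch.rand(544, 544)
    gt = (torch.rand(544, 544) > 0.7).float()
    tp, fp, tn, fn = _counts_at_threshold(pred, gt, None, 0.5)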

bob.ip.binseg.engine.evaluator.run(dataset, name, predictions_folder, output_folder=None, overlayed_folder=None, threshold=None, steps=1000, parallel=-1)[source]

Runs inference and calculates measures

Parameters
  • dataset (torch.utils.data.Dataset) – a dataset to iterate on

  • name (str) – the local name of this dataset (e.g. train, or test), to be used when saving measures files.

  • predictions_folder (str) – folder where predictions for the dataset images have been previously stored

  • output_folder (str, Optional) – folder where results are stored. If not provided, no analysis is saved (useful for quickly calculating overlay thresholds)

  • overlayed_folder (str, Optional) – if not None, the name of a folder where overlayed versions of the images and ground-truths are stored

  • threshold (float, Optional) – if overlayed_folder is set, this should be the threshold (a floating-point value) to apply to prediction maps to decide on positives and negatives for the overlay analysis (graphical output). This number should come from the training set or a separate validation set; using a test-set value may bias your analysis. This number is also used to print the a priori F1-score on the evaluated set.

  • steps (int, Optional) – number of threshold steps to consider when evaluating thresholds.

  • parallel (int, Optional) – if set to a value >= 0, uses multiprocessing to estimate per-sample thresholds through a processing pool. A value of zero creates as many processes in the pool as there are cores on the machine; a value greater than zero spawns the requested number of processes. A negative value disables multiprocessing altogether.

Returns

threshold – Threshold to achieve the highest possible F1-score for this dataset

Return type

float
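
A hypothetical usage sketch (the test_dataset object and the folder paths below are placeholders, not part of this module; adapt them to your setup):

    from bob.ip.binseg.engine.evaluator import run

    # evaluate predictions previously saved for the "test" split
    best = run(
        dataset=test_dataset,                      # placeholder torch.utils.data.Dataset
        name="test",
        predictions_folder="results/predictions",  # assumed location of saved predictions
        output_folder="results/analysis",
        overlayed_folder=None,                     # no overlay images
        threshold=0.5,                             # e.g. taken from the training/validation set
        steps=1000,
        parallel=-1,                               # no multiprocessing
    )
    print("best F1 threshold on test:", best)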

bob.ip.binseg.engine.evaluator.compare_annotators(baseline, other, name, output_folder, overlayed_folder=None, parallel=-1)[source]

Compares annotations on the same dataset

Parameters
  • baseline (torch.utils.data.Dataset) – a dataset to iterate on, containing the baseline annotations

  • other (torch.utils.data.Dataset) – a second dataset, with the same samples as baseline, but annotated by a different annotator than in the first dataset. The key values must match between baseline and this dataset.

  • name (str) – the local name of this dataset (e.g. train-second-annotator, or test-second-annotator), to be used when saving measures files.

  • output_folder (str) – folder where results are stored

  • overlayed_folder (str, Optional) – if not None, the name of a folder where overlayed versions of the images and ground-truths are stored

  • parallel (int, Optional) – if set to a value >= 0, uses multiprocessing to estimate per-sample thresholds through a processing pool. A value of zero creates as many processes in the pool as there are cores on the machine; a value greater than zero spawns the requested number of processes. A negative value disables multiprocessing altogether.
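
A hypothetical usage sketch (both dataset objects and the folder path are placeholders; the two datasets must contain the same samples, keyed identically):

    from bob.ip.binseg.engine.evaluator import compare_annotators

    compare_annotators(
        baseline=first_annotator_dataset,    # placeholder dataset with baseline annotations
        other=second_annotator_dataset,      # same samples, annotated by a second annotator
        name="test-second-annotator",
        output_folder="results/analysis",
        overlayed_folder=None,
        parallel=0,                          # one process per available core
    )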