bob.ip.binseg.engine.evaluator

Defines functionality for the evaluation of predictions

Functions

compare_annotators(baseline, other, name, ...)

Compares annotations on the same dataset

run(dataset, name, predictions_folder[, ...])

Runs inference and calculates measures

sample_measures_for_threshold(pred, gt, ...)

Calculates counts on a single sample, for a specific threshold

bob.ip.binseg.engine.evaluator.sample_measures_for_threshold(pred, gt, mask, threshold)[source]

Calculates counts on a single sample, for a specific threshold

Parameters
  • pred (torch.Tensor) – pixel-wise predictions

  • gt (torch.Tensor) – ground-truth (annotations)

  • mask (torch.Tensor) – region mask (used only if available). May be set to None.

  • threshold (float) – a particular threshold at which to calculate the performance measures

Returns

  • tp (int) – number of true positives

  • fp (int) – number of false positives

  • tn (int) – number of true negatives

  • fn (int) – number of false negatives
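
Example (an illustrative sketch, not the library's implementation): assuming pred, gt and mask are torch.Tensor objects as described above, per-threshold counts could be derived as shown below. The helper name _counts_at_threshold and the placeholder tensors are hypothetical:

    import torch

    def _counts_at_threshold(pred, gt, mask, threshold):
        # hypothetical helper mirroring the semantics described above
        binarized = pred >= threshold   # pixel-wise decision at this threshold
        positives = gt > 0.5            # ground-truth foreground
        if mask is not None:
            valid = mask > 0.5          # restrict counting to the region mask
            binarized = binarized[valid]
            positives = positives[valid]
        tp = int((binarized & positives).sum())
        fp = int((binarized & ~positives).sum())
        tn = int((~binarized & ~positives).sum())
        fn = int((~binarized & positives).sum())
        return tp, fp, tn, fn

    # placeholder tensors, for illustration only
    pred = torch.rand(544, 544)
    gt = (torch.rand(544, 544) > 0.7).float()
    tp, fp, tn, fn = _counts_at_threshold(pred, gt, None, 0.5)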

bob.ip.binseg.engine.evaluator.run(dataset, name, predictions_folder, output_folder=None, overlayed_folder=None, threshold=None, steps=1000, parallel=-1)[source]

Runs inference and calculates measures

Parameters
  • dataset (torch.utils.data.Dataset) – a dataset to iterate on

  • name (str) – the local name of this dataset (e.g. train, or test), to be used when saving measures files.

  • predictions_folder (str) – folder where predictions for the dataset images have been previously stored

  • output_folder (str, Optional) – folder where results are stored. If not provided, no analysis is saved (useful for quickly calculating overlay thresholds)

  • overlayed_folder (str, Optional) – if not None, the name of a folder where overlayed versions of the images and ground-truths are stored

  • threshold (float, Optional) – if overlayed_folder is set, this should be the threshold (a floating-point value) to apply to prediction maps to decide on positives and negatives for the overlay analysis (graphical output). This number should come from the training set or a separate validation set; using a test-set value may bias your analysis. This number is also used to print the a priori F1-score on the evaluated set.

  • steps (int, Optional) – number of threshold steps to consider when evaluating thresholds.

  • parallel (int, Optional) – if set to a value >= 0, uses multiprocessing to estimate per-sample thresholds through a processing pool. A value of zero creates as many processes in the pool as there are cores on the machine; a value greater than zero spawns the requested number of processes. A negative value disables multiprocessing altogether.

Returns

threshold – Threshold to achieve the highest possible F1-score for this dataset

Return type

float
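
A hypothetical usage sketch (the test_dataset object and the folder paths below are placeholders, not part of this module; adapt them to your setup):

    from bob.ip.binseg.engine.evaluator import run

    # evaluate predictions previously saved for the "test" split
    best = run(
        dataset=test_dataset,                      # placeholder torch.utils.data.Dataset
        name="test",
        predictions_folder="results/predictions",  # assumed location of saved predictions
        output_folder="results/analysis",
        overlayed_folder=None,                     # no overlay images
        threshold=0.5,                             # e.g. taken from the training/validation set
        steps=1000,
        parallel=-1,                               # no multiprocessing
    )
    print("best F1 threshold on test:", best)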

bob.ip.binseg.engine.evaluator.compare_annotators(baseline, other, name, output_folder, overlayed_folder=None, parallel=-1)[source]

Compares annotations on the same dataset

Parameters
  • baseline (torch.utils.data.Dataset) – a dataset to iterate on, containing the baseline annotations

  • other (torch.utils.data.Dataset) – a second dataset, with the same samples as baseline, but annotated by a different annotator than in the first dataset. The key values must match between baseline and this dataset.

  • name (str) – the local name of this dataset (e.g. train-second-annotator, or test-second-annotator), to be used when saving measures files.

  • output_folder (str) – folder where results are stored

  • overlayed_folder (str, Optional) – if not None, the name of a folder where overlayed versions of the images and ground-truths are stored

  • parallel (int, Optional) – if set to a value >= 0, uses multiprocessing to estimate per-sample thresholds through a processing pool. A value of zero creates as many processes in the pool as there are cores on the machine; a value greater than zero spawns the requested number of processes. A negative value disables multiprocessing altogether.
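
A hypothetical usage sketch (both dataset objects and the folder path are placeholders; the two datasets must contain the same samples, keyed identically):

    from bob.ip.binseg.engine.evaluator import compare_annotators

    compare_annotators(
        baseline=first_annotator_dataset,    # placeholder dataset with baseline annotations
        other=second_annotator_dataset,      # same samples, annotated by a second annotator
        name="test-second-annotator",
        output_folder="results/analysis",
        overlayed_folder=None,
        parallel=0,                          # one process per available core
    )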