bob.ip.common.utils.measure

Functions

auc(x, y)

Calculates the area under the precision-recall curve (AUC)

base_measures(tp, fp, tn, fn)

Calculates frequentist measures from true/false positive and negative counts

bayesian_measures(tp, fp, tn, fn, lambda_, ...)

Calculates mean and mode from true/false positive and negative counts with credible regions

beta_credible_region(k, i, lambda_, coverage)

Returns the mode, upper and lower bounds of the equal-tailed credible region of a probability estimate following Bernoulli trials.

get_intersection(pred_box, gt_box, multiplier)

Calculates the intersection of two boxes.

tricky_division(n, d)

Divides n by d.

bob.ip.common.utils.measure.tricky_division(n, d)[source]

Divides n by d. Returns 0.0 in case of division by zero.
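For illustration, a minimal usage sketch (the exact return formatting shown is assumed):

>>> from bob.ip.common.utils.measure import tricky_division
>>> tricky_division(1.0, 2.0)
0.5
>>> tricky_division(1.0, 0.0)  # no ZeroDivisionError is raised
0.0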

bob.ip.common.utils.measure.base_measures(tp, fp, tn, fn)[source]

Calculates frequentist measures from true/false positive and negative counts

This function returns (frequentist versions of) standard machine learning measures computed from true/false positive and negative counts. For a thorough look into these, and for alternate names of the returned values, please check Wikipedia’s entry on Precision and Recall.

Parameters
  • tp (int) – True positive count, AKA “hit”

  • fp (int) – False positive count, AKA “false alarm”, or “Type I error”

  • tn (int) – True negative count, AKA “correct rejection”

  • fn (int) – False Negative count, AKA “miss”, or “Type II error”

Returns

  • precision (float) – P, AKA positive predictive value (PPV). It corresponds arithmetically to tp/(tp+fp). In the case tp+fp == 0, this function returns zero for precision.

  • recall (float) – R, AKA sensitivity, hit rate, or true positive rate (TPR). It corresponds arithmetically to tp/(tp+fn). In the special case where tp+fn == 0, this function returns zero for recall.

  • specificity (float) – S, AKA selectivity or true negative rate (TNR). It corresponds arithmetically to tn/(tn+fp). In the special case where tn+fp == 0, this function returns zero for specificity.

  • accuracy (float) – A, see Accuracy. It is the proportion of correct predictions (both true positives and true negatives) among the total number of pixels examined. It corresponds arithmetically to (tp+tn)/(tp+tn+fp+fn). This measure includes both true negatives and true positives in the numerator, which makes it sensitive to data or regions without annotations.

  • jaccard (float) – J, see Jaccard Index or Similarity. It corresponds arithmetically to tp/(tp+fp+fn). In the special case where tp+fp+fn == 0, this function returns zero for the Jaccard index. The Jaccard index depends on a TP-only numerator, similarly to the F1 score. For regions where there are no annotations, the Jaccard index will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.

  • f1_score (float) – F1, see F1-score. It corresponds arithmetically to 2*P*R/(P+R) or 2*tp/(2*tp+fp+fn). In the special case where P+R == (2*tp+fp+fn) == 0, this function returns zero for the F1-score. The F1 or Dice score depends on a TP-only numerator, similarly to the Jaccard index. For regions where there are no annotations, the F1-score will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.
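For illustration, a hypothetical evaluation with 90 true positives, 10 false positives, 880 true negatives and 20 false negatives (the tuple return order is assumed to follow the list above):

>>> from bob.ip.common.utils.measure import base_measures
>>> precision, recall, specificity, accuracy, jaccard, f1 = base_measures(90, 10, 880, 20)
>>> precision  # tp/(tp+fp) = 90/100
0.9
>>> round(recall, 3)  # tp/(tp+fn) = 90/110
0.818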

bob.ip.common.utils.measure.beta_credible_region(k, i, lambda_, coverage)[source]

Returns the mode, upper and lower bounds of the equal-tailed credible region of a probability estimate following Bernoulli trials.

This implementation is based on [GOUTTE-2005]. It assumes \(k\) successes and \(l\) failures (\(n = k+l\) total trials) are issued from a series of Bernoulli trials (the likelihood is binomial). The posterior is derived using Bayes’ theorem with a beta prior. As there is no reason to favour high vs. low precision, we use a symmetric Beta prior (\(\alpha=\beta\)):

\[\begin{split}P(p|k,n) &= \frac{P(k,n|p)P(p)}{P(k,n)} \\ P(p|k,n) &= \frac{\frac{n!}{k!(n-k)!}p^{k}(1-p)^{n-k}P(p)}{P(k)} \\ P(p|k,n) &= \frac{1}{B(k+\alpha, n-k+\beta)}p^{k+\alpha-1}(1-p)^{n-k+\beta-1} \\ P(p|k,n) &= \frac{1}{B(k+\alpha, n-k+\alpha)}p^{k+\alpha-1}(1-p)^{n-k+\alpha-1}\end{split}\]

The mode for this posterior (also the maximum a posteriori) is:

\[\text{mode}(p) = \frac{k+\lambda-1}{n+2\lambda-2}\]

Concretely, the prior may be flat (all rates are equally likely, \(\lambda=1\)), or we may use Jeffreys prior (\(\lambda=0.5\)), which is invariant under re-parameterisation. Jeffreys prior indicates that rates close to zero or one are more likely.

The mode above works if \(k+{\alpha},n-k+{\alpha} > 1\), which is usually the case for a reasonably well-tuned system, with more than a few samples for analysis. At the limits of system performance, \(k\) may be 0, in which case the mode becomes zero.

For our purposes, it may be more suitable to represent \(n = k + l\), with \(k\) the number of successes and \(l\) the number of failures in the binomial experiment, which yields this more convenient representation:

\[\begin{split}P(p|k,l) &= \frac{1}{B(k+\alpha, l+\alpha)}p^{k+\alpha-1}(1-p)^{l+\alpha-1} \\ \text{mode}(p) &= \frac{k+\lambda-1}{k+l+2\lambda-2}\end{split}\]
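Since the posterior above is a Beta distribution with parameters \(k+\lambda\) and \(l+\lambda\), the returned quantities can be sketched directly with scipy.stats. This is a conceptual sketch, not this module’s actual implementation; the function name is hypothetical and the edge-case handling of the mode is simplified:

import scipy.stats

def sketch_credible_region(k, l, lambda_=0.5, coverage=0.95):
    # posterior: Beta(k + lambda, l + lambda)
    a, b = k + lambda_, l + lambda_
    mean = a / (a + b)
    # the closed-form mode only holds for a, b > 1 (see text above);
    # otherwise the density peaks at one of the extremes
    if a > 1 and b > 1:
        mode = (a - 1) / (a + b - 2)
    else:
        mode = 0.0 if a <= b else 1.0
    tail = (1 - coverage) / 2  # equal-tailed region: split the excluded mass evenly
    lower = scipy.stats.beta.ppf(tail, a, b)
    upper = scipy.stats.beta.ppf(1 - tail, a, b)
    return mean, mode, lower, upper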

This can be mapped to most rates calculated in the context of binary classification this way:

  • Precision or Positive-Predictive Value (PPV): p = TP/(TP+FP), so k=TP, l=FP

  • Recall, Sensitivity, or True Positive Rate: r = TP/(TP+FN), so k=TP, l=FN

  • Specificity or True Negative Rate: s = TN/(TN+FP), so k=TN, l=FP

  • F1-score: f1 = 2TP/(2TP+FP+FN), so k=2TP, l=FP+FN

  • Accuracy: acc = (TP+TN)/(TP+TN+FP+FN), so k=TP+TN, l=FP+FN

  • Jaccard: j = TP/(TP+FP+FN), so k=TP, l=FP+FN
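For example, a 95% credible region for precision under hypothetical counts TP=90 and FP=10 uses k=TP and l=FP, per the first mapping above (the printed value follows from the documented mode formula):

>>> from bob.ip.common.utils.measure import beta_credible_region
>>> mean, mode, lower, upper = beta_credible_region(90, 10, 0.5, 0.95)
>>> round(mode, 3)  # (k + 0.5 - 1)/(k + l + 1 - 2) = 89.5/99
0.904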

Contrary to frequentist approaches, in which one can only say that, were the test repeated an infinite number of times with a confidence interval constructed each time, X% of those intervals would contain the true rate, here we can say that, given our observed data, there is an X% probability that the true value of \(k/n\) falls within the provided interval.

Note

For a disambiguation with Confidence Interval, read https://en.wikipedia.org/wiki/Credible_interval.

Parameters
  • k (int) – Number of successes observed in the experiment

  • i (int) – Number of failures observed in the experiment

  • lambda_ (float, Optional) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for Jeffreys prior (the default).

  • coverage (float, Optional) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns

  • mean (float) – The mean of the posterior distribution

  • mode (float) – The mode of the posterior distribution

  • lower, upper (float) – The lower and upper bounds of the credible region

bob.ip.common.utils.measure.bayesian_measures(tp, fp, tn, fn, lambda_, coverage)[source]

Calculates mean and mode from true/false positive and negative counts with credible regions

This function returns Bayesian estimates of standard machine learning measures computed from true/false positive and negative counts. For a thorough look into these, and for alternate names of the returned values, please check Wikipedia’s entry on Precision and Recall. See beta_credible_region() for details on the calculation of the returned values.

Parameters
  • tp (int) – True positive count, AKA “hit”

  • fp (int) – False positive count, AKA “false alarm”, or “Type I error”

  • tn (int) – True negative count, AKA “correct rejection”

  • fn (int) – False Negative count, AKA “miss”, or “Type II error”

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for Jeffreys prior.

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns

  • precision ((float, float, float, float)) – P, AKA positive predictive value (PPV); mean, mode and lower/upper bounds of the credible region (at the chosen coverage). It corresponds arithmetically to tp/(tp+fp).

  • recall ((float, float, float, float)) – R, AKA sensitivity, hit rate, or true positive rate (TPR); mean, mode and lower/upper bounds of the credible region (at the chosen coverage). It corresponds arithmetically to tp/(tp+fn).

  • specificity ((float, float, float, float)) – S, AKA selectivity or true negative rate (TNR); mean, mode and lower/upper bounds of the credible region (at the chosen coverage). It corresponds arithmetically to tn/(tn+fp).

  • accuracy ((float, float, float, float)) – A; mean, mode and lower/upper bounds of the credible region (at the chosen coverage). See Accuracy. It is the proportion of correct predictions (both true positives and true negatives) among the total number of pixels examined. It corresponds arithmetically to (tp+tn)/(tp+tn+fp+fn). This measure includes both true negatives and true positives in the numerator, which makes it sensitive to data or regions without annotations.

  • jaccard ((float, float, float, float)) – J; mean, mode and lower/upper bounds of the credible region (at the chosen coverage). See Jaccard Index or Similarity. It corresponds arithmetically to tp/(tp+fp+fn). The Jaccard index depends on a TP-only numerator, similarly to the F1 score. For regions where there are no annotations, the Jaccard index will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.

  • f1_score ((float, float, float, float)) – F1; mean, mode and lower/upper bounds of the credible region (at the chosen coverage). See F1-score. It corresponds arithmetically to 2*P*R/(P+R) or 2*tp/(2*tp+fp+fn). The F1 or Dice score depends on a TP-only numerator, similarly to the Jaccard index. For regions where there are no annotations, the F1-score will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.
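For illustration, a hypothetical call unpacking the precision 4-tuple (the counts are made up; each returned measure is assumed to be a (mean, mode, lower, upper) tuple, per the list above):

>>> from bob.ip.common.utils.measure import bayesian_measures
>>> measures = bayesian_measures(90, 10, 880, 20, 0.5, 0.95)
>>> precision = measures[0]
>>> p_mean, p_mode, p_lower, p_upper = precision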

bob.ip.common.utils.measure.auc(x, y)[source]

Calculates the area under the precision-recall curve (AUC)

This function requires a minimum of 2 points and uses the trapezoidal method to calculate the area under a curve bounded within [0.0, 1.0]. It interpolates missing points if required. The input x should be monotonically increasing or decreasing.

Parameters
  • x (numpy.ndarray) – A 1D numpy array containing monotonically increasing or decreasing values for the X coordinate.

  • y (numpy.ndarray) – A 1D numpy array containing the Y coordinates of the X values provided in x.
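For illustration, a short sketch over three points of a hypothetical precision-recall curve; for these points the trapezoidal rule should give approximately 0.8:

>>> import numpy
>>> from bob.ip.common.utils.measure import auc
>>> x = numpy.array([0.0, 0.5, 1.0])  # monotonically increasing
>>> y = numpy.array([1.0, 0.8, 0.6])
>>> area = auc(x, y)  # trapezoidal estimate; approximately 0.8 here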

bob.ip.common.utils.measure.get_intersection(pred_box, gt_box, multiplier)[source]

Calculates the intersection of two boxes.

Parameters
  • pred_box (torch.Tensor) – A 1D tensor containing the predicted box coordinates.

  • gt_box (torch.Tensor) – A 1D tensor containing the ground-truth box coordinates.

  • multiplier (float) – A factor by which to enlarge the predicted bounding box.
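A usage sketch follows. The coordinate convention ([x1, y1, x2, y2]) and the return semantics are not documented here and are assumptions:

>>> import torch
>>> from bob.ip.common.utils.measure import get_intersection
>>> pred_box = torch.tensor([10.0, 10.0, 50.0, 50.0])  # assumed [x1, y1, x2, y2]
>>> gt_box = torch.tensor([20.0, 20.0, 60.0, 60.0])
>>> overlap = get_intersection(pred_box, gt_box, 1.0)  # multiplier 1.0: no enlargement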