Python API

This section includes information for using the Python API of bob.measure.

Measurement

bob.measure.mse(estimation, target)[source]

Calculates the mean square error between a set of outputs and target values using the following formula:

MSE(\hat{\Theta}) = E[(\hat{\Theta} - \Theta)^2]

Both the estimation (\hat{\Theta}) and the target (\Theta) are expected to be 2-dimensional, with examples organized as rows and the features of the estimated values or targets organized as columns.

bob.measure.rmse(estimation, target)[source]

Calculates the root mean square error between a set of outputs and target values using the following formula:

RMSE(\hat{\Theta}) = \sqrt{E[(\hat{\Theta} - \Theta)^2]}

Both the estimation (\hat{\Theta}) and the target (\Theta) are expected to be 2-dimensional, with examples organized as rows and the features of the estimated values or targets organized as columns.
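As a minimal usage sketch (the array values below are made up for illustration), both functions take 2D numpy arrays with examples in rows and features in columns:

import numpy
import bob.measure

# two examples (rows), three features (columns); values are illustrative
estimation = numpy.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
target = numpy.array([[1.5, 2.5, 3.5], [2.0, 4.0, 5.0]])
print(bob.measure.mse(estimation, target))
print(bob.measure.rmse(estimation, target))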

bob.measure.relevance(input, machine)[source]

Calculates the relevance of every input feature to the estimation process using the following definition from:

Neural Triggering System Operating on High Resolution Calorimetry Information, Anjos et al, April 2006, Nuclear Instruments and Methods in Physics Research, volume 559, pages 134-138

R(x_{i}) = |E[(o(x) - o(x|x_{i}=E[x_{i}]))^2]|

In other words, the relevance of a given input feature i is the change in the machine output when that feature is replaced by its mean over all input vectors. For this to work, the input parameter has to be a 2D array with features arranged column-wise and examples arranged row-wise.

bob.measure.recognition_rate(cmc_scores)[source]

Calculates the recognition rate from the given input, which is identical to the rank 1 (C)MC value.

The input has a specific format, which is a list of two-element tuples. Each of the tuples contains the negative and the positive scores for one test item. To read the lists from score files in 4 or 5 column format, please use the bob.measure.load.cmc_four_column() or bob.measure.load.cmc_five_column() function.

The recognition rate is defined as the number of test items, for which the positive score is greater than or equal to all negative scores, divided by the number of all test items. If several positive scores for one test item exist, the highest score is taken.
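For illustration, here is a hand-built cmc_scores list with made-up score values; each tuple holds the negative and the positive scores of one test item:

import numpy
import bob.measure

cmc_scores = [
  (numpy.array([0.1, 0.3]), numpy.array([0.9])),  # positive outranks all negatives
  (numpy.array([0.6, 0.8]), numpy.array([0.7])),  # one negative scores higher
]
# only the first test item counts, so the recognition rate is 0.5
print(bob.measure.recognition_rate(cmc_scores))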

bob.measure.cmc(cmc_scores)[source]

Calculates the cumulative match characteristic (CMC) from the given input.

The input has a specific format, which is a list of two-element tuples. Each of the tuples contains the negative and the positive scores for one test item. To read the lists from score files in 4 or 5 column format, please use the bob.measure.load.cmc_four_column() or bob.measure.load.cmc_five_column() function.

For each test item, the rank r of the positive score is calculated, where the rank is the number of negative scores that are higher than the positive score. If several positive scores exist for one test item, the highest positive score is taken. The CMC finally computes how many test items have rank r or higher.

bob.measure.get_config()[source]

Returns a string containing the configuration information.

bob.measure.correctly_classified_negatives(negatives, threshold) → numpy.ndarray

This method returns a 1D array of booleans that pinpoints which scores in a “negative” score sample were correctly classified, given a threshold. It runs the formula: for each element k in negatives, returnValue[k] = true if negatives[k] < threshold, and false otherwise.

bob.measure.correctly_classified_positives(positives, threshold) → numpy.ndarray

This method returns a 1D array of booleans that pinpoints which scores in a ‘positive’ score sample were correctly classified, given a threshold. It runs the formula: for each element k in positives, returnValue[k] = true if positives[k] >= threshold, and false otherwise.
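A short sketch of both functions on made-up scores:

import numpy
import bob.measure

negatives = numpy.array([0.2, 0.4, 0.6])
positives = numpy.array([0.5, 0.7, 0.9])
threshold = 0.5
# True where the negative score falls below the threshold
print(bob.measure.correctly_classified_negatives(negatives, threshold))
# True where the positive score is at or above the threshold
print(bob.measure.correctly_classified_positives(positives, threshold))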

bob.measure.det(negatives, positives, n_points) → numpy.ndarray

Calculates points of a Detection Error Trade-off (DET) curve.

Calculates the DET curve given a set of positive and negative scores and a desired number of points. Returns a two-dimensional array of doubles that expresses, in its rows:

[0]
X axis values in the normal deviate scale for the false-rejections
[1]
Y axis values in the normal deviate scale for the false-accepts

You can plot the results using your preferred tool: first create a plot using rows 0 and 1 of the returned value, then replace the X/Y axis annotations with a pre-determined set of tick marks, as recommended by NIST. The algorithm that calculates the deviate scale is based on the function ppndf() from the NIST package DETware version 2.1, freely available on the internet; please consult it for more details. As of 20.04.2011, the package was available from the NIST website at http://www.itl.nist.gov/iad/mig/tools/.

bob.measure.eer_rocch(negatives, positives) → float

Calculates the equal-error-rate (EER) given the input data, on the ROC Convex Hull as done in the Bosaris toolkit (https://sites.google.com/site/bosaristoolkit/).

bob.measure.eer_threshold(negatives, positives) → float

Calculates the threshold that is as close as possible to the equal-error-rate (EER) given the input data. The EER is the point where the FAR equals the FRR. Graphically, this is equivalent to the intersection between the ROC (or DET) curve and the identity line.
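A typical pattern combines eer_threshold() with farfrr() (documented below) to verify the operating point; the normally-distributed scores here are synthetic:

import numpy
import bob.measure

numpy.random.seed(0)  # synthetic scores, for illustration only
negatives = numpy.random.normal(0.0, 1.0, 1000)
positives = numpy.random.normal(2.0, 1.0, 1000)
threshold = bob.measure.eer_threshold(negatives, positives)
far, frr = bob.measure.farfrr(negatives, positives, threshold)
print(threshold, far, frr)  # FAR and FRR should be approximately equal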

bob.measure.epc(dev_negatives, dev_positives, test_negatives, test_positives, n_points) → numpy.ndarray

Calculates points of an Expected Performance Curve (EPC).

Calculates the EPC curve given a set of positive and negative scores and a desired number of points. Returns a two-dimensional array of doubles that expresses the X (cost) and Y (HTER on the test set given the minimum HTER threshold on the development set) coordinates, in this order. Please note that calculating the EPC curve requires two sets of data: a development set and a test set. The minimum weighted error threshold is computed on the development set and then applied to the test set to evaluate the half-total error rate at that point.

The EPC curve plots the HTER on the test set for various values of ‘cost’. For each value of ‘cost’, a threshold is found that provides the minimum weighted error (see bob.measure.min_weighted_error_rate_threshold()) on the development set. Each threshold is consecutively applied to the test set and the resulting HTER values are plotted in the EPC.

The cost points at which the EPC curve is calculated are distributed uniformly in the range [0.0, 1.0].

bob.measure.f_score(negatives, positives, threshold[, weight=1.0]) → float

This method computes the F-score of the classification accuracy. It is a weighted mean of precision and recall. The weight parameter must be a non-negative real value. When the weight parameter is 1, the F-score is called the F1 score and is the harmonic mean of the precision and recall values.
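A usage sketch on made-up scores; with the default weight of 1.0 the result is the F1 score:

import numpy
import bob.measure

negatives = numpy.array([0.1, 0.2, 0.3, 0.6])
positives = numpy.array([0.4, 0.5, 0.7, 0.8])
print(bob.measure.f_score(negatives, positives, 0.45))       # F1 score
print(bob.measure.f_score(negatives, positives, 0.45, 2.0))  # weighted variant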

bob.measure.far_threshold(negatives, positives[, far_value=0.001]) → float

Computes the threshold such that the real FAR is at least the requested far_value.

Keyword parameters:

negatives
The impostor scores to be used for computing the FAR
positives
The client scores; ignored by this function
far_value
The FAR value where the threshold should be computed

Returns the computed threshold (float)

bob.measure.farfrr(negatives, positives, threshold) -> (float, float)

Calculates the false-acceptance (FA) ratio and the false-rejection (FR) ratio given positive and negative scores and a threshold. positives holds the score information for samples that are labelled to belong to a certain class (a.k.a., ‘signal’ or ‘client’). negatives holds the score information for samples that are labelled not to belong to the class (a.k.a., ‘noise’ or ‘impostor’).

It is expected that ‘positive’ scores are, at least by design, greater than negative scores. So, every positive value that falls below the threshold is considered a false-rejection (FR). Negative samples that fall above the threshold are considered a false-accept (FA).

Positives that fall exactly on the threshold are considered correctly classified. Negatives that fall exactly on the threshold are considered incorrectly classified. This is equivalent to the comparison in this pseudo-code:

foreach (positive as K): if K < threshold: falseRejectionCount += 1
foreach (negative as K): if K >= threshold: falseAcceptCount += 1

The threshold value does not necessarily have to fall within the range covered by the input scores (negatives and positives altogether), but if it does not, the output will be either (1.0, 0.0) or (0.0, 1.0), depending on which side of the score range the threshold falls.

The output is a tuple of two double-precision real numbers, each ranging from 0 to 1. The first element of the pair is the false-accept ratio (FAR); the second is the false-rejection ratio (FRR).

It is possible that scores are inverted in the negative/positive sense: in some setups the designer may have set up the system so that positive samples have smaller scores than negative ones. In this case, make sure you normalize the scores so that positive samples have greater scores before feeding them into this method.

bob.measure.frr_threshold(negatives, positives[, frr_value=0.001]) → float

Computes the threshold such that the real FRR is at least the requested frr_value.

Keyword parameters:

negatives
The impostor scores; ignored by this function
positives
The client scores to be used for computing the FRR
frr_value
The FRR value where the threshold should be computed

Returns the computed threshold (float)
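A combined sketch of far_threshold() and frr_threshold() on synthetic scores, using farfrr() to check the resulting operating points:

import numpy
import bob.measure

numpy.random.seed(0)  # synthetic scores, for illustration only
negatives = numpy.random.normal(0.0, 1.0, 10000)
positives = numpy.random.normal(3.0, 1.0, 10000)
t_far = bob.measure.far_threshold(negatives, positives, 0.01)
t_frr = bob.measure.frr_threshold(negatives, positives, 0.01)
print(bob.measure.farfrr(negatives, positives, t_far))  # FAR near 0.01
print(bob.measure.farfrr(negatives, positives, t_frr))  # FRR near 0.01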

bob.measure.min_hter_threshold(negatives, positives) → float

Calculates the min_weighted_error_rate_threshold() with the cost set to 0.5.

bob.measure.min_weighted_error_rate_threshold(negatives, positives, cost) → float

Calculates the threshold that minimizes the error rate, given the input data. The parameter ‘cost’ determines the relative importance between false-accepts and false-rejections. This number should be between 0 and 1 and will be clipped to those extremes. The value to minimize becomes: ER_cost = cost * FAR + (1 - cost) * FRR. The higher the cost, the higher the importance given to not making mistakes classifying negatives/noise/impostors.

bob.measure.ppndf(value) → float

Returns the Deviate Scale equivalent of a false rejection/acceptance ratio.

The algorithm that calculates the deviate scale is based on function ppndf() from the NIST package DETware version 2.1, freely available on the internet. Please consult it for more details.

bob.measure.precision_recall(negatives, positives, threshold) -> (float, float)

Calculates the precision and recall (sensitivity) values given positive and negative scores and a threshold. positives holds the score information for samples that are labelled to belong to a certain class (a.k.a., ‘signal’ or ‘client’). negatives holds the score information for samples that are labelled not to belong to the class (a.k.a., ‘noise’ or ‘impostor’). For more precise details about how the method considers error rates, please refer to the documentation of the method farfrr().

bob.measure.precision_recall_curve(negatives, positives, n_points) → numpy.ndarray

Calculates the precision-recall curve given a set of positive and negative scores and a number of desired points. Returns a two-dimensional array of doubles that expresses the X (precision) and Y (recall) coordinates, in this order. The points at which the curve is calculated are distributed uniformly in the range [min(negatives, positives), max(negatives, positives)].

bob.measure.roc(negatives, positives, n_points) → numpy.ndarray

Calculates points of a Receiver Operating Characteristic (ROC) curve.

Calculates the ROC curve given a set of positive and negative scores and a desired number of points. Returns a two-dimensional array of doubles that expresses the X (FAR) and Y (FRR) coordinates, in this order. The points at which the ROC curve is calculated are distributed uniformly in the range [min(negatives, positives), max(negatives, positives)].
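A sketch computing raw ROC points on synthetic scores; row 0 holds the FAR values and row 1 the FRR values:

import numpy
import bob.measure

numpy.random.seed(0)  # synthetic scores, for illustration only
negatives = numpy.random.normal(0.0, 1.0, 1000)
positives = numpy.random.normal(2.0, 1.0, 1000)
points = bob.measure.roc(negatives, positives, 100)
print(points.shape)  # (2, 100): FAR in row 0, FRR in row 1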

bob.measure.roc_for_far(negatives, positives, far_list) → numpy.ndarray

Calculates the ROC curve given a set of positive and negative scores and the FAR values for which the FRR should be computed. The resulting ROC curve holds a copy of the given FAR values (row 0), and the corresponding FRR values (row 1).

bob.measure.rocch(negatives, positives) → numpy.ndarray

Calculates the ROC Convex Hull curve given a set of positive and negative scores. Returns a two-dimensional array of doubles that expresses the X (FAR) and Y (FRR) coordinates, in this order.

bob.measure.rocch2eer(pmiss_pfa) → float

Calculates the equal-error-rate (EER) given the ROC Convex Hull points (pmiss_pfa), as returned by rocch().

Loading data

A set of utilities to load score files with different formats.

bob.measure.load.open_file(filename)[source]

Opens the given score file for reading. Score files might be raw text files, or a tar file containing a single score file.

Parameters:

filename : str or file-like
The name of the score file to open, or a file-like object open for reading. If a file name is given, the according file might be a raw text file or a (compressed) tar file containing a raw text file.
Returns:
A read-only file-like object as it would be returned by open().
bob.measure.load.four_column(filename)[source]

Loads a score set from a single file to memory.

Verifies that all fields are correctly placed and contain valid values.

Returns a python generator of tuples containing the following fields:

[0]
claimed identity (string)
[1]
real identity (string)
[2]
test label (string)
[3]
score (float)
bob.measure.load.split_four_column(filename)[source]

Loads a score set from a single file to memory and splits the scores between positives and negatives. The score file has to respect the 4 column format as defined in the method four_column().

This method avoids loading and allocating memory for the strings present in the file. We only keep the scores.

Returns a python tuple (negatives, positives). The values are 1D numpy arrays of float64.
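An end-to-end sketch; 'scores-dev.txt' is a hypothetical 4-column score file name:

import bob.measure
import bob.measure.load

negatives, positives = bob.measure.load.split_four_column('scores-dev.txt')
threshold = bob.measure.eer_threshold(negatives, positives)
far, frr = bob.measure.farfrr(negatives, positives, threshold)
print('EER threshold: %f (FAR: %.2f%%, FRR: %.2f%%)' % (threshold, 100*far, 100*frr))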

bob.measure.load.cmc_four_column(filename)[source]

Loads scores to compute CMC curves from a file in four column format. The four column file needs to be in the same format as described in the four_column function, and the “test label” (column 3) has to contain the test/probe file name.

This function returns a list of tuples. For each probe file, the tuple consists of a list of negative scores and a list of positive scores. Usually, the list of positive scores should contain only one element, but more are allowed.

The result of this function can directly be passed to, e.g., the bob.measure.cmc function.

bob.measure.load.five_column(filename)[source]

Loads a score set from a single file to memory.

Verifies that all fields are correctly placed and contain valid values.

Returns a python generator of tuples containing the following fields:

[0]
claimed identity (string)
[1]
model label (string)
[2]
real identity (string)
[3]
test label (string)
[4]
score (float)
bob.measure.load.split_five_column(filename)[source]

Loads a score set from a single file to memory and splits the scores between positives and negatives. The score file has to respect the 5 column format as defined in the method five_column().

This method avoids loading and allocating memory for the strings present in the file. We only keep the scores.

Returns a python tuple (negatives, positives). The values are 1D numpy arrays of float64.

bob.measure.load.cmc_five_column(filename)[source]

Loads scores to compute CMC curves from a file in five column format. The five column file needs to be in the same format as described in the five_column function, and the “test label” (column 4) has to contain the test/probe file name.

This function returns a list of tuples. For each probe file, the tuple consists of a list of negative scores and a list of positive scores. Usually, the list of positive scores should contain only one element, but more are allowed.

The result of this function can directly be passed to, e.g., the bob.measure.cmc function.

Calibration

Measures for calibration

bob.measure.calibration.cllr(negatives, positives)[source]

Computes the ‘cost of log likelihood ratio’ measure as given in the Bosaris toolkit.

bob.measure.calibration.min_cllr(negatives, positives)[source]

Computes the ‘minimum cost of log likelihood ratio’ measure as given in the Bosaris toolkit.
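A usage sketch on synthetic scores (assumed to be log-likelihood ratios):

import numpy
import bob.measure.calibration

numpy.random.seed(0)  # synthetic scores, for illustration only
negatives = numpy.random.normal(-1.0, 1.0, 1000)
positives = numpy.random.normal(1.0, 1.0, 1000)
print(bob.measure.calibration.cllr(negatives, positives))
print(bob.measure.calibration.min_cllr(negatives, positives))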

Plotting

Methods to plot error analysis figures such as ROC, precision-recall curve, EPC and DET

bob.measure.plot.roc(negatives, positives, npoints=100, CAR=False, **kwargs)[source]

Plots the Receiver Operating Characteristic (ROC) curve.

This method will call matplotlib to plot the ROC curve for a system which contains a particular set of negatives (impostors) and positives (clients) scores. We use the standard matplotlib.pyplot.plot() command. All parameters passed, with the exception of the first three parameters of this method, will be directly passed to the plot command. If you wish to understand your options, look here:

http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot

The plot will represent the false-alarm on the vertical axis and the false-rejection on the horizontal axis.

Input arguments:

negatives
a numpy array of negative class scores in float64 format
positives
a numpy array of positive class scores in float64 format
npoints
number of points to use when drawing the ROC curve
CAR
plot CAR over FAR in semilogx (CAR=True) or FAR over FRR linearly (CAR=False, the default)
kwargs
a dictionary of extra plotting parameters, that is passed directly to matplotlib.pyplot.plot().

Note

This function does not create or save the figure instance; it only issues the plotting command. You are responsible for setting up and saving the figure as you see fit.

Return value is the matplotlib line that was added as defined by the matplotlib.pyplot.plot() command.
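A complete figure-producing sketch on synthetic scores; since the function only issues the plot command, the figure setup and saving shown here are up to you:

import numpy
import matplotlib.pyplot as mpl
import bob.measure.plot

numpy.random.seed(0)  # synthetic scores, for illustration only
negatives = numpy.random.normal(0.0, 1.0, 1000)
positives = numpy.random.normal(2.0, 1.0, 1000)
mpl.figure()
bob.measure.plot.roc(negatives, positives, npoints=100, color='black', label='system A')
mpl.xlabel('FRR')  # axis roles assume the default CAR=False layout
mpl.ylabel('FAR')
mpl.legend()
mpl.grid(True)
mpl.savefig('roc.png')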

bob.measure.plot.precision_recall_curve(negatives, positives, npoints=100, **kwargs)[source]

Plots Precision-Recall curve.

This method will call matplotlib to plot the precision-recall curve for a system which contains a particular set of negatives (impostors) and positives (clients) scores. We use the standard matplotlib.pyplot.plot() command. All parameters passed, with the exception of the first three parameters of this method, will be directly passed to the plot command. If you wish to understand your options, look here:

http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot

The plot will represent the trade-off between precision and recall.

Input arguments:

negatives
a numpy array of negative class scores in float64 format
positives
a numpy array of positive class scores in float64 format
npoints
number of points to use when drawing the precision-recall curve
kwargs
a dictionary of extra plotting parameters, that is passed directly to matplotlib.pyplot.plot().

Note

This function does not create or save the figure instance; it only issues the plotting command. You are responsible for setting up and saving the figure as you see fit.

Return value is the matplotlib line that was added as defined by the matplotlib.pyplot.plot() command.

bob.measure.plot.epc(dev_negatives, dev_positives, test_negatives, test_positives, npoints=100, **kwargs)[source]

Plots Expected Performance Curve (EPC) as defined in the paper:

Bengio, S., Keller, M., Mariéthoz, J. (2004). The Expected Performance Curve. International Conference on Machine Learning ICML Workshop on ROC Analysis in Machine Learning, 136(1), 1963–1966. IDIAP RR. Available: http://eprints.pascal-network.org/archive/00000670/

This method will call matplotlib to plot the EPC curve for a system which contains a particular set of negatives (impostors) and positives (clients) for both the development and test sets. We use the standard matplotlib.pyplot.plot() command. All parameters passed, with the exception of the first five parameters of this method, will be directly passed to the plot command. If you wish to understand your options, look here:

http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot

The plot will represent the minimum HTER on the vertical axis and the cost on the horizontal axis.

Input arguments:

dev_negatives
a numpy array of negative class scores on the development set, in float64 format
dev_positives
a numpy array of positive class scores on the development set, in float64 format
test_negatives
a numpy array of negative class scores on the test set, in float64 format, or a list of those
test_positives
a numpy array of positive class scores on the test set, in float64 format, or a list of those
npoints
number of points to use when drawing the EPC curve
kwargs
a dictionary of extra plotting parameters, that is passed directly to matplotlib.pyplot.plot().

Note

This function does not create or save the figure instance; it only issues the plotting commands. You are responsible for setting up and saving the figure as you see fit.

Return value is the matplotlib line that was added as defined by the matplotlib.pyplot.plot() command.

bob.measure.plot.det(negatives, positives, npoints=100, axisfontsize='x-small', **kwargs)[source]

Plots Detection Error Trade-off (DET) curve as defined in the paper:

Martin, A., Doddington, G., Kamm, T., Ordowski, M., & Przybocki, M. (1997). The DET curve in assessment of detection task performance. Fifth European Conference on Speech Communication and Technology (pp. 1895-1898). Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.4489&rep=rep1&type=pdf

This method will call matplotlib to plot the DET curve(s) for a system which contains a particular set of negatives (impostors) and positives (clients) scores. We use the standard matplotlib.pyplot.plot() command. All parameters passed, with the exception of the first three parameters of this method, will be directly passed to the plot command. If you wish to understand your options, look here:

http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot

The plot will represent the false-alarm on the vertical axis and the false-rejection on the horizontal axis.

This method is strongly inspired by the NIST implementation for Matlab, called DETware, version 2.1 and available for download at the NIST website:

http://www.itl.nist.gov/iad/mig/tools/

Keyword parameters:

positives
numpy.array of positive class scores in float64 format
negatives
numpy.array of negative class scores in float64 format
npoints
number of points to use when drawing the DET curve
axisfontsize
the font size used for the tick labels on both axes
kwargs
a dictionary of extra plotting parameters, that is passed directly to matplotlib.pyplot.plot().

Note

This function does not create or save the figure instance; it only issues the plotting commands. You are responsible for setting up and saving the figure as you see fit.

Note

If you wish to reset axis zooming, you must use the Gaussian scale rather than the visual marks shown on the plot, which are only there for display purposes. The real axis scale is based on the bob.measure.ppndf() method. For example, if you wish to set the x and y axes to display data between 1% and 40%, here is the recipe:

import bob.measure
import bob.measure.plot
import matplotlib.pyplot as mpl
bob.measure.plot.det(...)  # call this as many times as you need
# AFTER you plot the DET curve, set the axis in this way:
mpl.axis([bob.measure.ppndf(k/100.0) for k in (1, 40, 1, 40)])

We provide a convenient way for you to do the above in this module. So, optionally, you may use the bob.measure.plot.det_axis() method like this:

import bob.measure.plot
bob.measure.plot.det(...)
# please note we convert percentage values in det_axis()
bob.measure.plot.det_axis([1, 40, 1, 40])

Return value is the matplotlib line that was added as defined by the matplotlib.pyplot.plot() command.

bob.measure.plot.det_axis(v, **kwargs)[source]

Sets the axis in a DET plot.

This method wraps the matplotlib.pyplot.axis() by calling bob.measure.ppndf() on the values passed by the user so they are meaningful in a DET plot as performed by bob.measure.plot.det().

Keyword parameters:

v
Python iterable (list or tuple) with the X and Y limits in the order (xmin, xmax, ymin, ymax). Expected values should be in percentage (between 0 and 100%). If v is not a list or tuple that contains 4 numbers it is passed without further inspection to matplotlib.pyplot.axis().
kwargs
All remaining arguments will be passed to matplotlib.pyplot.axis() without further inspection.

Returns whatever matplotlib.pyplot.axis() returns.

bob.measure.plot.cmc(cmc_scores, logx=True, **kwargs)[source]

Plots the (cumulative) match characteristic curve and returns the maximum rank.
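A usage sketch, reading scores from a hypothetical 4-column score file and plotting the curve:

import matplotlib.pyplot as mpl
import bob.measure.load
import bob.measure.plot

cmc_scores = bob.measure.load.cmc_four_column('scores-dev.txt')  # hypothetical file name
mpl.figure()
max_rank = bob.measure.plot.cmc(cmc_scores, logx=True)
mpl.xlabel('Rank')
mpl.ylabel('Recognition rate')
mpl.grid(True)
mpl.savefig('cmc.png')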