Score normalization

Score normalization aims to compensate for statistical variations in output scores due to changes in the conditions across different enrollment and probe samples. This is achieved by scaling distributions of system output scores to better facilitate the application of a single, global threshold for authentication.

Bob implements bob.bio.base.pipelines.PipelineScoreNorm, an extension of the regular bob.bio.base.pipelines.PipelineSimple in which a post-processing step is appended to the scoring stage. Three different score normalization strategies, built on two post-processors, are available; they are presented in the next subsections.
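As a sketch of how these pieces fit together (assuming PipelineScoreNorm takes the wrapped pipeline and a post-processor as its first two arguments; check the API reference of your installed bob.bio.base version), a regular pipeline can be wrapped like this:

from bob.bio.base.pipelines import (
    PipelineSimple,
    PipelineScoreNorm,
    ZNormScores,
)

# Placeholders: `transformer` is your feature-extraction pipeline and
# `biometric_algorithm` your comparison backend.
regular_pipeline = PipelineSimple(transformer, biometric_algorithm)

# Append a Z-Norm post-processing step to the scoring stage.
z_norm_pipeline = PipelineScoreNorm(regular_pipeline, ZNormScores())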

Warning

Not all databases support score normalization operations. Please refer to Score normalization and databases below for more information on how to enable score normalization in databases.

Z-Norm

Given a score \(s_i\), Z-Norm [Auckenthaler2000] and [Mariethoz2005] (zero-normalization) scales this value by the mean (\(\mu\)) and standard deviation (\(\sigma\)) of an impostor score distribution. This score distribution can be computed beforehand, and the normalization is defined as follows:

\[zs_i = \frac{s_i - \mu}{\sigma}\]

This scoring technique is implemented in our API via bob.bio.base.pipelines.ZNormScores().
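To make the formula concrete, here is a minimal NumPy sketch of the computation (illustrative only, not Bob's internal implementation; the impostor scores are made up):

import numpy as np

# Impostor score distribution, computed beforehand (hypothetical values).
impostor_scores = np.array([-1.2, -0.8, -1.5, -0.9, -1.1])

mu = impostor_scores.mean()
sigma = impostor_scores.std()

def znorm(raw_score):
    # zs_i = (s_i - mu) / sigma
    return (raw_score - mu) / sigma

print(znorm(0.3))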

Currently, Z-Norm is available via the following CLI command:

$ bob bio pipeline score-norm [SIMPLE-PIPELINE-COMMANDS] --score-normalization-type znorm

T-Norm

T-Norm [Auckenthaler2000] and [Mariethoz2005] (test-normalization) operates in a probe-centric manner. Whereas Z-Norm estimates \(\mu\) and \(\sigma\) from an impostor set of models and their scores, T-Norm computes these statistics using the current probe sample scored against a set of models in a cohort \(\Theta_{c}\). A cohort can be any semantic grouping that is sensible for your recognition task, such as sex (males and females), ethnicity, or age. The normalization is defined as follows:

\[ts_i = \frac{s_i - \mu}{\sigma}\]

where \(s_i\) is \(P(x_i | \Theta)\) (the score of the probe against the claimed model), \(\mu = \frac{ \sum\limits_{c=1}^{N} P(x_i | \Theta_{c}) }{N}\) (with \(\Theta_{c}\) ranging over the \(N\) models of one cohort), and \(\sigma\) is the standard deviation computed over the same cohort scores.

This scoring technique is implemented in our API via bob.bio.base.pipelines.TNormScores().
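Analogously, here is a minimal sketch of the per-probe statistics (again illustrative, not Bob's implementation; score_against is a hypothetical stand-in for your comparison function):

import numpy as np

def tnorm(raw_score, probe, cohort_models, score_against):
    # Score the current probe against every model in the cohort.
    cohort_scores = np.array(
        [score_against(probe, model) for model in cohort_models]
    )
    mu = cohort_scores.mean()
    sigma = cohort_scores.std()
    # ts_i = (s_i - mu) / sigma
    return (raw_score - mu) / sigma

# Toy demonstration with random feature vectors and a dot-product scorer.
rng = np.random.default_rng(0)
probe = rng.normal(size=8)
cohort_models = [rng.normal(size=8) for _ in range(5)]
print(tnorm(2.5, probe, cohort_models, lambda p, m: float(p @ m)))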

Currently, T-Norm is available via the following CLI command:

$ bob bio pipeline score-norm [SIMPLE-PIPELINE-COMMANDS] --score-normalization-type tnorm

Note

T-Norm introduces extra computation during scoring, as each probe sample needs to be compared to every cohort model in order to compute \(\mu\) and \(\sigma\).

S-Norm
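In the literature, S-Norm (symmetric normalization) is commonly defined as the average of the Z-Norm and T-Norm scores; assuming that convention applies here:

\[ss_i = \frac{zs_i + ts_i}{2}\]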

Score normalization and databases

To enable the above-mentioned score normalization strategies, it is assumed that you are familiar with the CSV database interface. Once you have absorbed that, enabling score normalization operations for your database is easy. It consists of adding the following files (in bold) to the CSV database file structure:

my_dataset
|
+-- my_protocol_1
    |
    +-- norm
    |    |
    |    +-- train_world.csv
    |    +-- *for_tnorm.csv*
    |    +-- *for_znorm.csv*
    |
    +-- dev
    |   |
    |   +-- for_models.csv
    |   +-- for_probes.csv
    |
    +-- eval
         |
         +-- for_models.csv
         +-- for_probes.csv

The file format is identical to the one used in the current CSV interface.
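For illustration, a for_znorm.csv file could look like the following (assuming the minimal PATH and REFERENCE_ID columns of the CSV interface; these sample rows are made up):

PATH,REFERENCE_ID
data/subject_042/sample_01,042
data/subject_042/sample_02,042
data/subject_107/sample_01,107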

Calibration by group

This strategy implements an adaptation of the categorical calibration defined in [Mandasari2014].

Todo

Discuss all four calibration strategies.