.. author: Tiago de Freitas Pereira
.. date: Wed 21 Sep 2020 15:45:00 UTC+02

.. _bob.bio.base.pipeline_score_norm:

===================
Score normalization
===================

Score normalization aims to compensate for statistical variations in output scores due to changes in the conditions across different enrollment and probe samples.
This is achieved by scaling the distributions of system output scores to better facilitate the application of a single, global threshold for authentication.

Bob implements :py:class:`bob.bio.base.pipelines.PipelineScoreNorm`, an extension of the regular :py:class:`bob.bio.base.pipelines.PipelineSimple` where a post-processing step is appended to the scoring stage.
Bob implements three different strategies to normalize scores via two post-processors; these strategies are presented in the next subsections.

.. warning::
   Not all databases support the score normalization operations.
   Please look below at *Score normalization and databases* for more information on how to enable score normalization in databases.

Z-Norm
======

.. _znorm:

Given a score :math:`s_i`, Z-Norm [Auckenthaler2000]_ and [Mariethoz2005]_ (zero-normalization) scales this value by the mean (:math:`\mu`) and standard deviation (:math:`\sigma`) of an impostor score distribution.
This impostor score distribution can be computed beforehand, and the normalized score is defined as follows:

.. math::

   zs_i = \frac{s_i - \mu}{\sigma}

This scoring technique is implemented in our API via :py:func:`bob.bio.base.pipelines.ZNormScores`.

Currently, the ZNorm is available via the following CLI command::

   $ bob bio pipeline score-norm [SIMPLE-PIPELINE-COMMANDS] --score-normalization-type znorm

T-Norm
======

.. _tnorm:

T-Norm [Auckenthaler2000]_ and [Mariethoz2005]_ (test-normalization) operates in a probe-centric manner.
While in the Z-Norm :math:`\mu` and :math:`\sigma` are estimated using an impostor set of models and its scores, the T-Norm computes these statistics by scoring the current probe sample against a set of models in a cohort :math:`\Theta_{c}`.
A cohort can be any semantic organization that is sensible to your recognition task, such as sex, ethnicity, or age. The normalized score is defined as follows:

.. math::

   ts_i = \frac{s_i - \mu}{\sigma}

where :math:`s_i` is :math:`P(x_i | \Theta)` (the score given the claimed model), :math:`\mu = \frac{1}{N} \sum\limits_{j=1}^{N} P(x_i | \Theta_{c_j})` (:math:`\Theta_{c_1}, \dots, \Theta_{c_N}` being the models of one cohort), and :math:`\sigma` is the standard deviation computed over the same cohort scores.

This scoring technique is implemented in our API via :py:func:`bob.bio.base.pipelines.TNormScores`.

Currently, the TNorm is available via the following CLI command::

   $ bob bio pipeline score-norm [SIMPLE-PIPELINE-COMMANDS] --score-normalization-type tnorm

.. note::
   T-Norm introduces extra computation during scoring, as every probe sample needs to be compared to each cohort model in order to obtain :math:`\mu` and :math:`\sigma`.
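For intuition, here is a minimal NumPy sketch of the two formulas above.
It is independent of the bob API (inside the pipeline, :py:func:`bob.bio.base.pipelines.ZNormScores` and :py:func:`bob.bio.base.pipelines.TNormScores` take care of this), and all score values below are made up for illustration:

.. code-block:: python

   import numpy as np

   # Z-Norm: mu and sigma come from an impostor score distribution for the
   # enrolled model, computed beforehand (offline). Values are made up.
   impostor_scores = np.array([0.12, 0.25, 0.18, 0.30, 0.22])
   mu_z, sigma_z = impostor_scores.mean(), impostor_scores.std()

   s_i = 0.71  # raw score of a probe against the claimed model (made up)
   zs_i = (s_i - mu_z) / sigma_z

   # T-Norm: mu and sigma come from scoring the *current probe* against a
   # cohort of models, so they can only be computed at probe time.
   cohort_scores = np.array([0.20, 0.15, 0.28, 0.19])  # probe vs. cohort models
   mu_t, sigma_t = cohort_scores.mean(), cohort_scores.std()
   ts_i = (s_i - mu_t) / sigma_t

   print(f"z-normed: {zs_i:.3f}, t-normed: {ts_i:.3f}")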
S-Norm
======

.. todo::

   To be implemented

Score normalization and databases
=================================

.. _score_norm_databases:

To enable the above-mentioned score normalization strategies, it is assumed that you have passed through this :ref:`section `.
Once you have absorbed that, enabling score normalization operations in your database is easy.
It consists of adding the files marked with asterisks (``for_tnorm.csv`` and ``for_znorm.csv``) to the CSV database file structure:

.. code-block:: text

   my_dataset
   |
   +-- my_protocol_1
       |
       +-- norm
       |   |
       |   +-- train_world.csv
       |   +-- *for_tnorm.csv*
       |   +-- *for_znorm.csv*
       |
       +-- dev
       |   |
       |   +-- for_models.csv
       |   +-- for_probes.csv
       |
       +-- eval
           |
           +-- for_models.csv
           +-- for_probes.csv

The file format is identical to the one in the current :ref:`CSV interface `.

====================
Calibration by group
====================

Implements an adaptation of the Categorical Calibration defined in [Mandasari2014]_.

.. todo::

   Discuss all the four calibration strategies
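Until the four strategies are documented (see the todo above), here is a rough, hypothetical sketch of the idea behind group-wise calibration, in the spirit of the categorical calibration of [Mandasari2014]_.
It assumes scikit-learn is available, fits one logistic calibrator per group on made-up development scores, and is *not* the actual bob.bio.base implementation:

.. code-block:: python

   import numpy as np
   from sklearn.linear_model import LogisticRegression

   # Development scores with group labels and ground truth (all made up):
   # 1 = genuine comparison, 0 = impostor comparison.
   scores = np.array([0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.3, 0.2])
   groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
   labels = np.array([1, 1, 0, 0, 1, 1, 0, 0])

   # Fit one logistic calibrator per group on the development scores.
   calibrators = {}
   for g in np.unique(groups):
       mask = groups == g
       calibrators[g] = LogisticRegression().fit(
           scores[mask].reshape(-1, 1), labels[mask]
       )

   # At evaluation time, map a raw score through its group's calibrator.
   def calibrate(score, group):
       return calibrators[group].predict_proba([[score]])[0, 1]

   print(calibrate(0.65, "A"), calibrate(0.65, "B"))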