.. _bob.tutorial:

********************************
 Getting started with |project|
********************************

The following tutorial is a suitable starting point for learning how to use
|project|'s packages and their fundamental concepts.

Multi-dimensional Arrays
========================

The fundamental data structure of |project| is a multi-dimensional array. In
signal processing and machine learning, arrays are a suitable representation for
many different types of digital signals such as images, audio data and extracted
features. For multi-dimensional arrays, we rely on `NumPy`_. For an introduction
and tutorials about NumPy ndarrays, visit the `NumPy Reference`_ website.

Digital signals as multi-dimensional arrays
===========================================

For |project|, we have decided to represent digital signals directly as
:any:`numpy.ndarray` rather than having dedicated classes for each type of
signal. This implies that some conventions have been defined.

Vectors and matrices
--------------------

A vector is represented as a 1D NumPy array, whereas a matrix is represented by
a 2D array whose first dimension corresponds to the rows, and second dimension
to the columns.

.. code:: python

   >>> import numpy
   >>> A = numpy.array([[1, 2, 3], [4, 5, 6]], dtype='uint8') # A is a 2x3 matrix
   >>> print(A)
   [[1 2 3]
    [4 5 6]]
   >>> b = numpy.array([1, 2, 3], dtype='uint8') # b is a vector of length 3
   >>> print(b)
   [1 2 3]

Images
------

**Grayscale** images are represented as 2D arrays, the first dimension being the
height (number of rows) and the second dimension being the width (number of
columns). For instance:

.. code:: python

   >>> img = numpy.ndarray((480,640), dtype='uint8')

``img``, which is a 2D array, can be seen as a grayscale image of dimension 640
(width) by 480 (height). In addition, ``img`` can be seen as a matrix with 480
rows and 640 columns.
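
The height-first convention can be checked directly on the array shape. The
short sketch below uses only NumPy; the variable names (``height``, ``width``,
``pixel``) are illustrative and not part of any package API:

.. code:: python

   import numpy

   # Grayscale image: the first axis is the height (rows),
   # the second axis is the width (columns).
   img = numpy.zeros((480, 640), dtype='uint8')
   height, width = img.shape

   # Indexing follows the matrix convention: img[row, column], i.e. img[y, x].
   img[10, 20] = 255
   pixel = img[10, 20]

   print(height, width)  # 480 640
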
This is the reason why we have decided that, for images, the first dimension is
the height and the second one the width, so that it matches the matrix
convention as well.

**Color** images are represented as 3D arrays, the first dimension being the
number of color planes, the second dimension the height and the third the width.
As an image is an array, it is the responsibility of the user to know in which
color space the content is stored. :any:`bob.io.image` provides functions to
convert Bob format images into Matplotlib_ and other formats and back:

.. code:: python

   >>> import bob.io.image
   >>> colored_bob_format = numpy.ndarray((3,480,640), dtype='uint8')
   >>> colored_matplotlib_format = bob.io.image.to_matplotlib(colored_bob_format)
   >>> print(colored_matplotlib_format.shape)
   (480, 640, 3)
   >>> colored_bob_format = bob.io.image.to_bob(colored_matplotlib_format)
   >>> print(colored_bob_format.shape)
   (3, 480, 640)
   >>> pillow_img = bob.io.image.bob_to_pillow(colored_bob_format)
   >>> opencv_bgr = bob.io.image.bob_to_opencvbgr(colored_bob_format)

.. note::

   In :ref:`bob.bio.face`, the images are assumed to be in range ``[0,255]``
   irrespective of their data type.

Videos
------

A video can be seen as a sequence of images over time. By convention, the first
dimension is for the frame indices (time index), whereas the remaining ones are
related to the corresponding image frame. Videos have the shape ``(N,C,H,W)``,
where ``N`` is the number of frames, ``C`` the number of color planes, ``H`` the
height and ``W`` the width.

Input and output
================

:ref:`bob.io.base` provides two generic functions, :any:`bob.io.base.load` and
:any:`bob.io.base.save`, to load and save data of various types, based on the
filename extension. For example, to load a ``.jpg`` image, simply call:

.. code:: python

   >>> import bob.io.base
   >>> img = bob.io.base.load("myimg.jpg")

`HDF5`_ format, through h5py_, and images, through imageio_, are supported.
For loading videos, use imageio-ffmpeg_ directly.

Machine learning
================

:ref:`bob.learn.em` provides implementations of the following methods:

- K-Means clustering
- Gaussian Mixture Modeling (GMM)
- Joint Factor Analysis (JFA)
- Inter-Session Variability (ISV)
- Total Variability (TV, also known as i-vector)
- Probabilistic Linear Discriminant Analysis (PLDA)

All implementations use dask_ to parallelize the training computation.

Database interfaces
===================

Bob provides an API on top of CSV files to easily query databases. A generic
implementation is provided in :ref:`bob.pipelines`, but packages such as
:ref:`bob.bio.base` and :ref:`bob.pad.base` provide their own implementations.

Performance evaluation
======================

Methods in the :ref:`bob.measure` module can be used to evaluate error for
multi-class or binary classification problems. Several evaluation measures such
as:

- Root Mean Squared Error (RMSE)
- F-score
- Precision and Recall
- False Positive Rate (FPR)
- False Negative Rate (FNR)
- Equal Error Rates (EER)

can be computed. Moreover, functionality for plotting

- ROC
- DET
- CMC
- EPC

curves is described in more detail in :ref:`bob.measure`.

.. include:: links.rst
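
To make the error rates listed under performance evaluation concrete, here is a
minimal NumPy-only sketch of FPR, FNR and an approximate EER for a binary
problem. It does not use or reproduce the :ref:`bob.measure` API: the helper
names ``fpr_fnr`` and ``approximate_eer``, the rule that scores at or above the
threshold are accepted, and the toy scores are all assumptions made for
illustration.

.. code:: python

   import numpy


   def fpr_fnr(negatives, positives, threshold):
       """Return (FPR, FNR), accepting scores >= threshold as positive."""
       negatives = numpy.asarray(negatives, dtype=float)
       positives = numpy.asarray(positives, dtype=float)
       fpr = numpy.mean(negatives >= threshold)  # negatives wrongly accepted
       fnr = numpy.mean(positives < threshold)   # positives wrongly rejected
       return float(fpr), float(fnr)


   def approximate_eer(negatives, positives):
       """Try every observed score as a threshold and keep the one where
       FPR and FNR are closest; return (threshold, mean of FPR and FNR)."""
       candidates = numpy.unique(numpy.concatenate([negatives, positives]))
       best_threshold, best_gap, best_rate = None, numpy.inf, None
       for threshold in candidates:
           fpr, fnr = fpr_fnr(negatives, positives, threshold)
           gap = abs(fpr - fnr)
           if gap < best_gap:
               best_threshold, best_gap = threshold, gap
               best_rate = (fpr + fnr) / 2
       return float(best_threshold), float(best_rate)


   # Toy scores, made up for illustration only.
   negatives = numpy.array([0.1, 0.2, 0.3, 0.6])
   positives = numpy.array([0.4, 0.7, 0.8, 0.9])

   fpr, fnr = fpr_fnr(negatives, positives, threshold=0.5)
   eer_threshold, eer = approximate_eer(negatives, positives)
   print(fpr, fnr, eer_threshold, eer)

The exhaustive threshold scan is chosen for readability, not speed; on real
score sets, the dedicated routines documented in :ref:`bob.measure` are the
better choice.
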