Getting started with Bob¶
This tutorial is a suitable starting point for learning how to use Bob’s packages and for understanding its fundamental concepts.
All packages rely on Python as a lab-like environment. Using Bob within a Python environment is convenient because:
you can easily glue together all of the components of an experiment within a single Python script (which does not need to be compiled),
scripts may easily rely on other Python tools like SciPy as well as Bob, and
Python bindings are used to transparently run the underlying efficient C++ compiled code for the key features of the library.
The fundamental data structure of Bob is a multi-dimensional array. In
signal processing and machine learning, arrays are a suitable representation
for many different types of digital signals such as images, audio data and
extracted features. Python is the working environment selected for this library,
so within Python we rely on the existing NumPy
numpy.ndarray. This provides greater
flexibility within the Python environment.
At the C++ level, the Blitz++ library is used to handle arrays. Bob provides internal conversion routines to transparently and efficiently convert NumPy ndarrays to/from Blitz++. Because these conversions are done implicitly, the user does not need to care about this aspect and should just use NumPy ndarrays everywhere inside Python code.
For an introduction and tutorials about NumPy ndarrays, just visit the NumPy Reference website. For a short tutorial on the bindings from NumPy ndarrays to Blitz++, you can read the documentation of our Blitz++/Python Arrays package.
Many functions in Bob return multi-dimensional arrays of type
bob.blitz.array, which are wrapped as a
numpy.ndarray. Although
you can use these arrays in all contexts inside Bob, NumPy and SciPy, some
functionality of
numpy.ndarray is not available. In
particular, resizing the arrays with
numpy.ndarray.resize will raise an
exception. In such cases, please make a copy of the array first.
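The resizing restriction can be illustrated with plain NumPy. This is only a sketch: a NumPy view stands in for a wrapped array here (an assumption, since both refer to memory they do not own), and `numpy.ndarray.copy` is used as one possible way to obtain an independent copy.

```python
import numpy

base = numpy.arange(6, dtype='uint8').reshape(2, 3)
wrapped = base.view()      # stand-in for a wrapped array: it does not own its memory

try:
    wrapped.resize((3, 4))     # in-place resize is refused for non-owning arrays
    resize_failed = False
except ValueError:
    resize_failed = True

# An independent copy owns its memory and can be resized in place.
owned = wrapped.copy()
owned.resize((3, 4), refcheck=False)   # newly added elements are zero-filled
print(resize_failed, owned.shape)
```

The copy keeps the original values at the start of its buffer, so the data is preserved while the restriction on in-place resizing no longer applies.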
Digital signals as multi-dimensional arrays¶
For Bob, we have decided to represent digital signals directly as
numpy.ndarray rather than having dedicated classes for each type of
signal. This implies that some conventions have been defined.
Vectors and matrices¶
A vector is represented as a 1D NumPy array, whereas a matrix is represented by a 2D array whose first dimension corresponds to the rows, and second dimension to the columns.
>>> import numpy
>>> A = numpy.array([[1, 2, 3], [4, 5, 6]], dtype='uint8') # A is a 2x3 matrix
>>> print(A)
[[1 2 3]
 [4 5 6]]
>>> b = numpy.array([1, 2, 3], dtype='uint8') # b is a vector of length 3
>>> print(b)
[1 2 3]
Grayscale images are represented as 2D arrays, the first dimension being the height (number of rows) and the second dimension being the width (number of columns). For instance:
>>> img = numpy.ndarray((480,640), dtype='uint8')
img, which is a 2D array, can be seen as a gray-scale image of
dimension 640 (width) by 480 (height). In addition,
img can be seen
as a matrix with 480 rows and 640 columns. This is the reason why we
have decided that for images, the first dimension is the height and the
second one the width, so that it matches the matrix convention.
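As a small illustration of this convention, the following sketch uses only NumPy and indexes a gray-scale image as `img[row, column]`, that is `img[y, x]`. The pixel coordinates are arbitrary values chosen for the example.

```python
import numpy

height, width = 480, 640
img = numpy.zeros((height, width), dtype='uint8')   # black gray-scale image

# Index as img[row, column], i.e. img[y, x].
img[10, 20] = 255            # pixel at row 10 (y), column 20 (x)
top_row = img[0, :]          # first image row: `width` pixels
left_column = img[:, 0]      # first image column: `height` pixels
print(img.shape, top_row.shape, left_column.shape)
```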
Color images are represented as 3D arrays, the first dimension being the number of color planes, the second dimension the height and the third the width. As an image is an array, it is the responsibility of the user to know in which color space the content is stored. Bob’s Color Conversion Routines provide functions to perform color-space conversion:
>>> import bob.ip.color
>>> colored = numpy.ndarray((3,480,640), dtype='uint8')
>>> gray = bob.ip.color.rgb_to_gray(colored)
>>> print(gray.shape)
(480, 640)
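Images produced by other tools are often stored in (height, width, planes) order. The sketch below, which uses only NumPy and a hypothetical input array, shows one way to reorder such data into the (planes, height, width) layout described above.

```python
import numpy

# Hypothetical input in (height, width, planes) order.
interleaved = numpy.zeros((480, 640, 3), dtype='uint8')
interleaved[..., 0] = 255                      # fill the first (red) plane

# Move the color axis to the front to get (planes, height, width).
planar = numpy.transpose(interleaved, (2, 0, 1))
red, green, blue = planar                      # one 2D array per color plane
print(planar.shape, red.shape)
```

Note that `numpy.transpose` returns a view; `numpy.ascontiguousarray(planar)` produces a contiguous copy if one is needed.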
A video can be seen as a sequence of images over time. By convention, the first dimension is for the frame indices (time index), whereas the remaining ones are related to the corresponding image frame. More information about loading and handling video sources can be found in Bob’s Video I/O Routines.
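Following this convention, a color video is a 4D array of shape (frames, planes, height, width). The sketch below uses a hypothetical clip built with NumPy rather than a file loaded from disk.

```python
import numpy

# Hypothetical clip: 10 color frames of 48x64 pixels.
video = numpy.zeros((10, 3, 48, 64), dtype='uint8')

n_frames = video.shape[0]
first_frame = video[0]                 # a color image of shape (3, 48, 64)
# Per-frame mean intensity: average over planes, rows and columns.
frame_means = video.mean(axis=(1, 2, 3))
print(n_frames, first_frame.shape, frame_means.shape)
```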
Audio signals in Bob are represented as 2D arrays: the first dimension being the number of channels and the second dimension corresponding to the time index. For instance:
>>> import bob.io.audio
>>> audio = bob.io.audio.reader("test.wav")
>>> audio.rate
16000.0
>>> signal = audio.load()
>>> signal.shape
(1, 268197)
Bob’s Audio I/O Routines support loading a variety of audio files. Please refer to their documentation for more information.
You can also use
scipy.io.wavfile to load wav files in Python, but the
returned data is slightly different compared to
bob.io.audio. In SciPy,
the first dimension corresponds to the time index rather than the audio
channel. Also in SciPy, the data type of the loaded signal depends on
the audio file (it may be an integer type, for example), whereas
bob.io.audio always returns
the data as
float arrays. We recommend using
bob.io.audio since it
supports more audio formats and is more consistent with the rest of Bob.
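The difference in layout can be bridged with a few lines of NumPy. The helper below is a hypothetical sketch, not part of Bob or SciPy: it converts time-first data into a channels-first float array. It does not rescale the sample values, because whether bob.io.audio normalizes amplitudes is not stated here.

```python
import numpy

def to_channels_first(samples):
    """Convert (time,) or (time, channels) data to a float (channels, time) array."""
    samples = numpy.asarray(samples)
    if samples.ndim == 1:
        samples = samples[:, numpy.newaxis]   # mono: treat as a single column
    return samples.T.astype('float64')

# Synthetic stand-ins for data loaded in time-first order.
stereo = numpy.zeros((16000, 2), dtype='int16')
mono = numpy.zeros(16000, dtype='int16')
print(to_channels_first(stereo).shape, to_channels_first(mono).shape)
```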
Input and output¶
The default way to read and write data from and to files with Bob is the binary HDF5 format, for which several tools exist to inspect the files. Bob’s support for HDF5 files is provided by the Bob’s Core I/O Routines package.
Loading and writing of other kinds of data is provided by other packages of Bob using a plug-in strategy. Many image types can be read using Bob’s I/O Routines for Images of Various type, and many video codecs are supported through the Bob’s Video I/O Routines plug-in. Comprehensive support for MatLab files is also given through the Matlab(R) I/O Support for Bob interface.
Additionally, Bob’s Core I/O Routines provides two generic functions,
bob.io.base.load and
bob.io.base.save, to load and save data of
various types, based on the filename extension. For example, to load a
.jpg image, simply call:
>>> import bob.io.base
>>> import bob.io.image # under the hood: loads Bob plug-in for image files
>>> img = bob.io.base.load("myimg.jpg")
The image processing module is split into several packages, where most
functionality is contained in the Bob’s Basic Image Processing Routines module. For an
introduction to simple affine image transformations such as scaling and
rotating images, as well as to more complex operations like Gaussian or Sobel
filtering, please refer to the Bob’s Basic Image Processing Routines. Simple
texture features such as LBPs can also be extracted.
Gabor wavelet functionality has its own package, Bob’s Gabor wavelet routines. For a tutorial on how to perform a Gabor wavelet transform, extract Gabor jets in grid graphs and compare Gabor jets, please read the Bob’s Gabor wavelet routines.
Machines and Trainers are one of the core components of Bob. Machines represent statistical models or other functions defined by parameters that can be trained or set by using trainers. Two examples of machines are multi-layer perceptrons (MLPs) and Gaussian mixture models (GMMs).
The operation you normally expect from a machine is to be able to feed a feature vector and extract the machine response or output for that input vector. It works, in many ways, similarly to signal processing blocks. Different types of machines will give you a different type of output. Here, we examine a few of the machines and trainers available in Bob.
To start, you should read the Bob Linear Machines and Trainers, a package that is able to perform subspace projections like PCA and LDA.
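To make the machine and trainer roles concrete, here is a minimal NumPy sketch of PCA. The function names `train_pca` and `project` are hypothetical and do not reflect the API of Bob's linear machines; the trainer estimates the parameters and the machine applies them to feature vectors.

```python
import numpy

def train_pca(data, n_components):
    """Trainer: return (mean, projection), projection of shape (n_features, n_components)."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = numpy.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components].T

def project(samples, mean, projection):
    """Machine: map feature vectors into the learned subspace."""
    return (samples - mean) @ projection

rng = numpy.random.RandomState(0)
data = rng.normal(size=(100, 5))            # 100 samples with 5 features each
mean, projection = train_pca(data, n_components=2)
projected = project(data, mean, projection)
print(projected.shape)
```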
Multi-Layer Perceptron (MLP) Machines and Trainers are provided within the Bob’s Multi-Layer Perceptron Machines package.
Generating strong classifiers by boosting weak classifiers is provided by Generalized Boosting Framework using Stump and Look Up Table (LUT) based Weak Classifiers.
K-Means clustering and Gaussian Mixture Modeling, as well as Joint Factor Analysis, Inter-Session Variability and Total Variability modeling and, finally, Probabilistic Linear Discriminant Analysis are implemented in Expectation Maximization Machine Learning Tools.
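As an illustration of the simplest of these techniques, the following is a plain NumPy sketch of K-Means clustering. The function name, its parameters and the synthetic data are assumptions made for this example; it is not the implementation or API of the Bob package.

```python
import numpy

def kmeans(data, n_clusters, n_iterations=20, seed=0):
    """Return (means, labels) after a fixed number of K-Means iterations."""
    rng = numpy.random.RandomState(seed)
    # Initialize the means with distinct, randomly chosen samples.
    means = data[rng.choice(len(data), n_clusters, replace=False)].astype('float64')
    for _ in range(n_iterations):
        # Assignment step: nearest mean by squared Euclidean distance.
        differences = data[:, numpy.newaxis, :] - means[numpy.newaxis, :, :]
        labels = (differences ** 2).sum(axis=2).argmin(axis=1)
        # Update step: move each mean to the centroid of its assigned samples.
        for k in range(n_clusters):
            members = data[labels == k]
            if len(members) > 0:
                means[k] = members.mean(axis=0)
    return means, labels

# Two well-separated synthetic clusters around (0, 0) and (5, 5).
rng = numpy.random.RandomState(1)
data = numpy.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                     rng.normal(5.0, 0.1, size=(50, 2))])
means, labels = kmeans(data, n_clusters=2)
print(numpy.sort(means[:, 0]))
```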
Bob provides an API to easily query and interface with well known databases. A database contains information about the organization of the files and functions to query information such as the data which might be used for training a model, but it usually does not contain the data itself (except for some toy examples). Please visit Bob Database for a guide on Bob’s databases.
Bob includes a (growing) list of supported database interfaces. Some small toy databases, like the Iris Flower Data Set and the MNIST Database Interface, can be used to train and evaluate classification experiments. For the former, a detailed example on how to use Bob’s machine learning techniques to classify the Iris flowers is given in Tutorial: Analysis of the Fisher Iris Dataset.
However, most of the databases contain face images, speech data or videos that are used for biometric recognition and presentation attack detection (anti-spoofing). A complete (and growing) list of database packages can be found in our Packages.
Several databases that can be used for biometric recognition share a common
interface. Generic functionality that is available in all verification database
packages is defined in bob.bio.base, while
databases that implement this interface can be found in
bob.bio.spear, or any other biometric package depending
on the modality of the database.
Methods in the Bob’s Metric Routines module can be used to evaluate error for multi-class or binary classification problems. Several evaluation measures such as Root Mean Squared Error, F-score, Recognition Rates, False Acceptance and False Rejection Rates, and Equal Error Rates can be computed. This functionality, together with the plotting of CMC, ROC, DET and EPC curves, is described in more detail in the Bob’s Metric Routines.
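To clarify what some of these measures mean, the sketch below computes False Acceptance and False Rejection Rates for a binary problem and approximates the Equal Error Rate by scanning thresholds. The function names, the example scores and the convention that scores at or above the threshold are accepted are assumptions of this example, not a description of Bob's implementation.

```python
import numpy

def far_frr(negatives, positives, threshold):
    """Scores >= threshold are accepted (an assumed convention)."""
    far = numpy.mean(negatives >= threshold)   # impostors wrongly accepted
    frr = numpy.mean(positives < threshold)    # genuine samples wrongly rejected
    return far, frr

def approximate_eer(negatives, positives):
    """Scan all observed scores; return the threshold where FAR and FRR are closest."""
    candidates = numpy.sort(numpy.concatenate([negatives, positives]))
    rates = [far_frr(negatives, positives, t) for t in candidates]
    gaps = [abs(far - frr) for far, frr in rates]
    best = int(numpy.argmin(gaps))
    far, frr = rates[best]
    return candidates[best], (far + frr) / 2.0

# Hypothetical scores: higher means "more likely genuine".
negatives = numpy.array([0.1, 0.2, 0.3, 0.4, 0.6])
positives = numpy.array([0.5, 0.7, 0.8, 0.9, 0.35])
threshold, eer = approximate_eer(negatives, positives)
print(threshold, eer)
```

Lowering the threshold accepts more impostors (higher FAR), while raising it rejects more genuine samples (higher FRR); the Equal Error Rate summarizes this trade-off at the point where both rates coincide.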