Database

Bob provides an API to query and interface with well-known databases. A Bob database describes how the files are organized and provides functions to query that information, for example to list the data that might be used for training a model. It usually does not contain the data itself (except for some toy examples). Most of the databases are stored in an SQLite file, whereas the smallest ones can be stored as file lists.

As databases usually contain thousands of files, and as verification protocols often require storing information about pairs of files, such databases can become very large. For this reason, we have decided to externalize many of them in Satellite Packages.

Iris Flower Data Set

The Iris flower data set, or Fisher’s Iris data set, is a multivariate data set introduced by Sir Ronald Aylmer Fisher (1936) as an example of discriminant analysis. The data set consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor), for 150 samples in total. Four features were measured from each sample: the length and the width of the sepal and of the petal, in centimeters.

As this data set is quite small and is used for testing purposes, it is directly integrated into Bob, which provides both the means to access the data and the data itself (feature vectors of length four for the samples of the three species).

A description of the feature vector can be obtained using the attribute bob.db.iris.names.

>>> import bob.db.iris
>>> descriptor_labels = bob.db.iris.names
>>> descriptor_labels
['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width']

The data (feature vectors) can be retrieved using the bob.db.iris.data() function. This returns a dictionary with three keys, one for each of the three species of Iris flowers, and each key maps to a numpy.ndarray holding the feature vectors of that species.

>>> data = bob.db.iris.data()
>>> type(data['setosa'])
<type 'numpy.ndarray'>
>>> data['setosa'].shape
(50, 4)
>>> sorted(data.keys())
['setosa', 'versicolor', 'virginica']

Each numpy.ndarray consists of 50 feature vectors of length four.
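A common next step is to pool the per-species arrays into one list of labelled samples, for example before training a classifier. The following is a minimal sketch that assumes only the dictionary layout described above. The name toy_data and the helper to_labelled_samples are hypothetical and not part of Bob; toy_data is a small stand-in built from plain lists with illustrative values, so that the sketch runs without Bob installed.

```python
# Hypothetical stand-in for the dictionary returned by bob.db.iris.data():
# species name -> rows of four measurements. Values are illustrative only.
toy_data = {
    'setosa':     [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]],
    'versicolor': [[7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5]],
    'virginica':  [[6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9]],
}


def to_labelled_samples(data):
    """Flatten {species: rows} into parallel lists of feature vectors and labels."""
    features, labels = [], []
    for species in sorted(data):        # fixed order, independent of dict ordering
        for row in data[species]:
            features.append(list(row))  # copy each row as a plain list
            labels.append(species)
    return features, labels


features, labels = to_labelled_samples(toy_data)
print(len(features), labels[0], labels[-1])
```

Iterating over the species in sorted order keeps the feature and label lists aligned and reproducible, whatever order the dictionary happens to use.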

The database also contains statistics about the feature vectors, which can be obtained using the bob.db.iris.stats dictionary. A description of these statistics is provided by bob.db.iris.stat_names.
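To illustrate what per-feature summary statistics look like, the sketch below computes the minimum, maximum and mean of each feature from a small stand-in sample. It is not a reproduction of bob.db.iris.stats: whether that dictionary holds these same quantities is an assumption, and bob.db.iris.stat_names remains the authoritative description. The helper summarize and the list rows are hypothetical, and the values are illustrative only.

```python
import statistics

# Feature names as listed by bob.db.iris.names in the text above.
names = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width']

# Hypothetical stand-in rows (one feature vector per sample), not Bob output.
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]


def summarize(rows, names):
    """Return {feature name: {'min', 'max', 'mean'}} computed column by column."""
    summary = {}
    for name, column in zip(names, zip(*rows)):  # zip(*rows) transposes rows into columns
        summary[name] = {
            'min': min(column),
            'max': max(column),
            'mean': statistics.mean(column),
        }
    return summary


summary = summarize(rows, names)
for name in names:
    print(name, summary[name])
```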