User’s Guide

This package contains the access API and descriptions for the AT&T database of faces, which is formerly known as the ORL database. This package only contains the Bob accessor methods to use the dataset directly from python. The actual raw data for the database should be downloaded from the original URL. A convenient command is provided for this purpose:

$ bob_dbmanage.py atnt download

This command will try to download and install the database on a directory that is internal to the package. In case you don’t have write access to such directory, use the --output-dir flag to specify an alternate directory:

$ bob_dbmanage.py atnt download --output-dir raw

The command above will download the raw data files of the AT&T database into the directory raw inside your current working directory.

The Database Interface

The bob.db.atnt.Database provides an interface to access samples from this dataset. The database object is initialized by passing it the location where the raw samples have been downloaded. Assuming downloading to the package inner directory worked for you, then you don’t need to pass any parameter to the constructor:

>>> import bob.db.atnt
>>> db = bob.db.atnt.Database()

You can then use the bob.db.atnt.Database.objects() to access pointers to the raw data samples of the AT&T dataset in a programmatic way:

>>> for sample in db.objects():
...     # do something with "sample"
...     pass

In the case of this database, each “sample” returned by bob.db.atnt.Database.objects() is actually an object of class bob.db.atnt.File, representing the abstraction of a single (raw) dataset file. File objects in this package contain a path variable that point to their relative location w.r.t. a database root directory:

>>> f = db.objects()[0]
>>> type(f) 
<... 'bob.db.atnt.models.File'>
>>> f.path 
'...'

You may use the method bob.db.atnt.File.make_path() to construct paths which contain both a prefix directory and a suffixed extension. For example, to build a full path to an installed image in the raw dataset, call this method without any parameters:

>>> f.make_path()
'/install/path/s1/9.pgm'

You may override the default directory and extensions that are attached to the return path. For example:

>>> f.make_path('/another/path', '.hdf5') 
'/another/path/....hdf5'

You may load the contents of the image file pointed by this database entry using the bob.db.atnt.File.load() method:

>>> image = f.load()
>>> type(image) 
<... 'numpy.ndarray'>
>>> image.shape
(112, 92)
>>> image.dtype
dtype('uint8')

Pipelines

In data processing pipelines, it is typical to save the intermediate result of processing images to temporary files you’ll need to load later. In Bob, those files are normally HDF5 files (see Bob’s Core I/O Routines). You can easily create a processing pipeline re-using the database interface like this:

1>>> image = f.load()
2>>> processed = processor(image)
3>>> f.save(processed, '/path/to/processed', '.hdf5')
4# stores "processed" in an HDF5 file file named /path/to/processed/s1/9.hdf5

Line 1 loads the image. Line 2 processes the image and generates a processed version of the image (e.g. as a numpy.ndarray). Line 3 above uses this db package interface to save the resulting file respecting the original database structure. This is convenient because of two reasons:

  1. You can manually inspect the directory containing processed images and quickly find the processed version of any original image in the database;

  2. You can re-use bob.db.atnt.File.load() to reload the processed file and continue the pipelining indefinitely.

For example, suppose one would like to re-process the processed image above, it is possible to repeat the coding pattern above, now defining input and output directories:

>>> processed = f.load('/path/to/processed', '.hdf5')
>>> reprocessed = reprocessor(processed)
>>> f.save(processed, '/path/to/reprocessed', '.hdf5')

Selectors

You may iterate over a subset of samples from the AT&T database using parameters to bob.db.atnt.Database.objects() (check its documentation for details). For example, to iterate over all the training images, one can write:

>>> training_images = []
>>> for sample in db.objects(groups='world'):
...   training_images.append(sample.load())

Command-line Interface

The command-line interface allows users to check or export information encoded in Python API via the console. Consult the command-line help for more details:

$ bob_dbmanage.py atnt --help
...