Python API

The YouTube Faces database protocol interface. Please refer to http://www.cs.tau.ac.il/~wolf/ytfaces for information on how to get a copy of the original data.

Note

Errata have been published for the database. These errata are not yet considered in the protocols.

The YouTube database consists of 10 different splits, which are called “folds” here (to be consistent with the LFW database). In the original protocol, 9/10 of the database is used for training in each fold, and the remaining 1/10 for evaluation. In this implementation of the YouTube protocols, up to 7/10 of the data is used for training (groups='world'), 2/10 is used for development (to estimate a threshold; groups='dev'), and the last 1/10 is used to evaluate the system (groups='eval').

To compute recognition results, please execute experiments on all 10 protocols (protocol='fold1' … protocol='fold10') and average the resulting classification results (cf. http://vis-www.cs.umass.edu/lfw for details on scoring).
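The averaging step can be sketched as follows. This is a minimal sketch and not part of bob.db.youtube: evaluate_fold is a hypothetical callable that you supply, which runs one experiment for the given protocol name and returns its classification accuracy. Reporting the standard error of the mean alongside the mean is an assumption about what you want to report; drop it if only the average is needed.

```python
import math
import statistics

# The ten protocol names documented above: 'fold1' ... 'fold10'.
PROTOCOLS = ["fold%d" % i for i in range(1, 11)]


def average_over_folds(evaluate_fold, protocols=PROTOCOLS):
    """Run one experiment per protocol and summarize the accuracies.

    evaluate_fold is a hypothetical user-supplied callable that takes a
    protocol name and returns the accuracy obtained on that fold.
    Returns (mean accuracy, standard error of the mean).
    """
    accuracies = [evaluate_fold(protocol) for protocol in protocols]
    mean = statistics.mean(accuracies)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    std_error = statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, std_error
```

At least two protocols are needed, since the sample standard deviation is undefined for a single value.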

The design of this implementation differs slightly from the one at http://www.cs.tau.ac.il/~wolf/ytfaces. Originally, the creators of the YouTube database provided only lists of image pairs. To be consistent with other Bob databases, the lists are split up here into files to be enrolled and probe files. The file to be enrolled is always the first item of the pair, while the second item is used as the probe.

Note

When querying probe files, please always query probe files for a specific model id: objects(..., purposes = 'probe', model_ids = (model_id,)). In this case, you will follow the default protocols given by the database.
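The per-model probe query described in this note can be sketched as below. This is an illustrative helper, not part of the package: probes_per_model is a hypothetical name, and db is assumed to behave like bob.db.youtube.Database. Only the model_ids() and objects() calls documented in this section are used.

```python
def probes_per_model(db, protocol, group="dev"):
    """Map each model id of a group to its list of probe objects.

    db is expected to behave like bob.db.youtube.Database; only the
    model_ids() and objects() methods documented in this section are used.
    """
    probes = {}
    for model_id in db.model_ids(protocol=protocol, groups=group):
        # Query the probes for exactly one model at a time, so that the
        # default model/probe pairing of the protocol is respected.
        probes[model_id] = db.objects(
            protocol=protocol,
            groups=group,
            purposes="probe",
            model_ids=(model_id,),
        )
    return probes
```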

When querying training files via objects(..., groups='world'), you will automatically end up with the “image restricted configuration”. When you want to respect the “unrestricted configuration” (cf. README on http://vis-www.cs.umass.edu/lfw), please query the files that belong to the pairs, via objects(..., groups='world', world_type='unrestricted').

If you want to stick to the original protocol and use only the pairs for training and testing, feel free to query the pairs function.

Note

The pairs that are provided using the pairs function, and the files provided by the objects function (see note above) correspond to the identical model/probe pairs. Hence, either of the two approaches should give the same recognition results.

class bob.db.youtube.Database(original_directory=None, original_extension='/*.jpg', annotation_extension='.labeled_faces.txt')

Bases: bob.db.base.SQLiteDatabase

The dataset class opens and maintains a connection to the database.

It provides many different ways to probe for the characteristics of the data and for the data itself inside the database.

annotations(directory, image_names=None)

Returns the annotations for the given directory as a dictionary of dictionaries, e.g. {'1.56.jpg' : {'topleft':(y,x), 'bottomright':(y,x)}, '1.57.jpg' : {'topleft':(y,x), 'bottomright':(y,x)}, …}. Here, the key of the dictionary is the full image file name of the original image.

Keyword parameters:

directory

The Directory object for which you want to retrieve the annotations

image_names

If given, only the annotations for the given image names (without path, but including the filename extension) are extracted and returned
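To illustrate the returned structure, the following sketch computes the height and width of each annotated face box from a dictionary in the format described above. The helper name box_sizes and the coordinate values are made up for the example; real values would come from annotations().

```python
def box_sizes(annotations):
    """Return {image name: (height, width)} for a dictionary in the format
    returned by annotations(), where positions are given as (y, x)."""
    sizes = {}
    for image_name, annotation in annotations.items():
        top, left = annotation["topleft"]
        bottom, right = annotation["bottomright"]
        sizes[image_name] = (bottom - top, right - left)
    return sizes


# Hand-written example values, not taken from the database.
example = {
    "1.56.jpg": {"topleft": (10, 20), "bottomright": (110, 100)},
    "1.57.jpg": {"topleft": (12, 22), "bottomright": (112, 104)},
}
print(box_sizes(example))
```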

clients(protocol=None, groups=None, subworld='sevenfolds', world_type='unrestricted')

Returns a list of Client objects for the specific query by the user.

Keyword Parameters:

protocol

The protocol to consider; one of: (‘fold1’, …, ‘fold10’), or None

groups

The groups to which the clients belong; one or several of: (‘world’, ‘dev’, ‘eval’)

subworld

The subset of the training data. Has to be specified if groups includes ‘world’ and protocol is one of ‘fold1’, …, ‘fold10’. It must be exactly one of (‘onefolds’, ‘twofolds’, …, ‘sevenfolds’). Ignored for the ‘dev’ and ‘eval’ groups.

world_type

One of (‘restricted’, ‘unrestricted’). Ignored.

Returns: A list containing all Client objects which have the desired properties.

get_client_id_from_file_id(file_id, **kwargs)

Returns the client_id (real client id) attached to the given file_id

Keyword Parameters:

file_id

The file_id to consider

Returns: The client_id attached to the given file_id

get_client_id_from_model_id(model_id, **kwargs)

Returns the client_id (real client id) attached to the given model id

Keyword Parameters:

model_id

The model to consider

Returns: The client_id attached to the given model
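One use of this lookup is to decide whether a model/probe comparison is a genuine or an impostor trial. The helper below is a hypothetical sketch: it assumes that db behaves like bob.db.youtube.Database and that probe is a Directory-like object with the client_id attribute listed for bob.db.youtube.Directory.

```python
def is_genuine(db, model_id, probe):
    """Return True if the probe shows the same client as the model.

    db is expected to behave like bob.db.youtube.Database and probe like
    a bob.db.youtube.Directory (only its client_id attribute is read).
    """
    return db.get_client_id_from_model_id(model_id) == probe.client_id
```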

groups()

Returns the groups, which are available in the database.

model_ids(protocol=None, groups=None)

Returns a list of model ids for the specific query by the user. For the ‘dev’ and ‘eval’ groups, the first element of each pair is extracted.

Keyword Parameters:

protocol

The protocol to consider; one of: (‘fold1’, …, ‘fold10’), or None

groups

The groups to which the clients belong; one or several of: (‘dev’, ‘eval’). The ‘eval’ group does not exist for protocol ‘view1’.

Returns: A list containing all model ids which have the desired properties.

models(protocol=None, groups=None)

Returns a list of Directory objects (there are multiple models per client) for the specific query by the user. For the ‘dev’ and ‘eval’ groups, the first element of each pair is extracted.

Keyword Parameters:

protocol

The protocol to consider; one of: (‘fold1’, …, ‘fold10’), or None

groups

The groups to which the clients belong; one or several of: (‘dev’, ‘eval’)

Returns: A list containing all Directory objects which have the desired properties.

objects(protocol=None, model_ids=None, groups=None, purposes=None, subworld='sevenfolds', world_type='unrestricted')

Returns a list of Directory objects for the specific query by the user.

Keyword Parameters:

protocol

The protocol to consider (‘fold1’, …, ‘fold10’), or None

groups

The groups to which the objects belong (‘world’, ‘dev’, ‘eval’)

purposes

The purposes of the objects (‘enroll’, ‘probe’)

subworld

The subset of the training data. Has to be specified if groups includes ‘world’ and protocol is one of ‘fold1’, …, ‘fold10’. It must be exactly one of (‘onefolds’, ‘twofolds’, …, ‘sevenfolds’).

world_type

One of (‘restricted’, ‘unrestricted’). If ‘restricted’, only the files that are used in one of the training pairs are used. For ‘unrestricted’, all files of the training people are returned.

model_ids

Only retrieves the objects for the provided list of model ids. If ‘None’ is given (this is the default), no filter over the model_ids is performed. Note that the combination of ‘world’ group and ‘model_ids’ should be avoided.

Returns: A list of Directory objects considering all the filtering criteria.
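A training-file query that combines the subworld and world_type parameters could look like the sketch below. The helper name training_files is hypothetical, and db is assumed to behave like bob.db.youtube.Database. Instead of hard-coding the valid names, the sketch checks them against subworld_names() and world_types(), on the assumption that both return collections of name strings.

```python
def training_files(db, protocol, subworld="sevenfolds", world_type="unrestricted"):
    """Query the training ('world') files of one protocol.

    db is expected to behave like bob.db.youtube.Database. The subworld
    and world_type are validated against the names reported by the
    database itself before the query is made.
    """
    if subworld not in db.subworld_names(protocol):
        raise ValueError("unknown subworld: %r" % (subworld,))
    if world_type not in db.world_types():
        raise ValueError("unknown world type: %r" % (world_type,))
    return db.objects(
        protocol=protocol,
        groups="world",
        subworld=subworld,
        world_type=world_type,
    )
```

No model_ids filter is passed here, in line with the advice above to avoid combining the ‘world’ group with model_ids.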

original_file_name(directory, check_existence=None)

Returns the list of original image names for the given directory, sorted by frame number. In contrast to other Bob databases, a list of file names is returned here.

Keyword arguments:

directory : bob.db.youtube.Directory

The Directory object to retrieve the list of file names for

check_existence : bool

Shall the existence of the files be checked?
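Because one Directory corresponds to a whole video, the returned list can be long. The sketch below keeps only every step-th frame. The helper name sample_frames and its step parameter are hypothetical, and db is assumed to behave like bob.db.youtube.Database.

```python
def sample_frames(db, directory, step=10):
    """Return every step-th frame file name of a video directory.

    db is expected to behave like bob.db.youtube.Database, and directory
    like a bob.db.youtube.Directory. original_file_name() is documented to
    return a list sorted by frame number, so slicing keeps that order.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    file_names = db.original_file_name(directory)
    return file_names[::step]
```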

pairs(protocol=None, groups=None, classes=None, subworld='sevenfolds')

Returns a list of Pair objects for the specific query by the user.

Keyword Parameters:

protocol

The protocol to consider (‘fold1’, …, ‘fold10’)

groups

The groups to which the objects belong (‘world’, ‘dev’, ‘eval’)

classes

The classes to which the pairs belong (‘matched’, ‘unmatched’)

subworld

The subset of the training data. Has to be specified if groups includes ‘world’ and protocol is one of ‘fold1’, …, ‘fold10’. It must be exactly one of (‘onefolds’, ‘twofolds’, …, ‘sevenfolds’).

Returns: A list of Pair objects considering all the filtering criteria.
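When working directly with pairs, the classification accuracy at a fixed threshold can be computed as in the sketch below. The names pair_accuracy, score and threshold are hypothetical: score stands for any user-supplied function that compares the two Directory objects of a pair and returns a similarity value, and db is assumed to behave like bob.db.youtube.Database. The attributes read from each pair are those listed for bob.db.youtube.Pair.

```python
def pair_accuracy(db, protocol, score, threshold, group="eval"):
    """Fraction of pairs that are classified correctly at a threshold.

    score is a hypothetical user-supplied callable taking the enroll and
    probe Directory objects of a pair and returning a similarity value;
    a pair is predicted to match when that value reaches the threshold.
    """
    pairs = list(db.pairs(protocol=protocol, groups=group))
    if not pairs:
        raise ValueError("no pairs found for this query")
    correct = 0
    for pair in pairs:
        predicted_match = score(pair.enroll_directory, pair.probe_directory) >= threshold
        if predicted_match == bool(pair.is_match):
            correct += 1
    return correct / len(pairs)
```

A typical use would be to choose the threshold on group='dev' and then call the function with group='eval'.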

protocol_names()

Returns the names of the valid protocols.

subworld_names(protocol=None)

Returns all valid sub-worlds for the ‘fold1’, …, ‘fold10’ protocols.

tmodel_ids(protocol, groups=None)

Returns a list of T-Norm model ids that can be used for ZT-Norm. It uses the model ids from two other splits of the data, specifically the last two of the training splits. Hence, to keep the training data independent of the ZT-Norm data, use at most subworld='fivefolds' in the world query.

Keyword Parameters:

protocol

The protocol to consider; one of: (‘fold1’, …, ‘fold10’), or None

groups

Ignored.

Returns: A list containing all T-Norm model ids which have the desired properties.

tmodels(protocol=None, groups=None)

Returns a list of T-Norm models that can be used for ZT-Norm. It uses the model ids from two other splits of the data, specifically the last two of the training splits. Hence, to keep the training data independent of the ZT-Norm data, use at most subworld='fivefolds' in the world query.

Keyword Parameters:

protocol

The protocol to consider; one of: (‘fold1’, …, ‘fold10’), or None

groups

Ignored.

Returns: A list containing all Directory objects which have the desired properties.

tobjects(protocol, model_ids=None, groups=None)

Returns a set of filenames for enrolling T-norm models for score normalization.

Keyword Parameters:

protocol

The protocol to consider (‘fold1’, …, ‘fold10’), or None

model_ids

Only retrieves the files for the provided list of model ids. If ‘None’ is given (this is the default), no filter over the model_ids is performed.

groups

Ignored.

Returns: A set of Directory objects with the given properties.
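How these files feed into T-norm can be sketched as follows. This is a minimal sketch of plain T-norm, not the package's own normalization code: t_norm_score, enroll and score are hypothetical names for user-supplied pieces, db is assumed to behave like bob.db.youtube.Database, and using the sample standard deviation is one possible choice.

```python
import statistics


def t_norm_score(db, protocol, probe, raw_score, enroll, score):
    """Normalize one raw score with a cohort of T-norm models.

    enroll (files -> model) and score (model, probe -> float) are
    hypothetical user-supplied callables. The probe is scored against
    every T-norm model, and the raw score is shifted and scaled by the
    mean and standard deviation of those cohort scores.
    """
    cohort = []
    for tmodel_id in db.tmodel_ids(protocol):
        files = db.tobjects(protocol, model_ids=(tmodel_id,))
        cohort.append(score(enroll(files), probe))
    mean = statistics.mean(cohort)
    std = statistics.stdev(cohort)  # needs at least two cohort scores
    if std == 0:
        raise ValueError("cohort scores have zero spread")
    return (raw_score - mean) / std
```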

world_types()

Returns the valid types of worlds: (‘restricted’, ‘unrestricted’).

zobjects(protocol, model_ids=None, groups=None)

Returns a set of filenames for Z-norm probing for score normalization.

Keyword Parameters:

protocol

The protocol to consider (‘fold1’, …, ‘fold10’), or None

model_ids

Only retrieves the files for the provided list of model ids. If ‘None’ is given (this is the default), no filter over the model_ids is performed.

groups

Ignored.

Returns: A set of Directory objects with the given properties.
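For Z-norm, the cohort is built the other way round: a fixed model is scored against all Z-norm probes. The sketch below is illustrative only. z_norm_score and score are hypothetical names, db is assumed to behave like bob.db.youtube.Database, and the sample standard deviation is one possible choice.

```python
import statistics


def z_norm_score(db, protocol, model, raw_score, score):
    """Normalize one raw score with a cohort of Z-norm probes.

    score (model, probe -> float) is a hypothetical user-supplied
    callable. The model is scored against every Z-norm probe, and the
    raw score is shifted and scaled by the mean and standard deviation
    of those cohort scores.
    """
    cohort = [score(model, z_probe) for z_probe in db.zobjects(protocol)]
    mean = statistics.mean(cohort)
    std = statistics.stdev(cohort)  # needs at least two cohort scores
    if std == 0:
        raise ValueError("cohort scores have zero spread")
    return (raw_score - mean) / std
```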

class bob.db.youtube.Client(id, name)

Bases: sqlalchemy.ext.declarative.api.Base

Information about the clients (identities) of the YouTube Faces database.

id
name

class bob.db.youtube.Directory(file_id, client_id, path)

Bases: sqlalchemy.ext.declarative.api.Base, bob.db.base.File

Information about the directories of the YouTube Faces database.

client
client_id
id
path
shot_id

class bob.db.youtube.Pair(protocol, enroll_id, probe_id, enroll_client_id, probe_client_id, is_match)

Bases: sqlalchemy.ext.declarative.api.Base

Information about the pairs (as given in the pairs.txt files) of the YouTube Faces database.

enroll_client
enroll_client_id
enroll_directory
enroll_directory_id
id
is_match
probe_client
probe_client_id
probe_directory
probe_directory_id
protocol

bob.db.youtube.get_config()

Returns a string containing the configuration information.