bob.ip.common.data.dataset

Classes

`CSVDataset`(subsets, fieldnames, loader)	Generic multi-subset filelist dataset that yields samples
`JSONDataset`(protocols, fieldnames, loader)	Generic multi-protocol/subset filelist dataset that yields samples

class bob.ip.common.data.dataset.JSONDataset(protocols, fieldnames, loader)

Bases: object

Generic multi-protocol/subset filelist dataset that yields samples

To create a new dataset, you need to provide one or more JSON formatted filelists (one per protocol) with the following contents:

{
    "subset1": [
        [
            "value1",
            "value2",
            "value3"
        ],
        [
            "value4",
            "value5",
            "value6"
        ]
    ],
    "subset2": [
    ]
}

Your dataset may contain any number of subsets, but all sample entries must contain the same number of fields.
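As a minimal sketch of the layout above, the following writes a protocol filelist with the standard `json` module and checks that every entry has the same number of fields. The file name `default.json`, the subset names, the field values, and the helper `field_counts` are hypothetical and not part of this package.

```python
import json
import os
import tempfile

# Hypothetical protocol contents: two subsets, three fields per entry.
protocol = {
    "train": [
        ["img1.png", "mask1.png", "label1.png"],
        ["img2.png", "mask2.png", "label2.png"],
    ],
    "test": [],
}


def field_counts(filelist):
    """Return the set of distinct entry lengths across all subsets."""
    return {len(entry) for entries in filelist.values() for entry in entries}


# Write the filelist to a temporary location, then read it back.
path = os.path.join(tempfile.mkdtemp(), "default.json")
with open(path, "w") as f:
    json.dump(protocol, f, indent=4)

with open(path) as f:
    loaded = json.load(f)

# A single value in this set means all entries have the same width.
counts = field_counts(loaded)
```

An empty subset, such as `test` here, contributes no entries and therefore does not affect the field-count check.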

Parameters

protocols (list, dict) – Paths to one or more JSON formatted files containing the various protocols to be recognized by this dataset, or a dictionary mapping protocol names to paths (or opened file objects) of JSON files. Internally, we save a dictionary where keys default to the basename of paths (list input).
fieldnames (list, tuple) – An iterable over the field names (strings) to assign to each entry in the JSON file. It should have as many items as fields in each entry of the JSON file.
loader (object) –
A function that receives as input a context dictionary (with at least “protocol” and “subset” keys indicating which protocol and subset are being served) and a dictionary with {fieldname: value} entries, and returns an object with at least 2 attributes:
- key: which must be a unique string for every sample across subsets in a protocol, and
- data: which contains the data associated with this sample
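The loader contract described above could be satisfied by something like the sketch below. The `Sample` type, the field names `image` and `mask`, and the way the `{fieldname: value}` dictionary is built with `zip` are illustrative assumptions, not the package's actual implementation.

```python
from collections import namedtuple

# Hypothetical return type exposing the two required attributes.
Sample = namedtuple("Sample", ["key", "data"])


def loader(context, sample):
    """Turn one filelist entry into an object with ``key`` and ``data``.

    ``context`` carries at least the "protocol" and "subset" keys;
    ``sample`` maps each field name to the value found in the filelist.
    """
    # Assumption: the first field is unique per sample, so it can be the key.
    key = sample["image"]
    # Real loaders would typically read files here; this sketch keeps paths.
    data = dict(sample, subset=context["subset"])
    return Sample(key=key, data=data)


# Pair field names with one entry, as the dataset is assumed to do.
fieldnames = ("image", "mask")
entry = ["img1.png", "mask1.png"]
context = {"protocol": "default", "subset": "train"}
result = loader(context, dict(zip(fieldnames, entry)))
```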

check(limit=0)

For each protocol, check if all data can be correctly accessed

This function assumes each sample has a data and a key attribute. The key attribute should be a string, or representable as such.

Parameters: limit (int) – Maximum number of samples to check (in each protocol/subset combination) in this dataset. If set to zero, then check everything.
Returns: errors – Number of errors found
Return type: int
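To illustrate the semantics of `limit` and of the returned error count, here is a standalone sketch that mimics the documented behaviour. The function `count_errors` and the classes `Good` and `Broken` are hypothetical; the real method may differ in which exceptions it catches and how it reports them.

```python
def count_errors(subsets, limit=0):
    """Count samples whose ``key`` or ``data`` cannot be accessed.

    ``subsets`` maps subset names to lists of samples. With ``limit=0``
    every sample is checked; otherwise only the first ``limit`` per subset.
    """
    errors = 0
    for samples in subsets.values():
        selected = samples if limit == 0 else samples[:limit]
        for sample in selected:
            try:
                str(sample.key)  # the key must be representable as a string
                sample.data      # accessing data may trigger file loading
            except Exception:
                errors += 1
    return errors


class Good:
    key = "a"
    data = b"payload"


class Broken:
    key = "b"

    @property
    def data(self):
        # Simulates a sample whose underlying file cannot be read.
        raise IOError("cannot read file")


subsets = {"train": [Good(), Broken()], "test": [Broken(), Good()]}
total = count_errors(subsets)
limited = count_errors(subsets, limit=1)
```

With `limit=1`, only the first sample of each subset is inspected, so the broken sample in `train` goes unnoticed.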

subsets(protocol)

Returns all subsets in a protocol

This method will load JSON information for a given protocol and return all subsets of the given protocol after converting each entry through the loader function.

Parameters: protocol (str) – Name of the protocol data to load
Returns: subsets – A dictionary mapping subset names to lists of objects (respecting the key, data interface).
Return type: dict
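The conversion described above, from raw JSON entries to a dictionary of loaded samples, can be sketched as follows. The helper `load_subsets`, the example loader, and the field names are assumptions made for illustration; they are not the package's code.

```python
import json
from types import SimpleNamespace


def load_subsets(json_text, protocol, fieldnames, loader):
    """Parse one protocol filelist and pass every entry through ``loader``."""
    filelist = json.loads(json_text)
    return {
        subset: [
            loader(
                {"protocol": protocol, "subset": subset},
                dict(zip(fieldnames, entry)),
            )
            for entry in entries
        ]
        for subset, entries in filelist.items()
    }


def example_loader(context, sample):
    # Hypothetical loader: uses the "image" field as key, keeps all fields.
    return SimpleNamespace(key=sample["image"], data=sample)


text = '{"train": [["img1.png", "mask1.png"]], "test": []}'
result = load_subsets(text, "default", ("image", "mask"), example_loader)
```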

class bob.ip.common.data.dataset.CSVDataset(subsets, fieldnames, loader)

Bases: object

Generic multi-subset filelist dataset that yields samples

To create a new dataset, you only need to provide a CSV formatted filelist using any separator (e.g. comma, space, semi-colon) with the following information:

value1,value2,value3
value4,value5,value6
...

Notice that all rows must have the same number of entries.
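A minimal sketch of reading such a filelist with the standard `csv` module is shown below. The semicolon separator, the field names, and the explicit `delimiter` argument are assumptions; how this class detects the separator is not stated here.

```python
import csv
import io

# Hypothetical filelist content using a semicolon as separator.
text = "img1.png;mask1.png\nimg2.png;mask2.png\n"
fieldnames = ("image", "mask")

# Parse the raw rows first so that the row width can be validated.
raw = list(csv.reader(io.StringIO(text), delimiter=";"))
same_width = all(len(row) == len(fieldnames) for row in raw)

# Pair each column with its field name.
rows = [dict(zip(fieldnames, row)) for row in raw]
```

Validating the width on the raw rows matters because `zip` silently truncates to the shorter sequence, which would hide rows with missing or extra columns.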

Parameters

subsets (list, dict) – Paths to one or more CSV formatted files containing the various subsets to be recognized by this dataset, or a dictionary mapping subset names to paths (or opened file objects) of CSV files. Internally, we save a dictionary where keys default to the basename of paths (list input).
fieldnames (list, tuple) – An iterable over the field names (strings) to assign to each column in the CSV file. It should have as many items as fields in each row of the CSV file(s).
loader (object) – A function that receives as input a context dictionary (with at least a “subset” key indicating which subset is being served) and a dictionary with {key: path} entries, and returns a dictionary with the loaded data.
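The two accepted forms of the `subsets` parameter can be normalized to a single dictionary, roughly as in this sketch. The helper `normalize` and the paths are hypothetical, and dropping the file extension from the basename is an assumption not confirmed by the text above.

```python
import os


def normalize(subsets):
    """Map subset names to paths; list input is keyed by file basename."""
    if isinstance(subsets, dict):
        return dict(subsets)
    # Assumption: the extension is dropped when deriving the subset name.
    return {os.path.splitext(os.path.basename(p))[0]: p for p in subsets}


from_list = normalize(["/data/lists/train.csv", "/data/lists/test.csv"])
from_dict = normalize({"validation": "/data/lists/val.csv"})
```

Passing a dictionary is the way to choose subset names that differ from the file names.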

check(limit=0)

For each subset, check if all data can be correctly accessed

This function assumes each sample has a data and a key attribute. The key attribute should be a string, or representable as such.

Parameters: limit (int) – Maximum number of samples to check (in each subset) in this dataset. If set to zero, then check everything.
Returns: errors – Number of errors found
Return type: int

subsets()

Returns all available subsets at once

Returns: subsets – A dictionary mapping subset names to lists of objects (respecting the key, data interface).
Return type: dict

samples(subset)

Returns all samples in a subset

This method will load CSV information for a given subset and return all samples of the given subset after passing each entry through the loading function.

Parameters: subset (str) – Name of the subset data to load
Returns: subset – A list of objects (respecting the key, data interface).
Return type: list