#!/usr/bin/env python
# coding=utf-8

"""Example CSV-based custom filelist dataset

If you have your own dataset organized on your filesystem (or elsewhere),
this configuration shows an example setup so you can feed such data
(potentially including any ground-truth you may have) to train, predict with,
or evaluate one of the available network models.

You must write a CSV file (e.g., using a comma as the separator) that
describes the data (and ground-truth) locations for each sample in your
dataset. For example, if you have a file structure like this:

.. code-block:: text

   ├── images
   │   ├── image_1.png
   │   ├── ...
   │   └── image_n.png
   └── ground-truth
       ├── gt_1.png
       ├── ...
       └── gt_n.png

Then create one or more CSV files, each containing a subset of your dataset:

.. code-block:: text

   images/image_1.png,ground-truth/gt_1.png
   ...,...
   images/image_n.png,ground-truth/gt_n.png

To create a subset without ground-truth (e.g., for prediction purposes),
simply omit the second column from the CSV file.
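
For instance, you could generate such a file with Python's built-in
:py:mod:`csv` module. This is only a sketch - the list of file pairs below
is illustrative and would normally be collected from your filesystem:

.. code-block:: python

   import csv

   # pairs of (image, ground-truth) paths, relative to the dataset root
   pairs = [
       ("images/image_1.png", "ground-truth/gt_1.png"),
       ("images/image_2.png", "ground-truth/gt_2.png"),
   ]

   with open("train.csv", "w", newline="") as f:
       csv.writer(f).writerows(pairs)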

Use the path leading to the CSV file(s) and carefully read the comments in
this configuration. **Copy it locally to make changes**:

.. code-block:: sh

   $ bob binseg config copy csv-dataset-example mydataset.py
   # edit mydataset.py as explained here, following the comments

Finally, the only object this file needs to provide is one named ``dataset``,
and it should contain a dictionary mapping a name, such as ``train``, ``dev``,
or ``test``, to objects of type :py:class:`torch.utils.data.Dataset`. As you
will see in this example, we provide boilerplate code to do so.
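
For illustration, the expected shape of ``dataset`` is simply this (a sketch;
each ``<...>`` placeholder stands for a
:py:class:`torch.utils.data.Dataset` instance):

.. code-block:: text

   dataset = {
       "__train__": <dataset used for training>,
       "train": <dataset used for evaluation>,
       "test": <dataset used for evaluation>,
   }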

More information:

* :py:class:`bob.ip.binseg.data.dataset.CSVDataset` for operational details.
* :py:class:`bob.ip.binseg.data.dataset.JSONDataset` for an alternative for
  multi-protocol datasets (all of our supported raw datasets are implemented
  using this).
* :py:func:`bob.ip.binseg.configs.datasets.make_dataset` for extra
  information on the sample-list-to-pytorch connector.

"""

import os

from bob.ip.binseg.data.dataset import CSVDataset
from bob.ip.binseg.data.loader import load_pil_1, load_pil_rgb
from bob.ip.binseg.data.sample import Sample

# How we use the loaders: "sample" is a dictionary whose keys are defined
# below and map to the columns of the CSV files you input. This one is
# configured to load images and labels using PIL.


def _loader(context, sample):
    # "context" is ignored in this case - the database is homogeneous. It is
    # a dictionary that passes, e.g., the name of the subset being loaded, so
    # you can make contextual decisions during loading.

    # Using a root path leading to the various data files stored on disk
    # allows the CSV file to contain only relative paths and is, therefore,
    # more compact. Of course, you can make those paths absolute and then
    # simplify this function.
    root_path = "/path/where/raw/files/sit"

    data = load_pil_rgb(os.path.join(root_path, sample["data"]))
    label = load_pil_1(os.path.join(root_path, sample["label"]))

    # You may also return a DelayedSample to avoid data loading from taking
    # place when the sample object itself is created. Take a look at our own
    # datasets for examples, or at the commented-out sketch after this
    # function.
    return Sample(
        key=os.path.splitext(sample["data"])[0],
        data=dict(data=data, label=label),
    )
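
# A minimal sketch of the delayed-loading variant mentioned above. It assumes
# DelayedSample (also in bob.ip.binseg.data.sample) accepts a "load" callable
# and a "key", calling "load()" only when the sample data is first accessed -
# check that module for the exact signature before relying on this:
#
# from bob.ip.binseg.data.sample import DelayedSample
#
# def _delayed_loader(context, sample):
#     root_path = "/path/where/raw/files/sit"
#
#     def _load():
#         # executed lazily, on first access to the sample's data
#         return dict(
#             data=load_pil_rgb(os.path.join(root_path, sample["data"])),
#             label=load_pil_1(os.path.join(root_path, sample["label"])),
#         )
#
#     return DelayedSample(
#         key=os.path.splitext(sample["data"])[0],
#         load=_load,
#     )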


# This object puts everything together: the CSV file(s), how to load each
# sample defined in the dataset, and names for the various columns of the
# CSV file. Once created, it can be used to generate sample lists.

_raw_dataset = CSVDataset(
    # paths to the CSV file(s) - you may add as many subsets as you want:
    # * "__train__" is used for training a model (stock data augmentation is
    #   applied via our "make_dataset()" connector)
    # * anything else can be used for prediction and/or evaluation (if labels
    #   are also provided in such a set); data augmentation is NOT applied
    #   via our "make_dataset()" connector in that case
    subsets={
        "__train__": "<path/to/train.csv>",  # applies data augmentation
        "train": "<path/to/train.csv>",  # no data augmentation, evaluate it
        "test": "<path/to/test.csv>",  # no data augmentation, evaluate it
    },
    fieldnames=("data", "label"),  # these are the column names
    loader=_loader,
)
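
# To sanity-check your CSV files, you may inspect the generated sample lists
# directly. This assumes subsets() returns a dictionary mapping each subset
# name to a list of samples, as its usage further down suggests:
#
# subsets = _raw_dataset.subsets()
# print({name: len(samples) for name, samples in subsets.items()})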

# Finally, we build a connector that passes our dataset to the pytorch
# framework so we can, for example, train and evaluate a pytorch model. The
# connector only converts the sample lists into the standard tuple
# (data[, label[, mask]]) expected by our engines, after applying the
# (optional) transformations you define.

# from bob.ip.binseg.configs.datasets import make_dataset as _maker

# Add/tune your (optional) transforms below - these are just examples,
# compatible with a model that requires image inputs of 544 x 544 pixels.
# from bob.ip.binseg.data.transforms import CenterCrop

# dataset = _maker(_raw_dataset.subsets(), [CenterCrop((544, 544))])
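
# Once ``dataset`` is defined (uncomment and adapt the lines above), this file
# can be used like any other dataset configuration. For example, to train a
# model (the model name "m2unet" is just illustrative - pick any available
# model configuration):
#
# $ bob binseg train -vv m2unet /path/to/mydataset.py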