.. vim: set fileencoding=utf-8 :

===========
User guide
===========

This package builds on top of tensorflow_ (at least 2.3 is needed). You are
expected to have some familiarity with it before continuing. The best way to
use tensorflow_ is through its ``tf.keras`` and ``tf.data`` APIs. We recommend
reading at least the following pages:

* https://www.tensorflow.org/tutorials/quickstart/beginner
* https://www.tensorflow.org/tutorials/quickstart/advanced
* https://keras.io/getting_started/intro_to_keras_for_engineers/
* https://keras.io/getting_started/intro_to_keras_for_researchers/
* https://www.tensorflow.org/tutorials/load_data/images
* https://www.tensorflow.org/guide/data

If you are used to the TensorFlow 1 API, reading these pages is also
recommended:

* https://www.tensorflow.org/guide/effective_tf2
* https://www.tensorflow.org/guide/migrate
* https://www.tensorflow.org/guide/upgrade
* https://github.com/tensorflow/community/blob/master/sigs/testing/faq.md

In the rest of this guide, you will learn a few tips and examples on how to:

* Port TensorFlow v1 checkpoints to the v2 format.
* Create datasets and save TFRecords.
* Create models with custom training and evaluation logic.
* Train with mixed precision.
* Train on multiple GPUs and multiple workers.

After reading this page, you may look at a complete example in:
https://gitlab.idiap.ch/bob/bob.learn.tensorflow/-/blob/master/examples/MSCeleba_centerloss_mixed_precision_multi_worker.py


Porting V1 Tensorflow checkpoints to V2
=======================================

For an example, take a look at the notebook located at:
https://gitlab.idiap.ch/bob/bob.learn.tensorflow/-/blob/master/examples/convert_v1_checkpoints_to_v2.ipynb


Creating datasets from data
===========================

If you are working with Bob databases, below is an example of converting them
to a ``tf.data.Dataset`` using
:any:`bob.learn.tensorflow.data.dataset_using_generator`:

.. testsetup::

    import tempfile

    temp_dir = model_dir = tempfile.mkdtemp()

.. doctest::

    >>> import bob.db.atnt
    >>> from bob.learn.tensorflow.data import dataset_using_generator
    >>> import tensorflow as tf

    >>> db = bob.db.atnt.Database()
    >>> samples = db.objects(groups="world")

    >>> # construct integer labels for each identity in the database
    >>> CLIENT_IDS = (str(f.client_id) for f in samples)
    >>> CLIENT_IDS = list(set(CLIENT_IDS))
    >>> CLIENT_IDS = dict(zip(CLIENT_IDS, range(len(CLIENT_IDS))))

    >>> def reader(sample):
    ...     img = sample.load(db.original_directory, db.original_extension)
    ...     label = CLIENT_IDS[str(sample.client_id)]
    ...     return img, label

    >>> dataset = dataset_using_generator(samples, reader)
    >>> dataset


Create TFRecords from tf.data.Datasets
======================================

Use :any:`bob.learn.tensorflow.data.dataset_to_tfrecord` and
:any:`bob.learn.tensorflow.data.dataset_from_tfrecord` to painlessly convert
**any** ``tf.data.Dataset`` to TFRecords and create datasets back from those
TFRecords:

.. doctest::

    >>> from bob.learn.tensorflow.data import dataset_to_tfrecord
    >>> from bob.learn.tensorflow.data import dataset_from_tfrecord
    >>> path = f"{temp_dir}/my_dataset"
    >>> dataset_to_tfrecord(dataset, path)
    >>> dataset = dataset_from_tfrecord(path)
    >>> dataset

There is also a script called ``bob tf dataset-to-tfrecord`` that wraps
:any:`bob.learn.tensorflow.data.dataset_to_tfrecord` for easy Grid job
submission.
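Whether your dataset comes from a generator or from TFRecords, you will
typically shuffle, batch, and prefetch it before training. Below is a minimal
sketch of such a pipeline; the buffer and batch sizes are arbitrary example
values that you should tune for your data:

.. code-block:: python

    import tensorflow as tf

    # ``dataset`` is a tf.data.Dataset as created in the sections above
    dataset = (
        dataset.shuffle(buffer_size=1024)  # shuffle with an in-memory buffer
        .batch(32)  # group samples into mini-batches
        .prefetch(tf.data.experimental.AUTOTUNE)  # overlap input and training
    )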
Create models with custom training and evaluation logic
========================================================

Training models for biometric recognition (and metric learning in general)
differs from typical classification problems, since the labels seen during
training and testing are different. We found that overriding the ``compile``,
``train_step``, and ``test_step`` methods, as explained in
https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit, is the
best trade-off between control over what happens during training and
evaluation and the amount of boilerplate code you have to write. A minimal
sketch of this pattern is shown at the end of this guide.


Mixed-precision training
========================

When doing mixed-precision training
(https://www.tensorflow.org/guide/mixed_precision), it is important to scale
the loss before computing the gradients, so that small ``float16`` gradient
values do not underflow to zero. A sketch is shown at the end of this guide.


Multi-GPU and multi-worker training
===================================

It is important that custom metrics and losses do not average their results by
the per-replica batch size; the values should be averaged by the global batch
size instead. See
https://www.tensorflow.org/tutorials/distribute/custom_training for details.
Take a look at the custom metrics and losses in this package for examples of
correct implementations; a sketch of a correctly reduced loss is also shown
below.

.. _tensorflow: https://www.tensorflow.org/
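Below are three short, self-contained sketches of the techniques discussed
above. They are illustrations only, not the implementations used in this
package.

First, the ``train_step``/``test_step`` overriding pattern from the Keras
guide linked above; ``MyModel`` is a hypothetical name:

.. code-block:: python

    import tensorflow as tf


    class MyModel(tf.keras.Model):
        """Hypothetical model with custom training and evaluation logic."""

        def train_step(self, data):
            images, labels = data
            with tf.GradientTape() as tape:
                logits = self(images, training=True)
                # the loss configured in compile() plus regularization losses
                loss = self.compiled_loss(
                    labels, logits, regularization_losses=self.losses
                )
            gradients = tape.gradient(loss, self.trainable_variables)
            self.optimizer.apply_gradients(
                zip(gradients, self.trainable_variables)
            )
            self.compiled_metrics.update_state(labels, logits)
            return {m.name: m.result() for m in self.metrics}

        def test_step(self, data):
            images, labels = data
            logits = self(images, training=False)
            # during evaluation you can compute different (e.g. verification)
            # losses and metrics than during training
            self.compiled_loss(labels, logits, regularization_losses=self.losses)
            self.compiled_metrics.update_state(labels, logits)
            return {m.name: m.result() for m in self.metrics}

You would then ``compile`` this model with your loss and metrics and call
``fit`` as usual.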
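Second, a minimal sketch of loss scaling for mixed-precision training,
following https://www.tensorflow.org/guide/mixed_precision. It assumes the
stable ``tf.keras.mixed_precision`` API; on TensorFlow 2.3 the equivalent
calls live under ``tf.keras.mixed_precision.experimental``:

.. code-block:: python

    import tensorflow as tf

    # enable mixed precision globally
    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    # the optimizer wrapper keeps a loss scale and updates it dynamically
    optimizer = tf.keras.mixed_precision.LossScaleOptimizer(
        tf.keras.optimizers.Adam()
    )


    def train_step(model, images, labels, loss_fn):
        with tf.GradientTape() as tape:
            logits = model(images, training=True)
            loss = loss_fn(labels, logits)
            # scale the loss so small float16 gradients do not underflow
            scaled_loss = optimizer.get_scaled_loss(loss)
        scaled_gradients = tape.gradient(scaled_loss, model.trainable_variables)
        # unscale the gradients before applying them
        gradients = optimizer.get_unscaled_gradients(scaled_gradients)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        return loss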
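Third, a minimal sketch of a loss that is averaged by the global batch size,
as in the distributed-training tutorial linked above; ``compute_loss`` is a
hypothetical helper:

.. code-block:: python

    import tensorflow as tf

    # a per-example loss: reduction is disabled so we can average manually
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE
    )


    def compute_loss(labels, logits, global_batch_size):
        per_example_loss = loss_object(labels, logits)
        # divide by the GLOBAL batch size, not the per-replica batch size,
        # so that summing gradients across replicas gives the right average
        return tf.nn.compute_average_loss(
            per_example_loss, global_batch_size=global_batch_size
        )

Inside ``strategy.run``, each replica evaluates this loss on its own slice of
the global batch; since the division already uses the global batch size,
summing the per-replica gradients yields the correct overall average.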