DenseNet121 model optimization

Note

The DenseNet121 model contains 7,216,513 parameters.

Training on TB datasets from scratch

To select the optimal learning rate and batch size for training on the TB datasets from scratch (DenseNet121 without pretrained weights), we performed a grid search over the following parameters:

  • learning rates of 1e-4, 5e-5, and 1e-5

  • batch sizes of 4 and 8

We systematically used the training set of the combined dataset MC-CH-IN for this optimization.

The minimum validation loss we found was 0.3168, obtained with a learning rate of 5e-5 and a batch size of 8.
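
Concretely, the search is a loop over all learning-rate/batch-size pairs that keeps the configuration with the lowest validation loss. The sketch below is a minimal illustration of this procedure, not the actual training code; make_loaders and train_model are hypothetical helpers standing in for the data pipeline and training loop.

```python
# Minimal sketch of the grid search; make_loaders and train_model are
# hypothetical stand-ins for the actual data pipeline and training loop.
import itertools

learning_rates = [1e-4, 5e-5, 1e-5]
batch_sizes = [4, 8]

best = {"val_loss": float("inf"), "lr": None, "bs": None}
for lr, bs in itertools.product(learning_rates, batch_sizes):
    train_loader, val_loader = make_loaders("MC-CH-IN", batch_size=bs)  # hypothetical
    val_loss = train_model(train_loader, val_loader, lr=lr)             # hypothetical
    if val_loss < best["val_loss"]:
        best = {"val_loss": val_loss, "lr": lr, "bs": bs}

print(best)  # here: {'val_loss': 0.3168, 'lr': 5e-05, 'bs': 8}
```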

Threshold selection

The threshold was systematically selected on the validation set of the dataset on which each model was trained.

  • Threshold for DenseNet trained on MC: 0.599

  • Threshold for DenseNet trained on MC-CH: 0.519

  • Threshold for DenseNet trained on MC-CH-IN: 0.472
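
The text above does not record how the cut-off was chosen; a common criterion for a binary classifier is to maximize Youden's J statistic (sensitivity + specificity - 1) over the validation ROC curve. The sketch below illustrates that approach; the criterion itself is an assumption, not a statement of what was done here.

```python
# Sketch of threshold selection on the validation set. Maximizing
# Youden's J = sensitivity + specificity - 1 is an assumed criterion;
# the original text does not specify one.
import numpy as np
from sklearn.metrics import roc_curve

def select_threshold(y_true, y_prob):
    """Return the probability cut-off maximizing Youden's J."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    return thresholds[np.argmax(tpr - fpr)]

# y_true: validation labels, y_prob: sigmoid outputs of the trained model
# threshold = select_threshold(y_true, y_prob)
```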

Pre-training on NIH CXR14

We used the pretrained DenseNet121 model provided by PyTorch. For the pre-training on the NIH CXR14 dataset, we used the hyperparameters from the CheXNeXt study: a batch size of 8, a learning rate of 1e-4, and the default Adam optimizer parameters (beta_1=0.9, beta_2=0.999, epsilon=1e-8).
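
A minimal sketch of this setup follows, assuming torchvision's ImageNet weights (older torchvision versions use pretrained=True instead of the weights argument), a 14-output head matching the NIH CXR14 labels, and a multi-label BCE loss; the head size and loss are our assumptions, and data loading is omitted.

```python
# Sketch of the NIH CXR14 pre-training setup with the CheXNeXt
# hyperparameters quoted above. The 14-output head (one logit per
# CXR14 label) and the BCE loss are assumptions.
import torch
import torchvision

model = torchvision.models.densenet121(weights="IMAGENET1K_V1")
model.classifier = torch.nn.Linear(model.classifier.in_features, 14)

optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8
)
criterion = torch.nn.BCEWithLogitsLoss()
# train with a DataLoader using batch_size=8 on the NIH CXR14 training set
```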

Fine-tuning on TB datasets

To select the optimal learning rate and batch size for the fine-tuning (after pre-training on NIH CXR14), we performed a grid search over the following parameters:

  • learning rates of 1e-4, 1e-5, 5e-6, and 1e-6

  • batch sizes of 4, 8, and 16

We systematically used the training set of the combined dataset MC-CH-IN for this optimization.

The minimum validation loss we found was 0.1511, obtained with a learning rate of 1e-4 and a batch size of 8 (see the table below).

Minimum validation loss grid search

This table indicates the minimum validation loss obtained for each combination of learning rate and batch size.

Learning rate                     Batch size of 4   Batch size of 8   Batch size of 16
1e-4 (training for 300 epochs)    0.2053            0.1511            0.2372
1e-5 (training for 500 epochs)    0.1832            0.1931            0.2326
5e-6 (training for 300 epochs)    0.1932            0.2234            0.2298
1e-6 (training for 600 epochs)    0.2086            0.2139            0.2138
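
With the selected configuration (learning rate 1e-4, batch size 8), the fine-tuning stage reduces to loading the NIH CXR14 checkpoint and swapping the classification head. The sketch below illustrates this; the checkpoint filename is hypothetical, and the single-logit TB head is our assumption.

```python
# Sketch of fine-tuning after NIH CXR14 pre-training. The checkpoint
# path is hypothetical; the single-logit TB head is an assumption.
import torch
import torchvision

model = torchvision.models.densenet121()
model.classifier = torch.nn.Linear(model.classifier.in_features, 14)
model.load_state_dict(torch.load("densenet121_nih_cxr14.pt"))  # hypothetical path

# Swap the 14-label CXR14 head for a single TB logit before fine-tuning
model.classifier = torch.nn.Linear(model.classifier.in_features, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr from the grid search
criterion = torch.nn.BCEWithLogitsLoss()
# fine-tune with a batch size of 8, as selected above
```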

Threshold selection

The threshold was systematically selected on the validation set of the dataset on which each model was trained.

  • Threshold for DenseNet trained on MC: 0.688

  • Threshold for DenseNet trained on MC-CH: 0.386

  • Threshold for DenseNet trained on MC-CH-IN: 0.432

Other hyperparameters

The default Adam optimizer parameters were used: beta_1=0.9, beta_2=0.999, epsilon=1e-8.