deepdraw.engine.adabound#
Implementation of the AdaBound optimizer.
Based on the reference implementation at <https://github.com/Luolc/AdaBound/blob/master/adabound/adabound.py>. Reference:
@inproceedings{Luo2019AdaBound,
author = {Luo, Liangchen and Xiong, Yuanhao and Liu, Yan and Sun, Xu},
title = {Adaptive Gradient Methods with Dynamic Bound of Learning Rate},
booktitle = {Proceedings of the 7th International Conference on Learning Representations},
month = {May},
year = {2019},
address = {New Orleans, Louisiana}
}
Classes

AdaBound – Implements the AdaBound algorithm.

AdaBoundW – Implements the AdaBound algorithm with decoupled weight decay (see https://arxiv.org/abs/1711.05101).
- class deepdraw.engine.adabound.AdaBound(params, lr=0.001, betas=(0.9, 0.999), final_lr=0.1, gamma=0.001, eps=1e-08, weight_decay=0, amsbound=False)[source]#
Bases:
Optimizer
Implements the AdaBound algorithm.
- Parameters:
params (list) – Iterable of parameters to optimize or dicts defining parameter groups
lr (float, optional) – Adam learning rate
betas (tuple, optional) – Coefficients (as a 2-tuple of floats) used for computing running averages of the gradient and its square
final_lr (float, optional) – Final (SGD) learning rate
gamma (float, optional) – Convergence speed of the bound functions
eps (float, optional) – Term added to the denominator to improve numerical stability
weight_decay (float, optional) – Weight decay (L2 penalty)
amsbound (bool, optional) – Whether to use the AMSBound variant of this algorithm
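The `final_lr` and `gamma` parameters control AdaBound's dynamic bounds: each step, the Adam-style step size is clipped into an interval that starts wide (Adam-like behaviour) and pinches towards `final_lr` (SGD-like behaviour). A minimal pure-Python sketch of that clipping, using the bound formulas from the reference implementation (for clarity it omits the rescaling of `final_lr` by the current/base learning-rate ratio that the real optimizer applies):

```python
def adabound_step_size(base_step, final_lr, gamma, step):
    """Clamp an Adam-style step size into AdaBound's dynamic bounds.

    Both bounds converge to final_lr as step grows, so the optimizer
    smoothly transitions from adaptive (Adam) to SGD-like updates.
    """
    lower = final_lr * (1 - 1 / (gamma * step + 1))
    upper = final_lr * (1 + 1 / (gamma * step))
    return min(max(base_step, lower), upper)

# Early in training the bounds are loose: a small Adam step passes through.
early = adabound_step_size(0.001, final_lr=0.1, gamma=0.001, step=1)

# Late in training the interval has pinched around final_lr = 0.1,
# so the step is pulled up to the lower bound (~0.0999).
late = adabound_step_size(0.001, final_lr=0.1, gamma=0.001, step=10**6)
```

With the defaults above (`gamma=0.001`), the bounds at step 1 are roughly `[1e-4, 100]`, so almost any adaptive step survives; by step 10^6 they have narrowed to roughly `[0.0999, 0.1001]`.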
- class deepdraw.engine.adabound.AdaBoundW(params, lr=0.001, betas=(0.9, 0.999), final_lr=0.1, gamma=0.001, eps=1e-08, weight_decay=0, amsbound=False)[source]#
Bases:
Optimizer
Implements the AdaBound algorithm with decoupled weight decay (see https://arxiv.org/abs/1711.05101).
- Parameters:
params (list) – Iterable of parameters to optimize or dicts defining parameter groups
lr (float, optional) – Adam learning rate
betas (tuple, optional) – Coefficients (as a 2-tuple of floats) used for computing running averages of the gradient and its square
final_lr (float, optional) – Final (SGD) learning rate
gamma (float, optional) – Convergence speed of the bound functions
eps (float, optional) – Term added to the denominator to improve numerical stability
weight_decay (float, optional) – Weight decay (L2 penalty)
amsbound (bool, optional) – Whether to use the AMSBound variant of this algorithm
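The difference between AdaBound and AdaBoundW is where `weight_decay` is applied. AdaBound folds the L2 penalty into the gradient before the adaptive step, so the decay is distorted by per-parameter scaling; AdaBoundW applies the decay directly to the parameter, independent of that scaling (the AdamW approach). A simplified scalar sketch of the two schemes (the function names are illustrative, and `scale` stands in for the full clipped adaptive step of the real optimizer):

```python
def coupled_decay_step(p, grad, scale, lr, wd):
    # AdaBound-style: the L2 penalty is added to the gradient first,
    # so it is then distorted by the adaptive per-parameter scale.
    g = grad + wd * p
    return p - lr * scale * g

def decoupled_decay_step(p, grad, scale, lr, wd):
    # AdaBoundW-style: the decay term bypasses the adaptive scaling
    # and shrinks the parameter by a fixed lr * wd fraction.
    return p - lr * scale * grad - lr * wd * p

# With a non-unit adaptive scale the two updates diverge, even on a
# zero gradient: coupled decay is weakened by scale, decoupled is not.
p_coupled = coupled_decay_step(1.0, grad=0.0, scale=0.5, lr=0.1, wd=0.01)
p_decoupled = decoupled_decay_step(1.0, grad=0.0, scale=0.5, lr=0.1, wd=0.01)
```

When `scale == 1` the two updates coincide; any adaptive rescaling of the gradient pulls them apart, which is exactly why the decoupled variant exists.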