RMSprop
RMSprop is an adaptive learning rate optimizer closely related to Adagrad. Instead of accumulating all past squared gradients as Adagrad does, RMSprop keeps an exponentially weighted moving average of the squared gradients for each parameter and uses it to scale that parameter's learning rate. The effective step size is automatically lower for parameters with large gradients and higher for parameters with small ones, and because old gradients decay out of the average, the learning rate does not diminish toward zero the way it does in Adagrad.
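To make the scaling concrete, here is a minimal pure-Python sketch of the RMSprop update rule, not the bitsandbytes implementation; the hyperparameter names (`lr`, `alpha`, `eps`) mirror the signatures below, and the quadratic objective is just an illustration.

```python
def rmsprop_step(param, grad, sq_avg, lr=0.01, alpha=0.99, eps=1e-8):
    """One RMSprop update for a single scalar parameter.

    sq_avg is the exponentially weighted moving average of
    squared gradients carried between steps.
    """
    # Decaying average of squared gradients: old terms fade out,
    # so the denominator does not grow without bound as in Adagrad.
    sq_avg = alpha * sq_avg + (1 - alpha) * grad ** 2
    # Scale the step by the root of the running average.
    param = param - lr * grad / (sq_avg ** 0.5 + eps)
    return param, sq_avg

# Minimize f(x) = x^2 (gradient 2x) starting from x = 5.
x, sq_avg = 5.0, 0.0
for _ in range(1000):
    grad = 2 * x
    x, sq_avg = rmsprop_step(x, grad, sq_avg)
# x has moved close to the minimum at 0
```

Because the gradient is normalized by the root of the running average, the effective step size stays near `lr` regardless of the raw gradient magnitude.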
RMSprop
class bitsandbytes.optim.RMSprop
( params, lr = 0.01, alpha = 0.99, eps = 1e-08, weight_decay = 0, momentum = 0, centered = False, optim_bits = 32, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True )
RMSprop8bit
class bitsandbytes.optim.RMSprop8bit
( params, lr = 0.01, alpha = 0.99, eps = 1e-08, weight_decay = 0, momentum = 0, centered = False, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True )
RMSprop32bit
class bitsandbytes.optim.RMSprop32bit
( params, lr = 0.01, alpha = 0.99, eps = 1e-08, weight_decay = 0, momentum = 0, centered = False, args = None, min_8bit_size = 4096, percentile_clipping = 100, block_wise = True )