This repository has been archived by the owner on Nov 3, 2022. It is now read-only.

Add LearningRateMultiplier wrapper for optimizers #396

Open
wants to merge 3 commits into master

Conversation


@stante commented Jan 7, 2019

Summary

Optimizers have a single, model-global learning rate. This PR adds a wrapper that can be used with existing optimizers to specify different learning rates per layer in a network. The per-layer learning rate is specified as a factor that is multiplied with the learning rate of the wrapped optimizer. The wrapper can be used in the following way:

multipliers = {'dense_1': 0.5, 'dense_2': 0.4}
opt = LearningRateMultiplier(SGD, lr_multipliers=multipliers, lr=0.001, momentum=0.9)

The example wraps SGD and specifies lr and momentum for it. Layers whose names contain the string 'dense_1' get a multiplier of 0.5, and layers whose names contain the string 'dense_2' get a multiplier of 0.4.

Different multipliers for kernel and bias can be specified with:

multipliers = {'dense_1/kernel': 0.5, 'dense_1/bias': 0.1}

Related Issues

There are related issues in Keras: keras-team/keras#11934, keras-team/keras#7912, and partially keras-team/keras#5920.

@gabrieldemarmiesse
Contributor

It seems there are some PEP 8 errors and the code isn't compatible with Python 2 because of super(). super() takes two arguments in Python 2, usually the class and the instance (self).
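For illustration, here is a minimal, self-contained sketch of the difference (Base and Wrapper are placeholder classes, not code from this PR):

class Base(object):
    def __init__(self, **kwargs):
        self.config = kwargs


class Wrapper(Base):
    def __init__(self, **kwargs):
        # Python 3 accepts the zero-argument form: super().__init__(**kwargs)
        # The explicit two-argument form below works on both Python 2 and 3.
        super(Wrapper, self).__init__(**kwargs)


w = Wrapper(lr=0.001, momentum=0.9)
print(w.config)  # the stored kwargs: lr and momentum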

@gabrieldemarmiesse
Contributor

You can find out more about the errors by looking at the Travis logs.

@gabrieldemarmiesse left a comment

Thanks a lot for working on this. Many people have asked for this feature; it's very welcome. Since your optimizer is quite special (an optimizer inside an optimizer), we'll make sure to minimize the amount of hackiness so that it works in as many cases as possible. See my comments. If you have any questions or problems, feel free to ask for help.

learning rate of the optimizer.

Note: This is a wrapper and does not implement any
optimization algorithm.

What about two examples?

  • One where you specify the learning rates manually, using strings as keys, e.g. {'conv_1/kernel': 0.5, 'conv_1/bias': 0.1}.
  • One where you set the learning rates programmatically by iterating through the layers of the model (useful for big models). I suppose it should be possible with a for loop, using layer.name as the dictionary key (see the sketch below).
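For illustration, a rough sketch of the programmatic option (the model, layer names, and decay rule below are made up):

from keras.models import Sequential
from keras.layers import Dense

# A small stand-in model; in practice this is the user's own model.
model = Sequential([
    Dense(64, input_shape=(10,), name='dense_1'),
    Dense(64, name='dense_2'),
    Dense(1, name='dense_3'),
])

# Use layer.name as the dictionary key; here earlier layers get smaller factors.
multipliers = {}
for i, layer in enumerate(model.layers):
    multipliers[layer.name] = 1.0 / (len(model.layers) - i)

# multipliers == {'dense_1': 1/3, 'dense_2': 1/2, 'dense_3': 1.0}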

# Arguments
optimizer: An optimizer class to be wrapped.
lr_multipliers: Dictionary of the per layer factors. For
example `optimizer={'conv_1/kernel':0.5, 'conv_1/bias':0.1}`.

Typo: the keyword is lr_multipliers.

optimization algorithm.

# Arguments
optimizer: An optimizer class to be wrapped.

I think optimizer should be an optimizer instance, not an optimizer class. Let's minimize the hackiness.

class.
"""
def __init__(self, optimizer, lr_multipliers=None, **kwargs):
self._class = optimizer

I don't think underscores are needed.

optimizers._test_optimizer(opt1, target=0.95)

mult = {'dense': 10}
opt2 = LearningRateMultiplier(SGD, lr_multipliers=mult,

Can you make a second function test_lr_multiplier_layerwise for this?

mult = {'dense': 10}
opt2 = LearningRateMultiplier(SGD, lr_multipliers=mult,
lr=0.001, momentum=0.9, nesterov=True)
optimizers._test_optimizer(opt2, target=0.95)

We'll also need a third test, test_lr_multiplier_weightwise, where you use the format {'layer_name/weight_name': lr} to ensure that all configurations work.

And a fourth test with a more complex optimizer (Adam would be a good fit).
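For illustration, hedged sketches of what these tests might look like, following the pattern already used in this test file (the multiplier keys are assumptions and would have to match the weight names of the model built by optimizers._test_optimizer):

def test_lr_multiplier_weightwise():
    # Weight-wise keys in the 'layer_name/weight_name' format.
    mult = {'dense_1/kernel': 10, 'dense_1/bias': 10}
    opt = LearningRateMultiplier(SGD, lr_multipliers=mult,
                                 lr=0.001, momentum=0.9, nesterov=True)
    optimizers._test_optimizer(opt, target=0.95)


def test_lr_multiplier_adam():
    # The same idea with a more complex wrapped optimizer.
    mult = {'dense_1': 0.5}
    opt = LearningRateMultiplier(Adam, lr_multipliers=mult, lr=0.001)
    optimizers._test_optimizer(opt, target=0.95)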

from keras_contrib.tests import optimizers
from keras_contrib.optimizers import LearningRateMultiplier
from keras.optimizers import SGD, Adam
from keras.callbacks import LearningRateScheduler

Unused import

if name.startswith('_'):
super(LearningRateMultiplier, self).__setattr__(name, value)
else:
self._optimizer.__setattr__(name, value)

I don't think __setattr__ and __getattr__ are needed. By calling the right super() functions in the right places, everything should work. Ask me if you have any issues while removing them.

You'll likely need an lr attribute that is the same as self.optimizer.lr, since many callbacks expect an lr attribute.
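For illustration, a minimal sketch of exposing lr as a property that forwards to the wrapped optimizer (the attribute names optimizer and lr_multipliers are illustrative, not the final API):

from keras.optimizers import SGD


class LearningRateMultiplier(object):
    # Sketch only, not the PR's implementation.
    def __init__(self, optimizer, lr_multipliers=None):
        self.optimizer = optimizer
        self.lr_multipliers = lr_multipliers or {}

    @property
    def lr(self):
        # Callbacks such as LearningRateScheduler read model.optimizer.lr.
        return self.optimizer.lr

    @lr.setter
    def lr(self, value):
        self.optimizer.lr = value


opt = LearningRateMultiplier(SGD(lr=0.001), {'dense_1': 0.5})
print(opt.lr)  # the wrapped SGD's learning-rate variable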

self._class = optimizer
self._optimizer = optimizer(**kwargs)
self._lr_multipliers = lr_multipliers or {}


You should call super() at the end of the __init__ function. You can take a look at the source code of the Keras optimizers to see what happens.

return updates

def get_config(self):
config = {'optimizer': self._class,

Since optimizer will be an instance of an optimizer class, you should use the function serialize_keras_object, which will serialize the optimizer for you.
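For illustration, a minimal sketch combining the suggestions above (wrap an optimizer instance, call super() at the end of __init__, and serialize the wrapped instance); this is not the PR's code and omits get_updates:

from keras.optimizers import Optimizer, SGD
from keras.utils.generic_utils import serialize_keras_object


class LearningRateMultiplier(Optimizer):
    # Sketch only: takes an optimizer *instance* rather than a class.
    def __init__(self, optimizer, lr_multipliers=None, **kwargs):
        self.optimizer = optimizer
        self.lr_multipliers = lr_multipliers or {}
        super(LearningRateMultiplier, self).__init__(**kwargs)

    def get_config(self):
        # Serialize the wrapped instance so the wrapper can be saved and restored.
        config = {'optimizer': serialize_keras_object(self.optimizer),
                  'lr_multipliers': self.lr_multipliers}
        base_config = super(LearningRateMultiplier, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


print(LearningRateMultiplier(SGD(lr=0.001)).get_config())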

@Dicksonchin93

Will there be updates on this? If not, can I make a new PR that adds this class to keras-contrib? @gabrieldemarmiesse @stante. It would enable DiscriminativeLearningRate in general, not just a learning rate multiplier.

I propose three settings: automatic (cosine) learning rate decay from the base learning rate of the wrapped optimizer by layer, automatic (cosine) learning rate decay from the base learning rate of the wrapped optimizer by convolutional blocks/groups, and this learning rate multiplier.

@gabrieldemarmiesse
Contributor

Keras-contrib is currently deprecated. Please redirect the PRs to tensorflow/addons. It would be really nice if you could add that, @Dicksonchin93; a lot of people are asking for this feature :)

@Dicksonchin93

@gabrieldemarmiesse is there a reason why we shouldn't add this into keras directly?

@gabrieldemarmiesse
Contributor

gabrieldemarmiesse commented Jan 9, 2020 via email
