
Questions about learning rate #4

Open
chingyaoc opened this issue Jun 30, 2016 · 2 comments

@chingyaoc

In san_att_lstm_twolayer.py, the learning rate is initially 0.05.
In the function get_lr(), the learning rate decays like this:

options['lr'] * (options['gamma'] ** power)

However, options['gamma'] equals 1, which means the learning rate will not decay:

options['gamma'] = 1

This is the part I'm wondering about.
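
For reference, here is a minimal standalone sketch of that decay rule (this get_lr is reconstructed from the formula quoted above, not copied from the repo): with gamma = 1 the schedule stays constant at the initial rate, while any gamma < 1 decays geometrically.

import numpy

def get_lr(options, power):
    # Exponential decay: lr * gamma^power.
    # With gamma == 1 the rate never changes.
    return options['lr'] * (options['gamma'] ** power)

options = {'lr': numpy.float32(0.05), 'gamma': 1}
print([get_lr(options, p) for p in range(3)])  # [0.05, 0.05, 0.05]

options['gamma'] = 0.5  # hypothetical value that actually decays
print([get_lr(options, p) for p in range(3)])  # [0.05, 0.025, 0.0125]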

@zcyang
Owner

zcyang commented Jun 30, 2016

options['gamma'] is simply one hyperparameter; you can set it to 1 or anything else. I don't remember whether options['gamma'] = 1 is optimal. Maybe you can tune it to make it better.

@chingyaoc
Author

Thanks for the reply. Do you still have the optimal values for the learning rate, momentum, and decay ratio? When I use the learning rate of 0.05, the loss becomes 20 in the early iterations, which is not reasonable.

options['lr'] = numpy.float32(0.05)
options['momentum'] = numpy.float32(0.9)
options['gamma'] = 1
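
For context, a generic momentum-SGD step using these hyperparameters could look like the sketch below (an assumption about the update rule; the repo's actual optimizer may differ). With momentum = 0.9, the effective step size for a persistent gradient direction can grow toward roughly lr / (1 - momentum) = 0.5, which might contribute to the large early loss.

import numpy

options = {}
options['lr'] = numpy.float32(0.05)
options['momentum'] = numpy.float32(0.9)
options['gamma'] = 1

def momentum_sgd_step(param, grad, velocity, options, power):
    # Classical momentum SGD; a generic sketch, not the repo's exact update.
    lr = options['lr'] * (options['gamma'] ** power)
    velocity = options['momentum'] * velocity - lr * grad
    return param + velocity, velocity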
