Good day, I'm having a bit of trouble understanding why the value of 1.005 was chosen for the factor parameter. I know it's supposed to be an increment by 0.5% at each iteration and that the new learning rate will be new_lr = old_lr * factor, but I don't get where that value (1.005) came from.
I think the 0.5% value was chosen arbitrarily, simply as a way to increase the learning rate gradually from a small starting value (0.001). As stated earlier in the chapter, the optimal learning rate is usually about half of the maximum learning rate, i.e. the rate at which training starts to diverge. You can search for that maximum in either direction:
forward (multiplying or adding)
backward (dividing or subtracting)
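To get a feel for what a factor of 1.005 implies, here is a minimal sketch of the forward (multiplying) sweep in plain Python. The helper names `steps_to_reach` and `forward_schedule` and the upper bound of 10.0 are assumptions made up for illustration; they are not taken from the notebook.

```python
import math

def steps_to_reach(start_lr, end_lr, factor):
    """Number of multiplicative updates needed to grow start_lr to at least end_lr."""
    return math.ceil(math.log(end_lr / start_lr) / math.log(factor))

def forward_schedule(start_lr, factor, n_steps):
    """Learning rates visited when applying new_lr = old_lr * factor at each iteration."""
    return [start_lr * factor ** i for i in range(n_steps + 1)]

start_lr, factor = 1e-3, 1.005
n = steps_to_reach(start_lr, 10.0, factor)  # 10.0 is a hypothetical upper bound
lrs = forward_schedule(start_lr, factor, n)
print(n, lrs[0], lrs[-1])
```

With these assumed bounds the sweep takes a bit under two thousand iterations, so a factor this close to 1 mostly trades a longer sweep for a finer grid of learning rates. A larger factor would cover the same range in fewer iterations but with coarser resolution.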
Edit:
I forgot to add that this is precisely what is done in the notebook (121st cell): the author finds the maximum value and takes half of it (in this case 1e-3).
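As a rough illustration of the "take half of the maximum" rule, the sketch below scans a recorded (learning rate, loss) curve and halves the rate at which the loss first blows up. The function `pick_learning_rate`, the `blowup_ratio` threshold, and the sample numbers are all hypothetical; in the notebook the author reads this value off the plot by eye rather than with code like this.

```python
def pick_learning_rate(lrs, losses, blowup_ratio=2.0):
    """Return half of the first learning rate whose loss exceeds
    blowup_ratio times the best loss seen so far (an assumed heuristic)."""
    best = float("inf")
    for lr, loss in zip(lrs, losses):
        if loss > blowup_ratio * best:
            return lr / 2
        best = min(best, loss)
    # The loss never blew up: fall back to half of the largest rate tried.
    return lrs[-1] / 2

# Made-up curve: the loss improves, then diverges at lr = 1.0.
lrs = [1e-3, 1e-2, 1e-1, 1.0]
losses = [2.0, 1.0, 0.5, 5.0]
chosen = pick_learning_rate(lrs, losses)
print(chosen)
```

The threshold is a judgment call: real loss curves are noisy, so you would probably smooth the losses or simply inspect the plot before trusting an automatic cutoff.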