Gradient check example #1497
base: develop
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##           develop    #1497     +/-  ##
===========================================
- Coverage    84.37%   84.34%   -0.03%
===========================================
  Files          163      163
  Lines        14037    14037
===========================================
- Hits         11844    11840       -4
- Misses        2193     2197       +4

View full report in Codecov by Sentry.
Fixed two small typos.
Thanks. I think I would just skip over check_grad and directly introduce check_grad_multi_eps. The latter performs much better.
Co-authored-by: Daniel Weindl [email protected]
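The multi-eps idea can be sketched in plain NumPy, independent of pypesto's actual API (the function name and signature below are illustrative, not the library's): for each parameter, compare the analytical gradient against central finite differences for several step sizes and keep the step size that agrees best.

```python
import numpy as np

def check_grad_multi_eps(fun, grad, x, eps_values=(1e-3, 1e-5, 1e-7)):
    """Illustrative sketch: for each parameter, compare the analytical
    gradient against central finite differences for several step sizes
    and report the smallest absolute error over all step sizes."""
    g = grad(x)
    best_err = np.full_like(g, np.inf)
    for eps in eps_values:
        for i in range(len(x)):
            e = np.zeros_like(x)
            e[i] = eps
            fd_c = (fun(x + e) - fun(x - e)) / (2 * eps)
            best_err[i] = min(best_err[i], abs(fd_c - g[i]))
    return best_err

# Rosenbrock-style test function with a known analytical gradient
fun = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
    200 * (x[1] - x[0] ** 2),
])

err = check_grad_multi_eps(fun, grad, np.array([0.5, -0.3]))
```

Trying several eps values guards against a single unlucky step size: too large a step suffers truncation error, too small a step suffers cancellation, and the best of several is usually informative.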
I agree that
Looks good to me, thanks for this.
"source": [ | ||
"# Gradient checks\n", | ||
"\n", | ||
"It is best practice to do gradient checks before and after gradient-based optimization.\n", |
I think it would be good to include some rationale for why to check it afterwards and what to look for. I.e. except for parameters with active bounds, the values should be close to 0. At the same time, this might make it difficult to get good FD approximations.
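As a hedged sketch of that post-optimization check (the function name, tolerances, and bound test below are illustrative, not pypesto API): flag gradient entries that are far from zero while their parameter is not sitting on an active bound.

```python
import numpy as np

def check_gradient_at_optimum(grad, x, lb, ub, tol=1e-6, bound_tol=1e-10):
    """Illustrative sketch: after optimization, gradient entries should be
    close to 0 except for parameters on an active bound, where the gradient
    may legitimately point outward. Returns a mask of suspicious entries."""
    g = np.asarray(grad)
    at_bound = (np.abs(x - lb) < bound_tol) | (np.abs(x - ub) < bound_tol)
    return (~at_bound) & (np.abs(g) > tol)

# First parameter: interior, gradient ~0 (fine).
# Second parameter: on its lower bound, gradient nonzero (also fine).
suspicious = check_gradient_at_optimum(
    grad=[1e-9, -0.5],
    x=np.array([0.3, 0.0]),
    lb=np.array([0.0, 0.0]),
    ub=np.array([1.0, 1.0]),
)
```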
You mean, a 0 gradient makes the FD approximation difficult?
Yes, that, and one might miss gradient entries that are (incorrectly) always zero.
Shouldn't these already show up before optimisation?
Yes, they should.
Added some comments, more in the direction of how one would interpret the results of gradient checks. Sometimes we get high absolute errors while the relative errors are still OK, so is this an issue or not? Some guidance for users would help.
Not sure if we want to add that type of content to this notebook though, or just leave it as: "This can be done, and it's done using these functions."
"- `fd_f`: FD forward difference\n", | ||
"- `fd_b`: FD backward difference\n", | ||
"- `fd_c`: Approximation of FD central difference (reusing the information from `fd_f` and `fd_b`)\n", | ||
"- `fd_err`: Deviation between forward and backward differences `fd_f`, `fd_b`\n", |
It might be good to add what this represents and why it's here. The name (error) hints at it, but it also shows how much the gradient changes in the local neighbourhood of the point being checked. So it indicates, in a way, whether the step size is small enough, at least when the function is smooth enough.
However, that's not completely true either: at an optimum the forward difference will be positive and the backward difference negative, possibly with large magnitudes, so fd_err will be very high for almost any choice of eps.
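This effect shows up already on the simplest possible example, f(x) = x² at its minimum x = 0, where the true gradient is exactly zero:

```python
f = lambda x: x**2  # minimum at x = 0, true gradient there is 0
x, eps = 0.0, 1e-3

fd_f = (f(x + eps) - f(x)) / eps   # forward difference:  ~ +eps
fd_b = (f(x) - f(x - eps)) / eps   # backward difference: ~ -eps
fd_c = (fd_f + fd_b) / 2           # central difference:  ~ 0
fd_err = abs(fd_f - fd_b)          # ~ 2*eps, huge relative to the gradient 0
```

Shrinking eps shrinks fd_err here, but only proportionally, so the forward/backward deviation never looks small relative to the (zero) gradient.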
"text/plain": [ | ||
" grad fd_f fd_b fd_c \\\n", | ||
"Epo_degradation_BaF3 2.899805e+10 2.898349e+10 2.899516e+10 2.898933e+10 \n", | ||
"k_exp_hetero -1.822477e+03 8.247990e+07 -1.185836e+08 -1.805185e+07 \n", | ||
"k_exp_homo 1.940159e+06 -7.634094e+06 2.620560e+07 9.285754e+06 \n", | ||
"k_imp_hetero 1.324222e+09 9.636109e+08 1.401833e+09 1.182722e+09 \n", | ||
"k_imp_homo 2.759777e+09 2.689595e+09 2.810697e+09 2.750146e+09 \n", | ||
"k_phos -3.183894e+10 -3.189761e+10 -3.181251e+10 -3.185506e+10 \n", | ||
"sd_pSTAT5A_rel -4.106435e+12 -4.106388e+12 -4.106482e+12 -4.106435e+12 \n", | ||
"sd_pSTAT5B_rel -2.467665e+11 -2.468085e+11 -2.467245e+11 -2.467665e+11 \n", | ||
"sd_rSTAT5A_rel 3.684015e+01 -4.769135e+07 4.769149e+07 7.324219e+01 \n", | ||
"\n", | ||
" fd_err abs_err rel_err \n", | ||
"Epo_degradation_BaF3 1.166055e+07 8.729236e+06 3.011190e-04 \n", | ||
"k_exp_hetero 2.010635e+08 1.805003e+07 9.998990e-01 \n", | ||
"k_exp_homo 3.383970e+07 7.345596e+06 7.910607e-01 \n", | ||
"k_imp_hetero 4.382216e+08 1.415004e+08 1.196396e-01 \n", | ||
"k_imp_homo 1.211021e+08 9.630702e+06 3.501887e-03 \n", | ||
"k_phos 8.509938e+07 1.611509e+07 5.058878e-04 \n", | ||
"sd_pSTAT5A_rel 9.372550e+07 6.663599e+02 1.622721e-10 \n", | ||
"sd_pSTAT5B_rel 8.401874e+07 3.362799e+01 1.362745e-10 \n", | ||
"sd_rSTAT5A_rel 9.538284e+07 3.640203e+01 4.970090e-01 " | ||
] | ||
}, |
Do you think we should add some conclusion on this or one of the gradient checks? In the sense of: what would we conclude if we saw this gradient check? The absolute errors are rather high, but that can be expected with finite differences at random points. We see this in fd_err as well, so changing the eps might make sense.
What is reassuring is that the relative error is not too high in most cases.
Would redoing the gradient check with another eps make sense to show how it can affect the check?
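For illustration, the distinction between the two error measures can be reproduced from the grad and fd_c values of the Epo_degradation_BaF3 row in the output above:

```python
# Values from the Epo_degradation_BaF3 row of the gradient check output
grad, fd_c = 2.899805e+10, 2.898933e+10

abs_err = abs(grad - fd_c)     # ~8.7e6: looks alarming in isolation
rel_err = abs_err / abs(fd_c)  # ~3e-4: gradient and FD agree to ~4 digits
```

With gradient entries on the order of 1e10, an absolute error of a few million still means agreement to roughly four significant digits, which is why the relative error is usually the more meaningful diagnostic.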
AmiciObjective.check_gradients_match_finite_differences
#1494
I'd be happy about suggestions on the "Best practices" and "How to fix my gradients".