Gradient check example #1497

Open
wants to merge 13 commits into
base: develop

m-philipps
Contributor

I'd welcome suggestions on the "Best practices" and "How to fix my gradients" sections.

@codecov-commenter

codecov-commenter commented Oct 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.34%. Comparing base (e9a969e) to head (c0ef8cb).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1497      +/-   ##
===========================================
- Coverage    84.37%   84.34%   -0.03%     
===========================================
  Files          163      163              
  Lines        14037    14037              
===========================================
- Hits         11844    11840       -4     
- Misses        2193     2197       +4     


Member

dweindl left a comment

Thanks. I think I would just skip over check_grad and directly introduce check_grad_multi_eps. The latter performs much better.

m-philipps
Contributor Author

> I think I would just skip over check_grad and directly introduce check_grad_multi_eps. The latter performs much better.

I agree that check_grad_multi_eps is convenient to use in practice. I like showcasing check_grad first to build it up like a tutorial, so that people who aren't so familiar with gradient checks can more easily understand what check_grad_multi_eps is doing. I'm also fine with changing it, though.
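
To illustrate the build-up discussed here, the following is a minimal NumPy sketch of a single-step-size gradient check and a multi-step-size wrapper around it. It is not pyPESTO's implementation: the names `fd_check` and `fd_check_multi_eps`, the default list of step sizes, and the relative-error formula are assumptions made for this example.

```python
import numpy as np


def fd_check(fun, grad, x, eps):
    """Compare grad(x) with central finite differences for one step size.

    Hypothetical helper; the rel_err definition below is an assumption.
    """
    g = np.asarray(grad(x), dtype=float)
    fd_c = np.empty_like(g)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        fd_c[i] = (fun(x + step) - fun(x - step)) / (2 * eps)
    abs_err = np.abs(g - fd_c)
    rel_err = abs_err / (np.abs(fd_c) + eps)
    return fd_c, abs_err, rel_err


def fd_check_multi_eps(fun, grad, x, multi_eps=(1e-1, 1e-3, 1e-5, 1e-7)):
    """Per parameter, keep the step size with the smallest relative error."""
    rel = np.array([fd_check(fun, grad, x, eps)[2] for eps in multi_eps])
    best = rel.argmin(axis=0)  # index of the best step size per parameter
    return np.asarray(multi_eps)[best], rel.min(axis=0)


# Toy objective with a known gradient.
fun = lambda x: float(np.sum(np.exp(x)))
grad = lambda x: np.exp(x)
x0 = np.array([0.0, 1.0, -2.0])

best_eps, best_rel = fd_check_multi_eps(fun, grad, x0)
print(best_eps, best_rel)
```

The single-step check depends on one hand-picked `eps`, while the wrapper repeats it over several step sizes and reports the best one per parameter, which is the convenience both comments refer to.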

Collaborator

PaulJonasJost left a comment

Looks good to me, thanks for this.

"source": [
"# Gradient checks\n",
"\n",
"It is best practice to do gradient checks before and after gradient-based optimization.\n",
Member

I think it would be good to include some rationale for why to check the gradient after optimization and what to look for: except for parameters with active bounds, the gradient entries should be close to 0. At the same time, near-zero gradients can make it difficult to get good FD approximations.

Contributor Author

You mean, a 0 gradient makes the FD approximation difficult?

Member

Yes, that, and one might miss gradient entries that are (incorrectly) always zero.

Contributor Author

Shouldn't these already show up before optimisation?

Member

Yes, they should.
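
The point made in this thread can be sketched with a toy example. It assumes a quadratic objective and a deliberately broken gradient whose second entry is always zero; `x_opt`, `buggy_grad`, and `central_fd` are hypothetical names for this illustration only.

```python
import numpy as np

x_opt = np.array([1.0, -2.0])  # minimizer of the toy objective


def fun(x):
    return float(np.sum((x - x_opt) ** 2))


def buggy_grad(x):
    g = 2.0 * (x - x_opt)
    g[1] = 0.0  # deliberate bug: this entry is (incorrectly) always zero
    return g


def central_fd(fun, x, eps=1e-5):
    """Central finite-difference approximation of the gradient."""
    fd = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        fd[i] = (fun(x + step) - fun(x - step)) / (2 * eps)
    return fd


def abs_err(x):
    return np.abs(buggy_grad(x) - central_fd(fun, x))


err_at_opt = abs_err(x_opt)               # check after optimization
err_away = abs_err(np.array([0.0, 0.0]))  # check before optimization
print(err_at_opt, err_away)
```

At the optimum the true gradient is zero as well, so the broken entry is indistinguishable from a correct one. Away from the optimum the same entry shows a clear mismatch, which is why such errors should already show up in the check before optimization.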

@m-philipps m-philipps marked this pull request as draft November 6, 2024 12:11
@m-philipps m-philipps marked this pull request as ready for review January 2, 2025 14:40
@m-philipps m-philipps requested a review from vwiela January 8, 2025 18:59
@Doresic Doresic mentioned this pull request Jan 8, 2025
Contributor

Doresic left a comment

Added some comments, more in the direction of how one would interpret the results of gradient checks. Sometimes we get high absolute errors while the relative errors are still OK, so is this an issue or not? Some guidance for users would help.

Not sure if we want to add that type of content to this notebook though, or just leave it as "This can be done, and it's done using these functions."

"- `fd_f`: FD forward difference\n",
"- `fd_b`: FD backward difference\n",
"- `fd_c`: Approximation of FD central difference (reusing the information from `fd_f` and `fd_b`)\n",
"- `fd_err`: Deviation between forward and backward differences `fd_f`, `fd_b`\n",
Contributor

It might be good to add what this represents and why it's here. The name says "error", but in a way it shows how much the gradient changes in the local area around the point being checked. So it indicates whether the step size is small enough, provided the function is smooth enough.

However, that's also not completely true. If you're at the optimum, the forward difference will be positive and the backward difference negative, possibly with large magnitudes, so fd_err will be very high for almost any choice of eps.
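
A minimal sketch of how these columns relate, under stated assumptions: `fd_c` is taken as the mean of `fd_f` and `fd_b`, and `fd_err` as the absolute difference between them. The helper `fd_columns` and the toy objective are hypothetical and are not taken from the notebook.

```python
import numpy as np


def fd_columns(fun, x, i, eps):
    """Forward, backward and central differences for parameter i.

    Assumed definitions: fd_c averages fd_f and fd_b,
    fd_err is the absolute difference between fd_f and fd_b.
    """
    step = np.zeros_like(x)
    step[i] = eps
    f0 = fun(x)
    fd_f = (fun(x + step) - f0) / eps
    fd_b = (f0 - fun(x - step)) / eps
    fd_c = (fd_f + fd_b) / 2
    fd_err = abs(fd_f - fd_b)
    return fd_f, fd_b, fd_c, fd_err


# Toy objective with its minimum at 0 and strong curvature.
fun = lambda x: float(np.sum(100.0 * x**2))

fd_f, fd_b, fd_c, fd_err = fd_columns(fun, np.array([0.0]), 0, 1e-3)
print(fd_f, fd_b, fd_c, fd_err)
```

At the minimum, `fd_f` and `fd_b` have opposite signs, so `fd_err` is large compared with `fd_c` even though the central estimate itself is accurate. In this toy case `fd_err` reflects the curvature times the step size rather than an error in the gradient.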

Comment on lines +230 to +253
"text/plain": [
"                               grad           fd_f           fd_b           fd_c  \\\n",
"Epo_degradation_BaF3   2.899805e+10   2.898349e+10   2.899516e+10   2.898933e+10  \n",
"k_exp_hetero          -1.822477e+03   8.247990e+07  -1.185836e+08  -1.805185e+07  \n",
"k_exp_homo             1.940159e+06  -7.634094e+06   2.620560e+07   9.285754e+06  \n",
"k_imp_hetero           1.324222e+09   9.636109e+08   1.401833e+09   1.182722e+09  \n",
"k_imp_homo             2.759777e+09   2.689595e+09   2.810697e+09   2.750146e+09  \n",
"k_phos                -3.183894e+10  -3.189761e+10  -3.181251e+10  -3.185506e+10  \n",
"sd_pSTAT5A_rel        -4.106435e+12  -4.106388e+12  -4.106482e+12  -4.106435e+12  \n",
"sd_pSTAT5B_rel        -2.467665e+11  -2.468085e+11  -2.467245e+11  -2.467665e+11  \n",
"sd_rSTAT5A_rel         3.684015e+01  -4.769135e+07   4.769149e+07   7.324219e+01  \n",
"\n",
"                            fd_err       abs_err       rel_err  \n",
"Epo_degradation_BaF3  1.166055e+07  8.729236e+06  3.011190e-04  \n",
"k_exp_hetero          2.010635e+08  1.805003e+07  9.998990e-01  \n",
"k_exp_homo            3.383970e+07  7.345596e+06  7.910607e-01  \n",
"k_imp_hetero          4.382216e+08  1.415004e+08  1.196396e-01  \n",
"k_imp_homo            1.211021e+08  9.630702e+06  3.501887e-03  \n",
"k_phos                8.509938e+07  1.611509e+07  5.058878e-04  \n",
"sd_pSTAT5A_rel        9.372550e+07  6.663599e+02  1.622721e-10  \n",
"sd_pSTAT5B_rel        8.401874e+07  3.362799e+01  1.362745e-10  \n",
"sd_rSTAT5A_rel        9.538284e+07  3.640203e+01  4.970090e-01  "
]
},
Contributor

Do you think we should add some conclusion on this or one of the other gradient checks, i.e., what we would conclude if we saw this gradient check? The absolute errors are rather high, but that can be expected with finite differences at random points. We also see this in fd_err, so changing eps might make sense.
What's reassuring is that the relative error is not too high in most cases.

Would redoing the gradient check with another eps make sense, to show how it can affect the check?
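
As a rough illustration of both questions, the sketch below repeats a central-difference check for several step sizes on a toy function with a very large gradient. The function, the scale factor, and the step sizes are assumptions chosen for this example and are unrelated to the model in the notebook.

```python
import numpy as np

scale = 1e10  # large objective scale, so gradients are large in magnitude
fun = lambda x: scale * np.exp(x)
grad = lambda x: scale * np.exp(x)  # exact gradient of the toy function

x = 0.5
results = []
for eps in (1e-2, 1e-4, 1e-6):
    fd_c = (fun(x + eps) - fun(x - eps)) / (2 * eps)
    abs_err = abs(grad(x) - fd_c)
    rel_err = abs_err / abs(fd_c)
    results.append((eps, abs_err, rel_err))
    print(f"eps={eps:.0e}  abs_err={abs_err:.3e}  rel_err={rel_err:.3e}")
```

With the largest step size the absolute error is large simply because the gradient is large, while the relative error stays small. Reducing the step size lowers both errors until round-off starts to dominate, which is one way a second check with another eps could be presented.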


6 participants