
norms should delegate to the backend where possible #6

Open
mglisse opened this issue Apr 13, 2020 · 3 comments

Comments

@mglisse (Contributor) commented Apr 13, 2020

Hello,
with a PyTorch tensor t, I can call t.norm(p, dim). This gives a result similar to eagerpy's lp, but it makes a huge difference for the gradient. PyTorch defines the derivative of sqrt at 0 as infinity (mathematically sensible), which often produces a NaN gradient once the chain rule multiplies that infinity by 0. However, functions like norm are handled specially, similarly to abs, and return a suitable subgradient (0) at the origin.
Could you please make l2/lp/... call norm for the PyTorch backend?
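For illustration, a minimal user-level sketch of the kind of delegation being requested (the dispatch on the raw tensor type is an assumption about how this could be wired up, not eagerpy's actual internals):

import torch
import eagerpy as ep

def l2(x: ep.Tensor) -> ep.Tensor:
    # Sketch only: when the raw tensor is a torch.Tensor, delegate to
    # torch.norm, whose backward pass returns a subgradient of 0 at the
    # origin, instead of the generic sqrt(sum(x**2)) formulation.
    if isinstance(x.raw, torch.Tensor):
        return ep.astensor(torch.norm(x.raw, p=2))
    return x.norms.l2()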

@mglisse (Contributor, Author) commented Apr 22, 2020

With an example to demonstrate the issue:

import torch
import eagerpy
a = torch.tensor([0.], requires_grad=True)
torch.norm(a, p=2).backward()
print(a.grad)
eagerpy.astensor(a).norms.l2().raw.backward()
print(a.grad)

tensor([0.])
tensor([nan])

@jonasrauber (Owner) commented

Hi @mglisse, thanks for the request and the example code.
That makes a lot of sense, and I think this might be doable.
May I ask how you use EagerPy? Do you just use it as an alternative API for PyTorch, without needing the ability to run the same code with different frameworks, or is this only a problem with PyTorch?

@mglisse (Contributor, Author) commented Apr 23, 2020

Hi, thanks for the reply. I use eagerpy so I can write the code only once and have it work with several frameworks. It is true that I currently mostly experiment with PyTorch, though.
The problem isn't limited to PyTorch. The first time I hit this NaN issue with PyTorch, JAX was giving good numbers, so I assumed it was doing something different. I didn't keep the exact code, and now that I try to reproduce it, I seem to get NaN from JAX and PyTorch in the same cases. So I don't know whether my experiment at the time was bogus, or hit a very special case...
A good thing is that all frameworks seem to provide a norm function (at least for p not 0?). A bad thing is that the one in JAX (I did not check TensorFlow) does not seem to have a special (sub)gradient implementation: it also gives a NaN gradient for jax.numpy.linalg.norm(x, 2) at 0. But I could go ask them about that. Another bad thing is that they don't use the same definition: on a matrix [[1,2],[3,4]] with p=1, numpy/jax return 6 (the induced norm, i.e. the maximum absolute column sum) while torch/tensorflow return 10 (the sum of the absolute values of all entries), which complicates things a bit...
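To make the definitional mismatch concrete, a small check (not part of the original thread; jax.numpy.linalg.norm follows the numpy behaviour):

import numpy as np
import torch

m = [[1., 2.], [3., 4.]]
# numpy interprets ord=1 on a 2-D array as the induced matrix norm,
# i.e. the maximum absolute column sum: max(1+3, 2+4) = 6.
print(np.linalg.norm(np.array(m), ord=1))   # 6.0
# torch.norm with p=1 and no dim treats the tensor as a flat vector
# and sums the absolute values: 1+2+3+4 = 10.
print(torch.norm(torch.tensor(m), p=1))     # tensor(10.)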
Of course there are workarounds: I could compute the norms manually and add tiny (working through the various dtype/finfo combinations to get it) before taking the square root. Or I could let eagerpy compute the norm and, if the result is 0, replace it with a constant via result = result.from_numpy(0.) (or actually some better formulation to get the right dtype; also, with PyTorch this constant does not have requires_grad, so calling .raw.backward() directly on it, without combining it with other numbers, fails).
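For the first workaround, a rough sketch in plain PyTorch (the helper name and the choice of finfo.tiny are assumptions, not from the thread):

import torch

def safe_l2(x, dim=None):
    # Hypothetical helper: add the smallest positive normal number for the
    # dtype before the square root, so the gradient at x = 0 becomes
    # 0 / sqrt(tiny) = 0 instead of NaN.
    tiny = torch.finfo(x.dtype).tiny
    s = (x ** 2).sum() if dim is None else (x ** 2).sum(dim=dim)
    return (s + tiny).sqrt()

For non-zero inputs the added tiny is negligible, while at the origin it keeps the square root away from 0, which is where the NaN in the backward pass comes from.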
