
norms should delegate to the backend where possible #6

Open
mglisse opened this issue Apr 13, 2020 · 3 comments

Comments

@mglisse (Contributor) commented Apr 13, 2020

Hello,
with a PyTorch tensor t, I can call t.norm(p, dim). This gives a result similar to eagerpy's lp, but it makes a huge difference for the gradient. PyTorch defines the derivative of sqrt at 0 as infinity (mathematically sensible), which often produces a NaN gradient once the chain rule multiplies that infinity by 0. However, functions like norm are handled specially, similarly to abs, and return a suitable subgradient (0) at the origin.
Could you please make l2/lp/... call norm for the PyTorch backend?
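For illustration, a minimal user-level sketch of the kind of delegation being requested (the dispatch on the raw tensor type is an assumption about how this could be wired up, not eagerpy's actual internals):

import torch
import eagerpy as ep

def l2(x: ep.Tensor) -> ep.Tensor:
    # Sketch only: when the raw tensor is a torch.Tensor, delegate to
    # torch.norm, whose backward pass returns a subgradient of 0 at the
    # origin, instead of the generic sqrt(sum(x**2)) formulation.
    if isinstance(x.raw, torch.Tensor):
        return ep.astensor(torch.norm(x.raw, p=2))
    return x.norms.l2()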

@mglisse (Contributor, Author) commented Apr 22, 2020

With an example to demonstrate the issue:

import torch
import eagerpy
a = torch.tensor([0.], requires_grad=True)
torch.norm(a, p=2).backward()
print(a.grad)
eagerpy.astensor(a).norms.l2().raw.backward()
print(a.grad)

tensor([0.])
tensor([nan])

@jonasrauber (Owner) commented

Hi @mglisse, thanks for the request and the example code.
That makes a lot of sense, and I think this might be doable.
May I ask how you use EagerPy? Do you just use it as an alternative API for PyTorch, without needing the ability to run the same code with different frameworks, or is this only a problem with PyTorch?

@mglisse (Contributor, Author) commented Apr 23, 2020

Hi, thanks for the reply. I use eagerpy so I can write the code only once and have it work with several frameworks. It is true that I currently mostly experiment with PyTorch, though.
The problem isn't limited to PyTorch. The first time I hit this NaN issue with PyTorch, JAX was giving good numbers, so I assumed it was doing something different. I didn't keep the exact code, and now that I try to reproduce it, I seem to get NaN from JAX and PyTorch in the same cases. So I don't know whether my experiment at the time was bogus, or hit a very special case...
A good thing is that all frameworks seem to provide a norm function (at least for p not 0?). A bad thing is that the one in JAX (I did not check TensorFlow) does not seem to have a special (sub)gradient implementation: it also gives a NaN gradient for jax.numpy.linalg.norm(x, 2) at 0. But I could go ask them about that. Another bad thing is that they don't use the same definition: on a matrix [[1,2],[3,4]] with p=1, numpy/jax return 6 (the induced norm, i.e. the maximum absolute column sum) while torch/tensorflow return 10 (the sum of the absolute values of all entries), which complicates things a bit...
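To make the definitional mismatch concrete, a small check (not part of the original thread; jax.numpy.linalg.norm follows the numpy behaviour):

import numpy as np
import torch

m = [[1., 2.], [3., 4.]]
# numpy interprets ord=1 on a 2-D array as the induced matrix norm,
# i.e. the maximum absolute column sum: max(1+3, 2+4) = 6.
print(np.linalg.norm(np.array(m), ord=1))   # 6.0
# torch.norm with p=1 and no dim treats the tensor as a flat vector
# and sums the absolute values: 1+2+3+4 = 10.
print(torch.norm(torch.tensor(m), p=1))     # tensor(10.)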
Of course there are workarounds: I could compute the norms manually and add tiny (working through the various dtype/finfo combinations to get it) before taking the square root. Or I could let eagerpy compute the norm and, if the result is 0, replace it with a constant via result = result.from_numpy(0.) (or actually some better formulation to get the right dtype; also, with PyTorch this constant does not have requires_grad, so calling .raw.backward() directly on it, without combining it with other numbers, fails).
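For the first workaround, a rough sketch in plain PyTorch (the helper name and the choice of finfo.tiny are assumptions, not from the thread):

import torch

def safe_l2(x, dim=None):
    # Hypothetical helper: add the smallest positive normal number for the
    # dtype before the square root, so the gradient at x = 0 becomes
    # 0 / sqrt(tiny) = 0 instead of NaN.
    tiny = torch.finfo(x.dtype).tiny
    s = (x ** 2).sum() if dim is None else (x ** 2).sum(dim=dim)
    return (s + tiny).sqrt()

For non-zero inputs the added tiny is negligible, while at the origin it keeps the square root away from 0, which is where the NaN in the backward pass comes from.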
