Support different backends for nn
#1264
Note, I added the …
Another aspect of eager vs graph (symbolic) mode: I think for … However, in … So far we only defined parameters in …. However, for any computation depending on a parameter which can potentially change, we need to think about this. It's not clear yet how to solve this. This becomes relevant for example for weight norm (#91). (Edit: Maybe it's not so much a problem when we wrap ….)

Edit: I moved this to a separate issue: rwth-i6/returnn_common#250
Note on this: Named tensors in PyTorch are quite incomplete (e.g. pytorch/pytorch#94586) and their development was stopped (pytorch/pytorch#60832). So relying on them is questionable.
There is quite some overlap with the PyTorch backend issue (#1120). We decided to define most of the core API directly in RETURNN itself (the frontend API); see #1120.
I think we can close this issue here, as most of the frontend-API-related discussion happens in #1120.
Edit: What is referred to as `rc.nn` was the RETURNN-common `nn` API. We decided to move this API over to RETURNN itself, where we just call it the "frontend API". Also see #1120 specifically about PyTorch.

Edit: This issue is now part of #1120.
While thinking about how to integrate `rc.nn` better into RETURNN (#1185), specifically also how to directly construct the layers and not go over the net dict, and then also thinking about PyTorch in RETURNN (#1120) and how `rc.nn` can be useful for PyTorch as well, I came to the conclusion that we should design `nn` in a way that makes it easy to switch out different backends. I'm currently thinking about:

- RETURNN net dict (TF)
- RETURNN layers directly (TF)
- TensorFlow directly
- PyTorch

Currently, only the RETURNN net dict backend is supported, plus a limited set of RETURNN layers directly with TF eager mode.
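For context, a RETURNN net dict maps layer names to layer dicts; the concrete layers and options in this small example are made up just for illustration:

```python
# Simplified RETURNN net dict: layer name -> layer dict (illustrative example).
network = {
    "hidden": {"class": "linear", "activation": "relu", "from": "data", "n_out": 512},
    "output": {"class": "softmax", "from": "hidden", "loss": "ce", "n_out": 10},
}
```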
Everything works via the `nn.make_layer` function, which gets a RETURNN layer dict, as you normally see it in the RETURNN net dict. `nn.make_layer` is already too RETURNN specific. I'm thinking about a dedicated lower-level API in `rc.nn`, which can be switched out for different backends. This is similar to how Keras had it some time ago, where they defined all the low-level functions they used, for both TF and Theano. Our `_generated_layers.py` is almost already like that. We maybe want to clean it up a bit more and not really expose all RETURNN layers.
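As a rough illustration of what such a switchable lower-level API could look like. This is only a sketch; all names here (`Backend`, `TorchBackend`, `matmul`, `reduce_sum`, the global switch) are hypothetical and not the actual `rc.nn` API:

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical minimal set of low-level primitives each backend would implement."""

    @abstractmethod
    def matmul(self, a, b): ...

    @abstractmethod
    def reduce_sum(self, x, axis): ...


class TorchBackend(Backend):
    """Eager PyTorch: primitives map directly to torch ops."""

    def matmul(self, a, b):
        import torch
        return torch.matmul(a, b)

    def reduce_sum(self, x, axis):
        import torch
        return torch.sum(x, dim=axis)


class ReturnnNetDictBackend(Backend):
    """Net-dict backend: primitives would emit RETURNN layer dicts
    (e.g. via nn.make_layer) instead of computing anything eagerly. Omitted here."""

    def matmul(self, a, b):
        raise NotImplementedError

    def reduce_sum(self, x, axis):
        raise NotImplementedError


# Global switch; higher-level nn code would only ever call backend primitives.
_active_backend: Backend = TorchBackend()


def dot(a, b):
    """Example of a higher-level nn function defined purely via backend primitives."""
    return _active_backend.matmul(a, b)
```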
All the logic from `NameCtx` is maybe also only needed for the RETURNN net dict backend, because it is almost all about how the RETURNN net dict is constructed in the end and how to figure out layer names. Some of the `nn.Dim` names (descriptions) also use it, but this is anyway an aspect I'm not totally happy with.
We anyway also wanted to abstract and move `ExternData`/`Data`/`Dim` to be framework independent (#1165). It's basically already like that; it just needs to be cleaned up a bit and moved. (Besides, the `Dim` internals should also be cleaned up, see #975.)
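A very rough sketch of what a framework-independent `Dim`/`Data` pair could look like; the names and fields here are illustrative only and not the actual RETURNN classes:

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass(eq=False)  # dims compare by identity, not by value
class Dim:
    """Framework-independent dimension tag."""
    description: str
    dimension: Optional[int] = None  # None for dynamic sizes (e.g. a time axis)


@dataclass
class Data:
    """Wraps a raw backend tensor (torch.Tensor, tf.Tensor, ...) plus its dim tags."""
    raw_tensor: Any
    dims: Sequence[Dim]

    def get_axis_from_dim(self, dim: Dim) -> int:
        # Identity-based lookup, unambiguous even if two dims have the same static size.
        return list(self.dims).index(dim)
```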
TensorFlow directly could mean both graph mode and eager mode. PyTorch is eager mode. In case of eager mode, we should take extra care that it is efficient to run. We never really optimized this too much, as so far it just created the graph and the runtime later would only operate on the graph. But in eager mode, it would get executed again and again. The RETURNN net dict would be way too much overhead for eager mode. For the other backends, we need to be careful. I'm not sure if `Data` is already too much overhead; probably it needs to be optimized. And then, `nn.Tensor` is another wrapper. Currently it is its own data structure, but we might make it an alias to `torch.Tensor` or `tensorflow.Tensor` and somehow attach our meta information to it (all the `Data` stuff, specifically the `Dim`s). I'm not sure. But the overhead of `rc.nn` must be really minimal, otherwise it would not really be attractive. In case of PyTorch, I was thinking about using their named dimensions and somehow keeping a global dimension tag register, by making sure that all names are unique. That way you can always get the reference to the `Dim` object given a pure `torch.Tensor`.
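A minimal sketch of that idea, assuming PyTorch named tensors and a hypothetical global registry keyed by unique dim names (none of this is existing RETURNN/`rc.nn` API):

```python
import torch


class Dim:
    """Hypothetical dim tag with a globally unique name."""

    _registry = {}  # unique name -> Dim

    def __init__(self, name: str, size: int = None):
        assert name not in Dim._registry, f"dim name {name!r} must be unique"
        self.name, self.size = name, size
        Dim._registry[name] = self

    @classmethod
    def from_name(cls, name: str) -> "Dim":
        return cls._registry[name]


batch = Dim("batch")
feature = Dim("feature", 32)

# Attach the dim names to a plain torch.Tensor via PyTorch named tensors.
x = torch.zeros(8, 32, names=(batch.name, feature.name))

# Later, given only the pure torch.Tensor, recover the Dim objects from its names.
dims = [Dim.from_name(n) for n in x.names]
assert dims == [batch, feature]
```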
In principle, I think it should be possible to keep this efficient and the overhead minimal, so that PyTorch and TF eager mode really are also potential backends.
For now, we should just keep such potential plans in mind when thinking about some internal design or the API of `nn`. It should not be too RETURNN or too TF specific, except maybe for `ExternData`, `Data` and the `Dim`s.
So, effectively, what would `rc.nn` provide as an advantage over just using PyTorch directly?

- The `Dim` object, and very consistent use of it.
- Something like `jax.vmap`, in an efficient and straightforward way; `nn.Loop` could make it more efficient (see the example below).
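For reference, this is the kind of automatic batching `jax.vmap` gives you (plain JAX, just to illustrate the comparison):

```python
import jax
import jax.numpy as jnp


def attention_score(query, key):
    # Written for single (unbatched) vectors.
    return jnp.dot(query, key) / jnp.sqrt(query.shape[-1])


# vmap lifts the unbatched function over a leading batch axis of both arguments.
batched_score = jax.vmap(attention_score, in_axes=(0, 0))

queries = jnp.ones((16, 64))  # batch of 16 query vectors
keys = jnp.ones((16, 64))
print(batched_score(queries, keys).shape)  # (16,)
```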
Related: …