Support different backends for nn #1264

Closed
albertz opened this issue Oct 26, 2022 · 5 comments

albertz commented Oct 26, 2022

Edit: What is referred to as rc.nn here was the RETURNN-common nn API. We decided to move this API over to RETURNN itself, where we just call it the "frontend API". Also see #1120, specifically about PyTorch.

Edit: This issue is now part of #1120.

While thinking about how to integrate rc.nn better into RETURNN (#1185), specifically how to construct the layers directly instead of going through the net dict, and also thinking about PyTorch in RETURNN (#1120) and how rc.nn could be useful for PyTorch as well, I came to the conclusion that we should design nn in a way that makes it easy to switch out the backend. My current thoughts:

Currently, only the RETURNN net dict is supported, plus a limited number of RETURNN layers used directly in TF eager mode.

Everything works via the nn.make_layer function, which takes a RETURNN layer dict, just as you would normally write it in the RETURNN net dict.

nn.make_layer is already too RETURNN-specific. I'm thinking about a dedicated lower-level API in rc.nn which can be switched out for different backends. This is similar to what Keras had some time ago, where they defined all the low-level functions they used, for both TF and Theano. Our _generated_layers.py is almost already like that. We may want to clean it up a bit more and not really expose all RETURNN layers.
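
To make the idea more concrete, here is a minimal sketch of what such a switchable low-level backend API could look like. All names (Backend, TorchBackend, ReturnnNetDictBackend, get_backend) are hypothetical and only illustrate the pattern; the layer dicts are heavily simplified.

```python
from abc import ABC, abstractmethod
from typing import Any


class Backend(ABC):
    """Set of low-level ops that every backend must provide."""

    @abstractmethod
    def matmul(self, a: Any, b: Any) -> Any:
        ...

    @abstractmethod
    def reduce_sum(self, x: Any, *, axis: int) -> Any:
        ...


class TorchBackend(Backend):
    """Eager PyTorch implementation: computes values immediately."""

    def matmul(self, a, b):
        import torch
        return torch.matmul(a, b)

    def reduce_sum(self, x, *, axis: int):
        import torch
        return torch.sum(x, dim=axis)


class ReturnnNetDictBackend(Backend):
    """Builds RETURNN layer dicts instead of computing values (simplified)."""

    def matmul(self, a, b):
        return {"class": "dot", "from": [a, b]}

    def reduce_sum(self, x, *, axis: int):
        return {"class": "reduce", "mode": "sum", "from": x, "axis": axis}


_active_backend: Backend = TorchBackend()


def get_backend() -> Backend:
    """Higher-level nn functions would call low-level ops only through this."""
    return _active_backend
```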

All the logic in NameCtx is maybe also only needed for the RETURNN net dict backend, because it is almost entirely about how the RETURNN net dict is constructed in the end and how layer names are figured out. Some of the nn.Dim names (descriptions) also use it, but that is an aspect I'm not totally happy with anyway.

We also wanted to abstract ExternData/Data/Dim and make them framework-independent anyway (#1165). They basically already are; they just need to be cleaned up a bit and moved.
(Besides that, the Dim internals should also be cleaned up. See #975.)

TensorFlow directly could mean both graph mode and eager mode.
PyTorch is eager mode.

In the case of eager mode, we should take extra care that it is efficient to run. We never really optimized this much, since so far the code only created the graph once and the runtime would later operate on the graph alone. But in eager mode, it gets executed again and again. The RETURNN net dict would be way too much overhead for eager mode, and for the other backends we need to be careful as well. I'm not sure whether Data is already too much overhead; probably it needs to be optimized.

And then, nn.Tensor is another wrapper. Currently it is its own data structure, but we might make it an alias to torch.Tensor or tensorflow.Tensor and somehow attach our meta information to it (all the Data stuff, specifically the Dims). I'm not sure. In any case, the overhead of rc.nn must be really minimal, otherwise it would not really be attractive.

In the case of PyTorch, I was thinking about using their named dimensions and keeping a global dimension tag register, making sure that all names are unique. That way you can always get the reference to the Dim object given a pure torch.Tensor.
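
A rough, purely illustrative sketch of that global dimension-tag register idea, assuming PyTorch named tensors work as needed (see the later comment below on why named tensors turned out not to be viable). The Dim class and its methods here are hypothetical, not the actual rc.nn API.

```python
from typing import Dict, Optional

import torch


class Dim:
    """Dimension tag with a globally unique name, kept in a global register."""

    _registry: Dict[str, "Dim"] = {}

    def __init__(self, name: str, size: Optional[int] = None):
        assert name not in Dim._registry, f"dim name {name!r} must be unique"
        self.name = name
        self.size = size
        Dim._registry[name] = self

    @classmethod
    def from_name(cls, name: str) -> "Dim":
        return cls._registry[name]


batch_dim = Dim("batch")
feature_dim = Dim("feature", 32)

# A pure named torch.Tensor; the Dim objects can be recovered from the
# dim names alone, without any extra wrapper around the tensor.
x = torch.zeros(8, 32, names=("batch", "feature"))
dims = [Dim.from_name(name) for name in x.names]
assert dims == [batch_dim, feature_dim]
```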

In principle, I think it should be possible to make this efficient and keep the overhead minimal, so that PyTorch and TF eager mode really are also potential backends.

For now, we should just keep such potential plans in mind when thinking about the internal design or the API of nn. It should not be too RETURNN- or TF-specific, except maybe for ExternData, Data and the Dims.


So, effectively, what advantage would rc.nn provide over just using PyTorch directly?

  • The Dim object, and very consistent use of it.
    • This allows for cleaner code in many parts, less errors, easier debugging.
    • This also allows for automatic batching, like jax.vmap, in an efficient and straightforward way.
  • Some amount of optimization in nn.Loop could make it more efficient.
  • Support for different backends. JAX or TF graph mode are probably slightly faster. For debugging, PyTorch or TF eager mode can be used, and later it can be switched to TF or JAX.

Related:


albertz commented Oct 26, 2022

Note, I added the first-release tag, but only so that we think about necessary API changes now, because once we have the first stable release, changing the API would be much more difficult.


albertz commented Nov 11, 2022

Another aspect of eager vs graph (symbolic) mode:

I think for __call__ and other functions that get executed, this is all fine.

However, in __init__, there is an important difference. In either case, it is executed only once. With symbolic computation, representing some value, e.g. based on a parameter, such as weight-normalized parameters, this is totally fine and the right thing to do for symbolic execution. However, in the case of eager execution, executing it only once is not helpful. E.g. in PyTorch, weight normalization uses _forward_pre_hooks to recalculate it again and again; see the sketch below.
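
For illustration, a simplified sketch of that eager-mode mechanism: the effective weight is recomputed by a forward pre-hook on every call, rather than once in __init__. This mirrors the idea behind torch.nn.utils.weight_norm but is not its actual implementation.

```python
import torch


def _recompute_weight(module: torch.nn.Module, inputs):
    # weight = g * v / ||v||, recomputed before every forward call
    v, g = module.weight_v, module.weight_g
    module.weight = g * v / v.norm(dim=1, keepdim=True)


layer = torch.nn.Linear(4, 3, bias=False)
# Reparametrize: v and g become the actual parameters, weight is derived.
layer.weight_v = torch.nn.Parameter(layer.weight.detach().clone())
layer.weight_g = torch.nn.Parameter(layer.weight.detach().norm(dim=1, keepdim=True))
del layer.weight  # from now on a plain attribute, set by the hook
layer.register_forward_pre_hook(_recompute_weight)

x = torch.randn(2, 4)
y = layer(x)  # the hook runs first, so the weight reflects the current v and g
```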

So far we only defined parameters in __init__, and maybe their initial values (nn.init.ParamInit) or maybe things like weight decay. This is fine for both eager and symbolic mode.

However, for any computation depending on a parameter which can potentially change, we need to think about this. It's not clear yet how to solve this. This becomes relevant for example for weight norm (#91).

(Edit: Maybe it's not so much of a problem when we wrap nn.Tensor anyway. It can then be either the tensor directly (when inside some __call__) or a symbolic representation (when inside some __init__, or all the time for TF graph mode). Post edit: This might make the logic way too complex. We should think of simpler solutions.)

Edit: I moved this to its own separate issue: rwth-i6/returnn_common#250


albertz commented Feb 14, 2023

In case of PyTorch, I was thinking about using their named dimensions, and somehow keeping a global dimension tag register, by making sure that all names are unique.

Note on this: Named tensors in PyTorch are quite incomplete (e.g. pytorch/pytorch#94586) and their development has stopped (pytorch/pytorch#60832). So using torch.Tensor directly is not really an option; we need our own wrapper class (Data).
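
A minimal sketch of what such a wrapper class could look like: a raw torch.Tensor plus one Dim tag per axis, with axes addressed via Dim objects instead of integer indices. Names here are illustrative only, not the actual RETURNN Tensor/Data API.

```python
from typing import List, Optional, Sequence

import torch


class Dim:
    def __init__(self, name: str, dimension: Optional[int] = None):
        self.name = name
        self.dimension = dimension


class Data:
    """Wraps a raw torch.Tensor together with one Dim tag per axis."""

    def __init__(self, raw: torch.Tensor, dims: Sequence[Dim]):
        assert raw.ndim == len(dims)
        self.raw = raw
        self.dims: List[Dim] = list(dims)

    def reduce_mean(self, axis: Dim) -> "Data":
        # Axes are addressed via Dim objects, never via raw integer indices.
        i = self.dims.index(axis)
        return Data(self.raw.mean(dim=i), [d for d in self.dims if d is not axis])


batch_dim, time_dim, feat_dim = Dim("batch"), Dim("time"), Dim("feature", 512)
x = Data(torch.randn(8, 20, 512), [batch_dim, time_dim, feat_dim])
y = x.reduce_mean(axis=time_dim)  # works regardless of where the time axis sits
assert [d.name for d in y.dims] == ["batch", "feature"]
```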


albertz commented Feb 27, 2023

There is quite some overlap with the PyTorch backend issue (#1120). We decided to define most of the core nn functions as a "frontend API" directly in RETURNN, and the RETURNN Data (renamed to Tensor) is supposed to be the main tensor class for this frontend API (already prepared in #1261). This frontend API, including the implementations for RETURNN layers and PyTorch, is part of RETURNN itself. So it makes more sense to move this issue over to RETURNN. Edit: Done.

albertz transferred this issue from rwth-i6/returnn_common Feb 27, 2023

albertz commented Mar 16, 2023

I think we can close this issue here, as most frontend-API-related discussion happens in #1120.
