Convolution over feature dim #135

Open: vieting wants to merge 4 commits into main

Conversation

@vieting (Contributor) commented Dec 7, 2022

As discussed in #134, it is currently not possible to do a convolution over the axis which RETURNN considers the feature dim, and it would be helpful to set in_dim for this. However, the basic _get_output_shape_from_returnn then fails because there the new feature dim is mapped to the old feature dim and, as a result, the remaining dims are also mapped incorrectly. I add a suggestion for this in this PR.

@vieting vieting marked this pull request as draft December 7, 2022 18:51
@vieting vieting marked this pull request as ready for review December 9, 2022 07:38
@vieting vieting requested a review from albertz December 9, 2022 07:38
@vieting (Contributor, Author) commented Dec 9, 2022

The failing tests are the ones from #125.

input_tensor = naming.tensors[input]
in_dim = input_tensor.returnn_data.dim_tags[input_tensor.returnn_axis_from_torch_axis[1]]
in_spatial_dims = [
input_tensor.returnn_data.dim_tags[input_tensor.returnn_axis_from_torch_axis[dim + len(input.shape)]]
@albertz (Member)

Why don't you use _get_input_axis_to_returnn for that?

@vieting (Contributor, Author)

Because that would return an axis description like "T" or "F". This would be mapped to a dim tag during the ConvLayer construction. However, when we do the convolution over the feature dim, "F" would be mapped to in_dim, i.e. the new feature dim and not the old one, which does not work.

assert len(inputs_flat) == 1
torch_shape = list(inputs_flat[0].shape)
torch_shape[1] = self.out_channels
for idx in range(self.nd):
torch_ax = idx + 2
torch_shape[torch_ax] = (torch_shape[torch_ax] + 2 * self.padding[idx] - self.dilation[idx] * (
self.kernel_size[idx] - 1) - 1) // self.stride[idx] + 1

from pytorch_to_returnn.naming import Naming
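
As an aside, the lines above apply the standard convolution output-size formula. A quick check with made-up numbers (illustrative only, not taken from the PR):

# Illustrative only: out = (in + 2*padding - dilation*(kernel - 1) - 1) // stride + 1
in_size, padding, dilation, kernel, stride = 20, 2, 1, 5, 2
out_size = (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1
assert out_size == 10  # (20 + 4 - 4 - 1) // 2 + 1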
@albertz (Member)

I don't understand why you make this so complicated. This code here should be quite short and straightforward, and should not use any heuristics. Your code here is full of heuristics, checking whether you can map all axes, etc. You don't need that. We know exactly how it must map.

@vieting (Contributor, Author)

> We know exactly how it must map.

All dims that did not change are trivial, and the channel dim can be handled as I do it here. How would you do it for the spatial dims? Just assume that the order of spatial dims is the same as in in_spatial_dims?

@vieting (Contributor, Author)

I added an update which does the mapping as I described above.

@albertz (Member)

> Just assume that the order of spatial dims is the same as in in_spatial_dims?

What order? Of the RETURNN output?

We don't need to guess anything here. We know everything exactly.

@albertz (Member)

Except BCHW vs BHWC, but you can just check where out_dim is.
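
As a rough illustration of the mapping rule discussed here (hypothetical names, not the code in this PR): the batch dim maps to torch axis 0, the new channel/feature dim maps to torch axis 1 wherever RETURNN placed it (which resolves BCHW vs BHWC by checking where out_dim is), and the remaining dims fill the torch spatial axes in order.

# Hypothetical sketch, not the PR's implementation.
# out_dim_tags: dim tags of the RETURNN ConvLayer output (e.g. output_data.dim_tags)
# batch_dim, out_dim: the batch dim tag and the new channel/feature dim tag
def torch_axis_from_returnn_axis(out_dim_tags, batch_dim, out_dim):
  mapping = {}
  spatial_torch_axis = 2  # torch conv output is (batch, channels, spatial...)
  for returnn_axis, tag in enumerate(out_dim_tags):
    if tag == batch_dim:
      mapping[returnn_axis] = 0
    elif tag == out_dim:
      mapping[returnn_axis] = 1  # channel dim, independent of where RETURNN put it
    else:
      # assumes the spatial dims keep their relative order between torch and RETURNN
      mapping[returnn_axis] = spatial_torch_axis
      spatial_torch_axis += 1
  return mapping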

@vieting (Contributor, Author)

Ok, then that is exactly what I do now, right?

@vieting (Contributor, Author) commented Dec 9, 2022

If we do this change, I have a follow-up issue. The test cases work, but if I write the conversion result to a file via converter.get_returnn_config_serialized(), the dim tags show up at the beginning of the config and are used in the network as well. Here is an example:

import numpy
from returnn.tf.util.data import Dim, batch_dim, single_step_dim, SpatialDim, FeatureDim

use_tensorflow = True
behavior_version = 12

feature_data_dim = FeatureDim('feature:data', 11)
time_data_dim = SpatialDim('time:data')
spatial1_data_dim = SpatialDim('spatial1:data')
Linear_feature_dense_dim = FeatureDim('Linear:feature-dense', 7)

extern_data = {
  'data': {
    'dim_tags': [
      batch_dim,
      feature_data_dim,
      time_data_dim,
      spatial1_data_dim
    ],
    'dtype': 'float32',
    'time_dim_axis': 2,
    'feature_dim_axis': 1
  }
}

network = {
  'Transpose': {'class': 'copy', 'from': 'data'},
  'Linear': {
    'class': 'linear',
    'from': 'Transpose',
    'n_out': 7,
    'with_bias': True,
    'activation': None
  },
  'Transpose_1': {'class': 'copy', 'from': 'Linear'},
  'Conv2d': {
    'class': 'conv',
    'from': 'Transpose_1',
    'activation': None,
    'with_bias': True,
    'n_out': 13,
    'filter_size': (3, 5),
    'padding': 'valid',
    'in_spatial_dims': [
      time_data_dim,
      spatial1_data_dim
    ],
    'in_dim': Linear_feature_dense_dim,
    'strides': (2, 2)
  },
  'output': {'class': 'copy', 'from': 'Conv2d'}
}

However, Linear_feature_dense_dim in the example is not identical to the dim tag in the input. We would also have to set the output dim tags, I guess, right? Are there already solutions for this in returnn_common or elsewhere?

@albertz (Member) commented Dec 9, 2022

You need to change all n_out to out_dim. This applies to LinearLayer but also to other layers like ConvLayer.

You also need to specify out_spatial_dims for ConvLayer.
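
For illustration, a sketch of how the Linear and Conv2d layers from the example config above might look with these options (the dim tag names below are made up for this sketch, not what the converter would actually generate):

Conv2d_channel_dim = FeatureDim('Conv2d:channels', 13)
Conv2d_time_dim = SpatialDim('Conv2d:time')
Conv2d_spatial1_dim = SpatialDim('Conv2d:spatial1')

network['Linear'] = {
  'class': 'linear', 'from': 'Transpose',
  'out_dim': Linear_feature_dense_dim,  # instead of 'n_out': 7
  'with_bias': True, 'activation': None}
network['Conv2d'] = {
  'class': 'conv', 'from': 'Transpose_1', 'activation': None, 'with_bias': True,
  'filter_size': (3, 5), 'padding': 'valid', 'strides': (2, 2),
  'in_dim': Linear_feature_dense_dim,
  'in_spatial_dims': [time_data_dim, spatial1_data_dim],
  'out_dim': Conv2d_channel_dim,  # instead of 'n_out': 13
  'out_spatial_dims': [Conv2d_time_dim, Conv2d_spatial1_dim]}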

@vieting (Contributor, Author) commented Dec 9, 2022

I need to create them in create_returnn_layer_dict and there is no way to infer them, right?

And the printed config will possibly have lots of dim tags at the beginning. I guess that's similar in returnn_common, right?
