Convolution over feature dim #135
base: main
Conversation
The failing tests are the ones from #125.
input_tensor = naming.tensors[input]
in_dim = input_tensor.returnn_data.dim_tags[input_tensor.returnn_axis_from_torch_axis[1]]
in_spatial_dims = [
  input_tensor.returnn_data.dim_tags[input_tensor.returnn_axis_from_torch_axis[dim + len(input.shape)]]
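For readers following along: the lookup in this excerpt translates a torch axis into the RETURNN axis order and then picks the corresponding dim tag. A minimal self-contained sketch of that pattern (the names and the plain-dict mapping are simplified stand-ins, not the real pytorch_to_returnn classes):

```python
# Simplified stand-ins for the tensor metadata in pytorch_to_returnn (illustration only).
dim_tags = ["batch_dim", "time_dim", "feature_dim"]   # RETURNN axis order: B, T, F
returnn_axis_from_torch_axis = {0: 0, 1: 2, 2: 1}     # torch axis order: B, C(=F), T

# Torch axis 1 is the channel axis; its dim tag becomes in_dim.
in_dim = dim_tags[returnn_axis_from_torch_axis[1]]
# The remaining torch axes (2, ...) are the spatial axes of the convolution.
in_spatial_dims = [dim_tags[returnn_axis_from_torch_axis[ax]] for ax in range(2, 3)]

assert in_dim == "feature_dim"
assert in_spatial_dims == ["time_dim"]
```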
Why don't you use _get_input_axis_to_returnn for that?
Because that would return an axis description like "T" or "F". This would be mapped to a dim tag in the ConvLayer construction. However, when we do the convolution over the feature dim, "F" would be mapped to the in_dim, i.e. to the new feature dim and not the old one, which does not work.
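To make the point concrete, here is a purely schematic sketch (plain Python objects as stand-ins, not RETURNN's actual dim-tag resolution): once the conv layer has created its own new feature dim, a generic description like "F" resolves to that new dim, while an explicit dim-tag object still identifies the old input axis:

```python
# Illustration only: plain objects as stand-ins for RETURNN dim tags.
old_feature = object()   # the input's feature dim, i.e. what in_dim should refer to
new_feature = object()   # the feature dim newly created by the conv layer

def resolve(axis_spec, layer_feature_dim):
  # A generic description like "F" is resolved against the layer's own feature dim ...
  if axis_spec == "F":
    return layer_feature_dim
  # ... whereas an explicit dim-tag object already identifies a unique axis.
  return axis_spec

assert resolve("F", new_feature) is new_feature          # wrong axis for in_dim
assert resolve(old_feature, new_feature) is old_feature  # the intended axis
```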
assert len(inputs_flat) == 1
torch_shape = list(inputs_flat[0].shape)
torch_shape[1] = self.out_channels
for idx in range(self.nd):
  torch_ax = idx + 2
  torch_shape[torch_ax] = (torch_shape[torch_ax] + 2 * self.padding[idx] - self.dilation[idx] * (
    self.kernel_size[idx] - 1) - 1) // self.stride[idx] + 1
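For reference, the output-length formula in the excerpt above is the one documented for torch.nn.Conv1d/Conv2d; a quick sanity check against PyTorch itself (assuming torch is installed):

```python
import torch

in_len, kernel, stride, padding, dilation = 20, 3, 2, 1, 1
conv = torch.nn.Conv1d(
  in_channels=4, out_channels=8, kernel_size=kernel,
  stride=stride, padding=padding, dilation=dilation)
out = conv(torch.zeros(1, 4, in_len))

# Same formula as in the snippet above.
expected = (in_len + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1
assert out.shape == (1, 8, expected)
```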
from pytorch_to_returnn.naming import Naming
I don't understand why you do it in such a complicated way. This code here should be quite short, straightforward, and not use any heuristics. Your code here is full of heuristics, checking whether you can map all axes, etc. You don't need that. We know exactly how it must map.
"We know exactly how it must map."

All dims that did not change are trivial, and the channel dim can be handled as I do it here. How would you do it for the spatial dims? Just assume that the order of the spatial dims is the same as in in_spatial_dims?
I added an update which does the mapping as I described above.
"Just assume that the order of the spatial dims is the same as in in_spatial_dims?"

What order? The order of the RETURNN output? We don't need to guess anything here. We know everything exactly.
Except for BCHW vs. BHWC, but you can just check where out_dim is.
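A sketch of that direct construction, under the assumptions stated in this thread (the function and names are hypothetical, not the actual pytorch_to_returnn code): batch and spatial dims keep their relative order, and the torch channel dim goes to wherever out_dim ended up in the RETURNN output:

```python
def build_axis_mapping(num_spatial_dims, returnn_feature_axis):
  """Map torch axes (B, C, spatial...) to RETURNN axes, given only the position
  of the new feature dim (out_dim) in the RETURNN output."""
  num_axes = 2 + num_spatial_dims
  # RETURNN axes that are not the feature axis, in their original order.
  returnn_non_feature = [ax for ax in range(num_axes) if ax != returnn_feature_axis]
  mapping = {1: returnn_feature_axis}        # torch channel dim -> out_dim position
  mapping[0] = returnn_non_feature[0]        # batch dim is the first remaining axis
  for i in range(num_spatial_dims):          # spatial dims keep their relative order
    mapping[2 + i] = returnn_non_feature[1 + i]
  return mapping

# 2D conv, RETURNN output laid out as B, H, W, F (feature last):
assert build_axis_mapping(2, 3) == {0: 0, 1: 3, 2: 1, 3: 2}
# Same conv, RETURNN output laid out as B, F, H, W (feature right after batch):
assert build_axis_mapping(2, 1) == {0: 0, 1: 1, 2: 2, 3: 3}
```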
Ok, then that is exactly what I do now, right?
If we do this change, I have a follow-up issue. The test cases work, but if I write the conversion result to a file via …

However, …

You need to change all … You also need to specify …

I need to create them in … And the printed config will possibly have LOTS of dim tags at the beginning. I guess that's similar in returnn_common, right?
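To give an idea of what that could look like, here is a rough sketch only (the exact dim-tag helpers and extern_data keys are my assumption about current RETURNN, not taken from this PR): the written-out config would start by creating the dim tags and then reference them everywhere below:

```python
# Sketch of a generated config head; exact helper names/usage may differ.
from returnn.tf.util.data import batch_dim, SpatialDim, FeatureDim

time_dim = SpatialDim("time")
input_dim = FeatureDim("input", 40)
conv_out_dim = FeatureDim("conv:channels", 64)

extern_data = {"data": {"dim_tags": [batch_dim, time_dim, input_dim]}}
```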
As discussed in #134, it is currently not possible to do a convolution over the axis which RETURNN considers the feature dim, and it would be helpful to be able to set in_dim. However, then the basic _get_output_shape_from_returnn fails, because there the new feature dim is mapped to the old feature dim and, as a result, the remaining dims are also mapped incorrectly. I add a suggestion in this PR.
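For context, the use case from the PyTorch side looks roughly like this (my own minimal example, not taken from the PR): the spatial axis of a torch Conv1d is the axis that RETURNN would mark as its feature dim, so the convolution effectively runs over that feature dim:

```python
import torch

batch, channels, features = 3, 8, 40
x = torch.randn(batch, channels, features)   # the last axis is what RETURNN would call "F"
conv = torch.nn.Conv1d(in_channels=channels, out_channels=16, kernel_size=5, padding=2)
y = conv(x)
assert y.shape == (batch, 16, features)       # the spatial (= RETURNN feature) axis is kept
```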