some questions... #3

lovekittynine · 2022-12-07T08:28:50Z

Hello, I have a few small questions about the code.

First, the MLP block uses a self.pos layer that is not mentioned in the paper. Together with self.fc2 it acts like a depth-wise separable convolution, but it adds extra parameters. Does this layer have a large effect?
Second, in the Block code, the default kernel_size for the self.a layer is 11 with padding 5. However, in the last stage (stage 4) the feature map is only 7x7 (for 224x224 inputs), so using kernel_size = 11 for the convolution seems strange.

Thanks for your reply!

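import torch.nn as nn

# Note: LayerNorm below is the repository's own class (it takes data_format),
# not nn.LayerNorm.
class MLP(nn.Module):
    def __init__(self, dim, mlp_ratio=4):
        super().__init__()

        self.norm = LayerNorm(dim, eps=1e-6, data_format="channels_first")

        # 1x1 conv: expand channels from dim to dim * mlp_ratio
        self.fc1 = nn.Conv2d(dim, dim * mlp_ratio, 1)
        # 3x3 depth-wise conv (groups == channels) on the expanded features
        self.pos = nn.Conv2d(dim * mlp_ratio, dim * mlp_ratio, 3, padding=1, groups=dim * mlp_ratio)
        # 1x1 conv: project back from dim * mlp_ratio to dim
        self.fc2 = nn.Conv2d(dim * mlp_ratio, dim, 1)
        self.act = nn.GELU()

    def forward(self, x):
        x = self.norm(x)
        x = self.fc1(x)
        x = self.act(x)
        # residual connection around the depth-wise conv
        x = x + self.act(self.pos(x))
        x = self.fc2(x)

        return x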
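
As a quick check on the second question, the arithmetic below shows what an 11x11 kernel with padding 5 does on a 7x7 map. It is only a sketch in plain Python, assuming stride 1 and no dilation; the helper names conv_output_size and valid_taps_per_position are made up for illustration and are not part of the repository.

```python
def conv_output_size(size, kernel_size, padding, stride=1):
    """Output length along one axis of a standard convolution (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def valid_taps_per_position(size, kernel_size, padding):
    """For each output position along one axis (stride 1 assumed), count how
    many kernel taps land on real input pixels rather than on zero padding."""
    counts = []
    for i in range(conv_output_size(size, kernel_size, padding)):
        start = i - padding           # first input index covered by the kernel
        stop = start + kernel_size    # one past the last covered index
        counts.append(min(stop, size) - max(start, 0))
    return counts


# Stage-4 setting from the question: 7x7 map, kernel_size=11, padding=5.
out_size = conv_output_size(7, 11, 5)
taps_1d = valid_taps_per_position(7, 11, 5)
# In 2D the count of useful taps is the product of the per-axis counts.
max_useful_taps_2d = max(taps_1d) ** 2
total_taps_2d = 11 * 11

print(out_size, taps_1d, max_useful_taps_2d, total_taps_2d)
```

Under these assumptions the spatial size is preserved, every output position sees almost the whole feature map, and at most 49 of the 121 kernel taps ever touch real pixels; the rest fall on padding.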


houqb · 2022-12-07T11:12:36Z

Thanks for the questions.

We did indeed omit the description of the 3x3 depth-wise conv in the MLP and will update the paper.
You may refer to the RepLKNet paper for more explanation on this. In addition, it is beneficial to downstream tasks, which need higher-resolution images.
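
To put the extra cost of self.pos in perspective, here is a rough parameter count for the three convolutions in the MLP block quoted above. This is only a sketch: the helper names conv2d_params and mlp_conv_params are made up for illustration, dim=64 is an assumed example width, and the norm layer is left out of the count.

```python
def conv2d_params(in_ch, out_ch, kernel_size, groups=1, bias=True):
    """Parameter count of a square-kernel 2D convolution:
    out_ch * (in_ch / groups) * k * k weights, plus out_ch biases."""
    weights = out_ch * (in_ch // groups) * kernel_size * kernel_size
    return weights + (out_ch if bias else 0)


def mlp_conv_params(dim, mlp_ratio=4):
    """Per-layer parameter counts for fc1, pos and fc2 as defined in the MLP."""
    hidden = dim * mlp_ratio
    return {
        "fc1": conv2d_params(dim, hidden, 1),
        "pos": conv2d_params(hidden, hidden, 3, groups=hidden),  # depth-wise
        "fc2": conv2d_params(hidden, dim, 1),
    }


counts = mlp_conv_params(dim=64)  # dim=64 is an assumed example value
total = sum(counts.values())
pos_share = counts["pos"] / total

print(counts, total, f"{pos_share:.1%}")
```

With these assumed values the depth-wise conv contributes roughly 7% of the conv parameters in the block, and the share shrinks as dim grows because the 1x1 convs scale quadratically with the width while the depth-wise conv scales linearly.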

whiteinblue · 2022-12-22T09:42:28Z

Extra question: you add self.layer_scale_1 and self.layer_scale_2 to the ConvMod block, which also introduces extra parameters. What is the effect of these two scale parameters?

houqb · 2023-01-14T03:30:01Z

If you use the Hadamard product, the magnitude of the feature values tends to be larger than with addition. These scale parameters help the optimization process and are widely used in modern network architectures. You may refer to CaiT by Touvron et al. for more details.
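
The point about magnitudes can be illustrated with a toy example in plain Python. This is a minimal sketch of the general layer-scale idea (a learnable per-channel factor on the residual branch), not the repository's code; the function name layer_scale_residual, the feature values, and the initial scale of 1e-2 are all assumptions chosen for readability.

```python
def layer_scale_residual(x, branch_out, gamma):
    """Residual update with a per-channel scale: x + gamma * branch_out."""
    return [xi + g * bi for xi, g, bi in zip(x, gamma, branch_out)]


# Toy per-channel features (hypothetical values).
x = [1.0, -2.0, 0.5]   # block input
a = [4.0, 3.0, -6.0]   # one branch, e.g. the convolutional weights
v = [5.0, -2.0, 3.0]   # the other branch, e.g. the value features

hadamard = [ai * vi for ai, vi in zip(a, v)]  # element-wise product
added = [ai + vi for ai, vi in zip(a, v)]     # element-wise sum, for comparison

# The product grows with the magnitude of both factors, so its range is wider.
max_hadamard = max(abs(h) for h in hadamard)
max_added = max(abs(s) for s in added)

# A small per-channel scale (learnable in a real network; here an assumed
# constant) keeps the residual branch from dominating the input early on.
gamma = [1e-2] * len(x)
out = layer_scale_residual(x, hadamard, gamma)

print(max_hadamard, max_added, out)
```

In a real network gamma would be a learnable parameter, so training can increase the contribution of the branch gradually instead of starting with large-magnitude updates.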
