hvision-nku / conv2former
License: MIT License
Hello, I have some small questions about the code.

About the self.pos layer: the authors don't actually mention it in the paper. Together with self.fc2 it acts like a depth-wise separable convolution, but it adds some extra parameters. Is the effect of this layer really significant?

About the self.a layer: in the last stage (stage 4), the feature map is 7x7 for 224x224 inputs, so using kernel_size = 11 for the convolution seems strange.

Thanks for your reply!
class MLP(nn.Module):
    def __init__(self, dim, mlp_ratio=4):
        super().__init__()
        # LayerNorm here is the repo's channels-first variant
        self.norm = LayerNorm(dim, eps=1e-6, data_format="channels_first")
        self.fc1 = nn.Conv2d(dim, dim * mlp_ratio, 1)
        self.pos = nn.Conv2d(dim * mlp_ratio, dim * mlp_ratio, 3, padding=1, groups=dim * mlp_ratio)
        self.fc2 = nn.Conv2d(dim * mlp_ratio, dim, 1)
        self.act = nn.GELU()

    def forward(self, x):
        B, C, H, W = x.shape
        x = self.norm(x)
        x = self.fc1(x)
        x = self.act(x)
        x = x + self.act(self.pos(x))
        x = self.fc2(x)
        return x
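To put the overhead of self.pos in perspective, here is a rough parameter count. This is only a sketch; the dim=64 and mlp_ratio=4 values are hypothetical, chosen for illustration, not taken from the repo's configs.

```python
import torch.nn as nn

dim, mlp_ratio = 64, 4        # hypothetical values for illustration
hidden = dim * mlp_ratio      # hidden width of the MLP

# self.pos: 3x3 depth-wise conv over the hidden channels
pos = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
pos_params = sum(p.numel() for p in pos.parameters())

# self.fc2: 1x1 point-wise conv projecting back to dim
fc2 = nn.Conv2d(hidden, dim, 1)
fc2_params = sum(p.numel() for p in fc2.parameters())

# depth-wise conv costs only 9 weights + 1 bias per channel,
# a small fraction of the point-wise projection it sits next to
print(pos_params, fc2_params)  # 2560 16448
```

So at least in parameter terms, the depth-wise self.pos is cheap relative to the surrounding 1x1 convolutions; whether its contribution to accuracy is large is a separate (empirical) question.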
Hi, your work is brilliant. Could I ask if there is any plan to release the pre-trained model?
Hello, thank you for your excellent work. Could you please release your training code and pre-trained model?
As I see it, the spatial size of the last stage is 7x7; isn't an 11x11 dwconv in ConvMod too large for it?
I tried to reproduce your model and found that, during training, Conv2Former-T was twice as slow as ConvNeXt-T and required nearly twice as much CUDA memory. Is this normal?
Hello, in the paper the authors state: "The difference is that the convolutional kernels are static while the attention matrix generated by self-attention can adapt to the input."
Yes, the statement is indeed correct. However, I still don't quite get why the authors wrote it like that; please correct me if I'm wrong here.
Consider self-attention first: during inference, the attention matrix is generated by a linear layer, so it adapts to the input. Now consider the convolution: can't we treat the whole conv modulation as one convolution whose kernel is generated by a conv layer, so that the kernel of this conv modulation adapts to the input as well?
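For reference, the conv modulation under discussion can be sketched roughly as below. This is my own reading of the paper's description, not the official code: the depth-wise conv in the branch self.a produces input-dependent weights that multiply the value branch self.v element-wise, which is exactly why the result is input-adaptive even though every kernel involved is static. The channels-first GroupNorm is a stand-in for the repo's custom LayerNorm.

```python
import torch
import torch.nn as nn

class ConvMod(nn.Module):
    """Rough sketch of convolutional modulation (unofficial).

    Static convs generate modulation weights `a` from the input, and
    `a * v` makes the overall mapping input-dependent, analogous to how
    an attention matrix adapts to its input.
    """
    def __init__(self, dim, kernel_size=11):
        super().__init__()
        self.norm = nn.GroupNorm(1, dim)  # stand-in for channels-first LayerNorm
        self.a = nn.Sequential(
            nn.Conv2d(dim, dim, 1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
        )
        self.v = nn.Conv2d(dim, dim, 1)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        x = self.norm(x)
        a = self.a(x)       # input-dependent modulation weights
        x = a * self.v(x)   # element-wise modulation of the value branch
        return self.proj(x)

x = torch.randn(1, 64, 7, 7)   # stage-4-sized feature map (224x224 input)
y = ConvMod(64)(x)
print(y.shape)  # torch.Size([1, 64, 7, 7])
```

Note that with padding = kernel_size // 2, an 11x11 depth-wise kernel on a 7x7 map still yields a 7x7 output, though most of each window then falls on zero padding, which is the concern raised above.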
Is this the official implementation of the paper?
I am interested in Conv2Former; when will you publish the training and inference code and a tutorial?