hvision-nku / conv2former
License: MIT License
Hello, I have some small questions about the code.

About the self.pos layer: the authors don't actually mention it in the paper. Together with self.fc2 it acts like a depth-wise separable convolution, but it adds some extra parameters. Is the effect of this layer really significant?

About the self.a layer: in the last stage (stage 4), the feature map is 7x7 for 224x224 inputs, so using kernel_size = 11 for the convolution seems strange.

Thanks for your reply!
class MLP(nn.Module):
    def __init__(self, dim, mlp_ratio=4):
        super().__init__()
        # LayerNorm here is the repo's channels-first variant
        self.norm = LayerNorm(dim, eps=1e-6, data_format="channels_first")
        self.fc1 = nn.Conv2d(dim, dim * mlp_ratio, 1)
        self.pos = nn.Conv2d(dim * mlp_ratio, dim * mlp_ratio, 3, padding=1, groups=dim * mlp_ratio)
        self.fc2 = nn.Conv2d(dim * mlp_ratio, dim, 1)
        self.act = nn.GELU()

    def forward(self, x):
        B, C, H, W = x.shape
        x = self.norm(x)
        x = self.fc1(x)
        x = self.act(x)
        x = x + self.act(self.pos(x))
        x = self.fc2(x)
        return x
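To put the overhead of self.pos in perspective, here is a rough parameter count. This is only a sketch; the dim=64 and mlp_ratio=4 values are hypothetical, chosen for illustration, not taken from the repo's configs.

```python
import torch.nn as nn

dim, mlp_ratio = 64, 4        # hypothetical values for illustration
hidden = dim * mlp_ratio      # hidden width of the MLP

# self.pos: 3x3 depth-wise conv over the hidden channels
pos = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
pos_params = sum(p.numel() for p in pos.parameters())

# self.fc2: 1x1 point-wise conv projecting back to dim
fc2 = nn.Conv2d(hidden, dim, 1)
fc2_params = sum(p.numel() for p in fc2.parameters())

# depth-wise conv costs only 9 weights + 1 bias per channel,
# a small fraction of the point-wise projection it sits next to
print(pos_params, fc2_params)  # 2560 16448
```

So at least in parameter terms, the depth-wise self.pos is cheap relative to the surrounding 1x1 convolutions; whether its contribution to accuracy is large is a separate (empirical) question.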
Hi, your work is brilliant. Could I ask if there is any plan to release the pre-trained model?
Hello, thank you for your excellent work. Could you please release your training code and pre-trained model?
As I see it, the spatial size of the last stage is 7x7; isn't an 11x11 dwconv in ConvMod too large for it?
I tried to reproduce your model and found that, during training, Conv2Former-T was twice as slow as ConvNeXt-T and required nearly twice as much CUDA memory. Is this normal?
Hello, in the paper the authors state: "The difference is that the convolutional kernels are static while the attention matrix generated by self-attention can adapt to the input."
Yes, the statement is indeed correct. However, I still don't quite get why the authors wrote it like that; please correct me if I'm wrong here.
Consider self-attention first: during inference, the attention matrix is generated by a linear layer, so it adapts to the input. Now consider the convolution: can't we treat the whole conv modulation as one convolution whose kernel is generated by a conv layer, so that the kernel of this conv modulation adapts to the input as well?
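For reference, the conv modulation under discussion can be sketched roughly as below. This is my own reading of the paper's description, not the official code: the depth-wise conv in the branch self.a produces input-dependent weights that multiply the value branch self.v element-wise, which is exactly why the result is input-adaptive even though every kernel involved is static. The channels-first GroupNorm is a stand-in for the repo's custom LayerNorm.

```python
import torch
import torch.nn as nn

class ConvMod(nn.Module):
    """Rough sketch of convolutional modulation (unofficial).

    Static convs generate modulation weights `a` from the input, and
    `a * v` makes the overall mapping input-dependent, analogous to how
    an attention matrix adapts to its input.
    """
    def __init__(self, dim, kernel_size=11):
        super().__init__()
        self.norm = nn.GroupNorm(1, dim)  # stand-in for channels-first LayerNorm
        self.a = nn.Sequential(
            nn.Conv2d(dim, dim, 1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
        )
        self.v = nn.Conv2d(dim, dim, 1)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        x = self.norm(x)
        a = self.a(x)       # input-dependent modulation weights
        x = a * self.v(x)   # element-wise modulation of the value branch
        return self.proj(x)

x = torch.randn(1, 64, 7, 7)   # stage-4-sized feature map (224x224 input)
y = ConvMod(64)(x)
print(y.shape)  # torch.Size([1, 64, 7, 7])
```

Note that with padding = kernel_size // 2, an 11x11 depth-wise kernel on a 7x7 map still yields a 7x7 output, though most of each window then falls on zero padding, which is the concern raised above.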
Is this the official implementation of the paper?
I am interested in Conv2Former; when will you publish the training and inference code and a tutorial?