
external-attention-pytorch's Introduction

external-attention-pytorch's People

Contributors

epsilon-deltta, iscyy, rushi-the-neural-arch, wmkai, xmu-xiaoma666

external-attention-pytorch's Issues

How to use attention module efficiently?

Hi, this repository helped me a lot, thank you.

By the way, I have a question.
Is there a way to apply attention to only certain parts of the image?

In other words, is there a way to specify the part of the image that needs attention?

I want to use the attention modules more efficiently in CV tasks.
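
One common workaround, for what it's worth, is to crop the region of interest, run the attention module on the crop only, and paste the result back. A minimal sketch, assuming any shape-preserving attention module from this repo (the ROI bounds and the `attn` placeholder below are purely illustrative):

    import torch
    import torch.nn as nn

    # Placeholder: swap in any shape-preserving attention module from the repo.
    attn = nn.Identity()

    feat = torch.randn(1, 64, 56, 56)      # (bs, c, h, w) feature map
    y1, y2, x1, x2 = 16, 48, 16, 48        # hypothetical ROI bounds

    roi = attn(feat[:, :, y1:y2, x1:x2])   # attend only to the cropped region
    out = feat.clone()
    out[:, :, y1:y2, x1:x2] = roi          # paste the attended crop back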

Question about Polarized Self-Attention

Hello, thank you very much for your work! In PolarizedSelfAttention.py, when the channel branch and the spatial branch are composed sequentially, does the module output still need to add the channel branch's output channel_out? In other words, is line 88 of the code necessary, or should it simply return spatial_out?
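
For reference, a minimal sketch of the two compositions being asked about; `channel_branch` and `spatial_branch` are hypothetical stand-ins for the PSA sub-modules, not the repo's code:

    def psa_parallel(x, channel_branch, spatial_branch):
        # parallel composition: both branches see x and the outputs are summed
        return channel_branch(x) + spatial_branch(x)

    def psa_sequential(x, channel_branch, spatial_branch):
        # sequential composition: the spatial branch already operates on the
        # channel branch's output, so returning spatial_out alone would suffice
        channel_out = channel_branch(x)
        spatial_out = spatial_branch(channel_out)
        return spatial_out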

Question about the Linear layers in the WeightedPermuteMLP code

WeightedPermuteMLP uses several fully connected (nn.Linear) layers, defined at lines 21-23 of ViP.py:

        self.mlp_c=nn.Linear(dim,dim,bias=qkv_bias)
        self.mlp_h=nn.Linear(dim,dim,bias=qkv_bias)
        self.mlp_w=nn.Linear(dim,dim,bias=qkv_bias)

All of these linear layers have dim input and output features, i.e. the number of channels is unchanged.
In forward, mlp_c is applied directly to x, which is fine:

    def forward(self,x) :
        B,H,W,C=x.shape

        c_embed=self.mlp_c(x)

        S=C//self.seg_dim
        h_embed=x.reshape(B,H,W,self.seg_dim,S).permute(0,3,2,1,4).reshape(B,self.seg_dim,W,H*S)
        h_embed=self.mlp_h(h_embed).reshape(B,self.seg_dim,W,H,S).permute(0,3,2,1,4).reshape(B,H,W,C)

        w_embed=x.reshape(B,H,W,self.seg_dim,S).permute(0,3,1,2,4).reshape(B,self.seg_dim,H,W*S)
        w_embed=self.mlp_w(w_embed).reshape(B,self.seg_dim,H,W,S).permute(0,2,3,1,4).reshape(B,H,W,C)

        weight=(c_embed+h_embed+w_embed).permute(0,3,1,2).flatten(2).mean(2)
        weight=self.reweighting(weight).reshape(B,C,3).permute(2,0,1).softmax(0).unsqueeze(2).unsqueeze(2)

        x=c_embed*weight[0]+w_embed*weight[1]+h_embed*weight[2]

        x=self.proj_drop(self.proj(x))
        return x

The other two linear layers are problematic when used this way.
Look at this step:

h_embed=x.reshape(B,H,W,self.seg_dim,S).permute(0,3,2,1,4).reshape(B,self.seg_dim,W,H*S)

The last dimension is changed to H*S, so at execution time, whenever H*S is not equal to C the following linear layer fails; in fact this step is bound to go wrong in that case.
The paper's own code handles it in a similar way, so I'm not sure how to resolve this.
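
A hedged reading of the shapes involved: with S = C // seg_dim, the last dimension fed to mlp_h is H*S, which equals dim (= C) only when seg_dim == H (and likewise seg_dim == W for mlp_w). A quick check with illustrative sizes:

    # Illustrative sizes only, not repo code.
    B, H, W, C, seg_dim = 2, 8, 8, 64, 8   # choosing seg_dim == H == W
    S = C // seg_dim
    assert H * S == C and W * S == C, \
        "mlp_h / mlp_w only match nn.Linear(dim, dim) when seg_dim == H == W"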

BUG in CoAtNet

Hello, I have recently been using your CoAtNet implementation for some research. After feeding images into it, I found that the final output stride is 16 rather than the 32 stated in the paper. Looking at the source, it seems the last two downsampling steps use 1-D max pooling, so two 1-D pooling steps are needed before the stride actually doubles.

One idea: when the feature map passes through the last two self-attention stages, reshape it back to (B, C, H, W) with view and permute before pooling, apply 2-D max pooling, and then view/permute it back to the input format expected by the self-attention block (a somewhat convoluted approach).

One more question: when downsampling, why do you use max pooling even in the convolutional part instead of a stride-2 convolution? Did you find a justification for this, or is it just simpler to write for now? If I simply missed it in the paper, I'll go stand in the corner for a minute.
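
For reference, a minimal sketch of the reshape-then-2-D-pool workaround described above, assuming a (B, N, C) token layout with N = H*W (this is not the repo's code):

    import torch
    import torch.nn.functional as F

    def downsample_tokens_2d(tokens, h, w):
        """tokens: (B, N, C) with N == h*w; returns downsampled tokens and the new h, w."""
        b, n, c = tokens.shape
        assert n == h * w
        x = tokens.transpose(1, 2).reshape(b, c, h, w)  # back to (B, C, H, W)
        x = F.max_pool2d(x, kernel_size=2)              # one true 2-D downsample: stride doubles once
        b, c, h2, w2 = x.shape
        return x.reshape(b, c, h2 * w2).transpose(1, 2), h2, w2

    tok = torch.randn(1, 14 * 14, 96)
    tok, h, w = downsample_tokens_2d(tok, 14, 14)       # -> (1, 49, 96), h = w = 7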

Does init_weights need to be called explicitly?

Hi, I read your code and noticed that you implement an init_weights method in every attention module (e.g. this link), but it is never called. Does it need to be called explicitly, or will PyTorch call it automatically during initialization?
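
For what it's worth, PyTorch only runs each layer's default reset_parameters(); it will not call a custom init_weights() automatically, so it does need to be called explicitly (typically at the end of __init__). A minimal sketch of that pattern, not the repo's code:

    import torch.nn as nn

    class ToyBlock(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.fc = nn.Linear(dim, dim)
            self.init_weights()                      # explicit call is required

        def init_weights(self):
            for m in self.modules():
                if isinstance(m, nn.Linear):
                    nn.init.normal_(m.weight, std=0.001)
                    if m.bias is not None:
                        nn.init.constant_(m.bias, 0)

        def forward(self, x):
            return self.fc(x)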

About coatnet

I feel the CoAtNet implementation here has problems in quite a few places (and, to be fair, the CoAtNet paper itself leaves many details unexplained).
The most important concept, in my view, is what the authors call relative attention. The paper does not dwell on the concept itself, but it builds its combined convolution/self-attention weight formula on top of it. Most importantly, the authors fuse convolution with the transformer by introducing a global static convolution kernel (put more simply: the blocks in the paper's model figure are Rel-Attention, not plain Attention). Honestly, I do not see this global static kernel anywhere in your implementation.
Also, I don't seem to see any residual connections, i.e. x = out + x.
Sorry, it's late at night and my head is a bit fuzzy, so some of the wording may be off, but I think the core issues come across.
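
For reference, a minimal 1-D sketch of the relative-attention idea being described: the attention logits receive an input-independent, learned bias w_{i-j} (the "global static kernel"), and the block output is added back to the input as a residual. This is only an illustration of the concept, not a claim about how the repo's CoAtNet should be written:

    import torch
    import torch.nn as nn

    class RelAttention1D(nn.Module):
        def __init__(self, dim, n_tokens):
            super().__init__()
            self.qkv = nn.Linear(dim, 3 * dim, bias=False)
            self.rel_bias = nn.Parameter(torch.zeros(2 * n_tokens - 1))  # one weight per offset i-j
            idx = torch.arange(n_tokens)
            self.register_buffer("offset", idx[None, :] - idx[:, None] + n_tokens - 1)

        def forward(self, x):                                 # x: (B, N, C)
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            logits = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
            logits = logits + self.rel_bias[self.offset]      # static, input-independent term
            out = logits.softmax(dim=-1) @ v
            return out + x                                    # the residual the issue also asks about

    print(RelAttention1D(dim=32, n_tokens=49)(torch.randn(2, 49, 32)).shape)  # torch.Size([2, 49, 32])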

Question about PSA

The PSA code seems to apply convolutions at four different scales to the original feature map and then concatenate the four resulting feature maps, right? A post on Zhihu says the feature map is first split into four parts, which are then convolved and concatenated.

Some errors when using PSA

I wanted to insert the PSA module into FCOS, but I ran into some errors.
Fortunately, I have solved them, and I will show the changes I made.
There are three changes. The first one is in the __init__ function.

def __init__(self, channel=512, reduction=4, S=4):
    super().__init__()
    self.S = S

    self.convs = nn.ModuleList([])
    for i in range(S):
        # Add groups
        self.convs.append(nn.Conv2d(channel//S, channel//S, kernel_size=2*(i+1)+1, padding=i+1, groups=2**i))

    self.se_blocks = nn.ModuleList([])
    for i in range(S):
        self.se_blocks.append(nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channel//S, channel//(S*reduction), kernel_size=1, bias=False),
            nn.ReLU(inplace=False),
            nn.Conv2d(channel//(S*reduction), channel//S, kernel_size=1, bias=False),
            nn.Sigmoid()
        ))

    self.softmax = nn.Softmax(dim=1)

I found that the original EPSANet uses different groups settings for the different SPC convolutions, so I added groups in the same places. But I am not sure this is right.
Second, also in this function, I changed self.convs=[] and self.se_blocks=[] to self.convs=nn.ModuleList([]) and self.se_blocks=nn.ModuleList([]). When I trained my model on the GPU, the initial weights were not moved to the GPU while they were plain Python lists; using nn.ModuleList() fixes that.
Third, in the forward function. When I trained the model, I got this error:
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 128, 32, 32]], which is output 0 of SliceBackward, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
After trying many different ways, I found the reason: the error is caused by variables being modified in place inside the forward function. Here is my code.
def forward(self, x):
    b, c, h, w = x.size()

    # Step1: SPC module
    PSA_input = x.view(b, self.S, c//self.S, h, w)  # bs,s,ci,h,w
    outs = []
    for idx, conv in enumerate(self.convs):
        SPC_input = PSA_input[:, idx, :, :, :]
        # SPC_out[:,idx,:,:,:] = conv(SPC_input)   # old in-place version
        outs.append(conv(SPC_input))
    SPC_out = torch.stack(outs, dim=1)

    # Step2: SE weight
    outs = []
    for idx, se in enumerate(self.se_blocks):
        SE_input = SPC_out[:, idx, :, :, :]
        # SE_out[:,idx,:,:,:] = se(SE_input)       # old in-place version
        outs.append(se(SE_input))
    SE_out = torch.stack(outs, dim=1)

    # Step3: Softmax
    softmax_out = self.softmax(SE_out)

    # Step4: SPA
    PSA_out = SPC_out * softmax_out
    PSA_out = PSA_out.view(b, -1, h, w)

    return PSA_out

Now my model with PSA can be trained, but I don't know the results yet.
I think these changes don't change the structure of PSA, but I am not sure. I hope you can check whether these changes are right.
Finally, thanks for your work.

Maybe an error in SKAttention?

First of all, thank you for your work.
attention_weughts=self.softmax(attention_weughts)#k,bs,channel,1,1
At this point attention_weughts has shape (k, bs, channel, 1, 1), and the softmax should be taken over the k dimension.
So in SKAttention, should self.softmax=nn.Softmax(dim=1) be changed to self.softmax=nn.Softmax(dim=0)?
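
A quick check of the dimension in question (illustrative only):

    import torch

    w = torch.randn(3, 4, 64, 1, 1)                   # (k, bs, channel, 1, 1)
    print(w.softmax(dim=0).sum(dim=0).flatten()[:3])  # all ≈ 1.0: dim=0 normalises over the k branches
    print(w.softmax(dim=1).sum(dim=1).flatten()[:3])  # all ≈ 1.0 over the batch dimension instead
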
@xmu-xiaoma666 Looking forward to your reply.

About HaloNet

Hi, when running the HaloNet core code, I found that two different randomly generated inputs produce the same output. What could be causing this?

Maybe an error in mlp/mlp_mixer.py

Dear Author:
Hello.
I found an issue here. After reading the paper, the skip-connections should be as in the figure below:
[figure from the MLP-Mixer paper showing the skip-connections in a Mixer block]

And the code here should be:

class MixerBlock(nn.Module):
    def __init__(self,tokens_mlp_dim=16,channels_mlp_dim=1024,tokens_hidden_dim=32,channels_hidden_dim=1024):
        super().__init__()
        self.ln=nn.LayerNorm(channels_mlp_dim)
        self.tokens_mlp_block=MlpBlock(tokens_mlp_dim,mlp_dim=tokens_hidden_dim)
        self.channels_mlp_block=MlpBlock(channels_mlp_dim,mlp_dim=channels_hidden_dim)

    def forward(self,x):
        """
        x: (bs,tokens,channels)
        """
        ### tokens mixing
        y=self.ln(x)
        y=y.transpose(1,2) #(bs,channels,tokens)
        y=self.tokens_mlp_block(y) #(bs,channels,tokens)
        ### channels mixing
        y=y.transpose(1,2) #(bs,tokens,channels)
        # fixme: start
        out=x+y #(bs,tokens,channels)
        y=self.ln(out) #(bs,tokens,channels)
        y=out+self.channels_mlp_block(y) #(bs,tokens,channels)
        # fixme: end
        return y

Looking forward to your reply!
Best wishes!

Question about the input and output parameters

Thank you for putting this together, it is very helpful. I have a few questions about the parameters and hope you can help:
from attention.DANet import DAModule
import torch

input=torch.randn(50,512,7,7) → (512,7,7) is the feature-map size, right? And what does the 50 mean, the batch size?
danet=DAModule(d_model=512,kernel_size=3,H=7,W=7) → what does d_model stand for?
print(danet(input).shape)
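
My reading of those arguments, based only on the snippet above (a hedged annotation, not an authoritative answer): the tensor layout is (batch_size, channels, H, W), and d_model is the channel dimension the attention operates on.

    import torch
    from attention.DANet import DAModule      # same import as in the snippet above

    input = torch.randn(50, 512, 7, 7)        # (batch_size=50, channels=512, H=7, W=7)
    danet = DAModule(d_model=512, kernel_size=3, H=7, W=7)  # d_model = number of input channels
    print(danet(input).shape)                 # expected: torch.Size([50, 512, 7, 7])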

CondConv and DynamicConv are the same

I think there is a problem with the CondConv implementation: it is almost identical to DynamicConv and differs from the architecture in the original paper (according to the paper, it should not have an attention module).
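
For reference, a sketch of the routing difference as I read the two papers (not this repo's code): CondConv weights its expert kernels with a sigmoid routing function, while DynamicConv uses a temperature-softened softmax over the K kernels.

    import torch

    def condconv_routing(x, w_r):                  # x: (B, C, H, W), w_r: (C, K)
        pooled = x.mean(dim=(2, 3))                # global average pooling -> (B, C)
        return torch.sigmoid(pooled @ w_r)         # (B, K): independent weight per expert kernel

    def dynamicconv_routing(x, w_r, temperature=30.0):
        pooled = x.mean(dim=(2, 3))
        return torch.softmax(pooled @ w_r / temperature, dim=1)  # (B, K): weights sum to 1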

For ExternalAttention, should queries have shape (bs, n, c)? Does c refer to channels?

def forward(self, queries):
    attn=self.mk(queries) #bs,n,S
    attn=self.softmax(attn) #bs,n,S
    attn=attn/torch.sum(attn,dim=2,keepdim=True) #bs,n,S
    out=self.mv(attn) #bs,n,d_model
    return out

I want to use this in segmentation code. For queries of shape (bs, n, c), is c split by class, or is it simply the number of channels of my feature map?
My feature maps are (bs, c, m, n), i.e. bs maps with c channels and spatial size m*n. Do I need to convert them to the (bs, n, c) format first?
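
My hedged reading: c is simply the channel dimension of your feature map (it becomes d_model), not anything class-related, and n is the number of spatial positions. A sketch of the reshape for a (bs, c, m, n) feature map:

    import torch

    feat = torch.randn(2, 512, 16, 16)                       # (bs, c, m, n) from the backbone
    bs, c, m, n = feat.shape
    tokens = feat.flatten(2).transpose(1, 2)                  # (bs, m*n, c): feed this to ExternalAttention(d_model=c)
    # out = external_attention(tokens)                        # hypothetical call on the module
    # feat_out = out.transpose(1, 2).reshape(bs, c, m, n)     # back to (bs, c, m, n) for the segmentation head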

Request for sample code

Regarding where to embed the attention modules: could anyone please provide examples of embedding each attention module into a model structure such as AlexNet?
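
Until an official example appears, here is a minimal sketch (not from the repo) of where an attention block is usually inserted in a plain CNN: after a conv stage and before the next downsampling. `attn` is a placeholder for any shape-preserving attention module from this repo, built for 64 channels here.

    import torch
    import torch.nn as nn

    class TinyNetWithAttention(nn.Module):
        def __init__(self, attn: nn.Module, num_classes=10):
            super().__init__()
            self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
            self.attn = attn                          # e.g. a channel-attention block built for 64 channels
            self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
            self.fc = nn.Linear(128, num_classes)

        def forward(self, x):
            x = self.stage1(x)
            x = self.attn(x)                          # attention on the (B, 64, H, W) feature map
            x = self.stage2(x).flatten(1)
            return self.fc(x)

    model = TinyNetWithAttention(attn=nn.Identity())  # swap Identity for a real attention module
    print(model(torch.randn(2, 3, 32, 32)).shape)     # torch.Size([2, 10])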

Bug in CBAM

[screenshot of the CBAM code in question]

Hello, should self.maxpool be an adaptive max pooling (nn.AdaptiveMaxPool2d)?
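
If the intent is a global max pool down to (B, C, 1, 1) regardless of the input size, the usual choice would be the following (a guess at the intended fix, not a confirmed patch):

    import torch
    import torch.nn as nn

    maxpool = nn.AdaptiveMaxPool2d(1)                  # global max pooling to (B, C, 1, 1)
    print(maxpool(torch.randn(2, 64, 14, 14)).shape)   # torch.Size([2, 64, 1, 1])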

MobileViT's structure output is not consistent with the paper

Thank you for the great work.
Here is the paper's architecture figure:
[image failed to upload]

I printed the layer inputs and outputs below:
0 fc x.shape torch.Size([1, 3, 224, 224])
1 fc y.shape torch.Size([1, 16, 112, 112])
2 fc y.shape torch.Size([1, 16, 112, 112])
3 fc y.shape torch.Size([1, 24, 112, 112])
4 fc y.shape torch.Size([1, 24, 112, 112])
5 fc y.shape torch.Size([1, 24, 112, 112])
m_vits 1 b y.shape torch.Size([1, 48, 112, 112])
m_vits 1 b y.shape torch.Size([1, 48, 112, 112])
m_vits 2 b y.shape torch.Size([1, 64, 112, 112])
m_vits 2 b y.shape torch.Size([1, 64, 112, 112])
m_vits 3 b y.shape torch.Size([1, 80, 112, 112])
m_vits 3 b y.shape torch.Size([1, 80, 112, 112])
2222 fc y.shape torch.Size([1, 320, 112, 112])
3 fc y.shape torch.Size([1, 3595520])

Error when using ExternalAttention

Has anyone run into this error?
RuntimeError: mat1 dim 1 must match mat2 dim 0
The nn.Linear setup in the program looks fine:
(mk): Linear(in_features=576, out_features=8, bias=False)
(mv): Linear(in_features=8, out_features=576, bias=False)

Also, the main program works fine when using EAattention.

Citing this work

Hello. I am currently writing a paper that uses some of the algorithms you reimplemented, so in addition to citing the original authors I would also like to cite your work. Should I cite your GitHub repository directly, or is there a related paper I can cite? Thank you very much; your code has brought a lot of convenience and new ideas to our research.

How is the reproduced performance?

Hi, this is a great and concise reimplementation of the MLP works. I'm wondering how the performance of the reimplemented versions compares to the results reported in the original manuscripts. It would be much appreciated if you could share experimental results.

CoAtNet no residuals

Hi guys,

I've noticed that your CoAtNet has no residual connections or normalization layers.

Padding value should match the dilation value

Hi

When I checked this line, I realized that to keep the spatial size unchanged, the padding value should match the dilation value: with kernel_size = 3 and stride = 1 (the default), 2p = d(3-1) = 2d, so p = d (not the constant 1).


self.sa.add_module('conv_%d'%i,nn.Conv2d(kernel_size=3,in_channels=channel//reduction,out_channels=channel//reduction,padding=1,dilation=dia_val))
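
A quick check of the claim: with kernel_size=3 and stride=1, the spatial size is preserved only when the padding equals the dilation.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 8, 7, 7)
    for d in (1, 2, 3):
        y_fixed = nn.Conv2d(8, 8, kernel_size=3, padding=d, dilation=d)(x)  # padding = dilation
        y_repo = nn.Conv2d(8, 8, kernel_size=3, padding=1, dilation=d)(x)   # padding = 1
        print(d, tuple(y_fixed.shape[2:]), tuple(y_repo.shape[2:]))
    # d=1: (7, 7) vs (7, 7);  d=2: (7, 7) vs (5, 5);  d=3: (7, 7) vs (3, 3)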

Best

Installation

This collection is very well put together, thanks for the hard work. One question: can it be installed via pip or some other way, or do I need to git clone the entire repository?

Need Help in modifying the given model

I need to modify the following model by adding one linear layer, followed by a dropout layer, and finally another linear layer (which concatenates the dropout output with tabular data of 12 columns) to produce a single regression value as output.

Model class link:–> https://github.com/xmu-xiaoma666/External-Attention-pytorch/blob/master/model/attention/CoAtNet.py

I tried this:

class coAtNet_Model(nn.Module):
    def __init__(self):
        super(coAtNet_Model, self).__init__()
        self.model = CoAtNet(3,224)
        self.classifier = nn.Linear(14, 128)
        self.dropout = nn.Dropout(0.1)
        self.out = nn.Linear(128 + 12, 1)

    def forward(self, image, tabular_data_inputs):
        x = self.model(image)
        x = self.classifier(x)
        x = self.dropout(x)
        x = torch.cat([x, tabular_data_inputs], dim=1)
        x = self.out(x)

        return x
model = coAtNet_Model()

but I am getting this error:

-->x = torch.cat([x, tabular_data_inputs], dim=1)
   x = self.out(x)

RuntimeError: Tensors must have same number of dimensions: got 2 and 4

Please help me with this.
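
The error comes from concatenating a 4-D feature map with the 2-D tabular tensor. One hedged fix is to pool and flatten the backbone output to (B, C) before the linear layers; the backbone_channels argument below is an assumption, so check the actual output shape of CoAtNet(3, 224) in your setup.

    import torch
    import torch.nn as nn

    class coAtNet_Model(nn.Module):
        def __init__(self, backbone: nn.Module, backbone_channels: int):
            super().__init__()
            self.model = backbone                      # e.g. CoAtNet(3, 224)
            self.pool = nn.AdaptiveAvgPool2d(1)        # (B, C, H, W) -> (B, C, 1, 1)
            self.classifier = nn.Linear(backbone_channels, 128)
            self.dropout = nn.Dropout(0.1)
            self.out = nn.Linear(128 + 12, 1)

        def forward(self, image, tabular_data_inputs):
            x = self.model(image)
            if x.dim() == 4:                           # flatten a 4-D feature map to (B, C)
                x = self.pool(x).flatten(1)
            x = self.dropout(self.classifier(x))
            x = torch.cat([x, tabular_data_inputs], dim=1)  # both tensors are now 2-D
            return self.out(x)

    # model = coAtNet_Model(CoAtNet(3, 224), backbone_channels=...)  # fill in the real channel count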

Maybe an error in attention/VIP.py

Dear Author:
Hello.
I found an issue here:
x=h_embed*weight[0]+w_embed*weight[1]+h_embed*weight[2]
Maybe it should be x=c_embed*weight[0]+w_embed*weight[1]+h_embed*weight[2]
Thanks

No Adaptive Kernel Size Being Used in ECA Attention.

Hi, in the current code for ECA attention, the kernel size of the convolution layer has to be passed in as a parameter, but in the original paper the kernel size is determined by a mapping function that takes the number of channels as input.

Refer to Figure 3 of the paper.

Do you think the code in this repo needs to change accordingly? Or could you explain the reasoning behind using a fixed kernel size?
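
For reference, a sketch of the adaptive rule described in the ECA paper, k = psi(C) = |log2(C)/gamma + b/gamma| adjusted to the nearest odd number, with the paper's defaults gamma=2 and b=1 (this is not the repo's current code):

    import math

    def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        return t if t % 2 else t + 1      # force an odd kernel size

    print([eca_kernel_size(c) for c in (64, 128, 256, 512)])  # [3, 5, 5, 5]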

torch load mobilevit_s.pt error

mvit_s = mobilevit_s()
checkpoint = torch.load("mobilevit_s.pt",map_location='cpu')
mvit_s.load_state_dict(checkpoint)
I downloaded the checkpoint from https://github.com/apple/ml-cvnets/blob/main/examples/README-mobilevit.md.
How can I load it? I get the following error:

RuntimeError: Error(s) in loading state_dict for MobileViT:
Missing key(s) in state_dict: "conv_1.0.weight", "conv_1.0.bias", "conv_1.1.weight", "conv_1.1.bias", "conv_1.1.running_mean", "conv_1.1.running_var", "mv2.0.conv.0.weight", "mv2.0.conv.1.weight", "mv2.0.conv.1.bias", "mv2.0.conv.1.running_mean", "mv2.0.conv.1.running_var", "mv2.0.conv.3.weight", "mv2.0.conv.4.weight", "mv2.0.conv.4.bias", "mv2.0.conv.4.running_mean", "mv2.0.conv.4.running_var", "mv2.0.conv.6.weight", "mv2.0.conv.8.weight", "mv2.0.conv.8.bias", "mv2.0.conv.8.running_mean", "mv2.0.conv.8.running_var", "mv2.1.conv.0.weight", "mv2.1.conv.1.weight", "mv2.1.conv.1.bias", "mv2.1.conv.1.running_mean", "mv2.1.conv.1.running_var", "mv2.1.conv.3.weight", "mv2.1.conv.4.weight", "mv2.1.conv.4.bias", "mv2.1.conv.4.running_mean", "mv2.1.conv.4.running_var", "mv2.1.conv.6.weight", "mv2.1.conv.8.weight", "mv2.1.conv.8.bias", "mv2.1.conv.8.running_mean", "mv2.1.conv.8.running_var", "mv2.2.conv.0.weight", "mv2.2.conv.1.weight", "mv2.2.conv.1.bias", "mv2.2.conv.1.running_mean", "mv2.2.conv.1.running_var", "mv2.2.conv.3.weight", "mv2.2.conv.4.weight", "mv2.2.conv.4.bias", "mv2.2.conv.4.running_mean", "mv2.2.conv.4.running_var", "mv2.2.conv.6.weight", "mv2.2.conv.8.weight", "mv2.2.conv.8.bias", "mv2.2.conv.8.running_mean", "mv2.2.conv.8.running_var", "mv2.3.conv.0.weight", "mv2.3.conv.1.weight", "mv2.3.conv.1.bias", "mv2.3.conv.1.running_mean", "mv2.3.conv.1.running_var", "mv2.3.conv.3.weight", "mv2.3.conv.4.weight", "mv2.3.conv.4.bias", "mv2.3.conv.4.running_mean", "mv2.3.conv.4.running_var", "mv2.3.conv.6.weight", "mv2.3.conv.8.weight", "mv2.3.conv.8.bias", "mv2.3.conv.8.running_mean", "mv2.3.conv.8.running_var", "mv2.4.conv.0.weight", "mv2.4.conv.1.weight", "mv2.4.conv.1.bias", "mv2.4.conv.1.running_mean", "mv2.4.conv.1.running_var", "mv2.4.conv.3.weight", "mv2.4.conv.4.weight", "mv2.4.conv.4.bias", "mv2.4.conv.4.running_mean", "mv2.4.conv.4.running_var", "mv2.4.conv.6.weight", "mv2.4.conv.8.weight", "mv2.4.conv.8.bias", "mv2.4.conv.8.running_mean", "mv2.4.conv.8.running_var", "mv2.5.conv.0.weight", "mv2.5.conv.1.weight", "mv2.5.conv.1.bias", "mv2.5.conv.1.running_mean", "mv2.5.conv.1.running_var", "mv2.5.conv.3.weight", "mv2.5.conv.4.weight", "mv2.5.conv.4.bias", "mv2.5.conv.4.running_mean", "mv2.5.conv.4.running_var", "mv2.5.conv.6.weight", "mv2.5.conv.8.weight", "mv2.5.conv.8.bias", "mv2.5.conv.8.running_mean", "mv2.5.conv.8.running_var", "mv2.6.conv.0.weight", "mv2.6.conv.1.weight", "mv2.6.conv.1.bias", "mv2.6.conv.1.running_mean", "mv2.6.conv.1.running_var", "mv2.6.conv.3.weight", "mv2.6.conv.4.weight", "mv2.6.conv.4.bias", "mv2.6.conv.4.running_mean", "mv2.6.conv.4.running_var", "mv2.6.conv.6.weight", "mv2.6.conv.8.weight", "mv2.6.conv.8.bias", "mv2.6.conv.8.running_mean", "mv2.6.conv.8.running_var", "m_vits.0.conv_1.weight", "m_vits.0.conv_1.bias", "m_vits.0.conv2.weight", "m_vits.0.conv2.bias", "m_vits.0.trans.layers.0.0.ln.weight", "m_vits.0.trans.layers.0.0.ln.bias", "m_vits.0.trans.layers.0.0.fn.to_qkv.weight", "m_vits.0.trans.layers.0.0.fn.to_out.0.weight", "m_vits.0.trans.layers.0.0.fn.to_out.0.bias", "m_vits.0.trans.layers.0.1.ln.weight", "m_vits.0.trans.layers.0.1.ln.bias", "m_vits.0.trans.layers.0.1.fn.net.0.weight", "m_vits.0.trans.layers.0.1.fn.net.0.bias", "m_vits.0.trans.layers.0.1.fn.net.3.weight", "m_vits.0.trans.layers.0.1.fn.net.3.bias", "m_vits.0.trans.layers.1.0.ln.weight", "m_vits.0.trans.layers.1.0.ln.bias", "m_vits.0.trans.layers.1.0.fn.to_qkv.weight", "m_vits.0.trans.layers.1.0.fn.to_out.0.weight", "m_vits.0.trans.layers.1.0.fn.to_out.0.bias", 
"m_vits.0.trans.layers.1.1.ln.weight", "m_vits.0.trans.layers.1.1.ln.bias", "m_vits.0.trans.layers.1.1.fn.net.0.weight", "m_vits.0.trans.layers.1.1.fn.net.0.bias", "m_vits.0.trans.layers.1.1.fn.net.3.weight", "m_vits.0.trans.layers.1.1.fn.net.3.bias", "m_vits.0.conv3.weight", "m_vits.0.conv3.bias", "m_vits.0.conv4.weight", "m_vits.0.conv4.bias", "m_vits.1.conv_1.weight", "m_vits.1.conv_1.bias", "m_vits.1.conv2.weight", "m_vits.1.conv2.bias", "m_vits.1.trans.layers.0.0.ln.weight", "m_vits.1.trans.layers.0.0.ln.bias", "m_vits.1.trans.layers.0.0.fn.to_qkv.weight", "m_vits.1.trans.layers.0.0.fn.to_out.0.weight", "m_vits.1.trans.layers.0.0.fn.to_out.0.bias", "m_vits.1.trans.layers.0.1.ln.weight", "m_vits.1.trans.layers.0.1.ln.bias", "m_vits.1.trans.layers.0.1.fn.net.0.weight", "m_vits.1.trans.layers.0.1.fn.net.0.bias", "m_vits.1.trans.layers.0.1.fn.net.3.weight", "m_vits.1.trans.layers.0.1.fn.net.3.bias", "m_vits.1.trans.layers.1.0.ln.weight", "m_vits.1.trans.layers.1.0.ln.bias", "m_vits.1.trans.layers.1.0.fn.to_qkv.weight", "m_vits.1.trans.layers.1.0.fn.to_out.0.weight", "m_vits.1.trans.layers.1.0.fn.to_out.0.bias", "m_vits.1.trans.layers.1.1.ln.weight", "m_vits.1.trans.layers.1.1.ln.bias", "m_vits.1.trans.layers.1.1.fn.net.0.weight", "m_vits.1.trans.layers.1.1.fn.net.0.bias", "m_vits.1.trans.layers.1.1.fn.net.3.weight", "m_vits.1.trans.layers.1.1.fn.net.3.bias", "m_vits.1.trans.layers.2.0.ln.weight", "m_vits.1.trans.layers.2.0.ln.bias", "m_vits.1.trans.layers.2.0.fn.to_qkv.weight", "m_vits.1.trans.layers.2.0.fn.to_out.0.weight", "m_vits.1.trans.layers.2.0.fn.to_out.0.bias", "m_vits.1.trans.layers.2.1.ln.weight", "m_vits.1.trans.layers.2.1.ln.bias", "m_vits.1.trans.layers.2.1.fn.net.0.weight", "m_vits.1.trans.layers.2.1.fn.net.0.bias", "m_vits.1.trans.layers.2.1.fn.net.3.weight", "m_vits.1.trans.layers.2.1.fn.net.3.bias", "m_vits.1.trans.layers.3.0.ln.weight", "m_vits.1.trans.layers.3.0.ln.bias", "m_vits.1.trans.layers.3.0.fn.to_qkv.weight", "m_vits.1.trans.layers.3.0.fn.to_out.0.weight", "m_vits.1.trans.layers.3.0.fn.to_out.0.bias", "m_vits.1.trans.layers.3.1.ln.weight", "m_vits.1.trans.layers.3.1.ln.bias", "m_vits.1.trans.layers.3.1.fn.net.0.weight", "m_vits.1.trans.layers.3.1.fn.net.0.bias", "m_vits.1.trans.layers.3.1.fn.net.3.weight", "m_vits.1.trans.layers.3.1.fn.net.3.bias", "m_vits.1.conv3.weight", "m_vits.1.conv3.bias", "m_vits.1.conv4.weight", "m_vits.1.conv4.bias", "m_vits.2.conv_1.weight", "m_vits.2.conv_1.bias", "m_vits.2.conv2.weight", "m_vits.2.conv2.bias", "m_vits.2.trans.layers.0.0.ln.weight", "m_vits.2.trans.layers.0.0.ln.bias", "m_vits.2.trans.layers.0.0.fn.to_qkv.weight", "m_vits.2.trans.layers.0.0.fn.to_out.0.weight", "m_vits.2.trans.layers.0.0.fn.to_out.0.bias", "m_vits.2.trans.layers.0.1.ln.weight", "m_vits.2.trans.layers.0.1.ln.bias", "m_vits.2.trans.layers.0.1.fn.net.0.weight", "m_vits.2.trans.layers.0.1.fn.net.0.bias", "m_vits.2.trans.layers.0.1.fn.net.3.weight", "m_vits.2.trans.layers.0.1.fn.net.3.bias", "m_vits.2.trans.layers.1.0.ln.weight", "m_vits.2.trans.layers.1.0.ln.bias", "m_vits.2.trans.layers.1.0.fn.to_qkv.weight", "m_vits.2.trans.layers.1.0.fn.to_out.0.weight", "m_vits.2.trans.layers.1.0.fn.to_out.0.bias", "m_vits.2.trans.layers.1.1.ln.weight", "m_vits.2.trans.layers.1.1.ln.bias", "m_vits.2.trans.layers.1.1.fn.net.0.weight", "m_vits.2.trans.layers.1.1.fn.net.0.bias", "m_vits.2.trans.layers.1.1.fn.net.3.weight", "m_vits.2.trans.layers.1.1.fn.net.3.bias", "m_vits.2.trans.layers.2.0.ln.weight", "m_vits.2.trans.layers.2.0.ln.bias", 
"m_vits.2.trans.layers.2.0.fn.to_qkv.weight", "m_vits.2.trans.layers.2.0.fn.to_out.0.weight", "m_vits.2.trans.layers.2.0.fn.to_out.0.bias", "m_vits.2.trans.layers.2.1.ln.weight", "m_vits.2.trans.layers.2.1.ln.bias", "m_vits.2.trans.layers.2.1.fn.net.0.weight", "m_vits.2.trans.layers.2.1.fn.net.0.bias", "m_vits.2.trans.layers.2.1.fn.net.3.weight", "m_vits.2.trans.layers.2.1.fn.net.3.bias", "m_vits.2.conv3.weight", "m_vits.2.conv3.bias", "m_vits.2.conv4.weight", "m_vits.2.conv4.bias", "conv2.0.weight", "conv2.0.bias", "conv2.1.weight", "conv2.1.bias", "conv2.1.running_mean", "conv2.1.running_var", "fc.weight".
Unexpected key(s) in state_dict: "layer_1.0.block.exp_1x1.block.conv.weight", "layer_1.0.block.exp_1x1.block.norm.weight", "layer_1.0.block.exp_1x1.block.norm.bias", "layer_1.0.block.exp_1x1.block.norm.running_mean", "layer_1.0.block.exp_1x1.block.norm.running_var", "layer_1.0.block.exp_1x1.block.norm.num_batches_tracked", "layer_1.0.block.conv_3x3.block.conv.weight", "layer_1.0.block.conv_3x3.block.norm.weight", "layer_1.0.block.conv_3x3.block.norm.bias", "layer_1.0.block.conv_3x3.block.norm.running_mean", "layer_1.0.block.conv_3x3.block.norm.running_var", "layer_1.0.block.conv_3x3.block.norm.num_batches_tracked", "layer_1.0.block.red_1x1.block.conv.weight", "layer_1.0.block.red_1x1.block.norm.weight", "layer_1.0.block.red_1x1.block.norm.bias", "layer_1.0.block.red_1x1.block.norm.running_mean", "layer_1.0.block.red_1x1.block.norm.running_var", "layer_1.0.block.red_1x1.block.norm.num_batches_tracked", "layer_2.0.block.exp_1x1.block.conv.weight", "layer_2.0.block.exp_1x1.block.norm.weight", "layer_2.0.block.exp_1x1.block.norm.bias", "layer_2.0.block.exp_1x1.block.norm.running_mean", "layer_2.0.block.exp_1x1.block.norm.running_var", "layer_2.0.block.exp_1x1.block.norm.num_batches_tracked", "layer_2.0.block.conv_3x3.block.conv.weight", "layer_2.0.block.conv_3x3.block.norm.weight", "layer_2.0.block.conv_3x3.block.norm.bias", "layer_2.0.block.conv_3x3.block.norm.running_mean", "layer_2.0.block.conv_3x3.block.norm.running_var", "layer_2.0.block.conv_3x3.block.norm.num_batches_tracked", "layer_2.0.block.red_1x1.block.conv.weight", "layer_2.0.block.red_1x1.block.norm.weight", "layer_2.0.block.red_1x1.block.norm.bias", "layer_2.0.block.red_1x1.block.norm.running_mean", "layer_2.0.block.red_1x1.block.norm.running_var", "layer_2.0.block.red_1x1.block.norm.num_batches_tracked", "layer_2.1.block.exp_1x1.block.conv.weight", "layer_2.1.block.exp_1x1.block.norm.weight", "layer_2.1.block.exp_1x1.block.norm.bias", "layer_2.1.block.exp_1x1.block.norm.running_mean", "layer_2.1.block.exp_1x1.block.norm.running_var", "layer_2.1.block.exp_1x1.block.norm.num_batches_tracked", "layer_2.1.block.conv_3x3.block.conv.weight", "layer_2.1.block.conv_3x3.block.norm.weight", "layer_2.1.block.conv_3x3.block.norm.bias", "layer_2.1.block.conv_3x3.block.norm.running_mean", "layer_2.1.block.conv_3x3.block.norm.running_var", "layer_2.1.block.conv_3x3.block.norm.num_batches_tracked", "layer_2.1.block.red_1x1.block.conv.weight", "layer_2.1.block.red_1x1.block.norm.weight", "layer_2.1.block.red_1x1.block.norm.bias", "layer_2.1.block.red_1x1.block.norm.running_mean", "layer_2.1.block.red_1x1.block.norm.running_var", "layer_2.1.block.red_1x1.block.norm.num_batches_tracked", "layer_2.2.block.exp_1x1.block.conv.weight", "layer_2.2.block.exp_1x1.block.norm.weight", "layer_2.2.block.exp_1x1.block.norm.bias", "layer_2.2.block.exp_1x1.block.norm.running_mean", "layer_2.2.block.exp_1x1.block.norm.running_var", "layer_2.2.block.exp_1x1.block.norm.num_batches_tracked", "layer_2.2.block.conv_3x3.block.conv.weight", "layer_2.2.block.conv_3x3.block.norm.weight", "layer_2.2.block.conv_3x3.block.norm.bias", "layer_2.2.block.conv_3x3.block.norm.running_mean", "layer_2.2.block.conv_3x3.block.norm.running_var", "layer_2.2.block.conv_3x3.block.norm.num_batches_tracked", "layer_2.2.block.red_1x1.block.conv.weight", "layer_2.2.block.red_1x1.block.norm.weight", "layer_2.2.block.red_1x1.block.norm.bias", "layer_2.2.block.red_1x1.block.norm.running_mean", "layer_2.2.block.red_1x1.block.norm.running_var", 
"layer_2.2.block.red_1x1.block.norm.num_batches_tracked", "layer_3.0.block.exp_1x1.block.conv.weight", "layer_3.0.block.exp_1x1.block.norm.weight", "layer_3.0.block.exp_1x1.block.norm.bias", "layer_3.0.block.exp_1x1.block.norm.running_mean", "layer_3.0.block.exp_1x1.block.norm.running_var", "layer_3.0.block.exp_1x1.block.norm.num_batches_tracked", "layer_3.0.block.conv_3x3.block.conv.weight", "layer_3.0.block.conv_3x3.block.norm.weight", "layer_3.0.block.conv_3x3.block.norm.bias", "layer_3.0.block.conv_3x3.block.norm.running_mean", "layer_3.0.block.conv_3x3.block.norm.running_var", "layer_3.0.block.conv_3x3.block.norm.num_batches_tracked", "layer_3.0.block.red_1x1.block.conv.weight", "layer_3.0.block.red_1x1.block.norm.weight", "layer_3.0.block.red_1x1.block.norm.bias", "layer_3.0.block.red_1x1.block.norm.running_mean", "layer_3.0.block.red_1x1.block.norm.running_var", "layer_3.0.block.red_1x1.block.norm.num_batches_tracked", "layer_3.1.local_rep.conv_3x3.block.conv.weight", "layer_3.1.local_rep.conv_3x3.block.norm.weight", "layer_3.1.local_rep.conv_3x3.block.norm.bias", "layer_3.1.local_rep.conv_3x3.block.norm.running_mean", "layer_3.1.local_rep.conv_3x3.block.norm.running_var", "layer_3.1.local_rep.conv_3x3.block.norm.num_batches_tracked", "layer_3.1.local_rep.conv_1x1.block.conv.weight", "layer_3.1.global_rep.0.pre_norm_mha.0.weight", "layer_3.1.global_rep.0.pre_norm_mha.0.bias", "layer_3.1.global_rep.0.pre_norm_mha.1.qkv_proj.weight", "layer_3.1.global_rep.0.pre_norm_mha.1.qkv_proj.bias", "layer_3.1.global_rep.0.pre_norm_mha.1.out_proj.weight", "layer_3.1.global_rep.0.pre_norm_mha.1.out_proj.bias", "layer_3.1.global_rep.0.pre_norm_ffn.0.weight", "layer_3.1.global_rep.0.pre_norm_ffn.0.bias", "layer_3.1.global_rep.0.pre_norm_ffn.1.weight", "layer_3.1.global_rep.0.pre_norm_ffn.1.bias", "layer_3.1.global_rep.0.pre_norm_ffn.4.weight", "layer_3.1.global_rep.0.pre_norm_ffn.4.bias", "layer_3.1.global_rep.1.pre_norm_mha.0.weight", "layer_3.1.global_rep.1.pre_norm_mha.0.bias", "layer_3.1.global_rep.1.pre_norm_mha.1.qkv_proj.weight", "layer_3.1.global_rep.1.pre_norm_mha.1.qkv_proj.bias", "layer_3.1.global_rep.1.pre_norm_mha.1.out_proj.weight", "layer_3.1.global_rep.1.pre_norm_mha.1.out_proj.bias", "layer_3.1.global_rep.1.pre_norm_ffn.0.weight", "layer_3.1.global_rep.1.pre_norm_ffn.0.bias", "layer_3.1.global_rep.1.pre_norm_ffn.1.weight", "layer_3.1.global_rep.1.pre_norm_ffn.1.bias", "layer_3.1.global_rep.1.pre_norm_ffn.4.weight", "layer_3.1.global_rep.1.pre_norm_ffn.4.bias", "layer_3.1.global_rep.2.weight", "layer_3.1.global_rep.2.bias", "layer_3.1.conv_proj.block.conv.weight", "layer_3.1.conv_proj.block.norm.weight", "layer_3.1.conv_proj.block.norm.bias", "layer_3.1.conv_proj.block.norm.running_mean", "layer_3.1.conv_proj.block.norm.running_var", "layer_3.1.conv_proj.block.norm.num_batches_tracked", "layer_3.1.fusion.block.conv.weight", "layer_3.1.fusion.block.norm.weight", "layer_3.1.fusion.block.norm.bias", "layer_3.1.fusion.block.norm.running_mean", "layer_3.1.fusion.block.norm.running_var", "layer_3.1.fusion.block.norm.num_batches_tracked", "layer_4.0.block.exp_1x1.block.conv.weight", "layer_4.0.block.exp_1x1.block.norm.weight", "layer_4.0.block.exp_1x1.block.norm.bias", "layer_4.0.block.exp_1x1.block.norm.running_mean", "layer_4.0.block.exp_1x1.block.norm.running_var", "layer_4.0.block.exp_1x1.block.norm.num_batches_tracked", "layer_4.0.block.conv_3x3.block.conv.weight", "layer_4.0.block.conv_3x3.block.norm.weight", "layer_4.0.block.conv_3x3.block.norm.bias", 
"layer_4.0.block.conv_3x3.block.norm.running_mean", "layer_4.0.block.conv_3x3.block.norm.running_var", "layer_4.0.block.conv_3x3.block.norm.num_batches_tracked", "layer_4.0.block.red_1x1.block.conv.weight", "layer_4.0.block.red_1x1.block.norm.weight", "layer_4.0.block.red_1x1.block.norm.bias", "layer_4.0.block.red_1x1.block.norm.running_mean", "layer_4.0.block.red_1x1.block.norm.running_var", "layer_4.0.block.red_1x1.block.norm.num_batches_tracked", "layer_4.1.local_rep.conv_3x3.block.conv.weight", "layer_4.1.local_rep.conv_3x3.block.norm.weight", "layer_4.1.local_rep.conv_3x3.block.norm.bias", "layer_4.1.local_rep.conv_3x3.block.norm.running_mean", "layer_4.1.local_rep.conv_3x3.block.norm.running_var", "layer_4.1.local_rep.conv_3x3.block.norm.num_batches_tracked", "layer_4.1.local_rep.conv_1x1.block.conv.weight", "layer_4.1.global_rep.0.pre_norm_mha.0.weight", "layer_4.1.global_rep.0.pre_norm_mha.0.bias", "layer_4.1.global_rep.0.pre_norm_mha.1.qkv_proj.weight", "layer_4.1.global_rep.0.pre_norm_mha.1.qkv_proj.bias", "layer_4.1.global_rep.0.pre_norm_mha.1.out_proj.weight", "layer_4.1.global_rep.0.pre_norm_mha.1.out_proj.bias", "layer_4.1.global_rep.0.pre_norm_ffn.0.weight", "layer_4.1.global_rep.0.pre_norm_ffn.0.bias", "layer_4.1.global_rep.0.pre_norm_ffn.1.weight", "layer_4.1.global_rep.0.pre_norm_ffn.1.bias", "layer_4.1.global_rep.0.pre_norm_ffn.4.weight", "layer_4.1.global_rep.0.pre_norm_ffn.4.bias", "layer_4.1.global_rep.1.pre_norm_mha.0.weight", "layer_4.1.global_rep.1.pre_norm_mha.0.bias", "layer_4.1.global_rep.1.pre_norm_mha.1.qkv_proj.weight", "layer_4.1.global_rep.1.pre_norm_mha.1.qkv_proj.bias", "layer_4.1.global_rep.1.pre_norm_mha.1.out_proj.weight", "layer_4.1.global_rep.1.pre_norm_mha.1.out_proj.bias", "layer_4.1.global_rep.1.pre_norm_ffn.0.weight", "layer_4.1.global_rep.1.pre_norm_ffn.0.bias", "layer_4.1.global_rep.1.pre_norm_ffn.1.weight", "layer_4.1.global_rep.1.pre_norm_ffn.1.bias", "layer_4.1.global_rep.1.pre_norm_ffn.4.weight", "layer_4.1.global_rep.1.pre_norm_ffn.4.bias", "layer_4.1.global_rep.2.pre_norm_mha.0.weight", "layer_4.1.global_rep.2.pre_norm_mha.0.bias", "layer_4.1.global_rep.2.pre_norm_mha.1.qkv_proj.weight", "layer_4.1.global_rep.2.pre_norm_mha.1.qkv_proj.bias", "layer_4.1.global_rep.2.pre_norm_mha.1.out_proj.weight", "layer_4.1.global_rep.2.pre_norm_mha.1.out_proj.bias", "layer_4.1.global_rep.2.pre_norm_ffn.0.weight", "layer_4.1.global_rep.2.pre_norm_ffn.0.bias", "layer_4.1.global_rep.2.pre_norm_ffn.1.weight", "layer_4.1.global_rep.2.pre_norm_ffn.1.bias", "layer_4.1.global_rep.2.pre_norm_ffn.4.weight", "layer_4.1.global_rep.2.pre_norm_ffn.4.bias", "layer_4.1.global_rep.3.pre_norm_mha.0.weight", "layer_4.1.global_rep.3.pre_norm_mha.0.bias", "layer_4.1.global_rep.3.pre_norm_mha.1.qkv_proj.weight", "layer_4.1.global_rep.3.pre_norm_mha.1.qkv_proj.bias", "layer_4.1.global_rep.3.pre_norm_mha.1.out_proj.weight", "layer_4.1.global_rep.3.pre_norm_mha.1.out_proj.bias", "layer_4.1.global_rep.3.pre_norm_ffn.0.weight", "layer_4.1.global_rep.3.pre_norm_ffn.0.bias", "layer_4.1.global_rep.3.pre_norm_ffn.1.weight", "layer_4.1.global_rep.3.pre_norm_ffn.1.bias", "layer_4.1.global_rep.3.pre_norm_ffn.4.weight", "layer_4.1.global_rep.3.pre_norm_ffn.4.bias", "layer_4.1.global_rep.4.weight", "layer_4.1.global_rep.4.bias", "layer_4.1.conv_proj.block.conv.weight", "layer_4.1.conv_proj.block.norm.weight", "layer_4.1.conv_proj.block.norm.bias", "layer_4.1.conv_proj.block.norm.running_mean", "layer_4.1.conv_proj.block.norm.running_var", 
"layer_4.1.conv_proj.block.norm.num_batches_tracked", "layer_4.1.fusion.block.conv.weight", "layer_4.1.fusion.block.norm.weight", "layer_4.1.fusion.block.norm.bias", "layer_4.1.fusion.block.norm.running_mean", "layer_4.1.fusion.block.norm.running_var", "layer_4.1.fusion.block.norm.num_batches_tracked", "layer_5.0.block.exp_1x1.block.conv.weight", "layer_5.0.block.exp_1x1.block.norm.weight", "layer_5.0.block.exp_1x1.block.norm.bias", "layer_5.0.block.exp_1x1.block.norm.running_mean", "layer_5.0.block.exp_1x1.block.norm.running_var", "layer_5.0.block.exp_1x1.block.norm.num_batches_tracked", "layer_5.0.block.conv_3x3.block.conv.weight", "layer_5.0.block.conv_3x3.block.norm.weight", "layer_5.0.block.conv_3x3.block.norm.bias", "layer_5.0.block.conv_3x3.block.norm.running_mean", "layer_5.0.block.conv_3x3.block.norm.running_var", "layer_5.0.block.conv_3x3.block.norm.num_batches_tracked", "layer_5.0.block.red_1x1.block.conv.weight", "layer_5.0.block.red_1x1.block.norm.weight", "layer_5.0.block.red_1x1.block.norm.bias", "layer_5.0.block.red_1x1.block.norm.running_mean", "layer_5.0.block.red_1x1.block.norm.running_var", "layer_5.0.block.red_1x1.block.norm.num_batches_tracked", "layer_5.1.local_rep.conv_3x3.block.conv.weight", "layer_5.1.local_rep.conv_3x3.block.norm.weight", "layer_5.1.local_rep.conv_3x3.block.norm.bias", "layer_5.1.local_rep.conv_3x3.block.norm.running_mean", "layer_5.1.local_rep.conv_3x3.block.norm.running_var", "layer_5.1.local_rep.conv_3x3.block.norm.num_batches_tracked", "layer_5.1.local_rep.conv_1x1.block.conv.weight", "layer_5.1.global_rep.0.pre_norm_mha.0.weight", "layer_5.1.global_rep.0.pre_norm_mha.0.bias", "layer_5.1.global_rep.0.pre_norm_mha.1.qkv_proj.weight", "layer_5.1.global_rep.0.pre_norm_mha.1.qkv_proj.bias", "layer_5.1.global_rep.0.pre_norm_mha.1.out_proj.weight", "layer_5.1.global_rep.0.pre_norm_mha.1.out_proj.bias", "layer_5.1.global_rep.0.pre_norm_ffn.0.weight", "layer_5.1.global_rep.0.pre_norm_ffn.0.bias", "layer_5.1.global_rep.0.pre_norm_ffn.1.weight", "layer_5.1.global_rep.0.pre_norm_ffn.1.bias", "layer_5.1.global_rep.0.pre_norm_ffn.4.weight", "layer_5.1.global_rep.0.pre_norm_ffn.4.bias", "layer_5.1.global_rep.1.pre_norm_mha.0.weight", "layer_5.1.global_rep.1.pre_norm_mha.0.bias", "layer_5.1.global_rep.1.pre_norm_mha.1.qkv_proj.weight", "layer_5.1.global_rep.1.pre_norm_mha.1.qkv_proj.bias", "layer_5.1.global_rep.1.pre_norm_mha.1.out_proj.weight", "layer_5.1.global_rep.1.pre_norm_mha.1.out_proj.bias", "layer_5.1.global_rep.1.pre_norm_ffn.0.weight", "layer_5.1.global_rep.1.pre_norm_ffn.0.bias", "layer_5.1.global_rep.1.pre_norm_ffn.1.weight", "layer_5.1.global_rep.1.pre_norm_ffn.1.bias", "layer_5.1.global_rep.1.pre_norm_ffn.4.weight", "layer_5.1.global_rep.1.pre_norm_ffn.4.bias", "layer_5.1.global_rep.2.pre_norm_mha.0.weight", "layer_5.1.global_rep.2.pre_norm_mha.0.bias", "layer_5.1.global_rep.2.pre_norm_mha.1.qkv_proj.weight", "layer_5.1.global_rep.2.pre_norm_mha.1.qkv_proj.bias", "layer_5.1.global_rep.2.pre_norm_mha.1.out_proj.weight", "layer_5.1.global_rep.2.pre_norm_mha.1.out_proj.bias", "layer_5.1.global_rep.2.pre_norm_ffn.0.weight", "layer_5.1.global_rep.2.pre_norm_ffn.0.bias", "layer_5.1.global_rep.2.pre_norm_ffn.1.weight", "layer_5.1.global_rep.2.pre_norm_ffn.1.bias", "layer_5.1.global_rep.2.pre_norm_ffn.4.weight", "layer_5.1.global_rep.2.pre_norm_ffn.4.bias", "layer_5.1.global_rep.3.weight", "layer_5.1.global_rep.3.bias", "layer_5.1.conv_proj.block.conv.weight", "layer_5.1.conv_proj.block.norm.weight", "layer_5.1.conv_proj.block.norm.bias", 
"layer_5.1.conv_proj.block.norm.running_mean", "layer_5.1.conv_proj.block.norm.running_var", "layer_5.1.conv_proj.block.norm.num_batches_tracked", "layer_5.1.fusion.block.conv.weight", "layer_5.1.fusion.block.norm.weight", "layer_5.1.fusion.block.norm.bias", "layer_5.1.fusion.block.norm.running_mean", "layer_5.1.fusion.block.norm.running_var", "layer_5.1.fusion.block.norm.num_batches_tracked", "conv_1x1_exp.block.conv.weight", "conv_1x1_exp.block.norm.weight", "conv_1x1_exp.block.norm.bias", "conv_1x1_exp.block.norm.running_mean", "conv_1x1_exp.block.norm.running_var", "conv_1x1_exp.block.norm.num_batches_tracked", "classifier.fc.weight", "classifier.fc.bias", "conv_1.block.conv.weight", "conv_1.block.norm.weight", "conv_1.block.norm.bias", "conv_1.block.norm.running_mean", "conv_1.block.norm.running_var", "conv_1.block.norm.num_batches_tracked".
