😄 This is Weifeng Lin, 林炜丰 in Chinese.
👨🎓 I am now a PhD candidate at The Chinese University of Hong Kong.
📫 Email: [email protected]
📖 homepage: https://afeng-x.github.io/
This is the official implementation of "Scale-Aware Modulation Meet Transformer".
Paper: https://arxiv.org/abs/2307.08579
License: MIT License
Why did the training time for 10 batches increase severalfold when replacing the backbone from CrossFormer-S with SMT-T, even though SMT-T has far fewer parameters and FLOPs than CrossFormer-S?
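One thing worth checking before comparing backbones is how the time is measured: FLOP counts do not translate directly into wall-clock time, and CUDA kernels launch asynchronously, so naive timing can be badly misleading. A minimal sketch of a fair per-forward timer (the helper name is mine, not from the repository):

```python
import time
import torch

def time_forward(model, x, warmup=3, iters=10):
    """Average forward-pass time in seconds.

    Synchronizes CUDA before and after the timed loop so that
    asynchronous kernel launches do not skew the measurement.
    """
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):        # warm up caches / cuDNN autotuning
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()   # wait for all queued kernels
    return (time.perf_counter() - start) / iters
```

Timing both backbones this way isolates the forward pass from data loading and launch overhead; memory-bound operations (e.g. many small convolutions) can still be slower than FLOPs alone suggest.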
The largest model at the moment is smt_large. Could you design a bigger model, such as smt_huge?
Thank you very much!
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2694) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
Thank you very much for sharing your great work. In the paper I noticed that depthwise convolution (DW_Conv) is used in the SAM module. If I replace DW_Conv with vanilla convolution, would that improve performance in addition to increasing the parameter count?
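For context on the parameter gap this question touches on: a depthwise convolution sets the group count equal to the channel count, so each output channel sees only its own input channel. A small self-contained sketch of the standard Conv2d parameter formula (pure Python, no framework assumed):

```python
def conv2d_params(c_in, c_out, k, groups=1):
    # Each output channel has a (c_in / groups) x k x k weight
    # kernel plus one bias term.
    return (c_in // groups) * k * k * c_out + c_out

# 3x3 convolution over 64 channels
standard = conv2d_params(64, 64, 3)              # dense conv
depthwise = conv2d_params(64, 64, 3, groups=64)  # DW_Conv
print(standard, depthwise)
```

For this 64-channel example the dense convolution costs roughly 58x more parameters (36,928 vs 640), which is why swapping DW_Conv for a vanilla convolution grows the model substantially even if accuracy barely changes.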
Hi author, I couldn't find the SAM block in the code. Could you tell me where it is? Thanks!
I am not sure whether I am training correctly.
Some settings:
Hi @AFeng-x, thanks for sharing the great SMT work!
I'd like to bring up another highly related hierarchical vision transformer that also deals with the scale problems: MaxViT: Multi-Axis Vision Transformer [ECCV 2022]. I'm wondering if you could also add our work to your comparison figure and tables? Thanks a lot!
Hello author, thank you very much for your work, but I could not find the file for running semantic segmentation training.
Hi there! This is nice work, but I have a small question about the motivation behind the architecture design. The paper says: "Based on the research conducted in [11, 4], which performed a quantitative analysis of different depths of self-attention blocks and discovered that shallow blocks tend to capture short-range dependencies while deeper ones capture long-range dependencies".
To my knowledge, a transformer can always model globally and can achieve a large effective receptive field even in its initial stages. Why do you state that shallow blocks capture short-range dependencies while deeper ones capture long-range dependencies, rather than both capturing long-range dependencies from the start?
Thank you for your great work. I'm quite interested in your visualization work; it would contribute a lot to the research community.
It would be great if the authors could provide the code to visualize feature maps across heads, and to compute and draw relative receptive fields.
Thank you so much in advance.
Thank you for sharing. May I ask how Figure 4 in the paper was drawn? Could you provide the code?
Hi author, your code is great, but when I use your SMT module for training I always run out of memory at the attn = (q @ k.transpose(-2, -1)) * self.scale statement when computing attention, even with the batch size set to 1. Could you give some ideas on how to modify it? I am only using the stage-3 structure.
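A common workaround for this kind of OOM is to avoid materializing the full N x N attention matrix at once and instead process query rows in chunks, so only a (chunk, N) slice of scores lives in memory at a time. A hedged sketch in NumPy for illustration (the function names are mine, not from the SMT code, which would use torch tensors):

```python
import numpy as np

def attention(q, k, v, scale):
    # Reference implementation: materializes the full (N, N) score matrix.
    attn = (q @ k.T) * scale
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))  # stable softmax
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

def chunked_attention(q, k, v, scale, chunk=128):
    # Same result, but only a (chunk, N) slice of the score matrix
    # exists at any time, trading a Python loop for peak memory.
    out = np.empty((q.shape[0], v.shape[1]), dtype=q.dtype)
    for i in range(0, q.shape[0], chunk):
        qi = q[i:i + chunk]
        attn = (qi @ k.T) * scale
        attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        out[i:i + chunk] = attn @ v
    return out
```

The two functions are numerically equivalent; the chunk size is a memory/speed knob. For very high-resolution stage-3 feature maps, downsampling the keys/values before attention is another common option.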
Your work is so great, thank you for your contribution to science, but I am having problems reproducing it: TypeError: RetinaNet: init_weights() got an unexpected keyword argument 'pretrained'. May I ask which version of mmdet you used? I get this error with both MaskRCNN and RetinaNet.
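This error usually indicates an mmdetection version mismatch: newer mmdet releases (roughly 2.12 and later) deprecated passing a top-level pretrained argument to the detector and expect an init_cfg entry inside the backbone config instead. A hedged sketch of the newer config style (the checkpoint filename is a placeholder, not a file shipped with the repository):

```python
# Assumption: mmdet >= 2.12 config style; 'smt_tiny.pth' is a
# placeholder for your local pretrained checkpoint.
model = dict(
    type='RetinaNet',
    backbone=dict(
        type='SMT',
        # Replaces the deprecated top-level `pretrained=...` argument.
        init_cfg=dict(type='Pretrained', checkpoint='smt_tiny.pth'),
    ),
)
```

If the config instead matches the mmdet version, the alternative fix is to install the mmdet/mmcv versions pinned by the repository's requirements.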