😄 This is Weifeng Lin, 林炜丰 in Chinese.
👨🎓 I am now a PhD candidate at The Chinese University of Hong Kong.
📫 Email: [email protected]
📖 homepage: https://afeng-x.github.io/
This is the official implementation of "Scale-Aware Modulation Meet Transformer".
Paper: https://arxiv.org/abs/2307.08579
License: MIT License
Why did the training time for 10 batches increase severalfold when replacing the backbone from CrossFormer-S with SMT-T, even though SMT-T has far fewer parameters and FLOPs than CrossFormer-S?
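One thing worth checking before comparing backbones is how the time is measured: FLOP counts do not translate directly into wall-clock time, and CUDA kernels launch asynchronously, so naive timing can be badly misleading. A minimal sketch of a fair per-forward timer (the helper name is mine, not from the repository):

```python
import time
import torch

def time_forward(model, x, warmup=3, iters=10):
    """Average forward-pass time in seconds.

    Synchronizes CUDA before and after the timed loop so that
    asynchronous kernel launches do not skew the measurement.
    """
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):        # warm up caches / cuDNN autotuning
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()   # wait for all queued kernels
    return (time.perf_counter() - start) / iters
```

Timing both backbones this way isolates the forward pass from data loading and launch overhead; memory-bound operations (e.g. many small convolutions) can still be slower than FLOPs alone suggest.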
The largest model at the moment is smt_large. Could you design a bigger model, such as smt_huge?
Thank you very much!
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2694) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
Thank you very much for sharing your great work. In the paper I noticed that depthwise convolution (DW_Conv) is used in the SAM module. If I replace DW_Conv with vanilla convolution, would that improve performance in addition to increasing the parameter count?
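For context on the parameter gap this question touches on: a depthwise convolution sets the group count equal to the channel count, so each output channel sees only its own input channel. A small self-contained sketch of the standard Conv2d parameter formula (pure Python, no framework assumed):

```python
def conv2d_params(c_in, c_out, k, groups=1):
    # Each output channel has a (c_in / groups) x k x k weight
    # kernel plus one bias term.
    return (c_in // groups) * k * k * c_out + c_out

# 3x3 convolution over 64 channels
standard = conv2d_params(64, 64, 3)              # dense conv
depthwise = conv2d_params(64, 64, 3, groups=64)  # DW_Conv
print(standard, depthwise)
```

For this 64-channel example the dense convolution costs roughly 58x more parameters (36,928 vs 640), which is why swapping DW_Conv for a vanilla convolution grows the model substantially even if accuracy barely changes.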
Hi author, I couldn't find the SAM block in the code. Could you tell me where it is? Thanks!
I am not sure whether I am training correctly.
Some settings:
Hi @AFeng-x, thanks for sharing the great SMT work!
I'd like to bring up another highly related hierarchical vision transformer that also deals with the scale problems: MaxViT: Multi-Axis Vision Transformer [ECCV 2022]. I'm wondering if you could also add our work to your comparison figure and tables? Thanks a lot!
Hello author, thank you very much for your work, but I could not find the file for running semantic segmentation training.
Hi there! This is nice work, but I have a small question about the motivation behind the architecture design. The paper says: "Based on the research conducted in [11, 4], which performed a quantitative analysis of different depths of self-attention blocks and discovered that shallow blocks tend to capture short-range dependencies while deeper ones capture long-range dependencies".
To my knowledge, a transformer can always model globally and can achieve a large effective receptive field even in its initial stages. Why do you state that shallow blocks capture short-range dependencies while deeper ones capture long-range dependencies, rather than both capturing long-range dependencies from the start?
Thank you for your great work. I'm quite interested in your visualization work; it would contribute a lot to the research community.
It would be great if the authors could provide the code to visualize feature maps across heads, and to compute and draw relative receptive fields.
Thank you so much in advance.
Thank you for sharing. May I ask how Figure 4 in the paper was drawn? Could you provide the code?
Hi author, your code is great, but when I use your SMT module for training I always run out of memory at the attn = (q @ k.transpose(-2, -1)) * self.scale statement when computing attention, even with the batch size set to 1. Could you give some ideas on how to modify it? I am only using the stage-3 structure.
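A common workaround for this kind of OOM is to avoid materializing the full N x N attention matrix at once and instead process query rows in chunks, so only a (chunk, N) slice of scores lives in memory at a time. A hedged sketch in NumPy for illustration (the function names are mine, not from the SMT code, which would use torch tensors):

```python
import numpy as np

def attention(q, k, v, scale):
    # Reference implementation: materializes the full (N, N) score matrix.
    attn = (q @ k.T) * scale
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))  # stable softmax
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

def chunked_attention(q, k, v, scale, chunk=128):
    # Same result, but only a (chunk, N) slice of the score matrix
    # exists at any time, trading a Python loop for peak memory.
    out = np.empty((q.shape[0], v.shape[1]), dtype=q.dtype)
    for i in range(0, q.shape[0], chunk):
        qi = q[i:i + chunk]
        attn = (qi @ k.T) * scale
        attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        out[i:i + chunk] = attn @ v
    return out
```

The two functions are numerically equivalent; the chunk size is a memory/speed knob. For very high-resolution stage-3 feature maps, downsampling the keys/values before attention is another common option.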
Your work is so great, thank you for your contribution to science, but I am having problems reproducing it: TypeError: RetinaNet: init_weights() got an unexpected keyword argument 'pretrained'. May I ask which version of mmdet you used? I get this error with both MaskRCNN and RetinaNet.
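This error usually indicates an mmdetection version mismatch: newer mmdet releases (roughly 2.12 and later) deprecated passing a top-level pretrained argument to the detector and expect an init_cfg entry inside the backbone config instead. A hedged sketch of the newer config style (the checkpoint filename is a placeholder, not a file shipped with the repository):

```python
# Assumption: mmdet >= 2.12 config style; 'smt_tiny.pth' is a
# placeholder for your local pretrained checkpoint.
model = dict(
    type='RetinaNet',
    backbone=dict(
        type='SMT',
        # Replaces the deprecated top-level `pretrained=...` argument.
        init_cfg=dict(type='Pretrained', checkpoint='smt_tiny.pth'),
    ),
)
```

If the config instead matches the mmdet version, the alternative fix is to install the mmdet/mmcv versions pinned by the repository's requirements.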