smt's Introduction

smt's People

Contributors

afeng-x

smt's Issues

Long training time

Why does the training time for 10 batches increase severalfold when the backbone is switched from CrossFormer-S to SMT-T, even though SMT-T has far fewer parameters and FLOPs than CrossFormer-S?
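Fewer parameters and FLOPs do not always mean faster training: memory traffic, attention resolution, and per-layer kernel launch overhead can dominate. One way to see where the gap comes from is to time a few forward/backward iterations of each backbone in isolation. Below is a rough, generic sketch; the builder functions in the final comment are placeholders, not functions from either repository, and the backbone is assumed to return a single tensor.

```python
import time
import torch

# Time a few forward/backward passes of one backbone in isolation.
def time_backbone(model, iters=10, batch=(8, 3, 224, 224)):
    model = model.cuda().train()
    x = torch.randn(*batch, device="cuda")
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        out = model(x)
        out.float().mean().backward()        # dummy loss, only to exercise backward
        model.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    return (time.time() - start) / iters

# Hypothetical builders for the two backbones being compared:
# print(time_backbone(build_smt_t()), time_backbone(build_crossformer_s()))
```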

I'm having a problem; can anyone help me with it, please?

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2694) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

DW_Conv

Thank you very much for sharing your great work. In the paper I noticed that DW_Conv is used in the SAM module. If I use a vanilla convolution instead of DW_Conv, will it improve performance in addition to increasing the parameter count?
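For reference, a minimal sketch (not code from this repository) of the parameter gap between a depthwise convolution and a vanilla convolution at the same channel width and kernel size:

```python
import torch.nn as nn

# Depthwise conv (groups == channels, as DW_Conv is commonly implemented)
# versus a dense conv with identical channel count and kernel size.
channels, kernel = 64, 3

dw_conv = nn.Conv2d(channels, channels, kernel, padding=1, groups=channels)
vanilla_conv = nn.Conv2d(channels, channels, kernel, padding=1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dw_conv))       # 64 * (3*3) + 64      = 640
print(count(vanilla_conv))  # 64 * (64*3*3) + 64   = 36,928
```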

Train on IN-1K from scratch with smt_t: 65% top-1 at 40 epochs

I am not sure I am training correctly.
Some settings:

  • opt: adamw
  • base lr: 1e-3, with warmup for the first 5 epochs
  • batch_size: 128
  • total epochs: 150 (not 300, because the GPU time is too long: about 10 days for 150 epochs on 4 T4 GPUs)
  • lr_scheduler: cosine starting from the 5th epoch (a minimal sketch of this schedule follows below)

After 40 epochs I am at 65% top-1, and the accuracy is now increasing slowly per epoch. Apart from the total epochs (150 instead of the default 300), the settings are almost the same as in this repository. I would like to know two things:
1. Will setting the total epochs to 150 have a big influence on the final result?
2. Is 65% top-1 at 40 epochs normal? Because training takes a long time, I want to get some feedback beforehand; if something is wrong, I will stop early.
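A minimal sketch (not the repository's training code) of the schedule described in the settings above: AdamW at a base lr of 1e-3, linear warmup for the first 5 epochs, then cosine decay until the final epoch. The weight-decay value and the stand-in model are assumptions for illustration only.

```python
import math
import torch

model = torch.nn.Linear(10, 10)            # stand-in for SMT-T
base_lr, warmup_epochs, total_epochs = 1e-3, 5, 150

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.05)

def lr_lambda(epoch):
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs                      # linear warmup
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))           # cosine decay

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... one epoch of training ...
    scheduler.step()
```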

Query about the motivation

Hi there! This is nice work, but I have a small query about the motivation of the architecture design. The paper states: "Based on the research conducted in [11, 4], which performed a quantitative analysis of different depths of self-attention blocks and discovered that shallow blocks tend to capture short-range dependencies while deeper ones capture long-range dependencies".

To my knowledge, a transformer can model globally at every depth and can achieve a large effective receptive field even in the initial stages. Why do you say that shallow blocks capture short-range dependencies while deeper ones capture long-range dependencies, rather than both capturing long-range dependencies? What causes shallow blocks to capture only short-range dependencies while deep blocks capture long-range ones?

Visualization of modulation values, relative receptive field

Thank you for your great work. I'm quite interested in your visualizations; they would contribute a lot to the research community.
It would be great if the authors could provide the code for visualizing the feature maps across heads and for computing and drawing the relative receptive fields.
Thank you so much in advance.
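Until the authors release their code, here is a rough sketch (not their implementation) of one way to inspect per-head feature or modulation maps: hook the layer of interest, split its output channel-wise into heads, and plot the channel-averaged map of each head. The hooked layer is assumed to emit a (B, C, H, W) tensor; all arguments are placeholders for the real SMT model and module name.

```python
import torch
import matplotlib.pyplot as plt

def show_heads(model, layer_name, images, num_heads):
    feats = {}

    def hook(_module, _inp, out):
        feats["map"] = out.detach()        # expected shape (B, C, H, W)

    handle = dict(model.named_modules())[layer_name].register_forward_hook(hook)
    with torch.no_grad():
        model(images)                      # one forward pass on a batch
    handle.remove()

    per_head = feats["map"][0].chunk(num_heads, dim=0)   # first image, split channels into heads
    fig, axes = plt.subplots(1, num_heads, figsize=(3 * num_heads, 3))
    for ax, head in zip(axes, per_head):
        ax.imshow(head.mean(0).cpu().numpy(), cmap="viridis")
        ax.axis("off")
    plt.show()
```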

About the pth file

Thanks for your team's efforts. Could you please share this weight file?
[screenshot of the requested checkpoint]

How to draw the Fig.4?

Thank you for sharing your work. May I ask how Figure 4 in the paper was drawn? Could you provide the code?

OutOfMemoryError

Hi author, your code is great, but when I use your SMT module for training I always run out of memory at the attn = (q @ k.transpose(-2, -1)) * self.scale statement while computing attention, even with the batch size set to 1. Could you give me some ideas on how to modify it? I am only using the stage-3 structure.
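One possible workaround (a sketch, not part of the SMT repository): the `q @ k.transpose(-2, -1)` line materializes the full N x N attention matrix, which grows quadratically with the token count and is usually what runs out of memory at large spatial resolutions. On PyTorch >= 2.0, `torch.nn.functional.scaled_dot_product_attention` can use flash / memory-efficient kernels that avoid storing that matrix. The shapes below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# q, k, v are assumed to have shape (B, num_heads, N, head_dim),
# as in a typical ViT attention block.
def memory_efficient_attn(q, k, v, dropout_p=0.0):
    # scale defaults to 1/sqrt(head_dim), matching the usual `* self.scale`
    return F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)

# Example: batch 1, 8 heads, 56*56 tokens, head_dim 32
q = k = v = torch.randn(1, 8, 56 * 56, 32, device="cuda")
out = memory_efficient_attn(q, k, v)        # (1, 8, 3136, 32)
```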

mmdet

Your work is great, thank you for your contribution to science, but I am having a problem reproducing it: TypeError: RetinaNet: init_weights() got an unexpected keyword argument 'pretrained'. May I ask which version of mmdet you used? I get the above error with both Mask R-CNN and RetinaNet.
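A possible cause, stated as an assumption since the exact versions are not given: newer MMDetection releases removed the `pretrained=` argument from `init_weights()`, and pretrained backbone weights are instead declared through `init_cfg` in the config. A hedged config sketch (the backbone type name and checkpoint path are placeholders, not the authors' actual registration or weight file):

```python
# Fragment of an MMDetection-style config dict.
model = dict(
    type='RetinaNet',
    backbone=dict(
        type='SMT',                                   # hypothetical registered name
        init_cfg=dict(
            type='Pretrained',
            checkpoint='path/to/smt_tiny.pth'),       # placeholder path
    ),
    # ... neck, bbox_head, etc. unchanged ...
)
```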
