
dynamic-vision-transformer's Introduction

Hi there 👋

I’m currently a Ph.D. student at Tsinghua University. 🔭


dynamic-vision-transformer's People

Contributors

blackfeather-wang, star9988rr


dynamic-vision-transformer's Issues

Some questions about FLOPs calculation in ViT

Thanks for your great work. I am interested in the FLOPs reported in your paper, e.g. in Tables 1 and 4. I am wondering if you could release the code for the FLOPs calculation of ViT. Thank you!
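In case it helps while waiting for the official script: a minimal sketch of the standard way ViT FLOPs are usually estimated (per-block attention + MLP costs, summed over depth). This is a generic accounting, not the paper's released code; the exact conventions (e.g. whether multiply-adds count as one or two operations) may differ from the numbers in the tables.

```python
def vit_flops(num_tokens, dim, depth, mlp_ratio=4.0):
    """Rough FLOPs estimate for a plain ViT encoder.

    Per block (N = tokens, D = embedding dim):
      - QKV projections:          3 * N * D * D
      - attention logits Q @ K^T: N * N * D
      - attention @ V:            N * N * D
      - output projection:        N * D * D
      - MLP (two linear layers):  2 * N * D * (mlp_ratio * D)
    Multiply-adds are counted as 2 ops (the final factor of 2).
    """
    n, d = num_tokens, dim
    attn = 3 * n * d * d + 2 * n * n * d + n * d * d
    mlp = 2 * n * d * int(mlp_ratio * d)
    return 2 * depth * (attn + mlp)

# Example: a ViT-Base-like configuration (196 patches + 1 class token).
print(vit_flops(num_tokens=197, dim=768, depth=12))
```

Note that the estimate is linear in depth, so per-block costs can be read off directly.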

About the implementation of upsampling in relation_reuse

My main question is why it is necessary to split relation_temp like this:

  split_index = int(relation_temp.size(0) / 2)
  relation_temp = torch.cat(
      (
          self.relation_reuse_upsample(relation_temp[:split_index * 1]),
          self.relation_reuse_upsample(relation_temp[split_index * 1:]),
      ), 0
  )

It seems more straightforward to implement the upsampling like this:

  relation_temp = self.relation_reuse_upsample(relation_temp)

Could you please explain the difference between the above two implementations?
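For what it's worth, the two implementations should be numerically identical, since upsampling is applied independently per item along the batch dimension; the split presumably only changes the peak memory of the interpolation kernel. A small check (using nn.Upsample as a stand-in for relation_reuse_upsample, whose exact mode I'm assuming):

```python
import torch
import torch.nn as nn

upsample = nn.Upsample(scale_factor=2, mode='nearest')
x = torch.randn(8, 3, 14, 14)  # stand-in for the stacked relation maps

# One-shot upsampling.
full = upsample(x)

# Chunked upsampling, as in the repo snippet above.
split_index = x.size(0) // 2
chunked = torch.cat(
    (upsample(x[:split_index]), upsample(x[split_index:])), 0
)

print(torch.equal(full, chunked))  # the two results match exactly
```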

Questions about feature and relation reuse

  1. A transformer consists of multiple encoder blocks. What I am curious about is: should the output of the last layer of the upstream transformer be concatenated with the MLP output of every encoder block in the downstream transformer?
  2. The paper mentions reusing the attention logits of the upstream transformer, i.e., the attention maps produced from Q and K. Should the attention map of each encoder block in the upstream transformer be concatenated with the attention map of the depth-wise corresponding encoder block in the downstream transformer to achieve relation reuse?
  3. In theory, the extra computation introduced by this reuse mechanism should be substantial, much like the dense connections in DenseNet, yet the paper says the extra overhead is small. The only explanation I can think of is that the embedding dimension D obtained from the linear projection of each patch is small. Is my understanding correct?
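On question 2, the shape bookkeeping can be sketched as follows. This is an illustrative guess at the mechanism, not the repo's actual code: the token counts, head count, and the choice to add (rather than concatenate) the reused logits before softmax are all assumptions here.

```python
import torch
import torch.nn.functional as F

B, H = 2, 6                        # batch size, attention heads (hypothetical)
up = torch.randn(B, H, 49, 49)     # upstream logits: 7x7 = 49 tokens
down = torch.randn(B, H, 196, 196) # downstream logits: 14x14 = 196 tokens

# Treat the N x N logit matrix as a 2-D map and upsample it to the
# downstream token count (49 -> 196 along both axes).
reused = F.interpolate(
    up.reshape(B * H, 1, 49, 49),
    size=(196, 196), mode='bilinear', align_corners=False,
).reshape(B, H, 196, 196)

# Combine with the downstream logits before the softmax (one possible choice).
attn = F.softmax(down + reused, dim=-1)
print(attn.shape)  # torch.Size([2, 6, 196, 196])
```

Since the reuse is a single upsample plus an elementwise combine per block, its cost is tiny compared with computing Q @ K^T from scratch, which may be part of the answer to question 3 as well.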

Error 'Unknown model (DVT_T2t_vit_12)'

Hi!

I tried to evaluate DVT_T2t_vit_12 by running 'python inference.py --data_url ./data/ --batch_size 64 --model DVT_T2t_vit_12 --checkpoint_path .\checkpoint\DVT_T2t_vit_12.pth.tar --eval_mode 1', and I get this error:

"
Traceback (most recent call last):
File "inference.py", line 226, in
main()
File "inference.py", line 57, in main
model = create_model(
File "A:\transformer\DViT\Dynamic-Vision-Transformer-main\Dynamic-Vision-Transformer-main\timm\models\factory.py", line 59, in create_model
raise RuntimeError('Unknown model (%s)' % model_name)
RuntimeError: Unknown model (DVT_T2t_vit_12)
"

I also printed _model_entrypoints from ..Dynamic-Vision-Transformer-main/timm/models/registry.py to look for the model name 'DVT_T2t_vit_12', but it is not there.

Environment: Python 3.8, PyTorch 1.8.1, torchvision 0.9.1
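For context on this error: timm's create_model looks the name up in _model_entrypoints, a dict populated by the @register_model decorator when the module defining the model is imported. If the module that defines DVT_T2t_vit_12 is never imported, the name stays unregistered. A minimal, self-contained sketch of the mechanism (not timm's actual code):

```python
# Simplified version of timm's model registry.
_model_entrypoints = {}

def register_model(fn):
    """Register a model-building function under its own name."""
    _model_entrypoints[fn.__name__] = fn
    return fn

def create_model(model_name, **kwargs):
    """Look the name up in the registry, as timm's factory does."""
    if model_name not in _model_entrypoints:
        raise RuntimeError('Unknown model (%s)' % model_name)
    return _model_entrypoints[model_name](**kwargs)

@register_model
def DVT_T2t_vit_12(pretrained=False):
    return 'model'  # placeholder: the real function builds the network

print('DVT_T2t_vit_12' in _model_entrypoints)  # True once this module runs
```

So the fix is usually to make sure the file containing the @register_model-decorated DVT_T2t_vit_12 function is imported (directly or via the package's __init__) before calling create_model.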

Termination condition

Your paper says that when the prediction does not satisfy the termination condition, the model improves prediction accuracy by increasing the number of tokens and introducing additional transformer layers. What exactly is the termination condition here, and where is it reflected in the code?
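One common form such a condition takes in early-exiting models is a confidence threshold on the softmax output: inference stops at the first stage whose top-class probability exceeds a threshold. The sketch below illustrates that pattern only; the threshold value and the exact criterion used by DVT (which calibrates its thresholds to a target computational budget) are assumptions here.

```python
import torch
import torch.nn.functional as F

def early_exit(logits_per_stage, threshold=0.9):
    """Return (stage index, predicted class) of the first stage whose
    softmax confidence reaches the threshold; fall through to the last
    stage otherwise. Threshold value is illustrative, not from the paper."""
    for stage, logits in enumerate(logits_per_stage):
        probs = F.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:
            return stage, pred.item()
    return len(logits_per_stage) - 1, pred.item()

# Stage 0 is unsure (near-uniform logits); stage 1 is confident.
stage, pred = early_exit([torch.tensor([0.0, 0.1]),
                          torch.tensor([0.0, 10.0])])
print(stage, pred)  # 1 1
```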
