nnizhang / vst
source code of “Visual Saliency Transformer” (ICCV2021)
Hello, thank you for your work. I am interested in your model. I noticed that you used a GTX 1080 Ti GPU to run the experiments; may I ask how long it took you to complete them?
Thank you for your answer.
Hi, thank you very much for releasing the code for this inspiring work! When I run the RGB-D part, the code is fully runnable, but I encounter this warning at the beginning of training:
UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [1, 64], strides() = [1, 1]
bucket_view.sizes() = [1, 64], strides() = [64, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:323.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
I'm wondering what could be the cause of it and if it will really have any influence on performance. Thank you very much for your time and help!
Hello, author. I changed the code's distributed training into non-distributed training and set the batch size to 8. The trained model is smaller than the one you provided (174,165 KB), and the test images all come out gray. Can you tell me what is going on here?
Thanks for your hard work.
I found a problem in your GitHub project: the pretrained T2T-ViT_t-14 model cannot be opened. There are always errors, regardless of whether I am on Windows or Linux.
Hello! Thank you for this latest work. I apologize if this question has an easy answer that I missed in the code. The included evaluator is super helpful, but I am curious: is it possible to amend the code to run on an image for which I have no ground-truth mask, and still output the predicted mask?
Hi, firstly, thanks for your amazing work!
I have a question about the model. I don't understand why, in the decoder, we need to prepend a "saliency token" for the transformer.
If I simply remove the contour branch, use only the saliency branch, and delete the saliency token, the model no longer works...
I also don't understand the function `saliency_token_inference`: why do we use the features as the query but the token as the key and value?
Would you mind explaining a bit?
Thanks.
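For readers with the same question: in this token-inference pattern, each patch feature acts as a query against the single task token, so the sigmoid attention score says "how salient is this patch according to the token", and multiplying by the token's value writes a task-conditioned signal back to every patch. A minimal sketch of that mechanism (my own simplified module, not the repo's exact `saliency_token_inference` implementation):

```python
import torch
import torch.nn as nn

class TokenInferenceSketch(nn.Module):
    """Sketch: patch features are the queries; the single saliency token
    supplies key and value. sigmoid(QK^T) gives a per-patch score and
    attn * V broadcasts the token's value to every patch."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, patches, token):
        # patches: (B, N, C) patch features; token: (B, 1, C) saliency token
        q = self.q(patches)                       # queries from patch features
        k = self.k(token)                         # key from the saliency token
        v = self.v(token)                         # value from the saliency token
        attn = torch.sigmoid(q @ k.transpose(-2, -1) * self.scale)  # (B, N, 1)
        return attn * v                           # (B, N, C) via broadcasting
```

Flipping the roles (token as query) would only produce one output vector for the whole image, whereas this direction yields a per-patch map, which is what the dense prediction needs.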
Excellent work! Thanks very much for the repo.
I have a question regarding Equation (5) in the paper. Given that the output of sigmoid() is the attention (i.e., As, of size l1 x 1) between the task-specific token and all patch tokens, what does As*Vs mean if Vs is the value of the task-specific token? Why not use the values of the patch tokens?