nnizhang / vst
source code of “Visual Saliency Transformer” (ICCV2021)
Hello, thank you for your work. I am interested in your model. I noticed that you used a GTX 1080 Ti GPU to run the experiments; may I ask how long it took you to complete them?
Thank you for your answer.
Hi, thank you very much for releasing the code for this inspiring work! When I run the RGB-D part, the code is fully runnable, but I encounter this warning at the beginning of training:
UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [1, 64], strides() = [1, 1]
bucket_view.sizes() = [1, 64], strides() = [64, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:323.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
I'm wondering what could be the cause of it and if it will really have any influence on performance. Thank you very much for your time and help!
Hello, author. I changed the code's distributed training into non-distributed training and set the batch size to 8. The trained model is smaller than the one you provided (174,165 KB), and the test images all come out gray. Can you tell me what is going on here?
Thanks for your hard work.
I found a problem in your GitHub project: the pretrained T2T-ViT_t-14 model cannot be opened. There are always errors, regardless of whether I am on Windows or Linux.
Hello! Thank you for this latest work. I apologize if this question has an easy answer that I missed in the code. The included evaluator is super helpful, but I am curious: is it possible to amend the code to run on an image for which I have no ground-truth mask, and still output the predicted mask?
Hi, firstly, thanks for your amazing work!
I have a question about the model. I don't understand why, in the decoder, we need to prepend a "saliency token" for the transformer.
If I simply remove the contour branch, use only the saliency branch, and delete the saliency token, the model no longer works...
I also don't understand the function `saliency_token_inference`: why do we use the features as the query but the token as the key and value?
Would you mind explaining a bit?
Thanks.
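For readers with the same question: in this token-inference pattern, each patch feature acts as a query against the single task token, so the sigmoid attention score says "how salient is this patch according to the token", and multiplying by the token's value writes a task-conditioned signal back to every patch. A minimal sketch of that mechanism (my own simplified module, not the repo's exact `saliency_token_inference` implementation):

```python
import torch
import torch.nn as nn

class TokenInferenceSketch(nn.Module):
    """Sketch: patch features are the queries; the single saliency token
    supplies key and value. sigmoid(QK^T) gives a per-patch score and
    attn * V broadcasts the token's value to every patch."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, patches, token):
        # patches: (B, N, C) patch features; token: (B, 1, C) saliency token
        q = self.q(patches)                       # queries from patch features
        k = self.k(token)                         # key from the saliency token
        v = self.v(token)                         # value from the saliency token
        attn = torch.sigmoid(q @ k.transpose(-2, -1) * self.scale)  # (B, N, 1)
        return attn * v                           # (B, N, C) via broadcasting
```

Flipping the roles (token as query) would only produce one output vector for the whole image, whereas this direction yields a per-patch map, which is what the dense prediction needs.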
Excellent work! Thanks very much for the repo.
I have a question regarding Equation (5) in the paper. Given that the output of sigmoid() is the attention (i.e., As, of size l1 x 1) between the task-specific token and all patch tokens, what does As*Vs mean if Vs is the value of the task-specific token? Why not use the values of the patch tokens?