video-action-transformer-network-pytorch-'s Introduction

Video-Action-Transformer-Network-Pytorch-

PyTorch and TensorFlow implementation of the paper Video Action Transformer Network
Rohit Girdhar, Joao Carreira, Carl Doersch, Andrew Zisserman

Retasked video transformer (uses ResNet as the base). transformer_v1.py is closer to a real transformer, while transformer.py is truer to what the paper advertises. Usage:

from transformer_v1 import Semi_Transformer
model = Semi_Transformer(num_classes=num_classes, num_frames=max_seq_len)
outputs, features = model(imgs)  # outputs: classification-layer output (train with cross-entropy loss)
                                 # features: used as the video embedding

# ---- or ----
from transformer_v2 import Semi_Transformer
model = Semi_Transformer(num_classes=625, seq_len=max_seq_len)
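
The usage above returns two tensors: `outputs` for the cross-entropy loss and `features` as the video embedding. The sketch below shows one way to consume both. It uses a hypothetical `StandInSemiTransformer` (not the repo's `Semi_Transformer`) that only mimics the `outputs, features = model(imgs)` call convention, and it assumes `imgs` is shaped (batch, frames, channels, height, width); check the repo's data loader for the real layout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StandInSemiTransformer(nn.Module):
    """Hypothetical stand-in that mimics `outputs, features = model(imgs)`.

    It is NOT the repo's Semi_Transformer; it only averages pixels so the
    training/embedding helpers below can run on their own.
    """

    def __init__(self, num_classes: int, in_channels: int = 3, feat_dim: int = 128):
        super().__init__()
        self.embed = nn.Linear(in_channels, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, imgs: torch.Tensor):
        # Assumed layout: (batch, frames, channels, height, width)
        pooled = imgs.mean(dim=(1, 3, 4))        # (B, C): average over time and space
        features = self.embed(pooled)            # (B, feat_dim): video embedding
        outputs = self.classifier(features)      # (B, num_classes): raw logits
        return outputs, features


def train_step(model, optimizer, imgs, labels):
    """One optimisation step: cross-entropy on the classification output."""
    model.train()
    outputs, _ = model(imgs)
    loss = F.cross_entropy(outputs, labels)      # expects logits, not probabilities
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def embed_videos(model, imgs):
    """Return L2-normalised features, e.g. for cosine-similarity retrieval."""
    model.eval()
    _, features = model(imgs)
    return F.normalize(features, dim=1)
```

Swap the stand-in for the real `Semi_Transformer` once the repo's modules are importable; `train_step` and `embed_videos` only rely on the two-output convention.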

If you find any discrepancy, please raise an issue. If anyone has been able to reproduce the paper's results, kindly help me with this issue, and if possible, mention the changes that still need to be added.

video-action-transformer-network-pytorch-'s People

Contributors

ppriyank

video-action-transformer-network-pytorch-'s Issues

One implementation problem in transformer_v3.py

It seems that the function defined here needs to be modified as follows:

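    def forward(self, q, k, v, mask=None):
        bs = k.shape[0]
        # split keys into heads: (bs, num_positions, head, d_k)
        k = k.view(bs, -1, self.head, self.d_k)
        # project one query vector per batch item, then split into heads: (bs, head, d_k)
        q = F.relu(self.q_linear(q).view(bs, self.head, self.d_k))
        ...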

Please help check this.
BTW, thanks for sharing this implementation.
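
The shapes in the proposed fix imply that each batch item contributes one query vector, while keys come from many positions. The sketch below fills in the rest of a multi-head attention step around those two lines so the shapes can be checked end to end. Everything beyond the `k`/`q` reshapes is an assumption: the class name `QueryPerClipAttention`, reshaping `v` like `k`, the einsum-based scoring, and the boolean `mask` convention (True = may attend) are illustrative choices, not the repo's `transformer_v3.py`.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class QueryPerClipAttention(nn.Module):
    """Hypothetical sketch: one query vector per batch item attends over N
    key/value positions, split into `head` heads of size d_k."""

    def __init__(self, d_model: int, head: int):
        super().__init__()
        assert d_model % head == 0, "d_model must be divisible by head"
        self.head = head
        self.d_k = d_model // head
        self.q_linear = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        # q: (bs, d_model); k, v: (bs, N, d_model)
        bs = k.shape[0]
        k = k.view(bs, -1, self.head, self.d_k)                     # (bs, N, head, d_k)
        v = v.view(bs, -1, self.head, self.d_k)                     # (bs, N, head, d_k)
        q = F.relu(self.q_linear(q).view(bs, self.head, self.d_k))  # (bs, head, d_k)
        # Scaled dot product between each head's query and every position.
        scores = torch.einsum("bhd,bnhd->bhn", q, k) / math.sqrt(self.d_k)
        if mask is not None:
            # Assumed mask: (bs, N) bool, True where a position may be attended to.
            scores = scores.masked_fill(~mask[:, None, :], float("-inf"))
        attn = scores.softmax(dim=-1)                               # (bs, head, N)
        out = torch.einsum("bhn,bnhd->bhd", attn, v)                # (bs, head, d_k)
        return out.reshape(bs, -1), attn                            # (bs, d_model)
```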

About replication

Hi, thanks for this helpful code! Have you replicated the performance reported in the original paper?

How to slice the output from ResNet to get the temporally central feature map?

The paper mentions that the output from the trunk is sliced to get the central frame's feature map and is then passed through the RPN.

"We slice out the temporally-central frame from this feature map and pass it through a region proposal network (RPN)..."

How should I slice the output from the trunk? What is the input dimension expected by Tail?
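
If the trunk produces a spatio-temporal feature map, slicing out the temporally central frame amounts to indexing the middle of the time axis. The helper below is a minimal sketch that assumes the trunk output is laid out as (batch, channels, time, height, width), which is the layout `torch.nn.Conv3d` produces; the hypothetical `time_dim` argument covers other layouts. It does not answer what input shape the repo's `Tail` expects.

```python
import torch


def central_frame(feat: torch.Tensor, time_dim: int = 2) -> torch.Tensor:
    """Return the temporally central slice of a trunk feature map.

    Assumes (batch, channels, time, height, width) by default; pass
    time_dim=1 for (batch, time, channels, height, width).
    The time dimension is removed, e.g. (B, C, T, H, W) -> (B, C, H, W).
    """
    t = feat.shape[time_dim]
    center = t // 2  # for even t this picks the later of the two middle frames
    return feat.select(time_dim, center)
```

The result is a per-frame 2D map, the kind of input a 2D region proposal network operates on.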

Regression with video transformer?

Do I need to edit the BNClassifier class for regression problems? Training with MSE loss and SGD (learning rate 1e-3) doesn't converge, but I can't intuitively work out why.
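
For regression, one common approach is to replace the classification head with a plain linear output trained with `nn.MSELoss`, with no softmax in between. The sketch below assumes `features` is a (batch, feat_dim) video embedding; `RegressionHead` and `standardize` are hypothetical helpers, and the BatchNorm-then-Linear layout is only a guess at what a BN classifier looks like, since `BNClassifier`'s internals are not shown here. Standardizing the targets is worth checking when MSE with plain SGD fails to converge, because large-magnitude targets produce large gradients at a fixed learning rate.

```python
import torch
import torch.nn as nn


class RegressionHead(nn.Module):
    """Hypothetical regression head: (B, feat_dim) embedding -> (B, out_dim)
    real-valued predictions. No softmax, so train it with nn.MSELoss."""

    def __init__(self, feat_dim: int, out_dim: int = 1):
        super().__init__()
        self.bn = nn.BatchNorm1d(feat_dim)
        self.fc = nn.Linear(feat_dim, out_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(self.bn(features))


def standardize(y: torch.Tensor):
    """Zero-mean, unit-variance targets; also returns the stats to undo it."""
    mean = y.mean(0)
    std = y.std(0).clamp_min(1e-8)  # guard against constant targets
    return (y - mean) / std, mean, std
```

Predictions made on the standardized scale can be mapped back with `pred * std + mean`, using the statistics returned by `standardize`.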

issue raised

Hi,
Thanks for your awesome project. I found a dependency issue when I tried to run transformer_v3.py. Could you please tell me which dependency I need? Thanks in advance!

from tools import BNClassifier, BottleSoftmax
ImportError: cannot import name 'BNClassifier' from 'tools' (unknown location)

Query, key, and value are different from the paper

Hi!

The paper states that the query is an ROI, while the keys and values come from the clip around that ROI. In this implementation, however, the query, keys, and values all seem to come from features at the same spatial locations. In that case, the model cannot aggregate information from another person the target person is talking to.

Please clarify if I am mistaken. Thank you very much!
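
The formulation described in this issue uses an ROI as the query and the whole clip around it as keys and values, so any spatio-temporal position (including another person) can receive attention weight. The sketch below illustrates that contrast with a single head and no learned projections (keys and values are simply the raw features). It assumes a (C, T, H, W) clip feature map and an integer box on the central frame; `roi_query` average-pools the box as a simple stand-in for proper ROI pooling, and both helpers are hypothetical, not code from the repo or the paper.

```python
import math

import torch


def roi_query(central_map: torch.Tensor, box) -> torch.Tensor:
    """Average-pool a (C, H, W) central-frame map inside an integer box
    (x1, y1, x2, y2) with exclusive upper bounds, giving a (C,) query."""
    x1, y1, x2, y2 = box
    return central_map[:, y1:y2, x1:x2].mean(dim=(1, 2))


def attend_over_clip(query: torch.Tensor, clip_feat: torch.Tensor):
    """Single-head scaled dot-product attention of one ROI query over every
    spatio-temporal position of a (C, T, H, W) clip feature map."""
    c = clip_feat.shape[0]
    kv = clip_feat.reshape(c, -1).t()                  # (T*H*W, C): keys = values
    scores = kv @ query / math.sqrt(c)                 # (T*H*W,)
    weights = scores.softmax(dim=0)
    context = weights @ kv                             # (C,) aggregated clip context
    return context, weights.view(clip_feat.shape[1:])  # weights reshaped to (T, H, W)
```

Because the softmax runs over all T*H*W positions, locations outside the ROI box always receive some weight, unlike attention restricted to the same spatial location.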
