ppriyank / video-action-transformer-network-pytorch-

Implementation of the paper Video Action Transformer Network

Python 100.00%

transformer pytorch video videotransformer actionrecognition pytorch-transformers tensorflow-transformer

video-action-transformer-network-pytorch-'s Introduction

Video-Action-Transformer-Network-Pytorch-

PyTorch and TensorFlow implementation of the paper Video Action Transformer Network
by Rohit Girdhar, Joao Carreira, Carl Doersch, Andrew Zisserman

Retasked video transformer (uses ResNet as the base). transformer_v1.py is more like a real transformer; transformer.py is truer to what the paper advertises.

Usage:

from transformer_v1 import Semi_Transformer
model = Semi_Transformer(num_classes=num_classes, num_frames=max_seq_len)
# outputs: classification layer output (train with cross-entropy loss)
# features: used as the video embedding
outputs, features = model(imgs)

##################### or ###################
from transformer_v2 import Semi_Transformer
model = Semi_Transformer(num_classes=625, seq_len=max_seq_len)
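
The usage above returns a pair of classification outputs and features. As a rough sketch of how that pair could be consumed in one training step, the snippet below uses a hypothetical `StandInModel` in place of `Semi_Transformer` so that it runs without the repository. The input layout `(batch, frames, 3, H, W)`, the feature size, and the toy layers are all assumptions made for illustration, not the repository's actual architecture.

```python
import torch
import torch.nn as nn

class StandInModel(nn.Module):
    """Hypothetical stand-in for Semi_Transformer: returns (logits, features)."""
    def __init__(self, num_classes, feat_dim=32):
        super().__init__()
        self.embed = nn.Linear(3, feat_dim)            # toy per-frame embedding
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, imgs):
        # imgs: (batch, frames, 3, H, W) -- assumed layout
        per_frame = imgs.mean(dim=(3, 4))              # (batch, frames, 3)
        features = self.embed(per_frame).mean(dim=1)   # (batch, feat_dim)
        return self.classifier(features), features

torch.manual_seed(0)
num_classes, max_seq_len = 5, 4
model = StandInModel(num_classes)
imgs = torch.randn(2, max_seq_len, 3, 16, 16)
labels = torch.tensor([1, 3])

outputs, features = model(imgs)                    # logits and video embedding
loss = nn.CrossEntropyLoss()(outputs, labels)      # cross entropy on the logits
loss.backward()                                    # gradients for an optimizer step
```
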

If you find any discrepancy, please raise an issue. If anyone has been able to reproduce the paper's results, kindly help me with this issue. If possible, please mention the changes that still need to be added.

video-action-transformer-network-pytorch-'s Issues

One implementation problem in transformer_v3.py

It seems that the function defined here needs to be modified as follows:

    def forward(self, q, k, v, mask=None):
        bs = k.shape[0]
        k = k.view(bs, -1, self.head, self.d_k)
        q = F.relu(self.q_linear(q).view(bs, self.head, self.d_k))
        ...

Please help check this.
BTW, thanks for sharing this implementation.
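
For reference, the reshaping discussed in this issue can be sketched in isolation. The module below is a minimal multi-head attention with a single query vector per sample, assuming `q` has shape `(bs, d_model)` and `k`, `v` have shape `(bs, n, d_model)`. The class name and the extra projection layers are hypothetical and are not taken from transformer_v3.py. The ReLU on the query from the snippet above is left out here.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleQueryMultiHeadAttention(nn.Module):
    """Hypothetical sketch: one query vector per sample attends over n keys."""
    def __init__(self, d_model, head):
        super().__init__()
        assert d_model % head == 0
        self.head = head
        self.d_k = d_model // head
        self.q_linear = nn.Linear(d_model, d_model)
        self.k_linear = nn.Linear(d_model, d_model)
        self.v_linear = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, q, k, v):
        # q: (bs, d_model); k, v: (bs, n, d_model) -- assumed shapes
        bs = k.shape[0]
        q = self.q_linear(q).view(bs, self.head, 1, self.d_k)
        # split the channel dimension into heads, then move heads before n
        k = self.k_linear(k).view(bs, -1, self.head, self.d_k).transpose(1, 2)
        v = self.v_linear(v).view(bs, -1, self.head, self.d_k).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)   # (bs, head, 1, n)
        weights = F.softmax(scores, dim=-1)
        context = (weights @ v).reshape(bs, self.head * self.d_k)
        return self.out(context), weights.squeeze(2)

torch.manual_seed(0)
attn = SingleQueryMultiHeadAttention(d_model=16, head=4)
query = torch.randn(2, 16)
memory = torch.randn(2, 6, 16)
out, weights = attn(query, memory, memory)
```
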

About replication

Hi, thanks for this helpful code! Have you replicated the performance reported in the original paper?

How to slice the output from resnet to get the temporally central feature map?

The paper mentions that the output from the trunk is sliced to get the central frame's feature map and is then passed through the RPN.

"We slice out the temporally-central frame from this feature map and pass it through a region proposal network (RPN)..."

How should I slice the output from the trunk? What is the input dimension expected by Tail?
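
One possible reading of the quoted sentence, as a sketch: if the trunk output is laid out as `(batch, channels, time, height, width)`, the temporally central frame is the index `time // 2` along the time axis. That layout and the helper name `central_frame` are assumptions. The sketch does not say what input shape the repository's Tail expects.

```python
import torch

def central_frame(feature_map):
    """Return the temporally central slice of a 5-D feature map.

    feature_map: (batch, channels, time, height, width) -- assumed layout.
    """
    t = feature_map.shape[2]
    return feature_map[:, :, t // 2]       # (batch, channels, height, width)

# Toy trunk output: 2 clips, 3 channels, 5 time steps, 4x4 spatial grid
feats = torch.arange(2 * 3 * 5 * 4 * 4, dtype=torch.float32).view(2, 3, 5, 4, 4)
center = central_frame(feats)
```

If the trunk instead returns `(batch, time, channels, height, width)`, the same idea applies with the index moved to dimension 1.
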

Regression with video transformer?

Do I need to edit the BNClassifier class for regression problems? Training with MSE loss and SGD set to 1e-3 doesn't converge, but I can't intuitively work out why.
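
As a generic illustration of the setup described in this question (not a diagnosis of the repository code), a regression variant usually swaps the classification layer for a linear layer with one output and trains it with MSE. The sketch below uses random stand-in embeddings and made-up targets. Standardizing the targets is one common thing to try when the raw values are large, and it is shown here only as an assumption about what might help.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
features = torch.randn(8, 32)              # stand-in video embeddings
targets = torch.randn(8, 1) * 50 + 200     # made-up targets on a large scale

# Standardize targets so the MSE gradients stay at a moderate scale
mean, std = targets.mean(), targets.std()
norm_targets = (targets - mean) / std

head = nn.Linear(32, 1)                    # regression head: one output, no softmax
optimizer = torch.optim.SGD(head.parameters(), lr=1e-2)
criterion = nn.MSELoss()

losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = criterion(head(features), norm_targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Map predictions back to the original target scale
pred = head(features) * std + mean
```
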

issue raised

Hi,
Thanks for your awesome project. I found a dependency issue when I tried to run transformer_v3.py. Could you please tell me which dependency I need? Thanks in advance!

from tools import BNClassifier, BottleSoftmax
ImportError: cannot import name 'BNClassifier' from 'tools' (unknown location)

Query, key, and value are different from the paper

Hi!

The paper states that the query is an ROI, and that the keys and values come from the clip around the ROI. But in this implementation, it seems that the query, key, and value all come from features at the same spatial locations. In that case, it will not be able to aggregate information from another person that the target person is talking to.

Please clarify if I am mistaken. Thank you very much!
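
To make the distinction in this issue concrete, here is a minimal single-head sketch in which the query is pooled from a box on the central frame while the keys and values cover every spatio-temporal location of the clip. The function name, the box given in feature-map cells, and the mean pooling without learned projections are simplifying assumptions, not the paper's or the repository's exact method.

```python
import math
import torch
import torch.nn.functional as F

def roi_query_attention(clip_feats, box):
    """Attend from one ROI to the whole clip.

    clip_feats: (C, T, H, W) features for one clip -- assumed layout.
    box: (x1, y1, x2, y2) in feature-map cells on the central frame.
    """
    c, t, h, w = clip_feats.shape
    x1, y1, x2, y2 = box
    center = clip_feats[:, t // 2]                        # (C, H, W)
    query = center[:, y1:y2, x1:x2].mean(dim=(1, 2))      # (C,) pooled ROI feature
    keys = clip_feats.reshape(c, -1).t()                  # (T*H*W, C) whole clip
    values = keys                                         # no learned projections here
    weights = F.softmax(keys @ query / math.sqrt(c), dim=0)
    return weights @ values, weights.view(t, h, w)

torch.manual_seed(0)
clip = torch.randn(8, 3, 6, 6)
updated, attn = roi_query_attention(clip, (1, 1, 4, 4))
```

Because the attention map spans all `T*H*W` locations, positions outside the box (for example, another person in the scene) can contribute to the updated ROI feature.
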
