Official PyTorch Implementation of EVAD

EVAD Framework

Efficient Video Action Detection with Token Dropout and Context Refinement
Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, Limin Wang

News

[2023.07.14] EVAD has been accepted to ICCV 2023!
[2023.06.09] Code and model weights have been released!

Installation

Please find installation instructions in INSTALL.md.

Data Preparation

Please follow the instructions in DATASET.md to prepare the AVA dataset.

Model Zoo

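| method | keep rate | enhanced weight | config | backbone | pre-train | #frame x sample rate | GFLOPs | mAP | model |
|--------|-----------|-----------------|--------|----------|-----------|----------------------|--------|-----|-------|
| EVAD | 1.0 | 1 | ViT_B_16x4 | ViT-B (VideoMAE) | K400 | 16x4 | 425 | 32.1 | link |
| EVAD | 0.7 | 1 | ViT_B_16x4_KTP | ViT-B (VideoMAE) | K400 | 16x4 | 243 | 32.3 | link |
| EVAD | 0.6 | 4 | ViT_B_16x4_KTP_EW | ViT-B (VideoMAE) | K400 | 16x4 | 209 | 31.8 | link |
| EVAD | 0.7 | 1 | ViT_B_16x4_KTP | ViT-B (VideoMAEv2) | K710+K400 | 16x4 | 243 | 37.7 | link |
| EVAD | 0.7 | 1 | ViT_L_16x4_KTP | ViT-L (VideoMAE) | K700 | 16x4 | 737 | 39.7 | link |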
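
To make the accuracy/compute trade-off in the table easier to read, the short sketch below computes the GFLOPs saving and mAP change of the two token-dropout ViT-B (VideoMAE, K400) configs relative to the keep-rate-1.0 baseline. The numbers are copied from the table; the helper name `compare_to_baseline` is only illustrative and is not part of the EVAD codebase.

```python
# Rows for the ViT-B (VideoMAE, K400) variants, copied from the Model Zoo table.
rows = [
    {"config": "ViT_B_16x4", "keep_rate": 1.0, "gflops": 425, "map": 32.1},
    {"config": "ViT_B_16x4_KTP", "keep_rate": 0.7, "gflops": 243, "map": 32.3},
    {"config": "ViT_B_16x4_KTP_EW", "keep_rate": 0.6, "gflops": 209, "map": 31.8},
]


def compare_to_baseline(rows):
    """Return (config, % GFLOPs saved, mAP change) against the first row."""
    baseline = rows[0]
    results = []
    for row in rows[1:]:
        saving = 100.0 * (1.0 - row["gflops"] / baseline["gflops"])
        delta = row["map"] - baseline["map"]
        results.append((row["config"], round(saving, 1), round(delta, 1)))
    return results


for config, saving, delta in compare_to_baseline(rows):
    print(f"{config}: {saving:.1f}% fewer GFLOPs, mAP change {delta:+.1f}")
```

With the table values, keep rate 0.7 saves roughly 43% of the GFLOPs with a slightly higher mAP, and keep rate 0.6 saves roughly 51% at a cost of 0.3 mAP.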

Training

python -m torch.distributed.launch --nproc_per_node=8 projects/evad/run_net.py --cfg "projects/evad/configs/config_file.yaml" DATA.PATH_TO_DATA_DIR "path/to/ava" TRAIN.CHECKPOINT_FILE_PATH "path/to/pretrain.pth" OUTPUT_DIR "path/to/output"

Validation

You can load a specific checkpoint file with TEST.CHECKPOINT_FILE_PATH, or let the script autoload the last checkpoint from the output folder.

python -m torch.distributed.launch --nproc_per_node=1 projects/evad/run_net.py --cfg "projects/evad/configs/config_file.yaml" DATA.PATH_TO_DATA_DIR "path/to/ava" TRAIN.ENABLE False TEST.ENABLE True NUM_GPUS 1 OUTPUT_DIR "path/to/output"
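
If you want to check which checkpoint would be picked up before launching validation, the sketch below shows one way to locate the most recent file in an output folder. It is not the project's own loader: the `checkpoints` subfolder, the `"checkpoint"` filename filter, and the function name `find_last_checkpoint` are all assumptions made for illustration, so adjust them to match how your OUTPUT_DIR is actually laid out.

```python
import os
from typing import Optional


def find_last_checkpoint(output_dir: str, subdir: str = "checkpoints") -> Optional[str]:
    """Return the path of the lexicographically last checkpoint file, or None.

    Assumes (hypothetically) that checkpoints live in `output_dir/subdir` and
    that their names contain "checkpoint" and sort in training order, e.g.
    zero-padded epoch numbers.
    """
    ckpt_dir = os.path.join(output_dir, subdir)
    if not os.path.isdir(ckpt_dir):
        return None
    names = sorted(n for n in os.listdir(ckpt_dir) if "checkpoint" in n)
    return os.path.join(ckpt_dir, names[-1]) if names else None


if __name__ == "__main__":
    # Placeholder path, matching the OUTPUT_DIR used in the commands above.
    print(find_last_checkpoint("path/to/output"))
```

The returned path can then be passed explicitly as TEST.CHECKPOINT_FILE_PATH if you prefer not to rely on autoloading.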

Acknowledgements

This project is built upon SparseR-CNN and PySlowFast. We also reference and use some code from WOO and VideoMAE. Thanks to the contributors of these great codebases.

License

The majority of this project is released under the CC-BY-NC 4.0 license as found in the LICENSE file. Portions of the project are available under separate license terms: SlowFast and pytorch-image-models are licensed under the Apache 2.0 license. SparseR-CNN is licensed under the MIT license.

Citation

If you find this project useful, please feel free to leave a star and cite our paper:

@inproceedings{chen2023efficient,
    author    = {Chen, Lei and Tong, Zhan and Song, Yibing and Wu, Gangshan and Wang, Limin},
    title     = {Efficient Video Action Detection with Token Dropout and Context Refinement},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year      = {2023}
}

@article{chen2023efficient,
  title={Efficient Video Action Detection with Token Dropout and Context Refinement},
  author={Chen, Lei and Tong, Zhan and Song, Yibing and Wu, Gangshan and Wang, Limin},
  journal={arXiv preprint arXiv:2304.08451},
  year={2023}
}

evad's People

Contributors

miasanlei

evad's Issues

jhmdb

Hi author, I am very interested in your paper and would like to reproduce your results on the JHMDB dataset. Could you provide the code for processing JHMDB? Thank you very much.

No such file or directory: '../anno_person2v1/person_test2020.json'

Hi! Thanks for your work. I have been trying to run the evaluation script, but I ran into an issue that I don't know how to solve, so I'm asking whether you could provide any pointers. The error says the file 'person_test2020.json' is missing, and indeed I don't have it. Where can I get this file? Any information is greatly appreciated. Thanks in advance.

Sincerely,
Alberto

demo

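Many thanks to the author for providing the video action detection source code. Are there any plans to provide a demo? I have a few questions about the demo:
1: Does it work like the end-to-end YOWOv2 action detector, where only a video clip needs to be given as input? Looking at the provided projects/evad/test_net.py, it also requires inputs such as the person box coordinates. In real use, can this coordinate information be set to None, with the person box coordinates used only for measuring the model's mAP?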
