Git Product home page Git Product logo

video-classification-3d-cnn-pytorch's Introduction

Video Classification Using 3D ResNet

This is a pytorch code for video (action) classification using 3D ResNet trained by this code.
The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes.
This code uses videos as inputs and outputs class names and predicted class scores for each 16 frames in the score mode.
In the feature mode, this code outputs features of 512 dims (after global average pooling) for each 16 frames.

Torch (Lua) version of this code is available here.

Requirements

conda install pytorch torchvision cuda80 -c soumith
  • FFmpeg, FFprobe
wget http://johnvansickle.com/ffmpeg/releases/ffmpeg-release-64bit-static.tar.xz
tar xvf ffmpeg-release-64bit-static.tar.xz
cd ./ffmpeg-3.3.3-64bit-static/; sudo cp ffmpeg ffprobe /usr/local/bin;
  • Python 3

Preparation

  • Download this code.
  • Download the pretrained model.
    • ResNeXt-101 achieved the best performance in our experiments. (See paper in details.)

Usage

Assume input video files are located in ./videos.

To calculate class scores for each 16 frames, use --mode score.

python main.py --input ./input --video_root ./videos --output ./output.json --model ./resnet-34-kinetics.pth --mode score

To visualize the classification results, use generate_result_video/generate_result_video.py.

To calculate video features for each 16 frames, use --mode feature.

python main.py --input ./input --video_root ./videos --output ./output.json --model ./resnet-34-kinetics.pth --mode feature

Citation

If you use this code, please cite the following:

@article{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  journal={arXiv preprint},
  volume={arXiv:1711.09577},
  year={2017},
}

video-classification-3d-cnn-pytorch's People

Contributors

kenshohara avatar

Watchers

James Cloos avatar Tom avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.