xiaocai-rookie / paddlevideo

This project forked from paddlepaddle/paddlevideo

A comprehensive, up-to-date, and deployable collection of video deep learning algorithms, covering video recognition, action localization, and temporal action detection tasks. It is a high-performance, lightweight codebase that provides practical models for video understanding research and applications.

License: Apache License 2.0

Languages: Python 99.26%, Shell 0.74%

PaddleVideo

Introduction

PaddleVideo is a toolset for video recognition, action localization, and spatio-temporal action detection tasks, built for both industry and academia. This repository provides examples and best-practice guidelines for exploring deep learning algorithms in the video domain. We are devoted to supporting experiments and utilities that significantly reduce the "time to deploy". It also serves as a verification and reference implementation of the new PaddlePaddle 2.0 in the video field.


Features

  • Advanced model zoo design: PaddleVideo unifies video understanding tasks, including recognition, localization, and spatio-temporal action detection. With a clear configuration system based on IoC/DI, we designed a decoupled, modular, and extensible framework in which a customized network can easily be built by combining different modules.

  • Various datasets and architectures: PaddleVideo supports many datasets and architectures, including the Kinetics-400, UCF101, and YouTube-8M datasets; video recognition models such as TSN, TSM, SlowFast, and AttentionLSTM; and action localization models such as BMN.

  • Higher performance: PaddleVideo has built-in solutions that improve the accuracy of recognition models. PP-TSM, which is based on the standard TSM, achieves the best performance among 2D recognition networks: it has the same number of parameters but raises Top-1 accuracy to 73.5%. These solutions can easily be applied to your own dataset.

  • Faster training strategy: PaddleVideo supports a faster training strategy that speeds up training by 100% compared with the standard SlowFast version, so training from scratch on the Kinetics-400 dataset takes only 10 days.

  • Deployable: PaddleVideo is powered by Paddle Inference. There is no need to convert the model to ONNX format for deployment; everything you need can be found in this repository.
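To make the IoC/DI idea from the feature list concrete, here is a minimal sketch of a registry-driven builder. It is not taken from the PaddleVideo source: the names `Registry`, `BACKBONES`, `HEADS`, and `build_model`, the placeholder classes, and the layout of the config dictionary are all assumptions chosen for illustration. The point is only that modules are looked up by name from a config and handed to the model from outside, instead of being hard-coded inside it.

```python
# Illustrative sketch only. Registry, BACKBONES, HEADS, build_model and the
# config layout are hypothetical and do not mirror PaddleVideo's real API.


class Registry:
    """Maps a string name to a class so a config can refer to modules by name."""

    def __init__(self, kind):
        self.kind = kind
        self._classes = {}

    def register(self, cls):
        # Used as a decorator: stores the class under its own class name.
        self._classes[cls.__name__] = cls
        return cls

    def build(self, cfg):
        # cfg is a dict such as {"name": "ResNet", "depth": 50}.
        params = dict(cfg)  # copy so the caller's config is not mutated
        name = params.pop("name")
        if name not in self._classes:
            raise KeyError(f"unknown {self.kind}: {name!r}")
        return self._classes[name](**params)


BACKBONES = Registry("backbone")
HEADS = Registry("head")


@BACKBONES.register
class ResNet:
    """Placeholder backbone; a real one would define network layers."""

    def __init__(self, depth=50):
        self.depth = depth


@HEADS.register
class TSNHead:
    """Placeholder classification head."""

    def __init__(self, num_classes=400):
        self.num_classes = num_classes


class Recognizer2D:
    """Receives its parts from outside (dependency injection)."""

    def __init__(self, backbone, head):
        self.backbone = backbone
        self.head = head


def build_model(cfg):
    # The config decides which modules are created; the model never
    # instantiates a concrete backbone or head itself.
    backbone = BACKBONES.build(cfg["backbone"])
    head = HEADS.build(cfg["head"])
    return Recognizer2D(backbone, head)


config = {
    "backbone": {"name": "ResNet", "depth": 50},
    "head": {"name": "TSNHead", "num_classes": 101},
}
model = build_model(config)
```

Under this design, swapping the backbone or head is a matter of changing the `name` entry in the config, provided the new class has been registered.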

Overview of the kit structures

Architectures
  • Recognition
    • TSN
    • TSM
    • SlowFast
    • PP-TSM
    • VideoTag
    • AttentionLSTM
  • Localization
    • BMN

Frameworks
  • Recognizer1D
  • Recognizer2D
  • Recognizer3D
  • Localizer

Components
  • Backbone
    • resnet
    • resnet_tsm
    • resnet_tweaks_tsm
    • bmn
  • Head
    • tsm_head
    • tsn_head
    • bmn_head
  • Solver
    • Optimizer
      • Momentum
      • RMSProp
    • LearningRate
      • PiecewiseDecay
  • Loss
    • CrossEntropy
    • BMNLoss
  • Metrics
    • CenterCrop
    • MultiCrop

Data Augmentation
  • Batch
    • Mixup
    • Cutmix
  • Image
    • Scale
    • Random Flip
    • Jitter Scale
    • Crop
    • MultiCrop
    • Center Crop
    • MultiScaleCrop
    • Random Crop
    • PackOutput

Overview of the performance

The chart below illustrates the performance of video recognition models with both 2D and 3D architectures, covering our implementations and the PyTorch versions. It shows the relationship between Top-1 accuracy and VPS on the Kinetics-400 dataset. (Tested on an NVIDIA® Tesla® V100 GPU.)

Note:

  • PP-TSM improves Top-1 accuracy by almost 3.5% over the standard TSM.
  • All models shown in red are available in the Model Zoo; the others are PyTorch results.
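For readers unfamiliar with the two axes of the chart, the sketch below shows how such metrics are commonly computed. It assumes that VPS stands for videos processed per second; the function names, the toy scores, and the timing numbers are made up for illustration and are not PaddleVideo's evaluation code or results.

```python
# Illustrative sketch only: top1_accuracy and videos_per_second are
# hypothetical helpers, and the data below is invented toy data.


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row (the predicted class).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


def videos_per_second(num_videos, elapsed_seconds):
    """Throughput, assuming VPS means videos processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_videos / elapsed_seconds


# Four toy videos, three classes; the last prediction is wrong.
scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
labels = [1, 0, 2, 1]

accuracy = top1_accuracy(scores, labels)  # 3 of 4 correct -> 0.75
vps = videos_per_second(200, 8.0)         # 200 videos in 8 s -> 25.0
```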

Community

  • Scan the QR code below with WeChat to join the official technical exchange group. We look forward to your participation.

Special Applications

Tutorials and Docs

License

PaddleVideo is released under the Apache 2.0 license.

Contributing

This project welcomes contributions and suggestions. Please see our contribution guidelines.

Contributors

chajchaj, d-danielyang, dreamer121121, huangjun12, shippingwang, xiaoguanghu01
