
cv_project's Introduction

PyTorch-Re3

Objective

A PyTorch implementation of Re3 (Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects), based on the paper by Daniel Gordon, Ali Farhadi, and Dieter Fox.

Approach from the paper

Robust object tracking is a fundamental problem in computer vision and plays an important role in several areas of robotics. Re3 is a lightweight model that tracks objects robustly, incorporates temporal information, and handles temporary occlusion.

The network consists of convolutional layers that embed the object's appearance, recurrent layers that remember appearance and motion information, and a regression layer that outputs the location of the object.
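As a rough illustration of this three-part structure, here is a minimal PyTorch sketch. The class name `TinyRe3Sketch`, the layer sizes, and the use of a plain `nn.LSTM` are assumptions made for brevity; they are not the layers used in this repository or in the paper.

```python
import torch
import torch.nn as nn


class TinyRe3Sketch(nn.Module):
    """Illustrative conv -> LSTM -> regression tracker (all sizes are assumptions)."""

    def __init__(self, hidden_size=64):
        super().__init__()
        # Convolutional embedding, shared by the previous and current crops.
        self.embed = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=11, stride=4),
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=5, stride=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        feat = 16 * 4 * 4
        # Recurrent layers carry appearance and motion information across frames.
        self.lstm = nn.LSTM(2 * feat, hidden_size, num_layers=2, batch_first=True)
        # Regression layer outputs four bounding-box coordinates.
        self.regress = nn.Linear(hidden_size, 4)

    def forward(self, prev_crop, curr_crop, state=None):
        # prev_crop, curr_crop: tensors of shape (batch, 3, 227, 227).
        feats = torch.cat([self.embed(prev_crop), self.embed(curr_crop)], dim=1)
        out, state = self.lstm(feats.unsqueeze(1), state)  # a single time step
        return self.regress(out[:, 0]), state


model = TinyRe3Sketch()
prev = torch.zeros(1, 3, 227, 227)
curr = torch.zeros(1, 3, 227, 227)
box, state = model(prev, curr)         # first frame: no recurrent state yet
box, state = model(prev, curr, state)  # later frames reuse the state
```

Passing `state` back in on each frame is what lets the recurrent layers accumulate information over the sequence.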

This is carried out as follows:

Object appearance embedding

  • Learn the feature extraction directly by using a convolutional pipeline that can be trained fully end-to-end on a large amount of data
  • At each frame, the network is fed a pair of crops from the image sequence. The first crop is centered at the object’s location in the previous image, whereas the second crop is at the same location, but in the current image. Each crop is padded to twice the size of the object’s bounding box to provide the network with context
  • The crops are warped to be 227 × 227 pixels before being input into the network
  • Skip connections are used when spatial resolution decreases to give the network a richer appearance model
  • The skip connections are each fed through their own 1 × 1 × C convolutional layers where C is chosen to be less than the number of input channels
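One skip branch of the kind described above might look like the following sketch. `SkipBranch`, the ReLU activation, and the example channel counts are illustrative assumptions, not code from this repository.

```python
import torch
import torch.nn as nn


class SkipBranch(nn.Module):
    """1 x 1 convolution that reduces channels, then flattens the result."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        if out_channels >= in_channels:
            raise ValueError("C should be smaller than the number of input channels")
        # kernel_size=1 mixes channels at each pixel without touching spatial layout.
        self.reduce = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, feature_map):
        # (batch, in_channels, H, W) -> (batch, out_channels * H * W)
        return torch.flatten(self.act(self.reduce(feature_map)), start_dim=1)


# Example: an early, high-resolution feature map reduced from 96 to 16 channels.
branch = SkipBranch(in_channels=96, out_channels=16)
skip_features = branch(torch.zeros(1, 96, 27, 27))
```

The flattened outputs of several such branches can then be concatenated with the final convolutional features before the recurrent layers.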

The two crops

Visualization of images with bounding box at frames i and i-1

Image crops fed to the network
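The cropping step can be sketched with NumPy alone, as below. The function name `crop_and_warp`, the nearest-neighbour resizing, and the zero fill for regions outside the image are assumptions chosen to keep the example dependency-free; the preprocessing in this repository may differ.

```python
import numpy as np


def crop_and_warp(image, box, out_size=227, context=2.0):
    """Crop a region `context` times the box size around the box center and
    resize it to out_size x out_size with nearest-neighbour sampling.

    image: (H, W, C) array; box: (x1, y1, x2, y2) in pixels.
    Samples that fall outside the image are left as zeros.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = context * (x2 - x1), context * (y2 - y1)
    # Source pixel coordinates for each output column and row.
    steps = np.arange(out_size) + 0.5
    xs = np.floor(cx - w / 2.0 + steps * w / out_size).astype(int)
    ys = np.floor(cy - h / 2.0 + steps * h / out_size).astype(int)
    valid_x = (xs >= 0) & (xs < image.shape[1])
    valid_y = (ys >= 0) & (ys < image.shape[0])
    crop = np.zeros((out_size, out_size, image.shape[2]), dtype=image.dtype)
    # Copy only the samples that lie inside the image.
    crop[np.ix_(valid_y, valid_x)] = image[np.ix_(ys[valid_y], xs[valid_x])]
    return crop


# Both crops use the box from the previous frame, as described above.
prev_frame = np.zeros((240, 320, 3), dtype=np.uint8)
curr_frame = np.zeros((240, 320, 3), dtype=np.uint8)
prev_box = (100, 80, 160, 140)
crop_pair = (crop_and_warp(prev_frame, prev_box),
             crop_and_warp(curr_frame, prev_box))
```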

Datasets

For training I used only the ALOV300++ dataset. The paper also suggests using the ImageNet Video dataset and generating synthetic data points from it, but I stuck to ALOV300++ because of constraints on the available resources and training time.

Much of the preprocessing code is taken from the Python version of GOTURN, since it feeds crops to the network in a similar way.

Requirements

matplotlib
numpy
torch
PIL (Pillow)
skimage (scikit-image)
CUDA (recommended)

Remaining work to be done

  • The paper uses a two-layer, factored LSTM with peephole connections, for which I could not find a PyTorch implementation.
  • The paper also trains the network on the ILSVRC 2016 Object Detection from Video dataset (ImageNet Video) and on synthetic data created from it.
  • I could not complete the training procedure, so work remains there, specifically on unrolling during training and on the procedure of learning to fix mistakes.

Instructions to use code

After downloading the ALOV dataset, place the ground truth values and the actual data in a folder named 'alov' just outside the folder of this repository. Name the directory of video frames 'imagedata++' and the directory of annotations 'alov300++_rectangleAnnotation_full'. I also took the last entries from each of the directories in imagedata++ to construct the test set. Place the test set in the imagedata++ directory.
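Before training, the layout can be checked with a small helper like the one below. `check_alov_layout` is a hypothetical function, not part of this repository, and it assumes that "just outside" means the parent directory of the repository folder.

```python
from pathlib import Path


def check_alov_layout(repo_dir):
    """Return the expected ALOV300++ directories that are missing."""
    # Assumption: the 'alov' folder sits next to the repository folder.
    alov_root = Path(repo_dir).resolve().parent / "alov"
    expected = [
        alov_root / "imagedata++",
        alov_root / "alov300++_rectangleAnnotation_full",
    ]
    return [str(path) for path in expected if not path.is_dir()]


if __name__ == "__main__":
    for path in check_alov_layout("."):
        print("missing:", path)
```

Run from the repository root, it prints nothing when both directories are in place.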

The code can then be run directly with python3 trainModel.py. To test it, change model_weights on line 26 of testModel.py to the file name of the saved model.

I could not finish training the network on the entire dataset because of the computational time and resources required, so no trained model is available. To test the network, first train it with the command above and then use the saved model for testing.

cv_project's Issues

Bad result from the Re3 tracker

Hi, I have tried to run your code, but it always hits a CUDA out-of-memory error during the training step.
I changed some of the code to avoid the CUDA memory problem, but the tracking result is very bad.
I would like to know whether you get good results from your Re3 tracker. If so, how did you solve the CUDA memory problem? I believe it is caused by the LSTM part. Thank you very much.
