
pygoturn's Introduction

PyTorch GOTURN tracker

This is a PyTorch implementation of the GOTURN visual tracker (Held et al., ECCV 2016). GOTURN is one of the key trackers that proposed an alternative deep-learning approach to object tracking: learning a comparator function.

Why PyTorch implementation?

Although the authors' original C++ Caffe implementation and this Python Caffe implementation are well documented, I feel a PyTorch implementation would be more readable and much easier to adapt for further research. Hence, this is my humble attempt to reproduce GOTURN from scratch in PyTorch, including data loading, training, and inference. I hope this is a useful contribution to the vision community.

Highlights

  • Supports PyTorch 1.0 and Python 3.
  • Reproduces GOTURN end to end in PyTorch, including training and inference.
  • Provides a pretrained PyTorch GOTURN model.
  • Fast: Tracks target objects at 100+ fps.
  • Benchmark: Evaluation on OTB50 and OTB100.

Environment

PyTorch 1.0 and Python 3 are recommended.

numpy==1.14.5
torch==1.0.0
opencv-python==4.0.0.21
torchvision==0.2.1
tensorboardX==1.6

To install all the packages, do pip3 install -r requirements.txt.

Demo

Navigate to pygoturn/src and do:

python3 demo.py -w /path/to/pretrained/model

Images with bounding-box predictions will be saved in the pygoturn/result directory.

Arguments:

-w / --model-weights: Path to a pretrained PyTorch model checkpoint.
-d / --data-directory: Path to a tracking sequence that follows the OTB format.
-s / --save-directory: Directory in which to save sequence images with predicted bounding boxes.
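
For example, to run the demo on a sequence and save the results to a custom directory (the paths below are illustrative):

python3 demo.py -w ../checkpoints/pytorch_goturn.pth.tar -d ../data/OTB/Man -s ../result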

Benchmark

To evaluate PyTorchGOTURN on OTB50 and OTB100, follow the steps below:

  • Install got10k toolkit.
    pip install --upgrade got10k
    
  • Download the pretrained model.
  • Edit the OTB dataset path and model path appropriately in src/evaluate.py. The script will automatically download the OTB dataset to the provided path. (A minimal sketch of what got10k-based evaluation typically looks like follows this list.)
  • Run evaluation script:
    python3 evaluate.py
    
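A minimal sketch of got10k-based evaluation (names and paths here are illustrative; src/evaluate.py is the authoritative version):

    from got10k.trackers import Tracker
    from got10k.experiments import ExperimentOTB

    class DummyTracker(Tracker):
        """Stub tracker: returns the initial box for every frame."""
        def __init__(self):
            super(DummyTracker, self).__init__(name='DummyTracker')

        def init(self, image, box):
            self.box = box   # a real tracker (e.g. pygoturn) would set up its model here

        def update(self, image):
            return self.box  # a real tracker would predict a new box here

    tracker = DummyTracker()
    experiment = ExperimentOTB('data/OTB', version=2015)  # downloads OTB100 if missing
    experiment.run(tracker, visualize=False)
    experiment.report([tracker.name])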

Performance

Dataset   AUC     Precision
OTB50     0.401   0.548
OTB100    0.405   0.550

As per foolwood/benchmark_results, the original Caffe GOTURN yields AUC: 0.427 and Precision: 0.572 on OTB100. I feel this minor difference in performance is due to differences in how ImageNet models are trained in Caffe versus PyTorch (input normalization, layer-specific learning rates, etc.). In this repository, I followed the exact GOTURN hyperparameters, which may not be the best choices for PyTorch. I believe that with some hyperparameter tuning, GOTURN's performance can be reproduced with an end-to-end PyTorch model.
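
For context, one concrete difference of this kind (an assumption on my part, not verified against the original GOTURN training code): Caffe-trained ImageNet models typically expect BGR inputs in the [0, 255] range with a mean pixel subtracted, while torchvision models expect RGB inputs in [0, 1] normalized per channel:

    import torch
    from torchvision import transforms

    # torchvision-style ImageNet normalization: RGB, [0, 1] range
    pytorch_norm = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                        std=[0.229, 0.224, 0.225])

    def caffe_preprocess(img):
        """Caffe-style preprocessing for a CxHxW RGB float tensor in [0, 1]:
        flip to BGR, rescale to [0, 255], subtract the ImageNet mean pixel."""
        img = img[[2, 1, 0], :, :] * 255.0
        mean = torch.tensor([104.0, 117.0, 123.0]).view(3, 1, 1)
        return img - mean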

Feel free to contribute to this project, if you have any improvements!

Fast inference

In order to benchmark results for a tracking sequence or do fast inference, run the following command:

python3 test.py -w ../checkpoints/pytorch_goturn.pth.tar -d ../data/OTB/Man

Arguments:

-w / --model-weights: Path to a pretrained PyTorch model checkpoint.
-d / --data-directory: Path to a tracking sequence that follows the OTB format.

Training

Please follow the steps below for data preparation and training a pygoturn model from scratch.

Prepare training data

Navigate to pygoturn/data.

Either use the download.sh script to download all datasets automatically, or manually download them into pygoturn/data from the links below:

Once you have all the above files in pygoturn/data, use the pygoturn/data/setup.sh script to set up the datasets the way the training script src/train.py expects, OR follow the manual steps below:

  • Untar ILSVRC2014_DET_train.tar. You'll have a directory ILSVRC2014_DET_train containing multiple tar files.
  • First, delete all the tar files in the ILSVRC2014_DET_train directory whose names start with ILSVRC2013. This is an important step to reproduce the exact number of ImageNet training samples (239,283) described in the GOTURN paper.
  • Untar all the remaining tar files in ILSVRC2014_DET_train, then delete all *.tar files. Since there are several tar files to extract, you can use the data/untar.sh script: copy untar.sh to the ILSVRC2014_DET_train directory and run ./untar.sh. Delete untar.sh from data/ILSVRC2014_DET_train when you are done. (A Python equivalent is sketched after this list.)
  • Untar ILSVRC2014_DET_bbox_train.tgz.
  • Unzip alov300++_frames.zip and alov300++GT_txtFiles.zip.
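
If you prefer Python to the shell helper, a rough equivalent of the untar step might look like this (a sketch only; run it from inside data/ILSVRC2014_DET_train and check that the resulting layout matches what untar.sh produces):

    import glob
    import os
    import tarfile

    # extract each remaining archive into a directory named after it,
    # then delete the archive
    for path in sorted(glob.glob('*.tar')):
        with tarfile.open(path) as tar:
            tar.extractall(os.path.splitext(path)[0])
        os.remove(path)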

Once you finish data preparation, make sure that you have the following directories:

data/ILSVRC2014_DET_train
data/ILSVRC2014_DET_bbox_train
data/imagedata++
data/alov300++_rectangleAnnotation_full

Kick off training!

Navigate to pygoturn/src and run the following command to train GOTURN with default parameters:

python3 train.py

All GOTURN training parameters can be passed as arguments. See pygoturn/src/train.py for details on the available arguments.
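
To list all supported arguments (assuming train.py exposes a standard argparse interface):

python3 train.py --help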

Citation

If you find this code useful in your research, please cite:

@inproceedings{held2016learning,
  title={Learning to Track at 100 FPS with Deep Regression Networks},
  author={Held, David and Thrun, Sebastian and Savarese, Silvio},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2016}
}

Acknowledgements

  • I'd like to thank the original authors for releasing a clean C++ implementation [davheld/GOTURN]; it was heavily referenced to tune hyperparameters appropriately.
  • This Python Caffe implementation [nrupatunga/PY-GOTURN] was pretty useful for understanding GOTURN's batch-formation procedure. I borrowed some of its parts and adapted them to PyTorch.

License

MIT

pygoturn's Issues

Error when executing test.py, similar to issue #17

The code is the latest version.
The problem description and exception information are the same as in issue #17.
However, the code worked fine when using the pretrained model file downloaded from Google Drive, so I am wondering whether the number of batches is too small when training the model manually.
P.S. My num_batch is 50 when training the model manually with python train.py.

Model Load Error

Hi, I suggest changing test.py line 37 to self.model.load_state_dict(checkpoint); otherwise there will be an error while loading final_model.pth.

Traceback (most recent call last):
  File "demo.py", line 100, in <module>
    main(args)
  File "demo.py", line 70, in main
    device)
  File "/home/mspl/pygoturn/src/test.py", line 39, in __init__
    self.model.load_state_dict(checkpoint['state_dict'])

Error when executing test.py

I completed training just fine, but I'm getting this error when I try to do evaluation:

Traceback (most recent call last):
  File "src/test.py", line 121, in <module>
    main()
  File "src/test.py", line 118, in main
    tester.test()
  File "src/test.py", line 103, in test
    sample = self[i]
  File "src/test.py", line 59, in __getitem__
    sample = self.get_sample(idx)
  File "src/test.py", line 74, in get_sample
    prev_img = transform_prev({'image':prev, 'bb':prevbb})['image']
  File "/z/sw/packages/pytorch/py3.5/0.3.0/lib/python3.5/site-packages/torchvision-0.2.0-py3.5.egg/torchvision/transforms/transforms.py", line 42, in __call__
  File "/z/home/natlouis/pygoturn/src/helper.py", line 28, in __call__
    h, w = image.shape[:2]
ValueError: not enough values to unpack (expected 2, got 0)

It seems that when the transform applies CropPrev() and then Rescale(), the Rescale() object gets an empty image from the sample. But when I reverse the order of the operations (scale first, then crop), this error doesn't occur.

What are the supported versions of PyTorch and Python?
I'm using

  • Python 3.5
  • PyTorch 0.3.0
  • scipy 1.0.0
  • scikit-image 0.13.1

an error about the use of exp_lr_scheduler

Hello, thank you for your great work, but I found an error in the use of exp_lr_scheduler. When itr > 0 and itr % args.lr_decay_step == 0, the lr should become 0.1*lr. However, the lr keeps decreasing until done == 1, and the number of decay steps depends on the batch size: the larger the batch size, the more times the lr is decayed. So I moved the exp_lr_scheduler step after "if done:".
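
If I follow the report correctly, the fix is to step the scheduler once per outer training iteration instead of once per sub-batch. A self-contained toy sketch of that pattern (not the actual train.py code; names and sizes are illustrative):

    import torch

    model = torch.nn.Linear(4, 4)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100000, gamma=0.1)

    for itr in range(3):          # outer training iterations
        for _ in range(5):        # stand-in for the sub-batches of one iteration
            optimizer.zero_grad()
            loss = model(torch.randn(2, 4)).pow(2).mean()
            loss.backward()
            optimizer.step()
        # decay once per iteration (after "if done:"), not once per
        # sub-batch, so the schedule tracks itr rather than batch size
        scheduler.step()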

get unstable results when evaluating the model

Since we have an nn.Dropout() layer in our model, during testing we should set the model to evaluation mode with self.model.eval(). But in your code (test.py) there is no such call, so the Dropout layer is still active during testing, and we get unstable output even from the same model and the same input.
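
For reference, a generic PyTorch sketch of disabling dropout at test time (not a patch to test.py):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 4))
    x = torch.randn(1, 8)

    model.eval()             # switch Dropout (and BatchNorm) to inference mode
    with torch.no_grad():    # also skip gradient bookkeeping
        y1 = model(x)
        y2 = model(x)
    assert torch.equal(y1, y2)  # outputs are deterministic once dropout is off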

But the worst part is: when we do not call self.model.eval(), we get unstable but relatively correct output, while after calling self.model.eval(), the results get even worse; most of the time the result is totally wrong after a dozen frames, even though the first frame is initialized with the ground-truth box.

I don't get it. When I checked the original Caffe GOTURN, I found the same setting: the train and test phases use the same .prototxt file, which is equivalent to calling self.model.train() even at test time!
Any ideas? Thank you in advance.

Inference on KITTI

Hello there, I'm a student working on system-level characterization of deep-learning tasks, and I want to use GOTURN as an example for object tracking. I found your codebase very concise and easy to understand. However, I'm not very familiar with the object-tracking task, so I want to know:

  1. Can your pretrained model run directly on KITTI tracking data without fine-tuning? I don't need the inference accuracy to be optimal, as long as it works reasonably well.
  2. If the answer to question 1 is yes, I guess all I need to do is modify test.py to make it work on a non-OTB dataset, right?
  3. If not, how hard do you think it would be to retrain or fine-tune your pretrained model on KITTI monocular tracking? A rough estimate is perfectly fine; I just want to get an idea of the difficulty.

Thank you for the code and any help you can provide!

Training content

Can you please tell me what training images (and of what size) the pretrained model was trained on?

Code cleanup

  • Remove extra whitespace and blank lines.
  • Add comments wherever required.
  • Use a minimum number of variables in functions.

[Bug] Generate batch sample function

In train.py, we generate one raw sample and then use the motion model to generate 10 examples. But when the raw sample comes from a real video pair, we should not use our motion model to generate the 10 examples.

About helper.py

Hello @amoudgl, I have been reading your code line by line recently, and I am having some trouble with the helper.py file. I'm a bit confused about the meaning of its functions and variables. From the paper and supplementary material, I think I have completely understood the principle of GOTURN, but I'm confused because I don't think the image-preprocessing logic should be this complicated.
So, could you explain it a bit? Thank you very much!

help on annotation of Alov300++

Hi,
thank you for your code.
I want to train on a custom video. I first applied the annotations to sample images of the ALOV dataset by drawing the rectangles, but the rectangles are not correct. Does the frame number in the annotation correspond to the image or not?
(image with corresponding annotation: https://ibb.co/YjZGD26)

In other words, how can I annotate my own video frames?

Thanks

pretrained model

The pretrained model file seems to be damaged.
Could you fix it? Thanks.

The loss does not drop any more on the GOT-10k dataset?

Hi, I tried this code and trained it on 1k videos randomly selected from the GOT-10k dataset. The loss dropped normally from about 800+ to 100+ (over about 98,500 iterations). However, the loss does not drop any more, as shown below:
    [training goturn2.0 joint conv+fc] step = 98534/500000, loss = 173.214868, time = 1.017638
    [training goturn2.0 joint conv+fc] step = 98535/500000, loss = 185.998779, time = 1.318390
    [training goturn2.0 joint conv+fc] step = 98536/500000, loss = 201.093970, time = 1.261940
    [training goturn2.0 joint conv+fc] step = 98537/500000, loss = 162.243616, time = 1.298599
    [training goturn2.0 joint conv+fc] step = 98538/500000, loss = 189.368579, time = 1.428025
    [training goturn2.0 joint conv+fc] step = 98539/500000, loss = 186.761877, time = 1.108085
    [training goturn2.0 joint conv+fc] step = 98540/500000, loss = 190.769653, time = 1.227486
    [training goturn2.0 joint conv+fc] step = 98541/500000, loss = 160.548572, time = 1.023933
    [training goturn2.0 joint conv+fc] step = 98542/500000, loss = 166.954614, time = 1.313365
    [training goturn2.0 joint conv+fc] step = 98543/500000, loss = 158.464966, time = 1.047702
    [training goturn2.0 joint conv+fc] step = 98544/500000, loss = 190.491577, time = 1.230863
    [training goturn2.0 joint conv+fc] step = 98545/500000, loss = 190.102234, time = 0.995957
    [training goturn2.0 joint conv+fc] step = 98546/500000, loss = 185.101526, time = 1.247413
    [training goturn2.0 joint conv+fc] step = 98547/500000, loss = 177.304321, time = 1.103242
    [training goturn2.0 joint conv+fc] step = 98548/500000, loss = 162.095325, time = 1.248142
    [training goturn2.0 joint conv+fc] step = 98549/500000, loss = 155.143604, time = 1.182452
    [training goturn2.0 joint conv+fc] step = 98550/500000, loss = 215.679712, time = 1.089420
    [training goturn2.0 joint conv+fc] step = 98551/500000, loss = 172.392944, time = 1.197206
    [training goturn2.0 joint conv+fc] step = 98552/500000, loss = 194.520471, time = 0.962296
    [training goturn2.0 joint conv+fc] step = 98553/500000, loss = 213.370923, time = 1.385756
    [training goturn2.0 joint conv+fc] step = 98554/500000, loss = 208.281494, time = 1.098781
    [training goturn2.0 joint conv+fc] step = 98555/500000, loss = 154.950000, time = 1.852570
    [training goturn2.0 joint conv+fc] step = 98556/500000, loss = 172.599121, time = 0.953801
    [training goturn2.0 joint conv+fc] step = 98557/500000, loss = 152.899561, time = 1.244250
    [training goturn2.0 joint conv+fc] step = 98558/500000, loss = 177.016675, time = 1.057501

So, have you seen similar situations before? How can I get the loss to drop further? At the current stage, when I run the tracker, the tracking results are really bad even though it is fast. Looking forward to your reply. Thanks.

The loss explodes

I am trying to train GOTURN, but the loss explodes.

Is it a version issue?

ImageNet training

Add ImageNet training to GOTURN.

  • Add ImageNet loader to datasets.py.
  • Filter the ImageNet dataset by removing images with objects covering more than 66% of the image area.
  • Implement random cropping (motion-smoothness model) on the fly; see the sketch after this list.
  • Update train.py with ImageNet training.
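
For the motion-smoothness model, the GOTURN paper samples scale and translation perturbations from Laplace distributions; below is a minimal sketch of such a crop jitter (the b parameters follow the paper, but the exact values and clipping used in this repo should be taken from src/):

    import numpy as np

    def perturb_box(x1, y1, x2, y2, scale_b=1.0 / 15, shift_b=1.0 / 5, rng=np.random):
        """Jitter a bounding box with Laplace-distributed scale and shift,
        mimicking GOTURN's motion-smoothness model."""
        w, h = x2 - x1, y2 - y1
        cx, cy = x1 + w / 2.0, y1 + h / 2.0
        # scale factors ~ Laplace(mean=1), clipped to stay positive
        new_w = w * max(0.1, rng.laplace(1.0, scale_b))
        new_h = h * max(0.1, rng.laplace(1.0, scale_b))
        # center shifts ~ Laplace(mean=0), relative to box size
        new_cx = cx + w * rng.laplace(0.0, shift_b)
        new_cy = cy + h * rng.laplace(0.0, shift_b)
        return (new_cx - new_w / 2.0, new_cy - new_h / 2.0,
                new_cx + new_w / 2.0, new_cy + new_h / 2.0)

    jittered = perturb_box(50, 50, 150, 150)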

the choice of loss function and learning rate

Hi guys, have you ever tried to figure out how to make the original Caffe L1 loss really work in PyTorch?

Part 1:
In my experiments, the following two loss functions give absolutely different results:
1. loss_fn = torch.nn.L1Loss(size_average=False)
   bad results, using lr=1e-5 (see Part 2 below)
2. loss_fn = torch.nn.SmoothL1Loss(size_average=True)
   relatively good results, using lr=5e-3, since the loss scale is roughly 1:100 compared with the L1 loss above

Part 2:
One more thing: I think your learning rate is not right. In the original GOTURN, base_lr: 0.000001 (i.e., 1e-6) is correct, but in the corresponding tracker.prototxt file, each learned fc layer has parameters like:

name: "fc6-new"
type: "InnerProduct"
bottom: "pool5_concat"
top: "fc6"
param {
  lr_mult: 10
  decay_mult: 1
}

which means the effective lr for the fc layer is base_lr * lr_mult = 1e-5. So the lr for the fc layers should be set to 1e-5 in our PyTorch code.
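
For what it's worth, Caffe's lr_mult can be mirrored in PyTorch with per-parameter-group learning rates. A sketch using torchvision's AlexNet as a stand-in for the CaffeNet-style backbone (not the actual pygoturn model):

    import torch.optim as optim
    from torchvision import models

    model = models.alexnet(pretrained=True)  # illustrative backbone
    base_lr = 1e-6
    optimizer = optim.SGD(
        [{'params': model.features.parameters(), 'lr': base_lr},          # conv: base_lr
         {'params': model.classifier.parameters(), 'lr': base_lr * 10}],  # fc: 10x, like lr_mult: 10
        lr=base_lr, momentum=0.9, weight_decay=0.0005)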

So, as far as I can see, two problems remain:
1. Should we use a better loss function like SmoothL1Loss?
2. Have you reproduced the original GOTURN results using this code? How? And what's the best learning-rate schedule?

bounding box recenter and uncenter?

    def uncenter(self, raw_image, search_location, edge_spacing_x, edge_spacing_y):
        # Map this box from search-patch coordinates back into raw-image
        # coordinates, clamping to the image boundaries.
        self.x1 = max(0.0, self.x1 + search_location.x1 - edge_spacing_x)
        self.y1 = max(0.0, self.y1 + search_location.y1 - edge_spacing_y)
        self.x2 = min(raw_image.shape[1], self.x2 + search_location.x1 - edge_spacing_x)
        self.y2 = min(raw_image.shape[0], self.y2 + search_location.y1 - edge_spacing_y)

    def recenter(self, search_loc, edge_spacing_x, edge_spacing_y, bbox_gt_recentered):
        # Inverse operation: express this box relative to the search patch
        # (the crop around the previous target location), offset by the
        # padding (edge spacing) added when the crop extends past the
        # image border.
        bbox_gt_recentered.x1 = self.x1 - search_loc.x1 + edge_spacing_x
        bbox_gt_recentered.y1 = self.y1 - search_loc.y1 + edge_spacing_y
        bbox_gt_recentered.x2 = self.x2 - search_loc.x1 + edge_spacing_x
        bbox_gt_recentered.y2 = self.y2 - search_loc.y1 + edge_spacing_y

As mentioned above, I don't understand the meaning of the recenter and uncenter operations. Could you explain them?
