
rt-mdnet's Introduction

RT-MDNet: Real-Time Multi-Domain Convolutional Neural Network Tracker

Created by Ilchae Jung, Jeany Son, Mooyeol Baek, and Bohyung Han

Introduction

RT-MDNet is the real-time extension of MDNet and is a state-of-the-art real-time tracker. A detailed description of the system is provided on our project page and in our paper.

Citation

If you are using this code in a publication, please cite our paper:

@InProceedings{rtmdnet,
author = {Jung, Ilchae and Son, Jeany and Baek, Mooyeol and Han, Bohyung},
title = {Real-Time MDNet},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {Sept},
year = {2018}
}

System Requirements

This code is tested on 64 bit Linux (Ubuntu 16.04 LTS).

Prerequisites

1. PyTorch (>= 0.2.1)
2. For GPU support, a GPU (~2GB of memory for testing) and the CUDA toolkit
3. The training dataset (ImageNet-Vid), if needed

Online Tracking

Pretrained Model and Results

If you only want to run the tracker, you can use the pretrained model: RT-MDNet-ImageNet-pretrained. Results from the pretrained model are also provided here.

Demo

Run 'Run.py'.

Learning RT-MDNet

Preparing Datasets

1. After downloading the ImageNet-Vid dataset, run 'modules/prepro_data_imagenet.py' to parse the metadata; this generates 'imagenet_refine.pkl'.
2. Set the path of 'imagenet_refine.pkl' in 'train_mrcnn.py'.

Demo

Run 'train_mrcnn.py' after tuning the hyper-parameters to suit the capacity of your system.

rt-mdnet's People

Contributors

ilchaejung


rt-mdnet's Issues

samples2maskroi

I cannot understand this code. Why is the receptive field subtracted from x2 and y2? Could you explain the reason in more detail?
rois[:, 0] *= cur_resize_ratio[0]
rois[:, 1] *= cur_resize_ratio[1]
rois[:, 2] = np.maximum(rois[:,0]+1,rois[:, 2]*cur_resize_ratio[0] - receptive_field)
rois[:, 3] = np.maximum(rois[:,1]+1,rois[:, 3]*cur_resize_ratio[1] - receptive_field)
why not like this:
rois[:, 0] *= cur_resize_ratio[0]
rois[:, 1] *= cur_resize_ratio[1]
rois[:, 2] *= cur_resize_ratio[0]
rois[:, 3] *= cur_resize_ratio[1]
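A minimal numeric sketch of what this mapping does (my reading, not an official answer): `cur_resize_ratio` scales image coordinates onto the conv feature map, the receptive-field term pulls the right/bottom corner inward, and the `np.maximum(... + 1, ...)` guard keeps every ROI at least one cell wide and tall. All numbers below are made up for illustration.

```python
import numpy as np

# Illustrative values only; the real ones come from the network configuration.
cur_resize_ratio = np.array([0.125, 0.125])  # image -> conv feature-map scale
receptive_field = 3.0                        # receptive-field extent in feature cells

rois = np.array([[40., 40., 200., 200.]])    # [x1, y1, x2, y2] in image coords

rois[:, 0] *= cur_resize_ratio[0]            # x1 -> 5
rois[:, 1] *= cur_resize_ratio[1]            # y1 -> 5
# x2/y2 shrink by the receptive field (25 - 3 = 22 here); the max(..., x1 + 1)
# guard keeps degenerate boxes at least one cell wide/tall.
rois[:, 2] = np.maximum(rois[:, 0] + 1, rois[:, 2] * cur_resize_ratio[0] - receptive_field)
rois[:, 3] = np.maximum(rois[:, 1] + 1, rois[:, 3] * cur_resize_ratio[1] - receptive_field)
assert rois.tolist() == [[5.0, 5.0, 22.0, 22.0]]
```

Scaling all four coordinates uniformly, as in the second snippet, would instead pool over cells whose receptive fields spill past the object boundary.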

about padding

padded_x1 = (neg_examples[:,0]-neg_examples[:,2]*(opts['padding']-1.)/2.).min()
padded_y1 = (neg_examples[:,1]-neg_examples[:,3]*(opts['padding']-1.)/2.).min()
padded_x2 = (neg_examples[:,0]+neg_examples[:,2]*(opts['padding']+1.)/2.).max()
padded_y2 = (neg_examples[:,1]+neg_examples[:,3]*(opts['padding']+1.)/2.).max()

I don't understand why padding is calculated like this. The neg_examples bbox is [x, y, w, h]. Why x - w*(padding-1)/2?
What does it mean? Can anyone explain this to me?
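For what it's worth, the formula is x - w*(padding-1)/2, not (x - w)*0.1: with a padding ratio of, say, 1.2 (an assumed value; check the options files), each side of the box gains w*(padding-1)/2 = 0.1*w, so the crop width becomes padding*w. A small sketch:

```python
import numpy as np

padding = 1.2  # assumed value of opts['padding']; check the options files
neg_examples = np.array([[100., 50., 40., 20.]])  # [x, y, w, h], top-left corner

padded_x1 = (neg_examples[:, 0] - neg_examples[:, 2] * (padding - 1.) / 2.).min()
padded_x2 = (neg_examples[:, 0] + neg_examples[:, 2] * (padding + 1.) / 2.).max()

# Each side gains w*(padding-1)/2 = 4, so the crop width is padding*w = 48.
print(padded_x1, padded_x2)  # ~96.0 and ~144.0
```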

About receptive_field

rois[:, 0] *= cur_resize_ratio[0]
rois[:, 1] *= cur_resize_ratio[1]
rois[:, 2] = np.maximum(rois[:,0]+1,rois[:, 2]*cur_resize_ratio[0] - receptive_field)
rois[:, 3] = np.maximum(rois[:,1]+1,rois[:, 3]*cur_resize_ratio[1] - receptive_field)

I have a question about receptive_field: why is receptive_field subtracted from rois[:, 2]*cur_resize_ratio[0]? What is the meaning of receptive_field? Can anyone answer me?

Any plan to support PyTorch 1.0?

Great work, thanks very much!

Any plan to upgrade to PyTorch 1.0?

Any suggestions if I upgrade to PyTorch 1.0 myself? I will be happy to share the result once done.

run script not working

/neural-networks/RT-MDNet$ ./Run.py
from: can't read /var/mail/os.path
from: can't read /var/mail/tracker
./Run.py: line 13: syntax error near unexpected token `('
./Run.py: line 13: `def genConfig(seq_path, set_type):'
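The "/var/mail" messages mean the shell, not Python, executed the script: it interpreted `from ... import ...` as the Unix mail utility `from`, presumably because Run.py lacks a shebang line. A sketch of the usual fixes:

```shell
# Run the script through Python explicitly:
python Run.py

# Or give Run.py a shebang as its very first line:
#   #!/usr/bin/env python
# then mark it executable and retry:
chmod +x Run.py
./Run.py
```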

Cannot display?

When I run "python Run.py -visualize", it shows this error:

Traceback (most recent call last):
File "Run.py", line 119, in
iou_result, result_bb, fps, result_nobb = run_mdnet(img_list, gt[0], gt, seq = seq, display=opts['visualize'])
File "/home/zhulishun/tracking/RT-MDNet-master/tracker.py", line 422, in run_mdnet
im = ax.imshow(cur_image, aspect='normal')
File "/home/zhulishun/anaconda2/envs/rtmdnet/lib/python2.7/site-packages/matplotlib/__init__.py", line 1867, in inner
return func(ax, *args, **kwargs)
File "/home/zhulishun/anaconda2/envs/rtmdnet/lib/python2.7/site-packages/matplotlib/axes/_axes.py", line 5496, in imshow
self.set_aspect(aspect)
File "/home/zhulishun/anaconda2/envs/rtmdnet/lib/python2.7/site-packages/matplotlib/axes/_base.py", line 1373, in set_aspect
aspect = float(aspect) # raise ValueError if necessary
ValueError: could not convert string to float: normal
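The `aspect='normal'` spelling was removed from matplotlib around version 2.0; editing tracker.py to pass `aspect='auto'` is the usual fix. A minimal repro of the working call:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this demo
import matplotlib.pyplot as plt
import numpy as np

cur_image = np.zeros((10, 10, 3), dtype=np.uint8)
fig, ax = plt.subplots()
# 'normal' was dropped in matplotlib 2.0; 'auto' is the surviving spelling.
im = ax.imshow(cur_image, aspect='auto')
assert ax.get_aspect() == 'auto'
```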

about GPU memory size

Thanks for your work. My GPU has 2GB, but it still shows out of memory. I also tried to run it on the CPU by changing the GPU setting to False, but it does not work.

about the randomness

I think the randomness only comes from SampleGenerator, and only np.random is used in SampleGenerator. I have fixed the random seeds of numpy, random, and pytorch, but I cannot get the same result on the same video.
I really wonder where the randomness comes from.
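One frequent extra source of nondeterminism on GPU, beyond the Python-side seeds, is cuDNN's autotuned kernel selection. A sketch of pinning everything down, assuming PyTorch (I cannot promise this covers every nondeterministic op in this codebase):

```python
import random

import numpy as np
import torch

def seed_everything(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    # cuDNN may pick different (nondeterministic) kernels per run by default.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(0)
a = np.random.rand()
seed_everything(0)
assert a == np.random.rand()  # identical draws after re-seeding
```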

The padding operation discussion

Hi, thanks for releasing the excellent work.

However, there are several points that I cannot figure out.

First, regarding modules/pretrain_opts.py and options.py: there are parameters like padding_ratio and jitter. I also found in https://github.com/IlchaeJung/RT-MDNet/blob/master/modules/data_prov.py that an extra padding area is computed to get a larger image, jitter is used to scale this image, and finally the positive and negative regions are cropped. These operations are also applied during online tracking.

Is this just a means of augmentation? Or is it because MDNet's conv layers have no padding, so you add some extra padding to enlarge the original image? I cannot figure it out, and the paper doesn't seem to explain it. Can you please explain why these padding and jitter operations are done?

Thank you very much!

about data pre-processing

I noticed you also provide scripts to pre-process datasets like VOT/OTB, but some files seem to be missing, such as otb-vot15.txt or vot-otb.txt. I sincerely hope for your reply!

visual tracking result

This is my result:
'fps': {'Basketball': 19.858671676184922, 'Baby_ce': 22.22692606990516}}
I want to know how to put RT-MDNet into OTB for evaluation.
Does this program have a visual tracking result?

about ROI align

Thank you for releasing this excellent work.
I can successfully run this code on a 1080 TI GPU, while when I run the code using my K80 GPU, the error "cudaCheckError() failed : invalid device function" occurs.
I think the error may be caused by the ROI align module. I wonder how to recompile the ROI align module so that I can run it on different GPU devices? Thank you.
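"invalid device function" typically means the compiled ROI-Align kernels target a different compute capability than the running GPU (a K80 is sm_37, a 1080 Ti is sm_61). A hedged sketch of a rebuild; the build-script name and layout are assumptions, so check the repo's actual files:

```shell
# Assumed layout: the CUDA ROI-Align sources live under modules/roi_align
# and are built by a script (often make.sh -- check your checkout).
cd modules/roi_align
# Point nvcc at the K80's compute capability, e.g. -arch=sm_37, or add a
# -gencode line per target GPU, inside the build script. Then rebuild:
rm -rf _ext        # discard binaries compiled for the old GPU
bash make.sh       # hypothetical script name; use the repo's actual build step
```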

License

Hi,
Which license does this repository use?

no use of 'frame_interval'

Thank you for your work!
In train_mrcnn.py, what is the meaning of the variable frame_interval? It seems unused.

The results of OTB2015

Thank you for releasing this excellent work.
I tested your model on OTB2015 and the result was 0.632, which is lower than the 0.650 in your paper. I used the default parameter settings of your code. Could you please tell me how to resolve this discrepancy? Thank you very much.

The results of OTB2015 and training time

Thank you for releasing this excellent work.
I have two questions to ask you.
1. I tested your model on OTB2015 and the result was 0.632, which is lower than the 0.650 in your paper. I used the default parameter settings of your code. Could you please tell me how to resolve this discrepancy?
2. I train ImageNet-Vid dataset on a GeForce RTX 2080, how long will it take?
Thank you very much.

Error during code execution

1. FileNotFoundError: [Errno 2] No such file or directory: './models/rt-mdnet.pth'
Hello, I did not find the rt-mdnet.pth file in the code. How should I solve this problem?
2. I want to run this code on the CPU; can it be executed on the CPU? Because of the 'ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory' shown in my attached screenshot, I commented out those import statements. Does that affect running this code on the CPU?
Thank you very much.

How to start?

1. Where should I put the pretrained model? Your instructions are too brief to understand.

Put it in './modules'.

the GPU memory

If the target is small, the cropped image can become very large (e.g. 6000×7000). My GPU's 12GB of memory is not enough. Is there any way to fix this? Maybe there is a memory leak when opts['jitter'] is True.

About feature size

I found that the size of the conv3 output feature map is not strictly one eighth of the input image. Will this affect RoIAlign when extracting sample features?
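That is expected when the conv layers use no padding: the cumulative stride is 8, but each layer also trims its borders, so the output is smaller than input/8. A sketch of the arithmetic, with kernel/stride/dilation values loosely modeled on a VGG-M-style backbone (assumptions; verify against the repo's model definition):

```python
# Assumed layer hyper-parameters, loosely VGG-M-like; the real values are in
# the repo's model definition and may differ.
def conv_out(n, k, s, d=1):
    # Output length of an unpadded conv/pool layer (kernel k, stride s, dilation d).
    return (n - d * (k - 1) - 1) // s + 1

def conv3_size(n):
    n = conv_out(n, 7, 2)        # conv1: 7x7, stride 2
    n = conv_out(n, 3, 2)        # max pool: 3x3, stride 2
    n = conv_out(n, 5, 2)        # conv2: 5x5, stride 2
    n = conv_out(n, 3, 1, d=3)   # conv3: 3x3, stride 1, dilation 3
    return n

for size in (107, 224, 512):
    # Cumulative stride is 8, yet border trimming makes the output < size // 8.
    print(size, conv3_size(size), size // 8)
```

This is why ROI coordinates need the kind of offset correction discussed in the samples2maskroi issue above, rather than a pure divide-by-8.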

ImportError: ./modules/roi_align/_ext/roi_align/_roi_align.so: undefined symbol: state

I get the following error:

Traceback (most recent call last):
  File "Run.py", line 3, in <module>
    from tracker import *
  File "/my_dummy_path/RT-MDNet/tracker.py", line 15, in <module>
    from data_prov import *
  File "./modules/data_prov.py", line 18, in <module>
    from img_cropper import *
  File "./modules/img_cropper.py", line 3, in <module>
    from roi_align.modules.roi_align import RoIAlign
  File "./modules/roi_align/modules/roi_align.py", line 3, in <module>
    from ..functions.roi_align import RoIAlignFunction, RoIAlignAdaFunction, RoIAlignDenseAdaFunction
  File "./modules/roi_align/functions/roi_align.py", line 3, in <module>
    from .._ext import roi_align
  File "./modules/roi_align/_ext/roi_align/__init__.py", line 3, in <module>
    from ._roi_align import lib as _lib, ffi as _ffi
ImportError: ./modules/roi_align/_ext/roi_align/_roi_align.so: undefined symbol: state

How to solve it?
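"undefined symbol: state" usually indicates the cffi extension was compiled against a different PyTorch than the one importing it (the old THC "state" symbols changed between releases). A hedged rebuild sketch; the build-script name is an assumption:

```shell
# Rebuild the extension inside the environment that will import it, so the
# binary matches the installed PyTorch's ABI.
python -c "import torch; print(torch.__version__)"   # confirm the active PyTorch
cd modules/roi_align
rm -rf _ext        # drop the stale _roi_align.so
bash make.sh       # hypothetical build-script name; use the repo's own step
```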

about extra_pos_feats and pos_feats

I don't understand why the training features are extracted twice in the first frame for fine-tuning. extra_pos_feats and pos_feats are obtained in almost the same way.

About the Training loss

ImageNet-Vid is too big, so I decided to use the VOT dataset to train the model.
Would you mind telling me the specific values of 'Mean precision & Inter loss' during your training process (using ImageNet-Vid)?

only give first frame groundtruth

How do I use this on a video recorded by myself? I only want to give the ground truth of the first frame; how can I do that? Even a tip would be appreciated. Thank you very much.
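A sketch of one way to do this, inferred from how Run.py invokes run_mdnet(img_list, gt[0], gt, ...) elsewhere on this page: if I read that call correctly, later gt rows are only used for IoU reporting, so repeating the first-frame box should be enough. All paths and numbers here are hypothetical.

```python
import os
import numpy as np

seq_dir = 'my_video/imgs'  # hypothetical folder of frames extracted from your video
if os.path.isdir(seq_dir):
    img_list = sorted(os.path.join(seq_dir, f) for f in os.listdir(seq_dir)
                      if f.lower().endswith(('.jpg', '.png')))
else:
    img_list = ['frame%04d.jpg' % i for i in range(5)]  # placeholder for this demo

init_bbox = np.array([100., 80., 50., 60.])   # [x, y, w, h] of your first-frame annotation
gt = np.tile(init_bbox, (len(img_list), 1))   # dummy gt: only gt[0] drives initialization

# from tracker import run_mdnet
# iou_result, result_bb, fps, result_nobb = run_mdnet(
#     img_list, gt[0], gt, seq='my_video', display=False)
print(len(img_list), gt.shape)
```

The reported IoU will be meaningless with a dummy gt, but result_bb still holds the tracked boxes.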

the results of OTB2015,fix random seeds

I didn't change any parameters. Because of the randomness of the tracking results, I fixed the random seeds, but the final result is still lower than reported in the paper. What could be the problem?
My run gives PR = 0.847 and SR = 0.632.

Time cost for offline pretraining?

Thanks for your wonderful work!

I notice that in pretrain_opts, n_cycles is 1000.
And in train_mrcnn.py, K is the number of VID videos, which is 3499 (some are filtered out by preprocessing).

That is a really large amount of input for MDNet. I would like to ask for two values:

  • The offline training time for each iteration
  • The total time for the whole offline training

Thanks for your attention.

bb_result and bb_result_nobb

I converted 'result.npy' to 'result.mat' and found that the .mat file has 'bb_result' and 'bb_result_nobb'. What do these results mean? Can you help me learn more about this? I am really interested in it.

random results on OTB

I tried several OTB sequences, but the results of each run are different.

0 Bird1 : 0.201098956064 , total mIoU:0.201098956064, fps:27.1192838431
1 Box : 0.302790951364 , total mIoU:0.251944953714, fps:34.7540858495
2 Couple : 0.662352476224 , total mIoU:0.388747461218, fps:37.7562471058
3 Freeman4 : 0.666170731107 , total mIoU:0.45810327869, fps:39.7685989323
4 BlurBody : 0.722626621854 , total mIoU:0.511007947323, fps:39.1456625614
5 Jumping : 0.678226531945 , total mIoU:0.538877711426, fps:41.4720706029

0 Bird1 : 0.111411426274 , total mIoU:0.111411426274, fps:15.230079837
1 Box : 0.699401980322 , total mIoU:0.405406703298, fps:27.8460658587
2 Couple : 0.682630547749 , total mIoU:0.497814651448, fps:33.7325697033
3 Freeman4 : 0.611166113115 , total mIoU:0.526152516865, fps:36.3247190875
4 BlurBody : 0.718198220241 , total mIoU:0.56456165754, fps:36.3292326732
5 Jumping : 0.676521638398 , total mIoU:0.58322165435, fps:38.9468761315
