
siam-mot's Introduction

SiamMOT

SiamMOT is a region-based Siamese Multi-Object Tracking network that detects and associates object instances simultaneously.

SiamMOT: Siamese Multi-Object Tracking,
Bing Shuai, Andrew Berneshawi, Xinyu Li, Davide Modolo, Joseph Tighe, CVPR 2021

@inproceedings{shuai2021siammot,
  title={SiamMOT: Siamese Multi-Object Tracking},
  author={Shuai, Bing and Berneshawi, Andrew and Li, Xinyu and Modolo, Davide and Tighe, Joseph},
  booktitle={CVPR},
  year={2021}
}

Abstract

In this paper, we focus on improving online multi-object tracking (MOT). In particular, we introduce a region-based Siamese Multi-Object Tracking network, which we name SiamMOT. SiamMOT includes a motion model that estimates the instance’s movement between two frames such that detected instances are associated. To explore how the motion modelling affects its tracking capability, we present two variants of Siamese tracker, one that implicitly models motion and one that models it explicitly. We carry out extensive quantitative experiments on three different MOT datasets: MOT17, TAO-person and Caltech Roadside Pedestrians, showing the importance of motion modelling for MOT and the ability of SiamMOT to substantially outperform the state-of-the-art. Finally, SiamMOT also outperforms the winners of ACM MM’20 HiEve Grand Challenge on HiEve dataset. Moreover, SiamMOT is efficient, and it runs at 17 FPS for 720P videos on a single modern GPU.

Installation

Please refer to INSTALL.md for installation instructions.

Try SiamMOT demo

For demo purposes, we provide two tracking models -- one that tracks people (visible part) and one that jointly tracks people and vehicles (bus, car, truck, motorcycle, etc.). The person tracking model is trained on COCO-17 and CrowdHuman, while the person/vehicle model is trained on COCO-17 and VOC12. Both demo models currently use EMM as their motion model, which performs best among the alternatives.

In order to run the demo, use the following command:

python3 demos/demo.py --demo-video PATH_TO_DEMO_VIDEO --track-class person --dump-video True

Set --track-class to person or person_vehicle to use the person tracking or the person/vehicle tracking model, respectively.
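For example, to run the joint person/vehicle model on a local clip (the video path below is only a placeholder), the call would look like:

python3 demos/demo.py --demo-video demos/videos/street.mp4 --track-class person_vehicle --dump-video True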

The model is automatically downloaded to demos/models, and the visualized tracking output is saved to demos/demo_vis.

We also provide several pre-trained models in model_zoos.md that can be used for demo.

Dataset Evaluation and Training

After installation, follow the instructions in DATA.md to set up the datasets. As a sanity check, the models presented in model_zoos.md can be used for benchmark testing.

Before running training or inference, set up the configuration file properly. Then use the following command to train a model on an 8-GPU machine:

python3 -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/dla/DLA_34_FPN.yaml --train-dir PATH_TO_TRAIN_DIR --model-suffix MODEL_SUFFIX 

Use the following command to test a model on a single-GPU machine:

python3 tools/test_net.py --config-file configs/dla/DLA_34_FPN.yaml --output-dir PATH_TO_OUTPUT_DIR --model-file PATH_TO_MODEL_FILE --test-dataset DATASET_KEY --set val
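For example, assuming MOT17 is one of the dataset keys configured via DATA.md (the dataset key, output path, and model file name below are placeholders, not verified values):

python3 tools/test_net.py --config-file configs/dla/DLA_34_FPN_EMM_MOT17.yaml --output-dir ./test_results --model-file ./models/DLA_34_FPN_EMM_MOT17.pth --test-dataset MOT17 --set val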

Note: If you get the error ModuleNotFoundError: No module named 'siammot' when running from the git root, make sure your PYTHONPATH includes the current directory. You can add it by running export PYTHONPATH=.:$PYTHONPATH, or replace the '.' in that command with the absolute path to the git root to add the project explicitly.

Multi-GPU testing will be supported later.

Version

This is a preliminary version released specifically for the Airborne Object Tracking (AOT) workshop. The current version only supports EMM as the motion model.

  • [Update 06/02/2021] Refactored the configuration file
  • [Update 06/02/2021] Operator patching for amodal inference (needed for MOT17) and release of the MOT17 model
  • [Update 06/02/2021] Support for inference based on provided public detections

Stay tuned for more updates

License

This project is licensed under the Apache-2.0 License.


siam-mot's Issues

Can the siam-mot model be trained on a server with CUDA 10.2?

Hi~
I'm an undergraduate student. I rent a server with CUDA 10.2 and can't install the newer 11.0 version (I don't have sudo permissions).
Is it possible to train siam-mot on a CUDA 10.2 server?
Thank you!

KEYPOINT_HEAD

What do I need to do if I want to use the KEYPOINT_HEAD? Thank you so much.

Weights not available

Dear authors,
Thanks for making your code available; I am grateful for it.
Regarding the code base, I was wondering when the weights for the MOTChallenge-2017 Test (public detection) dataset, as referenced in the model zoo readme, will be available?

Thanks again.

Flush the track memory for different videos.

Hi,

I noticed that "reset_siammot_status" or "flush_memory" are the routines meant to clear the memory between different videos, but they are not used in the training process.

However, I have frames from different videos in my training set. In that case, may I know how to refresh the memory after training on one video before switching to another?

Many thanks in advance.
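A minimal sketch of the per-video reset being asked about, assuming the model exposes the hooks named above (the method name, its availability, and the call site are assumptions, not confirmed API):

for video_loader in per_video_loaders:  # hypothetical: one data loader per training video
    for images, targets in video_loader:
        result, loss_dict = model(images, targets)
        # ... backward pass and optimizer step ...
    # clear any cached track state before switching to the next video
    model.reset_siammot_status()  # or flush_memory(), per the routines named above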

Why is the input video needed in demos/demo_vis?

Hi everyone, I am new to CV and deep learning, and I use Colab as my machine. My input video is person_car.mp4, and the output video should be written to demos/demo_vis, but when I run demo.py it also expects person_car.mp4 inside demos/demo_vis, which is strange. Can somebody explain?
[screenshot]
This is my GitHub, which stores my Colab notebook.

Can SiamMOT be used for person search?

Hi, can I ask whether SiamMOT can be used for person search? Some other works like FairMOT have a Re-ID branch, but SiamMOT does not seem to have one.

About training the model

I have trained the model many times, but there is always a gap of about 3-4% on the MOT17 train set. I train the model on four GPUs with base lr 0.01, steps=(20000, 30000), 35k total iterations, and max_size_train of 1200.
I want to know whether max_size_train has a large impact on training. Also, can you give me some tips on training the model?
I appreciate your reply to my question. Thanks.

Install maskrcnn-benchmark in the siammot env or in a separate env?

I'm confused by the install instructions.
If I install everything except maskrcnn-benchmark as the instructions show, I get the following error when running the demo on a long video (I'm not sure whether it is caused by the missing maskrcnn-benchmark):

Traceback (most recent call last):
File "demos/demo.py", line 5, in
from demos.demo_inference import DemoInference
ModuleNotFoundError: No module named 'demos'

AOT Challenge - Dataset ingestion

Has anyone already ingested the AOT dataset for running this repo on it? Can someone please guide me or share a reference/link to your code?
TIA

Can I use the repo to detect a scoreboard?

Hello:
In a live volleyball game video there is a scoreboard on the TV screen; its position is fixed, but the score it shows changes quite often.
I want to know whether I can use this repo to detect the scoreboard. I need the scoreboard's position on the screen, and further, can I detect when the score in the scoreboard changes?
For example, at the beginning the score is 0-0; if one team scores, the score becomes 1-0.
Can I detect the position of the scoreboard and detect the score changes in it?
If yes, please give general instructions on how to do this.
Thanks,

How to output the video?

Thank you for sharing your work.
I followed the README.md instructions. When I run the following command:
python3 demos/demo.py --demo-video test.mp4 --track-class person --dump-video True

I can't see any generated result video in demos/demos_vis/.

I have tried many commands, such as:


python3 demos/demo.py --demo-video test.mp4 --track-class person --dump-video True --output-path demos/demos_vis


python3 demos/demo.py --demo-video test.mp4 --track-class person --dump-video False


python3 demos/demo.py --demo-video test.mp4 --track-class person --dump-video False --output-path demos/demos_vis

and so on.

Any suggestions? How do I output the result video? Thank you.

Error when training the model

Thanks for the great work.
When I train the model with the MOT17 dataset using the following command:

python3 -m torch.distributed.launch --nproc_per_node=2 tools/train_net.py --config-file configs/dla/DLA_34_FPN_EMM_MOT17.yaml --train-dir my_train_results/MOT17_TEST/ --model-suffix pth

I get the error:

Traceback (most recent call last):
File "tools/train_net.py", line 132, in
main()
File "tools/train_net.py", line 128, in main
train(cfg, train_dir, args.local_rank, args.distributed, logger)
File "tools/train_net.py", line 80, in train
logger, tensorboard_writer
File "./siammot/engine/trainer.py", line 51, in do_train
result, loss_dict = model(images, targets)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/_initialize.py", line 197, in new_fwd
**applier(kwargs, input_caster))
File "./siammot/modelling/rcnn.py", line 47, in forward
features = self.backbone(images.tensors)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "./siammot/modelling/backbone/dla.py", line 297, in forward
x5 = self.level5(x4)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "./siammot/modelling/backbone/dla.py", line 231, in forward
x1 = self.tree1(x, residual)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "./siammot/modelling/backbone/dla.py", line 54, in forward
out += residual
RuntimeError: The size of tensor a (47) must match the size of tensor b (46) at non-singleton dimension 3
Can anybody help me? Thank you!

What about Detectron2

Is there any reason why Detectron2 was not used instead of maskrcnn-benchmark, given that Detectron2 is the more up-to-date object detection framework?

IMM

Hello, if I want to implement an implicit motion model (IMM) based on your paper, what should I do?

Do I just need to add an IMM folder under /siam-mot/siammot/modelling/track_head and imitate the EMM files to implement the corresponding code? Can you give me a hint?

Demo output not getting created in demos/demo_vis

I get the following error when
"python3 demos/demo.py --demo-video data/person_car.mp4 --track-class person --dump-video True"
is run:
"Unrecognized option 'crf'.
Error splitting the argument list: Option not found"
Also, the output files are not being created in demos/demos_vis.

[screenshot]

We also tried uninstalling and reinstalling the ffmpeg package, but it doesn't help.

Test error

Hello.
I encountered this error when testing with the following command: “python3 -m tools.test_net --config-file configs/dla/DLA_34_FPN_EMM_MOT17.yaml --output-dir /home/yhb/下载/track_results --model-file /home/yhb/下载/DLA-34-FPN_EMM_crowdhuman_mot17 --test-dataset /media/yhb/Data/BaiduNetdiskDownload/MOT challenge/MOT17/test”
error: test_net.py: error: unrecognized arguments: challenge/MOT17/test
The path I passed to --test-dataset is the MOT17 test path downloaded from the official website and has not been modified.
What changes do I need to make to --test-dataset? Any help for a beginner would be appreciated.
Thank you very much!

Errors when running train_net.py

This is my train.py parameter configuration:
[screenshot]
This is my data structure:
[screenshot]
But I get this problem:
[screenshot]
I think something is wrong here, but I don't know how to solve it. Asking for help, thank you.
[screenshot]
[screenshot]

How to replace the default spatial matching with a new method?

I've run the code successfully, thank you for your great work!
Now I want to do some research, for example replacing the default spatial matching method with a newly proposed method of mine.
The question is: where should I modify the code in order to change the spatial matching method?
I've read the related code (./siammot/modeling/track_head) but found nothing related to the matching part.
Could you please tell me more about the spatial matching part? Maybe I missed some information.

Using PyTorch >= 1.5.0, I have compiled maskrcnn-benchmark, but when running siam-mot it can't import DFConv2d

I have solved the problem, thank you!

I am using Python 3.6, PyTorch 1.7.0, and CUDA 11.1.

According to INSTALL.md, I ran the commands below to change the two files, deform_conv_cuda.cu and deform_pool_cuda.cu:

cuda_dir="maskrcnn_benchmark/csrc/cuda"
perl -i -pe 's/AT_CHECK/TORCH_CHECK/' $cuda_dir/deform_pool_cuda.cu $cuda_dir/deform_conv_cuda.cu

Then, maskrcnn_benchmark compiles without any error, but when running siam-mot it reports the following error:
Traceback (most recent call last):
File "tools/test_net.py", line 13, in
from siammot.modelling.rcnn import build_siammot
File "/root/siam-mot/siammot/modelling/rcnn.py", line 12, in
from .backbone.backbone_ext import build_backbone
File "/root/siam-mot/siammot/modelling/backbone/backbone_ext.py", line 8, in
from .dla import dla
File "/root/siam-mot/siammot/modelling/backbone/dla.py", line 8, in
from maskrcnn_benchmark.layers import DFConv2d
ImportError: cannot import name 'DFConv2d'

I have checked the code of the maskrcnn-benchmark package installed in /opt/conda/lib/python3.6/site-packages. There is no DFConv2d.

Got 0 in motmetrics

I ran test_net.py on MOT17 and got the result below:

[screenshot]

What happened?
I didn't edit any code in this project.

Colab

Can you please add a Google Colab notebook for inference?

ffmpeg

When I run demo.py, I get the error ffmpeg._run.Error: ffprobe error (see stderr output for detail). How can I solve it? Thank you.

Class imbalance for RPN

I was trying to improve upon this model for the AOT challenge organised by AIcrowd. I am facing severe class imbalance in the RPN and box_head parts. Even though the pos/neg sampler is used with a positive ratio of 0.5, the implementation ends up with only 1-2 positive labels out of the 256 sampled anchors, and these anchors also have an extremely low IoU of 0.3-0.4. How has this issue been addressed in the baseline for the AOT challenge?
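For context, balanced samplers of this kind typically cap the number of sampled positives at however many anchors actually pass the foreground IoU threshold, so a 0.5 positive fraction cannot create positives that do not exist. A minimal sketch of that sampling logic (not the repository's exact implementation):

import torch

def sample_anchors(labels, batch_size=256, pos_fraction=0.5):
    # labels: per-anchor tensor with 1 = positive, 0 = negative
    positive = torch.nonzero(labels == 1).squeeze(1)
    negative = torch.nonzero(labels == 0).squeeze(1)

    num_pos = int(batch_size * pos_fraction)    # target of 128 positives...
    num_pos = min(positive.numel(), num_pos)    # ...capped by how many exist (often 1-2 here)
    num_neg = min(negative.numel(), batch_size - num_pos)  # rest of the batch is negatives

    pos_idx = positive[torch.randperm(positive.numel())[:num_pos]]
    neg_idx = negative[torch.randperm(negative.numel())[:num_neg]]
    return pos_idx, neg_idx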

Logic behind padding features

Can I get the reasoning behind padding the features with zeros when using poolers to extract features for search regions (SR)? Also, why isn't padding done while pooling for proposals/detections? Is there any intuition behind the padding value chosen?

Code references:

In track_core.py EMM forward function:

        features = self.track_utils.pad_feature(features)
        sr_features = self.feature_extractor(features, boxes, sr)

Also, I need an explanation for the following lines of code in the extend_bbox function in track_utils.py. Shouldn't the factor be just 2 instead of (self.search_expansion * 2.)?

        # todo: need to check the equation later
        min_w_ext = (self.min_search_wh - bbox_w) / (self.search_expansion * 2.)
        min_h_ext = (self.min_search_wh - bbox_h) / (self.search_expansion * 2.)

ModuleNotFoundError: No module named 'demos.demo_inference'

While running the following command on Google Colab:

!python /content/siam-mot/demos/demo.py --demo-video /content/drive/MyDrive/cars.mov --track-class person_vehicle --dump-video True

I get an error:
from demos.demo_inference import DemoInference
ModuleNotFoundError: No module named 'demos.demo_inference'

I have pip installed demos and all the other requirements, but I still have this issue.

How to increase the detector confidence?

Hello,
I ran the code on my own demo video and found that it detects a lot of false positives.
Is there a confidence threshold parameter that I could set to control that?
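For reference, SiamMOT builds on maskrcnn-benchmark, whose configs expose the detection score cut-off as MODEL.ROI_HEADS.SCORE_THRESH; whether the demo reads this exact key is an assumption here, so treat the snippet below as a sketch rather than a confirmed setting:

from maskrcnn_benchmark.config import cfg  # assumption: the project reuses this config object

cfg.merge_from_file("configs/dla/DLA_34_FPN.yaml")
# Keep only detections scoring above 0.5 instead of the typically low default (e.g. 0.05).
cfg.merge_from_list(["MODEL.ROI_HEADS.SCORE_THRESH", 0.5])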

Nan values for loss and accuracy in training and testing.

Given our capacity, we reduced training to 4 GPUs and kept the default learning rate of 0.02. After 40-60 iterations we started getting NaN loss values.
We then reduced the rate to 0.015 and retrained. Even then, over >200 iterations, sometimes it shows NaN loss values and sometimes it runs fine.
When testing a model trained with NaN loss values, all the accuracy values in the output table also came out as NaN.

[screenshot]

Only use the tracker

Hi, excellent work! But I have some questions.
If I already have bboxes for the video and I only want to use SiamMOT for tracking-by-detection, what should I do?
In particular, how do I run inference with my bboxes? And how do I train SiamMOT with my bboxes and then run inference?
Thank you.
(Maybe issue #5 is the same problem as mine.)

No such file or directory: 'ffprobe'

Hello, when I try to run the demo on my video with 'python3 demos/demo.py --demo-video PATH_TO_DEMO_VIDEO --track-class person --dump-video True' and the video path set correctly, I get the error: 'FileNotFoundError: [Errno 2] No such file or directory: 'ffprobe''.
I've tried many approaches but none of them work. What should I do?

KeyError: 'categories'

Hello, my problem is:

Traceback (most recent call last):
  File "tools/train_net.py", line 130, in <module>
    main()
  File "tools/train_net.py", line 126, in main
    train(cfg, train_dir, args.local_rank, args.distributed, logger)
  File "tools/train_net.py", line 69, in train
    start_iter=arguments["iteration"],
  File "/data/users/CHDHPC/2020224031/code/SiamMOT/siammot/data/build_train_data_loader.py", line 66, in build_train_data_loader
    dataset = build_dataset(cfg)
  File "/data/users/CHDHPC/2020224031/code/SiamMOT/siammot/data/build_train_data_loader.py", line 38, in build_dataset
    amodal=cfg.INPUT.AMODAL)
  File "/data/users/CHDHPC/2020224031/code/SiamMOT/siammot/data/image_dataset.py", line 43, in __init__
    self._det_classes = [c['name'] for c in self.dataset.loadCats(self.dataset.getCatIds())]
  File "/data/users/CHDHPC/2020224031/anaconda3/envs/FairMOT/lib/python3.6/site-packages/pycocotools/coco.py", line 168, in getCatIds
    cats = self.dataset['categories']
KeyError: 'categories'

Can you tell me what I should do?
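For reference, pycocotools expects a COCO-style annotation file with a top-level categories list, which is exactly the key the traceback reports as missing. A minimal illustration of that structure (all field values are placeholders):

annotations = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 1920, "height": 1080}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [100, 150, 50, 120], "area": 6000, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "person"}],  # the key the KeyError complains about
}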

Variable memory requirements

I have noticed that the memory requirements for the model change depending on whether the training starts from a freshly initialized model or a model initialized from a checkpoint.

I am training the model on an NVIDIA RTX 2080 Ti GPU, which provides 11 GB of memory. In order to start the training without running into a RuntimeError: CUDA error: out of memory exception, I need to set the number of video clips per batch to 3. More specifically, in terms of configuration settings:

SOLVER:
  VIDEO_CLIPS_PER_BATCH: 3

This produces a batch size of 6, since we have 2 random frames per clip, as given by the configuration below:

VIDEO:
  RANDOM_FRAMES_PER_CLIP: 2

However, if I restart the training from a previously stored checkpoint, the memory consumption decreases to such an extent that I can add one more video clip per batch without crashing due to insufficient memory. More concretely, my configuration allows the following:

SOLVER:
  VIDEO_CLIPS_PER_BATCH: 4

This does not seem to influence the model performance after training.

I have tried explicitly calling the garbage collector and emptying the CUDA cache using

import gc
import torch

gc.collect()
torch.cuda.empty_cache()

but to no avail.

My question is: what do you think might be causing this sort of memory leak? I have been working with this architecture for some time, yet I haven't found a reasonable explanation so far.

At this point, my pipeline involves two separate configurations. First, I run the training for 100 iterations, save the checkpoint, halt the training, and then restart it with a different configuration allowing a bigger batch size, and let it train as required. It is pretty cumbersome as well as highly unprofessional. I would like to understand the underlying cause.

Thank you for your input.
