MeMOTR

The official implementation of MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking, ICCV 2023.

Authors: Ruopeng Gao, Limin Wang.

MeMOTR

MeMOTR is a fully end-to-end memory-augmented multi-object tracker based on Transformer. We leverage long-term memory injection with a customized memory-attention layer, thus significantly improving association performance.

Dance Demo

News 🔥

  • 2024.05.09: We release MOTIP, a new perspective that regards the multi-object tracking task as an ID prediction problem 🔭.

  • 2024.02.21: We add the results on SportsMOT in our arXiv version (supplementary material). We would appreciate it if you could CITE our trackers in SportsMOT comparisons 📈.

  • 2023.12.24: We release the code, scripts and checkpoints on BDD100K 🚗.

  • 2023.12.13: We implement a Jupyter notebook to run our model on your own video 🎥.

  • 2023.11.07: We release the scripts and checkpoints on SportsMOT 🏀.

  • 2023.08.24: We release the scripts and checkpoints on DanceTrack 💃.

  • 2023.08.09: We release the main code. More configurations, scripts and checkpoints will be released soon 🔜.

Installation

conda create -n MeMOTR python=3.10  # create a virtual env
# We believe some Python 3.10 features are used, so an earlier interpreter may not work.
conda activate MeMOTR               # activate the env
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
# Our code is primarily developed and run on PyTorch 1.13.1,
# but it should also be compatible with slightly earlier versions (e.g., 1.12.1).
# However, a much older PyTorch may cause issues that need fixing,
# as we use some recently introduced features (e.g., ResNet50_Weights in torchvision).
conda install matplotlib pyyaml scipy tqdm tensorboard
pip install opencv-python
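
After installing, a quick sanity check (a minimal sketch; nothing here is specific to MeMOTR) confirms that PyTorch sees your GPU:

import torch, torchvision
print("torch:", torch.__version__)              # expect 1.13.1 with the commands above
print("torchvision:", torchvision.__version__)  # expect 0.14.1
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())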

You also need to compile the Deformable Attention CUDA ops:

# From https://github.com/fundamentalvision/Deformable-DETR
cd ./models/ops/
sh make.sh
# You can test the ops if needed:
python test.py
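
If make.sh finishes successfully, the compiled extension should be importable by name (a minimal check; MultiScaleDeformableAttention is the module name built by Deformable-DETR's setup.py, as the error logs in the issues further below also show):

import torch                            # must be importable first
import MultiScaleDeformableAttention    # the extension built by make.sh
print("Deformable Attention ops imported successfully")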

Data

You should put the unzipped MOT17 and CrowdHuman datasets into DATADIR/MOT17/images/ and DATADIR/CrowdHuman/images/, respectively, and then generate the ground-truth files by running the corresponding scripts: ./data/gen_mot17_gts.py and ./data/gen_crowdhuman_gts.py.

Finally, you should get the following dataset structure (a quick verification sketch follows the tree):

DATADIR/
  ├── DanceTrack/
  │ ├── train/
  │ ├── val/
  │ ├── test/
  │ ├── train_seqmap.txt
  │ ├── val_seqmap.txt
  │ └── test_seqmap.txt
  ├── SportsMOT/
  │ ├── train/
  │ ├── val/
  │ ├── test/
  │ ├── train_seqmap.txt
  │ ├── val_seqmap.txt
  │ └── test_seqmap.txt
  ├── MOT17/
  │ ├── images/
  │ │ ├── train/     # unzip from MOT17
  │ │ └── test/      # unzip from MOT17
  │ └── gts/
  │   └── train/     # generated by ./data/gen_mot17_gts.py
  └── CrowdHuman/
    ├── images/
    │ ├── train/     # unzip from CrowdHuman
    │ └── val/       # unzip from CrowdHuman
    └── gts/
      ├── train/     # generated by ./data/gen_crowdhuman_gts.py
      └── val/       # generated by ./data/gen_crowdhuman_gts.py
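
The sketch below (illustrative only; DATADIR is a placeholder for your actual data root) checks that the expected folders and seqmap files exist before you start training:

import os

DATADIR = "/path/to/DATADIR"  # placeholder: set to your data root
expected = [
    "DanceTrack/train", "DanceTrack/val", "DanceTrack/test",
    "DanceTrack/train_seqmap.txt", "DanceTrack/val_seqmap.txt", "DanceTrack/test_seqmap.txt",
    "SportsMOT/train", "SportsMOT/val", "SportsMOT/test",
    "MOT17/images/train", "MOT17/images/test", "MOT17/gts/train",
    "CrowdHuman/images/train", "CrowdHuman/images/val",
    "CrowdHuman/gts/train", "CrowdHuman/gts/val",
]
for rel in expected:
    status = "OK  " if os.path.exists(os.path.join(DATADIR, rel)) else "MISSING"
    print(status, rel)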

Pretrain

We initialize our model with the official DAB-Deformable-DETR (with the R50 backbone) weights pretrained on the COCO dataset; you can also download the checkpoint we used here. Then put the checkpoint at the root of this project directory.
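
To verify the download, you can load the checkpoint on CPU and inspect a few keys (a minimal sketch; the filename dab_deformable_detr.pth matches the PRETRAINED_MODEL entry in the config dump quoted in the issues below):

import torch

state = torch.load("dab_deformable_detr.pth", map_location="cpu")
weights = state.get("model", state)   # DETR-style checkpoints usually nest weights under "model"
print(len(weights), "tensors")
for name in list(weights)[:5]:
    print(name, tuple(weights[name].shape))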

Scripts on DanceTrack

Training

Train MeMOTR with 8 GPUs on DanceTrack (GPUs with >= 32 GB memory are recommended, such as V100-32GB):

python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --config-path ./configs/train_dancetrack.yaml --outputs-dir ./outputs/memotr_dancetrack/ --batch-size 1 --data-root <your data dir path>

If your GPU has less than 32 GB of memory, we also implement a memory-optimized version (enabled by the --use-checkpoint option), as discussed in the paper: we use gradient checkpointing to reduce the allocated GPU memory. The following training script takes only about 10 GB of GPU memory:

python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --config-path ./configs/train_dancetrack.yaml --outputs-dir ./outputs/memotr_dancetrack/ --batch-size 1 --data-root <your data dir path> --use-checkpoint
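
Gradient checkpointing trades compute for memory: intermediate activations are discarded in the forward pass and recomputed during backward. A minimal sketch of the underlying PyTorch mechanism (illustrative; not the repository's exact code):

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))
x = torch.randn(4, 256, requires_grad=True)

# Activations inside `block` are not stored; they are recomputed on backward.
y = checkpoint(block, x, use_reentrant=False)  # use_reentrant needs a recent PyTorch (see the issues below)
y.sum().backward()
print(x.grad.shape)  # gradients flow as usual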

Submit and Evaluation

You can use this script to evaluate the trained model on the DanceTrack val set:

python main.py --mode eval --data-root <your data dir path> --eval-mode specific --eval-model <filename of the checkpoint> --eval-dir ./outputs/memotr_dancetrack/ --eval-threads <your gpus num>

For submitting, you can use the following script:

python -m torch.distributed.run --nproc_per_node=8 main.py --mode submit --submit-dir ./outputs/memotr_dancetrack/ --submit-model <filename of the checkpoint> --use-distributed --data-root <your data dir path>

Besides, if you just want to evaluate or submit directly with our trained checkpoint, you can get the checkpoint we used in the paper here. Then put this checkpoint into ./outputs/memotr_dancetrack/ and run the above scripts.

Scripts on MOT17

Submit

For submitting, you can use the following scripts:

python -m torch.distributed.run --nproc_per_node=8 main.py --mode submit --config-path ./outputs/memotr_mot17/train/config.yaml --submit-dir ./outputs/memotr_mot17/ --submit-model <filename of the checkpoint> --use-distributed --data-root <your data dir path>

Also, you can directly download our trained checkpoint here. Then put it into ./outputs/memotr_mot17/ and run the above script to generate submission files for the MOT17 test set.

Scripts on SportsMOT and other datasets

You can replace the --config-path in the DanceTrack scripts, e.g., change ./configs/train_dancetrack.yaml to ./configs/train_sportsmot.yaml to train on SportsMOT.

Results

Multi-Object Tracking on the DanceTrack test set

Methods                     HOTA  DetA  AssA  Checkpoint
MeMOTR                      68.5  80.5  58.4  Google Drive
MeMOTR (Deformable DETR)    63.4  77.0  52.3  Google Drive

Multi-Object Tracking on the SportsMOT test set

For all experiments, we do not use extra data (like CrowdHuman) for training.

Methods                     HOTA  DetA  AssA  Checkpoint
MeMOTR                      70.0  83.1  59.1  Google Drive
MeMOTR (Deformable DETR)    68.8  82.0  57.8  Google Drive

Multi-Object Tracking on the MOT17 test set

Methods   HOTA  DetA  AssA  Checkpoint
MeMOTR    58.8  59.6  58.4  Google Drive

Multi-Category Multi-Object Tracking on the BDD100K val set

Methods   mTETA  mLocA  mAssocA  Checkpoint
MeMOTR    53.6   38.1   56.7     Google Drive

Contact

Citation

@InProceedings{MeMOTR,
    author    = {Gao, Ruopeng and Wang, Limin},
    title     = {{MeMOTR}: Long-Term Memory-Augmented Transformer for Multi-Object Tracking},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {9901-9910}
}

Acknowledgement

Our code builds upon Deformable DETR and DAB-Deformable-DETR (see the Installation and Pretrain sections above).

MeMOTR's People

Contributors

HELLORPG, wanglimin

MeMOTR's Issues

question about --use-checkpoint

When I don't use --use-checkpoint, the code runs, but rather slowly.
When I use --use-checkpoint with CHECKPOINT_LEVEL=2, I get the following error:
ValueError: Unexpected keyword arguments: use_reentrant
features, pos = checkpoint(self.backbone, frame, use_reentrant=False)
Jumping to the definition, def checkpoint(function, *args, **kwargs): this checkpoint function has no use_reentrant parameter.
When I remove use_reentrant=False, or use CHECKPOINT_LEVEL=3, I get the following error instead:

  File "main.py", line 120, in <module>
    main(config=merged_config)
  File "main.py", line 103, in main
    train(config=config)
  File "/cver/tcying/ytc/MeMOTR/train_engine.py", line 126, in train
    train_one_epoch(
  File "/cver/tcying/ytc/MeMOTR/train_engine.py", line 238, in train_one_epoch
    loss.backward()
  File "/cver/tcying/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/cver/tcying/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
  File "/cver/tcying/lib/python3.8/site-packages/torch/autograd/function.py", line 87, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
  File "/cver/tcying/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 138, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/cver/tcying/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 271 has been marked as ready twice. This means that multiple autograd engine  hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.

How can I solve this?
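
One possible workaround (a hedged sketch, not an official fix): the use_reentrant keyword only exists in newer PyTorch releases, so a small wrapper can pass it only when the installed torch.utils.checkpoint.checkpoint accepts it:

import inspect
from torch.utils import checkpoint as cp

def checkpoint_compat(function, *args):
    """Call torch's checkpointing, passing use_reentrant=False only when supported."""
    if "use_reentrant" in inspect.signature(cp.checkpoint).parameters:
        return cp.checkpoint(function, *args, use_reentrant=False)
    return cp.checkpoint(function, *args)

The second traceback ("Expected to mark a variable ready only once") is a known interaction between DDP and reentrant checkpointing, so upgrading PyTorch until use_reentrant=False is available is probably the cleaner path.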

MOT 17 Training

Can you please share details about the validation set you used to validate your method for the MOT17 dataset before submitting to the test server?

Error while training in distributed mode

Hello, I ran into an error when training the code in distributed mode. The error is as follows: "torch.distributed.elastic.multiprocessing.errors.ChildFailedError: main.py FAILED"

Any idea?

Thanks!

How to fine-tune on a custom VOC dataset?

Thanks for your amazing work. I have a dataset labeled in VOC format, with the following directory structure:

Dataset/
├── video1/
│   ├── img1
│   └── img1.xml
└── video2/
    ├── img1
    └── img1.xml

I have written a custom script to convert it into a COCO-style video dataset.
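
For reference, a minimal sketch of reading one VOC annotation file with the standard library (the tag names below follow the usual VOC layout; adapt them to your own XML schema):

import xml.etree.ElementTree as ET

def parse_voc_xml(path):
    """Return (label, xmin, ymin, xmax, ymax) tuples from one VOC annotation file."""
    root = ET.parse(path).getroot()
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((obj.findtext("name"),
                      float(bb.findtext("xmin")), float(bb.findtext("ymin")),
                      float(bb.findtext("xmax")), float(bb.findtext("ymax"))))
    return boxes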

MeMOTR evaluation on MOT17

I am trying to run the evaluation code on MOT17 using my own checkpoint for the MeMOTR model. However, every time I run the evaluation with the arguments below, the configuration file that gets loaded ends up pointing to the DanceTrack dataset.
Could you tell me how to change the arguments or the config file so that the evaluation is done on MOT17?

I saw there is a specific command in eval_engine that runs the evaluation on MOT17; worst case, I am thinking of running that command directly.

This is the command for evaluation:

!python main.py --mode eval --data-root /content/MeMOTR/DATADIR/ --eval-mode specific --eval-model /content/MeMOTR/outputs/memotr_mot17/checkpoint_59.pth  --eval-dir /content/MeMOTR/outputs/memotr_mot17/train --eval-threads 0

and this is the error log I get:

2024-06-07 05:18:43.709544: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-07 05:18:43.763543: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-07 05:18:43.763595: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-07 05:18:43.765557: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-07 05:18:43.774258: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-07 05:18:44.912419: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
===>  Running checkpoint '/content/MeMOTR/outputs/memotr_mot17/checkpoint_59.pth'
2024-06-07 05:18:49.724680: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-07 05:18:49.724733: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-07 05:18:49.725913: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-07 05:18:50.837143: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Configs: {'GIT_VERSION': None, 'MODE': 'submit', 'CONFIG_PATH': './configs/train_dancetrack.yaml', 'VISUALIZE': False, 'AVAILABLE_GPUS': '0,1,2,3,4,5,6,7', 'DEVICE': 'cuda', 'OUTPUTS_DIR': './outputs/pycharm/', 'USE_DISTRIBUTED': False, 'USE_CHECKPOINT': False, 'CHECKPOINT_LEVEL': 2, 'RESUME': None, 'RESUME_SCHEDULER': True, 'MULTI_CHECKPOINT': False, 'SUBMIT_DIR': '/content/MeMOTR/outputs/memotr_mot17/train', 'SUBMIT_MODEL': '/content/MeMOTR/outputs/memotr_mot17/checkpoint_59.pth', 'SUBMIT_DATA_SPLIT': 'val', 'DET_SCORE_THRESH': 0.5, 'TRACK_SCORE_THRESH': 0.5, 'RESULT_SCORE_THRESH': 0.5, 'MISS_TOLERANCE': 30, 'USE_MOTION': False, 'MOTION_MIN_LENGTH': 3, 'MOTION_MAX_LENGTH': 5, 'MOTION_LAMBDA': 0.5, 'EVAL_DIR': None, 'EVAL_MODE': 'specific', 'EVAL_MODEL': None, 'EVAL_PORT': None, 'EVAL_THREADS': 1, 'EVAL_DATA_SPLIT': 'val', 'DATASET': 'DanceTrack', 'USE_MOTSYNTH': None, 'USE_CROWDHUMAN': None, 'MOTSYNTH_RATE': None, 'DATA_ROOT': '/content/MeMOTR/DATADIR/', 'DATA_PATH': None, 'NUM_WORKERS': 4, 'BATCH_SIZE': 1, 'ACCUMULATION_STEPS': 1, 'COCO_SIZE': False, 'OVERFLOW_BBOX': False, 'REVERSE_CLIP': 0.0, 'BACKBONE': 'resnet50', 'HIDDEN_DIM': 256, 'FFN_DIM': 2048, 'NUM_FEATURE_LEVELS': 4, 'NUM_HEADS': 8, 'NUM_ENC_POINTS': 4, 'NUM_DEC_POINTS': 4, 'NUM_ENC_LAYERS': 6, 'NUM_DEC_LAYERS': 6, 'MERGE_DET_TRACK_LAYER': 1, 'ACTIVATION': 'ReLU', 'RETURN_INTER_DEC': True, 'EXTRA_TRACK_ATTN': False, 'AUX_LOSS': True, 'USE_DAB': True, 'UPDATE_THRESH': 0.5, 'LONG_MEMORY_LAMBDA': 0.01, 'PRETRAINED_MODEL': 'dab_deformable_detr.pth', 'SAMPLE_STEPS': [6, 10, 14], 'SAMPLE_LENGTHS': [2, 3, 4, 5], 'SAMPLE_MODES': ['random_interval'], 'SAMPLE_INTERVALS': [10], 'SEED': 42, 'EPOCHS': 20, 'ONLY_TRAIN_QUERY_UPDATER_AFTER': 20, 'DROPOUT': 0.0, 'NUM_DET_QUERIES': 300, 'TP_DROP_RATE': 0.0, 'FP_INSERT_RATE': 0.0, 'LR': 0.0002, 'LR_BACKBONE': 2e-05, 'LR_POINTS': 1e-05, 'WEIGHT_DECAY': 0.0005, 'CLIP_MAX_NORM': 0.1, 'LR_SCHEDULER': 'MultiStep', 'LR_DROP_RATE': 0.1, 'LR_DROP_MILESTONES': [12], 'MATCH_COST_CLASS': 2, 'MATCH_COST_BBOX': 5, 'MATCH_COST_GIOU': 2, 'LOSS_WEIGHT_FOCAL': 2, 'LOSS_WEIGHT_L1': 5, 'LOSS_WEIGHT_GIOU': 2, 'AUX_LOSS_WEIGHT': [1.0, 1.0, 1.0, 1.0, 1.0]}
Traceback (most recent call last):
  File "/content/MeMOTR/main.py", line 124, in <module>
    main(config=merged_config)
  File "/content/MeMOTR/main.py", line 109, in main
    submit(config=config)
  File "/content/MeMOTR/submit_engine.py", line 195, in submit
    train_config = yaml_to_dict(path=path.join(config["SUBMIT_DIR"], "train/config.yaml"))
  File "/content/MeMOTR/utils/utils.py", line 53, in yaml_to_dict
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/content/MeMOTR/outputs/memotr_mot17/train/train/config.yaml'
mv: cannot stat '/content/MeMOTR/outputs/memotr_mot17/train/val/tracker': No such file or directory
/content/MeMOTR/TrackEval/trackeval/eval.py:99: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  seq_key is not 'COMBINED_SEQ'}
/content/MeMOTR/TrackEval/trackeval/utils.py:137: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if (len(current_values) == len(keys)) and seq is not '':
/content/MeMOTR/TrackEval/trackeval/datasets/mot_challenge_2d_box.py:151: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if self.config["SEQMAP_FOLDER"] is 'None':

Eval Config:
USE_PARALLEL         : True                          
NUM_PARALLEL_CORES   : 8                             
BREAK_ON_ERROR       : True                          
RETURN_ON_ERROR      : False                         
LOG_ON_ERROR         : /content/MeMOTR/TrackEval/error_log.txt
PRINT_RESULTS        : True                          
PRINT_ONLY_COMBINED  : False                         
PRINT_CONFIG         : True                          
TIME_PROGRESS        : True                          
DISPLAY_LESS_PROGRESS : False                         
OUTPUT_SUMMARY       : True                          
OUTPUT_EMPTY_CLASSES : True                          
OUTPUT_DETAILED      : True                          
PLOT_CURVES          : False                         

MotChallenge2DBox Config:
PRINT_CONFIG         : True                          
GT_FOLDER            : /content/MeMOTR/DATADIR/DanceTrack/val
TRACKERS_FOLDER      : /content/MeMOTR/outputs/memotr_mot17/checkpoint_59_tracker
OUTPUT_FOLDER        : None                          
TRACKERS_TO_EVAL     : ['']                          
CLASSES_TO_EVAL      : ['pedestrian']                
BENCHMARK            : MOT17                         
SPLIT_TO_EVAL        : val                           
INPUT_AS_ZIP         : False                         
DO_PREPROC           : True                          
TRACKER_SUB_FOLDER   :                               
OUTPUT_SUB_FOLDER    :                               
TRACKER_DISPLAY_NAMES : None                          
SEQMAP_FOLDER        : None                          
SEQMAP_FILE          : /content/MeMOTR/DATADIR/DanceTrack/val_seqmap.txt
SEQ_INFO             : None                          
GT_LOC_FORMAT        : {gt_folder}/{seq}/gt/gt.txt   
SKIP_SPLIT_FOL       : True                          
no seqmap found: /content/MeMOTR/DATADIR/DanceTrack/val_seqmap.txt
Traceback (most recent call last):
  File "/content/MeMOTR/TrackEval/scripts/run_mot_challenge.py", line 84, in <module>
    dataset_list = [trackeval.datasets.MotChallenge2DBox(dataset_config)]
  File "/content/MeMOTR/TrackEval/trackeval/datasets/mot_challenge_2d_box.py", line 82, in __init__
    self.seq_list, self.seq_lengths = self._get_seq_info()
  File "/content/MeMOTR/TrackEval/trackeval/datasets/mot_challenge_2d_box.py", line 157, in _get_seq_info
    raise TrackEvalException('no seqmap found: ' + os.path.basename(seqmap_file))
trackeval.utils.TrackEvalException: no seqmap found: val_seqmap.txt
Traceback (most recent call last):
  File "/content/MeMOTR/main.py", line 124, in <module>
    main(config=merged_config)
  File "/content/MeMOTR/main.py", line 111, in main
    evaluate(config=config)
  File "/content/MeMOTR/eval_engine.py", line 36, in evaluate
    metrics = eval_model(model=config["EVAL_MODEL"], eval_dir=eval_dir,
  File "/content/MeMOTR/eval_engine.py", line 118, in eval_model
    with open(metric_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/content/MeMOTR/outputs/memotr_mot17/checkpoint_59_tracker/pedestrian_summary.txt'

I am not sure at which point the DanceTrack configuration got picked up here, because I am using the MOT17 dataset.
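
A possible first step (a hedged sketch, not an official answer): the config dump above still shows CONFIG_PATH pointing at ./configs/train_dancetrack.yaml and DATASET set to DanceTrack, and the FileNotFoundError shows "train/" being appended twice, which suggests --eval-dir should point at the run directory (e.g., .../outputs/memotr_mot17/) rather than its train/ subfolder. Inspecting the YAML that actually gets loaded can confirm the mismatch:

import yaml

# Path from this issue's setup; adjust to your own layout.
cfg_path = "/content/MeMOTR/outputs/memotr_mot17/train/config.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)
print("DATASET  :", cfg.get("DATASET"))
print("DATA_ROOT:", cfg.get("DATA_ROOT"))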

Distributed operation

Hello, thank you for your excellent work.
When I reproduced the code, I used the following command:

python -m torch.distributed.run --nproc_per_node=8 main.py --mode submit --config-path /home/sunzhaojie/MeMOTR/outputs/memotr_mot17/train/config.yaml --submit-dir /home/sunzhaojie/MeMOTR/outputs/memotr_mot17/ --submit-model dab_deformable_detr.pth --use-distributed --data-root /home/sunzhaojie/MeMOTR/dataset/MOT17

The following error occurred while running the code:

Traceback (most recent call last):
  File "/home/sunzhaojie/MeMOTR/main.py", line 120, in <module>
    main(config=merged_config)
  File "/home/sunzhaojie/MeMOTR/main.py", line 97, in main
    torch.cuda.set_device(distributed_rank())
  File "/home/sunzhaojie/.conda/envs/mot13/lib/python3.10/site-packages/torch/cuda/__init__.py", line 326, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

(The same traceback is printed by each of the six failed ranks.)

WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3949687 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3949688 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 3949689) of binary: /home/sunzhaojie/.conda/envs/mot13/bin/python
Traceback (most recent call last):
  File "/home/sunzhaojie/.conda/envs/mot13/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/sunzhaojie/.conda/envs/mot13/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/sunzhaojie/.conda/envs/mot13/lib/python3.10/site-packages/torch/distributed/run.py", line 766, in <module>
    main()
  File "/home/sunzhaojie/.conda/envs/mot13/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/sunzhaojie/.conda/envs/mot13/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/home/sunzhaojie/.conda/envs/mot13/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/sunzhaojie/.conda/envs/mot13/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/sunzhaojie/.conda/envs/mot13/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

main.py FAILED

Failures:
[1]:
time : 2023-12-15_17:11:52
host : ubuntu-Precision-7920-Tower
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 3949690)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2023-12-15_17:11:52
host : ubuntu-Precision-7920-Tower
rank : 4 (local_rank: 4)
exitcode : 1 (pid: 3949691)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2023-12-15_17:11:52
host : ubuntu-Precision-7920-Tower
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 3949692)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2023-12-15_17:11:52
host : ubuntu-Precision-7920-Tower
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 3949693)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2023-12-15_17:11:52
host : ubuntu-Precision-7920-Tower
rank : 7 (local_rank: 7)
exitcode : 1 (pid: 3949694)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2023-12-15_17:11:52
host : ubuntu-Precision-7920-Tower
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 3949689)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

How can I solve this problem? Looking forward to your reply!

About BDD100K

Congratulations on the achievement. But I wonder when the BDD100K model and training methods will be released?

Training time problem

Hello, I use an RTX 3090 GPU for training. Without --use-checkpoint, one epoch takes me about 30 minutes. Is this time normal? Is there any way to speed it up?

question about short_memory

Hello, when I read your paper and reproduced the code, I had a question. You mentioned in your paper: "we fuse the outputs from two adjacent frames with an adaptive aggregation algorithm", as shown in the red box below:
(screenshot of the relevant paper passage)

The implementation of this part in the code is as follows:
(screenshot of the corresponding code)

My question is: last_output_embed represents the output of the previous frame, so why is it tracks[b].last_output rather than tracks[b-1].last_output?
Sorry to bother you again. If I have misunderstood anything, please advise me.

Performance Reproduction

Hi, thank you very much for providing this well-structured codebase!

I tried training MeMOTR (with DAB-DETR) on DanceTrack and ran into performance issues. In particular, using the provided config file and pretrained checkpoint I only obtain:

HOTA    DetA    AssA
62.481  74.141  52.901

In particular, the association accuracy lags more than 2 points behind the performance reported in the paper. Was anyone able to reproduce the original performance? Is there anything I'm missing? @HELLORPG, have you tried training this model with the current codebase and config file? Thanks in advance for your help!

An error occurred while replicating the code

Thank you for your excellent work. When I was training on the DanceTrack dataset, I got an error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (300x256 and 512x256). This error seems to say that the two matrices have incompatible shapes, but I can't find the problematic block of code. Could you give me some guidance, please? Thank you very much.

ๅœจ่ฟ่กŒ sh make.shๆ—ถๅ‡บ็Žฐ ImportError: No module named torch้—ฎ้ข˜

ๅœจๆ นๆฎไธ‹่ฝฝๆจกๅ—็š„ไปฃ็ ๅˆ›ๅปบๅฅฝไบ†่™šๆ‹Ÿ็Žฏๅขƒๅนถๅฎ‰่ฃ…็›ธๅบ”็š„็ป„ไปถไน‹ๅŽ๏ผŒๆˆ‘cdๅˆฐ็›ธๅบ”็›ฎๅฝ•ไธ‹ๆ‰ง่กŒsh make.shๆ—ถ๏ผŒๅ‡บ็Žฐ็š„่ฟ™ไธช้—ฎ้ข˜๏ผŒๅฎŒๆ•ด็š„ๆŠฅ้”™ๆ˜ฏ๏ผš
Traceback (most recent call last):
File "setup.py", line 11, in
import torch
ImportError: No module named torch
ๆˆ‘ๅœจๅฐ่ฏ•pip install torchๅŽๅ†ๆฌก่ฟ่กŒ่ฟ˜ๆ˜ฏๅพ—ๅˆฐไธ€ๆ ท็š„็ป“ๆžœ

Input format: training on one frame of the video clip?

Can you please elaborate on "The batch size is set to 1 per GPU, and each batch contains a video clip with multiple frames. Within each clip, video frames are sampled with random intervals from 1 to 10."

Does this mean the actual model is trained on one frame at a time, randomly selected from the clip? I am trying to understand the actual input to the Transformer encoder and decoder.

And what is the role of no_grad_frames?

Thank you!!
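
Regarding the sampling question, a minimal sketch of "video frames are sampled with random intervals from 1 to 10" (illustrative only; the real sampler lives in the repository's dataset code and may differ, e.g., it may draw a different interval per gap):

import random

def sample_clip(num_frames_in_video, clip_len, max_interval=10):
    """Pick clip_len frame indices with a random stride of 1..max_interval."""
    interval = random.randint(1, max_interval)
    span = interval * (clip_len - 1)
    start = random.randint(0, num_frames_in_video - 1 - span)
    return [start + i * interval for i in range(clip_len)]

print(sample_clip(num_frames_in_video=500, clip_len=4))  # e.g., [120, 127, 134, 141]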

Pytorch environment issues

May I ask which version of PyTorch should be installed when the CUDA version is 11.3?

1ใ€I have tried versions with torch=1.12.1+cu113, torch vision=0.13.1+cu113, and torch studio=0.12.1+cu113. Under these conditions, I will not be able to run the/Deformable DETR/models/ops/test.py file, which will result in an error nvrtc: error: invalid value for -- gpu architecture (- arch).

2ใ€I have tried versions with torch=1.11.0+cu113 torch vision=0.12.0+cu113 torch studio=0.11.0, the/Deformable DETR/models/ops/test.py file can run normally. However, if I run the training main.py, an error will occur as shown in the following figure
c6c9abb902a4532fa0f6f8664b79289f

I am looking forward to your reply, thanks!

Request for dataset organization code

Hi, thanks for your excellent work.
Could you upload the pre-processing code used to organize the datasets as follows?

DATADIR/
  ├── DanceTrack/
  │ ├── train/
  │ ├── val/
  │ ├── test/
  │ ├── train_seqmap.txt
  │ ├── val_seqmap.txt
  │ └── test_seqmap.txt
  ├── MOT17/
  │ ├── images/
  │ │ ├── train/
  │ │ └── test/
  │ └── gts/
  │   └── train/
  └── CrowdHuman/
    ├── images/
    │ ├── train/
    │ └── val/
    └── gts/
      ├── train/
      └── val/

undefined symbol: _ZNK2at6Tensor7optionsEv

Hello, when I use the environment torch==1.12.1+cu113, torchvision==0.13.1+cu113, torchaudio==0.12.1 and CUDA 11.3, the following problem occurs:

ImportError: /cver/tcying/lib/python3.8/site-packages/MultiScaleDeformableAttention-1.0-py3.8-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7optionsEv

This seems to be a problem with compiling the Deformable-DETR ops, because I get the same error when running test.py. Switching to the latest PyTorch version still gives the same problem.

But when I switch the environment to torch==1.9.1+cu111, torchvision==0.10.1+cu111, torchaudio==0.9.1 and CUDA 11.1, the ops compile successfully, but the following error occurs at runtime:

Traceback (most recent call last):
  File "main.py", line 120, in <module>
    main(config=merged_config)
  File "main.py", line 99, in main
    from train_engine import train
  File "/cver/tcying/ytc/MeMOTR/train_engine.py", line 12, in <module>
    from models import build_model
  File "/cver/tcying/ytc/MeMOTR/models/__init__.py", line 6, in <module>
    from .memotr import build as build_memotr
  File "/cver/tcying/ytc/MeMOTR/models/memotr.py", line 13, in <module>
    from .backbone import BackboneWithPE
  File "/cver/tcying/ytc/MeMOTR/models/backbone.py", line 8, in <module>
    from torchvision.models import resnet50, ResNet50_Weights
ImportError: cannot import name 'ResNet50_Weights' from 'torchvision.models' (/cver/tcying/lib/python3.8/site-packages/torchvision/models/__init__.py)

This looks a lot like issue #6, but I don't know how to solve it.
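
A possible workaround for the second error (a hedged sketch, not an official fix): ResNet50_Weights only exists in torchvision >= 0.13, so the import in models/backbone.py could fall back to the legacy pretrained flag on older torchvision:

try:
    # torchvision >= 0.13: the new weights API
    from torchvision.models import resnet50, ResNet50_Weights
    backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
except ImportError:
    # older torchvision: the legacy flag
    from torchvision.models import resnet50
    backbone = resnet50(pretrained=True)

Note that the first error (the undefined symbol) usually means the ops were compiled against a different PyTorch than the one used at runtime; recompiling with make.sh inside the final environment may help.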

How to get bbox for occluded frames using motion params?

Hi, my tracker is performing well: it keeps track of an object even if it is occluded for 10 to 15 frames. Now, how can I get bounding boxes for these occluded frames? I have seen motion parameters in the runtime tracker. How can I make use of those params?
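
The runtime tracker's motion options appear in the config dump earlier on this page (USE_MOTION, MOTION_MIN_LENGTH, MOTION_MAX_LENGTH, MOTION_LAMBDA). One common approach, sketched below (illustrative; not MeMOTR's actual motion module), is to estimate a constant velocity from the last few visible boxes and extrapolate it across the occluded frames:

def extrapolate_box(history, frames_ahead):
    """Constant-velocity guess for an occluded target.

    history: list of (cx, cy, w, h) boxes from the last visible frames.
    frames_ahead: how many frames past the last observation to predict.
    """
    (x0, y0, w0, h0), (x1, y1, w1, h1) = history[0], history[-1]
    n = max(len(history) - 1, 1)
    vx, vy = (x1 - x0) / n, (y1 - y0) / n   # per-frame center velocity
    return (x1 + vx * frames_ahead, y1 + vy * frames_ahead, w1, h1)

print(extrapolate_box([(100, 50, 20, 40), (104, 52, 20, 40), (108, 54, 20, 40)], frames_ahead=5))
# -> (128.0, 64.0, 20, 40)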
