
mixformer's Introduction

MixFormer

The official implementation of the CVPR 2022 paper MixFormer: End-to-End Tracking with Iterative Mixed Attention


[Models and Raw results] (Google Drive) [Models and Raw results] (Baidu Drive, extraction code: hmuv)

(Figure: MixFormer framework overview)

News

[Jan 01, 2024]

[Feb 10, 2023]

  • 🔥🔥🔥 Code and models for MixViT and MixViT-ConvMAE are available now! Thanks to Tianhui Song for helping us clean up the code.

[Feb 8, 2023]

  • The extended version is available at https://arxiv.org/abs/2302.02814. In particular, the extended MixViT-L (ConvMAE) achieves an AUC score of 73.3% on LaSOT. Besides, we design a new TrackMAE pre-training method for tracking. Code and models will be updated soon.

[Oct 26, 2022]

  • MixFormerL (based on MixViT-L) ranks 1/41 on the VOT2022-STb public dataset.
  • The VOT2022-RGBD and VOT2022-D winners, MixForRGBD and MixForD (implemented by Lai Simiao), are built upon our MixFormer.
  • The VOT2022-STs winner, MS-AOT, employs MixFormer as part of its tracker. The VOT2022-STb winner, APMT_MR, employs the SPM proposed in MixFormer to select dynamic templates.

[Mar 29, 2022]

  • Our paper was selected for an oral presentation.

[Mar 21, 2022]

  • MixFormer is accepted to CVPR 2022.
  • We release the code, models, and raw results.

Highlights

✨ New transformer tracking framework

MixFormer is composed of a backbone built on target-search mixed attention modules (MAM) and a simple corner head, yielding a compact tracking pipeline without an explicit integration module.

✨ End-to-end, post-processing-free

MixFormer is an end-to-end tracking framework without post-processing.

✨ Strong performance

| Tracker | VOT2020 (EAO) | LaSOT (NP) | GOT-10k (AO) | TrackingNet (NP) |
|---|---|---|---|---|
| MixViT-L (ConvMAE) | 0.567 | 82.8 | - | 90.3 |
| MixViT-L | 0.584 | 82.2 | 75.7 | 90.2 |
| MixCvT | 0.555 | 79.9 | 70.7 | 88.9 |
| ToMP101* (CVPR 2022) | - | 79.2 | - | 86.4 |
| SBT-large* (CVPR 2022) | 0.529 | - | 70.4 | - |
| SwinTrack* (arXiv 2021) | - | 78.6 | 69.4 | 88.2 |
| Sim-L/14* (arXiv 2022) | - | 79.7 | 69.8 | 87.4 |
| STARK (ICCV 2021) | 0.505 | 77.0 | 68.8 | 86.9 |
| KeepTrack (ICCV 2021) | - | 77.2 | - | - |
| TransT (CVPR 2021) | 0.495 | 73.8 | 67.1 | 86.7 |
| TrDiMP (CVPR 2021) | - | - | 67.1 | 83.3 |
| Siam R-CNN (CVPR 2020) | - | 72.2 | 64.9 | 85.4 |
| TREG (arXiv 2021) | - | 74.1 | 66.8 | 83.8 |

Install the environment

Use Anaconda:

conda create -n mixformer python=3.6
conda activate mixformer
bash install_pytorch17.sh
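
Optionally, a quick sanity check that the environment is usable (this only assumes install_pytorch17.sh installed a CUDA-enabled PyTorch build):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"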

Data Preparation

Put the tracking datasets in ./data. It should look like:

${MixFormer_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- train2017
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         ...
         |-- TRAIN_11
         |-- TEST
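
Optionally, a small helper like the following (hypothetical, not part of the repository) can check that the expected directories exist before training:

import os

# expected sub-directories per dataset, matching the layout above
expected = {
    "lasot": [],
    "got10k": ["train", "val", "test"],
    "coco": ["annotations", "train2017"],
    "trackingnet": ["TRAIN_0", "TEST"],
}
for name, subdirs in expected.items():
    base = os.path.join("data", name)
    for d in [base] + [os.path.join(base, s) for s in subdirs]:
        if not os.path.isdir(d):
            print("missing:", d)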

Set project paths

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing
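
The generated files follow the PyTracking convention of a settings object holding absolute paths. The exact attribute names are produced by the script, so the excerpt below is only an assumed illustration of the kind of entries you may want to edit in lib/train/admin/local.py:

class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = '/path/to/MixFormer'             # directory for checkpoints and logs
        self.lasot_dir = '/path/to/MixFormer/data/lasot'      # dataset roots, matching the layout above
        self.got10k_dir = '/path/to/MixFormer/data/got10k'
        self.coco_dir = '/path/to/MixFormer/data/coco'
        self.trackingnet_dir = '/path/to/MixFormer/data/trackingnet'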

Train MixFormer

We train with multiple GPUs using DDP. More details of the training settings for each backbone can be found in tracking/train_mixformer_[cvt/vit/convmae].sh. An example of the underlying training command is given after the scripts below.

# MixFormer with CVT backbone
bash tracking/train_mixformer_cvt.sh

# MixFormer with ViT backbone
bash tracking/train_mixformer_vit.sh

# MixFormer with ConvMAE backbone
bash tracking/train_mixformer_convmae.sh
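
For reference, these scripts wrap tracking/train.py. A typical invocation (flags taken from the issue reports further below; paths and GPU count are placeholders, so check the script for the exact arguments) looks roughly like:

python tracking/train.py --script mixformer_online --config baseline --save_dir /path/to/save_dir --mode multiple --nproc_per_node 8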

Test and evaluate MixFormer on benchmarks

  • LaSOT/GOT10k-test/TrackingNet/OTB100/UAV123. More details of the test settings can be found in tracking/test_mixformer_[cvt/vit/convmae].sh. An example of the underlying test command is given after this list.
bash tracking/test_mixformer_cvt.sh
bash tracking/test_mixformer_vit.sh
bash tracking/test_mixformer_convmae.sh
  • VOT2020
    Before evaluating "MixFormer+AR" on VOT2020, please install some extra packages following external/AR/README.md. The VOT toolkit is also required to evaluate our tracker. To download and install the VOT toolkit, you can follow this tutorial. For convenience, you can use our example VOT toolkit workspaces under external/vot20/ by setting trackers.ini.
cd external/vot20/<workspace_dir>
vot evaluate --workspace . MixFormerPython
# generating analysis results
vot analysis --workspace . --nocache
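
For reference, the benchmark-test scripts above wrap tracking/test.py. A typical call (flags as they appear in the issue reports below; the exact argument names should be checked against tracking/test.py) looks roughly like:

python tracking/test.py mixformer_online baseline --dataset_name lasot --threads 8 --num_gpus 2 --params_model /path/to/mixformer_online_22k.pth.tar --params__search_area_scale 4.5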

Run MixFormer on your own video

bash tracking/run_video_demo.sh

Compute FLOPs/Params and test speed

bash tracking/profile_mixformer.sh

Visualize attention maps

bash tracking/vis_mixformer_attn.sh

(Figure: example attention-map visualizations produced by the script above)

Model Zoo and raw results

The trained models and the raw tracking results are provided in the [Models and Raw results] (Google Drive) or [Models and Raw results] (Baidu Drive, extraction code: hmuv).

Contact

Yutao Cui: [email protected]

Acknowledgments

  • Thanks to the PyTracking library and the STARK library, which helped us quickly implement our ideas.
  • We use the CvT implementation from the official CvT repo.

✏️ Citation

If you think this project is helpful, please feel free to leave a star⭐️ and cite our paper:

@inproceedings{cui2022mixformer,
  title={Mixformer: End-to-end tracking with iterative mixed attention},
  author={Cui, Yutao and Jiang, Cheng and Wang, Limin and Wu, Gangshan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13608--13618},
  year={2022}
}
@ARTICLE{cui2023mixformer,
  title={MixFormer: End-to-End Tracking with Iterative Mixed Attention},
  author={Yutao Cui and Cheng Jiang and Gangshan Wu and Limin Wang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}

mixformer's People

Contributors

songtianhui, wanglimin, yutaocui


mixformer's Issues

Training time?

Thanks for your solid work. How long does it take to train the Base and Large models on a V100 or 2080Ti?

multi-layer feature aggregation strategy and long-term tracking

Thanks for sharing this excellent work!
I have two small questions.
First, as mentioned in your paper, the multi-layer feature aggregation strategy is commonly used in other trackers (e.g., SiamRPN++, STARK). The one in SiamRPN++ is understandable, but the one in STARK is confusing: STARK seems to use only the last stride-16 features for prediction. What is the main difference between MixFormer and STARK in this regard?
Second, have you tested MixFormer on the VOT long-term dataset? STARK performs well on long-term tracking, and it feels like MixFormer could work even better.

An error was encountered while testing

Thank you for your outstanding work.
I reproduced your code and encountered the following error:

{'model': 'mixformer_online_22k.pth.tar', 'search_area_scale': 4.5, 'max_score_decay': 1.0, 'vis_attn': 1}
test config: {'MODEL': {'HEAD_TYPE': 'CORNER', 'HIDDEN_DIM': 384, 'NUM_OBJECT_QUERIES': 1, 'POSITION_EMBEDDING': 'sine', 'PREDICT_MASK': False, 'BACKBONE': {'PRETRAINED': True, 'PRETRAINED_PATH': '', 'INIT': 'trunc_norm', 'NUM_STAGES': 3, 'PATCH_SIZE': [7, 3, 3], 'PATCH_STRIDE': [4, 2, 2], 'PATCH_PADDING': [2, 1, 1], 'DIM_EMBED': [64, 192, 384], 'NUM_HEADS': [1, 3, 6], 'DEPTH': [1, 4, 16], 'MLP_RATIO': [4.0, 4.0, 4.0], 'ATTN_DROP_RATE': [0.0, 0.0, 0.0], 'DROP_RATE': [0.0, 0.0, 0.0], 'DROP_PATH_RATE': [0.0, 0.0, 0.1], 'QKV_BIAS': [True, True, True], 'CLS_TOKEN': [False, False, False], 'POS_EMBED': [False, False, False], 'QKV_PROJ_METHOD': ['dw_bn', 'dw_bn', 'dw_bn'], 'KERNEL_QKV': [3, 3, 3], 'PADDING_KV': [1, 1, 1], 'STRIDE_KV': [2, 2, 2], 'PADDING_Q': [1, 1, 1], 'STRIDE_Q': [1, 1, 1], 'FREEZE_BN': True}, 'PRETRAINED_STAGE1': True, 'NLAYER_HEAD': 3, 'HEAD_FREEZE_BN': True}, 'TRAIN': {'TRAIN_SCORE': True, 'SCORE_WEIGHT': 1.0, 'LR': 0.0001, 'WEIGHT_DECAY': 0.0001, 'EPOCH': 30, 'LR_DROP_EPOCH': 20, 'BATCH_SIZE': 32, 'NUM_WORKER': 8, 'OPTIMIZER': 'ADAMW', 'BACKBONE_MULTIPLIER': 0.1, 'GIOU_WEIGHT': 2.0, 'L1_WEIGHT': 5.0, 'DEEP_SUPERVISION': False, 'FREEZE_STAGE0': False, 'PRINT_INTERVAL': 50, 'VAL_EPOCH_INTERVAL': 5, 'GRAD_CLIP_NORM': 0.1, 'SCHEDULER': {'TYPE': 'step', 'DECAY_RATE': 0.1}}, 'DATA': {'SAMPLER_MODE': 'trident_pro', 'MEAN': [0.485, 0.456, 0.406], 'STD': [0.229, 0.224, 0.225], 'MAX_SAMPLE_INTERVAL': [200], 'TRAIN': {'DATASETS_NAME': ['GOT10K_vottrain', 'LASOT', 'COCO17', 'TRACKINGNET'], 'DATASETS_RATIO': [1, 1, 1, 1], 'SAMPLE_PER_EPOCH': 60000}, 'VAL': {'DATASETS_NAME': ['GOT10K_votval'], 'DATASETS_RATIO': [1], 'SAMPLE_PER_EPOCH': 10000}, 'SEARCH': {'SIZE': 320, 'FACTOR': 5.0, 'CENTER_JITTER': 4.5, 'SCALE_JITTER': 0.5}, 'TEMPLATE': {'SIZE': 128, 'FACTOR': 2.0, 'NUMBER': 2, 'CENTER_JITTER': 0, 'SCALE_JITTER': 0}}, 'TEST': {'TEMPLATE_FACTOR': 2.0, 'TEMPLATE_SIZE': 128, 'SEARCH_FACTOR': 5.0, 'SEARCH_SIZE': 320, 'EPOCH': 40, 'UPDATE_INTERVALS': {'LASOT': [200], 'GOT10K_TEST': [10], 'TRACKINGNET': [25], 'VOT20': [10], 'VOT20LT': [200], 'OTB': [6], 'UAV': [200]}, 'ONLINE_SIZES': {'LASOT': [2], 'GOT10K_TEST': [2], 'TRACKINGNET': [1], 'VOT20': [5], 'VOT20LT': [3], 'OTB': [3], 'UAV': [1]}}}
search_area_scale: 4.5
Evaluating 1 trackers on 1 sequences
Tracker: mixformer_online baseline None , Sequence: Basketball
Warning: Pretrained CVT weights are not loaded
head channel: 384
Online size is: 3
Update interval is: 6
max score decay = 1.0
Error while processing rearrange-reduction pattern "b (h w) c -> b c h w".
Input tensor shape: torch.Size([1, 1, 2048, 64]). Additional info: {'h': 32, 'w': 32}.
Expected 3 dimensions, got 4
Done

How to solve this problem?
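
A minimal sketch (not repo code) that reproduces the einops failure from the log above, for anyone debugging the same message:

import torch
from einops import rearrange

x = torch.randn(1, 1, 2048, 64)            # same shape as reported in the error log
try:
    rearrange(x, 'b (h w) c -> b c h w', h=32, w=32)
except Exception as e:
    print(e)                                # the pattern expects a 3-D tensor (b, h*w, c), but got 4 dims

# Note that even after removing the extra leading axis, 2048 tokens cannot be split into
# h=32, w=32 (which would need 1024 tokens), so the tensor fed to this rearrange apparently
# does not have the expected spatial size either.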

How to train SPM in stage2?

Thank you for your excellent work. I have some questions about the training process of SPM.

I encounter a problem when I use the script in train_mixformer.sh to train SPM module.
python tracking/train.py --script mixformer_online --config baseline --save_dir /mysavepath --mode multiple --nproc_per_node 1 --stage1_model <latest checkpoint trained in the first stage>

But the logs suggest that the program has loaded the wrong checkpoint, because there are many missing keys:

missing keys: ['score_branch.score_token', 'score_branch.score_head.layers.0.weight', 'score_branch.score_head.layers.0.bias', 'score_branch.score_head.layers.1.weight', 'score_branch.score_head.layers.1.bias', 'score_branch.score_head.layers.2.weight', 'score_branch.score_head.layers.2.bias', 'score_branch.proj_q.0.weight', 'score_branch.proj_q.0.bias', 'score_branch.proj_q.1.weight', 'score_branch.proj_q.1.bias', 'score_branch.proj_k.0.weight', 'score_branch.proj_k.0.bias', 'score_branch.proj_k.1.weight', 'score_branch.proj_k.1.bias', 'score_branch.proj_v.0.weight', 'score_branch.proj_v.0.bias', 'score_branch.proj_v.1.weight', 'score_branch.proj_v.1.bias', 'score_branch.proj.0.weight', 'score_branch.proj.0.bias', 'score_branch.proj.1.weight', 'score_branch.proj.1.bias', 'score_branch.norm1.weight', 'score_branch.norm1.bias', 'score_branch.norm2.0.weight', 'score_branch.norm2.0.bias', 'score_branch.norm2.1.weight', 'score_branch.norm2.1.bias'] unexpected keys: [] Loading pretrained mixformer weights done.

I am really confused about how to train the SPM module correctly.

I would appreciate it if you could give me some advice.

The whole log shows below:

error logs.txt

How do you validate the effectiveness of a module?

Hi, I am a beginner. I would like to ask: in your experiments, whenever you add a module, do you train on GOT-10k only and then test to verify its effectiveness, or do you train and test on the full set of datasets (TrackingNet, COCO, GOT-10k, LaSOT) together?

Recently I found that my model performs very well when trained only on GOT-10k (AO above 0.74), but its performance with the full datasets is unsatisfactory (only 68-69 on LaSOT). I am quite confused.

Can I get guideline path?

I want to test the model, but I got this error:

RuntimeError: YOU HAVE NOT SETUP YOUR local.py!!!

If I only want to test a pretrained model, do I still need to set up the paths in local.py?

TensorRT and ONNX

Hello,
Thanks for your great work. How can I convert the model to ONNX or TensorRT?
Thank you.

What's the difference between experiments/mixformer & mixformer_online?

Hello!
For GOT-10k, I used experiments/mixformer/baseline_got.yaml for training and mixformer_online_22k_got.pth.tar as the pre-trained model, but the training result could not be evaluated: there were no results after testing, and no errors were reported during testing. It looks like the yaml file was not picked up. My test command is:
python tracking/test.py mixformer_online baseline_got --dataset_name got10k_test --threads 12 --num_gpus 2 --params_model 'model absolute path' --params__search_area_scale 4.55

I then used experiments/mixformer_online/baseline_got.yaml instead. Repeating the process works, but the results don't seem right: in result.txt, of the four numbers per line, the last two are always 10, 10. This was not the case when I tested the pre-trained model directly.

Looking forward to your reply.

The code of the Mixed Attention Module

I'm a little confused about the details of the MAM implementation.

In forward_test() of the Attention class in lib/models/mixformer/mixformer_online.py, there seems to be only one attention computation, where q, k, and v are obtained as q = rearrange(search, 'b c h w -> b (h w) c').contiguous(), k = torch.cat([self.t_k, self.ot_k, k], dim=1), and v = torch.cat([self.t_v, self.ot_v, v], dim=1). However, Figure 2 of the paper shows one multi-head attention function and two attention operations, which does not directly correspond to the code.

I'm guessing you used a more convenient formulation in the implementation. Sorry for my limited understanding of the code; please explain the details so that I can understand better.
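
Not an official answer, but a rough sketch (with assumptions: the q/k/v projections, multi-head reshaping and dropout are omitted) of the single attention call described above, where the search query attends over keys/values concatenated from the cached template (t_k/t_v), online template (ot_k/ot_v) and the current search tokens:

import torch
import torch.nn.functional as F
from einops import rearrange

def search_mixed_attention(search, t_k, t_v, ot_k, ot_v):
    # search: (b, c, h, w); cached template keys/values: (b, n_t, c)
    q = rearrange(search, 'b c h w -> b (h w) c').contiguous()
    k = torch.cat([t_k, ot_k, q], dim=1)   # simplification: search k/v reuse q instead of separate projections
    v = torch.cat([t_v, ot_v, q], dim=1)
    scale = q.shape[-1] ** -0.5
    attn = F.softmax(torch.einsum('bic,bjc->bij', q, k) * scale, dim=-1)
    return torch.einsum('bij,bjc->bic', attn, v)  # each search token mixes template and search information

The cached self.t_k / self.ot_k presumably let the test-time path avoid recomputing the template attention at every frame, which may be why forward_test() contains only one attention computation.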

VOT2020

In the results of VOT2020, what does unsupervised mean?
(screenshot of the VOT2020 results attached)

VOT

Are you participating in the VOT2022 challenge? Can I ask you some questions about the vot-toolkit?

Questions about labels used in the training process of Score Prediction Module(SPM)

@yutaocui Thank you for your excellent work!

I have noticed that the labels used for training the SPM module seem to be generated randomly with a fixed pos_prob:

if random.random() < self.pos_prob:
    label = torch.ones(1,)
    search_frames, search_anno, meta_obj_test = dataset.get_frames(seq_id, search_frame_ids, seq_info_dict)
    search_masks = search_anno['mask'] if 'mask' in search_anno else [torch.zeros((H, W))] * self.num_search_frames
# negative samples
else:
    label = torch.zeros(1,)
    if is_video_dataset:
        search_frame_ids = self._sample_visible_ids(visible, num_ids=1, force_invisible=True)
        if search_frame_ids is None:
            search_frames, search_anno, meta_obj_test = self.get_one_search()
        else:
            search_frames, search_anno, meta_obj_test = dataset.get_frames(seq_id, search_frame_ids,
                                                                            seq_info_dict)
            search_anno["bbox"] = [self.get_center_box(H, W)]

But the SPM module aims at selecting high-quality templates.

I am confused about how these randomly generated labels can play a role in selecting templates.

I would appreciate it if you could answer my questions.

Training log of training SPM

Could you share the training log of stage 2 (i.e., training the SPM)?
I would like to check whether my training process is normal, using it as a reference. Thank you.

Is this a typo?

In line 751, it should be named online_template rather than template, or am I misunderstanding?

def forward(self, template, online_template, search, run_score_head=False, gt_bboxes=None):
    # search: (b, c, h, w)
    if template.dim() == 5:
        template = template.squeeze(0)
    if online_template.dim() == 5:
        template = online_template.squeeze(0)
    if search.dim() == 5:
        search = search.squeeze(0)
    template, search = self.backbone(template, online_template, search)
    # Forward the corner head
    return self.forward_box_head(search)
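
If that reading is correct, the intended lines would presumably be:

if online_template.dim() == 5:
    online_template = online_template.squeeze(0)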

about mutil-template test

Hello! I found that in the multi-template test, the test strategy differs from the two-template case: the templates and the search region compute attention separately, which is different from the training strategy, where the k and v values in the MAM module are concatenated. Will this make any difference?

repeat tracker initialize?

First of all, thanks for your clean and high-quality code.
But in

def initialize(self, image, mask):
    region = rect_from_mask(mask)
    init_info = {'init_bbox': region}
    self.tracker.initialize(image, init_info)
    self.H, self.W, _ = image.shape
    gt_bbox_np = np.array(region).astype(np.float32)
    '''Initialize STARK for specific video'''
    init_info = {'init_bbox': list(gt_bbox_np)}
    self.tracker.initialize(image, init_info)
    '''initilize refinement module for specific video'''
    self.alpha.initialize(image, np.array(gt_bbox_np))

I see tracker.initialize called twice. I think initialize is just a setup step (not an online-update step), so why do we need to call it twice?

About Score Prediction Module (SPM) and MixFormer-1k

Hi, Thanks for your work. I have some questions about your paper:

  1. Have you ever tried using the score prediction module of STARK (an MLP) instead of the SPM proposed in your paper? I am curious about the performance difference between the SPM and a plain MLP.
  2. The MixFormer-1k model seems to be trained on all datasets, not just GOT-10k, which differs from your paper (it would be unreasonable for MixFormer-1k to outperform MixFormer-GOT if MixFormer-1k were also trained on GOT-10k only). Is it fair to use it for comparison on the GOT-10k test set?

vot2020

After configuring according to the README, I wanted to test on VOT2020, but "Unable to connect to tracker" appears. How can I solve this?

About update MixedAttention operation

Thank you for open-sourcing such excellent work!

The original version before the update split Q, K, and V into template, online template, and search region, and computed attention for the template and the online template separately, as shown below:

    # template attention
    k1 = torch.cat([k_t, k_ot], dim=2)
    v1 = torch.cat([v_t, v_ot], dim=2)
    attn_score = torch.einsum('bhlk,bhtk->bhlt', [q_t, k1]) * self.scale
    attn = F.softmax(attn_score, dim=-1)
    attn = self.attn_drop(attn)
    x_t = torch.einsum('bhlt,bhtv->bhlv', [attn, v1])
    x_t = rearrange(x_t, 'b h t d -> b t (h d)')

    # online template attention
    k2 = torch.cat([k_t, k_ot], dim=2)
    v2 = torch.cat([v_t, v_ot], dim=2)
    attn_score = torch.einsum('bhlk,bhtk->bhlt', [q_ot, k2]) * self.scale
    attn = F.softmax(attn_score, dim=-1)
    attn = self.attn_drop(attn)
    x_ot = torch.einsum('bhlt,bhtv->bhlv', [attn, v2])
    x_ot = rearrange(x_ot, 'b h t d -> b t (h d)')

In particular, attn_score is computed from q_t with k1 and from q_ot with k2 (both k1 and k2 are the template keys concatenated with the online-template keys):

attn_score = torch.einsum('bhlk,bhtk->bhlt', [q_t, k1]) * self.scale

attn_score = torch.einsum('bhlk,bhtk->bhlt', [q_ot, k2]) * self.scale

The updated version instead merges the template and the online template and performs attention on them together from the start, as shown below:

attn_score = torch.einsum('bhlk,bhtk->bhlt', [q_mt, k_mt]) * self.scale

I would like to ask whether the two ways of computing attention for the templates and online templates, before and after the update, are equivalent?
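
Not an authoritative answer, but a small numerical sketch (which ignores the convolutional q/k/v projections, dropout and any effects that may differ between the two code paths) suggests the two formulations are equivalent, because softmax is applied independently per query row:

import torch
import torch.nn.functional as F

b, h, l_t, l_ot, d = 2, 4, 16, 16, 32            # batch, heads, token counts, head dim (arbitrary)
scale = d ** -0.5

q_t, q_ot = torch.randn(b, h, l_t, d), torch.randn(b, h, l_ot, d)
k_mt = torch.randn(b, h, l_t + l_ot, d)          # keys of template + online template
v_mt = torch.randn(b, h, l_t + l_ot, d)

def attend(q, k, v):
    attn = F.softmax(torch.einsum('bhlk,bhtk->bhlt', q, k) * scale, dim=-1)
    return torch.einsum('bhlt,bhtv->bhlv', attn, v)

# pre-update style: separate attention for template and online-template queries
x_sep = torch.cat([attend(q_t, k_mt, v_mt), attend(q_ot, k_mt, v_mt)], dim=2)
# post-update style: one attention over the merged query
x_mt = attend(torch.cat([q_t, q_ot], dim=2), k_mt, v_mt)

print(torch.allclose(x_sep, x_mt, atol=1e-6))    # True: the two give the same result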

About Visualizing Attention Maps

The attention-map visualization code in mixformer_online.py reports an error:
RuntimeError: shape '[8, 8, 4, 4]' is invalid for input of size 2048.

Here, I mainly want to ask about the meanings of q_w, k_w, skip_len, etc., and why attn_weights[::3] is used when visualizing the online_template-to-template attention weights while attn_weights[1::3] is used for the template-to-online_template attention weights?

Looking forward to your answer.

About training logs

Hello! This is excellent work. I don't see a training log provided in the model zoo. Could you provide the training logs for the reproduced results later, like STARK does?

Questions about the input images

@jcaha @yutaocui Hi,

What do the input images to MixFormer look like during training and inference?

I saw in the paper that two template images and one search image are needed; what is the rationale for this?
My setting has only one template image and one search image; can it still be applied to MixFormer?

C. Training Details
We propose a 320x320 search region plus two 128x128 input
images to make a fair comparison with prevailing trackers (e.g.,
Siamese-based trackers, STARK and TransT).

Do you get stuck on the first dataset when you run it?

When I tested it, it got stuck at the output below, and it was still the same after running all night. How can I solve this?
Evaluating 1 trackers on 1 sequences
Tracker: mixformer_online baseline None , Sequence: Basketball
Warning: Pretrained CVT weights are not loaded
head channel: 384

Can not compile Precise RoI Pooling library

{'model': 'mixformer_online_22k.pth.tar', 'update_interval': 25, 'online_sizes': 3, 'search_area_scale': 4.5, 'max_score_decay': 1.0, 'vis_attn': 0}
test config: {'MODEL': {'HEAD_TYPE': 'CORNER', 'HIDDEN_DIM': 384, 'NUM_OBJECT_QUERIES': 1, 'POSITION_EMBEDDING': 'sine', 'PREDICT_MASK': False, 'BACKBONE': {'PRETRAINED': True, 'PRETRAINED_PATH': '', 'INIT': 'trunc_norm', 'NUM_STAGES': 3, 'PATCH_SIZE': [7, 3, 3], 'PATCH_STRIDE': [4, 2, 2], 'PATCH_PADDING': [2, 1, 1], 'DIM_EMBED': [64, 192, 384], 'NUM_HEADS': [1, 3, 6], 'DEPTH': [1, 4, 16], 'MLP_RATIO': [4.0, 4.0, 4.0], 'ATTN_DROP_RATE': [0.0, 0.0, 0.0], 'DROP_RATE': [0.0, 0.0, 0.0], 'DROP_PATH_RATE': [0.0, 0.0, 0.1], 'QKV_BIAS': [True, True, True], 'CLS_TOKEN': [False, False, False], 'POS_EMBED': [False, False, False], 'QKV_PROJ_METHOD': ['dw_bn', 'dw_bn', 'dw_bn'], 'KERNEL_QKV': [3, 3, 3], 'PADDING_KV': [1, 1, 1], 'STRIDE_KV': [2, 2, 2], 'PADDING_Q': [1, 1, 1], 'STRIDE_Q': [1, 1, 1], 'FREEZE_BN': True}, 'PRETRAINED_STAGE1': True, 'NLAYER_HEAD': 3, 'HEAD_FREEZE_BN': True}, 'TRAIN': {'TRAIN_SCORE': True, 'SCORE_WEIGHT': 1.0, 'LR': 0.0001, 'WEIGHT_DECAY': 0.0001, 'EPOCH': 30, 'LR_DROP_EPOCH': 20, 'BATCH_SIZE': 32, 'NUM_WORKER': 8, 'OPTIMIZER': 'ADAMW', 'BACKBONE_MULTIPLIER': 0.1, 'GIOU_WEIGHT': 2.0, 'L1_WEIGHT': 5.0, 'DEEP_SUPERVISION': False, 'FREEZE_STAGE0': False, 'PRINT_INTERVAL': 50, 'VAL_EPOCH_INTERVAL': 5, 'GRAD_CLIP_NORM': 0.1, 'SCHEDULER': {'TYPE': 'step', 'DECAY_RATE': 0.1}}, 'DATA': {'SAMPLER_MODE': 'trident_pro', 'MEAN': [0.485, 0.456, 0.406], 'STD': [0.229, 0.224, 0.225], 'MAX_SAMPLE_INTERVAL': [200], 'TRAIN': {'DATASETS_NAME': ['GOT10K_vottrain', 'LASOT', 'COCO17', 'TRACKINGNET'], 'DATASETS_RATIO': [1, 1, 1, 1], 'SAMPLE_PER_EPOCH': 60000}, 'VAL': {'DATASETS_NAME': ['GOT10K_votval'], 'DATASETS_RATIO': [1], 'SAMPLE_PER_EPOCH': 10000}, 'SEARCH': {'SIZE': 320, 'FACTOR': 5.0, 'CENTER_JITTER': 4.5, 'SCALE_JITTER': 0.5}, 'TEMPLATE': {'SIZE': 128, 'FACTOR': 2.0, 'NUMBER': 2, 'CENTER_JITTER': 0, 'SCALE_JITTER': 0}}, 'TEST': {'TEMPLATE_FACTOR': 2.0, 'TEMPLATE_SIZE': 128, 'SEARCH_FACTOR': 5.0, 'SEARCH_SIZE': 320, 'EPOCH': 40, 'UPDATE_INTERVALS': {'LASOT': [200], 'GOT10K_TEST': [10], 'TRACKINGNET': [25], 'VOT20': [10], 'VOT20LT': [200], 'OTB': [6], 'UAV': [200]}, 'ONLINE_SIZES': {'LASOT': [2], 'GOT10K_TEST': [2], 'TRACKINGNET': [1], 'VOT20': [5], 'VOT20LT': [3], 'OTB': [3], 'UAV': [1]}}}
search_area_scale: 4.5
Warning: Pretrained CVT weights are not loaded
head channel: 384
Online size is: 3
Update interval is: 25
max score decay = 1.0
Using C:\Users\210\AppData\Local\torch_extensions\torch_extensions\Cache as PyTorch extensions root...
C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\utils\cpp_extension.py:274: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
Detected CUDA files, patching ldflags
Emitting ninja build file C:\Users\210\AppData\Local\torch_extensions\torch_extensions\Cache_prroi_pooling\build.ninja...
Building extension module _prroi_pooling...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.10.2
Loading extension module _prroi_pooling...
Traceback (most recent call last):
File "tracking..\external\PreciseRoIPooling\pytorch\prroi_pool\functional.py", line 33, in _import_prroi_pooling
verbose=True
File "C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\utils\cpp_extension.py", line 980, in load
keep_intermediates=keep_intermediates)
File "C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\utils\cpp_extension.py", line 1196, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\utils\cpp_extension.py", line 1543, in _import_module_from_library
file, path, description = imp.find_module(module_name, [path])
File "C:\Users\210\anaconda3\envs\mixformer1\lib\imp.py", line 297, in find_module
raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named '_prroi_pooling'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "tracking/video_demo.py", line 53, in
main()
File "tracking/video_demo.py", line 49, in main
args.save_results, tracker_params=tracker_params)
File "tracking/video_demo.py", line 21, in run_video
tracker.run_video(videofilepath=videofile, optional_box=optional_box, debug=debug, save_results=save_results)
File "tracking..\lib\test\evaluation\tracker.py", line 228, in run_video
out = tracker.track(frame)
File "tracking..\lib\test\tracker\mixformer_online.py", line 135, in track
out_dict, _ = self.network.forward_test(search, run_score_head=True)
File "tracking..\lib\models\mixformer\mixformer_online.py", line 850, in forward_test
out, outputs_coord_new = self.forward_head(search, template, run_score_head, gt_bboxes)
File "tracking..\lib\models\mixformer\mixformer_online.py", line 875, in forward_head
out_dict.update({'pred_scores': self.score_branch(search, template, gt_bboxes).view(-1)})
File "C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "tracking..\lib\models\mixformer\mixformer_online.py", line 798, in forward
search_box_feat = rearrange(self.search_prroipool(search_feat, target_roi), 'b c h w -> b (h w) c')
File "C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "tracking..\external\PreciseRoIPooling\pytorch\prroi_pool\prroi_pool.py", line 28, in forward
return prroi_pool2d(features, rois, self.pooled_height, self.pooled_width, self.spatial_scale)
File "tracking..\external\PreciseRoIPooling\pytorch\prroi_pool\functional.py", line 44, in forward
_prroi_pooling = _import_prroi_pooling()
File "tracking..\external\PreciseRoIPooling\pytorch\prroi_pool\functional.py", line 36, in _import_prroi_pooling
raise ImportError('Can not compile Precise RoI Pooling library.')
ImportError: Can not compile Precise RoI Pooling library.

Please help me! Thanks very much

Data sampling manner

Hello! Regarding the data sampling manner, I found that the code uses the causal sampling manner (like STARK) instead of the trident sampling manner. Is there any difference in the results?

"trident_pro" sample mode

Hi, why can template_frame_ids_extra be invisible (line 316) when the sample mode is set to "trident_pro"?

def get_frame_ids_trident(self, visible):
    # get template and search ids in a 'trident' manner
    template_frame_ids_extra = []
    while None in template_frame_ids_extra or len(template_frame_ids_extra) == 0:
        template_frame_ids_extra = []
        # first randomly sample two frames from a video
        template_frame_id1 = self._sample_visible_ids(visible, num_ids=1)  # the initial template id
        search_frame_ids = self._sample_visible_ids(visible, num_ids=1)  # the search region id
        # get the dynamic template id
        for max_gap in self.max_gap:
            if template_frame_id1[0] >= search_frame_ids[0]:
                min_id, max_id = search_frame_ids[0], search_frame_ids[0] + max_gap
            else:
                min_id, max_id = search_frame_ids[0] - max_gap, search_frame_ids[0]
            if self.frame_sample_mode == "trident_pro":
                f_id = self._sample_visible_ids(visible, num_ids=1, min_id=min_id, max_id=max_id,
                                                allow_invisible=True)
            else:
                f_id = self._sample_visible_ids(visible, num_ids=1, min_id=min_id, max_id=max_id)
            if f_id is None:
                template_frame_ids_extra += [None]
            else:
                template_frame_ids_extra += f_id

    template_frame_ids = template_frame_id1 + template_frame_ids_extra
    return template_frame_ids, search_frame_ids

Some questions about AR(Alpha Refine).

Have you tried using Alpha-Refine for evaluation on datasets such as GOT-10k and TrackingNet? If so, could you provide that part of the code? Thanks.
