whai362 / pvt Goto Github PK

View Code? Open in Web Editor NEW

1.7K 23.0 245.0 14.82 MB

Official implementation of PVT series

License: Apache License 2.0

Python 99.63% Shell 0.37%

transformer backbone pvt detection segmentation pvtv2

pvt's Introduction

My homepage: https://whai362.github.io/

pvt's People

Contributors

Stargazers

Watchers

Forkers

dlwbm123 rainfalj hiyyg liuguoyou jlqzzz peternara legendbc cassie-cv trendingtechnology chaoso dengpingfan peterouzh mtlong banyueqin saulocatharino gyanachand1 louisnust davidko3 zcl912 city292 cv-ip peterzs piaofu110 baodijun yangsenwxy wuxiaolianggit mymuli zhang-deep deepcharle liuwenhaha stevewongv chisyliu shengkaiwu xrosliang night-gale siyisan uzair789 wellxiong phecy stjordanis zlapp zengwang430521 xuewengeophysics xyz100h wh-forker nicholasxin apple3c chunyuanli tzt101 aturkelson wangjingbo1219 xeransis eadwu supersupercong lzhbrian kapitsa2811 pugangqiang bigheartdb sunsmarterjie zlpsophina implus tuskaw ohjoon2 czczup liuguangjin98 danczs cvlinks repo-collection guoquanhao mathpopo peterzhousz chen-yuu volgachen youngwanlee flyingbird93 xyxy1722 g7b9 sunting78 smilelite yihanhu-2022 zytx121 chrisbyd xingshulicc lihaossu mucaoshen ljm198134 xiaohu2015 yzhong22 cherish24 omid-nejati githubfragments krisandchris embedded-dl-lab bobleono yaooxii canyonwind xkyi bu1lder zhengkid food888for

pvt's Issues

classification-pvt load finetune model error

if 'model' in checkpoint: model_without_ddp.load_state_dict(checkpoint['model']) else: model_without_ddp.load_state_dict(checkpoint)

File "main.py", line 271, in main
model_without_ddp.load_state_dict(checkpoint)
UnboundLocalError: local variable 'model_without_ddp' referenced before assignment

Retina-PVT

Do you use NMS and anchors in the RetinaNet-PVT variation?

Pickle file problem in reading

excuse me i have the same issues with couldn't read .pkl files i got .. if you please how the pickle file was created? does the file include a load persistent ID instruction? or it contains references to data outside the pickle file ?

Thanks a lot for your time

Flops Calculation for PVT

Hi, I found in #1, an example is provided to show the complexity of ViT models (using get_vit_flops()). However, since PVT has multiple scales, I wonder if there is a tool provided to measure the flops for PVT? Thanks!

Pretrained model I download is archiev.zip, not .pth

mcloader

Hi, I want to use the mcloader setting. But I found that it fails at 'import mc'. Could you please tell me what I should do to solve this problem?

what is the difference between pvt here?

https://github.com/whai362/PVT/blob/main/pvt.py

https://github.com/whai362/PVT/blob/main/detection/pvt.py

2- does 12 epochs for detection and 300 epochs for classification ?

q

The downsample strides for pvt are different from those in retinanet?

Retinanet uses strides=[8, 16, 32, 64, 128], the first stride is 8, but the first stride for pvt is 4 according to the paper. Have you tried the same strides as retinanet for pvt?

When will you release your pretrained Models ? for image classification ?

Unable to reproduce results on Coco with Retina-PVT small 640x

Hi @whai362,

Thanks for the very nice work. I trained your model with your instructions but I am unable to achieve your results of Retina small 640 on Coco: 38.7 AP. Here is my results:

Do you have any suggestions to reproduce your results on COCO? Thanks!!

Flops Calculation for PVT

It seems a duplicate post is created. I will close this one.

where can i find your pickle code ?

what is the expected content in the pickle file if I want to make detection for image .. the objects and their boundaries only or where can i find the class of pickle you used as i searched but couldn't find it .. thanks for your time

pretrained model load

hello~, i am very interested in your work. Now i meet some questions when the pretrained model was load

checkpoint = torch.load(args.finetune, map_location='cpu')

debug:

pos_embed_checkpoint = checkpoint_model['pos_embed']
the checkpiont have "pos_embed1" "pos_embed2" "pos_embed3" "pos_embed4", but no "pos_embed"

Object Detections Tasks

Hi, great work!
It seems like this code only supports classification tasks, when will the Retina-PVT code be available?

Visualize PVT detection?

How can we visualize the trained models? Is there any demo?
Thanks

Have you compare inference speed or resource cost while testing between ResNet and PVT?

which one for detection ?

1- which one pre-trained i should use for object detection??
https://drive.google.com/file/d/1L5wh2rYsVnuC_CEeFE6yMhU1kENt2gnk/view?usp=sharing
or
https://drive.google.com/file/d/1vtcyoU8KUqNzktlMGXZrYcMRsNNiVZFQ/view?usp=sharing

The 'in_channels' in neck and the nn.Conv2d module in net?

Hi, I have two questions:

The in_channels in neck is much smaller than the original RetinaNet. Did you adjust these parameters intentionally？
I found there still is nn.Conv2d module in the model, so is it a default module in transformer?

pvt_small is not in the backbone registry

i got this error raise type(e)(f'{obj_cls.__name__}: {e}') KeyError: "RetinaNet: 'pvt_small is not in the backbone registry'" when I used python train.py configs/retinanet_pvt_s_fpn_1x_coco_640.py

Question about the drop path in classification model

Thanks for your great work. When I'm reading your code, I noticed that in main.py, line 379
model_without_ddp.reset_drop_path(0.0)
where you manually set the drop path rate to 0, instead of using parser argument.
I'd like to know if this is done intentionally for the classification task, since many related works set the drop path rate to 0.1.
So there are two questions,

Why not using parser argument to set it?
Are all the results in your paper trained with 0 drop path rate?

Thanks a lot!

Pretrained model error

When I load the pretrained weights of Sparse R-CNN with PVT-b2 backbone, it showed this error: "RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory". I believe the file was not saved correctly.

The model and loaded state dict do not match exactly

i got those lines when tried to train images with this command
./dist_train.sh configs/retinanet_pvt_s_fpn_1x_coco_640.py 8 for object detection Task

WARNING - The model and loaded state dict do not match exactly unexpected key in source state_dict: cls_token, norm.weight, norm.bias, head.weight, head.bias
Does that a normal warning?

2- Does the weight file for detection here https://drive.google.com/file/d/1L5wh2rYsVnuC_CEeFE6yMhU1kENt2gnk/view?usp=sharing

misaligned experimental results

Hi there, thx for the sharing.
I found two problems:

The reported image classification performance for Twins-SVT-L ([https://github.com/Meituan-AutoML/Twins]) on ImageNet-1K using 224x224 scales was 83.7, but in your paper it's 83.3, why is that?
You compared large models PVTv2-B4/B5 with Swin-S and Swin-B for image classification, but the corresponding comparisons were missing for object detection and segmentation, could you elaborate more?

PVT DETR?

Can you please update the model and config for detr+pvt?
I get an error when I want to train on this config?

Confusion matrix and classify validation images?

Hello, can the program plot the confusion matrix? This way I can know the accuracy rate for each category. Also, can I test a few images randomly to see the results of the completed training model recognition? Thank you.

Training Batch Size on ImageNet

In the paper, the training batch size on ImageNet is 128 (I assume it is the entire training batch, e.g. 128 = 8 * 16 (8 GPUs with 16 images each) ).
However, the dist_train.py uses 128 per GPU, which means the entire batch size is 128 * 8 = 1024. I wonder which one is the correct setting.

Thanks a lot,

problems about loading pretrained model with pytorch version below 1.6

pytorch 1.6 have switched torch.save to use a zip file-based format by default rather than the old Pickle-based format. This cause pytorchs with version below 1.6 can not load the pretained models AT ALL.

Can you use "_use_new_zipfile_serialization=False" when using torch.save()? just like torch.save(m.state_dict(), 'mymodel.pt', _use_new_zipfile_serialization=False). And provide another version of pretrained models?

Thanks a lot!!!!

How can I get small_pvt.pth?

I run your main.py .. I'm confusing what this class do ? it gave me the accuracy for 500 epoch and loss of them right ? and when I tried to train my images by this command
'dist_train.sh configs/retinanet_pvt_s_fpn_1x_coco_640.py 1'

I got that small_pvt.pth not found .. excuse me does that will be the weights ? Or checkpoints ?

Does small_pvt.pth here
https://drive.google.com/file/d/1vtcyoU8KUqNzktlMGXZrYcMRsNNiVZFQ/view?usp=sharing
For imagenet ? But how can I got pth file.if the dataset.is different . Appreciating your reply. Thanks

KeyError: 'meta' in testing

i was trying to test the model by this command

python test.py /home/user/Desktop/PVT-main/detection/configs/retinanet_pvt_s_fpn_1x_coco_640.py /home/user/Desktop/PVT-main/pretrained/pvt_small.pth --show-dir /home/user/Desktop/results.pkl

and got this warning , this is part of the warning as it is more than 10 lines
unexpected key in source state_dict: pos_embed1, pos_embed2, pos_embed3, pos_embed4, cls_token, patch_embed1.proj.weight, patch_embed1.proj.bias, patch_embed1.norm.weight, patch_embed1.norm.bias, patch_embed2.proj.weight, patch_embed2.proj.bias, patch_embed2.norm.weight, patch_embed2.norm.bias, patch_embed3.proj.weight,

then give this error
Traceback (most recent call last):
File "test.py", line 213, in <module>
main()
File "test.py", line 175, in main
if 'CLASSES' in checkpoint['meta']:
KeyError: 'meta'

代码方面的问题

您好，关于PVT的代码一些问题我想向您请教一下

我图中第二个注释表明我的代码运行出错的地方，我想问一下是我这边代码拷贝错误还是您当时代码上传有误？谢谢

PVT Large deosn't converge

Thanks for your great work. But when I trained PVT Large (pvt_large) as your default settings, the model didn't converge. The loss declined correctly in the first 37 epochs and the accuracy went to 57% but the model went wrong at 38th epoch. I used your code without any change. What's the problem? Thank you!

Below is a part of my training log.

Test: Total time: 0:01:55 (0.4429 s / it)

Acc@1 57.009 Acc@5 81.174 loss 1.948
Accuracy of the network on the 50000 test images: 57.0%
Max accuracy: 57.01%
Epoch: [38] [ 0/1251] eta: 2:06:33 lr: 0.000963 loss: 4.9324 (4.9324) time: 6.0701 data: 3.6057 max mem: 25529
Epoch: [38] [ 10/1251] eta: 0:31:59 lr: 0.000963 loss: 4.5930 (4.5768) time: 1.5465 data: 0.3281 max mem: 25529
Epoch: [38] [ 20/1251] eta: 0:27:07 lr: 0.000963 loss: 4.6624 (4.6160) time: 1.0843 data: 0.0003 max mem: 25529
Epoch: [38] [ 30/1251] eta: 0:25:15 lr: 0.000963 loss: 4.7355 (4.5806) time: 1.0737 data: 0.0003 max mem: 25529
Epoch: [38] [ 40/1251] eta: 0:24:16 lr: 0.000963 loss: 4.6986 (4.5811) time: 1.0784 data: 0.0003 max mem: 25529
Epoch: [38] [ 50/1251] eta: 0:23:33 lr: 0.000963 loss: 4.6986 (4.5609) time: 1.0766 data: 0.0003 max mem: 25529
Epoch: [38] [ 60/1251] eta: 0:23:07 lr: 0.000963 loss: 4.7104 (4.5901) time: 1.0864 data: 0.0003 max mem: 25529
Epoch: [38] [ 70/1251] eta: 0:22:39 lr: 0.000963 loss: 4.8095 (4.6143) time: 1.0854 data: 0.0003 max mem: 25529
Epoch: [38] [ 80/1251] eta: 0:22:17 lr: 0.000963 loss: 4.7373 (4.5898) time: 1.0721 data: 0.0003 max mem: 25529
Epoch: [38] [ 90/1251] eta: 0:21:55 lr: 0.000963 loss: 4.4603 (4.5742) time: 1.0696 data: 0.0003 max mem: 25529
Epoch: [38] [ 100/1251] eta: 0:21:37 lr: 0.000963 loss: 4.5539 (4.5777) time: 1.0682 data: 0.0003 max mem: 25529
Epoch: [38] [ 110/1251] eta: 0:21:21 lr: 0.000963 loss: 4.9701 (4.5993) time: 1.0787 data: 0.0003 max mem: 25529
Epoch: [38] [ 120/1251] eta: 0:21:06 lr: 0.000963 loss: 4.9029 (4.5914) time: 1.0811 data: 0.0003 max mem: 25529
Epoch: [38] [ 130/1251] eta: 0:20:50 lr: 0.000963 loss: 4.7300 (4.5999) time: 1.0711 data: 0.0003 max mem: 25529
Epoch: [38] [ 140/1251] eta: 0:20:35 lr: 0.000963 loss: 4.7998 (4.5936) time: 1.0630 data: 0.0003 max mem: 25529
Epoch: [38] [ 150/1251] eta: 0:20:23 lr: 0.000963 loss: 4.8562 (4.5969) time: 1.0850 data: 0.0003 max mem: 25529
Epoch: [38] [ 160/1251] eta: 0:20:09 lr: 0.000963 loss: 4.8583 (4.5961) time: 1.0852 data: 0.0003 max mem: 25529
Epoch: [38] [ 170/1251] eta: 0:19:55 lr: 0.000963 loss: 4.8583 (4.6029) time: 1.0677 data: 0.0003 max mem: 25529
Epoch: [38] [ 180/1251] eta: 0:19:42 lr: 0.000963 loss: 5.0298 (4.6202) time: 1.0675 data: 0.0003 max mem: 25529
Epoch: [38] [ 190/1251] eta: 0:19:28 lr: 0.000963 loss: 4.8480 (4.6175) time: 1.0634 data: 0.0003 max mem: 25529
Epoch: [38] [ 200/1251] eta: 0:19:15 lr: 0.000963 loss: 4.6446 (4.6124) time: 1.0629 data: 0.0003 max mem: 25529
Epoch: [38] [ 210/1251] eta: 0:19:04 lr: 0.000963 loss: 4.8329 (4.6245) time: 1.0741 data: 0.0003 max mem: 25529
Epoch: [38] [ 220/1251] eta: 0:18:52 lr: 0.000963 loss: 4.9058 (4.6362) time: 1.0833 data: 0.0003 max mem: 25529
Epoch: [38] [ 230/1251] eta: 0:18:40 lr: 0.000963 loss: 4.7250 (4.6332) time: 1.0764 data: 0.0003 max mem: 25529
Epoch: [38] [ 240/1251] eta: 0:18:28 lr: 0.000963 loss: 4.6894 (4.6391) time: 1.0808 data: 0.0003 max mem: 25529
Epoch: [38] [ 250/1251] eta: 0:18:16 lr: 0.000963 loss: 4.8600 (4.6438) time: 1.0789 data: 0.0003 max mem: 25529
Epoch: [38] [ 260/1251] eta: 0:18:04 lr: 0.000963 loss: 4.9939 (4.6550) time: 1.0710 data: 0.0003 max mem: 25529
Epoch: [38] [ 270/1251] eta: 0:17:53 lr: 0.000963 loss: 4.7281 (4.6478) time: 1.0717 data: 0.0003 max mem: 25529
Epoch: [38] [ 280/1251] eta: 0:17:41 lr: 0.000963 loss: 4.3858 (4.6383) time: 1.0664 data: 0.0003 max mem: 25529
Epoch: [38] [ 290/1251] eta: 0:17:29 lr: 0.000963 loss: 4.5126 (4.6390) time: 1.0627 data: 0.0003 max mem: 25529
Epoch: [38] [ 300/1251] eta: 0:17:17 lr: 0.000963 loss: 4.3964 (4.6302) time: 1.0638 data: 0.0003 max mem: 25529
Epoch: [38] [ 310/1251] eta: 0:17:05 lr: 0.000963 loss: 4.3964 (4.6284) time: 1.0683 data: 0.0003 max mem: 25529
Epoch: [38] [ 320/1251] eta: 0:16:54 lr: 0.000963 loss: 4.4917 (4.6220) time: 1.0689 data: 0.0003 max mem: 25529
Epoch: [38] [ 330/1251] eta: 0:16:42 lr: 0.000963 loss: 4.7606 (4.6335) time: 1.0695 data: 0.0003 max mem: 25529
Epoch: [38] [ 340/1251] eta: 0:16:31 lr: 0.000963 loss: 5.0333 (4.6346) time: 1.0699 data: 0.0003 max mem: 25529
Epoch: [38] [ 350/1251] eta: 0:16:20 lr: 0.000963 loss: 4.6795 (4.6276) time: 1.0700 data: 0.0003 max mem: 25529
Epoch: [38] [ 360/1251] eta: 0:16:08 lr: 0.000963 loss: 4.7723 (4.6305) time: 1.0728 data: 0.0003 max mem: 25529
Epoch: [38] [ 370/1251] eta: 0:15:57 lr: 0.000963 loss: 4.8322 (4.6305) time: 1.0767 data: 0.0003 max mem: 25529
Epoch: [38] [ 380/1251] eta: 0:15:46 lr: 0.000963 loss: 4.7535 (4.6310) time: 1.0725 data: 0.0003 max mem: 25529
Epoch: [38] [ 390/1251] eta: 0:15:35 lr: 0.000963 loss: 4.5236 (4.6247) time: 1.0746 data: 0.0003 max mem: 25529
Epoch: [38] [ 400/1251] eta: 0:15:24 lr: 0.000963 loss: 4.5129 (4.6280) time: 1.0783 data: 0.0003 max mem: 25529
Epoch: [38] [ 410/1251] eta: 0:15:13 lr: 0.000963 loss: 4.6520 (4.6250) time: 1.0803 data: 0.0003 max mem: 25529
Epoch: [38] [ 420/1251] eta: 0:15:02 lr: 0.000963 loss: 4.6115 (4.6235) time: 1.0841 data: 0.0003 max mem: 25529
Epoch: [38] [ 430/1251] eta: 0:14:51 lr: 0.000963 loss: 4.5550 (4.6176) time: 1.0788 data: 0.0003 max mem: 25529
Epoch: [38] [ 440/1251] eta: 0:14:40 lr: 0.000963 loss: 4.3985 (4.6097) time: 1.0745 data: 0.0003 max mem: 25529
Epoch: [38] [ 450/1251] eta: 0:14:29 lr: 0.000963 loss: 4.5041 (4.6144) time: 1.0711 data: 0.0004 max mem: 25529
Epoch: [38] [ 460/1251] eta: 0:14:18 lr: 0.000963 loss: 4.7949 (4.6127) time: 1.0769 data: 0.0003 max mem: 25529
Epoch: [38] [ 470/1251] eta: 0:14:07 lr: 0.000963 loss: 4.7556 (4.6148) time: 1.0773 data: 0.0003 max mem: 25529
Epoch: [38] [ 480/1251] eta: 0:13:56 lr: 0.000963 loss: 5.0523 (4.6200) time: 1.0845 data: 0.0003 max mem: 25529
Epoch: [38] [ 490/1251] eta: 0:13:45 lr: 0.000963 loss: 4.5865 (4.6152) time: 1.0781 data: 0.0003 max mem: 25529
Epoch: [38] [ 500/1251] eta: 0:13:34 lr: 0.000963 loss: 4.6311 (4.6210) time: 1.0776 data: 0.0003 max mem: 25529
Epoch: [38] [ 510/1251] eta: 0:13:23 lr: 0.000963 loss: 4.8767 (4.6208) time: 1.0855 data: 0.0003 max mem: 25529
Epoch: [38] [ 520/1251] eta: 0:13:13 lr: 0.000963 loss: 4.7439 (4.6204) time: 1.0891 data: 0.0003 max mem: 25529
Epoch: [38] [ 530/1251] eta: 0:13:02 lr: 0.000963 loss: 4.7974 (4.6190) time: 1.0813 data: 0.0003 max mem: 25529
Epoch: [38] [ 540/1251] eta: 0:12:51 lr: 0.000963 loss: 4.6865 (4.6171) time: 1.0676 data: 0.0003 max mem: 25529
Epoch: [38] [ 550/1251] eta: 0:12:40 lr: 0.000963 loss: 4.4560 (4.6144) time: 1.0727 data: 0.0003 max mem: 25529
Epoch: [38] [ 560/1251] eta: 0:12:29 lr: 0.000963 loss: 4.2302 (4.6069) time: 1.0761 data: 0.0003 max mem: 25529
Epoch: [38] [ 570/1251] eta: 0:12:18 lr: 0.000963 loss: 4.3246 (4.6080) time: 1.0741 data: 0.0003 max mem: 25529
Epoch: [38] [ 580/1251] eta: 0:12:07 lr: 0.000963 loss: 4.5513 (4.6052) time: 1.0661 data: 0.0003 max mem: 25529
Epoch: [38] [ 590/1251] eta: 0:11:56 lr: 0.000963 loss: 4.4924 (4.6075) time: 1.0740 data: 0.0003 max mem: 25529
Epoch: [38] [ 600/1251] eta: 0:11:45 lr: 0.000963 loss: 4.5949 (4.6052) time: 1.0817 data: 0.0003 max mem: 25529
Epoch: [38] [ 610/1251] eta: 0:11:34 lr: 0.000963 loss: 4.5321 (4.6035) time: 1.0638 data: 0.0003 max mem: 25529
Epoch: [38] [ 620/1251] eta: 0:11:23 lr: 0.000963 loss: 4.7689 (4.6075) time: 1.0604 data: 0.0003 max mem: 25529
Epoch: [38] [ 630/1251] eta: 0:11:12 lr: 0.000963 loss: 4.7689 (4.6088) time: 1.0649 data: 0.0003 max mem: 25529
Epoch: [38] [ 640/1251] eta: 0:11:01 lr: 0.000963 loss: 4.4721 (4.6039) time: 1.0580 data: 0.0003 max mem: 25529
Epoch: [38] [ 650/1251] eta: 0:10:50 lr: 0.000963 loss: 4.5410 (4.6067) time: 1.0654 data: 0.0003 max mem: 25529
Epoch: [38] [ 660/1251] eta: 0:10:39 lr: 0.000963 loss: 4.5659 (4.5996) time: 1.0689 data: 0.0003 max mem: 25529
Epoch: [38] [ 670/1251] eta: 0:10:28 lr: 0.000963 loss: 4.4456 (4.5999) time: 1.0727 data: 0.0003 max mem: 25529
Epoch: [38] [ 680/1251] eta: 0:10:17 lr: 0.000963 loss: 4.8766 (4.6035) time: 1.0818 data: 0.0003 max mem: 25529
Epoch: [38] [ 690/1251] eta: 0:10:06 lr: 0.000963 loss: 4.8766 (4.6041) time: 1.0854 data: 0.0003 max mem: 25529
Epoch: [38] [ 700/1251] eta: 0:09:55 lr: 0.000963 loss: 4.9327 (4.6104) time: 1.0805 data: 0.0003 max mem: 25529
Epoch: [38] [ 710/1251] eta: 0:09:44 lr: 0.000963 loss: 5.0049 (4.6129) time: 1.0702 data: 0.0003 max mem: 25529
Epoch: [38] [ 720/1251] eta: 0:09:34 lr: 0.000963 loss: 4.6922 (4.6117) time: 1.0673 data: 0.0003 max mem: 25529
Epoch: [38] [ 730/1251] eta: 0:09:23 lr: 0.000963 loss: 4.6331 (4.6107) time: 1.0810 data: 0.0003 max mem: 25529
Epoch: [38] [ 740/1251] eta: 0:09:12 lr: 0.000963 loss: 4.5547 (4.6111) time: 1.0795 data: 0.0003 max mem: 25529
Epoch: [38] [ 750/1251] eta: 0:09:01 lr: 0.000963 loss: 4.8843 (4.6181) time: 1.0719 data: 0.0003 max mem: 25529
Epoch: [38] [ 760/1251] eta: 0:08:50 lr: 0.000963 loss: 4.8843 (4.6160) time: 1.0851 data: 0.0003 max mem: 25529
Epoch: [38] [ 770/1251] eta: 0:08:40 lr: 0.000963 loss: 4.2934 (4.6119) time: 1.0840 data: 0.0003 max mem: 25529
Epoch: [38] [ 780/1251] eta: 0:08:29 lr: 0.000963 loss: 4.1930 (4.6087) time: 1.0784 data: 0.0003 max mem: 25529
Epoch: [38] [ 790/1251] eta: 0:08:18 lr: 0.000963 loss: 4.4176 (4.6073) time: 1.0748 data: 0.0003 max mem: 25529
Epoch: [38] [ 800/1251] eta: 0:08:07 lr: 0.000963 loss: 4.7402 (4.6115) time: 1.0681 data: 0.0003 max mem: 25529
Epoch: [38] [ 810/1251] eta: 0:07:56 lr: 0.000963 loss: 4.7749 (4.6094) time: 1.0713 data: 0.0003 max mem: 25529
Epoch: [38] [ 820/1251] eta: 0:07:45 lr: 0.000963 loss: 4.6709 (4.6079) time: 1.0732 data: 0.0003 max mem: 25529
Epoch: [38] [ 830/1251] eta: 0:07:34 lr: 0.000963 loss: 4.7506 (4.6088) time: 1.0641 data: 0.0003 max mem: 25529
Epoch: [38] [ 840/1251] eta: 0:07:23 lr: 0.000963 loss: 4.8636 (4.6112) time: 1.0592 data: 0.0003 max mem: 25529
Epoch: [38] [ 850/1251] eta: 0:07:13 lr: 0.000963 loss: 4.9930 (4.6116) time: 1.0767 data: 0.0003 max mem: 25529
Epoch: [38] [ 860/1251] eta: 0:07:02 lr: 0.000963 loss: 5.0639 (4.6155) time: 1.0766 data: 0.0003 max mem: 25529
Epoch: [38] [ 870/1251] eta: 0:06:51 lr: 0.000963 loss: 5.0486 (4.6160) time: 1.0683 data: 0.0003 max mem: 25529
Epoch: [38] [ 880/1251] eta: 0:06:40 lr: 0.000963 loss: 4.6785 (4.6145) time: 1.0654 data: 0.0003 max mem: 25529
Epoch: [38] [ 890/1251] eta: 0:06:29 lr: 0.000963 loss: 4.6382 (4.6126) time: 1.0603 data: 0.0003 max mem: 25529
Epoch: [38] [ 900/1251] eta: 0:06:18 lr: 0.000963 loss: 4.9989 (4.6179) time: 1.0642 data: 0.0003 max mem: 25529
Epoch: [38] [ 910/1251] eta: 0:06:08 lr: 0.000963 loss: 5.0227 (4.6205) time: 1.0740 data: 0.0003 max mem: 25529
Epoch: [38] [ 920/1251] eta: 0:05:57 lr: 0.000963 loss: 4.7505 (4.6198) time: 1.0733 data: 0.0003 max mem: 25529
Epoch: [38] [ 930/1251] eta: 0:05:46 lr: 0.000963 loss: 4.6593 (4.6196) time: 1.0636 data: 0.0003 max mem: 25529
Epoch: [38] [ 940/1251] eta: 0:05:35 lr: 0.000963 loss: 4.7349 (4.6184) time: 1.0697 data: 0.0003 max mem: 25529
Epoch: [38] [ 950/1251] eta: 0:05:24 lr: 0.000963 loss: 4.8424 (4.6185) time: 1.0741 data: 0.0003 max mem: 25529
Epoch: [38] [ 960/1251] eta: 0:05:13 lr: 0.000963 loss: 4.5308 (4.6170) time: 1.0704 data: 0.0003 max mem: 25529
Epoch: [38] [ 970/1251] eta: 0:05:03 lr: 0.000963 loss: 4.6764 (4.6186) time: 1.0749 data: 0.0003 max mem: 25529
Epoch: [38] [ 980/1251] eta: 0:04:52 lr: 0.000963 loss: 4.6764 (4.6176) time: 1.0768 data: 0.0004 max mem: 25529
Epoch: [38] [ 990/1251] eta: 0:04:41 lr: 0.000963 loss: 4.5145 (4.6176) time: 1.0677 data: 0.0004 max mem: 25529
Epoch: [38] [1000/1251] eta: 0:04:30 lr: 0.000963 loss: 4.5645 (4.6202) time: 1.0686 data: 0.0003 max mem: 25529
Epoch: [38] [1010/1251] eta: 0:04:19 lr: 0.000963 loss: 5.3548 (4.6373) time: 1.0613 data: 0.0003 max mem: 25529
Epoch: [38] [1020/1251] eta: 0:04:09 lr: 0.000963 loss: 6.9353 (4.6599) time: 1.0595 data: 0.0003 max mem: 25529
Epoch: [38] [1030/1251] eta: 0:03:58 lr: 0.000963 loss: 6.9423 (4.6820) time: 1.0729 data: 0.0003 max mem: 25529
Epoch: [38] [1040/1251] eta: 0:03:47 lr: 0.000963 loss: 6.9381 (4.7036) time: 1.0715 data: 0.0003 max mem: 25529
Epoch: [38] [1050/1251] eta: 0:03:36 lr: 0.000963 loss: 6.9351 (4.7248) time: 1.0717 data: 0.0003 max mem: 25529
Epoch: [38] [1060/1251] eta: 0:03:25 lr: 0.000963 loss: 6.9315 (4.7456) time: 1.0655 data: 0.0003 max mem: 25529
Epoch: [38] [1070/1251] eta: 0:03:15 lr: 0.000963 loss: 6.9319 (4.7660) time: 1.0609 data: 0.0003 max mem: 25529
Epoch: [38] [1080/1251] eta: 0:03:04 lr: 0.000963 loss: 6.9287 (4.7860) time: 1.0717 data: 0.0003 max mem: 25529
Epoch: [38] [1090/1251] eta: 0:02:53 lr: 0.000963 loss: 6.9198 (4.8055) time: 1.0834 data: 0.0003 max mem: 25529
Epoch: [38] [1100/1251] eta: 0:02:42 lr: 0.000963 loss: 6.9219 (4.8248) time: 1.0835 data: 0.0003 max mem: 25529
Epoch: [38] [1110/1251] eta: 0:02:32 lr: 0.000963 loss: 6.9286 (4.8437) time: 1.1036 data: 0.0003 max mem: 25529
Epoch: [38] [1120/1251] eta: 0:02:21 lr: 0.000963 loss: 6.9209 (4.8622) time: 1.0965 data: 0.0003 max mem: 25529
Epoch: [38] [1130/1251] eta: 0:02:10 lr: 0.000963 loss: 6.9212 (4.8804) time: 1.0701 data: 0.0003 max mem: 25529
Epoch: [38] [1140/1251] eta: 0:01:59 lr: 0.000963 loss: 6.9192 (4.8983) time: 1.0686 data: 0.0003 max mem: 25529
Epoch: [38] [1150/1251] eta: 0:01:48 lr: 0.000963 loss: 6.9192 (4.9159) time: 1.0640 data: 0.0003 max mem: 25529
Epoch: [38] [1160/1251] eta: 0:01:38 lr: 0.000963 loss: 6.9231 (4.9332) time: 1.0687 data: 0.0003 max mem: 25529
Epoch: [38] [1170/1251] eta: 0:01:27 lr: 0.000963 loss: 6.9241 (4.9502) time: 1.0702 data: 0.0003 max mem: 25529
Epoch: [38] [1180/1251] eta: 0:01:16 lr: 0.000963 loss: 6.9240 (4.9669) time: 1.0687 data: 0.0003 max mem: 25529
Epoch: [38] [1190/1251] eta: 0:01:05 lr: 0.000963 loss: 6.9198 (4.9833) time: 1.0668 data: 0.0003 max mem: 25529
Epoch: [38] [1200/1251] eta: 0:00:54 lr: 0.000963 loss: 6.9150 (4.9993) time: 1.0864 data: 0.0003 max mem: 25529
Epoch: [38] [1210/1251] eta: 0:00:44 lr: 0.000963 loss: 6.9144 (5.0152) time: 1.0855 data: 0.0003 max mem: 25529
Epoch: [38] [1220/1251] eta: 0:00:33 lr: 0.000963 loss: 6.9167 (5.0308) time: 1.0714 data: 0.0003 max mem: 25529
Epoch: [38] [1230/1251] eta: 0:00:22 lr: 0.000963 loss: 6.9167 (5.0461) time: 1.0702 data: 0.0003 max mem: 25529
Epoch: [38] [1240/1251] eta: 0:00:11 lr: 0.000963 loss: 6.9135 (5.0612) time: 1.0574 data: 0.0005 max mem: 25529
Epoch: [38] [1250/1251] eta: 0:00:01 lr: 0.000963 loss: 6.9179 (5.0760) time: 1.0532 data: 0.0004 max mem: 25529
Epoch: [38] Total time: 0:22:28 (1.0781 s / it)
Averaged stats: lr: 0.000963 loss: 6.9179 (5.0558)
Test: [ 0/261] eta: 0:31:19 loss: 6.8103 (6.8103) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 7.2018 data: 6.7932 max mem: 25529
Test: [ 10/261] eta: 0:04:17 loss: 6.9766 (6.9290) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 1.0263 data: 0.6262 max mem: 25529
Test: [ 20/261] eta: 0:02:56 loss: 6.9750 (6.9375) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 0.4103 data: 0.0066 max mem: 25529
Test: [ 30/261] eta: 0:02:25 loss: 6.9495 (6.9457) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 0.4091 data: 0.0024 max mem: 25529
Test: [ 40/261] eta: 0:02:06 loss: 6.9158 (6.9258) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.6352) time: 0.4017 data: 0.0010 max mem: 25529
Test: [ 50/261] eta: 0:01:53 loss: 6.8871 (6.9364) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.5106) time: 0.3975 data: 0.0007 max mem: 25529
Test: [ 60/261] eta: 0:01:43 loss: 6.9326 (6.9323) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.4269) time: 0.3969 data: 0.0007 max mem: 25529
Test: [ 70/261] eta: 0:01:35 loss: 6.8942 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3668) time: 0.3951 data: 0.0016 max mem: 25529
Test: [ 80/261] eta: 0:01:27 loss: 6.8974 (6.9259) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3215) time: 0.3954 data: 0.0025 max mem: 25529
Test: [ 90/261] eta: 0:01:21 loss: 6.9066 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2862) time: 0.3983 data: 0.0017 max mem: 25529
Test: [100/261] eta: 0:01:15 loss: 6.9556 (6.9323) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2578) time: 0.3960 data: 0.0009 max mem: 25529
Test: [110/261] eta: 0:01:09 loss: 6.9268 (6.9298) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2346) time: 0.3962 data: 0.0010 max mem: 25529
Test: [120/261] eta: 0:01:04 loss: 6.8970 (6.9270) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2152) time: 0.4211 data: 0.0242 max mem: 25529
Test: [130/261] eta: 0:00:59 loss: 6.8970 (6.9251) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.1988) time: 0.4183 data: 0.0242 max mem: 25529
Test: [140/261] eta: 0:00:54 loss: 6.9251 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3694) time: 0.3986 data: 0.0018 max mem: 25529
Test: [150/261] eta: 0:00:49 loss: 6.9534 (6.9264) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3449) time: 0.4021 data: 0.0045 max mem: 25529
Test: [160/261] eta: 0:00:45 loss: 6.8927 (6.9243) acc1: 0.0000 (0.1617) acc5: 0.0000 (0.4852) time: 0.4124 data: 0.0182 max mem: 25529
Test: [170/261] eta: 0:00:40 loss: 6.8886 (6.9231) acc1: 0.0000 (0.1523) acc5: 0.0000 (0.4569) time: 0.4112 data: 0.0157 max mem: 25529
Test: [180/261] eta: 0:00:35 loss: 6.9188 (6.9233) acc1: 0.0000 (0.1439) acc5: 0.0000 (0.4316) time: 0.3997 data: 0.0016 max mem: 25529
Test: [190/261] eta: 0:00:31 loss: 6.9170 (6.9216) acc1: 0.0000 (0.1363) acc5: 0.0000 (0.4090) time: 0.4233 data: 0.0265 max mem: 25529
Test: [200/261] eta: 0:00:26 loss: 6.9137 (6.9224) acc1: 0.0000 (0.1296) acc5: 0.0000 (0.3887) time: 0.4463 data: 0.0536 max mem: 25529
Test: [210/261] eta: 0:00:22 loss: 6.9097 (6.9210) acc1: 0.0000 (0.1234) acc5: 0.0000 (0.3703) time: 0.5000 data: 0.1046 max mem: 25529
Test: [220/261] eta: 0:00:18 loss: 6.8762 (6.9184) acc1: 0.0000 (0.1178) acc5: 0.0000 (0.3535) time: 0.4731 data: 0.0773 max mem: 25529
Test: [230/261] eta: 0:00:13 loss: 6.8775 (6.9185) acc1: 0.0000 (0.1127) acc5: 0.0000 (0.4509) time: 0.3974 data: 0.0048 max mem: 25529
Test: [240/261] eta: 0:00:09 loss: 6.9246 (6.9183) acc1: 0.0000 (0.1081) acc5: 0.0000 (0.4322) time: 0.4009 data: 0.0050 max mem: 25529
Test: [250/261] eta: 0:00:04 loss: 6.9132 (6.9190) acc1: 0.0000 (0.1038) acc5: 0.0000 (0.5188) time: 0.3949 data: 0.0010 max mem: 25529
Test: [260/261] eta: 0:00:00 loss: 6.9128 (6.9180) acc1: 0.0000 (0.1000) acc5: 0.0000 (0.5000) time: 0.3788 data: 0.0001 max mem: 25529
Test: Total time: 0:01:54 (0.4370 s / it)
Acc@1 0.100 Acc@5 0.500 loss 6.918
Accuracy of the network on the 50000 test images: 0.1%
Max accuracy: 57.01%

Repeated Key Improvements

The three key improvements of PVT2 are seemingly a direct copy from the next-door SegFormer.

torchinfo summary doesn't work with PVT

Hello,

While trying to test PVT with a simple prediction, following the default image size and channels (3, 224, 224) i get the error:

RuntimeError: The size of tensor a (49) must match the size of tensor b (784) at non-singleton dimension 1

The problem can be seen and reproduced in this Colab: https://colab.research.google.com/drive/1fmrReOUQEwRgi_U1Z-k5fTiX_AojGaoK?usp=sharing

Does training complete its processing from the last point ?

excuse me, Does training complete its processing from the last epoch if the machine is shut down suddenly or will start from the beginning

The img_size is always 224 for object detection task when using different training image size?

Since the number of pos_embed parameter depends on the img_size, so do you keep img_size=224 fixed when initiating PVT even when you are training larger images, e.g. RetinaNet with 800+ image size.

The input size problem

Thank you for your great work.The size of my picture is (256,832),how should I deal with it?Please tell me more details.thanks.thanks

how to train on single GPU?

semantic segmentation code

Hi,thaks for your excellent work!!!
I have read your paper named 'Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions', and want to apply it in my work in semantic segmentation, When will you make the semantic segmentation code and models public?

how can i load pickle file?

thanks for sharing the code .. i'm trying to load pickle file to read it using these commands

import pickle infile = open('data.pkl','rb') new_dict = pickle.load(infile) infile.close() print(type(new_dict))
but error is
_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
i searched for the solution but got that pickle file appears to be using advanced features that suggest it was never supposed to be directly loaded this way. can you help, please ?

About FLOPs calculation in Table 2

Hi Wenhai, thanks for this great work.

I have few questions about the FLOPs calculation in this paper. Previously I tested the DeiT models with ptflops, I got 2.51G, 9.20G, 35.13G FLOPs for DeiT-Tiny, DeiT-Small, DeiT-Base, respectively.

B.T.W I also included the matrix multiplications in the self-attention layer, namely q @ k and attn @ v. I assume there is something wrong with my calculation, may I know how do you calculate FLOPs for your experiments?

Thanks.

Training Logs For ImageNet

Would you provide the training_log for imagenet training? Thanks a lot,

处理不定长图片的输入

大神，您好，这个算法可以处理不定长的图片输入吗？ @whai362 @implus @xieenze

Mask R-CNN configs

Hi, thank you for your great work! Recently we would like to compare your model with ours on the Mask R-CNN results. I wonder if you can provide some configs for Mask R-CNN settings? Thanks!

More results for Linear SRA?

Hi, it seems Linear SRA works better with fewer params on PVT-V2-B2. Could you please show more results when applying this Attention to other model variants?

some warnings

hi, when i tried to train the model by myself, lots of warning appeared.
UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
although these warnings seems no harm when training the model

How to decide the learning rate?

I found the default lr for mask_rcnn and retinanet are 0.02 and 0.01, respectively. But for retinanet_pvt and mask_rcnn_pvt, the lr are both 0.0001 with AdamW. So how to decide the learning rate, does it depends on the optimizer type or the backbone structure itself? Any advice?

Grad strides do not match bucket view strides

hi when i use it as backbone to train a classifier, i meet the problem 'Grad strides do not match bucket view strides'. it seems transpose causes the gradients stride wrong. Need to be contigous.

Question about pretrained model

Hi, thanks for sharing of your great work. If I change a few layers of your network structure, do I need to retrain on ImageNet to get the pretrained model? Did you compare the performance of models without pretraining?

training didn't complete after epoch4

i used this command for training python train.py configs/retinanet_pvt_s_fpn_1x_coco_640.py but got killed here
Loading and preparing results... DONE (t=24.08s) creating index... index created! Running per image evaluation... Evaluate annotation type *bbox* Killed
i tried to use resume.sh to complete the process for the last point by using this ./dist_resume.sh /media/user/use/PVT-main/detection/work_dirs/retinanet_pvt_s_fpn_1x_coco_640/epoch_4.pth 1 /media/user/use/PVT-main/detection/checkpoint_root --data-path /media/user/use/PVT-main/coco/ but got
Creating model: 1 Traceback (most recent call last): File "main.py", line 442, in <module> main(args) File "main.py", line 251, in main drop_block_rate=None, File "/home/user/anaconda3/lib/python3.7/site-packages/timm/models/factory.py", line 59, in create_model raise RuntimeError('Unknown model (%s)' % model_name) RuntimeError: Unknown model (1) Traceback (most recent call last): File "/home/user/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/home/user/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals) File "/home/user/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module> main() File "/home/user/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main cmd=cmd) subprocess.CalledProcessError: Command '['/home/user/anaconda3/bin/python', '-u', 'main.py', '--model', '1', '--batch-size', '64', '--epochs', '300', '--data-path', '/media/user/use/PVT-main/images', '--output_dir', '/media/user/use/PVT-main/images', '--resume', '/media/user/use/PVT-main/images/checkpoint.pth', '--output_dir', '/media/user/use/PVT-main/output', '--resume', '/media/user/use/PVT-main/detection/work_dirs/retinanet_pvt_s_fpn_1x_coco_640/epoch_3.pth']' returned non-zero exit status 1.

Did I write or use resume.sh in a false way ?

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.