whai362 / PVT
Official implementation of PVT series
License: Apache License 2.0
if 'model' in checkpoint:
    model_without_ddp.load_state_dict(checkpoint['model'])
else:
    model_without_ddp.load_state_dict(checkpoint)
File "main.py", line 271, in main
model_without_ddp.load_state_dict(checkpoint)
UnboundLocalError: local variable 'model_without_ddp' referenced before assignment
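A minimal sketch of the likely fix (my reading: the UnboundLocalError means the resume branch ran before model_without_ddp was assigned, e.g. because model creation failed earlier). This assumes timm is installed and pvt.py from this repo is importable; file names are placeholders:

import torch
from timm.models import create_model
import pvt  # noqa: F401  # registers the pvt_* architectures with timm

model = create_model('pvt_small', pretrained=False)
model_without_ddp = model  # must be assigned before any resume logic runs

checkpoint = torch.load('checkpoint.pth', map_location='cpu')
state_dict = checkpoint['model'] if 'model' in checkpoint else checkpoint
model_without_ddp.load_state_dict(state_dict)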
Do you use NMS and anchors in the RetinaNet-PVT variant?
Excuse me, I have the same issue: I couldn't read the .pkl files I got. Could you please explain how the pickle file was created? Does the file include a load-persistent-ID instruction, or does it contain references to data outside the pickle file?
Thanks a lot for your time
Hi, I found that in #1 an example is provided to show the complexity of ViT models (using get_vit_flops()). However, since PVT has multiple scales, I wonder if there is a tool provided to measure the FLOPs for PVT? Thanks!
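For what it's worth, a generic counter such as ptflops can be pointed at PVT directly. A hedged sketch (ptflops is a third-party tool, not something this repo ships, and it may under-count the attention matmuls):

from ptflops import get_model_complexity_info  # third-party: pip install ptflops
import pvt  # assumes pvt.py from this repo is importable

model = pvt.pvt_small()
macs, params = get_model_complexity_info(
    model, (3, 224, 224), as_strings=True, print_per_layer_stat=False)
print(f'MACs: {macs}, params: {params}')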
Hi, I want to use the mcloader setting, but it fails at 'import mc'. Could you please tell me what I should do to solve this problem?
https://github.com/whai362/PVT/blob/main/pvt.py
https://github.com/whai362/PVT/blob/main/detection/pvt.py
2- Do you use 12 epochs for detection and 300 epochs for classification?
RetinaNet uses strides=[8, 16, 32, 64, 128]; the first stride is 8, but the first stride for PVT is 4 according to the paper. Have you tried the same strides as RetinaNet for PVT?
Hi @whai362,
Thanks for the very nice work. I trained your model following your instructions, but I am unable to reproduce your RetinaNet-Small 640 result on COCO (38.7 AP). Here are my results:
Do you have any suggestions to reproduce your results on COCO? Thanks!!
Hi, I found that in #1 an example is provided to show the complexity of ViT models (using get_vit_flops()). However, since PVT has multiple scales, I wonder if there is a tool provided to measure the FLOPs for PVT? Thanks!
It seems a duplicate post is created. I will close this one.
What is the expected content of the pickle file if I want to run detection on an image: only the objects and their boundaries? And where can I find the pickle class you used? I searched but couldn't find it. Thanks for your time.
Hello~, I am very interested in your work. I ran into some problems when loading the pretrained model:
checkpoint = torch.load(args.finetune, map_location='cpu')
debug:
pos_embed_checkpoint = checkpoint_model['pos_embed']
The checkpoint has "pos_embed1", "pos_embed2", "pos_embed3", and "pos_embed4", but no "pos_embed".
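A hedged sketch of how a DeiT-style finetuning script could be adapted: PVT keeps one positional embedding per stage, so the single-'pos_embed' interpolation has to loop over pos_embed1..pos_embed4 instead. The square-grid assumption below is mine, and PVT v1's last stage may carry an extra class token that needs separate handling:

import torch.nn.functional as F

def interpolate_pvt_pos_embeds(checkpoint_model, model):
    """Resize each per-stage pos_embed{i} to the target model's token grid."""
    for i in range(1, 5):
        key = f'pos_embed{i}'
        if key not in checkpoint_model:
            continue
        pe = checkpoint_model[key]                    # shape (1, N, C)
        n_target = model.state_dict()[key].shape[1]
        if pe.shape[1] == n_target:
            continue
        # assume square token grids; split off a class token first if present
        size_old = int(pe.shape[1] ** 0.5)
        size_new = int(n_target ** 0.5)
        pe = pe.reshape(1, size_old, size_old, -1).permute(0, 3, 1, 2)
        pe = F.interpolate(pe, size=(size_new, size_new),
                           mode='bicubic', align_corners=False)
        checkpoint_model[key] = pe.permute(0, 2, 3, 1).reshape(1, size_new * size_new, -1)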
Hi, great work!
It seems like this code only supports classification tasks; when will the RetinaNet-PVT code be available?
How can we visualize the trained models? Is there any demo?
Thanks
1- Which pre-trained model should I use for object detection?
https://drive.google.com/file/d/1L5wh2rYsVnuC_CEeFE6yMhU1kENt2gnk/view?usp=sharing
or
https://drive.google.com/file/d/1vtcyoU8KUqNzktlMGXZrYcMRsNNiVZFQ/view?usp=sharing
I got this error:
raise type(e)(f'{obj_cls.__name__}: {e}') KeyError: "RetinaNet: 'pvt_small is not in the backbone registry'"
when I ran python train.py configs/retinanet_pvt_s_fpn_1x_coco_640.py
Thanks for your great work. While reading your code, I noticed that in main.py, line 379,
model_without_ddp.reset_drop_path(0.0)
you manually set the drop path rate to 0 instead of using the parser argument.
I'd like to know whether this was done intentionally for the classification task, since many related works set the drop path rate to 0.1.
So there are two questions,
Thanks a lot!
When I load the pretrained weights of Sparse R-CNN with PVT-b2 backbone, it showed this error: "RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory". I believe the file was not saved correctly.
I got these lines when I tried to train images with this command
./dist_train.sh configs/retinanet_pvt_s_fpn_1x_coco_640.py 8
for the object detection task:
WARNING - The model and loaded state dict do not match exactly unexpected key in source state_dict: cls_token, norm.weight, norm.bias, head.weight, head.bias
Is that a normal warning? (See the sketch below.)
2- Does the weight file for detection here https://drive.google.com/file/d/1L5wh2rYsVnuC_CEeFE6yMhU1kENt2gnk/view?usp=sharing
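Regarding the warning in question 1: it is generally benign, since cls_token, norm.*, and head.* belong only to the ImageNet classification head. A hedged illustration of what non-strict loading does with such keys; 'backbone' here is a stand-in for the constructed PVT backbone and the file name is a placeholder:

import torch
import pvt  # assumes pvt.py from this repo is importable

backbone = pvt.pvt_small()  # stand-in for the detector's backbone
ckpt = torch.load('pvt_small.pth', map_location='cpu')
missing, unexpected = backbone.load_state_dict(ckpt, strict=False)
print('ignored keys:', unexpected)  # classification-only keys end up here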
Hi there, thanks for sharing.
I found two problems:
1. Can you please update the model and config for DETR+PVT?
2. I get an error when I try to train with this config.
Hello, can the program plot the confusion matrix? That way I could know the accuracy for each category. Also, can I test a few random images to see the recognition results of the trained model? Thank you.
In the paper, the training batch size on ImageNet is 128 (I assume it is the entire training batch, e.g. 128 = 8 * 16 (8 GPUs with 16 images each) ).
However, dist_train.py uses 128 per GPU, which means the entire batch size is 128 * 8 = 1024. I wonder which one is the correct setting.
Thanks a lot,
Problems loading pretrained models with PyTorch versions below 1.6
PyTorch 1.6 switched torch.save to a zip-file-based format by default, rather than the old pickle-based format. This means PyTorch versions below 1.6 cannot load the pretrained models at all.
Could you pass "_use_new_zipfile_serialization=False" when calling torch.save(), e.g. torch.save(m.state_dict(), 'mymodel.pt', _use_new_zipfile_serialization=False), and provide another version of the pretrained models?
Thanks a lot!!!!
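Until legacy files are provided, a one-off conversion is possible in any PyTorch >= 1.6 environment (a minimal sketch; file names are placeholders):

import torch

# load the zip-format checkpoint with PyTorch >= 1.6 ...
ckpt = torch.load('pvt_small.pth', map_location='cpu')
# ... and re-save it in the old pickle-based format for PyTorch < 1.6
torch.save(ckpt, 'pvt_small_legacy.pth', _use_new_zipfile_serialization=False)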
I ran your main.py. I'm confused about what this class does: it gives me the accuracy and loss for 500 epochs, right? And when I tried to train my images with this command
'dist_train.sh configs/retinanet_pvt_s_fpn_1x_coco_640.py 1'
I got an error that small_pvt.pth was not found. Excuse me, is that file the weights or a checkpoint?
Is the small_pvt.pth here
https://drive.google.com/file/d/1vtcyoU8KUqNzktlMGXZrYcMRsNNiVZFQ/view?usp=sharing
for ImageNet? But how can I get a .pth file if the dataset is different? I'd appreciate your reply. Thanks.
I was trying to test the model with this command
python test.py /home/user/Desktop/PVT-main/detection/configs/retinanet_pvt_s_fpn_1x_coco_640.py /home/user/Desktop/PVT-main/pretrained/pvt_small.pth --show-dir /home/user/Desktop/results.pkl
and got this warning; this is only part of it, as it runs to more than 10 lines:
unexpected key in source state_dict: pos_embed1, pos_embed2, pos_embed3, pos_embed4, cls_token, patch_embed1.proj.weight, patch_embed1.proj.bias, patch_embed1.norm.weight, patch_embed1.norm.bias, patch_embed2.proj.weight, patch_embed2.proj.bias, patch_embed2.norm.weight, patch_embed2.norm.bias, patch_embed3.proj.weight,
and then this error:
Traceback (most recent call last):
File "test.py", line 213, in <module>
main()
File "test.py", line 175, in main
if 'CLASSES' in checkpoint['meta']:
KeyError: 'meta'
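A hedged guess at the cause: test.py expects an mmdet detector checkpoint (which stores a 'meta' dict holding CLASSES), but it was given the ImageNet-pretrained backbone weights (pvt_small.pth). Testing a checkpoint produced by detection training should work; alternatively, the raw weights can be wrapped into the layout mmdet expects (a sketch; file names are placeholders):

import torch

raw = torch.load('pvt_small.pth', map_location='cpu')
wrapped = {'state_dict': raw, 'meta': {}}  # with no CLASSES, mmdet typically falls back to the dataset's classes
torch.save(wrapped, 'pvt_small_wrapped.pth')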
Thanks for your great work. But when I trained PVT-Large (pvt_large) with your default settings, the model didn't converge. The loss declined correctly for the first 37 epochs and the accuracy reached 57%, but the model went wrong at the 38th epoch. I used your code without any change. What's the problem? Thank you!
Below is a part of my training log.
Test: Total time: 0:01:55 (0.4429 s / it)
The three key improvements of PVTv2 are seemingly a direct copy from the next-door SegFormer.
Hello,
While trying to test PVT with a simple prediction, using the default image size and channels (3, 224, 224), I get the error:
RuntimeError: The size of tensor a (49) must match the size of tensor b (784) at non-singleton dimension 1
The problem can be seen and reproduced in this Colab: https://colab.research.google.com/drive/1fmrReOUQEwRgi_U1Z-k5fTiX_AojGaoK?usp=sharing
Excuse me, if the machine shuts down suddenly, does training resume from the last epoch, or does it start from the beginning?
Since the number of pos_embed parameters depends on img_size, do you keep img_size=224 fixed when initializing PVT even when training on larger images, e.g. RetinaNet with 800+ image size?
Thank you for your great work. The size of my pictures is (256, 832); how should I deal with this? Please give me more details. Thanks.
Hi, thanks for your excellent work!
I have read your paper 'Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions' and want to apply it to my work in semantic segmentation. When will you make the semantic segmentation code and models public?
Thanks for sharing the code. I'm trying to load and read a pickle file using these commands:
import pickle
infile = open('data.pkl', 'rb')
new_dict = pickle.load(infile)
infile.close()
print(type(new_dict))
but the error is:
_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
I searched for a solution and found that the pickle file appears to use advanced features, which suggests it was never meant to be loaded directly this way. Can you help, please?
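For reference, that "load persistent id" error is the signature of a torch-serialized file being opened with the plain pickle module; torch.load handles those records (a hedged sketch; the file name is a placeholder):

import torch

# torch checkpoints embed persistent-id records for tensor storages,
# so they must be opened with torch.load rather than pickle.load
data = torch.load('data.pkl', map_location='cpu')
print(type(data))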
Hi Wenhai, thanks for this great work.
I have a few questions about the FLOPs calculation in this paper. Previously I tested the DeiT models with ptflops and got 2.51G, 9.20G, and 35.13G FLOPs for DeiT-Tiny, DeiT-Small, and DeiT-Base, respectively.
By the way, I also included the matrix multiplications in the self-attention layer, namely q @ k and attn @ v. I assume there is something wrong with my calculation; may I know how you calculate FLOPs for your experiments?
Thanks.
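For comparison, a back-of-the-envelope accounting of the attention matmuls (this counting convention is my own assumption, not the paper's official counter):

def attn_matmul_flops(n: int, d: int, h: int) -> int:
    """FLOPs of q @ k^T and attn @ v for sequence length n, head dim d, h heads."""
    qk = h * n * n * d    # q @ k^T
    av = h * n * n * d    # attn @ v
    return 2 * (qk + av)  # x2: count multiply and add separately

# DeiT-Small at 224x224: n = 197 tokens, d = 64, h = 6, 12 layers -> ~0.72 GFLOPs
print(12 * attn_matmul_flops(197, 64, 6) / 1e9)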
Would you provide the training log for the ImageNet training? Thanks a lot.
Hi, thank you for your great work! Recently we would like to compare your model with ours on the Mask R-CNN results. I wonder if you can provide some configs for Mask R-CNN settings? Thanks!
Hi, it seems Linear SRA works better with fewer params on PVT-V2-B2. Could you please show more results when applying this Attention to other model variants?
Hi, when I tried to train the model myself, lots of warnings appeared:
UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
although these warnings seem to do no harm when training the model.
I found the default lr for Mask R-CNN and RetinaNet is 0.02 and 0.01, respectively. But for retinanet_pvt and mask_rcnn_pvt, the lr is 0.0001 with AdamW in both cases. So how is the learning rate decided: does it depend on the optimizer type or on the backbone structure itself? Any advice?
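For context, a hedged sketch of the usual pattern in mmdet-style configs (the values mirror common transformer-backbone settings and are assumptions, not quoted from this repo's files): SGD detector baselines keep lr around 0.01-0.02, while AdamW runs typically drop to ~1e-4, so the scale follows the optimizer family more than the backbone.

# mmdet-style config fragment (sketch)
optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)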
Hi, when I use it as a backbone to train a classifier, I run into the problem 'Grad strides do not match bucket view strides'. It seems the transpose makes the gradient strides wrong; they need to be contiguous.
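A common workaround for this DDP error is to make the transposed activation contiguous before it leaves the module (a hedged sketch; the exact line depends on where the transpose happens in your fork):

import torch

x = torch.randn(2, 64, 196, requires_grad=True)
# after a transpose the tensor is non-contiguous, which can trip
# DDP's gradient bucketing; .contiguous() materializes a dense copy
y = x.transpose(1, 2).contiguous()
print(y.is_contiguous())  # True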
Hi, thanks for sharing your great work. If I change a few layers of your network structure, do I need to retrain on ImageNet to get the pretrained model? Did you compare the performance of models without pretraining?
I used this command for training:
python train.py configs/retinanet_pvt_s_fpn_1x_coco_640.py
but the process was killed here:
Loading and preparing results...
DONE (t=24.08s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
Killed
I tried to use resume.sh to resume the process from the last checkpoint, using this:
./dist_resume.sh /media/user/use/PVT-main/detection/work_dirs/retinanet_pvt_s_fpn_1x_coco_640/epoch_4.pth 1 /media/user/use/PVT-main/detection/checkpoint_root --data-path /media/user/use/PVT-main/coco/
but got
Creating model: 1
Traceback (most recent call last):
  File "main.py", line 442, in <module>
    main(args)
  File "main.py", line 251, in main
    drop_block_rate=None,
  File "/home/user/anaconda3/lib/python3.7/site-packages/timm/models/factory.py", line 59, in create_model
    raise RuntimeError('Unknown model (%s)' % model_name)
RuntimeError: Unknown model (1)
Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/user/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/user/anaconda3/bin/python', '-u', 'main.py', '--model', '1', '--batch-size', '64', '--epochs', '300', '--data-path', '/media/user/use/PVT-main/images', '--output_dir', '/media/user/use/PVT-main/images', '--resume', '/media/user/use/PVT-main/images/checkpoint.pth', '--output_dir', '/media/user/use/PVT-main/output', '--resume', '/media/user/use/PVT-main/detection/work_dirs/retinanet_pvt_s_fpn_1x_coco_640/epoch_3.pth']' returned non-zero exit status 1.
Did I use resume.sh incorrectly?