wjf5203 / seqformer
SeqFormer: Sequential Transformer for Video Instance Segmentation (ECCV 2022 Oral)
License: Other
The T=1 pretrained model files do not contain the time_attention_weights weights.
Not an issue, more of a question: what are the GPU memory requirements of this model?
Thank you 🙂
I would like an explanation of how the algorithm could be run on a dataset whose labels differ from those in your training dataset.
Following the README.md, the ytvis dataset folder will be in the root directory of this repository. Executing the inference.py script in that same directory causes an error, because the script expects ytvis to be in a parent directory:
Line 108 in edbfba4
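Until the path is fixed upstream, one possible workaround (an assumption, not an official fix) is to symlink the dataset one level up so the relative lookup resolves without moving any data:

```shell
# Hypothetical workaround: expose ytvis in the parent directory, so a
# relative "../ytvis" lookup in inference.py resolves from the repo root.
ln -s "$(pwd)/ytvis" ../ytvis
```

The alternative is editing the hard-coded path in inference.py itself, but a symlink leaves the repository untouched.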
I am using the SeqFormer you provided and it works well; thank you. However, when I run the test code and upload the resulting results.zip to the CodaLab server, the reported performance is 0.0. The following is the command I used for the test:
python3 inference.py --masks --backbone resnet50 --model_path weights/r50_weight.pth --save_path results.json
I used the SeqFormer_ablation .pth file downloaded from the model zoo.
If I'm doing something wrong, please let me know. Thank you!
Not an issue, just asking about hardware requirements.
I am following your work to do VIS research, but I ran into GPU memory limitations.
First, I ran SeqFormer/models/ops/test.py, and after a few seconds GPU memory was exhausted.
Then I ran inference.py; everything went well at first, but when processing the 295th video, GPU memory ran out again.
My machine is an NVIDIA TITAN Xp. Can you tell me how much GPU RAM is required during inference and when running SeqFormer/models/ops/test.py?
Congratulations on the awesome work.
I am trying to reproduce the results for the ResNet-50 backbone.
I tried the following, but I am still not able to reproduce the reported numbers.
Can you please help me out with this?
Thanks,
It seems the pretrained weights of the Swin variants ['swin_t_p4w7', 'swin_s_p4w7', 'swin_b_p4w7', 'swin_l_p4w7', 'swin_l_p4w12'] are not provided. Could you kindly release these pretrained weights? Thanks. :)
I tried to run python inference.py, but I only got a JSON file.
Impressive work on VIS. I ran into problems in the evaluation phase; any ideas are welcome.
It seems the released R50 pretrained model cannot be used directly to evaluate on the YouTube-VIS dataset, since its class head may have been trained on COCO.
After aligning the model's class-head output dimensionality with the released checkpoint, inference still fails with one issue, and I am not sure how to configure the code to address it.
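When a checkpoint's class head was trained on a different label set (e.g. COCO's categories vs. YouTube-VIS's 40 classes), a common workaround is to skip any checkpoint tensors whose shapes disagree with the target model before loading. A minimal PyTorch sketch — `load_compatible` is a hypothetical helper, not part of this repo:

```python
import torch

def load_compatible(model, ckpt_path):
    """Load a checkpoint, skipping tensors whose shapes differ from the
    model's (e.g. a class head sized for a different number of classes)."""
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("model", state)  # unwrap {"model": ...} checkpoints
    model_state = model.state_dict()
    filtered = {k: v for k, v in state.items()
                if k in model_state and v.shape == model_state[k].shape}
    # strict=False tolerates the keys we dropped above.
    model.load_state_dict(filtered, strict=False)
```

Bear in mind that any skipped head then keeps its random initialization, so its predictions are meaningless until it is retrained or replaced.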
I guess that "--pretrain_weights weights/r50_weight.pth" refers to the weights pretrained on COCO, but I cannot find them in your repo. Could you upload your weights? Thanks.
Hi, thank you for your interesting work! I was trying to run your code, but I hit OOM when training SeqFormer_swin_L on YouTube-VIS 2019 and COCO with your given script and command. I use 2 nodes, each with 8 V100 cards. Did I do something wrong?
Hi, thanks for your good work. I want to know the performance only using the original DETR instead of the improved Deformable DETR for a fair comparison with IFC paper.
Frustratingly Simple Few-Shot Object Detection
Frustratingly Simple Domain Generalization via Image Stylization
I'm just wondering what the meaning of 'Frustratingly' is...
Hi Junfeng,
Thanks for your excellent work! I ran into a problem when training SeqFormer on YouTube-VIS 2019 and COCO 2017 jointly. Here is the error information.
Traceback (most recent call last):
  File "main.py", line 331, in <module>
    main(args)
  File "main.py", line 278, in main
    model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
  File "/data/liangzhiyuan/projects/SeqFormer/engine.py", line 48, in train_one_epoch
    outputs, loss_dict = model(samples, targets, criterion, train=True)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/liangzhiyuan/projects/SeqFormer/models/segmentation.py", line 166, in forward
    indices = criterion.matcher(outputs_layer, gt_targets, self.detr.num_frames, valid_ratios)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/liangzhiyuan/projects/SeqFormer/models/matcher.py", line 113, in forward
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
  File "/data/liangzhiyuan/projects/SeqFormer/models/matcher.py", line 113, in <listcomp>
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
  File "/usr/local/lib/python3.6/dist-packages/scipy/optimize/_lsap.py", line 93, in linear_sum_assignment
    raise ValueError("matrix contains invalid numeric entries")
ValueError: matrix contains invalid numeric entries
It seems that some values of C are NaN or Inf. Did you encounter this problem during training? BTW, training on just the YouTube-VIS 2019 dataset works well in my setting.
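For debugging, the crash can be sidestepped by sanitizing the cost matrix before the Hungarian solver. This is only a diagnostic sketch (the real fix is finding why C contains NaN/Inf, e.g. fp16 overflow or degenerate targets); `safe_assignment` is a hypothetical helper, not code from this repo:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def safe_assignment(cost, big=1e8):
    # Replace NaN/Inf with large finite costs so linear_sum_assignment
    # does not raise "matrix contains invalid numeric entries".
    cost = np.nan_to_num(np.asarray(cost, dtype=np.float64),
                         nan=big, posinf=big, neginf=-big)
    return linear_sum_assignment(cost)
```

An `assert torch.isfinite(C).all()` right before the matcher call would also pinpoint which loss term produces the invalid values.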
How can we run inference? I'm getting size-mismatch errors. I downloaded the pre-trained R50 weights from the README.md and set the backbone to resnet50, but the weights fail to load with a size mismatch. Can someone help? My command is below:
python3 inference.py --masks --backbone resnet50 --model_path ~/SeqFormer/r50_weight.pth --save_path results.json
First of all, congratulations on the nice work!
I wanted to ask why the number of classes is 42 when YouTube-VIS only has 40 classes. One extra class is used for the background, but what about the other one?
I also don't understand why you include a background class at all if you use focal loss. The original Deformable DETR focal-loss implementation ignores the background, since background is implicit in all per-class sigmoid probabilities being < 0.5.
Thanks a lot for your help!
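On the focal-loss point above: with per-class sigmoid outputs there is no explicit background logit, because "background" is simply a target vector of all zeros. A minimal RetinaNet-style sketch of that formulation (an illustration, not this repo's exact implementation):

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Per-class sigmoid focal loss. `targets` is a 0/1 multi-hot tensor
    the same shape as `logits`; a background query is an all-zeros row,
    so no extra background class slot is needed."""
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = prob * targets + (1 - prob) * (1 - targets)      # prob of the true label
    loss = ce * ((1 - p_t) ** gamma)                       # down-weight easy examples
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * loss).mean()
```

Under this formulation a background row contributes loss only through the (1 - targets) terms, which is exactly the "all sigmoids < 0.5" behavior described above.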
How many epochs does pretraining on the COCO dataset take per model, and how long does it run?
Thanks!
Please mention the Python version and the pycocotools version.