wjf5203 / seqformer
SeqFormer: Sequential Transformer for Video Instance Segmentation (ECCV 2022 Oral)
License: Other
The T=1 pretrained model files do not contain the time_attention_weights weights.
Not an issue, more of a question: what are the GPU memory requirements of this model?
Thank you 🙂
I would like an explanation of how the algorithm could be run on a dataset whose labels differ from those in your training dataset.
Following the README.md, the ytvis dataset folder will be in the root directory of this repository. Executing the inference.py script in that same directory causes an error, because the script expects ytvis to be in a parent directory:
Line 108 in edbfba4
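Until the path is fixed upstream, one possible workaround (an assumption, not an official fix) is to symlink the dataset one level up so the relative lookup resolves without moving any data:

```shell
# Hypothetical workaround: expose ytvis in the parent directory, so a
# relative "../ytvis" lookup in inference.py resolves from the repo root.
ln -s "$(pwd)/ytvis" ../ytvis
```

The alternative is editing the hard-coded path in inference.py itself, but a symlink leaves the repository untouched.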
I am using the SeqFormer you provided and it works well; thank you. However, when I run the test code and upload the resulting results.zip to the CodaLab server, the reported performance is 0.0. The following is the command I used for the test:
python3 inference.py --masks --backbone resnet50 --model_path weights/r50_weight.pth --save_path results.json
I used the SeqFormer_ablation .pth file downloaded from the model zoo.
If I'm doing something wrong, please let me know. Thank you!
Not an issue, just asking about hardware requirements.
I am following your work to do VIS research, but I ran into GPU memory limitations.
First, I ran SeqFormer/models/ops/test.py, and after a few seconds GPU memory was exhausted.
Then I ran inference.py; everything went well at first, but when processing the 295th video, GPU memory ran out again.
My machine is an NVIDIA TITAN Xp. Can you tell me how much GPU RAM is required during inference and when running SeqFormer/models/ops/test.py?
Congratulations on the awesome work.
I am trying to reproduce the results for the ResNet-50 backbone.
I tried the following, but I am still not able to reproduce the reported numbers.
Can you please help me out with this?
Thanks,
It seems the pretrained weights of the Swin variants ['swin_t_p4w7', 'swin_s_p4w7', 'swin_b_p4w7', 'swin_l_p4w7', 'swin_l_p4w12'] are not provided. Could you kindly release these pretrained weights? Thanks. :)
I tried to run python inference.py, but I only got a JSON file.
Impressive work on VIS. I ran into problems in the evaluation phase; any ideas are welcome.
It seems the released R50 pretrained model cannot be used directly to evaluate on the YouTube-VIS dataset, since its class head may have been trained on COCO.
After aligning the model's class-head output dimensionality with the released checkpoint, inference still fails with one issue, and I am not sure how to configure the code to address it.
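When a checkpoint's class head was trained on a different label set (e.g. COCO's categories vs. YouTube-VIS's 40 classes), a common workaround is to skip any checkpoint tensors whose shapes disagree with the target model before loading. A minimal PyTorch sketch — `load_compatible` is a hypothetical helper, not part of this repo:

```python
import torch

def load_compatible(model, ckpt_path):
    """Load a checkpoint, skipping tensors whose shapes differ from the
    model's (e.g. a class head sized for a different number of classes)."""
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("model", state)  # unwrap {"model": ...} checkpoints
    model_state = model.state_dict()
    filtered = {k: v for k, v in state.items()
                if k in model_state and v.shape == model_state[k].shape}
    # strict=False tolerates the keys we dropped above.
    model.load_state_dict(filtered, strict=False)
```

Bear in mind that any skipped head then keeps its random initialization, so its predictions are meaningless until it is retrained or replaced.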
I guess that "--pretrain_weights weights/r50_weight.pth" refers to the weights pretrained on COCO, but I cannot find them in your repo. Could you upload your weights? Thanks.
Hi, thank you for your interesting work! I was trying to run your code, but I hit OOM when training SeqFormer_swin_L on YouTube-VIS 2019 and COCO with your given script and command. I use 2 nodes, each with 8 V100 cards. Did I do something wrong?
Hi, thanks for your good work. I want to know the performance only using the original DETR instead of the improved Deformable DETR for a fair comparison with IFC paper.
Frustratingly Simple Few-Shot Object Detection
Frustratingly Simple Domain Generalization via Image Stylization
I'm just wondering what the meaning of 'Frustratingly' is...
Hi Junfeng,
Thanks for your excellent work! I ran into a problem when training SeqFormer on YouTube-VIS 2019 and COCO 2017 jointly. Here is the error information.
Traceback (most recent call last):
  File "main.py", line 331, in <module>
    main(args)
  File "main.py", line 278, in main
    model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
  File "/data/liangzhiyuan/projects/SeqFormer/engine.py", line 48, in train_one_epoch
    outputs, loss_dict = model(samples, targets, criterion, train=True)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/liangzhiyuan/projects/SeqFormer/models/segmentation.py", line 166, in forward
    indices = criterion.matcher(outputs_layer, gt_targets, self.detr.num_frames, valid_ratios)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/liangzhiyuan/projects/SeqFormer/models/matcher.py", line 113, in forward
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
  File "/data/liangzhiyuan/projects/SeqFormer/models/matcher.py", line 113, in <listcomp>
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
  File "/usr/local/lib/python3.6/dist-packages/scipy/optimize/_lsap.py", line 93, in linear_sum_assignment
    raise ValueError("matrix contains invalid numeric entries")
ValueError: matrix contains invalid numeric entries
It seems that some values of C are NaN or Inf. Did you encounter this problem during training? BTW, training on just the YouTube-VIS 2019 dataset works well in my setting.
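For debugging, the crash can be sidestepped by sanitizing the cost matrix before the Hungarian solver. This is only a diagnostic sketch (the real fix is finding why C contains NaN/Inf, e.g. fp16 overflow or degenerate targets); `safe_assignment` is a hypothetical helper, not code from this repo:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def safe_assignment(cost, big=1e8):
    # Replace NaN/Inf with large finite costs so linear_sum_assignment
    # does not raise "matrix contains invalid numeric entries".
    cost = np.nan_to_num(np.asarray(cost, dtype=np.float64),
                         nan=big, posinf=big, neginf=-big)
    return linear_sum_assignment(cost)
```

An `assert torch.isfinite(C).all()` right before the matcher call would also pinpoint which loss term produces the invalid values.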
How can we run inference? I'm getting size-mismatch errors. I downloaded the pre-trained R50 weights from the README.md and set the backbone to resnet50, but the weights fail to load with a size mismatch. Can someone help? My command is below:
python3 inference.py --masks --backbone resnet50 --model_path ~/SeqFormer/r50_weight.pth --save_path results.json
First of all, congratulations on the nice work!
I wanted to ask why the number of classes is 42 when YouTube-VIS only has 40 classes. One extra class is used for the background, but what about the other one?
I also don't understand why you include a background class at all if you use focal loss. The original Deformable DETR focal-loss implementation ignores the background, since background is implicit in all per-class sigmoid probabilities being < 0.5.
Thanks a lot for your help!
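On the focal-loss point above: with per-class sigmoid outputs there is no explicit background logit, because "background" is simply a target vector of all zeros. A minimal RetinaNet-style sketch of that formulation (an illustration, not this repo's exact implementation):

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Per-class sigmoid focal loss. `targets` is a 0/1 multi-hot tensor
    the same shape as `logits`; a background query is an all-zeros row,
    so no extra background class slot is needed."""
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = prob * targets + (1 - prob) * (1 - targets)      # prob of the true label
    loss = ce * ((1 - p_t) ** gamma)                       # down-weight easy examples
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * loss).mean()
```

Under this formulation a background row contributes loss only through the (1 - targets) terms, which is exactly the "all sigmoids < 0.5" behavior described above.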
How many epochs does pretraining on the COCO dataset take per model, and how long does it run?
Thanks!
Please mention the Python version and the pycocotools version.