henghuiding / rela

[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation

Home Page: https://henghuiding.github.io/GRES/

License: MIT License

Python 74.84% Shell 0.26% C++ 2.48% Cuda 22.42%

cvpr2023 multimodal-learning referring-expression-comprehension referring-expression-segmentation referring-image-segmentation vision-language-transformer

rela's Introduction

Hi there 👋

🔭 Researcher working on Computer Vision and Artificial Intelligence
🌎 Shanghai, China

Links:

Website

Google Scholar

rela's People

ntucver cv-seg redfishiaven daitranskku ecvler dddonghwa linmu7177 gogopen haixiongli yutong-dai yiyunwacc chiehyunchen xdeadlocked nnaik39 cookecola fudan-segmentation

rela's Issues

Question regarding Relationship Modeling

According to this paragraph, the region-based queries are supervised by a minimap down-sampled from the ground truth. If I understand correctly, all the queries then receive the same supervision. If so, how can these queries learn to correspond to different regions? Wouldn't they all learn the same thing and correspond to the same region?

Can you kindly explain more about this?
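For readers following the question, here is a minimal sketch of what "a minimap down-sampled from the ground truth" could look like. It rests on an assumption that this thread does not confirm: that the minimap has one cell per query, so query i would be supervised by cell i rather than by the whole map. The function name `downsample_to_minimap`, the 2 x 2 grid, and the toy mask are all hypothetical and are not taken from the repository.

```python
def downsample_to_minimap(mask, grid):
    """Average-pool a binary H x W mask (list of lists of 0/1) into a grid x grid minimap."""
    height, width = len(mask), len(mask[0])
    if height % grid or width % grid:
        raise ValueError("this sketch assumes H and W are divisible by grid")
    cell_h, cell_w = height // grid, width // grid
    minimap = []
    for gy in range(grid):
        row = []
        for gx in range(grid):
            # Fraction of foreground pixels inside this cell of the image.
            total = sum(
                mask[y][x]
                for y in range(gy * cell_h, (gy + 1) * cell_h)
                for x in range(gx * cell_w, (gx + 1) * cell_w)
            )
            row.append(total / (cell_h * cell_w))
        minimap.append(row)
    return minimap


# Toy 4 x 4 ground-truth mask (hypothetical data).
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
minimap = downsample_to_minimap(mask, grid=2)

# ASSUMPTION: one minimap cell per query, in row-major order.
per_query_targets = [value for row in minimap for value in row]
```

Under that assumption the four queries would receive four different targets, which is one way the supervision could differ per query even though a single ground-truth mask is used.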

Can you provide the dataset preparation and the download links?

Can you provide the code for Visualizing Model Results?

Thanks a lot for your response to my previous question about the evaluation results.
I was wondering if you have any code available for the visualization?
If so, could you kindly point me in the right direction or provide a link to the repository?
It will be helpful for my research.
Thanks in advance.

Can I use referring_swin_base.yaml to reproduce RefCOCO, RefCOCO+, RefCOCOg results in the paper?

Is it enough to modify only DATASETS.TRAIN and DATASETS.TEST in referring_swin_base.yaml?

About the number of images in the validation set

Hi, I noticed that you updated the dataset on August 29th, but why are there more samples in the validation set in the new dataset?

And which version of the dataset was used to achieve the results in your paper?

Question on evaluation result

I ran the provided inference code but got evaluation results that differ from those in the paper:
My gIoU is 66.3407 and my cIoU is 63.0991, but in the paper they are 63.60 and 62.42, respectively.

Below are the command I ran, my file directory, and the output. Is there anything wrong? Thank you.

!python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 1 --dist-url auto --eval-only \
    MODEL.WEIGHTS "/content/ReLA/gres_swin_base.pth" \
    OUTPUT_DIR "/content/ReLA"
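For context on the two metrics quoted in this issue, the sketch below shows how gIoU and cIoU are commonly described: gIoU as the mean of per-sample IoU values, with a correctly predicted no-target sample counted as 1, and cIoU as total intersection over total union across all samples. This is an illustration on toy data, not the repository's evaluator. The function names and the flat 0/1 mask representation are assumptions made for the example.

```python
def intersection_and_union(pred, gt):
    """Count intersecting and united pixels of two flat binary masks (lists of 0/1)."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    return inter, union


def giou_and_ciou(samples):
    """samples: list of (pred_mask, gt_mask) pairs with equal-length flat masks."""
    per_sample_ious = []
    total_inter = 0
    total_union = 0
    for pred, gt in samples:
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
        if union == 0:
            # Both masks empty: a correctly predicted no-target sample counts as IoU 1.
            per_sample_ious.append(1.0)
        else:
            # A wrongly non-empty or wrongly empty prediction gives inter == 0, so IoU 0.
            per_sample_ious.append(inter / union)
    giou = sum(per_sample_ious) / len(per_sample_ious)
    # Guard for the degenerate case where every mask is empty (a choice made for this sketch).
    ciou = total_inter / total_union if total_union else 1.0
    return giou, ciou


# Toy data (hypothetical): one partial match, one correct no-target, one near match.
samples = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([0, 0, 0, 0], [0, 0, 0, 0]),
    ([1, 1, 1, 1], [1, 1, 1, 0]),
]
giou, ciou = giou_and_ciou(samples)
```

Because gIoU weights every sample equally while cIoU weights every pixel equally, the two numbers can move in different directions when the evaluation split or the handling of no-target samples changes.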

Problems about RIA and RLA

I looked at your code and it is clear, but I could not find the RIA and RLA parts. Has this part of the code not been released yet?

Question about Lang attn in RLA module?

Hi,
I have a question about the RLA module.

  lang_feat_att = self.lang_proj(lang_feat_att)
  lang_feat_att = self.RLA_lang_att(output, lang_feat_att.permute(1,0,2)) * F.sigmoid(self.lang_weight)
  output = output + lang_feat_att * self.rla_weight

It seems that RLA_lang_att does not contribute much. I tried removing these lines of code and the results stayed the same.
Moreover, with self.rla_weight=0.1 and the attention used only in the first layer, lang_feat_att may barely affect the output. However, the paper reports that it improves performance by about 1%. Is there a mistake, or have I misunderstood something?
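To make the scaling argument in this question concrete, the sketch below reproduces only the scalar arithmetic of the quoted lines: the attention result is multiplied by sigmoid(lang_weight) and by rla_weight before being added to the output. The attention itself is not modelled. The vectors and the value lang_weight = 0.0 are hypothetical (the thread does not state how lang_weight is initialised or what it becomes after training). Only rla_weight = 0.1 comes from the question.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def residual_update(output, lang_feat_att, lang_weight, rla_weight):
    """Mirror `output + att * sigmoid(lang_weight) * rla_weight` on plain lists."""
    gate = sigmoid(lang_weight) * rla_weight
    updated = [o + a * gate for o, a in zip(output, lang_feat_att)]
    return updated, gate


# Hypothetical feature values, chosen only to show the relative magnitudes.
output = [1.0, -2.0, 0.5]
lang_feat_att = [0.4, 0.4, -0.8]

# ASSUMPTION: lang_weight = 0.0, so sigmoid(lang_weight) = 0.5.
updated, gate = residual_update(output, lang_feat_att, lang_weight=0.0, rla_weight=0.1)
```

With these assumed values the language branch is scaled by 0.05 before it reaches the residual sum. This shows why the contribution can be small, but it does not settle whether the branch matters in the trained model, since the learned value of lang_weight and the magnitude of the attention output are unknown here.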

Question about the checkpoints provided in github

I ran the inference code directly with the R50 checkpoint provided on GitHub, but the results differ from those reported in the README. Is there a problem somewhere?

Question about finetune on custom dataset

Thanks for your great work! I wonder how to fine-tune on a custom dataset. Section 3.2 says, "We developed an online annotation tool to find images, select instances, write expressions, and verify the results". Will the tool be made public?

Can you provide the configuration mentioned in the article?

Hello,
We have observed that the current configuration does not match the one described in the article, yet the provided .pth model achieves the performance reported in the article. Can you provide the original configuration, or report the full performance for the current configuration?

About the training time

According to your supplementary file, the model is trained for 150,000 iterations with a batch size of 24 on four 32G V100 GPUs.
How long does one complete training run take?

Question about the training result on 4 A100 with batch size = 36

I trained on four A100 GPUs with a total batch size of 36.
After a total of 300,000 training iterations, this is the result of the model:

which is quite different from the result given in your paper:

I did not change the code. What could be the cause?

About Training

Can you provide pre-trained weights for RefCOCO, RefCOCO+, and RefCOCOg?

It seems that the currently released weights are for gRefCOCO.
Can you provide pre-trained weights for RefCOCO, RefCOCO+, and RefCOCOg?
It will be helpful for my research.
Thanks.

Questions about training logs

Hello, I am very interested in your work, but I encountered some problems when reproducing it. Could you provide the training log of the model that uses ResNet-50 as the image encoder?

model

Hi,
What does "MODEL.WEIGHTS" refer to in the training process?

Questions about train dataset and train metric?

Hello, thanks for your contribution. The training dataset set in the config YAML file is grefcoco_unc_train, and the article says gRefCOCO inherits some single expressions from RefCOCO. I noticed that the dataset code registers a dataset called grefcoco_unc_train_full, so my question is whether I should change the training dataset in the config file. In addition, I trained the model and the best gIoU is about 53, which is lower than the number in the article. I trained on four 2080 Ti GPUs with a batch size of 8, set max_iter to 900000, and multiplied the learning-rate steps by 6. I would be grateful for any solution or advice. Thank you.
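Since this question involves changing the batch size and the iteration count together, here is a small sketch of the commonly used linear scaling heuristic: when the batch size shrinks by some factor, the iteration count and schedule steps grow by that factor and the learning rate shrinks by it, so the number of samples seen stays constant. It is a general heuristic, not a recommendation from the authors. The reference batch size of 24 and 150,000 iterations are the values quoted from the supplementary file in another issue on this page. The reference learning rate and step milestones below are placeholders, not the repository's values.

```python
def scale_schedule(ref_batch, ref_iters, ref_lr, ref_steps, new_batch):
    """Rescale a training schedule so the total number of samples seen is unchanged."""
    ratio = ref_batch / new_batch
    return {
        "max_iter": round(ref_iters * ratio),
        "base_lr": ref_lr / ratio,
        "steps": tuple(round(step * ratio) for step in ref_steps),
    }


# Batch size 24 and 150,000 iterations are quoted on this page.
# ref_lr and ref_steps are HYPOTHETICAL placeholders for illustration only.
scaled = scale_schedule(
    ref_batch=24,
    ref_iters=150_000,
    ref_lr=1e-4,
    ref_steps=(100_000, 130_000),
    new_batch=8,
)

# Samples seen in each setting, for comparison with the question's max_iter of 900000.
samples_reference = 24 * 150_000
samples_in_question = 8 * 900_000
```

Under this heuristic a batch size of 8 corresponds to 450,000 iterations, so the 900,000 iterations in the question cover twice as many samples as the reference schedule. Whether that, the learning rate, or the dataset choice explains the gap in gIoU cannot be determined from the thread.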