
gpt4roi's People

Contributors

jshilong, mattmazzola, peizesun


gpt4roi's Issues

ValueError: The following `model_kwargs` are not used by the model: ['images'] (note: typos in the generate arguments will also show up in this list)

[screenshot]
Professor, it took me a few days to figure out my previous mistakes, but this error I cannot solve. Could you do me a favor? I have sorted out all the environment versions, but it still doesn't work.

When I input a prompt, the model reports this error, and the following error appears in the code.

Happy New Year's Day, Professor. I am indeed quite clumsy; could you please give me some guidance?
[screenshot]

evaluation

The code does include a demo, as shown in the paper, but did you develop an evaluation script on a dev set or on some existing datasets?

Demo error when attempting to send message: PredictBody validation error, missing required field

In the video in #41, I demonstrate an error when running the demo.

It seems a required property is missing in one of the events sent through gradio.
Given that the error occurs inside the gradio-dev runtime, I am unsure whether it is due to app.py sending the wrong data or to an issue inside the gradio-dev package itself.

Running on local URL:  http://0.0.0.0:20012

To create a public link, set `share=True` in `launch()`.
Task exception was never retrieved
future: <Task finished name='6976h8jtnyr_7' coro=<Queue.process_events() done, defined at /workspaces/GPT4RoI/gradio-dev/gradio/queueing.py:342> exception=1 validation error for PredictBody
event_id
  Field required [type=missing, input_value={'data': [], 'event_data'...on_hash': '6976h8jtnyr'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing>
Traceback (most recent call last):
  File "/workspaces/GPT4RoI/gradio-dev/gradio/queueing.py", line 346, in process_events
    client_awake = await self.gather_event_data(event)
  File "/workspaces/GPT4RoI/gradio-dev/gradio/queueing.py", line 219, in gather_event_data
    data, client_awake = await self.get_message(event, timeout=receive_timeout)
  File "/workspaces/GPT4RoI/gradio-dev/gradio/queueing.py", line 448, in get_message
    return PredictBody(**data), True
  File "/home/vscode/miniconda3/envs/gpt4roi/lib/python3.9/site-packages/pydantic/main.py", line 164, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for PredictBody
event_id
  Field required [type=missing, input_value={'data': [], 'event_data'...on_hash': '6976h8jtnyr'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing
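
Not the confirmed upstream fix, but a minimal sketch of the kind of patch the traceback implies, assuming the failing model is the PredictBody pydantic model inside the vendored gradio-dev (the field names are taken from the error output above; everything else here is a guess):

    from typing import Any, Dict, List, Optional

    from pydantic import BaseModel

    class PredictBody(BaseModel):
        # The websocket payload above carries 'data', 'event_data', and
        # 'session_hash' but no 'event_id', so pydantic v2 rejects it as a
        # missing required field; making the field optional accepts such events.
        event_id: Optional[str] = None
        data: List[Any] = []
        event_data: Optional[Dict[str, Any]] = None
        session_hash: Optional[str] = None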

n-round chat fails on the demo

The chatbot answers well about the given region when I ask the first question, but on the second question it keeps processing and never generates a sentence. This may be an issue that needs fixing.

ValueError: The following `model_kwargs` are not used by the model: ['images']

Hi @jshilong, great work on incorporating RoIs into language models.

I am getting the error "ValueError: The following model_kwargs are not used by the model: ['images']" while trying the inference code, probably because 'images' is not accepted as a parameter by the model.generate function.

with torch.amp.autocast(device_type='cuda'):
    output_ids = self.model.generate(
        input_ids,
        images=image.unsqueeze(0).half().cuda(),
        do_sample=True,
        temperature=0.2,
        max_new_tokens=1024,
        stopping_criteria=[stopping_criteria])

Could you please confirm if you are using any specific version of the 'torch', 'llava', or 'transformers' library?
Thank you!
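
For context (this is generic transformers behavior, not a statement about the authors' exact code): recent versions of transformers validate extra generate() kwargs against the signatures of the model's forward() and prepare_inputs_for_generation(), so a multimodal subclass must route images through explicitly. A minimal sketch of that pattern, with an illustrative class name:

    from transformers import LlamaForCausalLM

    class RoILlamaForCausalLM(LlamaForCausalLM):  # hypothetical subclass
        def prepare_inputs_for_generation(self, input_ids, images=None, **kwargs):
            # Build the standard causal-LM inputs, then thread the vision tensor
            # through so generate()'s kwarg validation accepts `images`.
            inputs = super().prepare_inputs_for_generation(input_ids, **kwargs)
            inputs['images'] = images
            return inputs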

weight release

What an amazing job, and thanks for your contribution to the open-source community! I'd like to try out some new ideas using the model weights, so do you have any plans to release the weights anytime soon?

Load weight error

Hi, thanks for your excellent work.
I ran into an issue when I tried to load the GPT4RoI weights to perform stage-2 training, and got this error:
"Error(s) in loading state_dict for SPILlavaMPTForCausalLM:
size mismatch for lm_head.weight: copying a param with shape torch.Size([32006, 4096]) from checkpoint, the shape in current model is torch.Size([32005, 4096])."
How can I solve this problem?
Looking forward to your reply!
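
Not a confirmed fix, but a hedged workaround sketch, assuming the 32006 vs. 32005 gap comes from one missing added special token; `model` and `ckpt_path` are assumed to exist in the training script:

    import torch

    model.resize_token_embeddings(32006)  # match the checkpoint's vocab size
    state = torch.load(ckpt_path, map_location='cpu')
    missing, unexpected = model.load_state_dict(state, strict=False)
    # The cleaner fix is to add the same special tokens to the tokenizer that
    # the checkpoint was trained with, so len(tokenizer) reaches 32006 naturally.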

Finetuning stage 2 from a checkpoint

Hi @jshilong, thanks again for releasing the code and the models!
I am trying to finetune the model from stage 2. Could you please share a stage-2 checkpoint?
I am getting "ValueError: Can't find a valid checkpoint at ./exp/stage2/checkpoint-0" when using the current weight directory as the starting point.

Appreciate your help!

train_stage1.sh and train_stage2.sh

Hi, currently the two bash scripts look similar. Can you please confirm the commands for the 1st and 2nd stages of training? I noticed that data loading is controlled from the config, but how exactly is the model frozen in the two separate stages?
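
For intuition only, a hedged sketch of how per-stage freezing is commonly wired in LLaVA-style training code; the attribute names below are assumptions, not taken from this repository:

    def freeze_for_stage(model, stage: int) -> None:
        # Freeze everything, then re-enable only what this stage trains.
        for p in model.parameters():
            p.requires_grad = False
        for p in model.mm_projector.parameters():      # assumed projector attribute
            p.requires_grad = True
        if stage == 2:
            for p in model.model.layers.parameters():  # assumed LLM decoder blocks
                p.requires_grad = True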

Retraining issue

Hi,

I appreciate the effort you put into your framework, but I encountered some confusion while attempting to retrain it. The guidance suggests using the original LLaMA weights for training, but I noticed that in your script the model name input is set to vicuna-7b: /mnt/petrelfs/share_data/zhangshilong/vicuna-7b/.

I attempted to use both the original LLaMA and the LLaVA Hugging Face format (I haven't applied your delta since it hasn't been released yet), but it always resulted in this error:

  File "/gpt4roi/gpt4roi/train/train_mem.py", line 16, in <module>
    train()
  File "/gpt4roi/gpt4roi/train/train.py", line 641, in train
    model.initialize_vision_tokenizer(mm_use_im_start_end=model_args.mm_use_im_start_end,
  File "/gpt4roi/gpt4roi/models/spi_llava.py", line 295, in initialize_vision_tokenizer
    raise ValueError(
ValueError: Unexpected embed_tokens_weight shape. Pretrained: torch.Size([2, 4096]). Current: torch.Size([32006, 4096]). Numer of new tokens: 0.

I would appreciate your guidance in resolving this error and making the code runnable. Could you please provide the necessary steps or adjustments to address the issue?
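
A hedged reading of the error, based only on the shapes printed above: the pretrained adapter stores embedding rows for exactly 2 newly added tokens, while initialize_vision_tokenizer computed 0 new tokens, which happens when the base tokenizer already contains the image tokens. The intended flow is roughly the LLaVA pattern below; token names are taken from logs elsewhere on this page, and `pretrained_rows` stands for the stored [2, 4096] tensor:

    num_new = tokenizer.add_tokens(['<im_start>', '<im_end>'], special_tokens=True)
    model.resize_token_embeddings(len(tokenizer))
    if num_new > 0:
        # Copy the pretrained rows into the freshly added embedding slots.
        model.get_input_embeddings().weight.data[-num_new:] = pretrained_rows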

GPU memory

How much GPU memory is required for inference?

weight file

[screenshot]
The LLaMA 7B weights are not available on the provided webpage or on Hugging Face.

About `solve_hallucination` setting in `det_llava.py`

Hi, although the hallucination questions and answers are added into `sources` as shown at line 442, `sources` is overwritten at line 449. Therefore, the conversations for solving hallucination are not actually added into the variable `sources`.

(Pdb) list
450         # print(copy_source)
451         sources = preprocess_multimodal(
452             copy.deepcopy([e['conversations'] for e in copy_source]),
453             self.multimodal_cfg, cur_token_len)
454
455  ->     data_dict = preprocess(
456             sources,
457             self.tokenizer)
458         if isinstance(i, int):
459             data_dict = dict(input_ids=data_dict['input_ids'][0],
460                              labels=data_dict['labels'][0])
(Pdb) sources
[[{'from': 'human', 'value': 'Can you describe the main features of this image for me?\nThe <im_start><im_patch>…<im_patch><im_end> provides an overview of the picture. \n'}, {'from': 'gpt', 'value': "The image shows a baseball field with a group of people gathered, possibly during a small-town baseball game. There is a batter on deck, most likely a female softball player, stretching and practicing her batting stance while holding a baseball bat. She's positioned closer to the left side of the field. \n\nSeveral people stand around the field, some closer to the batter and others farther away, possibly teammates, coaches, or spectators. There are a few sports balls scattered around the field, with two located near the center and another one toward the right side.\n\nOverall, the scene depicts an active sports event, with people of various ages and roles involved in the game."}]]  (long run of <im_patch> tokens abbreviated)
(Pdb)
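
If the diagnosis above is right, a hedged sketch of one possible fix (any variable name not visible in the pdb listing is hypothetical): append the hallucination Q/A to the conversations that are actually passed to preprocess_multimodal, so the later assignment does not discard them:

    # Merge the hallucination Q/A into copy_source *before* it is preprocessed,
    # instead of extending `sources`, which line 449 overwrites anyway.
    for item, extra_qa in zip(copy_source, hallucination_conversations):  # hypothetical name
        item['conversations'].extend(extra_qa)
    sources = preprocess_multimodal(
        copy.deepcopy([e['conversations'] for e in copy_source]),
        self.multimodal_cfg, cur_token_len)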

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Hello @jshilong, have you encountered this problem?

I have trained the model for both stages and then merged the trained model with LLaMA as you described.
When I load the merged model for testing, the errors below occurred.

Traceback (most recent call last):
  File "/hy/code/gpt4roi/train_net.py", line 326, in <module>
    launch(
  File "/hy/code/gpt4roi/detectron2/detectron2/engine/launch.py", line 84, in launch
    main_func(*args)
  File "/hy/code/gpt4roi/train_net.py", line 311, in main
    res = Trainer.test(cfg, model)
  File "/hy/code/gpt4roi/detectron2/detectron2/engine/defaults.py", line 617, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/hy/code/gpt4roi/detectron2/detectron2/evaluation/evaluator.py", line 158, in inference_on_dataset
    outputs = model(inputs)
  File "/workspace/conda_env/gpt4roi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/hy/code/gpt4roi/gpt4roi.py", line 153, in get_output
    output_ids = self.model.generate(
  File "/workspace/conda_env/gpt4roi/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/conda_env/gpt4roi/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/workspace/conda_env/gpt4roi/lib/python3.10/site-packages/transformers/generation/utils.py", line 2562, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
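
Not a confirmed fix, but this error generically means the sampling distribution contains inf/nan values (often fp16 overflow or a bad weight merge). Two standard checks, sketched with the same generate call; `model` and `input_ids` are assumed to be as in the script above:

    import torch

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            do_sample=False,       # greedy decoding skips torch.multinomial entirely
            max_new_tokens=1024,
        )
    # If greedy decoding also produces garbage, re-run inference in fp32 or bf16
    # to rule out half-precision overflow, and re-verify the LLaMA delta merge.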

Pre-trained weights

Hello, the download link for the pre-trained weights on Hugging Face is no longer available. Can you update it or provide another download channel? Thank you very much!

Issue with demo: no response after drawing a bounding box and entering text in the chatbox

Hi Authors,

Thank you for your great work.
While running the demo, I encountered an issue where, after loading an image and then drawing a bounding box, there is no response upon entering text in the chatbox. This appears similar to the problem described in closed issue #9. I have ensured that gradio_box is correctly set up and followed all the provided instructions. The same error occurs when executing app_box.py in gradio_box. I would really appreciate some help.

Thank you.

Demo is not usable

I want to use the demo to verify the results, but the demo provided on the author's webpage does not work. How can I solve this problem?

Dataset for stage 2

Hello, thank you very much for your excellent work. However, I have some doubts about the dataset, and I would appreciate it if you could clarify them for me. Where can I download the train.json file for visual_genome? Do I need to run EVA-02-DET myself to obtain the llava_150k_bbox_pred_results.pkl file?

How to evaluate in the terminal instead of the web app

Excuse me, I want to evaluate the model on substantial data automatically.
I tested it on several examples like this; however, the results are not good: one region is OK, but two regions are bad:
[screenshot]
Is my way of sending the boxes and text wrong? I want to get this working, thanks!

Is this project using ViT-L or ViT-H?

Hi,

Great work! I have a question w.r.t. the vision backbone used in the paper.
The paper says ViT-H, while both the code and the checkpoint show ViT-L.
Thanks!

bug with gradio_box

Hello. Thank you for your excellent work.
I encountered some issues while using the "gradio-box". I install the gradio-box following the instruction successfully. The first uploaded image works well with the gradio-box.
1
But when I upload the second image after clicking the clear button, it can not show image correctly.
2
The browser console has provided the following error message
3
Could you please answer it.

VG Region Captioning Evaluation

Hi,

Thanks for open-sourcing this great work! We are developing some region captioning models and would like to perform a fair comparison with GPT4ROI. Is it possible to release the VG validation data you used for calculating the scores in Table 4? Thanks in advance!

MMCV-FULL

[screenshots]
Whether I used the official Python version that matches mmcv-full or the one you provided, this error occurred. Could you please clarify?

Training time

Hi @jshilong, in the documentation, it's mentioned that GPT4RoI was trained on 8 A100 GPUs. Could you please provide insights into how much time it took for both stage-1 and stage-2? Having this information would be extremely helpful.

Thank you in advance.

Inquiry on evaluation scripts

Hi @jshilong, thanks for your great project!

I would like to reproduce your experimental results. Do you have a plan to release your evaluation scripts (e.g., Visual7W and VCR)? Thank you.

Question about Table 4

Hi, @jshilong @PeizeSun @ShoufaChen
I would like to ask some questions about "Table 4: Comparison of region caption ability on the validation dataset of Visual Genome".

  1. Did you divide the validation dataset for the VG region captioning task yourselves?
    In the original VG dataset, there seems to be no validation split.
    Could you please share a link or a README for the validation dataset?

  2. Did you reproduce the result of GRiT?
    GRiT's paper also seems to have no related experimental result (e.g., CIDEr on the validation dataset for VG region captioning).
    Could you provide more details about this experiment?

Thank you in advance.

Question about app.py

Hello, thank you very much for your excellent contribution. I encountered some issues while using the app.py code. My gradio_box setup is all correct. However, after entering text and pressing Enter, the run function is not triggered, so the demo gives no response. After debugging, I still haven't found the problem. Could you please look into it?

Issue in 2nd Stage of Pre-training

I faced the following error when I launched the 2nd stage of pre-training:
"ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group"
This error is likely because the number of trainable parameters differs between the 2nd stage and the 1st stage. How did you resolve this?
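
For reference, a hedged sketch of the usual workaround (paths and objects are illustrative): because the set of trainable parameters changes between stages, the stage-1 optimizer state cannot be restored; load only the model weights and let stage 2 create a fresh optimizer:

    import torch

    # Load stage-1 weights only; do NOT pass resume_from_checkpoint, so an
    # HF-Trainer-style loop builds a new optimizer for stage 2's trainable params.
    state = torch.load('exp/stage1/pytorch_model.bin', map_location='cpu')  # illustrative path
    model.load_state_dict(state, strict=False)
    trainer.train()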

What is the structure of this vision_tower?

Hello, thank you for your contribution. I have a question about line 66 of the file models/spi_llava.py:
image_forward_outs = vision_tower(images, output_hidden_states=True)
What is the structure of this vision_tower?
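
For orientation (this describes the common LLaVA-style pattern that line follows, not a dump of this repository's code): vision_tower is typically a CLIP vision transformer, and output_hidden_states=True exposes every layer's hidden states so one layer's patch features can be picked for the projector. A minimal sketch, where the layer choice is an assumption:

    from transformers import CLIPVisionModel

    vision_tower = CLIPVisionModel.from_pretrained('openai/clip-vit-large-patch14')  # ViT-L
    outs = vision_tower(images, output_hidden_states=True)  # `images`: preprocessed pixel batch
    features = outs.hidden_states[-2][:, 1:]  # penultimate layer, CLS token dropped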
