Git Product home page Git Product logo

tifa's Issues

Fine-tuned Flan-T5 release

Hi,

thanks for the amazing work. Could you maybe provide an estimated time for the fine-tuned Flan-T5 release?

Thanks a lot and looking forward to try it out!

Potentially Missing Questions

In the question_gen.py script, it looks like there are 12 in-context examples. In the paper, it says there are 15 examples. Any chance there are 2 missing?

what is the "ast_indexer" and how to change the path of it?

Thanks for your great work!
when I runing the code it will tell me
modelscope - INFO - Loading ast index from /home/nudt/.cache/modelscope/ast_indexer

however, this path is not convenient for me, how can I make it load from the project directory?

Errors in dictionary handling

In the function tifa_score_single, there are these lines:

        if question_answer_pair['question'] not in question_logs:
            question_logs[question_answer_pair['question']] = question_answer_pair
        choices=question_answer_pair['choices']

Consider changing this to:

        if question_answer_pair['question'] not in question_logs:
            question_logs[question_answer_pair['question']] = copy.deepcopy(question_answer_pair)
        choices=question_answer_pair['choices']

Otherwise, whenever you run
result = tifa_score_single(vqa_model, filtered_questions, img_path)
you are changing the original filtered_questions, and result contain a reference to filtered_questions. Wierd things would happen. For example, if you make a new call with the same filtered_questions, the result from the previous call would be changed.

what's mplug-large

image
Hello, when I used the script you provided to test, "loading mplug-large" was displayed, but it failed. What is mplug-large? Is there another way I can solve this problem?

what's the minimum gpu to run 1024*1024 size image?

I tried running tifa_test.py on single image with your drawbench_8.jpg (with llama2) and it worked. When i tried running it with my own image(size of 1150*750) and it returned OOM issue. I'm on L4 gpu(24gb vram).
To bypass the issue, I emptied the gpu cache after after llama tasks and ran tifa_score_single.

what your suggestion for minimal vram?

vqa_models misjudge the spatial relationship in most of the cases

Hello, thank you for your repo.

When I test some cases with tifa, I find that the model totally confused right and left, below is an example:

what I generate is as follows:
image

The result by mPLUG is {'id': 'paintskill_29', 'caption': 'a photo of bird and boat; boat is right to bird', 'question': 'is the boat right to or left to the bird?', 'choices': ['right to', 'left to', 'in front of', 'behind'], 'answer': 'right to', 'element_type': 'spatial', 'element': 'right to', 'free_form_vqa': 'left', 'multiple_choice_vqa': 'left to', 'scores': 0, 'pred_image_path': '/share/project/yhy/project/frag/image_editing_pipeline/baseline/LayoutLLM_T2I_main/auto_RAIG_output/tifa/1111.png'}

Similarly, for the image:
image
The result is
{'id': 'paintskill_14', 'caption': 'a photo of bike and chair; chair is below bike', 'question': 'is the chair below or above the bike?', 'choices': ['below', 'above', 'next to', 'behind'], 'answer': 'below', 'element_type': 'spatial', 'element': 'below', 'free_form_vqa': 'above', 'multiple_choice_vqa': 'above', 'scores': 0, 'pred_image_path': '/share/project/yhy/project/frag/image_editing_pipeline/baseline/LayoutLLM_T2I_main/auto_RAIG_output/tifa/1072.png'}

Is there something wrong?
Can you help me verify this result to test whether it is the bug of my code? (Actually I barely changed the repository code)

Thank you in advance.

MPLUG model not available

Hi,
It looks like the MPLUG and the ofa-large models are no longer available on HuggingFace.
Can you re-upload them or publish the checkpoints somewhere else?
As MPLUG performed best in your experiments, it would be great to use that model!

Thanks is advance :)

OpenAI API update

Hi
Thank you for your great work. I try to use your repo but so far run into problems when trying to reach openai servers.

Traceback (most recent call last):
  File "/home/anasrezklinux/anas_april/visual_story.py", line 1133, in <module>
    custom_diffusion_inference([character_1, character_2], step, lr)
  File "/home/anasrezklinux/anas_april/visual_story.py", line 512, in custom_diffusion_inference
    TIFA_metric_score, DALL_eval_score, ViTS_16_DINO_embeddings = score_images(image_path, real_photo_path_list, prompt)
  File "/home/anasrezklinux/anas_april/visual_story.py", line 53, in score_images
    return TIFA_metric_score(prompt, image_path),DALL_eval_score(prompt, image_path),[ViTS_16_DINO_embeddings(image_path, real_image_path) for real_image_path in real_image_paths]
  File "/home/anasrezklinux/anas_april/compile_story.py", line 86, in TIFA_metric_score
    gpt3_questions = get_question_and_answers(prompt)
  File "/home/anasrezklinux/anas_april/tifa/tifascore/question_gen.py", line 547, in get_question_and_answers
    resp = openai_completion(this_prompt)
  File "/home/anasrezklinux/anas_april/tifa/tifascore/openai_api.py", line 6, in openai_completion
    resp =  openai.ChatCompletion.create(
  File "/home/anasrezklinux/anas_april/venv/lib/python3.10/site-packages/openai/lib/_old_api.py", line 39, in __call__
    raise APIRemovedInV1(symbol=self._symbol)
openai.lib._old_api.APIRemovedInV1: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742

for now I will revert to openai==0.28 , yet, it would be great if you could update this repo :)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.