
llovi's Issues

Performance of LLoVi with 7B llama2

Dear authors, thank you for your work! I would like to know the performance of LLoVi on NExT-QA, NExT-GQA, and IntentQA when using the 7B Llama 2 as the LLM.
Larger models like GPT-3.5 and GPT-4 are not open source, so we cannot study how to improve them on this task. I therefore think it would be beneficial for the community to report performance on smaller LLMs.

Inconsistent Results on EgoSchema

Hi, thanks for your great work! I tried to reproduce your results on EgoSchema but found some inconsistencies. Specifically, I ran the standard prompt and the (C, Q) —> S prompt with the following commands:

standard prompt

python main.py --model gpt-3.5-turbo-1106 --output_base_path output/egoschema --output_filename standard_qa_1106.json

Results:
    "num_total": 500,
    "num_valids": 453,
    "num_corrects": 266,
    "acc": 0.532,

(C, Q) —> S prompt

python main.py --model gpt-3.5-turbo-1106 --task sum --prompt_type sum_q --num_words_in_sum 500 --temperature 1.0 --output_base_path output/egoschema --output_filename sum_q_500_1106.json

python main.py --model gpt-3.5-turbo-1106 --prompt_type qa_sum --data_path output/egoschema/sum_q_500_1106_data.json --output_base_path output/egoschema --output_filename qa_sum_q_500_1106.json

Results:
    "num_total": 500,
    "num_valids": 493,
    "num_corrects": 278,
    "acc": 0.556,

However, these results differ from those reported in the README:

LaViLa	gpt-3.5-turbo-1106	standard	55.2
LaViLa	gpt-3.5-turbo-1106	(C, Q) —> S	58.8

I have not modified any code and used the captions you released. What could explain the inconsistency? I also noticed that the results in the README differ slightly from those in the paper; could you explain the reason for this? Thank you!

Best regards

Please provide full narration hyper-parameters

Hi ;)

For comparability, it would be beneficial for the community to have insight into the full hyper-parameter setup.
I am especially interested in the LaViLa captioning config to use with the fair checkpoint you provide for EgoSchema.
In detail, I need the following information:

  1. You say you use nucleus sampling with top_p=0.95 and k=5 to generate 5 candidate captions. What temperature do you use?
  2. Also, the paper reports temperature=0.0 for the LLMs, but the README's example command for the summarization task uses temperature=1.0. Do you use temperature=1.0 for the LLM in the summarization task and temperature=0.0 in the QA task?
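To make the question concrete, here is how I understand temperature and top_p interacting in nucleus sampling. This is a minimal illustration, not LaViLa's actual implementation:

```python
import math
import random

def nucleus_sample(logits, top_p=0.95, temperature=1.0):
    """Minimal nucleus (top-p) sampling sketch; not LaViLa's real code."""
    # temperature=0.0 degenerates to greedy (argmax) decoding.
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits by temperature, then apply a stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Keep the smallest prefix of tokens whose cumulative mass reaches top_p.
    nucleus, cum = [], 0.0
    for p, i in ranked:
        nucleus.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Renormalize within the nucleus and sample from it.
    r = random.random() * sum(p for p, _ in nucleus)
    for p, i in nucleus:
        r -= p
        if r <= 0:
            return i
    return nucleus[-1][1]
```

So with temperature=0.0 the top_p value is irrelevant (pure greedy), which is why the two settings in the paper and the README are worth disambiguating.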

Clarification would be much appreciated! :)

Cheers,
Maximotus

Inquiry about resources and processing time

Hi,
I am currently working with the NExT-QA dataset and ran your code with meta-llama/Meta-Llama-3-8B, since GPT-3.5 and GPT-4 are not open source. Could you share details on the compute resources you used to achieve the NExT-QA results? Also, how long did it take to process 1000 annotations?
