
llovi's Issues

Performance of LLoVi with 7B llama2

Dear authors, thank you for your work! I would like to know the performance of LLoVi on NExT-QA, NExT-GQA, and IntentQA when using the 7B Llama 2 as the LLM.
Larger models like GPT-3.5 and GPT-4 are not open source, so we cannot study how to improve them on this task. I therefore think it would be beneficial for the community to report performance on smaller LLMs.

Inconsistent Results on EgoSchema

Hi, thanks for your great work! I tried to reproduce your results on EgoSchema but found some inconsistencies. Specifically, I ran the standard prompt and the (C, Q) —> S prompt with the following commands:

standard prompt

python main.py --model gpt-3.5-turbo-1106 --output_base_path output/egoschema --output_filename standard_qa_1106.json

Results:
    "num_total": 500,
    "num_valids": 453,
    "num_corrects": 266,
    "acc": 0.532,

(C, Q) —> S prompt

python main.py --model gpt-3.5-turbo-1106 --task sum --prompt_type sum_q --num_words_in_sum 500 --temperature 1.0 --output_base_path output/egoschema --output_filename sum_q_500_1106.json

python main.py --model gpt-3.5-turbo-1106 --prompt_type qa_sum --data_path output/egoschema/sum_q_500_1106_data.json --output_base_path output/egoschema --output_filename qa_sum_q_500_1106.json

Results:
    "num_total": 500,
    "num_valids": 493,
    "num_corrects": 278,
    "acc": 0.556,

However, these results differ from those reported in the README:

LaViLa	gpt-3.5-turbo-1106	standard	55.2
LaViLa	gpt-3.5-turbo-1106	(C, Q) —> S	58.8

I have not modified any code and used the captions you released. What could explain the inconsistency? I also noticed that the results in the README differ slightly from those in the paper; could you explain the reason for this? Thank you!

Best regards

Please provide full narration hyper-parameters

Hi ;)

For comparability, it would be beneficial for the community to have insight into the full hyper-parameter setup.
I am especially interested in the LaViLa captioning config to use with the fair checkpoint you provide for EgoSchema.
In detail, I need the following information:

  1. You say you use nucleus sampling with top_p=0.95 and k=5 to generate 5 candidate captions. What temperature do you use?
  2. Also, the paper reports temperature=0.0 for the LLMs, but the README's example command for the summarization task uses temperature=1.0. Do you use temperature=1.0 for the LLM in the summarization task and temperature=0.0 in the QA task?
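To make the question concrete, here is how I understand temperature and top_p interacting in nucleus sampling. This is a minimal illustration, not LaViLa's actual implementation:

```python
import math
import random

def nucleus_sample(logits, top_p=0.95, temperature=1.0):
    """Minimal nucleus (top-p) sampling sketch; not LaViLa's real code."""
    # temperature=0.0 degenerates to greedy (argmax) decoding.
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits by temperature, then apply a stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Keep the smallest prefix of tokens whose cumulative mass reaches top_p.
    nucleus, cum = [], 0.0
    for p, i in ranked:
        nucleus.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Renormalize within the nucleus and sample from it.
    r = random.random() * sum(p for p, _ in nucleus)
    for p, i in nucleus:
        r -= p
        if r <= 0:
            return i
    return nucleus[-1][1]
```

So with temperature=0.0 the top_p value is irrelevant (pure greedy), which is why the two settings in the paper and the README are worth disambiguating.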

Clarification would be much appreciated! :)

Cheers,
Maximotus

Inquiry about resources and processing time

Hi,
I am currently working with the NExT-QA dataset and ran your code with meta-llama/Meta-Llama-3-8B, since GPT-3.5 and GPT-4 are not open source. Could you share details on the compute resources you used to achieve the NExT-QA results? Also, how long did it take to process 1000 annotations?
