Comments (42)
Does the hash match?
Sorry, I was being dumb. I can simply use the Hugging Face path and it will download the correct weights by itself; there is no need to download the weights manually.
from minigpt-4.
@vtddggg @VvanGemert Thanks for your interest! Yes, as LLaMA doesn't allow distributing its weights, models based on LLaMA have to find ways around this rule, such as releasing 'delta' weights instead of the directly working weights: direct working weights = delta weights + original LLaMA weights. This is also the case for Vicuna, as they explain in their instructions. Vicuna provides a script to convert the delta weights once you have the LLaMA weights. I'm currently writing a simple introduction for the preparation of Vicuna.
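The delta-weight arithmetic described above can be sketched in a few lines. This is only a toy illustration with scalar "parameters" standing in for tensors; real tools such as `fastchat.model.apply_delta` perform the same addition over PyTorch state dicts, tensor by tensor:

```python
# Toy illustration of delta weights: the released delta is
# (fine-tuned - base), so recovering the working weights is an
# element-wise addition for every parameter.

def apply_delta(base, delta):
    """working = base + delta, key by key."""
    assert base.keys() == delta.keys(), "state dicts must share parameter names"
    return {name: base[name] + delta[name] for name in base}

llama_base = {"layers.0.weight": 0.50, "layers.0.bias": -0.25}
vicuna_delta = {"layers.0.weight": 0.25, "layers.0.bias": 0.25}

vicuna = apply_delta(llama_base, vicuna_delta)
print(vicuna)  # {'layers.0.weight': 0.75, 'layers.0.bias': 0.0}
```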
@Andy1621 Thanks for your interest! The weights you download from Hugging Face are the delta weights, not the directly working weights. You need to follow their instructions to add the delta weights back to the original LLaMA weights to get the final working weights. The reason Vicuna doesn't release the directly working weights is that we cannot distribute LLaMA weights according to their rules. I'm currently writing a guide for preparing the Vicuna weights.
@alibabadoufu Their v0.1.10 tag should work. What is the weird thing you find?
This is my result:
Here are my steps to get the Vicuna weights:

```shell
git clone --depth 1 --branch v0.1.10 https://github.com/lm-sys/FastChat.git
git lfs install
git clone https://huggingface.co/decapoda-research/llama-13b-hf
```

Correct the name in `config.json` and `tokenizer_config.json`, install FastChat from source as specified in the FastChat GitHub repo, then execute the following command:

```shell
python3 -m fastchat.model.apply_delta \
    --base xxx/llama-13b-hf \
    --target xxx/MiniGPT-4/vicuna-7b \
    --delta lmsys/vicuna-13b-delta-v0
```

(Ignore the paths; I double-checked that I set them correctly.)
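As a side note on the "correct the name" step: the common fix at the time was renaming the tokenizer class in `tokenizer_config.json`, since the old `LLaMATokenizer` spelling is rejected by newer `transformers` releases. A minimal sketch follows; the old/new class names and the file layout are assumptions to verify against your own checkout:

```python
import json
import pathlib
import tempfile

def fix_tokenizer_class(config_path, old="LLaMATokenizer", new="LlamaTokenizer"):
    """Rewrite tokenizer_class in place if it still uses the old spelling."""
    path = pathlib.Path(config_path)
    cfg = json.loads(path.read_text())
    if cfg.get("tokenizer_class") == old:
        cfg["tokenizer_class"] = new
        path.write_text(json.dumps(cfg, indent=2))
    return cfg["tokenizer_class"]

# Demo on a throwaway file mimicking the downloaded config:
with tempfile.TemporaryDirectory() as d:
    p = pathlib.Path(d) / "tokenizer_config.json"
    p.write_text(json.dumps({"tokenizer_class": "LLaMATokenizer"}))
    print(fix_tokenizer_class(p))  # LlamaTokenizer
```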
For anyone else having this issue: the version of FastChat matters! Make sure to install FastChat v0.1.10.
I updated the code to remove `<s>` now.
Thanks for your interest! We screenshotted this image and checked the model output. It looks normal on my side.
I think a potential reason for your bug could be that you loaded the wrong weights. May I ask how you set up your Vicuna? Vicuna released a new version a few days ago, but currently we are using the old version (the v0 version; check the readme). So if you load the new Vicuna, it might not work.
@TsuTikgiau I suspect it's because when I git cloned the old Vicuna weights there was some corruption. I need to fix that first to confirm it's the problem.
When I try to `git clone https://huggingface.co/lmsys/vicuna-13b-delta-v0`, I always encounter the following issue:

```
Encountered 3 file(s) that may not have been copied correctly on Windows:
pytorch_model-00003-of-00003.bin
pytorch_model-00001-of-00003.bin
pytorch_model-00002-of-00003.bin
See: `git lfs help smudge` for more details.
```
Have you experienced the same issue before? If yes, would you suggest some solutions?
@alibabadoufu I've got the same issue. I've checked out these weights and they do work. https://huggingface.co/fasthuggy/vicuna-13b-delta-v1.1-fastchat-conversion/tree/main
I only get results with strike-through text.
@VvanGemert Thanks for your interest! The weights you share in the link are Vicuna v1.1, which was released only a few days ago. Our current model is based on Vicuna v0, as we say in the readme, so these weights don't work for us currently. We plan to train a new version of MiniGPT-4 based on the new Vicuna v1.1 soon.
@alibabadoufu do you have Git LFS installed? Remember that you can download the files manually. I just encountered a crash with Git LFS, so I'm downloading the files manually instead.
Here: https://huggingface.co/lmsys/vicuna-13b-delta-v0/tree/main -> Files and versions, then the Download link (bottom left).
I tried doing that, but the downloaded file was corrupted as well: the specified file size and the downloaded file size do not match.
Does the hash match?
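One way to answer the hash question: Hugging Face shows a SHA-256 digest on each LFS file's page, so you can hash the local download and compare. Sketched below on a dummy file so the commands are self-contained; in practice, point `sha256sum` at the downloaded `pytorch_model-*.bin` files:

```shell
# Stand-in for a downloaded weight shard; replace with your actual
# pytorch_model-*.bin files and compare the digest to the one shown
# on the file's page at huggingface.co.
printf 'hello' > model.bin
sha256sum model.bin
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824  model.bin
```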
I use Vicuna v0, but the result is still strike-through text.
Meanwhile, I noticed that when the model generates text, it raises a warning:

```
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
```

However, I have set `padding_side='left'` in the `__init__()`.
Could you examine why this happens? Thanks in advance :)
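For context on that warning: decoder-only models continue generation from the last position of each row in the batch, so pad tokens must go on the left, or they end up between the prompt and the newly generated tokens. A toy sketch of the two layouts, with plain lists standing in for tokenizer output:

```python
PAD = 0  # stand-in pad token id

def pad_batch(seqs, side="left"):
    """Pad variable-length token-id lists to equal width on one side."""
    width = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        padding = [PAD] * (width - len(s))
        out.append(padding + s if side == "left" else s + padding)
    return out

batch = [[5, 6, 7], [8, 9]]
print(pad_batch(batch, "left"))   # [[5, 6, 7], [0, 8, 9]] -> generation continues after real tokens
print(pad_batch(batch, "right"))  # [[5, 6, 7], [8, 9, 0]] -> pad sits where generation starts
```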
I also met this warning, and the model gives some meaningless outputs:

```
<s> Moreoveravas зв $\{�added CentralONEyou)(partial \] †<s> 村 :)� журна�� sometimes \(\</s>
```
But I found that even though I downloaded the weights directly in the way I described above, the result is still the same:
As @pixeli99 said, do we need to apply the given delta model to LLaMA to get the true Vicuna weights? @TsuTikgiau
I'm trying this right now. It will take some time to combine the weights.
I want to know: must I fill out the form before I can get the LLaMA weights?
Check this
I just met the same issue. And I found that some keys in Vicuna are fine-tuned. Is it right to directly download the checkpoint from Hugging Face and load the special keys in `pretrained_minigpt4.pth`?
Maybe you could provide a whole checkpoint of MiniGPT-4 to make reproducing the results easier.
Thanks!
@TsuTikgiau Thanks for your guidance! We will try this.
BTW, does the MiniGPT-4 model support in-context learning (like Flamingo)?
@TsuTikgiau Thanks for your suggestion! I will try it later.
@alibabadoufu @Richar-Du Thanks for your interest! Another user in this issue, @Andy1621, shows issues similar to your cases. According to his explanation, he didn't add the delta Vicuna weights back to LLaMA to get the final Vicuna weights. According to LLaMA's rules, we cannot distribute LLaMA's weights; that is the reason models like Vicuna or Alpaca-LoRA have to release delta weights instead of the directly working weights. I'm currently preparing a guide for the Vicuna preparation, but if you are in a hurry, you can directly follow Vicuna's instructions to create the final working weights.
Thanks so much for your response. May I know which commit you used to convert the weights? I was using the latest FastChat commit, but I think it no longer supports conversion for the v0 Vicuna weights. I was trying the v0.1 tag, but the result seems weird.
@vtddggg We haven't tested the in-context learning ability yet. However, it is possible to construct such a setting using our code and check the performance. You can check the `Chat.answer` function in the file `minigpt4/conversation/conversation.py` to see how we add the image embedding to the text embedding if you want to build an in-context learning test.
@alibabadoufu 🤔 Your steps look OK to me. I just created a guide in PrepareVicuna.md under the root path. I also remade the Vicuna weights from scratch just now, and they work fine in my case. Maybe you can first check the guide and see if there is something different from what you did? May I ask if you see any strange outputs when you apply your delta to LLaMA? And just to double-check: you also load the pretrained single layer we provide in the readme, right?
Thanks for the efforts :D I will give PrepareVicuna.md a try later.
By the way, regarding the pretrained single layer, are you referring to this step?

> **3. Prepare the pretrained MiniGPT-4 checkpoint**
> To play with our pretrained model, download the pretrained checkpoint [here](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link). Then, set the path to the pretrained checkpoint in the evaluation config file [eval_configs/minigpt4_eval.yaml](https://github.com/Vision-CAIR/MiniGPT-4/blob/main/eval_configs/minigpt4_eval.yaml#L10) at Line 11.

If so, then I can confirm I followed this instruction when reproducing the work earlier.
@alibabadoufu It works for me after using the correct weights of Vicuna.
However, there is some strange strikethrough in my results. I'm not sure whether it's a bug in Gradio or something else...
Here are my steps:
- Download the original LLaMA weights here.
- Use the following code to convert the weights (copy `convert_llama_weights_to_hf.py` from `transformers`):

  ```shell
  python src/transformers/models/llama/convert_llama_weights_to_hf.py \
      --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path
  ```

- Produce the Vicuna weights (using tag v0.1.10):

  ```shell
  python3 -m fastchat.model.apply_delta \
      --base /path/to/llama-13b \
      --target /output/path/to/vicuna-13b \
      --delta lmsys/vicuna-13b-delta-v1.0
  ```
I suspect the LLaMA weights on Hugging Face are not the same as what the authors originally used for this project. I will give it a try later following your instructions. Thanks a lot, Andy!
I think I found the reason.
In the eval_configs/minigpt4_eval.yaml file, if we activate low_resource, I get the same strange result I reported above. However, if I deactivate it like the following:

```yaml
model:
  arch: mini_gpt4
  model_type: pretrain_vicuna
  freeze_vit: True
  freeze_qformer: True
  max_txt_len: 160
  end_sym: "###"
  low_resource: False
  prompt_path: "prompts/alignment.txt"
  prompt_template: '###Human: {} ###Assistant: '
  ckpt: '/data/jasper.laiwy/MiniGPT-4/pretrained_minigpt4.pth'
```
The result looks awesome now!
But the crossed-out text is still there.
I guess the strikethrough issue might be caused by the tokenizer. Maybe you can try printing the output token IDs for the text with a strikethrough; I can help check how my tokenizer decodes the IDs and whether they contain something strange. To print the token IDs for debugging, you can simply add `print(output_token.cpu().numpy())` in `Chat.answer` in `minigpt4/conversation/conversation.py`.
When trying to merge the delta for 13B, I am getting:

```
RuntimeError: The size of tensor a (32000) must match the size of tensor b (32001) at non-singleton dimension 0
```

Any ideas or help?
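A hedged sketch of what that RuntimeError means here: the v0 delta was built against a vocabulary with one extra token, so the base embedding has 32000 rows while the delta has 32001, and element-wise addition fails. A compatible FastChat version handles the resize before adding; the toy version below uses scalar "rows", and the zero-row resize step is an illustration of the shape fix, not the actual FastChat code:

```python
def add_weights(base, delta):
    """Element-wise sum, mirroring the tensor-shape check that fails."""
    if len(base) != len(delta):
        raise RuntimeError(
            f"The size of tensor a ({len(base)}) must match the size of "
            f"tensor b ({len(delta)}) at non-singleton dimension 0")
    return [b + d for b, d in zip(base, delta)]

base = [0.1] * 32000   # base vocab "rows" (scalars standing in for embedding rows)
delta = [0.2] * 32001  # delta built with one extra token

try:
    add_weights(base, delta)
except RuntimeError as err:
    print(err)  # reproduces the reported size mismatch

base = base + [0.0]    # resize: append a row for the added token
merged = add_weights(base, delta)
print(len(merged))     # 32001
```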
You used the wrong FastChat version. I guess you were using the latest version, which doesn't support weight merging for the v0 Vicuna model. You need to check out the v0.1.10 tag; please read through the comments above.
I added the print function here, in /data/jasper.laiwy/MiniGPT-4/minigpt4/conversation/conversation.py:

```python
if output_token[0] == 0:
    output_token = output_token[1:]
output_text = self.model.llama_tokenizer.decode(output_token, add_special_tokens=False)
output_text = output_text.split('###')[0]  # remove the stop sign '###'
output_text = output_text.split('Assistant:')[-1].strip()
print(output_token.cpu().numpy())
conv.messages[-1][1] = output_text
return output_text, output_token.cpu().numpy()
```
Here are the output tokens:

```
[    1   450  6114   297   278  1967   338 13407   297  4565   310   263
 19571   591  4362   263   302  1151 28684   269  3466 10714   411   263
   715   686   292 18873  1220   322   263  1183   261  1250 29889  2296
   338  3063   472   902 17842   297   278 19571 29889  2277 29937]
```

And here is the output text:

> The woman in the image is standing in front of a mirror wearing a nude colored slip dress with a plunging neckline and a sheer back. She is looking at her reflection in the mirror.
@alibabadoufu You can simply remove the `<s>` in the output, which is interpreted as strikethrough in Markdown:

```python
llm_message = llm_message.replace("<s>", "")
```
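If the end-of-sequence marker leaks into the decoded text as well, a slightly broader cleanup can help. Handling `</s>` in addition to `<s>` is an assumption beyond the original suggestion:

```python
def strip_special(llm_message):
    """Drop leaked sentence-boundary markers from decoded model output."""
    for tok in ("<s>", "</s>"):
        llm_message = llm_message.replace(tok, "")
    return llm_message.strip()

print(strip_special("<s> Hello there.</s>"))  # Hello there.
```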
Thanks :D I was looking for ways to remove it; thanks for your suggestion. Now it works like a charm!
Thanks for all the team members' efforts on this project.
@vtddggg Have you tried the in-context learning with it?