
minigpt-4's People

Contributors

152334h, amalchandran, digger-yu, junchen14, lx709, mhelhoseiny, mohamedfahmed, n4rma, nov30th, sypherd, tsutikgiau, xiaoqian-shen, zhwt


minigpt-4's Issues

Unable to combine Vicuna’s delta weight and original weight

Has anyone else had the same problem as me?

[screenshot of the error]

I followed PrepareVicuna.md and downloaded llama-13b-hf with

git clone https://huggingface.co/decapoda-research/llama-13b-hf.

But the problem occurred at the last step.
Could this be the cause of the problem? (I am running it on Ubuntu, though.)

[screenshot of the suspected cause]
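For reference, the merge step in PrepareVicuna.md is a FastChat command along these lines (paths are placeholders; the flag names follow the v0-era FastChat instructions and may differ in newer releases):

python -m fastchat.model.apply_delta --base /path/to/llama-13b-hf --target /path/to/save/vicuna-13b --delta /path/to/vicuna-13b-delta-v0

Note that if the git clone above ran without git-lfs, the large weight files will be pointer stubs and the merge will fail.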

Loading LLAMA Error

This might be a dumb question, but any ideas what this error means or how to solve it?

Loading LLAMA
Traceback (most recent call last):
  File "/content/drive/MyDrive/chatgpt4/MiniGPT-4/demo.py", line 57, in <module>
    model = model_cls.from_config(model_config).to('cuda:0')
  File "/content/drive/MyDrive/chatgpt4/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 241, in from_config
    model = cls(
  File "/content/drive/MyDrive/chatgpt4/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 85, in __init__
    self.llama_tokenizer = LlamaTokenizer.from_pretrained(llama_model, use_fast=False)
  File "/usr/local/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/envs/minigpt4/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/usr/local/envs/minigpt4/lib/python3.9/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/envs/minigpt4/lib/python3.9/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
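This particular sentencepiece failure means tokenizer.model could not be parsed, which in practice usually indicates a corrupt or incomplete file, for example a git-lfs pointer stub left behind by cloning without git-lfs. A quick hedged check (the path is a placeholder):

import os

# A real LLaMA/Vicuna tokenizer.model is on the order of 500 KB;
# a git-lfs pointer stub is only a few hundred bytes.
print(os.path.getsize("/path/to/vicuna/weights/tokenizer.model"))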

Loading the model on multiple GPUs

I have two 4090s (24 GB each). If possible, please add an extra argument to demo.py to load the model on the CPU or across two or more GPUs, and another argument to run in 16-bit and take advantage of the extra GPU RAM, instead of requiring config-file edits.
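A minimal sketch of what such an option could do under the hood, assuming the standard transformers/accelerate loading API (the path and memory limits are placeholders):

import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "/path/to/vicuna/weights",            # placeholder path
    torch_dtype=torch.float16,            # 16-bit weights
    device_map="auto",                    # let accelerate shard layers across the GPUs
    max_memory={0: "22GiB", 1: "22GiB"},  # leave headroom on each 24 GB card
)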

Conda env create error on Mac M2

MacBook Air M2

Steps to reproduce:

conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • cudatoolkit

Error when preparing Vicuna v0 weights? Can I use Vicuna v1.1?

Hi,

Thanks for releasing the interesting work.
I'm trying to deploy it on my server.
However, I encountered some difficulties when preparing Vicuna weights.

When applying the delta weights of Vicuna to the original LLaMA weights, I always get a vocab mismatch error like this:
RuntimeError: The size of tensor a (32000) must match the size of tensor b (32001) at non-singleton dimension 0
I've searched the issues in the FastChat repo but didn't find an effective solution.
The author of FastChat suggests moving directly to Vicuna v1.1, since a lot of issues have been fixed in the new version.
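For context, one workaround reported in the FastChat issues (a sketch only, not verified here) is to resize the base model's embeddings to the delta's vocabulary size before merging, since the v0 delta adds one extra token:

from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained("/path/to/llama-13b-hf")  # placeholder
base.resize_token_embeddings(32001)  # match tensor b's size from the error
# ...then apply the delta as in PrepareVicuna.md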

I'd like to ask

  1. Do you have any experiences/suggestions to solve the issues I encountered?
  2. Do you think it's feasible to move directly to Vicuna v1.1? I noticed some changes in the new version, e.g. the separator has changed from ### to </s>. I'm not sure whether it is compatible with MiniGPT-4.

Thanks!

`load_in_8bit_fp32_cpu_offload=True`

Any idea how to solve this:

Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
device_map to from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.

I have 48 GB of VRAM; the GPU RAM should be enough!
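A hedged sketch of what the message asks for, assuming the bitsandbytes integration in transformers (llm_int8_enable_fp32_cpu_offload is the documented config counterpart of the flag named in the error; the path is a placeholder):

from transformers import BitsAndBytesConfig, LlamaForCausalLM

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,  # keep offloaded modules in fp32 on the CPU
)
model = LlamaForCausalLM.from_pretrained(
    "/path/to/vicuna/weights",  # placeholder
    quantization_config=quant_config,
    device_map="auto",          # or a custom map that pins every module to a GPU
)

If the 48 GB is spread across several GPUs, an explicit device_map that keeps all modules on the GPUs avoids the message entirely.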

Weird result

I was trying to serve the model, and after I successfully uploaded the image, the response seemed weird.

[screenshot of the weird response]

Interleaved image/text chat

Correct me if I'm wrong, but reading the BLIP paper, I believe there's no reason we can't have some kind of interleaved chat with the model, correct? For instance, asking a question about one image, then asking a question about another image, then asking a question that relates the conversation and the two images?

I'm wondering if you could support this in your code?
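A sketch of what that could look like with the Chat helper from demo.py (chat.answer matches the signature quoted elsewhere in these issues; upload_img and ask are assumed from the demo code):

chat.upload_img(image1, chat_state, img_list)
chat.ask("What is in the first image?", chat_state)
answer1 = chat.answer(conv=chat_state, img_list=img_list, max_new_tokens=300)[0]

chat.upload_img(image2, chat_state, img_list)  # second image, same conversation
chat.ask("How does the second image relate to the first?", chat_state)
answer2 = chat.answer(conv=chat_state, img_list=img_list, max_new_tokens=300)[0]

Whether the model attends sensibly to both images without further fine-tuning is exactly the open question.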

Model weights

Will the model weights used to power the demo be released?

segmentation fault

mac Apple M1 Pro 32 GB

Vision-CAIR/MiniGPT-4/minigpt4/models/eva_vit.py
self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

num_patches = 256
embed_dim = 1408

But I got the error: "[1] 14423 segmentation fault python demo.py --cfg-path eval_configs/minigpt4_eval.yaml"

Then I found that whenever num_patches >= 24, it fails with "segmentation fault".

Why?

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory

Hi, all:
I cloned the repo and downloaded the weights from https://huggingface.co/lmsys/vicuna-13b-delta-v0/tree/main. When I run

python demo.py --cfg-path eval_configs/minigpt4_eval.yaml

I encounter the following error:

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in 
directory /media/sdb/xxx/LLama/vicuna_weights.

I double-checked the downloaded weights in /media/sdb/xxx/LLama/vicuna_weights:

[screenshot of the directory listing]

Does anyone know what causes the OSError? Thanks.

Windows not supported

To save fellow Windows users some time and download bandwidth, this project only runs on Linux due to (at least) one of the dependencies not supporting Windows.

You'll receive an error upon running it.

python.exe .\demo.py --cfg-path .\eval_configs\minigpt4_eval.yaml
# Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?

The .so extension is for Linux only.

Trouble getting Vicuna weights

I followed the links to get the Vicuna weights, but it appears I need to fill out a form to get the LLaMA weights and then apply the delta model on top of them?

Am I missing something? If not, is there a faster way to get the Vicuna weights?

GPU Memory of A100

Thanks for your excellent work! What's the GPU memory of your A100: 40 GB or 80 GB? I have tried vicuna-13b on 8×A100 (40 GB), but it results in an OOM error.

run demo.py error

Hi,
Thank you for your great project! But when I follow your guide, it raises an error.
My environment is an A100 server with 4 cards.
The error is:

(minigpt4) shenjh@chintAI03:~/github/MiniGPT-4$ python demo.py --cfg-path eval_configs/minigpt4_eval.yaml
/home/shenjh/anaconda3/envs/minigpt4/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
warn(f"Failed to load image Python extension: {e}")
Initializing Chat
Loading VIT
Loading VIT Done
Loading Q-Former
Loading Q-Former Done
Loading LLAMA
Traceback (most recent call last):
  File "/home/shenjh/github/MiniGPT-4/demo.py", line 57, in <module>
    model = model_cls.from_config(model_config).to('cuda:0')
  File "/home/shenjh/github/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 241, in from_config
    model = cls(
  File "/home/shenjh/github/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 85, in __init__
    self.llama_tokenizer = LlamaTokenizer.from_pretrained(llama_model, use_fast=False)
  File "/home/shenjh/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1770, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "/home/shenjh/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/home/shenjh/anaconda3/envs/minigpt4/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 112, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/shenjh/anaconda3/envs/minigpt4/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 160, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/path/to/vicuna/weights/'. Use repo_type argument if needed.


Could you give some suggestions?
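The string in the HFValidationError, '/path/to/vicuna/weights/', is the unedited placeholder from the model config, so from_pretrained falls back to treating it as a Hub repo id. Assuming the repo layout, the line to edit is in minigpt4/configs/models/minigpt4.yaml:

llama_model: "/path/to/vicuna/weights/"  # replace with the real merged-weights directory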

When I ask a question, the response errors with IndexError: piece id is out of range

Running demo.py works normally, but when I try to ask a question, something goes wrong:

Traceback (most recent call last):
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/gradio/blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/gradio/blocks.py", line 915, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/MiniGPT-4/demo.py", line 92, in gradio_answer
    llm_message = chat.answer(conv=chat_state, img_list=img_list, max_new_tokens=1000, num_beams=num_beams, temperature=temperature)[0]
  File "/MiniGPT-4/minigpt4/conversation/conversation.py", line 156, in answer
    output_text = self.model.llama_tokenizer.decode(output_token, add_special_tokens=False)
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3486, in decode
    return self._decode(
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 931, in _decode
    filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 912, in convert_ids_to_tokens
    tokens.append(self._convert_id_to_token(index))
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 129, in _convert_id_to_token
    token = self.sp_model.IdToPiece(index)
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/sentencepiece/__init__.py", line 1045, in _batched_func
    return _func(self, arg)
  File "/root/anaconda3/envs/minigpt4/lib/python3.9/site-packages/sentencepiece/__init__.py", line 1038, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.
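This error typically means a generated token id falls outside the tokenizer's vocabulary, which points to a model/tokenizer vocab mismatch (for example, v0 weights paired with a v1.1 tokenizer, or an incompletely merged delta). A hedged check, assuming access to the loaded MiniGPT-4 object from mini_gpt4.py:

# Both numbers should match (the v0 Vicuna weights reportedly use 32001).
print(len(model.llama_tokenizer))
print(model.llama_model.get_input_embeddings().weight.shape[0])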

Error running evaluation on dual GPUs

I'm attempting to run the model on dual RTX 4090s. Enabling this would be a great update and would allow more people to run the full float16 model.

Some changes would need to be made, starting by passing the following kwargs to LlamaForCausalLM.from_pretrained:

kwargs = {
    "device_map": "auto",
    "max_memory": {i: "13GiB" for i in range(num_gpus)},
}

After making this change the model loads but throws the following error when submitting text:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Full Logs
minigpt4-error-logs.txt
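For what it's worth, with a sharded device_map this error usually means some tensor was hard-coded to cuda:0 while the layer consuming it was placed on cuda:1. A hedged sketch of the usual fix (llama_model is the attribute name used in mini_gpt4.py):

# Create inputs on the device holding the input embeddings instead of
# hard-coding 'cuda:0'.
device = model.llama_model.get_input_embeddings().weight.device
input_ids = input_ids.to(device)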

Issues with image loading and accelerate

FYI, when starting the demo I get the following message:
torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file:

The other one concerns Hugging Face accelerate:

This model has some weights that should be kept in higher precision, you need to upgrade accelerate to properly deal with them (pip install --upgrade accelerate)

Name is misleading

This project is not related to GPT-4. It is not part of the same lineage, from the same company, or anywhere near the same size or composition. Unsuspecting users can reasonably be expected to infer from the name that some connection to GPT-4 exists, and they will be misled.

I believe the current name is an effort to capitalize on GPT-4's marketing, and that this is inappropriate.

difficulties involved with inference on a consumer GPU

Here are the problems I've found today:

  1. Vicuna is loaded as fp16. This is a problem for obvious reasons (13B parameters × 2 bytes is more VRAM than any consumer GPU has).
  2. The beam-search default of 5 beams consumes a lot of VRAM during generation.

To address these problems, I have created a fork (a sketch of the loading change follows the list) where:

  • Vicuna is loaded in 8-bit
  • num_beams is set to 1 by default
  • I also put ViT-L on the CPU (as fp32), because the encoder only needs one pass
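A minimal sketch of the 8-bit loading change, assuming the standard transformers API (the path is a placeholder):

from transformers import LlamaForCausalLM

llama = LlamaForCausalLM.from_pretrained(
    "/path/to/vicuna/weights",  # placeholder
    load_in_8bit=True,          # ~13 GB for a 13B model instead of ~26 GB in fp16
    device_map="auto",
)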

dependency conflict in environment.yml

When running conda env create -f environment.yml, eventually you get this:

The conflict is caused by:
    The user requested huggingface-hub==0.12.1
    transformers 4.28.0 depends on huggingface-hub<1.0 and >=0.11.0
    timm 0.6.13 depends on huggingface-hub
    gradio 3.24.1 depends on huggingface-hub>=0.13.0

Trying again with huggingface-hub==0.13.4 right now.

Any idea if this will work on CPU?

First of all, thanks for this great project! The output quality seems very good, and the idea of running a multimodal model locally is awesome. It seems we already have a GPT-4-like multimodal model in our hands, so this is very exciting.

I was wondering if it is possible to run it with llama.cpp on the CPU. I am currently running Vicuna-13B on the CPU (the 4-bit quantized version); around 8 GB of RAM is enough. It works just fine, and the inference speed is about 1.5 tokens per second on my computer. (It also seems to work on mobile phones with enough memory. I did not try it, but I saw a few examples.) llama.cpp has its own file format (ggml) and provides a way to convert from the original weights to ggml. It would be great if people with low VRAM or no VRAM could make it work on the CPU. Any thoughts?

strikethrough character on gradio web app

The web app works great and I don't get any errors, but the responses keep coming back with strikethrough characters. Does anyone know why?

Thank you and congratulations on the project!
[screenshot of the strikethrough responses]

How to solve this?

(minigpt4) C:\Users\AiFeier\Desktop\Code\DeepLearning\AI\MiniGPT-4>python demo.py --cfg-path eval_configs/minigpt4_eval.yaml
Initializing Chat
Loading VIT
Loading VIT Done
Loading Q-Former
Loading Q-Former Done
Loading LLAMA
Traceback (most recent call last):
  File "C:\Users\AiFeier\Desktop\Code\DeepLearning\AI\MiniGPT-4\demo.py", line 57, in <module>
    model = model_cls.from_config(model_config).to('cuda:0')
  File "C:\Users\AiFeier\Desktop\Code\DeepLearning\AI\MiniGPT-4\minigpt4\models\mini_gpt4.py", line 241, in from_config
    model = cls(
  File "C:\Users\AiFeier\Desktop\Code\DeepLearning\AI\MiniGPT-4\minigpt4\models\mini_gpt4.py", line 85, in __init__
    self.llama_tokenizer = LlamaTokenizer.from_pretrained(llama_model, use_fast=False)
  File "E:\environment\conda\Miniconda\envs\minigpt4\lib\site-packages\transformers\tokenization_utils_base.py", line 1770, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "E:\environment\conda\Miniconda\envs\minigpt4\lib\site-packages\transformers\utils\hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "E:\environment\conda\Miniconda\envs\minigpt4\lib\site-packages\huggingface_hub\utils\_validators.py", line 112, in _inner_fn
    validate_repo_id(arg_value)
  File "E:\environment\conda\Miniconda\envs\minigpt4\lib\site-packages\huggingface_hub\utils\_validators.py", line 162, in validate_repo_id
    f" '{repo_id}'. Use repo_type argument if needed."
NameError: name 'repo_type' is not defined

GPU memory leak

I was trying to serve the model on my local machine. However, when I increased the beam size, GPU memory kept growing until it hit the limit.
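A hedged mitigation sketch (not from the repo; model, input_ids, and num_beams stand in for the demo's objects): run generation without autograd state and release cached blocks between requests, since a larger beam count multiplies the decoder's intermediate buffers.

import torch

with torch.inference_mode():  # avoid retaining autograd buffers during generation
    output = model.generate(input_ids, num_beams=num_beams)
torch.cuda.empty_cache()  # return cached allocations to the driver between requests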

Set `padding_side='left'`

When I run demo.py, an error happens: A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
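A one-line sketch of what the warning asks for, applied to the tokenizer setup seen in mini_gpt4.py (a guess at the right place, not a verified patch):

self.llama_tokenizer = LlamaTokenizer.from_pretrained(
    llama_model, use_fast=False, padding_side="left"
)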

Ask anything in video

Hi! We have extended MiniGPT-4 for video question answering in our project Ask-Anything. Without extra instruction fine-tuning, the current results are not satisfactory.
[screenshot of an example result]

In another attempt, we simply encode the video as captions and feed them to ChatGPT, which gives better results.
[screenshot of an example result]

Now we are trying to build a real video chatbot with techniques like those used in MiniGPT-4 and LLaVA. We hope everyone will try our demo and report problems; we will do our best to fix them in a future version.
