coobiw / minigpt4qwen Goto Github PK

Personal Project: MPP-Qwen14B & MPP-Qwen-Next(Multimodal Pipeline Parallel based on Qwen-LM). Support [video/image/multi-image] {sft/conversations}. Don't let the poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

Python 21.32% Shell 0.03% Jupyter Notebook 78.65%

multimodal-large-language-models deepspeed model-parallel pipeline-parallelism mllm qwen fine-tuning pretraining video-language-model video-large-language-models

minigpt4qwen's Introduction

Hi ! Here is Coobiw 👋

🙋‍♂️ About Me:

👨‍🦰 I’m currently a M.Phil candidate of Peking University.
👦 Before that, I received the (Honours) B.E., HUST.
❤️‍🔥 Now, I am intersted in Multi-modal Learning especially MLLM.

😋 Projects:

💥 In 2023 summer, I take part in OSPP(Open Source Promotion Plan) Summer Camp , with the honor of contributing for MMPretrain to build prompt-based classifier.
- Now, the implement of zero-shot CLIP classifier has been merged to the main branch. PR Link
- The implement of RAM(Recognize Anything Model) has been merged to the dev branch. Welcome to use the gradio WebUI to test it on MMPretrain! PR Link
💥 2023.10: I implement MiniGPT4Qwen, which is a toy model aligning MiniGPT4 with Qwen-Chat LLM model. I just use 18.8k high quality instruction-tuning data(bi-lingual, selected from minigpt4 and llava). Just fine-tuning the projection layer (3M trainable parameters), this model support Chinese and English! MiniGPT4Qwen
💥 2024.2: I extend MiniGPT4Qwen to MPP-Qwen14B(Multimodal Pipeline Parallel), scaling up both the LLM(to Qwen-14B-Chat) and pretrain-data(using LLaVA-pretrain-data). I also unfreeze the whole LLM during SFT-stage. All training is conducted on 3090/4090 GPUs. To prevent poverty (24GB of VRAM) from limiting imagination, I implemented an MLLM version based on deepspeed Pipeline Parallel. Pre-training can be completed in 22 hours on 2x4090s, while SFT requires training on 6x4090s (because it needs to fully activate the LLM), but due to the small amount of data, it only takes several hours.MPP-Qwen14B
💥 2024.6: MPP-Qwen-Next is released! Support {video/image/multi-image} {single/multi-turn} conversations. All training periods are conducted on 8 RTX3090(24GB) GPUs. MPP-Qwen-Next.

minigpt4qwen's People

Contributors

Stargazers

Watchers

Forkers

lxysl abbhay strategist922 jonathanpaul10 seekpoint sundogs8603 turbo-agi pkuqryan tfgbestneal 157459387 lymdlut jie311 quduoduo

minigpt4qwen's Issues

Code confusion

https://github.com/Coobiw/MiniGPT4Qwen/blob/d13f9657614a6be7553c850b7f95b4c31832eeef/lavis/models/minigpt4qwen_models/minigpt4qwen.py#L110C18-L110C18
predefine ：qformer_text_input = True
lavis repo code ：self.Qformer.resize_token_embeddings(len(self.tokenizer))

your code：raise NotImplementedError

Why do you change it like this?

请问Qwen-7B的权重文件是只需要LFS的吗？还是全部文件都要呢？

ValueError: unknown url type: '/export/dataset/minigpt4/minigpt4_minigpt4qwen_format.json'

root@autodl-container-809011af9e-f140ab75:~/autodl-tmp/MiniGPT4Qwen-master# CUDA_VISIBLE_DEVICES=0 python train.py --cfg-path lavis/projects/instruction_tuning/train.yaml
Not using distributed mode
2023-11-11 14:10:55,477 [INFO]
===== Running Parameters =====
2023-11-11 14:10:55,478 [INFO] {
"accum_grad_iters": 16,
"amp": false,
"batch_size_eval": 1,
"batch_size_train": 1,
"device": "cuda",
"dist_url": "env://",
"distributed": false,
"evaluate": false,
"grad_norm_clip": 1.0,
"init_lr": 0.0001,
"lr_sched": "linear_warmup_cosine_lr",
"max_epoch": 10,
"min_lr": 1e-06,
"num_workers": 4,
"output_dir": "output/instruction_tuning/lr1e-4",
"resume_ckpt_path": null,
"seed": 42,
"task": "image_text_pretrain",
"train_splits": [
"train"
],
"warmup_lr": 0,
"warmup_steps": 500,
"weight_decay": 0.05,
"world_size": 1
}
2023-11-11 14:10:55,478 [INFO]
====== Dataset Attributes ======
2023-11-11 14:10:55,478 [INFO]
======== minigpt4_instruction =======
2023-11-11 14:10:55,479 [INFO] {
"build_info": {
"annotations": {
"train": {
"storage": "dataset/minigpt4/minigpt4_minigpt4qwen_format.json",
"url": "/export/dataset/minigpt4/minigpt4_minigpt4qwen_format.json"
}
},
"images": {
"storage": "dataset/minigpt4/image"
}
},
"data_type": "images",
"text_processor": {
"train": {
"max_words": 100,
"name": "base_instruction"
}
},
"vis_processor": {
"train": {
"image_size": 224,
"name": "blip2_image_train"
}
}
}
2023-11-11 14:10:55,479 [INFO]
======== llava_instruction =======
2023-11-11 14:10:55,479 [INFO] {
"build_info": {
"annotations": {
"train": {
"storage": "dataset/llava/llava_minigpt4qwen_format.json",
"url": "/export/dataset/llava/llava_minigpt4qwen_format.json"
}
},
"images": {
"storage": "dataset/llava/image"
}
},
"data_type": "images",
"text_processor": {
"train": {
"max_words": 100,
"name": "base_instruction"
}
},
"vis_processor": {
"train": {
"image_size": 224,
"name": "blip2_image_train"
}
}
}
2023-11-11 14:10:55,479 [INFO]
====== Model Attributes ======
2023-11-11 14:10:55,480 [INFO] {
"arch": "minigpt4qwen",
"autocast_dtype": "bfloat16",
"drop_path_rate": 0,
"finetuned": "",
"freeze_proj": false,
"freeze_qformer": true,
"freeze_queries": true,
"freeze_vit": true,
"get_lora": false,
"image_size": 224,
"llm_model": "ckpt/qwen_14b_xiaoyu",
"load_finetuned": false,
"load_pretrained": true,
"lora_alpha": 32,
"lora_dropout": 0.05,
"lora_r": 8,
"max_txt_len": 256,
"model_type": "qwen7b_chat",
"num_query_token": 32,
"pretrained": "ckpt/blip2/blip2_pretrained_flant5xxl.pth",
"qformer_text_input": false,
"unfreeze_pos_embed": false,
"use_grad_checkpoint": true,
"vit_model": "eva_clip_g",
"vit_precision": "fp16"
}
/export/dataset/minigpt4/minigpt4_minigpt4qwen_format.json
Traceback (most recent call last):
File "train.py", line 103, in
main()
File "train.py", line 93, in main
datasets = task.build_datasets(cfg)
File "/root/autodl-tmp/MiniGPT4Qwen-master/lavis/tasks/base_task.py", line 57, in build_datasets
dataset = builder.build_datasets()
File "/root/autodl-tmp/MiniGPT4Qwen-master/lavis/datasets/builders/base_dataset_builder.py", line 52, in build_datasets
self._download_data()
File "/root/autodl-tmp/MiniGPT4Qwen-master/lavis/datasets/builders/base_dataset_builder.py", line 99, in _download_data
self._download_ann()
File "/root/autodl-tmp/MiniGPT4Qwen-master/lavis/datasets/builders/base_dataset_builder.py", line 157, in _download_ann
download_url(url=url_or_filename, root=dirname, filename=filename)
File "/root/miniconda3/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 134, in download_url
url = _get_redirect_url(url, max_hops=max_redirect_hops)
File "/root/miniconda3/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 82, in _get_redirect_url
with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as response:
File "/root/miniconda3/lib/python3.8/urllib/request.py", line 328, in init
self.full_url = url
File "/root/miniconda3/lib/python3.8/urllib/request.py", line 354, in full_url
self._parse()
File "/root/miniconda3/lib/python3.8/urllib/request.py", line 383, in _parse
raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: '/export/dataset/minigpt4/minigpt4_minigpt4qwen_format.json'

请问如何用Qwen-14B进行重新训练

Error：safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

你好，我在按步骤实现minigpt4_qwen-7b-chat时，在执行python test_model_chat.py这一语句时出现错误，safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge。能请教以下如何解决吗？十分感谢。

学习率一直是1e-4不会下降？

run:
task: image_text_pretrain

optimizer

lr_sched: "linear_warmup_cosine_lr"
init_lr: 1e-4
min_lr: 1e-6
warmup_lr: 0
warmup_steps: 500
weight_decay: 0.05
grad_norm_clip: 1.
max_epoch: 1 #5
batch_size_train: 1 #16
batch_size_eval: 1
num_workers: 4
accum_grad_iters: 16 #1

请问为什么我的学习率在到了500step后就一直是1e-4不会下降？

deepspeed training, meet the error "ValueError: optimizer got an empty parameter list"

When i run ddp training code "python -m torch.distributed.run --nproc_per_node=2 --master_port=12233 train_pipeline.py --cfg-path lavis/projects/pp_qwen14b/train_pp.yaml --num-stages 2",GPU0 Trainable Params: 3937280， GPU1 Trainable Params: 0.
Because one card is enough to measure parameters, this problem seems to occur. Is this what you mean?
GPU1 Trainable Params: 0
Traceback (most recent call last):
File "train_pipeline.py", line 260, in
main()
File "train_pipeline.py", line 181, in main
engine, optimizer, _, _ = deepspeed.initialize(
File "/home/duser/miniconda3/envs/gpt/lib/python3.8/site-packages/deepspeed/init.py", line 192, in initialize
engine = PipelineEngine(args=args,
File "/home/duser/miniconda3/envs/gpt/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 68, in init
super().init(*super_args, **super_kwargs)
File "/home/duser/miniconda3/envs/gpt/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 307, in init
self._configure_optimizer(optimizer, model_parameters)
File "/home/duser/miniconda3/envs/gpt/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1230, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/home/duser/miniconda3/envs/gpt/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1307, in _configure_basic_optimizer
optimizer = FusedAdam(
File "/home/duser/miniconda3/envs/gpt/lib/python3.8/site-packages/deepspeed/ops/adam/fused_adam.py", line 90, in init
super(FusedAdam, self).init(params, defaults)
File "/home/duser/miniconda3/envs/gpt/lib/python3.8/site-packages/torch/optim/optimizer.py", line 187, in init
raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list
GPU0 Trainable Params: 3937280

请问如何减少训练时长

预训练loss 到1.4左右，step到900多就基本不动了，然后还一直训练？是我的显卡问题么？

special token

为什么使用''这一个special token标识图像位置不需要重新训练word_embedding层和最后的lm_head输出层？

关于知乎中提到的多模态接入方案问题

我理解是在推理时用到特有的接入方案，但是我在项目演示截图，在我们的指令中并没有看到类似<Img><ImageHere></Img>的输入嵌入？这是为什么呢？

请问是否支持流水线并行推理

从而减少 bubble time

训练loss异常

按照您的步骤进行复现，训练时发现loss下降很慢，此外训练完的train_loss相比您提供的也不是一个数量级（高一个数量级）

分布式设置错误

请问代码一直卡在了torch.distributed.init_process_group 这个方法，请问如何解决？

环境信息：单击多卡

OS 设置：
os.environ['RANK'] = '0'
os.environ['WORLD_SIZE'] = '4' # 因为我需要只用其中的4张卡
os.environ['LOCAL_RANK'] = '0'
os.environ['MASTER_ADDR'] = '127.0.0.1' # rank0 对应的地址
os.environ['MASTER_PORT'] = '29500' # 任何空闲的端口
os.environ['NCCL_IB_DISABLE'] = "1"
os.environ['NCCL_IBEXT_DISABLE'] = "1"

下面的代码一直超时，请问是哪里设置错误了么？

args.dist_url: "env://"
args.dist_backend = "nccl"

torch.distributed.init_process_group(
backend=args.dist_backend,
init_method=args.dist_url,
world_size=args.world_size,
rank=args.rank,
timeout=datetime.timedelta(
seconds=10
), # allow auto-downloading and de-compressing
)
torch.distributed.barrier()

Encountering RuntimeError Related to Process Group Initialization on RTX 3090

While attempting to train my model on an RTX 3090 GPU, I came across the following error: RuntimeError: Default process group has not been initialized. It seems like init_process_group has not been called.

I'd appreciate it if anyone could provide guidance on resolving this issue.

请教下为什么选用 qwen LLM 有和其他 baichuan2，chatglm3 做过比较么

和千问VL做过比较吗？

是否支持QWEN-14B的INT4的量化版本？

是否支持QWEN-14B的INT4的量化版本？谢谢！

huggingface 下载的Qwen7B-chat/None

line 317, in set_module_tensor_to_device
new_value = value.to(device)
File "D:\anaconda3\envs\minigpt4qwen\lib\site-packages\torch\cuda_init_.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

pip installed torch-2.1.0+cu118 torchaudio-2.1.0+cu118 torchvision-0.16.0+cu118