I successfully set up the project on Google Colab following the tutorial on the project homepage.
I then tried to set up XrayGLM locally under WSL2 (Ubuntu 22.04), but it fails at runtime.
First error:
lsj@DESKTOP-H1KB736:/mnt/c/Users/38561/xrayglm$ python cli_demo.py --from_pretrained checkpoints/checkpoints-XrayGLM-300 --prompt_zh '详细描述这张胸部X光片的诊断结果'
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
bin /home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/lsj/.conda/envs/xrayglm did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda-11.6/lib64}')}
warn(msg)
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/cuda-11.6/lib64} did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function
errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary /home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
[2023-05-31 16:25:18,755] [WARNING] Failed to load bitsandbytes:No module named 'scipy'
[2023-05-31 16:25:18,763] [INFO] building FineTuneVisualGLMModel model ...
40901
[2023-05-31 16:25:18,845] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-05-31 16:25:18,846] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
Traceback (most recent call last):
  File "/mnt/c/Users/38561/xrayglm/cli_demo.py", line 104, in <module>
    main()
  File "/mnt/c/Users/38561/xrayglm/cli_demo.py", line 30, in main
    model, model_args = AutoModel.from_pretrained(
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/base_model.py", line 282, in from_pretrained
    model = get_model(args, model_cls, **kwargs)
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/base_model.py", line 305, in get_model
    model = model_cls(args, params_dtype=params_dtype, **kwargs)
  File "/mnt/c/Users/38561/xrayglm/finetune_XrayGLM.py", line 13, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kw_args)
  File "/mnt/c/Users/38561/xrayglm/model/visualglm.py", line 29, in __init__
    super().__init__(args, transformer=transformer, **kwargs)
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/official/chatglm_model.py", line 170, in __init__
    super(ChatGLMModel, self).__init__(args, transformer=transformer, activation_func=gelu, **kwargs)
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/base_model.py", line 88, in __init__
    self.transformer = BaseTransformer(
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/transformer.py", line 427, in __init__
    [get_layer(layer_id) for layer_id in range(num_layers)])
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/transformer.py", line 427, in <listcomp>
    [get_layer(layer_id) for layer_id in range(num_layers)])
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/transformer.py", line 402, in get_layer
    return BaseTransformerLayer(
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/transformer.py", line 313, in __init__
    self.mlp = MLP(
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/transformer.py", line 189, in __init__
    self.dense_h_to_4h = ColumnParallelLinear(
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/mpu/layers.py", line 219, in __init__
    self.weight = Parameter(torch.empty(self.output_size_per_partition,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 12.00 GiB total capacity; 11.25 GiB already allocated; 0 bytes free; 11.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
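Before quantizing, the OOM message itself suggests one mitigation: capping the allocator's split size via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation. A minimal sketch; the value 128 is only a guess, not a verified setting for this model:

```shell
# As hinted by the OOM message: limit the CUDA caching allocator's block
# split size to reduce fragmentation, then re-run cli_demo.py in this shell.
# 128 MiB is an illustrative value, not a tuned one.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

This only helps when reserved memory far exceeds allocated memory; with 11.25 GiB already allocated on a 12 GiB card, quantization (as tried below) is the more promising route.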
It looks like the GPU ran out of memory, so I switched the model to --quant 4. Second error:
(xrayglm) lsj@DESKTOP-H1KB736:/mnt/c/Users/38561/xrayglm$ python cli_demo.py --quant 4 --from_pretrained checkpoints/checkpoints-XrayGLM-300 --prompt_zh '详细描述这张胸部X光片的诊断结果'
(same bitsandbytes BUG REPORT banner and CUDA setup warnings as in the first run, omitted)
[2023-05-31 16:32:29,588] [WARNING] Failed to load bitsandbytes:No module named 'scipy'
[2023-05-31 16:32:29,593] [INFO] building FineTuneVisualGLMModel model ...
42795
[2023-05-31 16:32:29,645] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-05-31 16:32:29,647] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
replacing layer 0 with lora
replacing layer 14 with lora
[2023-05-31 16:32:55,759] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7811237376
[2023-05-31 16:32:59,754] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/checkpoints-XrayGLM-300/300/mp_rank_00_model_states.pt
Killed
It seems scipy was not installed:
pip install scipy
Collecting scipy
Downloading scipy-1.10.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.5/34.5 MB 6.1 MB/s eta 0:00:00
Requirement already satisfied: numpy<1.27.0,>=1.19.5 in /home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages (from scipy) (1.24.3)
Installing collected packages: scipy
Successfully installed scipy-1.10.1
Third error, after installing scipy:
(xrayglm) lsj@DESKTOP-H1KB736:/mnt/c/Users/38561/xrayglm$ python cli_demo.py --quant 4 --from_pretrained checkpoints/checkpoints-XrayGLM-300 --prompt_zh '详细描述这张胸部X光片的诊断结果'
(same bitsandbytes BUG REPORT banner and CUDA setup warnings as in the first run, omitted)
[2023-05-31 16:36:46,280] [INFO] building FineTuneVisualGLMModel model ...
60615
[2023-05-31 16:36:46,285] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-05-31 16:36:46,287] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
replacing layer 0 with lora
replacing layer 14 with lora
[2023-05-31 16:36:53,258] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7811237376
[2023-05-31 16:36:53,886] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/checkpoints-XrayGLM-300/300/mp_rank_00_model_states.pt
Killed
nvcc -V does show CUDA, so this looked like a bitsandbytes problem; I patched it following https://blog.csdn.net/anycall201/article/details/129930919
In the end it was still killed:
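One detail worth noting from the earlier warnings: bitsandbytes searched the path '/usr/local/cuda-11.6/lib64}' with a stray closing brace, which hints at a `${...}` typo in a shell startup file such as ~/.bashrc (an assumption worth checking). A quick way to inspect the search path:

```shell
# Print each LD_LIBRARY_PATH entry on its own line and look for a stray '}'
# (the bitsandbytes warnings above searched '/usr/local/cuda-11.6/lib64}').
echo "$LD_LIBRARY_PATH" | tr ':' '\n'
```

If an entry ends in '}', fixing the corresponding export line and re-sourcing the shell config should make the CUDA runtime discoverable without patching bitsandbytes.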
(xrayglm) lsj@DESKTOP-H1KB736:/mnt/c/Users/38561/XrayGLM$ python cli_demo.py --quant 4 --from_pretrained checkpoints/checkpoints-XrayGLM-300 --prompt_zh '详细描述这张胸部X光片的诊断结果'
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
bin /home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
CUDA SETUP: Loading binary /home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda116.so...
[2023-05-31 16:48:33,857] [INFO] building FineTuneVisualGLMModel model ...
60827
[2023-05-31 16:48:33,862] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-05-31 16:48:33,864] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
replacing layer 0 with lora
replacing layer 14 with lora
[2023-05-31 16:48:40,797] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7811237376
[2023-05-31 16:48:42,470] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/checkpoints-XrayGLM-300/300/mp_rank_00_model_states.pt
Killed
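For context on the repeated "Killed": a bare "Killed" with no Python traceback usually means the Linux OOM killer terminated the process for exhausting system RAM, not GPU memory. The log reports 7811237376 parameters, so loading the checkpoint in fp16 needs on the order of 15 GB of RAM plus overhead, and by default WSL2 caps its VM at roughly half of the host's memory. A quick check of what the VM actually sees:

```shell
# Show how much RAM the WSL2 VM sees; loading the ~7.8B-parameter
# checkpoint in fp16 needs roughly 15 GB plus overhead.
grep -E 'MemTotal|MemAvailable' /proc/meminfo
```

If MemTotal is too small, raising the cap in a Windows-side .wslconfig (a `[wsl2]` section with `memory=` and, if needed, `swap=` entries, per the WSL documentation) and then running `wsl --shutdown` is a common remedy; the exact values depend on your host machine.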
Any help would be appreciated, thanks!