I successfully set up the project on Google Colab following the tutorial on the project homepage.
I then tried to set up XrayGLM locally under WSL2 (Ubuntu 22.04), but it fails at runtime.
First error:
lsj@DESKTOP-H1KB736:/mnt/c/Users/38561/xrayglm$ python cli_demo.py --from_pretrained checkpoints/checkpoints-XrayGLM-300 --prompt_zh '详细描述这张胸部X光片的诊断结果'
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
bin /home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/lsj/.conda/envs/xrayglm did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda-11.6/lib64}')}
warn(msg)
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/cuda-11.6/lib64} did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function
errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary /home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
[2023-05-31 16:25:18,755] [WARNING] Failed to load bitsandbytes:No module named 'scipy'
[2023-05-31 16:25:18,763] [INFO] building FineTuneVisualGLMModel model ...
40901
[2023-05-31 16:25:18,845] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-05-31 16:25:18,846] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
Traceback (most recent call last):
  File "/mnt/c/Users/38561/xrayglm/cli_demo.py", line 104, in <module>
    main()
  File "/mnt/c/Users/38561/xrayglm/cli_demo.py", line 30, in main
    model, model_args = AutoModel.from_pretrained(
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/base_model.py", line 282, in from_pretrained
    model = get_model(args, model_cls, **kwargs)
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/base_model.py", line 305, in get_model
    model = model_cls(args, params_dtype=params_dtype, **kwargs)
  File "/mnt/c/Users/38561/xrayglm/finetune_XrayGLM.py", line 13, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kw_args)
  File "/mnt/c/Users/38561/xrayglm/model/visualglm.py", line 29, in __init__
    super().__init__(args, transformer=transformer, **kwargs)
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/official/chatglm_model.py", line 170, in __init__
    super(ChatGLMModel, self).__init__(args, transformer=transformer, activation_func=gelu, **kwargs)
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/base_model.py", line 88, in __init__
    self.transformer = BaseTransformer(
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/transformer.py", line 427, in __init__
    [get_layer(layer_id) for layer_id in range(num_layers)])
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/transformer.py", line 427, in <listcomp>
    [get_layer(layer_id) for layer_id in range(num_layers)])
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/transformer.py", line 402, in get_layer
    return BaseTransformerLayer(
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/transformer.py", line 313, in __init__
    self.mlp = MLP(
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/model/transformer.py", line 189, in __init__
    self.dense_h_to_4h = ColumnParallelLinear(
  File "/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/sat/mpu/layers.py", line 219, in __init__
    self.weight = Parameter(torch.empty(self.output_size_per_partition,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 12.00 GiB total capacity; 11.25 GiB already allocated; 0 bytes free; 11.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
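Before quantizing, the OOM message itself suggests one mitigation: capping the allocator's split size via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation. A minimal sketch; the value 128 is only a guess, not a verified setting for this model:

```shell
# As hinted by the OOM message: limit the CUDA caching allocator's block
# split size to reduce fragmentation, then re-run cli_demo.py in this shell.
# 128 MiB is an illustrative value, not a tuned one.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

This only helps when reserved memory far exceeds allocated memory; with 11.25 GiB already allocated on a 12 GiB card, quantization (as tried below) is the more promising route.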
It looks like the GPU ran out of memory, so I switched the model to --quant 4. Second error:
(xrayglm) lsj@DESKTOP-H1KB736:/mnt/c/Users/38561/xrayglm$ python cli_demo.py --quant 4 --from_pretrained checkpoints/checkpoints-XrayGLM-300 --prompt_zh '详细描述这张胸部X光片的诊断结果'
(same bitsandbytes BUG REPORT banner and CUDA setup warnings as in the first run, omitted)
[2023-05-31 16:32:29,588] [WARNING] Failed to load bitsandbytes:No module named 'scipy'
[2023-05-31 16:32:29,593] [INFO] building FineTuneVisualGLMModel model ...
42795
[2023-05-31 16:32:29,645] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-05-31 16:32:29,647] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
replacing layer 0 with lora
replacing layer 14 with lora
[2023-05-31 16:32:55,759] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7811237376
[2023-05-31 16:32:59,754] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/checkpoints-XrayGLM-300/300/mp_rank_00_model_states.pt
Killed
It seems scipy was not installed:
pip install scipy
Collecting scipy
Downloading scipy-1.10.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.5/34.5 MB 6.1 MB/s eta 0:00:00
Requirement already satisfied: numpy<1.27.0,>=1.19.5 in /home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages (from scipy) (1.24.3)
Installing collected packages: scipy
Successfully installed scipy-1.10.1
Third error, after installing scipy:
(xrayglm) lsj@DESKTOP-H1KB736:/mnt/c/Users/38561/xrayglm$ python cli_demo.py --quant 4 --from_pretrained checkpoints/checkpoints-XrayGLM-300 --prompt_zh '详细描述这张胸部X光片的诊断结果'
(same bitsandbytes BUG REPORT banner and CUDA setup warnings as in the first run, omitted)
[2023-05-31 16:36:46,280] [INFO] building FineTuneVisualGLMModel model ...
60615
[2023-05-31 16:36:46,285] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-05-31 16:36:46,287] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
replacing layer 0 with lora
replacing layer 14 with lora
[2023-05-31 16:36:53,258] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7811237376
[2023-05-31 16:36:53,886] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/checkpoints-XrayGLM-300/300/mp_rank_00_model_states.pt
Killed
nvcc -V does show CUDA, so this looked like a bitsandbytes problem; I patched it following https://blog.csdn.net/anycall201/article/details/129930919
In the end it was still killed:
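One detail worth noting from the earlier warnings: bitsandbytes searched the path '/usr/local/cuda-11.6/lib64}' with a stray closing brace, which hints at a `${...}` typo in a shell startup file such as ~/.bashrc (an assumption worth checking). A quick way to inspect the search path:

```shell
# Print each LD_LIBRARY_PATH entry on its own line and look for a stray '}'
# (the bitsandbytes warnings above searched '/usr/local/cuda-11.6/lib64}').
echo "$LD_LIBRARY_PATH" | tr ':' '\n'
```

If an entry ends in '}', fixing the corresponding export line and re-sourcing the shell config should make the CUDA runtime discoverable without patching bitsandbytes.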
(xrayglm) lsj@DESKTOP-H1KB736:/mnt/c/Users/38561/XrayGLM$ python cli_demo.py --quant 4 --from_pretrained checkpoints/checkpoints-XrayGLM-300 --prompt_zh '详细描述这张胸部X光片的诊断结果'
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
bin /home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
CUDA SETUP: Loading binary /home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda116.so...
[2023-05-31 16:48:33,857] [INFO] building FineTuneVisualGLMModel model ...
60827
[2023-05-31 16:48:33,862] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-05-31 16:48:33,864] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/home/lsj/.conda/envs/xrayglm/lib/python3.9/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
replacing layer 0 with lora
replacing layer 14 with lora
[2023-05-31 16:48:40,797] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7811237376
[2023-05-31 16:48:42,470] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/checkpoints-XrayGLM-300/300/mp_rank_00_model_states.pt
Killed
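For context on the repeated "Killed": a bare "Killed" with no Python traceback usually means the Linux OOM killer terminated the process for exhausting system RAM, not GPU memory. The log reports 7811237376 parameters, so loading the checkpoint in fp16 needs on the order of 15 GB of RAM plus overhead, and by default WSL2 caps its VM at roughly half of the host's memory. A quick check of what the VM actually sees:

```shell
# Show how much RAM the WSL2 VM sees; loading the ~7.8B-parameter
# checkpoint in fp16 needs roughly 15 GB plus overhead.
grep -E 'MemTotal|MemAvailable' /proc/meminfo
```

If MemTotal is too small, raising the cap in a Windows-side .wslconfig (a `[wsl2]` section with `memory=` and, if needed, `swap=` entries, per the WSL documentation) and then running `wsl --shutdown` is a common remedy; the exact values depend on your host machine.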
Any help would be appreciated, thanks!