
finetune-chatglm2-6b's Introduction

finetune-chatglm2-6b's People

Contributors

spongebbob


finetune-chatglm2-6b's Issues

RuntimeError: CUDA error: invalid device ordinal

Could anyone advise how to fix this?

───────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/main.py:377 in │
│ │
│ 374 │
│ 375 │
│ 376 if __name__ == "__main__": │
│ ❱ 377 │ main() │
│ 378 │
│ │
│ /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/main.py:61 in main │
│ │
│ 58 │ │ # let's parse it to get our arguments. │
│ 59 │ │ model_args, data_args, training_args = parser.parse_json_file(json_file=os.path. │
│ 60 │ else: │
│ ❱ 61 │ │ model_args, data_args, training_args = parser.parse_args_into_dataclasses() │
│ 62 │ # Setup logging │
│ 63 │ logging.basicConfig( │
│ 64 │ │ format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/hf_argparser.py:332 in │
│ parse_args_into_dataclasses │
│ │
│ 329 │ │ │ inputs = {k: v for k, v in vars(namespace).items() if k in keys} │
│ 330 │ │ │ for k in keys: │
│ 331 │ │ │ │ delattr(namespace, k) │
│ ❱ 332 │ │ │ obj = dtype(**inputs) │
│ 333 │ │ │ outputs.append(obj) │
│ 334 │ │ if len(namespace.__dict__) > 0: │
│ 335 │ │ │ # additional namespace. │
│ in __init__:113 │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1227 in __post_init__ │
│ │
│ 1224 │ │ if ( │
│ 1225 │ │ │ self.framework == "pt" │
│ 1226 │ │ │ and is_torch_available() │
│ ❱ 1227 │ │ │ and (self.device.type != "cuda") │
│ 1228 │ │ │ and (get_xla_device_type(self.device) != "GPU") │
│ 1229 │ │ │ and (self.fp16 or self.fp16_full_eval) │
│ 1230 │ │ ): │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1660 in device │
│ │
│ 1657 │ │ The device used by this process. │
│ 1658 │ │ """ │
│ 1659 │ │ requires_backends(self, ["torch"]) │
│ ❱ 1660 │ │ return self._setup_devices │
│ 1661 │ │
│ 1662 │ @property │
│ 1663 │ def n_gpu(self): │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/utils/generic.py:54 in __get__ │
│ │
│ 51 │ │ attr = "__cached_" + self.fget.__name__ │
│ 52 │ │ cached = getattr(obj, attr, None) │
│ 53 │ │ if cached is None: │
│ ❱ 54 │ │ │ cached = self.fget(obj) │
│ 55 │ │ │ setattr(obj, attr, cached) │
│ 56 │ │ return cached │
│ 57 │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1650 in _setup_devices │
│ │
│ 1647 │ │ │
│ 1648 │ │ if device.type == "cuda": │
│ 1649 │ │ │ print(f"------------device--------:{device}") │
│ ❱ 1650 │ │ │ torch.cuda.set_device(device) │
│ 1651 │ │ │
│ 1652 │ │ return device │
│ 1653 │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py:326 in set_device │
│ │
│ 323 │ """ │
│ 324 │ device = _get_device_index(device) │
│ 325 │ if device >= 0: │
│ ❱ 326 │ │ torch._C._cuda_setDevice(device) │
│ 327 │
│ 328 │
│ 329 def get_device_name(device: Optional[_device_t] = None) -> str: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

sh ds_train_finetune.sh
[2023-07-05 08:26:05,121] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0,1,2 but ignoring it because one or several of --include/--exclude/--num_gpus/--num_nodes cl args were used. If you want to use CUDA_VISIBLE_DEVICES don't pass any of these arguments to deepspeed.
[2023-07-05 08:26:05,168] [INFO] [runner.py:541:main] cmd = /opt/conda/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=8888 --enable_each_rank_log=None main.py --deepspeed deepspeed.json --do_train --do_eval --train_file /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/data/131w/train.json --validation_file /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/data/131w/validate.json --prompt_column conversations --overwrite_cache --model_name_or_path /data/chatglm2-6b --output_dir /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/output/output0705-1 --overwrite_output_dir --max_length 762 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 12 --predict_with_generate --num_train_epochs 3 --logging_steps 50 --save_steps 1000000 --learning_rate 6e-6 --do_eval False --fp16 True --save_total_limit 5
[2023-07-05 08:26:10,601] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-05 08:26:10,601] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-05 08:26:10,601] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-05 08:26:10,601] [INFO] [launch.py:247:main] dist_world_size=8
[2023-07-05 08:26:10,601] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-07-05 08:26:20,093] [INFO] [comm.py:622:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
------------device--------:cuda:3
------------device--------:cuda:6
------------device--------:cuda:5
------------device--------:cuda:0
------------device--------:cuda:1
------------device--------:cuda:4
------------device--------:cuda:7
------------device--------:cuda:2
Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
07/05/2023 08:26:21 - WARNING - main - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: True
07/05/2023 08:26:21 - WARNING - main - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True

export CUDA_VISIBLE_DEVICES=0,1,2 doesn't seem to take effect.
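For context, the deepspeed warning in the log above states that CUDA_VISIBLE_DEVICES is ignored whenever --include/--exclude/--num_gpus/--num_nodes is passed, so the launcher spawns one rank per requested GPU regardless of the export. A minimal sketch (my own illustration, not the repo's code) of why that ends in "invalid device ordinal":

import torch

# torch.cuda.set_device() raises "invalid device ordinal" whenever the
# requested index is not smaller than the number of devices the process
# can actually see.
local_rank = 3                       # e.g. the rank assigned by the launcher
visible = torch.cuda.device_count()  # only 3 GPUs visible in this example

if local_rank >= visible:
    raise RuntimeError(
        f"local_rank {local_rank} >= visible GPU count {visible}; "
        "either pass --num_gpus/--include matching the GPUs you intend to use, "
        "or drop those flags so CUDA_VISIBLE_DEVICES is honored"
    )
torch.cuda.set_device(local_rank)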

During model training, input_ids contains NoneType values

[INFO|modeling_utils.py:2927] 2023-07-13 06:17:15,679 >> Generation config file not found, using a generation config created from the model config.
input_ids [64790, 64792, 790, 30951, 517, 30910, 30940, 30996, 13, 13, 54761, 31211, 37234, 31211, 50769, 32096, 34009, 38372, 30939, 30940, 32074, 31643, 35220, 31715, 31123, 31654, 50769, 54561, 32585, 31715, 30943, 32154, 31123, 31783, 54572, 54818, 32074, 54942, 32326, 55055, 31514, 13, 13, 55437, 31211, 30910, 36037, 31809, 32615, 31201, 52116, 31201, 36583, 32927, 31639, 31155, 34992, 31662, 40384, 31211, 32615, 57907, 52116, 59086, 31643, 53668, 31868, 31155, 13, 31659, 50769, 32096, 34009, 54942, 30943, 32154, 31123, 31672, 31804, 52116, 54541, 30943, 38807, 31155, 47322, 32096, 34009, 54552, 38372, 30939, 30940, 32074, 31643, 35220, 31715, 31123, 31814, 31804, 38903, 30939, 30940, 32074, 31123, 54996, 30978, 30940, 30940, 56315, 31155, 13, 31672, 50769, 54818, 32074, 39357, 32585, 54541, 30910, 30943, 32154, 1381, 30910, 30978, 30940, 30940, 56315, 542, 30910, 30940, 30930, 30940, 30940, 30966, 30966, 32154, 30967, 56315, 40663, 30910, 30966, 30930, 30966, 55055, 30967, 56315, 31155, 13, 33161, 31211, 50769, 54818, 32074, 54942, 30966, 30930, 30966, 55055, 31155, 13, 13, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
Traceback (most recent call last):
File "main.py", line 376, in
main()
File "main.py", line 207, in main
print_dataset_example(train_dataset[0])
File "main.py", line 186, in print_dataset_example
print("inputs", tokenizer.decode(example["input_ids"]))
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3509, in decode
return self._decode(
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 931, in _decode
filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 906, in convert_ids_to_tokens
index = int(index)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
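The trailing None values in the printed input_ids look like padding positions that were filled with None (for example when pad_token_id is None), which is what decode() then trips over. A hedged sketch, with illustrative helper names rather than the repo's actual preprocessing code:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

def pad_to_length(input_ids, max_length):
    # Pad with a concrete integer id; a missing pad_token_id is the usual way
    # NoneType ends up inside input_ids.
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
        pad_id = tokenizer.eos_token_id
    return input_ids + [pad_id] * (max_length - len(input_ids))

def debug_decode(example):
    # Debugging aid only, so print_dataset_example() can run; the real fix is
    # padding with a valid id during preprocessing.
    ids = [t for t in example["input_ids"] if t is not None]
    return tokenizer.decode(ids)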

I only have a little over 200 multi-turn dialogue samples; will full-parameter fine-tuning give useful results?

Below are my parameters:
LR=6e-6
DATE=0704
EPOCH=2
MAX_LEN=1024
MASTER_PORT=8888
deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --do_eval \
    --train_file car_train.json \
    --validation_file car_dev.json \
    --prompt_column conversations \
    --overwrite_cache \
    --model_name_or_path /data/project/th/chatglm2-6b \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR-$DATE-$MAX_LEN-epoch-$EPOCH \
    --overwrite_output_dir \
    --max_length $MAX_LEN \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --predict_with_generate \
    --num_train_epochs $EPOCH \
    --logging_steps 20 \
    --max_steps 1000 \
    --save_steps 500 \
    --learning_rate $LR \
    --do_eval False \
    --fp16 True \
    --save_total_limit 5

Fine-tuning gets stuck at the "Running tokenizer on train dataset" step

The model loads normally, but it gets stuck at the "Running tokenizer on train dataset" step and the progress bar never moves.
The GPUs are A800 80G, with two cards used for training; the CPU is an Intel(R) Xeon(R) Silver 4316 with 4 cores allocated.
The training parameters are below:

#!/bin/bash

LR=6e-6
DATE=1009

deepspeed --num_gpus=2 main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nobel/code/glm/Finetune-ChatGLM2-6B/dataset/trainset.json \
    --overwrite_cache \
    --model_name_or_path /data/nobel/code/glm/ChatGLM2-6B/chatglm2-6b \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR-$DATE \
    --overwrite_output_dir \
    --preprocessing_num_workers 4 \
    --max_length 1000 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 12 \
    --predict_with_generate \
    --num_train_epochs 3 \
    --logging_steps 20 \
    --save_steps 1000 \
    --learning_rate $LR \
    --do_eval False \
    --fp16 True \
    --save_total_limit 5
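As a debugging sketch, assuming --preprocessing_num_workers is forwarded to datasets.Dataset.map(num_proc=...) as in the standard transformers example scripts (the file path and function below are placeholders):

from datasets import load_dataset

raw = load_dataset("json", data_files={"train": "trainset.json"})

def preprocess(batch):
    # tokenization would go here; kept as a pass-through for the sketch
    return batch

# If the progress bar only freezes when num_proc > 1, the tokenizer itself is
# probably fine and the hang is in multiprocessing (e.g. forking after CUDA has
# been initialized, or shared-memory limits); retry single-process to confirm.
tokenized = raw["train"].map(preprocess, batched=True, num_proc=None)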

Error during do_eval

trainer.evaluate raises an error saying only Stage 3 is supported.

But switching to Stage 3 raises a different error.

Padding of input_ids in ChatGLM data preprocessing

In practice I've noticed that ChatGLM's tokenizer differs from other models': it uses padding_side='left', i.e. the ChatGLM tokenizer inserts the pad_token at the start of the sequence. But it looks like you not only pad with the eos_token, you also pad at the end. Could this affect ChatGLM fine-tuning performance?
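For illustration, a small sketch of the two layouts (the token ids below are made up, not real ChatGLM2 vocabulary ids). For fine-tuning, what usually matters is that padded positions are excluded via attention_mask and labels set to -100, regardless of side; left padding mainly matters for batched generation, where it keeps the last position a real token.

eos_id, pad_id = 2, 0          # placeholder ids for the sketch
ids = [64790, 64792, 123, 456, eos_id]
max_len = 8
n_pad = max_len - len(ids)

# right padding with eos (what this repo appears to do)
right_ids    = ids + [eos_id] * n_pad
right_mask   = [1] * len(ids) + [0] * n_pad
right_labels = ids + [-100] * n_pad

# left padding with pad_token (what padding_side='left' would produce)
left_ids    = [pad_id] * n_pad + ids
left_mask   = [0] * n_pad + [1] * len(ids)
left_labels = [-100] * n_pad + ids

print(right_ids, right_mask, right_labels)
print(left_ids, left_mask, left_labels)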

CUDA out of memory. Tried to allocate 11.63 GiB (GPU 0; 23.69 GiB total capacity; 11.63 GiB already allocated; 11.28 GiB free

CUDA out of memory. Tried to allocate 11.63 GiB (GPU 0; 23.69 GiB total capacity; 11.63 GiB already allocated; 11.28 GiB free; 11.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[2023-07-11 08:56:40,747] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 16794
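A minimal sketch of the setting the error message points at; the 128 MB split size is only an assumed starting value, and the variable must be in the environment before the first CUDA allocation (e.g. exported in ds_train_finetune.sh or set at the very top of main.py):

import os

# Assumed value; tune per the PyTorch memory-management docs. This has no
# effect if torch has already allocated CUDA memory when it is set.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported only after the allocator config is in place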

What hardware is needed for full-parameter fine-tuning? Can it run on a single 3090? What about two?

timed out

timed out initializing process group in store based barrier on rank

After fine-tuning on the school_math_0.25M.json dataset, the resulting model's inference quality is very poor; what is the cause?

deepspeed --num_gpus=4 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --quantization_bit 8 \
    ...
Training on a V100 machine with 4 GPUs, with --quantization_bit 8 added to avoid OOM. After one epoch of training, inference with the resulting model is very poor. Also, when serving it through web_demo2.py, answers often stop after producing only a little output, even though the inference process itself appears normal.

tokenizer = AutoTokenizer.from_pretrained("/xxx/ChatGLM2-6B/THUDM/chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("/xxx/ChatGLM2-6B/output/adgen-chatglm2-6b-ft-1e-4/checkpoint-15000", trust_remote_code=True).cuda(1)

Socket Timeout error when there is too much training data

With 8×A100 and 2 million samples the script runs normally, but above 2.3 million samples it fails at the "Running tokenizer on train dataset" step with:
This may indicate a possible application crash on rank 0 or a network set up issue.[7] is setting up NCCL communicator and retrieving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: Socket Timeout
Exception raised from recvBytes at ../torch/csrc/distributed/c10d/Utils.hpp:604 (most recent call first)
Has anyone else run into the same problem?

Solved: the deepspeed timeout parameter needs to be increased.
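A sketch of that fix; the exact keyword and default vary between deepspeed versions, so treat this as an assumption to verify against the installed release:

from datetime import timedelta

import deepspeed

# The default process-group timeout (30 minutes) can be too short when rank 0
# tokenizes millions of samples while the other ranks wait at the store-based
# barrier; raising it avoids the Socket Timeout shown above.
deepspeed.init_distributed(dist_backend="nccl", timeout=timedelta(hours=2))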

What configuration does full-parameter fine-tuning require?

I'm doing full fine-tuning on two A100 80G cards, but I keep getting OOM errors.
The contents of db_train_finetune.sh are:

MASTER_PORT=8888

deepspeed --num_gpus=1 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --do_eval \
    --train_file belleMath-train5K.json \
    --validation_file belleMath-dev1K.json \
    --prompt_column conversations \
    --overwrite_cache \
    --model_name_or_path ../models/ChatGLM2_model \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR-$DATE \
    --overwrite_output_dir \
    --max_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 12 \
    --predict_with_generate \
    --num_train_epochs 3 \
    --logging_steps 20 \
    --save_steps 1000 \
    --learning_rate $LR \
    --do_eval False \
    --fp16 True \
    --save_total_limit 5

The deepspeed config is the project default:
{
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "weight_decay": "auto",
      "torch_adam": true,
      "adam_w_mode": true
    }
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto",
      "total_num_steps": "auto"
    }
  },
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "reduce_scatter": true,
    "reduce_bucket_size": "auto",
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 20,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
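A rough back-of-envelope estimate (my own sketch; activations, buffers and fragmentation are ignored, so treat these as lower bounds, and 6.2B parameters is an approximation) of why full-parameter fine-tuning runs out of memory here:

params = 6.2e9                 # approximate ChatGLM2-6B parameter count

fp16_weights = 2 * params      # 2 bytes per parameter
fp16_grads   = 2 * params      # 2 bytes per parameter
adam_states  = 12 * params     # fp32 master weights + momentum + variance

def per_gpu_gib(n_gpus, stage):
    # ZeRO-1 partitions optimizer states, ZeRO-2 also gradients,
    # ZeRO-3 also the fp16 weights themselves.
    w = fp16_weights / (n_gpus if stage >= 3 else 1)
    g = fp16_grads   / (n_gpus if stage >= 2 else 1)
    o = adam_states  / (n_gpus if stage >= 1 else 1)
    return (w + g + o) / 1024**3

print(per_gpu_gib(1, 2))   # ~92 GiB -> OOMs on a single 80 GB card
print(per_gpu_gib(2, 2))   # ~52 GiB -> tight once activations are added
print(per_gpu_gib(2, 3))   # ~46 GiB -> stage 3 (or CPU offload) adds headroom

Note that the script above launches with --num_gpus=1, so even with two A100s installed ZeRO has nothing to partition across; launching on both GPUs, and possibly moving to ZeRO stage 3 or optimizer offload, is the usual way to get full fine-tuning of a 6B model to fit.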
