spongebbob / finetune-chatglm2-6b
Full-parameter finetuning of ChatGLM2-6B, with efficient finetuning support for multi-turn dialogue.
License: Apache License 2.0
In multi-turn dialogue, is max_len 768 the length of a single input, or the length of history plus the current input? If it is the single-input length, where is the history length bounded?
Could someone advise how to tune this?
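For reference, here is a minimal sketch of how many multi-turn preprocessors bound the *combined* history-plus-input length by dropping the oldest turns first. Everything here is hypothetical (build_prompt and count_tokens are illustrative names, not this repo's code); whether max_len 768 in this repo counts history + current input the same way is exactly the open question.

```python
def build_prompt(history, query, max_len, count_tokens=len):
    """Concatenate multi-turn history with the current query, then truncate.

    count_tokens is a stand-in for the real tokenizer's length function;
    here it simply counts characters of the joined string.
    """
    turns = list(history) + [query]
    # Drop the oldest turns until the joined prompt fits within max_len.
    while len(turns) > 1 and count_tokens("\n".join(turns)) > max_len:
        turns.pop(0)
    return "\n".join(turns)
```

With this convention, a long history is silently shortened rather than rejected, so a single turn can still use the full budget.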
───────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/main.py:377 in <module> │
│ │
│ 374 │
│ 375 │
│ 376 if __name__ == "__main__": │
│ ❱ 377 │ main() │
│ 378 │
│ │
│ /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/main.py:61 in main │
│ │
│ 58 │ │ # let's parse it to get our arguments. │
│ 59 │ │ model_args, data_args, training_args = parser.parse_json_file(json_file=os.path. │
│ 60 │ else: │
│ ❱ 61 │ │ model_args, data_args, training_args = parser.parse_args_into_dataclasses() │
│ 62 │ # Setup logging │
│ 63 │ logging.basicConfig( │
│ 64 │ │ format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/hf_argparser.py:332 in │
│ parse_args_into_dataclasses │
│ │
│ 329 │ │ │ inputs = {k: v for k, v in vars(namespace).items() if k in keys} │
│ 330 │ │ │ for k in keys: │
│ 331 │ │ │ │ delattr(namespace, k) │
│ ❱ 332 │ │ │ obj = dtype(**inputs) │
│ 333 │ │ │ outputs.append(obj) │
│ 334 │ │ if len(namespace.__dict__) > 0: │
│ 335 │ │ │ # additional namespace. │
│ in __init__:113 │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1227 in __post_init__ │
│ │
│ 1224 │ │ if ( │
│ 1225 │ │ │ self.framework == "pt" │
│ 1226 │ │ │ and is_torch_available() │
│ ❱ 1227 │ │ │ and (self.device.type != "cuda") │
│ 1228 │ │ │ and (get_xla_device_type(self.device) != "GPU") │
│ 1229 │ │ │ and (self.fp16 or self.fp16_full_eval) │
│ 1230 │ │ ): │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1660 in device │
│ │
│ 1657 │ │ The device used by this process. │
│ 1658 │ │ """ │
│ 1659 │ │ requires_backends(self, ["torch"]) │
│ ❱ 1660 │ │ return self._setup_devices │
│ 1661 │ │
│ 1662 │ @property │
│ 1663 │ def n_gpu(self): │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/utils/generic.py:54 in __get__ │
│ │
│ 51 │ │ attr = "_cached" + self.fget.__name__ │
│ 52 │ │ cached = getattr(obj, attr, None) │
│ 53 │ │ if cached is None: │
│ ❱ 54 │ │ │ cached = self.fget(obj) │
│ 55 │ │ │ setattr(obj, attr, cached) │
│ 56 │ │ return cached │
│ 57 │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1650 in _setup_devices │
│ │
│ 1647 │ │ │
│ 1648 │ │ if device.type == "cuda": │
│ 1649 │ │ │ print(f"------------device--------:{device}") │
│ ❱ 1650 │ │ │ torch.cuda.set_device(device) │
│ 1651 │ │ │
│ 1652 │ │ return device │
│ 1653 │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py:326 in set_device │
│ │
│ 323 │ """ │
│ 324 │ device = _get_device_index(device) │
│ 325 │ if device >= 0: │
│ ❱ 326 │ │ torch._C._cuda_setDevice(device) │
│ 327 │
│ 328 │
│ 329 def get_device_name(device: Optional[_device_t] = None) -> str: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
sh ds_train_finetune.sh
[2023-07-05 08:26:05,121] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0,1,2 but ignoring it because one or several of --include/--exclude/--num_gpus/--num_nodes cl args were used. If you want to use CUDA_VISIBLE_DEVICES don't pass any of these arguments to deepspeed.
[2023-07-05 08:26:05,168] [INFO] [runner.py:541:main] cmd = /opt/conda/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=8888 --enable_each_rank_log=None main.py --deepspeed deepspeed.json --do_train --do_eval --train_file /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/data/131w/train.json --validation_file /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/data/131w/validate.json --prompt_column conversations --overwrite_cache --model_name_or_path /data/chatglm2-6b --output_dir /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/output/output0705-1 --overwrite_output_dir --max_length 762 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 12 --predict_with_generate --num_train_epochs 3 --logging_steps 50 --save_steps 1000000 --learning_rate 6e-6 --do_eval False --fp16 True --save_total_limit 5
[2023-07-05 08:26:10,601] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-05 08:26:10,601] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-05 08:26:10,601] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-05 08:26:10,601] [INFO] [launch.py:247:main] dist_world_size=8
[2023-07-05 08:26:10,601] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-07-05 08:26:20,093] [INFO] [comm.py:622:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
------------device--------:cuda:3
------------device--------:cuda:6
------------device--------:cuda:5
------------device--------:cuda:0
------------device--------:cuda:1
------------device--------:cuda:4
------------device--------:cuda:7
------------device--------:cuda:2
Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
07/05/2023 08:26:21 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, 16-bits training: True
07/05/2023 08:26:21 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: True
export CUDA_VISIBLE_DEVICES=0,1,2 does not seem to take effect.
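The warning at the top of the log explains the "invalid device ordinal": when --num_gpus (or --include/--exclude) is passed, the DeepSpeed launcher ignores CUDA_VISIBLE_DEVICES and launches a rank per listed GPU, so 8 ranks were started on a 3-GPU setup. A hedged sketch of two ways to restrict training to GPUs 0-2 (the port and script paths are placeholders; --include is DeepSpeed's own device-selection flag):

```shell
# Option 1: let deepspeed pick the devices itself via --include,
# instead of exporting CUDA_VISIBLE_DEVICES (which it ignores here).
GPUS="localhost:0,1,2"
CMD="deepspeed --include $GPUS --master_port 8888 main.py --deepspeed deepspeed.json"
echo "$CMD"

# Option 2: keep CUDA_VISIBLE_DEVICES, but then drop --num_gpus/--include
# entirely so the launcher respects the environment variable.
export CUDA_VISIBLE_DEVICES=0,1,2
```

Either way, the number of ranks must not exceed the number of visible GPUs, or torch.cuda.set_device will raise the "invalid device ordinal" error shown above.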
[INFO|modeling_utils.py:2927] 2023-07-13 06:17:15,679 >> Generation config file not found, using a generation config created from the model config.
input_ids [64790, 64792, 790, 30951, 517, 30910, 30940, 30996, 13, 13, 54761, 31211, 37234, 31211, 50769, 32096, 34009, 38372, 30939, 30940, 32074, 31643, 35220, 31715, 31123, 31654, 50769, 54561, 32585, 31715, 30943, 32154, 31123, 31783, 54572, 54818, 32074, 54942, 32326, 55055, 31514, 13, 13, 55437, 31211, 30910, 36037, 31809, 32615, 31201, 52116, 31201, 36583, 32927, 31639, 31155, 34992, 31662, 40384, 31211, 32615, 57907, 52116, 59086, 31643, 53668, 31868, 31155, 13, 31659, 50769, 32096, 34009, 54942, 30943, 32154, 31123, 31672, 31804, 52116, 54541, 30943, 38807, 31155, 47322, 32096, 34009, 54552, 38372, 30939, 30940, 32074, 31643, 35220, 31715, 31123, 31814, 31804, 38903, 30939, 30940, 32074, 31123, 54996, 30978, 30940, 30940, 56315, 31155, 13, 31672, 50769, 54818, 32074, 39357, 32585, 54541, 30910, 30943, 32154, 1381, 30910, 30978, 30940, 30940, 56315, 542, 30910, 30940, 30930, 30940, 30940, 30966, 30966, 32154, 30967, 56315, 40663, 30910, 30966, 30930, 30966, 55055, 30967, 56315, 31155, 13, 33161, 31211, 50769, 54818, 32074, 54942, 30966, 30930, 30966, 55055, 31155, 13, 13, None, None, None, … (all remaining positions up to the padded length are None)]
Traceback (most recent call last):
File "main.py", line 376, in <module>
main()
File "main.py", line 207, in main
print_dataset_example(train_dataset[0])
File "main.py", line 186, in print_dataset_example
print("inputs", tokenizer.decode(example["input_ids"]))
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3509, in decode
return self._decode(
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 931, in _decode
filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 906, in convert_ids_to_tokens
index = int(index)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
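The TypeError comes from tokenizer.decode() calling int() on the None entries shown in the input_ids dump above. A hypothetical debugging guard (sanitize_ids is an illustrative name, not part of this repo): the real fix is upstream in whatever preprocessing wrote None past the answer span, but this at least makes the example printable.

```python
def sanitize_ids(ids, pad_id=0):
    """Replace None entries with pad_id so decode() won't hit int(None).

    Hypothetical helper for debugging only: a healthy dataset should
    contain no None token ids in the first place.
    """
    return [pad_id if tok is None else tok for tok in ids]
```

Usage: `tokenizer.decode(sanitize_ids(example["input_ids"]))` would then at least print the non-padding portion of the example.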
Below are my parameters:
LR=6e-6
DATE=0704
EPOCH=2
MAX_LEN=1024
MASTER_PORT=8888
deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --do_eval \
    --train_file car_train.json \
    --validation_file car_dev.json \
    --prompt_column conversations \
    --overwrite_cache \
    --model_name_or_path /data/project/th/chatglm2-6b \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR-$DATE-$MAX_LEN-epoch-$EPOCH \
    --overwrite_output_dir \
    --max_length $MAX_LEN \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --predict_with_generate \
    --num_train_epochs $EPOCH \
    --logging_steps 20 \
    --max_steps 1000 \
    --save_steps 500 \
    --learning_rate $LR \
    --do_eval False \
    --fp16 True \
    --save_total_limit 5
The model loads fine, but training gets stuck at the "Running tokenizer on train dataset" step; the progress bar never moves.
The GPUs are A800 80G, with two cards used for training. The CPU is an Intel(R) Xeon(R) Silver 4316 with 4 cores allocated.
The training parameters are below:
#!/bin/bash
LR=6e-6
DATE=1009
deepspeed --num_gpus=2 main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nobel/code/glm/Finetune-ChatGLM2-6B/dataset/trainset.json \
    --overwrite_cache \
    --model_name_or_path /data/nobel/code/glm/ChatGLM2-6B/chatglm2-6b \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR-$DATE \
    --overwrite_output_dir \
    --preprocessing_num_workers 4 \
    --max_length 1000 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 12 \
    --predict_with_generate \
    --num_train_epochs 3 \
    --logging_steps 20 \
    --save_steps 1000 \
    --learning_rate $LR \
    --do_eval False \
    --fp16 True \
    --save_total_limit 5
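For context, --preprocessing_num_workers 4 maps the tokenization step over worker processes, roughly like the stdlib sketch below (fake_tokenize is a stand-in for the real tokenizer call; the actual step runs inside the HF datasets map). With only 4 CPU cores and per-process startup overhead, heavy per-example work can look stuck for a long while even when it is progressing, so it is worth ruling that out before suspecting a deadlock.

```python
from multiprocessing import Pool


def fake_tokenize(text):
    # Stand-in for the real tokenizer call done per example.
    return text.split()


if __name__ == "__main__":
    texts = ["hello world", "multi turn dialogue sample"]
    # The number of processes here plays the role of --preprocessing_num_workers.
    with Pool(processes=2) as pool:
        tokenized = pool.map(fake_tokenize, texts)
    print(tokenized)
```

If the progress bar truly never moves, trying --preprocessing_num_workers 1 isolates whether the hang is in multiprocessing or in the tokenizer itself.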
--max_length 762 — is this the maximum input length? If so, is there a separate length limit for multi-turn dialogue?
OSError: Can't get source for <function apply_rotary_pos_emb at 0x7fef8c15f790>.
TorchScript requires source access in order to carry out compilation, make sure
original .py files are available.
In practice I've noticed that ChatGLM's tokenizer differs from other models': it uses padding_side='left', i.e. ChatGLM's tokenizer inserts pad_token at the beginning. But you seem to pad not only with eos_token but also at the end. Could this hurt ChatGLM finetuning performance?
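To make the question concrete, here is a toy sketch of the two padding conventions being compared (pad_ids is an illustrative name, not this repo's code; which side actually matters for training depends on how attention masks and labels are built):

```python
def pad_ids(ids, max_len, pad_id, side="right"):
    """Illustrate left- vs right-padding of a token-id list.

    ChatGLM's tokenizer defaults to padding_side='left'; the question
    above is whether right-padding with eos_token instead is harmful.
    """
    padding = [pad_id] * (max_len - len(ids))
    return padding + ids if side == "left" else ids + padding
```

During training with a proper attention mask and -100 labels on pad positions, either side can work; left-padding mainly matters for batched generation, where new tokens must follow the real input.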
CUDA out of memory. Tried to allocate 11.63 GiB (GPU 0; 23.69 GiB total capacity; 11.63 GiB already allocated; 11.28 GiB free; 11.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[2023-07-11 08:56:40,747] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 16794
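The error message itself points at allocator fragmentation ("reserved memory is >> allocated memory"). One commonly tried mitigation is capping the allocator's split size before launching; the value 128 below is an arbitrary starting point to experiment with, not a recommendation from this repo:

```shell
# Reduce CUDA allocator fragmentation by capping the split size (in MB)
# before launching training. Tune the value; smaller caps trade speed
# for less fragmentation.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

If that is not enough, reducing batch size or moving to a higher ZeRO stage with offload are the usual next steps.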
What hardware does full-parameter finetuning require? Can it run on a single 3090? On two?
timed out initializing process group in store based barrier on rank
How much GPU memory does full-parameter finetuning use?
deepspeed --num_gpus=4 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --quantization_bit 8 \
    ...
Training on a V100 machine with 4 cards, adding --quantization_bit 8 to avoid OOM. After training for one epoch, inference with the resulting model is very poor. Also, when serving it via web_demo2.py, answers often stop after only a little output, even though the inference process looks normal.
tokenizer = AutoTokenizer.from_pretrained("/xxx/ChatGLM2-6B/THUDM/chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("/xxx/ChatGLM2-6B/output/adgen-chatglm2-6b-ft-1e-4/checkpoint-15000", trust_remote_code=True).cuda(1)
Hi, are there any plans to add LoRA training?
Can a finetuned model be tested directly with the official ChatGLM2 web_demo2.py? Would that cause any issues?
The deepspeed_init function has no resume_from_checkpoint parameter, but your source code uses one.
How should the multi-turn data be processed after downloading?
During training, is there any special handling of the label mask for multi-turn data?
Solved — the fix was to change DeepSpeed's timeout parameter.
I'm doing full-parameter finetuning on dual A100 80G cards, but keep hitting OOM errors.
Contents of ds_train_finetune.sh:
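On the multi-turn label-mask question: a common convention (whether this repo follows it is exactly what is being asked) is to set prompt and history positions to -100 so only answer tokens contribute to the cross-entropy loss. A hypothetical sketch (build_labels is an illustrative name):

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the CE loss


def build_labels(input_ids, answer_start):
    """Mask everything before answer_start so loss covers only the answer.

    Hypothetical sketch: in a multi-turn sample, answer_start would be the
    index where the final assistant reply begins.
    """
    return [IGNORE_INDEX] * answer_start + list(input_ids[answer_start:])
```

Without such masking, the model is also trained to reproduce the user's turns, which usually degrades response quality.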
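For anyone hitting the same store-based-barrier timeout: the limit is typically raised by passing a timedelta when initializing distributed. Assumption: recent DeepSpeed versions accept a timeout= argument in deepspeed.init_distributed (torch.distributed.init_process_group has the same parameter); check your version's signature.

```python
from datetime import timedelta

# Hypothetical value; pass as deepspeed.init_distributed(timeout=BARRIER_TIMEOUT)
# (or torch.distributed.init_process_group(timeout=...)) before training starts.
BARRIER_TIMEOUT = timedelta(minutes=60)
print(BARRIER_TIMEOUT.total_seconds())
```

Longer timeouts mask slow rank startup (e.g. large checkpoint loading) rather than fix it, so it is worth checking why some ranks are late.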
MASTER_PORT=8888
deepspeed --num_gpus=1 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --do_eval \
    --train_file belleMath-train5K.json \
    --validation_file belleMath-dev1K.json \
    --prompt_column conversations \
    --overwrite_cache \
    --model_name_or_path ../models/ChatGLM2_model \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR-$DATE \
    --overwrite_output_dir \
    --max_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 12 \
    --predict_with_generate \
    --num_train_epochs 3 \
    --logging_steps 20 \
    --save_steps 1000 \
    --learning_rate $LR \
    --do_eval False \
    --fp16 True \
    --save_total_limit 5
DeepSpeed uses the project's default config:
{
"optimizer": {
"type": "AdamW",
"params": {
"lr": "auto",
"weight_decay": "auto",
"torch_adam": true,
"adam_w_mode": true
}
},
"scheduler": {
"type": "WarmupDecayLR",
"params": {
"warmup_min_lr": "auto",
"warmup_max_lr": "auto",
"warmup_num_steps": "auto",
"total_num_steps": "auto"
}
},
"fp16": {
"enabled": "auto",
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 2e8,
"reduce_scatter": true,
"reduce_bucket_size": "auto",
"overlap_comm": true,
"contiguous_gradients": true
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"steps_per_print": 20,
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false
}