
finetune-chatglm2-6b's Introduction

finetune-chatglm2-6b's People

Contributors

spongebbob


finetune-chatglm2-6b's Issues

RuntimeError: CUDA error: invalid device ordinal

Could anyone advise how to fix this?

───────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/main.py:377 in │
│ │
│ 374 │
│ 375 │
│ 376 if __name__ == "__main__": │
│ ❱ 377 │ main() │
│ 378 │
│ │
│ /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/main.py:61 in main │
│ │
│ 58 │ │ # let's parse it to get our arguments. │
│ 59 │ │ model_args, data_args, training_args = parser.parse_json_file(json_file=os.path. │
│ 60 │ else: │
│ ❱ 61 │ │ model_args, data_args, training_args = parser.parse_args_into_dataclasses() │
│ 62 │ # Setup logging │
│ 63 │ logging.basicConfig( │
│ 64 │ │ format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/hf_argparser.py:332 in │
│ parse_args_into_dataclasses │
│ │
│ 329 │ │ │ inputs = {k: v for k, v in vars(namespace).items() if k in keys} │
│ 330 │ │ │ for k in keys: │
│ 331 │ │ │ │ delattr(namespace, k) │
│ ❱ 332 │ │ │ obj = dtype(**inputs) │
│ 333 │ │ │ outputs.append(obj) │
│ 334 │ │ if len(namespace.__dict__) > 0: │
│ 335 │ │ │ # additional namespace. │
│ in __init__:113 │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1227 in __post_init__ │
│ │
│ 1224 │ │ if ( │
│ 1225 │ │ │ self.framework == "pt" │
│ 1226 │ │ │ and is_torch_available() │
│ ❱ 1227 │ │ │ and (self.device.type != "cuda") │
│ 1228 │ │ │ and (get_xla_device_type(self.device) != "GPU") │
│ 1229 │ │ │ and (self.fp16 or self.fp16_full_eval) │
│ 1230 │ │ ): │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1660 in device │
│ │
│ 1657 │ │ The device used by this process. │
│ 1658 │ │ """ │
│ 1659 │ │ requires_backends(self, ["torch"]) │
│ ❱ 1660 │ │ return self._setup_devices │
│ 1661 │ │
│ 1662 │ @property │
│ 1663 │ def n_gpu(self): │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/utils/generic.py:54 in __get__ │
│ │
│ 51 │ │ attr = "__cached_" + self.fget.__name__ │
│ 52 │ │ cached = getattr(obj, attr, None) │
│ 53 │ │ if cached is None: │
│ ❱ 54 │ │ │ cached = self.fget(obj) │
│ 55 │ │ │ setattr(obj, attr, cached) │
│ 56 │ │ return cached │
│ 57 │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1650 in _setup_devices │
│ │
│ 1647 │ │ │
│ 1648 │ │ if device.type == "cuda": │
│ 1649 │ │ │ print(f"------------device--------:{device}") │
│ ❱ 1650 │ │ │ torch.cuda.set_device(device) │
│ 1651 │ │ │
│ 1652 │ │ return device │
│ 1653 │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py:326 in set_device │
│ │
│ 323 │ """ │
│ 324 │ device = _get_device_index(device) │
│ 325 │ if device >= 0: │
│ ❱ 326 │ │ torch._C._cuda_setDevice(device) │
│ 327 │
│ 328 │
│ 329 def get_device_name(device: Optional[_device_t] = None) -> str: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

sh ds_train_finetune.sh
[2023-07-05 08:26:05,121] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0,1,2 but ignoring it because one or several of --include/--exclude/--num_gpus/--num_nodes cl args were used. If you want to use CUDA_VISIBLE_DEVICES don't pass any of these arguments to deepspeed.
[2023-07-05 08:26:05,168] [INFO] [runner.py:541:main] cmd = /opt/conda/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=8888 --enable_each_rank_log=None main.py --deepspeed deepspeed.json --do_train --do_eval --train_file /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/data/131w/train.json --validation_file /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/data/131w/validate.json --prompt_column conversations --overwrite_cache --model_name_or_path /data/chatglm2-6b --output_dir /data/chatglm2-6b-code/Finetune-ChatGLM2-6B/output/output0705-1 --overwrite_output_dir --max_length 762 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 12 --predict_with_generate --num_train_epochs 3 --logging_steps 50 --save_steps 1000000 --learning_rate 6e-6 --do_eval False --fp16 True --save_total_limit 5
[2023-07-05 08:26:10,601] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-05 08:26:10,601] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-05 08:26:10,601] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-05 08:26:10,601] [INFO] [launch.py:247:main] dist_world_size=8
[2023-07-05 08:26:10,601] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-07-05 08:26:20,093] [INFO] [comm.py:622:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
------------device--------:cuda:3
------------device--------:cuda:6
------------device--------:cuda:5
------------device--------:cuda:0
------------device--------:cuda:1
------------device--------:cuda:4
------------device--------:cuda:7
------------device--------:cuda:2
Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
07/05/2023 08:26:21 - WARNING - main - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: True
07/05/2023 08:26:21 - WARNING - main - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True

export CUDA_VISIBLE_DEVICES=0,1,2 doesn't seem to take effect.
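For context, the deepspeed warning in the log above states that CUDA_VISIBLE_DEVICES is ignored whenever --include/--exclude/--num_gpus/--num_nodes is passed, so the launcher spawns one rank per requested GPU regardless of the export. A minimal sketch (my own illustration, not the repo's code) of why that ends in "invalid device ordinal":

import torch

# torch.cuda.set_device() raises "invalid device ordinal" whenever the
# requested index is not smaller than the number of devices the process
# can actually see.
local_rank = 3                       # e.g. the rank assigned by the launcher
visible = torch.cuda.device_count()  # only 3 GPUs visible in this example

if local_rank >= visible:
    raise RuntimeError(
        f"local_rank {local_rank} >= visible GPU count {visible}; "
        "either pass --num_gpus/--include matching the GPUs you intend to use, "
        "or drop those flags so CUDA_VISIBLE_DEVICES is honored"
    )
torch.cuda.set_device(local_rank)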

During model training, input_ids contains NoneType values

[INFO|modeling_utils.py:2927] 2023-07-13 06:17:15,679 >> Generation config file not found, using a generation config created from the model config.
input_ids [64790, 64792, 790, 30951, 517, 30910, 30940, 30996, 13, 13, 54761, 31211, 37234, 31211, 50769, 32096, 34009, 38372, 30939, 30940, 32074, 31643, 35220, 31715, 31123, 31654, 50769, 54561, 32585, 31715, 30943, 32154, 31123, 31783, 54572, 54818, 32074, 54942, 32326, 55055, 31514, 13, 13, 55437, 31211, 30910, 36037, 31809, 32615, 31201, 52116, 31201, 36583, 32927, 31639, 31155, 34992, 31662, 40384, 31211, 32615, 57907, 52116, 59086, 31643, 53668, 31868, 31155, 13, 31659, 50769, 32096, 34009, 54942, 30943, 32154, 31123, 31672, 31804, 52116, 54541, 30943, 38807, 31155, 47322, 32096, 34009, 54552, 38372, 30939, 30940, 32074, 31643, 35220, 31715, 31123, 31814, 31804, 38903, 30939, 30940, 32074, 31123, 54996, 30978, 30940, 30940, 56315, 31155, 13, 31672, 50769, 54818, 32074, 39357, 32585, 54541, 30910, 30943, 32154, 1381, 30910, 30978, 30940, 30940, 56315, 542, 30910, 30940, 30930, 30940, 30940, 30966, 30966, 32154, 30967, 56315, 40663, 30910, 30966, 30930, 30966, 55055, 30967, 56315, 31155, 13, 33161, 31211, 50769, 54818, 32074, 54942, 30966, 30930, 30966, 55055, 31155, 13, 13, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
Traceback (most recent call last):
File "main.py", line 376, in
main()
File "main.py", line 207, in main
print_dataset_example(train_dataset[0])
File "main.py", line 186, in print_dataset_example
print("inputs", tokenizer.decode(example["input_ids"]))
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3509, in decode
return self._decode(
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 931, in _decode
filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 906, in convert_ids_to_tokens
index = int(index)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
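The trailing None values in the printed input_ids look like padding positions that were filled with None (for example when pad_token_id is None), which is what decode() then trips over. A hedged sketch, with illustrative helper names rather than the repo's actual preprocessing code:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

def pad_to_length(input_ids, max_length):
    # Pad with a concrete integer id; a missing pad_token_id is the usual way
    # NoneType ends up inside input_ids.
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
        pad_id = tokenizer.eos_token_id
    return input_ids + [pad_id] * (max_length - len(input_ids))

def debug_decode(example):
    # Debugging aid only, so print_dataset_example() can run; the real fix is
    # padding with a valid id during preprocessing.
    ids = [t for t in example["input_ids"] if t is not None]
    return tokenizer.decode(ids)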

I only have a little over 200 multi-turn dialogue samples; will full-parameter fine-tuning give useful results?

Below are my parameters:
LR=6e-6
DATE=0704
EPOCH=2
MAX_LEN=1024
MASTER_PORT=8888
deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --do_eval \
    --train_file car_train.json \
    --validation_file car_dev.json \
    --prompt_column conversations \
    --overwrite_cache \
    --model_name_or_path /data/project/th/chatglm2-6b \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR-$DATE-$MAX_LEN-epoch-$EPOCH \
    --overwrite_output_dir \
    --max_length $MAX_LEN \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --predict_with_generate \
    --num_train_epochs $EPOCH \
    --logging_steps 20 \
    --max_steps 1000 \
    --save_steps 500 \
    --learning_rate $LR \
    --do_eval False \
    --fp16 True \
    --save_total_limit 5

Fine-tuning gets stuck at the "Running tokenizer on train dataset" step

The model loads normally, but it gets stuck at the "Running tokenizer on train dataset" step and the progress bar never moves.
The GPUs are A800 80G, with two cards used for training; the CPU is an Intel(R) Xeon(R) Silver 4316 with 4 cores allocated.
The training parameters are below:

#!/bin/bash

LR=6e-6
DATE=1009

deepspeed --num_gpus=2 main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nobel/code/glm/Finetune-ChatGLM2-6B/dataset/trainset.json \
    --overwrite_cache \
    --model_name_or_path /data/nobel/code/glm/ChatGLM2-6B/chatglm2-6b \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR-$DATE \
    --overwrite_output_dir \
    --preprocessing_num_workers 4 \
    --max_length 1000 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 12 \
    --predict_with_generate \
    --num_train_epochs 3 \
    --logging_steps 20 \
    --save_steps 1000 \
    --learning_rate $LR \
    --do_eval False \
    --fp16 True \
    --save_total_limit 5
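As a debugging sketch, assuming --preprocessing_num_workers is forwarded to datasets.Dataset.map(num_proc=...) as in the standard transformers example scripts (the file path and function below are placeholders):

from datasets import load_dataset

raw = load_dataset("json", data_files={"train": "trainset.json"})

def preprocess(batch):
    # tokenization would go here; kept as a pass-through for the sketch
    return batch

# If the progress bar only freezes when num_proc > 1, the tokenizer itself is
# probably fine and the hang is in multiprocessing (e.g. forking after CUDA has
# been initialized, or shared-memory limits); retry single-process to confirm.
tokenized = raw["train"].map(preprocess, batched=True, num_proc=None)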

Error during do_eval

trainer.evaluate raises an error saying only Stage 3 is supported.

But switching to Stage 3 raises a different error.

Padding of input_ids in ChatGLM data preprocessing

In practice I've noticed that ChatGLM's tokenizer differs from other models': it uses padding_side='left', i.e. the ChatGLM tokenizer inserts the pad_token at the start of the sequence. But it looks like you not only pad with the eos_token, you also pad at the end. Could this affect ChatGLM fine-tuning performance?
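For illustration, a small sketch of the two layouts (the token ids below are made up, not real ChatGLM2 vocabulary ids). For fine-tuning, what usually matters is that padded positions are excluded via attention_mask and labels set to -100, regardless of side; left padding mainly matters for batched generation, where it keeps the last position a real token.

eos_id, pad_id = 2, 0          # placeholder ids for the sketch
ids = [64790, 64792, 123, 456, eos_id]
max_len = 8
n_pad = max_len - len(ids)

# right padding with eos (what this repo appears to do)
right_ids    = ids + [eos_id] * n_pad
right_mask   = [1] * len(ids) + [0] * n_pad
right_labels = ids + [-100] * n_pad

# left padding with pad_token (what padding_side='left' would produce)
left_ids    = [pad_id] * n_pad + ids
left_mask   = [0] * n_pad + [1] * len(ids)
left_labels = [-100] * n_pad + ids

print(right_ids, right_mask, right_labels)
print(left_ids, left_mask, left_labels)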

CUDA out of memory. Tried to allocate 11.63 GiB (GPU 0; 23.69 GiB total capacity; 11.63 GiB already allocated; 11.28 GiB free

CUDA out of memory. Tried to allocate 11.63 GiB (GPU 0; 23.69 GiB total capacity; 11.63 GiB already allocated; 11.28 GiB free; 11.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[2023-07-11 08:56:40,747] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 16794
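A minimal sketch of the setting the error message points at; the 128 MB split size is only an assumed starting value, and the variable must be in the environment before the first CUDA allocation (e.g. exported in ds_train_finetune.sh or set at the very top of main.py):

import os

# Assumed value; tune per the PyTorch memory-management docs. This has no
# effect if torch has already allocated CUDA memory when it is set.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported only after the allocator config is in place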

What hardware is needed for full-parameter fine-tuning? Can it run on a single 3090? What about two?

timed out

timed out initializing process group in store based barrier on rank

After fine-tuning on the school_math_0.25M.json dataset, the resulting model's inference quality is very poor; what is the cause?

deepspeed --num_gpus=4 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --quantization_bit 8 \
    ...
Training on a V100 machine with 4 GPUs, with --quantization_bit 8 added to avoid OOM. After one epoch of training, inference with the resulting model is very poor. Also, when serving it through web_demo2.py, answers often stop after producing only a little output, even though the inference process itself appears normal.

tokenizer = AutoTokenizer.from_pretrained("/xxx/ChatGLM2-6B/THUDM/chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("/xxx/ChatGLM2-6B/output/adgen-chatglm2-6b-ft-1e-4/checkpoint-15000", trust_remote_code=True).cuda(1)

Socket Timeout error when there is too much training data

With 8×A100 and 2 million samples the script runs normally, but above 2.3 million samples it fails at the "Running tokenizer on train dataset" step with:
This may indicate a possible application crash on rank 0 or a network set up issue.[7] is setting up NCCL communicator and retrieving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: Socket Timeout
Exception raised from recvBytes at ../torch/csrc/distributed/c10d/Utils.hpp:604 (most recent call first)
Has anyone else run into the same problem?

Solved: the deepspeed timeout parameter needs to be increased.
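A sketch of that fix; the exact keyword and default vary between deepspeed versions, so treat this as an assumption to verify against the installed release:

from datetime import timedelta

import deepspeed

# The default process-group timeout (30 minutes) can be too short when rank 0
# tokenizes millions of samples while the other ranks wait at the store-based
# barrier; raising it avoids the Socket Timeout shown above.
deepspeed.init_distributed(dist_backend="nccl", timeout=timedelta(hours=2))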

What configuration does full-parameter fine-tuning require?

I'm doing full fine-tuning on two A100 80G cards, but I keep getting OOM errors.
The contents of db_train_finetune.sh are:

MASTER_PORT=8888

deepspeed --num_gpus=1 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --do_eval \
    --train_file belleMath-train5K.json \
    --validation_file belleMath-dev1K.json \
    --prompt_column conversations \
    --overwrite_cache \
    --model_name_or_path ../models/ChatGLM2_model \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR-$DATE \
    --overwrite_output_dir \
    --max_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 12 \
    --predict_with_generate \
    --num_train_epochs 3 \
    --logging_steps 20 \
    --save_steps 1000 \
    --learning_rate $LR \
    --do_eval False \
    --fp16 True \
    --save_total_limit 5

The deepspeed config is the project default:
{
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "weight_decay": "auto",
      "torch_adam": true,
      "adam_w_mode": true
    }
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto",
      "total_num_steps": "auto"
    }
  },
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "reduce_scatter": true,
    "reduce_bucket_size": "auto",
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 20,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
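A rough back-of-envelope estimate (my own sketch; activations, buffers and fragmentation are ignored, so treat these as lower bounds, and 6.2B parameters is an approximation) of why full-parameter fine-tuning runs out of memory here:

params = 6.2e9                 # approximate ChatGLM2-6B parameter count

fp16_weights = 2 * params      # 2 bytes per parameter
fp16_grads   = 2 * params      # 2 bytes per parameter
adam_states  = 12 * params     # fp32 master weights + momentum + variance

def per_gpu_gib(n_gpus, stage):
    # ZeRO-1 partitions optimizer states, ZeRO-2 also gradients,
    # ZeRO-3 also the fp16 weights themselves.
    w = fp16_weights / (n_gpus if stage >= 3 else 1)
    g = fp16_grads   / (n_gpus if stage >= 2 else 1)
    o = adam_states  / (n_gpus if stage >= 1 else 1)
    return (w + g + o) / 1024**3

print(per_gpu_gib(1, 2))   # ~92 GiB -> OOMs on a single 80 GB card
print(per_gpu_gib(2, 2))   # ~52 GiB -> tight once activations are added
print(per_gpu_gib(2, 3))   # ~46 GiB -> stage 3 (or CPU offload) adds headroom

Note that the script above launches with --num_gpus=1, so even with two A100s installed ZeRO has nothing to partition across; launching on both GPUs, and possibly moving to ZeRO stage 3 or optimizer offload, is the usual way to get full fine-tuning of a 6B model to fit.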
