I am training with PPO using Ray. Training runs normally for a number of steps, and then an error occurs.

The machine I am using has plenty of memory (2 TB). The launch script is below:

Training with LoRA seems to run without errors so far, but I would still like to do full-parameter training.
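One plausible but unconfirmed explanation for why LoRA behaves differently is scale. With LoRA the base weights are frozen, so gradients and Adam state exist only for the small adapter matrices. The rough count below uses the layer shapes from the critic's module printout in the err file further down. The rank of 8 and adapting all seven projections are hypothetical choices, because the launch script is not shown.

```python
# Layer shapes copied from the LLMForSequenceRegression printout in the err log.
HIDDEN, KV_DIM, INTERMEDIATE, NUM_LAYERS = 4096, 1024, 14336, 32

# (in_features, out_features) of each Linear inside one LlamaDecoderLayer.
LAYER_LINEARS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def full_linear_params() -> int:
    """Weights of all decoder-layer Linears (embeddings, norms, value head excluded)."""
    return NUM_LAYERS * sum(i * o for i, o in LAYER_LINEARS.values())


def lora_params(rank: int, targets) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for every adapted Linear."""
    per_layer = sum(rank * (LAYER_LINEARS[t][0] + LAYER_LINEARS[t][1]) for t in targets)
    return NUM_LAYERS * per_layer


if __name__ == "__main__":
    full = full_linear_params()
    lora = lora_params(rank=8, targets=LAYER_LINEARS)  # hypothetical rank and targets
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
    # prints: full: 6,979,321,856  lora: 20,971,520  ratio: 0.30%
```

Under these assumptions, the optimizer keeps state for roughly 0.3% as many parameters, so host and GPU memory pressure is very different between the two runs.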

Please check whether you are using the Docker image we provide: https://github.com/OpenLLMAI/OpenRLHF/tree/ma
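Before comparing against the provided image, it helps to record the exact library versions installed in the current environment. The following is a minimal sketch that uses only the standard library. The package list is an assumption about which libraries matter most for this PPO + Ray + DeepSpeed setup.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed set of packages most relevant to this setup; adjust as needed.
PACKAGES = ["torch", "deepspeed", "transformers", "ray", "flash-attn"]


def collect_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name:>12}: {ver or 'not installed'}")
```

Running this both inside the Docker image and in the failing environment gives two lists that can be compared line by line.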

Strange Kill of Critic Model, about openllmai/openrlhf

Comments (5)

Ricardokevins commented on July 30, 2024

The corresponding err file:

:job_id:03000000
[2024-05-24 18:57:09,526] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
:actor_name:CriticModelRayActor
[2024-05-24 18:58:07,087] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-05-24 18:58:07,088] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
LLMForSequenceRegression(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaFlashAttention2(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (value_head): Linear(in_features=4096, out_features=1, bias=False)
)
reward normalization status: True
mean: tensor([0.], dtype=torch.bfloat16), std tensor([1.], dtype=torch.bfloat16)
Time to load cpu_adam op: 2.4042906761169434 seconds
[2024-05-24 18:58:16,843] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.13.5, git-hash=unknown, git-branch=unknown
[2024-05-24 18:58:16,843] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000009, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1
n136-112-040:375881:375881 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
n136-112-040:375881:375881 [0] NCCL INFO Bootstrap : Using eth0:10.136.112.40<0>
n136-112-040:375881:375881 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
n136-112-040:375881:375881 [0] NCCL INFO cudaDriverVersion 12010
NCCL version 2.20.5+cuda12.4
n136-112-040:375881:377674 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
n136-112-040:375881:377674 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
n136-112-040:375881:377674 [0] NCCL INFO NCCL_IB_HCA set to mlx5
n136-112-040:375881:377674 [0] NCCL INFO NET/IB : No device found.
n136-112-040:375881:377674 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
n136-112-040:375881:377674 [0] NCCL INFO NET/Socket : Using [0]eth0:10.136.112.40<0>
n136-112-040:375881:377674 [0] NCCL INFO Using non-device net plugin version 0
n136-112-040:375881:377674 [0] NCCL INFO Using network Socket
n136-112-040:375881:377674 [0] NCCL INFO comm 0x1bbd32a0 rank 0 nranks 2 cudaDev 0 nvmlDev 2 busId 4a000 commId 0xb015bed65ddba90d - Init START
n136-112-040:375881:377674 [0] NCCL INFO Setting affinity for GPU 2 to ffffffff,00000000,ffffffff
n136-112-040:375881:377674 [0] NCCL INFO comm 0x1bbd32a0 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
n136-112-040:375881:377674 [0] NCCL INFO Channel 00/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 01/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 02/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 03/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 04/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 05/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 06/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 07/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 08/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 09/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 10/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 11/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 12/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 13/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 14/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 15/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 16/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 17/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 18/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 19/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 20/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 21/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 22/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Channel 23/24 : 0 1
n136-112-040:375881:377674 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] -1/-1/-1->0->1 [9] -1/-1/-1->0->1 [10] -1/-1/-1->0->1 [11] -1/-1/-1->0->1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] -1/-1/-1->0->1 [19] -1/-1/-1->0->1 [20] -1/-1/-1->0->1 [21] -1/-1/-1->0->1 [22] -1/-1/-1->0->1 [23] -1/-1/-1->0->1
n136-112-040:375881:377674 [0] NCCL INFO P2P Chunksize set to 524288
n136-112-040:375881:377674 [0] NCCL INFO Channel 00/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 01/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 02/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 03/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 04/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 05/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 06/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 07/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 08/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 09/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 10/0 : 0[2] -> 1[3] via P2[2024-05-24 18:58:21,492] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-05-24 18:58:21,493] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-05-24 18:58:21,493] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-05-24 18:58:21,504] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
[2024-05-24 18:58:21,504] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2024-05-24 18:58:21,504] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-05-24 18:58:21,504] [INFO] [stage_1_and_2.py:149:init] Reduce bucket size 500,000,000
[2024-05-24 18:58:21,504] [INFO] [stage_1_and_2.py:150:init] Allgather bucket size 500,000,000
[2024-05-24 18:58:21,504] [INFO] [stage_1_and_2.py:151:init] CPU Offload: True
[2024-05-24 18:58:21,504] [INFO] [stage_1_and_2.py:152:init] Round robin gradient partitioning: False
[2024-05-24 18:58:40,366] [INFO] [utils.py:800:see_memory_usage] Before initializing optimizer states
[2024-05-24 18:58:40,367] [INFO] [utils.py:801:see_memory_usage] MA 15.08 GB Max_MA 15.08 GB CA 15.59 GB Max_CA 16 GB
[2024-05-24 18:58:40,367] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 222.94 GB, percent = 11.1%
[2024-05-24 18:58:47,530] [INFO] [utils.py:800:see_memory_usage] After initializing optimizer states
[2024-05-24 18:58:47,530] [INFO] [utils.py:801:see_memory_usage] MA 15.08 GB Max_MA 15.08 GB CA 15.59 GB Max_CA 16 GB
[2024-05-24 18:58:47,530] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 245.8 GB, percent = 12.2%
[2024-05-24 18:58:47,530] [INFO] [stage_1_and_2.py:539:init] optimizer state initialized
[2024-05-24 18:58:47,639] [INFO] [utils.py:800:see_memory_usage] After initializing ZeRO optimizer
[2024-05-24 18:58:47,639] [INFO] [utils.py:801:see_memory_usage] MA 15.08 GB Max_MA 15.08 GB CA 15.59 GB Max_CA 16 GB
[2024-05-24 18:58:47,639] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 245.78 GB, percent = 12.2%
[2024-05-24 18:58:47,643] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedCPUAdam
[2024-05-24 18:58:47,643] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-05-24 18:58:47,643] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7fa8334f8ed0>
[2024-05-24 18:58:47,643] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2024-05-24 18:58:47,644] [INFO] [config.py:996:print] DeepSpeedEngine configuration:
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] amp_enabled .................. False
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] amp_params ................... False
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] autotuning_config ............ {
    "enabled": false,
    "start_step": null,
    "end_step": null,
    "metric_path": null,
    "arg_mappings": null,
    "metric": "throughput",
    "model_info": null,
    "results_dir": "autotuning_results",
    "exps_dir": "autotuning_exps",
    "overwrite": true,
    "fast": true,
    "start_profile_step": 3,
    "end_profile_step": 5,
    "tuner_type": "gridsearch",
    "tuner_early_stopping": 5,
    "tuner_num_trials": 50,
    "model_info_path": null,
    "mp_size": 1,
    "max_train_batch_size": null,
    "min_train_batch_size": 1,
    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
    "min_train_micro_batch_size_per_gpu": 1,
    "num_tuning_micro_batch_sizes": 3
}
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] bfloat16_enabled ............. True
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] bfloat16_immediate_grad_update False
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] checkpoint_parallel_write_pipeline False
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] checkpoint_tag_validation_enabled True
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] checkpoint_tag_validation_fail False
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fa832d61a50>
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] communication_data_type ...... None
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] compile_config ............... enabled=False backend='inductor' kwargs={}
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] curriculum_enabled_legacy .... False
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] curriculum_params_legacy ..... False
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-05-24 18:58:47,644] [INFO] [config.py:1000:print] data_efficiency_enabled ...... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] dataloader_drop_last ......... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] disable_allgather ............ False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] dump_state ................... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] dynamic_loss_scale_args ...... None
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] eigenvalue_enabled ........... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] eigenvalue_gas_boundary_resolution 1
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] eigenvalue_layer_num ......... 0
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] eigenvalue_max_iter .......... 100
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] eigenvalue_stability ......... 1e-06
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] eigenvalue_tol ............... 0.01
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] eigenvalue_verbose ........... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] elasticity_enabled ........... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] flops_profiler_config ........ {
    "enabled": false,
    "recompute_fwd_factor": 0.0,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] fp16_auto_cast ............... None
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] fp16_enabled ................. False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] fp16_master_weights_and_gradients False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] global_rank .................. 0
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] grad_accum_dtype ............. bf16
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] gradient_accumulation_steps .. 16
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] gradient_clipping ............ 1.0
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] gradient_predivide_factor .... 1.0
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] graph_harvesting ............. False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] initial_dynamic_scale ........ 1
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] load_universal_checkpoint .... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] loss_scale ................... 1.0
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] memory_breakdown ............. False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] mics_hierarchial_params_gather False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] mics_shard_size .............. -1
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] nebula_config ................ {
    "enabled": false,
    "persistent_storage_path": null,
    "persistent_time_interval": 100,
    "num_of_version_in_retention": 2,
    "enable_nebula_load": true,
    "load_path": null
}
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] optimizer_legacy_fusion ...... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] optimizer_name ............... None
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] optimizer_params ............. None
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] pld_enabled .................. False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] pld_params ................... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] prescale_gradients ........... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] scheduler_name ............... None
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] scheduler_params ............. None
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] seq_parallel_communication_data_type torch.float32
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] sparse_attention ............. None
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] sparse_gradients_enabled ..... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] steps_per_print .............. 100
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] train_batch_size ............. 64
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] train_micro_batch_size_per_gpu 2
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] use_data_before_expert_parallel_ False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] use_node_local_storage ....... False
[2024-05-24 18:58:47,645] [INFO] [config.py:1000:print] wall_clock_breakdown ......... False
[2024-05-24 18:58:47,646] [INFO] [config.py:1000:print] weight_quantization_config ... None
[2024-05-24 18:58:47,646] [INFO] [config.py:1000:print] world_size ................... 2
[2024-05-24 18:58:47,646] [INFO] [config.py:1000:print] zero_allow_untested_optimizer False
[2024-05-24 18:58:47,646] [INFO] [config.py:1000:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-05-24 18:58:47,646] [INFO] [config.py:1000:print] zero_enabled ................. True
[2024-05-24 18:58:47,646] [INFO] [config.py:1000:print] zero_force_ds_cpu_optimizer .. True
[2024-05-24 18:58:47,646] [INFO] [config.py:1000:print] zero_optimization_stage ...... 2
[2024-05-24 18:58:47,646] [INFO] [config.py:986:print_user_config] json = {
    "steps_per_print": 100,
    "zero_optimization": {
        "stage": 2,
        "offload_param": {
            "device": "none"
        },
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "sub_group_size": "auto",
        "stage3_max_live_parameters": "auto",
        "stage3_max_reuse_distance": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "reduce_bucket_size": "auto",
        "zero_hpz_partition_size": 1,
        "zero_quantized_weights": false,
        "zero_quantized_gradients": false
    },
    "bf16": {
        "enabled": true
    },
    "gradient_clipping": 1.0,
    "prescale_gradients": false,
    "wall_clock_breakdown": false,
    "data_types": {
        "grad_accum_dtype": "bf16"
    },
    "train_micro_batch_size_per_gpu": 2,
    "train_batch_size": 64
}
Generates critic values. ===== I am alive
P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 11/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 12/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 13/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 14/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 15/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 16/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 17/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 18/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 19/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 20/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 21/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 22/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Channel 23/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:377674 [0] NCCL INFO Connected all rings
n136-112-040:375881:377674 [0] NCCL INFO Connected all trees
n136-112-040:375881:377674 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
n136-112-040:375881:377674 [0] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
n136-112-040:375881:377674 [0] NCCL INFO comm 0x1bbd32a0 rank 0 nranks 2 cudaDev 0 nvmlDev 2 busId 4a000 commId 0xb015bed65ddba90d - Init COMPLETE
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
Generates critic values. ===== I am alive
I am progressing 0 !~
n136-112-040:375881:382814 [0] NCCL INFO Using non-device net plugin version 0
n136-112-040:375881:382814 [0] NCCL INFO Using network Socket
n136-112-040:375881:382814 [0] NCCL INFO comm 0x26252b80 rank 0 nranks 2 cudaDev 0 nvmlDev 2 busId 4a000 commId 0xe84413f7a9d26087 - Init START
n136-112-040:375881:382814 [0] NCCL INFO Setting affinity for GPU 2 to ffffffff,00000000,ffffffff
n136-112-040:375881:382814 [0] NCCL INFO comm 0x26252b80 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
n136-112-040:375881:382814 [0] NCCL INFO Channel 00/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 01/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 02/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 03/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 04/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 05/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 06/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 07/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 08/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 09/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 10/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 11/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 12/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 13/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 14/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 15/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 16/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 17/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 18/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 19/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 20/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 21/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 22/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Channel 23/24 : 0 1
n136-112-040:375881:382814 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] -1/-1/-1->0->1 [9] -1/-1/-1->0->1 [10] -1/-1/-1->0->1 [11] -1/-1/-1->0->1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] -1/-1/-1->0->1 [19] -1/-1/-1->0->1 [20] -1/-1/-1->0->1 [21] -1/-1/-1->0->1 [22] -1/-1/-1->0->1 [23] -1/-1/-1->0->1
n136-112-040:375881:382814 [0] NCCL INFO P2P Chunksize set to 524288
n136-112-040:375881:382814 [0] NCCL INFO Channel 00/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 01/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 02/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 03/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 04/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 05/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 06/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 07/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 08/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 09/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 10/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 11/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 12/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 13/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 14/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 15/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 16/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 17/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 18/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 19/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 20/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 21/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 22/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Channel 23/0 : 0[2] -> 1[3] via P2P/CUMEM/read
n136-112-040:375881:382814 [0] NCCL INFO Connected all rings
n136-112-040:375881:382814 [0] NCCL INFO Connected all trees
n136-112-040:375881:382814 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
n136-112-040:375881:382814 [0] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
n136-112-040:375881:382814 [0] NCCL INFO comm 0x26252b80 rank 0 nranks 2 cudaDev 0 nvmlDev 2 busId 4a000 commId 0xe84413f7a9d26087 - Init COMPLETE
I am progressing 1 !~
I am progressing 2 !~
I am progressing 3 !~
I am progressing 4 !~
I am progressing 5 !~
I am progressing 6 !~
I am progressing 7 !~
I am progressing 8 !~
I am progressing 9 !~
I am progressing 10 !~
I am progressing 11 !~
I am progressing 12 !~
I am progressing 13 !~
I am progressing 14 !~
I am progressing 15 !~
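This is a back-of-the-envelope sketch to put the 2 TB figure in perspective, not a measurement. It counts the critic's parameters from the module printout above. There is no lm_head, only a 4096-to-1 value_head. It then applies the usual mixed-precision Adam accounting of 12 bytes per parameter: fp32 master weights plus two fp32 moments. ZeRO stage 2 with `offload_optimizer` set to `cpu` keeps this state in host memory, split across the 2 ranks (world_size 2 in the log). The estimate ignores any gradient or staging buffers DeepSpeed may also keep on the host. It also ignores the actor, reference and reward models, so treat it as a lower bound.

```python
# Dimensions read off the LLMForSequenceRegression printout in the err log.
VOCAB, HIDDEN, KV_DIM, INTERMEDIATE, LAYERS = 128256, 4096, 1024, 14336, 32


def critic_param_count() -> int:
    embed = VOCAB * HIDDEN                            # embed_tokens
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q/o plus k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate/up/down projections
    norms = 2 * HIDDEN                                # input + post-attention RMSNorm
    final_norm = HIDDEN
    value_head = HIDDEN                               # Linear(4096, 1, bias=False)
    return embed + LAYERS * (attn + mlp + norms) + final_norm + value_head


def offloaded_adam_bytes_per_rank(params: int, world_size: int, bytes_per_param: int = 12) -> float:
    """fp32 master weights + exp_avg + exp_avg_sq, partitioned by ZeRO across ranks."""
    return params * bytes_per_param / world_size


if __name__ == "__main__":
    params = critic_param_count()
    per_rank = offloaded_adam_bytes_per_rank(params, world_size=2)
    print(f"{params / 1e9:.2f}B params, ~{per_rank / 2**30:.1f} GiB of optimizer state per rank")
    # prints: 7.50B params, ~41.9 GiB of optimizer state per rank
```

The `CPU Virtual Memory: used` lines in the log come from DeepSpeed's `see_memory_usage` and report usage for the whole node, so they are not directly comparable with this per-rank estimate.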

And:
:job_id:03000000
/home/tiger/.local/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
:actor_name:CriticModelRayActor
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').

Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:00<00:01, 1.54it/s]
Loading checkpoint shards: 50%|█████ | 2/4 [00:01<00:01, 1.34it/s]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:02<00:00, 1.21it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.69it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.53it/s]
Some weights of LLMForSequenceRegression were not initialized from the model checkpoint at /mnt/bn/shesjlq20t/HDFS/Trained/Llama3-8b-chat-rm-v14 and are newly initialized: ['value_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using /home/tiger/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Loading extension module cpu_adam...
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.

Train epoch [1/1]: 0%| | 0/64 [00:00<?, ?it/s]
Train epoch [1/1]: 0%| | 0/64 [00:10<?, ?it/s, critic_loss=0.0367, values=0.42]
Train epoch [1/1]: 2%|▏ | 1/64 [00:10<10:37, 10.12s/it, critic_loss=0.0367, values=0.42]
Train epoch [1/1]: 2%|▏ | 1/64 [00:12<10:37, 10.12s/it, critic_loss=0.0226, values=0.271]
Train epoch [1/1]: 3%|▎ | 2/64 [00:12<05:28, 5.30s/it, critic_loss=0.0226, values=0.271]
Train epoch [1/1]: 3%|▎ | 2/64 [00:14<05:28, 5.30s/it, critic_loss=0.0202, values=0.19]
Train epoch [1/1]: 5%|▍ | 3/64 [00:14<04:02, 3.98s/it, critic_loss=0.0202, values=0.19]
Train epoch [1/1]: 5%|▍ | 3/64 [00:16<04:02, 3.98s/it, critic_loss=0.0462, values=0.646]
Train epoch [1/1]: 6%|▋ | 4/64 [00:16<03:10, 3.17s/it, critic_loss=0.0462, values=0.646]
Train epoch [1/1]: 6%|▋ | 4/64 [00:18<03:10, 3.17s/it, critic_loss=0.0225, values=0.54]
Train epoch [1/1]: 8%|▊ | 5/64 [00:18<02:42, 2.75s/it, critic_loss=0.0225, values=0.54]
Train epoch [1/1]: 8%|▊ | 5/64 [00:20<02:42, 2.75s/it, critic_loss=0.021, values=0.117]
Train epoch [1/1]: 9%|▉ | 6/64 [00:20<02:20, 2.42s/it, critic_loss=0.021, values=0.117]
Train epoch [1/1]: 9%|▉ | 6/64 [00:21<02:20, 2.42s/it, critic_loss=0.0237, values=0.508]
Train epoch [1/1]: 11%|█ | 7/64 [00:21<02:05, 2.21s/it, critic_loss=0.0237, values=0.508]
Train epoch [1/1]: 11%|█ | 7/64 [00:23<02:05, 2.21s/it, critic_loss=0.0565, values=0.0332]
Train epoch [1/1]: 12%|█▎ | 8/64 [00:23<01:57, 2.09s/it, critic_loss=0.0565, values=0.0332]
Train epoch [1/1]: 12%|█▎ | 8/64 [00:25<01:57, 2.09s/it, critic_loss=0.0381, values=0.412]
Train epoch [1/1]: 14%|█▍ | 9/64 [00:25<01:48, 1.98s/it, critic_loss=0.0381, values=0.412]
Train epoch [1/1]: 14%|█▍ | 9/64 [00:27<01:48, 1.98s/it, critic_loss=0.0519, values=0.582]
Train epoch [1/1]: 16%|█▌ | 10/64 [00:27<01:45, 1.96s/it, critic_loss=0.0519, values=0.582]
Train epoch [1/1]: 16%|█▌ | 10/64 [00:29<01:45, 1.96s/it, critic_loss=0.0439, values=0.0381]
Train epoch [1/1]: 17%|█▋ | 11/64 [00:29<01:44, 1.98s/it, critic_loss=0.0439, values=0.0381]
Train epoch [1/1]: 17%|█▋ | 11/64 [00:31<01:44, 1.98s/it, critic_loss=0.0211, values=0.313]
Train epoch [1/1]: 19%|█▉ | 12/64 [00:31<01:40, 1.93s/it, critic_loss=0.0211, values=0.313]
Train epoch [1/1]: 19%|█▉ | 12/64 [00:32<01:40, 1.93s/it, critic_loss=0.041, values=0.951]
Train epoch [1/1]: 20%|██ | 13/64 [00:32<01:32, 1.82s/it, critic_loss=0.041, values=0.951]
Train epoch [1/1]: 20%|██ | 13/64 [00:34<01:32, 1.82s/it, critic_loss=0.0165, values=-0.578]
Train epoch [1/1]: 22%|██▏ | 14/64 [00:34<01:34, 1.89s/it, critic_loss=0.0165, values=-0.578]
Train epoch [1/1]: 22%|██▏ | 14/64 [00:36<01:34, 1.89s/it, critic_loss=0.0243, values=0.105]
Train epoch [1/1]: 23%|██▎ | 15/64 [00:36<01:31, 1.87s/it, critic_loss=0.0243, values=0.105]
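The 64 iterations in the critic's progress bar look consistent with the launch script posted later in this thread if each of the two critic ranks trains on its share of the 256-sample rollout in micro-batches of 2. The log stops after 15 of these 64 steps. A rough sketch of the arithmetic, assuming (not confirmed in this thread) that the rollout is split evenly across critic ranks:

```bash
#!/usr/bin/env bash
# Rough estimate of critic micro-steps per PPO epoch.
# Assumption: the rollout is split evenly across critic ranks and each
# rank iterates over its share in micro_train_batch_size chunks.
micro_steps_per_epoch() {
  local rollout_batch_size=$1 micro_train_batch_size=$2 critic_gpus=$3
  echo $(( rollout_batch_size / (micro_train_batch_size * critic_gpus) ))
}

# --rollout_batch_size 256, --micro_train_batch_size 2, --critic_num_gpus_per_node 2
micro_steps_per_epoch 256 2 2   # prints 64
```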


Ricardokevins commented on July 30, 2024

"Generates critic values. ===== I am alive" and "I am progressing 15 !~" are debug statements I added, hoping to narrow down where the problem occurs (but it didn't help).


Ricardokevins commented on July 30, 2024

The machine I'm using has plenty of memory: 2 TB.

Here is the launch script:

set -x 



# ray start --head --node-ip-address 0.0.0.0 --num-gpus 8
# if you want to launch ray on more nodes, use
# ray start --address {MASTER-NODE-ADDRESS}:6379  --num-gpus 8
# ray stop
# ps aux | grep '/usr/bin/python3' | grep -v grep | awk '{print $2}' | xargs kill
WANDB_PROJECT=${project} WANDB_NAME=${expr} ray job submit --address="http://127.0.0.1:8265" \
    --runtime-env-json='{"working_dir": "xxxxx/OpenRLHF-main", "pip": "xxxxxxxOpenRLHF-main/requirements.txt"}' \
    -- python3 examples/train_ppo_ray.py \
    --ref_num_nodes 1 \
    --ref_num_gpus_per_node 1 \
    --reward_num_nodes 1 \
    --reward_num_gpus_per_node 1 \
    --critic_num_nodes 1 \
    --critic_num_gpus_per_node 2 \
    --actor_num_nodes 1 \
    --actor_num_gpus_per_node 2 \
    --vllm_num_engines 2 \
    --vllm_tensor_parallel_size 1 \
    --use_wandb HelloWorldHelloWorldHelloWorldHelloWorld \
    --wandb_project ${project} \
    --wandb_run_name ${expr} \
    --save_path ./7b_llama \
    --micro_train_batch_size 2 \
    --train_batch_size 64 \
    --micro_rollout_batch_size 4 \
    --rollout_batch_size 256 \
    --max_epochs 1 \
    --grad_accum_dtype bf16 \
    --prompt_max_len 1024 \
    --generate_max_len 1024 \
    --zero_stage 2 \
    --bf16 \
    --actor_learning_rate 5e-7 \
    --critic_learning_rate 9e-6 \
    --init_kl_coef 0.01 \
    --max_samples 10000 \
    --normalize_reward \
    --actor_init_on_gpu \
    --adam_offload \
    --flash_attn \
    --gradient_checkpointing

The main problem is that the critic model gets killed without reporting any error. It's very strange, and I have no idea where to start.
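A process that disappears without a Python traceback has often been terminated with SIGKILL, which cannot be caught, so in-process debug prints simply stop. Two common senders are the kernel OOM killer and Ray's own memory monitor. A minimal sketch for scanning logs for either, assuming the kernel log is readable via `dmesg` and Ray's session logs are in the default `/tmp/ray/session_latest/logs`; the grep patterns are assumptions, since message wording differs across kernel and Ray versions:

```bash
#!/usr/bin/env bash
# Print log lines (read from stdin) that suggest an out-of-memory kill.
# Patterns are assumptions: kernel wording varies ("Killed process" vs
# "Kill process"), and Ray's memory-monitor messages may change by release.
# "out of memory" also matches CUDA OOM errors, which are relevant here too.
find_oom_kills() {
  grep -iE 'out of memory|oom-kill|killed process|running low on memory' || true
}

# Typical usage (dmesg may require root on some systems):
#   dmesg -T | find_oom_kills
#   find_oom_kills < /tmp/ray/session_latest/logs/raylet.out
```

If the Ray memory monitor turns out to be the sender, Ray's OOM-prevention documentation describes the `RAY_memory_usage_threshold` and `RAY_memory_monitor_refresh_ms` environment variables for adjusting or disabling it.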


Ricardokevins commented on July 30, 2024

Training with LoRA seems to avoid the error, at least so far, but I would still prefer full-parameter training.

  --lora_rank 16 \
  --lora_alpha 32 \
  --lora_dropout 0.05 \
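One plausible reason LoRA behaves differently is optimizer-state size: mixed-precision Adam commonly keeps about 12 bytes per trainable parameter (an fp32 master copy plus two fp32 moments), and with `--adam_offload` these states live in host memory rather than on the GPU. LoRA trains only a small adapter, so these states become tiny. A rough sketch of the estimate, assuming about 8B trainable parameters for a full-parameter Llama-3-8B critic; it ignores gradients, activations, and DeepSpeed buffers, so the real footprint is larger. On a 2 TB machine this figure alone would not exhaust host memory, so it narrows the question rather than settling it:

```bash
#!/usr/bin/env bash
# Rough Adam optimizer-state size in GiB for mixed-precision training.
# Assumption: 12 bytes per trainable parameter
# (fp32 master weights + fp32 first moment + fp32 second moment).
adam_state_gib() {
  local trainable_params=$1
  echo $(( trainable_params * 12 / 1024**3 ))
}

adam_state_gib 8000000000   # prints 89 (about 8B params, full-parameter critic)
```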

I noticed something odd: could this be a GPU memory allocation problem when the critic model shares parameters across two GPUs (e.g., GPUs 4 and 5)?
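To check whether one of the critic's GPUs fills up before the other, per-GPU memory can be sampled with `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`. A minimal sketch that flags GPUs at or above a usage threshold; the function name and the 90% default are illustrative choices, not part of OpenRLHF:

```bash
#!/usr/bin/env bash
# Read "index, memory.used, memory.total" CSV lines (MiB, no header/units)
# from stdin and print GPUs whose usage is at or above a percentage threshold.
flag_busy_gpus() {
  local threshold_pct=${1:-90}   # illustrative default threshold
  awk -F', *' -v t="$threshold_pct" \
    '$3 > 0 && $2 * 100 >= t * $3 { printf "GPU %s: %d/%d MiB\n", $1, $2, $3 }'
}

# Typical usage while training runs (samples every 5 seconds):
#   nvidia-smi --query-gpu=index,memory.used,memory.total \
#     --format=csv,noheader,nounits -l 5 | flag_busy_gpus 90
```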


hijkzzz commented on July 30, 2024

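Check whether you are using the Docker image we provide (https://github.com/OpenLLMAI/OpenRLHF/tree/main/dockerfile) or a compatible equivalent.
You can also try the following:

git pull (Ray has been upgraded, which reduces memory usage)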

ray job submit --address="http://127.0.0.1:8265" \
    --runtime-env-json='{"working_dir": "/openrlhf", "pip": "/openrlhf/requirements.txt"}' \
    -- python3 examples/train_ppo_ray.py \
    --ref_num_nodes 1 \
    --ref_num_gpus_per_node 2 \
    --reward_num_nodes 1 \
    --reward_num_gpus_per_node 2 \
    --critic_num_nodes 1 \
    --critic_num_gpus_per_node 2 \
    --actor_num_nodes 1 \
    --actor_num_gpus_per_node 2 \
    --vllm_num_engines 2 \
    --vllm_tensor_parallel_size 2 \
    --colocate_critic_reward \
    --colocate_actor_ref \
    --ref_reward_offload \
    --pretrain meta-llama/Meta-Llama-3-8B-Instruct \
    --reward_pretrain meta-llama/Meta-Llama-3-8B-Instruct \
    --save_path /openrlhf/examples/test_scripts/ckpt/llama_ray \
    --micro_train_batch_size 4 \
    --train_batch_size 128 \
    --micro_rollout_batch_size 16 \
    --rollout_batch_size 1024 \
    --max_epochs 1 \
    --prompt_max_len 1024 \
    --generate_max_len 1024 \
    --zero_stage 3 \
    --bf16 \
    --actor_learning_rate 5e-7 \
    --critic_learning_rate 9e-6 \
    --init_kl_coef 0.01 \
    --prompt_data Open-Orca/OpenOrca \
    --prompt_data_probs 1.0 \
    --max_samples 50000 \
    --normalize_reward \
    --adam_offload \
    --flash_attn \
    --gradient_checkpointing
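Both scripts appear to target a single 8-GPU node. A rough count, assuming (as the flag names suggest) that `--colocate_critic_reward` places the reward model on the critic's GPUs, `--colocate_actor_ref` places the reference model on the actor's GPUs, and each vLLM engine uses `--vllm_tensor_parallel_size` GPUs:

```bash
#!/usr/bin/env bash
# Estimate GPUs requested by a single-node PPO Ray launch.
# Args: ref reward critic actor vllm_engines vllm_tp colocate_cr colocate_ar
# Assumption: a colocated model shares its partner's GPUs instead of
# requesting its own.
total_gpus() {
  local ref=$1 reward=$2 critic=$3 actor=$4 engines=$5 tp=$6
  local colocate_cr=$7 colocate_ar=$8
  local total=$(( critic + actor + engines * tp ))
  (( colocate_cr )) || total=$(( total + reward ))
  (( colocate_ar )) || total=$(( total + ref ))
  echo "$total"
}

total_gpus 1 1 2 2 2 1 0 0   # original script: 1+1+2+2+2*1 = 8
total_gpus 2 2 2 2 2 2 1 1   # suggested script: 2+2+2*2 = 8
```

Under these assumptions the suggested script gives the reference and reward models two GPUs each and the vLLM engines tensor parallelism of 2, without needing more GPUs than the original.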
