The original finetune_on_pregenerated.py from the huggingface/pytorch-transformers repository actually has some bugs:
optimizer = AdamW(optimizer_grouped_parameters,
                  lr=args.learning_rate,
                  warmup=args.warmup_proportion,
                  t_total=num_train_optimization_steps)
actually "warmup,t_total" shouldn't be used.
Besides, I first ran the "Pregenerating training data" step and then the "Training on pregenerated data" step, as the huggingface README says, but the files it saves at the end did not improve performance:
# The fine-tuned weights, config and vocabulary are written to args.output_dir:
torch.save(model_to_save.state_dict(), output_model_file)
model_to_save.config.to_json_file(output_config_file)
tokenizer.save_vocabulary(args.output_dir)
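For what it's worth, those files can be reloaded for a downstream evaluation with from_pretrained. A minimal sketch, assuming output_model_file and output_config_file follow the usual pytorch_model.bin / config.json naming so that args.output_dir is a valid model directory:

from pytorch_transformers import BertForPreTraining, BertTokenizer

# Reload the fine-tuned weights, config and vocabulary saved above.
model = BertForPreTraining.from_pretrained(args.output_dir)
tokenizer = BertTokenizer.from_pretrained(args.output_dir)
model.eval()

If this loads cleanly, the saved files themselves are fine and the lack of improvement comes from the training itself rather than from the export step.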