extreme-bert / extreme-bert

ExtremeBERT is a toolkit that accelerates the pretraining of customized language models on customized datasets, described in the paper “ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT”.

Home Page: https://extreme-bert.github.io/extreme-bert-page

License: Apache License 2.0

Shell 12.28% Python 87.72%
machine-learning natural-language-processing nlp python deep-learning bert language-model language-models pytorch transformer

extreme-bert's Introduction

ExtremeBERT: Accelerate your LM Pre-training!

ExtremeBERT is a toolkit that accelerates the pretraining and finetuning of BERT on customized datasets. Take a quick look at our documentation and paper.

Fast BERT Pre-training

Features

Simple Installation

We simplify the installation of the package's dependencies: they can all be installed with a single command, source install.sh. No other steps are needed!
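
Once install.sh finishes, a quick sanity check that the core dependencies used throughout this README (PyTorch, Transformers, Datasets, DeepSpeed) were installed and can see a GPU might look like the following; this is only an illustrative check, not part of the toolkit:
# Post-install sanity check (illustrative): import the core dependencies
# and report their versions and GPU visibility.
import torch
import transformers
import datasets
import deepspeed

print("torch        :", torch.__version__)
print("transformers :", transformers.__version__)
print("datasets     :", datasets.__version__)
print("deepspeed    :", deepspeed.__version__)
print("CUDA available:", torch.cuda.is_available())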

Fast Training

Pretraining can be capped at a fixed time budget by setting --total_training_time=24.0 (24 hours, for example). Mixed-precision training is enabled by adding --fp16.
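
Conceptually, a time budget just replaces the step-count stopping criterion with a wall-clock check. The sketch below is not the toolkit's actual training loop, only an illustration of the idea behind --total_training_time, with a placeholder train_step:
# Conceptual sketch of a wall-clock training budget (not the real training loop).
import time

def train_step(step):
    """Hypothetical placeholder for one optimization step."""
    time.sleep(0.01)

total_training_hours = 0.001            # --total_training_time is in hours, e.g. 24.0
deadline = time.time() + total_training_hours * 3600.0

step = 0
while time.time() < deadline:           # stop once the time budget is spent
    train_step(step)
    step += 1
print(f"Ran {step} steps within the budget.")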

[Figure: GLUE performance]

Dataset Zoo

A large number of pretraining datasets are supported and can be loaded directly from the Hugging Face dataset hub. Custom user datasets are also supported.
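
Any corpus hosted on the Hugging Face hub can be pulled with the datasets library. The snippet below loads a small public corpus (wikitext) purely for illustration; the Wikipedia and BookCorpus datasets used in our experiments are available from the hub in the same way:
# Illustrative only: load a public text corpus from the Hugging Face hub.
from datasets import load_dataset

corpus = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
print(corpus)                       # number of rows and column names
print(corpus[10]["text"][:200])     # peek at one raw text sample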

Optimization-friendly

It is hard to verify the effectiveness of a new optimization method in LM pretraining. Our package supports integrating customized optimizers and schedulers into the pipeline, which helps researchers in the optimization community verify their algorithms easily. Efficient optimization and data-processing algorithms will also be added in future releases.
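
The registration hook for custom optimizers is not spelled out in this README, so the following is only a minimal sketch of what a pluggable PyTorch optimizer looks like; how it gets wired into run_pretraining.py (for instance through --optimizer_type) is an assumption here:
# Minimal custom PyTorch optimizer (plain SGD) as a template for new methods.
import torch

class PlainSGD(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3):
        super().__init__(params, defaults=dict(lr=lr))

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group["lr"])  # p <- p - lr * grad
        return loss

# Standalone usage; a training pipeline would construct it the same way.
model = torch.nn.Linear(4, 1)
opt = PlainSGD(model.parameters(), lr=0.1)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()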

Supported Models

Model Status
BERT ✅ Supported
RoBERTa, ALBERT, DeBERTa 🚧 Developing
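
For reference, the BERT-large architecture used in the best configuration from our paper (see the pretraining example below) corresponds to the following Hugging Face BertConfig; this is shown only to illustrate the hyperparameters, not how the toolkit constructs its model internally:
# BERT-large style configuration matching the pretraining flags shown below.
from transformers import BertConfig

config = BertConfig(
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    intermediate_size=4096,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
)
print(config)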

System requirements

  • Linux, Ubuntu >= 18.04.
  • At least 20 GB of total GPU memory, e.g. 1x RTX 3090, 2x RTX 2080 Ti, or 4x GeForce RTX 3070.
  • At least 200 GB of disk space.
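
A quick, illustrative check of these requirements (GPU memory via PyTorch, free disk space via the standard library); the thresholds simply mirror the list above:
# Check total GPU memory and free disk space against the stated requirements.
import shutil
import torch

GB = 1024 ** 3
gpu_mem = sum(
    torch.cuda.get_device_properties(i).total_memory
    for i in range(torch.cuda.device_count())
)
disk_free = shutil.disk_usage("/").free

print(f"Total GPU memory: {gpu_mem / GB:.1f} GB (need >= 20 GB)")
print(f"Free disk space : {disk_free / GB:.1f} GB (need >= 200 GB)")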

Configurable pipeline

First, one may refer to configs/bert-simple.yaml and make a suitable config for the pipeline, including the datasets, the number of GPUs available, etc. Then, by simply running the following command, the whole pipeline will be executed stage by stage,

source install.sh; python main.py --config configs/bert-simple.yaml

which runs environment installation, dataset preparation, pretraining, finetuning, and test-result collection one after another, and generates the .zip files for GLUE test-server submission under output_test_translated/finetune/*/*.zip. Please refer to PIPELINE_CONFIG.md for more information about the YAML config file.
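
To peek at what a particular config defines before launching, note that the file is ordinary YAML and can be inspected directly; the snippet assumes nothing about the schema beyond YAML syntax (see PIPELINE_CONFIG.md for the authoritative description):
# Print the top-level entries of a pipeline config (requires PyYAML).
import yaml

with open("configs/bert-simple.yaml") as f:
    config = yaml.safe_load(f)

for key, value in config.items():
    print(key, "->", value)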

The details of each stage are described in the following sections.

Installation

Run source install.sh.

Dataset

The dataset directory includes scripts to pre-process the datasets we used in our experiments (Wikipedia, BookCorpus). See the dedicated README for full details.

Pretraining

Pretraining script: run_pretraining.py

For all possible pretraining arguments see: python run_pretraining.py -h

Example for training with the best configuration presented in our paper (24-layers/1024H/time-based learning rate schedule/fp16):
deepspeed run_pretraining.py \
  --model_type bert-mlm --tokenizer_name bert-large-uncased \
  --hidden_act gelu \
  --hidden_size 1024 \
  --num_hidden_layers 24 \
  --num_attention_heads 16 \
  --intermediate_size 4096 \
  --hidden_dropout_prob 0.1 \
  --attention_probs_dropout_prob 0.1 \
  --encoder_ln_mode pre-ln \
  --lr 1e-3 \
  --train_batch_size 4096 \
  --train_micro_batch_size_per_gpu 32 \
  --lr_schedule time \
  --curve linear \
  --warmup_proportion 0.06 \
  --gradient_clipping 0.0 \
  --optimizer_type adamw \
  --weight_decay 0.01 \
  --adam_beta1 0.9 \
  --adam_beta2 0.98 \
  --adam_eps 1e-6 \
  --total_training_time 24.0 \
  --early_exit_time_marker 24.0 \
  --dataset_path <dataset path> \
  --output_dir /tmp/training-out \
  --print_steps 100 \
  --num_epochs_between_checkpoints 10000 \
  --job_name pretraining_experiment \
  --project_name budget-bert-pretraining \
  --validation_epochs 3 \
  --validation_epochs_begin 1 \
  --validation_epochs_end 1 \
  --validation_begin_proportion 0.05 \
  --validation_end_proportion 0.01 \
  --validation_micro_batch 16 \
  --deepspeed \
  --data_loader_type dist \
  --do_validation \
  --use_early_stopping \
  --early_stop_time 180 \
  --early_stop_eval_loss 6 \
  --seed 42 \
  --fp16
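
Note that --train_batch_size is the effective (global) batch size, while --train_micro_batch_size_per_gpu is what fits in memory on each GPU; the gap is covered by gradient accumulation. Assuming the standard DeepSpeed convention, the relationship for the flags above works out as follows:
# Effective batch size = micro batch per GPU x number of GPUs x accumulation steps.
train_batch_size = 4096        # --train_batch_size
micro_batch_per_gpu = 32       # --train_micro_batch_size_per_gpu
num_gpus = 4                   # e.g. 4x GeForce RTX 3070 from the requirements above

accumulation_steps = train_batch_size // (micro_batch_per_gpu * num_gpus)
print(accumulation_steps)      # -> 32 gradient-accumulation steps per update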

Finetuning

Use run_glue.py to run finetuning for a saved checkpoint on GLUE tasks.

The finetuning script is identical to the one provided by Huggingface with the addition of our model.

For all possible finetuning arguments see: python run_glue.py -h

Example for finetuning on MRPC:
python run_glue.py \
  --model_name_or_path <path to model> \
  --task_name MRPC \
  --max_seq_length 128 \
  --output_dir /tmp/finetuning \
  --overwrite_output_dir \
  --do_train --do_eval \
  --evaluation_strategy steps \
  --per_device_train_batch_size 32 --gradient_accumulation_steps 1 \
  --per_device_eval_batch_size 32 \
  --learning_rate 5e-5 \
  --weight_decay 0.01 \
  --eval_steps 50 \
  --max_grad_norm 1.0 \
  --num_train_epochs 5 \
  --lr_scheduler_type polynomial \
  --warmup_steps 50
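
Assuming the checkpoint saved by run_glue.py follows the standard Transformers layout, the finetuned model can be loaded back for inference in the usual way; the path and sentence pair below are only illustrative:
# Illustrative inference with a finetuned MRPC checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "/tmp/finetuning"  # the --output_dir used above
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)

inputs = tokenizer(
    "The company posted record profits.",
    "Profits at the company hit an all-time high.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print("Predicted label:", logits.argmax(dim=-1).item())  # for MRPC, 1 = paraphrase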

Citation

If you find this repository useful, you may cite our paper as:

@misc{extreme-bert,
    title={ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT},
    author={Rui Pan and Shizhe Diao and Jianlin Chen and Tong Zhang},
    year={2022},
    eprint={2211.17201},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2211.17201},
}

Acknowledgements

A significant portion of the code is based on How to Train BERT with an Academic Budget licensed under Apache 2.0.

Contact

For help or issues using this package, please submit a GitHub issue.

For personal communication related to this package, please contact Rui Pan ([email protected]) and Shizhe Diao ([email protected]).

extreme-bert's People

Contributors

isaac-jl-chen, research4pan, shizhediao

extreme-bert's Issues

Could not locate module 'data'

File shard_data.py depends on module 'data' (from data import TextSharding), which was missing from the repository.
I think the file TextSharding.py was based on the file https
However, the signatures differ: shard_data.py expects 12 positional arguments, while the original file only takes 6.

Running deepspeed run_pretraining.py fails after installing according to the documented steps

Input:

deepspeed run_pretraining.py \
>   --model_type bert-mlm --tokenizer_name bert-large-uncased \
>   --hidden_act gelu \
>   --hidden_size 1024 \
>   --num_hidden_layers 24 \
>   --num_attention_heads 16 \
>   --intermediate_size 4096 \
>   --hidden_dropout_prob 0.1 \
>   --attention_probs_dropout_prob 0.1 \
>   --encoder_ln_mode pre-ln \
>   --lr 1e-3 \
>   --train_batch_size 4096 \
>   --train_micro_batch_size_per_gpu 32 \
>   --lr_schedule time \
>   --curve linear \
>   --warmup_proportion 0.06 \
>   --gradient_clipping 0.0 \
>   --optimizer_type adamw \
>   --weight_decay 0.01 \
>   --adam_beta1 0.9 \
>   --adam_beta2 0.98 \
>   --adam_eps 1e-6 \
>   --total_training_time 24.0 \
>   --early_exit_time_marker 24.0 \
>   --dataset_path ./example_multisent_perline.txt \
>   --output_dir /tmp/training-out \
>   --print_steps 100 \
>   --num_epochs_between_checkpoints 10000 \
>   --job_name pretraining_experiment \
>   --project_name budget-bert-pretraining \
>   --validation_epochs 3 \
>   --validation_epochs_begin 1 \
>   --validation_epochs_end 1 \
>   --validation_begin_proportion 0.05 \
>   --validation_end_proportion 0.01 \
>   --validation_micro_batch 16 \
>   --deepspeed \
>   --data_loader_type dist \
>   --do_validation \
>   --use_early_stopping \
>   --early_stop_time 180 \
>   --early_stop_eval_loss 6 \
>   --seed 42 \
>   --fp16

Output (error info):

[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/exbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
FAILED: flatten_unflatten.o
c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/exbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
c++: error: unrecognized command line option ‘-std=c++14’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
    subprocess.run(
  File "/root/anaconda3/envs/exbert/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_pretraining.py", line 712, in <module>
    main()
  File "run_pretraining.py", line 692, in main
    model, optimizer, lr_scheduler = prepare_model_and_optimizer(args)
  File "run_pretraining.py", line 439, in prepare_model_and_optimizer
    model.network, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/__init__.py", line 126, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 223, in __init__
    util_ops = UtilsBuilder().load()
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 239, in load
    return self.jit_load(verbose)
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 267, in jit_load
    op_module = load(
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1124, in load
    return _jit_compile(
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1337, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1449, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'utils'
Killing subprocess 860
Traceback (most recent call last):
  File "/root/anaconda3/envs/exbert/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/exbert/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 171, in <module>
    main()
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 161, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 139, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/exbert/bin/python', '-u', 'run_pretraining.py', '--local_rank=0', '--model_type', 'bert-mlm', '--tokenizer_name', 'bert-large-uncased', '--hidden_act', 'gelu', '--hidden_size', '1024', '--num_hidden_layers', '24', '--num_attention_heads', '16', '--intermediate_size', '4096', '--hidden_dropout_prob', '0.1', '--attention_probs_dropout_prob', '0.1', '--encoder_ln_mode', 'pre-ln', '--lr', '1e-3', '--train_batch_size', '4096', '--train_micro_batch_size_per_gpu', '32', '--lr_schedule', 'time', '--curve', 'linear', '--warmup_proportion', '0.06', '--gradient_clipping', '0.0', '--optimizer_type', 'adamw', '--weight_decay', '0.01', '--adam_beta1', '0.9', '--adam_beta2', '0.98', '--adam_eps', '1e-6', '--total_training_time', '24.0', '--early_exit_time_marker', '24.0', '--dataset_path', './example_multisent_perline.txt', '--output_dir', '/tmp/training-out', '--print_steps', '100', '--num_epochs_between_checkpoints', '10000', '--job_name', 'pretraining_experiment', '--project_name', 'budget-bert-pretraining', '--validation_epochs', '3', '--validation_epochs_begin', '1', '--validation_epochs_end', '1', '--validation_begin_proportion', '0.05', '--validation_end_proportion', '0.01', '--validation_micro_batch', '16', '--deepspeed', '--data_loader_type', 'dist', '--do_validation', '--use_early_stopping', '--early_stop_time', '180', '--early_stop_eval_loss', '6', '--seed', '42', '--fp16']' returned non-zero exit status 1.
