extreme-bert / extreme-bert


ExtremeBERT is a toolkit that accelerates the pretraining of customized language models on customized datasets, described in the paper “ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT”.

Home Page: https://extreme-bert.github.io/extreme-bert-page

License: Apache License 2.0

Languages: Shell 12.28%, Python 87.72%
Topics: machine-learning, natural-language-processing, nlp, python, deep-learning, bert, language-model, language-models, pytorch, transformer

extreme-bert's Issues

Could not locate module 'data'

The file shard_data.py depends on a module 'data' (from data import TextSharding) that is missing from the repository.
I think the file TextSharding.py was based on the file at https
However, the call signature is different: shard_data.py passes 12 positional arguments, while the original file only accepts 6.
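
Until the missing module is restored, the sketch below shows the kind of interface the import seems to expect: a sharder that splits the sentences of the input files across a fixed number of training and test shards. The class name Sharding, its parameters, and the output naming scheme are assumptions for illustration only, not the repository's actual data.TextSharding implementation.

# Minimal sketch of a TextSharding-style helper (hypothetical, not the repo's actual module).
# It round-robins non-empty lines from the input files into a fixed number of output shards.
import os
from typing import List

class Sharding:
    def __init__(self, input_files: List[str], output_prefix: str,
                 n_training_shards: int, n_test_shards: int, fraction_test: float = 0.1):
        self.input_files = input_files
        self.output_prefix = output_prefix
        self.n_training_shards = n_training_shards
        self.n_test_shards = n_test_shards
        self.fraction_test = fraction_test

    def write_shards(self) -> None:
        os.makedirs(os.path.dirname(self.output_prefix) or ".", exist_ok=True)
        train_shards = [[] for _ in range(self.n_training_shards)]
        test_shards = [[] for _ in range(self.n_test_shards)]
        i = 0
        for path in self.input_files:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    # Send a fixed fraction of lines to the test shards, the rest to training.
                    if (i % 10) < int(self.fraction_test * 10):
                        test_shards[i % self.n_test_shards].append(line)
                    else:
                        train_shards[i % self.n_training_shards].append(line)
                    i += 1
        for idx, shard in enumerate(train_shards):
            with open(f"{self.output_prefix}_training_{idx}.txt", "w", encoding="utf-8") as f:
                f.write("\n".join(shard))
        for idx, shard in enumerate(test_shards):
            with open(f"{self.output_prefix}_test_{idx}.txt", "w", encoding="utf-8") as f:
                f.write("\n".join(shard))

A caller would then do something like Sharding(["corpus.txt"], "shards/out", 256, 1).write_shards(); the real shard_data.py evidently constructs its sharder with several more positional arguments than this.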

After installing by following the steps in the documentation, running deepspeed run_pretraining.py fails!

Input:

deepspeed run_pretraining.py \
>   --model_type bert-mlm --tokenizer_name bert-large-uncased \
>   --hidden_act gelu \
>   --hidden_size 1024 \
>   --num_hidden_layers 24 \
>   --num_attention_heads 16 \
>   --intermediate_size 4096 \
>   --hidden_dropout_prob 0.1 \
>   --attention_probs_dropout_prob 0.1 \
>   --encoder_ln_mode pre-ln \
>   --lr 1e-3 \
>   --train_batch_size 4096 \
>   --train_micro_batch_size_per_gpu 32 \
>   --lr_schedule time \
>   --curve linear \
>   --warmup_proportion 0.06 \
>   --gradient_clipping 0.0 \
>   --optimizer_type adamw \
>   --weight_decay 0.01 \
>   --adam_beta1 0.9 \
>   --adam_beta2 0.98 \
>   --adam_eps 1e-6 \
>   --total_training_time 24.0 \
>   --early_exit_time_marker 24.0 \
>   --dataset_path ./example_multisent_perline.txt \
>   --output_dir /tmp/training-out \
>   --print_steps 100 \
>   --num_epochs_between_checkpoints 10000 \
>   --job_name pretraining_experiment \
>   --project_name budget-bert-pretraining \
>   --validation_epochs 3 \
>   --validation_epochs_begin 1 \
>   --validation_epochs_end 1 \
>   --validation_begin_proportion 0.05 \
>   --validation_end_proportion 0.01 \
>   --validation_micro_batch 16 \
>   --deepspeed \
>   --data_loader_type dist \
>   --do_validation \
>   --use_early_stopping \
>   --early_stop_time 180 \
>   --early_stop_eval_loss 6 \
>   --seed 42 \
>   --fp16

Error output:

[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/exbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
FAILED: flatten_unflatten.o
c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/exbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
c++: error: unrecognized command line option ‘-std=c++14’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
    subprocess.run(
  File "/root/anaconda3/envs/exbert/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_pretraining.py", line 712, in <module>
    main()
  File "run_pretraining.py", line 692, in main
    model, optimizer, lr_scheduler = prepare_model_and_optimizer(args)
  File "run_pretraining.py", line 439, in prepare_model_and_optimizer
    model.network, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/__init__.py", line 126, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 223, in __init__
    util_ops = UtilsBuilder().load()
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 239, in load
    return self.jit_load(verbose)
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 267, in jit_load
    op_module = load(
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1124, in load
    return _jit_compile(
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1337, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1449, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'utils'
Killing subprocess 860
Traceback (most recent call last):
  File "/root/anaconda3/envs/exbert/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/exbert/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 171, in <module>
    main()
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 161, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/root/anaconda3/envs/exbert/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 139, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/exbert/bin/python', '-u', 'run_pretraining.py', '--local_rank=0', '--model_type', 'bert-mlm', '--tokenizer_name', 'bert-large-uncased', '--hidden_act', 'gelu', '--hidden_size', '1024', '--num_hidden_layers', '24', '--num_attention_heads', '16', '--intermediate_size', '4096', '--hidden_dropout_prob', '0.1', '--attention_probs_dropout_prob', '0.1', '--encoder_ln_mode', 'pre-ln', '--lr', '1e-3', '--train_batch_size', '4096', '--train_micro_batch_size_per_gpu', '32', '--lr_schedule', 'time', '--curve', 'linear', '--warmup_proportion', '0.06', '--gradient_clipping', '0.0', '--optimizer_type', 'adamw', '--weight_decay', '0.01', '--adam_beta1', '0.9', '--adam_beta2', '0.98', '--adam_eps', '1e-6', '--total_training_time', '24.0', '--early_exit_time_marker', '24.0', '--dataset_path', './example_multisent_perline.txt', '--output_dir', '/tmp/training-out', '--print_steps', '100', '--num_epochs_between_checkpoints', '10000', '--job_name', 'pretraining_experiment', '--project_name', 'budget-bert-pretraining', '--validation_epochs', '3', '--validation_epochs_begin', '1', '--validation_epochs_end', '1', '--validation_begin_proportion', '0.05', '--validation_end_proportion', '0.01', '--validation_micro_batch', '16', '--deepspeed', '--data_loader_type', 'dist', '--do_validation', '--use_early_stopping', '--early_stop_time', '180', '--early_stop_eval_loss', '6', '--seed', '42', '--fp16']' returned non-zero exit status 1.
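
The compile error at the top of this log is the root cause: DeepSpeed JIT-builds its 'utils' extension through torch.utils.cpp_extension, which invokes the system c++ (or whatever CXX points at) with -std=c++14, and the installed compiler is too old to understand that flag. Below is a small diagnostic sketch, assuming a Linux environment; the helper name cxx_supports_cpp14 is just for illustration.

# Hedged diagnostic sketch: reproduce the compiler requirement the DeepSpeed JIT build relies on.
# torch.utils.cpp_extension calls the system "c++" (or $CXX) with -std=c++14, so a GCC that
# rejects that flag breaks deepspeed.initialize() exactly as shown in the traceback above.
import os
import subprocess
import tempfile

def cxx_supports_cpp14(compiler: str = None) -> bool:
    compiler = compiler or os.environ.get("CXX", "c++")
    with tempfile.NamedTemporaryFile("w", suffix=".cpp", delete=False) as f:
        f.write("int main() { return 0; }\n")
        src = f.name
    try:
        result = subprocess.run(
            [compiler, "-std=c++14", "-c", src, "-o", os.devnull],
            capture_output=True, text=True,
        )
        return result.returncode == 0
    finally:
        os.remove(src)

if __name__ == "__main__":
    cxx = os.environ.get("CXX", "c++")
    print(f"compiler: {cxx}, supports -std=c++14: {cxx_supports_cpp14(cxx)}")

If this prints False, installing a newer GCC, or exporting CXX to point at a newer g++ before rerunning the deepspeed command above, should let the 'utils' extension build; the exact package names depend on the distribution.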
