llmisallyouneed's People

Contributors

visionshao

llmisallyouneed's Issues

Why do some LLMs use eos_token as pad_token?

Hi all! There’s an interesting story here.

In general you are correct that causal LMs like Falcon are not trained with a pad token, and so the tokenizer does not have one set. This is true for a lot of causal LMs in the Hub. During training, these models are often fed sequences that have been concatenated together and truncated at the maximum sequence length, and so there is never any empty space that needs padding.
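Here is a minimal sketch of that concatenate-then-chunk preprocessing, in plain Python on already-tokenized lists. The function name `pack_into_blocks`, the token ids, and the choice to drop the trailing remainder are assumptions for illustration. The Hugging Face `run_clm.py` example script does something similar in its `group_texts` step. Every block comes out exactly `block_size` tokens long, so no pad token is ever needed.

```python
from itertools import chain


def pack_into_blocks(tokenized_docs, block_size):
    """Concatenate token-id lists and cut them into equal-length blocks.

    Assumption: the trailing remainder shorter than block_size is dropped,
    so every returned block has exactly block_size tokens and needs no padding.
    """
    flat = list(chain.from_iterable(tokenized_docs))
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


# Hypothetical token ids for three short documents.
docs = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
print(pack_into_blocks(docs, block_size=4))  # [[11, 12, 13, 21], [22, 31, 32, 33]]
```

Documents can straddle block boundaries (token 22 starts the second block). This is accepted in pretraining in exchange for never wasting compute on padding.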

The reason we add one later is that a lot of downstream methods use padding and attention masks in some way. However, in many cases it doesn’t really matter what you set the padding token to! This is because the padded tokens will generally be masked by setting the attention_mask to 0, so those tokens will not be attended to by the rest of the sequence.
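To see why the pad value rarely matters, here is a minimal pure-Python sketch of right-padding a batch and building its attention mask. `EOS_ID = 2` is a hypothetical token id. In `transformers`, the usual one-liner is `tokenizer.pad_token = tokenizer.eos_token`, after which the tokenizer pads with the EOS id. Whatever `pad_id` is, the mask marks those positions with 0, so the model does not attend to them.

```python
EOS_ID = 2  # hypothetical EOS token id, reused as the pad id


def pad_batch(sequences, pad_id=EOS_ID):
    """Right-pad token-id lists to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 = real token the model may attend to, 0 = padding to be ignored
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


ids, mask = pad_batch([[5, 6, 7], [8, 9]])
print(ids)   # [[5, 6, 7], [8, 9, 2]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```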

However, one place where the choice of padding token can matter is in the labels when fine-tuning the model. In standard CLM training, the labels are the inputs shifted by a single position, so at the last real token before the trailing padding, the label is the padding token. When training models on shorter sequences (such as for chat), we generally want them to mark the end of the text they’ve generated with a token like eos_token. As a result, we commonly just use eos_token as the padding token.

However, depending on your fine-tuning task, you may not want the model to learn to predict eos_token at the end of a sequence. If so, simply change the label at that position to the token you do want, or set the label to -100 to mask that position.
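The label handling described in the last two paragraphs can be sketched in plain Python. The sketch makes three assumptions:

* Sequences are right-padded, with the pad id equal to the EOS id (hypothetical id 2).
* The loss ignores the value -100, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.
* `make_labels` and its `learn_eos` flag are illustrative names, not a library API.

With `learn_eos=True`, the first pad position is kept as an EOS target. With `learn_eos=False`, every padded position is masked. `shifted_pairs` makes the one-position shift visible: the output at position t is trained to predict the label at t + 1.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def make_labels(input_ids, attention_mask, learn_eos=True):
    """Copy input_ids into labels, masking padded positions with IGNORE_INDEX.

    Assumes right padding where pad id == EOS id. With learn_eos=True the first
    padded position is kept, so the last real token learns to predict EOS.
    """
    labels = []
    for ids, mask in zip(input_ids, attention_mask):
        row = [tok if m == 1 else IGNORE_INDEX for tok, m in zip(ids, mask)]
        n_real = sum(mask)
        if learn_eos and n_real < len(ids):
            row[n_real] = ids[n_real]  # first pad token doubles as the EOS target
        labels.append(row)
    return labels


def shifted_pairs(ids_row, labels_row):
    """(input token, target) pairs: the output at position t predicts label t + 1."""
    return list(zip(ids_row[:-1], labels_row[1:]))


ids = [[5, 6, 7, 2, 2]]  # hypothetical ids, right-padded with EOS id 2
mask = [[1, 1, 1, 0, 0]]
labels = make_labels(ids, mask)
print(labels)                                   # [[5, 6, 7, 2, -100]]
print(shifted_pairs(ids[0], labels[0]))         # [(5, 6), (6, 7), (7, 2), (2, -100)]
print(make_labels(ids, mask, learn_eos=False))  # [[5, 6, 7, -100, -100]]
```

In Hugging Face causal LM models you pass labels aligned with input_ids, and the shift happens inside the model's loss. `shifted_pairs` only exists to show that shift. In this sketch, a sequence that already fills the whole batch length gets no EOS target. Appending EOS at tokenization time avoids that.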

Does that answer the questions you had? Feel free to let me know if I missed anything here!

from https://discuss.huggingface.co/t/why-does-the-falcon-qlora-tutorial-code-use-eos-token-as-pad-token/45954/9

DeepSpeed Initialization Error

Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/cache/weishao4/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /mnt/cache/weishao4/anaconda3/envs/toxicity/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/cache/weishao4/anaconda3/envs/toxicity/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -DBF16_AVAILABLE -c /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
FAILED: custom_cuda_kernel.cuda.o
/mnt/cache/weishao4/anaconda3/envs/toxicity/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/cache/weishao4/anaconda3/envs/toxicity/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -DBF16_AVAILABLE -c /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
cc1plus: fatal error: cuda_runtime.h: No such file or directory
compilation terminated.
[2/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/cache/weishao4/anaconda3/envs/toxicity/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++17 -g -Wno-reorder -L/mnt/cache/weishao4/anaconda3/envs/toxicity/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512 -D__ENABLE_CUDA_ -DBF16_AVAILABLE -c /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
FAILED: cpu_adam.o
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/cache/weishao4/anaconda3/envs/toxicity/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/include -isystem /mnt/cache/weishao4/anaconda3/envs/toxicity/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++17 -g -Wno-reorder -L/mnt/cache/weishao4/anaconda3/envs/toxicity/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512 -D__ENABLE_CUDA_ -DBF16_AVAILABLE -c /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
In file included from /mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:6:
/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h:16:10: fatal error: cuda_fp16.h: No such file or directory
#include <cuda_fp16.h>
^~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/main.py", line 620, in <module>
main()
File "/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/main.py", line 599, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/transformers/src/transformers/trainer.py", line 1648, in train
return inner_training_loop(
File "/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/transformers/src/transformers/trainer.py", line 1717, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/transformers/src/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/__init__.py", line 165, in initialize
engine = DeepSpeedEngine(args=args,
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 309, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1174, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1230, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
self.ds_opt_adam = CPUAdamBuilder().load()
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 454, in load
return self.jit_load(verbose)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 497, in jit_load
op_module = load(name=self.name,
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Loading extension module cpu_adam...
Traceback (most recent call last):
File "/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/main.py", line 620, in <module>
main()
File "/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/main.py", line 599, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/transformers/src/transformers/trainer.py", line 1648, in train
return inner_training_loop(
File "/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/transformers/src/transformers/trainer.py", line 1717, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/transformers/src/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/__init__.py", line 165, in initialize
engine = DeepSpeedEngine(args=args,
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 309, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1174, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1230, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
self.ds_opt_adam = CPUAdamBuilder().load()
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 454, in load
return self.jit_load(verbose)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 497, in jit_load
op_module = load(name=self.name,
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "", line 565, in module_from_spec
File "", line 1173, in create_module
File "", line 228, in _call_with_frames_removed
ImportError: /mnt/cache/weishao4/.cache/torch_extensions/py39_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7faf8116e8b0>
Traceback (most recent call last):
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fc8dcf6e8b0>
Traceback (most recent call last):
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 848830) of binary: /mnt/cache/weishao4/anaconda3/envs/toxicity/bin/python
Traceback (most recent call last):
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/mnt/cache/weishao4/anaconda3/envs/toxicity/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/mnt/cache/weishao4/Projects/Toxicity/LLM_fine_tune/ToxDetLLaMa/main.py FAILED

Failures:
[1]:
time : 2023-09-27_14:00:16
host : xgcsdx-SYS-740GP-TNRT
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 848831)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2023-09-27_14:00:16
host : xgcsdx-SYS-740GP-TNRT
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 848830)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Environment:
A6000 80G
Pytorch 1.13.1
Python 3.9
CUDA 11.7
DeepSpeed 0.9.3

CUDA_HOME is not found when you try to install DeepSpeed

This error is very common if you use PyTorch heavily. For reasons I don't fully understand yet (maybe I'll find out in the future, hhh!), installing the CUDA build of PyTorch with conda only pulls in part of the CUDA toolkit. Unfortunately, nvcc is not included. Without nvcc, CUDA_HOME cannot be located, so installing DeepSpeed fails. A recommended fix is to install the following packages after installing the CUDA build of PyTorch:

conda install -c nvidia cudatoolkit

conda install -c "nvidia/label/cuda-11.7.0" cuda-nvcc
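Once the packages are installed, you can run a quick sanity check before rebuilding DeepSpeed. The check confirms that a CUDA home can be located and that it contains the headers the build failed on (`cuda_runtime.h` and `cuda_fp16.h`, from the log in the previous issue).

The sketch below uses only the standard library and rests on two assumptions:

* The lookup order (the `CUDA_HOME` or `CUDA_PATH` variable, then the parent of the directory containing `nvcc`, then `/usr/local/cuda`) roughly follows how PyTorch's `torch.utils.cpp_extension` searches.
* Headers live in `<cuda_home>/include`, which is where the failing build looked (`-I.../envs/toxicity/include`).

```python
import os
import shutil

REQUIRED_HEADERS = ("cuda_runtime.h", "cuda_fp16.h")


def find_cuda_home(env=None):
    """Guess the CUDA home directory (assumed lookup order, roughly PyTorch's)."""
    env = os.environ if env is None else env
    home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if home:
        return home
    nvcc = shutil.which("nvcc")  # searches the current process PATH
    if nvcc:
        # <cuda_home>/bin/nvcc -> <cuda_home>
        return os.path.dirname(os.path.dirname(nvcc))
    if os.path.exists("/usr/local/cuda"):
        return "/usr/local/cuda"
    return None


def missing_headers(cuda_home, headers=REQUIRED_HEADERS):
    """Return the required headers that are not present in <cuda_home>/include."""
    include_dir = os.path.join(cuda_home, "include")
    return [h for h in headers if not os.path.isfile(os.path.join(include_dir, h))]


if __name__ == "__main__":
    home = find_cuda_home()
    if home is None:
        print("CUDA home not found: nvcc is probably not installed")
    else:
        print("CUDA home:", home)
        print("missing headers:", missing_headers(home) or "none")
```

If this reports missing headers, the error is likely to reappear. Depending on the package version, conda's CUDA packages may also put headers under a different subdirectory, so treat a miss as a hint rather than proof.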

Reference links:
https://blog.csdn.net/muyao987/article/details/130426069

https://anaconda.org/nvidia/cuda-nvcc

https://www.zhihu.com/question/344950161

https://blog.csdn.net/weixin_44589524/article/details/131663046

No data is loaded when you pass a custom model to Trainer

When you build your own model for Trainer, its forward method must explicitly name the inputs it expects, for example:

def forward(self, input_ids, attention_mask, labels, **kwargs):

Here, "input_ids" and "attention_mask" are the columns produced by the preprocess_function passed to data.map().

If none of the forward parameters match the outputs of preprocess_function, no data reaches the model. This is because Trainer's remove_unused_columns option (enabled by default in TrainingArguments) drops every dataset column that forward does not accept.
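The column filtering can be illustrated with a simplified stand-in for what Trainer does. Trainer reads the parameter names of `model.forward` with `inspect.signature` and keeps only the dataset columns that match them, plus label columns such as `label` and `label_ids`. `kept_columns`, `GoodModel`, and `BadModel` are hypothetical names for this sketch, not part of `transformers`. A catch-all `**kwargs` does not count as a column name, which is why a forward that only takes `**kwargs` receives nothing.

```python
import inspect


def kept_columns(forward, dataset_columns, label_columns=("label", "label_ids")):
    """Simplified sketch of remove_unused_columns: keep only the columns whose
    names appear in forward's signature or in label_columns."""
    allowed = set(inspect.signature(forward).parameters) | set(label_columns)
    return [col for col in dataset_columns if col in allowed]


class GoodModel:
    def forward(self, input_ids, attention_mask, labels=None, **kwargs):
        return input_ids


class BadModel:
    def forward(self, **kwargs):
        return kwargs


columns = ["input_ids", "attention_mask", "labels", "text"]
print(kept_columns(GoodModel().forward, columns))  # ['input_ids', 'attention_mask', 'labels']
print(kept_columns(BadModel().forward, columns))   # []
```

If you cannot change the signature, passing `remove_unused_columns=False` to `TrainingArguments` disables this filtering. In that case, your data collator has to cope with any extra columns itself.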
