Comments (17)
Yes, keep the split. I strongly recommend using data_dir rather than data_files. Keep the same config, but replace data_files with data_dir.
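For example, the dataset block would look roughly like this (a sketch only; the directory path and the preprocessing_fn import path are placeholders for your own values):

dataset:
  hf_name: json
  hf_kwargs:
    data_dir: /path/to/my/dataset/dir
  preprocessing_fn: my.import.path:my_preprocessing_fn
  split: train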
I tried fine-tuning MPT-7B on the Dolly dataset using the command below:
composer train.py yamls/finetune/mpt-7b_dolly_sft.yaml
Before training starts, I get the error below:
[Eval batch=321/321] Eval on eval data:
Eval metrics/eval/LanguageCrossEntropy: 9.1594
Eval metrics/eval/LanguagePerplexity: 9503.6523
/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Traceback (most recent call last):
File "", line 21, in _bwd_kernel
KeyError: ('2-.-0-.-0-842f0fbd42a6607893f7134cdd9d16f2-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-f24b6aa9b101a518b6a4a6bddded372e-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.float32, torch.bfloat16, torch.bfloat16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('vector', True, 128, False, True, True, True, 128, 128), (True, True, True, True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/stsingha/LLM/llm-foundry/scripts/train/train.py", line 254, in
main(cfg)
File "/home/stsingha/LLM/llm-foundry/scripts/train/train.py", line 243, in main
trainer.fit()
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 1766, in fit
self._train_loop()
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 1940, in _train_loop
total_loss_dict = self._train_batch(use_grad_scaling)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 2115, in _train_batch
optimizer.step(closure=lambda **kwargs: self._train_microbatches(
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
out = func(*args, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/optim/decoupled_weight_decay.py", line 288, in step
loss = closure()
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 2115, in
optimizer.step(closure=lambda **kwargs: self._train_microbatches(
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 2213, in _train_microbatches
microbatch_loss_dict = self._train_microbatch(use_grad_scaling, current_batch_size, is_final_microbatch)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 2340, in _train_microbatch
microbatch_loss.backward(create_graph=self._backwards_create_graph)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/flash_attn/flash_attn_triton.py", line 827, in backward
_flash_attn_backward(do, q, k, v, o, lse, dq, dk, dv,
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/flash_attn/flash_attn_triton.py", line 694, in _flash_attn_backward
_bwd_kernel[grid](
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 73, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 73, in
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 63, in _bench
return do_bench(kernel_call)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/testing.py", line 140, in do_bench
fn()
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 62, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "", line 43, in _bwd_kernel
RuntimeError: Triton Error [CUDA]: invalid argument
Could you please help with this issue? @arpitkk @baptistejamin @alextrott16
Hi team, I have identified the issue: there was a problem with the batch size. It's working fine now. Thanks for the support!
I tried doing the same, and I do agree the instructions are not clear at all.
Sorry for the lack of clarity here, we'll update the docs.
Going off of @arpitkk's example, you would want to use train_loader.dataset.hf_name: json and keep the rest the same.
Just to clarify, that config eventually drives the dataset-building method build_from_hf.
So, by setting your config to
train_loader:
  name: finetuning
  dataset:
    hf_name: json
    hf_kwargs:
      data_files:
        train: /path/to/train.jsonl
    preprocessing_fn: my.import.path:my_preprocessing_fn
    split: train
the build_from_hf method will effectively execute the following code:
import datasets
from typing import Dict

from my.import.path import my_preprocessing_fn

dataset = datasets.load_dataset('json', split='train', data_files={'train': '/path/to/train.jsonl'})

def dataset_mapper(example: Dict):
    example = my_preprocessing_fn(example)
    # _tokenize_formatted_example and tokenizer come from llm-foundry's
    # finetuning dataloader internals
    return _tokenize_formatted_example(example, tokenizer)

columns_to_remove = list(dataset[0].keys())
tokenized_dataset = dataset.map(
    dataset_mapper,
    batched=False,
    remove_columns=columns_to_remove,
)
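For a bit more context, here is a minimal sketch of what a preprocessing function can look like. The input field names ("instruction" and "output") are just assumptions about your raw JSONL; the returned "prompt"/"response" keys are what the tokenization step expects, but check _tokenize_formatted_example if in doubt:

from typing import Dict

def my_preprocessing_fn(example: Dict) -> Dict:
    # Map the raw fields of each JSONL row (assumed here to be
    # "instruction" and "output") onto prompt/response pairs.
    return {
        'prompt': example['instruction'],
        'response': example['output'],
    }

You would then point preprocessing_fn at this function's import path in the YAML, as in the config above.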
Does that help to clear things up? If so, I can work a similar explanation into the README. If not, please let me know what still feels unclear!
There might be an issue then. Here is the config I am using:
train_loader:
  name: finetuning
  dataset:
    hf_name: json
    hf_kwargs:
      data_files:
        train: /mnt/training/mylocaldataset/train.jsonl
    preprocessing_fn: mylocaldataset.utils:prep_fn
    split: train
    max_seq_len: ${max_seq_len}
    allow_pad_trimming: false
    decoder_only_format: true
    shuffle: true
I am running with: composer llm-foundry/scripts/train/train.py mpt.yml save_folder=mpt-tuned at the path /mnt/training.
wc -l /mnt/training/mylocaldataset/train.jsonl: 1212821
I will debug this today and let you know if I make some progress :)
Issue found!
It seems there are a couple of bugs in the HuggingFace library. First, due to a regex problem, the system mixes up the file train.jsonl and the split train.
The dataset file should not be named train.jsonl. Name it, for instance, prompts.jsonl.
Also, I now use data_dir instead, and it works fine this way:
dataset:
  hf_name: json
  hf_kwargs:
    keep_in_memory: true
    data_dir: /mnt/training/mylocaldataset
  preprocessing_fn: mylocaldataset.utils:prep_fn
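If it helps, my understanding (not verified against the datasets source) is that with this config the loader ends up doing roughly the following, with 🤗 datasets discovering the JSON/JSONL files under data_dir on its own:

import datasets

# Roughly what the finetuning dataloader does with the dataset block above;
# split='train' comes from the split field elsewhere in the config.
dataset = datasets.load_dataset(
    'json',
    data_dir='/mnt/training/mylocaldataset',
    keep_in_memory=True,
    split='train',
)

I'm not certain exactly how datasets assigns files in data_dir to splits, so treat the split argument here as illustrative.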
@baptistejamin, do you have to specify the splits? So data_dir is only the directory? How would it find prompts.jsonl then?
I do not know why, but I keep getting the error "FileNotFoundError: Unable to find '/workspace/scripts/train/train' at /workspace/scripts/train"
at dataset = datasets.load_dataset(dataset_name, split=split, **kwargs).
My YAML is:
# Dataloaders
train_loader:
  name: finetuning
  dataset:
    hf_name: json
    hf_kwargs:
      data_files:
        train: data/train.json
    preprocessing_fn: preprocess_investopedia:preprocess_investopedia
The data folder is inside the current working directory.
The global batch size is too high. Try with a batch size of 1, and then increase it until you hit OOM.
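Concretely, these are the knobs in the YAML to turn down (the values here are just a starting point, not tuned recommendations):

global_train_batch_size: 1
device_train_microbatch_size: 1
device_eval_batch_size: 1

Then raise global_train_batch_size step by step until you run out of memory.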
Thanks for helping to surface these issues!!
@baptistejamin @arpitkk Did the explanation I posted above provide a useful intuition for how to set up the YAML? I want to make sure our README instructions are clear. I'll use your feedback to update them.
I'll also aim to include some of the gotchas that you have caught, e.g., data_dir vs data_files, relative pathing.
Hi, after running a few batches the code fails with the errors below:
IndexError: Caught IndexError in DataLoader worker process 6.
Original Traceback (most recent call last):
File "/home/anaconda3/envs/mpt-train/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/anaconda3/envs/mpt-train/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
return self.collate_fn(data)
File "/home/llm-foundry/llmfoundry/data/finetuning/collator.py", line 116, in call
batch = self._process_and_batch_decoder_only(examples)
File "/home/llm-foundry/llmfoundry/data/finetuning/collator.py", line 222, in _process_and_batch_decoder_only
batch = self.tokenizer.pad(
File "/home/anaconda3/envs/mpt-train/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2949, in pad
if isinstance(encoded_inputs, (list, tuple)) and isinstance(encoded_inputs[0], Mapping):
IndexError: list index out of range
I am using the YAML file below:
max_seq_len: 2048
global_seed: 17

# Run Name
run_name: # If left blank, will be read from env var $COMPOSER_RUN_NAME

# Model
# These must match pretraining
model:
  name: hf_causal_lm
  device: cuda:0
  pretrained: true
  pretrained_model_name_or_path: mosaicml/mpt-7b

# Tokenizer
tokenizer:
  name: EleutherAI/gpt-neox-20b
  kwargs:
    model_max_length: ${max_seq_len}

dataset: &hf_dataset
  hf_name: json
  hf_kwargs:
    data_files:
      train: /home/MPT-7B/mpt_dataset/mpt-train.jsonl
      test: /home/MPT-7B/mpt_dataset/mpt-test.jsonl

# Dataloaders
train_loader: &train_loader
  name: finetuning
  dataset:
    <<: *hf_dataset
    split: train
    max_seq_len: ${max_seq_len}
    allow_pad_trimming: false
    decoder_only_format: true
    shuffle: true
    # Use `python llmfoundry/data/packing.py --yaml-path /path/to/this/yaml/ ...`
    # to profile this run's optimal packing_ratio
    # packing_ratio:
  drop_last: true
  num_workers: 8
  pin_memory: false
  prefetch_factor: 2
  persistent_workers: true
  timeout: 0

eval_loader:
  <<: *train_loader
  dataset:
    <<: *hf_dataset
    split: test
    max_seq_len: ${max_seq_len}
    allow_pad_trimming: false
    decoder_only_format: true
    shuffle: false

# Optimization
scheduler:
  name: linear_decay_with_warmup # linear no warmup is HF default which dolly used
  t_warmup: 0ba
  alpha_f: 0

optimizer:
  # mimic HF defaults to replicate dolly
  name: decoupled_adamw
  lr: 1.0e-5
  betas:
  - 0.9
  - 0.999
  eps: 1.0e-8
  weight_decay: 0

algorithms:
  gradient_clipping:
    clipping_type: norm
    clipping_threshold: 1.0

max_duration: 1ep
eval_interval: 500ba
eval_first: false
eval_subset_num_batches: -1
global_train_batch_size: 2

# System
seed: ${global_seed}
device_eval_batch_size: 1
device_train_microbatch_size: 1
device_train_microbatch_size: auto
precision: amp_bf16

# FSDP
fsdp_config:
  sharding_strategy: FULL_SHARD
  mixed_precision: PURE
  activation_checkpointing: false
  activation_checkpointing_reentrant: false
  activation_cpu_offload: false
  limit_all_gathers: true
  verbose: false

# Logging
progress_bar: false
log_to_console: true
console_log_interval: 1ba

callbacks:
  speed_monitor:
    window_size: 10
  lr_monitor: {}
  memory_monitor: {}
  runtime_estimator: {}

loggers:
  wandb: {}

# Checkpoint to local filesystem or remote object store
save_interval: 2000ba
save_num_checkpoints_to_keep: 1 # Important, this cleans up checkpoints saved to DISK
save_folder: ./llm_local_finetune/checkpoints
save_folder: s3://my-bucket/my-folder/{run_name}/checkpoints

# Load from remote object store
# REPLACE THE BELOW with your own checkpoint!
#load_path: oci://my-bucket/my-folder/mpt-7b/checkpoints/some_checkpoint.pt
The training fails at 486 ba irrespective of which dataset I use. I checked whether any empty inputs are being passed to collator.py, but the examples do have data:
[{'input_ids': [30003, 310, 271, 9775, 326, 8631, 247, 4836, 15, 19566, 247, 2380, 326, 20420, 29141, 253, 2748, 15, 187, 187, 4118, 41959, 27, 187, 2513, 627, 667, 1039, 281, 1721, 555, 14, 24382, 247, 40315, 8393, 32, 535, 187, 4118, 19371, 27, 187], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [8469, 20444, 20798, 476, 320, 1892, 281, 1721, 555, 14, 24382, 13, 1580, 597, 10748, 1453, 616, 2133, 3276, 285, 18676, 407, 1469, 281, 253, 15336, 347, 3058, 13, 2299, 352, 310, 1896, 949, 247, 9950, 3733, 1232, 15, 50276, 4943, 403, 690, 8521, 7259, 323, 5547, 3733, 27, 187, 187, 18, 15, 50276, 12864, 634, 40315, 8393, 715, 247, 1355, 13, 6537, 2317, 342, 521, 390, 617, 7583, 2739, 285, 23908, 13, 285, 1918, 634, 40315, 8393, 5044, 3733, 15, 187, 19, 15, 50276, 13502, 38529, 9848, 3057, 342, 634, 40315, 8393, 4768, 253, 3733, 1232, 15, 187, 20, 15, 50276, 30802, 472, 1721, 555, 3879, 342, 1355, 26574, 13, 2739, 285, 1132, 2606, 15, 9225, 504, 3081, 26574, 846, 1016, 5547, 3733, 6874, 15, 209, 187, 21, 15, 50276, 5279, 673, 13, 13237, 2572, 253, 673, 634, 40315, 8393, 310, 7591, 281, 2289, 521, 390, 617, 1721, 555, 2170, 15, 50276, 21914, 326, 253, 3733, 1232, 3936, 673, 13, 594, 22450, 285, 5185, 49495, 403, 253, 2234, 281, 2323, 15, 209, 187, 22, 15, 50276, 29146, 1230, 4575, 634, 40315, 8393, 434, 6196, 281, 359, 266, 779, 390, 617, 432, 3081, 26574, 285, 6558, 253, 6799, 3879, 15, 187, 23, 15, 50276, 16628, 5277, 634, 40315, 8393, 342, 24443, 4158, 30653, 281, 1361, 731, 755, 9848, 342, 970, 247, 1721, 555, 275, 253, 987, 4328, 15, 50276, 187, 187, 1231, 671, 971, 281, 22175, 326, 40315, 20798, 403, 1355, 285, 28304, 13, 285, 2430, 2714, 1557, 285, 10885, 15, 50276, 6693, 187]}]
see #143 (comment)
Since this issue was originally made to address the lack of clarity around finetuning from a local dataset, I just want to let folks know that we just pushed a PR that includes a much more concrete example of this workflow.
In the scripts/train directory, you'll find finetune_example, which includes:
- a detailed README
- an example local training dataset
- an implementation of a preprocessing function for that dataset
- a YAML which puts it all together and can be run locally via train.py
To help us stay on top of other issues, I'll close this one. If things remain unclear, feel free to add another comment and I'll re-open the issue if necessary. Thank you!
> Hi team, I have identified the issue: there was a problem with the batch size. It's working fine now. Thanks for the support!
@arpitkk, can you explain what you changed to get this working? I am running into the same issue.
> Hi team, I have identified the issue: there was a problem with the batch size. It's working fine now. Thanks for the support!
> @arpitkk, can you explain what you changed to get this working? I am running into the same issue.
I am facing the same issue as well. @arpitkk, can you please share what you changed to get this working?