Found 7 modules to quantize: ['k_proj', 'gate_proj', 'o_proj', 'down_proj', 'q_proj', 'up_proj', 'v_proj']
trainable params: 250,347,520 || all params: 6,922,327,040 || trainable%: 3.6165225733108386
/opt/conda/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
0%| | 0/1872 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/opt/ml/code/run_clm.py", line 253, in
main()
File "/opt/ml/code/run_clm.py", line 249, in main
training_function(args)
File "/opt/ml/code/run_clm.py", line 212, in training_function
trainer.train()
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1787, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/opt/conda/lib/python3.10/site-packages/accelerate/data_loader.py", line 394, in iter
next_batch = next(dataloader_iter)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 634, in next
data = self._next_data()
File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 678, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/opt/conda/lib/python3.10/site-packages/transformers/data/data_collator.py", line 70, in default_data_collator
return torch_default_data_collator(features)
File "/opt/conda/lib/python3.10/site-packages/transformers/data/data_collator.py", line 136, in torch_default_data_collator
batch[k] = torch.tensor([f[k] for f in features])
ValueError: expected sequence of length 2048 at dim 1 (got 0)
0%| | 0/1872 [00:00<?, ?it/s]
2023-11-01 10:02:48,388 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.
2023-11-01 10:02:48,388 sagemaker-training-toolkit INFO Done waiting for a return code. Received 1 from exiting process.
2023-11-01 10:02:48,388 sagemaker-training-toolkit ERROR Reporting training FAILURE
2023-11-01 10:02:48,388 sagemaker-training-toolkit ERROR ExecuteUserScriptError:
ExitCode 1
ErrorMessage "ValueError: expected sequence of length 2048 at dim 1 (got 0)
0%| | 0/1872 [00:00<?, ?it/s]"
Command "/opt/conda/bin/python3.10 run_clm.py --dataset_path /opt/ml/input/data/training --epochs 24 --hf_token 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 --lr 0.0002 --merge_weights True --model_id meta-llama/Llama-2-13b-hf --per_device_train_batch_size 2"
2023-11-01 10:02:48,388 sagemaker-training-toolkit ERROR Encountered exit_code 1
I was trying to fine-tune the model with my data. I followed the same steps and stored the data in the S3 bucket as well, but when I started the training I got this error.
I checked my input sequence length and it's 2048, so I can't figure out what's wrong:
print("Shape of processed data:", llm_dataset.shape)
# Assuming llm_dataset["input_ids"] contains the tokenized sequences
print("Length of sequences:", len(llm_dataset["input_ids"][0]))
@philschmid, any help would be appreciated!