Comments (4)
The following is the detailed error message:
Traceback (most recent call last):
  File "pretrain_gpt.py", line 276, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "/root/large_model/Megatron-DeepSpeed/megatron/training.py", line 163, in pretrain
    iteration = train(forward_step_func,
  File "/root/large_model/Megatron-DeepSpeed/megatron/training.py", line 877, in train
    train_step(forward_step_func,
  File "/root/large_model/Megatron-DeepSpeed/megatron/training.py", line 499, in train_step
    losses_reduced = forward_backward_func(
  File "/root/large_model/Megatron-DeepSpeed/megatron/schedules.py", line 154, in forward_backward_no_pipelining
    backward_step(optimizer, input_tensor, output_tensor,
  File "/root/large_model/Megatron-DeepSpeed/megatron/schedules.py", line 104, in backward_step
    torch.autograd.backward(output_tensor, grad_tensors=output_tensor_grad)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args) # type: ignore
  File "/root/large_model/Megatron-DeepSpeed/megatron/mpu/random.py", line 311, in backward
    torch.autograd.backward(outputs, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: element 1 of tensors does not require grad and does not have a grad_fn
from megatron-deepspeed.
I ran into the same problem.
Help!
git checkout v2.4
git checkout v2.4
Thanks, I have successfully trained GPT with the Megatron-LM repo.
I will try this approach later.