(base) root@84d353835da2:/workspace/mucoco# bash decode_example.sh data output plain debug plain
Some weights of the model checkpoint at /workspace/mucoco/primary_model were not used when initializing GPT2LMHeadModel: ['transformer.extra_embedding_project.bias', 'transformer.extra_embedding_project.weight']
- This IS expected if you are initializing GPT2LMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPT2LMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
batch_size is 1
skip this example? Fears for T N pension after talks . Unions representing workers at Turner Newall say they are ' disappointed ' after talks with stricken parent firm Federal Mogul . [yes(y)/maybe(m)/no(n)]n
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Fears for T N pension after talks . Unions representing workers at Turner Newall say they are ' disappointed ' after talks with stricken parent firm Federal Mogul . unions representing workers at Turner Newall have expressed disappointment after talks with the firm's parent company, Federal Mogul. tensor([[ 3260, 6130, 351, 15406, 968, 439, 11, 791, 507, 10200,
3259, 379, 15406, 968, 439, 6241, 18641, 287, 262, 4081,
1222, 499, 418, 26, 82, 2560, 1664, 11, 5618, 30926,
377, 13]])
predicting a sentence length: 32
Traceback (most recent call last):
  File "/workspace/mucoco/decode.py", line 4, in <module>
    cli_main()
  File "/workspace/mucoco/mucoco/decode.py", line 821, in cli_main
    main(args)
  File "/workspace/mucoco/mucoco/decode.py", line 565, in main
    optimizer.backward(total_batchloss, retain_graph=True, scaler=scaler)
  File "/workspace/mucoco/mucoco/utils/optim.py", line 360, in backward
    loss.backward(retain_graph=retain_graph)
  File "/opt/conda/lib/python3.9/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 7.67 GiB (GPU 0; 23.70 GiB total capacity; 15.35 GiB already allocated; 3.61 GiB free; 19.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
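The error message itself points at one mitigation: since reserved memory (19.13 GiB) exceeds allocated memory (15.35 GiB), fragmentation may be part of the problem, and `max_split_size_mb` can be set via `PYTORCH_CUDA_ALLOC_CONF` before re-running. A minimal sketch of what that could look like (the value 128 MB is an illustrative assumption, not taken from the log, and should be tuned for the GPU in question):

```shell
# Cap the size of cached allocator blocks to reduce fragmentation,
# as suggested by the CUDA OOM message above. 128 MB is an example value.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Confirm the setting is visible to the child process before re-running.
echo "PYTORCH_CUDA_ALLOC_CONF=$PYTORCH_CUDA_ALLOC_CONF"

# Re-run the failing command with the same arguments as before:
# bash decode_example.sh data output plain debug plain
```

This only changes allocator behavior; if the 7.67 GiB allocation genuinely does not fit, reducing the sequence length or avoiding `retain_graph=True` where the graph is not reused would be needed instead.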