I was trying to run tape-embed on GPU, but it failed with the error below. Everything worked fine when I ran the same command with the --no_cuda flag:
(protein) wbogud@cuda:~/projects/protein$ time tape-embed transformer ../data/test.fasta embeddings.npz models/tape/bert-base/
20/04/22 16:12:11 - INFO - tape.training - device: cuda n_gpu: 4
20/04/22 16:12:11 - INFO - tape.models.modeling_utils - loading configuration file models/tape/bert-base/config.json
20/04/22 16:12:11 - INFO - tape.models.modeling_utils - Model config {
"attention_probs_dropout_prob": 0.1,
"base_model": "transformer",
"finetuning_task": null,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"input_size": 768,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 8192,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"num_labels": -1,
"output_attentions": false,
"output_hidden_states": false,
"output_size": 768,
"pruned_heads": {},
"torchscript": false,
"type_vocab_size": 1,
"vocab_size": 30
}
20/04/22 16:12:11 - INFO - tape.models.modeling_utils - loading weights file models/tape/bert-base/pytorch_model.bin
0%| | 0/1 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/home/wbogud/anaconda3/envs/protein/bin/tape-embed", line 8, in <module>
sys.exit(run_embed())
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/tape/main.py", line 234, in run_embed
training.run_embed(**embed_args)
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/tape/training.py", line 642, in run_embed
outputs = runner.forward(batch, no_loss=True)
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/tape/training.py", line 86, in forward
outputs = self.model(**batch)
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 1 on device 1.
Original Traceback (most recent call last):
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/wbogud/anaconda3/envs/protein/lib/python3.8/site-packages/tape/models/modeling_bert.py", line 443, in forward
dtype=next(self.parameters()).dtype) # fp16 compatibility
StopIteration
Downgrading to PyTorch 1.4.0 solved the issue. The root cause appears to be a behavior change in PyTorch 1.5: nn.DataParallel replicas no longer expose their parameters through parameters(), so the call next(self.parameters()) in tape/models/modeling_bert.py hits an empty generator and raises StopIteration inside every replica (hence "Caught StopIteration in replica 1 on device 1"). With a single GPU, or with --no_cuda, DataParallel's replication path is never taken, which is why those runs succeed.
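For anyone who cannot downgrade, here is a minimal, PyTorch-free sketch of the failure mode and a defensive pattern. The Replica class and its methods are hypothetical stand-ins for illustration only, not TAPE or PyTorch code; the point is that next() on an empty generator raises StopIteration, while next() with a default does not:

```python
class Replica:
    """Stand-in for an nn.DataParallel replica whose parameters()
    generator is empty (the PyTorch 1.5 behavior behind the traceback)."""

    def parameters(self):
        # Empty generator: the replica exposes no parameters.
        return iter(())

    def dtype_unsafe(self):
        # Mirrors the shape of the failing line in modeling_bert.py:
        # next() on an empty iterator raises StopIteration.
        return next(self.parameters()).dtype

    def dtype_safe(self, default="float32"):
        # Defensive variant: next() with a default never raises;
        # fall back to a known dtype when no parameters are found.
        first = next(self.parameters(), None)
        return first.dtype if first is not None else default


replica = Replica()
try:
    replica.dtype_unsafe()
except StopIteration:
    print("StopIteration, as in the traceback above")
print(replica.dtype_safe())  # falls back to "float32"
```

A patch along the dtype_safe lines (or caching the dtype on the module before replication) would sidestep the crash without pinning PyTorch to 1.4.0, though I have only verified the downgrade myself.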