Hi, thanks for sharing your excellent work.
I am trying to pre-train the model on my own dataset, and the error shown below occurs.
Start pre-train phase.
Rest for 0.000 hour:
logs/pre_train/soybean-resnet12
Epoch 1, total loss=5.9711 acc=0.0000: 100%|██| 600/600 [00:17<00:00, 34.51it/s]
0%| | 0/3000 [00:06<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 104, in <module>
trainer.pre_train()
File "/home/ubuntu/junwang/paper/bmvc/e3bm/trainer/meta_trainer.py", line 376, in pre_train
(data_shot, data_query))
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/junwang/paper/bmvc/e3bm/model/meta_model.py", line 158, in forward
return self.pretrain_forward(inputs)
File "/home/ubuntu/junwang/paper/bmvc/e3bm/model/meta_model.py", line 166, in pretrain_forward
return self.fc(self.encoder(input))
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/junwang/paper/bmvc/e3bm/model/resnet12.py", line 113, in forward
x = self.layer1(x)
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/junwang/paper/bmvc/e3bm/model/resnet12.py", line 49, in forward
out = self.conv1(x)
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
return self.conv2d_forward(input, self.weight)
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple
In short, the model expects a tensor as input, but the tuple (data_shot, data_query) is passed instead.
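The final TypeError can be reproduced in isolation. The following is a minimal sketch, not the e3bm code: the layer sizes and tensor shapes are made up, and `call_raises_type_error` is a hypothetical helper. It only shows that a convolution layer rejects a tuple of tensors while accepting a single tensor.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the first convolution of the encoder.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# Made-up shapes: 5 support images and 15 query images of size 16x16.
data_shot = torch.randn(5, 3, 16, 16)
data_query = torch.randn(15, 3, 16, 16)


def call_raises_type_error(module, inputs):
    """Return True if calling the module on `inputs` raises TypeError."""
    try:
        module(inputs)
    except TypeError:
        return True
    return False


# Passing the tuple reproduces the kind of failure seen in the traceback.
tuple_fails = call_raises_type_error(conv, (data_shot, data_query))

# Passing a single tensor works as expected.
out = conv(data_shot)
print(tuple_fails, tuple(out.shape))
```

This suggests the tuple is reaching a code path that expects one image batch, which is consistent with `forward` dispatching to `pretrain_forward` in the multi-GPU run.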
In addition, I found another problem:
When selecting the parameters to be optimized in the optimizer, 'self.model' seems to be incorrect, as it raises an error. After I change 'self.model' to 'self.model.module', it works.
These errors may be caused by the multi-GPU scenario.
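To illustrate the 'self.model' versus 'self.model.module' point, here is a minimal sketch under stated assumptions: `TinyModel` and `unwrap` are hypothetical names, and the layer sizes and learning rates are made up. It shows that `nn.DataParallel` keeps the original network under `.module`, so submodules such as `encoder` are not reachable directly on the wrapper.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical model with an encoder and a classifier head."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 8)
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(self.encoder(x))


def unwrap(model):
    """Return the underlying network whether or not it is wrapped."""
    return model.module if isinstance(model, nn.DataParallel) else model


wrapped = nn.DataParallel(TinyModel())

# The wrapper does not expose the submodules of the wrapped network.
has_direct_encoder = hasattr(wrapped, "encoder")

# Build per-group optimizer settings from the unwrapped network.
base = unwrap(wrapped)
optimizer = torch.optim.SGD(
    [
        {"params": base.encoder.parameters()},
        {"params": base.fc.parameters(), "lr": 0.01},
    ],
    lr=0.1,
)
print(has_direct_encoder, len(optimizer.param_groups))
```

A helper like `unwrap` lets the same optimizer setup run with one GPU or several, since it returns the model unchanged when no wrapper is present.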
However, when I use only one GPU, the error below occurs:
Start pre-train phase.
Rest for 0.000 hour:
logs/pre_train/soybean-resnet12
Epoch 1, total loss=5.4935 acc=0.0000: 100%|██| 600/600 [00:12<00:00, 46.65it/s]
0%| | 0/3000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 104, in <module>
trainer.pre_train()
File "/home/ubuntu/junwang/paper/bmvc/e3bm/trainer/meta_trainer.py", line 376, in pre_train
(data_shot, data_query))
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/junwang/paper/bmvc/e3bm/model/meta_model.py", line 161, in forward
return self.meta_forward(data_shot, data_query)
File "/home/ubuntu/junwang/paper/bmvc/e3bm/model/meta_model.py", line 220, in meta_forward
grad = torch.autograd.grad(loss, fast_weights)
File "/home/ubuntu/anaconda3/envs/bmvc/lib/python3.7/site-packages/torch/autograd/__init__.py", line 157, in grad
inputs, allow_unused)
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
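For reference, this RuntimeError message can be reproduced outside the repository. The sketch below is only an illustration with made-up tensors and a hypothetical helper `try_grad`; it does not claim to identify the cause in `meta_forward`. It shows that `torch.autograd.grad` fails this way when the loss is not connected to any tensor that requires gradients.

```python
import torch

x = torch.randn(4, 3)

# A weight created without requires_grad behaves like a frozen parameter.
frozen_w = torch.randn(3, 2)
trainable_w = torch.randn(3, 2, requires_grad=True)


def try_grad(weight):
    """Return the gradients, or the error message if autograd refuses."""
    loss = (x @ weight).sum()
    try:
        return torch.autograd.grad(loss, [weight])
    except RuntimeError as err:
        return str(err)


frozen_result = try_grad(frozen_w)        # error message string
trainable_result = try_grad(trainable_w)  # tuple holding one gradient tensor
print(frozen_result)
print(trainable_result[0].shape)
```

If something similar happens here, it may be worth checking whether the tensors in `fast_weights` have `requires_grad=True` and whether the loss is computed outside any `torch.no_grad()` block.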
I would be grateful for any help in figuring out the problem.