facebookresearch / clevr-iep
Inferring and Executing Programs for Visual Reasoning
License: Other
tomtop5@tomtop5-B360-M-AORUS-PRO:~/download/clevr-iep-master$ python3 scripts/run_model.py --program_generator models/CLEVR/program_generator_700k.pt --execution_engine models/CLEVR/execution_engine_700k_strong.pt --image img/CLEVR_val_000013.png --question "Does the small sphere have the same color as the cube left of the gray cube?"
Loading program generator from models/CLEVR/program_generator_700k.pt
Loading execution engine from models/CLEVR/execution_engine_700k_strong.pt
Loading CNN for feature extraction
scripts/run_model.py:133: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
img_var = Variable(torch.FloatTensor(img).type(dtype), volatile=True)
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
scripts/run_model.py:146: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
question_var = Variable(question_encoded, volatile=True)
Running the model
Traceback (most recent call last):
File "scripts/run_model.py", line 301, in <module>
main(args)
File "scripts/run_model.py", line 85, in main
run_single_example(args, model)
File "scripts/run_model.py", line 159, in run_single_example
argmax=(args.sample_argmax == 1))
File "/home/tomtop5/download/clevr-iep-master/scripts/iep/models/seq2seq.py", line 162, in reinforce_sample
encoded = self.encoder(x)
File "/home/tomtop5/download/clevr-iep-master/scripts/iep/models/seq2seq.py", line 86, in encoder
out, _ = self.encoder_rnn(embed, (h0, c0))
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/rnn.py", line 192, in forward
output, hidden = func(input, self.all_weights, hx, batch_sizes)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/rnn.py", line 324, in forward
return func(input, *fargs, **fkwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/rnn.py", line 288, in forward
dropout_ts)
RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS
If you have run into the same problem, please help me solve it. Thanks!
My setup: PyTorch 0.4, Python 3.5.2, GPU: 1660, CUDA 10.1, cuDNN 7.5.0.
When I run the baseline model with PyTorch 0.2.0, I get the following error:
train_loader has 699989 samples
val_loader has 10000 samples
Starting epoch 1
Traceback (most recent call last):
File "scripts/train_model.py", line 491, in <module>
main(args)
File "scripts/train_model.py", line 152, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 235, in train_loop
scores = baseline_model(questions_var, feats_var)
File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/sharedfolder/clevr/clevr-iep/iep/models/baselines.py", line 242, in forward
u = sa(v, u)
File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/sharedfolder/clevr/clevr-iep/iep/models/baselines.py", line 45, in forward
v_tilde = (p.expand_as(v) * v).sum(2).sum(3).view(N, D)
File "/root/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 476, in sum
return Sum.apply(self, dim, keepdim)
File "/root/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/reduce.py", line 21, in forward
return input.sum(dim)
RuntimeError: dimension out of range (expected to be in range of [-3, 2], but got 3)
How can I fix this?
Hi there, when I ran the scripts as instructed, I got the RuntimeError. Below are the details:
python scripts/run_model.py \
--program_generator models/CLEVR/program_generator_18k.pt \
--execution_engine models/CLEVR/execution_engine_18k.pt \
--image img/CLEVR_val_000013.png \
--question "Does the small sphere have the same color as the cube left of the gray cube?"
Loading program generator from models/CLEVR/program_generator_18k.pt
Loading execution engine from models/CLEVR/execution_engine_18k.pt
Loading CNN for feature extraction
Running the model
y is: None
y is: Variable containing:
1
[torch.cuda.LongTensor of size 1x1 (GPU 0)]
y is: Variable containing:
5
[torch.cuda.LongTensor of size 1 (GPU 0)]
Traceback (most recent call last):
File "scripts/run_model.py", line 301, in <module>
main(args)
File "scripts/run_model.py", line 85, in main
run_single_example(args, model)
File "scripts/run_model.py", line 159, in run_single_example
argmax=(args.sample_argmax == 1))
File "/u/home/downloads/files/clevr-iep/iep/models/seq2seq.py", line 171, in reinforce_sample
logprobs, h, c = self.decoder(encoded, cur_input, h0=h, c0=c)
File "/u/home/downloads/files/clevr-iep/iep/models/seq2seq.py", line 93, in decoder
V_in, V_out, D, H, L, N, T_in, T_out = self.get_dims(y=y)
File "/u/home/downloads/files/clevr-iep/iep/models/seq2seq.py", line 58, in get_dims
T_out = y.size(1) if y is not None else None
RuntimeError: invalid argument 2: out of range at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/THCTensor.c:23
I am using Python 3.6 with the latest PyTorch installed.
I printed y, and it can be None, a 1-by-1 torch.cuda.LongTensor, or a torch.cuda.LongTensor of size 1. When y is the 1-D tensor of size 1, it has no second dimension, so T_out = y.size(1) raises the RuntimeError.
Where can I find the test set answers of the CLEVR dataset? Thanks.
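After downloading the data, I ran out of memory at the second step (extracting image features). After I reduced the number of images to 20, the error below appeared. Can anyone tell me how to solve it? Thanks.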
python scripts/extract_features.py --input_image_dir data/CLEVR_v1.0/images/train --output_h5_file data/train_features.h5
('data/CLEVR_v1.0/images/train/CLEVR_train_000000.png', 0)
('data/CLEVR_v1.0/images/train/CLEVR_train_000019.png', 19)
Traceback (most recent call last):
File "scripts/extract_features.py", line 114, in <module>
main(args)
File "scripts/extract_features.py", line 108, in main
feat_dset[i0:i1] = feats
TypeError: 'NoneType' object does not support item assignment
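One possible reading of this traceback, which has not been checked against the script here: feat_dset may be created only when a full batch has been processed. If the batch size is larger than 20, a 20-image run never fills a batch, so only the trailing partial batch is written while feat_dset is still None. The sketch below shows the general pattern of creating the output lazily in both code paths. The function name is hypothetical and a plain list stands in for the HDF5 dataset.

```python
def write_in_batches(items, batch_size):
    """Copy items into an output buffer batch by batch.

    The buffer is created lazily on the first write, whether that write comes
    from a full batch or from the final partial batch.
    """
    output = None
    start = 0
    batch = []

    def flush(output, start, batch):
        if output is None:
            # Lazy creation; a list stands in for the HDF5 dataset here.
            output = [None] * len(items)
        output[start:start + len(batch)] = batch
        return output, start + len(batch)

    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            output, start = flush(output, start, batch)
            batch = []

    if batch:  # trailing partial batch, e.g. 20 images with a larger batch size
        output, start = flush(output, start, batch)
    return output


print(len(write_in_batches(list(range(20)), 128)))
```

If this reading is right, passing a batch size no larger than the number of images would be another way to avoid the error.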
When I train the baseline model using the command:
python scripts/train_model.py \
--model_type CNN+LSTM+SA \
--classifier_fc_dims 1024 \
--num_iterations 400000 \
--checkpoint_path data/cnn_lstm_sa_mlp.pt
I found that GPU usage is zero and got the following output:
2017-06-11-22:43:00] 1, 8.812, 30.340, 3.445
2017-06-11-22:43:01] 2, 0.057, 0.237, 3.334
2017-06-11-22:43:17] 3, 0.051, 16.018, 3.215
2017-06-11-22:43:35] 4, 0.047, 17.887, 3.052
2017-06-11-22:43:51] 5, 0.045, 16.408, 2.972
2017-06-11-22:44:08] 6, 0.048, 16.996, 2.878
2017-06-11-22:44:26] 7, 0.044, 17.460, 2.587
2017-06-11-22:44:41] 8, 0.041, 15.066, 2.851
2017-06-11-22:44:46] 9, 0.049, 5.564, 2.603
2017-06-11-22:45:08] 10, 0.048, 22.074, 2.876
2017-06-11-22:45:23] 11, 0.043, 14.640, 2.516
2017-06-11-22:45:37] 12, 0.049, 13.797, 2.853
2017-06-11-22:45:52] 13, 0.053, 15.052, 2.847
2017-06-11-22:46:07] 14, 0.043, 14.625, 2.867
2017-06-11-22:46:21] 15, 0.049, 14.314, 2.676
2017-06-11-22:46:36] 16, 0.047, 14.443, 2.634
I changed the output a little. The second column is the actual batch training time and the third column is the data loading time. Data loading clearly dominates.
My guess is that the image h5 file is not loaded into memory, and random access into the h5 file is very slow.
With the current code, it seems impossible to finish training within a reasonable time.
Assuming each iteration takes 10 seconds, training would take 400000 * 10 / 3600 / 24 ≈ 46 days.
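The estimate can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 10 seconds per iteration is the poster's assumption, and the function name is made up for illustration.

```python
def training_days(num_iterations, seconds_per_iteration):
    """Return the wall-clock training time in days."""
    seconds_per_day = 24 * 3600
    return num_iterations * seconds_per_iteration / seconds_per_day


# 400,000 iterations at roughly 10 s each, as assumed in the post.
days = training_days(400_000, 10)
print(f"{days:.1f} days")  # prints "46.3 days"
```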
While running the baseline model (CNN+LSTM+SA) with Python 3.6 and PyTorch 0.2.0, I got the following EOFError.
It seems to happen after accuracy has been checked several times.
I found this by setting checkpoint_every to 1.
Has anybody run into this error?
Checking training accuracy ...
Traceback (most recent call last):
File "scripts/train_model.py", line 498, in <module>
main(args)
File "scripts/train_model.py", line 152, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 276, in train_loop
baseline_model, train_loader)
File "scripts/train_model.py", line 454, in check_accuracy
for batch in loader:
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 195, in __next__
idx, batch = self.data_queue.get()
File "/root/anaconda3/lib/python3.6/multiprocessing/queues.py", line 345, in get
return _ForkingPickler.loads(res)
File "/root/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
clevr-iep/scripts/run_model.py
Line 240 in 35a4eea
python3.5 scripts/train_model.py --model_type PG --num_train_samples 18000 --num_iterations 20000 --checkpoint_every 1000 --checkpoint_path data/program_generator.pt
> /opt/data/penggao/pytorch_project/clevr-iep/clevr-iep/scripts/train_model.py(27)<module>()
-> import iep.utils as utils
(Pdb) c
Reading features from data/train_features.h5
Reading questions from data/train_questions.h5
Reading question data into memory
Reading features from data/val_features.h5
Reading questions from data/val_questions.h5
Reading question data into memory
Here is the program generator:
Seq2Seq (
(encoder_embed): Embedding(93, 300)
(encoder_rnn): LSTM(300, 256, num_layers=2, batch_first=True)
(decoder_embed): Embedding(44, 300)
(decoder_rnn): LSTM(556, 256, num_layers=2, batch_first=True)
(decoder_linear): Linear (256 -> 44)
)
train_loader has 18000 samples
val_loader has 10000 samples
Starting epoch 1
Traceback (most recent call last):
File "scripts/train_model.py", line 27, in <module>
import iep.utils as utils
File "scripts/train_model.py", line 154, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 211, in train_loop
for batch in train_loader:
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 206, in __next__
idx, batch = self.data_queue.get()
File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.5/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
I used Python 3.7 and PyTorch 1.0.
When I try to train on CLEVR, a problem like the one below occurs:
python scripts/train_model.py \
--model_type PG \
--num_train_samples 18000 \
--num_iterations 20000 \
--checkpoint_every 1000 \
--checkpoint_path data/program_generator.pt
Reading features from data/train_features.h5
Reading questions from data/train_questions.h5
Reading question data into memory
Reading features from data/val_features.h5
Reading questions from data/val_questions.h5
Reading question data into memory
Here is the program generator:
Seq2Seq(
(encoder_embed): Embedding(93, 300)
(encoder_rnn): LSTM(300, 256, num_layers=2, batch_first=True)
(decoder_embed): Embedding(44, 300)
(decoder_rnn): LSTM(556, 256, num_layers=2, batch_first=True)
(decoder_linear): Linear(in_features=256, out_features=44, bias=True)
)
train_loader has 18000 samples
val_loader has 10000 samples
Starting epoch 1
Traceback (most recent call last):
File "scripts/train_model.py", line 492, in <module>
main(args)
File "scripts/train_model.py", line 153, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 210, in train_loop
for batch in train_loader:
File "/home/nicholas/anaconda3/envs/py/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
return self._process_next_batch(batch)
File "/home/nicholas/anaconda3/envs/py/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
KeyError: 'Traceback (most recent call last):\n File "/home/nicholas/anaconda3/envs/py/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop\n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/home/nicholas/anaconda3/envs/py/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>\n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/home/nicholas/code/clevr-iep-master/iep/data.py", line 81, in __getitem__\n fn_str = self.vocab['program_idx_to_token'][fn_idx]\nKeyError: tensor(1)\n'
Is it that my Python and PyTorch versions are not compatible with the code?
If anyone can solve this, I would really appreciate it.
Thanks.
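The message KeyError: tensor(1) suggests that the vocabulary dictionary is keyed by plain integers while the lookup is done with a tensor object, which hashes differently from the integer it holds. A minimal sketch of that failure mode and one possible workaround follows. FakeScalarTensor is a hypothetical stand-in (torch is not imported here), lookup_token is a made-up helper, and the vocabulary contents are illustrative only.

```python
class FakeScalarTensor:
    """Hypothetical stand-in for a 0-dim tensor; it is NOT the real torch class."""

    def __init__(self, value):
        self._value = value

    def item(self):
        # Mirrors the idea of converting a one-element tensor to a Python number.
        return self._value


def lookup_token(idx_to_token, fn_idx):
    """Look up a token, converting tensor-like indices to plain ints first."""
    if hasattr(fn_idx, "item"):
        fn_idx = fn_idx.item()
    return idx_to_token[int(fn_idx)]


program_idx_to_token = {0: "<NULL>", 1: "<START>", 2: "<END>"}  # illustrative
fn_idx = FakeScalarTensor(1)

try:
    program_idx_to_token[fn_idx]  # the object itself is not an int key
except KeyError:
    print("direct lookup fails")

print(lookup_token(program_idx_to_token, fn_idx))
```

Under this assumption, converting fn_idx to a Python int before indexing self.vocab['program_idx_to_token'] in iep/data.py would be the analogous change.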
I see that answers in CLEVR are provided only for the train and validation sets. How can I get ground-truth answers to compare with our predicted answers? I couldn't find a test.py mentioned in your README; there is only the predicted answer. Or do I need to upload my answer JSON file to a CLEVR server or something similar?
Looking forward to your reply; it will be very helpful for me.
Thank you very much.
I used Python 3.5 and PyTorch 0.4.
At first, I got RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1) at seq2seq.py, line 57, in get_dims:
T_out = y.size(1) if y is not None else None
After I changed line 173 of the seq2seq model from
if argmax:
    _, cur_output = probs.max(1)
else:
to
if argmax:
    _, cur_output = probs.max(1)
    cur_output = cur_output.unsqueeze(0)
else:
as suggested by manoja328, I got IndexError: too many indices for tensor of dimension 1 at seq2seq.py, line 182, in reinforce_sample:
y[:, t][not_done] = cur_output_data[not_done]
Thank you!
Checking training accuracy ...
Traceback (most recent call last):
File "scripts/train_model.py", line 490, in <module>
main(args)
File "scripts/train_model.py", line 151, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 269, in train_loop
baseline_model, train_loader)
File "scripts/train_model.py", line 460, in check_accuracy
program_pred = program_generator.sample(Variable(questions[i:i+1].cuda(), volatile=True))
File "/home/jiaruizou/research/clevr-iep/iep/models/seq2seq.py", line 154, in sample
y.append(next_y[0, 0, 0])
IndexError: trying to index 3 dimensions of a 2 dimensional tensor
I encounter this error when I train the PG.
I downloaded CLEVR_v1.0 from https://dl.fbaipublicfiles.com/clevr/CLEVR_v1.0.zip, but there are only 67882 images in "/images/train", while test and val both have 15000. How can I solve this?
Hi, first of all, great work on this paper. Looking forward to seeing it at ICCV.
I have an issue when running baseline models such as CNN+LSTM and LSTM. I get the following error. Have you seen this problem before? It always seems to stop at that iteration. I also saw the error when running all the other baseline methods. Thanks for your help.
6721 0.31234210729599
6722 0.2944237291812897
6723 0.23815006017684937
Traceback (most recent call last):
File "scripts/train_model.py", line 514, in <module>
main(args)
File "scripts/train_model.py", line 160, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 246, in train_loop
baseline_optimizer.step()
File "/home/oliver/.local/lib/python3.5/site-packages/torch/optim/adam.py", line 70, in step
bias_correction1 = 1 - beta1 ** state['step']
OverflowError: (34, 'Numerical result out of range')
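A possible explanation, offered as an assumption rather than a confirmed diagnosis: with Adam's usual default of beta1 = 0.9, the value beta1 ** step shrinks until it falls below the smallest normal double, and on some platforms the C pow() call reports that as a range error, which Python surfaces as OverflowError. The sketch below estimates the first step at which this happens using logarithms only, so it never evaluates the problematic power itself. The function name is hypothetical.

```python
import math
import sys


def first_subnormal_step(beta):
    """Smallest integer t for which beta ** t is below the smallest normal double."""
    # Solve beta ** t < float_min for t using logarithms, so that the
    # possibly failing power itself is never evaluated here.
    return math.ceil(math.log(sys.float_info.min) / math.log(beta))


step = first_subnormal_step(0.9)  # 0.9 is assumed to be the beta1 in use
print(step)  # prints 6724
```

The result lines up with the log above, where iteration 6723 is the last one printed before the failure. If this is the cause, a newer PyTorch release may behave differently, but that has not been verified here.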
I would like to save the parameters of the pretrained model. I have downloaded the pretrained model, but while trying to save the predictions to output_h5, I get the following error. I am quite new to this, so please help.
This is what I typed in my terminal:
python scripts/run_model.py --program_generator models/CLEVR/program_generator_18k.pt --execution_engine models/CLEVR/execution_engine_18k.pt --output_h5 params.h5
$pip3 list --user
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
h5py (2.7.0)
olefile (0.44)
Pillow (4.1.1)
PyYAML (3.12)
torch (0.1.12.post2)
torchvision (0.1.8)
virtualenv (15.1.0)
however:
$python3 scripts/run_model.py --program_generator models/CLEVR/program_generator_18k.pt --execution_engine models/CLEVR/execution_engine_18k.pt --image img/CLEVR_val_000013.png --question "Does the small sphere have the same color as the cube left of the gray cube?"
Traceback (most recent call last):
File "scripts/run_model.py", line 14, in <module>
import torch
File "../.local/lib/python3.6/site-packages/torch/__init__.py", line 53, in <module>
from torch._C import *
ImportError: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory
In the file programs.py inside the function build_subtree, the script tries to create a new key as follows:
def list_to_tree(program_list):
    def build_subtree(cur):
        return {
            'function': cur['function'],
            'value_inputs': [x for x in cur['value_inputs']],
            'inputs': [build_subtree(program_list[i]) for i in cur['inputs']],
        }
    return build_subtree(program_list[-1])
I have tried generating new images with new questions, but I never see the key 'function' in cur. Has this happened to anyone else? Were you able to resolve it?
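For reference, here is a small sketch of the input shape that list_to_tree appears to expect, judging only from the code quoted above: every entry of program_list needs the keys 'function', 'value_inputs', and 'inputs', where 'inputs' holds indices of earlier entries. The three-step program is hand-written for illustration and is not taken from any dataset file.

```python
def list_to_tree(program_list):
    def build_subtree(cur):
        return {
            'function': cur['function'],
            'value_inputs': [x for x in cur['value_inputs']],
            'inputs': [build_subtree(program_list[i]) for i in cur['inputs']],
        }
    return build_subtree(program_list[-1])


# Hypothetical three-step program: scene -> filter_color[red] -> count.
program = [
    {'function': 'scene', 'value_inputs': [], 'inputs': []},
    {'function': 'filter_color', 'value_inputs': ['red'], 'inputs': [0]},
    {'function': 'count', 'value_inputs': [], 'inputs': [1]},
]

tree = list_to_tree(program)
print(tree['function'])
print(tree['inputs'][0]['value_inputs'])
```

Note that build_subtree reads cur['function'] rather than creating it, so entries that store the function name under a different key would raise a KeyError and would need to be converted first.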
Hi there, when I ran the scripts as instructed, I got the RuntimeError. Below are the details:
python3 scripts/run_model.py --program_generator models/CLEVR/program_generator_18k.pt --execution_engine models/CLEVR/execution_engine_18k.pt --image img/CLEVR_val_000013.png --question "Does the small sphere have the same color as the cube left of the gray cube?"
Loading program generator from models/CLEVR/program_generator_18k.pt
Loading execution engine from models/CLEVR/execution_engine_18k.pt
Loading CNN for feature extraction
Running the model
/home/pyp/clevr-iep-master/iep/models/seq2seq.py:172: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
probs = F.softmax(logprobs.view(N, -1)) # Now N x V
Traceback (most recent call last):
File "scripts/run_model.py", line 303, in <module>
main(args)
File "scripts/run_model.py", line 85, in main
run_single_example(args, model)
File "scripts/run_model.py", line 161, in run_single_example
argmax=(args.sample_argmax == 1))
File "/home/pyp/clevr-iep-master/iep/models/seq2seq.py", line 183, in reinforce_sample
y[:, t][not_done] = cur_output_data[not_done]
RuntimeError: expand(torch.LongTensor{[1, 1]}, size=[1]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)
============================================================================
I used Python 3.5 and PyTorch 0.4.
At first, I got RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1) at seq2seq.py, line 57, in get_dims:
T_out = y.size(1) if y is not None else None
After I changed line 173 of the seq2seq model from
if argmax:
    _, cur_output = probs.max(1)
else:
to
if argmax:
    _, cur_output = probs.max(1)
    cur_output = cur_output.unsqueeze(0)
else:
as suggested by manoja328, I got the RuntimeError in the title: RuntimeError: expand(torch.LongTensor{[1, 1]}, size=[1]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2) at seq2seq.py, line 183, in reinforce_sample:
y[:, t][not_done] = cur_output_data[not_done]
Thank you!
Can someone please explain how the loss is calculated? Is it the cross-entropy loss between the concatenated natural-language question and the concatenated functional program as inputs, or is something else taking place?
def compute_loss(self, output_logprobs, y):
    """
    Compute loss. We assume that the first element of the output sequence y is
    a start token, and that each element of y is left-aligned and right-padded
    with self.NULL out to T_out. We want the output_logprobs to predict the
    sequence y, shifted by one timestep so that y[0] is fed to the network and
    then y[1] is predicted. We also don't want to compute loss for padded
    timesteps.
    Inputs:
    - output_logprobs: Variable of shape (N, T_out, V_out)
    - y: LongTensor Variable of shape (N, T_out)
    """
    print("hi I am inside")  # debug print added by the poster
    self.multinomial_outputs = None
    V_in, V_out, D, H, L, N, T_in, T_out = self.get_dims(y=y)
    mask = y.data != self.NULL
    y_mask = Variable(torch.Tensor(N, T_out).fill_(0).type_as(mask))
    y_mask[:, 1:] = mask[:, 1:]
    y_masked = y[y_mask]
    out_mask = Variable(torch.Tensor(N, T_out).fill_(0).type_as(mask))
    out_mask[:, :-1] = mask[:, 1:]
    out_mask = out_mask.view(N, T_out, 1).expand(N, T_out, V_out)
    out_masked = output_logprobs[out_mask].view(-1, V_out)
    loss = F.cross_entropy(out_masked, y_masked)
    return loss
Thanks!
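Going by the docstring and the two masks in compute_loss, the loss appears to be a per-timestep cross entropy over program tokens only: the scores produced at timestep t are compared with the ground-truth program token at t + 1, and padded positions are skipped. The question does not appear in this function at all. The pure-Python sketch below shows which (timestep, target) pairs would contribute for one sequence. The token ids and the function name are hypothetical.

```python
def aligned_targets(y, null_token):
    """Return (timestep, target) pairs that contribute to the loss.

    The prediction made at timestep t is compared with y[t + 1], and padded
    positions (equal to null_token) are skipped.
    """
    pairs = []
    for t in range(len(y) - 1):
        target = y[t + 1]
        if target != null_token:
            pairs.append((t, target))
    return pairs


NULL, START, END = 0, 1, 2  # hypothetical token ids
y = [START, 7, 9, END, NULL, NULL]  # one padded program sequence
print(aligned_targets(y, NULL))  # prints [(0, 7), (1, 9), (2, 2)]
```

In the quoted code, out_mask selects the rows of output_logprobs at those timesteps and y_mask selects the matching targets, and F.cross_entropy then averages over all selected pairs in the batch.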
According to the dataset provided at https://cs.stanford.edu/people/jcjohns/clevr/, the test-set answers do not seem to be given. Is there another way (such as an evaluation server) to test model results on the CLEVR v1.0 test set, other than sending results to jcjohnson? Thank you very much.
[jalal@goku clevr-iep]$ python scripts/run_model.py --program_generator models/CLEVR/program_generator_18k.pt --execution_engine models/CLEVR/execution_engine_18k.pt --image img/CLEVR_val_000013.png --question "Does the small sphere have the same color as the cube left of the gray cube?"
Loading program generator from models/CLEVR/program_generator_18k.pt
Loading execution engine from models/CLEVR/execution_engine_18k.pt
Loading CNN for feature extraction
Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /home/grad3/jalal/.torch/models/resnet101-5d3b4d8f.pth
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 178728960/178728960 [00:06<00:00, 28902620.29it/s]
scripts/run_model.py:133: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
img_var = Variable(torch.FloatTensor(img).type(dtype), volatile=True)
scripts/run_model.py:146: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
question_var = Variable(question_encoded, volatile=True)
Running the model
clevr-iep/iep/models/seq2seq.py:172: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
probs = F.softmax(logprobs.view(N, -1)) # Now N x V
Traceback (most recent call last):
File "scripts/run_model.py", line 301, in <module>
main(args)
File "scripts/run_model.py", line 85, in main
run_single_example(args, model)
File "scripts/run_model.py", line 159, in run_single_example
argmax=(args.sample_argmax == 1))
File "/clevr-iep/iep/models/seq2seq.py", line 170, in reinforce_sample
logprobs, h, c = self.decoder(encoded, cur_input, h0=h, c0=c)
File "/clevr-iep/iep/models/seq2seq.py", line 92, in decoder
V_in, V_out, D, H, L, N, T_in, T_out = self.get_dims(y=y)
File "clevr-iep/iep/models/seq2seq.py", line 57, in get_dims
T_out = y.size(1) if y is not None else None
RuntimeError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
Can you please help me fix this error?
I don't want to train a whole model and predict. I only want to understand it. Is it possible to debug it, only using CPU without GPU?
Thank you!
In run_model.py, line 161, the type of 'predicted_program' is torch.Tensor.
However, in iep/module/module_net.py, the type of this parameter is restricted to list, tuple, and Variable (dim=2).
I tried to convert predicted_program from torch.Tensor to a list or Variable, but failed.
In your paper, you said that you will open source the code for dataset generation. Where can I find the code?
Thank you so much for your amazing code. I learned a lot from your stanford lectures and your open-sourced codes.
With newer versions of PyTorch, the current code may run into errors during inference and training. Here I will show the bugs and how to change the code.
Hello,
Regarding the training procedure in step 2:
python scripts/train_model.py --model_type EE --program_generator_start_from data/program_generator.py --num_iterations 100000 --checkpoint_path data/execution_engine.pt
I do not know if I have missed something, but program_generator_start_from is only used inside get_program_generator, for the 'PG+EE' and 'PG' model types.
Thank you.
The link for downloading the CLEVR dataset seems to be down. For https://s3-us-west-1.amazonaws.com/clevr/CLEVR_v1.0.zip I get the error:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
Hey,
In section 3.1 of the paper, it is mentioned that the syntax is set by pre-specifying a list of functions F. I assume this is a fixed set of functions, that is, a vocabulary.
Where is this vocabulary defined in the code? It is also not clear to me how you arrived at this list of functions for the given problem of QA on CLEVR.
Could any of the authors please answer my question? Please excuse me if my question is ambiguous.
--
thanks
Hi,
I was using the code with the following configuration
python scripts/train_model.py --model_type PG --num_train_samples 18000 --num_iterations 20000 --checkpoint_every 1000 --checkpoint_path data/program_generator.pt --batch_size 2
on a Tesla K40M. I followed the previous steps on data creation and all the data is available on the local machine.
I observed that the code is stuck computing loss.backward on the following line.
GPU memory usage stays at 296MB, but the volatile GPU utilization is 0. Does anyone have any idea why this might be happening?
Traceback (most recent call last):
File "scripts/preprocess_questions.py", line 193, in <module>
main(args)
File "scripts/preprocess_questions.py", line 88, in main
'answer_token_to_idx': answer_token_to_idx,
UnboundLocalError: local variable 'answer_token_to_idx' referenced before assignment
I used your code unchanged and it looks correct to me, so I don't know how to fix this.
Thank you!
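An UnboundLocalError of this kind usually means the name is assigned only on some code paths, for example inside an if branch that was skipped. One guess, stated as an assumption and not checked against the script, is that answer_token_to_idx is built only when the questions contain an 'answer' field, which a question file without answers would not have. The sketch below shows the general pattern of binding a default first. Both function names are simplified, hypothetical versions.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer index, in first-seen order."""
    token_to_idx = {}
    for token in tokens:
        if token not in token_to_idx:
            token_to_idx[token] = len(token_to_idx)
    return token_to_idx


def build_answer_vocab(questions):
    # Assign a default first, so the name is bound on every code path.
    answer_token_to_idx = {}
    if questions and 'answer' in questions[0]:
        answer_token_to_idx = build_vocab(q['answer'] for q in questions)
    return {'answer_token_to_idx': answer_token_to_idx}


print(build_answer_vocab([{'question': 'How many cubes?'}]))
print(build_answer_vocab([{'answer': 'yes'}, {'answer': 'no'}]))
```

If the guess is right, another option might be to reuse a vocabulary file built from a split that does include answers, instead of building a new one from this input.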
Following others' advice, I modified this part:
But there is a new error in training step 3 (PG+EE):
Traceback (most recent call last):
File "scripts/train_model.py", line 490, in <module>
main(args)
File "scripts/train_model.py", line 151, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 239, in train_loop
programs_pred = program_generator.reinforce_sample(questions_var)
File "/home/lxp/CodeBase/VisualReasoning/clevr-iep/iep/models/seq2seq.py", line 182, in reinforce_sample
y[:, t][not_done] = cur_output_data[not_done]
RuntimeError: inconsistent tensor size, expected src [64 x 1] and mask [64 x 64] to have the same number of elements, but got 64 and 4096 elements respectively at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/TH/generic/THTensorMath.c:197
I don't know why. Is there a way to solve it?
I have two GTX 1080 Ti GPUs, but I get a GPU out-of-memory error when I try to train. What GPU did you use, and is it really not possible to train on a GTX 1080 Ti? What are the minimum GPU specs needed to run this code?
[jalal@goku clevr-iep]$ python scripts/extract_features.py --input_image_dir data/CLEVR_v1.0/images/train --output_h5_file data/train_features.h5
('data/CLEVR_v1.0/images/train/CLEVR_train_000000.png', 0)
('data/CLEVR_v1.0/images/train/CLEVR_train_069999.png', 69999)
scripts/extract_features.py:57: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
image_batch = torch.autograd.Variable(image_batch, volatile=True)
Traceback (most recent call last):
File "scripts/extract_features.py", line 114, in <module>
main(args)
File "scripts/extract_features.py", line 94, in main
feats = run_batch(cur_batch, model)
File "scripts/extract_features.py", line 59, in run_batch
feats = model(image_batch)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/models/resnet.py", line 84, in forward
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
[jalal@goku clevr-iep]$ nvidia-smi
Mon Sep 24 22:52:14 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:05:00.0 On | N/A |
| 0% 25C P0 62W / 250W | 445MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:06:00.0 Off | N/A |
| 0% 26C P8 12W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2177 G /usr/bin/X 309MiB |
| 0 3731 G /usr/bin/gnome-shell 125MiB |
| 0 10560 G /usr/lib64/firefox/firefox 2MiB |
| 0 13620 G /usr/lib64/firefox/firefox 2MiB |
| 0 14157 G /usr/lib64/firefox/firefox 2MiB |
+-----------------------------------------------------------------------------+
[jalal@goku clevr-iep]$
[jalal@goku clevr-iep]$ pip install -r requirements.txt
torch-0.1.11.post5-cp35-cp35m-linux_x86_64.whl is not a supported wheel on this platform.
[jalal@goku clevr-iep]$ uname -a
Linux goku.bu.edu 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Hello,
I am wondering if there is a typo in the command listed in section https://github.com/facebookresearch/clevr-iep/blob/master/TRAINING.md#step-2-train-the-execution-engine,
python scripts/train_model.py \
--model_type EE \
--program_generator_start_from data/program_generator.py \
--num_iterations 100000 \
--checkpoint_path data/execution_engine.pt
Should "--program_generator_start_from data/program_generator.py" be "--program_generator_start_from data/program_generator.pt"?
Please check and confirm. Thanks!
Whenever I try to run the program, I get the following error. Could somebody please fix this?
Loading program generator from scripts/models/CLEVR/program_generator_18k.pt
Loading execution engine from scripts/models/CLEVR/execution_engine_18k.pt
Loading CNN for feature extraction
Running the model
Traceback (most recent call last):
File "run_model.py", line 301, in <module>
main(args)
File "run_model.py", line 85, in main
run_single_example(args, model)
File "run_model.py", line 159, in run_single_example
argmax=(args.sample_argmax == 1))
File "/home/manoj/Desktop/clevr-iep/iep/models/seq2seq.py", line 169, in reinforce_sample
logprobs, h, c = self.decoder(encoded, cur_input, h0=h, c0=c)
File "/home/manoj/Desktop/clevr-iep/iep/models/seq2seq.py", line 91, in decoder
V_in, V_out, D, H, L, N, T_in, T_out = self.get_dims(y=y)
File "/home/manoj/Desktop/clevr-iep/iep/models/seq2seq.py", line 56, in get_dims
T_out = y.size(1) if y is not None else None
RuntimeError: invalid argument 2: out of range at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/THCTensor.c:23