facebookresearch / clevr-iep

Inferring and Executing Programs for Visual Reasoning

License: Other

Python 95.35% Shell 4.65%

clevr-iep's Issues

RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS

tomtop5@tomtop5-B360-M-AORUS-PRO:~/download/clevr-iep-master$ python3 scripts/run_model.py --program_generator models/CLEVR/program_generator_700k.pt --execution_engine models/CLEVR/execution_engine_700k_strong.pt --image img/CLEVR_val_000013.png --question "Does the small sphere have the same color as the cube left of the gray cube?"

Loading program generator from models/CLEVR/program_generator_700k.pt
Loading execution engine from models/CLEVR/execution_engine_700k_strong.pt
Loading CNN for feature extraction
scripts/run_model.py:133: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
img_var = Variable(torch.FloatTensor(img).type(dtype), volatile=True)
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
scripts/run_model.py:146: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
question_var = Variable(question_encoded, volatile=True)
Running the model

Traceback (most recent call last):
File "scripts/run_model.py", line 301, in <module>
main(args)
File "scripts/run_model.py", line 85, in main
run_single_example(args, model)
File "scripts/run_model.py", line 159, in run_single_example
argmax=(args.sample_argmax == 1))
File "/home/tomtop5/download/clevr-iep-master/scripts/iep/models/seq2seq.py", line 162, in reinforce_sample
encoded = self.encoder(x)
File "/home/tomtop5/download/clevr-iep-master/scripts/iep/models/seq2seq.py", line 86, in encoder
out, _ = self.encoder_rnn(embed, (h0, c0))
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/rnn.py", line 192, in forward
output, hidden = func(input, self.all_weights, hx, batch_sizes)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/rnn.py", line 324, in forward
return func(input, *fargs, **fkwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/rnn.py", line 288, in forward
dropout_ts)
RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS

If you have run into the same problem, please help me solve it. Thanks!

I am using PyTorch 0.4, Python 3.5.2, GPU: 1660, CUDA 10.1, cuDNN 7.5.0.

Error while running baseline model

When I run the baseline model with PyTorch 0.2.0, I get the following error:

train_loader has 699989 samples
val_loader has 10000 samples
Starting epoch 1
Traceback (most recent call last):
  File "scripts/train_model.py", line 491, in <module>
    main(args)
  File "scripts/train_model.py", line 152, in main
    train_loop(args, train_loader, val_loader)
  File "scripts/train_model.py", line 235, in train_loop
    scores = baseline_model(questions_var, feats_var)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sharedfolder/clevr/clevr-iep/iep/models/baselines.py", line 242, in forward
    u = sa(v, u)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sharedfolder/clevr/clevr-iep/iep/models/baselines.py", line 45, in forward
    v_tilde = (p.expand_as(v) * v).sum(2).sum(3).view(N, D)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 476, in sum
    return Sum.apply(self, dim, keepdim)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/reduce.py", line 21, in forward
    return input.sum(dim)
RuntimeError: dimension out of range (expected to be in range of [-3, 2], but got 3)

How can I fix this?

CLEVR_v1.0 dataset downloaded from https://s3-us-west-1.amazonaws.com/clevr/CLEVR_v1.0.zip is missing about 85 images in the test set

As shown above, my downloaded test set does not have 15000 images. Has anybody else faced the same problem?

RuntimeError of 'T_out = y.size(1) if y is not None else None'

Hi there, when I ran the scripts as instructed, I got the RuntimeError. Below are the details:

(PYENV) --- files/clevr-iep ‹master* M?› » python scripts/run_model.py \
  --program_generator models/CLEVR/program_generator_18k.pt \
  --execution_engine models/CLEVR/execution_engine_18k.pt \
  --image img/CLEVR_val_000013.png \
  --question "Does the small sphere have the same color as the cube left of the gray cube?"

Loading program generator from  models/CLEVR/program_generator_18k.pt
Loading execution engine from  models/CLEVR/execution_engine_18k.pt
Loading CNN for feature extraction
Running the model

y is: None
y is: Variable containing:
 1
[torch.cuda.LongTensor of size 1x1 (GPU 0)]

y is: Variable containing:
 5
[torch.cuda.LongTensor of size 1 (GPU 0)]

Traceback (most recent call last):
  File "scripts/run_model.py", line 301, in <module>
    main(args)
  File "scripts/run_model.py", line 85, in main
    run_single_example(args, model)
  File "scripts/run_model.py", line 159, in run_single_example
    argmax=(args.sample_argmax == 1))
  File "/u/home/downloads/files/clevr-iep/iep/models/seq2seq.py", line 171, in reinforce_sample
    logprobs, h, c = self.decoder(encoded, cur_input, h0=h, c0=c)
  File "/u/home/downloads/files/clevr-iep/iep/models/seq2seq.py", line 93, in decoder
    V_in, V_out, D, H, L, N, T_in, T_out = self.get_dims(y=y)
  File "/u/home/downloads/files/clevr-iep/iep/models/seq2seq.py", line 58, in get_dims
    T_out = y.size(1) if y is not None else None
RuntimeError: invalid argument 2: out of range at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/THCTensor.c:23

I am using Python 3.6 with the latest PyTorch installed.
I printed the contents of y: it can be None, a 1-by-1 torch.cuda.LongTensor, or a 1-element (1-D) torch.cuda.LongTensor. In the 1-D case, T_out = y.size(1) raises the RuntimeError because the tensor has no dimension 1.

Test answers of CLEVR dataset

Where can I find the test set answers of the CLEVR dataset? Thanks.

NoneType

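After downloading the data, I ran into an out-of-memory problem when running step 2 (extracting image features). After I reduced the number of images to 20, I got the error below. Can anyone tell me how to fix it? Thanks.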
python scripts/extract_features.py --input_image_dir data/CLEVR_v1.0/images/train --output_h5_file data/train_features.h5

('data/CLEVR_v1.0/images/train/CLEVR_train_000000.png', 0)
('data/CLEVR_v1.0/images/train/CLEVR_train_000019.png', 19)
Traceback (most recent call last):
  File "scripts/extract_features.py", line 114, in <module>
    main(args)
  File "scripts/extract_features.py", line 108, in main
    feat_dset[i0:i1] = feats
TypeError: 'NoneType' object does not support item assignment
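
One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: if the output dataset is created lazily only when a full batch has been processed, then an input with fewer images than the batch size reaches the final write with the dataset still set to None. The sketch below reproduces that pattern with plain Python lists standing in for the HDF5 dataset, and creates the store in the leftover-batch path as well. The names `write_in_batches`, `run_batch`, and `make_store` are hypothetical and are not functions from the repository.

```python
def write_in_batches(items, batch_size, run_batch, make_store):
    """Process items in batches and write results into a lazily created store."""
    store = None  # created on first use, once results are available
    start = 0
    batch = []

    def flush(batch, store, start):
        results = run_batch(batch)
        if store is None:
            # Create the store here too, so a short final batch still works.
            store = make_store(len(items))
        store[start:start + len(results)] = results
        return store, start + len(results)

    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            store, start = flush(batch, store, start)
            batch = []
    if batch:  # leftover partial batch (the only batch when len(items) < batch_size)
        store, start = flush(batch, store, start)
    return store


# Stand-ins for feature extraction and for an HDF5 dataset (assumptions).
double = lambda batch: [2 * x for x in batch]
zeros = lambda n: [0] * n

features = write_in_batches(list(range(20)), 128, double, zeros)
print(len(features), features[:3])
```

With 20 items and a batch size of 128, no full batch ever completes, so the store is created only in the leftover-batch path.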

Where can I find the mapping between the 32 questions and the categories they belong to?

zero gpu usage when training model

When I train the baseline model using the command:

python scripts/train_model.py   \
--model_type CNN+LSTM+SA   \
--classifier_fc_dims 1024  \
 --num_iterations 400000  \
 --checkpoint_path data/cnn_lstm_sa_mlp.pt

I found the gpu usage is zero and got the following output:

2017-06-11-22:43:00] 1, 8.812, 30.340, 3.445
2017-06-11-22:43:01] 2, 0.057, 0.237, 3.334
2017-06-11-22:43:17] 3, 0.051, 16.018, 3.215
2017-06-11-22:43:35] 4, 0.047, 17.887, 3.052
2017-06-11-22:43:51] 5, 0.045, 16.408, 2.972
2017-06-11-22:44:08] 6, 0.048, 16.996, 2.878
2017-06-11-22:44:26] 7, 0.044, 17.460, 2.587
2017-06-11-22:44:41] 8, 0.041, 15.066, 2.851
2017-06-11-22:44:46] 9, 0.049, 5.564, 2.603
2017-06-11-22:45:08] 10, 0.048, 22.074, 2.876
2017-06-11-22:45:23] 11, 0.043, 14.640, 2.516
2017-06-11-22:45:37] 12, 0.049, 13.797, 2.853
2017-06-11-22:45:52] 13, 0.053, 15.052, 2.847
2017-06-11-22:46:07] 14, 0.043, 14.625, 2.867
2017-06-11-22:46:21] 15, 0.049, 14.314, 2.676
2017-06-11-22:46:36] 16, 0.047, 14.443, 2.634

I changed the output a little. The second column is the actual batch training time and the third column is the data loading time. Data loading seems to dominate.
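
The timing columns described above can be collected with a small wrapper around the training loop. This is a minimal sketch, not the poster's actual modification: `timed_loop`, `train_step`, and `slow_loader` are hypothetical names, and the loader can be any iterable of batches.

```python
import time


def timed_loop(loader, train_step):
    """Yield (iteration, train_seconds, load_seconds) for each batch."""
    batches = iter(loader)
    iteration = 0
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(batches)  # time spent waiting for data
        except StopIteration:
            return
        t1 = time.perf_counter()
        train_step(batch)  # time spent in the training step itself
        t2 = time.perf_counter()
        iteration += 1
        yield iteration, t2 - t1, t1 - t0


def slow_loader(n, delay):
    """Stand-in loader that sleeps to imitate slow disk reads."""
    for i in range(n):
        time.sleep(delay)
        yield i


for it, train_s, load_s in timed_loop(slow_loader(3, 0.01), lambda b: None):
    print("%d, %.3f, %.3f" % (it, train_s, load_s))
```

On a GPU, kernels run asynchronously, so the training time is only meaningful if the device is synchronized (for example with torch.cuda.synchronize()) before the clock is read.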

I guess the problem is that the image h5 file is not loaded into memory, and random access to the h5 file is very slow.

With the current code, it seems impossible to finish training in a reasonable time.
Assuming each iteration takes 10 seconds, training takes 400000 * 10 / 3600 / 24 ≈ 46 days.
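
The estimate can be reproduced with a few lines of arithmetic. The helper name `training_days` is hypothetical, and the 10 seconds per iteration is the poster's assumption.

```python
def training_days(num_iterations, seconds_per_iteration):
    """Convert an iteration count and per-iteration cost into days."""
    total_seconds = num_iterations * seconds_per_iteration
    return total_seconds / 3600 / 24


print(round(training_days(400000, 10), 1))  # 46.3
```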

EOFError

While running the baseline model (CNN+LSTM+SA) with Python 3.6 and PyTorch 0.2.0, I got the following EOFError.
It seems to happen after the accuracy has been checked several times.
I found this by setting checkpoint_every to 1.
Has anybody run into this error?

Checking training accuracy ... 
Traceback (most recent call last):
  File "scripts/train_model.py", line 498, in <module>
    main(args)
  File "scripts/train_model.py", line 152, in main
    train_loop(args, train_loader, val_loader)
  File "scripts/train_model.py", line 276, in train_loop
    baseline_model, train_loader)
  File "scripts/train_model.py", line 454, in check_accuracy
    for batch in loader:
  File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 195, in __next__
    idx, batch = self.data_queue.get()
  File "/root/anaconda3/lib/python3.6/multiprocessing/queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

Typo

clevr-iep/scripts/run_model.py

Line 240 in 35a4eea

print('God %d / %d = %.2f correct' % (num_correct, num_samples, 100 * acc))

problem within dataloader

python3.5 scripts/train_model.py --model_type PG --num_train_samples 18000 --num_iterations 20000 --checkpoint_every 1000 --checkpoint_path data/program_generator.pt

/opt/data/penggao/pytorch_project/clevr-iep/clevr-iep/scripts/train_model.py(27)()
-> import iep.utils as utils
(Pdb) c
Reading features from data/train_features.h5
Reading questions from data/train_questions.h5
Reading question data into memory
Reading features from data/val_features.h5
Reading questions from data/val_questions.h5
Reading question data into memory

Here is the program generator:
Seq2Seq (
(encoder_embed): Embedding(93, 300)
(encoder_rnn): LSTM(300, 256, num_layers=2, batch_first=True)
(decoder_embed): Embedding(44, 300)
(decoder_rnn): LSTM(556, 256, num_layers=2, batch_first=True)
(decoder_linear): Linear (256 -> 44)
)
train_loader has 18000 samples
val_loader has 10000 samples
Starting epoch 1

Traceback (most recent call last):
File "scripts/train_model.py", line 27, in <module>
import iep.utils as utils
File "scripts/train_model.py", line 154, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 211, in train_loop
for batch in train_loader:
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 206, in __next__
idx, batch = self.data_queue.get()
File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.5/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError

KeyError: tensor(1) at line 81, in __getitem__: fn_str = self.vocab['program_idx_to_token'][fn_idx]

I used Python 3.7 and PyTorch 1.0.
When I try to train on CLEVR, a problem like the one below occurs:
python scripts/train_model.py \
  --model_type PG \
  --num_train_samples 18000 \
  --num_iterations 20000 \
  --checkpoint_every 1000 \
  --checkpoint_path data/program_generator.pt
Reading features from data/train_features.h5
Reading questions from data/train_questions.h5
Reading question data into memory
Reading features from data/val_features.h5
Reading questions from data/val_questions.h5
Reading question data into memory
Here is the program generator:
Seq2Seq(
(encoder_embed): Embedding(93, 300)
(encoder_rnn): LSTM(300, 256, num_layers=2, batch_first=True)
(decoder_embed): Embedding(44, 300)
(decoder_rnn): LSTM(556, 256, num_layers=2, batch_first=True)
(decoder_linear): Linear(in_features=256, out_features=44, bias=True)
)
train_loader has 18000 samples
val_loader has 10000 samples
Starting epoch 1
Traceback (most recent call last):
File "scripts/train_model.py", line 492, in <module>
main(args)
File "scripts/train_model.py", line 153, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 210, in train_loop
for batch in train_loader:
File "/home/nicholas/anaconda3/envs/py/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
return self._process_next_batch(batch)
File "/home/nicholas/anaconda3/envs/py/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
KeyError: 'Traceback (most recent call last):\n File "/home/nicholas/anaconda3/envs/py/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop\n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/home/nicholas/anaconda3/envs/py/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>\n samples = collate_fn([dataset[i] for i in batch_indices])\n File "/home/nicholas/code/clevr-iep-master/iep/data.py", line 81, in __getitem__\n fn_str = self.vocab['program_idx_to_token'][fn_idx]\nKeyError: tensor(1)\n'

Is it that my Python and PyTorch versions are incompatible with the code?
If anyone can solve this, I would really appreciate it.
Thanks.

how to compute test accuracy

I see that answers in CLEVR are provided only for the training and validation sets. How can I get ground-truth answers to compare with our predicted answers? I couldn't find a test.py mentioned in your README; there are only predicted answers. Or do I need to upload my answer JSON file to a CLEVR server, or something else?
Looking forward to your reply; it would be very helpful to me.
Thank you very much.

IndexError:too many indices for tensor of dimension 1

I used Python 3.5 and PyTorch 0.4.
At first, I got "RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)" at seq2seq.py, line 57, in get_dims:
    T_out = y.size(1) if y is not None else None
Then, as suggested by manoja328, I changed seq2seq.py line 173 from

    if argmax:
        _, cur_output = probs.max(1)
    else:

to

    if argmax:
        _, cur_output = probs.max(1)
        cur_output = cur_output.unsqueeze(0)
    else:

After that, I got "IndexError: too many indices for tensor of dimension 1" at seq2seq.py, line 182, in reinforce_sample:
    y[:, t][not_done] = cur_output_data[not_done]

Thank you!

IndexError!

Checking training accuracy ...
Traceback (most recent call last):
File "scripts/train_model.py", line 490, in <module>
main(args)
File "scripts/train_model.py", line 151, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 269, in train_loop
baseline_model, train_loader)
File "scripts/train_model.py", line 460, in check_accuracy
program_pred = program_generator.sample(Variable(questions[i:i+1].cuda(), volatile=True))
File "/home/jiaruizou/research/clevr-iep/iep/models/seq2seq.py", line 154, in sample
y.append(next_y[0, 0, 0])
IndexError: trying to index 3 dimensions of a 2 dimensional tensor

I encounter this error when I train the PG.

Training data is incomplete?

I downloaded CLEVR_v1.0 from https://dl.fbaipublicfiles.com/clevr/CLEVR_v1.0.zip, but there are only 67882 images in "/images/train", while the test and val sets both have 15000. How can I solve this?

Error running baseline models

Hi, first of all, great work on this paper. Looking forward to seeing it at ICCV.
I have an issue trying to run the baseline models such as CNN+LSTM and LSTM. I get the following error. Have you seen this problem before? It seems it always stops at that iteration. I also saw the error when running all the other baseline methods. Thanks for your help.

6721 0.31234210729599
6722 0.2944237291812897
6723 0.23815006017684937
Traceback (most recent call last):
File "scripts/train_model.py", line 514, in <module>
main(args)
File "scripts/train_model.py", line 160, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 246, in train_loop
baseline_optimizer.step()
File "/home/oliver/.local/lib/python3.5/site-packages/torch/optim/adam.py", line 70, in step
bias_correction1 = 1 - beta1 ** state['step']
OverflowError: (34, 'Numerical result out of range')
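
One hypothesis for why training always stops at the same iteration, offered as an assumption rather than a confirmed diagnosis: with Adam's default beta1 of 0.9, the term beta1 ** step becomes smaller than the smallest normal double after a few thousand steps, and some C library versions report a range error in that regime, which Python can surface as OverflowError. The sketch below only computes where that threshold lies, using logarithms so that it never calls the power operation itself. The function name `first_subnormal_step` is hypothetical.

```python
import math
import sys


def first_subnormal_step(beta):
    """Smallest integer step for which beta ** step falls below the
    smallest normal double, computed with logarithms to avoid pow()."""
    return math.floor(math.log(sys.float_info.min) / math.log(beta)) + 1


print(first_subnormal_step(0.9))  # 6724
```

The result is close to the iteration at which the log above stops, which is consistent with the hypothesis but does not prove it.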

error while trying to save predictions in output_h5

I would like to save the parameters of the pretrained model. I have downloaded the pretrained model, but when I try to save the predictions to output_h5, I get the following error. I am quite new to this, so please help.

This is what I typed in my terminal:
python scripts/run_model.py --program_generator models/CLEVR/program_generator_18k.pt --execution_engine models/CLEVR/execution_engine_18k.pt --output_h5 params.h5

This is what I get:

pytorch installation issue!

$pip3 list --user
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
h5py (2.7.0)
olefile (0.44)
Pillow (4.1.1)
PyYAML (3.12)
torch (0.1.12.post2)
torchvision (0.1.8)
virtualenv (15.1.0)

however:
$python3 scripts/run_model.py --program_generator models/CLEVR/program_generator_18k.pt --execution_engine models/CLEVR/execution_engine_18k.pt --image img/CLEVR_val_000013.png --question "Does the small sphere have the same color as the cube left of the gray cube?"
Traceback (most recent call last):
  File "scripts/run_model.py", line 14, in <module>
    import torch
  File "../.local/lib/python3.6/site-packages/torch/__init__.py", line 53, in <module>
    from torch._C import *
ImportError: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory

KeyError: 'function' not found

In the file programs.py inside the function build_subtree, the script tries to create a new key as follows:

def list_to_tree(program_list):
    def build_subtree(cur):
        return {
            'function': cur['function'],
            'value_inputs': [x for x in cur['value_inputs']],
            'inputs': [build_subtree(program_list[i]) for i in cur['inputs']],
        }
    return build_subtree(program_list[-1])

I have tried generating new images with new questions, but I never see the key 'function' in cur. Has this happened to anyone else? Were you able to resolve it?
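
For reference, the function above expects every program node to be a dict with 'function', 'value_inputs', and 'inputs' keys. The sketch below runs it on a tiny hand-written program after a renaming step. The two-node program is illustrative and not taken from the dataset, `normalize_node` is a hypothetical helper, and the alternative key names 'type' and 'side_inputs' are placeholders to check against the keys that actually appear in your generated JSON.

```python
def list_to_tree(program_list):
    def build_subtree(cur):
        return {
            'function': cur['function'],
            'value_inputs': [x for x in cur['value_inputs']],
            'inputs': [build_subtree(program_list[i]) for i in cur['inputs']],
        }
    return build_subtree(program_list[-1])


def normalize_node(node, function_key='type', value_key='side_inputs'):
    """Return a copy of node using the key names list_to_tree expects.
    function_key and value_key are hypothetical alternative names."""
    return {
        'function': node.get('function', node.get(function_key)),
        'value_inputs': node.get('value_inputs', node.get(value_key, [])),
        'inputs': node['inputs'],
    }


# Hand-written two-node program: a scene node feeding a color filter.
program = [
    {'type': 'scene', 'inputs': [], 'side_inputs': []},
    {'type': 'filter_color', 'inputs': [0], 'side_inputs': ['gray']},
]
tree = list_to_tree([normalize_node(n) for n in program])
print(tree['function'], tree['value_inputs'], tree['inputs'][0]['function'])
```

Printing the keys of one node from the generated questions file is a quick way to see which names are really present before adapting anything.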

RuntimeError: expand(torch.LongTensor{[1, 1]}, size=[1]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2) at seq2seq.py, line 183, in reinforce_sample y[:, t][not_done] = cur_output_data[not_done]

Hi there, when I ran the scripts as instructed, I got the RuntimeError. Below are the details:
python3 scripts/run_model.py --program_generator models/CLEVR/program_generator_18k.pt --execution_engine models/CLEVR/execution_engine_18k.pt --image img/CLEVR_val_000013.png --question "Does the small sphere have the same color as the cube left of the gray cube?"

Loading program generator from models/CLEVR/program_generator_18k.pt
Loading execution engine from models/CLEVR/execution_engine_18k.pt
Loading CNN for feature extraction
Running the model

/home/pyp/clevr-iep-master/iep/models/seq2seq.py:172: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
probs = F.softmax(logprobs.view(N, -1)) # Now N x V
Traceback (most recent call last):
File "scripts/run_model.py", line 303, in <module>
main(args)
File "scripts/run_model.py", line 85, in main
run_single_example(args, model)
File "scripts/run_model.py", line 161, in run_single_example
argmax=(args.sample_argmax == 1))
File "/home/pyp/clevr-iep-master/iep/models/seq2seq.py", line 183, in reinforce_sample
y[:, t][not_done] = cur_output_data[not_done]
RuntimeError: expand(torch.LongTensor{[1, 1]}, size=[1]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

============================================================================
I used Python 3.5 and PyTorch 0.4.
At first, I got "RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)" at seq2seq.py, line 57, in get_dims:
    T_out = y.size(1) if y is not None else None
Then, as suggested by manoja328, I changed seq2seq.py line 173 from

    if argmax:
        _, cur_output = probs.max(1)
    else:

to

    if argmax:
        _, cur_output = probs.max(1)
        cur_output = cur_output.unsqueeze(0)
    else:

After that, I got the RuntimeError in the title: "RuntimeError: expand(torch.LongTensor{[1, 1]}, size=[1]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)" at seq2seq.py, line 183, in reinforce_sample:
    y[:, t][not_done] = cur_output_data[not_done]

Thank you!

Loss calculation !!

Can someone please explain how the loss is calculated? Is it the cross-entropy loss between the concatenated natural-language question and the concatenated functional program as inputs, or is something else taking place?

def compute_loss(self, output_logprobs, y):
    """
    Compute loss. We assume that the first element of the output sequence y is
    a start token, and that each element of y is left-aligned and right-padded
    with self.NULL out to T_out. We want the output_logprobs to predict the
    sequence y, shifted by one timestep so that y[0] is fed to the network and
    then y[1] is predicted. We also don't want to compute loss for padded
    timesteps.

    Inputs:
    - output_logprobs: Variable of shape (N, T_out, V_out)
    - y: LongTensor Variable of shape (N, T_out)
    """
    print("hi I am inside")  # debug print added by the poster
    self.multinomial_outputs = None
    V_in, V_out, D, H, L, N, T_in, T_out = self.get_dims(y=y)

    # True wherever y holds a real token rather than padding
    mask = y.data != self.NULL
    # Targets: every non-padding token except the start token y[:, 0]
    y_mask = Variable(torch.Tensor(N, T_out).fill_(0).type_as(mask))
    y_mask[:, 1:] = mask[:, 1:]
    y_masked = y[y_mask]
    # Predictions: the output at step t is scored against y[:, t + 1]
    out_mask = Variable(torch.Tensor(N, T_out).fill_(0).type_as(mask))
    out_mask[:, :-1] = mask[:, 1:]
    out_mask = out_mask.view(N, T_out, 1).expand(N, T_out, V_out)
    out_masked = output_logprobs[out_mask].view(-1, V_out)
    loss = F.cross_entropy(out_masked, y_masked)
    return loss

Thanks!
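
Going by the docstring and masks in compute_loss, the loss is a cross-entropy over program tokens only: the decoder output at step t is scored against y[t + 1], and padded steps are skipped. In a standard encoder-decoder setup the question would enter only through the encoder state that conditions the decoder, although that part is not visible in the snippet. The pure-Python sketch below mirrors the shifted masking for a single sequence. The token ids, the NULL value of 0, the uniform log-probabilities, and the name `shifted_cross_entropy` are all illustrative assumptions.

```python
import math

NULL = 0  # padding id (assumed value for this example)


def shifted_cross_entropy(logprobs, y):
    """Mean negative log-likelihood of y[t + 1] under logprobs[t],
    skipping positions where the target is padding."""
    losses = []
    for t in range(len(y) - 1):
        target = y[t + 1]
        if target != NULL:
            losses.append(-logprobs[t][target])
    return sum(losses) / len(losses)


V = 4  # vocabulary size
y = [1, 3, 2, NULL, NULL]  # start token, two real tokens, then padding
uniform = [[math.log(1.0 / V)] * V for _ in y]  # log-softmax of equal scores
print(round(shifted_cross_entropy(uniform, y), 4))  # 1.3863
```

Only two positions contribute here (the targets 3 and 2), so with uniform predictions over four tokens the loss equals log 4.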

Are there other ways to get results on test set?

According to the dataset provided at https://cs.stanford.edu/people/jcjohns/clevr/, the answers for the test set do not seem to be given. Are there other ways (such as testing on a server) to evaluate model results on the CLEVR v1.0 test set besides sending results to jcjohnson? Thank you very much.

how to get the testing accuracy?

File "clevr-iep/iep/models/seq2seq.py", line 57, in get_dims T_out = y.size(1) if y is not None else None

[jalal@goku clevr-iep]$ python scripts/run_model.py   --program_generator models/CLEVR/program_generator_18k.pt   --execution_engine models/CLEVR/execution_engine_18k.pt   --image img/CLEVR_val_000013.png   --question "Does the small sphere have the same color as the cube left of the gray cube?"

Loading program generator from  models/CLEVR/program_generator_18k.pt
Loading execution engine from  models/CLEVR/execution_engine_18k.pt
Loading CNN for feature extraction
Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /home/grad3/jalal/.torch/models/resnet101-5d3b4d8f.pth
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 178728960/178728960 [00:06<00:00, 28902620.29it/s]
scripts/run_model.py:133: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  img_var = Variable(torch.FloatTensor(img).type(dtype), volatile=True)
scripts/run_model.py:146: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  question_var = Variable(question_encoded, volatile=True)
Running the model

clevr-iep/iep/models/seq2seq.py:172: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  probs = F.softmax(logprobs.view(N, -1)) # Now N x V
Traceback (most recent call last):
  File "scripts/run_model.py", line 301, in <module>
    main(args)
  File "scripts/run_model.py", line 85, in main
    run_single_example(args, model)
  File "scripts/run_model.py", line 159, in run_single_example
    argmax=(args.sample_argmax == 1))
  File "/clevr-iep/iep/models/seq2seq.py", line 170, in reinforce_sample
    logprobs, h, c = self.decoder(encoded, cur_input, h0=h, c0=c)
  File "/clevr-iep/iep/models/seq2seq.py", line 92, in decoder
    V_in, V_out, D, H, L, N, T_in, T_out = self.get_dims(y=y)
  File "clevr-iep/iep/models/seq2seq.py", line 57, in get_dims
    T_out = y.size(1) if y is not None else None
RuntimeError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Can you please help how to fix this error?

Can I only use CPU to debug and understand it?

I don't want to train a whole model and predict. I only want to understand it. Is it possible to debug it, only using CPU without GPU?
Thank you!

ValueError: unrecognized program format

In run_model.py, line 161, the type of 'predicted_program' is torch.Tensor.
However, in iep/module/module_net.py, the type of this parameter is restricted to list, tuple, and Variable (dim=2).
I tried to convert predicted_program from torch.Tensor to a list or a Variable, but failed.

The Table 1 results in your paper are on the validation set, right?

where is the code to generate your dataset?

In your paper, you said that you would open-source the code for dataset generation. Where can I find it?
Thank you so much for your amazing code. I learned a lot from your Stanford lectures and your open-source code.

pytorch 0.2.0 version compatibility

With the PyTorch update, the current code may run into errors during inference and training. Here I will show the bugs and how to change the code.

Training: Step 2

Hello,

Regarding the training procedure on step 2:
python scripts/train_model.py --model_type EE --program_generator_start_from data/program_generator.py --num_iterations 100000 --checkpoint_path data/execution_engine.pt

I do not know if I have missed something, but program_generator_start_from is only invoked inside get_program_generator, for 'PG+EE' and 'PG' model types.

Thank you.

Unable to download CLEVR dataset

The link for downloading CLEVR dataset seems to be down. I get the error
for https://s3-us-west-1.amazonaws.com/clevr/CLEVR_v1.0.zip

This XML file does not appear to have any style information associated with it. The document tree is shown below.

Question: A list of program modules

Hey,

In Section 3.1 of the paper, it is mentioned that the syntax is set by pre-specifying a list of functions, F. I assume this is a fixed set of functions, that is, a vocabulary.

Where is this vocabulary defined in the code? It is also not clear to me how you arrived at this list of functions for the problem of QA on CLEVR.

Could any of the authors please answer my question? Please excuse me if my question is ambiguous.

--
thanks

Code stuck around loss.backward() for forever

Hi,
I was running the code with the following configuration:
python scripts/train_model.py --model_type PG --num_train_samples 18000 --num_iterations 20000 --checkpoint_every 1000 --checkpoint_path data/program_generator.pt --batch_size 2
on a Tesla K40M. I followed the earlier data-preparation steps, and all the data is available on the local machine.
I observed that the code gets stuck at the loss.backward() call.
GPU memory usage stays at 296 MB, but the volatile GPU utilization is 0. Does anyone have an idea why this might be happening?

preprocess_question

Traceback (most recent call last):
  File "scripts/preprocess_questions.py", line 193, in <module>
    main(args)
  File "scripts/preprocess_questions.py", line 88, in main
    'answer_token_to_idx': answer_token_to_idx,
UnboundLocalError: local variable 'answer_token_to_idx' referenced before assignment
I just used your code as is, and it looks fine to me; I don't know how to fix this.
Thank you.

RuntimeError: inconsistent tensor size

Following others' advice, I modified these parts:

File "/home/lxp/CodeBase/VisualReasoning/clevr-iep/iep/models/seq2seq.py", line 154, in sample:
changed y.append(next_y[0, 0, 0]) to y.append(next_y[0, 0])
and added cur_output = cur_output.unsqueeze(0) at seq2seq.py, line 175, in sample

But there is a new error in training step 3 (PG+EE):

Traceback (most recent call last):
File "scripts/train_model.py", line 490, in <module>
main(args)
File "scripts/train_model.py", line 151, in main
train_loop(args, train_loader, val_loader)
File "scripts/train_model.py", line 239, in train_loop
programs_pred = program_generator.reinforce_sample(questions_var)
File "/home/lxp/CodeBase/VisualReasoning/clevr-iep/iep/models/seq2seq.py", line 182, in reinforce_sample
y[:, t][not_done] = cur_output_data[not_done]
RuntimeError: inconsistent tensor size, expected src [64 x 1] and mask [64 x 64] to have the same number of elements, but got 64 and 4096 elements respectively at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/TH/generic/THTensorMath.c:197

I don't know why. Is there a way to solve it?

RuntimeError: CUDA error: out of memory for GTX 1080 Ti

I have two GTX 1080 Ti but I get out of GPU memory error when I want to train. What GPU did you use and can I really not train it with GTX 1080 Ti? What is the minimum GPU specs I could run this code on?

[jalal@goku clevr-iep]$ python scripts/extract_features.py   --input_image_dir data/CLEVR_v1.0/images/train   --output_h5_file data/train_features.h5
('data/CLEVR_v1.0/images/train/CLEVR_train_000000.png', 0)
('data/CLEVR_v1.0/images/train/CLEVR_train_069999.png', 69999)
scripts/extract_features.py:57: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  image_batch = torch.autograd.Variable(image_batch, volatile=True)
Traceback (most recent call last):
  File "scripts/extract_features.py", line 114, in <module>
    main(args)
  File "scripts/extract_features.py", line 94, in main
    feats = run_batch(cur_batch, model)
  File "scripts/extract_features.py", line 59, in run_batch
    feats = model(image_batch)
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/models/resnet.py", line 84, in forward
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
[jalal@goku clevr-iep]$ nvidia-smi
Mon Sep 24 22:52:14 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:05:00.0  On |                  N/A |
|  0%   25C    P0    62W / 250W |    445MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:06:00.0 Off |                  N/A |
|  0%   26C    P8    12W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2177      G   /usr/bin/X                                   309MiB |
|    0      3731      G   /usr/bin/gnome-shell                         125MiB |
|    0     10560      G   /usr/lib64/firefox/firefox                     2MiB |
|    0     13620      G   /usr/lib64/firefox/firefox                     2MiB |
|    0     14157      G   /usr/lib64/firefox/firefox                     2MiB |
+-----------------------------------------------------------------------------+
[jalal@goku clevr-iep]$
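
The warning in the log says volatile=True no longer has any effect, so the forward pass may be keeping every intermediate activation for autograd unless it is wrapped in `with torch.no_grad():`, as the warning suggests. Activation memory also grows linearly with batch size. As a rough back-of-the-envelope sketch, the snippet below estimates the size of a single float32 activation tensor. The batch sizes and the 64 x 112 x 112 shape (the first convolution output of a torchvision ResNet for a 224 x 224 input) are assumptions for illustration, not values taken from this log, and activation_mib is a hypothetical helper.

```python
def activation_mib(batch, channels, height, width, bytes_per_elem=4):
    """Size in MiB of one dense activation tensor (float32 by default)."""
    return batch * channels * height * width * bytes_per_elem / 2**20


# Assumed shape: 64 channels at 112 x 112 after the first ResNet convolution.
for batch in (128, 64, 32):
    print(batch, activation_mib(batch, 64, 112, 112))
# 128 392.0
# 64 196.0
# 32 98.0
```

A deep network holds many such tensors at once when gradients are being tracked, so disabling gradient tracking and lowering the batch size are the two things worth trying first on an 11 GiB card.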

torch-0.1.11.post5-cp35-cp35m-linux_x86_64.whl is not a supported wheel on this platform.

[jalal@goku clevr-iep]$ pip install -r requirements.txt
torch-0.1.11.post5-cp35-cp35m-linux_x86_64.whl is not a supported wheel on this platform.
[jalal@goku clevr-iep]$ uname -a
Linux goku.bu.edu 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
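
pip prints this message when the tags encoded in the wheel filename do not match the running interpreter or platform. Here the cp35-cp35m tags mean the wheel was built for CPython 3.5. The sketch below splits a wheel filename into its tags and compares the Python tag with the current interpreter. wheel_tags and interpreter_tag are hypothetical helpers written for this illustration, and interpreter_tag assumes CPython.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # The last three dash-separated fields are the python, ABI and platform tags.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def interpreter_tag():
    """CPython-style tag of the running interpreter, e.g. 'cp36' for Python 3.6."""
    return "cp{}{}".format(*sys.version_info[:2])


wheel = "torch-0.1.11.post5-cp35-cp35m-linux_x86_64.whl"
py_tag, abi_tag, platform_tag = wheel_tags(wheel)
print("wheel built for:", py_tag, abi_tag, platform_tag)
print("this interpreter:", interpreter_tag())
print("python tag matches:", py_tag == interpreter_tag())
```

If the tags do not match, the usual options are to run under a Python version that matches the wheel or to install a PyTorch build made for the interpreter you have. Note that a newer PyTorch may not behave identically with this code, as other issues on this page show.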

a possible typo!

Hello,
I am wondering if there is a typo in the command listed in the section https://github.com/facebookresearch/clevr-iep/blob/master/TRAINING.md#step-2-train-the-execution-engine:

python scripts/train_model.py \
  --model_type EE \
  --program_generator_start_from data/program_generator.py \
  --num_iterations 100000 \
  --checkpoint_path data/execution_engine.pt

Should "--program_generator_start_from data/program_generator.py" be "--program_generator_start_from data/program_generator.pt"?

Please check and confirm. Thanks!

Not working with cuda 8.0 and python 3.6 and latest pytorch version

Whenever I try to run the program, I get the following error. Could somebody please fix this?

Loading program generator from scripts/models/CLEVR/program_generator_18k.pt
Loading execution engine from scripts/models/CLEVR/execution_engine_18k.pt
Loading CNN for feature extraction
Running the model

Traceback (most recent call last):
  File "run_model.py", line 301, in <module>
    main(args)
  File "run_model.py", line 85, in main
    run_single_example(args, model)
  File "run_model.py", line 159, in run_single_example
    argmax=(args.sample_argmax == 1))
  File "/home/manoj/Desktop/clevr-iep/iep/models/seq2seq.py", line 169, in reinforce_sample
    logprobs, h, c = self.decoder(encoded, cur_input, h0=h, c0=c)
  File "/home/manoj/Desktop/clevr-iep/iep/models/seq2seq.py", line 91, in decoder
    V_in, V_out, D, H, L, N, T_in, T_out = self.get_dims(y=y)
  File "/home/manoj/Desktop/clevr-iep/iep/models/seq2seq.py", line 56, in get_dims
    T_out = y.size(1) if y is not None else None
RuntimeError: invalid argument 2: out of range at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/THCTensor.c:23

Undefined variable

https://github.com/ethanjperez/film/blob/fe43ddf8a22b339dcca2efa07091ce9d498955cf/vr/models/film_gen.py#L187
