import torch
from pathlib import Path
import numpy as np
import h5py
from tbd.module_net import load_tbd_net
from utils.clevr import load_vocab
from utils.generate_programs import load_program_generator, generate_programs
# Load the pretrained TbD network together with its vocabulary, then run the
# program generator over the validation questions, writing programs to disk.
vocab = load_vocab(Path('data/vocab.json'))
tbd_net = load_tbd_net(Path('models/clevr-reg-hres.pt'), vocab)

question_program_generator = load_program_generator(Path('models/program_generator.pt'))
generate_programs(Path('data/val_questions.h5'), question_program_generator,
                  dest_dir=Path('data/val/'), batch_size=128)
# Toggle between NumPy (.npy, memory-mapped) and HDF5 feature storage.
use_np_features = False
if use_np_features:
    # mmap_mode='r' avoids loading the whole feature array into RAM at once.
    features = np.load(str(Path('data/val/val_features.npy')), mmap_mode='r')
else:
    # Open read-only explicitly: h5py's implicit default mode is deprecated,
    # and older h5py versions defaulted to append mode, which could modify
    # (or corrupt) the feature file.
    features = h5py.File(Path('data/val_features.h5'), 'r')['features']

question_np = np.load(Path('data/val/questions.npy'))
image_idx_np = np.load(Path('data/val/image_idxs.npy'))
programs_np = np.load(Path('data/val/programs.npy'))
# Fixed answer vocabulary the model's output layer indexes into:
# colors, shapes, sizes, materials, yes/no, and digit strings '0'-'10'
# (digits in lexicographic order, matching the order used at training time).
answers = ['blue', 'brown', 'cyan', 'gray', 'green', 'purple', 'red', 'yellow',
           'cube', 'cylinder', 'sphere',
           'large', 'small',
           'metal', 'rubber',
           'no', 'yes',
           '0', '1', '10', '2', '3', '4', '5', '6', '7', '8', '9']
# dict(enumerate(...)) is the idiomatic form of dict(zip(range(len(...)), ...)).
pred_idx_to_token = dict(enumerate(answers))
# Output file for predicted answer tokens, one per line. It is deliberately
# kept open across the whole evaluation loop and closed at the end of the
# script; a `with` block around the loop would be cleaner on a refactor.
f = open('predicted_answers.txt', 'w', encoding='utf-8')

def write_preds(preds):
    """Append each prediction in *preds* to the open results file, one per line."""
    # writelines batches the small writes instead of two write() calls per item.
    f.writelines(pred + '\n' for pred in preds)
# Run the model over the validation set in batches, streaming the predicted
# answer tokens to the output file.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
batch_size = 128

for start in range(0, len(programs_np), batch_size):
    image_idx = image_idx_np[start:start + batch_size]
    programs = torch.LongTensor(programs_np[start:start + batch_size]).to(device)

    if use_np_features:
        feats = torch.FloatTensor(np.asarray(features[image_idx])).to(device)
    else:
        # h5py datasets cannot be fancy-indexed with an arbitrary numpy array
        # (index lists must be increasing and may not repeat an element), so
        # each image's feature map is fetched individually and stacked after.
        feats = torch.FloatTensor(
            np.asarray([np.asarray(features[idx]) for idx in image_idx])
        ).to(device)

    # NOTE(review): a size-mismatch RuntimeError inside the module network
    # (e.g. "tensor a (128) must match tensor b (16384)") typically indicates
    # the loaded features do not match the model variant — clevr-reg-hres
    # expects high-resolution features; verify the feature extraction settings.
    outputs = tbd_net(feats, programs)
    _, preds = outputs.max(1)
    preds = [pred_idx_to_token[pred] for pred in preds.detach().to('cpu').numpy()]
    write_preds(preds)

f.close()
# --- Pasted runtime error log (kept for reference; commented out so the file
# --- parses as Python) ---
# Traceback (most recent call last):
#   File "eval.py", line 72, in <module>
#     outputs = tbd_net(feats, programs)
#   File "/home/dengwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
#     result = self.forward(*input, **kwargs)
#   File "/home/dengwei/tbd-nets/tbd/module_net.py", line 195, in forward
#     output = module(feat_input, output)
#   File "/home/dengwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
#     result = self.forward(*input, **kwargs)
#   File "/home/dengwei/tbd-nets/tbd/modules.py", line 92, in forward
#     attended_feats = torch.mul(feats, attn.repeat(1, self.dim, 1, 1))
# RuntimeError: The size of tensor a (128) must match the size of tensor b (16384) at non-singleton dimension 1
# NOTE(review): this size mismatch suggests the feature file does not match
# the loaded model variant (clevr-reg-hres expects high-resolution features) —
# regenerate the features with the matching extraction settings and confirm.