vacancy / nscl-pytorch-release
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
Home Page: http://nscl.csail.mit.edu
License: MIT License
I've followed all the instructions in the README.md. However, when I run the command
jac-crun <gpu_id> scripts/trainval.py --desc experiments/clevr/desc_nscl_derender.py --training-target derender --curriculum all --dataset clevr --data-dir <data_dir>/clevr/train --batch-size 32 --epoch 100 --validation-interval 5 --save-interval 5 --data-split 0.95
I get this error:
(nscl) crytting@hatch:~$ jac-crun GPU-0ff7a712-8a42-af6d-7f21-17147fda6a7c --desc experiments/clevr/desc_nscl_derender.py --training-target derender --curriculum all --dataset clevr --data-dir ./clevr/train --batch-size 32 --epoch 100 --validation-interval 5 --save-interval 5 --data-split 0.95
11 09:56:09 Loading jacinle config: /home/crytting/Jacinle/jacinle.yml.
11 09:56:09 Loading vendor: ReservoirSample-PyTorch.
11 09:56:09 Loading vendor: AdvancedIndexing-PyTorch.
11 09:56:09 Loading vendor: SynchronizedBatchNorm-PyTorch.
11 09:56:09 Loading vendor: PreciseRoIPooling-PyTorch.
11 09:56:09 Loading vendor: SceneGraphParser.
/home/crytting/Jacinle/bin/jac-run: line 10: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [arguments ...]] [redirection ...]
Line 10, which it references, is
exec "$@"
in the file /home/crytting/Jacinle/bin/jac-run. Any ideas on how to fix this?
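Comparing the README command with the one that was actually run, the latter has no `scripts/trainval.py` after the GPU id. That likely leaves `--desc` as the first word handed to `exec "$@"`, which bash's `exec` builtin then tries to parse as an option. The sketch below is a hypothetical helper (not part of Jacinle) that assembles the argument list and refuses to build it when the token after the GPU id is not a script path.

```python
import shlex
from typing import List, Sequence


def build_jac_crun_argv(gpu_id: str, script: str, script_args: Sequence[str]) -> List[str]:
    """Assemble a jac-crun command line (hypothetical helper, not part of Jacinle)."""
    # jac-crun consumes the GPU id; what follows is exec'd as a command,
    # so the next token must be the script, not an option such as --desc.
    if script.startswith("-") or not script.endswith(".py"):
        raise ValueError(f"expected a script path after the GPU id, got {script!r}")
    return ["jac-crun", gpu_id, script, *script_args]


argv = build_jac_crun_argv(
    "0",
    "scripts/trainval.py",
    ["--desc", "experiments/clevr/desc_nscl_derender.py", "--training-target", "derender"],
)
print(shlex.join(argv))
```

Calling it with `"--desc"` in the script position raises `ValueError` instead of producing a command that `exec` cannot run.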
Hi,
The links to the project page and object detection files are broken.
The clevr/train/scenes.json and clevr/val/scenes.json files cannot be found.
When I try to train, it gets stuck at the 'Building data loader' stage.
In addition, when testing, it stops at 34% of validation.
I ran your code, but the following message doesn't appear:
[07 16:30:54 [email protected]:/data/vision/billf/scratch/jiayuanm/projects/NSCL-PyTorch/nscl/datasets/factory.py] Filtering out questions containing "how big" and "made of", #before = 699989, #after = 633615.
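The log line describes a filter that removes questions containing "how big" and "made of" and reports counts before and after. Below is a minimal sketch of that kind of filtering. It assumes each question record is a dict with a `question` string (the field name is an assumption) and matches case-insensitively, which is our choice. It is not the code in `nscl/datasets/factory.py`.

```python
from typing import Dict, Iterable, List, Tuple

# Phrases taken from the log line; the record layout is an assumption.
FILTERED_PHRASES = ("how big", "made of")


def filter_questions(questions: Iterable[Dict[str, str]],
                     phrases: Tuple[str, ...] = FILTERED_PHRASES) -> List[Dict[str, str]]:
    """Drop questions whose text contains any of the given phrases (case-insensitive)."""
    kept = []
    for q in questions:
        text = q["question"].lower()
        if not any(p in text for p in phrases):
            kept.append(q)
    return kept


sample = [
    {"question": "How big is the red cube?"},
    {"question": "What is the cylinder made of?"},
    {"question": "What color is the sphere?"},
]
kept = filter_questions(sample)
print(f"#before = {len(sample)}, #after = {len(kept)}")
```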
Hi!
Just wanted to make a request. Would it be possible for you to add instructions in the README.md for how one can replicate results on the VQS dataset?
Thanks!
The links to the project page and object detection files are broken.
The following error appears with PyTorch 1.1:
File "Jacinle/jactorch/data/collate.py", line 149, in _stack
if torchdl._use_shared_memory:
AttributeError: module 'torch.utils.data.dataloader' has no attribute '_use_shared_memory'
In PyTorch 1.1, the dataloader module no longer has the '_use_shared_memory' attribute.
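One possible workaround is to stop reading the module-level flag directly. The sketch below is a hypothetical helper, not Jacinle's actual fix. It uses the old `_use_shared_memory` flag if the module still has it. Failing that, it calls a worker-info function: in recent PyTorch releases, `torch.utils.data.get_worker_info()` returns `None` in the main process. As a last resort it does not use shared memory. Whether `get_worker_info` exists in PyTorch 1.1 itself is an assumption to check. The last-resort branch keeps collation correct but gives up the shared-memory optimization.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def should_use_shared_memory(dataloader_module: Any,
                             get_worker_info: Optional[Callable[[], Any]] = None) -> bool:
    """Decide whether collate should place its output in shared memory (hypothetical helper)."""
    # Older PyTorch: a module-level flag that is True inside worker processes.
    flag = getattr(dataloader_module, "_use_shared_memory", None)
    if flag is not None:
        return bool(flag)
    # Newer PyTorch: we are inside a worker iff get_worker_info() is not None.
    if get_worker_info is not None:
        return get_worker_info() is not None
    # Unknown version: stacking without shared memory is still correct.
    return False


# Hypothetical use inside Jacinle's _stack():
#   import torch.utils.data as tud
#   import torch.utils.data.dataloader as torchdl
#   if should_use_shared_memory(torchdl, getattr(tud, "get_worker_info", None)): ...

# Quick check with stand-in modules (SimpleNamespace mimics module attributes).
print(should_use_shared_memory(SimpleNamespace(_use_shared_memory=True)))
print(should_use_shared_memory(SimpleNamespace(), lambda: None))
```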
Suppose the model predicts a synonym for metal, e.g. "shiny". How does the program executor map "shiny" to what it is really supposed to mean, i.e. metal?
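One common way to handle this is to normalize surface words to canonical concept names before the executor looks a concept up. This is an assumption about the general technique, not a description of how NS-CL's executor works. The synonym table and the program format below are hypothetical.

```python
from typing import Dict, List

# Hypothetical synonym table: surface word -> canonical concept name.
SYNONYMS: Dict[str, str] = {
    "shiny": "metal",
    "metallic": "metal",
    "matte": "rubber",
    "ball": "sphere",
    "block": "cube",
}


def canonicalize(concept: str, synonyms: Dict[str, str] = SYNONYMS) -> str:
    """Map a predicted concept word to its canonical name; unknown words pass through."""
    word = concept.lower()
    return synonyms.get(word, word)


def canonicalize_program(program: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Rewrite the 'concept' argument of each step (hypothetical program format)."""
    return [
        {**step, "concept": canonicalize(step["concept"])} if "concept" in step else dict(step)
        for step in program
    ]


program = [{"op": "scene"}, {"op": "filter", "concept": "shiny"}, {"op": "count"}]
print(canonicalize_program(program))
```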
Hello,
I can't train the model, because the process gets killed after this message:
Building the data loader. Curriculum = 3/8, length = 32218.
Epoch 15 acc/qa=1.000000 loss=0.046158 loss/qa=0.046158 time/data=0.008719 time/step=1.016501: 100%|##############################| 1006/1006 [18:08<00:00, 1.08s/it]
Epoch 15 (validation) validation/acc/qa=1.000000: 2%|#4 | 20/1094 [00:41<11:04, 1.62it/s]/home/colors/Desktop/nscl/Jacinle/bin/jac-crun: line 6: 3305 Killed
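A bare `Killed` printed by the shell means the process received `SIGKILL`. On Linux, a frequent source is the kernel's out-of-memory killer, though the log alone does not confirm that here. The sketch below is a small diagnostic aid, assuming a POSIX system. It decodes a return code into the signal that ended a process and reports the current process's peak resident memory via the standard `resource` module. Logging `peak_rss_mib()` periodically from inside the training process (a hypothetical addition) would show whether memory keeps growing during validation.

```python
import resource
import signal
import sys


def describe_returncode(returncode: int) -> str:
    """Explain a process exit status using POSIX conventions."""
    if returncode < 0:
        # subprocess.Popen.returncode is -N when the child died from signal N.
        return f"terminated by {signal.Signals(-returncode).name}"
    if returncode > 128:
        # Shells report death-by-signal as 128 + N (e.g. 137 for SIGKILL);
        # a program may also exit with such a status itself, so this is a guess.
        try:
            return f"terminated by {signal.Signals(returncode - 128).name} (shell convention)"
        except ValueError:
            pass
    return f"exited with status {returncode}"


def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024


print(describe_returncode(-9))
print(describe_returncode(137))
print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```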
Hi, an AssertionError
assert len(progs) == len(batch_features)
appears when
args.gpu_parallel=True
Have you encountered this error?
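The assertion checks that the list of programs and the batch of features have the same length. One hypothetical way this can break under multi-GPU execution is this: the tensor batch and a plain Python list get split into per-device chunks by different rules. This is not a confirmed diagnosis of NS-CL. The sketch contrasts `torch.chunk`-style splitting (chunks of size ceil(n / k), the last one smaller) with an even split like `numpy.array_split`. Both rules are reimplemented in plain Python here.

```python
import math
from typing import List, Sequence


def chunk_sizes_ceil(n: int, k: int) -> List[int]:
    """torch.chunk-style sizes: ceil(n / k) per chunk, last one smaller (may give < k chunks)."""
    if n == 0:
        return []
    step = math.ceil(n / k)
    return [min(step, n - start) for start in range(0, n, step)]


def chunk_sizes_even(n: int, k: int) -> List[int]:
    """numpy.array_split-style sizes: the first n % k chunks get one extra item."""
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]


def split(items: Sequence, sizes: List[int]) -> List[list]:
    """Cut items into consecutive pieces with the given sizes."""
    out, start = [], 0
    for size in sizes:
        out.append(list(items[start:start + size]))
        start += size
    return out


features = list(range(10))                    # stand-in for 10 feature rows
programs = [f"prog{i}" for i in range(10)]    # stand-in for 10 programs
features_per_gpu = split(features, chunk_sizes_ceil(10, 4))
programs_per_gpu = split(programs, chunk_sizes_even(10, 4))
for progs, feats in zip(programs_per_gpu, features_per_gpu):
    print(len(progs), len(feats), len(progs) == len(feats))
```

With a batch of 32 on 4 devices both rules agree, so a mismatch of this kind would only show up for batch sizes that do not divide evenly. Splitting both sequences with the same size list keeps them aligned.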
Hi!
We are currently doing research on your VQA papers and would really like to have the source code for zero-annotation parser training, as it was one of the benefits of the NS-CL paper.
Note: This current release contains only training codes for the visual modules. That is, currently we still assume that a semantic parser is pre-trained using program annotations. In the full NS-CL, this pre-training is not required. We also plan to release the full training code soon.
Is there any possibility that the code will be released soon?
Hi,
I have been trying to run your code to train a model on the CLEVR dataset, but I'm running into an issue. The traceback is below:
...
09 15:59:43 Building the model.
09 16:01:08 Writing meter logs to file: "dumps/clevr/desc_nscl_derender/derender-curriculum_all-qtrans_off/meta/run-2019-07-09-15-58-56.meter.json".
09 16:01:08 Building the data loader.
09 16:05:22 Building the data loader. Curriculum = 3/4, length = 1930.
0%| | 0/60 [00:00<?, ?it/s]Using /tmp/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/_prroi_pooling/build.ninja...
Building extension module _prroi_pooling...
ninja: no work to do.
Loading extension module _prroi_pooling...
cudaCheckError() failed : CUDA driver version is insufficient for CUDA runtime version
This was the command that I ran:
jac-crun 1 scripts/trainval.py --desc experiments/clevr/desc_nscl_derender.py --training-target derender --curriculum all --dataset clevr --data-dir clevr/train --batch-size 32 --epoch 100 --validation-interval 5 --save-interval 5 --data-split 0.95
I checked that my driver version (410.78) is in fact compatible with the CUDA version (10.0). Moreover, I am able to run other code that relies on PyTorch and uses the GPU. Am I missing something here?
Would appreciate any help, thanks!
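The error is raised inside the just-in-time build of the `_prroi_pooling` extension. One thing worth checking, as an assumption rather than a confirmed diagnosis, is whether the extension was compiled by a newer `nvcc` than the driver supports, for example a CUDA 10.1 toolkit found on the system `PATH`. `torch.version.cuda` reports the CUDA version PyTorch was built with, while `nvcc --version` reports the toolkit used for the extension build. The sketch compares a driver version against minimum Linux driver versions for a few CUDA toolkits. The table holds values recalled from NVIDIA's CUDA release notes and should be verified against the official compatibility table.

```python
from typing import Dict, Tuple

# Minimum Linux driver versions per CUDA toolkit (recalled from NVIDIA's release
# notes; verify against the official table before relying on them).
MIN_LINUX_DRIVER: Dict[str, str] = {
    "10.0": "410.48",
    "10.1": "418.39",
    "10.2": "440.33",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '410.78' into (410, 78) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver: str, cuda: str) -> bool:
    """True if the driver meets the minimum listed for this CUDA toolkit version."""
    if cuda not in MIN_LINUX_DRIVER:
        raise KeyError(f"no minimum driver recorded for CUDA {cuda}")
    return parse_version(driver) >= parse_version(MIN_LINUX_DRIVER[cuda])


for cuda in ("10.0", "10.1"):
    print(cuda, driver_supports("410.78", cuda))
```

With the driver from the report (410.78), the CUDA 10.0 check passes and the CUDA 10.1 check fails. That would fit the error if the extension had been built against a 10.1 toolkit.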
This current release contains only training codes for the visual modules. That is, currently we still assume that a semantic parser is pre-trained using program annotations. In the full NS-CL, this pre-training is not required. We also plan to release the full training code soon.
Just a reminder: will there be any updates on the training code for the semantic parser? Thanks!
Hello, is there any chance you could release one or two self-contained files with the trained Mask R-CNN model? I'd like to use it to perform some experiments.
Thanks in advance
For validation, a questions.json file is provided. It references 15,000 unique images with filenames ranging from CLEVR_val_000000.png to CLEVR_val_014999.png. However, the original CLEVR dataset has two validation splits: split A has filenames from CLEVR_ValA_000000.png to CLEVR_ValA_014999.png, whereas split B has filenames from CLEVR_ValB_000000.png to CLEVR_ValB_014999.png. The images in each split are different, so which filenames does questions.json refer to?
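To our understanding, the ValA/ValB splits belong to the CLEVR-CoGenT variant. CLEVR v1.0 has a single validation split of 15,000 images named CLEVR_val_000000.png through CLEVR_val_014999.png. One way to see which split a question file points to is to tally the filename prefixes it references. The sketch assumes each question record carries an `image_filename` field, as the original CLEVR question files do. Whether the NS-CL questions.json uses the same field name is an assumption.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

FILENAME_RE = re.compile(r"^(CLEVR_[A-Za-z]+)_(\d+)\.png$")


def split_prefixes(questions: Iterable[Dict[str, str]]) -> Counter:
    """Count questions per filename prefix, e.g. 'CLEVR_val' or 'CLEVR_valA'."""
    counts: Counter = Counter()
    for q in questions:
        match = FILENAME_RE.match(q["image_filename"])
        counts[match.group(1) if match else "<unrecognized>"] += 1
    return counts


def expected_filenames(prefix: str, count: int) -> List[str]:
    """Filenames with a six-digit, zero-padded index: prefix_000000.png, ..."""
    return [f"{prefix}_{i:06d}.png" for i in range(count)]


names = expected_filenames("CLEVR_val", 15000)
print(names[0], names[-1])
sample = [{"image_filename": "CLEVR_val_000000.png"}, {"image_filename": "CLEVR_val_014999.png"}]
print(split_prefixes(sample))
```

In practice the records would come from `json.load` on the provided file. Its top-level layout is not shown here, so how to reach the list of records is left open.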
Hello,
Can anyone explain to me the difference between the 'scenes-raw.json' and 'scenes.json' files? The scenes.json files from the direct link have exactly the same data format as the scene files in the CLEVR dataset.
Hi,
You mention that the code for training the full semantic parser will be released later. It would be helpful if instructions for training the semantic parser could be provided, or if the full code could be released. I am trying to replicate the experiments on the CLEVR and VQS datasets.
RuntimeError: one of the variables needed for gradient computation has been modified by an in-place operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
I got this error while running the training code with the command provided in the README. Further investigation showed that I could debug it with PyTorch's anomaly detection, which showed that the error was triggered in a forward call.
Full traceback:
/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:57: UserWarning: Traceback of forward call that caused the error:
File "scripts/trainval.py", line 401, in <module>
main()
File "scripts/trainval.py", line 173, in main
main_train(train_dataset, validation_dataset, extra_dataset)
File "scripts/trainval.py", line 291, in main_train
train_epoch(epoch, trainer, train_dataloader, meters)
File "scripts/trainval.py", line 346, in train_epoch
loss, monitors, output_dict, extra_info = trainer.step(feed_dict, cast_tensor=False)
File "/home/user/Jacinle/jactorch/train/env.py", line 135, in step
loss, monitors, output_dict = self._model(feed_dict)
File "/home/user/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "experiments/clevr/desc_nscl_derender.py", line 40, in forward
f_sng = self.scene_graph(f_scene, feed_dict.objects, feed_dict.objects_length)
File "/home/weichen/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data2/PycharmProjects/NSCL-PyTorch-Release/nscl/nn/scene_graph/scene_graph.py", line 125, in forward
this_object_features[sub_id], this_object_features[obj_id],
Traceback (most recent call last):
File "scripts/trainval.py", line 401, in <module>
main()
File "scripts/trainval.py", line 173, in main
main_train(train_dataset, validation_dataset, extra_dataset)
File "scripts/trainval.py", line 291, in main_train
train_epoch(epoch, trainer, train_dataloader, meters)
File "scripts/trainval.py", line 346, in train_epoch
loss, monitors, output_dict, extra_info = trainer.step(feed_dict, cast_tensor=False)
File "/home/user/Jacinle/jactorch/train/env.py", line 155, in step
loss.backward()
File "/home/user/.local/lib/python3.7/site-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/user/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
The error occurred in epoch 6. The loss value seemed to be decreasing normally.
Epoch 6 acc/qa=0.687500 loss=0.708516 loss/qa=0.708516 time/data=0.542045 time/step=1.222588: 0%| | 1/469 [00:01<13:45, 1.76s/i
Epoch 6 acc/qa=0.562500 loss=0.713307 loss/qa=0.713307 time/data=0.009401 time/step=1.012083: 0%| | 1/469 [00:02<13:45, 1.76s/i
Epoch 6 acc/qa=0.562500 loss=0.713307 loss/qa=0.713307 time/data=0.009401 time/step=1.012083: 0%| | 2/469 [00:02<11:59, 1.54s/i
Epoch 6 acc/qa=0.812500 loss=0.566920 loss/qa=0.566920 time/data=0.011552 time/step=1.050238: 0%| | 2/469 [00:03<11:59, 1.54s/i
Epoch 6 acc/qa=0.812500 loss=0.566920 loss/qa=0.566920 time/data=0.011552 time/step=1.050238: 1%| | 3/469 [00:03<10:51, 1.40s/it]/
Any ideas?
Thanks
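The message refers to autograd's version counter. PyTorch records a tensor's version when the tensor is saved for the backward pass and checks it again during `backward()`. Any in-place operation in between bumps that version. The toy model below reproduces that check in pure Python and is not PyTorch's implementation. In real code, the usual fixes are to replace the in-place operation with an out-of-place one (for example `x = x * 2` instead of `x.mul_(2)`) or to operate on a `.clone()`. For this report, one hypothesis is an in-place write to the convolution output near `scene_graph.py` line 125, but that is not confirmed.

```python
from typing import Iterable, List


class ToyTensor:
    """Toy stand-in for a tensor with an autograd-style version counter."""

    def __init__(self, values: Iterable[float]):
        self.values: List[float] = list(values)
        self.version = 0

    def mul_(self, k: float) -> "ToyTensor":
        # In-place update: bump the version, as PyTorch does for tensor._version.
        self.values = [v * k for v in self.values]
        self.version += 1
        return self

    def mul(self, k: float) -> "ToyTensor":
        # Out-of-place update: returns a new tensor and leaves this one untouched.
        return ToyTensor(v * k for v in self.values)


class SavedTensor:
    """Remembers a tensor and its version at the time it was saved for backward."""

    def __init__(self, tensor: ToyTensor):
        self.tensor = tensor
        self.saved_version = tensor.version

    def unpack(self) -> List[float]:
        # Backward refuses to use a tensor that changed after being saved.
        if self.tensor.version != self.saved_version:
            raise RuntimeError(
                "one of the variables needed for gradient computation has been "
                f"modified by an inplace operation: is at version {self.tensor.version}; "
                f"expected version {self.saved_version} instead."
            )
        return self.tensor.values


# Failing pattern: save, then modify in place.
conv_out = ToyTensor([1.0, 2.0, 3.0])
saved = SavedTensor(conv_out)
conv_out.mul_(2.0)
try:
    saved.unpack()
except RuntimeError as err:
    print("backward fails:", err)

# Working pattern: save, then produce a new tensor instead.
conv_out = ToyTensor([1.0, 2.0, 3.0])
saved = SavedTensor(conv_out)
scaled = conv_out.mul(2.0)
print("backward sees:", saved.unpack(), "scaled:", scaled.values)
```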
Hi, in your paper, Pr[object i is Red] is given by a shifted and scaled sigmoid function, but there seems to be no sigmoid function in your code.
If so, the range of the 'logits' variable could be a problem when it is added to the 'belong' vector. Could you explain more about this? Thanks!
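For reference, a shifted and scaled sigmoid has the form σ((s − γ) / τ), where s is a similarity score between an object's embedding and a concept embedding. The sketch assumes cosine similarity and illustrative constants γ = 0.2 and τ = 0.1. It is not taken from the NS-CL code. It also shows a numerically stable log σ(x). If an executor keeps object masks as log-probabilities, intersecting with another independent condition becomes an addition of log-probabilities. That is one hypothetical reading of why logits are added to another vector, not a confirmed account of the code.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def stable_sigmoid(x: float) -> float:
    """sigma(x) without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def stable_log_sigmoid(x: float) -> float:
    """log sigma(x) = -log(1 + exp(-x)), split by sign to avoid overflow."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def concept_logit(obj_emb: Sequence[float], concept_emb: Sequence[float],
                  gamma: float = 0.2, tau: float = 0.1) -> float:
    """Shift the similarity by gamma and scale by 1/tau (illustrative constants)."""
    return (cosine_similarity(obj_emb, concept_emb) - gamma) / tau


x = concept_logit([1.0, 0.0], [0.6, 0.8])  # cosine similarity is 0.6
p_red = stable_sigmoid(x)
# Intersecting with an independent condition of probability 0.5, in log space.
log_p_and_q = stable_log_sigmoid(x) + math.log(0.5)
print(round(p_red, 4), round(math.exp(log_p_and_q), 4))
```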