hyeonwoonoh / dppnet
DPPnet: Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
License: Other
NVCC is working, and I am using the older version of Torch that one of the proposed solutions to this problem refers to.
I still get this exception, and there is no stack trace.
Line 1874 of vqa_loader.lua, "local is_cached = path.isfile(cache_path)", fails with "attempt to index global 'path' (a nil value)" because 'path' is not declared anywhere.
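One possible fix, assuming the loader was meant to use Penlight's `pl.path` module (Penlight is installed with the Torch distro, but the repository does not confirm which module was intended), is to bind `path` as a local before its first use. The plain-Lua fallback and the `cache_path` value below are hypothetical and only keep the sketch self-contained.

```lua
-- Try Penlight's path module. This is an assumption about what the loader intended.
local ok, path = pcall(require, 'pl.path')
if not ok then
  -- Hypothetical fallback: treat anything io.open can open for reading as a file.
  -- Note: on some platforms this also succeeds for directories.
  path = {
    isfile = function(p)
      local f = io.open(p, 'r')
      if f then f:close() return true end
      return false
    end,
  }
end

local cache_path = 'cache/does_not_exist_example' -- hypothetical path
local is_cached = path.isfile(cache_path)
print('cache present: ' .. tostring(is_cached))
```

In vqa_loader.lua itself, only the `require` line would be needed near the top of the file if Penlight is available.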
The README says to run "setup.sh" for setup, but I could not find that file in this repository.
I get a C++ exception when I run vqa_test.lua in 006_test_DPPnet. Could you help me out?
Problem Statement:
When I run the following command:
th vqa_train.lua -gpuid 1
I get the following message:
loading cache..: /home1/badri/badripatro/VQA/workspace_project/image_qa_dpp/DPPnet-master/004_train_DPPnet_fixed_cnn/cache/vqa_data_cache_major_test-dev2015_54
done
creating a neural network with random initialization
/home/cse/torch/install/bin/luajit: C++ exception
badri@cse-desktop:/DPPnet-master/004_train_DPPnet_fixed_cnn$
I have installed all the dependencies for this code:
torch [https://github.com/torch/distro]
loadcaffe [https://github.com/szagoruyko/loadcaffe]
xxhash [install: luarocks install xxhash]
I have also narrowed it down to line 79 of the file "DPPnet-master_1/model/HashedNets/HasherME.lua", where the call to "libhashnn.mysort()" fails:
libhashnn.mysort(self['sort_key_' .. WorB],self['sort_val_'.. WorB])
Does anyone have any advice on how I can try to further determine the problem?
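One way to get more than a bare `C++ exception` line is to wrap the suspect call in `xpcall` with `debug.traceback` as the message handler. This is a minimal sketch, not a confirmed fix: whether an exception raised inside the compiled extension can be caught this way depends on the platform and on how LuaJIT was built, and `suspect_call` is a hypothetical stand-in that only simulates the error.

```lua
-- Hypothetical stand-in for the failing call; in the report above it would be
-- the libhashnn.mysort(...) call inside HasherME.lua.
local function suspect_call()
  error('C++ exception') -- simulates the message LuaJIT reports
end

-- xpcall runs the function and, on error, passes the error message to the
-- handler before the stack unwinds; debug.traceback appends the Lua call stack.
local ok, report = xpcall(suspect_call, debug.traceback)
if not ok then
  io.stderr:write(report, '\n')
end
```

The traceback only covers the Lua side of the call, so it shows which Lua line entered the extension rather than what went wrong inside the CUDA code.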
I have narrowed it down to line 79 of the file "DPPnet-master_1/model/HashedNets/HasherME.lua", where the call to "libhashnn.mysort()" fails:
libhashnn.mysort(self['sort_key_' .. WorB],self['sort_val_'.. WorB])
Then I commented out line 79, compiled again, and ran:
th vqa_train.lua -gpuid 1
I get the following message:
loading cache..: /home1/badri/badripatro/VQA/workspace_project/image_qa_dpp/DPPnet-master/004_train_DPPnet_fixed_cnn/cache/vqa_data_cache_major_test-dev2015_54
done
creating a neural network with random initialization
initialing weights..
[train2014val2014] set batch order option 1 : shuffle __________________________________________________
THCudaCheck FAIL file=/home1/badri/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=147 error=77 : an illegal memory access was encountered
/home1/badri/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /home1/badri/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147
I have narrowed this problem down to line 423 of the file
004_train_DPPnet_fixed_cnn/vqa_train.lua
dlinear_out[i] = HasherME:backward(dhashed_out)
With further debugging, I traced it to line 114 of the file "DPPnet-master_1/model/HashedNets/HasherME.lua", where the following libhashnn call fails:
libhashnn.myreduce(self.sort_key_W,self.gradOBuffer,self.unique_idxW,self.gradInput,self.buffer_W)
The problem is always in "libhashnn".
Does anyone have any advice on how I can try to further determine the problem?
Hi
I tried to run 004_train_DPPnet_fixed_cnn/vqa_train.lua -gpuid -1 but I encountered the following error:
lua: bad argument #1 to '?' (table expected, got string)
stack traceback:
[C]: ?
[C]: in function 'require'
vqa_train.lua:102: in main chunk
[C]: ?
Can you please suggest possible reasons or ways to fix it?
Thanks
Compiling under CUDA 7.5 fails with the following error:
/usr/local/cuda/include/cuda_fp16.h:314:83: error: conflicting declaration of C function ‘__half __ldg(const __half*)’
/usr/local/cuda/include/cuda_fp16.h:313:60: note: previous declaration ‘__half2 __ldg(const __half2*)’
/usr/local/cuda/include/cuda_fp16.h: In function ‘__half2 __ldg(const __half2*)’:
/usr/local/cuda/include/cuda_fp16.h:1180:84: error: conflicting declaration of C function ‘__half2 __ldg(const __half2*)’
/usr/local/cuda/include/cuda_fp16.h:314:59: note: previous declaration ‘__half __ldg(const __half*)’
CMake Error at hashnn_generated_myhashnn.cu.o.cmake:262 (message):
Error generating file
/home/ap/DPPnet/model/HashedNets/libhashnn/_build/CMakeFiles/hashnn.dir//./hashnn_generated_myhashnn.cu.o
I find that it takes about 48 hours to train 005_train_DPPnet_finetune_cnn.
Can we use cudnn to speed up this project? Have you ever tried to do so?
If I want to use cudnn, which part do I need to modify?
hi
Are quantitative results on VQA test-dev stored somewhere? I can just see the json file for it.
When I run vqa_test.lua, the following occurs:
/home/jxas/cwq/torch/install/bin/luajit: C++ exception
I find that the problem may be caused by the call "libhashnn.mysort(self['sort_key_' .. WorB],self['sort_val_'.. WorB])".
I added print statements and found that the program does not reach the function libhashnn.mysort() in the file myhashnn.cu.
I saw the previous issues about this problem and installed the old version of Torch, but it does not work.
Can anyone give some advice on how to solve the problem?
Hello,
When I run "th vqa_test.lua" or "th vqa_train.lua", the result is as follows:
/home/amax/torch/install/bin/luajit: cannot open </home/amax/Desktop/lichunye/DPPnet-master/006_test_DPPnet/data/VQA_torch/Annotations/mscoco_train2014_annotations/annotations.t7> in mode r at /home/amax/torch/pkg/torch/lib/TH/THDiskFile.c:673
stack traceback:
[C]: at 0x7f1b638d9450
[C]: in function 'DiskFile'
/home/amax/torch/install/share/lua/5.1/torch/File.lua:405: in function 'load'
./utils/vqa_loader.lua:37: in function 'load_data'
./utils/vqa_loader.lua:1900: in function 'load_data'
vqa_test.lua:98: in main chunk
[C]: in function 'dofile'
...amax/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
The directories "006_test_DPPnet/save_result_vqa_test/results/" and "004_train_DPPnet_fixed_cnn/save_result_vqa/results/" are also empty,
but I don't know why this happens.
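The message "cannot open ... in mode r" means Torch could not open the listed .t7 file, most likely because the preprocessed annotation file has not been generated or is not in the expected location. A minimal pre-flight check, assuming it is run from the 006_test_DPPnet directory, could look like the sketch below. The helper name `find_missing` is hypothetical, and the single path is copied from the traceback above.

```lua
-- Files the scripts are expected to load. The entry below is taken from the
-- traceback above; any further entries you add are your own assumptions.
local required_files = {
  'data/VQA_torch/Annotations/mscoco_train2014_annotations/annotations.t7',
}

-- Returns the subset of paths that cannot be opened for reading.
local function find_missing(paths)
  local missing = {}
  for _, p in ipairs(paths) do
    local f = io.open(p, 'r')
    if f then f:close() else missing[#missing + 1] = p end
  end
  return missing
end

for _, p in ipairs(find_missing(required_files)) do
  print('missing: ' .. p)
end
```

If the file is reported missing, the results directories would stay empty simply because the script stops before producing any output.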
Running ./compile.sh compiled libhashnn successfully.
But when I call HashLinear, it cannot find libhashnn.
It shows the following error:
./HashLinear.lua:119: attempt to index global 'libhashnn' (a nil value)
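The error shows that HashLinear.lua indexes a global named `libhashnn` that was never set, which usually means the compiled library was not loaded before that line ran. A small diagnostic sketch follows. The build directory is an assumption based on the `_build` path in the CMake error quoted earlier, and whether loading the library defines the global is also an assumption, so adjust both to your checkout.

```lua
-- Assumption: compile.sh leaves libhashnn.so in this directory, relative to
-- the directory the script is started from. Adjust to your build output.
local build_dir = 'model/HashedNets/libhashnn/_build'
package.cpath = build_dir .. '/?.so;' .. package.cpath

-- A failed require raises an error listing every location it searched,
-- which shows whether the compiled library is on the search path at all.
local ok, result = pcall(require, 'libhashnn')
if ok then
  -- Assumption: loading the library is what defines the global libhashnn.
  print('libhashnn loaded; global defined: ' .. tostring(libhashnn ~= nil))
else
  print('could not load libhashnn:\n' .. tostring(result))
end
```

If the load fails, the printed search list can be compared with the actual location of the compiled .so file.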