
cuda-convnet2's People

Contributors: akrizhevsky

cuda-convnet2's Issues

cost.sum2 crash

What steps will reproduce the problem?
1. Simply add a cost.sum2 layer into the layer definition
2. Run it

What is the expected output? What do you see instead?
It crashes with:
python: src/nvmatrix.cu:738: bool NVMatrix::resize(int, int, bool): Assertion 
`_ownsData || (_numElements == numRows * numCols && isContiguous())' failed.
Error signal 6 (SIGABRT).

What version of the product are you using? On what operating system?
Latest cuda-convnet2 + Titan + CUDA 5.5 + Ubuntu 12.04

Please provide any additional information below.

To reproduce the problem, download the attached layer definition (reg.cfg and 
reg-params.cfg) and test it with command:

python convnet.py --data-path=. --save-path=./tmp --test-range=1 
--train-range=1 --layer-def=layers/reg-ori.cfg 
--layer-params=layers/reg-params.cfg --data-provider=dummy-labeled-1 --gpu=0

It seems that getAct() of the sum2 layer produces a 0x128 matrix, which 
triggers the assertion.
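For reference, a minimal layer-definition fragment of the kind that triggers this. The section name and input are illustrative, not the attached reg.cfg; the syntax follows cuda-convnet2's layer files:

```
[sum2]
type=cost.sum2
inputs=fc_out
coeff=1
```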

Original issue reported on code.google.com by [email protected] on 8 Aug 2014 at 12:42


Program crashes because of line 1473 in nvmatrix/src/nvmatrix.cu

What steps will reproduce the problem?
1. It only reproduces intermittently (if I am lucky).

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?
The latest revision.

Please provide any additional information below.

This is because cudaTextureObject_t is an opaque 64-bit handle (a typedef of 
unsigned long long in the CUDA headers), not a pointer, so code that treats it 
as one is invalid.

Original issue reported on code.google.com by [email protected] on 30 Jul 2014 at 2:28

Loading all data in shownet


Please provide any additional information below.

Not exactly a bug, but if I want to see predictions with shownet (python 
shownet.py --show-preds=probs), the script loads all batches before showing me 
predictions from the test batch.
If I have many GBs of training data, the script takes a long time before I can 
see the test-case predictions.
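The behavior being asked for can be sketched as lazy, on-demand batch loading: a generator that reads a batch file only when it is actually requested. This is an illustrative sketch, not shownet's code; load_batch stands in for the data provider's loader.

```python
import os

def iter_batches(data_dir, batch_range, load_batch):
    """Yield (batch_num, data) pairs, loading each batch file only on demand."""
    for b in batch_range:
        path = os.path.join(data_dir, "data_batch_%d" % b)
        # Nothing is read until the caller advances the generator.
        yield b, load_batch(path)

# With this, showing predictions for test batch 7 would touch one file,
# not every training batch.
```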

Original issue reported on code.google.com by [email protected] on 18 Aug 2014 at 3:42

Multiple data layers with BinomialCrossEntropyCostLayer

What steps will reproduce the problem?
1. Define more than two data layers
2. Use a BinomialCrossEntropyCostLayer cost layer

What is the expected output? What do you see instead?

The (output) dimension of the third data layer is 1700.

Without adding "start=0 end=1700" to the layer definition file for the third 
layer, the program crashes with:

src/../../cudaconv3/include/../../nvmatrix/include/nvmatrix.cuh:376: void 
NVMatrix::applyBinary(Op, NVMatrix&, NVMatrix&, CUstream_st*) [with Op = 
BinomialCrossEntOperator]: Assertion `this->isSameDims(b)' failed.

To debug, I added the following lines to layer.cu:

    int numCases = labels.getLeadingDim(); // line 2108 in layer.cu
    printf("%d %d=====\n\n", probs.getNumRows(), probs.getNumCols());
    printf("%d %d=====\n\n", labels.getNumRows(), labels.getNumCols());

They show that the size of labels is (0, 1024) while the size of probs is 
(1700, 1024).

After adding start=0 end=1700, the sizes are correct, but then I get the 
following error:

CUDA error at src/../include/memory.cuh:272 code=2(cudaErrorMemoryAllocation) "cudaMalloc(data, size)"
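For illustration, a data-layer definition carrying the explicit slice described above. The section name and dataIdx are hypothetical; 1700 is the dimension from this report:

```
[labels]
type=data
dataIdx=2
start=0
end=1700
```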



What version of the product are you using? On what operating system?
CUDA 5.5, CentOS 6.5

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 26 Aug 2014 at 3:19

(nvmatrix.cu) Kernel execution failed error with cuda5.5

What steps will reproduce the problem?
1. Compilation succeeds and batch generation succeeds, but convnet.py fails to run.

What is the expected output? What do you see instead?
src/nvmatrix.cu(394): getLastCudaError() CUDA error : kSetupCurand: Kernel 
execution failed : (8) invalid device function.

What version of the product are you using? On what operating system?
"cuda-convnet2-c67ec1220aca" with cuda5.5/python2.7.3

Please provide any additional information below.
There is no problem with the cuda-convnet (convnet1) code, but cuda-convnet2 
produces this error.
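A common cause of "invalid device function" is that the fatbinary does not contain code for the card's compute capability; with nvcc this is controlled by the -gencode flags. A hypothetical Makefile addition — the variable names and the target architecture (sm_35 here) depend on your build and card:

```make
# hypothetical: ensure nvcc emits code for your GPU's compute capability
GENCODE_FLAGS := -gencode arch=compute_35,code=sm_35
NVCCFLAGS += $(GENCODE_FLAGS)
```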


Original issue reported on code.google.com by [email protected] on 1 Aug 2014 at 10:09

Benchmark on CUDA 6.5

It looks like the RC for CUDA 6.5 is out: 
https://developer.nvidia.com/cuda-toolkit

Original issue reported on code.google.com by [email protected] on 31 Jul 2014 at 8:12

conv1 weights and biases become NaN

What steps will reproduce the problem?
1. Follow the steps in Compiling, Data, and TrainingExample for ILSVRC2012
2. I used same parameters for running convnet except --train-freq=10


What is the expected output? What do you see instead?
Training should continue with correct weights (the top-5 error reaches 0.7 
before the 13th epoch). After the 14th epoch, the conv1 weights and biases 
become NaN and the top-5 error becomes 0.99...

What version of the product are you using? On what operating system?
latest version of cuda-convnet2, Ubuntu 14.04

Please provide any additional information below.
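Divergence like this can be caught early by scanning saved checkpoints for non-finite values between epochs. A sketch — the checkpoint layout assumed here (layer name mapped to 'weights'/'biases' numpy arrays) is a simplification, not cuda-convnet2's exact pickle format:

```python
import numpy as np

def find_nan_layers(layers):
    """Return names of layers whose weights or biases contain NaN/Inf.

    `layers` is assumed to map layer name -> dict holding 'weights' and
    'biases' numpy arrays (a simplified stand-in for the real checkpoint).
    """
    bad = []
    for name, params in layers.items():
        for key in ("weights", "biases"):
            arr = params.get(key)
            if arr is not None and not np.isfinite(arr).all():
                bad.append(name)
                break
    return bad
```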


Original issue reported on code.google.com by [email protected] on 1 Dec 2014 at 1:07

Element wise sum not working as expected

What steps will reproduce the problem?
1. Add an element-wise sum layer to your config file.
2. Specify exactly the same input twice in the 'inputs' parameter
3. Specify coeffs=1, -1

What is the expected output? What do you see instead?
With identical inputs and coeffs of 1 and -1, the layer's output should be 0.0. 
Instead, the output is non-zero. Changing the coeffs values also seems to have 
no effect.
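For reference, a definition of the kind described. The names are illustrative, and it assumes the element-wise sum layer is type=eltsum with a coeffs parameter, as in cuda-convnet's layer files:

```
[diff]
type=eltsum
inputs=fc_out,fc_out
coeffs=1,-1
```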

What version of the product are you using? On what operating system?
cuda-convnet2 on ubuntu-linux

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 29 Oct 2014 at 11:17

Does not work on 8 GPUs

I have problems running this code on 8 GPUs. It crashes at the line 
assert(same.size() == 3); in reducepipeline.cu.

What steps will reproduce the problem?
1. Install 8 K40 GPUs on 2 PCIe buses, 4 on each.
2. Train with a mini-batch size of 512 using data parallelism.


Original issue reported on code.google.com by [email protected] on 3 Oct 2014 at 5:04

Error: cannot allocate memory for thread-local data: ABORT


When running convnet.py, I encountered the following error:
"1.1 (0.00%)...cannot allocate memory for thread-local data: ABORT".
I have no idea what causes this.

By the way, I am using cuda-convnet2 and the code is running on Red Hat.
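A speculative workaround, not a confirmed fix: this abort message typically comes from an OpenMP runtime failing to allocate per-thread storage, and capping the thread count before launching the script sometimes sidesteps it.

```shell
# speculative: limit OpenMP threads before running convnet.py
export OMP_NUM_THREADS=1
```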



Original issue reported on code.google.com by [email protected] on 30 Nov 2014 at 4:21

Memory limits due to texture memory

NVIDIA cards don't allow textures bigger than 512MB. Because this code uses 
texture memory, this imposes a limit on the sizes of various buffers. For 
example, if a layer has so many filters that its output size exceeds 512MB, the 
code will crash.

TODO: add non-texture-using routines to bypass this. 
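The limit is easy to check ahead of time. A sketch of the arithmetic, assuming float32 activations and that a layer's entire minibatch output lives in one texture-backed buffer (the scenario described above):

```python
TEXTURE_LIMIT_BYTES = 512 * 1024 * 1024  # the 512MB texture limit

def output_buffer_bytes(num_filters, out_x, out_y, minibatch, bytes_per_elem=4):
    """Size in bytes of a conv layer's output buffer for one minibatch."""
    return num_filters * out_x * out_y * minibatch * bytes_per_elem

def fits_in_texture(num_filters, out_x, out_y, minibatch):
    """True if the output buffer stays under the texture limit."""
    return output_buffer_bytes(num_filters, out_x, out_y, minibatch) <= TEXTURE_LIMIT_BYTES
```

For example, 512 filters on a 55x55 output map with a 128-image minibatch needs 512 * 55 * 55 * 128 * 4 bytes, roughly 793MB, which exceeds the limit.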

Original issue reported on code.google.com by [email protected] on 25 Jul 2014 at 1:28

GTX7XX support

Will the code run well on GTX 770, 780, and 780 Ti GPUs?
Thanks. 

Original issue reported on code.google.com by [email protected] on 13 Oct 2014 at 12:02

saving multiview predictions (--test-out) does not work

What steps will reproduce the problem?
1. train a model
2. multiview test the model and --test-out=1

What is the expected output? What do you see instead?
The probs matrix of the multiview test result is expected; instead, an all-zero 
matrix is produced.

What version of the product are you using? On what operating system?
latest version

Please provide any additional information below.
Is the --test-out feature not yet implemented? I ask because the part that 
writes the probs matrix is commented out. Is there another simple way to save 
the multiview test predictions? Thanks.
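Until --test-out works, the averaging itself is simple to do offline once the per-view probabilities are in hand. A sketch, assuming a (num_views * num_images, num_classes) array with all images for view 0 first, then view 1, and so on; that layout is an assumption, not cuda-convnet2's documented format:

```python
import numpy as np

def average_multiview(probs, num_views):
    """Average per-view softmax outputs into one prediction per image.

    probs: shape (num_views * num_images, num_classes), view-major order
    (an assumed layout, see the note above).
    """
    n_images, rem = divmod(probs.shape[0], num_views)
    assert rem == 0, "row count must be divisible by num_views"
    return probs.reshape(num_views, n_images, -1).mean(axis=0)
```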

Original issue reported on code.google.com by [email protected] on 13 Aug 2014 at 6:45

Remove NPY deprecated warnings

What steps will reproduce the problem?
1. Building the project emits many deprecation warnings due to the NPY API version

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.
I added some workarounds to remove the messages in my cloned version: 
alexpark-numpyclean
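The standard way to silence these warnings is to pin the NumPy C API version at compile time via NPY_NO_DEPRECATED_API. The macro is real NumPy machinery; where the flag belongs in this project's Makefiles is an assumption:

```make
# hypothetical: define the macro before any numpy header is included
CPPFLAGS += -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION
```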


Original issue reported on code.google.com by [email protected] on 26 Nov 2014 at 9:58
