akrizhevsky / cuda-convnet2

Automatically exported from code.google.com/p/cuda-convnet2

License: Apache License 2.0

Shell 0.13% Python 12.05% Makefile 0.93% Cuda 82.99% C++ 3.91%

cuda-convnet2's Introduction

cuda-convnet2

You can read the documentation in two ways:

On this site: go to branches > wiki.
On Google Code (for now?): https://code.google.com/p/cuda-convnet2/

cuda-convnet2's Issues

conv1 weights and biases become nan

What steps will reproduce the problem?
1. Follow the steps in Compiling, Data, and TrainingExample for ILSVRC2012
2. I used same parameters for running convnet except --train-freq=10


What is the expected output? What do you see instead?
Training should continue with valid weights. (Top-5 error reaches 0.7 before the 13th epoch.)
After the 14th epoch, the conv1 weights and biases become nan and the top-5 error becomes 
0.99...

What version of the product are you using? On what operating system?
latest version of cuda-convnet2, Ubuntu 14.04

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 1 Dec 2014 at 1:07
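
One way to narrow down a report like this is to scan the saved weights for non-finite values after each epoch and note which layer goes bad first. The sketch below is only an illustration: the dictionary of layer names to flat weight lists is a hypothetical structure, not the checkpoint format cuda-convnet2 actually writes.

```python
import math

def layers_with_nonfinite(weights):
    """Return sorted names of layers containing any NaN or Inf value."""
    return sorted(
        name for name, values in weights.items()
        if any(not math.isfinite(v) for v in values)
    )

# Hypothetical weights keyed by layer name, for illustration only.
example = {"conv1": [0.1, float("nan"), 0.3], "conv2": [1.0, 1.0]}
print(layers_with_nonfinite(example))  # ['conv1']
```

Running such a check at every checkpoint would show whether conv1 is the first layer to diverge or only the first one noticed.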

Compatible with CUDA 7.0?

Hi,
I am using CUDA 7.0 and two NVIDIA TITAN X GPUs to run the examples. (I have changed the paths to match my environment.)

python convnet.py --data-path /usr/local/storage/akrizhevsky/ilsvrc-2012-batches --train-range 0-417 --test-range 1000-1016 --save-path /usr/local/storage/akrizhevsky/tmp --epochs 90 --layer-def layers/layers-imagenet-1gpu.cfg --layer-params layers/layer-params-imagenet-1gpu.cfg --data-provider image --inner-size 224 --gpu 0 --mini 128 --test-freq 201 --color-noise 0.1

However, I get this error message:

src/nvmatrix.cu(394) : getLastCudaError() CUDA error : kSetupCurand: Kernel execution failed : (8) invalid device function.

Does the current cuda-convnet2 support CUDA 7.0, or do I need to downgrade to CUDA 6.0? (I have tried, but there seem to be some issues installing CUDA 6.0 on my server.)

Thanks.

saving multiview predictions (--test-out) does not work

What steps will reproduce the problem?
1. train a model
2. multiview test the model and --test-out=1
3.

What is the expected output? What do you see instead?
probs matrix of multiview tested result.
All zero matrix

What version of the product are you using? On what operating system?
latest version

Please provide any additional information below.
Is the --test-out function not implemented yet? I noticed that the code that writes 
the probs matrix is commented out. Or is there another simple way to save the 
multiview test predictions? Thanks.

Original issue reported on code.google.com by [email protected] on 13 Aug 2014 at 6:45

IndexErrors get thrown when running examples

After running the following command

python convnet.py --data-path /media/ae/ImageStuff/FotoDataSets/batches --train-range 0-417 --test-range 1000-1016 --save-path /media/ae/ImageStuff/FotoDataSets/savePoints --epochs 90 --layer-def layers/layers-imagenet-1gpu.cfg --layer-params layers/layer-params-imagenet-1gpu.cfg --data-provider image --inner-size 224 --gpu 0 --mini 128 --test-freq 201 --color-noise 0.1

The following exception is thrown at random times during execution:

  File "convnet.py", line 289, in <module>
    model.start()
  File "/home/ae/DeepLearning/cuda-convnet2/python_util/gpumodel.py", line 134, in start
    self.train()
  File "/home/ae/DeepLearning/cuda-convnet2/python_util/gpumodel.py", line 169, in train
    self.test_outputs += [self.get_test_error()]
  File "/home/ae/DeepLearning/cuda-convnet2/python_util/gpumodel.py", line 242, in get_test_error
    next_data = self.get_next_batch(train=False)
  File "/home/ae/DeepLearning/cuda-convnet2/python_util/gpumodel.py", line 191, in get_next_batch
    return self.parse_batch_data(dp.get_next_batch(), train=train)
  File "/home/ae/DeepLearning/cuda-convnet2/convdata.py", line 189, in get_next_batch
    self.get_data_from_loader()
  File "/home/ae/DeepLearning/cuda-convnet2/convdata.py", line 152, in get_data_from_loader
    self.data[self.d_idx] = self.load_data[0]
IndexError: list index out of range

I attached a debugger and confirmed that self.load_data is empty, so self.load_data[0] raises the exception.

Is cuda-convnet2 a "back-end" or a self-contained deep learning library?

I've done some reading and it appears that some of the CUDA libraries available for deep learning can be integrated with (used by) DNN frameworks like Theano, etc. My main goal is to implement the image classification network described in Matthew Zeiler's paper "Visualizing and Understanding Convolutional Networks":

http://www.matthewzeiler.com/pubs/arxive2013/arxive2013.pdf

I'm really interested in replicating the visualization graphics described in that paper. I saw this page on Google Code in conjunction with the original cuda-convnet project:

https://code.google.com/p/cuda-convnet/

That appears to show the same graphics I saw in Matthew's paper, depicting the features the network is learning. So my questions are:

Is cuda-convnet2 a back-end library for other DNN frameworks, or is it a self-contained package that can be used by itself for building, training, and deploying a DNN?
If it is a back-end, what is the best DNN framework to use with it for someone with deep programming skills but not heavy math skills? (Theano, Caffe, Torch7, etc.)
Can I use cuda-convnet2 to implement the DNN in Matthew's paper and replicate the visualization graphics shown there? If so, what other code or libraries will I need for the visualization part? Are there any cuda-convnet2 samples that show how to do that?

Also, if I haven't already used up my question quota: is an Nvidia GTX 680 with 4GB RAM good enough to do useful work, or is it better to bite the bullet and start with a GTX Titan X (which is about 3 times the price of a used GTX 680)?

invalid device function with GTX 980

When I try to run the cuda-convnet2 training example, I get this error:

src/nvmatrix.cu(394) : getLastCudaError() CUDA error : kSetupCurand: Kernel 
execution failed : (8) invalid device function .

I have GTX 980 on my machine and it has compute capability 5.2

I tried modifying the makefiles in cudaconv3, cudaconvnet, and nvmatrix like this, 
and also tried 52 instead of 50, but I still get the same error.

GENCODE_SM35    := -gencode arch=compute_35,code=sm_35
GENCODE_FLAGS   := $(GENCODE_SM35)

to

GENCODE_SM35    := -gencode arch=compute_35,code=sm_35
GENCODE_SM50    := -gencode arch=compute_50,code=sm_50
GENCODE_FLAGS   := $(GENCODE_SM50)

Original issue reported on code.google.com by [email protected] on 11 Apr 2015 at 9:21

Attachments:

log.txt.txt
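
An "invalid device function" error typically means the compiled binary contains no code for the architecture of the GPU it is running on. As a rough sketch, the helper below builds -gencode flags of the same form as the Makefile lines quoted above, one per compute capability, so that several targets can be listed together instead of replacing one with another. The function name is hypothetical, and whether a given nvcc release accepts a particular architecture (such as compute_52) depends on the installed toolkit version.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode flags for compute capabilities such as '3.5'."""
    flags = []
    for cap in capabilities:
        major, minor = cap.split(".")
        sm = major + minor  # '5.2' -> '52'
        flags.append("-gencode arch=compute_%s,code=sm_%s" % (sm, sm))
    return " ".join(flags)

# Compute capability 5.2 is the value reported for the GTX 980 above.
print(gencode_flags(["3.5", "5.2"]))
```

The resulting string could be assigned to GENCODE_FLAGS in each of the three makefiles, followed by a clean rebuild so that no stale object files remain.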

(nvmatrix.cu) Kernel execution failed error with cuda5.5

What steps will reproduce the problem?
1. Compile Success -> Batch Generation Success -> Fail to run convnet.py
2.
3.

What is the expected output? What do you see instead?
src/nvmatrix.cu(394): getLastCudaError() CUDA error : kSetupCurand: Kernel 
execution failed : (8) invalid device function.

What version of the product are you using? On what operating system?
"cuda-convnet2-c67ec1220aca" with cuda5.5/python2.7.3

Please provide any additional information below.
There's no problem with cuda-convnet(convnet1) code, but with cuda-convnet2, 
this error occurred.

Original issue reported on code.google.com by [email protected] on 1 Aug 2014 at 10:09

Images missing in Wiki

In the Wiki branch, the Layers Parameters page is missing some images.
https://github.com/akrizhevsky/cuda-convnet2/blob/wiki/LayerParams.md

Memory limits due to texture memory

Nvidia cards don't allow textures bigger than 512MB. Because this code uses 
texture memory, this imposes a limit on the sizes of various buffers. For 
example if your layer has too many filters (such that its output size exceeds 
512MB), the code will crash.

TODO: add non-texture-using routines to bypass this.

Original issue reported on code.google.com by [email protected] on 25 Jul 2014 at 1:28
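
To get a feel for when the 512MB limit is reached, the following sketch estimates the size of a layer's output buffer. It assumes 4-byte floats and one buffer holding the whole minibatch; both are assumptions made for illustration, not statements about the library's actual memory layout. The example numbers (64 filters, 55x55 output, minibatch 128) match the conv1 layer in the ImageNet logs elsewhere on this page.

```python
TEXTURE_LIMIT_BYTES = 512 * 1024 * 1024  # 512MB limit cited in the report

def output_bytes(filters, out_x, out_y, minibatch, bytes_per_value=4):
    """Estimated size in bytes of one layer's output for a full minibatch."""
    return filters * out_x * out_y * minibatch * bytes_per_value

size = output_bytes(filters=64, out_x=55, out_y=55, minibatch=128)
print(size, size <= TEXTURE_LIMIT_BYTES)  # 99123200 True
```

Under these assumptions, raising the filter count to 1024 at the same resolution and minibatch size would need about 1.5GB and exceed the limit.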

GTX7XX support

Will the code run well on the GTX 770, 780, and 780 Ti GPUs?
Thanks.

Original issue reported on code.google.com by [email protected] on 13 Oct 2014 at 12:02

Cannot access links from cuda-convnet2/LayerParams.md

Hello,

Why can't I access any links from cuda-convnet2/LayerParams.md, such as, https://camo.githubusercontent.com/811e79e3fd150648d6ff18a0786c9c1fa106ef49/687474703a2f2f637564612d636f6e766e65742e676f6f676c65636f64652e636f6d2f73766e2f77696b692f696d616765732f726e6f726d2e676966?

Can anyone help me?

Thanking you.

Kind regards.

Does not work on 8 GPUs

I have some problems running this code on 8 gpus. It crashed at the line: 
assert(same.size() == 3); in reducepipeline.cu

What steps will reproduce the problem?
1. Get 8 K40 GPUs and install them on 2 PCI buses, 4 on each.
2. Train with a mini-batch size of 512, using data parallelism.

Original issue reported on code.google.com by [email protected] on 3 Oct 2014 at 5:04

Loading all data in shownet

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.

Not exactly a bug, but if I want to see predictions with shownet (python 
shownet --show-preds=probs), the script loads all batches before showing me 
predictions from the test batch. 
If I have many GBs of training data, the script takes a long time before I 
can see the test case predictions.

Original issue reported on code.google.com by [email protected] on 18 Aug 2014 at 3:42

Cuda-convnet--ImportError: /usr/local/cuda/lib64/libcublas.so.4: undefined symbol: cudaMemsetAsync

Hi, all,
I am new to CNNs and am trying to run cuda-convnet on different GPU platforms.
Cuda-convnet runs well on my server, but when I run it with the gpgpusim simulator, an error occurs: "ImportError: /usr/local/cuda/lib64/libcublas.so.4: undefined symbol: cudaMemsetAsync"
The details are like this:

Importing _ConvNet C++ module
Traceback (most recent call last):
  File "convnet.py", line 203, in <module>
    model = ConvNet(op, load_dic)
  File "convnet.py", line 43, in __init__
    IGPUModel.__init__(self, "ConvNet", op, load_dic, filename_options, dp_params=dp_params)
  File "/root/convnet/trunk/gpumodel.py", line 88, in __init__
    self.import_model()
  File "convnet.py", line 49, in import_model
    self.libmodel = __import__(lib_name)
ImportError: /usr/local/cuda/lib64/libcublas.so.4: undefined symbol: cudaMemsetAsync

The weird thing is that gpgpusim can run other CUDA code, and cuda-convnet runs well on my server.
I set up the gpgpusim simulator following its tutorial, copied the config files into the convnet working directory, and then typed ""

My gcc version is gcc version 4.4.7 (Ubuntu/Linaro 4.4.7-8ubuntu1) . The GPU model is GTX480.
My cuda is version 4.0.
Here is the output of my libcublas.so
nm -D /usr/local/cuda/lib64/libcublas.so | grep 'cuda'

             U __cudaRegisterFatBinary
             U __cudaRegisterFunction
             U __cudaRegisterTexture
             U __cudaUnregisterFatBinary
             U cudaBindTexture
             U cudaConfigureCall
             U cudaCreateChannelDesc
             U cudaEventCreateWithFlags
             U cudaEventDestroy
             U cudaEventQuery
             U cudaEventRecord
             U cudaEventSynchronize
             U cudaFree
             U cudaFuncGetAttributes
             U cudaGetDevice
             U cudaGetDeviceProperties
             U cudaGetExportTable
             U cudaGetLastError
             U cudaLaunch
             U cudaMalloc
             U cudaMemcpy
             U cudaMemcpy2D
             U cudaMemcpy2DAsync
             U cudaMemcpyAsync
             U cudaMemsetAsync
             U cudaSetupArgument
             U cudaThreadSynchronize
             U cudaUnbindTexture

ll /usr/local/cuda/lib64/libcublas.so

lrwxrwxrwx 1 root root 14 Mar 17 16:10 /usr/local/cuda/lib64/libcublas.so -> libcublas.so.4*

Should I change something in the makefile? Because of the requirements of the gpgpusim simulator, I cannot update the CUDA toolkit version.

Could anyone help me out with this issue? Thanks a lot in advance!
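
The nm listing above shows which symbols libcublas expects some other library to supply. A small sketch like the following can compare that list against the symbols a runtime library actually defines. The `provided` set here is made up for the example; in practice it would come from running nm on whichever libcudart the simulator substitutes.

```python
def undefined_symbols(nm_output):
    """Collect names marked 'U' (undefined) in `nm -D` output."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "U":
            names.add(parts[1])
    return names

# A few lines copied from the listing in the report.
nm_text = """
             U cudaMemcpyAsync
             U cudaMemsetAsync
             U cudaSetupArgument
"""
# Hypothetical set of symbols exported by the runtime in use.
provided = {"cudaMemcpyAsync", "cudaSetupArgument"}
print(sorted(undefined_symbols(nm_text) - provided))  # ['cudaMemsetAsync']
```

Any name left over after the subtraction is one the dynamic loader will be unable to resolve at import time.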

Multiple data layer with binomialcrossEntropyCostLayer

What steps will reproduce the problem?
1. There are more than 2 data layers
2. Use BinomialCrossEntropyCostLayer cost layer
3.

What is the expected output? What do you see instead?

The output dimension of the third data layer is 1700.

Without "start=0 end=1700" in the layer definition file for the third 
layer, the program crashes. 
The error message is:

src/../../cudaconv3/include/../../nvmatrix/include/nvmatrix.cuh:376: void 
NVMatrix::applyBinary(Op, NVMatrix&, NVMatrix&, CUstream_st*) [with Op = 
BinomialCrossEntOperator]: Assertion `this->isSameDims(b)' failed.

I then added the following lines to layer.cu:
    int numCases = labels.getLeadingDim(); // line 2108 in layer.cu
    printf("%d %d=====\n\n", probs.getNumRows(), probs.getNumCols());
    printf("%d %d=====\n\n", labels.getNumRows(), labels.getNumCols());

The size of labels is (0, 1024), and the size of probs is (1700, 1024).

After adding start=0 end=1700, the sizes are correct, but I get the 
following error:
    CUDA error at src/../include/memory.cuh:272 code=2(cudaErrorMemoryAllocation) "cudaMalloc(data, size)" 



What version of the product are you using? On what operating system?
Cuda5.5, CentOS6.5

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 26 Aug 2014 at 3:19
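
The assertion in this report enforces that probs and labels have identical dimensions before the element-wise cost is applied. The sketch below shows the same precondition using the textbook binomial cross-entropy on nested Python lists. It is not the library's kernel, and the function name is hypothetical.

```python
import math

def binomial_cross_entropy(probs, labels):
    """Sum of -[y*log(p) + (1-y)*log(1-p)] over equally shaped 2-D lists."""
    shape_p = (len(probs), len(probs[0]) if probs else 0)
    shape_l = (len(labels), len(labels[0]) if labels else 0)
    if shape_p != shape_l:
        raise ValueError(
            "shape mismatch: probs %s vs labels %s" % (shape_p, shape_l))
    total = 0.0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

print(binomial_cross_entropy([[0.5, 0.5]], [[1.0, 0.0]]))
```

Passing an empty labels list, analogous to the (0, 1024) labels matrix reported above, raises the ValueError instead of silently computing a cost.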

Error: cannot allocate memory for thread-local data: ABORT


When running the convnet.py, I encountered the following error:
“1.1 (0.00%)...cannot allocate memory for thread-local data: ABORT”.
I have no idea about this.

By the way, I used cuda-convnet2 and the codes are running on Red Hat.

Original issue reported on code.google.com by [email protected] on 30 Nov 2014 at 4:21

Benchmark on CUDA 6.5

It looks like the RC for CUDA 6.5 is out: 
https://developer.nvidia.com/cuda-toolkit

Original issue reported on code.google.com by [email protected] on 31 Jul 2014 at 8:12

program will crash because of line 1473 in nvmatrix/src/nvmatrix.cu

What steps will reproduce the problem?
1. I can reproduce it if I am lucky 
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?
state of the art 

Please provide any additional information below.

This is because the type of cudaTextureObject_t is not a pointer.

Original issue reported on code.google.com by [email protected] on 30 Jul 2014 at 2:28

Remove NPY deprecated warnings

What steps will reproduce the problem?
1. building project dumps a lot of deprecated warnings due to NPY api version
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.
added some workarounds to remove the messages in my cloned version: 
alexpark-numpyclean

Original issue reported on code.google.com by [email protected] on 26 Nov 2014 at 9:58

The imagenet example is getting an error of Empty input file even if the all batch files were previously generated

I am trying to run the imagenet example:
python convnet.py --data-path /nvme/my/ilsvrc-2012/batches/ --train-range 0-417 --test-range 1000-1016 --save-path /nvme/my/ilsvrc-2012/storage/tmp/ --epochs 90 --layer-def layers/layers-imagenet-1gpu.cfg --layer-params layers/layer-params-imagenet-1gpu.cfg --data-provider image --inner-size 224 --gpu 0 --mini 128 --test-freq 201 --color-noise 0.1

However, the run fails with the error "Empty input file". The input files exist; I ran the make-data.py script successfully and all batch files were created. However, the folder that contains the batch files is only 23MB. Is that right? Aren't the batch files supposed to contain the images?

Note that the cifar example runs (with the given batch files).

The full log of the run follows.

Initialized data layer 'data', producing 150528 outputs

Initialized data layer 'labvec', producing 1 outputs

Initialized convolutional layer 'conv1' on GPUs 0, producing 55x55 64-channel output

Initialized cross-map response-normalization layer 'rnorm1' on GPUs 0, producing 55x55 64-channel output

Initialized max-pooling layer 'pool1' on GPUs 0, producing 27x27 64-channel output

Initialized convolutional layer 'conv2' on GPUs 0, producing 27x27 192-channel output

Initialized cross-map response-normalization layer 'rnorm2' on GPUs 0, producing 27x27 192-channel output

Initialized max-pooling layer 'pool2' on GPUs 0, producing 13x13 192-channel output

Initialized convolutional layer 'conv3' on GPUs 0, producing 13x13 384-channel output

Initialized convolutional layer 'conv4' on GPUs 0, producing 13x13 256-channel output

Initialized convolutional layer 'conv5' on GPUs 0, producing 13x13 256-channel output

Initialized max-pooling layer 'pool3' on GPUs 0, producing 6x6 256-channel output

Initialized fully-connected layer 'fc4096a' on GPUs 0, producing 4096 outputs

Initialized dropout2 layer 'dropout1' on GPUs 0, producing 4096 outputs

Initialized fully-connected layer 'fc4096b' on GPUs 0, producing 4096 outputs

Initialized dropout2 layer 'dropout2' on GPUs 0, producing 4096 outputs

Initialized fully-connected layer 'fc1000' on GPUs 0, producing 1000 outputs

Initialized softmax layer 'probs' on GPUs 0, producing 1000 outputs

Initialized logistic regression cost 'logprob' on GPUs 0

Initialized neuron layer 'fc4096b_neuron' on GPUs 0, producing 4096 outputs

Initialized neuron layer 'conv3_neuron' on GPUs 0, producing 64896 outputs

Initialized neuron layer 'conv2_neuron' on GPUs 0, producing 139968 outputs

Initialized neuron layer 'conv4_neuron' on GPUs 0, producing 43264 outputs

Initialized neuron layer 'pool3_neuron' on GPUs 0, producing 9216 outputs

Initialized neuron layer 'pool1_neuron' on GPUs 0, producing 46656 outputs

Initialized neuron layer 'fc4096a_neuron' on GPUs 0, producing 4096 outputs

Layer conv3_neuron using acts from layer conv3

Layer fc4096a_neuron using acts from layer fc4096a

Layer fc4096b_neuron using acts from layer fc4096b

Layer conv2_neuron using acts from layer conv2

Layer conv4_neuron using acts from layer conv4

=========================

Importing cudaconvnet._ConvNet C++ module

Device id=0

Fwd terminal: logprob

found bwd terminal conv1[0] in passIdx=0

=========================

Training ConvNet

Add PCA noise to color channels with given scale : 0.1

Check gradients and quit? : 0 [DEFAULT]

Conserve GPU memory (slower)? : 0 [DEFAULT]

Convert given conv layers to unshared local :

Cropped DP: crop size (0 = don't crop) : 224

Cropped DP: test on multiple patches? : 0 [DEFAULT]

Data batch range: testing : 1000-1016

Data batch range: training : 0-417

Data path : /nvme/my/ilsvrc-2012/batches/

Data provider : image

Force save before quitting : 0 [DEFAULT]

GPU override : 0

Layer definition file : layers/layers-imagenet-1gpu.cfg

Layer file path prefix : [DEFAULT]

Layer parameter file : layers/layer-params-imagenet-1gpu.cfg

Load file : [DEFAULT]

Logreg cost layer name (for --test-out) : [DEFAULT]

Minibatch size : 128

Number of epochs : 90

Output test case predictions to given path : [DEFAULT]

Save file override :

Save path : /nvme/my/ilsvrc-2012/storage/tmp/

Subtract this scalar from image (-1 = don't) : -1 [DEFAULT]

Test and quit? : 0 [DEFAULT]

Test on one batch at a time? : 1 [DEFAULT]

Testing frequency : 201

Unshare weight matrices in given layers :

Write test data features from given layer : [DEFAULT]

Write test data features to this path (to be used with --write-features): [DEFAULT]

=========================

Running on CUDA device(s) 0

Current time: Tue Mar 13 09:59:17 2018

Saving checkpoints to /nvme/my/ilsvrc-2012/storage/tmp/ConvNet__2018-03-13_09.59.10

=========================

Empty input file
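
Since the batch folder is reported to be only 23MB, one quick diagnostic is to look for zero-byte files under the data path. The sketch below does only that. Whether empty files are what actually triggers the "Empty input file" message is an assumption, not something the log confirms.

```python
import os

def empty_files(directory):
    """Return sorted relative paths of zero-byte files under directory."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                found.append(os.path.relpath(path, directory))
    return sorted(found)
```

For the run above, this would be called as empty_files("/nvme/my/ilsvrc-2012/batches/"); a non-empty result would point at batch files that were created but never filled.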

cost.sum2 crash

What steps will reproduce the problem?
1. Simply add a cost.sum2 layer into the layer definition
2. Run it
3.

What is the expected output? What do you see instead?
It crashed and said:
python: src/nvmatrix.cu:738: bool NVMatrix::resize(int, int, bool): Assertion 
`_ownsData || (_numElements == numRows * numCols && isContiguous())' failed.
Error signal 6:

What version of the product are you using? On what operating system?
Latest cuda-convnet2 + Titan + CUDA 5.5 + Ubuntu 12.04

Please provide any additional information below.

To reproduce the problem, download the attached layer definition (reg.cfg and 
reg-params.cfg) and test it with command:

python convnet.py --data-path=. --save-path=./tmp --test-range=1 
--train-range=1 --layer-def=layers/reg-ori.cfg 
--layer-params=layers/reg-params.cfg --data-provider=dummy-labeled-1 --gpu=0

It seems that getAct() of the sum2 layer will produce a 0*128 matrix and thus 
cause the error.

Original issue reported on code.google.com by [email protected] on 8 Aug 2014 at 12:42

python: src/../include/data.cuh:34: void CPUData::assertDimensions(): Assertion `_data->at(i-1)->isTrans() == _data->at(i)->isTrans()' failed.

python convnet.py --data-path ~/data/8_imagenet/ilsvrc2012/batchesx --train-range 0-417 --test-range 1000-1016 --save-path ~/data/8_imagenet/ilsvrc2012/models/ --epochs 90 --layer-def layers/layers-imagenet-1gpu.cfg --layer-params layers/layer-params-imagenet-1gpu.cfg --data-provider image --inner-size 227 --gpu 0 --mini 128 --test-freq 201 --color-noise 0.1
Initialized data layer 'data', producing 154587 outputs
Initialized data layer 'labvec', producing 1 outputs
Initialized convolutional layer 'conv1' on GPUs 0, producing 55x55 64-channel output
Initialized cross-map response-normalization layer 'rnorm1' on GPUs 0, producing 55x55 64-channel output
Initialized max-pooling layer 'pool1' on GPUs 0, producing 27x27 64-channel output
Initialized convolutional layer 'conv2' on GPUs 0, producing 27x27 192-channel output
Initialized cross-map response-normalization layer 'rnorm2' on GPUs 0, producing 27x27 192-channel output
Initialized max-pooling layer 'pool2' on GPUs 0, producing 13x13 192-channel output
Initialized convolutional layer 'conv3' on GPUs 0, producing 13x13 384-channel output
Initialized convolutional layer 'conv4' on GPUs 0, producing 13x13 256-channel output
Initialized convolutional layer 'conv5' on GPUs 0, producing 13x13 256-channel output
Initialized max-pooling layer 'pool3' on GPUs 0, producing 6x6 256-channel output
Initialized fully-connected layer 'fc4096a' on GPUs 0, producing 4096 outputs
Initialized dropout2 layer 'dropout1' on GPUs 0, producing 4096 outputs
Initialized fully-connected layer 'fc4096b' on GPUs 0, producing 4096 outputs
Initialized dropout2 layer 'dropout2' on GPUs 0, producing 4096 outputs
Initialized fully-connected layer 'fc1000' on GPUs 0, producing 1000 outputs
Initialized softmax layer 'probs' on GPUs 0, producing 1000 outputs
Initialized logistic regression cost 'logprob' on GPUs 0
Initialized neuron layer 'fc4096b_neuron' on GPUs 0, producing 4096 outputs
Initialized neuron layer 'conv3_neuron' on GPUs 0, producing 64896 outputs
Initialized neuron layer 'conv2_neuron' on GPUs 0, producing 139968 outputs
Initialized neuron layer 'conv4_neuron' on GPUs 0, producing 43264 outputs
Initialized neuron layer 'pool3_neuron' on GPUs 0, producing 9216 outputs
Initialized neuron layer 'pool1_neuron' on GPUs 0, producing 46656 outputs
Initialized neuron layer 'fc4096a_neuron' on GPUs 0, producing 4096 outputs
Layer conv3_neuron using acts from layer conv3
Layer fc4096a_neuron using acts from layer fc4096a
Layer fc4096b_neuron using acts from layer fc4096b
Layer conv2_neuron using acts from layer conv2

Layer conv4_neuron using acts from layer conv4

Importing cudaconvnet._ConvNet C++ module
Fwd terminal: logprob

found bwd terminal conv1[0] in passIdx=0

Training ConvNet
Add PCA noise to color channels with given scale : 0.1
Check gradients and quit? : 0 [DEFAULT]
Conserve GPU memory (slower)? : 0 [DEFAULT]
Convert given conv layers to unshared local :
Cropped DP: crop size (0 = don't crop) : 227
Cropped DP: test on multiple patches? : 0 [DEFAULT]
Data batch range: testing : 1000-1016
Data batch range: training : 0-417
Data path : ./data/8_imagenet/ilsvrc2012/batchesx
Data provider : image
Force save before quitting : 0 [DEFAULT]
GPU override : 0
Layer definition file : layers/layers-imagenet-1gpu.cfg
Layer file path prefix : [DEFAULT]
Layer parameter file : layers/layer-params-imagenet-1gpu.cfg
Load file : [DEFAULT]
Logreg cost layer name (for --test-out) : [DEFAULT]
Minibatch size : 128
Number of epochs : 90
Output test case predictions to given path : [DEFAULT]
Save file override :
Save path : ./data/8_imagenet/ilsvrc2012/models/
Subtract this scalar from image (-1 = don't) : -1 [DEFAULT]
Test and quit? : 0 [DEFAULT]
Test on one batch at a time? : 1 [DEFAULT]
Testing frequency : 201
Unshare weight matrices in given layers :
Write test data features from given layer : [DEFAULT]

Write test data features to this path (to be used with --write-features): [DEFAULT]

Running on CUDA device(s) 0
Current time: Fri Jun 12 00:03:57 2015

Saving checkpoints to ./data/8_imagenet/ilsvrc2012/models/ConvNet__2015-06-12_00.03.53

1.0 (0.00%)... FUCK1
FUCK11
python: src/../include/data.cuh:34: void CPUData::assertDimensions(): Assertion `_data->at(i-1)->isTrans() == _data->at(i)->isTrans()' failed.
Error signal 6:
/home/chenxiu/src/36_convnet/cudaconvnet/_ConvNet.so(_Z13signalHandleri+0x1b)[0x7fc1c442d5fb]
/lib64/libc.so.6[0x37c6a326a0]
/lib64/libc.so.6(gsignal+0x35)[0x37c6a32625]
/lib64/libc.so.6(abort+0x175)[0x37c6a33e05]
/lib64/libc.so.6[0x37c6a2b74e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x37c6a2b810]
./src/36_convnet/cudaconvnet/_ConvNet.so(_ZN7CPUData16assertDimensionsEv+0x144)[0x7fc1c442eb34]
./src/36_convnet/cudaconvnet/_ConvNet.so(_Z10startBatchP7_objectS0_+0x78)[0x7fc1c442dee8]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x5244)[0x37d3ad59e4]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x63ef)[0x37d3ad6b8f]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalCodeEx+0x927)[0x37d3ad7657]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x5304)[0x37d3ad5aa4]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x63ef)[0x37d3ad6b8f]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x63ef)[0x37d3ad6b8f]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalCodeEx+0x927)[0x37d3ad7657]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalCode+0x32)[0x37d3ad7732]
/usr/lib64/libpython2.6.so.1.0[0x37d3af1bac]
/usr/lib64/libpython2.6.so.1.0(PyRun_FileExFlags+0x90)[0x37d3af1c80]
/usr/lib64/libpython2.6.so.1.0(PyRun_SimpleFileExFlags+0xdc)[0x37d3af316c]
/usr/lib64/libpython2.6.so.1.0(Py_Main+0xb62)[0x37d3aff8a2]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x37c6a1ed5d]
python[0x400649]
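
The two ImageNet logs on this page report different sizes for the 'data' layer: 150528 outputs with --inner-size 224 and 154587 with --inner-size 227. Both figures are consistent with inner_size * inner_size * 3 color channels, as this small check shows. The function name is hypothetical, and the check does not establish whether the changed inner size is related to the assertion failure.

```python
def data_layer_outputs(inner_size, channels=3):
    """Number of values in one square image crop with the given channels."""
    return inner_size * inner_size * channels

print(data_layer_outputs(224))  # 150528
print(data_layer_outputs(227))  # 154587
```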

Element wise sum not working as expected

What steps will reproduce the problem?
1. Add an element-wise sum layer to your config file.
2. Specify exactly the same input twice in the 'inputs' parameter
3. Specify coeffs=1, -1

What is the expected output? What do you see instead?
With exactly the same inputs and the coeffs specified as 1 and -1, the layer is 
expected to output 0.0.
Instead we see a non-zero output. Also, changing the coeffs values does not seem 
to have any effect.

What version of the product are you using? On what operating system?
cuda-convnet2 on ubuntu-linux

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 29 Oct 2014 at 11:17
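
For reference, the behavior the reporter expects can be written in a few lines of plain Python. This is a sketch of a weighted element-wise sum in general, with a hypothetical function name, not the layer's actual implementation.

```python
def eltwise_sum(inputs, coeffs):
    """Weighted element-wise sum of equally long vectors."""
    if len(inputs) != len(coeffs):
        raise ValueError("need one coefficient per input")
    return [sum(c * v[i] for c, v in zip(coeffs, inputs))
            for i in range(len(inputs[0]))]

x = [0.5, -2.0, 3.0]
print(eltwise_sum([x, x], [1, -1]))  # [0.0, 0.0, 0.0]
```

With coeffs of 1 and 1 the same call returns [1.0, -4.0, 6.0], so a layer whose output is unaffected by the coefficients is not applying them.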