cxxnet's Introduction

Distributed Machine Learning Common Codebase

DMLC-Core is the backbone library supporting all DMLC projects, offering the building blocks to build efficient and scalable distributed machine learning libraries.

Developer channel: join the chat at https://gitter.im/dmlc/dmlc-core

Known Issues

  • The RecordIO format is not portable across machines with different endianness. It is not possible to save a RecordIO file on an x86 machine and then load it on a SPARC machine, because x86 is little-endian while SPARC is big-endian.
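
For a quick way to check which byte order a machine uses, here is a minimal C++ sketch (illustrative only, not part of dmlc-core):

#include <cstdint>
#include <cstdio>

// Returns true on little-endian machines (e.g. x86) and false on
// big-endian ones (e.g. SPARC); RecordIO files written on one kind
// cannot be read back on the other.
bool IsLittleEndian() {
  const uint32_t probe = 1;
  return *reinterpret_cast<const uint8_t*>(&probe) == 1;
}

int main() {
  std::printf("little-endian: %d\n", IsLittleEndian() ? 1 : 0);
  return 0;
}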

Contributing

Contributing to dmlc-core is welcome! dmlc-core follows Google's C++ style guide. If you are interested in contributing, take a look at the feature wishlist and open a new issue if you would like to add something.

  • DMLC-Core uses C++11 standard. Ensure that your C++ compiler supports C++11.
  • Try to introduce minimal dependencies when possible.

Checklist before submitting code

  • Type make lint and fix all the style problems.
  • Type make doc and fix all the warnings.

NOTE

Dependencies: libcurl4-openssl-dev

cxxnet's People

Contributors

antinucleon, arogers1, denglixi, ffmpbgrnn, jpauwels, mli, piotr-teterwak, superzrx, tqchen, viktor-ferenczi, winstywang, zhongwen, zifeitong

cxxnet's Issues

Tutorial and documentation

Hello,

I was looking into the Kaggle example file bowl.conf. An extract is pasted below:
...
layer[0->1] = conv
kernel_size = 5
stride = 4
nchannel = 96
pad = 2
layer[1->2] = relu
layer[2->3] = max_pooling
kernel_size = 3
stride = 2

layer[3->4] = conv
nchannel = 128
kernel_size = 3
pad = 2
...
I found the keyword 'pad', but I could not find any details about it in the documentation. Could you please explain the meaning of this parameter?

Thank you.
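
For context: in most convolutional layers, pad denotes the number of zero-valued pixels added to each border of the input before convolving. Below is a minimal sketch of the standard output-size arithmetic; the formula is the usual convention, not taken from cxxnet's source, and the helper name is made up:

#include <cstdio>

// Standard convolution output size: floor((in + 2*pad - kernel) / stride) + 1.
int ConvOutSize(int in, int kernel, int stride, int pad) {
  return (in + 2 * pad - kernel) / stride + 1;
}

int main() {
  // The first layer above (kernel_size=5, stride=4, pad=2) on a 48x48 input,
  // the bowl image size seen elsewhere in these issues:
  std::printf("%d\n", ConvOutSize(48, 5, 4, 2));  // prints 12, matching node[1].shape 96,12,12
  return 0;
}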

Possible data augmentation: brightness and/or contrast

After taking a look at the work you're doing in the V2-refactor branch - very nice by the way! - I thought it might be worth mentioning another possible augmentation: random brightness and/or contrast changes. It would be very simple to do using OpenCV's Mat::convertTo.

Reference: http://docs.opencv.org/modules/core/doc/basic_structures.html#void%20Mat::convertTo%28OutputArray%20m,%20int%20rtype,%20double%20alpha,%20double%20beta%29%20const

Tutorial: http://docs.opencv.org/doc/tutorials/core/basic_linear_transform/basic_linear_transform.html#brightness-and-contrast-adjustments
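
As a sketch of what this could look like (the ranges and the function name are illustrative assumptions; only cv::Mat::convertTo itself comes from the references above):

#include <cstdlib>
#include <opencv2/core/core.hpp>

// Random brightness/contrast augmentation: convertTo computes
// dst = src * alpha + beta, so alpha acts as contrast and beta as brightness.
cv::Mat AugmentBrightnessContrast(const cv::Mat &src) {
  double alpha = 0.8 + 0.4 * (std::rand() / (double)RAND_MAX);    // contrast in [0.8, 1.2]
  double beta  = -20.0 + 40.0 * (std::rand() / (double)RAND_MAX); // brightness in [-20, 20]
  cv::Mat dst;
  src.convertTo(dst, -1, alpha, beta);  // rtype = -1 keeps the source depth
  return dst;
}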

rookie's question about how to set up binary classification

Sorry if this sounds too rudimentary.

I am learning about CNNs and tried to use cxxnet for some test runs. I set up a simple classification run as a sanity check, but I don't seem to get the right result. The labels are just 0 and 1, and the images are 25x25 pixels. I modified my run from the MNIST example.

here is my configuration file

# training iterator
data = train
iter = mnist
    path_img = "../PrepareData/train_image.gz"
    path_label = "../PrepareData/train_label.gz"
    input_flat = 0
    shuffle = 1
iter = end
# evaluation iterator
eval = test
iter = mnist
    input_flat = 0
    path_img = "../PrepareData/test_image.gz"
    path_label = "../PrepareData/test_label.gz"
iter = end

netconfig=start
layer[0->1] = conv:cv1
  kernel_size = 4
  pad = 0
  stride = 1
  nchannel = 32
  random_type = xavier
  no_bias=0
layer[1->2] = max_pooling
  kernel_size = 2
  stride = 2
layer[2->3] = flatten
layer[3->3] = dropout
  threshold = 0.5
layer[3->4] = fullc:fc1
  nhidden = 100
  init_sigma = 0.01
layer[4->5] = sigmoid:se1
layer[5->6] = fullc:fc2
  nhidden = 10
  init_sigma = 0.01
layer[6->6] = softmax
netconfig=end

# input shape not including batch
input_shape = 1,25,25
batch_size = 100

## global parameters
dev = cpu
save_model = 15
max_round = 15
num_round = 15
train_eval = 1
random_type = gaussian
## learning parameters
eta = 0.1
momentum = 0.9
wd  = 0.0
# evaluation metric
#metric = logloss
metric = error
eval_train = 1
# end of config
--------------------------------------------------------------------

and here is the output

--------------------------------------------------------------------
finish initialization with 1 devices
Initializing layer: cv1
Initializing layer: 1
Initializing layer: 2
Initializing layer: 3
Initializing layer: fc1
Initializing layer: se1
Initializing layer: fc2
Initializing layer: 7
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
node[in].shape: 100,1,25,25
node[1].shape: 100,32,22,22
node[2].shape: 100,32,11,11
node[3].shape: 100,1,1,3872
node[4].shape: 100,1,1,100
node[5].shape: 100,1,1,100
node[6].shape: 100,1,1,10
MNISTIterator: load 38376 images, shuffle=1, shape=100,1,25,25
MNISTIterator: load 38377 images, shuffle=0, shape=100,1,25,25
initializing end, start working
update round 0
round        0:[     300] 52 sec elapsed[1]     train-error:0.166554    test-error:0.0373629
round        1:[     300] 139 sec elapsed[2]    train-error:0.0303655   test-error:0.0174674
round        2:[     300] 225 sec elapsed[3]    train-error:0.180522    test-error:0.501018
round        3:[     300] 310 sec elapsed[4]    train-error:0.500235    test-error:0.501018
round        4:[     300] 392 sec elapsed[5]    train-error:0.500235    test-error:0.501018
round        5:[     300] 473 sec elapsed[6]    train-error:0.500235    test-error:0.501018
round        6:[     300] 554 sec elapsed[7]    train-error:0.500235    test-error:0.501018
round        7:[     300] 641 sec elapsed[8]    train-error:0.500235    test-error:0.501018
round        8:[     300] 720 sec elapsed[9]    train-error:0.500235    test-error:0.501018
round        9:[     300] 802 sec elapsed[10]   train-error:0.500235    test-error:0.501018

Basically the training error "converges" to 0.5 :-(

I tried using only 1 or 2 units at the output layer, and changing the metric to logloss, but I always seem to get the same "converged" training error. Can you see if I made some stupid mistake?

Thanks
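
As a side note, the node shapes printed in the log can be reproduced with the usual convolution/pooling size formulas; a minimal sketch under that assumption (cxxnet's own computation may differ in detail):

#include <cassert>

int main() {
  int conv = (25 + 2 * 0 - 4) / 1 + 1;  // kernel 4, stride 1, pad 0 -> 22   (node[1]: 32,22,22)
  int pool = (conv - 2) / 2 + 1;        // kernel 2, stride 2        -> 11   (node[2]: 32,11,11)
  int flat = 32 * pool * pool;          // 32 channels flattened     -> 3872 (node[3]: 1,1,3872)
  assert(conv == 22 && pool == 11 && flat == 3872);
  return 0;
}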

Can logloss be added to the metrics?

I see that it is in the dev and refactor branches, but not in master. When I tried to run it on the Kaggle problem, it did not seem to work correctly, since the logloss stayed above 4.0 for the entire training run.

Predictions are only written in multiples of the batch size

The batch size in the pred.conf file included with the Kaggle competition is 100; this does not cause an issue when scoring the Kaggle test set, since that has 130400 observations. However, I found that when predicting on another dataset with 9051 observations, the output was truncated at 9000 records. I was able to correct this by setting the batch size to 1.

This behavior is a little unintuitive; you may want to consider scoring the fractional batch as well when the task is set to predict.
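
A sketch of the suggested fix, i.e. padding the fractional batch instead of dropping it (the numbers come from the report above; the names are illustrative):

#include <cstdio>

int main() {
  const int N = 9051, B = 100;   // observations, batch size
  int batches = (N + B - 1) / B; // ceil(N/B) = 91, not N/B = 90
  int rem = N % B;               // 51 rows in the final, fractional batch
  std::printf("run %d batches; pad the last to %d rows, keep only %d outputs\n",
              batches, B, rem);
  return 0;
}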

Trouble compiling cxxnet on OSX 10.9.4

I am having trouble compiling cxxnet on OSX 10.9.4

the make ends with the following error:

Undefined symbols for architecture x86_64:
"cv::imread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int)", referenced from:
cxxnet::ImageIterator::LoadImage(mshadow::TensorContainer<mshadow::cpu, 3>&, cxxnet::DataInst&, char const*) in cxxnet_data.o
"std::string::_Rep::_M_destroy(std::allocator<char> const&)", referenced from:
cxxnet::SGHMCUpdater<mshadow::gpu, 1>::SGHMCUpdater(mshadow::Random<mshadow::gpu>&, mshadow::Tensor<mshadow::gpu, 1>&, mshadow::Tensor<mshadow::gpu, 1>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGHMCUpdater<mshadow::gpu, 1>::~SGHMCUpdater() in cxxnet_nnet_gpu.o
cxxnet::SGDUpdater<mshadow::gpu, 1>::SGDUpdater(mshadow::Tensor<mshadow::gpu, 1>&, mshadow::Tensor<mshadow::gpu, 1>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGDUpdater<mshadow::gpu, 1>::~SGDUpdater() in cxxnet_nnet_gpu.o
cxxnet::SGDUpdater<mshadow::gpu, 1>::~SGDUpdater() in cxxnet_nnet_gpu.o
cxxnet::SGHMCUpdater<mshadow::gpu, 3>::SGHMCUpdater(mshadow::Random<mshadow::gpu>&, mshadow::Tensor<mshadow::gpu, 3>&, mshadow::Tensor<mshadow::gpu, 3>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGHMCUpdater<mshadow::gpu, 3>::~SGHMCUpdater() in cxxnet_nnet_gpu.o
...
"std::string::_Rep::_S_empty_rep_storage", referenced from:
cxxnet::SGHMCUpdater<mshadow::gpu, 1>::SGHMCUpdater(mshadow::Random<mshadow::gpu>&, mshadow::Tensor<mshadow::gpu, 1>&, mshadow::Tensor<mshadow::gpu, 1>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGHMCUpdater<mshadow::gpu, 1>::~SGHMCUpdater() in cxxnet_nnet_gpu.o
cxxnet::SGDUpdater<mshadow::gpu, 1>::SGDUpdater(mshadow::Tensor<mshadow::gpu, 1>&, mshadow::Tensor<mshadow::gpu, 1>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGDUpdater<mshadow::gpu, 1>::~SGDUpdater() in cxxnet_nnet_gpu.o
cxxnet::SGDUpdater<mshadow::gpu, 1>::~SGDUpdater() in cxxnet_nnet_gpu.o
cxxnet::SGHMCUpdater<mshadow::gpu, 3>::SGHMCUpdater(mshadow::Random<mshadow::gpu>&, mshadow::Tensor<mshadow::gpu, 3>&, mshadow::Tensor<mshadow::gpu, 3>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGHMCUpdater<mshadow::gpu, 3>::~SGHMCUpdater() in cxxnet_nnet_gpu.o
...
"std::string::assign(char const*, unsigned long)", referenced from:
cxxnet::SGHMCUpdater<mshadow::gpu, 1>::SGHMCUpdater(mshadow::Random<mshadow::gpu>&, mshadow::Tensor<mshadow::gpu, 1>&, mshadow::Tensor<mshadow::gpu, 1>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGDUpdater<mshadow::gpu, 1>::SGDUpdater(mshadow::Tensor<mshadow::gpu, 1>&, mshadow::Tensor<mshadow::gpu, 1>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGHMCUpdater<mshadow::gpu, 3>::SGHMCUpdater(mshadow::Random<mshadow::gpu>&, mshadow::Tensor<mshadow::gpu, 3>&, mshadow::Tensor<mshadow::gpu, 3>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGDUpdater<mshadow::gpu, 3>::SGDUpdater(mshadow::Tensor<mshadow::gpu, 3>&, mshadow::Tensor<mshadow::gpu, 3>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGHMCUpdater<mshadow::gpu, 2>::SGHMCUpdater(mshadow::Random<mshadow::gpu>&, mshadow::Tensor<mshadow::gpu, 2>&, mshadow::Tensor<mshadow::gpu, 2>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::SGDUpdater<mshadow::gpu, 2>::SGDUpdater(mshadow::Tensor<mshadow::gpu, 2>&, mshadow::Tensor<mshadow::gpu, 2>&, char const*) in cxxnet_nnet_gpu.o
cxxnet::utils::MetricSet::AddMetric(char const*) in cxxnet_nnet_gpu.o
...
"std::string::assign(std::string const&)", referenced from:
std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > >::_M_insert_aux(__gnu_cxx::__normal_iterator<std::pair<std::string, std::string>*, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > >, std::pair<std::string, std::string> const&) in cxxnet_nnet_gpu.o
"std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)", referenced from:
cxxnet::NetConfigHelper::SetParam(char const*, char const*) in cxxnet_nnet_gpu.o
"std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)", referenced from:
cxxnet::NetConfigHelper::SetParam(char const*, char const*) in cxxnet_nnet_gpu.o
std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > >::_M_insert_aux(__gnu_cxx::__normal_iterator<std::pair<std::string, std::string>*, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > >, std::pair<std::string, std::string> const&) in cxxnet_nnet_gpu.o
"std::__throw_length_error(char const*)", referenced from:
std::vector<float, std::allocator<float> >::_M_fill_insert(__gnu_cxx::__normal_iterator<float*, std::vector<float, std::allocator<float> > >, unsigned long, float const&) in cxxnet_nnet_gpu.o
std::vector<std::vector<float, std::allocator<float> >, std::allocator<std::vector<float, std::allocator<float> > > >::_M_fill_insert(__gnu_cxx::__normal_iterator<std::vector<float, std::allocator<float> >*, std::vector<std::vector<float, std::allocator<float> >, std::allocator<std::vector<float, std::allocator<float> > > > >, unsigned long, std::vector<float, std::allocator<float> > const&) in cxxnet_nnet_gpu.o
std::vector<float, std::allocator<float> >::_M_insert_aux(__gnu_cxx::__normal_iterator<float*, std::vector<float, std::allocator<float> > >, float const&) in cxxnet_nnet_gpu.o
std::vector<int, std::allocator<int> >::_M_fill_insert(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, unsigned long, int const&) in cxxnet_nnet_gpu.o
std::vector<cxxnet::IUpdater*, std::allocator<cxxnet::IUpdater*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<cxxnet::IUpdater**, std::vector<cxxnet::IUpdater*, std::allocator<cxxnet::IUpdater*> > >, cxxnet::IUpdater* const&) in cxxnet_nnet_gpu.o
std::vector<cxxnet::ILayer*, std::allocator<cxxnet::ILayer*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<cxxnet::ILayer**, std::vector<cxxnet::ILayer*, std::allocator<cxxnet::ILayer*> > >, cxxnet::ILayer* const&) in cxxnet_nnet_gpu.o
std::vector<cxxnet::Node<mshadow::gpu>, std::allocator<cxxnet::Node<mshadow::gpu> > >::_M_insert_aux(__gnu_cxx::__normal_iterator<cxxnet::Node<mshadow::gpu>*, std::vector<cxxnet::Node<mshadow::gpu>, std::allocator<cxxnet::Node<mshadow::gpu> > > >, cxxnet::Node<mshadow::gpu> const&) in cxxnet_nnet_gpu.o
...
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [bin/cxxnet] Error 1

Any help appreciated.
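
A note on the symbols above (an observation, not a confirmed fix): the missing cv::imread overload takes a std::__1::basic_string, which is libc++'s namespace, while the other missing symbols (std::string::_Rep, _S_empty_rep_storage) are libstdc++ internals. That mix suggests the object files were built against two different C++ standard libraries; rebuilding everything, including the OpenCV being linked, with one consistent stdlib flag may help:

# assumption: pick a single standard library for all objects, e.g.
CFLAGS += -stdlib=libstdc++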

trains with cpu but not with gpu

Hi again,

I got everything set up with my own data. When I go to train with dev=gpu, the program just stops after "update round 0". There is no error given (please see below). I get the same issue when I try to run the MNIST example on the GPU. With dev=cpu everything works fine. I don't think there is a problem with my CUDA installation, because it recognizes the video card and I have used the GPU in other frameworks before. Any ideas?

lex@lex-lin:~/Documents/cxxnet/bin$ ./cxxnet ../example/ImageNet/ImageNet.conf
Use CUDA Device 0: GeForce GTX TITAN Black
CXXNetTrainer, devCPU=0
ConvolutionLayer: nstep=15
ConvolutionLayer: nstep=9
ConvolutionLayer: nstep=43
ConvolutionLayer: nstep=26
ConvolutionLayer: nstep=26
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
node[0].shape: 256,3,227,227
node[1].shape: 256,96,55,55
node[2].shape: 256,96,55,55
node[3].shape: 256,96,27,27
node[4].shape: 256,96,27,27
node[5].shape: 256,256,27,27
node[6].shape: 256,256,27,27
node[7].shape: 256,256,13,13
node[8].shape: 256,256,13,13
node[9].shape: 256,384,13,13
node[10].shape: 256,384,13,13
node[11].shape: 256,384,13,13
node[12].shape: 256,384,13,13
node[13].shape: 256,256,13,13
node[14].shape: 256,256,13,13
node[15].shape: 256,256,6,6
node[16].shape: 1,1,256,9216
node[17].shape: 1,1,256,4096
node[18].shape: 1,1,256,4096
node[19].shape: 1,1,256,4096
node[20].shape: 1,1,256,4096
node[21].shape: 1,1,256,1000
ThreadImagePageIterator:image_list=/home/lex/Documents/cxxnet/lexutils/train.lst, bin=/home/lex/Documents/cxxnet/tools/TRAIN.BIN
ThreadBufferIterator: buffer_size=2
ThreadImagePageIterator:image_list=/home/lex/Documents/cxxnet/lexutils/test.lst, bin=/home/lex/Documents/cxxnet/tools/TEST.BIN
initializing end, start working
update round 0lex@lex-lin:~/Documents/cxxnet/bin$

Pin() and Unpin()

I cannot see the necessity of the Pin() and Unpin() functions; to me it seems like the intermediate calculation results are not trustworthy.

Training stop at update round 0

I followed the Kaggle bowl example. However, when I try to train the model, it stops at round 0.
I saw a similar issue: #10
However, I think my mshadow is up to date, and the suggestion:

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 200

has already been added to mshadow...
So I'm wondering what the cause is?

My environment:

/opt/NVIDIA/cuda-5.5
/opt/intel/mkl-2013
/opt/opencv-2.4.7

The full output of training:

$ ../../bin/cxxnet bowl.conf
Use CUDA Device 0: Tesla C2070
CXXNetTrainer, devCPU=0
ConvolutionLayer: nstep=256
ConvolutionLayer: nstep=256
ConvolutionLayer: nstep=128
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
node[0].shape: 256,3,48,48
node[1].shape: 256,96,12,12
node[2].shape: 256,96,12,12
node[3].shape: 256,96,6,6
node[4].shape: 256,128,8,8
node[5].shape: 256,128,8,8
node[6].shape: 256,128,8,8
node[7].shape: 256,128,8,8
node[8].shape: 256,128,4,4
node[9].shape: 1,1,256,2048
node[10].shape: 1,1,256,512
node[11].shape: 1,1,256,512
node[12].shape: 1,1,256,512
node[13].shape: 1,1,256,512
node[14].shape: 1,1,256,121
ThreadImagePageIterator:image_list=./train.lst, bin=./train.bin
loading mean image from models/image_mean.bin
ThreadBufferIterator: buffer_size=2
ThreadImagePageIterator:image_list=./train.lst, bin=./train.bin
loading mean image from models/image_mean.bin
ThreadBufferIterator: buffer_size=2
initializing end, start working
update round 0[lazywei@accelerando kaggle_bowl]$

Build Issue on Ubuntu 12.04

Hi,

I am having trouble installing cxxnet on my PC with an Intel i5 on Ubuntu 12.04:
-- libevent-pthreads: 2.0.21-stable-1ubuntu1
-- gcc: (Ubuntu 4.9.2-0ubuntu1~14.04) 4.9.2

The following is the error message:

$ sudo make blas=1
g++ -Wall -g -O3 -msse3 -Wno-unknown-pragmas -funroll-loops -I./mshadow/ -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=1  -o bin/cxxnet cxxnet/cxxnet_main.cpp cxxnet_data.o cxxnet_nnet_cpu.o cxxnet_nnet_gpu.o -lm -lcudart -lcublas -lcurand -lz `pkg-config --libs opencv` -lblas
/usr/bin/ld: cxxnet_data.o: undefined reference to symbol 'sem_post@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make: *** [bin/cxxnet] Error 1

On the Kaggle forum, Bing suggests it might be related to pthread, but I use blas=1, and

ifeq ($(blas),1)
 LDFLAGS= -lm -lcudart -lcublas -lcurand -lz `pkg-config --libs opencv` -lblas
 CFLAGS+= -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=1
else
 LDFLAGS= -lm -lcudart -lcublas -lmkl_core -lmkl_intel_lp64 -lmkl_intel_thread -liomp5 -lpthread -lcurand -lz `pkg-config --libs opencv`
endif

it seems that pthread is not used in my build.

Do you have further suggestions?

Cheers
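
One plausible fix (an assumption based on the "DSO missing from command line" message, which usually means libpthread must be named explicitly on the link line): append -lpthread to the blas=1 branch as well:

ifeq ($(blas),1)
 LDFLAGS= -lm -lcudart -lcublas -lcurand -lz `pkg-config --libs opencv` -lblas -lpthread
 CFLAGS+= -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=1
endif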

GPU does not train for kaggle_bowl example

I've been able to run the kaggle_bowl example, but the GPU runs report a training error that remains flat. I ran the same bowl.conf with dev=cpu, and the training seems to proceed fine (I've appended the output of both below). I haven't otherwise changed bowl.conf. I did run the GPU on the MNIST example, and it worked fine.

Do you have an idea of what might be going on here? Thanks!


Use CUDA Device 0: Tesla C1060
CXXNetTrainer, devCPU=0
ConvolutionLayer: nstep=256
ConvolutionLayer: nstep=256
ConvolutionLayer: nstep=128
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
node[0].shape: 256,3,48,48
node[1].shape: 256,96,12,12
node[2].shape: 256,96,12,12
node[3].shape: 256,96,6,6
node[4].shape: 256,128,8,8
node[5].shape: 256,128,8,8
node[6].shape: 256,128,8,8
node[7].shape: 256,128,8,8
node[8].shape: 256,128,4,4
node[9].shape: 1,1,256,2048
node[10].shape: 1,1,256,512
node[11].shape: 1,1,256,512
node[12].shape: 1,1,256,512
node[13].shape: 1,1,256,512
node[14].shape: 1,1,256,121
ThreadImagePageIterator:image_list=./train.lst, bin=./train.bin
loading mean image from models/image_mean.bin
ThreadBufferIterator: buffer_size=2
ThreadImagePageIterator:image_list=./train.lst, bin=./train.bin
loading mean image from models/image_mean.bin
ThreadBufferIterator: buffer_size=2
initializing end, start working
round 0:[ 100] 13 sec elapsed[1] train-error:0.999570
round 1:[ 100] 33 sec elapsed[2] train-error:0.999570
round 2:[ 100] 52 sec elapsed[3] train-error:0.999570
round 3:[ 100] 72 sec elapsed[4] train-error:0.999570


CXXNetTrainer, devCPU=1
ConvolutionLayer: nstep=256
ConvolutionLayer: nstep=256
ConvolutionLayer: nstep=128
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
node[0].shape: 256,3,48,48
node[1].shape: 256,96,12,12
node[2].shape: 256,96,12,12
node[3].shape: 256,96,6,6
node[4].shape: 256,128,8,8
node[5].shape: 256,128,8,8
node[6].shape: 256,128,8,8
node[7].shape: 256,128,8,8
node[8].shape: 256,128,4,4
node[9].shape: 1,1,256,2048
node[10].shape: 1,1,256,512
node[11].shape: 1,1,256,512
node[12].shape: 1,1,256,512
node[13].shape: 1,1,256,512
node[14].shape: 1,1,256,121
ThreadImagePageIterator:image_list=./train.lst, bin=./train.bin
loading mean image from models/image_mean.bin
ThreadBufferIterator: buffer_size=2
ThreadImagePageIterator:image_list=./train.lst, bin=./train.bin
loading mean image from models/image_mean.bin
ThreadBufferIterator: buffer_size=2
initializing end, start working
round 0:[ 100] 311 sec elapsed[1] train-error:0.776947
round 1:[ 100] 815 sec elapsed[2] train-error:0.689883
update round 2

cuDNN, compile errors

Compiling with the cuDNN library produces the error below; without cuDNN everything compiles OK. NVIDIA 660M card (compute capability 3.0), using cuDNN v2 Release Candidate 3. With cuDNN 6.5 R1 I get similar errors. Any helpful ideas?

nvcc -c -o layer_gpu.o --use_fast_math -g -O3 -ccbin g++ -Xcompiler "-DMSHADOW_FORCE_STREAM -Wall -g -O3 -I./mshadow/ -fopenmp -fPIC -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_DIST_PS=0 -DCXXNET_USE_OPENCV=1 -DCXXNET_USE_CUDNN=1 -I/usr/local/cuda-7.0/include/ -L/usr/local/cuda-7.0/lib64/ " src/layer/layer_impl.cu
src/layer/./cudnn_pooling_layer-inl.hpp(84): error: identifier "CUDNN_POOLING_AVERAGE" is undefined
detected during:
instantiation of "void cxxnet::layer::CuDNNPoolingLayer<Reducer, mode, xpu>::InitConnection(const std::vectorcxxnet::layer::Node<xpu *, std::allocatorcxxnet::layer::Node<xpu *>> &, const std::vectorcxxnet::layer::Node<xpu *, std::allocatorcxxnet::layer::Node<xpu *>> &, cxxnet::layer::ConnectState *) [with Reducer=mshadow::red::maximum, mode=11, xpu=cxxnet::gpu]"
(16): here
instantiation of "cxxnet::layer::CuDNNPoolingLayer<Reducer, mode, xpu>::CuDNNPoolingLayer() [with Reducer=mshadow::red::maximum, mode=11, xpu=cxxnet::gpu]"
src/layer/layer_impl-inl.hpp(55): here
instantiation of "cxxnet::layer::ILayer *cxxnet::layer::CreateLayer_(cxxnet::layer::LayerType, mshadow::Random<xpu, mshadow::default_real_t> *, const cxxnet::layer::LabelInfo *) [with xpu=cxxnet::gpu]"
src/layer/layer_impl.cu(12): here

1 error detected in the compilation of "/tmp/tmpxft_00003eba_00000000-7_layer_impl.cpp1.ii".
make: *** [layer_gpu.o] Error 2

"decoding fail"

I am running the "kaggle_bowl" example and get an error "decoding fail" (full message appended below). I saw on the Kaggle forum that this means OpenCV can't decode an image in the binary file generated by im2bin. Does this mean im2bin is not producing the expected output?

$ ../../bin/cxxnet bowl.conf
Use CUDA Device 0: Tesla C1060
CXXNetTrainer, devCPU=0
ConvolutionLayer: nstep=256
ConvolutionLayer: nstep=256
ConvolutionLayer: nstep=128
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
SGDUpdater: eta=0.010000, mom=0.900000
SGDUpdater: eta=0.020000, mom=0.900000
node[0].shape: 256,3,48,48
node[1].shape: 256,96,12,12
node[2].shape: 256,96,12,12
node[3].shape: 256,96,6,6
node[4].shape: 256,128,8,8
node[5].shape: 256,128,8,8
node[6].shape: 256,128,8,8
node[7].shape: 256,128,8,8
node[8].shape: 256,128,4,4
node[9].shape: 1,1,256,2048
node[10].shape: 1,1,256,512
node[11].shape: 1,1,256,512
node[12].shape: 1,1,256,512
node[13].shape: 1,1,256,512
node[14].shape: 1,1,256,121
ThreadImagePageIterator:image_list=./train.lst, bin=./train.bin
cannot find models/image_mean.bin: create mean image, this will take some time...
Error:decoding fail

How to make cxxnet run on CPU only

I have compiled cxxnet in a virtual machine running Ubuntu 14.04.
When running the bowl.conf example:

../../bin/cxxnet bowl.conf
Error:Cannot find CUDA device. Please check CUDA-Configuration

How can we configure bowl.conf to run with the CPU only?

Thanks
patrick
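
For reference, the MNIST config earlier on this page selects the device in its global parameters, so presumably the same line applies to bowl.conf (whether the binary itself was built with CUDA support still matters; the error above suggests a GPU-enabled build probing for a device):

## global parameters in bowl.conf (assumed to work as in the MNIST config above)
dev = cpu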

Weight Initialization

Is it possible to provide an external weight-set initialization (rather than the built-in random initialization)?
It must be possible via a manually created model file, right? Is there any documentation on the format of this model file?

error during build in Ubuntu 12.04

I have a problem installing cxxnet on my PC with an Intel i7 on Ubuntu 12.04.

The following is the error message

./build.sh blas=1

Fetch mshadow...
Cloning into 'mshadow'...
remote: Counting objects: 1589, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 1589 (delta 8), reused 12 (delta 7)
Receiving objects: 100% (1589/1589), 488.00 KiB | 303 KiB/s, done.
Resolving deltas: 100% (1068/1068), done.
g++ -c -Wall -g -O3 -msse3 -Wno-unknown-pragmas -funroll-loops -I./mshadow/ -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=1 -o cxxnet_data.o cxxnet/io/cxxnet_data.cpp
In file included from ./mshadow/mshadow/tensor.h:11:0,
from cxxnet/io/cxxnet_data.cpp:6:
./mshadow/mshadow/tensor_base.h:83:22: fatal error: cublas.h: No such file or directory
compilation terminated.
make: *** [cxxnet_data.o] Error 1
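
A likely cause (an assumption from the log): the g++ line above passes no CUDA include path, unlike the build logs in other issues on this page, yet mshadow's tensor_base.h needs cublas.h. If the toolkit is in the usual location, pointing the build at it may help:

# assumption: CUDA installed under /usr/local/cuda
CFLAGS  += -I/usr/local/cuda/include/
LDFLAGS += -L/usr/local/cuda/lib64/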

GPU error: Incorrect Device ID

Hi,

When I run the example with MNIST_Conv.conf, I got the following error:

Error:Incorrect Device ID?

The same error appears when using MNIST.conf if I change it to dev = gpu.

Best,

Eric

error while run python example mnist.py

I got the following error message when trying the Python example mnist.py:
Traceback (most recent call last):
File "mnist.py", line 50, in
net = cxxnet.train(cfg, data, 1, param, eval_data = deval)
TypeError: train() got an unexpected keyword argument 'eval_data'

This may be caused by there being two "train" methods in the Python wrapper 'cxxnet.py'; the second "train" definition overrides the first.

Problem with training imagenet 2012 model

When training on the ImageNet 2012 data set, I get the following evaluation results for the first 9 rounds:

[1] test-error:0.998998 test-rec@1:0.001002 test-rec@5:0.005008 train-error:0.999028 train-rec@1:0.000972 train-rec@5:0.004782
[2] test-error:0.998998 test-rec@1:0.001002 test-rec@5:0.005008 train-error:0.999058 train-rec@1:0.000942 train-rec@5:0.004806
[3] test-error:0.999018 test-rec@1:0.000982 test-rec@5:0.004988 train-error:0.999038 train-rec@1:0.000962 train-rec@5:0.004808
[4] test-error:0.998998 test-rec@1:0.001002 test-rec@5:0.005008 train-error:0.999017 train-rec@1:0.000983 train-rec@5:0.004776
[5] test-error:0.999018 test-rec@1:0.000982 test-rec@5:0.004988 train-error:0.999003 train-rec@1:0.000997 train-rec@5:0.004986
[6] test-error:0.998998 test-rec@1:0.001002 test-rec@5:0.005008 train-error:0.999009 train-rec@1:0.000991 train-rec@5:0.005019
[7] test-error:0.998998 test-rec@1:0.001002 test-rec@5:0.004988 train-error:0.999013 train-rec@1:0.000987 train-rec@5:0.004820
[8] test-error:0.998998 test-rec@1:0.001002 test-rec@5:0.004968 train-error:0.998989 train-rec@1:0.001011 train-rec@5:0.004948
[9] test-error:0.998998 test-rec@1:0.001002 test-rec@5:0.004988 train-error:0.999016 train-rec@1:0.000984 train-rec@5:0.004963

The training error stays around 0.999. Is this result normal?
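
For context, these numbers match a random-guessing baseline for 1000 balanced classes exactly, which suggests the net is not learning at all; the arithmetic below is just that observation:

#include <cstdio>

int main() {
  const double k = 1000.0;  // ImageNet-2012 class count
  std::printf("error=%.3f rec@1=%.3f rec@5=%.3f\n",
              1.0 - 1.0 / k, 1.0 / k, 5.0 / k);  // 0.999 0.001 0.005, as in the log
  return 0;
}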

Multi core execution

I was trying to run cxxnet without any GPU.
I had set
dev=cpu

The process seems to use only one CPU core. Could I make it use 4 cores?

I have compiled cxxnet for Ubuntu 14.04.

Thanks
Deepu
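
One thing worth trying (an assumption: it depends on the binary having been built with OpenMP, as the -fopenmp flag in other build logs on this page suggests) is raising the OpenMP thread count before launching:

# assumption: an OpenMP-enabled build; your.conf is a placeholder
export OMP_NUM_THREADS=4
./cxxnet your.conf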

question

Build issue with CPU

Hi,
I am a novice and need some clarification on the build. I am running the code on Ubuntu 14.04 under VMware and I do not have an NVIDIA GPU, hence I will be using the CPU. My questions are below.
System H/W: 16GB RAM

  1. Do I have to run the makefile within the tools directory or the master directory?
  2. I have CBLAS installed on the system; what is the config that I have to change?

Please advise, I am stuck with the following error:
mj@ubuntu:~/Desktop/National-DS/CodeFiles/cxxnet-master$ ./build.sh blas=1
Fetch mshadow...
fatal: destination path 'mshadow' already exists and is not an empty directory.
g++ -c -Wall -g -O3 -msse3 -Wno-unknown-pragmas -funroll-loops -I./mshadow/ -I/usr/include/ -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=0 -o cxxnet_data.o cxxnet/io/cxxnet_data.cpp
In file included from ./mshadow/mshadow/tensor.h:11:0,
from cxxnet/io/cxxnet_data.cpp:6:
./mshadow/mshadow/tensor_base.h:83:22: fatal error: cublas.h: No such file or directory
#include <cublas.h>
^

compilation problem

Hello everyone. I am having difficulty compiling cxxnet (I'm relatively new to the NN/Linux scene). I am running 64-bit Arch and I have installed CUDA toolkit 6.5 with ATLAS/LAPACK. I have installed OpenCV, and when I try 'sh build.sh blas=1' I get:

Fetch mshadow...
fatal: destination path 'mshadow' already exists and is not an empty directory.
g++ -c -Wall -g -O3 -msse3 -Wno-unknown-pragmas -funroll-loops -I./mshadow/ -I /usr/local/cuda/include/ -L /usr/local/cuda/lib64/ -pthread -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=1 -o cxxnet_data.o cxxnet/io/cxxnet_data.cpp
In file included from ./mshadow/mshadow/tensor.h:11:0,
from cxxnet/io/cxxnet_data.cpp:6:
./mshadow/mshadow/tensor_base.h:83:22: fatal error: cublas.h: No such file or directory
#include <cublas.h>
^
compilation terminated.
Makefile:34: recipe for target 'cxxnet_data.o' failed
make: *** [cxxnet_data.o] Error 1

Any advice would be greatly appreciated!

thank you
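
A possible cause (an assumption about this system): on Arch the CUDA toolkit typically installs under /opt/cuda rather than /usr/local/cuda, so the -I /usr/local/cuda/include/ on the compile line above would not find cublas.h. Pointing the build at the Arch location may help:

# assumption: Arch's cuda package installs to /opt/cuda
CFLAGS  += -I/opt/cuda/include/
LDFLAGS += -L/opt/cuda/lib64/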

About multi-label

Hi. I would like to ask whether this toolkit supports multi-label training and testing? Thanks!

log_loss metric

I've been having some issues getting the log_loss metric to work; most examples have

metric = error

at some point. On the basis of

void AddMetric(const char *name) {
  if (!strcmp(name, "rmse")) evals_.push_back(new MetricRMSE());
  if (!strcmp(name, "error")) evals_.push_back(new MetricError());
  if (!strcmp(name, "r2")) evals_.push_back(new MetricCorrSqr());
  if (!strncmp(name, "rec@", 4)) evals_.push_back(new MetricRecall(name));
  if (!strcmp(name, "logloss")) evals_.push_back(new MetricLogloss());
  // simple way to enforce uniqueness, not a good way, not ok here
  std::sort(evals_.begin(), evals_.end(), CmpName);
  evals_.resize(std::unique(evals_.begin(), evals_.end(), EqualName) - evals_.begin());
}

I assumed one could get the log loss by replacing that line with

metric = logloss

However, this change just results in training iterations being printed with no corresponding scores.
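
For reference, a minimal sketch of the multi-class log loss the metric presumably computes (illustrative; this is not cxxnet's MetricLogloss source):

#include <algorithm>
#include <cmath>
#include <vector>

// Mean negative log-probability assigned to the true class.
// preds[i][k] is the predicted probability of class k for example i.
double LogLoss(const std::vector<std::vector<double> > &preds,
               const std::vector<int> &labels) {
  const double eps = 1e-15;  // guard against log(0)
  double sum = 0.0;
  for (size_t i = 0; i < preds.size(); ++i) {
    sum += -std::log(std::max(preds[i][labels[i]], eps));
  }
  return sum / preds.size();
}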

V2-Refactor How to predict for Kaggle Bowl (pred_raw gone)

Hi,

I noticed that pred_raw is gone, but pred does not produce the same amount of data (and only one column).

If I use extract with extract_node_name = top[-1], then I get predicted data of the correct size and format, but the Kaggle leaderboard score is very bad (27!).

How do we predict for kaggle_bowl?

Thanks

Error building cxxnet using Ubuntu 14.04

I'm building cxxnet in CPU-only mode using ATLAS BLAS. I replaced the makefile with make/Cblas.cpu.ml and Cpu_only.mk (tried with both), but I get the error "undefined reference to symbol 'sem_post@@GLIBC_2.2.5'".

Any suggestions on what to do? Thanks!

g++ -Wall -g -O3 -msse3 -Wno-unknown-pragmas -funroll-loops -I./mshadow/ -DMSHADOW_USE_CUDA=0 -DMSHADOW_USE_SSE=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=1 -o bin/cxxnet cxxnet/cxxnet_main.cpp cxxnet_data.o cxxnet_nnet_cpu.o -lm -lz `pkg-config --libs opencv` -lcblas
/usr/bin/ld: cxxnet_data.o: undefined reference to symbol 'sem_post@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make: *** [bin/cxxnet] Error 1
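
The same "DSO missing from command line" error appears in the Ubuntu 12.04 build issue earlier on this page; the plausible fix there, naming libpthread explicitly on the link line, presumably applies here too:

# assumption: libpthread must be linked explicitly
LDFLAGS += -lpthread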

Compiling with Atlas

I was trying to link an ATLAS implementation using -latlas instead of -lblas in the makefile.

It gives me the following error:
./mshadow/mshadow/tensor_expr_engine-inl.hpp:278: undefined reference to `cblas_sgemm'

Google gave me this result:
http://stackoverflow.com/questions/1137483/c-c-linking-error
which basically advised me to add an extern "C" declaration.

When I look in mshadow, it already has an extern "C" there. Could somebody help?

Thanks
Deepu
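
A note on ATLAS packaging (an assumption about the local install): the C interface symbols such as cblas_sgemm usually live in libcblas, with libatlas providing only the optimized kernels, so both may need to be linked:

# assumption: ATLAS ships the C interface in a separate libcblas
LDFLAGS += -lcblas -latlas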

Layerwise pretraining?

Hi Tianqi and Bing,
maybe this is a dumb question, but do you use layerwise pretraining in NetTrainer for deep architectures? I didn't see any of this in the code; if I missed anything, please give me a hint. Thanks.

The new cudnn convolution code still cannot be compiled in VS2012

The GPU-specialized code in cudnn_convolution_layer-inl.hpp still has some problems. I think the GPU-specialized code should be written in an independent .cuh header file, not in a .hpp one. When the C/C++ compiler visits the .hpp file, it also tries to compile the GPU code, which causes errors on the CUDA kernels. For now I have moved all the code from layer_impl.cpp to layer_impl.cu so that nvcc compiles all the CPU/GPU code. I hope for a better solution.

Problem building on Amazon Linux with Opencv,NVIDIA CUDA drivers, blas installed

Can you provide a script which will install cxxnet automatically, assuming OpenCV and the NVIDIA drivers are installed?

 ./build.sh blas=1
Fetch mshadow...
fatal: destination path 'mshadow' already exists and is not an empty directory.
g++ -Wall -g -O3 -msse3 -Wno-unknown-pragmas -funroll-loops -I./mshadow/ -I /opt/nvidia/cuda/include/ -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=1  -o bin/cxxnet cxxnet/cxxnet_main.cpp cxxnet_data.o cxxnet_nnet_cpu.o cxxnet_nnet_gpu.o -L/opt/nvidia/cuda/lib64/ -L/usr/local/lib/ -lm -lcudart -lcublas -lcurand -lz -lopencv_core -lblas -lgfortran
cxxnet_data.o: In function `cxxnet::ImageIterator::LoadImage(mshadow::TensorContainer<mshadow::cpu, 3>&, cxxnet::DataInst&, char const*)':
/home/ec2-user/cxxnet/trunk/cxxnet/io/cxxnet_iter_img-inl.hpp:66: undefined reference to `cv::imread(std::string const&, int)'
/home/ec2-user/cxxnet/trunk/cxxnet/io/cxxnet_iter_img-inl.hpp:66: undefined reference to `cv::imread(std::string const&, int)'
cxxnet_data.o: In function `cxxnet::ThreadImagePageIterator::LoadImage(mshadow::TensorContainer<mshadow::cpu, 3>&, cxxnet::DataInst&, std::vector<unsigned char, std::allocator<unsigned char> >&)':
/home/ec2-user/cxxnet/trunk/cxxnet/io/cxxnet_iter_thread_imbin-inl.hpp:64: undefined reference to `cv::imdecode(cv::_InputArray const&, int)'
cxxnet_nnet_cpu.o: In function `mshadow::expr::BLASEngine<mshadow::cpu>::gemm(bool, bool, int, int, int, float, float const*, int, float const*, int, float, float*, int)':
/home/ec2-user/cxxnet/trunk/./mshadow/mshadow/tensor_expr_engine-inl.hpp:278: undefined reference to `cblas_sgemm'
/home/ec2-user/cxxnet/trunk/./mshadow/mshadow/tensor_expr_engine-inl.hpp:278: undefined reference to `cblas_sgemm'
/home/ec2-user/cxxnet/trunk/./mshadow/mshadow/tensor_expr_engine-inl.hpp:278: undefined reference to `cblas_sgemm'
/home/ec2-user/cxxnet/trunk/./mshadow/mshadow/tensor_expr_engine-inl.hpp:278: undefined reference to `cblas_sgemm'
/home/ec2-user/cxxnet/trunk/./mshadow/mshadow/tensor_expr_engine-inl.hpp:278: undefined reference to `cblas_sgemm'
cxxnet_nnet_cpu.o:/home/ec2-user/cxxnet/trunk/./mshadow/mshadow/tensor_expr_engine-inl.hpp:278: more undefined references to `cblas_sgemm' follow
collect2: error: ld returned 1 exit status
make: *** [bin/cxxnet] Error 1

error during build in Ubuntu 14.04 with MKL library

I have the MKL library installed, but with ./build.sh I got the following error:

g++ -c -Wall -g -O3 -msse3 -Wno-unknown-pragmas -funroll-loops -I./mshadow/ -o cxxnet_data.o cxxnet/io/cxxnet_data.cpp
In file included from ./mshadow/mshadow/tensor.h:11:0,
from cxxnet/io/cxxnet_data.cpp:6:
./mshadow/mshadow/tensor_base.h:76:19: fatal error: mkl.h: No such file or directory
#include <mkl.h>
^
compilation terminated.
make: *** [cxxnet_data.o] Error 1

I am on Ubuntu 14.04 system with Cuda6.5.

Thank you.
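
A possible cause (an assumption from the log): the compile line above passes no MKL include path, so mkl.h cannot be found. Intel's distribution ships an environment script that exports the needed include and library paths before building:

# assumption: the path varies with the installed MKL version
source /opt/intel/mkl/bin/mklvars.sh intel64
./build.sh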

Fail to build on OSX

Hi, I'm trying to build cxxnet on OSX using a recent CUDA (6.5). The problem I face seems to be related to the boost/CUDA issue reported here (https://svn.boost.org/trac/boost/ticket/10418), but I did check that my latest Boost has the fix mentioned there. Anyway, here is my build issue:

Fetch mshadow...
fatal: destination path 'mshadow' already exists and is not an empty directory.
nvcc -c -o cxxnet_nnet_gpu.o --use_fast_math -g -O3 -ccbin g++ -Xcompiler "-Wall -g -O3 -msse3 -Wno-unknown-pragmas -funroll-loops -I/opt/local/include -I/Developer/NVIDIA/CUDA-6.5/include/ -I./mshadow -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CBLAS=1" cxxnet/nnet/cxxnet_nnet.cu
/opt/local/include/gcc48/c++/cstdlib(178): error: identifier "__int128" is undefined
/opt/local/include/gcc48/c++/cstdlib(179): error: identifier "__int128" is undefined

Do you have any idea how to fix this?
Thanks,
Valentin.

V2-refactor - Higher train errors with eval=val

I just switched to v2-refactor for the Kaggle bowl. My first goal was to match the CNN model with what I had been using on master and see if I got the same train-error scores, just as a way to establish a baseline.

I kept getting much higher train errors in v2-refactor (0.33 vs. 0.19 in master). I was able to root-cause this to using "eval = val" in bowl.conf. If I change it to "eval = train", then the train error matches what I see in master.

I'm just curious why this is the case. I was assuming "eval = val" should only affect the validation error score, not the train errors.

Compilation Error

I am trying to compile cxxnet but I am getting the following errors:

Fetch mshadow...
Cloning into 'mshadow'...
remote: Counting objects: 2523, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 2523 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (2523/2523), 693.91 KiB | 340.00 KiB/s, done.
Resolving deltas: 100% (1729/1729), done.
Checking connectivity... done.
g++ -c -Wall -g -O3 -msse3 -stdlib=libstdc++ -Wno-unknown-pragmas -funroll-loops -I./mshadow/ -I/Volumes/../Developer/NVIDIA/CUDA-6.5/include -I/usr/local/include -o cxxnet_data.o cxxnet/io/cxxnet_data.cpp
cxxnet/io/cxxnet_data.cpp:12:22: error: no type named 'real_t' in namespace
'mshadow'
typedef mshadow::real_t real_t;
~~~~~~~~~^
In file included from cxxnet/io/cxxnet_data.cpp:15:
cxxnet/io/cxxnet_data.h:78:17: error: no matching function for call to
'FreeSpace'
mshadow::FreeSpace( data );
^~~~~~~~~~~~~~~~~~
./mshadow/mshadow/./tensor_cpu-inl.h:102:13: note: candidate template ignored:
could not match 'Tensor<mshadow::cpu, dim, DType> *' against
'mshadow::Tensor<mshadow::cpu, 4>'
inline void FreeSpace(Tensor<cpu, dim, DType> *obj) {
^
./mshadow/mshadow/./tensor_gpu-inl.h:61:13: note: candidate template ignored:
could not match 'Tensor<mshadow::gpu, dim, DType> *' against
'mshadow::Tensor<mshadow::cpu, 4>'
inline void FreeSpace(Tensor<gpu, dim, DType> *obj) {
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:14:18: error: no member named 'dptr' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'dptr_'?
img_.dptr = NULL;
^~~~
dptr_
./mshadow/mshadow/tensor.h:250:10: note: 'dptr_' declared here
DType *dptr_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:22:22: error: no member named 'dptr' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'dptr_'?
if( img_.dptr != NULL ) delete []img_.dptr;
^~~~
dptr_
./mshadow/mshadow/tensor.h:250:10: note: 'dptr_' declared here
DType *dptr_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:22:51: error: no member named 'dptr' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'dptr_'?
if( img_.dptr != NULL ) delete []img_.dptr;
^~~~
dptr_
./mshadow/mshadow/tensor.h:250:10: note: 'dptr_' declared here
DType *dptr_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:42:27: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
out_.data.shape = mshadow::Shape4(1,1,batch_size_,img_.s...
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:42:72: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
...out_.data.shape = mshadow::Shape4(1,1,batch_size_,img_.shape[1] * img_.s...
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:42:88: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
...= mshadow::Shape4(1,1,batch_size_,img_.shape[1] * img_.shape[0] * 3 );
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:44:27: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
out_.data.shape = mshadow::Shape4( batch_size_, 3, img_....
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:44:73: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
...out_.data.shape = mshadow::Shape4( batch_size_, 3, img_.shape[1], img_.s...
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:44:88: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
...= mshadow::Shape4( batch_size_, 3, img_.shape[1], img_.shape[0] );
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:47:23: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
out_.data.shape.stride_ = out_.data.shape[0];
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:47:29: error: no member named 'stride_' in
'mshadow::Shape<4>'
out_.data.shape.stride_ = out_.data.shape[0];
~~~~~~~~~~~~~~~ ^
cxxnet/io/cxxnet_iter_cifar-inl.hpp:47:49: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
out_.data.shape.stride_ = out_.data.shape[0];
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:51:49: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
mshadow::Shape<4> s = out_.data.shape;
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:53:39: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
(unsigned)img_.shape[3], shuffle_, s[3],s[2],s[1],s[0] );
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:61:44: error: no member named 'shape' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'shape_'?
if( loc_ + batch_size_ <= img_.shape[3] ) {
^~~~~
shape_
./mshadow/mshadow/tensor.h:252:20: note: 'shape_' declared here
Shape shape_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:62:27: error: no member named 'dptr' in
'mshadow::Tensor<mshadow::cpu, 4, float>'; did you mean 'dptr_'?
out_.data.dptr = img_[ loc_ ].dptr;
^~~~
dptr_
./mshadow/mshadow/tensor.h:250:10: note: 'dptr_' declared here
DType *dptr_;
^
In file included from cxxnet/io/cxxnet_data.cpp:18:
cxxnet/io/cxxnet_iter_cifar-inl.hpp:62:47: error: no member named 'dptr' in
'mshadow::Tensor<mshadow::cpu, 3, float>'; did you mean 'dptr_'?
out_.data.dptr = img_[ loc_ ].dptr;
^~~~
dptr_
./mshadow/mshadow/tensor.h:250:10: note: 'dptr_' declared here
DType *dptr_;
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [cxxnet_data.o] Error 1

MShadow TensorHolder Template Compilation

./mshadow/mshadow/./tensor_holder.h: In member function ‘mshadow::Shape mshadow::ShapeHolder::get() const’:
./mshadow/mshadow/./tensor_holder.h:67:38: error: invalid use of member (did you forget the ‘&’ ?)
./mshadow/mshadow/./tensor_holder.h:69:15: error: could not convert template argument ‘mshadow::ShapeHolder::ndim’ to ‘int’
./mshadow/mshadow/./tensor_holder.h:69:18: error: invalid type in declaration before ‘;’ token
./mshadow/mshadow/./tensor_holder.h:70:29: error: invalid use of member (did you forget the ‘&’ ?)
./mshadow/mshadow/./tensor_holder.h:71:10: error: invalid types ‘int[mshadow::index_t {aka unsigned int}]’ for array subscript
./mshadow/mshadow/./tensor_holder.h: In member function ‘mshadow::ShapeHolder& mshadow::ShapeHolder::operator=(const mshadow::Shape&)’:
./mshadow/mshadow/./tensor_holder.h:82:23: error: no matching function for call to ‘std::vector::resize()’
./mshadow/mshadow/./tensor_holder.h:82:23: note: candidate is:
/usr/include/c++/4.6/bits/stl_vector.h:629:7: note: void std::vector<_Tp, _Alloc>::resize(std::vector<_Tp, _Alloc>::size_type, std::vector<_Tp, _Alloc>::value_type) [with _Tp = unsigned int, _Alloc = std::allocator, std::vector<_Tp, _Alloc>::size_type = long unsigned int, std::vector<_Tp, _Alloc>::value_type = unsigned int]
/usr/include/c++/4.6/bits/stl_vector.h:629:7: note: no known conversion for argument 1 from ‘’ to ‘long unsigned int’
./mshadow/mshadow/./tensor_holder.h:83:29: error: invalid use of member (did you forget the ‘&’ ?)
./mshadow/mshadow/./tensor_holder.h: In member function ‘bool mshadow::ShapeHolder::operator==(const mshadow::Shape&) const’:
./mshadow/mshadow/./tensor_holder.h:106:26: error: invalid use of member (did you forget the ‘&’ ?)
./mshadow/mshadow/./tensor_holder.h:107:29: error: invalid use of member (did you forget the ‘&’ ?)
make: *** [layer_cpu.o] Error 1

Lots of undefined references to MKL functions

Here is the error message from build.sh:

Fetch mshadow...
fatal: destination path 'mshadow' already exists and is not an empty directory.
g++ -Wall -g -O3 -msse3 -Wno-unknown-pragmas -funroll-loops -I./mshadow/  -o bin/cxxnet cxxnet/cxxnet_main.cpp cxxnet_data.o cxxnet_nnet_cpu.o cxxnet_nnet_gpu.o -lm -lcudart -lcublas -lmkl_core -lmkl_intel_lp64 -lmkl_intel_thread -liomp5 -lpthread -lcurand -lz `pkg-config --libs opencv`
/usr/bin/ld: skipping incompatible /usr/local/cuda/lib64/../lib/libcudart.so when searching for -lcudart
/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `mkl_spblas_zdia1ttluf__smout_par'
/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `mkl_lapack_dpotrf_host_pack'
/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `mkl_spblas_lp64_ccsr1cslnf__mvout_par'
/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `mkl_spblas_dcoo0stlnc__smout_par'
/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `mkl_spblas_lp64_zcsr1nhuuf__mmout_par'
(omitted)
/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_lapack_zlaed8'
/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_iss_dcsrilut'
/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_lapack_ao_ssytrd'
/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `mkl_spblas_lp64_dbsc_gauss'
/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `mkl_spblas_lp64_ccoo1ng__c__mmout_par'
collect2: error: ld returned 1 exit status
make: *** [bin/cxxnet] Error 1

And I checked with nm, giving:

$ nm -g /opt/intel/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_thread.so | grep 'mkl_spblas_zdia1ttluf__smout_par'
             U mkl_spblas_zdia1ttluf__smout_par

What should I do about this? Thank you.

How to configure CXXNET for more than 1 fullc path ?

Thanks for the great work on CXXNet. I would like to find out how we can configure cxxnet to have more than one fullc path, i.e. split from a max_pooling layer into two separate fullc paths, then recombine into a fullc layer with softmax.

I would appreciate it if the authors could advise.

Thanks
Dr Patrick
