mnist-cudnn's Introduction

cuda-for-deep-learning

Transparent cuDNN / cuBLAS usage for deep learning training on the MNIST dataset.

How to use

$ git clone https://github.com/haanjack/cudnn-mnist-training
$ cd cudnn-mnist-training
$ bash download-mnist-dataset.sh
$ make
$ ./train

Expected output

== MNIST training with CUDNN ==
[TRAIN]
loading ./dataset/train-images-idx3-ubyte
loaded 60000 items..
.. model Configuration ..
CUDA: conv1
CUDA: pool
CUDA: conv2
CUDA: pool
CUDA: dense1
CUDA: relu
CUDA: dense2
CUDA: softmax
.. initialized conv1 layer ..
.. initialized conv2 layer ..
.. initialized dense1 layer ..
.. initialized dense2 layer ..
step:  200, loss: 0.561, accuracy: 75.762%
step:  400, loss: 2.754, accuracy: 96.574%
step:  600, loss: 0.157, accuracy: 97.004%
step:  800, loss: 0.005, accuracy: 97.006%
step: 1000, loss: 0.178, accuracy: 97.016%
step: 1200, loss: 0.014, accuracy: 96.998%
step: 1400, loss: 0.854, accuracy: 96.998%
step: 1600, loss: 0.165, accuracy: 96.984%
step: 1800, loss: 0.051, accuracy: 97.006%
step: 2000, loss: 0.284, accuracy: 97.025%
step: 2200, loss: 0.002, accuracy: 96.996%
step: 2400, loss: 0.013, accuracy: 96.990%
[INFERENCE]
loading ./dataset/t10k-images-idx3-ubyte
loaded 10000 items..
loss: 3.165, accuracy: 85.500%
Done.

Features

  • Parameter saving and loading
  • Network modification
  • Learning rate modification
  • Dataset shuffling
  • Testing
  • Add more layers

All these features require re-compilation.

mnist-cudnn's People

Contributors

aethocesora, haanjack, linlll


mnist-cudnn's Issues

Compiler error

The obj directory is not in the repository. The Makefile defines

SRC = src
OBJ_DIR = obj

and then creates the directory with $(shell mkdir -p $(OBJ_DIR)), which runs on every make invocation and is slow. It would be better to ship an obj directory alongside src; a simple fix is to run mkdir obj once ^^
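A sketch of a cleaner fix, assuming the Makefile really does use an OBJ_DIR variable as quoted above: an order-only prerequisite makes the directory exactly once when a target needs it, instead of running `mkdir -p` through $(shell ...) on every invocation.

```make
# Hypothetical fragment, not the project's actual Makefile.
OBJ_DIR = obj

# Order-only prerequisite (after the |): obj/ must exist before the
# object file is built, but its timestamp never forces a rebuild.
$(OBJ_DIR)/%.o: src/%.cpp | $(OBJ_DIR)
	$(NVCC) $(NVCC_FLAGS) -c $< -o $@

$(OBJ_DIR):
	mkdir -p $(OBJ_DIR)
```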

How to run inference from saved parameters without the training procedure

Hello, I want to run inference from saved parameters, without training. When I comment out the training loop below, the inference result seems incorrect. I'm not sure how to run inference only.

while (step < num_steps_train)
{
    ....
}

The inference result is shown below, with load_pretrain set to true.

== MNIST training with CUDNN ==
[TRAIN]
loading ./dataset/train-images-idx3-ubyte
loaded 60000 items..
.. model Configuration ..
CUDA: conv1
CUDA: pool
CUDA: conv2
CUDA: pool
CUDA: dense1
CUDA: relu
CUDA: dense2
CUDA: softmax
[INFERENCE]
loading ./dataset/t10k-images-idx3-ubyte
loaded 10000 items..
loss:    0, accuracy: 8.5%
Done.

How to build a ResNet network to train with CUDA

Although deep learning frameworks (e.g., TensorFlow, PyTorch) are commonly used to train models, I still want to try building a ResNet model using the cuDNN/cuBLAS libraries for deep learning training.
Could you give me some suggestions on how to create a classic deep learning network (e.g., ResNet) with CUDA?
Thank you very much!
Actually, for ResNet-18, only the Pad/AddV2/Reshape/Mean ops need to be implemented.

Did not use all the data (images) during training or inference

Hi,
Thanks very much for your project; I learned a lot from it. But it seems that it does not use all the data (images) during training or inference. I modified the code a little to keep track of the indexes of all the pictures used:

  1. I defined a public member in class MNIST: public: std::vector<int> idx_store;

  2. I recorded the index of every picture used (in void MNIST::get_batch()):

    for (int i = 0; i < batch_size_; i++) {
        std::copy(data_pool_[data_idx + i].data(),
            &data_pool_[data_idx + i].data()[data_size],
            &data_->ptr()[data_size * i]);
        idx_store.push_back(data_idx + i); // added
    }
  3. After the training, I searched for an index greater than 500, but found none:

    while (step < num_steps_train) {
        /* training... */
    }
    
    auto idx = train_data_loader.idx_store;
    for (int i = 0; i < idx.size(); i++) {
        if (idx[i] > 500) {
            std::cout << idx[i] << std::endl; // debug here
        }
    }

Then I figured out what the problem was: the following code in void MNIST::get_batch()

int data_idx = (step_ * batch_size_) % num_steps_;

limits the range of data_idx to [0, num_steps_), but it should cover [0, 60000) for training (or [0, 10000) for test), so it only needs to be changed to

int data_idx = step_ % num_steps_ * batch_size_;

After this modification, here is the result running on my machine:

[INFERENCE]
loading ./dataset/t10k-images.idx3-ubyte
loaded 10000 items..
conv1: Available Algorithm Count [FWD]: 10
conv1: Available Algorithm Count [BWD-filter]: 9
conv1: Available Algorithm Count [BWD-data]: 8
conv2: Available Algorithm Count [FWD]: 10
conv2: Available Algorithm Count [BWD-filter]: 9
conv2: Available Algorithm Count [BWD-data]: 8
loss: 0.145, accuracy: 90.050%
Done.

What are these identifiers? The compiler says they're undefined.

Hi.

Thanks for your code first. I can run cudnn/cuda deep learning example code successfully with yours.
By the way, I have a question. When I build convolution using the Makefile, the following errors occur.
What can I do? What do these identifiers mean?

/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -g -std=c++11 -G --resource-usage -Xcompiler -rdynamic -Xcompiler -fopenmp -rdc=true -lnvToolsExt -I/usr/local/cuda/samples/common/inc -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcublas -lcudnn -lgomp -lcurand -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -o convolution convolution.cu
convolution.cu(82): error: identifier "CUDNN_CONVOLUTION_FWD_PREFER_FASTEST" is undefined
convolution.cu(82): error: identifier "cudnnGetConvolutionForwardAlgorithm" is undefined
convolution.cu(87): error: identifier "CUDNN_CONVOLUTION_BWD_FILTER_PREFER_FASTEST" is undefined
convolution.cu(87): error: identifier "cudnnGetConvolutionBackwardFilterAlgorithm" is undefined
convolution.cu(92): error: identifier "CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST" is undefined
convolution.cu(92): error: identifier "cudnnGetConvolutionBackwardDataAlgorithm" is undefined
6 errors detected in the compilation of "convolution.cu".

Compile error on cuDNN v8

When compiling with cuDNN v8.7, the following compilation errors occur.

convolution.cu(82): error: identifier "CUDNN_CONVOLUTION_FWD_PREFER_FASTEST" is undefined
convolution.cu(82): error: identifier "cudnnGetConvolutionForwardAlgorithm" is undefined
convolution.cu(87): error: identifier "CUDNN_CONVOLUTION_BWD_FILTER_PREFER_FASTEST" is undefined
convolution.cu(87): error: identifier "cudnnGetConvolutionBackwardFilterAlgorithm" is undefined
convolution.cu(92): error: identifier "CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST" is undefined
convolution.cu(92): error: identifier "cudnnGetConvolutionBackwardDataAlgorithm" is undefined

6 errors detected in the compilation of "convolution.cu".
make: *** [Makefile:43: convolution] Error 1
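These identifiers existed in cuDNN 7 but were removed in cuDNN 8, which is why the build fails on newer toolkits. A hedged sketch of one replacement path (descriptor setup is assumed to already exist, and this is not the project's code): the `_v7` query returns candidate algorithms ranked by expected performance instead of taking a preference enum. The same pattern applies to the backward-filter and backward-data calls. This fragment requires a CUDA/cuDNN installation to compile.

```cpp
#include <cudnn.h>

// Pick a forward convolution algorithm under cuDNN 8, replacing the
// removed cudnnGetConvolutionForwardAlgorithm +
// CUDNN_CONVOLUTION_FWD_PREFER_FASTEST combination.
cudnnConvolutionFwdAlgo_t pick_fwd_algo(
        cudnnHandle_t handle,
        cudnnTensorDescriptor_t xDesc,
        cudnnFilterDescriptor_t wDesc,
        cudnnConvolutionDescriptor_t convDesc,
        cudnnTensorDescriptor_t yDesc) {
    int returned = 0;
    cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
    cudnnGetConvolutionForwardAlgorithm_v7(
        handle, xDesc, wDesc, convDesc, yDesc,
        CUDNN_CONVOLUTION_FWD_ALGO_COUNT, &returned, perf);
    return perf[0].algo;  // results are sorted, fastest candidate first
}
```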

Facing errors related to the g++ version

The versions that I'm currently running are as follows:
CUDA - 10.0
OS - Linux-x86_64
cudnn-10.2-linux-x64-v7.6.5.32.tgz
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)

I tried to download and run the code as described in the README file, but I got the following errors:

/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/samples/common/inc -I/usr/local/cuda/include -m64 -g -std=c++11 -G --resource-usage -Xcompiler -rdynamic -Xcompiler -fopenmp -rdc=true -lnvToolsExt -I/usr/local/cuda/samples/common/inc -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcublas -lcudnn -lgomp -lcurand -gencode arch=compute_50,code=sm_50 -c train.cpp -o obj/train.o
nvcc warning : Resource usage is not shown as the final resource allocation is not done.
nvcc warning : The -c++11 flag is not supported with the configured host compiler. Flag will be ignored.
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/array:35,
from src/mnist.h:6,
from train.cpp:1:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/c++0x_warning.h:31:2: error: #error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.
In file included from src/mnist.h:13,
from train.cpp:1:
src/blob.h:45: error: expected ‘)’ before ‘<’ token
src/blob.h:97: error: ‘std::array’ has not been declared
src/blob.h:97: error: expected ‘,’ or ‘...’ before ‘<’ token
src/blob.h:103: error: ISO C++ forbids declaration of ‘array’ with no type
src/blob.h:103: error: invalid use of ‘::’
src/blob.h:103: error: expected ‘;’ before ‘<’ token
train.cpp:149: error: expected ‘;’ at end of input
train.cpp:149: error: expected ‘}’ at end of input
In file included from src/mnist.h:13,
from train.cpp:1:
src/blob.h: In destructor ‘cudl::Blob::~Blob()’:
src/blob.h:60: error: ‘is_tensor_’ was not declared in this scope
src/blob.h:61: error: ‘tensor_desc_’ was not declared in this scope
src/blob.h: In member function ‘void cudl::Blob::reset(int, int, int, int)’:
src/blob.h:87: error: ‘cudl::cuda’ cannot be used as a function
src/blob.h:90: error: ‘is_tensor_’ was not declared in this scope
src/blob.h:92: error: ‘tensor_desc_’ was not declared in this scope
src/blob.h: In member function ‘void cudl::Blob::reset(int)’:
src/blob.h:99: error: ‘size’ was not declared in this scope
src/blob.h: At global scope:
src/blob.h:100: error: expected unqualified-id at end of input
src/blob.h:100: error: expected ‘}’ at end of input
make: *** [obj/train.o] Error 1
