haanjack / mnist-cudnn
CUDA for MNIST training/inference
License: MIT License
The versions I'm currently running are as follows:
CUDA - 10.0
OS - Linux-x86_64
cudnn-10.2-linux-x64-v7.6.5.32.tgz
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
I tried to download and run the code as described in the README file, but I ran into the following errors:
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/samples/common/inc -I/usr/local/cuda/include -m64 -g -std=c++11 -G --resource-usage -Xcompiler -rdynamic -Xcompiler -fopenmp -rdc=true -lnvToolsExt -I/usr/local/cuda/samples/common/inc -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcublas -lcudnn -lgomp -lcurand -gencode arch=compute_50,code=sm_50 -c train.cpp -o obj/train.o
nvcc warning : Resource usage is not shown as the final resource allocation is not done.
nvcc warning : The -c++11 flag is not supported with the configured host compiler. Flag will be ignored.
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/array:35,
from src/mnist.h:6,
from train.cpp:1:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/c++0x_warning.h:31:2: error: #error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.
In file included from src/mnist.h:13,
from train.cpp:1:
src/blob.h:45: error: expected ‘)’ before ‘<’ token
src/blob.h:97: error: ‘std::array’ has not been declared
src/blob.h:97: error: expected ‘,’ or ‘...’ before ‘<’ token
src/blob.h:103: error: ISO C++ forbids declaration of ‘array’ with no type
src/blob.h:103: error: invalid use of ‘::’
src/blob.h:103: error: expected ‘;’ before ‘<’ token
train.cpp:149: error: expected ‘;’ at end of input
train.cpp:149: error: expected ‘}’ at end of input
In file included from src/mnist.h:13,
from train.cpp:1:
src/blob.h: In destructor ‘cudl::Blob::~Blob()’:
src/blob.h:60: error: ‘is_tensor_’ was not declared in this scope
src/blob.h:61: error: ‘tensor_desc_’ was not declared in this scope
src/blob.h: In member function ‘void cudl::Blob::reset(int, int, int, int)’:
src/blob.h:87: error: ‘cudl::cuda’ cannot be used as a function
src/blob.h:90: error: ‘is_tensor_’ was not declared in this scope
src/blob.h:92: error: ‘tensor_desc_’ was not declared in this scope
src/blob.h: In member function ‘void cudl::Blob::reset(int)’:
src/blob.h:99: error: ‘size’ was not declared in this scope
src/blob.h: At global scope:
src/blob.h:100: error: expected unqualified-id at end of input
src/blob.h:100: error: expected ‘}’ at end of input
make: *** [obj/train.o] Error 1
Hello, I want to run inference from saved parameters without training.
When I comment out the following code, the inference result seems incorrect. I'm not sure how to run inference only.
while (step < num_steps_train)
{
....
}
}
The inference result is shown below; load_pretrain is true.
== MNIST training with CUDNN ==
[TRAIN]
loading ./dataset/train-images-idx3-ubyte
loaded 60000 items..
.. model Configuration ..
CUDA: conv1
CUDA: pool
CUDA: conv2
CUDA: pool
CUDA: dense1
CUDA: relu
CUDA: dense2
CUDA: softmax
[INFERENCE]
loading ./dataset/t10k-images-idx3-ubyte
loaded 10000 items..
loss: 0, accuracy: 8.5%
Done.
When compiling on cuDNN v8.7, the following compilation error occurs.
convolution.cu(82): error: identifier "CUDNN_CONVOLUTION_FWD_PREFER_FASTEST" is undefined
convolution.cu(82): error: identifier "cudnnGetConvolutionForwardAlgorithm" is undefined
convolution.cu(87): error: identifier "CUDNN_CONVOLUTION_BWD_FILTER_PREFER_FASTEST" is undefined
convolution.cu(87): error: identifier "cudnnGetConvolutionBackwardFilterAlgorithm" is undefined
convolution.cu(92): error: identifier "CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST" is undefined
convolution.cu(92): error: identifier "cudnnGetConvolutionBackwardDataAlgorithm" is undefined
6 errors detected in the compilation of "convolution.cu".
make: *** [Makefile:43: convolution] Error 1
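cudnnGetConvolutionForwardAlgorithm and the *_PREFER_FASTEST preference constants were deprecated in cuDNN v7 and removed in v8, which is why v8.7 reports them as undefined. A minimal migration sketch using the _v7 query functions that remain in v8; the descriptor variable names (cudnn, input_desc, filter_desc, conv_desc, output_desc) are assumptions standing in for those actually used in convolution.cu, and the fragment needs the cuDNN headers and a GPU to build:

```cpp
// Sketch only: descriptor names are placeholders for those in convolution.cu.
// The _v7 variants return a list of algorithms sorted fastest-first, so
// taking perf[0].algo replaces the old PREFER_FASTEST query.
cudnnConvolutionFwdAlgoPerf_t fwd_perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
int returned_algos = 0;
cudnnGetConvolutionForwardAlgorithm_v7(
    cudnn, input_desc, filter_desc, conv_desc, output_desc,
    CUDNN_CONVOLUTION_FWD_ALGO_COUNT, &returned_algos, fwd_perf);
cudnnConvolutionFwdAlgo_t fwd_algo = fwd_perf[0].algo;

// The backward passes are analogous:
//   cudnnGetConvolutionBackwardFilterAlgorithm_v7(...) with
//     cudnnConvolutionBwdFilterAlgoPerf_t / CUDNN_CONVOLUTION_BWD_FILTER_ALGO_COUNT
//   cudnnGetConvolutionBackwardDataAlgorithm_v7(...) with
//     cudnnConvolutionBwdDataAlgoPerf_t / CUDNN_CONVOLUTION_BWD_DATA_ALGO_COUNT
```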
Hi,
Thanks very much for your project, and I learned a lot from it. But it seems that your project does not use all the data (pictures) during training or inference. I modified the code a little to keep track of the indexes of all the pictures used:
I defined a public member variable in class MNIST: public: std::vector<int> idx_store;
I stored the index of every picture used (in function void MNIST::get_batch()):
for (int i = 0; i < batch_size_; i++) {
std::copy(data_pool_[data_idx + i].data(),
&data_pool_[data_idx + i].data()[data_size],
&data_->ptr()[data_size * i]);
idx_store.push_back(data_idx + i); // added
}
After training, I searched for an index greater than 500, but found none:
while (step < num_steps_train) {
/* training... */
}
auto idx = train_data_loader.idx_store;
for (int i = 0; i < idx.size(); i++) {
if (idx[i] > 500) {
std::cout << idx[i] << std::endl; // debug here
}
}
Then I figured out what the problem was: the following line in function void MNIST::get_batch()
int data_idx = (step_ * batch_size_) % num_steps_;
limits the range of data_idx to [0, num_steps_), but it should be [0, 60000) for training (or [0, 10000) for test), so it only needs to be modified this way:
int data_idx = step_ % num_steps_ * batch_size_;
After this modification, here is the result running on my machine:
[INFERENCE]
loading ./dataset/t10k-images.idx3-ubyte
loaded 10000 items..
conv1: Available Algorithm Count [FWD]: 10
conv1: Available Algorithm Count [BWD-filter]: 9
conv1: Available Algorithm Count [BWD-data]: 8
conv2: Available Algorithm Count [FWD]: 10
conv2: Available Algorithm Count [BWD-filter]: 9
conv2: Available Algorithm Count [BWD-data]: 8
loss: 0.145, accuracy: 90.050%
Done.
Hi.
Thanks for your code first. I was able to run the cuDNN/CUDA deep learning example code successfully with it.
By the way, I have a question. When I build convolution with the Makefile, the following error occurs.
What can I do? What is the meaning of these identifiers?
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -g -std=c++11 -G --resource-usage -Xcompiler -rdynamic -Xcompiler -fopenmp -rdc=true -lnvToolsExt -I/usr/local/cuda/samples/common/inc -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcublas -lcudnn -lgomp -lcurand -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -o convolution convolution.cu
convolution.cu(82): error: identifier "CUDNN_CONVOLUTION_FWD_PREFER_FASTEST" is undefined
convolution.cu(82): error: identifier "cudnnGetConvolutionForwardAlgorithm" is undefined
convolution.cu(87): error: identifier "CUDNN_CONVOLUTION_BWD_FILTER_PREFER_FASTEST" is undefined
convolution.cu(87): error: identifier "cudnnGetConvolutionBackwardFilterAlgorithm" is undefined
convolution.cu(92): error: identifier "CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST" is undefined
convolution.cu(92): error: identifier "cudnnGetConvolutionBackwardDataAlgorithm" is undefined
6 errors detected in the compilation of "convolution.cu".
# The obj directory is not in the source tree
SRC = src
OBJ_DIR = obj
so the Makefile creates the directory instead,
which is a slow process.
It would be better to check in an obj directory just like the src directory.
A simple fix is mkdir obj ^^
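A Makefile-side alternative, sketched here on the assumption that the recipe uses NVCC/NVCC_FLAGS variables as in the project's Makefile: declare the directory as an order-only prerequisite so make creates it once without retriggering rebuilds.

```make
SRC = src
OBJ_DIR = obj

# Order-only prerequisite (after the |): obj/ is created if missing,
# but its timestamp never forces the .o files to rebuild.
$(OBJ_DIR)/%.o: $(SRC)/%.cpp | $(OBJ_DIR)
	$(NVCC) $(NVCC_FLAGS) -c $< -o $@

$(OBJ_DIR):
	mkdir -p $(OBJ_DIR)
```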
Although deep learning frameworks (e.g., TF, PyTorch, ...) are commonly used to train models, I still want to try building a ResNet model with the cuDNN/cuBLAS libraries for deep learning training.
Could you give me some suggestions on how to build a classic deep learning network (e.g., ResNet, ...) in CUDA?
Thank you very much!
Actually, for ResNet-18, only the Pad/AddV2/Reshape/Mean ops need to be implemented.