Comments (16)

gangliao commented on May 3, 2024

Hi, invalid device function indicates a CUDA / GPU architecture incompatibility. Since you use a GTX 1080, you can modify the CMake configuration to fix it.

Open cmake/flags.cmake and add the following code:

if (CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
endif()

Then rebuild the project.
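
For reference, the GTX 1080 itself is compute capability 6.1 (Pascal); sm_60 code still runs on it, since cubins are forward-compatible within a major compute capability version. If you prefer to target the card exactly, the same edit works with 6.1 (a sketch, assuming your toolkit is CUDA 8.0, which if I remember correctly is the first release to support compute_61):

list(APPEND __arch_flags " -gencode arch=compute_61,code=sm_61")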


stoneyang commented on May 3, 2024

@hedaoyuan Hi, thanks for your reply! Do you mean adding GLOG_v=3 in train.sh under demo/image_classification? I tried this, and Paddle gives me the following error log (it seems a little different from the one in the original post):

I1004 10:36:08.486361  8402 Util.cpp:151] commandline: ../../build/paddle/trainer/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I1004 10:36:14.866709  8402 Util.cpp:126] Calling runInitFunctions
I1004 10:36:14.866930  8402 Util.cpp:139] Call runInitFunctions done.
I1004 10:36:14.875442  8402 TrainerConfigHelper.cpp:55] Parsing trainer config vgg_16_cifar.py
[INFO 2016-10-04 10:36:15,093 layers.py:1620] channels=3 size=3072
[INFO 2016-10-04 10:36:15,093 layers.py:1620] output size for __conv_0__ is 32
[INFO 2016-10-04 10:36:15,096 layers.py:1620] channels=64 size=65536
[INFO 2016-10-04 10:36:15,097 layers.py:1620] output size for __conv_1__ is 32
[INFO 2016-10-04 10:36:15,099 layers.py:1681] output size for __pool_0__ is 16*16
[INFO 2016-10-04 10:36:15,100 layers.py:1620] channels=64 size=16384
[INFO 2016-10-04 10:36:15,100 layers.py:1620] output size for __conv_2__ is 16
[INFO 2016-10-04 10:36:15,101 layers.py:1620] channels=128 size=32768
[INFO 2016-10-04 10:36:15,102 layers.py:1620] output size for __conv_3__ is 16
[INFO 2016-10-04 10:36:15,103 layers.py:1681] output size for __pool_1__ is 8*8
[INFO 2016-10-04 10:36:15,103 layers.py:1620] channels=128 size=8192
[INFO 2016-10-04 10:36:15,104 layers.py:1620] output size for __conv_4__ is 8
[INFO 2016-10-04 10:36:15,105 layers.py:1620] channels=256 size=16384
[INFO 2016-10-04 10:36:15,105 layers.py:1620] output size for __conv_5__ is 8
[INFO 2016-10-04 10:36:15,106 layers.py:1620] channels=256 size=16384
[INFO 2016-10-04 10:36:15,106 layers.py:1620] output size for __conv_6__ is 8
[INFO 2016-10-04 10:36:15,108 layers.py:1681] output size for __pool_2__ is 4*4
[INFO 2016-10-04 10:36:15,108 layers.py:1620] channels=256 size=4096
[INFO 2016-10-04 10:36:15,108 layers.py:1620] output size for __conv_7__ is 4
[INFO 2016-10-04 10:36:15,110 layers.py:1620] channels=512 size=8192
[INFO 2016-10-04 10:36:15,110 layers.py:1620] output size for __conv_8__ is 4
[INFO 2016-10-04 10:36:15,111 layers.py:1620] channels=512 size=8192
[INFO 2016-10-04 10:36:15,111 layers.py:1620] output size for __conv_9__ is 4
[INFO 2016-10-04 10:36:15,112 layers.py:1681] output size for __pool_3__ is 2*2
[INFO 2016-10-04 10:36:15,113 layers.py:1681] output size for __pool_4__ is 1*1
[INFO 2016-10-04 10:36:15,115 networks.py:1125] The input order is [image, label]
[INFO 2016-10-04 10:36:15,115 networks.py:1132] The output order is [__cost_0__]
I1004 10:36:15.138290  8402 Trainer.cpp:170] trainer mode: Normal
F1004 10:36:15.139696  8402 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***
    @     0x7fdd6ccaedaa  (unknown)
    @     0x7fdd6ccaece4  (unknown)
    @     0x7fdd6ccae6e6  (unknown)
    @     0x7fdd6ccb1687  (unknown)
    @           0x78ffa9  hl_gpu_apply_unary_op<>()
    @           0x758d2f  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x758919  paddle::BaseMatrixT<>::applyUnary<>()
    @           0x742e9f  paddle::BaseMatrixT<>::zero()
    @           0x62c94e  paddle::Parameter::enableType()
    @           0x628e0c  paddle::parameterInitNN()
    @           0x62b27b  paddle::NeuralNetwork::init()
    @           0x630e93  paddle::GradientMachine::create()
    @           0x6ad345  paddle::TrainerInternal::init()
    @           0x6a9727  paddle::Trainer::init()
    @           0x542a65  main
    @     0x7fdd6bebaf45  (unknown)
    @           0x54e355  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
No data to plot. Exiting!


gangliao commented on May 3, 2024

@stoneyang Fixed by Fix CUDA_VERSION Comparison #165


reyoung commented on May 3, 2024

Duplicate of #3.

The CUDA 8.0 support issue is being resolved.


reyoung commented on May 3, 2024

Please track #3 for recent progress.


stoneyang commented on May 3, 2024

Hi, @reyoung

The problem I posted here is an issue with using PaddlePaddle AFTER SUCCESSFULLY BUILDING THE CODE, which is different from the #3 you referenced. As far as I know, that reporter hit an issue in the build process with CUDA 8.0, and that issue can easily be resolved by my merged PR #15.

So I don't think the two issues can be treated as the same problem.

Appended is my nvcc version for your reference:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

Hope it helps :)


stoneyang commented on May 3, 2024

Thanks for your reply @gangliao !
I'll try it later.


stoneyang commented on May 3, 2024

@gangliao I get the same errors when running train.sh as in my original post ....

I appended the following lines at the end of flags.cmake under the cmake directory:

if (CUDA_VERSION VERSION_GREATER "7.0")
    list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
endif()

if(CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_52")
endif()

set(CUDA_NVCC_FLAGS ${__arch_flags} ${CUDA_NVCC_FLAGS})

Then I ran the same build steps, and the build did succeed.

P.S.: I got the same error when I directly commented out the if() ... endif() block for the 7.0 check and configured only the 8.0 one:

# if (CUDA_VERSION VERSION_GREATER "7.0")
#     list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
# endif()

if(CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_52")
endif()

set(CUDA_NVCC_FLAGS ${__arch_flags} ${CUDA_NVCC_FLAGS})


gangliao commented on May 3, 2024

@stoneyang Why do you set sm_52? How about -gencode arch=compute_60,code=sm_60?

if (CUDA_VERSION VERSION_GREATER "7.0")
    list(APPEND __arch_flags " -gencode arch=compute_52,code=sm_52")
endif()

if(CUDA_VERSION VERSION_GREATER "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
endif()

set(CUDA_NVCC_FLAGS ${__arch_flags} ${CUDA_NVCC_FLAGS})
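
After rebuilding, running make VERBOSE=1 should echo the full nvcc command lines, so you can double-check which -gencode flags actually took effect.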


stoneyang commented on May 3, 2024

@gangliao sorry for the typo ....

I changed those lines (see PR #40), and the new CUDA code generation configuration, which differs from your suggestion, works fine. :)

foreach(capability 30 35 50 52 60)
    list(APPEND __arch_flags " -gencode arch=compute_${capability},code=sm_${capability}")
endforeach()

Please note that the former if() ... endif() blocks for CUDA 7.0 and later are commented out, since they are invalid (they kept producing the errors in my original post). This might be a CMake issue, but I do not have enough time to verify it for now.
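
One guess, which I have not had time to verify: CMake's VERSION_GREATER is a strict comparison, so if (CUDA_VERSION VERSION_GREATER "8.0") can never fire for CUDA 8.0 itself (my nvcc reports V8.0.26), and only the sm_52 flag from the 7.0 block ever reached nvcc. If that reading is right, a minimal sketch of an inclusive check would be:

if (NOT CUDA_VERSION VERSION_LESS "8.0")
    list(APPEND __arch_flags " -gencode arch=compute_60,code=sm_60")
endif()

With the foreach() above, the distinction no longer matters, since the flags are appended unconditionally.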

But a new error occurs when running:

$ sh train.sh

The log:

I0905 13:50:01.279917 20299 Util.cpp:144] commandline: /path/to/Paddle/build/paddle/trainer/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I0905 13:50:02.728484 20299 Util.cpp:113] Calling runInitFunctions
I0905 13:50:02.728711 20299 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-05 13:50:02,782 layers.py:1438] channels=3 size=3072
[INFO 2016-09-05 13:50:02,782 layers.py:1438] output size for __conv_0__ is 32
[INFO 2016-09-05 13:50:02,784 layers.py:1438] channels=64 size=65536
[INFO 2016-09-05 13:50:02,784 layers.py:1438] output size for __conv_1__ is 32
[INFO 2016-09-05 13:50:02,785 layers.py:1499] output size for __pool_0__ is 16*16
[INFO 2016-09-05 13:50:02,786 layers.py:1438] channels=64 size=16384
[INFO 2016-09-05 13:50:02,786 layers.py:1438] output size for __conv_2__ is 16
[INFO 2016-09-05 13:50:02,787 layers.py:1438] channels=128 size=32768
[INFO 2016-09-05 13:50:02,787 layers.py:1438] output size for __conv_3__ is 16
[INFO 2016-09-05 13:50:02,788 layers.py:1499] output size for __pool_1__ is 8*8
[INFO 2016-09-05 13:50:02,789 layers.py:1438] channels=128 size=8192
[INFO 2016-09-05 13:50:02,789 layers.py:1438] output size for __conv_4__ is 8
[INFO 2016-09-05 13:50:02,790 layers.py:1438] channels=256 size=16384
[INFO 2016-09-05 13:50:02,790 layers.py:1438] output size for __conv_5__ is 8
[INFO 2016-09-05 13:50:02,791 layers.py:1438] channels=256 size=16384
[INFO 2016-09-05 13:50:02,792 layers.py:1438] output size for __conv_6__ is 8
[INFO 2016-09-05 13:50:02,793 layers.py:1499] output size for __pool_2__ is 4*4
[INFO 2016-09-05 13:50:02,793 layers.py:1438] channels=256 size=4096
[INFO 2016-09-05 13:50:02,793 layers.py:1438] output size for __conv_7__ is 4
[INFO 2016-09-05 13:50:02,794 layers.py:1438] channels=512 size=8192
[INFO 2016-09-05 13:50:02,795 layers.py:1438] output size for __conv_8__ is 4
[INFO 2016-09-05 13:50:02,796 layers.py:1438] channels=512 size=8192
[INFO 2016-09-05 13:50:02,796 layers.py:1438] output size for __conv_9__ is 4
[INFO 2016-09-05 13:50:02,797 layers.py:1499] output size for __pool_3__ is 2*2
[INFO 2016-09-05 13:50:02,797 layers.py:1499] output size for __pool_4__ is 1*1
[INFO 2016-09-05 13:50:02,799 networks.py:1122] The input order is [image, label]
[INFO 2016-09-05 13:50:02,799 networks.py:1129] The output order is [__cost_0__]
I0905 13:50:02.806649 20299 Trainer.cpp:169] trainer mode: Normal
I0905 13:50:02.840154 20299 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-09-05 13:50:03,068 image_provider.py:52] Image size: 32
[INFO 2016-09-05 13:50:03,069 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-09-05 13:50:03,069 image_provider.py:58] DataProvider Initialization finished
I0905 13:50:03.069367 20299 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-09-05 13:50:03,069 image_provider.py:52] Image size: 32
[INFO 2016-09-05 13:50:03,069 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-09-05 13:50:03,070 image_provider.py:58] DataProvider Initialization finished
I0905 13:50:03.070163 20299 GradientMachine.cpp:134] Initing parameters..
I0905 13:50:03.706565 20299 GradientMachine.cpp:141] Init parameters done.
.........
I0905 13:50:44.860791 20299 TrainerInternal.cpp:162]  Batch=100 samples=12800 AvgCost=2.34658 CurrentCost=2.34658 Eval: classification_error_evaluator=0.825234  CurrentEval: classification_error_evaluator=0.825234
.........
I0905 13:50:52.310623 20299 TrainerInternal.cpp:162]  Batch=200 samples=25600 AvgCost=2.16351 CurrentCost=1.98044 Eval: classification_error_evaluator=0.782852  CurrentEval: classification_error_evaluator=0.740469
.........
I0905 13:50:59.720386 20299 TrainerInternal.cpp:162]  Batch=300 samples=38400 AvgCost=2.01217 CurrentCost=1.70949 Eval: classification_error_evaluator=0.743021  CurrentEval: classification_error_evaluator=0.663359
.........I0905 13:51:06.456202 20299 TrainerInternal.cpp:179]  Pass=0 Batch=391 samples=50048 AvgCost=1.89668 Eval: classification_error_evaluator=0.705
F0905 13:51:08.651224 20299 hl_cuda_cudnn.cc:779] Check failed: CUDNN_STATUS_SUCCESS == cudnnStat (0 vs. 5) Cudnn Error: CUDNN_STATUS_INVALID_VALUE
*** Check failure stack trace: ***
    @     0x7fc6006fddaa  (unknown)
    @     0x7fc6006fdce4  (unknown)
    @     0x7fc6006fd6e6  (unknown)
    @     0x7fc600700687  (unknown)
    @           0x8b4594  hl_convolution_forward()
    @           0x5b3fac  paddle::CudnnConvLayer::forward()
    @           0x62a01c  paddle::NeuralNetwork::forward()
    @           0x6bdaff  paddle::Tester::testOneBatch()
    @           0x6be412  paddle::Tester::testOnePeriod()
    @           0x6a28d4  paddle::Trainer::trainOnePass()
    @           0x6a5cc7  paddle::Trainer::train()
    @           0x5439f3  main
    @     0x7fc5ff909f45  (unknown)
    @           0x54efd5  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

Looks like a cuDNN issue with the function cudnnConvolutionForward() defined in NVIDIA's cuDNN library. Might it be a wrapper issue?


gangliao commented on May 3, 2024

The K20/K40 GPUs work fine on this demo. Currently, this bug only appears on GTX GPUs; one of my colleagues has already reproduced it and will solve it soon. Thanks.


stoneyang commented on May 3, 2024

Thanks for reopening this issue! @gangliao

Appended is the version of cuDNN: 5.0.5.

Hope it is helpful.


hedaoyuan commented on May 3, 2024

Fixed by #107, and closed.


stoneyang commented on May 3, 2024

The same issue as in the original report still persists after fetching the new commits and re-running make. Any further suggestions? @gangliao @hedaoyuan


hedaoyuan commented on May 3, 2024

@stoneyang Can you try adding the GLOG_v=3 option on the command line (like this: GLOG_v=3 paddle ...)? This will print out more debug information.
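
glog also reads GLOG_v from the environment, so for the image_classification demo it should work to prepend the variable to the trainer invocation inside train.sh, e.g. GLOG_v=3 ../../build/paddle/trainer/paddle_trainer ... (the command line shown in your log).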


stoneyang commented on May 3, 2024

@gangliao, seems perfect now!

