
nvidia_libs_test's Issues

cuda/extras/CUPTI/include/cupti_result.h relocated in CUDA11.1

This file was relocated under /usr/local/cuda in CUDA 11.1 (installed from a local deb package), so the build fails with:

cuda_util.h:26:10: fatal error: cuda/extras/CUPTI/include/cupti_result.h: No such file or directory
 #include "cuda/extras/CUPTI/include/cupti_result.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Target //:cudnn_test failed to build
Use --verbose_failures to see the command lines of failed build steps.
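One possible way to tolerate both layouts is to probe for the header at compile time. The following is only a sketch under assumptions, not the project's actual fix: it needs C++17 for `__has_include`, the fallback `<cupti_result.h>` form assumes the CUPTI include directory is on the compiler's include path (under Bazel that also means exposing it in the build rules), and the `NVLT_HAVE_CUPTI_RESULT` macro and `HaveCuptiResultHeader` helper are hypothetical names introduced here for illustration.

```cpp
// Sketch only: include cupti_result.h from whichever location is visible.
// NVLT_HAVE_CUPTI_RESULT is a hypothetical macro, not part of the project.
#if defined(__has_include)
#  if __has_include("cuda/extras/CUPTI/include/cupti_result.h")
     // Path used by the project before the CUDA 11.1 relocation.
#    include "cuda/extras/CUPTI/include/cupti_result.h"
#    define NVLT_HAVE_CUPTI_RESULT 1
#  elif __has_include(<cupti_result.h>)
     // Assumed fallback: CUPTI include directory is on the include path.
#    include <cupti_result.h>
#    define NVLT_HAVE_CUPTI_RESULT 1
#  endif
#endif

#ifndef NVLT_HAVE_CUPTI_RESULT
#  define NVLT_HAVE_CUPTI_RESULT 0  // header not found in either location
#endif

// True when one of the two candidate headers was found at compile time.
constexpr bool HaveCuptiResultHeader() { return NVLT_HAVE_CUPTI_RESULT != 0; }
```

Code that depends on CUPTI could then guard itself with `#if NVLT_HAVE_CUPTI_RESULT` and report a clear error when the header is missing instead of failing on the `#include` line.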

An issue was reported when running cuda-memcheck --tool racecheck

When running the command "cuda-memcheck --tool racecheck --print-level error --flush-to-disk no --error-exitcode 1 /usr/bin/bazel run //:cudnn_test --action_env=CUDNN_PATH=/home/swqa/.vulcan/install/cuda --action_env=CUDA_PATH=/home/swqa/.vulcan/install/cuda -- --gtest_filter=CONVOLUTION_FWD_NCHW_TENSOR_OP_52x7x112x4_873x7x3x3_VALID_GetAlgo_v7" on a TITAN V, the following issue was reported:
"
[ RUN ] FromFile/ConvolutionTest.CompareResults/CONVOLUTION_FWD_NCHW_TENSOR_OP_52x7x112x4_873x7x3x3_VALID_GetAlgo_v7
F1023 04:04:30.495419 17575 cuda_util.cc:92] Check failed: OkStatus() == GetStatus(cudaFree(ptr_)) (ok vs. CUDA Runtime API error 'an illegal memory access was encountered')
*** Check failure stack trace: ***
@ 0x186dde0 google::LogMessage::Fail()
@ 0x186dd24 google::LogMessage::SendToLog()
@ 0x186d675 google::LogMessage::Flush()
@ 0x1870aee google::LogMessageFatal::~LogMessageFatal()
@ 0x46c42b nvidia_libs_test::DeviceMemory::~DeviceMemory()
@ 0x40e9d9 _ZN16nvidia_libs_test12_GLOBAL__N_114RunConvolutionEddRKSt10unique_ptrI12cudnnContextNS_6detail18CudnnHandleDeleterEERKNS_11ConvolutionERKN4absl7variantIJ25cudnnConvolutionFwdAlgo_t29cudnnConvolutionBwdDataAlgo_t31cudnnConvolutionBwdFilterAlgo_tEEE
@ 0x410b42 nvidia_libs_test::(anonymous namespace)::ConvolutionTest_CompareResults_Test::TestBody()
@ 0x18bf017 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x18ba07f testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x189f35e testing::Test::Run()
@ 0x189fc50 testing::TestInfo::Run()
@ 0x18a02a5 testing::TestCase::Run()
@ 0x18a72a1 testing::internal::UnitTestImpl::RunAllTests()
@ 0x18bfd3f testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x18bacb5 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x18a5f0f testing::UnitTest::Run()
@ 0x451181 RUN_ALL_TESTS()
@ 0x4509e8 main
@ 0x7fb41c5ff830 __libc_start_main
@ 0x40d639 _start
@ (nil) (unknown)
========= CUDA-MEMCHECK
========= RACECHECK SUMMARY: 0 hazards displayed (0 errors, 0 warnings)
"

Thanks
Bo

Windows support

I would find this incredibly helpful for verifying the correctness of my cuDNN installation, but it appears to me that the Bazel code supports only Linux. (Please correct me if I am wrong.)

build failed

bazel run //:cudnn_benchmark
ERROR: /data//.cache/bazel/bazel/dad4edfde16590c5e6cb1df1644cc6bb/external/rules_proto/proto/private/native.bzl:22:19: name 'ProtoInfo' is not defined
ERROR: /data//tmp/nvidia_libs_test/BUILD:98:1: error loading package '@com_google_protobuf//': Extension 'proto/private/native.bzl' has errors and referenced by '//:cudnn_benchmark'
ERROR: Analysis of target '//:cudnn_benchmark' failed; build aborted: error loading package '@com_google_protobuf//': Extension 'proto/private/native.bzl' has errors
INFO: Elapsed time: 5.034s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (6 packages loaded)
currently loading: @com_google_protobuf// ... (3 packages)
ERROR: Build failed. Not running target

an intermittent error, failed on V100

bazel run //:cudnn_test --action_env=CUDNN_PATH=cuda9.0_cudnn_v7.4.1/cuda --action_env=CUDA_PATH=cuda -- --gtest_filter="Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME"
DEBUG: /home/lab/.vulcan/install/cuda/_tests/google_cudnn_test/codes/nvidia_libs_test-master/cuda_configure.bzl:12:5: Using CUDA from /home/lab/.vulcan/install/cuda
DEBUG: /home/lab/.vulcan/install/cuda/_tests/google_cudnn_test/codes/nvidia_libs_test-master/cuda_configure.bzl:13:5: Using cuDNN from /home/lab/bow/project/5_software/cuda9.0_cudnn_v7.4.1/cuda
INFO: Analysed target //:cudnn_test (0 packages loaded).
INFO: Found 1 target...
Target //:cudnn_test up-to-date:
bazel-bin/cudnn_test
INFO: Elapsed time: 0.112s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh ./cudnn_test '--gtest_filter=*Conv3d/ConvolutionTest.CompareRINFO: Build completed successfully, 1 total action
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //:cudnn_test
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1212 02:57:55.774238 18306 cudnn_util.cc:68] Running cuDNN v7.4.1 for CUDA 9.0.0 on Tesla V100-DGXS-16GB
Note: Google Test filter = Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Conv3d/ConvolutionTest
[ RUN ] Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME
cudnn_conv_test.cc:462: Failure
Value of: IsOk(TensorDataEqual(ref_result_data, *result_data, *result_desc, tolerance))
Actual: false (6 elements differ more than 10. Largest differences:
[2788]: 0.22229 vs nan, error = nan
[5904]: 0 vs nan, error = nan
[1744]: 0 vs nan, error = nan
[2784]: 0 vs nan, error = nan
[1748]: 0.221191 vs nan, error = nan
[5908]: 0.220581 vs nan, error = nan)
Expected: true
format: TENSOR_NCHW
data_type: DATA_HALF
compute_mode: DATA_HALF
math_type: DEFAULT_MATH
algo: CONVOLUTION_BWD_FILTER_ALGO_1
[ FAILED ] Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME, where GetParam() =
reference {
input {
dimension: 82
dimension: 4
dimension: 79
dimension: 9
dimension: 2
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
filter {
dimension: 12
dimension: 4
dimension: 2
dimension: 13
dimension: 5
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
convolution {
pad: 1
pad: 6
pad: 2
compute_mode: DATA_DOUBLE
}
one_minus_alpha: 0.99996569585949024
bwd_filter_algo: CONVOLUTION_BWD_FILTER_ALGO_0
label: "CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME"
}
test {
input {
data_type: DATA_HALF
format: TENSOR_NCHW
}
filter {
data_type: DATA_HALF
format: TENSOR_NCHW
}
convolution {
compute_mode: DATA_HALF
math_type: DEFAULT_MATH
}
all_algos: CONVOLUTION_BWD_FILTER
}
(21 ms)
[----------] 1 test from Conv3d/ConvolutionTest (21 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (21 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME, where GetParam() =
reference {
input {
dimension: 82
dimension: 4
dimension: 79
dimension: 9
dimension: 2
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
filter {
dimension: 12
dimension: 4
dimension: 2
dimension: 13
dimension: 5
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
convolution {
pad: 1
pad: 6
pad: 2
compute_mode: DATA_DOUBLE
}
one_minus_alpha: 0.99996569585949024
bwd_filter_algo: CONVOLUTION_BWD_FILTER_ALGO_0
label: "CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME"
}
test {
input {
data_type: DATA_HALF
format: TENSOR_NCHW
}
filter {
data_type: DATA_HALF
format: TENSOR_NCHW
}
convolution {
compute_mode: DATA_HALF
math_type: DEFAULT_MATH
}
all_algos: CONVOLUTION_BWD_FILTER
}

1 FAILED TEST
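The log above reports every NaN result as a mismatch ("0.22229 vs nan, error = nan"). A comparison has to be written carefully for that to happen, because any ordered comparison involving NaN is false. The following is a minimal sketch of a NaN-aware tolerance check, not the project's `TensorDataEqual`; the function name `CountMismatches` and the use of `double` vectors are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Counts elements whose absolute difference exceeds `tolerance`.
// The test is written as !(diff <= tolerance) rather than (diff > tolerance):
// if either value is NaN, diff is NaN, (diff <= tolerance) is false, and the
// element is counted as a mismatch instead of slipping through.
// Elements present in only one of the two vectors also count as mismatches.
std::size_t CountMismatches(const std::vector<double>& expected,
                            const std::vector<double>& actual,
                            double tolerance) {
  const std::size_t common = std::min(expected.size(), actual.size());
  std::size_t mismatches = std::max(expected.size(), actual.size()) - common;
  for (std::size_t i = 0; i < common; ++i) {
    const double diff = std::fabs(expected[i] - actual[i]);
    if (!(diff <= tolerance)) {
      ++mismatches;
    }
  }
  return mismatches;
}
```

With the naive form `diff > tolerance`, a NaN produced by the algorithm under test would compare false and be silently accepted, which is why the negated form is used in this sketch.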

kernel_timer.cc build failed with "error: 'uint64' was not declared in this scope domain_cycles.end(), uint64{0})"

Build cudnn_benchmark
bazel run //:cudnn_benchmark -c opt --action_env=CUDNN_PATH=/data/tmp/cuda --action_env=CUDA_PATH=/data/tmp/cuda
kernel_timer.cc build failed with the following error:
###############
kernel_timer.cc:194:68: error: 'uint64' was not declared in this scope
domain_cycles.end(), uint64{0});
^
################
This error was caused by the latest commit, a378e0f.
It should be uint64_t{0}.
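For reference, a small sketch of the corrected pattern. It assumes the failing line is the tail of a `std::accumulate` call over a container of 64-bit cycle counts, which the error message suggests but does not show in full; `SumCycles` is a hypothetical wrapper, not a function from kernel_timer.cc.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Sums cycle counts. std::accumulate uses the type of the initial value as
// its accumulator type, so std::uint64_t{0} keeps the running sum in 64 bits.
// Plain `uint64` (without _t) is not a standard type name, which is what the
// compiler rejects in the error above.
std::uint64_t SumCycles(const std::vector<std::uint64_t>& domain_cycles) {
  return std::accumulate(domain_cycles.begin(), domain_cycles.end(),
                         std::uint64_t{0});
}
```

Passing a plain `0` as the initial value would also compile, but the sum would then be accumulated as `int` and could overflow for large cycle counts, so the explicitly typed zero is the safer spelling.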
