google / nvidia_libs_test
Tests and benchmarks for cuDNN (and, in the future, other NVIDIA libraries)
License: Apache License 2.0
This file has been relocated under /usr/local/cuda in CUDA 11.1 (installed from a local .deb), so the build fails:
cuda_util.h:26:10: fatal error: cuda/extras/CUPTI/include/cupti_result.h: No such file or directory
#include "cuda/extras/CUPTI/include/cupti_result.h"
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Target //:cudnn_test failed to build
Use --verbose_failures to see the command lines of failed build steps.
When running the command "cuda-memcheck --tool racecheck --print-level error --flush-to-disk no --error-exitcode 1 /usr/bin/bazel run //:cudnn_test --action_env=CUDNN_PATH=/home/swqa/.vulcan/install/cuda --action_env=CUDA_PATH=/home/swqa/.vulcan/install/cuda -- --gtest_filter=CONVOLUTION_FWD_NCHW_TENSOR_OP_52x7x112x4_873x7x3x3_VALID_GetAlgo_v7" on a TITAN V, the following issue was reported:
"
[ RUN ] FromFile/ConvolutionTest.CompareResults/CONVOLUTION_FWD_NCHW_TENSOR_OP_52x7x112x4_873x7x3x3_VALID_GetAlgo_v7
F1023 04:04:30.495419 17575 cuda_util.cc:92] Check failed: OkStatus() == GetStatus(cudaFree(ptr_)) (ok vs. CUDA Runtime API error 'an illegal memory access was encountered')
*** Check failure stack trace: ***
@ 0x186dde0 google::LogMessage::Fail()
@ 0x186dd24 google::LogMessage::SendToLog()
@ 0x186d675 google::LogMessage::Flush()
@ 0x1870aee google::LogMessageFatal::~LogMessageFatal()
@ 0x46c42b nvidia_libs_test::DeviceMemory::~DeviceMemory()
@ 0x40e9d9 _ZN16nvidia_libs_test12_GLOBAL__N_114RunConvolutionEddRKSt10unique_ptrI12cudnnContextNS_6detail18CudnnHandleDeleterEERKNS_11ConvolutionERKN4absl7variantIJ25cudnnConvolutionFwdAlgo_t29cudnnConvolutionBwdDataAlgo_t31cudnnConvolutionBwdFilterAlgo_tEEE
@ 0x410b42 nvidia_libs_test::(anonymous namespace)::ConvolutionTest_CompareResults_Test::TestBody()
@ 0x18bf017 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x18ba07f testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x189f35e testing::Test::Run()
@ 0x189fc50 testing::TestInfo::Run()
@ 0x18a02a5 testing::TestCase::Run()
@ 0x18a72a1 testing::internal::UnitTestImpl::RunAllTests()
@ 0x18bfd3f testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x18bacb5 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x18a5f0f testing::UnitTest::Run()
@ 0x451181 RUN_ALL_TESTS()
@ 0x4509e8 main
@ 0x7fb41c5ff830 __libc_start_main
@ 0x40d639 _start
@ (nil) (unknown)
========= CUDA-MEMCHECK
========= RACECHECK SUMMARY: 0 hazards displayed (0 errors, 0 warnings)
"
Thanks
Bo
I would find this incredibly helpful for verifying the correctness of my cuDNN installation, but it appears to me that the Bazel code supports only Linux. (Please correct me if I am wrong.)
bazel run //:cudnn_benchmark
ERROR: /data//.cache/bazel/bazel/dad4edfde16590c5e6cb1df1644cc6bb/external/rules_proto/proto/private/native.bzl:22:19: name 'ProtoInfo' is not defined
ERROR: /data//tmp/nvidia_libs_test/BUILD:98:1: error loading package '@com_google_protobuf//': Extension 'proto/private/native.bzl' has errors and referenced by '//:cudnn_benchmark'
ERROR: Analysis of target '//:cudnn_benchmark' failed; build aborted: error loading package '@com_google_protobuf//': Extension 'proto/private/native.bzl' has errors
INFO: Elapsed time: 5.034s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (6 packages loaded)
currently loading: @com_google_protobuf// ... (3 packages)
ERROR: Build failed. Not running target
bazel run //:cudnn_test --action_env=CUDNN_PATH=cuda9.0_cudnn_v7.4.1/cuda --action_env=CUDA_PATH=cuda -- --gtest_filter="Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME"
DEBUG: /home/lab/.vulcan/install/cuda/_tests/google_cudnn_test/codes/nvidia_libs_test-master/cuda_configure.bzl:12:5: Using CUDA from /home/lab/.vulcan/install/cuda
DEBUG: /home/lab/.vulcan/install/cuda/_tests/google_cudnn_test/codes/nvidia_libs_test-master/cuda_configure.bzl:13:5: Using cuDNN from /home/lab/bow/project/5_software/cuda9.0_cudnn_v7.4.1/cuda
INFO: Analysed target //:cudnn_test (0 packages loaded).
INFO: Found 1 target...
Target //:cudnn_test up-to-date:
bazel-bin/cudnn_test
INFO: Elapsed time: 0.112s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh ./cudnn_test '--gtest_filter=*Conv3d/ConvolutionTest.CompareR
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //:cudnn_test
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1212 02:57:55.774238 18306 cudnn_util.cc:68] Running cuDNN v7.4.1 for CUDA 9.0.0 on Tesla V100-DGXS-16GB
Note: Google Test filter = Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Conv3d/ConvolutionTest
[ RUN ] Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME
cudnn_conv_test.cc:462: Failure
Value of: IsOk(TensorDataEqual(ref_result_data, *result_data, *result_desc, tolerance))
Actual: false (6 elements differ more than 10. Largest differences:
[2788]: 0.22229 vs nan, error = nan
[5904]: 0 vs nan, error = nan
[1744]: 0 vs nan, error = nan
[2784]: 0 vs nan, error = nan
[1748]: 0.221191 vs nan, error = nan
[5908]: 0.220581 vs nan, error = nan)
Expected: true
format: TENSOR_NCHW
data_type: DATA_HALF
compute_mode: DATA_HALF
math_type: DEFAULT_MATH
algo: CONVOLUTION_BWD_FILTER_ALGO_1
[ FAILED ] Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME, where GetParam() =
reference {
input {
dimension: 82
dimension: 4
dimension: 79
dimension: 9
dimension: 2
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
filter {
dimension: 12
dimension: 4
dimension: 2
dimension: 13
dimension: 5
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
convolution {
pad: 1
pad: 6
pad: 2
compute_mode: DATA_DOUBLE
}
one_minus_alpha: 0.99996569585949024
bwd_filter_algo: CONVOLUTION_BWD_FILTER_ALGO_0
label: "CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME"
}
test {
input {
data_type: DATA_HALF
format: TENSOR_NCHW
}
filter {
data_type: DATA_HALF
format: TENSOR_NCHW
}
convolution {
compute_mode: DATA_HALF
math_type: DEFAULT_MATH
}
all_algos: CONVOLUTION_BWD_FILTER
}
(21 ms)
[----------] 1 test from Conv3d/ConvolutionTest (21 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (21 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME, where GetParam() =
reference {
input {
dimension: 82
dimension: 4
dimension: 79
dimension: 9
dimension: 2
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
filter {
dimension: 12
dimension: 4
dimension: 2
dimension: 13
dimension: 5
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
convolution {
pad: 1
pad: 6
pad: 2
compute_mode: DATA_DOUBLE
}
one_minus_alpha: 0.99996569585949024
bwd_filter_algo: CONVOLUTION_BWD_FILTER_ALGO_0
label: "CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME"
}
test {
input {
data_type: DATA_HALF
format: TENSOR_NCHW
}
filter {
data_type: DATA_HALF
format: TENSOR_NCHW
}
convolution {
compute_mode: DATA_HALF
math_type: DEFAULT_MATH
}
all_algos: CONVOLUTION_BWD_FILTER
}
1 FAILED TEST
Build cudnn_benchmark
bazel run //:cudnn_benchmark -c opt --action_env=CUDNN_PATH=/data/tmp/cuda --action_env=CUDA_PATH=/data/tmp/cuda
kernel_timer.cc fails to build with the following error:
###############
kernel_timer.cc:194:68: error: 'uint64' was not declared in this scope
domain_cycles.end(), uint64{0});
^
################
This error was caused by the latest commit a378e0f.
It should be uint64_t{0}.