
nvidia_libs_test's Introduction

Tests and Benchmarks for cuDNN

The repository contains a set of convolution tests and benchmarks for NVIDIA's cuDNN library.

This is not an officially supported Google product.

Prerequisites

Install Bazel (see the Bazel installation instructions).

Install the CUDA Toolkit (CUDA 8 is the minimum supported version).

Install the cuDNN SDK (cuDNN 6 is the minimum supported version).

Common parameters

Bazel parameters:

  • -c opt: Build with optimizations. Recommended for benchmarks.

  • --action_env=CUDA_PATH=<path>: Path to the CUDA SDK directory. Default is /usr/local/cuda.

  • --action_env=CUDNN_PATH=<path>: Path to the cuDNN SDK directory. Default is CUDA_PATH.

  • --action_env=CC=<compiler>: Name of (or path to) the compiler. Examples: clang, gcc-6.
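The Bazel parameters above can be combined on one command line. The following is a minimal sketch that only prints the resulting command instead of running it; the helper name bazel_params and the example paths /opt/cudnn and clang are illustrative assumptions, while /usr/local/cuda is the documented CUDA_PATH default.

```shell
#!/bin/sh
# Hypothetical helper: bazel_params [CUDA_PATH] [CUDNN_PATH] [CC]
# Prints the Bazel parameters described above as a single line.
bazel_params() {
  cuda_path="${1:-/usr/local/cuda}"   # documented default for CUDA_PATH
  cudnn_path="${2:-$cuda_path}"       # documented default: same as CUDA_PATH
  compiler="${3:-}"                   # optional; omitted when empty
  printf '%s' "-c opt --action_env=CUDA_PATH=$cuda_path --action_env=CUDNN_PATH=$cudnn_path"
  if [ -n "$compiler" ]; then
    printf ' %s' "--action_env=CC=$compiler"
  fi
  printf '\n'
}

# Print (do not run) a full invocation; the paths here are placeholders.
echo "bazel run $(bazel_params /usr/local/cuda /opt/cudnn clang) //:cudnn_test"
```

Printing the command first makes it easy to check the paths before starting a long build.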

Executable parameters:

  • --cuda_device=<device_id>: CUDA device to use. Default is 0.

  • --device_memory_limit_mb=<size>: Upper limit of device memory (in megabytes) to use for cuDNN workspace after tensors have been allocated. Negative values specify an offset from the memory available at startup. Default is 4096.
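One reading of the --device_memory_limit_mb description is sketched below: a non-negative value is used as the limit directly, and a negative value is added to the memory available at startup. This is an interpretation of the text above, not the tool's actual code; the function name effective_limit_mb and the 16160 MB figure are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: effective_limit_mb LIMIT_MB AVAILABLE_MB
# Assumption: negative limits are offsets from the memory available at startup.
effective_limit_mb() {
  limit_mb="$1"
  available_mb="$2"
  if [ "$limit_mb" -lt 0 ]; then
    echo $((available_mb + limit_mb))
  else
    echo "$limit_mb"
  fi
}

effective_limit_mb 4096 16160    # default limit, prints 4096
effective_limit_mb -1024 16160   # leave 1024 MB free, prints 15136
```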

Test instructions

bazel run [bazel parameters] //:cudnn_test -- [test parameters]

Bazel can also run the tests in a sandbox, which allows crashes to be reported as failures. To run in a sandbox, replace 'bazel run ...' with 'bazel test ...' and prefix each test parameter with '--test_arg=':

bazel test [bazel parameters] //:cudnn_test --test_arg=[test parameter 1] ...
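The prefixing rule can be applied mechanically. The sketch below rewrites a list of test parameters into '--test_arg=' form and prints the resulting command without running it; the helper name to_test_args is hypothetical, and the single-quoting it emits assumes the parameters themselves contain no single quotes.

```shell
#!/bin/sh
# Hypothetical helper: prefix every argument with --test_arg= and quote it
# so that patterns such as *FWD* are not expanded when the line is pasted.
to_test_args() {
  for arg in "$@"; do
    printf " --test_arg='%s'" "$arg"
  done
}

# Print (do not run) the sandboxed equivalent of:
#   bazel run //:cudnn_test -- --gtest_filter='*FWD*' --gtest_random_seed=1
echo "bazel test //:cudnn_test$(to_test_args '--gtest_filter=*FWD*' '--gtest_random_seed=1')"
```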

Test parameters:

  • --gtest_filter=<pattern>: Only run tests that match the given pattern. Example pattern: '*FWD*'.

  • --proto_path=<path>: Path to textproto file that contains additional convolutions to run. Default is 'cudnn_tests.textproto'.

  • --gtest_random_seed=<value>: Seed for random generator. Changing the value produces tests with a different mix of tensor shapes, filter sizes, etc. Default is 0.

  • --gtest_also_run_disabled_tests: Include disabled tests (i.e. tests with names that start with DISABLED_).

  • --help for more options.

Benchmark instructions

bazel run [bazel parameters] //:cudnn_benchmark -- [benchmark parameters]

Benchmark parameters:

  • --benchmark_filter=<regex>: Only run benchmarks that match the given regex pattern. Example: 'BWD_DATA'.

  • --proto_path=<path>: Path to textproto file that contains additional convolutions to run. Default is 'cudnn_benchmarks.textproto'.

  • --timing=<method>: How to measure execution time. One of 'kernel-duration' (default), 'kernel-cycles', or 'host-duration'.

  • --help and --helpfull for more options.
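To compare the three timing methods on the same set of benchmarks, the invocation can be repeated once per method. This is a minimal sketch that prints the commands rather than running them; the helper name benchmark_cmds is hypothetical, and the flags are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: benchmark_cmds FILTER
# Prints one benchmark command per documented --timing method.
benchmark_cmds() {
  filter="$1"
  for method in kernel-duration kernel-cycles host-duration; do
    printf "bazel run -c opt //:cudnn_benchmark -- --benchmark_filter='%s' --timing=%s\n" \
      "$filter" "$method"
  done
}

benchmark_cmds BWD_DATA
```

Each printed line can be run as is, with further Bazel parameters such as --action_env=CUDA_PATH=<path> added before '//:cudnn_benchmark' if needed.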

nvidia_libs_test's People

Contributors

chsigg, nluehr

nvidia_libs_test's Issues

an intermittent error, failed on V100

bazel run //:cudnn_test --action_env=CUDNN_PATH=cuda9.0_cudnn_v7.4.1/cuda --action_env=CUDA_PATH=cuda -- --gtest_filter="Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME"
DEBUG: /home/lab/.vulcan/install/cuda/_tests/google_cudnn_test/codes/nvidia_libs_test-master/cuda_configure.bzl:12:5: Using CUDA from /home/lab/.vulcan/install/cuda
DEBUG: /home/lab/.vulcan/install/cuda/_tests/google_cudnn_test/codes/nvidia_libs_test-master/cuda_configure.bzl:13:5: Using cuDNN from /home/lab/bow/project/5_software/cuda9.0_cudnn_v7.4.1/cuda
INFO: Analysed target //:cudnn_test (0 packages loaded).
INFO: Found 1 target...
Target //:cudnn_test up-to-date:
bazel-bin/cudnn_test
INFO: Elapsed time: 0.112s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh ./cudnn_test '--gtest_filter=*Conv3d/ConvolutionTest.CompareRINFO: Build completed successfully, 1 total action
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //:cudnn_test
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1212 02:57:55.774238 18306 cudnn_util.cc:68] Running cuDNN v7.4.1 for CUDA 9.0.0 on Tesla V100-DGXS-16GB
Note: Google Test filter = Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Conv3d/ConvolutionTest
[ RUN ] Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME
cudnn_conv_test.cc:462: Failure
Value of: IsOk(TensorDataEqual(ref_result_data, *result_data, *result_desc, tolerance))
Actual: false (6 elements differ more than 10. Largest differences:
[2788]: 0.22229 vs nan, error = nan
[5904]: 0 vs nan, error = nan
[1744]: 0 vs nan, error = nan
[2784]: 0 vs nan, error = nan
[1748]: 0.221191 vs nan, error = nan
[5908]: 0.220581 vs nan, error = nan)
Expected: true
format: TENSOR_NCHW
data_type: DATA_HALF
compute_mode: DATA_HALF
math_type: DEFAULT_MATH
algo: CONVOLUTION_BWD_FILTER_ALGO_1
[ FAILED ] Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME, where GetParam() =
reference {
input {
dimension: 82
dimension: 4
dimension: 79
dimension: 9
dimension: 2
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
filter {
dimension: 12
dimension: 4
dimension: 2
dimension: 13
dimension: 5
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
convolution {
pad: 1
pad: 6
pad: 2
compute_mode: DATA_DOUBLE
}
one_minus_alpha: 0.99996569585949024
bwd_filter_algo: CONVOLUTION_BWD_FILTER_ALGO_0
label: "CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME"
}
test {
input {
data_type: DATA_HALF
format: TENSOR_NCHW
}
filter {
data_type: DATA_HALF
format: TENSOR_NCHW
}
convolution {
compute_mode: DATA_HALF
math_type: DEFAULT_MATH
}
all_algos: CONVOLUTION_BWD_FILTER
}
(21 ms)
[----------] 1 test from Conv3d/ConvolutionTest (21 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (21 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] Conv3d/ConvolutionTest.CompareResults/CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME, where GetParam() =
reference {
input {
dimension: 82
dimension: 4
dimension: 79
dimension: 9
dimension: 2
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
filter {
dimension: 12
dimension: 4
dimension: 2
dimension: 13
dimension: 5
data_type: DATA_DOUBLE
format: TENSOR_NCHW
}
convolution {
pad: 1
pad: 6
pad: 2
compute_mode: DATA_DOUBLE
}
one_minus_alpha: 0.99996569585949024
bwd_filter_algo: CONVOLUTION_BWD_FILTER_ALGO_0
label: "CONVOLUTION_BWD_FILTER_NCHW_TRUE_HALF_82x4x79x9x2_12x4x2x13x5_SAME"
}
test {
input {
data_type: DATA_HALF
format: TENSOR_NCHW
}
filter {
data_type: DATA_HALF
format: TENSOR_NCHW
}
convolution {
compute_mode: DATA_HALF
math_type: DEFAULT_MATH
}
all_algos: CONVOLUTION_BWD_FILTER
}

1 FAILED TEST

Windows support

I would find this incredibly helpful to verify the correctness of my cuDNN installation, but it appears to me that the Bazel code supports only Linux. (Please correct me if I am wrong.)

An issue was reported when running cuda-memcheck --tool racecheck

When running the command "cuda-memcheck --tool racecheck --print-level error --flush-to-disk no --error-exitcode 1 /usr/bin/bazel run //:cudnn_test --action_env=CUDNN_PATH=/home/swqa/.vulcan/install/cuda --action_env=CUDA_PATH=/home/swqa/.vulcan/install/cuda -- --gtest_filter=CONVOLUTION_FWD_NCHW_TENSOR_OP_52x7x112x4_873x7x3x3_VALID_GetAlgo_v7" on a TITAN V, the following issue was reported:
"
[ RUN ] FromFile/ConvolutionTest.CompareResults/CONVOLUTION_FWD_NCHW_TENSOR_OP_52x7x112x4_873x7x3x3_VALID_GetAlgo_v7
F1023 04:04:30.495419 17575 cuda_util.cc:92] Check failed: OkStatus() == GetStatus(cudaFree(ptr_)) (ok vs. CUDA Runtime API error 'an illegal memory access was encountered')
*** Check failure stack trace: ***
@ 0x186dde0 google::LogMessage::Fail()
@ 0x186dd24 google::LogMessage::SendToLog()
@ 0x186d675 google::LogMessage::Flush()
@ 0x1870aee google::LogMessageFatal::~LogMessageFatal()
@ 0x46c42b nvidia_libs_test::DeviceMemory::~DeviceMemory()
@ 0x40e9d9 _ZN16nvidia_libs_test12_GLOBAL__N_114RunConvolutionEddRKSt10unique_ptrI12cudnnContextNS_6detail18CudnnHandleDeleterEERKNS_11ConvolutionERKN4absl7variantIJ25cudnnConvolutionFwdAlgo_t29cudnnConvolutionBwdDataAlgo_t31cudnnConvolutionBwdFilterAlgo_tEEE
@ 0x410b42 nvidia_libs_test::(anonymous namespace)::ConvolutionTest_CompareResults_Test::TestBody()
@ 0x18bf017 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x18ba07f testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x189f35e testing::Test::Run()
@ 0x189fc50 testing::TestInfo::Run()
@ 0x18a02a5 testing::TestCase::Run()
@ 0x18a72a1 testing::internal::UnitTestImpl::RunAllTests()
@ 0x18bfd3f testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x18bacb5 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x18a5f0f testing::UnitTest::Run()
@ 0x451181 RUN_ALL_TESTS()
@ 0x4509e8 main
@ 0x7fb41c5ff830 __libc_start_main
@ 0x40d639 _start
@ (nil) (unknown)
========= CUDA-MEMCHECK
========= RACECHECK SUMMARY: 0 hazards displayed (0 errors, 0 warnings)
"

Thanks
Bo

cuda/extras/CUPTI/include/cupti_result.h relocated in CUDA11.1

This file was relocated under /usr/local/cuda in CUDA 11.1 (installed from the local .deb package), so the build fails with:

cuda_util.h:26:10: fatal error: cuda/extras/CUPTI/include/cupti_result.h: No such file or directory
 #include "cuda/extras/CUPTI/include/cupti_result.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Target //:cudnn_test failed to build
Use --verbose_failures to see the command lines of failed build steps.
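One way to check where the header ended up on a given installation is to search the CUDA directory for it. This is a minimal sketch; the helper name find_cupti_header is hypothetical, and the default root /usr/local/cuda is the one mentioned in this report.

```shell
#!/bin/sh
# Hypothetical helper: find_cupti_header [CUDA_ROOT]
# Prints every path under CUDA_ROOT that contains cupti_result.h.
find_cupti_header() {
  cuda_root="${1:-/usr/local/cuda}"
  # -L follows symlinks, since the CUDA root is often a symlink to a
  # versioned directory. Errors (e.g. a missing root) are ignored.
  find -L "$cuda_root" -name cupti_result.h -print 2>/dev/null || true
}

find_cupti_header "${CUDA_PATH:-/usr/local/cuda}"
```

If the search prints a path other than extras/CUPTI/include/cupti_result.h under the CUDA root, the include in cuda_util.h no longer matches the installed layout.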

build failed

bazel run //:cudnn_benchmark
ERROR: /data//.cache/bazel/bazel/dad4edfde16590c5e6cb1df1644cc6bb/external/rules_proto/proto/private/native.bzl:22:19: name 'ProtoInfo' is not defined
ERROR: /data//tmp/nvidia_libs_test/BUILD:98:1: error loading package '@com_google_protobuf//': Extension 'proto/private/native.bzl' has errors and referenced by '//:cudnn_benchmark'
ERROR: Analysis of target '//:cudnn_benchmark' failed; build aborted: error loading package '@com_google_protobuf//': Extension 'proto/private/native.bzl' has errors
INFO: Elapsed time: 5.034s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (6 packages loaded)
currently loading: @com_google_protobuf// ... (3 packages)
ERROR: Build failed. Not running target

kernel_timer.cc build failed with "error: 'uint64' was not declared in this scope domain_cycles.end(), uint64{0})"

Build cudnn_benchmark
bazel run //:cudnn_benchmark -c opt --action_env=CUDNN_PATH=/data/tmp/cuda --action_env=CUDA_PATH=/data/tmp/cuda
kernel_timer.cc build failed with the following error:
###############
kernel_timer.cc:194:68: error: 'uint64' was not declared in this scope
domain_cycles.end(), uint64{0});
^
################
This error was caused by the latest commit, a378e0f.
It should be uint64_t{0}.
