google / uvkcompute

A micro Vulkan compute pipeline and a collection of benchmarking compute shaders

License: Apache License 2.0

CMake 9.46% GLSL 11.61% C++ 75.50% Shell 0.47% PowerShell 0.37% Dockerfile 0.57% Python 1.08% C 0.94%

vulkan spirv benchmark glsl

uvkcompute's Introduction

µVkCompute

µVkCompute is a micro Vulkan compute pipeline and a collection of compute shaders for benchmarking/profiling purposes.

Rationale

Vulkan provides a ubiquitous way to access GPUs by many hardware vendors across different form factors on various platforms. The great reachability not only benefits graphics rendering; it can also be leveraged for general compute, given that Vulkan is both a graphics and compute API.

However, being able to target various GPUs does not mean one size fits all. Developers still need to understand the characteristics of the target hardware to get the best utilization. A simple pipeline and a collection of shaders that probe various characteristics of the target hardware often come in handy for this purpose. Hence this repository.

Goals

µVkCompute is meant to provide a straightforward compute pipeline to facilitate writing compute shader microbenchmarks. It tries to:

Hide the Vulkan boilerplate that is required for every Vulkan application, e.g., Vulkan instance and device creation.
Simplify shader resource management, e.g., using reflection over SPIR-V to construct pipeline layouts and compute pipelines.
Provide a thin wrapper over command buffer construction and shader dispatch.
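
To make the SPIR-V reflection item more concrete, here is a minimal, dependency-free sketch of the kind of information reflection extracts. It is not µVkCompute's implementation: CollectResourceSlots and ResourceSlot are hypothetical names introduced for illustration, and a real project would more likely use a reflection library. The sketch assumes a well-formed SPIR-V word stream and only records the DescriptorSet and Binding decorations, which are the inputs needed to describe a pipeline layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper type: the (set, binding) pair decorated onto one SPIR-V id.
struct ResourceSlot {
  uint32_t set = 0;
  uint32_t binding = 0;
};

// Walks a SPIR-V word stream and collects DescriptorSet/Binding decorations,
// keyed by the decorated result id. Returns an empty map for invalid input.
std::map<uint32_t, ResourceSlot> CollectResourceSlots(
    const std::vector<uint32_t>& words) {
  constexpr uint32_t kMagic = 0x07230203;        // SPIR-V magic number
  constexpr std::size_t kHeaderWords = 5;        // magic, version, generator, bound, schema
  constexpr uint32_t kOpDecorate = 71;           // opcode of OpDecorate
  constexpr uint32_t kDecorationBinding = 33;
  constexpr uint32_t kDecorationDescriptorSet = 34;

  std::map<uint32_t, ResourceSlot> slots;
  if (words.size() < kHeaderWords || words[0] != kMagic) return slots;

  std::size_t i = kHeaderWords;
  while (i < words.size()) {
    // The first word of each instruction packs the word count (high 16 bits)
    // and the opcode (low 16 bits).
    const uint32_t word_count = words[i] >> 16;
    const uint32_t opcode = words[i] & 0xFFFFu;
    if (word_count == 0 || i + word_count > words.size()) break;  // malformed stream

    if (opcode == kOpDecorate && word_count >= 4) {
      const uint32_t target = words[i + 1];
      const uint32_t decoration = words[i + 2];
      const uint32_t value = words[i + 3];
      if (decoration == kDecorationBinding) slots[target].binding = value;
      if (decoration == kDecorationDescriptorSet) slots[target].set = value;
    }
    i += word_count;
  }
  return slots;
}
```

For a module whose only decorations are DescriptorSet 2 and Binding 1 on id 3, the function returns a single entry mapping id 3 to set 2, binding 1. Each such entry corresponds to one descriptor binding that a pipeline layout would have to declare.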

µVkCompute focuses more on single compute shader dispatch. µVkCompute does not try to demonstrate Vulkan programming best practices. For example, it just uses the system allocator and allocates separate memory for each buffer. Simplicity is favored over building a production-level Vulkan application.

Dependencies

This repository requires a common C++ project development environment:

CMake with version >= 3.13
(Optional) the Ninja build system
A C/C++ compiler that supports C11/C++14
Python3

It additionally requires the Vulkan SDK, which is used for both the Vulkan shared library and shader compilers like glslc (for GLSL) and dxc (for HLSL). Please make sure you have set the VULKAN_SDK environment variable.

Building and Running

Android

git clone https://github.com/google/uVkCompute.git
cd uVkCompute
git submodule update --init

cmake -G Ninja -S ./ -B build-android/  \
  -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK?}/build/cmake/android.toolchain.cmake" \
  -DANDROID_ABI="arm64-v8a" -DANDROID_PLATFORM=android-29
cmake --build build-android/

Where ANDROID_NDK is the path to the Android NDK installation. See Android's CMake guide for an explanation of ANDROID_ABI and ANDROID_PLATFORM.

Afterwards, you can use adb push and adb shell to run the benchmark binaries generated into the build-android/ directory on Android devices. For example, for a benchmark binary bench at build-android/benchmarks/foo/bar/bench:

# Push the benchmark to the Android device
adb push build-android/benchmarks/foo/bar/bench /data/local/tmp
adb shell "cd /data/local/tmp && ./bench"

Note that for Android 10, if you see the "Failed to match any benchmarks against regex: ." error message, it means that no Vulkan ICDs (a.k.a., Vulkan vendor drivers) are discovered. This is a known issue that is fixed in Android 11. A workaround is to copy the Vulkan ICD (normally as /vendor/lib[64]/hw/vulkan.*.so) to /data/local/tmp and run the benchmark binary with LD_LIBRARY_PATH=/data/local/tmp.

Linux/macOS

git clone https://github.com/google/uVkCompute.git
cd uVkCompute
git submodule update --init

cmake -G Ninja -S ./ -B build/
cmake --build build/

Afterwards you can run the benchmark binaries generated into the build/ directory on the host machine.

Windows

git clone https://github.com/google/uVkCompute.git
cd uVkCompute
git submodule update --init

cmake -G "Visual Studio 16 2019" -A x64 -S ./ -B build/
cmake --build build/

Afterwards you can run the benchmark binaries generated into the build/ directory on the host machine.

uvkcompute's Issues

Build failing on Ubuntu 18.04

ninja: error: 'uvkc/benchmark/Vulkan::glslc', needed by 'uvkc/benchmark/void_shader_spirv_instance.inc', missing and no known rule to make it

Any idea how to deal with this?

Regards
Yao

Build Fail on Jetson Nano

I tried to build it on a Jetson Nano and encountered two errors:

`glslc` not found

The LunarG Vulkan SDK only provides an x86_64 version.
The glslang repo only provides glslangValidator and spirv-xxx.

Is it possible to build or install glslc for ARM?

absl error

uVkCompute/uvkc/vulkan/dynamic_symbols.cc:179:10: error: could not convert ‘syms’ from ‘std::unique_ptr<uvkc::vulkan::DynamicSymbols>’ to ‘absl::lts_20211102::StatusOr<std::unique_ptr<uvkc::vulkan::DynamicSymbols> >’
   return syms;

I have checked #14 and fixed it with cmake -DCMAKE_C_COMPILER=$(which clang) -DCMAKE_CXX_COMPILER=$(which clang++) ... If uVkCompute only supports clang, would you please review PR #16?

subgroup_arithmetic benchmark fails verification due to unexpected gl_SubgroupSize

Similar to #43 but for subgroup_arithmetic.

uVkCompute/benchmarks/subgroup/subgroup_arithmetic_main.cc:184: check error: destination buffer element #16 has incorrect value: expected to be 1 but found 32

Why does a large loop count cause problems on integrated GPUs?

I have a problem when writing GPGPU code with Vulkan. I don't know where to ask this question, so I am posting it here in search of an answer.

The problem comes from https://github.com/google/uVkCompute/tree/main/benchmarks/compute
I tried to run the same benchmark in my own Vulkan framework. I found that on integrated GPUs such as Intel GPUs, the result is wrong when kLoopSize is very large. When I reduce the operation count (only 4 operations per loop iteration), it works well there too.
The example works well on discrete GPUs from AMD and NVIDIA.

uVkCompute works well on both kinds of GPU, so why does this happen? It is hard to understand. The only difference I found in the pipeline is that I do not recreate the command buffer but reuse it from the command pool, and I do not think that would cause such a difference.

Fix instance creation error on VULKAN_SDK >= 1.3.216 by opting in to extension VK_KHR_portability_subset

Related issue posted here: Encountered VK_ERROR_INCOMPATIBLE_DRIVER.

Proposed fix:

Adds the VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR bit to the VkInstanceCreateInfo structure flags.
Adds VK_KHR_portability_enumeration, VK_KHR_get_physical_device_properties2 to the instance extensions list.
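
The two steps of the proposed fix can be sketched without the Vulkan headers. This is an illustration, not the actual patch: InstanceConfig and MakePortabilityAwareConfig are hypothetical names, and the flag constant mirrors the value of VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR from the Vulkan headers rather than including them. The sketch also assumes the caller has already queried the list of available instance extensions, and it only opts in when the portability enumeration extension is actually reported.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the parts of VkInstanceCreateInfo that the fix touches.
struct InstanceConfig {
  uint32_t flags = 0;                   // would become VkInstanceCreateInfo::flags
  std::vector<std::string> extensions;  // would become ppEnabledExtensionNames
};

// Builds the flags and extension list, opting in to portability enumeration
// only if the extension appears in `available`.
InstanceConfig MakePortabilityAwareConfig(
    const std::vector<std::string>& available) {
  // Mirrors VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR (0x00000001).
  constexpr uint32_t kEnumeratePortabilityBit = 0x00000001;
  const std::string kPortabilityEnumeration = "VK_KHR_portability_enumeration";
  const std::string kProperties2 = "VK_KHR_get_physical_device_properties2";

  auto has = [&available](const std::string& name) {
    return std::find(available.begin(), available.end(), name) != available.end();
  };

  InstanceConfig config;
  if (has(kPortabilityEnumeration)) {
    config.flags |= kEnumeratePortabilityBit;
    config.extensions.push_back(kPortabilityEnumeration);
    if (has(kProperties2)) config.extensions.push_back(kProperties2);
  }
  return config;
}
```

Checking availability first keeps instance creation working on loaders that predate the extension, where requesting it unconditionally would itself fail.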

fails to compile on Linux (clang and gcc)

Arch Linux, latest as of today. Tried with gcc and clang; clang is version 13.0.0-2.

$ cmake -DCMAKE_C_COMPILER=$(which clang) -DCMAKE_CXX_COMPILER=$(which clang++)  -G Ninja -S ./ -B build/
$ cmake --build build/
[73/257] Building CXX object third_party/abseil-cpp/absl/synchronization/CMakeFiles/absl_graphcycles_internal.dir/internal/graphcycles.cc.o
FAILED: third_party/abseil-cpp/absl/synchronization/CMakeFiles/absl_graphcycles_internal.dir/internal/graphcycles.cc.o 
/usr/bin/clang++  -I/home/damjan/src/uVkCompute/third_party/abseil-cpp -w -Wall -Wextra -Weverything -Wno-c++98-compat-pedantic -Wno-conversion -Wno-covered-switch-default -Wno-deprecated -Wno-disabled-macro-expansion -Wno-double-promotion -Wno-comma -Wno-extra-semi -Wno-extra-semi-stmt -Wno-packed -Wno-padded -Wno-sign-compare -Wno-float-conversion -Wno-float-equal -Wno-format-nonliteral -Wno-gcc-compat -Wno-global-constructors -Wno-exit-time-destructors -Wno-non-modular-include-in-module -Wno-old-style-cast -Wno-range-loop-analysis -Wno-reserved-id-macro -Wno-shorten-64-to-32 -Wno-switch-enum -Wno-thread-safety-negative -Wno-unknown-warning-option -Wno-unreachable-code -Wno-unused-macros -Wno-weak-vtables -Wno-zero-as-null-pointer-constant -Wbitfield-enum-conversion -Wbool-conversion -Wconstant-conversion -Wenum-conversion -Wint-conversion -Wliteral-conversion -Wnon-literal-null-conversion -Wnull-conversion -Wobjc-literal-conversion -Wno-sign-conversion -Wstring-conversion -DNOMINMAX -std=gnu++14 -MD -MT third_party/abseil-cpp/absl/synchronization/CMakeFiles/absl_graphcycles_internal.dir/internal/graphcycles.cc.o -MF third_party/abseil-cpp/absl/synchronization/CMakeFiles/absl_graphcycles_internal.dir/internal/graphcycles.cc.o.d -o third_party/abseil-cpp/absl/synchronization/CMakeFiles/absl_graphcycles_internal.dir/internal/graphcycles.cc.o -c /home/damjan/src/uVkCompute/third_party/abseil-cpp/absl/synchronization/internal/graphcycles.cc
/home/damjan/src/uVkCompute/third_party/abseil-cpp/absl/synchronization/internal/graphcycles.cc:451:26: error: no member named 'numeric_limits' in namespace 'std'
  if (x->version == std::numeric_limits<uint32_t>::max()) {
                    ~~~~~^
/home/damjan/src/uVkCompute/third_party/abseil-cpp/absl/synchronization/internal/graphcycles.cc:451:41: error: unexpected type name 'uint32_t': expected expression
  if (x->version == std::numeric_limits<uint32_t>::max()) {
                                        ^
/home/damjan/src/uVkCompute/third_party/abseil-cpp/absl/synchronization/internal/graphcycles.cc:451:52: error: no member named 'max' in the global namespace
  if (x->version == std::numeric_limits<uint32_t>::max()) {
                                                 ~~^
3 errors generated.
[86/257] Building CXX object third_party/abseil-cpp/absl/strings/CMakeFiles/absl_cord.dir/cord.cc.o
ninja: build stopped: subcommand failed.
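
Errors of this shape usually mean that a translation unit uses std::numeric_limits without including <limits> and previously relied on another header pulling it in. Assuming that is the cause here, the minimal sketch below shows the same expression compiling once the headers are included explicitly. IsVersionSaturated is a hypothetical function written for illustration; it is not code from abseil.

```cpp
#include <cstdint>  // uint32_t
#include <limits>   // std::numeric_limits

// Same comparison as in the failing line, with its required headers included.
bool IsVersionSaturated(uint32_t version) {
  return version == std::numeric_limits<uint32_t>::max();
}
```

If this diagnosis applies, adding the missing include to the affected file, or updating the abseil-cpp submodule to a revision that already contains it, should resolve the build.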

Inconsistent gl_SubgroupSize across different GPUs and Vulkan versions/extensions

This Intel device reports:

subgroupSize 32
minSubgroupSize 8
maxSubgroupSize 32 (Strangely a recent update set this to 8... investigating)

See discussion at https://gitlab.freedesktop.org/mesa/mesa/-/blob/698344b93c49a9f3a257a0ef4546edf5cd3a9130/src/intel/compiler/brw_compiler.h#L159

But the shader copy_storage_buffer_scalar.glsl uses gl_SubgroupSize to stride across the data.
It has the value 32, but when the actual subgroup size is 8, we only write 1/4 of the data, and the test fails its own validation.
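
The 1/4 figure can be reproduced with a small CPU-side model. This is a simplified illustration and not the shader's real indexing: CountWrittenElements is a hypothetical function that assumes each subgroup advances through the buffer in steps of the reported size, while only as many lanes as the actual size write one element each.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counts how many of `total` elements are written when the stride is
// `reported_size` but only `actual_size` lanes exist per subgroup.
std::size_t CountWrittenElements(std::size_t total, std::size_t reported_size,
                                 std::size_t actual_size) {
  if (reported_size == 0) return 0;  // avoid an endless loop on bad input
  std::vector<bool> written(total, false);
  for (std::size_t base = 0; base < total; base += reported_size) {
    // Only the lanes that really exist in the subgroup write their element.
    for (std::size_t lane = 0; lane < actual_size && base + lane < total; ++lane) {
      written[base + lane] = true;
    }
  }
  return static_cast<std::size_t>(
      std::count(written.begin(), written.end(), true));
}
```

With 1024 elements, a reported size of 32 and an actual size of 8, the model writes 256 elements, a quarter of the buffer. When both sizes are 32, all 1024 elements are written.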

New vector-times-matrix-transposed benchmark fails to run on Nvidia GPUs

Hi,
running on Nvidia 4070 I get:

uVkCompute/build/benchmarks/vmt
 ./vmt_rdna3
2023-11-07T17:08:45+01:00
Running ./vmt_rdna3
Run on (32 X 5881 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 1024 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 8.08, 5.68, 2.31
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
uVkCompute/benchmarks/vmt/vmt_main.cc:123: check error: destination buffer element (0) has incorrect value: expected to be 1404 but found -1
        ^ In shader: Tile[1x16], i8->i32
Aborted (core dumped)

mad_throughput benchmark crashes on Jetson Nano

mad_throughput crashes on Jetson Nano and throws VK_ERROR_DEVICE_LOST.

This is the call chain:

mad_throughput_main.cc:189 -> GetDeviceBufferViaStagingBuffer -> vulkan_buffer_util.cc:67 -> QueueSubmitAndWait -> crash

I found no nullptr or bad variable.

I tried to debug it with the validation layers, but the Jetson Nano does not appear to support them (layerCount == 0):

$ vulkaninfo
Instance Extensions:
====================
Instance Extensions	count = 16
	VK_KHR_device_group_creation        : extension revision  1
	VK_KHR_display                      : extension revision 23
	VK_KHR_external_fence_capabilities  : extension revision  1
	VK_KHR_external_memory_capabilities : extension revision  1
	VK_KHR_external_semaphore_capabilities: extension revision  1
	VK_KHR_get_display_properties2      : extension revision  1
	VK_KHR_get_physical_device_properties2: extension revision  2
	VK_KHR_get_surface_capabilities2    : extension revision  1
	VK_KHR_surface                      : extension revision 25
	VK_KHR_surface_protected_capabilities: extension revision  1
	VK_KHR_wayland_surface              : extension revision  6
	VK_KHR_xcb_surface                  : extension revision  6
	VK_KHR_xlib_surface                 : extension revision  6
	VK_EXT_debug_report                 : extension revision  9
	VK_EXT_debug_utils                  : extension revision  1
	VK_EXT_display_surface_counter      : extension revision  1
Layers: count = 0

This is my draft PR: #17

How to run this on Android devices?

I have successfully built the project for Android and copied one benchmark over to a device, but when I try to run any of the benchmarks I get "permission denied". I tried with a Google Pixel 4a and a rooted Xiaomi Redmi 4X. Is anything special required to run this on Android?