uvkcompute's Introduction

µVkCompute

µVkCompute is a micro Vulkan compute pipeline and a collection of compute shaders for benchmarking/profiling purposes.

Rationale

Vulkan provides a ubiquitous way to access GPUs from many hardware vendors across different form factors and platforms. This broad reach benefits not only graphics rendering; it can also be leveraged for general compute, given that Vulkan is both a graphics and a compute API.

However, being able to target various GPUs does not mean one size fits all. Developers still need to understand the characteristics of the target hardware to achieve the best utilization. A simple pipeline and a collection of shaders that probe various characteristics of the target hardware often come in handy for this purpose; hence this repository.

Goals

µVkCompute is meant to provide a straightforward compute pipeline to facilitate writing compute shader microbenchmarks. It tries to:

  • Hide Vulkan boilerplate that is required for every Vulkan application, e.g., Vulkan instance and device creation.
  • Simplify shader resource management, e.g., using reflection over SPIR-V to construct pipeline layouts and compute pipelines.
  • Provide a thin wrapper over command buffer construction and shader dispatch.
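
To make the SPIR-V reflection idea concrete, here is a minimal hypothetical sketch. It is not µVkCompute's code; the repository may well rely on a dedicated reflection library. It walks a SPIR-V module by hand and records the DescriptorSet and Binding decorations attached to each id, which is the core information needed to build descriptor set layouts. The names ReflectBindings and ResourceSlot are illustrative; the opcode and decoration numbers come from the SPIR-V specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sketch of SPIR-V reflection (not uVkCompute's actual code):
// walk the instruction stream and record the DescriptorSet/Binding
// decorations applied to each result id. Numeric values are taken from the
// SPIR-V specification.
constexpr uint32_t kSpirvMagic = 0x07230203u;
constexpr uint32_t kOpDecorate = 71u;
constexpr uint32_t kDecorationBinding = 33u;
constexpr uint32_t kDecorationDescriptorSet = 34u;
constexpr std::size_t kHeaderWords = 5;  // magic, version, generator, bound, schema

struct ResourceSlot {
  uint32_t set = 0;      // DescriptorSet decoration (0 if absent)
  uint32_t binding = 0;  // Binding decoration (0 if absent)
};

// Maps each id decorated with DescriptorSet or Binding to its slot.
// Returns an empty map if the header is missing or the magic number is wrong.
std::map<uint32_t, ResourceSlot> ReflectBindings(
    const std::vector<uint32_t>& words) {
  std::map<uint32_t, ResourceSlot> slots;
  if (words.size() < kHeaderWords || words[0] != kSpirvMagic) return slots;

  std::size_t pos = kHeaderWords;
  while (pos < words.size()) {
    // First word of every instruction: high 16 bits = word count,
    // low 16 bits = opcode.
    const uint32_t word_count = words[pos] >> 16;
    const uint32_t opcode = words[pos] & 0xFFFFu;
    if (word_count == 0 || pos + word_count > words.size()) break;  // malformed

    // OpDecorate <target id> <decoration> <literal...>
    if (opcode == kOpDecorate && word_count >= 4) {
      const uint32_t target = words[pos + 1];
      const uint32_t decoration = words[pos + 2];
      if (decoration == kDecorationBinding) {
        slots[target].binding = words[pos + 3];
      } else if (decoration == kDecorationDescriptorSet) {
        slots[target].set = words[pos + 3];
      }
    }
    pos += word_count;
  }
  return slots;
}
```

A full pipeline-layout builder would also need each resource's descriptor type, which means following the decorated id to its OpVariable and type declarations; the sketch deliberately stops at set and binding numbers.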

µVkCompute focuses on single compute shader dispatches. It does not try to demonstrate Vulkan programming best practices. For example, it just uses the system allocator and allocates separate memory for each buffer. Simplicity is favored over building a production-level Vulkan application.

Dependencies

This repository requires a common C++ project development environment:

  • CMake with version >= 3.13
  • (Optional) the Ninja build system
  • A C/C++ compiler that supports C11/C++14
  • Python3

It additionally requires the Vulkan SDK, which supplies both the Vulkan shared library and shader compilers like glslc (for GLSL) and dxc (for HLSL). Please make sure you have set the VULKAN_SDK environment variable.

Building and Running

Android

git clone https://github.com/google/uVkCompute.git
cd uVkCompute
git submodule update --init

cmake -G Ninja -S ./ -B build-android/  \
  -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK?}/build/cmake/android.toolchain.cmake" \
  -DANDROID_ABI="arm64-v8a" -DANDROID_PLATFORM=android-29
cmake --build build-android/

Here ANDROID_NDK is the path to the Android NDK installation. See Android's CMake guide for an explanation of ANDROID_ABI and ANDROID_PLATFORM.

Afterwards, you can use adb push and adb shell to run the benchmark binaries generated into the build-android/ directory on Android devices. For example, for a benchmark binary bench at build-android/benchmarks/foo/bar/bench:

# Push the benchmark to the Android device
adb push build-android/benchmarks/foo/bar/bench /data/local/tmp
adb shell "cd /data/local/tmp && ./bench"

Note that on Android 10, if you see the "Failed to match any benchmarks against regex: ." error message, it means that no Vulkan ICDs (a.k.a. Vulkan vendor drivers) were discovered. This is a known issue that is fixed in Android 11. A workaround is to copy the Vulkan ICD (normally /vendor/lib[64]/hw/vulkan.*.so) to /data/local/tmp and run the benchmark binary with LD_LIBRARY_PATH=/data/local/tmp.

Linux/macOS

git clone https://github.com/google/uVkCompute.git
cd uVkCompute
git submodule update --init

cmake -G Ninja -S ./ -B build/
cmake --build build/

Afterwards you can run the benchmark binaries generated into the build/ directory on the host machine.

Windows

git clone https://github.com/google/uVkCompute.git
cd uVkCompute
git submodule update --init

cmake -G "Visual Studio 16 2019" -A x64 -S ./ -B build/
cmake --build build/

Afterwards you can run the benchmark binaries generated into the build/ directory on the host machine.

uvkcompute's People

Contributors

antiagainst, dneto0, ergawy, kdub, kuhar, sofiageo, thomasraoux, tpoisonooo

uvkcompute's Issues

Build Failing @ Ubuntu 18.04

ninja: error: 'uvkc/benchmark/Vulkan::glslc', needed by 'uvkc/benchmark/void_shader_spirv_instance.inc', missing and no known rule to make it

Any idea how to deal with this?

Regards
Yao

Build Fail on Jetson Nano

I tried to build it on a Jetson Nano and encountered two errors:

glslc not found

  • The LunarG Vulkan SDK only provides an x86_64 version.
  • The glslang repo only provides glslangValidator and spirv-xxx.

Is it possible to build/install glslc for ARM?

absl error

uVkCompute/uvkc/vulkan/dynamic_symbols.cc:179:10: error: could not convert ‘syms’ from ‘std::unique_ptr<uvkc::vulkan::DynamicSymbols>’ to ‘absl::lts_20211102::StatusOr<std::unique_ptr<uvkc::vulkan::DynamicSymbols> >’
  return syms;

I have checked #14 and fixed it with cmake -DCMAKE_C_COMPILER=$(which clang) -DCMAKE_CXX_COMPILER=$(which clang++) ... If uVkCompute only supports clang, could you please review PR #16?

Why does a large loop count cause problems on integrated GPUs?

I have a problem writing GPGPU code with Vulkan. I don't know where else to ask this question, so I am posting it here to seek an answer.

This problem is from https://github.com/google/uVkCompute/tree/main/benchmarks/compute
I tried the same benchmark in my own Vulkan framework. On integrated GPUs such as Intel's, the result is wrong when kLoopSize is very large. When I reduce the operation count (only 4 operations per loop iteration), it works correctly.
The example works well on discrete GPUs from AMD and NVIDIA.

uVkCompute works well on both, so why would this happen? It's hard to understand. The only difference I found in the pipeline is that I don't recreate the command buffer but reuse it from the command pool, and I don't think that would cause the difference.

fails to compile on Linux (clang and gcc)

Latest Arch Linux as of today. I tried both gcc and clang; clang is version 13.0.0-2.

$ cmake -DCMAKE_C_COMPILER=$(which clang) -DCMAKE_CXX_COMPILER=$(which clang++)  -G Ninja -S ./ -B build/
$ cmake --build build/
[73/257] Building CXX object third_party/abseil-cpp/absl/synchronization/CMakeFiles/absl_graphcycles_internal.dir/internal/graphcycles.cc.o
FAILED: third_party/abseil-cpp/absl/synchronization/CMakeFiles/absl_graphcycles_internal.dir/internal/graphcycles.cc.o 
/usr/bin/clang++  -I/home/damjan/src/uVkCompute/third_party/abseil-cpp -w -Wall -Wextra -Weverything -Wno-c++98-compat-pedantic -Wno-conversion -Wno-covered-switch-default -Wno-deprecated -Wno-disabled-macro-expansion -Wno-double-promotion -Wno-comma -Wno-extra-semi -Wno-extra-semi-stmt -Wno-packed -Wno-padded -Wno-sign-compare -Wno-float-conversion -Wno-float-equal -Wno-format-nonliteral -Wno-gcc-compat -Wno-global-constructors -Wno-exit-time-destructors -Wno-non-modular-include-in-module -Wno-old-style-cast -Wno-range-loop-analysis -Wno-reserved-id-macro -Wno-shorten-64-to-32 -Wno-switch-enum -Wno-thread-safety-negative -Wno-unknown-warning-option -Wno-unreachable-code -Wno-unused-macros -Wno-weak-vtables -Wno-zero-as-null-pointer-constant -Wbitfield-enum-conversion -Wbool-conversion -Wconstant-conversion -Wenum-conversion -Wint-conversion -Wliteral-conversion -Wnon-literal-null-conversion -Wnull-conversion -Wobjc-literal-conversion -Wno-sign-conversion -Wstring-conversion -DNOMINMAX -std=gnu++14 -MD -MT third_party/abseil-cpp/absl/synchronization/CMakeFiles/absl_graphcycles_internal.dir/internal/graphcycles.cc.o -MF third_party/abseil-cpp/absl/synchronization/CMakeFiles/absl_graphcycles_internal.dir/internal/graphcycles.cc.o.d -o third_party/abseil-cpp/absl/synchronization/CMakeFiles/absl_graphcycles_internal.dir/internal/graphcycles.cc.o -c /home/damjan/src/uVkCompute/third_party/abseil-cpp/absl/synchronization/internal/graphcycles.cc
/home/damjan/src/uVkCompute/third_party/abseil-cpp/absl/synchronization/internal/graphcycles.cc:451:26: error: no member named 'numeric_limits' in namespace 'std'
  if (x->version == std::numeric_limits<uint32_t>::max()) {
                    ~~~~~^
/home/damjan/src/uVkCompute/third_party/abseil-cpp/absl/synchronization/internal/graphcycles.cc:451:41: error: unexpected type name 'uint32_t': expected expression
  if (x->version == std::numeric_limits<uint32_t>::max()) {
                                        ^
/home/damjan/src/uVkCompute/third_party/abseil-cpp/absl/synchronization/internal/graphcycles.cc:451:52: error: no member named 'max' in the global namespace
  if (x->version == std::numeric_limits<uint32_t>::max()) {
                                                 ~~^
3 errors generated.
[86/257] Building CXX object third_party/abseil-cpp/absl/strings/CMakeFiles/absl_cord.dir/cord.cc.o
ninja: build stopped: subcommand failed.
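
All three errors point at std::numeric_limits being undeclared in graphcycles.cc, which suggests the file relies on <limits> being pulled in transitively by another header, something newer standard library releases no longer guarantee. The minimal sketch below shows the same comparison compiling once <limits> is included explicitly. VersionCounterExhausted is a hypothetical helper name; the actual remedy would be adding the include to graphcycles.cc or, presumably, updating the abseil-cpp submodule to a revision that already includes it.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>  // declares std::numeric_limits; include it explicitly

// Same comparison as graphcycles.cc:451, wrapped in a hypothetical helper.
// If <limits> is only reachable through other headers, this is exactly the
// line the compiler rejects.
bool VersionCounterExhausted(uint32_t version) {
  return version == std::numeric_limits<uint32_t>::max();
}
```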

Inconsistent gl_SubgroupSize across different GPUs and Vulkan versions/extensions

This Intel device reports:

  • subgroupSize 32
  • minSubgroupSize 8
  • maxSubgroupSize 32 (Strangely a recent update set this to 8... investigating)

See discussion at https://gitlab.freedesktop.org/mesa/mesa/-/blob/698344b93c49a9f3a257a0ef4546edf5cd3a9130/src/intel/compiler/brw_compiler.h#L159

However, the shader copy_storage_buffer_scalar.glsl uses gl_SubgroupSize to stride across the data, and gl_SubgroupSize reads 32. When the actual subgroup size is 8, only 1/4 of the data is written, and the test fails its own validation.
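
The 1/4 figure can be checked with a simplified CPU model. The model is hypothetical and does not reproduce copy_storage_buffer_scalar.glsl: it assumes each active lane l writes elements l, l + stride, l + 2 * stride, and so on, where the stride is the reported gl_SubgroupSize (32) but only 8 lanes actually run. CountWrittenElements is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model (not the real shader): each active lane `l` writes
// elements l, l + stride, l + 2 * stride, ... where `stride` is the subgroup
// size the shader *believes* it has (gl_SubgroupSize), while only
// `active_lanes` invocations actually execute.
// Returns how many of the `n` elements get written at least once.
std::size_t CountWrittenElements(std::size_t n, std::size_t stride,
                                 std::size_t active_lanes) {
  assert(stride > 0);
  std::vector<bool> written(n, false);
  for (std::size_t lane = 0; lane < active_lanes; ++lane) {
    for (std::size_t i = lane; i < n; i += stride) {
      written[i] = true;
    }
  }
  std::size_t count = 0;
  for (bool w : written) {
    if (w) ++count;
  }
  return count;
}
```

With 1024 elements, a stride of 32, and 8 active lanes, only 256 elements (1/4) are written; when the stride matches the number of active lanes, all 1024 are. Pinning the subgroup size (for example, requesting a required subgroup size through VK_EXT_subgroup_size_control) or striding by the number of lanes actually launched would avoid the gap under this model.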

New vector-times-matrix-transposed benchmark fails to run on Nvidia GPUs

Hi,
Running on an Nvidia 4070, I get:

uVkCompute/build/benchmarks/vmt
 ./vmt_rdna3
2023-11-07T17:08:45+01:00
Running ./vmt_rdna3
Run on (32 X 5881 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 1024 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 8.08, 5.68, 2.31
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
uVkCompute/benchmarks/vmt/vmt_main.cc:123: check error: destination buffer element (0) has incorrect value: expected to be 1404 but found -1
        ^ In shader: Tile[1x16], i8->i32
Aborted (core dumped)

Benchmark mad crash on Jetson Nano

mad_throughput crashed on Jetson Nano and threw VK_ERROR_DEVICE_LOST.

This is the call chain:

mad_throughput_main.cc:189  --->   GetDeviceBufferViaStagingBuffer -->  vulkan_buffer_util.cc:67 ---->  QueueSubmitAndWait ---> crash

No nullptr or bad variable found.

I tried to debug it with the validation layer, but Jetson Nano does not support it: 0 == layerCount

$ vulkaninfo
Instance Extensions:
====================
Instance Extensions	count = 16
	VK_KHR_device_group_creation        : extension revision  1
	VK_KHR_display                      : extension revision 23
	VK_KHR_external_fence_capabilities  : extension revision  1
	VK_KHR_external_memory_capabilities : extension revision  1
	VK_KHR_external_semaphore_capabilities: extension revision  1
	VK_KHR_get_display_properties2      : extension revision  1
	VK_KHR_get_physical_device_properties2: extension revision  2
	VK_KHR_get_surface_capabilities2    : extension revision  1
	VK_KHR_surface                      : extension revision 25
	VK_KHR_surface_protected_capabilities: extension revision  1
	VK_KHR_wayland_surface              : extension revision  6
	VK_KHR_xcb_surface                  : extension revision  6
	VK_KHR_xlib_surface                 : extension revision  6
	VK_EXT_debug_report                 : extension revision  9
	VK_EXT_debug_utils                  : extension revision  1
	VK_EXT_display_surface_counter      : extension revision  1
Layers: count = 0

This is my draft PR: #17.

How to run this on Android devices?

I have successfully built the project for Android and copied one benchmark over to a device, but when trying to run any of the benchmarks I get "permission denied". I tried with a Google Pixel 4a and a rooted Xiaomi Redmi 4X. Is there anything special required to run this on Android?
