
clpeak's Introduction

clpeak


A synthetic benchmarking tool to measure the peak capabilities of OpenCL devices. It measures only the peak metrics that can be achieved using vector operations and does not represent a real-world use case.

Building

git submodule update --init --recursive --remote
mkdir build
cd build
cmake ..
cmake --build .

Sample

Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Driver version  : 390.77 (Linux x64)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 767.48
      float2  : 810.81
      float4  : 843.06
      float8  : 726.12
      float16 : 735.98

    Single-precision compute (GFLOPS)
      float   : 15680.96
      float2  : 15674.50
      float4  : 15645.58
      float8  : 15583.27
      float16 : 15466.50

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7859.49
      double2  : 7849.96
      double4  : 7832.96
      double8  : 7799.82
      double16 : 7740.88

    Integer compute (GIOPS)
      int   : 15653.47
      int2  : 15654.40
      int4  : 15655.21
      int8  : 15659.04
      int16 : 15608.65

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 10.64
      enqueueReadBuffer          : 11.92
      enqueueMapBuffer(for read) : 9.97
        memcpy from mapped ptr   : 8.62
      enqueueUnmap(after write)  : 11.04
        memcpy to mapped ptr     : 9.16

    Kernel launch latency : 7.22 us
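As a sanity check on the sample, the compute numbers can be compared with a back-of-the-envelope peak: compute units × lanes per unit × 2 FLOPs per fused multiply-add × clock. The sketch below is not part of clpeak. It assumes 64 FP32 lanes and 32 FP64 lanes per SM (the Volta SM layout of the V100) and that the reported clock is the one the GPU actually ran at.

```cpp
#include <cassert>

// Back-of-the-envelope peak throughput in GFLOPS. Assumes every lane retires
// one fused multiply-add per cycle, counted as 2 floating-point operations.
double peakGflops(int computeUnits, int lanesPerCU, double clockMHz) {
  assert(computeUnits > 0 && lanesPerCU > 0 && clockMHz > 0.0);
  return computeUnits * lanesPerCU * 2.0 * clockMHz / 1000.0;
}
```

For the V100 above, peakGflops(80, 64, 1530) gives about 15667 GFLOPS (measured: 15680.96) and peakGflops(80, 32, 1530) gives about 7834 GFLOPS (measured: 7859.49), both within half a percent. The same arithmetic for the RTX 3090 report below (82 SMs, assuming 128 FP32 lanes per Ampere SM, 1725 MHz) gives about 36211 GFLOPS against a measured 35976.15.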

clpeak's People

Contributors

atomsymbol-notifications, ddemidov, dmtkats, doe300, ekondis, ericwolter, espes, gos-k, hatzel, he-sk, hjmallon, hwaccel, iotamudelta, isowson, jjkeijser, krishnaraj-atonarp, krrishnarraj, nchristensen, paranlee, prior99, ribalda, rigred, rjodinchr, ssanchez11, svenstaro, tgurr, woachk, xchern

clpeak's Issues

Half Precision not detected for RTX 3090

clpeak version: 1.1.2

Platform: NVIDIA CUDA
  Device: NVIDIA GeForce RTX 3090
    Driver version  : 525.89.02 (Linux x64)
    Compute units   : 82
    Clock frequency : 1725 MHz

    Global memory bandwidth (GBPS)
      float   : 816.91
      float2  : 841.68
      float4  : 856.31
      float8  : 785.62
      float16 : 844.80

    Single-precision compute (GFLOPS)
      float   : 35976.15
      float2  : 35279.88
      float4  : 35448.44
      float8  : 35229.30
      float16 : 34781.18

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 635.40
      double2  : 634.58
      double4  : 633.12
      double8  : 630.11
      double16 : 624.10

    Integer compute (GIOPS)
      int   : 19650.09
      int2  : 19531.53
      int4  : 19486.43
      int8  : 19548.59
      int16 : 19539.19

    Integer compute Fast 24bit (GIOPS)
      int   : 19452.70
      int2  : 18920.43
      int4  : 19145.33
      int8  : 19143.94
      int16 : 19075.51

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 9.96
      enqueueReadBuffer               : 10.48
      enqueueWriteBuffer non-blocking : 5.47
      enqueueReadBuffer non-blocking  : 5.55
      enqueueMapBuffer(for read)      : 10.76
        memcpy from mapped ptr        : 15.20
      enqueueUnmap(after write)       : 13.04
        memcpy to mapped ptr          : 15.20

    Kernel launch latency : 3.56 us

Global memory bandwidth test failed on R9 280X (Tahiti)

All I've got is clCreateBuffer (-4).

P.S. What do float/float2/4/8/16 and so on mean? A brief explanation in the README would be awesome, because I don't understand them. The results differ a lot in some cases, so they don't seem to be individual runs of the same test.
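The negative numbers in messages such as clCreateBuffer (-4) here, and clGetPlatformIDs (-1001) in later reports, are OpenCL status codes. A small lookup like the sketch below (the function name is hypothetical; the values are those defined in the Khronos cl.h and, for -1001, cl_ext.h) makes them readable. -4 is CL_MEM_OBJECT_ALLOCATION_FAILURE; which buffer size triggered it on the R9 280X is not stated in the report.

```cpp
#include <cassert>
#include <string>

// Maps a few OpenCL status codes to their symbolic names. The list is
// deliberately partial; anything else is reported as unknown.
std::string clErrorName(int code) {
  switch (code) {
    case 0:     return "CL_SUCCESS";
    case -1:    return "CL_DEVICE_NOT_FOUND";
    case -4:    return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:    return "CL_OUT_OF_RESOURCES";
    case -6:    return "CL_OUT_OF_HOST_MEMORY";
    case -11:   return "CL_BUILD_PROGRAM_FAILURE";
    case -30:   return "CL_INVALID_VALUE";
    case -54:   return "CL_INVALID_WORK_GROUP_SIZE";
    case -61:   return "CL_INVALID_BUFFER_SIZE";
    case -1001: return "CL_PLATFORM_NOT_FOUND_KHR";  // ICD loader found no platforms
    default:    return "unknown (" + std::to_string(code) + ")";
  }
}
```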

Measurements are noisy / inaccurate.

It would be nice to have flags to override the default number of iterations, the duration of each test, and the number of repetitions. Right now they feel a bit too short, and each test runs only once, which does not give much confidence in the results: the GPU has little time to ramp up its core clock, and the numbers are easily influenced by small background workloads running on the CPU or GPU at the same time.

I easily see 15% deviations between repeated runs of clpeak, sometimes more.
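One way to get steadier numbers is to run a few unrecorded warm-up iterations, repeat the measurement, and report the median together with the spread. The sketch below is a generic C++ illustration of that idea; repeatMeasurement, summarize and RunStats are hypothetical names, not existing clpeak functions or options.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct RunStats {
  double median;     // median of the recorded results
  double spreadPct;  // (max - min) / median, in percent
};

// Run `measure` `warmups` times without recording (e.g. to let the GPU raise
// its clocks), then `repeats` times, returning the recorded results.
std::vector<double> repeatMeasurement(const std::function<double()>& measure,
                                      int warmups, int repeats) {
  for (int i = 0; i < warmups; ++i) measure();
  std::vector<double> out;
  out.reserve(static_cast<std::size_t>(repeats));
  for (int i = 0; i < repeats; ++i) out.push_back(measure());
  return out;
}

// The median damps outliers from clock ramp-up or background load better
// than a single run or the mean does.
RunStats summarize(std::vector<double> results) {
  assert(!results.empty());
  std::sort(results.begin(), results.end());
  const std::size_t n = results.size();
  const double median = (n % 2 == 1)
      ? results[n / 2]
      : 0.5 * (results[n / 2 - 1] + results[n / 2]);
  const double spread = (results.back() - results.front()) / median * 100.0;
  return {median, spread};
}
```

Reporting the spread alongside the median makes a 15% run-to-run deviation visible instead of silently folding it into a single number.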

clpeak does not run any test.

branch: master
System: Intel 11700K with UHD750 and NVIDIA RTX 3050
OS: Windows 11
Visual Studio 2022 (version 17).

edison@11700k MINGW64 /clpeak/build (master)
$ cmake --build . --config Release
Microsoft (R) Build Engine version 17.2.1+52cd2da31 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

  Checking Build System
  Building Custom Rule D:/temp/PortableGit/clpeak/CMakeLists.txt
  common.cpp
  clpeak.cpp
  options.cpp
  logger.cpp
  global_bandwidth.cpp
  compute_sp.cpp
  compute_hp.cpp
  compute_dp.cpp
  compute_integer.cpp
  compute_integer_fast.cpp
  transfer_bandwidth.cpp
  kernel_latency.cpp
  entry.cpp
  Generating Code...
  clpeak.vcxproj -> D:\temp\PortableGit\clpeak\build\Release\clpeak.exe
  Building Custom Rule D:/temp/PortableGit/clpeak/CMakeLists.txt

edison@11700k MINGW64 /clpeak/build (master)
$ cd Release/

edison@11700k MINGW64 /clpeak/build/Release (master)
$ ls
clpeak.exe*            compute_hp_kernels.cl     compute_integer_kernels.cl  global_bandwidth_kernels.cl
compute_dp_kernels.cl  compute_int24_kernels.cl  compute_sp_kernels.cl

edison@11700k MINGW64 /clpeak/build/Release (master)
$ ./clpeak.exe

Platform: Intel(R) OpenCL HD Graphics
  Device: Intel(R) UHD Graphics 750
    Driver version  : 27.20.100.9127 (Win64)
    Compute units   : 32
    Clock frequency : 1300 MHz


How can I fix this problem?

Kubuntu 22.04 LTS: clpeak 1.1.0-rc2 apt version works while clpeak snap version 1.1.2 doesn't

Hi all

Here follows a short bug report about clpeak on Kubuntu 22.04 LTS.

When I want to start the snap version of clpeak (1.1.2) I get:

clpeak
clGetPlatformIDs (-1001)
no platforms found

So no OpenCL environment is detected.

Interestingly, the apt version of clpeak (1.1.0-rc2) starts up fine (although it crashes later because of an LLVM or Mesa Clover bug). Edit: maybe that crash is somehow related to LLVM bug 55698, but that is a different topic.

This happens with stock Ubuntu Mesa as well as with the devel version from the oibaf PPA. Currently installed is: Mesa 23.2.0-devel (git-1a24f43 2023-05-19 jammy-oibaf-ppa).

So at the moment the snap version of clpeak doesn't work. 😉

Build Log: input.cl:34:127: error: call to 'mad' is ambiguous

I'm getting the following error after running clpeak.

Platform: Clover^@
  Device: Radeon RX Vega (VEGA10 / DRM 3.23.0 / 4.15.0-23-generic, LLVM 6.0.0)^@
    Driver version  : 18.0.5^@ (Linux x64)
    Compute units   : 64
    Clock frequency : 1630 MHz
    Build Log: input.cl:34:127: error: call to 'mad' is ambiguous
input.cl:30:22: note: expanded from macro 'MAD_64'
input.cl:29:22: note: expanded from macro 'MAD_16'
input.cl:28:25: note: expanded from macro 'MAD_4'
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
input.cl:34:127: error: call to 'mad' is ambiguous
input.cl:30:22: note: expanded from macro 'MAD_64'
input.cl:29:22: note: expanded from macro 'MAD_16'
input.cl:28:43: note: expanded from macro 'MAD_4'
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
input.cl:34:127: error: call to 'mad' is ambiguous
input.cl:30:22: note: expanded from macro 'MAD_64'
input.cl:29:22: note: expanded from macro 'MAD_16'
input.cl:28:61: note: expanded from macro 'MAD_4'
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function
/usr/include/clc/math/mad.inc:1:39: note: candidate function

clpeak crashes with runtime error "clGetPlatformIDs (-1001) no platforms found"

I am using:

Command to launch the container:

docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container \
--net=host -v /dev:/dev \
-v /tmp:/tmp -v /run/udev/:/run/udev/ \
--cap-add CAP_SYS_TTY_CONFIG \
--device-cgroup-rule='c 4:* rmw'  --device-cgroup-rule='c 13:* rmw' \
--device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
opencl

clpeak crashes with a runtime error that prevents the benchmark from running:

 "clGetPlatformIDs (-1001)
 no platforms found"
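Error -1001 from clGetPlatformIDs means the ICD loader did not find any vendor platform. In a container or a snap, a reasonable first check is whether the vendor .icd files, and the libraries they name, are visible inside the sandbox at all. The C++17 sketch below lists them; it assumes the conventional Linux directory /etc/OpenCL/vendors and that some loaders honor OCL_ICD_VENDORS as an override (check your loader's documentation). Whether this is the actual cause in either report has not been established.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Directory assumed to hold vendor ICD files on Linux, unless overridden.
fs::path icdVendorDir() {
  if (const char* env = std::getenv("OCL_ICD_VENDORS")) return fs::path(env);
  return fs::path("/etc/OpenCL/vendors");
}

// List (file name, first line) for every *.icd file in `dir`; the first line
// of an .icd file names the vendor's OpenCL library. Returns an empty list if
// the directory is missing or unreadable.
std::vector<std::pair<std::string, std::string>> listIcdFiles(const fs::path& dir) {
  std::vector<std::pair<std::string, std::string>> found;
  std::error_code ec;
  if (!fs::is_directory(dir, ec)) return found;
  for (const auto& entry : fs::directory_iterator(dir, ec)) {
    if (entry.path().extension() != ".icd") continue;
    std::ifstream file(entry.path());
    std::string library;
    std::getline(file, library);
    found.emplace_back(entry.path().filename().string(), library);
  }
  std::sort(found.begin(), found.end());
  return found;
}
```

Running listIcdFiles(icdVendorDir()) both on the host and inside the container or snap shows quickly whether the sandbox can see any vendor registration.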

Build failure on Ubuntu ARM

clpeak does not build out of the box on Ubuntu 13.10 ARM:
kraiskil@notdroid:/opencl/clpeak/build$ cmake ..
-- The C compiler identification is GNU 4.8.1
-- The CXX compiler identification is GNU 4.8.1
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Setting build type to Release
-- Selected OpenCL includes from /usr/include
-- Selected OpenCL lib /usr/lib/arm-linux-gnueabihf/libOpenCL.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/kraiskil/opencl/clpeak/build
kraiskil@notdroid:/opencl/clpeak/build$ make
Scanning dependencies of target clpeak
[ 11%] Building CXX object CMakeFiles/clpeak.dir/src/common.cpp.o
In file included from /home/kraiskil/opencl/clpeak/src/include/common.h:7:0,
from /home/kraiskil/opencl/clpeak/src/common.cpp:1:
/usr/include/CL/cl.hpp:216:23: fatal error: emmintrin.h: No such file or directory
#include <emmintrin.h>
^
compilation terminated.
make[2]: *** [CMakeFiles/clpeak.dir/src/common.cpp.o] Error 1
make[1]: *** [CMakeFiles/clpeak.dir/all] Error 2
make: *** [all] Error 2

cl.hpp is installed from the Ubuntu repositories (opencl-headers package, version 1.2-2013.06.28-2). Updating to the latest khronos.org cl.hpp neither helps nor changes the error.
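emmintrin.h is an x86 SSE2 intrinsics header, so ARM toolchains do not provide it. The usual way to keep such an include from breaking non-x86 builds is to guard it with an architecture macro and fall back to portable code. The sketch below shows only that general pattern (fullFence and HAVE_SSE2_FENCE are hypothetical names); it is not a patch for cl.hpp.

```cpp
#include <atomic>
#include <cassert>

// GCC/Clang define __SSE2__ when SSE2 is enabled; MSVC defines _M_X64 on
// x64, which always has SSE2. emmintrin.h exists only on such targets.
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_FENCE 1
#else
#define HAVE_SSE2_FENCE 0
#endif

// Full memory fence: SSE2 mfence where available, otherwise the portable
// C++11 sequentially consistent fence (e.g. on ARM).
void fullFence() {
#if HAVE_SSE2_FENCE
  _mm_mfence();
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

bool usesSse2Fence() { return HAVE_SSE2_FENCE == 1; }
```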

Using External Thunderbolt 3 GPU on Windows has suspiciously fast "Transfer bandwidth"

I am not sure whether this is an issue or me misusing the program. Please close if it isn't relevant.

I am using a Radeon RX 580 in a Sonnet Thunderbolt 3 chassis. When I build clpeak with the AMD APP SDK (which is very difficult to install because the installer seems to crash a lot), I get a transfer bandwidth of ~8.5 GB/s. This is far more than Thunderbolt 3 can carry.

Eventually I found the AMD OpenCL optimisation guide here (http://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401315_92101), section "OpenCL Memory Object Properties". It states that a buffer created with CL_MEM_ALLOC_HOST_PTR is allocated as "Pinned host memory shared by all devices in context (unless only device in context is CPU; then, host memory)" rather than as "Device memory". So the data only goes to local (host) memory. When I removed the CL_MEM_ALLOC_HOST_PTR flag from the buffer-creation call, I got a more realistic value.
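A transfer figure is just bytes moved divided by elapsed time, so any host-to-device number above what the physical link can carry indicates that no real copy over that link took place. The same reasoning likely explains the enqueueMapBuffer and enqueueUnmap values in the hundreds of thousands of GBPS in later Thunderbolt reports. The sketch below is illustrative only: the function names are hypothetical, and the caller must supply the link ceiling for their own hardware.

```cpp
#include <cassert>
#include <cstdint>

// Bandwidth in GB/s (1 GB = 1e9 bytes) for `bytes` moved in `seconds`.
double bandwidthGBps(std::uint64_t bytes, double seconds) {
  assert(seconds > 0.0);
  return static_cast<double>(bytes) / seconds / 1e9;
}

// A host<->device result above the link ceiling cannot be a real transfer
// over that link; most likely no copy happened (e.g. host-resident memory).
bool exceedsLink(double measuredGBps, double linkCeilingGBps) {
  return measuredGBps > linkCeilingGBps;
}
```

In the checks below, the 4.0 GB/s ceiling is an illustrative placeholder, not a Thunderbolt 3 specification.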

clpeak fails when the local size is not a power of two

Given a local size that is not a power of two, all kernel-based clpeak tests fail with CL_INVALID_WORK_GROUP_SIZE, because the global work size is not divisible by the local size.

Consider this excerpt from compute_sp.cpp:

uint globalWIs = (devInfo.numCUs) * (devInfo.computeWgsPerCU) * (devInfo.maxWGSize);
uint t = MIN((globalWIs * sizeof(cl_float)), devInfo.maxAllocSize);
t = roundToPowOf2(t);
globalWIs = t / sizeof(cl_float);

The cause is the line t = roundToPowOf2(t), which forces the global work size to be a power of two regardless of whether the local size is one. Since a power of two is divisible only by powers of two, enqueueing a kernel with these values fails whenever the local size is not a power of two.
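One possible fix is to round the capped size down to a multiple of the local size instead of to a power of two. The sketch below mirrors the excerpt's inputs (compute units, work-groups per CU, local size, maximum allocation in bytes, element size), but globalWorkItems and roundDownToMultiple are hypothetical helpers, not clpeak code. It returns 0 if the allocation cap is smaller than one work-group, so callers should treat 0 as "cannot run".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Largest multiple of localSize that is <= value.
std::uint64_t roundDownToMultiple(std::uint64_t value, std::uint64_t localSize) {
  assert(localSize > 0);
  return (value / localSize) * localSize;
}

// Global work size capped by the allocation limit and kept divisible by the
// local size, so clEnqueueNDRangeKernel accepts any local size.
std::uint64_t globalWorkItems(std::uint64_t numCUs, std::uint64_t wgsPerCU,
                              std::uint64_t localSize, std::uint64_t maxAllocBytes,
                              std::uint64_t elemBytes) {
  const std::uint64_t wanted = numCUs * wgsPerCU * localSize;
  const std::uint64_t cap = maxAllocBytes / elemBytes;  // elements that fit in one buffer
  return roundDownToMultiple(std::min(wanted, cap), localSize);
}
```

With a local size of 96, for example, a 1,000,000-byte allocation limit and 4-byte floats yield 249,984 work-items, which divides evenly by 96, whereas rounding to a power of two could never do so.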

results for AMD Radeon 6900XT connected via Thunderbolt 3 in Windows 11


Platform: AMD Accelerated Parallel Processing
Device: gfx1030
Driver version : 3516.0 (PAL,LC) (Win64)
Compute units : 40
Clock frequency : 2015 MHz

Global memory bandwidth (GBPS)
  float   : 446.26
  float2  : 479.50
  float4  : 489.94
  float8  : 497.28
  float16 : 500.41

Single-precision compute (GFLOPS)
  float   : 25405.74
  float2  : 24505.80
  float4  : 24389.60
  float8  : 23754.91
  float16 : 23152.01

Half-precision compute (GFLOPS)
  half   : 24808.24
  half2  : 49398.67
  half4  : 48301.48
  half8  : 46714.04
  half16 : 45684.66

Double-precision compute (GFLOPS)
  double   : 1609.84
  double2  : 1608.84
  double4  : 1605.60
  double8  : 1598.52
  double16 : 1585.71

Integer compute (GIOPS)
  int   : 5089.61
  int2  : 5043.19
  int4  : 5027.10
  int8  : 5001.98
  int16 : 4962.62

Integer compute Fast 24bit (GIOPS)
  int   : 21753.65
  int2  : 21389.28
  int4  : 21226.49
  int8  : 20617.82
  int16 : 18573.64

Transfer bandwidth (GBPS)
  enqueueWriteBuffer              : 16.95
  enqueueReadBuffer               : 17.03
  enqueueWriteBuffer non-blocking : 17.14
  enqueueReadBuffer non-blocking  : 17.20
  enqueueMapBuffer(for read)      : 313501.28
    memcpy from mapped ptr        : 17.20
  enqueueUnmap(after write)       : 1651910.50
    memcpy to mapped ptr          : 17.20

Kernel launch latency : 54.70 us

segfault in getPlatformVersion()

clpeak crashes on a Parallella using coprthr-1.6.2:

gdb clpeak

(gdb) r
Starting program: /root/clpeak/build/clpeak
coprthr-1.6.2 (Freewill)

Platform: coprthr

Program received signal SIGSEGV, Segmentation fault.
0xb6fcc590 in clGetPlatformInfo () from /usr/lib/arm-linux-gnueabihf/libOpenCL.so.1
(gdb) bt
#0 0xb6fcc590 in clGetPlatformInfo () from /usr/lib/arm-linux-gnueabihf/libOpenCL.so.1
#1 0x00018770 in cl::detail::getPlatformVersion (platform=0xffffffff)
    at /root/clpeak/src/include/override/CL/cl.hpp:1690
#2 0x000187d4 in cl::detail::getDevicePlatformVersion (device=0x44388)
    at /root/clpeak/src/include/override/CL/cl.hpp:1700

clGetPlatformInfo() is called twice, but the second call is made with platform=0xffffffff.

clinfo reports one platform and two devices:

clinfo

[5628] clinfo: report OpenCL platform and device information: 127.0.1.1
coprthr-1.6.2 (Freewill)
[5628] nplatforms1
[5628] clinfo: Number of platforms found = 1
[5628] clinfo: platform 0:
[5628] clinfo: CL_PLATFORM_PROFILE =
[5628] clinfo: CL_PLATFORM_VERSION = coprthr-1.6-CURRENT (Freewill)
[5628] clinfo: CL_PLATFORM_NAME = coprthr
[5628] clinfo: CL_PLATFORM_VENDOR = Brown Deer Technology, LLC.
[5628] clinfo: CL_PLATFORM_EXTENSIONS = cl_khr_icd
[5628] clinfo: Number of devices found for this platform = 2

Latest master commit crashes with Fedora 26

clpeak

Platform: Clover
Device: NV124
Driver version : 17.1.4 (Linux x64)
Compute units : 10
Clock frequency : 512 MHz
Build Log: invalid source

Platform: Intel Gen OCL Driver
Device: Intel(R) HD Graphics Haswell GT2 Mobile
Driver version : 1.4 (Linux x64)
Compute units : 20
Clock frequency : 1000 MHz
ASSERTION FAILED: Not supported
at file /builddir/build/BUILD/beignet-36f6a8b6b956ffed15d100abe677125d4a5aeaed/backend/src/backend/gen_insn_selection.cpp, function uint32_t gbe::getByteScatterGatherSize(gbe::Selection::Opaque&, gbe::ir::Type), line 4157
Trace/breakpoint trap

strace log here: https://paste.fedoraproject.org/paste/5RQSj31j5kBN80Q5hATaWw

ASSERTION FAILED: sel.hasHalfType()

Hello,

I am running clpeak from master with beignet 1.3.2 on openSUSE Leap 15.1. The same behavior occurs with 3rd- and 4th-generation Intel CPUs.


Platform: Intel Gen OCL Driver
  Device: Intel(R) HD Graphics IvyBridge GT1
    Driver version  : 1.3 (Linux x64)
    Compute units   : 6
    Clock frequency : 1000 MHz
ASSERTION FAILED: sel.hasHalfType()
  at file /home/abuild/rpmbuild/BUILD/Beignet-1.3.2-Source/backend/src/backend/gen_insn_selection.cpp, function void gbe::ConvertInstructionPattern::convert32bitsToSmall(gbe::Selection::Opaque&, const gbe::ir::ConvertInstruction&, bool&) const, line 5841
fish: './clpeak' terminated by signal SIGTRAP (Trace or breakpoint trap)

Platform: Intel Gen OCL Driver
  Device: Intel(R) HD Graphics Haswell GT2 Desktop
    Driver version  : 1.3 (Linux x64)
    Compute units   : 20
    Clock frequency : 1000 MHz
ASSERTION FAILED: sel.hasHalfType()
  at file /home/abuild/rpmbuild/BUILD/Beignet-1.3.2-Source/backend/src/backend/gen_insn_selection.cpp, function void gbe::ConvertInstructionPattern::convert32bitsToSmall(gbe::Selection::Opaque&, const gbe::ir::ConvertInstruction&, bool&) const, line 5841
Trace/breakpoint trap (core dumped)

When gdb is used to trap the fault:

(gdb) bt
#0  0x00007fffec274e2d in ?? () from /usr/lib64/beignet//libgbe.so
#1  0x00007fffec3afcb1 in ?? () from /usr/lib64/beignet//libgbe.so
#2  0x00007fffec390105 in ?? () from /usr/lib64/beignet//libgbe.so
#3  0x00007fffec390701 in ?? () from /usr/lib64/beignet//libgbe.so
#4  0x00007fffec3914cd in ?? () from /usr/lib64/beignet//libgbe.so
#5  0x00007fffec44334b in ?? () from /usr/lib64/beignet//libgbe.so
#6  0x00007fffec2d3532 in ?? () from /usr/lib64/beignet//libgbe.so
#7  0x00007fffec45aa4e in ?? () from /usr/lib64/beignet//libgbe.so
#8  0x00007fffec2d70fd in ?? () from /usr/lib64/beignet//libgbe.so
#9  0x00007fffec2d7404 in ?? () from /usr/lib64/beignet//libgbe.so
#10 0x00007fffec45b412 in ?? () from /usr/lib64/beignet//libgbe.so
#11 0x00007fffec2e9158 in ?? () from /usr/lib64/beignet//libgbe.so
#12 0x00007ffff01da651 in ?? () from /usr/lib64/beignet//libcl.so
#13 0x00007ffff01d0237 in clBuildProgram () from /usr/lib64/beignet//libcl.so
#14 0x00007ffff7bc0b8f in clBuildProgram () from /usr/lib64/libOpenCL.so.1
#15 0x00000000004088a8 in cl::Program::build (this=this@entry=0x7fffffffde08, devices=std::vector of length 1, capacity 1 = {...}, options=options@entry=0x418feb " -cl-mad-enable ", 
    notifyFptr=notifyFptr@entry=0x0, data=data@entry=0x0) at /home/matwey/temp/clpeak/build/clhpp_install/include/CL/cl.hpp:5273
#16 0x0000000000407a9e in clPeak::runAll (this=this@entry=0x7fffffffe0e0) at /home/matwey/temp/clpeak/src/clpeak.cpp:98
#17 0x000000000040567d in main (argc=1, argv=0x7fffffffe208) at /home/matwey/temp/clpeak/src/entry.cpp:9

Even if this hardware configuration cannot be supported, I would expect a more informative message than a crash.
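A defensive pattern for half-precision tests is to skip them unless the device advertises cl_khr_fp16 in its CL_DEVICE_EXTENSIONS string, matching whole tokens so that a longer extension name cannot produce a false positive. The helper below is a hypothetical sketch. Whether clpeak gates its half-precision test this way, and whether beignet advertises cl_khr_fp16 on these GPUs, is not established here; since the assertion fires inside the driver's compiler, such a check only helps if the driver reports the extension accurately.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// True if `name` appears as a whole token in a space-separated OpenCL
// extension string (such as the one returned for CL_DEVICE_EXTENSIONS).
bool hasExtension(const std::string& extensions, const std::string& name) {
  std::istringstream in(extensions);
  std::string token;
  while (in >> token) {
    if (token == name) return true;
  }
  return false;
}
```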

Clpeak build failure observed : Could not find a package configuration file provided by "OpenCLHeaders"

Steps followed:

git clone https://github.com/krrishnarraj/clpeak
cd clpeak
git submodule update --init --recursive --remote
mkdir build; cd build
cmake ..
cmake --build .

CMake error trace:

cmake ..
-- Setting build type to Release
CMake Warning (dev) in CMakeLists.txt:
  No project() command is present.  The top-level CMakeLists.txt file must
  contain a literal, direct call to the project() command.  Add a line of
  code such as

  project(ProjectName)
  near the top of the file, but after cmake_minimum_required().
  CMake is pretending there is a "project(Project)" command on the first
  line.

This warning is for project developers.  Use -Wno-dev to suppress it.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/xx/clpeak/build/clhpp/build
[ 12%] Performing update step for 'hpp_headers'
Current branch master is up to date.
[ 25%] Performing configure step for 'hpp_headers'
CMake Error at CMakeLists.txt:43 (find_package):

  By not providing "FindOpenCLHeaders.cmake" in CMAKE_MODULE_PATH this
  project has asked CMake to find a package configuration file provided by
  "OpenCLHeaders", but CMake did not find one.
  Could not find a package configuration file provided by "OpenCLHeaders"
  with any of the following names:

    OpenCLHeadersConfig.cmake
    openclheaders-config.cmake

 
  Add the installation prefix of "OpenCLHeaders" to CMAKE_PREFIX_PATH or set
  "OpenCLHeaders_DIR" to a directory containing one of the above files.  If
  "OpenCLHeaders" provides a separate development package or SDK, be sure it
  has been installed.

-- Configuring incomplete, errors occurred!
See also "/home/xx/clpeak/build/clhpp/build/hpp/src/hpp_headers-build/CMakeFiles/CMakeOutput.log".
make[2]: *** [CMakeFiles/hpp_headers.dir/build.make:107: hpp/src/hpp_headers-stamp/hpp_headers-configure] Error 1
make[1]: *** [CMakeFiles/Makefile2:76: CMakeFiles/hpp_headers.dir/all] Error 2
make: *** [Makefile:84: all] Error 2
-- Selected OpenCL includes from /usr/include;/home/xx/clpeak/build/clhpp_install/include
-- Selected OpenCL lib /usr/lib/x86_64-linux-gnu/libOpenCL.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/xx/clpeak/build

Support 128-bit integer MAD benchmark? (NVIDIA only)

Hi,
I don't know whether this is a good idea to add to this benchmark, since it's NVIDIA-only for the moment, but NVIDIA's OpenCL 3.0 implementation has supported an int128 type since the 510.xx drivers. Although it is not a native integer format, it is supported and supposedly uses efficient PTX instructions for carries, etc.
I'm curious to know its performance on a MAD benchmark.

This requires the new OpenCL NVVM 7.0 compiler, enabled with:
set NVCL_USE_NVVM70_COMPILER=1 (Windows) or
export NVCL_USE_NVVM70_COMPILER=1 (Linux)

As stated in "511.23-win11-win10-release-notes.pdf":
"128-bit integer types or “(un)signed long long” is available as a native data type in the new
compiler. This type is enabled by default and does not require any macros to be defined"
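Without a native 128-bit multiplier, a 128-bit MAD decomposes into 64-bit partial products plus explicit carry propagation, which is the work that carry-aware instructions would speed up. The host-side C++ sketch below illustrates that decomposition; it is not NVIDIA's implementation, an actual benchmark kernel would be written in OpenCL C, and the names U128, mul64x64, add128 and mad128 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 128-bit unsigned value stored as two 64-bit limbs.
struct U128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Full 64x64 -> 128-bit product from four 32x32 -> 64-bit partial products.
U128 mul64x64(std::uint64_t x, std::uint64_t y) {
  const std::uint64_t x0 = x & 0xffffffffu, x1 = x >> 32;
  const std::uint64_t y0 = y & 0xffffffffu, y1 = y >> 32;
  const std::uint64_t p00 = x0 * y0, p01 = x0 * y1, p10 = x1 * y0, p11 = x1 * y1;
  // Middle column: carry out of p00 plus the low halves of the cross terms (cannot overflow).
  const std::uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffu) + (p10 & 0xffffffffu);
  return {(p00 & 0xffffffffu) | (mid << 32),
          p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32)};
}

// 128-bit add with an explicit carry from the low limb into the high limb.
U128 add128(U128 a, U128 b) {
  const std::uint64_t lo = a.lo + b.lo;
  const std::uint64_t carry = (lo < a.lo) ? 1 : 0;
  return {lo, a.hi + b.hi + carry};
}

// a * b + c modulo 2^128; the cross terms only affect the high limb.
U128 mad128(U128 a, U128 b, U128 c) {
  U128 p = mul64x64(a.lo, b.lo);
  p.hi += a.lo * b.hi + a.hi * b.lo;
  return add128(p, c);
}
```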

Build project failed

Running cmake .. fails with the following build log:

[ 25%] Building C object CMakeFiles/OpenCL.dir/icd.c.o
In file included from /home/mlx/clpeak/build/icd/build/icd/src/icd/./inc/CL/cl.h:20,
from /home/mlx/clpeak/build/icd/build/icd/src/icd/icd.h:53,
from /home/mlx/clpeak/build/icd/build/icd/src/icd/icd.c:38:
/home/mlx/clpeak/build/icd/build/icd/src/icd/./inc/CL/cl_version.h:22:9: note: #pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 300 (OpenCL 3.0)
22 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 300 (OpenCL 3.0)")
| ^~~~~~~
In file included from /home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:70,
from /home/mlx/clpeak/build/icd/build/icd/src/icd/icd.c:39:
/home/mlx/clpeak/build/icd/build/icd/src/icd/./inc/CL/cl_gl_ext.h:18:9: note: #pragma message: All OpenGL-related extensions have been moved into cl_gl.h. Please include cl_gl.h directly.
18 | #pragma message("All OpenGL-related extensions have been moved into cl_gl.h. Please include cl_gl.h directly.")
| ^~~~~~~
In file included from /home/mlx/clpeak/build/icd/build/icd/src/icd/icd.c:39:
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:418:56: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CL_EXT_SUFFIX__VERSION_2_0’
418 | size_t* /*param_value_size_ret*/) CL_EXT_SUFFIX__VERSION_2_0;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:756:51: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED’
756 | cl_command_queue_properties * old_properties) CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:766:42: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED’
766 | cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:778:42: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED’
778 | cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:780:74: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED’
780 | typedef CL_API_ENTRY cl_int (CL_API_CALL *KHRpfn_clUnloadCompiler)(void) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:784:32: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED’
784 | cl_event * event) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:789:34: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED’
789 | const cl_event * event_list) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:791:100: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED’
791 | typedef CL_API_ENTRY cl_int (CL_API_CALL *KHRpfn_clEnqueueBarrier)(cl_command_queue command_queue) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/mlx/clpeak/build/icd/build/icd/src/icd/icd.c:39:
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:793:108: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED’
793 | typedef CL_API_ENTRY void * (CL_API_CALL *KHRpfn_clGetExtensionFunctionAddress)(const char *function_name) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:1317:5: error: unknown type name ‘KHRpfn_clSetCommandQueueProperty’
1317 | KHRpfn_clSetCommandQueueProperty clSetCommandQueueProperty;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:1319:5: error: unknown type name ‘KHRpfn_clCreateImage2D’
1319 | KHRpfn_clCreateImage2D clCreateImage2D;
| ^~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:1320:5: error: unknown type name ‘KHRpfn_clCreateImage3D’
1320 | KHRpfn_clCreateImage3D clCreateImage3D;
| ^~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:1335:5: error: unknown type name ‘KHRpfn_clUnloadCompiler’
1335 | KHRpfn_clUnloadCompiler clUnloadCompiler;
| ^~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:1366:5: error: unknown type name ‘KHRpfn_clEnqueueMarker’
1366 | KHRpfn_clEnqueueMarker clEnqueueMarker;
| ^~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:1367:5: error: unknown type name ‘KHRpfn_clEnqueueWaitForEvents’
1367 | KHRpfn_clEnqueueWaitForEvents clEnqueueWaitForEvents;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:1368:5: error: unknown type name ‘KHRpfn_clEnqueueBarrier’
1368 | KHRpfn_clEnqueueBarrier clEnqueueBarrier;
| ^~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:1369:5: error: unknown type name ‘KHRpfn_clGetExtensionFunctionAddress’
1369 | KHRpfn_clGetExtensionFunctionAddress clGetExtensionFunctionAddress;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mlx/clpeak/build/icd/build/icd/src/icd/icd_dispatch.h:1462:5: error: unknown type name ‘KHRpfn_clGetKernelSubGroupInfoKHR’
1462 | KHRpfn_clGetKernelSubGroupInfoKHR clGetKernelSubGroupInfoKHR;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[5]: *** [CMakeFiles/OpenCL.dir/build.make:63: CMakeFiles/OpenCL.dir/icd.c.o] Error 1
make[4]: *** [CMakeFiles/Makefile2:76: CMakeFiles/OpenCL.dir/all] Error 2
make[3]: *** [Makefile:130: all] Error 2
make[2]: *** [CMakeFiles/icd_build.dir/build.make:114: icd/src/icd_build-stamp/icd_build-build] Error 2
make[1]: *** [CMakeFiles/Makefile2:81: CMakeFiles/icd_build.dir/all] Error 2
make: *** [Makefile:84: all] Error 2
CMake Error at /usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message):
Could NOT find OpenCL (missing: OpenCL_LIBRARY OpenCL_INCLUDE_DIR)
Call Stack (most recent call first):
/usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:393 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-3.16/Modules/FindOpenCL.cmake:150 (find_package_handle_standard_args)
CMakeLists.txt:17 (find_package)

-- Configuring incomplete, errors occurred!

results for AMD Radeon RX 6900 XT connected via Thunderbolt 3 in Ubuntu 22.04 / linux 5.15.0-60

clpeak version: 1.1.2

./clpeak -dn gfx1030

Platform: AMD Accelerated Parallel Processing
Device: gfx1030
Driver version : 3513.0 (HSA1.1,LC) (Linux x64)
Compute units : 40
Clock frequency : 2660 MHz

Global memory bandwidth (GBPS)
  float   : 423.79
  float2  : 441.29
  float4  : 446.95
  float8  : 456.39
  float16 : 477.69

Single-precision compute (GFLOPS)
  float   : 24483.21
  float2  : 22691.08
  float4  : 23121.26
  float8  : 22931.97
  float16 : 22159.95

Half-precision compute (GFLOPS)
  half   : 23514.74
  half2  : 46242.11
  half4  : 45222.88
  half8  : 42697.05
  half16 : 42386.64

Double-precision compute (GFLOPS)
  double   : 1581.64
  double2  : 1581.80
  double4  : 1578.37
  double8  : 1565.53
  double16 : 1536.39

Integer compute (GIOPS)
  int   : 6056.93
  int2  : 5219.01
  int4  : 5554.51
  int8  : 5781.25
  int16 : 5647.09

Integer compute Fast 24bit (GIOPS)
  int   : 19833.61
  int2  : 19475.66
  int4  : 19342.05
  int8  : 18979.65
  int16 : 19366.18

Transfer bandwidth (GBPS)
  enqueueWriteBuffer              : 19.48
  enqueueReadBuffer               : 20.26
  enqueueWriteBuffer non-blocking : 20.46
  enqueueReadBuffer non-blocking  : 20.72
  enqueueMapBuffer(for read)      : 517465.91
    memcpy from mapped ptr        : 20.57
  enqueueUnmap(after write)       : 1022611.31
    memcpy to mapped ptr          : 20.02

Kernel launch latency : 20.53 us

Making a new release to fix #73

clpeak is getting packaged, and I found that release 1.1.0 can no longer be built because of the deprecation of <CL/cl.hpp>, which is mentioned in #73 and fixed in this commit. Could you please make a release with the change so that it can be packaged?

results for AMD Radeon VII connected via Thunderbolt 3 in Windows 11

clpeak 1.1.2

Platform: AMD Accelerated Parallel Processing
Device: gfx906
Driver version : 3516.0 (PAL,HSAIL) (Win64)
Compute units : 60
Clock frequency : 1801 MHz

Global memory bandwidth (GBPS)
  float   : 822.46
  float2  : 840.00
  float4  : 840.63
  float8  : 782.67
  float16 : 692.59

Single-precision compute (GFLOPS)
  float   : 13705.64
  float2  : 13681.48
  float4  : 13649.84
  float8  : 13563.53
  float16 : 13370.20

Half-precision compute (GFLOPS)
  half   : 9133.37
  half2  : 26691.18
  half4  : 26193.26
  half8  : 25393.31
  half16 : 23530.63

Double-precision compute (GFLOPS)
  double   : 3434.87
  double2  : 3430.05
  double4  : 3416.19
  double8  : 3410.32
  double16 : 3364.18

Integer compute (GIOPS)
  int   : 4536.94
  int2  : 4494.69
  int4  : 4507.57
  int8  : 4501.87
  int16 : 4504.83

Integer compute Fast 24bit (GIOPS)
  int   : 13274.74
  int2  : 12844.57
  int4  : 12779.77
  int8  : 12491.75
  int16 : 12344.62

Transfer bandwidth (GBPS)
  enqueueWriteBuffer              : 16.78
  enqueueReadBuffer               : 17.05
  enqueueWriteBuffer non-blocking : 17.08
  enqueueReadBuffer non-blocking  : 17.12
  enqueueMapBuffer(for read)      : 383479.22
    memcpy from mapped ptr        : 17.18
  enqueueUnmap(after write)       : 1867377.12
    memcpy to mapped ptr          : 17.15

Kernel launch latency : 57.15 us

instructions for compiling clpeak for Windows

In Windows, install Visual Studio (VS) 2022 Community Edition configured for Desktop C/C++ development. https://visualstudio.microsoft.com/vs/community/

git clone http://github.com/krrishnarraj/clpeak

Download the current release of OpenCL headers and library from https://github.com/KhronosGroup/OpenCL-SDK/releases

Do not use OCL_SDK_Light from https://github.com/GPUOpen-LibrariesAndSDKs/OCL-SDK/releases

Place the Khronos OpenCL-SDK in a file path of your choice.

Either add the following Windows environment variables, or add the same variables to the CMakeSettings of the VS project's configuration.

Note that OpenCL_LIBRARY should be set to the fully qualified path of the OpenCL.lib file.

example of Windows environment variables:

OpenCL_INCLUDE_DIR
C:/Users/your_user_name/some_path/OpenCL-SDK-v2023.02.06-Win-x64/OpenCL-SDK-v2023.02.06-Win-x64/include

OpenCL_LIBRARY
C:/Users/your_user_name/some_path/OpenCL-SDK-v2023.02.06-Win-x64/OpenCL-SDK-v2023.02.06-Win-x64/lib/OpenCL.lib

example of CMakeSettings variables of a VS project configuration:

{
"name": "OpenCL_INCLUDE_DIR",
"value": "C:/Users/your_user_name/some_path/OpenCL-SDK-v2023.02.06-Win-x64/OpenCL-SDK-v2023.02.06-Win-x64/include",
"type": "PATH"
},
{
"name": "OpenCL_LIBRARY",
"value": "C:/Users/your_user_name/some_path/OpenCL-SDK-v2023.02.06-Win-x64/OpenCL-SDK-v2023.02.06-Win-x64/lib/OpenCL.lib",
"type": "FILEPATH"
}

In Visual Studio, build a release of clpeak.

Cmake generated path

Dear Krishnaraj Bhat,

CMake generates a long path:
\opencl_sdk\build\opencl_sdk-prefix\src\opencl_sdk.git\modules\headers-cpp\modules\external\CMock\modules\vendor\c_exception\modules\vendor\unity...

Unless the build directory itself is on a short path, CMake stops with the error 'path too long'.

Regards,
Fx

Size of buffer is over maxAllocSize

In the compute tests, the following pattern is used:

globalWIs = roundToMultipleOf(t, devInfo.maxWGSize);

uint64_t globalWIs = (devInfo.numCUs) * (devInfo.computeWgsPerCU) * (devInfo.maxWGSize);
uint64_t t = MIN((globalWIs * sizeof(cl_float)), devInfo.maxAllocSize) / sizeof(cl_float);
globalWIs = roundToMultipleOf(t, devInfo.maxWGSize);

cl::Buffer outputBuf = cl::Buffer(ctx, CL_MEM_WRITE_ONLY, (globalWIs * sizeof(cl_float)));

t is derived from the minimum of globalWIs * sizeof(cl_float) and maxAllocSize; let's assume that maxAllocSize is the smaller of the two.

In the next line, globalWIs is rounded up to the next multiple of maxWGSize, which means it may be larger than t, which was computed from maxAllocSize.

That causes buffer creation to fail, because the requested size will exceed maxAllocSize.
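A minimal sketch of one way to keep the buffer within maxAllocSize, assuming the rounding step rounds down instead of up. The names roundDownToMultipleOf and computeGlobalWIs are hypothetical, not clpeak's API; the parameters mirror the devInfo fields quoted above, with elemSize standing in for sizeof(cl_float).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: largest multiple of 'base' that is <= 'number'.
// Rounding down guarantees the result never exceeds 'number'.
inline uint64_t roundDownToMultipleOf(uint64_t number, uint64_t base)
{
    return (number / base) * base;
}

// Hypothetical version of the work-item computation from the compute tests.
inline uint64_t computeGlobalWIs(uint64_t numCUs, uint64_t wgsPerCU,
                                 uint64_t maxWGSize, uint64_t maxAllocSize,
                                 uint64_t elemSize)
{
    uint64_t globalWIs = numCUs * wgsPerCU * maxWGSize;
    // Clamp the element count so that count * elemSize <= maxAllocSize.
    uint64_t t = std::min(globalWIs * elemSize, maxAllocSize) / elemSize;
    // Round down (not up) so the buffer size stays within maxAllocSize.
    return roundDownToMultipleOf(t, maxWGSize);
}
```

For example, with maxAllocSize = 10,000,000 bytes, 4-byte elements and maxWGSize = 256, t is 2,500,000; rounding up would give 2,500,096 work-items (10,000,384 bytes, over the limit), while rounding down gives 2,499,840 (9,999,360 bytes). If maxAllocSize cannot hold even one work-group's worth of elements, this sketch returns 0, which a real implementation would have to reject.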

commit 7904f26499851488461fdbaad5667a09ba8ea920 fails to build on aarch64/ppc64 w/ gcc7

logs here:
https://kojipkgs.fedoraproject.org//work/tasks/6222/20056222/build.log
https://koji.fedoraproject.org/koji/taskinfo?taskID=20056219

[ 30%] Building CXX object CMakeFiles/clpeak.dir/src/logger.cpp.o
/usr/bin/c++ -I/builddir/build/BUILD/clpeak-7904f26499851488461fdbaad5667a09ba8ea920/deps/OpenCL-CLHPP/include -I/builddir/build/BUILD/clpeak-7904f26499851488461fdbaad5667a09ba8ea920/include -I/builddir/build/BUILD/clpeak-7904f26499851488461fdbaad5667a09ba8ea920/src/kernels -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -DNDEBUG -std=gnu++11 -march=native -fPIC -Wall -Wextra -Wno-deprecated-declarations -Wno-unused-parameter -Wno-ignored-attributes -o CMakeFiles/clpeak.dir/src/logger.cpp.o -c /builddir/build/BUILD/clpeak-7904f26499851488461fdbaad5667a09ba8ea920/src/logger.cpp
/builddir/build/BUILD/clpeak-7904f26499851488461fdbaad5667a09ba8ea920/src/clpeak.cpp: In member function 'int clPeak::runAll()':
/builddir/build/BUILD/clpeak-7904f26499851488461fdbaad5667a09ba8ea920/src/clpeak.cpp:51:33: error: 'OS_NAME' was not declared in this scope
log->xmlAppendAttribs("os", OS_NAME);
^~~~~~~
/builddir/build/BUILD/clpeak-7904f26499851488461fdbaad5667a09ba8ea920/src/clpeak.cpp:51:33: note: suggested alternative: 'LC_NAME'
log->xmlAppendAttribs("os", OS_NAME);
^~~~~~~
LC_NAME
/builddir/build/BUILD/clpeak-7904f26499851488461fdbaad5667a09ba8ea920/src/clpeak.cpp:105:61: error: expected ')' before 'OS_NAME'
log->print(devInfo.driverVersion); log->print(" (" OS_NAME ")" NEWLINE);
^~~~~~~
make[2]: *** [CMakeFiles/clpeak.dir/build.make:90: CMakeFiles/clpeak.dir/src/clpeak.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/builddir/build/BUILD/clpeak-7904f26499851488461fdbaad5667a09ba8ea920/build'
make[1]: Leaving directory '/builddir/build/BUILD/clpeak-7904f26499851488461fdbaad5667a09ba8ea920/build'
make[1]: *** [CMakeFiles/Makefile2:71: CMakeFiles/clpeak.dir/all] Error 2
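The error suggests OS_NAME is only defined for some architectures. A minimal sketch of a fallback, assuming OS_NAME is a string-literal macro selected by preprocessor checks; the specific checks and strings below are illustrative, not clpeak's actual definitions. Because the log shows OS_NAME used in string-literal concatenation (" (" OS_NAME ")"), the fallback must also be a string literal.

```cpp
// Illustrative OS/architecture detection; the exact strings are assumptions.
#if defined(_WIN32)
  #if defined(_WIN64)
    #define OS_NAME "Win64"
  #else
    #define OS_NAME "Win32"
  #endif
#elif defined(__APPLE__)
  #define OS_NAME "Macintosh"
#elif defined(__linux__)
  #if defined(__x86_64__)
    #define OS_NAME "Linux x64"
  #elif defined(__i386__)
    #define OS_NAME "Linux x86"
  #elif defined(__aarch64__)
    #define OS_NAME "Linux aarch64"
  #elif defined(__powerpc64__)
    #define OS_NAME "Linux ppc64"
  #endif
#endif

// Fallback so that architectures not listed above still compile.
#ifndef OS_NAME
  #define OS_NAME "Unknown"
#endif

// OS_NAME must remain a string literal: it is concatenated with adjacent literals.
static const char *kDriverSuffix = " (" OS_NAME ")";
```

The #ifndef fallback is the important part: whatever detection is used, an unrecognized target then produces "Unknown" instead of a compile error.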

AMD 6700s/Linux no platforms found

Hello,

I have all the core ROCm 5.3.0 libraries installed on OpenSUSE Tumbleweed on an Asus laptop with the 6700S (Navi 23), and I get the expected output from running ./clinfo from the /opt/rocm-5.3.0/opencl/bin directory:

Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (3486.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback

.... 

etc.

I installed clpeak 1.1.1 from here. I've tried running clpeak several ways, including as root and with HSA_OVERRIDE_GFX_VERSION=10.3.0, but they all give:

clGetPlatformIDs (-1001)
no platforms found

I have DRI_PRIME=1 set as a system-wide environment variable, and I know that I can use ROCm with this GPU on this Linux distro, at least to some degree: I have done so with PyTorch and TensorFlow using the HSA_OVERRIDE_GFX_VERSION=10.3.0 method.

Is it possible that I need MIOpen installed? I have miopen-hip installed, and the link says miopen-hip and miopen-opencl cannot both be installed currently.

rocminfo gives

Agent 2                  
*******                  
  Name:                    gfx1032                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 6700S                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
....                                                        
  Device Type:             GPU
....

among other things. Has anyone else tried using clpeak with this GPU on Linux? It might be as simple as creating a symbolic link or aliasing a command, but I'm not sure. Running cpu-x gives a similar message:

There is no platform with OpenCL support (CL_PLATFORM_NOT_FOUND_KHR)
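clGetPlatformIDs returning -1001 is CL_PLATFORM_NOT_FOUND_KHR, which the ICD loader reports when no platforms are available. On Linux, the ICD loader looks for vendor .icd files in /etc/OpenCL/vendors by default, so one thing worth checking is whether the ROCm driver is registered there for the loader that clpeak links against. A small diagnostic sketch (C++17; listIcdFiles is a hypothetical helper name):

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical diagnostic helper: returns the names of all regular *.icd files
// in 'dir'. Each .icd file is a short text file naming a vendor OpenCL library.
inline std::vector<std::string> listIcdFiles(const fs::path &dir = "/etc/OpenCL/vendors")
{
    std::vector<std::string> names;
    std::error_code ec;
    if (!fs::is_directory(dir, ec))
        return names;                      // no directory: no vendors registered
    for (const auto &entry : fs::directory_iterator(dir, ec)) {
        if (entry.is_regular_file(ec) && entry.path().extension() == ".icd")
            names.push_back(entry.path().filename().string());
    }
    std::sort(names.begin(), names.end());
    return names;
}
```

An empty result means the loader has nothing to load, which would be consistent with "no platforms found" even when a vendor-bundled clinfo works.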

results for NVIDIA GeForce RTX 4090 (at 248W power limit) in Windows 11

  Platform: NVIDIA CUDA
  Device: NVIDIA GeForce RTX 4090
    Driver version  : 531.61 (Win64)
    Compute units   : 128
    Clock frequency : 2520 MHz

    Global memory bandwidth (GBPS)
      float   : 866.65
      float2  : 888.99
      float4  : 909.81
      float8  : 920.69
      float16 : 921.32

    Single-precision compute (GFLOPS)
      float   : 71356.09
      float2  : 75607.30
      float4  : 76967.14
      float8  : 71584.66
      float16 : 70986.91

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 1289.16
      double2  : 1310.90
      double4  : 1364.63
      double8  : 1311.27
      double16 : 1356.09

    Integer compute (GIOPS)
      int   : 40810.12
      int2  : 35957.76
      int4  : 35848.03
      int8  : 35623.48
      int16 : 35670.32

    Integer compute Fast 24bit (GIOPS)
      int   : 36497.91
      int2  : 35032.96
      int4  : 35321.97
      int8  : 35034.14
      int16 : 35219.38

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 20.93
      enqueueReadBuffer               : 20.06
      enqueueWriteBuffer non-blocking : 20.93
      enqueueReadBuffer non-blocking  : 20.06
      enqueueMapBuffer(for read)      : 10.78
        memcpy from mapped ptr        : 28.55
      enqueueUnmap(after write)       : 26.87
        memcpy to mapped ptr          : 28.09

    Kernel launch latency : 8.36 us

Cmake for 64bit

Probably not your issue, but I got the wrong path to OpenCL.lib: the 32-bit library instead of the 64-bit one (in both Debug and Release).
After changing C:\Program Files (x86)\Common Files\Intel\Shared Libraries\ia32\OpenCL.lib to
C:\Program Files (x86)\Common Files\Intel\Shared Libraries\OpenCL.lib
it works.

`half` should not be passed as kernel argument

The compute kernels that test half-precision performance pass a half value as a kernel argument, even though this is forbidden by the specification (see the relevant bullet point in the specification).

A possible solution would be to pass the extra argument as an int and then cast as appropriate.

(Most platforms seem to accept it, but beignet follows the spec more closely.)
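A minimal sketch of the workaround, passing the scalar as a float instead of half and converting inside the kernel (the issue suggests int; float works the same way and keeps the value's meaning). The kernel name and body are hypothetical, simplified in the style of a peak-compute kernel, and held as a host-side C++ string.

```cpp
#include <string>

// Hypothetical half-precision compute kernel (OpenCL C), kept as a host-side string.
// The scalar argument is declared float, a legal kernel argument type,
// and converted to half inside the kernel.
static const std::string kHalfComputeKernel = R"CLC(
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void compute_hp_sketch(__global half *ptr, float _A)
{
    half x = (half)_A;                  // convert the float argument to half
    half y = (half)get_local_id(0);

    for (int i = 0; i < 64; i++) {
        x = mad(y, x, y);               // two dependent multiply-adds per iteration
        y = mad(x, y, x);
    }
    ptr[get_global_id(0)] = y;
}
)CLC";
```

On the host, the argument would then be set with a cl_float, for example kernel.setArg(1, cl_float(1.3f)) with the C++ bindings.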

double-precision test seg-faults on pocl

  1. double-precision compute test seg-faults on pocl
  2. integer compute test takes a very long time to compile in the llc step

These 2 tests are currently disabled for pocl

tracker thread

Questions about Global Memory Bandwidth

Hello,

I have a question about global memory bandwidth.
I find that global memory bandwidth tends to decrease for float8 and float16 on most devices.
I would like to know why global memory bandwidth decreases.
The following is the log from my MacPro.

Platform: Apple
  Device: Intel(R) Iris(TM) Graphics 6100
    Driver version  : 1.2(Apr 11 2017 16:38:15) (Macintosh)
    Compute units   : 48
    Clock frequency : 1050 MHz

    Global memory bandwidth (GBPS)
      float   : 13.71
      float2  : 14.28
      float4  : 14.75
      float8  : 7.58
      float16 : 3.97

    Single-precision compute (GFLOPS)
      float   : 631.50
      float2  : 641.30
      float4  : 640.04
      float8  : 638.97
      float16 : 634.22

    No half precision support! Skipped

    No double precision support! Skipped

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 4.81
      enqueueReadBuffer          : 5.41
      enqueueMapBuffer(for read) : 447.69
        memcpy from mapped ptr   : 4.64
      enqueueUnmap(after write)  : 6376.14
        memcpy to mapped ptr     : 5.27

    Kernel launch latency : 71.60 us

Thanks a lot.

exits on second GPU when hybrid graphics present

If I run optirun ./clpeak -d 0, I get:
Platform: Intel Gen OCL Driver
  Device: Intel(R) HD Graphics Kabylake Desktop GT1.5
    Driver version  : 1.3 (Linux x64)
    Compute units   : 24
    Clock frequency : 1000 MHz
and so on, so it works.
But if I run optirun ./clpeak -d 1, I get:
Platform: Intel Gen OCL Driver
Platform: NVIDIA CUDA
and that's all. It stops there and doesn't continue with the NVIDIA GPU.

Wrong definitions for MIN/MAX macros

The MIN and MAX definitions in include/common.h should be fixed. With the current approach, when used in compute_sp.cpp, for instance, the code

uint64_t t = MIN((globalWIs * sizeof(cl_float)), devInfo.maxAllocSize) / sizeof(cl_float);

leads to the following pre-processed code

uint64_t t = ((globalWIs * sizeof(cl_float)) < devInfo.maxAllocSize)? (globalWIs * sizeof(cl_float)): devInfo.maxAllocSize / sizeof(cl_float);

in which the division applies only to devInfo.maxAllocSize.

Possible workarounds are:

  • Add parentheses around the whole expression in the MAX and MIN definitions, or
  • Replace MAX and MIN with std::max and std::min
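Both workarounds can be sketched as follows. The macro spellings are an assumption about what the fixed common.h could look like, and clampedCount is a hypothetical helper showing the std::min form.

```cpp
#include <algorithm>
#include <cstdint>

// Fully parenthesized macros: the whole conditional is wrapped, so an operator
// applied to the result (such as "/ sizeof(cl_float)") acts on the selected value.
// Note that macro arguments are still evaluated twice.
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

// Macro-free alternative. The explicit template argument avoids a compile error
// if the two operands have different integer types (e.g. size_t vs cl_ulong).
inline uint64_t clampedCount(uint64_t globalWIs, uint64_t maxAllocSize, uint64_t elemSize)
{
    return std::min<uint64_t>(globalWIs * elemSize, maxAllocSize) / elemSize;
}
```

With the original unparenthesized MAX, MAX(7, 3) * 2 expands to (7 > 3) ? 7 : 3 * 2 and yields 7; with the parenthesized version it yields 14.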

Add lib64 EGL libGLES_mali.so

In MainActivity.java, libopenclSoPaths only contains the 32-bit path /system/lib/egl/libGLES_mali.so.

On Android 9 and later, 64-bit Exynos SoCs equipped with a Mali GPU have /system/lib64/egl/libGLES_mali.so.

Miscellaneous observations

  • fix the link for the AMD APP SDK in CMakeLists.txt (for the record, there are also direct links for Windows)
  • the AMD APP SDK already sets $AMDAPPSDKROOT; check it and copy it into $OPENCL_ROOT
  • add a win64 configuration?
  • add binaries in the release section (at least for Windows, since many users aren't comfortable running cmake .)
  • #define CL_USE_DEPRECATED_OPENCL_2_0_APIS to avoid dozens of clCreateSampler, clEnqueueTask and clCreateCommandQueue deprecation warnings (at least with the AMD SDK, and at least if keeping OpenCL 1.2 compatibility is desired)
    the situation might be different for other SDKs, though

Also, these are my results:

One more thing: is it normal for the Transfer bandwidth results (especially for the enqueueMap tests) to be so erratic?

Unable to get debug mode in clpeak

I need to run clpeak under gdb, but I get a "No debugging symbols found" warning.
To fix this, I tried adding the -g option to the cmake commands in the Makefile, but it still doesn't work for me.

Can anyone please share the steps to enable this?

Cmake find library issue

CMake doesn't find OpenCL with AMDGPU-PRO.
As a hack/workaround, I added the lib path to the find_library hints:
/opt/amdgpu-pro/lib/x86_64-linux-gnu/

I came across lots of bug reports about CMake multiarch issues, but it's not clear what the "correct" solution is.

clpeak can't get all devices in multi devices platform

Hi,

clpeak uses the ctx.getInfo<CL_CONTEXT_DEVICES>() interface to get devices.
This eventually relies on the OpenCL C API clCreateContextFromType.
The OpenCL specification says of clCreateContextFromType: "clCreateContextFromType may return all or a subset of the actual physical devices present in the platform and that match device_type."
So on some platforms, this method can't get all the devices.

There is another interface, platform.getDevices(), which can also get devices.
It eventually calls the OpenCL C API clGetDeviceIDs.
However, clGetDeviceIDs has a similar description in the specification: "clGetDeviceIDs may return all or a subset of the actual physical devices present in the platform and that match device_type."

We don't know which interface returns more devices on a given platform.
So my idea is to call both interfaces and use the one that returns more devices, for better compatibility.

Here is the patch I worked out: clpeak_diff.txt
Could you give some suggestions on this issue, and do you think this solution is acceptable?
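The selection step of the proposal above can be sketched generically. pickLargerDeviceList is a hypothetical helper, not taken from the attached patch; with the C++ bindings, Device would be cl::Device and the two inputs would come from ctx.getInfo<CL_CONTEXT_DEVICES>() and platform.getDevices(CL_DEVICE_TYPE_ALL, &devices).

```cpp
#include <vector>

// Hypothetical helper: given the device lists returned by two different
// queries, keep whichever is longer. Ties favor the context's list.
template <typename Device>
std::vector<Device> pickLargerDeviceList(const std::vector<Device> &fromContext,
                                         const std::vector<Device> &fromPlatform)
{
    return (fromPlatform.size() > fromContext.size()) ? fromPlatform : fromContext;
}
```

One design point to keep in mind: devices returned by clGetDeviceIDs but absent from the existing context cannot be used with that context, so if the platform list wins, a new cl::Context would need to be created from that device list.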

clpeak does not work with POCL


Platform: Portable Computing Language
  Device: pthread-AMD FX(tm)-8350 Eight-Core Processor
    Driver version  : 1.1-pre (Linux x64)
    Compute units   : 8
    Clock frequency : 4000 MHz

    Global memory bandwidth (GBPS)
      float   : clEnqueueNDRangeKernel (-63)
      Tests skipped

    Single-precision compute (GFLOPS)
clCreateBuffer (-61)
      Tests skipped

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
clCreateBuffer (-61)
      Tests skipped

    Integer compute (GIOPS)
clCreateBuffer (-61)
      Tests skipped

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 0.00
      enqueueReadBuffer          : 0.00
      enqueueMapBuffer(for read) : 0.00
        memcpy from mapped ptr   : inf
      enqueueUnmap(after write)  : 0.00
        memcpy to mapped ptr     : inf

    Kernel launch latency : 

clpeak stops responding after printing "Kernel launch latency:"

See pocl/pocl#522; they seem to think it's a clpeak issue.
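The reports on this page print raw OpenCL error codes. For reference, a small lookup sketch (clErrorName is a hypothetical helper; the values are those defined in the Khronos headers CL/cl.h and CL/cl_ext.h) covering the codes that appear here. For instance, clCreateBuffer returns -61 (CL_INVALID_BUFFER_SIZE) when the requested size is 0 or exceeds CL_DEVICE_MAX_MEM_ALLOC_SIZE.

```cpp
#include <string>

// Hypothetical helper: maps the OpenCL error codes seen in these reports to names.
inline std::string clErrorName(int code)
{
    switch (code) {
    case -34:   return "CL_INVALID_CONTEXT";
    case -61:   return "CL_INVALID_BUFFER_SIZE";
    case -63:   return "CL_INVALID_GLOBAL_WORK_SIZE";
    case -1001: return "CL_PLATFORM_NOT_FOUND_KHR";   // from the cl_khr_icd extension
    default:    return "unknown (" + std::to_string(code) + ")";
    }
}
```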

Outputting to file makes bad text files

./clpeak > output.txt

This produces bad output files, because functions like "platforms[p].getInfo<CL_PLATFORM_NAME>()" seem to return ['A', 'p', 'p', 'l', 'e', '\0'] rather than ['A', 'p', 'p', 'l', 'e']. If I switch from cl.hpp to cl2.hpp, the problem is fixed. I didn't want to submit a pull request because I have only tested on macOS: it works when building with Xcode, but with CMAKE_BUILD_TYPE=Debug and building with make I get "Release Object (-34)", and I haven't got that working yet.

diff --git a/include/clpeak.h b/include/clpeak.h
index 1d4c59f..156dc57 100644
--- a/include/clpeak.h
+++ b/include/clpeak.h
@@ -1,9 +1,10 @@
 #ifndef CLPEAK_HPP
 #define CLPEAK_HPP
 
-#define __CL_ENABLE_EXCEPTIONS
-
-#include <CL/cl.hpp>
+#define CL_HPP_ENABLE_EXCEPTIONS
+#define CL_HPP_MINIMUM_OPENCL_VERSION 100
+#define CL_HPP_TARGET_OPENCL_VERSION 100
+#include <CL/cl2.hpp>
 
 #include <iostream>
 #include <stdio.h>
diff --git a/src/clpeak.cpp b/src/clpeak.cpp
index 273447a..e44d085 100644
--- a/src/clpeak.cpp
+++ b/src/clpeak.cpp
@@ -2,7 +2,7 @@
 
 #define MSTRINGIFY(...) #__VA_ARGS__
 
-static const char *stringifiedKernels =
+static const string stringifiedKernels =
     #include "global_bandwidth_kernels.cl"
     #include "compute_sp_kernels.cl"
     #include "compute_hp_kernels.cl"
@@ -10,7 +10,7 @@ static const char *stringifiedKernels =
     #include "compute_integer_kernels.cl"
     ;
 
-static const char *stringifiedKernelsNoInt =
+static const string stringifiedKernelsNoInt =
     #include "global_bandwidth_kernels.cl"
     #include "compute_sp_kernels.cl"
     #include "compute_hp_kernels.cl"
@@ -83,13 +84,13 @@ int clPeak::runAll()
       // Causes Segmentation fault: 11
       if(isIntel || isApple)
       {
-        cl::Program::Sources source(1, make_pair(stringifiedKernelsNoInt, (strlen(stringifiedKernelsNoInt)+1)));
+        cl::Program::Sources source(1, stringifiedKernelsNoInt);
         isComputeInt = false;
         prog = cl::Program(ctx, source);
       }
       else
       {
-        cl::Program::Sources source(1, make_pair(stringifiedKernels, (strlen(stringifiedKernels)+1)));
+        cl::Program::Sources source(1, stringifiedKernels);
         prog = cl::Program(ctx, source);
       }
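If switching headers is not an option, a caller-side workaround is to strip trailing NUL characters from strings returned by getInfo before printing them. A minimal sketch (stripTrailingNuls is a hypothetical helper name), assuming the behavior reported above where the returned std::string includes the terminator:

```cpp
#include <string>

// Hypothetical helper: removes trailing '\0' characters that, as reported above,
// can end up inside strings returned by getInfo<...>() with the cl.hpp bindings.
inline std::string stripTrailingNuls(std::string s)
{
    while (!s.empty() && s.back() == '\0')
        s.pop_back();
    return s;
}
```

Usage would look like stripTrailingNuls(platforms[p].getInfo<CL_PLATFORM_NAME>()). The helper is harmless on strings that are already clean, so it can be applied unconditionally.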
