oneAPI Level Zero Specification Headers and Loader
Home Page: https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
License: MIT License
Regressed by intel/compute-runtime@691a4ea
I wonder if Intel bindings need to be updated.
$ ze_peak
zeDriverGet...
Device :
* name : Intel(R) Gen9
* vendorId : 32902
* deviceId : 6418
* subdeviceId : 0
* isSubdevice : FALSE
* coreClockRate : 1150
* numAsyncComputeEngines : 3
* numAsyncCopyEngines : 0
* maxCommandQueuePriority : 0
Process 82663 stopped
* thread #1, name = 'ze_peak', stop reason = signal SIGABRT
frame #0: 0x000000080060b02a libc.so.7`__sys_thr_kill at thr_kill.S:4
(lldb) bt
* thread #1, name = 'ze_peak', stop reason = signal SIGABRT
* frame #0: 0x000000080060b02a libc.so.7`__sys_thr_kill at thr_kill.S:4
frame #1: 0x000000080060a9c4 libc.so.7`__raise(s=6) at raise.c:52:10
frame #2: 0x0000000800573459 libc.so.7`abort at abort.c:67:8
frame #3: 0x00000008003d6a49 libcxxrt.so.1`report_failure(err=<unavailable>, thrown_exception=0x0000000802717fa8) at exception.cc:719:5
frame #4: 0x000000000020df59 ze_peak`L0Context::init_xe(this=0x00007fffffffde70) at ze_peak.cpp:183:5
frame #5: 0x00000000002115d9 ze_peak`main(argc=1, argv=0x00007fffffffe0f0) at ze_peak.cpp:760:11
frame #6: 0x000000000020b8bf ze_peak`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76:7
(lldb) f 4
frame #4: 0x000000000020df59 ze_peak`L0Context::init_xe(this=0x00007fffffffde70) at ze_peak.cpp:183:5
180 result =
181 zeCommandListCreate(device, &command_list_description, &command_list);
182 if (result) {
-> 183 throw std::runtime_error("zeDeviceCreateCommandList failed: " +
184 std::to_string(result));
185 }
186 if (verbose)
This is a question about the Level Zero API.
I have the following sequence of calls in which I accidentally forgot to reset the command list. Once the command list is closed, I can still append commands to it without getting any errors. Is this behaviour expected? Perhaps we should get something like ZE_INVALID_COMMAND_LIST?
zeCommandListCreate -> Status: 0
zeMemAllocDevice -> Status: 0
zeModuleCreate -> Status: 0
zeModuleBuildLogDestroy -> Status: 0
zeKernelCreate -> Status: 0
zeCommandListReset -> Status: 0
zeKernelSuggestGroupSize -> Status: 0
zeKernelSuggestGroupSize -> Status: 0
zeKernelSetArgumentValue -> Status: 0
zeCommandListAppendLaunchKernel -> Status: 0
zeCommandListAppendBarrier -> Status: 0
zeCommandListAppendMemoryCopy -> Status: 0
zeCommandListAppendBarrier -> Status: 0
zeCommandListClose -> Status: 0
zeCommandQueueExecuteCommandLists -> Status: 0
zeCommandQueueSynchronize -> Status: 0
zeCommandListAppendMemoryCopy -> Status: 0 << Shouldn't we get an error? The list was already closed.
zeCommandListAppendBarrier -> Status: 0
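For reference, a minimal sketch of the reuse pattern the spec does define, assuming the same handles as in the trace above (cmdList, cmdQueue, kernel, dispatch are placeholders; error handling elided): once a list is closed, zeCommandListReset returns it to the open state before further appends.

```cpp
// Sketch only: handles are assumed from the trace above.
zeCommandListClose(cmdList);                 // list becomes immutable
zeCommandQueueExecuteCommandLists(cmdQueue, 1, &cmdList, nullptr);
zeCommandQueueSynchronize(cmdQueue, UINT64_MAX);

// Appending without this reset (as in the trace) apparently succeeds;
// resetting first is the documented way to reuse a closed list.
zeCommandListReset(cmdList);                 // back to the open state
zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatch,
                                nullptr, 0, nullptr);
```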
Thank you,
level-zero/include/core/ze_module.h
Lines 493 to 506 in f50b570
level-zero/source/layers/validation/ze_core_layer.cpp
Lines 2630 to 2631 in f50b570
I'm experimenting with oneL0, but on my system the very first call to zeInit returns ZE_RESULT_ERROR_UNINITIALIZED. For example, using https://github.com/alcf-perfengr/alcl:
$ git clone https://github.com/alcf-perfengr/alcl
$ cd alcl/L0
$ make CFLAGS="-I /usr/include/level_zero -lze_loader"
$ ./tiny_zeinfo
>>> Initializing L0 Platform and Device...
Non-successful return code 2013265921 (ZE_RESULT_ERROR_UNINITIALIZED) for zeInit. Exiting.
I'm on Arch Linux, kernel 5.5.13, using intel-compute-runtime 20.12.16259 and level-zero-loader 0.91.10 from the Arch user repositories. clinfo output:
$ docker run -it --device /dev/dri:/dev/dri --rm docker.io/intelopencl/intel-opencl:ubuntu-18.04-ppa clinfo
Number of platforms 1
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 2.1
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_fp64 cl_khr_subgroups cl_khr_il_program cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_intel_media_block_io cl_khr_3d_image_writes
Platform Host timer resolution 1ns
Platform Extensions function suffix INTEL
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Gen9 HD Graphics NEO
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 2.1 NEO
Driver Version 20.11.16158
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 24
Max clock frequency 1150MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None, None
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 32
Max sub-groups per work group 32
Sub-group sizes (Intel) 8, 16, 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 26845274112 (25GiB)
Error Correction support No
Max memory allocation 4294959104 (4GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing No
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 64 bytes
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 4294959104 (4GiB)
Global Memory cache type Read/Write
Global Memory cache size 524288 (512KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 268434944 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4 bytes
Pitch alignment for 2D image buffers 4 pixels
Max 2D image size 16384x16384 pixels
Max planar YUV image size 16384x16352 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args 128
Max number of pipe args 16
Max active pipe reservations 1
Max pipe packet size 1024
Local memory type Local
Local memory size 65536 (64KiB)
Max number of constant args 8
Max constant buffer size 4294959104 (4GiB)
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 131072 (128KiB)
Max size 67108864 (64MiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Profiling timer resolution 83ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Sub-group independent forward progress Yes
IL version SPIR-V_1.2
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Motion Estimation accelerator version (Intel) 2
Device-side AVC Motion Estimation version 1
Supports texture sampler use Yes
Supports preemption No
Device Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_fp64 cl_khr_subgroups cl_khr_il_program cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_intel_media_block_io cl_khr_3d_image_writes
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel(R) OpenCL HD Graphics
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [INTEL]
clCreateContext(NULL, ...) [default] Success [INTEL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Gen9 HD Graphics NEO
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Gen9 HD Graphics NEO
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Gen9 HD Graphics NEO
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.11
ICD loader Profile OpenCL 2.1
Permissions on the device nodes are OK too:
crw-rw-rw- 1 root render 226, 128 Mar 30 21:36 /dev/dri/renderD128
crw-rw-rw- 1 root render 226, 129 Mar 30 21:36 /dev/dri/renderD129
Although this is probably a misconfiguration of some sort, what are the guarantees about calling zeInit? Is it possible for it to fail when, e.g., the user doesn't have compatible hardware or a driver?
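For context, a minimal zeInit check, assuming a v1.x loader (flag name and the error value are taken from the spec; this is a sketch, not a diagnosis of the issue above):

```cpp
#include <level_zero/ze_api.h>
#include <cstdio>

int main() {
    ze_result_t r = zeInit(ZE_INIT_FLAG_GPU_ONLY);
    if (r != ZE_RESULT_SUCCESS) {
        // 0x78000001 == ZE_RESULT_ERROR_UNINITIALIZED: no driver found,
        // no supported hardware, or permission problems on /dev/dri.
        std::fprintf(stderr, "zeInit failed: 0x%x\n", r);
        return 1;
    }
    uint32_t driverCount = 0;
    zeDriverGet(&driverCount, nullptr);
    std::printf("drivers: %u\n", driverCount);
    return 0;
}
```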
Hello
Several properties structures in the sysman API have a "maxBandwidth" field near a "width" field that says how many lanes are in the link. The documentation should say whether the bandwidth is per lane or not.
This matters at least for zet_fabric_port_speed_t and zet_pci_speed_t.
From what I see in PciImp::init(), it's the total bandwidth for PCI, not per lane. That's not obvious at all (but OK if documented). I didn't check the fabric port speed.
Thanks
Build makes invalid assumptions about Git repository tags / content:
$ git clone --branch v0.91.21 --depth 1 https://github.com/oneapi-src/level-zero.git && cd level-zero && cmake -Wno-dev -G Ninja -DCMAKE_BUILD_TYPE=Release .
Cloning into 'level-zero'...
Note: switching to '317bc0d1ed9630924d409b419a30465dfdcf2be3'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
fatal: ambiguous argument 'v0.91..HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
CMake Error at CMakeLists.txt:43 (message):
Cannot determine patch version - couldn't find v0.91 tag in repository
(It builds fine without the "--depth 1" option which I use to reduce git checkout sizes in build containers.)
From https://spec.oneapi.com/versions/0.7/oneL0/core/PROG.html#images:
This methodology allows for device-specific encoding of image contents (e.g., tile swizzle patterns, lossless compression, etc.) and avoids exposing these details in the API in a backwards compatible fashion.
Is that wording intentional? I would assume backwards compatibility for the Image API is a desired quality, not something to avoid...
Is it possible to expose maxMemAllocSize via ze_device_memory_properties_t?
Currently, level_zero implementation for Intel GPU uses this property to check if size is valid or not (e.g. https://github.com/intel/compute-runtime/blob/master/level_zero/core/source/memory/memory.cpp#L123) and it seems that the user doesn't have a way to know what its value is (besides looking at compute-runtime source code).
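As a hedged sketch of what such a query could look like: later spec revisions expose maxMemAllocSize on ze_device_properties_t rather than on ze_device_memory_properties_t as requested here, so the field placement below is an assumption, not the resolution of this issue.

```cpp
// Sketch: 'device' is an already-enumerated ze_device_handle_t.
ze_device_properties_t props = {};
props.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES;
if (zeDeviceGetProperties(device, &props) == ZE_RESULT_SUCCESS) {
    // Lets the application reject oversized requests up front instead of
    // relying on the driver returning an error at allocation time.
    uint64_t maxAlloc = props.maxMemAllocSize;  // field per later specs
    (void)maxAlloc;
}
```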
The API for zeCommandListAppendWaitOnEvents is analogous to clEnqueueMarkerWithWaitList in OpenCL.
There are a couple of issues.
In addition, the OpenCL API returns an event that can be used. To achieve the same behavior in Level Zero, an additional API call is needed, i.e. zeCommandListAppendSignalEvent, which signals an event.
It would be useful to have the same API behavior as clEnqueueMarkerWithWaitList.
Hi,
In the Intel OpenCL extension cl_intel_required_subgroup_size, the kernel property CL_KERNEL_SPILL_MEM_SIZE_INTEL, which "Returns the amount of spill memory used by a kernel", was added.
I was not able to find an equivalent in the L0 API. Do you think it will be possible to add this query to Level0?
Thanks,
The CMakeLists.txt file suggests that the main headers are installed to <prefix>/include/level_zero/, which would imply that users should #include <level_zero/ze_api.h> instead of #include <ze_api.h>. However, the documentation (https://spec.oneapi.com/level-zero/latest/core/INTRO.html#application-binary-interface) states that users should just include "ze_api.h".
Can we make a concrete statement in the documentation about how to include <level_zero/ze_api.h> instead of <ze_api.h>?
This also brings into question how the include parts of #50 should be structured (basically, should we remove the level_zero component of the include?)
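To make the two alternatives concrete, a sketch of the include style the installed layout implies (prefix path and build line are assumptions, not documented commands):

```cpp
// Assumes headers installed to <prefix>/include/level_zero/ and a loader
// library named libze_loader, as the CMakeLists.txt suggests:
#include <level_zero/ze_api.h>
// build: g++ app.cpp -lze_loader
// The spec text instead implies adding <prefix>/include/level_zero to the
// include path and writing #include <ze_api.h>.
```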
The validation enforces the number of events in the pool to be zero:
level-zero/source/layers/validation/ze_valddi.cpp
Lines 1493 to 1494 in b843465
Whereas the documentation of struct ze_event_pool_desc_t specifies:
uint32_t count
[in] number of events within the pool; must be greater than 0
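A minimal event-pool creation consistent with the documented contract (v1.x API assumed; 'context' is a pre-created ze_context_handle_t placeholder; error checks elided):

```cpp
// count must be greater than 0 per the ze_event_pool_desc_t documentation;
// the validation layer should therefore accept any count >= 1.
ze_event_pool_desc_t poolDesc = {};
poolDesc.stype = ZE_STRUCTURE_TYPE_EVENT_POOL_DESC;
poolDesc.flags = ZE_EVENT_POOL_FLAG_HOST_VISIBLE;
poolDesc.count = 1;

ze_event_pool_handle_t pool = nullptr;
zeEventPoolCreate(context, &poolDesc, 0, nullptr, &pool);
```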
The same build procedure works for v1.2.3:
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps>git clone https://github.com/oneapi-src/level-zero.git
Cloning into 'level-zero'...
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps>cd level-zero
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero>git checkout v1.3.0
Note: switching to 'v1.3.0'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at ae10adc Bump minor version to from v1.2 to v1.3
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero>git log --pretty -1
commit ae10adc
Author: Brandon Yates [email protected]
Date: Thu May 27 10:38:10 2021 -0400
Bump minor version to from v1.2 to v1.3
Signed-off-by: Brandon Yates <[email protected]>
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero>mkdir build
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero>cd build
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero\build>cmake .. -DCMAKE_BUILD_TYPE=Release -DOpenCL_INCLUDE_DIR=c:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\OpenCL-Headers -DCMAKE_INSTALL_PREFIX=c:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\install\level-zero -G "NMake Makefiles"
-- The C compiler identification is MSVC 19.28.29333.0
-- The CXX compiler identification is MSVC 19.28.29333.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.23.0.windows.1")
-- Using patch version from commit count in git repository: 0
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
OpenCL_INCLUDE_DIR
-- Build files have been written to: C:/tools/builds/fB-KRWEF/0/ispc/ispc-deps/level-zero/build
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero\build>nmake install
Microsoft (R) Program Maintenance Utility Version 14.28.29333.0
Copyright (C) Microsoft Corporation. All rights reserved.
Scanning dependencies of target ze_loader
[ 2%] Building RC object source/CMakeFiles/ze_loader.dir/ZeLoaderVersion.rc.res
Microsoft (R) Windows (R) Resource Compiler Version 10.0.10011.16384
Copyright (C) Microsoft Corporation. All rights reserved.
[ 4%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/ze_libapi.cpp.obj
ze_libapi.cpp
[ 7%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/ze_libddi.cpp.obj
ze_libddi.cpp
[ 9%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zet_libapi.cpp.obj
zet_libapi.cpp
[ 12%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zet_libddi.cpp.obj
zet_libddi.cpp
[ 14%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zes_libapi.cpp.obj
zes_libapi.cpp
[ 17%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zes_libddi.cpp.obj
zes_libddi.cpp
[ 19%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zel_tracing_libapi.cpp.obj
zel_tracing_libapi.cpp
[ 21%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zel_tracing_libddi.cpp.obj
zel_tracing_libddi.cpp
[ 24%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/ze_lib.cpp.obj
ze_lib.cpp
[ 26%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/ze_tracing_register_cb_libapi.cpp.obj
ze_tracing_register_cb_libapi.cpp
[ 29%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/windows/lib_init.cpp.obj
lib_init.cpp
[ 31%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/ze_loader.cpp.obj
ze_loader.cpp
[ 34%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/ze_loader_api.cpp.obj
ze_loader_api.cpp
[ 36%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/ze_ldrddi.cpp.obj
ze_ldrddi.cpp
[ 39%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/zet_ldrddi.cpp.obj
zet_ldrddi.cpp
[ 41%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/zes_ldrddi.cpp.obj
zes_ldrddi.cpp
[ 43%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/zel_tracing_ldrddi.cpp.obj
zel_tracing_ldrddi.cpp
[ 46%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/windows/driver_discovery_win.cpp.obj
driver_discovery_win.cpp
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero\source\loader\windows\driver_discovery_win.cpp(37): error C2065: 'GUID_DEVCLASS_COMPUTEACCELERATOR': undeclared identifier
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero\source\loader\windows\driver_discovery_win.cpp(39): error C3536: 'computeDrivers': cannot be used before it is initialized
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero\source\loader\windows\driver_discovery_win.cpp(39): error C2661: 'std::vector<loader::DriverLibraryPath,std::allocator<loader::DriverLibraryPath>>::insert': no overloaded function takes 1 arguments
NMAKE : fatal error U1077: 'C:\PROGRA2\MICROS1\2019\PROFES1\VC\Tools\MSVC\14281.293\bin\Hostx64\x64\cl.exe' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.28.29333\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.28.29333\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.
The releases page (https://github.com/oneapi-src/level-zero/releases) shows that only 20.12.16259 is supported.
Can the level-zero loader support a more up-to-date version, such as 20.21.16886?
Thanks
Why are command lists per device, not per queue?
If you have multi-queue hardware with, say, separate compute and copy queues, you can't record the final command buffer until submit time; this means the driver has to keep some sort of meta queue that isn't the real underlying hw queue.
Vulkan puts the command lists per queue for this reason, so that when you record a compute or transfer (copy) command list it can be baked down to actual hw commands for the queue the command is intended to be executed on.
level-zero/include/core/ze_event.h
Lines 127 to 138 in f50b570
level-zero/source/layers/validation/ze_core_layer.cpp
Lines 1345 to 1349 in f50b570
The ze_device_properties_t struct has a name field to describe the device.
Would it make sense to add a name field to ze_driver_properties_t as well, to describe the driver?
E.g. something like "Intel(R) LevelZero HD Graphics" or "Intel(R) LevelZero NEO driver".
Executing a single-threaded kernel (without using the thread-id intrinsic in the kernel) gives me wrong results.
I simplified the kernel as follows:
__kernel void initValues(__global long* input, __global long* output) {
output[0] = input[0];
}
After running the SPIRV kernel I get wrong values in the output. However, if I run the following kernel:
__kernel void initValues(__global long* input, __global long* output) {
uint idx = get_global_id(0);
output[idx] = input[idx];
}
I get the correct results when running with a single thread.
This is the kernel dispatch
uint32_t groupSizeX = 1u;
uint32_t groupSizeY = 1u;
uint32_t groupSizeZ = 1u;
VALIDATECALL(zeKernelSuggestGroupSize(kernel, items, 1U, 1U, &groupSizeX, &groupSizeY, &groupSizeZ));
VALIDATECALL(zeKernelSetGroupSize(kernel, groupSizeX, groupSizeY, groupSizeZ));
// Push arguments
VALIDATECALL(zeKernelSetArgumentValue(kernel, 0, sizeof(inputBuffer), &inputBuffer));
VALIDATECALL(zeKernelSetArgumentValue(kernel, 1, sizeof(outputBuffer), &outputBuffer));
// Kernel thread-dispatch
ze_group_count_t dispatch;
dispatch.groupCountX = 1u;
dispatch.groupCountY = 1u;
dispatch.groupCountZ = 1u;
VALIDATECALL(zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatch, nullptr, 0, nullptr));
Note, I am working on a tool that generates SPIR-V, sometimes from single-threaded kernels (so we do not expose thread-ids). I wonder if I am missing something. I am not sure if this is an error in Level Zero, but it works in pure OpenCL.
Full program available here:
https://gist.github.com/jjfumero/a15f24c567953af57c3a02a963577980
Any ideas/pointers are appreciated.
Level-zero newbie here!
I am running level-zero on CentOS 7.8.
The default GCC version on CentOS is 4.8.5, which seems too old to compile level-zero.
By using:
$ scl enable devtoolset-9 bash
$ gcc --version
gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
I can compile and run the examples successfully. I think it would be good to add the minimum GCC version in the README file.
GCC >= ??
UINT32_MAX in nanoseconds is just over 4 seconds, which seems too short. All timeouts should be uint64_t for future proofing.
level-zero/include/core/ze_event.h
Lines 36 to 44 in f50b570
level-zero/source/layers/validation/ze_core_layer.cpp
Lines 1287 to 1288 in f50b570
When invoking zeKernelGetSourceAttributes with a nullptr pString pointer to obtain the attribute size, https://github.com/intel/compute-runtime/blob/128cd8a31c16977ecc41bf13bdd35c2ac4907a5b/level_zero/core/source/kernel/kernel_imp.cpp#L443-L445, the validation layer erroneously returns ERROR_INVALID_NULL_POINTER:
level-zero/source/layers/validation/ze_valddi.cpp
Lines 3087 to 3088 in 284ccb0
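For reference, the two-call pattern the linked implementation supports, which the validation layer rejects (sketch; 'kernel' is an existing ze_kernel_handle_t placeholder, error checks elided):

```cpp
#include <vector>

// First call with a null string pointer queries only the size...
uint32_t size = 0;
zeKernelGetSourceAttributes(kernel, &size, nullptr);

// ...then a second call fetches the attribute string itself.
std::vector<char> buf(size);
char* p = buf.data();
zeKernelGetSourceAttributes(kernel, &size, &p);
```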
When we invoked zeModuleGetGlobalPointer to retrieve the size of a global variable specified by the 2nd argument (a string with the global variable's name), zeModuleGetGlobalPointer returned 0 in the 3rd argument but gave the correct device-visible pointer in the 4th argument.
We referred to the description at: https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=globalpointer#zemodulegetglobalpointer . The L0 module was created with the compilation flags '-cl-std=CL2.0 -cl-take-global-address'.
We tried defining the global variable as a scalar type and as an array type; neither size could be retrieved via zeModuleGetGlobalPointer correctly.
Is there any documentation about how to build level-zero from source on Windows 10? The documentation only specifies for Linux.
Looking at the device memory allocation API,
__ze_api_export ze_result_t __zecall
zeDriverAllocDeviceMem(
ze_driver_handle_t hDriver, ///< [in] handle of the driver instance
const ze_device_mem_alloc_desc_t* device_desc, ///< [in] pointer to device mem alloc descriptor
size_t size, ///< [in] size in bytes to allocate
size_t alignment, ///< [in] minimum alignment in bytes for the allocation
ze_device_handle_t hDevice, ///< [in] handle of the device
void** pptr ///< [out] pointer to device allocation
);
It looks like it always passes back a (void *). Is this expected to be a mapped pointer accessible from the CPU?
There are GPUs out there (maybe not yet from Intel) that have limited access to the VRAM from the CPU; by forcing *pptr to be a mapped address for all allocations, this limits the number of mapped accessible objects you can have.
This is why Vulkan has VkDeviceMemory objects and Map/Unmap APIs.
Documentation here: https://spec.oneapi.com/level-zero/latest/sysman/api.html#_CPPv418zes_engine_group_t
It lists the following defines:
However, they're missing from the latest L0 header: https://github.com/oneapi-src/level-zero/blob/master/include/zes_api.h#L1086
(And the values of several defines differ from the documentation.)
P.S. The link to the L0 specification at the end of README.md gives a 404. The spec link in the sidebar works.
Line 111 in 0d30b1f
If one uses the provided build instructions, the following error occurs:
CMake Error at CMakeLists.txt:77 (message):
OpenCL headers required for interop APIs
It is clear after looking at the CMake file that the path to the headers must be provided in the following way:
cmake -DOpenCL_INCLUDE_DIR=[path] ..
But shouldn't this be mentioned in the instructions, so that users can build without problems on the first attempt and without reading the CMake files?
The spec doesn't seem to say this explicitly, but kernel objects are currently defined such that they should be generated from modules per command-recording thread.
I wonder if it would be better to add a way to just record a kernel to a command list, setting the group size and inputs for a dispatch directly. This seems more thread-safe and would allow sharing kernel objects across threads if all you wanted to do was record a command stream with different inputs or group sizes.
Otherwise you'd have to create a new kernel object per command stream; I suppose it depends on whether the API usage envisages kernel objects being light to construct or something that triggers a compiler backend pass.
Hi,
Could you please put the libze_loader.so symlink in the .deb package as well?
Currently only versioned files are installed, e.g. libze_loader.so.0.91.
Thank you.
BR,
Anton
I am getting error code 0x78000004 after trying to create a SPIR-V kernel with the wrong name. The error is not listed here:
https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=zecommandlistclose#zekernelcreate
This error code corresponds to an invalid argument:
ZE_RESULT_ERROR_INVALID_ARGUMENT = 0x78000004
However, following the documentation, it should be
ZE_RESULT_ERROR_INVALID_KERNEL_NAME = 0x78000011
I am getting this for SPIR-V modules with OpenCL compute.
Some details about the environment:
Level Zero version: 284ccb089184180e34864a9f1e23971d3d736bd8
Platform: Intel(R) HD Graphics 630 [0x591b]
OpenCL version: OpenCL C 3.0
Driver: 21.12.19358
Hello
I am porting hwloc to L0 for exposing the locality of GPU devices. Things worked fine with API 0.91 but porting to 1.0 doesn't look good.
I am listing devices with zeDriverGet() and zeDeviceGet() and then calling zesDevicePciGetProperties() and zesDeviceGetProperties() after casting into zes_device_handle_t. As documented in 1.0 doc, this only works if ZES_ENABLE_SYSMAN=1 in the environment. This looks like a terrible idea.
Requiring a third-party library to always modify the environment of a program isn't nice. It's OK for enabling some debugging in a library from time to time, but not for enabling a mandatory feature. Passing a flag to zeInit() would be better.
This won't work if multiple layers use L0 at the same time: an application using L0 (most likely a runtime system managing tasks) will call zeInit() without ZES_ENABLE_SYSMAN (because it doesn't need it). It will then call hwloc to get the locality of some ze devices (to select the local one, etc.). hwloc will fail because sysman wasn't enabled in L0 (it will actually segfault in the current implementation). hwloc could set the env var and call its own zeInit() too, but it's likely too late anyway. Given that all drivers/devices are shared between all these layers (which is good), the behavior of L0 will basically change depending on which layer calls zeInit() first; that's unpredictable and bad.
0.95 had zetSysmanGet() for converting between ze and zet devices. 1.0 should just do the same: have an explicit function to enable sysman and "cast" a ze device handle to zes. An env var or init flags won't work, just like multiple MPI_Init() calls don't work in MPI.
Currently, the documentation of zeModuleCreate specifies that nullptr == desc->pConstants is an error, and the validation layer checks this. But the documentation of ze_module_desc_t specifies that pConstants must be set to nullptr if no specialization constants are provided.
Edit: The validation of zeModuleCreate also checks:
if( 1 <= desc->format )
return ZE_RESULT_ERROR_INVALID_ENUMERATION;
But there exist two possible formats:
ZE_MODULE_FORMAT_SPIRV (0)
ZE_MODULE_FORMAT_NATIVE (1)
pkg-config is the standard interface for declaring a project's version, include paths, and libraries: https://en.wikipedia.org/wiki/Pkg-config
It's used by most open source projects, and all build systems support using it to check these details. I think level-zero should also install a level-zero.pc file with this info.
Is it possible for Level Zero to support Broadwell GPUs?
I got a segfault when running a program, as shown below.
Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff6573b94 in L0::Kernel::create(unsigned int, L0::Module*, _ze_kernel_desc_t const*, _ze_result_t*) ()
from /usr/local/lib/libze_intel_gpu.so.0.8
#2 0x00007ffff6575cf6 in L0::ModuleImp::createKernel(_ze_kernel_desc_t const*, _ze_kernel_handle_t**) ()
from /usr/local/lib/libze_intel_gpu.so.0.8
#3 0x00007ffff6568207 in L0::BuiltinFunctionsLibImpl::loadBuiltIn(unsigned int, char const*) ()
from /usr/local/lib/libze_intel_gpu.so.0.8
#4 0x00007ffff656832b in L0::BuiltinFunctionsLibImpl::initFunctions() () from /usr/local/lib/libze_intel_gpu.so.0.8
#5 0x00007ffff656bb9d in L0::Device::create(L0::DriverHandle*, NEO::Device*) () from /usr/local/lib/libze_intel_gpu.so.0.8
#6 0x00007ffff656cc0b in L0::DriverHandleImp::initialize(std::vector<std::unique_ptr<NEO::Device, std::default_delete<NEO::Device> >, std::allocator<std::unique_ptr<NEO::Device, std::default_delete<NEO::Device> > > >) () from /usr/local/lib/libze_intel_gpu.so.0.8
#7 0x00007ffff656cd9b in L0::DriverHandle::create(std::vector<std::unique_ptr<NEO::Device, std::default_delete<NEO::Device> >, std::allocator<std::unique_ptr<NEO::Device, std::default_delete<NEO::Device> > > >) () from /usr/local/lib/libze_intel_gpu.so.0.8
#8 0x00007ffff656da0a in L0::DriverImp::initialize(bool*) () from /usr/local/lib/libze_intel_gpu.so.0.8
#9 0x00007ffff656d60e in ?? () from /usr/local/lib/libze_intel_gpu.so.0.8
#10 0x00007ffff621c827 in __pthread_once_slow () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007ffff656d8f0 in L0::init(_ze_init_flag_t) () from /usr/local/lib/libze_intel_gpu.so.0.8
#12 0x00007ffff7b2df6a in zeInit () from /usr/local/lib/libze_loader.so.0.91
#13 0x0000555555557285 in main ()
To create a Module, you need to pass a module description. (link)
typedef struct _ze_module_desc_t
{
ze_module_desc_version_t version;
ze_module_format_t format;
size_t inputSize; ///< [in] size of input IL or ISA from pInputModule.
const uint8_t* pInputModule; ///< [in] pointer to IL or ISA
const char* pBuildFlags;
const ze_module_constants_t* pConstants;
} ze_module_desc_t;
I'm having trouble determining the 3rd and 4th fields (inputSize and pInputModule). Could someone provide code samples of ze_module_desc_t for both the "ZE_MODULE_FORMAT_IL_SPIRV" and "ZE_MODULE_FORMAT_NATIVE" formats?
Hello,
The library part of libze_loader.so loads itself:
level-zero/source/lib/ze_lib.cpp
Line 32 in 2fb87e7
This makes sense if the library is built as a standalone one that needs to open the loader library.
This is harmful for the loader itself, as it can be made to use external symbols for its initialization. The simplest example I can think of is:
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
typedef int (*zeInit_t)(int);
int main() {
    const char *s = getenv("ZELIBPATH");
    if (!s) {
        printf("No path given.\n");
        exit(1);
    }
    void *handle = dlopen(s, RTLD_LAZY | RTLD_LOCAL);
    if (!handle) {
        printf("Wrong path given.\n");
        exit(1);
    }
    zeInit_t pf = (zeInit_t)(intptr_t)dlsym(handle, "zeInit");
    if (!pf) {
        printf("zeInit not found.\n");
        exit(1);
    }
    int res = pf(0);
    if (res != 0)
        printf("Error.\n");
    else
        printf("Success.\n");
    return 0;
}
Compiling the sample and invoking it (with two different ze_loader installations) as:
LD_PRELOAD=/path/to/first/libze_loader.so ZELIBPATH=/path/to/second/libze_loader.so a.out
will cause a segfault, because the second ze_loader will try to use the first ze_loader's symbols to initialize itself.
I think the loader shouldn't try to open itself, and should use a different mechanism depending on whether the library is standalone or used inside a loader.
I hope I did a good enough job of explaining the issue.
Thanks,
Brice
On Linux, the hardcoded library names include the major and minor library version, i.e.
MAKE_LIBRARY_NAME( "ze_intel_gpu", "0.4")
results in libze_intel_gpu.so.0.4.
Being hardcoded, this breaks loading the driver when its major or minor version changes (currently it is libze_intel_gpu.so.0.8).
Until a proper mechanism for enumerating the drivers is introduced, would it make sense to drop the version number from the library name on Linux too, as is already the case on Windows?
Playing a bit with the thread scheduler, I noticed that the following call throws an exception when setting a wrong block size:
uint32_t groupSizeX = 32u;
uint32_t groupSizeY = 32u;
uint32_t groupSizeZ = 1u;
zeKernelSetGroupSize(kernel, groupSizeX, groupSizeY, groupSizeZ);
The exception is as follows:
Error at zeKernelSetGroupSize(kernel, groupSizeX, groupSizeY, groupSizeZ): main: 161
terminate called without an active exception
Aborted (core dumped)
From my understanding, following the spec (https://spec.oneapi.com/level-zero/latest/core/api.html#zekernelsetgroupsize), we should get the error code ZE_RESULT_ERROR_INVALID_GROUP_SIZE_DIMENSION.
Is this the expected behaviour?
I am using level zero: ae10adc
compute-runtime: 21.20.19883
From what I understand of the source code of both the loader and the driver, tracer callbacks get driver handles rather than loader handles (for Level Zero objects). This raises a couple of issues:
Am I correct?
Does this also mean that I cannot mix DDI API code with loader API code, as the objects would be different (unless I force loader interception, in which case the DDI becomes moot)?
I tried to write a simple tool that I can load with LD_PRELOAD (in the spirit of MPI tools, for example, or OpenCL interposers).
It seems to initialize OK, but it doesn't get called.
One problem is that I can't get to the device handles that the runtime initializes; however, the level0 spec seems to imply that I should be able to open the device again.
Code attached. Is this hopeless? It seems like there should be a way to have third-party tools invoked.
This isn't really an issue, it's more of an FYI to level0 developers.
In the piglit open source testing project we have a bunch of CL program tests. These tests are individual files describing a test, run by a test execution binary. The tests cover a bunch of CL builtins and other things. I recently added CL SPIR-V support to the testing binary, which uses clang + llvm-spirv to generate SPIR-V from the CL C tests.
Following on from that I've gotten basic level0 execution working from the same files.
https://gitlab.freedesktop.org/airlied/piglit/-/tree/lvl0
I've only tested this against my own level0 implementation, not the Intel one, and I'm not even 100% confident it's working at all, but some tests do appear to pass.
This tool might be useful for writing quick tests also that just use CL C to exercise some corner cases. It doesn't have image or half support yet, and a few other things are likely broken, but I thought I'd mention it in case it was useful.
I'm not sure why level 0 invents a new set of image parameters; it probably makes more sense to just import the Vulkan image format list and use that as a baseline so you can remain compatible.
No other API does a layout/type/swizzle as the format without having a name and definition for users.
I don't see any clear advantage in diverging in this area.
loader::zeCommandQueueExecuteCommandLists "spoils" the input command list handles.
ze_ldrddi.cpp:818 - 819
I'm quite confused, because now I do not understand how it has been working all this time...
Normally a user has a local variable ze_command_list_handle_t cmd_list and passes it to zeCommandQueueExecuteCommandLists by address.
It's quite unexpected that its value changes after the call. And I guess it becomes unusable for further API calls like Destroy, Reset, etc.?
Is it just that this implementation is not used in the default case?
I got to this code from a machine with igfx + dgfx running 2 different drivers.
While level 0 has fences and events it doesn't appear to have any semaphore support.
Events can be a messy solution from a scheduler point of view, as they pretty much block the GPU command processing hard at a point in time; some GPUs can reschedule other command streams, some can't. So while useful, events can be a bit of a large hammer to hit a system with from a user application. Also, GPU-selected events invariably mean the CPU side is spinning on some memory location polling the event, again not really a great thing to have to depend on.
Vulkan recently added timeline semaphores and perhaps there should be some consideration of adding them to Level 0 as a better scheduling mechanism.
ze_device_kernel_properties_t kernel_properties;
kernel_properties.version = ZE_DEVICE_KERNEL_PROPERTIES_VERSION_CURRENT;
kernel_properties.dp4aSupported = 100;
zeDeviceGetKernelProperties(allDevices[d], &kernel_properties);
printf("dp4aSupported: %i\n", kernel_properties.dp4aSupported);
This retains the 100 value. Am I supposed to zero-initialize these structs? Also, is the use of the version field correct here?
Following the spec:
https://spec.oneapi.com/level-zero/latest/core/INTRO.html
the Level Zero spec also targets spatial architectures.
However, looking at the zeInit function (https://github.com/oneapi-src/level-zero/blob/master/include/ze_api.h#L567), the current implementation only supports GPUs:
typedef enum _ze_init_flag_t
{
ZE_INIT_FLAG_GPU_ONLY = ZE_BIT(0), ///< only initialize GPU drivers
ZE_INIT_FLAG_FORCE_UINT32 = 0x7fffffff
} ze_init_flag_t;
Are there any plans to include FPGAs?
Related to FPGAs: if I want Level Zero and SPIR-V, I guess I would need at least OpenCL 2.1? However, I see that the latest Quartus still implements OpenCL 1.0. Any plans on this front?
Thanks
Juan
The zesFanGetConfig callback is intended to return the fan configuration, and if the fan is configured to work with a table, the speedTable field should contain an array of temperature/speed pairs, not a single one.
In the current form of the spec, the speedTable field in zes_fan_config_t is described as "A table containing temperature/speed pairs", but that's obviously not true, because the type used, zes_fan_temp_speed_t, represents only a single temperature/speed pair; it's not an array.
To match the description, speedTable should at least literally be an array, i.e. its type should change from zes_fan_temp_speed_t to zes_fan_temp_speed_t[]. A more appropriate data structure would contain not only the array of temperature/speed pairs but also how many of them are defined, i.e. zes_fan_speed_table_t.
I've created a pull request to update the API (#44), but since the corresponding change should be made in the spec itself first, I'm going to close the PR and track this problem as an issue, as suggested by @bmyates.
Under source/wrapper/ there is what looks like a C++ wrapper on top of the Level Zero C API.
While it is built into a set of libraries (libze_api_cpp.a and libzet_api_cpp.a), there doesn't seem to be any public interface for it.
In the spec, it seems data transfers using the function zeCommandListAppendMemoryCopy are not blocking. Is there any variant for blocking calls? Or is there any equivalent to the OpenCL call clEnqueue{Read/Write}Buffer with CL_TRUE to indicate a blocking call?
Similar to the kernel launch, is there any way to specify a blocking call?
What I have found is to close the command list and then launch all pending commands within the list after each data transfer or kernel launch, for example by running the following sequence:
zeCommandListAppendMemoryCopy( .. )
zeCommandListClose( .. )
zeCommandQueueExecuteCommandLists ( .. )
zeCommandQueueSynchronize ( ..)
zeCommandListReset ( .. )
but is there any other way to get blocking calls?