oneAPI Level Zero Specification Headers and Loader
Home Page: https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
License: MIT License
Regressed by intel/compute-runtime@691a4ea
I wonder if Intel bindings need to be updated.
$ ze_peak
zeDriverGet...
Device :
* name : Intel(R) Gen9
* vendorId : 32902
* deviceId : 6418
* subdeviceId : 0
* isSubdevice : FALSE
* coreClockRate : 1150
* numAsyncComputeEngines : 3
* numAsyncCopyEngines : 0
* maxCommandQueuePriority : 0
Process 82663 stopped
* thread #1, name = 'ze_peak', stop reason = signal SIGABRT
frame #0: 0x000000080060b02a libc.so.7`__sys_thr_kill at thr_kill.S:4
(lldb) bt
* thread #1, name = 'ze_peak', stop reason = signal SIGABRT
* frame #0: 0x000000080060b02a libc.so.7`__sys_thr_kill at thr_kill.S:4
frame #1: 0x000000080060a9c4 libc.so.7`__raise(s=6) at raise.c:52:10
frame #2: 0x0000000800573459 libc.so.7`abort at abort.c:67:8
frame #3: 0x00000008003d6a49 libcxxrt.so.1`report_failure(err=<unavailable>, thrown_exception=0x0000000802717fa8) at exception.cc:719:5
frame #4: 0x000000000020df59 ze_peak`L0Context::init_xe(this=0x00007fffffffde70) at ze_peak.cpp:183:5
frame #5: 0x00000000002115d9 ze_peak`main(argc=1, argv=0x00007fffffffe0f0) at ze_peak.cpp:760:11
frame #6: 0x000000000020b8bf ze_peak`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76:7
(lldb) f 4
frame #4: 0x000000000020df59 ze_peak`L0Context::init_xe(this=0x00007fffffffde70) at ze_peak.cpp:183:5
180 result =
181 zeCommandListCreate(device, &command_list_description, &command_list);
182 if (result) {
-> 183 throw std::runtime_error("zeDeviceCreateCommandList failed: " +
184 std::to_string(result));
185 }
186 if (verbose)
This is a question about the Level Zero API.
I have the following sequence of calls in which I accidentally forgot to reset the command list. Once the command list is closed, I can still append commands to it without getting any errors. Is this behaviour expected? Perhaps we should get something like ZE_INVALID_COMMAND_LIST?
zeCommandListCreate -> Status: 0
zeMemAllocDevice -> Status: 0
zeModuleCreate -> Status: 0
zeModuleBuildLogDestroy -> Status: 0
zeKernelCreate -> Status: 0
zeCommandListReset -> Status: 0
zeKernelSuggestGroupSize -> Status: 0
zeKernelSuggestGroupSize -> Status: 0
zeKernelSetArgumentValue -> Status: 0
zeCommandListAppendLaunchKernel -> Status: 0
zeCommandListAppendBarrier -> Status: 0
zeCommandListAppendMemoryCopy -> Status: 0
zeCommandListAppendBarrier -> Status: 0
zeCommandListClose -> Status: 0
zeCommandQueueExecuteCommandLists -> Status: 0
zeCommandQueueSynchronize -> Status: 0
zeCommandListAppendMemoryCopy -> Status: 0 << Shouldn't we get an error? The list was already closed.
zeCommandListAppendBarrier -> Status: 0
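For reference, a minimal sketch of the reuse pattern the spec does define, assuming the same handles as in the trace above (cmdList, cmdQueue, kernel, dispatch are placeholders; error handling elided): once a list is closed, zeCommandListReset returns it to the open state before further appends.

```cpp
// Sketch only: handles are assumed from the trace above.
zeCommandListClose(cmdList);                 // list becomes immutable
zeCommandQueueExecuteCommandLists(cmdQueue, 1, &cmdList, nullptr);
zeCommandQueueSynchronize(cmdQueue, UINT64_MAX);

// Appending without this reset (as in the trace) apparently succeeds;
// resetting first is the documented way to reuse a closed list.
zeCommandListReset(cmdList);                 // back to the open state
zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatch,
                                nullptr, 0, nullptr);
```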
Thank you,
level-zero/include/core/ze_module.h
Lines 493 to 506 in f50b570
level-zero/source/layers/validation/ze_core_layer.cpp
Lines 2630 to 2631 in f50b570
I'm experimenting with oneL0, but on my system the very first call to zeInit returns ZE_RESULT_ERROR_UNINITIALIZED. For example, using https://github.com/alcf-perfengr/alcl:
$ git clone https://github.com/alcf-perfengr/alcl
$ cd alcl/L0
$ make CFLAGS="-I /usr/include/level_zero -lze_loader"
$ ./tiny_zeinfo
>>> Initializing L0 Platform and Device...
Non-successful return code 2013265921 (ZE_RESULT_ERROR_UNINITIALIZED) for zeInit. Exiting.
I'm on Arch Linux, kernel 5.5.13, using intel-compute-runtime 20.12.16259 and level-zero-loader 0.91.10 from the Arch user repositories. clinfo output:
$ docker run -it --device /dev/dri:/dev/dri --rm docker.io/intelopencl/intel-opencl:ubuntu-18.04-ppa clinfo
Number of platforms 1
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 2.1
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_fp64 cl_khr_subgroups cl_khr_il_program cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_intel_media_block_io cl_khr_3d_image_writes
Platform Host timer resolution 1ns
Platform Extensions function suffix INTEL
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Gen9 HD Graphics NEO
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 2.1 NEO
Driver Version 20.11.16158
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 24
Max clock frequency 1150MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None, None
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 32
Max sub-groups per work group 32
Sub-group sizes (Intel) 8, 16, 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 26845274112 (25GiB)
Error Correction support No
Max memory allocation 4294959104 (4GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing No
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 64 bytes
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 4294959104 (4GiB)
Global Memory cache type Read/Write
Global Memory cache size 524288 (512KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 268434944 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4 bytes
Pitch alignment for 2D image buffers 4 pixels
Max 2D image size 16384x16384 pixels
Max planar YUV image size 16384x16352 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args 128
Max number of pipe args 16
Max active pipe reservations 1
Max pipe packet size 1024
Local memory type Local
Local memory size 65536 (64KiB)
Max number of constant args 8
Max constant buffer size 4294959104 (4GiB)
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 131072 (128KiB)
Max size 67108864 (64MiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Profiling timer resolution 83ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Sub-group independent forward progress Yes
IL version SPIR-V_1.2
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Motion Estimation accelerator version (Intel) 2
Device-side AVC Motion Estimation version 1
Supports texture sampler use Yes
Supports preemption No
Device Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_fp64 cl_khr_subgroups cl_khr_il_program cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_intel_media_block_io cl_khr_3d_image_writes
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel(R) OpenCL HD Graphics
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [INTEL]
clCreateContext(NULL, ...) [default] Success [INTEL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Gen9 HD Graphics NEO
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Gen9 HD Graphics NEO
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Gen9 HD Graphics NEO
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.11
ICD loader Profile OpenCL 2.1
Permissions on the device nodes are OK too:
crw-rw-rw- 1 root render 226, 128 Mar 30 21:36 /dev/dri/renderD128
crw-rw-rw- 1 root render 226, 129 Mar 30 21:36 /dev/dri/renderD129
Although this is probably a misconfiguration of some sort, what are the guarantees about calling zeInit? Is it possible for it to fail when, e.g., the user doesn't have compatible hardware or a driver?
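For context, a minimal zeInit check, assuming a v1.x loader (flag name and the error value are taken from the spec; this is a sketch, not a diagnosis of the issue above):

```cpp
#include <level_zero/ze_api.h>
#include <cstdio>

int main() {
    ze_result_t r = zeInit(ZE_INIT_FLAG_GPU_ONLY);
    if (r != ZE_RESULT_SUCCESS) {
        // 0x78000001 == ZE_RESULT_ERROR_UNINITIALIZED: no driver found,
        // no supported hardware, or permission problems on /dev/dri.
        std::fprintf(stderr, "zeInit failed: 0x%x\n", r);
        return 1;
    }
    uint32_t driverCount = 0;
    zeDriverGet(&driverCount, nullptr);
    std::printf("drivers: %u\n", driverCount);
    return 0;
}
```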
Hello
Several properties structures in the sysman API have a "maxBandwidth" field near a "width" field that says how many lanes are in the link. The documentation should say whether the bandwidth is per lane or not.
This matters at least for zet_fabric_port_speed_t and zet_pci_speed_t.
From what I see in PciImp::init(), it's the total bandwidth for PCI, not per lane. That's not obvious at all (but OK if documented). I didn't check the fabric port speed.
Thanks
Build makes invalid assumptions about Git repository tags / content:
$ git clone --branch v0.91.21 --depth 1 https://github.com/oneapi-src/level-zero.git && cd level-zero && cmake -Wno-dev -G Ninja -DCMAKE_BUILD_TYPE=Release .
Cloning into 'level-zero'...
Note: switching to '317bc0d1ed9630924d409b419a30465dfdcf2be3'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
fatal: ambiguous argument 'v0.91..HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
CMake Error at CMakeLists.txt:43 (message):
Cannot determine patch version - couldn't find v0.91 tag in repository
(It builds fine without the "--depth 1" option which I use to reduce git checkout sizes in build containers.)
From https://spec.oneapi.com/versions/0.7/oneL0/core/PROG.html#images:
This methodology allows for device-specific encoding of image contents (e.g., tile swizzle patterns, lossless compression, etc.) and avoids exposing these details in the API in a backwards compatible fashion.
Is that wording intentional? I would assume backwards compatibility for the Image API is a desired quality, not something to avoid...
Is it possible to expose maxMemAllocSize via ze_device_memory_properties_t?
Currently, level_zero implementation for Intel GPU uses this property to check if size is valid or not (e.g. https://github.com/intel/compute-runtime/blob/master/level_zero/core/source/memory/memory.cpp#L123) and it seems that the user doesn't have a way to know what its value is (besides looking at compute-runtime source code).
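As a hedged sketch of what such a query could look like: later spec revisions expose maxMemAllocSize on ze_device_properties_t rather than on ze_device_memory_properties_t as requested here, so the field placement below is an assumption, not the resolution of this issue.

```cpp
// Sketch: 'device' is an already-enumerated ze_device_handle_t.
ze_device_properties_t props = {};
props.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES;
if (zeDeviceGetProperties(device, &props) == ZE_RESULT_SUCCESS) {
    // Lets the application reject oversized requests up front instead of
    // relying on the driver returning an error at allocation time.
    uint64_t maxAlloc = props.maxMemAllocSize;  // field per later specs
    (void)maxAlloc;
}
```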
The API for zeCommandListAppendWaitOnEvents is analogous to clEnqueueMarkerWithWaitList in OpenCL.
There are a couple of issues.
In addition, the OpenCL API returns an event that can be used. To achieve the same behavior in Level Zero, an additional API call is needed, i.e. zeCommandListAppendSignalEvent, which signals an event.
It would be useful to have the same API behavior as clEnqueueMarkerWithWaitList.
Hi,
In the Intel OpenCL extension cl_intel_required_subgroup_size, the kernel property CL_KERNEL_SPILL_MEM_SIZE_INTEL, which "Returns the amount of spill memory used by a kernel", was added.
I was not able to find an equivalent in the L0 API. Do you think it will be possible to add this query to Level0?
Thanks,
The CMakeLists.txt file suggests that the main headers are installed to <prefix>/include/level_zero/, which would imply that users should #include <level_zero/ze_api.h> instead of #include <ze_api.h>. However, the documentation (https://spec.oneapi.com/level-zero/latest/core/INTRO.html#application-binary-interface) states that users should just include "ze_api.h".
Can we make a concrete statement in the documentation about how to include <level_zero/ze_api.h> instead of <ze_api.h>?
This also brings into question how the include parts of #50 should be structured (basically, should we remove the level_zero component of the include?)
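To make the two alternatives concrete, a sketch of the include style the installed layout implies (prefix path and build line are assumptions, not documented commands):

```cpp
// Assumes headers installed to <prefix>/include/level_zero/ and a loader
// library named libze_loader, as the CMakeLists.txt suggests:
#include <level_zero/ze_api.h>
// build: g++ app.cpp -lze_loader
// The spec text instead implies adding <prefix>/include/level_zero to the
// include path and writing #include <ze_api.h>.
```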
The validation enforces the number of events in the pool to be zero:
level-zero/source/layers/validation/ze_valddi.cpp
Lines 1493 to 1494 in b843465
Whereas the documentation of struct ze_event_pool_desc_t specifies:
uint32_t count
[in] number of events within the pool; must be greater than 0
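A minimal event-pool creation consistent with the documented contract (v1.x API assumed; 'context' is a pre-created ze_context_handle_t placeholder; error checks elided):

```cpp
// count must be greater than 0 per the ze_event_pool_desc_t documentation;
// the validation layer should therefore accept any count >= 1.
ze_event_pool_desc_t poolDesc = {};
poolDesc.stype = ZE_STRUCTURE_TYPE_EVENT_POOL_DESC;
poolDesc.flags = ZE_EVENT_POOL_FLAG_HOST_VISIBLE;
poolDesc.count = 1;

ze_event_pool_handle_t pool = nullptr;
zeEventPoolCreate(context, &poolDesc, 0, nullptr, &pool);
```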
The same build procedure works for v1.2.3:
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps>git clone https://github.com/oneapi-src/level-zero.git
Cloning into 'level-zero'...
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps>cd level-zero
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero>git checkout v1.3.0
Note: switching to 'v1.3.0'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at ae10adc Bump minor version to from v1.2 to v1.3
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero>git log --pretty -1
commit ae10adc
Author: Brandon Yates [email protected]
Date: Thu May 27 10:38:10 2021 -0400
Bump minor version to from v1.2 to v1.3
Signed-off-by: Brandon Yates <[email protected]>
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero>mkdir build
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero>cd build
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero\build>cmake .. -DCMAKE_BUILD_TYPE=Release -DOpenCL_INCLUDE_DIR=c:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\OpenCL-Headers -DCMAKE_INSTALL_PREFIX=c:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\install\level-zero -G "NMake Makefiles"
-- The C compiler identification is MSVC 19.28.29333.0
-- The CXX compiler identification is MSVC 19.28.29333.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.23.0.windows.1")
-- Using patch version from commit count in git repository: 0
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
OpenCL_INCLUDE_DIR
-- Build files have been written to: C:/tools/builds/fB-KRWEF/0/ispc/ispc-deps/level-zero/build
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero\build>nmake install
Microsoft (R) Program Maintenance Utility Version 14.28.29333.0
Copyright (C) Microsoft Corporation. All rights reserved.
Scanning dependencies of target ze_loader
[ 2%] Building RC object source/CMakeFiles/ze_loader.dir/ZeLoaderVersion.rc.res
Microsoft (R) Windows (R) Resource Compiler Version 10.0.10011.16384
Copyright (C) Microsoft Corporation. All rights reserved.
[ 4%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/ze_libapi.cpp.obj
ze_libapi.cpp
[ 7%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/ze_libddi.cpp.obj
ze_libddi.cpp
[ 9%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zet_libapi.cpp.obj
zet_libapi.cpp
[ 12%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zet_libddi.cpp.obj
zet_libddi.cpp
[ 14%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zes_libapi.cpp.obj
zes_libapi.cpp
[ 17%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zes_libddi.cpp.obj
zes_libddi.cpp
[ 19%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zel_tracing_libapi.cpp.obj
zel_tracing_libapi.cpp
[ 21%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/zel_tracing_libddi.cpp.obj
zel_tracing_libddi.cpp
[ 24%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/ze_lib.cpp.obj
ze_lib.cpp
[ 26%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/ze_tracing_register_cb_libapi.cpp.obj
ze_tracing_register_cb_libapi.cpp
[ 29%] Building CXX object source/CMakeFiles/ze_loader.dir/lib/windows/lib_init.cpp.obj
lib_init.cpp
[ 31%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/ze_loader.cpp.obj
ze_loader.cpp
[ 34%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/ze_loader_api.cpp.obj
ze_loader_api.cpp
[ 36%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/ze_ldrddi.cpp.obj
ze_ldrddi.cpp
[ 39%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/zet_ldrddi.cpp.obj
zet_ldrddi.cpp
[ 41%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/zes_ldrddi.cpp.obj
zes_ldrddi.cpp
[ 43%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/zel_tracing_ldrddi.cpp.obj
zel_tracing_ldrddi.cpp
[ 46%] Building CXX object source/CMakeFiles/ze_loader.dir/loader/windows/driver_discovery_win.cpp.obj
driver_discovery_win.cpp
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero\source\loader\windows\driver_discovery_win.cpp(37): error C2065: 'GUID_DEVCLASS_COMPUTEACCELERATOR': undeclared identifier
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero\source\loader\windows\driver_discovery_win.cpp(39): error C3536: 'computeDrivers': cannot be used before it is initialized
C:\tools\builds\fB-KRWEF\0\ispc\ispc-deps\level-zero\source\loader\windows\driver_discovery_win.cpp(39): error C2661: 'std::vector<loader::DriverLibraryPath,std::allocator<loader::DriverLibraryPath>>::insert': no overloaded function takes 1 arguments
NMAKE : fatal error U1077: 'C:\PROGRA2\MICROS1\2019\PROFES1\VC\Tools\MSVC\14281.293\bin\Hostx64\x64\cl.exe' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.28.29333\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.28.29333\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.
The releases page (https://github.com/oneapi-src/level-zero/releases) shows that only 20.12.16259 is supported.
Can the level-zero loader support a more up-to-date version, such as 20.21.16886?
Thanks
Why are command lists per device, not per queue?
If you have multi-queue hardware with, say, separate compute and copy queues, you can't record the final command buffer until submit time; this means the driver has to keep some sort of meta queue that isn't the real underlying hw queue.
Vulkan puts the command lists per queue for this reason, so that when you record a compute or transfer (copy) command list it can be baked down to actual hw commands for the queue the command is intended to be executed on.
level-zero/include/core/ze_event.h
Lines 127 to 138 in f50b570
level-zero/source/layers/validation/ze_core_layer.cpp
Lines 1345 to 1349 in f50b570
The ze_device_properties_t struct has a name field to describe the device.
Would it make sense to add a name field to ze_driver_properties_t as well, to describe the driver?
E.g. something like "Intel(R) LevelZero HD Graphics" or "Intel(R) LevelZero NEO driver".
Executing a single-threaded kernel (without using the thread-id intrinsic in the kernel) gives me wrong results.
I simplified the kernel as follows:
__kernel void initValues(__global long* input, __global long* output) {
output[0] = input[0];
}
After running the SPIRV kernel I get wrong values in the output. However, if I run the following kernel:
__kernel void initValues(__global long* input, __global long* output) {
uint idx = get_global_id(0);
output[idx] = input[idx];
}
I get the correct results when running with a single thread.
This is the kernel dispatch
uint32_t groupSizeX = 1u;
uint32_t groupSizeY = 1u;
uint32_t groupSizeZ = 1u;
VALIDATECALL(zeKernelSuggestGroupSize(kernel, items, 1U, 1U, &groupSizeX, &groupSizeY, &groupSizeZ));
VALIDATECALL(zeKernelSetGroupSize(kernel, groupSizeX, groupSizeY, groupSizeZ));
// Push arguments
VALIDATECALL(zeKernelSetArgumentValue(kernel, 0, sizeof(inputBuffer), &inputBuffer));
VALIDATECALL(zeKernelSetArgumentValue(kernel, 1, sizeof(outputBuffer), &outputBuffer));
// Kernel thread-dispatch
ze_group_count_t dispatch;
dispatch.groupCountX = 1u;
dispatch.groupCountY = 1u;
dispatch.groupCountZ = 1u;
VALIDATECALL(zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatch, nullptr, 0, nullptr));
Note, I am working on a tool that generates SPIR-V, sometimes from single-threaded kernels (so we do not expose thread-ids). I wonder if I am missing something. I am not sure if this is an error in Level Zero, but it works in pure OpenCL.
Full program available here:
https://gist.github.com/jjfumero/a15f24c567953af57c3a02a963577980
Any ideas/pointers are appreciated.
Level-zero newbie here!
I am running level-zero on CentOS 7.8.
The default GCC version on CentOS is 4.8.5, which seems too old to compile level-zero.
By using:
$ scl enable devtoolset-9 bash
$ gcc --version
gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
I can compile and run the examples successfully. I think it would be good to add the minimum GCC version in the README file.
GCC >= ??
UINT32_MAX in nanoseconds is just over 4 seconds, which seems too short. All timeouts should be uint64_t for future proofing.
level-zero/include/core/ze_event.h
Lines 36 to 44 in f50b570
level-zero/source/layers/validation/ze_core_layer.cpp
Lines 1287 to 1288 in f50b570
When invoking zeKernelGetSourceAttributes with a nullptr pString pointer to obtain the attribute size, https://github.com/intel/compute-runtime/blob/128cd8a31c16977ecc41bf13bdd35c2ac4907a5b/level_zero/core/source/kernel/kernel_imp.cpp#L443-L445, the validation layer erroneously returns ERROR_INVALID_NULL_POINTER:
level-zero/source/layers/validation/ze_valddi.cpp
Lines 3087 to 3088 in 284ccb0
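For reference, the two-call pattern the linked implementation supports, which the validation layer rejects (sketch; 'kernel' is an existing ze_kernel_handle_t placeholder, error checks elided):

```cpp
#include <vector>

// First call with a null string pointer queries only the size...
uint32_t size = 0;
zeKernelGetSourceAttributes(kernel, &size, nullptr);

// ...then a second call fetches the attribute string itself.
std::vector<char> buf(size);
char* p = buf.data();
zeKernelGetSourceAttributes(kernel, &size, &p);
```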
When we invoked zeModuleGetGlobalPointer to retrieve the size of a global variable specified by the 2nd argument (a string with the global variable's name), zeModuleGetGlobalPointer returned 0 in the 3rd argument but gave the correct device-visible pointer in the 4th argument.
We referred to the description at: https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=globalpointer#zemodulegetglobalpointer . The L0 module was created with the compilation flags '-cl-std=CL2.0 -cl-take-global-address'.
We tried defining the global variable as a scalar type and as an array type; neither size could be retrieved via zeModuleGetGlobalPointer correctly.
Is there any documentation about how to build level-zero from source on Windows 10? The documentation only specifies for Linux.
Looking at the device memory allocation API,
__ze_api_export ze_result_t __zecall
zeDriverAllocDeviceMem(
ze_driver_handle_t hDriver, ///< [in] handle of the driver instance
const ze_device_mem_alloc_desc_t* device_desc, ///< [in] pointer to device mem alloc descriptor
size_t size, ///< [in] size in bytes to allocate
size_t alignment, ///< [in] minimum alignment in bytes for the allocation
ze_device_handle_t hDevice, ///< [in] handle of the device
void** pptr ///< [out] pointer to device allocation
);
It looks like it always passes back a (void *). Is this expected to be a mapped pointer accessible from the CPU?
There are GPUs out there (maybe not yet from Intel) that have limited access to the VRAM from the CPU; by forcing *pptr to be a mapped address for all allocations, this limits the number of mapped accessible objects you can have.
This is why Vulkan has VkDeviceMemory objects and Map/Unmap APIs.
Documentation here: https://spec.oneapi.com/level-zero/latest/sysman/api.html#_CPPv418zes_engine_group_t
It lists the following defines:
However, they're missing from the latest L0 header: https://github.com/oneapi-src/level-zero/blob/master/include/zes_api.h#L1086
(And the values of several defines differ from the documentation.)
P.S. The link to the L0 specification at the end of README.md gives a 404. The spec link in the sidebar works.
Line 111 in 0d30b1f
If one uses the provided build instructions, the following error occurs:
CMake Error at CMakeLists.txt:77 (message):
OpenCL headers required for interop APIs
It is clear after looking at the CMake file that the path to the headers must be provided in the following way:
cmake -DOpenCL_INCLUDE_DIR=[path] ..
But shouldn't this be mentioned in the instructions, so that users can build without problems on the first attempt and without reading the CMake files?
The spec doesn't seem to say this explicitly, but kernel objects are currently defined such that they should be generated from modules per command-recording thread.
I wonder if it would be better to add a way to just record a kernel to a command list, setting the group size and inputs for a dispatch directly. This seems more thread-safe and would allow sharing kernel objects across threads if all you wanted to do was record a command stream with different inputs or group sizes.
Otherwise you'd have to create a new kernel object per command stream; I suppose it depends on whether the API usage envisages kernel objects being light to construct or something that triggers a compiler backend pass.
Hi,
Could you please put the libze_loader.so symlink in the .deb package as well?
Currently only versioned files are installed, e.g. libze_loader.so.0.91.
Thank you.
BR,
Anton
I am getting error code 0x78000004 after trying to create a SPIR-V kernel with the wrong name. The error is not listed here:
https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=zecommandlistclose#zekernelcreate
This error code corresponds to an invalid argument:
ZE_RESULT_ERROR_INVALID_ARGUMENT = 0x78000004
However, following the documentation, it should be
ZE_RESULT_ERROR_INVALID_KERNEL_NAME = 0x78000011
I am getting this for SPIR-V modules with OpenCL compute.
Some details about the environment:
Level Zero version: 284ccb089184180e34864a9f1e23971d3d736bd8
Platform: Intel(R) HD Graphics 630 [0x591b]
OpenCL version: OpenCL C 3.0
Driver: 21.12.19358
Hello
I am porting hwloc to L0 for exposing the locality of GPU devices. Things worked fine with API 0.91 but porting to 1.0 doesn't look good.
I am listing devices with zeDriverGet() and zeDeviceGet() and then calling zesDevicePciGetProperties() and zesDeviceGetProperties() after casting into zes_device_handle_t. As documented in 1.0 doc, this only works if ZES_ENABLE_SYSMAN=1 in the environment. This looks like a terrible idea.
Requiring a third-party library to always modify the environment of a program isn't nice. It's OK for enabling some debugging in a library from time to time, but not for enabling a mandatory feature. Passing a flag to zeInit() would be better.
This won't work if multiple layers use L0 at the same time: an application using L0 (most likely a runtime system managing tasks) will call zeInit() without ZES_ENABLE_SYSMAN (because it doesn't need it). It will then call hwloc to get the locality of some ze devices (to select the local one, etc.). hwloc will fail because sysman wasn't enabled in L0 (it will actually segfault in the current implementation). hwloc could set the env var and call its own zeInit() too, but it's likely too late anyway. Given that all drivers/devices are shared between all these layers (which is good), the behavior of L0 will basically change depending on which layer calls zeInit() first; that's unpredictable and bad.
0.95 had zetSysmanGet() for converting between ze and zet devices. 1.0 should just do the same: have an explicit function to enable sysman and "cast" a ze device handle to zes. An env var or init flags won't work, just like multiple MPI_Init() calls don't work in MPI.
Currently, the documentation of zeModuleCreate specifies that nullptr == desc->pConstants is an error, and the validation layer checks this. But the documentation of ze_module_desc_t specifies that pConstants must be set to nullptr if no specialization constants are provided.
Edit: The validation of zeModuleCreate also checks:
if( 1 <= desc->format )
return ZE_RESULT_ERROR_INVALID_ENUMERATION;
But there exist two possible formats:
ZE_MODULE_FORMAT_SPIRV (0)
ZE_MODULE_FORMAT_NATIVE (1)
pkg-config is the standard interface for declaring a project's version, include paths, and libraries: https://en.wikipedia.org/wiki/Pkg-config
It's used by most open source projects, and all build systems support using it to check these details. I think level-zero should also install a level-zero.pc file with this info.
Is it possible for Level Zero to support Broadwell GPUs?
I got a segfault when running a program, as shown below.
Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff6573b94 in L0::Kernel::create(unsigned int, L0::Module*, _ze_kernel_desc_t const*, _ze_result_t*) ()
from /usr/local/lib/libze_intel_gpu.so.0.8
#2 0x00007ffff6575cf6 in L0::ModuleImp::createKernel(_ze_kernel_desc_t const*, _ze_kernel_handle_t**) ()
from /usr/local/lib/libze_intel_gpu.so.0.8
#3 0x00007ffff6568207 in L0::BuiltinFunctionsLibImpl::loadBuiltIn(unsigned int, char const*) ()
from /usr/local/lib/libze_intel_gpu.so.0.8
#4 0x00007ffff656832b in L0::BuiltinFunctionsLibImpl::initFunctions() () from /usr/local/lib/libze_intel_gpu.so.0.8
#5 0x00007ffff656bb9d in L0::Device::create(L0::DriverHandle*, NEO::Device*) () from /usr/local/lib/libze_intel_gpu.so.0.8
#6 0x00007ffff656cc0b in L0::DriverHandleImp::initialize(std::vector<std::unique_ptr<NEO::Device, std::default_delete<NEO::Device> >, std::allocator<std::unique_ptr<NEO::Device, std::default_delete<NEO::Device> > > >) () from /usr/local/lib/libze_intel_gpu.so.0.8
#7 0x00007ffff656cd9b in L0::DriverHandle::create(std::vector<std::unique_ptr<NEO::Device, std::default_delete<NEO::Device> >, std::allocator<std::unique_ptr<NEO::Device, std::default_delete<NEO::Device> > > >) () from /usr/local/lib/libze_intel_gpu.so.0.8
#8 0x00007ffff656da0a in L0::DriverImp::initialize(bool*) () from /usr/local/lib/libze_intel_gpu.so.0.8
#9 0x00007ffff656d60e in ?? () from /usr/local/lib/libze_intel_gpu.so.0.8
#10 0x00007ffff621c827 in __pthread_once_slow () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007ffff656d8f0 in L0::init(_ze_init_flag_t) () from /usr/local/lib/libze_intel_gpu.so.0.8
#12 0x00007ffff7b2df6a in zeInit () from /usr/local/lib/libze_loader.so.0.91
#13 0x0000555555557285 in main ()
To create a Module, you need to pass a module description. (link)
typedef struct _ze_module_desc_t
{
ze_module_desc_version_t version;
ze_module_format_t format;
size_t inputSize; ///< [in] size of input IL or ISA from pInputModule.
const uint8_t* pInputModule; ///< [in] pointer to IL or ISA
const char* pBuildFlags;
const ze_module_constants_t* pConstants;
} ze_module_desc_t;
I'm having trouble determining the 3rd and 4th fields (inputSize and pInputModule). Could someone provide code samples of ze_module_desc_t for both the "ZE_MODULE_FORMAT_IL_SPIRV" and "ZE_MODULE_FORMAT_NATIVE" formats?
Hello,
The library part of libze_loader.so loads itself:
level-zero/source/lib/ze_lib.cpp
Line 32 in 2fb87e7
This makes sense if the library is built as a standalone one that needs to open the loader library.
This is harmful for the loader itself, as it can be made to use external symbols for its initialization. The simplest example I can think of is:
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
typedef int (*zeInit_t)(int);
int main() {
    const char *s = getenv("ZELIBPATH");
    if (!s) {
        printf("No path given.\n");
        exit(1);
    }
    void *handle = dlopen(s, RTLD_LAZY | RTLD_LOCAL);
    if (!handle) {
        printf("Wrong path given.\n");
        exit(1);
    }
    zeInit_t pf = (zeInit_t)(intptr_t)dlsym(handle, "zeInit");
    if (!pf) {
        printf("zeInit not found.\n");
        exit(1);
    }
    int res = pf(0);
    if (res != 0)
        printf("Error.\n");
    else
        printf("Success.\n");
    return 0;
}
Compiling the sample and invoking it (with two different ze_loader installations) as:
LD_PRELOAD=/path/to/first/libze_loader.so ZELIBPATH=/path/to/second/libze_loader.so a.out
will cause a segfault, because the second ze_loader will try to use the first ze_loader's symbols to initialize itself.
I think the loader shouldn't try to open itself, and should use a different mechanism depending on whether the library is standalone or used inside a loader.
I hope I did a good enough job of explaining the issue.
Thanks,
Brice
On Linux, the hardcoded library names include the major and minor library version, i.e.
MAKE_LIBRARY_NAME( "ze_intel_gpu", "0.4")
results in libze_intel_gpu.so.0.4.
Being hardcoded, this breaks loading the driver when its major or minor version changes (currently it is libze_intel_gpu.so.0.8).
Until a proper mechanism for enumerating the drivers is introduced, would it make sense to drop the version number from the library name on Linux too, as is already the case on Windows?
Playing a bit with the thread scheduler, I noticed that the following call throws an exception when setting a wrong block size:
uint32_t groupSizeX = 32u;
uint32_t groupSizeY = 32u;
uint32_t groupSizeZ = 1u;
zeKernelSetGroupSize(kernel, groupSizeX, groupSizeY, groupSizeZ);
The exception is as follows:
Error at zeKernelSetGroupSize(kernel, groupSizeX, groupSizeY, groupSizeZ): main: 161
terminate called without an active exception
Aborted (core dumped)
From my understanding, following the spec (https://spec.oneapi.com/level-zero/latest/core/api.html#zekernelsetgroupsize), we should get the error code ZE_RESULT_ERROR_INVALID_GROUP_SIZE_DIMENSION.
Is this the expected behaviour?
I am using level zero: ae10adc
compute-runtime: 21.20.19883
From what I understand of the source code of both the loader and the driver, tracer callbacks get driver handles rather than loader handles (for Level Zero objects). This raises a couple of issues:
Am I correct?
Does this also mean that I cannot mix DDI API code with loader API code, as the objects would be different (unless I force loader interception, in which case the DDI becomes moot)?
I tried to write a simple tool that I can load with LD_PRELOAD (in the spirit of MPI tools, for example, or OpenCL interposers).
It seems to initialize OK, but it doesn't get called.
One problem is that I can't get to the device handles that the runtime initializes; however, the level0 spec seems to imply that I should be able to open the device again.
Code attached. Is this hopeless? It seems like there should be a way to have third-party tools invoked.
This isn't really an issue, it's more of an FYI to level0 developers.
In the piglit open source testing project we have a bunch of CL program tests. These tests are individual files describing a test, run by a test execution binary. The tests cover a bunch of CL builtins and other things. I recently added CL SPIR-V support to the testing binary, which uses clang + llvm-spirv to generate SPIR-V from the CL C tests.
Following on from that I've gotten basic level0 execution working from the same files.
https://gitlab.freedesktop.org/airlied/piglit/-/tree/lvl0
I've only tested this against my own level0 implementation, not the Intel one, and I'm not even 100% confident it's working at all, but some tests do appear to pass.
This tool might be useful for writing quick tests also that just use CL C to exercise some corner cases. It doesn't have image or half support yet, and a few other things are likely broken, but I thought I'd mention it in case it was useful.
I'm not sure why level 0 invents a new set of image parameters; it probably makes more sense to just import the Vulkan image format list and use that as a baseline so you can remain compatible.
No other API does a layout/type/swizzle as the format without having a name and definition for users.
I don't see any clear advantage in diverging in this area.
loader::zeCommandQueueExecuteCommandLists "spoils" the input command list handles.
ze_ldrddi.cpp:818 - 819
I'm quite confused, because now I do not understand how it has been working all this time...
Normally a user has a local variable ze_command_list_handle_t cmd_list and passes it to zeCommandQueueExecuteCommandLists by address.
It's quite unexpected that its value changes after the call. And I guess it becomes unusable for further API calls like Destroy, Reset, etc.?
Is it just that this implementation is not used in the default case?
I got to this code from a machine with igfx + dgfx running 2 different drivers.
While level 0 has fences and events it doesn't appear to have any semaphore support.
Events can be a messy solution from a scheduler point of view, as they pretty much block the GPU command processing hard at a point in time; some GPUs can reschedule other command streams, some can't. So while useful, events can be a bit of a large hammer to hit a system with from a user application. Also, GPU-selected events invariably mean the CPU side is spinning on some memory location polling the event, again not really a great thing to have to depend on.
Vulkan recently added timeline semaphores and perhaps there should be some consideration of adding them to Level 0 as a better scheduling mechanism.
ze_device_kernel_properties_t kernel_properties;
kernel_properties.version = ZE_DEVICE_KERNEL_PROPERTIES_VERSION_CURRENT;
kernel_properties.dp4aSupported = 100;
zeDeviceGetKernelProperties(allDevices[d], &kernel_properties);
printf("dp4aSupported: %i\n", kernel_properties.dp4aSupported);
This retains the 100 value. Am I supposed to zero-initialize these structs? Also, is the use of the version field correct here?
Following the spec:
https://spec.oneapi.com/level-zero/latest/core/INTRO.html
the Level Zero spec also targets spatial architectures.
However, looking at the zeInit function (https://github.com/oneapi-src/level-zero/blob/master/include/ze_api.h#L567), the current implementation only supports GPUs:
typedef enum _ze_init_flag_t
{
ZE_INIT_FLAG_GPU_ONLY = ZE_BIT(0), ///< only initialize GPU drivers
ZE_INIT_FLAG_FORCE_UINT32 = 0x7fffffff
} ze_init_flag_t;
Are there any plans to include FPGAs?
Related to FPGAs: if I want Level Zero and SPIR-V, I guess I would need at least OpenCL 2.1? However, I see that the latest Quartus still implements OpenCL 1.0. Any plans on this front?
Thanks
Juan
The zesFanGetConfig callback is intended to return the fan configuration, and if the fan is configured to work with a table, the speedTable field should contain an array of temperature/speed pairs, not a single one.
In the current form of the spec, the speedTable field in zes_fan_config_t is described as "A table containing temperature/speed pairs", but that's obviously not true, because the type used, zes_fan_temp_speed_t, represents only a single temperature/speed pair; it's not an array.
To match the description, speedTable should at least literally be an array, i.e. its type should change from zes_fan_temp_speed_t to zes_fan_temp_speed_t[]. A more appropriate data structure would contain not only the array of temperature/speed pairs but also how many of them are defined, i.e. zes_fan_speed_table_t.
I've created a pull request to update the API (#44), but since the corresponding change should be made in the spec itself first, I'm going to close the PR and track this problem as an issue, as suggested by @bmyates.
Under source/wrapper/ there is what looks like a C++ wrapper on top of the Level Zero C API.
While it is built into a set of libraries (libze_api_cpp.a and libzet_api_cpp.a), there doesn't seem to be any public interface for it.
In the spec, it seems data transfers using the function zeCommandListAppendMemoryCopy are not blocking. Is there any variant for blocking calls? Or is there any equivalent to the OpenCL call clEnqueue{Read/Write}Buffer with CL_TRUE to indicate a blocking call?
Similar to the kernel launch, is there any way to specify a blocking call?
What I have found is to close the command list and then launch all pending commands within the list after each data transfer or kernel launch, for example by running the following sequence:
zeCommandListAppendMemoryCopy( .. )
zeCommandListClose( .. )
zeCommandQueueExecuteCommandLists ( .. )
zeCommandQueueSynchronize ( ..)
zeCommandListReset ( .. )
but is there any other way to get blocking calls?