
clspv's Introduction


Clspv is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders.

It consists of:

  • A set of LLVM Module passes to transform a dialect of LLVM IR into a SPIR-V module containing Vulkan compute shaders.
  • A command line compiler tool called 'clspv' to compile a subset of OpenCL C into a Vulkan compute shader.

Clspv depends on external projects at specific commits: Clang, LLVM, SPIRV-Headers, and SPIRV-Tools. See the Build section below for how to fetch them.

Legal

Clspv is licensed under the terms of the Apache 2.0 license. The AUTHORS file lists the copyright owners, while individual credit is given in the CONTRIBUTORS file. To contribute, see CONTRIBUTING.md.

Materials in projects Clspv depends on are licensed under their own terms.

Clspv is not an official Google product.

Status

The compiler is an incomplete prototype, with many rough edges.

The input language is a subset of OpenCL C version 1.2. The OpenCL C on Vulkan Specification describes the specific subset, and also the mapping into Vulkan compute shaders.

Examples

Compile a set of kernels into a SPIR-V binary module:

clspv foo.cl -o foo.spv

Emit the binary as a C initializer list, for easy embedding of a shader in a C or C++ program source:

clspv -mfmt=c foo.cl -o -
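
For illustration, here is one way the emitted initializer list might be consumed from host C code. This is a hedged sketch: the file name foo_spirv.inc and the array name are made up, and you should check whether the emitted text already includes the surrounding braces before wiring it up exactly like this.

#include <stdint.h>
#include <stddef.h>

/* foo_spirv.inc is assumed to hold the initializer list produced by
 * `clspv -mfmt=c foo.cl -o foo_spirv.inc`. */
static const uint32_t foo_spirv[] =
#include "foo_spirv.inc"
    ;

/* Size in bytes, e.g. for VkShaderModuleCreateInfo::codeSize. */
static const size_t foo_spirv_size = sizeof(foo_spirv);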

Predefine some preprocessor symbols:

clspv -DWIDTH=32 -DHEIGHT=64 foo.cl -o foo.spv
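
For example, a kernel might consume those symbols like this (the kernel itself is illustrative, not part of clspv):

// WIDTH and HEIGHT are supplied by the -D options above.
kernel void copy_row(global const float* src, global float* dst, int row) {
  if (row < HEIGHT) {
    for (int x = 0; x < WIDTH; ++x) {
      dst[x] = src[row * WIDTH + x];
    }
  }
}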

Use OpenCL compiler options:

clspv -cl-fast-relaxed-math -cl-single-precision-constant foo.cl -o foo.spv

Show help:

clspv -help

Compiler-explorer

Compiler Explorer is an interactive compiler exploration website. Edit code in C, C++, C#, F#, Rust, Go, D, Haskell, Swift, Pascal, ispc, Python, Java or in any of the other 30+ supported languages, and see how that code looks after being compiled in real time. Multiple compilers are supported for each language, many different tools and visualisations are available, and the UI layout is configurable (thanks to GoldenLayout).

-- https://github.com/compiler-explorer/compiler-explorer#readme

clspv is available in compiler-explorer among the OpenCL C compilers.

It is built daily by the compiler-explorer infrastructure.

Build

Tools

You will need:

  • CMake
  • Python3
  • A C++ compiler
  • git

Getting sources for dependencies

Clspv depends on the sources for other projects, at specific commits. Run the following command to download those dependencies, and place them in the third_party directory:

python3 utils/fetch_sources.py

Building

Then, create a build directory:

cd <clspv-dir>
mkdir build
cd build

Then configure and build the code:

cmake <clspv-dir>
cmake --build .

This will build the clspv command line compiler and place it at bin/clspv under the build directory.

Using Ninja to build, and other build options

We recommend you use the Ninja build tool if it's available. To do so, replace the last two commands with:

cmake -G Ninja <clspv-dir>
ninja

Other useful configuration options (the first cmake command):

  • -DCMAKE_BUILD_TYPE=RelWithDebInfo : Build in release mode, with debugging information. Default is a debug build.

See the CMake documentation for more generic options.

Test

To run the test suite from within the build directory:

cmake --build . --target check-spirv

Or if you are using Ninja:

ninja check-spirv

Builtin Libraries

Clspv includes two LLVM IR libraries (cmake/clspv--.bc, cmake/clspv64--.bc) containing implementations of some OpenCL builtin functions, for the spir and spir64 targets respectively. These files are distributed under the LLVM license (included in LICENSE) as they are generated from the LLVM sub-project libclc. Refer to the source for the relevant copyrights.

Rebuilding the Library

Required Tools

You will need a pre-built version of LLVM for your system, CMake, and (optionally) Ninja.

Build

To rebuild the library run the following commands:

cmake -GNinja <libclc dir> -DLIBCLC_TARGETS_TO_BUILD="clspv--;clspv64--" -DLLVM_CMAKE_DIR=</path/to/llvm_installation/lib/cmake/>
ninja

Copy the resulting clspv--.bc and clspv64--.bc files into the cmake/ directory and rebuild clspv.


clspv's Issues

Truncation from i32 to i8 should keep only the bottom 8 bits

Example from test/char4_insert.cl

kernel void foo(global uchar4* A, int n) {
 *A = (uchar4)(1,2,(uchar)n,4);
}

Currently produces:

         %27 = OpLoad %uint %26
         %28 = OpUConvert %uint %27  ;  This is a no-op.
         %29 = OpLoad %uint %25
         %30 = OpShiftLeftLogical %uint %uint_255 %uint_8
         %31 = OpNot %uint %30
         %32 = OpBitwiseAnd %uint %29 %31
         %33 = OpShiftLeftLogical %uint %28 %uint_8
         %34 = OpBitwiseOr %uint %32 %33
               OpStore %25 %34

The UConvert to produce %28 is a no-op. The i8 LLVM type maps to a 32-bit SPIR-V int. The LLVM truncation should become an "and" with 255.
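
Expressed in C on the 32-bit representation, the intended lowering is simply a mask (this is a sketch of the semantics, not the exact SPIR-V clspv should emit):

// An i8 value carried in a 32-bit word: truncation keeps only the low byte.
uint32_t trunc_i32_to_i8(uint32_t n) {
  return n & 0xFFu;  // OpBitwiseAnd with 255; a 32->32 OpUConvert does nothing
}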

Writing all zeros to a pointer generates incorrect code

Two test cases:

This one is broken and generates only one OpStore (for one float):

__kernel void myTest(__global float* jasper)
{
  *jasper++ = 0;
  *jasper++ = 0;
  *jasper++ = 0;
  *jasper++ = 0;
}

; SPIR-V
; Version: 1.0
; Generator: Codeplay; 0
; Bound: 22
; Schema: 0
               OpCapability Shader
               OpCapability VariablePointers
               OpExtension "SPV_KHR_storage_buffer_storage_class"
               OpExtension "SPV_KHR_variable_pointers"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %19 "myTest"
               OpSource OpenCL_C 120
               OpDecorate %13 SpecId 0
               OpDecorate %14 SpecId 1
               OpDecorate %15 SpecId 2
               OpDecorate %_runtimearr_float ArrayStride 4
               OpMemberDecorate %_struct_4 0 Offset 0
               OpDecorate %_struct_4 Block
               OpDecorate %gl_WorkGroupSize BuiltIn WorkgroupSize
               OpDecorate %18 DescriptorSet 0
               OpDecorate %18 Binding 0
      %float = OpTypeFloat 32
%_ptr_StorageBuffer_float = OpTypePointer StorageBuffer %float
%_runtimearr_float = OpTypeRuntimeArray %float
  %_struct_4 = OpTypeStruct %_runtimearr_float
%_ptr_StorageBuffer__struct_4 = OpTypePointer StorageBuffer %_struct_4
       %uint = OpTypeInt 32 0
       %void = OpTypeVoid
          %8 = OpTypeFunction %void
     %v3uint = OpTypeVector %uint 3
%_ptr_Private_v3uint = OpTypePointer Private %v3uint
     %uint_0 = OpConstant %uint 0
    %float_0 = OpConstant %float 0
         %13 = OpSpecConstant %uint 1
         %14 = OpSpecConstant %uint 1
         %15 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %13 %14 %15
         %17 = OpVariable %_ptr_Private_v3uint Private %gl_WorkGroupSize
         %18 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer
         %19 = OpFunction %void None %8
         %20 = OpLabel
         %21 = OpAccessChain %_ptr_StorageBuffer_float %18 %uint_0 %uint_0
               OpStore %21 %float_0
               OpReturn
               OpFunctionEnd

This one works and generates 4 OpStores, one for each value:

__kernel void myTest(__global float* jasper)
{
  *jasper++ = 0;
  *jasper++ = 1;
  *jasper++ = 2;
  *jasper++ = 3;
}

; SPIR-V
; Version: 1.0
; Generator: Codeplay; 0
; Bound: 31
; Schema: 0
               OpCapability Shader
               OpCapability VariablePointers
               OpExtension "SPV_KHR_storage_buffer_storage_class"
               OpExtension "SPV_KHR_variable_pointers"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %25 "myTest"
               OpSource OpenCL_C 120
               OpDecorate %19 SpecId 0
               OpDecorate %20 SpecId 1
               OpDecorate %21 SpecId 2
               OpDecorate %_runtimearr_float ArrayStride 4
               OpMemberDecorate %_struct_4 0 Offset 0
               OpDecorate %_struct_4 Block
               OpDecorate %gl_WorkGroupSize BuiltIn WorkgroupSize
               OpDecorate %24 DescriptorSet 0
               OpDecorate %24 Binding 0
      %float = OpTypeFloat 32
%_ptr_StorageBuffer_float = OpTypePointer StorageBuffer %float
%_runtimearr_float = OpTypeRuntimeArray %float
  %_struct_4 = OpTypeStruct %_runtimearr_float
%_ptr_StorageBuffer__struct_4 = OpTypePointer StorageBuffer %_struct_4
       %uint = OpTypeInt 32 0
       %void = OpTypeVoid
          %8 = OpTypeFunction %void
     %v3uint = OpTypeVector %uint 3
%_ptr_Private_v3uint = OpTypePointer Private %v3uint
     %uint_0 = OpConstant %uint 0
     %uint_1 = OpConstant %uint 1
    %float_0 = OpConstant %float 0
     %uint_2 = OpConstant %uint 2
    %float_1 = OpConstant %float 1
     %uint_3 = OpConstant %uint 3
    %float_2 = OpConstant %float 2
    %float_3 = OpConstant %float 3
         %19 = OpSpecConstant %uint 1
         %20 = OpSpecConstant %uint 1
         %21 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %19 %20 %21
         %23 = OpVariable %_ptr_Private_v3uint Private %gl_WorkGroupSize
         %24 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer
         %25 = OpFunction %void None %8
         %26 = OpLabel
         %27 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_0
         %28 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_1
               OpStore %27 %float_0
         %29 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_2
               OpStore %28 %float_1
         %30 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_3
               OpStore %29 %float_2
               OpStore %30 %float_3
               OpReturn
               OpFunctionEnd

Generated code fails validation: The Object type (OpTypePointer) in OpCompositeInsert does not match the type that results from indexing into the Composite (OpTypeRuntimeArray).

Code generated from Convolution Layer case in #11 fails validation.

info: 172: Duplicate non-aggregate type declarations are not allowed. Opcode: 21
error: 1251: The Object type (OpTypePointer) in OpCompositeInsert does not match the type that results from indexing into the Composite (OpTypeRuntimeArray).

This might actually be two distinct issues. I've seen the duplicate non-aggregate type decl issue before, and will file that separately. This issue will be for the type mismatch.

compiler crashes with floating point exception

Occurs when compiling this code:

typedef struct {
  int a, b;
} S;

S convert(int n) {
  S s = { n, n};
  return s;
}

kernel void foo(global S* A, global uchar4* B, int n) {
 *B = (uchar4)((uchar)n,1,2,3);
 *A = convert(10);
}

Does not occur when S gains two extra int members, and the convert function is adjusted accordingly.

Some drivers don't like constants constructed with OpUndef scalar components

Example input:

void kernel foo(global float4* A)
{
  float4 value;
  value.w = 1111.0f;
  *A = value;
}

Produces assembly including this:

         %13 = OpUndef %float
 %float_1111 = OpConstant %float 1111
         %15 = OpConstantComposite %v4float %13 %13 %13 %float_1111
         %16 = OpSpecConstant %uint 1
         %17 = OpSpecConstant %uint 1
         %18 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %16 %17 %18
         %20 = OpVariable %_ptr_Private_v3uint Private %gl_WorkGroupSize
         %21 = OpVariable %_ptr_StorageBuffer__struct_5 StorageBuffer
         %22 = OpFunction %void None %9
         %23 = OpLabel
         %24 = OpAccessChain %_ptr_StorageBuffer_v4float %21 %uint_0 %uint_0
               OpStore %24 %15
               OpReturn
               OpFunctionEnd

Some drivers have trouble with the OpConstantComposite. They happen to work if we use
%13 = OpConstantNull %float
instead.

About int16 requirements

Hi,

The "OpenCL C 1.2 Language on Vulkan" document says:
"if the short/ushort types are used in the OpenCL C: shaderInt16 is required"

That's unfortunate, because http://vulkan.gpuinfo.org reports that shaderInt16 isn't supported by the Nvidia drivers, by the AMD Windows and RADV Linux drivers, or by the Intel Mesa drivers.
In fact the only drivers that report it are Intel's Windows Vulkan drivers.

I also see the new VK_KHR_16bit_storage extension, which allows "use of 16-bit types in shader input and output interfaces",
so it seems that extension is also required when int16 is used in OpenCL kernel arguments.

So the questions are:

  • Is VK_KHR_16bit_storage required in addition to shaderInt16 for kernels with int16 arguments?

  • Can the shaderInt16 requirement be avoided by requiring only VK_KHR_16bit_storage for kernels with int16 variables/arguments, treating them internally as int32 for operations but storing them as 16 bits via VK_KHR_16bit_storage?

thanks

Invalid OpLogicalNot when masking <4 x i8> values

Example:

kernel void foo(global uchar4* A, int n) {
 *A = (uchar4)((uchar)n,1,2,3);
}

Generates code like this:

         %27 = OpFunction %void None %10
         %28 = OpLabel
         %29 = OpAccessChain %_ptr_StorageBuffer_uint %25 %uint_0 %uint_0
         %30 = OpAccessChain %_ptr_StorageBuffer_uint_0 %26 %uint_0
         %31 = OpLoad %uint %30
         %32 = OpUConvert %uint %31
         %33 = OpShiftLeftLogical %uint %uint_255 %uint_0
         %34 = OpLogicalNot %uint %33   ; THIS IS WRONG
         %35 = OpBitwiseAnd %uint %uint_66051 %34
         %36 = OpShiftLeftLogical %uint %32 %uint_0
         %37 = OpBitwiseOr %uint %35 %36
               OpStore %29 %33
               OpReturn

Signed char extraction fails to take sign into account

kernel void foo_fixed(global uint* A, global char4* B) {
 char4 val = *B;
 A[0] = val.x;
 A[1] = val.y;
 A[2] = val.z;
 A[3] = val.w;
}

Generates, in part, this code:

         %34 = OpShiftRightLogical %uint %33 %uint_0
         %35 = OpBitwiseAnd %uint %34 %uint_255
         %36 = OpSConvert %uint %34    ; This signed conversion is not enough
               OpStore %31 %36

The bug is that the LLVM i8 is converted to a SPIR-V uint, and so the SConvert is from a uint to a uint. This does not do the sign extension that LLVM expects will be implicitly done as part of the conversion.

The clspv compiler needs to insert more instructions to replicate the sign bit from position 7 to all the remainder.
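
Two standard ways to do that on a byte carried in a 32-bit word, sketched in C (the exact instruction sequence clspv should emit is not pinned down here):

#include <stdint.h>

// Shift the byte to the top, then arithmetic-shift back down.
int32_t sext_byte_shift(uint32_t w) {
  return ((int32_t)(w << 24)) >> 24;
}

// Or: flip the sign bit of the masked byte and subtract its weight.
int32_t sext_byte_xor(uint32_t w) {
  return (int32_t)(((w & 0xFFu) ^ 0x80u) - 0x80u);
}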

implement fmod

Compiler generates invalid code for:

kernel void foo(global float* A, float x, float y) {
  *A = fmod(x,y);
}

Can't disassemble. I get:

         %22 = OpFunction %void None %10
         %23 = OpLabel
         %24 = OpAccessChain %_ptr_StorageBuffer_float %19 %uint_0 %uint_0
         %25 = OpAccessChain %_ptr_StorageBuffer_float %20 %uint_0
         %26 = OpLoad %float %25
         %27 = OpAccessChain %_ptr_StorageBuffer_float %21 %uint_0
         %28 = OpLoad %float %27
error: 212: Id is 0
...

vector form of clz builtin produces invalid code

Example test/IntegerBuiltins/int2_clz.cl produces this (fragment):

         %14 = OpFunction %void None %9
         %15 = OpLabel
         %16 = OpAccessChain %_ptr_StorageBuffer_v2uint %12 %uint_0 %uint_0
         %17 = OpAccessChain %_ptr_StorageBuffer_v2uint %13 %uint_0 %uint_0
         %18 = OpLoad %v2uint %17
         %19 = OpExtInst %v2uint %1 FindUMsb %18
         %20 = OpISub %v2uint %uint_31 %19 ;   Scalar mixed with vector
               OpStore %16 %20
               OpReturn

The problem is that the ISub operates on a scalar and a vector.
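
In other words, the subtraction has to be done per component, with the 31 splatted to a vector. A C sketch of the intended per-component semantics (assuming non-zero inputs; __builtin_clz stands in for FindUMsb here):

#include <stdint.h>

void clz_v2(const uint32_t in[2], uint32_t out[2]) {
  for (int i = 0; i < 2; ++i) {
    uint32_t msb = 31u - (uint32_t)__builtin_clz(in[i]);  // FindUMsb equivalent
    out[i] = 31u - msb;  // in SPIR-V, the 31 must be a v2uint constant, not a scalar
  }
}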

By default, reuse same descriptor set number for all kernels in a compilation unit

When creating a compute shader pipeline, you can only supply a single entry point. We can reduce the "pressure" on descriptor set counts by reusing the same descriptor set number for all kernels in a compilation unit. (It would be 0 if no sampler map, and 1 if there is a sampler map.)

Add a command line option to keep the old behaviour.

Use of uchar4 generates second copy of OpTypeInt 32 0

That's invalid by the SPIR-V spec. The validator briefly had that rule turned on, but some bad Vulkan CTS tests have duplicate scalar type decls that are hard to rewrite, so we turned that rule off.

Example input:

kernel void foo(global uchar4* A) {
  *A = (uchar4)(1,2,3,4);
}

Appears to be caused by SPIRVProducerPass::GenerateSPIRVTypes where it says "i8 is added to TypeMap as i32."
The issue is that the Types collection (returned by getTypeList) has entries for i8 and i32. But the i8 will be converted at the last second to an i32. So we end up with duplicate OpTypeInt 32 0.

Here's a fragment of the generated assembly:

%1 =    OpTypeInt 32 0    ; the first one
%2 =    OpTypePointer StorageBuffer %1
%3 =    OpTypeRuntimeArray %1
%4 =    OpTypeStruct %3
%5 =    OpTypePointer StorageBuffer %4
%6 =    OpTypeInt 32 0 ;   oops
%7 =    OpTypeVoid
%8 =    OpTypeFunction %7
%9 =    OpTypeVector %6 3

dead code in SPIRVProducer: no such thing as bool vector?

OpenCL C doesn't have vector-of-bool. But the SPIR-V code generation for cast operations tries to handle that case. In that case it would generate invalid code because the constants it selects from are scalars.
The check for vector-of-bool looks like a red herring: I suspect it never fires.

8/16 element vectors unsupported

A simple source file like:

__kernel void testv16()
{
	float16 f;
}

Produces the following error message:

$ ./build/bin/clspv vec16.cl 
vec16.cl:3:9: error: expected ';' after expression
        float16 f;
               ^
               ;
vec16.cl:3:2: error: use of undeclared identifier 'float16'
        float16 f;
        ^
vec16.cl:3:10: error: use of undeclared identifier 'f'
        float16 f;
                ^

Unsurprising given that the SPIR-V Vector16 capability currently depends on the Kernel capability.

Before someone comments on that: I did notice it is a documented restriction. Are issues like this useful for tracking the lifting of such restrictions?

Translate LLVM's memmove

This currently causes an internal error, as follows:

%1 = bitcast float addrspace(1)* %A to i8 addrspace(1)*
Handle above user of scalar bitcast with gep on ReplacePointerBitcastPass
UNREACHABLE executed at /build/dneto/aglet/SPIRV-LLVM/lib/ReplacePointerBitcastPass.cpp:834!

void kernel __attribute__((reqd_work_group_size(1, 1, 1)))
memmove_example(global float *A, int n, int k) {
  float src[20];
  for (int i = 0; i < 20; i++) {
    src[i] = A[k+i];
  }
  for (int i = 0; i < 20; i++) {
    A[n+i] = src[i];
  }
}
The intermediate translation is:

; Function Attrs: norecurse nounwind
define spir_kernel void @src_is_array(float addrspace(1)* nocapture %A, i32 %n, i32 %k) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 !reqd_work_group_size !7 {
entry:
  %0 = mul i32 %k, 4
  %1 = bitcast float addrspace(1)* %A to i8 addrspace(1)*
  %2 = lshr i32 %0, 2
  %3 = getelementptr i8, i8 addrspace(1)* %1, i32 %0
  %4 = mul i32 %n, 4
  %5 = bitcast float addrspace(1)* %A to i8 addrspace(1)*
  %6 = getelementptr i8, i8 addrspace(1)* %5, i32 %4
  call void @llvm.memmove.p1i8.p1i8.i32(i8 addrspace(1)* %6, i8 addrspace(1)* %3, i32 80, i32 4, i1 false)
  ret void
}

The test should go into test/LLVMIntrinsics alongside the memcpy tests.

Request for finer-grained image requirements

You say:

"If images are used in the OpenCL C:
The shaderStorageImageReadWithoutFormat field of VkPhysicalDeviceFeatures must be set to true.
The shaderStorageImageWriteWithoutFormat field of VkPhysicalDeviceFeatures must be set to true.
"

According to vulkan.gpuinfo.org the situation is better than for int16 support: at least modern NVIDIA and AMD GPUs support both requirements.
However, Intel iGPUs do not support shaderStorageImageReadWithoutFormat, with either the Windows or the Linux driver.

Since OpenCL has __read_only and __write_only image access qualifiers, maybe clspv could require only shaderStorageImageReadWithoutFormat for __read_only images and only shaderStorageImageWriteWithoutFormat for __write_only images. Is that correct?

That way a kernel that only has write-only image access should work on Intel GPUs, right?

thanks

Document CMake settings

E.g. SKIP_CLSPV_TOOLS_INSTALL

Recommend use of CMAKE_BUILD_TYPE=Release? (It links much faster.)

kernel lacks LocalSize execution mode if another kernel has reqd_work_group_size

Here the "align" kernel has a reqd_work_group_size. So it gets a LocalSize execution mode. It also suppresses generation of spec IDs for the components of a work-group-size vector.
However, the "boo" kernel is generated without an associated LocalSize execution model. This is an error. The most sane thing to do here is to emit a LocalSize of 1 1 1, and document it.

kernel void __attribute__((reqd_work_group_size(12,2,3))) align(global int* A, int x, float4 c) {
  *A = x + (int)c.x;
}

kernel void  boo(global int* A, int x, float4 c) {
  *A = x + (int)c.x;
}

There's an unfortunate problem in the Vulkan environment spec: a specialization value for the workgroup size is free-floating, and you can't tell which compute shaders it should affect. So the proposed defaulting is about as good as we can do.

OpPtrAccessChain into Private storage class sometimes generated when indexing into OpenCL module-scope constant

That's only allowed when indexing into StorageBuffer or Workgroup storage class.

See test/ProgramScopeConstants/constant_array_with_function_call.cl:

constant uint b[4] = { 42, 13, 0, 15 };

uint bar(constant uint* a) { return a[get_local_id(0)]; }

void kernel __attribute__((reqd_work_group_size(4, 1, 1))) foo(global uint* a) {
  *a = bar(b);
}

Will include the following in the body of bar():

%29 = OpPtrAccessChain %_ptr_Private_uint %25 %28

Neil Henning says:
So the issue is that:

  • program-scope constant variable and constant function argument variables are the same address space in OpenCL
  • we also conflate global and constant address spaces when we transform OpenCL C -> Vulkan SPIR-V (because we can't use, for instance, the UniformConstant storage class for constant variables)
  • we always generate OpPtrAccessChain when a non-kernel function is indexing into a pointer as if it is an array

I think the only safe thing to do here is if you see a program-scope constant variable being passed into a sub function, then inline that sub function.

Fails to build with latest SPIRV-Headers

The problem is a duplicate enum in a case table, due to token aliasing

include/clspv/spirv_c_strings.hpp: In function ‘const char* spv::getCapabilityName(spv::Capability)’:
include/clspv/spirv_c_strings.hpp:410:5: error: duplicate case value
     case spv::CapabilityShaderViewportIndexLayerNV: return "ShaderViewportIndexLayerNV"; 
     ^
include/clspv/spirv_c_strings.hpp:409:5: error: previously used here
     case spv::CapabilityShaderViewportIndexLayerEXT: return "ShaderViewportIndexLayerEXT"; 
     ^
ninja: build stopped: subcommand failed.

compiler adding StorageImageReadWithoutFormat when not using OpImageRead

Might be similar issue with StorageImageWriteWithoutFormat

Vulkan says:

The Image Format of an OpTypeImage declaration must not be Unknown, for variables which are used for OpImageRead or OpImageWrite operations, except under the following conditions:

...

For OpImageRead, if the shaderStorageImageReadWithoutFormat feature is enabled and the shader module declares the StorageImageReadWithoutFormat capability.

But clspv will add the capability even if only doing OpImageSampleExplicitLod. E.g. see tests/ImageBuiltins/read_imagef_sampler_float2.cl

It seems to be adding the capability when just mentioning the read image types.

The simplifying assumption is also reflected in the documentation.

Expected both objects to be of Result Type: Select

This is forked from comment #2 (comment)

The problem occurs when doing a somewhat simple selection between two loads from Private storage class (or __constant in OpenCL C). The inst-combine optimization changes a selection between two loads into a selection between two pointers, followed by a load from that pointer.

In the current compilation scheme the two pointers are in Private storage class, and the compiler incorrectly has the result of the selection being a pointer to StorageBuffer storage class.

This is the symptom of compiling code like:

__constant float kFirst[3] = {1.0f, 2.0f, 3.0f};
__constant float kSecond[3] = {10.0f, 11.0f, 12.0f};

kernel void foo(global float*A, int c, int i) {
 *A = c==0 ? kFirst[i] : kSecond[i];
}

Produces this kind of code:

         %33 = OpVariable %_ptr_Private__arr_float_uint_3 Private %23
         %34 = OpVariable %_ptr_Private__arr_float_uint_3 Private %27
         %35 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer
         %36 = OpVariable %_ptr_StorageBuffer__struct_7 StorageBuffer
         %37 = OpVariable %_ptr_StorageBuffer__struct_7 StorageBuffer
         %38 = OpFunction %void None %11
         %39 = OpLabel
         %40 = OpAccessChain %_ptr_StorageBuffer_float %35 %uint_0 %uint_0
         %41 = OpAccessChain %_ptr_StorageBuffer_uint %36 %uint_0
         %42 = OpLoad %uint %41
         %43 = OpAccessChain %_ptr_StorageBuffer_uint %37 %uint_0
         %44 = OpLoad %uint %43
         %45 = OpIEqual %bool %42 %uint_0
         %46 = OpSelect %_ptr_StorageBuffer__arr_float_uint_3 %45 %33 %34
         %47 = OpAccessChain %_ptr_StorageBuffer_float %46 %44
         %48 = OpLoad %float %47
               OpStore %40 %48
               OpReturn
               OpFunctionEnd

Note the OpSelect at %46. Its operands are pointers into Private, but its result is a pointer to StorageBuffer. That's invalid. We need the initializers for %33 and %34, but even with VariablePointers we can't select between two different pointer-to-Private values.


See also other cases from comments:

  • #2 (comment)
    See workarounds:
  • rewrite the loads as function calls, to isolate them from instcombine (sketched below): #2 (comment)
  • rewrite the tables as two-dimensional: #2 (comment) (This is less general than wrapping in a function)

Another effective workaround is to use -O0 to disable the inst-combine optimization. But that's rather drastic.
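
A hedged OpenCL C sketch of the "rewrite the loads as function calls" workaround (the helper names are made up; the idea is to keep each load behind a call so the selection happens on float values rather than pointers):

__constant float kFirst[3] = {1.0f, 2.0f, 3.0f};
__constant float kSecond[3] = {10.0f, 11.0f, 12.0f};

float load_first(int i) { return kFirst[i]; }
float load_second(int i) { return kSecond[i]; }

kernel void foo(global float* A, int c, int i) {
  // The ternary now chooses between loaded values, not between pointers.
  *A = (c == 0) ? load_first(i) : load_second(i);
}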

generates OpSelect with operand 1 and 2 being pointers

Vulkan targets SPIR-V 1.0, and OpSelect in 1.0 requires the two data operands to be either scalar or vector. Not pointer type.

Example input:

void kernel __attribute__((reqd_work_group_size(1, 1, 1))) foo(global float2* a, global float2* b, int c)
{
  *a = c ? b[0] : b[1];
}

Generates:

         %20 = OpLabel
         %21 = OpAccessChain %_ptr_StorageBuffer_v2float %16 %uint_0 %uint_0
         %22 = OpAccessChain %_ptr_StorageBuffer_v2float %17 %uint_0 %uint_0
         %23 = OpAccessChain %_ptr_StorageBuffer_uint %18 %uint_0
         %24 = OpLoad %uint %23
         %25 = OpIEqual %bool %24 %uint_0
         %26 = OpAccessChain %_ptr_StorageBuffer_v2float %17 %uint_0 %uint_1
         %27 = OpSelect %_ptr_StorageBuffer_v2float %25 %26 %22
         %28 = OpLoad %v2float %27
               OpStore %21 %28
               OpReturn

The problem is that %27 selects between two pointers.
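
A source-level rewrite that avoids the pointer-typed OpSelect is to load both candidates first and then select between values (a sketch only; note it loads both elements unconditionally, unlike the original):

void kernel __attribute__((reqd_work_group_size(1, 1, 1)))
foo(global float2* a, global float2* b, int c) {
  float2 v0 = b[0];
  float2 v1 = b[1];
  *a = c ? v0 : v1;  // a value (vector) select is legal in SPIR-V 1.0
}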

full OpenCL C?

Would you accept patches to reach full OpenCL C compliance? Or is that a clear non-goal of the project?

vload_half3_* and vstore_half3_* are not supported

This is not documented in the language mapping doc.

Supporting 3-element vectors in the replace-pointer-bitcasts pass is complicated because remapping a 3-element vector onto a pointer to 4-element vectors can span elements in the target array.
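
To make the "can span elements" point concrete, here is a small C sketch of the offset arithmetic, assuming the usual half3 addressing of 6 bytes per element while the underlying buffer is handled as 8-byte (4-element) vectors:

#include <stdio.h>

int main(void) {
  for (int i = 0; i < 4; ++i) {
    int first_byte = 6 * i;      // half3 element i starts here
    int last_byte  = 6 * i + 5;  // and ends here
    printf("half3 #%d -> half4 elements %d..%d\n", i, first_byte / 8, last_byte / 8);
  }
  return 0;  // e.g. half3 #1 covers bytes 6..11, straddling half4 elements 0 and 1
}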

cluster-pod-kernel-args: generates too-small Offsets, but correct output in descriptormap

Example:

kernel void align(global int* A, int x, float4 c) {
  *A = x + (int)c.x;
}

Compile with:
clspv align.cl -o align.spv -cluster-pod-kernel-args -descriptormap=foot

Then the x and c arguments are collected into a single struct argument, for a storage buffer. The descriptor map is:

kernel,align,arg,A,argOrdinal,0,descriptorSet,0,binding,0,offset,0
kernel,align,arg,x,argOrdinal,1,descriptorSet,0,binding,1,offset,0
kernel,align,arg,c,argOrdinal,2,descriptorSet,0,binding,1,offset,16

That's correct.

However, in SPIR-V we have:
OpMemberDecorate %_struct_8 0 Offset 0
OpMemberDecorate %_struct_8 1 Offset 4 ; this should be offset 16

So two bugs, really: Mismatch between descriptormap and the declared struct offset. Should make the declared struct offset the same as what's reported in the descriptormap.
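
For reference, a C-level sketch of the layout the clustered POD arguments should get: a 4-element float vector requires 16-byte alignment, so c belongs at offset 16, which matches the descriptor map (the struct and type names here are illustrative, not what clspv emits):

#include <stdalign.h>
#include <stddef.h>

typedef struct { alignas(16) float v[4]; } float4_pod;  /* stand-in for float4 */

typedef struct {
  int x;         /* offset 0 */
  float4_pod c;  /* offset 16, after 12 bytes of padding */
} align_pod_args;

_Static_assert(offsetof(align_pod_args, c) == 16, "c must be at offset 16");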

Setup CI?

Any chance we could set up some CI for this? I have some notion to submit a PR that would allow you to build against an installed LLVM (rather than the versions we currently track), and at least that way we could run it on Travis CI / AppVeyor?

Happy to submit the patches to do this, but think I'd need @dneto0 to enable CI on travis/appveyor for this repo?

clspv/third_party/clang/lib/Sema/Sema.cpp:309: clang::Sema::~Sema(): Assertion `DelayedTypos.empty() && "Uncorrected typos!"' failed.

A few CL files from the ARM Compute Library exhibit the following issue:

$ ./build/bin/clspv ComputeLibrary/src/core/CL/cl_kernels/convolution3x3.cl 
clspv: clspv/third_party/clang/lib/Sema/Sema.cpp:309: clang::Sema::~Sema(): Assertion `DelayedTypos.empty() && "Uncorrected typos!"' failed.
Aborted (core dumped)

Here's a couple of files that show the problem (just passing the file to clspv):

-mfmt=c is broken

  • Should treat the extracted characters as unsigned before computing the resulting word to write (sketched below).
  • Output was truncated. Perhaps StringRef was referencing a stale binary buffer object?
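
On the first point, a C sketch of assembling an output word from bytes with the extracted characters treated as unsigned (byte order and helper name are illustrative):

#include <stdint.h>

static uint32_t pack_word(const char bytes[4]) {
  /* Casting each byte to unsigned char before widening prevents values >= 0x80
   * from sign-extending and corrupting the upper bits of the word. */
  return  (uint32_t)(unsigned char)bytes[0]
       | ((uint32_t)(unsigned char)bytes[1] << 8)
       | ((uint32_t)(unsigned char)bytes[2] << 16)
       | ((uint32_t)(unsigned char)bytes[3] << 24);
}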

Reduce duplication of pointer types

Example:

void foo(global uint* A, int n) {
  A[n] = n;
}

void bar(global uchar4 * A, int n) {
  A[n].x = (uchar)n;
}

kernel void goth(global uint *A, global uchar4 *B, int n) {
  foo(A, n);
  bar(B, n);
}

Produces two pointer-to-storage-buffer-of-runtime-array-of-32-bit-int types, each decorated with ArrayStride 4.
e.g.

               OpDecorate %_ptr_StorageBuffer_uint ArrayStride 4
               OpDecorate %_ptr_StorageBuffer_uint_0 ArrayStride 4
...
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint
...
%_ptr_StorageBuffer_uint_0 = OpTypePointer StorageBuffer %uint

This is slightly wasteful and could confuse older versions of the validator. (See KhronosGroup/SPIRV-Tools#783)

It would be an enhancement if we could avoid generating two different pointer types in this case.
