Clspv is a compiler for OpenCL C to Vulkan compute shaders
License: Apache License 2.0
When creating a compute shader pipeline, you can only supply a single entry point. We can reduce the "pressure" on descriptor set counts by reusing the same descriptor set number for all kernels in a compilation unit. (It would be 0 if no sampler map, and 1 if there is a sampler map.)
Add a command line option to keep the old behaviour.
If operands 1 and 2 are vectors then condition operand must be a vector with the same number of elements.
The compiler is currently generating OpSelect with a scalar bool condition operand. This is passed through from LLVM.
OpenCL C doesn't have vector-of-bool. But the SPIR-V code generation for cast operations tries to handle that case. In that case it would generate invalid code because the constants it selects from are scalars.
The check for vector-of-bool looks like a red herring: I suspect it never fires.
A simple source file like:
__kernel void testv16()
{
float16 f;
}
Produces the following error message:
$ ./build/bin/clspv vec16.cl
vec16.cl:3:9: error: expected ';' after expression
float16 f;
^
;
vec16.cl:3:2: error: use of undeclared identifier 'float16'
float16 f;
^
vec16.cl:3:10: error: use of undeclared identifier 'f'
float16 f;
^
Unsurprising, given that the SPIR-V Vector16 capability currently depends on the Kernel capability.
Before someone makes a comment on that, I did notice it was a documented restriction. Do people find issues useful to track lifting these restrictions?
Hi,
Reading the "OpenCL C 1.2 Language on Vulkan" document, I see:
"if the short/ushort types are used in the OpenCL C: shaderInt16 is required"
That's sad, since http://vulkan.gpuinfo.org reports that shaderInt16 isn't supported by the Nvidia GPU drivers, nor by the AMD Windows and RADV Linux drivers, nor by the Intel Mesa drivers. In fact the only drivers that have it are Intel's Windows Vulkan drivers.
I also see the new VK_KHR_16bit_storage extension, which allows "use of 16-bit types in shader input and output interfaces", so it seems this is also required when using int16 in OpenCL kernel arguments.
So my questions are:
* Is VK_KHR_16bit_storage also required, in addition to shaderInt16, for kernels with int16 arguments?
* Can the shaderInt16 restriction be avoided by requiring only VK_KHR_16bit_storage for kernels with int16 variables/arguments, treating them internally as int32 for operations but storing them as 16 bits via VK_KHR_16bit_storage?
Thanks.
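The scheme the question proposes can be sketched in plain C: do all arithmetic at 32-bit width and narrow only at the storage boundary (the role VK_KHR_16bit_storage would play on the Vulkan side, with shaderInt16 needed only for true 16-bit ALU operations). This is an illustration of the idea, not clspv's actual behaviour; the helper names are made up.

```c
#include <stdint.h>

/* Load from 16-bit storage; widening to 32 bits happens at the use site. */
static int16_t load_i16(const int16_t *buf, int i) {
    return buf[i];                    /* 16-bit storage access */
}

/* Store back to 16-bit storage, truncating the 32-bit working value. */
static void store_i16(int16_t *buf, int i, int32_t v) {
    buf[i] = (int16_t)v;              /* narrow only on store */
}

/* All arithmetic is done in 32-bit registers, as the proposal suggests. */
int32_t add_shorts(int16_t *buf, int i, int j) {
    int32_t a = (int32_t)load_i16(buf, i);   /* widen to int32 */
    int32_t b = (int32_t)load_i16(buf, j);
    int32_t sum = a + b;                     /* 32-bit operation */
    store_i16(buf, i, sum);                  /* stored as 16 bits */
    return sum;
}
```

Note that the 32-bit result and the stored 16-bit value can differ once the sum overflows the short range, which is one reason the emulation question is not trivial.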
Emitted code uses the StorageBuffer storage class, but there is no OpExtension "SPV_KHR_storage_buffer_storage_class" instruction. We need to generate one.
Any chance we could set up some CI for this? I have some notion to submit a PR that would allow building against an installed LLVM (rather than the versions we currently track), and at least that way we could run it on Travis CI / AppVeyor.
Happy to submit the patches to do this, but I think I'd need @dneto0 to enable CI on Travis/AppVeyor for this repo.
Example test/IntegerBuiltins/int2_clz.cl produces this (fragment):
%14 = OpFunction %void None %9
%15 = OpLabel
%16 = OpAccessChain %_ptr_StorageBuffer_v2uint %12 %uint_0 %uint_0
%17 = OpAccessChain %_ptr_StorageBuffer_v2uint %13 %uint_0 %uint_0
%18 = OpLoad %v2uint %17
%19 = OpExtInst %v2uint %1 FindUMsb %18
%20 = OpISub %v2uint %uint_31 %19 ; Scalar mixed with vector
OpStore %16 %20
OpReturn
The problem is that the ISub operates on a scalar and a vector.
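The fragment above computes clz(x) as 31 - FindUMsb(x); for a vector input the 31 must be splatted into a per-lane vector constant, which is what the generated code fails to do. A small C sketch of the scalar identity and its lane-wise vector form (find_umsb is a stand-in for the GLSL.std.450 FindUMsb instruction):

```c
#include <stdint.h>

/* Index of the most significant set bit of x (stand-in for FindUMsb). */
static uint32_t find_umsb(uint32_t x) {
    int pos = -1;
    while (x) { pos++; x >>= 1; }
    return (uint32_t)pos;
}

/* clz(x) = 31 - msb_index(x): the identity the OpISub/FindUMsb pair uses. */
uint32_t clz32(uint32_t x) {
    return 31u - find_umsb(x);
}

/* The vector form applies the same identity lane-wise; the "31" must be
 * applied per lane, whereas the bug is that clspv subtracts the scalar
 * constant %uint_31 from a whole vector in one OpISub. */
void clz_v2(const uint32_t in[2], uint32_t out[2]) {
    out[0] = 31u - find_umsb(in[0]);
    out[1] = 31u - find_umsb(in[1]);
}
```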
Example from test/char4_insert.cl
kernel void foo(global uchar4* A, int n) {
*A = (uchar4)(1,2,(uchar)n,4);
}
Currently produces:
%27 = OpLoad %uint %26
%28 = OpUConvert %uint %27 ; This is a no-op.
%29 = OpLoad %uint %25
%30 = OpShiftLeftLogical %uint %uint_255 %uint_8
%31 = OpNot %uint %30
%32 = OpBitwiseAnd %uint %29 %31
%33 = OpShiftLeftLogical %uint %28 %uint_8
%34 = OpBitwiseOr %uint %32 %33
OpStore %25 %34
The UConvert to produce %28 is a no-op. The i8 LLVM type maps to a 32-bit SPIR-V int. The LLVM truncation should become an "and" with 255.
This is somewhat by design, since we haven't touched the Clang front-end.
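Because i8 values are carried in 32-bit words, an LLVM `trunc i32 -> i8` cannot be a 32-to-32-bit UConvert (a no-op); its effect on the carrier word is a mask. A hedged C sketch of the masking and of the byte-insert sequence from the listing above:

```c
#include <stdint.h>

/* Model of LLVM `trunc i32 -> i8` when i8 lives in a 32-bit word:
 * keep the word 32 bits wide, but clear everything above bit 7.
 * This is the "and with 255" the issue asks for. */
uint32_t trunc_to_u8(uint32_t x) {
    return x & 0xFFu;
}

/* Insert byte b at byte position idx of packed word w, mirroring the
 * shift/and/or sequence in the generated code above. */
uint32_t insert_byte(uint32_t w, uint32_t b, uint32_t idx) {
    uint32_t mask = 0xFFu << (8u * idx);          /* lane being replaced */
    return (w & ~mask) | (trunc_to_u8(b) << (8u * idx));
}
```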
Example input:
kernel void foo(const global float4* IN, global float4* OUT) {
uchar4 in4 = *((global const uchar4*)IN);
float4 result;
result.x = in4.x;
result.y = in4.y;
result.z = in4.z;
result.w = in4.w;
*OUT = result;
}
A few CL files from the ARM Compute Library exhibit the following issue:
%26 = bitcast i8 addrspace(1)* %call52 to i16 addrspace(1)*
Handle above user of scalar bitcast with gep on ReplacePointerBitcastPass
UNREACHABLE executed at clspv/lib/ReplacePointerBitcastPass.cpp:834!
Aborted (core dumped)
Here's a couple that show the problem (just passing the file to clspv):
Example:
void foo(global uint* A, int n) {
A[n] = n;
}
void bar(global uchar4 * A, int n) {
A[n].x = (uchar)n;
}
kernel void goth(global uint *A, global uchar4 *B, int n) {
foo(A, n);
bar(B, n);
}
Produces two pointer-to-storage-buffer-of-runtime-array-of-32-bit-int types, each decorated with ArrayStride 4.
e.g.
OpDecorate %_ptr_StorageBuffer_uint ArrayStride 4
OpDecorate %_ptr_StorageBuffer_uint_0 ArrayStride 4
...
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint
...
%_ptr_StorageBuffer_uint_0 = OpTypePointer StorageBuffer %uint
This is slightly wasteful and could confuse older versions of the validator. (See KhronosGroup/SPIRV-Tools#783)
It would be an enhancement if we could avoid generating two different pointer types in this case.
Compiler generates invalid code for:
kernel void foo(global float* A, float x, float y) {
*A = fmod(x,y);
}
Can't disassemble. I get:
%22 = OpFunction %void None %10
%23 = OpLabel
%24 = OpAccessChain %_ptr_StorageBuffer_float %19 %uint_0 %uint_0
%25 = OpAccessChain %_ptr_StorageBuffer_float %20 %uint_0
%26 = OpLoad %float %25
%27 = OpAccessChain %_ptr_StorageBuffer_float %21 %uint_0
%28 = OpLoad %float %27
error: 212: Id is 0
...
E.g. SKIP_CLSPV_TOOLS_INSTALL
Recommend use of CMAKE_BUILD_TYPE=Release? (It links much faster.)
Two test cases:
This one is broken and generates only one OpStore (for one float):
__kernel void myTest(__global float* jasper)
{
*jasper++ = 0;
*jasper++ = 0;
*jasper++ = 0;
*jasper++ = 0;
}
; SPIR-V
; Version: 1.0
; Generator: Codeplay; 0
; Bound: 22
; Schema: 0
OpCapability Shader
OpCapability VariablePointers
OpExtension "SPV_KHR_storage_buffer_storage_class"
OpExtension "SPV_KHR_variable_pointers"
OpMemoryModel Logical GLSL450
OpEntryPoint GLCompute %19 "myTest"
OpSource OpenCL_C 120
OpDecorate %13 SpecId 0
OpDecorate %14 SpecId 1
OpDecorate %15 SpecId 2
OpDecorate %_runtimearr_float ArrayStride 4
OpMemberDecorate %_struct_4 0 Offset 0
OpDecorate %_struct_4 Block
OpDecorate %gl_WorkGroupSize BuiltIn WorkgroupSize
OpDecorate %18 DescriptorSet 0
OpDecorate %18 Binding 0
%float = OpTypeFloat 32
%_ptr_StorageBuffer_float = OpTypePointer StorageBuffer %float
%_runtimearr_float = OpTypeRuntimeArray %float
%_struct_4 = OpTypeStruct %_runtimearr_float
%_ptr_StorageBuffer__struct_4 = OpTypePointer StorageBuffer %_struct_4
%uint = OpTypeInt 32 0
%void = OpTypeVoid
%8 = OpTypeFunction %void
%v3uint = OpTypeVector %uint 3
%_ptr_Private_v3uint = OpTypePointer Private %v3uint
%uint_0 = OpConstant %uint 0
%float_0 = OpConstant %float 0
%13 = OpSpecConstant %uint 1
%14 = OpSpecConstant %uint 1
%15 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %13 %14 %15
%17 = OpVariable %_ptr_Private_v3uint Private %gl_WorkGroupSize
%18 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer
%19 = OpFunction %void None %8
%20 = OpLabel
%21 = OpAccessChain %_ptr_StorageBuffer_float %18 %uint_0 %uint_0
OpStore %21 %float_0
OpReturn
OpFunctionEnd
This one works and generates four OpStores, one for each value:
__kernel void myTest(__global float* jasper)
{
*jasper++ = 0;
*jasper++ = 1;
*jasper++ = 2;
*jasper++ = 3;
}
; SPIR-V
; Version: 1.0
; Generator: Codeplay; 0
; Bound: 31
; Schema: 0
OpCapability Shader
OpCapability VariablePointers
OpExtension "SPV_KHR_storage_buffer_storage_class"
OpExtension "SPV_KHR_variable_pointers"
OpMemoryModel Logical GLSL450
OpEntryPoint GLCompute %25 "myTest"
OpSource OpenCL_C 120
OpDecorate %19 SpecId 0
OpDecorate %20 SpecId 1
OpDecorate %21 SpecId 2
OpDecorate %_runtimearr_float ArrayStride 4
OpMemberDecorate %_struct_4 0 Offset 0
OpDecorate %_struct_4 Block
OpDecorate %gl_WorkGroupSize BuiltIn WorkgroupSize
OpDecorate %24 DescriptorSet 0
OpDecorate %24 Binding 0
%float = OpTypeFloat 32
%_ptr_StorageBuffer_float = OpTypePointer StorageBuffer %float
%_runtimearr_float = OpTypeRuntimeArray %float
%_struct_4 = OpTypeStruct %_runtimearr_float
%_ptr_StorageBuffer__struct_4 = OpTypePointer StorageBuffer %_struct_4
%uint = OpTypeInt 32 0
%void = OpTypeVoid
%8 = OpTypeFunction %void
%v3uint = OpTypeVector %uint 3
%_ptr_Private_v3uint = OpTypePointer Private %v3uint
%uint_0 = OpConstant %uint 0
%uint_1 = OpConstant %uint 1
%float_0 = OpConstant %float 0
%uint_2 = OpConstant %uint 2
%float_1 = OpConstant %float 1
%uint_3 = OpConstant %uint 3
%float_2 = OpConstant %float 2
%float_3 = OpConstant %float 3
%19 = OpSpecConstant %uint 1
%20 = OpSpecConstant %uint 1
%21 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %19 %20 %21
%23 = OpVariable %_ptr_Private_v3uint Private %gl_WorkGroupSize
%24 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer
%25 = OpFunction %void None %8
%26 = OpLabel
%27 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_0
%28 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_1
OpStore %27 %float_0
%29 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_2
OpStore %28 %float_1
%30 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_3
OpStore %29 %float_2
OpStore %30 %float_3
OpReturn
OpFunctionEnd
Example:
kernel void align(global int* A, int x, float4 c) {
*A = x + (int)c.x;
}
Compile with:
clspv align.cl -o align.spv -cluster-pod-kernel-args -descriptormap=foot
Then the x and c arguments are collected into a single struct argument, for a storage buffer. The descriptor map is:
kernel,align,arg,A,argOrdinal,0,descriptorSet,0,binding,0,offset,0
kernel,align,arg,x,argOrdinal,1,descriptorSet,0,binding,1,offset,0
kernel,align,arg,c,argOrdinal,2,descriptorSet,0,binding,1,offset,16
That's correct.
However, in SPIR-V we have:
OpMemberDecorate %_struct_8 0 Offset 0
OpMemberDecorate %_struct_8 1 Offset 4 ; this should be offset 16
So, two bugs really: the descriptor map and the declared struct offset disagree, and the declared struct offset is the wrong one. The declared struct offset should be made the same as what's reported in the descriptor map.
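The offset 16 in the descriptor map follows from the standard buffer layout rules: a 4-component vector of 32-bit floats has 16-byte alignment, so after the 4-byte int it must start at offset 16, not 4. A C sketch of the clustered argument struct (names are illustrative):

```c
#include <stdalign.h>
#include <stddef.h>

/* Stand-in for the OpenCL float4 type: four 32-bit floats. */
typedef struct {
    float v[4];
} float4_pod;

/* Mirror of the clustered POD-argument struct { int x; float4 c; }.
 * Under the layout rules used for storage buffers a float4 member is
 * 16-byte aligned, so c lands at offset 16 -- matching the descriptor
 * map, not the "Offset 4" decoration the compiler emitted. */
typedef struct {
    int x;                        /* offset 0 */
    alignas(16) float4_pod c;     /* offset 16 after alignment padding */
} clustered_args;
```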
A few CL files from the ARM Compute Library exhibit the following issue:
$ ./build/bin/clspv ComputeLibrary/src/core/CL/cl_kernels/convolution3x3.cl
clspv: clspv/third_party/clang/lib/Sema/Sema.cpp:309: clang::Sema::~Sema(): Assertion `DelayedTypos.empty() && "Uncorrected typos!"' failed.
Aborted (core dumped)
Here's a couple that show the problem (just passing the file to clspv):
Would you accept patches to reach full OpenCL C compliance? Or is that a clear non-goal of the project?
You say
"If images are used in the OpenCL C:
The shaderStorageImageReadWithoutFormat field of VkPhysicalDeviceFeatures must be set to true.
The shaderStorageImageWriteWithoutFormat field of VkPhysicalDeviceFeatures must be set to true.
"
Also, vulkan.gpuinfo.org reports the situation is better than for int16 support: at least modern NV and AMD GPUs support both requirements. But Intel iGPUs don't support shaderStorageImageReadWithoutFormat in either the Windows or the Linux driver.
Since OpenCL has __read_only and __write_only image access qualifiers, maybe we can require only shaderStorageImageReadWithoutFormat for __read_only images and shaderStorageImageWriteWithoutFormat for __write_only images. Is that correct?
That way a kernel with only write-only image access should work on Intel GPUs, right?
Thanks.
This currently causes an ICE as follows:
%1 = bitcast float addrspace(1)* %A to i8 addrspace(1)*
Handle above user of scalar bitcast with gep on ReplacePointerBitcastPass
UNREACHABLE executed at /build/dneto/aglet/SPIRV-LLVM/lib/ReplacePointerBitcastPass.cpp:834!
void kernel __attribute__((reqd_work_group_size(1, 1, 1)))
memmove_example(global float *A, int n, int k) {
float src[20];
for (int i = 0; i < 20; i++) {
src[i] = A[k+i];
}
for (int i = 0; i < 20; i++) {
A[n+i] = src[i];
}
}
Intermediate translation is:
; Function Attrs: norecurse nounwind
define spir_kernel void @src_is_array(float addrspace(1)* nocapture %A, i32 %n, i32 %k) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 !reqd_work_group_size !7 {
entry:
%0 = mul i32 %k, 4
%1 = bitcast float addrspace(1)* %A to i8 addrspace(1)*
%2 = lshr i32 %0, 2
%3 = getelementptr i8, i8 addrspace(1)* %1, i32 %0
%4 = mul i32 %n, 4
%5 = bitcast float addrspace(1)* %A to i8 addrspace(1)*
%6 = getelementptr i8, i8 addrspace(1)* %5, i32 %4
call void @llvm.memmove.p1i8.p1i8.i32(i8 addrspace(1)* %6, i8 addrspace(1)* %3, i32 80, i32 4, i1 false)
ret void
}
The test should go into test/LLVMIntrinsics alongside the memcpy tests
Should have auto-cleanup behaviour like Clang does.
Occurs when compiling this code:
typedef struct {
int a, b;
} S;
S convert(int n) {
S s = { n, n};
return s;
}
kernel void foo(global S* A, global uchar4* B, int n) {
*B = (uchar4)((uchar)n,1,2,3);
*A = convert(10);
}
Does not occur when S gains two extra int members, and the convert function is adjusted accordingly.
See test/HalfStorage/vstorehalf_pointer_cast_to_short.cl
It stores through a pointer-to-global-short. The resulting module should have an OpExtension "SPV_KHR_16bit_storage" instruction, but it does not.
That's only allowed when indexing into StorageBuffer or Workgroup storage class.
See test/ProgramScopeConstants/constant_array_with_function_call.cl:
constant uint b[4] = { 42, 13, 0, 15 };
uint bar(constant uint* a) { return a[get_local_id(0)]; }
void kernel __attribute__((reqd_work_group_size(4, 1, 1))) foo(global uint* a) {
*a = bar(b);
}
Will include the following in the body of bar():
%29 = OpPtrAccessChain %_ptr_Private_uint %25 %28
Neil Henning says:
So the issue is that:
* UniformConstant storage class for constant variables
* OpPtrAccessChain when a non-kernel function is indexing into a pointer as if it is an array
I think the only safe thing to do here is: if you see a program-scope constant variable being passed into a sub function, then inline that sub function.
Might be similar issue with StorageImageWriteWithoutFormat
Vulkan says:
The Image Format of an OpTypeImage declaration must not be Unknown, for variables which are used for OpImageRead or OpImageWrite operations, except under the following conditions:
...
For OpImageRead, if the shaderStorageImageReadWithoutFormat feature is enabled and the shader module declares the StorageImageReadWithoutFormat capability.
But clspv will add the capability even if only doing OpImageSampleExplicitLod. E.g. see tests/ImageBuiltins/read_imagef_sampler_float2.cl
It seems to be adding the capability when just mentioning the read image types.
The simplifying assumption is also reflected in the documentation.
A few CL files from the ARM Compute Library exhibit the following issue:
$ ./build/bin/clspv -DDATA_TYPE=float ComputeLibrary/src/core/CL/cl_kernels/convolution_layer.cl
strange type on bitcast
UNREACHABLE executed at clspv/lib/ReplacePointerBitcastPass.cpp:752!
Aborted (core dumped)
Here's a couple that show the problem:
-DDATA_TYPE=float
The problem is a duplicate enum in a case table, due to token aliasing
include/clspv/spirv_c_strings.hpp: In function ‘const char* spv::getCapabilityName(spv::Capability)’:
include/clspv/spirv_c_strings.hpp:410:5: error: duplicate case value
case spv::CapabilityShaderViewportIndexLayerNV: return "ShaderViewportIndexLayerNV";
^
include/clspv/spirv_c_strings.hpp:409:5: error: previously used here
case spv::CapabilityShaderViewportIndexLayerEXT: return "ShaderViewportIndexLayerEXT";
^
ninja: build stopped: subcommand failed.
Currently, WorkGroupSize has a private scope variable in case the application dynamically extracts the workgroup size, but it seems more direct to use dynamic vector extraction.
EDIT: Removed mention of interoperability issues, missed that the spec constant had WorkGroupSize declared as a builtin.
Code generated from Convolution Layer case in #11 fails validation.
info: 172: Duplicate non-aggregate type declarations are not allowed. Opcode: 21
error: 1251: The Object type (OpTypePointer) in OpCompositeInsert does not match the type that results from indexing into the Composite (OpTypeRuntimeArray).
This might actually be two distinct issues. I've seen the duplicate non-aggregate type decl issue before, and will file that separately. This issue will be for the type mismatch.
For storage buffers, we're using StorageBuffer storage class, but decorating those objects with BufferBlock.
We're supposed to use Block decorations instead.
See SPV_KHR_storage_buffer_storage_class and VK_KHR_storage_buffer_storage_class.
https://www.khronos.org/registry/vulkan/specs/1.0-extensions/html/vkspec.html#VK_KHR_storage_buffer_storage_class
Vulkan targets SPIR-V 1.0, and OpSelect in 1.0 requires the two data operands to be scalar or vector types, not pointer types.
Example input:
void kernel __attribute__((reqd_work_group_size(1, 1, 1))) foo(global float2* a, global float2* b, int c)
{
*a = c ? b[0] : b[1];
}
Generates:
%20 = OpLabel
%21 = OpAccessChain %_ptr_StorageBuffer_v2float %16 %uint_0 %uint_0
%22 = OpAccessChain %_ptr_StorageBuffer_v2float %17 %uint_0 %uint_0
%23 = OpAccessChain %_ptr_StorageBuffer_uint %18 %uint_0
%24 = OpLoad %uint %23
%25 = OpIEqual %bool %24 %uint_0
%26 = OpAccessChain %_ptr_StorageBuffer_v2float %17 %uint_0 %uint_1
%27 = OpSelect %_ptr_StorageBuffer_v2float %25 %26 %22
%28 = OpLoad %v2float %27
OpStore %21 %28
OpReturn
The problem is that %27 selects between two pointers
This is forked from a comment on #2.
The problem occurs when doing a somewhat simple selection between two loads from the Private storage class (or __constant in OpenCL C). The inst-combine optimization changes the select of the two loaded values into a single load through a selected pointer.
In the current compilation scheme the two pointers are in Private storage class, and the compiler incorrectly has the result of the selection being a pointer to StorageBuffer storage class.
Case is as below:
This is the symptom of compiling code like:
__constant float kFirst[3] = {1.0f, 2.0f, 3.0f};
__constant float kSecond[3] = {10.0f, 11.0f, 12.0f};
kernel void foo(global float*A, int c, int i) {
*A = c==0 ? kFirst[i] : kSecond[i];
}
Produces this kind of code:
%33 = OpVariable %_ptr_Private__arr_float_uint_3 Private %23
%34 = OpVariable %_ptr_Private__arr_float_uint_3 Private %27
%35 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer
%36 = OpVariable %_ptr_StorageBuffer__struct_7 StorageBuffer
%37 = OpVariable %_ptr_StorageBuffer__struct_7 StorageBuffer
%38 = OpFunction %void None %11
%39 = OpLabel
%40 = OpAccessChain %_ptr_StorageBuffer_float %35 %uint_0 %uint_0
%41 = OpAccessChain %_ptr_StorageBuffer_uint %36 %uint_0
%42 = OpLoad %uint %41
%43 = OpAccessChain %_ptr_StorageBuffer_uint %37 %uint_0
%44 = OpLoad %uint %43
%45 = OpIEqual %bool %42 %uint_0
%46 = OpSelect %_ptr_StorageBuffer__arr_float_uint_3 %45 %33 %34
%47 = OpAccessChain %_ptr_StorageBuffer_float %46 %44
%48 = OpLoad %float %47
OpStore %40 %48
OpReturn
OpFunctionEnd
Note the OpSelect at %46. Its operands are pointers into Private, but its result is pointer to StorageBuffer. That's invalid. We need the initializers for %33 and %34 but even with VariablePointers we can't select between two different pointer-to-Private values.
See also other cases from comments:
Another effective workaround is to use -O0 to disable the inst-combine optimization. But that's rather drastic.
Example:
kernel void foo(global uchar4* A, int n) {
*A = (uchar4)((uchar)n,1,2,3);
}
Generates code like this:
%27 = OpFunction %void None %10
%28 = OpLabel
%29 = OpAccessChain %_ptr_StorageBuffer_uint %25 %uint_0 %uint_0
%30 = OpAccessChain %_ptr_StorageBuffer_uint_0 %26 %uint_0
%31 = OpLoad %uint %30
%32 = OpUConvert %uint %31
%33 = OpShiftLeftLogical %uint %uint_255 %uint_0
%34 = OpLogicalNot %uint %33 ; THIS IS WRONG
%35 = OpBitwiseAnd %uint %uint_66051 %34
%36 = OpShiftLeftLogical %uint %32 %uint_0
%37 = OpBitwiseOr %uint %35 %36
OpStore %29 %33
OpReturn
When accumulating the bits, this:
IntValue = (IntValue << Idx) | Val;
Should probably be:
IntValue = (IntValue << 8 * Idx) | Val;
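For reference, here is one way to pack four uchar lanes into a single 32-bit word, little-endian, which is the result the corrected accumulation should produce (an illustration of the intended packing, not the compiler's exact loop):

```c
#include <stdint.h>

/* Pack four uchar lanes into one 32-bit word: lane i occupies bits
 * [8*i, 8*i+7].  The bug in the issue is a shift by Idx where a shift
 * by 8*Idx (a whole byte position) is needed. */
uint32_t pack_uchar4(const uint8_t lanes[4]) {
    uint32_t word = 0;
    for (uint32_t idx = 0; idx < 4; idx++) {
        word |= (uint32_t)lanes[idx] << (8u * idx);
    }
    return word;
}
```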
Clean up the hacky parts of #59 when we can prove the ecosystem has caught up.
Here the "align" kernel has a reqd_work_group_size. So it gets a LocalSize execution mode. It also suppresses generation of spec IDs for the components of a work-group-size vector.
However, the "boo" kernel is generated without an associated LocalSize execution mode. This is an error. The most sane thing to do here is to emit a LocalSize of 1 1 1, and document it.
kernel void __attribute__((reqd_work_group_size(12,2,3))) align(global int* A, int x, float4 c) {
*A = x + (int)c.x;
}
kernel void boo(global int* A, int x, float4 c) {
*A = x + (int)c.x;
}
There's an unfortunate problem in the Vulkan environment spec: a specialization value for workgroup size is free-floating, and you can't tell which compute shaders it should affect. So the proposed defaulting is about as good as we can do.
That's invalid by the SPIR-V spec. The validator briefly had that rule turned on, but some bad Vulkan CTS tests have duplicate scalar type decls that are hard to rewrite, so we turned that rule off.
Example input:
kernel void foo(global uchar4* A) {
*A = (uchar4)(1,2,3,4);
}
Appears to be caused by SPIRVProducerPass::GenerateSPIRVTypes where it says "i8 is added to TypeMap as i32."
The issue is that the Types collection (returned by getTypeList) has entries for i8 and i32. But the i8 will be converted at the last second to an i32. So we end up with duplicate OpTypeInt 32 0.
Here's a fragment of the generated assembly:
%1 = OpTypeInt 32 0 ; the first one
%2 = OpTypePointer StorageBuffer %1
%3 = OpTypeRuntimeArray %1
%4 = OpTypeStruct %3
%5 = OpTypePointer StorageBuffer %4
%6 = OpTypeInt 32 0 ; oops
%7 = OpTypeVoid
%8 = OpTypeFunction %7
%9 = OpTypeVector %6 3
Example input:
void kernel foo(global float4* A)
{
float4 value;
value.w = 1111.0f;
*A = value;
}
Produces assembly including this:
%13 = OpUndef %float
%float_1111 = OpConstant %float 1111
%15 = OpConstantComposite %v4float %13 %13 %13 %float_1111
%16 = OpSpecConstant %uint 1
%17 = OpSpecConstant %uint 1
%18 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %16 %17 %18
%20 = OpVariable %_ptr_Private_v3uint Private %gl_WorkGroupSize
%21 = OpVariable %_ptr_StorageBuffer__struct_5 StorageBuffer
%22 = OpFunction %void None %9
%23 = OpLabel
%24 = OpAccessChain %_ptr_StorageBuffer_v4float %21 %uint_0 %uint_0
OpStore %24 %15
OpReturn
OpFunctionEnd
Some drivers have trouble with the OpConstantComposite. They happen to work if we use
%13 = OpConstantNull %float
instead.
kernel void foo_fixed(global uint* A, global char4* B) {
char4 val = *B;
A[0] = val.x;
A[1] = val.y;
A[2] = val.z;
A[3] = val.w;
}
Generates, in part, this code:
%34 = OpShiftRightLogical %uint %33 %uint_0
%35 = OpBitwiseAnd %uint %34 %uint_255
%36 = OpSConvert %uint %34 ; This signed conversion is not enough
OpStore %31 %36
The bug is that the LLVM i8 is converted to a SPIR-V uint, and so the SConvert is from a uint to a uint. This does not do the sign extension that LLVM expects will be implicitly done as part of the conversion.
The clspv compiler needs to insert more instructions to replicate the sign bit from position 7 into all the remaining bits.
This is not documented in the language mapping doc.
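Replicating the sign bit of an i8 carried in a 32-bit word can be done explicitly. A hedged C sketch (the portable cast chain below is the C equivalent; on the SPIR-V side this could be, for example, a shift left by 24 followed by an arithmetic shift right by 24):

```c
#include <stdint.h>

/* Sign-extend an i8 held in the low byte of a 32-bit word: copy bit 7
 * into bits 8..31.  A 32-to-32-bit SConvert, as in the listing above,
 * does nothing. */
uint32_t sext_i8_in_u32(uint32_t w) {
    /* Narrow to int8_t (takes the low byte, reinterprets bit 7 as the
     * sign), then widen back to 32 bits, replicating the sign bit. */
    return (uint32_t)(int32_t)(int8_t)(uint8_t)(w & 0xFFu);
}
```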
Supporting 3-element vectors in replace-pointer-bitcasts is complicated because remapping a 3-element vector onto a pointer-to-4-element-vector can span elements of the target array.
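The spanning problem can be made concrete: a float3 at logical index i covers 32-bit words 3i..3i+2, while a float4-backed buffer groups words in runs of four, so a single access can straddle two float4 elements. A small sketch of the check:

```c
#include <stdbool.h>
#include <stdint.h>

/* For a float3 view over a float4-backed buffer: vec3 element i covers
 * 32-bit words 3*i .. 3*i+2; the underlying vec4 array groups words in
 * runs of 4.  When the first and last word fall in different vec4s, no
 * single vec4 load/store covers the access. */
bool vec3_straddles_vec4(uint32_t i) {
    uint32_t first_word = 3u * i;
    uint32_t last_word  = 3u * i + 2u;
    return (first_word / 4u) != (last_word / 4u);
}
```

For example, vec3 element 0 fits inside the first vec4, but element 1 (words 3..5) straddles the first and second vec4, which is exactly what makes the remapping awkward.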