Clspv is a compiler for OpenCL C to Vulkan compute shaders
License: Apache License 2.0
When creating a compute shader pipeline, you can only supply a single entry point. We can reduce the "pressure" on descriptor set counts by reusing the same descriptor set number for all kernels in a compilation unit. (It would be 0 if no sampler map, and 1 if there is a sampler map.)
Add a command line option to keep the old behaviour.
If operands 1 and 2 are vectors then condition operand must be a vector with the same number of elements.
The compiler is currently generating OpSelect with a scalar bool condition operand. This is passed through from LLVM.
OpenCL C doesn't have vector-of-bool. But the SPIR-V code generation for cast operations tries to handle that case. In that case it would generate invalid code because the constants it selects from are scalars.
The check for vector-of-bool looks like a red herring: I suspect it never fires.
A simple source file like:
__kernel void testv16()
{
float16 f;
}
Produces the following error message:
$ ./build/bin/clspv vec16.cl
vec16.cl:3:9: error: expected ';' after expression
float16 f;
^
;
vec16.cl:3:2: error: use of undeclared identifier 'float16'
float16 f;
^
vec16.cl:3:10: error: use of undeclared identifier 'f'
float16 f;
^
Unsurprising, given that the SPIR-V Vector16 capability currently depends on the Kernel capability.
Before someone makes a comment on that, I did notice it was a documented restriction. Do people find issues useful to track lifting these restrictions?
Hi,
Reading the "OpenCL C 1.2 Language on Vulkan" document, I see:
"if the short/ushort types are used in the OpenCL C: shaderInt16 is required"
That's sad, since http://vulkan.gpuinfo.org reports that shaderInt16 isn't supported by the Nvidia GPU drivers, nor by the AMD Windows and RADV Linux drivers, nor by the Intel Mesa drivers. In fact the only drivers that have it are Intel's Windows Vulkan drivers.
I also see the new VK_KHR_16bit_storage extension, which allows "use of 16-bit types in shader input and output interfaces", so it seems this is also required when using int16 in OpenCL kernel arguments.
So my questions are:
* Is VK_KHR_16bit_storage also required, in addition to shaderInt16, for kernels with int16 arguments?
* Can the shaderInt16 restriction be avoided by requiring only VK_KHR_16bit_storage for kernels with int16 variables/arguments, treating them internally as int32 for operations but storing them as 16 bits via VK_KHR_16bit_storage?
Thanks.
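The scheme the question proposes can be sketched in plain C: do all arithmetic at 32-bit width and narrow only at the storage boundary (the role VK_KHR_16bit_storage would play on the Vulkan side, with shaderInt16 needed only for true 16-bit ALU operations). This is an illustration of the idea, not clspv's actual behaviour; the helper names are made up.

```c
#include <stdint.h>

/* Load from 16-bit storage; widening to 32 bits happens at the use site. */
static int16_t load_i16(const int16_t *buf, int i) {
    return buf[i];                    /* 16-bit storage access */
}

/* Store back to 16-bit storage, truncating the 32-bit working value. */
static void store_i16(int16_t *buf, int i, int32_t v) {
    buf[i] = (int16_t)v;              /* narrow only on store */
}

/* All arithmetic is done in 32-bit registers, as the proposal suggests. */
int32_t add_shorts(int16_t *buf, int i, int j) {
    int32_t a = (int32_t)load_i16(buf, i);   /* widen to int32 */
    int32_t b = (int32_t)load_i16(buf, j);
    int32_t sum = a + b;                     /* 32-bit operation */
    store_i16(buf, i, sum);                  /* stored as 16 bits */
    return sum;
}
```

Note that the 32-bit result and the stored 16-bit value can differ once the sum overflows the short range, which is one reason the emulation question is not trivial.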
Emitted code uses the StorageBuffer storage class, but there is no OpExtension "SPV_KHR_storage_buffer_storage_class" instruction. We need to generate one.
Any chance we could set up some CI for this? I have some notion to submit a PR that would allow building against an installed LLVM (rather than the versions we currently track), and at least that way we could run it on Travis CI / AppVeyor.
Happy to submit the patches to do this, but I think I'd need @dneto0 to enable CI on Travis/AppVeyor for this repo.
Example test/IntegerBuiltins/int2_clz.cl produces this (fragment):
%14 = OpFunction %void None %9
%15 = OpLabel
%16 = OpAccessChain %_ptr_StorageBuffer_v2uint %12 %uint_0 %uint_0
%17 = OpAccessChain %_ptr_StorageBuffer_v2uint %13 %uint_0 %uint_0
%18 = OpLoad %v2uint %17
%19 = OpExtInst %v2uint %1 FindUMsb %18
%20 = OpISub %v2uint %uint_31 %19 ; Scalar mixed with vector
OpStore %16 %20
OpReturn
The problem is that the ISub operates on a scalar and a vector.
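The fragment above computes clz(x) as 31 - FindUMsb(x); for a vector input the 31 must be splatted into a per-lane vector constant, which is what the generated code fails to do. A small C sketch of the scalar identity and its lane-wise vector form (find_umsb is a stand-in for the GLSL.std.450 FindUMsb instruction):

```c
#include <stdint.h>

/* Index of the most significant set bit of x (stand-in for FindUMsb). */
static uint32_t find_umsb(uint32_t x) {
    int pos = -1;
    while (x) { pos++; x >>= 1; }
    return (uint32_t)pos;
}

/* clz(x) = 31 - msb_index(x): the identity the OpISub/FindUMsb pair uses. */
uint32_t clz32(uint32_t x) {
    return 31u - find_umsb(x);
}

/* The vector form applies the same identity lane-wise; the "31" must be
 * applied per lane, whereas the bug is that clspv subtracts the scalar
 * constant %uint_31 from a whole vector in one OpISub. */
void clz_v2(const uint32_t in[2], uint32_t out[2]) {
    out[0] = 31u - find_umsb(in[0]);
    out[1] = 31u - find_umsb(in[1]);
}
```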
Example from test/char4_insert.cl
kernel void foo(global uchar4* A, int n) {
*A = (uchar4)(1,2,(uchar)n,4);
}
Currently produces:
%27 = OpLoad %uint %26
%28 = OpUConvert %uint %27 ; This is a no-op.
%29 = OpLoad %uint %25
%30 = OpShiftLeftLogical %uint %uint_255 %uint_8
%31 = OpNot %uint %30
%32 = OpBitwiseAnd %uint %29 %31
%33 = OpShiftLeftLogical %uint %28 %uint_8
%34 = OpBitwiseOr %uint %32 %33
OpStore %25 %34
The UConvert to produce %28 is a no-op. The i8 LLVM type maps to a 32-bit SPIR-V int. The LLVM truncation should become an "and" with 255.
This is somewhat by design, since we haven't touched the Clang front-end.
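Because i8 values are carried in 32-bit words, an LLVM `trunc i32 -> i8` cannot be a 32-to-32-bit UConvert (a no-op); its effect on the carrier word is a mask. A hedged C sketch of the masking and of the byte-insert sequence from the listing above:

```c
#include <stdint.h>

/* Model of LLVM `trunc i32 -> i8` when i8 lives in a 32-bit word:
 * keep the word 32 bits wide, but clear everything above bit 7.
 * This is the "and with 255" the issue asks for. */
uint32_t trunc_to_u8(uint32_t x) {
    return x & 0xFFu;
}

/* Insert byte b at byte position idx of packed word w, mirroring the
 * shift/and/or sequence in the generated code above. */
uint32_t insert_byte(uint32_t w, uint32_t b, uint32_t idx) {
    uint32_t mask = 0xFFu << (8u * idx);          /* lane being replaced */
    return (w & ~mask) | (trunc_to_u8(b) << (8u * idx));
}
```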
Example input:
kernel void foo(const global float4* IN, global float4* OUT) {
uchar4 in4 = *((global const uchar4*)IN);
float4 result;
result.x = in4.x;
result.y = in4.y;
result.z = in4.z;
result.w = in4.w;
*OUT = result;
}
A few CL files from the ARM Compute Library exhibit the following issue:
%26 = bitcast i8 addrspace(1)* %call52 to i16 addrspace(1)*
Handle above user of scalar bitcast with gep on ReplacePointerBitcastPass
UNREACHABLE executed at clspv/lib/ReplacePointerBitcastPass.cpp:834!
Aborted (core dumped)
Here's a couple that show the problem (just passing the file to clspv):
Example:
void foo(global uint* A, int n) {
A[n] = n;
}
void bar(global uchar4 * A, int n) {
A[n].x = (uchar)n;
}
kernel void goth(global uint *A, global uchar4 *B, int n) {
foo(A, n);
bar(B, n);
}
Produces two pointer-to-storage-buffer-of-runtime-array-of-32-bit-int types, each decorated with ArrayStride 4.
e.g.
OpDecorate %_ptr_StorageBuffer_uint ArrayStride 4
OpDecorate %_ptr_StorageBuffer_uint_0 ArrayStride 4
...
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint
...
%_ptr_StorageBuffer_uint_0 = OpTypePointer StorageBuffer %uint
This is slightly wasteful and could confuse older versions of the validator. (See KhronosGroup/SPIRV-Tools#783)
It would be an enhancement if we could avoid generating two different pointer types in this case.
Compiler generates invalid code for:
kernel void foo(global float* A, float x, float y) {
*A = fmod(x,y);
}
Can't disassemble. I get:
%22 = OpFunction %void None %10
%23 = OpLabel
%24 = OpAccessChain %_ptr_StorageBuffer_float %19 %uint_0 %uint_0
%25 = OpAccessChain %_ptr_StorageBuffer_float %20 %uint_0
%26 = OpLoad %float %25
%27 = OpAccessChain %_ptr_StorageBuffer_float %21 %uint_0
%28 = OpLoad %float %27
error: 212: Id is 0
...
E.g. SKIP_CLSPV_TOOLS_INSTALL
Recommend use of CMAKE_BUILD_TYPE=Release? (It links much faster.)
Two test cases:
This one is broken and generates only one OpStore (for one float):
__kernel void myTest(__global float* jasper)
{
*jasper++ = 0;
*jasper++ = 0;
*jasper++ = 0;
*jasper++ = 0;
}
; SPIR-V
; Version: 1.0
; Generator: Codeplay; 0
; Bound: 22
; Schema: 0
OpCapability Shader
OpCapability VariablePointers
OpExtension "SPV_KHR_storage_buffer_storage_class"
OpExtension "SPV_KHR_variable_pointers"
OpMemoryModel Logical GLSL450
OpEntryPoint GLCompute %19 "myTest"
OpSource OpenCL_C 120
OpDecorate %13 SpecId 0
OpDecorate %14 SpecId 1
OpDecorate %15 SpecId 2
OpDecorate %_runtimearr_float ArrayStride 4
OpMemberDecorate %_struct_4 0 Offset 0
OpDecorate %_struct_4 Block
OpDecorate %gl_WorkGroupSize BuiltIn WorkgroupSize
OpDecorate %18 DescriptorSet 0
OpDecorate %18 Binding 0
%float = OpTypeFloat 32
%_ptr_StorageBuffer_float = OpTypePointer StorageBuffer %float
%_runtimearr_float = OpTypeRuntimeArray %float
%_struct_4 = OpTypeStruct %_runtimearr_float
%_ptr_StorageBuffer__struct_4 = OpTypePointer StorageBuffer %_struct_4
%uint = OpTypeInt 32 0
%void = OpTypeVoid
%8 = OpTypeFunction %void
%v3uint = OpTypeVector %uint 3
%_ptr_Private_v3uint = OpTypePointer Private %v3uint
%uint_0 = OpConstant %uint 0
%float_0 = OpConstant %float 0
%13 = OpSpecConstant %uint 1
%14 = OpSpecConstant %uint 1
%15 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %13 %14 %15
%17 = OpVariable %_ptr_Private_v3uint Private %gl_WorkGroupSize
%18 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer
%19 = OpFunction %void None %8
%20 = OpLabel
%21 = OpAccessChain %_ptr_StorageBuffer_float %18 %uint_0 %uint_0
OpStore %21 %float_0
OpReturn
OpFunctionEnd
This one works and generates four OpStores, one for each value:
__kernel void myTest(__global float* jasper)
{
*jasper++ = 0;
*jasper++ = 1;
*jasper++ = 2;
*jasper++ = 3;
}
; SPIR-V
; Version: 1.0
; Generator: Codeplay; 0
; Bound: 31
; Schema: 0
OpCapability Shader
OpCapability VariablePointers
OpExtension "SPV_KHR_storage_buffer_storage_class"
OpExtension "SPV_KHR_variable_pointers"
OpMemoryModel Logical GLSL450
OpEntryPoint GLCompute %25 "myTest"
OpSource OpenCL_C 120
OpDecorate %19 SpecId 0
OpDecorate %20 SpecId 1
OpDecorate %21 SpecId 2
OpDecorate %_runtimearr_float ArrayStride 4
OpMemberDecorate %_struct_4 0 Offset 0
OpDecorate %_struct_4 Block
OpDecorate %gl_WorkGroupSize BuiltIn WorkgroupSize
OpDecorate %24 DescriptorSet 0
OpDecorate %24 Binding 0
%float = OpTypeFloat 32
%_ptr_StorageBuffer_float = OpTypePointer StorageBuffer %float
%_runtimearr_float = OpTypeRuntimeArray %float
%_struct_4 = OpTypeStruct %_runtimearr_float
%_ptr_StorageBuffer__struct_4 = OpTypePointer StorageBuffer %_struct_4
%uint = OpTypeInt 32 0
%void = OpTypeVoid
%8 = OpTypeFunction %void
%v3uint = OpTypeVector %uint 3
%_ptr_Private_v3uint = OpTypePointer Private %v3uint
%uint_0 = OpConstant %uint 0
%uint_1 = OpConstant %uint 1
%float_0 = OpConstant %float 0
%uint_2 = OpConstant %uint 2
%float_1 = OpConstant %float 1
%uint_3 = OpConstant %uint 3
%float_2 = OpConstant %float 2
%float_3 = OpConstant %float 3
%19 = OpSpecConstant %uint 1
%20 = OpSpecConstant %uint 1
%21 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %19 %20 %21
%23 = OpVariable %_ptr_Private_v3uint Private %gl_WorkGroupSize
%24 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer
%25 = OpFunction %void None %8
%26 = OpLabel
%27 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_0
%28 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_1
OpStore %27 %float_0
%29 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_2
OpStore %28 %float_1
%30 = OpAccessChain %_ptr_StorageBuffer_float %24 %uint_0 %uint_3
OpStore %29 %float_2
OpStore %30 %float_3
OpReturn
OpFunctionEnd
Example:
kernel void align(global int* A, int x, float4 c) {
*A = x + (int)c.x;
}
Compile with:
clspv align.cl -o align.spv -cluster-pod-kernel-args -descriptormap=foot
Then the x and c arguments are collected into a single struct argument, for a storage buffer. The descriptor map is:
kernel,align,arg,A,argOrdinal,0,descriptorSet,0,binding,0,offset,0
kernel,align,arg,x,argOrdinal,1,descriptorSet,0,binding,1,offset,0
kernel,align,arg,c,argOrdinal,2,descriptorSet,0,binding,1,offset,16
That's correct.
However, in SPIR-V we have:
OpMemberDecorate %_struct_8 0 Offset 0
OpMemberDecorate %_struct_8 1 Offset 4 ; this should be offset 16
So, two bugs really: the descriptor map and the declared struct offset disagree, and the declared struct offset is the wrong one. The declared struct offset should be made the same as what's reported in the descriptor map.
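The offset 16 in the descriptor map follows from the standard buffer layout rules: a 4-component vector of 32-bit floats has 16-byte alignment, so after the 4-byte int it must start at offset 16, not 4. A C sketch of the clustered argument struct (names are illustrative):

```c
#include <stdalign.h>
#include <stddef.h>

/* Stand-in for the OpenCL float4 type: four 32-bit floats. */
typedef struct {
    float v[4];
} float4_pod;

/* Mirror of the clustered POD-argument struct { int x; float4 c; }.
 * Under the layout rules used for storage buffers a float4 member is
 * 16-byte aligned, so c lands at offset 16 -- matching the descriptor
 * map, not the "Offset 4" decoration the compiler emitted. */
typedef struct {
    int x;                        /* offset 0 */
    alignas(16) float4_pod c;     /* offset 16 after alignment padding */
} clustered_args;
```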
A few CL files from the ARM Compute Library exhibit the following issue:
$ ./build/bin/clspv ComputeLibrary/src/core/CL/cl_kernels/convolution3x3.cl
clspv: clspv/third_party/clang/lib/Sema/Sema.cpp:309: clang::Sema::~Sema(): Assertion `DelayedTypos.empty() && "Uncorrected typos!"' failed.
Aborted (core dumped)
Here's a couple that show the problem (just passing the file to clspv):
Would you accept patches to reach full OpenCL C compliance? Or is that a clear non-goal of the project?
You say
"If images are used in the OpenCL C:
The shaderStorageImageReadWithoutFormat field of VkPhysicalDeviceFeatures must be set to true.
The shaderStorageImageWriteWithoutFormat field of VkPhysicalDeviceFeatures must be set to true.
"
Also, vulkan.gpuinfo.org reports the situation is better than for int16 support: at least modern NV and AMD GPUs support both requirements. But Intel iGPUs don't support shaderStorageImageReadWithoutFormat in either the Windows or the Linux driver.
Since OpenCL has __read_only and __write_only image access qualifiers, maybe we can require only shaderStorageImageReadWithoutFormat for __read_only images and shaderStorageImageWriteWithoutFormat for __write_only images. Is that correct?
That way a kernel with only write-only image access should work on Intel GPUs, right?
Thanks.
This currently causes an ICE as follows:
%1 = bitcast float addrspace(1)* %A to i8 addrspace(1)*
Handle above user of scalar bitcast with gep on ReplacePointerBitcastPass
UNREACHABLE executed at /build/dneto/aglet/SPIRV-LLVM/lib/ReplacePointerBitcastPass.cpp:834!
void kernel __attribute__((reqd_work_group_size(1, 1, 1)))
memmove_example(global float *A, int n, int k) {
float src[20];
for (int i = 0; i < 20; i++) {
src[i] = A[k+i];
}
for (int i = 0; i < 20; i++) {
A[n+i] = src[i];
}
}
Intermediate translation is:
; Function Attrs: norecurse nounwind
define spir_kernel void @src_is_array(float addrspace(1)* nocapture %A, i32 %n, i32 %k) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 !reqd_work_group_size !7 {
entry:
%0 = mul i32 %k, 4
%1 = bitcast float addrspace(1)* %A to i8 addrspace(1)*
%2 = lshr i32 %0, 2
%3 = getelementptr i8, i8 addrspace(1)* %1, i32 %0
%4 = mul i32 %n, 4
%5 = bitcast float addrspace(1)* %A to i8 addrspace(1)*
%6 = getelementptr i8, i8 addrspace(1)* %5, i32 %4
call void @llvm.memmove.p1i8.p1i8.i32(i8 addrspace(1)* %6, i8 addrspace(1)* %3, i32 80, i32 4, i1 false)
ret void
}
The test should go into test/LLVMIntrinsics alongside the memcpy tests
Should have auto-cleanup behaviour like Clang does.
Occurs when compiling this code:
typedef struct {
int a, b;
} S;
S convert(int n) {
S s = { n, n};
return s;
}
kernel void foo(global S* A, global uchar4* B, int n) {
*B = (uchar4)((uchar)n,1,2,3);
*A = convert(10);
}
Does not occur when S gains two extra int members, and the convert function is adjusted accordingly.
See test/HalfStorage/vstorehalf_pointer_cast_to_short.cl
It stores through a pointer-to-global-short. The resulting module should have an OpExtension "SPV_KHR_16bit_storage" instruction, but it does not.
That's only allowed when indexing into StorageBuffer or Workgroup storage class.
See test/ProgramScopeConstants/constant_array_with_function_call.cl:
constant uint b[4] = { 42, 13, 0, 15 };
uint bar(constant uint* a) { return a[get_local_id(0)]; }
void kernel __attribute__((reqd_work_group_size(4, 1, 1))) foo(global uint* a) {
*a = bar(b);
}
Will include the following in the body of bar():
%29 = OpPtrAccessChain %_ptr_Private_uint %25 %28
Neil Henning says:
So the issue is that:
* UniformConstant storage class for constant variables
* OpPtrAccessChain when a non-kernel function is indexing into a pointer as if it is an array
I think the only safe thing to do here is: if you see a program-scope constant variable being passed into a sub function, then inline that sub function.
Might be similar issue with StorageImageWriteWithoutFormat
Vulkan says:
The Image Format of an OpTypeImage declaration must not be Unknown, for variables which are used for OpImageRead or OpImageWrite operations, except under the following conditions:
...
For OpImageRead, if the shaderStorageImageReadWithoutFormat feature is enabled and the shader module declares the StorageImageReadWithoutFormat capability.
But clspv will add the capability even if only doing OpImageSampleExplicitLod. E.g. see tests/ImageBuiltins/read_imagef_sampler_float2.cl
It seems to be adding the capability when just mentioning the read image types.
The simplifying assumption is also reflected in the documentation.
A few CL files from the ARM Compute Library exhibit the following issue:
$ ./build/bin/clspv -DDATA_TYPE=float ComputeLibrary/src/core/CL/cl_kernels/convolution_layer.cl
strange type on bitcast
UNREACHABLE executed at clspv/lib/ReplacePointerBitcastPass.cpp:752!
Aborted (core dumped)
Here's a couple that show the problem:
-DDATA_TYPE=float
The problem is a duplicate enum in a case table, due to token aliasing
include/clspv/spirv_c_strings.hpp: In function ‘const char* spv::getCapabilityName(spv::Capability)’:
include/clspv/spirv_c_strings.hpp:410:5: error: duplicate case value
case spv::CapabilityShaderViewportIndexLayerNV: return "ShaderViewportIndexLayerNV";
^
include/clspv/spirv_c_strings.hpp:409:5: error: previously used here
case spv::CapabilityShaderViewportIndexLayerEXT: return "ShaderViewportIndexLayerEXT";
^
ninja: build stopped: subcommand failed.
Currently, WorkGroupSize has a private scope variable in case the application dynamically extracts the workgroup size, but it seems more direct to use dynamic vector extraction.
EDIT: Removed mention of interoperability issues, missed that the spec constant had WorkGroupSize declared as a builtin.
Code generated from Convolution Layer case in #11 fails validation.
info: 172: Duplicate non-aggregate type declarations are not allowed. Opcode: 21
error: 1251: The Object type (OpTypePointer) in OpCompositeInsert does not match the type that results from indexing into the Composite (OpTypeRuntimeArray).
This might actually be two distinct issues. I've seen the duplicate non-aggregate type decl issue before, and will file that separately. This issue will be for the type mismatch.
For storage buffers, we're using StorageBuffer storage class, but decorating those objects with BufferBlock.
We're supposed to use Block decorations instead.
See SPV_KHR_storage_buffer_storage_class and VK_KHR_storage_buffer_storage_class.
https://www.khronos.org/registry/vulkan/specs/1.0-extensions/html/vkspec.html#VK_KHR_storage_buffer_storage_class
Vulkan targets SPIR-V 1.0, and OpSelect in 1.0 requires the two data operands to be scalar or vector types, not pointer types.
Example input:
void kernel __attribute__((reqd_work_group_size(1, 1, 1))) foo(global float2* a, global float2* b, int c)
{
*a = c ? b[0] : b[1];
}
Generates:
%20 = OpLabel
%21 = OpAccessChain %_ptr_StorageBuffer_v2float %16 %uint_0 %uint_0
%22 = OpAccessChain %_ptr_StorageBuffer_v2float %17 %uint_0 %uint_0
%23 = OpAccessChain %_ptr_StorageBuffer_uint %18 %uint_0
%24 = OpLoad %uint %23
%25 = OpIEqual %bool %24 %uint_0
%26 = OpAccessChain %_ptr_StorageBuffer_v2float %17 %uint_0 %uint_1
%27 = OpSelect %_ptr_StorageBuffer_v2float %25 %26 %22
%28 = OpLoad %v2float %27
OpStore %21 %28
OpReturn
The problem is that %27 selects between two pointers
This is forked from a comment on #2.
The problem occurs when doing a somewhat simple selection between two loads from the Private storage class (or __constant in OpenCL C). The inst-combine optimization changes the select of the two loaded values into a single load through a selected pointer.
In the current compilation scheme the two pointers are in Private storage class, and the compiler incorrectly has the result of the selection being a pointer to StorageBuffer storage class.
Case is as below:
This is the symptom of compiling code like:
__constant float kFirst[3] = {1.0f, 2.0f, 3.0f};
__constant float kSecond[3] = {10.0f, 11.0f, 12.0f};
kernel void foo(global float*A, int c, int i) {
*A = c==0 ? kFirst[i] : kSecond[i];
}
Produces this kind of code:
%33 = OpVariable %_ptr_Private__arr_float_uint_3 Private %23
%34 = OpVariable %_ptr_Private__arr_float_uint_3 Private %27
%35 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer
%36 = OpVariable %_ptr_StorageBuffer__struct_7 StorageBuffer
%37 = OpVariable %_ptr_StorageBuffer__struct_7 StorageBuffer
%38 = OpFunction %void None %11
%39 = OpLabel
%40 = OpAccessChain %_ptr_StorageBuffer_float %35 %uint_0 %uint_0
%41 = OpAccessChain %_ptr_StorageBuffer_uint %36 %uint_0
%42 = OpLoad %uint %41
%43 = OpAccessChain %_ptr_StorageBuffer_uint %37 %uint_0
%44 = OpLoad %uint %43
%45 = OpIEqual %bool %42 %uint_0
%46 = OpSelect %_ptr_StorageBuffer__arr_float_uint_3 %45 %33 %34
%47 = OpAccessChain %_ptr_StorageBuffer_float %46 %44
%48 = OpLoad %float %47
OpStore %40 %48
OpReturn
OpFunctionEnd
Note the OpSelect at %46. Its operands are pointers into Private, but its result is pointer to StorageBuffer. That's invalid. We need the initializers for %33 and %34 but even with VariablePointers we can't select between two different pointer-to-Private values.
See also other cases from comments:
Another effective workaround is to use -O0 to disable the inst-combine optimization. But that's rather drastic.
Example:
kernel void foo(global uchar4* A, int n) {
*A = (uchar4)((uchar)n,1,2,3);
}
Generates code like this:
%27 = OpFunction %void None %10
%28 = OpLabel
%29 = OpAccessChain %_ptr_StorageBuffer_uint %25 %uint_0 %uint_0
%30 = OpAccessChain %_ptr_StorageBuffer_uint_0 %26 %uint_0
%31 = OpLoad %uint %30
%32 = OpUConvert %uint %31
%33 = OpShiftLeftLogical %uint %uint_255 %uint_0
%34 = OpLogicalNot %uint %33 ; THIS IS WRONG
%35 = OpBitwiseAnd %uint %uint_66051 %34
%36 = OpShiftLeftLogical %uint %32 %uint_0
%37 = OpBitwiseOr %uint %35 %36
OpStore %29 %33
OpReturn
When accumulating the bits, this:
IntValue = (IntValue << Idx) | Val;
Should probably be:
IntValue = (IntValue << 8 * Idx) | Val;
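For reference, here is one way to pack four uchar lanes into a single 32-bit word, little-endian, which is the result the corrected accumulation should produce (an illustration of the intended packing, not the compiler's exact loop):

```c
#include <stdint.h>

/* Pack four uchar lanes into one 32-bit word: lane i occupies bits
 * [8*i, 8*i+7].  The bug in the issue is a shift by Idx where a shift
 * by 8*Idx (a whole byte position) is needed. */
uint32_t pack_uchar4(const uint8_t lanes[4]) {
    uint32_t word = 0;
    for (uint32_t idx = 0; idx < 4; idx++) {
        word |= (uint32_t)lanes[idx] << (8u * idx);
    }
    return word;
}
```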
Clean up the hacky parts of #59 when we can prove the ecosystem has caught up.
Here the "align" kernel has a reqd_work_group_size. So it gets a LocalSize execution mode. It also suppresses generation of spec IDs for the components of a work-group-size vector.
However, the "boo" kernel is generated without an associated LocalSize execution mode. This is an error. The most sane thing to do here is to emit a LocalSize of 1 1 1, and document it.
kernel void __attribute__((reqd_work_group_size(12,2,3))) align(global int* A, int x, float4 c) {
*A = x + (int)c.x;
}
kernel void boo(global int* A, int x, float4 c) {
*A = x + (int)c.x;
}
There's an unfortunate problem in the Vulkan environment spec: a specialization value for workgroup size is free-floating, and you can't tell which compute shaders it should affect. So the proposed defaulting is about as good as we can do.
That's invalid by the SPIR-V spec. The validator briefly had that rule turned on, but some bad Vulkan CTS tests have duplicate scalar type decls that are hard to rewrite, so we turned that rule off.
Example input:
kernel void foo(global uchar4* A) {
*A = (uchar4)(1,2,3,4);
}
Appears to be caused by SPIRVProducerPass::GenerateSPIRVTypes where it says "i8 is added to TypeMap as i32."
The issue is that the Types collection (returned by getTypeList) has entries for i8 and i32. But the i8 will be converted at the last second to an i32. So we end up with duplicate OpTypeInt 32 0.
Here's a fragment of the generated assembly:
%1 = OpTypeInt 32 0 ; the first one
%2 = OpTypePointer StorageBuffer %1
%3 = OpTypeRuntimeArray %1
%4 = OpTypeStruct %3
%5 = OpTypePointer StorageBuffer %4
%6 = OpTypeInt 32 0 ; oops
%7 = OpTypeVoid
%8 = OpTypeFunction %7
%9 = OpTypeVector %6 3
Example input:
void kernel foo(global float4* A)
{
float4 value;
value.w = 1111.0f;
*A = value;
}
Produces assembly including this:
%13 = OpUndef %float
%float_1111 = OpConstant %float 1111
%15 = OpConstantComposite %v4float %13 %13 %13 %float_1111
%16 = OpSpecConstant %uint 1
%17 = OpSpecConstant %uint 1
%18 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %16 %17 %18
%20 = OpVariable %_ptr_Private_v3uint Private %gl_WorkGroupSize
%21 = OpVariable %_ptr_StorageBuffer__struct_5 StorageBuffer
%22 = OpFunction %void None %9
%23 = OpLabel
%24 = OpAccessChain %_ptr_StorageBuffer_v4float %21 %uint_0 %uint_0
OpStore %24 %15
OpReturn
OpFunctionEnd
Some drivers have trouble with the OpConstantComposite. They happen to work if we use
%13 = OpConstantNull %float
instead.
kernel void foo_fixed(global uint* A, global char4* B) {
char4 val = *B;
A[0] = val.x;
A[1] = val.y;
A[2] = val.z;
A[3] = val.w;
}
Generates, in part, this code:
%34 = OpShiftRightLogical %uint %33 %uint_0
%35 = OpBitwiseAnd %uint %34 %uint_255
%36 = OpSConvert %uint %34 ; This signed conversion is not enough
OpStore %31 %36
The bug is that the LLVM i8 is converted to a SPIR-V uint, and so the SConvert is from a uint to a uint. This does not do the sign extension that LLVM expects will be implicitly done as part of the conversion.
The clspv compiler needs to insert more instructions to replicate the sign bit from position 7 into all the remaining bits.
This is not documented in the language mapping doc.
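Replicating the sign bit of an i8 carried in a 32-bit word can be done explicitly. A hedged C sketch (the portable cast chain below is the C equivalent; on the SPIR-V side this could be, for example, a shift left by 24 followed by an arithmetic shift right by 24):

```c
#include <stdint.h>

/* Sign-extend an i8 held in the low byte of a 32-bit word: copy bit 7
 * into bits 8..31.  A 32-to-32-bit SConvert, as in the listing above,
 * does nothing. */
uint32_t sext_i8_in_u32(uint32_t w) {
    /* Narrow to int8_t (takes the low byte, reinterprets bit 7 as the
     * sign), then widen back to 32 bits, replicating the sign bit. */
    return (uint32_t)(int32_t)(int8_t)(uint8_t)(w & 0xFFu);
}
```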
Supporting 3-element vectors in replace-pointer-bitcasts is complicated because remapping a 3-element vector onto a pointer-to-4-element-vector can span elements of the target array.
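The spanning problem can be made concrete: a float3 at logical index i covers 32-bit words 3i..3i+2, while a float4-backed buffer groups words in runs of four, so a single access can straddle two float4 elements. A small sketch of the check:

```c
#include <stdbool.h>
#include <stdint.h>

/* For a float3 view over a float4-backed buffer: vec3 element i covers
 * 32-bit words 3*i .. 3*i+2; the underlying vec4 array groups words in
 * runs of 4.  When the first and last word fall in different vec4s, no
 * single vec4 load/store covers the access. */
bool vec3_straddles_vec4(uint32_t i) {
    uint32_t first_word = 3u * i;
    uint32_t last_word  = 3u * i + 2u;
    return (first_word / 4u) != (last_word / 4u);
}
```

For example, vec3 element 0 fits inside the first vec4, but element 1 (words 3..5) straddles the first and second vec4, which is exactly what makes the remapping awkward.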