Comments (2)
Here's an example of the complexity I'm concerned about:
global float2* OUT;
float3 value = ...;
vstore_half3(value, 1, (global half*)OUT);
The description of vstore_half3 says:
The floatn value given by data is converted to a halfn value using the appropriate rounding mode. The halfn value is then written to address computed as (p + (offset * n)). The address computed as (p + (offset * n)) must be 16-bit aligned.
vstore_halfn uses the default rounding mode. The default rounding mode is round to nearest even.
In this case n = 3 and offset = 1, so we should be writing at least 48 bits of data (possibly 64 bits) starting 48 bits into the OUT array. Each OUT element is 64 bits wide, so we have to update two adjacent elements:
- the last 16 bits of element 0 of OUT
- the first 32 bits (possibly 48 bits) of element 1 of OUT
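The arithmetic above can be checked with a small host-side sketch in plain C. It is not clspv code: the struct store_span and the function half_store_span are hypothetical names introduced only for illustration, and the sketch assumes the tightly packed 48-bit case (three 16-bit halves, no padding) rather than the possible 64-bit write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only (not part of clspv or OpenCL).
   Describes the bytes written by vstore_halfN(data, offset, p), following the
   quoted text: the store begins at p + offset * n, counted in 16-bit halves. */
typedef struct {
    size_t first_byte;    /* first byte written, relative to p */
    size_t last_byte;     /* last byte written, relative to p (inclusive) */
    size_t first_element; /* index of the first destination element touched */
    size_t last_element;  /* index of the last destination element touched */
} store_span;

static store_span half_store_span(size_t n, size_t offset, size_t element_bytes)
{
    const size_t half_bytes = 2; /* a half is 16 bits wide */
    store_span s;

    assert(n > 0 && element_bytes > 0);

    /* Assumption: exactly n halves are written, with no padding. */
    s.first_byte = offset * n * half_bytes;
    s.last_byte = s.first_byte + n * half_bytes - 1;
    s.first_element = s.first_byte / element_bytes;
    s.last_element = s.last_byte / element_bytes;
    return s;
}
```

Calling half_store_span(3, 1, 8) for the example above (n = 3, offset = 1, 8-byte float2 elements) gives bytes 6 through 11, which covers the last 2 bytes of element 0 and the first 4 bytes of element 1.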
from clspv.
I think this is fixed now with the ThreeElementVector pass.