Comments (3)
TLDR: Works with -no-ansi-alias
(aka -fno-strict-aliasing
)
$ icx --version
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
What follows is just what I found out while debugging:
For this example, the code path ends up in
Lines 6232 to 6235 in d5e74eb
The compiler may assume that uint32_t * S
and uint64_t * D
do not alias and hence are different objects, and IntelLLVM takes advantage of that.
IntelLLVM will vectorize the for-loop:
--- !Passed
Pass: slp-vectorizer
Name: StoresVectorized
DebugLoc: { File: 'src/libsais64.c', Line: 6234, Column: 114 }
Function: libsais64
Args:
- String: 'Stores SLP vectorized with cost '
- Cost: '-1'
- String: ' and with tree size '
- TreeSize: '3'
...
The output is [10, 7, 0, ...]
.
When disabling the vectorization via -fno-slp-vectorize
, the output is [10, 0, 0, ..., 0]
.
The first value in SA
, which is accessed via S/D
, is 30064771082
, which is 011100000000000000000000000000001010
in binary.
The first 32 bit are cast to uint64_t
and written. 00.001010
is the 10
we see in the output. The rest is overwritten with 0s.
I'm assuming that the compiler also reorders the loop to go from left to right, I haven't looked at the assembly.
When explicitly marking the pointers as aliasing, e.g.,
typedef __attribute__ ((may_alias)) uint32_t uint32_alias_t;
typedef __attribute__ ((may_alias)) uint64_t uint64_alias_t;
static void libsais64_convert_right_to_left_32u_to_64u(uint32_alias_t * S, uint64_alias_t * D, fast_sint_t omp_block_start, fast_sint_t omp_block_size)
{
fast_sint_t i, j; for (i = omp_block_start + omp_block_size - 1, j = omp_block_start; i >= j; i -= 1) { D[i] = (uint64_t)S[i]; }
}
the code works as expected:
--- !Missed
Pass: gvn
Name: LoadClobbered
DebugLoc: { File: 'src/libsais64.c', Line: 6234, Column: 126 }
Function: libsais64
Args:
- String: 'load of type '
- Type: ''
- String: ' not eliminated'
- String: ' because it is clobbered by '
- ClobberedBy: store
DebugLoc: { File: 'src/libsais64.c', Line: 6234, Column: 114 }
...
the same can be achieved with
static void libsais64_convert_right_to_left_32u_to_64u(uint32_alias_t * S, uint64_alias_t * D, fast_sint_t omp_block_start, fast_sint_t omp_block_size)
{
volatile fast_sint_t i;
fast_sint_t j; for (i = omp_block_start + omp_block_size - 1, j = omp_block_start; i >= j; i -= 1) { D[i] = (uint64_t)S[i]; }
}
then the loop has to be evaluated as written because no assumptions about i
in [i]
can be made.
There may be other places where strict-aliasing is violated, which are not hit by the example.
Even though Clang/GCC enable strict-aliasing with (at least) O2 and above, it seems like IntelLLVM's more aggressive optimizations make the difference in this case.
from libsais.
I agree that, from a strict-aliasing rules perspective, my code is not compliant and I will make the necessary changes to correct this. I am looking for a solution that is more compiler-independent.
from libsais.
Ok. This should be fixed in latest version. I also confirmed that @eseiler is correct on assumption. The issue is that compiler reordered the loop to go from left to right. Unfortunately with the fix I was not able to preserve vectorization as I have to opt it to more compiler-independent approach.
from libsais.
Related Issues (16)
- UB in libsais_unbwt HOT 1
- multiple vulnerabilities in bzip3 HOT 5
- feature request: binary BWT HOT 1
- segmentation fault when building the suffix array for some large texts HOT 3
- Makefile missing HOT 3
- strange bug in combination with malloc_count HOT 2
- PLCP and LCP arrays for integer suffix trees HOT 1
- Is SA-IS means using induce sort for suffix array? yuta mori has Open source code of SA-IS HOT 2
- Constructing the suffix array of a string set HOT 8
- Why not compute the LCP Array during the Suffix Array's construction ? HOT 4
- Overlapping parameters are passed to memcpy HOT 2
- The lib builds has some problem HOT 1
- gnuplot?
- Crash for file size close to 2GB HOT 5
- Feature request? HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from libsais.