Git Product home page Git Product logo

Comments (3)

eseiler avatar eseiler commented on July 22, 2024 2

TLDR: Works with -no-ansi-alias (aka -fno-strict-aliasing)

$ icx --version
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)

What follows is just what I found out while debugging:

For this example, the code path ends up in

libsais/src/libsais64.c

Lines 6232 to 6235 in d5e74eb

static void libsais64_convert_right_to_left_32u_to_64u(uint32_t * S, uint64_t * D, fast_sint_t omp_block_start, fast_sint_t omp_block_size)
{
fast_sint_t i, j; for (i = omp_block_start + omp_block_size - 1, j = omp_block_start; i >= j; i -= 1) { D[i] = (uint64_t)S[i]; }
}

The compiler may assume that uint32_t * S and uint64_t * D do not alias and hence are different objects, and IntelLLVM takes advantage of that.

IntelLLVM will vectorize the for-loop:

--- !Passed
Pass:            slp-vectorizer
Name:            StoresVectorized
DebugLoc:        { File: 'src/libsais64.c', Line: 6234, Column: 114 }
Function:        libsais64
Args:
  - String:          'Stores SLP vectorized with cost '
  - Cost:            '-1'
  - String:          ' and with tree size '
  - TreeSize:        '3'
...

The output is [10, 7, 0, ...].

When disabling the vectorization via -fno-slp-vectorize, the output is [10, 0, 0, ..., 0].

The first value in SA, which is accessed via S/D, is 30064771082, which is 011100000000000000000000000000001010 in binary.
The first 32 bit are cast to uint64_t and written. 00.001010 is the 10 we see in the output. The rest is overwritten with 0s.
I'm assuming that the compiler also reorders the loop to go from left to right, I haven't looked at the assembly.

When explicitly marking the pointers as aliasing, e.g.,

typedef __attribute__ ((may_alias)) uint32_t uint32_alias_t;
typedef __attribute__ ((may_alias)) uint64_t uint64_alias_t;

static void libsais64_convert_right_to_left_32u_to_64u(uint32_alias_t * S, uint64_alias_t * D, fast_sint_t omp_block_start, fast_sint_t omp_block_size)
{
    fast_sint_t i, j; for (i = omp_block_start + omp_block_size - 1, j = omp_block_start; i >= j; i -= 1) { D[i] = (uint64_t)S[i]; }
}

the code works as expected:

--- !Missed
Pass:            gvn
Name:            LoadClobbered
DebugLoc:        { File: 'src/libsais64.c', Line: 6234, Column: 126 }
Function:        libsais64
Args:
  - String:          'load of type '
  - Type:            ''
  - String:          ' not eliminated'
  - String:          ' because it is clobbered by '
  - ClobberedBy:     store
    DebugLoc:        { File: 'src/libsais64.c', Line: 6234, Column: 114 }
...

the same can be achieved with

static void libsais64_convert_right_to_left_32u_to_64u(uint32_alias_t * S, uint64_alias_t * D, fast_sint_t omp_block_start, fast_sint_t omp_block_size)
{
    volatile fast_sint_t i;
    fast_sint_t j; for (i = omp_block_start + omp_block_size - 1, j = omp_block_start; i >= j; i -= 1) { D[i] = (uint64_t)S[i]; }
}

then the loop has to be evaluated as written because no assumptions about i in [i] can be made.

There may be other places where strict-aliasing is violated, which are not hit by the example.
Even though Clang/GCC enable strict-aliasing with (at least) O2 and above, it seems like IntelLLVM's more aggressive optimizations make the difference in this case.

from libsais.

IlyaGrebnov avatar IlyaGrebnov commented on July 22, 2024 2

I agree that, from a strict-aliasing rules perspective, my code is not compliant and I will make the necessary changes to correct this. I am looking for a solution that is more compiler-independent.

from libsais.

IlyaGrebnov avatar IlyaGrebnov commented on July 22, 2024

Ok. This should be fixed in latest version. I also confirmed that @eseiler is correct on assumption. The issue is that compiler reordered the loop to go from left to right. Unfortunately with the fix I was not able to preserve vectorization as I have to opt it to more compiler-independent approach.

from libsais.

Related Issues (16)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.