Comments (9)

jan-wassenberg commented on May 8, 2024

If a machine lacks AVX2, HWY_DYNAMIC_DISPATCH will not call into AVX2 code. But maybe there is some behavior change if the vector length differs?

If you want to force either SSE4 (if available) or else scalar, you can #define HWY_DISABLED_TARGETS (HWY_AVX2|HWY_AVX3). You could also call DisableTargets() at runtime.

from highway.

boxerab commented on May 8, 2024

@jan-wassenberg, thanks. I think there is still a bug in NearestInt:
In a debug build on my local machine, all unit tests pass, but in release mode I get failures, and the failing tests are the ones that use NearestInt. Any ideas on what would cause optimized builds to fail?

If I disable AVX2, then all tests pass.

Do you have tests comparing NearestInt on AVX2 against C++ lrintf?

boxerab commented on May 8, 2024

auto y = va_r*r + va_g*g;

gives a different result from

auto y = va_r*r;
y += va_g*g;

The second calculation matches the C++ calculation.

boxerab commented on May 8, 2024

So my conclusion is that vanilla C++ code compiled inside a HWY method gives different results from the same vanilla C++ code compiled outside of a HWY method.

jan-wassenberg commented on May 8, 2024

Depending on optimization flags, the compiler may perform unsafe, non-IEEE compliant transformations.
Do you have -ffast-math enabled? I'd recommend against that.

> that vanilla C++ code compiled inside HWY method gives different results from vanilla C++ code compiled outside of HWY method..

This could be due to FMA: compilers may replace a*b+c with a fused multiply-add (depending on -ffp-contract or similar flags). That might even be a good thing, because the fused form rounds only once, though it will change results by 1E-5 or so. Presumably the results would match if you compiled everything with -ffp-contract=fast?
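
To see the effect in isolation, here is a minimal sketch that compares a separately rounded multiply and add with a fused multiply-add, using only the standard library. The function names and input values are hypothetical and chosen purely for illustration (they are not from the issue); the `volatile` store is there only to stop the compiler from contracting the expression on its own.

```cpp
#include <cmath>

// Multiply, round the product to float, then add: two rounding steps.
// The volatile store keeps the compiler from fusing the two operations.
inline float MulThenAdd(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// Fused multiply-add: computes a*b+c with a single rounding at the end.
inline float Fused(float a, float b, float c) { return std::fma(a, b, c); }

// Illustrative inputs (assumed, not from the issue): the exact product
// a*b is 1 - 2^-26, which is not representable as a float and rounds
// to 1.0f.
inline float ExampleA() { return 1.0f + std::ldexp(1.0f, -13); }
inline float ExampleB() { return 1.0f - std::ldexp(1.0f, -13); }
```

With c = -1.0f, `MulThenAdd(ExampleA(), ExampleB(), -1.0f)` returns 0 because the product has already been rounded to 1.0f, whereas `Fused(ExampleA(), ExampleB(), -1.0f)` returns -2^-26 (about -1.49e-08). Neither result is "wrong"; they simply round at different points.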

boxerab commented on May 8, 2024

Thanks. I don't have -ffast-math enabled.
So are you saying that the compiler settings for a HWY method differ from the settings for non-HWY code?

jan-wassenberg commented on May 8, 2024

Yes, that is necessarily the case - on GCC and Clang, you cannot emit SIMD code without -mavx2 etc. (or the target-specific attributes used here). Given that we also specify -mfma, I believe the compiler is within its rights to contract some multiply-add sequences into FMA.

This seems to be one platform difference we cannot abstract away - SSE4 and AVX2 are going to return different results because the former does not support FMA. Does that make sense?
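
Because the failing tests round to an integer, even a one-ulp difference from FMA can flip the final result when the value sits on a rounding boundary. The following scalar sketch illustrates this, using `std::lrint` as a stand-in for NearestInt; the function names are hypothetical, and it assumes the default round-to-nearest-even floating-point mode.

```cpp
#include <cmath>

// Separately rounded multiply and add, then round to the nearest integer.
// The volatile store prevents the compiler from contracting a*b+c into FMA.
inline long RoundUnfused(float a, float b, float c) {
  volatile float product = a * b;
  const float sum = product + c;
  return std::lrint(sum);
}

// Fused multiply-add (single rounding), then round to the nearest integer.
inline long RoundFused(float a, float b, float c) {
  return std::lrint(std::fma(a, b, c));
}
```

For the illustrative inputs a = b = 1 + 2^-12 and c = -(0.5 + 2^-11), the unfused sum is exactly 0.5, which rounds to 0 under ties-to-even, while the fused sum is 0.5 + 2^-24, which rounds to 1. Inputs like these are rare, but a large test buffer can easily contain a few of them.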

boxerab commented on May 8, 2024

Thanks. I may just disable SSE4 in that case.

But let's just take AVX2 for now. I am hoping that I can get the same outputs for both AVX2 and scalar, so that the output is uniform across different CPUs.

Also, even in the AVX2 case, I still sometimes need to run scalar code on the last part of a buffer, depending on the number of threads in the system: I break the buffer into chunks, one chunk per thread, and there can be some leftover bytes.
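
One possible way to keep such a scalar tail consistent with an FMA-enabled vector path is to call `std::fma` explicitly in the scalar code, so that both paths round once per element. The sketch below only illustrates that idea: `kLanes`, `WeightedSum` and the inner block loop (which stands in for a real SIMD kernel) are hypothetical and not Highway API, and it assumes the vector path contracts the expression in the same way.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical lane count standing in for the SIMD vector width.
constexpr std::size_t kLanes = 8;

// Computes y[i] = r[i]*wr + g[i]*wg. Full blocks of kLanes elements stand
// in for the vector path; the remaining elements go through the scalar tail.
// Both paths use std::fma, so every element is rounded the same way.
inline void WeightedSum(const float* r, const float* g, float wr, float wg,
                        float* y, std::size_t n) {
  const std::size_t full = n - n % kLanes;
  std::size_t i = 0;
  for (; i < full; i += kLanes) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      y[i + lane] = std::fma(r[i + lane], wr, g[i + lane] * wg);
    }
  }
  for (; i < n; ++i) {  // scalar tail for the leftover elements
    y[i] = std::fma(r[i], wr, g[i] * wg);
  }
}
```

Whether this matches a given compiler's output depends on which multiply-add pairs it actually fuses, so the result should be checked against the real vector kernel rather than assumed.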

boxerab commented on May 8, 2024

@jan-wassenberg thanks, I think I understand the issue now. I will have to think about the best strategy for my project.
