Comments (9)
If a machine lacks AVX2, HWY_DYNAMIC_DISPATCH will not call into AVX2 code. But maybe there is some behavior change if the vector length differs?
If you want to force either SSE4 (if available) or else scalar, you can #define HWY_DISABLED_TARGETS (HWY_AVX2|HWY_AVX3). You could also call DisableTargets() at runtime.
from highway.
@jan-wassenberg, thanks. I think there is still a bug in NearestInt:
In a debug build on my local machine, all unit tests pass, but in release mode I get failures, and the failing unit tests are the ones that use NearestInt. Any ideas on what would cause optimized builds to fail?
If I disable AVX2, then all tests pass.
Do you have tests comparing NearestInt with AVX2 to C++ lrintf?
from highway.
auto y = va_r*r + va_g*g;
gives a different result from
auto y = va_r*r;
y += va_g*g;
The second calculation matches the C++ calculation.
from highway.
So, my conclusion is that vanilla C++ code compiled inside a HWY method gives different results from vanilla C++ code compiled outside of a HWY method.
from highway.
Depending on optimization flags, the compiler may perform unsafe, non-IEEE compliant transformations.
Do you have -ffast-math enabled? I'd recommend against that.
"that vanilla C++ code compiled inside HWY method gives different results from vanilla C++ code compiled outside of HWY method"
This could be due to FMA: compilers may replace a*b+c with a fused multiply-add (depending on fp-contract or similar flags), which rounds once instead of twice. That might even be a good thing, though it will change results by 1E-5 or so. Presumably the results would match if you compiled everything with -ffp-contract=fast?
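To make the FMA explanation concrete, here is a minimal standalone sketch (not Highway code) that emulates both behaviours with the standard library. The function names and the input values are hypothetical, chosen only so that the rounding of the intermediate product becomes visible; `std::fma` stands in for what a contracting compiler might emit.

```cpp
#include <cassert>
#include <cmath>

// Rounds the product to float first, then adds: what non-contracted code
// computes. The volatile store keeps the compiler from fusing the two steps.
float MulThenAdd(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// Rounds only once, at the end, as a hardware FMA instruction would.
float FusedMulAdd(float a, float b, float c) {
  return std::fma(a, b, c);
}

// Hypothetical inputs: a*a = 1 + 2^-11 + 2^-24 exactly, and the last term
// is lost when the product is rounded to float before the addition.
void DemoContraction() {
  const float a = 1.0f + std::ldexp(1.0f, -12);      // 1 + 2^-12
  const float c = -(1.0f + std::ldexp(1.0f, -11));   // -(1 + 2^-11)
  assert(MulThenAdd(a, a, c) == 0.0f);
  assert(FusedMulAdd(a, a, c) == std::ldexp(1.0f, -24));
}
```

The two results differ only in the last bits, which is enough to break tests that compare outputs exactly. On GCC and Clang, `-ffp-contract=off` asks the compiler not to fuse separate multiplies and adds.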
from highway.
Thanks. I don't have -ffast-math enabled.
So, are you saying that the compiler settings for a HWY method will differ from the settings for non-HWY code?
from highway.
Yes, that is necessarily the case: on GCC and Clang, you cannot emit SIMD code without -mavx2 etc. (or the target-specific attributes used here). Given we also specify -mfma, I believe the compiler is within its rights to contract some multiply-add sequences into FMA.
This seems to be one platform difference we cannot abstract away - SSE4 and AVX2 are going to return different results because the former does not support FMA. Does that make sense?
from highway.
Thanks. I may just disable SSE4 in that case.
But let's just take AVX2 for now. I am hoping that I can get the same outputs for both AVX2 and scalar, so that the output is uniform across different CPUs.
Also, even in the AVX2 case, I still need to run scalar code on the last part of a buffer sometimes, depending on the number of threads in the system: I break up the buffer into chunks, one chunk per thread, and there can be some leftover bytes.
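One possible way to organise the chunking described above is sketched below, in plain C++ with no Highway dependency. `Chunk`, `Plan` and `PlanChunks` are hypothetical names, and `lanes` is assumed to be the number of elements per vector. The idea is to give every thread a range whose length is a multiple of the vector width, so that the scalar path only ever sees a single tail at the end of the buffer.

```cpp
#include <cstddef>
#include <vector>

// Half-open index range [begin, end).
struct Chunk {
  std::size_t begin;
  std::size_t end;
};

struct Plan {
  std::vector<Chunk> vector_chunks;  // each length is a multiple of `lanes`
  Chunk scalar_tail;                 // fewer than `lanes` leftover elements
};

// Splits [0, n) across at most `num_threads` chunks of whole vectors.
// Assumes num_threads > 0 and lanes > 0.
Plan PlanChunks(std::size_t n, std::size_t num_threads, std::size_t lanes) {
  Plan plan;
  const std::size_t whole_vectors = n / lanes;
  const std::size_t per_thread = whole_vectors / num_threads;
  const std::size_t extra = whole_vectors % num_threads;
  std::size_t pos = 0;
  for (std::size_t t = 0; t < num_threads; ++t) {
    // The first `extra` threads take one additional vector each.
    const std::size_t vecs = per_thread + (t < extra ? 1 : 0);
    if (vecs == 0) continue;
    plan.vector_chunks.push_back({pos, pos + vecs * lanes});
    pos += vecs * lanes;
  }
  plan.scalar_tail = {pos, n};
  return plan;
}
```

This only limits where the scalar code runs. Whether the scalar tail matches the vector results bit for bit still depends on whether both paths round the same way, as discussed earlier in the thread.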
from highway.
@jan-wassenberg thanks, I think I understand the issue now. I will have to think about the best strategy for my project.
from highway.