
Comments (2)

mosra commented on April 29, 2024

Closing.

  • The methods mentioned here would lead to a "naive SIMD" approach, which (according to various benchmarks around the web) is not always the right solution.
  • Besides that, my code almost always uses two- or three-component vectors, not four-component ones, so the naive approach of packing everything into __m128 or the like is useless anyway.
  • The proper way to do SIMD is to write larger functions that process larger chunks of data, not to vectorize a single matrix multiplication at a time.
  • Compilers might, in some cases, be able to vectorize the code anyway.
  • Keeping things simple and maintainable. Having five different matrix multiplication implementations that need to be tested for correctness, performance regressions, precision regressions etc. on dozens of different obscure machines of varying release dates and SDK quality doesn't help with that. I have enough issues with GL alone :)
  • If the user needs to process large amounts of data and the CPU seems too slow for that, there is the GPU. This is also why this project exists :)
  • If the user still wants to invert 400x400 matrices on the CPU, it's always possible to integrate another library such as Eigen with more features and better performance.
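The padding cost behind the second point can be made concrete. A minimal sketch (type names are hypothetical, not Magnum API) of what storing three-component vectors in 128-bit SIMD lanes does to memory layout:

```cpp
#include <cstddef>

// A tightly packed three-component vector, 12 bytes.
struct Vec3 { float x, y, z; };

// The same vector padded to a 16-byte SIMD lane, as keeping it in an
// __m128-sized slot would require. One of the four lanes is wasted.
struct alignas(16) Vec3Padded { float x, y, z, pad; };

// 16 vs. 12 bytes: a third more memory traffic on every load and store,
// before any arithmetic is even done.
static_assert(sizeof(Vec3) == 12, "tight layout");
static_assert(sizeof(Vec3Padded) == 16, "padded layout");
```

Whether the wider loads ever pay for that extra bandwidth is exactly what the benchmarks mentioned above disagree on.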

from magnum.

mosra commented on April 29, 2024

More things to consider:

  • How about two- and three-component vectors? There are very few places in the engine where four-component vectors are really used (I can't think of anything except the MeshTools::transform() classes, which should have a GPU implementation anyway). I don't know how to handle these efficiently; treating them as four-component vectors would be bad for memory performance (and computation performance won't be much better).
  • How about packing/unpacking SIMD vectors from/to floats? That will hurt memory performance even more if not done properly. Compilers already produce SSE-enabled x86 code when optimization is enabled (at least my GCC on x86-64 does that with -O2; I need to investigate whether additional flags are needed for other architectures). Wouldn't it be better to live with scalar code by default and do SIMD optimizations only for large functions where it's possible to use SOA instead of AOS (e.g. various bulk collision tests in the Shapes namespace)?
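The SOA idea above can be sketched as follows. This is a hypothetical illustration, not Magnum's actual Shapes API: one contiguous array per component, and a single bulk function whose inner loop is a natural target for compiler auto-vectorization.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays point storage: one contiguous float
// array per component, instead of an array of Vec3 structs (AOS).
struct PointsSoA {
    std::vector<float> x, y, z;
};

// Bulk test of many points against one sphere. The loop body reads
// consecutive floats from each array and has no branches, so an
// optimizing compiler can vectorize it without any intrinsics.
std::vector<bool> insideSphere(const PointsSoA& p,
                               float cx, float cy, float cz, float r) {
    std::vector<bool> out(p.x.size());
    const float r2 = r*r;
    for(std::size_t i = 0; i != p.x.size(); ++i) {
        const float dx = p.x[i] - cx;
        const float dy = p.y[i] - cy;
        const float dz = p.z[i] - cz;
        out[i] = dx*dx + dy*dy + dz*dz <= r2;
    }
    return out;
}
```

The point of the sketch is that the SIMD decision lives in one bulk function over a friendly layout, while the rest of the engine keeps its scalar two- and three-component vector types untouched.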

Also worth reading: http://www.reedbeta.com/blog/2013/12/28/on-vector-math-libraries/

