aff3ct / mipp

MIPP is a portable wrapper for SIMD instructions written in C++11. It supports NEON, SSE, AVX, AVX-512 and SVE (length specific).

License: MIT License

C++ 98.17% CMake 0.78% C 0.26% Shell 0.65% Batchfile 0.15%

avx avx-512 neon portable simd sse sve vector wrapper

mipp's Introduction

AFF3CT: A Fast Forward Error Correction Toolbox!

AFF3CT is a simulator and a library dedicated to Forward Error Correction (FEC, or channel coding). It is written in C++ and supports a wide range of codes, from the widespread Turbo codes to the newer Polar codes, including Low-Density Parity-Check (LDPC) codes. AFF3CT can be used as a command-line program, and it simulates communication chains using a Monte Carlo method.

It is easy to use. For instance, to estimate the BER/FER decoding performance of the (2048,1723) Polar code from 1.0 to 4.0 dB (Eb/N0, in 1.0 dB steps):

aff3ct -C "POLAR" -K 1723 -N 2048 -m 1.0 -M 4.0 -s 1.0

And the output will be:

# ----------------------------------------------------
# ---- A FAST FORWARD ERROR CORRECTION TOOLBOX >> ----
# ----------------------------------------------------
# Parameters :
# [...]
#
# The simulation is running...
# ---------------------||------------------------------------------------------||---------------------
#  Signal Noise Ratio  ||   Bit Error Rate (BER) and Frame Error Rate (FER)    ||  Global throughput
#         (SNR)        ||                                                      ||  and elapsed time
# ---------------------||------------------------------------------------------||---------------------
# ----------|----------||----------|----------|----------|----------|----------||----------|----------
#     Es/N0 |    Eb/N0 ||      FRA |       BE |       FE |      BER |      FER ||  SIM_THR |    ET/RT
#      (dB) |     (dB) ||          |          |          |          |          ||   (Mb/s) | (hhmmss)
# ----------|----------||----------|----------|----------|----------|----------||----------|----------
       0.25 |     1.00 ||      104 |    16425 |      104 | 9.17e-02 | 1.00e+00 ||    4.995 | 00h00'00
       1.25 |     2.00 ||      104 |    12285 |      104 | 6.86e-02 | 1.00e+00 ||   13.678 | 00h00'00
       2.25 |     3.00 ||      147 |     5600 |      102 | 2.21e-02 | 6.94e-01 ||   14.301 | 00h00'00
       3.25 |     4.00 ||     5055 |     2769 |      100 | 3.18e-04 | 1.98e-02 ||   30.382 | 00h00'00
# End of the simulation.

Features

The simulator targets high-speed simulations and makes extensive use of parallel techniques such as SIMD, multi-threading and multi-node programming models. Below is a list of the features that motivated the creation of the simulator:

reproduce state-of-the-art decoding performance,
explore various channel code configurations, find new trade-offs,
prototype hardware implementations (fixed-point receivers, hardware-in-the-loop tools),
reuse tried and tested modules and add yours,
serve as an alternative to MATLAB if you want to reduce simulation time.

AFF3CT was first intended to be a simulator, but as it developed, the need to reuse parts of the code grew, and so the library was born. Below is a list of possible applications for the library:

build custom communication chains that are not possible with the simulator,
facilitate hardware prototyping,
enable various modules to be used in SDR contexts.

If you want to use AFF3CT as a library, please refer to the dedicated documentation page.

Installation

First, make sure that a C++11 compiler, CMake and Git are installed. Then install AFF3CT by running:

git clone --recursive https://github.com/aff3ct/aff3ct.git
mkdir aff3ct/build
cd aff3ct/build
cmake .. -DCMAKE_BUILD_TYPE="Release"
make -j4

Contribute

Support

If you are having issues, please let us know on our issue tracker.

License

The project is licensed under the MIT license.

How to cite AFF3CT

We recommend citing the SoftwareX journal article: A. Cassagne et al., “AFF3CT: A Fast Forward Error Correction Toolbox!,” Elsevier SoftwareX, 2019 [BibTeX entry].

External Links

mipp's Issues

Last argument must be an 8-bit immediate

I tried to build my project for AVX-512, but I'm getting errors that can also be reproduced by building the tests for the same target:

[  6%] Building CXX object CMakeFiles/run_tests.dir/src/arithmetic_operations/fmadd.cpp.o
[  6%] Building CXX object CMakeFiles/run_tests.dir/src/arithmetic_operations/div4.cpp.o
[  6%] Building CXX object CMakeFiles/run_tests.dir/src/arithmetic_operations/div.cpp.o
[  6%] Building CXX object CMakeFiles/run_tests.dir/src/arithmetic_operations/div2.cpp.o
In file included from /home/debian/MIPP/tests/../src/mipp.h:1240:0,
                 from /home/debian/MIPP/tests/src/arithmetic_operations/div4.cpp:6:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx: In function ‘mipp::reg mipp::rshift(mipp::reg, uint32_t) [with T = long int]’:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx:2164:29: error: the last argument must be an 8-bit immediate
   return _mm512_castsi512_ps(_mm512_srli_epi64(_mm512_castps_si512(v1), n));
          ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/immintrin.h:45:0,
                 from /home/debian/MIPP/tests/../src/math/avx512_mathfun.h:14,
                 from /home/debian/MIPP/tests/../src/mipp.h:45,
                 from /home/debian/MIPP/tests/src/arithmetic_operations/div4.cpp:6:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx: In function ‘mipp::reg mipp::rshift(mipp::reg, uint32_t) [with T = int]’:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx:2170:30: error: the last argument must be an 8-bit immediate
   return _mm512_castsi512_ps(_mm512_srli_epi32(_mm512_castps_si512(v1), n));
                              ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/immintrin.h:55:0,
                 from /home/debian/MIPP/tests/../src/math/avx512_mathfun.h:14,
                 from /home/debian/MIPP/tests/../src/mipp.h:45,
                 from /home/debian/MIPP/tests/src/arithmetic_operations/div4.cpp:6:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx: In function ‘mipp::reg mipp::rshift(mipp::reg, uint32_t) [with T = short int]’:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx:2176:30: error: the last argument must be an 8-bit immediate
   return _mm512_castsi512_ps(_mm512_srli_epi16(_mm512_castps_si512(v1), n));
                              ^
In file included from /home/debian/MIPP/tests/../src/mipp.h:1240:0,
                 from /home/debian/MIPP/tests/src/arithmetic_operations/div2.cpp:6:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx: In function ‘mipp::reg mipp::rshift(mipp::reg, uint32_t) [with T = long int]’:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx:2164:29: error: the last argument must be an 8-bit immediate
   return _mm512_castsi512_ps(_mm512_srli_epi64(_mm512_castps_si512(v1), n));
          ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/immintrin.h:45:0,
                 from /home/debian/MIPP/tests/../src/math/avx512_mathfun.h:14,
                 from /home/debian/MIPP/tests/../src/mipp.h:45,
                 from /home/debian/MIPP/tests/src/arithmetic_operations/div2.cpp:6:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx: In function ‘mipp::reg mipp::rshift(mipp::reg, uint32_t) [with T = int]’:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx:2170:30: error: the last argument must be an 8-bit immediate
   return _mm512_castsi512_ps(_mm512_srli_epi32(_mm512_castps_si512(v1), n));
                              ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/immintrin.h:55:0,
                 from /home/debian/MIPP/tests/../src/math/avx512_mathfun.h:14,
                 from /home/debian/MIPP/tests/../src/mipp.h:45,
                 from /home/debian/MIPP/tests/src/arithmetic_operations/div2.cpp:6:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx: In function ‘mipp::reg mipp::rshift(mipp::reg, uint32_t) [with T = short int]’:
/home/debian/MIPP/tests/../src/mipp_impl_AVX512.hxx:2176:30: error: the last argument must be an 8-bit immediate
   return _mm512_castsi512_ps(_mm512_srli_epi16(_mm512_castps_si512(v1), n));
                              ^
CMakeFiles/run_tests.dir/build.make:182: recipe for target 'CMakeFiles/run_tests.dir/src/arithmetic_operations/div4.cpp.o' failed
make[2]: *** [CMakeFiles/run_tests.dir/src/arithmetic_operations/div4.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[  8%] Building CXX object CMakeFiles/run_tests.dir/src/arithmetic_operations/fmsub.cpp.o
CMakeFiles/run_tests.dir/build.make:158: recipe for target 'CMakeFiles/run_tests.dir/src/arithmetic_operations/div2.cpp.o' failed
make[2]: *** [CMakeFiles/run_tests.dir/src/arithmetic_operations/div2.cpp.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/run_tests.dir/all' failed
make[1]: *** [CMakeFiles/run_tests.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
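The errors come from the immediate-form shifts (_mm512_srli_epi16/32/64): GCC wants an 8-bit compile-time constant there, while mipp::rshift receives n as a run-time uint32_t. One possible workaround, sketched here as an assumption and not as the project's actual fix, is the register-count form of the same shift (_mm512_srl_epi64, _mm512_srl_epi32, _mm512_srl_epi16), which reads the count from the low 64 bits of an __m128i. To stay runnable on any x86-64 machine, the sketch uses the SSE2 counterpart _mm_srl_epi32 and falls back to scalar code elsewhere; rshift_u32x4 is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Logical right shift of four uint32_t lanes by a count known only at run time.
// Immediate forms (_mm_srli_epi32, _mm512_srli_epi64, ...) may require a constant;
// register-count forms (_mm_srl_epi32, _mm512_srl_epi64, ...) take the count
// from the low 64 bits of an __m128i. Counts >= 32 give 0 in every lane.
void rshift_u32x4(const uint32_t in[4], uint32_t out[4], uint32_t n)
{
#if defined(SKETCH_HAVE_SSE2)
    const __m128i v   = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    const __m128i cnt = _mm_cvtsi32_si128(static_cast<int>(n)); // count in lane 0, upper bits zero
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_srl_epi32(v, cnt));
#else
    for (int i = 0; i < 4; i++)
        out[i] = (n < 32) ? (in[i] >> n) : 0u; // same semantics as _mm_srl_epi32
#endif
}
```

In mipp_impl_AVX512.hxx the analogous change would presumably look like _mm512_srl_epi64(_mm512_castps_si512(v1), _mm_cvtsi32_si128(n)); this has not been tested against MIPP.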

Plan for ARM SVE support

Hi, does MIPP plan to support ARM SVE? Thanks

uintptr_t support for pointer bulk operations

Hi, C++11 is the new standard, and uintptr_t is used for arithmetic operations on pointer addresses.
It would be good to add uintptr_t operations to MIPP.

hadd confusing

The reduction method hadd is a bit confusing. In AVX2, hadd is a pairwise add, so the result is still a vector, not a scalar. Would it be possible to change the behavior of the other implementations accordingly?

MIPP/src/mipp.h

Line 1224 in 6be95bb

 template <typename T> inline T hadd(const reg v) { return reduction<T,mipp::add<T>>::sapply(v); } 
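To make the distinction concrete, here is a scalar model of both operations, with plain arrays standing in for registers and hypothetical helper names. pairwise_hadd8 follows the documented lane pattern of AVX's _mm256_hadd_ps (adjacent pairs summed within each 128-bit half, results from a and b interleaved), while reduce_add8 folds all lanes into one scalar, which is what mipp::hadd<T> returns through reduction<T, mipp::add<T>>::sapply.

```cpp
#include <cassert>

// Scalar model of AVX _mm256_hadd_ps(a, b): adjacent pairs are summed inside
// each 128-bit half, interleaving results from a and b. The result is a vector.
void pairwise_hadd8(const float a[8], const float b[8], float dst[8])
{
    for (int half = 0; half < 2; half++)
    {
        const int o = 4 * half; // first lane of this 128-bit half
        dst[o + 0] = a[o + 0] + a[o + 1];
        dst[o + 1] = a[o + 2] + a[o + 3];
        dst[o + 2] = b[o + 0] + b[o + 1];
        dst[o + 3] = b[o + 2] + b[o + 3];
    }
}

// Scalar model of a full horizontal reduction: every lane ends up in one scalar.
float reduce_add8(const float v[8])
{
    float s = 0.f;
    for (int i = 0; i < 8; i++) s += v[i];
    return s;
}
```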

mipp::mul<int64_t> (AVX512) is undefined!

Hello,

the following code:

#include <iostream>
#include "mipp.h"

int main(int argc,char **argv)
{
  mipp::Reg<int64_t> b = 3;
  mipp::Reg<int64_t> c = 9;

  auto d = b*c;

  std::cout << b << std::endl;
  std::cout << c << std::endl;
  std::cout << d << std::endl;

  return 1;
}

terminate called after throwing an instance of 'std::runtime_error'
what(): mipp::mul<int64_t> (AVX512) is undefined!

Fix proposal in mipp_impl_AVX512.hxx:

#if defined(__AVX512DQ__)
	// AVX512DQ provides a native low 64-bit multiply
	template <>
	inline reg mul<int64_t>(const reg v1, const reg v2) {
		return _mm512_castsi512_ps(_mm512_mullo_epi64(_mm512_castps_si512(v1), _mm512_castps_si512(v2)));
	}
#else
	// without AVX512DQ, use the _mm512_mullox_epi64 sequence intrinsic
	template <>
	inline reg mul<int64_t>(const reg v1, const reg v2) {
		return _mm512_castsi512_ps(_mm512_mullox_epi64(_mm512_castps_si512(v1), _mm512_castps_si512(v2)));
	}
#endif
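For context on the non-DQ branch: the low 64 bits of a 64×64-bit product can be rebuilt from 32×32→64-bit partial products, the operation AVX512F offers per lane with _mm512_mul_epu32. The scalar sketch below checks that identity only; it is not MIPP's code, and mullo64_from_32 is a hypothetical name. The a_hi*b_hi term is dropped because it only affects bits 64 and above.

```cpp
#include <cassert>
#include <cstdint>

// Low 64 bits of a * b (the same bits for signed and unsigned two's-complement
// operands), computed only from 32x32->64-bit multiplies.
uint64_t mullo64_from_32(uint64_t a, uint64_t b)
{
    const uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    const uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    const uint64_t lo    = a_lo * b_lo;               // full 64-bit partial product
    const uint64_t cross = a_lo * b_hi + a_hi * b_lo; // only its low 32 bits survive the shift
    return lo + (cross << 32);                        // a_hi * b_hi only affects bits >= 64
}
```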

Clarify using deinterleave

I tried to use the new Regx2 functionality in combination with complex numbers.

#include "../src/mipp.h"
#include <complex>
#include <iostream>
int main(){

    std::complex<float> data[16];
    data[0].real(0);
    data[0].imag(1);
    data[1].real(2);
    data[1].imag(3);
    data[2].real(4);
    data[2].imag(5);
    data[3].real(6);
    data[3].imag(7);
    data[4].real(8);
    data[4].imag(9);
    data[5].real(10);
    data[5].imag(11);
    data[6].real(12);
    data[6].imag(13);
    data[7].real(14);
    data[7].imag(15);
    data[8].real(16);
    data[8].imag(17);
    data[9].real(18);
    data[9].imag(19);
    mipp::Regx2<float> a;
    a.load((float*)data);
    a = a.deinterleave();
    std::cout<< a[0] << std::endl;
}

This produces

[     0,     16,      1,     17 |      2,     18,      3,     19 |      4,      0,      5,      0 |      6,      0,      7,      0]

So, from my point of view, they are still interleaved. I would have expected a[0] to contain all the real parts.
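For reference, the layout the reporter expected can be stated as a plain scalar loop: interleaved complex values [r0, i0, r1, i1, ...] are split into one array of real parts and one of imaginary parts. This sketch describes only the expected result (deinterleave_ref is a hypothetical name); it says nothing about how Regx2::deinterleave is implemented. The (float*)data cast in the report relies on std::complex<float> being laid out as two consecutive floats, which C++11 guarantees.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Reference deinterleave: [r0, i0, r1, i1, ...] -> re = [r0, r1, ...], im = [i0, i1, ...].
void deinterleave_ref(const std::vector<std::complex<float>>& in,
                      std::vector<float>& re, std::vector<float>& im)
{
    re.resize(in.size());
    im.resize(in.size());
    for (std::size_t k = 0; k < in.size(); k++)
    {
        re[k] = in[k].real();
        im[k] = in[k].imag();
    }
}
```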

All-in-one header

Hello, thank you for the great library and all the work you've done.
This is more of a feature request: would it be possible to have an all-in-one include header?
The reason is that I'd like to use MIPP in the online Compiler Explorer.
Thank you.

Query about uint8 support

Hi, is there a plan to support uint8? It could be especially useful in image processing.

Arm NEON add

Hello,
I see in mipp_impl_NEON.hxx that some add functions (and sub functions) use saturating arithmetic (vqadd...) for types of less than 16 bits, while the others use plain arithmetic (vadd...) for the other type sizes.
Is this a deliberate choice or a mistake?
If it's a choice, why?

A+
Thomas
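The practical difference between the two families is what happens on overflow: vqaddq_s8 and its siblings saturate to the type's range, while vaddq_s8 wraps around modulo 2^8. A scalar model for one 8-bit lane, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Wrapping add (vaddq_s8 behavior per lane): the result is taken modulo 2^8.
int8_t add_wrap_i8(int8_t a, int8_t b)
{
    const uint8_t s = static_cast<uint8_t>(static_cast<uint8_t>(a) + static_cast<uint8_t>(b));
    return static_cast<int8_t>(s);
}

// Saturating add (vqaddq_s8 behavior per lane): the result is clamped to [-128, 127].
int8_t add_sat_i8(int8_t a, int8_t b)
{
    const int s = static_cast<int>(a) + static_cast<int>(b); // exact in int
    if (s > INT8_MAX) return INT8_MAX;
    if (s < INT8_MIN) return INT8_MIN;
    return static_cast<int8_t>(s);
}
```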

Can you add an example about image processing?

The examples here are not very extensive:
https://github.com/aff3ct/MIPP/tree/master/examples

Could you add an example of a digital image processing algorithm, such as thresholding, Canny or Sobel?

Are there asin, acos or atan functions?

I found that the OpenCV function phase is the bottleneck in my algorithm, so I am trying to implement it with MIPP. But it seems that MIPP has all the trigonometric functions except the inverse (arc) ones. Is there any chance these functions will be added soon?
BTW, why are there no other issues posted?

Doubt regarding horizontal sum in MIPP

Hi MIPP team,

Thank you for your great work and contribution.

I have a question about the availability of a vectorized horizontal sum in MIPP. I want to sum all the elements of a register using vector instructions. Could you help me find such an implementation in MIPP?

Basically, I have 32 elements (each of 8 bits) in a 256-bit register, and I want to add all 32 elements using vector instructions.

Look forward to hearing from you.

Thanks,
Darshan
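A standard trick for summing unsigned bytes, independent of MIPP and shown here with raw SSE2 intrinsics, is _mm_sad_epu8 against zero: the sum of absolute differences with zero is simply the sum of each 8-byte group, and it lands in the two 64-bit lanes. For a 256-bit register, AVX2's _mm256_sad_epu8 does the same over four 64-bit lanes. The sketch processes 32 bytes as two 128-bit halves and falls back to a scalar loop on non-x86 targets; hsum_u8x32 is a hypothetical name. Note that the total (up to 32 × 255 = 8160) no longer fits in 8 bits.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Horizontal sum of 32 unsigned bytes (e.g. the content of one 256-bit register).
uint32_t hsum_u8x32(const uint8_t in[32])
{
#if defined(SKETCH_HAVE_SSE2)
    const __m128i zero = _mm_setzero_si128();
    const __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    const __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + 16));
    // |x - 0| summed over each group of 8 bytes -> one partial sum per 64-bit lane
    const __m128i s = _mm_add_epi64(_mm_sad_epu8(lo, zero), _mm_sad_epu8(hi, zero));
    // add the upper 64-bit lane to the lower one, then read the low 32 bits
    const __m128i t = _mm_add_epi64(s, _mm_unpackhi_epi64(s, s));
    return static_cast<uint32_t>(_mm_cvtsi128_si32(t));
#else
    uint32_t sum = 0;
    for (int i = 0; i < 32; i++) sum += in[i];
    return sum;
#endif
}
```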

Is mipp::vector using SIMD?

I wonder whether mipp::vector uses SIMD, or whether it's just a handy tool to allocate aligned vectors (so that we can load data into mipp::Reg conveniently)?

Question: Is there any version plan for this library

Hi,
This library is great for writing portable code that targets ARM and Intel CPUs.
But I didn't find any release version or tag for this library.
So, is there any plan to publish releases of this library?

Segmentation fault when calling mipp::load<float>()

I get a segmentation fault when I call mipp::load<float>(ptr) with optimizations turned off (my GCC flags are -O0 -march=skylake). As far as I can tell, it happens because the body of the function is not inlined. It can be fixed by adding an attribute to all such functions (__attribute__((always_inline)) for GCC and Clang). I'm not sure about MSVC, though. Is this a known issue, or am I missing something? I wouldn't like to completely disable intrinsics for my debug builds. Is there a chance you will add this attribute?
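The proposed attribute can be wrapped in a portable macro; MSVC's closest equivalent is __forceinline, although MSVC may still skip inlining when it is disabled by /Ob0, the usual Debug setting. A minimal sketch with a hypothetical macro name (MIPP may use its own macro, or none):

```cpp
#include <cassert>

// Request inlining even in unoptimized builds where the compiler supports it.
#if defined(_MSC_VER)
#  define SKETCH_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#  define SKETCH_FORCE_INLINE inline
#endif

// Example use on a small wrapper; GCC and Clang inline it even at -O0.
SKETCH_FORCE_INLINE int add_one(int x) { return x + 1; }
```

Forced inlining alone would not fix a crash with a different cause, such as an aligned load from an insufficiently aligned address, so it is worth ruling that out as well.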

testz() not working properly on AVX512

The result of testz(m_not) is incorrect. It should be 1 instead of 0.
The result is correct on AVX2.

Code:

#include <iostream>
#include "mipp.h"

int main()
{
  std::cout << mipp::InstructionFullType << std::endl;
  using T = int16_t;
  constexpr auto N = mipp::N<T>();
  std::cout << "N = " << N << std::endl;
  mipp::Msk<N> m = true;
  std::cout << "m = " << m << "\n";
  std::cout << "testz(m) = " << mipp::testz(m) << "\n";
  mipp::Msk<N> m_not = ~m; // & true;
  std::cout << "m_not = " << m_not << "\n";
  std::cout << "testz(m_not) = " << mipp::testz(m_not) << "\n";
  return 0;
}

Result:

AVX512
N = 32
m = [ 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1]
testz(m) = 0
m_not = [ 0, 0, 0, 0, 0, 0, 0, 0 | 0, 0, 0, 0, 0, 0, 0, 0 | 0, 0, 0, 0, 0, 0, 0, 0 | 0, 0, 0, 0, 0, 0, 0, 0]
testz(m_not) = 0

overflow warnings with GCC

I'm getting the following warning (or in this case errors as I have warnings as errors enabled) when trying to compile MIPP on an ARM64 platform:

[build] /home/pi/convolution-thing/source/../externals/MIPP/src/mipp_impl_NEON.hxx:2181:38: error: overflow in conversion from ‘int’ to ‘int16_t’ {aka ‘short int’} changes value from ‘32768’ to ‘-32768’ [-Werror=overflow]

It looks like the AVX version adds the following, so it knows the value will overflow, but it only hides the warning for Visual Studio:

#ifdef _MSC_VER
#pragma warning( disable : 4309 )
#endif

What is the code actually trying to do? If it knows the value will overflow, why not use -0x8000 directly instead of 0x8000, which overflows to -0x8000 (and likewise for the other integer types)?
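If the intent is a constant with only the sign bit set, that is, the most negative value of each type, it can be written without any out-of-range conversion through std::numeric_limits<T>::min(). This sketches the idea only; it makes no claim about what the code at line 2181 computes, and sign_bit is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The value with only the sign bit set, for a signed integer type T,
// obtained without a narrowing conversion (so no -Woverflow warning).
template <typename T>
constexpr T sign_bit()
{
    static_assert(std::numeric_limits<T>::is_signed && std::numeric_limits<T>::is_integer,
                  "T must be a signed integer type");
    return std::numeric_limits<T>::min();
}
```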

Rust support?

Rust is a new-generation systems programming language.
It has far better solutions than C++.
There is no performance penalty in Rust.

Everybody is migrating to Rust.
Do you have any plans to publish this great library as a Rust Cargo package?

More simple comparison

I'm currently evaluating multiple SIMD libs.
While looking at the code, I noticed that the less-than comparisons are implemented in a somewhat complicated way, in my opinion.

MIPP/src/mipp_impl_AVX.hxx

Line 1993 in 2a26291

return notb<N<int32_t>()>(cmpge<int32_t>(v1, v2));

Wouldn't a simple greater-than with the two arguments swapped be enough, instead of computing NOT(greater-or-equal)?
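For integer lanes the suggestion holds: a < b is exactly b > a, so one swapped greater-than compare replaces the greater-or-equal compare plus the bitwise NOT. The sketch uses SSE2's _mm_cmpgt_epi32 (which, like AVX2's _mm256_cmpgt_epi32, sets all bits of a lane where the comparison is true) with a scalar fallback; cmplt_i32x4 is a hypothetical name. For floating-point lanes the two forms are not interchangeable when NaNs occur, since !(a >= b) is true for a NaN operand while b > a is false.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Lane-wise a < b for four int32_t lanes, written as b > a (one compare, no NOT).
// Each output lane is -1 (all bits set) where a < b, and 0 otherwise.
void cmplt_i32x4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
#if defined(SKETCH_HAVE_SSE2)
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_cmpgt_epi32(vb, va)); // arguments swapped
#else
    for (int i = 0; i < 4; i++) out[i] = (b[i] > a[i]) ? -1 : 0;
#endif
}
```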

sizeof(Reg_2<xxx>) returns 64 instead of 32 with AVX512

Hello, the following code compiled for AVX512:

  std::cout << sizeof(mipp::Reg<int32_t>) << std::endl;
  std::cout << sizeof(mipp::Reg_2<int32_t>) <<  std::endl;

  std::cout << sizeof(mipp::reg) << std::endl;
  std::cout << sizeof(mipp::reg_2) << std::endl;

gives the following result:
64
64
64
32

while the expected result is:
64
32
64
32

When I look at Reg_2 I see:

virtual ~Reg_2() = default;

There is no reason to have a virtual destructor here (Reg does not have one).
Removing "virtual" solves the problem.

A+
Thomas
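The extra 32 bytes are consistent with a vtable pointer: a virtual destructor adds a hidden pointer to every object, and because the register member must stay 32-byte aligned, the object grows from 32 to 64 bytes. The sketch reproduces this with hypothetical stand-in types. Exact sizes are ABI-dependent; 64 is what typical x86-64 ABIs give for WrapVirtual.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for a 256-bit register: 32 bytes, 32-byte aligned.
struct alignas(32) Payload { float lanes[8]; };

struct WrapPlain                      // no virtual functions
{
    Payload r;
    ~WrapPlain() = default;           // defaulted, non-virtual: stays trivial
};

struct WrapVirtual                    // same data plus a virtual destructor
{
    Payload r;
    virtual ~WrapVirtual() = default; // adds a hidden vtable pointer
};
```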

Exception: (AVX2) is undefined (type uint8_t)

Any clue?

terminate called after throwing an instance of 'std::runtime_error'
  what():  mipp::set1<uint8_t> (AVX2) is undefined!

Using fmadd or fmsub in cmul

Hi,

have you thought about using fmadd or fmsub inside the cmul?

MIPP/src/mipp.h

Lines 882 to 883 in 397ff5b

 auto v3_re = mipp::sub<T>(mipp::mul<T>(v1.val[0], v2.val[0]), mipp::mul<T>(v1.val[1], v2.val[1])); 

 auto v3_im = mipp::add<T>(mipp::mul<T>(v1.val[0], v2.val[1]), mipp::mul<T>(v1.val[1], v2.val[0]));

auto v3_re = mipp::fmsub<T>(v1.val[0], v2.val[0], mipp::mul<T>(v1.val[1], v2.val[1]));
auto v3_im = mipp::fmadd<T>(v1.val[0], v2.val[1], mipp::mul<T>(v1.val[1], v2.val[0]));
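The same rewrite can be checked on scalars with std::fma, which computes x*y + z with a single rounding: the real part becomes fma(a_re, b_re, -(a_im*b_im)) and the imaginary part fma(a_re, b_im, a_im*b_re). A sketch with a hypothetical cmul_fma; since the fused form rounds once instead of twice, its results can differ slightly from the separate mul/sub version, which matters for bit-exact tests.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// (a_re + i*a_im) * (b_re + i*b_im) using fused multiply-add, mirroring
// v3_re = fmsub(a_re, b_re, a_im*b_im) and v3_im = fmadd(a_re, b_im, a_im*b_re).
std::complex<float> cmul_fma(std::complex<float> a, std::complex<float> b)
{
    const float re = std::fma(a.real(), b.real(), -(a.imag() * b.imag())); // a_re*b_re - a_im*b_im
    const float im = std::fma(a.real(), b.imag(), a.imag() * b.real());    // a_re*b_im + a_im*b_re
    return std::complex<float>(re, im);
}
```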

function with constexpr mipp::Reg<float> return is refused

When trying to create a constexpr function returning mipp::Reg<float>, the compiler refuses with the following error message:

/home/ayguen/.local/include/mipp/mipp_object.hxx:23:7: note: ‘mipp::Reg’ is not literal because:
23 | class Reg
| ^~~
/home/ayguen/.local/include/mipp/mipp_object.hxx:23:7: note: ‘mipp::Reg’ has a non-trivial destructor

I haven't dived into the sources yet.
Why is there any need for a non-trivial destructor?
Is there a simple way to deactivate it?
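From C++11 through C++17, a class can only be returned from a constexpr function if it is a literal type, which among other things requires a trivial destructor. A user-provided or virtual destructor is non-trivial, while ~T() = default; on a class whose members are trivially destructible stays trivial. A sketch with hypothetical types (Vec4 is not a MIPP class):

```cpp
#include <cassert>
#include <type_traits>

struct Vec4                          // literal type: constexpr constructor, trivial destructor
{
    float v[4];
    constexpr Vec4(float a, float b, float c, float d) : v{a, b, c, d} {}
    ~Vec4() = default;               // defaulted, so still trivial
};

constexpr Vec4 splat(float x) { return Vec4(x, x, x, x); }

struct Vec4Dtor                      // user-provided destructor: not a literal type before C++20
{
    float v[4];
    ~Vec4Dtor() {}
};
```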

Question: mipp::set<1> (NO_INTRINSICS) is undefined

In MIPP\tests\src\bitwise_operations\xorb.cpp, I was trying to test this function:


template <typename T>
void test_msk_xorb()
{
	constexpr int N = mipp::N<T>();
	bool inputs1[N], inputs2[N];
	std::mt19937 g;
	std::uniform_int_distribution<uint16_t> dis(0, 1);

	for (auto t = 0; t < 100; t++)
	{
		for (auto i = 0; i < N; i++)
		{
			inputs1[i] = dis(g) ? true : false;
			inputs2[i] = dis(g) ? true : false;
		}

		std::shuffle(inputs1, inputs1 + mipp::N<T>(), g);
		std::shuffle(inputs2, inputs2 + mipp::N<T>(), g);

		mipp::msk m1 = mipp::set<N>(inputs1);
		mipp::msk m2 = mipp::set<N>(inputs2);
		mipp::msk m3 = mipp::xorb<N>(m1, m2);

		mipp::reg r = mipp::toreg<N>(m3);

		for (auto i = 0; i < N; i++)
		{
			bool res = inputs1[i] ^ inputs2[i];

			if (res)
				REQUIRE(mipp::get<T>(r, i) != (T)0);
			else
				REQUIRE(mipp::get<T>(r, i) == (T)res);
		}
	}
}

So I added a few lines to the test case:

TEST_CASE("Binary xor - mipp::Msk", "[mipp::xorb]")
{
#if defined(MIPP_64BIT)
	SECTION("datatype = int64_t") { test_Msk_xorb<int64_t>(); }
#endif
	SECTION("datatype = int32_t") { test_Msk_xorb<int32_t>(); }
#if defined(MIPP_BW)
	SECTION("datatype = int16_t") { test_Msk_xorb<int16_t>(); }
	SECTION("datatype = int8_t") { test_Msk_xorb<int8_t>(); }
#endif

#if defined(MIPP_64BIT)
	SECTION("datatype = int64_t") { test_msk_xorb<int64_t>(); }
#endif
	SECTION("datatype = int32_t") { test_msk_xorb<int32_t>(); }
#if defined(MIPP_BW)
	SECTION("datatype = int16_t") { test_msk_xorb<int16_t>(); }
	SECTION("datatype = int8_t") { test_msk_xorb<int8_t>(); }
#endif
}

However, I got the following error at runtime:

MIPP tests
----------

Instr. type:       NO
Instr. full type:  NO_INTRINSICS
Instr. version:    1
Instr. size:       0 bits
Instr. lanes:      1
64-bit support:    yes
Byte/word support: yes


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
run_tests.exe is a Catch v2.2.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
Binary xor - mipp::Msk
  datatype = int64_t
-------------------------------------------------------------------------------
E:\vLIBS\MIPP\tests\src\bitwise_operations\xorb.cpp(174)
...............................................................................

E:\vLIBS\MIPP\tests\src\bitwise_operations\xorb.cpp(174): FAILED:
due to unexpected exception with message:
  mipp::set<1> (NO_INTRINSICS) is undefined!, try to add -mfpu=neon-vfpv4, -
  msse4.2, -mavx, -march=native... at the compile time.

-------------------------------------------------------------------------------
Binary xor - mipp::Msk
  datatype = int32_t
-------------------------------------------------------------------------------
E:\vLIBS\MIPP\tests\src\bitwise_operations\xorb.cpp(176)
...............................................................................

E:\vLIBS\MIPP\tests\src\bitwise_operations\xorb.cpp(176): FAILED:
due to unexpected exception with message:
  mipp::set<1> (NO_INTRINSICS) is undefined!, try to add -mfpu=neon-vfpv4, -
  msse4.2, -mavx, -march=native... at the compile time.

-------------------------------------------------------------------------------
Binary xor - mipp::Msk
  datatype = int16_t
-------------------------------------------------------------------------------
E:\vLIBS\MIPP\tests\src\bitwise_operations\xorb.cpp(178)
...............................................................................

E:\vLIBS\MIPP\tests\src\bitwise_operations\xorb.cpp(178): FAILED:
due to unexpected exception with message:
  mipp::set<1> (NO_INTRINSICS) is undefined!, try to add -mfpu=neon-vfpv4, -
  msse4.2, -mavx, -march=native... at the compile time.

-------------------------------------------------------------------------------
Binary xor - mipp::Msk
  datatype = int8_t
-------------------------------------------------------------------------------
E:\vLIBS\MIPP\tests\src\bitwise_operations\xorb.cpp(179)
...............................................................................

E:\vLIBS\MIPP\tests\src\bitwise_operations\xorb.cpp(179): FAILED:
due to unexpected exception with message:
  mipp::set<1> (NO_INTRINSICS) is undefined!, try to add -mfpu=neon-vfpv4, -
  msse4.2, -mavx, -march=native... at the compile time.

===============================================================================
test cases:   123 |   122 passed | 1 failed
assertions: 10353 | 10349 passed | 4 failed

It seems mipp::set<1> is not defined, but in mipp.h I can clearly see that the set function is defined as:

template <typename T> inline reg   set          (const T[nElReg<T>()])            { errorMessage<T>("set");           exit(-1); }
#ifdef _MSC_VER
template <int      N> inline msk   set          (const bool[])                    { errorMessage<N>("set");           exit(-1); }
#else
template <int      N> inline msk   set          (const bool[N])                   { errorMessage<N>("set");           exit(-1); }
#endif

I added the flag "-mfpu=neon-vfpv4" to the MSVC 2019 compiler options, but it doesn't seem to help at all.
Could anybody explain this?
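The log above reports "Instr. full type: NO_INTRINSICS", i.e. the tests were built without any SIMD extension detected. -mfpu=neon-vfpv4 is a GCC/Clang option for 32-bit ARM targets, so it cannot help an x86 build with MSVC; there, SIMD code generation is selected with /arch:AVX, /arch:AVX2 or /arch:AVX512, which define __AVX__, __AVX2__ and __AVX512F__. A small probe of the predefined macros shows what the compiler actually enabled. compiled_simd_level is a hypothetical helper, and whether MIPP's own detection keys on exactly these macros should be checked in mipp.h.

```cpp
#include <cassert>
#include <cstring>

// Widest SIMD extension the compiler was told to target, judged only from
// predefined macros (x86 and ARM cases only).
const char* compiled_simd_level()
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "NEON";
#else
    return "none detected";
#endif
}
```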

__PRETTY_FUNCTION__ macro is not defined by MSVC

According to this page, the alternatives are __func__, __FUNCSIG__ or __FUNCTION__.

So far, as a workaround, and even though defining a macro with reserved leading and trailing __ is not best C++ practice,
I've added something like this before #include <mipp.h>:

#if defined(_MSC_VER)
#define __PRETTY_FUNCTION__ __FUNCSIG__
#endif // _MSC_VER

Thanks for this fantastic library.
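An alternative that avoids defining a reserved identifier is a project-local macro that picks the best available spelling per compiler. SKETCH_FUNC_SIG and who_am_i are hypothetical names; __func__ is the standard C++11 fallback but only yields the bare function name.

```cpp
#include <cassert>
#include <cstring>

// Best available "pretty" function signature for the current compiler.
#if defined(_MSC_VER)
#  define SKETCH_FUNC_SIG __FUNCSIG__
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_FUNC_SIG __PRETTY_FUNCTION__
#else
#  define SKETCH_FUNC_SIG __func__   // standard, but only the bare function name
#endif

// Returns the signature string of this very function (static storage duration).
template <typename T>
const char* who_am_i() { return SKETCH_FUNC_SIG; }
```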

performance in debug mode with Visual Studio x64 avx2

Hi, MIPP Team

Thanks for providing this very easy-to-get-started-with SIMD library.

I would like to calculate the dot-product result of two float32 array:

float dotproduct(size_t len, float* va, float* vb)

I implemented this function using the SIMD wrapper functions of 3 libraries:

With /arch:AVX2 enabled, I tested the performance of these implementations in both Release and Debug mode. Specifically, I used len=200000000 and got:

Release Mode:

impl1(Naive), result is -5290.376953, time cost is 221.296500 ms
impl2(OpenCV), result is -5290.970215, time cost is 66.737100 ms
impl4(MIPP), result is -5290.970703, time cost is 65.687000 ms
impl5(Eigen), result is -5290.747070, time cost is 74.621900 ms

Debug Mode:

impl1(Naive), result is -5290.376953, time cost is 498.087700 ms
impl2(OpenCV), result is -5290.986328, time cost is 1607.252700 ms
impl4(MIPP), result is -5290.986816, time cost is 3864.709700 ms
impl5(Eigen), result is -5290.748047, time cost is 2963.119900 ms

i.e., in Debug mode, MIPP takes 3864 ms, while OpenCV and Eigen take less time (still with /arch:AVX2 enabled).

I'm not familiar with the implementation of these SIMD wrappers, nor do I have much experience with SIMD programming. I'm just wondering whether there is some configuration I'm missing to speed up MIPP in Debug mode. Or is this a known issue that will be resolved in the future?

The whole implementation and comparison can be found here.

Request: Compile Time failure instead of Runtime failure

To make it easier to catch misconfigurations or missing support for certain intrinsics, I would like to suggest the following change:

use static_assert instead of throwing std::runtime_error when unsupported intrinsics are used

rough example:
https://godbolt.org/z/W6GPdGEhc

possible downside: it is not possible to provide a custom error message as static_assert requires a string literal
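The usual way to make a primary template fail only when it is actually used is a static_assert on a type-dependent false value; a plain static_assert(false, ...) in the template body may fire even if the template is never instantiated. A sketch with hypothetical names (dependent_false, splat1), not MIPP's API:

```cpp
#include <cassert>
#include <type_traits>

// Always false, but only known once T is substituted, so the assertion
// below fires at instantiation time rather than when the template is parsed.
template <typename T>
struct dependent_false : std::false_type {};

// Primary template: any type without a specialization is a compile-time error.
template <typename T>
T splat1(T)
{
    static_assert(dependent_false<T>::value, "splat1<T> is not implemented for this type");
    return T();
}

// Supported types get explicit specializations.
template <> inline float splat1<float>(float x) { return x; }
template <> inline int   splat1<int>(int x)     { return x; }
```

With this layout, a call such as splat1(1.0) (T = double) stops the build with the message above instead of throwing at run time.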

the function "mipp::N<float>()"

Why does the function "mipp::N<float>()" just return 1?
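The lane count follows from the register width: N = register bits / (8 × sizeof(T)), e.g. 256 / 32 = 8 floats for AVX. A value of 1 matches a build without SIMD support, as in the NO_INTRINSICS test log reported above ("Instr. size: 0 bits", "Instr. lanes: 1"). A sketch of the arithmetic; lanes is a hypothetical helper, and modeling the scalar case as 0 register bits is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Number of T elements in a SIMD register of reg_bits bits; a scalar
// (non-SIMD) build is modeled as reg_bits == 0 and yields one lane.
template <typename T>
constexpr std::size_t lanes(std::size_t reg_bits)
{
    return reg_bits == 0 ? 1 : reg_bits / (8 * sizeof(T));
}
```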

value_type in Reg<T> and Regx2<T>

Please add a public type declaration, using value_type = T;, to the Reg and Regx2 classes.
In some cases (especially when writing generic functions), there is a need to get the register's element type.
Thanks!
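A sketch of the requested pattern on a hypothetical wrapper (RegLike is not MIPP's Reg): exposing using value_type = T; lets generic code recover the element type from the register type alone, the same convention the standard containers follow.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical fixed-size register-like wrapper exposing its element type.
template <typename T, std::size_t N = 4>
struct RegLike
{
    using value_type = T;                  // the requested public alias
    static constexpr std::size_t size = N;
    T lanes[N];
};

// Generic code can name the element type without an extra template parameter.
template <typename R>
typename R::value_type sum_lanes(const R& r)
{
    typename R::value_type s = 0;
    for (std::size_t i = 0; i < R::size; i++) s += r.lanes[i];
    return s;
}
```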

Why is fmadd (__ARM_FEATURE_FMA) not enabled by default?

For ARM:
the __ARM_FEATURE_FMA macro is not defined.
For SSE:
fmadd could be implemented as:
_mm_fmadd_ps

Is there any reason?
Thanks

Lzcnt/popcnt functions?

Hi,

Unless I haven't looked in the right places, there doesn't seem to be any support for lane-wise lzcnt or popcount. Is it planned for the future, or out of scope for this library?

Thanks in advance
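Until such operations exist, a lane-wise population count can be built from shifts, ANDs, adds and one multiply, all ordinary SIMD operations, using the classic SWAR (SIMD-within-a-register) bit-counting sequence. The scalar sketch for one 32-bit lane (popcount32_swar is a hypothetical name) would be applied identically to every lane; lzcnt is not covered here.

```cpp
#include <cassert>
#include <cstdint>

// Population count of a 32-bit value using only shifts, masks, adds and a multiply.
uint32_t popcount32_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 // 2-bit fields hold counts 0..2
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // 4-bit fields hold counts 0..4
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // 8-bit fields hold counts 0..8
    return (x * 0x01010101u) >> 24;                   // sum of the four bytes lands in the top byte
}
```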

mipp::cvt<int16_t,int32_t> failure

When building and running on an AVX512 system (with the -march=native compile flag), I get the following error with the cvt function:

terminate called after throwing an instance of 'std::runtime_error'
what(): mipp::cvt<int16_t,int32_t> (AVX512) is undefined!
Aborted
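For reference, this conversion is a per-lane sign extension. AVX512F exposes it directly as _mm512_cvtepi16_epi32 (16 × int16 in a __m256i to 16 × int32 in a __m512i), and SSE4.1 as _mm_cvtepi16_epi32. With plain SSE2 it can be emulated by duplicating each 16-bit value into both halves of a 32-bit lane and shifting right arithmetically by 16. The sketch uses that SSE2 form with a scalar fallback; cvt_i16_to_i32x4 is a hypothetical name, not MIPP's implementation.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Sign-extend four int16_t values to int32_t.
void cvt_i16_to_i32x4(const int16_t in[4], int32_t out[4])
{
#if defined(SKETCH_HAVE_SSE2)
    const __m128i v = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(in)); // 4 x int16 in the low 64 bits
    const __m128i d = _mm_unpacklo_epi16(v, v); // each 32-bit lane holds x in both 16-bit halves
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_srai_epi32(d, 16)); // arithmetic shift keeps the sign
#else
    for (int i = 0; i < 4; i++) out[i] = static_cast<int32_t>(in[i]);
#endif
}
```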

C2719 error when using the VS2015 (v140) platform toolset

I tried to compile my project in Visual Studio 2017 with the VS2015 (v140) platform toolset and got many C2719 errors. With the VS2017 (v141) toolset, everything is fine.

	auto v3_re = mipp::sub<T>(mipp::mul<T>(v1.val[0], v2.val[0]), mipp::mul<T>(v1.val[1], v2.val[1]));
	auto v3_im = mipp::add<T>(mipp::mul<T>(v1.val[0], v2.val[1]), mipp::mul<T>(v1.val[1], v2.val[0]));