p12tic / libsimdpp
Portable header-only C++ low level SIMD library
License: Boost Software License 1.0
Is Visual Studio supported?
Compiling a simple example I get hundreds of errors of the form:
1>c:\users\jchown\work\simdpp\simd\int8x16.h(122): error C2719: 'd': formal parameter with __declspec(align('16')) won't be aligned
I was looking into using this, but the lack of activity is unsettling. I'd like to know that if I find bugs they will get resolved, and that the library will keep improving with support for new instruction sets.
float32x4 foo(float32x4 a, int32x4 b)
{
return a+b;
}
results in a single addps instruction for SSE2, as if b were a float32.
float32x4 foo(float32x4 a, int32x4 b)
{
return add(a,b);
}
results in a compilation error, which is resolved by explicitly converting b with to_float32(). Presumably the use of + should fail in the same way as the use of add() does (or, even better, it could automatically convert between float and int, just as scalar operations do).
I am trying to multiply two float32<8> numbers with SIMDPP_ARCH_X86_AVX setting.
The code is something like:
float32<8> bigi = load(i);
float32<8> bigm = load(modifiers);
bigi = mul(bigi, bigm);
It works ok, but when I try to trace the code step-by-step I see that after multiplication the code goes to following piece of code:
template<class R, class T> SIMDPP_INL
R cast_memcpy(const T& t)
{
static_assert(sizeof(R) == sizeof(T), "Size mismatch");
R r;
::memcpy(&r, &t, sizeof(R));
return r;
}
I don't understand why we need to do memcpy after each operation. It's a big performance gap.
This syntax doesn't seem to be supported now, but I understand the api is in a transition period.
float32<4> a = make_float(1.0f);
a = add(a, 2.0f);
(OSX 10.10, Apple Clang 7.0.2, CMake 3.5.0)
This is probably a bug in CMake (still looking into it) but I found it while working with libsimdpp, so thought other users might find this helpful.
CMakeLists.txt:
[...]
simdpp_get_runnable_archs(RUNNABLE_ARCHS)
simdpp_multiarch(GEN_ARCH_FILES src/code.cpp ${RUNNABLE_ARCHS})
add_executable(simd-test src/main.cpp ${GEN_ARCH_FILES})
target_include_directories(simd-test PRIVATE ${CMAKE_SOURCE_DIR}/include/)
The simdpp_multiarch() CMake function (from SimdppMultiarch.cmake) will use configure_file() to copy ${CMAKE_SOURCE_DIR}/src/code.cpp into the build dir (e.g. ${CMAKE_BINARY_DIR}/src/code_simdpp_-x86_avx.cpp etc). It will also manually add the include dir pointing back to the original location:
SimdppMultiarch.cmake line 434:
set(CXX_FLAGS "-I\"${CMAKE_CURRENT_SOURCE_DIR}/${SRC_PATH}\" ${CXX_FLAGS}")
This ensures that local includes, such as #include "common.h" in code.cpp, will still work at compile time.
The problem is that when CMake generates the file dependencies, it seems to ignore the file-specific include search path set on the generated files. This means that ${CMAKE_BINARY_DIR}/CMakeFiles/simd-test.dir/depend.make will not include src/common.h and when you change common.h without changing code.cpp, none of the generated files are recompiled! This results in linking with stale object files (which include the old version of common.h) and programs that could crash or be incorrect in subtle ways.
Add the local directory of code.cpp to the target include dir with a command like this:
target_include_directories(simd-test PRIVATE ${PROJECT_SOURCE_DIR}/src)
(You may need to adjust the path for your project, or add multiple such lines if you run simdpp_multiarch() on files from multiple directories.)
There may be a way to update simdpp_multiarch() to handle this automatically but a simple solution eludes me at the moment.
Tested with xcode 9.0, SSE4_1 target
bool foo1a(uint32x4 a, uint32x4 b)
{
return test_bits_any(bit_and(a,b));
}
.../submodule/libsimdpp/simdpp/core/test_bits.h:28:70: No matching constructor for initialization of 'typename detail::get_expr_nosign<uint32<4, expr_bit_and<uint32<4, uint32<4, void> >, uint32<4, uint32<4, void> > > >, typename uint32<4, expr_bit_and<uint32<4, uint32<4, void> >, uint32<4, uint32<4, void> > > >::expr_type>::type' (aka 'uint32<16U / 4, simdpp::arch_sse4p1::expr_bit_and<simdpp::arch_sse4p1::uint32<4, simdpp::arch_sse4p1::uint32<4, void> >, simdpp::arch_sse4p1::uint32<4, simdpp::arch_sse4p1::uint32<4, void> > > >')
bool foo1c(uint32x4 a, uint32x4 b)
{
return test_bits_any(bit_and(a,b).eval());
}
This compiles successfully, but the generated code uses an unnecessary pand instruction, presumably because of the use of eval():
Inspiration::foo1c(simdpp::arch_sse4p1::uint32<4u, void>, simdpp::arch_sse4p1::uint32<4u, void>):
0000000000000230 pushq %rbp
0000000000000231 movq %rsp, %rbp
0000000000000234 pand %xmm1, %xmm0
0000000000000238 ptest %xmm0, %xmm0
000000000000023d setne %al
0000000000000240 popq %rbp
0000000000000241 retq
(pand could have been skipped in favor of the AND operation done by ptest)
I noticed this function doesn't appear in the docs. Is it not meant to be a public interface?
I tried to load a vector of double into float64<2,void>, but it resulted in a segmentation fault.
OS: Windows 10 64bit
platform: Visual Studio 2015 with 32 bits debug mode
#define SIMDPP_ARCH_X86_AVX2
//float vec[2]; // this works
double vec[2]; // this results in seg. error.
vec[0] = 0;
vec[1] = 1;
float64<2, void> vec64_4 = load(&vec[0]);
Using this on SSE4.1 and then switching to AVX512F broke this call. A potential overload:
SIMDPP_INL uint32_t extract_bits_any(const uint8x32& ca)
{
#if SIMDPP_USE_NULL
uint8<32> a = ca;
uint32_t r = 0;
for (unsigned i = 0; i < a.length; i++) {
uint8_t x = a.el(i);
x = x & 1;
r = (r >> 1) | (uint32_t(x) << 31);
}
return r;
#elif SIMDPP_USE_AVX2
// AVX2 must be checked before SSE2, since AVX2 implies SSE2
uint8<32> a = ca;
return _mm256_movemask_epi8(a);
#elif SIMDPP_USE_SSE2
uint8<16> A, B;
split(ca, A, B);
return (extract_bits_any(A) << 16) + extract_bits_any(B);
#endif
}
hi, this snippet of code could not be compiled in VS2017, because out_vec was mistakenly deduced as uint32x8:
void prelu_simdpp(const T* const in_data, const int len, const float coeff, T* const out_data)
{
const int len_aligned = len & (-8);
for (int i = 0; i < len_aligned; i += 8)
{
auto in_vec = simdpp::load_u<simdpp::int32x8>(in_data + i);
auto mask_vec = simdpp::cmp_gt(in_vec, 0);
auto out_vec = simdpp::blend(in_vec, in_vec * coeff, mask_vec);
auto out_vec2 = to_float32(out_vec);
}
}
When trying to build documentation, wget... http://doc.radix.lt/libsimdpp/ fails because that directory doesn't exist.
Hi Povilas,
this is just a minor issue. The load() function seems to work only with vectors of unsigned ints and floats. Signed ints do not seem to be supported:
int vi[16];
int32x4 i;
i = load(&vi[0]); // OK
i = load<uint32x4>(&vi[0]); // OK
i = load<int32x4>(&vi[0]); // compilation FAILS
i = 10 + load<int32x4>(&vi[0]) * 2; // compilation FAILS - real use case
...the last line shows why I would like to use the explicit template argument for the load function. Although it works with uint32x4, it would be nice to be able to use the same type as the target "i" variable. But as I said, this is a very low priority thing :o)
Cheers,
Michal
Hi Povilas,
I have few issues with comparison functions:
int32x4 i = make_int(13);
int32x4 j = make_int(10);
mask_int32x4 m;
m = cmp_le(i, j); // compilation FAILS
m = cmp_le(10, i); // compilation FAILS
m = cmp_le(i, 10); // compilation FAILS
float32x4 i = make_float(10);
float32x4 j = make_float(13);
mask_float32x4 m;
m = cmp_le(i, j); // OK
m = cmp_le(10, j); // OK
m = cmp_le(j, 10); // OK
m = cmp_le(make_float(13.0f), make_float(10.0f)); // compilation FAILS
In simdpp/core/cmp_le.h it seems that cmp_le takes only float arguments. What is then the preferred way to compare integers? This issue is probably present in all comparison functions.
Thanks for any hints,
Michal
P.S.: the division operator is missing in simdpp/simd.h. I just added
#include <simdpp/operators/f_div.h>
in my local copy to be able to use it.
When compiling with ICC 16.0, some bad overloads are selected for assignment (not sure whether it's a compiler bug). This test code:
#define SIMDPP_ARCH_X86_SSE2
#include <simdpp/simd.h>
#include <cstdio>
int main()
{
using namespace simdpp;
uint32x4 v1 = make_ones<uint32x4>();
v1 = v1 << 3;
std::printf("%u\n", reduce_add(v1));
return 0;
}
compiles to
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\insn\i_shift_l.h
00FE1451 pcmpeqd xmm1,xmm1
00FE1455 pslld xmm1,3
--- C:\Users\Mak\Documents\bitpacker\test\simdpp_test1.cpp ---------------------
00FE145A or dword ptr [esp+80h],8000h
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\int32x4.h ------
00FE1465 movdqa xmmword ptr [esp+90h],xmm1
--- C:\Users\Mak\Documents\bitpacker\test\simdpp_test1.cpp ---------------------
00FE146E ldmxcsr dword ptr [esp+80h]
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\cast.inl ------
00FE1476 movaps xmm2,xmmword ptr [esp+90h]
00FE147E movaps xmmword ptr [esp+0A0h],xmm2
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\int32x4.h ------
00FE1486 movdqa xmm0,xmmword ptr [esp+0A0h]
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\insn\i_reduce_add.h
00FE148F movdqa xmmword ptr [esp+90h],xmm0
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\int8x16.h ------
00FE1498 movdqa xmmword ptr [esp+0B0h],xmm0
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\cast.inl ------
00FE14A1 movaps xmm1,xmmword ptr [esp+0B0h]
00FE14A9 movaps xmmword ptr [esp+80h],xmm1
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\insn\move_l.h -
00FE14B1 movdqa xmm0,xmmword ptr [esp+80h]
00FE14BA psrldq xmm0,8
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\int32x4.h ------
00FE14BF movdqa xmmword ptr [esp+0A0h],xmm0
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\cast.inl ------
00FE14C8 movaps xmm1,xmmword ptr [esp+0A0h]
00FE14D0 movaps xmmword ptr [esp+0B0h],xmm1
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\empty_expr.h ---
00FE14D8 movaps xmm0,xmmword ptr [esp+0B0h]
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\insn\i_reduce_add.h
00FE14E0 movdqa xmm1,xmmword ptr [esp+90h]
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\expr\i_add.h --
00FE14E9 paddd xmm1,xmm0
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\int32x4.h ------
00FE14ED movdqa xmmword ptr [esp+80h],xmm1
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\cast.inl ------
00FE14F6 movaps xmm2,xmmword ptr [esp+80h]
00FE14FE movaps xmmword ptr [esp+0A0h],xmm2
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\int32x4.h ------
00FE1506 movdqa xmm0,xmmword ptr [esp+0A0h]
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\insn\i_reduce_add.h
00FE150F movdqa xmmword ptr [esp+90h],xmm0
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\int8x16.h ------
00FE1518 movdqa xmmword ptr [esp+0B0h],xmm0
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\cast.inl ------
00FE1521 movaps xmm1,xmmword ptr [esp+0B0h]
00FE1529 movaps xmmword ptr [esp+80h],xmm1
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\insn\move_l.h -
00FE1531 movdqa xmm0,xmmword ptr [esp+80h]
00FE153A psrldq xmm0,4
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\int32x4.h ------
00FE153F movdqa xmmword ptr [esp+0B0h],xmm0
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\cast.inl ------
00FE1548 movaps xmm1,xmmword ptr [esp+0B0h]
00FE1550 movaps xmmword ptr [esp+0A0h],xmm1
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\empty_expr.h ---
00FE1558 movaps xmm0,xmmword ptr [esp+0A0h]
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\insn\i_reduce_add.h
00FE1560 movdqa xmm1,xmmword ptr [esp+90h]
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\expr\i_add.h --
00FE1569 paddd xmm1,xmm0
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\int32x4.h ------
00FE156D movdqa xmmword ptr [esp+80h],xmm1
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\cast.inl ------
00FE1576 movaps xmm2,xmmword ptr [esp+80h]
00FE157E movaps xmmword ptr [esp+0B0h],xmm2
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\types\int32x4.h ------
00FE1586 movdqa xmm0,xmmword ptr [esp+0B0h]
--- C:\Users\Mak\Documents\bitpacker\lib\libsimdpp\simdpp\detail\insn\i_reduce_add.h
00FE158F movdqa xmmword ptr [esp+90h],xmm0
Some old API has been deleted for convenience of libsimdpp development. In many cases it's worth reconsidering the deletion and at least keeping the old API within a SIMDPP_ENABLE_DEPRECATED or similar ifdef block.
To understand the pros and cons of a software library, it can be good to compare it to the
alternatives that exist.
I found for instance this:
https://github.com/VcDevel/Vc
If anyone has compared libsimdpp to other SIMD wrapper template libraries (C++), I would be interested to hear your opinion.
Hi Povilas,
with the current git version, this code does not compile:
float32x4 b = make_float(1.0f);
float32x4 r = add(add(b, b), 2.0f);
with the error message:
error: could not convert ‘simdpp::arch_sse2::add<4u, simdpp::arch_sse2::expr_add<simdpp::arch_sse2::float32<4u>, simdpp::arch_sse2::float32<4u> >, simdpp::arch_sse2::expr_scalar<float> >((* & a), (* & simdpp::arch_sse2::detail::cast_expr<simdpp::arch_sse2::float32<4u, simdpp::arch_sse2::expr_scalar<float> >, float>((* & b))))’ from ‘simdpp::arch_sse2::float32<4u, simdpp::arch_sse2::expr_add<simdpp::arch_sse2::float32<4u, simdpp::arch_sse2::expr_add<simdpp::arch_sse2::float32<4u>, simdpp::arch_sse2::float32<4u> > >, simdpp::arch_sse2::float32<4u, simdpp::arch_sse2::expr_scalar<float> > > >’ to ‘simdpp::arch_sse2::float32<4u, simdpp::arch_sse2::expr_add<simdpp::arch_sse2::float32<4u>, simdpp::arch_sse2::float32<4u, simdpp::arch_sse2::expr_scalar<float> > > >’
template<unsigned N, class V> SIMDPP_INL RET_VEC<N, EXPR<VEC<N>, VEC<N,expr_scalar< float>>>> FUNC(const VEC<N,V>& a, const float& b) { return FUNC(a, detail::cast_expr<VEC<N,expr_scalar< float>>>(b)); } \
^
/home/miso/install/libsimdpp/simdpp/core/f_add.h:43:1: note: in expansion of macro ‘SIMDPP_SCALAR_ARG_IMPL_EXPR’
SIMDPP_SCALAR_ARG_IMPL_EXPR(add, expr_add, float32, float32)
^
The problem seems to be in the scalar argument, because this code compiles correctly:
float32x4 b = make_float(1.0f);
float32x4 r = add(add(b, b), b);
Thanks for any hints,
Miso
I wrote a simple test to familiarize myself with the library.
//#define SIMDPP_ARCH_X86_SSE4_1
#define SIMDPP_ARCH_X86_AVX
#include <simdpp/simd.h>
using namespace simdpp;
int main(int argc, char *argv[]) {
float32<8> test = make_float(1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f);
float32<8> test2 = make_float(2.0f);
float32<8> sum = add(test, test2);
const float *lp = reinterpret_cast<const float *>(&sum);
return lp[0] > lp[1];
}
This compiles with the SSE flag, but fails to compile with AVX. The following error is from clang 3.4
libsimdpp/simdpp/types/float32x8.h:82:26: error: implicit instantiation of undefined template 'simdpp::arch_sse2_sse3_ssse3_sse4p1_avx::uint32<8, void>'
float32<8>(uint32<8> d) { *this = bit_cast<float32<8>>(d); }
libsimdpp/simdpp/types/fwd.h:89:44: note: template is declared here
template<unsigned N, class E = void> class uint32;
Of course I could also use Google or GitHub advanced search to look for examples. When I search for "<simdpp/simd.h>" in GitHub advanced search, I get results.
Maybe we should create a list in the wiki?
https://github.com/p12tic/libsimdpp/wiki
AFAICT, there's no need for me to run cmake. I can just point my include directories at libsimdpp and get going. It seems cmake is only needed for building a distribution, right?
Thanks.
Given that SSE3 enabled implies SSE2 is also enabled, the sequence of #elif sections in the code:
#elif SIMDPP_USE_SSE2
float32x4 sum2 = _mm_movehl_ps(a, a);
float32x4 sum = add(a, sum2);
sum = add(sum, permute2<1,0>(sum));
return _mm_cvtss_f32(sum);
#elif SIMDPP_USE_SSE3
float32x4 b = a;
b = _mm_hadd_ps(b, b);
b = _mm_hadd_ps(b, b);
return _mm_cvtss_f32(b);
causes SSE2 code to be used even when SSE3 is available.
In the standard library, unary type traits that have a boolean value all derive from std::true_type or std::false_type (and generally all unary type traits derive from a specialization of std::integral_constant) instead of defining their own custom value member. This makes them easy to use for, e.g., tag dispatching, like so:
template <typename T>
void f_impl(std::true_type, T whatever) {
// implement for integral types
}
template <typename T>
void f_impl(std::false_type, T whatever) {
// implement for non-integral types
}
template <typename T>
void f(T whatever) { f_impl(std::is_integral<T>(), whatever); }
std::integral_constant also provides a few other common niceties, such as an implicit conversion to the constant's type (i.e. any object of type std::true_type implicitly converts to a bool with value true) and a type member that can be useful in other meta-programming contexts.
simdpp::is_vector and simdpp::is_mask don't follow this convention, but they should, as good C++11 citizens.
I tried to use libsimdpp in VS2015. However, it resulted in errors because ucrt/stdlib.h defines min and max as macros, so any occurrence of min/max in libsimdpp was getting replaced.
Not sure what's the best way to fix that, but currently I added #undef for both of those in null/math.h.
On the FreeBSD 11.1 I am getting these errors:
===> Testing for libsimdpp-2.0
[0/1] cd /usr/ports/devel/libsimdpp/work/libsimdpp-2.0 && /usr/local/bin/ctest --force-new-ctest-process
Test project /usr/ports/devel/libsimdpp/work/libsimdpp-2.0
Start 1: s_test1
Could not find executable test1
Looked in the following places:
test1
test1
Release/test1
Release/test1
Debug/test1
Debug/test1
MinSizeRel/test1
MinSizeRel/test1
RelWithDebInfo/test1
RelWithDebInfo/test1
Deployment/test1
Deployment/test1
Development/test1
Development/test1
Unable to find executable: test1
1/9 Test #1: s_test1 ..........................***Not Run 0.00 sec
Start 2: s_test_dispatcher1
Could not find executable test_dispatcher
Looked in the following places:
test_dispatcher
test_dispatcher
Release/test_dispatcher
Release/test_dispatcher
Debug/test_dispatcher
Debug/test_dispatcher
MinSizeRel/test_dispatcher
MinSizeRel/test_dispatcher
RelWithDebInfo/test_dispatcher
RelWithDebInfo/test_dispatcher
Deployment/test_dispatcher
Deployment/test_dispatcher
Development/test_dispatcher
Development/test_dispatcher
Unable to find executable: test_dispatcher
2/9 Test #2: s_test_dispatcher1 ...............***Not Run 0.00 sec
(The same "Could not find executable" output repeats for s_test_dispatcher2 through s_test_dispatcher7 and s_test_expr1.)
0% tests passed, 9 tests failed out of 9
Total Test time (real) = 0.02 sec
The following tests FAILED:
1 - s_test1 (Not Run)
2 - s_test_dispatcher1 (Not Run)
3 - s_test_dispatcher2 (Not Run)
4 - s_test_dispatcher3 (Not Run)
5 - s_test_dispatcher4 (Not Run)
6 - s_test_dispatcher5 (Not Run)
7 - s_test_dispatcher6 (Not Run)
8 - s_test_dispatcher7 (Not Run)
9 - s_test_expr1 (Not Run)
Errors while running CTest
As mentioned in the Intel® Xeon® Processor Scalable Family Technical Overview, the Purley platform will come with support for AVX512BW. I believe the CPUs will be released in autumn 2017 (or maybe later).
By looking at
http://p12tic.github.io/libsimdpp/v2.0~rc2/libsimdpp/arch/selection.html
it seems AVX512BW is not supported by libsimdpp right now.
AVX512BW will provide 8-bit and 16-bit integer operations that could speed things up.
As a feature request:
It would be nice if libsimdpp could support AVX512BW.
I only see functions where the size is a template parameter in simdpp/core/f_max.h. How do I compute the maximum when the size is arbitrary?
i_reduce_max(float32x4) should use vmaxnmvq_f32() to produce the result in a single instruction when ARMv8 instructions are available.
This code doesn't produce an error on clang, but compiles to dubious-looking disassembly:
simdpp::mask_int32x4 foo1(simdpp::int32x4 a, simdpp::int32x4 b)
{
return a >= b;
}
Inspiration::foo1(simdpp::arch_ssse3::int32<4u, void>, simdpp::arch_ssse3::int32<4u, void>):
0000000000001280 pushq %rbp
0000000000001281 movq %rsp, %rbp
0000000000001284 movdqa 0x134(%rip), %xmm2
000000000000128c pxor %xmm2, %xmm0
0000000000001290 pxor %xmm2, %xmm1
0000000000001294 movdqa %xmm1, %xmm2
0000000000001298 pcmpgtd %xmm0, %xmm2
000000000000129c pshufd $0xa0, %xmm2, %xmm3
00000000000012a1 pcmpeqd %xmm0, %xmm1
00000000000012a5 pshufd $0xf5, %xmm1, %xmm0
00000000000012aa pand %xmm3, %xmm0
00000000000012ae pshufd $0xf5, %xmm2, %xmm1
00000000000012b3 por %xmm0, %xmm1
00000000000012b7 pcmpeqd %xmm0, %xmm0
00000000000012bb pxor %xmm1, %xmm0
00000000000012bf popq %rbp
00000000000012c0 retq
The same code on gcc produces a compilation error.
Hello,
Probably a silly question, but since there is absolutely no install or basic usage documentation...
So I got the git repo, then:
cd examples/dynamic_dispatch
make test
and I get:
In file included from test.cc:4:0:
../../simdpp/dispatch/get_arch_gcc_builtin_cpu_supports.h: In function ‘simdpp::Arch simdpp::get_arch_gcc_builtin_cpu_supports()’:
../../simdpp/dispatch/get_arch_gcc_builtin_cpu_supports.h:24:41: error: Parameter to builtin not valid: avx512f
if (__builtin_cpu_supports("avx512f")) {
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
Hello.
In the function simdpp::detail::get_cpuid I found one small error with big effects.
Original code:
...
#elif _MSC_VER
uint32_t regs[4];
__cpuidex((int*) regs, subleaf, level);
*eax = regs[0];
*ebx = regs[1];
*ecx = regs[2];
*edx = regs[3];
#else
...
But if you look at MSDN (https://msdn.microsoft.com/ru-ru/library/hskdteyh.aspx) you can see that the subleaf and level parameters are passed in the wrong order. If you swap them, everything works as intended.
You can use some simple test:
#include <iostream>
#include <simdpp/simd.h>
#include <simdpp/dispatch/get_arch_raw_cpuid.h>
#define SIMDPP_USER_ARCH_INFO ::simdpp::get_arch_raw_cpuid()
namespace SIMDPP_ARCH_NAMESPACE {
std::string archToString(simdpp::Arch arch)
{
std::string ret = "none";
if ((arch & simdpp::Arch::X86_SSE2) == simdpp::Arch::X86_SSE2)
ret += " sse2";
if ((arch & simdpp::Arch::X86_SSE3) == simdpp::Arch::X86_SSE3)
ret += " sse3";
if ((arch & simdpp::Arch::X86_SSSE3) == simdpp::Arch::X86_SSSE3)
ret += " ssse3";
if ((arch & simdpp::Arch::X86_SSE4_1) == simdpp::Arch::X86_SSE4_1)
ret += " sse4.1";
if ((arch & simdpp::Arch::X86_FMA3) == simdpp::Arch::X86_FMA3)
ret += " fma3";
if ((arch & simdpp::Arch::X86_FMA4) == simdpp::Arch::X86_FMA4)
ret += " fma4";
if ((arch & simdpp::Arch::X86_XOP) == simdpp::Arch::X86_XOP)
ret += " xop";
if ((arch & simdpp::Arch::X86_AVX) == simdpp::Arch::X86_AVX)
ret += " avx";
if ((arch & simdpp::Arch::X86_AVX2) == simdpp::Arch::X86_AVX2)
ret += " avx2";
if ((arch & simdpp::Arch::X86_AVX512F) == simdpp::Arch::X86_AVX512F)
ret += " avx512f";
return ret;
}
void printArch()
{
std::cout << "cpu arch: " << archToString(SIMDPP_USER_ARCH_INFO).c_str();
std::cout << std::endl;
std::cout << "compile arch: " << archToString(simdpp::this_compile_arch()).c_str();
std::cout << std::endl;
}
} // namespace SIMDPP_ARCH_NAMESPACE
SIMDPP_MAKE_DISPATCHER_VOID0(printArch)
I tested it with Microsoft C++ Build Tools (based on MSVC 2015 SP3) on my i5-4460 CPU.
Output before fix:
cpu arch: none fma3
compile arch: none
Output after fix:
cpu arch: none sse2 sse3 ssse3 sse4.1 fma3 avx avx2
compile arch: none sse2 sse3 ssse3 sse4.1 fma3 avx
Sorry, but I can't create a pull request at this time :(
Hi,
I'd like to try libsimdpp, but I don't know whether it needs to be installed and how to interface it with a piece of code. I tried to find a tutorial, but it seems to be currently missing.
First of all, thanks a lot for this great library!
I have a problem compiling this code on my system:
#include <emmintrin.h>
#include <simdpp/sse2.h>
using namespace simdpp::SIMDPP_ARCH_NAMESPACE;
int main(int argc, char** argv) {
uint32x4 a = uint32x4::make_const(0x11111111, 0x22222222, 0x33333333, 0x44444444);
return 0;
}
...with this command:
g++ -std=c++11 -msse2 -I.. main.cpp
...I get these errors:
In file included from ../simdpp/simd/math_shift.h:17:0,
from ../simdpp/simd.h:47,
from ../simdpp/sse2.h:23,
from main.cpp:2:
../simdpp/simd/extract.h: In function ‘uint64_t simdpp::simdpp_arch_sse2::extract(simdpp::simdpp_arch_sse2::basic_int64x2)’:
../simdpp/simd/extract.h:124:31: error: there are no arguments to ‘_mm_cvtsi128_si64’ that depend on a template parameter, so a declaration of ‘_mm_cvtsi128_si64’ must be available [-fpermissive]
return _mm_cvtsi128_si64(t);
^
../simdpp/simd/extract.h:124:31: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
In file included from ../simdpp/simd.h:62:0,
from ../simdpp/sse2.h:23,
from main.cpp:2:
../simdpp/simd/insert.h: In function ‘simdpp::simdpp_arch_sse2::int128 simdpp::simdpp_arch_sse2::insert(simdpp::simdpp_arch_sse2::basic_int64x2, uint64_t)’:
../simdpp/simd/insert.h:136:37: error: there are no arguments to ‘_mm_cvtsi64_si128’ that depend on a template parameter, so a declaration of ‘_mm_cvtsi64_si128’ must be available [-fpermissive]
int64x2 vx = _mm_cvtsi64_si128(x);
^
In file included from ../simdpp/simd.h:69:0,
from ../simdpp/sse2.h:23,
from main.cpp:2:
../simdpp/simd/int64x2.inl: In static member function ‘static simdpp::simdpp_arch_sse2::uint64x2 simdpp::simdpp_arch_sse2::uint64x2::set_broadcast(uint64_t)’:
../simdpp/simd/int64x2.inl:82:30: error: ‘_mm_cvtsi64_si128’ was not declared in this scope
r0 = _mm_cvtsi64_si128(v0);
^
In file included from ../simdpp/simd.h:71:0,
from ../simdpp/sse2.h:23,
from main.cpp:2:
../simdpp/simd/float64x2.inl: In static member function ‘static simdpp::simdpp_arch_sse2::float64x2 simdpp::simdpp_arch_sse2::float64x2::set_broadcast(double)’:
../simdpp/simd/float64x2.inl:52:49: error: ‘_mm_cvtsi64_si128’ was not declared in this scope
r0 = _mm_cvtsi64_si128(bit_cast<int64_t>(v0));
Those undefined functions (_mm_cvtsi128_si64 and _mm_cvtsi64_si128) are defined in emmintrin.h, but only for 64-bit systems. Right now, I have just commented out the relevant code:
in simdpp/simd/float64x2.inl
inline float64x2 float64x2::set_broadcast(double v0)
{
#if SIMDPP_USE_NULL || SIMDPP_USE_NEON_VFP_DP
return null::make_vec<float64x2>(v0);
#elif SIMDPP_USE_SSE2
return zero();
// int64x2 r0;
// r0 = _mm_cvtsi64_si128(bit_cast<int64_t>(v0));
// return permute<0,0>(float64x2(r0));
#else
return SIMDPP_NOT_IMPLEMENTED1(v0);
#endif
}
in simdpp/simd/int64x2.inl
inline uint64x2 uint64x2::set_broadcast(uint64_t v0)
{
#if SIMDPP_USE_NULL
return null::make_vec<uint64x2>(v0);
#elif SIMDPP_USE_SSE2
return zero();
// uint64x2 r0;
// r0 = _mm_cvtsi64_si128(v0);
// r0 = permute<0,0>(r0);
// return uint64x2(r0);
#elif SIMDPP_USE_NEON
uint64x1_t r0 = vcreate_u64(v0);
return vcombine_u64(r0, r0);
#endif
}
in simdpp/simd/extract.h
template<unsigned id>
inline uint64_t extract(basic_int64x2 a)
{
static_assert(id < 2, "index out of bounds");
#if SIMDPP_USE_NULL
return a[id];
#elif SIMDPP_USE_SSE4_1
return _mm_extract_epi64(a, id);
#elif SIMDPP_USE_SSE2
return 0;
// uint64x2 t = a;
// if (id != 0) {
// t = move_l<id>(t);
// }
// return _mm_cvtsi128_si64(t);
#elif SIMDPP_USE_NEON
return vgetq_lane_u64(a, id);
#endif
}
in simdpp/simd/insert.h
template<unsigned id>
int128 insert(basic_int64x2 a, uint64_t x)
{
#if SIMDPP_USE_NULL
a[id] = x;
return a;
#elif SIMDPP_USE_SSE4_1
return _mm_insert_epi64(a, x, id);
#elif SIMDPP_USE_SSE2
return 0;
// int64x2 vx = _mm_cvtsi64_si128(x);
// if (id == 0) {
// a = shuffle1<0,1>(vx, a);
// } else {
// a = shuffle1<0,0>(a, vx);
// }
// return a;
#elif SIMDPP_USE_NEON
return vsetq_lane_u64(x, a, id);
#endif
}
Is there a correct way to overcome those compiler errors?
Thanks a lot!
Michal
Hi Povilas,
sorry for my recent splash of messages :o) I started to work more intensely on vectorizing some scalar code.
When I have a mask vector, I sometimes need to know whether all values in the vector are true or false. I have to use a bit_cast to convert the mask to a uint vector, like this:
mask_int32x8 mask = ...
bool all_true = simdpp::sse::test_ones(simdpp::bit_cast<uint32x8>(mask));
which is not a problem, but test_zero() and test_ones() in simdpp/sse/compare.h are implemented only for 128-bit vectors. I don't know if it would be ok to add support for 256-bit vectors to the same header file, since such long vectors are supported by AVX, not SSE.
namespace simdpp {
namespace SIMDPP_ARCH_NAMESPACE {
namespace sse {
template<class = void> SIMDPP_INL
bool test_ones(const uint32x8& a)
{
uint32x4 v1, v2;
simdpp::split(a, v1, v2);
return
test_ones(uint8x16(v1)) &&
test_ones(uint8x16(v2));
}
template<class = void> SIMDPP_INL
bool test_zero(const uint32x8& a)
{
uint32x4 v1, v2;
simdpp::split(a, v1, v2);
return
test_zero(uint8x16(v1)) &&
test_zero(uint8x16(v2));
}
// variants for uint16x16, uint8x32, uint64x8 should follow
}}}
Cheers,
Michal
I believe the instruction VPBLENDD is faster than the instruction PBLENDVB. The first instruction only handles dwords, while the second handles bytes.
For that reason I thought that a simdpp::blend()
that makes use of a mask_int32<8>
would be compiled into a VPBLENDD instead of a PBLENDVB.
My test program gets compiled into PBLENDVB:
#include <iostream>
#include <limits>   // std::numeric_limits
#include <cstdint>  // uint32_t
#include <simdpp/simd.h>
int main() {
simdpp::uint32<8> v1 = simdpp::make_uint(std::numeric_limits< uint32_t >::max(), 0);
simdpp::uint32<8> v2 = simdpp::make_uint(std::numeric_limits< uint32_t >::max());
const auto mask = simdpp::cmp_eq(v1, v2);
v1 = simdpp::blend(v1, v2, mask);
// Just output something so that the compiler does not optimize away everything
std::cout << simdpp::reduce_max(v1) << "\n";
}
$ g++-7.1 -I/home/user/libsimdpp/inst/include/libsimdpp-2.0 -I. -std=c++14 -msse4.1 -mavx2 -O3 -D SIMDPP_ARCH_X86_AVX2 -save-temps /home/user/test.cc
$ grep blend test.s
vpblendvb %ymm1, %ymm1, %ymm0, %ymm1
Do you know why PBLENDVB is being used and not VPBLENDD?
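For reference, both instructions compute the same bitwise selection; they differ only in mask granularity (bytes for PBLENDVB, dwords for VPBLENDD). A scalar model of the semantics, as an illustration only (plain C++, not libsimdpp code):

```cpp
#include <cstdint>

// Scalar model of what both PBLENDVB and VPBLENDD compute per element:
// take bits from `on` where the mask bit is set, otherwise from `off`.
std::uint32_t blend_bits(std::uint32_t on, std::uint32_t off,
                         std::uint32_t mask) {
    return (on & mask) | (off & ~mask);
}
```

Since the mask in the test program comes from cmp_eq on 32-bit elements, each dword of the mask is all-ones or all-zeros, so either instruction yields the same result; the choice only affects throughput.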
Simply including simd.h causes a bunch of warnings; we compile with warnings as errors, so we can't use this library:
C:\playground\test\libsimdpp\simdpp/detail/expr/scalar.h(48): warning C4244: '=' : conversion from 'const double' to 'simdpp::arch_avx::any_float32<4,simdpp::arch_avx::float32<4,void>>::element
C:\playground\test\libsimdpp\simdpp/detail/insn/shuffle2x2.h(318): warning C4556: value of intrinsic immediate argument '334' is out of range '0 - 255'
And on and on. Any plans to make this library compile without warnings? Some of the overflow values are worrisome.
Repro:
arch:avx2, MSVC2013
#include "stdafx.h"
#define SIMDPP_ARCH_X86_AVX
#include "simdpp/simd.h"
int main(int argc, _TCHAR* argv[])
{
return 0;
}
It should detect which SIMD instructions are available when the project is built. A libsimdpp package can be created on a machine with a narrow SIMD set and then used on a machine with a wide SIMD set, and vice versa; one shouldn't affect the other.
libsimdpp should detect SIMD availability purely at runtime, when it is used. You shouldn't even need a 'configure' step.
Hi!
I'm not very experienced with C++ and especially not with this library, but I've found that some of my core uses of this library require patterns like:
uint64_t count = _mm_popcnt_u64(extract<0>(x));
#if UINT64_VECTOR_SIZE >= 2
count += _mm_popcnt_u64(extract<1>(x));
#if UINT64_VECTOR_SIZE >= 4
count += _mm_popcnt_u64(extract<2>(x));
count += _mm_popcnt_u64(extract<3>(x));
#if UINT64_VECTOR_SIZE >= 8
count += _mm_popcnt_u64(extract<4>(x));
count += _mm_popcnt_u64(extract<5>(x));
count += _mm_popcnt_u64(extract<6>(x));
count += _mm_popcnt_u64(extract<7>(x));
#if UINT64_VECTOR_SIZE > 8
#error "we do not support vectors longer than 8, please file an issue"
#endif
#endif
#endif
It would be awesome if there was some syntax like:
uint64_t count = 0;
x.foreach<64>([&](uint64_t e) {
    count += _mm_popcnt_u64(e);
});
I'm happy to hack this up, but I'd need some guidance/scaffolding about how to approach the problem in the framework of libsimdpp.
Hi. I tried to write a simple example for libsimdpp.
I thought the following code should run, but it returned a run-time error at load(a).
Do you have any idea?
#define SIMDPP_ARCH_X86_AVX2
#include<simdpp/simd.h>
int main()
{
const int N = 8;
// should be aligned to 32 bytes for __m256
float SIMDPP_ALIGN(32) a[N];
for (int i = 0; i < N; ++i)
{
a[i] = i;
}
// this works.
//simdpp::float32<4, void> a_avx = simdpp::load(&a[0]);
simdpp::float32<8, void> a_avx = simdpp::load(&a[0]);
return 0;
}
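A run-time error at load() is usually a misaligned aligned-load: with AVX, float32<8> occupies a 32-byte register, so simdpp::load requires a 32-byte-aligned pointer (hence SIMDPP_ALIGN(32) in the snippet above), while simdpp::load_u accepts unaligned pointers. A small plain-C++ checker for debugging such crashes:

```cpp
#include <cstdint>

// Returns true if pointer p is aligned to `alignment` bytes
// (alignment must be a power of two, e.g. 16 for SSE, 32 for AVX).
bool is_aligned(const void* p, std::uintptr_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```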
float32x4 foo(float a)
{
return splat(a);
}
the generated code looks really terrible, at least in comparison to SSE. I guess ARMv7 can't do any better than that? Nevertheless, ARMv8 can, using vdupq_laneq_f32(). The NEON implementation of i_splat4() for float32x4 should probably have a SIMDPP_64_BITS variant.
I'm trying to figure out how to generate the macros when using the dispatcher.
First, one needs to add -DEMIT_DISPATCHER and the list of supported platforms. But that's not really great for cross-platform automated builds.
Then, my functions are templated factory builders, and this doesn't work...
I was wondering if the "get compilable" macro with CMake could generate a comma-separated list (easy to do with CMake) and then use it with Boost preprocessor macros to generate the proper code in a template-friendly way.
Any thoughts on this? Without these two features, the SIMD filters I'm trying to build for my library are just unusable :/
Hi there,
AFAICT, libsimdpp is erroneously including AVX2 instructions when I neither enable them with a compiler switch nor with the instruction set selection macro. I have a file called fail.cpp:
#define SIMDPP_ARCH_X86_SSE2
#include <simdpp/simd.h>
#include <inttypes.h>
using namespace simdpp;
int main(int argc, char ** argv) {
return 0;
}
uint64<2> bad(uint64<2> x, uint64<2> y) {
return bit_andnot(x, y);
}
which I compile with this invocation:
g++ -march=native -std=c++11 -Ilibsimdpp-2.0-rc2 -Wall -Werror fail.cpp
and then I take a look at a.out:
[ec2-user@ip-172-31-54-96 c]$ objdump -M intel -d a.out
...
00000000004005f2 <_Z3badN6simdpp9arch_sse26uint64ILj2EvEES2_>:
4005f2: 55 push rbp
4005f3: 48 89 e5 mov rbp,rsp
4005f6: 48 81 ec 20 02 00 00 sub rsp,0x220
...
40085a: c5 f9 df 85 30 ff ff vpandn xmm0,xmm0,XMMWORD PTR [rbp-0xd0]
and it includes vpandn, which, AFAIK, is an AVX2 instruction. Moreover, this triggers a SIGILL on my machine, so at the very least it's not compatible with my architecture.
Have I done something wrong? Perhaps a bad flag somewhere?
When I compile libsimdpp with cmake ., it does correctly conclude that I lack AVX2:
...
-- Performing Test CAN_RUN_X86_AVX
-- Performing Test CAN_RUN_X86_AVX - Success
-- Performing Test CAN_RUN_X86_AVX2
-- Performing Test CAN_RUN_X86_AVX2 - Failed
...
[ec2-user@ip-172-31-54-96 c]$ gcc --version
gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)
[ec2-user@ip-172-31-54-96 c]$ cat /proc/cpuinfo | grep flags
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
[ec2-user@ip-172-31-54-96 c]$ gcc -march=native -Q --help=target | grep avx
-march= core-avx-i
-mavx [enabled]
-mavx2 [disabled]
-mavx256-split-unaligned-load [disabled]
-mavx256-split-unaligned-store [disabled]
-mprefer-avx128 [disabled]
-msse2avx [disabled]
-mtune= core-avx-i
Can be seen here: https://ci.appveyor.com/project/mbrucher/audiotk/build/2.2.0.565/job/2rl0xf4j6aa3fbam
For VS2015, everything is fine: the AVX512 test fails, and we don't compile the AVX version of the code. For VS2017, the test succeeds, but the code can't be compiled afterwards.
Known issue or new one?
simdpp::float32x4 foo1(float a)
{
return simdpp::splat(a);
}
.../submodule/libsimdpp/simdpp/detail/insn/set_splat.h: In function ‘simdpp::arch_neonfltsp::float32x4 foo1(float)’:
.../submodule/libsimdpp/simdpp/detail/insn/set_splat.h:302:43: warning: ‘r.simdpp::arch_neonfltsp::float32<4u>::d_’ is used uninitialized in this function [-Wuninitialized]
typename detail::remove_sign<V>::type r;
aarch64-poky-linux-g++ (GCC) 6.3.0
#define SIMDPP_ARCH_ARM_NEON_FLT_SP
Maybe I haven't fully understood expressions yet, but how do you envision calling functions that only have partial architecture support, e.g., https://github.com/p12tic/libsimdpp/blob/master/simdpp/detail/expr/f_fmadd.h#L024. It seems counterintuitive to add checks for platform support in application code again.
Wouldn't it make sense to add a generic implementation as fallback in case no specific instruction set is available?
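A generic fallback could look like the scalar sketch below. This is a hypothetical illustration, not libsimdpp code; note that a separate multiply and add rounds twice, so it can differ from a true fused multiply-add in the last bit.

```cpp
#include <array>
#include <cstddef>

// Element-wise a*b + c as a portable fallback when no FMA instruction set
// is compiled in. A real implementation would dispatch to fmadd where
// available and to this path otherwise.
template<std::size_t N>
std::array<float, N> fmadd_fallback(const std::array<float, N>& a,
                                    const std::array<float, N>& b,
                                    const std::array<float, N>& c) {
    std::array<float, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r[i] = a[i] * b[i] + c[i];
    return r;
}
```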
Hi,
From what I think I understood, in order to use the simdpp optimized functions, you must use the libsimdpp vector types.
So suppose I already have two float arrays: if I want to add them using libsimdpp, do I have to create two vectors and copy the arrays into the vectors?
Meaning that you cannot directly use the functions on STL vectors, for instance?
Thanks
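For reference, no copy into special containers is needed: simdpp::load and simdpp::store operate directly on raw pointers (e.g. vec.data() of a std::vector<float>). The usual pattern processes the arrays in vector-width chunks with a scalar tail. The sketch below uses a scalar stand-in for the vector body so it stays self-contained; the libsimdpp calls it would replace are shown in the comment.

```cpp
#include <cstddef>

// Chunked processing over existing arrays. With libsimdpp the inner block
// would read:
//   simdpp::float32<8> va = simdpp::load(a + i);
//   simdpp::float32<8> vb = simdpp::load(b + i);
//   simdpp::store(out + i, simdpp::add(va, vb));
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    constexpr std::size_t W = 8;  // lanes in an AVX float32<8>
    std::size_t i = 0;
    for (; i + W <= n; i += W)            // full W-lane chunks
        for (std::size_t j = 0; j < W; ++j)
            out[i + j] = a[i + j] + b[i + j];
    for (; i < n; ++i)                    // scalar tail when W does not divide n
        out[i] = a[i] + b[i];
}
```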
Would it be better to use uint64x2 as the 128-bit optimized implementation of i_test_bits_any() for NEON?
This generates fewer instructions:
SIMDPP_INL bool i_test_bits_any(const uint64<2>& a)
{
uint64x2 r = bit_or(a, move2_l<1>(a));
return extract<0>(r) != 0;
}
as compared to this:
SIMDPP_INL bool i_test_bits_any(const uint32<4>& a)
{
uint32x4 r = bit_or(a, move4_l<2>(a));
r = bit_or(r, move4_l<1>(r));
return extract<0>(r) != 0;
}
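Scalar models of the two reductions make it easy to check they compute the same predicate, "any bit set in the 128-bit value"; the 64-bit split just needs one OR-and-shift step instead of two:

```cpp
#include <cstdint>

// Model of the uint64<2> variant: one OR, one compare.
bool any_u64x2(std::uint64_t lo, std::uint64_t hi) {
    return (lo | hi) != 0;
}

// Model of the uint32<4> variant: two OR reduction steps, then a compare.
bool any_u32x4(std::uint32_t a, std::uint32_t b,
               std::uint32_t c, std::uint32_t d) {
    std::uint32_t r = (a | c) | (b | d);
    return r != 0;
}
```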
This seems wrong, it should probably be vminq_f64.
libsimdpp/simdpp/detail/insn/f_min.h
Line 68 in 39354c3
I am building a Python extension that uses libsimdpp. As I want to provide compatibility with Python 2.7 (yep, it's still pretty popular), I need to compile against VS 2008 (using the cxx98 branch). There, I am getting the following error when compiling with SSE2 options enabled:
error C3861: '_mm_set_epi64x': identifier not found
According to https://msdn.microsoft.com/en-us/library/dk2sdw0h(v=vs.90).aspx the correct header file is intrin.h
and just adding it does indeed make it work.
Newer VS versions don't have that problem. Have you heard about this before? Why does libsimdpp not include intrin.h? Does my workaround look ok?
Full logs: https://ci.spacy.io/builders/sense2vec-win64-py27-64-install/builds/47/steps/shell_2/logs/stdio
Workaround: explosion/sense2vec@1d94617
How is this library licensed? According to README.md it's BSD but the COPYING file says GPL3.
Hi
I am new to this library, so maybe I am wrong. But I noticed that in here, the flag for ICC is -mavx512f.
Shouldn't it be -xCOMMON-AVX512 (or -xMIC-AVX512 for Xeon Phi x200 and -xCORE-AVX512 for Xeon server processors) according to the Intel documentation?
On my KNL, ICC can compile the code, but with the warning:
icpc: command line warning #10159: invalid argument for option '-m'
Will the program still run the SIMD instructions correctly?
Thanks
Qi