Comments (8)
from sse-popcount.
I have added a shorter AVX2 HS...
CountOnes/hamming_weight@2ab07ac
Predictably, the result is slower for long arrays. Of course, if you have short arrays, it might be preferable.
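For readers unfamiliar with the Harley-Seal (HS) approach: it feeds input words through a tree of carry-save adders, so only one full popcount is needed per block of words. The C below is a minimal portable sketch on plain 64-bit words, not the AVX2 code from the commit above. The function names (`popcount64`, `csa`, `popcount_harley_seal`) and the 16-word unroll are illustrative choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable bit count of one 64-bit word (SWAR reduction). */
static uint64_t popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (x * 0x0101010101010101ULL) >> 56;
}

/* Carry-save adder: adds a, b, c bit position by bit position,
   writing the sum bits to *l and the carry bits to *h. */
static void csa(uint64_t *h, uint64_t *l, uint64_t a, uint64_t b, uint64_t c) {
    uint64_t u = a ^ b;
    *h = (a & b) | (u & c);
    *l = u ^ c;
}

uint64_t popcount_harley_seal(const uint64_t *data, size_t n) {
    uint64_t total = 0;
    uint64_t ones = 0, twos = 0, fours = 0, eights = 0, sixteens = 0;
    uint64_t twosA, twosB, foursA, foursB, eightsA, eightsB;
    size_t i = 0;
    /* Each block of 16 words produces one "sixteens" word to popcount. */
    for (; i + 16 <= n; i += 16) {
        csa(&twosA, &ones, ones, data[i + 0], data[i + 1]);
        csa(&twosB, &ones, ones, data[i + 2], data[i + 3]);
        csa(&foursA, &twos, twos, twosA, twosB);
        csa(&twosA, &ones, ones, data[i + 4], data[i + 5]);
        csa(&twosB, &ones, ones, data[i + 6], data[i + 7]);
        csa(&foursB, &twos, twos, twosA, twosB);
        csa(&eightsA, &fours, fours, foursA, foursB);
        csa(&twosA, &ones, ones, data[i + 8], data[i + 9]);
        csa(&twosB, &ones, ones, data[i + 10], data[i + 11]);
        csa(&foursA, &twos, twos, twosA, twosB);
        csa(&twosA, &ones, ones, data[i + 12], data[i + 13]);
        csa(&twosB, &ones, ones, data[i + 14], data[i + 15]);
        csa(&foursB, &twos, twos, twosA, twosB);
        csa(&eightsB, &fours, fours, foursA, foursB);
        csa(&sixteens, &eights, eights, eightsA, eightsB);
        total += popcount64(sixteens);
    }
    /* Weight the leftover accumulators by the power of two they represent. */
    total = 16 * total + 8 * popcount64(eights) + 4 * popcount64(fours)
          + 2 * popcount64(twos) + popcount64(ones);
    /* Words that do not fill a 16-word block are counted one by one. */
    for (; i < n; i++) total += popcount64(data[i]);
    return total;
}
```

With a shorter unroll, fewer words fall into the scalar tail and the final reduction is cheaper, but more full popcounts are needed per word. That trade-off is one plausible reading of the short-versus-long results above.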
Awesome, thanks!
@kimwalisch We also tested a 5th iteration, but with AVX512F. It's faster for longer inputs.
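One reason AVX512F can help Harley-Seal is VPTERNLOGD, which evaluates any three-input bitwise function in a single instruction, so a carry-save adder needs two instructions instead of five. The thread does not say whether the AVX512F kernel mentioned here uses it. The C below is a hypothetical scalar emulation (`ternlog64`, `csa_ternlog` are illustrative names) showing that the immediates 0x96 (three-way XOR, the sum bit) and 0xE8 (majority, the carry bit) implement a carry-save adder.

```c
#include <stdint.h>

/* Scalar emulation of the VPTERNLOG truth-table lookup: at every bit
   position, the bits of a, b, c form a 3-bit index (a is the most
   significant bit) that selects one bit of imm. */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm) {
    uint64_t r = 0;
    for (int idx = 0; idx < 8; idx++) {
        if (!((imm >> idx) & 1)) continue;
        /* Select the bit positions whose (a, b, c) pattern equals idx. */
        uint64_t ma = (idx & 4) ? a : ~a;
        uint64_t mb = (idx & 2) ? b : ~b;
        uint64_t mc = (idx & 1) ? c : ~c;
        r |= ma & mb & mc;
    }
    return r;
}

/* Carry-save adder built from two ternary-logic operations. */
static void csa_ternlog(uint64_t *h, uint64_t *l, uint64_t a, uint64_t b, uint64_t c) {
    *l = ternlog64(a, b, c, 0x96); /* a ^ b ^ c: sum bits */
    *h = ternlog64(a, b, c, 0xE8); /* majority(a, b, c): carry bits */
}
```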
@WojciechMula I cannot find any popcount benchmark results for AVX512. Have you benchmarked it? Is AVX512 popcount faster than AVX2?
Is AVX512 popcount faster than AVX2?
This question is not well posed. Currently, the only available hardware that runs AVX-512 is Knights Landing, and it is a system optimized for AVX-512 execution.
@kimwalisch We haven't published any benchmark yet, but for sure AVX512 is faster than AVX2. Will post numbers when I'm back home.
And as Daniel said, AVX512 is the main instruction set on KNL. Many AVX2 instructions that are really fast on Skylake and other popular desktop CPUs are incredibly slow on KNL. Take a look at the latest documents from Agner Fog, http://agner.org/optimize/#manuals, and compare the instruction timings. For example, on Skylake PSHUFB has both a latency and a throughput of 1 cycle, while on KNL it takes 11 cycles (and VPSHUFB is two times slower).
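To see why a slow PSHUFB matters: the AVX2 "lookup" method uses PSHUFB as sixteen parallel lookups into a 16-entry table of nibble bit counts. The scalar C below emulates one 16-byte PSHUFB lane to show the data flow; it is an illustrative sketch with hypothetical names (`pshufb16`, `popcount_lookup`), not code from the project. Every 16 input bytes cost two shuffles, so an 11-cycle shuffle would weigh heavily on this method on KNL.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit counts of the 16 possible nibble values, i.e. the table a
   vectorized lookup kernel keeps in a register for PSHUFB. */
static const uint8_t kNibbleCount[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                         1, 2, 2, 3, 2, 3, 3, 4};

/* Emulates PSHUFB on one 16-byte lane: an index byte with its high bit
   set yields 0, otherwise it selects table[index & 0x0F]. */
static void pshufb16(uint8_t out[16], const uint8_t table[16], const uint8_t idx[16]) {
    for (int k = 0; k < 16; k++)
        out[k] = (idx[k] & 0x80) ? 0 : table[idx[k] & 0x0F];
}

uint64_t popcount_lookup(const uint8_t *data, size_t n) {
    uint64_t total = 0;
    size_t i = 0;
    uint8_t lo[16], hi[16], cnt_lo[16], cnt_hi[16];
    for (; i + 16 <= n; i += 16) {
        /* Split each byte into its low and high nibble. */
        for (int k = 0; k < 16; k++) {
            lo[k] = data[i + k] & 0x0F;
            hi[k] = data[i + k] >> 4;
        }
        /* Two table lookups ("shuffles") per 16-byte block. */
        pshufb16(cnt_lo, kNibbleCount, lo);
        pshufb16(cnt_hi, kNibbleCount, hi);
        for (int k = 0; k < 16; k++) total += cnt_lo[k] + cnt_hi[k];
    }
    /* Remaining bytes use the same table without the 16-byte grouping. */
    for (; i < n; i++)
        total += kNibbleCount[data[i] & 0x0F] + kNibbleCount[data[i] >> 4];
    return total;
}
```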
@kimwalisch The metric we're using in the project Daniel linked is CPU cycles per 64-bit word. As for the numbers: for a popcount of 8192 words, the fastest AVX2 procedure runs at 1.12 cycles per word, while AVX512F runs at 0.33 cycles per word. That is more than 3 times faster.
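Since the metric is cycles per 64-bit word, the totals for the 8192-word case follow directly: 1.12 × 8192 ≈ 9175 cycles for AVX2 versus 0.33 × 8192 ≈ 2703 cycles for AVX512F, a ratio of about 3.4. The two helpers below (hypothetical names) only spell out that arithmetic.

```c
/* Total cycles for an input of `words` 64-bit words at a given per-word cost. */
double total_cycles(double cycles_per_word, unsigned long words) {
    return cycles_per_word * (double)words;
}

/* How many times faster the candidate is than the baseline. */
double speedup(double baseline_cycles_per_word, double candidate_cycles_per_word) {
    return baseline_cycles_per_word / candidate_cycles_per_word;
}
```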
Related Issues (8)
- AVX2 & popcnt mix
- popcnt_SSE_bit_parallel, popcnt_AVX2_lookup, popcnt_parallel_64bit_optimized
- Harley-Seal AVX2 produces wrong results
- Cannot compile popcnt-avx2-harley-seal.cpp using MSVC 2015
- builtin-popcnt-movdq-unrolled_manual produces wrong results
- missing include cstdio in files verify.cpp and speed.cpp
- SSE Harley-Seal popcount algorithm