Git Product home page Git Product logo

alugowski / poolstl Goto Github PK

View Code? Open in Web Editor NEW
20.0 2.0 1.0 157 KB

Light and self-contained implementation of C++17 parallel algorithms.

License: BSD 2-Clause "Simplified" License

CMake 8.70% C++ 89.49% Python 1.54% Shell 0.26%
cpp11 cpp14 cpp17 parallel parallel-computing sorting sorting-algorithms sorting-algorithms-implemented stl stl-algorithms thread threading concurrency multithreading emscripten wasm parallel-sort parallel-sorting

poolstl's Introduction

tests codecov

poolSTL

Light, self-contained, thread pool-based implementation of C++17 parallel standard library algorithms.

C++17 introduced parallel overloads of standard library algorithms that accept an Execution Policy as the first argument. Policies specify limits on how the implementation may parallelize the algorithm, enabling methods like threads, vectorization, or even GPU. Policies can be supplied by the compiler or by libraries like this one.

std::sort(std::execution::par, vec.begin(), vec.end());
    //    ^^^^^^^^^^^^^^^^^^^ native C++17 parallel Execution Policy      

Unfortunately compiler support varies. Quick summary of compilers' default standard libraries:

Linux macOS Windows
GCC 9+ TBB Required TBB Required TBB Required
GCC 8-
Clang (libc++)
Clang (libstdc++) TBB Required TBB Required TBB Required
Apple Clang
MSVC 15.7+ (2017)
Parallel STL TBB Required TBB Required TBB Required
poolSTL ✅* ✅* ✅*

PoolSTL is a supplement to fill in the support gaps. It is not a full implementation; only the basics are covered. However, it is small, easy to integrate, and has no external dependencies. A good backup to the other options.

Use poolSTL exclusively, or only on platforms lacking native support, or only if TBB is not present.

Supports C++11 and higher. Algorithms introduced in C++17 require C++17 or higher.
Tested in CI on GCC 7+, Clang/LLVM 5+, Apple Clang, MSVC, MinGW, and Emscripten.

Implemented Algorithms

Algorithms are added on an as-needed basis. If you need one open an issue or contribute a PR.
Limitations: All iterators must be random access. No nested parallel calls.

<algorithm>

<numeric>

All in std:: namespace.

Other

  • poolstl::iota_iter - Iterate over integers. Same as iterating over output of std::iota but without materializing anything. Iterator version of std::ranges::iota_view.
  • poolstl::for_each_chunk - Like std::for_each, but explicitly splits the input range into chunks then exposes the chunked parallelism. A user-specified chunk constructor is called for each parallel chunk then its output is passed to each loop iteration. Useful for workloads that need an expensive workspace that can be reused between iterations, but not simultaneously by all iterations in parallel.
  • poolstl::pluggable_sort - Like std::sort, but allows specification of sequential sort method. To parallelize pdqsort: pluggable_sort(par, v.begin(), v.end(), pdqsort).

Usage

PoolSTL provides:

  • poolstl::par: Substitute for std::execution::par. Parallelized using a thread pool.
  • poolstl::seq: Substitute for std::execution::seq. Simply calls the regular (non-policy) overload.
  • poolstl::par_if(): Choose parallel or sequential at runtime. See below.

In short, use poolstl::par to make your code parallel. Complete example:

#include <iostream>
#include <poolstl/poolstl.hpp>

int main() {
    std::vector<int> v = {0, 1, 2, 3, 4, 5};
    auto sum = std::reduce(poolstl::par, vec.cbegin(), vec.cend());
    //                     ^^^^^^^^^^^^
    //                     Add this to make your code parallel.
    std::cout << "Sum=" << sum << std::endl;
    return 0;
}

Controlling Thread Pool Size with par.on(pool)

The thread pool used by poolstl::par is managed internally by poolSTL. It is started on first use.
Use your own thread pool with poolstl::par.on(pool) for control over thread count, startup/shutdown, etc.:

task_thread_pool::task_thread_pool pool{4};  // 4 threads

std::reduce(poolstl::par.on(pool), vec.begin(), vec.end());

Choosing Parallel or Sequential at Runtime with par_if

Sometimes the choice whether to parallelize or not should be made at runtime. For example, small datasets may not amortize the cost of starting threads, while large datasets do and should be parallelized.

Use poolstl::par_if to select between par and seq at runtime:

bool is_parallel = vec.size() > 10000;

std::reduce(poolstl::par_if(is_parallel), vec.begin(), vec.end());

Use poolstl::par_if(is_parallel, pool) to control the thread pool used by par, if selected.

Examples

Parallel for (auto& value : vec)

std::vector<int> vec = {0, 1, 2, 3, 4, 5};

// Parallel for-each
std::for_each(poolstl::par, vec.begin(), vec.end(), [](auto& value) {
    std::cout << value;  // loop body
});

Parallel for (int i = 0; i < 100; ++i)

using poolstl::iota_iter;

// parallel for loop
std::for_each(poolstl::par, iota_iter<int>(0), iota_iter<int>(100), [](auto i) {
    std::cout << i;  // loop body
});

Parallel Sort

std::vector<int> vec = {5, 2, 1, 3, 0, 4};

std::sort(poolstl::par, vec.begin(), vec.end());

Installation

Single File

Each release publishes a single-file amalgamated poolstl.hpp. Simply copy this into your project.

Build requirements:

  • Clang and GCC 8 or older: require -lpthread to use C++11 threads.
  • Emscripten: compile and link with -pthread to use C++11 threads. See docs.

CMake

include(FetchContent)
FetchContent_Declare(
        poolSTL
        GIT_REPOSITORY https://github.com/alugowski/poolSTL
        GIT_TAG main
        GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(poolSTL)

target_link_libraries(YOUR_TARGET poolSTL::poolSTL)

Alternatively copy or checkout the repo into your project and:

add_subdirectory(poolSTL)

Benchmark

See benchmark/ to compare poolSTL against the standard sequential implementation, and (if available) the native std::execution::par implementation.

Results on an M1 Pro (6 power, 2 efficiency cores), with GCC 13:

-------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------
all_of()/real_time                                                 19.9 ms         19.9 ms           35
all_of(poolstl::par)/real_time                                     3.47 ms        0.119 ms          198
all_of(std::execution::par)/real_time                              3.45 ms         3.25 ms          213
find_if()/needle_percentile:5/real_time                           0.988 ms        0.987 ms          712
find_if()/needle_percentile:50/real_time                           9.87 ms         9.86 ms           71
find_if()/needle_percentile:100/real_time                          19.7 ms         19.7 ms           36
find_if(poolstl::par)/needle_percentile:5/real_time               0.405 ms        0.050 ms         1730
find_if(poolstl::par)/needle_percentile:50/real_time               1.85 ms        0.096 ms          393
find_if(poolstl::par)/needle_percentile:100/real_time              3.64 ms        0.102 ms          193
find_if(std::execution::par)/needle_percentile:5/real_time        0.230 ms        0.220 ms         3103
find_if(std::execution::par)/needle_percentile:50/real_time        1.75 ms         1.60 ms          410
find_if(std::execution::par)/needle_percentile:100/real_time       3.51 ms         3.24 ms          204
for_each()/real_time                                               94.6 ms         94.6 ms            7
for_each(poolstl::par)/real_time                                   18.7 ms        0.044 ms           36
for_each(std::execution::par)/real_time                            15.3 ms         12.9 ms           46
sort()/real_time                                                    603 ms          602 ms            1
sort(poolstl::par)/real_time                                        112 ms         6.64 ms            6
sort(std::execution::par)/real_time                                 113 ms          102 ms            6
pluggable_sort(poolstl::par, ..., pdqsort)/real_time               71.7 ms         6.67 ms           10
transform()/real_time                                              95.0 ms         94.9 ms            7
transform(poolstl::par)/real_time                                  17.4 ms        0.037 ms           38
transform(std::execution::par)/real_time                           15.3 ms         13.2 ms           45
exclusive_scan()/real_time                                         33.7 ms         33.7 ms           21
exclusive_scan(poolstl::par)/real_time                             11.6 ms        0.095 ms           55
exclusive_scan(std::execution::par)/real_time                      19.8 ms         15.3 ms           32
reduce()/real_time                                                 15.2 ms         15.2 ms           46
reduce(poolstl::par)/real_time                                     4.06 ms        0.044 ms          169
reduce(std::execution::par)/real_time                              3.38 ms         3.16 ms          214

poolSTL as std::execution::par

USE AT YOUR OWN RISK! THIS IS A HACK!

Two-line hack for missing compiler support. A no-op on compilers with support.

If POOLSTL_STD_SUPPLEMENT is defined then poolSTL will check for native compiler support. If not found then poolSTL will alias its poolstl::par as std::execution::par:

#define POOLSTL_STD_SUPPLEMENT
#include <poolstl/poolstl.hpp>

Now just use std::execution::par as normal, and poolSTL will fill in as necessary. See supplement_test.cpp.

Example use case: You can link against TBB, so you'll use native support on GCC 9+, Clang, MSVC, etc. PoolSTL will fill in automatically on GCC <9 and Apple Clang.

Example use case 2: You'd prefer to use the TBB version, but don't want to fail on systems that don't have it. Simply use the supplement as above, but have your build system (CMake, meson, etc.) check for TBB. If not found, define POOLSTL_STD_SUPPLEMENT_NO_INCLUDE and the supplement will not #include <execution> (and neither should your code!), thus dropping the TBB link requirement. The poolSTL supplement fills in.
See the supplement section of tests/CMakeLists.txt for an example.

poolstl's People

Contributors

alugowski avatar dependabot[bot] avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

jerrycxj

poolstl's Issues

Add a policy that selects between parallel and sequential

Some codes are run on both small and large inputs. The small inputs may be best handled sequentially. Add a policy that allows selecting between sequential and parallel.

This mechanism should work both with poolstl::par/poolstl::seq and std::execution::par/std::execution::seq.

Add a std supplement mode

Add a single-line way to define std::execution::par if the compiler does not support it.

This way user code can be written such that the native compiler-provided methods are used if available else poolSTL fallback.

Emscripten - Clang

Hi,
I have tested the replacement in my project and it works just fine on windows visual studio 2022 and android studio 2022.3.1 (ndk 26.1.109)

It works as expected, however with emscripten, the code below fails with the provided log.
version of poolSTL 0.3.4

I am not sure if you have tested it with emscripten, this stack trace is from google chrome. I'll keep looking in to it, I have provided the report in case you can provide more info.

#define POOLSTL_STD_SUPPLEMENT 1
#include <poolSTL/poolstl.hpp>
#include <vector>

int  main()
{
    // Create a vector of 10 numbers
    std::vector<int> numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    // Sum the numbers in parallel using std::reduce with std::execution::par
    int sum = std::reduce(std::execution::par, numbers.begin(), numbers.end());

	return sum;
}
$std::__2::__throw_system_error(int, char const*) @ system_error.cpp:293
$std::__2::thread::thread<void (task_thread_pool::task_thread_pool::*)(), task_thread_pool::task_thread_pool*, void>(void (task_thread_pool::task_thread_pool::*&&)(), task_thread_pool::task_thread_pool*&&) @ thread.h:252
$void std::__2::allocator<std::__2::thread>::construct[abi:ue170004]<std::__2::thread, void (task_thread_pool::task_thread_pool::*)(), task_thread_pool::task_thread_pool*>(std::__2::thread*, void (task_thread_pool::task_thread_pool::*&&)(), task_thread_pool::task_thread_pool*&&) @ allocator.h:167
$void std::__2::allocator_traits<std::__2::allocator<std::__2::thread>>::construct[abi:ue170004]<std::__2::thread, void (task_thread_pool::task_thread_pool::*)(), task_thread_pool::task_thread_pool*, void>(std::__2::allocator<std::__2::thread>&, std::__2::thread*, void (task_thread_pool::task_thread_pool::*&&)(), task_thread_pool::task_thread_pool*&&) @ allocator_traits.h:296
$void std::__2::vector<std::__2::thread, std::__2::allocator<std::__2::thread>>::__emplace_back_slow_path<void (task_thread_pool::task_thread_pool::*)(), task_thread_pool::task_thread_pool*>(void (task_thread_pool::task_thread_pool::*&&)(), task_thread_pool::task_thread_pool*&&) @ vector:1660
$std::__2::thread& std::__2::vector<std::__2::thread, std::__2::allocator<std::__2::thread>>::emplace_back<void (task_thread_pool::task_thread_pool::*)(), task_thread_pool::task_thread_pool*>(void (task_thread_pool::task_thread_pool::*&&)(), task_thread_pool::task_thread_pool*&&) @ vector:1681
$task_thread_pool::task_thread_pool::start_threads(unsigned int) @ poolstl.hpp:407
$task_thread_pool::task_thread_pool::task_thread_pool(unsigned int) @ poolstl.hpp:157
$std::__2::__shared_ptr_emplace<task_thread_pool::task_thread_pool, std::__2::allocator<task_thread_pool::task_thread_pool>>::__shared_ptr_emplace[abi:ue170004]<>(std::__2::allocator<task_thread_pool::task_thread_pool>) @ shared_ptr.h:302
$std::__2::shared_ptr<task_thread_pool::task_thread_pool> std::__2::allocate_shared[abi:ue170004]<task_thread_pool::task_thread_pool, std::__2::allocator<task_thread_pool::task_thread_pool>, void>(std::__2::allocator<task_thread_pool::task_thread_pool> const&) @ shared_ptr.h:1022
$std::__2::shared_ptr<task_thread_pool::task_thread_pool> std::__2::make_shared[abi:ue170004]<task_thread_pool::task_thread_pool, void>() @ shared_ptr.h:1031
$poolstl::execution::internal::get_default_pool()::'lambda'()::operator()() const @ poolstl.hpp:686
$decltype(std::declval<poolstl::execution::internal::get_default_pool()::'lambda'()>()()) std::__2::__invoke[abi:ue170004]<poolstl::execution::internal::get_default_pool()::'lambda'()>(poolstl::execution::internal::get_default_pool()::'lambda'()&&) @ invoke.h:340
$void std::__2::__call_once_param<std::__2::tuple<poolstl::execution::internal::get_default_pool()::'lambda'()&&>>::__execute[abi:ue170004]<>(std::__2::__tuple_indices<...>) @ mutex:632
$std::__2::__call_once_param<std::__2::tuple<poolstl::execution::internal::get_default_pool()::'lambda'()&&>>::operator()[abi:ue170004]() @ mutex:624
$void std::__2::__call_once_proxy[abi:ue170004]<std::__2::tuple<poolstl::execution::internal::get_default_pool()::'lambda'()&&>>(void*) @ mutex:660
$std::__2::__call_once(unsigned long volatile&, void*, void (*)(void*)) @ mutex.cpp:238
$void std::__2::call_once[abi:ue170004]<poolstl::execution::internal::get_default_pool()::'lambda'()>(std::__2::once_flag&, poolstl::execution::internal::get_default_pool()::'lambda'()&&) @ mutex:677
$poolstl::execution::internal::get_default_pool() @ poolstl.hpp:686
$poolstl::execution::parallel_policy::pool() const @ poolstl.hpp:730
$std::__2::vector<std::__2::future<int>, std::__2::allocator<std::__2::future<int>>> poolstl::internal::parallel_chunk_for_1<poolstl::execution::parallel_policy const&, std::__2::__wrap_iter<int*>, int (*)(std::__2::__wrap_iter<int*>, std::__2::__wrap_iter<int*>, int, std::__2::plus<int>), int, int&, std::__2::plus<int>&>(poolstl::execution::parallel_policy const&, std::__2::__wrap_iter<int*>, std::__2::__wrap_iter<int*>, int (*)(std::__2::__wrap_iter<int*>, std::__2::__wrap_iter<int*>, int, std::__2::plus<int>), int*, int, int&, std::__2::plus<int>&) @ poolstl.hpp:1698
$std::__2::enable_if<std::is_base_of<poolstl::execution::poolstl_policy, std::__2::remove_cv<std::__2::remove_reference<poolstl::execution::parallel_policy const&>::type>::type>::value, int>::type std::reduce<poolstl::execution::parallel_policy const&, std::__2::__wrap_iter<int*>, int, std::__2::plus<int>>(poolstl::execution::parallel_policy const&, std::__2::__wrap_iter<int*>, std::__2::__wrap_iter<int*>, int, std::__2::plus<int>) @ poolstl.hpp:2578
$std::__2::enable_if<std::is_base_of<poolstl::execution::poolstl_policy, std::__2::remove_cv<std::__2::remove_reference<poolstl::execution::parallel_policy const&>::type>::type>::value, int>::type std::reduce<poolstl::execution::parallel_policy const&, std::__2::__wrap_iter<int*>, int>(poolstl::execution::parallel_policy const&, std::__2::__wrap_iter<int*>, std::__2::__wrap_iter<int*>, int) @ poolstl.hpp:2594
$std::__2::enable_if<std::is_base_of<poolstl::execution::poolstl_policy, std::__2::remove_cv<std::__2::remove_reference<poolstl::execution::parallel_policy const&>::type>::type>::value, std::__2::iterator_traits<std::__2::__wrap_iter<int*>>::value_type>::type std::reduce<poolstl::execution::parallel_policy const&, std::__2::__wrap_iter<int*>>(poolstl::execution::parallel_policy const&, std::__2::__wrap_iter<int*>, std::__2::__wrap_iter<int*>) @ poolstl.hpp:2605
$ToolKit::Game::Frame(float) @ Game.cpp:36

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.