
sort-research-rs's Introduction

sort-research-rs

  • Is a sort implementation correct?
  • Is a sort implementation fast?

This repository contains:

  • An exhaustive test suite, including properties not commonly checked or upheld
  • An extensive benchmark suite, abstracting over types, patterns and sizes
  • A fuzzing harness
  • A novel sort implementation ipnsort (Instruction-Parallel-Network-Sort)
  • Vendored sort implementations (Rust, C++, C), e.g. cpp_pdqsort, rust_std_stable
  • Various experiments and demonstrations
  • Results of the research as papers

Most tests and benchmarks can also be applied to non-Rust implementations. This works by giving the five benchmark types a #[repr(C)] layout and defining a C API that wrappers around the C/C++ sorts can implement.
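
As a rough sketch of that pattern (the type and symbol names here are hypothetical, not the repository's actual definitions):

#[repr(C)]
pub struct FFIType {
    // Fixed layout shared between Rust and C/C++.
    value: i32,
}

extern "C" {
    // Implemented on the C/C++ side, wrapping e.g. cpp_pdqsort.
    fn ffi_sort_i32(data: *mut FFIType, len: usize);
}

pub fn sort(data: &mut [FFIType]) {
    // The slice guarantees a valid pointer/length pair for the C API.
    unsafe { ffi_sort_i32(data.as_mut_ptr(), data.len()) };
}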

Most functionality is disabled by default via cargo features; see the Cargo.toml. Some functionality can be enabled or switched by setting environment variables, see for example benches/bench.rs.

Research results

Running the tests

cargo test

cargo miri test

# Might require disabling the criterion dependency.
RUSTFLAGS=-Zsanitizer=address cargo t --release

Running the benchmarks

cargo bench

cargo bench <sort_name>-<prediction_state>-<type>-<pattern>-<size>

# E.g.:
cargo bench rust_std_stable-hot-u64-random-10000
cargo bench hot-u64-random-10000
cargo bench random

You can also set a custom regex to filter the benchmarks you want to run:

BENCH_REGEX="std.*i32-random-8$" cargo bench

If you want to collect a set of results that can later be used to create graphs, you can use the run_benchmarks.py utility script:

# Will write results to my_test_zen3.json
BENCH_REGEX="_stable.*random-" python util/run_benchmarks.py my_test_zen3

To create graphs from this data with the graph_all.py script, you first need to install the dependencies specified in requirements.txt, e.g. on Linux:

python -m venv venv
source ./venv/bin/activate
pip install --requirement util/graph_bench_result/requirements.txt

Afterwards, you can use the graph_all.py script:

python util/graph_bench_result/graph_all.py my_test_zen3.json

Fuzzing

You'll need to install cargo-fuzz and cargo-afl respectively (cargo install cargo-fuzz and cargo install afl). See https://rust-fuzz.github.io/book/introduction.html.

Fuzzing with libfuzzer

cd fuzz
cargo fuzz run libfuzzer_main

Fuzzing with afl

cd fuzz-afl
RUSTFLAGS=-Zsanitizer=address cargo afl build --release && cargo afl fuzz -i in -o out -D target/release/afl

Adjust fuzz/fuzz_targets/libfuzzer_main.rs and the corresponding target in fuzz-afl respectively to change the fuzz target. The default is rust_ipn_stable.

Contributing

Please respect the CODE_OF_CONDUCT.md when contributing.

Adding a new sort implementation

Please open a PR before investing the effort of adding a new sort implementation. The maintainer of this project is currently not interested in building an up-to-date database of all existing sort implementations; implementations are added based on situational context and relevance to the maintainer's goals. The baseline for all new sort implementations is that they must pass all functionality tests; they may pass the safety tests, but they don't have to.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE.md file for details.


sort-research-rs's Issues

Redundant patterns

If I see it correctly, the patterns 90p_zero_10p_random, 95p_zero_5p_random and 99p_zero_1p_random are exactly the same as random_p10, random_p5 and random_p1.
I propose that we remove the former (strictly speaking random_p5 doesn't exist yet, but that can be changed easily).

Add WILLEM_TUNED version for cpp_powersort_4way?

Something along the lines of:

#include <algorithm>  // std::partition
#include <cstddef>    // size_t

template<typename T>
void powersort_4way_willem_generic(T* data, size_t len){
  // Three-way partition around the sentinel value T{}.
  T* mid1 = std::partition(data, data+len, [](const T& el){return el < T{};});
  T* mid2 = std::partition(mid1, data+len, [](const T& el){return el == T{};});
  // sort [data, mid1) range
  // sort [mid2, data+len) range
}

This approach doesn't require the sentinel value to be absent from the sequence, which might be interesting for the performance benchmarks.

It might make sense to make it generic over the sentinel, to test both T{} and std::numeric_limits<uint64_t>::max()? I'm not familiar with the algo :\

Clarification about exception safety

Hello,

Thank you for this article, it is a very interesting read that demonstrates very thorough work.

Unfortunately there's a claim that I'm unable to independently reproduce.

The article rates various implementations for a property called "exception safety", and states that implementations either leave the input unmodified, or may leave duplicated objects in the input.

In particular, the sort implementations of the C++ std are marked as not exception safe, which implies that they cause object duplication. It is surprising that the standard library would do this, because a tenet of C++ is to use move and copy operators when available and to avoid untyped bitwise copies.

When I try to replicate that result, this is not what I see, for cpp_std_libcxx_stable at least. I do see that the input is not preserved; it now contains "moved-out" items (nullptrs in the case of a unique_ptr), but it doesn't contain duplicated objects. Here's a link to a Godbolt with my experiment.

These two issues are very distinct: the one I observe leaves the input in a "valid, but unspecified state", which does not lead to UB if due diligence is taken, while object duplication in the input would necessarily cause UB by way of double frees etc. "Exception safety" is generally understood as "an exception at a certain point in the program does not cause UB", not the stronger variant of "the input is left in its original state".

Unfortunately, as far as I can tell, the code of the benchmark only uses types that are trivially copyable; in C++ this can cause specialized codepaths to be used, which may duplicate elements of the input. These particular results are not generalizable to types that are not trivially (or at all) copyable.

Would you mind augmenting the benchmark with results on non-copyable types that demonstrate the object duplication, or clarifying in the text that no object duplication can occur in the sort implementations of the C++ standard library?

Additionally, in the interest of clarity, if the C++ standard library does not cause object duplication (for objects that are non-copyable), could you enrich the rating for "exception safety" to distinguish between "will lead to UB" and the weaker "may not preserve the input", and clarify that the article is using a stronger-than-usual definition of exception safety?

Apologies if there's anything I misunderstood or missed. I also understand that my quick attempt at reproducing your results might be insufficient, and that UB is an elusive beast that is not always easy to observe. Hence I'm asking for a more reliable reproduction.
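
(In Rust terms, the analogous experiment uses a panicking comparator. Below is a minimal sketch of the kind of check involved, assuming a non-Copy element type; the repository's actual panic_retain_original_set test is more thorough:)

use std::panic;

fn main() {
    let mut v: Vec<Box<i32>> = (0..100).rev().map(Box::new).collect();

    // Panic partway through the sort, the Rust analogue of a throwing
    // comparison operator.
    let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
        let mut count = 0;
        v.sort_by(|a, b| {
            count += 1;
            if count == 50 {
                panic!("comparator failure");
            }
            a.cmp(b)
        });
    }));
    assert!(result.is_err());

    // The input may be reordered, but it must still hold the original set
    // of elements: no duplicates, no moved-out holes.
    let mut observed: Vec<i32> = v.iter().map(|b| **b).collect();
    observed.sort();
    assert_eq!(observed, (0..100).collect::<Vec<i32>>());
}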

Again, thank you for your work on this article, which is in no way invalidated by this clarification request.

Clarify requirements for new benchmarks or other contributions

Some questions which could be addressed either in the README or a specific Contributing document:

  • Which classes and categories of sorts are welcome, and which aren't?
  • Is there a requirement for popularity/quality of a new sort?
  • Should different implementations of the same algorithm be added?
  • Should an issue be created to discuss the addition of a sort before starting work on a PR?

iter_batched vs iter_batched_ref

I tried to figure out why cargo bench and the run_benchmarks.py script yielded different results for me. The option that made the difference was that the "cold_benchmarks" feature was enabled in the Python script but not with cargo bench. This seemed odd to me, as I was only executing the hot benchmarks. Do you know why this could be?

After replacing iter_batched() with iter_batched_ref() (see below), disabling or enabling the cold_benchmarks feature didn't seem to make a difference anymore.

let bench_name_hot = format!("{bench_name}-hot-{transform_name}-{pattern_name}-{test_size}");
if is_bench_name_ok(&bench_name_hot) {
    c.bench_function(&bench_name_hot, |b| {
        b.iter_batched(
            || transform(pattern_provider(test_size)),
            |mut test_data| {
                test_fn(black_box(test_data.as_mut_slice()));
                black_box(test_data); // side-effect
            },
            batch_size,
        )
    });
}

Is there a reason why you chose iter_batched() over iter_batched_ref()? The main difference to me seems to be the timing model: iter_batched() additionally accounts for the time it takes to drop the elements passed to its routine. For small test sizes this seems to matter substantially.
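
For reference, a minimal sketch of the iter_batched_ref() variant, reusing the same surrounding helpers as the snippet above: the routine now receives a mutable reference, and criterion drops the input outside the measured section.

let bench_name_hot = format!("{bench_name}-hot-{transform_name}-{pattern_name}-{test_size}");
if is_bench_name_ok(&bench_name_hot) {
    c.bench_function(&bench_name_hot, |b| {
        b.iter_batched_ref(
            || transform(pattern_provider(test_size)),
            // `test_data` is `&mut _` here; dropping the test data is no
            // longer part of the timed routine.
            |test_data| {
                test_fn(black_box(test_data.as_mut_slice()));
            },
            batch_size,
        )
    });
}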

It is absurd to compare stable sort implementations for u64

Hi. Thanks a lot for your sorting research! I'm currently reading this document: https://github.com/Voultapher/sort-research-rs/blob/main/writeup/glidesort_perf_analysis/text.md . I have not read it fully yet, but the very first graph seems very strange. It compares multiple sort algorithms whose names all end with "stable", i.e. it is assumed they are stable, and you seem to sort u64. But there is no sense in performing a stable sort on u64, because equal u64 values are actually identical, i.e. interchangeable.

SingeliSort?

I seem to be working on sorting again, so I ought to ask: do you have any interest in reviewing SingeliSort? I think of it as a quicksort (fluxsort/crumsort) hybrid with many base cases, and a glidesort-ish layer on top. Because of the Robin Hood base case it's much faster than fluxsort on random data, up to 3x or so for i32 in the 1e3 to 1e4 range.

There are still a few improvements I'd like to make, most importantly some kind of adaptive merge instead of always using the branchless one. I'm also considering an initial sample to skip the glidesort layer if there are many duplicates, which would fix the degradation you've observed in your scaling-random_p benchmark.

While I think results at this stage would still be interesting and mostly remain applicable, the main reason to ask now is to check if there's anything I can do to make SingeliSort more accessible, for benchmarking or in terms of source code. I do intend to add a section to the README describing the various parts and how they're related, as well as give some guidance on how to read the source.

Starfive VisionFive 2 riscv64 benchmark test

I wanted to drop by and share how long it took to run your cargo test suite.

[davidm@fedora-riscv sort-research-rs]$ cargo test --release
   Compiling cc v1.0.78
   Compiling serde_derive v1.0.152
   Compiling unicode-width v0.1.10
   Compiling bstr v0.2.17
   Compiling textwrap v0.11.0
   Compiling paste v1.0.11
   Compiling plotters-svg v0.3.3
   Compiling rayon-core v1.10.2
   Compiling serde_json v1.0.91
   Compiling syn v1.0.107
   Compiling sort_comp v0.1.0 (/home/davidm/sort-research-rs)
   Compiling num-traits v0.2.15
   Compiling itertools v0.10.5
   Compiling csv-core v0.1.10
   Compiling half v1.8.2
   Compiling regex-syntax v0.6.28
   Compiling cast v0.3.0
   Compiling bitflags v1.3.2
   Compiling same-file v1.0.6
   Compiling itoa v0.4.8
   Compiling csv v1.1.6
   Compiling criterion-plot v0.4.5
   Compiling regex v1.7.1
   Compiling walkdir v2.3.2
   Compiling clap v2.34.0
   Compiling serde_cbor v0.11.2
   Compiling plotters v0.3.4
   Compiling tinytemplate v1.2.1
   Compiling rayon v1.6.1
   Compiling rand v0.8.5
   Compiling atty v0.2.14
   Compiling once_cell v1.17.0
   Compiling oorandom v11.1.3
   Compiling core_affinity v0.7.6
   Compiling criterion v0.3.6
    Finished release [optimized] target(s) in 3m 51s
     Running unittests src/lib.rs (target/release/deps/sort_comp-dacceccf28ca295e)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/main.rs (target/release/deps/main-9118c8a514cdc1ec)

running 36 tests

Seed: 11674962991725472644
Testing: rust_ipn_stable

test basic ... ok
test ascending ... ok
test comp_panic ... ok
test all_equal ... ok
test descending ... ok
test fixed_seed ... ok
test int_edge ... ok
test observable_is_less ... ok
test observable_is_less_mut_ptr ... ok
test observable_is_less_u64 ... ok
test panic_observable_is_less ... ok
test panic_retain_original_set ... ok
test ascending_saw ... ok
test descending_saw ... ok
test pipe_organ ... ok
test random ... ok
test random_1024 ... ok
test random_16 ... ok
test random_4 ... ok
test random_256 ... ok
test random_8 ... ok
test random_binary ... ok
test random_f128 ... ok
test random_narrow ... ok
test dyn_val ... ok
test random_large_val ... ok
test random_u64 ... ok
test saw_mixed ... ok
test random_u128 ... ok
test sort_vs_sort_by ... ok
test stability ... ok
test stability_with_patterns ... ok
test saw_mixed_range ... ok
test violate_ord_retain_original_set ... ok
test random_ffi_str ... ok
test random_str ... ok

test result: ok. 36 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 17.63s

panic_observable_is_less assumes sorting algorithm does not use fewer comparisons for invalid inputs

Hey, glidesort passes all your tests except one. However, I believe that test checks for things that the sorting interface does not guarantee at all.

This check is the problematic one: https://github.com/Voultapher/sort-research-rs/blob/main/tests/main.rs#L853

Glidesort uses different amounts of memory and thus has different behavior for differently sized objects. You can't count the number of comparisons directly on i32 and then pass it something else and expect it to use the same number of comparisons.
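
(For illustration, a minimal sketch of the comparison-counting technique under discussion; count_comparisons is a hypothetical stand-in, not the repository's actual test code:)

use std::sync::atomic::{AtomicUsize, Ordering};

// Count how many times the comparator is invoked while sorting `v`.
fn count_comparisons<T: Ord>(v: &mut [T]) -> usize {
    let count = AtomicUsize::new(0);
    v.sort_by(|a, b| {
        count.fetch_add(1, Ordering::Relaxed);
        a.cmp(b)
    });
    count.into_inner()
}

fn main() {
    let mut a: Vec<i32> = (0..100).rev().collect();
    let mut b: Vec<u64> = (0..100).rev().collect();
    // The issue's point: these counts need not match, because an
    // implementation may choose different strategies per element size.
    println!("i32: {}, u64: {}", count_comparisons(&mut a), count_comparisons(&mut b));
}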

Old module references in fuzz testers

Both fuzz testers use sort_comp::stable::rust_ipn instead of sort_comp::unstable::rust_ipnsort. Also I see some changes in Cargo.lock when building, if you want to keep this up to date.

I'd also find it helpful to have the commands cargo install cargo-fuzz and cargo install afl listed in the README to get started quicker.

Naturally my goal is to run these on SingeliSort. The obvious replacement isn't working right away so I guess I'll need to actually figure out how some of this stuff works, but I'd appreciate any quick comments you have!
