google / crumsort-rs Goto Github PK
View Code? Open in Web Editor NEWA parallelized Rust port of crumsort
License: Apache License 2.0
A parallelized Rust port of crumsort
License: Apache License 2.0
Classically sort is seen as an operation that only changes the order of the elements. Modifying or duplicating elements is surprising behavior. And in Rust this can make code that relies on this property exhibit UB.
Below is an example of such an application, that when using slice::sort
or slice::sort_unstable
is fine and exhibits no UB. However when using crumsort::ParCrumSort::par_crumsort
it exhibits use-after-free UB. That's because crumsort_rs and the original crumsort may duplicate elements if the logic returns from unexpected places.
use std::collections::HashSet;
use std::sync::Mutex;
struct ObjStore {
scratch_arena: Vec<String>,
occupied_indices: Vec<ObjIndex>,
}
impl ObjStore {
fn new(values: &[usize]) -> Self {
// Ensure that there are no duplicate values.
let values_check = HashSet::<usize>::from_iter(values.iter().copied());
assert_eq!(values_check.len(), values.len());
let max_value = (*values.iter().max().unwrap()) + 1;
let mut scratch_arena: Vec<String> = Vec::with_capacity(max_value);
let scratch_arena_ptr = scratch_arena.as_mut_ptr();
for idx in values {
// SAFETY: We made sure that scratch_arena is large enough so that
// scratch_arena_ptr.add(*idx) is a valid pointer. And we made sure that idx only shows
// up once.
unsafe {
scratch_arena_ptr.add(*idx).write(format!("{idx:010}"));
}
}
Self {
scratch_arena,
occupied_indices: values.iter().map(|val| ObjIndex { val: *val }).collect(),
}
}
}
impl Drop for ObjStore {
fn drop(&mut self) {
let scratch_arena_ptr = self.scratch_arena.as_mut_ptr();
for idx in &self.occupied_indices {
// SAFETY: We made sure that scratch_arena is large enough and the ensured that
// scratch_arena and occupied_indices stay in sync, so that
// scratch_arena_ptr.add(idx.val) is a valid pointer. And we made sure that idx only
// shows up once.
unsafe {
std::ptr::drop_in_place(scratch_arena_ptr.add(idx.val));
}
}
}
}
#[derive(Copy, Clone, PartialEq, Eq, Default)]
struct ObjIndex {
val: usize,
}
static COMP_PANIC_TRIGGER_COUNT: Mutex<usize> = Mutex::new(0);
static COMP_COUNT: Mutex<usize> = Mutex::new(0);
impl PartialOrd for ObjIndex {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
// const COMP_PANIC_TRIGGER_COUNT: usize = 24;
let mut comp_count_lock_guard = COMP_COUNT.lock().unwrap();
if *comp_count_lock_guard == *COMP_PANIC_TRIGGER_COUNT.lock().unwrap() {
drop(comp_count_lock_guard);
panic!();
}
*comp_count_lock_guard += 1;
self.val.partial_cmp(&other.val)
}
}
impl Ord for ObjIndex {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.partial_cmp(other).unwrap()
}
}
fn main() {
*COMP_COUNT.lock().unwrap() = 0;
*COMP_PANIC_TRIGGER_COUNT.lock().unwrap() = 14;
let mut obj_store = ObjStore::new(&[218, 47, 7, 124, 163, 135, 222, 36, 31, 212]);
// Duplicates elements -> UB
crumsort::ParCrumSort::par_crumsort(obj_store.occupied_indices.as_mut_slice());
// works
// obj_store.occupied_indices.sort();
// obj_store.occupied_indices.sort_unstable();
}
And while crumsort::ParCrumSort::par_crumsort
doesn't directly trigger UB here, it's entirely conceivable that a user uses indices in a similar fashion in a real application, and that it panics because of some rare edge-case. This topic has been discussed in the context of the standard library sort implementations and the consensus was that not only is this behavior highly surprising and may lead to other logic issues down the line, it may also lead to UB as seen here.
I discovered this issue by running crumsort_rs in my custom sort implementation test harness https://github.com/Voultapher/sort-research-rs/blob/3d5175737ccc426cbf186ddffcf2f6344171dd98/sort_test_tools/src/tests.rs wich has tests specifically for this property, e.g. panic_retain_original_set
.
Classically sort is seen as an operation that only changes the order of the elements. Modifying or duplicating elements is surprising behavior. And in Rust this can make code that relies on this property exhibit UB.
Below is an example of such an application, that when using slice::sort
or slice::sort_unstable
is fine and exhibits no UB. However when using crumsort::ParCrumSort::par_crumsort
it exhibits use-after-free UB. That's because crumsort_rs and the original crumsort may duplicate elements if the comparison function does not implement a total order.
use std::cmp::Ordering;
use std::collections::HashSet;
use std::sync::Mutex;
struct ObjStore {
scratch_arena: Vec<String>,
occupied_indices: Vec<ObjIndex>,
}
impl ObjStore {
fn new(values: &[usize]) -> Self {
// Ensure that there are no duplicate values.
let values_check = HashSet::<usize>::from_iter(values.iter().copied());
assert_eq!(values_check.len(), values.len());
let max_value = (*values.iter().max().unwrap()) + 1;
let mut scratch_arena: Vec<String> = Vec::with_capacity(max_value);
let scratch_arena_ptr = scratch_arena.as_mut_ptr();
for idx in values {
// SAFETY: We made sure that scratch_arena is large enough so that
// scratch_arena_ptr.add(*idx) is a valid pointer.
unsafe {
scratch_arena_ptr.add(*idx).write(format!("{idx:010}"));
}
}
Self {
scratch_arena,
occupied_indices: values.iter().map(|val| ObjIndex { val: *val }).collect(),
}
}
}
impl Drop for ObjStore {
fn drop(&mut self) {
let scratch_arena_ptr = self.scratch_arena.as_mut_ptr();
for idx in &self.occupied_indices {
// SAFETY: We made sure that scratch_arena is large enough and the ensured that
// scratch_arena and occupied_indices stay in sync, so that
// scratch_arena_ptr.add(idx.val) is a valid pointer.
unsafe {
std::ptr::drop_in_place(scratch_arena_ptr.add(idx.val));
}
}
}
}
#[derive(Copy, Clone, PartialEq, Eq, Default)]
struct ObjIndex {
val: usize,
}
const ORD_VALS: [Ordering; 8] = [
Ordering::Less,
Ordering::Equal,
Ordering::Less,
Ordering::Greater,
Ordering::Greater,
Ordering::Greater,
Ordering::Less,
Ordering::Equal,
];
static COMP_COUNT: Mutex<usize> = Mutex::new(0);
impl PartialOrd for ObjIndex {
fn partial_cmp(&self, _other: &Self) -> Option<Ordering> {
// A logically wrong Ord implementation that doesn't implement a total order is considered
// safe code in Rust.
// This is here to have something simple and deterministic, real code would be more complex.
let comp_count = {
let mut c = COMP_COUNT.lock().unwrap();
let comp_count = *c;
*c += 1;
comp_count
};
Some(ORD_VALS[comp_count % ORD_VALS.len()])
}
}
impl Ord for ObjIndex {
fn cmp(&self, other: &Self) -> Ordering {
self.partial_cmp(other).unwrap()
}
}
fn main() {
let mut obj_store = ObjStore::new(&[218, 47, 7, 124, 163, 135, 222, 36, 31, 212]);
// Duplicates elements -> UB
crumsort::ParCrumSort::par_crumsort(obj_store.occupied_indices.as_mut_slice());
// works
// obj_store.occupied_indices.sort();
// obj_store.occupied_indices.sort_unstable();
}
And while crumsort::ParCrumSort::par_crumsort doesn't directly trigger UB here, it's entirely conceivable that a user uses indices in a similar fashion in a real application, and that they accidentally implement that looks a lot like a total order but isn't under some circumstances. This topic has been discussed in the context of the standard library sort implementations and the consensus was that not only is this behavior surprising and may lead to other logic issues down the line, it may also lead to UB as seen here.
I discovered this issue by running crumsort_rs in my custom sort implementation test harness https://github.com/Voultapher/sort-research-rs/blob/3d5175737ccc426cbf186ddffcf2f6344171dd98/sort_test_tools/src/tests.rs wich has tests specifically for this property, e.g. violate_ord_retain_original_set
.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.