Git Product home page Git Product logo

crumsort-rs's Issues

`Ord` implementations that panic can lead to UB

Classically sort is seen as an operation that only changes the order of the elements. Modifying or duplicating elements is surprising behavior. And in Rust this can make code that relies on this property exhibit UB.

Below is an example of such an application, that when using slice::sort or slice::sort_unstable is fine and exhibits no UB. However when using crumsort::ParCrumSort::par_crumsort it exhibits use-after-free UB. That's because crumsort_rs and the original crumsort may duplicate elements if the logic returns from unexpected places.

use std::collections::HashSet;
use std::sync::Mutex;

struct ObjStore {
    scratch_arena: Vec<String>,
    occupied_indices: Vec<ObjIndex>,
}

impl ObjStore {
    fn new(values: &[usize]) -> Self {
        // Ensure that there are no duplicate values.
        let values_check = HashSet::<usize>::from_iter(values.iter().copied());
        assert_eq!(values_check.len(), values.len());

        let max_value = (*values.iter().max().unwrap()) + 1;
        let mut scratch_arena: Vec<String> = Vec::with_capacity(max_value);

        let scratch_arena_ptr = scratch_arena.as_mut_ptr();
        for idx in values {
            // SAFETY: We made sure that scratch_arena is large enough so that
            // scratch_arena_ptr.add(*idx) is a valid pointer. And we made sure that idx only shows
            // up once.
            unsafe {
                scratch_arena_ptr.add(*idx).write(format!("{idx:010}"));
            }
        }

        Self {
            scratch_arena,
            occupied_indices: values.iter().map(|val| ObjIndex { val: *val }).collect(),
        }
    }
}

impl Drop for ObjStore {
    fn drop(&mut self) {
        let scratch_arena_ptr = self.scratch_arena.as_mut_ptr();
        for idx in &self.occupied_indices {
            // SAFETY: We made sure that scratch_arena is large enough and the ensured that
            // scratch_arena and occupied_indices stay in sync, so that
            // scratch_arena_ptr.add(idx.val) is a valid pointer. And we made sure that idx only
            // shows up once.
            unsafe {
                std::ptr::drop_in_place(scratch_arena_ptr.add(idx.val));
            }
        }
    }
}

#[derive(Copy, Clone, PartialEq, Eq, Default)]
struct ObjIndex {
    val: usize,
}

static COMP_PANIC_TRIGGER_COUNT: Mutex<usize> = Mutex::new(0);
static COMP_COUNT: Mutex<usize> = Mutex::new(0);

impl PartialOrd for ObjIndex {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        // const COMP_PANIC_TRIGGER_COUNT: usize = 24;

        let mut comp_count_lock_guard = COMP_COUNT.lock().unwrap();
        if *comp_count_lock_guard == *COMP_PANIC_TRIGGER_COUNT.lock().unwrap() {
            drop(comp_count_lock_guard);
            panic!();
        }
        *comp_count_lock_guard += 1;

        self.val.partial_cmp(&other.val)
    }
}

impl Ord for ObjIndex {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        self.partial_cmp(other).unwrap()
    }
}

fn main() {
    *COMP_COUNT.lock().unwrap() = 0;
    *COMP_PANIC_TRIGGER_COUNT.lock().unwrap() = 14;

    let mut obj_store = ObjStore::new(&[218, 47, 7, 124, 163, 135, 222, 36, 31, 212]);

    // Duplicates elements -> UB
    crumsort::ParCrumSort::par_crumsort(obj_store.occupied_indices.as_mut_slice());

    // works
    // obj_store.occupied_indices.sort();
    // obj_store.occupied_indices.sort_unstable();
}

And while crumsort::ParCrumSort::par_crumsort doesn't directly trigger UB here, it's entirely conceivable that a user uses indices in a similar fashion in a real application, and that it panics because of some rare edge-case. This topic has been discussed in the context of the standard library sort implementations and the consensus was that not only is this behavior highly surprising and may lead to other logic issues down the line, it may also lead to UB as seen here.

I discovered this issue by running crumsort_rs in my custom sort implementation test harness https://github.com/Voultapher/sort-research-rs/blob/3d5175737ccc426cbf186ddffcf2f6344171dd98/sort_test_tools/src/tests.rs wich has tests specifically for this property, e.g. panic_retain_original_set.

`Ord` implementations that don't implement a total order can lead to UB

Classically sort is seen as an operation that only changes the order of the elements. Modifying or duplicating elements is surprising behavior. And in Rust this can make code that relies on this property exhibit UB.

Below is an example of such an application, that when using slice::sort or slice::sort_unstable is fine and exhibits no UB. However when using crumsort::ParCrumSort::par_crumsort it exhibits use-after-free UB. That's because crumsort_rs and the original crumsort may duplicate elements if the comparison function does not implement a total order.

use std::cmp::Ordering;
use std::collections::HashSet;
use std::sync::Mutex;

struct ObjStore {
    scratch_arena: Vec<String>,
    occupied_indices: Vec<ObjIndex>,
}

impl ObjStore {
    fn new(values: &[usize]) -> Self {
        // Ensure that there are no duplicate values.
        let values_check = HashSet::<usize>::from_iter(values.iter().copied());
        assert_eq!(values_check.len(), values.len());

        let max_value = (*values.iter().max().unwrap()) + 1;
        let mut scratch_arena: Vec<String> = Vec::with_capacity(max_value);

        let scratch_arena_ptr = scratch_arena.as_mut_ptr();
        for idx in values {
            // SAFETY: We made sure that scratch_arena is large enough so that
            // scratch_arena_ptr.add(*idx) is a valid pointer.
            unsafe {
                scratch_arena_ptr.add(*idx).write(format!("{idx:010}"));
            }
        }

        Self {
            scratch_arena,
            occupied_indices: values.iter().map(|val| ObjIndex { val: *val }).collect(),
        }
    }
}

impl Drop for ObjStore {
    fn drop(&mut self) {
        let scratch_arena_ptr = self.scratch_arena.as_mut_ptr();
        for idx in &self.occupied_indices {
            // SAFETY: We made sure that scratch_arena is large enough and the ensured that
            // scratch_arena and occupied_indices stay in sync, so that
            // scratch_arena_ptr.add(idx.val) is a valid pointer.
            unsafe {
                std::ptr::drop_in_place(scratch_arena_ptr.add(idx.val));
            }
        }
    }
}

#[derive(Copy, Clone, PartialEq, Eq, Default)]
struct ObjIndex {
    val: usize,
}

const ORD_VALS: [Ordering; 8] = [
    Ordering::Less,
    Ordering::Equal,
    Ordering::Less,
    Ordering::Greater,
    Ordering::Greater,
    Ordering::Greater,
    Ordering::Less,
    Ordering::Equal,
];

static COMP_COUNT: Mutex<usize> = Mutex::new(0);

impl PartialOrd for ObjIndex {
    fn partial_cmp(&self, _other: &Self) -> Option<Ordering> {
        // A logically wrong Ord implementation that doesn't implement a total order is considered
        // safe code in Rust.

        // This is here to have something simple and deterministic, real code would be more complex.
        let comp_count = {
            let mut c = COMP_COUNT.lock().unwrap();
            let comp_count = *c;
            *c += 1;
            comp_count
        };

        Some(ORD_VALS[comp_count % ORD_VALS.len()])
    }
}

impl Ord for ObjIndex {
    fn cmp(&self, other: &Self) -> Ordering {
        self.partial_cmp(other).unwrap()
    }
}

fn main() {
    let mut obj_store = ObjStore::new(&[218, 47, 7, 124, 163, 135, 222, 36, 31, 212]);

    // Duplicates elements -> UB
    crumsort::ParCrumSort::par_crumsort(obj_store.occupied_indices.as_mut_slice());

    // works
    // obj_store.occupied_indices.sort();
    // obj_store.occupied_indices.sort_unstable();
}

And while crumsort::ParCrumSort::par_crumsort doesn't directly trigger UB here, it's entirely conceivable that a user uses indices in a similar fashion in a real application, and that they accidentally implement that looks a lot like a total order but isn't under some circumstances. This topic has been discussed in the context of the standard library sort implementations and the consensus was that not only is this behavior surprising and may lead to other logic issues down the line, it may also lead to UB as seen here.

I discovered this issue by running crumsort_rs in my custom sort implementation test harness https://github.com/Voultapher/sort-research-rs/blob/3d5175737ccc426cbf186ddffcf2f6344171dd98/sort_test_tools/src/tests.rs wich has tests specifically for this property, e.g. violate_ord_retain_original_set.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.