Git Product home page Git Product logo

rust-slimmer_box's Introduction

slimmer_box   Latest Version License

A SlimmerBox<T> is a packed alternative to Box<T> whose 'fat' pointer is 'slimmer'

Documentation

Rationale

A normal Box<[T]> is an owned 'fat pointer' that contains both the 'raw' pointer to memory as well as the size (as an usize) of the managed slice.

On 64-bit targets (where sizeof(usize) == sizeof(u64)), this makes a Box<[T]> take up 16 bytes (128 bits, 2 words). That's a shame: It means that if you build an enum that contains a Box<[T]>, then it will at least require 24 bytes (196 bits, 3 words) of stack memory.

But it is rather common to work with slices that will never be that large. For example, what if we store the size in a u32 instead? Will your slices really contain more than 2ˆ32 (4_294_967_296) elements? a [u8; 2^32] takes 4GiB of space.

And since the length is counted in elements, a [u64; 2^32] takes 32GiB.

So lets slim this 'fat' pointer down! By storing the length inside a u32 rather than a u64, a SlimmerBox<[T], u32> only takes up 12 bytes (96 bits, 1.5 words) rather than 16 bytes.

This allows it to be used inside another structure, such as in one or more variants of an enum. The resulting structure will then still only take up 16 bytes.

In situations where you are trying to optimize for memory usage, cache locality, etc, this might make a difference:

Motivating Example

The following 'small str optimization' enum still only takes up two words, just like a normal &str would:

use slimmer_box::SlimmerBox;
pub enum CompactStr {
    Small{buffer: [u8; 14], len: u8}, // <- Or, using the `modular_bitfield` crate, this could even be { buffer: [u8; 15], len: u4} !
    Large{ptr: SlimmerBox<str>},
}

impl From<&str> for CompactStr {
    fn from(val: &str) -> CompactStr {
        if val.len() < 14 {
            let len = val.len() as u8;
            let mut buffer = [0u8; 14];
            buffer[0..val.len()].copy_from_slice(val.as_bytes());
            CompactStr::Small{ buffer, len }
        } else {
            CompactStr::Large{ ptr: SlimmerBox::new(val) }
        }
    }
}

let compact_str: CompactStr = "hello world".into();
assert_eq!(core::mem::size_of_val(&compact_str), 16);

// An Option<CompactStr> also only takes up two words:
assert_eq!(core::mem::size_of_val(&Some(compact_str)), 16);

(A full version of this example including Debug, Display and Deref traits can be found in this test)

The following immutable AST still only takes up two words. Even Option<AST> is only two words:

pub enum AST {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(SlimmerBox<str>),
    Bytes(SlimmerBox<[u8]>),
    List(SlimmerBox<[AST]>),
    // 2^32 - 7 other variants could be added and the size would still stay the same :-)
}
assert_eq!(core::mem::size_of::<AST>(), 16);
assert_eq!(core::mem::size_of::<Option<AST>>(), 16);

With some care, you could even combine the above two examples together, and still end up with an AST type that takes up just two words!

Different sizes

SlimmerBox<T, u32> is the most common version, and therefore u32 is the default SlimmerMetadata to use. But it is possible to use another variant, if you are sure that your data will be even shorter.

  • SlimmerMetadata = () is used for sized types. In this case a SlimmerBox will only contain the normal pointer and be exactly 1 word size, just like a normal Box containing a sized type.
  • SlimmerMetadata = u64 would make SlimmerBox behave exactly like a normal Box containing a dynamically-sized type on a 64-bit system.
SlimmerMetadata max DST length¹ resulting size (32bit) resulting size (64bit) Notes
() - 4 bytes 8 bytes Used for normal sized types. Identical in size to a normal Box in this case.
u8 255 5 bytes 9 bytes
u16 65535 6 bytes 10 bytes Identical to Box on 16-bit systems
u32 4294967295 8 bytes (2 words) 12 bytes Identical to Box on 32-bit systems
u64 18446744073709551615 16 bytes (2 words) Identical to Box on 64-bit systems
  • ¹ Max DST length is in bytes for str and in the number of elements for slices.

Niche optimization

Just like a normal Box, sizeof(Option<SlimmerBox<T>>) == sizeof(SlimmerBox<T>).

Rkyv

rkyv's Archive, Serialize and Deserialize have been implemented for SlimmerBox. The serialized version of a SlimmerBox is 'just' a normal rkyv::ArchivedBox<[T]>. This is a match made in heaven, since rkyv's relative pointers use only 32 bits for the pointer part as well as the length part. As such, sizeof(rkyv::Archived<SlimmerBox<T>>) == 8 bytes (!). (This is assuming rkyv's feature size_32 is used which is the default. Changing it to size_64 is rarely useful for the same reason as the rant about lengths above.)

Limitations

You can not use a SlimmerBox to store a trait object. This is because the metadata of a dyn pointer is another full-sized pointer. We cannot make that smaller!

no_std support

SlimmerBox works perfectly fine in no_std environments, as long as the alloc crate is available.

(The only thing that is missing in no_std environments are implementations for SlimmerPointee of std::ffi::OsStr and std::ffi::CStr, neither of which exists when std is disabled.)

Usage Examples

(Below examples assume a 64-bit system)

Smaller than a normal Box for dynamically-sized types like slices or strings:

use slimmer_box::SlimmerBox;

let array: [u64; 4] = [1, 2, 3, 4];

let boxed_slice: Box<[u64]> = Box::from(&array[..]);
assert_eq!(core::mem::size_of_val(&boxed_slice), 16);

let slimmer_boxed_slice: SlimmerBox<[u64]> = SlimmerBox::new(&array[..]);
assert_eq!(core::mem::size_of_val(&slimmer_boxed_slice), 12);

Just like normal Box for normal, Sized types:

use slimmer_box::SlimmerBox;

let int = 42;

let boxed_int = Box::new(&int);
assert_eq!(core::mem::size_of_val(&boxed_int), 8);

let slimmer_boxed_int: SlimmerBox<u64, ()> = SlimmerBox::new(&int);
assert_eq!(core::mem::size_of_val(&slimmer_boxed_int), 8);

You can configure how much space you want to use for the length of a dynamically-sized slice or str:

use slimmer_box::SlimmerBox;

let array: [u64; 4] = [1, 2, 3, 4];
// Holds at most 255 elements:
let tiny: SlimmerBox<[u64], u8>  = SlimmerBox::new(&array);
assert_eq!(core::mem::size_of_val(&tiny), 9);

// Holds at most 65535 elements or a str of 64kb:
let small: SlimmerBox<[u64], u16>  = SlimmerBox::new(&array);
assert_eq!(core::mem::size_of_val(&small), 10);

// Holds at most 4294967295 elements or a str of 4GB:
let medium: SlimmerBox<[u64], u32>  = SlimmerBox::new(&array);
assert_eq!(core::mem::size_of_val(&medium), 12);

// Holds at most 18446744073709551615 elements, or a str of 16EiB:
let large: SlimmerBox<[u64], u64>  = SlimmerBox::new(&array); // <- Indistinguishable from a normal Box
assert_eq!(core::mem::size_of_val(&large), 16);

You can turn a Box into a SlimmerBox and vice-versa:

use slimmer_box::SlimmerBox;

let message = "hello, world!";
let boxed = Box::new(message);
let slimmer_box = SlimmerBox::from_box(boxed);
let again_boxed = SlimmerBox::into_box(slimmer_box);

Feature flags

  • "std". Enabled by default. Disable the default features to use the crate in no_std environments. slimmer_box does require the alloc crate to be available.
  • "rkyv". Enable support for the rkyv zero-copy serialisation/deserialisation library, which is a very good match for this crate!
  • "serde". Enable support for the serde serialisation/deserialisation library.

MSRV

The minimum supported Rust version of slimmer_box is 1.58.1.

rust-slimmer_box's People

Contributors

qqwy avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar

rust-slimmer_box's Issues

UB leading to real-world segfaults with SlimmerBox<[String]>

Hi,

First thanks for writing this, you saved me quite a lot of time. However, writing something like this in anywhere in my 1 million LOC program causes an imediate segfault:

let b = slimmer_box::SlimmerBox::from_box(vec!["1".to_string()].into_boxed_slice());
let v = Value::List(b.clone());
let c = v.clone();
Signal: SIGSEGV (signal SIGSEGV: address not mapped to object (fault address: 0x0))
Signal: SIGSEGV (signal SIGSEGV: address not mapped to object (fault address: 0x0))

I can't get a simple program to segfault but miri indeed picks up UB for this:

fn main() {
	let a = slimmer_box::SlimmerBox::<[String], u32>::from_box(vec!["1".to_string()].into_boxed_slice());
	let b = a.clone();

	println!("{:?}", a);
	println!("{:?}", b);
}
cargo +nightly miri run
error: Undefined Behavior: using uninitialized data, but this operation requires initialized memory
   --> /home/alexandre/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:278:9
    |
278 |         self.ptr.as_ptr()
    |         ^^^^^^^^ using uninitialized data, but this operation requires initialized memory
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE:
    = note: inside `alloc::raw_vec::RawVec::<u8>::ptr` at /home/alexandre/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:278:9: 278:17
    = note: inside `std::vec::Vec::<u8>::as_mut_ptr` at /home/alexandre/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1392:9: 1392:23
    = note: inside `<std::vec::Vec<u8> as std::ops::DerefMut>::deref_mut` at /home/alexandre/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2823:44: 2823:61
    = note: inside `std::vec::Vec::<u8>::as_mut_slice` at /home/alexandre/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1272:9: 1272:13
    = note: inside `std::vec::Vec::<u8>::clear` at /home/alexandre/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2229:31: 2229:50
    = note: inside `<[u8] as std::slice::SpecCloneIntoVec<u8, std::alloc::Global>>::clone_into` at /home/alexandre/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/slice.rs:833:9: 833:23
    = note: inside `<std::vec::Vec<u8> as std::clone::Clone>::clone_from` at /home/alexandre/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2872:9: 2872:76
    = note: inside `<std::string::String as std::clone::Clone>::clone_from` at /home/alexandre/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs:2107:9: 2107:41
    = note: inside `<[std::string::String] as slimmer_box::CloneUnsized>::unsized_clone_from` at /home/alexandre/.cargo/registry/src/index.crates.io-6f17d22bba15001f/slimmer_box-0.6.5/src/clone_unsized.rs:22:9: 22:38
    = note: inside `slimmer_box::SlimmerBox::<[std::string::String]>::try_new` at /home/alexandre/.cargo/registry/src/index.crates.io-6f17d22bba15001f/slimmer_box-0.6.5/src/lib.rs:335:13: 335:66
    = note: inside `slimmer_box::SlimmerBox::<[std::string::String]>::new_unchecked` at /home/alexandre/.cargo/registry/src/index.crates.io-6f17d22bba15001f/slimmer_box-0.6.5/src/lib.rs:319:9: 319:29
    = note: inside `<slimmer_box::SlimmerBox<[std::string::String]> as std::clone::Clone>::clone` at /home/alexandre/.cargo/registry/src/index.crates.io-6f17d22bba15001f/slimmer_box-0.6.5/src/lib.rs:586:18: 586:50
note: inside `main`
   --> src/main.rs:3:10
    |
3   |     let b = a.clone();
    |             ^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error

I'll try to investigate this sometime, but for now I'll store indices as I needed to de-duplicate strings anyways.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.