dermesser / integer-encoding-rs Goto Github PK

Integer encoding for primitive integer types: Supports varint/varint+zigzag and fixed-length integer encoding and decoding, and provides synchronous and asynchronous Write/Read types for easily writing/reading integers.

License: Other

Rust 98.99% Shell 1.01%

encoding endian endianness rust varint

integer-encoding-rs's People

Contributors

Stargazers

Watchers

Forkers

mesalock-linux sammyne yihuang zyansheep shuoli84 haoheharry isgasho icodein saethlin jorgecarleitao dignifiedquire stringhandler tari-project jaywhite2357

integer-encoding-rs's Issues

write instead of write_all causes very rare failures

In the very rare cases where the underlying stream can only accept a small number of bytes for the first write call, the VarIntWriter functions will behave incorrectly, discarding the bytes that aren't written in the first call. I believe these calls should be replaced with write_all

Miri complains of OOB caused by pointer size mismatch

Hey! Thanks for a cool (and fast) library.

We run miri on dependencies, and I think there may be some memory issues as of 6da6d58. You can trigger this on an 64bit ubuntu machine with cargo miri test fixed_tests::tests::test_invalid_decode_size

$ cargo miri test fixed_tests::tests::test_invalid_decode_size
    Finished test [unoptimized + debuginfo] target(s) in 0.03s
     Running unittests src/lib.rs (target/miri/x86_64-unknown-linux-gnu/debug/deps/integer_encoding-784333d1193fd5df)

running 1 test
test fixed_tests::tests::test_invalid_decode_size - should panic ... error: Undefined Behavior: memory access failed: alloc160031 has size 6, so pointer to 8 bytes starting at offset 0 is out-of-bounds
    --> /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs:2450:9
     |
2450 |         copy_nonoverlapping(src, dst, count)
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ memory access failed: alloc160031 has size 6, so pointer to 8 bytes starting at offset 0 is out-of-bounds
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
     = note: backtrace:
     = note: inside `std::intrinsics::copy_nonoverlapping::<u8>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs:2450:9
     = note: inside `std::ptr::read_unaligned::<u64>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:1199:9
     = note: inside `std::ptr::const_ptr::<impl *const u64>::read_unaligned` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/const_ptr.rs:1191:18
note: inside `<u64 as fixed::FixedInt>::decode_fixed` at src/fixed.rs:83:26
    --> src/fixed.rs:83:26
     |
83   |                 unsafe { (src.as_ptr() as *const $t).read_unaligned() }
     |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
103  | impl_fixedint!(u64);
     | ------------------- in this macro invocation
note: inside `fixed_tests::tests::test_invalid_decode_size` at src/fixed_tests.rs:144:24
    --> src/fixed_tests.rs:144:24
     |
144  |         assert_eq!(33, u64::decode_fixed(&[1, 0, 0, 0, 0, 1]));
     |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at src/fixed_tests.rs:143:5
    --> src/fixed_tests.rs:143:5
     |
142  |       #[test]
     |       ------- in this procedural macro expansion
143  | /     fn test_invalid_decode_size() {
144  | |         assert_eq!(33, u64::decode_fixed(&[1, 0, 0, 0, 0, 1]));
145  | |     }
     | |_____^
     = note: this error originates in the macro `impl_fixedint` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error

error: test failed, to rerun pass '--lib'

Panic triggered by Thrift's usage of library, introduced in 1.1.0

I've reported what I think is a problem in Thrift's use of this library, with a reproduction recipe, in https://issues.apache.org/jira/browse/THRIFT-5131.

Just in case it's actually a bug in this library I'm reporting it here too. I did "git bisect" to find the commit that introduced the problem, it's f7d55af.

Silent downcast overflow

When using VarintReader with types smaller than 64bits, the current implementation first reads a 64bit integer:

integer-encoding-rs/src/reader.rs

Lines 40 to 58 in 14cd007

 impl VarIntProcessor { 

 fn push(&mut self, b: u8) -> Result<()> { 

 if self.i >= 10 { 

 return Err(io::Error::new( 

 io::ErrorKind::InvalidData, 

 "Unterminated varint", 

 )); 

 } 

 self.buf[self.i] = b; 

 self.i += 1; 

 Ok(()) 

 } 

 fn finished(&self) -> bool { 

 self.i > 0 && (self.buf[self.i - 1] & MSB == 0) 

 } 

 fn decode<VI: VarInt>(&self) -> Option<VI> { 

 Some(VI::decode_var(&self.buf[0..self.i])?.0) 

 } 

 }

and then casts it to a smaller integer:

integer-encoding-rs/src/varint.rs

Lines 74 to 77 in 14cd007

 fn decode_var(src: &[u8]) -> Option<(Self, usize)> { 

 let (n, s) = u64::decode_var(src)?; 

 Some((n as Self, s)) 

 }

integer-encoding-rs/src/varint.rs

Lines 90 to 93 in 14cd007

 fn decode_var(src: &[u8]) -> Option<(Self, usize)> { 

 let (n, s) = i64::decode_var(src)?; 

 Some((n as Self, s)) 

 }

Now imagine a data stream w/ 9 bits 0xff followed by 0x00, which would be a pretty large 64bit integer. If you use VarIntReader::read_varint::<i32>(...) to decode that, it will sucessfully read the 64bit varint, consume all the bytes and during the conversion to 32bit will silently truncate the result. What it should do instead is to read only the number of bytes that are at max required for 32bit (5?) and then fail with "Unterminated varint".

Consider a new release?

tokio has been upgraded to 1.0, consider release a new version? So crates depending on integer-encoding can update their tokio version. :)

Panic during read

At this line, I'm encountering a panic when i becomes 10 and then the index is invalid. Should probably return an Err when i becomes that large, rather than indirectly cause a panic.

consider make zigzag explicitly instead of implicit apply it on signed

Zigzag and varint are two separate thing, though for signed integer, it is better to use both. But there are cases to apply only varint on signed integer. One example is thrift's compact protocol's sequence number field. It stated explicitly should be encoded as var int, though it is signed i32.

Is there any plan on supporing network-endian on FixedInt ?

reading/write from network-endian buffer or stream, which I think could be very helpful on network protocol things.

Decoding doesn't always return `None` when it should.

[0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x02] is the encoding of u64::MAX.
This does not return None when decoding even though it should.

I have implemented a fix here: #36.

tokio = 1.0

Tokio 1.0 released, consider release a new version? :)

Build of the crate fails on big-endian architecture

The build fails for the s390x architecture, might be a problem with any big-endian architecture.

failures:
---- fixed_tests::tests::test_i16_enc stdout ----
thread 'fixed_tests::tests::test_i16_enc' panicked at 'assertion failed: `(left == right)`
  left: `[128, 0]`,
 right: `[0, 128]`', src/fixed_tests.rs:26:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- fixed_tests::tests::test_i32_enc stdout ----
thread 'fixed_tests::tests::test_i32_enc' panicked at 'assertion failed: `(left == right)`
  left: `[255, 255, 128, 1]`,
 right: `[1, 128, 255, 255]`', src/fixed_tests.rs:31:9
---- fixed_tests::tests::test_i32_enc_light stdout ----
thread 'fixed_tests::tests::test_i32_enc_light' panicked at 'assertion failed: `(left == right)`
  left: `[255, 255, 128, 1]`,
 right: `[1, 128, 255, 255]`', src/fixed_tests.rs:48:9
---- fixed_tests::tests::test_u16_enc stdout ----
thread 'fixed_tests::tests::test_u16_enc' panicked at 'assertion failed: `(left == right)`
  left: `[1, 0]`,
 right: `[0, 1]`', src/fixed_tests.rs:21:9
---- fixed_tests::tests::test_u32_enc stdout ----
thread 'fixed_tests::tests::test_u32_enc' panicked at 'assertion failed: `(left == right)`
  left: `[0, 0, 0, 32]`,
 right: `[32, 0, 0, 0]`', src/fixed_tests.rs:16:9
failures:
    fixed_tests::tests::test_i16_enc
    fixed_tests::tests::test_i32_enc
    fixed_tests::tests::test_i32_enc_light
    fixed_tests::tests::test_u16_enc
    fixed_tests::tests::test_u32_enc
test result: FAILED. 18 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Add `#[forbid(unsafe_code)]`?

The crate fulfills this invariant, maybe we could add it?

Consider sealing `FixedInt`

I was reviewing why we could not use forbid(unsafe_code) as part of #28, and I think that we can trigger UB in safe Rust by an incorrect implementation of FixedInt:

#[test]
fn test_switch() {
    impl FixedInt for i128 {
        // the invariant Self <-> [u8; size_of<Self>] is violated
        const REQUIRED_SPACE: usize = 256;
        fn required_space() -> usize {
            todo!()
        }

        fn encode_fixed(self, dst: &mut [u8]) {
            todo!()
        }

        fn decode_fixed(src: &[u8]) -> Self {
            todo!()
        }

        fn encode_fixed_light<'a>(&'a self) -> &'a [u8] {
            todo!()
        }
    }

    let int = -32767i128;
    let int = int.switch_endianness();  // 💥
}

This case is specific to an invalid REQUIRED_SPACE and we can fix it assert_eq!(size_of::<Self>(), Self::REQUIRED_SPACE); on switch_endianness, but the notion more general - AFAIK the transmutes happening here are only valid "plain old data" types (bytesmuck::Pod).

A more exotic example is #[derive(Clone, Copy, Debug)] struct A<'a>(&'a [u8]); we allow impl FixedInt for A<'_> (even a correct REQUIRED_SPACE of 16 results in UB).

AFAIK we fundamentally can't have the current implementation of switch_endianness without assuming that the implementer fulfills the plain old data invariant.

I can't think of a way of mitigating this without breaking backward compatibility. Specifically, FixedInt must require more invariants than it currently requires, which is a backward incompatible change.

Assuming that we must break backward compatibility and none of the public APIs expects users to implement FixedInt, my preference is to seal the trait FixedInt - we only use to expose functions to some fundamental types (arguably because Rust std does not have traits for to_le_bytes, to_be_bytes, etc.).

This would give us the full flexibility to use it within the constraints we have specifically in the crate (the types that implement the trait). I am confident that it would allow us to remove all unsafe in the crate.

Alternatively, we could use FixedInt to FixedInt: bytesmuck::Pod (which introduces a new dependency :/)

EDIT: encode_fixed_light has an equivalent problem - transmute of structs to byte slices is only valid for bytesmuck::Zeroable structs - Rust allows uninitialized padding bytes, which transmuting to &[u8] results in UB (&[u8] assumes initialized bytes).

Would it be possible to separate writing a varint vs. wring a zig-zag varint?

Some background: I'm working on a thrift plugin that autogenerates Rust from a Thrift IDL. The compact wire-format uses both varints and zigzag encoded varints for numbers. From what I can tell they don't use zigzag encoding for some numbers (such as message sizes) because they're never negative.

I really wanted to use this crate but I noticed that the impl VarInt for i64 and u64 always zig-zag encodes the number before writing it out as a varint. Would there be a way to split this up? In other words, have a write_varint and write_zigzag_varint? I understand if you would prefer not to do this, but thought I'd ask.

Thank you - and I really appreciate your work on this crate - it's super clean!

Incorrect result without error on overflow

The decoded integer value is evaluated with respect to only the first bytes that can fit within the integer size in case of overflow.

e.g:

let n = 0x393939;
let buffer: &mut [u8] = &mut vec![0; 4][..];
let u = n.encode_var(buffer);
println!("{}", u);
println!("{:?}", buffer);

match u8::decode_var(buffer) {
    Some((u, b)) => {
        println!("{} {}", u, b);
    }
    None => panic!(),
}

4
[242, 228, 201, 3]
decoded: 114 4

Handling varint decode errors

If all bytes of the slice have MSB set, the library assumes the last slice does not have the MSB set, and silently truncates.

Furthermore, if the value of the varint overflows, it silently drops the overflowing bits.

These are malformed representations and should have error reported.

Integer encoding to buffer fails

The code below fails. I'm not sure why.

let num = 1654648798768;
let required_space = num.required_space();
let mut dst = Vec::with_capacity(required_space);
num.encode_var(&mut dst);

Implement async versions of VarIntReader and VarIntWriter

Using futures::io traits: AsyncRead and AsyncWrite.

Optimize the fast-path of 0 for varint

For varint decoding there is no fast-path for under 128(which fits within the 7 bits) and for encoding there is a fast-path for 0. This could be extended to 128 easily and still needs to set 2 variables(n and i) even though they are only used when n > 0.

decode_var return value changed in 1.2.0

The return value of decode_var changed from 1.1.x to 1.2.x by returning an optional value:

https://docs.rs/integer-encoding/1.1.5/integer_encoding/trait.VarInt.html#tymethod.decode_var
https://docs.rs/integer-encoding/1.2.0/integer_encoding/trait.VarInt.html#tymethod.decode_var

This leads to compilation failures in dependent crates that use this function and define

integer-encoding = "1.0"

in their Cargo.toml since cargo assumes semantic versioning and upgrades to version 1.2.1 when compiling.
This hits e.g. the sstable project, that won't compile after doing a cargo update or doing a fresh checkout.
My proposal would be to release the current version as 2.0.0 and yank both 1.2.0 and 1.2.1 so cargo does not consider these versions anymore.

	impl VarIntProcessor {
	fn push(&mut self, b: u8) -> Result<()> {
	if self.i >= 10 {
	return Err(io::Error::new(
	io::ErrorKind::InvalidData,
	"Unterminated varint",
	));
	}
	self.buf[self.i] = b;
	self.i += 1;
	Ok(())
	}
	fn finished(&self) -> bool {
	self.i > 0 && (self.buf[self.i - 1] & MSB == 0)
	}
	fn decode<VI: VarInt>(&self) -> Option<VI> {
	Some(VI::decode_var(&self.buf[0..self.i])?.0)
	}
	}

	fn decode_var(src: &[u8]) -> Option<(Self, usize)> {
	let (n, s) = u64::decode_var(src)?;
	Some((n as Self, s))
	}

	fn decode_var(src: &[u8]) -> Option<(Self, usize)> {
	let (n, s) = i64::decode_var(src)?;
	Some((n as Self, s))
	}