Git Product home page Git Product logo

integer-encoding-rs's People

Contributors

allengeorge avatar changrui0608 avatar dermesser avatar devashishdxt avatar dignifiedquire avatar jamesbornholt avatar jorgecarleitao avatar saethlin avatar yihuang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

integer-encoding-rs's Issues

write instead of write_all causes very rare failures

In the very rare cases where the underlying stream can only accept a small number of bytes for the first write call, the VarIntWriter functions will behave incorrectly, discarding the bytes that aren't written in the first call. I believe these calls should be replaced with write_all

Miri complains of OOB caused by pointer size mismatch

Hey! Thanks for a cool (and fast) library.

We run miri on dependencies, and I think there may be some memory issues as of 6da6d58. You can trigger this on an 64bit ubuntu machine with cargo miri test fixed_tests::tests::test_invalid_decode_size

$ cargo miri test fixed_tests::tests::test_invalid_decode_size
    Finished test [unoptimized + debuginfo] target(s) in 0.03s
     Running unittests src/lib.rs (target/miri/x86_64-unknown-linux-gnu/debug/deps/integer_encoding-784333d1193fd5df)

running 1 test
test fixed_tests::tests::test_invalid_decode_size - should panic ... error: Undefined Behavior: memory access failed: alloc160031 has size 6, so pointer to 8 bytes starting at offset 0 is out-of-bounds
    --> /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs:2450:9
     |
2450 |         copy_nonoverlapping(src, dst, count)
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ memory access failed: alloc160031 has size 6, so pointer to 8 bytes starting at offset 0 is out-of-bounds
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
     = note: backtrace:
     = note: inside `std::intrinsics::copy_nonoverlapping::<u8>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs:2450:9
     = note: inside `std::ptr::read_unaligned::<u64>` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:1199:9
     = note: inside `std::ptr::const_ptr::<impl *const u64>::read_unaligned` at /home/ubuntu/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/const_ptr.rs:1191:18
note: inside `<u64 as fixed::FixedInt>::decode_fixed` at src/fixed.rs:83:26
    --> src/fixed.rs:83:26
     |
83   |                 unsafe { (src.as_ptr() as *const $t).read_unaligned() }
     |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
103  | impl_fixedint!(u64);
     | ------------------- in this macro invocation
note: inside `fixed_tests::tests::test_invalid_decode_size` at src/fixed_tests.rs:144:24
    --> src/fixed_tests.rs:144:24
     |
144  |         assert_eq!(33, u64::decode_fixed(&[1, 0, 0, 0, 0, 1]));
     |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at src/fixed_tests.rs:143:5
    --> src/fixed_tests.rs:143:5
     |
142  |       #[test]
     |       ------- in this procedural macro expansion
143  | /     fn test_invalid_decode_size() {
144  | |         assert_eq!(33, u64::decode_fixed(&[1, 0, 0, 0, 0, 1]));
145  | |     }
     | |_____^
     = note: this error originates in the macro `impl_fixedint` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error

error: test failed, to rerun pass '--lib'

Silent downcast overflow

When using VarintReader with types smaller than 64bits, the current implementation first reads a 64bit integer:

impl VarIntProcessor {
fn push(&mut self, b: u8) -> Result<()> {
if self.i >= 10 {
return Err(io::Error::new(
io::ErrorKind::InvalidData,
"Unterminated varint",
));
}
self.buf[self.i] = b;
self.i += 1;
Ok(())
}
fn finished(&self) -> bool {
self.i > 0 && (self.buf[self.i - 1] & MSB == 0)
}
fn decode<VI: VarInt>(&self) -> Option<VI> {
Some(VI::decode_var(&self.buf[0..self.i])?.0)
}
}

and then casts it to a smaller integer:

fn decode_var(src: &[u8]) -> Option<(Self, usize)> {
let (n, s) = u64::decode_var(src)?;
Some((n as Self, s))
}

fn decode_var(src: &[u8]) -> Option<(Self, usize)> {
let (n, s) = i64::decode_var(src)?;
Some((n as Self, s))
}

Now imagine a data stream w/ 9 bits 0xff followed by 0x00, which would be a pretty large 64bit integer. If you use VarIntReader::read_varint::<i32>(...) to decode that, it will sucessfully read the 64bit varint, consume all the bytes and during the conversion to 32bit will silently truncate the result. What it should do instead is to read only the number of bytes that are at max required for 32bit (5?) and then fail with "Unterminated varint".

Consider a new release?

tokio has been upgraded to 1.0, consider release a new version? So crates depending on integer-encoding can update their tokio version. :)

Panic during read

At this line, I'm encountering a panic when i becomes 10 and then the index is invalid. Should probably return an Err when i becomes that large, rather than indirectly cause a panic.

consider make zigzag explicitly instead of implicit apply it on signed

Zigzag and varint are two separate thing, though for signed integer, it is better to use both. But there are cases to apply only varint on signed integer. One example is thrift's compact protocol's sequence number field. It stated explicitly should be encoded as var int, though it is signed i32.

tokio = 1.0

Tokio 1.0 released, consider release a new version? :)

Build of the crate fails on big-endian architecture

The build fails for the s390x architecture, might be a problem with any big-endian architecture.

failures:
---- fixed_tests::tests::test_i16_enc stdout ----
thread 'fixed_tests::tests::test_i16_enc' panicked at 'assertion failed: `(left == right)`
  left: `[128, 0]`,
 right: `[0, 128]`', src/fixed_tests.rs:26:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- fixed_tests::tests::test_i32_enc stdout ----
thread 'fixed_tests::tests::test_i32_enc' panicked at 'assertion failed: `(left == right)`
  left: `[255, 255, 128, 1]`,
 right: `[1, 128, 255, 255]`', src/fixed_tests.rs:31:9
---- fixed_tests::tests::test_i32_enc_light stdout ----
thread 'fixed_tests::tests::test_i32_enc_light' panicked at 'assertion failed: `(left == right)`
  left: `[255, 255, 128, 1]`,
 right: `[1, 128, 255, 255]`', src/fixed_tests.rs:48:9
---- fixed_tests::tests::test_u16_enc stdout ----
thread 'fixed_tests::tests::test_u16_enc' panicked at 'assertion failed: `(left == right)`
  left: `[1, 0]`,
 right: `[0, 1]`', src/fixed_tests.rs:21:9
---- fixed_tests::tests::test_u32_enc stdout ----
thread 'fixed_tests::tests::test_u32_enc' panicked at 'assertion failed: `(left == right)`
  left: `[0, 0, 0, 32]`,
 right: `[32, 0, 0, 0]`', src/fixed_tests.rs:16:9
failures:
    fixed_tests::tests::test_i16_enc
    fixed_tests::tests::test_i32_enc
    fixed_tests::tests::test_i32_enc_light
    fixed_tests::tests::test_u16_enc
    fixed_tests::tests::test_u32_enc
test result: FAILED. 18 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Consider sealing `FixedInt`

I was reviewing why we could not use forbid(unsafe_code) as part of #28, and I think that we can trigger UB in safe Rust by an incorrect implementation of FixedInt:

#[test]
fn test_switch() {
    impl FixedInt for i128 {
        // the invariant Self <-> [u8; size_of<Self>] is violated
        const REQUIRED_SPACE: usize = 256;
        fn required_space() -> usize {
            todo!()
        }

        fn encode_fixed(self, dst: &mut [u8]) {
            todo!()
        }

        fn decode_fixed(src: &[u8]) -> Self {
            todo!()
        }

        fn encode_fixed_light<'a>(&'a self) -> &'a [u8] {
            todo!()
        }
    }

    let int = -32767i128;
    let int = int.switch_endianness();  // ๐Ÿ’ฅ
}

This case is specific to an invalid REQUIRED_SPACE and we can fix it assert_eq!(size_of::<Self>(), Self::REQUIRED_SPACE); on switch_endianness, but the notion more general - AFAIK the transmutes happening here are only valid "plain old data" types (bytesmuck::Pod).

A more exotic example is #[derive(Clone, Copy, Debug)] struct A<'a>(&'a [u8]); we allow impl FixedInt for A<'_> (even a correct REQUIRED_SPACE of 16 results in UB).

AFAIK we fundamentally can't have the current implementation of switch_endianness without assuming that the implementer fulfills the plain old data invariant.

I can't think of a way of mitigating this without breaking backward compatibility. Specifically, FixedInt must require more invariants than it currently requires, which is a backward incompatible change.

Assuming that we must break backward compatibility and none of the public APIs expects users to implement FixedInt, my preference is to seal the trait FixedInt - we only use to expose functions to some fundamental types (arguably because Rust std does not have traits for to_le_bytes, to_be_bytes, etc.).

This would give us the full flexibility to use it within the constraints we have specifically in the crate (the types that implement the trait). I am confident that it would allow us to remove all unsafe in the crate.

Alternatively, we could use FixedInt to FixedInt: bytesmuck::Pod (which introduces a new dependency :/)

EDIT: encode_fixed_light has an equivalent problem - transmute of structs to byte slices is only valid for bytesmuck::Zeroable structs - Rust allows uninitialized padding bytes, which transmuting to &[u8] results in UB (&[u8] assumes initialized bytes).

Would it be possible to separate writing a varint vs. wring a zig-zag varint?

Some background: I'm working on a thrift plugin that autogenerates Rust from a Thrift IDL. The compact wire-format uses both varints and zigzag encoded varints for numbers. From what I can tell they don't use zigzag encoding for some numbers (such as message sizes) because they're never negative.

I really wanted to use this crate but I noticed that the impl VarInt for i64 and u64 always zig-zag encodes the number before writing it out as a varint. Would there be a way to split this up? In other words, have a write_varint and write_zigzag_varint? I understand if you would prefer not to do this, but thought I'd ask.

Thank you - and I really appreciate your work on this crate - it's super clean!

Incorrect result without error on overflow

The decoded integer value is evaluated with respect to only the first bytes that can fit within the integer size in case of overflow.

e.g:

let n = 0x393939;
let buffer: &mut [u8] = &mut vec![0; 4][..];
let u = n.encode_var(buffer);
println!("{}", u);
println!("{:?}", buffer);

match u8::decode_var(buffer) {
    Some((u, b)) => {
        println!("{} {}", u, b);
    }
    None => panic!(),
}
4
[242, 228, 201, 3]
decoded: 114 4

Handling varint decode errors

If all bytes of the slice have MSB set, the library assumes the last slice does not have the MSB set, and silently truncates.

Furthermore, if the value of the varint overflows, it silently drops the overflowing bits.

These are malformed representations and should have error reported.

Integer encoding to buffer fails

The code below fails. I'm not sure why.

let num = 1654648798768;
let required_space = num.required_space();
let mut dst = Vec::with_capacity(required_space);
num.encode_var(&mut dst);

Optimize the fast-path of 0 for varint

For varint decoding there is no fast-path for under 128(which fits within the 7 bits) and for encoding there is a fast-path for 0. This could be extended to 128 easily and still needs to set 2 variables(n and i) even though they are only used when n > 0.

decode_var return value changed in 1.2.0

The return value of decode_var changed from 1.1.x to 1.2.x by returning an optional value:

https://docs.rs/integer-encoding/1.1.5/integer_encoding/trait.VarInt.html#tymethod.decode_var
https://docs.rs/integer-encoding/1.2.0/integer_encoding/trait.VarInt.html#tymethod.decode_var

This leads to compilation failures in dependent crates that use this function and define

integer-encoding = "1.0"

in their Cargo.toml since cargo assumes semantic versioning and upgrades to version 1.2.1 when compiling.
This hits e.g. the sstable project, that won't compile after doing a cargo update or doing a fresh checkout.
My proposal would be to release the current version as 2.0.0 and yank both 1.2.0 and 1.2.1 so cargo does not consider these versions anymore.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.