dlhn's Introduction

DLHN

DLHN is a blazing fast and small data serialization format.
Specification

Overview

DLHN (pronounced the same as "Dullahan") is a language- and platform-neutral binary serialization format inspired by JSON, CSV, MessagePack, and Protocol Buffers. It is designed for blazing fast serialization and deserialization with the smallest possible data size, without the need for a schema file. However, we are also considering supporting schema files in the future.

QuickStart

[dependencies]
dlhn = "0.1"

Serialize and deserialize body

use dlhn::{Deserializer, Serializer};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, PartialEq, Debug)]
struct Test {
    a: bool,
    b: u8,
    c: String,
}

fn main() {
    let body = Test {
        a: true,
        b: 123,
        c: "test".to_string(),
    };

    // Serialize body
    let mut output = Vec::new();
    let mut serializer = Serializer::new(&mut output);
    body.serialize(&mut serializer).unwrap();

    // Deserialize body
    let mut reader = output.as_slice();
    let mut deserializer = Deserializer::new(&mut reader);
    let deserialized_body = Test::deserialize(&mut deserializer).unwrap();

    assert_eq!(body, deserialized_body);
}

Serialize and deserialize header

use dlhn::{DeserializeHeader, SerializeHeader, Header};

#[derive(SerializeHeader)]
struct Test {
    a: bool,
    b: u8,
    c: String,
}

fn main() {
    let mut output = Vec::new();

    // Serialize header
    Test::serialize_header(&mut output).unwrap();
    assert_eq!(
        output,
        [
            21, // Tuple code
            3,  // Tuple size
            2,  // Boolean code
            3,  // UInt8 code
            18, // String code
        ]
    );

    // Deserialize header
    let deserialized_header = output.as_slice().deserialize_header().unwrap();
    assert_eq!(
        deserialized_header,
        Header::Tuple(vec![Header::Boolean, Header::UInt8, Header::String])
    );
}

Serialize and deserialize bodies as a stream

use dlhn::{de::Error, Deserializer, Serializer};
use serde::{Deserialize, Serialize};

fn main() {
    let mut output = Vec::new();

    // Serialize body
    let mut serializer = Serializer::new(&mut output);
    true.serialize(&mut serializer).unwrap();
    false.serialize(&mut serializer).unwrap();
    assert_eq!(output, [1, 0]);

    // Deserialize body
    let mut reader = output.as_slice();
    let mut deserializer = Deserializer::new(&mut reader);
    assert_eq!(bool::deserialize(&mut deserializer), Ok(true));
    assert_eq!(bool::deserialize(&mut deserializer), Ok(false));
    assert_eq!(bool::deserialize(&mut deserializer), Err(Error::Read));
}

Benchmark

Rust serialization benchmark

Copyright

Copyright 2020-2022 Shogo Otake

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

dlhn's People

Contributors

otake84

dlhn's Issues

u64 serialize/deserialize doesn't match

This test scenario fails:

    use serde::{Deserialize, Serialize};

    #[test]
    fn test_dlhn_u64() {
        let id: u64 = 37486878914941;

        // Serialize the value into a byte buffer.
        let mut buf = Vec::new();
        let mut serializer = dlhn::Serializer::new(&mut buf);
        id.serialize(&mut serializer).unwrap();

        // Deserialize it back from the same bytes.
        let mut reader = buf.as_slice();
        let mut deserializer = dlhn::Deserializer::new(&mut reader);
        let id_deserialized = u64::deserialize(&mut deserializer).unwrap();

        // The round trip should return the original value.
        assert_eq!(id, id_deserialized);
    }

String > 128 bytes fails to roundtrip

dlhn = "0.1.6"
serde = "1.0"

fn main() {
    use serde::{Serialize, Deserialize};

    let original = " ".repeat(129);
    let mut serialized = vec![];
    original.serialize(&mut dlhn::Serializer::new(&mut serialized)).unwrap();

    let deserialized = String::deserialize(&mut dlhn::Deserializer::new(&mut serialized.as_slice())).unwrap();
    assert_eq!(original, deserialized);
}

The string is serialized with all 129 bytes, but only 128 bytes are deserialized, so the assert_eq fails. This is caused by #13.
A possible fix for deserializing String and Vec<u8> that wouldn't break #11:

fn main() -> std::io::Result<()> {
    use std::io::{self, Read};
    let reader: &mut dyn Read = &mut [b' '; 100].as_slice();
    let len = 10;

    let mut s = String::new();
    // `take` consumes its receiver, so reborrow to keep `reader` usable for the second read.
    if (&mut *reader).take(len as u64).read_to_string(&mut s)? != len {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, ""));
    }
    assert_eq!(s, " ".repeat(len));

    let mut v = Vec::new();
    if (&mut *reader).take(len as u64).read_to_end(&mut v)? != len {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, ""));
    }
    assert_eq!(v, vec![b' '; len]);

    Ok(())
}

Header should contain property names?

This is great. I love how the header is optional to keep the data size small when the format is known at compile time!

DLHN is a language and platform neutral binary serialization format. It is designed for blazing fast serialization and deserialization with the smallest possible data size without the need for schema file.

If I wanted to create a parser, for example for JavaScript with no schema file, the header does not contain enough information to allow me to reconstruct the object as property names would be required.
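
To make the point concrete, here is a minimal sketch that does not use the dlhn crate. `HeaderSketch` is a hypothetical stand-in modeled on the `Header::Tuple(vec![Header::Boolean, Header::UInt8, Header::String])` value shown in the README, and the two structs and helper functions are invented for illustration. Assuming the header records only field types in order, two structs with different property names would be indistinguishable:

```rust
/// Hypothetical, simplified stand-in for a DLHN header: it records types only.
#[derive(Debug, PartialEq)]
enum HeaderSketch {
    Boolean,
    UInt8,
    String,
    Tuple(Vec<HeaderSketch>),
}

// Two unrelated structs whose fields have the same types in the same order.
#[allow(dead_code)]
struct User {
    active: bool,
    age: u8,
    name: String,
}

#[allow(dead_code)]
struct Sensor {
    enabled: bool,
    level: u8,
    label: String,
}

/// Header for `User`, written by hand for this sketch.
fn user_header() -> HeaderSketch {
    HeaderSketch::Tuple(vec![
        HeaderSketch::Boolean,
        HeaderSketch::UInt8,
        HeaderSketch::String,
    ])
}

/// Header for `Sensor`, written by hand for this sketch.
fn sensor_header() -> HeaderSketch {
    HeaderSketch::Tuple(vec![
        HeaderSketch::Boolean,
        HeaderSketch::UInt8,
        HeaderSketch::String,
    ])
}

fn main() {
    // Field names never appear in the header, so the two headers are identical.
    assert_eq!(user_header(), sensor_header());
}
```

A decoder working from such a header alone could rebuild a positional tuple, but not a keyed object.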

Error when decoding bytes dlhn produced

Hi!

Included in musli is a deterministic fuzzer. One thing it does is generate random data structures and then try to pipe them through a serialize / deserialize phase.

I've had to disable dlhn now because it produces bytes that it fails to deserialize:

> cargo run -p musli-tests --bin fuzz --features dlhn -- large
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/fuzz large`
serde_dlhn/large: .E
0: error during decode: Read error
0: failing structure written to target/serde_dlhn_error.bin
 734.045µs

This is the structure being serialized.

You can check out the project yourself and give the above command a run and you should be able to troubleshoot this yourself.

Example of serialized data

Is there a representation somewhere of what the serialized data will look like? In the specification, I saw only a representation of specific types.
It may be worth adding a small Overview, which will show the result of serialization of a simple struct.

Untrusted input can be crafted to cause large internal allocations

Found through fuzzing, should be reproducible if you check out musli and run this:

cargo +nightly miri run --bin fuzz --no-default-features --features model_dlhn,dlhn -- --random --iter 10
MIRI backtrace
error: resource exhaustion: tried to allocate more memory than available to compiler
   --> C:\Users\udoprog\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\alloc.rs:164:14
    |
164 |     unsafe { __rust_alloc_zeroed(layout.size(), layout.align()) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ tried to allocate more memory than available to compiler
    |
    = note: inside `std::alloc::alloc_zeroed` at C:\Users\udoprog\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\alloc.rs:164:14: 164:64
    = note: inside `std::alloc::Global::alloc_impl` at C:\Users\udoprog\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\alloc.rs:175:43: 175:63
    = note: inside `<std::alloc::Global as std::alloc::Allocator>::allocate_zeroed` at C:\Users\udoprog\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\alloc.rs:240:9: 240:38
    = note: inside `alloc::raw_vec::RawVec::<u8>::allocate_in` at C:\Users\udoprog\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\raw_vec.rs:185:38: 185:67      
    = note: inside `alloc::raw_vec::RawVec::<u8>::with_capacity_zeroed_in` at C:\Users\udoprog\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\raw_vec.rs:138:9: 138:62
    = note: inside `<u8 as std::vec::spec_from_elem::SpecFromElem>::from_elem::<std::alloc::Global>` at C:\Users\udoprog\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\spec_from_elem.rs:52:31: 52:72
    = note: inside `std::vec::from_elem::<u8>` at C:\Users\udoprog\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\mod.rs:2550:5: 2550:52
    = note: inside `dlhn::de::Deserializer::<'_, &[u8]>::new_dynamic_buf` at C:\Users\udoprog\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\macros.rs:47:36: 47:69
    = note: inside `<&mut dlhn::de::Deserializer<'_, &[u8]> as serde::de::Deserializer<'_>>::deserialize_string::<serde::de::impls::StringVisitor>` at C:\Users\udoprog\.cargo\registry\src\index.crates.io-6f17d22bba15001f\dlhn-0.1.5\src\de.rs:217:28: 217:50
    = note: inside `serde::de::impls::<impl serde::de::Deserialize<'_> for std::string::String>::deserialize::<&mut dlhn::de::Deserializer<'_, &[u8]>>` at C:\Users\udoprog\.cargo\registry\src\index.crates.io-6f17d22bba15001f\serde-1.0.163\src\de\impls.rs:586:9: 586:55
    = note: inside `<std::marker::PhantomData<std::string::String> as serde::de::DeserializeSeed<'_>>::deserialize::<&mut dlhn::de::Deserializer<'_, &[u8]>>` at C:\Users\udoprog\.cargo\registry\src\index.crates.io-6f17d22bba15001f\serde-1.0.163\src\de\mod.rs:791:9: 791:37
    = note: inside `<dlhn::de::StructDeserializer<'_, '_, &[u8]> as serde::de::MapAccess<'_>>::next_value_seed::<std::marker::PhantomData<std::string::String>>` at C:\Users\udoprog\.cargo\registry\src\index.crates.io-6f17d22bba15001f\dlhn-0.1.5\src\de.rs:457:9: 457:50
    = note: inside `<dlhn::de::StructDeserializer<'_, '_, &[u8]> as serde::de::MapAccess<'_>>::next_value::<std::string::String>` at C:\Users\udoprog\.cargo\registry\src\index.crates.io-6f17d22bba15001f\serde-1.0.163\src\de\mod.rs:1870:9: 1870:42
note: inside `<musli_tests::models::_::<impl serde::de::Deserialize<'de> for musli_tests::models::Allocated>::deserialize::__Visitor<'_> as serde::de::Visitor<'_>>::visit_map::<dlhn::de::StructDeserializer<'_, '_, &[u8]>>`
   --> D:\Repo\projects\repos\musli\crates\musli-tests\src\models.rs:106:49

This is the problematic line: https://github.com/otake84/dlhn/blob/6f25c178a255c93eab6f18aa3ca5e4b11b504380/dlhn/src/de.rs#LL58C12-L58C12.

The cause is fairly straightforward: a vector is being eagerly allocated and zeroed, and its size is taken verbatim from the byte stream. A few bytes of payload can therefore be used to allocate as much memory as you want on the target machine without it actually being filled with any input data.

There are several ways to mitigate this:

  • Apply a limit to how much you pre-allocate when allocating data structures. They will then grow incrementally as they're being filled with input. This is the most common approach. With it, causing a huge allocation requires processing a huge amount of input data as well, and that is usually already limited.
  • You can apply a limit to how much you'll allocate in total. But that's hard with Rust allocation APIs being unstable right now. I will certainly do this when allocator APIs are stable.
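
The first mitigation could be sketched roughly as follows, using only the standard library. `read_bytes_bounded` and `MAX_PREALLOC` are hypothetical names with an arbitrary limit, not part of dlhn. The idea is to cap the up-front reservation and let the buffer grow only as real input arrives:

```rust
use std::io::{self, Read};

/// Upper bound on bytes reserved before any payload is read.
/// The value is an arbitrary choice for this sketch.
const MAX_PREALLOC: usize = 4096;

/// Reads exactly `len` bytes, where `len` comes from untrusted input.
/// Hypothetical helper, not part of dlhn.
fn read_bytes_bounded<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    // Reserve at most MAX_PREALLOC bytes up front, regardless of the declared length.
    let mut buf = Vec::with_capacity(len.min(MAX_PREALLOC));

    // `read_to_end` grows the vector only as bytes actually arrive.
    let read = reader.by_ref().take(len as u64).read_to_end(&mut buf)?;
    if read != len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "input shorter than declared length",
        ));
    }
    Ok(buf)
}

fn main() -> io::Result<()> {
    // A tiny payload that declares a huge length fails without a huge allocation.
    let mut short: &[u8] = b"abc";
    assert!(read_bytes_bounded(&mut short, usize::MAX / 2).is_err());

    // A well-formed payload is read as usual.
    let mut ok: &[u8] = b"hello world";
    assert_eq!(read_bytes_bounded(&mut ok, 5)?, b"hello");
    Ok(())
}
```

With this shape, memory use is bounded by the amount of input actually supplied plus the fixed reservation, not by the length field alone.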
