sharksforarms / deku
Declarative binary reading and writing: bit-level, symmetric, serialization/deserialization
License: Apache License 2.0
Defaults the member to the default of the type and skips reading
Test against bitvec `develop` branch
`count` is the only difference and could be added to `BitsReader::read` as an `Option<usize>`
pub struct MyStruct {
ext_len: usize,
#[deku(len_field = "ext_len")]
extensions: Vec<Extension>,
}
Allow something like this and update the `ext_len` before dumping bits to `acc`
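A hand-rolled sketch (plain Rust, no deku) of what such an update-before-write could do, with a plain `Vec<u8>` standing in for both `Vec<Extension>` and the `acc` bit accumulator:

```rust
// Sketch: recompute `ext_len` from the actual vector length before
// serializing, so the two can never disagree. Names are illustrative.
struct MyStruct {
    ext_len: usize,
    extensions: Vec<u8>, // stand-in for Vec<Extension>
}

impl MyStruct {
    fn to_bytes(&mut self) -> Vec<u8> {
        // Update the length field from the data before dumping to the accumulator
        self.ext_len = self.extensions.len();
        let mut acc = Vec::new();
        acc.push(self.ext_len as u8);
        acc.extend_from_slice(&self.extensions);
        acc
    }
}
```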
For example, passing the top-level endian down to its child:
#[deku(endian = "big")]
struct Parent {
child: Child
}
#[deku(id_type = "u16", ctx = "_endian: deku::ctx::Endian")] // will default to system endianness, no way to use ctx endian
enum Child {
Variant
}
I feel like another name would be better suited, possibly matching the proc-macro's names if that's the convention: `DekuRead` / `DekuWrite`
For example:
enum Foo {
#[deku(id = "0..=9")]
A(u8),
#[deku(id = "id if id > 9")]
B(u8, u8),
}
For composability, it would be nice to do something like the following:
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct FieldB {
#[deku(bits = "6")]
data: u8,
}
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct DekuTest {
#[deku(bits = "2")]
field_a: u8,
field_b: FieldB
}
I can't think of a reason why there's both `id_bits` and `bits`; for example, there's `endian` but not `id_endian` (we use `endian` for enums). This would be more consistent with fields/structs.
Also rename `id_type` to `type` and `id` to `value`.
Before:
#[deku(id_type = "u8", id_bits = "5")]
enum Test {
#[deku(id = "0x01")]
VarA,
}
After:
#[deku(type = "u8", bits = "5")]
enum Test {
#[deku(value = "0x01")]
VarA,
}
Instead of specifying arbitrary code in these attributes, only accept a function ident (like serde's `default`) and document the function prototypes for each
Current enum behavior
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_type = "u8")]
enum Packet {
#[deku(id = "0x00")]
Zero,
#[deku(id = "0x01")]
One,
#[deku(id = "0x02")]
Two,
#[deku(id = "0x03")]
Three,
#[deku(id = "0x04")]
Four,
}
An `inherit` attribute, which would inherit the `id` from the value already assigned:
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_type = "u8", inherit)]
enum Packet {
Zero = 0x00,
One = 0x01,
Two = 0x02,
Three = 0x03,
Four = 0x04,
}
An `ordered` attribute, which would take the first element and increase the `id` value for each variant after it. Maybe this would increase for every enum variant that didn't have an `id` defined.
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_type = "u8", ordered = "0x00")]
enum Packet {
Zero,
One,
Two,
Three,
Four,
#[deku(id = "44")]
FourtyFour,
}
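For unit-only enums with explicit discriminants, plain Rust already exposes the value an `inherit` attribute would reuse; a sketch (no deku) of what the generated id accessor could amount to:

```rust
// Sketch: with `inherit`, the derive could reuse the discriminant as the
// wire id. For unit-only enums, `as u8` already yields exactly that value.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Packet {
    Zero = 0x00,
    One = 0x01,
    Two = 0x02,
    Three = 0x03,
    Four = 0x04,
}

impl Packet {
    // What a generated id-writer could call under `inherit`
    fn id(self) -> u8 {
        self as u8
    }
}
```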
A fixed number of elements to be read
Example:
struct Test {
#[deku(count = "2")]
data: Vec<u8>
}
Possibly also rename the `len` attribute to `count_field`?
Currently, when an error happens in the derive macro, it will just panic; we should use `syn::Error::to_compile_error()` instead.
For example, an invalid attribute gives this message:
error: proc-macro derive panicked
--> src\main.rs:3:10
|
3 | #[derive(DekuRead)]
| ^^^^^^^^
|
= help: message: called `Result::unwrap()` on an `Err` value: Error { kind: UnknownField(ErrorUnknownField { name: "id", did_you_mean: None }), locations: ["b"], span: Some(#0 bytes(79..86)) }
A better message could be:
error: unknown deku field attribute `id`
--> src\main.rs:3:10
|
7 | #[deku(id = "")]
| ^^
Allow running a function on the read value
Examples:
struct SomeStruct {
#[deku(map = "|f: u8| f.to_string()")]
field_a: String,
}
fn map_string(f: u8) -> String {
f.to_string()
}
struct SomeStruct {
#[deku(map = "map_string")]
field_a: String,
}
Edit:
Can do something with trait calls like so:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9a711662336e1b0059e40947250b05ff
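Roughly, a `map` attribute would expand to calling the given function on the freshly read value; a plain-Rust sketch (no deku, byte slice instead of the real bit-level reader) of that expansion:

```rust
fn map_string(f: u8) -> String {
    f.to_string()
}

// Sketch of the generated read: parse the raw u8, then run it through
// the `map` function before storing it in the field.
fn read_field_a(rest: &[u8]) -> (String, &[u8]) {
    let raw = rest[0];
    (map_string(raw), &rest[1..])
}
```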
Currently I don't see a way to use ctx as the bytes to parse an enum (instead of reading bits/bytes). The key part is that `category` and `length` need to be read before the `Message`s are parsed. The `category` field is used to decide which struct from the enum is parsed.
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct AsterixPacket {
#[deku(bytes = "1", endian = "big")]
category: u8,
#[deku(bytes = "2", endian = "big")]
length: u16,
#[deku(ctx = "*category")]
messages: Vec<Message>,
}
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_ctx = "category")]
enum Message {
#[deku(id = "48")]
Cat48(Cat48),
}
Dependabot couldn't parse the Cargo.toml found at /Cargo.toml.
The error Dependabot encountered was:
Dependabot::DependencyFileNotParseable
Add a `ctx_default` attribute to allow for containers which can either take a context or default to a fixed one:
#[derive(PartialEq, Debug, DekuRead, DekuWrite)]
#[deku(ctx = "a: u8, b: u8", ctx_default = "1, 2")]
pub struct TopLevelCtxStructDefault {
#[deku(cond = "a == 1")]
pub a: Option<u8>,
#[deku(cond = "b == 1")]
pub b: Option<u8>,
}
#[test]
fn test_ctx_default_struct() {
let expected = samples::TopLevelCtxStructDefault {
a: Some(0xff),
b: None,
};
let test_data = [0xffu8];
// Use default
let ret_read = samples::TopLevelCtxStructDefault::try_from(test_data.as_ref()).unwrap();
assert_eq!(expected, ret_read);
let ret_write: Vec<u8> = ret_read.try_into().unwrap();
assert_eq!(ret_write, test_data);
// Use context
let (rest, ret_read) =
samples::TopLevelCtxStructDefault::read(test_data.bits(), (1, 2)).unwrap();
assert!(rest.is_empty());
assert_eq!(expected, ret_read);
let ret_write = ret_read.write((1, 2)).unwrap();
assert_eq!(test_data.to_vec(), ret_write.into_vec());
}
It would be nice if there was a way to skip over bits or bytes without creating dummy fields, in order to save on space, e.g.:
pub struct SomeStruct {
field_01: u8,
// This field is useless but is needed for proper read/write
unused01: u8,
field_02: u8,
// This field is useless but is needed for proper read/write
unused02: u8,
}
Maybe some `skip_[bytes|bits]` attribute that could be added before and after fields in structs:
#[deku(skip_bytes = "1")]
In the following line:
deku-derive/src/macros/deku_read.rs:175: return Err(DekuError::Parse(format!("Could not match enum variant id = {:?}", variant_id)));
It would be nice to print out the ident name (so the name of the enum) for easier troubleshooting.
I would do it, but I can't for the life of me figure out how to print out the ident. New to proc-macros.
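In the derive, the ident is available on the parsed input (e.g. `input.ident`); converting it with `.to_string()` and interpolating that string into the `quote!`-built error message is enough. A plain-Rust sketch of the resulting runtime message, with the enum name passed in as a string standing in for the interpolated ident:

```rust
// Sketch: the error string the derive could generate once the enum's
// ident (obtained via `ident.to_string()` in the proc macro) is included.
fn variant_match_error(enum_ident: &str, variant_id: u64) -> String {
    format!(
        "Could not match enum variant id = {:?} on enum `{}`",
        variant_id, enum_ident
    )
}
```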
Users can implement custom readers but not custom writers
Merged in #37
struct DekuTest {
field_a: u8,
#[deku(reader = "bytes_to_str(rest)")]
field_b: String,
}
You should not need to implement anything on `String`, as the custom reader handles the parsing.
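A sketch of what a custom reader like `bytes_to_str` might do, using a plain byte slice in place of deku's actual bit-slice-based reader signature, and assuming a NUL-terminated encoding purely for illustration:

```rust
// Sketch: consume bytes up to a NUL terminator, return the parsed
// String plus the unconsumed rest of the input.
fn bytes_to_str(rest: &[u8]) -> (String, &[u8]) {
    let end = rest.iter().position(|&b| b == 0).unwrap_or(rest.len());
    let s = String::from_utf8_lossy(&rest[..end]).into_owned();
    // Skip the terminator itself if one was present
    let consumed = if end < rest.len() { end + 1 } else { end };
    (s, &rest[consumed..])
}
```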
Consider the following code:
use deku::prelude::*;
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
pub struct Packet {
#[deku(bytes = "1")]
length: u8,
// byte len of all of messages is length - 2
messages: Vec<Message>,
}
/// In the real packet, this would be of variable length, so we can't use just the `count` attribute on messages
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
pub struct Message {
#[deku(bytes = "1")]
msg: u8,
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test01() {
let data: Vec<u8> = vec![0x04, 0x01, 0x02];
let (_, value) = Packet::from_bytes((data.as_ref(), 0)).unwrap();
assert_eq!(
Packet {
length: 0x04,
messages: vec![Message { msg: 0x01 }, Message { msg: 0x02 }]
},
value
);
}
}
I can use the `count` attribute to give the number of `Vec<T>` elements, but I see no way of telling the max bytes that a Vec can have in its container as a whole. Wondering if this would be a feature, or do I need to do some weird custom write/read implementation with a `read_bytes` field.
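A plain-Rust sketch of the byte-limited read (assuming one-byte messages, as in the example above): consume messages only until the `length - 2` byte budget of the container is exhausted, rather than a fixed element count.

```rust
// Sketch: read messages until the byte budget derived from `length`
// is used up, instead of counting elements.
fn read_messages(rest: &[u8], byte_limit: usize) -> (Vec<u8>, &[u8]) {
    let (body, tail) = rest.split_at(byte_limit);
    // Each Message is one byte here; a real impl would loop a Message reader
    (body.to_vec(), tail)
}

fn read_packet(data: &[u8]) -> (u8, Vec<u8>) {
    let length = data[0];
    // byte len of all of messages is length - 2
    let budget = (length as usize).saturating_sub(2);
    let (messages, _rest) = read_messages(&data[1..], budget);
    (length, messages)
}
```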
Why is the function that converts a type to bytes (`Vec<u8>`) called `to_bytes`, but the function that converts a type to bits (`BitVec<Msb0, u8>`) called `to_bitvec`? Why not `to_bits`?
Lines 251 to 255 in c7e0377
If the `bytes` attribute is used and the index is on a byte boundary, it may be quicker to read from a `&[u8]` slice instead of reading 8*n bits.
I'd like for more benchmarks to be written before so this optimization can be measured
One option could be to feature flag the bits/bytes attributes
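A minimal sketch of the proposed fast path, in plain Rust over a byte slice (the real implementation would sit on bitvec's internals): when the cursor is byte-aligned, index the slice directly; otherwise assemble the value bit by bit.

```rust
// Sketch: `bit_offset` is the current read position in bits.
fn read_u8(data: &[u8], bit_offset: usize) -> u8 {
    if bit_offset % 8 == 0 {
        // Fast path: aligned, one slice index instead of 8 bit reads
        data[bit_offset / 8]
    } else {
        // Slow path: gather 8 bits that may straddle a byte boundary
        let mut out = 0u8;
        for i in 0..8 {
            let pos = bit_offset + i;
            let bit = (data[pos / 8] >> (7 - pos % 8)) & 1;
            out = (out << 1) | bit;
        }
        out
    }
}
```

Benchmarking both paths (e.g. with criterion, as suggested elsewhere in these issues) would be the way to confirm the win before committing to it.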
Hey, I'm trying to write a simple binary parser with deku, and here are two problems I found.
I read your source and found I can pass whatever arguments to it. Sorry about that.
Consider this binary structure:
struct Data {
a: u8,
// This field depends on `Header.flag`
b: Option<u8>
}
struct Bin {
flag: u8,
data: Vec<Data>
}
Because of the lack of context, I can't find any way to parse it except writing a custom reading function manually. By the way, adding context support is a little complicated; I still don't know what the best way would be.
Overall, thanks for your great crate.
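Threading the flag down by hand is one workaround; a plain-Rust sketch (no deku) of what a ctx-passing derive could generate for the `Data`/`Bin` shape above:

```rust
// Sketch: `flag` is passed down as context; `b` is only read when the
// flag says it is present on the wire.
#[derive(Debug, PartialEq)]
struct Data {
    a: u8,
    b: Option<u8>,
}

fn read_data(input: &[u8], flag: u8) -> (Data, &[u8]) {
    let a = input[0];
    let mut rest = &input[1..];
    let b = if flag == 1 {
        let v = rest[0];
        rest = &rest[1..];
        Some(v)
    } else {
        None
    };
    (Data { a, b }, rest)
}
```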
Maybe something like this?
use deku::prelude::*;
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct Packet {
s: String,
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test01() {
// [len | string ]
let data: Vec<u8> = vec![5, 104, 101, 108, 108, 111];
let (_, value) = Packet::from_bytes((data.as_ref(), 0)).unwrap();
assert_eq!(
Packet {
s: "hello".to_string(),
},
value
);
}
}
(len would be a u64 in a real example)
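What that length-prefixed string read looks like by hand, as a plain-Rust sketch (one-byte length prefix to match the test data above; a real format might use a u64):

```rust
// Sketch: read a length prefix, then that many bytes as UTF-8.
fn read_string(data: &[u8]) -> (String, &[u8]) {
    let len = data[0] as usize;
    let s = String::from_utf8(data[1..1 + len].to_vec()).expect("valid UTF-8");
    (s, &data[1 + len..])
}
```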
Allows for conditional field reading dependent on the return of a lambda
fn my_condition(input: &[u8], index: usize) -> bool {
// TODO: somehow get access to field_a ?
// if (field_a == 0xAB) {
// return true;
// }
return false;
}
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct DekuTest {
field_a: u8,
#[deku(bits = "7", read_if = "my_condition")]
field_b: Option<u32>,
}
Not sure the best way to give lambda access to the previously parsed fields
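One way to give the condition access to previously parsed fields is to splice it in after those fields' locals are bound; a hand-rolled sketch (plain Rust, whole bytes standing in for the 7-bit read) of what that generated reader would do:

```rust
// Sketch: field_b is only read when the already-parsed field_a matches.
// In the derive, the condition would be expanded where `field_a` is
// already a live local binding.
#[derive(Debug, PartialEq)]
struct DekuTest {
    field_a: u8,
    field_b: Option<u32>,
}

fn read(rest: &[u8]) -> DekuTest {
    let field_a = rest[0];
    let field_b = if field_a == 0xAB {
        // 4 bytes big-endian, standing in for the bit-level read
        Some(u32::from_be_bytes([rest[1], rest[2], rest[3], rest[4]]))
    } else {
        None
    };
    DekuTest { field_a, field_b }
}
```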
Line 235 in c7e0377
Why not `from_bytes(bytes: Bytes)` and `from_bits(bits: Bits)`? Why do I need to care about which bit the byte starts from when I use a function called `from_bytes`?
Readers/writers should have access to: rest, struct variables, final attribute variables (bit size, input_is_le)
The provided function can be run in a function sandbox where the needed variables are passed with documented names: i.e.
let variant_read_func = if variant_reader.is_some() {
fn sandbox_reader(rest:, input_is_le:, field_a:, field_b:) {
quote! { #variant_reader; }
}
sandbox_reader(rest, input_is_le, field_a, field_b);
}
...
Using `criterion`. Bonus if compared to other crates like `nom` for reading.
This will allow us to remove the `unwrap` in `emit_field_update`.
Branch created, but is stale... rebase to master.
I believe it would be a nice feature for child structs and enums to inherit the parent's endian type.
Currently the following code produces the following error:
`deku::DekuRead<deku::ctx::Endian>` is not implemented for B
use deku::prelude::*;
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(endian = "big")]
struct Packet {
len: u16,
messages: B,
}
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_type = "u8")]
enum B {
#[deku(id = "0x00")]
one,
#[deku(id = "0x01")]
two,
#[deku(id = "0x02")]
three,
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test01() {
let data: Vec<u8> = vec![0x04, 0x13, 0x01];
let (_, value) = Packet::from_bytes((data.as_ref(), 0)).unwrap();
assert_eq!(
Packet {
len: 0x0413,
messages: B::two,
},
value
);
}
}
In fact, the way of creating a compiling version of this code seems a bit odd, as I only add endian to the `len` field.
use deku::prelude::*;
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct Packet {
#[deku(endian = "big")]
len: u16,
messages: B,
}
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(id_type = "u8")]
enum B {
#[deku(id = "0x00")]
one,
#[deku(id = "0x01")]
two,
#[deku(id = "0x02")]
three,
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test01() {
let data: Vec<u8> = vec![0x04, 0x13, 0x01];
let (_, value) = Packet::from_bytes((data.as_ref(), 0)).unwrap();
assert_eq!(
Packet {
len: 0x0413,
messages: B::two,
},
value
);
}
}
Document each proc-macro attribute and how they're used
`from_bytes` returns (rest, value), while `try_from` impls should return an Error if not all input is consumed.
It's useful when a protocol has a magic header (e.g. zlib, pyc, jpg) or a field has a limit.
struct Foo {
#[deku(assert_eq("[0xAA, 0xBB, 0xCC]"))]
magic: [u8; 3],
#[deku(assert("a >= 128"))]
a: u8,
}
deku/deku-derive/src/macros/deku_read.rs
Line 79 in 26eccca
This seems like it could be fixed by using `const` functions. `std::mem::size_of()` is `const`, so this can all be evaluated at compile time, e.g.:
const fn bit_size() -> usize {
std::mem::size_of::<$typ>() * 8
}
Gets called when the struct is `.update()`'d, kinda like the `len` attribute but provides a custom impl.
Example:
pub struct Ipv4 {
....
#[deku(update = "calc_checksum(...)")]
pub checksum: u16, // Header checksum
....
}
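A sketch of how such an `update` hook could behave, in plain Rust: `calc_checksum` is the hypothetical name from the example above, and the wrapping byte sum below is a simplified stand-in, not the real IPv4 ones'-complement checksum.

```rust
// Sketch: on update(), recompute the checksum field from the rest of
// the header. The sum here is a simplified stand-in, not RFC 791.
struct Ipv4 {
    header: Vec<u8>,
    checksum: u16,
}

fn calc_checksum(header: &[u8]) -> u16 {
    header.iter().fold(0u16, |acc, &b| acc.wrapping_add(b as u16))
}

impl Ipv4 {
    fn update(&mut self) {
        // What `#[deku(update = "calc_checksum(...)")]` would run
        self.checksum = calc_checksum(&self.header);
    }
}
```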
Since `option_as_tokenstream` uses `Option<String>` as its input, darling will discard the span while parsing (because `String` doesn't have a span). This makes error messages hard to read.
For example:
#[derive(DekuRead)]
struct Foo {
#[deku(cond = "'a' == 2")]
a: u8,
}
error:
error[E0308]: mismatched types
--> src\main.rs:24:10
|
24 | #[derive(DekuRead)]
| ^^^^^^^^ expected `char`, found `u8`
|
= note: this error originates in a derive macro (in Nightly builds, run with -Z macro-backtrace for more info)
Replacing it with `LitStr` gives:
error[E0308]: mismatched types
--> src\main.rs:26:19
|
26 | #[deku(cond = "'a' == 2")]
| ^^^^^^^^^^ expected `char`, found `u8`
Great idea of a library.
I have a protocol that needs conditional parsing of fields in a struct. I see the skip attribute, but would something like a conditional skip be possible?
use deku::prelude::*;
use std::convert::TryFrom;
#[derive(PartialEq, Debug, DekuRead, DekuWrite)]
pub struct DekuTest {
pub field_a: u8,
#[deku(skip, if=(field_a, 1))]
pub field_b: Option<u8>,
#[deku(skip, if=(field_b, Some(1)))]
pub field_c: Option<u8>,
}
fn main() {
let data: Vec<u8> = vec![0x01, 0x02];
let value = DekuTest::from_bytes((data.as_ref(), 0)).unwrap();
println!("{:#?}", value)
}
Instead of having an allocation per field, write should take a mutable bitvec by reference to extend.
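The same idea in plain Rust, with `Vec<u8>` standing in for the bitvec: every writer extends one shared buffer passed by mutable reference, instead of returning a fresh allocation per field.

```rust
// Sketch: writers take the output buffer by `&mut` and extend it,
// so nested fields share a single allocation.
trait Write {
    fn write(&self, out: &mut Vec<u8>);
}

impl Write for u8 {
    fn write(&self, out: &mut Vec<u8>) {
        out.push(*self);
    }
}

struct Pair {
    a: u8,
    b: u8,
}

impl Write for Pair {
    fn write(&self, out: &mut Vec<u8>) {
        // No per-field Vec: both fields append to the caller's buffer
        self.a.write(out);
        self.b.write(out);
    }
}
```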