jam1garner / binrw Goto Github PK

View Code? Open in Web Editor NEW

489.0 489.0 31.0 1.46 MB

A Rust crate for helping parse and rebuild binary data using ✨macro magic✨.

Home Page: https://binrw.rs

License: MIT License

Rust 98.81% Shell 1.19%

binrw's People

Contributors

$octylfractal avatar$

Stargazers

Watchers

binrw's Issues

Error diagnostic span bug megathread

Alright, you've all seen this before:

  |
3 | #[derive(BinWrite { .. })]
  |                   ++++++

rustc thinks some code binrw generates is causing an error due to some type or something incorrect in your attribute code. Very inconvenient! In order to fix this we need to use quote_spanned! a lot, in order to forcefully make rustc point to the right code.

This is, of course, a sisyphean task. In order to better handle this we're going to keep track of it in a checklist here! Just comment to get a new problem added, with any additional details you want to give.

The list:

pad_size_to incorrect type (see #30)
BinRead not implemented for a field (see #30)

This thread replaces #30

Add functions for allowing `FilePtr` to take a custom parser

FilePtr::parse_with - takes a custom parser that returns T and treats it as a FilePtr<Ptr, T>, resulting in a field type of T
FilePtr::with - takes a custom parser that returns T and treats it as a FilePtr<Ptr, T>, resulting in a field type of FilePtr<Ptr, T>

Implement fake specialization optimization for BinWrite for &[u8]/Vec<u8>

Extendible ReadOptions (and eventually equiv for BinWrite, maybe)

I've been putting off writing an issue about this for a while >_>

ReadOptions has been referred to as this library's bag of cheats. I have a couple of things I want to tackle there, but I also want users to be able to add their own cheats, for whenever passing something down through arguments would end up either impossible or not very ergonomic.

From now on I'm going to refer to them as Options, as I think they could be useful for both reading and writing.

Motivating Usecases

it just occurred to me that these are both related to FilePtr. FilePtr really is the problem child, eh? :P

FilePtr offsets

FilePtr currently lets you set an offset that gets added to whatever is read as part of the pointer. This is really cool if the offset is consistent throughout the entire file, but a lot of the time it has to change.

However, if we were able to store multiple different offsets, we could give different pointers different types depending on which offset they should be relative to.

FilePtr serialization

Basically, allowing the user to implement it themselves in an ergonomic way. I touched on this briefly in #4 , i want to put a more concrete example in here, but for now i'm too tired to cover it.

Version numbers (new, edited in)

If you have a version number in the header of a file, and it ends up affecting details several layers deep, it would be nice to not have to thread the version number as an argument throughout the entire structure.

Possible Implementation

Last time I was implementing this I came up with the idea of turning Options into a kind of typemap. You have a type, the map stores a value of that type, you index into the map by using the type. Naive implementation involves a heap, complicated type-system abuse can work entirely using the stack. (I wrote https://github.com/kitlith/typemap_core/ for this purpose)

unfortunately the creative solution ends up running into issues once we get to map and maybe parse_with on the macro side of things, though there's almost certainly a way around it, it just compromises cleanliness/ergonomics a bit.

I need to come back to this and give it some more explanation, but for now here's the branch i was implementing things on: https://github.com/kitlith/binrw/tree/option_3

Add proper diagnostic when an integer is passed to the magic attribute with an unspecified type

#[binrw]
#[br(little, magic = 1)]
struct Header {
    x: u8,
}

error: expected type, found `;`
 --> src/main.rs:5:1
  |
5 | #[binrw]
  | ^^^^^^^^
  | |
  | expected type
  | while parsing this item list starting here
  | the item list ends here
  |
  = note: this error originates in the attribute macro `binrw` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0046]: not all trait items implemented, missing: `MagicType`, `MAGIC`
 --> src/main.rs:5:1
  |
5 | #[binrw]
  | ^^^^^^^^ missing `MagicType`, `MAGIC` in implementation
  |
  = note: this error originates in the attribute macro `binrw` (in Nightly builds, run with -Z macro-backtrace for more info)
  = help: implement the missing item: `type MagicType = Type;`
  = help: implement the missing item: `const MAGIC: <Self as HasMagic>::MagicType = value;`

binrw version: 0.8.0

Improve documentation for `BinrwNamedArgs`

Similar to #75, errors relating to improperly passing arguments can be confusing to binrw beginners due to the fact all the named types currently have no/minimal documentation. The goal of such a documentation improvement would be to help give users the information needed to solve errors that would reference the trait.

Add test to ensure field-level attributes are rejected when top-level map is applied

additionally this may require actually applying a fix

Named Arguments (or other solutions)

Background

ReadOptions contains count, which I think should be an argument to Vec and similar instead of beping part of implicit parser state. We can already write this as type Args = (usize, <T as BinRead>::Args);, but that results in #[br(args(3, ()))] (for i.e. Vec), and @jam1garner prefers #[br(count = 3)].

Option 1 (keep argument system as-is)

arguments stay the way they currently are. snover appears (appeared?) to prefer this option

We may be able to help differentiate the above case with something like #[br(args(VecCount(3), ()))]

Variants

Ordered Named Arguments

add const ArgNames: &'static [&'static str]; to the BinRead trait,
use static_assert / const_assert to check that the name for each callsite matches the name at the appropriate index in the implementation
perhaps, like the following variation, providing that name could be optional?

Optional Unchecked Names

allow user to specify variable names that are explicitly ignored.
This way the user gets to clarify their code whenever and however they like,
and the implementation stays simple.

i.e. the goal with this implementation is that all of the following would work fine

#[br(args(count = 3, inner = ()))]
#[br(args(3, ()))]
#[br(args(whatever = 3, ()))]

Option 2 (named arguments)

arguments become unordered, passed by name.
argument structs are autogenerated by the derive macro
calling syntax looks something like this (for Vec<Vec<u8>>): #[br(args(count: 3, inner: args! { count: 6} ))]
which de-sugars to:

{
    let mut args = Default::default();
    args.count = 3;
    {
        let args = &mut args.inner;
        args.count = 6;
    }
    args
}

thoughts: would we want to allow both ordered and unordered arguments? would we want to allow users to use both in a single arguments list?

Any Other Options?

perhaps named arguments are still overkill for this scenario, idk. is there some other way of solving the original reason for investigating this?

Add binrw attribute macro

i.e. #[binrw] would be equivalent to applying both #[binread] and #[binwrite]

Improve error handling and recovery

(1) Error recovery by rewinding the stream on parse failure is implemented only in the derive and the default impls. Requiring manual implementations to handle this themselves is error-prone, remembering to consistently implement it BinRead itself is error-prone, and manually tracking and reporting the position of the field/struct/variant which failed to be read successfully is error-prone. Consider something like splitting the API so read_options becomes something like:

fn read_options<R: io::Read + io::Seek>(reader: &mut R, options: &ReadOptions, args: Self::Args) -> BinResult<Self> {
    let pos = reader.seek(SeekFrom::Current(0))?;
    read_options_impl(reader, options, args).map_err(|mut err| {
        if let Some(seek_error) = reader.seek(SeekFrom::Start(pos)) {
            err.append(seek_error);
        }
        err
    })
}

…and then read_options_impl (bikeshed name) is where the actual implementation goes, so any error recovery can be fully uniform (and more options could be made available if someone requests them, e.g. if someone prefers to have a stream end in an undefined state or receive no position information on error in exchange for fewer seeks).

The code example above also illustrates issue (2): The process of attempting error recovery may generate more errors, and there is no way to handle this right now without losing one error or the other. Also, read failures often occur deep within several layers of objects, and this contextual information is lost too. (An example of a good message in a cascading read failure situation might be something like: “Failed to load model "player": failed to load texture "face" at 4567: unexpected end of file reading field 'blend_mode'”.) Rust unstable backtrace feature is helpful in being able to see the call stack, but often in resource loading you’ve got a tree of dependencies which reuse the same types, and they get data from parents via args which cause different things to happen, and being able to capture all that information is key to being able to troubleshoot failures. See e.g. anyhow’s Context trait.

(3) Creating binread errors manually is verbose and could probably be easier.

(4) The Custom error variant is not great. I have to look to see if there is a good way to deal with this but right now the display output of the error is always just “Any” and that is completely unhelpful.

(5) parse_with with invalid function signature points to derive instead of the parse_with span

(6) non-primitive cast: Option<u32> as usize for count directive points to derive instead of count span

(7) args with bad types points to derive instead of the bad arg

(8) map_err?

Move br(count) attribute to desugar to binrw::helpers::count

Unlike the count attribute, which is limited to supporting Vec<T>, the count helper function works for any type which implements FromIterator<T>.

Upsides:

Less magic implementation, more type safe
Adds support for more types

Possible pitfalls:

Perf regression due to the lack of "fake specialization" provided for Vec<u8>
Breaking some existing use-case maybe?

Add no_std implementation of io::Write for io::Cursor

Implementation instructions:

All the work that's needed will be in this file.

Add Write to the import at the top and then implement it for:

Cursor<&mut [u8]>
Cursor<&mut Vec<u8>>
Cursor<Vec<u8>>
Cursor<Box<[u8]>>

Why specifically these individual types and not something more generic like AsMut<[u8]>? Well... the implementations for Vec (and &mut Vec) allow for appending data (which AsMut<[u8]> wouldn't be able to do while also conflicting with any special-cased Vec implementation that could be written).

Even the standard library itself has the issue (and it's the interface we need to be compatible against anyways):

the actual implementations should be rather straightforward and can likely be mostly a copy/paste job from std (source for std implementation)

If anyone wants to tackle this issue but needs help feel free to contact me on discord

Improve error message when arguments aren't passed to a type

If you miss a count attribute on a Vec right now you get the following:

error[E0277]: the trait bound `VecArgs<()>: Default` is not satisfied
   --> src/segment.rs:34:10
    |
34  | #[derive(BinRead)]
    |          ^^^^^^^ the trait `Default` is not implemented for `VecArgs<()>`
    |
note: required by `std::default::Default::default`
   --> /home/life/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/default.rs:116:5
    |
116 |     fn default() -> Self;
    |     ^^^^^^^^^^^^^^^^^^^^^
    = note: this error originates in the derive macro `BinRead` (in Nightly builds, run with -Z macro-backtrace for more info)

this is not good enough to lead users to how to fix their error. Needs improvement.

Needs:

a more specialized message. there's likely some trick to get this error to display the name of a trait or something that gives more information in the form of a NameLikeThisWhichTellsYouWhatsGoingOn
better span information to point to the field in question

Remove BinRead implementation for char

This is a pretty abusable implementation and is not properly implemented (as it just parses u8 and casts to char), and it a multi-year old hack that serves no purpose but confusion at this point. Similarly to bool there's no clear default behavior/representation, and thus shouldn't be usable without either a mapping function or custom parser

Assessment of attributes

Tracking issue for figuring out how attributes will translate into the binread/binwrite merge. The main focus being which binread attributes should "just work", which can be accessible for both but not together, etc.

Key:

br - works for binread only
bw - works for binwrite only
br, bw - works for both, however a different value will need to be provided for each (example: map is a one way operation, so two maps will need to be provided to convert in both directions)
brw - works for both using a single implementation. One example could be "align", where a single alignment value would allow for both reading and writing. Would also allow for separate implementations of br and bw if desired.

binread attributes

attribute	compatibility	more info
big	brw
little	brw
map	br, bw
magic	brw
assert	br, bw	up for debate, usefulness of `brw` questionable
pre_assert	br, bw	up for debate, usefulness of `brw` questionable
import	br, bw	up for debate, usefulness of `brw` questionable
args	br, bw
default	br	possibly should be removed in favor of `ignore`
ignore	brw
temp	br, bw
postprocess_now	br	dependent on the design of the binwrite redesign, this may have some usability
deref_now	br	ditto
restore_position	br, bw
try	br	binwrite likely has no use as the reverse should be the default behavior for `Option<T> where T: BinWrite`
parse_with	br
calc	br, bw
is_little	brw
is_big	brw
offset	br, bw(?)	needs discussion, not sure how this will work
if	br	should "just work" since it reads to an Option
pad_before	brw
pad_after	brw
align_before	brw
align_after	brw
seek_before	brw
pad_size_to	brw
return_all_errors	br(?)	error handling probably needs a rework tbh

binwrite attributes

attribute	compatibility	more info
preprocessor	no	superseded by `map`
postprocessor	???	probably needs a rework
with	bw	rename to `write_with`, name pending
cstr	brw	a useful shortcut that probably should've carried over to binread
utf16	brw	ditto
utf16_null	brw	ditto
align	no	renamed to align_before
pad	no	renamed to pad_before

Bikeshedding syntax for named vs list vs raw arguments and imports

Putting discussion on this here for after #21 is finished and gets merged.

Previously suggested:

Give them all separate names

this is the option that binread started with, since I made raw stuff be args_tuple = ... and import_tuple(name: ty)

i don't like this option in the long run, but i'm willing to accept import_raw(name: ty) as a special case because i'm having a hard time coming up with other syntax for it that doesn't conflict with e.g. list syntax

Same name, different syntax

this is what @csnover proposed:

args { ... } -> named arguments
args(...) -> list arguments
args = ... -> raw arguments
- this doesn't have an easy equivalent for import, because we need both a name and a type.

@jam1garner has proposed making args { ... } and args(...) both be named arguments, and list arguments become handled by raw syntax. Personally, i'm wary that this will cause even more issues for finding ways to make import work.

something i've thought of: if we make it import = pattern, import_ty = ty then it would work, but it feels really awkward to me. guess that shouldn't matter as much as we should be dealing with writing types less often than calling them, but still.

"caculation" typo and broken doc link in write module

binrw/binrw/src/attribute/write.rs

Line 328 in 16ccad4

//! # Caculations

serializing FilePtr

Here's what roblabla had to say (on discord):

actually with seek+write, fileptr seems doable with some global context. Here's the idea: when you get a FilePtr, record the current location, write some zeroes, and schedule a sort of "late serialization" pass. When the structure is fully serialized, all scheduled late serialization pass will run sequentially. That serialization pass will just serialize the underlying structure, seek to the previously saved location, and write the proper offset

that does mean you'd need some sort of heap allocation support to keep track of the serialization passes

but I can't think of a better way.

I had a similar idea awhile back (though I wasn't looking to implement at the library level) but i don't think this'll quite work for some types of files -- specifically those which break it into different sections, and different things go into different sections. i.e.:

struct File {
    info: Vec<Info>,
    data: DataSection
}

struct DataSection {
    // header...
    // data blocks go here
}

struct InfoSection {
    // header...
    list: Vec<Info>,
    // string blocks go here
}

struct Info {
    data_name: FilePtr32<String>, // points into InfoSection
    data_ptr: FilePtr32<Vec<u8>> // points into DataSection
}

under the scheme of the current idea i don't think we have a good way to represent this: we'd write out info section, info, info, info, data section, string, data, string, data, string, data instead of info section, info, info, info, string, string, string, data section, data, data, data. I think this is fixable by adding some sort of marker type Pool with some way of specifying a identifier such that you can do the same thing as the original idea, but for each pool in turn as you come across them while writing the file.

Are there any other methods we might want to consider?

Master WIP Todo List

Top-level map can't access arguments

#[derive(BinRead)]
#[br(import(offset: u64)]
#[br(map = |x: u64| Self(x + offset))]
struct PlusOffset(u64);

The above will yield an error stating it can't find 'offset' within the map function.

Add documentation to `VecArgs`

Fix UI tests

current commit of master doesn't pass CI because of minor rustc changes. should just require re-approving trybuild UI tests but yea

Design BinRead backtrace output

Here is a possible mockup:

Raw text minus ansi escape sequences:

 ╺━━━━━━━━━━━━━━━━━━━━┅ Backtrace ┅━━━━━━━━━━━━━━━━━━━━╸
 
 0: Reader reached the end of the file
     at src/entry.rs:100
  ┄───╮
   98 │  #[br(big)]
   99 │  #[br(calc = index_count)]
  100 ⎬╴ indices: Vec<u32>,
  ┄───╯

 1: While parsing field 'entries' in FileHeader
     at src/file_header.rs:6
  ┄───╮
    5 │  #[br(big, calc = entry_count + 1)]
    6 ⎬╴ entries: Vec<Entry>,
  ┄───╯

 2: While parsing field 'header' in MyFile
     at src/my_file.rs:15
  ┄───╮  
      │  #[br(little)]
   12 │  struct MyFile {
      …
   15 ⎬╴     header: FileHeader,
  ┄───╯

 ╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸

(Excuse github's mono font having horrible box drawing character support, any sane terminal won't have this issue)

And here is what condensed mode will look like:

 0: Reader reached the end of the file
     at src/entry.rs:100
 1: While parsing field 'entries' in FileHeader
     at src/file_header.rs:6
 2: While parsing field 'header' in MyFile
     at src/my_file.rs:15

BinWrite raises internal error if only BinRead field is temp

See bullet 5 in issue #47

Add documentation for writing

which should you use: binrw or binread

At this point, I purely recommend binrw. API churn has slowed down quite a bit and at this point it is primarily a strict superset of binread. Alongside this I have also removed the pre-release warning from binrw as, while not 1.0 yet, it is still ready for actual usage.

BinWrite attribute codegen checklist

Remove hard-dependency on Seek

Ideas:

Split BinRead into BinRead and BinReadSeek? (not the biggest fan of this solution but might be needed)
???

Motivation: requiring a seek bound prevents usage for streaming parsers without faking the seek bound via a newtype

Integration with cargo-binrw

To Be Continued ...

Use spanned tokens

Currently, this:

#[derive(Debug, Eq, PartialEq)]
struct SubTest {}
#[derive(BinRead, Debug, Eq, PartialEq)]
struct Test {
    a: SubTest
}

prints this error:

error[E0277]: the trait bound `SubTest: BinRead` is not satisfied
   --> binrw/tests/derive/struct.rs:514:14
    |
    |     #[derive(BinRead, Debug, Eq, PartialEq)]
    |              ^^^^^^^ the trait `BinRead` is not implemented for `SubTest`
    |

It would be better to have it point to SubType rather than BinRead.
It's an issue with attributes too:

#[derive(BinRead, Debug, Eq, PartialEq)]
struct Test {
    #[br(pad_size_to="a")]
    a: u8
}

prints this error:

error[E0606]: casting `&'static str` as `i64` is invalid
   --> binrw/tests/derive/struct.rs:512:14
    |
    |     #[derive(BinRead, Debug, Eq, PartialEq)]
    |              ^^^^^^^
    |

which is not really helpful if it does not point to "a".

How to use a FilePtr to a vector, where the item count is defined after the pointer.

So I have a binary file format I'm trying to implement. And in this fileformat I have data structured like this:

pub struct Data {
    pub version: u32,

    pub tables_offset: i32, 

    pub number_of_tables: i32,

    ...
}

Now, preferably I'd like to use something like this:

pub struct Data {
    pub version: u32,

    #[br(count=number_of_tables, parse_with=FilePtr::<i32, Vec<Table>>::parse)]
    pub tables: Vec<TableInfo>,

    pub number_of_tables: i32,

    ...
}

Which, of course, isn't working because number_of_tables is defined after the pointer.

Is there a way I can acomplish this? I figure maybe I can store both pointer and count in temporary variables and then read the tables, but I'm not sure how I would send arguments to a custom parser to do that.

Add Vec<T> read-until-EOF and read-until-condition

Read until EOF: jam1garner/binread#36
Read until user-defined condition (terminator, etc.): jam1garner/binread#34

This blocks on #5 since the primary use case for named arguments is to enable better direct passing of options to support stuff like this instead of requiring parser changes for type-specific feature enhancements.

after_parse API is underdefined and inconsistent

The documentation makes it clear that read, read_args, or read_options are the correct calls to construct objects, but they don’t always construct fully-formed objects—only types generated by derive are guaranteed to call after_parse on their children (vs Vec<T>, [T;N], tuples, or other types that manually implement BinRead). Whether something was derived or not is an implementation detail which shouldn’t have extra side-effects like this. Most of the time it does not matter because most types do not use after_parse.

How best to forward unevaluated offsets to child fields?

Hi folks, I'm running into a situation where I have a file pointer in a child structure which has an offset only known after parsing most of the rest of the parent structure:

#[binread(little, import { file_data_offset: u64 })]
pub struct MorrowindFileEntry {
    ...

    #[binread(count = size)]
    #[binread(offset_after = file_data_offset)]
    pub data: FilePtr32<Vec<u8>>,

    ...
}

#[derive_binread]
#[derive(Debug)]
#[binread(little, magic = b"\x00\x01\x00\x00")]
pub struct MorrowindArchive {
    ...

    /// File entries.
    #[binread(count = header.file_count)]
    #[binread(args { inner: args! { file_data_offset: file_data_marker.pos } })]
    pub file_entries: Vec<MorrowindFileEntry>,

    ...

    /// Seek marker for the file data table.
    #[binread(temp)]
    file_data_marker: PosValue<()>,
}

I'm not entirely surprised this doesn't work, because my understanding is that args are evaluated immediately:

error[E0425]: cannot find value `file_data_marker` in this scope
  --> src/lib.rs:98:55
   |
98 |     #[binread(args { inner: args! { file_data_offset: file_data_marker.pos } })]
   |                                                       ^^^^^^^^^^^^^^^^ not found in this scope

But I'm not sure how else to solve this problem. Any input appreciated!

Add error when `#[br(temp)]` or `#[bw(calc)]` is used with the derive macro

This will help users know to move from the derive macros to the attribute macros when they need them.

bw(ignore) does not work

Both of the following examples do work, and fail with the same error (in case there's something unrelated that's invalid about one):

#[binrw]
struct X {
    #[bw(ignore)]
    b: u8,
}

#[binrw]
#[br(map = x)]
struct X {
    #[bw(ignore)]
    b: u8,
}

fn x(a: u8) -> X {
    X { b: 1 }
}

error[E0283]: type annotations needed
  --> src/main.rs:3:1
   |
3  | #[binrw]
   | ^^^^^^^^ consider giving this closure parameter a type
   |
  ::: /Users/winter/.cargo/registry/src/github.com-1ecc6299db9ec823/binrw-0.8.0/src/private.rs:98:11
   |
98 |     Args: Clone,
   |           ----- required by this bound in `write_fn_type_hint`
   |
   = note: cannot satisfy `_: Clone`
   = note: this error originates in the attribute macro `binrw` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider specifying the type arguments in the function call
   |
3  | #[binrw]::<T, WriterFn, Writer, Args>
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

binrw version: 0.8.0

f64 field reading as u64

Hi @jam1garner ,
Thanks for this crate, it is super efficient as I have many structures/headers in the file I want to parse (26 in total with various complexity, using byteorder or nom would be crazy). One example:

#[derive(Debug, Clone, BinRead)]
#[br(little)]
pub struct Cc4Block {
    cc_id: [u8; 4], 
    reserved: [u8; 4],
    cc_len: u64,
    cc_links: u64, 
    cc_tx_name: i64, 
    cc_md_unit: i64, 
    cc_md_comment: i64,
    cc_cc_inverse: i64,
    #[br(if(cc_links > 4), little, count = cc_links - 4)]
    pub cc_ref: Vec<i64>,
    pub cc_type: u8, 
    cc_precision: u8, 
    cc_flags: u16, 
    cc_ref_count: u16, 
    cc_val_count: u16, 
    cc_phy_range_min: f64, 
    cc_phy_range_max: f64, 
    #[br(if(cc_val_count > 0 && cc_type < 11), little, count = cc_val_count)]
    pub cc_val_real: Vec<f64>, 
    #[br(if(cc_val_count > 0 && cc_type == 11), little, count = cc_val_count)]
    pub cc_val_uint: Vec<u64>,
}

In this case, parsing goes correctly but the f64 values are actually read as u64 (vect or not). For instance, I have 4598571536312485020 instead of 0.5
I used cargo expand to understand what could go wrong but the variables are properly declared as f64.
I tried also with BinRead and behavior is same -> issue is at a lower level, but I cannot figure out where from the expanded code. Would you have any advice ?

Second point, minor issue, while I was trying to switch from binread to binrw, the rust-analyzer is giving me errors which are actually not raised at compile :

range end index 8 out of range for slice of length 4

I guess this is related to the array [u8; 4]

Add website for binrw

For a while I've wanted this, but haven't made an issue until now as I want to have it ready by 0.9.0's release.

Things needed:

landing page
showcase section
blog(?) (might just keep binrw-related blog posts on jam1.re, unsure)
tutorial section
github/discord/matrix links

The site will be a static site generated using zola. I'll likely use github pages+actions for hosting/deployment.

Sugar for length-prefixed strings

Thanks for this crate, I'm a novice with Rust but this allows me to work with bytes much more cleanly than it would be otherwise, and more similarly to how I would in other languages.

I'm working with a format that encodes strings with a length prefix. In some cases that's a u16, others it's u32.

At the moment I'm defining a struct with a separate length field but it would be nice if there was an attribute that would handle this. But maybe that would be offloading too much to this crate.

This Vec<u8> needs to be manually turned into a String by calling String::from_utf8_lossy().

#[derive(BinRead)]
pub struct MyType {
    string_len: u32,
    #[br(count = string_len)]
    string: Vec<u8>,
}

Fix clippy lints

no_std testing is broken

When CI was enabled, no_std testing was added by just re-including stuff in io which was excluded by replacing #[cfg(not(feature = "std"))] with #[cfg(any(not(feature = "std"), test))]. This does not properly test no_std mode. Other places in the code also compile differently if no_std is turned on. It is necessary to actually run cargo test --no-default-features. Running tests with --no-default-features is broken currently because APIs that rely on the std prologue (format! instead of alloc::format!) were introduced into some tests.

Port dbg attribute to modern binrw

The br(dbg) attribute was an attribute I proposed a while back that never actually landed on master. I would still be interested in it, and for the most part the process of porting it to work using the current codebase shouldn't actually require significant changes.

Original PR to base this off: #16

Effectively the goal is to port or reimplement the contents of the PR. If there's any trouble with it feel free to message me in the discord/matrix chat.

Have temp behavior automatically apply across binread/binwrite boundaries

Currently, fields that are temp from binread won't automatically be temp to binwrite, or vice-versa. This results in a behavior where an internal error due to a missing field that one implementation expects that another implementation removed.

In order to fix this, temp-ness will have to propagate across parsing. Fortunately this is simplified by 2 factors:

attributes (as opposed to derives) need to be used for temp behavior
binread/binwrite attributes are forbidden from being used together in favor of just the #[binrw] attribute (attempting to use both with yield a compiler error telling you to use #[binrw]

Recommended Implementation Strategy

Add 'binread_temp: bool' and binwrite_temp: bool fields to:

binrw/binrw_derive/src/parser/write/field_level_attrs.rs

Lines 17 to 20 in 4c69cff

 pub(crate) struct StructField { 

 pub(crate) ident: syn::Ident, 

 pub(crate) generated_ident: bool, 

 pub(crate) ty: syn::Type,

and

binrw/binrw_derive/src/parser/read/field_level_attrs.rs

Lines 17 to 20 in 4c69cff

 pub(crate) struct StructField { 

 pub(crate) ident: syn::Ident, 

 pub(crate) generated_ident: bool, 

 pub(crate) ty: syn::Type,

(Note! this field should represent the opposite attribute's temp-ness for the field. so write::StructField should have a binread_temp field and read::StructField should have a binwrite_temp field)

Add a method to each StructField for checking if a field is temporary. For binread this should look for 'temp' and for binwrite this should look for a write_mode of 'Calc' (and maybe other things in the future). Also have this check the previously declared binread_temp/binwrite_temp field in order to ensure those indicate a field should be treated as temp.
Replace every usage of .temp.is_some() with the aforementioned method. To help with this either use "find references to field" from Rust analyzer or just rename the temp/write_mode fields and use the errors to know where to modify.
The following code handles the parsing of the binread and binwrite inputs, as the two shouldn't know about each other in order to simplify codegen:

binrw/binrw_derive/src/binrw_attr.rs

Lines 14 to 15 in 4c69cff

 let (binread_input, generated_impl_rd) = binread::derive_from_input(&derive_input); 

 let (binwrite_input, generated_impl_wr) = binwrite::derive_from_input(&derive_input);

In order to inform them of each other's temp-ness, iterate over the fields and mutate the corresponding fields' {binread,binwrite}_temp field to indicate the temp-ness of one another.

(Bonus, covers #46) In the case of binwrite, if a field is shown to be temp but isn't write_mode == Calc, a compiler error should be emitted. In order to do this, modify the following else clause:

binrw/binrw_derive/src/codegen/write_options/struct.rs

Lines 196 to 205 in 4c69cff

 let initialize = if let WriteMode::Calc(expr) = &self.field.write_mode { 

 Some({ 

 let ty = &self.field.ty; 

 quote! { 

 let #name: #ty = #expr; 

 } 

 }) 

 } else { 

 None 

 };

In order to do this, add an else if to check the self.field.binread_temp field, and if so, return the following from the else if block:

            Some({
                let ty = &self.field.ty;
                quote! {
                    let #name: #ty = compile_error!(concat!(
                        "The field ",
                        stringify!(#name),
                        " is temp but not provided a value in its BinWrite derivation. Try using `bw(calc = ...)]`"));
                }
            })

This is equivalent to if the user used #[bw(calc = compile_error!("..."))] and thus not raise any additional errors due to the field being missing on the binwrite side of things.

Testing

An example of some code that should be allowed after this change that currently does not compile:

    #[binrw]
    #[bw(import { x: u32, _y: u8 })]
    struct Test {
        #[br(ignore)]
        #[bw(calc = x)]
        x_copy: u32,
    }

You may add other tests if you so choose.

Verify that all documentation links are correct/working

See the following comment:

#7 (comment)

Add top-level `args` attribute for use with top-level map/try_map

In order to pass arguments to the input being passed to a top-level map function this is necessary.

#[derive_binread]
#[br(import { b: u64 })]
struct SomeOtherStruct {
  // ...
}

#[derive_binread]
#[br(import { a: u32, b: u64 })]
#[br(args { b })]
#[br(map = |x: SomeOtherStruct| SomeStruct {/* ... */} )]
struct SomeStruct {
  // ...
}

Document order of operations for attributes

Implementation notes:

Add a section at the bottom of the corresponding attributes page
Link to it from a sentence at the bottom of the attribute overview tables

Reading a C-like enum with magics of different types no longer compiles

#[binread]
enum Test {
    #[br(magic = b"abc")] // <-- [u8; 3]
    Abc,

    #[br(magic = b"ab")] // <-- [u8; 2]
    Ab,

    #[br(magic = b"a")] // <-- [u8; 1]
    A,
}

At some point while re-organizing, the above has stopped working. This type is non-trivial to handle as it requires storing the start position and restarting for each variant.

Relevant location of the code:

binrw/binrw_derive/src/codegen/read_options/enum.rs

Lines 63 to 89 in 2e73f33

 let matches = variants.iter().filter_map(|field| { 

 if let Some(magic) = &field.magic { 

 let ident = &field.ident; 

 let magic = magic.match_value(); 

 let condition = if field.pre_assertions.is_empty() { 

 quote! { #magic } 

 } else { 

 let pre_assertions = field.pre_assertions.iter().map(|assert| &assert.condition); 

 quote! { #magic if true #(&& (#pre_assertions))* } 

 }; 

 Some(quote! { #condition => Ok(Self::#ident) }) 

 } else { 

 None 

 } 

 }); 

 let amp = en 

 .expected_field_magic 

 .as_ref() 

 .map(|magic| magic.add_ref()); 

 quote! { 

 match #amp#READ_METHOD(#READER, #OPT, ())? { 

 #(#matches,)* 

 _ => Err(#BIN_ERROR::NoVariantMatch { pos: #POS }) 

 } 

 }

The solution should not involve removing the above code, as it was derived from the fast-path of the previous code, but rather first checking if all the magic literal types (including byte string lengths) are identical (and if so continue using the above code as it will be much faster).

If not all literal types are the same, each type should attempt to be parsed one after another until one successfully parses. The general structure can be borrowed from the following code: