near / borsh
Binary Object Representation Serializer for Hashing
Home Page: https://borsh.io/
They should do nothing.
We should add a suite of tests for the security of deserialization, specifically:
borsh-derive-internals is designed for testability. It'd be great to see how well the tests are doing by including code coverage (e.g. using tarpaulin + codecov/coveralls).
https://travis-ci.com/github/near/create-near-app/jobs/356517030#L670-L709
Compiling status-message v0.1.0 (/home/travis/build/near/create-near-app/tmp-project/contract)
error[E0277]: the trait bound `Welcome: borsh::de::BorshDeserialize` is not satisfied
This happens while dependabot is trying to bump borsh from 0.6.2 to 0.7.0 in github/near/create-near-app/common/contracts/rust.
Not sure what to do about this.
As suggested by @vgrichina we need to make sure Rust and JS implementations fail and succeed on exactly the same UTF-8 sequences.
Also, we need to make sure specification explains that implementation must disallow illegal UTF-8 to allow for deterministic roundtrip.
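As a minimal sketch of the deterministic-roundtrip point (illustrative code, not borsh's actual implementation), a strict string decoder would reject invalid UTF-8 outright instead of lossily substituting replacement characters:

```rust
// Hedged illustration: deterministic round-tripping requires rejecting
// invalid UTF-8 rather than replacing it with U+FFFD.
fn deserialize_string_strict(bytes: &[u8]) -> Result<String, String> {
    // String::from_utf8 fails on any invalid sequence, which is what we want.
    String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())
}

fn main() {
    assert_eq!(deserialize_string_strict(b"hi").unwrap(), "hi");
    // 0xFF is never valid UTF-8; a lossy decoder would silently accept it
    // and break serialize(deserialize(x)) == x.
    assert!(deserialize_string_strict(&[0xFF]).is_err());
}
```

A lossy decoder (e.g. `String::from_utf8_lossy`) would make two different byte strings deserialize to the same value, which is exactly the non-determinism the specification must forbid.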
On https://borsh.io link to discord server is not working.
Line: https://github.com/near/borsh/blob/master/docs/index.html#L394
We should also address #26 . The change that I did to array was very suboptimal.
Since Borsh is heavily focused on security, we should use all the available tooling to ensure that we catch as many corner cases as possible.
Miri is an interpreter for Rust's mid-level intermediate representation.
Using Miri is as simple as cargo miri test, but there are a few quirks:
Is it within the purview of this library to support borrowing byte slices from the deserialization input? zero-copy would be nice, but it'd make codegen ever so slightly more complicated. What's the stance of borsh-rs on features vs size creep?
@vgrichina requested a JSON example of the new schema format
A Solidity implementation of borsh exists here: https://github.com/near/rainbow-bridge/blob/master/libs-sol/nearbridge/contracts/Borsh.sol
We also need a code generator to generate serialize/deserialize code for a given struct. The generated Solidity code should look like:
https://github.com/near/rainbow-bridge/blob/master/libs-sol/nearbridge/contracts/NearDecoder.sol#L52
I'm looking for something to replace bincode. It would be great if borsh had first-class Go support, since that is my only blocker! Thanks
Suppose we have rust structure:
#[derive(BorshSerialize, BorshDeserialize)]
struct A {
f1: T1,
f2: T2
}
Suppose we have serialized it somewhere (e.g. on disk in RocksDB, in contract state, or circulating in the network). Then we want to upgrade this structure by adding another field:
#[derive(BorshSerialize, BorshDeserialize)]
struct A {
f1: T1,
f2: T2,
f3: T3
}
It would be extremely convenient for upgradability if we could deserialize old data using new Rust type.
We can introduce a #[borsh_optional] attribute that can be used like this:
#[derive(BorshSerialize, BorshDeserialize)]
struct A {
f1: T1,
f2: T2,
#[borsh_optional]
f3: Option<T3>
}
Then when we deserialize old data into this structure, f3 will be None, but when we deserialize new data, it will be Some.
This will only work if the optional fields are placed at the end:
#[derive(BorshSerialize, BorshDeserialize)]
struct A {
f1: T1,
f2: T2,
#[borsh_optional]
f3: Option<T3>,
#[borsh_optional]
f4: Option<T4>,
#[borsh_optional]
f5: Option<T5>
}
And compilation should fail in situations like the following, where a non-optional field follows an optional one:
#[derive(BorshSerialize, BorshDeserialize)]
struct A {
f1: T1,
f2: T2,
#[borsh_optional]
f3: Option<T3>,
f4: Option<T4>,
#[borsh_optional]
f5: Option<T5>
}
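To make the proposed semantics concrete, here is a hedged hand-written sketch (plain std, no borsh crate, field types simplified to u32): the deserializer reads the optional tail field only if bytes remain in the input. Note this deliberately differs from borsh's normal Option encoding, which writes a one-byte tag; the #[borsh_optional] rule would instead key off end-of-input.

```rust
use std::io::{Cursor, Read};

// Illustrative stand-in for the upgraded struct; f3 would carry #[borsh_optional].
#[derive(Debug, PartialEq)]
struct A {
    f1: u32,
    f2: u32,
    f3: Option<u32>,
}

// borsh encodes integers little-endian.
fn read_u32(r: &mut impl Read) -> std::io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn deserialize_a(bytes: &[u8]) -> std::io::Result<A> {
    let mut cur = Cursor::new(bytes);
    let f1 = read_u32(&mut cur)?;
    let f2 = read_u32(&mut cur)?;
    // Optional tail: old data simply ends here; new data has 4 more bytes.
    let f3 = if (cur.position() as usize) < bytes.len() {
        Some(read_u32(&mut cur)?)
    } else {
        None
    };
    Ok(A { f1, f2, f3 })
}

fn main() {
    // Old format: only f1 and f2 were serialized.
    let old = [1u8, 0, 0, 0, 2, 0, 0, 0];
    assert_eq!(deserialize_a(&old).unwrap(), A { f1: 1, f2: 2, f3: None });
    // New format: f3 appended at the back.
    let new = [1u8, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0];
    assert_eq!(deserialize_a(&new).unwrap(), A { f1: 1, f2: 2, f3: Some(3) });
}
```

This also shows why the optional fields must come last: a field in the middle could not be distinguished from the bytes of the fields that follow it.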
CC @mfornet Since it might be relevant to near/NEPs#95
I propose a common set of operations that should be implemented by all borsh implementations across programming languages. Some implementations, like Rust's with its macros, can have additional features such as derive(BorshSerialize).
From the user's perspective, borsh would be used this way:
Write a borsh schema definition that is common across languages. Currently it's JSON, but JSON isn't very friendly for humans to write, so we may consider YAML, TOML, or a Rust-type-definition-like DSL. They're all equivalent and can be trivially converted to each other. This defines the types the user wants to serialize/deserialize.
borsh should be able to generate these from schema:
People then use the generated source code.
With this schema based approach, each language's borsh implementation is:
@nearmax WDYT? Is this how the borsh schema is supposed to work?
Make sure Borsh either does not work with trait objects at all (because we don't know the type that we need to deserialize into) or, if it does work, that it works correctly.
Currently BorshSchema has static but not constant methods. When compiled to Wasm, these methods occupy significant space. They also create significant execution overhead when self-described borsh serialization/deserialization is invoked via https://docs.rs/borsh/0.6.2/borsh/schema_helpers/index.html
To fix it we need to implement a const version of BorshSchema::schema_container(). Unfortunately, this means two things:
schema_container() cannot return a type that requires allocation. We currently intend to serialize BorshSchemaContainer using either borsh or JSON. Therefore we can have two versions of schema_container():
schema_container_borsh() -> &[u8];
schema_container_json() -> &str;
The caller can then reconstruct a BorshSchemaContainer from them, if necessary.
schema_container_json internally would define const variables for each type and recursively concatenate them using std::concat. As a result, each type will have a compile-time computed schema. A similar technique can be used with byte slices and schema_container_borsh.
This will improve performance in the following way: the serialization helper can use schema_container_borsh to prepend an already generated sequence of bytes in front of the borsh-serialized object. Upon deserialization of that object, the helper will check that the schema embedded in the self-described data matches the schema of the type it deserializes into.
Disclaimer: The following performance tests were done using Rust on BPF, which is under development.
I noticed that upgrading Borsh from 2.4 to 2.5 causes a large increase in the number of BPF instructions it takes to serialize and deserialize byte arrays: from a few thousand instructions to 20k+ for a 32-byte array. The performance went from beating bincode to being far worse than it.
It looks like the single-byte copies introduced in this PR are the culprit: #20
Instead of copying the entire array with a single extend_from_slice, an extend_from_slice is performed for each byte. It's possible that other rustc targets optimize this better than the BPF backend does.
Currently we can't do much to affect how derive(BorshSerialize, BorshDeserialize) works besides borsh_skip; however, one case is very useful. Consider this big structure in wasmer:
pub struct ModuleInfo {
pub memories: Map<LocalMemoryIndex, MemoryDescriptor>,
pub globals: Map<LocalGlobalIndex, GlobalInit>,
pub tables: Map<LocalTableIndex, TableDescriptor>,
pub imported_functions: Map<ImportedFuncIndex, ImportName>,
pub imported_memories: Map<ImportedMemoryIndex, (ImportName, MemoryDescriptor)>,
pub imported_tables: Map<ImportedTableIndex, (ImportName, TableDescriptor)>,
pub imported_globals: Map<ImportedGlobalIndex, (ImportName, GlobalDescriptor)>,
pub exports: IndexMap<String, ExportIndex>,
pub data_initializers: Vec<DataInitializer>,
pub elem_initializers: Vec<TableInitializer>,
pub start_func: Option<FuncIndex>,
pub func_assoc: Map<FuncIndex, SigIndex>,
pub signatures: Map<SigIndex, FuncSig>,
pub backend: String,
pub namespace_table: StringTable<NamespaceIndex>,
pub name_table: StringTable<NameIndex>,
pub em_symbol_map: Option<HashMap<u32, String>>,
pub custom_sections: HashMap<String, Vec<Vec<u8>>>,
pub generate_debug_info: bool,
#[borsh_skip]
pub(crate) debug_info_manager: jit_debug::JitCodeDebugInfoManager,
}
Every field in this giant struct can derive BorshSerialize and BorshDeserialize except one: exports: IndexMap<String, ExportIndex>. IndexMap isn't a type defined in this crate, nor is it defined in std or borsh, so you cannot
impl BorshSerialize for IndexMap
; because of this one field, you cannot derive BorshSerialize for the giant struct. There are two workarounds:
1. Change pub exports: IndexMap<String, ExportIndex> to pub exports: ExportsMap and define a struct ExportsMap enclosing the IndexMap, so you can impl BorshSerialize/Deserialize on ExportsMap and make ModuleInfo borsh-derivable. But this causes every reference to exports to become exports.inner or exports.0.
2. Add borsh_serializer/borsh_deserializer macros:
fn borsh_serialize_index_map<K: BorshSerialize, V: BorshSerialize, W: Write>(index_map: &IndexMap<K, V>, writer: &mut W) -> std::io::Result<()> {
...
}
#[borsh_serializer(borsh_serialize_index_map)]
#[borsh_deserializer(borsh_deserialize_index_map)]
pub exports: IndexMap<String, ExportIndex>,
With the help of these macros, the user can specify a custom borsh serializer/deserializer for a field of a struct, making the whole struct borsh-derivable.
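As an illustration of workaround 1 (the newtype wrapper), here is a hedged standalone sketch. It uses std's BTreeMap as a stand-in for IndexMap and a local MySerialize trait as a stand-in for BorshSerialize, since Rust's orphan rule (no impl of a foreign trait for a foreign type) is exactly what blocks the direct impl:

```rust
use std::collections::BTreeMap; // stand-in for indexmap::IndexMap

// Stand-in for borsh::BorshSerialize in this sketch.
trait MySerialize {
    fn serialize(&self, out: &mut Vec<u8>);
}

// Newtype over the foreign type: a local type, so a local impl is allowed.
struct ExportsMap(BTreeMap<String, u32>);

impl MySerialize for ExportsMap {
    fn serialize(&self, out: &mut Vec<u8>) {
        // borsh-style layout: u32 length prefix, then each entry.
        out.extend_from_slice(&(self.0.len() as u32).to_le_bytes());
        for (k, v) in &self.0 {
            out.extend_from_slice(&(k.len() as u32).to_le_bytes());
            out.extend_from_slice(k.as_bytes());
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
}

fn main() {
    let mut m = BTreeMap::new();
    m.insert("main".to_string(), 0u32);
    let mut out = Vec::new();
    ExportsMap(m).serialize(&mut out);
    // The first 4 bytes are the little-endian entry count (1).
    assert_eq!(&out[..4], &1u32.to_le_bytes()[..]);
}
```

The downside named above is visible here: every access to the wrapped map goes through `.0`, which is why a per-field custom-serializer attribute (workaround 2) is the nicer API.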
We can use structs from https://github.com/koute/serde-bench/blob/master/src/lib.rs
and https://github.com/erickt/rust-serialization-benchmarks/tree/master/rust (cc @frol ) to compare with existing setups.
We can later add benchmarking of borsh to those repos, but first let's add a benchmarking suite here, including some large data structs (e.g. repeated of repeated of repeated fields) to monitor speed.
It is painful to clone the repository (it recently took me 10 minutes).
The /docs/criterion folder is 200MB. Can we trim it down?
See PR which fixes only part of the issue: #42
It'll greatly benefit all implementations for PublicKeys and CryptoHash structs in nearcore
Besides #84, borsh-c should generate a header file and serialization/deserialization C source code given an existing schema file. The resulting C source code can be compiled into a C project. Note that borsh-c itself is not necessarily implemented in C; implementing it in Rust is probably faster.
Vec seems to take too much gas. An easy optimization would be a Vec implementation with buffered reads, similar to strings, but it's unclear how to handle non-fixed-size element types, e.g. Vec
Borsh is not a self-descriptive language, so some languages like JS need to either generate schema (which is currently consumed by borsh-js) or generate the full class implementation in JS.
We could take the following approach. Write a cargo extension providing a command like cargo generate-borsh js
that generates JS classes from Rust types by walking over the crate looking for types decorated with #[derive(BorshSerialize, BorshDeserialize)]
and dumping the generated JS analogs while preserving the directory structure. We need to decide what we want to do with types that implement BorshSerialize
and BorshDeserialize
explicitly. I suggest we skip them, delegating that JS code to the user. We can use https://crates.io/crates/syn for that.
Another question to discuss. CC @ilblackdragon @vgrichina What is the advantage of generating a schema that is later consumed by BinaryReader
over generating the full class implementation in JS? That it is more human-readable? Is there any performance disadvantage?
a python borsh implementation
Currently JS implementation is in nearlib.
BorshDeserialize for bool and Option accepts arbitrary values where it should only allow 0 or 1.
This makes it possible for an object to have multiple representations which can potentially allow attacks on our usage of borsh.
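A minimal sketch of the fix (illustrative code, not borsh's actual deserializer): accept exactly 0 and 1 and error on anything else, so each bool has a single canonical byte representation.

```rust
// Strict bool decoding: any byte other than 0 or 1 is an error, which
// guarantees a unique serialized representation per value.
fn deserialize_bool_strict(byte: u8) -> Result<bool, String> {
    match byte {
        0 => Ok(false),
        1 => Ok(true),
        b => Err(format!("invalid bool byte: {}", b)),
    }
}

fn main() {
    assert_eq!(deserialize_bool_strict(0), Ok(false));
    assert_eq!(deserialize_bool_strict(1), Ok(true));
    // A lax decoder (byte != 0) would map this to `true`, giving `true`
    // 255 different encodings and breaking hash-based comparisons.
    assert!(deserialize_bool_strict(2).is_err());
}
```

The same rule applies to the Option tag byte: only 0 (None) and 1 (Some) should be accepted.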
Currently, because we don't know the size of the buffer inside the deserialize
method, we can't tell when the length is far too large and should error out (right now it would segfault due to a memory allocation error).
See test_invalid_length
for example.
As described in #83. They're more or less implemented, but we want to fully test them and fix bugs encountered during testing.
We need to write the following fuzz tests for borsh:
A) Generate random type. Creating an object of the type filled with random data. Then serialize it and deserialize it, and compare that structure before and after are the same;
B) Generate random type. Creating an object of the type filled with random data. Serialize it. Randomly flip a subset of bits in the serialized structure. Try deserializing it and assert that it does not panic, but instead either deserializes or returns an error.
The two difficult things to implement would be:
As an option, I suggest we do both using procedural macros. We can have a macro random_type!(Name, X, Y, seed)
that generates a token stream corresponding to a declaration of some type Name
using https://doc.rust-lang.org/reference/procedural-macros.html#function-like-procedural-macros where X
would be the max depth (e.g. if we have nested structures) and Y
is the max width of each node (e.g. max number of fields in a struct or max number of variants in an enum).
Each type would also be decorated with #[derive(RandomInit)]
which implements trait
trait RandomInit {
    fn random_init() -> Self;
}
for the type, just like we do with serializers. We then would implement RandomInit
for basic types and collections, just like we do with serializers.
Then our test would be something like:
random_type!(T0, 1, 1, 42);
...
random_type!(T42, 10, 12, 42);
#[test]
fn test0() {
for _ in 0..100 {
let t0 = T0::random_init();
let out_t0: T0 = try_from_slice(&t0.try_to_vec().unwrap()).unwrap();
assert_eq!(t0, out_t0);
}
}
Note: we should also look at the fuzzing tools that Sigma Prime wrote for our borsh; we might not need to write them ourselves.
Ideally this gives better performance compared to the language-native implementation. It should be an optional feature (native extension) that people can use in borsh-js and borsh-python.
use borsh::{BorshSerialize, BorshDeserialize};
#[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]
struct B {
x: [u8; 20],
y: [u8; 100],
z: String,
}
fn test_simple_struct() {
let b = B {
x: [0; 20],
y: [0; 100],
z: "liber primus".to_string(),
};
let encoded_b = b.try_to_vec().unwrap();
let decoded_b = B::try_from_slice(&encoded_b).unwrap();
assert_eq!(b, decoded_b);
}
fn main() {
test_simple_struct();
}
~/BORSH/borsh-test/src$ cargo run
Compiling borsh-test v0.1.0 (/Users/mrsmith/BORSH/borsh-test)
error[E0277]: the trait bound `[u8; 100]: borsh::BorshDeserialize` is not satisfied
--> src/main.rs:4:26
|
4 | #[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]
| ^^^^^^^^^^^^^^^^ the trait `borsh::BorshDeserialize` is not implemented for `[u8; 100]`
|
= help: the following implementations were found:
<[T; 0] as borsh::BorshDeserialize>
<[T; 1024] as borsh::BorshDeserialize>
<[T; 10] as borsh::BorshDeserialize>
<[T; 11] as borsh::BorshDeserialize>
and 36 others
= help: see issue #48214
= note: this error originates in a derive macro (in Nightly builds, run with -Z macro-backtrace for more info)
error[E0277]: the trait bound `[u8; 100]: borsh::BorshSerialize` is not satisfied
--> src/main.rs:4:10
|
4 | #[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]
| ^^^^^^^^^^^^^^ the trait `borsh::BorshSerialize` is not implemented for `[u8; 100]`
|
= help: the following implementations were found:
<[T; 0] as borsh::BorshSerialize>
<[T; 1024] as borsh::BorshSerialize>
<[T; 10] as borsh::BorshSerialize>
<[T; 11] as borsh::BorshSerialize>
and 37 others
= help: see issue #48214
= note: this error originates in a derive macro (in Nightly builds, run with -Z macro-backtrace for more info)
error: aborting due to 2 previous errors
For more information about this error, try `rustc --explain E0277`.
error: could not compile `borsh-test`.
To learn more, run the command again with --verbose.
After #33 borsh can serialize fixed-size arrays of any type, but deserialize only byte arrays.
That's inconsistent.
I think this line:
borsh/borsh-rs/borsh/src/schema.rs
Line 162 in 50c3c5d
Should be changed to be the same as this line:
borsh/borsh-rs/borsh/src/ser/mod.rs
Line 347 in 50c3c5d
but with the addition of "0" as the first element, as per the schema line.
If I have the following structs
#[derive(BorshSerialize)]
struct A {}
#[derive(BorshSerialize)]
struct B<'a> {
a: &'a [A]
}
The derive for B
will fail even though we have derive for [T]
and &T
if T
implements BorshSerialize
. It fails with "size for value cannot be known at compile time. If I change [A]
to Vec<A>
, it works.
Write a full test suite and supplementary borsh spec documentation. All borsh implementations should pass this suite.
cargo test --no-default-features
doesn't work because of dependencies on Vec and HashMap.
Such as serializing and deserializing unsigned ints and ints of all sizes, floats, etc. This would be enough to do borsh serialization/deserialization in C.
Since the writer is a vector, it will never return Result::Err(). Therefore we could avoid returning a Result at all. We use this interface quite often, and right now the code is bloated with unnecessary unwrap()s.
Vec's Write implementation:
https://doc.rust-lang.org/src/std/io/impls.rs.html#339
What do you think @nearmax ?
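To illustrate the point, a small standalone example: writing into a Vec<u8> goes through std's Write impl for Vec, which only grows the vector and never reports an I/O error, so the Result plumbing is pure noise for this writer.

```rust
use std::io::Write;

fn main() {
    let mut buf: Vec<u8> = Vec::new();
    // Vec's Write impl (std::io::impls) cannot fail short of allocation
    // aborting the process, so this unwrap can never panic.
    buf.write_all(&[1, 2, 3]).unwrap();
    assert_eq!(buf, vec![1, 2, 3]);
}
```

An infallible serialization entry point (e.g. one returning Vec<u8> directly) would remove these unwrap()s from call sites.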
Add specification of the format to README
Hi, I'm investigating using the Near network for a project I'm working on. Looking through the examples on smart contracts and seeing Borsh, it looks like a really cool serialization format. I'm a bit curious if it plays nicely with Serde-based structs at all.
My use-cases are for using things like chrono::Datetime
and url::Url
which come with serde implementations. I suppose I could wrap these in newtypes and implement Borsh by hand, but I think it would be much easier if Borsh could work on top of serde (as well as having its own derive macros). This would make it easier to use the format with existing libraries and projects. I understand that Borsh layers some new features on top of its own implementation so obviously those would not be available in a serde-driven version.
I'm curious if this is a possibility for the project's near future. Thank you!
Ideally, there would be
and that'd be checked in CI
The following check assumes the length of the serialized slice equals the length of the serialized data: https://github.com/nearprotocol/borsh/blob/c5693fcb8af4636878fa13e8fc622953cf9b4e1e/borsh-rs/borsh/src/de/mod.rs#L15
There are cases where the serialized data may only occupy the first x bytes of a slice (fixed-size data packets, for example). In these cases, deserialization will fail, and it's impossible for the receiver to know to what size to prune the slice (i.e. how much of the slice contains serialized data). For comparison, bincode allows passing slices that are larger than the serialized data.
Can this restriction be lifted?
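One possible shape for lifting the restriction, sketched with an illustrative function (not borsh's API): have deserialization report how many bytes it consumed, so oversized buffers with trailing padding are accepted and the caller can still detect truly truncated input.

```rust
// Hypothetical prefix-tolerant decoder for a u32 (borsh integers are
// little-endian): returns the value plus the number of bytes consumed.
fn deserialize_u32_prefix(buf: &[u8]) -> Result<(u32, usize), String> {
    if buf.len() < 4 {
        return Err("unexpected end of input".to_string());
    }
    let v = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    Ok((v, 4))
}

fn main() {
    // A fixed-size packet: 4 bytes of data followed by zero padding.
    let packet = [42u8, 0, 0, 0, 0, 0, 0, 0];
    let (v, used) = deserialize_u32_prefix(&packet).unwrap();
    // Trailing padding is simply left unread.
    assert_eq!((v, used), (42, 4));
    // Truncated input is still an error.
    assert!(deserialize_u32_prefix(&[1, 2]).is_err());
}
```

The trade-off is that the strict equality check catches trailing-garbage bugs; a consumed-length API pushes that decision to the caller.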