fxamacker / cbor
CBOR codec (RFC 8949) with CBOR tags, Go struct tags (toarray, keyasint, omitempty), float64/32/16, big.Int, and fuzz tested billions of execs.
License: MIT License
Add Marshaler and Unmarshaler interfaces to let user-defined types implement their own CBOR encoding and decoding.
This is needed by issue #11.
It would be easier and cleaner to do this in v2.0 instead of v1.4 due to SemVer policy.
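A rough sketch of what such interfaces could look like, modeled on json.Marshaler and json.Unmarshaler. The method names MarshalCBOR and UnmarshalCBOR and the Level type are assumptions for illustration, not a settled API:

```go
package main

import (
	"errors"
	"fmt"
)

// Marshaler and Unmarshaler are hypothetical interface shapes,
// modeled on json.Marshaler and json.Unmarshaler.
type Marshaler interface {
	MarshalCBOR() ([]byte, error)
}

type Unmarshaler interface {
	UnmarshalCBOR(data []byte) error
}

// Level is an illustrative user-defined type limited to 0..23, so it
// always encodes as a single-byte CBOR unsigned integer (major type 0).
type Level uint8

// MarshalCBOR returns the one-byte CBOR encoding of the level.
func (l Level) MarshalCBOR() ([]byte, error) {
	if l > 23 {
		return nil, errors.New("level out of range 0..23")
	}
	return []byte{byte(l)}, nil
}

// UnmarshalCBOR accepts only a one-byte CBOR unsigned integer in 0..23.
func (l *Level) UnmarshalCBOR(data []byte) error {
	if len(data) != 1 || data[0] > 23 {
		return errors.New("expected a CBOR unsigned integer in 0..23")
	}
	*l = Level(data[0])
	return nil
}

// Compile-time checks that Level satisfies both interfaces.
var (
	_ Marshaler   = Level(0)
	_ Unmarshaler = (*Level)(nil)
)

func main() {
	b, err := Level(7).MarshalCBOR()
	fmt.Printf("%x %v\n", b, err)

	var l Level
	err = l.UnmarshalCBOR(b)
	fmt.Println(l, err)
}
```

An encoder would check for Marshaler before falling back to reflection, and a decoder would do the same for Unmarshaler on the addressable destination.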
RFC 7049bis indicates additional considerations for encoding not mentioned in 7049, so future protocols are likely to use new encoding modes.
One way to handle options would be to use some integer "enums" to specify different aspects of encoding modes. So 6-7+ of these aspects can combine to specify any current or future mode. And each is an integer so they can have new values made available as needed.
Something like the following (with better names than this rough draft):
And one or more aspects for each of these:
So a combination of these options can handle all existing CBOR modes plus some future modes that don't exist yet.
Allow CBOR integer value to be decoded to Go float. This is currently denied by the library.
Also, see kanban regarding optional new feature involving floats. It might be convenient for you to implement everything float-related in one shot if you agree with this change.
Adding Unmarshaler decreased decoding speed in commit 1a29187. This affects all data types.
Make it faster by caching the result of reflect pkg's Type.Implements().
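One possible shape for that cache, as a sketch only: a sync.Map keyed by reflect.Type. It uses encoding.BinaryUnmarshaler as a stand-in for the library's own interface, and the names hasUnmarshaler and blob are made up for this example:

```go
package main

import (
	"encoding"
	"fmt"
	"reflect"
	"sync"
)

// unmarshalerType stands in for the library's Unmarshaler interface type.
var unmarshalerType = reflect.TypeOf((*encoding.BinaryUnmarshaler)(nil)).Elem()

// implementsCache maps reflect.Type to a bool so Type.Implements runs
// at most once per type (barring a harmless race on first use).
var implementsCache sync.Map

// hasUnmarshaler reports whether a pointer to v's type implements the
// interface, consulting the cache before falling back to reflection.
func hasUnmarshaler(v interface{}) bool {
	t := reflect.TypeOf(v)
	if t == nil {
		return false
	}
	if cached, ok := implementsCache.Load(t); ok {
		return cached.(bool)
	}
	ok := reflect.PointerTo(t).Implements(unmarshalerType)
	implementsCache.Store(t, ok)
	return ok
}

// blob is a hypothetical type with a pointer-receiver UnmarshalBinary.
type blob []byte

func (b *blob) UnmarshalBinary(data []byte) error {
	*b = append((*b)[:0], data...)
	return nil
}

func main() {
	fmt.Println(hasUnmarshaler(blob{})) // first call uses reflection
	fmt.Println(hasUnmarshaler(blob{})) // second call hits the cache
	fmt.Println(hasUnmarshaler(42))
}
```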
I have two structs like this:
type IdsMapping struct {
UserIdIndexDict map[float64]float64
IndexUserIdDict map[float64]float64
ItemIdIndexDict map[float64]float64
indexItemIdDict map[float64]float64
}
type LFM struct {
classCount int
iterCount int
featureName string
labelName string
lr float64
lam float64
UserItemRatingMatrix *mat.Dense
ModelPfactor *mat.Dense
ModelQfactor *mat.Dense
RatingDf *dataframe.DataFrame
UseridItemidDict map[float64]map[float64]float64
UseridSet *gset.Set
ItemidSet *gset.Set
IdsMapping *IdsMapping
UserIndexItemIndexDict map[float64]map[float64]float64
}
I want to use cbor to serialize the LFM struct. How do I do that, given how complex the struct is? Thanks.
Valid() should return SyntaxError instead of SemanticError when:
Probably no difference except for logs. So this won't require a bugfix release.
When decoding into a Go map with interface{} as key type, reject CBOR data if it can't be used as Go map key.
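A minimal sketch of such a check, assuming decoded keys arrive as interface{} values. The helper names and the error text are hypothetical:

```go
package main

import (
	"fmt"
	"reflect"
)

// usableAsMapKey reports whether v can be used as a Go map key without
// panicking. It relies on reflect.Type.Comparable, which is false for
// slices, maps, and functions. Caveat: a comparable struct or array that
// holds an interface wrapping a slice would still panic at runtime.
func usableAsMapKey(v interface{}) bool {
	if v == nil {
		return true // a nil interface is a valid map key
	}
	return reflect.TypeOf(v).Comparable()
}

// putKey stores k/v in m, or returns an error instead of panicking.
func putKey(m map[interface{}]interface{}, k, v interface{}) error {
	if !usableAsMapKey(k) {
		return fmt.Errorf("cbor: invalid map key type: %T", k)
	}
	m[k] = v
	return nil
}

func main() {
	m := map[interface{}]interface{}{}
	fmt.Println(putKey(m, "name", 1))
	fmt.Println(putKey(m, []byte{0x01}, 2))      // CBOR byte string key
	fmt.Println(putKey(m, []interface{}{1}, 3)) // CBOR array key
	fmt.Println(len(m))
}
```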
Is there any interest in supporting special handling for types that know how to serialize themselves into a binary representation? encoding/json supports calling a given object's encoding.TextMarshaler and encoding.TextUnmarshaler implementations if provided, as does the go-codec library.
For CBOR the binary equivalents probably make more sense. I've started poking at this in a local branch, and while the encoding side was straightforward to add, the decoding side is a bit more involved due to parseInterface.
I created a pull request to add your library to cbor.io. There are a few open questions you might be more qualified to answer than me: cbor/cbor.github.io#56 (comment)
Please jump in. :-)
When float16 numbers are subnormal (exponent 0, significand ≠ 0), decoded values are incorrect.
This problem only affects subnormal float16 numbers during decoding.
Fix this by replacing float16 to float32 conversion function with a new implementation that is verified 100% correct for all 65536 possible conversions.
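For reference, a float16 to float32 conversion that normalizes subnormals can be sketched as below. This is an illustration of the technique under IEEE 754 binary16 layout, not the function that was actually merged:

```go
package main

import (
	"fmt"
	"math"
)

// float16ToFloat32 converts IEEE 754 binary16 bits to a float32.
func float16ToFloat32(h uint16) float32 {
	sign := uint32(h>>15) << 31
	exp := uint32(h>>10) & 0x1f
	frac := uint32(h) & 0x3ff

	switch {
	case exp == 0x1f: // infinity or NaN: keep the payload bits
		return math.Float32frombits(sign | 0x7f800000 | frac<<13)
	case exp == 0 && frac == 0: // signed zero
		return math.Float32frombits(sign)
	case exp == 0: // subnormal: value is frac * 2^-24
		// Shift the significand left until the implicit leading bit
		// (bit 10) is set, lowering the float32 exponent each time.
		e := uint32(127 - 15 + 1)
		for frac&0x400 == 0 {
			frac <<= 1
			e--
		}
		frac &= 0x3ff // drop the now-implicit leading bit
		return math.Float32frombits(sign | e<<23 | frac<<13)
	default: // normal: rebias the exponent from 15 to 127
		return math.Float32frombits(sign | (exp+112)<<23 | frac<<13)
	}
}

func main() {
	for _, h := range []uint16{0x0001, 0x03ff, 0x0400, 0x3c00, 0xc000, 0x7bff} {
		fmt.Printf("0x%04x -> %g\n", h, float16ToFloat32(h))
	}
}
```

Since there are only 65536 inputs, a test can compare every conversion against a trusted reference.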
Encoding struct with null pointer to non-embedded struct works as expected.
Problem only affects null pointer to embedded struct.
What version?
go v1.12.12
cbor v1.2.0 (likely affects older versions too)
What did you do?
type (
T1 struct {
N int
}
T2 struct {
*T1
}
)
v := T2{}
cborData, err := cbor.Marshal(v, cbor.EncOptions{})
What did you expect to see?
cborData = []byte{0xa0}
err = nil
What did you see instead?
cborData = nil
err = "cbor: cannot set embedded pointer to unexported struct: *cbor_test.T1"
It would be nice to be able to decode CBOR maps with integer keys to Go struct, and vice versa.
This feature would speed up and simplify using this library for COSE. It would also speed up WebAuthn since that uses COSE.
This would help users of older versions evaluate new releases. And help prevent undetected performance regressions.
benchmark | old ns/op | new ns/op | delta |
---|---|---|---|
BenchmarkFoo | 523 | 68.6 | -86.88% |
benchmark | old allocs | new allocs | delta |
---|---|---|---|
BenchmarkFoo | 3 | 1 | -66.67% |
benchmark | old bytes | new bytes | delta |
---|---|---|---|
BenchmarkFoo | 80 | 48 | -40.00% |
v1.3.3 looks solid, passed 220+ million execs fuzzing with 1000+ corpus files as starting point.
workers: 2, corpus: 1071 (18h46m ago), crashers: 0, restarts: 1/10000, execs: 227582949 (3368/sec), cover: 2011, uptime: 18h46m
The charts don't need 3 bars for each aspect being compared. Just use default build settings for every library compared for every comparison.
I'll post updated speed comparison and size comparison charts as comments to this issue.
Timing is good to bump major version since adding CBOR Tags is a big change and there's a need for extended options based on 7049bis. E.g. newer encoding modes and error handling options.
As discussed, renaming v1.4 to v2.0 allows extended option handling to be done in a cleaner and simpler way without dragging along the old options struct.
Improved option handling can be designed with 7049bis and "generic" CBOR library in mind, so it should be more future-proof than the current design.
RFC 2026 says,
Under no circumstances should an Internet-Draft be referenced by any paper, report, or Request-for-Proposal, nor should a vendor claim compliance with an Internet-Draft.
Update README.md:
As discussed in #62 this can be a small release in v1.x before merging in CBOR tags feature.
The combination of SortMode and ShortestFloat can be used to specify these modes in v1:
type SortMode int
const (
SortNone SortMode = 0 // no sorting
SortLengthFirst SortMode = 1 // RFC 7049 Canonical
SortBytewiseLexical SortMode = 2 // RFC 7049bis Bytewise Lexicographic
SortCanonical SortMode = SortLengthFirst
SortCTAP2 SortMode = SortBytewiseLexical
SortCoreDeterministic SortMode = SortBytewiseLexical
)
type ShortestFloat int
const (
ShortestFloatNone ShortestFloat = 0 // no change
ShortestFloat16 ShortestFloat = 1 // float16 as shortest form of float that preserves value
ShortestFloat32 ShortestFloat = 2 // float32 as shortest form of float that preserves value
ShortestFloat64 ShortestFloat = 3 // float64 as shortest form of float (this may convert from float32 to float64, etc.)
)
Don't allow CBOR byte string (major type 2) as input to Go's Time.UnmarshalBinary.
Time values should only be encoded/decoded using these CBOR data types: pos or neg integer, float, and text string.
For more info, see RFC 7049 section 2.4.1.
RFC 7049bis describes three kinds of malformed CBOR data:
I created 87 unit tests based on kind 2 and kind 3 and v1.3.2 passed all of them.
Kind 1 is only an error when the application (not CBOR library) assumes that the input bytes would span exactly one data item.
RFC 7049bis Appendix G:
Too much data: There are input bytes left that were not consumed. This is only an error if the application assumed that the input bytes would span exactly one data item. Where the application uses the self-delimiting nature of CBOR encoding to permit additional data after the data item, as is for example done in CBOR sequences [I-D.ietf-cbor-sequence], the CBOR decoder can simply indicate what part of the input has not been consumed.
Need to review the tests and commit.
For example:
For v1, security fixes are provided only for the latest released version since the API won't break compatibility.
To report security vulnerabilities, please email [email protected] and allow time for the problem to be resolved before reporting it to the public.
Please consider resolving issue discussed before adding new features like CBOR tags, optimizing for speed, or refactoring to rename functions, etc.
Since WebAuthn requires CBOR, add some benchmarks using typical WebAuthn data.
The following code returns error "cbor: cannot unmarshal array into Go value of type interface {}"
s := "hello"
var v interface{} = s
cbor.Unmarshal(data, &v)
Decoder should handle a non-nil interface in the same way as a nil interface by storing the appropriate Go type in the interface value. This behavior should be made consistent with encoding/json.
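For comparison, here is the encoding/json behavior being referenced, using only the standard library. The helper name decodeOver is made up for this example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeOver decodes JSON into an interface{} that already holds a
// non-nil, non-pointer value, mirroring the scenario in this issue.
func decodeOver(initial interface{}, data string) (interface{}, error) {
	v := initial
	err := json.Unmarshal([]byte(data), &v)
	return v, err
}

func main() {
	// v starts out holding a string; encoding/json replaces it with the
	// decoded array rather than returning a type error.
	v, err := decodeOver("hello", `[1, 2]`)
	fmt.Printf("%T %v %v\n", v, v, err)
}
```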
Encoding modes have different aspects, for example:
The encoding mode bools should've been one int since there will be more encoding modes in the future than anticipated.
However, providing an integer encoding mode is also inflexible because it would only support known encoding modes.
Simply deprecate the bool encoding modes and add an integer SortMode to EncOptions.
When using encoding modes that do not shrink/expand floats, the sort mode basically determines simple encoding modes like Canonical or CTAP2 Canonical.
This way, future encoding modes not yet known/named today can be supported by setting the required combination of options in EncOptions.
This issue is closed by commit 3b78ee0 and is the first half of existing issue #74.
A separate issue can be opened to add a NewEncOptions or EncOptions.New function (specifying a known encoding mode) that returns EncOptions having proper values for SortMode, ShortestFloatMode, etc.
type SortMode int
const (
SortNone SortMode = 0 // no sorting
SortLengthFirst SortMode = 1 // RFC 7049 Canonical
SortBytewiseLexical SortMode = 2 // RFC 7049bis Bytewise Lexicographic
SortCanonical SortMode = SortLengthFirst
SortCTAP2 SortMode = SortBytewiseLexical
SortCoreDeterministic SortMode = SortBytewiseLexical
)
Add support for json struct field tags when cbor struct field tags are absent. This helps CBOR be a drop-in replacement for JSON.
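The fallback could be sketched with reflect.StructTag as follows. The helper encodedName and the sample struct are hypothetical, and options such as "-" or omitempty are not handled here:

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// sample is an illustrative struct mixing cbor tags, json tags, and no tags.
type sample struct {
	A int `cbor:"a"`
	B int `json:"b,omitempty"`
	C int
}

// encodedName returns the name a field would be encoded under: the cbor
// tag if present, otherwise the json tag, otherwise the Go field name.
func encodedName(v interface{}, field string) string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return ""
	}
	tag, ok := f.Tag.Lookup("cbor")
	if !ok {
		tag = f.Tag.Get("json") // fall back only when cbor tag is absent
	}
	name := strings.Split(tag, ",")[0] // drop options such as omitempty
	if name == "" {
		return f.Name
	}
	return name
}

func main() {
	for _, field := range []string{"A", "B", "C"} {
		fmt.Println(field, "->", encodedName(sample{}, field))
	}
}
```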
Current status:
go test -cover
Most of the remaining coverage involves edge cases and error handling.
This allows:
This makes encoded data more compact and structs are easier to use.
type T struct {
_ struct{} `cbor:",toarray"`
A int `cbor:",omitempty"`
B string `cbor:",omitempty"`
}
Special field "_" is used to specify struct level options, such as "toarray". Any value of T type is encoded as array of 2 elements. "omitempty" is disabled by "toarray" to ensure that the same number of elements are encoded every time.
If "toarray" is omitted, "omitempty" works just like encoding/json.
I came across this at oasisprotocol/oasis-core@ade6a1b. @Yawning has the best commit messages!
Encoding benchmarks only use canonical CBOR, which is slower. Add benchmarks for the faster encoding.
CTAP2 "canonical CBOR" != RFC 7049 "Canonical CBOR" when map keys have different data types due to sorting rules. If all map keys have same data type, there's no difference.
When "Canonical CBOR" option is specified, this library has been using sorting rules from "Core Deterministic Encoding Requirements" making it effectively sort like CTAP2.
There should be two canonical modes as options: "Canonical CBOR" and "CTAP2 Canonical CBOR". They only differ in sorting rules involving map keys with mixed types.
NOTE: issue edited based on your comments
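To make the difference concrete, here is a sketch that sorts already-encoded map keys under both rules. The SortMode names follow the constants proposed elsewhere in this tracker; the helpers less and sortedHex are made up for this example:

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

type SortMode int

const (
	SortLengthFirst     SortMode = 1 // RFC 7049 Canonical
	SortBytewiseLexical SortMode = 2 // bytewise lexicographic
)

// less compares two encoded keys under the given sort mode.
func less(a, b []byte, mode SortMode) bool {
	if mode == SortLengthFirst && len(a) != len(b) {
		return len(a) < len(b) // shorter encodings sort first
	}
	return bytes.Compare(a, b) < 0
}

// sortedHex returns the keys in sorted order as space-separated hex.
func sortedHex(keys [][]byte, mode SortMode) string {
	sorted := append([][]byte(nil), keys...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return less(sorted[i], sorted[j], mode)
	})
	parts := make([]string, len(sorted))
	for i, k := range sorted {
		parts[i] = hex.EncodeToString(k)
	}
	return strings.Join(parts, " ")
}

func main() {
	keys := [][]byte{
		{0x18, 0x64}, // 100
		{0x20},       // -1
		{0x0a},       // 10
		{0x61, 0x7a}, // "z"
	}
	fmt.Println("length-first:", sortedHex(keys, SortLengthFirst))
	fmt.Println("bytewise:    ", sortedHex(keys, SortBytewiseLexical))
}
```

With these mixed-type keys, -1 (0x20) and 100 (0x1864) swap places between the two modes.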
Data validation should add this rule:
Reject CBOR primitives (major type 7) with additional info 24 if value (next byte) is < 32.
BTW, always verify with RFC 7049 because there's incorrect info about this item even on reputable websites.
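The rule amounts to a two-byte check on the head of the data item. A sketch, with a hypothetical function name and error text:

```go
package main

import (
	"errors"
	"fmt"
)

// checkSimpleValue applies the rule to the data item starting at data[0].
// 0xf8 is major type 7 with additional info 24; the byte that follows
// must be 32 or greater, because smaller simple values have to use the
// one-byte form.
func checkSimpleValue(data []byte) error {
	if len(data) == 0 {
		return errors.New("empty input")
	}
	if data[0] != 0xf8 {
		return nil // rule does not apply to this item
	}
	if len(data) < 2 {
		return errors.New("unexpected end of input")
	}
	if data[1] < 32 {
		return fmt.Errorf("invalid simple value %d with additional info 24", data[1])
	}
	return nil
}

func main() {
	fmt.Println(checkSimpleValue([]byte{0xf8, 0x1f})) // rejected
	fmt.Println(checkSimpleValue([]byte{0xf8, 0x20})) // accepted
	fmt.Println(checkSimpleValue([]byte{0xf4}))       // false, one-byte form
}
```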
Program size comparison should use existing 3rd-party program(s), instead of one you wrote solely for comparison.
If you want, I'll give it a shot and post result(s) here. I'll find IoT and security related projects because they value size, safety and reliability.
Let me know.
Add CONTRIBUTING.md with info on how to contribute to this project.
My favorite C++ JSON library has a nice one: https://github.com/nlohmann/json/blob/master/.github/CONTRIBUTING.md
Additional performance improvements for milestone v1.3 that are not already covered by issues #15 and #17.
And more if time allows.
This was discovered by using a newer cbor-fuzz after adding "toarray" struct tag to fuzzing.
While fixing this, I added an additional type compatibility check when not using "toarray".
I'll provide a comparison using release versions of CBOR libraries. The input data won't be contrived/biased so it'll be fair and useful.
README says speed isn't a primary design goal and that faster libraries exist, so I think the results will surprise some people.
As documented under the Limitations section, CBOR negative integers like -18446744073709551616 are unsupported because they cannot fit into Go's int64.
But instead of returning 0 with err=nil, I think cbor.Unmarshal should return an error when trying to decode these values.
Go's encoding/json handles this scenario by returning json.UnmarshalTypeError. So this library should probably return cbor.UnmarshalTypeError.
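A sketch of the overflow check. A CBOR negative integer (major type 1) with argument n represents -1 - n, so it fits in int64 only when n is at most math.MaxInt64. The function name is hypothetical, and a plain error stands in for the proposed cbor.UnmarshalTypeError:

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errOverflow is a placeholder for a typed unmarshal error.
var errOverflow = errors.New("cbor: negative integer overflows int64")

// negIntToInt64 converts the argument n of a CBOR negative integer
// (which represents -1 - n) to int64, or reports overflow.
func negIntToInt64(n uint64) (int64, error) {
	if n > math.MaxInt64 {
		return 0, errOverflow
	}
	return -1 - int64(n), nil
}

func main() {
	fmt.Println(negIntToInt64(0))
	fmt.Println(negIntToInt64(math.MaxInt64))  // smallest value that fits
	fmt.Println(negIntToInt64(math.MaxUint64)) // -18446744073709551616
}
```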
Spotted several optimizations while working on milestone v1.3.1.
Speedups should be around 4% to 5% for COSE and CWT decoding depending on other changes in the same release.
CWT (RFC 8392) is encoded in CBOR and uses COSE for added app-layer security. It's derived from JSON Web Token.
New type causes panic if float32 is the underlying type. It causes "panic: interface conversion: interface {} is cbor_test.myFloat32, not float32 [recovered]"
type myFloat32 float32
cbor.Marshal(myFloat32(0.0), cbor.EncOptions{})
Type alias works fine.
type aliasFloat32 = float32
cbor.Marshal(aliasFloat32(0.0), cbor.EncOptions{})
UPDATED on Jan 30, 2020:
I've been working on adding support for CBOR tags and made good progress. I moved this feature from v2.0 to milestone v2.1.
v2.0 will be released on Feb 2, 2020 and v2.1 about 1 week after that.
Chunks are not being checked for UTF-8 validity.
If any chunk has invalid UTF-8 sequence then the entire indefinite-length text string should be rejected.
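A minimal sketch of the per-chunk check, assuming the chunk payloads have already been split out of the indefinite-length text string. The function name is made up for this example:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validateTextChunks checks each chunk on its own. A multi-byte
// character split across two chunks makes both chunks invalid, even
// though the concatenation would be valid UTF-8.
func validateTextChunks(chunks [][]byte) error {
	for i, c := range chunks {
		if !utf8.Valid(c) {
			return fmt.Errorf("chunk %d is not valid UTF-8", i)
		}
	}
	return nil
}

func main() {
	ok := [][]byte{[]byte("hel"), []byte("lo")}
	split := [][]byte{{0xc3}, {0xa9}} // "é" split across two chunks
	fmt.Println(validateTextChunks(ok))
	fmt.Println(validateTextChunks(split))
}
```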
It would be nice if this library supported something similar to json.RawMessage which could be used to delay decoding of a field.
When user specifies "Core Deterministic Encoding" mode, this library should:
For example, try to convert float64 to float32, if that works, try to convert float32 to float16.
This mode makes serialized data more compact when it contains floating point numbers.
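The float64 to float32 to float16 cascade can be sketched as a size decision. The function names are hypothetical, and treating every NaN as fitting float16 is an assumption of this sketch (it discards the NaN payload):

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// fitsFloat16 reports whether f converts to IEEE 754 binary16 exactly.
func fitsFloat16(f float32) bool {
	b := math.Float32bits(f)
	exp := int(b>>23&0xff) - 127 // unbiased exponent
	mant := b & 0x7fffff

	switch {
	case b&0x7fffffff == 0: // signed zero
		return true
	case exp == 128: // infinity, or NaN whose payload fits in 10 bits
		return mant&0x1fff == 0
	case exp > 15: // too large for float16
		return false
	case exp >= -14: // float16 normal range: low 13 mantissa bits unused
		return mant&0x1fff == 0
	case exp >= -24: // float16 subnormal range: value must be k * 2^-24
		return bits.TrailingZeros32(mant|0x800000) >= -exp-1
	default: // too small, including float32 subnormals
		return false
	}
}

// shortestFloatSize returns 2, 4, or 8: the byte size of the smallest
// IEEE 754 format that preserves the value of f.
func shortestFloatSize(f float64) int {
	if math.IsNaN(f) {
		return 2 // assumption: NaN is encoded as a float16 quiet NaN
	}
	f32 := float32(f)
	if float64(f32) != f {
		return 8
	}
	if fitsFloat16(f32) {
		return 2
	}
	return 4
}

func main() {
	for _, f := range []float64{1.5, 65504, 65505, 100000, 0.1, math.Inf(1)} {
		fmt.Printf("%g -> %d bytes\n", f, shortestFloatSize(f))
	}
}
```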
Enforce this in data validation.
RFC 7049 2.2.2
For indefinite-length byte strings, every data item (chunk) between
the indefinite-length indicator and the "break" MUST be a definite-
length byte string item; if the parser sees any item type other than
a byte string before it sees the "break", it is an error.
This also affects indefinite length text string.
Chunks with tags inside CBOR strings (major type 2 and 3) should be treated as malformed.
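A sketch of a validator for this rule. It walks the chunks of an indefinite-length byte or text string and rejects any chunk whose head is not a definite-length string of the same major type, which also rejects tagged chunks and nested indefinite-length strings. The function name is made up, and bytes after the "break" are ignored:

```go
package main

import (
	"errors"
	"fmt"
)

// checkIndefiniteString validates the chunk structure of an
// indefinite-length byte string (0x5f) or text string (0x7f).
func checkIndefiniteString(data []byte) error {
	if len(data) == 0 || (data[0] != 0x5f && data[0] != 0x7f) {
		return errors.New("not an indefinite-length string")
	}
	major := data[0] >> 5
	i := 1
	for i < len(data) {
		head := data[i]
		if head == 0xff {
			return nil // "break" ends the string
		}
		if head>>5 != major {
			return errors.New("chunk has a different major type")
		}
		ai := head & 0x1f // additional info encodes the length
		i++
		var n uint64
		switch {
		case ai < 24:
			n = uint64(ai)
		case ai <= 27: // length follows in 1, 2, 4, or 8 bytes
			size := 1 << (ai - 24)
			if len(data)-i < size {
				return errors.New("unexpected end of input")
			}
			for _, c := range data[i : i+size] {
				n = n<<8 | uint64(c)
			}
			i += size
		default: // 28..30 are reserved, 31 means indefinite length
			return errors.New("chunk must be definite-length")
		}
		if uint64(len(data)-i) < n {
			return errors.New("unexpected end of input")
		}
		i += int(n)
	}
	return errors.New("missing break")
}

func main() {
	fmt.Println(checkIndefiniteString([]byte{0x5f, 0x42, 0x01, 0x02, 0x41, 0x03, 0xff}))
	fmt.Println(checkIndefiniteString([]byte{0x5f, 0x61, 0x61, 0xff}))       // text chunk
	fmt.Println(checkIndefiniteString([]byte{0x5f, 0xc0, 0x41, 0x01, 0xff})) // tagged chunk
}
```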
For example, DataErrReader, HalfReader, TimeoutReader, etc.
https://golang.org/pkg/testing/iotest/
Does it make sense for you to directly support time.Time struct fields in this lib? This guy supports it: https://github.com/ugorji/go/blob/02994ffebd8e7ef482130d13b570c8be0968b790/codec/cbor.go#L222
Add this to the Usage section today? The last code-related commit has been fuzzing nonstop for days without a single crash, and speed is fast with this feature!
How to Decode SenML with fxamacker/cbor v1.3
// RFC 8428 says, "The data is structured as a single array that
// contains a series of SenML Records that can each contain fields"
// fxamacker/cbor v1.3 has "keyasint" struct tag (ideal for SenML)
type SenMLRecord struct {
BaseName string `cbor:"-2,keyasint,omitempty"`
BaseTime float64 `cbor:"-3,keyasint,omitempty"`
BaseUnit string `cbor:"-4,keyasint,omitempty"`
BaseValue float64 `cbor:"-5,keyasint,omitempty"`
BaseSum float64 `cbor:"-6,keyasint,omitempty"`
BaseVersion int `cbor:"-1,keyasint,omitempty"`
Name string `cbor:"0,keyasint,omitempty"`
Unit string `cbor:"1,keyasint,omitempty"`
Value float64 `cbor:"2,keyasint,omitempty"`
ValueS string `cbor:"3,keyasint,omitempty"`
ValueB bool `cbor:"4,keyasint,omitempty"`
ValueD []byte `cbor:"8,keyasint,omitempty"`
Sum float64 `cbor:"5,keyasint,omitempty"`
Time int `cbor:"6,keyasint,omitempty"`
UpdateTime float64 `cbor:"7,keyasint,omitempty"`
}
// When cborData is a []byte containing SenML,
// it can easily be decoded into a []SenMLRecord.
var v []SenMLRecord
if err := cbor.Unmarshal(cborData, &v); err != nil {
t.Fatal("Unmarshal:", err)
}
// That's it! Decoding speed is fast and v contains easy to use SenML Records.